WD's chief product bod peers into the AI data system future
WD Chief Product Officer Ahmed Shihab believes AI data storage needs will grow so much that AI datacenters should be treated as data systems, with an underlying data architecture that can support data lifecycle, scale, cost, durability, and availability demands over time.
He sets out his views in a blog entitled "AI Data Centers are Data Systems, not Merely Compute Systems," which we saw before publication.
While it’s not surprising for an HDD company to say that AI data generation means we’ll need more disk drives, the WD Chief Product Officer’s blog contains clues about the future product direction of what is inarguably one of the top players in the industry.
Currently, AI-focused datacenters concentrate largely on compute, but data volumes are scaling and that’s going to shift the focus.
Shihab writes: “Data compounds. It grows with every training run, every token, every inference cycle, and every interaction. Think of all the contexts being built up with every inference run, across more than a billion users.”
And then he says: “Once inference begins, storage keeps growing. Compute does not. … [Compute] Capacity shifts from training to inference, is repurposed across workloads, and becomes more efficient over time. The same infrastructure delivers more output as software improves.”
Not so with storage, where growing capacity can overwhelm the infrastructure: “A single 5-second AI video generation produces operational exhaust — logs, traces, intermediate outputs, metadata — that can match the output itself in size, before any retention for tuning, compliance, or audit. Multiply that across billions of daily inferences and the math stops being incidental. It becomes structural.”
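To get a feel for why the multiplication matters, here is a back-of-envelope sketch. Every number in it is a hypothetical placeholder, not a figure from WD or Shihab: a 10 MB clip, exhaust equal in size to the output (the “can match the output itself” case), one billion generations a day, and a one-year retention window. Treating every inference as a video generation overstates the real mix; the point is only how quickly per-request bytes compound.

```python
# Back-of-envelope sketch of daily "operational exhaust" volume.
# All figures are HYPOTHETICAL placeholders, not numbers from WD or Shihab.

CLIP_BYTES = 10 * 10**6       # hypothetical: one 5-second generated clip ~10 MB
EXHAUST_RATIO = 1.0           # exhaust "can match the output itself in size"
DAILY_INFERENCES = 10**9      # hypothetical: one billion generations per day
RETENTION_DAYS = 365          # hypothetical retention window for tuning/audit


def daily_bytes(clip_bytes: int, exhaust_ratio: float, inferences: int) -> int:
    """Bytes written per day: output plus exhaust, before any replication."""
    return int(clip_bytes * (1 + exhaust_ratio) * inferences)


def retained_bytes(per_day: int, days: int) -> int:
    """Bytes accumulated if nothing is deleted during the retention window."""
    return per_day * days


per_day = daily_bytes(CLIP_BYTES, EXHAUST_RATIO, DAILY_INFERENCES)
total = retained_bytes(per_day, RETENTION_DAYS)
print(f"per day: {per_day / 10**15:.1f} PB")                 # decimal petabytes
print(f"after {RETENTION_DAYS} days: {total / 10**18:.1f} EB")  # decimal exabytes
```

Under these made-up inputs the sketch prints 20.0 PB per day and 7.3 EB after a year, before counting any replicas kept for durability.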
Okay. Data used by and generated by AI inferencing will grow and grow. What does that mean?
He thinks storage “becomes the foundation upon which the rest of the system depends, to deliver business results.”
The first realization from this is that “AI datacenters are not built on a single storage layer—they are systems composed of multiple tiers, each optimized for different workloads across the data lifecycle. High-performance tiers support active inference and real-time access, while capacity-optimized tiers store most retained data—logs, embeddings, outputs, and historical context—that accumulate over time.”
Okay again. That means SSD and HDD tiers. Most of us already know this.
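As a rough illustration of what such tiering means in practice, here is a minimal sketch of an idle-time placement rule between a performance (SSD) tier and a capacity (HDD) tier. The names Tier, DataObject, place, bytes_per_tier and cold_after_days are hypothetical, and the single idle-days threshold is an assumption; real systems, and whatever WD may have in mind, also weigh cost, durability and access patterns.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PERFORMANCE = "ssd"  # hypothetical high-performance tier (active inference)
    CAPACITY = "hdd"     # hypothetical capacity-optimized tier (retained data)


@dataclass
class DataObject:
    name: str
    size_bytes: int
    last_access_day: int
    tier: Tier = Tier.PERFORMANCE


def place(objects: list[DataObject], today: int, cold_after_days: int) -> list[str]:
    """Demote objects idle longer than cold_after_days to the capacity tier and
    promote recently read ones back to the performance tier. Returns moved names."""
    moved = []
    for obj in objects:
        idle = today - obj.last_access_day
        target = Tier.CAPACITY if idle > cold_after_days else Tier.PERFORMANCE
        if obj.tier is not target:
            obj.tier = target
            moved.append(obj.name)
    return moved


def bytes_per_tier(objects: list[DataObject]) -> dict[Tier, int]:
    """Total bytes currently held on each tier."""
    usage = {t: 0 for t in Tier}
    for obj in objects:
        usage[obj.tier] += obj.size_bytes
    return usage


# Usage with hypothetical objects; "today" is day 100.
objs = [
    DataObject("kv-cache-shard", 4 * 10**9, last_access_day=99),
    DataObject("inference-logs-day1", 50 * 10**9, last_access_day=1),
    DataObject("embeddings-v2", 20 * 10**9, last_access_day=95),
]
print(place(objs, today=100, cold_after_days=7))  # ['inference-logs-day1']
print(bytes_per_tier(objs))
```

Even in this toy, the old logs end up holding most of the bytes on the capacity tier, which is the pattern Shihab describes.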
He says particular aspects of storage are becoming more important: durability, replication, and predictability. Storage systems as a whole will be defined by “how data is placed, how it moves, and how it is managed across a distributed architecture. Data keeps moving behind the scenes to the lowest cost and is constantly being read and written for durability, even when the user is not accessing it.”
That last thought is revealing: “Data keeps moving behind the scenes to the lowest cost and is constantly being read and written for durability.”
To us, this implies object, tape, or some other archival storage where durability matters. But none of these are current WD technologies. In fact, WD sold its ActiveScale object storage business to Quantum several years ago. Is WD now considering providing disk- or tape-based storage for applications needing high data durability?
Shihab’s blog says: “The systems that succeed will be those designed with this reality in mind: not as compute environments, but as data systems — where storage is foundational, architecture is tiered, and scale is defined by how effectively data is retained, managed, and used over time.”
WD had an IntelliFlash storage systems business, which it sold to DDN in 2019. It retains remnants of its systems business in the OpenFlex and RapidFlex products, but these are also SSD-based. Is WD now thinking about offering architecturally tiered storage products? Or disk drives dedicated to particular storage tiers, such as spun-down disks for a cold data archive tier?
We have asked WD these questions and look forward to presenting the answers.