Graid sees cash potential in KV caching

Chris Mellor, Blocks & Files editor

Published Thu 23 Apr 2026 // 11:29 UTC

GPU-powered RAID card supplier Graid reckons its RAIDed SSDs can accelerate KV caches for GPU servers and has a three-pronged product strategy to earn KV cache revenue.

Its Agentic AI Storage Portfolio spans three deployment tiers: KV Cache Server, KV Cache Rack, and KV Cache Platform. The highest tier, KV Cache Platform, is aligned to Nvidia’s STX reference architecture, with native BlueField-4 DPU execution on the roadmap for H2 2026.

Graid Technology CEO Leander Yu said: “A year ago, at GTC 2025, Jensen Huang predicted that storage would become GPU-accelerated for the first time. This year, NVIDIA turned that concept into an architecture with STX and CMX. Our KV Cache Portfolio is built for precisely this moment, delivering the storage performance that agentic AI demands, at storage-tier economics.”


Nvidia’s STX architecture stores evicted KV cache vector data in external SSD storage, with a direct pipe between that storage and a GPU’s HBM.

Graid explains that KV caching is needed because “models running continuous multi-step tasks and maintaining context across hours of operation generate KV cache demands that overwhelm GPU HBM. The result: latency spikes up to 18x, GPU utilization as low as 50 percent, and model-level failures, including hallucinations and reasoning degradation.”
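A rough sizing sketch shows why long-running contexts can outgrow HBM. It uses the standard estimate for a transformer KV cache: one key vector and one value vector per layer per token. The model shape below (80 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical example chosen for illustration; it is not a figure from Graid or Nvidia.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size for one sequence.

    The factor of 2 covers the key and the value stored per layer per token.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape, assumed for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM)
long_context = kv_cache_bytes(128 * 1024, LAYERS, KV_HEADS, HEAD_DIM)

print(f"per token: {per_token / 1024:.0f} KiB")
print(f"128K-token context: {long_context / 2**30:.0f} GiB")
```

Under these assumptions a single 128K-token context needs about 40 GiB of KV cache, so a handful of concurrent long-lived agent sessions can exceed the HBM on one GPU before model weights are counted.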

Graid’s SupremeRAID technology aggregates up to 32 NVMe SSDs into a single 280 GBps virtual pool, supports GPU Direct Storage, and delivers KV cache reads at 1.3 ms, 77x faster than standard NVMe SSDs. Its three KV Cache products use this technology:


KV Cache Server - single-node NVMe acceleration for individual inference servers and edge AI deployments. Available now.
KV Cache Rack - rack-scale, partner-validated solutions co-engineered with server OEM partners for enterprise multi-GPU clusters. Available now.
KV Cache Platform - purpose-built for Nvidia's STX reference architecture, with native BlueField-4 DPU execution and rack-scale storage expansion on the roadmap.
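The SupremeRAID figures quoted above can be unpacked with some simple arithmetic. The sketch below assumes "GBps" means gigabytes per second, that the 280 GBps pool is reached with the full 32 drives, and that the 77x claim compares like-for-like read operations; the 40 GB context size is a hypothetical value, not a Graid number.

```python
# Figures quoted by Graid in the article.
POOL_GBPS = 280      # aggregate pool bandwidth, assumed to be GB/s
MAX_DRIVES = 32      # maximum NVMe SSDs per pool
READ_MS = 1.3        # quoted KV cache read time
SPEEDUP = 77         # quoted advantage over standard NVMe SSDs

# Average bandwidth each drive contributes if all 32 are populated.
per_drive_gbps = POOL_GBPS / MAX_DRIVES  # 8.75

# Read time on "standard NVMe" implied by the 77x claim.
implied_baseline_ms = READ_MS * SPEEDUP


def transfer_seconds(size_gb, bandwidth_gbps=POOL_GBPS):
    """Idealized time to stream size_gb at the given bandwidth (no overheads)."""
    return size_gb / bandwidth_gbps


print(f"per-drive bandwidth: {per_drive_gbps:.2f} GB/s")
print(f"implied baseline read: {implied_baseline_ms:.1f} ms")
# Hypothetical 40 GB evicted context streamed back at full pool bandwidth.
print(f"40 GB reload: {transfer_seconds(40) * 1000:.0f} ms")
```

On these assumptions each drive contributes about 8.75 GB/s, and the 77x claim implies a baseline read of roughly 100 ms. The reload time is a best case that ignores protocol, RAID, and network overheads.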

The KV Cache Rack is co-engineered with Supermicro, AIC, and Gigabyte, delivering shared, high-bandwidth NVMe storage across the entire AI cluster in a single rack.

For the KV Cache Platform, Graid says “SupremeRAID serves as the G3.5 storage performance engine, the NVMe acceleration layer beneath BlueField-4 DPUs and DOCA Memos, making instant agentic context handoff between GPUs viable at inference speed.”

Native execution on the BlueField-4 DPU, available in H2 2026, will expand SupremeRAID’s deployment model to run directly within the STX storage chassis, extending the platform from GPU-adjacent to DPU-native and giving infrastructure teams a fully integrated STX storage node either with or without a discrete accelerator.


Expanded drive count support will allow a single SupremeRAID instance to span multiple CMX chassis, serving an entire rack of STX storage nodes at significantly greater aggregate bandwidth from one virtualized pool, simplifying DOCA Memos namespace management and delivering rack-scale throughput from a single logical storage resource.

By supporting KV Cache offload to external storage and the STX reference architecture, Graid is joining other STX-supporting KV Cache extenders such as Cloudian, Dell, DDN, Everpure, Hammerspace, Hitachi Vantara, HPE, Lightbits/Scaleflux, MinIO, NetApp, Nutanix, Peak:AIO, Pliops, VAST Data and WEKA; virtually every file and object storage system supplier. KV Cache and STX support is becoming table stakes.

Read more in a Graid Solution brief document and a blog.
