Graid sees cash potential in KV caching

GPU-powered RAID card supplier Graid reckons its RAIDed SSDs can accelerate KV caches for GPU servers, and it has a three-pronged product strategy to turn that into KV cache revenue.

Its Agentic AI Storage Portfolio spans three deployment tiers: KV Cache Server, KV Cache Rack, and KV Cache Platform. The top tier, KV Cache Platform, is aligned to Nvidia’s STX reference architecture, with native BlueField-4 DPU execution on the roadmap for 2H 2026.

Graid Technology CEO Leander Yu said: “A year ago, at GTC 2025, Jensen Huang predicted that storage would become GPU-accelerated for the first time. This year, NVIDIA turned that concept into an architecture with STX and CMX. Our KV Cache Portfolio is built for precisely this moment, delivering the storage performance that agentic AI demands, at storage-tier economics.”

Nvidia’s STX architecture stores KV cache data evicted from GPU memory (the attention key and value tensors) on external SSD storage, with a direct data path between that storage and a GPU’s HBM.
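
The eviction-and-reload idea can be sketched as a two-tier cache: a small, fast tier standing in for HBM and a large tier standing in for external NVMe storage. Blocks pushed out of the fast tier in least-recently-used (LRU) order are kept in the slow tier rather than discarded, so a later request can reload them instead of recomputing the prefill. The class, its method names and the LRU policy are hypothetical illustrations, not Nvidia's STX interface or any vendor's API, and plain Python dicts stand in for both tiers.

```python
from __future__ import annotations

from collections import OrderedDict


class TieredKVCache:
    """Hypothetical two-tier KV cache: a small 'HBM' tier backed by a large 'SSD' tier."""

    def __init__(self, hbm_capacity_blocks: int) -> None:
        self.hbm_capacity = hbm_capacity_blocks
        self.hbm: OrderedDict[str, bytes] = OrderedDict()  # LRU order, oldest first
        self.ssd: dict[str, bytes] = {}  # stands in for external NVMe storage
        self.stats = {"hbm_hits": 0, "ssd_hits": 0, "misses": 0}

    def put(self, block_id: str, kv_block: bytes) -> None:
        # Insert or refresh a block as most recently used, then enforce capacity.
        self.hbm[block_id] = kv_block
        self.hbm.move_to_end(block_id)
        self._evict_if_needed()

    def get(self, block_id: str) -> bytes | None:
        if block_id in self.hbm:
            self.stats["hbm_hits"] += 1
            self.hbm.move_to_end(block_id)
            return self.hbm[block_id]
        if block_id in self.ssd:
            # Reload from the SSD tier and promote the block back into HBM.
            self.stats["ssd_hits"] += 1
            kv_block = self.ssd.pop(block_id)
            self.put(block_id, kv_block)
            return kv_block
        # Not cached anywhere: the caller would have to recompute (prefill) it.
        self.stats["misses"] += 1
        return None

    def _evict_if_needed(self) -> None:
        while len(self.hbm) > self.hbm_capacity:
            victim_id, victim_block = self.hbm.popitem(last=False)
            self.ssd[victim_id] = victim_block  # offload instead of discarding
```

With a two-block fast tier, putting three blocks spills the oldest to the slow tier; a later `get` for it counts as an SSD hit and promotes it back, spilling the next-oldest block in turn.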

Graid says external KV cache storage is needed because “models running continuous multi-step tasks and maintaining context across hours of operation generate KV cache demands that overwhelm GPU HBM. The result: latency spikes up to 18x, GPU utilization as low as 50 percent, and model-level failures, including hallucinations and reasoning degradation.”
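
To see why long-running sessions outgrow HBM, it helps to size a KV cache. A standard estimate is 2 (one key and one value) × layers × KV heads × head dimension × tokens × bytes per element. The sketch below applies it to a hypothetical model with 80 layers, 8 KV heads and a head dimension of 128, stored in FP16; the function name and the model shape are illustrative assumptions, not figures from Graid or Nvidia.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_elem: int = 2,  # 2 bytes for FP16/BF16
    batch_size: int = 1,
) -> int:
    """Approximate size of a transformer's KV cache in bytes.

    Each layer stores one key and one value vector per KV head per token,
    hence the leading factor of 2.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch_size


# Hypothetical model shape: 80 layers, 8 KV heads (grouped-query attention), head_dim 128.
per_token = kv_cache_bytes(80, 8, 128, seq_len=1)
long_context = kv_cache_bytes(80, 8, 128, seq_len=131_072)  # 128K-token context

print(f"{per_token / 1024:.0f} KiB per token")         # 320 KiB per token
print(f"{long_context / 2**30:.0f} GiB per sequence")  # 40 GiB per sequence
```

At 40 GiB for one 128K-token sequence, eight such sessions would need 320 GiB for KV data alone, on top of the model weights that also occupy HBM.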

Graid’s SupremeRAID technology aggregates up to 32 NVMe SSDs into a single 280 GBps virtual pool, supports GPUDirect Storage, and delivers KV cache reads in 1.3 ms, which Graid says is 77x faster than standard NVMe SSDs. All three KV Cache products build on it:

  • KV Cache Server - single-node NVMe acceleration for individual inference servers and edge AI deployments. Available now.
  • KV Cache Rack - rack-scale, partner-validated solutions co-engineered with server OEM partners for enterprise multi-GPU clusters. Available now.
  • KV Cache Platform - purpose-built for Nvidia's STX reference architecture, with native BlueField-4 DPU execution and rack-scale storage expansion on the roadmap.
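
Taking Graid's SupremeRAID figures at face value, two derived numbers help put them in context: the average bandwidth each of 32 drives must contribute to reach 280 GBps, and the baseline latency implied by a 77x speedup over 1.3 ms. These are back-of-envelope calculations on the stated claims, not measurements, and the variable names are illustrative.

```python
# Figures as stated by Graid; the derived values are simple arithmetic on them.
POOL_BANDWIDTH_GBPS = 280.0  # aggregate read bandwidth of one virtual pool, GB/s
MAX_DRIVES = 32              # maximum NVMe SSDs per pool
SUPREMERAID_READ_MS = 1.3    # stated KV cache read latency, ms
CLAIMED_SPEEDUP = 77         # stated speedup vs standard NVMe SSDs

# Average bandwidth each drive must sustain for the pool to reach its aggregate figure.
per_drive_gbps = POOL_BANDWIDTH_GBPS / MAX_DRIVES

# Baseline read latency implied by the speedup claim.
implied_baseline_ms = SUPREMERAID_READ_MS * CLAIMED_SPEEDUP

print(f"{per_drive_gbps:.2f} GBps per drive")                # 8.75 GBps per drive
print(f"{implied_baseline_ms:.1f} ms implied baseline read")  # 100.1 ms implied baseline read
```

In other words, the 280 GBps figure presumes a fully populated pool of drives that can each sustain well over 8 GBps of reads.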

The KV Cache Rack is co-engineered with Supermicro, AIC, and Gigabyte, delivering shared, high-bandwidth NVMe storage across an AI cluster housed in a single rack.

For the KV Cache Platform, Graid says “SupremeRAID serves as the G3.5 storage performance engine, the NVMe acceleration layer beneath BlueField-4 DPUs and DOCA Memos, making instant agentic context handoff between GPUs viable at inference speed.”

Native execution on the BlueField-4 DPU, due in 2H 2026, will let SupremeRAID run directly within the STX storage chassis, extending the platform from GPU-adjacent to DPU-native and giving infrastructure teams a fully integrated STX storage node, with or without a discrete accelerator.

Expanded drive count support will allow a single SupremeRAID instance to span multiple CMX chassis, serving an entire rack of STX storage nodes from one virtualized pool at significantly greater, rack-scale aggregate bandwidth. Presenting that rack as a single logical storage resource also simplifies DOCA Memos namespace management.
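
A single virtualized pool spanning several chassis can be pictured as striping: the logical volume is cut into fixed-size stripe units that rotate across every drive in every chassis, so a large KV cache read fans out over all drives at once and aggregate bandwidth grows with drive count. The sketch below uses plain RAID 0-style striping with hypothetical parameters (two chassis of 16 drives, a 128 KiB stripe unit); the function and type names are illustrative, it is not Graid's actual on-disk layout, and a production array would typically add parity or mirroring on top.

```python
from typing import NamedTuple


class Location(NamedTuple):
    chassis: int       # which chassis holds the data
    drive: int         # drive index within that chassis
    drive_offset: int  # byte offset on that drive


def locate(
    logical_offset: int,
    chassis_count: int,
    drives_per_chassis: int,
    stripe_unit: int = 128 * 1024,
) -> Location:
    """Map a byte offset in one logical volume to a physical drive location.

    Consecutive stripe units rotate across all drives in all chassis (RAID 0-style),
    so sequential reads are spread evenly over the whole pool.
    """
    total_drives = chassis_count * drives_per_chassis
    stripe_index, within_unit = divmod(logical_offset, stripe_unit)
    global_drive = stripe_index % total_drives  # drive number across the whole pool
    row = stripe_index // total_drives          # stripe "row" on every drive
    chassis, drive = divmod(global_drive, drives_per_chassis)
    return Location(chassis, drive, row * stripe_unit + within_unit)
```

For example, `locate(16 * 128 * 1024, chassis_count=2, drives_per_chassis=16)` lands on the first drive of the second chassis, and any 32 consecutive stripe units touch all 32 drives exactly once.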

By supporting KV cache offload to external storage and the STX reference architecture, Graid joins other STX-supporting KV cache extenders such as Cloudian, Dell, DDN, Everpure, Hammerspace, Hitachi Vantara, HPE, Lightbits/Scaleflux, MinIO, NetApp, Nutanix, Peak:AIO, Pliops, VAST Data and WEKA: virtually every file and object storage system supplier. KV cache and STX support is becoming table stakes.

Read more in a Graid solution brief and a blog.