
Qdrant’s AI vector search is faster, auditable and more available

Composable vector search and database supplier Qdrant has announced faster indexing, three-availability-zone clusters that ride out a zone outage without failover delay, and audit logging for compliance.

The company produces a stand-alone vector database that stores the embeddings used by Large Language Models (LLMs) and autonomous AI agents in inferencing workloads such as Retrieval-Augmented Generation (RAG). It says enterprise teams buying AI services ask three basic questions about any vector search offering: can it support growing workloads, can it stay up if infrastructure components fail, and can we audit what it does?

Qdrant co-founder and CEO André Zayarni.

André Zayarni, Qdrant’s CEO and co-founder, said: “GPUs aren't just for model inference. They're for indexing too. We've supported GPU-accelerated HNSW construction in open source since v1.13, and now it's available in Qdrant Cloud. Pair that with multi-AZ replication and audit logging, and enterprise teams have everything they need to run Qdrant in production for their most critical workloads.”

Qdrant is announcing:

  • GPU-accelerated indexing delivering up to 4x faster HNSW (Hierarchical Navigable Small World) index builds on dedicated GPUs in Qdrant Cloud, based on Qdrant benchmarks. Customers can add GPUs to existing clusters for high-volume indexing bursts. 
  • Multi-Availability Zone (AZ) clusters replicate data across three availability zones within a region, relying on cross-AZ replication rather than failover. If an availability zone goes down, reads and writes continue from the surviving zones with no failover delay and no customer action required.
  • Audit logging captures all operations performed through the Qdrant API: queries, upserts, deletes, collection management, and snapshot operations. Each entry is structured JSON with user and API key attribution, timestamp, target collection, and result of the action (allowed or denied).

When an autonomous system acts on retrieved context, audit logging provides the trail showing which service queried which collection, when, and whether the request was authorized. Retention is configurable; for long-term needs, logs can be downloaded via the API and stored externally.

GPU-accelerated indexing is available today on AWS, with additional cloud providers and regions on the roadmap. Multi-AZ clusters are available on Qdrant’s Premium Multi-AZ tier, with uptime SLAs of up to 99.95 percent. Audit logging is available on all paid Qdrant Cloud clusters.

Find out more about these three items here.

Bootnote

Hierarchical Navigable Small World (HNSW) search aims to find the approximate nearest neighbours of a query vector in a vector database. As we understand it, HNSW treats each vector as a node connected to similar vectors in a graph. These graphs become huge as the number of vectors grows. HNSW therefore builds a hierarchy of graph layers: the top layer is sparse, holding only a few vectors, so the search quickly finds the closest of them; each layer below holds more vectors, and the bottom layer contains them all. The closest vector found in one layer becomes the entry point for the search in the layer below, so the search homes in on the query's neighbourhood without scanning the whole graph, shortening overall search time.
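To make the layered search concrete, here is a toy Python sketch of an HNSW-style index. It is our own illustration under simplifying assumptions, not Qdrant's implementation: the class name `TinyHNSW` and its parameters are hypothetical, neighbour lists are never pruned, and each new vector simply links to the `m` closest candidates found on each layer. A search descends greedily through the sparse upper layers, then runs a wider beam search (width `ef`) on the bottom layer, which holds every vector.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.dist(a, b)


class TinyHNSW:
    """Hypothetical, unoptimised HNSW-style index for illustration only.

    Simplifications: neighbour lists are never pruned (real engines cap
    node degree), and new nodes link to the m closest nodes found per layer.
    """

    def __init__(self, m=4, ef_construction=16, seed=0):
        self.m = m
        self.ef_construction = ef_construction
        self.level_mult = 1 / math.log(m)
        self.rng = random.Random(seed)
        self.vectors = []
        self.layers = []        # layers[l] maps node id -> set of neighbour ids
        self.entry = None       # entry point: a node on the top layer
        self.entry_level = -1

    def _search_layer(self, q, entry_points, ef, layer):
        """Beam search on one layer; returns up to ef (distance, id) pairs, nearest first."""
        visited = set(entry_points)
        candidates = [(dist(q, self.vectors[e]), e) for e in entry_points]
        heapq.heapify(candidates)                   # min-heap: closest unexplored node
        results = [(-d, e) for d, e in candidates]  # max-heap via negated distance
        heapq.heapify(results)
        while candidates:
            d, c = heapq.heappop(candidates)
            if d > -results[0][0]:
                break                               # nothing closer left to explore
            for n in self.layers[layer][c]:
                if n in visited:
                    continue
                visited.add(n)
                dn = dist(q, self.vectors[n])
                if len(results) < ef or dn < -results[0][0]:
                    heapq.heappush(candidates, (dn, n))
                    heapq.heappush(results, (-dn, n))
                    if len(results) > ef:
                        heapq.heappop(results)      # evict the furthest result
        return sorted((-d, e) for d, e in results)

    def insert(self, vec):
        node = len(self.vectors)
        self.vectors.append(vec)
        # Random top layer: most nodes stay on layer 0, exponentially fewer go higher.
        level = int(-math.log(1.0 - self.rng.random()) * self.level_mult)
        while len(self.layers) <= level:
            self.layers.append({})
        for l in range(level + 1):
            self.layers[l][node] = set()
        if self.entry is None:
            self.entry, self.entry_level = node, level
            return
        ep = [self.entry]
        # Greedy descent (beam width 1) through layers above the new node's level.
        for l in range(self.entry_level, level, -1):
            ep = [self._search_layer(vec, ep, 1, l)[0][1]]
        # On each shared layer, link the node to its m nearest neighbours found.
        for l in range(min(level, self.entry_level), -1, -1):
            found = self._search_layer(vec, ep, self.ef_construction, l)
            for _, n in found[: self.m]:
                self.layers[l][node].add(n)
                self.layers[l][n].add(node)
            ep = [e for _, e in found]
        if level > self.entry_level:
            self.entry, self.entry_level = node, level

    def search(self, q, k=1, ef=16):
        """Return the ids of (approximately) the k nearest stored vectors to q."""
        ep = [self.entry]
        for l in range(self.entry_level, 0, -1):  # sparse layers: greedy shortcuts
            ep = [self._search_layer(q, ep, 1, l)[0][1]]
        return [n for _, n in self._search_layer(q, ep, max(ef, k), 0)[:k]]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(200)]
    index = TinyHNSW(m=4, ef_construction=16, seed=1)
    for v in data:
        index.insert(v)
    print("vectors per layer:", [len(layer) for layer in index.layers])
    print("approximate top-5:", index.search([0.5] * 8, k=5, ef=32))
```

Raising `ef` trades speed for accuracy. In this sketch, because no links are ever removed, the bottom layer stays connected, so an `ef` at least as large as the number of stored vectors makes the final search visit every vector and match a brute-force scan. Production engines instead cap each node's neighbour count and use smarter neighbour selection, which this toy version omits.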

Qdrant competitor Pinecone supports HNSW and says "Pinecone's inference infrastructure leverages Nvidia GPUs to optimize embedding and reranking performance." Zilliz also supports HNSW and GPU acceleration: "Milvus, which Zilliz Cloud is built on, introduced GPU indexing capabilities powered by Nvidia's CUDA-Accelerated Graph Index for Vector Retrieval (CAGRA), part of the RAPIDS cuVS library."