Introducing a new vector storage format: DiskBBQ

Introducing DiskBBQ, an alternative to HNSW, and exploring when and why to use it.


DiskBBQ is an evolution of the Inverted File (IVF) index. It is an alternative to Hierarchical Navigable Small Worlds (HNSW) that partitions vectors into smaller clusters, handling low-memory scenarios more efficiently while still providing strong query performance.

What’s wrong with HNSW?

We absolutely love HNSW. It's a fast, compute-efficient algorithm that scales logarithmically with your vector data. However, that speed comes at a cost. For HNSW to work well, all the vectors need to reside in RAM. And while we have made this cheaper by adding increasingly better levels of quantization, performance still plummets once the vectors fall out of memory.

Additionally, indexing requires searching the existing HNSW graph, so all the RAM costs incurred at search time are also present during indexing. This can significantly slow down indexing on hardware that cannot hold all the vectors in RAM.

How is DiskBBQ different?

DiskBBQ uses Hierarchical K-means to partition vectors into small clusters. It then selects representative centroids to query before querying the individual vectors themselves. It takes advantage of the multi-layer nature of this hierarchy to query at most 2 layers of the centroids, limiting the total explored space. Finally, it explores the vectors contained in each cluster by bulk scoring the distance between the cluster’s vectors and the query vector.
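The two-layer centroid search described above can be sketched in Python. This is a toy illustration only, using plain k-means and unquantized vectors rather than the actual DiskBBQ implementation; all function names and parameters here are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 1,000 vectors in 8 dimensions.
vectors = rng.normal(size=(1000, 8)).astype(np.float32)

def kmeans(data, k, iters=10):
    """Naive k-means: returns centroids and the cluster assignment of each row."""
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest centroid, then recompute centroids.
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = data[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, assign

# "Indexing": a two-layer hierarchy of centroids.
fine_centroids, fine_assign = kmeans(vectors, k=32)          # cluster leaders
coarse_centroids, coarse_of_fine = kmeans(fine_centroids, k=4)  # layer above them

def search(query, n_probe_coarse=2, n_probe_fine=4, top_k=3):
    # Layer 1: pick the closest coarse centroids.
    dc = np.linalg.norm(coarse_centroids - query, axis=1)
    best_coarse = np.argsort(dc)[:n_probe_coarse]
    # Layer 2: among fine centroids under those, pick the closest.
    cand = np.flatnonzero(np.isin(coarse_of_fine, best_coarse))
    df = np.linalg.norm(fine_centroids[cand] - query, axis=1)
    probe = cand[np.argsort(df)[:n_probe_fine]]
    # Finally, bulk-score only the vectors inside the probed clusters.
    ids = np.flatnonzero(np.isin(fine_assign, probe))
    dv = np.linalg.norm(vectors[ids] - query, axis=1)
    return ids[np.argsort(dv)[:top_k]]

print(search(vectors[42]))
```

The probe parameters control the recall/latency trade-off: probing more centroids scores more vectors, which raises recall at the cost of more work.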

As the name implies, DiskBBQ also uses our BBQ (Better Binary Quantization) to reduce the size of the vectors and centroids. This allows loading many blocks of vectors into memory at a time for fast scoring, with very low memory requirements and low disk overhead.
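The size win from 1-bit quantization can be sketched as follows. This shows only the sign-bit idea; real BBQ also stores per-vector correction terms to sharpen distance estimates, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(4, 64)).astype(np.float32)

# 1-bit ("binary") quantization: keep only the sign of each component,
# then pack 8 bits per byte.
bits = (vectors > 0).astype(np.uint8)   # shape (4, 64), one bit per dimension
packed = np.packbits(bits, axis=1)      # shape (4, 8): 64 dims -> 8 bytes

# float32 storage vs packed bits: a 32x reduction.
print(vectors.nbytes, "->", packed.nbytes)  # 1024 -> 32

# Distances can then be approximated cheaply via Hamming distance on the bits.
hamming = np.unpackbits(packed ^ packed[0], axis=1).sum(axis=1)
```

Going from 4 bytes per dimension to 1 bit per dimension is what lets many blocks of quantized vectors sit in memory at once.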

Another interesting aspect of DiskBBQ is that it will allow vectors to be assigned to more than one centroid. It utilizes a version of Google’s Spilling with Orthogonality-Amplified Residuals (SOAR) to assign vectors to multiple clusters, which is particularly beneficial when a vector is near a border between two clusters. Because vectors are quantized heavily, this adds minimal disk overhead and requires fewer centroids to be explored during search.

Talk is cheap, show me some numbers

HNSW scales logarithmically. So, it must be orders of magnitude faster than something that must search centroids and then score all the vectors in each centroid, right?

We have been working hard to make DiskBBQ fully competitive with HNSW. While we are not yet 100% there in all situations, we are happy with the results.

DiskBBQ takes advantage of bulk scoring vectors and performs as many operations as it can off-heap. This means we can read vectors from files directly into memory for optimized vector operations, resulting in some pretty nice performance.

Two main scenarios for DiskBBQ are particularly interesting. The first is one where the entire index can fit in RAM. Since this is key for HNSW performance, it's only fair to see how DiskBBQ compares in this case.

Type       Index Time     Latency   Recall
HNSW BBQ   1,054,319ms    ~3.4ms    92%
DiskBBQ    94,075ms       ~4.0ms    91%

Table 0. How DiskBBQ compares over 1M vectors, all sitting comfortably in memory. DiskBBQ indexes roughly 10x faster while remaining almost as fast at query time as HNSW over BBQ-quantized vectors.

Things get more interesting as memory is reduced to the point where the entire index no longer fits. We will have a separate blog digging into why DiskBBQ is better in low-memory environments and our methodology for testing. However, here are some high-level numbers:

As overall memory decreases, DiskBBQ indexing degrades gracefully. HNSW, however, simply falls apart once memory becomes very restricted.

Again, we see DiskBBQ degrading gracefully in search latency. It does slow down, but HNSW latency climbs steeply as memory becomes increasingly restricted.

When should I use DiskBBQ?

DiskBBQ and HNSW will both continue to improve in Elasticsearch. For now, DiskBBQ's performance at very high recall (99+%) and very low latency is not as good as HNSW's. We hope to improve this while continuing to invest in HNSW.

If you need very, very high recall, have lots of off-heap memory (or are willing to pay for it), and have few index updates (so indexing costs are low), using HNSW with some form of quantization is likely still the best option.

However, if you are okay with 95% or less recall, are cost-sensitive, but still want fast search, DiskBBQ might be the solution for you.

How can I use DiskBBQ?

Here's how to enable DiskBBQ and start querying:

Set up a mapping like this:
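A sketch of such a mapping. The index and field names are placeholders, `dims` is kept small for readability (real embedding fields are typically much higher-dimensional), and the `bbq_disk` index option is the one assumed here to enable DiskBBQ:

```
PUT my-vector-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 8,
        "index_options": {
          "type": "bbq_disk"
        }
      }
    }
  }
}
```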

Insert a vector:
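A minimal sketch, assuming the `my-vector-index` mapping above (the vector values are arbitrary):

```
POST my-vector-index/_doc
{
  "my_vector": [0.12, -0.45, 0.91, 0.33, -0.12, 0.77, -0.56, 0.08]
}
```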

Here’s an example of querying for that vector:
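A minimal kNN search sketch, using the same placeholder index and field names:

```
POST my-vector-index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [0.12, -0.45, 0.91, 0.33, -0.12, 0.77, -0.56, 0.08],
    "k": 3
  }
}
```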

You can still use num_candidates to control approximation.

Here’s an example using num_candidates:
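For instance, adding `num_candidates` to the same placeholder query:

```
POST my-vector-index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [0.12, -0.45, 0.91, 0.33, -0.12, 0.77, -0.56, 0.08],
    "k": 3,
    "num_candidates": 50
  }
}
```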

Or, if you want more granular control over how many vectors the search will consider, you can set the visit_percentage directly.

Here’s an example setting visit_percentage:
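A sketch, assuming `visit_percentage` is accepted alongside the other knn parameters (its exact placement and scale may differ; check the current Elasticsearch docs):

```
POST my-vector-index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [0.12, -0.45, 0.91, 0.33, -0.12, 0.77, -0.56, 0.08],
    "k": 3,
    "visit_percentage": 1.0
  }
}
```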

DiskBBQ will soon be available in Elasticsearch Serverless.
