@iverase
Contributor

@iverase iverase commented Jul 21, 2025

While reviewing the code in OptimizedScalarQuantizer, I noticed that we quantize the same vector several times: once when computing the loss and then again when computing the next grid points. I wondered if we could reuse the values between those two calls and avoid the repeated computation.
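To make the duplication concrete, here is a simplified, hypothetical sketch. It is not the OptimizedScalarQuantizer code: the class and method names are made up, and the loss is reduced to a plain squared reconstruction error. Both methods derive the integer code of every coordinate from the same interval [a, b], so each coordinate is rounded twice per iteration.

```java
/**
 * Hypothetical illustration (not the real OSQ code): the loss and the grid-point
 * statistics each quantize every coordinate for the same interval [a, b].
 */
final class RedundantRoundingSketch {
    static int roundCalls = 0; // counts how many times a coordinate is quantized

    /** Integer code of x on a grid of `points` values spanning [a, b]. */
    static int code(float x, float a, float b, int points) {
        roundCalls++;
        float stepInv = (points - 1) / (b - a);
        return Math.round((Math.min(Math.max(x, a), b) - a) * stepInv);
    }

    /** Simplified loss: squared error between x and its de-quantized value. */
    static double squaredError(float[] v, float a, float b, int points) {
        double step = (b - a) / (points - 1.0), e = 0;
        for (float x : v) {
            double xq = a + step * code(x, a, b, points); // first rounding pass
            e += (x - xq) * (x - xq);
        }
        return e;
    }

    /** Sums used to solve for the next [a, b]; rounds every coordinate again. */
    static double[] gridStats(float[] v, float a, float b, int points) {
        double daa = 0, dab = 0, dbb = 0, dax = 0, dbx = 0;
        for (float x : v) {
            double s = code(x, a, b, points) / (points - 1.0); // second rounding pass, same result
            daa += (1 - s) * (1 - s);
            dab += (1 - s) * s;
            dbb += s * s;
            dax += x * (1 - s);
            dbx += x * s;
        }
        return new double[] {daa, dab, dbb, dax, dbx};
    }
}
```

Calling `squaredError` and then `gridStats` with the same interval leaves `roundCalls` at twice the vector length, even though the second pass recomputes exactly the codes the first pass already produced.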

This PR does that: it uses the destination array to keep the quantized values during the loss computation and passes them to the method computing the grid points. In addition, we can skip the final quantization of the vector if the method that optimizes the intervals finishes without computing a worse loss.
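A minimal sketch of the idea, assuming an OSQ-style refinement loop (an anisotropic loss weighted by `lambda` and a least-squares update of the interval [a, b]). `CachedIntervalOptimizer`, `code`, `loss` and `optimize` are hypothetical names, and this is not the code in the PR. The loss writes each coordinate's code into `dest`, the next grid-point update reads them back, and the final quantization pass only runs when the last loss evaluation was for a rejected (worse) interval.

```java
/**
 * Hypothetical sketch, not the Elasticsearch implementation: an OSQ-style interval
 * optimizer whose loss writes the integer codes into the destination array, so the
 * next grid-point update can reuse them instead of rounding every coordinate again.
 */
final class CachedIntervalOptimizer {
    private final float lambda; // assumed weight between the two error terms
    private final int iters;    // assumed maximum number of refinement iterations

    CachedIntervalOptimizer(float lambda, int iters) {
        this.lambda = lambda;
        this.iters = iters;
    }

    /** Integer code of x on a grid of `points` values spanning [a, b]. */
    static int code(float x, float a, float b, int points) {
        float stepInv = (points - 1) / (b - a);
        return Math.round((Math.min(Math.max(x, a), b) - a) * stepInv);
    }

    /** Loss of the interval [a, b]; as a side effect, dest holds the codes for [a, b]. */
    private double loss(float[] v, float a, float b, int points, float norm2, int[] dest) {
        double step = (b - a) / (points - 1.0);
        double xe = 0, e = 0;
        for (int i = 0; i < v.length; i++) {
            dest[i] = code(v[i], a, b, points);
            double err = v[i] - (a + step * dest[i]); // original minus de-quantized value
            xe += v[i] * err;
            e += err * err;
        }
        return (1.0 - lambda) * xe * xe / norm2 + lambda * e;
    }

    /**
     * Refines interval = {a, b} in place and leaves the codes for the final interval in dest.
     * Returns true when the final re-quantization pass could be skipped.
     */
    boolean optimize(float[] v, float[] interval, int points, int[] dest) {
        float norm2 = 0;
        for (float x : v) norm2 += x * x;
        double bestLoss = loss(v, interval[0], interval[1], points, norm2, dest);
        double scale = (1.0 - lambda) / norm2;
        boolean destMatchesInterval = true;
        for (int it = 0; it < iters && Double.isFinite(scale); it++) {
            double daa = 0, dab = 0, dbb = 0, dax = 0, dbx = 0;
            for (int i = 0; i < v.length; i++) {
                double s = dest[i] / (points - 1.0); // cached code, no second Math.round
                daa += (1 - s) * (1 - s);
                dab += (1 - s) * s;
                dbb += s * s;
                dax += v[i] * (1 - s);
                dbx += v[i] * s;
            }
            double m0 = scale * dax * dax + lambda * daa;
            double m1 = scale * dax * dbx + lambda * dab;
            double m2 = scale * dbx * dbx + lambda * dbb;
            double det = m0 * m2 - m1 * m1;
            if (det == 0) break; // cannot solve for a new interval
            float aOpt = (float) ((m2 * dax - m1 * dbx) / det);
            float bOpt = (float) ((m0 * dbx - m1 * dax) / det);
            if (Math.abs(interval[0] - aOpt) < 1e-8 && Math.abs(interval[1] - bOpt) < 1e-8) break; // converged
            double newLoss = loss(v, aOpt, bOpt, points, norm2, dest); // overwrites dest
            if (newLoss > bestLoss) { // worse loss: keep the previous interval
                destMatchesInterval = false; // dest now holds codes for the rejected interval
                break;
            }
            interval[0] = aOpt;
            interval[1] = bOpt;
            bestLoss = newLoss;
        }
        if (!destMatchesInterval) { // only this exit path pays for a final quantization pass
            for (int i = 0; i < v.length; i++) dest[i] = code(v[i], interval[0], interval[1], points);
        }
        return destMatchesInterval;
    }
}
```

The invariant worth checking is that, whichever way the loop exits, `dest` holds exactly the codes for the interval that was kept.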

The only side effect is that we need to remove the legacy method from OSQ. That's OK, as it was only used for benchmark comparison.

The results show a clear speedup in both the scalar and vector variants.

Current values with a 128-bit preferred size:

| Benchmark | (bits) | (dims) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| OptimizedScalarQuantizerBenchmark.scalar | 1 | 384 | thrpt | 15 | 139.486 | ± 23.817 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 1 | 702 | thrpt | 15 | 79.059 | ± 14.286 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 1 | 1024 | thrpt | 15 | 50.415 | ± 7.558 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 4 | 384 | thrpt | 15 | 136.449 | ± 21.873 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 4 | 702 | thrpt | 15 | 69.242 | ± 15.013 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 4 | 1024 | thrpt | 15 | 43.425 | ± 1.643 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 7 | 384 | thrpt | 15 | 149.420 | ± 16.853 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 7 | 702 | thrpt | 15 | 77.437 | ± 6.671 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 7 | 1024 | thrpt | 15 | 53.494 | ± 7.536 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 1 | 384 | thrpt | 15 | 562.416 | ± 46.832 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 1 | 702 | thrpt | 15 | 306.875 | ± 47.434 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 1 | 1024 | thrpt | 15 | 216.386 | ± 26.207 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 4 | 384 | thrpt | 15 | 509.608 | ± 85.495 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 4 | 702 | thrpt | 15 | 292.796 | ± 55.263 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 4 | 1024 | thrpt | 15 | 187.569 | ± 15.714 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 7 | 384 | thrpt | 15 | 539.447 | ± 42.931 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 7 | 702 | thrpt | 15 | 309.357 | ± 27.685 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 7 | 1024 | thrpt | 15 | 114.017 | ± 71.001 | ops/ms |

With this PR:

| Benchmark | (bits) | (dims) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| OptimizedScalarQuantizerBenchmark.scalar | 1 | 384 | thrpt | 15 | 169.414 | ± 23.188 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 1 | 702 | thrpt | 15 | 87.899 | ± 9.614 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 1 | 1024 | thrpt | 15 | 62.872 | ± 10.971 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 4 | 384 | thrpt | 15 | 161.959 | ± 31.947 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 4 | 702 | thrpt | 15 | 81.247 | ± 6.511 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 4 | 1024 | thrpt | 15 | 58.583 | ± 17.166 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 7 | 384 | thrpt | 15 | 181.835 | ± 21.244 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 7 | 702 | thrpt | 15 | 97.614 | ± 15.205 | ops/ms |
| OptimizedScalarQuantizerBenchmark.scalar | 7 | 1024 | thrpt | 15 | 65.772 | ± 9.829 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 1 | 384 | thrpt | 15 | 638.882 | ± 80.574 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 1 | 702 | thrpt | 15 | 369.157 | ± 44.456 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 1 | 1024 | thrpt | 15 | 245.174 | ± 31.757 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 4 | 384 | thrpt | 15 | 615.784 | ± 110.064 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 4 | 702 | thrpt | 15 | 363.637 | ± 82.684 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 4 | 1024 | thrpt | 15 | 211.976 | ± 12.900 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 7 | 384 | thrpt | 15 | 686.756 | ± 64.638 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 7 | 702 | thrpt | 15 | 356.240 | ± 37.930 | ops/ms |
| OptimizedScalarQuantizerBenchmark.vector | 7 | 1024 | thrpt | 15 | 245.471 | ± 6.831 | ops/ms |
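For a quick read of the two tables, the per-row speedup is the ratio of the Score columns. The hypothetical helper below hard-codes the scores copied from the two tables above (same row order) and prints one ratio per row.

```java
/** Hypothetical helper: speedup of "With this PR" over "Current values", row by row. */
final class SpeedupFromTables {
    static final String[] ROWS = {
        "scalar 1 384", "scalar 1 702", "scalar 1 1024",
        "scalar 4 384", "scalar 4 702", "scalar 4 1024",
        "scalar 7 384", "scalar 7 702", "scalar 7 1024",
        "vector 1 384", "vector 1 702", "vector 1 1024",
        "vector 4 384", "vector 4 702", "vector 4 1024",
        "vector 7 384", "vector 7 702", "vector 7 1024" };

    // Score (ops/ms) copied from the "Current values" table
    static final double[] BEFORE = {
        139.486, 79.059, 50.415, 136.449, 69.242, 43.425, 149.420, 77.437, 53.494,
        562.416, 306.875, 216.386, 509.608, 292.796, 187.569, 539.447, 309.357, 114.017 };

    // Score (ops/ms) copied from the "With this PR" table
    static final double[] AFTER = {
        169.414, 87.899, 62.872, 161.959, 81.247, 58.583, 181.835, 97.614, 65.772,
        638.882, 369.157, 245.174, 615.784, 363.637, 211.976, 686.756, 356.240, 245.471 };

    /** Throughput ratio AFTER / BEFORE for each row; values above 1 mean faster. */
    static double[] speedups() {
        double[] r = new double[BEFORE.length];
        for (int i = 0; i < r.length; i++) r[i] = AFTER[i] / BEFORE[i];
        return r;
    }

    public static void main(String[] args) {
        double[] r = speedups();
        for (int i = 0; i < r.length; i++) System.out.printf("%-14s %.2fx%n", ROWS[i], r[i]);
    }
}
```

The ratios range from about 1.11x (scalar, 1 bit, 702 dims) to about 1.35x (scalar, 4 bits, 1024 dims), apart from vector, 7 bits, 1024 dims at about 2.15x, whose baseline score has a very wide error (± 71.001).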
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Jul 21, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Hi @iverase, I've created a changelog YAML for you.

Member

@benwtrent benwtrent left a comment

This optimization makes sense to me.

We don't need to keep the legacy interface.

My only concern is making sure recall is unchanged. Looking at the code, all the paths already did a `Math.round`, except that now some of the paths use the int result instead of rounding floats, which is fine.

The speed ups are hilarious!

@iverase
Contributor Author

iverase commented Jul 22, 2025

> My only concern is making sure recall is unchanged.

I am pretty sure the new code is equivalent to the old one; we are just caching the results of `Math.round` between function calls.
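A small hypothetical check of that equivalence (class and method names are made up): `Math.round(float)` returns an `int`, and every code on a grid of at most 2^bits values is exactly representable as a `float`, so reading a cached int back gives bit-for-bit the same grid coordinate as widening the rounded value to a float right away.

```java
import java.util.Random;

/** Hypothetical check: caching the int from Math.round gives the same grid coordinate. */
final class RoundCachingCheck {
    // Old style: keep the rounded value as a float and use it immediately.
    static float gridCoordFloat(float x, float a, float stepInv, int points) {
        float k = Math.round((x - a) * stepInv); // int result widened to float
        return k / (points - 1);
    }

    // New style: the int code was cached (e.g. in the destination array) and is reused later.
    static float gridCoordCached(int cachedCode, int points) {
        return (float) cachedCode / (points - 1);
    }

    /** Number of bitwise mismatches between the two paths over random inputs. */
    static int countMismatches(int bits, int samples, long seed) {
        Random rnd = new Random(seed);
        int points = 1 << bits;
        float a = -1f, b = 1f, stepInv = (points - 1) / (b - a);
        int mismatches = 0;
        for (int i = 0; i < samples; i++) {
            float x = a + (b - a) * rnd.nextFloat(); // x is already in [a, b), so no clamp is needed
            int cached = Math.round((x - a) * stepInv);
            float viaFloat = gridCoordFloat(x, a, stepInv, points);
            float viaCache = gridCoordCached(cached, points);
            if (Float.floatToIntBits(viaFloat) != Float.floatToIntBits(viaCache)) mismatches++;
        }
        return mismatches;
    }
}
```

Both paths convert the same `int` to `float` before dividing, so they cannot differ; the sketch just makes that explicit for the bit widths used in the benchmarks.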

@iverase iverase merged commit 4468239 into elastic:main Jul 22, 2025
33 checks passed
@iverase iverase deleted the speed_osq branch July 22, 2025 13:25

Labels

>enhancement, :Search Relevance/Vectors, Team:Search Relevance, v9.2.0

3 participants