carlosdelest
Member

Adds code and tests for supporting byte dense_vectors.

Internally, byte vectors are widened to floats. This is inefficient, since each element then occupies four bytes instead of one, but it avoids having to build dedicated infrastructure for blocks of variable element types. This can be improved in the future.
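
A minimal sketch of the widening described above, for illustration only. The class and method names are hypothetical and are not taken from this PR; the only assumption is that each signed byte element is converted to the float with the same numeric value.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the PR: shows what "widening" a byte vector means.
public final class ByteVectorWidening {
    private ByteVectorWidening() {}

    /** Converts each signed byte element to a float with the same numeric value. */
    static float[] widen(byte[] vector) {
        float[] widened = new float[vector.length];
        for (int i = 0; i < vector.length; i++) {
            // byte -> float is a lossless widening primitive conversion in Java
            widened[i] = vector[i];
        }
        return widened;
    }

    public static void main(String[] args) {
        byte[] stored = { -128, 0, 5, 127 };
        // prints [-128.0, 0.0, 5.0, 127.0]
        System.out.println(Arrays.toString(widen(stored)));
    }
}
```

The conversion loses no information, but the widened array uses four bytes per element where the stored vector used one, which is the inefficiency mentioned above.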

@carlosdelest carlosdelest added labels Jul 24, 2025: >non-issue; Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)); :Analytics/ES|QL (AKA ESQL); Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch); v9.2.0
private final FloatVectorValues floatVectorValues;
private final KnnVectorValues.DocIndexIterator iterator;
private final int dimensions;
private abstract static class DenseVectorValuesBlockReader<T extends KnnVectorValues> extends BlockDocValuesReader {
Member Author

Added abstract classes to hold the code shared between float and byte vector reading.

* @param vectorBR - dense vector encoded in BytesRef
* @param vector - array of bytes where the decoded vector should be stored
*/
public static void decodeDenseVector(IndexVersion indexVersion, BytesRef vectorBR, byte[] vector) {
Member Author

Needed a specific method for decoding a dense vector of byte values. This is an adaptation of the existing float[] method (expand the diff upward to see it).
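
A rough sketch of what decoding into a `byte[]` could look like. This is not the method from the PR: it uses a plain array slice (`encoded`, `offset`, `length`) as a stand-in for `BytesRef`, ignores the `IndexVersion` parameter, and assumes the byte elements sit at the start of the slice. All names here are hypothetical, and the real encoding may carry additional data.

```java
// Hypothetical stand-in for the byte-vector decoding path; the slice
// (encoded, offset, length) plays the role of a BytesRef.
public final class ByteVectorDecodeSketch {
    private ByteVectorDecodeSketch() {}

    /** Copies vector.length bytes from the start of the encoded slice into vector. */
    static void decodeByteVector(byte[] encoded, int offset, int length, byte[] vector) {
        if (length < vector.length) {
            throw new IllegalArgumentException(
                "encoded slice holds " + length + " bytes but " + vector.length + " are required"
            );
        }
        // Assumption: one stored byte per dimension, so no byte-order handling is needed.
        System.arraycopy(encoded, offset, vector, 0, vector.length);
    }
}
```

Unlike a float decoder, which has to reassemble four bytes per element, a byte decoder under this assumption reduces to a bounds-checked copy.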

Member Author

A copy of the float vector tests, using the specific byte field


FROM dense_vector
| KEEP id, vector
| KEEP id, float_vector
Member Author

Renamed fields to float_vector and byte_vector to avoid confusion.

// Indexed field types
for (String indexType : DENSE_VECTOR_INDEX_TYPES) {
params.add(new Object[] { indexType, true, false });
for (String indexType : ALL_DENSE_VECTOR_INDEX_TYPES) {
Member Author

Check all applicable index types for both float and byte vectors. Quantized index types only make sense for floats.
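
One way the parameter generation could be laid out, as a sketch only. The class, the record, and the contents of both lists are assumptions made for illustration; the constants in the actual test may be named and populated differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test-parameter builder; names and list contents are illustrative assumptions.
public final class DenseVectorTestParams {
    private DenseVectorTestParams() {}

    // Assumed: index types that do not quantize, usable with byte elements.
    static final List<String> NON_QUANTIZED_INDEX_TYPES = List.of("hnsw", "flat");

    // Assumed: every index type, including quantized ones, usable with float elements.
    static final List<String> ALL_INDEX_TYPES = List.of(
        "hnsw", "flat", "int8_hnsw", "int4_hnsw", "int8_flat", "int4_flat"
    );

    record Param(String elementType, String indexType) {}

    static List<Param> buildParams() {
        List<Param> params = new ArrayList<>();
        // Floats are exercised against every index type.
        for (String indexType : ALL_INDEX_TYPES) {
            params.add(new Param("float", indexType));
        }
        // Bytes skip the quantized types, which only make sense for floats.
        for (String indexType : NON_QUANTIZED_INDEX_TYPES) {
            params.add(new Param("byte", indexType));
        }
        return params;
    }
}
```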

@carlosdelest carlosdelest marked this pull request as ready for review July 24, 2025 19:09
@elasticsearchmachine elasticsearchmachine removed the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 24, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Contributor

@ioanatia ioanatia left a comment

The approach makes sense to me, but there are some failures in KnnFunctionIT that we should look into.

/**
* Byte elements dense vector field type support.
*/
DENSE_VECTOR_FIELD_TYPE_BYTE_ELEMENTS(EsqlCorePlugin.DENSE_VECTOR_FEATURE_FLAG);
Contributor

This makes sense to me for now. When we take these features out of snapshot, maybe we can create a single capability that encompasses all of them, from dense_vector support to the knn function and the brute-force functions, and remove the individual capabilities we have created for each step.

Contributor

@tteofili tteofili left a comment

LGTM, great job Carlos!

@carlosdelest carlosdelest marked this pull request as draft August 6, 2025 15:14
carlosdelest and others added 16 commits August 12, 2025 14:20
…r-support-normalization' into non-issue/esql-dense-vector-support-normalization # Conflicts: #	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java
… non-issue/esql-dense-vector-byte-element-support # Conflicts: #	server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java #	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java #	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/plugin/KnnFunctionIT.java
…r-byte-element-support' into non-issue/esql-dense-vector-byte-element-support # Conflicts: #	x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java
…r-byte-element-support' into non-issue/esql-dense-vector-byte-element-support
@carlosdelest carlosdelest marked this pull request as ready for review August 13, 2025 10:51
@carlosdelest carlosdelest added Team:ES|QL :Search Relevance/ES|QL Search functionality in ES|QL labels Aug 13, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Aug 13, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Contributor

@ChrisHegarty ChrisHegarty left a comment

This seems like a good first step. LGTM

@carlosdelest
Member Author

Merging as CI failures are unrelated

@carlosdelest carlosdelest merged commit 26ffd7f into elastic:main Aug 14, 2025
31 of 34 checks passed
@carlosdelest carlosdelest mentioned this pull request Aug 14, 2025
14 tasks
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Aug 14, 2025
szybia added a commit to szybia/elasticsearch that referenced this pull request Aug 15, 2025
* upstream/main: (278 commits) ESQL - dense vector support cosine normalization (elastic#132721) [ML] Add support for dimensions in google vertex ai request (elastic#132689) ESQL - Add byte element support for dense_vector data type (elastic#131863) ESQL: Fix async operator warnings not always sent when blocking (elastic#132744) Method not needed anymore (elastic#132912) [Test] Excercise shutdown more reliably in snapshot stress IT (elastic#132909) Update Gradle shadow plugin to 9.0.1 (elastic#132637) Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/410_named_queries/named_queries_with_score} elastic#132906 Update docker.elastic.co/wolfi/chainguard-base-fips:latest Docker digest to fa6cb69 (elastic#132735) Remove unnecessary calls to fold() (elastic#131870) Use consistent terminology for transport version resources/references (elastic#132882) Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search.vectors/40_knn_search_cosine/kNN search only regular query} elastic#132890 Finalize release notes for v9.1.2 release (elastic#132745) Finalize release notes for v9.0.5 release (elastic#132718) Move inner records out of TransportVersionUtils (elastic#132872) Add support for Lookup Join on Multiple Fields (elastic#131559) Bootstrap PR-based benchmarks (elastic#132717) Refactor MetadataIndexTemplateService to use template maps instead of project metadata (elastic#132662) [Gradle] Update nebula ospackage plugin to 12.1.0 (elastic#132640) Mute org.elasticsearch.xpack.esql.CsvTests test {csv-spec:ip.CdirMatchEqualsInsOrs} elastic#132860 ...
Labels

:Analytics/ES|QL (AKA ESQL); >non-issue; :Search Relevance/ES|QL (Search functionality in ES|QL); Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)); Team:ES|QL; Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch); v9.2.0

5 participants