- Notifications
You must be signed in to change notification settings - Fork 25.5k
ESQL - Add byte element support for dense_vector data type #131863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL - Add byte element support for dense_vector data type #131863
Conversation
…vector-byte-element-support # Conflicts: # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java
private final FloatVectorValues floatVectorValues; | ||
private final KnnVectorValues.DocIndexIterator iterator; | ||
private final int dimensions; | ||
private abstract static class DenseVectorValuesBlockReader<T extends KnnVectorValues> extends BlockDocValuesReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added abstract classes to deal with common code between float and byte vector reading
* @param vectorBR - dense vector encoded in BytesRef | ||
* @param vector - array of bytes where the decoded vector should be stored | ||
*/ | ||
public static void decodeDenseVector(IndexVersion indexVersion, BytesRef vectorBR, byte[] vector) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed a specific method for decoding dense vector of byte values - this is an adaptation of the existing float[] method (expand up to see it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A copy of the float vector tests, using the specific byte field
| ||
FROM dense_vector | ||
| KEEP id, vector | ||
| KEEP id, float_vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed fields to avoid confusion to float_vector, byte_vector
// Indexed field types | ||
for (String indexType : DENSE_VECTOR_INDEX_TYPES) { | ||
params.add(new Object[] { indexType, true, false }); | ||
for (String indexType : ALL_DENSE_VECTOR_INDEX_TYPES) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check all types for float and bytes. Quantized types only make sense for floats.
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the approach makes sense to me, but there are some failures in KnnFunctionIT that we should look into
/** | ||
* Byte elements dense vector field type support. | ||
*/ | ||
DENSE_VECTOR_FIELD_TYPE_BYTE_ELEMENTS(EsqlCorePlugin.DENSE_VECTOR_FEATURE_FLAG); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this make sense to me for now - when we get to take these features out of snapshot, maybe we can create single capability to encompass all the features, from dense_vector support to the knn function and brute force functions and we can remove the individual ones we have created for each step.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/data/dense_vector.csv Show resolved Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great job Carlos!
…r-support-normalization' into non-issue/esql-dense-vector-support-normalization # Conflicts: # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java
… non-issue/esql-dense-vector-byte-element-support # Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/BlockDocValuesReader.java # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/plugin/KnnFunctionIT.java
…r-byte-element-support' into non-issue/esql-dense-vector-byte-element-support # Conflicts: # x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/DenseVectorFieldTypeIT.java
…r-byte-element-support' into non-issue/esql-dense-vector-byte-element-support
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good first step. LGTM
Merging as CI failures are unrelated |
* upstream/main: (278 commits) ESQL - dense vector support cosine normalization (elastic#132721) [ML] Add support for dimensions in google vertex ai request (elastic#132689) ESQL - Add byte element support for dense_vector data type (elastic#131863) ESQL: Fix async operator warnings not always sent when blocking (elastic#132744) Method not needed anymore (elastic#132912) [Test] Excercise shutdown more reliably in snapshot stress IT (elastic#132909) Update Gradle shadow plugin to 9.0.1 (elastic#132637) Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/410_named_queries/named_queries_with_score} elastic#132906 Update docker.elastic.co/wolfi/chainguard-base-fips:latest Docker digest to fa6cb69 (elastic#132735) Remove unnecessary calls to fold() (elastic#131870) Use consistent terminology for transport version resources/references (elastic#132882) Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search.vectors/40_knn_search_cosine/kNN search only regular query} elastic#132890 Finalize release notes for v9.1.2 release (elastic#132745) Finalize release notes for v9.0.5 release (elastic#132718) Move inner records out of TransportVersionUtils (elastic#132872) Add support for Lookup Join on Multiple Fields (elastic#131559) Bootstrap PR-based benchmarks (elastic#132717) Refactor MetadataIndexTemplateService to use template maps instead of project metadata (elastic#132662) [Gradle] Update nebula ospackage plugin to 12.1.0 (elastic#132640) Mute org.elasticsearch.xpack.esql.CsvTests test {csv-spec:ip.CdirMatchEqualsInsOrs} elastic#132860 ...
Adds code and tests for supporting byte dense_vectors.
Internally, byte vectors get widened to floats. This is inefficient, but avoids having to create specific infrastructure for variable type blocks. This can be improved in the future.