
b41sh (Member) commented Oct 18, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR improves inverted index support for VARIANT data:

  1. Tantivy Upgrade: Upgraded the underlying Tantivy search engine to its latest version, bringing performance improvements and new features.
  2. Tantivy Search Integration: Replaced the custom search functions with Tantivy's built-in search capabilities. This improves query performance, though it may make index files slightly larger; caching index data locally can offset the cost of reading them.
  3. Advanced VARIANT Field Search: Added support for complex searches on fields nested inside VARIANT values, including AND, OR, IN, and range queries. This enables precise, flexible filtering of semi-structured data.
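The path syntax used in item 3 (for example `data.user.name`) implies that every leaf of a VARIANT value can be addressed by a dotted path. A minimal Python sketch of that flattening follows. It assumes that array elements share their array's path, so each hobby becomes its own term. This is roughly how Tantivy addresses values in JSON fields, but `flatten` is a hypothetical helper, not Databend's indexing code.

```python
import json
from typing import Any, Iterator, Tuple


def flatten(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, scalar) pairs for every leaf of a JSON value.

    Objects extend the path with their keys; array elements reuse the
    path of the array itself (an assumption for this sketch), so each
    element becomes a separate term under the same path.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, path)
    elif isinstance(value, list):
        for item in value:
            yield from flatten(item, prefix)
    else:
        yield prefix, value


doc = json.loads('{"user":{"name":"Alice","age":20,"hobbies":["football","swimming"]}}')
for path, val in flatten(doc, "data"):
    print(path, val)  # e.g. "data.user.name Alice"
```

Because every term carries its full path, a query such as `data.user.age:20` can only match the `age` field. It will not match a `20` that appears somewhere else in the document.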

For example:

    CREATE OR REPLACE TABLE test (
        id INT NULL,
        data VARIANT NULL,
        INVERTED INDEX idx1 (data)
    );

    INSERT INTO test VALUES
        (1, '{"user":{"name":"Alice","age":20,"hobbies":["football","swimming"]}}'),
        (2, '{"user":{"name":"Bob","age":25,"hobbies":["shopping","piano"]}}'),
        (3, '{"user":{"name":"Tom","age":30,"hobbies":["travel","running"]}}');

    SELECT * FROM test WHERE query('data.user.name:Bob AND data.user.age:25');
    ╭───────────────────────────────────────────────────────────────────────────────────╮
    │ id              │ data                                                            │
    │ Nullable(Int32) │ Nullable(Variant)                                               │
    ├─────────────────┼─────────────────────────────────────────────────────────────────┤
    │               2 │ {"user":{"age":25,"hobbies":["shopping","piano"],"name":"Bob"}} │
    ╰───────────────────────────────────────────────────────────────────────────────────╯

    SELECT * FROM test WHERE query('data.user.name:Bob OR data.user.age:30');
    ╭───────────────────────────────────────────────────────────────────────────────────╮
    │ id              │ data                                                            │
    │ Nullable(Int32) │ Nullable(Variant)                                               │
    ├─────────────────┼─────────────────────────────────────────────────────────────────┤
    │               2 │ {"user":{"age":25,"hobbies":["shopping","piano"],"name":"Bob"}} │
    │               3 │ {"user":{"age":30,"hobbies":["travel","running"],"name":"Tom"}} │
    ╰───────────────────────────────────────────────────────────────────────────────────╯

    SELECT * FROM test WHERE query('data.user.hobbies: IN [football shopping]');
    ╭────────────────────────────────────────────────────────────────────────────────────────╮
    │ id              │ data                                                                 │
    │ Nullable(Int32) │ Nullable(Variant)                                                    │
    ├─────────────────┼──────────────────────────────────────────────────────────────────────┤
    │               1 │ {"user":{"age":20,"hobbies":["football","swimming"],"name":"Alice"}} │
    │               2 │ {"user":{"age":25,"hobbies":["shopping","piano"],"name":"Bob"}}      │
    ╰────────────────────────────────────────────────────────────────────────────────────────╯

    SELECT * FROM test WHERE query('data.user.age: [25 TO 30]');
    ╭───────────────────────────────────────────────────────────────────────────────────╮
    │ id              │ data                                                            │
    │ Nullable(Int32) │ Nullable(Variant)                                               │
    ├─────────────────┼─────────────────────────────────────────────────────────────────┤
    │               2 │ {"user":{"age":25,"hobbies":["shopping","piano"],"name":"Bob"}} │
    │               3 │ {"user":{"age":30,"hobbies":["travel","running"],"name":"Tom"}} │
    ╰───────────────────────────────────────────────────────────────────────────────────╯
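The results above can be reproduced with a toy inverted index. The index assumes that each (path, value) leaf of a VARIANT row is one term. AND intersects posting sets, while OR and IN take unions. An inclusive `[lo TO hi]` range unions the posting sets of every numeric term within the bounds. This is a hypothetical in-memory model for intuition only: `leaves`, `term`, `in_set`, and `range_incl` are illustrative names. It ignores tokenization, scoring, and Tantivy's actual on-disk structures.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Set, Tuple

# The three rows inserted in the SQL example.
ROWS = {
    1: '{"user":{"name":"Alice","age":20,"hobbies":["football","swimming"]}}',
    2: '{"user":{"name":"Bob","age":25,"hobbies":["shopping","piano"]}}',
    3: '{"user":{"name":"Tom","age":30,"hobbies":["travel","running"]}}',
}


def leaves(value: Any, prefix: str) -> Iterator[Tuple[str, Any]]:
    # Flatten objects into dotted paths; array elements share the array's path.
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaves(child, f"{prefix}.{key}")
    elif isinstance(value, list):
        for item in value:
            yield from leaves(item, prefix)
    else:
        yield prefix, value


# Inverted index: (path, value) term -> set of row ids containing that term.
index: Dict[Tuple[str, Any], Set[int]] = defaultdict(set)
for row_id, text in ROWS.items():
    for t in leaves(json.loads(text), "data"):
        index[t].add(row_id)


def term(path: str, value: Any) -> Set[int]:
    # Posting set for an exact term; .get avoids inserting empty entries.
    return set(index.get((path, value), set()))


def in_set(path: str, values: Iterable[Any]) -> Set[int]:
    # IN is the union of the posting sets of each listed value.
    return set().union(*(term(path, v) for v in values))


def range_incl(path: str, lo: float, hi: float) -> Set[int]:
    # Inclusive range: union of posting sets for numeric terms in [lo, hi].
    hits: Set[int] = set()
    for (p, v), ids in index.items():
        if p == path and isinstance(v, (int, float)) and lo <= v <= hi:
            hits |= ids
    return hits


print(sorted(term("data.user.name", "Bob") & term("data.user.age", 25)))  # AND
print(sorted(term("data.user.name", "Bob") | term("data.user.age", 30)))  # OR
print(sorted(in_set("data.user.hobbies", ["football", "shopping"])))      # IN
print(sorted(range_incl("data.user.age", 25, 30)))                       # range
```

Running the sketch prints the same row ids as the SQL output: `[2]`, `[2, 3]`, `[1, 2]`, and `[2, 3]`.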

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) Oct 18, 2025
BohuTANG added the ci-cloud label (Build docker image for cloud test) Oct 19, 2025
github-actions (Contributor)

Docker Image for PR

  • tag: pr-18861-c5331ec-1760855997

note: this image tag is only available for internal use.

b41sh force-pushed the feat-inverted-index-json branch from b47670f to 7960533 on October 19, 2025 08:16
b41sh requested review from BohuTANG and sundy-li on October 19, 2025 08:42
b41sh marked this pull request as ready for review on October 19, 2025 08:43
BohuTANG merged commit 4e7358c into databendlabs:main on October 20, 2025
249 of 253 checks passed