@cnaples79

Summary

Implements an optimized contains_prefix method to efficiently check if any key with a given prefix exists in the tree.

Motivation

Currently, users check for prefix existence using .prefix().next()?.is_some(), which has significant overhead:

  • Sets up a full merge iterator across all levels
  • Creates an MVCC stream wrapper
  • Extracts complete key-value pairs

This PR provides a more efficient alternative that returns as soon as any matching key is found.

Implementation

The new contains_prefix method:

  1. Memtable check: Uses skip list range iteration to find keys in the prefix range
  2. Sealed memtables check: Iterates through sealed memtables in reverse order (newest first)
  3. SST table check: Uses table range iterators with key range overlap checks
  4. Early exit: Returns true immediately upon finding the first matching key
  5. MVCC semantics: Respects sequence number filtering throughout
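The steps above can be sketched with a toy model. This is not the PR's code: `Layer` is a hypothetical stand-in that uses a `BTreeMap` in place of the skip list and table iterators, layers are assumed to be ordered newest first, and an entry is assumed to be visible when its seqno is strictly below the read seqno (consistent with the usage example, where seqno 3 sees writes 0 to 2). Tombstones are not modelled here.

```rust
use std::collections::BTreeMap;

/// Stand-in for the crate's sequence number type (assumption).
type SeqNo = u64;

/// Hypothetical layer (active memtable, sealed memtable, or table):
/// entries are ordered by (user key, seqno).
type Layer = BTreeMap<(Vec<u8>, SeqNo), Vec<u8>>;

/// Returns true if `layer` holds at least one entry whose key starts with
/// `prefix` and whose seqno is visible to a read at `seqno`.
fn layer_contains_prefix(layer: &Layer, prefix: &[u8], seqno: SeqNo) -> bool {
    // All keys sharing a prefix are contiguous and start at the prefix itself.
    let start = (prefix.to_vec(), 0);
    layer
        .range(start..)
        // Stop scanning as soon as keys no longer share the prefix.
        .take_while(|((key, _), _)| key.starts_with(prefix))
        // Assumed MVCC rule: visible if written before the read seqno.
        .any(|((_, entry_seqno), _)| *entry_seqno < seqno)
}

/// Checks layers in the given order (assumed newest first) and returns
/// on the first layer that contains a visible match.
fn contains_prefix(layers: &[Layer], prefix: &[u8], seqno: SeqNo) -> bool {
    layers
        .iter()
        .any(|layer| layer_contains_prefix(layer, prefix, seqno))
}
```

Both `any` calls short-circuit, so the scan ends at the first visible match without visiting the remaining layers.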

API

fn contains_prefix<K: AsRef<[u8]>>(&self, prefix: K, seqno: SeqNo) -> Result<bool>

Example usage:

let tree = Config::new(folder).open()?;

tree.insert("prefix:key1", "value1", 0);
tree.insert("prefix:key2", "value2", 1);
tree.insert("other", "value3", 2);

assert!(tree.contains_prefix("prefix", 3)?); // true
assert!(!tree.contains_prefix("nonexistent", 3)?); // false

Performance Benefits

  • No merge iterator overhead: Avoids setting up iterators across all levels
  • Early termination: Stops searching at first match instead of building full iterator
  • Bloom filter utilization: SST tables still use bloom filters for quick elimination
  • Range optimization: Leverages existing optimized range query paths
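For context on reusing range query paths: a prefix lookup is commonly expressed as a byte range from the prefix up to, but excluding, the next larger prefix. The helper below is a minimal sketch of that conversion; `prefix_to_range` is a hypothetical name and is not taken from the PR or from the crate's API.

```rust
use std::ops::Bound;

/// Converts a prefix into byte-range bounds that cover exactly the keys
/// starting with that prefix (hypothetical helper for illustration).
fn prefix_to_range(prefix: &[u8]) -> (Bound<Vec<u8>>, Bound<Vec<u8>>) {
    let start = Bound::Included(prefix.to_vec());

    // Upper bound: drop trailing 0xFF bytes, then increment the last byte.
    let mut end = prefix.to_vec();
    while let Some(last) = end.pop() {
        if last < u8::MAX {
            end.push(last + 1);
            return (start, Bound::Excluded(end));
        }
    }

    // Empty prefix or all bytes 0xFF: there is no finite upper bound.
    (start, Bound::Unbounded)
}
```

With these bounds, a key range overlap check against a table only needs to compare the table's smallest and largest key with the two ends of the range.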

Testing

  • Includes doc tests demonstrating usage
  • Follows existing patterns for point reads (similar to get)
  • Maintains MVCC isolation semantics

Closes #138

Implements a more efficient way to check if any key with a given prefix exists, avoiding the overhead of setting up a full merge iterator.

The implementation:
- Checks memtables using range iteration on the skip list
- Checks SST tables using existing range iterators
- Returns early as soon as a matching key is found
- Respects MVCC semantics with sequence number filtering

This is significantly more efficient than the current pattern of using `tree.prefix().next()?.is_some()` because it avoids initializing merge iterators across all levels and stops at the first match.

Closes fjall-rs#138
@marvin-j97
Contributor

There is an edge case where a prefix is fully tombstoned: in that case, the function could return a false positive (should be easy to test).
That's probably acceptable (because handling tombstones just ends up doing a proper merge again), but then it should be called maybe_contains_prefix.

And actually looking at it, @zaidoon1 already implemented this in https://github.com/fjall-rs/lsm-tree/pull/186/files#diff-ee87832434edd670d369ff1e6d5c0b08c959ef0cd85e865a7b726451f623fbf2R329-R361, but not for AbstractTree.
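The fully tombstoned edge case can be reproduced in a toy model. The sketch below is an illustration under stated assumptions, not code from either PR: `Store`, `ValueType`, and `contains_prefix_exact` are hypothetical names, a single `BTreeMap` stands in for all levels, and an entry is assumed visible when its seqno is strictly below the read seqno.

```rust
use std::collections::BTreeMap;

/// Stand-in for the crate's sequence number type (assumption).
type SeqNo = u64;

/// Hypothetical entry type: a live value or a deletion marker.
#[derive(Clone, Copy, PartialEq)]
enum ValueType {
    Value,
    Tombstone,
}

/// Toy store: every version of every key, ordered by (user key, seqno).
type Store = BTreeMap<(Vec<u8>, SeqNo), ValueType>;

/// Early-exit check: true if any visible entry, of any type, has the prefix.
/// It can report a prefix whose keys have all been deleted.
fn maybe_contains_prefix(store: &Store, prefix: &[u8], seqno: SeqNo) -> bool {
    store
        .range((prefix.to_vec(), 0)..)
        .take_while(|((key, _), _)| key.starts_with(prefix))
        .any(|((_, entry_seqno), _)| *entry_seqno < seqno)
}

/// Exact check: resolves the newest visible version of each key first,
/// which is the merge-like work the early-exit check avoids.
fn contains_prefix_exact(store: &Store, prefix: &[u8], seqno: SeqNo) -> bool {
    let mut latest: BTreeMap<&[u8], ValueType> = BTreeMap::new();
    let entries = store
        .range((prefix.to_vec(), 0)..)
        .take_while(|((key, _), _)| key.starts_with(prefix));
    for ((key, entry_seqno), value_type) in entries {
        if *entry_seqno < seqno {
            // Versions of one key arrive in ascending seqno order,
            // so the last visible insert is the newest version.
            latest.insert(key.as_slice(), *value_type);
        }
    }
    latest.values().any(|t| *t == ValueType::Value)
}
```

If `prefix:key1` is written at seqno 0 and deleted at seqno 1, a read at seqno 2 gets `true` from `maybe_contains_prefix` but `false` from `contains_prefix_exact`, which is the false positive described above.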

@marvin-j97 added the enhancement, api, and type:table labels on Nov 29, 2025
@zaidoon1
Contributor

, but not for AbstractTree.

I can add support for that, although since the prefix filter PR is not going to be merged soon, I'll hold off until you are ready to merge it before I fix merge conflicts and address this feedback.
