feat: support FTS query execution for LSM scanner #5905

Open

touch-of-grey wants to merge 1 commit into lance-format:main from touch-of-grey:LsmFTSQueryPlan

Contributor

touch-of-grey commented Feb 7, 2026

Based on the previous discussion, this PR separates out the FTS query plan, since it requires global BM25 statistics. @jackye1995 please take a look.

This computes global BM25 statistics and then uses the same scorer to rank results across the different inverted indexes, similar to how Lucene does it.
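To make the idea concrete, here is a minimal sketch of merging per-index statistics and scoring with one shared scorer. The names `CorpusStats`, `merge_stats`, and `Bm25Scorer` are hypothetical and are not taken from this PR; the parameters `k1 = 1.2` and `b = 0.75` and the IDF variant with `1 +` inside the logarithm are assumptions, and the formula Lance actually uses may differ.

```rust
use std::collections::HashMap;

/// Corpus statistics for one inverted index (hypothetical shape).
#[derive(Debug, Clone, Default)]
pub struct CorpusStats {
    /// Number of documents covered by the index.
    pub num_docs: u64,
    /// Total number of tokens across those documents.
    pub total_tokens: u64,
    /// Number of documents containing each term.
    pub doc_freq: HashMap<String, u64>,
}

impl CorpusStats {
    /// Fold another index's statistics into this one.
    pub fn merge(&mut self, other: &CorpusStats) {
        self.num_docs += other.num_docs;
        self.total_tokens += other.total_tokens;
        for (term, df) in &other.doc_freq {
            *self.doc_freq.entry(term.clone()).or_insert(0) += *df;
        }
    }

    /// Average document length, or 0.0 for an empty corpus.
    pub fn avg_doc_len(&self) -> f32 {
        if self.num_docs == 0 {
            0.0
        } else {
            self.total_tokens as f32 / self.num_docs as f32
        }
    }
}

/// Sum the statistics of every index into one global view.
pub fn merge_stats(parts: &[CorpusStats]) -> CorpusStats {
    let mut global = CorpusStats::default();
    for part in parts {
        global.merge(part);
    }
    global
}

/// BM25 scorer built from a single set of statistics.
pub struct Bm25Scorer {
    stats: CorpusStats,
    k1: f32, // assumed default
    b: f32,  // assumed default
}

impl Bm25Scorer {
    pub fn new(stats: CorpusStats) -> Self {
        Self { stats, k1: 1.2, b: 0.75 }
    }

    /// Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
    pub fn idf(&self, term: &str) -> f32 {
        let n = self.stats.num_docs as f32;
        let df = self.stats.doc_freq.get(term).copied().unwrap_or(0) as f32;
        (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
    }

    /// Score one term occurrence in a document of length `doc_len`.
    pub fn score(&self, term: &str, term_freq: u32, doc_len: u32) -> f32 {
        let tf = term_freq as f32;
        let avgdl = self.stats.avg_doc_len();
        let norm = if avgdl > 0.0 {
            1.0 - self.b + self.b * doc_len as f32 / avgdl
        } else {
            1.0
        };
        self.idf(term) * tf * (self.k1 + 1.0) / (tf + self.k1 * norm)
    }
}
```

Because every index is scored by a `Bm25Scorer` built from the merged statistics, a document with the same term frequency and length receives the same score regardless of which index it lives in, so results can be merged by score directly.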

github-actions bot added the enhancement label

jackye1995 self-requested a review

February 7, 2026 08:18

Contributor

jackye1995 commented Feb 7, 2026

Thanks, I will take a look tomorrow morning

codecov bot commented Feb 7, 2026

Codecov Report

❌ Patch coverage is 37.00121% with 521 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance/src/dataset/mem_wal/index/fts.rs	0.00%	317 Missing ⚠️
...st/lance/src/dataset/mem_wal/scanner/fts_search.rs	62.67%	98 Missing and 8 partials ⚠️
rust/lance/src/dataset/mem_wal/scanner/builder.rs	11.76%	29 Missing and 1 partial ⚠️
rust/lance-index/src/scalar/inverted/scorer.rs	0.00%	23 Missing ⚠️
rust/lance-index/src/scalar/inverted/index.rs	45.83%	13 Missing ⚠️
rust/lance/src/io/exec/fts.rs	45.83%	11 Missing and 2 partials ⚠️
rust/lance/src/dataset/scanner.rs	0.00%	8 Missing ⚠️
...e/src/dataset/mem_wal/memtable/scanner/exec/fts.rs	25.00%	5 Missing and 1 partial ⚠️
...ce/src/dataset/mem_wal/memtable/scanner/builder.rs	16.66%	5 Missing ⚠️


jackye1995 reviewed


rust/lance-index/src/scalar/inverted/query.rs Outdated

Contributor

jackye1995 Feb 14, 2026

Cross-generation scoring is too specific to LSM; make the comment more generic.

rust/lance-index/src/scalar.rs Outdated

Contributor

jackye1995 Feb 14, 2026

Cross-generation scoring is too specific to LSM; make the comment more generic.

rust/lance/src/dataset/mem_wal/index/fts.rs Outdated

Contributor

jackye1995 Feb 14, 2026

Why do we need a dedicated method for global stats? Can we use the existing mechanism and just allow an optional BM25 override?
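One way to read this suggestion, sketched under assumptions: the search path keeps using the index's own statistics and only swaps them out when the caller supplies an override. `Bm25Stats`, `SearchParams`, `with_bm25_override`, and `effective_stats` are hypothetical names for illustration, not Lance's actual API.

```rust
/// Hypothetical BM25 inputs; not Lance's actual types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bm25Stats {
    pub num_docs: u64,
    pub avg_doc_len: f32,
}

/// Hypothetical search parameters carrying an optional override.
#[derive(Debug, Clone, Default)]
pub struct SearchParams {
    pub bm25_override: Option<Bm25Stats>,
}

impl SearchParams {
    /// Ask the index to score with caller-provided statistics.
    pub fn with_bm25_override(mut self, stats: Bm25Stats) -> Self {
        self.bm25_override = Some(stats);
        self
    }
}

/// Stand-in for an inverted index that knows its own local statistics.
pub struct InvertedIndex {
    local: Bm25Stats,
}

impl InvertedIndex {
    pub fn new(local: Bm25Stats) -> Self {
        Self { local }
    }

    /// Statistics the scorer should use: the override when present,
    /// otherwise the index's own local statistics.
    pub fn effective_stats(&self, params: &SearchParams) -> Bm25Stats {
        params.bm25_override.unwrap_or(self.local)
    }
}
```

With this shape, existing callers that never set an override see no behavior change, and an LSM scanner can pass the same global statistics to every index it searches.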

rust/lance/src/dataset/mem_wal/scanner/fts_search.rs

+                  }
+                  /// Add a bloom filter for staleness detection.
+                  pub fn with_bloom_filter(mut self, generation: u64, bloom_filter: Arc<Sbbf>) -> Self {

Contributor

jackye1995 Feb 14, 2026

I think with the latest design, we can make the bloom filter just another bloom filter index in the flushed memtable. It will have a zone size equal to the row count.

Contributor Author

touch-of-grey Feb 14, 2026

Agreed. This impacts both FTS and vector search. I can raise a separate PR for it later.
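For readers unfamiliar with the staleness check being discussed, the following is a rough sketch of how per-generation bloom filters could flag hits that a newer generation may have overwritten. It assumes staleness means "a newer generation may contain the same primary key", which is an interpretation and is not confirmed by the diff. `TinyBloom` is a toy filter standing in for `Sbbf`, and `StalenessChecker` and `maybe_stale` are hypothetical names; only the `with_bloom_filter` builder name mirrors the excerpt above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy bloom filter used as a stand-in for `Sbbf`.
pub struct TinyBloom {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl TinyBloom {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        Self { bits: vec![false; num_bits.max(1)], num_hashes }
    }

    /// Bit position for `key` under the hash function selected by `seed`.
    fn slot(&self, key: &str, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    pub fn insert(&mut self, key: &str) {
        for seed in 0..self.num_hashes {
            let i = self.slot(key, seed);
            self.bits[i] = true;
        }
    }

    /// False means definitely absent; true means possibly present.
    pub fn might_contain(&self, key: &str) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.slot(key, seed)])
    }
}

/// Holds one filter per generation (hypothetical helper).
#[derive(Default)]
pub struct StalenessChecker {
    filters: Vec<(u64, TinyBloom)>,
}

impl StalenessChecker {
    /// Register the filter built from the keys written in `generation`.
    pub fn with_bloom_filter(mut self, generation: u64, filter: TinyBloom) -> Self {
        self.filters.push((generation, filter));
        self
    }

    /// A hit from `hit_generation` is possibly stale if any strictly newer
    /// generation may contain the same key. False positives are possible,
    /// so a `true` result still needs an exact check or a conservative drop.
    pub fn maybe_stale(&self, key: &str, hit_generation: u64) -> bool {
        self.filters
            .iter()
            .any(|(generation, filter)| *generation > hit_generation && filter.might_contain(key))
    }
}
```

The check never misses a real overwrite, because a bloom filter has no false negatives; the cost is that some fresh rows may be flagged and need a second look.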

rust/lance/src/dataset/mem_wal/scanner/fts_search.rs Outdated

Contributor

jackye1995 Feb 14, 2026

This implementation is missing.

rust/lance/src/dataset/mem_wal/scanner/fts_search.rs Outdated

Contributor

jackye1995 Feb 14, 2026

This should not fall back; the index should always have a tokenizer set.
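A small sketch of what "no fallback" could look like: a missing tokenizer is surfaced as an error instead of being replaced by a default. `TokenizerConfig`, `IndexMetadata`, `FtsError`, and `tokenizer_for` are hypothetical names chosen for illustration; the real types in the PR are not shown in this thread.

```rust
use std::fmt;

/// Hypothetical tokenizer settings stored with an index.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenizerConfig {
    pub base: String,
    pub lowercase: bool,
}

/// Hypothetical error type for the FTS search path.
#[derive(Debug, PartialEq)]
pub enum FtsError {
    MissingTokenizer { index_name: String },
}

impl fmt::Display for FtsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FtsError::MissingTokenizer { index_name } => {
                write!(f, "inverted index '{index_name}' has no tokenizer configured")
            }
        }
    }
}

impl std::error::Error for FtsError {}

/// Hypothetical index metadata.
pub struct IndexMetadata {
    pub name: String,
    pub tokenizer: Option<TokenizerConfig>,
}

/// Return the configured tokenizer, or an error if the invariant is broken.
/// There is deliberately no `unwrap_or_default()` here.
pub fn tokenizer_for(meta: &IndexMetadata) -> Result<&TokenizerConfig, FtsError> {
    meta.tokenizer.as_ref().ok_or_else(|| FtsError::MissingTokenizer {
        index_name: meta.name.clone(),
    })
}
```

Failing loudly keeps query-time tokenization consistent with index-time tokenization; a silent default could tokenize the query differently from the indexed text and return misleading scores.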

rust/lance/src/dataset/mem_wal/scanner/fts_search.rs Outdated

Contributor

jackye1995 Feb 14, 2026

Move the import to the top.

rust/lance/src/dataset/mem_wal/scanner/fts_search.rs Outdated

Contributor

jackye1995 Feb 14, 2026

We should make sure the same session cache is used when opening both the dataset and the flushed memtables.
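As an illustration of the point about sharing the session, the sketch below threads one `Arc` through every open call so that the base dataset and the flushed memtables hit the same cache. `Session`, `OpenedTable`, `open_with_session`, and `open_lsm` here are simplified stand-ins written for this example and do not reflect Lance's real `Session` or builder API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Simplified stand-in for a session holding an index cache.
#[derive(Default)]
pub struct Session {
    index_cache: Mutex<HashMap<String, Arc<Vec<u8>>>>,
}

impl Session {
    /// Return the cached entry for `key`, running `load` only on a miss.
    pub fn get_or_load(&self, key: &str, load: impl FnOnce() -> Vec<u8>) -> Arc<Vec<u8>> {
        let mut cache = self.index_cache.lock().unwrap();
        let entry = cache
            .entry(key.to_string())
            .or_insert_with(|| Arc::new(load()))
            .clone();
        entry
    }
}

/// Stand-in for an opened dataset or flushed memtable.
pub struct OpenedTable {
    pub uri: String,
    pub session: Arc<Session>,
}

/// Open one table, reusing the caller's session instead of creating a new one.
pub fn open_with_session(uri: &str, session: Arc<Session>) -> OpenedTable {
    OpenedTable { uri: uri.to_string(), session }
}

/// Open the base dataset and every flushed memtable with the same session.
pub fn open_lsm(
    base_uri: &str,
    flushed_uris: &[&str],
    session: Arc<Session>,
) -> (OpenedTable, Vec<OpenedTable>) {
    let base = open_with_session(base_uri, session.clone());
    let flushed = flushed_uris
        .iter()
        .map(|uri| open_with_session(uri, session.clone()))
        .collect();
    (base, flushed)
}
```

If each open call created its own session, anything cached while searching one generation would be invisible to the others, and the same index data could be loaded repeatedly.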

rust/lance/src/dataset/mem_wal/scanner/fts_search.rs Outdated

Contributor

jackye1995 Feb 14, 2026

This is actually quite expensive. We should make sure we are not blocked on loading the BM25 stats; we should compute them while forming the plan and executing it.
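The overlap being asked for can be sketched as follows: start the per-index stats loads first, do the planning work while they run, and only wait at the point where the scorer needs the totals. This uses `std::thread` purely to stay dependency-free; the actual code path is presumably async, and `PartStats`, `PendingStats`, `spawn_stats_loads`, `plan_query`, and `plan_then_wait` are hypothetical names.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical per-index statistics needed for BM25.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct PartStats {
    pub num_docs: u64,
    pub total_tokens: u64,
}

/// Stand-in for an expensive read of one index's statistics.
fn load_stats(part: PartStats) -> PartStats {
    part
}

/// Handles to stats loads that are already running in the background.
pub struct PendingStats {
    handles: Vec<JoinHandle<PartStats>>,
}

/// Kick off one background load per index and return immediately.
pub fn spawn_stats_loads(parts: Vec<PartStats>) -> PendingStats {
    let handles = parts
        .into_iter()
        .map(|part| thread::spawn(move || load_stats(part)))
        .collect();
    PendingStats { handles }
}

impl PendingStats {
    /// Block until every load finishes and sum the results.
    pub fn wait(self) -> PartStats {
        self.handles
            .into_iter()
            .map(|handle| handle.join().expect("stats loader panicked"))
            .fold(PartStats::default(), |acc, s| PartStats {
                num_docs: acc.num_docs + s.num_docs,
                total_tokens: acc.total_tokens + s.total_tokens,
            })
    }
}

/// Stand-in for plan construction (here: normalizing the query terms).
pub fn plan_query(terms: &[&str]) -> Vec<String> {
    terms.iter().map(|t| t.to_lowercase()).collect()
}

/// Planning overlaps with the stats loads; the join happens last.
pub fn plan_then_wait(parts: Vec<PartStats>, terms: &[&str]) -> (Vec<String>, PartStats) {
    let pending = spawn_stats_loads(parts); // loads start here
    let plan = plan_query(terms); // runs while loads are in flight
    let stats = pending.wait(); // only now do we block
    (plan, stats)
}
```

The same structure carries over to an async runtime by replacing the spawned threads with spawned tasks and the `join` with an await at the first operator that needs the global statistics.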


feat: support FTS query execution for LSM scanner

097007a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

touch-of-grey force-pushed the LsmFTSQueryPlan branch from 9c97b23 to 097007a Compare

February 15, 2026 07:53
