Skip to content

fix: invalidate index fragment bitmaps after data replacement and stale merge#5929

Open
wjones127 wants to merge 4 commits intolance-format:mainfrom
wjones127:fix/index-bitmap-invalidation-5321
Open

fix: invalidate index fragment bitmaps after data replacement and stale merge#5929
wjones127 wants to merge 4 commits intolance-format:mainfrom
wjones127:fix/index-bitmap-invalidation-5321

Conversation

@wjones127
Copy link
Contributor

@wjones127 wjones127 commented Feb 10, 2026

This fixes a bug where queries or merge insert could result in the error this fragment does not exist in the dataset.

This would happen because the index referenced row addresses that were no longer in the dataset.

Here's one way this would happen:

  • Create a BTREE index covering 2 fragments
  • Do a merge insert that rewrites the index column in place. This removes the updated data from the index's fragment bitmap, so data is now hidden.
  • Do compaction. This will move the rows to new fragments.
  • Optimize the BTREE index: this would build the btree index with the old data, not filtering out the invalidated rows. After doing this the invalidated rows, previously masked out, are part of the index. But since compaction happened, those old invalidated rows moved.

This PR makes sure:

  1. When incrementally updating indexes, we only read row ids for data in the fragment bitmap.
  2. Make sure we remove fragments from the index fragment bitmap if we do a DataReplacement on those fields.

This rules are described in the spec at https://lance.org/format/table/index/#loading-an-index

Fixes #5321

…le merge

After a merge_insert that modifies indexed columns (RewriteColumns path),
followed by compaction + optimize_indices, subsequent queries could fail with
"non-existent fragment" errors. Two root causes:

1. optimize_indices merged stale old B-tree data: `merge_indices_with_unindexed_frags`
   used raw `fragment_bitmap` instead of `effective_fragment_bitmap`, carrying
   entries for pruned fragments. When the effective bitmap was empty (fully
   pruned), `index.update()` still called `combine_old_new()` which merged
   stale entries. Now uses effective bitmap and rebuilds from scratch when
   old data is fully stale.

2. DataReplacement had no index invalidation: the DataReplacement handler in
   `build_manifest()` didn't call `prune_updated_fields_from_indices`, so
   replacing an indexed column's data left the index bitmap unchanged.

Fixes lance-format#5321

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the bug Something isn't working label Feb 10, 2026
@github-actions
Copy link
Contributor

Code Review

This PR fixes index fragment bitmap corruption after merge_insert with reordered columns followed by compact_files and optimize_indices. The fix addresses two root causes correctly.

P0: Potential Issue in Scalar Index Rebuild Logic

In rust/lance/src/index/append.rs, lines 140-162, the rebuild-from-scratch path when effective_old_frags.is_empty() passes true for the include_row_id parameter and Some(new_data_stream) for fragments:

super::scalar::build_scalar_index(
    dataset.as_ref(),
    column.name.as_str(),
    &new_uuid.to_string(),
    &params,
    true,  // include_row_id
    None,  // fragments (None means all)
    Some(new_data_stream),  // additional_data_stream
)
.await?

However, the new_data_stream was already prepared from unindexed fragments only (loaded at line 134-142). When rebuilding from scratch with an empty effective bitmap, the intent is to index all current fragments. Passing None for fragments is correct (indexes all), but new_data_stream only contains unindexed fragment data.

Recommendation: Verify that build_scalar_index with fragments=None and additional_data_stream=Some(new_data_stream) produces correct results, or consider passing None for both parameters when rebuilding from scratch.

P1: Replaced Fields Extraction

In rust/lance/src/dataset/transaction.rs, the replaced_fields extraction at lines 2062-2071 only looks at the first data file:

let replaced_fields: Vec<u32> = new_datafiles
    .first()
    .map(|f| {
        f.fields.iter().filter(...).collect()
    })
    .unwrap_or_default();

This is safe given the earlier validation (lines 2037-2058) that ensures all new data files have the same fields. However, adding a comment here would make this dependency explicit.

Tests

The test coverage is thorough:

  • test_merge_insert_with_reordered_columns_and_index directly reproduces the reported issue
  • test_data_replacement_invalidates_index_bitmap validates the DataReplacement fix

Overall the fix logic is sound and addresses the reported issue correctly.

@codecov
Copy link

codecov bot commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 91.13924% with 7 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/index/append.rs 82.14% 0 Missing and 5 partials ⚠️
rust/lance-index/src/scalar/btree.rs 92.59% 0 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

@wjones127 wjones127 added the critical-fix Bugs that cause crashes, security vulnerabilities, or incorrect data. label Feb 10, 2026
@wjones127 wjones127 force-pushed the fix/index-bitmap-invalidation-5321 branch from ef90fe2 to ed6c6b5 Compare February 10, 2026 23:51
@wjones127 wjones127 force-pushed the fix/index-bitmap-invalidation-5321 branch from ed6c6b5 to 6ba3166 Compare February 10, 2026 23:56
When `optimize_indices` calls `index.update()`, BTreeIndex's
`combine_old_new()` reads all stored data including entries for
invalidated fragments, causing "non-existent fragment" errors. The
bitmap pruning from the previous fix only affects the bitmap, not the
stored index data.

Pass the valid fragment bitmap into `ScalarIndex::update()` so
BTreeIndex can filter old data rows whose fragment ID (upper 32 bits
of the row address) is not in the bitmap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wjones127 wjones127 marked this pull request as ready for review February 11, 2026 23:34
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for figuring this out. It makes me wonder if the update path is even worth the complexity it brings at this point. Btree is the only one that really makes use of it.

Comment on lines +892 to +894
///
/// If `valid_old_fragments` is provided, old index data for fragments not in the bitmap
/// will be filtered out during the merge.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this only matters for btree because it is the only index that scans the old portion of the index instead of rescanning the whole column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, bitmap does this too. I need to fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed bitmap. Other indices should be done as a follow up, but I'm not aware of a code path where they cause a bug yet. Many of them like ZoneMap and BloomFilter given inexact results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: I've been trying to slowly switch over to using string constants like BTREE_IDS_COLUMN and avoiding integral indexing just because it seems less prone to error in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try that.

@wjones127
Copy link
Contributor Author

It makes me wonder if the update path is even worth the complexity it brings at this point. Btree is the only one that really makes use of it.

That's a fair point. There's probably not a huge advantage to scanning the index vs scanning the fragments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working critical-fix Bugs that cause crashes, security vulnerabilities, or incorrect data.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Getting non-existent fragment error in merge_insert

2 participants

Comments