test: add nested data query tests by wjones127 · Pull Request #5901 · lance-format/lance

wjones127 · 2026-02-05T23:52:46Z

Summary

Add query tests for nested data types (List<str>, List<int>, Struct, List<Struct<str>>), covering scan, take, and filter with and without indices (LabelList, BTree, Bitmap).

Test data includes null list elements, null lists, null struct fields, and null struct elements in lists. Some of these cases are expected to fail due to known issues (e.g. LabelList dropping rows with null elements, struct-level nulls not preserved on round-trip).

Test plan

cargo test -p lance --test integration_tests --features slow_tests nested

🤖 Generated with Claude Code

github-actions · 2026-02-05T23:53:52Z

Code Review

This PR adds comprehensive test coverage for nested data types (List, List, Struct, List<Struct>) with various index configurations. The tests are well-structured and follow existing patterns in the query test suite.

No P0 or P1 issues identified. The code is well-organized, follows the established testing patterns, and covers important edge cases (nulls at various levels, empty lists, duplicates).

A few minor observations (not blocking):

The tests appropriately exercise null handling at multiple levels (null elements, null lists, null struct fields, null struct elements) which is valuable for catching regressions like fix: handle NULL elements in LABEL_LIST index results and explain_plan #5867
The test data setup in test_query_list_struct is verbose due to the builder API, but this is unavoidable given Arrow's struct builder ergonomics
The comment noting LabelList doesn't support struct elements explains why that test case omits index configurations

The PR achieves its stated goal of adding query tests for nested data types with good coverage of edge cases.

codecov · 2026-02-06T00:33:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Add tests for List<str>, List<int>, Struct, and List<Struct<str>> covering scan, take, and filter (including NOT/OR variants) with and without indices (LabelList, BTree, Bitmap). Data includes null list elements, null lists, null struct fields, and null struct elements in lists to catch regressions like lance-format#5867. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- test_query_list_str: Fails with LabelList index (issue lance-format#5682) - test_query_struct: Fails due to struct-level nulls not preserved (issue lance-format#1120) - test_query_list_struct: Fails due to list-of-struct not properly handled (issue lance-format#838) Added comprehensive tests for empty lists and nulls on both sides of OR expressions. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

LabelList index still has issues with null element handling despite PR lance-format#5867 and PR lance-format#5914. Tests pass without LabelList index. Re-enable when fully fixed. Issue: lance-format#5682 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Both test_query_struct and test_query_list_struct are expected to be fixed in Lance 2.1+. Re-enable tests when minimum version is 2.1 or later. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Added a with_file_version() method to DatasetTestCases to allow running tests with specific Lance file format versions. This enables testing features that are only available in newer file format versions. Also added test_query_struct_v2_1 which tests struct-level null preservation with Lance 2.1 format. This test now passes, confirming issue lance-format#1120 is fixed in Lance 2.1+. - struct-level nulls are now preserved with V2_1 format (issue lance-format#1120) - list-of-struct still has issues even with V2_1 (issue lance-format#838 remains open) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Investigation revealed that test_query_list_struct fails due to a panic in the repdef encoder (lance-encoding/src/repdef.rs:630), NOT due to list-of-struct filtering limitations (issue lance-format#838). Key findings: - Python API: Successfully writes and reads List<Struct> data - Rust test: Panics during encode with ListBuilder + StructBuilder - Issue is specific to how Rust builders construct validity patterns - This is SEPARATE from lance-format#838 (filtering/selection support) Test status: - test_query_struct_v2_1: PASSES (confirms issue lance-format#1120 fixed in V2.1) - test_query_list_struct: FAILS (encoder panic, separate from lance-format#838) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

CRITICAL FINDING: The List<Struct> encoding issue is related to issue lance-format#1120. Root Cause: Struct-level nulls (null struct elements in a list) are NOT preserved on round-trip. This is the same issue as lance-format#1120 (struct-level nulls not preserved) but for list elements. It's SEPARATE from lance-format#838 (filtering/selection support). Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

CRITICAL FINDING: Behavior differs significantly by file format version: 2.0 (DEFAULT): Silently LOSES null struct elements (data corruption) 2.1: PANICS on READ (assertion in decoder) 2.2: PANICS on READ (assertion in decoder) The panic is in the decoder (struct.rs:382), not the encoder. This indicates that 2.1/2.2 changed how null structs are encoded, but the decoder wasn't properly updated to handle the new encoding. This is a REGRESSION - 2.0's silent corruption is "better" than 2.1/2.2's crash, though both are wrong. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Attempted to reproduce the exact panic from the Rust test: thread 'lance-cpu' panicked at rust/lance-encoding/src/repdef.rs:630:9: assertion failed: self.current_len == 0 || self.current_len == validity.len() + self.current_num_specials Created multiple test patterns: - test_repdef_panic.py: Various list/struct combinations - test_repdef_null_first.py: Null struct elements at offset 0 - test_repdef_builder_order.py: Full sequence from Rust test Result: All Python/PyArrow tests PASS - Python creates structures that encode/decode correctly - Issue is specific to Rust ListBuilder + StructBuilder combination - The 'special' value tracking in definition levels must differ Key insight: PyArrow's higher-level API abstracts away the intermediate builder states. Rust builders expose these states differently, creating mismatches in: - current_len tracking - definition level special count - offset/validity patterns The Rust test's incremental builder pattern likely creates a state where current_len != validity.len() + current_num_specials Recommendation: Debug the Rust test with print statements in repdef.rs:628-631 to see: 1. What value current_len has 2. What value validity.len() + current_num_specials produces 3. Which specific append() call triggers the mismatch Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

github-actions bot added the chore label Feb 5, 2026

wjones127 mentioned this pull request Feb 10, 2026

fix: correct OR null semantics for nullable masks #5919

Merged

wjones127 and others added 3 commits February 10, 2026 09:36

wjones127 force-pushed the wjones/nested-query-tests branch from 3225b12 to c6bff84 Compare February 10, 2026 17:43

wjones127 and others added 6 commits February 10, 2026 10:11

test: add version notes for struct-related tests

25ad953

Both test_query_struct and test_query_list_struct are expected to be fixed in Lance 2.1+. Re-enable tests when minimum version is 2.1 or later. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: add nested data query tests#5901

test: add nested data query tests#5901
wjones127 wants to merge 9 commits intolance-format:mainfrom
wjones127:wjones/nested-query-tests

wjones127 commented Feb 5, 2026

Uh oh!

github-actions bot commented Feb 5, 2026

Uh oh!

codecov bot commented Feb 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

wjones127 commented Feb 5, 2026

Summary

Test plan

Uh oh!

github-actions bot commented Feb 5, 2026

Code Review

Uh oh!

codecov bot commented Feb 6, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments