refactor: use dict entries and encoded size instead of cardinality for dict decision #5891
Code Review
Summary
This PR changes the dictionary encoding decision logic from using pre-computed cardinality statistics to using a budget-based approach with
P0 Issues
P1 Issues
Positive Notes
Testing Suggestions
Consider adding a test that verifies dictionary encoding still works correctly when the sample suggests near-uniqueness but the actual data has lower cardinality (an edge case where the sampling step misses repeated patterns).
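As a rough illustration of the edge case described above, the sketch below builds a column whose values repeat with a fixed period and samples it with a stride that is one larger than that period. The helper name `sampled_unique_ratio`, the stride of 101, and the period of 100 are all assumptions chosen for the example; they are not taken from this PR's code.

```python
def sampled_unique_ratio(values, step):
    """Fraction of distinct values among every `step`-th element (hypothetical helper)."""
    sample = values[::step]
    return len(set(sample)) / len(sample)


# 10,000 rows that cycle through only 100 distinct values.
values = [i % 100 for i in range(10_000)]

# A stride of 101 lands on a different value each time, so the sample
# looks fully unique even though the column is highly repetitive.
sample_ratio = sampled_unique_ratio(values, step=101)

# Ratio computed over the full column, for comparison.
full_ratio = len(set(values)) / len(values)

print(sample_ratio, full_ratio)
```

A test along these lines would feed such a column to the encoder and check that the dictionary decision and the round-tripped data are still correct.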
This PR changes how we decide whether to use dictionary encoding. Instead of cardinality, the decision now uses the number of dict entries and the encoded size.
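One way to picture a decision based on dict entries and encoded size is the sketch below. It is not the implementation from this PR: the function name `should_use_dict`, the parameters `max_dict_entries`, `max_size_ratio`, and `index_width`, and their default values are all hypothetical, chosen only to show the general shape of a budget check that stops early once a limit is exceeded.

```python
def should_use_dict(values, max_dict_entries, max_size_ratio=0.8, index_width=4):
    """Decide on dictionary encoding for a list of byte strings (illustrative only).

    Returns False as soon as either budget is exceeded:
      * the number of distinct entries goes above `max_dict_entries`, or
      * dictionary bytes plus index bytes go above `max_size_ratio` of the raw size.
    """
    raw_size = sum(len(v) for v in values)
    size_budget = raw_size * max_size_ratio

    # Every row needs one fixed-width index regardless of the dictionary size.
    indices_bytes = len(values) * index_width
    if indices_bytes > size_budget:
        return False

    seen = set()
    dict_bytes = 0
    for v in values:
        if v in seen:
            continue
        seen.add(v)
        dict_bytes += len(v)
        if len(seen) > max_dict_entries:
            return False  # entry budget exhausted
        if dict_bytes + indices_bytes > size_budget:
            return False  # encoded size budget exhausted
    return True


# Ten distinct values repeated many times: well within both budgets.
repetitive = [f"category-{i % 10}".encode() for i in range(1000)]
# One thousand distinct values: the entry budget is exhausted partway through.
unique = [f"value-{i:06d}".encode() for i in range(1000)]

print(should_use_dict(repetitive, max_dict_entries=256))
print(should_use_dict(unique, max_dict_entries=256))
```

Unlike a cardinality statistic computed ahead of time, a check of this kind measures the values it actually scans and can stop as soon as a budget is exceeded.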
Parts of this PR were drafted with assistance from Codex (with gpt-5.2) and fully reviewed and edited by me. I take full responsibility for all changes.