Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds a new geo_diff_10 configuration to the geo_filters crate, providing an intermediate precision option between the existing geo_diff_7 and geo_diff_13 configurations. The configuration uses b=10 with a relative error standard deviation of ~0.04, offering a balanced trade-off between memory usage and accuracy.
Changes:
- Added
GeoDiffConfig10andGeoDiffCount10types with b=10, using u32 bucket type, 896 bytes, and 64 MSB - Added test coverage for the estimation lookup table for the new configuration
- Updated evaluation tooling to include geo_diff_10 in accuracy measurements and plot generation
- Added documentation and generated accuracy plot for the new configuration
Reviewed changes
Copilot reviewed 5 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| crates/geo_filters/src/diff_count/config.rs | Adds GeoDiffConfig10 type definition and test_estimation_lut_10 test |
| crates/geo_filters/src/diff_count.rs | Exports GeoDiffConfig10 and defines GeoDiffCount10 type alias |
| crates/geo_filters/scripts/generate-accuracy-plots | Updates script to include geo_diff_10 in accuracy plot generation |
| crates/geo_filters/evaluation/accuracy.rs | Adds geo_diff_10 to evaluation CLI and simulation parsers |
| crates/geo_filters/evaluation/accuracy.md | Adds reference to geo_diff_10 accuracy plot |
| crates/geo_filters/evaluation/accuracy/hll_8.png | Generated accuracy plot (binary file) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| -o accuracy.csv -n 10000 -m 5000000 geo_diff_{7,10,13} geo_distinct_{7,13} hll_{8,14} "$@" | ||
|
|
||
| evaluation/plot-accuracy.r accuracy.csv | ||
|
|
||
| rm -f "$plots_dir"/* | ||
|
|
||
| idx=0 | ||
| for c in geo_diff_{7,13} geo_distinct_{7,13} hll_{8,14}; do | ||
| for c in geo_diff_{7,10,13} geo_distinct_{7,13} hll_{8,14}; do |
There was a problem hiding this comment.
Do we want to include this configuration the geo distinct plots here too?
There was a problem hiding this comment.
mmm. I don't need those yet...
We could do it in a separate run (it was burning quite a lot of my CPU for quite some time to rerun all these evaluations)
| /// Diff count filter with a relative error standard deviation of ~0.125. | ||
| pub type GeoDiffCount7<'a> = GeoDiffCount<'a, GeoDiffConfig7>; | ||
|
|
||
| /// Diff count filter with a relative error standard deviation of ~0.04. |
There was a problem hiding this comment.
Do you know if copilot read the evals to get the relative error?
There was a problem hiding this comment.
I don't know. But you can see from the graphs that it is slightly above 0.04
This configuration increases the precision compared to geo_diff_7 without consuming as much memory as a geo_diff_13.