Add read-only mmap support (mmap_mode="r") for DictStore, TreeStore, and EmbedStore #585
Karol-G wants to merge 4 commits into Blosc:main
Enables memory mapping for DictStore and EmbedStore containers to improve read access performance. Store container files (b2z, b2d, b2e) can now be opened read-only through a memory map, potentially reducing memory usage and improving read speeds. It introduces an optional `mmap_mode` parameter with "r" as the only supported value. It also adds validation to ensure that `mmap_mode` is either "r" or None and that it is used only when mode is "r".
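The two validation rules described above can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the code from this PR: `validate_mmap_mode` and `is_valid` are hypothetical names, and the error messages are made up for the example.

```python
def validate_mmap_mode(mode, mmap_mode):
    """Hypothetical helper mirroring the rules described in this PR."""
    # Rule 1: only None or "r" is accepted for mmap_mode.
    if mmap_mode not in (None, "r"):
        raise ValueError(f"mmap_mode must be None or 'r', got {mmap_mode!r}")
    # Rule 2: memory mapping is only allowed for read-only opens.
    if mmap_mode == "r" and mode != "r":
        raise ValueError("mmap_mode='r' requires mode='r'")
    return mmap_mode


def is_valid(mode, mmap_mode):
    """Return True if the combination passes validation, False otherwise."""
    try:
        validate_mmap_mode(mode, mmap_mode)
    except ValueError:
        return False
    return True
```

Checking both rules before any file is opened means that an invalid combination fails early, without leaving a half-opened container behind.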
DictStore/TreeStore/EmbedStore docs updated | note current limits (only "r", requires mode="r") | add release-notes entry
Accept formatter-only tuple-yield rewrite in DictStore.items(); no functional change.
Looks mostly good to me. The only extra thing I would ask for is a benchmark showing how much mmap improves read times, but only if you have the time. Thanks for your contribution!

I will check whether I can find the time for it tomorrow.
Introduces a benchmark script to compare read performance between regular and memory-mapped read paths for different store containers (EmbedStore, DictStore, TreeStore). This allows for evaluating the impact of mmap on read throughput and latency under various scenarios, including warm and cold cache conditions. The benchmark supports different data layouts (embedded, external, mixed) and generates detailed metrics such as open time, read time, throughput, and speedup ratios.
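The kind of comparison such a benchmark makes can be sketched with the standard library alone. The snippet below is not `bench/mmap_store_read.py`; it is a minimal warm-cache illustration on a plain temporary file, and the function names, the 4 MB payload, and the result keys are all assumptions chosen for the example.

```python
import mmap
import os
import tempfile
import time


def time_regular_read(path):
    """Time a plain open + full read of the file."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    return time.perf_counter() - start, len(data)


def time_mmap_read(path):
    """Time a read-only memory map plus a full copy out of the map."""
    start = time.perf_counter()
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        data = mm[:]  # copying forces every page to be touched
    return time.perf_counter() - start, len(data)


def compare(path, runs=5):
    """Return best-of-N timings, throughput in MB/s, and the speedup ratio."""
    regular = max(min(time_regular_read(path)[0] for _ in range(runs)), 1e-9)
    mapped = max(min(time_mmap_read(path)[0] for _ in range(runs)), 1e-9)
    size_mb = os.path.getsize(path) / 1e6
    return {
        "regular_s": regular,
        "mmap_s": mapped,
        "regular_MBps": size_mb / regular,
        "mmap_MBps": size_mb / mapped,
        "speedup": regular / mapped,
    }


# Demo on a throwaway 4 MB file (warm cache only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))
    demo_path = tmp.name
try:
    result = compare(demo_path)
    regular_len = time_regular_read(demo_path)[1]
    mmap_len = time_mmap_read(demo_path)[1]
finally:
    os.remove(demo_path)
```

A sketch like this only measures warm-cache behaviour, since the file has just been written. Cold-cache numbers need the page cache to be dropped between runs, which typically requires elevated privileges.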
I added a dedicated benchmark for mmap read mode and ran it across

Commands used:

    python bench/mmap_store_read.py --scenario warm_full_scan warm_random_slices
    sudo "$(python3 -c 'import sys; print(sys.executable)')" \
      bench/mmap_store_read.py \
      --scenario cold_full_scan_drop_caches cold_random_slices_drop_caches \
      --runs 5

Summary of results:

I’m attaching full warm/cold benchmark outputs in text for reproducibility and detailed review.

Results:
Pretty cool accelerations. Are you using an NFS filesystem for that? What are the specs of your box(es)?

These results are currently from my workstation only. Our local NFS-backed cluster is down at the moment; I’ll run the same benchmark suite there as soon as it’s back online. The speedups observed here are encouraging, but this is still a small-scale benchmark, so real-world gains may be smaller depending on workload and environment. I’ll share a more complete update after broader testing.
Summary
This PR adds initial read-only memory-mapping support to store containers via `mmap_mode="r"`.

Supported containers:
- `DictStore` (`.b2d`, `.b2z`)
- `TreeStore` (via `DictStore` inheritance)
- `EmbedStore` (`.b2e`)

Changes
- `mmap_mode` to `DictStore` and `EmbedStore`.
- `mmap_mode` through read/open paths, including zip-offset opens.
- `blosc2.open(..., mmap_mode="r")` for store containers through special-store forwarding.
- `DictStore`, `TreeStore`, and `EmbedStore` with mmap usage and constraints.

Validation Rules
- `None` or `"r"` are allowed.
- `mmap_mode="r"` requires `mode="r"`.
- `ValueError`.

Tests
Added/updated tests for:
- `DictStore`/`TreeStore`/`EmbedStore`
- `blosc2.open(..., mmap_mode="r")` on store containers

Notes
- `mmap_mode` is not used.
- `r+`/`c` support and optional handle reuse/caching for repeated `__getitem__` access.
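To illustrate what a "zip-offset open" can look like in general, here is a standard-library sketch that memory-maps a zip archive and reads one uncompressed member directly at its byte offset. It is not the implementation in this PR and does not use blosc2; it assumes a zip-style container whose members are stored without compression, and `stored_member_offset`, `read_member_via_mmap`, and the file names are hypothetical.

```python
import mmap
import os
import struct
import tempfile
import zipfile


def stored_member_offset(zip_path, member):
    """Return (data_offset, size) for a ZIP_STORED member of the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError("member must be stored without compression")
    with open(zip_path, "rb") as f:
        f.seek(info.header_offset)
        header = f.read(30)  # fixed-size part of the local file header
    # Bytes 26..29 hold the file-name length and the extra-field length.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    return info.header_offset + 30 + name_len + extra_len, info.file_size


def read_member_via_mmap(zip_path, member):
    """Map the whole archive read-only and copy out just the member's bytes."""
    offset, size = stored_member_offset(zip_path, member)
    with open(zip_path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return bytes(mm[offset:offset + size])


# Demo: build a small archive with one stored member and read it back.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "demo_store.zip")
payload = b"chunk-bytes" * 100
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("node.bin", payload)
member_bytes = read_member_via_mmap(zip_path, "node.bin")
```

Mapping the whole file and slicing at the member offset avoids the alignment constraint on the `offset` argument of `mmap.mmap`, which must be a multiple of the allocation granularity.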