Add read-only mmap support (mmap_mode="r") for DictStore, TreeStore, and EmbedStore by Karol-G · Pull Request #585 · Blosc/python-blosc2

Karol-G · 2026-02-19T10:11:20Z

Summary

This PR adds initial read-only memory-mapping support to store containers via mmap_mode="r".

Supported containers:

DictStore (.b2d, .b2z)
TreeStore (via DictStore inheritance)
EmbedStore (.b2e)

Changes

Added keyword-only mmap_mode to DictStore and EmbedStore.
Propagated mmap_mode through read/open paths, including zip-offset opens.
Enabled blosc2.open(..., mmap_mode="r") for store containers through special-store forwarding.
Updated reference docs for DictStore, TreeStore, and EmbedStore with mmap usage and constraints.
Added a release-notes entry for the feature.

Validation Rules

Only None or "r" are allowed.
mmap_mode="r" requires mode="r".
Invalid combinations raise ValueError.
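
The three rules above can be sketched as a small check. This is only an illustration, not the code from this PR: the helper name `validate_mmap_mode` and the error messages are hypothetical, and the actual checks inside DictStore and EmbedStore may be organized differently.

```python
def validate_mmap_mode(mode, mmap_mode=None):
    """Hypothetical helper mirroring the validation rules listed above."""
    # Rule 1: only None or "r" are accepted as mmap_mode.
    if mmap_mode not in (None, "r"):
        raise ValueError(f"mmap_mode must be None or 'r', got {mmap_mode!r}")
    # Rule 2: mmap_mode="r" is only valid together with mode="r".
    if mmap_mode == "r" and mode != "r":
        raise ValueError(f"mmap_mode='r' requires mode='r', got mode={mode!r}")
    return mmap_mode
```

With this sketch, `validate_mmap_mode("r", "r")` passes, while `validate_mmap_mode("w", "r")` and `validate_mmap_mode("r", "r+")` both raise ValueError.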

Tests

Added/updated tests for:

mmap-backed reads in DictStore / TreeStore / EmbedStore
blosc2.open(..., mmap_mode="r") on store containers
validation errors for unsupported modes and invalid mode combinations

Notes

Backward compatible when mmap_mode is not used.
Follow-up PRs can add r+ / c support and optional handle reuse/caching for repeated __getitem__ access.

Enables memory mapping for DictStore and EmbedStore containers to improve read access performance. This enhancement allows opening store container files (b2z, b2d, b2e) in read-only mode using memory mapping, potentially reducing memory usage and improving read speeds. It introduces an optional `mmap_mode` parameter with "r" as the only supported value. It also adds validation to ensure that mmap_mode is either "r" or None, and that "r" is only used when mode is "r".

DictStore/TreeStore/EmbedStore docs updated | note current limits (only "r", requires mode="r") | add release-notes entry

Accept formatter-only tuple-yield rewrite in DictStore.items(); no functional change.

lshaw8317 · 2026-02-19T15:42:17Z

Looks mostly good to me. The only thing extra I would ask for is to add a benchmark to see how mmap has improved read times, but only if you have got time. Thanks for your contribution!

Karol-G · 2026-02-19T16:55:40Z

I will check if I find the time for it tomorrow.

Introduces a benchmark script to compare read performance between regular and memory-mapped read paths for different store containers (EmbedStore, DictStore, TreeStore). This allows for evaluating the impact of mmap on read throughput and latency under various scenarios, including warm and cold cache conditions. The benchmark supports different data layouts (embedded, external, mixed) and generates detailed metrics such as open time, read time, throughput, and speedup ratios.
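
To illustrate the kind of comparison described above, here is a minimal sketch of a regular-versus-mmap read timing loop. It is not the benchmark script from this PR: it uses only the Python standard library (`mmap`) on a raw binary file rather than blosc2 store containers, it measures warm-cache reads only, and all function names and default sizes are assumptions made for illustration.

```python
import mmap
import os
import random
import tempfile
import time


def read_slices_regular(path, slices):
    """Read each (offset, length) slice with seek + read on a regular file handle."""
    out = []
    with open(path, "rb") as f:
        for offset, length in slices:
            f.seek(offset)
            out.append(f.read(length))
    return out


def read_slices_mmap(path, slices):
    """Read the same slices through a read-only memory map."""
    out = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset, length in slices:
                out.append(mm[offset:offset + length])
    return out


def best_time(func, *args, runs=5):
    """Return the fastest wall-clock time (seconds) over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def run_benchmark(size=4 * 1024 * 1024, nslices=1000, slice_len=4096, seed=0):
    """Time random-slice reads on a temporary file via both read paths."""
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "payload.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        slices = [(rng.randrange(0, size - slice_len), slice_len) for _ in range(nslices)]
        # Both paths must return identical bytes before timings mean anything.
        identical = read_slices_regular(path, slices) == read_slices_mmap(path, slices)
        t_regular = best_time(read_slices_regular, path, slices)
        t_mmap = best_time(read_slices_mmap, path, slices)
    # Speedup is defined here as regular time divided by mmap time.
    return {"identical": identical, "regular_s": t_regular,
            "mmap_s": t_mmap, "speedup": t_regular / t_mmap}
```

Calling `run_benchmark()` returns a dict with both timings and their ratio. The numbers depend heavily on the machine, filesystem, and cache state, so this sketch says nothing about the speedups reported for the store containers.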

Karol-G · 2026-02-20T09:42:33Z

I added a dedicated benchmark for mmap read mode and ran it across EmbedStore, DictStore, and TreeStore for all supported storage/layout combinations.

Commands used:

python bench/mmap_store_read.py --scenario warm_full_scan warm_random_slices

sudo "$(python3 -c 'import sys; print(sys.executable)')" \
  bench/mmap_store_read.py \
  --scenario cold_full_scan_drop_caches cold_random_slices_drop_caches \
  --runs 5

Summary of results:

mmap_mode="r" consistently improves read performance for embedded payloads.
- Warm runs: large gains, typically around 2x and up to 3-4x (especially full scans).
- Cold runs: gains remain strong, typically 1.7-1.9x for random slices and 2.5-3.4x for full scans.
For mixed layouts, improvements are moderate but consistent (roughly 1.1-1.3x).
For external layouts, improvements are small (roughly 1.05-1.2x), and one case is near-neutral or a slight regression (TreeStore + b2d + external + cold full scan).
Overall conclusion: mmap read mode provides clear and robust wins for embedded/container-local read paths, with smaller gains for external-node-heavy workloads.

I’m attaching full warm/cold benchmark outputs in text for reproducibility and detailed review.

Results:

cold_bench_results.txt

warm_bench_results.txt

FrancescAlted · 2026-02-20T09:58:48Z

Pretty cool accelerations. Are you using an NFS filesystem for that? What are the specs of your box(es)?

Karol-G · 2026-02-20T10:07:27Z

These results are currently from my workstation only. Our local NFS-backed cluster is down at the moment; I’ll run the same benchmark suite there as soon as it’s back online.

The speedups observed here are encouraging, but this is still a small-scale benchmark, so real-world gains may be smaller depending on workload and environment. I’ll share a more complete update after broader testing.

Karol-G added 3 commits February 19, 2026 10:57

Document initial store-container mmap support

ec5ea9e

DictStore/TreeStore/EmbedStore docs updated | note current limits (only "r", requires mode="r") | add release-notes entry

Apply ruff-format output after store mmap changes

091f340

Accept formatter-only tuple-yield rewrite in DictStore.items(); no functional change.

Karol-G wants to merge 4 commits into Blosc:main from Karol-G:feat/store_container_mmap_support
