feat: Write transaction fragments directly to storage to reduce consumer memory footprint #3783
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #3783      +/-   ##
==========================================
+ Coverage   87.16%   87.22%   +0.06%
==========================================
  Files          25       25
  Lines        2314     2325      +11
  Branches      580      588       +8
==========================================
+ Hits         2017     2028      +11
  Misses        295      295
  Partials        2        2
```
Force-pushed d3dc0f4 to 1c882c2
Force-pushed 1c882c2 to 71b61f5
Force-pushed dba7fc3 to c7d9d17
Force-pushed c7d9d17 to ccaed2d
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ccaed2d3ca
@claude please review this PR.
Every microsecond counts!
robacourt left a comment:
LGTM. Great work! The interaction with the Materializer will need addressing, maybe consumers with materializers should be write_unit=txn
**Claude Code Review**

**Summary**

Seventh review iteration of PR #3783 (2026-02-18). The PR successfully addresses the linked issue #3415 by streaming transaction fragments directly to storage, delivering a dramatic memory improvement (9 GB → 500 MB). The latest commit (751cf24) fixes empty materializer event filtering and ordering assumptions. However, CI test failures remain a merge blocker: 3 MaterializerTest failures suggest issues with event accumulation and notification in fragment mode.

**What's Working Well**

- Memory optimization delivered: the benchmark results demonstrate the PR's core value; memory usage stays below 500 MB during large transaction processing vs. nearly 9 GB on main.
- Solid architecture.
- Comprehensive testing infrastructure.
- Latest fix (751cf24) is correct: the empty event filtering check (`events != %{}`) prevents notifying subscribers when no events remain.

**Issues Found**

**Critical (Must Fix)**

1. CI MaterializerTest failures indicate incomplete fix

Issue: according to the latest Blacksmith CI report (2026-02-18), three MaterializerTest failures persist despite the fix in commit 751cf24.

Root cause analysis: looking at the materializer code:

```elixir
defp maybe_flush_pending_events(state, true) do
  events = cancel_matching_move_events(state.pending_events)

  if events != %{} do
    for pid <- state.subscribers do
      send(pid, {:materializer_changes, state.shape_handle, events})
    end
  end

  %{state | pending_events: %{}}
end
```

The fix correctly prevents sending empty event maps to subscribers, yet the CI failures suggest the accumulation or notification path still misbehaves in fragment mode.

Possible issues:

Action required:
2. Innermost dependency shape handling unresolved (from previous reviews)

Issue: per @robacourt's comment, the current logic enables `:txn_fragment` for innermost shapes that have no dependencies of their own, even when such a shape backs a materializer that outer shapes depend on.

Example: the inner shape of a subquery will be set to `:txn_fragment`.

Evidence this may be causing oracle test failures:

Recommendation:
Code location to fix:

```elixir
write_unit =
  if filled_deps == [] do
    # BUG: This doesn't check if this shape IS a dependency for others
    # and has a materializer that needs full txn visibility
    :txn_fragment
  else
    :txn
  end
```

Suggested fix pattern:

```elixir
write_unit =
  cond do
    # Shapes with dependencies always buffer full txn
    filled_deps != [] -> :txn
    # Shapes that have dependent outer shapes with materializers need full txn
    has_dependent_shapes?(shape) -> :txn
    # All other shapes can stream fragments
    true -> :txn_fragment
  end
```

3. Oracle test baseline still missing (from previous reviews)

Status: the author reported oracle test failures on this branch, but no baseline run on main has been reported. Critical questions remain:

Why this is a blocker: without a baseline we cannot distinguish a pre-existing failure on main from a regression introduced by this PR.

Action required:
**Important (Should Fix)**

4. Incomplete changeset file

Issue: the changeset describes the feature but doesn't mention potential breaking changes or requirements.

Recommendation: expand the changeset to document:

5. Test helper robustness

Issue: the existing helper does not make explicit whether changes are committed.

Recommendation: add helpers that document this explicitly:

```elixir
defp apply_uncommitted_changes(ctx, changes),
  do: Materializer.new_changes(ctx, prep_changes(changes), commit: false)

defp apply_committed_changes(ctx, changes),
  do: Materializer.new_changes(ctx, prep_changes(changes), commit: true)
```

This makes test intent clearer and reduces the risk of missing the `commit:` option.

**Test Coverage**

Excellent additions:

Missing:
**Issue Conformance**

Linked issue #3415: "Avoid holding whole transaction in consumer memory".

✅ Core objective delivered: 9 GB → 500 MB memory improvement.

**Architecture & Code Quality**

Strengths:

Concerns:

**Previous Review Status (Review #6 → Review #7)**

What was addressed since review #6:

What remains from review #6:

**Next Steps Before Merge**

Critical (merge blockers):

Important (pre-merge):

**Overall Assessment**

This PR represents a significant and valuable memory optimization with solid architectural decisions. The ~18x memory reduction (9 GB → 500 MB) is impressive and addresses a real pain point. However, three merge blockers remain.

The code quality is high, the testing infrastructure is excellent, and the approach is sound. Once these three blockers are addressed, this PR will be ready to merge.

Recommendation:

Review iteration: 7 | 2026-02-18
Force-pushed 5f18340 to 4714a7a
Update (Feb 16)

Rebased on current main and pushed new commits addressing review feedback.

Still TODO: address @robacourt's feedback about shapes that are dependencies (innermost shapes with no dependencies themselves but that DO have a materializer); these should probably use `write_unit=txn`. Based on the current implementation, the Materializer sees changes from all txn fragments, but it notifies the consumer on every fragment, so the consumer can receive multiple intermediate notifications within a single transaction.
Latest changes (3 new commits since last push)

Defer Materializer subscriber notifications until the commit fragment. Previously, the Materializer notified subscribers on every `new_changes` call; changes are now accumulated and only flushed to subscribers when the commit fragment arrives.
Regarding an earlier agentic comment about the potential impact of direct txn fragment writing to storage:

TL;DR: fragment streaming is storage-safe. All 7 crash/consistency scenarios are safe, and the three-offset architecture preserves the required invariants.

No code changes are needed; the existing invariants hold correctly with the fragment write path.
LGTM. To test the effect of this on subqueries, you could cherry-pick 0cc3b49 and run the tests with the number of mutations in the test transactions over the max txn size and see if there are any issues:

```sh
ELECTRIC_EXPERIMENTAL_MAX_TXN_SIZE=5 \
MUTATIONS_PER_TXN=15 \
mix test --only oracle test/integration/oracle_property_test.exs
```
@robacourt That's dope! It has found one failure already after running for ~10 sec. I'll look into fixing it and see if it can discover anything else.
…be at most 1 txn at a time
…cing txn boundary independently
…processing txn fragments with commit
To be sure no log items are written to storage, we must verify that neither of the writing-to-storage functions got called.
On CI the function has been seen to return an empty list where a non-empty one was expected. Apparently, there may be a delay between the calls themselves and the moment they end up in the test process' mailbox.
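One general way to make such assertions deterministic is to block until the runtime confirms that all pending trace messages have been delivered before inspecting the mailbox. Below is a sketch using the standard `:erlang.trace_delivered/1` BIF; it is not necessarily how the test helper in this PR handles the race.

```elixir
defmodule TraceFlushSketch do
  @moduledoc "Illustrative only: flush in-flight trace messages before asserting on them."

  @doc """
  Blocks until every trace message that `traced_pid` generated before this call
  has been delivered. The `{:trace_delivered, pid, ref}` message is sent to the
  caller of `:erlang.trace_delivered/1`, so call this from the tracing test process.
  """
  def sync(traced_pid) do
    ref = :erlang.trace_delivered(traced_pid)

    receive do
      {:trace_delivered, ^traced_pid, ^ref} -> :ok
    after
      5_000 -> {:error, :timeout}
    end
  end
end
```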
…rate functions. This is needed to be able to selectively notify the appropriate subscriber when processing individual txn fragments.
…tions. PendingTxn no longer needs to track last_log_offset since the commit fragment's offset (txn_fragment.last_log_offset) is now used directly in maybe_complete_pending_txn. This fixes flush alignment lagging behind when the final commit fragment is filtered out. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
InMemoryStorage and TestStorage return false, PureFileStorage returns true. This allows runtime capability checking before enabling fragment streaming mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pport. When initializing a consumer, validate that the storage backend supports txn fragment streaming before using :txn_fragment mode. If unsupported (e.g. InMemoryStorage), fall back to full-transaction buffering with a warning log. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
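As a rough illustration of the fallback this commit describes, the mode selection could look like the sketch below. The module, the `supports_txn_fragments?/0` callback name, and the wording of the log message are assumptions for illustration, not the PR's actual API.

```elixir
defmodule WriteUnitSketch do
  @moduledoc "Hypothetical sketch of the capability check and fallback described above."

  require Logger

  @spec resolve(:txn | :txn_fragment, module()) :: :txn | :txn_fragment
  def resolve(:txn, _storage_mod), do: :txn

  def resolve(:txn_fragment, storage_mod) do
    # Assumed capability callback: true for PureFileStorage, false for
    # InMemoryStorage/TestStorage per the commit message above.
    if storage_mod.supports_txn_fragments?() do
      :txn_fragment
    else
      Logger.warning(
        "#{inspect(storage_mod)} does not support txn fragment streaming; " <>
          "falling back to full-transaction buffering (write_unit=txn)"
      )

      :txn
    end
  end
end
```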
Force-pushed 412234d to 60f11a8
Previously, Materializer sent :materializer_changes to subscribers on every new_changes call. When write_unit=:txn_fragment, this meant the outer Consumer could receive multiple intermediate move-in/move-out notifications within a single transaction, causing redundant work. Add a commit: option to Materializer.new_changes/3 (default: true). Changes are accumulated in pending_events and only flushed to subscribers when commit: true. The fragment-streaming path passes commit: not is_nil(fragment.commit) to defer until the final fragment. maybe_complete_pending_txn sends an empty new_changes with commit: true to handle the edge case where the commit fragment has no matching changes but earlier fragments accumulated events. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
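The accumulate-then-flush behaviour this commit describes can be sketched in a few lines. The state shape and helper names below are assumptions for illustration (the real code also cancels matching move-in/move-out events before flushing); only the `commit:` semantics mirror the commit message.

```elixir
defmodule MaterializerSketch do
  defstruct shape_handle: nil, subscribers: [], pending_events: %{}

  # Accumulate this batch's events; only notify subscribers once a batch
  # arrives with `commit: true` (i.e. the final fragment of the transaction).
  def new_changes(%__MODULE__{} = state, events, opts \\ []) when is_map(events) do
    commit? = Keyword.get(opts, :commit, true)
    state = %{state | pending_events: Map.merge(state.pending_events, events)}
    maybe_flush_pending_events(state, commit?)
  end

  defp maybe_flush_pending_events(state, false), do: state

  defp maybe_flush_pending_events(state, true) do
    if state.pending_events != %{} do
      for pid <- state.subscribers do
        send(pid, {:materializer_changes, state.shape_handle, state.pending_events})
      end
    end

    %{state | pending_events: %{}}
  end
end
```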
…ding writes for this txn
Verifies that splitting inserts across two new_changes calls (one uncommitted, one committed) produces the same event order as sending them in a single call. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed 60f11a8 to 634162c
Force-pushed 674ad7e to 751cf24
@robacourt Pushed 751cf24 to correctly incorporate your recent addition of cancelling out move-ins/move-outs.
I've just run oracle tests with max_batch_size=5 and mutations_per_txn=15 and can reliably reproduce this issue. So apparently Electric returns an incorrect offset that's not at a transaction boundary. I may be able to fix the handling of multiple txn fragments for shapes without subqueries first and we ship those. Then deal with subqueries.
:( if it's easy to reproduce, we should be able to figure out the root cause. There is no rush to ship this PR, so let's do it properly when we know the fix.
Closes #3415
Summary
For shapes without subquery dependencies (the vast majority), the consumer now streams each transaction fragment directly to storage as it arrives, instead of buffering the entire transaction in memory. This significantly reduces per-shape-consumer memory usage for large transactions.
Shapes with subquery dependencies continue to buffer the full transaction (`write_unit=txn`) because they need the complete change set for move-in/move-out reasoning.

How it works
The consumer operates in one of two modes, determined at shape creation time:
- `write_unit=txn_fragment` (default): each fragment is filtered, converted to log items, and written to storage immediately (see the sketch after this list). A lightweight `PendingTxn` struct tracks only metadata (xid, offset, byte count, change count) across fragments. When the final fragment with a commit arrives, storage is notified to advance its transaction boundary.
- `write_unit=txn` (shapes with subquery dependencies): fragments are accumulated into a complete transaction before writing, as before.
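The fragment path can be pictured with a small, self-contained sketch. The struct fields, the fragment shape, and the injected `write_fun`/`commit_fun` stand-ins for the real storage calls are all illustrative assumptions, not the PR's actual code.

```elixir
defmodule FragmentPathSketch do
  defmodule PendingTxn do
    # Only metadata survives across fragments; the change data itself does not.
    defstruct xid: nil, num_changes: 0, byte_size: 0
  end

  @doc """
  Handle one fragment: write its log items (binaries here) straight to storage,
  keep only counters in memory, and signal the txn boundary on the commit fragment.
  """
  def handle_fragment(pending, %{xid: xid, log_items: items, commit?: commit?}, write_fun, commit_fun) do
    write_fun.(items)

    pending = pending || %PendingTxn{xid: xid}

    pending = %{
      pending
      | num_changes: pending.num_changes + length(items),
        byte_size: pending.byte_size + (items |> Enum.map(&byte_size/1) |> Enum.sum())
    }

    if commit? do
      # Final fragment: storage can now advance its transaction boundary.
      commit_fun.(pending)
      nil
    else
      pending
    end
  end
end
```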
Key additions

- `Consumer.PendingTxn` — tracks in-progress transaction metadata during fragment streaming. It is the lightweight counterpart to `TransactionBuilder`: where `TransactionBuilder` accumulates all changes in memory, `PendingTxn` only tracks offsets and counters while the actual data is written to storage and discarded immediately.
- New storage callbacks — `append_fragment_to_log!/2` writes log items without advancing the persisted transaction offset (can be called multiple times per transaction), and `signal_txn_commit!/2` marks the transaction as complete so crash-recovery sees the correct committed boundary. The existing full-transaction write path is refactored to share the same underlying logic. A behaviour-style sketch of these callbacks follows this list.
- `Support.StorageTracer` — test helper using Erlang trace sessions to observe which storage functions are called from a process, enabling non-intrusive assertions on the write path taken without wrapper modules or function patching.
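Expressed as an Elixir behaviour, the two new callbacks named above would look roughly like this. The argument order, typespecs, and return values are assumptions; only the callback names and their described roles come from the PR.

```elixir
defmodule FragmentStorageSketch do
  @moduledoc "Behaviour-style sketch of the new storage callbacks described above."

  @type storage :: term()
  @type log_items :: [term()]
  @type pending_txn :: term()

  @doc """
  Append log items for an in-progress transaction without advancing the
  persisted transaction offset; may be called multiple times per transaction.
  """
  @callback append_fragment_to_log!(log_items(), storage()) :: :ok

  @doc """
  Mark the in-progress transaction as committed so that crash recovery sees
  the correct committed boundary.
  """
  @callback signal_txn_commit!(pending_txn(), storage()) :: :ok
end
```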
Other notable changes

- `FlushTracker` in `ShapeLogCollector` is now only invoked for fragments that include a commit, since mid-transaction fragments don't represent a flush boundary.
- `CrashingFileStorage` (dead code) removed.

Expected memory footprint reduction
Previously, the consumer held an entire transaction's worth of change structs in memory until commit. Now, each fragment's changes are written to storage and released immediately. The consumer only retains the `PendingTxn` metadata struct (~100 bytes) per in-progress transaction, reducing peak memory by roughly the size of the buffered change data. The improvement scales linearly with transaction size.
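For a back-of-envelope sense of the scaling, assume (purely for illustration; this figure is not measured from Electric) an average in-memory size of 1 KB per change:

```elixir
# Illustrative arithmetic only; the 1 KB average change size is an assumption.
changes_in_txn = 1_000_000
avg_change_bytes = 1_024

buffered_peak_mb = div(changes_in_txn * avg_change_bytes, 1_000_000)
# ~1048 MB held in the consumer until commit on the old path

streamed_peak_bytes = 100
# ~100 bytes of PendingTxn metadata on the new path, independent of txn size
```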
Testing

Consumer and storage tests have been expanded to cover both `write_unit` modes, including tracing-based assertions on which storage functions are called.
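For a flavour of what a trace-based assertion can look like, here is a sketch built on the plain `:erlang.trace/3` and `:erlang.trace_pattern/3` BIFs. The PR's Support.StorageTracer uses Erlang trace sessions, so its actual API will differ; the storage module you would pass in is also not shown here.

```elixir
defmodule StorageTraceSketch do
  @moduledoc "Illustrative trace-based observation of storage calls; not the PR's helper."

  @doc "Call-trace every function of `mod` invoked by `pid`; trace messages go to the caller."
  def trace_calls(pid, mod) do
    :erlang.trace(pid, true, [:call])
    :erlang.trace_pattern({mod, :_, :_}, true, [:local])
    :ok
  end

  @doc "Return the `{module, function, args}` calls observed so far."
  def calls_seen do
    # See the :erlang.trace_delivered/1 sketch further up for making this race-free.
    collect([])
  end

  defp collect(acc) do
    receive do
      {:trace, _pid, :call, mfa} -> collect([mfa | acc])
    after
      100 -> Enum.reverse(acc)
    end
  end
end
```

A `write_unit=txn_fragment` test could then assert that `append_fragment_to_log!` appears in the observed calls while the full-transaction write function does not.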