
feat: Write transaction fragments directly to storage to reduce consumer memory footprint#3783

Draft
alco wants to merge 25 commits into main from alco/write-txn-fragments-to-storage

Conversation


@alco alco commented Jan 27, 2026

Closes #3415

Summary

For shapes without subquery dependencies (the vast majority), the consumer now streams each transaction fragment directly to storage as it arrives, instead of buffering the entire transaction in memory. This significantly reduces per-shape-consumer memory usage for large transactions.

Shapes with subquery dependencies continue to buffer the full transaction (write_unit=txn) because they need the complete change set for move-in/move-out reasoning.

How it works

The consumer operates in one of two modes, determined at shape creation time:

  • write_unit=txn_fragment (default): Each fragment is filtered, converted to log items, and written to storage immediately. A lightweight PendingTxn struct tracks only metadata (xid, offset, byte count, change count) across fragments. When the final fragment with a commit arrives, storage is notified to advance its transaction boundary.

  • write_unit=txn (shapes with subquery dependencies): Fragments are accumulated into a complete transaction before writing, as before.
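For illustration, the dispatch between the two modes might be sketched as follows (function and field names here are assumptions, not the actual consumer code):

```elixir
# Hypothetical sketch of the per-fragment dispatch described above.
defp handle_fragment(%{write_unit: :txn_fragment} = state, fragment) do
  # Stream path: filter, convert and write this fragment immediately;
  # only PendingTxn metadata is retained across fragments.
  state
  |> write_fragment_to_storage(fragment)
  |> maybe_complete_pending_txn(fragment)
end

defp handle_fragment(%{write_unit: :txn} = state, fragment) do
  # Buffer path: accumulate fragments into a complete transaction,
  # then write it all at once when the commit fragment arrives.
  state = accumulate_fragment(state, fragment)
  if fragment.commit, do: write_complete_txn(state), else: state
end
```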

Key additions

Consumer.PendingTxn — tracks in-progress transaction metadata during fragment streaming. It is the lightweight counterpart to TransactionBuilder: where TransactionBuilder accumulates all changes in memory, PendingTxn only tracks offsets and counters while actual data is written to storage and discarded immediately.
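A hypothetical shape for the struct, based on the metadata listed in the summary (field names are assumptions):

```elixir
# Constant-size metadata per in-progress transaction; no change data
# is retained, which is what keeps the memory footprint flat.
defmodule Consumer.PendingTxn do
  defstruct xid: nil,        # transaction id, used for xid-based skip checks
            offset: nil,     # log offset tracking
            num_changes: 0,  # count of changes written so far
            byte_count: 0    # bytes written so far
end
```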

New storage callbacks: append_fragment_to_log!/2 writes log items without advancing the persisted transaction offset (it can be called multiple times per transaction), and signal_txn_commit!/2 marks the transaction as complete so crash recovery sees the correct committed boundary. The existing full-transaction write path is refactored to share the same underlying logic.
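The intended call pattern for these callbacks might look like this (a sketch; argument shapes are assumptions based on the description above):

```elixir
# Per-fragment: append log items without advancing the persisted
# transaction offset. Safe to call multiple times per transaction.
Storage.append_fragment_to_log!(log_items, storage)

# On the final fragment carrying the commit: mark the transaction
# complete so crash recovery sees the correct committed boundary.
Storage.signal_txn_commit!(commit_offset, storage)
```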

Support.StorageTracer — test helper using Erlang trace sessions to observe which storage functions are called from a process, enabling non-intrusive assertions on the write path taken without wrapper modules or function patching.
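As a rough sketch of the technique (using the OTP 27+ :trace session API; the exact trace flags and match patterns used by StorageTracer are assumptions):

```elixir
# Create an isolated trace session that delivers trace messages to
# the test process, then trace calls made by the consumer process.
session = :trace.session_create(:storage_tracer, self(), [])
:trace.process(session, consumer_pid, true, [:call])
:trace.function(session, {Storage, :append_fragment_to_log!, 2}, true, [:global])

# The test can now assert on which storage functions were called:
assert_receive {:trace, ^consumer_pid, :call,
                {Storage, :append_fragment_to_log!, _args}}
```

Because the session is isolated, this does not interfere with any other tracing and needs no changes to production code.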

Other notable changes

  • Initial snapshot filtering (xid-based skip) is now applied at the first fragment, avoiding accumulation of fragments for transactions that will be discarded.
  • The FlushTracker in ShapeLogCollector is now only invoked for fragments that include a commit, since mid-transaction fragments don't represent a flush boundary.
  • Removed CrashingFileStorage (dead code).
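The FlushTracker change in particular reduces to a simple guard; a sketch (function names here are assumed):

```elixir
# Only fragments that carry a commit mark a flush boundary;
# mid-transaction fragments are ignored by the tracker.
defp maybe_track_flush(state, fragment) do
  if fragment.commit do
    update_in(state.flush_tracker, &FlushTracker.record_flush(&1, fragment))
  else
    state
  end
end
```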

Expected memory footprint reduction

Previously, the consumer held an entire transaction's worth of change structs in memory until commit. Now, each fragment's changes are written to storage and released immediately. The consumer only retains the PendingTxn metadata struct (~100 bytes) per in-progress transaction, reducing peak memory by roughly the size of the buffered change data. The improvement scales linearly with transaction size.

Testing

Consumer and storage tests have been expanded to cover both write_unit modes, including tracing-based assertions on which storage functions are called.


coderabbitai bot commented Jan 27, 2026

Important

Review skipped

Auto reviews are disabled on this repository.



netlify bot commented Jan 27, 2026

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 634162c
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/6995a5643350ab00082eacdf
😎 Deploy Preview https://deploy-preview-3783--electric-next.netlify.app


codecov bot commented Jan 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.22%. Comparing base (03943ad) to head (751cf24).
⚠️ Report is 4 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3783      +/-   ##
==========================================
+ Coverage   87.16%   87.22%   +0.06%     
==========================================
  Files          25       25              
  Lines        2314     2325      +11     
  Branches      580      588       +8     
==========================================
+ Hits         2017     2028      +11     
  Misses        295      295              
  Partials        2        2              
Flag Coverage Δ
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 92.09% <ø> (+0.05%) ⬆️
packages/y-electric 56.05% <ø> (ø)
typescript 87.22% <ø> (+0.06%) ⬆️
unit-tests 87.22% <ø> (+0.06%) ⬆️



@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from d3dc0f4 to 1c882c2 on January 28, 2026 11:18
@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from 1c882c2 to 71b61f5 on February 9, 2026 15:12

@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from dba7fc3 to c7d9d17 on February 10, 2026 12:42

@alco alco changed the title from "Alco/write txn fragments to storage" to "feat: Write transaction fragments directly to storage to reduce consumer memory footprint" on Feb 10, 2026
@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from c7d9d17 to ccaed2d on February 10, 2026 13:54

alco commented Feb 10, 2026

Manual memory benchmark results

The tests below were done separately with electric@main and electric@this-branch. The database had a single table:

CREATE TABLE IF NOT EXISTS public.users (
  id bigserial PRIMARY KEY,
  name TEXT NOT NULL,
  email TEXT NOT NULL                                                  
);

and Electric was initialized with 200 shapes, each having a where clause of the form N=N, for N in 1..200.

After creating the shapes and firing up Observer, I executed the following INSERT in the database:

insert into                              
  public.users(name, email)
    select
      'username_' || generate_series::text || '_' || repeat('x', 2000),
      'username_' || generate_series::text || '@example.com'
    from
      generate_series(1,10001);

Memory charts

The total memory usage on electric@main grows almost to 9GB during transaction processing. For electric@this-branch the total memory usage stays below 500MB.

electric@main

electric-main-mem-chart

electric@this-branch

electric-fragment-mem-chart

System stats comparison

For each run, I screenshotted the system stats in Observer both before the INSERT and after the memory usage had fallen back down (after GC and consumer hibernation).

The key thing to notice from these is that electric@main has residual process memory usage after the load testing which is not the case for electric@this-branch.

electric@main

electric-main-stats

electric@this-branch

electric-fragment-stats

@alco alco marked this pull request as ready for review on February 10, 2026 14:26

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ccaed2d3ca



alco commented Feb 10, 2026

@claude please review this PR.

  • Does it address the referenced issue?
  • Do you see any obvious issue in the code?
  • Lack of test coverage?
  • Any gnarly or hairy code?


alco commented Feb 11, 2026

I've run the replication throughput benchmark on our cloud fleet with the following params, comparing main (i.e. electricsql/electric:canary) with this PR (pushed to Docker Hub as alco/electric:fragment-direct-pr):

%{
  "row_size" => [150],
  "shape_count" => [5, 100, 1000, 2000],
  "tx_affects_shape" => [true, false],
  "tx_count" => [1000],
  "tx_row_count" => [90, 100, 120, 200, 220],
  "where_clause" => ["name = '\#{name}'", "name ILIKE '\#{name}%'"]
}


balegas commented Feb 11, 2026

Every microsecond counts!


@robacourt robacourt left a comment


LGTM. Great work! The interaction with the Materializer will need addressing; maybe consumers with materializers should be write_unit=txn.

@alco alco added the claude label Feb 11, 2026

claude bot commented Feb 11, 2026

Claude Code Review

Summary

Seventh review iteration of PR #3783 (2026-02-18). The PR successfully addresses the linked issue #3415 by streaming transaction fragments directly to storage, delivering a dramatic memory improvement (9GB → 500MB). The latest commit (751cf24) fixes empty materializer event filtering and ordering assumptions. However, CI test failures remain a merge blocker - 3 MaterializerTest failures suggest issues with event accumulation and notification in fragment mode.

What's Working Well

Memory optimization delivered: The benchmark results demonstrate the PR's core value - memory usage stays below 500MB during large transaction processing vs. nearly 9GB on main.

Solid architecture: The separation between PendingTxn (lightweight metadata tracking) and TransactionBuilder (full buffering) is clean and appropriate. The storage capability checking via supports_txn_fragment_streaming? provides a good safety net.

Comprehensive testing infrastructure: The addition of StorageTracer for non-intrusive test assertions and oracle-based ground-truth validation shows excellent testing practices.

Latest fix (751cf24) is correct: The empty event filtering check (events != %{}) and sorting fix for event ordering properly address the identified issues.

Issues Found

Critical (Must Fix)

1. CI MaterializerTest failures indicate incomplete fix

Files: packages/sync-service/test/electric/shapes/consumer/materializer_test.exs:353-375, 408-429

Issue: According to the latest Blacksmith CI report (2026-02-18), three MaterializerTest failures persist despite the fix in commit 751cf24:

  1. test events are accumulated in correct order across uncommitted fragments - Ordering assertion failures
  2. test same-batch move event cancellation existing value removed and re-added emits no events - Unexpected message received
  3. test same-batch move event cancellation insert and delete in same batch emits no events - Unexpected message received

Root cause analysis:

Looking at the materializer code (materializer.ex:309-322):

defp maybe_flush_pending_events(state, true) do
  events = cancel_matching_move_events(state.pending_events)
  
  if events != %{} do
    for pid <- state.subscribers do
      send(pid, {:materializer_changes, state.shape_handle, events})
    end
  end
  
  %{state | pending_events: %{}}
end

The fix correctly prevents sending %{}, but the test failures suggest:

  • Event accumulation across fragments may not be working correctly (test line 353-375)
  • The commit: flag may not be propagated correctly in all code paths
  • Tests receiving unexpected messages implies empty events are still being sent from somewhere

Possible issues:

  1. The Consumer may be calling Materializer.new_changes/3 without the correct commit: flag in some edge cases
  2. There might be a race condition where maybe_flush_pending_events is called multiple times
  3. The event merging logic in merge_events/2 might have edge cases

Action required:

  1. Run the failing tests locally with debug logging to see what messages are actually being sent
  2. Add logging to track all calls to maybe_flush_pending_events and new_changes with their arguments
  3. Verify that the Consumer's fragment processing path always passes commit: false for non-final fragments
  4. Check if there's a code path that bypasses the events != %{} check

2. Innermost dependency shape handling unresolved (from previous reviews)

File: packages/sync-service/lib/electric/shapes/consumer/state.ex:174-175

Issue: Per @robacourt's comment in review-context, the current logic enables write_unit=:txn_fragment for shapes without dependencies, but innermost dependency shapes (shapes with no dependencies that ARE dependencies for others) need special handling.

Example:

  • Outer shape: where: id IN (SELECT id FROM accounts) - dependencies: [Inner Shape]
  • Inner shape: SELECT id FROM accounts - dependencies: [] - HAS a materializer

The inner shape will be set to write_unit=:txn_fragment, but its materializer may need full transaction visibility for correct move-in/move-out reasoning.

Evidence this may be causing oracle test failures:

  • Oracle tests fail at varying rates (2-10 runs with specific params)
  • Example discrepancy: materialized shows 6 rows, oracle shows 9 rows (missing rows)
  • This pattern is consistent with incomplete transaction visibility in materializers

Recommendation:

  1. Check if a shape has dependent shapes before setting write_unit=:txn_fragment
  2. If a shape has a materializer (regardless of whether it has dependencies), consider whether it needs full transaction visibility
  3. Add explicit test: "innermost dependency shape with materializer receives fragments correctly"
  4. Correlate oracle test failures with shapes that have materializers - if failures are specific to these shapes, this is likely the root cause

Code location to fix: packages/sync-service/lib/electric/shapes/consumer/state.ex:174-175

write_unit =
  if filled_deps == [] do
    # BUG: This doesn't check if this shape IS a dependency for others
    # and has a materializer that needs full txn visibility
    :txn_fragment
  else
    :txn
  end

Suggested fix pattern:

write_unit =
  cond do
    # Shapes with dependencies always buffer full txn
    filled_deps != [] -> :txn
    # Shapes that have dependent outer shapes with materializers need full txn
    has_dependent_shapes?(shape) -> :txn
    # All other shapes can stream fragments
    true -> :txn_fragment
  end

3. Oracle test baseline still missing (from previous reviews)

Status: The author reported oracle test failures on main but no baseline comparison has been documented yet.

Critical questions remain:

  1. Does main fail oracle tests at the same rate with the same seeds?
  2. Does this PR introduce new failures or expose existing bugs?
  3. Is the oracle test harness itself correct?

Why this is a blocker: Without a baseline, we cannot distinguish between:

  • Pre-existing bugs in main
  • Bugs introduced by this PR
  • Bugs in the test harness itself

Action required:

  1. Run oracle tests on main with fixed seeds (document the seeds used)
  2. Run the same tests on this branch with identical seeds
  3. Document findings in PR comment with failure rates and specific discrepancies
  4. If this PR introduces new failures, investigate before merge
  5. If failures are pre-existing, file a separate issue and document it

Important (Should Fix)

4. Incomplete changeset file

File: .changeset/brown-rice-float.md

Issue: The changeset describes the feature but doesn't mention potential breaking changes or requirements:

  • Storage backends must implement new callbacks (append_fragment_to_log!/2, signal_txn_commit!/2, supports_txn_fragment_streaming?/1)
  • InMemoryStorage and TestStorage explicitly return false for fragment streaming support
  • No migration guide for custom storage implementations

Recommendation: Expand the changeset to document:

  1. New storage interface requirements
  2. Backward compatibility strategy (unsupported storages fall back to :txn mode)
  3. Any performance implications for shapes with subquery dependencies

5. Test helper robustness

File: packages/sync-service/test/electric/shapes/consumer/materializer_test.exs:492

Issue: The apply_changes/2 helper calls prep_changes/1 which adds metadata, but there's no verification that the commit: parameter is being passed correctly by callers.

Recommendation: Add a helper that explicitly documents whether changes are committed:

defp apply_uncommitted_changes(ctx, changes), 
  do: Materializer.new_changes(ctx, prep_changes(changes), commit: false)

defp apply_committed_changes(ctx, changes),
  do: Materializer.new_changes(ctx, prep_changes(changes), commit: true)

This makes test intent clearer and reduces the risk of missing the commit: flag.


Test Coverage

Excellent additions:

  • Oracle tests provide ground-truth validation (commit 634162c)
  • Fragment ordering test (commit bb169f2)
  • Event cancellation tests with move-in/move-out logic
  • Storage capability checking tests
  • StorageTracer for non-intrusive testing

Missing:

  • ✅ Fragment streaming with innermost dependency shapes (CRITICAL)
  • Multi-fragment transaction where final fragment is filtered out (edge case for offset mapping)
  • Materializer event accumulation edge cases (commit: false → commit: false → commit: true)
  • Oracle baseline comparison tests on main

Issue Conformance

Linked issue #3415: "Avoid holding whole transaction in consumer memory"

Core objective delivered: 9GB → 500MB memory improvement
Fragment streaming working: For shapes without dependencies
Safety mechanisms in place: Storage capability checking prevents crashes
⚠️ Materializer edge cases: CI failures need resolution
⚠️ Innermost dependency shapes: Unclear if handled correctly
⚠️ Oracle test baseline: Needed to validate correctness


Architecture & Code Quality

Strengths:

  • Clean separation: PendingTxn (metadata) vs TransactionBuilder (full buffering)
  • Defensive programming: Empty event filtering, storage capability checking
  • Excellent observability: StorageTracer, oracle tests
  • Well-structured storage abstraction for heterogeneous backends

Concerns:

  • Materializer event notification has subtle complexity causing test failures
  • Innermost dependency shape scenario needs explicit handling or documentation
  • Oracle test reliability needs baseline to establish confidence

Previous Review Status (Review #6 → Review #7)

What was addressed since review #6:

  • ✅ Empty event filtering added (commit 751cf24)
  • ✅ Ordering assumption removed from tests (commit 751cf24)
  • ✅ Storage capability checking implemented (commits 4e630ab, 7ec4c9a)
  • ✅ Materializer notification split for fragments vs clients (commit a93967e)
  • ✅ Commit fragment offset fix for txn_offset_mapping (commit 78a629a)

What remains from review #6:

  1. MaterializerTest CI failures - Still failing (3 tests)
  2. Oracle test baseline on main - Not yet documented
  3. Innermost dependency shape question - Unresolved per @robacourt feedback

Next Steps Before Merge

Critical (Merge Blockers):

  1. Fix MaterializerTest CI failures (assignee: @alco)

    • Reproduce failures locally
    • Add debug logging to track maybe_flush_pending_events calls
    • Verify commit: flag propagation in all Consumer code paths
    • Fix and verify tests pass consistently
  2. Establish oracle test baseline on main (assignee: @alco)

    • Document seeds used for reproducibility
    • Run tests on main with fixed seeds (10+ runs)
    • Run tests on this branch with same seeds
    • Document comparison: pre-existing vs new failures
    • If new failures, investigate root cause
  3. Resolve innermost dependency shape handling (assignee: @alco + @robacourt)

    • Add test: innermost dependency shape (no deps, has materializer) with fragment streaming
    • Verify materializer sees complete transaction for move-in/move-out logic
    • If broken, add logic to detect "is dependency for other shapes" and use :txn mode
    • Document decision in code comments

Important (Pre-Merge):

  1. Expand changeset file with storage interface breaking changes
  2. Add test helper clarity for committed vs uncommitted changes
  3. Add inline comments explaining when write_unit=:txn vs :txn_fragment is chosen

Overall Assessment

This PR represents a significant and valuable memory optimization with solid architectural decisions. The ~18x memory reduction (9GB → 500MB) is impressive and addresses a real pain point.

However, three merge blockers remain:

  1. CI test failures are unresolved - MaterializerTest failures suggest the empty event fix is incomplete or there are edge cases in event accumulation logic
  2. Innermost dependency shape behavior is unclear - @robacourt's comment highlights a gap in the current write_unit selection logic that may affect shapes with materializers
  3. Oracle test baseline is missing - Cannot validate correctness without comparing failure rates against main

The code quality is high, testing infrastructure is excellent, and the approach is sound. Once these three blockers are addressed, this PR will be ready to merge.

Recommendation:

  1. Immediate priority: Debug and fix MaterializerTest failures - they indicate a real issue
  2. High priority: Add test for innermost dependency shape + materializer scenario
  3. Before merge: Document oracle test baseline comparison with main

Review iteration: 7 | 2026-02-18

@alco alco force-pushed the alco/write-txn-fragments-to-storage branch 2 times, most recently from 5f18340 to 4714a7a, on February 16, 2026 13:08

alco commented Feb 16, 2026

Update (Feb 16)

Rebased on current main and pushed new commits addressing review feedback:

  • Split materializer and client notifications (a93967e): Separated notify_new_changes into distinct functions for notifying the materializer vs. clients. This is needed to selectively notify the appropriate subscriber when processing individual txn fragments.

  • Use commit fragment offset for txn_offset_mapping and client notifications (78a629a): Fixes the flush alignment issue flagged in review — when the final commit fragment is filtered out, the mapping now uses the commit fragment's last_log_offset directly instead of the last relevant-change boundary. PendingTxn no longer needs to track last_log_offset.

  • Add supports_txn_fragment_streaming? callback to Storage behaviour (4e630ab): InMemoryStorage and TestStorage return false, PureFileStorage returns true. This allows runtime capability checking before enabling fragment streaming mode.

  • Downgrade write_unit to :txn when storage lacks fragment streaming support (4714a7a): On consumer init, validates that the storage backend supports txn fragment streaming. If unsupported (e.g. InMemoryStorage), falls back to full-transaction buffering with a warning log. This addresses the concern about InMemoryStorage crashing on fragment callbacks.
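The downgrade check described in the last bullet can be sketched as follows (names taken from the commit description; the exact code is an assumption):

```elixir
# On consumer init: fall back to full-transaction buffering when the
# storage backend can't stream fragments (e.g. InMemoryStorage).
write_unit =
  if write_unit == :txn_fragment and
       not Storage.supports_txn_fragment_streaming?(storage) do
    Logger.warning(
      "Storage does not support txn fragment streaming; " <>
        "falling back to write_unit=:txn"
    )

    :txn
  else
    write_unit
  end
```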

Still TODO: address @robacourt's feedback about shapes that are dependencies (innermost shapes with no dependencies themselves but that DO have a materializer) — these should probably use write_unit=txn to ensure the materializer gets the full transaction change set on commit.

In the current implementation, the Materializer sees changes from all txn fragments, but when it then notifies the consumer with a :materializer_changes message in return, the Consumer may still be assuming that a complete transaction has been processed.



alco commented Feb 16, 2026

Latest changes (3 new commits since last push)

Defer Materializer subscriber notifications until commit fragment

Previously, when write_unit=:txn_fragment, the inner Consumer called Materializer.new_changes() per fragment, and the Materializer immediately sent :materializer_changes (move-in/move-out events) to the outer Consumer after each fragment. This could cause redundant intermediate notifications within a single transaction.

Changes:

  • Added a commit: option to Materializer.new_changes/3 (defaults to true for backward compatibility). When commit: false, events are accumulated in a new pending_events field on the Materializer state without notifying subscribers. When commit: true, accumulated events are flushed to subscribers.
  • The fragment-streaming path in consumer.ex passes commit: false for every fragment. The flush happens via notify_new_changes in maybe_complete_pending_txn which uses commit: true (the default).
  • Events from multiple fragments are merged by prepending newer events to older ones, matching the existing reverse-chronological ordering convention used by increment_value/decrement_value.
  • Added a test verifying correct event ordering across uncommitted fragments.
  • Moved signal_txn_commit! inside the num_changes > 0 branch since there's no need to signal storage when no changes were written.
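Put together, the deferred-notification flow might look like this (a sketch; compute_move_events/2, merge_events/2 and the exact state fields are assumptions):

```elixir
# commit: defaults to true for backward compatibility.
def new_changes(state, changes, opts \\ []) do
  commit? = Keyword.get(opts, :commit, true)
  events = compute_move_events(state, changes)

  # Prepend newer events, matching the reverse-chronological convention.
  state = %{state | pending_events: merge_events(events, state.pending_events)}

  # Flush accumulated events to subscribers only on the commit fragment.
  maybe_flush_pending_events(state, commit?)
end
```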


alco commented Feb 16, 2026

Regarding an earlier agentic comment about the potential impact of direct txn fragment writing to storage on PureFileStorage.trim_log!, I've asked Claude to analyse it and here's the result - https://gist.github.com/alco/588b3d7d7665f63d768d5f1b611499c7#file-fragment_storage_safety_analysis-md.

TL;DR: Fragment Streaming Storage Safety

All 7 crash/consistency scenarios are safe. The three-offset architecture (last_seen_offset / last_seen_txn_offset / last_persisted_txn_offset) provides the core safety guarantee:

  • trim_log!/2 correctly truncates uncommitted fragments — it uses last_persisted_txn_offset (only advanced after a complete txn is finalized + flushed) to remove both chunk index entries and log file bytes beyond the committed boundary.
  • Readers never see uncommitted datastream_main_log caps reads at last_seen_txn_offset, which isn't advanced until signal_txn_commit! calls finalize_txn.
  • ETS consistency is maintained — fragment entries stay in an in-memory buffer until finalize_txn; the disk/ETS split read strategy handles mid-transaction flushes correctly.
  • Compaction has one theoretical edge case (transaction spanning 30+ MB creating 3+ chunks) where the compacted file could retain uncommitted data after a crash, but readers are still bounded by the txn offset, so no data corruption occurs. Practically, this is a near-zero likelihood scenario.

No code changes are needed — the existing invariants hold correctly with the fragment write path.


@robacourt robacourt left a comment


LGTM. To test the effect of this on subqueries, you could cherry-pick 0cc3b49 and run the tests with the number of mutations in the test transactions over the max txn size and see if there are any issues:

ELECTRIC_EXPERIMENTAL_MAX_TXN_SIZE=5 \
MUTATIONS_PER_TXN=15 \
mix test --only oracle test/integration/oracle_property_test.exs


alco commented Feb 16, 2026

@robacourt That's dope! It has found one failure already after running for ~10sec. I'll look into fixing it and see if it can discover anything else.

alco and others added 18 commits February 18, 2026 12:28
To be sure no log items are written to storage, we must verify that
neither of the writing-to-storage functions got called.
On CI the function has been seen to return an empty list where a
non-empty one was expected. Apparently, there may be a delay between the
calls themselves and the moment they end up in the test process'
mailbox.
…rate functions

This is needed to be able to selectively notify the appropriate
subscriber when processing individual txn fragments.
…tions

PendingTxn no longer needs to track last_log_offset since the commit
fragment's offset (txn_fragment.last_log_offset) is now used directly in
maybe_complete_pending_txn. This fixes flush alignment lagging behind
when the final commit fragment is filtered out.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
InMemoryStorage and TestStorage return false, PureFileStorage returns
true. This allows runtime capability checking before enabling fragment
streaming mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pport

When initializing a consumer, validate that the storage backend
supports txn fragment streaming before using :txn_fragment mode.
If unsupported (e.g. InMemoryStorage), fall back to full-transaction
buffering with a warning log.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from 412234d to 60f11a8 on February 18, 2026 11:40
alco and others added 3 commits February 18, 2026 12:41
Previously, Materializer sent :materializer_changes to subscribers on
every new_changes call. When write_unit=:txn_fragment, this meant the
outer Consumer could receive multiple intermediate move-in/move-out
notifications within a single transaction, causing redundant work.

Add a commit: option to Materializer.new_changes/3 (default: true).
Changes are accumulated in pending_events and only flushed to
subscribers when commit: true. The fragment-streaming path passes
commit: not is_nil(fragment.commit) to defer until the final fragment.
maybe_complete_pending_txn sends an empty new_changes with commit: true
to handle the edge case where the commit fragment has no matching
changes but earlier fragments accumulated events.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies that splitting inserts across two new_changes calls (one
uncommitted, one committed) produces the same event order as sending
them in a single call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from 60f11a8 to 634162c on February 18, 2026 11:41

@alco alco force-pushed the alco/write-txn-fragments-to-storage branch from 674ad7e to 751cf24 on February 18, 2026 16:07

alco commented Feb 18, 2026

@robacourt Pushed 751cf24 to correctly incorporate your recent addition of cancelling out move-ins/move-outs.


alco commented Feb 18, 2026

I've just run oracle tests with max_batch_size=5 and mutations_per_txn=15 and can reliably reproduce this issue:

Poll error for shape=shape_9 where=(level_3_id = 'l3-5') OR (value BETWEEN 'v1' AND 'v10'): %Electric.Client.Error{message: %{"errors" => %{"offset" => ["out of bounds for this shape"]}, "message" => "Invalid request"}, resp: %Electric.Client.Fetch.Response{status: 400, last_offset: nil, shape_handle: nil, schema: nil, next_cursor: nil, request_timestamp: ~U[2026-02-18 18:57:16.838842Z], body: %{"errors" => %{"offset" => ["out of bounds for this shape"]}, "message" => "Invalid request"}, headers: %{"access-control-allow-methods" => ["GET, POST, HEAD, DELETE, OPTIONS"], "access-control-allow-origin" => ["*"], "access-control-expose-headers" => ["electric-cursor,electric-handle,electric-offset,electric-schema,electric-up-to-date,electric-internal-known-error,retry-after"], "cache-control" => ["no-store"], "content-type" => ["application/json; charset=utf-8"], "date" => ["Wed, 18 Feb 2026 18:57:16 GMT"], "electric-server" => ["ElectricSQL/1.4.4"], "surrogate-control" => ["no-store"], "vary" => ["accept-encoding"], "x-request-id" => ["GJVsPM1fclpQUIgAAEJC"]}}}

The key bit is %Electric.Client.Error{message: %{"errors" => %{"offset" => ["out of bounds for this shape"]}, "message" => "Invalid request"}

So apparently Electric returns an incorrect offset that is not at a transaction boundary.

The silver lining is that if I set write_unit: :write_unit_txn in this branch (that is, stick to buffering txn fragments in memory), the oracle test harness only finds inconsistency errors similar to those that can be reproduced on main.

I may be able to fix the handling of multiple txn fragments for shapes without subqueries first, and we can ship that. Then deal with subqueries.

@alco alco marked this pull request as draft on February 18, 2026 19:09

balegas commented Feb 18, 2026

:( if it's easy to reproduce, we should be able to figure out the root cause. There is no rush to ship this PR, so let's do it properly when we know the fix.

Successfully merging this pull request may close these issues.

Avoid holding whole transaction in consumer memory
