@platinumhamburg
Contributor

This PR introduces an undo recovery mechanism for the Flink sink writer, so that failure recovery works correctly for tables that use the aggregation merge engine.

Key components:

  • ByteArrayWrapper: Utility class for using byte arrays as map keys
  • UndoComputer: Computes undo operations by comparing checkpoint state with current log records, supporting both full row and partial update modes
  • UndoRecoveryExecutor: Executes undo operations using UpsertWriter
  • UndoRecoveryCoordinator: Coordinates the recovery process across buckets, managing log scanning, undo computation, and execution
  • BucketRecoveryContext: Holds per-bucket recovery state
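
As a rough illustration of why a wrapper like ByteArrayWrapper is needed: Java arrays inherit identity-based `equals` and `hashCode`, so two `byte[]` instances with the same contents do not match as map keys. The sketch below is not the class from this PR. `ByteArrayKeySketch` and `Key` are hypothetical names, and the defensive copy is an assumption about what a safe wrapper would do.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ByteArrayKeySketch {

    /** Hypothetical stand-in for a byte-array map key; not the PR's actual class. */
    static final class Key {
        private final byte[] data;
        private final int hash;

        Key(byte[] data) {
            // Defensive copy so later mutation of the caller's array cannot corrupt the map.
            this.data = data.clone();
            this.hash = Arrays.hashCode(this.data);
        }

        @Override
        public boolean equals(Object o) {
            // Content-based equality instead of the identity equality arrays inherit.
            return o instanceof Key && Arrays.equals(data, ((Key) o).data);
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    public static void main(String[] args) {
        Map<Key, String> byKey = new HashMap<>();
        byKey.put(new Key(new byte[] {1, 2, 3}), "row-a");

        // A different array instance with the same contents finds the entry.
        System.out.println(byKey.get(new Key(new byte[] {1, 2, 3})));

        // Raw arrays compare by identity, so the same lookup misses.
        Map<byte[], String> raw = new HashMap<>();
        raw.put(new byte[] {1, 2, 3}, "row-a");
        System.out.println(raw.get(new byte[] {1, 2, 3}));
    }
}
```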

The undo recovery works by:

  1. Scanning log records from the checkpoint offset to the current end offset
  2. Computing inverse operations for uncommitted records
  3. Executing undo operations to restore table state
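
The three steps above can be sketched in a simplified in-memory model. This is not the UndoComputer from this PR. `LogEntry`, `UndoOp`, the two enums, and the "first change per key wins" rule are assumptions made for illustration: the sketch assumes that the first uncommitted record for a key carries enough information (its change type and, for updates and deletes, the old row) to restore the checkpointed state, so later records for the same key are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UndoComputeSketch {

    // Hypothetical change types for this sketch.
    enum ChangeType { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    enum UndoType { DELETE, UPSERT }

    /** Hypothetical log record; row holds the old image for UPDATE_BEFORE and DELETE. */
    static final class LogEntry {
        final long offset;
        final ChangeType type;
        final String key;
        final String row;

        LogEntry(long offset, ChangeType type, String key, String row) {
            this.offset = offset;
            this.type = type;
            this.key = key;
            this.row = row;
        }
    }

    static final class UndoOp {
        final UndoType type;
        final String key;
        final String row;

        UndoOp(UndoType type, String key, String row) {
            this.type = type;
            this.key = key;
            this.row = row;
        }

        @Override
        public String toString() {
            return type + "(" + key + (row == null ? "" : ", " + row) + ")";
        }
    }

    /** Scans [checkpointOffset, endOffset) and derives one inverse operation per key. */
    static List<UndoOp> computeUndo(List<LogEntry> log, long checkpointOffset, long endOffset) {
        Map<String, UndoOp> undoByKey = new LinkedHashMap<>();
        for (LogEntry e : log) {
            if (e.offset < checkpointOffset || e.offset >= endOffset) {
                continue; // committed before the checkpoint, or beyond the scan range
            }
            if (undoByKey.containsKey(e.key)) {
                continue; // assumption: the first change already determines the undo
            }
            switch (e.type) {
                case INSERT:
                    // The key did not exist at the checkpoint, so the inverse is a delete.
                    undoByKey.put(e.key, new UndoOp(UndoType.DELETE, e.key, null));
                    break;
                case UPDATE_BEFORE:
                case DELETE:
                    // The old image is the checkpointed row, so the inverse writes it back.
                    undoByKey.put(e.key, new UndoOp(UndoType.UPSERT, e.key, e.row));
                    break;
                default:
                    break; // UPDATE_AFTER carries no old image
            }
        }
        return new ArrayList<>(undoByKey.values());
    }

    public static void main(String[] args) {
        List<LogEntry> log = Arrays.asList(
                new LogEntry(5, ChangeType.INSERT, "k0", "z=0"),
                new LogEntry(10, ChangeType.INSERT, "k1", "a=1"),
                new LogEntry(11, ChangeType.UPDATE_BEFORE, "k1", "a=1"),
                new LogEntry(12, ChangeType.UPDATE_AFTER, "k1", "a=2"),
                new LogEntry(13, ChangeType.UPDATE_BEFORE, "k2", "b=7"),
                new LogEntry(14, ChangeType.UPDATE_AFTER, "k2", "b=9"),
                new LogEntry(15, ChangeType.DELETE, "k3", "c=4"));
        System.out.println(computeUndo(log, 10, 16));
    }
}
```

In this model, executing the returned operations against the table would bring every touched key back to its checkpointed value; the real implementation additionally has to coordinate this per bucket.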

For partial update mode, INSERT records require full row deletion (not partial column deletion), which is handled by using a separate delete writer with null target columns.
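
To illustrate the distinction, here is a toy in-memory model rather than the real UpsertWriter. `SketchWriter` and its semantics are hypothetical: a writer built with target columns touches only those columns on delete, while a writer built with `null` target columns removes the whole row, which is what undoing an INSERT needs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PartialUndoSketch {

    /** Hypothetical writer; targetColumns == null means "operate on the full row". */
    static final class SketchWriter {
        private final Map<String, Object[]> table;
        private final int[] targetColumns;

        SketchWriter(Map<String, Object[]> table, int[] targetColumns) {
            this.table = table;
            this.targetColumns = targetColumns;
        }

        void upsert(String key, Object[] row) {
            if (targetColumns == null) {
                table.put(key, row.clone());
                return;
            }
            // Partial update: only the target columns are overwritten.
            Object[] current = table.computeIfAbsent(key, k -> new Object[row.length]);
            for (int c : targetColumns) {
                current[c] = row[c];
            }
        }

        void delete(String key) {
            if (targetColumns == null) {
                table.remove(key); // full row deletion
                return;
            }
            // Partial delete in this model: target columns are nulled, the row survives.
            Object[] current = table.get(key);
            if (current != null) {
                for (int c : targetColumns) {
                    current[c] = null;
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object[]> table = new HashMap<>();
        SketchWriter partialWriter = new SketchWriter(table, new int[] {0});
        SketchWriter deleteWriter = new SketchWriter(table, null);

        // An uncommitted INSERT created key k1 through the partial writer.
        partialWriter.upsert("k1", new Object[] {10, null});

        // Undoing it with the partial writer leaves an empty row behind.
        partialWriter.delete("k1");
        System.out.println(table.containsKey("k1") + " " + Arrays.toString(table.get("k1")));

        // Undoing it with the separate full-row delete writer removes the key.
        deleteWriter.delete("k1");
        System.out.println(table.containsKey("k1"));
    }
}
```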

Purpose

Linked issue: close #2544

Brief change log

Tests

API and Format

Documentation

@platinumhamburg force-pushed the undoinfra branch 3 times, most recently from edd4525 to a75f04f (February 2, 2026 05:02)
Development

Successfully merging this pull request may close these issues.

Add undo recovery support for aggregation tables