schemachanger: fix race in distributed merge checkpoint updates with lazy SST cleanup#163776
Conversation
|
😎 Merged successfully - details. |
Previously, checkpoint progress updates had a race with SST cleanup. SetBackfillProgress only updates in-memory state; a background goroutine persists the checkpoint to disk at regular intervals. However, SST cleanup was triggered immediately after calling SetBackfillProgress, before the updated checkpoint was durably written. This could result in temporary SSTs being removed too early if a distributed merge phase transition occurred before the checkpoint hit disk. This change moves SST cleanup into the backfill progress tracker and performs it lazily. Cleanup now runs only after the checkpoint has been persisted and a distributed merge phase transition has completed. This ensures that old temporary SSTs are removed only once the corresponding progress is durable, eliminating the race. Fixes cockroachdb#162330 Epic: CRDB-48845 Release note: None
8bce049 to
0974e8d
Compare
fqazi
left a comment
There was a problem hiding this comment.
One question related to the flushing behaviour
@fqazi reviewed 10 files and all commit messages, and made 2 comments.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on spilchen).
pkg/sql/schemachanger/scexec/backfiller/tracker.go line 284 at r1 (raw file):
// After successful write, detect phase transitions and handle cleanup. if b.cleaner != nil {
Do we immediately flush on these transitions? Or do we rely on the default timer based behavior?
spilchen
left a comment
There was a problem hiding this comment.
@spilchen made 1 comment.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on fqazi).
pkg/sql/schemachanger/scexec/backfiller/tracker.go line 284 at r1 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
Do we immediately flush on these transitions? Or do we rely on the default timer based behavior?
We rely on the timer based behaviour. In an earlier attempt at fixing this, I did add FlushCheckpoint() during phase transitions. But that introduced new timing scenarios if two goroutines are calling FlushCheckpoint() at the same time. So, I opted for the lazy update once the checkpoint has been persisted. This ensures only 1 goroutine will ever be calling FlushCheckpoint.
|
TFTR! |
|
/trunk merge |
Previously, distributed merge checkpoint progress updates had a race with SST cleanup. SetBackfillProgress only updates in-memory state; a background goroutine persists the checkpoint to disk at regular intervals. However, SST cleanup was triggered immediately after calling SetBackfillProgress, before the updated checkpoint was durably written. This could result in temporary SSTs being removed too early if a distributed merge phase transition occurred before the checkpoint hit disk.
This change moves SST cleanup into the backfill progress tracker and performs it lazily. Cleanup now runs only after the checkpoint has been persisted and a distributed merge phase transition has completed. This ensures that old temporary SSTs are removed only once the corresponding progress is durable, eliminating the race.
Fixes #162330
Fixes #163264
Epic: CRDB-48845
Release note: None