
schemachanger: fix race in distributed merge checkpoint updates with lazy SST cleanup#163776

Merged
trunk-io[bot] merged 1 commit into cockroachdb:master from
spilchen:gh-162330/260217/1441/merge/early-sst-removal-bugfix
Feb 18, 2026

Conversation

@spilchen (Contributor) commented on Feb 17, 2026

Previously, distributed merge checkpoint progress updates had a race with SST cleanup. SetBackfillProgress only updates in-memory state; a background goroutine persists the checkpoint to disk at regular intervals. However, SST cleanup was triggered immediately after calling SetBackfillProgress, before the updated checkpoint was durably written. This could result in temporary SSTs being removed too early if a distributed merge phase transition occurred before the checkpoint hit disk.

This change moves SST cleanup into the backfill progress tracker and performs it lazily. Cleanup now runs only after the checkpoint has been persisted and a distributed merge phase transition has completed. This ensures that old temporary SSTs are removed only once the corresponding progress is durable, eliminating the race.

Fixes #162330
Fixes #163264
Epic: CRDB-48845

Release note: None

@spilchen self-assigned this on Feb 17, 2026

trunk-io bot commented Feb 17, 2026

😎 Merged successfully - details.

@cockroach-teamcity (Member)

This change is Reviewable

@spilchen force-pushed the gh-162330/260217/1441/merge/early-sst-removal-bugfix branch from 8bce049 to 0974e8d on February 18, 2026 13:07
@spilchen changed the title from "schemachanger: fix race in checkpoint updates with lazy SST cleanup" to "schemachanger: fix race in distributed merge checkpoint updates with lazy SST cleanup" on Feb 18, 2026
@spilchen marked this pull request as ready for review on February 18, 2026 13:09
@spilchen requested a review from a team as a code owner on February 18, 2026 13:09
@fqazi (Collaborator) left a comment

:lgtm:

One question related to the flushing behaviour

@fqazi reviewed 10 files and all commit messages, and made 2 comments.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on spilchen).


pkg/sql/schemachanger/scexec/backfiller/tracker.go line 284 at r1 (raw file):

	// After successful write, detect phase transitions and handle cleanup.
	if b.cleaner != nil {

Do we immediately flush on these transitions? Or do we rely on the default timer based behavior?

@spilchen (Contributor, Author) left a comment

@spilchen made 1 comment.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on fqazi).


pkg/sql/schemachanger/scexec/backfiller/tracker.go line 284 at r1 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

Do we immediately flush on these transitions? Or do we rely on the default timer based behavior?

We rely on the timer-based behaviour. In an earlier attempt at fixing this, I added a FlushCheckpoint() call during phase transitions, but that introduced new timing scenarios when two goroutines call FlushCheckpoint() at the same time. So I opted for the lazy cleanup once the checkpoint has been persisted. This ensures only one goroutine will ever be calling FlushCheckpoint().

@spilchen (Contributor, Author)

TFTR!

@spilchen (Contributor, Author)

/trunk merge

trunk-io bot merged commit 13fafdc into cockroachdb:master on Feb 18, 2026
28 checks passed

Development

Successfully merging this pull request may close these issues.

sql: TestDistributedMergeStoragePrefixPreservedAcrossPauseResume failed
sql: TestDistributedMergeResumePreservesProgress failed

3 participants
