@DieterDP-ng
Contributor

The BackupLogCleaner prevents WAL files that are needed for future backups from being deleted. In the case where a backup root has a single running backup, there was a small timeframe where relevant files were unprotected because only completed backups were taken into consideration. This commit fixes this.

The old mechanism relied on the "backup start code", a timestamp that denotes (per backup root) the lowest (earliest) log-roll timestamp that occurred for the backup. Because this concept added no value but was complex to reason about, it has been removed. Its usages are replaced with equivalent logic based on the timestamps stored in the backup info. (The backup start codes were calculated in the same way, just stored separately.)

Note that the backup start code calculation suffers from HBASE-29628 (log-roll timestamps of decommissioned region servers are not cleaned up, causing the start code to be lower than it should be). That problem is still present in this commit.
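
As an illustration of the calculation described above, here is a minimal sketch that derives the earliest log-roll timestamp from a per-server map. It assumes the timestamps are available as a plain `Map` keyed by region server address; the class and method names are hypothetical and are not the HBase API.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper for illustration; not part of the HBase API.
final class StartCodeSketch {
  // logRollTsPerServer: region server address -> log-roll timestamp recorded for a backup.
  // Returns the lowest (earliest) timestamp, or empty if nothing was recorded.
  static OptionalLong earliestLogRollTs(Map<String, Long> logRollTsPerServer) {
    return logRollTsPerServer.values().stream()
        .mapToLong(Long::longValue)
        .min();
  }
}
```

In this sketch, a leftover entry for a decommissioned region server with an old timestamp would pull the minimum down, which is the kind of effect HBASE-29628 describes.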

for (BackupInfo backup : backups) {
  BackupInfo existingEntry = newestBackupPerRootDir.get(backup.getBackupRootDir());
  if (existingEntry == null || existingEntry.getStartTs() < backup.getStartTs()) {
  if (existingEntry == null || existingEntry.getState() == BackupState.RUNNING) {
Contributor

Are backups returned sorted from latest to oldest? If so, this change makes sense; otherwise the first entry may not be the latest completed backup.
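
To make the ordering concern concrete, here is a minimal sketch of a selection that does not depend on the order in which backups are returned. `Backup` and `State` are hypothetical stand-ins for `BackupInfo` and `BackupState`, and how running backups should be treated is left out of scope; this is not the code under review.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for BackupInfo and BackupState; not the HBase classes.
final class NewestBackupSketch {
  enum State { RUNNING, COMPLETE }

  static final class Backup {
    final String rootDir;
    final long startTs;
    final State state;

    Backup(String rootDir, long startTs, State state) {
      this.rootDir = rootDir;
      this.startTs = startTs;
      this.state = state;
    }
  }

  // Keeps, per backup root, the completed backup with the highest start timestamp.
  // The result is the same for any iteration order of the input list.
  static Map<String, Backup> newestCompletedPerRoot(List<Backup> backups) {
    Map<String, Backup> newest = new HashMap<>();
    for (Backup b : backups) {
      if (b.state != State.COMPLETE) {
        continue; // this sketch ignores running backups
      }
      Backup existing = newest.get(b.rootDir);
      if (existing == null || existing.startTs < b.startTs) {
        newest.put(b.rootDir, b);
      }
    }
    return newest;
  }
}
```

Comparing start timestamps explicitly costs one extra comparison per entry and removes the dependency on the list being sorted.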


Long storedTs = boundaries.get(regionServerAddress);
if (storedTs == null || logRollTs < storedTs) {
  boundaries.put(regionServerAddress, logRollTs);
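
The fragment above keeps the lowest log-roll timestamp seen per region server. As a sketch only, the same "keep the minimum" update can be written with `Map.merge`; the wrapper class and its method names below are hypothetical, and `boundaries` is assumed to be a `Map<String, Long>`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper for illustration; not the BackupBoundaries class from the PR.
final class BoundariesSketch {
  private final Map<String, Long> boundaries = new HashMap<>();

  // Stores logRollTs if no value exists yet, otherwise keeps the smaller of the two.
  void addLogRoll(String regionServerAddress, long logRollTs) {
    boundaries.merge(regionServerAddress, logRollTs, Long::min);
  }

  // Returns the stored boundary, or null if the server has no entry.
  Long boundaryFor(String regionServerAddress) {
    return boundaries.get(regionServerAddress);
  }
}
```

Both forms behave the same for non-null timestamps; `merge` just folds the null check and the comparison into one call.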
Contributor

Side note. I've noticed that the current implementation of BackupBoundaries can lead to problems due to boundaries being a global map.

I lean towards having a separate PR to address those issues, but if you'd prefer we could group together both changes in the same patch

@hgromer
Contributor

hgromer commented Feb 12, 2026

This looks good to me; appreciate the code cleanups here as well
