
add minor compaction for Overlord-based compaction supervisors using the MSQ engine #19059

Open
cecemei wants to merge 13 commits into apache:master from cecemei:compact2.2

Conversation

@cecemei (Contributor) commented Feb 26, 2026

Description

This PR introduces minor compaction support in the MSQ engine to Apache Druid, enabling compaction of only the uncompacted segments within an interval while upgrading already-compacted segments for consistency.

Key Changes

  1. CompactionMode Enum Extension
  • Added ALL_SEGMENTS mode (full compaction of all segments)
  • Added UNCOMPACTED_SEGMENTS_ONLY mode (compacts only new segments, upgrades existing ones)
  2. UncompactedInputSpec
    A new CompactionInputSpec implementation specifically for minor compaction:
  • Specifies the uncompacted segments to compact within an interval
  • Required (non-nullable) interval and uncompactedSegments fields
  • Type: "uncompacted"
  • Replaces nullable-field handling in CompactionIntervalSpec, making the intent of minor compaction explicit
  3. SegmentUpgradeAction
    A new task action that updates segment metadata without rewriting data, modifying the partition numbers and shard specifications of already-compacted segments.

  4. ShardSpec Interface Enhancements
    Extended with mutation methods including withPartitionNum(), withCorePartitions(), and an isNumChunkSupported() check.

  5. Policy-Level Control
    Introduced a minUncompactedBytesPercentForFullCompaction threshold in MostFragmentedIntervalFirstPolicy that determines, based on the ratio of uncompacted to total segment bytes, whether incremental or full compaction applies.

  6. Builder Pattern Implementation
    Replaced verbose constructors in UserCompactionTaskQueryTuningConfig with a builder pattern for improved readability.
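The policy-level threshold described above can be sketched as a small decision function. This is a minimal, self-contained illustration, not the PR's actual implementation: the CompactionModeChooser class, its choose method, and its parameters are hypothetical; only the two CompactionMode values and the minUncompactedBytesPercentForFullCompaction name come from the PR.

```java
// Hypothetical sketch of the policy-level threshold decision: pick a
// compaction mode from the share of uncompacted bytes in an interval.
public class CompactionModeChooser
{
  public enum CompactionMode { ALL_SEGMENTS, UNCOMPACTED_SEGMENTS_ONLY }

  /**
   * If the uncompacted share of bytes meets or exceeds the threshold, a full
   * compaction is worthwhile; otherwise only the new segments are compacted
   * and the already-compacted segments are merely upgraded.
   */
  public static CompactionMode choose(
      long uncompactedBytes,
      long totalBytes,
      double minUncompactedBytesPercentForFullCompaction
  )
  {
    if (totalBytes <= 0) {
      // Nothing in the interval yet; treat as a full compaction.
      return CompactionMode.ALL_SEGMENTS;
    }
    final double uncompactedPercent = 100.0 * uncompactedBytes / totalBytes;
    return uncompactedPercent >= minUncompactedBytesPercentForFullCompaction
           ? CompactionMode.ALL_SEGMENTS
           : CompactionMode.UNCOMPACTED_SEGMENTS_ONLY;
  }
}
```

For example, with a 50% threshold, an interval where 90 of 100 bytes are uncompacted would get a full compaction, while one with 10 of 100 would get minor compaction plus segment upgrades.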

Release Notes

This feature enables compacting newly ingested segments while preserving already-compacted data, with configuration through ratio thresholds in compaction policies. Support is currently limited to Overlord-based compaction supervisors using the MSQ engine.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Batch Ingestion Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Feb 26, 2026
Comment on lines +666 to +676
final DataSegment segment = new DataSegment(
    "foo",                                                              // dataSource
    Intervals.of("2023-01-0" + i + "/2023-01-0" + (i + 1)),             // one-day interval
    "2023-01-0" + i,                                                    // version
    ImmutableMap.of("path", "a-" + i),                                  // loadSpec
    ImmutableList.of("dim1"),                                           // dimensions
    ImmutableList.of("m1"),                                             // metrics
    new DimensionRangeShardSpec(List.of("dim1"), null, null, i - 1, 8), // shardSpec
    9,                                                                  // binaryVersion
    100                                                                 // size in bytes
);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation (Note, test)

Invoking DataSegment.DataSegment should be avoided because it has been deprecated.
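The CodeQL note flags a deprecated multi-argument constructor; the usual remedy is a builder, the same pattern this PR applies to UserCompactionTaskQueryTuningConfig. Below is a generic, self-contained sketch of that refactor. The Segment class and its fields are hypothetical stand-ins for illustration, not Druid's actual DataSegment API.

```java
// Generic sketch: replacing a long positional constructor with a builder so
// call sites read field-by-field. Segment here is hypothetical, not Druid's
// DataSegment.
public class Segment
{
  private final String dataSource;
  private final String version;
  private final int partitionNum;
  private final long size;

  private Segment(Builder b)
  {
    this.dataSource = b.dataSource;
    this.version = b.version;
    this.partitionNum = b.partitionNum;
    this.size = b.size;
  }

  public String getDataSource() { return dataSource; }
  public String getVersion() { return version; }
  public int getPartitionNum() { return partitionNum; }
  public long getSize() { return size; }

  public static Builder builder(String dataSource)
  {
    return new Builder(dataSource);
  }

  public static class Builder
  {
    private final String dataSource;
    private String version = "0";
    private int partitionNum = 0;
    private long size = 0;

    private Builder(String dataSource) { this.dataSource = dataSource; }

    public Builder version(String version) { this.version = version; return this; }
    public Builder partitionNum(int partitionNum) { this.partitionNum = partitionNum; return this; }
    public Builder size(long size) { this.size = size; return this; }

    public Segment build() { return new Segment(this); }
  }
}
```

The advantage over a nine-argument constructor is that each value is named at the call site, so arguments of the same type (version strings, sizes, partition numbers) cannot be silently transposed.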
@cecemei cecemei requested a review from kfaraz February 26, 2026 03:37
@cecemei cecemei marked this pull request as ready for review February 26, 2026 03:37
@cecemei cecemei mentioned this pull request Feb 26, 2026
@capistrant (Contributor) left a comment:

This is awesome. Left some minor comments.

@@ -282,18 +283,20 @@ private boolean startJobIfPendingAndReady(
}

// Check if the job is already running, completed or skipped
Contributor:

nit: this comment might need a refresh to reflect how the status handling has changed versus the former code. I was confused about why PENDING and COMPLETE resulted in a throw until I reviewed deriveCompactionStatus

Contributor (Author):

There's no logic change here, just made the code more defensive. Updated the comment here. IMHO, CompactionStatus is not the best class to use here.

Contributor (Author):

Actually, I guess there is some logic change: the policy check has been moved to CompactionConfigBasedJobTemplate, since creating the job needs to know the compaction mode (which is decided by the policy), but computeCompactionStatus still only returns the running/skipped/pending state.

Contributor:

should we update the javadoc to call out how this method now offers special handling for incremental compaction that includes upgrading already compacted segments?

@kfaraz (Contributor) left a comment:

Leaving a partial review for the supporting changes.
Will review the core changes in CompactionTask and IndexerSQLMetadataStorageCoordinator shortly.

case FULL_COMPACTION:
clientCompactionIntervalSpec = new ClientCompactionIntervalSpec(entry.getCompactionInterval(), null, null);
break;
case INCREMENTAL_COMPACTION:
@kfaraz (Contributor) commented Feb 27, 2026:

As discussed earlier in the old PR, we should avoid supporting incremental compaction in the Coordinator-based CompactSegments duty to drive adoption of the Overlord-based compaction supervisors.
This will allow us to deprecate and eventually remove the Coordinator-based compaction duty.

Contributor (Author):

This method is used by both the Coordinator and the Overlord; the Coordinator would always use the ALL_SEGMENTS compaction mode (line 266).

Comment on lines 106 to 107
finalCandidate = CompactionCandidate.from(candidate.getUncompactedSegments(), null)
.withCurrentStatus(candidate.getCurrentStatus());
Contributor:

Why is this needed?

@cecemei cecemei changed the title add incremental compaction mode 2.2 add minor compaction for Overlord-based compaction supervisors using the MSQ engine Feb 28, 2026

Labels

Area - Batch Ingestion, Area - Ingestion, Area - MSQ (for multi stage queries - https://github.com/apache/druid/issues/12262), Release Notes