add minor compaction for Overlord-based compaction supervisors using the MSQ engine #19059
cecemei wants to merge 13 commits into apache:master
final DataSegment segment = new DataSegment(
    "foo",
    Intervals.of("2023-01-0" + i + "/2023-01-0" + (i + 1)),
    "2023-01-0" + i,
    ImmutableMap.of("path", "a-" + i),
    ImmutableList.of("dim1"),
    ImmutableList.of("m1"),
    new DimensionRangeShardSpec(List.of("dim1"), null, null, i - 1, 8),
    9,
    100
);
Check notice: Code scanning / CodeQL: Deprecated method or constructor invocation (note, test)
capistrant left a comment
This is awesome. Left some minor comments.
server/src/main/java/org/apache/druid/server/compaction/CompactionMode.java
server/src/main/java/org/apache/druid/server/compaction/MostFragmentedIntervalFirstPolicy.java
server/src/main/java/org/apache/druid/server/compaction/CompactionCandidate.java
@@ -282,18 +283,20 @@ private boolean startJobIfPendingAndReady(
    }

    // Check if the job is already running, completed or skipped
nit: this comment might need a refresh to reflect how the status handling has changed versus the former code. I was confused about why PENDING and COMPLETE resulted in a throw until I reviewed deriveCompactionStatus.
There's no logic change here; I just made the code more defensive and updated the comment. IMHO, CompactionStatus is not the best class to use here.
Actually, I guess there is some logic change: the policy check has moved to CompactionConfigBasedJobTemplate, since creating the job requires knowing the compaction mode (which the policy decides). Still, computeCompactionStatus only returns a running/skipped/pending state.
Should we update the javadoc to call out that this method now offers special handling for incremental compaction, including upgrading already-compacted segments?
...ing-service/src/main/java/org/apache/druid/indexing/common/actions/SegmentUpgradeAction.java
...e/src/test/java/org/apache/druid/indexing/common/actions/MarkSegmentToUpgradeActionTest.java
processing/src/main/java/org/apache/druid/timeline/partition/LinearShardSpec.java
server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java
kfaraz left a comment
Leaving a partial review for the supporting changes.
Will review the core changes in CompactionTask and IndexerSQLMetadataStorageCoordinator shortly.
server/src/main/java/org/apache/druid/server/compaction/CompactionMode.java
...-tests/src/test/java/org/apache/druid/testing/embedded/compact/CompactionSupervisorTest.java
...-tests/src/test/java/org/apache/druid/testing/embedded/compact/CompactionSupervisorTest.java
...-tests/src/test/java/org/apache/druid/testing/embedded/compact/CompactionSupervisorTest.java
...-tests/src/test/java/org/apache/druid/testing/embedded/compact/CompactionSupervisorTest.java
server/src/main/java/org/apache/druid/server/compaction/MostFragmentedIntervalFirstPolicy.java
server/src/main/java/org/apache/druid/server/compaction/MostFragmentedIntervalFirstPolicy.java
case FULL_COMPACTION:
  clientCompactionIntervalSpec = new ClientCompactionIntervalSpec(entry.getCompactionInterval(), null, null);
  break;
case INCREMENTAL_COMPACTION:
As discussed earlier in the old PR, we should avoid supporting incremental compaction in the Coordinator-based CompactSegments duty to drive adoption of the Overlord-based compaction supervisors.
This will allow us to deprecate and eventually remove Coordinator-based compaction duty.
This method is used by both the Coordinator and the Overlord; the Coordinator always uses the ALL_SEGMENTS compaction mode (line 266).
...ervice/src/main/java/org/apache/druid/indexing/compact/CompactionConfigBasedJobTemplate.java
finalCandidate = CompactionCandidate.from(candidate.getUncompactedSegments(), null)
                                    .withCurrentStatus(candidate.getCurrentStatus());
Description
This PR adds minor compaction support to the MSQ engine in Apache Druid. Minor compaction compacts only the uncompacted segments within an interval and upgrades the already-compacted segments for consistency.
Key Changes
A new CompactionInputSpec implementation specifically for minor compaction:
SegmentUpgradeAction
A new task action that updates segment metadata without rewriting data, modifying partition numbers and shard specifications of already-compacted segments.
ShardSpec Interface Enhancements
Extended with mutation methods including withPartitionNum(), withCorePartitions(), and isNumChunkSupported() checks.
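The idea behind withPartitionNum() and withCorePartitions() can be sketched as follows. This is a minimal illustration, not the PR's code: SimpleShardSpec and renumber are hypothetical names, the methods are assumed to return modified copies rather than mutate in place, and the renumbering scheme (consecutive partition numbers from a given start) is an assumption made only for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these are NOT Druid's ShardSpec classes.
final class ShardSpecSketch {

    /** Simplified, immutable stand-in for a shard spec. */
    static final class SimpleShardSpec {
        final int partitionNum;
        final int corePartitions;

        SimpleShardSpec(int partitionNum, int corePartitions) {
            this.partitionNum = partitionNum;
            this.corePartitions = corePartitions;
        }

        // Assumption: the "with" methods return a copy and leave the original untouched.
        SimpleShardSpec withPartitionNum(int newPartitionNum) {
            return new SimpleShardSpec(newPartitionNum, corePartitions);
        }

        SimpleShardSpec withCorePartitions(int newCorePartitions) {
            return new SimpleShardSpec(partitionNum, newCorePartitions);
        }
    }

    /**
     * Assigns consecutive partition numbers starting at firstPartitionNum and a
     * shared core partition count, touching only metadata (assumed scheme).
     */
    static List<SimpleShardSpec> renumber(List<SimpleShardSpec> specs, int firstPartitionNum, int corePartitions) {
        List<SimpleShardSpec> upgraded = new ArrayList<>();
        int next = firstPartitionNum;
        for (SimpleShardSpec spec : specs) {
            upgraded.add(spec.withPartitionNum(next++).withCorePartitions(corePartitions));
        }
        return upgraded;
    }
}
```

Returning copies keeps the original specs usable while the upgraded ones are being committed; whether the real implementations behave this way is not stated in the description.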
Policy-Level Control
Introduced a minUncompactedBytesPercentForFullCompaction threshold in MostFragmentedIntervalFirstPolicy, which determines whether incremental or full compaction applies based on the ratio of uncompacted bytes to total segment bytes.
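A rough sketch of how such a threshold could drive the choice of mode is shown below. The class and method names are hypothetical, and the direction of the comparison (full compaction once the uncompacted share reaches the threshold) is an assumption inferred from the parameter name, not taken from the PR's code.

```java
// Hypothetical sketch; this is NOT the MostFragmentedIntervalFirstPolicy implementation.
final class CompactionModeSketch {

    // Mode names as they appear in the diff snippet in this review.
    enum Mode { FULL_COMPACTION, INCREMENTAL_COMPACTION }

    /**
     * Chooses a mode from the share of uncompacted bytes in an interval.
     * Assumption: a share at or above the threshold triggers full compaction.
     */
    static Mode chooseMode(long uncompactedBytes, long totalBytes, int minUncompactedBytesPercentForFullCompaction) {
        if (totalBytes <= 0 || uncompactedBytes < 0 || uncompactedBytes > totalBytes) {
            throw new IllegalArgumentException("invalid byte counts");
        }
        double uncompactedPercent = 100.0 * uncompactedBytes / totalBytes;
        return uncompactedPercent >= minUncompactedBytesPercentForFullCompaction
               ? Mode.FULL_COMPACTION
               : Mode.INCREMENTAL_COMPACTION;
    }

    public static void main(String[] args) {
        System.out.println(chooseMode(100, 1000, 50)); // 10% uncompacted: INCREMENTAL_COMPACTION
        System.out.println(chooseMode(600, 1000, 50)); // 60% uncompacted: FULL_COMPACTION
    }
}
```

Under this reading, an interval that is mostly compacted already gets only its new segments compacted, while a mostly uncompacted interval is rewritten in full.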
Builder Pattern Implementation
Replaced verbose constructors in UserCompactionTaskQueryTuningConfig with a builder pattern for improved readability.
Release Notes
This feature enables compacting newly ingested segments while preserving already-compacted data, with configuration through ratio thresholds in compaction policies. Support is currently limited to Overlord-based compaction supervisors using the MSQ engine.
This PR has: