-
Notifications
You must be signed in to change notification settings - Fork 285
Open
Labels
enhancementNew feature or requestNew feature or request
Description
I opened #3446 initially to apply a bunch of changes from #3349 to CometNativeScan. The PR got away from me though, particularly due to DPP. I had it working for the static DPP rules, but the dynamic ones that come from AQE are tricky. I am opening this issue to track the things I'd like to tackle in smaller chunks rather than one giant PR:
- Per-partition serde to no longer send every
SparkFilePartitionof tasks to every partition. This simply reduces serde overhead for large scans. Addressed by feat: CometNativeScan per-partition plan serde #3511. - DPP (non-AQE) for V1 operator. These runtime filters are created by Spark's
PlanDynamicPruningFiltersand are easier for Comet support since this rule runs before Comet's rules. - DPP (AQE) for V1 operator. These runtime filters are created by Spark's
PlanAdaptiveDynamicPruningFiltersand are difficult for Comet support since this rule runs after Comet's rules. I'll summarize my learning from feat: CometNativeScan per-partition plan data, add DPP [iceberg] #3446:- Comet's rules replace things like
BroadcastHashJoinwithCometBroadcastHashJoin, whichPlanAdaptiveDynamicPruningFiltersdoes not recognize. - We can't modify Spark rules, so we could wait until after
PlanAdaptiveDynamicPruningFiltersruns. This requires registering new Comet rules after where they currently run. I tried to create a simple rule to defer justBroadcastHashJoinreplacement until later, but this became too complicated with multiple scan implementations. I think when we pare down our scan implementations, we can revisit a broader redesign of Comet rules in a way that works better with AQE. We will need this for stronger Spark 4.0 support.
- Comet's rules replace things like
- CometNativeBatchScan operator. See Add native_datafusion V2 DataSource API reader #3481. Will not support DPP, so pretty quick to implement.
- Minor refactor tasks: consider renaming these (CometNativeScan, CometNativeBatchScan) since we now have other scan types. Don't wrap them in
CometScanorCometBatchScan. This might allow us to collapse our CometScanRule and CometExecRule. Part of the larger rule refactoring discussed above.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request