repro and disable dyn filter for preserve file partitions #20175
// `preserve_file_partitions` can report Hash partitioning for Hive-style
// file groups, but those partitions are not actually hash-distributed.
// Partitioned dynamic filters rely on hash routing, so disable them in
// this mode to avoid incorrect results.
if config.optimizer.preserve_file_partitions > 0
    && self.mode == PartitionMode::Partitioned
{
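To make the mismatch described in that comment concrete, here is a minimal, self-contained sketch. It is a simplified model and not DataFusion's actual implementation: `toy_hash`, `build_filters`, `passes_hash_routed`, and `surviving_keys` are hypothetical names, and the identity hash is an assumption chosen only to keep the output predictable. Each build-side partition contributes one filter (the set of its join keys), and a probe row is checked against the filter selected by `hash(key) % n`.

```rust
use std::collections::HashSet;

/// Toy stand-in for the routing hash (assumption: NOT DataFusion's real hash).
fn toy_hash(key: u64) -> u64 {
    key
}

/// Build one filter per build-side partition: the set of join keys it holds.
fn build_filters(partitions: &[Vec<u64>]) -> Vec<HashSet<u64>> {
    partitions
        .iter()
        .map(|p| p.iter().copied().collect())
        .collect()
}

/// Hash-routed check: choose the filter by hash(key) % n, then test membership.
fn passes_hash_routed(filters: &[HashSet<u64>], key: u64) -> bool {
    let idx = (toy_hash(key) % filters.len() as u64) as usize;
    filters[idx].contains(&key)
}

/// Probe keys that survive the hash-routed dynamic filter.
fn surviving_keys(build_partitions: &[Vec<u64>], probe: &[u64]) -> Vec<u64> {
    let filters = build_filters(build_partitions);
    probe
        .iter()
        .copied()
        .filter(|k| passes_hash_routed(&filters, *k))
        .collect()
}

fn main() {
    let probe: [u64; 4] = [0, 1, 2, 3];

    // Build side that really is hash-distributed: key k lives in partition k % 2.
    let hashed: Vec<Vec<u64>> = vec![vec![0, 2], vec![1, 3]];
    // Hive-style file groups: keys grouped by value, merely *reported* as hash.
    let hive: Vec<Vec<u64>> = vec![vec![0, 1], vec![2, 3]];

    // Every probe key has a match on the build side, so all four should survive.
    println!("hash-distributed: {:?}", surviving_keys(&hashed, &probe));
    println!("hive-style:       {:?}", surviving_keys(&hive, &probe));
}
```

In this model the hash-distributed build side keeps all four probe keys, while the value-grouped build side keeps only `0` and `3`: keys `1` and `2` are routed to a filter that was built from a different file group, so they are dropped even though they have matches.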
I imagine we'd want this at some point. Is there any issue we can link here?
I can make an issue today and will link
I will assume the comment, can't hurt
👍 Maybe a way to solve this would be to allow specifying a different kind of routing, not necessarily hash. I'm not sure if this would imply defining a new kind of partitioning, but if users can specify this, should partitioning be a trait? 🤔
NGA-TRAN left a comment
This is a good PR to prevent incorrect results.
In the near future, we want to support dynamic filtering for this use case, too.
This is a really great idea and something that @fmonjalet brought up. I think that partitioning being a trait would be very valuable. I have mentioned follow-up work in #20195 to explore this.
Which issue does this PR close?
Rationale for this change
Dynamic filter pushdown can produce incorrect results when preserve_file_partitions is enabled and a partitioned hash join is used. The file groups are Hive-partitioned (value-based) but reported as hash-partitioned, so hash-routed dynamic filters can drop valid rows.
What changes are included in this PR?
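As a rough, standalone sketch of the condition this PR adds (not the actual code in `exec.rs`): the `PartitionMode` enum below is a local stand-in for DataFusion's type, and `dynamic_filter_allowed` is a hypothetical helper introduced only for illustration. It models just the new check and ignores any other conditions that control dynamic filter pushdown.

```rust
/// Local stand-in for DataFusion's join partition mode (assumption: simplified copy).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PartitionMode {
    Partitioned,
    CollectLeft,
    Auto,
}

/// Hypothetical helper: false only when file partitions are preserved
/// (threshold > 0) AND the join runs in Partitioned mode.
fn dynamic_filter_allowed(preserve_file_partitions: usize, mode: PartitionMode) -> bool {
    !(preserve_file_partitions > 0 && mode == PartitionMode::Partitioned)
}

fn main() {
    let modes = [
        PartitionMode::Partitioned,
        PartitionMode::CollectLeft,
        PartitionMode::Auto,
    ];
    for mode in modes {
        for threshold in [0usize, 1] {
            println!(
                "mode={:?} preserve_file_partitions={} allowed={}",
                mode,
                threshold,
                dynamic_filter_allowed(threshold, mode)
            );
        }
    }
}
```

Only the combination of a non-zero `preserve_file_partitions` and `PartitionMode::Partitioned` turns the filter off in this sketch; the other modes are unaffected.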
Disable join dynamic filter pushdown for PartitionMode::Partitioned when preserve_file_partitions > 0.
Are these changes tested?
cargo test --test sqllogictests -- preserve_file_partitioning -> will pass
git checkout main datafusion/physical-plan/src/joins/hash_join/exec.rs
cargo test --test sqllogictests -- preserve_file_partitioning -> will fail
Are there any user-facing changes?
Join dynamic filter pushdown is disabled when preserve_file_partitions is enabled and the join is partitioned.
cc: @NGA-TRAN @gabotechs