Skip to content

repro and disable dyn filter for preserve file partitions#20175

Open
gene-bordegaray wants to merge 3 commits intoapache:mainfrom
gene-bordegaray:gene.bordegaray/2026/02/disable_dynamic_filtering_on_preserve_file_partition
Open

repro and disable dyn filter for preserve file partitions#20175
gene-bordegaray wants to merge 3 commits intoapache:mainfrom
gene-bordegaray:gene.bordegaray/2026/02/disable_dynamic_filtering_on_preserve_file_partition

Conversation

@gene-bordegaray
Copy link
Contributor

@gene-bordegaray gene-bordegaray commented Feb 5, 2026

Which issue does this PR close?

Rationale for this change

Dynamic filter pushdown can produce incorrect results when preserve_file_partitions is enabled and a partitioned hash join is used. The file groups are Hive‑partitioned (value‑based) but reported as hash‑partitioned, so hash‑routed dynamic filters can drop valid rows.

What changes are included in this PR?

  • Disable join dynamic filter pushdown for PartitionMode::Partitioned when preserve_file_partitions > 0.
  • Add two sqllogictests that reproduce the issue and validate the fix.

Are these changes tested?

  • cargo test --test sqllogictests -- preserve_file_partitioning -> will pass
  • git checkout main ‎datafusion/physical-plan/src/joins/hash_join/exec.rs
  • cargo test --test sqllogictests -- preserve_file_partitioning -> will fail

Are there any user-facing changes?

Join dynamic filter pushdown is disabled when preserve_file_partitions is enabled and the join is partitioned.

cc: @NGA-TRAN @gabotechs

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) physical-plan Changes to the physical-plan crate labels Feb 5, 2026
Comment on lines 617 to 623
// `preserve_file_partitions` can report Hash partitioning for Hive-style
// file groups, but those partitions are not actually hash-distributed.
// Partitioned dynamic filters rely on hash routing, so disable them in
// this mode to avoid incorrect results.
if config.optimizer.preserve_file_partitions > 0
&& self.mode == PartitionMode::Partitioned
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine we'd want this at some point. Is there any issue we can link here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make an issue today and will link

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I link in PR or comment itself?

Here is issue: #20195

cc: @adriangb @gabotechs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will assume the comment, can't hurt

Copy link
Contributor

@LiaCastaneda LiaCastaneda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Maybe a way to solve this would be to allow specifying a different kind of routing, not necessarily hash. I'm not sure if this would imply defining a new kind of partitioning, but if users can specify this, should partitioning be a trait? 🤔

Copy link
Contributor

@NGA-TRAN NGA-TRAN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good PR to prevent incorrect results.
In the near future, we want to support dynamic filtering for this use case, too

Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gene-bordegaray
Copy link
Contributor Author

but if users can specify this, should partitioning be a trait? 🤔

this is a really great idea and something that @fmonjalet brought up. I think that partitioning being a trait would be very valuable. I have mentioned follow up work in #20195 to explore this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

incorrect dynamic filter on partitioned join when file partitioned

5 participants