fix: Use table schema rather than file schema when projecting properties#20174
fix: Use table schema rather than file schema when projecting properties#20174alamb wants to merge 3 commits intoapache:mainfrom
Conversation
| self.output_ordering.clone(), | ||
| ) | ||
| .with_constraints(self.constraints.clone()); | ||
| let orderings = project_orderings(&self.output_ordering, schema); |
There was a problem hiding this comment.
this is the fix (to call project_orderings) -- the rest is tests
There was a problem hiding this comment.
After some more thought I am not sure if this is correct 🤔 I need to do some more research into when the projection is applied
| @@ -685,11 +685,10 @@ impl DataSource for FileScanConfig { | |||
|
|
|||
| fn eq_properties(&self) -> EquivalenceProperties { | |||
There was a problem hiding this comment.
I also noticed the equivalence properties are computed on demand each time during planning, which can be quite substantial. I'll see if I can find some way to avoid doing so as a follow on
6554aac to
35209a9
Compare
| let object_store_url = ObjectStoreUrl::parse("test:///").unwrap(); | ||
|
|
||
| let table_schema = TableSchema::new(Arc::clone(&file_schema), vec![]); | ||
| let table_schema = TableSchema::from(Arc::clone(&file_schema)); |
There was a problem hiding this comment.
this is a driveby cleanup to reduce test duplication (codex actually copied a lot of this extra bloat into the new tests, so instead I went and cleaned it up at the source)
I can break it into a new PR if you prefer
| self.output_ordering.clone(), | ||
| ) | ||
| .with_constraints(self.constraints.clone()); | ||
| let orderings = project_orderings(&self.output_ordering, schema); |
There was a problem hiding this comment.
After some more thought I am not sure if this is correct 🤔 I need to do some more research into when the projection is applied
Which issue does this PR close?
Rationale for this change
This is a regression we found in our unit tests.
When upgrading to DataFusion 52 we hit a bug in our test cases where pre-sorted data was being resorted
What changes are included in this PR?
Fix the bug
Are these changes tested?
Yes
I tried for a while with codex to get a .slt test purely via SQL but was not successful. To trigger the bug the output_ordering needs to not yet be projected before eq_properties() runs (that’s the only case the project_orderings(...) fix addresses). The SQL planner seems to always build output_ordering in the base schema, so the bug doesn’t manifest.
Here is what I tried
Details
Are there any user-facing changes?