@ethan-tyler ethan-tyler commented Jan 6, 2026

Which issue does this PR close?

Validation for DataFusion 52 release (apache/datafusion#18566). No iceberg-rust issue.

What changes are included in this PR?

  • Update DataFusion from 51.0 to 52.0 (branch-52 release branch)
  • Update Arrow/Parquet from 57.0 to 57.1
  • Python bindings: Update FFI_TableProvider::new call for DataFusion 52 API changes

Breaking API Change in DataFusion FFI

DataFusion 52 changed FFI_TableProvider::new to require two additional arguments:

// Old (3 args) - DataFusion 51
FFI_TableProvider::new(provider, false, Some(runtime()))

// New (5 args) - DataFusion 52
FFI_TableProvider::new(provider, false, Some(runtime()), task_ctx_provider, None)

The new arguments:

  • task_ctx_provider: Provides the task execution context needed to serialize filter expressions across the FFI boundary
  • logical_codec: Optional codec for serializing logical expressions (None uses default)
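To make the shape of the new call concrete, here is a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: `TaskCtxSource`, `ExprCodec`, `DefaultCodec`, `Session`, and `FfiProviderSketch` are illustrative names, not types from datafusion-ffi, and the runtime handle argument is left out for brevity. It only models the two ideas described above: the constructor now takes a shared task context source, and passing `None` for the codec falls back to a default.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a source of task execution context.
trait TaskCtxSource {
    fn session_id(&self) -> String;
}

/// Hypothetical stand-in for a codec that serializes logical expressions.
trait ExprCodec {
    fn name(&self) -> &'static str;
}

/// Codec used when the caller passes `None`.
struct DefaultCodec;

impl ExprCodec for DefaultCodec {
    fn name(&self) -> &'static str {
        "default"
    }
}

/// Hypothetical session object that can hand out task context.
struct Session {
    id: String,
}

impl TaskCtxSource for Session {
    fn session_id(&self) -> String {
        self.id.clone()
    }
}

/// Illustrative wrapper whose constructor mirrors the two added arguments.
struct FfiProviderSketch {
    pushdown_filters: bool,
    task_ctx: Arc<dyn TaskCtxSource>,
    codec: Arc<dyn ExprCodec>,
}

impl FfiProviderSketch {
    fn new(
        pushdown_filters: bool,
        task_ctx: Arc<dyn TaskCtxSource>,
        codec: Option<Arc<dyn ExprCodec>>,
    ) -> Self {
        // `None` selects the default codec, matching the behaviour described above.
        let codec: Arc<dyn ExprCodec> = match codec {
            Some(c) => c,
            None => Arc::new(DefaultCodec),
        };
        Self {
            pushdown_filters,
            task_ctx,
            codec,
        }
    }

    fn codec_name(&self) -> &'static str {
        self.codec.name()
    }

    fn session_id(&self) -> String {
        self.task_ctx.session_id()
    }
}
```

The context source is passed as a shared `Arc` trait object in this sketch so the session that created it can keep using it after the provider is built.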

Known Limitations

Python DataFusion Table Provider Tests Skipped

The test_datafusion_table_provider.py tests are skipped in this PR due to an FFI version mismatch:

| Component | Version |
|---|---|
| Rust datafusion-ffi | 52 (branch-52) |
| Python datafusion (PyPI) | 50.x |

The DataFusion FFI ABI is not stable across major versions. The Python bindings for DataFusion 52 are not yet released on PyPI. These tests will be re-enabled when:

  1. DataFusion 52 Python bindings are released, AND
  2. pyproject.toml is updated to use datafusion==52.*
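The re-enable condition amounts to a major-version match between the two sides of the FFI boundary. The sketch below illustrates that rule. The helper names `major` and `ffi_compatible` are hypothetical and are not part of datafusion-ffi or this PR, and the "same major version means compatible" rule is an assumption taken from the statement above rather than a documented guarantee.

```rust
/// Extracts the leading major component from a version string such as "52.0.0".
/// Returns `None` when the first dot-separated segment is not a number.
fn major(version: &str) -> Option<u64> {
    version.trim().split('.').next()?.parse().ok()
}

/// Returns true only when both versions parse and share the same major number.
/// Assumption: the FFI ABI is treated as compatible within one major version.
fn ffi_compatible(rust_ffi: &str, python_pkg: &str) -> bool {
    match (major(rust_ffi), major(python_pkg)) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    }
}
```

With the versions in the table above, `ffi_compatible("52.0.0", "50.1.0")` is false, which is why the table provider tests are skipped for now.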

Are these changes tested?

Locally verified:

  • iceberg-datafusion: 58 tests pass
  • Python bindings: clippy passes

CI will run the full test suite.

@ethan-tyler
Author

The audit failure (RUSTSEC-2026-0001 for rkyv) is unrelated to this PR - it's being addressed in #1994. Will rebase once that lands.

@ethan-tyler
Author

Fix for Python Bindings CI Failure

The initial PR failed the Bindings Python CI workflow because of a breaking API change in the DataFusion 52 FFI module.

Root cause: FFI_TableProvider::new signature changed from 3 to 5 arguments.

Fix (commit 33d5608):

  • Added datafusion and datafusion-execution dependencies to bindings/python/Cargo.toml
  • Updated datafusion_table_provider.rs to create a TaskContextProvider from SessionContext and pass it to FFI_TableProvider::new

The core iceberg-rust crates were already compatible with DataFusion 52 - only the Python bindings needed this update.

- Update DataFusion from 51.0 to 52.0 (pre-release ref 9a9ff8d)
- Update Arrow/Parquet from 57.0 to 57.1

DataFusion 52 requires two additional arguments to FFI_TableProvider::new:
- task_ctx_provider: provides task execution context for filter serialization
- logical_codec: optional codec for serializing logical expressions (None uses default)

Added datafusion and datafusion-execution dependencies to create the required
TaskContextProvider from a SessionContext.

DataFusion 52 Rust FFI is incompatible with datafusion-python 50.x from PyPI.
These tests will be re-enabled when DataFusion 52 Python bindings are released.

@ethan-tyler ethan-tyler force-pushed the chore/datafusion-52-validation branch from 6a154d0 to f4cf8da Compare January 7, 2026 16:42

Switch from pinned git rev (9a9ff8d) to branch-52 for the official
DataFusion 52 release preparation.