Skip to content

fix(glue): Support create_table for S3 Tables federated databases#3058

Open
jamesbornholt wants to merge 4 commits intoapache:mainfrom
jamesbornholt:fix-glue-create-table-s3tables
Open

fix(glue): Support create_table for S3 Tables federated databases#3058
jamesbornholt wants to merge 4 commits intoapache:mainfrom
jamesbornholt:fix-glue-create-table-s3tables

Conversation

@jamesbornholt
Copy link

Rationale for this change

GlueCatalog.create_table() fails for S3 Tables because it tries to write Iceberg metadata to a warehouse-derived location before registering the table in Glue. S3 Tables manages storage internally, so the location is not known until the table exists in the service.

Detect S3 Tables federated databases (FederatedDatabase.ConnectionType == "aws:s3tables") and use a two-phase create: first create a minimal table entry in Glue to have S3 Tables allocate storage, then write Iceberg metadata to the managed location and update the Glue table with the metadata pointer. On failure, clean up the table entry.

This follows the same "pre-create then update" pattern used by the Java GlueCatalog when LakeFormation is enabled
(GlueTableOperations.createGlueTempTableIfNecessary).

Are these changes tested?

moto doesn't mock with enough fidelity to test this directly. I've tested locally against both S3 Tables and vanilla Glue Iceberg and confirmed CreateTable still works on both paths.

Are there any user-facing changes?

No.

GlueCatalog.create_table() fails for S3 Tables because it tries to
write Iceberg metadata to a warehouse-derived location before
registering the table in Glue. S3 Tables manages storage internally,
so the location is not known until the table exists in the service.

Detect S3 Tables federated databases (FederatedDatabase.ConnectionType
== "aws:s3tables") and use a two-phase create: first create a minimal
table entry in Glue to have S3 Tables allocate storage, then write
Iceberg metadata to the managed location and update the Glue table
with the metadata pointer. On failure, clean up the table entry.

This follows the same "pre-create then update" pattern used by the
Java GlueCatalog when LakeFormation is enabled
(GlueTableOperations.createGlueTempTableIfNecessary).
@jamesbornholt jamesbornholt force-pushed the fix-glue-create-table-s3tables branch from 5633f4e to 9708909 Compare February 17, 2026 07:31
@kevinjqliu
Copy link
Contributor

thanks for the PR! do you know of a way to test this change?

Add three moto-based tests for the S3 Tables code path in GlueCatalog:
- create_table succeeds and uses the table warehouse location
- create_table rejects an explicit location for S3 Tables databases
- create_table raises TableAlreadyExistsError for duplicate tables

Moto does not support FederatedDatabase or S3 Tables storage allocation,
so the tests patch FakeDatabase.as_dict to return FederatedDatabase and
FakeTable.__init__ to inject a StorageDescriptor.Location.
@jamesbornholt jamesbornholt force-pushed the fix-glue-create-table-s3tables branch from 0c4f375 to 4f9299e Compare February 17, 2026 18:23
@jamesbornholt
Copy link
Author

I took a stab at writing some tests. It requires monkeypatching moto to emulate Glue's behavior here, which is not super elegant, but it captures the right behavior.

@kevinjqliu
Copy link
Contributor

@geruh could you take a look when you get a chance?

Copy link
Contributor

@geruh geruh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this @jamesbornholt! I left a few comments. At a high level, Looks like we can already both load and commit against s3t tables federated to glue. This would just fill the create_table gap.

Initially I was a bit hesitant about adding support for this use case as it requires 4/5 api calls and in the non s3t federated case it adds an additional call. Although I don't think there is any way around this.

WDYT?


def _create_table_s3tables(
self,
identifier: str | Identifier,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we have some redundancy with these params passed in: ident, db_name, tbl_name maybe we can derive?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, done

)
raise

return self.load_table(identifier=identifier)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could probably avoid the last load table here with the update table response

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glue UpdateTable doesn't return anything, but we can actually just construct the table from the staged metadata the same way we do it in the commit path: https://github.com/jamesbornholt/iceberg-python/blob/89a129c61b5f6320e78d7875a615b2383d590fa1/pyiceberg/catalog/glue.py#L381-L387. i've done that for both the S3 Tables and regular creation paths now.

…reate_table_s3tables signature

- Extract "aws:s3tables" string literal into GLUE_CONNECTION_S3_TABLES constant
- Remove redundant database_name/table_name params from _create_table_s3tables;
  derive them from identifier inside the method instead
Avoid a redundant GetTable + GetObject round-trip at the end of
create_table and _create_table_s3tables by constructing the Table
from the staged metadata that was just written, matching the pattern
used in commit_table.
@jamesbornholt
Copy link
Author

Thanks for raising this @jamesbornholt! I left a few comments. At a high level, Looks like we can already both load and commit against s3t tables federated to glue. This would just fill the create_table gap.

Initially I was a bit hesitant about adding support for this use case as it requires 4/5 api calls and in the non s3t federated case it adds an additional call. Although I don't think there is any way around this.

WDYT?

I was worried about this too and it's definitely unfortunate, but I also don't see a way around it, and since it's only on the creation path I think it's OK. I took the suggestion of removing the extra LoadTable request at the end of creation, so the non-federated path nets out as the same number of roundtrips now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments