Open
Background
The external table metadata caching layer has grown organically across engines (Hive, Iceberg, Paimon, Hudi, MaxCompute, etc.). Each engine has its own cache design with different structures, invalidation strategies, TTL semantics, and configuration formats. This makes it difficult to reason about cache behavior, maintain consistency, and add new engines.
Goal
Build a unified metadata cache framework for all external table engines, providing:
- Common abstractions: a single set of interfaces for cache modules, engine adapters, and cache specifications that all engines implement
- Per-catalog isolation: each catalog owns independent cache instances with independent configuration
- Consistent configuration: unified property format and TTL/capacity semantics across all engines
- Unified invalidation: catalog / database / table level invalidation through a common path
- Unified monitoring: cache stats and metrics accessible through a single entry point
- Lazy loading: defer expensive metadata loading (snapshots, schemas, file lists) until actually needed
- Schema convergence: embed schema into table/snapshot cache where possible, reducing dependency on standalone schema cache
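A minimal Java sketch of how the goals above could fit together. It is not taken from any of the PRs: `MetaCacheSketch`, `EngineAdapter`, `CacheSpec`, `CatalogMetaCache`, and `MetaCacheRegistry` are hypothetical names, and the TTL handling is deliberately simplified (no capacity limit, no eviction). It only illustrates a shared adapter interface, one cache per catalog, lazy loading on a miss, three invalidation levels, and hit/miss counters.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these type or method names come from the PRs.
final class MetaCacheSketch {
    /** Engine adapter: the only engine-specific piece; it loads metadata on a miss. */
    interface EngineAdapter<V> {
        V load(String db, String table);
    }

    /** Cache specification with the same TTL meaning for every engine. */
    static final class CacheSpec {
        final long ttlMillis;
        CacheSpec(long ttlMillis) { this.ttlMillis = ttlMillis; }
    }

    /** One cache instance owned by exactly one catalog. */
    static final class CatalogMetaCache<V> {
        private static final class Entry<T> {
            final T value;
            final long loadedAtMillis;
            Entry(T value, long loadedAtMillis) {
                this.value = value;
                this.loadedAtMillis = loadedAtMillis;
            }
        }

        private final CacheSpec spec;
        private final EngineAdapter<V> adapter;
        // database name -> table name -> cached entry
        private final Map<String, Map<String, Entry<V>>> byDb = new ConcurrentHashMap<>();
        private final AtomicLong hits = new AtomicLong();
        private final AtomicLong misses = new AtomicLong();

        CatalogMetaCache(CacheSpec spec, EngineAdapter<V> adapter) {
            this.spec = spec;
            this.adapter = adapter;
        }

        /** Lazy loading: the adapter runs only on a miss or once the TTL has expired. */
        V get(String db, String table) {
            Map<String, Entry<V>> tables = byDb.computeIfAbsent(db, d -> new ConcurrentHashMap<>());
            long now = System.currentTimeMillis();
            Entry<V> e = tables.get(table);
            if (e != null && now - e.loadedAtMillis < spec.ttlMillis) {
                hits.incrementAndGet();
                return e.value;
            }
            misses.incrementAndGet();
            V value = adapter.load(db, table);
            tables.put(table, new Entry<>(value, now));
            return value;
        }

        // Table, database, and catalog level invalidation share one structure.
        void invalidateTable(String db, String table) {
            Map<String, Entry<V>> tables = byDb.get(db);
            if (tables != null) {
                tables.remove(table);
            }
        }

        void invalidateDatabase(String db) { byDb.remove(db); }

        void invalidateCatalog() { byDb.clear(); }

        long hits() { return hits.get(); }

        long misses() { return misses.get(); }
    }

    /** Single entry point that hands out one independent cache per catalog. */
    static final class MetaCacheRegistry<V> {
        private final Map<String, CatalogMetaCache<V>> caches = new ConcurrentHashMap<>();

        CatalogMetaCache<V> register(String catalog, CacheSpec spec, EngineAdapter<V> adapter) {
            CatalogMetaCache<V> cache = new CatalogMetaCache<>(spec, adapter);
            caches.put(catalog, cache);
            return cache;
        }

        CatalogMetaCache<V> forCatalog(String catalog) { return caches.get(catalog); }
    }
}
```

In this sketch an engine only supplies an `EngineAdapter`, so invalidation and stats behave the same for every catalog. A real implementation would also need capacity-based eviction and protection against concurrent loads of the same key, which are left out here.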
Related PRs
- #59056: [Feature](iceberg) Add manifest-level cache for Iceberg tables to reduce I/O and parsing overhead
- #59716: [enhance](iceberg) Refactor Iceberg metadata cache structure and add table cache test cases
- #60478: [refactor](paimon) Per-catalog Paimon metadata cache with two-level table+snapshot structure
- (in progress) Unified meta cache framework with engine adapters