[Refactor][Tracking] Unified external table metadata cache framework #60686

@suxiaogang223

Background

The external table metadata caching layer has grown organically across engines (Hive, Iceberg, Paimon, Hudi, MaxCompute, etc.). Each engine has its own cache design, with different structures, invalidation strategies, TTL semantics, and configuration formats. This makes it difficult to reason about cache behavior, keep that behavior consistent across engines, and add new engines.

Goal

Build a unified metadata cache framework for all external table engines, providing:

  • Common abstractions: a single set of interfaces for cache modules, engine adapters, and cache specifications that all engines implement
  • Per-catalog isolation: each catalog owns independent cache instances with independent configuration
  • Consistent configuration: unified property format and TTL/capacity semantics across all engines
  • Unified invalidation: catalog / database / table level invalidation through a common path
  • Unified monitoring: cache stats and metrics accessible through a single entry point
  • Lazy loading: defer expensive metadata loading (snapshots, schemas, file lists) until actually needed
  • Schema convergence: embed schema into table/snapshot cache where possible, reducing dependency on standalone schema cache
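
To make the goals above concrete, here is a minimal Java sketch of how a per-catalog cache module with lazy loading, a TTL, hit/miss stats, and catalog / database / table invalidation could fit together. Every name in it (`MetaCacheModule`, `CatalogCacheRegistry`, the `(db, table)` loader) is hypothetical and chosen for illustration only; it is not the framework's actual API. Capacity limits and single-flight loading are deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiFunction;
import java.util.function.LongSupplier;

/** Hypothetical cache module owned by one catalog; V is the cached metadata type. */
final class MetaCacheModule<V> {
    private static final class Entry<T> {
        final T value;
        final long loadedAt;
        Entry(T value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final long ttlMillis;
    // Stands in for an engine adapter: (db, table) -> metadata. Assumed shape.
    private final BiFunction<String, String, V> loader;
    private final LongSupplier clock;
    // db -> table -> entry, so database-level invalidation is a single remove.
    private final Map<String, Map<String, Entry<V>>> byDb = new ConcurrentHashMap<>();
    final AtomicLong hits = new AtomicLong();
    final AtomicLong misses = new AtomicLong();

    MetaCacheModule(long ttlMillis, BiFunction<String, String, V> loader, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
        this.clock = clock;
    }

    /** Lazy loading: the loader runs only on a miss or after the TTL has elapsed. */
    V get(String db, String table) {
        Map<String, Entry<V>> tables = byDb.computeIfAbsent(db, k -> new ConcurrentHashMap<>());
        long now = clock.getAsLong();
        Entry<V> e = tables.get(table);
        if (e != null && now - e.loadedAt < ttlMillis) {
            hits.incrementAndGet();
            return e.value;
        }
        misses.incrementAndGet();
        // Note: concurrent callers may both load here; a real cache would deduplicate.
        V value = loader.apply(db, table);
        tables.put(table, new Entry<>(value, now));
        return value;
    }

    void invalidateTable(String db, String table) {
        Map<String, Entry<V>> tables = byDb.get(db);
        if (tables != null) {
            tables.remove(table);
        }
    }

    void invalidateDb(String db) {
        byDb.remove(db);
    }

    void invalidateAll() {
        byDb.clear();
    }
}

/** Hypothetical registry: each catalog owns an independent module instance. */
final class CatalogCacheRegistry<V> {
    private final Map<String, MetaCacheModule<V>> byCatalog = new ConcurrentHashMap<>();

    void register(String catalog, MetaCacheModule<V> module) {
        byCatalog.put(catalog, module);
    }

    MetaCacheModule<V> of(String catalog) {
        return byCatalog.get(catalog);
    }

    /** Catalog-level invalidation touches only that catalog's module. */
    void invalidateCatalog(String catalog) {
        MetaCacheModule<V> module = byCatalog.get(catalog);
        if (module != null) {
            module.invalidateAll();
        }
    }
}
```

In this sketch the three invalidation levels share one data structure, and the `hits` / `misses` counters give a single place to read stats per catalog. Passing the clock in as a `LongSupplier` is only a convenience for testing TTL expiry deterministically.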

Related PRs
