docs(rfc): add static CSV provider specification by LNSD · Pull Request #1701 · edgeandnode/amp

LNSD · 2026-02-05T16:33:28Z

Define the full design for amp-providers-static, covering provider config schema, CSV schema inference with column name sanitization, small-file in-memory caching, and lazy catalog integration into the providers registry.

- Specify three-phase implementation plan with dependency ordering
- Document provider TOML config with grouped tables and column mapping
- Define schema inference rules, header auto-detection, and sanitization
- Outline in-memory cache strategy with configurable byte threshold
- Record all resolved design decisions in verification log
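
To make the "column name sanitization" item concrete, here is a minimal Python sketch of what such a step could look like. The rules shown (lowercasing, collapsing non-alphanumeric runs to underscores, positional fallback names, numeric suffixes for duplicates) and the function names `sanitize_column_name` and `sanitize_header` are assumptions for illustration only; the actual rules are whatever the RFC in this PR defines, and they may differ.

```python
import re


def sanitize_column_name(raw: str, index: int) -> str:
    """Turn one raw CSV header cell into an identifier-like name.

    Hypothetical rule set, not taken from the RFC.
    """
    name = raw.strip().lower()
    # Collapse every run of characters outside [a-z0-9_] into one underscore,
    # then drop leading/trailing underscores.
    name = re.sub(r"[^a-z0-9_]+", "_", name).strip("_")
    if not name:
        # Empty (or fully stripped) header cell: fall back to a positional name.
        name = f"column_{index}"
    elif name[0].isdigit():
        # Avoid names that start with a digit.
        name = f"_{name}"
    return name


def sanitize_header(header: list[str]) -> list[str]:
    """Sanitize a full header row, keeping the resulting names unique."""
    used: set[str] = set()
    result: list[str] = []
    for i, raw in enumerate(header):
        base = sanitize_column_name(raw, i)
        name, suffix = base, 1
        # Append _1, _2, ... until the name no longer collides.
        while name in used:
            name = f"{base}_{suffix}"
            suffix += 1
        used.add(name)
        result.append(name)
    return result


if __name__ == "__main__":
    print(sanitize_header(["Block Number", "tx-hash", "", "1st", "tx hash"]))
```

Resolving duplicates after sanitization matters because two distinct raw headers (here `tx-hash` and `tx hash`) can map to the same sanitized name.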

Signed-off-by: Lorenzo Delgado <lorenzo@edgeandnode.com>

leoyvens · 2026-02-05T16:54:00Z

Did you consider using datasets instead of providers for this? Then this would benefit from the tooling for dataset discoverability.

LNSD · 2026-02-05T19:35:22Z

> Did you consider using datasets instead of providers for this? Then this would benefit from the tooling for dataset discoverability.

That's a very good point, and yes, it is something we should consider after the POC.

At the moment, I see two main issues:

Coupling between datasets and materialization

Datasets require writing Parquet files to the Amp data lake.

Currently, there is no separation between extractors and datasets: the two concepts are tightly coupled. With the work in #1673, we'll be able to separate the materialized data from the dataset definition.

A "static-file" dataset: permissioned nature

The issue stems from the nature of the data access: a CSV file stored in an object store.

Datasets, as we understand them, are building blocks, distributable units. If one needs credentials to access that file (i.e., it is permissioned), that would limit the utility of that dataset.

In the end, for me, the provider concept (an external service that acts as a data source) fits this case naturally in the mental model.

I am advocating for a POC that enables some use cases in the short term and can evolve alongside the dataset authoring work happening in parallel.

LNSD · 2026-02-05T19:39:14Z

Note that the schema description here is a proposal that could be included or replaced completely by the dataset authoring design (e.g., by introducing a new dataset kind).

leoyvens · 2026-02-05T19:43:26Z

Alright, we can try out this design then.

leoyvens · 2026-02-06T15:51:18Z

> Datasets require writing Parquet files to the Amp data lake.

Just to comment on this aspect: there are tradeoffs, but it wouldn't be unreasonable to design this so that the CSV data is copied over into the Amp table format.

LNSD self-assigned this Feb 5, 2026

LNSD added the data-plane label Feb 5, 2026

LNSD changed the title ~~docs: add static CSV provider specification~~ docs(rfc): add static CSV provider specification Feb 5, 2026


LNSD wants to merge 1 commit into main from lnsd/feat-providers-static-external-table
