Define the full design for amp-providers-static, covering provider config schema, CSV schema inference with column name sanitization, small-file in-memory caching, and lazy catalog integration into the providers registry.

- Specify three-phase implementation plan with dependency ordering
- Document provider TOML config with grouped tables and column mapping
- Define schema inference rules, header auto-detection, and sanitization
- Outline in-memory cache strategy with configurable byte threshold
- Record all resolved design decisions in verification log

Signed-off-by: Lorenzo Delgado <lorenzo@edgeandnode.com>
Did you consider using datasets instead of providers for this? Then this would benefit from the tooling for dataset discoverability.
That's a very good point, and something we should consider after the POC. At the moment, I see two main issues:
In the end, for me, the provider concept (external services that act as a data source) fits naturally in the mental model. I am advocating for a POC to enable some use cases in the short term, one that can evolve alongside the dataset authoring work happening in parallel.
Note that the schema description here is a proposal that could be included or replaced completely by the dataset authoring design (e.g., by introducing a new dataset kind).
Alright, we can try out this design then.
Just to comment on this aspect: there are tradeoffs, but it wouldn't be unreasonable to design this such that the CSV data is copied over into Amp table format.