
Built-in ingestion pipeline with configurable user profiles #34

Merged: jasperblues merged 2 commits into embabel:main from jmjava:issue-32 on Feb 28, 2026

jmjava (Contributor) commented Feb 21, 2026

Summary

  • IngestionRunner (a Spring Boot ApplicationRunner) ingests the configured URLs and local directories on startup when guide.reload-content-on-startup=true, then prints a structured INGESTION COMPLETE banner summarizing the results
  • Fault tolerance at every level: per-URL, per-directory, and per-document failures are collected, with their reasons, into IngestionResult and never block the remaining items
  • DataManager depends directly on ChunkingContentElementRepository from rag-core, reusing the library's existing storage abstraction instead of a custom RagStore wrapper
  • User profiles live under scripts/user-config/ (gitignored); fresh-ingest.sh wipes and re-ingests from scratch, while append-ingest.sh adds content without clearing the store
  • Local directory ingestion, with path resolution for ~/-prefixed, absolute, and relative paths
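The fault-tolerance rule above (collect each failure with its reason, keep going) can be sketched in plain Java. This is a minimal illustration, not the PR's code: IngestionResult here is a hypothetical stand-in for the PR's type of the same name, and IngestionSketch, Failure, ingestAll, and the banner layout are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class IngestionSketch {

    /** Hypothetical: one failed source plus the reason it failed. */
    record Failure(String source, String reason) {}

    /** Hypothetical stand-in for IngestionResult: what succeeded and what failed (with reasons). */
    record IngestionResult(List<String> ingested, List<Failure> failures) {

        /** Renders a summary banner; this layout is assumed, not the PR's actual output. */
        String banner() {
            StringBuilder sb = new StringBuilder();
            sb.append("=== INGESTION COMPLETE ===\n");
            sb.append("Ingested: ").append(ingested.size()).append('\n');
            sb.append("Failed:   ").append(failures.size()).append('\n');
            for (Failure f : failures) {
                sb.append("  - ").append(f.source()).append(": ").append(f.reason()).append('\n');
            }
            return sb.toString();
        }
    }

    /**
     * Ingests each source independently. A RuntimeException thrown for one source
     * is recorded as a Failure, and the loop continues with the next source.
     */
    static IngestionResult ingestAll(List<String> sources, Consumer<String> ingestOne) {
        List<String> ok = new ArrayList<>();
        List<Failure> failed = new ArrayList<>();
        for (String source : sources) {
            try {
                ingestOne.accept(source);
                ok.add(source);
            } catch (RuntimeException e) {
                failed.add(new Failure(source, e.getMessage()));
            }
        }
        return new IngestionResult(List.copyOf(ok), List.copyOf(failed));
    }

    public static void main(String[] args) {
        List<String> sources = List.of("https://example.com/a", "https://example.com/broken", "~/repos/docs");
        IngestionResult result = ingestAll(sources, s -> {
            if (s.contains("broken")) {
                throw new IllegalStateException("HTTP 404");
            }
            // A real pipeline would fetch, parse, chunk, and store the source here.
        });
        System.out.print(result.banner());
    }
}
```

The same per-item try/catch can be nested (per URL, per directory, per document inside a directory) so that a failure at any level is recorded without aborting its siblings.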

Test plan

  • 97 tests passing (unit + integration)
  • Run ./scripts/fresh-ingest.sh end-to-end and verify that the INGESTION COMPLETE banner is printed
  • Run ./scripts/append-ingest.sh with new content only
  • Verify that curl http://localhost:1337/api/v1/data/stats returns store metrics
  • Verify that MCP tools can search the ingested directory content

Closes #32

Made with Cursor

jmjava and others added 2 commits February 21, 2026 08:57
…olerance

IngestionRunner (ApplicationRunner) ingests configured URLs and local
directories on startup when guide.reload-content-on-startup=true and prints
a structured INGESTION COMPLETE banner summarizing results.

Ingestion is fault-tolerant at every level: per-URL, per-directory, and
per-document failures are collected with reasons into IngestionResult and
never block remaining items.

DataManager depends on ChunkingContentElementRepository from rag-core
instead of a custom RagStore wrapper, using the library's existing storage
abstraction. Stats use ContentElementRepositoryInfo directly.

GuideProperties gains a directories list for local repo ingestion and
robust path resolution (tilde, absolute, and relative paths).
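A minimal sketch of the three path forms (tilde, absolute, relative) using java.nio.file. The class and method names are hypothetical, and resolving relative paths against the working directory is an assumption; the PR may use a different base. The ~user form is deliberately not handled.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathResolutionSketch {

    /**
     * Resolves one configured directory entry:
     *  - "~" or "~/..." expands against the given home directory
     *  - absolute paths are used as-is
     *  - anything else is resolved against baseDir
     * The result is made absolute and normalized (removing "." and ".." segments).
     */
    static Path resolve(String raw, Path home, Path baseDir) {
        String trimmed = raw.trim();
        Path p;
        if (trimmed.equals("~")) {
            p = home;
        } else if (trimmed.startsWith("~/")) {
            p = home.resolve(trimmed.substring(2));
        } else {
            Path candidate = Paths.get(trimmed);
            p = candidate.isAbsolute() ? candidate : baseDir.resolve(candidate);
        }
        return p.toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        Path home = Paths.get(System.getProperty("user.home"));
        Path cwd = Paths.get("").toAbsolutePath();
        for (String raw : new String[] {"~/repos/guide-docs", "/opt/docs", "docs/../samples"}) {
            System.out.println(raw + " -> " + resolve(raw, home, cwd));
        }
    }
}
```

Passing home and baseDir in explicitly, rather than reading system properties inside resolve, keeps the function easy to test on any machine.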

User profiles live under scripts/user-config/ (gitignored). fresh-ingest.sh
wipes and re-ingests from scratch; append-ingest.sh adds without clearing.
Both read GUIDE_PROFILE from .env and pass the config location to Spring.
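The scripts themselves are shell; purely as an illustration, this Java sketch mirrors the described flow: read GUIDE_PROFILE from a .env file and turn it into a Spring Boot config-location argument. The per-profile directory layout (scripts/user-config/<profile>/), the use of spring.config.additional-location rather than spring.config.location, and all names here are assumptions, not details taken from the PR. The .env parsing is simplified (no export prefix, no escape sequences).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProfileConfigSketch {

    /** Parses simple KEY=VALUE lines, skipping blanks and # comments and stripping matching quotes. */
    static Map<String, String> parseDotEnv(List<String> lines) {
        Map<String, String> env = new HashMap<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue;
            }
            int eq = t.indexOf('=');
            if (eq <= 0) {
                continue; // not a KEY=VALUE line
            }
            String key = t.substring(0, eq).trim();
            String value = t.substring(eq + 1).trim();
            boolean doubleQuoted = value.startsWith("\"") && value.endsWith("\"");
            boolean singleQuoted = value.startsWith("'") && value.endsWith("'");
            if (value.length() >= 2 && (doubleQuoted || singleQuoted)) {
                value = value.substring(1, value.length() - 1);
            }
            env.put(key, value);
        }
        return env;
    }

    /**
     * Builds a Spring Boot argument pointing at a (hypothetical) per-profile config directory.
     * Spring Boot expects directory locations to end with "/".
     */
    static String configLocationArg(String profile) {
        return "--spring.config.additional-location=file:./scripts/user-config/" + profile + "/";
    }

    public static void main(String[] args) {
        List<String> dotEnv = List.of("# local settings", "GUIDE_PROFILE=\"my-profile\"", "");
        String profile = parseDotEnv(dotEnv).get("GUIDE_PROFILE");
        System.out.println(configLocationArg(profile));
    }
}
```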

Includes .env.example, INGESTION-TESTING.md, and 97 passing tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
jasperblues merged commit d4306c8 into embabel:main on Feb 28, 2026
1 check passed
jasperblues (Contributor) commented:

Thanks @jmjava 🙏


Development: successfully merging this pull request may close the issue "Built-in ingestion pipeline with configurable user profiles".

2 participants