Copilot AI commented Dec 11, 2025

Summary

Adds quickstart documentation for the Microsoft.Extensions.DataIngestion library, demonstrating a complete ETL pipeline for RAG scenarios.

Contributes to #50534

Changes

Documentation

  • New quickstart: docs/ai/quickstarts/process-data.md
    • Document reading with MarkdownReader
    • AI-powered enrichment (image alt-text, summaries)
    • Semantic chunking with embedding-based similarity
    • Vector storage using SQLite
    • Interactive semantic search
  • Dual-platform support via zone pivots (OpenAI/Azure OpenAI)
  • Added to "Chat with your data (RAG)" section in TOC
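
For reviewers unfamiliar with the "semantic chunking with embedding-based similarity" item, here is a minimal sketch of the general idea: start a new chunk whenever two adjacent sentences are less similar than a threshold. It is not the library's implementation and does not call Microsoft.Extensions.DataIngestion. The names `SemanticChunkingSketch`, `ToyEmbed`, `CosineSimilarity`, and `Chunk`, and the threshold parameter, are all hypothetical. `ToyEmbed` is a hashed bag-of-words stand-in for a real embedding model.

```csharp
using System;
using System.Collections.Generic;

public static class SemanticChunkingSketch
{
    // Hypothetical stand-in for an embedding model: hashes each word into a
    // fixed-size bag-of-words vector. A real pipeline would call an embedding service.
    public static float[] ToyEmbed(string text, int dimensions = 64)
    {
        var vector = new float[dimensions];
        var words = text.ToLowerInvariant()
            .Split(new[] { ' ', ',', '.', ';', ':', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            // Deterministic hash so the same text always maps to the same vector.
            int hash = 17;
            foreach (char c in word) hash = unchecked(hash * 31 + c);
            vector[(int)(unchecked((uint)hash) % (uint)dimensions)] += 1f;
        }
        return vector;
    }

    // Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Groups consecutive sentences, starting a new chunk whenever the similarity
    // between a sentence and the one before it falls below the threshold.
    public static List<string> Chunk(
        IReadOnlyList<string> sentences, double threshold, Func<string, float[]> embed)
    {
        var chunks = new List<string>();
        if (sentences.Count == 0) return chunks;

        var current = new List<string> { sentences[0] };
        var previous = embed(sentences[0]);
        for (int i = 1; i < sentences.Count; i++)
        {
            var next = embed(sentences[i]);
            if (CosineSimilarity(previous, next) < threshold)
            {
                chunks.Add(string.Join(" ", current));
                current = new List<string>();
            }
            current.Add(sentences[i]);
            previous = next;
        }
        chunks.Add(string.Join(" ", current));
        return chunks;
    }
}
```

To try it with the toy embedding, call `SemanticChunkingSketch.Chunk(sentences, 0.5, s => SemanticChunkingSketch.ToyEmbed(s))`. The `embed` parameter is a delegate so that a real embedding generator can be swapped in without changing the grouping logic. Production chunkers typically also enforce a token limit per chunk, which this sketch omits.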

Code Snippets

  • Complete compilable C# projects for both platforms
  • Demonstrates pipeline composition: reader → enricher → chunker → writer
  • Includes sample data and region markers for doc references

// Compose data ingestion pipeline
using IngestionPipeline<string> pipeline = new(reader, chunker, writer, loggerFactory)
{
    DocumentProcessors = { imageAlternativeTextEnricher },
    ChunkProcessors = { summaryEnricher }
};

await foreach (var result in pipeline.ProcessAsync(new DirectoryInfo("./data"), searchPattern: "*.md"))
{
    Console.WriteLine($"Completed processing '{result.DocumentId}'. Succeeded: '{result.Succeeded}'.");
}
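
The snippet above relies on objects (`reader`, `chunker`, `writer`, and the two enrichers) that are constructed elsewhere in the quickstart. To make the reader → enricher → chunker → writer flow concrete without depending on the preview package, here is a self-contained sketch of the same shape. Everything in it is hypothetical: `SketchPipeline`, `SketchResult`, and the delegate signatures are illustrative stand-ins rather than the `IngestionPipeline<string>` API. The choice to report a failed document and continue with the next one is also an assumption, not confirmed library behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical per-document result; not the library's result type.
public sealed record SketchResult(string DocumentId, bool Succeeded, int ChunkCount);

public sealed class SketchPipeline
{
    private readonly Func<string, Task<string>> _read;                  // reader: id -> document text
    private readonly Func<string, IEnumerable<string>> _chunk;          // chunker: text -> chunks
    private readonly Func<string, IReadOnlyList<string>, Task> _write;  // writer: persists the chunks

    // Optional enrichment steps, run in order on the whole document and on each chunk.
    public List<Func<string, string>> DocumentProcessors { get; } = new();
    public List<Func<string, string>> ChunkProcessors { get; } = new();

    public SketchPipeline(
        Func<string, Task<string>> read,
        Func<string, IEnumerable<string>> chunk,
        Func<string, IReadOnlyList<string>, Task> write)
    {
        _read = read;
        _chunk = chunk;
        _write = write;
    }

    // Streams one result per document, so callers can consume it with 'await foreach'.
    public async IAsyncEnumerable<SketchResult> ProcessAsync(IEnumerable<string> documentIds)
    {
        foreach (var id in documentIds)
        {
            SketchResult result;
            try
            {
                var text = await _read(id);
                foreach (var processor in DocumentProcessors) text = processor(text);

                var chunks = new List<string>();
                foreach (var rawChunk in _chunk(text))
                {
                    var chunk = rawChunk;
                    foreach (var processor in ChunkProcessors) chunk = processor(chunk);
                    chunks.Add(chunk);
                }

                await _write(id, chunks);
                result = new SketchResult(id, true, chunks.Count);
            }
            catch (Exception)
            {
                // Assumption: a failing document is reported and processing continues.
                result = new SketchResult(id, false, 0);
            }
            yield return result; // kept outside the try/catch, as C# iterators require
        }
    }
}
```

Because the processor lists are get-only collection properties, the sketch supports the same object-initializer style as the snippet, for example `new SketchPipeline(read, chunk, write) { ChunkProcessors = { c => c.Trim() } }`.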

This PR is based on the sample at https://github.com/luisquintanilla/DataIngestion and the blog announcement at https://devblogs.microsoft.com/dotnet/introducing-data-ingestion-building-blocks-preview/

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • devblogs.microsoft.com
    • Triggering command: /usr/bin/curl curl -s REDACTED (dns block)
    • Triggering command: /usr/bin/wget wget -q -O /tmp/blog.html REDACTED (dns block)

Original prompt

Add a quickstart or tutorial for processing custom data based on the data ingestion sample described in https://devblogs.microsoft.com/dotnet/introducing-data-ingestion-building-blocks-preview/. It should live under the "Chat with your data" section of the AI TOC and be titled "Process data" or something similar. This task contributes to #50534.



Internal previews

| 📄 File | 🔗 Preview link |
| --- | --- |
| docs/ai/quickstarts/process-data.md | Process custom data for AI applications |
| docs/ai/quickstarts/snippets/process-data/data/sample.md | docs/ai/quickstarts/snippets/process-data/data/sample |
| docs/ai/quickstarts/structured-output.md | Request a response with structured output |
| docs/ai/quickstarts/text-to-image.md | Quickstart - Generate images from text using AI |
| docs/ai/toc.yml | docs/ai/toc |

Co-authored-by: gewarren <24882762+gewarren@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add quickstart tutorial for processing custom data" to "Add data ingestion quickstart for processing custom data" on Dec 11, 2025
Copilot AI requested a review from gewarren December 11, 2025 23:42