handle data parts in AIChat #900

Merged
threepointone merged 7 commits into main from store-data-parts on Feb 17, 2026

Conversation

@deathbyknowledge (Member) commented Feb 13, 2026

Add support for data-* stream parts in the shared message builder. Data parts are developer-defined typed JSON blobs that the AI SDK attaches to messages alongside text — they were previously silently dropped.

What changed

packages/ai-chat/src/message-builder.ts — Handle data-* chunk types in applyChunkToParts. This is the shared parser used by both the server (_streamSSEReply) and client (stream resume / cross-tab broadcast), so data parts are handled consistently everywhere. Transient parts (transient: true) are broadcast but not persisted. Non-transient parts support reconciliation by type+id — a second chunk with the same composite key updates .data in-place. A sketch of this logic follows this list.

packages/ai-chat/src/react.tsx — Forward the onData callback for data-* chunks on the stream resume and cross-tab broadcast codepaths. The transport path (new messages from this tab) already goes through the AI SDK's internal pipeline, which invokes onData. onData is the correct way to consume transient data parts on the client, since transient parts never appear in message.parts.

packages/ai-chat/src/tests/message-builder.test.ts — 9 new tests covering append, id preservation, type+id reconciliation, non-matching ids/types, id-less parts, transient skip, and coexistence with other part types.

examples/resumable-stream-chat/ — Updated to demonstrate three data part patterns: data-sources (reconciliation), data-thinking (transient via onData), and data-usage (persisted metadata). The client uses UIMessage<unknown, { sources; thinking; usage }> for fully typed data parts — no manual casts needed.
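For orientation, a rough sketch of the data-* branch in applyChunkToParts follows. It is illustrative only, not the exact implementation; the chunk and part shapes are simplified stand-ins for the real types:

    // Simplified stand-ins for the real chunk/part types (illustrative only).
    type DataChunk = {
      type: `data-${string}`;
      id?: string;
      data: unknown;
      transient?: boolean;
    };
    type DataPart = { type: `data-${string}`; id?: string; data: unknown };

    function applyDataChunk(parts: DataPart[], chunk: DataChunk): void {
      // Transient parts are broadcast to connected clients but never persisted.
      if (chunk.transient) return;

      // Reconcile by composite key (type + id): a later chunk with the same key
      // updates the existing part's data in place instead of appending a new part.
      const existing =
        chunk.id != null
          ? parts.find((p) => p.type === chunk.type && p.id === chunk.id)
          : undefined;

      if (existing) {
        existing.data = chunk.data;
      } else {
        parts.push({
          type: chunk.type,
          ...(chunk.id != null && { id: chunk.id }),
          data: chunk.data
        });
      }
    }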

What about the items from this comment?

  1. onData callback — Addressed. Extracted from options, stored in a ref, forwarded to useChat, and explicitly called for the resume/broadcast paths.
  2. File part size validation — Removed per "don't skip persistence" feedback. PR #899 ("ai-chat: architecture cleanup, bug fixes, storage guards, new features, docs rewrite") handles oversized rows with compression at the storage layer.
  3. Typed data parts on useAgentChat — Already works. The ChatMessage generic flows from useAgentChat through useChat end-to-end. dataPartSchemas is available via the inherited ChatInit options. The example now demonstrates this — part.type === "data-sources" narrows part.data to the correct shape automatically (a minimal client-side sketch follows this list).
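A minimal client-side sketch of the typed access described in item 3. The payload shapes below are hypothetical placeholders; only the UIMessage<unknown, {...}> generic usage and the narrowing on part.type mirror the example:

    import type { UIMessage } from "ai";

    // Hypothetical payload shapes for the example's three data parts.
    type MyDataParts = {
      sources: { status: "searching" | "found"; urls?: string[] };
      thinking: { text: string };
      usage: { inputTokens: number; outputTokens: number };
    };
    type MyMessage = UIMessage<unknown, MyDataParts>;

    function getSources(message: MyMessage) {
      for (const part of message.parts) {
        // The "data-sources" discriminant narrows part.data to MyDataParts["sources"].
        if (part.type === "data-sources") return part.data;
      }
      return undefined;
    }

Transient parts (data-thinking in the example) never appear in message.parts, so they are consumed via the onData callback that useAgentChat now forwards, rather than by iterating parts.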

Closes #575

@changeset-bot commented Feb 13, 2026

🦋 Changeset detected

Latest commit: 08fa18f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:

Name                  Type
@cloudflare/ai-chat   Patch


@threepointone (Contributor)

I think I handled this in my pr, can you check?

@threepointone (Contributor)

Oh this is different

@pkg-pr-new commented Feb 13, 2026

Open in StackBlitz

npm i https://pkg.pr.new/cloudflare/agents@900

commit: 08fa18f

@deathbyknowledge (Member, Author)

Oh this is different

Yes. I think your PR is missing data parts, but you added compression for messages that are too large, whereas I added a simpler size check that skips persistence. Should I remove that?

deathbyknowledge marked this pull request as ready for review February 13, 2026 11:28
@threepointone (Contributor)

Yeah don't skip persistence

@whoiskatrin (Contributor)

/bonk review this PR

@ask-bonk commented Feb 13, 2026

Now I have a thorough understanding of the PR. Here's my review:


PR Review: Handle data parts in AIChat

Summary

This PR adds server-side persistence for data-* stream parts in AIChatAgent._streamSSEReply, which were previously silently dropped. It also adds a size guard on file parts and updates the resumable-stream-chat example to demonstrate three data part patterns.

Core SDK Changes (packages/ai-chat/src/index.ts)

The good:

  • The default case with data.type.startsWith("data-") guard is the right place to handle this — it's a catch-all that won't interfere with existing named cases.
  • The reconciliation logic (find-by-type+id, update in-place) correctly mirrors AI SDK client behavior for progressive updates.
  • Transient parts are correctly excluded from persistence while still being broadcast to clients (broadcast happens at line 1838-1848, after the switch).
  • The size guard on file parts is a sensible addition.

Issues:

  1. Skipping persistence on oversized parts contradicts maintainer feedback. @threepointone explicitly said "don't skip persistence" in the PR comments. The MAX_DATA_PART_SIZE_BYTES guard and the "skip persistence" behavior for both file parts and data parts need to be removed or reworked per that feedback. This is a blocker.

  2. Unsafe as cast on message.parts.push. At line 1813:

    message.parts.push({
      type: dataChunk.type,
      ...(dataChunk.id != null && { id: dataChunk.id }),
      data: dataChunk.data
    } as ChatMessage["parts"][number]);

    ChatMessage is UIMessage from the AI SDK, and UIMessage["parts"] has a fixed union type. A data-* part won't match any member of that union, so the as cast silences a real type mismatch. This will work at runtime (it's just an array push), but it's hiding a type system gap. Consider whether ChatMessage should be extended to include a generic data part shape, or add a comment explaining why the cast is necessary and safe.

  3. "id" in p check is fragile. At line 1800:

    const existing = message.parts.find(
      (p) =>
        p.type === dataChunk.type &&
        "id" in p &&
        (p as { id?: string }).id === dataChunk.id
    );

    The "id" in p check will also match other part types that happen to have an id field (e.g., reasoning parts have an id). Since you're already filtering by dataChunk.type (which starts with data-), and non-data parts won't share those type strings, this is safe in practice — but a comment noting that would help future readers.

  4. new TextEncoder().encode(...) on potentially large strings for size checking. This allocates a full UTF-8 byte array just to check .byteLength. For the 2MB threshold this is fine in practice, but new Blob([serialized]).size would be more memory-efficient if this ever matters. Not a blocker.
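    A quick illustration of the two size checks being compared (the part literal is made up):

    // Both measure the UTF-8 byte length of the serialized part; the Blob route
    // avoids materializing a full byte array just to read its length.
    const serialized = JSON.stringify({ type: "data-usage", data: { tokens: 123 } });
    const viaEncoder = new TextEncoder().encode(serialized).byteLength;
    const viaBlob = new Blob([serialized]).size;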

Example Changes (examples/resumable-stream-chat/)

The good:

  • Clean demonstration of all three patterns (reconciliation, transient, persisted metadata).
  • The client-side rendering with dedicated components (SourcesPart, ThinkingPart, UsagePart) is well-structured.

Issues:

  1. Data parts are written after writer.merge(result.toUIMessageStream()). In server.ts, the writer.merge() call returns immediately (it pipes asynchronously), so the data-sources "searching" write at line ~52 fires right away, likely before any text has been emitted. This means the sources card will appear above the text in the UI, which matches the client layout. But the "found" write at line ~58 happens after a 1500ms setTimeout, which is a race with the LLM stream — the sources update might arrive mid-text-stream or after it finishes depending on model speed. This is probably fine for a demo, but a comment noting the intentional race would be helpful.

  2. Hardcoded results in the example. The data-sources "found" results are always the same three strings regardless of the query. Fine for a demo, but worth a brief inline comment so users don't think real search is happening.

  3. The ThinkingPart transient part is written with transient: true but the client also guards on isStreaming. This is belt-and-suspenders — after reload the transient part won't be in persisted messages, AND the streaming check would hide it anyway. Not a bug, just redundancy worth noting.

Changeset

The changeset bumps both @cloudflare/agents-resumable-stream-chat (example) and @cloudflare/ai-chat (SDK) as patch. The SDK bump is appropriate. The example bump is unnecessary if examples aren't independently published, but it doesn't hurt.

Verdict

The main blocker is item #1 — the maintainer has asked to not skip persistence for oversized parts. That feedback needs to be addressed before merge. The approach should either be removed entirely (let SQLite fail naturally, or handle it at the storage layer with compression as @threepointone's PR does) or reworked to match whatever the agreed strategy is.

Everything else is minor and the core reconciliation/transient logic is correct.


@deathbyknowledge (Member, Author)

/bonk review again

@ask-bonk commented Feb 13, 2026

Good. Now I have a full picture. Let me write the review.


PR Review: Handle data parts in AIChat (re-review)

Previous blocker — resolved

The MAX_DATA_PART_SIZE_BYTES guard and skip-persistence behavior were removed in d8895f7, addressing @threepointone's feedback. Data parts are now always persisted (unless transient). This is correct.

Remaining issues

1. Stray whitespace change (nit)

The diff removes a blank line between CLEANUP_AGE_THRESHOLD_MS and const decoder at line 152. This is unrelated to the feature and will cause a noisy diff. Should be reverted.

2. Transient parts are still broadcast AND stored as stream chunks

Transient parts are correctly excluded from message.parts (the persisted message), but the code falls through to _storeStreamChunk at line 1806 — meaning the raw SSE chunk is stored in the replay buffer. If a client reconnects mid-stream, it will receive the transient chunk during replay, which is fine (the stream is still in-flight). But after the stream completes, the chunk stays in the replay buffer until cleanup. This is probably harmless, but worth a comment noting that transient parts are excluded from message persistence but are still included in the stream replay buffer (by design, since they're part of the SSE stream).

3. as ChatMessage["parts"][number] cast — unchanged from previous review

The cast at line 1777 is still there. UIMessage["parts"] is a closed union that doesn't include data-* shapes, so this as silences a real type error. It works at runtime, but if the AI SDK ever tightens the union or adds runtime validation on parts, this will break silently. At minimum, add a comment like:

// Cast needed because UIMessage["parts"] doesn't include data-* part shapes.
// Safe at runtime — parts is a plain array and data-* types are pass-through.

4. "id" in p type narrowing — unchanged from previous review

The find predicate at line 1764:

(p) =>
  p.type === dataChunk.type &&
  "id" in p &&
  (p as { id?: string }).id === dataChunk.id

The p.type === dataChunk.type guard (where dataChunk.type starts with data-) makes the "id" in p check safe in practice since no built-in part types share data-* prefixes. But the double cast ("id" in p then as { id?: string }) is awkward. A cleaner approach:

const existing = message.parts.find(
  (p) =>
    p.type === dataChunk.type &&
    "id" in p &&
    (p as Record<string, unknown>).id === dataChunk.id
);

Or just cast once at the top since you're already in the data-* branch. Minor nit, not a blocker.

Example changes — look good

  • The hardcoded results comment from the previous review was added (line ~75 in server.ts). Good.
  • The three patterns (reconciliation, transient, persisted) are clearly structured and well-commented.
  • The data-thinking part with transient: true on the server and isStreaming guard on the client is intentionally redundant (belt-and-suspenders), which is reasonable for a demo.

One observation: the writer.merge(result.toUIMessageStream()) call at line 62 starts piping the LLM stream, and the data-sources "searching" write at line 65 follows immediately. Since merge pipes asynchronously, the sources write interleaves with text chunks — the ordering depends on model latency. The demo happens to look fine because the client renders sources above text regardless of arrival order. No action needed, just noting it.
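To make the shape of that ordering concrete, here is a hedged sketch of the server-side pattern being described. The function name, model, and payloads are illustrative, and createUIMessageStream plus the type imports are assumptions about the AI SDK surface; writer.merge, writer.write, and toUIMessageStream are the calls the review refers to. This is not the example's actual server.ts:

    import { createUIMessageStream, streamText } from "ai";
    import type { LanguageModel, ModelMessage } from "ai";

    // Hedged sketch only; not the example's real code.
    export function buildChatStream(model: LanguageModel, messages: ModelMessage[]) {
      return createUIMessageStream({
        execute: async ({ writer }) => {
          const result = streamText({ model, messages });

          // merge() starts piping the LLM's UI message stream and returns immediately,
          writer.merge(result.toUIMessageStream());

          // so this data part is written right away and interleaves with the text chunks.
          writer.write({ type: "data-sources", id: "s1", data: { status: "searching" } });

          // A later write with the same type + id reconciles (updates) the same part.
          // The fixed delay races the LLM stream, exactly as noted above.
          setTimeout(() => {
            writer.write({
              type: "data-sources",
              id: "s1",
              data: { status: "found", urls: ["https://example.com"] }
            });
          }, 1500);
        }
      });
    }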

Verdict

No blockers remaining. The main feedback from the previous review (skip persistence) has been addressed. The remaining items are minor — the as cast comment (#3) and the stray whitespace (#1) are worth fixing before merge but shouldn't block it. The core reconciliation and transient logic is correct.


@threepointone (Contributor)

cursor told me this:

What's still NOT covered even after this PR:
Looking at the AI SDK's full data parts feature set:
onData callback on the client side — The AI SDK's useChat has an onData callback that fires for all data parts as they stream (including transient ones). Our useAgentChat hook doesn't expose this. The example works around it by checking isStreaming to conditionally render the transient thinking part, but that's a UI hack — once streaming ends, the part vanishes from the render, not because the framework removed it, but because the render check hides it. A proper onData callback would give users a clean way to handle transient parts (show toasts, update external state, etc.).
File part size validation — The PR description mentions it, but I don't see it in the diff for index.ts. The diff only shows the data-* handling in the default case.
Typed data parts on useAgentChat — The AI SDK supports useChat with typed data part schemas, so message.parts gives you type-safe access to your custom data-weather, data-sources, etc. The example uses manual type assertions (as DataPart). Our useAgentChat would need to pass through the generic type parameter to fully support this.

@threepointone (Contributor)

I'll review this later this evening, let's get it in shape by then?

@threepointone (Contributor)

what parts of that feedback I shared were addressed? could you update the pr description? could you add tests (unit/e2e as relevant) thanks! I'll review tomorrow morning before landing.

@deathbyknowledge (Member, Author)

what parts of that feedback I shared were addressed? could you update the pr description? could you add tests (unit/e2e as relevant) thanks! I'll review tomorrow morning before landing.

I've added tests too 🫡

Add a clarifying comment in message-builder explaining that chunk.data should always be provided, because data: undefined on a persisted part is dropped by JSON.stringify (resulting in no data field on reload). Add unit tests covering that transient parts with the same id are skipped (not reconciled) and that undefined data is handled (the part is persisted without a data field).
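For context on that commit, JSON.stringify silently drops object properties whose value is undefined, so a part persisted with data: undefined comes back from storage with no data field at all:

    const part = { type: "data-usage", data: undefined };

    // The data key disappears in the serialized form...
    JSON.stringify(part); // '{"type":"data-usage"}'

    // ...so after a storage round-trip the part simply has no data field.
    JSON.parse(JSON.stringify(part)); // { type: "data-usage" }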
@threepointone
Copy link
Contributor

love this, banger. will land once CI passes.

threepointone and others added 2 commits February 17, 2026 11:14
- Add Data Parts section to docs/chat-agents.md covering server-side
  writer.write(), three patterns (reconciliation, append, transient),
  typed client access, and onData callback
- Update docs/AGENTS.md to reflect that the sync-docs workflow has been
  removed — docs must be manually ported to cloudflare-docs

Co-authored-by: Cursor <cursoragent@cursor.com>
Adjust Markdown table spacing and column alignment in the "Three Patterns" section of docs/chat-agents.md to improve readability and rendering. No semantic changes to content or behavior.
threepointone merged commit 16b2dca into main on Feb 17, 2026
4 checks passed
threepointone deleted the store-data-parts branch February 17, 2026 11:20
github-actions bot mentioned this pull request Feb 17, 2026
taowen pushed a commit to taowen/agents that referenced this pull request Feb 17, 2026
* feat(ai-chat): handle data-* stream parts and forward onData callback (cloudflare#900)

* typed data parts in example

* add data parts in message builder test

* Clarify chunk.data and add tests

Add a clarifying comment in message-builder explaining that chunk.data should always be provided, because data: undefined on a persisted part is dropped by JSON.stringify (resulting in no data field on reload). Add unit tests covering that transient parts with the same id are skipped (not reconciled) and that undefined data is handled (the part is persisted without a data field).

* docs: add data parts section to chat-agents, update AGENTS.md

- Add Data Parts section to docs/chat-agents.md covering server-side
  writer.write(), three patterns (reconciliation, append, transient),
  typed client access, and onData callback
- Update docs/AGENTS.md to reflect that the sync-docs workflow has been
  removed — docs must be manually ported to cloudflare-docs

Co-authored-by: Cursor <cursoragent@cursor.com>

* Reformat table in chat-agents.md

Adjust Markdown table spacing and column alignment in the "Three Patterns" section of docs/chat-agents.md to improve readability and rendering. No semantic changes to content or behavior.

---------

Co-authored-by: Sunil Pai <spai@cloudflare.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Development

Successfully merging this pull request may close these issues.

Custom data parts (data-*) not persisted to message history
