-
Notifications
You must be signed in to change notification settings - Fork 1k
.NET: [BREAKING] fix: Subworkflows do not work well with Chat Protocol and Checkpointing #3240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes multiple issues related to subworkflows when combined with checkpointing and the chat protocol. The changes address concurrency issues, ownership problems, and output propagation when subworkflows are used in different execution modes (AsAgent and direct streaming).
Changes:
- Removed the
runIdparameter from Resume methods as it's redundant (already contained in CheckpointInfo) - Fixed subworkflow ownership release to properly restore previous ownership state instead of always releasing to null
- Added YieldOutputAsync support to ISuperStepJoinContext for propagating subworkflow outputs
- Modified ChatProtocol validation to allow catch-all handlers as valid chat protocol executors
- Fixed WorkflowHostExecutor reset logic to avoid detaching join context prematurely
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| IWorkflowExecutionEnvironment.cs | Removed redundant runId parameter from Resume methods |
| InProcessExecution.cs | Updated static helper methods to remove runId parameter |
| InProcessExecutionEnvironment.cs | Removed runId parameter from ResumeRunAsync, now uses CheckpointInfo.RunId |
| WorkflowThread.cs | Added includeWorkflowOutputsInResponse support and new CreateUpdate overload for ChatMessage |
| WorkflowHostingExtensions.cs | Added includeWorkflowOutputsInResponse parameter to AsAgent extension |
| WorkflowHostAgent.cs | Plumbed through includeWorkflowOutputsInResponse and enabled catch-all for protocol validation |
| Workflow.cs | Modified ReleaseOwnershipAsync to accept targetOwnerToken for proper ownership chaining |
| InProcessRunnerContext.cs | Added ownership tracking fields and YieldOutputAsync implementation |
| WorkflowHostExecutor.cs | Fixed checkpoint manager initialization, join context detachment, and output forwarding |
| OutputMessagesExecutor.cs | Moved from nested class to top-level with explicit executor options |
| ChatProtocolExecutor.cs | Set explicit executor options to disable auto-send/yield |
| ChatProtocol.cs | Added allowCatchAll parameter for protocol validation |
| ProtocolDescriptor.cs | Added AcceptsAll property and constructor parameter |
| Executor.cs | Pass HasCatchAll to ProtocolDescriptor |
| ISuperStepJoinContext.cs | Added YieldOutputAsync method |
| AIAgentHostExecutor.cs | Removed blank line (formatting) |
| SubworkflowBinding.cs | Formatting adjustment for constructor parameters |
| Sample files | Updated to remove runId parameter and added new test case |
| TestRunContext.cs | Added QueuedOutputs and YieldOutputAsync test support |
dotnet/src/Microsoft.Agents.AI.Workflows/WorkflowHostingExtensions.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Microsoft.Agents.AI.Workflows/Specialized/WorkflowHostExecutor.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Microsoft.Agents.AI.Workflows/Specialized/OutputMessagesExecutor.cs
Show resolved
Hide resolved
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/Sample/13_Subworkflow_Checkpointing.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Microsoft.Agents.AI.Workflows/InProc/InProcessRunnerContext.cs
Outdated
Show resolved
Hide resolved
199a520 to
da54134
Compare
Subworkflows run into issues with Checkpointing and the Chat Protocol: * The concurrency rework made subtle changes in behaviour that introduced a hang when using subworkflows with ChatProtocol and streaming execution. * The ResetAsync() implementation in WorkflowHostExecutor was improperly resetting the joinContext - this was happening on restore checkpoint _after_ the join context was attached when * Subworkflows cannot be used as the start node when hosted AsAgent due to inability to treat Catch-All as a Chat Protocol * Subworkflow ownership issue when used in non-concurrent mode after finishing a run Also fixes: * When ChatMessages are output by executors that are not agents, there is no corresponding AgentResponseUpdate/AgentResponse event Breaking Changes * [BREAKING CHANGE] It is possible to provide the wrong RunId when resuming from CheckpointInfo (even though the data already exists on CheckpointInfo)
da54134 to
f7b8b95
Compare
Motivation and Context
Subworkflows run into issues with Checkpointing and the Chat Protocol:
ChatProtocoland streaming execution.ResetAsync()implementation inWorkflowHostExecutorwas improperly resetting thejoinContext- this was happening on restore checkpoint after the join context was attached when the executor is instantiatedAsAgent()due to inability to treat Catch-All as a Chat ProtocolAlso fixes:
ChatMessagesare output by executors that are not agents, there is no correspondingAgentResponseUpdate/AgentResponseeventBreaking Changes
runIdwhen resuming fromCheckpointInfo(even though the data already exists onCheckpointInfo), which could lead to odd error like "unable to find checkpoint with ID=..."Description
Multiple fixes targeting related capabilities around Subworkflows and Checkpointing both AsAgent and directly:
Also fixes:
Likely relevant to #2419, will validate / add test in separate PR.
Breaking Changes
runIdargument fromResumeAsync()andResumeStreamAsync()Contribution Checklist