Skip to content

fix(rivetkit): stall stop handler until start completes#4192

Open
MasterPtato wants to merge 1 commit intomainfrom
02-13-fix_rivetkit_stall_stop_handler_until_start_completes
Open

fix(rivetkit): stall stop handler until start completes#4192
MasterPtato wants to merge 1 commit intomainfrom
02-13-fix_rivetkit_stall_stop_handler_until_start_completes

Conversation

@MasterPtato
Copy link
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app
Copy link

railway-app bot commented Feb 13, 2026

🚅 Deployed to the rivet-pr-4192 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-cloud ❌ Build Failed (View Logs) Web Feb 18, 2026 at 2:25 am
frontend-inspector ❌ Build Failed (View Logs) Web Feb 18, 2026 at 2:25 am
ladle ❌ Build Failed (View Logs) Web Feb 18, 2026 at 2:24 am
website 😴 Sleeping (View Logs) Web Feb 13, 2026 at 10:06 pm
mcp-hub ✅ Success (View Logs) Web Feb 13, 2026 at 9:55 pm

Copy link
Contributor Author


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Feb 13, 2026

PR Review: fix(rivetkit): stall stop handler until start completes

Summary

This PR fixes a race condition in the engine actor driver where a stop signal could arrive before the actor's start/initialization was complete, potentially leaving the system in an inconsistent state. It also adds tests for connection error serialization.


Core Fix: #runnerOnActorStop (actor-driver.ts)

The change is correct and addresses a real race condition. The core logic is:

  1. If handler doesn't exist at all, bail early (previously the code would silently skip cleanup if handler.actor was falsy, but not delete the handler).
  2. If handler.actorStartPromise exists, wait for it to settle before proceeding with stop — swallowing any start error so cleanup can still run.
  3. Always delete the actor from the map at the end, regardless of whether handler.actor was set.

The old code only deleted the actor from the map inside the if (handler?.actor) branch, meaning a handler that was created but never finished starting (and thus never had .actor populated) would leak from the #actors map. This is the primary bug being fixed.

One concern: After await handler.actorStartPromise.promise returns, the code checks if (handler.actor). This is correct because in the failure path (catch in #runnerOnActorStart) the promise is rejected, but handler.actor may be set partway through if definition.instantiate() succeeded but handler.actor.start() failed. Looking at lines 460–480 of the file, handler.actor is assigned on line 460 before start() is called, so handler.actor could be non-null even in the rejection path. The code handles this correctly by still calling onStop if handler.actor is set — good.

Potential edge case: If #runnerOnActorStop is called concurrently for the same actorId (e.g., two stop signals racing), both will pass the !handler guard and both will await actorStartPromise. Then both may attempt to call handler.actor.onStop and #actors.delete. This is unlikely in normal operation but worth being aware of — a guard or idempotency check after the await would make this more robust.


Tests: actor-lifecycle.ts

The tests are well-structured and cover the intended race conditions. A few observations:

Test: "actor stop during start waits for start to complete"
The actorId variable is declared and assigned via await actor.resolve(), but then it is only used in the verification step. There's no issue, but the comment "Get actor ID" is slightly misleading — resolve() here likely does more than just return an ID (it may also trigger start). The ordering of pingPromise followed by actor.destroy() followed by await pingPromise is intentional and tests the right thing.

Test: "onBeforeActorStart completes before stop proceeds"
This test expects state.initialized and state.startCompleted to both be true after destruction. This is correct given the fix. However, there's a subtle assumption: the getState() call must reach the actor before destroy() completes and removes it. This may be flaky in environments where the actor is torn down very quickly — worth verifying this doesn't cause intermittent failures.

Test: "onDestroy is called even when actor is destroyed during start"
This test checks state.destroyCalled, but destroyCalled is set in onDestroy — and the state is read via statePromise which was initiated before destroy(). If the actor is destroyed before getState() executes, the call will fail with a "not found" error rather than returning the state. The test could be more defensive here (or document that this relies on actor-level ordering guarantees).

Test: "multiple rapid create/destroy cycles handle race correctly"
expect(true).toBe(true) at the end is a trivially true assertion — its only value is confirming no exception was thrown. Consider using expect.hasAssertions() at the top of the test or just relying on the implicit "no throw" pass behavior.

Unused variable:
In "actor stop called with no actor instance cleans up handler", id is assigned from await actor.resolve() but is never used. It can be removed.


New Fixture: start-stop-race.ts

The fixture is clean and purpose-built. The 100ms setTimeout in onWake is intentional to widen the race window, which is appropriate for test fixtures. One note: onDestroy sets c.state.destroyCalled = true with a comment "the actor framework will save it automatically" — this is fine if true, but worth confirming the state is persisted in the destroy handler context before being read by tests.


New Fixture: conn-error-serialization.ts + Tests

The fixture and tests are straightforward and correctly test that ActorError thrown from createConnState preserves group and code through WebSocket serialization. The connection params typing (params: { shouldThrow?: boolean }) is clean.

Minor: The test disposes the connection with await conn.dispose() even in the error case. If the connection was never established (due to the error), calling dispose() on it should be safe/no-op, but this is worth confirming.


Minor / Style

  • The whitespace-only changes (() => { }() => { }) in actor-driver.ts are formatting noise. They are harmless but add diff churn.
  • Log message strings use msg: "..." format consistently with the rest of the file — good.
  • stringifyError(err) is used correctly in the new catch block.

Overall

The core fix is correct and well-targeted. The test coverage is good in breadth but a few tests have subtle ordering assumptions that could make them flaky in faster environments. The unused id variable in one test is a minor cleanup item.

this.#runnerStarted.resolve(undefined);
},
onDisconnected: (_code, _reason) => {},
onDisconnected: (_code, _reason) => { },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove spaces inside the empty function braces to comply with Biome linter formatting rules. Change onDisconnected: (_code, _reason) => { }, to onDisconnected: (_code, _reason) => {},

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

return streamSSE(c, async (stream) => {
// NOTE: onAbort does not work reliably
stream.onAbort(() => {});
stream.onAbort(() => { });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove spaces inside the empty function braces to comply with Biome linter formatting rules. Change stream.onAbort(() => { }); to stream.onAbort(() => {});

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Comment on lines +1 to +71
import { actor } from "rivetkit";

/**
* Actor designed to test start/stop race conditions.
* Has a slow initialization to make race conditions easier to trigger.
*/
export const startStopRaceActor = actor({
state: {
initialized: false,
startTime: 0,
destroyCalled: false,
startCompleted: false,
},
onWake: async (c) => {
c.state.startTime = Date.now();

// Simulate slow initialization to create window for race condition
await new Promise((resolve) => setTimeout(resolve, 100));

c.state.initialized = true;
c.state.startCompleted = true;
},
onDestroy: (c) => {
c.state.destroyCalled = true;
// Don't save state here - the actor framework will save it automatically
},
actions: {
getState: (c) => {
return {
initialized: c.state.initialized,
startTime: c.state.startTime,
destroyCalled: c.state.destroyCalled,
startCompleted: c.state.startCompleted,
};
},
ping: (c) => {
return "pong";
},
destroy: (c) => {
c.destroy();
},
},
});

/**
* Observer actor to track lifecycle events from other actors
*/
export const lifecycleObserver = actor({
state: {
events: [] as Array<{
actorKey: string;
event: string;
timestamp: number;
}>,
},
actions: {
recordEvent: (c, params: { actorKey: string; event: string }) => {
c.state.events.push({
actorKey: params.actorKey,
event: params.event,
timestamp: Date.now(),
});
},
getEvents: (c) => {
return c.state.events;
},
clearEvents: (c) => {
c.state.events = [];
},
},
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run the Biome formatter on this file to ensure proper formatting and sorted imports. The file is new and likely has formatting issues that don't match the project's style guide.

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Comment on lines +71 to +72
import { startStopRaceActor, lifecycleObserver } from "./start-stop-race";
import { connErrorSerializationActor } from "./conn-error-serialization";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports should be sorted alphabetically. Consider reordering these imports to maintain consistent sorting.

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

import { runActorConnHibernationTests } from "./tests/actor-conn-hibernation";
import { runActorConnStateTests } from "./tests/actor-conn-state";
import { runActorDbTests } from "./tests/actor-db";
import { runConnErrorSerializationTests } from "./tests/conn-error-serialization";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import should be sorted alphabetically with the other imports to maintain consistent ordering.

Spotted by Graphite Agent (based on CI logs)

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Feb 18, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4192

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4192

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4192

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4192

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4192

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4192

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4192

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4192

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4192

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4192

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4192

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4192

commit: 8f7b569

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments