fix(pegboard): add threshold parameters to runner protocol metadata #4247

Open
MasterPtato wants to merge 1 commit into 02-19-chore_get_docker_compose_working_again from 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata

Conversation

railway-app bot commented Feb 20, 2026

🚅 Deployed to the rivet-pr-4247 environment in rivet-frontend

Service | Status | Web | Updated (UTC)
ladle | ❌ Build Failed | Web | Feb 21, 2026 at 1:41 am
frontend-inspector | ❌ Build Failed | Web | Feb 20, 2026 at 10:19 pm
frontend-cloud | ❌ Build Failed | Web | Feb 20, 2026 at 10:18 pm
website | 😴 Sleeping | Web | Feb 20, 2026 at 2:48 am
mcp-hub | ✅ Success | Web | Feb 20, 2026 at 2:37 am

MasterPtato commented Feb 20, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

pkg-pr-new bot commented Feb 20, 2026

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4247

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4247

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4247

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4247

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4247

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4247

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4247

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4247

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4247

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4247

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4247

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4247

commit: 76e52b6

claude bot commented Feb 20, 2026

PR Review: fix(pegboard): add threshold parameters to runner protocol metadata

Summary

This PR bumps the runner protocol to v6 to add two new fields to ProtocolMetadata: actor_stop_threshold (so runners can dynamically clamp their own timeouts) and serverless_drain_grace_period (making the previously hardcoded 10s drain grace period configurable). The refactoring of duplicated runner-lost timeout code and the Vec → HashMap cache improvement are good cleanups.


Bugs / Correctness

1. Mutation of shared definition.config.options in actor-driver.ts

const definition = lookupInRegistry(this.#config, actorConfig.name);
// ...
definition.config.options.onSleepTimeout = Math.min(...);
definition.config.options.onDestroyTimeout = Math.min(...);

If lookupInRegistry returns a reference to a shared/cached definition object (rather than a clone), this mutation persists across actor instances of the same type. The first actor to connect with a tight protocol metadata threshold would permanently clamp the timeout for every subsequent actor of the same type in that runner process. A defensive clone of definition.config.options (or limiting the mutation to the actor-local handler) should be used here.
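
A minimal sketch of the non-mutating alternative, assuming the options object has roughly the shape below. The `TimeoutOptions` interface and the `clampTimeouts` helper are hypothetical names for illustration and are not taken from the PR.

```typescript
// Hypothetical shape; the real options type in actor-driver.ts may differ.
interface TimeoutOptions {
  onSleepTimeout: number;
  onDestroyTimeout: number;
}

// Returns a new options object with both timeouts capped at `limitMs`.
// The input object is never written to, so a shared registry entry is
// unaffected by one actor's protocol metadata.
function clampTimeouts(
  options: Readonly<TimeoutOptions>,
  limitMs: number,
): TimeoutOptions {
  return {
    ...options,
    onSleepTimeout: Math.min(options.onSleepTimeout, limitMs),
    onDestroyTimeout: Math.min(options.onDestroyTimeout, limitMs),
  };
}

// Illustrative values only.
const sharedOptions: TimeoutOptions = { onSleepTimeout: 5000, onDestroyTimeout: 5000 };
const actorLocalOptions = clampTimeouts(sharedOptions, 2000);
```

The actor-local copy carries the clamped values while `sharedOptions` keeps its configured ones, so a later actor of the same type starts from the original configuration.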

2. Truthiness check on serverlessDrainGracePeriod may behave unexpectedly

if (protocolMetadata.serverlessDrainGracePeriod) {

The type is i64 | null, which is bigint | null at runtime. The value 0n is falsy in JavaScript, so a grace period of exactly 0 would silently skip the clamping. The check should be protocolMetadata.serverlessDrainGracePeriod !== null.
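
The difference between the two checks can be sketched as follows. Both helper names are hypothetical and exist only to contrast the conditions.

```typescript
// Mirrors `if (value)`: false for null AND for 0n, because 0n is falsy.
function isSetByTruthiness(value: bigint | null): boolean {
  return Boolean(value);
}

// Mirrors `if (value !== null)`: false only when the field is absent.
function isSetByNullCheck(value: bigint | null): boolean {
  return value !== null;
}

// The two checks disagree only for a grace period of exactly zero.
const zeroGracePeriod: bigint | null = 0n;
const truthinessResult = isSetByTruthiness(zeroGracePeriod);
const nullCheckResult = isSetByNullCheck(zeroGracePeriod);
```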

3. runnerLostThreshold > 0 compares BigInt with Number

In #startRunnerLostTimeout:

this.#protocolMetadata.runnerLostThreshold > 0

runnerLostThreshold is i64 (a bigint at runtime), and 0 is a number. Relational comparison between a bigint and a number is well defined in JavaScript, so this works, but it is inconsistent with the rest of the bigint handling; prefer > 0n so the operand types are explicit.
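
As a small sketch, the two spellings give the same answer for every bigint input; the explicit form only makes the operand types match. The helper names and sample values are illustrative assumptions.

```typescript
// Mixed comparison: bigint on the left, number on the right.
function isEnabledMixed(threshold: bigint): boolean {
  return threshold > 0;
}

// Explicit comparison: both operands are bigint.
function isEnabledExplicit(threshold: bigint): boolean {
  return threshold > 0n;
}

// Illustrative inputs, not values from the PR.
const samples: bigint[] = [-1n, 0n, 15000n];
const allAgree = samples.every(
  (value) => isEnabledMixed(value) === isEnabledExplicit(value),
);
```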

4. u64 → i64 cast in conn.rs

serverless_drain_grace_period: is_serverless
    .then(|| pb.serverless_drain_grace_period() as i64),

A u64 value above i64::MAX silently wraps to a negative i64 under an as cast. The default (10,000 ms) is well within range, but a user setting an extremely large value could produce a confusing negative timeout on the client. Consider i64::try_from(pb.serverless_drain_grace_period()) with an appropriate clamp or error instead.
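
The wraparound can be illustrated on the TypeScript side with `BigInt.asIntN`, which reinterprets the low 64 bits as a signed value in the same way a Rust `as i64` cast does. This is only a model of the arithmetic; the helper names are hypothetical, and the actual fix belongs in conn.rs.

```typescript
const I64_MAX = 2n ** 63n - 1n;

// Models Rust's `value as i64` for a u64 input: keep the low 64 bits and
// reinterpret them as two's complement.
function castU64ToI64(value: bigint): bigint {
  return BigInt.asIntN(64, value);
}

// One possible alternative: saturate at i64::MAX instead of wrapping.
function clampU64ToI64(value: bigint): bigint {
  return value > I64_MAX ? I64_MAX : value;
}

const u64Max = 2n ** 64n - 1n;
const wrapped = castU64ToI64(u64Max); // negative after reinterpretation
const clamped = clampU64ToI64(u64Max); // stays at the largest valid i64
```

Whether to saturate or to reject the value with an error is a design choice for the engine config; the sketch only shows that the plain cast is the one option that produces a negative number.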


Code Quality

5. actor_stop_threshold: 0 in v5→v6 conversion

metadata: v6::ProtocolMetadata {
    runner_lost_threshold: init.metadata.runner_lost_threshold,
    actor_stop_threshold: 0,
    serverless_drain_grace_period: None,
},

Hardcoding 0 means a v5 runner will see actorStopThreshold = 0 and the engine driver will calculate Math.max(0 - 1000, 0) = 0, clamping onSleepTimeout and onDestroyTimeout to 0 ms. This may be intentional (v5 runners do not support the new threshold), but a brief comment should state why 0 is safe here. If 0 is instead meant as a sentinel for "no threshold", the driver needs to skip clamping for that value, because as written 0 clamps the timeouts to zero rather than disabling the clamp.
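
A worked sketch of that arithmetic, assuming the threshold has already been converted to a number of milliseconds. `SAFETY_MARGIN_MS`, `timeoutLimit`, and `clampTimeout` are hypothetical names; only the `Math.max(threshold - 1000, 0)` expression comes from the review above.

```typescript
// The 1000 subtracted in the review's Math.max(0 - 1000, 0) expression.
const SAFETY_MARGIN_MS = 1000;

// Upper bound for actor timeouts derived from the stop threshold.
function timeoutLimit(actorStopThresholdMs: number): number {
  return Math.max(actorStopThresholdMs - SAFETY_MARGIN_MS, 0);
}

// Clamps a configured timeout. When `zeroDisables` is true, a threshold of 0
// is treated as "no threshold" (a hypothetical sentinel) and the configured
// value is returned unchanged.
function clampTimeout(
  configuredMs: number,
  actorStopThresholdMs: number,
  zeroDisables: boolean,
): number {
  if (zeroDisables && actorStopThresholdMs === 0) {
    return configuredMs;
  }
  return Math.min(configuredMs, timeoutLimit(actorStopThresholdMs));
}

// Illustrative configured timeout of 5000 ms with the hardcoded threshold 0.
const asDescribed = clampTimeout(5000, 0, false); // clamp applies
const withSentinel = clampTimeout(5000, 0, true); // clamp skipped
```

Without the sentinel branch the hardcoded 0 reduces a 5000 ms timeout to 0 ms, which is the behavior a code comment would need to justify.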

6. Missing V5 variant in ToServerMk2

pub enum ToServerMk2 {
    V4(v4::ToServer),
    V6(v6::ToServer),  // no V5
}

This is well-handled by the comment and the 5 | 6 => match arm, but the asymmetry with ToClientMk2 (which retains V5) may confuse readers. A short code comment explaining why V5 is collapsed into V6 here would help.


Performance

  • The Vec → HashMap change for last_ping_cache is a straightforward O(n) → O(1) lookup improvement. Good catch.
  • The tokio::try_join! and tokio::join! calls in handle_init and allocate_actor_v2 are good concurrency improvements.

Minor Nits

  • The renamed endian-converter.ts appears unrelated to this PR's purpose; consider moving it to a separate PR.
  • The empty lambda formatting change in actor-driver.ts is a trivial style change that adds diff noise.
  • The removed comments on onSleepTimeout/onDestroyTimeout described limits that are now enforced in code rather than documented; ensure the new clamping logic is covered by tests.

Test Coverage

No tests are added for:

  • The new configurable serverless_drain_grace_period behavior
  • The actorStopThreshold clamping in the engine driver
  • The v5->v6 protocol conversion (particularly the actor_stop_threshold: 0 fallback)

Given the crash/timeout implications of getting these thresholds wrong, at minimum a unit test for the actor-driver clamping logic would be valuable.

MasterPtato force-pushed the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch from 3008d9c to 13b1948 on February 20, 2026 22:14
MasterPtato marked this pull request as ready for review on February 20, 2026 22:14
MasterPtato force-pushed the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch from 13b1948 to 76e52b6 on February 20, 2026 22:16
MasterPtato force-pushed the 02-19-fix_pegboard_add_threshold_parameters_to_runner_protocol_metadata branch from 76e52b6 to ce7323c on February 21, 2026 01:40
MasterPtato force-pushed the 02-19-chore_get_docker_compose_working_again branch from 121bf70 to 3da1f9e on February 21, 2026 01:40