[Graphite MQ] Draft PR GROUP:spec_3b0cac (PRs 4183, 4190, 4191, 4219, 4220, 4235) #4241

Closed
graphite-app[bot] wants to merge 6 commits into main from gtmq_spec_3b0cac_1771541680619-bcd60b15-bd47-4c3e-9bcf-e5e3d18aef4e

graphite-app bot commented Feb 19, 2026

This draft PR was created by the Graphite merge queue.
Trunk will be fast forwarded to the HEAD of this PR when CI passes, and the original PRs will be closed.

The following PRs are included in this draft PR:

# Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

## Type of change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update

## How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

## Checklist:

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes

railway-app bot commented Feb 19, 2026

This PR was not deployed automatically as @graphite-app[bot] does not have access to the Railway project.

In order to get automatic PR deploys, please add @graphite-app[bot] to your workspace on Railway.

claude bot commented Feb 19, 2026

PR Review

This Graphite merge queue draft PR bundles PRs #4183, #4190, #4191, #4219, #4220, and #4235. Together they form a well-focused set of bug fixes, observability improvements, and correctness fixes. Here is the breakdown:


Grafana Dashboard Fixes

label_values(rivet_project) to label_values(up, rivet_project) (all dashboards)

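Correct fix. label_values is a Grafana variable query function, and its metric-less form returns every value of the label across all series, which can include stale values from series that are no longer reporting. Anchoring the query on up restricts the results to targets that are actually being scraped, and is idiomatic. Applied consistently across all 7 dashboards.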

Gasoline dashboard: rivet_gasoline_workflow_total to rivet_gasoline_workflow_sleeping

This appears to fix the "sleeping workflows" panel, which was querying the total-workflow metric instead of the sleeping-workflow one. The change looks correct.


Rust Changes

parallelization: u128 to u16 (wf/mod.rs, signal.rs, debug.rs, kv/debug.rs)

Good change. The value was already capped at < 1024 in all callers, so u128 was an unnecessarily large type for a CLI arg. The casts to u128 for the chunk size arithmetic (u128::MAX / parallelization as u128) are correct.
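The chunk-size arithmetic can be illustrated with a minimal sketch. Only the expression `u128::MAX / parallelization as u128` comes from the change under review; the function name `chunk_ranges` and the idea of handing each worker an inclusive `(start, end)` range are assumptions made for illustration. Note that `as` binds tighter than `/`, so the expression divides by the widened value.

```rust
/// Hypothetical helper: splits the full u128 key space into
/// `parallelization` contiguous, inclusive (start, end) ranges.
pub fn chunk_ranges(parallelization: u16) -> Vec<(u128, u128)> {
    assert!(parallelization > 0, "parallelization must be non-zero");

    // Widen once; `u128::MAX / parallelization as u128` parses as
    // `u128::MAX / (parallelization as u128)`.
    let n = parallelization as u128;
    let chunk_size = u128::MAX / n;

    (0..n)
        .map(|i| {
            let start = i * chunk_size;
            // The last chunk absorbs the remainder left by integer division.
            let end = if i == n - 1 {
                u128::MAX
            } else {
                start + chunk_size - 1
            };
            (start, end)
        })
        .collect()
}
```

Because `parallelization` is at most `u16::MAX`, neither `i * chunk_size` nor `start + chunk_size - 1` can overflow in this sketch.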

max_per_txn: Option<usize> for signal pruning

Useful addition that prevents transactions from growing unbounded when pruning large volumes of signals. The implementation correctly short-circuits the inner loop. One observation: workflow pruning uses a hardcoded MAX_PRUNES_PER_TXN = 1000 constant while signal pruning exposes this as a CLI flag. This asymmetry seems intentional given different workload characteristics, but a comment explaining the rationale would help future readers.
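As a rough sketch of what a per-transaction cap does, the function below groups keys into batches, closing a batch once it reaches the cap. The name `plan_prune_batches`, the `u64` key type, and the batching shape are all hypothetical; the real code short-circuits a loop inside a database transaction rather than building vectors.

```rust
/// Hypothetical illustration: group signal keys into per-transaction
/// batches. `None` means "no cap", so everything lands in one batch.
pub fn plan_prune_batches(keys: &[u64], max_per_txn: Option<usize>) -> Vec<Vec<u64>> {
    let mut batches: Vec<Vec<u64>> = Vec::new();
    let mut current: Vec<u64> = Vec::new();

    for &key in keys {
        current.push(key);

        if let Some(max) = max_per_txn {
            // Treat a cap of 0 as 1 so every batch makes progress.
            if current.len() >= max.max(1) {
                // Close this batch; the next key starts a new transaction.
                batches.push(std::mem::take(&mut current));
            }
        }
    }

    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

With a cap of 2, five keys are split into batches of 2, 2, and 1; with `None`, they stay in a single batch.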

Resilient error handling in db/kv/mod.rs

The change from propagating WorkflowError::FailedBuildingWorkflowHistory to tracing::error! + return Ok(None) is a significant behavioral shift. A workflow that fails to deserialize its history is now silently skipped rather than surfacing an error. This is a pragmatic choice to prevent one bad workflow from blocking others, but data corruption or schema mismatches become harder to detect. Consider adding a metric increment here so silent skips are visible in dashboards.
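One possible shape for the suggested metric is sketched below using only the standard library. Everything here is an assumption for illustration: the counter name `HISTORY_LOAD_SKIPPED`, the `load_history` function, and the comma-separated "history" format stand in for the real deserialization, and a real implementation would use the project's metrics and tracing facilities rather than an atomic and `eprintln!`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter of workflows skipped because their history
/// could not be built.
pub static HISTORY_LOAD_SKIPPED: AtomicU64 = AtomicU64::new(0);

/// Hypothetical loader: parses a comma-separated list of event ids.
/// A parse failure is logged and counted, then reported as `Ok(None)`
/// so that one bad workflow does not fail the whole batch.
pub fn load_history(raw: &str) -> Result<Option<Vec<u32>>, String> {
    let parsed: Result<Vec<u32>, _> = raw
        .split(',')
        .map(|s| s.trim().parse::<u32>())
        .collect();

    match parsed {
        Ok(events) => Ok(Some(events)),
        Err(err) => {
            // Make the silent skip visible to operators.
            HISTORY_LOAD_SKIPPED.fetch_add(1, Ordering::Relaxed);
            eprintln!("failed building workflow history, skipping: {err}");
            Ok(None)
        }
    }
}
```

Incrementing the counter on the same path as the log line keeps the two signals in step, so a dashboard alert on a non-zero rate would fire whenever the fallback is exercised.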

create_ts/ray_id key handling

Same pattern as above, converting a hard error into a logged skip. Consistent with the approach elsewhere in the PR.

Drain grace period: 5s to 10s

Straightforward increase to give connections more time to drain cleanly.

SSE error handling consolidation in conn.rs

Merging the Transport arm into the general Err arm and using anyhow::Error::from(err).chain() to print the full error chain is a nice improvement for debugging connection failures. The unused use std::error::Error; import is correctly removed.
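To show what printing the full chain buys, here is a std-only approximation. anyhow's `chain()` iterates over an error followed by its sources; the sketch below imitates that by walking `std::error::Error::source()`. The `Layer` type and `format_chain` function are hypothetical and are not taken from conn.rs.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type with an optional underlying cause.
#[derive(Debug)]
pub struct Layer {
    pub msg: &'static str,
    pub source: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}

/// Joins an error and all of its causes, outermost first.
pub fn format_chain(err: &(dyn Error + 'static)) -> String {
    let mut parts = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        parts.push(cause.to_string());
        current = cause.source();
    }
    parts.join(": ")
}
```

Printing only the outermost error would hide the underlying cause, which is usually the part that explains a connection failure.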

Tracing instrumentation additions

Adding named spans (name="ping_task", name="tunnel_to_ws_task", etc.) with ray_id and req_id fields is good practice for distributed tracing. Applied consistently across the pegboard runner tasks.

drop_rx.borrow().as_ref()

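Minor correctness fix. For a watch::Ref<Option<T>>, calling .as_ref() yields an Option<&T> that borrows from the guard, so the value can be inspected or debug-formatted without cloning it or moving it out of the Ref. Note that the guard itself still lives until the end of the statement, so the borrow should stay short.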

if desired_slots != 0 guard in metrics_aggregator.rs

Avoids emitting zero-value metric entries, which is correct for Prometheus counter/gauge hygiene (reduces cardinality noise).
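The guard amounts to a simple filter, sketched below. The function name `nonzero_slot_metrics`, the `(name, desired_slots)` input shape, and the map output are assumptions for illustration only; the actual aggregator records metrics rather than returning a map.

```rust
use std::collections::BTreeMap;

/// Hypothetical illustration: keep only entries whose desired slot
/// count is non-zero, so no zero-valued series are emitted.
pub fn nonzero_slot_metrics(desired: &[(&str, u64)]) -> BTreeMap<String, u64> {
    let mut out = BTreeMap::new();
    for &(name, desired_slots) in desired {
        // Mirrors the `if desired_slots != 0` guard from the review.
        if desired_slots != 0 {
            out.insert(name.to_string(), desired_slots);
        }
    }
    out
}
```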


TypeScript Changes (config.ts)

Purely formatting: realigns indentation of discriminated union type members to use consistent 2-space indentation rather than 3-space. No functional change.

The comment update from // This must be less than ACTOR_STOP_THRESHOLD_MS to // This must be less than engine config > pegboard.actor_stop_threshold is more accurate and helpful.


Summary

The changes are well-scoped and correct. The main thing worth a follow-up is observability around the new "skip on error" behavior in workflow history loading. A counter metric would help operators detect if this fallback path is being exercised in production.
