Skip to content

fix(drive): eliminate panic in grovedb operations logging under concurrent execution#3142

Open
thepastaclaw wants to merge 2 commits intodashpay:v3.1-devfrom
thepastaclaw:fix/grovedb-logging-panic
Open

fix(drive): eliminate panic in grovedb operations logging under concurrent execution#3142
thepastaclaw wants to merge 2 commits intodashpay:v3.1-devfrom
thepastaclaw:fix/grovedb-logging-panic

Conversation

@thepastaclaw
Copy link
Contributor

@thepastaclaw thepastaclaw commented Feb 21, 2026

Issue

drive-abci tests intermittently panic with:

thread '...' panicked at packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs:97:39:
log params should be set above

This has been observed in CI on multiple PRs (e.g. exponential_distribution_extreme_values, fixed_amount_0) and was reproduced locally under cargo test -p drive-abci --lib -- --test-threads=8.

Root Cause

The grovedb operations logging code (behind #[cfg(feature = "grovedb_operations_logging")]) uses two independent tracing::event_enabled! checks that assume they will always return the same value:

// Check 1: before the grovedb operation
let maybe_params_for_logs = if tracing::event_enabled!(target: "drive_grovedb_operations", TRACE) {
    Some((ops.clone(), root_hash))  // Clone ops for logging
} else {
    None
};

// ... grovedb operation executes here (ops.operations is moved) ...

// Check 2: after the grovedb operation
if tracing::event_enabled!(target: "drive_grovedb_operations", TRACE) && cost_context.value.is_ok() {
    let (ops, previous_root_hash) =
        maybe_params_for_logs.expect("log params should be set above");  // 💥 PANIC if Check 1 was false
    // ...
}

If Check 1 returns false (→ None) but Check 2 returns true, the .expect() panics. The grovedb_operations_logging feature is enabled for drive-abci via its Cargo.toml dependency on drive, so this code is always compiled and active in the test binary.

Why the checks can disagree

The exact mechanism is not fully pinned down, but tracing::event_enabled! evaluates the current dispatcher at call time. In the test binary, drive-abci includes a test_logging test (src/logging/mod.rs:127) that calls tracing::dispatcher::set_default() to install a thread-local subscriber with TRACE level enabled. While set_default is thread-local and guarded, the Rust test framework's thread pool reuse creates a window where dispatcher state can be inconsistent across the two checks.

The issue was reproduced locally under parallel test execution (--test-threads=8) but is non-deterministic — it depends on thread scheduling timing.

Production impact

Low but non-zero. In production, drive-abci initializes a single tracing subscriber at startup and never changes it, so the two checks should always agree. However, the .expect() means any future change that introduces runtime log level reloading (a common feature) would create a latent crash path in every grovedb batch operation — a critical hot path for Platform nodes.

Fix

Replace the second event_enabled! re-check with if let Some(...) on the already-captured Option. This guarantees the logging block only executes when the params were actually captured, regardless of dispatcher state changes:

if let Some((ops, previous_root_hash)) = maybe_params_for_logs {
    if cost_context.value.is_ok() {
        // ... log ...
    }
}

Both affected call sites fixed:

  • grove_apply_batch_with_add_costs/v0/mod.rs
  • grove_clear/v0/mod.rs

Behavioral change

The first event_enabled! check (Check 1) is unchanged — it still gates whether we clone ops and capture the pre-operation root hash. When tracing is disabled, maybe_params_for_logs is None and the if let block is skipped entirely. No extra work happens when tracing is off.

The only behavioral difference from the old code is in the (already-broken) race condition edge case: if tracing is enabled at Check 1 but disabled between Check 1 and Check 2, the old code would skip the post-op block (because Check 2 re-evaluated and saw tracing disabled). The new code will still execute the post-op block — performing one extra root_hash() lookup and emitting a tracing::trace! that the now-disabled subscriber silently drops. This is harmless and is the same timing window that caused the panic in the first place.

Normal flows are identical:

  • Tracing on → both blocks execute (same as before)
  • Tracing off → neither block executes (same as before)

The panic is eliminated entirely.

Error propagation fix (additional)

The post-operation root_hash() call in both logging blocks previously used .map_err(Error::from)?, which could cause an early return from logging-only code. This had two consequences:

  • grove_apply_batch_with_add_costs: If root_hash() errored after the batch was already applied, the ? would skip push_drive_operation_result, silently dropping the operation costs from drive_operations and skewing fee accounting.
  • grove_clear: If root_hash() errored after a successful clear, the function would return an error instead of the successful result, making the caller believe the clear failed.

The post-op root_hash() is only needed for trace output. Errors are now absorbed and logged as warnings instead of propagated, ensuring the actual operation result is always returned to the caller.

CI Failure Evidence

Summary by CodeRabbit

  • Bug Fixes

    • Removed unsafe unwraps in logging paths that could cause panics; logging now fails gracefully and emits warnings when needed.
  • Refactor

    • Logging and trace logic reworked so logs are only prepared and emitted when required parameters are present, preserving prior log fields on success while avoiding crashes when retrievals fail.

Validation

  1. What was tested

    • cargo test -p dash-sdk --lib — PutSettings propagation for token operations
  2. Results

    • All Rust checks passing on CI
  3. Environment

    • GitHub Actions CI on dashpay/platform

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 697846e and 37cdec5.

📒 Files selected for processing (2)
  • packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs
  • packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs

📝 Walkthrough

Walkthrough

Refactored two grove-operation logging blocks to guard on maybe_params_for_logs and the root_hash retrieval. Logging now only runs when parameters exist; root_hash extraction is matched to log a trace on Ok and emit a warning on Err, avoiding unwraps and potential panics.

Changes

Cohort / File(s) Summary
Grove operations logging safety
packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs, packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs
Replace unconditional unwrap/expect paths with if let Some(...) guarding maybe_params_for_logs; match on root_hash Result to log trace on Ok(root_hash) and warn on Err(e). Preserve existing log fields and keep logging tied to tracing and successful cost context.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hopped through code where panics lay,
I nudged the guards and tucked them away,
Now traces sing when hashes show,
Warnings whisper when they don't — soft and slow,
A safer burrow for logs to stay. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main change: eliminating panics in grovedb operations logging during concurrent execution by replacing unsafe unwrap-based logging with guarded optional handling.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs`:
- Around line 89-93: The current logging block calls
self.grove.root_hash(...).unwrap().map_err(?) which can early-return and skip
the later push_drive_operation_result, losing cost accounting; change the
root_hash retrieval in the logging path to absorb errors instead of propagating
them: call self.grove.root_hash(transaction, &drive_version.grove_version) and
handle the Result with match or if let (log the root hash on Ok and log a
warning on Err) so no ? is used there, ensuring push_drive_operation_result
always runs and drive_operations cost is preserved.

In `@packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs`:
- Around line 52-56: The call to self.grove.root_hash(transaction,
&drive_version.grove_version) used only for logging currently uses ?/map_err and
can return an error that overrides the actual successful commit in result;
change this logging-path call to absorb any error instead of propagating it:
call self.grove.root_hash(... ) and handle the Err case (e.g., match or if let
Err(e) = ...) by logging the error with context but continuing, and only use the
Ok(root_hash) value for logging; ensure the variable names root_hash and result
are preserved so the commit result is returned unchanged.

…rrent execution

Replace independent tracing::event_enabled! re-checks with if-let on
the previously captured Option. The old code checked event_enabled!
twice independently — once before the grovedb operation to decide
whether to clone params for logging, and again after to decide whether
to log. If the tracing dispatcher state changed between the two checks,
the second path would .expect() on a None value and panic.

Additionally, replace the error-propagating ? operator in the post-op
root_hash() call with match-based error absorption. The old code used
.map_err(Error::from)? which could cause an early return from a
logging-only code path — skipping push_drive_operation_result (losing
fee accounting) or shadowing a successful result with a logging error.
The post-op root_hash is only needed for trace output, so errors are
now logged as warnings instead of propagated.

Both affected sites fixed:
- grove_apply_batch_with_add_costs/v0/mod.rs
- grove_clear/v0/mod.rs
@thepastaclaw thepastaclaw force-pushed the fix/grovedb-logging-panic branch from 697846e to 6d26b2c Compare February 21, 2026 21:09
@thepastaclaw
Copy link
Contributor Author

Addressed both CodeRabbit findings in the force-push:

? on root_hash() in logging path — Replaced .map_err(Error::from)? with match-based error absorption in both files. The post-op root_hash() is only needed for trace output, so errors are now logged as warnings instead of propagated. This prevents logging code from skipping push_drive_operation_result (losing fee accounting) or shadowing a successful result with a logging error.

Thanks for catching this — it was a pre-existing issue but good to fix while touching this code.

Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine; I'm not positive about 6d26b2c so sam / others should evaluate if that is ok. it may change behavior in a way that isn't, I'm not sure

shumkov
shumkov previously approved these changes Feb 23, 2026
@QuantumExplorer
Copy link
Member

@lklimek can you review this one, if you accept it, we should merge it in. I really have no idea what this means.

@thepastaclaw
Copy link
Contributor Author

Thanks for the ping. Quick summary of what the commit changes:

  • The previous code did two separate event_enabled! checks and then did maybe_params_for_logs.expect(...) in the second block.
  • If those checks disagreed at runtime, we could panic ("log params should be set above").
  • The fix removes that assumption by using if let Some(...) from the params captured in the first check.

The extra behavior change in 6d26b2c is only in logging-path error handling:

  • post-op root_hash() in trace logging no longer uses ?.
  • if that logging-only root_hash() fails, we now emit a warning and still return the actual operation result.

So this avoids a panic and prevents logging-only failures from changing operation outcomes.

@lklimek could you please take a look when you have a moment?

(Also: I tried to add you as reviewer via CLI, but I don’t have permission to request reviewers in this repo.)

@lklimek lklimek self-requested a review February 24, 2026 06:54
@lklimek
Copy link
Contributor

lklimek commented Feb 24, 2026

Claw you have changed the logic. We don't want this logging preparation code to be executed when logging is disabled. This PR should replace panicking code like expect() in logging without returning from function. Log errors instead. That's all.

Restore the tracing::event_enabled!() check on the post-operation logging
blocks to preserve the original control flow: logging preparation code
should not execute when logging is disabled.

The expect() panic fix (if let Some) and root_hash error absorption
(match instead of ?) are preserved.
@thepastaclaw thepastaclaw dismissed stale reviews from shumkov and PastaPastaPasta via 37cdec5 February 25, 2026 05:56
@thepastaclaw
Copy link
Contributor Author

You're right, my bad — I removed the event_enabled! guard from the second block, which would let the logging-path code (root_hash fetch, etc.) execute even when tracing is disabled. That's not what we want.

Fixed in 37cdec5: restored the event_enabled! + is_ok() guard on both post-operation logging blocks. The only changes from the original are now:

  1. expect("log params should be set above")if let Some(...) (no panic if the two event_enabled! checks disagree under concurrent log-level changes)
  2. root_hash().unwrap().map_err(Error::from)?match with Ok → trace, Err → warn (no early return from logging-only code)

Same control flow as before, just panic-free and no ? in the logging path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready for final review Ready for the final review. If AI was involved in producing this PR, it has already had a reviewer.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants