Replace dual-sync-async persistence panic with Watch contract#4436
Replace dual-sync-async persistence panic with Watch contract#4436joostjager wants to merge 3 commits intolightningdevkit:mainfrom
Conversation
|
👋 Thanks for assigning @TheBlueMatt as a reviewer! |
721f491 to
9aff3f2
Compare
9aff3f2 to
93a9d79
Compare
|
I didn't add a dedicated test for the InProgress -> complete -> Completed progression since it's already well-covered by existing tests that exercise this flow without disabling the contract check: The |
c20c819 to
77a6a30
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4436 +/- ##
==========================================
+ Coverage 85.94% 85.97% +0.03%
==========================================
Files 159 159
Lines 104607 104618 +11
Branches 104607 104618 +11
==========================================
+ Hits 89901 89943 +42
+ Misses 12194 12168 -26
+ Partials 2512 2507 -5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
TheBlueMatt
left a comment
There was a problem hiding this comment.
There's more context in the PR description than the commit message, please update the commit message with the context (PR descriptions should almost always just be the commit messages themselves!). Also, IIRC the chanmon_consistency target is conservative to avoid violating the old API, we should update it to allow returning Completed more often to match the API changes here.
|
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer. |
77a6a30 to
218e89a
Compare
|
Updated fuzzer |
fuzz/src/chanmon_consistency.rs
Outdated
| @@ -2101,21 +2101,57 @@ pub fn do_test<Out: Output + MaybeSend + MaybeSync>( | |||
| // harm in doing so. | |||
| 0x00 => { | |||
| *mon_style[0].borrow_mut() = ChannelMonitorUpdateStatus::InProgress; | |||
There was a problem hiding this comment.
We can probably drop mon_style again, no? Given we added it in the being-basically-reverted commit?
There was a problem hiding this comment.
It's currently only used to remember what style to use after a node reload. But I could read from update_ret too and set that again on the new persister.
We could also reduce the number of command bytes by defining a command to toggle the mode.
Commit 0760f99 ("Disallow dual-sync-async persistence without restarting") added a panic in non-test builds when a Persist implementation returns both Completed and InProgress from the same ChannelManager instance. However, this check runs against the status that ChainMonitor returns to ChannelManager, not the raw Persist result. When ChannelMonitor::update_monitor fails (e.g. a counterparty commitment_signed arrives after a funding spend confirms), ChainMonitor persists the full monitor successfully but overrides the return value to InProgress. If the user's Persist impl only ever returns Completed, this override triggers a false mode-mismatch panic. This replaces the panic with a cleaner contract at the Watch trait level: a Watch implementation must not return Completed for a channel update if there are still pending InProgress updates for that channel. This matches the pure-async interface where an update cannot complete until all prior updates have also completed. Persist implementors are expected to be consistent (always sync or always async), so ChainMonitor naturally satisfies this contract without code changes. The mode tracking and panic checks from 0760f99 are removed and replaced with a panic that validates the new contract directly on the in-flight update state. Legacy tests that switch the persister between modes mid-flight can opt out via Node::disable_monitor_completeness_assertion(). The chanmon_consistency fuzz target is also updated to propagate monitor style changes to the live persister immediately rather than only on reload. Switching to Completed is guarded to only take effect when no channels have pending InProgress updates, satisfying the new Watch contract. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
218e89a to
7e2b509
Compare
|
Pushed two commits that remove The only functional change is that nodes now always start in sync mode, but the fuzzer can immediately toggle it to async. On reload, the style is carried over. |
TheBlueMatt
left a comment
There was a problem hiding this comment.
DId you run the fuzzer for a while? I'm not sure that we shouldn't be able to hit a failure like the below.
| /// [`Persist`]/[`Watch`] without first restarting. | ||
| /// A [`Watch`] implementation must not return this for a channel update if there are still | ||
| /// pending [`InProgress`] updates for that channel. That is, an update can only be considered | ||
| /// complete once all prior updates have also completed. |
There was a problem hiding this comment.
Hmm, thinking about this a bit more there's a weird race. Async updates are "completed" only after release_pending_monitor_events returns the completed update, but also only after it has been processed, which may be some arbitrary and indeterminate amount of time later.
For the ChainMonitor return case, this is fine (though we need to update ChainMonitor - we should probably be returning InProgress for any new update that comes after one for which we have to return a spurious InProgress. Do you already intend to do that in a followup? should we not do it here?). But for the general API its pretty weird - you can in theory return a Completed after returning InProgress as long as you want some indeterminate and arbitrary amount of time, so in practice you can't...
We could fix this by adding some mutual exclusion in channelmanager.rs where we wait before processing Completed monitor updates until after any pending MonitorEvents are processed - this should be basically zero cost as ~no one is going to be using mixed-async-sync persistence in practice so we'll rarely if ever have any contention.
On the flip side, we could say that implementations are allowed to flip from Completed to InProgress for a channel, but never back (without a restart). Its more complicated to document, but it captures what we need to allow the ChainMonitor behavior.
There was a problem hiding this comment.
implementations are allowed to flip from Completed to InProgress for a channel, but never back
This sounds like the easier way. There is no need to support mixing sync and async for Persist implementations. I do wonder now whether my initial version #4435 wasn't just sufficient. Moving the mixed mode check to the chainmonitor level and keeping everything else the same.
Fuzzer run was planned for tonight.
The mon_style array was a shadow copy of each node's persistence mode, kept in sync with the persister's update_ret. On node reload, it was used to restore the persistence mode to the new monitor. Instead, read the current update_ret directly from the old monitor's persister during reload, making the persister the single source of truth. This also removes the initial fuzz input byte that was used to seed mon_style, so all nodes now always start with Completed as their persistence mode (the fuzzer can still switch to InProgress via the 0x00-0x02 opcodes). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the 6 separate opcodes (0x00-0x02 for InProgress, 0x04-0x06 for Completed) with 3 toggle opcodes (0x00-0x02) that flip the persistence style for each node. When toggling back to Completed, the existing guard that prevents switching while monitors are pending is preserved. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7e2b509 to
c37be93
Compare
Fix false panic by relaxing the
Watchcontract from a global "pick one mode" restriction to a per-channel rule: don't returnCompletedwhile priorInProgressupdates are still pending for that channel.