✨ feat(soft): detect and break stale locks #476
Merged
gaborbernat merged 5 commits into tox-dev:main on Feb 14, 2026
SoftFileLock leaves orphaned lock files when the holding process crashes. Other processes then block forever waiting for a lock that will never be released. Now the lock file stores the holder's PID and hostname. On contention, if the holder is on the same host and its PID no longer exists, the stale lock is atomically renamed away and removed, allowing acquisition to proceed on the next retry. All detection errors are suppressed to preserve backward compatibility with empty or foreign lock files.
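A minimal sketch of the acquire side, assuming POSIX-style `os.open` with `O_CREAT | O_EXCL` and the `{pid}\n{hostname}\n` layout given in the PR description. `try_acquire` and `release` are hypothetical helper names for illustration, not filelock's API.

```python
from __future__ import annotations

import os
import socket


def try_acquire(lock_path: str) -> int | None:
    """Create the lock file exclusively and record this process's PID and hostname.

    Returns the open fd on success, or None if the lock file already exists.
    """
    try:
        # O_EXCL makes creation atomic: it fails with EEXIST if the file exists.
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return None  # contention: the caller may check for a stale holder and retry
    # Metadata layout from the PR description: "{pid}\n{hostname}\n".
    os.write(fd, f"{os.getpid()}\n{socket.gethostname()}\n".encode())
    return fd


def release(fd: int, lock_path: str) -> None:
    """Close the held fd and delete the lock file."""
    os.close(fd)
    try:
        os.unlink(lock_path)
    except FileNotFoundError:
        pass  # already removed
```

Because the file is created before the metadata is written, a competing process can briefly see an empty lock file, which is one reason detection has to tolerate unreadable contents.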
Force-pushed from 4a3c34f to 1d21380.
On Windows, opening the lock file for reading during stale detection blocked concurrent file deletion in _release, causing a livelock under heavy threaded contention (100 threads × 100 iterations). The fix uses CreateFileW with FILE_SHARE_DELETE when reading lock info on Windows, allowing _release's unlink to succeed even while another thread reads the file. On Unix, os.open/os.read/os.close is used instead of Path.read_text for consistent low-level fd handling.
The ty type checker does not narrow sys.platform across method boundaries, so ctypes.windll in a separate _read_lock_info_win method was flagged as unresolved on Linux. Inlined the Windows branch into _read_lock_info under the sys.platform guard.
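The two commits above can be illustrated with a single reader that keeps the Windows branch inline under the `sys.platform` guard. This is a sketch, not the PR's code: `read_lock_info` is a hypothetical name, it loads kernel32 through `ctypes.WinDLL` rather than `ctypes.windll` so setting `argtypes` does not touch the shared loader, and it wraps the handle with `msvcrt.open_osfhandle` so one `os.close` releases it. Per the final PR description, Windows stale detection was later dropped, so the Windows branch only shows this intermediate approach.

```python
from __future__ import annotations

import os
import sys


def read_lock_info(lock_path: str) -> tuple[int, str] | None:
    """Return (pid, hostname) from a lock file, or None if missing, empty, or foreign."""
    try:
        if sys.platform == "win32":
            # Inlined under the guard so a type checker only sees Windows-only
            # ctypes/msvcrt names where sys.platform is known to be "win32".
            import ctypes
            import msvcrt
            from ctypes import wintypes

            GENERIC_READ = 0x80000000
            SHARE_ALL = 0x1 | 0x2 | 0x4  # FILE_SHARE_READ | WRITE | DELETE
            OPEN_EXISTING = 3
            create_file = ctypes.WinDLL("kernel32", use_last_error=True).CreateFileW
            create_file.argtypes = [
                wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD, wintypes.LPVOID,
                wintypes.DWORD, wintypes.DWORD, wintypes.HANDLE,
            ]
            create_file.restype = wintypes.HANDLE
            handle = create_file(lock_path, GENERIC_READ, SHARE_ALL, None, OPEN_EXISTING, 0, None)
            if handle is None or handle == wintypes.HANDLE(-1).value:  # INVALID_HANDLE_VALUE
                return None
            # Wrap the HANDLE in a CRT fd; os.close() below closes the handle too.
            fd = msvcrt.open_osfhandle(handle, os.O_RDONLY)
        else:
            # Low-level fd handling on Unix instead of Path.read_text.
            fd = os.open(lock_path, os.O_RDONLY)
        try:
            data = os.read(fd, 1024)
        finally:
            os.close(fd)
        pid_text, hostname = data.decode().splitlines()[:2]
        return int(pid_text), hostname
    except (OSError, ValueError):
        # Missing, empty, or foreign files are not treated as stale.
        return None
```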
On Windows, even with FILE_SHARE_DELETE, a file marked for deletion keeps its name visible until the last handle closes. With 100 threads competing, there is always a reader holding a handle, so the file never fully disappears, causing a livelock. Skip stale detection entirely for lock files younger than 2 seconds. During normal threaded contention, lock files are sub-second old, while genuinely stale locks persist much longer. This eliminates the read-handle contention without sacrificing stale lock recovery.
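A sketch of the age gate, assuming age is measured from the file's modification time (the commit message does not say how age is computed). `old_enough_to_check` is a hypothetical name; only the 2-second threshold comes from the text.

```python
import os
import time

MIN_STALE_AGE = 2.0  # seconds, the threshold from the commit message


def old_enough_to_check(lock_path: str, min_age: float = MIN_STALE_AGE) -> bool:
    """Return True only if the lock file is at least min_age seconds old.

    Younger files are assumed to belong to ordinary contention, so their
    contents are never read and no read handle is opened on them.
    """
    try:
        mtime = os.stat(lock_path).st_mtime
    except OSError:
        return False  # vanished or unreadable: nothing to break
    return time.time() - mtime >= min_age
```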
When a process holding a `SoftFileLock` crashes, the lock file is left behind and blocks every other process forever. This is especially painful on CI and in long-running daemons, where manual cleanup is impractical.

The lock file now stores `{pid}\n{hostname}\n` on acquire. On contention (`EEXIST`), the competing process reads this metadata, verifies that the hostname matches, and probes whether the holding PID is still alive (`os.kill(pid, 0)` on Unix, `OpenProcess(SYNCHRONIZE)` on Windows). If the holder is confirmed dead, the stale lock is broken via an atomic `rename` + `unlink` sequence that avoids races between concurrent breakers.

Stale detection is Unix/macOS only. On Windows, Python's C runtime (`_wopen`) cannot set `FILE_SHARE_DELETE`, so any read handle on the lock file blocks `DeleteFileW` during release, causing a livelock under threaded contention. Windows already distinguishes `EACCES` (holder alive, fd open) from `EEXIST` (file exists, no active holder), and in practice `EEXIST` resolves quickly as the releasing thread deletes the file. Cross-host stale locks are also left untouched, since PID liveness cannot be verified remotely.
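A Unix-only sketch of the contention side described above: hostname check, `os.kill(pid, 0)` liveness probe, then `rename` to a unique name followed by `unlink`. The helper names and the `.stale` suffix are hypothetical. Do not use this probe on Windows, where `os.kill` with any signal other than `CTRL_C_EVENT`/`CTRL_BREAK_EVENT` terminates the target process.

```python
from __future__ import annotations

import os
import socket
import uuid


def pid_is_alive(pid: int) -> bool:
    """Unix-only liveness probe: signal 0 checks for the process without signalling it."""
    if pid <= 0:
        # In kill(2), 0 and negative values do not name a single process; never call them dead.
        return True
    try:
        os.kill(pid, 0)
    except ProcessLookupError:  # ESRCH: no such process
        return False
    except PermissionError:  # EPERM: process exists but belongs to another user
        return True
    return True


def break_if_stale(lock_path: str, info: tuple[int, str]) -> bool:
    """Remove lock_path if its recorded holder is on this host and no longer running.

    Returns True if this caller broke the lock, False otherwise.
    """
    pid, hostname = info
    if hostname != socket.gethostname():
        return False  # cross-host lock: PID liveness cannot be verified remotely
    if pid_is_alive(pid):
        return False
    # Rename to a path unique to this caller. rename() is atomic, so when several
    # processes try to break the same file, only one rename succeeds; the others
    # get FileNotFoundError and simply retry acquisition.
    tombstone = f"{lock_path}.{uuid.uuid4().hex}.stale"
    try:
        os.rename(lock_path, tombstone)
    except OSError:
        return False
    os.unlink(tombstone)  # no one else knows this path, so this unlink cannot race
    return True
```

The rename only makes concurrent breakers safe: one of them moves a given file, and each unlinks only its own unique path. On its own it does not guard against the holder releasing and a new holder re-acquiring between the metadata read and the rename; the PR description does not say how that window is handled.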