Skip to content

tooldata_db: Fix signal handling - waiting for multiple processes#3833

Open
smoe wants to merge 1 commit intoLinuxCNC:2.9from
smoe:fix_tooldata_signal_handling
Open

tooldata_db: Fix signal handling - waiting for multiple processes#3833
smoe wants to merge 1 commit intoLinuxCNC:2.9from
smoe:fix_tooldata_signal_handling

Conversation

@smoe
Copy link
Collaborator

@smoe smoe commented Feb 26, 2026

I need some help with this one. It is all a bit confusing. Am I confused?

In tooldata_db.cc (line 67), the old code does:

  1. waitpid(..., WNOHANG) in a while (...) ; loop with an empty body.
  2. Then evaluates WIFSIGNALED(status) and WIFEXITED(status) once, after the loop.

The problem is:

  1. waitpid only writes status when it returns > 0.
  2. With WNOHANG, it can return 0 immediately (no state change yet), or -1 on error.
  3. In those cases, status is not guaranteed to be written, thus left undefined.
  4. That can cause false logs and incorrectly set db_live = 0.

What the patch changes:

  1. It moves WIFSIGNALED/WIFEXITED checks inside the while (waitpid(...) > 0) loop.
  2. status is only inspected in iterations where waitpid definitely produced a valid child status.
  3. If no child was ready, the loop does not run, and nothing is logged or toggled.
  4. If multiple children changed state, each valid status is handled, instead of only the last one.

I patched this against 2.9. But if that is too risky then maybe 2.10 is preferable to give this some time to mature? But then again, we should somehow figure out if this change is correct or not. :-)

In tooldata_db.cc (line 67), the old code does:

1. waitpid(..., WNOHANG) in a while (...) ; loop with an empty body.
2. Then evaluates WIFSIGNALED(status) and WIFEXITED(status) once, after the loop.

The problem is:

1. waitpid only writes status when it returns > 0.
2. With WNOHANG, it can return 0 immediately (no state change yet), or -1 on error.
3. In those cases, status is not guaranteed to be written.
4. The code then reads status anyway via WIF* macros, so it can read stale/uninitialized data.
5. That can cause false logs and incorrectly set db_live = 0.

What the patch changes:

1. It moves WIFSIGNALED/WIFEXITED checks inside the while (waitpid(...) > 0) loop.
2. status is only inspected in iterations where waitpid definitely produced a valid child status.
3. If no child was reapable, the loop does not run, and nothing is logged or toggled.
4. If multiple children changed state, each valid status is handled, instead of only the last one.
@smoe smoe added bug for-discussion-only This pull request is intended as a basis for discussion, not to be merged as-is v2.9 candidate Things that would be nice to have for 2.9 2.10-candidate would be nice to have fixed in 2.10 labels Feb 26, 2026
@rmu75
Copy link
Collaborator

rmu75 commented Feb 26, 2026

Is there a real problem?

To me it looks like SIGCHLD cause signal handler to fire, the while loop eats all available signal info until the WNOHANG case causes waitpid to return 0. If waitpid indeed doesn't alter status in this case (man page is avoiding that topic), it should contain the status of the last signal returned by watipid (and there should be one because we are in a signal handler).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2.10-candidate would be nice to have fixed in 2.10 bug for-discussion-only This pull request is intended as a basis for discussion, not to be merged as-is v2.9 candidate Things that would be nice to have for 2.9

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants