Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 26 additions & 1 deletion doc/ChangeLog.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ Change Log

All relevant changes are documented in this file.

[4.16][] - Unreleased
[4.16][UNRELEASED] - Unreleased
---------------------

### Changes
Expand All @@ -18,13 +18,37 @@ All relevant changes are documented in this file.
systemd `RemainAfterExit=yes`, by Aaron Andersen
- Clear service conditions on `initctl reload NAME` to ensure dependent
services are properly updated
- Run service `stop:` and `reload:` scripts as non-blocking processes,
preventing Finit from stalling on long-running helper scripts
- Guard shutdown with timer watchdog to detect and debug shutdown hangs
- Add `~` condition prefix to propagate reload from a dependency to the
dependent service. E.g., `<!~pid/netd>` means not only a regular
condition, but when `netd` reloads, restart this service too. Similar
to systemd's directive `PropagatesReloadTo=`, but declared on the
consumer side. Issue #416

### Fixes
- Fix #464: invalid user:group examples in cgroups.md
- Fix #466: elogind path for Debian-based distros, by Jackie Liu
- Fix #467: TTY services stuck in restart state after non-zero exit.
Throttling logic introduced in v4.15 had duplicate checks causing
infinite timer loop, and TTYs lacked default restart timeout
- Fix #475: clear pid condition on service collection to fix stale
deps. When a service crashes (SIGKILL), the RUNNING → HALTED path
bypasses STOPPING where `cond_clear()` is normally called, leaving
dependents stuck
- Fix #476: dependents not restarted after SIGHUP reload of service in
dependency chain. Add `service_step_all()` at end of reload cycle to
guarantee convergence after conditions are reasserted. See also the
new `~` condition prefix (above) to propagate reload to dependents
- Fix reload of SIGHUP-capable services incorrectly disrupting their
dependents. E.g., `initctl reload syslogd` would stop dbus, dnsmasq,
etc. even though syslogd handles SIGHUP gracefully and its PID persists
- Only remove managed pidfiles in service cleanup. For SysV services
with `pid:!/path`, the pidfile belongs to the service itself and Finit
should not delete it
- Silence spurious cgroup warnings for short-lived processes where the
kernel reaps the child before cgroup assignment completes
- Fix handling of already-mounted cgroups in `cgroup_init()`, can occur
after switch_root or in container environments
- Improve cgroups documentation clarity, grammar, and examples
Expand Down Expand Up @@ -1941,6 +1965,7 @@ Major bug fix release.
* Initial release

[UNRELEASED]: https://github.com/finit-project/finit/compare/4.15...HEAD
[4.16]: https://github.com/finit-project/finit/compare/4.15...4.16
[4.15]: https://github.com/finit-project/finit/compare/4.14...4.15
[4.14]: https://github.com/finit-project/finit/compare/4.13...4.14
[4.13]: https://github.com/troglobit/finit/compare/4.12...4.13
Expand Down
37 changes: 33 additions & 4 deletions doc/conditions.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,11 @@ specified separated by comma. Multiple conditions are logically AND'ed
during evaluation, i.e. all conditions must be satisfied in order for a
service to run.

A special syntax, using a leading `!` in run/task/service conditions,
denote if a:
Two special prefixes can be used inside the angle brackets:

- service does not support `SIGHUP`
- run/task should not block runlevel changes (i.e., bootstrap)
- `!` -- service does not support `SIGHUP` (noreload), or run/task
should not block runlevel changes (i.e., bootstrap)
- `~` -- propagate reload from this dependency, see below

Finit guarantees by default that all run/tasks run (at least) once
per runlevel. For most tasks this is a good default, for example
Expand Down Expand Up @@ -53,6 +53,30 @@ moving to the next runlevel after bootstrap, so we set `<!>`:
task [S0123456789] <!sys/pwr/fail> name:pwrfail initctl poweroff -- Power failure, shutting down


Propagating Reload in Dependencies
-----------------------------------

By default, when a service reloads during `initctl reload`, dependent
services are paused (`SIGSTOP`) and simply resumed (`SIGCONT`) when the
condition is reasserted. This is correct for barrier-style dependencies
like `<pid/syslogd>`, where dependents just need syslogd running and do
not care if it reloads its config.

For services that need to react when their upstream reloads, the `~`
prefix propagates the reload from the dependency:

service <pid/svc_a> name:svc_b /sbin/svc_b -- Needs A (barrier)
service <!~pid/svc_b> name:svc_c /sbin/svc_c -- Propagate reload from B

Here, `<~pid/svc_b>` means: propagate a reload of `svc_b` to `svc_c`.
When `svc_b` reloads, `svc_c` will be restarted (because of `!`,
noreload) instead of merely resumed. If `svc_c` supported `SIGHUP`
(no `!` prefix), it would be sent `SIGHUP` instead.

This is similar to systemd's `PropagatesReloadTo=` directive, but
declared on the consumer side rather than the provider side.


Triggering
----------

Expand Down Expand Up @@ -281,6 +305,11 @@ This STOP/CONT handling minimizes the number of unnecessary service
restarts that would otherwise occur because a depending service was sent
`SIGHUP` for example.

Services with the `~` prefix are an exception to this rule: when their
conditions return to `on` after being in `flux`, the reload is propagated
-- the service is reloaded (SIGHUP) or restarted (noreload `!`) instead
of simply being resumed.

Therefore, any plugin that supplies Finit with conditions must ensure
that their state is updated after each reconfiguration. This can be
done by binding to the `HOOK_SVC_RECONF` hook. For an example of how
Expand Down
8 changes: 8 additions & 0 deletions doc/config/services.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,14 @@ waits for another service, `zebra`, to have created its PID file in
*all* files in `/var/run`, for each file named `*.pid`, or `*/pid`,
Finit opens it and find the matching `NAME:ID` using the PID.

The condition can be prefixed with `!` and/or `~`:

- `<!pid/zebra>` -- `ospfd` does not support `SIGHUP` (noreload)
- `<~pid/zebra>` -- propagate reload from `zebra` to `ospfd`
- `<!~pid/zebra>` -- both: noreload and propagate reload

For details, see the [Finit Conditions](../conditions.md) document.

Some services do not maintain a PID file and rather than patching each
application Finit provides a workaround. A `pid` keyword can be set
to have Finit automatically create (when starting) and later remove
Expand Down
3 changes: 2 additions & 1 deletion plugins/pidfile.c
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ static int pidfile_add_path(struct iwatch *iw, char *path)
}
}

return iwatch_add(iw, path, IN_ONLYDIR | IN_CLOSE_WRITE);
return iwatch_add(iw, path, IN_ONLYDIR | IN_CLOSE_WRITE | IN_ATTRIB);
}

static void pidfile_update_conds(char *dir, char *name, uint32_t mask)
Expand Down Expand Up @@ -276,6 +276,7 @@ static void pidfile_reconf(void *arg)
if (cond_get(cond) == COND_ON)
continue;

dbg("Reassert condition %s", cond);
cond_set_path(cond_path(cond), COND_ON);
}

Expand Down
15 changes: 10 additions & 5 deletions src/api.c
Original file line number Diff line number Diff line change
Expand Up @@ -126,17 +126,22 @@ static int reload(svc_t *svc, void *user_data)

/*
* Clear conditions before reload to ensure dependent services
* are properly updated. The conditions are reasserted when
* the service touches its PID file after processing SIGHUP.
* are properly updated. Only needed when the service does NOT
* support SIGHUP (noreload), because then it will be stopped
* and restarted, so conditions genuinely go away. When the
* service handles SIGHUP, its PID and pidfile persist, so the
* condition stays valid and dependents should not be disrupted.
*
* Note: only clear 'ready' for services where the pidfile
* inotify handler reasserts it (pid/none). For s6/systemd
* services readiness relies on their respective notification
* mechanism which may not re-trigger on SIGHUP.
*/
cond_clear(mkcond(svc, cond, sizeof(cond)));
if (svc->notify == SVC_NOTIFY_PID || svc->notify == SVC_NOTIFY_NONE)
service_ready(svc, 0);
if (svc_is_noreload(svc)) {
cond_clear(mkcond(svc, cond, sizeof(cond)));
if (svc->notify == SVC_NOTIFY_PID || svc->notify == SVC_NOTIFY_NONE)
service_ready(svc, 0);
}

svc_mark_dirty(svc);
service_step(svc);
Expand Down
6 changes: 4 additions & 2 deletions src/cgroup.c
Original file line number Diff line number Diff line change
Expand Up @@ -190,7 +190,8 @@ int cgroup_move_pid(const char *group, const char *name, int pid, int delegate)
}

if (rc) {
warn("Failed moving pid %d to cgroup %s", pid, path);
if (errno != ESRCH)
warn("Failed moving pid %d to cgroup %s", pid, path);
return -1;
}
}
Expand Down Expand Up @@ -379,7 +380,8 @@ static int cgroup_leaf_init(const char *group, const char *name, int pid, const
if (!strcmp(group, "root"))
return fnwrite(str("%d", pid), FINIT_CGPATH "/cgroup.procs");

if (cgroup_move_pid(group, name, pid, delegate))
/* Error if error is not "No such process" */
if (cgroup_move_pid(group, name, pid, delegate) && errno != ESRCH)
err(1, "Failed moving pid %d to group %s/%s", pid, group, name);

/* Set up inotify watch for cgroup cleanup */
Expand Down
13 changes: 13 additions & 0 deletions src/conf.c
Original file line number Diff line number Diff line change
Expand Up @@ -745,8 +745,21 @@ void conf_parse_cond(svc_t *svc, char *cond)
return;
}

/*
* The '~' prefix means a reload of the upstream service is
* propagated to this service -- it will be reloaded (SIGHUP)
* or restarted (noreload), not just paused and resumed.
* Syntax: <!~pid/foo,~pid/bar> or <~pid/foo>
*
* Strip '~' from each condition and set the service-level
* flag. This allows '~' on any condition in the list.
*/
svc->cond[0] = 0;
for (i = 0, c = strtok(ptr, ","); c; c = strtok(NULL, ","), i++) {
if (c[0] == '~') {
c++;
svc->flux_reload = 1;
}
devmon_add_cond(c);
if (i)
strlcat(svc->cond, ",", sizeof(svc->cond));
Expand Down
72 changes: 43 additions & 29 deletions src/service.c
Original file line number Diff line number Diff line change
Expand Up @@ -1031,7 +1031,6 @@ static int service_run_script(svc_t *svc, char *script)
{
const char *id = svc_ident(svc, NULL, 0);
pid_t pid = service_fork(svc);
int status, rc;

if (pid < 0) {
err(1, "%s: failed forking off script %s", id, script);
Expand All @@ -1047,6 +1046,8 @@ static int service_run_script(svc_t *svc, char *script)
};
char pidbuf[16];

redirect(svc);

snprintf(pidbuf, sizeof(pidbuf), "%d", svc->pid);
setenv("MAINPID", pidbuf, 1);

Expand All @@ -1056,23 +1057,7 @@ static int service_run_script(svc_t *svc, char *script)
}

dbg("%s: script '%s' started as PID %d", id, script, pid);
if (waitpid(pid, &status, 0) == -1) {
warn("%s: failed calling script %s", id, script);
return -1;
}

rc = WEXITSTATUS(status);
if (WIFEXITED(status)) {
dbg("%s: script '%s' exited without signal, status: %d", id, script, rc);
} else if (WIFSIGNALED(status)) {
dbg("%s: script '%s' terminated by signal %d", id, script, WTERMSIG(status));
if (!rc)
rc = 1;
} else {
dbg("%s: script '%s' exited with status: %d", id, script, rc);
}

return rc;
return service_script_add(svc, pid, svc->killdelay);
}

/* Ensure we don't have any notify socket lingering */
Expand All @@ -1093,15 +1078,31 @@ static void service_notify_stop(svc_t *svc)
*/
static void service_cleanup(svc_t *svc)
{
char *fn;
char cond[MAX_COND_LEN];

/* PID collected, cancel any pending SIGKILL */
service_timeout_cancel(svc);

fn = pid_file(svc);
if (fn && remove(fn) && errno != ENOENT)
logit(LOG_CRIT, "Failed removing service %s pidfile %s",
svc_ident(svc, NULL, 0), fn);
/* Only clean up process' pidfile if managed by us */
if (svc_has_pidfile(svc)) {
char *fn = pid_file(svc);

if (remove(fn) && errno != ENOENT)
logit(LOG_CRIT, "Failed removing service %s pidfile %s",
svc_ident(svc, NULL, 0), fn);
}

/*
* Invalidate the pid/ condition for this service to ensure
* dependent services are properly stopped and restarted.
* Without this, the condition is only cleared asynchronously
* via inotify on pidfile removal, which may not trigger when
* the daemon fails to clean up its own pidfile, or when the
* service dies during a reload cycle and goes directly from
* RUNNING to HALTED (skipping STOPPING where cond_clear()
* is normally called).
*/
cond_clear(mkcond(svc, cond, sizeof(cond)));

service_notify_stop(svc);

Expand Down Expand Up @@ -1140,7 +1141,7 @@ int service_stop(svc_t *svc)
service_timeout_cancel(svc);

if (svc->stop_script[0]) {
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling stop:%s ...", id, svc->pid, svc->stop_script);
logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], calling stop:%s ...", id, svc->pid, svc->stop_script);
} else if (!svc_is_sysv(svc)) {
char *nm = pid_get_name(svc->pid, NULL, 0);
const char *sig = sig_name(svc->sighalt);
Expand All @@ -1155,10 +1156,10 @@ int service_stop(svc_t *svc)
}

dbg("Sending %s to pid:%d name:%s(%s)", sig, svc->pid, id, nm);
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], stopping, sending %s ...", id, svc->pid, sig);
logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], sending %s ...", id, svc->pid, sig);
} else {
compose_cmdline(svc, cmdline, sizeof(cmdline));
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling '%s stop' ...", id, svc->pid, cmdline);
logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], calling '%s stop' ...", id, svc->pid, cmdline);
}

/*
Expand Down Expand Up @@ -1278,16 +1279,16 @@ static int service_reload(svc_t *svc)
print_desc("Restarting ", svc->desc);

if (svc->reload_script[0]) {
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling reload:%s ...", id, svc->pid, svc->reload_script);
logit(LOG_CONSOLE | LOG_NOTICE, "Reloading %s[%d], calling reload:%s ...", id, svc->pid, svc->reload_script);
rc = service_run_script(svc, svc->reload_script);
} else if (svc->sighup) {
if (svc->pid <= 1) {
dbg("%s[%d]: bad PID, cannot reload service", id, svc->pid);
svc->start_time = svc->pid = 0;
goto done;
}
dbg("%s[%d], sending SIGHUP", id, svc->pid);
logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], sending SIGHUP ...", id, svc->pid);
dbg("Reloading %s[%d], sending SIGHUP", id, svc->pid);
logit(LOG_CONSOLE | LOG_NOTICE, "Reloading %s[%d], sending SIGHUP ...", id, svc->pid);
rc = kill(svc->pid, SIGHUP);
if (rc == -1 && (errno == ESRCH || errno == ENOENT)) {
/* nobody home, reset internal state machine */
Expand Down Expand Up @@ -3134,6 +3135,19 @@ int service_step(svc_t *svc)
case COND_ON:
kill(svc->pid, SIGCONT);
svc_set_state(svc, SVC_RUNNING_STATE);

/*
* Propagate reload (~): upstream reloaded, so
* we must also reload/restart, not just resume.
*/
if (svc->flux_reload && !svc_is_changed(svc)) {
if (svc_is_noreload(svc))
service_stop(svc);
else
service_reload(svc);
break;
}

/* Reassert condition if we go from waiting and no change */
if (!svc_is_changed(svc)) {
if (svc->notify == SVC_NOTIFY_PID) {
Expand Down
6 changes: 6 additions & 0 deletions src/sig.c
Original file line number Diff line number Diff line change
Expand Up @@ -303,6 +303,8 @@ void do_shutdown(shutop_t op)
int in_cont = in_container();
int signo = SIGTERM;

if (SHUTDOWN_DEBUG) cprintf("do_shutdown: entered, op=%d\n", op);

if (!in_cont) {
/*
* On a PREEMPT-RT system, Finit must run as the highest prioritized
Expand All @@ -322,13 +324,15 @@ void do_shutdown(shutop_t op)
* Tell remaining non-monitored processes to exit, give them
* time to exit gracefully, 2 sec was customary, we go for 1.
*/
if (SHUTDOWN_DEBUG) cprintf("do_shutdown: killing remaining processes ...\n");
iterate_proc(kill_cb, &signo);
if (do_wait(1)) {
signo = SIGKILL;
iterate_proc(kill_cb, &signo);
}

/* Exit plugins gracefully */
if (SHUTDOWN_DEBUG) cprintf("do_shutdown: plugin_exit() + cond_exit() ...\n");
plugin_exit();

/* We can no longer act on conditions, activate script runner */
Expand All @@ -351,12 +355,14 @@ void do_shutdown(shutop_t op)
for (int fd = 3; fd < 128; fd++)
close(fd);

if (SHUTDOWN_DEBUG) cprintf("do_shutdown: vfork, child will reboot ...\n");
if (vfork()) {
/*
* Put PID 1 aside and let child perform reboot/halt
* kernel may exit child and we don't want to exit PID 1
* ... causing "aiii killing init" during reboot ...
*/
if (SHUTDOWN_DEBUG) cprintf("do_shutdown: PID 1 returning from vfork\n");
return;
}

Expand Down
Loading