diff --git a/doc/ChangeLog.md b/doc/ChangeLog.md index 4057ff1f..0db44b2c 100644 --- a/doc/ChangeLog.md +++ b/doc/ChangeLog.md @@ -3,7 +3,7 @@ Change Log All relevant changes are documented in this file. -[4.16][] - Unreleased +[4.16][UNRELEASED] - Unreleased --------------------- ### Changes @@ -18,6 +18,14 @@ All relevant changes are documented in this file. systemd `RemainAfterExit=yes`, by Aaron Andersen - Clear service conditions on `initctl reload NAME` to ensure dependent services are properly updated +- Run service `stop:` and `reload:` scripts as non-blocking processes, + preventing Finit from stalling on long-running helper scripts +- Guard shutdown with timer watchdog to detect and debug shutdown hangs +- Add `~` condition prefix to propagate reload from a dependency to the + dependent service. E.g., `` means not only a regular + condition, but when `netd` reloads, restart this service too. Similar + to systemd's directive `PropagatesReloadTo=`, but declared on the + consumer side. Issue #416 ### Fixes - Fix #464: invalid user:group examples in cgroups.md @@ -25,6 +33,22 @@ All relevant changes are documented in this file. - Fix #467: TTY services stuck in restart state after non-zero exit. Throttling logic introduced in v4.15 had duplicate checks causing infinite timer loop, and TTYs lacked default restart timeout +- Fix #475: clear pid condition on service collection to fix stale + deps. When a service crashes (SIGKILL), the RUNNING → HALTED path + bypasses STOPPING where `cond_clear()` is normally called, leaving + dependents stuck +- Fix #476: dependents not restarted after SIGHUP reload of service in + dependency chain. Add `service_step_all()` at end of reload cycle to + guarantee convergence after conditions are reasserted. See also the + new `~` condition prefix (above) to propagate reload to dependents +- Fix reload of SIGHUP-capable services incorrectly disrupting their + dependents. E.g., `initctl reload syslogd` would stop dbus, dnsmasq, + etc. even though syslogd handles SIGHUP gracefully and its PID persists +- Only remove managed pidfiles in service cleanup. For SysV services + with `pid:!/path`, the pidfile belongs to the service itself and Finit + should not delete it +- Silence spurious cgroup warnings for short-lived processes where the + kernel reaps the child before cgroup assignment completes - Fix handling of already-mounted cgroups in `cgroup_init()`, can occur after switch_root or in container environments - Improve cgroups documentation clarity, grammar, and examples @@ -1941,6 +1965,7 @@ Major bug fix release. * Initial release [UNRELEASED]: https://github.com/finit-project/finit/compare/4.15...HEAD +[4.16]: https://github.com/finit-project/finit/compare/4.15...4.16 [4.15]: https://github.com/finit-project/finit/compare/4.14...4.15 [4.14]: https://github.com/finit-project/finit/compare/4.13...4.14 [4.13]: https://github.com/troglobit/finit/compare/4.12...4.13 diff --git a/doc/conditions.md b/doc/conditions.md index 93df4169..aac0eadf 100644 --- a/doc/conditions.md +++ b/doc/conditions.md @@ -14,11 +14,11 @@ specified separated by comma. Multiple conditions are logically AND'ed during evaluation, i.e. all conditions must be satisfied in order for a service to run. -A special syntax, using a leading `!` in run/task/service conditions, -denote if a: +Two special prefixes can be used inside the angle brackets: - - service does not support `SIGHUP` - - run/task should not block runlevel changes (i.e., bootstrap) + - `!` -- service does not support `SIGHUP` (noreload), or run/task + should not block runlevel changes (i.e., bootstrap) + - `~` -- propagate reload from this dependency, see below Finit guarantees by default that all run/tasks run (at least) once per runlevel. For most tasks this is a good default, for example @@ -53,6 +53,30 @@ moving to the next runlevel after bootstrap, so we set ``: task [S0123456789] name:pwrfail initctl poweroff -- Power failure, shutting down +Propagating Reload in Dependencies +----------------------------------- + +By default, when a service reloads during `initctl reload`, dependent +services are paused (`SIGSTOP`) and simply resumed (`SIGCONT`) when the +condition is reasserted. This is correct for barrier-style dependencies +like ``, where dependents just need syslogd running and do +not care if it reloads its config. + +For services that need to react when their upstream reloads, the `~` +prefix propagates the reload from the dependency: + + service name:svc_b /sbin/svc_b -- Needs A (barrier) + service name:svc_c /sbin/svc_c -- Propagate reload from B + +Here, `<~pid/svc_b>` means: propagate a reload of `svc_b` to `svc_c`. +When `svc_b` reloads, `svc_c` will be restarted (because of `!`, +noreload) instead of merely resumed. If `svc_c` supported `SIGHUP` +(no `!` prefix), it would be sent `SIGHUP` instead. + +This is similar to systemd's `PropagatesReloadTo=` directive, but +declared on the consumer side rather than the provider side. + + Triggering ---------- @@ -281,6 +305,11 @@ This STOP/CONT handling minimizes the number of unnecessary service restarts that would otherwise occur because a depending service was sent `SIGHUP` for example. +Services with the `~` prefix are an exception to this rule: when their +conditions return to `on` after being in `flux`, the reload is propagated +-- the service is reloaded (SIGHUP) or restarted (noreload `!`) instead +of simply being resumed. + Therefore, any plugin that supplies Finit with conditions must ensure that their state is updated after each reconfiguration. This can be done by binding to the `HOOK_SVC_RECONF` hook. For an example of how diff --git a/doc/config/services.md b/doc/config/services.md index 38ba0be4..a315b372 100644 --- a/doc/config/services.md +++ b/doc/config/services.md @@ -47,6 +47,14 @@ waits for another service, `zebra`, to have created its PID file in *all* files in `/var/run`, for each file named `*.pid`, or `*/pid`, Finit opens it and find the matching `NAME:ID` using the PID. +The condition can be prefixed with `!` and/or `~`: + + - `` -- `ospfd` does not support `SIGHUP` (noreload) + - `<~pid/zebra>` -- propagate reload from `zebra` to `ospfd` + - `` -- both: noreload and propagate reload + +For details, see the [Finit Conditions](../conditions.md) document. + Some services do not maintain a PID file and rather than patching each application Finit provides a workaround. A `pid` keyword can be set to have Finit automatically create (when starting) and later remove diff --git a/plugins/pidfile.c b/plugins/pidfile.c index c64e18ff..ae0fdeea 100644 --- a/plugins/pidfile.c +++ b/plugins/pidfile.c @@ -59,7 +59,7 @@ static int pidfile_add_path(struct iwatch *iw, char *path) } } - return iwatch_add(iw, path, IN_ONLYDIR | IN_CLOSE_WRITE); + return iwatch_add(iw, path, IN_ONLYDIR | IN_CLOSE_WRITE | IN_ATTRIB); } static void pidfile_update_conds(char *dir, char *name, uint32_t mask) @@ -276,6 +276,7 @@ static void pidfile_reconf(void *arg) if (cond_get(cond) == COND_ON) continue; + dbg("Reassert condition %s", cond); cond_set_path(cond_path(cond), COND_ON); } diff --git a/src/api.c b/src/api.c index 85373350..a0af4542 100644 --- a/src/api.c +++ b/src/api.c @@ -126,17 +126,22 @@ static int reload(svc_t *svc, void *user_data) /* * Clear conditions before reload to ensure dependent services - * are properly updated. The conditions are reasserted when - * the service touches its PID file after processing SIGHUP. + * are properly updated. Only needed when the service does NOT + * support SIGHUP (noreload), because then it will be stopped + * and restarted, so conditions genuinely go away. When the + * service handles SIGHUP, its PID and pidfile persist, so the + * condition stays valid and dependents should not be disrupted. * * Note: only clear 'ready' for services where the pidfile * inotify handler reasserts it (pid/none). For s6/systemd * services readiness relies on their respective notification * mechanism which may not re-trigger on SIGHUP. */ - cond_clear(mkcond(svc, cond, sizeof(cond))); - if (svc->notify == SVC_NOTIFY_PID || svc->notify == SVC_NOTIFY_NONE) - service_ready(svc, 0); + if (svc_is_noreload(svc)) { + cond_clear(mkcond(svc, cond, sizeof(cond))); + if (svc->notify == SVC_NOTIFY_PID || svc->notify == SVC_NOTIFY_NONE) + service_ready(svc, 0); + } svc_mark_dirty(svc); service_step(svc); diff --git a/src/cgroup.c b/src/cgroup.c index 20289cf4..58e68875 100644 --- a/src/cgroup.c +++ b/src/cgroup.c @@ -190,7 +190,8 @@ int cgroup_move_pid(const char *group, const char *name, int pid, int delegate) } if (rc) { - warn("Failed moving pid %d to cgroup %s", pid, path); + if (errno != ESRCH) + warn("Failed moving pid %d to cgroup %s", pid, path); return -1; } } @@ -379,7 +380,8 @@ static int cgroup_leaf_init(const char *group, const char *name, int pid, const if (!strcmp(group, "root")) return fnwrite(str("%d", pid), FINIT_CGPATH "/cgroup.procs"); - if (cgroup_move_pid(group, name, pid, delegate)) + /* Error if error is not "No such process" */ + if (cgroup_move_pid(group, name, pid, delegate) && errno != ESRCH) err(1, "Failed moving pid %d to group %s/%s", pid, group, name); /* Set up inotify watch for cgroup cleanup */ diff --git a/src/conf.c b/src/conf.c index b4393e65..d0abb61d 100644 --- a/src/conf.c +++ b/src/conf.c @@ -745,8 +745,21 @@ void conf_parse_cond(svc_t *svc, char *cond) return; } + /* + * The '~' prefix means a reload of the upstream service is + * propagated to this service -- it will be reloaded (SIGHUP) + * or restarted (noreload), not just paused and resumed. + * Syntax: or <~pid/foo> + * + * Strip '~' from each condition and set the service-level + * flag. This allows '~' on any condition in the list. + */ svc->cond[0] = 0; for (i = 0, c = strtok(ptr, ","); c; c = strtok(NULL, ","), i++) { + if (c[0] == '~') { + c++; + svc->flux_reload = 1; + } devmon_add_cond(c); if (i) strlcat(svc->cond, ",", sizeof(svc->cond)); diff --git a/src/service.c b/src/service.c index 05121d06..b2c6c15b 100644 --- a/src/service.c +++ b/src/service.c @@ -1031,7 +1031,6 @@ static int service_run_script(svc_t *svc, char *script) { const char *id = svc_ident(svc, NULL, 0); pid_t pid = service_fork(svc); - int status, rc; if (pid < 0) { err(1, "%s: failed forking off script %s", id, script); @@ -1047,6 +1046,8 @@ static int service_run_script(svc_t *svc, char *script) }; char pidbuf[16]; + redirect(svc); + snprintf(pidbuf, sizeof(pidbuf), "%d", svc->pid); setenv("MAINPID", pidbuf, 1); @@ -1056,23 +1057,7 @@ static int service_run_script(svc_t *svc, char *script) } dbg("%s: script '%s' started as PID %d", id, script, pid); - if (waitpid(pid, &status, 0) == -1) { - warn("%s: failed calling script %s", id, script); - return -1; - } - - rc = WEXITSTATUS(status); - if (WIFEXITED(status)) { - dbg("%s: script '%s' exited without signal, status: %d", id, script, rc); - } else if (WIFSIGNALED(status)) { - dbg("%s: script '%s' terminated by signal %d", id, script, WTERMSIG(status)); - if (!rc) - rc = 1; - } else { - dbg("%s: script '%s' exited with status: %d", id, script, rc); - } - - return rc; + return service_script_add(svc, pid, svc->killdelay); } /* Ensure we don't have any notify socket lingering */ @@ -1093,15 +1078,31 @@ static void service_notify_stop(svc_t *svc) */ static void service_cleanup(svc_t *svc) { - char *fn; + char cond[MAX_COND_LEN]; /* PID collected, cancel any pending SIGKILL */ service_timeout_cancel(svc); - fn = pid_file(svc); - if (fn && remove(fn) && errno != ENOENT) - logit(LOG_CRIT, "Failed removing service %s pidfile %s", - svc_ident(svc, NULL, 0), fn); + /* Only clean up process' pidfile if managed by us */ + if (svc_has_pidfile(svc)) { + char *fn = pid_file(svc); + + if (remove(fn) && errno != ENOENT) + logit(LOG_CRIT, "Failed removing service %s pidfile %s", + svc_ident(svc, NULL, 0), fn); + } + + /* + * Invalidate the pid/ condition for this service to ensure + * dependent services are properly stopped and restarted. + * Without this, the condition is only cleared asynchronously + * via inotify on pidfile removal, which may not trigger when + * the daemon fails to clean up its own pidfile, or when the + * service dies during a reload cycle and goes directly from + * RUNNING to HALTED (skipping STOPPING where cond_clear() + * is normally called). + */ + cond_clear(mkcond(svc, cond, sizeof(cond))); service_notify_stop(svc); @@ -1140,7 +1141,7 @@ int service_stop(svc_t *svc) service_timeout_cancel(svc); if (svc->stop_script[0]) { - logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling stop:%s ...", id, svc->pid, svc->stop_script); + logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], calling stop:%s ...", id, svc->pid, svc->stop_script); } else if (!svc_is_sysv(svc)) { char *nm = pid_get_name(svc->pid, NULL, 0); const char *sig = sig_name(svc->sighalt); @@ -1155,10 +1156,10 @@ int service_stop(svc_t *svc) } dbg("Sending %s to pid:%d name:%s(%s)", sig, svc->pid, id, nm); - logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], stopping, sending %s ...", id, svc->pid, sig); + logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], sending %s ...", id, svc->pid, sig); } else { compose_cmdline(svc, cmdline, sizeof(cmdline)); - logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling '%s stop' ...", id, svc->pid, cmdline); + logit(LOG_CONSOLE | LOG_NOTICE, "Stopping %s[%d], calling '%s stop' ...", id, svc->pid, cmdline); } /* @@ -1278,7 +1279,7 @@ static int service_reload(svc_t *svc) print_desc("Restarting ", svc->desc); if (svc->reload_script[0]) { - logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], calling reload:%s ...", id, svc->pid, svc->reload_script); + logit(LOG_CONSOLE | LOG_NOTICE, "Reloading %s[%d], calling reload:%s ...", id, svc->pid, svc->reload_script); rc = service_run_script(svc, svc->reload_script); } else if (svc->sighup) { if (svc->pid <= 1) { @@ -1286,8 +1287,8 @@ static int service_reload(svc_t *svc) svc->start_time = svc->pid = 0; goto done; } - dbg("%s[%d], sending SIGHUP", id, svc->pid); - logit(LOG_CONSOLE | LOG_NOTICE, "%s[%d], sending SIGHUP ...", id, svc->pid); + dbg("Reloading %s[%d], sending SIGHUP", id, svc->pid); + logit(LOG_CONSOLE | LOG_NOTICE, "Reloading %s[%d], sending SIGHUP ...", id, svc->pid); rc = kill(svc->pid, SIGHUP); if (rc == -1 && (errno == ESRCH || errno == ENOENT)) { /* nobody home, reset internal state machine */ @@ -3134,6 +3135,19 @@ int service_step(svc_t *svc) case COND_ON: kill(svc->pid, SIGCONT); svc_set_state(svc, SVC_RUNNING_STATE); + + /* + * Propagate reload (~): upstream reloaded, so + * we must also reload/restart, not just resume. + */ + if (svc->flux_reload && !svc_is_changed(svc)) { + if (svc_is_noreload(svc)) + service_stop(svc); + else + service_reload(svc); + break; + } + /* Reassert condition if we go from waiting and no change */ if (!svc_is_changed(svc)) { if (svc->notify == SVC_NOTIFY_PID) { diff --git a/src/sig.c b/src/sig.c index e3e25b52..38f16608 100644 --- a/src/sig.c +++ b/src/sig.c @@ -303,6 +303,8 @@ void do_shutdown(shutop_t op) int in_cont = in_container(); int signo = SIGTERM; + if (SHUTDOWN_DEBUG) cprintf("do_shutdown: entered, op=%d\n", op); + if (!in_cont) { /* * On a PREEMPT-RT system, Finit must run as the highest prioritized @@ -322,6 +324,7 @@ void do_shutdown(shutop_t op) * Tell remaining non-monitored processes to exit, give them * time to exit gracefully, 2 sec was customary, we go for 1. */ + if (SHUTDOWN_DEBUG) cprintf("do_shutdown: killing remaining processes ...\n"); iterate_proc(kill_cb, &signo); if (do_wait(1)) { signo = SIGKILL; @@ -329,6 +332,7 @@ void do_shutdown(shutop_t op) } /* Exit plugins gracefully */ + if (SHUTDOWN_DEBUG) cprintf("do_shutdown: plugin_exit() + cond_exit() ...\n"); plugin_exit(); /* We can no longer act on conditions, activate script runner */ @@ -351,12 +355,14 @@ void do_shutdown(shutop_t op) for (int fd = 3; fd < 128; fd++) close(fd); + if (SHUTDOWN_DEBUG) cprintf("do_shutdown: vfork, child will reboot ...\n"); if (vfork()) { /* * Put PID 1 aside and let child perform reboot/halt * kernel may exit child and we don't want to exit PID 1 * ... causing "aiii killing init" during reboot ... */ + if (SHUTDOWN_DEBUG) cprintf("do_shutdown: PID 1 returning from vfork\n"); return; } diff --git a/src/sm.c b/src/sm.c index 69f61c86..ab3abdb9 100644 --- a/src/sm.c +++ b/src/sm.c @@ -63,9 +63,37 @@ static struct { int skip_bootstrap; /* Set to skip bootstrap waiting */ } sm; +/* Shutdown trace: set SHUTDOWN_DEBUG to 1 in sm.h to enable */ +#define shutdbg(fmt, args...) do { \ + if (SHUTDOWN_DEBUG && (runlevel == 0 || runlevel == 6)) \ + cprintf("shutdown[%c]: " fmt "\n", \ + sm_rl2ch(runlevel), ##args); \ +} while (0) + static uev_t console_watcher; static int console_fd = -1; +/* + * Shutdown watchdog: if graceful service stop takes too long, + * force do_shutdown() directly from the event loop callback. + * Default timeout is 30 seconds, adjustable at build time. + */ +#ifndef SHUTDOWN_WDT_TIMEOUT +#define SHUTDOWN_WDT_TIMEOUT 30 +#endif +static uev_t shutdown_wdt; + +static void shutdown_wdt_cb(uev_t *w, void *arg, int events) +{ + (void)w; + (void)arg; + (void)events; + + logit(LOG_CONSOLE | LOG_CRIT, "Timed out (%d sec) waiting for services to stop, forcing shutdown.", + SHUTDOWN_WDT_TIMEOUT); + do_shutdown(halt); +} + /* * Console input callback - handles Ctrl-C during bootstrap wait */ @@ -350,6 +378,7 @@ void sm_step(void) /* reload ? */ if (sm.reload) { + shutdbg("RUNNING got reload request!"); sm.reload = 0; sm.state = SM_RELOAD_CHANGE_STATE; } @@ -362,9 +391,15 @@ void sm_step(void) /* Restore terse mode and run hooks before shutdown */ if (runlevel == 0 || runlevel == 6) { + shutdbg("entering CHANGE state, starting %d sec watchdog", + SHUTDOWN_WDT_TIMEOUT); + uev_timer_init(ctx, &shutdown_wdt, shutdown_wdt_cb, + NULL, SHUTDOWN_WDT_TIMEOUT * 1000, 0); api_exit(); log_exit(); + shutdbg("running HOOK_SHUTDOWN ..."); plugin_run_hooks(HOOK_SHUTDOWN); + shutdbg("HOOK_SHUTDOWN done"); } dbg("Setting new runlevel --> %c <-- previous %c", sm_rl2ch(runlevel), sm_rl2ch(prevlevel)); @@ -384,12 +419,14 @@ void sm_step(void) } /* Reset once flag of runtasks */ + shutdbg("runtask_clean + step_all ..."); service_runtask_clean(); dbg("Stopping services not allowed in new runlevel ..."); sm.in_reload = 1; service_step_all(SVC_TYPE_ANY); + shutdbg("CHANGE done, entering WAIT"); sm.state = SM_RUNLEVEL_WAIT_STATE; break; @@ -400,10 +437,13 @@ void sm_step(void) */ svc = svc_stop_completed(); if (svc) { + shutdbg("WAIT, collecting %s[%d] ...", svc_ident(svc, NULL, 0), svc->pid); dbg("Waiting to collect %s, cmd %s(%d) ...", svc_ident(svc, NULL, 0), svc->cmd, svc->pid); break; } + shutdbg("WAIT done, all services stopped"); + /* Prev runlevel services stopped, call hooks before starting new runlevel ... */ dbg("All services have been stopped, calling runlevel change hooks ..."); plugin_run_hooks(HOOK_RUNLEVEL_CHANGE); /* Reconfigure HW/VLANs/etc here */ @@ -416,6 +456,7 @@ void sm_step(void) sm.in_reload = 0; service_step_all(SVC_TYPE_ANY); + shutdbg("entering CLEAN"); sm.state = SM_RUNLEVEL_CLEAN_STATE; break; @@ -427,11 +468,15 @@ void sm_step(void) */ svc = svc_clean_completed(); if (svc) { + shutdbg("CLEAN, waiting for %s[%d] (%s) ...", + svc_ident(svc, NULL, 0), svc->pid, svc_status(svc)); dbg("Waiting to collect post/cleanup script for %s, cmd %s(%d) ...", svc_ident(svc, NULL, 0), svc->cmd, svc->pid); break; } + shutdbg("CLEAN done, all post/cleanup scripts collected"); + /* Cleanup stale services */ svc_clean_dynamic(service_unregister); @@ -447,9 +492,15 @@ void sm_step(void) * Tannhäuser Gate. All those .. moments .. will be lost in time, like * tears ... in ... rain." */ - if (runlevel == 0 || runlevel == 6) + if (runlevel == 0 || runlevel == 6) { + shutdbg("calling do_shutdown()"); + uev_timer_stop(&shutdown_wdt); do_shutdown(halt); + shutdbg("do_shutdown() returned to SM!"); + } + shutdbg("entering RUNNING (reload=%d, newlevel=%d)", + sm.reload, sm.newlevel); sm.state = SM_RUNNING_STATE; break; @@ -510,15 +561,27 @@ void sm_step(void) /* Cleanup stale services */ svc_clean_dynamic(service_unregister); + dbg("Step all services waiting for reasserted conditions ..."); + service_step_all(SVC_TYPE_ANY); + dbg("Calling reconf hooks ..."); plugin_run_hooks(HOOK_SVC_RECONF); + dbg("Step all services waiting for reasserted conditions ..."); + service_step_all(SVC_TYPE_ANY); + dbg("Update configuration generation of device conditions ..."); devmon_reconf(); + dbg("Step all services waiting for reasserted conditions ..."); + service_step_all(SVC_TYPE_ANY); + dbg("Update configuration generation of unmodified non-native services ..."); service_notify_reconf(); + dbg("Step all services waiting for reasserted conditions ..."); + service_step_all(SVC_TYPE_ANY); + dbg("Reconfiguration done"); sm.state = SM_RUNNING_STATE; break; diff --git a/src/sm.h b/src/sm.h index 15e7bc1f..ada6df23 100644 --- a/src/sm.h +++ b/src/sm.h @@ -24,6 +24,11 @@ #ifndef FINIT_SM_H_ #define FINIT_SM_H_ +/* Set to 1 to enable shutdown trace logging (independent of 'initctl debug') */ +#ifndef SHUTDOWN_DEBUG +#define SHUTDOWN_DEBUG 1 +#endif + void sm_init (void); void sm_step (void); diff --git a/src/svc.h b/src/svc.h index d433a515..f21a8ee6 100644 --- a/src/svc.h +++ b/src/svc.h @@ -146,6 +146,7 @@ typedef struct svc { int starting; /* ... waiting for pidfile to be re-asserted */ int runlevels; int sighup; /* This service supports SIGHUP :) */ + int flux_reload; /* Propagate reload from dependency, '~' prefix */ int forking; /* This is a service/sysv daemon that forks, wait for it ... */ svc_block_t block; /* Reason that this service is currently stopped */ char cond[MAX_COND_LEN]; diff --git a/test/Makefile.am b/test/Makefile.am index ca965ed2..6e5ddfce 100644 --- a/test/Makefile.am +++ b/test/Makefile.am @@ -32,6 +32,7 @@ EXTRA_DIST += add-remove-dynamic-service-sub-config.sh EXTRA_DIST += bootstrap-crash.sh EXTRA_DIST += cond-start-task.sh EXTRA_DIST += crashing.sh +EXTRA_DIST += dep-chain-reload.sh EXTRA_DIST += depserv.sh EXTRA_DIST += devmon.sh EXTRA_DIST += failing-sysv.sh @@ -74,6 +75,7 @@ TESTS += add-remove-dynamic-service-sub-config.sh TESTS += bootstrap-crash.sh TESTS += cond-start-task.sh TESTS += crashing.sh +TESTS += dep-chain-reload.sh TESTS += depserv.sh TESTS += devmon.sh TESTS += failing-sysv.sh diff --git a/test/dep-chain-reload.sh b/test/dep-chain-reload.sh new file mode 100755 index 00000000..dd67ac19 --- /dev/null +++ b/test/dep-chain-reload.sh @@ -0,0 +1,181 @@ +#!/bin/sh +# Verify transitive dependency chain during reload and crash +# +# Four services in a chain: A → B → C → D, all using notify:pid. +# B and C are placed in separate sub-config files so we can use +# 'initctl touch' on them independently. B supports SIGHUP (reload), +# while C and D use '!' (noreload) and '~' (propagate reload), +# matching a common pattern in real-world setups (e.g., FRR routing +# daemons on Infix OS). +# +# Chain: A ← B ← C ← D +# +# Test 1 - touch C + reload (the "Infix" scenario): +# C is in the middle of the chain. After 'initctl touch svc_c.conf' +# + 'initctl reload': +# - A and B should be unaffected (same PID) +# - C is stopped/started (config was touched, noreload) +# - D must be restarted (reload propagated from C) +# +# Test 2 - crash (kill -9): +# When B is killed with SIGKILL the crash path (RUNNING → HALTED) +# bypasses STOPPING. Without cond_clear() in service_cleanup(), +# pid/B is never invalidated and C, D are never restarted. +# +# Test 3 - touch B + reload: +# B supports SIGHUP (no '!' prefix). After 'initctl touch +# svc_b.conf' + 'initctl reload': +# - A should be unaffected (same PID) +# - B is SIGHUP'd (same PID, config was touched) +# - C and D must be restarted (reload propagated, '~' prefix) + +set -eu + +TEST_DIR=$(dirname "$0") + +test_teardown() +{ + say "Running test teardown." + run "rm -f $FINIT_RCSD/svc_b.conf $FINIT_RCSD/svc_c.conf" +} + +pidof() +{ + texec initctl -j status "$1" | jq .pid +} + +test_setup() +{ + run "cat >> $FINIT_CONF" < name:svc_d serv -np -i svc_d -- Needs C +EOF + run "cat >> $FINIT_RCSD/svc_b.conf" < name:svc_b serv -np -i svc_b -- Needs A +EOF + run "cat >> $FINIT_RCSD/svc_c.conf" < name:svc_c serv -np -i svc_c -- Needs B +EOF +} + +# shellcheck source=/dev/null +. "$TEST_DIR/lib/setup.sh" + +sep "Configuration" +run "cat $FINIT_CONF" +run "cat $FINIT_RCSD/svc_b.conf" +run "cat $FINIT_RCSD/svc_c.conf" + +say "Reload Finit to start all services" +run "initctl reload" + +say "Wait for full chain to start" +retry 'assert_status "svc_d" "running"' 10 1 + +run "initctl status" +run "initctl cond dump" + +# ―――――――――――――――――――――――――――――――――――――――――――――――――――――― +# Test 1: touch C (middle of chain, noreload) + reload +# ―――――――――――――――――――――――――――――――――――――――――――――――――――――― +sep "Test 1: Touch C (middle) and global reload" + +pid_a=$(pidof svc_a) +pid_b=$(pidof svc_b) +pid_c=$(pidof svc_c) +pid_d=$(pidof svc_d) +say "PIDs before: A=$pid_a B=$pid_b C=$pid_c D=$pid_d" + +run "initctl touch svc_c.conf" +run "initctl reload" + +say "Wait for chain to settle" +retry 'assert_status "svc_d" "running"' 15 1 + +run "initctl status" +run "initctl cond dump" + +new_pid_a=$(pidof svc_a) +new_pid_b=$(pidof svc_b) +new_pid_c=$(pidof svc_c) +new_pid_d=$(pidof svc_d) +say "PIDs after: A=$new_pid_a B=$new_pid_b C=$new_pid_c D=$new_pid_d" + +# shellcheck disable=SC2086 +assert "A was not restarted" $new_pid_a -eq $pid_a +# shellcheck disable=SC2086 +assert "B was not restarted" $new_pid_b -eq $pid_b +# shellcheck disable=SC2086 +assert "C was restarted (touched)" $new_pid_c -ne $pid_c +# shellcheck disable=SC2086 +assert "D was restarted (transitive dep)" $new_pid_d -ne $pid_d + +# ―――――――――――――――――――――――――――――――――――――――――――――――――――――― +# Test 2: crash (kill -9), bypasses STOPPING +# ―――――――――――――――――――――――――――――――――――――――――――――――――――――― +sep "Test 2: Kill B with SIGKILL (bypasses STOPPING)" + +pid_b=$(pidof svc_b) +pid_c=$(pidof svc_c) +pid_d=$(pidof svc_d) +say "PIDs before: B=$pid_b C=$pid_c D=$pid_d" + +run "kill -9 $pid_b" + +say "Wait for B to respawn and chain to settle" +retry 'assert_status "svc_d" "running"' 15 1 + +run "initctl status" +run "initctl cond dump" + +new_pid_b=$(pidof svc_b) +new_pid_c=$(pidof svc_c) +new_pid_d=$(pidof svc_d) +say "PIDs after: B=$new_pid_b C=$new_pid_c D=$new_pid_d" + +# shellcheck disable=SC2086 +assert "B was restarted (crashed+respawn)" $new_pid_b -ne $pid_b +# shellcheck disable=SC2086 +assert "C was restarted (transitive dep)" $new_pid_c -ne $pid_c +# shellcheck disable=SC2086 +assert "D was restarted (transitive dep)" $new_pid_d -ne $pid_d + +# ―――――――――――――――――――――――――――――――――――――――――――――――――――――― +# Test 3: touch B (supports SIGHUP) + reload +# ―――――――――――――――――――――――――――――――――――――――――――――――――――――― +sep "Test 3: Touch B (SIGHUP) and global reload" + +pid_a=$(pidof svc_a) +pid_b=$(pidof svc_b) +pid_c=$(pidof svc_c) +pid_d=$(pidof svc_d) +say "PIDs before: A=$pid_a B=$pid_b C=$pid_c D=$pid_d" + +run "initctl debug" +run "initctl touch svc_b.conf" +run "initctl reload" +sleep 2 +run "initctl status" + +say "Wait for chain to settle" +retry 'assert_status "svc_d" "running"' 15 1 + +run "initctl status" +run "initctl cond dump" + +new_pid_a=$(pidof svc_a) +new_pid_b=$(pidof svc_b) +new_pid_c=$(pidof svc_c) +new_pid_d=$(pidof svc_d) +say "PIDs after: A=$new_pid_a B=$new_pid_b C=$new_pid_c D=$new_pid_d" + +# shellcheck disable=SC2086 +assert "A was not restarted" $new_pid_a -eq $pid_a +# shellcheck disable=SC2086 +assert "B was not restarted (SIGHUP)" $new_pid_b -eq $pid_b +# shellcheck disable=SC2086 +assert "C was restarted (transitive dep)" $new_pid_c -ne $pid_c +# shellcheck disable=SC2086 +assert "D was restarted (transitive dep)" $new_pid_d -ne $pid_d + +return 0 diff --git a/test/depserv.sh b/test/depserv.sh index 9c0ebb73..0d0eaa1d 100755 --- a/test/depserv.sh +++ b/test/depserv.sh @@ -5,7 +5,7 @@ # - bar must not start until foo is ready # - bar must be stopped when foo goes down # - bar must be restarted when foo is restarted -# - bar must be restarted when foo is reloaded (per-service reload) +# - bar must not be restarted when foo is reloaded (per-service reload) # # Regression test bug #314, that bar is not in a restart loop when foo # is stopped. Also, verify that changing between runlevels, where foo @@ -50,15 +50,15 @@ test_one() run "initctl status" assert "bar is restarted" "$(texec initctl |grep bar | awk '{print $1;}')" != "$pid" - say "Verify bar is restarted when foo is reloaded (per-service) ..." + say "Verify bar is NOT restarted when foo is reloaded (per-service) ..." retry 'assert_status "bar" "running"' 5 1 pid=$(texec initctl |grep bar | awk '{print $1;}') run "initctl reload foo" retry 'assert_status "bar" "running"' 5 1 - assert "bar is restarted on reload" "$(texec initctl |grep bar | awk '{print $1;}')" != "$pid" + assert "bar is NOT restarted on reload" "$(texec initctl |grep bar | awk '{print $1;}')" == "$pid" # Wait for spice to be stolen by the Harkonnen - sleep 3 + sleep 5 # bar should now have detected the loss of spice and be in restart run "initctl status" assert_status "bar" "restart"