В Mon, 27 Jan 2014 15:48:50 +0100
Post by Lennart PoetteringPost by Andrey BorzenkovPost by Lennart PoetteringPost by Ivan ShapovalovAny advices on how to do that?
I have both the issue (reproducible on each shutdown) and will to debug.
Well, enable the debug shell, and then from there try to figure out why
things are stuck. i.e. whether it is systemd --user that really never
exits. Or whether it actually exits but PID 1 doesn't notice it. And
then if you figured out which of the two cases, you'd have to figure out
why that is...
I finally managed to reproduce it with user instance running with debug
level (before *any* attempt to add debugging, strace, whatever resulted
in problem disappearing).
It seems that /bin/kill -RTMIN+24 is being killed itself. I wonder - is
it possible that it is the same SIGTERM that is used by PID 1 to stop
Ah, bummer! Yikes!
Thanks for tracking this done, this really sounds like you nailed the
problem. Now, how to fix it?
Hmm, so, I would claim this is a shortcoming of
KillMode=control-group, which is the default for everything. There has
been an item on the TODO list to maybe introduce a KillMode=mixed
setting, which would send SIGTERM only to the main process, and the
SIGKILL later on to all processes. I am pretty sure that this would
solve the issue at hand quite nicely here, because the systemd user
instance would get a nice chance to clean up its own act, before the
systemd system instance would make tabula rasa...
I still favor alternative approach - let systemd wait for main PID
to exit after ExecStop instead. This is functionally equivalent to the
above with slight advantages
- it will probably decrease number of timeouts because systemd will go
on killing spree as soon as main PID exits, not after final timeout.
- it is more generic as it allows any available method to trigger
service stop, not just a signal.
Comments are welcome :)
From: Andrey Borzenkov <***@gmail.com>
Subject: [PATCH] add WaitForMainPIDOnStop option
WaitForMainPIDOnStop=true will wait for exit of main PID in addition to
command set as ExecStop.
Use it in ***@.service to allow user systemd to complete exit transaction
before starting final kill.
---
man/systemd.service.xml | 13 +++++++++++++
src/core/dbus-service.c | 1 +
src/core/load-fragment-gperf.gperf.m4 | 1 +
src/core/service.c | 15 +++++++++++++--
src/core/service.h | 1 +
units/***@.service.in | 2 ++
6 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/man/systemd.service.xml b/man/systemd.service.xml
index d316ab5..c121c95 100644
--- a/man/systemd.service.xml
+++ b/man/systemd.service.xml
@@ -1016,6 +1016,19 @@ ExecStart=/bin/echo $ONE $TWO ${TWO}
<option>none</option>.</para></listitem>
</varlistentry>
+ <varlistentry>
+ <term><varname>WaitForMainPIDOnStop=</varname></term>
+
+ <listitem><para>Takes a boolean value
+ that specifies whether systemd should
+ additionally wait for the main PID of a service
+ to exit after executing ExecStop command.
+ Default is to wait for completion of ExecStop
+ command only. Defaults to
+ <option>no</option>.</para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<para>Check
diff --git a/src/core/dbus-service.c b/src/core/dbus-service.c
index 3db9339..35a3c2f 100644
--- a/src/core/dbus-service.c
+++ b/src/core/dbus-service.c
@@ -54,6 +54,7 @@ const sd_bus_vtable bus_service_vtable[] = {
SD_BUS_PROPERTY("RootDirectoryStartOnly", "b", bus_property_get_bool, offsetof(Service, root_directory_start_only), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("RemainAfterExit", "b", bus_property_get_bool, offsetof(Service, remain_after_exit), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("GuessMainPID", "b", bus_property_get_bool, offsetof(Service, guess_main_pid), SD_BUS_VTABLE_PROPERTY_CONST),
+ SD_BUS_PROPERTY("WaitForMainPIDOnStop", "b", bus_property_get_bool, offsetof(Service, stop_waits_for_main_pid), SD_BUS_VTABLE_PROPERTY_CONST),
SD_BUS_PROPERTY("MainPID", "u", bus_property_get_pid, offsetof(Service, main_pid), SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
SD_BUS_PROPERTY("ControlPID", "u", bus_property_get_pid, offsetof(Service, control_pid), SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
SD_BUS_PROPERTY("BusName", "s", NULL, offsetof(Service, bus_name), SD_BUS_VTABLE_PROPERTY_CONST),
diff --git a/src/core/load-fragment-gperf.gperf.m4 b/src/core/load-fragment-gperf.gperf.m4
index 59b2a64..85aaef7 100644
--- a/src/core/load-fragment-gperf.gperf.m4
+++ b/src/core/load-fragment-gperf.gperf.m4
@@ -167,6 +167,7 @@ Service.PermissionsStartOnly, config_parse_bool, 0,
Service.RootDirectoryStartOnly, config_parse_bool, 0, offsetof(Service, root_directory_start_only)
Service.RemainAfterExit, config_parse_bool, 0, offsetof(Service, remain_after_exit)
Service.GuessMainPID, config_parse_bool, 0, offsetof(Service, guess_main_pid)
+Service.WaitForMainPIDOnStop, config_parse_bool, 0, offsetof(Service, stop_waits_for_main_pid)
Service.RestartPreventExitStatus, config_parse_set_status, 0, offsetof(Service, restart_ignore_status)
Service.SuccessExitStatus, config_parse_set_status, 0, offsetof(Service, success_status)
m4_ifdef(`HAVE_SYSV_COMPAT',
diff --git a/src/core/service.c b/src/core/service.c
index d949f7a..8e0eb6c 100644
--- a/src/core/service.c
+++ b/src/core/service.c
@@ -1269,6 +1269,7 @@ static void service_dump(Unit *u, FILE *f, const char *prefix) {
"%sRootDirectoryStartOnly: %s\n"
"%sRemainAfterExit: %s\n"
"%sGuessMainPID: %s\n"
+ "%sWaitForMainPIDOnStop: %s\n"
"%sType: %s\n"
"%sRestart: %s\n"
"%sNotifyAccess: %s\n",
@@ -1279,6 +1280,7 @@ static void service_dump(Unit *u, FILE *f, const char *prefix) {
prefix, yes_no(s->root_directory_start_only),
prefix, yes_no(s->remain_after_exit),
prefix, yes_no(s->guess_main_pid),
+ prefix, yes_no(s->stop_waits_for_main_pid),
prefix, service_type_to_string(s->type),
prefix, service_restart_to_string(s->restart),
prefix, notify_access_to_string(s->notify_access));
@@ -2953,11 +2955,17 @@ static void service_sigchld_event(Unit *u, pid_t pid, int code, int status) {
case SERVICE_START_POST:
case SERVICE_RELOAD:
- case SERVICE_STOP:
/* Need to wait until the operation is
* done */
break;
+ case SERVICE_STOP:
+ /* If requested, wait for both main and control
+ PID to finish */
+ if (s->stop_waits_for_main_pid && !control_pid_good(s))
+ service_enter_signal(s, SERVICE_STOP_SIGTERM, f);
+ break;
+
case SERVICE_START:
if (s->type == SERVICE_ONESHOT) {
/* This was our main goal, so let's go on */
@@ -3116,7 +3124,10 @@ static void service_sigchld_event(Unit *u, pid_t pid, int code, int status) {
break;
case SERVICE_STOP:
- service_enter_signal(s, SERVICE_STOP_SIGTERM, f);
+ /* If requested, wait for both main and control
+ PID to finish */
+ if (!s->stop_waits_for_main_pid || main_pid_good(s) <= 0)
+ service_enter_signal(s, SERVICE_STOP_SIGTERM, f);
break;
case SERVICE_STOP_SIGTERM:
diff --git a/src/core/service.h b/src/core/service.h
index 1992926..919d54b 100644
--- a/src/core/service.h
+++ b/src/core/service.h
@@ -163,6 +163,7 @@ struct Service {
bool root_directory_start_only;
bool remain_after_exit;
bool guess_main_pid;
+ bool stop_waits_for_main_pid;
/* If we shut down, remember why */
ServiceResult result;
diff --git a/units/***@.service.in b/units/***@.service.in
index bfc9b70..ac3caf3 100644
--- a/units/***@.service.in
+++ b/units/***@.service.in
@@ -14,4 +14,6 @@ User=%i
PAMName=systemd-user
Type=notify
ExecStart=-@rootlibexecdir@/systemd --user
+ExecStop=@KILL@ -TERM $MANAGERPID
+WaitForMainPIDOnStop=yes
Slice=user-%i.slice
--
tg: (2d5bdf5..) u/wait_for_mainPID_on_stop (depends on: master)