[systemd-devel] Assertion failed in systemd v16

Discussion:

[systemd-devel] Assertion failed in systemd v16

Alexander E. Patrakov

2011-01-10 10:11:26 UTC

Hi.

I tried to write a systemd service file for lighttpd (to be used in
Gentoo). As described here, lighttpd implements a peculiar scheme for
graceful restarts:

http://blog.lighttpd.net/articles/2005/09/02/graceful-restart

As happens with all too-clever solutions, it doesn't work well.

For those who haven't read the blog and have no experience with
lighttpd, here is a summary of how it is supposed to be controlled:

1) To test the configuration file for syntax errors, run:
/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
2) To run lighttpd in the background as a daemon, /usr/sbin/lighttpd -f
/etc/lighttpd/lighttpd.conf
3) If you don't want lighttpd to become a daemon, add the -D option. I
don't use this option.
4) If you want to stop lighttpd and close all connections (equivalent to
apache2 -k stop), send it the TERM signal
5) If you want lighttpd to stop accepting new connections, but finish
the existing transfers gracefully (equivalent to apache2 -k
graceful-stop), send it the INT signal
6) There is no easy procedure for reloading the config while gracefully
finishing the existing transfers (equivalent to apache2 -k graceful).
One is supposed to combine the steps (1), (5) and (2) in the initscript.
7) SIGHUP is for reopening the log after rotation, so not used here.

Here is my naive attempt to implement this as a service file:

[Unit]
Description=Lighttpd Web Server
After=network.target

[Service]
Type=forking
EnvironmentFile=/etc/conf.d/lighttpd
PIDFile=/var/run/lighttpd.pid
ExecStartPre=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecStart=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
ExecStop=/bin/kill -INT $MAINPID
ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecReload=/bin/kill -INT $MAINPID
ExecReload=/bin/sleep 1
ExecReload=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
StandardOutput=syslog
StandardError=syslog

[Install]
WantedBy=multi-user.target
WantedBy=http-daemon.target

Here is what happened when I tried to reload lighttpd while looking with
strace whan systemd is doing:

[ 3639.047649] systemd[1]: Assertion 's->control_command_id ==
SERVICE_EXEC_START' failed at src/service.c:2184, function
service_run_next_main(). Aborting.
[ 3639.052182] systemd[1]: Caught <ABRT>, dumped core as pid 6045.
[ 3639.052376] systemd[1]: Freezing execution.

I could not reproduce the bug without stracing systemd:

strace -ff -e process -p 1

Also, $MAINPID handling seems to be unreliable - e.g., systemctl stop
lighttpd.service works correctly only while systemd is being straced.
Replacing /bin/kill -INT $MAINPID with killall -INT $MAINPID removed
both the unreliability of $MAINPID and the assertion failure. So I guess
that both bugs are somehow related to my usage of $MAINPID.

--
Alexander E. Patrakov

Alexander E. Patrakov

2011-01-10 12:30:50 UTC

Post by Alexander E. Patrakov
Here is what happened when I tried to reload lighttpd while looking
[ 3639.047649] systemd[1]: Assertion 's->control_command_id ==
SERVICE_EXEC_START' failed at src/service.c:2184, function
service_run_next_main(). Aborting.
[ 3639.052182] systemd[1]: Caught <ABRT>, dumped core as pid 6045.
[ 3639.052376] systemd[1]: Freezing execution.
strace -ff -e process -p 1

Now I can reproduce it more reliably without strace. Here is the
slightly modified service file:

[Unit]
Description=Lighttpd Web Server
After=network.target

[Service]
Type=forking

PIDFile=/var/run/lighttpd.pid

#ExecStartPre=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecStart=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf

ExecStop=/bin/kill -INT $MAINPID

#ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecReload=/bin/kill -INT $MAINPID
ExecReload=/bin/sleep 1
ExecReload=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf

StandardOutput=syslog
StandardError=syslog

[Install]
WantedBy=multi-user.target
WantedBy=http-daemon.target

To reproduce, I just have to do some attempts of "start, stop, start
again, reload" actions, with 30-second pauses between them. Reproduction
is not 100% reliable.

Here is the output of "systemctl status lighttpd.service" just before
issuing the "reload" command that triggered the assertion:

home ~ # systemctl status lighttpd.service
lighttpd.service - Lighttpd Web Server
Loaded: loaded (/etc/systemd/system/lighttpd.service)
Active: active (running) since Mon, 10 Jan 2011 17:18:44 +0500;
1s ago
Process: 2382 (/bin/kill -INT $MAINPID, code=exited, status=1/FAILURE)
Process: 2390 (/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf,
code=exited, status=0/SUCCESS)
Main PID: 2393 (lighttpd)
CGroup: name=systemd:/system/lighttpd.service
└ 2393 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf

Here is the backtrace, according to my core file:

#0 0x00007fe1316b7acb in raise () from /lib/libpthread.so.0
#1 0x0000000000408f6b in crash (sig=6) at src/main.c:118
#2 <signal handler called>
#3 0x00007fe130923b25 in raise () from /lib/libc.so.6
#4 0x00007fe130925330 in abort () from /lib/libc.so.6
#5 0x0000000000456ff7 in log_assert (file=0x473cb3 "src/service.c",
line=2184, func=<value optimized out>,
format=<value optimized out>) at src/log.c:492
#6 0x00000000004206f0 in service_run_next_main (u=0x2721990, pid=2393,
code=1, status=0) at src/service.c:2184
#7 service_sigchld_event (u=0x2721990, pid=2393, code=1, status=0) at
src/service.c:2525
#8 0x000000000040c1af in manager_dispatch_sigchld (m=0x25bb1c0) at
src/manager.c:2011
#9 0x00000000004117e3 in manager_process_signal_fd (m=0x25bb1c0) at
src/manager.c:2186
#10 process_event (m=0x25bb1c0) at src/manager.c:2211
#11 manager_loop (m=0x25bb1c0) at src/manager.c:2345
#12 0x000000000040a1e5 in main (argc=<value optimized out>, argv=<value
optimized out>) at src/main.c:1166

--
Alexander E. Patrakov

Kay Sievers

2011-01-10 13:11:16 UTC

Post by Alexander E. Patrakov

Post by Alexander E. Patrakov
[ 3639.047649] systemd[1]: Assertion 's->control_command_id ==
SERVICE_EXEC_START' failed at src/service.c:2184, function
service_run_next_main(). Aborting.

#ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecReload=/bin/kill -INT $MAINPID
ExecReload=/bin/sleep 1
ExecReload=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf

Mirco Tischler

2011-01-10 13:57:41 UTC

This patch should fix it. Can you test it? The problem was that after a
the mainpid exits,
and because there are commands left to execute systemd assumes it
executes an ExecStart
line from a type=oneshot service file. But in this case it executes an
ExecReload line.
This patch simply removes the assumptions. AFAICT this should work.

Mirco
---
src/service.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/src/service.c b/src/service.c
index a28eb8a..4ff9f5d 100644
--- a/src/service.c
+++ b/src/service.c
@@ -2181,9 +2181,6 @@ static void service_run_next_main(Service *s, bool
success) {
if (!success)
s->failure = true;

- assert(s->control_command_id == SERVICE_EXEC_START);
- assert(s->type == SERVICE_ONESHOT);
-
s->control_command = s->control_command->command_next;
service_unwatch_main_pid(s);

Alexander E. Patrakov

2011-01-10 14:34:14 UTC

Post by Mirco Tischler
This patch should fix it. Can you test it? The problem was that after a
the mainpid exits,
and because there are commands left to execute systemd assumes it
executes an ExecStart
line from a type=oneshot service file. But in this case it executes an
ExecReload line.
This patch simply removes the assumptions. AFAICT this should work.

This doesn't crash, but doesn't solve the "/bin/kill from ExecStop
sometimes fails" problem.

Post by Mirco Tischler
Mirco
---
src/service.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/src/service.c b/src/service.c
index a28eb8a..4ff9f5d 100644
--- a/src/service.c
+++ b/src/service.c
@@ -2181,9 +2181,6 @@ static void service_run_next_main(Service *s, bool
success) {
if (!success)
s->failure = true;
- assert(s->control_command_id == SERVICE_EXEC_START);
- assert(s->type == SERVICE_ONESHOT);
-
s->control_command = s->control_command->command_next;
service_unwatch_main_pid(s);

Alexander E. Patrakov

2011-01-10 15:22:25 UTC

This post might be inappropriate. Click to display it.

Lennart Poettering

2011-01-17 23:18:59 UTC

Post by Alexander E. Patrakov

Post by Mirco Tischler
This patch should fix it. Can you test it? The problem was that after a
the mainpid exits,
and because there are commands left to execute systemd assumes it
executes an ExecStart
line from a type=oneshot service file. But in this case it executes an
ExecReload line.
This patch simply removes the assumptions. AFAICT this should work.

On the second thought, I am not sure that I fully agree with the
patch. On IRC, we agreed that it is invalid to try to change the
main PID from within the ExecReload line.

Hmm, wasn't around when this discussion happened. What is the reason
cited for not allowing dynamically changing main pids? There might be
some but at least right now I cannot see any.

To me it appears right now that you just found a couple of bugs and
something like:

ExecStart=/path/to/check-config
ExecStart=/path/to/start-new-instance
ExecStart=/path/to/stop-old-instance

(where the last line might just be an invocation of kill)

should actually work after we fixed those bugs.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Marius Tolzmann

2011-01-10 17:49:39 UTC

hi..

Post by Alexander E. Patrakov
[Unit]
Description=Lighttpd Web Server
After=network.target
[Service]
Type=forking
EnvironmentFile=/etc/conf.d/lighttpd
PIDFile=/var/run/lighttpd.pid
ExecStartPre=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecStart=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
ExecStop=/bin/kill -INT $MAINPID
ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecReload=/bin/kill -INT $MAINPID
ExecReload=/bin/sleep 1
ExecReload=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
StandardOutput=syslog
StandardError=syslog

i still try to get behind all the systemd magic so i have some questions
on this unit:

can the initial problem of restarting the lighttpd gracefully be solved
by something like this?:

[Unit]
..
ConditionPathExists=/etc/lighttpd/lighttpd.conf

[Service]
ExecStartPre=/usr/sbin/lighttpd -t ...
ExecStart=/usr/sbin/lighttpd ...
ExecStop=/bin/kill -INT $MAINPID
ExecReload=/usr/sbin/lighttpd -t ...
ExecReload=/bin/kill -INT $MAINPID
StandardOutput=syslog
Restart=on-success ( or on-abort or always .. don't know what the
exit-status after graceful kill is in lighty)

and why do you explicitly not use the -D option and let systemd do all
the forking stuff for you..? you could get rid off the Type= PIDFile=
lines.. i also removed StandardError=syslog which seems to be redundant
here..

would this work? or isn't restart intended to be used for those kind of
things?

SignalStop=SIGINT and SignalReload=SIGINT or something similar may be a
nice shortcut for doing those kind of things.. Since SignalReload=SIGHUP
seems to be a way how it is done most of the time?

Another question: Doesn't the default KillMode=control-group would send
a SIGTERM to the remaining processes of this service immediately after
the ExecStop= returns and so break the idea behind a graceful shutdown
where the main-process exits and all running http-requests will be
finished? (Or is there also a delay of TimeoutSec= between ExecStop= and
the first SIGTERM?)

..everything is still spinning in my head 8)

marius..
--
Dipl.-Inf. Marius Tolzmann <***@molgen.mpg.de>
----------------------------------.------------------------------
MPI f. molekulare Genetik |
Ihnestrasse 63-73, D-14195 Berlin | ==> MarIuX GNU/Linux <==
Phone: +49 (0)30 8413 1709 |
----------------------------------^------------------------------
God put me on earth to accomplish a certain number of things.
Right now I am so far behind..
..I will never die. <by calvin from calvin&hobbes ;)>

Alexander E. Patrakov

2011-01-11 04:54:16 UTC

Post by Marius Tolzmann
hi..

Post by Alexander E. Patrakov
[Unit]
Description=Lighttpd Web Server
After=network.target
[Service]
Type=forking
EnvironmentFile=/etc/conf.d/lighttpd
PIDFile=/var/run/lighttpd.pid
ExecStartPre=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecStart=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
ExecStop=/bin/kill -INT $MAINPID
ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecReload=/bin/kill -INT $MAINPID
ExecReload=/bin/sleep 1
ExecReload=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
StandardOutput=syslog
StandardError=syslog

i still try to get behind all the systemd magic so i have some
can the initial problem of restarting the lighttpd gracefully be
[Unit]
..
ConditionPathExists=/etc/lighttpd/lighttpd.conf
[Service]
ExecStartPre=/usr/sbin/lighttpd -t ...
ExecStart=/usr/sbin/lighttpd ...
ExecStop=/bin/kill -INT $MAINPID
ExecReload=/usr/sbin/lighttpd -t ...
ExecReload=/bin/kill -INT $MAINPID
StandardOutput=syslog
Restart=on-success ( or on-abort or always .. don't know what the
exit-status after graceful kill is in lighty)

No, the Restart logic in systemd is not right for lighttpd (or rather:
the logic required by lighttpd is insane). Systemd will start a new
instance of lighttpd as soon as the main PID (i.e., in the default setup
with server.max-worker = 1, the only PID) of the old instance exits,
which is too late. I want the old and the new instances of lighttpd to
coexist for some time. See: after SIGINT, the old instance does not
listen on port 80, but continues servicing large downloads that are
still in progress. The new instance accepts new connections and handles
them. The old instance will exit as soon as all old transfers ard completed.

Post by Marius Tolzmann
and why do you explicitly not use the -D option and let systemd do all
the forking stuff for you..? you could get rid off the Type= PIDFile=
lines..

This would indeed work if I didn't need the funky graceful restart logic.

Post by Marius Tolzmann
i also removed StandardError=syslog which seems to be redundant here..

OK. If you want to see what went into Gentoo systemd overlay, look here:
http://git.overlays.gentoo.org/gitweb/?p=user/systemd.git;a=commitdiff;h=d34961cf8be3743013d026c21ca64078ad0f1255
. Essentially, I gave up, but documented the effort.

Post by Marius Tolzmann
SignalStop=SIGINT and SignalReload=SIGINT or something similar may be
a nice shortcut for doing those kind of things.. Since
SignalReload=SIGHUP seems to be a way how it is done most of the time?

Yes, this would be a good shortcut.

Post by Marius Tolzmann
Another question: Doesn't the default KillMode=control-group would
send a SIGTERM to the remaining processes of this service immediately
after the ExecStop= returns and so break the idea behind a graceful
shutdown where the main-process exits and all running http-requests
will be finished? (Or is there also a delay of TimeoutSec= between
ExecStop= and the first SIGTERM?)

The documentation (man systemd.service, search for TimeoutSec) says it
waits.

Anyway, lighttpd is very easy to set up, the default config works fine
for static files, and the only possible trouble is avoided by commenting
out the server.use-ipv6 = "enable" line, so you may want to play with it
yourself. To experiment with graceful restarts, you can put some music
in the document root and attempt to download it with wget -O /dev/null
--limit-rate=20k http://127.0.0.1/file.mp3 while restarting lighttpd.

--
Alexander E. Patrakov

Lennart Poettering

2011-01-17 23:23:22 UTC

instances of lighttpd to coexist for some time. See: after SIGINT,
the old instance does not listen on port 80, but continues servicing
large downloads that are still in progress. The new instance accepts
new connections and handles them. The old instance will exit as soon
as all old transfers ard completed.

How is this implemented in detail? Sending SIGINT is async, so at the
time you start the new instance you cannot be sure that the old instance
has stopped listening? And as I understood SO_REUSEADDR doesn't help you
around this, since the kernel will still not allow two sockets to be
bound to the same address when in listening state.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Alexander E. Patrakov

2011-01-18 11:41:06 UTC

Post by Lennart Poettering

instances of lighttpd to coexist for some time. See: after SIGINT,
the old instance does not listen on port 80, but continues servicing
large downloads that are still in progress. The new instance accepts
new connections and handles them. The old instance will exit as soon
as all old transfers ard completed.

How is this implemented in detail? Sending SIGINT is async, so at the
time you start the new instance you cannot be sure that the old instance
has stopped listening?

Indeed, there is a race here. That's why "/bin/sleep 1" in my initial
service file.

--
Alexander E. Patrakov

Lennart Poettering

2011-01-18 11:51:09 UTC

Post by Alexander E. Patrakov

Post by Lennart Poettering
How is this implemented in detail? Sending SIGINT is async, so at the
time you start the new instance you cannot be sure that the old instance
has stopped listening?

Indeed, there is a race here. That's why "/bin/sleep 1" in my
initial service file.

Hmm, what's the lighthttp people's take on this? They came up with the
scheme, so how do they think this problem should be handled?

(One possible solution could be if the old and the new instance would
communicate and pass the listening sockets via some AF_UNIX
socket/SCM_RIGHTS or so. But that is not trivial.)

"Solutions" such as sleep 1 are not really how we do things in systemd
land...

Lennart

--
Lennart Poettering - Red Hat, Inc.

Alexander E. Patrakov

2011-01-18 12:56:02 UTC

Post by Lennart Poettering

Post by Alexander E. Patrakov

Post by Lennart Poettering
How is this implemented in detail? Sending SIGINT is async, so at the
time you start the new instance you cannot be sure that the old instance
has stopped listening?

Indeed, there is a race here. That's why "/bin/sleep 1" in my
initial service file.

Hmm, what's the lighthttp people's take on this? They came up with the
scheme, so how do they think this problem should be handled?

They usleep(5 * 1000) and, if the new copy of lighttpd still fails to
start due to a race, sleep again and attempt to start it again, ad
infinitum.

Post by Lennart Poettering
(One possible solution could be if the old and the new instance would
communicate and pass the listening sockets via some AF_UNIX
socket/SCM_RIGHTS or so. But that is not trivial.)

I came with a different (albeit systemd-specific) solution. The solution
(implemented as an attached patch) is to fork() after we closed all the
listening sockets and exit immediately in the parent. This way, the
chlid continues to serve already-requested files, and systemd has a
race-free indication that it can restart lighttpd via Restart=always.
Please review the patch and the service file. One known problem: the
patch is completely untested with server.max-worker > 0.

--
Alexander E. Patrakov

Lennart Poettering

2011-01-18 22:38:34 UTC

Post by Alexander E. Patrakov

Post by Lennart Poettering

Post by Alexander E. Patrakov

Post by Lennart Poettering
How is this implemented in detail? Sending SIGINT is async, so at the
time you start the new instance you cannot be sure that the old instance
has stopped listening?

Indeed, there is a race here. That's why "/bin/sleep 1" in my
initial service file.

Hmm, what's the lighthttp people's take on this? They came up with the
scheme, so how do they think this problem should be handled?

They usleep(5 * 1000) and, if the new copy of lighttpd still fails
to start due to a race, sleep again and attempt to start it again,
ad infinitum.

Meh, busy loops like this are equal.

Post by Alexander E. Patrakov

Post by Lennart Poettering
(One possible solution could be if the old and the new instance would
communicate and pass the listening sockets via some AF_UNIX
socket/SCM_RIGHTS or so. But that is not trivial.)

I came with a different (albeit systemd-specific) solution. The
solution (implemented as an attached patch) is to fork() after we
closed all the listening sockets and exit immediately in the parent.
This way, the chlid continues to serve already-requested files, and
systemd has a race-free indication that it can restart lighttpd via
Restart=always. Please review the patch and the service file. One
known problem: the patch is completely untested with
server.max-worker > 0.

fork()ing is ugly, will break all mutexes, eat all threads and is
otherwise ugly too. Using fork() is in itself a problem, seldom a
solution.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Alexander E. Patrakov

2011-01-11 10:51:49 UTC

Post by Marius Tolzmann
hi..
can the initial problem of restarting the lighttpd gracefully be
Restart=on-success ( or on-abort or always .. don't know what the
exit-status after graceful kill is in lighty)

After more research, I found the under-documented lighttpd-angel
program. It does what is needed for babysitting lighttpd: performs the
"send SIGINT to the old copy and immediately start the new one" dance
when it receives SIGHUP. See
http://blog.lighttpd.net/articles/2007/09/02/there-is-an-angel-for-lighty .
So here is a working unit file for lighttpd, with graceful reloading:

==========
[Unit]
Description=Lighttpd Web Server
After=network.target

[Service]
ExecStart=/usr/sbin/lighttpd-angel -f /etc/lighttpd/lighttpd.conf -D
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
WantedBy=http-daemon.target
==========

While this works, it looks somehow ugly because of a special
"lighttpd-angel" program that only launches and restarts lighttpd (i.e.,
performs a job that is IMHO more suited for systemd) - but well,
lighttpd's requirements are special. Here is what it looks like normally:

home ~ # systemctl status lighttpd.service
lighttpd.service - Lighttpd Web Server
Loaded: loaded (/etc/systemd/system/lighttpd.service)
Active: active (running) since Tue, 11 Jan 2011 15:46:50 +0500; 1s ago
Main PID: 12659 (lighttpd-angel)
CGroup: name=systemd:/system/lighttpd.service
├ 12659 /usr/sbin/lighttpd-angel -f /etc/lighttpd/lighttpd.conf -D
└ 12660 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf -D

Here is what it looks like during graceful restart:

home ~ # systemctl status lighttpd.service
lighttpd.service - Lighttpd Web Server
Loaded: loaded (/etc/systemd/system/lighttpd.service)
Active: active (running) since Tue, 11 Jan 2011 15:46:50 +0500; 58s ago
Process: 12670 (/bin/kill -HUP $MAINPID, code=exited, status=0/SUCCESS)
Main PID: 12659 (lighttpd-angel)
CGroup: name=systemd:/system/lighttpd.service
├ 12659 /usr/sbin/lighttpd-angel -f /etc/lighttpd/lighttpd.conf -D
├ 12660 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf -D
└ 12671 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf -D

And here is the final status when the long transfer which kept PID 12660
busy is finished:

home ~ # systemctl status lighttpd.service
lighttpd.service - Lighttpd Web Server
Loaded: loaded (/etc/systemd/system/lighttpd.service)
Active: active (running) since Tue, 11 Jan 2011 15:46:50 +0500; 1min 41s ago
Process: 12670 (/bin/kill -HUP $MAINPID, code=exited, status=0/SUCCESS)
Main PID: 12659 (lighttpd-angel)
CGroup: name=systemd:/system/lighttpd.service
├ 12659 /usr/sbin/lighttpd-angel -f /etc/lighttpd/lighttpd.conf -D
└ 12671 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf -D

If you also want graceful stop, you may want to add:

ExecStop=/bin/kill -INT $MAINPID

but this doesn't work, as lighttpd gets killed immediately. I don't know
why this happens.

Also, the "test config before reloading" feature still doesn't work
right. If I add this ExecReload line before the existing one:

ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf

then systemd will kill lighttpd-angel when the configuration file is
bad. It should instead complain to syslog and do nothing with the
running instances of lighttpd and lighttpd-angel.

--
Alexander E. Patrakov

Marius Tolzmann

2011-01-11 14:46:01 UTC

hi again..

Post by Alexander E. Patrakov
After more research, I found the under-documented lighttpd-angel
program. It does what is needed for babysitting lighttpd: performs the
"send SIGINT to the old copy and immediately start the new one" dance
when it receives SIGHUP. See
http://blog.lighttpd.net/articles/2007/09/02/there-is-an-angel-for-lighty .

i found this too but thought is was only intended to be included in
versions 1.5+ of lighttpd.. after compiling and installing the latest
lighttpd to test the systemd unit i was happy that the "angel" is
already shipped in the 1.4.x series.. 8)

Post by Alexander E. Patrakov
ExecStop=/bin/kill -INT $MAINPID
but this doesn't work, as lighttpd gets killed immediately. I don't know
why this happens.

have you tried KillMode=process here?

Post by Alexander E. Patrakov
Also, the "test config before reloading" feature still doesn't work
ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
then systemd will kill lighttpd-angel when the configuration file is
bad. It should instead complain to syslog and do nothing with the
running instances of lighttpd and lighttpd-angel.

The behaviour seems to be undocumented (at least i havn't found
anything) if the ExecReload= command fails.

But if a reload fails it makes sense to me to stop a service because
this service wouldn't be in the state the caller wants it to be.
There is no way you can notify systemd that the failed reload attempt
was just a test prior to executing the real reload command (e.g. there
is no ExecReloadPre or the like). (If a service keeps running I won't
check syslog..)

At last: Isn't that something lighttpd should take care off? Or the user
should test before issuing a reload? Or a small script can do?

bye.. marius..

Alexander E. Patrakov

2011-01-11 15:11:29 UTC

Post by Marius Tolzmann
hi again..

Post by Alexander E. Patrakov
ExecStop=/bin/kill -INT $MAINPID
but this doesn't work, as lighttpd gets killed immediately. I don't know
why this happens.

have you tried KillMode=process here?

Yes.

Post by Marius Tolzmann

Post by Alexander E. Patrakov
Also, the "test config before reloading" feature still doesn't work
ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
then systemd will kill lighttpd-angel when the configuration file is
bad. It should instead complain to syslog and do nothing with the
running instances of lighttpd and lighttpd-angel.

The behaviour seems to be undocumented (at least i havn't found
anything) if the ExecReload= command fails.
But if a reload fails it makes sense to me to stop a service because
this service wouldn't be in the state the caller wants it to be.

In this case, it doesn't make sense, because the service is still in a
known state (the old state) that is still better than the non-running
state from the viewpoint of avoiding downtime.

Post by Marius Tolzmann
There is no way you can notify systemd that the failed reload attempt
was just a test prior to executing the real reload command (e.g. there
is no ExecReloadPre or the like). (If a service keeps running I won't
check syslog..)

When the config is bad and the additional ExecReload line is present,
systemd does tell me that reload failed. So I have a good reason to
check syslog, and even am suggested by systemd to do so.

Post by Marius Tolzmann
At last: Isn't that something lighttpd should take care off? Or the
user should test before issuing a reload? Or a small script can do?

Isn't one of the points of systemd to eliminate such small scripts?

--
Alexander E. Patrakov

Mirco Tischler

2011-01-11 15:50:56 UTC

Post by Alexander E. Patrakov

Post by Alexander E. Patrakov
Also, the "test config before reloading" feature still doesn't work

Post by Alexander E. Patrakov
ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
then systemd will kill lighttpd-angel when the configuration file is
bad. It should instead complain to syslog and do nothing with the
running instances of lighttpd and lighttpd-angel.

The behaviour seems to be undocumented (at least i havn't found
anything) if the ExecReload= command fails.
But if a reload fails it makes sense to me to stop a service because
this service wouldn't be in the state the caller wants it to be.

In this case, it doesn't make sense, because the service is still in a
known state (the old state) that is still better than the non-running
state from the viewpoint of avoiding downtime.

Post by Alexander E. Patrakov
There is no way you can notify systemd that the failed reload attempt
was just a test prior to executing the real reload command (e.g.
there is no ExecReloadPre or the like). (If a service keeps running I
won't check syslog..)

When the config is bad and the additional ExecReload line is present,
systemd does tell me that reload failed. So I have a good reason to
check syslog, and even am suggested by systemd to do so.

Post by Alexander E. Patrakov
At last: Isn't that something lighttpd should take care off? Or the
user should test before issuing a reload? Or a small script can do?

Isn't one of the points of systemd to eliminate such small scripts?

I just implemented an additional command: ExecConfigTest, which could
solve your issues. This is just a very quick first version and I'm sure
I've done something wrong or oversaw somethin although in a quick test
it works.
If set, the command is executed prior to those commands: ExecStart or if
exists ExecStartPre, ExecReload and ExecRestart. When failing, on
startup, the server isn't started. On restart and reload failure leads
to the service continuing to run and systemd not executing the
restart/reload command. So downtime because of configuration failures
can be minimized for services which provide a config test command. The
patch will be a followup to this mail. Would this be usefull?

Alexander, I've also thought about your email concerning yesterdays
patch and you have made some valid points. I'm not sure how to fix this
right yet though.

Thanks
Mirco

Mirco Tischler

2011-01-11 15:56:40 UTC

The ExecConfigTest command is trigerred before each ExecStart/Pre,
ExecReload
and ExecRestart command and the execution of those commands is denied when
the ExecConfigTest fails.
---
src/load-fragment.c | 1 +
src/service.c | 52
+++++++++++++++++++++++++++++++++++++++++++++++++++
src/service.h | 2 +
3 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/src/load-fragment.c b/src/load-fragment.c
index 261180d..b268f81 100644
--- a/src/load-fragment.c
+++ b/src/load-fragment.c
@@ -1853,6 +1853,7 @@ static int load_from_path(Unit *u, const char *path) {
{ "ExecReload",
config_parse_exec,
u->service.exec_command+SERVICE_EXEC_RELOAD, "Service" },
{ "ExecStop",
config_parse_exec,
u->service.exec_command+SERVICE_EXEC_STOP, "Service" },
{ "ExecStopPost",
config_parse_exec,
u->service.exec_command+SERVICE_EXEC_STOP_POST, "Service" },
+ { "ExecConfigTest",
config_parse_exec,
u->service.exec_command+SERVICE_EXEC_CONFIG_TEST,"Service" },
{ "RestartSec",
config_parse_usec,
&u->service.restart_usec, "Service" },
{ "TimeoutSec",
config_parse_usec,
&u->service.timeout_usec, "Service" },
{ "Type",
config_parse_service_type,
&u->service.type, "Service" },
diff --git a/src/service.c b/src/service.c
index a28eb8a..829bd5a 100644
--- a/src/service.c
+++ b/src/service.c
@@ -1941,6 +1941,27 @@ static void service_enter_running(Service *s,
bool success) {
service_enter_stop(s, true);
}

+static int service_enter_config_test(Service *s) {
+ int r = 0;
+
+ assert(s);
+
+ s->control_command_id = SERVICE_EXEC_CONFIG_TEST;
+ if ((s->control_command =
s->exec_command[SERVICE_EXEC_CONFIG_TEST]))
+ if((r = service_spawn(s,
+ s->control_command,
+ true,
+ false,
+ !s->permissions_start_only,
+ !s->root_directory_start_only,
+ false,
+ false,
+ &s->control_pid) < 0))
+ service_set_state(s, SERVICE_CONFIG_TEST);
+
+ return r;
+}
+
static void service_enter_start_post(Service *s) {
int r;
assert(s);
@@ -1980,6 +2001,13 @@ static void service_enter_start(Service *s) {
assert(s->exec_command[SERVICE_EXEC_START]);
assert(!s->exec_command[SERVICE_EXEC_START]->command_next ||
s->type == SERVICE_ONESHOT);

+ if(s->exec_command[SERVICE_EXEC_CONFIG_TEST] &&
!s->exec_command[SERVICE_EXEC_START_PRE]) {
+ if((r = service_enter_config_test(s) < 0)) {
+ log_warning("%s failed config test, not
starting: %s", s->meta.id, strerror(-r));
+ goto fail;
+ }
+ }
+
if (s->type == SERVICE_FORKING)
service_unwatch_control_pid(s);
else
@@ -2044,6 +2072,13 @@ static void service_enter_start_pre(Service *s) {

service_unwatch_control_pid(s);

+ if(s->exec_command[SERVICE_EXEC_CONFIG_TEST]) {
+ if((r = service_enter_config_test(s) < 0)) {
+ log_warning("%s failed config test, not
reloading: %s", s->meta.id, strerror(-r));
+ goto fail;
+ }
+ }
+
s->control_command_id = SERVICE_EXEC_START_PRE;
if ((s->control_command =
s->exec_command[SERVICE_EXEC_START_PRE])) {
if ((r = service_spawn(s,
@@ -2082,6 +2117,15 @@ static void service_enter_restart(Service *s) {
goto fail;
}

+ if(s->exec_command[SERVICE_EXEC_CONFIG_TEST]) {
+ if((r = service_enter_config_test(s) < 0)) {
+ log_warning("%s failed config test, not
restarting: %s", s->meta.id, strerror(-r));
+ service_enter_running(s, true);
+ return;
+ }
+ }
+
+
service_enter_dead(s, true, false);

if ((r = manager_add_job(s->meta.manager, JOB_START, UNIT(s),
JOB_FAIL, false, &error, NULL)) < 0)
@@ -2102,6 +2146,14 @@ static void service_enter_reload(Service *s) {

assert(s);

+ if(s->exec_command[SERVICE_EXEC_CONFIG_TEST]){
+ if((r = service_enter_config_test(s) < 0)) {
+ log_warning("%s failed config test, not
reloading: %s", s->meta.id, strerror(-r));
+ service_enter_running(s, true);
+ return;
+ }
+ }
+
service_unwatch_control_pid(s);

s->control_command_id = SERVICE_EXEC_RELOAD;
diff --git a/src/service.h b/src/service.h
index 500bebf..df2a218 100644
--- a/src/service.h
+++ b/src/service.h
@@ -35,6 +35,7 @@ typedef enum ServiceState {
SERVICE_RUNNING,
SERVICE_EXITED, /* Nothing is running anymore, but
RemainAfterExit is true, ehnce this is OK */
SERVICE_RELOAD,
+ SERVICE_CONFIG_TEST,
SERVICE_STOP, /* No STOP_PRE state, instead just
register multiple STOP executables */
SERVICE_STOP_SIGTERM,
SERVICE_STOP_SIGKILL,
@@ -74,6 +75,7 @@ typedef enum ServiceExecCommand {
SERVICE_EXEC_RELOAD,
SERVICE_EXEC_STOP,
SERVICE_EXEC_STOP_POST,
+ SERVICE_EXEC_CONFIG_TEST,
_SERVICE_EXEC_COMMAND_MAX,
_SERVICE_EXEC_COMMAND_INVALID = -1
} ServiceExecCommand;
-- 1.7.3.4

Mirco Tischler

2011-01-11 16:11:40 UTC

diff --git a/src/load-fragment.c b/src/load-fragment.c
index 261180d..b268f81 100644
--- a/src/load-fragment.c
+++ b/src/load-fragment.c
@@ -1853,6 +1853,7 @@ static int load_from_path(Unit *u, const char *path) {
{ "ExecReload", config_parse_exec, u->service.exec_command+SERVICE_EXEC_RELOAD, "Service" },
{ "ExecStop", config_parse_exec, u->service.exec_command+SERVICE_EXEC_STOP, "Service" },
{ "ExecStopPost", config_parse_exec, u->service.exec_command+SERVICE_EXEC_STOP_POST, "Service" },
+ { "ExecConfigTest", config_parse_exec, u->service.exec_command+SERVICE_EXEC_CONFIG_TEST,"Service" },
{ "RestartSec", config_parse_usec, &u->service.restart_usec, "Service" },
{ "TimeoutSec", config_parse_usec, &u->service.timeout_usec, "Service" },
{ "Type", config_parse_service_type, &u->service.type, "Service" },
diff --git a/src/service.c b/src/service.c
index a28eb8a..829bd5a 100644
--- a/src/service.c
+++ b/src/service.c
@@ -1941,6 +1941,27 @@ static void service_enter_running(Service *s, bool success) {
service_enter_stop(s, true);
}

+static int service_enter_config_test(Service *s) {
+ int r = 0;
+
+ assert(s);
+
+ s->control_command_id = SERVICE_EXEC_CONFIG_TEST;
+ if ((s->control_command = s->exec_command[SERVICE_EXEC_CONFIG_TEST]))
+ if((r = service_spawn(s,
+ s->control_command,
+ true,
+ false,
+ !s->permissions_start_only,
+ !s->root_directory_start_only,
+ false,
+ false,
+ &s->control_pid) < 0))
+ service_set_state(s, SERVICE_CONFIG_TEST);
+
+ return r;
+}
+
static void service_enter_start_post(Service *s) {
int r;
assert(s);
@@ -1980,6 +2001,13 @@ static void service_enter_start(Service *s) {
assert(s->exec_command[SERVICE_EXEC_START]);
assert(!s->exec_command[SERVICE_EXEC_START]->command_next || s->type == SERVICE_ONESHOT);

+ if(s->exec_command[SERVICE_EXEC_CONFIG_TEST] && !s->exec_command[SERVICE_EXEC_START_PRE]) {
+ if((r = service_enter_config_test(s) < 0)) {
+ log_warning("%s failed config test, not starting: %s", s->meta.id, strerror(-r));
+ goto fail;
+ }
+ }
+
if (s->type == SERVICE_FORKING)
service_unwatch_control_pid(s);
else
@@ -2044,6 +2072,13 @@ static void service_enter_start_pre(Service *s) {

service_unwatch_control_pid(s);

+ if(s->exec_command[SERVICE_EXEC_CONFIG_TEST]) {
+ if((r = service_enter_config_test(s) < 0)) {
+ log_warning("%s failed config test, not reloading: %s", s->meta.id, strerror(-r));
+ goto fail;
+ }
+ }
+
s->control_command_id = SERVICE_EXEC_START_PRE;
if ((s->control_command = s->exec_command[SERVICE_EXEC_START_PRE])) {
if ((r = service_spawn(s,
@@ -2082,6 +2117,15 @@ static void service_enter_restart(Service *s) {
goto fail;
}

+ if(s->exec_command[SERVICE_EXEC_CONFIG_TEST]) {
+ if((r = service_enter_config_test(s) < 0)) {
+ log_warning("%s failed config test, not restarting: %s", s->meta.id, strerror(-r));
+ service_enter_running(s, true);
+ return;
+ }
+ }
+
+
service_enter_dead(s, true, false);

if ((r = manager_add_job(s->meta.manager, JOB_START, UNIT(s), JOB_FAIL, false, &error, NULL)) < 0)
@@ -2102,6 +2146,14 @@ static void service_enter_reload(Service *s) {

assert(s);

+ if(s->exec_command[SERVICE_EXEC_CONFIG_TEST]){
+ if((r = service_enter_config_test(s) < 0)) {
+ log_warning("%s failed config test, not reloading: %s", s->meta.id, strerror(-r));
+ service_enter_running(s, true);
+ return;
+ }
+ }
+
service_unwatch_control_pid(s);

s->control_command_id = SERVICE_EXEC_RELOAD;
diff --git a/src/service.h b/src/service.h
index 500bebf..df2a218 100644
--- a/src/service.h
+++ b/src/service.h
@@ -35,6 +35,7 @@ typedef enum ServiceState {
SERVICE_RUNNING,
SERVICE_EXITED, /* Nothing is running anymore, but RemainAfterExit is true, ehnce this is OK */
SERVICE_RELOAD,
+ SERVICE_CONFIG_TEST,
SERVICE_STOP, /* No STOP_PRE state, instead just register multiple STOP executables */
SERVICE_STOP_SIGTERM,
SERVICE_STOP_SIGKILL,
@@ -74,6 +75,7 @@ typedef enum ServiceExecCommand {
SERVICE_EXEC_RELOAD,
SERVICE_EXEC_STOP,
SERVICE_EXEC_STOP_POST,
+ SERVICE_EXEC_CONFIG_TEST,
_SERVICE_EXEC_COMMAND_MAX,
_SERVICE_EXEC_COMMAND_INVALID = -1
} ServiceExecCommand;

--
1.7.3.4

Alexander E. Patrakov

2011-01-11 16:40:58 UTC

11.01.2011 21:11, Mirco Tischler wrote:

<a patch>

Sorry, this doesn't work as expected. To reproduce the problem:

1) Start a service with a good config
2) Edit the config. Make a typo.
3) Attempt to reload the service. At this point, systemd will put the
whole service into a "failed" state.
4) Fix the typo.
5) Attempt to reload the service.

Expected result: service reloaded with the new config.
Actual result: systemd doesn't let me reload the service.

Also, when starting a service, systemd tests the config twice for some
reason. And it is really strange that in the system log with
systemd.log_level=debug systemd.log_target=kmsg, systemd attempts to
fork the main command before the check finished and thus seems to run
the main command in parallel with the check. In other words, I doubt
that the correct thing is tested. I suspect that "the executable is
found" fact is tested, not "the command executed successfully".

--
Alexander E. Patrakov

Mirco Tischler

2011-01-12 13:31:09 UTC

Post by Alexander E. Patrakov
<a patch>
1) Start a service with a good config
2) Edit the config. Make a typo.
3) Attempt to reload the service. At this point, systemd will put the
whole service into a "failed" state.
4) Fix the typo.
5) Attempt to reload the service.
Expected result: service reloaded with the new config.
Actual result: systemd doesn't let me reload the service.
Also, when starting a service, systemd tests the config twice for some
reason. And it is really strange that in the system log with
systemd.log_level=debug systemd.log_target=kmsg, systemd attempts to
fork the main command before the check finished and thus seems to run
the main command in parallel with the check. In other words, I doubt
that the correct thing is tested. I suspect that "the executable is
found" fact is tested, not "the command executed successfully".

Thanks for testing it. I didn't expect it to do the right thing yet ;)
All the tests I was able to run for reasons of limited time was service
start, restart and reload without.
Seems I need to read the sources a little more. Some of them could
really use more commenting. :)
A second version will follow probably next week.

And sorry for the patch send twice. The first version was mangled by my
mailclient so I tried git send-email on the second one.

Thanks
Mirco

Marius Tolzmann

2011-01-11 16:30:01 UTC

hi..

Post by Alexander E. Patrakov
When the config is bad and the additional ExecReload line is present,
systemd does tell me that reload failed. So I have a good reason to
check syslog, and even am suggested by systemd to do so.

sorry my fault 8).. i thought it was silently failing and you have to
check syslog for errors..

Post by Alexander E. Patrakov

Post by Marius Tolzmann
At last: Isn't that something lighttpd should take care off? Or the
user should test before issuing a reload? Or a small script can do?

Isn't one of the points of systemd to eliminate such small scripts?

Mirkos ExecConfigTest= idea seems to solve this issue perfectly.. what i
really like: now the configtest-procedure only has to be defined once..
and so your unit file will be more readable compared to the very first
version..

thanks for the unit file.. i will use it soon 8)

cheers, marius..

Lennart Poettering

2011-01-17 23:50:34 UTC

Post by Marius Tolzmann

Post by Alexander E. Patrakov

Post by Marius Tolzmann
At last: Isn't that something lighttpd should take care off? Or the
user should test before issuing a reload? Or a small script can do?

Isn't one of the points of systemd to eliminate such small scripts?

Mirkos ExecConfigTest= idea seems to solve this issue perfectly..
what i really like: now the configtest-procedure only has to be
defined once.. and so your unit file will be more readable compared
to the very first version..

If ExecReload= is fixed to allow multiple lines without crashing, and
also doesn't shut down the service if a reload command fails then
you will not need ExecConfigTest, correct?

Lennart

--
Lennart Poettering - Red Hat, Inc.

Marius Tolzmann

2011-01-18 10:09:41 UTC

hi..

Post by Lennart Poettering
If ExecReload= is fixed to allow multiple lines without crashing, and
also doesn't shut down the service if a reload command fails then
you will not need ExecConfigTest, correct?

true true 8)

btw systemd is doing great in our linux-from-scratch-based distribution
so far.. thx..

bye marius..

Lennart Poettering

2011-01-17 23:48:40 UTC

Post by Alexander E. Patrakov

Post by Marius Tolzmann
There is no way you can notify systemd that the failed reload
attempt was just a test prior to executing the real reload command
(e.g. there is no ExecReloadPre or the like). (If a service keeps
running I won't check syslog..)

When the config is bad and the additional ExecReload line is
present, systemd does tell me that reload failed. So I have a good
reason to check syslog, and even am suggested by systemd to do so.

What systemd currently does is:

If the reload fails it shutdowns the service and informs you about
the failure.

What systemd probably should do (and what is now in the TODO list) is:

If the reload fails it should leave the service as is but informs
you about the failure.

That should make it easy to plug in check scripts via the ExecReload
directive and make the need of an additional directive unnecessary.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Alexander E. Patrakov

2011-01-18 13:00:50 UTC

Post by Lennart Poettering
If the reload fails it shutdowns the service and informs you about
the failure.
If the reload fails it should leave the service as is but informs
you about the failure.
That should make it easy to plug in check scripts via the ExecReload
directive and make the need of an additional directive unnecessary.

For the case of reloading, you are right. However, the original idea was
to run the configuration check also on restart (but not on stop). It is
not obvious for me how this can work without a new directive.

--
Alexander E. Patrakov

Lennart Poettering

2011-01-18 22:39:49 UTC

Post by Alexander E. Patrakov

Post by Lennart Poettering
If the reload fails it shutdowns the service and informs you about
the failure.
If the reload fails it should leave the service as is but informs
you about the failure.
That should make it easy to plug in check scripts via the ExecReload
directive and make the need of an additional directive unnecessary.

For the case of reloading, you are right. However, the original idea
was to run the configuration check also on restart (but not on
stop). It is not obvious for me how this can work without a new
directive.

Hmm, so on restart you want to run the check script before the "stop"
operation is run? Hmm, not sure either how to do that best. I guess for
that we'd have to add ExecRestartPre= or something like that.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Lennart Poettering

2011-01-17 23:41:49 UTC

Post by Marius Tolzmann
hi again..

Post by Alexander E. Patrakov
After more research, I found the under-documented lighttpd-angel
program. It does what is needed for babysitting lighttpd: performs the
"send SIGINT to the old copy and immediately start the new one" dance
when it receives SIGHUP. See
http://blog.lighttpd.net/articles/2007/09/02/there-is-an-angel-for-lighty .

i found this too but thought is was only intended to be included in
versions 1.5+ of lighttpd.. after compiling and installing the
latest lighttpd to test the systemd unit i was happy that the
"angel" is already shipped in the 1.4.x series.. 8)

Post by Alexander E. Patrakov
ExecStop=/bin/kill -INT $MAINPID
but this doesn't work, as lighttpd gets killed immediately. I don't know
why this happens.

have you tried KillMode=process here?

The stop command should be sync, otherweise systemd finds that there is
still process running and will kill it right-away with SIGTERM. That
means lighthttpd will receive first INT and then very likely TERM
too. And KillMode=process will only avoid that the TERM is sent to the
non-main processes, but the main process will still receive two signlas
in most cases.

Post by Marius Tolzmann

Post by Alexander E. Patrakov
Also, the "test config before reloading" feature still doesn't work
ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
then systemd will kill lighttpd-angel when the configuration file is
bad. It should instead complain to syslog and do nothing with the
running instances of lighttpd and lighttpd-angel.

The behaviour seems to be undocumented (at least i havn't found
anything) if the ExecReload= command fails.

Hmm, we probably shouldn't shut down the service if a reload fails.

/me adds this to the TODO list.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Lennart Poettering

2011-01-17 23:46:03 UTC

Post by Marius Tolzmann
and why do you explicitly not use the -D option and let systemd do
all the forking stuff for you..? you could get rid off the Type=
PIDFile= lines.. i also removed StandardError=syslog which seems to
be redundant here..

Generally it is advisable to let systemd handle the forking, but I guess
if the daemon is weird as lighthttpd is then it is a better idea to
use the internally forking and make systemd load the PID file.

Post by Marius Tolzmann
would this work? or isn't restart intended to be used for those kind
of things?
SignalStop=SIGINT and SignalReload=SIGINT or something similar may
be a nice shortcut for doing those kind of things.. Since
SignalReload=SIGHUP seems to be a way how it is done most of the
time?

As mentioned elsewhere I am not a big fan of using signals like this,
since signal delivery is async, and the stop/reload operations should be
synchronous, i.e. if you issue "systemctl reload foo.service" you should
be able to rely that the reloading is complete if the command following
this line is executed.

Post by Marius Tolzmann
Another question: Doesn't the default KillMode=control-group would
send a SIGTERM to the remaining processes of this service
immediately after the ExecStop= returns and so break the idea behind
a graceful shutdown where the main-process exits and all running
http-requests will be finished? (Or is there also a delay of
TimeoutSec= between ExecStop= and the first SIGTERM?)

No, there isn no such delay.

systemd will send the SIGTERM to the processes remaining in the cgroup
only. If the the stop command of the service didn't do its work (or
wasn't configured) it will kill everything that is left.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Lennart Poettering

2011-01-17 23:14:51 UTC

Post by Alexander E. Patrakov
[Unit]
Description=Lighttpd Web Server
After=network.target
[Service]
Type=forking
EnvironmentFile=/etc/conf.d/lighttpd
PIDFile=/var/run/lighttpd.pid
ExecStartPre=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf

Hmm, whiy is this necessary? I presumee starting the daemon will do an
implicit configuration check anyway, right? I mean, how could it load
the config without checking for its validity?

Post by Alexander E. Patrakov
ExecStart=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
ExecStop=/bin/kill -INT $MAINPID

This is asynchronous. The stop operation is supposed to be synchronous
however, should not return before it finished.

Post by Alexander E. Patrakov
ExecReload=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecReload=/bin/kill -INT $MAINPID

As you noticed, this changes the PID, and systemd currently cannot
handle this.

We could however reload the PID file after a reload completed I
guess. (/me adds this to the todo list)

Post by Alexander E. Patrakov
ExecReload=/bin/sleep 1

Evil! What's this for?

Post by Alexander E. Patrakov
ExecReload=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
StandardOutput=syslog
StandardError=syslog

If you don't set StandardError explicitly it will be implicitly set to
the same value as StandardOutput. In most cases it is hence sufficient
to set only StandardOutput=.

Post by Alexander E. Patrakov
[Install]
WantedBy=multi-user.target
WantedBy=http-daemon.target
Here is what happened when I tried to reload lighttpd while looking
[ 3639.047649] systemd[1]: Assertion 's->control_command_id ==
SERVICE_EXEC_START' failed at src/service.c:2184, function
service_run_next_main(). Aborting.
[ 3639.052182] systemd[1]: Caught <ABRT>, dumped core as pid 6045.
[ 3639.052376] systemd[1]: Freezing execution.

Ouch. This definitely needs to be fixed.

Post by Alexander E. Patrakov
Also, $MAINPID handling seems to be unreliable - e.g., systemctl
stop lighttpd.service works correctly only while systemd is being
straced. Replacing /bin/kill -INT $MAINPID with killall -INT
$MAINPID removed both the unreliability of $MAINPID and the
assertion failure. So I guess that both bugs are somehow related to
my usage of $MAINPID.

Yeah, MAINPID cannot really change during runtime right now. But I see
no reason why we shouldn't make it possible to have a dynamically
changing main PID to deal with lighthttpd here.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Alexander E. Patrakov

2011-01-18 12:30:22 UTC

Post by Lennart Poettering

Post by Alexander E. Patrakov
[Unit]
Description=Lighttpd Web Server
After=network.target
[Service]
Type=forking
EnvironmentFile=/etc/conf.d/lighttpd
PIDFile=/var/run/lighttpd.pid
ExecStartPre=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf

Hmm, whiy is this necessary? I presumee starting the daemon will do an
implicit configuration check anyway, right? I mean, how could it load
the config without checking for its validity?

Indeed, my bad.

Post by Lennart Poettering

Post by Alexander E. Patrakov
ExecStart=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
ExecStop=/bin/kill -INT $MAINPID

This is asynchronous. The stop operation is supposed to be synchronous
however, should not return before it finished.

This was modeled after the existing apache2 service file in gentoo
systemd overlay, which uses graceful asynchronous stop. If you delete
the ExecStop line, systemd will use SIGTERM (non-graceful stop) and
wait. That's probably what you want the stop operation to be. OTOH, like
it or not, too many existing services don't have any mechanism for
synchronous reload and use SIGHUP.

I think you should audit all existing service files in Gentoo systemd
overlay to ensure that I don't copy any more bad practice. See
http://git.overlays.gentoo.org/gitweb/?p=user/systemd.git;a=summary

Post by Lennart Poettering
As you noticed, this changes the PID, and systemd currently cannot
handle this.
We could however reload the PID file after a reload completed I
guess. (/me adds this to the todo list)

Well, there are cases (live update of nginx, see
http://wiki.nginx.org/NginxCommandLine#Upgrading_To_a_New_Binary_On_The_Fly)
where the main PID would change without the explicit "systemctl reload"
command. In the case of nginx, one can follow up the live update with a
dummy "systemctl reload", so I am not sure if it matters.

OTOH, maybe it would be better to evaluate the main PID lazily when it
is needed, instead of trying to enumerate all places where it can change
and reloading it there. But this way we will also hide all races caused
by bad PID file handling logic, so I am not sure.

--
Alexander E. Patrakov

Marius Tolzmann

2011-01-18 13:04:17 UTC

hi..

Post by Alexander E. Patrakov

Post by Lennart Poettering
As you noticed, this changes the PID, and systemd currently cannot
handle this.
We could however reload the PID file after a reload completed I
guess. (/me adds this to the todo list)

Well, there are cases (live update of nginx, see
http://wiki.nginx.org/NginxCommandLine#Upgrading_To_a_New_Binary_On_The_Fly)
where the main PID would change without the explicit "systemctl reload"
command. In the case of nginx, one can follow up the live update with a
dummy "systemctl reload", so I am not sure if it matters.
OTOH, maybe it would be better to evaluate the main PID lazily when it
is needed, instead of trying to enumerate all places where it can change
and reloading it there. But this way we will also hide all races caused
by bad PID file handling logic, so I am not sure.

we too use software where the MAINPID changes over time due to
automatic/on-the-fly reloading of the main binary..

that is how the on-the-fly logic is implemented:
when the modification of the binary is detected
-> the server will spawn a new main process (new MAINPID)
-> when successfully initialized
-> the new process will kill the old one
-> else (on failure) the old process continues until the next
modification is detected

systemd is not notified..

i considered issuing a "systemctl reload" in the main-process to let
systemd reload the whole service where ReloadExec would spawn the new
instance which kills the old one. But the ReloadExec would never return
and so the state of the service would probably be "reloading" or
something and the new MAINPID should be the one of ReloadExec-process..
..this does not seem to work at all..

how is this supposed to be handled in systemd??

m.

Lennart Poettering

2011-01-18 22:34:10 UTC

Post by Marius Tolzmann
hi..

Post by Alexander E. Patrakov

Post by Lennart Poettering
As you noticed, this changes the PID, and systemd currently cannot
handle this.
We could however reload the PID file after a reload completed I
guess. (/me adds this to the todo list)

Well, there are cases (live update of nginx, see
http://wiki.nginx.org/NginxCommandLine#Upgrading_To_a_New_Binary_On_The_Fly)
where the main PID would change without the explicit "systemctl reload"
command. In the case of nginx, one can follow up the live update with a
dummy "systemctl reload", so I am not sure if it matters.
OTOH, maybe it would be better to evaluate the main PID lazily when it
is needed, instead of trying to enumerate all places where it can change
and reloading it there. But this way we will also hide all races caused
by bad PID file handling logic, so I am not sure.

we too use software where the MAINPID changes over time due to
automatic/on-the-fly reloading of the main binary..
when the modification of the binary is detected

I am not a big fan of such a logic for two reasons: a) it's incredible
hard to respawn like this if *any* of the loaded code blobs are change
(i.e. an app is seldom just a single binary, but uses shared libraries
and loadable modules) and b) if multiple things change it is hard to
impossible to find the right time where really everything has been fully
updated, i.e. you might end up with a half-updated process in the end.

Post by Marius Tolzmann
-> the server will spawn a new main process (new MAINPID)
-> when successfully initialized
-> the new process will kill the old one
-> else (on failure) the old process continues until the next
modification is detected
systemd is not notified..

I guess in this case too sd_notify(3) could be of help, and the least
ugly solution.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Lennart Poettering

2011-01-18 22:31:04 UTC

Post by Alexander E. Patrakov

Post by Lennart Poettering

Post by Alexander E. Patrakov
ExecStart=/usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
ExecStop=/bin/kill -INT $MAINPID

This is asynchronous. The stop operation is supposed to be synchronous
however, should not return before it finished.

This was modeled after the existing apache2 service file in gentoo
systemd overlay, which uses graceful asynchronous stop. If you
delete the ExecStop line, systemd will use SIGTERM (non-graceful
stop) and wait. That's probably what you want the stop operation to
be. OTOH, like it or not, too many existing services don't have any
mechanism for synchronous reload and use SIGHUP.

Note that you can change the kill signal via "KillSignal=SIGHUP" or
something similar.

Post by Alexander E. Patrakov
Well, there are cases (live update of nginx, see http://wiki.nginx.org/NginxCommandLine#Upgrading_To_a_New_Binary_On_The_Fly)
where the main PID would change without the explicit "systemctl
reload" command. In the case of nginx, one can follow up the live
update with a dummy "systemctl reload", so I am not sure if it
matters.
OTOH, maybe it would be better to evaluate the main PID lazily when
it is needed, instead of trying to enumerate all places where it can
change and reloading it there. But this way we will also hide all
races caused by bad PID file handling logic, so I am not sure.

Hmpf, we actually send out bus notifications when service properties
such as the PID change. I figure in weirdo cases like this it would be
the nicest solution if the service would just invoke the following when
they change PID:

sd_notifyf(0, "MAINPID=%lu", (unsigned long) getpid());

sd_notify() is defined in sd-daemon.[ch], and may be used to inform the
init system about service status changes, include change of the main PID
with a command like the above. See sd_notify(3) for details.

Lennart

--
Lennart Poettering - Red Hat, Inc.

34 Replies
102 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Alexander E. Patrakov 2011-01-10 10:11:26 UTC

Alexander E. Patrakov 2011-01-10 12:30:50 UTC

Kay Sievers 2011-01-10 13:11:16 UTC

Mirco Tischler 2011-01-10 13:57:41 UTC

Alexander E. Patrakov 2011-01-10 14:34:14 UTC

Alexander E. Patrakov 2011-01-10 15:22:25 UTC

Lennart Poettering 2011-01-17 23:18:59 UTC

Marius Tolzmann 2011-01-10 17:49:39 UTC

Alexander E. Patrakov 2011-01-11 04:54:16 UTC

Lennart Poettering 2011-01-17 23:23:22 UTC

Alexander E. Patrakov 2011-01-18 11:41:06 UTC

Lennart Poettering 2011-01-18 11:51:09 UTC

Alexander E. Patrakov 2011-01-18 12:56:02 UTC

Lennart Poettering 2011-01-18 22:38:34 UTC

Alexander E. Patrakov 2011-01-11 10:51:49 UTC

Marius Tolzmann 2011-01-11 14:46:01 UTC

Alexander E. Patrakov 2011-01-11 15:11:29 UTC

Mirco Tischler 2011-01-11 15:50:56 UTC

Mirco Tischler 2011-01-11 15:56:40 UTC

Mirco Tischler 2011-01-11 16:11:40 UTC

Alexander E. Patrakov 2011-01-11 16:40:58 UTC

Mirco Tischler 2011-01-12 13:31:09 UTC

Marius Tolzmann 2011-01-11 16:30:01 UTC

Lennart Poettering 2011-01-17 23:50:34 UTC

Marius Tolzmann 2011-01-18 10:09:41 UTC

Lennart Poettering 2011-01-17 23:48:40 UTC

Alexander E. Patrakov 2011-01-18 13:00:50 UTC

Lennart Poettering 2011-01-18 22:39:49 UTC

Lennart Poettering 2011-01-17 23:41:49 UTC

Lennart Poettering 2011-01-17 23:46:03 UTC

Lennart Poettering 2011-01-17 23:14:51 UTC

Alexander E. Patrakov 2011-01-18 12:30:22 UTC

Marius Tolzmann 2011-01-18 13:04:17 UTC

Lennart Poettering 2011-01-18 22:34:10 UTC

Lennart Poettering 2011-01-18 22:31:04 UTC

about - legalese

Loading...