Discussion:
[systemd-devel] High CPU usage of journald
Holger Freyther
2013-02-17 10:12:47 UTC
Permalink
Dear maintainers,

first of all the good news. The same setup used to exhibit high
memory usage (and exploding virtual address space). In a relatively
idle system I see a constant CPU usage of journald of around 20% (
using the busybox top). The main issue with that is that the system
starts to feel really slow due this. Typing commands via the serial
console feels like a satellite link.

Config:
[Journal]
Storage=volatile
RuntimeMaxUse=648K
(removed the lines starting with #)


Systemd:
As it is included in Poky/OE-Core right now (version 197)


Kernel:
This is a 3.2.37 kernel with minimal cgroup support.


Service:
The service file can be seen here[1]. It will re-load firmware, toggle
a LED and start the BTS process. If the BTS can not connect to a BSC it
will exit and the service file is asked to restart it in two seconds.


kernel messages:
"DSP 0: Firmware loaded" is printed every time the firmware
is reloaded..


log messages from userspace:
I am posting through gmane as attaching the file is a bit difficult. I
have uploaded them here[2]. It is 47 lines including the messages from
systemd about re-starting the service.


Strace:
Some seconds of output with strace is another paste[3]. What is noticeable
is the polling on /proc/*/cgroup, mmapping to read the content, closing
the file, unmapping. GLIBCs fgets is a bit wasteful in this regard.


I am currently re-compiling my rootfs to not omit the framepointers and
then will hopefully be able to have useful output of perf. Is the current
behavior expected/wanted?


holger


[1] http://cgit.osmocom.org/cgit/osmo-bts/tree/contrib/sysmobts.service
[2] http://paste.lisp.org/display/135526
[3] http://paste.lisp.org/display/135527
Holger Freyther
2013-02-17 17:54:58 UTC
Permalink
Post by Holger Freyther
I am currently re-compiling my rootfs to not omit the framepointers and
then will hopefully be able to have useful output of perf. Is the current
behavior expected/wanted?
Somehow backtrace generation is still broken (perf should be able to
walk user stacks as long as the framepointer is there).


$ perf top -p `pidof systemd-journald` -K
15.43% libc-2.17.so [.] _int_malloc
11.97% libc-2.17.so [.] vfprintf
5.98% systemd-journald [.] 0x2bdc
5.35% systemd-journald [.] generic_array_bisect
4.09% libc-2.17.so [.] ptmalloc_init
3.62% libc-2.17.so [.] __malloc_stats
2.83% systemd-journald [.] journal_file_append_entry_internal
2.68% libc-2.17.so [.] _IO_flush_all_lockp
2.52% libc-2.17.so [.] memchr
2.36% libc-2.17.so [.] __vwprintf_chk
2.20% systemd-journald [.] __aeabi_dmul
2.20% libc-2.17.so [.] _IO_link_in
2.05% libc-2.17.so [.] _int_memalign
2.05% [unknown] [.] 0xffff0fc0
1.89% systemd-journald [.] journal_file_copy_entry
1.89% libc-2.17.so [.] _IO_cleanup
1.89% libc-2.17.so [.] __strcoll_l
1.73% systemd-journald [.] journal_file_move_to_entry_by_realtime_for_data


So where does _int_malloc come from? I have used gdb to 'sample'
it by hand. So the answer is from a lot of places. Do you consider
cutting back on how you dynamically allocate strings? E.g. stop
using fopen, fgets, only have one dynamically tmp string for the
various format routines one is using sequentially?




Samples:
Breakpoint 1, _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>,
bytes=***@entry=15) at malloc.c:3241
3241 malloc.c: No such file or directory.
(gdb) bt
#0 _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>, bytes=***@entry=15)
at malloc.c:3241
#1 0x4dce2fdc in __GI___libc_malloc (bytes=***@entry=15) at malloc.c:2859
#2 0x4dce7c60 in __GI___strdup (s=***@entry=0x4dd999fc "/etc/localtime")
at strdup.c:42
#3 0x4dcfcdd4 in tzset_internal (always=<optimized out>,
explicit=***@entry=1) at tzset.c:441
#4 0x4dcfd210 in __tz_convert (timer=***@entry=0xbefc0754,
use_localtime=***@entry=1, tp=0x4ddb16cc <_tmbuf>) at tzset.c:629
#5 0x4dcfb618 in __GI_localtime (t=***@entry=0xbefc0754) at localtime.c:42
#6 0x0000e694 in server_forward_syslog (s=***@entry=0xbefc0b48,


#0 _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>, bytes=***@entry=11)
at malloc.c:3241
#1 0x4dce2fdc in __GI___libc_malloc (bytes=***@entry=11) at malloc.c:2859
#2 0x4dd5af68 in __GI___vasprintf_chk (
result_ptr=***@entry=0xbefc0824, flags=***@entry=1,
format=***@entry=0x2e73c "PRIORITY=%i", args=***@entry=...)
at vasprintf_chk.c:80
#3 0x4dd5ae54 in __asprintf_chk (result_ptr=***@entry=0xbefc0824,
flags=***@entry=1, format=0x2e73c "PRIORITY=%i") at asprintf_chk.c:33
#4 0x0002aac4 in asprintf (__fmt=0x2e73c "PRIORITY=%i", __ptr=0xbefc0824)
at /home/ich/source/openembedded/poky/build/tmp/sysroots/sysmobts-
v2/usr/include/bits/stdio2.h:178
#5 stdout_stream_log (p=0x4d88d "Shutdown timer expired", s=0x4d868)
at src/journal/journald-stream.c:106
#6 stdout_stream_line (s=***@entry=0x4d868, p=<optimized out>,
***@entry=0x4d88d "Shutdown timer expired")
at src/journal/journald-stream.c:231
#7 0x0002ad68 in stdout_stream_scan (s=***@entry=0x4d868,
force_flush=***@entry=false) at src/journal/journald-stream.c:


#0 _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>, bytes=***@entry=27)
at malloc.c:3241
#1 0x4dce2fdc in __GI___libc_malloc (bytes=***@entry=27) at malloc.c:2859
#2 0x0001af2c in malloc_multiply (b=27, a=1) at src/shared/util.h:533
#3 strnappend (s=***@entry=0x2e764 "SYSLOG_IDENTIFIER=",
suffix=***@entry=0x4ace0 "sysmobts", b=8) at src/shared/util.c:1123
#4 0x0001afb8 in strappend (s=0x2e764 "SYSLOG_IDENTIFIER=",
suffix=0x4ace0 "sysmobts") at src/shared/util.c:1135
#5 0x0002ab50 in stdout_stream_log (p=0x4d88d "Shutdown timer expired",
s=0x4d868) at src/journal/journald-stream.c:114
#6 stdout_stream_line (s=***@entry=0x4d868, p=<optimized out>,
***@entry=0x4d88d "Shutdown timer expired")
at src/journal/journald-stream.c:231


#0 _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>, bytes=***@entry=100)
at malloc.c:3241
#1 0x4dce2fdc in __GI___libc_malloc (bytes=***@entry=100) at malloc.c:2859
#2 0x4dd5ae90 in __GI___vasprintf_chk (
result_ptr=***@entry=0xbefbff50, flags=***@entry=1,
format=***@entry=0x304a0 "/proc/%lu/cgroup", args=..., ***@entry=...)
at vasprintf_chk.c:48
#3 0x4dd5ae54 in __asprintf_chk (result_ptr=***@entry=0xbefbff50,
flags=***@entry=1, format=0x304a0 "/proc/%lu/cgroup")
at asprintf_chk.c:33
#4 0x0001594c in asprintf (__fmt=0x304a0 "/proc/%lu/cgroup", __ptr=0xbefbff50)
at /home/ich/source/openembedded/poky/build/tmp/sysroots/sysmobts-
v2/usr/include/bits/stdio2.h:178
#5 cg_get_by_pid (controller=0x2ed98 "name=systemd", pid=<optimized out>,
path=***@entry=0xbefc0780) at src/shared/cgroup-util.c:792


#0 _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>, bytes=***@entry=360)
at malloc.c:3241
#1 0x4dce2fdc in __GI___libc_malloc (bytes=***@entry=360) at malloc.c:2859
#2 0x4dccbb4c in __fopen_internal (filename=0x4cc88 "/proc/3575/cgroup",
mode=0x2f624 "re", is32=***@entry=0) at iofopen.c:73
#3 0x4dccf100 in _IO_fopen64 (filename=<optimized out>, mode=<optimized out>)
at iofopen64.c:39
#4 0x00015964 in cg_get_by_pid (controller=0x2ed98 "name=systemd",
pid=<optimized out>, path=***@entry=0xbefc0780)
at src/shared/cgroup-util.c:795
#5 0x0000f498 in shortened_cgroup_path (pid=<optimized out>)
at src/journal/journald-server.c:403

#0 _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>, bytes=***@entry=100)
at malloc.c:3241
---Type <return> to continue, or q <return> to quit---
#1 0x4dce2fdc in __GI___libc_malloc (bytes=***@entry=100) at malloc.c:2859
#2 0x4dd5ae90 in __GI___vasprintf_chk (
result_ptr=***@entry=0xbefc0708, flags=***@entry=1,
format=***@entry=0x2f174 "_UID=%lu", args=..., ***@entry=...)
at vasprintf_chk.c:48
#3 0x4dd5ae54 in __asprintf_chk (result_ptr=***@entry=0xbefc0708,
flags=***@entry=1, format=0x2f174 "_UID=%lu") at asprintf_chk.c:33
#4 0x00010b24 in asprintf (__fmt=0x2f174 "_UID=%lu", __ptr=0xbefc0708)
at /home/ich/source/openembedded/poky/build/tmp/sysroots/sysmobts-
v2/usr/include/bits/stdio2.h:178
#5 dispatch_message_real (s=***@entry=0xbefc0b48, iovec=***@entry=0xbefc0830,
n=6, ***@entry=5, m=***@entry=22, ucred=***@entry=0x4d874, tv=***@entry=0x0,
label=***@entry=0x0, label_len=***@entry=0,
unit_id=***@entry=0x4cda0 "sysmobts.service")
at src/journal/journald-server.c:551

Breakpoint 1, _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>,
bytes=***@entry=100) at malloc.c:3241
3241 in malloc.c
#0 _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>, bytes=***@entry=100)
at malloc.c:3241
#1 0x4dce2fdc in __GI___libc_malloc (bytes=***@entry=100) at malloc.c:2859
#2 0x4dd5ae90 in __GI___vasprintf_chk (
result_ptr=***@entry=0xbefc06bc, flags=***@entry=1,
format=***@entry=0x31ed0 "/proc/%lu/comm", args=..., ***@entry=...)
at vasprintf_chk.c:48
#3 0x4dd5ae54 in __asprintf_chk (result_ptr=***@entry=0xbefc06bc,
flags=***@entry=1, format=0x31ed0 "/proc/%lu/comm") at asprintf_chk.c:33
#4 0x0001d36c in asprintf (__fmt=0x31ed0 "/proc/%lu/comm", __ptr=0xbefc06bc)
at /home/ich/source/openembedded/poky/build/tmp/sysroots/sysmobts-
v2/usr/include/bits/stdio2.h:178
#5 get_process_comm (pid=<optimized out>, name=***@entry=0xbefc0728)


Breakpoint 1, _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>,
bytes=***@entry=100) at malloc.c:3241
3241 in malloc.c
#0 _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>, bytes=***@entry=100)
at malloc.c:3241
#1 0x4dce2fdc in __GI___libc_malloc (bytes=***@entry=100) at malloc.c:2859
#2 0x0001b03c in malloc_multiply (b=<optimized out>, a=<optimized out>)
at src/shared/util.h:533
#3 readlink_malloc (p=0x4cc88 "/proc/3575/exe", r=***@entry=0xbefc0728)
at src/shared/util.c:1148
#4 0x0001b158 in get_process_exe (pid=<optimized out>,
name=***@entry=0xbefc0728) at src/shared/util.c:1037

#0 _int_malloc (av=***@entry=0x4ddae8e4 <main_arena>, bytes=***@entry=19)
at malloc.c:3241
#1 0x4dce2fdc in __GI___libc_malloc (bytes=***@entry=19) at malloc.c:2859
---Type <return> to continue, or q <return> to quit---
#2 0x4dd5af68 in __GI___vasprintf_chk (
result_ptr=***@entry=0xbefc06a4, flags=***@entry=1,
format=***@entry=0x31678 "/proc/%lu/cmdline", args=***@entry=...)
at vasprintf_chk.c:80
#3 0x4dd5ae54 in __asprintf_chk (result_ptr=***@entry=0xbefc06a4,
flags=***@entry=1, format=0x31678 "/proc/%lu/cmdline")
at asprintf_chk.c:33
#4 0x0001d460 in asprintf (__fmt=0x31678 "/proc/%lu/cmdline",
__ptr=0xbefc06a4)
at /home/ich/source/openembedded/poky/build/tmp/sysroots/sysmobts-
v2/usr/include/bits/stdio2.h:178
#5 get_process_cmdline (pid=3575, max_length=***@entry=2048,
comm_fallback=***@entry=false, line=***@entry=0xbefc0728)
at src/shared/util.c:921
Cristian Rodríguez
2013-02-17 18:31:45 UTC
Permalink
Post by Holger Freyther
So where does _int_malloc come from? I have used gdb to 'sample'
it by hand. So the answer is from a lot of places. Do you consider
cutting back on how you dynamically allocate strings? E.g. stop
using fopen, fgets, only have one dynamically tmp string for the
various format routines one is using sequentially?
Why should systemd stop using standard IO functions to please some
obscure, out of the ordinary need ?

If libc behaves in a way you dont like or want, you will have to send
your complains to libc developers..
Holger Freyther
2013-02-18 08:00:37 UTC
Permalink
Cristian Rodríguez <crrodriguez <at> opensuse.org> writes:

Good Morning Cristian,
Post by Cristian Rodríguez
Why should systemd stop using standard IO functions to please some
obscure, out of the ordinary need?
Could you please clarify obscure? So any embedded device that prints
a couple of log messages on start of an application is obscure in your
opinion? It is obvious from the perf output I posted that journald's
performance is determined by the malloc operations. If the maintainers
think that the main purpose of journald is to allocate memory then I am
happy to continue to use the busybox syslogd.
Post by Cristian Rodríguez
If libc behaves in a way you dont like or want, you will have to send
your complains to libc developers..
Well, there are various things a libc implementation needs to do that
should not matter to systemd (e.g. thread safety, io cancellation,
re-entrancy, locking, ...).
Lennart Poettering
2013-02-19 02:27:02 UTC
Permalink
Post by Holger Freyther
Good Morning Cristian,
Post by Cristian Rodríguez
Why should systemd stop using standard IO functions to please some
obscure, out of the ordinary need?
Could you please clarify obscure? So any embedded device that prints
a couple of log messages on start of an application is obscure in your
opinion? It is obvious from the perf output I posted that journald's
Well, to be fair: if an app just prints a "couple of log messages", then
the functions you point out should hardly matter... Optimize inner
loops, not just the stuff that happens to be called a "couple of"
times...
Post by Holger Freyther
performance is determined by the malloc operations. If the maintainers
think that the main purpose of journald is to allocate memory then I am
happy to continue to use the busybox syslogd.
No, the purpose of journald is not to allocate memory. It's primary
purpose is actually to collect sarcastic comments by people. We thrive
on that...
Post by Holger Freyther
Post by Cristian Rodríguez
If libc behaves in a way you dont like or want, you will have to send
your complains to libc developers..
Well, there are various things a libc implementation needs to do that
should not matter to systemd (e.g. thread safety, io cancellation,
re-entrancy, locking, ...).
Holger Hans Peter Freyther
2013-02-19 18:21:17 UTC
Permalink
Post by Lennart Poettering
Well, to be fair: if an app just prints a "couple of log messages", then
the functions you point out should hardly matter... Optimize inner
loops, not just the stuff that happens to be called a "couple of"
times...
I agree, with a "couple of log messages" journald shouldn't be anywhere
the CPU usage it is. Now my instruments are primitive, 'cross' perf
appears to be not implemented yet (another thing to put in the backlog) but
with what I have the _int_malloc sounds plausible. I spent years optimizing
WebKit and getting rid off a malloc was always a measurable improvement.
Post by Lennart Poettering
No, the purpose of journald is not to allocate memory. It's primary
purpose is actually to collect sarcastic comments by people. We thrive
on that...
And it is called journald so you can keep track of the comments? ;)
Post by Lennart Poettering
Does this mean I will now optimize them all away for you? No, not
really, I don't even have the appropriate hardware to profile this. Does
this mean I will merge good patches that optimize this? Hell, yes!
It works both ways. If I'm going to spend a significant amount of my time
to make journald usable on embedded hardware (right now my product doesn't
really need it) I want to have some kind of commitment that while I cut back
on dynamic allocations not a ton of new ones get introduced at the same time.

Just to be clear, when I talk about cutting back dynamic allocations I don't
mean introducing char foo[SHOULD_BE_BIG_ENOUGH_BUT_IS_NOT] but to be a bit
more clever about the usage. E.g. have a 'scratch' string, avoid reallocs
in steps of one, etc.

cheers
holger

PS: I thought Nokia hooked you up with a N900 or such?
Lennart Poettering
2013-02-20 01:46:35 UTC
Permalink
Post by Holger Hans Peter Freyther
Post by Lennart Poettering
Does this mean I will now optimize them all away for you? No, not
really, I don't even have the appropriate hardware to profile this. Does
this mean I will merge good patches that optimize this? Hell, yes!
It works both ways. If I'm going to spend a significant amount of my time
to make journald usable on embedded hardware (right now my product doesn't
really need it) I want to have some kind of commitment that while I cut back
on dynamic allocations not a ton of new ones get introduced at the same time.
Well, I am not going to give you a blanket promise that I will merge any
patch you send us, but yes, we will consider every patch and merge the
good stuff, and we will try to avoid making things worse again. I mean,
the code in question otherwise is not really code to refactor anyway in
my eyes, so I think there's not much risk to fuck this up again
anyways...

Also, adding a comment to dispatch_message_real() saying:

/* Hey, this code is run for each log line, consider it an inner
loop function, so minimize memory allocations and sprintf()
invocations please, thxgoodbye! */

should make sure enough that people changing the code keep in mind that
the call is performance-sensitive.
Post by Holger Hans Peter Freyther
Just to be clear, when I talk about cutting back dynamic allocations I don't
mean introducing char foo[SHOULD_BE_BIG_ENOUGH_BUT_IS_NOT] but to be a bit
more clever about the usage. E.g. have a 'scratch' string, avoid reallocs
in steps of one, etc.
Wel, keep it readable, that's most important...
Post by Holger Hans Peter Freyther
PS: I thought Nokia hooked you up with a N900 or such?
Yeah, and I sold it on ebay a while ago... ;-)

Lennart
--
Lennart Poettering - Red Hat, Inc.
David Strauss
2013-02-18 16:47:44 UTC
Permalink
Post by Holger Freyther
Do you consider
cutting back on how you dynamically allocate strings? E.g. stop
using fopen, fgets, only have one dynamically tmp string for the
various format routines one is using sequentially?
It's possible that a lot of it could also allocate on the stack or use
stack-style management of a pool in heap. This is pretty
performance-critical code, and we're seeing similar CPU overhead.
--
David Strauss
| ***@davidstrauss.net
| +1 512 577 5827 [mobile]
Holger Hans Peter Freyther
2013-02-18 18:52:39 UTC
Permalink
On Mon, Feb 18, 2013 at 08:47:44AM -0800, David Strauss wrote:

Dear David, all,
Post by David Strauss
It's possible that a lot of it could also allocate on the stack or use
stack-style management of a pool in heap. This is pretty
performance-critical code, and we're seeing similar CPU overhead.
I think upstream doesn't care about the CPU overhead of the journald. I
had a quick look on my notebook with the same perf command and _int_malloc
is noticable in the trace as well.

In OpenBSC/Osmocom we have our struct msgb (a clone of struct skb) and
we can pull and push from the allocated memory. E.g. to replace a LayerX
header or prepend another header.


I browsed through journald-server.c:dispatch_message_real and there are
several things one could do... there are certainly a lot more items.


1.) Several things are done over and over again... e.g. calls to the
have_effective_cap.. it is unlikely that it will change after the journald
has started.


2.) Avoid allocation part one...

r = get_process_cmdline(ucred->pid, 0, false, &t);
if (r >= 0) {
cmdline = strappend("_CMDLINE=", t);
free(t);

if (cmdline)
IOVEC_SET_STRING(iovec[n++], cmdline);
}

So instead of the strappend (which will do strdup..) one could make sure
that 't' has enough space to prepend the _CMDLINE= or the _EXE, or the
other strings.


3.) Looking at code like... get_process_cmdline

asprintf(&p, "/proc/%lu/cmdline", (unsigned long) pid)


So first of all /proc/%lu/ is created over and over again for exe, comm,
cmdline, sessionid, cgroup... With a different API one could easily avoid
this.

4.) Still at the same method..

if (max_length == 0) {
size_t len = 1;
while ((c = getc(f)) != EOF) {
k = realloc(r, len+1);
if (k == NULL) {

is most likely not the most effective way to handle the allocation. Or
maybe call get_process_cmdline with a max length?


5.) I wonder if some of the information could be sent from the systemd
to avoid the work in the journald...



I will end up commenting out most of dispatch_message_real and check if
the CPU down drops to a low single digit. I would assume it will.


regards
holger
Colin Guthrie
2013-02-18 22:50:17 UTC
Permalink
'Twas brillig, and Holger Hans Peter Freyther at 18/02/13 18:52 did gyre
Post by Holger Hans Peter Freyther
Dear David, all,
Post by David Strauss
It's possible that a lot of it could also allocate on the stack or use
stack-style management of a pool in heap. This is pretty
performance-critical code, and we're seeing similar CPU overhead.
I think upstream doesn't care about the CPU overhead of the journald.
I don't really think that's true. It certainly hasn't been optimized
heavily yet, and more concentration is definitely going into features at
present, but that's not to say that performance isn't a goal in the
completeness of time. Also there will be lots of people for whom this is
more of a priority right now and thus they will be the ones perhaps
contributing the patches to help edge things in that direction :)

Col
--
Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
Tribalogic Limited http://www.tribalogic.net/
Open Source:
Mageia Contributor http://www.mageia.org/
PulseAudio Hacker http://www.pulseaudio.org/
Trac Hacker http://trac.edgewall.org/
Kok, Auke-jan H
2013-02-19 02:07:41 UTC
Permalink
Post by Colin Guthrie
'Twas brillig, and Holger Hans Peter Freyther at 18/02/13 18:52 did gyre
Post by Holger Hans Peter Freyther
Dear David, all,
Post by David Strauss
It's possible that a lot of it could also allocate on the stack or use
stack-style management of a pool in heap. This is pretty
performance-critical code, and we're seeing similar CPU overhead.
I think upstream doesn't care about the CPU overhead of the journald.
I don't really think that's true. It certainly hasn't been optimized
heavily yet, and more concentration is definitely going into features at
present, but that's not to say that performance isn't a goal in the
completeness of time. Also there will be lots of people for whom this is
more of a priority right now and thus they will be the ones perhaps
contributing the patches to help edge things in that direction :)
It's certainly been on my radar already as a problem for performance
numbers... I'd like to see if we can fix this.

Auke
Lennart Poettering
2013-02-19 02:08:13 UTC
Permalink
Post by Holger Hans Peter Freyther
Dear David, all,
Post by David Strauss
It's possible that a lot of it could also allocate on the stack or use
stack-style management of a pool in heap. This is pretty
performance-critical code, and we're seeing similar CPU overhead.
I think upstream doesn't care about the CPU overhead of the
journald.
I can assure you that that's definitely not true. We do care about
performance.

That said, the journal is relatively new and obviously not optimized in
all details for all machines, and we knowingly and frequently pick the
more readable codepath rather than the faster one because readable and
manageable code is important to us.

You need to understand that on x86 journald (and systemd) are
virtually "free" CPU-wise, these days. We develop on x86. And hence
optimizing things was never the most important issue for us, because the
codepaths never showed up badly anywhere anyway. That we develop on x86
doesn't mean we wouldn't care for embedded CPUs. We do, and we have
optimized some things in this area already, but nobody yet gave this
love to the journald component of systemd. I mean, udev's entire history
is more or less one of optimizations. And in systemd you find quite a
few changes that are about optimization and nothing else.

So, if you found something that isn't as fast as it should be and shows
up in profiling, then we really should optimize that, and am happy to
look into this. However, as I don't even possess any machine with a CPU
where these issues show up I am not sure I will be able to do this all
by myself...

For many string operations that might end up in inner loop we try to use
alloca() these days, however for some that's really hard to get right,
for example for sprintf() related loops...
Post by Holger Hans Peter Freyther
I browsed through journald-server.c:dispatch_message_real and there are
several things one could do... there are certainly a lot more items.
We should really base this all on profiling data. Without that premature
optimizations will just make the code unreadable.
Post by Holger Hans Peter Freyther
1.) Several things are done over and over again... e.g. calls to the
have_effective_cap.. it is unlikely that it will change after the journald
has started.
Does this really show up negatively in profiling runs?
Post by Holger Hans Peter Freyther
2.) Avoid allocation part one...
r = get_process_cmdline(ucred->pid, 0, false, &t);
if (r >= 0) {
cmdline = strappend("_CMDLINE=", t);
free(t);
if (cmdline)
IOVEC_SET_STRING(iovec[n++], cmdline);
}
So instead of the strappend (which will do strdup..) one could make sure
that 't' has enough space to prepend the _CMDLINE= or the _EXE, or the
other strings.
If this is indeed something that shows up negative in profiling we could
just extend get_process_cmdline() with a prefix char* arg, that if
non-NULL is prepended implicitly to the returned string.
Post by Holger Hans Peter Freyther
3.) Looking at code like... get_process_cmdline
asprintf(&p, "/proc/%lu/cmdline", (unsigned long) pid)
So first of all /proc/%lu/ is created over and over again for exe, comm,
cmdline, sessionid, cgroup... With a different API one could easily avoid
this.
It would probably be best to just use strings on the stack for this anyway...
Post by Holger Hans Peter Freyther
4.) Still at the same method..
if (max_length == 0) {
size_t len = 1;
while ((c = getc(f)) != EOF) {
k = realloc(r, len+1);
if (k == NULL) {
is most likely not the most effective way to handle the allocation. Or
maybe call get_process_cmdline with a max length?
Actually, if you look in the glibc sources you will find that it does
some exponentional size increasing anyway internally of realloc(). After
realizing this we recently droppped a couple of our own exponential
realloc() loops in favour of the simpler to read linear realloc(foo,
len+1) loops. I'd be really surprised if this would show up in any
profiling. If it did glibc's own exponential increas code would not
work...
Post by Holger Hans Peter Freyther
5.) I wonder if some of the information could be sent from the systemd
to avoid the work in the journald...
So it's a bit of a security thing too: the journal distuingishes
"trusted" fields from "untrusted" ones. The former are collected by
journald and read directly from the kernel and can thus not be
faked. The latter are passed in from the clients, and can thus be faked
data. The "trusted" fields are prefixed with an "_", and clients can
never send fields prefixed with "_".
Post by Holger Hans Peter Freyther
I will end up commenting out most of dispatch_message_real and check if
the CPU down drops to a low single digit. I would assume it will.
This would certainly be interesting.

Lennart
--
Lennart Poettering - Red Hat, Inc.
Lennart Poettering
2013-02-19 02:15:53 UTC
Permalink
Post by Holger Freyther
So where does _int_malloc come from? I have used gdb to 'sample'
it by hand. So the answer is from a lot of places. Do you consider
cutting back on how you dynamically allocate strings? E.g. stop
using fopen, fgets, only have one dynamically tmp string for the
various format routines one is using sequentially?
Well this depends. Dynamically allocating strings makes code generally
more readable and just sidesteps the whole "how long should I make this
buffer" issue. As long as these allocations never show up in inner loops
this is a non-issue.

There are many cases where we can easily get rid of dynamic string
allocation. For example, in the various calls that access
/proc/$PID/foobar, we can certainly generate that path string on the
stack, since the PID number is finite.

So, the answer is: it depends on the case. If something is used in inner
loops an doesn't fuck up readability entirely I am more than happy to
merge a patch. In many cases asprintf() can already be replaced by
strjoin() and strappend() which actually is already a major
speed-up. Moving to strings allocated on stack in many cases where the
string is known to have a reasonable max size anyway is good too (but
please, don't just blindly allocate "char buf[256]" everywhere...).

Blindly replacing all uses of asprintf() however, nah, not sure I like
that too much... It does make things a lot more readable.
Post by Holger Freyther
3241 malloc.c: No such file or directory.
(gdb) bt
at malloc.c:3241
at strdup.c:42
#3 0x4dcfcdd4 in tzset_internal (always=<optimized out>,
Please note that this one is a glibc internal issue. It's the
implementation of localtime().

Lennart
--
Lennart Poettering - Red Hat, Inc.
Loading...