Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Discussion:

Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

(too old to reply)

jim_p

2018-10-28 05:40:01 UTC

Package: openssh-server
Version: 1:7.9p1-1
Severity: normal

Dear Maintainer,

After yesterdays upgrade to 7.9p1 in testing, openssh-server delays my system's
boot by quite some time.
In fact, it is always the first and slowest to start process in systemd-analyze
blame, with times varying from < 20 seconds most of the time

19.044s ssh.service
2.083s ifupdown-wait-online.service

to this insane 100+ second delay I found out yesterday

1min 24.779s ssh.service
1.120s ifupdown-wait-online.service

Even those 20 seconds are a lot, because due to its delay, my boot time (as
mentioned by systemd-analyze) increased from 5-7 seconds to 20+. The os is on
an ssd drive obviously.

If it helps, I have ipv6 disabled system-wide with this kernel parameter:
ipv6.disable=1.

-- System Information:
Debian Release: buster/sid
APT prefers testing
APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 4.18.0-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages openssh-server depends on:
ii adduser 3.118
ii debconf [debconf-2.0] 1.5.69
ii dpkg 1.19.2
ii libaudit1 1:2.8.4-2
ii libc6 2.27-6
ii libcom-err2 1.44.4-2
ii libgssapi-krb5-2 1.16.1-1
ii libkrb5-3 1.16.1-1
ii libpam-modules 1.1.8-3.8
ii libpam-runtime 1.1.8-3.8
ii libpam0g 1.1.8-3.8
ii libselinux1 2.8-1+b1
ii libssl1.1 1.1.1-1
ii libsystemd0 239-10
ii libwrap0 7.6.q-27
ii lsb-base 9.20170808
ii openssh-client 1:7.9p1-1
ii openssh-sftp-server 1:7.9p1-1
ii procps 2:3.3.15-2
ii ucf 3.0038
ii zlib1g 1:1.2.11.dfsg-1

Versions of packages openssh-server recommends:
ii libpam-systemd 239-10
ii ncurses-term 6.1+20181013-1
ii xauth 1:1.0.10-1

Versions of packages openssh-server suggests:
pn molly-guard <none>
pn monkeysphere <none>
pn rssh <none>
pn ssh-askpass <none>
pn ufw <none>

-- debconf information:
openssh-server/password-authentication: true
openssh-server/permit-root-login: true

Bernhard Übelacker

2018-10-28 10:50:02 UTC

Dear Maintainer,
just some more information, because I think I see this
difference in my qemu buster amd64 VM too.

Before I could ssh into that machine just after some seconds.

Now it takes some time until that line "random: crng init done"
appears in dmesg.
With logging in in the qemu window this line appears just after a
few seconds, when just trying via ssh it takes much longer.

I tried to find out where it blocks exactly and came to that location:

#0 0x00007fb515b1803a in getentropy (buffer=0x56285633d440, length=***@entry=32) at ../sysdeps/unix/sysv/linux/getentropy.c:45
#1 0x00007fb5161e3603 in syscall_random (buflen=32, buf=<optimized out>) at ../crypto/rand/rand_unix.c:277
#2 rand_pool_acquire_entropy (pool=***@entry=0x5628563394e0) at ../crypto/rand/rand_unix.c:469
#3 0x00007fb5161e2d8d in rand_drbg_get_entropy (drbg=0x562856339e80, pout=0x7ffd1c2bce60, entropy=<optimized out>, min_len=<optimized out>, max_len=<optimized out>, prediction_resistance=0) at ../crypt$
#4 0x00007fb5161e11b2 in RAND_DRBG_instantiate (drbg=***@entry=0x562856339e80, pers=***@entry=0x7fb516289d20 <ossl_pers_string> "OpenSSL NIST SP 800-90A DRBG", perslen=***@entry=28) at ../crypto/$
#5 0x00007fb5161e21a8 in drbg_setup (parent=***@entry=0x0) at ../crypto/rand/drbg_lib.c:870
#6 0x00007fb5161e222f in do_rand_drbg_init () at ../crypto/rand/drbg_lib.c:899
#7 do_rand_drbg_init_ossl_ () at ../crypto/rand/drbg_lib.c:884
#8 0x00007fb5150c9827 in __pthread_once_slow (once_control=0x7fb5163118f8 <rand_drbg_init>, init_routine=0x7fb5161e21d0 <do_rand_drbg_init_ossl_>) at pthread_once.c:116
#9 0x00007fb5150c98e5 in __GI___pthread_once (once_control=***@entry=0x7fb5163118f8 <rand_drbg_init>, init_routine=***@entry=0x7fb5161e21d0 <do_rand_drbg_init_ossl_>) at pthread_once.$
#10 0x00007fb516221329 in CRYPTO_THREAD_run_once (once=***@entry=0x7fb5163118f8 <rand_drbg_init>, init=***@entry=0x7fb5161e21d0 <do_rand_drbg_init_ossl_>) at ../crypto/threads_pthread.c:113
#11 0x00007fb5161e2327 in RAND_DRBG_get0_master () at ../crypto/rand/drbg_lib.c:1010
#12 0x00007fb5161e235d in drbg_status () at ../crypto/rand/drbg_lib.c:992
#13 0x00005628556a253f in seed_rng () at ../../entropy.c:238
#14 0x000056285564b13c in main (ac=2, av=0x56285631b970) at ../../sshd.c:1696

Most of the stack is inside /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
e.g libssl1.1 that got also a new upload at that time.

Attached file describes how I retrieved that stack and some more logging.
(For my qemu machine a workaround would just be
to add "-device virtio-rng-pci", to have proper randomness.)

Kind regards,
Bernhard

Colin Watson

2018-10-28 18:50:01 UTC

Post by Bernhard Ãbelacker
Now it takes some time until that line "random: crng init done"
appears in dmesg.
With logging in in the qemu window this line appears just after a
few seconds, when just trying via ssh it takes much longer.
#1 0x00007fb5161e3603 in syscall_random (buflen=32, buf=<optimized out>) at ../crypto/rand/rand_unix.c:277
#3 0x00007fb5161e2d8d in rand_drbg_get_entropy (drbg=0x562856339e80, pout=0x7ffd1c2bce60, entropy=<optimized out>, min_len=<optimized out>, max_len=<optimized out>, prediction_resistance=0) at ../crypt$
#6 0x00007fb5161e222f in do_rand_drbg_init () at ../crypto/rand/drbg_lib.c:899
#7 do_rand_drbg_init_ossl_ () at ../crypto/rand/drbg_lib.c:884
#8 0x00007fb5150c9827 in __pthread_once_slow (once_control=0x7fb5163118f8 <rand_drbg_init>, init_routine=0x7fb5161e21d0 <do_rand_drbg_init_ossl_>) at pthread_once.c:116
#11 0x00007fb5161e2327 in RAND_DRBG_get0_master () at ../crypto/rand/drbg_lib.c:1010
#12 0x00007fb5161e235d in drbg_status () at ../crypto/rand/drbg_lib.c:992
#13 0x00005628556a253f in seed_rng () at ../../entropy.c:238
#14 0x000056285564b13c in main (ac=2, av=0x56285631b970) at ../../sshd.c:1696
Most of the stack is inside /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
e.g libssl1.1 that got also a new upload at that time.

Thanks for the investigation. (Note also that the OpenSSH version in
question is the one that switched from OpenSSL 1.0 to 1.1, which was a
big change.)

There were some significant changes in this area in OpenSSL 1.1.1.
Would it be possible to try running OpenSSH with OpenSSL 1.1.0h to see
if that makes a difference? Unfortunately this is a little complicated
as it will require doing a local build of the Debian OpenSSH source
package in order to reduce the dependency; let me know if you need help
with setting this up.

--
Colin Watson [***@debian.org]

Bernhard Übelacker

2018-10-29 00:00:01 UTC

Post by Colin Watson
Thanks for the investigation. (Note also that the OpenSSH version in
question is the one that switched from OpenSSL 1.0 to 1.1, which was a
big change.)
There were some significant changes in this area in OpenSSL 1.1.1.
Would it be possible to try running OpenSSH with OpenSSL 1.1.0h to see
if that makes a difference? Unfortunately this is a little complicated
as it will require doing a local build of the Debian OpenSSH source
package in order to reduce the dependency; let me know if you need help
with setting this up.

Hello Colin Watson,
I built a local package OpenSSH 7.9p1-1 against OpenSSL 1.1.0h like
described in the upper half of attached file.
This shows an normal start of the ssh service and login is
immediately after a restart possible,
running on linux-image-4.18.0-2-amd64 4.18.10-2.

Because in another bug suggested to test the previous kernel
with a similar issue with the login manager (that I cannot find right
now), I reverted back to regular OpenSSH 7.9p1-1 with OpenSSL 1.1.1-1
and it shows the same delay when runnging with kernel
linux-image-4.17.0-3-amd64 4.17.17-1.

Just found the possibly somehow related bug #910504, that proposes
the installation of rng-tools - but this just fails to start because
of "Cannot find a hardware RNG device to use.", with OpenSSH 7.9p1-1
with OpenSSL 1.1.1-1 at linux-image-4.18.0-2-amd64 4.18.10-2.

Kind regards,
Bernhard

Colin Watson

2018-10-29 00:40:01 UTC

Control: reassign -1 openssl 1.1.1-1
Control: affects -1 openssh

Post by Bernhard Ãbelacker

Post by Colin Watson
Thanks for the investigation. (Note also that the OpenSSH version in
question is the one that switched from OpenSSL 1.0 to 1.1, which was a
big change.)
There were some significant changes in this area in OpenSSL 1.1.1.
Would it be possible to try running OpenSSH with OpenSSL 1.1.0h to see
if that makes a difference? Unfortunately this is a little complicated
as it will require doing a local build of the Debian OpenSSH source
package in order to reduce the dependency; let me know if you need help
with setting this up.

Hello Colin Watson,
I built a local package OpenSSH 7.9p1-1 against OpenSSL 1.1.0h like
described in the upper half of attached file.
This shows an normal start of the ssh service and login is
immediately after a restart possible,
running on linux-image-4.18.0-2-amd64 4.18.10-2.
Because in another bug suggested to test the previous kernel
with a similar issue with the login manager (that I cannot find right
now), I reverted back to regular OpenSSH 7.9p1-1 with OpenSSL 1.1.1-1
and it shows the same delay when runnging with kernel
linux-image-4.17.0-3-amd64 4.17.17-1.
Just found the possibly somehow related bug #910504, that proposes
the installation of rng-tools - but this just fails to start because
of "Cannot find a hardware RNG device to use.", with OpenSSH 7.9p1-1
with OpenSSL 1.1.1-1 at linux-image-4.18.0-2-amd64 4.18.10-2.

Thanks for testing; it does look as though this problem is mainly due to
the changes to the OpenSSL random generator in 1.1.1, and the appearance
of it being an OpenSSH regression was caused by OpenSSH switching to
OpenSSL 1.1.x at around the same time.

Reassigning to OpenSSL - could the OpenSSL maintainers please have a
look and advise what's best to do? (See the start of the bug, reporting
a delay of more than one minute in system boot in some cases, mainly
waiting for sshd to start.)

Thanks,

--
Colin Watson [***@debian.org]

Kurt Roeckx

2018-10-29 07:40:02 UTC

Post by Colin Watson
Reassigning to OpenSSL - could the OpenSSL maintainers please have a
look and advise what's best to do? (See the start of the bug, reporting
a delay of more than one minute in system boot in some cases, mainly
waiting for sshd to start.)

The biggest change related to this is that we know use
getrandom()/getentropy() on kernels that have it, so kernels

Post by Colin Watson
= 3.17. And the kernel using that interface doesn't return random

numbers until it has been initialized.

Something should be initializing the kernel with random data from
the previous boot. This used to be done by /etc/init.d/urandom,
but I'm not sure if that's still used. This should be done as
early as possible during the boot not to cause such problems. You
should look into when during the boot process the kernel gets this
random data.

Kurt

Kurt Roeckx

2018-10-29 17:30:01 UTC

reassign 912087 openssh-server,systemd
thanks

Post by Kurt Roeckx

Post by Colin Watson
Reassigning to OpenSSL - could the OpenSSL maintainers please have a
look and advise what's best to do? (See the start of the bug, reporting
a delay of more than one minute in system boot in some cases, mainly
waiting for sshd to start.)

The biggest change related to this is that we know use
getrandom()/getentropy() on kernels that have it, so kernels

Post by Colin Watson
= 3.17. And the kernel using that interface doesn't return random

numbers until it has been initialized.
Something should be initializing the kernel with random data from
the previous boot. This used to be done by /etc/init.d/urandom,
but I'm not sure if that's still used. This should be done as
early as possible during the boot not to cause such problems. You
should look into when during the boot process the kernel gets this
random data.

So I believe this is not an openssl issue, but something in the
order that the kernel's RNG is initialized and openssh is started.
Potentionally the RNG isn't initialized at all and you actually
have to wait for the kernel to get it's random data from the slow
way.

So I'm reassigning this to systemd and openssh-server, I have no
idea where the problem really is.

Kurt

Sebastian Andrzej Siewior

2018-10-29 21:10:02 UTC

Post by Kurt Roeckx
So I believe this is not an openssl issue, but something in the
order that the kernel's RNG is initialized and openssh is started.
Potentionally the RNG isn't initialized at all and you actually
have to wait for the kernel to get it's random data from the slow
way.
So I'm reassigning this to systemd and openssh-server, I have no
idea where the problem really is.

I see it, too. So during boot someone invokes "sshd -t" which invokes
getrandom(, 32, 0)
and this blocks. I get a login prompt and everything but sshd is started
"late" due to this.

Post by Kurt Roeckx
Kurt

Sebastian

Kurt Roeckx

2018-10-29 22:40:02 UTC

Post by Sebastian Andrzej Siewior

Post by Kurt Roeckx
So I believe this is not an openssl issue, but something in the
order that the kernel's RNG is initialized and openssh is started.
Potentionally the RNG isn't initialized at all and you actually
have to wait for the kernel to get it's random data from the slow
way.
So I'm reassigning this to systemd and openssh-server, I have no
idea where the problem really is.

I see it, too. So during boot someone invokes "sshd -t" which invokes

That's:
ExecStartPre=/usr/sbin/sshd -t

Post by Sebastian Andrzej Siewior
getrandom(, 32, 0)
and this blocks.

And did systemd-random-seed.service get run before that?

Kurt

Sebastian Andrzej Siewior

2018-10-30 00:30:01 UTC

Post by Kurt Roeckx

Post by Sebastian Andrzej Siewior

Post by Kurt Roeckx
So I believe this is not an openssl issue, but something in the
order that the kernel's RNG is initialized and openssh is started.
Potentionally the RNG isn't initialized at all and you actually
have to wait for the kernel to get it's random data from the slow
way.
So I'm reassigning this to systemd and openssh-server, I have no
idea where the problem really is.

I see it, too. So during boot someone invokes "sshd -t" which invokes

ExecStartPre=/usr/sbin/sshd -t

Post by Sebastian Andrzej Siewior
getrandom(, 32, 0)
and this blocks.

And did systemd-random-seed.service get run before that?

Yes, but it does not matter from what I can see in the code. On my
system this writes 512 to /dev/urandom at timestamp 11.670639. But sshd
does this:

sshd-2638 [004] ....... 22.445819: __x64_sys_getrandom: 1| 32 0
sshd asks for 32 bytes (flags = 0)

sshd-2638 [004] ....... 22.445824: __x64_sys_getrandom: 2
-> crng_ready() is not true so we wait_for_random_bytes()

sshd-3164 [004] ....... 117.577454: __x64_sys_getrandom: 3
-> "crng init done", sshd's getrandom() resumed.

The problem is that the entropy is added but the entropy count is not
increased. So we wait.

Using ioctl(/dev/urandom, RNDADDENTROPY, ) instead writting to
/dev/urandom would do the trick. Or using RNDADDTOENTCNT to increment
the entropy count after it was written. Those two are documented in
random(4). Or RNDRESEEDCRNG could be used to force crng to be reseeded.
It does also the job, too.

Ted, is there any best practise what to do with the seed which as
extrected from /dev/urandom on system shutdown? Using RNDADDTOENTCNT to
speed up init or just write to back to urandom and issue RNDRESEEDCRNG?

Post by Kurt Roeckx
Kurt

Sebastian

Michael Biebl

2018-10-30 02:40:01 UTC

On Tue, 30 Oct 2018 01:18:08 +0100 Sebastian Andrzej Siewior

Post by Sebastian Andrzej Siewior
Using ioctl(/dev/urandom, RNDADDENTROPY, ) instead writting to
/dev/urandom would do the trick. Or using RNDADDTOENTCNT to increment
the entropy count after it was written. Those two are documented in
random(4). Or RNDRESEEDCRNG could be used to force crng to be reseeded.
It does also the job, too.
Ted, is there any best practise what to do with the seed which as
extrected from /dev/urandom on system shutdown? Using RNDADDTOENTCNT to
speed up init or just write to back to urandom and issue RNDRESEEDCRNG?

Sebastian, if you have more insight on this matter, please followup on
https://github.com/systemd/systemd/issues/4271

What systemd-random-seed currently does is more or less the same what
the old /etc/init.d/urandom init script did.

Michael

--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

Theodore Y. Ts'o

2018-10-30 14:20:01 UTC

Post by Sebastian Andrzej Siewior
Using ioctl(/dev/urandom, RNDADDENTROPY, ) instead writting to
/dev/urandom would do the trick. Or using RNDADDTOENTCNT to increment
the entropy count after it was written. Those two are documented in
random(4). Or RNDRESEEDCRNG could be used to force crng to be reseeded.
It does also the job, too.
Ted, is there any best practise what to do with the seed which as
extrected from /dev/urandom on system shutdown? Using RNDADDTOENTCNT to
speed up init or just write to back to urandom and issue RNDRESEEDCRNG?

The reason why writing to /dev/[u]random via something like:

cat /var/lib/random/seed > /dev/random

Dosn't bump the the entropy counter is because it's possible that an
attacker could read /var/lib/random/seed. Even if the seed file is
refreshed on shutdown, (a) the attacker could have read the file while
the system is down, or (b) the system could have crashed so the seed
file was not refreshed and the attacker could have read the file
before the crash.

If you are using a VM, if the host has virtio-rng, using a kernel that
has virtio-rng support will solve the problem. For qemu, this means
you can enable via something like this:

-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-rng-pci,rng=rng0

If you are using Google Compute Engine, I can't comment about future
product features, but I would encourage you to file a feature request
bug with Google requesting virtio-rng support ASAP.

On any VM (cloud or on-prem), since you have to trust the host
*anyway*, with v4.19, you can add random.trust_cpu=on to the boot
command-line, or build the kernel with CONFIG_RANDOM_TRUST_CPU.

For the Debian 4.18 kernel, this can be backported via commits
39a8883a2b98 and 9b25436662d5.

- Ted

Kurt Roeckx

2018-10-30 18:40:02 UTC

Post by Theodore Y. Ts'o

Post by Sebastian Andrzej Siewior
Using ioctl(/dev/urandom, RNDADDENTROPY, ) instead writting to
/dev/urandom would do the trick. Or using RNDADDTOENTCNT to increment
the entropy count after it was written. Those two are documented in
random(4). Or RNDRESEEDCRNG could be used to force crng to be reseeded.
It does also the job, too.
Ted, is there any best practise what to do with the seed which as
extrected from /dev/urandom on system shutdown? Using RNDADDTOENTCNT to
speed up init or just write to back to urandom and issue RNDRESEEDCRNG?

cat /var/lib/random/seed > /dev/random
Dosn't bump the the entropy counter is because it's possible that an
attacker could read /var/lib/random/seed. Even if the seed file is
refreshed on shutdown, (a) the attacker could have read the file while
the system is down, or (b) the system could have crashed so the seed
file was not refreshed and the attacker could have read the file
before the crash.

So are you saying that the /var/lib/random/seed is untrusted, and
should never be used, and we should always wait for fresh entropy?

Anyway, I think if an attacker somehow has access to that file,
you have much more serious problems.

Kurt

Theodore Y. Ts'o

2018-10-30 21:00:02 UTC

Post by Kurt Roeckx
So are you saying that the /var/lib/random/seed is untrusted, and
should never be used, and we should always wait for fresh entropy?
Anyway, I think if an attacker somehow has access to that file,
you have much more serious problems.

So it's complicated. It's not a binary trusted/untrusted sort of
thing. We should definitely use it, and the fact we have it saved us
(at least after the system is installed) when there is a kernel bug
such as CVE-2018-1108 where we screwed up and treated the DMI table as
100% random and counted it towards required 256 bits of entropy needed
to consider the CRNG to be fully initialized.

If the attacker has access to the file, whether or not it matters
really depends on how the rest of the system is put together. So for
example, if you have secure boot (via a secured bootloader and a
signed kernel), and the root file system is protected using dm-verity,
the fact that seed file might be compromisable by an external attacker
is bad, but it's not necessarily catastrophic. (This is essential the
situation for ChromeOS and modern Android handsets, BTW.)

OTOH, there are definitely scenarios where you are correct, and if the
attacker has access to the files, you probably are toast, and so
therefore relying on it makes sense. Whether or not you think that is
more or less safer than relying on RDRAND is going to be a judgement
call, and very much depends on your assumptions of the threat
environment.

(Suppose in the future the Chinese come up with a 100% chinese made
CPU, that has a RDRAND equivalent; the US military might not be
comfortable relying on that CPU or its RDRAND unit, but the Chinese
Military might be perfectly comfortable relying on it; what a
Debian-provided kernel should when we're trying to be a "Universal
Operating System" is a very interesting question --- and that's why
random.trust_cpu is a boot command line option.)

In any case, if Debian wants to ship a program which reads a seed file
and uses it to initialize the random pull assuming that it's
trustworthy via the RNDADDENTROPY ioctl, that's not an insane thing to
do. My recommendation would be to make it be configurable, however,
just as whether we trust RDRAND should be trusted (in isolation) to
initialize the CRNG.

The point is that everyone is going to have a different opinion about
what entropy source is fully trusted, by itself, to initialize the
kernel's CRNG. We should mix in everything; but what we should
consider as trustworthy enough to give entropy credit is going to vary
from one sysadmin/system designer/system security officer to another.

Personally, I'm comfortable to run my personal kernel with
CONFIG_RANDOM_TRUST_CPU. I'm not willing to impose my beliefs on the
all Linux users, however.

Cheers,

- Ted

Sebastian Andrzej Siewior

2018-10-31 12:00:02 UTC

Post by Theodore Y. Ts'o
So it's complicated. It's not a binary trusted/untrusted sort of
thing.

What about RNDRESEEDCRNG? Would it be reasonable to issue it after writing the seed as part of the boot process?

Post by Theodore Y. Ts'o
Cheers,
- Ted

--
Sebastian

Theodore Y. Ts'o

2018-10-31 22:50:01 UTC

Post by Sebastian Andrzej Siewior

Post by Theodore Y. Ts'o
So it's complicated. It's not a binary trusted/untrusted sort of
thing.

What about RNDRESEEDCRNG? Would it be reasonable to issue it after writing the seed as part of the boot process?

No, that's for debugging purposes only.

When there is sufficient entropy added (either through a hw_random
subsystem, or because RDRAND is trusted, or the RNDADDENTORPY ioctl),
the crng is automatically reseeded by credit_entropy_bits(). So it's
not needed to use RNDRESEEDCRNG.

- Ted

Sebastian Andrzej Siewior

2018-11-01 22:30:02 UTC

Post by Theodore Y. Ts'o

Post by Sebastian Andrzej Siewior

Post by Theodore Y. Ts'o
So it's complicated. It's not a binary trusted/untrusted sort of
thing.

What about RNDRESEEDCRNG? Would it be reasonable to issue it after writing the seed as part of the boot process?

No, that's for debugging purposes only.

Okay. I'm asking because it has been added to the kernel, marked stable
and the man page has not been updated. So it did not look like a
debugging interface :)

Post by Theodore Y. Ts'o
When there is sufficient entropy added (either through a hw_random
subsystem, or because RDRAND is trusted, or the RNDADDENTORPY ioctl),
the crng is automatically reseeded by credit_entropy_bits(). So it's
not needed to use RNDRESEEDCRNG.

Okay. So you wrote what can be done for a system with HW-RNG/kvm. On
bare metal with nothing fancy I have:
[ 3.544985] systemd[1]: systemd 239 running in system mode. (+PAM…
[ 10.363377] r8169 0000:05:00.0 eth0: link up
[ 41.966375] random: crng init done

which means I have to wait about half a minute until I can ssh into. And
there is no way to speed it up?
You did not oppose RNDADDTOENTCNT/RNDADDENTROPY but you wanted to make
it configureable and not default, correct?

Post by Theodore Y. Ts'o
- Ted

Sebastian

Theodore Y. Ts'o

2018-11-02 00:00:01 UTC

Post by Sebastian Andrzej Siewior
Okay. So you wrote what can be done for a system with HW-RNG/kvm. On
[ 3.544985] systemd[1]: systemd 239 running in system mode. (+PAM…
[ 10.363377] r8169 0000:05:00.0 eth0: link up
[ 41.966375] random: crng init done
which means I have to wait about half a minute until I can ssh into. And
there is no way to speed it up?

So that surprises me. Can you tell me more about the hardware? Is it
something like a Rasberry Pi? Or is it an x86 server or desktop? In
my experience for most x86 platforms this isn't an issue.

The main reason why I've talked about VM system is because this is
where it where most of the problems that people ahve reported to me.

Here's the problem: if we "speed it up" inappropriately, you're
risking the security of the ssh. If people who are making a print
server or Wifi Rounter who screw it up, they're the ones who are at
fault. (And this isn't hypothetical. See https://factorable.net)

So if I make a blanket recommendation, and it causes Debian to ship
some kind of default that causes Debian users to be insecure, I'm
going to be feel really bad. This is why I'm very cautious about what
I say. If you want to do whatever you want on your own system, hey
consulting adults can do whatever they want. :-)

Post by Sebastian Andrzej Siewior
You did not oppose RNDADDTOENTCNT/RNDADDENTROPY but you wanted to make
it configureable and not default, correct?

I'd want to see a full design doc, or a git repository, or set of
changes before I give it an unqualified endorsement, but there *are*
configurations where such a thing would be sane.

That's the problem with security recommendations. It's much like a
lawyer giving legal advice. They're very careful about doing that in
an unstructured circumstances. If it gets taken in the wrong way,
they could be legally liable and people might blame/sue them.

And then on top of that, there are the political considerations.
Suppose I told you, "just use RDRAND and be happy". Some people who
sure that RDRAND has been backdoored would claim that I'm in the
pocket of the NSA and/or Intel. That's why all I'm going to say is,
"I'm comfortable turning RDRAND on my own systems; you can do what you
want."

Cheers,

- Ted

P.S. Although if I were going to generate a high-value key, I *would*
plug in my handy-dandy Chaos Key[1] first. Keith gave a
presentation[2] about it at Debconf 16.

[1] https://keithp.com/blogs/chaoskey/
[2] https://debconf16.debconf.org/talks/94/

And certainly if you were doing something where you had millions of
dollars at risk, or where the EU might fine you into oblivion for
millions of Euros due to some privacy exposure of your users, I
certainly would recommend that you spend the $40 USD to get a Chaos
Key and just be *done* with it.

Kurt Roeckx

2018-11-02 00:30:01 UTC

Post by Theodore Y. Ts'o

Post by Sebastian Andrzej Siewior
Okay. So you wrote what can be done for a system with HW-RNG/kvm. On
[ 3.544985] systemd[1]: systemd 239 running in system mode. (+PAM…
[ 10.363377] r8169 0000:05:00.0 eth0: link up
[ 41.966375] random: crng init done
which means I have to wait about half a minute until I can ssh into. And
there is no way to speed it up?

So that surprises me. Can you tell me more about the hardware? Is it
something like a Rasberry Pi? Or is it an x86 server or desktop? In
my experience for most x86 platforms this isn't an issue.

The original poster had:
Architecture: amd64 (x86_64)
Kernel: Linux 4.18.0-2-amd64 (SMP w/2 CPU cores)

But I'm not sure if that's a real machine or some virtual host,
I'm going to guess it's a virtual host.

Anyway, on my laptop I get:
[ 12.675935] random: crng init done

If the TPM is enabled, I also have an /etc/hwrng, but rng-tools is
started later after the init is done.

On my desktop (with a chaos key attached)
[ 3.844484] random: crng init done
[ 5.312406] systemd[1]: systemd 239 running in system mode.

Kurt

Theodore Y. Ts'o

2018-11-02 02:20:01 UTC

Post by Kurt Roeckx
[ 12.675935] random: crng init done
If the TPM is enabled, I also have an /etc/hwrng, but rng-tools is
started later after the init is done.
On my desktop (with a chaos key attached)
[ 3.844484] random: crng init done
[ 5.312406] systemd[1]: systemd 239 running in system mode.

Starting with the 3.17 kernel, the kernel will automatically pull from
hardware random number generators without needing to install a user
space daemon, such as rng-tools. For most hardware devices, it is not
enabled by default, so you have to enable by adding something like
"rng_core.default_quality=700" to the kernel boot line.

There are *two* devices which are an exception to this rule. The
first is virtio_rng, since the assumption is if you are using a VM,
you had better trust the host infrastructure or you have much worse
problems. The second is the driver for the Chaos Key. That appears
to be because the author of the driver for the Chaos Key wasn't aware
of the general policy that hardware rng's shouldn't be trusted by
default, and the driver was coded violating that policy.

This is why (with a chaos key attached) you see the "crng init done"
message so early, *before* the root file system is mounted. (The root
file system gets mounted after the "systemd running in system mode"
message is logged.)

This is better than relying on rng-toonls, since we can initialize the
CRNG must earlier in the boot process. (It should have been the case
that this would only happen if you configured by setting the
rng_core.default_quality parameter, but see above about how the Chaos
Key driver is currently violating policy.)

In the future I should change the kernel so you can explicitly specify
something like tpm.rng_quality=500 and chaos_key.rng_quality=1000 on
the boot command line. That way the system administrator can be very
explicit about which hwrng they trust; right now what we have is not
ideal since it's not clear which hwrng the system administrator wanted
to configure as trusted, and if you have more than one hwnrg in the
system (say, a closed source, proprietary tpm, and an open hardware
Chaos Key) you can't say which one you want to have trusted.

Cheers,

- Ted

Sebastian Andrzej Siewior

2018-11-04 00:30:01 UTC

Post by Theodore Y. Ts'o

Post by Sebastian Andrzej Siewior
Okay. So you wrote what can be done for a system with HW-RNG/kvm. On
[ 3.544985] systemd[1]: systemd 239 running in system mode. (+PAM…
[ 10.363377] r8169 0000:05:00.0 eth0: link up
[ 41.966375] random: crng init done
which means I have to wait about half a minute until I can ssh into. And
there is no way to speed it up?

So that surprises me. Can you tell me more about the hardware? Is it
something like a Rasberry Pi? Or is it an x86 server or desktop? In
my experience for most x86 platforms this isn't an issue.

another boot on the same box:
| dmesg |grep -i random
| [ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x52e with crng_init=0
| [ 1.774332] random: fast init done
| [ 7.318640] random: systemd: uninitialized urandom read (16 bytes read)
| [ 7.318925] random: systemd: uninitialized urandom read (16 bytes read)
| [ 7.338074] random: systemd: uninitialized urandom read (16 bytes read)
| [ 68.791389] random: crng init done
| [ 68.791397] random: 7 urandom warning(s) missed due to ratelimiting

This is a headless i7-Sandy Bridge. A small rootfs partition and there
are hardly any daemons comming up. It waits for a remote login. Running
Debian unstable (incl. kernel).

Post by Theodore Y. Ts'o
The main reason why I've talked about VM system is because this is
where it where most of the problems that people ahve reported to me.

Yes. Thanks for that. I have another box which I use as a desktop
machine (basically a terminal). It is older than the i7 but I unlock the
crypted root disk as part of the boot process and I assume that due to
this it initializes in less than 10secs. Same goes for my notebook. But
the i7 has just two cables…

Post by Theodore Y. Ts'o
So if I make a blanket recommendation, and it causes Debian to ship
some kind of default that causes Debian users to be insecure, I'm
going to be feel really bad. This is why I'm very cautious about what
I say. If you want to do whatever you want on your own system, hey
consulting adults can do whatever they want. :-)

I have a few other headless boxes but those are newer and support
RDRAND. I assume that this makes a difference because otherwise I don't
see a difference (and they don't take long to init).

Post by Theodore Y. Ts'o

Post by Sebastian Andrzej Siewior
You did not oppose RNDADDTOENTCNT/RNDADDENTROPY but you wanted to make
it configureable and not default, correct?

I'd want to see a full design doc, or a git repository, or set of
changes before I give it an unqualified endorsement, but there *are*
configurations where such a thing would be sane.
That's the problem with security recommendations. It's much like a
lawyer giving legal advice. They're very careful about doing that in
an unstructured circumstances. If it gets taken in the wrong way,
they could be legally liable and people might blame/sue them.
And then on top of that, there are the political considerations.
Suppose I told you, "just use RDRAND and be happy". Some people who
sure that RDRAND has been backdoored would claim that I'm in the
pocket of the NSA and/or Intel. That's why all I'm going to say is,
"I'm comfortable turning RDRAND on my own systems; you can do what you
want."

Okay, okay. Let sum that up:
- openssh uses openssl's random number generator which now uses
getrandom().
- getrandom() blocks until the random pool is initializes. Can be
checked in dmesg:
[ TIME.STAMP] random: crng init done
This wasn't the case earlier where /dev/urandom was used.
- random entropy like interrupts or HW random support (<ad> chaos
key</ad>) will speed the initalisation process up.
- emulated hardware / KVM can take long to init but it helps if a hw-rnd
device is added as part of qemu setup.
- it is possible to manually increase the entropy count and/or tell the
random pool to init asap but it shouldn't be done because it will
probably lead to weak random pool and probably used in wrong setups.

Post by Theodore Y. Ts'o
Cheers,
- Ted
P.S…
I
certainly would recommend that you spend the $40 USD to get a Chaos
Key and just be *done* with it.

I do own a Nitrokey which can create random. That is not the problem. I
just have one devel box which requires me to wait a minute before I can
login and I have to figure out how to deal with it.

Sebastian

Uwe Kleine-König

2018-12-12 16:30:01 UTC

Post by jim_p
Dear Maintainer,
just some more information, because I think I see this
difference in my qemu buster amd64 VM too.
Before I could ssh into that machine just after some seconds.
Now it takes some time until that line "random: crng init done"
appears in dmesg.
With logging in in the qemu window this line appears just after a
few seconds, when just trying via ssh it takes much longer.

I have a NAS running Debian[1] and there I only get the "crng init done"
message after more than 15 minutes (even though there is network activity and I
debugged the failure via rs232):

***@machine:~# dmesg | tail
[ 22.579884] audit: type=1400 audit(1529665920.708:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=299 comm="apparmor_parser"
[ 22.627877] audit: type=1400 audit(1529665920.708:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=299 comm="apparmor_parser"
[ 22.711102] 8021q: 802.1Q VLAN Support v1.8
[ 22.840150] mvneta d0074000.ethernet eth1: PHY [d0072004.mdio-mii:01] driver [Marvell 88E1318S]
[ 22.856020] mvneta d0074000.ethernet eth1: configuring for phy/rgmii-id link mode
[ 22.872501] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 26.020202] mvneta d0074000.ethernet eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 26.027639] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 1057.771583] random: crng init done
[ 1057.774739] random: 7 urandom warning(s) missed due to ratelimiting

I'm not aware the machine has a random number generator, so the solutions
presented here up to now don't help me.

I think the idea to mention this in the release notes for buster is a
good one (unless a solution is found until then of course).

Best regards
Uwe

[1] this i an armv7/mvebu machine with a Armada 370 CPU

Uwe Kleine-König

2018-12-15 08:30:01 UTC

Hello,

Post by Uwe Kleine-KÃ¶nig
[ 1057.771583] random: crng init done
[ 1057.774739] random: 7 urandom warning(s) missed due to ratelimiting
I'm not aware the machine has a random number generator, so the solutions
presented here up to now don't help me.

For the record: After just powering on the machine and waiting for the
crng to be initialized I once saw the "crng init done" only after 1h40.

I now installed haveged and now the time is down to

[ 27.490557] random: crng init done

which is at least bearable.

Post by Uwe Kleine-KÃ¶nig
I think the idea to mention this in the release notes for buster is a
good one (unless a solution is found until then of course).

Maybe openssh should recommend haveged?

Best regards
Uwe

jim_p

2018-10-29 05:30:01 UTC

Package: openssl
Version: 1.1.1-1
Followup-For: Bug #912087

Now that by bug report has been switched to the openssl package, I would like
to add that after seeing the above post about entropy, I installed haveged and
now openssh-server starts in a fraction of the mentioned time.

On top of that, lightdm also starts faster (= it gets me to the login prompt
quicker) than what it used to a few days ago.
I was about to open a new bug report for it, but I do not know if it is related
to this issue.

-- System Information:
Debian Release: buster/sid
APT prefers testing
APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 4.18.0-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages openssl depends on:
ii libc6 2.27-6
ii libssl1.1 1.1.1-1

openssl recommends no packages.

Versions of packages openssl suggests:
ii ca-certificates 20170717

-- no debconf information

Kurt Roeckx

2018-10-29 21:00:01 UTC

Post by Kurt Roeckx
reassign 912087 openssh-server,systemd
thanks

Post by Kurt Roeckx

Post by Colin Watson
Reassigning to OpenSSL - could the OpenSSL maintainers please have a
look and advise what's best to do? (See the start of the bug, reporting
a delay of more than one minute in system boot in some cases, mainly
waiting for sshd to start.)

The biggest change related to this is that we know use
getrandom()/getentropy() on kernels that have it, so kernels

Post by Colin Watson
= 3.17. And the kernel using that interface doesn't return random

numbers until it has been initialized.
Something should be initializing the kernel with random data from
the previous boot. This used to be done by /etc/init.d/urandom,
but I'm not sure if that's still used. This should be done as
early as possible during the boot not to cause such problems. You
should look into when during the boot process the kernel gets this
random data.

So I believe this is not an openssl issue, but something in the
order that the kernel's RNG is initialized and openssh is started.
Potentionally the RNG isn't initialized at all and you actually
have to wait for the kernel to get it's random data from the slow
way.

The service is called systemd-random-seed.service and stores the random
seed during shutdown and restores it during boot. Pretty much as urandom
did under sysvinit.
This service is run in sysinit.target, ssh.service is started in
multi-user.target, which is ordered after sysvinit.target.

Clearly it's not working for the person reporting this issue,
or we wouldn't have this discussion.

Kurt

Olaf van der Spek

2018-11-18 09:50:01 UTC

Package: openssh-server
Version: 1:7.9p1-4
Followup-For: Bug #912087

Hi,

The first is virtio_rng, since the assumption is if you are using a VM, you had better trust the host infrastructure or you have much worse problems.

Doesn't that apply to non-virtual hardware as well?

BTW, mariadb and php-fpm are affected as well.
If services can't assume the rng being ready, they should have a dependency on the rng such that systemd doesn't attempt to start them before it's ready.

Gr,

Olaf

-- System Information:
Debian Release: buster/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.18.0-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages openssh-server depends on:
ii adduser 3.118
ii debconf [debconf-2.0] 1.5.69
ii dpkg 1.19.2
ii libaudit1 1:2.8.4-2
ii libc6 2.27-8
ii libcom-err2 1.44.4-2
ii libgssapi-krb5-2 1.16.1-1
ii libkrb5-3 1.16.1-1
ii libpam-modules 1.1.8-3.8
ii libpam-runtime 1.1.8-3.8
ii libpam0g 1.1.8-3.8
ii libselinux1 2.8-1+b1
ii libssl1.1 1.1.1-2
ii libsystemd0 239-11
ii libwrap0 7.6.q-27
ii lsb-base 9.20170808
ii openssh-client 1:7.9p1-4
ii openssh-sftp-server 1:7.9p1-4
ii procps 2:3.3.15-2
ii ucf 3.0038
ii zlib1g 1:1.2.11.dfsg-1

Versions of packages openssh-server recommends:
ii libpam-systemd 239-11
ii ncurses-term 6.1+20181013-1
ii xauth 1:1.0.10-1

Versions of packages openssh-server suggests:
pn molly-guard <none>
pn monkeysphere <none>
pn rssh <none>
pn ssh-askpass <none>
pn ufw <none>

-- debconf information:
openssh-server/permit-root-login: false
openssh-server/password-authentication: true

Sebastian Andrzej Siewior

2018-11-26 23:10:01 UTC

control: reassign -1 systemd 239-13

I hereby reassign the bug to systemd.
The problem is that OpenSSL is now using getrandom() for entropy which
is not (yet) ready and therefore OpenSSH needs longer for startup by
simply waiting for entropy.

Theodore Y. Ts'o suggested adding hw-rng to KVM/virt setups [0].
Everything else should work just "fine".

Either way there is nothing OpenSSL wise that can be done. A similar
issue [1] has been reported once GnuTLS which to getrandom().

Systemd wise there is
https://github.com/systemd/systemd/issues/4271
https://github.com/systemd/systemd/pull/10621
where people want systemd to systemd to credit entropy. I'm not much a
fan of that but that is a different story.

[0] https://bugs.debian.org/912087#118
[1] https://bugs.debian.org/837597

Sebastian

Sebastian Andrzej Siewior

2018-11-27 13:20:01 UTC

Sigh, and there is nothing that systemd can do to fix this, so I don't
see a point re-assigning this to systemd (again).

I was about to close it myself then I though maybe reassign in case you
want close it if that PR gets merged it.

Even if this PR is merged, upstream made it very clear that this will be
explicitly opt-in, so users will still be affected by default.

Yes, I am not a fan that.

They might just as well install haveged or configure virtio-rng in such
a case.

Right. Do you think, that it would necessary to add something to the
release notes?

Oh well, I case this means we just have to close this issue as wontfix then.

Thank you.

Sebastian

Olaf van der Spek

2018-11-28 12:50:01 UTC

Op di 27 nov. 2018 om 14:12 schreef Sebastian Andrzej Siewior

Post by Sebastian Andrzej Siewior

Sigh, and there is nothing that systemd can do to fix this, so I don't
see a point re-assigning this to systemd (again).

I was about to close it myself then I though maybe reassign in case you
want close it if that PR gets merged it.

Even if this PR is merged, upstream made it very clear that this will be
explicitly opt-in, so users will still be affected by default.

Yes, I am not a fan that.

They might just as well install haveged or configure virtio-rng in such
a case.

Right. Do you think, that it would necessary to add something to the
release notes?

I do. ;)
What's the workaround for VMware?

Does it just take longer to start or do some services not start at all?

--
Olaf

Sebastian Andrzej Siewior

2018-11-29 14:00:02 UTC

Post by Olaf van der Spek

Post by Sebastian Andrzej Siewior

They might just as well install haveged or configure virtio-rng in such
a case.

Right. Do you think, that it would necessary to add something to the
release notes?

I do. ;)
What's the workaround for VMware?
Does it just take longer to start or do some services not start at all?

It will take longer to start, it will start. Let me pass that workaround
question to someone from vmware/virtualbox and #912087:

On a recent kernel you see something like that:
[ 68.823013] random: crng init done

which means it took almost 69 seconds for the system to initialize its
rng. OpenSSH 7.9 can be compilied against OpenSSL 1.1.1 which in turn
switched to getrandom() (for its randomness).
This syscall will block until kernel's rng is ready which took in this
example almost 69 seconds.
Those "high" numbers are not a problem on decent/recent HW but occur
oftern on in virtualized environments.
For KVM we have CONFIG_HW_RANDOM_VIRTIO. Are there any plans to get
something similar for VMware/Vbox?

[0] http://bugs.debian.org/912087

Sebastian

Olaf van der Spek

2018-11-29 14:10:01 UTC

Op do 29 nov. 2018 om 14:54 schreef Sebastian Andrzej Siewior

Post by Sebastian Andrzej Siewior

Post by Olaf van der Spek

Post by Sebastian Andrzej Siewior

They might just as well install haveged or configure virtio-rng in such
a case.

Right. Do you think, that it would necessary to add something to the
release notes?

I do. ;)
What's the workaround for VMware?
Does it just take longer to start or do some services not start at all?

It will take longer to start, it will start. Let me pass that workaround

Are you sure?
I've had php-fpm not start due to this, I think:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906686

Gr,

--
Olaf

Olaf van der Spek

2019-01-02 13:50:02 UTC

Post by Olaf van der Spek
Op do 29 nov. 2018 om 14:54 schreef Sebastian Andrzej Siewior

Post by Sebastian Andrzej Siewior

Post by Olaf van der Spek

Post by Sebastian Andrzej Siewior

They might just as well install haveged or configure virtio-rng in such
a case.

Right. Do you think, that it would necessary to add something to the
release notes?

I do. ;)
What's the workaround for VMware?
Does it just take longer to start or do some services not start at all?

It will take longer to start, it will start. Let me pass that workaround

Are you sure?
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906686

Today on a Digital Ocean VM:
Jan 2 13:22:29 www systemd[1]: php7.3-fpm.service: Start operation
timed out. Terminating.
Jan 2 13:22:29 www systemd[1]: ssh.service: Start-pre operation timed
out. Terminating.
Jan 2 13:22:29 www systemd[1]: nginx.service: Start-pre operation
timed out. Terminating.
Jan 2 13:22:29 www systemd[1]: php7.3-fpm.service: Main process
exited, code=killed, status=15/TERM
Jan 2 13:22:29 www systemd[1]: php7.3-fpm.service: Failed with result
'timeout'.
Jan 2 13:22:29 www systemd[1]: Failed to start The PHP 7.3 FastCGI
Process Manager.
Jan 2 13:22:29 www systemd[1]: nginx.service: Control process exited,
code=killed, status=15/TERM
Jan 2 13:22:29 www systemd[1]: nginx.service: Failed with result 'timeout'.
Jan 2 13:22:29 www systemd[1]: Failed to start A high performance web
server and a reverse proxy server.
Jan 2 13:22:29 www systemd[1]: ssh.service: Control process exited,
code=killed, status=15/TERM
Jan 2 13:22:29 www systemd[1]: ssh.service: Failed with result 'timeout'.
Jan 2 13:22:29 www systemd[1]: Failed to start OpenBSD Secure Shell server.
Jan 2 13:22:29 www systemd[1]: Reached target Multi-User System.
Jan 2 13:22:29 www systemd[1]: Starting Execute cloud user/final scripts...
Jan 2 13:22:29 www systemd[1]: Reached target Graphical Interface.
Jan 2 13:22:29 www systemd[1]: Starting Update UTMP about System
Runlevel Changes...
Jan 2 13:22:29 www kernel: [ 97.513700] urandom_read: 3 callbacks suppressed
Jan 2 13:22:29 www kernel: [ 97.513702] random: cloud-init:
uninitialized urandom read (24 bytes read)
Jan 2 13:22:29 www systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Jan 2 13:22:29 www systemd[1]: Started Update UTMP about System
Runlevel Changes.
Jan 2 13:22:29 www systemd[1]: ssh.service: Service RestartSec=100ms
expired, scheduling restart.
Jan 2 13:22:29 www systemd[1]: ssh.service: Scheduled restart job,
restart counter is at 1.
Jan 2 13:22:29 www systemd[1]: Stopped OpenBSD Secure Shell server.
Jan 2 13:22:29 www systemd[1]: Starting OpenBSD Secure Shell server...

Jan 2 13:26:06 www kernel: [ 314.100132] random: crng init done
Jan 2 13:26:06 www systemd[1]: Started OpenBSD Secure Shell server.

I think more than some release notes is required.

--
Olaf

Andy Simpkins

2019-01-19 22:10:03 UTC

Follow up report:

Bare metal install onto an APM Mustang board (see debian arm64 buildds)
of debian-buster-DI-alpha4-arm64-DVD-1.iso [1]

sshd takes > 7 min to start [2]

This is clearly going to be a problem for Buster as things stand...

/Andy

[1] DI alpha4 uses kernel 4.18.20-2 (2018-11-23)
I am not updating the current Buster kernel (4.19.0-1-arm64) because
I am seeing a problem with it causing mustangs to panic during init
(this is the next bug I will report)

[2] ***@sally:~# systemctl status ssh
ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor
preset: enab
Active: activating (start-pre) since Sat 2019-01-19 21:23:39 GMT;
2min 16s ag
Docs: man:sshd(8)
man:sshd_config(5)
Cntrl PID: 459 (sshd)
Tasks: 1 (limit: 4915)
Memory: 2.8M
CGroup: /system.slice/ssh.service
��└��─459 /usr/sbin/sshd -t

syslog extract:
Jan 19 21:30:49 sally systemd[1]: Starting OpenBSD Secure Shell
server...

Jan 19 21:31:01 sally CRON[516]: (root) CMD ( test -x
/etc/cron.daily/popularity-contest && /etc/cron.daily/popularity-
contest --crond)

Jan 19 21:31:58 sally kernel: [ 442.940274] random: crng init done

Jan 19 21:31:58 sally kernel: [ 442.940280] random: 7 urandom
warning(s) missed due to ratelimiting

Jan 19 21:31:58 sally systemd[1]: Started OpenBSD Secure Shell
server.

On Wed, 2 Jan 2019 14:45:18 +0100 Olaf van der Spek

Post by Olaf van der Spek

Post by Olaf van der Spek
Op do 29 nov. 2018 om 14:54 schreef Sebastian Andrzej Siewior

Post by Sebastian Andrzej Siewior

Post by Olaf van der Spek

Post by Sebastian Andrzej Siewior

They might just as well install haveged or configure virtio-rng in such
a case.

Right. Do you think, that it would necessary to add something to the
release notes?

I do. ;)
What's the workaround for VMware?
Does it just take longer to start or do some services not start at all?

It will take longer to start, it will start. Let me pass that workaround

Are you sure?
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906686

Jan 2 13:22:29 www systemd[1]: php7.3-fpm.service: Start operation
timed out. Terminating.
Jan 2 13:22:29 www systemd[1]: ssh.service: Start-pre operation timed
out. Terminating.
Jan 2 13:22:29 www systemd[1]: nginx.service: Start-pre operation
timed out. Terminating.
Jan 2 13:22:29 www systemd[1]: php7.3-fpm.service: Main process
exited, code=killed, status=15/TERM
Jan 2 13:22:29 www systemd[1]: php7.3-fpm.service: Failed with result
'timeout'.
Jan 2 13:22:29 www systemd[1]: Failed to start The PHP 7.3 FastCGI
Process Manager.
Jan 2 13:22:29 www systemd[1]: nginx.service: Control process exited,
code=killed, status=15/TERM
Jan 2 13:22:29 www systemd[1]: nginx.service: Failed with result 'timeout'.
Jan 2 13:22:29 www systemd[1]: Failed to start A high performance web
server and a reverse proxy server.
Jan 2 13:22:29 www systemd[1]: ssh.service: Control process exited,
code=killed, status=15/TERM
Jan 2 13:22:29 www systemd[1]: ssh.service: Failed with result 'timeout'.
Jan 2 13:22:29 www systemd[1]: Failed to start OpenBSD Secure Shell server.
Jan 2 13:22:29 www systemd[1]: Reached target Multi-User System.
Jan 2 13:22:29 www systemd[1]: Starting Execute cloud user/final scripts...
Jan 2 13:22:29 www systemd[1]: Reached target Graphical Interface.
Jan 2 13:22:29 www systemd[1]: Starting Update UTMP about System
Runlevel Changes...
Jan 2 13:22:29 www kernel: [ 97.513700] urandom_read: 3 callbacks suppressed
uninitialized urandom read (24 bytes read)
Jan 2 13:22:29 www systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Jan 2 13:22:29 www systemd[1]: Started Update UTMP about System
Runlevel Changes.
Jan 2 13:22:29 www systemd[1]: ssh.service: Service RestartSec=100ms
expired, scheduling restart.
Jan 2 13:22:29 www systemd[1]: ssh.service: Scheduled restart job,
restart counter is at 1.

Matt Zagrabelny

2018-11-29 22:20:01 UTC

Greetings,

Sorry I am late to the discussion.

I have some systems running in a VMWare infrastructure and things like ssh
and puppet are not starting (puppet times out and fails) or taking a long
time to start (ssh).

If I install sshd, I expect it to start (somewhat quickly) not 200 seconds
after boot. If I install puppet, I expect it to start - not timeout and
fail.

It seems installing haveged has "fixed" these issues [Thanks jim_p <
***@gmail.com>] , but knowing those specific solutions remains the
challenge.

I've spent a lot of time debugging, reading, and researching to find these
bug reports and threads. It takes a lot of time and even then I'm not
certain of the appropriateness of the fix. Is installing haveged the right
solution for a headless VM that needs to have ssh and puppet running?

Right now systemd upstream is dubious of PR 10621 and according to Michael
Biebl it will be opt-in only. PS. Thank you Michael for your work on
systemd.

Where do we communicate to admins that their services might be slow to
start or fail to start and they need to manually install some package
(haveged) or touch a systemd config (PR 10621)?

Thanks everyone for their input in this discussion!

It is appreciated!

-m

Nathan Anthonypillai

2018-12-01 14:30:01 UTC

Greetings,

First off, thank you for your work on this issue.

After spending a bit more that 6 hours strugging with this issue on a KVM VM
running headless debian-testing, I simply want to add the following:

- If you do not realize that the problem has something to do with crng, finding
this bug report on a search engine is a bit difficult.

- Installing haveged is what finally solved it and brough me straight here.

Finding the solution was a bit of a stroke of luck (i.e. haveged is one of the
packages I install on production VMs from my ansible playbook, and one such VM
did not have this issue).

It would definitely have saved me a lot of time if this issue was more easily
accessible through a search engine / listchanges mentioned it.

NB: Going through bug reports for openssh-server on the tracker is one of the
first thing I did, unfortunately it had been bumped somewhere else by that time.

Hopefully this entry will help people using a search engine to figure out why
they need to log in on their VM through console for sshd to start listening
on the network.

Once again, thank you to everyone who pitched in in here.

Best regards,

--
Nathan Anthonypillai
GPG Public Key: https://pgp.key-server.io/search/***@anthonypillai.net

Michael Biebl

2018-12-01 15:00:01 UTC

Control: reassign openssh-server 1:7.9p1-1

Post by Matt Zagrabelny
Greetings,
First off, thank you for your work on this issue.
After spending a bit more that 6 hours strugging with this issue on a KVM VM
- If you do not realize that the problem has something to do with crng, finding
this bug report on a search engine is a bit difficult.
- Installing haveged is what finally solved it and brough me straight here.
Finding the solution was a bit of a stroke of luck (i.e. haveged is one of the
packages I install on production VMs from my ansible playbook, and one such VM
did not have this issue).
It would definitely have saved me a lot of time if this issue was more easily
accessible through a search engine / listchanges mentioned it.
NB: Going through bug reports for openssh-server on the tracker is one of the
first thing I did, unfortunately it had been bumped somewhere else by that time.
Hopefully this entry will help people using a search engine to figure out why
they need to log in on their VM through console for sshd to start listening
on the network.
Once again, thank you to everyone who pitched in in here.

Good point. Re-assigning back to make the bug report easier to find.
As explained in previous posts, it wasn't a systemd bug to begin with

--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?

Ryan Goodfellow

2018-12-09 01:10:02 UTC

For people using libvirt, I've found that adding the following to a
domain solves this problem.

<rng model="virtio">
<backend model="random">/dev/urandom</backend>
</rng>

Thanks for the useful pointer in this direction Ted.

--
~ ry

36 Replies
4 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

jim_p 2018-10-28 05:40:01 UTC

Bernhard Übelacker 2018-10-28 10:50:02 UTC

Colin Watson 2018-10-28 18:50:01 UTC

Bernhard Übelacker 2018-10-29 00:00:01 UTC

Colin Watson 2018-10-29 00:40:01 UTC

Kurt Roeckx 2018-10-29 07:40:02 UTC

Kurt Roeckx 2018-10-29 17:30:01 UTC

Sebastian Andrzej Siewior 2018-10-29 21:10:02 UTC

Kurt Roeckx 2018-10-29 22:40:02 UTC

Sebastian Andrzej Siewior 2018-10-30 00:30:01 UTC

Michael Biebl 2018-10-30 02:40:01 UTC

Theodore Y. Ts'o 2018-10-30 14:20:01 UTC

Kurt Roeckx 2018-10-30 18:40:02 UTC

Theodore Y. Ts'o 2018-10-30 21:00:02 UTC

Sebastian Andrzej Siewior 2018-10-31 12:00:02 UTC

Theodore Y. Ts'o 2018-10-31 22:50:01 UTC

Sebastian Andrzej Siewior 2018-11-01 22:30:02 UTC

Theodore Y. Ts'o 2018-11-02 00:00:01 UTC

Kurt Roeckx 2018-11-02 00:30:01 UTC

Theodore Y. Ts'o 2018-11-02 02:20:01 UTC

Sebastian Andrzej Siewior 2018-11-04 00:30:01 UTC

Uwe Kleine-König 2018-12-12 16:30:01 UTC

Uwe Kleine-König 2018-12-15 08:30:01 UTC

jim_p 2018-10-29 05:30:01 UTC

Kurt Roeckx 2018-10-29 21:00:01 UTC

Olaf van der Spek 2018-11-18 09:50:01 UTC

Sebastian Andrzej Siewior 2018-11-26 23:10:01 UTC

Sebastian Andrzej Siewior 2018-11-27 13:20:01 UTC

Olaf van der Spek 2018-11-28 12:50:01 UTC

Sebastian Andrzej Siewior 2018-11-29 14:00:02 UTC

Olaf van der Spek 2018-11-29 14:10:01 UTC

Olaf van der Spek 2019-01-02 13:50:02 UTC

Andy Simpkins 2019-01-19 22:10:03 UTC

Matt Zagrabelny 2018-11-29 22:20:01 UTC

Nathan Anthonypillai 2018-12-01 14:30:01 UTC

Michael Biebl 2018-12-01 15:00:01 UTC

Ryan Goodfellow 2018-12-09 01:10:02 UTC

about - legalese

Loading...