Discussion:
raise fileno limit to make Steam Proton / Wine+esync work well in Fedora
Kamil Paral
2018-10-05 17:31:07 UTC
(cross-posting to devel and desktop lists, ideally reply to both)

Hello,

this is a request for feedback regarding adjusting process limits to make
Steam/Wine work better on Fedora.

*A quick background:*
In August Valve announced [1] Proton [2], their own fork of Wine, which is
included in their Steam Play functionality and allows Linux players to
run Windows-only games transparently from Steam without any complex
configuration, just with a press of a button. Not everything works, of
course, but the success rate is more than decent [3]. As far as I can tell,
the reception by Linux gamers has been *very* enthusiastic.

*The technical details:*
In order to boost performance, Valve included DXVK [4], which converts
DirectX calls to Vulkan (instead of OpenGL in vanilla Wine), and esync [5],
an existing Wine patchset that performs process synchronization in a more
efficient manner than vanilla Wine, using file descriptors. You can read
the linked readme for a full explanation.

The esync patchset is the main subject of this email. It uses a lot of file
descriptors and the default kernel limits are not sufficient for many
games. Valve notes this in their requirements document (under "FD LIMIT
REQUIREMENTS") [6] and it is further described in the esync readme [5]. The
documents also say that Debian and its derivatives like Ubuntu have already
raised the limit on open file descriptors, so those distributions work out
of the box with esync. Fedora is one of the distributions that hasn't. I
wonder if we can consider changing that.

*Debian and Ubuntu:*
I've installed both Debian (Sid) and Ubuntu (18.10) to verify this, and can
confirm it. The default soft limit stays the same (1024), but the hard
limit is increased from 4096 to 1048576 (2^20). However, this only applies
to systemd's system instance (systemd --system, PID 1), and not to
systemd user instances (systemd --user). Most of the apps you start in your
session are children of gnome-shell, which doesn't run under the systemd
user instance, so the higher limits apply to them as well (including
Steam). However, some apps (probably started via dbus, I'm not really sure,
but importantly this also includes gnome-terminal) are started as children
of the systemd user instance and therefore have the original low limits
applied. I don't know whether this is intentional or just an omission. I
tried really hard to find the place where Debian/Ubuntu patches the
upstream limits, so that I could read some justification/explanation of the
change, but I wasn't able to find it (I searched the available patches for
kernel, systemd and pam).

*Configuring the limits:*
You can display the soft+hard limits of your current terminal using ulimit:
$ ulimit -Sn
1024
$ ulimit -Hn
4096
However, note that gnome-terminal runs under the systemd user session, so
at least in Debian/Ubuntu it will not see the higher limits (and neither
will Steam if started from the terminal).
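
The reason the parent matters is that resource limits are inherited by
child processes. A minimal Python sketch of that, assuming Linux and the
standard resource module (the helper names are made up for illustration):

```python
import resource
import subprocess
import sys


def own_limits():
    """Return (soft, hard) RLIMIT_NOFILE of this process, like ulimit -Sn / -Hn."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def child_limits():
    """Start a child interpreter and return the (soft, hard) limits it sees."""
    code = "import resource; print(*resource.getrlimit(resource.RLIMIT_NOFILE))"
    out = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    ).stdout
    soft, hard = (int(value) for value in out.split())
    return soft, hard


if __name__ == "__main__":
    # The child reports the same pair as the parent: limits are inherited.
    print("parent:", own_limits())
    print("child: ", child_limits())
```

Whatever limits the terminal got from its parent are therefore what
anything launched from it will see, unless something raises them explicitly.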

You can also use prlimit to see the limits of any running process. You can
use this to check limits of the systemd system instance, systemd user
instance, running Steam, etc.
$ sudo prlimit --nofile --pid 1
RESOURCE DESCRIPTION                 SOFT    HARD UNITS
NOFILE   max number of open files 1048576 1048576 files

You can modify the limits on the fly like this:
$ sudo prlimit --nofile=1024:1048576 --pid PID
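
The same two operations are available from Python through
resource.prlimit() (Linux only). A minimal sketch; get_nofile and
set_nofile are hypothetical helper names, not part of any tool mentioned
here:

```python
import os
import resource


def get_nofile(pid):
    """Return (soft, hard) RLIMIT_NOFILE of a process, like `prlimit --nofile --pid PID`."""
    return resource.prlimit(pid, resource.RLIMIT_NOFILE)


def set_nofile(pid, soft, hard):
    """Set RLIMIT_NOFILE of a process, like `prlimit --nofile=SOFT:HARD --pid PID`.

    Raising a hard limit, or touching another user's process, needs
    privileges; without them the call raises an exception.
    Returns the previous (soft, hard) pair.
    """
    return resource.prlimit(pid, resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    print(get_nofile(os.getpid()))
```

As with the prlimit command, a change made this way only affects the
target process and whatever it starts afterwards.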

You can increase the default limits by editing /etc/systemd/system.conf
(and /etc/systemd/user.conf, if you want to edit systemd user session as
well) and setting:
DefaultLimitNOFILE=1024:1048576

Alternatively, you can drop a file like this:
[Manager]
DefaultLimitNOFILE=1024:1048576
into /etc/systemd/system.conf.d/ (and /etc/systemd/user.conf.d/).
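
A small sketch of generating such a drop-in, in case someone wants to
script it. The file name 90-nofile.conf is an arbitrary choice for
illustration, and the target directory is passed in explicitly so that
nothing under /etc is touched by accident:

```python
import tempfile
from pathlib import Path


def render_dropin(soft=1024, hard=1048576):
    """Return the text of a [Manager] drop-in setting DefaultLimitNOFILE."""
    return f"[Manager]\nDefaultLimitNOFILE={soft}:{hard}\n"


def write_dropin(directory, name="90-nofile.conf", soft=1024, hard=1048576):
    """Write the drop-in into `directory` and return the resulting path.

    For real use, `directory` would be /etc/systemd/system.conf.d
    (and /etc/systemd/user.conf.d), which requires root.
    """
    target = Path(directory) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_dropin(soft, hard))
    return target


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of /etc.
    with tempfile.TemporaryDirectory() as scratch:
        path = write_dropin(scratch)
        print(path.read_text(), end="")
```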

*Default limits in Fedora:*
From a technical point of view I'm not able to judge whether raising the
fileno limits by default is a trivial change or something with important
security implications. That's why I'm writing this email to hopefully get
replies from more knowledgeable people. The fact that Debian raised the
limits gives me hope we could do the same in Fedora (perhaps just in
Workstation, if the change is not welcome in the whole distribution). If
somebody can find the justification of Debian devs, that would be great.
I'd very much like to see Fedora (Workstation) being a good choice for
Linux gamers (we already packaged gamemode), and this might be an important
step to make sure Steam games don't work worse than on Ubuntu (but
hopefully even better, due to our more recent drivers).

Thanks for your feedback.


[1] https://steamcommunity.com/games/221410/announcements/detail/1696055855739350561
[2] https://github.com/ValveSoftware/Proton
[3] https://spcr.netlify.com
[4] https://github.com/doitsujin/dxvk
[5] https://github.com/zfigura/wine/blob/esync/README.esync
[6] https://github.com/ValveSoftware/Proton/blob/proton_3.7/PREREQS.md#fd-limit-requirements
Chris Murphy
2018-10-05 17:53:03 UTC
Post by Kamil Paral
(cross-posting to devel and desktop lists, ideally reply to both)
I've installed both Debian (Sid) and Ubuntu (18.10) to verify this, and can
confirm it. The default soft limit stays the same (1024), but the hard limit
is increased from 4096 to 1048576 (2^20). However, this only applies to the
systemd's system instance (systemd --system, PID 1), and not to systemd user
instances (systemd --user).
It seems uncontroversial to at least raise it to 65535, about an order
of magnitude, rather than three. And apply it to both system and user
instances.



--
Chris Murphy
_______________________________________________
devel mailing list -- ***@lists.fedoraproject.org
To unsubscribe send an email to devel-***@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/***@lists.fedoraproj
Lennart Poettering
2018-10-05 18:03:57 UTC
Post by Kamil Paral
(cross-posting to devel and desktop lists, ideally reply to both)
Coincidentally, at All Systems Go! in Berlin last week I had some
discussions with kernel people about RLIMIT_NOFILE defaults. They
basically suggested that the memory and performance cost of large
numbers of fds on current kernels is cheap, and that we should bump
the hard limit in systemd for all userspace processes.

I thus prepared this a few days ago:

https://github.com/systemd/systemd/pull/10244

On systemd systems that do not otherwise patch RLIMIT_NOFILE, this
should make the new default hard limit for all userspace 256K (though
the soft limit remains at 1K, for compat with select()). AFAIK Fedora
doesn't override RLIMIT_NOFILE artificially, hence these new systemd
upstream defaults should trickle down to Fedora too.

This is not quite the 1M you appear to ask for though… I picked 256K
mostly because I wanted to stay lower than the kernel built-in max
(which is 1M, i.e. /proc/sys/fs/nr_open), and needed to pick
something. Do you have any particular reason to prefer 1M over 256K? I
am completely open to suggestions there...

Lennart

--
Lennart Poettering, Red Hat
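
To check how the two candidate values (256K and 1M) relate to the kernel
maximum on a given machine, /proc/sys/fs/nr_open can be read directly. A
minimal sketch, assuming Linux; the function names are illustrative only:

```python
PROPOSED_HARD = 256 * 1024    # the value picked in the pull request
REQUESTED_HARD = 1024 * 1024  # the value asked for in this thread


def read_nr_open(path="/proc/sys/fs/nr_open"):
    """Return the kernel's per-process fd ceiling, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def fits_kernel_max(hard_limit, nr_open):
    """A hard RLIMIT_NOFILE can only be set if it does not exceed nr_open."""
    return hard_limit <= nr_open


if __name__ == "__main__":
    nr_open = read_nr_open()
    if nr_open is None:
        print("could not read /proc/sys/fs/nr_open")
    else:
        for candidate in (PROPOSED_HARD, REQUESTED_HARD):
            print(candidate, "fits" if fits_kernel_max(candidate, nr_open) else "too high")
```
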
Michael Cronenworth
2018-10-05 18:55:12 UTC
First off, thanks, Kamil, for starting this discussion. I've been meaning to bring
it up.

On 10/5/18 1:03 PM, Lennart Poettering wrote:
[snip]
Post by Lennart Poettering
This is not quite the 1M you appear to ask for though… I picked 256K
mostly because I wanted to stay lower than the kernel built-in max
(which is 1M, i.e. /proc/sys/fs/nr_open), and needed to pick
something. Do you have any particular reason to prefer 1M over 256K? I
am completely open to suggestions there...
The upstream esync branch requests setting the hard limit to 1M.

https://github.com/zfigura/wine/blob/esync/README.esync

I haven't torn apart the project to see if 1M is really necessary so a different
limit may be up for discussion.

Regards,
Michael
Nicholas Miell
2018-10-05 19:28:44 UTC
Post by Michael Cronenworth
First off, thanks, Kamil, for starting this discussion. I've been
meaning to bring it up.
[snip]
Post by Lennart Poettering
This is not quite the 1M you appear to ask for though… I picked 256K
mostly because I wanted to stay lower than the kernel built-in max
(which is 1M, i.e. /proc/sys/fs/nr_open), and needed to pick
something. Do you have any particular reason to prefer 1M over 256K? I
am completely open to suggestions there...
The upstream esync branch requests setting the hard limit to 1M.
https://github.com/zfigura/wine/blob/esync/README.esync
I haven't torn apart the project to see if 1M is really necessary so a
different limit may be up for discussion.
esync uses eventfd to reduce IPC to wineserver when emulating Windows
kernel objects; the exact number of eventfds needed depends entirely on
the behavior of the Windows application you are running.
Lennart Poettering
2018-10-05 20:19:30 UTC
Post by Nicholas Miell
Post by Michael Cronenworth
Post by Lennart Poettering
This is not quite the 1M you appear to ask for though… I picked 256K
mostly because I wanted to stay lower than the kernel built-in max
(which is 1M, i.e. /proc/sys/fs/nr_open), and needed to pick
something. Do you have any particular reason to prefer 1M over 256K? I
am completely open to suggestions there...
The upstream esync branch requests setting the hard limit to 1M.
https://github.com/zfigura/wine/blob/esync/README.esync
I haven't torn apart the project to see if 1M is really necessary so a
different limit may be up for discussion.
esync uses eventfd to reduce IPC to wineserver when emulating Windows
kernel objects, the exact number of eventfds needed depends entirely on
the behavior of the Windows application you are running.
So, any idea why they picked 1M? Are there typical apps that require
really that many?

I mean, it could be two things:

1) yes, they ran into real-life apps that require 500K fds and hence
set the limit to 1M since they can't set it any higher anyway, and
it's far away from (i.e. double of) 500K.

2) no, they didn't run into real-life apps like this, but didn't want
to figure out what a good limit is, hence they set it to the
kernel's built-in maximum of 1M.

If it's #1 then I figure we should bump the systemd upstream to 1M
too. If it's #2, then I figure we can start with 256K as my PR
currently does, for now.

Lennart

--
Lennart Poettering, Red Hat
Kamil Paral
2018-10-08 08:53:18 UTC
Post by Lennart Poettering
https://github.com/systemd/systemd/pull/10244
This is great, thank you.

Post by Lennart Poettering
So, any idea why they picked 1M? Are there typical apps that require
really that many?
I've emailed Zebediah Figura, the esync author. I asked him to either get
back to me and I'll resend his reply here, or to comment in your pull
request. Hopefully he can tell us what his process was for picking the 1M
value.
Michal Konečný
2018-10-08 09:07:23 UTC
It might also be good to look at the SteamOS distribution and the
limits it uses.
Kamil Paral
2018-10-08 09:24:05 UTC
Post by Michal Konečný
Maybe it will be also good to look at the SteamOS distribution and what
limits they are using.
According to the Proton document [1], SteamOS also has an increased fileno
hard limit. I haven't verified the actual value, but they probably just
inherit it from Debian. And on SteamOS they really wouldn't need to go
lower than on Debian, since it's a gaming-only OS.

[1] https://github.com/ValveSoftware/Proton/blob/proton_3.7/PREREQS.md#fd-limit-requirements
Zebediah Figura
2018-10-08 19:03:17 UTC
Hi all,

My thanks as well to Kamil for raising the question; it's been on my
list of things to do for a while.

The design of my patch set necessitates the allocation of one eventfd
descriptor for each kernel handle (which is, sort of, the Windows
equivalent of an fd) associated with a sync object. Some applications
can use rather a lot of these; users have regularly run into the default
limit of 4096.

The 1M number comes from Debian and derivatives, which have this as
their default hard limit (Debian is also the distribution I regularly use).
I don't have any familiarity with the kernel or anything about the
relevant infrastructure, but I don't see any reason why this would be
unreasonable, and not just for my Wine patches specifically.

On the other hand, I don't think it's quite necessary to go that high;
many people have used a 200k limit, apparently successfully. (One badly
misbehaving application, Google Earth VR, has a leak that causes it to
allocate at least 300k descriptors, at least while loading, but I think
that's the only one we've seen that has a leak on that scale. On the
other hand 1M would be enough even for it.) One user reported 16k as
being not enough for a more well-behaved game (I think it was
Frostpunk); that's the highest lower bound I recall hearing.
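
What "running into the limit" looks like from a process can be sketched
as follows: with one descriptor allocated per object, allocation simply
starts failing with EMFILE once the soft limit is reached. This is only
an illustration, not esync code; it opens /dev/null as a stand-in for
eventfd descriptors and temporarily lowers the soft limit so that the
loop ends quickly (the requested value must not exceed the hard limit):

```python
import errno
import os
import resource


def fds_until_emfile(soft_limit):
    """Lower the soft limit, open fds until EMFILE, and return how many were opened."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    opened = []
    try:
        while True:
            try:
                # Stand-in for "one eventfd per sync object".
                opened.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                return len(opened)
    finally:
        # Release everything and restore the original soft limit.
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    print("descriptors opened before EMFILE:", fds_until_emfile(64))
```

The count comes out slightly below the limit because descriptors that
are already open (stdin, stdout, stderr and so on) count against it too.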

I've been meaning for a while to bring this up to the kernel itself; I
certainly don't think 4096 seems like a reasonable limit these days, and
that's not just as far as Wine and my patch set are concerned.

ἔρρωσθε,
Zeb
Laura Abbott
2018-10-05 20:20:38 UTC
Post by Lennart Poettering
Post by Kamil Paral
(cross-posting to devel and desktop lists, ideally reply to both)
Coincidentally, at All Systems Go! in Berlin last week I had some
discussions with kernel people about RLIMIT_NOFILE defaults. They
basically suggested that the memory and performance cost of large
numbers of fds on current kernels is cheap, and that we should bump
the hard limit in systemd for all userspace processes.
https://github.com/systemd/systemd/pull/10244
This should have the effect on systemd systems that do not patch
around in RLIMIT_NOFILE otherwise that the new default hard limit for
all userspace is 256K (though the soft limit remains at 1K, for compat
with select()). AFAIK Fedora doesn't override RLIMIT_NOFILE
artificially, hence these new systemd upstream defaults should trickle
down to Fedora too.
I was asked about the file limit for Steam Proton a few months ago
and I suggested asking about limit policy publicly so that it could be
discussed. It's good to know that the cost is relatively low and we can
increase the default in systemd.

Thanks,
Laura
Florian Weimer
2018-10-19 07:12:41 UTC
Post by Lennart Poettering
Post by Kamil Paral
(cross-posting to devel and desktop lists, ideally reply to both)
Coincidentally, at All Systems Go! in Berlin last week I had some
discussions with kernel people about RLIMIT_NOFILE defaults. They
basically suggested that the memory and performance cost of large
numbers of fds on current kernels is cheap, and that we should bump
the hard limit in systemd for all userspace processes.
Which kernel version is that? Is that a new patch? Or some older
kernel?

It's definitely not true for kernel 4.18, see the script I posted.

Thanks,
Florian
Lennart Poettering
2018-10-19 15:16:41 UTC
Post by Florian Weimer
Post by Lennart Poettering
Post by Kamil Paral
(cross-posting to devel and desktop lists, ideally reply to both)
Coincidentally, at All Systems Go! in Berlin last week I had some
discussions with kernel people about RLIMIT_NOFILE defaults. They
basically suggested that the memory and performance cost of large
numbers of fds on current kernels is cheap, and that we should bump
the hard limit in systemd for all userspace processes.
Which kernel version is that? Is that a new patch? Or some older
kernel?
It's definitely not true for kernel 4.18, see the script I posted.
I asked Tejun Heo about all this; this is what he replied:

<snip>
In cgroup1, socket buffers are handled by a separate memory
sub-controller. It's cumbersome to use, somewhat broken and doesn't
allow for comprehensive memory control. cgroup2, however, by default
accounts socket buffer as part of a given cgroup's memory consumption
correctly interacting with socket window management.

OOM killer too fails to take socket buffer into account and high
number of sockets can lead it to make ineffective decisions; however,
this failure mode isn't confined to high number of sockets at all -
fewer number of fat pipes, tmpfs, mount points or any other kernel
objects which can be pinned by processes can trigger this.

cgroup2 can track or control most of these usages and at least for us
switching to oomd for workload health management solves most of the
problems that we've encountered. In the longer term, the kernel OOM
killer can be improved to make better decisions too.
</snip>

("us" in the above is Facebook btw.)

So, yeah, if we'd use cgroupv2 on Fedora, then everything would be
great (unfortunately the container messiness blocks that for now). But
as long as we don't, lifting the fd limit is not really making things
worse, given that there are tons of other easily exploitable ways to
acquire untracked memory...

Lennart

--
Lennart Poettering, Red Hat
Florian Weimer
2018-10-22 09:58:20 UTC
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Post by Kamil Paral
(cross-posting to devel and desktop lists, ideally reply to both)
Coincidentally, at All Systems Go! in Berlin last week I had some
discussions with kernel people about RLIMIT_NOFILE defaults. They
basically suggested that the memory and performance cost of large
numbers of fds on current kernels is cheap, and that we should bump
the hard limit in systemd for all userspace processes.
Which kernel version is that? Is that a new patch? Or some older
kernel?
It's definitely not true for kernel 4.18, see the script I posted.
So, yeah, if we'd use cgroupv2 on Fedora, then everything would be
great (unfortunately the container messiness blocks that for now). But
as long as we don't, lifting the fd limit is not really making things
worse, given that there are tons of other easily exploitable ways to
acquire untracked memory...
How does cgroupv2 solve this if we do not configure hard limits for the
user session? I don't want us to go back to static resource allocation
for applications, similar to what System 9 did.

Anyway, the problem suggests to me that the default soft limit should
not be raised until the kernel gets better recovery, so that
applications won't trigger the issue by accident.

Thanks,
Florian
Lennart Poettering
2018-10-22 20:04:52 UTC
Post by Florian Weimer
Anyway, the problem suggests to me that the default soft limit should
not be raised until the kernel gets better recovery, so that
applications won't trigger the issue by accident.
Throughout the whole discussion we always made clear that we can't and
won't change the default soft limit, because of the incompatibility
with select(), which cannot deal with fds >= 1024. I.e. there always
needs to be an explicit "opt-in" step for apps to say "I am happy
with fds >= 1024" (aka "I promise not to use select()") by bumping the
soft limit to the hard limit.

Lennart

--
Lennart Poettering, Red Hat
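
A sketch of what that opt-in step and the select() problem look like in
practice, using Python's resource and select modules on Linux. The
function names are made up for illustration; the second helper duplicates
a descriptor onto number 1024 (assumed to be unused) and shows that
select() refuses it:

```python
import os
import resource
import select


def raise_soft_to_hard():
    """Opt in to many fds: bump the soft RLIMIT_NOFILE up to the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def select_rejects(fd_number=1024):
    """Return True if select() refuses fd_number, None if no such fd can be created."""
    soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft <= fd_number:
        return None  # the soft limit does not allow an fd this high
    source = os.open(os.devnull, os.O_RDONLY)
    try:
        os.dup2(source, fd_number)  # place a descriptor at exactly fd_number
    finally:
        os.close(source)
    try:
        select.select([fd_number], [], [], 0)
        return False
    except ValueError:
        # CPython refuses descriptors that do not fit into an fd_set.
        return True
    finally:
        os.close(fd_number)


if __name__ == "__main__":
    print("limits after opt-in:", raise_soft_to_hard())
    print("select() rejects fd 1024:", select_rejects())
```

Programs that have opted in would use poll() or epoll instead, which do
not have this fixed-size restriction.
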
Igor Gnatenko
2018-10-24 06:45:32 UTC
How can I enable cgroups2 on my laptop?
Zbigniew Jędrzejewski-Szmek
2018-10-24 07:35:23 UTC
Post by Igor Gnatenko
How can I enable cgroups2 on my laptop?
Put systemd.unified-cgroup-hierarchy on the kernel command line.

Zbyszek
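
After rebooting with that option, one way to check whether the unified
hierarchy is actually in use is to look for the cgroup.controllers file
at the top of /sys/fs/cgroup, which only exists on a cgroup2 mount. A
minimal sketch (the function name is illustrative):

```python
import os


def unified_cgroup_in_use(root="/sys/fs/cgroup"):
    """Return True if `root` looks like the top of a cgroup2 (unified) hierarchy."""
    return os.path.exists(os.path.join(root, "cgroup.controllers"))


if __name__ == "__main__":
    print("cgroup2 unified hierarchy:", unified_cgroup_in_use())
```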
John Reiser
2018-10-08 20:00:01 UTC
Allowing 1M open files per unprivileged process is too many.

Megabytes of RAM are precious. A hard limit of 1M open files per process
allows each process to eat at least 256MB (1M * sizeof(struct file)
[linux/fs.h]) of RAM. If a single user is allowed 1000 processes,
then that's 256GB of RAM, which is a Denial-of-Service attack.

Yes, 4096 open files is not enough. Raise it to 65536.
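
The arithmetic behind these figures, as a rough sketch. The 256-byte
size of struct file is an assumption implied by the 256MB number above,
not a measured value; the real size depends on the kernel version and
configuration:

```python
STRUCT_FILE_BYTES = 256  # assumed sizeof(struct file), implied by the 256MB figure
MIB = 1024 * 1024


def worst_case_bytes(fds_per_process, processes=1):
    """Memory pinned if every process opens as many files as its hard limit allows."""
    return fds_per_process * STRUCT_FILE_BYTES * processes


if __name__ == "__main__":
    for limit in (4096, 65536, 1024 * 1024):
        per_process = worst_case_bytes(limit) // MIB
        per_user = worst_case_bytes(limit, processes=1000) // MIB
        print(f"hard limit {limit}: {per_process} MiB per process, "
              f"{per_user} MiB for 1000 processes")
```

Under that assumption a 65536 limit caps the cost at 16 MiB per process,
against 256 MiB per process for a 1M limit.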

Zebediah Figura
2018-10-08 20:26:05 UTC
Post by John Reiser
Allowing 1M open files per unprivileged process is too many.
Megabytes of RAM are precious.  A hard limit of 1M open files per process
allows each process to eat at least 256MB (1M * sizeof(struct file)
[linux/fs.h]) of RAM.  If a single user is allowed 1000 processes,
then that's 256GB of RAM, which is a Denial-of-Service attack.
Yes, 4096 open files is not enough.  Raise it to 65536.
Correct me if I'm wrong, but wouldn't this be capped by the system-wide
limit (i.e. it would hit ENFILE) before presenting a problem?
John Reiser
2018-10-08 21:43:23 UTC
Post by Zebediah Figura
Post by John Reiser
Allowing 1M open files per unprivileged process is too many.
Megabytes of RAM are precious.  A hard limit of 1M open files per process
allows each process to eat at least 256MB (1M * sizeof(struct file)
[linux/fs.h]) of RAM.  If a single user is allowed 1000 processes,
then that's 256GB of RAM, which is a Denial-of-Service attack.
Yes, 4096 open files is not enough.  Raise it to 65536.
Correct me if I'm wrong, but wouldn't this be capped by the system-wide
limit (i.e. it would hit ENFILE) before presenting a problem?
That means that a different DoS can happen even sooner,
at (ENFILE / 1M) processes. No other process could open() a file.
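
To put a number on "(ENFILE / 1M) processes" for a given machine, the
system-wide ceiling can be read from /proc/sys/fs/file-max and divided by
the per-process limit. A small sketch, assuming Linux; the helper names
are made up for illustration:

```python
def read_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide open file ceiling, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def processes_to_hit_enfile(file_max, per_process_limit):
    """Number of processes, each at its fd limit, needed to reach file_max (rounded up)."""
    return -(-file_max // per_process_limit)


if __name__ == "__main__":
    file_max = read_file_max()
    if file_max is None:
        print("could not read /proc/sys/fs/file-max")
    else:
        for limit in (65536, 1024 * 1024):
            print(f"limit {limit}: {processes_to_hit_enfile(file_max, limit)} processes")
```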
Zebediah Figura
2018-10-08 23:28:17 UTC
Post by John Reiser
Post by Zebediah Figura
Post by John Reiser
Allowing 1M open files per unprivileged process is too many.
Megabytes of RAM are precious.  A hard limit of 1M open files per process
allows each process to eat at least 256MB (1M * sizeof(struct file)
[linux/fs.h]) of RAM.  If a single user is allowed 1000 processes,
then that's 256GB of RAM, which is a Denial-of-Service attack.
Yes, 4096 open files is not enough.  Raise it to 65536.
Correct me if I'm wrong, but wouldn't this be capped by the system-wide
limit (i.e. it would hit ENFILE) before presenting a problem?
That means that a different DoS can happen even sooner,
at (ENFILE / 1M) processes.  No other process could open() a file.
Sure, but in order to prevent that you'd almost always need to *lower*
NOFILE. I don't know what kind of policies Fedora (or any other
distribution) has regarding this kind of attack mitigation, but it seems
dubious to me that this is worth doing.
Jan Pokorný
2018-10-09 13:14:03 UTC
Permalink
Post by Zebediah Figura
Post by John Reiser
Post by Zebediah Figura
Post by John Reiser
Allowing 1M open files per unprivileged process is too many.
Megabytes of RAM are precious.  A hard limit of 1M open files per
process
allows each process to eat at least 256MB (1M * sizeof(struct file)
[linux/fs.h]) of RAM.  If a single user is allowed 1000 processes,
then that's 256GB of RAM, which is a Denial-of-Service attack.
Yes, 4096 open files is not enough.  Raise it to 65536.
Correct me if I'm wrong, but wouldn't this be capped by the
system-wide limit (i.e. it would hit ENFILE) before presenting
a problem?
That means that a different DoS can happen even sooner,
at (ENFILE / 1M) processes.  No other process could open() a file.
The attack surface is considerably broader than that: e.g., executables
may no longer be launched at all (i.e., effectively similar to denials
based on /proc/sys/kernel/pid_max), because shared libraries fail to
load, if it gets that far at all.
Post by Zebediah Figura
Sure, but in order to prevent that you'd almost always need to *lower*
NOFILE. I don't know what kind of policies Fedora (or any other
distribution) has regarding this kind of attack mitigation, but it
seems dubious to me that this is worth doing.
Yes, it feels somewhat uneasy that unprivileged users/processes are
given, in a pristine configuration, free rein to consume an order of
magnitude (or two) more resources from a globally shared domain than
the globally imposed limits allow, possibly impacting privileged
entities. And that is the situation right now, before increasing any of
these limits globally. Such an increase would hence preferably be
limited to the Workstation edition; for the rest, the question of
possibly lowering these limits for nonprivileged use cases deserves
some attention, IMHO.
--
Jan (Poki)
Michal Konečný
2018-10-09 13:28:25 UTC
Permalink
Post by Jan Pokorný
Post by Zebediah Figura
Post by John Reiser
Post by Zebediah Figura
Post by John Reiser
Allowing 1M open files per unprivileged process is too many.
Megabytes of RAM are precious.  A hard limit of 1M open files per
process
allows each process to eat at least 256MB (1M * sizeof(struct file)
[linux/fs.h]) of RAM.  If a single user is allowed 1000 processes,
then that's 256GB of RAM, which is a Denial-of-Service attack.
Yes, 4096 open files is not enough.  Raise it to 65536.
Correct me if I'm wrong, but wouldn't this be capped by the
system-wide limit (i.e. it would hit ENFILE) before presenting
a problem?
That means that a different DoS can happen even sooner,
at (ENFILE / 1M) processes.  No other process could open() a file.
The attack surface is considerably broader than that: e.g., executables
may no longer be launched at all (i.e., effectively similar to denials
based on /proc/sys/kernel/pid_max), because shared libraries fail to
load, if it gets that far at all.
Post by Zebediah Figura
Sure, but in order to prevent that you'd almost always need to *lower*
NOFILE. I don't know what kind of policies Fedora (or any other
distribution) has regarding this kind of attack mitigation, but it
seems dubious to me that this is worth doing.
Yes, it feels somewhat uneasy that unprivileged users/processes are
given, in a pristine configuration, free rein to consume an order of
magnitude (or two) more resources from a globally shared domain than
the globally imposed limits allow, possibly impacting privileged
entities. And that is the situation right now, before increasing any of
these limits globally. Such an increase would hence preferably be
limited to the Workstation edition; for the rest, the question of
possibly lowering these limits for nonprivileged use cases deserves
some attention, IMHO.
Because this is mainly for Steam Proton, I support raising the limit
only for Workstation. There is no need to do this on the Server edition.
I recommend also raising the limit for the Silverblue edition, because
it is targeted at ordinary users.
mkonecny
Kamil Paral
2018-10-09 14:08:06 UTC
Permalink
Post by Michal Konečný
Because this is mainly for Steam Proton, I support raising the limit
only for Workstation. There is no need to do this on the Server edition.
I recommend also raising the limit for the Silverblue edition, because
it is targeted at ordinary users.
It would be nice if somebody managed to find where this is patched in
Debian. Because I somewhat doubt that they made this change without a
proper discussion. And Debian is very much server oriented.
Anderson, Charles R
2018-10-09 14:45:44 UTC
Permalink
Post by Kamil Paral
Post by Michal Konečný
Because this is mainly for Steam Proton, I support raising the limit
only for Workstation. There is no need to do this on the Server edition.
I recommend also raising the limit for the Silverblue edition, because
it is targeted at ordinary users.
It would be nice if somebody managed to find where this is patched in
Debian. Because I somewhat doubt that they made this change without a
proper discussion. And Debian is very much server oriented.
Can we not have the RPM package drop a file in /etc/security/limits.d
to set the limit only when that package is installed? That way it
only affects users of that package.
Lennart Poettering
2018-10-09 16:13:52 UTC
Permalink
Post by Anderson, Charles R
Post by Kamil Paral
It would be nice if somebody managed to find where this is patched in
Debian. Because I somewhat doubt that they made this change without a
proper discussion. And Debian is very much server oriented.
Can we not have the RPM package drop a file in /etc/security/limits.d
to set the limit only when that package is installed? That way it
only affects users of that package.
That only affects stuff that goes through PAM (specifically, all PAM
stacks that include pam_limits.so).

It is my intention to change this system wide, i.e. for system
services (which do not go through PAM) too.
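
For anyone who wants to check which limit a process actually ended up with, whether it came from pam_limits or from the service manager, a minimal Python sketch follows. It relies on the standard resource module and on the "Max open files" row of /proc/<pid>/limits; the function names are chosen purely for illustration.

```python
import resource


def nofile_limits():
    """Return (soft, hard) RLIMIT_NOFILE of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the text of a /proc/<pid>/limits file.

    Values are returned as strings because the kernel may print "unlimited".
    Returns None if the "Max open files" row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            return fields[3], fields[4]  # columns after the three-word name
    return None


if __name__ == "__main__":
    print("this process:", nofile_limits())
    try:
        # PID 1 is the systemd system instance on a systemd-based install.
        with open("/proc/1/limits") as f:
            print("PID 1:", parse_max_open_files(f.read()))
    except OSError as exc:
        print("could not read /proc/1/limits:", exc)
```

Running it once from a login shell and once from a system service should show whether the two paths hand out different limits.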

Lennart

--
Lennart Poettering, Red Hat
Kamil Paral
2018-10-15 16:00:05 UTC
Permalink
Post by Lennart Poettering
Post by Anderson, Charles R
Post by Kamil Paral
It would be nice if somebody managed to find where this is patched in
Debian. Because I somewhat doubt that they made this change without a
proper discussion. And Debian is very much server oriented.
Can we not have the RPM package drop a file in /etc/security/limits.d
to set the limit only when that package is installed? That way it
only affects users of that package.
That only affects stuff that goes through PAM (specifically, all PAM
stacks that include pam_limits.so).
It is my intention to change this system wide, i.e. for system
services (which do not go through PAM) too.
Lennart, what is the path forward here? Should we pull in some security
experts to give us recommendations on the best default value? Or are those
conversations already happening somewhere else? Also, do you need any more
information regarding the Wine esync use case, or has Zebediah provided
sufficient data?

Thanks.
Zbigniew Jędrzejewski-Szmek
2018-10-15 16:04:23 UTC
Permalink
Post by Kamil Paral
Post by Lennart Poettering
Post by Anderson, Charles R
Post by Kamil Paral
It would be nice if somebody managed to find where this is patched in
Debian. Because I somewhat doubt that they made this change without a
proper discussion. And Debian is very much server oriented.
Can we not have the RPM package drop a file in /etc/security/limits.d
to set the limit only when that package is installed? That way it
only affects users of that package.
That only affects stuff that goes through PAM (specifically, all PAM
stacks that include pam_limits.so).
It is my intention to change this system wide, i.e. for system
services (which do not go through PAM) too.
Lennart, what is the path forward here? Should we pull in some security
experts to give us recommendations on the best default value? Or are those
conversations already happening somewhere else? Also, do you need any more
information regarding the Wine esync use case, or has Zebediah provided
sufficient data?
It's being discussed in systemd upstream:
https://github.com/systemd/systemd/pull/10244
It needs another round of review, but looks like it'll be merged soon.

Zbyszek
Lennart Poettering
2018-10-15 17:32:57 UTC
Permalink
Post by Kamil Paral
Post by Lennart Poettering
Post by Anderson, Charles R
Post by Kamil Paral
It would be nice if somebody managed to find where this is patched in
Debian. Because I somewhat doubt that they made this change without a
proper discussion. And Debian is very much server oriented.
Can we not have the RPM package drop a file in /etc/security/limits.d
to set the limit only when that package is installed? That way it
only affects users of that package.
That only affects stuff that goes through PAM (specifically, all PAM
stacks that include pam_limits.so).
It is my intention to change this system wide, i.e. for system
services (which do not go through PAM) too.
Lennart, what is the path forward here? Should we pull in some security
experts to give us recommendations on the best default value? Or are those
conversations already happening somewhere else? Also, do you need any more
information regarding the Wine esync use case, or has Zebediah provided
sufficient data?
Please follow the current state of this here:

https://github.com/systemd/systemd/pull/10244

I have been discussing with some upstream kernel folks, and some more
obstacles showed up (specifically, I was advised that we really should
bump fs.file-max and fs.nr_open sysctls to their maximums these days,
as these limits are not really useful anymore given that fd memory is
properly tracked by memcg anyways these days), which I have now
covered in the PR above.
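
For reference, the two sysctls mentioned above, together with the current system-wide usage, can be read from /proc/sys/fs. This is only a sketch for inspecting a running Linux system; the helper names are illustrative, and on systems without these files the reads simply return None.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/fs")


def parse_ints(text):
    """Split whitespace-separated sysctl output into a list of integers."""
    return [int(field) for field in text.split()]


def read_fs_sysctl(name, base=SYSCTL_DIR):
    """Read /proc/sys/fs/<name>; return a list of ints, or None if unreadable."""
    try:
        return parse_ints((Path(base) / name).read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # file-max: system-wide ceiling on open file handles
    # nr_open:  upper bound for any per-process RLIMIT_NOFILE
    # file-nr:  allocated handles, unused handles, and file-max again
    for name in ("file-max", "nr_open", "file-nr"):
        print(f"fs.{name} = {read_fs_sysctl(name)}")
```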

This is waiting for review, but should enter systemd upstream soon,
and will then eventually trickle into Fedora.

Lennart

--
Lennart Poettering, Red Hat
Kamil Paral
2018-10-16 11:25:10 UTC
Permalink
Post by Kamil Paral
Post by Kamil Paral
Post by Lennart Poettering
Post by Anderson, Charles R
Post by Kamil Paral
It would be nice if somebody managed to find where this is patched in
Debian. Because I somewhat doubt that they made this change without a
proper discussion. And Debian is very much server oriented.
Can we not have the RPM package drop a file in /etc/security/limits.d
to set the limit only when that package is installed? That way it
only affects users of that package.
That only affects stuff that goes through PAM (specifically, all PAM
stacks that include pam_limits.so).
It is my intention to change this system wide, i.e. for system
services (which do not go through PAM) too.
Lennart, what is the path forward here? Should we pull in some security
experts to give us recommendations on the best default value? Or are those
conversations already happening somewhere else? Also, do you need any more
information regarding the Wine esync use case, or has Zebediah provided
sufficient data?
https://github.com/systemd/systemd/pull/10244
I have been discussing with some upstream kernel folks, and some more
obstacles showed up (specifically, I was advised that we really should
bump fs.file-max and fs.nr_open sysctls to their maximums these days,
as these limits are not really useful anymore given that fd memory is
properly tracked by memcg anyways these days), which I have now
covered in the PR above.
This is waiting for review, but should enter systemd upstream soon,
and will then eventually trickle into Fedora.
Zebediah, do you know of any outlier other than Google Earth VR for
which the newly proposed default limit of 256K wouldn't be sufficient?
Zebediah Figura
2018-10-18 01:35:43 UTC
Permalink
Post by Kamil Paral
On Tue, Oct 9, 2018 at 6:15 PM Lennart Poettering
Post by Lennart Poettering
Post by Anderson, Charles R
Post by Kamil Paral
It would be nice if somebody managed to find where this is patched in
Debian. Because I somewhat doubt that they made this change without a
proper discussion. And Debian is very much server oriented.
Can we not have the RPM package drop a file in /etc/security/limits.d
to set the limit only when that package is installed? That way it
only affects users of that package.
That only affects stuff that goes through PAM (specifically, all PAM
stacks that include pam_limits.so).
It is my intention to change this system wide, i.e. for system
services (which do not go through PAM) too.
Lennart, what is the path forward here? Should we pull in some security
experts to give us recommendations on the best default value? Or are those
conversations already happening somewhere else? Also, do you need any more
information regarding the Wine esync use case, or has Zebediah provided
sufficient data?
https://github.com/systemd/systemd/pull/10244
I have been discussing with some upstream kernel folks, and some more
obstacles showed up (specifically, I was advised that we really should
bump fs.file-max and fs.nr_open sysctls to their maximums these days,
as these limits are not really useful anymore given that fd memory is
properly tracked by memcg anyways these days), which I have now
covered in the PR above.
This is waiting for review, but should enter systemd upstream soon,
and will then eventually trickle into Fedora.
Zebediah, do you know of any outlier other than Google Earth VR for
which the newly proposed default limit of 256K wouldn't be sufficient?
I do not, no. That number came up pretty early in testing, and I think
we ended up leaving the limit at ~1M while testing other applications.
On the other hand, I know many users in the wild have used ~200k values,
and I haven't heard anyone specifically say that wasn't enough.

Security concerns aside, I'm all in favor of pushing the solution
farther upstream.
Lennart Poettering
2018-10-18 07:47:07 UTC
Permalink
Post by Zebediah Figura
I do not, no. That number came up pretty early in testing, and I think we
ended up leaving the limit at ~1M while testing other applications. On the
other hand, I know many users in the wild have used ~200k values, and I
haven't heard anyone specifically say that wasn't enough.
Security concerns aside, I'm all in favor of pushing the solution farther
upstream.
We merged the PR that bumps RLIMIT_NOFILE to 256K into systemd
upstream yesterday. It should trickle into Fedora and the other
distros as soon as we do the next release.
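
In case it helps with testing once this lands: assuming it is the hard limit that gets raised while the soft limit stays low (as described for Debian earlier in the thread), a program still has to lift its own soft limit to benefit. A minimal Python sketch of that step follows; the function name is illustrative, and 262144 is simply 256K written out, not a value copied from the PR.

```python
import resource


def raise_nofile_soft_limit(wanted):
    """Raise the soft RLIMIT_NOFILE towards `wanted`, never past the hard limit.

    Returns the (soft, hard) pair in effect afterwards. The hard limit is
    left untouched, so no privileges are needed.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited:
        return soft, hard  # nothing to raise
    ceiling = wanted if hard == unlimited else min(wanted, hard)
    if ceiling > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
        except (ValueError, OSError):
            pass  # the kernel refused; keep the existing limits
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_soft_limit(262144))
```

On a system that still has the old hard limit, the "after" line should simply stop at that hard limit instead of 262144.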

https://github.com/systemd/systemd/pull/10244

Lennart

--
Lennart Poettering, Red Hat
Kamil Paral
2018-10-18 09:34:31 UTC
Permalink
Post by Lennart Poettering
Post by Zebediah Figura
I do not, no. That number came up pretty early in testing, and I think we
ended up leaving the limit at ~1M while testing other applications. On the
other hand, I know many users in the wild have used ~200k values, and I
haven't heard anyone specifically say that wasn't enough.
Security concerns aside, I'm all in favor of pushing the solution farther
upstream.
We merged the PR that bumps RLIMIT_NOFILE to 256K into systemd
upstream yesterday. It should trickle into Fedora and the other
distros as soon as we do the next release.
https://github.com/systemd/systemd/pull/10244
Lennart, Zbyszek, would it be possible to backport the change also to F29's
systemd (F28 as well would be ideal), so that Wine esync/Steam Proton works
for people out of the box earlier than on F30?
Lennart Poettering
2018-10-18 09:51:57 UTC
Permalink
Post by Kamil Paral
Post by Lennart Poettering
We merged the PR that bumps RLIMIT_NOFILE to 256K into systemd
upstream yesterday. It should trickle into Fedora and the other
distros as soon as we do the next release.
https://github.com/systemd/systemd/pull/10244
Lennart, Zbyszek, would it be possible to backport the change also to F29's
systemd (F28 as well would be ideal), so that Wine esync/Steam Proton works
for people out of the box earlier than on F30?
Quite frankly I'd wait a bit so that people can test this first across
the various development distros, before we propagate this to stable
distros.

Lennart

--
Lennart Poettering, Red Hat
Zbigniew Jędrzejewski-Szmek
2018-10-18 13:10:59 UTC
Permalink
Post by Lennart Poettering
Post by Kamil Paral
Post by Lennart Poettering
We merged the PR that bumps RLIMIT_NOFILE to 256K into systemd
upstream yesterday. It should trickle into Fedora and the other
distros as soon as we do the next release.
https://github.com/systemd/systemd/pull/10244
Lennart, Zbyszek, would it be possible to backport the change also to F29's
systemd (F28 as well would be ideal), so that Wine esync/Steam Proton works
for people out of the box earlier than on F30?
Quite frankly I'd wait a bit so that people can test this first across
the various development distros, before we propagate this to stable
distros.
Yeah, this needs to be tested in rawhide first for a while before
being pushed to stable Fedora.

That said, I think we should be able to update systemd to v240 during
the lifetime of F29. We have been putting a lot of effort into
maintaining backwards compatibility in all changes being done in
systemd upstream, and I'm optimistic that we can make systemd 240
compatible enough to be updated in F29. This would be the first time.

Zbyszek
Daniel P. Berrangé
2018-10-09 09:10:56 UTC
Permalink
Post by John Reiser
Allowing 1M open files per unprivileged process is too many.
Megabytes of RAM are precious. A hard limit of 1M open files per process
allows each process to eat at least 256MB (1M * sizeof(struct file)
[linux/fs.h]) of RAM. If a single user is allowed 1000 processes,
then that's 256GB of RAM, which is a Denial-of-Service attack.
AFAICT, the TCP receive buffer size is about 200 KB per socket.

With the current nfile=4096, it looks like a single process
can already consume 200 KB * 4096 = ~800 MB of RAM just by
using TCP sockets.

IOW, does the nfiles limit make a real-world difference in
avoiding a memory DoS, if you can just pick a different DoS
attack vector instead?
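
The two back-of-the-envelope figures in this exchange can be reproduced with a few lines of Python. The 256-byte size for struct file is not stated outright; it is just what the 256MB-per-1M figure implies, and the 200 KB per socket is the rough estimate given above. Both are treated as assumptions here, and the function name is illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def worst_case_mib(n_fds, bytes_per_fd):
    """Kernel memory, in MiB, pinned by n_fds descriptors of a given cost."""
    return n_fds * bytes_per_fd / MIB


if __name__ == "__main__":
    STRUCT_FILE_BYTES = 256       # assumption implied by the 256MB figure
    TCP_RCVBUF_BYTES = 200 * KIB  # assumption: ~200 KB receive buffer per socket

    print(worst_case_mib(1_048_576, STRUCT_FILE_BYTES))  # plain files at a 1M limit
    print(worst_case_mib(4096, TCP_RCVBUF_BYTES))        # TCP sockets at nfile=4096
    print(worst_case_mib(1_048_576, TCP_RCVBUF_BYTES))   # TCP sockets at a 1M limit
```

The first two lines print 256.0 and 800.0 MiB, matching the figures quoted; the third, 204800.0 MiB (200 GiB), is the same socket estimate scaled to a 1M limit.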

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
John Reiser
2018-10-09 15:10:24 UTC
Permalink
Post by Daniel P. Berrangé
AFAICT, the TCP receive buffer size is about 200 KB per socket.
With the current nfile=4096, it looks like a single process
can already consume 200 KB * 4096 = ~800 MB of RAM just by
using TCP sockets.
IOW, does the nfiles limit make a real world difference to
avoiding memory DOS, if you can just pick a different DOS
attack vector instead?
Yes, it does. The attacker might not think of TCP.
Also, the usual successful TCP connection requires cooperation
from a "far" endpoint, which can be cumbersome to arrange.

Florian Weimer
2018-10-19 07:10:47 UTC
Permalink
Post by Kamil Paral
From a technical point of view I'm not able to judge whether raising
the fileno limits by default is a trivial change or something with
important security implications.
It has implications for reliability (and perhaps security). File
descriptors can refer to sockets, and each socket can have a fairly
large amount of unswappable kernel memory associated with it. This
memory is not tracked along with the process that created the sockets or
has them opened, so the OOM killer does not take it into account when
selecting processes to terminate.

The attached script, when run with “python3 many-sockets.py 50000” as a
regular user, after raising the limit, tricks the OOM killer into
terminating processes. Important processes such as systemd-journal fail
because the OOM killer cannot recover any memory. It even terminates
processes which are already fully swapped out.

I think a reasonable file descriptor limit is an important safety net.

Thanks,
Florian