Discussion:
Yet another fork
Rainer Weikusat
2020-06-03 22:08:11 UTC
Problem to be solved: A certain network accelerator card needs a user
mode driver running on Linux. For configuration, a script in a certain
vendor-specific programming language needs to be run before other
applications try to use the hardware.

The script can be executed once the driver process is ready to interface
with clients, which can be determined by calling the init function in a
loop until it returns a success status. All of this (starting the user
mode driver process and, possibly, another process loading the kernel
part of the driver) happens in a separate thread, while the initial
thread of the management process accepts connections from prospective
users of the hardware on a UNIX stream socket.

There's a proprietary interface library involved here which has to do
some cleanup tasks on process exit; presumably, this happens via atexit
handlers. But this no longer works if the init function was called
from a different thread than the one causing the process to terminate:
some SysV IPC shared memory segments otherwise hang around and have to
be removed manually before the user mode driver (process) can start
successfully again.

Something asynchronous would have done for the init here, but due to
time pressure, everything was to be kept as simple as possible. A forked
process couldn't have been used for the init because the driver process
needs to end up being a child of the management process.

Solution: Call the init routine from a process forked off by the init
thread and communicate the outcome back to the manager process via exit
status.

Of course, fork is alternatively "a totally useless adjunct to exec" or
"something nobody would ever need to use in a multithreaded process" ...
James K. Lowden
2020-06-04 00:07:17 UTC
On Wed, 03 Jun 2020 23:08:11 +0100
Post by Rainer Weikusat
Problem to be solved: A certain network accelerator card needs a user
mode driver running on Linux.
Not to be completely snarky (but maybe a little): isn't the solution a
kernel-mode driver, what we used to call a "device driver"?

I know you don't have control over all the parts. If it was all in the
kernel, though, would the problem have ever arisen?

--jkl
Scott Lurndal
2020-06-04 15:27:25 UTC
Post by James K. Lowden
On Wed, 03 Jun 2020 23:08:11 +0100
Post by Rainer Weikusat
Problem to be solved: A certain network accelerator card needs a user
mode driver running on Linux.
Not to be completely snarky (but maybe a little): isn't the solution a
kernel-mode driver, what we used to call a "device driver"?
There is some open-source data-plane software used for Software Defined
Networking (SDN) that typically runs in user mode. See, for example,
DPDK (Data Plane Development Kit) or ODP (OpenDataPlane). Both of these
expect user-mode to be able to control the networking hardware
without kernel intervention (which can be expensive when processing
small packets at 10 to 100Gbps line rates) by allowing user-mode software
to queue packets directly to the hardware.
William Ahern
2020-06-04 16:24:34 UTC
Post by Scott Lurndal
Post by James K. Lowden
On Wed, 03 Jun 2020 23:08:11 +0100
Post by Rainer Weikusat
Problem to be solved: A certain network accelerator card needs a user
mode driver running on Linux.
Not to be completely snarky (but maybe a little): isn't the solution a
kernel-mode driver, what we used to call a "device driver"?
There is some open-source data-plane software used for Software Defined
Networking (SDN) that typically runs in user mode. See, for example,
DPDK (Data Plane Development Kit) or ODP (OpenDataPlane). Both of these
expect user-mode to be able to control the networking hardware
without kernel intervention (which can be expensive when processing
small packets at 10 to 100Gbps line rates) by allowing user-mode software
to queue packets directly to the hardware.
DPDK didn't need to put so much complexity and privilege in user space. For
example, the netmap interface permits similar performance, but provides
proper kernel abstractions. http://info.iet.unipi.it/~luigi/netmap/

Intel marketed the h*ll out of DPDK as a way to push their controllers, and
unfortunately everybody took the bait.
Scott Lurndal
2020-06-04 17:03:15 UTC
Post by William Ahern
Post by Scott Lurndal
Post by James K. Lowden
On Wed, 03 Jun 2020 23:08:11 +0100
Post by Rainer Weikusat
Problem to be solved: A certain network accelerator card needs a user
mode driver running on Linux.
Not to be completely snarky (but maybe a little): isn't the solution a
kernel-mode driver, what we used to call a "device driver"?
There is some open-source data-plane software used for Software Defined
Networking (SDN) that typically runs in user mode. See, for example,
DPDK (Data Plane Development Kit) or ODP (OpenDataPlane). Both of these
expect user-mode to be able to control the networking hardware
without kernel intervention (which can be expensive when processing
small packets at 10 to 100Gbps line rates) by allowing user-mode software
to queue packets directly to the hardware.
DPDK didn't need to put so much complexity and privilege in user space. For
example, the netmap interface permits similar performance, but provides
proper kernel abstractions. http://info.iet.unipi.it/~luigi/netmap/
Intel marketed the h*ll out of DPDK as a way to push their controllers, and
unfortunately everybody took the bait.
Perhaps, but DPDK (and ODP) also support other processors/controllers, e.g.:

https://www.marvell.com/content/dam/marvell/en/public-collateral/embedded-processors/marvell-infrastructure-processors-octeon-tx2-cn92xx-cn96xx-cn98xx-product-brief-2020-02.pdf
Kaz Kylheku
2020-06-04 02:39:17 UTC
Post by Rainer Weikusat
Of course, fork is alternatively "a totally useless adjunct to exec" or
"something nobody would ever need to use in a multithreaded process" ...
I have developed the view that fork is the good thing, and threads are
the nonsense. Especially the POSIX threads which were badly bolted on to
the Unix process model---and this still keeps showing through in spite
of years of band-aiding with various ugly API extensions to address this
or that problem.

Make no mistake, I'm very good at threads. I was better at threads
twenty years ago than a lot of coders are today.

I'm just not inclined to use them in any side project where I'm the
boss, though.

Threads need to die. (Mostly; except maybe for that part whereby
breaking up inter-thread fights earns me a living.)

If threads are used, there should be few of them. Multiple activities
should be multiplexed onto threads with state machines and efficient
I/O polling mechanisms.

Threads should never be used as a poor, resource-intensive substitute
for:

- lexical closures
- continuations
- generators
- coroutines

Here is a dime, kids: get off my lawn and get yourself a better
programming language.
b***@nowhere.co.uk
2020-06-04 08:20:10 UTC
On Thu, 4 Jun 2020 02:39:17 +0000 (UTC)
Post by Kaz Kylheku
Post by Rainer Weikusat
Of course, fork is alternatively "a totally useless adjunct to exec" or
"something nobody would ever need to use in a multithreaded process" ...
I have developed the view that fork is the good thing, and threads are
the nonsense. Especially the POSIX threads which were badly bolted on to
Threading models spilled over from the Windows world, as Windows couldn't
(and Win32 still can't, though oddly the Windows kernel can) do proper
process control, and fork() was certainly a distant dream. So the mindset
of using threads for everything, regardless of whether it was appropriate,
spilled over into Unix.

The best (worst) example of this IMO is when Firefox switched from a
multi-process model to a multi-threaded one to make cross-compilation
easier, which meant that if any web page/tab crashed it would take out
every other browser window with it.
Post by Kaz Kylheku
Threads should never be used as a poor, resource-intensive substitute
- lexical closures
- continuations
- generators
- coroutines
Here is a dime, kids: get off my lawn and get yourself a better
programming language.
Threading is essentially just multiprocessing with very little memory protection.
Jorgen Grahn
2020-06-04 10:51:13 UTC
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 02:39:17 +0000 (UTC)
Post by Kaz Kylheku
Post by Rainer Weikusat
Of course, fork is alternatively "a totally useless adjunct to exec" or
"something nobody would ever need to use in a multithreaded process" ...
I have developed the view that fork is the good thing, and threads are
the nonsense. Especially the POSIX threads which were badly bolted on to
Threading models spilled over from the Windows world as windows couldn't (and
win32 still can't, though oddly the Windows kernel can) do proper process
control and certainly fork() was a distance dream. So the mindset of using
threads for everything regardless of whether it was appropriate spilled over
into Unix.
Not sure about that: the book about thread programming I have is from
1995 and seems to focus a lot on Solaris. Did Sun push for that model
for some reason? Of course, Microsoft also pushing didn't help.
Post by b***@nowhere.co.uk
Post by Kaz Kylheku
Threads should never be used as a poor, resource-intensive substitute
- lexical closures
- continuations
- generators
- coroutines
(Kept that because it's a good summary.)

Things are finally improving now, AFAICT. For the past five years
everyone in projects I'm in has tried to /avoid/ introducing threads,
rather than trying to add as many as possible.

/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
William Ahern
2020-06-04 11:58:36 UTC
Post by Jorgen Grahn
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 02:39:17 +0000 (UTC)
Post by Kaz Kylheku
Post by Rainer Weikusat
Of course, fork is alternatively "a totally useless adjunct to exec" or
"something nobody would ever need to use in a multithreaded process" ...
I have developed the view that fork is the good thing, and threads are
the nonsense. Especially the POSIX threads which were badly bolted on to
Threading models spilled over from the Windows world as windows couldn't (and
win32 still can't, though oddly the Windows kernel can) do proper process
control and certainly fork() was a distance dream. So the mindset of using
threads for everything regardless of whether it was appropriate spilled over
into Unix.
Not sure about that: the book about thread programming I have is from
1995 and seems to focus a lot on Solaris. Did Sun push for that model
for some reason? Of course, Microsoft also pushing didn't help.
I've always thought the popularization of threaded application architectures
in Unix, and especially the FOSS ecosystem, tracked the shift to C++ and
particularly the expansion of C++ GUI development, which early on seemed to
have been led by cross-over Windows programmers who knew no other way and
continue to believe there is no better way (I still can't fathom how anyone
could think posix_spawn reflects good interface design).

Of course, SunOS and other commercial vendors brought threads to Unix long
before that. And when Java became popular the necessity for good threading
implementations couldn't be ignored any longer by the BSDs and Linux, not
like when threaded architectures were largely limited to niche, proprietary
applications--relational databases, scientific packages, etc.

But the JVM is its own cloistered universe, both technologically and
socially, where shared memory multithreading makes more sense. I keep coming
back to Windows programmers bringing their GUI event loops with
task-oriented thread pools, C++ object models, and general Windows-rooted
mindset to the Unix world, driving a shift in preferences in the Unix
ecosystem.
Joe Pfeiffer
2020-06-04 14:30:27 UTC
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 02:39:17 +0000 (UTC)
Post by Kaz Kylheku
Post by Rainer Weikusat
Of course, fork is alternatively "a totally useless adjunct to exec" or
"something nobody would ever need to use in a multithreaded process" ...
I have developed the view that fork is the good thing, and threads are
the nonsense. Especially the POSIX threads which were badly bolted on to
Threading models spilled over from the Windows world as windows couldn't (and
win32 still can't, though oddly the Windows kernel can) do proper process
control and certainly fork() was a distance dream. So the mindset of using
threads for everything regardless of whether it was appropriate spilled over
into Unix.
This doesn't match my recollection. I remember threads coming from
research in multiprocessing without the cost of a full context switch (I
heard them called "light weight processes" years before I heard them
called "threads"). You also got a shared memory communication model for
free, since threads share an address space and have no protection from
each other (which is also their worst feature...).
Post by b***@nowhere.co.uk
The best (worst) example of this IMO is when Firefox switched from a multi
process model to multi threaded to make cross compilation easier, which meant
if any web page/tab crashed it would take out every other browser window
with it.
Post by Kaz Kylheku
Threads should never be used as a poor, resource-intensive substitute
- lexical closures
- continuations
- generators
- coroutines
Here is a dime, kids: get off my lawn and get yourself a better
programming language.
Threading is essentially just multiprocess with very little memory protection.
Yes.
b***@nowhere.co.uk
2020-06-04 14:40:05 UTC
On Thu, 04 Jun 2020 08:30:27 -0600
Post by Joe Pfeiffer
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 02:39:17 +0000 (UTC)
Post by Kaz Kylheku
Post by Rainer Weikusat
Of course, fork is alternatively "a totally useless adjunct to exec" or
"something nobody would ever need to use in a multithreaded process" ...
I have developed the view that fork is the good thing, and threads are
the nonsense. Especially the POSIX threads which were badly bolted on to
Threading models spilled over from the Windows world as windows couldn't (and
win32 still can't, though oddly the Windows kernel can) do proper process
control and certainly fork() was a distance dream. So the mindset of using
threads for everything regardless of whether it was appropriate spilled over
into Unix.
This doesn't match my recollection. I remember threads coming from
research in multiprocessing without the cost of a full context switch (I
I'm not claiming MS invented threads; obviously they didn't. But a critical
mass of developers encountered them programming for Windows, and they went
on to develop libraries, applications and methods of development that, as
someone else said, spilled over into the Unix world.
Scott Lurndal
2020-06-04 15:32:37 UTC
Post by Joe Pfeiffer
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 02:39:17 +0000 (UTC)
Post by Kaz Kylheku
Post by Rainer Weikusat
Of course, fork is alternatively "a totally useless adjunct to exec" or
"something nobody would ever need to use in a multithreaded process" ...
I have developed the view that fork is the good thing, and threads are
the nonsense. Especially the POSIX threads which were badly bolted on to
Threading models spilled over from the Windows world as windows couldn't (and
win32 still can't, though oddly the Windows kernel can) do proper process
control and certainly fork() was a distance dream. So the mindset of using
threads for everything regardless of whether it was appropriate spilled over
into Unix.
This doesn't match my recollection. I remember threads coming from
research in multiprocessing without the cost of a full context switch (I
heard them called "light weight processes" years before I heard them
called "threads"). You also got a shared memory communication model for
free, since threads share an address space and have no protection from
each other (which is also their worst feature...).
Indeed, Digital Unix was an early developer of threading in Unix, and
USL (Unix System Labs) fully implemented kernel threads (light-weight
processes, LWPs) in SVR4.2 ES/MP, long before Windows was anything more
than a rather limited DOS program. ES/MP supported an M:N thread model
where there could be M threads executing on N LWPs (rather complicated).

Linux, on the other hand, did not implement LWPs; they started with
user-level threads and gradually made 'clone' create something that
resembled unix light-weight processes.

The Digital Unix implementation (and implementers) were primary drivers
of the POSIX 1003.4 threading standards, IIRC.
Rainer Weikusat
2020-06-04 14:33:51 UTC
Post by Kaz Kylheku
Post by Rainer Weikusat
Of course, fork is alternatively "a totally useless adjunct to exec" or
"something nobody would ever need to use in a multithreaded process" ...
I have developed the view that fork is the good thing, and threads are
the nonsense. Especially the POSIX threads which were badly bolted on to
the Unix process model---and this still keeps showing through in spite
of years of band-aiding with various ugly API extensions to address this
or that problem.
I still want both, just with some sensible semantics where both are
useful, and not something like "nobody will ever need to do this" (which I
suspect to be a coded statement of intent rather than "something
technical").
James K. Lowden
2020-06-04 15:07:49 UTC
On Thu, 04 Jun 2020 15:33:51 +0100
Post by Rainer Weikusat
Post by Kaz Kylheku
I have developed the view that fork is the good thing, and threads
are the nonsense. Especially the POSIX threads which were badly
bolted on to the Unix process model---and this still keeps showing
through in spite of years of band-aiding with various ugly API
extensions to address this or that problem.
I still want both, just with some sensible semantics when both are
useful, and not something like "nobody will ever need to do this" (which I
suspect to be a coded statement of intent and not "something
technical").
Isn't the basic problem that threads break C memory semantics? Isn't
the basic problem even more basic: that no one has developed a
programming language whose semantics incorporate simultaneity? More
basic still: formal logic rests on sequential reasoning. No attempt at
parallel reasoning has yet been reduced to compilation.

In a single-threaded program, you know that the statement

a = 1;

means the memory address named "a" now has the value 1. In a
multithreaded program, you don't know that. By the time you get around
to referencing "a", it might have any value, depending on which thread
touched it last.

C has local variables that are inaccessible to other parts of the
program, whose scope is apparent from the text. It has file-scope
variables inaccessible to other modules, and explicit extern to denote
global (or at least multi-module) variables. The programmer reading
the text can reason about the variable's scope and the sequence of
actions that change its value.

Threads violate all that. Threads return us to the days before
multiprocessing, before virtual address spaces, when the whole of
memory was open to all comers, coordinated or not. Then we hire the
likes of Kaz to build fences and gates to bring order the chaos, as
though the chaos itself wasn't planned, or at least foreseeable.

--jkl
Rainer Weikusat
2020-06-04 18:34:36 UTC
Post by James K. Lowden
On Thu, 04 Jun 2020 15:33:51 +0100
Post by Rainer Weikusat
Post by Kaz Kylheku
I have developed the view that fork is the good thing, and threads
are the nonsense. Especially the POSIX threads which were badly
bolted on to the Unix process model---and this still keeps showing
through in spite of years of band-aiding with various ugly API
extensions to address this or that problem.
I still want both, just with some sensible semantics when both are
useful, and not something like "nobody will ever need to do this" (which I
suspect to be a coded statement of intent and not "something
technical").
Isn't the basic problem that threads break C memory semantics? Isn't
the basic problem even more basic: that no one has developed a
programming language whose semantics incorporate simultaneity? More
basic still: formal logic rests on sequential reasoning. No attempt at
parallel reasoning has yet been reduced to compilation.
In a single-threaded program, you know that the statement
a = 1;
means the memory address named "a" now has the value of 1. In a
multithreaded program, you don't know that. By the time you get around
to referencing "a", it might have any value, depending on which thread
touched it last.
In a single-threaded program, I don't necessarily know that, either:
some code may have been writing to "a" inadvertently through a corrupted
pointer. In a sufficiently complicated program with sufficiently chaotic
use of shared objects, there might even be different sets of functions
stepping on each other's toes because of different implicitly defined
"access/usage protocols" for shared objects.

Which gets us to "threads": multiple threads of execution using shared
memory for communication need to employ "some sort of protocol" ensuring
that accesses (both read and write) to this shared memory yield
deterministic results. This will usually involve disabling parallel
execution in a suitable way while these accesses happen, aka "locking".

I really don't see the problem here.
James Kuyper
2020-06-04 18:43:22 UTC
On 6/4/20 11:07 AM, James K. Lowden wrote:
...
Post by James K. Lowden
Isn't the basic problem that threads break C memory semantics? Isn't
the basic problem even more basic: that no one has developed a
programming language whose semantics incorporate simultaneity? More
basic still: formal logic rests on sequential reasoning. No attempt at
parallel reasoning has yet been reduced to compilation.
C2011 added not only <threads.h>, but also a whole bunch of new wording
addressing precisely the issues you're raising. "C memory semantics"
therefore are now supposed to be compatible with threads. If you believe
otherwise, then first familiarize yourself with what was added in C2011,
and then please identify how it falls short.

Note: I make no claim to being an expert in multi-threaded programming.
Until just a couple of years ago, I'd never worked on multi-threaded
code. While the project I'm currently working on does involve
multi-threaded code, the work I've done on this project has never
required me to worry about that fact. It's entirely possible that I'm
unaware of how badly C2011 addresses these issues.

However, the wording of your message suggests that you may know even
less than I do about the changes made in C2011. My apologies if I'm
wrong about that.
Post by James K. Lowden
In a single-threaded program, you know that the statement
a = 1;
means the memory address named "a" now has the value of 1. In a
multithreaded program, you don't know that. By the time you get around
to referencing "a", it might have any value, depending on which thread
touched it last.
C has local variables that are inaccessible to other parts of the
program, whose scope is apparent from the text. It has file-scope
variables inaccessible to other modules, and explicit extern to denote
global (or at least multi-module) variables. The programmer reading
the text can reason about the variable's scope and the sequence of
actions that change its value.
And, as of C2011, C now includes _Thread_local variables, and precise,
detailed explanations of how the different storage classes interact with
multi-threaded code.
b***@nowhere.co.uk
2020-06-05 08:27:50 UTC
On Thu, 4 Jun 2020 14:43:22 -0400
Post by James Kuyper
...
Post by James K. Lowden
Isn't the basic problem that threads break C memory semantics? Isn't
the basic problem even more basic: that no one has developed a
programming language whose semantics incorporate simultaneity? More
basic still: formal logic rests on sequential reasoning. No attempt at
parallel reasoning has yet been reduced to compilation.
C2011 added not only <threads.h>, but also a whole bunch of new wording
I suspect that, like a lot of other people doing unix/linux systems
programming these days, it's hard enough to keep up with all the kitchen
sinks being thrown into C++ every 3 years without also learning the
divergent updates in C. I spent a bit of time learning C99, found almost
all of the changes apart from compound literals and VLAs (already
supported in C++ anyway) profoundly useless, and didn't bother with C any
further.
Post by James Kuyper
addressing precisely the issues you're raising. "C memory semantics"
therefore are now supposed to be compatible with threads. If you believe
otherwise, then first familiarize yourself with what was added in C2011,
and then please identify how it falls short.
A language shouldn't need to worry about OS internals like threads. Otherwise
if you're going to make it thread aware why not make it process aware while
you're at it?
James K. Lowden
2020-06-05 19:59:27 UTC
On Thu, 4 Jun 2020 14:43:22 -0400
Post by James Kuyper
...
Post by James K. Lowden
Isn't the basic problem that threads break C memory semantics?
Isn't the basic problem even more basic: that no one has developed a
programming language whose semantics incorporate simultaneity? More
basic still: formal logic rests on sequential reasoning. No
attempt at parallel reasoning has yet been reduced to compilation.
C2011 added not only <threads.h>, but also a whole bunch of new
wording addressing precisely the issues you're raising. "C memory
semantics" therefore are now supposed to be compatible with threads.
If you believe otherwise, then first familiarize yourself with what
was added in C2011, and then please identify how it falls short.
The C standard defined memory semantics to match the execution
environment in which C programs are made to operate. In essence, they
declared broken to be the standard.
Post by James Kuyper
4 Two expression evaluations conflict if one of them modifies a
memory location and the other one reads or modifies the same memory
location.
That's not just punting; it's punting the ball out of the stadium. Any
introductory C text will include two lines of code operating on the
same variable. Now, suddenly, when f() writes to a global variable and
g() reads it, they "conflict".

Putting words on a page to warn the programmer that what was once
valid is now "conflict" might indeed, as you say, define C memory
semantics for multithreaded programming. But they haven't brought
threads under the control of the compiler. They've simply lowered the
bar enough to step over it.

You might say it's too much to ask C to regulate multithreaded
memory access, and I might agree (although not for the same reasons!).
My point was that threads broke C semantics. Even if C later revised
those semantics, it comically makes no attempt to identify
thread-introduced race conditions. The C programmer is at the mercy of
time, with no diagnostic from the C compiler.

Before everyone says, hey, what about X race condition, I get it. I
know two processes can interact with a file or a database or an I/O
port without any help from the compiler in reasoning about their
values. Threads are different because they bring that problem "in
house"; they undermine the very thing the program defines for itself
and claims to manage: memory. They make the "volatile" keyword a
superfluous anachronism, because all statically defined memory is
volatile.

The only programming language I'm aware of that addresses
multithreading coherently is Pike's Go. (I don't claim any expertise in
Go, only that it seems to have found a way out of the morass.) It
implements Hoare's Communicating Sequential Processes. And there you
have it: instead of reasoning about simultaneity, it reduces each
thread to a sequential process, and controls -- with the compiler --
where they intersect.

The rest of us are contending with threads in languages that don't
control them, whose memory semantics are undermined by them. At best
we have types and functions and libraries, well short of a compiler
that prevents two threads from "conflict".

--jkl
James Kuyper
2020-06-05 21:08:45 UTC
Post by James K. Lowden
On Thu, 4 Jun 2020 14:43:22 -0400
...
Post by James K. Lowden
Post by James Kuyper
C2011 added not only <threads.h>, but also a whole bunch of new
wording addressing precisely the issues you're raising. "C memory
semantics" therefore are now supposed to be compatible with threads.
If you believe otherwise, then first familiarize yourself with what
was added in C2011, and then please identify how it falls short.
The C standard defined memory semantics to match the execution
environment in which C programs are made to operate.
Yes, and as of C2011, that environment is a potentially multi-threaded one.
Post by James K. Lowden
... In essence, they
declared broken to be the standard.
Post by James Kuyper
4 Two expression evaluations conflict if one of them modifies a
memory location and the other one reads or modifies the same memory
location.
Note: the 4 that you included at the beginning of that paragraph
indicates that it is the fourth paragraph of the section it occurs in.
If you look backwards from that point, you will find that it is section
5.1.2.4. Therefore, one good way to cite that clause is as 5.1.2.4p4.
Some people use '/' rather than 'p' to separate the paragraph number
from the section number, and there are a couple of other conventions,
but you should, by some means, convey both the section number and the
paragraph number.
Post by James K. Lowden
That's not just punting; it's punting the ball out of the stadium. Any
introductory C text will include two lines of code operating on the
same variable. Now, suddenly, when f() writes to a global variable and
g() reads it, they "conflict".
The term "conflict" in that clause is in italics, an ISO convention
indicating that the sentence it occurs in constitutes the official
definition, within the context of the C standard, of the term "conflict".
That's all that clause does - define the term. Don't try to read
anything more into that sentence. That clause does not prohibit or
penalize code with conflicting expression evaluations. It doesn't say
that it has undefined behavior, or a constraint violation, or even
unspecified behavior.
What matters is what the rest of the standard says about conflicting
expression evaluations.

The only other place the standard uses the term "conflict" in normative
text is in 5.1.2.4p25:
"The execution of a program contains a _data race_ if it contains two
conflicting actions in different threads, at least one of which is not
atomic, and neither happens before the other. Any such data race results
in undefined behavior."

I've used underscores to indicate that the first occurrence of "data
race" in that paragraph is also italicized, making the first sentence of
that clause the official definition of that term, and that's all that it
is. It's the second sentence that actually indicates the possibility of
a problem.
"happens before" might seem to be imprecise wording, but it is in fact
another piece of C standard jargon. It is very precisely defined in
5.1.2.4p18.

I get the impression that your objection to the definition of
"conflict" is due to thinking that all conflicting expression
evaluations are problematic. In fact, the C standard only says that they
have undefined behavior if:
1. They occur in different threads.
2. At least one of them is not atomic.
3. Neither one is guaranteed to happen before the other, so it is
permissible for either one to occur first.

A significant fraction of the complexity of multi-threaded programming
appears (I have no personal experience with this) to involve finding
ways to make sure that at least one of those 3 requirements fails to
apply whenever there are two or more conflicting expression evaluations.
Post by b***@nowhere.co.uk
Putting words on a page to warn the programmer that what was once
valid is now "conflict" might indeed, as you say, define C memory
semantics for multithreaded programming. But they haven't brought
threads under the control of the compiler. They've simply lowered the
bar enough to step over it.
You might say that's too much to ask of C to regulate multithreaded
No, I don't - and the clauses cited above define precisely how it
regulates them.
Post by b***@nowhere.co.uk
memory access, and I might agree (although not for the same reasons!).
My point was that threads broke C semantics. Even if C later revised
those semantics, it comically makes no attempt to identify
thread-introduced race conditions.
Yes, it does - that's what 5.1.2.4p18 is about.
James Kuyper
2020-06-05 21:13:13 UTC
On 6/5/20 5:08 PM, James Kuyper wrote:
...
Post by James Kuyper
The only other place the standard uses the term "conflict" in normative
"The execution of a program contains a _data race_ if it contains two
conflicting actions in different threads, at least one of which is not
atomic, and neither happens before the other. Any such data race results
in undefined behavior."
...
Post by James Kuyper
Post by James K. Lowden
memory access, and I might agree (although not for the same reasons!).
My point was that threads broke C semantics. Even if C later revised
those semantics, it comically makes no attempt to identify
thread-introduced race conditions.
Yes, it does - that's what 5.1.2.4p18 is about.
Correction: 5.1.2.4p25.
James K. Lowden
2020-06-06 21:35:12 UTC
On Fri, 5 Jun 2020 17:08:45 -0400
Post by James Kuyper
Even if C later revised those semantics, it comically makes no
attempt to identify thread-introduced race conditions.
Yes, it does - that's what 5.1.2.4p18 is about.
...
Post by James Kuyper
Correction: 5.1.2.4p25.
James, first, thanks for your precision and patience.

Your explanation of how to read the standard is also helpful; I doubt
I'm the only one who benefits.

Second, I almost have to remind myself that my complaint isn't about C,
but about threads.

I don't need to remind you that valid C functions become invalid in the
presence of threads. No emendation of the C standard changes that.

Let us acknowledge that the C standard now describes with precision
what we've known about threads since Moses parted the first one. I see
how that helps the person writing the compiler to understand the
parameters of his problem domain. I don't see how it makes much
difference to the person using the compiler.

I joined this thread (as it were) asking if the problem with threads is
that they break C memory semantics. I think we have established they
did, but we can't say "broken" because C has redefined those semantics.

I further suggested the language does not exist whose memory semantics
match what threads present because we lack any formal foundation for
controlling threads as they now exist. We have no Turing machine or
lambda calculus that describes computation in a multithreaded
environment. The best we have, to my knowledge, are disciplines:
parallel computation over a shared store (SIMD, as someone mentioned),
and CSP.

That is, to put it mildly, a real problem.

Adopting threads without a language to express them and manage them (if
it exists) was pure folly. We brought upon ourselves, and indeed upon
society, a continuing, incalculable, almost unimaginable cost.

We have spent decades of accumulated effort to adapt the C standard
library and the C standard to the presence of threads. Not to mention
Posix. Not to mention programmer-centuries devoted to eradicating the
bugs they make possible or, as you say, "finding ways to make sure that at
least one of those 3 requirements fails to apply". That's human labor
coping with complexity -- intentionally introduced -- that could have
been avoided or might, perhaps, with the right language, have been
controlled by the compiler.

Madness.

--jkl
James Kuyper
2020-06-07 01:16:52 UTC
On 6/6/20 5:35 PM, James K. Lowden wrote:
...
Post by James K. Lowden
I don't need to remind you that valid C functions become invalid in the
presence of threads. No emendation of the C standard changes that.
Actually you do need to remind me of that. First of all, that's because
I have very little experience with multi-threading, but secondly because
what you say is in conflict with what little understanding I do have of
the subject.

Let's first be clear about what you're claiming. Are you saying that all
valid C functions do become invalid, or only that any C function might
become invalid - or something else?

My personal understanding is as follows:

"valid" isn't a term used in this context by the C standard, so let me
use a more precise way of saying what I think you mean by that. We start
with a single-threaded program. The C standard, the documentation for
the specific implementation of C that you are using, and the
documentation of all third party libraries that it links with can be
used to prove that the observable behavior should be what you want it to be.

Then somebody decides to modify the program to make it multi-threaded.
Any time that there's conflicting expression evaluations that occur in
different threads, care must be taken to ensure that no data races occur
between them. However, functions that don't involve any such evaluations
should continue to work correctly, just as they did in the single
threaded version of the code. Furthermore, in functions where a data
race could occur, if you take appropriate actions to make sure that the
data race doesn't occur, the function should still continue working
correctly. "taking appropriate actions" might, in some cases, be quite
difficult, but my impression is that it should usually be possible.

I freely admit that, due to my lack of relevant experience, that
understanding might be dead wrong - but if so, I would appreciate an
explanation about what's wrong with it.
Post by James K. Lowden
Let us acknowledge that the C standard now describes with precision
what we've known about threads since Moses parted the first one.
Are you referring to when he parted the Red C? :-)

Keep in mind that what the C standard says now about threads is stuff
that used to be outside of its domain. If you wrote C code to work with
POSIX threads, it was POSIX, not C, which provided the relevant
guarantees. And those guarantees might be very different, or only subtly
different, or more likely a mixture of both, from the ones provided
under Windows, or any other operating system that provided some way of
implementing multi-threaded code.

It's only because <threads.h> was added in C2011 that it became
necessary for C itself to say something about these issues.
James K. Lowden
2020-06-08 16:30:33 UTC
On Sat, 6 Jun 2020 21:16:52 -0400
Post by James Kuyper
Let's first be clear about what you're claiming. Are you saying that
all valid C functions do become invalid, or only that any C function
might become invalid - or something else?
The latter. Any C function not written to account for threads may be
invalid -- not reliably produce promised results -- in the presence of
threads. In particular, any C function that uses a static variable.
I'm thinking of readdir and strerror, for example.

I don't think that's controversial, or a deep insight. The
reproduction of so many Posix functions with _r variants is evidence
enough.

By definition, every one of those _r functions is more complex than the
one it replaces, because it requires the caller to manage the buffer
and pass the buffer as an argument. That, to me, is the tip of the
iceberg of the problems brought on by threads.
Post by James Kuyper
Keep in mind that what the C standard says now about threads is stuff
that used to be outside of its domain.
Quite so.

--jkl
Rainer Weikusat
2020-06-08 17:53:29 UTC
Post by James K. Lowden
Post by James Kuyper
Let's first be clear about what you're claiming. Are you saying that
all valid C functions do become invalid, or only that any C function
might become invalid - or something else?
The latter. Any C function not written to account for threads may be
invalid -- not reliably produce promised results -- in the presence of
threads. In particular, any C function that uses a static variable.
I'm thinking of readdir and strerror, for example.
I don't think that's controversial, or a deep insight. The
reproduction of so many Posix functions with _r variants is evidence
enough.
By definition, every one of those _r functions is more complex than the
one it replaces, because it requires the caller to manage the buffer
and pass the buffer as an argument. That, to me, is the tip of the
iceberg of the problems brought on by threads.
That's a self-inflicted wound: They could all be implemented using other
facilities for "thread-specific data" (which meanwhile exist).
James K. Lowden
2020-06-09 23:36:36 UTC
On Mon, 08 Jun 2020 18:53:29 +0100
Post by James K. Lowden
By definition, every one of those _r functions is more complex than
the one it replaces, because it requires the caller to manage the
buffer and pass the buffer as an argument. That, to me, is the tip
of the iceberg of the problems brought on by threads.
Well, yes. As Inspector Clouseau said, "That is what I have been
saying all along." :-)
They could all be implemented using other facilities for
"thread-specific data" (which meanwhile exist).
In 2020, on some platforms, yes. It's too bad that, when Posix was
inventing thread-safe functions, they didn't bother to look over their
shoulder at Microsoft's thread-local storage.

But that's only the most obvious ugliness introduced by threads. At
some level unavoidable, given the existence and adoption of threads,
and a C standard library that has to cope. And given a world in
which the OS, the compiler, and the library all dance to different
drummers. (The mixed metaphor is apt for the degree of coordination and
planning.)

The deeper problem is not just that C got harder to use in the presence
of threads. (I think even you'd admit that's true.) The deeper
problem is that *no* language ever solved the problem they introduced.
Some might claim to; I don't know. I have my doubts, because afaik
they'd lack any formal foundation. In any case, we don't use one.

--jkl
Keith Thompson
2020-06-08 19:07:27 UTC
Post by James K. Lowden
On Sat, 6 Jun 2020 21:16:52 -0400
Post by James Kuyper
Let's first be clear about what you're claiming. Are you saying that
all valid C functions do become invalid, or only that any C function
might become invalid - or something else?
The latter. Any C function not written to account for threads may be
invalid -- not reliably produce promised results -- in the presence of
threads. In particular, any C function that uses a static variable.
I'm thinking of readdir and strerror, for example.
Consider this:

int half(int n) { return n / 2; }

It's not written to account for threads, but it's still valid in
the presence of threads.

I don't think you meant to imply that *all* C functions can become
invalid in the presence of threads, but what you wrote above could
be interpreted that way.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
James K. Lowden
2020-06-09 23:36:33 UTC
On Mon, 08 Jun 2020 12:07:27 -0700
Post by Keith Thompson
Post by James K. Lowden
Any C function not written to account for threads may be
invalid -- not reliably produce promised results -- in the presence
of threads. In particular, any C function that uses a static
variable. I'm thinking of readdir and strerror, for example.
int half(int n) { return n / 2; }
It's not written to account for threads, but it's still valid in
the presence of threads.
I don't think you meant to imply that *all* C functions can become
invalid in the presence of threads, but what you wrote above could
be interpreted that way.
I'll cop to imprecision. I did say "any function may", which I think
the IETF would not interpret as "every function does". But I'm
open to a better version that is less subject to misinterpretation.

--jkl
James Kuyper
2020-06-08 22:26:24 UTC
Post by James K. Lowden
On Sat, 6 Jun 2020 21:16:52 -0400
Post by James Kuyper
Let's first be clear about what you're claiming. Are you saying that
all valid C functions do become invalid, or only that any C function
might become invalid - or something else?
The latter. Any C function not written to account for threads may be
invalid -- not reliably produce promised results -- in the presence of
threads. In particular, any C function that uses a static variable.
I'm thinking of readdir and strerror, for example.
That's a pretty weak statement, on a par with "Any C function might call
abort()". While perfectly true, it's only the functions that actually
call abort() (including indirectly via assert() or by a subroutine) that
you need to worry about abort()ing.
Many functions are self-contained in a way that makes it easy to confirm
that they will work in a multi-threaded environment, despite not having
been designed with such an environment in mind.

It is, of course, extremely common for C functions to not be
self-contained. In particular, it's common to access things through
pointers, and if a pointer is passed to a function that points at memory
that can also be accessed by a function running in a different thread,
such code can certainly be a problem. But, as I understand it (and
correct me if I'm wrong about this) - the calling routine can make calls
to such a routine safe by taking appropriate measures to ensure that all
conflicting expression evaluations involving such memory by code in
other threads are guaranteed to either happen before the start of the
function call, or to happen after the end of that call - for instance,
by correct use of a mutex.
James K. Lowden
2020-06-09 23:36:30 UTC
On Mon, 8 Jun 2020 18:26:24 -0400
Post by James Kuyper
Post by James K. Lowden
Any C function not written to account for threads may be
invalid -- not reliably produce promised results -- in the presence
of threads. In particular, any C function that uses a static
variable. I'm thinking of readdir and strerror, for example.
That's a pretty weak statement, on a par with "Any C function might
call abort()".
I'm not denying functions do surprising things. I'm saying many
functions stopped working as promised in a multithreaded environment.
Using the C standard library as an example, it was not enough to
re-write them; they had to be replaced with a different interface.
Post by James Kuyper
Many functions are self-contained in a way that makes it easy to
confirm that they will work in a multi-threaded environment, despite
not having been designed with such an environment in mind.
Yes. Having done battle with threads for decades, we know what to
watch for.
Post by James Kuyper
the calling routine can make calls to such a routine safe by taking
appropriate measures
Yes. And, in C, that's all you've got: your noggin and your time.
Thus it will ever be, it seems.

Here's what does not happen, not in C and not, afaik, in any
imperative language: the compiler identifies race conditions, and
enables the programmer to remove them by declaration.

Someone will say, wait, we've always had race conditions. it's taken
years just to identify sequence points, and global variables (not to
mention pointers) are always hairy. All true.

Yet and still, the C programmer in a single-threaded environment can,
in principle, start at main() and walk, step by sequential step, to the
end. He *knows* the entire sequence of the process, just by looking at
the text of the program. Static analysis will reveal variables set and
not used, used before set, and so on.

That same programmer, faced with a multithreaded environment, *cannot*
know the entire sequence of the process. Two threads operating over
the same data interact stochastically. The "appropriate measures"
you allude to are not (for the most part) part of any *language*;
they're library functions that coordinate threads. Do they thereby
coordinate access to the data? Maybe. Will the compiler guarantee
it? No, it's up to the programmer.

--jkl
James Kuyper
2020-06-10 04:01:24 UTC
Post by James K. Lowden
On Mon, 8 Jun 2020 18:26:24 -0400
...
Post by James K. Lowden
Post by James Kuyper
the calling routine can make calls to such a routine safe by taking
appropriate measures
...
Post by James K. Lowden
the same data interact stochastically. The "appropriate measures"
you allude to are not (for the most part) part of any *language*;
they're library functions that coordinate threads.
The C standard describes both a programming language (described in
section 6 of the C standard) and an associated standard library
(described in section 7). And, as of C2011, changes that were made to
support multi-threaded code occurred in both sections 6 and 7.
The non-library changes that were made include "... an improved memory
sequencing model, atomic objects, and thread-local storage ..." (Forward
p6).
Rainer Weikusat
2020-06-10 21:48:06 UTC
[...]
Post by James K. Lowden
Post by James Kuyper
Many functions are self-contained in a way that makes it easy to
confirm that they will work in a multi-threaded environment, despite
not having been designed with such an environment in mind.
Yes. Having done battle with threads for decades, we know what to
watch for.
Functions should usually be self-contained in the sense that they have
neither implicit inputs nor implicit outputs nor implicit, persistent
state.

If this is the case, expressions containing calls to a function can be
refactored in various ways without changing the semantics of the code.
Post by James K. Lowden
Post by James Kuyper
the calling routine can make calls to such a routine safe by taking
appropriate measures
Yes. And, in C, that's all you've got: your noggin and your time.
Thus it will ever be, it seems.
[...]
Post by James K. Lowden
Yet and still, the C programmer in a single-threaded environment can,
in principle, start at main() and walk, step by sequential step, to the
end. He *knows* the entire sequence of the process, just by looking at
the text of the program. Static analysis will reveal variables set and
not used, used before set, and so on.
This isn't generally true: It's possible to have race conditions
entirely without 'true' parallelism provided a program is driven by
external events, such as different kinds of inputs from the network,
which can happen in an unpredictable order.

I had to fix a couple of these in the past in single-threaded processes
structured around event loops.
b***@nowhere.co.uk
2020-06-11 09:24:12 UTC
On Wed, 10 Jun 2020 22:48:06 +0100
Post by Rainer Weikusat
[...]
Post by James K. Lowden
Post by James Kuyper
Many functions are self-contained in a way that makes it easy to
confirm that they will work in a multi-threaded environment, despite
not having been designed with such an environment in mind.
Yes. Having done battle with threads for decades, we know what to
watch for.
Functions should usually be self-contained in the sense that they have
neither implicit inputs nor implicit outputs nor implicit, persistent
state.
If this is the case, expressions containing calls to a function can be
refactored in various ways without changing the semantics of the code.
Post by James K. Lowden
Post by James Kuyper
the calling routine can make calls to such a routine safe by taking
appropriate measures
Yes. And, in C, that's all you've got: your noggin and your time.
Thus it will ever be, it seems.
[...]
Post by James K. Lowden
Yet and still, the C programmer in a single-threaded environment can,
in principle, start at main() and walk, step by sequential step, to the
end. He *knows* the entire sequence of the process, just by looking at
the text of the program. Static analysis will reveal variables set and
not used, used before set, and so on.
This isn't generally true: It's possible to have race conditions
entirely without 'true' parallelism provided a program is driven by
external events, such as different kinds of inputs from the network,
which can happen in an unpredictable order.
You can't have a race when there's only 1 thread. You're simply referring to
unforeseen code paths which are pretty standard bugs in any complex code.
Rainer Weikusat
2020-06-11 14:54:55 UTC
Post by b***@nowhere.co.uk
On Wed, 10 Jun 2020 22:48:06 +0100
Post by Rainer Weikusat
[...]
Post by James K. Lowden
Post by James Kuyper
Many functions are self-contained in a way that makes it easy to
confirm that they will work in a multi-threaded environment, despite
not having been designed with such an environment in mind.
Yes. Having done battle with threads for decades, we know what to
watch for.
Functions should usually be self-contained in the sense that they have
neither implicit inputs nor implicit outputs nor implicit, persistent
state.
If this is the case, expressions containing calls to a function can be
refactored in various ways without changing the semantics of the code.
Post by James K. Lowden
Post by James Kuyper
the calling routine can make calls to such a routine safe by taking
appropriate measures
Yes. And, in C, that's all you've got: your noggin and your time.
Thus it will ever be, it seems.
[...]
Post by James K. Lowden
Yet and still, the C programmer in a single-threaded environment can,
in principle, start at main() and walk, step by sequential step, to the
end. He *knows* the entire sequence of the process, just by looking at
the text of the program. Static analysis will reveal variables set and
not used, used before set, and so on.
This isn't generally true: It's possible to have race conditions
entirely without 'true' parallelism provided a program is driven by
external events, such as different kinds of inputs from the network,
which can happen in an unpredictable order.
You can't have a race when there's only 1 thread. You're simply referring to
unforeseen code paths which are pretty standard bugs in any complex code.
"Unforeseen code path" is a generic alias for "software error". I was
specifically referring to implicit ordering dependencies in different
code sections despite no order of execution being guaranteed, ie,
there's a code section A and a code section B. Which of the two will be
executed next depends on what kind of message is received from the network
next. But correct operation requires that A is always executed before B
(or B before A).

There's no way to determine what the order of execution will be by
examining the code.

If a definition of "race condition" exists which excludes this
situation, it's too narrow, as that's the exact same problem.
James Kuyper
2020-06-11 16:47:22 UTC
Post by Rainer Weikusat
Post by b***@nowhere.co.uk
On Wed, 10 Jun 2020 22:48:06 +0100
...
Post by Rainer Weikusat
Post by b***@nowhere.co.uk
Post by Rainer Weikusat
Post by James K. Lowden
Yet and still, the C programmer in a single-threaded environment can,
in principle, start at main() and walk, step by sequential step, to the
end. He *knows* the entire sequence of the process, just by looking at
the text of the program. Static analysis will reveal variables set and
not used, used before set, and so on.
No, you can't always be sure of the entire sequence of the process, even
if it's single-threaded code with no signal handlers.

When quoting the standard below, I indicate phrases that are in italics
by surrounding them with _underscores_. That's an ISO convention
indicating that the sentence in which the italicized phrase occurs
constitutes the official definition of that phrase.

In the C standard, section 5.1.2.3p3,
"_Sequenced before_ is a ... relation between evaluations executed by a
single thread ...". "If A is not sequenced before or after B, then A and
B are _unsequenced_. Evaluations A and B are _indeterminately sequenced_
when A is sequenced either before or after B, but it is unspecified which."

For example, in the expression "*p++ = *q++", the two ++ operations are
unsequenced (6.5p3), while in the expression A()+B(), the executions of
A() and B() are indeterminately sequenced (6.5.2.2p10). The same is true
of int array[2] = {A(), B()}; (6.7.9p23).

Unsequenced code can cause problems that do not come up with code that
is merely indeterminately sequenced:
"If a side effect on a scalar object is unsequenced relative to either a
different side effect on the same scalar object or a value computation
using the value of the same scalar object, the behavior is undefined."
(6.5p2).
Post by Rainer Weikusat
Post by b***@nowhere.co.uk
Post by Rainer Weikusat
This isn't generally true: It's possible to have race conditions
entirely without 'true' parallellism provided a program is driven by
external events, such as different kinds of inputs from the network,
which can happen in an unpredictable order.
You can't have a race when there's only 1 thread. You're simply refering to
unforseen code paths which are pretty standard bugs in any complex code.
"Unforeseen code path" is a generic alias for "software error". I was
specifically referring to implicit ordering dependencies in different
code sections despite no order of execution being guaranteed, ie,
there's a code section A and a code section B. Which of the two will be
executed next depends on what kind of message is received from the network
next. But correct operation requires that A is always executed before B
(or B before A).
There's no way to determine what the order of execution will be by
examining the code.
If a definition of "race condition" exists which excludes this
situation, it's too narrow, as that's the exact same problem.
The definition of "_inter-thread happens before_" (5.1.2.4p16) is more
complicated than the single-threaded concept of "sequenced before", but
it is essentially the corresponding concept.

The same-thread and different-thread concepts get merged together: "An
evaluation A _happens before_ an evaluation B if A is sequenced before B
or A inter-thread happens before B." (5.1.2.4p18).

"The execution of a program contains a _data race_ if it contains two
conflicting actions in different threads, at least one of which is not
atomic, and neither happens before the other. Any such data race results
in undefined behavior." (5.1.2.4p25)

Note that it's impossible in C to have what the C standard calls a data
race unless the code is multi-threaded. As a result, I think that
"happens before" in that clause could be replaced by the more specific
term, "inter-thread happens before", without change in meaning - I'm not
sure why they didn't do so.

However, in any case where the fact that conflicting actions occur in
the same thread is the only reason that they don't qualify as a data
race, then 6.5p2 applies instead. Either way, the behavior is undefined,
so 6.5p2 describes the single-threaded problem that corresponds to data
races in multi-threaded code.

However, the problem you describe can and should be dealt with using
code that is indeterminately sequenced, not unsequenced. As such, it's a
different kind of problem than that described by either 5.1.2.4p25 or
6.5p2. You might not be able force the desired sequence, but you are
free to write code that copes with things occurring in the wrong
sequence, something that's not the case if either 5.1.2.4p25 or 6.5p2
applies. The only thing you can do about those cases is prevent them
from happening.

For instance, depending upon the context, your problem might be solvable
by saving the information needed to perform each step, and deferring the
actual execution of each step until after any prerequisite steps have
been executed. That's not always possible, but your description of the
problem is sufficiently abstract to include cases where it is possible.
Rainer Weikusat
2020-06-11 21:40:56 UTC
Rainer Weikusat <***@talktalk.net> writes:

[...]
Post by Rainer Weikusat
"Unforeseen code path" is a generic alias for "software error". I was
specifically referring to implicit ordering dependencies in different
code sections despite no order of execution being guaranteed, ie,
there's a code section A and a code section B. Which of the two will be
executed next depends on what kind of message is received from the network
next. But correct operation requires that A is always executed before B
(or B before A).
There's no way to determine what the order of execution will be by
examining the code.
If a definition of "race condition" exists which excludes this
situation, it's too narrow, as that's the exact same problem.
Just to illustrate this a little. The output of the program below
depends on a race condition:

------
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int a = 1;

static void *ta(void *p)
{
usleep(20);

a += 2;
return NULL;
}

static void *tb(void *p)
{
usleep(20);

a -= 5;
return NULL;
}

int main(void)
{
pthread_t thr;

pthread_create(&thr, NULL, ta, NULL);
pthread_create(&thr, NULL, tb, NULL);

usleep(50);

printf("%d\n", a);

return 0;
}
------

The exact same thing can happen if ta and tb don't run in different
threads but are executed in response to different inputs with an
unpredictable ordering.
James Kuyper
2020-06-12 05:10:43 UTC
Post by Rainer Weikusat
[...]
Post by Rainer Weikusat
"Unforeseen code path" is a generic alias for "software error". I was
specifically referring to implicit ordering dependencies in different
code sections despite no order of execution being guaranteed, ie,
there's a code section A and a code section B. Which of the two will be
executed next depends on what kind of message is received from the network
next. But correct operation requires that A is always executed before B
(or B before A).
There's no way to determine what the order of execution will be by
examining the code.
If a definition of "race condition" exists which excludes this
situation, it's too narrow, as that's the exact same problem.
Just to illustrate this a little. The output of the program below
------
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
static int a = 1;
static void *ta(void *p)
{
usleep(20);
a += 2;
return NULL;
}
static void *tb(void *p)
{
usleep(20);
a -= 5;
return NULL;
}
int main(void)
{
pthread_t thr;
pthread_create(&thr, NULL, ta, NULL);
pthread_create(&thr, NULL, tb, NULL);
usleep(50);
printf("%d\n", a);
return 0;
}
------
The exact same thing can happen if ta and tb don't run in different
threads but are executed in response to different inputs with an
unpredictable ordering.
I'm afraid I don't follow that - which might mean I'm missing something.

Because ta() and tb() run in different threads in the same program, and
a is not atomic, and nothing is done to make sure that one of them is
happens before the other, the behavior is undefined, so absolutely
nothing is guaranteed about the behavior of this program.

If ta() and tb() were both called once in single-threaded code in
response to different inputs with an unpredictable order, then after the
first call is completed, but before the second one has started, a is
guaranteed to have either the value 3 or the value -4. After both calls
have completed, a is guaranteed to end up with a value of -2.

Those don't sound like the same kind of problem to me. What am I missing?
Rainer Weikusat
2020-06-12 14:48:27 UTC
Post by James Kuyper
Post by Rainer Weikusat
[...]
Post by Rainer Weikusat
"Unforeseen code path" is a generic alias for "software error". I was
specifically referring to implicit ordering dependencies in different
code sections despite no order of execution being guaranteed,
[...]
Post by James Kuyper
Post by Rainer Weikusat
Just to illustrate this a little. The output of the program below
------
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
static int a = 1;
static void *ta(void *p)
{
usleep(20);
a += 2;
return NULL;
}
static void *tb(void *p)
{
usleep(20);
a -= 5;
return NULL;
}
int main(void)
{
pthread_t thr;
pthread_create(&thr, NULL, ta, NULL);
pthread_create(&thr, NULL, tb, NULL);
usleep(50);
printf("%d\n", a);
return 0;
}
------
The exact same thing can happen if ta and tb don't run in different
threads but are executed in response to different inputs with an
unpredictable ordering.
I'm afraid I don't follow that - which might mean I'm missing something.
Because ta() and tb() run in different threads in the same program, and
a is not atomic, and nothing is done to make sure that one of them
happens before the other, the behavior is undefined, so absolutely
nothing is guaranteed about the behavior of this program.
Some C standard makes no demands about this. But that's an aside which
doesn't really matter here. There are three threads of execution
involved here and the output of the program varies depending on the
order of read and write accesses to a. That's what makes this a "race
condition": all three threads are racing with each other, and the outcome
of the race will be reflected in the output.
Post by James Kuyper
If ta() and tb() were both called once in single-threaded code in
response to different inputs with an unpredictable order, then after the
first call is completed, but before the second one has started, a is
guaranteed to have either the value 3 or the value -4. After both calls
have completed, a is guaranteed to end up with a value of -2.
That's a deficiency of my example. Something more convoluted could be

static int a = 1, b, c;

static void *ta(void *p)
{
usleep(20);

if (b) c = 3;
a = 0;

return NULL;
}

static void *tb(void *p)
{
usleep(20);

b = 1;
if (a) c = 17;

return NULL;
}

and print the value of c which will either be 0, 3 or 17. In absence of
threads, the possible outcomes are

only ta called => 0
only tb called => 17
ta before tb => 0
tb before ta => 3

That's the same set of outcomes because it's the same race.

The actual code was obviously much more complicated than this. The
situation was roughly that the code would only work as intended if a
message A was either not received or received before a message B but not
if A was received after B.

Someone who can read the code could have predicted the possibility of
this, in exactly the same way someone looking at multithreaded code can
predict which outcomes might happen, but it had remained a more or less
theoretical possibility, not a (deterministic) "it will" or "it won't".
James Kuyper
2020-06-12 16:46:54 UTC
Permalink
...
Post by Rainer Weikusat
Post by James Kuyper
Post by Rainer Weikusat
Just to illustrate this a little. The output of the program below
------
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
static int a = 1;
static void *ta(void *p)
{
usleep(20);
a += 2;
return NULL;
}
static void *tb(void *p)
{
usleep(20);
a -= 5;
return NULL;
}
int main(void)
{
pthread_t thr;
pthread_create(&thr, NULL, ta, NULL);
pthread_create(&thr, NULL, tb, NULL);
usleep(50);
printf("%d\n", a);
return 0;
}
------
The exact same thing can happen if ta and tb don't run in different
threads but are executed in response to different inputs with an
unpredictable ordering.
I'm afraid I don't follow that - which might mean I'm missing something.
Because ta() and tb() run in different threads in the same program, and
a is not atomic, and nothing is done to make sure that one of them
happens before the other, the behavior is undefined, so absolutely
nothing is guaranteed about the behavior of this program.
Some C standard makes no demands about this. But that's an aside which
doesn't really matter here.
It matters a great deal to me. Undefined behavior is the worst possible
case, far worse than having only a finite number of different outcomes
that all have a non-negligible chance of occurring.
Post by Rainer Weikusat
... There are three threads of execution
involved here and the output of the program varies depending on the
order of read and write accesses to a. That's what this makes "a race
condition": All three threads are racing with each other and outcome of
the race will be reflected in the output.
If there is any, which isn't guaranteed. Neither are there any
requirements constraining what that output would be.
Post by Rainer Weikusat
Post by James Kuyper
If ta() and tb() were both called once in single-threaded code in
response to different inputs with an unpredictable order, then after the
first call is completed, but before the second one has started, a is
guaranteed to have either the value 3 or the value -4. After both calls
have completed, a is guaranteed to end up with a value of -2.
That's a deficiency of my example. Something more convoluted could be
static int a = 1, b, c;
static void *ta(void *p)
{
usleep(20);
if (b) c = 3;
a = 0;
return NULL;
}
static void *tb(void *p)
{
usleep(20);
b = 1;
if (a) c = 17;
return NULL;
}
and print the value of c which will either be 0, 3 or 17. In absence of
threads, the possible outcomes are
only ta called => 0
only tb called => 17
ta before tb => 0
tb before ta => 3
That's the same set of outcomes because it's the same race.
No, the set of outcomes for the threaded case also includes -1234567890,
586.736, or "You don't understand what undefined behavior means." There
is, in fact, no describable outcome that isn't permitted. Most of those
possibilities are vanishingly unlikely (they'd have to be, since there's
infinitely many of them), but some outcomes that don't match any of the
three you've described have a non-negligible chance of occurring. From
what I've heard, there are machines where the accesses to a or b could
interfere with each other, so that the actual results won't necessarily
be consistent with any of the orders permitted in the unthreaded case -
for instance, on platforms where the memory bus width is less than the
number of bits in an 'int', int operations might require multiple accesses,
which might be interleaved between ta() and tb(). An implementation is
required to prevent such interleaving in the single-threaded case, but
not in the multi-threaded case. Since all of the initial, intermediate,
and final values of both a and b in your example are small positive
integers, that wouldn't be problematic in this particular case - but in
general, it would be.
Rainer Weikusat
2020-06-12 17:26:50 UTC
Permalink
[...]
Post by James Kuyper
Post by Rainer Weikusat
Some C standard makes no demands about this. But that's an aside which
doesn't really matter here.
It matters a great deal to me. Undefined behavior is the worst possible
case, far worse than having only a finite number of different outcomes
that all have a non-negligible chance of occurring.
This is not a discussion about the (somewhat) formal definition of the C
language, hence, the "abstract machine" employed to describe the
semantics of that doesn't matter. It's about race conditions occurring
in actual code executing on a real computer.
Post by James Kuyper
Post by Rainer Weikusat
... There are three threads of execution
involved here and the output of the program varies depending on the
order of read and write accesses to a. That's what this makes "a race
condition": All three threads are racing with each other and outcome of
the race will be reflected in the output.
If there is any, which isn't guaranteed. Neither are there any
requirements constraining what that output would be.
If you are convinced that race conditions don't really exist - as this
text suggests - why do you claim to be worried about them? After all,
it's just more behaviour some C standard leaves undefined and it's not
more undefined than any of the other.
Post by James Kuyper
Post by Rainer Weikusat
Post by James Kuyper
If ta() and tb() were both called once in single-threaded code in
response to different inputs with an unpredictable order, then after the
first call is completed, but before the second one has started, a is
guaranteed to have either the value 3 or the value -4. After both calls
have completed, a is guaranteed to end up with a value of -2.
That's a deficiency of my example. Something more convoluted could be
static int a = 1, b, c;
static void *ta(void *p)
{
usleep(20);
if (b) c = 3;
a = 0;
return NULL;
}
static void *tb(void *p)
{
usleep(20);
b = 1;
if (a) c = 17;
return NULL;
}
and print the value of c which will either be 0, 3 or 17. In absence of
threads, the possible outcomes are
only ta called => 0
only tb called => 17
ta before tb => 0
tb before ta => 3
That's the same set of outcomes because it's the same race.
No, the set of outcomes for the threaded case also includes -1234567890,
586.736, or "You don't understand what undefined behavior means."
I do understand what "undefined behaviour" means, namely "the C standard
doesn't require anything about this situation". This, in turn, means the
C standard is (intentionally) useless as source of information about
this behaviour. And nothing else.

But that's -- see above -- still entirely beside the point.
James Kuyper
2020-06-12 17:45:51 UTC
Permalink
Post by Rainer Weikusat
[...]
Post by James Kuyper
Post by Rainer Weikusat
Some C standard makes no demands about this. But that's an aside which
doesn't really matter here.
It matters a great deal to me. Undefined behavior is the worst possible
case, far worse than having only a finite number of different outcomes
that all have a non-negligible chance of occurring.
This is not a discussion about the (somewhat) formal definition of the C
language, hence, the "abstract machine" employed to describe the
semantics of that doesn't matter. It's about race conditions occurring
in actual code executing on a real computer.
And actual code executing on real computers can and sometimes does
behave in ways that are permitted by the C standard that don't fit your
model.
Post by Rainer Weikusat
Post by James Kuyper
Post by Rainer Weikusat
... There are three threads of execution
involved here and the output of the program varies depending on the
order of read and write accesses to a. That's what this makes "a race
condition": All three threads are racing with each other and outcome of
the race will be reflected in the output.
If there is any, which isn't guaranteed. Neither are there any
requirements constraining what that output would be.
If you are convinced that race conditions don't really exist - as this
text suggests - why do you claim to be worried about them?
I am very concerned about the fact that data races (as that term is
defined in the C standard) can exist, and if they do exist, have
undefined behavior, and that real implementations of C have behavior in
such cases that is only allowed because the behavior is undefined.
Undefined behavior is precisely what I was referring to in that
paragraph. I can't imagine what would lead you to interpret that as a
claim that race conditions don't exist.
Post by Rainer Weikusat
... After all,
it's just more behaviour some C standard leaves undefined and it's not
more undefined that any of the other.
All undefined behavior is deserving of top attention - unless the
behavior that the C standard leaves undefined is defined by some other
applicable document. Do you know of any? If so, which document, and what
does it say about such issues?
Post by Rainer Weikusat
Post by James Kuyper
Post by Rainer Weikusat
Post by James Kuyper
If ta() and tb() were both called once in single-threaded code in
response to different inputs with an unpredictable order, then after the
first call is completed, but before the second one has started, a is
guaranteed to have either the value 3 or the value -4. After both calls
have completed, a is guaranteed to end up with a value of -2.
That's a deficiency of my example. Something more convoluted could be
static int a = 1, b, c;
static void *ta(void *p)
{
usleep(20);
if (b) c = 3;
a = 0;
return NULL;
}
static void *tb(void *p)
{
usleep(20);
b = 1;
if (a) c = 17;
return NULL;
}
and print the value of c which will either be 0, 3 or 17. In absence of
threads, the possible outcomes are
only ta called => 0
only tb called => 17
ta before tb => 0
tb before ta => 3
That's the same set of outcomes because it's the same race.
No, the set of outcomes for the threaded case also includes -1234567890,
586.736, or "You don't understand what undefined behavior means."
I do understand what "undefined behaviour" means, namely "the C standard
doesn't require anything about this situation". This, in turn, means the
C standard is (intentionally) useless as source of information about
this behaviour. And nothing else.
Yes - which is why you should not write code for which that is true.
Post by Rainer Weikusat
But that's -- see above -- still entirely besides the point.
It's very much my point that you shouldn't write code that has undefined
behavior, whereas the behavior of code that isn't undefined is very much
worth talking about. In particular, there are ways to deal with the
unpredictability of the order in which inputs are received. The best
ways to do so depend upon what precisely the sequencing issue is - if
you fleshed out your example a bit more, I (or someone else) might
suggest an alternative.
There's nothing useful to be done about data races, as that term is
defined in the C standard, except to make sure that they don't occur.
Nicolas George
2020-06-12 18:00:08 UTC
Permalink
Post by James Kuyper
All undefined behavior is deserving of top attention - unless the
behavior that the C standard leaves undefined is defined by some other
applicable document. Do you know of any? If so, which document, and what
does it say about such issues?
I mostly agree with the points you make in this discussion, but I want to
address this one.

There is a document: the source code of the compiler you will be using. Of
course that's not a single document: there are as many as there are
compilers, so we can't make a universal decision based on that, but we are
already in that case for implementation-defined behaviors.

My point is that unless you are in the process of implementing a C compiler,
in which case "undefined behavior" means "do whatever you want", when you
write C code, "undefined behavior" means "implementation-defined but we
won't tell what", i.e. RTFS instead of RTFM and expect it to be the opposite
of what you want.
James Kuyper
2020-06-12 19:34:03 UTC
Permalink
Post by Nicolas George
Post by James Kuyper
All undefined behavior is deserving of top attention - unless the
behavior that the C standard leaves undefined is defined by some other
applicable document. Do you know of any? If so, which document, and what
does it say about such issues?
I mostly agree with the points you make in this discussion, but I want to
address this one.
There is a document: the source code of the compiler you will be using. Of
That's not the kind of document I'm talking about. That tells you what
the compiler actually does (and it can be remarkably difficult to figure
out what it's telling you). I'm talking about a document that makes a
promise to the users of the compiler about what it does (such a document
should be far more readable than the source code). An implementation
that provides such a document to the users should not change any feature
promised by that document without warning - at the very least, the
implementors should issue a new version of the document when it delivers
a new version of the implementation that changes things described in
that document. An implementation can change any feature not mentioned
in the document any time it pleases.
Post by Nicolas George
My point is that unless you are in the process of implementing a C compiler,
in which case "undefined behavior" means "do whatever you want", when you
write C code, "undefined behavior" means "implementation-defined but we
won't tell what", i.e. RTFS instead of RTFM and expect it to be the opposite
of what you want.
No, if an implementation makes promises about the behavior when the C
standard does not, that's exactly in the same category as the
implementation's promise to provide a conforming implementation of C. If
it makes such a promise, you have a reasonable expectation that they
will fulfill it. For any assumptions you make about features not
promised, the risk is entirely yours, and that's equally true of either
of those promises.
Nicolas George
2020-06-13 10:09:08 UTC
Permalink
Post by James Kuyper
That's not the kind of document I'm talking about. That tells you what
I know. But the source code of a compiler is still a document and it tells
you absolutely everything there is to know about the output of this
compiler. That was my point.
Keith Thompson
2020-06-13 10:20:08 UTC
Permalink
Post by Nicolas George
Post by James Kuyper
That's not the kind of document I'm talking about. That tells you what
I know. But the source code of a compiler is still a document and it tells
you absolutely everything there is to know about the output of this
compiler. That was my point.
It tells you (if and only if you read and understand all of it;
it's likely the authors don't know it that well) the output of
a particular version of a compiler. There's likely to be a new
version next week, with different output.

The actual documentation tells you what you can actually rely on.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
Nicolas George
2020-06-13 12:42:50 UTC
Permalink
Keith Thompson, in the message
Post by Keith Thompson
It tells you (if and only if you read and understand all of it;
it's likely the authors don't know it that well) the output of
a particular version of a compiler. There's likely to be a new
version next week, with different output.
Indeed, a new version of a compiler is really a new compiler, and therefore
can make different choices about undefined behaviors. But it can also make
different choices about implementation-defined behaviors.
Kaz Kylheku
2020-06-14 14:25:20 UTC
Permalink
Post by Nicolas George
Keith Thompson, in the message
Post by Keith Thompson
It tells you (if and only if you read and understand all of it;
it's likely the authors don't know it that well) the output of
a particular version of a compiler. There's likely to be a new
version next week, with different output.
Indeed, a new version of a compiler is really a new compiler, and therefore
can make different choices about undefined behaviors.
The only undefined behaviors in the compiler source will be those that
*it* invokes in the language it is written in, and those that its output
potentially invokes in the target language.

But yes, of course, the choices of those can change.
--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
James Kuyper
2020-06-15 15:25:05 UTC
Permalink
Post by Nicolas George
Post by James Kuyper
That's not the kind of document I'm talking about. That tells you what
I know. But the source code of a compiler is still a document and it tells
you absolutely everything there is to know about the output of this
compiler. That was my point.
No, it doesn't. It doesn't necessarily contain a single word about which
features of the implementation users can rely on, and which ones they
should not. This matters if you want your unmodified code to be
translated and executed correctly with any version of the implementation
later than the current one.

If it does say anything about such issues, it does so only in the
comments, and comments have a tendency to fall out of sync with the
reality they try to describe. Since we're talking about comments about
future plans of the implementors, falling even a little bit out of sync
renders them nearly useless.
Nicolas George
2020-06-15 15:45:27 UTC
Permalink
Post by James Kuyper
No, it doesn't. It doesn't necessarily contain a single word about which
features of the implementation users can rely on, and which ones they
should not. This matters if you want your unmodified code to be
You are obviously missing my point: the source code itself of the very
compiler you are using tells you all there is to know about it, because
there is nothing else.
Post by James Kuyper
translated and executed correctly with any version of the implementation
later than the current one.
... then it is another compiler, not the same one. It has another source
code, which tells you something else. But it tells you everything.
James Kuyper
2020-06-15 17:36:56 UTC
Permalink
Post by Nicolas George
Post by James Kuyper
No, it doesn't. It doesn't necessarily contain a single word about which
features of the implementation users can rely on, and which ones they
should not. This matters if you want your unmodified code to be
You are obviously missing my point: the source code itself of the very
compiler you are using tells you all there is to know about it, because
there is nothing else.
You miss my point - there IS something else - the promises of the
implementors. The documents I'm talking about distinguish between the
features that are unlikely to be changed, and even more unlikely to be
changed without notice (because they're mentioned in those documents)
from those that could easily disappear in the next version of the
compiler without any notice at all (because they're not mentioned in
those documents). Where in the current source code can you find that
information? You can find it in the current documentation.

There's also the point that the documentation I'm talking about is a lot
easier to read than the compiler's source code.
Post by Nicolas George
Post by James Kuyper
translated and executed correctly with any version of the implementation
later than the current one.
... then it is another compiler, not the same one. ...
But nonetheless, a real one.
Post by Nicolas George
... It has another source
code, which tells you something else. But it tells you everything.
Yes, but it's kind of hard to figure these things out by reading that
source code, since it hasn't been written yet - much harder than reading
the current source code, which is already too hard.
Nicolas George
2020-06-15 19:18:49 UTC
Permalink
Post by James Kuyper
You miss my point - there IS something else - the promises of the
implementors.
I got that very well. You are still missing the point: the promises of the
implementors are for other versions of the compiler. The output of this very
version will stay the same.
Post by James Kuyper
But nonetheless, a real one.
A real one, but another, that nobody can force you to use.

Remember when Linux could only be compiled with gcc 2.95?
James Kuyper
2020-06-15 19:58:46 UTC
Permalink
Post by Nicolas George
Post by James Kuyper
You miss my point - there IS something else - the promises of the
implementors.
I got that very well. You are still missing the point: the promises of the
implementors are for other versions of the compiler. The output of this very
version will stay the same.
So, what does that matter? The promises are still important useful
information that should properly be taken into consideration for any
code that is intended to continue in use long enough for those promises
to matter. Your code might not be useful enough to bother keeping it
around for any significant amount of time, but mine certainly is.

Please note that, as a practical matter, it could easily take very much
longer to read the source code for the current version of the compiler
(even if it is available), and figure out all of the implications, than
it would for the information extracted by that method to become obsolete
by reason of the next version of the compiler coming out. That, in turn,
would be very much longer than the time needed to read and comprehend
properly-written documentation.
Post by Nicolas George
Post by James Kuyper
But nonetheless, a real one.
A real one, but another, that nobody can force you to use.
That depends entirely upon your contractual obligations to your
customers - I've certainly worked on contracts where we were required to
periodically test our code and update it as necessary to compile and
continue executing correctly with the latest versions of the OS,
compiler, and each of the libraries we linked to. Knowing in advance how
to write our code to make those tests likely to succeed, even for
updates that haven't come out yet, has been useful.
Nicolas George
2020-06-15 22:16:35 UTC
Permalink
Post by James Kuyper
So, what does that matter? The promises are still important useful
information that should properly be taken into consideration for any
code that is intended to continue in use long enough for those promises
to matter.
Exactly: you have to add a few hypotheses to make the problem of undefined
behaviors relevant.
Post by James Kuyper
Please note that, as a practical matter, it could easily take very much
longer to read the source code for the current version of the compiler
There is a very simple way to determine what a compiler will do with a
certain piece of code.
Keith Thompson
2020-06-15 19:21:21 UTC
Permalink
Post by Nicolas George
Post by James Kuyper
No, it doesn't. It doesn't necessarily contain a single word about which
features of the implementation users can rely on, and which ones they
should not. This matters if you want your unmodified code to be
You are obviously missing my point: the source code itself of the very
compiler you are using tells you all there is to know about it, because
there is nothing else.
Only if the source code is written in a language that has no unspecified
behavior.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
Rainer Weikusat
2020-06-12 18:02:13 UTC
Permalink
Post by James Kuyper
Post by Rainer Weikusat
[...]
Post by James Kuyper
Post by Rainer Weikusat
Some C standard makes no demands about this. But that's an aside which
doesn't really matter here.
It matters a great deal to me. Undefined behavior is the worst possible
case, far worse than having only a finite number of different outcomes
that all have a non-negligible chance of occurring.
This is not a discussion about the (somewhat) formal definition of the C
language, hence, the "abstract machine" employed to describe the
semantics of that doesn't matter. It's about race conditions occurring
in actual code executing on a real computer.
And actual code executing on real computers can and sometimes does
behave in ways that are permitted by the C standard that don't fit your
model.
So what?
Rainer Weikusat
2020-06-12 18:04:30 UTC
Permalink
Post by James Kuyper
Post by Rainer Weikusat
[...]
Post by James Kuyper
Post by Rainer Weikusat
Some C standard makes no demands about this. But that's an aside which
doesn't really matter here.
It matters a great deal to me. Undefined behavior is the worst possible
case, far worse than having only a finite number of different outcomes
that all have a non-negligible chance of occurring.
This is not a discussion about the (somewhat) formal definition of the C
language, hence, the "abstract machine" employed to describe the
semantics of that doesn't matter. It's about race conditions occurring
in actual code executing on a real computer.
And actual code executing on real computers can and sometimes does
behave in ways that are permitted by the C standard that don't fit your
model.
I'm sorry but I have absolutely no interest in "discussing" what
threading facilities some random C standard does or doesn't
provide. This was a code example of a race condition (a real world
problem) which can also occur in single-threaded code, this being the
point which was of interest to me.
James Kuyper
2020-06-12 19:46:16 UTC
Permalink
On 6/12/20 2:04 PM, Rainer Weikusat wrote:
...
Post by Rainer Weikusat
I'm sorry but I have absolutely no interest in "discussing" what
threading facilities some random C standard does or doesn't
It's not a "random" standard - it's the official ISO C standard, and
also the official national standard in most (all?) countries which are
members of ISO - which is just about every country in the world -
including all of the ones where you're most likely to be using Unix. The
"Single UNIX® Specification, Version 4, 2018 Edition" (which would seem
to be a very relevant document for a newsgroup named
"comp.unix.programmer") requires that there be a command line utility
named c99 which conforms to that standard, and most of the documentation
of the system utility routines is defined in terms of C interfaces that
are consistent with that standard.
Kaz Kylheku
2020-06-11 16:10:43 UTC
Permalink
Post by b***@nowhere.co.uk
You can't have a race when there's only 1 thread. You're simply referring to
unforeseen code paths which are pretty standard bugs in any complex code.
The point is that there isn't only 1 thread when there are real-time
external events; the rest of the world that generates events is
effectively a bunch of threads. A single thread can race against the
world. (This is well known to device driver people: a driver can
race against hardware, even if everything is locked down so only one
thread is running.)

You won't have races with 1 thread when there are non-real-time external
events only, like reading syntax from a file and parsing it.

However, there is strictly no such thing as non-real-time events.
Non-real-time events are real-time events that have been carefully
massaged into appearing as a deterministic, repeatable sequence of
items.

For instance, if you read a file in Unix with a certain buffer size, you
are assured that you get nothing but full buffers, except at EOF. At
the hardware level, nothing is farther from the truth. The buffer
boundaries are not necessarily aligned with the low-level transfer
units. Say the buffer is smaller than a block. Sometimes your buffer is
filled by combining data from two blocks, which arrive at separate
times. Sometimes it comes all from one block. Sometimes it comes from a
cache, and sometimes the read has to wait for an I/O request to
complete.

It's quite a zoo of real-time activity, but the OS massages it such that
the compiler application enjoys 100% reproducible test cases (save for
its internal issues like depending on random uninitialized memory
somewhere). Even if the compiler reads from a network socket, all is
not lost, because it likely uses a buffering library which hides all the
short reads from the rest of the logic. (Buffering is not just for
efficiency, but correctness also!)
--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
b***@nowhere.co.uk
2020-06-12 12:49:36 UTC
Permalink
On Thu, 11 Jun 2020 16:10:43 +0000 (UTC)
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
You can't have a race when there's only 1 thread. You're simply referring to
unforeseen code paths which are pretty standard bugs in any complex code.
The point is that there isn't only 1 thread when there are real-time
external events; the rest of the world that generates events is
effectively a bunch of threads. A single thread can race against the
world. (This is well known to device driver people: a driver can
race against hardware, even if everything is locked down so only one
thread is running.)
You won't have races with 1 thread when there are non-real-time external
events only, like reading syntax from a file and parsing it.
However, there is strictly no such thing as non-real-time events.
Non-real-time events are real-time events that have been carefully
massaged into appearing as a deterministic, repeatable sequence of
items.
All irrelevant. If you only have 1 thread in a program you cannot get race
conditions. End of. The fact that the external world is unpredictable is neither
here nor there - the code IS predictable based on the input it receives and its
current state.

Event driven programming has been mentioned, however when developing using this
sort of model the event loop is often a hidden separate thread. This sort of
problem can be solved by multiplexing on select/poll unless the event loop is
quite a high abstraction away from the system level.

And before someone mentions signals - interrupts are different problem class
to race conditions.
Rainer Weikusat
2020-06-12 18:53:11 UTC
Permalink
***@nowhere.co.uk writes:

[...]
Post by b***@nowhere.co.uk
All irrelevant. If you only have 1 thread in a program you cannot get race
conditions. End of. The fact that the external world is unpredictable is neither
here nor there - the code IS predictable based on the input it receives and its
current state.
A single-threaded program cannot exhibit what the C standard calls "race
condition" because that's impossible by definition, but that's sort-of
beside the point.

Assuming there's some sort of "shared object" residing in memory which
can be read or modified atomically, read or write accesses to that will
be ordered somehow even in the face of multiple threads of execution
running in parallel, ie, on some sort of multiprocessor. The definition
of "race condition" I'm aware of (preceding C11 by at least two decades) is
roughly: Correctness of the code depends on a certain ordering of such
accesses but this ordering isn't guaranteed: Independent codepaths are
racing with each other and the final state of the object depends on "who
made it there first" etc.
Post by b***@nowhere.co.uk
Event driven programming has been mentioned, however when developing using this
sort of model the event loop is often a hidden seperate thread. This sort of
problem can be solved by multiplexing on select/poll unless the event loop is
quite a high abstration away from the system level.
It cannot. With event-driven programs, the independent code paths are
sort-of "green threads": Whichever gets scheduled next depends on which
kind of event occurs next. And if this isn't predictable, the exact same
problem of "execution order A, B, C implicitly assumed but execution
order B, A, C happened instead" can occur.
Post by b***@nowhere.co.uk
And before someone mentions signals - interrupts are different problem class
to race conditions.
And that's just a limited special case of the "green threads" already
mentioned above.
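
The event-loop point can be sketched with poll(): one turn of the dispatcher runs whichever handler has an event pending, so an implicit assumption that A's handler runs before B's is at the mercy of arrival order (dispatch_ready and the handler names are illustrative, not any real API):

```c
#include <poll.h>
#include <unistd.h>

static char order[16];          /* records which handler ran when */
static int  norder;

static void handle_a(void) { order[norder++] = 'A'; }
static void handle_b(void) { order[norder++] = 'B'; }

/* One turn of an event loop: dispatch whatever is ready right now.
   Which "green thread" runs next is decided by external timing, not
   by anything in this code. Returns poll()'s result. */
int dispatch_ready(int fd_a, int fd_b)
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };
    int n = poll(fds, 2, 0);
    if (n <= 0)
        return n;
    if (fds[0].revents & POLLIN) handle_a();
    if (fds[1].revents & POLLIN) handle_b();
    return n;
}
```

If B's event arrives first, B's handler runs first; code that silently assumed the order A, B has exactly the "B, A happened instead" problem.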
James Kuyper
2020-06-12 20:00:41 UTC
Permalink
On 6/12/20 2:53 PM, Rainer Weikusat wrote:
...
Post by Rainer Weikusat
A single-threaded program cannot exhibit what the C standard calls "race
condition" because that's impossible by definition but that's sort-of
besides the point.
The C standard never uses the term "race condition" - the definition I
believe you are referring to defines the term "data race", which is why
I've been careful to couch my statements in those terms. See
"https://en.wikipedia.org/wiki/Race_condition#Data_race" for more
details on the distinction being made.
A single-threaded program can have race conditions; it cannot have a
data race (at least, not in C).
Post by Rainer Weikusat
Assuming there's some sort of "shared object" residing in memory which
can be read or modified atomically,
A key problem with the example you provided is that you did not take
advantage of the features of C that allow you to ensure that the
relevant objects can be read or modified atomically. Even before C2011,
you should have used a sig_atomic_t variable for such purposes. In
C2011, you have a much wider variety of options. If you had used one of
those features, then such code would not qualify as a data race, and you
would have been correct to identify the multi-threaded problem as being
essentially the same as the single-threaded problem.
Without guaranteed atomicity, the multi-threaded problem is very
different from, and much worse than, the single-threaded version of the
problem.
James Kuyper
2020-06-12 20:04:37 UTC
Permalink
On 6/12/20 4:00 PM, James Kuyper wrote:
...
Post by James Kuyper
The C standard never uses the term "race condition" - the definition I
Correction: it does use it, but the only normative text containing that
term is in the description of tmpnam_s() (K3.5.1.2p7).
Rainer Weikusat
2020-06-12 21:07:21 UTC
Permalink
[...]
Post by James Kuyper
Post by Rainer Weikusat
Assuming there's some sort of "shared object" residing in memory which
can be read or modified atomically,
A key problem with the example you provided is that you did not take
advantage of the features of C that allow you to ensure that the
relevant objects can be read or modified atomically. Even before C2011,
you should have used a sig_atomic_t
[...]

Would you please spare me this? I'm writing about something completely
different and random additions to random C standards occurring in random
years absolutely don't matter for that.

Multithreading existed a long time before some ISO committee chose to
invent its very own version of it.
James Kuyper
2020-06-12 22:33:59 UTC
Permalink
Post by Rainer Weikusat
[...]
Post by James Kuyper
Post by Rainer Weikusat
Assuming there's some sort of "shared object" residing in memory which
can be read or modified atomically,
A key problem with the example you provided is that you did not take
advantage of the features of C that allow you to ensure that the
relevant objects can be read or modified atomically. Even before C2011,
you should have used a sig_atomic_t
[...]
Would you please spare me this?
Would you please spare me this? Your comments added nothing to this
branch of the discussion - and I'm sure you feel the same about my
comments, so there's no point in continuing this.
Post by Rainer Weikusat
... I'm writing about something completely
different and random additions to random C standards occurring in random
years absolutely don't matter for that.
There's nothing the least bit random about any of those things you
dubbed random.
Post by Rainer Weikusat
Multithreading existed a long time before some ISO committee chose to
invents its very own version of that.
This branch of the discussion was created by James K. Lowden's claim
that threads break C's memory model. The C standard is very relevant to
that claim, and multi-threading that pre-existed C isn't. That claim was
wrong-headed for C89, with threads simply falling entirely outside of
the domain of that standard; at that time, other standards such as POSIX
dealt with the issues associated with multi-threaded code. What truth
there was in that claim was obsoleted by the modifications made to
C2011, so C2011 is also very relevant to that claim. Comments that
aren't relevant to that claim belong on some other branch of this
discussion.
Rainer Weikusat
2020-06-14 18:52:18 UTC
Permalink
Post by James Kuyper
Post by Rainer Weikusat
[...]
Post by James Kuyper
Post by Rainer Weikusat
Assuming there's some sort of "shared object" residing in memory which
can be read or modified atomically,
A key problem with the example you provided is that you did not take
advantage of the features of C that allow you to ensure that the
relevant objects can be read or modified atomically. Even before C2011,
you should have used a sig_atomic_t
[...]
Would you please spare me this?
Would you please spare me this? Your comments added nothing to this
branch of the discussion - and I'm sure you feel the same about my
comments, so there's no point in continuing this.
There is no "branch of a discussion" here, it feels more like an attempt
to bury a discussion.
Post by James Kuyper
Post by Rainer Weikusat
... I'm writing about something completely
different and random additions to random C standards occurring in random
years absolutely don't matter for that.
There's nothing the least bit random about any of those things you
dubbed random.
They're random in the sense that they're not related to the text/
example I originally posted.
Post by James Kuyper
Post by Rainer Weikusat
Multithreading existed a long time before some ISO committee chose to
invents its very own version of that.
This branch of the discussion was created by James K. Lowden's claim
that threads break C's memory model.
Which then branched further out with me pointing out that "race
conditions" can exist in single-threaded programs, too, as these don't
necessarily have (completely) predictable execution orders of
independent "code things", either.
b***@nowhere.co.uk
2020-06-15 09:17:00 UTC
Permalink
On Fri, 12 Jun 2020 19:53:11 +0100
Post by Rainer Weikusat
Post by b***@nowhere.co.uk
Event driven programming has been mentioned, however when developing using
this
Post by b***@nowhere.co.uk
sort of model the event loop is often a hidden seperate thread. This sort of
problem can be solved by multiplexing on select/poll unless the event loop
is
Post by b***@nowhere.co.uk
quite a high abstration away from the system level.
It cannot. With event-driven programs, the independent code paths are
Can't it? Clearly you've never done Xlib development where you write your
own event loop and can poll on the socket returned by ConnectionNumber().
Post by Rainer Weikusat
sort-of "green threads": Whichever gets scheduled next depends on which
No they're not "sort-of green threads" at all.
Post by Rainer Weikusat
kind of event occurs next. And if this isn't predictable, the exact same
problem of "execution order A, B, C implicitly assumed but execution
order B, A, C happened instead" can occur.
Varying code paths taken by a single-threaded program based on varying input
is NOT a race condition.
Post by Rainer Weikusat
Post by b***@nowhere.co.uk
And before someone mentions signals - interrupts are different problem class
to race conditions.
And that's just a limited special case of the "green threads" already
mentioned above.
Rubbish.
Rainer Weikusat
2020-06-15 15:20:16 UTC
Permalink
Post by b***@nowhere.co.uk
On Fri, 12 Jun 2020 19:53:11 +0100
[...]
Post by b***@nowhere.co.uk
Post by Rainer Weikusat
kind of event occurs next. And if this isn't predictable, the exact same
problem of "execution order A, B, C implicitly assumed but execution
order B, A, C happened instead" can occur.
Varying code paths taken by a single threaded object based on varying input
is NOT race conditions.
You can call that anything you like. However, it's still a cause of
unpredictable execution ordering changes causing trouble in both
cases. The effect is more limited because - as with any other
implementation of cooperative multitasking/ -threading, there are no
involuntary context switches but ...
Post by b***@nowhere.co.uk
Post by Rainer Weikusat
Post by b***@nowhere.co.uk
And before someone mentions signals - interrupts are different problem class
to race conditions.
And that's just a limited special case of the "green threads" already
mentioned above.
Rubbish.
... even this evaporates when signals come into play, which are nothing
but a limited form of preemptive multitasking/ -threading.

The structural similarities of all of this are pretty obvious. That
(over-)generalized event-loop implementations tend to evolve (or degenerate)
towards becoming full-blown "cooperative userspace threading reinvented
once again" implementations is also not exactly a secret.

There's also no difference in "predictability" here: Even with multiple
threads actually executing in parallel, all possible outcomes of any
interactions are predictable in principle. What isn't predictable (and
that's apparently the difficult thing here) is which of these possible
outcomes will actually happen within a limited period of time.
Kaz Kylheku
2020-06-12 23:33:13 UTC
Permalink
Post by b***@nowhere.co.uk
On Thu, 11 Jun 2020 16:10:43 +0000 (UTC)
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
You can't have a race when there's only 1 thread. You're simply refering to
unforseen code paths which are pretty standard bugs in any complex code.
The point is that there isn't only 1 thread when there are real-time
external events; the rest of the world that generates events is
effectively a bunch of threads. A single thread can race against the
world. (This is well known to device driver people: a driver can
race against hardware, even if everything is locked down so only one
thread is running.)
You won't have races with 1 thread when there are non-real-time external
events only, like reading syntax from a file and parsing it.
However, there is strictly no such thing as non-real-time events.
Non-real-time events are real-time events that have been carefully
massaged into appearing as a deterministic, repeatable sequence of
items.
All irrelevant. If you only have 1 thread in a program you cannot get race
conditions. End of. The fact that the external world is unpredictable is neither
here nor there - the code IS predictable based on the input it receives and its
current state.
A race condition's behavior is also predictable based on the current state.

That thread's instruction pointer is at 0xabcd, this one's is at 0x1234,
neither is holding a lock ... all observations of state.
Post by b***@nowhere.co.uk
Event driven programming has been mentioned, however when developing using this
sort of model the event loop is often a hidden seperate thread. This sort of
problem can be solved by multiplexing on select/poll unless the event loop is
quite a high abstration away from the system level.
And before someone mentions signals - interrupts are different problem class
to race conditions.
No they aren't. We can meaningfully talk about a race condition between
main line and interrupt time.
b***@nowhere.co.uk
2020-06-15 09:19:21 UTC
Permalink
On Fri, 12 Jun 2020 23:33:13 +0000 (UTC)
Post by James Kuyper
Post by b***@nowhere.co.uk
On Thu, 11 Jun 2020 16:10:43 +0000 (UTC)
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
You can't have a race when there's only 1 thread. You're simply refering to
unforseen code paths which are pretty standard bugs in any complex code.
The point is that there isn't only 1 thread when there are real-time
external events; the rest of the world that generates events is
effectively a bunch of threads. A single thread can race against the
world. (This is well known to device driver people: a driver can
race against hardware, even if everything is locked down so only one
thread is running.)
You won't have races with 1 thread when there are non-real-time external
events only, like reading syntax from a file and parsing it.
However, there is strictly no such thing as non-real-time events.
Non-real-time events are real-time events that have been carefully
massaged into appearing as a deterministic, repeatable sequence of
items.
All irrelevant. If you only have 1 thread in a program you cannot get race
conditions. End of. The fact that the external world is unpredictable is
neither
Post by b***@nowhere.co.uk
here nor there - the code IS predictable based on the input it receives and
its
Post by b***@nowhere.co.uk
current state.
A race condition's behavior is also predictable based on the current state.
I should have qualified it by saying predictable by someone going through
the code based on the input. Clearly that's not the case with threads, where
the scheduling by the OS is an unknown quantity.
Post by James Kuyper
Post by b***@nowhere.co.uk
And before someone mentions signals - interrupts are different problem class
to race conditions.
No they aren't. We can meaningfully talk about a race condition between
main line and interrupt time.
No you can't because by definition the main line will NEVER execute while the
interrupt code is executing. That is not the case for multiple threads.
Kaz Kylheku
2020-06-16 15:01:29 UTC
Permalink
Post by b***@nowhere.co.uk
On Fri, 12 Jun 2020 23:33:13 +0000 (UTC)
Post by James Kuyper
Post by b***@nowhere.co.uk
On Thu, 11 Jun 2020 16:10:43 +0000 (UTC)
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
You can't have a race when there's only 1 thread. You're simply refering to
unforseen code paths which are pretty standard bugs in any complex code.
The point is that there isn't only 1 thread when there are real-time
external events; the rest of the world that generates events is
effectively a bunch of threads. A single thread can race against the
world. (This is well known to device driver people: a driver can
race against hardware, even if everything is locked down so only one
thread is running.)
You won't have races with 1 thread when there are non-real-time external
events only, like reading syntax from a file and parsing it.
However, there is strictly no such thing as non-real-time events.
Non-real-time events are real-time events that have been carefully
massaged into appearing as a deterministic, repeatable sequence of
items.
All irrelevant. If you only have 1 thread in a program you cannot get race
conditions. End of. The fact that the external world is unpredictable is
neither
Post by b***@nowhere.co.uk
here nor there - the code IS predictable based on the input it receives and
its
Post by b***@nowhere.co.uk
current state.
A race condition's behavior is also predictable based on the current state.
I should have qualified it by saying predictable by someone going through
the code based on the input.
A single-threaded program can behave in a way that you cannot predict
from the input. It can depend on the exact timing of the arrival of the
input. Even if you have the exact timing of that arrival nailed down to
the nanosecond, you will have to make assumptions about the state of the
program.

For instance:

while (!kbhit()) // sorry for the DOS-ism
    counter++;

Assume the input to the program is the Enter key. what is the final
value of counter?
Post by b***@nowhere.co.uk
the scheduling by the OS is an unknown quantity.
Post by James Kuyper
Post by b***@nowhere.co.uk
And before someone mentions signals - interrupts are different problem class
to race conditions.
No they aren't. We can meaningfully talk about a race condition between
main line and interrupt time.
No you can't because by definition the main line will NEVER execute while the
interrupt code is executing.
The situation with interrupt handlers is something like "one-way
cooperative" threading. The mainline does not know when the interrupt
will go off; it must mask it when required, or else there is a race.

The interrupt yields the processor programmatically, so its logic
does not concern itself with active interference from mainline while it
is running. However, the interrupt handler could still make wrong
assumptions about the state of the mainline, which can be in any state
at interrupt time.

Depending on system design, interrupts can be interrupted, too.
Post by b***@nowhere.co.uk
That is not the case for multiple threads.
Not strictly true! There are real-time scheduling algorithms whereby you
can assure that when thread B is running, thread A isn't dispatched, and
it stays that way for as long as B does not do anything to block itself.

This is not primarily looked to as a mechanism for resolving races.
Scott Lurndal
2020-06-16 18:05:17 UTC
Permalink
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
On Fri, 12 Jun 2020 23:33:13 +0000 (UTC)
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
And before someone mentions signals - interrupts are different problem class
to race conditions.
No they aren't. We can meaningfully talk about a race condition between
main line and interrupt time.
No you can't because by definition the main line will NEVER execute while the
interrupt code is executing.
Actually, this is incorrect on all modern (multiple core) processors; even
if the hardware is configured to execute the interrupt handler on the same
hardware thread as the application, the application can simply be rescheduled to
another hardware thread to execute in parallel with the interrupt handler.

Even something as simple as a periodic timer interrupt can affect the state
of processes/threads executing on other hardware threads (even on the same core
if hyperthreading is implemented).
Post by Kaz Kylheku
The situation with interrupt handlers is something like "one-way
cooperative" threading. The mainline does not know when the interrupt
will go off; it must mask it when required, or else there is a race.
Again, think modern processors (like threadripper with 32 cores/64 hardware
threads). You may mask interrupts on a particular hardware thread, but
masking them for all cores/threads is far too expensive.

[snip]
Post by Kaz Kylheku
Depending on system design, interrupts can be interrupted, too.
Indeed, windows relies on this (a legacy of the VAX/VMS interrupt regime
added to NT by DC). Major interrupt controllers support this by prioritizing
interrupts (ARM GIC, for example, offers from 31 to 255 distinct interrupt
priorities, depending on the implementation (most currently offer 31)).
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
That is not the case for multiple threads.
b***@nowhere.co.uk
2020-06-17 08:47:25 UTC
Permalink
On Tue, 16 Jun 2020 18:05:17 GMT
Post by Scott Lurndal
Post by b***@nowhere.co.uk
On Fri, 12 Jun 2020 23:33:13 +0000 (UTC)
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
And before someone mentions signals - interrupts are different problem
class
Post by b***@nowhere.co.uk
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
to race conditions.
No they aren't. We can meaningfully talk about a race condition between
main line and interrupt time.
No you can't because by definition the main line will NEVER execute while
the
Post by b***@nowhere.co.uk
interrupt code is executing.
Actually, this is incorrect in all modern (multiple core) processors; even
if the hardware is configured to execute the interrupt handler on the same
hardware thread as the application, the application can simply rescheduled to
another hardware thread to execute in parallel with the interrupt handler.
Well if the application does that it's its own lookout, but with signals
on unix, for example, the main code path is suspended until the interrupt(s) have finished.
If that wasn't the case then an awful lot of code would break badly.
Scott Lurndal
2020-06-17 15:29:45 UTC
Permalink
Post by b***@nowhere.co.uk
On Tue, 16 Jun 2020 18:05:17 GMT
Post by Scott Lurndal
Post by b***@nowhere.co.uk
On Fri, 12 Jun 2020 23:33:13 +0000 (UTC)
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
And before someone mentions signals - interrupts are different problem
class
Post by b***@nowhere.co.uk
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
to race conditions.
No they aren't. We can meaningfully talk about a race condition between
main line and interrupt time.
No you can't because by definition the main line will NEVER execute while
the
Post by b***@nowhere.co.uk
interrupt code is executing.
Actually, this is incorrect in all modern (multiple core) processors; even
if the hardware is configured to execute the interrupt handler on the same
hardware thread as the application, the application can simply rescheduled to
another hardware thread to execute in parallel with the interrupt handler.
Well if the application does that it's its own lookout, but for example signals
on unix the main code path is suspended until the interrupt(s) have finished.
Signals are not interrupts. They're simply a mechanism to enable
pseudo-asynchronous events in an application[*]. System calls like
sigprocmask, pthread_sigmask and sigwait are there to allow the application some
control over when and where a signal handler will be called.

Sigwait is particularly useful in multithreaded applications where
one might dedicate a thread to just handle signals.

But, in any case, signal handlers can certainly execute in one thread
of a process while other threads are executing.
Post by b***@nowhere.co.uk
If that wasn't the case then an awful lot of code would break badly.
A lot of code did break badly until it was rewritten to use the correct
signal handling procedures; particularly in multithreaded applications.

[*] In general, signal handlers are dispatched when the kernel returns
to the application from a system call (the system call either completes
successfully after the signal handler returns or returns EINTR), or when
scheduling a thread from the process on a hardware thread. They're not
truly asynchronous like hardware interrupts.
b***@nowhere.co.uk
2020-06-17 15:51:43 UTC
Permalink
On Wed, 17 Jun 2020 15:29:45 GMT
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
On Tue, 16 Jun 2020 18:05:17 GMT
Post by Scott Lurndal
Post by b***@nowhere.co.uk
On Fri, 12 Jun 2020 23:33:13 +0000 (UTC)
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
And before someone mentions signals - interrupts are different problem
class
Post by b***@nowhere.co.uk
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
to race conditions.
No they aren't. We can meaningfully talk about a race condition between
main line and interrupt time.
No you can't because by definition the main line will NEVER execute while
the
Post by b***@nowhere.co.uk
interrupt code is executing.
Actually, this is incorrect in all modern (multiple core) processors; even
if the hardware is configured to execute the interrupt handler on the same
hardware thread as the application, the application can simply rescheduled to
another hardware thread to execute in parallel with the interrupt handler.
Well if the application does that it's its own lookout, but for example
signals
Post by b***@nowhere.co.uk
on unix the main code path is suspended until the interrupt(s) have finished.
Signals are not interrupts. They're simply a mechanism to enable
We're discussing single threaded programs and in a single threaded program a
signal interrupts the flow of control. Ie an interrupt. If you wish to nit-pick
that's up to you.
Post by Kaz Kylheku
pseudo-asynchronous events in an application[*]. System calls like
sigprocmask, pthread_sigmask and sigwait are there to allow the application some
control over when and where a signal handler will be called.
Yes, and?
Post by Kaz Kylheku
Sigwait is particularly useful in multithreaded applications where
one might dedicate a thread to just handle signals.
They're also a useful substitute for condition variables but that's another
discussion.
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
If that wasn't the case then an awful lot of code would break badly.
A lot of code did break badly until it was rewritten to use the correct
signal handling procedures; particularly in multithreaded applcations.
If you make a program multithreaded without making sure the unchanged code is
thread-safe then signal handling issues will be the least of your worries.
Kaz Kylheku
2020-06-17 16:03:29 UTC
Permalink
Post by Scott Lurndal
[*] In general, signal handlers are dispatched when the kernel returns
to the application from a system call (the system call either completes
successfully after the signal handler returns or returns EINTR), or when
scheduling a thread from the process on a hardware thread. They're not
truly asynchronous like hardware interrupts.
This is not so; a thread executing a tight loop that makes no system calls
can be interrupted by a signal (e.g. Ctrl-C-induced SIGINT from the
TTY), and execute a handler. That is truly asynchronous: the process is
first interrupted somehow, perhaps by the timer interrupt that drives
preemptive scheduling. Control passes into the kernel, and the kernel
inserts the handler when it resumes that thread. Therefore, that signal
handler is (effectively) an extension of the handling of whatever
interrupted the thread.

Signals also resemble the interrupt paradigm in how they can be
temporarily masked and whatnot.
Scott Lurndal
2020-06-17 17:54:36 UTC
Permalink
Post by Kaz Kylheku
Post by Scott Lurndal
[*] In general, signal handlers are dispatched when the kernel returns
to the application from a system call (the system call either completes
successfully after the signal handler returns or returns EINTR), or when
scheduling a thread from the process on a hardware thread. They're not
truly asynchronous like hardware interrupts.
This is not so; a thread executing a tight loop that makes no system calls
can be interrupted by a signal (e.g. Ctrl-C-induced SIGINT from the
Which (ctrl-c) is a hardware (UART) interrupt. That's the 'scheduling a
thread from the process on a hardware thread'. The signal didn't interrupt
the application thread, the interrupt did.
Rainer Weikusat
2020-06-17 18:17:54 UTC
Permalink
Post by Scott Lurndal
Post by Kaz Kylheku
Post by Scott Lurndal
[*] In general, signal handlers are dispatched when the kernel returns
to the application from a system call (the system call either completes
successfully after the signal handler returns or returns EINTR), or when
scheduling a thread from the process on a hardware thread. They're not
truly asynchronous like hardware interrupts.
This is not so; a thread executing a tight loop that makes no system calls
can be interrupted by a signal (e.g. Ctrl-C-induced SIGINT from the
Which (ctrl-c) is a hardware (UART) interrupt. That's the 'scheduling a
thread from the process on a hardware thread'. The signal didn't interrupt
the application thread, the interrupt did.
The input starts with some "hardware interrupt". SIGINT comes from the
kernel terminal driver if the (configurable) interrupt character was
entered.

Interruption also works in the absence of explicitly caused hardware
interrupts.

-----
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static unsigned x;

static void terminate(int unused)
{
    printf("%u\n", x);
    exit(0);
}

int main(void)
{
    signal(SIGCHLD, terminate);
    if (fork()) while (1) ++x;
    sleep(3);
    return 0;
}
Scott Lurndal
2020-06-17 23:44:45 UTC
Permalink
Post by Rainer Weikusat
Post by Scott Lurndal
Post by Kaz Kylheku
Post by Scott Lurndal
[*] In general, signal handlers are dispatched when the kernel returns
to the application from a system call (the system call either completes
successfully after the signal handler returns or returns EINTR), or when
scheduling a thread from the process on a hardware thread. They're not
truly asynchronous like hardware interrupts.
This is not so; a thread executing a tight loop that makes no system calls
can be interrupted by a signal (e.g. Ctrl-C-induced SIGINT from the
Which (ctrl-c) is a hardware (UART) interrupt. That's the 'scheduling a
thread from the process on a hardware thread'. The signal didn't interrupt
the application thread, the interrupt did.
The input starts with some "hardware interrupt". SIGINT comes from the
kernel terminal driver if the (configurable) interrupt character was
entered.
Interruption also works in absence of hardware interrupts explicitly
caused hardware interrupts.
-----
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
static unsigned x;
static void terminate(int unused)
{
printf("%u\n", x);
exit(0);
}
int main(void)
{
signal(SIGCHLD, terminate);
if (fork()) while (1) ++x;
sleep(3);
return 0;
}
In this case, the signal is delivered when the sleep(3) returns (either because
three seconds elapsed, or the signal caused the system call to be interrupted
in which case it returns the number of seconds remaining).
Rainer Weikusat
2020-06-18 14:05:09 UTC
Permalink
[...]
Post by Scott Lurndal
Post by Rainer Weikusat
Post by Scott Lurndal
Post by Kaz Kylheku
This is not so; a thread executing a tight loop that makes no system calls
can be interrupted by a signal (e.g. Ctrl-C-induced SIGINT from the
Which (ctrl-c) is a hardware (UART) interrupt. That's the 'scheduling a
thread from the process on a hardware thread'. The signal didn't interrupt
the application thread, the interrupt did.
[...]
Post by Scott Lurndal
Post by Rainer Weikusat
Interruption also works in absence of explicitly
caused hardware interrupts.
-----
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
static unsigned x;

static void terminate(int unused)
{
    printf("%u\n", x);
    exit(0);
}

int main(void)
{
    signal(SIGCHLD, terminate);
    if (fork()) while (1) ++x;
    sleep(3);
    return 0;
}
In this case, the signal is delivered when the sleep(3) returns (either because
three seconds elapsed, or the signal caused the system call to be interrupted
in which case it returns the number of seconds remaining).
The signal is delivered after the child process executing the sleep
terminated. The parent just increments x in a loop until interrupted to
execute the handler which causes it to terminate as well.

Rainer Weikusat
2020-06-17 18:38:42 UTC
Permalink
Post by Scott Lurndal
Post by Kaz Kylheku
Post by Scott Lurndal
[*] In general, signal handlers are dispatched when the kernel returns
to the application from a system call (the system call either completes
successfully after the signal handler returns or returns EINTR), or when
scheduling a thread from the process on a hardware thread. They're not
truly asynchronous like hardware interrupts.
This is not so; a thread executing a tight loop that makes no system calls
can be interrupted by a signal (e.g. Ctrl-C-induced SIGINT from the
Which (ctrl-c) is a hardware (UART) interrupt. That's the 'scheduling a
thread from the process on a hardware thread'. The signal didn't interrupt
the application thread, the interrupt did.
The input starts with some "hardware interrupt". SIGINT comes from the
kernel terminal driver if the (configurable) interrupt character was
entered.

Interruption also works in absence of explicitly
caused hardware interrupts.

-----
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static unsigned x;

static void terminate(int unused)
{
    printf("%u\n", x);
    exit(0);
}

int main(void)
{
    signal(SIGCHLD, terminate);
    if (fork()) while (1) ++x;
    sleep(3);
    return 0;
}
Kaz Kylheku
2020-06-18 12:19:11 UTC
Permalink
Post by Scott Lurndal
Post by Kaz Kylheku
Post by Scott Lurndal
[*] In general, signal handlers are dispatched when the kernel returns
to the application from a system call (the system call either completes
successfully after the signal handler returns or returns EINTR), or when
scheduling a thread from the process on a hardware thread. They're not
truly asynchronous like hardware interrupts.
This is not so; a thread executing a tight loop that makes no system calls
can be interrupted by a signal (e.g. Ctrl-C-induced SIGINT from the
Which (ctrl-c) is a hardware (UART) interrupt. That's the 'scheduling a
thread from the process on a hardware thread'. The signal didn't interrupt
the application thread, the interrupt did.
That kind of seems like saying that the interrupt *handler* which
serviced the UART didn't interrupt an application, the interrupt
*request* did that.

Indeed, signal handlers are not interrupt request handlers. The signal
abstraction has its own interrupt-request-like mechanism: a set of
pending signals (like bits/lines in some interrupt controller) which can
be blocked (like IRQ masking).
--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
Philip Guenther
2020-06-18 08:15:38 UTC
Permalink
On Wednesday, June 17, 2020 at 6:29:48 AM UTC-9, Scott Lurndal wrote:
...
Post by Scott Lurndal
[*] In general, signal handlers are dispatched when the kernel returns
to the application from a system call (the system call either completes
successfully after the signal handler returns or returns EINTR), or when
scheduling a thread from the process on a hardware thread. They're not
truly asynchronous like hardware interrupts.
I'll agree with Kaz Kylheku and disagree with this characterization.
While signals may often be delivered as described, it is certainly the
case that on multiprocessors a signal generated by a thread running on
CPU B targeted at the thread/process running on CPU A may result in an
interprocessor interrupt (IPI) from B to A to force CPU A "into the
kernel", so that the "hey, the thread you're running has a pending
signal" flag can be handled and the signal can be delivered to the
target thread. This is independent of that thread performing any system
calls, or of the mechanisms implementing normal preemptive scheduling
(timer interrupts, etc); rather, it is triggered strictly by the action
of independent threads, and thereby asynchronous.


Philip Guenther
Kaz Kylheku
2020-06-11 16:01:17 UTC
Permalink
Post by Rainer Weikusat
This isn't generally true: It's possible to have race conditions
entirely without 'true' parallelism, provided a program is driven by
external events, such as different kinds of inputs from the network,
which can happen in an unpredictable order.
I had to fix a couple of these in the past in single-threaded processes
structured around event loops.
Me too! Years ago I made the same observation: look, this bug is just
like a race condition, only without any threads or asynchronous signals
whatsoever. (The "threads" are in the outside world that generates
events.)

One example is simply how a stream of bytes is divided into individually
received packets. If you issue a four-byte read and assume that you
will get all four bytes of the 32 bit datum from the network you will
find that it's sometimes false. That's a kind of race condition.
In fact, we can make it appear to go away by inserting a sufficiently
long delay before the read, by which time all four bytes are there. The
delay re-arranges your local execution order of the read operations with
regard to the arrival of events (TCP segments), concealing the race.
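The conventional cure for that particular race is not a delay but a
loop that reads until the whole datum has arrived; a minimal sketch
(error handling kept to the essentials, `fd` assumed to be a connected
stream socket or pipe, `read_all` a made-up name):

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly len bytes from a stream fd, looping over short reads.
   Returns 0 on success, -1 on error or EOF before len bytes arrived. */
int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len) {
        ssize_t n = read(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        if (n == 0)
            return -1;          /* peer closed mid-datum */

        p += n;
        len -= n;
    }

    return 0;
}
```

However the stream happens to be segmented, the caller sees either all
four bytes or a hard failure, never a partial datum.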
--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
Kaz Kylheku
2020-06-05 21:25:48 UTC
Permalink
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 14:43:22 -0400
Post by James Kuyper
4 Two expression evaluations conflict if one of them modifies a
memory location and the other one reads or modifies the same memory
location.
That's not just punting; it's punting the ball out of the stadium. Any
introductory C text will include two lines of code operating on the
same variable. Now, suddenly, when f() writes to a global variable and
g() reads it, they "conflict".
That's gotta mean "at the same time" (or "not sequenced" w.r.t. each
other). If f() writes to a global, and g() reads it, and these calls are
in a single thread of control, then they are sequenced; we do not have
"one modifies and the other reads/modifies". Not for the meaning of
"and" denoting simultaneity, as in "Rome is burning and Nero is
fiddling".

Even before threads, a form of essentially the same conflict
already existed in C:

(*p)++ + (*q); /* p and q point to same location */

The evaluation of *q reads *p which is being independently modified.
These uses of the object are not sequenced, so it is undefined
behavior.

Long before threads were introduced, C implementations were permitted to
interleave the evaluations of the operands of +, and even parallelize
them.

So, in a sense, threads are nothing new; just that now if you want
the undefined behavior between (*p)++ and (*q), they no longer have to
be in the same expression, between the same two sequence points.
James Kuyper
2020-06-05 22:39:31 UTC
Permalink
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 14:43:22 -0400
Your formatting isn't quite correct. While James Lowden did quote some
text that I wrote, you clipped all of that material from your response.
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
Post by James Kuyper
4 Two expression evaluations conflict if one of them modifies a
memory location and the other one reads or modifies the same memory
location.
That's not just punting; it's punting the ball out of the stadium. Any
introductory C text will include two lines of code operating on the
same variable. Now, suddenly, when f() writes to a global variable and
g() reads it, they "conflict".
That's gotta mean "at the same time" (or "not sequenced" w.r.t. each
other).
No, it doesn't. That's the complete definition of the term - two
expression evaluations conflict regardless of whether or not they
are sequenced with each other. The concept of conflicting expression
evaluations is used only in the definition of a data race, and it's only
in that definition that the timing of the expressions comes into play.
Specifically, it's a data race only if neither of the conflicting
expression evaluations happens before the other.

...
Post by Kaz Kylheku
(*p)++ + (*q); /* p and q point to same location */
The evaluation of *q reads *p which is being independently modified.
These uses of the object are not sequenced, so it is undefined
behavior.
That's covered by a different rule: "If a side effect on a scalar object
is unsequenced relative to either a different side effect on the same
scalar object or a value computation using the value of the same scalar
object, the behavior is undefined." (6.5p2)

Note that unlike data races, 6.5p2 applies even if both expressions
occur in the same thread.
Post by Kaz Kylheku
Long before threads were introduced, C implementations were permitted to
interleave the evaluations of the operands of +, and even parallelize
them.
Yes, and before C2011, the restriction described by 6.5p2 already
existed, but without being able to use the concept of "unsequenced", it
was expressed in a way that was far less clear.
Joe Pfeiffer
2020-06-05 23:12:40 UTC
Permalink
Post by Kaz Kylheku
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 14:43:22 -0400
Post by James Kuyper
4 Two expression evaluations conflict if one of them modifies a
memory location and the other one reads or modifies the same memory
location.
That's not just punting; it's punting the ball out of the stadium. Any
introductory C text will include two lines of code operating on the
same variable. Now, suddenly, when f() writes to a global variable and
g() reads it, they "conflict".
That's gotta mean "at the same time" (or "not sequenced" w.r.t. each
other). If f() writes to a global, and g() reads it, and these calls are
in a single thread of control, then they are sequenced; we do not have
"one modifies and the other reads/modifies". Not for the meaning of
"and" denoting simultaneity, as in "Rome is burning and Nero is
fiddling".
No, it's a perfectly reasonable definition, and whenever those
conditions exist you have a conflict. Being able to deduce that one of
the operations took place before the other (as in your example, in a
single thread of control), is a way to resolve the conflict
unambiguously.
Richard Kettlewell
2020-06-06 07:27:22 UTC
Permalink
Post by James K. Lowden
The only programming language I'm aware of that addresses
multithreading coherently is Pike's go. (I don't claim any expertise in
go, only that it seems to have found a way out of the morass.) It
implements Hoare's Communicating Sequential Processes. And there you
have it: instead of reasoning about simultaneity, it reduces each
thread to a sequential process, and controls -- with the compiler --
where they intersect.
In Go threads (“goroutines”) get to share memory in a way that’s not
much different from C with threads. The compiler doesn’t prevent it. If
two goroutines concurrently access a single object then you’re out of
luck, just as you would be in C. The natively provided facilities (‘go’
statement and channels) do make concurrent programming a bit more
pleasant than C but the same basic risks are still there.
--
https://www.greenend.org.uk/rjk/
Rainer Weikusat
2020-06-07 18:22:42 UTC
Permalink
Post by Richard Kettlewell
Post by James K. Lowden
The only programming language I'm aware of that addresses
multithreading coherently is Pike's go. (I don't claim any expertise in
go, only that it seems to have found a way out of the morass.) It
implements Hoare's Communicating Sequential Processes. And there you
have it: instead of reasoning about simultaneity, it reduces each
thread to a sequential process, and controls -- with the compiler --
where they intersect.
In Go threads (“goroutines”) get to share memory in a way that’s not
much different from C with threads. The compiler doesn’t prevent it. If
two goroutines concurrently access a single object then you’re out of
luck, just as you would be in C.
I would still really like to know what the (perceived) problem is. Code
doesn't "access" objects as if it was some sort of being capable of
acting autonomously; code accesses objects because it was written to
access them. Outside of C++-lalaland, "data races" on concurrent read
accesses don't exist (I was under the impression someone had killed this
changeling before it could escape into the real world ...).

This leaves some mix of read and write accesses to a certain
object. These have to be coordinated such that they don't step onto each
other's toes. Which means some sort of "mutual exclusion" has to be
used. There's a rich set of primitives for this and building "ITC
abstractions" (inter-thread communication) based on them isn't
complicated. This ends up exactly like CSP except that it's more
flexible than using opaque message-passing "channels" for all
communications.

Granted, there's the real-world "programmer couldn't be arsed because it
ain't gonna happen" (frequently enough to be worth any effort) problem
responsible for all this broken software (90% reliable! Fails only every
tenth time someone tries to use it!) out there we all have to fight as
part of our everyday lives and multithreading certainly doesn't help
with that, but that's a social problem and as such, has no technical
solutions[*].

[*] If any real construction work was completed according to accepted
"software reliability standards", nobody could ever enter a building
without serious danger to life ...
Richard Kettlewell
2020-06-08 07:03:08 UTC
Permalink
Post by Rainer Weikusat
This leaves some mix of read and write accesses to a certain
object. These have to be coordinated such that they don't step onto each
other's toes. Which means some sort of "mutual exclusion" has to be
used. There's a rich set of primitives for this and building "ITC
abstractions" (inter-thread communication) based on them isn't
complicated. This ends up exactly like CSP except that it's more
flexible than using opaque message-passing "channels" for all
communications.
Humans are not capable of using those primitives correctly with 100%
reliability. Whether you see that as a problem with the primitives or a
problem with the humans is up to you, but creating different approaches
to concurrency is quicker and easier than upgrading human intelligence.
--
https://www.greenend.org.uk/rjk/
Rainer Weikusat
2020-06-08 14:01:50 UTC
Permalink
Post by Richard Kettlewell
Post by Rainer Weikusat
This leaves some mix of read and write accesses to a certain
object. These have to be coordinated such that they don't step onto each
other's toes. Which means some sort of "mutual exclusion" has to be
used. There's a rich set of primitives for this and building "ITC
abstractions" (inter-thread communication) based on them isn't
complicated. This ends up exactly like CSP except that it's more
flexible than using opaque message-passing "channels" for all
communications.
Humans are not capable of using those primitives correctly with 100%
reliability.
Nothing on this planet "operates with 100% reliability" (except death
:-), hence, this isn't an argument, just a pretty banal truism.
Post by Richard Kettlewell
Whether you see that as a problem with the primitives or a
problem with the humans is up to you, but creating different approaches
to concurrency is quicker and easier than upgrading human
intelligence.
There's no "different approach" here: Any form of ITC can be regarded as
(somehow synchronized) "message passing" at an abstract level. That's
not dependent on a specific implementation.
James K. Lowden
2020-06-08 16:30:30 UTC
Permalink
On Mon, 08 Jun 2020 15:01:50 +0100
Post by Rainer Weikusat
Post by Richard Kettlewell
Post by Rainer Weikusat
This leaves some mix of read and write accesses to a certain
object. These have to be coordinated such that they don't step
onto each other's toes. Which means some sort of "mutual exclusion"
has to be used. There's a rich set of primitives for this and
building "ITC abstractions" (inter-thread communication) based on
them isn't complicated. This ends up exactly like CSP except that
it's more flexible than using opaque message-passing "channels"
for all communications.
Humans are not capable of using those primitives correctly with 100%
reliability.
Nothing on this planet "operates with 100% reliability" (except death
:-), hence, this isn't an argument, just a pretty banal truism.
Do you dispute that things that are statically verified by the
compiler are more reliable than things that are not?

DBMSs provide ACID guarantees. Doubtless, they sometimes fail. Would
you assert those guarantees are therefore pointless, and that
real-world programmers who can be arsed would have done as well or
better without those guarantees?

TCP guarantees a sequenced stream of packets. Why use it, given that
it sometimes fails to deliver, and occasionally delivers twice? Might
as well roll your own, right? After all, it's either reliable or it's
not.

In its totality, no program is provably correct. We can't even prove
it will terminate. Nonetheless, within limits, certain aspects of
programs can be proved correct. ISTM the more those limits encompass,
the more correct programs will be.

Threads moved us in the opposite direction: toward less verification.
The use of human manpower to compensate was and is -- ISTM
obviously -- an irretrievable loss.

--jkl
Rainer Weikusat
2020-06-08 17:47:56 UTC
Permalink
Post by James K. Lowden
On Mon, 08 Jun 2020 15:01:50 +0100
Post by Rainer Weikusat
Post by Richard Kettlewell
Post by Rainer Weikusat
This leaves some mix of read and write accesses to a certain
object. These have to be coordinated such that they don't step
onto each other's toes. Which means some sort of "mutual exclusion"
has to be used. There's a rich set of primitives for this and
building "ITC abstractions" (inter-thread communication) based on
them isn't complicated. This ends up exactly like CSP except that
it's more flexible than using opaque message-passing "channels"
for all communications.
Humans are not capable of using those primitives correctly with 100%
reliability.
Nothing on this planet "operates with 100% reliability" (except death
:-), hence, this isn't an argument, just a pretty banal truism.
Do you dispute that things that are statically verified by the
compiler are more reliable than things that are not?
Aside: The compiler also isn't "100% reliable", hence, "maybe they are, maybe
they aren't". All general statements of this kind are equally
pointless.

But I fail to see how this would relate to the original topic and me
pointing out that "it's not 100% reliable" is not an argument against
anything, because nothing is "100% reliable".

[...]
Post by James K. Lowden
Threads moved us in the opposite direction: toward less verification.
The use of human manpower to compensate was and is -- ISTM
obviously -- an irretrievable loss.
This gets me back to "Why this?": (POSIX) threads is just a set of
facilities enabling construction of various kinds of "communicating
sequential processes".
James K. Lowden
2020-06-15 23:27:45 UTC
Permalink
On Mon, 08 Jun 2020 18:47:56 +0100
Post by Rainer Weikusat
Post by James K. Lowden
Do you dispute that things that are statically verified by the
compiler are more reliable than things that are not?
Aside: The compiler also isn't "100% reliable", hence, "maybe they
are, maybe they aren't". All general statements of this kind are
equally pointless.
"All generalizations are false."

You will surely agree that reliability is not binary. It's not even
discrete. Reliability varies continuously from 0 to 1.

Compilers have bugs, yes. Ken Thompson gave us "Trusting Trust"
(https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf)
to remind us.

Still, compilers -- again, I'm sure you'll agree -- are useful. Even
when, 20 years ago, I was averaging 1 bug a month in Microsoft's C++
compiler, it still *found* more bugs in my code than it itself
exhibited, easily by 3 orders of magnitude.

Simply this: the fact that everything is imperfect does not imply that
nothing works, only that nothing works perfectly. A tool of some
reliability is more useful than no tool. And no tool is exactly what C
provided (until threads were recognized by the C standard) when it comes to
verifying memory accesses among competing threads.

You gave a fairly elaborate example of nondeterministic behavior. Let
me offer a simpler one. (If it's too simple, please say!)

void foo(void) {
    static int n;
    int ch;

    while( (ch = getchar()) != EOF ) {
        n += ch;
    }
}

We can't know the value of n, not even whether it's negative or
positive (or not), a priori. But, given the input stream, we can know
the value of n, exactly.

Replace the while loop with two threads updating n, and that's no
longer true. Worse, until recently, the only mechanism the C
programmer had to arbitrate access to n was not through n itself, but
through a third party -- a mutex or critical section, say -- that
regulated not n, but the threads that would touch n.

Because it was threads that were regulated, not the data, and because
that was done through libraries, not the compiler, the compiler was
helpless to ensure *anything* about n. Static analysis tools have since
helped, but to my understanding the problem is intractable,
NP-complete or somesuch.

It is that intractable problem that you have often said is no problem
at all for the capable programmer.
Post by Rainer Weikusat
Post by James K. Lowden
Threads moved us in the opposite direction: toward less
verification. The use of human manpower to compensate was and is --
ISTM obviously -- an irretrievable loss.
This gets me back to "Why this?": (POSIX) threads is just a set of
facilities enabling construction of various kinds of "communicating
sequential processes".
Er, no. Not as I understand what we're talking about.

https://www.cs.cmu.edu/~crary/819-f09/Hoare78.pdf

If there's some part of Hoare's description that allows two processes
to write to the same location in memory, with no mediation on the part
of the *compiler* (not a library, not the programmer) then your claim
might hold.

--jkl
Rainer Weikusat
2020-06-16 21:00:33 UTC
Permalink
Post by James K. Lowden
Post by Rainer Weikusat
Post by James K. Lowden
Do you dispute that things that are statically verified by the
compiler are more reliable than things that are not?
Aside: The compiler also isn't "100% reliable", hence, "maybe they
are, maybe they aren't". All general statements of this kind are
equally pointless.
"All generalizations are false."
You will surely agree that reliability is not binary. It's not even
discrete. Reliability varies continuously from 0 to 1.
I surely won't agree :-): Something is either reliable or it's not
reliable ("tertium non datur"). Eg, gravity is reliable: Nothing will
ever fall upwards in absence of acceleration.


[...]
Post by James K. Lowden
You gave a fairly elaborate example of nondeterministic behavior. Let
me offer a simpler one. (If it's too simple, please say!)
void foo(void) {
    static int n;
    int ch;
    while( (ch = getchar()) != EOF ) {
        n += ch;
    }
}
We can't know the value of n, not even whether it's negative or
positive (or not), a priori. But, given the input stream, we can know
the value of n, exactly.
Yes. After the input stream was received. And by this time, predicting
the final value of n is no longer useful because the final value is
known.
Post by James K. Lowden
Replace the while loop with two threads updating n, and that's no
longer true. Worse, until recently, the only mechanism the C
programmer had to arbitrate access to n was not through n itself, but
through a third party -- a mutex or critical section, say -- that
regulated not n, but the threads that would touch n.
Because it was threads that were regulated, not the data, and because
that was done through libraries, not the compiler, the compiler was
helpless to ensure *anything* about n.
That's true for a great many other things: The compiler will never
detect that an addition was used where a subtraction should have been
used instead. Or, to use a more famous example: A compiler will never
detect that some software which was supposed to average a number of
values, ie, calculate a sum of some set of numbers and divide that by
the number of numbers, instead averages the first two numbers and each
subsequent number with the result of the previous calculation.

https://www.schneier.com/blog/archives/2009/05/software_proble.html
Post by James K. Lowden
It is that intractable problem that you have often said is no problem
at all for the capable programmer.
Using a mutex to control access to a single, shared data item is
certainly something that's conceptually easier than solving the
"averaging problem" mentioned above.
Post by James K. Lowden
Post by Rainer Weikusat
Post by James K. Lowden
Threads moved us in the opposite direction: toward less
verification. The use of human manpower to compensate was and is --
ISTM obviously -- an irretrevable loss.
This gets me back to "Why this?": (POSIX) threads is just a set of
facilities enabling construction of various kinds of "communicating
sequential processes".
Er, no. Not as I understand what we're talking about.
https://www.cs.cmu.edu/~crary/819-f09/Hoare78.pdf
If there's some part of Hoare's description that allows two processes
to write to the same location in memory, with no mediation on the part
of the *compiler* (not a library, not the programmer) then your claim
might hold.
If two processes write to the same location in memory without mediation,
the code is usually broken. But that's no different from the examples
given above, it's just another algorithm which is not suitable for
solving a given problem.

The program below is a perfect example of two communicating, sequential
processes:

-------
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>

static int postbox;
static sem_t adder_sem, main_sem;

static void *adder(void *unused)
{
    while (1) {
        sem_wait(&adder_sem);
        ++postbox;
        sem_post(&main_sem);
    }

    return NULL;
}

int main(void)
{
    char buf[4096];
    pthread_t thr;

    sem_init(&adder_sem, 0, 0);
    sem_init(&main_sem, 0, 0);

    pthread_create(&thr, NULL, adder, NULL);

    while (fgets(buf, sizeof(buf), stdin)) {
        postbox = atoi(buf);

        sem_post(&adder_sem);
        sem_wait(&main_sem);

        printf("result %d\n", postbox);
    }

    return 0;
}
James Kuyper
2020-06-17 05:13:46 UTC
Permalink
...
Post by Rainer Weikusat
Post by James K. Lowden
You will surely agree that reliability is not binary. It's not even
discrete. Reliability varies continuously from 0 to 1.
I surely won't agree :-): Something is either reliable or it's not
reliable ("tertium non datur"). Eg, gravity is reliable: Nothing will
ever fall upwards in absence of acceleration.
Well, the truth of that statement depends more upon the definition of
"fall" than it does on gravity. According to wiktionary, it means "To
move downwards."

So while it is a perfectly ordinary thing to move upwards under the effect
of gravity (if, for instance, you start with a sufficiently high upward
velocity), you can't call such a motion a fall.

However, "absence of acceleration" doesn't make much sense - if you're
in a gravitational field, you'll always be undergoing acceleration
unless there's forces being applied to counter the acceleration due to
gravity. I suspect that what you meant was something like "in the
absence of other forces".

I suspect that a more meaningful rewrite that captures your intended
meaning would be "nothing will ever accelerate upward in the absence of
forces other than gravity".

That's an accurate statement as far as Newtonian gravity is concerned,
but reality is a better fit to Einstein's equations, which have
solutions that are not so constrained, for instance in the presence of
gravitational waves.
James K. Lowden
2020-06-04 15:07:46 UTC
Permalink
On Thu, 4 Jun 2020 02:39:17 +0000 (UTC)
Post by Kaz Kylheku
If threads are used, there should be few of them. Multiple activities
should be multiplexed onto threads with state machines, and efficient
I/O polling mechanisms.
I prefer Rob Pike's distinction between parallel and concurrent
processing.

Multiple threads of identical logic, operating over a shared
store, can execute a map-reduce function in parallel without locks or
race conditions. We see functionality like that on GPUs for matrix
multiplication and the like.

The moment you're coordinating different threads of logic contending
for the same bits of memory, you're reinventing badly what the
operating system already provides.

The inevitable retort is efficiency. In reality, it's more imagined
than real. As Jim Gettys has observed, no multithreaded implementation
of the X11 server has outperformed the single-threaded one.

--jkl
b***@nowhere.co.uk
2020-06-04 15:21:16 UTC
Permalink
On Thu, 4 Jun 2020 11:07:46 -0400
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 02:39:17 +0000 (UTC)
Post by Kaz Kylheku
If threads are used, there should be few of them. Multiple activities
should be multiplexed onto threads with state machines, and efficient
I/O polling mechanisms.
I prefer Rob Pike's distinction between parallel and concurrent
processing.
Multiple threads of identical logic, operating over a shared
store, can execute a map-reduce function in parallel without locks or
race conditions. We see functionality like that on GPUs for matrix
multiplication and the like.
The moment you're coordinating different threads of logic contending
for the same bits of memory, you're reinventing badly what the
operating system already provides.
SIMD vs MIMD.
Post by b***@nowhere.co.uk
The inevitable retort is efficiency. In reality, it's more imagined
than real. As Jim Gettys has observed, no multithreaded implementation
of the X11 server has outperformed the single-threaded one.
Probably highly dependent on the hardware.
Scott Lurndal
2020-06-04 17:07:35 UTC
Permalink
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 02:39:17 +0000 (UTC)
Post by Kaz Kylheku
If threads are used, there should be few of them. Multiple activities
should be multiplexed onto threads with state machines, and efficient
I/O polling mechanisms.
I prefer Rob Pike's distinction between parallel and concurrent
processing.
Multiple threads of identical logic, operating over a shared
store, can execute a map-reduce function in parallel without locks or
race conditions. We see functionality like that on GPUs for matrix
multiplication and the like.
The moment you're coordinating different threads of logic contending
for the same bits of memory, you're reinventing badly what the
operating system already provides.
The inevitable retort is efficiency. In reality, it's more imagined
than real. As Jim Gettys has observed, no multithreaded implementation
of the X11 server has outperformed the single-threaded one.
Twenty years ago, perhaps. Today? I'd find it difficult to believe.

At least make libX11, libXaw and libX... thread-safe.
Gary R. Schmidt
2020-06-05 03:22:46 UTC
Permalink
Post by b***@nowhere.co.uk
On Thu, 4 Jun 2020 02:39:17 +0000 (UTC)
Post by Kaz Kylheku
If threads are used, there should be few of them. Multiple activities
should be multiplexed onto threads with state machines, and efficient
I/O polling mechanisms.
I prefer Rob Pike's distinction between parallel and concurrent
processing.
Multiple threads of identical logic, operating over a shared
store, can execute a map-reduce function in parallel without locks or
race conditions. We see functionality like that on GPUs for matrix
multiplication and the like.
The moment you're coordinating different threads of logic contending
for the same bits of memory, you're reinventing badly what the
operating system already provides.
The inevitable retort is efficiency. In reality, it's more imagined
than real. As Jim Gettys has observed, no multithreaded implementation
of the X11 server has outperformed the single-threaded one.
Well, yes, but the multi-threaded iWatch server I re-wrote to use queues
dropped CPU usage from around 100% to 35% while successfully servicing
more clients than it could at 100%. This was using my Ultra 10
development workstation; on the multi-CPU E10s and Alphas and PA-RISC
and Windows boxes that ran it for real, it did even better.

But I understood threads (and processes) better than the PC-DOS/Windows
programmers who had tried to write it in the first place.

"'Tis a poor workman that blames 'is tools," is something I've both bit
my tongue on /and/ used with success far too many times in this business.

Cheers,
Gary B-)
--
Waiting for a new signature to suggest itself...