Discussion:
thread duet
James K. Lowden
2021-07-09 20:40:41 UTC
https://research.swtch.com/mm

Russ Cox recently published two essays on threads and synchronization,
a/k/a memory models, one at the hardware level and another for
programming languages. A 3rd is in the works on the implications for
Go.

I've said many times that threads are a terrible model for concurrent
programming. For true parallel programming -- SIMD -- they can be
terrific. But just making the heap a shared store undermines the
sequential model of every programming language I know.
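A minimal C sketch of the trap (hypothetical producer/consumer; the
program has a data race, which is the point -- nothing orders the two
plain stores, so the consumer may observe ready == 1 and still read
data == 0):

#include <pthread.h>
#include <stdio.h>

/* Shared heap, no synchronization: the classic message-passing race. */
int data;
int ready;

void *producer(void *arg)
{
    (void)arg;
    data = 42;      /* plain store */
    ready = 1;      /* plain store; may become visible before data */
    return NULL;
}

void *consumer(void *arg)
{
    (void)arg;
    while (!ready)  /* plain load; a compiler may hoist it out of the loop */
        ;
    printf("data = %d\n", data);  /* 42? 0? the language gives no answer */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}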

Cox explains that the problem is deeper and more nettlesome than that.
It has been intractable to computer science for decades. Until barely 10
years ago, Intel was unwilling to commit to a memory model, and the
later "commitment" from ARM hardly deserves the name.

--jkl
Kaz Kylheku
2021-07-09 21:42:52 UTC
Post by James K. Lowden
https://research.swtch.com/mm
Russ Cox recently published two essays on threads and synchronization,
a/k/a memory models, one at the hardware level and another for
programming languages. A 3rd is in the works on the implications for
Go.
I've said many times that threads are a terrible model for concurrent
programming.
Threads are just something that most hackers get crazy about in their
programming puberty.

This is because, owing to their intellectual limitations, they hit a
wall as to the kinds of programming problems they are able to solve.

So for excitement and redemption, they turn their attention to slicing
up and scrambling the execution order of solutions they understand.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
M***@qvyj9am89pyiehl99j.eu
2021-07-10 09:01:17 UTC
On Fri, 9 Jul 2021 21:42:52 -0000 (UTC)
Post by Kaz Kylheku
Post by James K. Lowden
https://research.swtch.com/mm
Russ Cox recently published two essays on threads and synchronization,
a/k/a memory models, one at the hardware level and another for
programming languages. A 3rd is in the works on the implications for
Go.
I've said many times that threads are a terrible model for concurrent
programming.
Threads are just something that most hackers get crazy about in their
programming puberty.
This is because, owing to their intellectual limitations, they hit a
wall as to the kinds of programming problems they are able to solve.
So for excitement and redemption, they turn their attention to slicing
up and scrambling the execution order of solutions they understand.
Nicely put :)

Threads do have use cases, e.g. controlling GUIs with many widgets
that may have to do asynchronous tasks such as flashing cursors, but
in general they're way overused in situations where they're not required
or where there's a better solution. Windows also screws things up on the
network side: the sockets API doesn't present through file descriptors,
so it's not possible AFAIK to single-task multiplex on them in select or poll.
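For reference, the single-tasked multiplexing described above looks
roughly like this on POSIX (a sketch with two hypothetical descriptors;
error handling elided):

#include <sys/select.h>
#include <unistd.h>

/* One loop, no threads: the kernel reports which descriptor is ready. */
void event_loop(int sock_fd, int tty_fd)
{
    char buf[512];

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(sock_fd, &rfds);
        FD_SET(tty_fd, &rfds);

        int maxfd = sock_fd > tty_fd ? sock_fd : tty_fd;
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
            break;

        if (FD_ISSET(sock_fd, &rfds))
            read(sock_fd, buf, sizeof buf);  /* network data ready */
        if (FD_ISSET(tty_fd, &rfds))
            read(tty_fd, buf, sizeof buf);   /* user input ready */
    }
}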
Richard Kettlewell
2021-07-12 14:28:10 UTC
Post by M***@qvyj9am89pyiehl99j.eu
Threads do have use cases, e.g. controlling GUIs with many widgets
that may have to do asynchronous tasks such as flashing cursors, but
in general they're way overused in situations where they're not required
or where there's a better solution. Windows also screws things up on the
network side: the sockets API doesn't present through file descriptors,
so it's not possible AFAIK to single-task multiplex on them in select or
poll.
Does WSAPoll achieve what you want?

There are other less select/poll-like models for IO multiplexing in
Windows too. For instance in one project we used overlapped IO with the
completion routines dispatched from WaitForMultipleObjectsEx.
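For what it's worth, WSAPoll (Vista and later) gives Windows sockets a
poll()-shaped interface; a rough sketch, assuming an already-connected
SOCKET s (error handling elided):

#include <winsock2.h>

/* Wait for s to become readable, poll()-style: returns 1 if readable,
   0 on timeout, SOCKET_ERROR on failure. */
int wait_readable(SOCKET s, int timeout_ms)
{
    WSAPOLLFD pfd;
    pfd.fd = s;
    pfd.events = POLLRDNORM;  /* normal data available to read */
    pfd.revents = 0;

    int rc = WSAPoll(&pfd, 1, timeout_ms);
    if (rc > 0 && (pfd.revents & POLLRDNORM))
        return 1;
    return rc;
}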
--
https://www.greenend.org.uk/rjk/
M***@84hbgptxp.com
2021-07-12 14:46:47 UTC
On Mon, 12 Jul 2021 15:28:10 +0100
Post by Richard Kettlewell
Post by M***@qvyj9am89pyiehl99j.eu
Threads do have use cases, e.g. controlling GUIs with many widgets
that may have to do asynchronous tasks such as flashing cursors, but
in general they're way overused in situations where they're not required
or where there's a better solution. Windows also screws things up on the
network side: the sockets API doesn't present through file descriptors,
so it's not possible AFAIK to single-task multiplex on them in select or
poll.
Does WSAPoll achieve what you want?
No idea; I'm not a Windows specialist, I've only worked on cross-platform
code on Windows.
Post by Richard Kettlewell
There are other less select/poll-like models for IO multiplexing in
Windows too. For instance in one project we used overlapped IO with the
completion routines dispatched from WaitForMultipleObjectsEx.
Looks like the usual over-complex MS dog's dinner to solve a simple problem.
James K. Lowden
2021-07-12 16:19:52 UTC
On Sat, 10 Jul 2021 09:01:17 +0000 (UTC)
Post by M***@qvyj9am89pyiehl99j.eu
Threads do have use cases, e.g. controlling GUIs with many
widgets that may have to do asynchronous tasks such as flashing
cursors
The Plan9 GUI programming model AIUI sets up 3 queues for CSP-style
programming: mouse and keyboard for input, and window for output.

https://www.usenix.org/publications/compsystems/1989/spr_pike.pdf

It's not clear that GUIs justify threads on any basis. To the best of
my knowledge, Jim Gettys's observation still stands that no multithreaded
implementation of the X11 server has ever outperformed the original
single-threaded model.

--jkl
M***@a1x.biz
2021-07-13 08:41:06 UTC
On Mon, 12 Jul 2021 12:19:52 -0400
Post by James K. Lowden
On Sat, 10 Jul 2021 09:01:17 +0000 (UTC)
Post by M***@qvyj9am89pyiehl99j.eu
Threads do have use cases, e.g. controlling GUIs with many
widgets that may have to do asynchronous tasks such as flashing
cursors
The Plan9 GUI programming model AIUI sets up 3 queues for CSP-style
programming: mouse and keyboard for input, and window for output.
https://www.usenix.org/publications/compsystems/1989/spr_pike.pdf
It's not clear that GUIs justify threads on any basis. To the best of
my knowledge, Jim Gettys's observation still stands that no multithreaded
implementation of the X11 server has ever outperformed the original
single-threaded model.
Possibly not, but it's not just down to performance. For the same reason
that multiprocessing is common in certain scenarios: it lets the OS sort
out concurrency so the programmer doesn't have to. It's quite possible to
do it using select/poll and timers, but beyond a certain level of simulated
concurrency it becomes arduous.
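To make the "arduous" point concrete: with select/poll, every piece of
simulated concurrency must be folded into one timeout by hand. A sketch of
the idiom for a single flashing cursor (toggle_cursor and handle_input are
hypothetical stubs; with dozens of widget timers, this bookkeeping is what
gets out of hand):

#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

static int cursor_on;

static void toggle_cursor(void) { cursor_on = !cursor_on; }  /* stub */

static void handle_input(int fd)
{
    char buf[256];
    read(fd, buf, sizeof buf);  /* consume the event (stub) */
}

void gui_loop(int input_fd)
{
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(input_fd, &rfds);

        /* The timeout is the time until the nearest deadline; with one
           timer it is a constant, with many it must be recomputed on
           every pass. */
        struct timeval tv = { 0, 500000 };  /* 500 ms cursor blink */
        int rc = select(input_fd + 1, &rfds, NULL, NULL, &tv);

        if (rc < 0)
            break;                      /* error handling elided */
        else if (rc == 0)
            toggle_cursor();            /* blink deadline expired */
        else if (FD_ISSET(input_fd, &rfds))
            handle_input(input_fd);     /* input arrived before the blink
                                           deadline; the remaining time must
                                           be tracked by hand (omitted) */
    }
}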
James K. Lowden
2021-07-12 16:19:48 UTC
On Fri, 9 Jul 2021 21:42:52 -0000 (UTC)
Post by Kaz Kylheku
Post by James K. Lowden
I've said many times that threads are a terrible model for
concurrent programming.
Threads are just something that most hackers get crazy about in their
programming puberty.
This is because, owing to their intellectual limitations, they hit a
wall as to the kinds of programming problems they are able to solve.
Not only does no popular programming language model multithreaded
execution; multithreaded execution doesn't have much of a model in
formal logic, either. I suspect those two voids are related, and not
related to my puberty.

Communicating sequential processes, conversely, has a strong
mathematical foundation.

The software world we live in is the one we deserve for standing on the
toes of giants.

--jkl
Scott Lurndal
2021-07-12 17:18:53 UTC
Post by James K. Lowden
On Fri, 9 Jul 2021 21:42:52 -0000 (UTC)
Post by Kaz Kylheku
Post by James K. Lowden
I've said many times that threads are a terrible model for
concurrent programming.
Threads are just something that most hackers get crazy about in their
programming puberty.
This is because, owing to their intellectual limitations, they hit a
wall as to the kinds of programming problems they are able to solve.
Not only does no popular programming language model multithreaded
execution; multithreaded execution doesn't have much of a model in
formal logic, either. I suspect those two voids are related, and not
related to my puberty.
It starts with the hardware, for example:

https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool

I would also argue that C++, for example, has a well-defined
memory model.
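As a minimal sketch of what that buys you, here is the classic
message-passing pattern written against C11's <stdatomic.h> (which adopted
essentially the same model as C++11); the release/acquire pair makes the
outcome well-defined:

#include <stdatomic.h>

int data;
atomic_int ready;

/* The release store forbids the store to data from moving past it. */
void produce(void)
{
    data = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* The acquire load guarantees data is visible once ready is observed. */
int consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return data;  /* guaranteed to read 42 */
}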
Post by James K. Lowden
Communicating sequential processes, conversely, has a strong
mathematical foundation.
Yes, I studied CSP in 1981 - was useful for creating formal
proofs. Wasn't a useful language.
M***@9hso4b520vypdrb_.org
2021-07-13 08:52:39 UTC
On Mon, 12 Jul 2021 17:18:53 GMT
Post by Scott Lurndal
Post by James K. Lowden
On Fri, 9 Jul 2021 21:42:52 -0000 (UTC)
Not only does no popular programming language model multithreaded
execution; multithreaded execution doesn't have much of a model in
formal logic, either. I suspect those two voids are related, and not
related to my puberty.
https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool
Most popular programming languages are simply a higher level version of
assembler - ie arithmetic, if/then, looping, jumps, calls etc - with higher
level data structures such as classes. The exceptions to this such as ML,
Prolog and similar tend not to be popular because people have trouble
thinking that way. The only exception to the rule has been SQL but that has
a very limited paradigm which isn't too hard to grasp.
Post by Scott Lurndal
I would also argue that C++, for example, has a well-defined
memory model.
Post by James K. Lowden
Communicating sequential processes, conversely, has a strong
mathematical foundation.
Yes, I studied CSP in 1981 - was useful for creating formal
proofs. Wasn't a useful language.
Formal proofs are a good idea for small algorithms but utterly hopeless for
any significantly sized system since all that happens is the bugs get
transferred from the code to the proof itself.
Rainer Weikusat
2021-07-13 14:24:26 UTC
Post by M***@9hso4b520vypdrb_.org
On Mon, 12 Jul 2021 17:18:53 GMT
Post by Scott Lurndal
Post by James K. Lowden
On Fri, 9 Jul 2021 21:42:52 -0000 (UTC)
Not only does no popular programming language model multithreaded
execution; multithreaded execution doesn't have much of a model in
formal logic, either. I suspect those two voids are related, and not
related to my puberty.
https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model
-tool
Most popular programming languages are simply a higher level version of
assembler - ie arithmetic, if/then, looping, jumps, calls etc
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like if/then/else
or loops. That's what differentiates high-level programming languages
from it.
M***@vu9ga1.gov.uk
2021-07-13 14:34:28 UTC
On Tue, 13 Jul 2021 15:24:26 +0100
Post by Rainer Weikusat
Post by M***@9hso4b520vypdrb_.org
On Mon, 12 Jul 2021 17:18:53 GMT
Post by Scott Lurndal
Post by James K. Lowden
On Fri, 9 Jul 2021 21:42:52 -0000 (UTC)
Not only does no popular programming language model multithreaded
execution; multithreaded execution doesn't have much of a model in
formal logic, either. I suspect those two voids are related, and not
related to my puberty.
https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool
Post by M***@9hso4b520vypdrb_.org
Post by Scott Lurndal
Most popular programming languages are simply a higher level version of
assembler - ie arithmetic, if/then, looping, jumps, calls etc
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like if/then/else
or loops. That's what differentiates high-level programming languages
from it.
What do you think operations such as jump-if-equals, jump-if-carry-set etc
are if not a type of if construct? Or are you just being pedantic?
Rainer Weikusat
2021-07-13 14:50:10 UTC
Post by M***@vu9ga1.gov.uk
On Tue, 13 Jul 2021 15:24:26 +0100
Post by Rainer Weikusat
Post by M***@9hso4b520vypdrb_.org
On Mon, 12 Jul 2021 17:18:53 GMT
Post by Scott Lurndal
Post by James K. Lowden
On Fri, 9 Jul 2021 21:42:52 -0000 (UTC)
Not only does no popular programming language model multithreaded
execution; multithreaded execution doesn't have much of a model in
formal logic, either. I suspect those two voids are related, and not
related to my puberty.
https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool
Post by M***@9hso4b520vypdrb_.org
Post by Scott Lurndal
Most popular programming languages are simply a higher level version of
assembler - ie arithmetic, if/then, looping, jumps, calls etc
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like if/then/else
or loops. That's what differentiates high-level programming languages
from it.
What do you think operations such as jump-if-equals, jump-if-carry-set etc
are if not a type of if construct? Or are you just being pedantic?
Because they aren't. They're conditional branches. Compilers for
high-level programming languages employ them to implement control
structures (like if/then/else). That's the difference between
"high-level languages" and "machine code".
M***@ozhp5ijbiubp.info
2021-07-13 16:06:46 UTC
On Tue, 13 Jul 2021 15:50:10 +0100
Post by Rainer Weikusat
Post by M***@vu9ga1.gov.uk
On Tue, 13 Jul 2021 15:24:26 +0100
Post by Rainer Weikusat
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like if/then/else
or loops. That's what differentiates high-level programming languages
from it.
What do you think operations such as jump-if-equals, jump-if-carry-set etc
are if not a type of if construct? Or are you just being pedantic?
Because they aren't. They're conditional branches. Compilers for
high-level programming languages employ them to implement control
structures (like if/then/else). That's the difference between
"high-level languages" and "machine code".
Assembler doesn't have for-next or while either; what's your point? There
is still a direct link between most high-level imperative language statements
and assembler instructions, unlike languages such as SQL or Prolog. But at least
we know you're a pedant now.
Rainer Weikusat
2021-07-13 16:14:45 UTC
Post by M***@ozhp5ijbiubp.info
On Tue, 13 Jul 2021 15:50:10 +0100
Post by Rainer Weikusat
Post by M***@vu9ga1.gov.uk
On Tue, 13 Jul 2021 15:24:26 +0100
Post by Rainer Weikusat
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like if/then/else
or loops. That's what differentiates high-level programming languages
from it.
What do you think operations such as jump-if-equals, jump-if-carry-set etc
are if not a type of if construct? Or are you just being pedantic?
Because they aren't. They're conditional branches. Compilers for
high-level programming languages employ them to implement control
structures (like if/then/else). That's the difference between
"high-level languages" and "machine code".
Assembler doesn't have for-next or while either; what's your point? There
is still a direct link between most high-level imperative language statements
and assembler instructions, unlike languages such as SQL or Prolog. But at least
we know you're a pedant now.
The only thing "we" know is that you have no clue about programming
language development and actively resent learning anything about it.
M***@f_ufpuushfeusv9g1f9y.gov.uk
2021-07-14 07:12:36 UTC
On Tue, 13 Jul 2021 17:14:45 +0100
Post by Rainer Weikusat
Post by M***@ozhp5ijbiubp.info
On Tue, 13 Jul 2021 15:50:10 +0100
Post by Rainer Weikusat
Post by M***@vu9ga1.gov.uk
On Tue, 13 Jul 2021 15:24:26 +0100
Post by Rainer Weikusat
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like if/then/else
or loops. That's what differentiates high-level programming languages
from it.
What do you think operations such as jump-if-equals, jump-if-carry-set etc
are if not a type of if construct? Or are you just being pedantic?
Because they aren't. They're conditional branches. Compilers for
high-level programming languages employ them to implement control
structures (like if/then/else). That's the difference between
"high-level languages" and "machine code".
Assembler doesn't have for-next or while either; what's your point? There
is still a direct link between most high-level imperative language
statements and assembler instructions, unlike languages such as SQL or
Prolog. But at least we know you're a pedant now.
The only thing "we" know is that you have no clue about programming
language development and actively resent learning anything about it.
That's the best riposte you've got? FWIW I've written a cut-down interpreted
version of C for a company I worked at, and I've written my own version of
BASIC along with plenty of other mini parsers for various tasks, so I do have
a vague idea of how these things work. You, however, are an arrogant twat,
but then we knew that already.
James K. Lowden
2021-07-14 20:00:51 UTC
On Tue, 13 Jul 2021 14:34:28 +0000 (UTC)
Post by M***@vu9ga1.gov.uk
Post by Rainer Weikusat
Post by M***@9hso4b520vypdrb_.org
Most popular programming languages are simply a higher level
version of assembler - ie arithmetic, if/then, looping, jumps,
calls etc
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like
if/then/else or loops. That's what differentiates high-level
programming languages from it.
What do you think operations such as jump-if-equals,
jump-if-carry-set etc are if not a type of if construct? Or are you
just being pedantic?
Rainer's not being pedantic. He's being precise.

Processors don't have loop or "else" opcodes. They don't have "call",
either, afaik. His statement that such constructs are what distinguish
"high level" languages from assembler isn't pedantry; it's the textbook
definition. It's why Fortran and Algol and C and Cobol were invented.
Not according to me; according to those who did the inventing.

I think the OP, in referencing "assembler", really meant a
class of imperative programming languages that sought, as a design
criterion, to be convertible to machine code. That would distinguish
them not just from logical and functional languages (Prolog, ML) but
also from Lisp.

I'm sure I read Dennis Ritchie say that C is an idealized assembler for
an idealized machine, but I've never been able to track it down. I
think it's quite accurate, for some value of "idealized".

--jkl
Keith Thompson
2021-07-14 20:42:07 UTC
"James K. Lowden" <***@speakeasy.net> writes:
[...]
Post by James K. Lowden
Processors don't have loop or "else" opcodes. They don't have "call",
either, afaik. His statement that such constructs are what distinguish
"high level" languages from assembler isn't pedantry; it's the textbook
definition. It's why Fortran and Algol and C and Cobol were invented.
Not according to me; according to those who did the inventing.
To be picky, some CPUs do have "call" instructions. Others have a
subroutine call instruction for which they use (or rather, for which the
assembler uses) a different name; for example ARM uses BL. And some
CPUs have different call-like instructions depending on how far away the
callee is from the caller in the address space.
Post by James K. Lowden
I think the OP, in referencing "assembler", really meant a
class of imperative programming languages that sought, as a design
criterion, to be convertible to machine code. That would distinguish
them not just from logical and functional languages (Prolog, ML) but
also from Lisp.
Well, just about any programming language can be converted to machine
code. The point of assembly language is that the mapping is precisely
defined. For a higher level language, the purpose of the mapping is to
implement the behavior defined by the source program.
Post by James K. Lowden
I'm sure I read Dennis Ritchie say that C is an idealized assembler for
an idealized machine, but I've never been able to track it down. I
think it's quite accurate, for some value of "idealized".
If you look at the range of languages from binary machine code to
assembly to C to whatever very-high-level language you like, in my
opinion the biggest semantic gap is between specifying instruction
sequences and specifying run-time behavior.

It's true that C, for example, is closer to the low end of abstraction
than to the high end (and yes, many of its features are strongly
inspired by common CPU instructions), but it's still "above" the
instructions vs. behavior semantic gap.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */
Scott Lurndal
2021-07-14 22:34:37 UTC
Post by James K. Lowden
On Tue, 13 Jul 2021 14:34:28 +0000 (UTC)
Post by M***@vu9ga1.gov.uk
Post by Rainer Weikusat
Post by M***@9hso4b520vypdrb_.org
Most popular programming languages are simply a higher level
version of assembler - ie arithmetic, if/then, looping, jumps,
calls etc
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like
if/then/else or loops. That's what differentiates high-level
programming languages from it.
What do you think operations such as jump-if-equals,
jump-if-carry-set etc are if not a type of if construct? Or are you
just being pedantic?
Rainer's not being pedantic. He's being precise.
Processors don't have loop or "else" opcodes.
SOB on the PDP-11 qualifies as a loop opcode (subtract one and branch).

and of course the "LOOP" and "LOOPcc" instructions on the Intel x86 processors.

Burroughs medium systems didn't offer an assembler at all, but had a
higher level language called BPL (Burroughs Programming Language) that
had constructs sufficient to write efficient low level code.
James K. Lowden
2021-07-15 16:38:26 UTC
On Wed, 14 Jul 2021 22:34:37 GMT
Post by Scott Lurndal
Post by James K. Lowden
Rainer's not being pedantic. He's being precise.
Processors don't have loop or "else" opcodes.
SOB on the PDP-11 qualifies as a loop opcode (subtract one and
branch).
OK, if you say so. In my mind, that doesn't qualify as a "loop opcode"
because it's just a branch. It supports:

if( --k ) goto foo;

but not

while( --k ) ++j;

because you need at least one more jump to complete the loop.
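Spelled out with gotos, the extra jump is visible:

/* while( --k ) ++j;  lowered by hand: besides the conditional branch
   out of the loop, the body needs an unconditional jump back to the
   test. */
int lowered_while(int k, int j)
{
top:
    if (--k == 0) goto done;  /* conditional branch out */
    ++j;
    goto top;                 /* the additional jump */
done:
    return j;
}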

--jkl
Keith Thompson
2021-07-13 21:01:50 UTC
Post by Rainer Weikusat
Post by M***@9hso4b520vypdrb_.org
On Mon, 12 Jul 2021 17:18:53 GMT
Post by Scott Lurndal
Post by James K. Lowden
On Fri, 9 Jul 2021 21:42:52 -0000 (UTC)
Not only does no popular programming language model multithreaded
execution; multithreaded execution doesn't have much of a model in
formal logic, either. I suspect those two voids are related, and not
related to my puberty.
https://developer.arm.com/architectures/cpu-architecture/a-profile/memory-model-tool
Most popular programming languages are simply a higher level version of
assembler - ie arithmetic, if/then, looping, jumps, calls etc
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like if/then/else
or loops. That's what differentiates high-level programming languages
from it.
In my opinion, the critical feature that differentiates assembly
languages from higher-level languages (including C) is that an assembly
language program specifies a sequence of CPU instructions, while a
program in a higher-level language specifies run-time behavior.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */
M***@waq_91fv.org
2021-07-14 07:26:35 UTC
On Tue, 13 Jul 2021 14:01:50 -0700
Post by Keith Thompson
Post by Rainer Weikusat
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like if/then/else
or loops. That's what differentiates high-level programming languages
from it.
In my opinion, the critical feature that differentiates assembly
languages from higher-level languages (including C) is that an assembly
language program specifies a sequence of CPU instructions, while a
program in a higher-level language specifies run-time behavior.
Bear in mind of course that a lot of CPU instructions these days are
essentially "high level" in that they have to be broken down into further
actions by the CPU itself, e.g. x86 trigonometric opcodes, so it's turtles
all the way down.
Keith Thompson
2021-07-14 07:58:40 UTC
Post by M***@waq_91fv.org
On Tue, 13 Jul 2021 14:01:50 -0700
Post by Keith Thompson
Post by Rainer Weikusat
"Assembler" (the name of a certain kind of program, namely, one which
translated machine code in mnemonic notation into the corresponding
numbers/ bit patterns) doesn't have control structures like if/then/else
or loops. That's what differentiates high-level programming languages
from it.
In my opinion, the critical feature that differentiates assembly
languages from higher-level languages (including C) is that an assembly
language program specifies a sequence of CPU instructions, while a
program in a higher-level language specifies run-time behavior.
Bear in mind of course that a lot of CPU instructions these days are
essentially "high level" in that they have to be broken down into further
actions by the CPU itself, e.g. x86 trigonometric opcodes, so it's turtles
all the way down.
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.

The point is that assembly language specifies those instructions;
higher-level languages do not (ignoring inline assembly).
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */
M***@xo6j81fgqjgx8x715co.co.uk
2021-07-14 08:22:50 UTC
On Wed, 14 Jul 2021 00:58:40 -0700
Post by Keith Thompson
Post by M***@waq_91fv.org
essentially "high level" in that they have to be broken down into further
actions by the CPU itself, e.g. x86 trigonometric opcodes, so it's turtles
all the way down.
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.
Do modern CPUs still use microcode?
Post by Keith Thompson
The point is that assembly language specifies those instructions;
higher-level languages do not (ignoring inline assembly).
Sure, but my point was that the logical constructs in a procedural language
are similar to those in assembler and generally map directly as 1 -> N.
The same cannot be said for languages such as SQL, Prolog etc. where there is
no direct mapping between a lot of their constructs and assembler.
Keith Thompson
2021-07-14 09:03:22 UTC
Post by M***@xo6j81fgqjgx8x715co.co.uk
On Wed, 14 Jul 2021 00:58:40 -0700
Post by Keith Thompson
Post by M***@waq_91fv.org
essentially "high level" in that they have to be broken down into further
actions by the CPU itself, e.g. x86 trigonometric opcodes, so it's turtles
all the way down.
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.
Do modern CPUs still use microcode?
I don't know.
Post by M***@xo6j81fgqjgx8x715co.co.uk
Post by Keith Thompson
The point is that assembly language specifies those instructions;
higher-level languages do not (ignoring inline assembly).
Sure, but my point was that the logical constructs in a procedural language
are similar to those in assembler and generally map directly as 1 -> N.
And my point is that no, they don't.

A given implementation of a compiled language maps source code to CPU
instructions, but that mapping is not defined by the language.

If I write
printf("Hello, world\n");
in a C program, nothing in the C language says anything about what CPU
instructions will be generated -- and of course the instructions
generated by a given implementation depend on the target CPU. What the
C language says is that the program's behavior is to send a particular
character sequence to the standard output stream. An implementation can
do anything it likes to achieve that behavior (including generating an
equivalent call to puts, which gcc just did for me).

If I write a switch statement, I expect the compiler to generate very
different instruction sequences depending on how the case values are
distributed.

If I write an assembly program containing
subq $16, %rsp
leaq .LC0(%rip), %rdi
call ***@PLT
I expect the assembler to generate subq, leaq, and call instructions.
Post by M***@xo6j81fgqjgx8x715co.co.uk
The same cannot be said for languages such as SQL, Prolog etc. where there is
no direct mapping between a lot of their constructs and assembler.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+***@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */
M***@37xjp3wvmwkjyu4x.tv
2021-07-14 09:31:02 UTC
On Wed, 14 Jul 2021 02:03:22 -0700
Post by Keith Thompson
Post by M***@xo6j81fgqjgx8x715co.co.uk
Sure, but my point was that the logical constructs in a procedural language
are similar to those in assembler and generally map directly as 1 -> N.
And my point is that no, they don't.
I'm not talking about the exact same mapping all the time; what I mean is
you can see the logical similarities between, for example, C code and
assembly language. This is not the case for higher-level languages.
Post by Keith Thompson
If I write
printf("Hello, world\n");
in a C program, nothing in the C language says anything about what CPU
instructions will be generated -- and of course the instructions
No, but you'll see the string defined in the assembler and a call to
some subroutine - or even the actual code - to output it. What else would
you expect to see in the assembler, a prime number generator?

However, if I write "select * from table" then, assuming it ever reaches
the assembly level, there would probably be hundreds or thousands of lines
of assembly code.
Richard Kettlewell
2021-07-14 09:23:00 UTC
Post by M***@xo6j81fgqjgx8x715co.co.uk
Post by Keith Thompson
Post by M***@waq_91fv.org
essentially "high level" in that they have to be broken down into further
actions by the CPU itself, e.g. x86 trigonometric opcodes, so it's turtles
all the way down.
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.
Do modern CPUs still use microcode?
Intel/AMD, yes. ARM, no (though there’s still a translation from the
surface ISA into micro-ops, at least in some devices).
--
https://www.greenend.org.uk/rjk/
Scott Lurndal
2021-07-14 15:05:53 UTC
Post by M***@xo6j81fgqjgx8x715co.co.uk
On Wed, 14 Jul 2021 00:58:40 -0700
Post by Keith Thompson
Post by M***@waq_91fv.org
essentially "high level" in that they have to be broken down into further
actions by the CPU itself, e.g. x86 trigonometric opcodes, so it's turtles
all the way down.
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.
Do modern CPUs still use microcode?
Not in the sense of the olden days.

There is a blob of loadable "stuff" that most Intel CPUs and older
AMD cpus would load from the BIOS that handled various things where
the possibility of bugs was higher than average, allowing such
bugs to be repaired without metal fixes. But it's not microcode
in the classic sense - most instructions are decoded and executed
in logic.
M***@u29fnm4ndcl379qydk5ln.tv
2021-07-14 16:06:39 UTC
On Wed, 14 Jul 2021 15:05:53 GMT
Post by Scott Lurndal
Post by M***@xo6j81fgqjgx8x715co.co.uk
On Wed, 14 Jul 2021 00:58:40 -0700
Post by Keith Thompson
Post by M***@waq_91fv.org
essentially "high level" in that they have to be broken down into further
actions by the CPU itself, e.g. x86 trigonometric opcodes, so it's turtles
all the way down.
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.
Do modern CPUs still use microcode?
Not in the sense of the olden days.
There is a blob of loadable "stuff" that most Intel CPUs and older
AMD cpus would load from the BIOS that handled various things where
the possibility of bugs was higher than average, allowing such
bugs to be repaired without metal fixes. But it's not microcode
in the classic sense - most instructions are decoded and executed
in logic.
No wonder the transistor count of modern CPUs is so phenomenally high in that
case :) I wouldn't want to even think about how you'd build a circuit in logic
gates to solve trig functions for example. Just doing division is hard enough
even in assembler if the instruction isn't provided by the CPU, going by a video
I watched.
Scott Lurndal
2021-07-14 17:30:40 UTC
Post by M***@u29fnm4ndcl379qydk5ln.tv
On Wed, 14 Jul 2021 15:05:53 GMT
Post by Scott Lurndal
Post by M***@xo6j81fgqjgx8x715co.co.uk
On Wed, 14 Jul 2021 00:58:40 -0700
Post by Keith Thompson
Post by M***@waq_91fv.org
essentially "high level" in that they have to be broken down into further
actions by the CPU itself, e.g. x86 trigonometric opcodes, so it's turtles
all the way down.
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.
Do modern CPUs still use microcode?
Not in the sense of the olden days.
There is a blob of loadable "stuff" that most Intel CPUs and older
AMD cpus would load from the BIOS that handled various things where
the possibility of bugs was higher than average, allowing such
bugs to be repaired without metal fixes. But it's not microcode
in the classic sense - most instructions are decoded and executed
in logic.
No wonder the transistor count of modern CPUs is so phenomenally high in that
case :) I wouldn't want to even think about how you'd build a circuit in logic
gates to solve trig functions for example. Just doing division is hard enough
even in assembler if the instruction isn't provided by the CPU, going by a video
I watched.
Well, nowadays, most of that is handled by the VHDL tools using libraries
of existing functionality.

//
// Add two BCD digits plus the carry_in, producing sum and carry_out
//
module bcd_adder(a,b,carry_in,sum,carry_out);

input [3:0] a,b;
input carry_in;

output [3:0] sum;
output carry_out;

reg [4:0] sum_temp;
reg [3:0] sum;
reg carry_out;

always @(a,b,carry_in)
begin
sum_temp = a+b+carry_in; //add all the inputs
if(sum_temp > 9) begin
sum_temp = sum_temp+6; //add 6, if result is more than 9.
carry_out = 1; //set the carry output
sum = sum_temp[3:0];
end
else begin
carry_out = 0;
sum = sum_temp[3:0];
end
end

endmodule

This gets compiled into the actual logic (gates & flops) during PD,
and is simply cascaded (carry_out -> carry_in) to do multiple digit
additions.

And here is a decoder that decodes a 4-bit index into
16 signals.

// 4x16 decoder

module decoder4x16(select_in, select_out, enable);
input[3:0] select_in;
input enable;
output[15:0] select_out;

assign select_out[0]= (~select_in[3]) & (~select_in[2]) &(~select_in[1]) & (~select_in[0]) & (enable) ;
assign select_out[1]= (~select_in[3]) & (~select_in[2]) &(~select_in[1]) & (select_in[0]) & (enable) ;
assign select_out[2]= (~select_in[3]) & (~select_in[2]) &(select_in[1]) & (~select_in[0]) & (enable) ;
assign select_out[3]= (~select_in[3]) & (~select_in[2]) &(select_in[1]) & (select_in[0]) & (enable) ;
assign select_out[4]= (~select_in[3]) & (select_in[2]) &(~select_in[1]) & (~select_in[0]) & (enable) ;
assign select_out[5]= (~select_in[3]) & (select_in[2]) &(~select_in[1]) & (select_in[0]) & (enable) ;
assign select_out[6]= (~select_in[3]) & (select_in[2]) &(select_in[1]) & (~select_in[0]) & (enable) ;
assign select_out[7]= (~select_in[3]) & (select_in[2]) &(select_in[1]) & (select_in[0]) & (enable) ;

assign select_out[8]= (select_in[3]) & (~select_in[2]) &(~select_in[1]) & (~select_in[0]) & (enable) ;
assign select_out[9]= (select_in[3]) & (~select_in[2]) &(~select_in[1]) & (select_in[0]) & (enable) ;
assign select_out[10]= (select_in[3]) & (~select_in[2]) &(select_in[1]) & (~select_in[0]) & (enable) ;
assign select_out[11]= (select_in[3]) & (~select_in[2]) &(select_in[1]) & (select_in[0]) & (enable) ;
assign select_out[12]= (select_in[3]) & (select_in[2]) &(~select_in[1]) & (~select_in[0]) & (enable) ;
assign select_out[13]= (select_in[3]) & (select_in[2]) &(~select_in[1]) & (select_in[0]) & (enable) ;
assign select_out[14]= (select_in[3]) & (select_in[2]) &(select_in[1]) & (~select_in[0]) & (enable) ;
assign select_out[15]= (select_in[3]) & (select_in[2]) &(select_in[1]) & (select_in[0]) & (enable) ;

endmodule

// V500 Operand Fetch PMUX
//
// Permutes the digits in a 10 digit BCD operand in various ways.
// sellect value originates in CREG[46:43].
//
// Reference: V500 Fetch Module 6.2.1.4 (pp 72) Document # 1993 5212.
//

module pmux(clock, sellect, datain, pmuxerr, pmuxout);

input clock;
input[3:0] sellect;
input[39:0] datain;
output pmuxerr;
output[39:0] pmuxout;

reg[39:0] pmuxout;
reg pmuxerr;

//always @(selprty or sellect or datain)
always @(posedge clock)
begin
pmuxerr <= 1'b0; // Spec is not clear on this

case (sellect[3:0])
4'b0000 : begin // Left Rotate by 1 digit
pmuxout[39:36] <= datain[35:32]; // D9 <= D8
pmuxout[35:32] <= datain[31:28]; // D8 <= D7
pmuxout[31:28] <= datain[27:24]; // D7 <= D6
pmuxout[27:24] <= datain[23:20]; // D6 <= D5
pmuxout[23:20] <= datain[19:16]; // D5 <= D4
pmuxout[19:16] <= datain[15:12]; // D4 <= D3
pmuxout[15:12] <= datain[11: 8]; // D3 <= D2
pmuxout[11: 8] <= datain[ 7: 4]; // D2 <= D1
pmuxout[ 7: 4] <= datain[ 3: 0]; // D1 <= D0
pmuxout[ 3: 0] <= datain[39:36]; // D0 <= D9
end
4'b0001, 4'b0010 : pmuxout <= datain; // Pass Data without permutation
4'b0011 : begin // Left Rotate by 3 digits
pmuxout[39:36] <= datain[27:24]; // D9 <= D6
pmuxout[35:32] <= datain[23:20]; // D8 <= D5
pmuxout[31:28] <= datain[19:16]; // D7 <= D4
pmuxout[27:24] <= datain[15:12]; // D6 <= D3
pmuxout[23:20] <= datain[11: 8]; // D5 <= D2
pmuxout[19:16] <= datain[ 7: 4]; // D4 <= D1
pmuxout[15:12] <= datain[ 3: 0]; // D3 <= D0
pmuxout[11: 8] <= datain[39:36]; // D2 <= D9
pmuxout[ 7: 4] <= datain[35:32]; // D1 <= D8
pmuxout[ 3: 0] <= datain[31:28]; // D0 <= D7
end
4'b0100 : begin // Left Rotate by 4 digits
pmuxout[39:36] <= datain[23:20]; // D9 <= D5
pmuxout[35:32] <= datain[19:16]; // D8 <= D4
pmuxout[31:28] <= datain[15:12]; // D7 <= D3
pmuxout[27:24] <= datain[11: 8]; // D6 <= D2
pmuxout[23:20] <= datain[ 7: 4]; // D5 <= D1
pmuxout[19:16] <= datain[ 3: 0]; // D4 <= D0
pmuxout[15:12] <= datain[39:36]; // D3 <= D9
pmuxout[11: 8] <= datain[35:32]; // D2 <= D8
pmuxout[ 7: 4] <= datain[31:28]; // D1 <= D7
pmuxout[ 3: 0] <= datain[27:24]; // D0 <= D6
end
4'b0101 : begin // Left Rotate by 5 digits
pmuxout[39:36] <= datain[19:16]; // D9 <= D4
pmuxout[35:32] <= datain[15:12]; // D8 <= D3
pmuxout[31:28] <= datain[11: 8]; // D7 <= D2
pmuxout[27:24] <= datain[ 7: 4]; // D6 <= D1
pmuxout[23:20] <= datain[ 3: 0]; // D5 <= D0
pmuxout[19:16] <= datain[39:36]; // D4 <= D9
pmuxout[15:12] <= datain[35:32]; // D3 <= D8
pmuxout[11: 8] <= datain[31:28]; // D2 <= D7
pmuxout[ 7: 4] <= datain[27:24]; // D1 <= D6
pmuxout[ 3: 0] <= datain[23:20]; // D0 <= D5
end
4'b0110 : begin // Left Rotate by 6 digits
pmuxout[39:36] <= datain[15:12]; // D9 <= D3
pmuxout[35:32] <= datain[11: 8]; // D8 <= D2
pmuxout[31:28] <= datain[ 7: 4]; // D7 <= D1
pmuxout[27:24] <= datain[ 3: 0]; // D6 <= D0
pmuxout[23:20] <= datain[39:36]; // D5 <= D9
pmuxout[19:16] <= datain[35:32]; // D4 <= D8
pmuxout[15:12] <= datain[31:28]; // D3 <= D7
pmuxout[11: 8] <= datain[27:24]; // D2 <= D6
pmuxout[ 7: 4] <= datain[23:20]; // D1 <= D5
pmuxout[ 3: 0] <= datain[19:16]; // D0 <= D4
end
4'b0111 : begin // Left Rotate by 7 digits
pmuxout[39:36] <= datain[11: 8]; // D9 <= D2
pmuxout[35:32] <= datain[ 7: 4]; // D8 <= D1
pmuxout[31:28] <= datain[ 3: 0]; // D7 <= D0
pmuxout[27:24] <= datain[39:36]; // D6 <= D9
pmuxout[23:20] <= datain[35:32]; // D5 <= D8
pmuxout[19:16] <= datain[31:28]; // D4 <= D7
pmuxout[15:12] <= datain[27:24]; // D3 <= D6
pmuxout[11: 8] <= datain[23:20]; // D2 <= D5
pmuxout[ 7: 4] <= datain[19:16]; // D1 <= D4
pmuxout[ 3: 0] <= datain[15:12]; // D0 <= D3
end
4'b1000 : begin // Left Rotate by 8 digits
pmuxout[39:36] <= datain[ 7: 4]; // D9 <= D1
pmuxout[35:32] <= datain[ 3: 0]; // D8 <= D0
pmuxout[31:28] <= datain[39:36]; // D7 <= D9
pmuxout[27:24] <= datain[35:32]; // D6 <= D8
pmuxout[23:20] <= datain[31:28]; // D5 <= D7
pmuxout[19:16] <= datain[27:24]; // D4 <= D6
pmuxout[15:12] <= datain[23:20]; // D3 <= D5
pmuxout[11: 8] <= datain[19:16]; // D2 <= D4
pmuxout[ 7: 4] <= datain[15:12]; // D1 <= D3
pmuxout[ 3: 0] <= datain[11: 8]; // D0 <= D2
end
4'b1001 : begin // Left Rotate by 9 digits
pmuxout[39:36] <= datain[ 3: 0]; // D9 <= D0
pmuxout[35:32] <= datain[39:36]; // D8 <= D9
pmuxout[31:28] <= datain[35:32]; // D7 <= D8
pmuxout[27:24] <= datain[31:28]; // D6 <= D7
pmuxout[23:20] <= datain[27:24]; // D5 <= D6
pmuxout[19:16] <= datain[23:20]; // D4 <= D5
pmuxout[15:12] <= datain[19:16]; // D3 <= D4
pmuxout[11: 8] <= datain[15:12]; // D2 <= D3
pmuxout[ 7: 4] <= datain[11: 8]; // D1 <= D2
pmuxout[ 3: 0] <= datain[ 7: 4]; // D0 <= D1
end
4'b1010 : begin // Non-Branch IA syllable
pmuxout[39: 20] <= 20'b00000000000000000000;
pmuxout[19:16] <= datain[35:32]; // D4 <= D8
pmuxout[15:12] <= datain[31:28]; // D3 <= D7
pmuxout[11: 8] <= datain[27:24]; // D2 <= D6
pmuxout[ 7: 4] <= datain[23:20]; // D1 <= D5
pmuxout[ 3: 0] <= datain[19:16]; // D0 <= D4
end
4'b1011 : begin // Extended IA syllable justify
pmuxout[39:24] <= 16'b0000000000000000;
pmuxout[23:20] <= datain[31:28]; // D5 <= D7
pmuxout[19:16] <= datain[27:24]; // D4 <= D6
pmuxout[15:12] <= datain[23:20]; // D3 <= D5
pmuxout[11: 8] <= datain[19:16]; // D2 <= D4
pmuxout[ 7: 4] <= datain[15:12]; // D1 <= D3
pmuxout[ 3: 0] <= datain[11: 8]; // D0 <= D2
end
4'b1100 : begin // Branch IA syllable justify (non-extended)
pmuxout[39:22] <= 18'b000000000000000000;
pmuxout[21:20] <= datain[37:36]; // D9 <= D9[1:0]
pmuxout[19:16] <= datain[35:32]; // D4 <= D8
pmuxout[15:12] <= datain[31:28]; // D3 <= D7
pmuxout[11: 8] <= datain[27:24]; // D2 <= D6
pmuxout[ 7: 4] <= datain[23:20]; // D1 <= D5
pmuxout[ 3: 0] <= datain[19:16]; // D0 <= D4
end
4'b1101 : begin // Strip Zone
pmuxout[39:36] <= datain[35:32]; // D9 <= D8
pmuxout[35:32] <= datain[27:24]; // D8 <= D6
pmuxout[31:28] <= datain[19:16]; // D7 <= D4
pmuxout[27:24] <= datain[11: 8]; // D6 <= D2
pmuxout[23:20] <= datain[ 3: 0]; // D5 <= D0
pmuxout[19: 0] <= 20'b00000000000000000000;
end
4'b1110 : begin // IX Rotate
pmuxout[39:36] <= datain[35:32]; // D9 <= D8
pmuxout[35:32] <= 4'b0000; // D8 <= 0
pmuxout[31:28] <= datain[39:36]; // D7 <= D9
pmuxout[27:24] <= 4'b0000; // D6 <= 0
pmuxout[23:20] <= datain[31:28]; // D5 <= D7
pmuxout[19:16] <= datain[27:24]; // D4 <= D6
pmuxout[15:12] <= datain[23:20]; // D3 <= D5
pmuxout[11: 8] <= datain[19:16]; // D2 <= D4
pmuxout[ 7: 4] <= datain[15:12]; // D1 <= D3
pmuxout[ 3: 0] <= datain[11: 8]; // D0 <= D2
end
4'b1111 : begin // Left rotate by 2
pmuxout[39:36] <= datain[31:28]; // D9 <= D7
pmuxout[35:32] <= datain[27:24]; // D8 <= D6
pmuxout[31:28] <= datain[23:20]; // D7 <= D5
pmuxout[27:24] <= datain[19:16]; // D6 <= D4
pmuxout[23:20] <= datain[15:12]; // D5 <= D3
pmuxout[19:16] <= datain[11: 8]; // D4 <= D2
pmuxout[15:12] <= datain[ 7: 4]; // D3 <= D1
pmuxout[11: 8] <= datain[ 3: 0]; // D2 <= D0
pmuxout[ 7: 4] <= datain[39:36]; // D1 <= D9
pmuxout[ 3: 0] <= datain[35:32]; // D0 <= D8
end
endcase
end
endmodule
M***@235e133vk70b96l1vv.ac.uk
2021-07-15 08:31:05 UTC
On Wed, 14 Jul 2021 17:30:40 GMT
Post by Scott Lurndal
Post by M***@u29fnm4ndcl379qydk5ln.tv
No wonder the transistor count of modern CPUs is so phenomenally high in that
case :) I wouldn't want to even think about how you'd build a circuit in logic
gates to solve trig functions for example. Just doing division is hard enough
even in assembler if the instruction isn't provided by the CPU, going by a
video I watched.
Well, nowadays, most of that is handled by the VHDL tools using libraries
of existing functionality.
Fair enough. But if there's a bug in the chip does any human have a chance of
figuring out where it is or how to fix it?
Post by Scott Lurndal
// Add two BCD digits plus the carry_in, producing sum and carry_out
//
module bcd_adder(a,b,carry_in,sum,carry_out);
Looks like a mix of a declarative and procedural language. Is that fair?
Scott Lurndal
2021-07-15 13:46:23 UTC
Post by M***@235e133vk70b96l1vv.ac.uk
On Wed, 14 Jul 2021 17:30:40 GMT
Post by Scott Lurndal
Post by M***@u29fnm4ndcl379qydk5ln.tv
No wonder the transistor count of modern CPUs is so phenomenally high in that
case :) I wouldn't want to even think about how you'd build a circuit in logic
gates to solve trig functions for example. Just doing division is hard enough
even in assembler if the instruction isn't provided by the CPU, going by a
video I watched.
Well, nowadays, most of that is handled by the VHDL tools using libraries
of existing functionality.
Fair enough. But if there's a bug in the chip does any human have a chance of
figuring out where it is or how to fix it?
Obviously, there must be some way to figure out where it is and how to
fix it, or no modern chip would ever get fabricated.

There are software tools that will simulate the logic given the VHDL input. Slow,
but comprehensive simulations of all signals and gates. We've a couple
of compute farms with thousands of processing cores available to run
simulations.

There are hardware logic simulators (known as emulators) that are large
arrays of Field Programmable Gate Arrays that are loaded with the design
(or in some cases, a subset of the design) and allow one to simulate the
entire design. These machines are very expensive, but essential for
timely development of complex designs.

There are floorplanning and place-and-route (PnR) tools to handle laying out the
gates, flops and wires on the silicon substrate to ensure timing is met
(sub-300-picosecond margins at 3 GHz) and for signal integrity (e.g. crosstalk
elimination or mitigation).

Ultimately a netlist is produced (taped out in the vernacular) and
delivered to the Fab (once upon a time, by 9-track tape, hence the
vernacular).
Post by M***@235e133vk70b96l1vv.ac.uk
Post by Scott Lurndal
// Add two BCD digits plus the carry_in, producing sum and carry_out
//
module bcd_adder(a,b,carry_in,sum,carry_out);
Looks like a mix of a declarative and procedural language. Is that fair?
Fair enough.

The fundamental difference between Verilog and C++ is that in Verilog
everything[*] happens in parallel, synchronized to an edge of a clock
signal.

[*] a simplification, but sufficient for this description.
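In C terms, non-blocking assignment is two-phase: every register's next
value is computed from the current state, and then everything commits at
once at the clock edge. A small sketch of that discipline (hypothetical
two-register design):

#include <stdio.h>

/* Software mock of Verilog non-blocking (<=) semantics: compute all
   next-state values from the *current* state, then commit together. */
struct state { int a, b; };

static struct state tick(struct state cur)
{
    struct state next;
    next.a = cur.b;  /* a <= b;  both right-hand sides see the old values, */
    next.b = cur.a;  /* b <= a;  so the two registers genuinely swap */
    return next;     /* the return is the commit: the "clock edge" */
}

int main(void)
{
    struct state s = { 1, 2 };
    s = tick(s);
    printf("a=%d b=%d\n", s.a, s.b);  /* a=2 b=1: a swap, unlike the
                                         sequential a = b; b = a; which
                                         would leave a == b == 2 */
    return 0;
}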
Nicolas George
2021-07-14 15:33:17 UTC
Keith Thompson, in the message
Post by Keith Thompson
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.
I always wonder if it would make sense to give software access to the
"underlying turtles", as you put it. Not for generic code, of course, but
for code that needs to be extremely fast, the bits that are currently
written as CPU-specific, and even CPU-generation-specific, assembly code,
like the FFTs in a codec.
M***@tqp8qqn5u.biz
2021-07-14 16:07:59 UTC
On 14 Jul 2021 15:33:17 GMT
Post by Nicolas George
Keith Thompson, in the message
Post by Keith Thompson
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.
I always wonder if it would make sense to give software access to the
"underlying turtles", as you put it. Not for generic code, of course, but
for code that needs to be extremely fast, the bits that are currently
written as CPU-specific, and even CPU-generation-specific, assembly code,
like the FFTs in a codec.
Probably far too complex to implement as you'd need a whole host of extra
assembler instructions to support it. If you need that kind of literally
to-the-metal logic then you're probably better off putting an FPGA into
your circuit.
Scott Lurndal
2021-07-14 17:37:10 UTC
Post by M***@tqp8qqn5u.biz
On 14 Jul 2021 15:33:17 GMT
Post by Nicolas George
Keith Thompson, in the message
Post by Keith Thompson
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.
I always wonder if it would make sense to give software access to the
"underlying turtles", as you put it. Not for generic code, of course, but
for code that needs to be extremely fast, the bits that are currently
written as CPU-specific, and even CPU-generation-specific, assembly code,
like the FFTs in a codec.
Probably far too complex to implement as you'd need a whole host of extra
assembler instructions to support it. If you need that kind of literally
to-the-metal logic then you're probably better off putting an FPGA into
your circuit.
The chip I'm currently working on has two dozen 2.5 GHz ARMv9 cores,
over a dozen high-end DSPs (digital signal processors), hardware blocks
to manage Ethernet packets (ingress, egress, classification,
routing, deep packet inspection, TLS initiation/termination,
and packet header manipulation)
and hardware blocks for machine learning and various proprietary
signal processing blocks. And a virtualizable hardware mechanism
to divide the hardware resources amongst virtual machines in a
secure, high-performance manner.
M***@q7hlm8jyc5498ve028.biz
2021-07-15 08:33:37 UTC
On Wed, 14 Jul 2021 17:37:10 GMT
Post by Scott Lurndal
The chip I'm currently working on has two dozen 2.5 GHz ARMv9 cores,
over a dozen high-end DSPs (digital signal processors), hardware blocks
to manage Ethernet packets (ingress, egress, classification,
routing, deep packet inspection, TLS initiation/termination,
and packet header manipulation)
and hardware blocks for machine learning and various proprietary
signal processing blocks. And a virtualizable hardware mechanism
to divide the hardware resources amongst virtual machines in a
secure, high-performance manner.
Sounds interesting. Can you tell us what it's for or is that classified?
M***@h4qtzo.eu
2021-07-15 14:36:19 UTC
On Thu, 15 Jul 2021 13:51:17 GMT
Post by M***@q7hlm8jyc5498ve028.biz
On Wed, 14 Jul 2021 17:37:10 GMT
Post by Scott Lurndal
The chip I'm currently working on has two dozen 2.5 GHz ARMv9 cores,
over a dozen high-end DSPs (digital signal processors), hardware blocks
to manage Ethernet packets (ingress, egress, classification,
routing, deep packet inspection, TLS initiation/termination,
and packet header manipulation)
and hardware blocks for machine learning and various proprietary
signal processing blocks. And a virtualizable hardware mechanism
to divide the hardware resources amongst virtual machines in a
secure, high-performance manner.
Sounds interesting. Can you tell us what it's for or is that classified?
5G cellular base stations.
Why does a base station need machine learning? It's simply a multiplexer.
M***@8stt_1g7x3hci.tv
2021-07-16 09:36:16 UTC
On Thu, 15 Jul 2021 16:05:58 GMT
Post by M***@h4qtzo.eu
5G cellular base stations.
Why does a base station need machine learning? It's simply a multiplexer.
Handling 5G requires far more than a multiplexer. Consider the
signal processing required for a radio head with a MIMO antenna
array. Once you've teased the data out of the hundreds of streams
active on the radio side, you need to process it, error correct it,
accommodate reflections from nearby obstructions, and produce a
data packet. That goes from the radio head to the base station
where it is converted from CPRI/eCPRI to IP packets (at multiples of
100Gbits/sec) and through a gateway to the internet.
I still don't see why any of that needs machine learning, it's just
bog-standard signal processing as done by 3G, 4G, and a host of other
protocols.
Scott Lurndal
2021-07-16 14:29:40 UTC
Permalink
Post by M***@8stt_1g7x3hci.tv
On Thu, 15 Jul 2021 16:05:58 GMT
Post by M***@h4qtzo.eu
5G cellular base stations.
Why does a base station need machine learning? It's simply a multiplexer.
Handling 5G requires far more than a multiplexer. Consider the
signal processing required for a radio head with a MIMO antenna
array. Once you've teased the data out of the hundreds of streams
active on the radio side, you need to process it, error-correct it,
accommodate reflections from nearby obstructions, and produce a
data packet. That goes from the radio head to the base station,
where it is converted from CPRI/eCPRI to IP packets (at multiples of
100Gbit/s) and through a gateway to the internet.
I still don't see why any of that needs machine learning, it's just
bog-standard signal processing
Actually, that's not the case. 5G is _quite_ different from 3G/4G/LTE;
we make chips for both.

And the use case for the ML is currently proprietary.
M***@_1oek_lp2ioz045hg.org
2021-07-17 15:06:24 UTC
Permalink
On Fri, 16 Jul 2021 14:29:40 GMT
Post by Scott Lurndal
Post by M***@8stt_1g7x3hci.tv
On Thu, 15 Jul 2021 16:05:58 GMT
Post by M***@h4qtzo.eu
5G cellular base stations.
Why does a base station need machine learning? It's simply a multiplexer.
Handling 5G requires far more than a multiplexer. Consider the
signal processing required for a radio head with a MIMO antenna
array. Once you've teased the data out of the hundreds of streams
active on the radio side, you need to process it, error-correct it,
accommodate reflections from nearby obstructions, and produce a
data packet. That goes from the radio head to the base station,
where it is converted from CPRI/eCPRI to IP packets (at multiples of
100Gbit/s) and through a gateway to the internet.
I still don't see why any of that needs machine learning, it's just
bog-standard signal processing
Actually, that's not the case. 5G is _quite_ different from 3G/4G/LTE;
we make chips for both.
And the use case for the ML is currently proprietary.
I strongly suspect the use case for ML is that it looks good in the sales
blurb.

Scott Lurndal
2021-07-14 17:32:05 UTC
Permalink
Post by Nicolas George
Keith Thompson , dans le message
Post by Keith Thompson
True, but not particularly relevant to my point. Each target CPU
exposes an instruction set, regardless of how it's implemented. One
chip in a family might implement a given instruction directly, another
in microcode. Software has no access to the underlying turtles.
I always wonder if it would make sense to give software access to the
"underlying turtles", as you put it. Not for generic code, of course, but
for code that needs to be extremely fast, the bits that are currently
written as CPU-specific, and even CPU-generation-specific, assembly code,
like the FFTs in a codec.
That's why ASIC SoCs have on-board DSPs and other offload engines
that the application programmer can offload such work to.
Nicolas George
2021-07-14 19:16:24 UTC
Permalink
Post by Scott Lurndal
That's why ASIC SoCs have on-board DSPs and other offload engines
that the application programmer can offload such work to.
I meant for generic CPUs. Possibly existing ones at the sole marginal cost
of a firmware update.
James K. Lowden
2021-07-14 20:00:43 UTC
Permalink
On Mon, 12 Jul 2021 17:18:53 GMT
Post by Scott Lurndal
I would also argue that C++, for example, has a well-defined
memory model.
I believe among Cox's "litmus tests" you'll find cases in which C++
produces different results on different machines because it provides no
guarantees (or insufficient guarantees). To the extent that the
language is an abstraction of the machine, that doesn't meet my
threshold for "well defined". I would say "not defined" is more
accurate.
Post by Scott Lurndal
Yes, I studied CSP in 1981 - was useful for creating formal
proofs. Wasn't a useful language.
That's a bit like saying relational algebra isn't a useful language,
isn't it? CSP is a model of computing, not a means.

The Go language is a means, in part based on that model.

--jkl
Kaz Kylheku
2021-07-14 18:21:23 UTC
Permalink
Post by M***@qvyj9am89pyiehl99j.eu
On Fri, 9 Jul 2021 21:42:52 -0000 (UTC)
Post by Kaz Kylheku
Post by James K. Lowden
I've said many times that threads are a terrible model for
concurrent programming.
Threads are just something that most hackers get crazy about in their
programming puberty.
This is because due to their intellectual limitations they hit a
wall as to the kinds of programming problems they are able to solve.
No popular programming language models multithreaded execution, and
formal logic doesn't offer much of a model for it either. I suspect
those two voids are related, and not related to my puberty.
Communicating sequential processes, conversely, has a strong
mathematical foundation.
On the other hand, threads duking it out over a shared heap: not so
much.

But any mundane problem in computing suddenly becomes dangerous and
exciting when transported into this realm!
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Scott Lurndal
2021-07-12 17:21:54 UTC
Permalink
Post by Kaz Kylheku
Post by James K. Lowden
https://research.swtch.com/mm
Russ Cox recently published two essays on threads and synchronization,
a/k/a memory models, one at the hardware level and another for
programming languages. A 3rd is in the works on the implications for
Go.
I've said many times that threads are a terrible model for concurrent
programming.
Threads are just something that most hackers get crazy about in their
programming puberty.
I disagree.

Threads are a highly useful paradigm for utilizing the resources
available on modern processors.

Consider, for example, that the most successful multithreaded
application on any hardware is the Operating System itself.

The fact that some programmers can't be bothered to learn how
to use them properly, or to learn about the underlying
hardware constraints and memory models, doesn't invalidate the
concept of threaded code in any way, shape, or form.
Gary R. Schmidt
2021-07-13 13:43:58 UTC
Permalink
On 13/07/2021 03:21, Scott Lurndal wrote:
[SNIP]
Post by Scott Lurndal
The fact that some programmers can't be bothered to learn how
to use them properly, or to learn about the underlying
hardware constraints and memory models, doesn't invalidate the
concept of threaded code in any way, shape, or form.
This, oh so many, many times this.

If you learn how to use threads properly, they can be a great solution
to some (but not all) problems, but if you only /think/ you know what
you are doing, it'll be a mess.

Cheers,
Gary B-)
--
Waiting for a new signature to suggest itself...
Kaz Kylheku
2021-07-14 18:52:24 UTC
Permalink
Post by Scott Lurndal
Post by Kaz Kylheku
Post by James K. Lowden
https://research.swtch.com/mm
Russ Cox recently published two essays on threads and synchronization,
a/k/a memory models, one at the hardware level and another for
programming languages. A 3rd is in the works on the implications for
Go.
I've said many times that threads are a terrible model for concurrent
programming.
Threads are just something that most hackers get crazy about in their
programming puberty.
I disagree.
Threads are a highly useful paradigm for utilizing the resources
available on modern processors.
Consider, for example, that the most successful multithreaded
application on any hardware is the Operating System itself.
Well, yes and no. Operating systems have concurrency, but it's not
necessarily the same as "threading".

If we look back at early Unix, we see that it's not multithreaded at
all. There is one processor, and when that processor is executing kernel
code, the code isn't preempted. Interrupts may happen, whose actions
are carefully constrained, and that's about it.

When the control flow reaches some scheduling point, such as a wait for
semaphore, then unknown actions occur before control resumes, due to
other tasks being dispatched in the meantime. Non-local assumptions
which were true before the suspension don't necessarily hold and so the
code must be defensively written.
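The same discipline survives in today's user-space APIs; here's a
minimal sketch with POSIX condition variables (items and not_empty are
invented names for illustration):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static int items;                  /* shared state, guarded by lock */

void consume_one(void)
{
    pthread_mutex_lock(&lock);
    /* The wait releases the lock and suspends; by the time we wake,
     * unknown actions have run, so the condition is re-tested in a
     * loop rather than assumed to still hold. */
    while (items == 0)
        pthread_cond_wait(&not_empty, &lock);
    items--;                       /* safe: re-verified under the lock */
    pthread_mutex_unlock(&lock);
}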

SMP support can be introduced into this paradigm with great care.

E.g. Linux first introduced non-preemptive SMP, and then evolved
the ability to opt into preemption at compile time (which is
an incredibly bad idea to enable).

In the user space, the Unix fathers were careful to avoid introducing
anything like threads; it was separate processes carefully communicating
via bit pipes. Shared memory mechanisms were carefully introduced in
subsequent years: specially arranged sharing of specific pieces of
memory.
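That model in miniature, as a hedged sketch (error handling omitted):
two processes sharing nothing but a byte pipe.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    pipe(fd);                      /* fd[0] read end, fd[1] write end */
    if (fork() == 0) {             /* child: no heap shared with parent */
        close(fd[0]);
        const char *msg = "hello over a bit pipe\n";
        write(fd[1], msg, strlen(msg));
        _exit(0);
    }
    close(fd[1]);
    char buf[64];
    ssize_t n = read(fd[0], buf, sizeof buf);
    if (n > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    wait(NULL);                    /* reap the child */
    return 0;
}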

In any case, the concurrent programming at the kernel level is
substantially more sane than the haphazard user space threading models
bolted onto processes. It's almost a different beast.

Kernel code isn't constrained by internal backward compatibility.
If some model of synchronization or whatever is found to be lacking
due to newer developments, code gets refactored.

For instance, in the introduction of SMP, interrupt disabling becomes
interrupt disabling combined with a spinlock; code gets globally
refactored to use that going forward. There is no support for old
kernel code.
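A sketch of that refactoring in Linux kernel style (counter and
counter_lock are invented for illustration; this is kernel code, not a
standalone program):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(counter_lock);
static int counter;

static void bump(void)
{
    unsigned long flags;

    /* Uniprocessor era: local_irq_save(flags) ... local_irq_restore(flags)
     * around the increment was enough. On SMP the idiom becomes
     * interrupt disabling combined with a spinlock: */
    spin_lock_irqsave(&counter_lock, flags);
    counter++;
    spin_unlock_irqrestore(&counter_lock, flags);
}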
Post by Scott Lurndal
The fact that some programmers can't be bothered to learn how
to use them properly, or to learn about the underlying
hardware constraints and memory models, doesn't invalidate the
concept of threaded code in any way, shape, or form.
The problem is that some of the people who have designed threading
interfaces for operating systems may actually be in this camp;
and even those who aren't are hamstrung by compatibility with
a non-threaded model.

Threads were simply unleashed like bulls into the streets, into a
process model with global resources like the current working directory,
signals and fork.
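One of those collisions in miniature: the working directory is
per-process, so a chdir() in any thread silently retargets every other
thread's relative paths (a sketch; the paths are illustrative):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *wanderer(void *arg)
{
    (void)arg;
    chdir("/tmp");                 /* changes the cwd of every thread */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, wanderer, NULL);
    pthread_join(t, NULL);

    /* This relative open now resolves under /tmp, whatever directory
     * main() started in. */
    FILE *f = fopen("data.txt", "r");
    printf("open %s\n", f ? "succeeded" : "failed");
    if (f)
        fclose(f);
    return 0;
}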

The resulting compromises are inadequate; the problems are not solved.

The non-threaded virtual machine programming model does not contain
unsolved problems; it's done.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Scott Lurndal
2021-07-14 22:25:50 UTC
Permalink
Post by Kaz Kylheku
Post by Scott Lurndal
Post by Kaz Kylheku
Post by James K. Lowden
https://research.swtch.com/mm
Russ Cox recently published two essays on threads and synchronization,
a/k/a memory models, one at the hardware level and another for
programming languages. A 3rd is in the works on the implications for
Go.
I've said many times that threads are a terrible model for concurrent
programming.
Threads are just something that most hackers get crazy about in their
programming puberty.
I disagree.
Threads are a highly useful paradigm for utilizing the resources
available on modern processors.
Consider, for example, that the most successful multithreaded
application on any hardware is the Operating System itself.
Well, yes and no. Operating systems have concurrency, but it's not
necessarily the same as "threading".
If we look back at early Unix,
Why bother? The discussion is about modern systems.
Post by Kaz Kylheku
SMP support can be introduced into this paradigm with great care.
Which was done by Burroughs and IBM in the 1960s, and
for Unix by various vendors in the '80s and '90s.
Post by Kaz Kylheku
E.g. Linux first introduced non-preemptive SMP, and then evolved
the ability to opt into preemption at compile time (which is
an incredibly bad idea to enable).
Linux was hardly state of the art at that point compared with the
dozen or so production SMP versions of Unix from many vendors.
Post by Kaz Kylheku
In the user space, the Unix fathers were careful to avoid introducing
anything like threads;
On a single processor PDP-11, there was no need.
Post by Kaz Kylheku
In any case, the concurrent programming at the kernel level is
substantially more sane than the haphazard user space threading models
bolted onto processes. It's almost a different beast.
Actually, from my experience (Chorus/Mix, various Unisys Unix SMP & MPP
operating systems, multithreaded mainframe operating systems (1985)
and two different bare-metal hypervisors) there is little difference
between OS threading and application threading. They both require
the same synchronization primitives and knowledge of the hardware
memory model.
Post by Kaz Kylheku
Post by Scott Lurndal
The fact that some programmers can't be bothered to learn how
to use them properly, or to learn about the underlying
hardware constraints and memory models, doesn't invalidate the
concept of threaded code in any way, shape, or form.
The problem is that some of the people who have designed threading
interfaces for operating systems may actually be in this camp;
Please, provide names. I know many of the folks that designed the
traditional Unix threading mechanisms (USL SVR4.2 ES/MP, Digital), and
without exception, they are very smart people. Dave B will complain,
with reason, about some of the choices that were made for 1003.4
(pthreads), but overall, what is there is far from useless.
Rainer Weikusat
2021-07-13 14:21:41 UTC
Permalink
Post by James K. Lowden
https://research.swtch.com/mm
Russ Cox recently published two essays on threads and synchronization,
a/k/a memory models, one at the hardware level and another for
programming languages. A 3rd is in the works on the implications for
Go.
I've tried to read through this, but at some point the pointless,
contrived example and rampant misuse of borrowed[*] terminology became
too annoying. Who is this guy, and why does he believe independent
memory accesses by one CPU should become visible to another CPU in any
particular order just because some other guy said so almost 50 years
ago?

[*] A "litmus test" is used to determine the ph value of some liquid
which is decidedly not "binary."
Scott Lurndal
2021-07-13 16:05:59 UTC
Permalink
Post by Rainer Weikusat
Post by James K. Lowden
https://research.swtch.com/mm
Russ Cox recently published two essays on threads and synchronization,
a/k/a memory models, one at the hardware level and another for
programming languages. A 3rd is in the works on the implications for
Go.
I've tried to read through this, but at some point the pointless,
contrived example and rampant misuse of borrowed[*] terminology became
too annoying. Who is this guy, and why does he believe independent
memory accesses by one CPU should become visible to another CPU in any
particular order just because some other guy said so almost 50 years
ago?
And his history is selective. He claims Plan 9 as the first true multiprocessor
x86 OS without a global lock in 1997 - yet Dynix/PTX had been shipping for almost
a decade at that point, and SVR4.2 ES/MP (highly scalable) and Chorus/Mix
development started in the late 1980s. Unisys was shipping Opus by 1997
with 32 processors running SVR4.2 ES/MP on top of the Chorus microkernel
with fine-grained locking.
Post by Rainer Weikusat
[*] A "litmus test" is used to determine the ph value of some liquid
which is decidedly not "binary.
And "litmus test" is commonly used in popular culture as
a binary. C'est la vie.
James K. Lowden
2021-07-14 20:00:47 UTC
Permalink
On Tue, 13 Jul 2021 16:05:59 GMT
Post by Scott Lurndal
He claims Plan 9 as the first true multiprocessor
x86 OS without a global lock in 1997 -
I'm not sure that's important to his point.
Post by Scott Lurndal
yet Dynix/PTX had been shipping for almost a decade at that point,
and SVR4.2 ES/MP (highly scalable) and Chorus/Mix development started
in the late 1980s. Unisys was shipping Opus by 1997 with 32
processors running SVR4.2 ES/MP on top of the Chorus microkernel with
fine-grained locking.
Are any of those freely available as open source? (Trying to avoid
loaded language.) I think Cox was restricting his claim to systems
whose global lock, or lack thereof, could be independently verified.

I found the effect of Intel's vagueness on Plan 9's development
illuminating.

--jkl
Scott Lurndal
2021-07-14 22:30:15 UTC
Permalink
Post by James K. Lowden
On Tue, 13 Jul 2021 16:05:59 GMT
Post by Scott Lurndal
He claims Plan 9 as the first true multiprocessor
x86 OS without a global lock in 1997 -
I'm not sure that's important to his point.
Post by Scott Lurndal
yet Dynix/PTX had been shipping for almost a decade at that point,
and SVR4.2 ES/MP (highly scalable) and Chorus/Mix development started
in the late 1980s. Unisys was shipping Opus by 1997 with 32
processors running SVR4.2 ES/MP on top of the Chorus microkernel with
fine-grained locking.
Are any of those freely available as open source? (Trying to avoid
loaded language.) I think Cox was restricting his claim to system's
whose global lock of lack thereof could be independently verified.
How about published papers? There are many that have been published
by ACM, IEEE, DIGITAL, USENIX, and others that describe the
operating systems above.

FWIW, I can verify all the above, having worked on the operating systems
for all of them at the time.

ES/MP in particular went through a very rigorous formal design process
for both the ES (Enhanced Security) and MP (Multiprocessor) features,
the latter of which introduced threading to Unix (an M:N model, as it
happens).
James K. Lowden
2021-07-15 16:38:23 UTC
Permalink
On Wed, 14 Jul 2021 22:30:15 GMT
Post by Scott Lurndal
Post by James K. Lowden
I think Cox was restricting his claim to systems
whose global lock, or lack thereof, could be independently verified.
How about published papers? There are many that have been published
by ACM, IEEE, DIGITAL, USENIX, and others that describe the
operating systems above.
That's a fair point, Scott. I'll concede Bell Labs frequently turned a
blind eye toward what IBM in particular ever did.

--jkl
James K. Lowden
2021-07-14 20:00:20 UTC
Permalink
On Tue, 13 Jul 2021 15:21:41 +0100
Post by Rainer Weikusat
Who is this guy
I think if you read more about Russ Cox, you'll find he's no fool.
Post by Rainer Weikusat
why does he believe independent memory accesses by one CPU should
become visible to another CPU in any particular order just because
some other guy said so almost 50 years ago?
I don't know who the "some other guy" is.

He's exploring what the programmer in a given language can expect from
the same source code running on different machines. Every programmer
has some model of memory in their heads -- correct or not -- but in
some cases what's "correct" holds only sometimes.

If the language promises anything, it is that defined behavior is
defined, regardless of compiler or OS or CPU. That was Java's
signature contribution 20-odd years ago, and even it was, shall we say,
overcome by events.

--jkl
Rainer Weikusat
2021-07-15 18:18:58 UTC
Permalink
Post by James K. Lowden
Post by Rainer Weikusat
Who is this guy
I think if you read more about Russ Cox, you'll find he's no fool.
Rather an "expert" used-car salesman :->.
Post by James K. Lowden
Post by Rainer Weikusat
why does he believe independent memory accesses by one CPU should
become visible to another CPU in any particular order just because
some other guy said so almost 50 years ago?
I don't know who the "some other guy" is?
He's referring to Leslie Lamport for the definition of "sequentially
consistent machine" which bascially means all CPUs will see all memory
accesses by any CPU in program order.
Post by James K. Lowden
He's exploring what the programmer in a given language can expect from
the same source code running on different machines. Every programmer
has some model of memory in their heads -- correct or not -- but in
some cases what's "correct" holds only sometimes.
In the real world, there are no ordering constraints on independent
memory accesses: because they're independent, the semantics of the code
cannot change when they're reordered. IOW, if there are ordering
constraints a compiler/CPU cannot deduce from the code alone, it needs
to be told about them.
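In C11, "telling it" might look like the usual publication idiom (a
minimal sketch; payload and ready are invented names):

#include <stdatomic.h>

static int payload;                /* plain data */
static atomic_int ready;           /* publication flag, starts at 0 */

void producer(void)
{
    payload = 42;
    /* Release store: orders the payload write before the flag. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

int consumer(void)
{
    /* Acquire load: orders the flag read before the payload read. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                          /* spin until published */
    return payload;                /* guaranteed to read 42 */
}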

Stamping a foot on the floor while crying "But that's not how I wanted
it to be!" won't help here. It's just another property of the machine
one needs to take into account when writing code. Nor does it help to
bury this simple fact, which is easily taken into account, beneath a
mountain of high-flying talk that makes it sound like something arcane
and difficult (best dealt with by language designers instead of "mere
programmers").
James K. Lowden
2021-07-15 20:17:57 UTC
Permalink
On Thu, 15 Jul 2021 19:18:58 +0100
Post by Rainer Weikusat
Post by James K. Lowden
Post by Rainer Weikusat
why does he believe independent memory accesses by one CPU should
become visible to another CPU in any particular order just because
some other guy said so almost 50 years ago?
I don't know who the "some other guy" is.
He's referring to Leslie Lamport for the definition of "sequentially
consistent machine" which bascially means all CPUs will see all memory
accesses by any CPU in program order.
Thank you. I would say anyone dismissing Leslie Lamport either doesn't
know what he's doing, or had better be very sure he does.
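For concreteness, the classic "store buffering" litmus test separates
Lamport's sequential consistency from what shipping hardware does. A
hedged C11 sketch (my names; run cpu_a and cpu_b on two threads, x and
y starting at 0): under sequential consistency the outcome
r1 == 0 && r2 == 0 is impossible, while on real x86 the store buffers
make it routinely observable unless the relaxed orderings below are
strengthened to memory_order_seq_cst.

#include <stdatomic.h>

static atomic_int x, y;            /* both start at 0 */
static int r1, r2;

void cpu_a(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
}

void cpu_b(void)
{
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
}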
Post by Rainer Weikusat
Post by James K. Lowden
Every programmer
has some model of memory in their heads -- correct or not -- but in
some cases what's "correct" holds only sometimes.
In the real world,
I'm very often introduced to the real world, as if I live elsewhere. I
know that wasn't your intent, though.
Post by Rainer Weikusat
there are no ordering constraints on independent memory accesses
That is part of Cox's point. There's hardly any more real-world
encounter with the behavior of memory caches on hardware than while
writing an OS and trying to determine what minimal guarantees the
hardware provides.

Let me make up my own litmus test (sorry, I just work here):

Processor A:
x = 1
y = x + 1

Processor B:
z = x + y

If B runs after A, z = 3. If before A, z = 0. If during A (between
the two assignments), z = 1. But in no event, under the Intel TSO
model, can z be 2, because to processor B, x must be visible before y.
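Rendered as a hedged C11 sketch (my names; everything starts at 0),
the point about language-level guarantees becomes concrete: with
relaxed atomics the language permits z == 2 even on hardware whose TSO
forbids it, so portable code has to request the ordering explicitly,
e.g. by making the store of y a release and the load of y an acquire.

#include <stdatomic.h>

static atomic_int x, y;            /* both start at 0 */

void processor_a(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_store_explicit(&y,
        atomic_load_explicit(&x, memory_order_relaxed) + 1,
        memory_order_relaxed);
}

int processor_b(void)
{
    /* C leaves the evaluation order of "x + y" unspecified, so the
     * two loads are made explicit: y first, then x. */
    int ys = atomic_load_explicit(&y, memory_order_relaxed);
    int xs = atomic_load_explicit(&x, memory_order_relaxed);
    return xs + ys;    /* 0, 1, or 3 under TSO; relaxed also allows 2 */
}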

As I understand Cox's paper, your assertion that there's "no [reliable]
ordering constraint" isn't quite true. The hardware may offer some
guarantees. Intel does (nowadays) and ARM does not. Your assertion is
*safe*, in the sense that by adhering to that rule as a programmer you
won't get caught relying on guarantees that aren't there. But it's not
optimal, because you'll sometimes introduce synchronization overhead
where guarantees are present.

This kind of thing is important to those whom you disdainfully call
language designers because the language need not reflect the
variablility of the hardware. Indeed, I would say it's a failing of
C++ to expose that variability. IMO language semantics demand
identical behavior across machines. (I do mean *semantics*, not
e.g. integer representation.) Count me in among those who wish for
DJB's boringcc.
(https://groups.google.com/g/boring-crypto/c/48qa1kWignU/m/o8GGp2K1DAAJ)

IOW, the language can make guarantees that the hardware does not. That
is what Cox says he's wrestling with: how will Go represent shared
memory? What guarantees will it make? IOW, what is the Go memory
model?

--jkl
Rainer Weikusat
2021-07-16 16:18:17 UTC
Permalink
Post by James K. Lowden
Post by Rainer Weikusat
Post by James K. Lowden
Post by Rainer Weikusat
why does he believe independent memory accesses by one CPU should
become visible to another CPU in any particular order just because
some other guy said so almost 50 years ago?
I don't know who the "some other guy" is?
He's referring to Leslie Lamport for the definition of "sequentially
consistent machine" which bascially means all CPUs will see all memory
accesses by any CPU in program order.
Thank you. I would say anyone dismissing Leslie Lamport either doesn't
know what he's doing, or had better be very sure he does.
An incorrect model of the operation of a multiprocessor remains
incorrect regardless of who published/created it.
Post by James K. Lowden
Post by Rainer Weikusat
Post by James K. Lowden
Every programmer
has some model of memory in their heads -- correct or not -- but in
some cases what's "correct" holds only sometimes.
In the real world,
I'm very often introduced to the real world, as if I live elsewhere. I
know that wasn't your intent, though.
This was referring to Lamport's model: real multiprocessors don't work
this way.
Post by James K. Lowden
Post by Rainer Weikusat
there are no ordering constraints on independent memory accesses
That is part of Cox's point. There's hardly any more real-world
encounter with the behavior of memory caches on hardware than while
writing an OS and trying to determine what minimal guarantees the
hardware provides.
Processor A:
x = 1
y = x + 1
Processor B:
z = x + y
If B runs after A, z = 3. If before A, z = 0. If during A (between
the two assignments), z = 1. But in no event, under the Intel TSO
model, can z be 2, because to processor B, x must be visible before y.
Before writing code that uses memory barriers for the first time, I
made a real effort to understand the Linux memory-barriers.txt
document, which is essentially a large collection of pseudo-code
detailing contrived examples, with single-letter variables, of things
that are not going to happen. I managed to extract the information I
needed from the other parts of the text and have since come to the
conclusion that such examples serve no purpose save trying to confuse
a reader as hard as possible.
Post by James K. Lowden
As I understand Cox's paper, your assertion that there's "no [reliable]
ordering constraint" isn't quite true. The hardware may offer some
guarantees. Intel does (nowadays) and ARM does not. Your assertion is
*safe*, in the sense that by adhering to that rule as a programmer you
won't get caught relying on guarantees that aren't there.
But it's not
optimal, because you'll sometimes introduce synchronization overhead
where guarantees are present.
Some hardware may or may not do something today, some other hardware
will or won't do something else tomorrow, yet other hardware did or
didn't do something different again last Friday, and poor code has to
run on all of that :-).