Discussion:
[Beowulf] A cluster of Arduinos
Lux, Jim (337C)
2012-01-11 16:18:41 UTC
Permalink
For educational purposes..

Has anyone done something where they implement some sort of message-passing API on a network of Arduinos? Since they cost only $20 each and have a fairly facile development environment, it seems you could put together a simple demonstration of parallel processing and various message-passing things.

For instance, you could introduce errors in the message links and do experiments with Byzantine Generals-type algorithms, or with multiple parallel routes, etc.

I've not actually tried hooking up multiple Arduinos through a USB hub to one PC, but if that works, it gives you a nice "head node, debug console" sort of interface.

Smaller, lighter, and cheaper than lashing together Mini-ITX mobos or building a Wal-Mart cluster.
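
For a flavor of what the per-node firmware might look like - a minimal sketch only, with the frame format, node ID, and baud rate all invented for illustration - something like this fits comfortably on a Uno:

// One node of a toy message-passing network, hanging off the PC "head
// node" via its USB serial port.  Hypothetical frame format:
//   0xAA, dst, src, len, payload[len], xor-checksum
const byte MY_ID   = 1;     // set differently on each node before uploading
const byte LED_PIN = 13;

void setup() {
  Serial.begin(9600);
  pinMode(LED_PIN, OUTPUT);
}

byte readByte() {                          // blocking read of one byte
  while (Serial.available() == 0) { }
  return Serial.read();
}

void loop() {
  if (Serial.available() == 0) return;
  if (readByte() != 0xAA) return;          // resynchronize on the start byte
  byte dst = readByte(), src = readByte(), len = readByte();
  byte sum = dst ^ src ^ len;
  byte buf[256];                           // len is a byte, so 256 suffices
  for (int i = 0; i < len; i++) {
    buf[i] = readByte();
    sum ^= buf[i];
  }
  if (readByte() != sum) return;           // drop corrupted frames silently
  if (dst != MY_ID) return;                // addressed to somebody else
  digitalWrite(LED_PIN, HIGH);             // blinky-light debugging
  Serial.write(0xAA);                      // echo the payload back to src
  Serial.write(src); Serial.write(MY_ID); Serial.write(len);
  byte rsum = src ^ MY_ID ^ len;
  for (int i = 0; i < len; i++) {
    Serial.write(buf[i]);
    rsum ^= buf[i];
  }
  Serial.write(rsum);
  digitalWrite(LED_PIN, LOW);
}

The head-node side would then just be a PC program that opens each board's USB serial port and speaks the same framing; deliberately corrupting the checksum byte is one cheap way to inject the link errors mentioned above.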
Prentice Bisbal
2012-01-11 16:58:59 UTC
Permalink
Post by Lux, Jim (337C)
For educational purposes..
Has anyone done something where they implement some sort of message-passing API on a network of Arduinos? [snip]
I started tinkering with Arduinos a couple of months ago. I got lots of
related goodies for Christmas, so I've been looking like a mad scientist
building Arduino things lately. I'm still a beginner Arduino hacker, but
I'd be game for giving this a try if anyone else wants to have a go.

The Arduino Due, which is overdue in the marketplace, will have a
Cortex-M3 ARM processor.

--
Prentice
Nathan Moore
2012-01-11 17:31:30 UTC
Permalink
I think something like the Raspberry Pi might be easier for this sort
of task. They'll also be about $25, but they'll run something like
ARM/Linux. Not out yet, though.

http://www.raspberrypi.org/
--
- - - - - - -   - - - - - - -   - - - - - - -
Nathan Moore
Associate Professor, Physics
Winona State University
- - - - - - -   - - - - - - -   - - - - - - -
Vincent Diepeveen
2012-01-11 17:44:43 UTC
Permalink
That's all very expensive, considering the CPUs are under $1, I'd guess.
I actually might need some of this stuff some months from now to
build some robots.
Lux, Jim (337C)
2012-01-11 17:58:13 UTC
Permalink
Yes.. better the widget that one can whip on down to Radio Shack and buy on the way home from work than the ghostware that may arrive come Christmas future.

Also, does the Raspberry Pi $25 price point include a power supply? The Arduino runs off the USB 5V power, so it's one less thing to hassle with.

I don't know that performance is all that important in this application. It's more to experiment with message passing in a multiprocessor system. Slow is fine.

(I can't think of a computational application for an ArdWulf - combining Italian and Saxon - that wouldn't be blown away by almost any single computer, including something like a smartphone.)

Realistically, you're looking at bitbanging kinds of serial interfaces.

I can see several network implementations: SPI shared bus, hypercubes, toroidal surfaces, etc.
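
To make the hypercube option concrete - a sketch, with the 8-node example assumed rather than taken from anything above - in a d-dimensional hypercube, node i's neighbors are exactly the IDs that differ from i in one bit, so the wiring list is a few lines of C++:

#include <cstdio>

// In a d-dimensional hypercube, node i's neighbors are the IDs that
// differ from i in exactly one bit, i.e. i XOR (1 << bit).
void hypercube_neighbors(unsigned i, unsigned d) {
  for (unsigned bit = 0; bit < d; bit++)
    std::printf("node %u <-> node %u\n", i, i ^ (1u << bit));
}

int main() {
  // 8 nodes form a 3-cube, so each Arduino needs 3 point-to-point links
  // (each link gets printed once from each end).
  for (unsigned i = 0; i < 8; i++)
    hypercube_neighbors(i, 3);
}

A toroidal surface is similar arithmetic: neighbors at (x±1 mod W, y) and (x, y±1 mod H).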


Chris Samuel
2012-01-12 01:04:32 UTC
Permalink
Post by Lux, Jim (337C)
Also, does the Raspberry Pi $25 price point include a power supply?
I thought the plan was for them to be powered from the HDMI connector,
but it appears I was wrong; it looks like it can use either micro-USB
or the GPIO header.

http://elinux.org/RaspberryPiBoard

# The board takes a fixed 5V input (with the 1V2 core voltage generated
# directly from the input using the internal switch-mode supply on the
# BCM2835 die). This permits adoption of the micro-USB form factor,
# which, in turn, prevents the user from inadvertently plugging in
# out-of-range power inputs; that would be dangerous, since the 5V
# would go straight to the HDMI and output USB ports. The problem
# should be mitigated by the protections applied to the input power:
# the board provides a polarity-protection diode, a voltage clamp,
# and a self-resetting semiconductor fuse.
--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
Lux, Jim (337C)
2012-01-12 01:22:07 UTC
Permalink
Interesting...
That seems to be a growing trend, then. So now we just have to wait for them to actually exist. The $35 B-style board has Ethernet, and assuming one could netboot and operate "headless", a stack o' Raspberry Pis and a cheap Ethernet switch might be an alternate approach.

The "per node" cost is comparable to the Arduino, and it's true that Ethernet is probably more congenial in the long run.

Drawing 700 mA off the micro-USB, though.. that's fairly hefty (although not a big deal in general). You might need a better power-supply scheme for a basket o' Pi cluster. (An Arduino Uno runs around 40-50 mA.)
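
(A rough power budget from those figures: eight Pis would draw 8 × 0.7 A = 5.6 A at 5 V, about 28 W - far beyond the 500 mA a single USB 2.0 port supplies - while eight Unos at ~50 mA draw about 0.4 A, roughly 2 W, which one powered hub can handle.)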


Hearns, John
2012-01-12 10:16:28 UTC
Permalink
Post by Lux, Jim (337C)
Interesting...
That seems to be a growing trend, then. So, now we just have to wait
for them to actually exist. The $35 B style board has Ethernet, and
assuming one could netboot and operate "headless", then a stack
o'raspberry PIs and a cheap Ethernet switch might be an alternate
approach.
Regarding Ethernet switches, I had cause recently to look for a
USB-powered switch. Such things exist; they are promoted for gamers.
http://www.scan.co.uk/products/8-port-eten-pw-108-pocket-size-metal-casing-10-100-switch-usb-powered-lan-party!

You could imagine a cluster being powered by those USB adapters which
fit into the cigarette-lighter socket of a car.
How about a cluster which fits in the glovebox or under the seat of a
car?


Prentice Bisbal
2012-01-12 14:28:56 UTC
Permalink
Post by Lux, Jim (337C)
Interesting...
That seems to be a growing trend, then. So, now we just have to wait for them to actually exist. The $35 B style board has Ethernet, and assuming one could netboot and operate "headless", then a stack o'raspberry PIs and a cheap Ethernet switch might be an alternate approach.
The "per node" cost is comparable to the Arduino, and it's true that Ethernet is probably more congenial in the long run.
You can get an Ethernet "shield" for the Arduino to add Ethernet
capability, but at $35-50 each, your cost savings just went out the
window, especially when compared to the Raspberry Pi. You can also buy
the Arduino Ethernet, which is an Arduino board with Ethernet built in,
but at a cost of ~$60 it is no better a value than buying an Arduino and
the Ethernet shield separately.
Post by Lux, Jim (337C)
Drawing 700 mA off the micro-USB, though.. that's fairly hefty (although not a big deal in general). You might need a better power-supply scheme for a basket o' Pi cluster. (An Arduino Uno runs around 40-50 mA.)
The Arduino can be powered by USB or a 9V power supply, so if you plan
on using lots of them (as Jim theoretically is), you don't have to
worry about overloading the USB bus.

--
Prentice
Vincent Diepeveen
2012-01-11 17:43:17 UTC
Permalink
Post by Prentice Bisbal
The Arduino Due, which is overdue in the marketplace, will have a
Cortex-M3 ARM processor.
The Cortex-M3 is a completely superior chip.

Though I couldn't program much for it so far - it's difficult to get
contract jobs for. It can do fast 32 x 32 bit multiplication.

You can even implement RSA very fast on that chip.
It runs at 70 MHz or so?

Usually, by the way, writing assembler for such CPUs is more efficient
than using a compiler. Compilers are not so efficient, to put it
politely, for embedded CPUs.

Writing assembler for such CPUs is pretty straightforward, whereas
in HPC things are far more complicated because of vectorization.

AVX is the latest there. Speaking of AVX, is there already much
HPC support for AVX?

I see that after years of wrestling, George Woltman released some
prime-number code (GWNUM) which uses AVX - of course, as always, in
beta for the remainder of this century.

Claims are that it's a tad faster than the existing SIMD codes. I saw
claims of even above 20% faster, which is really a lot at that level
of engineering; usually you work 6 months for a 0.5% speedup.

Even if you improve the algorithm, you still lose to this code, as
your C/C++ code will by default be a factor of 10 slower, if not more.

I remember how I found a clever caching trick in 2006 for a number
theoretic transform (that's an FFT, but over the integers, so without
the rounding errors that floating-point FFTs give), yet after some
hard work there my C code was still a factor of 8 slower than Woltman's
SIMD assembler.
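
For anyone who hasn't met one, a minimal number theoretic transform sketch (illustration only, nothing like Woltman's hand-tuned code): the prime 998244353 = 119 * 2^23 + 1 with primitive root 3 is a standard choice, and the butterfly is the usual Cooley-Tukey one with the complex roots of unity replaced by roots of unity mod p, so everything stays exact:

#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

const uint64_t P = 998244353, G = 3;

uint64_t powmod(uint64_t b, uint64_t e) {        // b^e mod P
  uint64_t r = 1;
  for (b %= P; e; e >>= 1, b = b * b % P)
    if (e & 1) r = r * b % P;
  return r;
}

// In-place iterative Cooley-Tukey over Z/pZ; invert=true gives the inverse.
void ntt(std::vector<uint64_t>& a, bool invert) {
  size_t n = a.size();                           // must be a power of two
  for (size_t i = 1, j = 0; i < n; i++) {        // bit-reversal permutation
    size_t bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) std::swap(a[i], a[j]);
  }
  for (size_t len = 2; len <= n; len <<= 1) {
    uint64_t w = powmod(G, (P - 1) / len);       // a len-th root of unity
    if (invert) w = powmod(w, P - 2);            // its modular inverse
    for (size_t i = 0; i < n; i += len) {
      uint64_t wn = 1;
      for (size_t j = 0; j < len / 2; j++) {
        uint64_t u = a[i + j], v = a[i + j + len / 2] * wn % P;
        a[i + j] = (u + v) % P;                  // exact integer butterfly:
        a[i + j + len / 2] = (u + P - v) % P;    // no rounding error at all
        wn = wn * w % P;
      }
    }
  }
  if (invert) {                                  // divide by n on the way back
    uint64_t ninv = powmod(n, P - 2);
    for (auto& x : a) x = x * ninv % P;
  }
}

int main() {
  std::vector<uint64_t> a = {1, 2, 3, 4};        // power-of-two length
  ntt(a, false);
  ntt(a, true);                                  // round trip prints: 1 2 3 4
  for (auto x : a) std::printf("%llu ", (unsigned long long)x);
  std::printf("\n");
}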
Lux, Jim (337C)
2012-01-11 18:00:36 UTC
Permalink
Post by Prentice Bisbal
The Arduino Due, which is overdue in the marketplace, will have a
Cortex-M3 ARM processor.
Writing assembler for such CPUs is pretty straightforward, whereas
in HPC things are far more complicated because of vectorization.
-->> Ah, but this is not really an HPC application. It's a cluster-computer-architecture demonstration platform. The Java-based Arduino environment is pretty simple and multiplatform. Yes, it uses a sort of weird C-like language, but there it is... it's easy to use.
Lux, Jim (337C)
2012-01-11 18:19:24 UTC
Permalink
Yes..
And there's been a bunch of "value clusters" over the years (StoneSouperComputer, for instance)..

But that's still $3k.

I could see putting together 8 nodes for a few hundred dollars. Arduino Uno R3 is about $25 each in quantity.

Think in terms of a small class where you want to have, say, 10 mini-clusters, one per student. No sharing, etc.



-----Original Message-----
From: Alex Chekholko [mailto:***@gmail.com]
Sent: Wednesday, January 11, 2012 10:12 AM
To: Lux, Jim (337C)
Cc: ***@beowulf.org
Subject: Re: [Beowulf] A cluster of Arduinos

The LittleFe cluster is designed specifically for teaching and demonstration. Current cost is ~$3k. But it's all standard x86 and runs Linux and even has GPUs.

http://littlefe.net/

I saw them build a bunch of them at SC11.
 It's a cluster computer architecture demonstration platform.
Vincent Diepeveen
2012-01-11 22:47:00 UTC
Permalink
Jim, your microcontroller cluster is not a very good idea.

Latency didn't keep up with CPU speeds...

Today's nodes have 12 CPU cores, soon 16, which can execute - to take
a simple integer example, my chess program and its IPC - about 24
instructions per cycle across the node.

So nothing SIMD, just simple integer instructions for most of it; of
course, loads which effectively come from L1 play an overwhelming role
there.

Typical latencies for a random memory read from a remote node, even
with the latest networks, are between 0.85 and 1.9 microseconds. Let's
take an optimistic 1 microsecond for an RDMA read...

So in that timeframe you can execute 24k+ instructions.

Effective IPC on the cheapo CPUs is far under 1 - around 0.25 for
most codes.

A 70 MHz CPU therefore executes roughly one instruction every 4 cycles.
Now we are working in rough measures here.

Let's call the message latency of this 'cluster' 1/4 millisecond.
Even USB 1.1 sticks have latencies far under 1 millisecond.

So the relative latency of today's clusters is a factor of 25k worse
than this 'cluster'.

In fact, your microcontroller cluster here has latencies that you do
not even get core to core within a single CPU today.

There is still too much 1980s and 1990s software out there, written by
the guys who wrote the books about how to parallelize, which simply
doesn't scale at all on modern hardware.

Let me not quote too many names there, as I've done before.

They were just too lazy to throw away their old code and start over,
writing a new parallel concept that works on today's hardware.

If we involve GPUs, there is going to be an even bigger problem: the
bandwidth of the network can't keep up with what a single GPU delivers.
Who is to blame for that is quite a complicated discussion, if anyone
has to be blamed at all.

We just need more clever algorithms there.
Vincent Diepeveen
2012-01-11 22:56:12 UTC
Permalink
Post by Vincent Diepeveen
Typical latencies for a random memory read from a remote node, even
with the latest networks, are between 0.85 and 1.9 microseconds. Let's
take an optimistic 1 microsecond for an RDMA read...
So in that timeframe you can execute 24k+ instructions.
Hah, how easy it is to make a mistake - sorry for that.

I didn't even multiply by the GHz frequency of the CPUs yet.

So if it's 3 GHz or so, it's actually closer to a factor of 75k than
24k.

Furthermore, another problem is that you can't fully load networks, of
course.

So to keep the network functioning well, you want to do such
hammering over the network no more than once every 750k instructions.
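
(To spell out that arithmetic: 1 microsecond × 3×10^9 cycles/second × 24 instructions/cycle = 72,000 instructions, hence the ~75k.)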
Lux, Jim (337C)
2012-01-11 23:24:55 UTC
Permalink
-----Original Message-----
From: beowulf-***@beowulf.org [mailto:beowulf-***@beowulf.org] On Behalf Of Vincent Diepeveen
Sent: Wednesday, January 11, 2012 2:47 PM
To: Beowulf Mailing List
Subject: Re: [Beowulf] A cluster of Arduinos

Jim, your microcontroller cluster is not a very good idea.

Latency didn't keep up with CPU speeds...

--- You're missing the point of the cluster. It's not for performance (where I can't imagine that the slowest single-CPU PC out there wouldn't blow the figurative doors off it). It's to provide a very inexpensive way to experiment with, play with, and demonstrate loosely coupled multiprocessor systems.

--> For example, you could experiment with redundant message routing across a fabric of nodes. The algorithms are fairly simple, and this gives you a testbed which is qualitatively different from just simulating a bunch of nodes on a single PC. There is pedagogical value in a system where you can force a link error by just disconnecting a cable, and the blinky lights on each node show what's going on.


There is still too much 1980s and 1990s software out there, written by the guys who wrote the books about how to parallelize, which simply doesn't scale at all on modern hardware.

--> I think that a lot of the theory of parallel processes is speed-independent, and while some historical approaches might not be used in a modern system for good implementation reasons, students and others still need to learn about them, if only as the canonical approach. Sure, you could do a simulation on a single PC (and I've seen them, in Simulink and in other more specialized tools), but there's a lot of appeal to a hands-on-the-cheap-hardware approach to learning.

--> To take an example, if you set a student the problem of lighting an LED on each node in a specified node order at specified intervals, where the node interconnects are not specified in advance, that's a fairly interesting homework problem. You have to discover the network connectivity graph, then figure out how to pass the message to the appropriate node at the appropriate time. This is a classic "hot plug network discovery" kind of problem, and in the face of intermittent links, it's of great interest.
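
A sketch of how a node might attack that discovery problem - everything here (pin assignments, baud rate, the HELLO framing) is invented for illustration, and a real Uno can only listen on one SoftwareSerial port at a time, so the links get polled in turn:

// Hypothetical discovery sketch: every node shouts HELLO on each of its
// point-to-point links and records which neighbor answers.
#include <SoftwareSerial.h>

const byte MY_ID = 1;                 // set differently on each node
SoftwareSerial linkA(2, 3);           // RX, TX to one neighbor
SoftwareSerial linkB(4, 5);           // RX, TX to another
SoftwareSerial* links[] = { &linkA, &linkB };
byte neighbor[2] = { 0, 0 };          // discovered IDs, 0 = unknown

void setup() {
  Serial.begin(9600);                 // USB link to head node / console
  linkA.begin(4800);
  linkB.begin(4800);
}

void loop() {
  for (byte i = 0; i < 2; i++) {
    links[i]->listen();               // switch the active soft-serial RX
    links[i]->write('H');             // HELLO, followed by our ID
    links[i]->write(MY_ID);
    delay(20);                        // give the neighbor time to answer
    while (links[i]->available() >= 2) {
      if (links[i]->read() == 'H') {  // a neighbor's HELLO carries its ID
        byte id = links[i]->read();
        if (neighbor[i] != id) {      // report new/changed links upstream
          neighbor[i] = id;
          Serial.print("link "); Serial.print(i);
          Serial.print(" -> node "); Serial.println(id);
        }
      }
    }
  }
  delay(500);                         // re-probe twice a second;
}                                     // unplugged links simply go silent

Missed HELLOs while a node is listening on its other link are exactly the kind of intermittent-link behavior the exercise is about.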

--> While that particular problem isn't exactly HPC, it DOES relate to HPC in a world where you cannot assume perfect processor nodes and perfect communication links. And that gets right to the whole "scalability" thing in HPC. It wasn't until the implementation of error-correcting codes in logic that something like the Q7A computer was even possible, because it was so large that you couldn't guarantee that all the tubes would be working all the time. Likewise with many other aspects of modern computing.

--> And, of course, in the spaceflight world, this kind of thing is even more important. A concept of growing importance is the "fractionated spacecraft", where all of the functions that would have been in one physical vehicle are spread across many smaller pieces, and one might reallocate the fractional pieces among different virtual spacecraft. Maybe right now you need a lot of processing power to do image compression and analysis, so you allocate a lot of "processing pieces" to the job, with an ad hoc network connection among them. Later, you don't need them, so you release them to other uses. The pieces might be in the immediate vicinity, or they might be some distance away, which affects the link's data rate and error rates.

--> You can legitimately ask whether this sort of thing (the fractionated spacecraft) is a Beowulf (defined as a cluster supercomputer built of commodity components), and I would say it shares many of the same properties, especially in the early Beowulf days before multicores and fancy interconnects were fashionable for multi-thousand-processor clusters. It's that idea of building a large, complex device out of many basically identical subunits, using open-source/simple software to manage it.


-->> In summary, it's not about performance.. it's about a teaching tool for networking in the context of cluster computing. You claim we need to cast off the shackles of old programming styles and get some new blood and ideas. Well, you need to get people interested in parallel computing and learning the basics (so at least they don't reinvent the square wheel). One way might be challenges such as parallelization of game play; another might be working with a parallelized database; the way I propose is experimenting with message-passing parallelization using dirt-cheap hardware.




Vincent Diepeveen
2012-01-12 00:36:37 UTC
Permalink
Yes, this was impossible to explain to a bunch of MIT folks as well,
some of whom wrote your book, I bet - yet the slower the processor,
the more of a true SMP system it is.

It's obvious that you missed that point.

Writing code for a multicore is tougher, from an SMP-constraints
viewpoint, than for a bunch of 70 MHz CPUs that have a millisecond of
latency to the other CPUs.

So it's far from demonstrating cluster programming. Light-years away.

Emulation on a simple quad-core is in fact more representative than
this.

If you want to get closer to cluster programming than this, just buy
yourself, off eBay, some Barcelona-core SMP system with 4 sockets -
say with energy-efficient 1.8 GHz CPUs.

That's with one of the first incarnations of HyperTransport, as of
course it dramatically improved later on.

Latency from CPU to CPU is some 300+ ns if you look up randomly.

Even good programmers in game-tree search have big problems working
with those latencies.

Clusters have latencies that are far worse than that. Yet as CPU
speeds no longer increase much and the number of cores doesn't double
that quickly, clusters are the way to go if you're CPU hungry.

Setting up small clusters is cheap as well. If I put the name
'mellanox' into eBay, I see bunches of cheap cards out there, and
also switches.

With a single switch you can teach half a dozen students. You can
just connect the machines you've already got onto a few switches and
write MPI code like that.

Average cost per student will also be a couple hundred dollars.

Vincent
Chris Samuel
2012-01-12 00:59:18 UTC
Permalink
Post by Vincent Diepeveen
So it's far from demonstrating cluster programming. Light-years away.
Whatever happened to hacking on hardware just for the fun of it?

Just because it's not going to be useful doesn't mean you won't learn
from the experience, even if the lesson is only "don't do it again".
:-)
--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
Lux, Jim (337C)
2012-01-12 01:09:53 UTC
Permalink
-----Original Message-----
From: beowulf-***@beowulf.org [mailto:beowulf-***@beowulf.org] On Behalf Of Vincent Diepeveen
Sent: Wednesday, January 11, 2012 4:37 PM
To: Beowulf Mailing List
Subject: Re: [Beowulf] A cluster of Arduinos

Yes, this was impossible to explain to a bunch of MIT folks as well, some of whom wrote your book, I bet - yet the slower the processor, the more of a true SMP system it is.

It's obvious that you missed that point.

Writing code for a multicore is tougher, from an SMP-constraints viewpoint, than for a bunch of 70 MHz CPUs that have a millisecond of latency to the other CPUs.

-> Yes, that's true... but that's also what I would think of as more advanced than understanding basic message passing or non-tightly-coupled multiprocessing systems. And there are lots of applications for the latter. Some might not be as sexy as others, but they exist.

So it's far from demonstrating cluster programming. Light-years away.
Emulation on a simple quad-core is in fact more representative than this.
If you want to get closer to cluster programming than this, just buy yourself, off eBay, some Barcelona-core SMP system with 4 sockets - say with energy-efficient 1.8 GHz CPUs.
That's with one of the first incarnations of HyperTransport, as of course it dramatically improved later on.
Latency from CPU to CPU is some 300+ ns if you look up randomly.
Even good programmers in game-tree search have big problems working with those latencies.

-> But that's an entirely different sort of problem space and instructional area.


Clusters have latencies that are far worse than that. Yet as CPU speeds no longer increase much and the number of cores doesn't double that quickly, clusters are the way to go if you're CPU hungry.
Setting up small clusters is cheap as well. If I put the name 'mellanox' into eBay, I see bunches of cheap cards out there, and also switches.

-> Oh, I'm sure the surplus market is full of things one could potentially use. But I suspect that by the time you lash together your $40 cards, $20 cables, and several-hundred-dollar switch, your total system price is >$1k. And you're using surplus, so there's a support issue. If you're tinkering for yourself in the garage or as a one-off, then surplus is a fine way to go. If you want to be able to give a teacher a list of "go buy this", it needs to be off-the-shelf, currently manufactured stuff.

-> Say you want to set up 10 demo systems with 8 nodes each, so that each student in a small class has their own to work with. There's a big difference between $30 Arduinos and $200 netbooks.

With a single switch you can teach half a dozen students. You can just connect the machines you've already got onto a few switches and write MPI code like that.

-> The whole point is to give a student exclusive access to the system, without needing to share. Sure, we've all done the shared "computer lab" resource thing and managed to learn. (In the late 1970s, I would have done quite a lot to have on-demand access to an 029 keypunch.) That's part of what *personal* computers are all about. My program doesn't work right? I just hit the reset button and start over.

-> I confess, too, that there is an aspect of the "mass of boards on the desktop with cables strewn around" which is a learning experience in itself. On the other hand, the Arduino experience is a lot less hassle than, say, a mass of PC mobos, network cards, and power supplies, and trying to get them all to boot off the net or a USB drive.


Average cost per student will also be a couple hundred dollars.

-> That's the "total cost of several thousand dollars divided by N students who share it", I suspect. We could get into a little BOM battle, and I'd venture that I can keep the off-the-shelf parts cost under $500 and give each student a dedicated system to play with. The only part that I don't know right off the top of my head is the actual interconnect hardware. I think you'd want to design some sort of board with a bunch of connectors that connects to the Arduinos with ribbon cables. But even there, that could be "here's your PCBExpress file.. order the board and you get 3 for $50".

-> Over the years I've been involved in several of these "what can we set up for a demonstration" exercises, and I've converged on the realization that what you need is a parts list (preferably preloaded at Newark or DigiKey or Mouser or similar) and an explicit set of instructions. A setup that starts out with:
1) Find 8 motherboards on eBay or Newegg with these sorts of specs
2) Find 8 power supplies that match the motherboards
is doomed to failure. You need "buy 3 of those and 6 of these, and hook them up this way".

This is the beauty of the whole Arduino culture. In fact, there's a bit too much of that.. there's not a lot of good overview tutorial material, but lots of "here's how to do specific task X"... I got started looking at Arduinos because I want to build a multichannel temperature controller to smoke/cure sausage.

But I've used just about every small single-board computer out there: Rabbit, BASIC Stamp, various PIC boards, etc., not to mention various Mini-ITX and PC schemes. So far, the Arduino is the winner on dirt cheap and simple combined. Spend $30, plug in a USB cable, load the Java environment, done. Now I know why all those projects at the science fair are using them. You get to focus on what you want to do, rather than on getting a computer working.

Vincent Diepeveen
2012-01-12 02:03:21 UTC
Permalink
The whole purpose of PCs is that they are generic to use. I remember
how, in the past, decision makers bought low-clocked junk for a big
price - much against the wishes of the sysadmins, who wanted a PC for
every student exclusively. Outdated slow junk is not interesting to
students. Now, you and I might like that CPU as it's under $1, but to
them it's just 70 MHz, a factor of 500 slower than a single core of
their home PC. What impresses is if you've got something that can beat
their own machine at home.

In the end, in science we basically learn a lot more easily if we can
take a look into the future - so being faster than a single PC is a
good example of that.

So let them do that. If you take care to launch one process on each
machine, then with quad-core machines - not to mention i7s with
hyperthreading - you can have 24 computers on one switch serving 24
students, each using 12 logical cores.

And for demonstration purposes you can also run successful
applications on all 24 computers at the same time.

There are switches with even more ports.

Average price per student is going to beat the crap out of any junk
solution you show up with - besides, how many are you going to buy?

Those computers are already there, one for each student, I suspect.

So they can toy away exclusively - for the switch it's not a real
problem unless they really mess up.

But most important, they learn something - by toying with 70 MHz
hardware that's not representative and only interesting to experts
like you and me, who are really good at embedded programming, they
don't learn much.

There is no replacement for the real thing to test on.

Besides, if you go program embedded processors, my good, fast
single-CPU code is probably going to kick the hell out of you writing
the same program for 8 CPUs - probably a factor of 10+ faster on a
single core than you on 8.

P.S. Not that it's disturbing, Jim, but your replies are always typed
within my original message, so it's sometimes tough to read what you
typed into the message I posted here - maybe this Apple MacBook Pro's
mail program doesn't know how to handle it. FYI, I want to reformat it
to Linux anyway - getting sick of being hacked silly each time by
about every other consultant - but this is all off topic, hence the
postscriptum.
Ellis H. Wilson III
2012-01-12 13:58:20 UTC
Permalink
Post by Vincent Diepeveen
Outdated slow junk is not interesting to students. [snip] In the end,
in science we basically learn a lot more easily if we can take a look
into the future - so being faster than a single PC is a good example
of that.
Take this advice into any other area, let's say Chemical Engineering or
Mechanical Engineering, and the students are going to come out of the
experience with anything from chemical burns to blowing up half of the
building. In the best case all they do is screw up very, very
expensive equipment. So I have to respectfully disagree that learning
is only possible, and students only interested, when working on the
stuff of the "future." I think this is likely the reason why many
introductory engineering classes incorporate use of Lego Mindstorm
robots rather than lunar rovers (or even overstock lunar rovers :D).

Case in point: I got interested in HPC/Beowulfery back in 2006, read
RGB's book and a few other texts on it, and finally found a small group
(4) of unused PIIIs to play on in the attic of one of my college's
buildings. Did I learn how to set up a reasonable cluster? Yes. Was it
slow as dirt compared to then-modern Intel and AMD processors? Of
course. But did the experience get me so completely hooked on
HPC/cluster research that I went on to pursue a PhD on the topic?
Absolutely.

Granted, I'm just one data point, but I think Jim's idea has all the
right components for a great educational experience.

Best,

ellis
Vincent Diepeveen
2012-01-12 14:39:23 UTC
Permalink
The average guy is not interested in knowing all the details of how
to play tennis with a wooden racket from the 1980s, from around the
time when McEnroe was out on the tennis court playing.

Most people are more interested in whether you can win that grand slam
with what you produce.

The nerds, however, are interested in how well you can do with a
wooden racket from the 1980s; therefore, projecting your own interest
onto those students will just get them disinterested, and you will be
judged by them as an irrelevant person in their life, whose name they
soon forget.

Vincent
Post by Ellis H. Wilson III
Post by Vincent Diepeveen
The whole purpose of PCs is that they are generic to use. I remember
how in the past the decision takers bought low-clocked junk for a big
price, much against the wish of the sysadmins, who wanted a PC for
every student exclusively. Outdated slow junk is not interesting
to students. Now you and I might like that CPU as it's under $1, but
to them it's just 70 MHz, a factor of 500 slower than their home PC's
single core. What impresses is if you've got something that can beat
their own machine at home.
In the end in science we basically learn a lot more easily if we can
take a look into the future - so being faster than a single PC is a
good example of that.
Take this advice into any other area, let's say Chemical Engineering or
Mechanical Engineering, and the students are going to come out of the
experience with anything from chemical burns to blowing up half of the
building. In the best case all they do is screw up very, very
expensive equipment. So I have to respectfully disagree that learning
is only possible, and students only interested, when working on the
stuff of the "future." I think this is likely the reason why many
introductory engineering classes incorporate use of Lego Mindstorm
robots rather than lunar rovers (or even overstock lunar rovers :D).
Case in point: I got interested in HPC/Beowulfery back in 2006, read
RGB's book and a few other texts on it, and finally found a small group
(4) of unused PIIIs to play on in the attic of one of my college's
buildings. Did I learn how to set up a reasonable cluster? Yes. Was it
slow as dirt compared to then-modern Intel and AMD processors? Of
course. But did the experience get me so completely hooked on
HPC/cluster research that I went on to pursue a PhD on the topic?
Absolutely.
Granted, I'm just one data point, but I think Jim's idea has all the
right components for a great educational experience.
Best,
ellis
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Prentice Bisbal
2012-01-12 14:50:05 UTC
Permalink
Post by Vincent Diepeveen
The average guy is not interested in knowing all the details of how
to play tennis with a wooden racket from the 1980s, from around the
time when McEnroe was out on the tennis court playing.
Most people are more interested in whether you can win that grand slam
with what you produce.
The nerds, however, are interested in how well you can do with a
wooden racket from the 1980s; therefore, projecting your own interest
onto those students will just get them disinterested, and you will be
judged by them as an irrelevant person in their life, whose name they
soon forget.
Vincent, I think the only person projecting here is you. You refer to
the 'average guy'. The word 'average' itself implies that statistics
have been collected and analyzed. Can you please show us your
statistics, and how you collected them, to determine what the average
guy is interested in? And what about the average girl, what is she
interested in? If you are merely citing the work of other researchers,
please include citations.

--
Prentice
Ellis H. Wilson III
2012-01-12 14:53:57 UTC
Permalink
Post by Prentice Bisbal
Post by Vincent Diepeveen
The average guy is not interested in knowing all the details of how
to play tennis with a wooden racket from the 1980s, from around the
time when McEnroe was out on the tennis court playing.
Most people are more interested in whether you can win that grand slam
with what you produce.
The nerds, however, are interested in how well you can do with a
wooden racket from the 1980s; therefore, projecting your own interest
onto those students will just get them disinterested, and you will be
judged by them as an irrelevant person in their life, whose name they
soon forget.
Vincent, I think the only person projecting here is you. You refer to
the 'average guy'. The word 'average' itself implies that statistics
have been collected and analyzed. Can you please show us your
statistics, and how you collected them, to determine what the average
guy is interested in? And what about the average girl, what is she
interested in? If you are merely citing the work of other researchers,
please include citations.
Guys, let's just let this one die in its traditional form: Vincent
disagrees with the list and there is nothing more that can be done. I
recently read a blog that suggested (due to similar threads following
these trajectories) that the Wulf list wasn't what it used to be.

Let's save the flames for editors,

ellis
Vincent Diepeveen
2012-01-12 15:13:00 UTC
Permalink
Post by Ellis H. Wilson III
Post by Prentice Bisbal
Post by Vincent Diepeveen
The average guy is not interested in knowing all the details of how
to play tennis with a wooden racket from the 1980s, from around the
time when McEnroe was out on the tennis court playing.
Most people are more interested in whether you can win that grand slam
with what you produce.
The nerds, however, are interested in how well you can do with a
wooden racket from the 1980s; therefore, projecting your own interest
onto those students will just get them disinterested, and you will be
judged by them as an irrelevant person in their life, whose name they
soon forget.
Vincent, I think the only person projecting here is you. You
refer to
the 'average guy'. The word 'average' itself implies that statistics
have been collected and analyzed. Can you please show us your
statistics, and how you collected them, to determine what the average
guy is interested in? And what about the average girl, what is she
interested in? If you are merely citing the work of other
researchers,
please include citations.
Guys, let's just let this one die in its traditional form: Vincent
disagrees with the list and there is nothing more that can be done. I
Ah, no medicine seems to cure you.
Let me recall Jim's original posting:

"it seems you could put together a simple demonstration of parallel
processing and various message passing things."

The insights presented here obviously render this platform no good
for that: it's not inspiring, the clever students will for sure get
totally disinterested, and a bunch of them, out of disinterest, will
probably not even finish the course.

Working with stuff that isn't even within a factor of 500 of the
speed of a normal CPU doesn't motivate, doesn't inspire, and teaches
a person very little.

Embedded CPUs are for professionals; leave it at that.

They are too hard for you to program efficiently.
Post by Ellis H. Wilson III
recently read a blog that suggested (due to similar threads following
these trajectories) that the Wulf list wasn't what it used to be.
Let's save the flames for editors,
ellis
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Vincent Diepeveen
2012-01-12 15:03:49 UTC
Permalink
Very simple.

Wooden tennis rackets were dirt cheap in the 90s.
No one bought them.

Instead they all bought a light-frame racket with a big blade for the
tennis court; in fact those were pretty expensive in some cases.

Why did no one suddenly use those wooden rackets anymore?

How many people will watch the upcoming Australian Grand Slam?

A lot.

How many will watch 1 or 2 dudes toy with a few embedded processors
using a language no one has heard of? Only a handful.
Post by Prentice Bisbal
Post by Vincent Diepeveen
The average guy is not interested in knowing all the details of how
to play tennis with a wooden racket from the 1980s, from around the
time when McEnroe was out on the tennis court playing.
Most people are more interested in whether you can win that grand slam
with what you produce.
The nerds, however, are interested in how well you can do with a
wooden racket from the 1980s; therefore, projecting your own interest
onto those students will just get them disinterested, and you will be
judged by them as an irrelevant person in their life, whose name they
soon forget.
Vincent, I think the only person projecting here is you. You refer to
the 'average guy'. The word 'average' itself implies that statistics
have been collected and analyzed. Can you please show us your
statistics, and how you collected them, to determine what the average
guy is interested in? And what about the average girl, what is she
interested in? If you are merely citing the work of other
researchers,
please include citations.
--
Prentice
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Vincent Diepeveen
2012-01-12 15:56:32 UTC
Permalink
Post by Ellis H. Wilson III
I think this is likely the reason why many
introductory engineering classes incorporate use of Lego Mindstorm
robots rather than lunar rovers (or even overstock lunar rovers :D).
I didn't comment on the other completely wrong examples, but I want
to highlight one. Your example of a Lego robot actually disproves
your statement.

Amongst the affordable non-self-built robots, the Lego robot actually
is a genius robot.

It is, so to speak, the i7-3960X among robots, to compare it with the
fastest i7 that has been released to date.

It is affordable, it is completely programmable with a robot OS,
and if you want to build something better you need to be pretty much
a genius.

A custom robot, unless you build a really simple, stupid thing that
can do next to nothing, will be really expensive compared to such a
Lego robot, which goes for only a couple of hundred dollars.

I see it for around 280 dollars online, and adding components is just
a few dozen dollars per component.
The normal way to build 'something better', if better at all,
requires building most components, for example, from aluminium.

Each component then has a price of say roughly $5k and needs to be
specially engineered. You need many of those components.

We assume then that it's not a commercial project, otherwise
royalties are also involved for every component you build; of course
that's a small part of the above price.

Most custom robots, which are hardly bigger in size than the Lego
robot, are actually pretty expensive.

If you want to purchase the components for a somewhat bigger robot,
just something with 4 wheels which can hold a couple of dozen kilos,
such components are already $5k - $10k.

And those are mass-produced components.

So building something that is actually more functional, better, is
not going to be easy.

It's a genius robot, it really is.

In itself it's not really a lot more expensive, if you produce
something in the quantities at which Lego produces it,
to build a bigger robot.

The reason the Lego robot is very small really has to do with safety.

Big robots are really dangerous, you know.

Cars already use dozens of CPUs; even 10+ year old cars easily have
over 100 CPUs inside, just for safety, with the intent that
components of the car don't harm people.

Robot software there is still far too primitive. No safety concerns
whatsoever.

In all that, the Lego robot is really a genius thing.

A very bad example of what you 'tried' to show with some fake
arguments.
Post by Ellis H. Wilson III
Case in point: I got interested in HPC/Beowulfery back in 2006, read
RGB's book and a few other texts on it, and finally found a small group
(4) of unused PIIIs to play on in the attic of one of my college's
buildings. Did I learn how to set up a reasonable cluster? Yes. Was it
slow as dirt compared to then-modern Intel and AMD processors? Of
course. But did the experience get me so completely hooked on
HPC/cluster research that I went on to pursue a PhD on the topic?
Absolutely.
Granted, I'm just one data point, but I think Jim's idea has all the
right components for a great educational experience.
Best,
ellis
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Ellis H. Wilson III
2012-01-12 17:35:11 UTC
Permalink
Post by Ellis H. Wilson III
I think this is likely the reason why many
introductory engineering classes incorporate use of Lego Mindstorm
robots rather than lunar rovers (or even overstock lunar rovers :D).
I didn't comment on the other completely wrong examples, but I want to
highlight one. Your example of a Lego robot actually disproves your statement.
It was a price comparison, and without diving into the nitty-gritty of
how good or bad both the Arduino and the Mindstorms are in their
respective areas, it was spot on. Jim wants to give each student a
10-node cluster on the cheap (i.e. 20 to 30 bucks per node = 300 bucks),
universities want to give each student (or sometimes teams of students)
a robot (~$280). Both provide an approachable level of difficulty and
potential for education at a reasonable price.

Feel free to continue to disagree for the sake of disagreeing. It was
just an example.

Best,

ellis
Vincent Diepeveen
2012-01-13 14:01:59 UTC
Permalink
Post by Ellis H. Wilson III
Post by Ellis H. Wilson III
I think this is likely the reason why many
introductory engineering classes incorporate use of Lego Mindstorm
robots rather than lunar rovers (or even overstock lunar rovers :D).
I didn't comment on the other completely wrong examples, but I want to
highlight one. Your example of a Lego robot actually disproves your
statement.
It was a price comparison, and without diving into the nitty-gritty
of how good or bad both the Arduino and the Mindstorms are in their
respective areas, it was spot on. Jim wants to give each student a
10-node cluster on the cheap (i.e. 20 to 30 bucks per node = 300
bucks), universities want to give each student (or sometimes teams
of students) a robot (~$280). Both provide an approachable level of
difficulty and potential for education at a reasonable price.
Feel free to continue to disagree for the sake of disagreeing. It
was just an example.
Best,
ellis
It's not even spot on. You're light-years away with your comparison.

You're comparing one of the best available robots, one that gets mass
produced, with some freak thing for which there are 100 alternatives
that work way better; alternatives that are 500x faster, and if you
want, also cheaper, and that above all achieve the original goal of
demonstrating SMP programming better, as the freak hardware, thanks
to its really low-clocked type of CPU, has negligible latency to the
other CPUs.

Where the robot shows you how to work with robots, the educational
purpose as Jim wrote it down won't be served very well by the
embedded CPUs, as the equipment has none of the typical problems you
can encounter in a normal SMP system, let alone a cluster
environment; meanwhile it has totally different problems, which you
will never encounter with regular CPUs.

Such as that embedded CPUs have severely limited caches and can
execute just 1 instruction at a time.

Embedded programming is totally different from CPU programming, and
embedded latencies, thanks to the slow processor speed, are not even
comparable with SMP programming between the cores of 1 CPU.

Such a multicore box definitely has a cost below $300.

On eBay I see nodes with 8 cores for $200.

And those are 500x faster.

Myself, I'm looking at some socket 771 Xeon machines, say with an
L5420. Though they eat a lot more power than Intel claims, it's
still, I guess, 170 watts a machine or so under full load.

Note we still skipped the algorithmic discussion, as from an
algorithmic viewpoint, if I look at artificial intelligence, getting
something to work on 70 MHz machines is going to behave totally
differently and needs a totally different approach than today's
hardware. It's not even in the same ballpark.

Vincent
Prentice Bisbal
2012-01-12 14:38:13 UTC
Permalink
Post by Vincent Diepeveen
The whole purpose of PCs is that they are generic to use.
That is also the purpose of the Arduino. That's why they open-sourced
its hardware design.
Post by Vincent Diepeveen
I remember
how in the past the decision takers bought low-clocked junk for a big
price, much against the wish of the sysadmins, who wanted a PC for
every student exclusively. Outdated slow junk is not interesting
to students. Now you and I might like that CPU as it's under $1, but
to them it's just 70 MHz, a factor of 500 slower than their home PC's
single core. What impresses is if you've got something that can beat
their own machine at home.
Wrong. What impresses students is teaching them something they didn't
already know, or showing them how to do something new. Using baking soda
and vinegar to build a volcano is very low-tech, but it still impresses
students of all ages (even in this modern Apple i-everything world), and
it's done with ingredients just about everyone already has in their
kitchen.

Show them sodium acetate crystallizing out of a supersaturated solution,
and their heads practically explode. Also very low-tech.

--
Prentice
Lux, Jim (337C)
2012-01-12 14:35:50 UTC
Permalink
Post by Hearns, John
Post by Lux, Jim (337C)
Interesting...
That seems to be a growing trend, then. So, now we just have to wait
for them to actually exist. The $35 B style board has Ethernet, and
assuming one could netboot and operate "headless", then a stack
o'raspberry PIs and a cheap Ethernet switch might be an alternate
approach.
Regarding Ethernet switches, I had cause recently to look for a
USB-powered switch.
Such things exist, they are promoted for gamers.
http://www.scan.co.uk/products/8-port-eten-pw-108-pocket-size-metal-casi
ng-10-100-switch-usb-powered-lan-party!
You could imagine a cluster being powered by those USB adapters which
fit into the cigarette
lighter socket of a car.
How about a cluster which fits in the glovebox or under the seat of a
car?
Powering off the cigarette lighter socket (or 12V power socket as they're
now labeled) is probably feasible, but those USB widgets can't source a
lot of power. Certainly not amps.
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Lux, Jim (337C)
2012-01-12 15:10:40 UTC
Permalink
Post by Vincent Diepeveen
The average guy is not interested in knowing all the details of how
to play tennis with a wooden racket from the 1980s, from around the
time when McEnroe was out on the tennis court playing.
Most people are more interested in whether you can win that grand slam
with what you produce.
The nerds, however, are interested in how well you can do with a
wooden racket from the 1980s; therefore, projecting your own interest
onto those students will just get them disinterested, and you will be
judged by them as an irrelevant person in their life, whose name they
soon forget.
Having spent some time recently in Human Resources meetings about how to
better recruit software people for JPL, I'd say that something that
appeals to nerds and gives them something to do is not all bad. Part of
the educational process is to find and separate the people who are
interested and have a passion. I'm not sure that someone who starts
getting into clusters mostly because they are interested in breaking into
the Top500 is the target audience in any case.

If you look over the hobby clusters out there, the vast majority are "hey,
I heard about this interesting idea, I scrounged up N old/small/slow/easy
to find computers and tried to cluster them and do something. I learned
something about cluster administration, and it was fun, but I don't use it
anymore"

This is exactly the population you want to hit. Bring in 100 advanced
high school (grade 11-12 in US) students. Have them all use cheap
hardware to do a cluster. Some fraction will think, "this is kind of
cool, maybe I should major in CS instead of X" Some fraction will think,
"how lame, why not make the single processor faster", and they can be
CompEng or EE majors looking at how to reduce feature sizes and get the
heat out.

It's just like biology or chemistry classes. In high school biology
(9th/10th grade) most of it is mundane memorization (Krebs cycle, various
descriptive stuff). Other than the use of cheap CMOS cameras, microscopes
used at this level haven't really changed much in the last 100 years (and
the microscopes at my kids' school are probably 10-20 years old). They
also do some more modern molecular biology in a series of labs partly
funded by Amgen: Some recombinant DNA to put fluorescent proteins in a
bacteria, running some gels, etc. The vast majority of the students will
NOT go on to a career in biology, but some fraction do, they get
interested in some aspect, and they wind up majoring in bio, or being a
pre-med, etc.

Not everyone is looking for the world beater. A lot of kids start with
Kart racing, even though even the fastest Karts aren't as fast as F1 (or
even a Smart Car). How many engineers started with dismantling the
lawnmower engine?


For my own work, I'd rather have people who are interested in solving
problems by ganging up multiple failure prone processors, rather than
centralizing it all in one monolithic box (even if the box happens to have
multiple cores).
Vincent Diepeveen
2012-01-12 15:21:54 UTC
Permalink
Post by Lux, Jim (337C)
Post by Vincent Diepeveen
The average guy is not interested in knowing all the details of how
to play tennis with a wooden racket from the 1980s, from around the
time when McEnroe was out on the tennis court playing.
Most people are more interested in whether you can win that grand slam
with what you produce.
The nerds, however, are interested in how well you can do with a
wooden racket from the 1980s; therefore, projecting your own interest
onto those students will just get them disinterested, and you will be
judged by them as an irrelevant person in their life, whose name they
soon forget.
Having spent some time recently in Human Resources meetings about how to
better recruit software people for JPL, I'd say that something that
appeals to nerds and gives them something to do is not all bad. Part of
the educational process is to find and separate the people who are
interested and have a passion. I'm not sure that someone who starts
getting into clusters mostly because they are interested in
breaking into
the Top500 is the target audience in any case.
If you look over the hobby clusters out there, the vast majority are "hey,
I heard about this interesting idea, I scrounged up N old/small/
slow/easy
to find computers and tried to cluster them and do something. I learned
something about cluster administration, and it was fun, but I don't use it
anymore"
This is exactly the population you want to hit. Bring in 100 advanced
high school (grade 11-12 in US) students. Have them all use cheap
hardware to do a cluster. Some fraction will think, "this is kind of
cool, maybe I should major in CS instead of X" Some fraction will think,
Your example here will just ensure that a big number of students
don't want anything to do with those studies, as there are a few lame
nerds there toying with equipment that's a factor of 50k slower
(adding to the factor of 500 the object-oriented slowdown of a factor
of 100) than what they have at home, and it can do nothing useful.

But in this specific case you'll just scare away students, and the
really clever ones will get totally disinterested, as you are busy
with lame-duck-speed CPUs.

If you'd build a small Mars rover with it, that would be something
else, of course.
Post by Lux, Jim (337C)
"how lame, why not make the single processor faster", and they can be
CompEng or EE majors looking at how to reduce feature sizes and get the
heat out.
It's just like biology or chemistry classes. In high school biology
(9th/10th grade) most of it is mundane memorization (Krebs cycle, various
descriptive stuff. Other than the use of cheap cmos cameras,
microscopes
used at this level haven't really changed much in the last 100
years (and
the microscopes at my kids' school are probably 10-20 years old). They
also do some more modern molecular biology in a series of labs partly
funded by Amgen: Some recombinant DNA to put fluorescent proteins in a
bacteria, running some gels, etc. The vast majority of the
students will
NOT go on to a career in biology, but some fraction do, they get
interested in some aspect, and they wind up majoring in bio, or being a
pre-med, etc.
Not everyone is looking for the world beater. A lot of kids start with
Kart racing, even though even the fastest Karts aren't as fast as F1 (or
even a Smart Car). How many engineers started with dismantling the
lawnmower engine?
For my own work, I'd rather have people who are interested in solving
problems by ganging up multiple failure prone processors, rather than
centralizing it all in one monolithic box (even if the box happens to have
multiple cores).
Ellis H. Wilson III
2012-01-12 17:26:01 UTC
Permalink
Post by Vincent Diepeveen
Post by Lux, Jim (337C)
This is exactly the population you want to hit. Bring in 100 advanced
high school (grade 11-12 in US) students. Have them all use cheap
hardware to do a cluster. Some fraction will think, "this is kind of
cool, maybe I should major in CS instead of X" Some fraction will think,
Your example here will just ensure that a big number of students
don't want anything to do with those studies, as there are a few lame
nerds there toying with equipment that's a factor of 50k slower
(adding to the factor of 500 the object-oriented slowdown of a factor
of 100) than what they have at home, and it can do nothing useful.
But in this specific case you'll just scare away students, and the
really clever ones will get totally disinterested, as you are busy
with lame-duck-speed CPUs.
You have made it abundantly clear you aren't interested in enrolling in
such a course. Thanks for your comments.

On a related note, as I was thinking about 'lame duck' education, I
remembered that I took an undergraduate machine learning course in which
we designed players for connect-four, which would compete using recently
learned techniques against other students in the class. Despite that
particular game being a solved one, we all had a blast and got quite
competitive trying to beat each other out using the recently acquired
skills. I would encourage Jim to do something similar once the basics
of cluster administration are done -- perhaps a mini SC Cluster
Competition would be a neat application for the Arduinos?

Best,

ellis
Lux, Jim (337C)
2012-01-12 18:10:24 UTC
Permalink
-----Original Message-----
From: beowulf-***@beowulf.org [mailto:beowulf-***@beowulf.org] On Behalf Of Ellis H. Wilson III
Sent: Thursday, January 12, 2012 9:26 AM
To: ***@beowulf.org
Subject: Re: [Beowulf] A cluster of Arduinos
Post by Vincent Diepeveen
Post by Lux, Jim (337C)
This is exactly the population you want to hit. Bring in 100
advanced high school (grade 11-12 in US) students. Have them all use
cheap hardware to do a cluster. Some fraction will think, "this is
kind of cool, maybe I should major in CS instead of X" Some fraction
will think,
Your example here will just ensure that a big number of students
don't want anything to do with those studies, as there are a few lame
nerds there toying with equipment that's a factor of 50k slower
(adding to the factor of 500 the object-oriented slowdown of a factor
of 100) than what they have at home, and it can do nothing useful.
But in this specific case you'll just scare away students, and the
really clever ones will get totally disinterested, as you are busy
with lame-duck-speed CPUs.
You have made it abundantly clear you aren't interested in enrolling in such a course. Thanks for your comments.

On a related note, as I was thinking about 'lame duck' education, I remembered that I took an undergraduate machine learning course in which we designed players for connect-four, which would compete using recently learned techniques against other students in the class. Despite that particular game being a solved one, we all had a blast and got quite competitive trying to beat each other out using the recently acquired skills. I would encourage Jim to do something similar once the basics of cluster administration are done -- perhaps a mini SC Cluster Competition would be a neat application for the Arduinos?



----------------------------------------
Ooohh.. that sounds *very* cool..

A bunch of slow processors.
A simple problem to solve (e.g. 3D tic-tac-toe) for which there might even be published parallel approaches
The challenge is effectively using the limited system, warts and all.
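
The obvious first cut, just as a sketch: split the candidate root
moves round-robin across the nodes and reduce to the best. Here
std::thread workers stand in for the slow nodes, and score_move() is
a made-up placeholder for the real game-tree search.

    #include <cstdio>
    #include <thread>
    #include <vector>

    // Placeholder static evaluation of one candidate root move; in
    // the real exercise this is where the search would go.
    int score_move(int move) { return (move * 37) % 101; }

    int main() {
        const int num_moves = 27;  // first moves on a 3x3x3 board
        const int num_nodes = 4;   // pretend each thread is one node
        std::vector<int> best_move(num_nodes, -1);
        std::vector<int> best_score(num_nodes, -1);
        std::vector<std::thread> nodes;

        for (int r = 0; r < num_nodes; r++)
            nodes.emplace_back([&, r] {
                // round-robin split of the root moves
                for (int m = r; m < num_moves; m += num_nodes) {
                    int s = score_move(m);
                    if (s > best_score[r]) {
                        best_score[r] = s;
                        best_move[r] = m;
                    }
                }
            });
        for (auto &t : nodes) t.join();

        int best = 0;              // "head node" reduces the answers
        for (int r = 1; r < num_nodes; r++)
            if (best_score[r] > best_score[best]) best = r;
        printf("best move %d with score %d\n",
               best_move[best], best_score[best]);
        return 0;
    }
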

The RaspberryPI might be a better vehicle, if it hits the price/availability targets: Comparable to Arduinos in price, but a bit more sophisticated and less contrived.


We've been talking about what kind of software competitions JPL could run as a recruiting tool at Universities, and that's along those lines. Hmm... I wonder if they'd be willing to spend recruiting funds on that? (probably not.. we're all poor this fiscal year)


And, on the undergrad education thing... At UCLA, I had to write stuff in MIXAL to run on a simulated MIX machine and complained mightily to the TAs, who just pointed to the sacred texts of Knuth, rather than giving an intelligent response as to why we didn't do something like work in PDP-11 ASM or System/360 BAL. (UCLA at the time had a monster 360, but I don't know that they had many 11s, and realistically, BAL is not something I'd inflict on 2nd quarter first year students. We were a PL/I or PL/C shop in the first couple years' classes for the most part, although there were people doing Algol)

OTOH, I suspect I was an atypical incoming student for 1977.

I had, the previous year, done the Pascal courses at UCSD with p-machines running on LSI-11s as well as the Pascal system on the big Burroughs B6700, which uses a form of ALGOL as the machine language and is a stack machine to boot (how cool is that? Burroughs always did have cool machines.. Hey, they built ILLIAC IV). I had also done some ASM stuff on an 11/20 under RT-11. I guess that's characteristic of the differences in philosophy between different CS departments (UCSD was heading more in the direction of Software Engineering being part of the School of Engineering and Applied Sciences, while at UCLA it was part of the Math department. Little did I know, as a cybernetics major, what the difference was: it sure as heck isn't manifested in the course catalog, at least in a form that an incoming student could discern. Going back now, I could probably look at catalogs from the various universities of the era and divine their philosophies, but that's clearly 20/20 hindsight)
Douglas Eadline
2012-01-12 16:49:25 UTC
Permalink
snip
Post by Lux, Jim (337C)
For my own work, I'd rather have people who are interested in solving
problems by ganging up multiple failure prone processors, rather than
centralizing it all in one monolithic box (even if the box happens to have
multiple cores).
This is going to be an exascale issue, i.e. how to compute on a system
whose parts might be in a constant state of breaking. Another interesting
question is how do you know you are getting the right answer on a *really*
large system?

Of course I spend much of my time optimizing really small
systems.
--
Doug
Lux, Jim (337C)
2012-01-12 17:54:52 UTC
Permalink
-----Original Message-----
From: Douglas Eadline [mailto:***@eadline.org]
Sent: Thursday, January 12, 2012 8:49 AM
To: Lux, Jim (337C)
Cc: ***@beowulf.org
Subject: Re: [Beowulf] A cluster of Arduinos

snip
Post by Lux, Jim (337C)
For my own work, I'd rather have people who are interested in solving
problems by ganging up multiple failure prone processors, rather than
centralizing it all in one monolithic box (even if the box happens to
have multiple cores).
This is going to be an exascale issue, i.e. how to compute on a system whose parts might be in a constant state of breaking. Another interesting question is how do you know you are getting the right answer on a *really* large system?

Of course I spend much of my time optimizing really small systems.

--

Your point about scaling is well taken.. so far, the computing world has largely dealt with things by trying to make the processor perfect and error free. Some limited areas of error correction are popular (RAM). But think in a bigger area... say your arithmetic unit has some infrequent unknown errors (e.g. FDIV bug on Pentium).. could clever algorithm design and multiple processors (or multi cores) mitigate this (e.g. instead of just computing Z = X/Y you also compute Z1 = (X*2)/(Y*2).. and compare answers... that exact example's not great because you've added 2 operations, but I can see that there are other clever techniques that might be possible.. )

What would be nice is if you can do things like temporal redundancy (do the calculation twice, and if the results differ, do it a third time), or even better some sort of "check calculation" that takes a small time compared to the mainline calculation.
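
A sketch of the shape I mean (flaky_divide() and its one-in-a-million
fault are of course invented for illustration):

    #include <cstdio>
    #include <cstdlib>

    // Stand-in for an arithmetic unit with rare silent errors
    // (think FDIV).
    double flaky_divide(double x, double y) {
        double z = x / y;
        if (rand() % 1000000 == 0) z *= 1.0000003;  // silent corruption
        return z;
    }

    // Temporal redundancy: run it twice; on disagreement run a third
    // pass and take the majority.
    double checked_divide(double x, double y) {
        double a = flaky_divide(x, y);
        double b = flaky_divide(x, y);
        if (a == b) return a;              // the common, cheap case
        double c = flaky_divide(x, y);     // tie-breaker
        if (c == a || c == b) return c;    // two out of three agree
        fprintf(stderr, "unrecoverable arithmetic fault\n");
        abort();
    }

    int main() {
        double acc = 0.0;
        for (int i = 1; i <= 1000000; i++)
            acc += checked_divide(1.0, (double)i);
        printf("sum = %.10f\n", acc);
        return 0;
    }

The interesting research question is exactly the one above: when can
the second (or third) pass be replaced by a much cheaper check
calculation instead of a full recompute.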

This, I think, is somewhere that even the big iron/cluster folks could be doing some research. What are optimum communication fabrics to support this kind of "side calculation" which may have different communication patterns and data flow than the "mainline". It has a parallel in things like CRC checks in communications protocols. A lot of hardware has a dedicated little CRC checker that is continuously calculating the CRC as the bits arrive, so that when you get to the end of the frame, the answer is already there.
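
For reference, the software version of that trick is just a CRC
register folded forward as each byte arrives; the polynomial below
(plain CRC-8, x^8+x^2+x+1) is an arbitrary choice for the example.

    #include <cstdio>
    #include <cstdint>
    #include <cstring>

    // Fold one received byte into the running CRC
    // (poly 0x07, init 0x00, no reflection).
    uint8_t crc8_update(uint8_t crc, uint8_t byte) {
        crc ^= byte;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
        return crc;
    }

    int main() {
        const char *frame = "123456789";
        uint8_t crc = 0;
        for (size_t i = 0; i < strlen(frame); i++)
            crc = crc8_update(crc, (uint8_t)frame[i]); // per byte
        printf("CRC-8 of frame: 0x%02X\n", crc);       // prints 0xF4
        return 0;
    }

By the time the last byte of the frame arrives, the check value is
already sitting in crc; the hardware does the same thing a bit at a
time.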


And Doug, your small systems have a lot of the same issues, perhaps because that small Limulus might be operated in environments other than what the underlying hardware was designed for. I know people who have been rudely surprised when they found that the design environment for a laptop is a pretty narrow temperature range (e.g. office desktop) and when they put them in a car, subject to 0C or 40C temperatures, if not wider, that things don't work quite as well as expected.

Very small systems (few nodes) have the same issues, in some environments (e.g. a cluster subject to single event upsets or functional interrupts in a high radiation environment with a lot of high energy charged particles. it's not so much a total dose thing, but a SEE thing)

For Juno (which is in polar orbit around Jupiter), we shielded everything in a vault (a 1 meter cube with 1cm thick titanium walls) and still it's an issue. We don't get very long before everything is cooked.

And I think that with a non-trivially small cluster (e.g. more than 4 nodes) you could do a lot of experimentation on techniques.


(oddly, simulated fault injection is one of the trickier parts)
Douglas Eadline
2012-01-13 15:18:02 UTC
Permalink
Post by Lux, Jim (337C)
-----Original Message-----
Sent: Thursday, January 12, 2012 8:49 AM
To: Lux, Jim (337C)
Subject: Re: [Beowulf] A cluster of Arduinos
snip
Post by Lux, Jim (337C)
For my own work, I'd rather have people who are interested in solving
problems by ganging up multiple failure prone processors, rather than
centralizing it all in one monolithic box (even if the box happens to
have multiple cores).
This is going to be an exascale issue, i.e. how to compute on a system
whose parts might be in a constant state of breaking. Another interesting
question is how do you know you are getting the right answer on a *really*
large system?
Of course I spend much of my time optimizing really small systems.
--
Your point about scaling is well taken.. so far, the computing world has
largely dealt with things by trying to make the processor perfect and
error free. Some limited areas of error correction are popular (RAM).
But think in a bigger area... say your arithmetic unit has some infrequent
unknown errors (e.g. FDIV bug on Pentium).. could clever algorithm design
and multiple processors (or multi cores) mitigate this (e.g. instead of
just computing Z = X/Y you also compute Z1 = (X*2)/(Y*2).. and compare
answers... that exact example's not great because you've added 2
operations, but I can see that there are other clever techniques that
might be possible.. )
What would be nice is if you can do things like temporal redundancy (do
the calculation twice, and if the results differ, do it a third time), or
even better some sort of "check calculation" that takes a small time
compared to the mainline calculation.
This, I think, is somewhere that even the big iron/cluster folks could be
doing some research. What are optimum communication fabrics to support
this kind of "side calculation" which may have different communication
patterns and data flow than the "mainline". It has a parallel in things
like CRC checks in communications protocols. A lot of hardware has a
dedicated little CRC checker that is continuously calculating the CRC as
the bits arrive, so that when you get to the end of the frame, the answer
is already there.
And Doug, your small systems have a lot of the same issues, perhaps
because that small Limulus might be operated in environments other than
what the underlying hardware was designed for. I know people who have
been rudely surprised when they found that the design environment for a
laptop is a pretty narrow temperature range (e.g. office desktop) and when
they put them in a car, subject to 0C or 40C temperatures, if not wider,
that things don't work quite as well as expected.
I will be curious to see where these things show up since
all you really need is a power plug. (a little nervous actually).
Post by Lux, Jim (337C)
Very small systems (few nodes) have the same issues, in some environments
(e.g. a cluster subject to single event upsets or functional interrupts in
a high radiation environment with a lot of high energy charged particles.
it's not so much a total dose thing, but a SEE thing)
For Juno (which is in polar orbit around Jupiter), we shielded everything
in a vault (a 1 meter cube with 1cm thick titanium walls) and still it's
an issue. We don't get very long before everything is cooked.
And I think that with a non-trivially small cluster (e.g. more than 4
nodes) you could do a lot of experimentation on techniques.
I agree. Four nodes is really small. BTW, the most fun in designing
this system comes from a set of tighter constraints than are found on
the typical cluster: noise, power, space, cabling, low-cost packaging,
etc. I have been asked about a rack-mount version; we'll see.

One thing I find interesting is the core/node efficiency
(what I call "effective cores"). In general, *on some codes*, I found
that fewer cores (a 1P micro-ATX 4-core) are more efficient than many
cores (a 2P server 12-core). Seems obvious, but I like to test things.
Post by Lux, Jim (337C)
(oddly, simulated fault injection is one of the trickier parts)
I would assume, because in a sense, the black swan* is
by definition hard to predict.

(* the book by Nick Taleb, not the movie)


--
Doug
Chris Samuel
2012-01-14 04:46:17 UTC
Permalink
Post by Douglas Eadline
I would assume, because in a sense, the black swan* is
by definition hard to predict.
Ahem, not around here, they're all black [1]. Now a white swan, that
would be something to see!

[1] http://www.flickr.com/photos/earthinmyeyes/4608041877/

cheers!
Chris
--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: ***@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
Nathan Moore
2012-01-13 14:33:33 UTC
Permalink
Jim,

Have you ever interacted with the "Modeling Instruction" folks over at
ASU? http://modeling.asu.edu/

They've done, for HS Physics, more or less what you're talking about
in terms of making the subject engaging, compelling, and driven by
student, not teacher, interest.


On Thu, Jan 12, 2012 at 9:10 AM, Lux, Jim (337C)
Post by Lux, Jim (337C)
Post by Vincent Diepeveen
The average guy is not interested in knowing all the details of how
to play tennis with a wooden racket from the 1980s, from around the
time when McEnroe was out on the tennis court playing.
Most people are more interested in whether you can win that grand slam
with what you produce.
The nerds, however, are interested in how well you can do with a
wooden racket from the 1980s; therefore, projecting your own interest
onto those students will just get them disinterested, and you will be
judged by them as an irrelevant person in their life, whose name they
soon forget.
Having spent some time recently in Human Resources meetings about how to
better recruit software people for JPL, I'd say that something that
appeals to nerds and gives them something to do is not all bad. Part of
the educational process is to find and separate the people who are
interested and have a passion.  I'm not sure that someone who starts
getting into clusters mostly because they are interested in breaking into
the Top500 is the target audience in any case.
If you look over the hobby clusters out there, the vast majority are "hey,
I heard about this interesting idea, I scrounged up N old/small/slow/easy
to find computers and tried to cluster them and do something.  I learned
something about cluster administration, and it was fun, but I don't use it
anymore"
This is exactly the population you want to hit.  Bring in 100 advanced
high school (grade 11-12 in US) students.  Have them all use cheap
hardware to do a cluster.  Some fraction will think, "this is kind of
cool, maybe I should major in CS instead of X"  Some fraction will think,
"how lame, why not make the single processor faster", and they can be
CompEng or EE majors looking at how to reduce feature sizes and get the
heat out.
It's just like biology or chemistry classes.  In high school biology
(9th/10th grade) most of it is mundane memorization (Krebs cycle, various
descriptive stuff). Other than the use of cheap CMOS cameras, microscopes
used at this level haven't really changed much in the last 100 years (and
the microscopes at my kids' school are probably 10-20 years old). They
also do some more modern molecular biology in a series of labs partly
funded by Amgen:   Some recombinant DNA to put fluorescent proteins in a
bacteria, running some gels, etc.  The vast majority of the students will
NOT go on to a career in biology, but some fraction do, they get
interested in some aspect, and they wind up majoring in bio, or being a
pre-med, etc.
Not everyone is looking for the world beater.  A lot of kids start with
Kart racing, even though even the fastest Karts aren't as fast as F1 (or
even a Smart Car).  How many engineers started with dismantling the
lawnmower engine?
For my own work, I'd rather have people who are interested in solving
problems by ganging up multiple failure prone processors, rather than
centralizing it all in one monolithic box (even if the box happens to have
multiple cores).
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
--
- - - - - - -   - - - - - - -   - - - - - - -
Nathan Moore
Associate Professor, Physics
Winona State University
- - - - - - -   - - - - - - -   - - - - - - -
Lux, Jim (337C)
2012-01-12 15:35:41 UTC
Permalink
Post by Ellis H. Wilson III
I
recently read a blog that suggested (due to similar threads following
these trajectories) that the Wulf list wasn't what it used to be.
I think that's for a variety of reasons..

The cluster world has changed. Back 15-20 years ago, clusters were new,
novel, and pretty much roll your own, so there was a lot of traffic on the
list about how to do that. Remember all the mobo comparisons, and all the
carefully teased-out idiosyncrasies of various switches and network
schemes.

Back then, the idea of using a cluster for "big computing" was kind of
new, as well. People building clusters were doing it either because the
architecture was interesting OR because they had a computing problem to
solve, and a cluster was a cheap way to do it, especially with free labor.

I think clustering has evolved, and the concept of a cluster is totally
mature. You can buy a cluster essentially off the shelf, from a whole
variety of companies (some with people who were participating in this list
back then and still today), and it's interesting how the basic Beowulf
concept has evolved.

Back in late 90s, it was still largely "commodity computers, commodity
interconnects" where the focus was on using "business class" computers and
networking hardware. Perhaps not consumer, as cheap as possible, but
certainly not fancy, schmancy rack mounted 1U servers.. The switches
people were using were just ordinary network switches, the same as in the
wiring closet down the hall.

Over time, though, there has developed a whole industry of supplying
components specifically aimed at clusters: high speed interconnects,
computers, etc. Some of this just follows the IT industry in general..
There weren't as many "server farms" back in 1995 as there are now.

Maybe it's because the field has matured?


So, we're back to talking about "roll-your-own" clusters of one sort or
another. I think anyone serious about big cluster computing (>100 nodes)
probably won't be hanging on this list looking for hints on how to route
and label their network cables. There's too many other places to go get
that information, or, better yet, places to hire someone who already knows.

I know that if I needed massive computational power at work, my first
thought these days isn't "hey, lets build a cluster", it's "let's call up
the HPC folks and get an account on one of the existing clusters".

But I still see the need to bring people into the cluster world in some
way. I don't know where the cluster vendors find their people, or even
what sorts of skill sets they're looking for. Are they beating the bushes
at CMU, MIT, and other hotbeds of CS looking for prior cluster design
experience? I suspect not, just like most of the people JPL hires don't
have spacecraft experience in school, or anywhere. You look for bright
people who might be interested in what you're doing, and they learn the
details of cluster-wrangling on the job.


For myself, I like probing the edges of what you can do with a cluster.
Big computational problems don't excite me. I like thinking about things
like:

1) What can I use from the body of cluster knowledge to do something
different? A distributed cluster is topologically similar to one all
contained in a single rack, but it's different. How is it different
(latency, error rate)? Can I use analysis (particularly from early cluster
days) to do a better job?

2) I've always been a fan of *personal* computing (probably from many
years of negotiating for a piece of some shared resource). It's tricky
here, because as soon as you have a decent 8 or 16 node cluster that fits
under a desk, and have figured out all the hideous complexity of how to
port some single user application to run on it, someone comes out with a
single processor box that's just as fast, and a lot easier to use. Back
in the 80s, I designed, but did not build, an 80286 clone using discrete
ECL logic, the idea being to make a 100MHz IBM PC-AT that would run
standard spreadsheet software 20 times faster (a big deal when your huge
spreadsheet takes hours to recalculate). However, Moore's law and Intel
made that idea a losing proposition.

But still, the idea of personal control over my computing resources is
appealing. Nobody watching to see "are you effectively using those cpu
cycles". No arguing about annual re-adjustment of chargeback rates where
you take the total system budget and divide it by CPU seconds. Ooops not
enough people used it, so your CPU costs just quadrupled.

3) I'm also interested in portable computing (yes, I have a NEC 8201, a
TRS-80 Model 100 clone, and a TI-59; I did sell the Compaq, but I had one
of those too, etc.) This is another interesting problem space.. No big
computer room with infrastructure. Here, the fascinating trade is between
local computer horsepower and cheap long distance datacomm. At some
point, it's cheaper/easier to send your data via satellite link to a big
computer elsewhere and get the results back. It's the classic 60s remote
computing problem revisited once again.
Vincent Diepeveen
2012-01-12 16:45:29 UTC
Permalink
Well, I feel small clusters of, say, 2 computers or so might get more
common in the future.

Yet let's start by asking:

What is a cluster, though?

That's not such a simple question to answer.

Having a few computers at home connected via a router with simple
default Ethernet is something many people have.

Is that a cluster?

Maybe.

Let me focus on the clusters with a decent network.

The decent network clusters suffer from a number of problems.

The biggest problem for this list:

0) Yesterday I read in the newspaper that another Iranian scientist
was killed by a car bomb.

These past few years I have really missed the experts posting in
here, while some dorks who really have nothing to contribute to the
cluster world, and are just here to be here, like Jonathan Aquilina,
keep coming back. So experts leave and idiots come back.

This has completely killed this mailing list.

1) The lack of postings by RGB these past few months, especially the
ones where he explains how easy it is to build a nuke given the right
ingredients, which make for interesting discussions.


Let's look at clusters:

10) The lack of software support for clusters.

This is the really big issue.

Sure, you can get expensive commercial software to run on clusters,
but that's all interesting just for scientists.

Which game can effectively use cluster hardware and is dirt cheap?

This really is a big issue.

Note I intend to contribute there myself to change that, but that's
just 1 person, of course; not an entire market moving there.

11) The huge break-even point of using cluster hardware.

I can give examples: I sat here at home with Don Dailey, the
programmer of Cilkchess, which used Cilk from Leiserson, next to me.
We played Diep on a single CPU against Cilkchess on a single CPU, and
Cilkchess got totally toasted.

After having been fried for 4 consecutive games, Don had enough of
it, disconnected the connection to the cluster, from which he had
used 1 CPU for the games, and started playing a version on his laptop
which did NOT use Cilk. So no parallel framework.

It was a factor of 40 faster.

Now note that at tournaments they showed up with 500 or even 1800
CPUs, yet you can't have a cluster of 1800 CPUs at home.

Usually building a 4-socket box is far easier, though not necessarily
cheaper, and in practice faster than a small cluster.

Especially AMD has a bunch of cheap 4-socket solutions in the market;
if you buy those 2nd hand, there is not really any competition there
from clusters in the same price range.

100) The huge increase lately in the power consumption of machines.
Up to 2002 I used to visit someone, Jan Louwman, who had 36 computers
at home for testing chess programs. That wasn't a cluster, just a
bunch of machines, in sets of 2 machines connected with the special
cable we used back then to play machines against each other.

Nearly all of those machines drew 60-100 watts or so.

He had divided his computers over 3 rooms or so, with the majority in
1 room though. There the 16 ampere @ 230 volt power plug already had
problems supplying this amount of electricity: around the power plug
in the wall, the wall and the plastic of the plug were burned
completely black.

As there was only a single P4 machine amongst the computers,
only 1 box really consumed a lot of power.

Try to run 36 computers at home nowadays. Most machines are well over
250 watts, and the fastest 2 machines I've got here eat 410 and 270
watts respectively.

That's excluding the video card in the 410 watt machine (an AMD HD
6970), as it's currently out of the box; that box has been set up for
GPGPU.

36 machines eat way, way too much power.

This is a very simple practical problem that one shouldn't overlook.

It's not realistic that the average Joe sets up a cluster of more
than 2 machines or so for his favorite game.

A 2-machine cluster will never beat a 2-socket machine, except when
each node also has 2 sockets.

So clustering simple home computers together isn't really useful
unless you really cluster together half a dozen or more.

Half a dozen machines, using the 250 watt figure plus another 25
watts for each card and 200 watts for the switch, are going to eat
6 * 275 + 200 = 1850 watts. You really need diehards for that.

They are out there, and more of them than you and I would guess, but
they need SOFTWARE that interests them and that can use the cluster
in a very efficient manner, clearly proven to them to work great and
to be easy to install, which refers back to point 11.

101) Most people like to buy new stuff. New cluster hardware is very
expensive for more than 2 computers, as it needs a switch.
Second hand it's a lot cheaper, sometimes even dirt cheap,
yet that's already not what most people like to do.

110) Linux had a few setbacks and got less attractive. Say, when we
had Red Hat at the end of the 90s with X Windows, it was slowly
improving a lot. Then x64 arrived with a big bang and we went back
years and years to X.org.

X.org threw Linux back 10 years in time. It eats massive RAM,
it's ugly, it's bad, it's slow, it's difficult to configure, etc.

Basically there aren't many good distributions now that are free.

As most clusters only work really well under Linux, the
difficulty of using Linux should really be factored in.

Have a problem under Linux?

Then forget it as a normal user.

Now for me Linux got MORE attractive, as I get hacked totally
silly by every consultant on this planet who knows how to hack on the
internet, yet that's not representative of those with the cash to
afford a cluster. Note I don't fall into the cash group; my total
income in 2011 was very little.

111) Usually the big cash to afford a cluster belongs to people with
a good job, or a tad older; that's usually a different group than the
group that can work with Linux. See the previous points for that.

Despite all that, I believe clusters will get more popular in the future,
for a simple reason: processors don't really clock higher anymore.
So all software that can use additional calculation power is already
getting parallelized, or already has been.

It's a matter of time before some of those applications also
work well on cluster hardware. Yet this is a slow process,
and it really requires software that works very efficiently at a small
number of nodes.

As an example of why I feel this will happen, consider the
popularity among gamers of running 2 graphics cards connected to each
other via a bridge
within 1 machine.

The important factor there is that the games really profit from
doing that.
Post by Lux, Jim (337C)
Post by Ellis H. Wilson III
I
recently read a blog that suggested (due to similar threads following
these trajectories) that the Wulf list wasn't what it used to be.
I think that's for a variety of reasons..
The cluster world has changed. Back 15-20 years ago, clusters were new,
novel, and pretty much roll your own, so there was a lot of traffic on the
list about how to do that. Remember all the mobo comparisons, and all the
carefully teased out idiosyncrasies of various switches and network
schemes.
Back then, the idea of using a cluster for "big computing" was kind of
new, as well. People building clusters were doing it either
because the
architecture was interesting OR because they had a computing
problem to
solve, and a cluster was a cheap way to do it, especially with free labor.
I think clustering has evolved, and the concept of a cluster is totally
mature. You can buy a cluster essentially off the shelf, from a whole
variety of companies (some with people who were participating in this list
back then and still today), and it's interesting how the basic Beowulf
concept has evolved.
Back in late 90s, it was still largely "commodity computers, commodity
interconnects" where the focus was on using "business class"
computers and
networking hardware. Perhaps not consumer, as cheap as possible, but
certainly not fancy, schmancy rack mounted 1U servers.. The switches
people were using were just ordinary network switches, the same as in the
wiring closet down the hall.
Over time, though, there has developed a whole industry of supplying
components specifically aimed at clusters: high speed interconnects,
computers, etc. Some of this just follows the IT industry in
general..
There weren't as many "server farms" back in 1995 as there are now.
Maybe it's because the field has matured?
So, we're back to talking about "roll-your-own" clusters of one sort or
another. I think anyone serious about big cluster computing (>100 nodes)
probably won't be hanging on this list looking for hints on how to route
and label their network cables. There are too many other places to go get
that information, or, better yet, places to hire someone who
already knows.
I know that if I needed massive computational power at work, my first
thought these days isn't "hey, lets build a cluster", it's "let's call up
the HPC folks and get an account on one of the existing clusters".
But I still see the need to bring people into the cluster world in some
way. I don't know where the cluster vendors find their people, or even
what sorts of skill sets they're looking for. Are they beating the bushes
at CMU, MIT, and other hotbeds of CS looking for prior cluster design
experience? I suspect not, just like most of the people JPL hires don't
have spacecraft experience in school, or anywhere. You look for bright
people who might be interested in what you're doing, and they learn the
details of cluster-wrangling on the job.
For myself, I like probing the edges of what you can do with a
cluster.
Big computational problems don't excite me. I like thinking about things like:
1) What can I use from the body of cluster knowledge to do something
different. A distributed cluster is topologically similar to one all
contained in a single rack, but it's different. How is it different
(latency, error rate)? Can I use analysis (particularly from early cluster
days) to do a better job? (A minimal model sketch follows at the end of
this list.)
2) I've always been a fan of *personal* computing (probably from many
years of negotiating for a piece of some shared resource). It's tricky
here, because as soon as you have a decent 8 or 16 node cluster that fits
under a desk, and have figured out all the hideous complexity of how to
port some single user application to run on it, someone comes out with a
single processor box that's just as fast, and a lot easier to use.
Back
in the 80s, I designed, but did not build, an 80286 clone using
discrete
ECL logic, the idea being to make a 100MHz IBM PC-AT that would run
standard spreadsheet software 20 times faster (a big deal when your huge
spreadsheet takes hours to recalculate). However, Moore's law and Intel
made that idea a losing proposition.
But still, the idea of personal control over my computing resources is
appealing. Nobody watching to see "are you effectively using those cpu
cycles". No arguing about annual re-adjustment of chargeback rates where
you take the total system budget and divide it by CPU seconds.
Oops: not
enough people used it, so your CPU costs just quadrupled.
3) I'm also interested in portable computing (Yes, I have a NEC 8201-
TRS-80 Model 100 clone, and a TI-59, I did sell the Compaq, but I had one
of those too, etc.) This is another interesting problem space.. No big
computer room with infrastructure. Here, the fascinating trade is between
local computer horsepower and cheap long distance datacomm. At some
point, it's cheaper/easier to send your data via satellite link to a big
computer elsewhere and get the results back. It's the classic 60s remote
computing problem revisited once again.
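
For point 1 above, a minimal sketch of the classic alpha-beta
message-time model one might start from; the link numbers are
illustrative assumptions, not measurements:

# Alpha-beta model: time to move an n-byte message is
#   T(n) = alpha + n / beta
# where alpha is per-message latency and beta is bandwidth.
def msg_time(n_bytes, alpha_s, beta_bytes_per_s):
    return alpha_s + n_bytes / beta_bytes_per_s

# Illustrative assumptions for an in-rack link vs. a distributed one:
links = {
    "in-rack GigE":      (50e-6, 125e6),   # ~50 us, ~125 MB/s
    "distributed (WAN)": (40e-3, 1.25e6),  # ~40 ms, ~10 Mbit/s
}

for name, (alpha, beta) in links.items():
    for n in (1_000, 1_000_000):
        ms = msg_time(n, alpha, beta) * 1e3
        print(f"{name}: {n:>9} bytes -> {ms:9.3f} ms")

The point being that the distributed case is latency-dominated for small
messages, so algorithms tuned for early, slow interconnects may carry
over better than ones tuned for modern in-rack fabrics.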
_______________________________________________
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
Vincent Diepeveen
2012-01-12 16:58:32 UTC
Permalink
There is another reason I should add for what really made small clusters
at home less attractive:

the rise of cheap multi-socket machines.

A 2-socket machine is not so expensive anymore nowadays.
So if you want faster than 1 socket, you buy a 2-socket machine.

If you want faster than that, a 4-socket machine is there.

That choice wasn't easily available before the end of the 90s, and in
the 21st century it has become cheap.

Another delaying factor is the rise of so many cores per node. AMD
and Intel sell CPUs for their 4-socket lines
that have up to double the number of cores you can get in a
single-socket box.

So such a box is nearly equivalent to 8 single-socket nodes, albeit low clocked.

For that reason clusters only tend to become effective at a dozen nodes
or more, assuming cheap single-socket nodes.
Ellis H. Wilson III
2012-01-12 17:55:41 UTC
Permalink
I really should be following Joe's advice circa 2008 and just not
responding, but I can't help myself.
Post by Vincent Diepeveen
1) The lack of postings by RGB past few months, especially the ones
where he explains how easy
it is to build a nuke, given the right ingredients, which gives
interesting discussions.
The last post from RGB was a long, long discussion about how very wrong
you were about RNGs. You just don't get it. It's okay to be wrong once
in a while Vincent, and even moreso to just agree to disagree. Foolish,
unedited and inflammatory diatribes with an unnatural dose of newlines
are what is killing this list and what that blog I referenced was
specifically disappointed with.

So please, I'm begging you. Stop writing huge emails that trail off
from their original point. Try to say things in a non-inflammatory
manner. Use spell-check, and try to read your emails once before
sending them. And last of all, remember that there are many people on
this list that have all sorts of different applications -- not just
Chess. Your experience does not generalize well to all areas.

Speaking of which, for anyone who is interested in doing serious work
with low-power processors, please see a paper named FAWN for an
excellent example of use-cases where low hertz low power processors can
do some great work. It's by David Andersen of CMU. I was lucky enough
to be invited to the CMU PDL retreat a few months back and had a nice
conversation about the project when we went for a run together. There
are some use-cases that benefit massively from that kind of architecture.

Best,

ellis
Joe Landman
2012-01-12 18:47:21 UTC
Permalink
Post by Ellis H. Wilson III
I really should be following Joe's advice circa 2008 and just not
responding, but I can't help myself.
huh ...?
Post by Ellis H. Wilson III
Post by Vincent Diepeveen
1) The lack of postings by RGB past few months, especially the ones
where he explains how easy
it is to build a nuke, given the right ingredients, which gives
interesting discussions.
The last post from RGB was a long, long discussion about how very wrong
you were about RNGs. You just don't get it. It's okay to be wrong once
in a while Vincent, and even moreso to just agree to disagree. Foolish,
unedited and inflammatory diatribes with an unnatural dose of newlines
are what is killing this list and what that blog I referenced was
specifically disappointed with.
So please, I'm begging you. Stop writing huge emails that trail off
from their original point. Try to say things in a non-inflammatory
... oh ... never mind :)
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: ***@scalableinformatics.com
web : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615
Lux, Jim (337C)
2012-01-13 16:26:29 UTC
Permalink
Post by Douglas Eadline
Post by Lux, Jim (337C)
And Doug, your small systems have a lot of the same issues, perhaps
because that small Limulus might be operated in environments other than
what the underlying hardware was designed for. I know people who have
been rudely surprised when they found that the design environment for a
laptop is a pretty narrow temperature range (e.g. office desktop) and
when
they put them in a car, subject to 0C or 40C temperatures, if not wider,
that things don't work quite as well as expected.
I will be curious to see where these things show up since
all you really need is a power plug. (a little nervous actually).
Yes.. That *will* be interesting... And wait til someone has a cluster of
Limuluses (Not sure of the proper alliterative collective noun, nor the
plural form.. A litany of limuli? A school? A murder?)
Post by Douglas Eadline
I agree. Four nodes is really small. BTW, the most fun in designing
this system is a set of tighter constraints than are found on the typical
cluster. Noise, power, space, cabling, low cost packaging, etc. I have
been asked about a rack mount version, we'll see.
One thing I find interesting is the core/node efficiency.
(what I call "effective cores") In general *on some codes*, I found that
fewer cores (1P micro-ATX 4-core) are more efficient than many
cores (2P server 12-core). Seems obvious, but I like to test things.
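Doug's "effective cores" point can be made concrete with a simple
Amdahl-style sketch; the serial fraction below is invented for
illustration, not from his measurements:

# Amdahl's law: speedup on p cores with serial fraction s is
#   S(p) = 1 / (s + (1 - s) / p)
# "Effective cores" here is just the achieved speedup.
def speedup(p, s):
    return 1.0 / (s + (1.0 - s) / p)

for p in (4, 12):
    sp = speedup(p, 0.05)   # invented 5% serial fraction
    print(f"{p:2d} cores: speedup {sp:5.2f} -> {sp / p:.0%} per-core efficiency")

Even with a small serial fraction, the 4-core box comes out well ahead
on per-core efficiency, which matches the observation above.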
Yes, because we're using, in general, commodity components/assemblies,
we're subject to the results of optimizations and market/business forces
in other user spaces. Someone designing a media PC for home use might not
care about electrical efficiency (there are no big yellow energy tags on
computers, yet), but would care about noise. Someone designing a rack
mounted server cares not a whit about noise, but really cares about a 10%
change in power consumption.

And, drop on top of that the non-synchronized differences in
development/manufacturing/fabrication generations for the underlying
parts. Consumer stuff comes out for the winter selling season. Commercial
stuff probably is on a different cycle. It's not like everyone uses the
same "model year changeover".
Post by Douglas Eadline
Post by Lux, Jim (337C)
(oddly, simulated fault injection is one of the trickier parts)
I would assume, because in a sense, the black swan* is
by definition hard to predict.
Not so much that, as the actual mechanics of fault injection. Think about
testing error detection and recovery for Flash memory. The underlying
specification error rate is something like 1E-9 or 1E-10 per read, and that's
a worst-case kind of spec, so errors aren't too common (i.e., you can't
just run and wait for them to occur). So how do you cause errors to occur
(without perturbing the system)?

In the flash case, because we developed our own flash controller logic in
an FPGA, we can add "error injection logic" to the design, but that's not
always the case. How would you simulate upsets in a CPU core, short of
blasting it with radiation? Which is difficult and expensive.. I wish it
were as easy as getting a little Co60 gamma source and putting it on top of
the chip.. We hike to somewhere that has an accelerator (UC Davis,
Brookhaven, etc.) and shoot protons and heavy ions at it.
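
Where the data path is reachable from software, a crude stand-in is to
corrupt data deliberately and check that the detection logic catches it.
A minimal sketch along those lines, flipping one random bit per block and
checking against a CRC32; the 512-byte "page" and trial count are made
up, and real flash ECC would use BCH or LDPC codes rather than a CRC:

# Flip one random bit in a data block and verify that the
# error-detection code (CRC32 here) notices the corruption.
import random
import zlib

def flip_random_bit(data: bytearray) -> int:
    """Flip one randomly chosen bit in place; return its bit offset."""
    bit = random.randrange(len(data) * 8)
    data[bit // 8] ^= 1 << (bit % 8)
    return bit

random.seed(42)
detected, trials = 0, 10_000
for _ in range(trials):
    block = bytearray(random.randbytes(512))  # stand-in for a flash page
    good_crc = zlib.crc32(block)
    flip_random_bit(block)                    # inject the "upset"
    if zlib.crc32(block) != good_crc:
        detected += 1
print(f"{detected}/{trials} single-bit errors detected")

(A CRC32 catches every single-bit flip, so the interesting experiments
start when you inject multi-bit or burst errors.)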
Post by Douglas Eadline
(* the book by Nick Taleb, not the movie)
Black swans in this case would be things like the Pentium divide bug.
Yes.. That *would* be a challenge, but hey, we've got folks in our JPL
Laboratory for Reliable Software (LARS) who sit around thinking of how to
do that, among other things. (http://lars-lab.jpl.nasa.gov/) Hmm.. I'll
have to go talk to those guys about clusters of pi or arduinos... They're
big into formal verification, too, and model-based verification. So you
could have a modeled system in SysML or UML and compare its behavior with
that on your prototype.
Post by Douglas Eadline
--
Doug
Mark Hahn
2012-01-14 04:18:57 UTC
Permalink
Post by Lux, Jim (337C)
care about electrical efficiency (there's no big yellow energy tags on
computers, yet), but would care about noise. Someone designing a rack
the "plus 80" branding is pretty ubiquitous now, and the best part
is that commodity ATX parts are starting to show up at gold levels.
server vendors have offered gold or platinum for a while now, but it's
probably more important in the home, since personal machines spend more
time idling, thus running the PSU at low demand. poor-quality PSUs
are remarkably bad at low utilization.
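
a tiny illustration of why that low-load figure matters; the
efficiencies and the 60 W idle draw below are rough assumptions, not
80 Plus test data:

# Wall draw for a box idling at 60 W on the DC side, under PSUs
# with different low-load efficiencies (assumed figures).
IDLE_DC_W = 60.0
psus = {"cheap PSU @ 20% load": 0.65,
        "80 Plus Gold @ 20% load": 0.90}

for name, eff in psus.items():
    wall = IDLE_DC_W / eff
    waste = wall - IDLE_DC_W
    kwh_year = waste * 24 * 365 / 1000
    print(f"{name}: {wall:5.1f} W at the wall, "
          f"{waste:4.1f} W wasted (~{kwh_year:.0f} kWh/year)")

the difference works out to roughly 225 kWh a year per idle box, just
from the better supply.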

regards, mark hahn.