Discussion:
[music-dsp] [OT] Modular design and feedback
David Brännvall
2003-02-20 13:53:00 UTC
Permalink
I am working on a modular dsp engine!

Do the commercial modular softsynths (like VAZ, Reaktor, what else is there?) handle circular connections correctly, i.e. with only one sample of delay, or do they process samples in blocks?

regards,
David
Smartelectronix - Bram de Jong
2003-02-20 14:48:01 UTC
Permalink
Post by David Brännvall
I am working on a modular dsp engine!
Do the commercial modular softsynths (like VAZ, Reaktor, what else is
there?) handle circular connections correctly, i.e. with only one sample of
delay, or do they process samples in blocks?

The only one that handles 1-sample feedback is Sync Modular, now bought by
Native Instruments.
Before you ask: it uses dynamic recompilation, as that is the only way to
efficiently handle 1-sample processing calls within structures with many
feedback loops and few delays.

cheers,

- bram
Smartelectronix - Bram de Jong
2003-02-20 15:07:01 UTC
Permalink
----- Original Message -----
Post by Smartelectronix - Bram de Jong
Before you ask: it uses dynamic recompilation, as that is the only way to
efficiently handle 1-sample processing calls within structures with many
feedback loops and few delays.
I should have probably added 'IMHO' in here somewhere!! ;-))


- bram
Urs Heckmann
2003-02-20 14:52:01 UTC
Permalink
Post by David Brännvall
Do the commercial modular softsynths (like VAZ, Reaktor, what else is
there?) handle circular connections correctly, i.e. with only one sample of
delay, or do they process samples in blocks?
I don't know how these handle it, but even 1 sample is a delay, huh?

I'm working on a similar problem with my current Synth development.
That's why I asked some people what they actually expect to see in a
modular architecture and what dirty tricks they usually do with their
modular hardware synths.

The main "problem application" seems to be modulation at audio rate,
i.e. by side chains or OSC-VCF and very complex modulation
combinations. There have been almost no requests that required true
circular cabling.

The last thing they want to see is "funny 3d shadowed virtual cables" that
hang around and clutter your screen.

I think it all depends on the application. If circular connections are
required, it may not be so bad to have some kind of delay, like
building a virtual I/O inside a comb filter.

Processing everything per sample would be too painful IMHO.

HTH,

;) Urs


urs heckmann
***@u-he.com
www.u-he.com
Martin Eisenberg
2003-02-20 16:33:00 UTC
Permalink
I think this is getting on-topic...
Post by Urs Heckmann
Post by David Brännvall
Do the commercial modular softsynths (like VAZ, Reaktor, what
else is there?) handle circular connections correctly, i.e. with only one
sample of delay, or do they process samples in blocks?
I don't know how these handle it, but even 1 sample is a delay, huh?
I'm working on a similar problem with my current Synth
development. That's why I asked some people what they actually
expect to see in a modular architecture and what dirty tricks they
usually do with their modular hardware synths.
Now, I don't own an analog modular system. I've never even touched
one. But feedback in all the impossible places would definitely be the
first thing I'd try out ;)
Post by Urs Heckmann
The main "problem application" seems to be modulation at audio
rate, i.e. by side chains or OSC-VCF and very complex modulation
combinations. There have been almost no requests that required true
circular cabling.
The last thing they want to see is "funny 3d shadowed virtual cables"
that hang around and clutter your screen.
I think it all depends on the application. If circular connections are
required, it may not be so bad to have some kind of delay, like
building a virtual I/O inside a comb filter.
There, a delay already exists that could be used to ease computation.
But that's not always the case. For instance, I'd love to be able to
close a loop around a waveshaper; it is clear, though, that the loop
delay would have a crucial influence -- and it'd better be tight!
(Well, as tight as it gets when I put a filter in the feedback path,
but that is the same thing in analog.)
Post by Urs Heckmann
Processing everything per sample would be too painful IMHO.
I've leisurely thought about the issue some time ago. I came to
imagine that the buffering delay might be a property of the connection
that closes a loop. That way, the user could decide if delayed
feedback suits them, or on what to burn their cycles. OTOH, the flow
graph would be "segmented" into parts with different block sizes, and
the adaptation of those might introduce latency. What do you think?


Martin
David Olofson
2003-02-20 16:56:01 UTC
Permalink
On Thursday 20 February 2003 19.36, Martin Eisenberg wrote:
[...]
Post by Martin Eisenberg
Post by Urs Heckmann
Processing everything per sample would be too painful IMHO.
I've leisurely thought about the issue some time ago. I came to
imagine that the buffering delay might be a property of the
connection that closes a loop. That way, the user could decide if
delayed feedback suits them, or on what to burn their cycles. OTOH,
the flow graph would be "segmented" into parts with different block
sizes, and the adaptation of those might introduce latency. What do
you think?
If you keep all blocks at power-of-two sizes, a low latency "segment"
would fit nicely into the graph without circular buffers or anything.
Though, you'd have to insert a real delay unit in the feedback line,
to get the right latency. (You'd probably want one anyway, to get
sub-sample accurate feedback latency control.)


//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`-----------------------------------> http://audiality.org -'
--- http://olofson.net --- http://www.reologica.se ---
Martin Eisenberg
2003-02-20 20:21:00 UTC
Permalink
Post by David Olofson
Though, you'd have to insert a real delay unit in the
feedback line, to get the right latency.
I'm afraid I don't get that. The feedback latency would be user-set
(and would of course be implemented by a "real" delay line ??), while
the throughput latency of any segment would be arbitrary. Maybe I
misread your previous statement?--
Post by David Olofson
If you keep all blocks at power-of-two sizes, a low
latency "segment" would fit nicely inte the graph
without circular buffers or anything.
...because at segment boundaries, blocks could be split and
recollected without remainder. Is that what you're saying?
Post by David Olofson
(You'd probably want one anyway, to get sub-sample
accurate feedback latency control.)
Nice idea!


Martin
David Olofson
2003-02-20 20:41:00 UTC
Permalink
Post by Martin Eisenberg
Post by David Olofson
Though, you'd have to insert a real delay unit in the
feedback line, to get the right latency.
I'm afraid I don't get that. The feedback latency would be user-set
(and would of course be implemented by a "real" delay line ??),
while the throughput latency of any segment would be arbitrary.
Maybe I misread your previous statement?--
Yes, the feedback latency would be set by the user, and the "local"
block size for the feedback loop would be set accordingly; that's the
basic idea.

However, if we want to use power-of-two block sizes only, we have to
round the exact latency value down to the nearest corresponding
power-of-two block size. Then, to get the right feedback latency, we
insert a delay line set to (exact_delay - block_delay).
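
A back-of-the-envelope sketch of that rounding, assuming the requested
latency is a whole number of samples (names are illustrative only):

// Round the feedback latency down to a power-of-two block size and make up
// the difference with a delay line of (exact_delay - block_delay) samples.
struct FeedbackSetup {
    int block_size;   // local block size for the low-latency segment
    int extra_delay;  // compensating delay line length, in samples
};

FeedbackSetup plan_feedback(int exact_delay)  // assumes exact_delay >= 1
{
    int block = 1;
    while (block * 2 <= exact_delay)  // largest power of two not above exact_delay
        block *= 2;
    return { block, exact_delay - block };
}

For the sub-sample accuracy mentioned earlier, extra_delay would become
fractional and be realised with an interpolating delay line.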
Post by Martin Eisenberg
Post by David Olofson
If you keep all blocks at power-of-two sizes, a low
latency "segment" would fit nicely inte the graph
without circular buffers or anything.
...because at segment boundaries, blocks could be split and
recollected without remainder. Is that what you're saying?
Exactly.

[...]


//David Olofson - Programmer, Composer, Open Source Advocate

.- The Return of Audiality! --------------------------------.
| Free/Open Source Audio Engine for use in Games or Studio. |
| RT and off-line synth. Scripting. Sample accurate timing. |
`-----------------------------------> http://audiality.org -'
--- http://olofson.net --- http://www.reologica.se ---
Smartelectronix - Bram de Jong
2003-02-20 17:16:00 UTC
Permalink
Post by Martin Eisenberg
I've leisurely thought about the issue some time ago. I came to
imagine that the buffering delay might be a property of the connection
that closes a loop. That way, the user could decide if delayed
feedback suits them, or on what to burn their cycles. OTOH, the flow
graph would be "segmented" into parts with different block sizes, and
the adaptation of those might introduce latency. What do you think?
Yup, this -imho- is the right way to do it!
Cut your graph into parts that have feedback and parts
that don't. :-) Finding the cycles isn't even difficult.

Then locate the parts where you'll add the 1-sample delay.
Calculate samples inside the loop starting from the module
with the 1-sample-delay.

If a module happens to require data from a non-cycle module
just process those first.

goto start ;-)
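
A minimal sketch of that split, assuming a hypothetical Module interface
(names are illustrative, not any particular host's API): the feedback-free
parts run block-wise, the parts inside a cycle run one sample at a time in
the order described above:

// Sketch only: acyclic modules process whole blocks, cyclic ones run per sample.
#include <vector>

struct Module {
    virtual ~Module() {}
    virtual void process_block(int nframes) = 0;  // block-wise, feedback-free parts
    virtual void process_sample() = 0;            // one frame, parts inside a cycle
};

struct Patch {
    std::vector<Module*> acyclic;  // topologically sorted, no feedback
    std::vector<Module*> cyclic;   // ordered so the module reading the
                                   // 1-sample delay comes first

    void process(int nframes) {
        for (Module* m : acyclic)            // cheap: whole blocks at a time
            m->process_block(nframes);
        for (int i = 0; i < nframes; ++i)    // expensive: one sample at a time
            for (Module* m : cyclic)
                m->process_sample();
    }
};

Finding the cycles beforehand can be done with a depth-first search or a
strongly-connected-components pass over the connection graph.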


But, still, if you've got multiple feedback loops -or say, you want to
model filters in your modular application like in Sync- you'll still
have a LOT of CPU-consumption due to all the reasons mentioned before.

So, the BEST way -imho- is to dynamically recompile the 'cycle'-containing
blocks and keep the rest in buffer mode (say, 32 samples at a time?)


Damn you people for spoiling what I want to research for my thesis!! ;-PP


cheers,


- bram
Martin Eisenberg
2003-02-20 21:28:08 UTC
Permalink
Post by Smartelectronix - Bram de Jong
So, the BEST way -imho- is to dynamically recompile the 'cycle'-containing
blocks and keep the rest in buffer mode (say, 32 samples at a time?)
Damn you people for spoiling what I want to research for my
thesis!! ;-PP
LOL! As I said, just leisure thoughts :)

But seriously, that sounds like a very interesting topic. How are you
going to do that dynamic recompilation? Rip GCC apart, or write your
own "reduced-C" compiler? http://softwire.sourceforge.net might be of
interest to you.


Martin
Ross Bencina
2003-02-21 16:11:00 UTC
Permalink
Post by Smartelectronix - Bram de Jong
Yup, this -imho- is the right way to do it!
Cut your graph into parts that have feedback and parts
that don't. :-) Finding the cycles isn't even difficult.
Then locate the parts where you'll add the 1-sample delay.
Calculate samples inside the loop starting from the module
with the 1-sample-delay.
[snip]
Post by Smartelectronix - Bram de Jong
Damn you people for spoiling what I want to research for my thesis!! ;-PP
Well, this kind of theory is the basis of vectorizing compilers. They can
take arbitrary code and work out which loops have "carried dependences" (in
other words, feedback). A loop without carried dependences can be vectorized
directly, and some loops with carried dependences can be transformed and
vectorised (a one sample delay in an FIR filter can still be vectorised, in
an IIR it can't).
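
For illustration only, a minimal pair of loops showing the difference
(hypothetical two-tap/one-pole filters, not from any particular codebase):

// FIR-like: each y[i] depends only on inputs, so iterations are independent
// and the loop can be vectorised directly.
void fir_example(const float* x, float* y, int n, float b0, float b1)
{
    for (int i = 1; i < n; ++i)
        y[i] = b0 * x[i] + b1 * x[i - 1];
}

// IIR-like: y[i] depends on y[i - 1] computed in the previous iteration,
// a loop-carried dependence that blocks straightforward vectorisation.
void iir_example(const float* x, float* y, int n, float b0, float a1)
{
    for (int i = 1; i < n; ++i)
        y[i] = b0 * x[i] + a1 * y[i - 1];
}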

I just finished reading the following book, which provides coverage of the
program analysis techniques required to perform these transformations to
arbitrary code. As you imply, synthesis graphs are simpler because they
don't (usually) contain control flow.

http://www.amazon.com/exec/obidos/tg/detail/-/1558602860/103-6988377-7070232?vi=glance
Post by Steve Harris
Post by Jens Groh
Regarding run-time recompilation: Wouldn't it be sufficient to
reASSEMBLE or even just reLINK dynamically? A complete compiler seems quite
heavy to me.
Post by Steve Harris
No, because you want to be able to inline the source to build the most
efficient binary.
We looked at this on the linux-audio-developers list a few months back, and
IIRC SAOL does something similar.
Yes, SAOL supports 1-sample delays - the syntax looks a bit like CSound, but
any feedback loops must be evaluated using single sample delays. It also has
other constructs that require sample-by-sample execution such as audio-rate
control flow (conditionals and loops). This means that the compiler should
perform the transformations described by bram above to determine which parts
of the code can be vectorised (executed block-wise like in CSound) and which
need to be executed sample-by-sample.

One implementation of SAOL (SAINT by Studer) compiles SAOL for a virtual
machine that has instructions supporting both block-wise and single sample
versions of all operators and opcodes - the compiler selects between
block-wise and single sample execution based on the presence of cycles in
the data dependence graph, as discussed above.

Best wishes,

Ross.
Mike Berry
2003-02-20 16:49:00 UTC
Permalink
When I wrote GrainWave a number of years ago, I did 1 sample processing
for feedback loops. The basic method was to abstract the loop as its own
unit generator. I would process a buffer until I hit the loop wrapper
object. That object would then call through the loop with 1 sample
sub-buffers until everything had been processed. Then it would pass the
completed buffer to the next un-looped section. This even allowed
nesting of feedback loops.
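
A rough sketch of that wrapper idea, assuming a hypothetical UnitGenerator
interface; this is not GrainWave's actual code, and mixing the fed-back
sample into the loop input is an assumption:

#include <vector>

struct UnitGenerator {
    virtual ~UnitGenerator() {}
    // Process nframes frames from in to out; nframes may be 1.
    virtual void process(const float* in, float* out, int nframes) = 0;
};

// Presents the modules inside a feedback loop to the outer graph as a single
// unit generator; internally it steps through them one sample at a time.
struct LoopWrapper : UnitGenerator {
    std::vector<UnitGenerator*> inner;  // the chain inside the loop
    float feedback = 0.0f;              // previous output, fed back with 1-sample delay

    void process(const float* in, float* out, int nframes) override {
        for (int i = 0; i < nframes; ++i) {
            float s = in[i] + feedback;      // mix in the previous sample's output
            for (UnitGenerator* ug : inner) {
                float tmp = 0.0f;
                ug->process(&s, &tmp, 1);    // 1-sample sub-buffer
                s = tmp;
            }
            feedback = s;                    // becomes the next sample's feedback
            out[i] = s;
        }
    }
};

Because a LoopWrapper is itself a UnitGenerator, wrappers can contain other
wrappers, which gives the nesting of feedback loops mentioned above.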

Mike
Post by Urs Heckmann
Post by David Brännvall
Do the commercial modular softsynths (like VAZ, Reaktor, what else is
there?) handle circular connections correctly, i.e. with only one sample of
delay, or do they process samples in blocks?
I don't know how these handle it, but even 1 sample is a delay, huh?
I'm working on a similar problem with my current Synth development.
That's why I asked some people what they actually expect to see in a
modular architecture and what dirty tricks they usually do with their
modular hardware synths.
The main "problem application" seems to be modulation at audio rate,
i.e. by side chains or OSC-VCF and very complex modulation combinations.
There have been almost no requests that required true circular cabling.
The last thing they want to see is "funny 3d shadowed virtual cables" that
hang around and clutter your screen.
I think it all depends on the application. If circular connections are
required, it may not be so bad to have some kind of delay, like building
a virtual I/O inside a comb filter.
Processing everything per sample would be too painful IMHO.
HTH,
;) Urs
urs heckmann
www.u-he.com
Michael Gogins
2003-02-20 16:15:01 UTC
Permalink
I don't know about the commercial ones. I do know about Csound, PD, and
others of that ilk. PD does blocks of 128 frames, Csound does blocks of 1 to
N frames. I know from writing this kind of code myself that although with
contemporary PCs you can get some voices out in real time with 1 sample
frame at a tick, efficiency rises drastically if the block size is increased
to 10 or more. Presumably, this is mostly because that allows the CPU to
keep code in on-chip cache, instead of shuffling code in and out of cache as
the stack frame moves up and down. The actual code, of course, is simpler
and performs fewer operations at 1 sample frame per tick because there is 1
inner loop, not 2 inner loops.

Csound has workarounds for the 1 sample delay, and perhaps the commercial
software does as well. The workaround, of course, would be to have some
object that is moving data at the 1 frame rate, e.g. in a shared table with
position indexes, inside the block processing functions.

----- Original Message -----
From: "David Brännvall" <***@brannvall.net>
To: <music-***@aulos.calarts.edu>
Sent: Thursday, February 20, 2003 10:52 AM
Subject: [music-dsp] [OT] Modular design and feedback
Post by David Brännvall
I am working on a modular dsp engine!
Do the commercial modular softsynths (like VAZ, Reaktor, what else is
there?) handle circular connections correctly, i.e. with only one sample of
delay, or do they process samples in blocks?
Post by David Brännvall
regards,
David
yon
2003-02-20 16:44:01 UTC
Permalink
Post by Michael Gogins
Presumably, this is mostly because that allows the CPU to keep code in
on-chip cache, instead of shuffling code in and out of cache as the stack
frame moves up and down. The actual code, of course, is simpler and
performs fewer operations at 1 sample frame per tick because there is 1
inner loop, not 2 inner loops.
Another important performance effect is pipelining.
Processing more than one sample at a time makes
it possible to interleave instructions without local
dependencies, which makes much more efficient
use of the pipeline.
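
Purely as an illustration (hypothetical code, two independent channels):
when samples are processed in blocks, independent work can sit side by side
in the loop body, giving the CPU instructions it can overlap in the pipeline:

// Two independent one-pole recurrences in the same block loop: neither
// depends on the other, so their instructions can be interleaved.
void two_channels(const float* inL, const float* inR,
                  float* outL, float* outR, int nframes,
                  float a, float& zL, float& zR)
{
    for (int i = 0; i < nframes; ++i) {
        zL = inL[i] + a * zL;   // left-channel recurrence
        zR = inR[i] + a * zR;   // right-channel recurrence, independent of the left
        outL[i] = zL;
        outR[i] = zR;
    }
}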
Thiago Born
2003-02-20 19:34:01 UTC
Permalink
David Brännvall <***@brannvall.net> wrote:
I am working on a modular dsp engine!

Do the commercial modular softsynths (like VAZ, Reaktor, what else is there?) handle circular connections correctly, i.e. with only one sample of delay, or do they process samples in blocks?

regards,
David

I have made a modular script language that "compiles" as a VST/VSTi plugin... it's called audioBox... I have handled feedback with 1 sample delay... and everything seems to work fine to me... just a performance issue... in this design the process function is called every sample :(



Thiago Born

New Born Music Plugins - www.get-me.to/nbm

Banda Tsunami - www.tsunamifloripa.hpg.com.br

NBM - Studios / 0 xx (48) 228-1098 / 0 xx (48) 9952-9875




Urs Heckmann
2003-02-20 20:56:00 UTC
Permalink
On Thursday, 20.02.03, at 19:36 (Europe/Berlin), Martin wrote:
Post by Martin Eisenberg
Now, I don't own an analog modular system. I've never even touched
one. But feedback in all the impossible places would definitely be the
first thing I'd try out ;)
Yeah, but it all depends. Your waveshaper example would be a feedback
around a single unit. That'd be no problem at all. The main problem
would be a feedback around a huge series of modules. Some modules like
filters would already introduce some kind of delay (like phase shifts)
which you can't reasonably compensate for in the digital domain (or you
run out of cpu power :-). He he, I'm a big friend of block processing
'cause it offers sheer performance advantages, like keeping
coefficients in registers while processing a filter over a couple of
samples...

Another big issue is that analog systems can easily deal with
parallelism and can have about no delay within a single module. Look at
Jürgen Michaelis' Neuron (www.jayemsonic.de). - There are six resonant
filters which all modulate each other simultaneously. I've seen and
heard this bastard. I doubt that this can be done in the digital domain
within reasonable cpu consumption, if it can be done in realtime at all.

Cheers,

;) Urs
Martin Eisenberg
2003-02-21 00:12:00 UTC
Permalink
Post by Urs Heckmann
Yeah, but it all depends. Your waveshaper example would be a
feedback around a single unit. That'd be no problem at all. The
main problem would be a feedback around a huge series of
modules.
My waveshaper was meant as a generic nonlinearity example. A chain of
pickup, tube amp, and PA stack simulations would have been a better
example.

Besides, a "single unit" in a modular app may actually be non-atomic
and contain any amount of history, so handling this case but not
others would be kind of pointless from my POV.
Post by Urs Heckmann
He he, I'm a big friend of block processing 'cause it offers sheer
performance advantages, like keeping coefficients in registers
while processing a filter over a couple of samples...
I'm aware of these points. But there are the people who want to
friggin' *wreck* their CPU fan for a higher cause ;) And my very own
modular system will allow that. Not that it'll come into existence
anytime soon -- but I'm already thinking it through now and then,
which is why I entered this thread.
Post by Urs Heckmann
Another big issue is that analog systems can easily deal with
parallelism and can have about no delay within a single module.
Look at Jürgen Michaelis' Neuron (www.jayemsonic.de).
That will stay pretty impressive for some time to come, yes. Isn't RL
wonderful?


Martin
Smartelectronix - Bram de Jong
2003-02-21 10:07:01 UTC
Permalink
Post by Martin Eisenberg
Post by Urs Heckmann
He he, I'm a big friend of block processing 'cause it offers sheer
performance advantages, like keeping coefficients in registers
while processing a filter over a couple of samples...
I'm aware of these points. But there are the people who want to
friggin' *wreck* their CPU fan for a higher cause ;) And my very own
modular system will allow that. Not that it'll come into existence
anytime soon -- but I'm already thinking it through now and then,
which is why I entered this thread.
If you've played with Sync modular in your life you should know the
advantages. Sync allows you to go "as deep" as you want, and is MORE CPU
FRIENDLY than any other modular host, all the while doing sample-based
processing (in some parts??).
Even the FILTERS in Sync are modeled using one-sample-feedback delays!!

Imho this is a seriously superb combination!

But then Dr Sync got bought out by NI, who promised to incorporate Sync's
wicked system. That's been like what? 3 years ago? Booooh!! I've mailed
Dr Sync about this, but obviously tons of NDAs prevent him from making any
kind of statement.

I've been told Max/MSP uses the same technique, but I'm not sure if Max
uses dynamic recompilation....

Imho you could take the system even further: make it so that in any "GUI"
module you can actually SEE the source code involved (if wanted). Then you'd
be able to go even one step deeper, IF you wanted.
And you'd get the advantages of coding loops inside modules, which is
difficult to handle with GUI-based modulars.

This of course calls for a very 'open' architecture, as your user would be
able to see the DSP methods involved. But, as musicdsp.org's keeper I can
only like that ;-)))))


cheers,



- bram
Martin Eisenberg
2003-02-21 10:13:00 UTC
Permalink
Post by David Olofson
Then, to get the right feedback latency, we
insert a delay line set to (exact_delay - block_delay).
Crystal clear, thanks. Don't know what I was thinking :|


Martin
Cesare Ferrari
2003-02-20 21:24:01 UTC
Permalink
Post by Urs Heckmann
On Thursday, 20.02.03, at 19:36 (Europe/Berlin), Martin wrote:
Post by Martin Eisenberg
Now, I don't own an analog modular system. I've never even touched
one. But feedback in all the impossible places would definitely be the
first thing I'd try out ;)
Yeah, but it all depends. Your waveshaper example would be a feedback
around a single unit. That'd be no problem at all. The main problem
would be a feedback around a huge series of modules. Some modules like
filters would already introduce some kind of delay (like phase shifts)
which you can't reasonably compensate for in the digital domain (or you
run out of cpu power :-). He he, I'm a big friend of block processing
'cause it offers sheer performance advantages, like keeping
coefficients in registers while processing a filter over a couple of
samples...
Another big issue is that analog systems can easily deal with
parallelism and can have about no delay within a single module. Look at
Jürgen Michaelis' Neuron (www.jayemsonic.de). - There are six resonant
filters which all modulate each other simultaneously. I've seen and
heard this bastard. I doubt that this can be done in the digital domain
within reasonable cpu consumption, if it can be done in realtime at all.
With a typical digital modular, it is like you have a sample/hold on the
output stage of all your analogue modules, and you clock the signals through
the components synchronously with a clock (at the sample rate). In effect
you get a single sample delay around components when implementing direct
feedback.

Now I'm all in favour of you being able to do this, but the effect of it is
that changes to the sample rate will radically affect the sound (say moving
from 44.1 to 48kHz) since the feedback paths now appear at different
frequencies. Without massively oversampling the system I don't think there
is a simple digital solution to this.

In my exploration of software modular systems I've stuck to no feedback
paths around the components, so you end up with a DAG.

Cesare
Jan Marguc
2003-02-20 22:33:01 UTC
Permalink
Post by Cesare Ferrari
In my exploration of software modular systems i've stuck to no feedback
paths around the components, so you end up with a DAG.
Cesare
I agree with Cesare. As intriguing as the thought of having one-sample delay
and recursion in your graph may be, it is still a question whether it's really
worth basing your whole modular design on that concept. If I were to make a
modular synth, for the general design I'd go for the tree-based approach
(that is, the graph has no cycles) and then if need arose, create a special
module within that graph, which allows the user to create those really
low-level synth structures in a special editor -- of course using the
dynamic recompilation approach that Bram mentioned. You wouldn't even need
to write your own mini-assembler to do that.

I did such a modular synth, and I simply created one assembler-file and used
labels to determine the start and end offsets of the code segments of my
unit generator functions. These offsets were easily stored in a table along
with other information characterizing the module such as the number of
inputs and the number of outputs. An extra number described the number of
inputs that were passed directly to the code via the FPU registers, while
for the others the general convention was that inputs were read from e.g.
the [edx+xxx] memory while output values were written to the [edi+xxx]
memory. Generally storing all output values while adding support for input
values passed directly via FPU registers made sense to me, since most of my
modules had in general more inputs than outputs. Indeed, typical signal
graphs have many leaves at the top that process data downwards to a single
output. Also, having the outputs automatically stored in [edi+xxx] I didn't
have to worry about introducing extra unit-delay buffers for recursive loops
;-)
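
A guessed-at shape for one entry of such a table, purely illustrative (the
real implementation uses assembler labels and the x86 register conventions
described above):

// Illustrative only: one table entry describing a unit generator whose code
// lives between two label-derived offsets in a single assembled blob.
struct ModuleInfo {
    unsigned code_start;   // offset of the code segment start
    unsigned code_end;     // offset of the code segment end
    int      num_inputs;   // total number of inputs
    int      num_outputs;  // total number of outputs
    int      fpu_inputs;   // inputs passed directly in FPU registers; the
                           // rest are read from the input block in memory
};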

I did some basic experiments with the code such as sine-oscillators,
bass-drums, decaying noise and the simplest waveguide model. The code is not
very readable, since the graphs are converted into special opcodes that are
used for a just-in-time compiler before the code is assembled. I did that
because I wanted to see how small I could get the JIT compiler, so that it
could be used in e.g. 4K intros and the like, but lost interest in it and
started to focus on fixed synth architectures and most of all better sound
quality ;-)

However, if anyone of you would like to have the code (windoze), just drop
me a mail.

Jan
Urs Heckmann
2003-02-20 22:14:00 UTC
Permalink
On Friday, 21.02.03, at 00:29 (Europe/Berlin), Martin wrote:
Post by Martin Eisenberg
But seriously, that sounds like a very interesting topic. How are you
going to do that dynamic recompilation?
Maybe just creating a new code block by copying small code snippets
from a code library and referencing it with a function pointer? That
way you could avoid a lot of conditional branches...
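
One branch-free reading of that idea, sketched with plain function pointers
rather than copied machine code (names are illustrative):

#include <vector>

struct VoiceState { float bus[16]; };   // shared scratch data, illustrative only

typedef void (*Snippet)(VoiceState&);   // one precompiled code snippet

// "Recompiling" the patch just means rebuilding this flat list of snippet
// pointers whenever the patch changes; the per-sample loop then runs
// straight through without conditional branches.
struct CompiledPatch {
    std::vector<Snippet> ops;

    void run_sample(VoiceState& v) const {
        for (Snippet op : ops)
            op(v);
    }
};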

;) Urs
joshua reich
2003-02-20 23:55:01 UTC
Permalink
I'd love to see the code.

When doing some demo-coding with woorlic, we were playing with doing
similar stuff with assembler shaders, and high level (C++) code that would
render the polygons, using jit 'compiled' shaders.
Post by Jan Marguc
However, if anyone of you would like to have the code (windoze), just drop
me a mail.
Jan
--
joshua reich

***@i2pi.com
Ph: +61 (0) 3 9415 9557
Mb: +61 (0) 408 355 788
Jan Marguc
2003-02-25 16:44:01 UTC
Permalink
I've uploaded the code for my modular "just-in-time reassembling" synth to
my homepage at www.kampsax.dtu.dk/~jm/jit_asm.zip

It is far from finished and as you can see clearly written with size
constraints (of the resulting executable program) in mind rather than speed
or portability of the generated code. Still, it supports quite a few unit
generators performing common mathematical operations, z^-1 delay and the
like.
If motivation comes back to me sometime, I will add macros, because I think
this is really missing in this implementation.

You will need MSVC and MASM to compile it. An example "tiny" build of the
synth's core is also included and compiles into a 1970-byte (*) very
strange noise-generating .com file. You will need the Netwide Assembler
(NASM) to compile the .com file.

See the included readme.txt for more details...

Bram: you might submit this to the musicdsp archives if you want. Couldn't
figure out how to attach a zip file ;-)

Jan

(*) 1970 bytes is quite a lot in the context of 4k intros, but I was lazy
here and wrote the main program in C (audio output using DirectSound) and
just called the assembler functions from the C file. I believe that the file
size could be a lot smaller if everything was written in assembler.
Post by joshua reich
I'd love to see the code.
When doing some demo-coding with woorlic, we were playing with doing
similar stuff with assembler shaders, and high level (C++) code that would
render the polygons, using jit 'compiled' shaders.
Post by Jan Marguc
However, if anyone of you would like to have the code (windoze), just drop
me a mail.
Jan
--
joshua reich
Alexey Menshikov
2003-02-28 09:30:00 UTC
Permalink
Just found a nice Image of the Day:
http://www.flipcode.com/cgi-bin/msg.cgi?showThread=COTD-runtimei&forum=cotd&id=-1


Alexey Menshikov



Tuesday, February 25, 2003, 8:48:32 PM, you wrote:
JM> I've uploaded the code for my modular "just-in-time reassembling" synth to
JM> my homepage at www.kampsax.dtu.dk/~jm/jit_asm.zip

Urs Heckmann
2003-02-21 08:59:01 UTC
Permalink
On Friday, 21.02.03, at 03:16 (Europe/Berlin), Martin wrote:
Post by Martin Eisenberg
My waveshaper was meant as a generic nonlinearity example. A chain of
pickup, tube amp, and PA stack simulations would have been a better
example.
Ah okay. I was thinking of atomized Waveshapers because my Synth has
atomized Waveshapers 8-)
Post by Martin Eisenberg
Besides, a "single unit" in a modular app may actually be non-atomic
and contain any amount of history, so handling this case but not
others would be kind of pointless from my POV.
<snip>
Post by Martin Eisenberg
But there are the people who want to
friggin' *wreck* their CPU fan for a higher cause ;)
Yeah, but in that case you can go with the 1 sample delay and try to
compensate for that. But for "no delay" you'll have to dynamically
scribble up the transfer function for your whole system and try to get a
z-Transform managed 8-))

For my needs I am very happy with a discrete structure that is still
capable of playing tens of voices on my ancient machine 8-))
Post by Martin Eisenberg
Post by Urs Heckmann
Look at Jürgen Michaelis' Neuron (www.jayemsonic.de).
That will stay pretty impressive for some time to come, yes. Isn't RL
wonderful?
Yeah, sure, but what is RL?

;) Urs
Martin Eisenberg
2003-02-21 10:13:15 UTC
Permalink
Post by Urs Heckmann
For my needs I am very happy with a discrete structure that is
still capable of playing tens of voices on my ancient machine 8-))
I'm not trying to talk you into anything... :)
Post by Urs Heckmann
Yeah, sure, but what is RL?
Real Life, as usual.


Martin
Jens Groh
2003-02-21 10:13:22 UTC
Permalink
Regarding run-time recompilation: Wouldn't it be sufficient to reASSEMBLE or even just reLINK dynamically? A complete compiler seems quite heavy to me.

Regards,
Jens Groh
Steve Harris
2003-02-21 10:30:00 UTC
Permalink
Post by Jens Groh
Regarding run-time recompilation: Wouldn't it be sufficient to reASSEMBLE or even just reLINK dynamically? A complete compiler seems quite heavy to me.
No, because you want to be able to inline the source to build the most
efficient binary.

We looked at this on the linux-audio-developers list a few months back, and
IIRC SAOL does something similar.

I even prototyped up a test system using gcc and perl, inspired by Sync
Modular: http://plugin.org.uk/blockless/

The lowest level objects are .c files and everything else is made up from
graphs of them. It was quite efficient, but I can't remember the numbers.

- Steve
Michael Gogins
2003-02-21 11:17:00 UTC
Permalink
This would depend upon the degree of run-time optimization that was
implemented. If dynamically relinked blocks were optimized when they were
originally compiled, I think performance could be quite good - probably on
the order of a decent but not great C++ compiler. If somehow optimization
could be done at run-time across the entire graph of dynamically linked
blocks, that would give, I think, a significant increase in performance. But
doing this would take a lot of expert programming.

----- Original Message -----
From: "Jens Groh" <***@irt.de>
To: <music-***@aulos.calarts.edu>
Sent: Friday, February 21, 2003 7:13 AM
Subject: Re: [music-dsp] [OT] Modular design and feedback
Post by Jens Groh
Regarding run-time recompilation: Wouldn't it be sufficient to reASSEMBLE
or even just reLINK dynamically? A complete compiler seems quite heavy to
me.
Post by Jens Groh
Regards,
Jens Groh
Chun-Yu Shei
2003-02-21 11:59:00 UTC
Permalink
What about using .NET "reflection" to output MSIL code at runtime? The
.NET JIT performs runtime optimizations like inlining and all that good
stuff. If only the JIT could handle floating-point stuff better...then
.NET might be pretty neat for DSP stuff. It seems to be quite good at
integer stuff, but it's not very fast when it comes to floating point.

- Chun-Yu
Post by Michael Gogins
This would depend upon the degree of run-time optimization that was
implemented. If dynamically relinked blocks were optimized when they were
originally compiled, I think performance could be quite good - probably on
the order of a decent but not great C++ compiler. If somehow optimization
could be done at run-time across the entire graph of dynamically linked
blocks, that would give, I think, a significant increase in performance. But
doing this would take a lot of expert programming.
----- Original Message -----
Sent: Friday, February 21, 2003 7:13 AM
Subject: Re: [music-dsp] [OT] Modular design and feedback
Post by Jens Groh
Regarding run-time recompilation: Wouldn't it be sufficient to reASSEMBLE
or even just reLINK dynamically? A complete compiler seems quite heavy to
me.
Post by Jens Groh
Regards,
Jens Groh
Michael Gogins
2003-02-21 12:58:01 UTC
Permalink
In my experience, which included some timing of similar algorithms coded in
different languages, .NET code runs somewhat faster than the Sun Java
virtual machine, but the Sun JVM runs about 3 times slower than optimized
C++ code; I would guess that .NET code runs about 2.5 times slower than
optimized C++ code.

Also, and this should not be forgotten, using C/C++ opens up the vast
cornucopia of existing, highly usable open source libraries such as
PortAudio, libsndfile, boost, Loris, FFTW, STK, iiwusynth, etc., etc., etc.,
almost all of which is written in C or C++.

----- Original Message -----
From: "Chun-Yu Shei" <***@cs.indiana.edu>
To: <music-***@aulos.calarts.edu>
Sent: Friday, February 21, 2003 8:58 AM
Subject: Re: [music-dsp] [OT] Modular design and feedback
Post by Chun-Yu Shei
What about using .NET "reflection" to output MSIL code at runtime? The
.NET JIT performs runtime optimizations like inlining and all that good
stuff. If only the JIT could handle floating-point stuff better...then
.NET might be pretty neat for DSP stuff. It seems to be quite good at
integer stuff, but it's not very fast when it comes to floating point.
- Chun-Yu
Post by Michael Gogins
This would depend upon the degree of run-time optimization that was
implemented. If dynamically relinked blocks were optimized when they were
originally compiled, I think performance could be quite good - probably on
the order of a decent but not great C++ compiler. If somehow optimization
could be done at run-time across the entire graph of dynamically linked
blocks, that would give, I think, a significant increase in performance. But
doing this would take a lot of expert programming.
----- Original Message -----
Sent: Friday, February 21, 2003 7:13 AM
Subject: Re: [music-dsp] [OT] Modular design and feedback
Post by Jens Groh
Regarding run-time recompilation: Wouldn't it be sufficient to reASSEMBLE
or even just reLINK dynamically? A complete compiler seems quite heavy to
me.
Post by Jens Groh
Regards,
Jens Groh
Chun-Yu Shei
2003-02-21 13:42:00 UTC
Permalink
Well, I suppose all those open source libraries could be compiled using
Visual C++ .NET and used (but it might be a big mess). You might have to
convert each C++ into a "managed" C++ class that's garbage collected and
stuff.

- Chun-Yu
Post by Michael Gogins
In my experience, which included some timing of similar algorithms coded in
different languages, .NET code runs somewhat faster than the Sun Java
virtual machine, but the Sun JVM runs about 3 times slower than optimized
C++ code; I would guess that .NET code runs about 2.5 times slower than
optimized C++ code.
Also, and this should not be forgotten, using C/C++ opens up the vast
cornucopia of existing, highly usable open source libraries such as
PortAudio, libsndfile, boost, Loris, FFTW, STK, iiwusynth, etc., etc., etc.,
almost all of which is written in C or C++.
Michael Gogins
2003-02-21 14:38:01 UTC
Permalink
I'm sorry, but conversion to managed C++ is sometimes, all too often in
fact, a joke. No multiple inheritance, no templates. You can conditionally
compile your code in managed/unmanaged blocks, but that is extra trouble.

I am increasingly frustrated and disappointed by Microsoft. I am not
reflexively anti-Microsoft - I am grateful to them for providing an
affordable, reasonably reliable platform on which to begin working on real
computer music at home, at a time when I could not afford a Unix workstation
and Linux did not yet exist, at least not in a usable form. Furthermore,
they have radically improved all their software over time, and I find Visual
Studio by far the most congenial development environment in spite of
experience with others.

However, Microsoft has always lagged the ISO C++ standard, and their work
with .NET, while superior to Java in some respects, and supplied with a wide
variety of very well implemented libraries, is nevertheless sadly behind the
state of the art in languages and appears to be oriented towards corporate
developers who are not highly educated in computer science. For example, the
.NET framework includes "delegates" which, if they had been implemented more
efficiently, would have enabled a certain amount of real functional
programming to be done in .NET. But they just didn't bother (I have to
assume they know, because all the MS people I've met have seemed very
competent). Similarly, they didn't provide an abstract enough implementation
of the dynamic proxy idea, which is practically hard-wired into the remoting
framework, unlike Java's Proxy class, which can much more easily be used in
other contexts.

Also, I have come to appreciate the benefits of open source, directly as a
result of my experience in computer music. I have become convinced (I sure
didn't start out this way!) that software is like academic research, not
like art works or inventions, and should be open to free redistribution in
the same way. This of course is opposed in many ways by Microsoft.

I would not have been able to make more than a few licks of music, and the
software I would have been able to make would have been feeble in
comparison, without the open source libraries that I referred to in my post.
And that's even assuming I could have afforded to pay licensing fees for
alternatives (Numerical Recipes, Intel math kernel, and so on).

ISO C++ remains a seriously advanced programming language, especially taking
recent developments in template metaprogramming into account. Also,
languages such as Ocaml and other current functional programming languages
are much more advanced than .NET.


----- Original Message -----
From: "Chun-Yu Shei" <***@cs.indiana.edu>
To: <music-***@aulos.calarts.edu>
Sent: Friday, February 21, 2003 10:41 AM
Subject: Re: [music-dsp] [OT] Modular design and feedback
Post by Chun-Yu Shei
Well, I suppose all those open source libraries could be compiled using
Visual C++ .NET and used (but it might be a big mess). You might have to
convert each C++ class into a "managed" C++ class that's garbage collected and
stuff.
- Chun-Yu
Post by Michael Gogins
In my experience, which included some timing of similar algorithms coded in
different languages, .NET code runs somewhat faster than the Sun Java
virtual machine, but the Sun JVM runs about 3 times slower than optimized
C++ code; I would guess that .NET code runs about 2.5 times slower than
optimized C++ code.
Also, and this should not be forgotten, using C/C++ opens up the vast
cornucopia of existing, highly usable open source libraries such as
PortAudio, libsndfile, boost, Loris, FFTW, STK, iiwusynth, etc., etc., etc.,
almost all of which is written in C or C++.
Urs Heckmann
2003-02-21 10:25:01 UTC
Permalink
On Friday, 21.02.03, at 13:06 (Europe/Berlin), Smartelectronix - Bram de Jong wrote:
Post by Smartelectronix - Bram de Jong
If you've played with Sync modular in your life you should know the
advantages.
Does it exist for OS X or Mac at all? Where can I get information about
it?

;) Urs
Steve Harris
2003-02-21 10:33:00 UTC
Permalink
Post by Urs Heckmann
Post by Smartelectronix - Bram de Jong
If you've played with Sync modular in your life you should know the
advantages.
Does it exist for OS X or Mac at all? Where can I get information about
it?
http://www.mtu-net.ru/syncmodular/

Unfortunately it's Windows-only AFAIK. It is really good, but not good
enough to make me use windows.

- Steve
Smartelectronix - Bram de Jong
2003-02-21 12:33:00 UTC
Permalink
Post by Urs Heckmann
On Friday, 21.02.03, at 13:06 (Europe/Berlin), Smartelectronix - Bram de Jong wrote:
Post by Smartelectronix - Bram de Jong
If you've played with Sync modular in your life you should know the
advantages.
Does it exist for OS X or Mac at all?
nope...
Post by Urs Heckmann
Where can I get information about it?
http://www.google.be/search?q=sync+modular&ie=UTF-8&oe=UTF-8&hl=nl&meta=

;-PP

- bram
Jérôme MONCEAUX
2003-02-21 12:40:01 UTC
Permalink
Hello,

I want to read ANSI, IEC and DIN standards. I found them on the ANSI website,
but they're very expensive.

Do you know where a compiled book/pdf of audio standards can be found or
bought?

Thanks

Jérôme
Angelo Farina
2003-02-23 07:46:00 UTC
Permalink
I suggest that you access these standards through the library of your
university. Most technical universities subscribe to on-line access to
international standards. For example, mine (University of Parma) subscribes
to ASTM, ISO and UNI.
The University of Bologna (where I was previously) subscribed to IEC,
ITU and CEN, etc...
Usually, accessing these documents through these libraries is free if done
for research purposes.
Bye!

Angelo Farina
----- Original Message -----
From: "Jérôme MONCEAUX" <***@arkamys.com>
To: <music-***@aulos.calarts.edu>
Sent: Friday, February 21, 2003 3:38 PM
Subject: [music-dsp] ANSI DIN IEC


Hello,

I want to read ANSI, IEC and DIN standards. I found them on the ANSI website,
but they're very expensive.

Do you know where a compiled book/pdf of audio standards can be found or
bought?

Thanks

Jérôme



Christopher Weare
2003-02-21 15:04:01 UTC
Permalink
It depends on your performance requirements. If you wish to be most
efficient then you need to dynamically compile. Your main whack to
performance comes from bouncing stack frames around as you jump between
functions. Recompiling allows you to efficiently order your
instructions for each configuration.

-chris

-----Original Message-----
From: Jens Groh [mailto:***@irt.de]
Sent: Friday, February 21, 2003 4:14 AM
To: music-***@aulos.calarts.edu
Subject: Re: [music-dsp] [OT] Modular design and feedback

Regarding run-time recompilation: Wouldn't it be sufficient to
reASSEMBLE or even just reLINK dynamically? A complete compiler seems
quite heavy to me.

Regards,
Jens Groh




Sarah Thompson
2003-02-25 18:17:00 UTC
Permalink
If you've not come across it before and are interested in the idea of
generating specialised code for DSP (or other) purposes on the fly, you
might like to check out some of the work that has been carried out on
partial evaluation (sometimes also known as partial specialisation).

I suspect that PE could well do impressive things for this kind of
application, especially where coefficients can be fixed at compile time.

This book is a good place to start (quite an easy read):

\bibitem{jones93}
Neil~D. Jones, Carsten K. Gomard and Peter Sestoft,
\emph{Partial Evaluation and Automatic Program Generation},
Prentice Hall International, 1993

You can download it in PDF form from Peter Sestoft's web site:

http://www.dina.dk/~sestoft/pebook/pebook.html

Have fun,
Sarah

PS: If you do happen to try some of these techniques, please let me know. I
have a research interest in PE, although my own specialism is applying PE to
hardware design.
Ross Bencina
2003-02-25 19:19:00 UTC
Permalink
Hi Sarah

Thanks for the link, it looks tasty. I've spent a while thinking about PE
for music-dsp too. One specific application that I think would be highly
fruitful is rate-reducing calculations in block-wise synthesis by applying
loop-invariant code motion. For example, consider the following wavetable
oscillator code template, which supports audio rate frequency modulation
(where sr is the sample rate):

float oscil_generate( Table& t, float& phase, float frequency )
{
    float increment = (t.length / sr) * frequency;

    float result = t[phase];
    phase += increment;
    while( phase < 0 )
        phase += t.length;
    while( phase >= t.length )
        phase -= t.length;
    return result;
}

A simple block-wise version of this code would be (where ksamps is the block
length or control period):

void oscil_generate( float *result, Table& t, float& phase,
                     const float *frequency )
{
    for( int i=0; i<ksamps; ++i ){
        float increment = (t.length / sr) * frequency[i];

        result[i] = t[phase];
        phase += increment;
        while( phase < 0 )
            phase += t.length;
        while( phase >= t.length )
            phase -= t.length;
    }
}

Now, if frequency is known to only change at the control rate and is also
known to be non-negative, the following optimised version may be used:

void oscil_generate( float *result, Table& t, float& phase,
                     const float frequency )
{
    float increment = (t.length / sr) * frequency;

    for( int i=0; i<ksamps; ++i ){
        result[i] = t[phase];
        phase += increment;
        while( phase >= t.length )
            phase -= t.length;
    }
}

It's not uncommon to see separate implementations of the above variations
available in software synthesis systems to handle this kind of optimisation,
but by applying PE techniques a compiler could produce the above
optimisation automatically. Of course PE could go further and optimise or
eliminate the assignment to increment if length, sr or frequency were known
at compile time.

One important observation that arises from this is that significant
performance benefits may be available by allowing PE to operate on the
internals of built-in unit generators, which are often treated as black
boxes by audio synthesis compilation systems.

Best wishes,

Ross.
Post by Sarah Thompson
If you've not come across it before and are interested in the idea of
generating specialised code for DSP (or other) purposes on the fly, you
might like to check out some of the work that has been carried out on
partial evaluation (sometimes also known as partial specialisation).
I suspect that PE could well do impressive things for this kind of
application, especially where coefficients can be fixed at compile time.
\bibitem{jones93}
Neil~D. Jones, Carsten K. Gomard and Peter Sestoft,
\emph{Partial Evaluation and Automatic Program Generation},
Prentice Hall International, 1993
http://www.dina.dk/~sestoft/pebook/pebook.html
Have fun,
Sarah
PS: If you do happen to try some of these techniques, please let me know. I
have a research interest in PE, although my own specialism is applying PE to
hardware design.
yon
2003-02-25 20:39:01 UTC
Permalink
Interesting stuff!

I still have trouble getting my head
around it. I gather that the biggest benefit would apply to
a modular system (returning to the original example)
which is highly fragmented, meaning composed of a
large number of relatively low-expense modules, because
if the modules are large, then the overhead for
patching together pre-compiled modules is presumably
outweighed by the benefit of having very good
offline compilers? is this correct? or is the end goal to
employ a full (reasonably) state of the art compiler in a
just-in-time fashion?

In reading what you wrote about this, i thought of another
method one could employ to improve performance in
such a (modular) system. the idea is, in addition to the
original set of modules, to precompile a collection
of subgraphs of modules according to some grammar.
i.e., rather than having compiled modules

"oscillator", "waveshaper", "filter", ..

you would compile a larger number of larger
composite modules

"oscillator->waveshaper",
"oscillator->filter",
"waveshaper->filter"
"osc->filter->waveshaper->filter"
etc

in the simplest case, you could precompile all
binary processor combinations. later you could
add to this the most common larger subgraphs
of modules, etc.

the drawback would be increased program size.

however, in some perhaps-attainable limit,
such a scheme otherwise seems like it possesses
the capacity to solve the original problem as well.

this isn't a new idea, because one has lots of constrained-
modularity systems which basically benefit from it.
it does seem that one could reap most of the
benefits without imposing such constraints, however.

(and i'm guessing that in the world of computer
science someone has thought of this in more
general terms.)

yon
Ross Bencina
2003-02-25 21:43:00 UTC
Permalink
Post by yon
I have still trouble getting my head
around it. I gather that the biggest benefit would apply to
a modular system (returning to the original example)
which is highly fragmented, meaning composed of a
large number of relatively low-expense modules, because
if the modules are large, then the overhead for
patching together pre-compiled modules is presumably
outweighed by the benefit of having very good
offline compilers? is this correct?
I think that's correct, yes. The "lots of processing in each module reduces
the relative overhead of dynamic patching" is the design assumption that's
behind many software synthesizers - "lots of processing" may mean modules
that do lots of stuff internally (VST Plugins, AudioMulch contraptions), or
modules that process lots of samples for each dynamic call because they
process vectors of samples (CSound unit generators, max.msp and pd objects
etc), or both.

One of the things that was mentioned in the original thread was
sample-by-sample computation so that single-sample feedback can be
implemented. Given
this constraint it's difficult to devise modules that do a lot per dynamic
call.
Post by yon
or is the end goal to
employ a full (reasonably) state of the art compiler in a
just-in-time fashion?
That's a goal for some people. Personally I think there is a need for both
approaches. I could be wrong, but I think SuperCollider already does
just-in-time compilation to some extent, although possibly not PE.
Post by yon
In reading what you wrote about this, i thought of another
method one could employ to improve performance in
such a (modular) system. the idea is, in addition to the
original set of modules, to precompile a collection
of subgraphs of modules according to some grammar.
i.e., rather than having compiled modules
"oscillator", "waveshaper", "filter", ..
you would compile a larger number of larger
composite modules
"oscillator->waveshaper",
"oscillator->filter",
"waveshaper->filter"
"osc->filter->waveshaper->filter"
etc
in the simplest case, you could precompile all
binary processor combinations. later you could
add to this the most common larger subgraphs
of modules, etc.
the drawback would be increased program size.
This would, I guess, fall under the category "generative programming".
You could analyse a corpus of existing programs in the target language to
work out the best set of precompiled combinations given a constraint on the
size of the virtual machine.
unknown
2003-02-26 05:20:01 UTC
Permalink
Post by Sarah Thompson
\bibitem{jones93}
Neil~D. Jones, Carsten K. Gomard and Peter Sestoft,
\emph{Partial Evaluation and Automatic Program Generation},
Prentice Hall International, 1993
Also, one source to look at for DSP specifically is Scott Draves'
thesis, which talks about partial evaluation for graphics and audio.
http://www-2.cs.cmu.edu/~spot/diss/main.html
--
Eli Brandt | eli+@cs.cmu.edu | http://www.cs.cmu.edu/~eli/
(finished Ph.D., woohoo; looking for good work in the Seattle area)
Sarah Thompson
2003-02-26 06:11:00 UTC
Permalink
Post by Ross Bencina
One of the things that was mentioned in the original thread was sample-by
sample computation so that single-sample feedback can be
implemented. Given
this constraint it's difficult to devise modules that do a lot per dynamic
call.
Indeed. The trick that PE does is to unroll loops (not necessarily all
loops, but certainly the important ones) and inline expand functions. This
generates very large code fragments that are easy to optimise well, e.g. by
data flow analysis. As a consequence, it is also much easier to spot
inherent parallelism in code, so targeting certain architectures becomes
easier to do efficiently. I'd expect significant speedups in some cases,
especially on VLIW or similar architectures.
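
A toy, hand-written flavour of what that specialisation buys (illustrative
only, not produced by an actual partial evaluator):

// General FIR: tap count and coefficients are run-time values.  x points at
// the newest sample; older samples precede it in memory.
float fir(const float* x, const float* b, int taps)
{
    float acc = 0.0f;
    for (int k = 0; k < taps; ++k)
        acc += b[k] * x[-k];
    return acc;
}

// What a partial evaluator could produce for taps == 3, b == {0.5, 0, 0.25}:
// the loop is unrolled, the zero tap disappears, and the remaining
// coefficients are folded in as constants.
float fir_specialised(const float* x)
{
    return 0.5f * x[0] + 0.25f * x[-2];
}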

It's not necessarily the case that PE generates huge executables - the
optimisation that falls out naturally often generates surprisingly little
output. As I mentioned in my previous post, my thing is PE for hardware,
specifically PE for hardware compilation. A few years ago (1992 if memory
serves me right), as an experiment I coded a very simple 8 bit
microprocessor whose purpose was to execute a program in ROM that generated
the Fibonacci series. This generated a circuit with about 1500 gates that
executed one instruction per clock cycle. As a wild idea, I tried to get my
hardware compiler (which used PE extensively, although I didn't realise it
at the time!) to execute the entire program in a single clock cycle. Rather
than a huge spaghetti monster of a circuit, all that came out was an
odd-looking counter that generated the fibonacci series directly - no trace
of a CPU remained, and the gate count was just 57. I literally fell off my
chair when I saw that. Since then, PE has become more widely known (the
Jones, Gomard & Sestoft book would have been useful, but was published some
time after my research).

I think there is quite a lot of relevance to DSP here, especially because
some of the newer FPGAs support quite enormous gate counts and could even be
used, with hardware compilation and partial evaluation, to support high
performance signal processing in their own right. Targeting conventional
DSPs is also relevant, because some of the problems that are associated with
generating code for DSPs are similar to some issues in logic synthesis.

Sarah