Discussion:
[LAD] cpu spikes
Fokke de Jong
2016-01-24 14:03:08 UTC
Hi all,

This is my first post here. I’m not new to audio programming or Linux, but I haven’t done much in terms of combining the two. Most of my audio programming has been on OS X.

Currently working on some realtime convolution with lots of channels and low latency requirements, but I am running into some unexpected cpu-spikes and hope some of you might have an idea of possible causes.

I’m processing 32-sample blocks at 48 kHz, but roughly every 0.6 seconds I get a large spike in CPU usage. This cannot possibly be explained by my algorithm, because the load should be pretty stable.

I am measuring CPU load by getting the time with clock_gettime(CLOCK_MONOTONIC_RAW, timespec*) at the beginning and end of each callback. When converted to a percentage, my CPU load hovers somewhere between 40 and 50% most of the time, but more or less every 900 callbacks (0.6 seconds) there is a spike of more than 100%.
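
For reference, a minimal Python sketch of this kind of measurement. The original program is not shown in the thread, so the function names here are illustrative only. At 48 kHz a 32-sample block leaves about 0.667 ms per callback, so 900 callbacks span roughly 0.6 s.

```python
import time

SAMPLE_RATE_HZ = 48_000  # from the post
BLOCK_SIZE = 32          # samples per callback, from the post


def period_seconds(block_size=BLOCK_SIZE, sample_rate=SAMPLE_RATE_HZ):
    """Time budget of one callback, in seconds."""
    return block_size / sample_rate


def load_percent(start_s, end_s, period_s):
    """Callback duration expressed as a percentage of the period."""
    return 100.0 * (end_s - start_s) / period_s


def timed_callback(callback, period_s):
    """Run the callback once and return its load in percent."""
    # Fall back to CLOCK_MONOTONIC where the RAW clock is not available.
    clock = getattr(time, "CLOCK_MONOTONIC_RAW", time.CLOCK_MONOTONIC)
    start = time.clock_gettime(clock)
    callback()
    end = time.clock_gettime(clock)
    return load_percent(start, end, period_s)
```

A value above 100% means the callback overran its period, which is what shows up as an xrun.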

I am not doing any IO, mallocing or anything else that could block. My threads are SCHED_FIFO with max priority (I have 4 threads on 4 cores).
The only explanation I can come up with is that my threads are somehow pre-empted even though they are realtime threads. Is that even possible? And is there a way to check this? Besides pre-emption, maybe my caches are severely thrashed, but I find that unlikely as it seems to happen on all 4 cores simultaneously.
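
One possible way to check for pre-emption by other tasks, sketched in Python and assuming a Linux /proc filesystem: the per-thread status file reports voluntary and nonvoluntary context switches. A nonvoluntary count that rises across a spike suggests the thread was taken off the CPU while still runnable. Time lost to interrupts or SMIs is not counted there. The helper names are hypothetical.

```python
def context_switches(status_text):
    """Extract the context-switch counters from the text of
    /proc/<pid>/task/<tid>/status."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counts[key] = int(value.strip())
    return counts


def read_thread_switches(pid, tid):
    """Read the counters for one thread of one process."""
    with open(f"/proc/{pid}/task/{tid}/status") as f:
        return context_switches(f.read())
```

Sampling the counters before and after a spike and comparing the two results shows whether the scheduler switched the thread out in between.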

I’m running (more or less default install, no additional services running) Linux Mint 17.3 with a 3.19.0-42-lowlatency kernel on a Core i7-6700 with hyperthreading/turbo disabled.

I remember reading somewhere that realtime threads cannot run more than 0.95 s every second. That would be very bad if it actually meant my threads are blocked from running for a period of 50 ms straight…
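
The limit referred to here is the kernel's RT throttling, exposed through /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us (commonly 950000 and 1000000; a runtime of -1 disables throttling). A small sketch, with illustrative function names, that turns those two values into a budget:

```python
def rt_budget(runtime_us, period_us):
    """Return (fraction of each period realtime tasks may use,
    microseconds per period reserved for non-realtime tasks)."""
    if runtime_us < 0:  # -1 means throttling is disabled
        return 1.0, 0
    return runtime_us / period_us, period_us - runtime_us


def read_rt_budget(base="/proc/sys/kernel"):
    """Read the current settings from a Linux system."""
    with open(f"{base}/sched_rt_runtime_us") as f:
        runtime_us = int(f.read())
    with open(f"{base}/sched_rt_period_us") as f:
        period_us = int(f.read())
    return rt_budget(runtime_us, period_us)
```

With the common defaults this gives 95% and 50000 microseconds, which matches the 0.95 s and 50 ms figures in the post.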


Anyone have any thoughts on possible causes?

best,
Fokke
Fons Adriaensen
2016-01-24 15:13:44 UTC
Post by Fokke de Jong
I remember reading somewhere that realtime threads cannot run
more than .95s every second. That would be very bad if it
actually meant my threads are blocked from running for a period of 50ms
straight…
If your normal load is around 50% you shouldn't ever hit that limit.

Do you lock all memory used by your RT threads ?

If you don't and the system is configured for high swappiness
[1] this sort of thing could happen.
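
A rough sketch of what locking memory can look like, here through Python's ctypes calling the C library's mlockall(); a C program would call mlockall(MCL_CURRENT | MCL_FUTURE) directly. The flag values below are the ones used on Linux x86 and are an assumption for other platforms, and the helper names are illustrative.

```python
import ctypes
import os

# Flag values as defined on Linux x86; other platforms may differ.
MCL_CURRENT = 1  # lock pages that are mapped right now
MCL_FUTURE = 2   # also lock pages mapped later


def lock_all_memory():
    """Ask the kernel to keep all pages of this process resident.
    Returns True on success, False otherwise (for example when
    RLIMIT_MEMLOCK is too low)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        print("mlockall failed:", os.strerror(ctypes.get_errno()))
        return False
    return True


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness setting."""
    with open(path) as f:
        return int(f.read())
```

The call is normally made once at startup, before the realtime threads begin processing.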

I'm routinely running big real-time convolution matrices without
problems, so it's certainly possible.


[1] <https://en.wikipedia.org/wiki/Swappiness>
--
FA

A world of exhaustive, reliable metadata would be an utopia.
It's also a pipe-dream, founded on self-delusion, nerd hubris
and hysterically inflated market opportunities. (Cory Doctorow)
Len Ovens
2016-01-24 15:13:44 UTC
Post by Fokke de Jong
I’m processing 32-sample blocks at 48 kHz but roughly every 0.6 seconds I get a
large spike in cpu usage. This cannot possibly be explained by my algorithm,
because the load should be pretty stable. 
...
Post by Fokke de Jong
I’m running (more or less default install, no additional services running) Linux
Mint 17.3 with a 3.19.0-42-lowlatency kernel on a core i7-6700 with
hyperthreading/turbo disabled.
...
Post by Fokke de Jong
Anyone have any thoughts on possible causes?
Bad kernel driver? WIFI drivers are known bad for things like this. An
interrupt driver can block if it is designed badly. I found on one machine
I had to unload the kernel module for my wifi as it actually created
more problems when I turned the power off to the tx than when it was on.
(it seems to me on my wifi, when it was turned on I got xruns every 5
seconds, but with it turned off it was every half second or so... sounds
very close to 0.6, unloading the kernel module fixed it)

Cron should also be turned off, but that is probably not the problem here.
Cron runs super "nice" but there seem to be some things it does, like
package updates, that can cause problems too. I turn off cron while
recording.

--
Len Ovens
www.ovenwerks.net
Harry van Haaren
2016-01-24 15:24:08 UTC
Post by Fokke de Jong
Currently working on some realtime convolution with lots of channels and
low latency requirements, but I am running into some unexpected cpu-spikes
and hope some of you might have an idea of possible causes.
Keep an eye on the interrupts while it's all running, particularly
Non-maskable interrupts. Try to correlate them with the 0.6 sec
of the glitches if possible;

watch -n 0.1 cat /proc/interrupts

I've written up some of the checks I generally do, perhaps browse
that to see if there's anything there that you could check?
http://openavproductions.com/real-time-latency-tuning/

That's all I can think of at the moment, -Harry
--
http://www.openavproductions.com
Christopher Arndt
2016-02-01 06:14:46 UTC
Post by Harry van Haaren
I've written up some of the checks I generally do, perhaps browse
that to see if there's anything there that you could check?
http://openavproductions.com/real-time-latency-tuning/
I'm trying to follow that guide but I am stuck on how to find out what
exactly to put into RTIRQ_NAME_LIST.

How do I find out which modules are for my USB soundcard and how to
distinguish them from the ones for the mainboard soundchip? I'm also not
sure which entries in /proc/interrupts belong to my sound card.

I have a Behringer UCA-222 and a M-Audio Fast Track Pro usb audio interface.


Chris
Ralf Mardorf
2016-02-01 08:15:47 UTC
Post by Christopher Arndt
Post by Harry van Haaren
I've written up some of the checks I generally do, perhaps browse
that to see if there's anything there that you could check?
http://openavproductions.com/real-time-latency-tuning/
I'm trying to follow that guide but I am stuck on how to find out, what
to put exactly into RTIRQ_NAME_LIST.
How do I find out which modules are for my USB soundcard and how to
distinguish them from the ones for the mainboard soundchip? I'm also
not sure which entries in /proc/interrupts belong to my sound card.
I have a Behringer UCA-222 and a M-Audio Fast Track Pro usb audio interface.
RTIRQ_NAME_LIST="usb"

https://wiki.archlinux.org/index.php/Pro_Audio#M-Audio_Fast_Track_Pro
http://lmgtfy.com/?q=uca222+linux
Christopher Arndt
2016-02-01 08:26:28 UTC
Post by Ralf Mardorf
RTIRQ_NAME_LIST="usb"
https://wiki.archlinux.org/index.php/Pro_Audio#M-Audio_Fast_Track_Pro
How does the latter lead to the former?
Post by Ralf Mardorf
http://lmgtfy.com/?q=uca222+linux
Oh dang, hadn't thought of that!

Thanks for nothing.

Chris
Ralf Mardorf
2016-02-01 08:48:03 UTC
Post by Christopher Arndt
Post by Ralf Mardorf
RTIRQ_NAME_LIST="usb"
https://wiki.archlinux.org/index.php/Pro_Audio#M-Audio_Fast_Track_Pro
How does the latter lead to the former?
The latter provides additional information regarding this card. The
former is what you need in your rtirq config.

More information can be found here:

https://help.ubuntu.com/community/UbuntuStudio/UsbAudioDevices
http://www.linux-usb.org/USB-guide/x319.html
http://wiki.linuxaudio.org/wiki/system_configuration#rtirq
Post by Christopher Arndt
Post by Ralf Mardorf
http://lmgtfy.com/?q=uca222+linux
Oh dang, hadn't thought of that!
Thanks for nothing.
So what information are you missing?

http://www.catb.org/esr/faqs/smart-questions.html
Ralf Mardorf
2016-02-01 09:18:21 UTC
Post by Ralf Mardorf
Post by Christopher Arndt
Post by Ralf Mardorf
RTIRQ_NAME_LIST="usb"
https://wiki.archlinux.org/index.php/Pro_Audio#M-Audio_Fast_Track_Pro
How does the latter lead to the former?
The latter provides additional information regarding this card. The
former is what you need in your rtirq config.
https://help.ubuntu.com/community/UbuntuStudio/UsbAudioDevices
http://www.linux-usb.org/USB-guide/x319.html
http://wiki.linuxaudio.org/wiki/system_configuration#rtirq
Post by Christopher Arndt
Post by Ralf Mardorf
http://lmgtfy.com/?q=uca222+linux
Oh dang, hadn't thought of that!
Thanks for nothing.
So what information are you missing?
http://www.catb.org/esr/faqs/smart-questions.html
IOW for detailed help the output of commands might be needed. Depending
on what is unclear, it could be the output of lsusb,
cat /proc/interrupts, rtirq status, cat /proc/asound/cards,
tree /sys/bus/usb/drivers/usb/ etc..
Fokke de Jong
2016-02-08 10:07:16 UTC
Hi all,

I just wanted to give you an update on my quest for super-low-latency.
I bit the bullet and compiled a rt-kernel.
And the result is super-stable audio with a period of 32 samples at 48 kHz (for a total roundtrip latency of 3.5 ms).
With a (peak) CPU load hovering around 70% on all 4 cores I don’t get any xruns anymore (even when running the bloated desktop that comes with Mint :-)

I only tested it running for a few minutes, we’ll see what happens when running for a few hours, but I’m happy so far.
My conclusion is that the lowlatency kernel is nice for lowish latency with a load that isn’t too high, but it just doesn’t cut it for highly demanding work.

I want to thank you all for your input, I have learned a lot.

cheers,
Fokke
_______________________________________________
Linux-audio-dev mailing list
http://lists.linuxaudio.org/listinfo/linux-audio-dev
Harry van Haaren
2016-02-08 11:14:45 UTC
Post by Fokke de Jong
(for a total roundtrip latency of 3.5ms).
Congrats, that's pretty solid! Would you share some details on what config
options you used, and which kernel it is?

I think it would be great if the Linux Audio community built up a list of
kernel config options that need changing for optimal audio performance.

Cheers, -Harry
--
http://www.openavproductions.com
Ralf Mardorf
2016-02-08 11:26:17 UTC
Post by Harry van Haaren
I think it would be great if the Linux Audio community built up a list
of kernel config options that need changing for optimal audio
performance.
Indeed, the rt config became tricky a while ago, especially for AMD
based machines.
Jeremy Jongepier
2016-02-08 11:36:44 UTC
Post by Harry van Haaren
I think it would be great if the Linux Audio community built up a list of
kernel config options that need changing for optimal audio performance.
+1. I guess everybody is just enabling PREEMPT_RT_FULL and that's about it.

Jeremy
Fokke de Jong
2016-02-10 08:53:32 UTC
I used 3.18.25 from kernel.org, applied this patch: patch-3.18.25-rt23.patch.gz <https://www.kernel.org/pub/linux/kernel/projects/rt/3.18/patch-3.18.25-rt23.patch.gz>.
The only option I changed from the default was "Fully Preemptible Kernel (RT)".
After that I had to edit kernel/locking/locktorture.c and comment out the line

#include <linux/rwlock.h>

or it wouldn’t compile. (this part took me a while to figure out).

The reason I chose this kernel was to stay close to the kernel version already installed on my system (3.19.x). I have to admit, I know very little about different kernel versions.


If I were to do it again, I would probably start out with the low-latency config rather than the default, but I’m not sure if it would make a difference.
One caveat though: I haven’t tested this kernel very thoroughly, so I’m not even sure it won’t blow up in my face if I were to try and send an email or something :-)
And also, I don’t seem to get hardware video acceleration (I have an Intel built-in GPU). I don’t care about this too much as I plan to run this project in console only.


I’m not sure how much there is to be gained by turning off all kinds of features in the kernel, but I know that trying this out takes a lot of time, especially if you don’t really know what you’re doing :-)

cheers.
Fokke
Cedric Roux
2016-02-08 18:44:49 UTC
Hi,
Post by Fokke de Jong
Hi all,
I just wanted to give you an update on my quest for super-low-latency.
I bit the bullet and compiled a rt-kernel.
call me noob but did you download a special kernel
or is it the mainline with just config set in it?
I ask because at work we do realtime processing
using a low latency kernel thrown by ub*ntu and,
well I should dig the web but just to know from
someone who did it...

Regards,
Cédric.
Ralf Mardorf
2016-02-08 18:51:42 UTC
Post by Cedric Roux
Hi,
Post by Fokke de Jong
Hi all,
I just wanted to give you an update on my quest for
super-low-latency. I bit the bullet and compiled a rt-kernel.
call me noob but did you download a special kernel
or is it the mainline with just config set in it?
I ask because at work we do realtime processing
using a low latency kernel thrown by ub*ntu and,
well I should dig the web but just to know from
someone who did it...
It's the vanilla + the rt-patch + configuration.

https://www.kernel.org/pub/linux/kernel/projects/rt/
Ralf Mardorf
2016-02-08 19:11:10 UTC
PS:

I forgot to mention, the lowlatency is a vanilla kernel with a special
configuration, but without a realtime related patch.

The default kernels of common distros should already provide some level
of soft realtime ability if you boot with the 'threadirqs' option, e.g.
by adding 'threadirqs' next to options like 'ro' and 'quiet' in the boot
entry of grub.cfg.
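
Whether the running kernel was actually booted with that option can be read back from /proc/cmdline. A minimal sketch with illustrative helper names:

```python
def has_boot_option(cmdline, option="threadirqs"):
    """True if the option appears as a separate word on the command line."""
    return option in cmdline.split()


def running_kernel_has_threadirqs(path="/proc/cmdline"):
    """Check the command line of the currently running kernel."""
    with open(path) as f:
        return has_boot_option(f.read())
```

Matching whole words avoids false positives from longer parameters that merely contain the same text.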
Jeremy Jongepier
2016-02-08 20:31:08 UTC
Post by Ralf Mardorf
I forgot to mention, the lowlatency is a vanilla kernel with a special
configuration, but without a realtime related patch.
Afaik the Ubuntu low-latency kernel doesn't really have a special
config, just CONFIG_PREEMPT and CONFIG_HZ_1000.

Jeremy
Ralf Mardorf
2016-02-09 03:43:15 UTC
Post by Jeremy Jongepier
Post by Ralf Mardorf
I forgot to mention, the lowlatency is a vanilla kernel with a
special configuration, but without a realtime related patch.
Afaik the Ubuntu low-latency kernel doesn't really have a special
config, just CONFIG_PREEMPT and CONFIG_HZ_1000.
A comparison between a standard Arch and an Ubuntu lowlatency kernel of
a few other things that come to mind on the fly:

[***@archlinux ~]$ uname -r
4.4.1-2-ARCH
[***@archlinux ~]$ zgrep THREADING_DEFAULT /proc/config.gz
[***@archlinux ~]$ grep THREADING_DEFAULT /mnt/moonstudio/boot/config-4.2.0-27-lowlatency
CONFIG_IRQ_FORCED_THREADING_DEFAULT=y
[***@archlinux ~]$ zgrep Q_DEFAULT_GOV /proc/config.gz
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
[***@archlinux ~]$ grep -i Q_DEFAULT_GOV /mnt/moonstudio/boot/config-4.2.0-27-lowlatency
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
[***@archlinux ~]$ zgrep NO_HZ /proc/config.gz
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
[***@archlinux ~]$ grep NO_HZ /mnt/moonstudio/boot/config-4.2.0-27-lowlatency
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
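
A comparison like the one above can also be automated. The following sketch parses two kernel config files and reports the options that differ; the function names are illustrative, and options marked "is not set" are represented as "n" by assumption.

```python
import gzip


def parse_kconfig(text):
    """Map option name -> value from kernel config text."""
    unset_suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(unset_suffix):
            options[line[2:-len(unset_suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


def load_kconfig(path):
    """Load a plain or gzip-compressed config, e.g. /proc/config.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return parse_kconfig(f.read())


def diff_kconfig(a, b):
    """Return {option: (value_in_a, value_in_b)} for differing options."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}
```

Applied to the NO_HZ lines shown above, the only reported difference is CONFIG_NO_HZ.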
Jonathan E. Brickman
2016-01-24 21:15:21 UTC
Post by Fokke de Jong
Hi all,
This is my first post here. I’m not new to audio programming or
linux, but I haven’t done much in terms of combining the two. Most of
my audio programming has been on os x.
Currently working on some realtime convolution with lots of channels
and low latency requirements, but I am running into some unexpected
cpu-spikes and hope some of you might have an idea of possible causes.
I’m processing 32-sample blocks at 48 kHz but roughly every 0.6 seconds
I get a large spike in cpu usage. This cannot possibly be explained by
my algorithm, because the load should be pretty stable.
[...snip...]
I’m running (more or less default install, no additional services
running) Linux Mint 17.3 with a 3.19.0-42-lowlatency kernel on a core
i7-6700 with hyperthreading/turbo disabled.
I remember reading somewhere that realtime threads cannot run more
than .95s every second. That would be very bad if it actually meant my
threads are blocked from running for a period of 50ms straight…
Anyone have any thoughts on possible causes?
best,
Fokke
You're running Mint :-) Lots of background bells and whistles there,
lots of things which will crop up and interfere, things you cannot
disable or turn off with absolute certainty. If you want smooth power,
you'll have to choose more carefully. My current SOP in more detail here:

http://lsn.ponderworthy.com/doku.php/choosing_a_linux_platform_for_live_synth
--
Jonathan E. Brickman ***@ponderworthy.com (785)233-9977
Hear us at http://ponderworthy.com -- CDs and MP3 now available!
<http://ponderworthy.com/ad-astra/ad-astra.html>
Music of compassion; fire, and life!!!
Jörn Nettingsmeier
2016-01-25 11:23:09 UTC
hi *!


sorry to hijack this thread, but: when enquiring about latency tuning,
one frequently encounters hints like "disable cron", "disable indexing
services", "disable this, disable that".

however, none of those alleged culprits run with real-time privileges or
access driver or kernel code which does. so how can they be a problem
(and disabling them part of the solution)? i'm asking because i've got
my own anecdotal evidence that it *does* make a difference...

i understand how device drivers can be nasty (graphics cards locking up
the pci bus, wifi chips hogging the kernel for milliseconds at a time or
worse...) but it seems that a) either kernel preemption and real-time
scheduling is terribly buggy or hand-wavey, or b) we're feeding each
other snake-oil in recommending to disable userspace things that are
running without rt privs.

i'd love to be educated on this.


best,


jörn
--
Jörn Nettingsmeier
Lortzingstr. 11, 45128 Essen, Tel. +49 177 7937487

Meister für Veranstaltungstechnik (Bühne/Studio)
Tonmeister VDT

http://stackingdwarves.net
Joakim Hernberg
2016-01-25 11:53:57 UTC
On Mon, 25 Jan 2016 12:23:09 +0100
Post by Jörn Nettingsmeier
i understand how device drivers can be nasty (graphics cards locking
up the pci bus, wifi chips hogging the kernel for milliseconds at a
time or worse...) but it seems that a) either kernel preemption and
real-time scheduling is terribly buggy or hand-wavey, or b) we're
feeding each other snake-oil in recommending to disable userspace
things that are running without rt privs.
i'd love to be educated on this.
My 2 cents is that it's snakeoil ;)

AFAIK, the important things are:

1. Use a properly configured realtime patched kernel.

2. Set a high priority of the soundcard interrupt, something like 97 is
a good value. (If using a USB soundcard, set the priority of the
interrupt servicing the USB hub instead).

3. Run Jack with realtime and memlocking enabled and at a priority of
80.

4. Make sure that you don't have any hardware/drivers that play havoc
with your kernel scheduling. Some WIFI adapters, NVIDIA, etc. come to
mind.

5. Make sure that the system isn't suffering from SMIs/NMIs which
preempt the kernel and can take a long time to execute. This can be
checked with the hwlatdetect script in the rt-tests package.

6. Use cyclictest from rt-tests to confirm that there are no latency
spikes in how the kernel schedules threads.
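
To illustrate what point 6 measures, here is a very rough userspace analogue in Python: sleep for a fixed interval many times and record how late each wake-up is. This is only a sketch of the idea and not a replacement for cyclictest, since the Python interpreter adds jitter of its own and the loop does not run with realtime priority. The function name is hypothetical.

```python
import time


def wakeup_overshoot_us(interval_s=0.001, loops=200):
    """Sleep repeatedly and return how late each wake-up was,
    in microseconds."""
    late = []
    for _ in range(loops):
        target = time.monotonic() + interval_s
        time.sleep(interval_s)
        late.append(max(0.0, (time.monotonic() - target) * 1e6))
    return late
```

The interesting figure is the maximum of the returned list rather than the average, because a single late wake-up is enough to cause an xrun.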

Possibly hyperthreading, cpu power management, etc could cause
problems, and I don't have experience with all hardware out there, but
IME on modern Intel hardware this isn't a problem.

JACK2 also has a very nice profiling tool that can give a good idea
about what is going on with the soundcard interrupt, clients, etc.

Apart from that I think most things are snake oil, or ancient Internet
lore, of course ymmv ;)
--
Joakim
Joakim Hernberg
2016-01-25 21:19:44 UTC
On Mon, 25 Jan 2016 12:53:57 +0100
Post by Joakim Hernberg
On Mon, 25 Jan 2016 12:23:09 +0100
Post by Jörn Nettingsmeier
i understand how device drivers can be nasty (graphics cards locking
up the pci bus, wifi chips hogging the kernel for milliseconds at a
time or worse...) but it seems that a) either kernel preemption and
real-time scheduling is terribly buggy or hand-wavey, or b) we're
feeding each other snake-oil in recommending to disable userspace
things that are running without rt privs.
i'd love to be educated on this.
Two additional observations: thermal events (CPU overheating) are bad and
will give xruns. Also I've had problems keeping audio files in my home
dir when using a traditional HDD (rust); better to use a different HDD,
or an SSD instead. Changing the I/O scheduler or the class and priorities for
CFQ didn't seem to bring any relief.
--
Joakim
Jeremy Jongepier
2016-01-26 19:53:55 UTC
Post by Joakim Hernberg
3. Run Jack with realtime and memlocking enabled and at a priority of
80.
Does this refer to the -m jackd option? What does that do actually, any
pros/cons?

Jeremy
Jeremy Jongepier
2016-01-26 20:05:37 UTC
Post by Jeremy Jongepier
Post by Joakim Hernberg
3. Run Jack with realtime and memlocking enabled and at a priority of
80.
Does this refer to the -m jackd option? What does that do actually, any
pros/cons?
Nevermind, first try a certain search engine and then ask ;) I guess
it's preferable to have JACK not lock physical memory but do its thing
in virtual memory.

Jeremy
Jeremy Jongepier
2016-01-26 21:15:44 UTC
Post by Jeremy Jongepier
Nevermind, first try a certain search engine and then ask ;) I guess
it's preferable to have JACK not lock physical memory but do its thing
in virtual memory.
Which should be the other way around of course... /me hides somewhere in
a corner

Jeremy
Will Godfrey
2016-01-26 21:41:26 UTC
On Tue, 26 Jan 2016 22:15:44 +0100
Post by Jeremy Jongepier
Post by Jeremy Jongepier
Nevermind, first try a certain search engine and then ask ;) I guess
it's preferable to have JACK not lock physical memory but do its thing
in virtual memory.
Which should be the other way around of course... /me hides somewhere in
a corner
Jeremy
Don't do that. The corners are already crowded with some of us :)
--
Will J Godfrey
http://www.musically.me.uk
Say you have a poem and I have a tune.
Exchange them and we can both have a poem, a tune, and a song.
Joakim Hernberg
2016-01-31 14:12:04 UTC
On Tue, 26 Jan 2016 20:53:55 +0100
Post by Jeremy Jongepier
Post by Joakim Hernberg
3. Run Jack with realtime and memlocking enabled and at a priority
of 80.
Does this refer to the -m jackd option? What does that do actually,
any pros/cons?
As I understand it, you wouldn't want memory being used for audio
to be swapped out to disk, as that could incur a lengthy delay when it
needs to be read back from disk, thus producing xruns.
--
Joakim
Len Ovens
2016-01-25 14:50:06 UTC
sorry to hijack this thread, but: when enquiring about latency tuning, one
frequently encounters hints like "disable cron", "disable indexing services",
"disable this, disable that".
however, none of those alleged culprits run with real-time privileges or
access driver or kernel code which does. so how can they be a problem (and
disabling them part of the solution)? i'm asking because i've got my own
anecdotal evidence that it *does* make a difference...
Yes, the big thing is that I see xruns just before something pops up
saying "hey there's an upgrade available". Now as I have said, cron runs
super "nice" and so anything that cron runs should be really low priority
too. But time constraints are not just CPU access and time. I would think
that the network driver even using the bus for a full 1500 bytes should
not be a problem, but where does that data go? What priority is a disk
access... and once it starts how big a chunk of data gets written and is
it atomic? It does not seem to be memory related as I use half my memory
it seems even running a lot of stuff at the same time. My swap after weeks
of running is still 0%. (swappiness 10)
i understand how device drivers can be nasty (graphics cards locking up the
pci bus, wifi chips hogging the kernel for milliseconds at a time or
Actually I think with wifi chips it is the bus that gets hogged.
worse...) but it seems that a) either kernel preemption and real-time
scheduling is terribly buggy or hand-wavey, or b) we're feeding each other
snake-oil in recommending to disable userspace things that are running without
rt privs.
As you yourself can attest, it does make a difference. I would suggest
that there are some kernel drivers that are optimized for throughput over
latency that have not yet been accounted for. Or some other things that
are in their own way time constrained. Network traffic comes to mind.
Network traffic comes when it comes and can only be buffered in hardware
so long before packets get lost. However, as I said, even full packets are
relatively small. What is the biggest data chunk that gets written to
disk? Has anyone gone through kernel drivers looking for atomic parts that
could be shortened? Is there a setting for maximum data size of a disk
write/read? It appears there are ways to throttle disk access speed on a
per-process basis.

Another one that is puzzling is CPU speed changes (AKA OnDemand). These
happen very fast and should not cause trouble, but they do. It seems to
me, just by watching a cpu speed monitor, that xruns happen at the point
the cpu speed goes down only. Perhaps there is some timing loop somewhere
that gets expanded that should not. I would think any timing should be
done by timers that are not cpu speed dependent.
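
Which frequency governor each core is using can be read from sysfs on systems that expose cpufreq. A small sketch with illustrative helper names; it returns an empty result where the files do not exist:

```python
import glob

GOVERNOR_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"


def read_governors(pattern=GOVERNOR_GLOB):
    """Map CPU name (e.g. 'cpu0') -> active cpufreq governor."""
    governors = {}
    for path in sorted(glob.glob(pattern)):
        cpu = path.split("/")[-3]  # .../cpu0/cpufreq/scaling_governor
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


def cpus_not_on(governors, wanted="performance"):
    """List the CPUs whose governor differs from the wanted one."""
    return sorted(cpu for cpu, gov in governors.items() if gov != wanted)
```

If the xruns coincide with downward speed changes, as described above, any core listed by cpus_not_on() is a candidate to pin to a fixed governor while testing.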

Honestly, these are just thoughts off the top of my head. I don't know the
kernel code well enough to say (Means I have not looked at it). I just
know that by turning certain things off, I can get lower latency without
xruns over a 24hour period. (even just sitting idle streaming zeros)

--
Len Ovens
www.ovenwerks.net
Fokke de Jong
2016-01-25 12:52:39 UTC
thanks for all your input, I’ll try and summarize here.
http://lsn.ponderworthy.com/doku.php/choosing_a_linux_platform_for_live_synth
First of all, booting into console mode rather than running the full-blown desktop seemed to eliminate most of the problems, although it’s still not quite as stable as I’d like.
Also I don’t quite understand how all of that could interfere with my RT thread.
I was going to try and install a more minimal system anyway, and I don’t need a graphical environment for this, but during development it’s kind of nice to have.

I still would like to see how far I can take this, and was really hoping I can continuously use 80-90% of all CPU cores without dropouts…
Is that realistic with a lowlatency kernel?
Do you lock all memory used by your RT threads ?
If you don't and the system is configured for high swappiness
[1] this sort of thing could happen.
I'm routinely running big real-time convolution matrices without
problems, so it's certainly possible.
[1] <https://en.wikipedia.org/wiki/Swappiness>
--
FA
I am not currently locking memory. I thought I had plenty of RAM, so as not to cause any swapping, but I guess it’s good practice to wire memory, so I will give it a try.
Bad kernel driver? WIFI drivers are known bad for things like this. An interrupt driver can block if it is designed badly. I found on one machine I had to unload the kernel module for my wifi as it actually created more problems when I turned the power off to the tx than when it was on. (it seems to me on my wifi, when it was turned on I got xruns every 5 seconds, but with it turned off it was every half second or so... sounds very close to 0.6, unloading the kernel module fixed it)
Cron should also be turned off, but that is probably not the problem here. Cron runs super "nice" but there seem to be some things it does, like package updates, that can cause problems too. I turn off cron while recording.
--
Len Ovens
I don’t have a wireless card in my machine, nor an NVIDIA card, just Intel built-in graphics. This is where my Linux knowledge falls short, but if I don’t have that hardware, can I assume no drivers for it are loaded?
AFAIK, the important things are.
1. Use a properly configured realtime patched kernel.
lowlatency-kernel is not going to cut it?

I wasn’t really able to find much info on the difference between the two, other than that the rt-kernel is a “step up” and hard realtime vs soft.
But nothing on how this is technically achieved.
2. Set a high priority of the soundcard interrupt, something like 97 is
a good value. (If using a USB soundcard, set the priority of the
interrupt servicing the USB hub instead).
did that.
3. Run Jack with realtime and memlocking enabled and at a priority of
80.
I’m not running jack but rather using alsa directly.
4. Make sure that you don't have any hardware/drivers that play havoc
with your kernel scheduling. Some WIFI adapters, NVIDIA, etc. come to
mind.
5. Make sure that the system isn't suffering from SMI/NMIs which
preempt the kernel and can take a long time to execute. This can be
done with hwlatdetect script in the rt-tests package.
6. Use cyclictest from rt-tests to confirm that there are no latency
spikes in how the kernel schedules threads.
Possibly hyperthreading, cpu power management, etc could cause
problems, and I don't have experience with all hardware out there, but
IME on modern Intel hardware this isn't a problem.
I did actually find that hyperthreading had an impact; turning it off made everything much more predictable.
JACK2 also has a very nice profiling tool that can give a good idea
about what is going on with the soundcard interrupt, clients, etc.
--
Joakim
Keep an eye on the interrupts while it's all running, particularly
Non-maskable interrupts. Try to correlate them with the 0.6 sec
of the glitches if possible;
watch -n 0.1 cat /proc/interrupts
I've written up some of the checks I generally do, perhaps browse
that to see if there's anything there that you could check?
http://openavproductions.com/real-time-latency-tuning/
That's all I can think of at the moment, -Harry
Here’s the output of cat /proc/interrupts:


            CPU0       CPU1       CPU2       CPU3
   0:         57          0          0          0   IO-APIC-edge        timer
   1:          3          0          0          0   IO-APIC-edge        i8042
   7:         44          0          0          0   IO-APIC-edge
   8:          1          0          0          0   IO-APIC-edge        rtc0
   9:          3          0          0          0   IO-APIC-fasteoi     acpi
  12:          4          0          0          0   IO-APIC-edge        i8042
  16:          0          0          0          0   IO-APIC 16-fasteoi  madifx
 121:       7074          0          0        341   PCI-MSI-edge        xhci_hcd
 122:      13001      25946          0        342   PCI-MSI-edge        0000:00:17.0
 123:       3409          0          0          0   PCI-MSI-edge        eth0
 124:     171029          0          0          0   PCI-MSI-edge        i915_bpo
 125:       4805          0          0          0   PCI-MSI-edge        snd_hda_intel
 NMI:         17         12         13         14   Non-maskable interrupts
 LOC:     544121     436328     444080     462821   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:         17         12         13         14   Performance monitoring interrupts
 IWI:          0          0          0          0   IRQ work interrupts
 RTR:          3          0          0          0   APIC ICR read retries
 RES:      13051      11975      11216       8004   Rescheduling interrupts
 CAL:        613        547        560        526   Function call interrupts
 TLB:        640        767        676        535   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:         31         31         31         31   Machine check polls
 HYP:          0          0          0          0   Hypervisor callback interrupts
 ERR:         44
 MIS:          0

The local timer interrupts are getting fired all the time, but I guess they should.
123 eth0 is also updated rather often. But the one that’s closest to 0.6 s seems to be:

122: 13001 26147 0 342 PCI-MSI-edge 0000:00:17.0

But is there anything I can do about that?



cheers,
Fokke
Len Ovens
2016-01-25 15:18:33 UTC
Permalink
  16:          0          0          0          0   IO-APIC   16-fasteoi   madifx
Is this your audio interface on irq 16? If so why is it sharing an IRQ?
Move it to a different slot maybe? If this is a PCI card and there is only
one slot, I would suggest a different motherboard with more PCI slots. My
personal experience with sharing IRQs has never been good even using rtirq
to separate things out. One thing to try: in the BIOS there is sometimes a
setting that tells the BIOS whether or not to assign IRQs. I have found that
setting it to "not" lets the kernel assign them, and the kernel does a better
job. Also, some BIOSes let you pin a PCI card to a specific IRQ; that may help.

I am sure some will say that if rtirq doesn't help there is a bad
driver... OK. The thing to remember is that PCs are not built for low
latency but high throughput. Most people find that high throughput makes
for a "snappy" user experience. Low latency to most HW designers means
30ms.

--
Len Ovens
www.ovenwerks.net
Len Ovens
2016-01-25 15:28:17 UTC
Permalink
I am sure some will say that if rtirq doesn't help there is a bad driver...
Check the actual priorities that rtirq sets. It seems to me the last time
I checked that if an IRQ is shared by a, b and c and rtirq is used to
prioritize c to 90 for example, a and b will end up at 86 and 88 or
something like that even though they should be 50. This was some time ago
and may well have changed. In days of old I found even swapping the slots
cards were plugged into made a difference, but I have the same cards
backwards in the i5 I run now with no problem.
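
One way to check what priorities actually ended up on the IRQ threads is sketched below. It is a sketch only, and it assumes the IRQ handlers run as kernel threads named irq/<number>-<device>, which is the case on realtime-patched kernels or when booting with the threadirqs parameter. The function name irq_thread_priorities is hypothetical; os.sched_getscheduler and os.sched_getparam are the standard Python wrappers around the corresponding system calls.

```python
import os


def irq_thread_priorities(proc="/proc"):
    """Return [(priority, policy, name)] for IRQ threads, highest first."""
    found = []
    if not os.path.isdir(proc):
        return found
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        pid = int(entry)
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                name = f.read().strip()
            if not name.startswith("irq/"):
                continue
            policy = os.sched_getscheduler(pid)
            prio = os.sched_getparam(pid).sched_priority
        except (OSError, AttributeError):
            # Thread vanished, access denied, or API not available here.
            continue
        found.append((prio, policy, name))
    return sorted(found, reverse=True)
```

Printing the result after rtirq has run shows at a glance whether the handlers sharing a line with the sound card really sit where they were supposed to; a policy equal to os.SCHED_FIFO marks the realtime ones.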

note: I use two audio cards, a delta66 and an audiopci. The delta has to
be higher priority than the audiopci (which provides midi only) or I get
xruns.


--
Len Ovens
www.ovenwerks.net
Fokke de Jong
2016-01-27 11:03:14 UTC
Permalink
Post by Fokke de Jong
16: 0 0 0 0 IO-APIC 16-fasteoi madifx
The madifx is my sound card. I have no idea what the fasteoi is though… (anyone?)
I have 3 PCIe slots; in one of them the card doesn’t show up at all, and the other 2 both give the result above.
The last time I had to deal with IRQs directly was when I was trying to install a Sound Blaster in a 486… :-)

fokke
_______________________________________________
Linux-audio-dev mailing list
http://lists.linuxaudio.org/listinfo/linux-audio-dev
Len Ovens
2016-01-27 15:48:37 UTC
Permalink
Post by Fokke de Jong
16: 0 0 0 0 IO-APIC 16-fasteoi madifx
The madifx is my sound card. I have no idea what the fasteoi is though
(anyone ?)
Hmm, I have looked as best I can and it seems fasteoi is an interrupt
translator (best word I could come up with), and in combination with
IO-APIC it is the same as what some systems show as IO-APIC-fasteoi. I do
not know whether that difference comes from the kernel version or the
hardware. However, it does appear to be tied to the madifx and so is
nothing to worry about.
I have 3 PCIe slots
PCIe is a different animal than PCI. The interrupts are sent differently too.
Interrupt conflicts should not happen.

Do you monitor temperature? (I use Psensor) Which CPU governor do you use?

Looking back at your first post, you are measuring time that your callback
takes in terms of the wall clock. The fact that it sometimes takes a lot
longer than it should does indicate that something else is taking some of
that time.

In another post you suggest that the interrupts for 0000:00:17.0 seem to be
about the right number for every .6 seconds. Have you looked up what
device that is? ls /sys/bus/pci/devices

in the 0000:00:17.0 directory you can look at the driver which may (or
not) tell you more about what it is. cat uevent seems to give the most
readable info. But someone who knows the file system better may be able to
point to a better way.
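
For what it's worth, the uevent lookup can be scripted. This is a rough sketch in Python, assuming the uevent file holds plain KEY=VALUE lines (PCI devices typically list keys such as DRIVER, PCI_ID and PCI_SLOT_NAME). The helper names parse_uevent and describe_pci_device are made up for the example.

```python
def parse_uevent(text):
    """Parse the KEY=VALUE lines of a sysfs uevent file into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip anything that is not KEY=VALUE
            info[key.strip()] = value.strip()
    return info


def describe_pci_device(address, sysfs="/sys/bus/pci/devices"):
    """Read <sysfs>/<address>/uevent, e.g. address='0000:00:17.0'."""
    with open(f"{sysfs}/{address}/uevent") as f:
        return parse_uevent(f.read())
```

describe_pci_device("0000:00:17.0").get("DRIVER") would then name the kernel driver bound to that device, if there is one.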


--
Len Ovens
www.ovenwerks.net
Jonathan E. Brickman
2016-01-28 13:37:47 UTC
Permalink
First of all, for the record, anyone who equates firsthand experiences
with snakeoil, shall find their words completely ignored by yours truly :-)
Post by Fokke de Jong
First of all, booting into console mode, rather than running the full
blown desktop seemed to eliminate most of the problems, although it’s
still not quite as stable as I’d like.
Also I don’t quite understand how all of that could interfere with my RT-thread.
I was going to try and install a more minimal system anyway, and
don’t need a graphical environment for this, but during development
it’s kind of nice to have.
Check your processes with htop. Make sure none of the resource-eating
background items remain.
Post by Fokke de Jong
I still would like to see how far i can take this, and was really
hoping i can continuously use 80-90% of all cpu cores without dropouts…
Is that realistic with a lowlatency kernel?
In my experience this is not realistic with either a realtime kernel or
a lowlatency kernel, unless you can afford large latencies, using
large audio buffers. This is because in a low-latency situation, the
CPU has to have a lot of free cycles available to be ready to handle
whatever comes.

I do think you will probably see more stability if you use JACK in such
efforts, or even PulseAudio, than if you use direct ALSA. I have found
ALSA to be great for drivers, not anywhere near so good for the
transport phases.
Post by Fokke de Jong
Post by Len Ovens
Cron should also be turned off, but that is probably not the problem
here. Cron runs super "nice" but there seem to be some things it does,
like package updates, that can cause problems too. I turn off cron while
recording.
I have never had to turn cron off in an otherwise well-configured environment.
Post by Fokke de Jong
I don’t have wireless on my machine, nor an NVIDIA card, just Intel
built-in graphics. This is where my Linux knowledge falls short, but if I
don’t have that hardware, can I assume no drivers for it are loaded?
Yep, no problem there.
Post by Fokke de Jong
Post by Len Ovens
AFAIK, the important things are.
1. Use a properly configured realtime patched kernel.
lowlatency-kernel is not going to cut it?
Lowlatency is just fine if you have the CPU for it, and lowlatency is a
whole lot easier to set up now, with the Liquorix people on the ball
like they are.
Post by Fokke de Jong
I wasn’t really able to find too much info on the difference between
the two, other than that the rt-kernel is a “step up” and hard
realtime vs soft.
But nothing on how this is technically achieved.
On my production box, with my Behringer Firewire FCA202, I have found
slightly better results using a Liquorix kernel than with a
realtime-patched kernel. Liquorix has a whole lot of interesting
optimizations. I would imagine that if my CPU were not what it is,
and/or the load type different, the differences would probably be
considerably greater, and I have no thought as to which side it would
land on.
--
Jonathan E. Brickman ***@ponderworthy.com (785)233-9977
Hear us at http://ponderworthy.com -- CDs and MP3 now available!
<http://ponderworthy.com/ad-astra/ad-astra.html>
Music of compassion; fire, and life!!!
Erik de Castro Lopo
2016-01-28 20:44:25 UTC
Permalink
Post by Jonathan E. Brickman
I do think you will probably see more stability if you use JACK in such
efforts, or even PulseAudio, than if you use direct ALSA. I have found
ALSA to be great for drivers, not anywhere near so good for the
transport phases.
Doesn't JACK sit on top of ALSA? If so how is it possible for JACK with
ALSA to perform better than ALSA alone?

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
Fons Adriaensen
2016-01-28 21:32:30 UTC
Permalink
Post by Erik de Castro Lopo
Post by Jonathan E. Brickman
I do think you will probably see more stability if you use JACK in such
efforts, or even PulseAudio, than if you use direct ALSA. I have found
ALSA to be great for drivers, not anywhere near so good for the
transport phases.
Doesn't JACK sit on top of ALSA? If so how is it possible for JACK with
ALSA to perform better than ALSA alone?
There are many ways of 'sitting on top of ALSA', including some
suboptimal ones. It's perfectly possible for an app using ALSA
directly to perform worse than Jack.

Ciao,
--
FA

A world of exhaustive, reliable metadata would be an utopia.
It's also a pipe-dream, founded on self-delusion, nerd hubris
and hysterically inflated market opportunities. (Cory Doctorow)
Joakim Hernberg
2016-01-31 14:26:16 UTC
Permalink
On Thu, 28 Jan 2016 07:37:47 -0600
Post by Jonathan E. Brickman
First of all, for the record, anyone who equates firsthand
experiences with snakeoil, shall find their words completely ignored
by yours truly :-)
It's not my intention to denigrate your experience, but my own
experience shows that the only way to get xrun free audio using JACK is
to follow my previous advice. There are many causes for xruns, but
IMO/IME using a lowlat kernel simply won't provide xrun free audio...
This conclusion seems to be corroborated by cyclictest results...

Personally I attend to what I outlined previously, and I use a full KDE
environment with opengl compositing enabled on an Intel GPU. I don't
have to disable cron or anything else, and I have let most of the
Internet lore just fall by the wayside, as it IME seems to be if not
snake oil, then at least not applicable to my system.

Of course YMMV ;)
--
Joakim
Joakim Hernberg
2016-01-25 11:29:06 UTC
Permalink
On Sun, 24 Jan 2016 15:03:08 +0100
Post by Fokke de Jong
I am measuring cpu load by getting the time with
clock_gettime(CLOCK_MONOTONIC_RAW, timespec*) at the beginning and
end of each callback. When converted to a percentage my cpu load
hovers somewhere between 40 and 50% most of the time, but more or less
every 900 callbacks (0.6 seconds) there is a spike of more than 100%.
I remember reading somewhere that realtime threads cannot run more
than .95s every second. That would be very bad if it actually meant
my threads are blocked run for a period of 50ms straight…
You can disable the realtime runtime throttling if that is a problem,
but I'd doubt that would be the reason for your troubles, and it's
unlikely to cause problems unless you're really maxing out the CPUs.
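
For reference, the throttle is controlled by /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us, which default to 950000 and 1000000; writing -1 to the runtime file disables it. A small sketch of the arithmetic, in Python for brevity (the function names are made up for illustration):

```python
def rt_throttle_budget(runtime_us, period_us):
    """Return (fraction of each period RT tasks may run, throttled us).

    A negative runtime (-1) means throttling is disabled.
    """
    if runtime_us < 0:
        return 1.0, 0
    return runtime_us / period_us, period_us - runtime_us


def read_rt_throttle(proc_sys="/proc/sys/kernel"):
    """Read the current settings and return the resulting budget."""
    with open(f"{proc_sys}/sched_rt_runtime_us") as f:
        runtime_us = int(f.read())
    with open(f"{proc_sys}/sched_rt_period_us") as f:
        period_us = int(f.read())
    return rt_throttle_budget(runtime_us, period_us)
```

With the defaults, rt_throttle_budget(950000, 1000000) gives a 0.95 share and 50000 us (50 ms) per second during which realtime tasks can be held back, but only if they would otherwise have used the whole period.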

Other reasons could range from hardware/kernel modules that aren't
playing nice with RT to wifi drivers; the NVIDIA module also comes to
mind. Your system could also be suffering from SMIs/NMIs that are serviced
by the BIOS and completely preempt the kernel.

There are testing utilities in a package called rt-tests. Try to run
cyclictest like this (sudo) "cyclictest -S -m -p98". cyclictest
schedules threads to be run at a certain time, and can be used to get a
pretty good idea of kernel scheduling latencies. It's the max that is
interesting to us, and hopefully yours will be under 100us, but if you
have peaks that go much higher that could be a reason for audio
dropouts.
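
If you capture the cyclictest output to a file, picking out the per-thread maxima can be scripted. The sketch below is an assumption-laden example: it expects status lines of the form "T: 0 ( 3431) P:98 I:1000 C: 100000 Min: 5 Act: 10 Avg: 14 Max: 87", which is how I remember the output looking, so adjust the pattern if your version prints something different. The function names are hypothetical.

```python
import re

# Thread index after "T:", maximum latency (in us) after "Max:".
_STATUS = re.compile(r"T:\s*(\d+).*?Max:\s*(\d+)")


def max_latencies_us(output):
    """Return {thread_index: worst Max value seen} from cyclictest output."""
    worst = {}
    for line in output.splitlines():
        match = _STATUS.search(line)
        if match:
            thread, latency = int(match.group(1)), int(match.group(2))
            worst[thread] = max(latency, worst.get(thread, 0))
    return worst


def over_budget(output, limit_us=100):
    """Return only the threads whose maximum exceeds limit_us."""
    return {t: v for t, v in max_latencies_us(output).items() if v > limit_us}
```

The 100 us default for limit_us simply mirrors the figure mentioned above; it is not a hard rule.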

Best to run it in the background for a while, and even better would be
to put load on the system. The hackbench utility can be used for that.

Another reason could be cpu power management, though on a modern intel
processor that appears to be a thing of the past.

I suppose hyperthreading could be a potential pitfall, but personally I
see no problems with it with my audio workloads on my i7.
--
Joakim
Len Ovens
2016-01-25 14:57:14 UTC
Permalink
Post by Joakim Hernberg
I suppose hyperthreading could be a potential pitfall, but personally I
see no problems with it with my audio workloads on my i7.
Hyperthreading is only a problem with JACK latency under 64/2... even on an
older single-core P4 (at least in my testing).

--
Len Ovens
www.ovenwerks.net
Joakim Hernberg
2016-01-25 21:35:47 UTC
Permalink
On Mon, 25 Jan 2016 06:57:14 -0800 (PST)
Post by Len Ovens
Post by Joakim Hernberg
I suppose hyperthreading could be a potential pitfall, but
personally I see no problems with it with my audio workloads on my
i7.
hyperthread is only a problem with jack latency under 64/2... even on
an older single core P4. (at least in my testing)
I can't remember back to the P4 days ;) But I thought I'd mention it
for completeness as it totally breaks the concept of SCHED_FIFO
threads. Good to know that it appears to be a non problem.
--
Joakim
Tim Goetze
2016-01-25 14:30:52 UTC
Permalink
[Fokke de Jong]
Post by Fokke de Jong
I’m processing 32-sample blocks at 48 kHz but roughly every 0.6
seconds I get a large spike in cpu usage. This cannot possibly be
explained by my algorithm, because the load should be pretty stable.
I am measuring cpu load by getting the time with
clock_gettime(CLOCK_MONOTONIC_RAW, timespec*) at the beginning and
end of each callback. When converted to a percentage my cpu load
hovers somewhere between 40 and 50% most of the time, but more or less
every 900 callbacks (0.6 seconds) there is a spike of more than 100%.
Have you checked if your per-callback CPU time measurement and that
reported by getrusage(2) are in agreement?
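
To illustrate the comparison: if the wall-clock time of a callback is much larger than the CPU time the thread was charged for, the thread was not running for part of that interval (preempted, stalled or blocked). A minimal sketch, in Python for brevity rather than the C of the real callback; the names measure and load_percent are made up, and the 32-frame / 48 kHz figures are taken from the original post. Where RUSAGE_THREAD or CLOCK_MONOTONIC_RAW is unavailable it falls back to RUSAGE_SELF and CLOCK_MONOTONIC.

```python
import resource
import time

SAMPLE_RATE = 48000                    # Hz, from the original post
BLOCK_SIZE = 32                        # frames per callback, ditto
PERIOD_S = BLOCK_SIZE / SAMPLE_RATE    # about 667 us per callback

_WALL_CLOCK = getattr(time, "CLOCK_MONOTONIC_RAW", time.CLOCK_MONOTONIC)
_WHO = getattr(resource, "RUSAGE_THREAD", resource.RUSAGE_SELF)


def _cpu_seconds():
    """User plus system CPU time charged so far, via getrusage(2)."""
    usage = resource.getrusage(_WHO)
    return usage.ru_utime + usage.ru_stime


def measure(callback):
    """Run callback once; return (wall seconds, CPU seconds) it took."""
    wall0, cpu0 = time.clock_gettime(_WALL_CLOCK), _cpu_seconds()
    callback()
    wall1, cpu1 = time.clock_gettime(_WALL_CLOCK), _cpu_seconds()
    return wall1 - wall0, cpu1 - cpu0


def load_percent(seconds, period_s=PERIOD_S):
    """Express a duration as a percentage of one callback period."""
    return 100.0 * seconds / period_s
```

The getrusage figures may be too coarse to trust for a single 667 us callback, so it is probably more informative to accumulate both numbers over a few hundred callbacks and compare the sums.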

Cheers, Tim