Discussion:
core.thread.Fiber --- runtime stack overflow unlike goroutines
Carl Sturtivant via Digitalmars-d-learn
2014-08-14 07:46:28 UTC
Permalink
The default size of the runtime stack for a Fiber is 4*PAGESIZE
which is very small, and a quick test shows that a Fiber suffers
a stack overflow that doesn't lead to a clean termination when
this limit is exceeded.

This makes it difficult to simulate deterministic alternation
where the stack size needed is unpredictable because complex
deterministic computations are going on inside Fibers.

In contrast, the Go programming language's goroutines can extend
their stacks as needed at runtime, and so can be used to simulate
deterministic alternation without this limitation, and yet be
initially executed with each having only a small stack size.

There seems to be a claim that all that's needed to add
D-routines (goroutines for D) is a scheduler and a Channel type,
on top of Fiber.
http://forum.dlang.org/thread/lphnen$1ml7$***@digitalmars.com
See the initial post, point 7., as well as supporting remarks in
later replies.

Am I missing something? Is there a clean and simple way to get
Fiber to no longer suffer a stack overflow when implementing
D-routines?
Yuriy via Digitalmars-d-learn
2014-08-14 09:38:21 UTC
Permalink
Infinite stack comes with a cost. Every function call first
checks the state of the stack. This should be done with the help
of compiler. And such a "tradeoff" could scare bare-metal
programmers away from D. Though maybe there's a chance of making
this stack check controllable, but not that i can think of.

On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant
Post by Carl Sturtivant via Digitalmars-d-learn
The default size of the runtime stack for a Fiber is 4*PAGESIZE
which is very small, and a quick test shows that a Fiber
suffers a stack overflow that doesn't lead to a clean
termination when this limit is exceeded.
This makes it difficult to simulate deterministic alternation
where the stack size needed is unpredictable because complex
deterministic computations are going on inside Fibers.
In contrast, the Go programming language's goroutines can
extend their stacks as needed at runtime, and so can be used to
simulate deterministic alternation without this limitation, and
yet be initially executed with each having only a small stack
size.
There seems to be a claim that all that's needed to add
D-routines (goroutines for D) is a scheduler and a Channel
type, on top of Fiber.
See the initial post, point 7., as well as supporting remarks
in later replies.
Am I missing something? Is there a clean and simple way to get
Fiber to no longer suffer a stack overflow when implementing
D-routines?
Sean Kelly via Digitalmars-d-learn
2014-08-14 18:51:58 UTC
Permalink
On 64 bit, reserve a huge chunk of memory, set a SEGV handler and
commit more as needed. Basically how kernel thread stacks work.
I've been meaning to do this but haven't gotten around to it yet.
Dicebot via Digitalmars-d-learn
2014-08-14 23:31:31 UTC
Permalink
Post by Sean Kelly via Digitalmars-d-learn
On 64 bit, reserve a huge chunk of memory, set a SEGV handler
and commit more as needed. Basically how kernel thread stacks
work. I've been meaning to do this but haven't gotten around to
it yet.
I think using some sort of thread-local shared heap pool is
better approach in general as it does need any SEGV handling
overhead and simplifies fiber implementation (all fibers are
guaranteed to take fixed amount of memory). It is not a silver
bullet and forces you to think much more about application memory
layout but I believe this is much better approach for high
performance services than segmented stack like in Go.
Kagamin via Digitalmars-d-learn
2014-08-15 08:34:49 UTC
Permalink
Post by Sean Kelly via Digitalmars-d-learn
On 64 bit, reserve a huge chunk of memory, set a SEGV handler
and commit more as needed. Basically how kernel thread stacks
work. I've been meaning to do this but haven't gotten around to
it yet.
AFAIK, OS already provides this transparently: when you allocate
memory, OS only reserves the memory range and commits pages when
they are accessed.
Kagamin via Digitalmars-d-learn
2014-08-15 08:36:33 UTC
Permalink
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887%28v=vs.85%29.aspx
Allocates memory charges (from the overall size of memory and
the paging files on disk) for the specified reserved memory
pages. The function also guarantees that when the caller later
initially accesses the memory, the contents will be zero.
Actual physical pages are not allocated unless/until the
virtual addresses are actually accessed.
Sean Kelly via Digitalmars-d-learn
2014-08-15 14:26:26 UTC
Permalink
Post by Kagamin via Digitalmars-d-learn
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887%28v=vs.85%29.aspx
Allocates memory charges (from the overall size of memory and
the paging files on disk) for the specified reserved memory
pages. The function also guarantees that when the caller later
initially accesses the memory, the contents will be zero.
Actual physical pages are not allocated unless/until the
virtual addresses are actually accessed.
Oh handy, so there's basically no work to be done on Windows.
I'll have to check the behavior of mmap on Posix.
Kagamin via Digitalmars-d-learn
2014-08-15 14:28:33 UTC
Permalink
Post by Sean Kelly via Digitalmars-d-learn
Oh handy, so there's basically no work to be done on Windows.
I'll have to check the behavior of mmap on Posix.
I heard, calloc behaves this way on linux (COW blank page mapped
to the entire range), it was discussed here some time ago.
Kagamin via Digitalmars-d-learn
2014-08-15 14:32:38 UTC
Permalink
Post by Kagamin via Digitalmars-d-learn
Post by Sean Kelly via Digitalmars-d-learn
Oh handy, so there's basically no work to be done on Windows.
I'll have to check the behavior of mmap on Posix.
I heard, calloc behaves this way on linux (COW blank page
mapped to the entire range), it was discussed here some time
ago.
Another abstraction for core.vmm.
Sean Kelly via Digitalmars-d-learn
2014-08-15 14:38:21 UTC
Permalink
Post by Sean Kelly via Digitalmars-d-learn
Post by Kagamin via Digitalmars-d-learn
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887%28v=vs.85%29.aspx
Allocates memory charges (from the overall size of memory and
the paging files on disk) for the specified reserved memory
pages. The function also guarantees that when the caller
later initially accesses the memory, the contents will be
zero. Actual physical pages are not allocated unless/until
the virtual addresses are actually accessed.
Oh handy, so there's basically no work to be done on Windows.
I'll have to check the behavior of mmap on Posix.
It sounds like mmap (typically) works the same way on Linux, and
that malloc generally does as well. I'll have to test this to be
sure. If so, and if doing so is fast, I'm going to increase the
default stack size on 64-bit systems to something reasonably
large. And here I thought I would have to do this all manually.
Carl Sturtivant via Digitalmars-d-learn
2014-08-15 20:09:26 UTC
Permalink
Post by Sean Kelly via Digitalmars-d-learn
On 64 bit, reserve a huge chunk of memory, set a SEGV handler
and commit more as needed. Basically how kernel thread stacks
work. I've been meaning to do this but haven't gotten around to
it yet.
Very nice; the hardware VM machinery takes care of it and there's
only overhead when overflow occurs. D's Fibers will really be
something remarkable for user-space with this facility.
Brad Anderson via Digitalmars-d-learn
2014-08-14 19:19:40 UTC
Permalink
On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant
Post by Carl Sturtivant via Digitalmars-d-learn
The default size of the runtime stack for a Fiber is 4*PAGESIZE
which is very small, and a quick test shows that a Fiber
suffers a stack overflow that doesn't lead to a clean
termination when this limit is exceeded.
This makes it difficult to simulate deterministic alternation
where the stack size needed is unpredictable because complex
deterministic computations are going on inside Fibers.
In contrast, the Go programming language's goroutines can
extend their stacks as needed at runtime, and so can be used to
simulate deterministic alternation without this limitation, and
yet be initially executed with each having only a small stack
size.
There seems to be a claim that all that's needed to add
D-routines (goroutines for D) is a scheduler and a Channel
type, on top of Fiber.
See the initial post, point 7., as well as supporting remarks
in later replies.
Am I missing something? Is there a clean and simple way to get
Fiber to no longer suffer a stack overflow when implementing
D-routines?
Segmented stacks come with a cost. Rust abandoned them for
reasons you can read about here:

https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html

I believe Go has taken steps to help mitigate stack thrashing but
I don't know if they have been successful yet.
Kagamin via Digitalmars-d-learn
2014-08-15 08:41:29 UTC
Permalink
On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant
Post by Carl Sturtivant via Digitalmars-d-learn
The default size of the runtime stack for a Fiber is 4*PAGESIZE
which is very small, and a quick test shows that a Fiber
suffers a stack overflow that doesn't lead to a clean
termination when this limit is exceeded.
Pass a bigger stack size to the Fiber constructor?
Dicebot via Digitalmars-d-learn
2014-08-15 14:28:32 UTC
Permalink
Post by Yuriy via Digitalmars-d-learn
On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant
Post by Carl Sturtivant via Digitalmars-d-learn
The default size of the runtime stack for a Fiber is
4*PAGESIZE which is very small, and a quick test shows that a
Fiber suffers a stack overflow that doesn't lead to a clean
termination when this limit is exceeded.
Pass a bigger stack size to the Fiber constructor?
Won't that kind of kill the purpose of Fiber as low-cost context
abstraction? Stack size does add up for thousands of fibers.
Kagamin via Digitalmars-d-learn
2014-08-15 14:30:21 UTC
Permalink
Post by Dicebot via Digitalmars-d-learn
Won't that kind of kill the purpose of Fiber as low-cost
context abstraction? Stack size does add up for thousands of
fibers.
I didn't measure it.
Sean Kelly via Digitalmars-d-learn
2014-08-15 14:45:01 UTC
Permalink
Post by Dicebot via Digitalmars-d-learn
Won't that kind of kill the purpose of Fiber as low-cost
context abstraction? Stack size does add up for thousands of
fibers.
As long as allocation speed is fast for large allocs (which I
have to test), I want to change the default size to be very
large. The virtual address space in a 64-bit app is enormous,
and if the memory is committed on demand then physical memory use
should only match what the user actually requires. This should
allow us to create millions of fibers and not overrun system
memory, and also not worry about stack overruns, which has always
been a concern with the default fiber stack size.
Sean Kelly via Digitalmars-d-learn
2014-08-15 15:09:18 UTC
Permalink
At least on OSX, it appears that mapping memory is constant time
regardless of size, but there is some max total memory I'm
allowed to map, presumably based on the size of a vmm lookup
tabe. The max block size I can allocate is 1 GB, and I can
allocate roughly 131,000 of these blocks before getting an out of
memory error. If I reduce the block size to 4 MB I can allocate
more than 10M blocks without error. I think some default stack
size around 4 MB seems about right. Increasing the size to 16 MB
failed after about 800,000 allocations, which isn't enough
(potential) fibers.
Martin Nowak via Digitalmars-d-learn
2014-09-22 00:09:09 UTC
Permalink
Post by Sean Kelly via Digitalmars-d-learn
At least on OSX, it appears that mapping memory is constant time
regardless of size, but there is some max total memory I'm allowed to
map, presumably based on the size of a vmm lookup tabe. The max block
size I can allocate is 1 GB, and I can allocate roughly 131,000 of these
blocks before getting an out of memory error. If I reduce the block
size to 4 MB I can allocate more than 10M blocks without error. I think
some default stack size around 4 MB seems about right. Increasing the
size to 16 MB failed after about 800,000 allocations, which isn't enough
(potential) fibers.
While pages are committed lazily it's not free to map more memory.
For example the kernel will have to allocate and setup more page table
entries.
Furthermore we might need to use MAP_NORESERVE to not run into
overcommit limitations, but that's not available on all OSes I think.

It might make sense to raise the stack size again [1] if we had an idea
what size would avoid common issues (vibe.d uses 64kB). 4MB is too much
for the default size IMO.

Given that the stack size is configurable let's instead try to improve
the diagnostic of Fiber stack overflows first, we should also map a
guard page on posix and add a segfault handler to detect overflows.

[1]:
https://github.com/D-Programming-Language/druntime/commit/9dca50fe65402fb3cdfbb689f1aca58dc835dce4#diff-8bb12ed976acf0a5132e877ec5a01ea8R3163
Dicebot via Digitalmars-d-learn
2014-08-15 15:25:22 UTC
Permalink
Post by Sean Kelly via Digitalmars-d-learn
Post by Dicebot via Digitalmars-d-learn
Won't that kind of kill the purpose of Fiber as low-cost
context abstraction? Stack size does add up for thousands of
fibers.
As long as allocation speed is fast for large allocs (which I
have to test), I want to change the default size to be very
large. The virtual address space in a 64-bit app is enormous,
and if the memory is committed on demand then physical memory
use should only match what the user actually requires. This
should allow us to create millions of fibers and not overrun
system memory, and also not worry about stack overruns, which
has always been a concern with the default fiber stack size.
No, I was referring to the proposal to supply bigger stack size
to Fiber constructor - AFAIR it currently does allocate that
memory eagerly (and does not use any OS CoW tools), doesn't it?
Sean Kelly via Digitalmars-d-learn
2014-08-15 15:40:33 UTC
Permalink
Post by Dicebot via Digitalmars-d-learn
No, I was referring to the proposal to supply bigger stack size
to Fiber constructor - AFAIR it currently does allocate that
memory eagerly (and does not use any OS CoW tools), doesn't it?
I thought it did, but apparently the behavior of VirtualAlloc and
mmap (which Fiber uses to allocate the stack) simply reserves the
range and then commits it lazily, even though what you've told it
to do is allocate the memory. This is really great news since it
means that no code changes will be required to do the thing I
wanted to do anyway.
Dicebot via Digitalmars-d-learn
2014-08-15 15:50:29 UTC
Permalink
Post by Sean Kelly via Digitalmars-d-learn
Post by Dicebot via Digitalmars-d-learn
No, I was referring to the proposal to supply bigger stack
size to Fiber constructor - AFAIR it currently does allocate
that memory eagerly (and does not use any OS CoW tools),
doesn't it?
I thought it did, but apparently the behavior of VirtualAlloc
and mmap (which Fiber uses to allocate the stack) simply
reserves the range and then commits it lazily, even though what
you've told it to do is allocate the memory. This is really
great news since it means that no code changes will be required
to do the thing I wanted to do anyway.
Guess that means some experimenting this weekend for me too :)
Dicebot via Digitalmars-d-learn
2014-08-22 16:08:15 UTC
Permalink
Post by Dicebot via Digitalmars-d-learn
Post by Sean Kelly via Digitalmars-d-learn
Post by Dicebot via Digitalmars-d-learn
No, I was referring to the proposal to supply bigger stack
size to Fiber constructor - AFAIR it currently does allocate
that memory eagerly (and does not use any OS CoW tools),
doesn't it?
I thought it did, but apparently the behavior of VirtualAlloc
and mmap (which Fiber uses to allocate the stack) simply
reserves the range and then commits it lazily, even though
what you've told it to do is allocate the memory. This is
really great news since it means that no code changes will be
required to do the thing I wanted to do anyway.
Guess that means some experimenting this weekend for me too :)
Looks like it indeed just magically works. I have used this
simple program:

import core.thread;

void foo()
{
// int[10240] array;
Fiber.yield();
for (;;) {}
}

void main()
{
foreach (_ ; 0 .. 10_000)
{
//auto fiber = new Fiber(&foo);
auto fiber = new Fiber(&foo, 100_000);
fiber.call();
}
}

And checked peak memory profile via `time --format="%M"`. Fiber
constructor argument size did not change memory consumption at
all, only changing size of fiber stack variables did.
Carl Sturtivant via Digitalmars-d-learn
2014-08-15 20:17:50 UTC
Permalink
Post by Sean Kelly via Digitalmars-d-learn
Post by Dicebot via Digitalmars-d-learn
No, I was referring to the proposal to supply bigger stack
size to Fiber constructor - AFAIR it currently does allocate
that memory eagerly (and does not use any OS CoW tools),
doesn't it?
I thought it did, but apparently the behavior of VirtualAlloc
and mmap (which Fiber uses to allocate the stack) simply
reserves the range and then commits it lazily, even though what
you've told it to do is allocate the memory. This is really
great news since it means that no code changes will be required
to do the thing I wanted to do anyway.
Just read this after posting earlier replies! Very exciting.

I'll be doing some experiments to see how this works out.

What about at 32-bits?
Sean Kelly via Digitalmars-d-learn
2014-08-15 21:34:17 UTC
Permalink
Post by Carl Sturtivant via Digitalmars-d-learn
Post by Sean Kelly via Digitalmars-d-learn
I thought it did, but apparently the behavior of VirtualAlloc
and mmap (which Fiber uses to allocate the stack) simply
reserves the range and then commits it lazily, even though
what you've told it to do is allocate the memory. This is
really great news since it means that no code changes will be
required to do the thing I wanted to do anyway.
Just read this after posting earlier replies! Very exciting.
I'll be doing some experiments to see how this works out.
What about at 32-bits?
I'm sure it works the same, but reserving large chunks of memory
there would eat up the address space. I think the default will
have to remain some reasonably low number on 32-bit.
Carl Sturtivant via Digitalmars-d-learn
2014-08-15 20:11:42 UTC
Permalink
Post by Yuriy via Digitalmars-d-learn
On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant
Post by Carl Sturtivant via Digitalmars-d-learn
The default size of the runtime stack for a Fiber is
4*PAGESIZE which is very small, and a quick test shows that a
Fiber suffers a stack overflow that doesn't lead to a clean
termination when this limit is exceeded.
Pass a bigger stack size to the Fiber constructor?
No good if the stack size needed depends dynamically on the
computation in that Fiber.
Carl Sturtivant via Digitalmars-d-learn
2014-08-15 20:19:18 UTC
Permalink
Post by Carl Sturtivant via Digitalmars-d-learn
Post by Yuriy via Digitalmars-d-learn
On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant
Post by Carl Sturtivant via Digitalmars-d-learn
The default size of the runtime stack for a Fiber is
4*PAGESIZE which is very small, and a quick test shows that a
Fiber suffers a stack overflow that doesn't lead to a clean
termination when this limit is exceeded.
Pass a bigger stack size to the Fiber constructor?
No good if the stack size needed depends dynamically on the
computation in that Fiber.
Should have read further down the thread --- you're right as the
memory is in effect merely reserved virtual memory and isn't
actually allocated.
ketmar via Digitalmars-d-learn
2014-08-15 22:26:45 UTC
Permalink
On Fri, 15 Aug 2014 20:19:18 +0000
Carl Sturtivant via Digitalmars-d-learn
Post by Carl Sturtivant via Digitalmars-d-learn
Should have read further down the thread --- you're right as the
memory is in effect merely reserved virtual memory and isn't
actually allocated.
and we -- 32-bit addicts -- will run out of address space while 64-bit
happy people will laugh looking at us. ;-)
Kagamin via Digitalmars-d-learn
2014-08-16 17:49:28 UTC
Permalink
On Friday, 15 August 2014 at 22:26:54 UTC, ketmar via
Post by ketmar via Digitalmars-d-learn
and we -- 32-bit addicts -- will run out of address space while
64-bit
happy people will laugh looking at us. ;-)
You should only choose stack size carefully or keep data in
TempAlloc instead of stack.
ketmar via Digitalmars-d-learn
2014-08-16 23:25:38 UTC
Permalink
On Sat, 16 Aug 2014 17:49:28 +0000
Post by Kagamin via Digitalmars-d-learn
You should only choose stack size carefully or keep data in
TempAlloc instead of stack.
but i can choose stack size carefully without such 'address reserving'
feature. ;-)
Martin Nowak via Digitalmars-d-learn
2014-09-21 23:29:53 UTC
Permalink
Am I missing something? Is there a clean and simple way to get Fiber to
no longer suffer a stack overflow when implementing D-routines?
Simply choose a big enough stack size when creating your fibers.
http://dlang.org/library/core/thread/Fiber.this.html

It's fairly cheap to use a really big stack like 4MB, because memory
pages are committed lazily by the OS.
Loading...