Discussion:
[golang-dev] Transaction Oriented Collector (TOC) - Go's next GC
r***@golang.org
2016-06-24 12:16:03 UTC
Permalink
Over the course of the next several months we hope to implement, test,
optimize, and if the numbers warrant, deploy a brand new garbage collection
(GC) algorithm for Go. The Transaction Oriented Collector (TOC) algorithm
will be integrated with and augment Go's current low-latency collector.
golang.org/s/gctoc contains an overview, some informal proofs showing
correctness, and our implementation approach. Comments and suggestions are
always welcome.

golang.org/s/gctoc
t***@gmail.com
2016-06-24 13:36:30 UTC
Permalink
We expect to hear great news, cheers.
nodir via golang-dev
2016-06-24 16:50:42 UTC
Permalink
Do I understand correctly that "(Before|After): 0 <text>" actually
means "(Before|After) 0 *indicates* <text>"? I spent some time scratching
my head trying to understand why there was "0 allocated since the last GC".
'Nodir Turakulov' via golang-dev
2016-06-24 18:25:07 UTC
Permalink
I am a bit confused. Assuming heap spans are shared by different
goroutines, does the "Current Sweep Pointer" belong to a span or to a goroutine?

If a span owns the current sweep pointer and an allocation in any goroutine can
move it, the sentence "Objects between the initial sweep pointer and the
current sweep pointer were either marked reachable or have been allocated by
*this* goroutine." does not seem to be correct, because an object may
belong to another goroutine if that other goroutine moved the current sweep
pointer.
Also it is unclear to me how the algorithm can "reset the Current Sweep
Pointer to the goroutine Initial Sweep Pointer" in Figure 7; wouldn't that
break other goroutines?

If a goroutine owns the current sweep pointer, then the sentence "Each span
maintains a current sweep pointer" is confusing, because it seems to imply
the opposite.
Also I assume the algorithm will make sure that the segments [initial sweep
pointer, current sweep pointer) of different goroutines never
intersect; otherwise an unmarked bit past the current sweep pointer does not
necessarily mean that the mutator can use the associated memory for a new
object, because it may be used by an unpublished object of another goroutine.
If segments of different goroutines never intersect, does that mean that there
is only one segment per span per goroutine, or will a goroutine
manage multiple segments per span?
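
(To pin down the two readings being asked about, here is a rough sketch with
made-up field names; neither is claimed to be what the design doc actually
specifies.)

type span struct {
	// object slots, mark/allocation bits, size class, ...
}

// Reading A: the span owns the sweep pointers, so an allocation by any
// goroutine using this span advances currentSweep.
type spanReadingA struct {
	span
	initialSweep uintptr // snapshot taken when the GC cycle started
	currentSweep uintptr // advanced by whichever goroutine allocates next
}

// Reading B: the goroutine owns the sweep pointers into whatever span it is
// allocating from, so [initialSweep, currentSweep) is a per-goroutine segment.
type goroutineReadingB struct {
	allocSpan    *span   // the shared span being allocated from
	initialSweep uintptr // where this goroutine started allocating
	currentSweep uintptr // where this goroutine will allocate next
}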

thank you
rlh via golang-dev
2016-06-27 14:06:11 UTC
Permalink
That is correct: "Before: 0" _indicates_ <text>.
Post by nodir via golang-dev
do I understand correctly that "(Before|After): 0 <text>" actually
means "(Before|After) 0 *indicates* <text>"? I spent some time scratching
my head try to understand why there was "0 allocated since the last GC"
Caleb Spare
2016-06-24 18:37:23 UTC
Permalink
This is an interesting document. I have a few questions for you:

1. How does the GC understand that a goroutine is "dormant" between
requests? (As opposed to, say, waiting on i/o to finish processing a
request.)
1a. Do you think this algorithm will implicitly encourage people
towards one-goroutine-per-request architectures?

2. How do you think TOC will degenerate on workloads that do not match
the transactional hypothesis?
2a. What transactional and non-transactional workloads will you use to
evaluate TOC?

Thanks!
Caleb
r***@golang.org
2016-06-27 14:34:50 UTC
Permalink
Good questions but I will try to avoid speculation. Ask us again in several
months when we have better answers.
Post by Caleb Spare
1. How does the GC understand that a goroutine is "dormant" between
requests? (As opposed to, say, waiting on i/o to finish processing a
request.)
Blocked with a very small stack seems to be a reasonable heuristic and is
where we intend to start. Build and measure.

Post by Caleb Spare
1a. Do you think this algorithm will implicitly encourage people
towards one-goroutine-per-request architectures?
Complicating a clean architecture to facilitate memory management is not
one of our goals. The best way forward is to create clean maintainable
architectures and trust us to make them fast.
Post by Caleb Spare
2. How do you think TOC will degenerate on workloads that do not match
the transactional hypothesis?
It will degenerate to what we have now plus the cost of the write barrier.
Keeping the cost of the write barrier low is the challenge.
Post by Caleb Spare
2a. What transactional and non-transactional workloads will you use to
evaluate TOC?
Our evaluations will be based on real world production systems as well as
the current set of benchmarks and tests. Who we are able to recruit to help
will play a large part in the workloads we will use to evaluate TOC.
a***@gmail.com
2017-05-26 23:28:19 UTC
Permalink
Post by r***@golang.org
Good questions but I will try to avoid speculation. Ask us again in
several months when we have better answers.
I was reminded of this thread today. Do you have any updates to share on
the status of this work?
r***@golang.org
2017-05-28 15:14:53 UTC
Permalink
ROC* is running on a separate branch and there are CLs out for review. The
algorithm is being studied to see how it runs on a cross section of Go
programs, and the results are being used to understand and improve performance.
We have yet to perform a rigorous, publishable, and most importantly
reproducible evaluation that compares ROC to the current approach. Stay
tuned.

*ROC (Request Oriented Collector) is the new name for TOC that avoids the
acidic confusion of the word transaction.
Henrik Johansson
2017-10-10 12:30:07 UTC
Permalink
Is there any news on this?
I saw this talk from SRECon17 http://youtu.be/q1h2g84EX1M
and he mentions the ROC at some point close to the end.
It is from June, I guess, so I may very well be a bit late to the GC party.
Rick Hudson
2017-10-10 12:44:31 UTC
Permalink
We are working on a write-up, collecting numbers, and getting on paper what
we learned. It is a bit overdue but new ideas keep popping up.
Henrik Johansson
2017-10-10 12:47:50 UTC
Permalink
Great, thanks!
'Matthew Dempsky' via golang-dev
2016-06-24 22:09:13 UTC
Permalink
Currently, spans allocate objects of a single allocation size class, and
the per-P mcache structure caches a span per size class. From the doc's
description, it's not obvious to me if these will still be true under TOC.

E.g., does "running goroutine" refer to a G or P when talking about
maintaining an initial sweep pointer? If G, does that mean we'll need more
spans so that we can cache them per G? Or is it generally already the case
that we can cache spans per-G?

Also, is it over-reading to understand that "an" initial sweep pointer
means a single span is used for all recently allocated objects, thereby
suggesting size classes are going away? Or would it actually be an initial
sweep pointer per size class?
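
(For background on the question: a rough, simplified sketch of the existing
per-P caching Matthew refers to, with abbreviated, illustrative field names.
This describes today's allocator, not the TOC design.)

const numSizeClasses = 67 // illustrative; the runtime defines the real value

// mspan is, very roughly, a run of pages carved into equal-sized object slots.
type mspan struct {
	freeIndex uintptr // next free object slot to hand out
	nelems    uintptr // total object slots in the span
	// ... allocation/mark bits, the size class, links to other spans, etc.
}

// mcache is the per-P cache: one span per size class, so a small allocation
// normally bumps within a cached span without taking any locks.
type mcache struct {
	alloc [numSizeClasses]*mspan
}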
Yura Sokolov
2016-06-26 00:32:03 UTC
Permalink
Some (useless) thoughts:

I think it could be good to maintain "generations of goroutines per
thread":
- A thread (P) maintains a list of young generations of goroutines.
- A newly born goroutine falls into the youngest generation.
- If the size of the youngest generation overflows some limit (for example, 10
goroutines), then a new generation is added as the youngest.
- All goroutines from the same "young generation" (of the same P) use the same
TOC state - the same heap span, the same Initial Sweep and Current Sweep Pointer.
- If all goroutines from the same "generation" die before a GC, then the
"Current Sweep Pointer" for this TOC is reset, and the TOC is reused.
- When a GC comes, all surviving young generations (i.e., those with at least
one live goroutine) become tenured generations.
- Tenured generations behave the same as young generations until the next GC.
- When the next GC comes, all surviving tenured goroutines become ordinary
goroutines and no longer use the TOC.

Pitfalls of this approach:
- A young or tenured goroutine could not be individually stolen by another
thread (P), only a whole "young/tenured" generation at once.
- Long-lived goroutines use a different allocation mechanism than
young/tenured ones (not the TOC).

Probably, there could also be a way to explicitly mark some "long lived"
goroutines as using an individual TOC:
- For example, I mark some long-running loop as "long lived with a 10 MB TOC".
- The goroutine allocates using the TOC until it has allocated 10 MB.
- Then the goroutine runs an "individual mark phase": it marks all of its
reachable and not-yet-published objects.
- After that it may safely reset the "current sweep pointer" to the "initial
sweep pointer".
- Perhaps a copying collection could be used here instead:
since reachable and unpublished objects are invisible to other goroutines,
it is safe to move them around.
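
(A toy sketch of the bookkeeping this proposal implies, with invented names;
it only restates the rules above in code and is not part of the TOC design.)

type goroutineID int

// tocState is the allocation state a whole generation would share:
// one heap span plus the Initial and Current Sweep Pointers.
type tocState struct {
	initialSweep, currentSweep uintptr
}

type generation struct {
	live    map[goroutineID]bool // goroutines born into this generation and still running
	state   *tocState
	tenured bool
}

// perP holds the ordered generation list for one P; the last entry is the youngest.
type perP struct {
	generations []*generation
}

// goroutineExited applies the "all die before GC" rule: the shared Current
// Sweep Pointer is reset and the TOC region becomes reusable.
func (p *perP) goroutineExited(gen *generation, id goroutineID) {
	delete(gen.live, id)
	if len(gen.live) == 0 {
		gen.state.currentSweep = gen.state.initialSweep
	}
}

// onGC applies the GC-time transitions: surviving young generations become
// tenured; goroutines of already-tenured generations fall back to ordinary
// (non-TOC) allocation.
func (p *perP) onGC() {
	kept := p.generations[:0]
	for _, gen := range p.generations {
		if len(gen.live) == 0 || gen.tenured {
			continue // reusable region, or members go back to ordinary allocation
		}
		gen.tenured = true
		kept = append(kept, gen)
	}
	p.generations = kept
}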

With regards,
Sokolov Yura
Rick Hudson
2016-06-27 12:18:15 UTC
Permalink
At a higher, non-implementation level, what is being discussed here is how
one creates or recognizes a federation of goroutines where "local" means
local to the federation, so that when all the goroutines in a federation exit,
TOC can free all objects not published outside the federation.

Using birthday or age and a universal clock to identify such federations
seems misplaced in Go. Each goroutine can be thought of as having its own
clock, and unlike languages where only a handful of threads (and clocks)
are concurrently active, in Go there may be tens or hundreds of thousands of
goroutines active.
hay
2016-08-06 14:20:39 UTC
Permalink
Hi Sir,

How about adding a keyword to launch a goroutine under gctoc, like *tgo*
TestGoroutine()? And would it be possible to set an option for a gctoc
goroutine so the GC doesn't come in until the goroutine is finished? It would
help in building soft real-time systems.

Kind regards,
Ian Lance Taylor
2016-08-06 16:02:51 UTC
Permalink
How about adding a keyword to launch goroutine in gctoc like: tgo
TestGoroutine()? And it would be possible to set option for gctoc goroutine
so GC doesn't come in until the goroutine is finished? It would help
building soft real-time systems.
I think the current approach that the runtime is taking is already
making Go suitable for soft real-time systems. The GC is largely
concurrent, and stop-the-world times are steadily shrinking.

In the current runtime it does not make sense to speak of stopping the
GC. The GC is always running. It does make sense to speak of
blocking a stop-the-world step. But it would not be a good idea for a
Go program to be able to block a stop-the-world step for an arbitrary
length of time. That makes it very easy for a program to break
itself.

I do not think that is a direction we should move in. And if we did,
I do not think it should be done by adding a new language keyword.
The same effect could be had by calling a runtime function
"runtime.EnableGC(enable bool)". But I don't think that runtime
function would be a good idea either.

Ian
'Dmitry Vyukov' via golang-dev
2016-06-26 16:49:53 UTC
Permalink
What percent of objects that can be collected by TOC can also potentially
be stack allocated?
TOC and stack allocation solve the same problem, while stack allocation is
cheaper and provides better locality and a shorter recycling time.
'Keith Randall' via golang-dev
2016-06-26 17:22:13 UTC
Permalink
TOC can handle all kinds of non-compile-time-constant-size and
loop-allocated objects, which stack allocation cannot.
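
(Two hand-written illustrations of those cases, assuming current escape
analysis behavior; `go build -gcflags=-m` shows what the compiler actually
decides for a given toolchain.)

// Dynamically sized: the compiler cannot reserve a fixed-size stack slot for
// a buffer whose length is only known at run time, so this allocation
// typically goes to the heap even though it never leaves the goroutine.
func xorFold(data []byte, n int) byte {
	buf := make([]byte, n) // size is not a compile-time constant
	copy(buf, data)
	var sum byte
	for _, b := range buf {
		sum ^= b
	}
	return sum
}

// Loop-allocated: each iteration's node stays live after its iteration ends,
// so a single stack slot cannot be reused for it and the nodes are heap
// allocated, yet they remain goroutine-local and unpublished - exactly the
// kind of garbage TOC aims to reclaim when the goroutine completes.
type node struct {
	val  int
	next *node
}

func sumViaList(vals []int) int {
	var head *node
	for _, v := range vals {
		head = &node{val: v, next: head} // one allocation per iteration
	}
	total := 0
	for n := head; n != nil; n = n.next {
		total += n.val
	}
	return total
}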


'Dmitry Vyukov' via golang-dev
2016-06-27 09:29:44 UTC
Permalink
I mean that aggressive stack allocation is an essential part of the
performance story long term, regardless of what we do in malloc/GC. It
avoids malloc overhead, cuts GC overhead, and provides good locality and
quick reuse, while TOC incurs malloc overhead and the cost of write barriers
enabled all the time. If we ignore opportunities for more aggressive
stack allocation while evaluating TOC, then we can falsely conclude
that it's profitable, whereas if we allocate more objects on the stack it
may turn out that TOC's savings do not outweigh the cost of the write
barriers and increased request latency.
That's why I am asking about the ratios. When I looked at HTTP
allocations, some of them were clear candidates for stack allocation,
and some were leaking to other goroutines. The question is: what's the
size of the middle ground that does not leak but is inherently
unallocatable on the stack?
'Dmitry Vyukov' via golang-dev
2016-06-27 09:54:54 UTC
Permalink
Besides the more obvious stack allocation opportunities that we currently
miss, dynamically-sized and loop-allocated objects could be handled by
allocating them on a separate shadow stack with bump-the-pointer
allocation and a rewind of the shadow stack pointer on return. That
should handle the "request deserialization" case.
On top of that, since the shadow stack is not copied on normal stack
growth, it can also handle some cases of shared objects where we
can statically prove strict scoping. For example, if a function spawns
a set of subgoroutines and joins them before exit, or if a function
sends a request to another goroutine over a channel and waits for a
response, the shared objects can be allocated on the shadow stack.
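
(A toy sketch of the bump-the-pointer shadow stack being described; purely
illustrative, not a proposed runtime API.)

// shadowStack is a per-goroutine scratch region: allocation bumps a pointer,
// and a frame rewinds to the mark it took on entry, freeing everything it
// allocated in one step.
type shadowStack struct {
	buf []byte
	top int
}

// mark records the current top so a frame can rewind to it on return.
func (s *shadowStack) mark() int { return s.top }

// alloc carves n bytes off the shadow stack; a real implementation would
// grow the region or fall back to the heap instead of returning nil.
func (s *shadowStack) alloc(n int) []byte {
	if s.top+n > len(s.buf) {
		return nil
	}
	p := s.buf[s.top : s.top+n : s.top+n]
	s.top += n
	return p
}

// rewind releases everything allocated since the corresponding mark.
func (s *shadowStack) rewind(m int) { s.top = m }

// The "request deserialization" pattern would then look roughly like:
//
//	m := ss.mark()
//	defer ss.rewind(m)          // bulk-free on return, no per-object work
//	buf := ss.alloc(payloadLen) // dynamically sized, yet not a heap allocation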
'Dmitry Vyukov' via golang-dev
2018-07-18 11:28:10 UTC
Permalink
Thinking about this more: if we have not just a secondary shadow stack, but a
secondary shadow stack per _frame_, then we could even express things like:
if this conditional, dynamically-sized allocation site allocates, then the
allocation needs to go to the parent's parent's secondary stack, because the
compiler proved that it is scoped to the parent's parent's frame.
Such a scheme is potentially capable of capturing most of what TOC would
capture, and maybe even more (e.g. allocations that are temporarily
accessible to another goroutine but still scoped to the host goroutine),
while still providing cheap allocation, perfect locality, and prompt, bulk
deallocation with no barriers involved.
Philip Hofer
2018-07-27 17:41:40 UTC
Permalink
It would be interesting to see what cases the compiler could detect with
moderate effort.

One problem I foresee is that many functions that could have this
optimization applied for some call sites would not be able to use it for
other call sites. So, would you monomorphize call-sites based on escape
analysis?

More concretely, let's say I have the following:

func f() []byte {
	o := make([]byte, 1)
	o[0] = 0x3f
	return o
}

func g() bool {
	return f()[0] == 0x3f
}

var global []byte // stands in for "a global or a channel send" below

func h() {
	r := f()
	global = r // ... do something with 'r' like assign it to a global or send it on a channel
}

(Let's ignore the effects of inlining.) The call site of f() in g() can use
g's shadow stack. The call site of f() in h() must use the heap. How do you
compile only one copy of f? Or do you compile two?
'Austin Clements' via golang-dev
2018-07-27 17:53:01 UTC
Permalink
Specializing f is definitely a possibility, though it has the usual downsides:
How far do you have to specialize up the call stack? Do you specialize on
allocation of things other than the return value and, if so, how many
copies of f might you need?

Another possibility is to pass this information around dynamically in a
hidden argument. We would only have to add this in cases where it's useful
(e.g., when f's allocation could go on the shadow stack) and it would be
easy to map different bits to different allocations. With this sort of
approach, there's even a fighting chance of making this work through
interface and closure calls. This is a little tricky with the calling
convention, though. Maybe we steal the bottom three bits from the context
register. :)
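
(A hand-written analogy for the hidden-argument idea, reusing Philip's f/g/h
example and the shadowStack sketch from earlier in the thread; the flag names
and calling convention are invented for illustration, this is not compiler
output.)

// The caller passes a small bitmask saying, per allocation site in f,
// whether that allocation may live on the caller's shadow stack.
const siteReturnSlice = 1 << 0 // bit for f's make([]byte, 1) site

func f(allocFlags uint8, ss *shadowStack) []byte {
	var o []byte
	if allocFlags&siteReturnSlice != 0 {
		o = ss.alloc(1) // caller proved the result is scoped to its frame
	} else {
		o = make([]byte, 1) // conservative default: heap
	}
	o[0] = 0x3f
	return o
}

func g(ss *shadowStack) bool {
	m := ss.mark()
	defer ss.rewind(m)
	return f(siteReturnSlice, ss)[0] == 0x3f // result never leaves g's frame
}

func h() {
	// r may be published (global, channel), so h passes no flags and f
	// falls back to the heap for that site.
	r := f(0, nil)
	global = r
}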
Philip Hofer
2018-07-27 18:03:56 UTC
Permalink
Yes, that seems like the way to go. My intuition is that you'd end up
with exponentially growing specialization opportunities. (... unless
you can determine that the function is only called once, but in that
case it should just be inlined.)

With closure calls, you're already passing around "hidden" values, so
it would be easy enough to add the information to the closure context.

For regular function calls, you could re-use the closure context register.
Randall Farmer
2016-06-27 06:37:00 UTC
Permalink
I'm enthused. Clever but understandable, seems like a great fit for Go and
a lot of stuff folks build in it.

Am I guessing right that the single contiguous span is a simplification to
describe the idea, and there are still spans per size class, etc.? If so,
does that mean that a goroutine's min. footprint while running might
include a span for each size class it uses, where they don't now? Seems fine
if so (we're still talking KBs and you get a lot in return), but it could rate
a sentence in any eventual release notes.


Something fun re: data being published in possibly surprising ways. The
net/http server uses a sync.Pool to get a pointer to a bufio.Writer likely
allocated by some other goroutine, and then sets the bufio.Writer's target
io.Writer to a chunkWriter for writing to the HTTP response. The
chunkWriter holds a pointer to the response, so it can reach request and
response headers, etc. So by using a *bufio.Writer that isn't in your local
spans, you prevent the TOC from collecting the whole request/response objs.
Not sure how big a deal that is in terms of bytes, but it's surprising behavior!
(Also, I'm talking here about the HTTP/1.1 code, to be clear. I know the h2
stuff is intentionally communicating a lot.)

Here I see two workarounds. If you copy a bufio.Writer by value instead of
by pointer, now it really *is* in local memory and setting its Writer
doesn't publish anything. (If you're doing that you'd also need to reset
the target io.Writer to nil before returning it to the pool.) I imagine
something similar may come up in other situations--you *wish* some struct
were local but it's not, so you copy it and your wish comes true, heh. Or,
much simpler fix, but more specific to this situation, is to drop the
sync.Pools if TOC makes it cheap to just locally alloc a bufio.Writer per
response.
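
(A rough sketch of the copy-by-value workaround described above, with
hypothetical handler plumbing; cw stands in for net/http's chunkWriter, and
this is an illustration of the idea rather than a patch to net/http.)

import (
	"bufio"
	"io"
	"sync"
)

var writerPool = sync.Pool{
	New: func() interface{} { return bufio.NewWriterSize(nil, 4<<10) },
}

func writeResponse(cw io.Writer, body []byte) error {
	shared := writerPool.Get().(*bufio.Writer)

	w := *shared // value copy: the Writer header now lives in local memory
	w.Reset(cw)  // pointing the local copy at cw publishes nothing new

	if _, err := w.Write(body); err != nil {
		return err
	}
	if err := w.Flush(); err != nil {
		return err
	}

	shared.Reset(nil) // drop any stale target before returning it to the pool
	writerPool.Put(shared)
	return nil
}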


Assuming this goes in, I am sort of curious what you guys' position winds
up being re: launching goroutines you otherwise wouldn't as a hint to the
collector, and curious if a runtime.LocalGC() will be exposed. I'm not sure
I have much to add on either.
r***@golang.org
2016-06-27 14:44:16 UTC
Permalink
Inline.
Post by Randall Farmer
I'm enthused. Clever but understandable, seems like a great fit for Go and
a lot of stuff folks build in it.
Am I guessing right that the single contiguous span is a simplification to
describe the idea, and there are still spans per size class, etc.? If so,
does that mean that a goroutine's min. footprint while running might
include a span for each size class it uses where they don't now? Seems fine
if so (we're still talking KBs and you get a lot in return), but it could rate
a sentence in any eventual release notes.
That is correct. We tried to avoid implementation detail in this document.
Post by Randall Farmer
Assuming this goes in, I am sort of curious what you guys' position winds
up being re: launching goroutines you otherwise wouldn't as a hint to the
collector, and curious if a runtime.LocalGC() will be exposed. I'm not sure
I have much to add on either.
runtime.LocalGC will not be exposed. My position is to launch goroutines
when it makes the application cleaner. Do not warp application logic or
complicate APIs to fit some version of the GC.