[boost] [poly_collection] Memory allocation and allocator propagation

Post by Ion GaztaÃ±aga via Boost
Hi Joaquín,
/////////////////////////
// Memory indirections
/////////////////////////
If I understand code correctly, the internal structure of a poly
1) poly_collection stores an unordered_map<type_index, segment_type>
2) segment_type stores a unique_ptr<segment_backend> and
segment_backend_is an abstract class, so roughly equivalent to
unique_ptr<packed_segment/split_segment> (both
packed_segment/split_segment derive from segment_backend.
3) packed_segment/split_segment stores a vector of concrete types.
So we need 3 memory hops to navigate between the "this" pointer of a
PolyCollection to a pointer to a concrete type: poly_collection ->
unordered node -> segment in unique_ptr -> vector value.
I think the data structure could be optimized to save one indirection.
segment_type could directly hold the class derived from
segment_backend as packed_segment/split_segment will need the same
size and alignment requirements regardless of the ConcreteType they
hold (as long as they use std::vector). This could improve insertion
times and other operations. This optimization could also make easier
to fix the following section (allocator propagation).
*Main question 1*: Do you find think PolyCollection should implement
this optimization?

How do yo envision this could be done? Replacing the current
std::unique_ptr<segment_backend> pimpl with a
std::variant<packed_segment,split_segment> (which would restrict the set
of implementations of segment_backend to only these two)? Or do you have
something else in mind?

In principle if the optimization is reasonably implemented I'd say it'd
be welcome. I don't
think this is going to show in performance, though, but profiling could
tell us.

Post by Ion GaztaÃ±aga via Boost
/////////////////////////
// Allocator propagation
/////////////////////////
As mentioned in Adam Wulkiewicz's review PolyCollection currently does
not propagate the allocator from the constructor argument of
poly_collection to the concrete type. From a practical point of view,
the key requirement is that all intermediate memory until the concrete
type must be allocated using the same root allocator passed when a
poly_collection was constructed, and that allocator must be propagated
also to the concrete type.

We have a problem here: your key requirement can be split in two:

A. Allocator is propagated down the data structure for memory allocaion
B. allocator_traits<allocator_type>::construct/destroy is used for
construction/destruction
of the "container's element type".

As for A, in fact the root allocator is copied and used for all dynamic
memory handling
from the top std::unordered_map down to the private std::vectors in
packed_segment/split_segment *except* at the
std::unique_ptr<segment_backend> pimpl
part. This could be remedied either by supressing the unique_ptr
altogether as you
suggest at the beginning of the post, or providing a custom deleter to
unique_ptr or
something.

Now as for B: This really depends on what the "container's element type"
is supposed
to be. A natural posibility would be to equate this with the container's
value_type, in
which the case the situation is:

* base_collection<Base> does *not* use
allocator_traits<allocator_type>::construct/destroy
for construction of its value_type (=Base). What gets into the internal
vector is a
value_holder<Derived>, which constructs/destroys the contained Derived
values through
placement new and direct call to Derived::~Derived, respectively
Strictly speaking, it is
impossible to use allocator_traits<allocator_type>::construct/destroy
for *Base*, as it
is *Derived* objects that get constructed/destroyed.
* function_collection and any_collection *do* use
allocator_traits<allocator_type>::construct/destroy
for construction/destruction of their value_type objects
(function_collection_value_type and
any_collection_value_type); take a look at

https://github.com/joaquintides/poly_collection/blob/master/include/boost/poly_collection/detail/split_segment.hpp#L54-L57

Now, if by "container's element type" we understand the various concrete
types that
get stored in the polymorphic collections, the the situation is like in
base_collection
before: none of these concrete objects is cosntructed/destroyed through
an allocator
but directly with placement new / Concrete::~Concrete inside value_holder.

Summing up: I see A (all dyamic memory handled by passed allocator) as
an useful and
desireable feature. I don't see the point of doing contortions to
achieve B (by passing
allocators to value_holder etc.) just for the sake of compliance, much
more so when in
one collection (namely base_collection) it is impossible to comply in a
literal sense (it is
derived classes that get constructed/destroyed).

Post by Ion GaztaÃ±aga via Boost
[...]
I think that a unique memory source for all internal structures and
the correct propagation of the allocator using allocator_traits is a
very interesting feature for PolyCollection.

Hopefully answered above.

Post by Ion GaztaÃ±aga via Boost
[...]
*Main question 2*: Do you think PolyCollection should support stateful
allocator using the scoped allocator model?

Now, the scoped allocator model is an even stronger requirement than we
have discussed
so far. I don't have the necessary expertise in this, but I fail to see
the applicability to
PolyCollection from a cursory glance at N2554.

Post by Ion GaztaÃ±aga via Boost
Some specific notes that might show places where the scoped allocator
[...]
- The "emplace" operation uses a placement new to wrapped in a lambda
which is type-erased with a captureless lambda. It looks to me that
instead of delegating the work into the value_holder, the type erasure
arguments should be treated by the segment so that internal vectors
propagate the allocator when needed.

I don't see how this could be done: type erasure must happen before
segment is used, and actual
args types are then unknown to segment... Care to elaborate?

Joaquín M López Muñoz

_______________________________________________
Unsubscribe & other changes: http://lists.boost.o

Ion Gaztañaga via Boost

2017-05-15 23:00:07 UTC

Post by Ion GaztaÃ±aga via Boost
I think the data structure could be optimized to save one indirection.
segment_type could directly hold the class derived from
segment_backend as packed_segment/split_segment will need the same
size and alignment requirements regardless of the ConcreteType they
hold (as long as they use std::vector). This could improve insertion
times and other operations. This optimization could also make easier
to fix the following section (allocator propagation).
*Main question 1*: Do you find think PolyCollection should implement
this optimization?

In theory each Model can specify the alignment and size to hold every
possible segment type for any ConcreteType. I think any ConcreteType
will have the same size and alignment requirements (as a vector stores
pointers so the size or alignment of T does not participate in the size
or alignment of segment_type).

So the requirement is that there is an upper bound on the size and
alignment for segment types holding any ConcreteType. Class segment
holds aligned storage to hold the any concrete segment_type:

class MyModel
{
private:
typedef packed_segment<MyModel, DummyT, DummyTAlloc> any_segment_t;

private:
//size of any segment_type usable with this model
static std::size_t segment_type_size = sizeof(any_segment_t);

//alignment
static std::size_t segment_type_alignment = alignof(any_segment_t)

template<typename Derived,typename Allocator>
static segment_backend* make_in_place(void *addr, const Allocator& al)
{
typedef typename std::allocator_traits<Allocator>::
template rebind_alloc<Derived> derived_alloc_t;
typedef packed_segment<MyModel, Derived, derived_alloc_t>
any_segment_t;

//Propagate allocator using a possibly custom construct operation
typedef allocator_traits<Allocator>::rebind_traits
<any_segment_t> segment_traits_t;
return segment_traits_t::construct(addr, al);
}
};

Maybe segment_backend should have a virtual function that does the
placement destruction using allocator_traits. When "class segment's"
destructor is run, that virtual function is used to destroy the inplace
object.

Post by Joaquin M LÃ³pez MuÃ±oz via Boost
A. Allocator is propagated down the data structure for memory allocaion
B. allocator_traits<allocator_type>::construct/destroy is used for
construction/destruction
of the "container's element type".
As for A, in fact the root allocator is copied and used for all dynamic
memory handling
from the top std::unordered_map down to the private std::vectors in
packed_segment/split_segment *except* at the
std::unique_ptr<segment_backend> pimpl
part. This could be remedied either by supressing the unique_ptr
altogether as you
suggest at the beginning of the post, or providing a custom deleter to
unique_ptr or
something.

Right. I noticed that value_holder also does not propagate the allocator
argument when the concrete type is constructed. value_holder could be
defined so that uses_allocator_v == true (defining an internal
allocator_type and having extended constructors. Then it could use
allocator traits to construct the concrete type and forward the
allocator if that type supports it.

IMHO, even if the standard seems to say the opposite (as only requires
allocator_traits::construct for the element_type), I think the allocator
should be propagated also to the vector holding the index in
split_segment. The idea is that all memory needed by the container
should come from the same source.

Post by Joaquin M LÃ³pez MuÃ±oz via Boost
Now as for B: This really depends on what the "container's element type"
is supposed
to be. A natural posibility would be to equate this with the container's
value_type, in

I think the standard is written to support only std containers but not
generic containers. IMHO requirement B is not applicable to PolyCollection.

Post by Joaquin M LÃ³pez MuÃ±oz via Boost
Summing up: I see A (all dyamic memory handled by passed allocator) as
an useful and
desireable feature. I don't see the point of doing contortions to
achieve B (by passing
allocators to value_holder etc.) just for the sake of compliance, much
more so when in
one collection (namely base_collection) it is impossible to comply in a
literal sense (it is
derived classes that get constructed/destroyed).

You need to pass the allocator to value_holder so it forwards it to the
real concrete type, which could be a container. Just use
allocator_traits each time you need to construct an object in place.

Post by Ion GaztaÃ±aga via Boost
- The "emplace" operation uses a placement new to wrapped in a lambda
which is type-erased with a captureless lambda. It looks to me that
instead of delegating the work into the value_holder, the type erasure
arguments should be treated by the segment so that internal vectors
propagate the allocator when needed.

I don't see how this could be done: type erasure must happen before
segment is used, and actual
args types are then unknown to segment... Care to elaborate?

I haven't thought it carefully and it might not possible at all, but
maybe the lambda could execute:

s.emplace_back(forward<Args>(args...));

where s is the concrete segment type that can be obtained via Model (the
address of the aligned storage inside "class storage" is casted to
Mode::get_segment_type<T>::type*) because the ConcreteType we are going
to construct is known at compile time in emplace_back_impl.

I also wonder if insert<T> and other functions should avoid slicing
detection so that they can also avoid the virtual call overhead. If T is
assumed to be the same type of the final ConcreteType, you can use
typeid(T) instead of typeid (x) (which can require to trip through the
vptr) when searching in the unordered map and obtain a pointer to the
concrete segment type just casting. Insertion times could be improved,
specially range insertions could shine (as vector could be resized just
once). Or am I missing something?

Best,

Ion

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/li

Joaquin M López Muñoz via Boost

2017-05-16 07:23:52 UTC

[discussion about avoiding unique_ptr in segment_backend and scoped
allocators]

I'll definitely try to implement this along your suggestions --might
need your
offlist help, though :-)

I don't see how this could be done: type erasure must happen before
segment is used, and actual
args types are then unknown to segment... Care to elaborate?

I haven't thought it carefully and it might not possible at all, but
s.emplace_back(forward<Args>(args...));
where s is the concrete segment type that can be obtained via Model
(the address of the aligned storage inside "class storage" is casted
to Mode::get_segment_type<T>::type*) because the ConcreteType we are
going to construct is known at compile time in emplace_back_impl.
I also wonder if insert<T> and other functions should avoid slicing
detection so that they can also avoid the virtual call overhead. If T
is assumed to be the same type of the final ConcreteType, you can use
typeid(T) instead of typeid (x) (which can require to trip through the
vptr) when searching in the unordered map and obtain a pointer to the
concrete segment type just casting. Insertion times could be improved,
specially range insertions could shine (as vector could be resized
just once). Or am I missing something?

Ion, I think you're on to something here. As it happens, poly_collection
can
statically determine when T is acceptable and dynamically detect the cases
where x's subtype is actually T, in which case virtual calls can be omitted
altogether --A reference to the internal
packed_segment/split_segment<Model,T,Allocator>
can be statically obtained from segment<Model,Allocator>. I have to
think this
carefully, but this can be huge in terms of insertion performance.
There's no need
to dispense with slicing detection.

Joaquín M López Muñoz

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/

Ion Gaztañaga via Boost

2017-05-16 11:35:07 UTC

Post by Ion GaztaÃ±aga via Boost
where s is the concrete segment type that can be obtained via Model
(the address of the aligned storage inside "class storage" is casted
to Mode::get_segment_type<T>::type*) because the ConcreteType we are
going to construct is known at compile time in emplace_back_impl.
I also wonder if insert<T> and other functions should avoid slicing
detection so that they can also avoid the virtual call overhead. If T
is assumed to be the same type of the final ConcreteType, you can use
typeid(T) instead of typeid (x) (which can require to trip through the
vptr) when searching in the unordered map and obtain a pointer to the
concrete segment type just casting. Insertion times could be improved,
specially range insertions could shine (as vector could be resized
just once). Or am I missing something?

For the emplace function, as the ConcreteType it's always statically
known (type T is the type to be constructed, whereas in "insert" T is
the type used to construct the concrete type), then I think no and
lambda is needed, it could be directly translated to a simple call to
the statically known segment type:

s.emplace_back(forward<Args>(args...));

Post by Joaquin M LÃ³pez MuÃ±oz via Boost
Ion, I think you're on to something here. As it happens, poly_collection
can
statically determine when T is acceptable and dynamically detect the cases
where x's subtype is actually T, in which case virtual calls can be omitted
altogether --A reference to the internal
packed_segment/split_segment<Model,T,Allocator>
can be statically obtained from segment<Model,Allocator>.

Ok, I see it. If typeid(x) == typeid(T), the static path (let's call it
statically_known_insert(x)) is taken. Otherwise the virtual function is
used.

In any case, the runtime slicing detection will slow down ranges, as
you'll need to check the typeid of each *it reference in the range. If
all the range is assumed to be the same runtime type (or slicing is
assumed) then the operation is a simple:

s.insert(s.end(), first, last);

which will use distance(first, last) for forward iterators, allocate all
the storage once and avoid a lot of objet moving/copying. Maybe a new
overload for insert taking a range that the user does not care to slice
could be a good idea for performance reasons:

base_collection.insert<warrior>(warriors_first, warriors_last);

For absolute zero overhead (no repeteated typeid, no repeated hash
search), a segment handle utility comes to my mind:

typedef base_collection<sprite> bc_t;
bc_t bc;

//register type
bc.register_types<warrior>();

//Obtain segment handle (maybe just a struct holding the pointer to
//the concrete segment)
bc_t::segment_handle<warrior> sh = bc.segment_handle<warrior>();

//Insert directly in the segment (just vector insertion)
//and updating bc's member data if any.
bc.insert(sh, pos, warrior);

Additional comments:

1) I also noticed that bc.size() and bc.empty() are not constant-time.
It seems a bit surprising, it might be noticeable when the number of
segments grows. This avoids common data in poly_collection but I imagine
most users assume these functions are free.

2) The number of additional operations that could take advantage of the
"I know your static type" optimization is important, although the
speedup will depend on the cost of each operation (eg. size()/ reserve()
vs. a virtual function call):

template<typename T> void reserve(size_type n);
template<typename T> void shrink_to_fit();
template<typename T> size_type capacity() const;
template<typename T> size_type max_size() const;
template<typename T> size_type size() const;
template<typename T> bool empty() const;
template<typename T> void clear();

Additional possible "zero cost" calls using the segment handle:

template<typename SegmentHandle>
void reserve(SegmentHandle sh, size_type n);

template<typename SegmentHandle>
void shrink_to_fit(SegmentHandle sh);

template<typename SegmentHandle>
size_type capacity(SegmentHandle sh) const;

template<typename SegmentHandle>
size_type max_size(SegmentHandle sh) const;

template<typename SegmentHandle>
size_type size(SegmentHandle sh) const;

template<typename SegmentHandle>
bool empty(SegmentHandle sh) const;

template<typename SegmentHandle>
void clear(SegmentHandle sh);

Best,

Ion

_______________________________________________
Unsubscribe & other changes: http://list

Joaquin M López Muñoz via Boost

2017-05-16 18:28:16 UTC

Correct.

Not in every case, see for instance the two insert overloads at:

https://github.com/joaquintides/poly_collection/blob/master/include/boost/poly_collection/detail/poly_collection.hpp#L660-L683

where you can notice than when the type is terminal we don't repeat
typeid checking.
But in general we need to typeid individually, and I don't see this as
something we
can dispense with: otherwise there'd be no way to do type-erased
insertions as
simple as:

boost::ptr_vector<Base> v;
...
c.insert(p.begin(),p.end());

Post by Ion GaztaÃ±aga via Boost
If all the range is assumed to be the same runtime type (or slicing is
s.insert(s.end(), first, last);
which will use distance(first, last) for forward iterators, allocate
all the storage once and avoid a lot of objet moving/copying. Maybe a
new overload for insert taking a range that the user does not care to
base_collection.insert<warrior>(warriors_first, warriors_last);

We sort of already have that:

bc.insert(bc.end<warrior>(),warriors_first, warriors_last);

See:

https://github.com/joaquintides/poly_collection/blob/master/include/boost/poly_collection/detail/poly_collection.hpp#L694-L705

This is efficient typeid-wise, not so with respect to preallocating
based on distance.
If we go ahead with your proposal of static-fying the interface when the
concrete type
is known, we can get much better --actually, as fast as possible.

Post by Ion GaztaÃ±aga via Boost
For absolute zero overhead (no repeteated typeid, no repeated hash
search), a segment
[...]

I think this is not needed in light of my previous comment.

Post by Ion GaztaÃ±aga via Boost
1) I also noticed that bc.size() and bc.empty() are not constant-time.
It seems a bit surprising, it might be noticeable when the number of
segments grows. This avoids common data in poly_collection but I
imagine most users assume these functions are free.

Strictly speaking, they're constant time wrt number of elements, assuming
(correctly, I'd say) that the number of registered types is bounded and
don't
increase linearly with size(). I'm very against caching size for
consistency reasons
and because we are adding a little fat to every operation of the container.

Post by Ion GaztaÃ±aga via Boost
2) The number of additional operations that could take advantage of
the "I know your static type" optimization is important, although the
speedup will depend on the cost of each operation (eg. size()/
template<typename T> void reserve(size_type n);
template<typename T> void shrink_to_fit();
template<typename T> size_type capacity() const;
template<typename T> size_type max_size() const;
template<typename T> size_type size() const;
template<typename T> bool empty() const;
template<typename T> void clear();

If/when we staticfy the interface of course we'll go all the way and improve
as much as can be improved. The functions you list above, though, are not
expected to be a performance problem in any sensible program.

Joaquín M López Muñoz

_______________________________________________
Unsubscribe & other changes: http://lists.boost.o

Ion Gaztañaga via Boost

2017-05-16 21:41:50 UTC

Post by Ion GaztaÃ±aga via Boost
In any case, the runtime slicing detection will slow down ranges, as
you'll need to check the typeid of each *it reference in the range.

https://github.com/joaquintides/poly_collection/blob/master/include/boost/poly_collection/detail/poly_collection.hpp#L660-L683
where you can notice than when the type is terminal we don't repeat
typeid checking.

Is the iterator's "value_type" a guarantee of the dynamic type of the
referenced object? I can imagine ptr_vector or Boost.Intrusive
iterators, where value_type is the base class, but the dynamic type of
the object is a derived class. If the implementation wants to avoid any
slicing and support these "special" iterators then it should check the
typeid of every value in the range.

Post by Joaquin M LÃ³pez MuÃ±oz via Boost
bc.insert(bc.end<warrior>(),warriors_first, warriors_last);
https://github.com/joaquintides/poly_collection/blob/master/include/boost/poly_collection/detail/poly_collection.hpp#L694-L705

I was not talking about a range taking local iterators, but a range of
iterators of a vector holding warriors. Something like:

https://github.com/joaquintides/poly_collection/blob/master/include/boost/poly_collection/detail/poly_collection.hpp#L707-L722

which also assumes iterator::value_type is the dynamic type of the whole
range.

Post by Joaquin M LÃ³pez MuÃ±oz via Boost
This is efficient typeid-wise, not so with respect to preallocating
based on distance.
If we go ahead with your proposal of static-fying the interface when the
concrete type
is known, we can get much better --actually, as fast as possible.

Post by Ion GaztaÃ±aga via Boost
For absolute zero overhead (no repeteated typeid, no repeated hash
search), a segment
[...]

I think this is not needed in light of my previous comment.

As a suggestion, I would include in the insertion benchmarks a vector
push_back. At least for packed_segment, vector push_back is the
upper_bound of any performance improvement. If the performance
difference between the new "staticfied" single value insertion function:

while(....)
pc.insert(warrior());

and the vector push_back code

while(...)
warrior_vector.push_back(warrior());

is significant, then maybe a uglier but near-zero overhead alternative
(like the segment handle) methods might be a good idea. Specially for
power programmers that want to create a bunch of warriors when the new
game level starts ;-)

I don't know if anyone will use poly_collection with many different
types and few elements per segment. As always users will surprise us ;-)

In any case I would recommend documenting this visible somewhere
(ideally in the method description as Complexity) with a complexity
similar to (I hope I don't get it wrong):

O(distance(segment_traversal().begin(), segment_traversal().end()))

as typically most programmers would think about those as extremely cheap
operations and might call them in a loop.

Right. My "the root of all evil" side is also reviewing the library ;-)

And hopefully these are my last comments ;-) I think none of them are
needed for the acceptance of the library, as they could be added as
future improvements.

Best,

Ion

_______________________________________________
Unsubscribe & other c

Joaquin M López Muñoz via Boost

2017-05-16 21:59:51 UTC

Post by Ion GaztaÃ±aga via Boost
In any case, the runtime slicing detection will slow down ranges, as
you'll need to check the typeid of each *it reference in the range.

Is the iterator's "value_type" a guarantee of the dynamic type of the
referenced object? I can imagine ptr_vector or Boost.Intrusive
iterators, where value_type is the base class, but the dynamic type of
the object is a derived class. If the implementation wants to avoid
any slicing and support these "special" iterators then it should check
the typeid of every value in the range.

Exactly, this is what it does. At the end of the day, this is about
convenience vs.
performance. I'd have to find a very convincing argument to give slicing
prevention
away. Furthermore, you can have the best of both worlds if you keep
reading below :-)

I was not talking about a range taking local iterators, but a range of
https://github.com/joaquintides/poly_collection/blob/master/include/boost/poly_collection/detail/poly_collection.hpp#L707-L722
which also assumes iterator::value_type is the dynamic type of the
whole range.

I know, and this is what you got:

https://github.com/joaquintides/poly_collection/blob/master/include/boost/poly_collection/detail/poly_collection.hpp#L785-L818

Note that in bc.insert(bc.end<warrior>(),warriors_first, warriors_last)
it's only
the first arg that's actually a local iterator: warriors_first and
warriors_last can
be any InputIterator. No typeid checking is done (except on debug mode, see
the BOOST_ASSERT's) and when this is staticfied we can get the preallocation
performance boost etc. In a sense, bc.end<warrior>() is the indication to
Boost.PolyCollection that the inserted range *must* be warriors and not
something
else.

Post by Ion GaztaÃ±aga via Boost
[...]
As a suggestion, I would include in the insertion benchmarks a vector
push_back. At least for packed_segment, vector push_back is the
upper_bound of any performance improvement. If the performance
while(....)
pc.insert(warrior());
and the vector push_back code
while(...)
warrior_vector.push_back(warrior());
is significant, then maybe a uglier but near-zero overhead alternative
(like the segment handle) methods might be a good idea. Specially for
power programmers that want to create a bunch of warriors when the new
game level starts ;-)

Hintless insertion resolves to pushing back to the corresponding
segment. The difference
in performance would solely consist in the looking up of the segment.
Again, you can already
have your second version with the existing interface:

while(...)
pc.insert(bc.end<warrior>(),warrior());

(Insertion at the end is, of course, the same as pushing back).

I don't know if anyone will use poly_collection with many different
types and few elements per segment. As always users will surprise us ;-)
In any case I would recommend documenting this visible somewhere
(ideally in the method description as Complexity) with a complexity
O(distance(segment_traversal().begin(), segment_traversal().end()))
as typically most programmers would think about those as extremely
cheap operations and might call them in a loop.

Noted.

Post by Ion GaztaÃ±aga via Boost
[...]
And hopefully these are my last comments ;-) I think none of them are
needed for the acceptance of the library, as they could be added as
future improvements.

Thank you for your very interesting input. Let's see is some latecomer joins
the party before the review is over.

Joaquín M López Muñoz

_______________________________________________
Unsubscribe & other changes: http://lists.boost

Thorsten Ottosen via Boost

2017-05-17 10:04:50 UTC

Post by Ion GaztaÃ±aga via Boost
I don't know if anyone will use poly_collection with many different
types and few elements per segment. As always users will surprise us ;-)

I agree with Joaquin that we should not store the size.