[squeak-dev] The defaullt implementation of isEmpty might do too much work

Discussion:

[squeak-dev] The defaullt implementation of isEmpty might do too much work

Nicolas Cellier

2016-10-20 16:40:26 UTC

The default behavior is to rely on ^self size = 0.
But default size is doing too much (it iterates over all elements).
It's OK for most of our classes which know very well about their size and
override #size.
It's not OK for an eventually lazy or infinite subclass.
I realized this by answering
http://stackoverflow.com/questions/40149826/is-there-a-construct-like-iterate-iterable-if-it-has-elements-else

Eliot Miranda

2016-10-20 17:26:51 UTC

Nicolas,

On Thu, Oct 20, 2016 at 9:40 AM, Nicolas Cellier <
***@gmail.com> wrote:

> The default behavior is to rely on ^self size = 0.
> But default size is doing too much (it iterates over all elements).
> It's OK for most of our classes which know very well about their size and
> override #size.
> It's not OK for an eventually lazy or infinite subclass.
>

Good catch. Change it!

isEmpty
self do: [:element| ^false].
^true

> I realized this by answering http://stackoverflow.com/
> questions/40149826/is-there-a-construct-like-iterate-
> iterable-if-it-has-elements-else
>

_,,,^..^,,,_
best, Eliot

Stéphane Rollandin

2016-10-21 09:50:50 UTC

> isEmpty
> self do: [:element| ^false].
> ^true

Nice one.

Stef

monty

2016-10-21 22:59:48 UTC

All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.

Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.

Eliot Miranda

2016-10-22 03:26:54 UTC

Hi Monty,

what happens if you add an isEmpty implement ration based in size to SequenceableCollection?

_,,,^..^,,,_ (phone)

> On Oct 21, 2016, at 3:59 PM, monty <***@programmer.net> wrote:
>
> All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
>
> Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
>

monty

2016-10-22 04:48:08 UTC

Putting #size-based implementations in abstract Collection subclasses negates the lazy collection benefit and still penalizes non-abstract direct Collection subclasses like CharacterSet and any user-defined direct Collection subclass implemented using composition with forwarding to a concrete collection. If the default linear complexity of #size and #isEmpty bothers you so much, make #size a subclassResponsiblity like #do: instead of suddenly penalizing collections whose authors actually did the right thing by providing a non-linear #size implementation.

> Sent: Friday, October 21, 2016 at 11:26 PM
> From: "Eliot Miranda" <***@gmail.com>
> To: "The general-purpose Squeak developers list" <squeak-***@lists.squeakfoundation.org>
> Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
>
> Hi Monty,
>
> what happens if you add an isEmpty implement ration based in size to SequenceableCollection?
>
> _,,,^..^,,,_ (phone)
>
> > On Oct 21, 2016, at 3:59 PM, monty <***@programmer.net> wrote:
> >
> > All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
> >
> > Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
> >
>
>

Nicolas Cellier

2016-10-22 12:42:43 UTC

Please retry with latest trunk version.

2016-10-22 6:48 GMT+02:00 monty <***@programmer.net>:

> Putting #size-based implementations in abstract Collection subclasses
> negates the lazy collection benefit and still penalizes non-abstract direct
> Collection subclasses like CharacterSet and any user-defined direct
> Collection subclass implemented using composition with forwarding to a
> concrete collection. If the default linear complexity of #size and #isEmpty
> bothers you so much, make #size a subclassResponsiblity like #do: instead
> of suddenly penalizing collections whose authors actually did the right
> thing by providing a non-linear #size implementation.
>
> > Sent: Friday, October 21, 2016 at 11:26 PM
> > From: "Eliot Miranda" <***@gmail.com>
> > To: "The general-purpose Squeak developers list" <squeak-***@lists.
> squeakfoundation.org>
> > Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty
> might do too much work
> >
> > Hi Monty,
> >
> > what happens if you add an isEmpty implement ration based in size to
> SequenceableCollection?
> >
> > _,,,^..^,,,_ (phone)
> >
> > > On Oct 21, 2016, at 3:59 PM, monty <***@programmer.net> wrote:
> > >
> > > All non-trivial collections implement #size or inherit a custom
> implementation, usually just as a return of an inst var. Your #do:-based
> one is 3x as slow in my tests, so you've now made #isEmpty slower for every
> collection that implements #size just to benefit ones whose careless
> authors didn't and so the implementors of lazy collections have one fewer
> message to override.
> > >
> > > Do not do this. People already avoid #ifEmpty: and related messages
> where performance matters and shouldn't have to avoid #isEmpty too.
> > >
> >
> >
>
>

monty

2016-10-24 04:36:46 UTC

The latest trunk has numerous custom #isEmpty implementations, but that doesn't help custom collections outside the image not inheriting one. I maintain cross-platform projects that have such collections, like XPath with XPathFunctionSet or the BitmapCharacterSet package it relies on. Both are Collection subclasses, and now I feel like I should add this:
isEmpty
^ self size = 0

to every single collection I maintain. Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?

>Sent: Saturday, October 22, 2016 at 8:42 AM
>From: "Nicolas Cellier" <***@gmail.com>
>To: "The general-purpose Squeak developers list" <squeak-***@lists.squeakfoundation.org>
>Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
>
>Please retry with latest trunk version.
>
>2016-10-22 6:48 GMT+02:00 monty <***@programmer.net>:Putting #size-based implementations in abstract Collection subclasses negates the lazy collection benefit and still penalizes non-abstract direct Collection subclasses like CharacterSet and any user-defined direct Collection subclass implemented using composition with forwarding to a concrete collection. If the default linear complexity of #size and #isEmpty bothers you so much, make #size a subclassResponsiblity like #do: instead of suddenly penalizing collections whose authors actually did the right thing by providing a non-linear #size implementation.
>
>> Sent: Friday, October 21, 2016 at 11:26 PM
>> From: "Eliot Miranda" <***@gmail.com>
>> To: "The general-purpose Squeak developers list" <squeak-***@lists.squeakfoundation.org[mailto:squeak-***@lists.squeakfoundation.org]>
>> Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
>
>>
>> Hi Monty,
>>
>> what happens if you add an isEmpty implement ration based in size to SequenceableCollection?
>>
>> _,,,^..^,,,_ (phone)
>>
>> > On Oct 21, 2016, at 3:59 PM, monty <***@programmer.net> wrote:
>> >
>> > All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
>> >
>> > Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
>> >
>>
>>

Bert Freudenberg

2016-10-24 08:31:22 UTC

On Monday, 24 October 2016, monty <***@programmer.net> wrote:

> Why not just make #size a subclassResponsibility or add abstract
> superclasses for lazy or infinite collections that implement #isEmpty using
> #do: and change #size to shouldNotImplement?
>

Because #do: is the only method required to be overridden by Collection
subclasses. All other methods are implemented in terms of #do:.

So yes, if your Collection subclass has an optimized implementation for
#isEmpty, then provide it. That is a small price to pay for making a class
work optimally across different code bases.

- Bert -

Eliot Miranda

2016-10-24 18:56:17 UTC

On Mon, Oct 24, 2016 at 1:31 AM, Bert Freudenberg <***@freudenbergs.de>
wrote:

> On Monday, 24 October 2016, monty <***@programmer.net> wrote:
>
>> Why not just make #size a subclassResponsibility or add abstract
>> superclasses for lazy or infinite collections that implement #isEmpty using
>> #do: and change #size to shouldNotImplement?
>>
>
> Because #do: is the only method required to be overridden by Collection
> subclasses. All other methods are implemented in terms of #do:.
>

+1

> So yes, if your Collection subclass has an optimized implementation for
> #isEmpty, then provide it. That is a small price to pay for making a class
> work optimally across different code bases.
>
> - Bert -
>
>
>
>

--
_,,,^..^,,,_
best, Eliot

Chris Muller

2016-10-25 02:07:39 UTC

Normally its the Subclasses where implementation-specific
optimizations are done, not in the top abstractions. At the level of
Collection, the code should be elegant, expressive, and defining of
the responsibilities by the messages it sends. #size is something of
Collection, period, there is nothing wrong with sending it, and
checking size = 0 is more elegant than a block activation with
non-local return.

As a community that respects its user base, we need to consider the
*type* of damage that "Change it!" could cause to existing production
applications. These applications were built on the expressive
implementation, but now they'll be sending #do: to that SQLCollection
instead of #size, which will cause the database to retrieve a full
ResultSet, send it back to the client, just so it can answer #isEmpty.

Its this kind of "silent" damage that is the worst. Because its
invisible, not a an error, not a blow up. Just an degradation in
performance (ironic, given our motive here) which, believably, would
not be caught in testing.

I agree with Monty. Collection is abstract, there will never be an
instance of it. No self-respecting subclass needs this optimized, and
trying to optimize it way up there forces too many assumptions about
implementation. It makes the code less expressive while silently
inflicting potentially performance-killing pain onto legacy
applications until they're forced to sprinkle duplicate #isEmpty
implementations all about..

On Mon, Oct 24, 2016 at 3:31 AM, Bert Freudenberg <***@freudenbergs.de> wrote:
> On Monday, 24 October 2016, monty <***@programmer.net> wrote:
>>
>> Why not just make #size a subclassResponsibility or add abstract
>> superclasses for lazy or infinite collections that implement #isEmpty using
>> #do: and change #size to shouldNotImplement?
>
>
> Because #do: is the only method required to be overridden by Collection
> subclasses. All other methods are implemented in terms of #do:.
>
> So yes, if your Collection subclass has an optimized implementation for
> #isEmpty, then provide it. That is a small price to pay for making a class
> work optimally across different code bases.
>
> - Bert -
>
>
>

Chris Muller

2016-10-25 03:18:15 UTC

Since Smalltalk-80, sends to #do: have been considered expensive, and
sends to #size, inexpensive. And so it has been okay for applications
which defined their own Collection subclasses, to add a few
milliseconds of cost to something that's already considered expensive
(#do:), and not to something which is expected to be fast (#size).

** For 30 years applications were built on these assumptions. Even I
did it in Magma. #isEmpty was built on that assumption. Subclass
implementors have seemed to agree to these assumptions too, and made
their implementations true to them..

May we please reconsider this change?

On Mon, Oct 24, 2016 at 9:07 PM, Chris Muller <***@gmail.com> wrote:
> Normally its the Subclasses where implementation-specific
> optimizations are done, not in the top abstractions. At the level of
> Collection, the code should be elegant, expressive, and defining of
> the responsibilities by the messages it sends. #size is something of
> Collection, period, there is nothing wrong with sending it, and
> checking size = 0 is more elegant than a block activation with
> non-local return.
>
> As a community that respects its user base, we need to consider the
> *type* of damage that "Change it!" could cause to existing production
> applications. These applications were built on the expressive
> implementation, but now they'll be sending #do: to that SQLCollection
> instead of #size, which will cause the database to retrieve a full
> ResultSet, send it back to the client, just so it can answer #isEmpty.
>
> Its this kind of "silent" damage that is the worst. Because its
> invisible, not a an error, not a blow up. Just an degradation in
> performance (ironic, given our motive here) which, believably, would
> not be caught in testing.
>
> I agree with Monty. Collection is abstract, there will never be an
> instance of it. No self-respecting subclass needs this optimized, and
> trying to optimize it way up there forces too many assumptions about
> implementation. It makes the code less expressive while silently
> inflicting potentially performance-killing pain onto legacy
> applications until they're forced to sprinkle duplicate #isEmpty
> implementations all about..
>
> On Mon, Oct 24, 2016 at 3:31 AM, Bert Freudenberg <***@freudenbergs.de> wrote:
>> On Monday, 24 October 2016, monty <***@programmer.net> wrote:
>>>
>>> Why not just make #size a subclassResponsibility or add abstract
>>> superclasses for lazy or infinite collections that implement #isEmpty using
>>> #do: and change #size to shouldNotImplement?
>>
>>
>> Because #do: is the only method required to be overridden by Collection
>> subclasses. All other methods are implemented in terms of #do:.
>>
>> So yes, if your Collection subclass has an optimized implementation for
>> #isEmpty, then provide it. That is a small price to pay for making a class
>> work optimally across different code bases.
>>
>> - Bert -
>>
>>
>>

monty

2016-10-25 04:33:23 UTC

Except #do: isn't the only method required to be overridden. #add: and #remove:ifAbsent: are also abstract.

> Sent: Monday, October 24, 2016 at 4:31 AM
> From: "Bert Freudenberg" <***@freudenbergs.de>
> To: "The general-purpose Squeak developers list" <squeak-***@lists.squeakfoundation.org>
> Subject: Re: [squeak-dev] The defaullt implementation of isEmpty might do too much work
> On Monday, 24 October 2016, monty <***@programmer.net[mailto:***@programmer.net]> wrote:Why not just make #size a subclassResponsibility or add abstract superclasses for lazy or infinite collections that implement #isEmpty using #do: and change #size to shouldNotImplement?
>
> Because #do: is the only method required to be overridden by Collection subclasses. All other methods are implemented in terms of #do:.
>
> So yes, if your Collection subclass has an optimized implementation for #isEmpty, then provide it. That is a small price to pay for making a class work optimally across different code bases.
>
> - Bert -

Bert Freudenberg

2016-10-25 11:03:43 UTC

On Tue, Oct 25, 2016 at 6:33 AM, monty <***@programmer.net> wrote:

> Except #do: isn't the only method required to be overridden. #add: and
> #remove:ifAbsent: are also abstract.
>

Only for extensible subclasses. These *can not* be implemented in terms of
#do: so they have to be implemented in a subclass.

Collection provides a maximum of protocol with a minimum of methods
required to be overridden.

This discussion is besides the point though. I was just pointing out that
requiring #size or #isEmpty to be a subclass responsibility is not a good
idea, because they can and should be implemented in terms of #do:. Whether
#isEmpty should be implemented in terms of #size is a wholly different
issue.

Chris has some valid points about historical performance trade-offs which
are worth discussing.

My take is that if we don't know *anything* about the subclass, we have to
assume that #size can be way more expensive than a single iteration of #do:
(e.g. a linked list). So using #do: is the sensible default implementation
for #isEmpty. If OTOH a subclass has a very expensive #do:, then it's
reasonable to assume that it would avoid doing much work if it is in fact
empty.

If a Collection subclass did rely on an implementation detail of its
superclass performance-wise then this is unfortunate, but easy to fix by
implementing #isEmpty in terms of #size. We shouldn't let that get in the
way of making our base libraries more elegant.

- Bert -

Chris Muller

2016-10-25 19:45:41 UTC

> My take is that if we don't know *anything* about the subclass, we have to
> assume that #size can be way more expensive than a single iteration of #do:
> (e.g. a linked list).

I don't understand this. How could #size ever be "way more expensive"
than a single iteration of #do:? Based on the implementation in
Collection, that's impossible..

> So using #do: is the sensible default implementation
> for #isEmpty. If OTOH a subclass has a very expensive #do:, then it's
> reasonable to assume that it would avoid doing much work if it is in fact
> empty.
>
> If a Collection subclass did rely on an implementation detail of its
> superclass performance-wise then this is unfortunate,

Well that's precisely what this change does! We're trying to fix some
hypothetical subclass to "rely" on this implementation detail way up
in Collection..

> but easy to fix by
> implementing #isEmpty in terms of #size.

You ignored the "silent damage" argument, that someone wouldn't even
KNOW it needs fixing.

Besides, the fix should be made in the subclass which doesn't have a fast #size.

Folks there is NO BASIS at the level of Collection for assuming that
do: [ : each | ^
false ] is faster than ^self size=0. In fact, the proper assumption
is the opposite -- that #size is the optimized method, not #do:.

> We shouldn't let that get in the
> way of making our base libraries more elegant.

It makes them LESS elegant, because of 1) the violation of the
assumption that #size is faster than #do:, plus 2) the associated
duplication of code that will be required across multiple subclasses
just to recover that original faster implementation, which they used
to inherit. Doesn't that matter?

I don't think the burden of proof should be on the legacy method, but
on the new implementation which purports some improvement. This
change sounded good in the ivory towser, but can anyone identify *one
single place* in the real-world where this change is beneficial?

Because the detractors are potentially very real...

Tobias Pape

2016-10-25 20:03:31 UTC

On 25.10.2016, at 21:45, Chris Muller <***@gmail.com> wrote:

> Folks there is NO BASIS at the level of Collection for assuming that
> do: [ : each | ^
> false ] is faster than ^self size=0. In fact, the proper assumption
> is the opposite -- that #size is the optimized method, not #do:.

I don't understand on what _either_ performance-assumption is grounded.
Since I've apparently not been around since the early years of either
ST-80 or Squeak, can someone enlighten me here?

Both ways seem somewhat reasonable:
`self size = 0` is simple, easy to understand on first sight and said here to be more expensive than
`self do: [:ea | ^false]`, which is clever[1], possibly needs a double-take (or comment), but has a certain appeal of wholeness[2].

Best regards
-Tobias

[1]: sadly, cleverness in software typically is a drawback.
[2]: I mean, the minimalism that is possible by just using/providing #do: somehow conveys closure/completeness

tim Rowledge

2016-10-25 20:14:18 UTC

> On 25-10-2016, at 1:03 PM, Tobias Pape <***@gmx.de> wrote:
> [1]: sadly, cleverness in software typically is a drawback.

Important point here; since debugging typically requires more cleverness than writing, making the code as clever as possible almost guarantees it cannot be debugged.

tim
--
tim Rowledge; ***@rowledge.org; http://www.rowledge.org/tim
Useful random insult:- An 8086 in a StrongARM environment.

Eliot Miranda

2016-10-25 20:49:13 UTC

Hi Tobias, Hi All,

On Tue, Oct 25, 2016 at 1:03 PM, Tobias Pape <***@gmx.de> wrote:

>
> On 25.10.2016, at 21:45, Chris Muller <***@gmail.com> wrote:
>
> > Folks there is NO BASIS at the level of Collection for assuming that
> > do: [ : each | ^
> > false ] is faster than ^self size=0. In fact, the proper assumption
> > is the opposite -- that #size is the optimized method, not #do:.
>
> I don't understand on what _either_ performance-assumption is grounded.
> Since I've apparently not been around since the early years of either
> ST-80 or Squeak, can someone enlighten me here?
>
> Both ways seem somewhat reasonable:
> `self size = 0` is simple, easy to understand on first sight and said
> here to be more expensive than

`self do: [:ea | ^false]`, which is clever[1], possibly needs a
> double-take (or comment), but has a certain appeal of wholeness[2].
>

self size = 0 doesn't work for infinite collections. The suggested
implementation of isEmpty does indeed require a comment (I didn't implement
it) but it does work for infinite collections. The suggested implementation
os isEmpty is also faster than elf size = 0 for any collection of
sufficient size.

For example:

| b n |
b := Bag someInstance.
n := 100000.
{ b size. [1 to: n do: [:ign| b isEmptyOId]] timeToRun. [1 to: n do: [:ign|
b isEmpty]] timeToRun } #(20402 3293 21)

and the cut off is quite low:

| n |
n := 10000.
((1 to: 10) collect:
[:iguana|
(1 to: SmallInteger maxVal) detect:
[:i|
b := (1 to: i) asBag.
[1 to: n do: [:ign| b isEmptyOId]] timeToRun > [1 to: n do: [:ign| b
isEmpty]] timeToRun]]) average asFloat 4.9

So for collections that are 5 elements or more the "clever" implementation
is faster.

Best regards
> -Tobias
>
> [1]: sadly, cleverness in software typically is a drawback.
> [2]: I mean, the minimalism that is possible by just using/providing #do:
> somehow conveys closure/completeness
>

_,,,^..^,,,_
best, Eliot

Nicolas Cellier

2016-10-25 21:35:41 UTC

Hi, Chris, Monty, Tim,
did you see the implementation of Collection>>size?

size
"Answer how many elements the receiver contains."

| tally |
tally := 0.
self do: [:each | tally := tally + 1].
^ tally

Let's forget a minute about specific subclasses.
Does iterating over the whole collection is really the best idea for
implementing isEmpty?
When we implement a message in Collection, we should think in term of
Collection, not Interval, Set nor Array.

We're talking of several things here:
- universality of the default implementation:
we don't need to know about the size to tell if empty
not all collection have a known size
- optimization:
old implementation is fast for collections having an optimized size,
catastrophic for others.
new implementation is reasonably fast, and full optimization can be
restored with a few 1-liner.
- cleverness:
IMO this is not more clever than, and quite logically connected to
implementation of size

I agree that having several implementations for the sake of optimization is
not ideal.
That's what would have curbed my enthusiasm.
Though I doubt about any noticeable slowdown in anything else than
micro-benchmarks.

2016-10-25 22:49 GMT+02:00 Eliot Miranda <***@gmail.com>:

> Hi Tobias, Hi All,
>
> On Tue, Oct 25, 2016 at 1:03 PM, Tobias Pape <***@gmx.de> wrote:
>
>>
>> On 25.10.2016, at 21:45, Chris Muller <***@gmail.com> wrote:
>>
>> > Folks there is NO BASIS at the level of Collection for assuming that
>> > do: [ : each | ^
>> > false ] is faster than ^self size=0. In fact, the proper assumption
>> > is the opposite -- that #size is the optimized method, not #do:.
>>
>> I don't understand on what _either_ performance-assumption is grounded.
>> Since I've apparently not been around since the early years of either
>> ST-80 or Squeak, can someone enlighten me here?
>>
>> Both ways seem somewhat reasonable:
>> `self size = 0` is simple, easy to understand on first sight and said
>> here to be more expensive than
>
> `self do: [:ea | ^false]`, which is clever[1], possibly needs a
>> double-take (or comment), but has a certain appeal of wholeness[2].
>>
>
> self size = 0 doesn't work for infinite collections. The suggested
> implementation of isEmpty does indeed require a comment (I didn't implement
> it) but it does work for infinite collections. The suggested implementation
> os isEmpty is also faster than elf size = 0 for any collection of
> sufficient size.
>
> For example:
>
> | b n |
> b := Bag someInstance.
> n := 100000.
> { b size. [1 to: n do: [:ign| b isEmptyOId]] timeToRun. [1 to: n do:
> [:ign| b isEmpty]] timeToRun } #(20402 3293 21)
>
> and the cut off is quite low:
>
>
> | n |
> n := 10000.
> ((1 to: 10) collect:
> [:iguana|
> (1 to: SmallInteger maxVal) detect:
> [:i|
> b := (1 to: i) asBag.
> [1 to: n do: [:ign| b isEmptyOId]] timeToRun > [1 to: n do: [:ign| b
> isEmpty]] timeToRun]]) average asFloat 4.9
>
> So for collections that are 5 elements or more the "clever" implementation
> is faster.
>
> Best regards
>> -Tobias
>>
>> [1]: sadly, cleverness in software typically is a drawback.
>> [2]: I mean, the minimalism that is possible by just using/providing #do:
>> somehow conveys closure/completeness
>>
>
>
> _,,,^..^,,,_
> best, Eliot
>
>
>
>

Chris Muller

2016-10-26 04:40:29 UTC

Hi Nicolas, I understand the angle you are seeing this from, but these
we distill these points down, we end up with no real-world value; but
some unintended negatives, in fact..

> did you see the implementation of Collection>>size?
>
> size
> "Answer how many elements the receiver contains."
>
> | tally |
> tally := 0.
> self do: [:each | tally := tally + 1].
> ^ tally
>
> Let's forget a minute about specific subclasses.
> Does iterating over the whole collection is really the best idea for
> implementing isEmpty?

Asking this question is already assuming too much.
Collection>>#isEmpty should not be optimized in terms of another
method two sends away. Collection is the ivory tower which only
codifies the elegant definitions of the general behaviors. We should
let the subclasses be concerned with the nitty-gritty implementation
details needed to go crazy with their (less elegant) optimizations.
**We shouldn't try to make performance optimizations in Collection.

> When we implement a message in Collection, we should think in term of
> Collection, not Interval, Set nor Array.

The old implementation already was implemented in terms of Collection.

> We're talking of several things here:
> - universality of the default implementation:

The old implementation was universal. #size always has and always
will be part of Collection.

> we don't need to know about the size to tell if empty

Not any more than we need to avoid it. Not any more than we need to
initiate a #do: enumeration (which is very expensive for some).

> not all collection have a known size

So what will happen if they're sent #size then?

If #shouldNotImplement then they will override #isEmpty.

> - optimization:
> old implementation is fast for collections having an optimized size,
> catastrophic for others.

Those are more rare, they should override #isEmpty for their specific needs.

> new implementation is reasonably fast, and full optimization can be
> restored with a few 1-liner.

"Reasonably fast" for which subclass? We can't ignore the subclasses.

> - cleverness:
> IMO this is not more clever than, and quite logically connected to
> implementation of size
>
> I agree that having several implementations for the sake of optimization is
> not ideal.

It told me something was amiss with this change. Too many subclasses disagree.

> That's what would have curbed my enthusiasm.
> Though I doubt about any noticeable slowdown in anything else than
> micro-benchmarks.

The SQLCollection example would be a large degradation.
-----
It all distills down to slightly-negative. We should revert it.

Chris Muller

2016-10-25 21:54:52 UTC

Hi Tobias and Eliot,

>> > Folks there is NO BASIS at the level of Collection for assuming that
>> > do: [ : each | ^
>> > false ] is faster than ^self size=0. In fact, the proper assumption
>> > is the opposite -- that #size is the optimized method, not #do:.
>>
>> I don't understand on what _either_ performance-assumption is grounded.

Exactly. The original basis for making this change is groundless.

> self size = 0 doesn't work for infinite collections.

What's an infinite collection, and why wouldn't self size = 0 work for
its #isEmpty? I assume it will need to override #size, so if it
answers Float infinity then #isEmpty is false. What's not working?
And, irregardless, custom collections like that define what THEY need
to work properly (e.g., #size, #isEmpty, whatever) in their own class,
not going changing 30-year old base library code to make themselves
work..

> The suggested implementation of isEmpty does indeed require a comment (I didn't implement it) but it does work for infinite collections. The suggested implementation os isEmpty is also faster than elf size = 0 for any collection of sufficient size.
>
> For example:
>
> | b n |
> b := Bag someInstance.
> n := 100000.
> { b size. [1 to: n do: [:ign| b isEmptyOId]] timeToRun. [1 to: n do: [:ign| b isEmpty]] timeToRun } #(20402 3293 21)
>
> and the cut off is quite low:
>
>
> | n |
> n := 10000.
> ((1 to: 10) collect:
> [:iguana|
> (1 to: SmallInteger maxVal) detect:
> [:i|
> b := (1 to: i) asBag.
> [1 to: n do: [:ign| b isEmptyOId]] timeToRun > [1 to: n do: [:ign| b isEmpty]] timeToRun]]) average asFloat 4.9
>
> So for collections that are 5 elements or more the "clever" implementation is faster.

Then add Bag>>#isEmpty, don't change Collection>>#isEmpty, because it
changes a 30-year old assumption about how #isEmpty works for every
other kind of Collection.

Tobias Pape

2016-10-25 23:09:39 UTC

On 25.10.2016, at 23:54, Chris Muller <***@gmail.com> wrote:

> Hi Tobias and Eliot,
>
>>>> Folks there is NO BASIS at the level of Collection for assuming that
>>>> do: [ : each | ^
>>>> false ] is faster than ^self size=0. In fact, the proper assumption
>>>> is the opposite -- that #size is the optimized method, not #do:.
>>>
>>> I don't understand on what _either_ performance-assumption is grounded.
>
> Exactly. The original basis for making this change is groundless.

No Chris, I said I don't understand _both_ assumptions.

>
>> self size = 0 doesn't work for infinite collections.
>
> What's an infinite collection,

For example, a generator…
or a right-open interval?

[...]
>> ts or more the "clever" implementation is faster.
>
> Then add Bag>>#isEmpty, don't change Collection>>#isEmpty, because it
> changes a 30-year old assumption about how #isEmpty works for every
> other kind of Collection.

Kent Beck gives this example for 'Simple Delegation':

Object subclass: #Vector
instanceVariableNames: ‘elements’

Vector>>do: aBlock
elements do: aBlock

It seems the #do:-based implementation is 'more correct' here.

I found no other recommendation in either Beck's or Thomas' style guides.
To _me_ this suggests that what you perceive as 30-year-old-assumption
is not as universal as '#do: should suffice'.

Best regards
-Tobias

David T. Lewis

2016-10-25 23:42:55 UTC

On Sat, Oct 22, 2016 at 12:59:48AM +0200, monty wrote:
> All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
>
> Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.

I like Eliot's new implementation, in part because it teaches the reader something
about how iteration works. It is still simple enough for an inexperienced person
to read, but it also challenges the reader to think about the control flow.

That said, I suspect that Monty and Chris are among the people with the most
real-world experience in dealing with performance on very large collections.

So, I would like to ask from a strictly practical point of view, does the change
that we are discussing here cause real-world problems for Gemstone and Magma
applications?

Does the change make the maintenance of large collection classes more difficult
for cross-dialect portability?

Are there any collection classes (especially in Gemstone and Magma) that become
slower in a real-world measurable sense as a result of this change?

Dave

John Pfersich

2016-10-26 02:17:56 UTC

+1.

Sent from my iPhone

> On Oct 25, 2016, at 16:42, David T. Lewis <***@mail.msen.com> wrote:
>
>> On Sat, Oct 22, 2016 at 12:59:48AM +0200, monty wrote:
>> All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
>>
>> Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
>
> I like Eliot's new implementation, in part because it teaches the reader something
> about how iteration works. It is still simple enough for an inexperienced person
> to read, but it also challenges the reader to think about the control flow.
>
> That said, I suspect that Monty and Chris are among the people with the most
> real-world experience in dealing with performance on very large collections.
>
> So, I would like to ask from a strictly practical point of view, does the change
> that we are discussing here cause real-world problems for Gemstone and Magma
> applications?
>
> Does the change make the maintenance of large collection classes more difficult
> for cross-dialect portability?
>
> Are there any collection classes (especially in Gemstone and Magma) that become
> slower in a real-world measurable sense as a result of this change?
>
> Dave
>
>

Chris Muller

2016-10-26 04:42:24 UTC

I agreed that it looked good in the ivory tower. The thing is, there
are these real-world detractors which we should not igore. To save my
old fingers, let me just summarize them from the earlier posts. In
summary:

- it provides no known material benefit
- it could affect unsuspecting legacy application performance
materially, and in an insidious way
- it's an anomalous usage of the API which works against performance
expectations about #size and #do:.
- the justification for this change (performance optimization) in
Collection is not valid in the first place.

No real-world positives, just negatives or zeros. Please revert.

On Tue, Oct 25, 2016 at 6:42 PM, David T. Lewis <***@mail.msen.com> wrote:
> On Sat, Oct 22, 2016 at 12:59:48AM +0200, monty wrote:
>> All non-trivial collections implement #size or inherit a custom implementation, usually just as a return of an inst var. Your #do:-based one is 3x as slow in my tests, so you've now made #isEmpty slower for every collection that implements #size just to benefit ones whose careless authors didn't and so the implementors of lazy collections have one fewer message to override.
>>
>> Do not do this. People already avoid #ifEmpty: and related messages where performance matters and shouldn't have to avoid #isEmpty too.
>
> I like Eliot's new implementation, in part because it teaches the reader something
> about how iteration works. It is still simple enough for an inexperienced person
> to read, but it also challenges the reader to think about the control flow.
>
> That said, I suspect that Monty and Chris are among the people with the most
> real-world experience in dealing with performance on very large collections.
>
> So, I would like to ask from a strictly practical point of view, does the change
> that we are discussing here cause real-world problems for Gemstone and Magma
> applications?
>
> Does the change make the maintenance of large collection classes more difficult
> for cross-dialect portability?
>
> Are there any collection classes (especially in Gemstone and Magma) that become
> slower in a real-world measurable sense as a result of this change?
>
> Dave
>
>

Bert Freudenberg

2016-10-26 16:35:42 UTC

On Wed, Oct 26, 2016 at 6:42 AM, Chris Muller <***@gmail.com> wrote:

> I agreed that it looked good in the ivory tower. The thing is, there
> are these real-world detractors which we should not igore. To save my
> old fingers, let me just summarize them from the earlier posts. In
> summary:
>
> - it provides no known material benefit
>

Having an O(1) runtime vs O(n) is a very large material benefit.
Collection>>size is O(n) so it's not a good base for #isEmpty.

> - it could affect unsuspecting legacy application performance
> materially, and in an insidious way
>

If you have performance problems, profile.

> - it's an anomalous usage of the API which works against performance
> expectations about #size and #do:.
>

The performance expectation of both #size and #do: in Collection is O(n).
And there is nothing "anomalous" in using #do:, quite to the contrary, #do:
is the *essence* of a collection of objects.

> - the justification for this change (performance optimization) in
> Collection is not valid in the first place.
>

It is very valid optimization given the implementation of Collection>>size.

> No real-world positives, just negatives or zeros. Please revert.
>

Quite the opposite. It's a major improvement in basic API.

Yes, you will have to provide an optimized #isEmpty now, because you relied
on an implementation detail of a superclass. But there never was a
contract that #isEmpty needs to be implemented in terms of #size. The new
implementation makes much more sense for an arbitrary abstract Collection,
so I'd like to see it stay.

- Bert -

Frank Shearar

2016-10-26 17:42:19 UTC

On 26 October 2016 at 09:35, Bert Freudenberg <***@freudenbergs.de> wrote:

> On Wed, Oct 26, 2016 at 6:42 AM, Chris Muller <***@gmail.com> wrote:
>
>> I agreed that it looked good in the ivory tower. The thing is, there
>> are these real-world detractors which we should not igore. To save my
>> old fingers, let me just summarize them from the earlier posts. In
>> summary:
>>
>> - it provides no known material benefit
>>
>
> Having an O(1) runtime vs O(n) is a very large material benefit.
> Collection>>size is O(n) so it's not a good base for #isEmpty.
>
>
>> - it could affect unsuspecting legacy application performance
>> materially, and in an insidious way
>>
>
> If you have performance problems, profile.
>
>
>> - it's an anomalous usage of the API which works against performance
>> expectations about #size and #do:.
>>
>
> The performance expectation of both #size and #do: in Collection is O(n).
> And there is nothing "anomalous" in using #do:, quite to the contrary, #do:
> is the *essence* of a collection of objects.
>
>
>> - the justification for this change (performance optimization) in
>> Collection is not valid in the first place.
>>
>
> It is very valid optimization given the implementation of Collection>>size.
>
>
>> No real-world positives, just negatives or zeros. Please revert.
>>
>
> Quite the opposite. It's a major improvement in basic API.
>
> Yes, you will have to provide an optimized #isEmpty now, because you
> relied on an implementation detail of a superclass. But there never was a
> contract that #isEmpty needs to be implemented in terms of #size. The new
> implementation makes much more sense for an arbitrary abstract Collection,
> so I'd like to see it stay.
>

+1. Especially since we've seen examples of Collections where #size just
doesn't make sense: generators, Actor-style mailboxes, Xtreams (and, by
extension, all reactive APIs).

Which is exactly why in C# an IEnumerable (a Collection) doesn't even have
a Count method. (It's added as an extension, and People Know(tm) to be
careful using it on an abstract collection.)

Really, to be properly Ivory Tower, and Elegant, Collection's #size should
be removed. And as Eliot correctly points out, the only elegant way to ask
a Collection about which you know nothing, whether it has any elements, is
to _evaluate its first value_.

I can't give real world Smalltalk experience reports, but I can say, with
many burn marks to prove it, that asking an abstract collection of things
for its size, is a terrible idea.

frank

> - Bert -
>
>
>
>
>

Chris Muller

2016-10-26 18:41:28 UTC

I don't wish to belabor, but you completely ignored all of my (and
Monty's) arguments. Yes, there *is* a sort-of "contract" that
#isEmpty should be implemented in terms of #size, and not #do:, that
is now being violated. I explained why earlier and you even said it
was a point worth discussing, but never did. The comment that was
added to Collection>>#isEmpty advertises the flawed thinking.

The flaws with all of your other statements were addressed in prior
posts, too. I don't know why any of the advocates for this change
couldn't address or even acknowledge those arguments. Even the
comment that was added to Collection>>#isEmpty advertises the flawed
thinking. I hope someone will go back and really _read_ the arguments
being made in this thread.

Just saw Franks note. Same thing.. :(

On Wed, Oct 26, 2016 at 11:35 AM, Bert Freudenberg <***@freudenbergs.de> wrote:
> On Wed, Oct 26, 2016 at 6:42 AM, Chris Muller <***@gmail.com> wrote:
>>
>> I agreed that it looked good in the ivory tower. The thing is, there
>> are these real-world detractors which we should not igore. To save my
>> old fingers, let me just summarize them from the earlier posts. In
>> summary:
>>
>> - it provides no known material benefit
>
>
> Having an O(1) runtime vs O(n) is a very large material benefit.
> Collection>>size is O(n) so it's not a good base for #isEmpty.
>
>>
>> - it could affect unsuspecting legacy application performance
>> materially, and in an insidious way
>
>
> If you have performance problems, profile.
>
>>
>> - it's an anomalous usage of the API which works against performance
>> expectations about #size and #do:.
>
>
> The performance expectation of both #size and #do: in Collection is O(n).
> And there is nothing "anomalous" in using #do:, quite to the contrary, #do:
> is the *essence* of a collection of objects.
>
>>
>> - the justification for this change (performance optimization) in
>> Collection is not valid in the first place.
>
>
> It is very valid optimization given the implementation of Collection>>size.
>
>>
>> No real-world positives, just negatives or zeros. Please revert.
>
>
> Quite the opposite. It's a major improvement in basic API.
>
> Yes, you will have to provide an optimized #isEmpty now, because you relied
> on an implementation detail of a superclass. But there never was a contract
> that #isEmpty needs to be implemented in terms of #size. The new
> implementation makes much more sense for an arbitrary abstract Collection,
> so I'd like to see it stay.
>
> - Bert -
>
>
>
>

Eliot Miranda

2016-10-26 19:21:15 UTC

On Wed, Oct 26, 2016 at 11:41 AM, Chris Muller <***@gmail.com> wrote:

> I don't wish to belabor, but you completely ignored all of my (and
> Monty's) arguments. Yes, there *is* a sort-of "contract" that
> #isEmpty should be implemented in terms of #size, and not #do:, that
> is now being violated. I explained why earlier and you even said it
> was a point worth discussing, but never did. The comment that was
> added to Collection>>#isEmpty advertises the flawed thinking.
>

I don't see any such contract. Collection has a contract; to be fully
functional as a collection class the subclass must implement #do:. This is
stated; #do: is a subclass responsibility of Collection. I see no stated
contract that #isEmpty is implemented in terms of #size. Given that in
Collection #size is implemented in terms of #do:, we are free to implement
#isEmpty et al in terms either of #do: or of #size. The new implementation
is better for large collections, works for infinite collections, and is
hence to be preferred.

>
> The flaws with all of your other statements were addressed in prior
> posts, too. I don't know why any of the advocates for this change
> couldn't address or even acknowledge those arguments. Even the
> comment that was added to Collection>>#isEmpty advertises the flawed
> thinking. I hope someone will go back and really _read_ the arguments
> being made in this thread.
>

I don't see what's special here. We've given arguments for the change;
both Bert and Frank (effectively, in my view) refuted your arguments
against the change. This is no different than other performance-related
changes that have been made in the past. You have yet to present an
example of code that is broken ny this change.

Just saw Franks note. Same thing.. :(
>

Looked pretty cogent and on point to me.

>
>
>
> On Wed, Oct 26, 2016 at 11:35 AM, Bert Freudenberg <***@freudenbergs.de>
> wrote:
> > On Wed, Oct 26, 2016 at 6:42 AM, Chris Muller <***@gmail.com>
> wrote:
> >>
> >> I agreed that it looked good in the ivory tower. The thing is, there
> >> are these real-world detractors which we should not igore. To save my
> >> old fingers, let me just summarize them from the earlier posts. In
> >> summary:
> >>
> >> - it provides no known material benefit
> >
> >
> > Having an O(1) runtime vs O(n) is a very large material benefit.
> > Collection>>size is O(n) so it's not a good base for #isEmpty.
> >
> >>
> >> - it could affect unsuspecting legacy application performance
> >> materially, and in an insidious way
> >
> >
> > If you have performance problems, profile.
> >
> >>
> >> - it's an anomalous usage of the API which works against performance
> >> expectations about #size and #do:.
> >
> >
> > The performance expectation of both #size and #do: in Collection is O(n).
> > And there is nothing "anomalous" in using #do:, quite to the contrary,
> #do:
> > is the *essence* of a collection of objects.
> >
> >>
> >> - the justification for this change (performance optimization) in
> >> Collection is not valid in the first place.
> >
> >
> > It is very valid optimization given the implementation of
> Collection>>size.
> >
> >>
> >> No real-world positives, just negatives or zeros. Please revert.
> >
> >
> > Quite the opposite. It's a major improvement in basic API.
> >
> > Yes, you will have to provide an optimized #isEmpty now, because you
> relied
> > on an implementation detail of a superclass. But there never was a
> contract
> > that #isEmpty needs to be implemented in terms of #size. The new
> > implementation makes much more sense for an arbitrary abstract
> Collection,
> > so I'd like to see it stay.
> >
> > - Bert -
> >
> >
> >
> >
>
>

--
_,,,^..^,,,_
best, Eliot

Chris Muller

2016-10-26 22:51:40 UTC

Hi Eliot, I put a lot of effort to convey these arguments for the
betterment of Squeak, I hope you'll oblige this last effort with an
open and critical mind..

> I don't see any such contract. Collection has a contract; to be fully
> functional as a collection class the subclass must implement #do:. This is
> stated; #do: is a subclass responsibility of Collection. I see no stated
> contract that #isEmpty is implemented in terms of #size.

I said a "sort-of 'contract'", because the problem with this change is
more about going against the universal reality of the Smalltalk
environment and ecosystem, not against an explicit API contract. I
made an explanation of this issue in my earlier post which began with
"Since Smalltalk-80, sends to #do: ..."

> Given that in
> Collection #size is implemented in terms of #do:, we are free to implement
> #isEmpty et al in terms either of #do: or of #size. The new implementation
> is better for large collections, works for infinite collections, and is
> hence to be preferred.

If you understood that earlier post, then you understand why I believe
we are NOT free to change Collection>>#isEmpty to operate in terms
#do:, because we will have surreptitiously changed legacy applications
which incur a much heavier cost with the #do: implementation. I
provided a very plausible example of this (the SQLCollection) which
should not be ignored.

Now, I guess this change *already* did even slow down #isEmpty for
some existing classes in the image(!!), to which Nicolas reponded to
by copying-and-pasting the old implementation code into multiple
classes in order to recover their original performance. Wow.

In exchange for these blemishes, how many classes actually got benefit
from inheriting the new implementation?

Collection allSubclasses count: [ : each | (each
lookupSelector: #size) methodClass = Collection] "0"

So we're obviously not talking about benefit to any *in-image* kind of
Collection, maybe just potential benefit to some... external
application's Collection? But while hurting others..? Have we
reached a point of enough diminishing returns yet?

The reality is that #size is not only part of Smalltalk's original
concept of a Collection since 30 years ago, but is universally
expected to be fast, and Smalltalkers write there code under that
assumption. That's why the responsibility should be on those unknown
*exceptional* subclasses, the InfiniteCollection's or whatever, to
deny their #size and provide their alternate #isEmpty's.

Tallying it all up, we've got (-1) potential legacy impacts, (-1)
all-new copy-and-pasted code in our core library, just so (+0) some
exceptional-case external class MIGHT inherit this faster-for-them
implementaiton rather than override it.

I totally understand and agree with the ivory-tower arguments made in
favor of this change, its just that in the practical world, it turns
out to be a net negative.

Best,
Chris

David T. Lewis

2016-10-27 00:46:56 UTC

On Wed, Oct 26, 2016 at 05:51:40PM -0500, Chris Muller wrote:
>
> In exchange for these blemishes, how many classes actually got benefit
> from inheriting the new implementation?
>
> Collection allSubclasses count: [ : each | (each
> lookupSelector: #size) methodClass = Collection] "0"
>

Fair point. Every subclass of Collection reimplements #size in some way.
Collection>>size is never actually used by any class in the image, and
possibly not in any classes outside the image. So Collection>>size serves
as documentation of the intent, and as a default implementation if someone
were to implement a new subclass of Collection directly.

Some subclasses (e.g. Bag) would have similar characteristics and would presumably
benefit from the new #isEmpty implementation, but most seem to have #size
implementations that do not degrade in performance as size increases.

So Chris is raising a fair question. The new #isEmpty looks like it should be
faster, but is it actually so in practice?

Dave

p.s. I personally still prefer the new implementation on aesthetic grounds.

Bert Freudenberg

2016-10-27 12:00:59 UTC

On Thu, Oct 27, 2016 at 12:51 AM, Chris Muller <***@gmail.com> wrote:

>
> In exchange for these blemishes, how many classes actually got benefit
> from inheriting the new implementation?
>

| b |
b := (1 to: 10000) asBag.
[b isEmpty] bench

before: : '5,540 per second. 181 microseconds per run.'
now: '167,000 per second. 6 microseconds per run.'

In my book that's a speedup worth having.

- Bert -

Chris Muller

2016-10-27 16:50:51 UTC

> | b |
> b := (1 to: 10000) asBag.
> [b isEmpty] bench
>
> before: : '5,540 per second. 181 microseconds per run.'
> now: '167,000 per second. 6 microseconds per run.'
>
> In my book that's a speedup worth having.

You're depending on the superclass implementation in Collection for
that speedup. What did you say about that yesterday?

Honestly, Bert, it almost feels like you're being intellectually
dishonest. You chose the only one that got sped up and ignored all
the ones that got slowed down. AND, you also ignored all the other
points (again) which render the above point irrelevant. Sad.

I give up. You "win".

H. Hirzel

2016-10-27 17:20:15 UTC

Yes, in this case a series of tests across collection classes actually
would be a proof. A single point measurement is not a statistically
relevant result.

--Hannes

On 10/27/16, Chris Muller <***@gmail.com> wrote:
>> | b |
>> b := (1 to: 10000) asBag.
>> [b isEmpty] bench
>>
>> before: : '5,540 per second. 181 microseconds per run.'
>> now: '167,000 per second. 6 microseconds per run.'
>>
>> In my book that's a speedup worth having.
>
> You're depending on the superclass implementation in Collection for
> that speedup. What did you say about that yesterday?
>
> Honestly, Bert, it almost feels like you're being intellectually
> dishonest. You chose the only one that got sped up and ignored all
> the ones that got slowed down. AND, you also ignored all the other
> points (again) which render the above point irrelevant. Sad.
>
> I give up. You "win".
>
>

H. Hirzel

2016-10-27 19:51:42 UTC

Or to put it diffently: Bert supplied the first contribution in this
thread which is about performance actually measuring performance......

More is welcome.

On 10/27/16, H. Hirzel <***@gmail.com> wrote:
> Yes, in this case a series of tests across collection classes actually
> would be a proof. A single point measurement is not a statistically
> relevant result.
>
> --Hannes
>
> On 10/27/16, Chris Muller <***@gmail.com> wrote:
>>> | b |
>>> b := (1 to: 10000) asBag.
>>> [b isEmpty] bench
>>>
>>> before: : '5,540 per second. 181 microseconds per run.'
>>> now: '167,000 per second. 6 microseconds per run.'
>>>
>>> In my book that's a speedup worth having.
>>
>> You're depending on the superclass implementation in Collection for
>> that speedup. What did you say about that yesterday?
>>
>> Honestly, Bert, it almost feels like you're being intellectually
>> dishonest. You chose the only one that got sped up and ignored all
>> the ones that got slowed down. AND, you also ignored all the other
>> points (again) which render the above point irrelevant. Sad.
>>
>> I give up. You "win".
>>
>>
>

monty

2016-10-27 23:23:28 UTC

Or just make Collection>>#size a subclassResponsibiliity!

This will force implementors to give proper #size implementations that are usually O(1), which almost all do already. Bag could get the #do:-based #isEmpty as an optimization, or we could add a dedicated 'tally' inst var like other collections have and its #size could return that, making the default #size-based #isEmpty OK for it.

This is a compromise that addresses the concerns about the default Collection>>#isEmpty degrading linearly without a constant #size and the concerns about harming the performance of existing libraries and applications with custom collections that assume Collection>>#isEmpty is #size-based.

Please consider it.

> From: "Bert Freudenberg" <***@freudenbergs.de>
> To: "Chris Muller" <***@gmail.com>, "The general-purpose Squeak developers list" <squeak-***@lists.squeakfoundation.org>
> Subject: Re: [squeak-dev] Re: The defaullt implementation of isEmpty might do too much work
>
> On Thu, Oct 27, 2016 at 12:51 AM, Chris Muller <***@gmail.com[mailto:***@gmail.com]> wrote:
> In exchange for these blemishes, how many classes actually got benefit
> from inheriting the new implementation?
>
> | b |
> b := (1 to: 10000) asBag.
> [b isEmpty] bench
>
> before: : '5,540 per second. 181 microseconds per run.'
> now: '167,000 per second. 6 microseconds per run.'
> In my book that's a speedup worth having.
>

Stéphane Rollandin

2016-10-27 14:13:08 UTC

> there never was
> a contract that #isEmpty needs to be implemented in terms of #size. The
> new implementation makes much more sense for an arbitrary abstract
> Collection, so I'd like to see it stay.
>
> - Bert -

+1

Stef

36 Replies
2 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Nicolas Cellier 2016-10-20 16:40:26 UTC

Eliot Miranda 2016-10-20 17:26:51 UTC

Stéphane Rollandin 2016-10-21 09:50:50 UTC

monty 2016-10-21 22:59:48 UTC

Eliot Miranda 2016-10-22 03:26:54 UTC

monty 2016-10-22 04:48:08 UTC

Nicolas Cellier 2016-10-22 12:42:43 UTC

monty 2016-10-24 04:36:46 UTC

Bert Freudenberg 2016-10-24 08:31:22 UTC

Eliot Miranda 2016-10-24 18:56:17 UTC

Chris Muller 2016-10-25 02:07:39 UTC

Chris Muller 2016-10-25 03:18:15 UTC

monty 2016-10-25 04:33:23 UTC

Bert Freudenberg 2016-10-25 11:03:43 UTC

Chris Muller 2016-10-25 19:45:41 UTC

Tobias Pape 2016-10-25 20:03:31 UTC

tim Rowledge 2016-10-25 20:14:18 UTC

Eliot Miranda 2016-10-25 20:49:13 UTC

Nicolas Cellier 2016-10-25 21:35:41 UTC

Chris Muller 2016-10-26 04:40:29 UTC

Chris Muller 2016-10-25 21:54:52 UTC

Tobias Pape 2016-10-25 23:09:39 UTC

David T. Lewis 2016-10-25 23:42:55 UTC

John Pfersich 2016-10-26 02:17:56 UTC

Chris Muller 2016-10-26 04:42:24 UTC

Bert Freudenberg 2016-10-26 16:35:42 UTC

Frank Shearar 2016-10-26 17:42:19 UTC

Chris Muller 2016-10-26 18:41:28 UTC

Eliot Miranda 2016-10-26 19:21:15 UTC

Chris Muller 2016-10-26 22:51:40 UTC

David T. Lewis 2016-10-27 00:46:56 UTC

Bert Freudenberg 2016-10-27 12:00:59 UTC

Chris Muller 2016-10-27 16:50:51 UTC

H. Hirzel 2016-10-27 17:20:15 UTC

H. Hirzel 2016-10-27 19:51:42 UTC

monty 2016-10-27 23:23:28 UTC

Stéphane Rollandin 2016-10-27 14:13:08 UTC

about - legalese

Loading...