Discussion:
Python IS slow ! [was] Re: Python too slow for real world
Marko Schulz
1999-04-30 12:40:00 UTC
Permalink
Python-1.5.2c1 ....
0.52 seconds
Squeak ....
0.13 seconds.
I wouldn't put too much meaning into this number. You can't say how big
the startup costs are for Python vs. Squeak. You mentioned garbage
collection. There are other factors that might make other (longer
running?) examples come out very different.

Still, it would be nice if Python were faster.
--
marko schulz
"Die Welt ist gut, die Welt ist schlecht.
Ich seh' mehr als ich begreifen kann. Ich sehe in 3-D."
'3-D', Extrabreit
Moshe Zadka
1999-04-29 17:32:05 UTC
Permalink
The reason it's slow is its runtime model. _Every_ function call requires
a lookup in a hash table, just on the off-chance that the programmer
changed the meaning of the function.
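That per-call dynamism is easy to demonstrate; a minimal sketch (the function names here are mine, for illustration only):

```python
def greet():
    return "hello"

def call_greet():
    # 'greet' is looked up in the module's dictionary at call time,
    # so every invocation pays for a hash-table lookup
    return greet()

first = call_greet()       # resolves to the original greet

def greet():               # rebinding the name changes future calls
    return "goodbye"

second = call_greet()      # now resolves to the new greet
```

This is exactly the flexibility the interpreter cannot optimize away without proving the name is never rebound.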
A batch-mode optimizer analyzing an entire file (module) should be able to
detect whether or not function names are rebound.
Or a human optimizer might do something like

(old code)
for i in a:
    f(i)

(new code)
lf = f
for i in a:
    lf(i)

Python (I believe) optimizes local variable lookups.
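It does: locals are compiled to fast indexed slots, which is why the hand-hoisting above pays off. The same trick in runnable form (function names are mine, not from the thread):

```python
def f(x):
    return x + 1

def loop_global(n):
    total = 0
    for i in range(n):
        total = f(total)       # global name lookup on every iteration
    return total

def loop_local(n):
    lf = f                     # hoist the lookup into a local, once
    total = 0
    for i in range(n):
        total = lf(total)      # fast local slot access inside the loop
    return total
```

Both return the same result; the second only does the dictionary lookup of `f` a single time.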
--
Moshe Zadka <mzadka at geocities.com>.
QOTD: My own exit is more likely to be horizontal than perpendicular.
William Tanksley
1999-04-28 17:54:02 UTC
Permalink
[deletia]
Brian> Arne,
Python would be appropriate for many more problems if only it were as fast
as other scripting languages. The bytecode interpreter IS significantly slower
than other bytecode-interpreted languages. Since we all know that Python
is more productive than most other languages, this sooner or later becomes an
issue, because one would not be able to do some tasks in Python: it is just
too slow.
Your data is correct (Python is slow for many things, slower than it needs
to be), but your conclusion is wrong. Python isn't slow because its
bytecode engine is slow; actually, although there's a lot of room for
improvement, its bytecode engine is faster than many of the others out
there.

The reason it's slow is its runtime model. _Every_ function call requires
a lookup in a hash table, just on the off-chance that the programmer
changed the meaning of the function.
It seems to me that without a redesign of at least the bytecode for
function calls, Python's speed will not take off.
Bytecode won't help enough -- the whole calling model needs to be
examined. Fortunately, that's one of the things the 2.0 design process
will be looking at. Like you, I hope that they consider Smalltalk as an
example.

And Oberon (SlimBinaries), and Eiffel (typing and general compile-time
error catching), and ...
Markus
--
-William "Billy" Tanksley
"But you shall not escape my iambics."
-- Gaius Valerius Catullus
William Tanksley
1999-04-29 23:06:15 UTC
Permalink
Post by William Tanksley
And Oberon (SlimBinaries), and Eiffel (typing and general
compile-time error catching), and ...
Actually, isn't Eiffel's type system famous for being full of
holes?
William> I'm familiar with the covariance/contravariance argument,
William> but I've never before heard anyone say anything about
William> Eiffel being full of holes. What problems have you heard
William> of?
As far as I remember, Eiffel requires a global system analysis to be type
safe at compile time.
According to Meyer, that would be the most permissive thing to do. He
also suggested an option, the rule about "no polymorphic CATcalls", which
would not require such intensive processing and would allow most of the
same things.

All in all, though, after having read over Meyer's materials very
carefully, I have to say that I don't like his justifications. I just
looked at the Sather site (Sather is an Eiffel-inspired language which
uses contravariance), and they explain how to solve the problem he was
demonstrating (in a MUCH more logical and reusable way!).

Since Sather is itself a subclass of Eiffel, I don't think I'm violating
any typing rules by claiming that imitating Sather is similar to taking
things from Eiffel ;-).

Sather itself offers some interesting options for inheritance. For
example (let me translate to Python):

class Gizmo(Object):

inherits the interface of Object, but _none_ of its implementation. If
you want the same member functions as Object had, you have to do this:

class Gizmo(Object):
    import Object
    from Object import new, del, etc

This means that classes can behave like modules (because they bring
functions into their namespace the same way)... And logically, vice
versa.

Of course, "inheriting an interface" is only meaningful with static
typing. Notice that this also addresses the issue of interfaces -- every
class is an interface.

The disturbing thing, of course, is that this is totally incompatible with
the current way of doing things. That's easy to overcome, though:

class Gizmo(Object, <Doodad>):
    from Doodad import *
Regardless, wouldn't a better source of inspiration on typing
be a language with *optional* compile-time error checking?
William> Good question. I don't know the answer, but I can point
William> out the dangers of not planning our path.
[the dangers of C++ and other loose languages -- mixed measures considered
harmful]
William> The other dangerous side, of course, is being one of the
William> languages which are so safe you can't use them without a
William> huge manual and pages of UML. Python can NOT wind up
William> like this, regardless of the dangers obvious in the other
William> side.
Agreed. The main problem with today's statically typed languages is that
they are too restrictive and too complicated.
I'm not sure that they're _too_ restrictive; they work in their domains.
The problem is that we don't want to lose Python's greatest strength:
it's friendly.
William> Let me expand a little on the dangers of C++.
William> - virtual functions. It shouldn't be my job to tell the
William> compiler when virtual lookups are the only thing to do;
William> the compiler ought to be able to tell. For performance
William> gurus, a way to hint to the compiler that virtual lookups
William> were _not_ needed might be useful -- but what if the
William> programmer was wrong?
True, SmallEiffel inlines most of the virtual function calls automatically.
This is the way to go.
Inlines virtual functions? I think you got your wires crossed ;-).
Virtual functions, by definition, can't be inlined. Inlining is where you
place the routine's source code directly into the code of the calling
routine.
William> - GC -- hard to add after the fact. No worry with
William> Python, but what if there are other issues like it?
I'm not sure if I understand this sentence. Do you mean GC is hard to add
to Python without breaking existing code? I would agree with that.
And why do you say "no worry"? Do you mean GC is not worth the effort?
Guido seems to think so. Anyone who disagrees is free to contribute and
work on porting GC code (I would like to see it).

I think that worrying about the other issues is FAR more rewarding -- RC
is not only hard to replace with GC, but the rewards for doing so are
relatively small (in comparison to the other things we can think about).

I don't want to stop anyone from trying, though. The nice thing about
free software is that even if everyone disagrees with you or thinks it's
not worth it you can still get it done.

So what other issues are worth thinking about? I don't know.
You may have a look at Strongtalk, a Smalltalk with static (optional)
type checking.
Found it using Google.com (a GOOD search engine for technical matters).
http://java.sun.com/people/gbracha/nwst.html

I'll be spending some time reading it. My initial impression: _very_
good. This would fit well, I think.
Markus
--
-William "Billy" Tanksley
"But you shall not escape my iambics."
-- Gaius Valerius Catullus
Markus Kohler
1999-04-30 07:13:53 UTC
Permalink
[deletia]
Agreed. The main problem with today's statically typed languages
is that they are too restrictive and too complicated.
William> I'm not sure that they're _too_ restrictive; they work in
William> their domains. The problem is that we don't want to
William> lose Python's greatest strength: it's friendly.

William> Let me expand a little on the dangers of C++.

William> - virtual functions. It shouldn't be my job to tell the
William> compiler when virtual lookups are the only thing to do;
William> the compiler ought to be able to tell. For performance
William> gurus, a way to hint to the compiler that virtual lookups
William> were _not_ needed might be useful -- but what if the
William> programmer was wrong?
True, SmallEiffel inlines most of the virtual function calls
automatically. This is the way to go.
William> Inlines virtual functions? I think you got your wires
William> crossed ;-). Virtual functions, by definition, can't be
William> inlined. Inlining is where you place the routine's
William> source code directly into the code of the calling
William> routine.

No, I'm serious. Even if most C++ compilers cannot inline virtual functions,
that doesn't mean it is not possible ;-)

A virtual function call can be inlined if you know all subtypes of a type:

Assume you have a class "A" and classes "AA" and "AB" that are subclasses of
A.

Given

A thing;
thing = createFromFactory(); /* it's not known here at compile time
                                whether thing is of type A, AA or AB */
thing.doSomething();

Then a call to thing.doSomething() can be inlined (translated to pseudo-C (tm)):

A thing;
thing = createFromFactory();

if (thing.type == AA)
{
    doSomething__AA(thing);
}
else if (thing.type == AB)
{
    doSomething__AB(thing);
}
else if (thing.type == A)
{
    doSomething__A(thing);
}
else
{
    performHashBasedSlowCall(thing, doSomething);
}


Nobody says that one has to use vtables like C++ does ;-)

SmallEiffel does exactly this. They used to have a paper about their technique
on their web site: http://SmallEiffel.loria.fr/index.html

Self, a Smalltalk successor, did the same without static type information
years ago: http://self.smli.com/
Sun's new HotSpot Java compiler is based on this technology.

Another way to speed up calling polymorphic functions is to use
a cache of the latest (or the first) type, together with the function called
for this type. This works very well for Smalltalk and should be a good thing
in Python too.
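The cache Markus describes (an inline cache, in Smalltalk terms) can be sketched in Python; the class and method names here are mine, for illustration only:

```python
class A:
    def do_something(self):
        return "A.do_something"

class AA(A):
    def do_something(self):
        return "AA.do_something"

class CallSite:
    """Monomorphic inline cache: remember the receiver type last seen
    at this call site and the method resolved for it; do the full
    (hash-based) lookup only when the receiver type changes."""
    def __init__(self, selector):
        self.selector = selector
        self.cached_type = None
        self.cached_method = None

    def send(self, receiver):
        t = type(receiver)
        if t is not self.cached_type:              # cache miss: slow lookup
            self.cached_type = t
            self.cached_method = getattr(t, self.selector)
        return self.cached_method(receiver)        # cache hit: direct call

site = CallSite("do_something")
```

Repeated sends with the same receiver type skip the lookup entirely, which is why this works so well for the mostly-monomorphic call sites typical of real programs.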

William> - GC -- hard to add after the fact. No worry with
William> Python, but what if there are other issues like it?
I'm not sure if I understand this sentence. Do you mean GC is
hard to add to Python without breaking existing code? I would
agree with that. And why do you say "no worry"? Do you mean GC
is not worth the effort?
William> Guido seems to think so. Anyone who disagrees is free to
William> contribute and work on porting GC code (I would like to
William> see it).


William> I think that worrying about the other issues is FAR more
William> rewarding -- RC is not only hard to replace with GC, but
William> the rewards for doing so are relatively small (in
William> comparison to the other things we can think about).

I would guess it's a lot of work to replace the RC with a GC, because a good
GC needs a way to know whether something is a pointer or not, and this would
require changing object layouts. Also, all the reference counting calls would
need to be removed. OK, it might be possible to replace this code with macros
that just do nothing.

But the main problem would be that a lot of external (C) code would break
with a decent GC, because the GC would move objects around at runtime.

William> I don't want to stop anyone from trying, though. The
William> nice thing about free software is that even if everyone
William> disagrees with you or thinks it's not worth it you can
William> still get it done.

William> So what other issues are worth thinking about? I don't
William> know.
You may have a look at Strongtalk, a Smalltalk with static
(optional) type checking.
William> Found it using Google.com (a GOOD search engine for
William> technical matters).
William> http://java.sun.com/people/gbracha/nwst.html

William> I'll be spending some time reading it. My initial
William> impression: _very_ good. This would fit well, I think.

Nice to hear that ...


Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Christian Tismer
1999-04-30 13:02:43 UTC
Permalink
|> A batch-mode optimizer analyzing an entire file (module) should be able to
|> detect whether or not function names are rebound.
|>
|> Perhaps module bindings should be considered immutable from outside the
|> module unless explicitly declared otherwise.
|
|I'm thinking of no new keyword, but a mechanism which allows me to lock a
|namespace somehow.
I like this idea in concept. Though I would prefer a way to have
namespaces "lock by default". Examples: After a class definition, the
class function dictionary is locked. After a module is fully read, all
references are bound and the module namespace is locked. etc.
Well, I wouldn't do that by default. By default, everything
could stay as it is. First of all, this would not break any
existing code. Then, many people will want to
fine-tune their modules, and they are perhaps not done
right after a class definition is ready.

Then, what would you do with classes which depend on each
other? You cannot lock them immediately; this would fail.
Locking them after they both are defined is fine, since
everything is defined then. With minimum effort and no
language changes, this will be needed.

Then think of all the more difficult systems which need
more effort to become configured. The xml parsers together
with SAX are an example. If I wanted to lock this, then
this must be done with care. One would also not lock the mixin
classes, but only the final workhorse class, bound with
the correctly selected parser, and so on.

It might also be necessary to find a way to specify which
attributes may be locked and which not, since there exist
indeed cases where Python's super-flexibility is needed :-)

Depending on how exactly this will be implemented, a single line
at the end of a module should suffice to accomplish this stuff
for the standard cases. Fine adjustment would take a little more.
As a side effect, locking a module would also find all
referenced but undefined symbols.

Anyway, this is still no cakewalk and quite a lot of code
is involved. Needs much more thinking...
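For concreteness, one crude way to get such a lock in plain Python today (a sketch only; the class name and the __setattr__ mechanism are my illustration, not a proposal from this thread):

```python
class LockedNamespace:
    """A namespace whose bindings can be frozen with lock();
    after that, rebinding or adding names raises an error."""
    def __init__(self):
        object.__setattr__(self, "_locked", 0)

    def lock(self):
        # bypass our own __setattr__ guard to flip the flag
        object.__setattr__(self, "_locked", 1)

    def __setattr__(self, name, value):
        if self._locked:
            raise AttributeError("namespace is locked; cannot rebind %r" % name)
        object.__setattr__(self, name, value)

ns = LockedNamespace()
ns.f = lambda x: x + 1     # fine: namespace still open
ns.lock()                  # the "single line at the end of a module"
```

After lock(), any rebinding attempt raises, while reads (and thus calls through the namespace) keep working at full speed.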

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Randall Hopper
1999-04-30 14:12:15 UTC
Permalink
Christian Tismer:
|Randall Hopper wrote:
|> Christian Tismer:
|> |I'm thinking of no new keyword, but a mechanism which allows me to lock a
|> |namespace somehow.
|>
|> I like this idea in concept. Though I would prefer a way to have
|> namespaces "lock by default". Examples: After a class definition, the
|> class function dictionary is locked. After a module is fully read, all
|> references are bound and the module namespace is locked. etc.
|
|Well, I wouldn't do that by default. By default, everything could stay as
|it is. First of all, this would not break any existing code.

Right. I wasn't too clear. By default, I mean I don't want to have to
insert some commands/declarations for every namespace (every class, every
method, every module, etc.) to lock them. I want this to happen
automagically based on a switch.

A command-line option (-w), or something like Perl's "use strict"
declaration would be a reasonable way to enable this behavior.

|Then, what would you do with classes which depend on each
|other? You cannot lock them immediately, this would fail.

Could you give an example?

|Depending on how exactly will be implemented, a single line
|at the end of a module should suffice to accomplish this stuff
|for the standard cases.

This would prevent namespace binding and locking from occurring while the
module is being parsed, wouldn't it? With a lock declaration at the
beginning, Python could do this as it goes. Seems like that would be
easier (Python sees end of function -> module.symbol referenced in the
function was not defined -> flag error and abort parse).

|As a side effect, locking a module would also find all
|referenced but undefined symbols.

That's the goal I'm rooting for. ;-)

|Anyway, this is still no cakewalk and quite a lot of code
|is involved. Needs much more thinking...

Definitely. No doubt Guido and the other language experts have a better
feel for this than I do. But I felt compelled to chime in on this topic
since it's important to me (and the squeaky wheel gets the grease. :-)

Randall
Christian Tismer
1999-04-30 17:20:07 UTC
Permalink
Randall Hopper wrote:

[me, trying to give an example of forward references]
Well, I personally would call this "broken". :-)
Base class A calling subclass B's method M without declaring a virtual
method M itself is very perverse IMO.
Then please go and tell this to Jim Fulton (Bobo/Zope)
or look into the XML-sig's parsers.

A mixin class is a mixin, which is not supposed to produce
prototypes for everybody who uses it. This is a different
idea which is implied by multiple inheritance, IMHO.
"mixin class"
def general_method1(self, *args):
    self.parser_method(self.scanner, args)
def parser_method(self, ...):
    """ Pure virtual method """
    pass
...
Then there's no problem. self.parser_method will always resolve, possibly
to a subclass method but at least to this pure virtual stub.
|It will also not resolve self.magic, since it doesn't
|inherit from abstractparser.
I bet you 5 bucks you know what I'm going to say. :-)
Accessing attributes of a sibling base class united by a future subclass?
[Cringe!] Sheesh. I wouldn't want to try to follow _that_ code, much less
debug it.
We were not talking style or how to do things, but
theory. Existing code would break. Some could be fixed,
some not; all in all, I don't want to change the language,
but have my (at least :) programs run very fast.
|Yes, but this would limit Python down to Pascal-like name spaces.
|You would need all kinds of tricks to write recursions like
|
| return three(arg-1)
| return arg
|
| return two(arg-1)
| return arg
Good point for deferring resolution until the end of the module.
I'm sold. I'd prefer this to having to use forward declarations.
There's also a lot more possible flexibility which
we should keep. Some modules tend to produce code on the fly,
invent functions dynamically, whatever.
Also, I often see initialization code like

import string
my_split = string.split
del string

and doing some more. During creation (execution) of the module
code, the name space is changing quite a lot. Some modules
are stable afterwards, and they can be completely frozen, but,
of course, this would be about the last line of code.

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Randall Hopper
1999-04-30 17:40:46 UTC
Permalink
Christian Tismer:
|Randall Hopper wrote:
|>
|> Base class A calling subclass B's method M without declaring a virtual
|> method M itself is very perverse IMO.
...
|> Accessing attributes of a sibling base class united by a future subclass?
...

|We were not talking style or how to do things, but theory. Existing code
|would break.

Point taken. No argument there. I was a bit surprised to hear some folks
are doing such things, but I suppose if a semantic is possible in a
language, there will always be someone who will use it.

Randall
Paul Prescod
1999-04-30 14:40:19 UTC
Permalink
Post by Randall Hopper
Right. I wasn't too clear. By default, I mean I don't want to have to
insert some commands/declarations for every namespace (every class, every
method, every module, etc.) to lock them. I want this to happen
automagically based on a switch.
But the "right thing" varies on a class-by-class basis. Some classes in a
module need to be locked and some do not.
--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

"Microsoft spokesman Ian Hatton admits that the Linux system would have
performed better had it been tuned."
"Future press releases on the issue will clearly state that the research
was sponsored by Microsoft."
http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp
Christian Tismer
1999-04-30 15:34:42 UTC
Permalink
Randall Hopper wrote:

[tismer().chris about locking]
Post by Randall Hopper
|Well, I wouldn't do that by default. By default, everything could stay as
|it is. First of all, this would not break any existing code.
Right. I wasn't too clear. By default, I mean I don't want to have to
insert some commands/declarations for every namespace (every class, every
method, every module, etc.) to lock them. I want this to happen
automagically based on a switch.
A command-line option (-w), or something like Perl's "use strict"
declaration would be a reasonable way to enable this behavior.
Now, well, but Perl is quite different here. As I remember,
it doesn't have all the dynamic features of Python.
In Python, you can reference any identifier in a function
or class definition without the need for the object to exist.
That's the reason why I'm talking of an optional feature,
since it has reasonable semantic side effects.
Post by Randall Hopper
|Then, what would you do with classes which depend on each
|other? You cannot lock them immediately, this would fail.
Could you give an example?
class abstractparser:
    magic = 42
    pass # numerous default and stub methods

class parserops:
    "mixin class"
    def general_method1(self, *args):
        self.parser_method(self.scanner, args)
    def funky_method(self, *args):
        # some code there
        return self.magic

class someparser(abstractparser, parserops):
    def parser_method(self, scanner, *args):
        # do something, and then use the mixin
        self.funky_method(2, 3, 5)

Sorry about my lack of spirit today; this example is bad.
But, if you lock class parserops after it is created,
it will barf, since parser_method cannot be resolved yet.
It will also not resolve self.magic, since it doesn't
inherit from abstractparser.

But if I lock someparser, it will get all of its methods
right through the inheritance mechanism at once.
Post by Randall Hopper
|Depending on how exactly will be implemented, a single line
|at the end of a module should suffice to accomplish this stuff
|for the standard cases.
This would prevent namespace binding and locking from occuring while the
module is being parsed, wouldn't it? With a lock declaration at the
beginning, Python could do this as it goes. Seems like that would be
easier (Python sees end of function -> module.symbol referenced in the
function was not defined -> flag error and abort parse).
Yes, but this would limit Python down to Pascal-like name spaces.
You would need all kinds of tricks to write recursions like

def two(arg):
    if arg % 2 == 0:
        return three(arg-1)
    return arg

def three(arg):
    if arg % 3 == 0:
        return two(arg-1)
    return arg

(again not good :)

This would need to be locked after both are defined;
otherwise we would have to invent declarations, which is bad.
Post by Randall Hopper
|As a side effect, locking a module would also find all
|referenced but undefined symbols.
That's the goal I'm rooting for. ;-)
If it is just that, download Aaron Watters' kwParsing module
and use its pylint module to see lots of complaints about
your source :-)
Post by Randall Hopper
|Anyway, this is still no cakewalk and quite a lot of code
|is involved. Needs much more thinking...
Definitely. No doubt Guido and the other language experts have a better
feel for this than I do. But I felt compelled to chime in on this topic
since it's important to me (and the squeaky wheel gets the grease. :-)
I'm sure they will also not like my idea, but this
cannot stop me from trying :-)

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Randall Hopper
1999-04-30 17:08:07 UTC
Permalink
Christian Tismer:
|[tismer().chris about locking]
|
|> A command-line option (-w), or something like Perl's "use strict"
|> declaration would be a reasonable way to enable this behavior.
|
|Now, well, but Perl is quite different here. As I remember,
|it doesn't have all the dynamic features of Python.

My point was not to imply "what" would be done by the switch, only "how" it
could be flipped ;-)

|> |Then, what would you do with classes which depend on each
|> |other? You cannot lock them immediately, this would fail.
|>
|> Could you give an example?
|
|class abstractparser:
|    magic = 42
|    pass # numerous default and stub methods
|
|class parserops:
|    "mixin class"
|    def general_method1(self, *args):
|        self.parser_method(self.scanner, args)
|    def funky_method(self, *args):
|        # some code there
|        return self.magic
|
|class someparser(abstractparser, parserops):
|    def parser_method(self, scanner, *args):
|        # do something, and then use the mixin
|        self.funky_method(2, 3, 5)
|
|Sorry about my lack of spirit today, this example is bad.
|But, if you lock class parserops after it is created,
|it will barf, since parser_method cannot be resolved yet.

Well, I personally would call this "broken". :-)
Base class A calling subclass B's method M without declaring a virtual
method M itself is very perverse IMO.

If we have:
class parserops:
    "mixin class"
    def general_method1(self, *args):
        self.parser_method(self.scanner, args)
    def parser_method(self, ...):
        """ Pure virtual method """
        pass
    ...

Then there's no problem. self.parser_method will always resolve, possibly
to a subclass method but at least to this pure virtual stub.

|It will also not resolve self.magic, since it doesn't
|inherit from abstractparser.

I bet you 5 bucks you know what I'm going to say. :-)
Accessing attributes of a sibling base class united by a future subclass?
[Cringe!] Sheesh. I wouldn't want to try to follow _that_ code, much less
debug it.


|Yes, but this would limit Python down to Pascal-like name spaces.
|You would need all kinds of tricks to write recursions like
|
|def two(arg):
|    if arg % 2 == 0:
|        return three(arg-1)
|    return arg
|
|def three(arg):
|    if arg % 3 == 0:
|        return two(arg-1)
|    return arg

Good point for deferring resolution until the end of the module.
I'm sold. I'd prefer this to having to use forward declarations.

Randall
Michael Hudson
1999-04-30 15:11:39 UTC
Permalink
Post by Christian Tismer
|> A batch-mode optimizer analyzing an entire file (module) should be able to
|> detect whether or not function names are rebound.
|>
|> Perhaps module bindings should be considered immutable from outside the
|> module unless explicitly declared otherwise.
|
|I'm thinking of no new keyword, but a mechanism which allows me to lock a
|namespace somehow.
I like this idea in concept. Though I would prefer a way to have
namespaces "lock by default". Examples: After a class definition, the
class function dictionary is locked. After a module is fully read, all
references are bound and the module namespace is locked. etc.
Well, I wouldn't do that by default. By default, everything
could stay as it is. First of all, this would not break any
existing code. Then, many people will want to
fine tune their modules, and they are perhaps not done
after a class definition was ready.
Then, what would you do with classes which depend on each
other? You cannot lock them immediately, this would fail.
Locking them after they both are defined is fine, since
everything is defined then. With minimum effort and no
language changes, this will be needed.
Then think of all the more difficult systems which need
more effort to become configured. The xml parsers together
with SAX are an example. If I wanted to lock this, then
this must be done with care. One would also not lock the mixin
classes, but only the final workhorse class, bound with
the correctly selected parser, and so on.
It might also be necessary to find a way to specify which
attributes may be locked and which not, since there exist
indeed cases where Python's super-flexibility is needed :-)
Depending on how exactly will be implemented, a single line
at the end of a module should suffice to accomplish this stuff
for the standard cases. Fine adjustment would take a little more.
As a side effect, locking a module would also find all
referenced but undefined symbols.
Anyway, this is still no cakewalk and quite a lot of code
is involved. Needs much more thinking...
I think I can do this. Not to classes yet, but given a function, I can
make you another function where all the global lookups are bound to
the values found at that moment.
>>> def f(x):
...     return x+y
...
>>> import closure
>>> g=closure.bind(f,y=1)
>>> f(1)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in f
NameError: y
>>> g(1)
2

It's then easy to write a function that binds all the variables found
in the current environment:

def bind_now(func):
    try:
        raise ""
    except:
        import sys
        frame=sys.exc_traceback.tb_frame.f_back
        l=apply(bind,(func,),frame.f_locals)
        g=apply(bind,(l,),frame.f_globals)
        return g
>>> import closure
>>> y=1
>>> def f(x):
...     return x+y
...
>>> f(0)
1
>>> g=closure.bind_now(f)
>>> y=2
>>> f(0)
2
>>> g(0)
1
Is this what you wanted?

A word of warning: this code is nasty, *nasty*, NASTY. Possibly the
most horrible thing you will see perpetrated in Python this year. It
applies regular expressions to strings of bytecode...

I made Python core repeatedly when debugging it.

However, it works. The returned functions are very fast. I wrote this
package because I wanted to avoid both the tackiness of the `default
argument hack' and the performance penalty of using classes to fake
closures.
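For comparison, the `default argument hack' mentioned above binds the free variable at function-definition time, at the cost of an extra visible parameter (a minimal sketch):

```python
def make_adder(n):
    # n is frozen into the function as a default value, so the body
    # reads it as a fast local rather than an enclosing/global lookup;
    # the tacky part is that callers can accidentally override it,
    # e.g. adder(1, 99)
    def adder(x, n=n):
        return x + n
    return adder

add3 = make_adder(3)
```

The bytecode-munging approach below achieves the same binding without polluting the function's signature.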

As to `sealing' classes in this fashion, I guess it could be
done. You'd need to look for patterns of LOAD_FAST 0 (the first
argument is the zeroth local) followed by LOAD_ATTR. You could then
calculate this and insert a LOAD_CONST instead. The thing is, this
would replace two argumented bytecodes with one, changing the length of
the codestring, and you'd need to recompute jumps. I haven't had the
bloody-mindedness to get this to work yet.
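In today's Python the LOAD_FAST-then-LOAD_ATTR pattern can be inspected with the standard dis module instead of regexes over co_code; a small sketch (Python 3 API, so an anachronism relative to this 1999 thread):

```python
import dis

def method_call(obj):
    # historically compiles to a LOAD_FAST of obj followed by a
    # LOAD_ATTR of attr; exact opcode names vary by Python version
    return obj.attr

# List the opcode names so the pattern described above is visible.
ops = [ins.opname for ins in dis.get_instructions(method_call)]
```

Scanning these structured instructions (rather than raw byte strings) also sidesteps the jump-recomputation problem, since dis reports offsets for you.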

Code follows...

HTH
Michael

import new,string,re

def copy_code_with_changes(codeobject,
                           argcount=None,
                           nlocals=None,
                           stacksize=None,
                           flags=None,
                           code=None,
                           consts=None,
                           names=None,
                           varnames=None,
                           filename=None,
                           name=None,
                           firstlineno=None,
                           lnotab=None):
    if argcount is None: argcount = codeobject.co_argcount
    if nlocals is None: nlocals = codeobject.co_nlocals
    if stacksize is None: stacksize = codeobject.co_stacksize
    if flags is None: flags = codeobject.co_flags
    if code is None: code = codeobject.co_code
    if consts is None: consts = codeobject.co_consts
    if names is None: names = codeobject.co_names
    if varnames is None: varnames = codeobject.co_varnames
    if filename is None: filename = codeobject.co_filename
    if name is None: name = codeobject.co_name
    if firstlineno is None: firstlineno = codeobject.co_firstlineno
    if lnotab is None: lnotab = codeobject.co_lnotab
    return new.code(argcount,
                    nlocals,
                    stacksize,
                    flags,
                    code,
                    consts,
                    names,
                    varnames,
                    filename,
                    name,
                    firstlineno,
                    lnotab)

def encode_16(n):
    return '\\%03o\\%03o'%(n%256,n/256)

LOAD_CONST=chr(100)
LOAD_GLOBAL=chr(116)

def munge_code(code,vars):
    codestring=code.co_code
    names=list(code.co_names)
    consts=list(code.co_consts)
    for var,value in vars.items():
        try:
            index=names.index(var)
        except ValueError:
            continue
        codestring=re.sub(LOAD_GLOBAL+encode_16(index),
                          LOAD_CONST+encode_16(len(consts)),
                          codestring)
        consts.append(value)
    return copy_code_with_changes(
        code,
        consts=tuple(consts),
        code=codestring)

def bind(func,**vars):
    newfunc=new.function(
        munge_code(func.func_code,vars),
        func.func_globals,
        func.func_name)
    newfunc.__doc__=func.__doc__
    newfunc.func_defaults=func.func_defaults
    newfunc.func_doc=func.func_doc
    return newfunc

def bind_locals(func):
    try:
        raise ""
    except:
        import sys
        frame=sys.exc_traceback.tb_frame.f_back
    l=apply(bind,(func,),frame.f_locals)
    frame=None
    return l

def bind_now(func):
    try:
        raise ""
    except:
        import sys
        frame=sys.exc_traceback.tb_frame.f_back
    l=apply(bind,(func,),frame.f_locals)
    g=apply(bind,(l,),frame.f_globals)
    return g

## examples

def make_adder(n):
    def adder(x):
        return x+n
    return bind_locals(adder)

def make_balance(initial_amount):
    current=[initial_amount]
    def withdraw(amount):
        if current[0]<amount:
            raise "debt!"
        else:
            current[0]=current[0]-amount
            return current[0]
    return bind_locals(withdraw)
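Nested scopes (added later, in Python 2.1 via PEP 227) made real closures part of the language, so the effect `bind_locals` fakes above falls out for free on a modern interpreter. A minimal sketch:

```python
# make_adder from the examples above, with no bytecode surgery:
# modern Python compiles the free variable n into a closure cell
def make_adder(n):
    def adder(x):
        return x + n
    return adder

add3 = make_adder(3)
print(add3(4))  # 7
```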
Amit Patel
1999-04-30 17:55:35 UTC
Permalink
Michael Hudson <mwh21 at cam.ac.uk> wrote:
|
| A word of warning: this code is nasty, *nasty*, NASTY. Possibly the
| most horrible thing you will see perpetrated in Python this year. It
| applies regular expressions to strings of bytecode...

[code removed]

| def make_adder(n):
|     def adder(x):
|         return x+n
|     return bind_locals(adder)




Yow.

I think you are INSANE. :-) I thought bind_locals was impossible to
implement in Python, and you go and implement it. Eek!




Amit, too afraid to use bind_locals
Michael Hudson
1999-05-02 15:12:56 UTC
Permalink
[speedup for functors, too?]
It seems so. I've found something to do with my starship page beyond
basic fiddling at last; go to http://starship.python.net/crew/mwh to
find a little tarball containing closure.py from my last post and
xapply.py. xapply.py implements functor.xapply for the simplest cases;
it ignores variadic details and keyword arguments entirely. It does
work for methods (bound and unbound).
I haven't timed it, but so far as I understand, it should implement
partial argument resolution with slowdown by a factor of 1 :-).
Now, that's interesting. In all those insane speed
races, we always came up with solutions like
to avoid global lookups. They became default values
for not otherwise used parameters.
Big Q: Is your method even faster?
Now, I did an extra benchmark and it turns out that
the old default parameter trick is still a few cycles
faster.
As you can see with my benchmark, it would be nice
to take the chance and allow to change the function
name when binding.
That's easy-peasy:

def xapply_func(func,args,newname=None):
    if newname is None:
        newname=func.func_name
    newfunc=new.function(
        munge_code(func.func_code,args),
        func.func_globals,
        newname)
    newfunc.__doc__=func.__doc__
    newfunc.func_defaults=func.func_defaults
    newfunc.func_doc=func.func_doc
    return newfunc

or

def bind(func,newname=None,**vars):
    if newname is None:
        newname=func.func_name
    newfunc=new.function(
        munge_code(func.func_code,vars),
        func.func_globals,
        newname)
    newfunc.__doc__=func.__doc__
    newfunc.func_defaults=func.func_defaults
    newfunc.func_doc=func.func_doc
    return newfunc

(unless you want to bind a variable called newname or func, of
course. I guess you could mutilate bind.func_code so that Python
thinks the first two parameters have names that are illegal
identifiers. Hmm.)
Then, what about indirections? I used to use default
parameter assignments like add=operator.add
which I now would have to do elsewhere to get rid
of that indirection. Do you see a chance to bind
these through as well, maybe for builtins only, or
for surely read-only attributes?
That would be nice, and I have thought about it, but you're replacing

LOAD_GLOBAL ('operator')
LOAD_ATTR ('add')

which is six bytes with

LOAD_CONST (n)

which is just three bytes. So, as I think I said earlier, you'd need
to recompute jumps, and I haven't yet had the bloody-mindedness to get
that to work.

It'll probably happen though. Skip Montanaro has written a peephole
optimizer for Python
(http://www.automatrix.com/~skip/python/spam7/optimizer.html); this
must do these manipulations, so maybe I'll have a look at that for
inspiration.

The similarity of the code in xapply.py and in closure.py makes me
think that I should write some general framework for mutilating
codestrings. Support for changing the length of the codestring is
pretty much a must for that.
What might the reason be for the slight slowdown
between load_const and default parameter access?
load_fast appears to be a little quicker.
Poking around the source shows that a LOAD_FAST just indexes into an
array, whereas LOAD_CONST indexes into a Python tuple. Fast, but still
a couple of C function calls of indirection on top of LOAD_FAST.

The upshot of this is that:

def func0(x):
    return x + 0 + 0 + 0 + 0 + 0

is slower than

def func1(x, g1=g1, g2=g2, g3=g3, g4=g4, g5=g5):
    return x + g1 + g2 + g3 + g4 + g5

(!)

The time differentials we're talking about here are very, very small.
Could it be even better to turn it upside down
and invent hidden default parameters instead? :-)
I don't know what the implementation details of making co_consts a
bare C array of PyObject*s would be. It wouldn't be hard, but I don't
really think it's worth the effort.

I've tweaked the benchmark slightly and added it to the tarball on the
starship.
thanks & cheers - chris
A pleasure!

Michael
Michael Hudson
1999-05-04 22:14:03 UTC
Permalink
It goes on...
Post by Michael Hudson
The similarity of the code in xapply.py and in closure.py makes me
think that I should write some general framework for mutilating
codestrings. Support for changing the length of the codestring is
pretty much a must for that.
Maybe, yes. Would mean to break the code list into slices,
add some labels, and re-resolve them again. Looks like
no big deal, but some work.
I've abandoned any hope of doing any real work this evening and
bashed on this a bit more. There's a new tarball on the starship that
implements a first attempt at a general code editing framework, and a
(buggy) implementation of an attribute freezer.

Comments are very much appreciated! My background isn't in this kind
of stuff (heck, I'm a mathematician), so if someone knows a better way
of organising things, I'd certainly like to know.
Thanks for the fun.
Wondering how far this can continue :-)
I'm sure there's life in it yet...

Michael
Michael Hudson
1999-05-06 11:48:00 UTC
Permalink
Post by Michael Hudson
It goes on...
Post by Michael Hudson
The similarity of the code in xapply.py and in closure.py makes me
think that I should write some general framework for mutilating
codestrings. Support for changing the length of the codestring is
pretty much a must for that.
Maybe, yes. Would mean to break the code list into slices,
add some labels, and re-resolve them again. Looks like
no big deal, but some work.
I've abandoned any hope of doing any real work this evening and
bashed on this a bit more. There's a new tarball on the starship that
implements a first attempt at a general code editing framework, and a
(buggy) implementation of an attribute freezer.
Comments are very much appreciated! My background isn't in this kind
of stuff (heck, I'm a mathematician), so if someone knows a better way
of organising things, I'd certainly like to know.
Thanks for the fun.
Wondering how far this can continue :-)
I'm sure there's life in it yet...
Michael
Well, everyone seems to have ignored that code, which is fair enough
because it sucked. There's yet another tarball on starship (release
early! release often!) - go to http://starship.python.net/crew/mwh and
click the link.

Here's what I've added to the README (which should probably be called
NEWS, but never mind):

-- begin

Thu May 6 11:47:54 BST 1999

Update! I've added a more general code editing package, in
code_editor.py.

The centerpiece of the module is the CodeString class. This allows one
to treat a code string as a list of instructions. The (auto-generated)
file ops.py contains a class for each opcode, so you can do things
like this:

cs=code_editor.CodeString(<some codestring>)
cs[0:3]=[ops.SET_LINENO(3),ops.LOAD_CONST(0),ops.RETURN_VALUE()]
cs.assemble() => a codestring that can be passed to new.code.

code_editor.py also contains a couple of classes to make editing code
objects (as opposed to code strings) and functions easier.

So you can do this:

from bytecodehacks import code_editor,ops

def f():
    return "This function does something really interesting"

g=code_editor.Function(f)
g.func_code.co_code[:]=[ops.SET_LINENO(3),
                        ops.LOAD_CONST(len(g.func_code.co_consts)),
                        ops.RETURN_VALUE()]
g.func_code.co_consts.append("Not any more!")
g=g.make_function()
print f()
print g()

to produce:

This function does something really interesting
Not any more!

This is just a toy example, obviously. This package doesn't make
bytecode twiddling *easy*, but it does help not having to worry too
much about recomputing jumps.

To see some more concrete examples of use, look at xapply2.py and
attr_freeze.py.

xapply2.py contains another implementation of xapply (see below).

attr_freeze.freeze_attrs does a similar job to closure.bind below, but
can resolve attribute references too. It can't use the cute
`bind(f,var=value)' interface of bind, but I've made efforts to make
it as painless to use as possible. An example:

from bytecodehacks import attr_freeze
import sys

def ff(x):
    if x:
        print sys.exit
    else:
        print sys.copyright

_=attr_freeze.Ref()

gg=attr_freeze.freeze_attrs(ff,
                            _.sys.exit,sys.exit,
                            _.sys.copyright,sys.copyright)

I hope someone finds these interesting.

Thanks again for taking an interest!

-- end

I have some even more grandiose plans for this project - constant
expression evaluation, maybe, or how about a function inliner?

Revise, Michael, revise!

Michael
Christian Tismer
1999-05-02 16:00:13 UTC
Permalink
[big snip]
Post by Michael Hudson
That would be nice, and I have thought about it, but you're replacing
LOAD_GLOBAL ('operator')
LOAD_ATTR ('add')
which is six bytes with
LOAD_CONST (n)
which is just three bytes. So, as I think I said earlier, you'd need
to recompute jumps, and I haven't yet had the bloody-mindedness to get
that to work.
I don't think so. Just replace the superfluous opcode with
a 3-byte NOOP, i.e. SET_LINENO 4711, and then...
Post by Michael Hudson
It'll probably happen though. Skip Montanaro has written a peephole
optimizer for Python
(http://www.automatrix.com/~skip/python/spam7/optimizer.html); this
must do these manipulations, so maybe I'll have a look at that for
inspiration.
... simply pour it through Skip's code, which should
remove all the unnecessary crap. :-)
Post by Michael Hudson
The similarity of the code in xapply.py and in closure.py makes me
think that I should write some general framework for mutilating
codestrings. Support for changing the length of the codestring is
pretty much a must for that.
Maybe, yes. Would mean to break the code list into slices,
add some labels, and re-resolve them again. Looks like
no big deal, but some work.
Post by Michael Hudson
What might the reason be for the slight slowdown
between load_const and default parameter access?
load_fast appears to be a little quicker.
Poking around the source shows that a LOAD_FAST just indexes into an
array, whereas LOAD_CONST indexes into a Python tuple. Fast, but still
a couple of C function calls of indirection on top of LOAD_FAST.
Not true. Both evaluate to similar load insns by macro
expansion. The reason is just a continue-shortcut which
avoids a little of the error checking code outside of
the interpreter loop. But as said in another post,
it isn't worth considering, and the code which the
MS compiler emits seems to be quite arbitrary and
should better not be touched.

...
Post by Michael Hudson
I don't know what the implementation details of making co_consts a
bare C array of PyObject*s would be. It wouldn't be hard, but I don't
really think it's worth the effort.
Nor do I.
Thanks for the fun.
Wondering how far this can continue :-)

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Michael Hudson
1999-05-02 12:03:32 UTC
Permalink
One of these days I'll remember that `R' is `reply' in gnus, and I
usually want to hit `F'....
[snip]
Post by Michael Hudson
A word of warning: this code is nasty, *nasty*, NASTY. Possibly the
most horrible thing you will see perpetrated in Python this year. It
applies regular expressions to strings of bytecode...
I'm really impressed. No, what is ugly about that?
Well, it's highly dependent on the internals of the interpreter, for
starters.
Don Beaudry's functor module comes into mind.
Not fiddling bytecodes, but the techniques are nearly
as "insane" as yours. Do you think that functors
could be sped up with your bytecode art?
It seems so. I've found something to do with my starship page beyond
basic fiddling at last; go to http://starship.python.net/crew/mwh to
find a little tarball containing closure.py from my last post and
xapply.py. xapply.py implements functor.xapply for the simplest cases;
it ignores variadic details and keyword arguments entirely. It does
work for methods (bound and unbound).

I haven't timed it, but so far as I understand, it should implement
partial argument resolution with slowdown by a factor of 1 :-).

xapply.py contains improvements to my bytecode fiddling suggested by
Tim Peters (which were sent to the list as well but don't seem to have
shown up on the newsgroup yet). I haven't merged these into
closure.py, but it won't take long... I've just done it in fact.

I *so* should be revising...

Michael
Christian Tismer
1999-05-02 14:09:20 UTC
Permalink
Michael Hudson wrote:

[bytecode fiddling]
Post by Michael Hudson
I'm really impressed. No, what is ugly about that?
Well, it's highly dependent on the internals of the interpreter, for
starters.
Right. But if treated as an internal system tool
which is just used after understanding the
implication, there is no difference to something
like the Python interpreter itself. People who
look at it are on themselves, or they had better
not looked into it :-)

[speedup for functors, too?]
Post by Michael Hudson
It seems so. I've found something to do with my starship page beyond
basic fiddling at last; go to http://starship.python.net/crew/mwh to
find a little tarball containing closure.py from my last post and
xapply.py. xapply.py implements functor.xapply for the simplest cases;
it ignores variadic details and keyword arguments entirely. It does
work for methods (bound and unbound).
I haven't timed it, but so far as I understand, it should implement
partial argument resolution with slowdown by a factor of 1 :-).
Now, that's interesting. In all those insane speed
races, we always came up with solutions like

def my_lightspeed_impl(arg1, arg2, None=None, map=map, ...):

to avoid global lookups. They became default values
for not otherwise used parameters.
Big Q: Is your method even faster?
Now, I did an extra benchmark and it turns out that
the old default parameter trick is still a few cycles
faster.
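The trick under discussion can be sketched with the stock `timeit` module. This is a hedged modern rewrite, not the attached 1999 benchmark; absolute numbers vary wildly by interpreter, and the point is only that the default argument turns three global lookups into three local ones:

```python
import timeit

G = 5

def use_global(x):
    return x + G + G + G        # three global lookups at call time

def use_default(x, G=G):
    return x + G + G + G        # G is now a local, looked up fast

for f in (use_global, use_default):
    print(f.__name__, round(timeit.timeit(lambda: f(1), number=100000), 4))
```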
Possible improvements:
As you can see with my benchmark, it would be nice
to take the chance and allow to change the function
name when binding.
Then, what about indirections? I used to use default
parameter assignments like add=operator.add
which I now would have to do elsewhere to get rid
of that indirection. Do you see a chance to bind
these through as well, maybe for builtins only, or
for surely read-only attributes?

What might the reason be for the slight slowdown
between load_const and default parameter access?
load_fast appears to be a little quicker.
Could it be even better to turn it upside down
and invent hidden default parameters instead? :-)

thanks & cheers - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
-------------- next part --------------
# timing of Mike Hudson's binding

import closure

g1 = g2 = g3 = g4 = g5 = 0

def func1(x, g1=g1, g2=g2, g3=g3, g4=g4, g5=g5):
    return x + g1 + g2 + g3 + g4 + g5

def func2(x):
    return x + g1 + g2 + g3 + g4 + g5

func3 = closure.bind_now(func2)

def timing(func, args, n=1, **keywords):
    import time
    time=time.time
    appl=apply
    if type(args) != type(()): args=(args,)
    rep=range(n)
    before=time()
    for i in rep: res=appl(func, args, keywords)
    return round(time()-before,4), res

def test():
    for func in (func1, func2, func3):
        print func.__name__, timing(func, 42, 100000)

result = """
>>> test()
func1 (1.54, 42)
func2 (1.98, 42)
func2 (1.59, 42)
"""
Christian Tismer
1999-05-02 14:54:05 UTC
Permalink
Just a word about LOAD_FAST and LOAD_CONST.

I looked into the macro expansions in ceval.c, and the real
difference is just that LOAD_FAST has a shortcut in the
interpreter loop which reads

if (x != NULL) continue;

LOAD_CONST doesn't have this. I did a quick test to see
the difference, and afterwards, LOAD_CONST was in fact
slightly faster than LOAD_FAST.
But...

result = """
>>> test()
func1 (6.32, 42) # LOAD_FAST
func2 (7.36, 42) # LOAD_GLOBAL
func2 (6.21, 42) # should read func3, does LOAD_CONST

# with patch to ceval.c
>>> test()
func1 (7.08, 42)
func2 (7.8, 42)
func2 (6.76, 42)
"""

... due to the platform specific optimization strategy
of the compiler, an overall speed loss was the effect.
Makes no sense to supply a patch. Instead I would claim
that both opcodes are virtually at the same speed and
one should not worry about it.
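The two opcodes are easy to inspect side by side with the `dis` module; a modern sketch (exact opcode names differ between interpreter versions, but parameters still load via LOAD_FAST and literals via LOAD_CONST):

```python
import dis

def greet(name, punct="!"):
    # 'name' and 'punct' load via LOAD_FAST; the string literal is a LOAD_CONST
    return "hi " + name + punct

ops = [i.opname for i in dis.get_instructions(greet)]
print(ops)
```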

ciao - chris (back to real work:)
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Christian Tismer
1999-05-01 16:18:14 UTC
Permalink
[we all about locking to do kinda early binding]
Post by Michael Hudson
I think I can do this. Not to classes yet, but given a function, I can
make you another function where all the global lookups are bound to
the values found at that moment.
Wow. Really! A true hacker :)

[examples of usage]
Post by Michael Hudson
Is this what you wanted?
Partially, yes. This is as far as one can go without
extending the interpreter. The next step would be
to break into the API, freeze types, and so on...
Post by Michael Hudson
A word of warning: this code is nasty, *nasty*, NASTY. Possibly the
most horrible thing you will see perpetrated in Python this year. It
applies regular expressions to strings of bytecode...
I'm really impressed. No, what is ugly about that?
Don Beaudry's functor module comes into mind.
Not fiddling bytecodes, but the techniques are nearly
as "insane" as yours. Do you think that functors
could be sped up with your bytecode art?
I think a good trick is never ugly if written with
elegance.

I will try some more with your code and contact you by
private email.

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Paul Prescod
1999-04-28 20:04:00 UTC
Permalink
Post by William Tanksley
And Oberon (SlimBinaries), and Eiffel (typing and general compile-time
error catching), and ...
Actually, isn't Eiffel's type system famous for being full of holes?

Regardless, wouldn't a better source of inspiration on typing be a
language with *optional* compile-time error checking?
--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

"Microsoft spokesman Ian Hatton admits that the Linux system would have
performed better had it been tuned."
"Future press releases on the issue will clearly state that the research
was sponsored by Microsoft."
http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp
Al Christians
1999-04-29 02:19:57 UTC
Permalink
Post by Paul Prescod
Actually, isn't Eiffel's type system famous for being full of holes?
I'm familiar with the covariance/contravariance argument, but I've never
before heard anyone say anything about Eiffel being full of holes. What
problems have you heard of?
I think it's called changing availability by type, or some such. It is
possible to delete a feature in a descendant class, leaving a hole in
polymorphic calls through the base class.

Al
William Tanksley
1999-04-29 23:11:38 UTC
Permalink
Post by Al Christians
Post by Paul Prescod
Actually, isn't Eiffel's type system famous for being full of holes?
I'm familiar with the covariance/contravariance argument, but I've never
before heard anyone say anything about Eiffel being full of holes. What
problems have you heard of?
I think it's called changing availability by type, or some such.
No, CAT is a feature of any object system (never a bug).
Post by Al Christians
It is
possible to delete a feature in a descendant class, leaving a hole in
polymorphic calls through the base class.
This would be true, but feature deletion is not used for public
interfaces. It's mainly used when you're treating inheritance as though
it were 'import' (certainly a major weakness of Eiffel). It's true that
you can abuse it by making that feature deletion public, but the type
system could just as easily forbid you from doing any such thing (I have
no idea whether it does).

I think you were talking about Meyer's insistence on covariance. I would
call that _one_ hole, albeit a strange one. It's certainly not enough to
condemn it as full of holes -- its type system is in other ways quite
robust.
Post by Al Christians
Al
--
-William "Billy" Tanksley
"But you shall not escape my iambics."
-- Gaius Valerius Catullus
Jason Trenouth
1999-04-29 10:48:23 UTC
Permalink
In article <372769B0.3CE8C0F3 at prescod.net>,
Post by Paul Prescod
Post by William Tanksley
And Oberon (SlimBinaries), and Eiffel (typing and general compile-time
error catching), and ...
Actually, isn't Eiffel's type system famous for being full of holes?
Regardless, wouldn't a better source of inspiration on typing be a
language with *optional* compile-time error checking?
I've been playing around with Dylan recently, and it seems like what
Python would be if you added "end" blocks and mated it with CLOS. Since
Dylan is nearly as dynamic as Python, I think it might be a good source
of inspiration for Python 2. (And it might even be the case that the
Dylan-to-C compiler might be a source of good bits to improve Python's
speed. I haven't looked at the source yet, though.)
Conversely, I was just looking at Python recently (as related work for a
paper).

The Python community might like to take a look at Dylan:

http://www.gwydiondylan.org

http://www.harlequin.com/products/ads/dylan/

As a taster, here is the "invert" example in Python and Dylan:

# Python
def invert(table):
    index = {} # empty dictionary
    for key in table.keys():
        value = table[key]
        if not index.has_key(value):
            index[value] = [] # empty list
        index[value].append(key)
    return index

// Dylan
define function invert (table)
  let index = make(<table>);
  for (value keyed-by key in table)
    index[value] := add!(element(index, value, default: #()), key);
  end;
  index
end;

The "keyed-by" clause in the "for" statement is a slight cheat since it is a
Harlequin extension (otherwise the Dylan code would be even more similar to
the Python code).
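As an aside, later Python versions grew `dict.setdefault`, which folds the `has_key` test and the empty-list initialisation into one call and brings the Python version close to the Dylan one in size:

```python
def invert(table):
    index = {}
    for key, value in table.items():
        # setdefault inserts the empty list only the first time a value is seen
        index.setdefault(value, []).append(key)
    return index

print(invert({'a': 1, 'b': 1, 'c': 2}))
```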

with-heavy-troll ()
What would you give for interactive development and native compilation and
incremental gc and objects-all-the-way-down and extensible syntax and COM and
CORBA and a great GUI toolkit?
end;

:-j

__Jason
Reimer Behrends
1999-04-29 23:37:24 UTC
Permalink
Jason Trenouth (jason at harlequin.com) wrote:
[...]
Post by Jason Trenouth
with-heavy-troll ()
What would you give for interactive development and native compilation and
incremental gc and objects-all-the-way-down and extensible syntax and COM and
CORBA and a great GUI toolkit?
end;
A lot--now if I could only get Harlequin Dylan to (1) use a readable
font and (2) run on my Linux machine. So I have to settle for Gwydion
Dylan right now, which is nice, but not nearly as nice as what you
described above. :)

(I would still second Jason's comment and suggest that people have a
look at the free version of Harlequin Dylan--if I had to develop code
under Windows, it would probably be my IDE of choice, with a couple of
reservations.)

Reimer Behrends
doylep
1999-04-29 19:25:23 UTC
Permalink
In article <p54slzn7rt.fsf at bidra241.bbn.hp.com>,
As far as I remember Eiffel requires a global system analysis to be type
safe at compile time.
Well, the story is a bit more complicated. The bottom line is that current
compilers use run-time checks in the dangerous situations, and that the
dangerous situations are easily avoided anyway.

If you avoid covariant arguments and feature hiding, as many do, Eiffel is
statically typesafe with no global analysis. Personally, I'd prefer if Eiffel
didn't offer these language "features" at all.

--
Patrick Doyle
doylep at ecf.toronto.edu

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
Paul Duffin
1999-04-29 09:35:20 UTC
Permalink
Let me expand a little on the dangers of C++.
- virtual functions. It shouldn't be my job to tell the compiler when
virtual lookups are the only thing to do; the compiler ought to be able to
tell. For performance gurus, a way to hint to the compiler that virtual
lookups were _not_ needed might be useful -- but what if the programmer
was wrong?
How can the compiler tell ?
- inline functions. Again, a good compiler HAS to make this decision for
itself, and in a good compiler, whether or not this decision was made
should be transparent to the programmer.
A good compiler will; "inline" is only a hint, and the compiler
can choose whether or not to inline that function. It can also choose to
inline other functions which have not been marked as inline.
- templates. Generic functions are a very good thing, and with Python
2's type support we might wind up needing them. In C++, templates are
also a very good thing, but thanks to the C model of seperate compilation,
no two C++ compilers handle templates compatibly.
True enough.
- single-root object hierarchy -- this is a must when the default binding
is virtual.
- GC -- hard to add after the fact. No worry with Python, but what if
there are other issues like it?
- covariance/contravariance/"no"variance -- when a subclass redefines a
function, what types of input parameters can the redefinition accept? In
C++, you can't accept a different type without overloading the function.
With covariance (in Sather) you can allow the redefined function to accept
a parent class, which allows the new function to handle more subclasses
(including all the ones the old function handled). With contravariance
you can require the redefinition to be more specific, accepting fewer
classes. Aside from my obvious dislike of novariance, I'm not going to
take sides ;-).
This is something which I would love to be able to do in C++.
--
Paul Duffin
DT/6000 Development Email: pduffin at hursley.ibm.com
IBM UK Laboratories Ltd., Hursley Park nr. Winchester
Internal: 7-246880 International: +44 1962-816880
Markus Kohler
1999-04-29 14:05:25 UTC
Permalink
Let me expand a little on the dangers of C++.
- virtual functions. It shouldn't be my job to tell the
compiler when virtual lookups are the only thing to do; the
compiler ought to be able to tell. For performance gurus, a
way to hint to the compiler that virtual lookups were _not_
needed might be useful -- but what if the programmer was wrong?
Paul> How can the compiler tell ?

What William means (I think) is that the default should be that functions
are virtual. Since even virtual functions can be inlined this makes sense.
- inline functions. Again, a good compiler HAS to make this
decision for itself, and in a good compiler, whether or not
this decision was made should be transparent to the programmer.
Paul> A good compiler will, inline is only a hint to the compiler,
Paul> the compiler can choose whether or not to inline that
Paul> function, it can also choose to inline other functions which
Paul> have not been marked as inline.

[agree with all the other points]


Markus
--
Markus Kohler mailto:markus_kohler at hp.com
William Tanksley
1999-04-29 23:18:25 UTC
Permalink
Post by Paul Duffin
Let me expand a little on the dangers of C++.
- virtual functions. It shouldn't be my job to tell the compiler when
virtual lookups are the only thing to do; the compiler ought to be able to
tell. For performance gurus, a way to hint to the compiler that virtual
lookups were _not_ needed might be useful -- but what if the programmer
was wrong?
How can the compiler tell ?
Because there's only one possibility in all of the libraries with which
the system is linked. Or perhaps it can tell in some other way -- perhaps
the object to which the message is being sent was just declared and
couldn't have changed.

But let's suppose you have the stupidest compiler in the world (i.e. one
that never makes optimizations). What's the _worst_ thing that could
happen if functions defaulted to virtual? Your code would be a tiny bit
slower. Yawn.

But with only a little more cleverness in the compiler, you wind up with
the compiler able to make virtual calls when virtual calls are needed, and
direct ones otherwise. This means that the same function could be called
virtually one moment and directly the next.
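For what it's worth, Python sits at the "virtual by default, no annotations" end of this spectrum: every method call is late-bound on the runtime type, with no keyword needed, as a small sketch shows:

```python
class Base:
    def greet(self):
        return "base"

class Sub(Base):
    def greet(self):
        return "sub"

def call(obj):
    # no 'virtual' keyword: the lookup happens at call time,
    # so the same call site dispatches on whatever type arrives
    return obj.greet()

print(call(Base()), call(Sub()))
```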
Post by Paul Duffin
- inline functions. Again, a good compiler HAS to make this decision for
itself, and in a good compiler, whether or not this decision was made
should be transparent to the programmer.
A good compiler will, inline is only a hint to the compiler, the compiler
can choose whether or not to inline that function, it can also choose to
inline other functions which have not been marked as inline.
Good. So get rid of it.

Oh, and even a good C++ compiler can't choose to inline a function which
wasn't defined in the header file. And at that, it doesn't have complete
information about what the global impact of inlining will be -- only for
that one module.
Post by Paul Duffin
Paul Duffin
--
-William "Billy" Tanksley
"But you shall not escape my iambics."
-- Gaius Valerius Catullus
Neel Krishnaswami
1999-04-29 00:34:24 UTC
Permalink
In article <372769B0.3CE8C0F3 at prescod.net>,
Post by Paul Prescod
Post by William Tanksley
And Oberon (SlimBinaries), and Eiffel (typing and general compile-time
error catching), and ...
Actually, isn't Eiffel's type system famous for being full of holes?
Regardless, wouldn't a better source of inspiration on typing be a
language with *optional* compile-time error checking?
I've been playing around with Dylan recently, and it seems like what
Python would be if you added "end" blocks and mated it with CLOS. Since
Dylan is nearly as dynamic as Python, I think it might be a good source
of inspiration for Python 2. (And it might even be the case that the
Dylan-to-C compiler might be a source of good bits to improve Python's
speed. I haven't looked at the source yet, though.)


Neel
William Tanksley
1999-04-28 22:54:30 UTC
Permalink
Post by Paul Prescod
Post by William Tanksley
And Oberon (SlimBinaries), and Eiffel (typing and general compile-time
error catching), and ...
Actually, isn't Eiffel's type system famous for being full of holes?
I'm familiar with the covariance/contravariance argument, but I've never
before heard anyone say anything about Eiffel being full of holes. What
problems have you heard of?
Post by Paul Prescod
Regardless, wouldn't a better source of inspiration on typing be a
language with *optional* compile-time error checking?
Good question. I don't know the answer, but I can point out the dangers
of not planning our path.

One dangerous side of allowing flexibility is that we could be like C++,
where the safe way of doing anything requires more typing and often causes
incompatibilities. Examples range from virtual functions; through inline
functions, templates, single-root object hierarchy, and garbage
collection; to covariance/contravariance.

The other dangerous side, of course, is being one of the languages which
are so safe you can't use them without a huge manual and pages of UML.
Python can NOT wind up like this, regardless of the dangers obvious in the
other side.

Let me expand a little on the dangers of C++.

- virtual functions. It shouldn't be my job to tell the compiler when
virtual lookups are the only thing to do; the compiler ought to be able to
tell. For performance gurus, a way to hint to the compiler that virtual
lookups were _not_ needed might be useful -- but what if the programmer
was wrong?

- inline functions. Again, a good compiler HAS to make this decision for
itself, and in a good compiler, whether or not this decision was made
should be transparent to the programmer.

- templates. Generic functions are a very good thing, and with Python
2's type support we might wind up needing them. In C++, templates are
also a very good thing, but thanks to the C model of separate compilation,
no two C++ compilers handle templates compatibly.

- single-root object hierarchy -- this is a must when the default binding
is virtual.

- GC -- hard to add after the fact. No worry with Python, but what if
there are other issues like it?

- covariance/contravariance/"no"variance -- when a subclass redefines a
function, what types of input parameters can the redefinition accept? In
C++, you can't accept a different type without overloading the function.
With contravariance (as in Sather) you can allow the redefined function to
accept a parent class, which allows the new function to handle more
subclasses (including all the ones the old function handled). With
covariance (as in Eiffel) the redefinition can be more specific, accepting
fewer classes. Aside from my obvious dislike of novariance, I'm not going
to take sides ;-).
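To make the variance directions concrete, here is a small illustrative sketch in (modern, unenforced) Python -- the class names are invented for the example:

```python
class Animal:
    def sound(self):
        return "..."

class Cat(Animal):
    def purr(self):
        return "purr"

class CatFeeder:
    def feed(self, cat):
        # Base contract: callers promise to pass a Cat.
        return cat.purr()

class AnyAnimalFeeder(CatFeeder):
    # Contravariant-style override: accepts the *wider* type Animal,
    # so every argument that was legal for CatFeeder.feed still works.
    def feed(self, animal):
        return animal.sound()

feeder = AnyAnimalFeeder()
print(feeder.feed(Animal()))   # safe: the override asks for less
```

Narrowing the parameter instead (a covariant override) would be the unsafe direction: code holding a CatFeeder reference could pass a plain Cat and crash at runtime.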
Post by Paul Prescod
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
--
-William "Billy" Tanksley
"But you shall not escape my iambics."
-- Gaius Valerius Catullus
Markus Kohler
1999-04-29 08:35:50 UTC
Permalink
Post by William Tanksley
And Oberon (SlimBinaries), and Eiffel (typing and general
compile-time error catching), and ...
Actually, isn't Eiffel's type system famous for being full of
holes?
William> I'm familiar with the covariance/contravariance argument,
William> but I've never before heard anyone say anything about
William> Eiffel being full of holes. What problems have you heard
William> of?

As far as I remember Eiffel requires a global system analysis to be type
safe at compile time.
Regardless, wouldn't a better source of inspiration on typing
be a language with *optional* compile-time error checking?
William> Good question. I don't know the answer, but I can point
William> out the dangers of not planning our path.

William> One dangerous side of allowing flexibility is that we
William> could be like C++, where the safe way of doing anything
William> requires more typing and often causes incompatibilities.
William> Examples range from virtual functions; through inline
William> functions, templates, single-root object hierarchy, and
William> garbage collection; to covariance/contravariance.

William> The other dangerous side, of course, is being one of the
William> languages which are so safe you can't use them without a
William> huge manual and pages of UML. Python can NOT wind up
William> like this, regardless of the dangers obvious in the other
William> side.

Agreed. The main problem with today's statically typed languages is that
they are too restrictive and too complicated.

William> Let me expand a little on the dangers of C++.

William> - virtual functions. It shouldn't be my job to tell the
William> compiler when virtual lookups are the only thing to do;
William> the compiler ought to be able to tell. For performance
William> gurus, a way to hint to the compiler that virtual lookups
William> were _not_ needed might be useful -- but what if the
William> programmer was wrong?

True, SmallEiffel inlines most of the virtual function calls automatically.
This is the way to go.

William> - inline functions. Again, a good compiler HAS to make
William> this decision for itself, and in a good compiler, whether
William> or not this decision was made should be transparent to
William> the programmer.

Agreed.

William> - templates. Generic functions are a very good thing,
William> and with Python 2's type support we might wind up needing
William> them. In C++, templates are also a very good thing, but
William> thanks to the C model of separate compilation, no two C++
William> compilers handle templates compatibly.

The C/C++ compilation model is outdated. Python should *never* adopt it.

William> - single-root object hierarchy -- this is a must when the
William> default binding is virtual.

William> - GC -- hard to add after the fact. No worry with
William> Python, but what if there are other issues like it?

I'm not sure if I understand this sentence. Do you mean GC is hard to add
to Python without breaking existing code? I would agree with that. And why
do you say "no worry"? Do you mean GC is not worth the effort?

William> - covariance/contravariance/"no"variance -- when a
William> subclass redefines a function, what types of input
William> parameters can the redefinition accept? In C++, you
William> can't accept a different type without overloading the
William> function. With contravariance (as in Sather) you can
William> allow the redefined function to accept a parent class,
William> which allows the new function to handle more subclasses
William> (including all the ones the old function handled). With
William> covariance (as in Eiffel) the redefinition can be more
William> specific, accepting fewer classes. Aside from my obvious dislike
William> of novariance, I'm not going to take sides ;-).

You may have a look at Strongtalk, a Smalltalk with static (optional) type checking.
Sorry, no links, but if a web search does not work I can send you a document about it.

Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Christian Tismer
1999-04-30 08:11:54 UTC
Permalink
In article <slrn7ieipq.8uk.wtanksle at dolphin.openprojects.net>,
wtanksle at dolphin.openprojects.net says...
The reason it's slow is its runtime model. _Every_ function call requires
a lookup in a hash table, just on the off-chance that the programmer
changed the meaning of the function.
A batch-mode optimizer analyzing an entire file (module) should be able to
detect whether or not function names are rebound.
Perhaps module bindings should be considered immutable from outside the
module unless explicitly declared otherwise. (This would of course require a
new keyword and would break the rare existing code that does this until the
new directive was added.)
I'm thinking of no new keyword, but a mechanism which
allows me to lock a namespace somehow. At the end of a
library module, this function could be called for the module,
and all functions would be bound to a special table.
I would also like to be able to do this with functions
and classes. A function would be modified by resolving
all identifiers once. For classes, I would do a similar
snapshot, which would collect once every function/method which
can be seen by inheritance.
This is of course a semantic change, but it is an option
for people who know the implications. There is no need
for a library module to be slow, if it just suffers from
flexibility.
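Short of a real namespace-locking mechanism, a rough approximation of this snapshot idea exists already: the (admittedly ugly) default-argument trick, which binds a global into a fast local once, at function definition time. A minimal sketch:

```python
import math

# Normal version: "math" and "sqrt" are looked up on every call.
def norm(v):
    total = 0.0
    for x in v:
        total += x * x
    return math.sqrt(total)

# Snapshot version: math.sqrt is resolved once, when "def" executes,
# and stored as a default argument, which Python reads as a fast local.
def norm_fast(v, _sqrt=math.sqrt):
    total = 0.0
    for x in v:
        total += x * x
    return _sqrt(total)
```

Rebinding math.sqrt afterwards no longer affects norm_fast -- which is exactly the semantic change described here, opted into explicitly by the programmer.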

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Randall Hopper
1999-04-30 12:08:05 UTC
Permalink
Christian Tismer:
|Terry Reedy wrote:
|> A batch-mode optimizer analyzing an entire file (module) should be able to
|> detect whether or not function names are rebound.
|>
|> Perhaps module bindings should be considered immutable from outside the
|> module unless explicitly declared otherwise.
|
|I'm thinking of no new keyword, but a mechanism which allows me to lock a
|namespace somehow.

I like this idea in concept. Though I would prefer a way to have
namespaces "lock by default". Examples: After a class definition, the
class function dictionary is locked. After a module is fully read, all
references are bound and the module namespace is locked. etc.

Randall
Terry Reedy
1999-04-29 14:35:04 UTC
Permalink
In article <slrn7ieipq.8uk.wtanksle at dolphin.openprojects.net>,
wtanksle at dolphin.openprojects.net says...
The reason it's slow is its runtime model. _Every_ function call requires
a lookup in a hash table, just on the off-chance that the programmer
changed the meaning of the function.
A batch-mode optimizer analyzing an entire file (module) should be able to
detect whether or not function names are rebound.

Perhaps module bindings should be considered immutable from outside the
module unless explicitly declared otherwise. (This would of course require a
new keyword and would break the rare existing code that does this until the
new directive was added.)

TJR
Markus Kohler
1999-04-29 08:17:41 UTC
Permalink
[deletia]
William> Your data is correct (Python is slow for many things,
William> slower than it needs to be), but your conclusion is
William> wrong. Python isn't slow because its bytecode engine is
William> slow; actually, although there's a lot of room for
William> improvement, its bytecode engine is faster than many of
William> the others out there.

It seems to me that *most* of the byte code engine is ok.

William> The reason it's slow is its runtime model. _Every_
William> function call requires a lookup in a hash table, just on
William> the off-chance that the programmer changed the meaning of
William> the function.

That is the same in Smalltalk (squeak). You can build new classes and new
methods at runtime and everything still works as expected.
It seems to me that without a redesign of at least the bytecode
for function calls python's speed will not take off.
William> Bytecode won't help enough -- the whole calling model
William> needs to be examined. Fortunately, that's one of the
William> things the 2.0 design process will be looking at. Like
William> you, I hope that they consider Smalltalk as an example.

I hope so too. My point was exactly that the bytecode for doing the
function call seems to be the problem.

William> And Oberon (SlimBinaries), and Eiffel (typing and general
William> compile-time error catching), and ...

OPTIONAL types would be cool.


Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Graham Matthews
1999-04-30 15:53:22 UTC
Permalink
William Tanksley (wtanksle at dolphin.openprojects.net) wrote:
: Your data is correct (Python is slow for many things, slower than it needs
: to be), but your conclusion is wrong. Python isn't slow because its
: bytecode engine is slow; actually, although there's a lot of room for
: improvement, its bytecode engine is faster than many of the others out
: there.
:
: The reason it's slow is its runtime model. _Every_ function call requires
: a lookup in a hash table, just on the off-chance that the programmer
: changed the meaning of the function.

I don't buy this. As far as I know every function call in Self and
Smalltalk also requires a lookup (just like Python), but both languages
are significantly faster than Python it seems.
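Indeed, Self and Smalltalk pay for dynamic lookup too; they stay fast largely through method caches and (polymorphic) inline caches, which remember the result of the last lookup at each call site. A toy sketch of the idea, with invented class names:

```python
class SendSite:
    """Toy inline cache: remember the last receiver class and its
    looked-up method, and skip the dictionary search on a hit."""
    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None
        self.cached_method = None

    def send(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:      # cache miss: full lookup
            self.cached_class = cls
            self.cached_method = getattr(cls, self.selector)
        return self.cached_method(receiver, *args)

class Point:
    def __init__(self, x):
        self.x = x
    def double(self):
        return 2 * self.x

site = SendSite("double")
site.send(Point(3))   # miss: one real lookup, then cached
site.send(Point(4))   # hit: same class, no dictionary search
```

In a real VM the cache lives in the compiled call instruction itself, so monomorphic sends cost little more than a direct call.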

graham
--
This ain't no technological breakdown
Oh no, this is the road to hell
Reimer Behrends
1999-04-29 23:27:10 UTC
Permalink
You are right that one should choose the right tool for a problem, but
I disagree that Python is optimized for the general case. Squeak a free
Smalltalk implementation (www.squeak.org), is already much faster ( about
3 times actually ) than python and it has even a true Garbage
Collector.
This is a little off-topic but I'm curious whether squeak has an embedding
API. Is there any language that is as easy to embed as Python, and also has
full garbage collection?
Ruby. (Was discussed here before.) See

http://www.netlab.co.jp/ruby/

I think it is actually easier to extend with C code (because you don't
have to keep track of reference counts, among other things). On the
other hand, I could live without some of the more Perl-inspired
constructs in Ruby (nothing against Perl in general; it just happens
that its feature set is the opposite of what I personally like in
a programming language -- i.e. thousands of different idioms and
extensive use of punctuation symbols).

Reimer Behrends
Paul Prescod
1999-04-30 14:25:27 UTC
Permalink
Post by Reimer Behrends
Is there any language that is as easy to embed as Python, and also has
full garbage collection?
Ruby. (Was discussed here before.) See
Maybe you can describe its embedding API. How does it handle the issue of
C pointers to objects that are moving around? How does it know if the
embedding code has buried a pointer away somewhere?
--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

"Microsoft spokesman Ian Hatton admits that the Linux system would have
performed better had it been tuned."
"Future press releases on the issue will clearly state that the research
was sponsored by Microsoft."
http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp
Friedrich Dominicus
1999-04-30 17:26:37 UTC
Permalink
[deletia]
Mitchell> If it's not too much trouble, could you post your
Mitchell> benchmark code and results, either here or on a web
Mitchell> page?
Of course I only did useless microbenchmarks ;-)
If you have some of them please send me a copy and allow me to add them
too:

http://edt.vol.cz:81/bench/

I'd like to collect more of them, to get more data points and make the
comparison fairer.
Of course I'll write them in other languages too for comparison. If you
do such a test, would you be so kind as to document how long the development
took and how many bugs you had to fix? This is IMO another important
point for judging when a language is appropriate and when not.


Regards
Friedrich
Paul Prescod
1999-04-28 15:43:28 UTC
Permalink
You are right that one should choose the right tool for a problem, but
I disagree that Python is optimized for the general case. Squeak a free
Smalltalk implementation (www.squeak.org), is already much faster ( about
3 times actually ) than python and it has even a true Garbage
Collector.
This is a little off-topic but I'm curious whether squeak has an embedding
API. Is there any language that is as easy to embed as Python, and also has
full garbage collection?
--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

"Microsoft spokesman Ian Hatton admits that the Linux system would have
performed better had it been tuned."
"Future press releases on the issue will clearly state that the research
was sponsored by Microsoft."
http://www.itweb.co.za/sections/enterprise/1999/9904221410.asp
Markus Kohler
1999-04-29 08:13:46 UTC
Permalink
You are right that one should choose the right tool for a
problem, but I disagree that Python is optimized for the
general case. Squeak a free Smalltalk implementation
(www.squeak.org), is already much faster ( about 3 times
actually ) than python and it has even a true Garbage
Collector.
Paul> This is a little off-topic but I'm curious whether squeak
Paul> has an embedding API. Is there any language that is as easy
Paul> to embed as Python, and also has full garbage collection?

Not really off-topic, because not using a GC makes embedding much easier:
objects do not move around. Squeak has 'plugins', but these are not yet good
enough to allow embedding of arbitrary C code. The main problem is of course
that some C code cannot use Squeak-allocated objects, because the address of
an object may change due to garbage collection. Also, Squeak currently lacks
an easy general mechanism to call back into Squeak code from C.

In summary, one can say that with a decent GC (generational copying + mark
and sweep) embedding is more difficult. Anyway, I think a fairly simple
embedding API could be done; it's just more work, to be done once.
Visualworks, a commercial Smalltalk implementation, shows it's possible.
They also have a COM integration, for example.

I don't know of any other language with a decent garbage collector that is
easy to embed.

Maybe SmallEiffel does it. It is a free Eiffel compiler that translates to
C, has an (optional) GC, and allows calling C code.

Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Greg Ewing
1999-05-04 03:30:14 UTC
Permalink
Is there any language that is as easy to embed as Python, and also has
full garbage collection?
Before I discovered Python, I played around with
Elk Scheme quite a lot. Its extension/embedding
interface is very nice - perhaps even nicer than
Python's, since there are no refcounts to lose
sleep over. And, like all Lisp/Scheme variants,
it has full GC.

I don't know how its speed compares with Python.
In principle, I imagine that a bytecoded Scheme
implementation could be made somewhat faster than
Python, because Scheme isn't quite so pathologically
dynamic - global var references can be implemented
without requiring a dictionary lookup on every
access, for example. But then Scheme doesn't
have modules or classes as part of the core language,
so it's hard to compare them directly.

Greg
Markus Kohler
1999-04-28 07:57:18 UTC
Permalink
[deletia]

Brian> Arne,

Brian> While I'm not going to go near comparing Python to Perl, I
Brian> will comment that different languages are just that -
Brian> different. As such, the approach you would take in one
Brian> language may not be the most appropriate (or comparable in
Brian> speed or efficiency) to the approach you would take in
Brian> another.

Brian> The question here (IMHO) is not Python's appropriateness
Brian> for processing large datasets (a fair number of
Brian> scientist-types do this all the time), or even the speed of
Brian> Python in general, but using the most appropriate
Brian> algorithms in the context of the language in use.

Python would be appropriate for many more problems if only it were as fast
as other scripting languages. The bytecode interpreter IS significantly slower
than other bytecode-interpreted languages. Since we all know that Python
is more productive than most other languages, this sooner or later becomes an
issue, because one would not be able to do some tasks in Python because it is
just too slow.

Brian> For example, Perl is very regex-centric, so your example
Brian> Perl implementation is probably perfectly appropriate for
Brian> Perl. Python tends to be more optimized for the general
Brian> case, and if it were _me_, I wouldn't bother with using
Brian> regular expressions in this case,. Since you have a
Brian> predictable file format, there are more specific (and
Brian> efficient) Python tools that you could use here.

You are right that one should choose the right tool for a problem, but
I disagree that Python is optimized for the general case. Squeak, a free
Smalltalk implementation (www.squeak.org), is already much faster (about
3 times, actually) than Python, and it even has a true Garbage
Collector. As soon as the first Just-in-Time Compiler has been
finished, it may even come close to Visualworks, a commercial
Smalltalk implementation with a Just-in-Time Compiler. At the moment
Visualworks is still at least 5 times faster than Squeak.
I tell you all this just to give you some idea how fast a language that is
as flexible as Python could be.

[deletia]
From profiling Python 1.5.2c I found that Python's main problem is that
calling functions seems to be very costly. Although I have not yet looked at
this issue in detail, it is obvious that the code to call a
Python function is pretty complex.

I guess this is because Python supports default arguments and other
'nice' features for calling functions.

It seems to me that without a redesign of at least the bytecode for
function calls python's speed will not take off.

Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Mitchell Morris
1999-04-28 11:24:37 UTC
Permalink
In article <p5g15lmb35.fsf at bidra241.bbn.hp.com>, Markus Kohler wrote:
[snip]
Python would be appropriate for many more problems if only it were as fast
as other scripting languages. The bytecode interpreter IS significantly slower
than other bytecode-interpreted languages.
[snip]
You are right that one should choose the right tool for a problem, but
I disagree that Python is optimized for the general case. Squeak a free
Smalltalk implementation (www.squeak.org), is already much faster ( about
3 times actually ) than python and it has even a true Garbage
Collector.
[snip]
From profiling python 1.5.2c I found that python's main problem is that
calling functions seems to be very costly.
[snip]
Markus
--
Markus Kohler mailto:markus_kohler at hp.com
If it's not too much trouble, could you post your benchmark code and results,
either here or on a web page?

+Mitchell
--
Mitchell Morris

Underlying Principle of Socio-Genetics: Superiority is recessive.
Markus Kohler
1999-04-30 11:56:31 UTC
Permalink
The numbers for tak(18, 12, 6) -- see my previous post.

Python-1.5.2c1 compiled with HP compiler "cc -O"
0.52 seconds

Squeak (www.squeak.org) 2.4 running with a 2.3 virtual machine compiled with gcc (egcs 1.02) -O
0.13 seconds. This is compiled with gcc since it uses some gcc extensions that speed up the
case statement for the bytecode dispatching. Even though the HP compiler produces much better
code than gcc in general, in this case the gcc-compiled Squeak is usually faster.

Of course the comparison is not really fair, because Squeak might
or might not run the garbage collector during the test (probably not).

But I doubt that this would explain a factor of 4.

Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Markus Kohler
1999-04-30 07:31:51 UTC
Permalink
[deletia]

Mitchell> If it's not too much trouble, could you post your
Mitchell> benchmark code and results, either here or on a web
Mitchell> page?

Of course I only did useless microbenchmarks ;-)

One was :
# benchmark in Python
# see <url:http://www.lib.uchicago.edu/keith/crisis/benchmarks/tak/>
# Keith Waclena <k-waclena at uchicago.edu>

def tak(x, y, z):
    if not (y < x):
        return z
    else:
        return tak(tak(x-1, y, z), tak(y-1, z, x), tak(z-1, x, y))


This is usually a pretty good test of function call speed.
Since in Smalltalk really everything is object-oriented, I could not
just define a global function that is not a method of some class.
Therefore I decided to use class methods and introduce a class TestRec
just for that:

takx: x y: y z: z
    "Test function call speed"
    ^(y < x)
        ifFalse: [z]
        ifTrue: [TestRec takx: (TestRec takx: (x-1) y: y z: z)
                 y: (TestRec takx: (y-1) y: z z: x)
                 z: (TestRec takx: (z-1) y: x z: y)]

I don't have the exact numbers anymore, but the latest version of Python was about 3 times slower than Squeak.
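For anyone who wants to reproduce the comparison, a self-contained harness (repeating the tak definition so it runs on its own; absolute times are of course machine-dependent):

```python
import time

def tak(x, y, z):
    # McCarthy's "tak" benchmark: almost nothing but function calls.
    if not (y < x):
        return z
    return tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))

start = time.time()
result = tak(18, 12, 6)
print(result, time.time() - start)
```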

Other things I tested were simple loops like this :

#!/usr/local/bin/python
import time

def counter():
    i = 0
    start = time.time()
    for i in range(10000):
        j = 0.0
        for k in range(100):
            j = j * k
    return time.time() - start

print counter()

Even though Smalltalk's loops are more powerful than Python's "for" loop,
the result was almost the same as for tak.


Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Greg Ewing
1999-05-04 22:31:10 UTC
Permalink
From profiling python 1.5.2c I found that python's main problem is that
calling functions seems to be very costly.
This is just a guess, but one contributor to that might be
the way that it packs all the arguments up into a tuple,
and then immediately unpacks them again upon entering
the function.

Also, it allocates each stack frame as a separate object.
So that's at least two memory allocation/free operations
per procedure call.

A more efficient way would be to use one big stack for
all procedure calls, and just leave the parameters on
the stack in the usual case where a procedure with a
fixed number of arguments is called without any keyword
args, etc.
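The calling sequence described here can be caricatured at the Python level. This is a toy model only, not CPython's actual C code: the caller packs the arguments into a fresh tuple, and the callee allocates a fresh "frame" and unpacks them again.

```python
# Toy model of the call path: two allocations per call, as in the text.
def simulated_call(params, body, *args):
    packed = tuple(args)                  # allocation 1: argument tuple
    frame = dict(zip(params, packed))     # allocation 2: frame storage
    return body(frame)                    # run the body with its "locals"

result = simulated_call(("x", "y"), lambda f: f["x"] + f["y"], 2, 3)
```

The big-stack scheme suggested above would skip both allocations in the common case by leaving the arguments where the caller pushed them.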

Greg
Christian Tismer
1999-05-05 13:24:17 UTC
Permalink
Hi Guys, I think we are missing something
Post by Greg Ewing
Also, it allocates each stack frame as a separate object.
So that's at least two memory allocation/free operations
per procedure call.
...
iirc, the self people discovered that two optimizations
-- method and block in-lining
...
-- call-site caching
This is all wonderful stuff. But I think we are
missing one point, and if I didn't go completely nuts
last night, there might be some fairly low hanging
fruit which could save 30% of runtime, at least speaking of NT.

After Markus' recommendation, I loaded a trial version of
Visual Quantify, a tool for timing of compiled code.
This tool works with optimized code very well, finds
every system call and so on.

I configured Quantify to ignore all .dll files but python15.dll
and msvcrt.dll, to get an idea where the core spends
most of its time.

Test 1: Running python.exe with the commandline
-c "from test.pystone import *;map (lambda x:main(), range(10))"

Result time: 22.36% malloc, 14.35% free.

Test 2: Running pythonw.exe and Idle, and removing again
all system calls and Tcl/Tk calls, just keeping the two dlls as
above:

Result time: 19.47% malloc, 12.29% free.

Which tells me that if we could avoid most of these calls
and provide a very fast allocation scheme for small blocks,
a single extension module could win us around 25%!

This all under the assumption that these results were reliable.

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Markus Kohler
1999-05-07 09:51:20 UTC
Permalink
Guido van Rossum <guido at eric.cnri.reston.va.us> writes:

[deletia]
You call avoiding malloc() and free() low-hanging fruit? You must be
deluded <0.5 wink>. All the low-hanging fruit has already been
picked: special allocation strategies for ints, frames, even small
tuples! You could try linking in GNU malloc and see if it's faster
than native malloc -- now *that* is low-hanging fruit. (GNU malloc
has a reputation for being the fastest malloc arround; Microsoft's
doesn't :-)
I found that Doug Lea's malloc (dlmalloc) is also pretty good. Runs on Windows
and most Unixes.

Some Unixes support the mallopt call to tune malloc to give you more speed
by using more space.

Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Christian Tismer
1999-05-06 12:34:36 UTC
Permalink
Markus Kohler wrote:
<snip/>
Running pystone on HP-UX the result for 1.5.2c is 3.37% for malloc and
3.95% for free :-(
Are you absolutely sure? Did you run my tests, and did you
really clean up and filter out every other dll's timings?
Especially getc can trick you if you were in an interactive
session. Also, if you wiped too many dlls out, you might
miss the real workhorses: malloc and free might be thin
wrappers around a system routine.
You need to go through all the functions and libraries
to wipe exactly the right things.

Sorry if I sound so doubtful, but it took me a couple
of hours until I was sure enough to present my results. Can
you please check again?

Nevertheless, if it pays off under Windows, then I will
write a really good malloc just for Windows, as an extension
module.

cheers - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Gordon McMillan
1999-05-06 15:36:46 UTC
Permalink
Post by Christian Tismer
<snip/>
Running pystone on HP-UX the result for 1.5.2c is 3.37% for malloc and
3.95% for free :-(
Are you absolutely sure?
/snip snip/

I find Markus' numbers just-slightly-surprising. I don't find your
numbers for Windows surprising at all. After all, there are some
software companies who charge $800 or so for a drop in replacement,
and claim 30-50% speed ups. Allowing for marketing-inflation, that's
just about right.

(Actually, there are, or at least were, certain pathological cases
in which multithreading on a multiprocessor Windows box would slow
to a crawl because of poorly designed locking within malloc / free.)

(And, in MS's defense, let me point out that their debugging versions
are very nice - they let you track down rogue pointers / premature
frees much faster than most other libs.)

- Gordon
Guido van Rossum
1999-05-06 12:42:26 UTC
Permalink
Post by Christian Tismer
This is all wonderful stuff. But I think we are
missing one point, and if I didn't go completely nuts
last night, there might be some fairly low hanging
fruit which could save 30% of runtime, at least speaking of NT.
Test 1: Running python.exe with the commandline
-c "from test.pystone import *;map (lambda x:main(), range(10))"
Result time: 22.36% malloc, 14.35% free.
Test 2: Running pythonw.exe and Idle, and removing again
all system calls and Tcl/Tk calls, just keeping the two dlls as
Result time: 19.47% malloc, 12.29% free.
Which tells me that if we could avoid most of these calls
and provide a very fast allocation scheme for small blocks,
I believe a single extension module would gain 25% of win!
You call avoiding malloc() and free() low-hanging fruit? You must be
deluded <0.5 wink>. All the low-hanging fruit has already been
picked: special allocation strategies for ints, frames, even small
tuples! You could try linking in GNU malloc and see if it's faster
than native malloc -- now *that* is low-hanging fruit. (GNU malloc
has a reputation for being the fastest malloc around; Microsoft's
doesn't :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Christian Tismer
1999-05-06 17:30:10 UTC
Permalink
[a little Windoze timing]
Post by Christian Tismer
Which tells me that if we could avoid most of these calls
and provide a very fast allocation scheme for small blocks,
I believe a single extension module would gain 25% of win!
You call avoiding malloc() and free() low-hanging fruit? You must be
deluded <0.5 wink>. All the low-hanging fruit has already been
picked: special allocation strategies for ints, frames, even small
tuples! You could try linking in GNU malloc and see if it's faster
than native malloc -- now *that* is low-hanging fruit. (GNU malloc
has a reputation for being the fastest malloc around; Microsoft's
doesn't :-)
I don't like to pull a GNU licensed thing into Python.
Further, I did statistics on the malloc calls for
pystone and Idle, and it turns out that 99 percent
of all mallocs in a typical Python program are of
a size up to 128 bytes.
Therefore, my malloc will only handle these cases
in a very fast way, and redirect all other allocations
(including its own buffers) to the original
malloc function, which may spend the time it needs
for the one percent of cases.

This can be nearly as fast as your extra code
for ints, frames and small tuples.
I'm going to gather it.
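Christian's routing scheme can be sketched in present-day Python (purely illustrative: the names, the 8-byte grain, and the bytearray stand-in for raw memory are all invented here):

```python
# Sketch of a small-block allocator: requests up to 128 bytes are
# served from per-size-class free lists; everything larger falls
# through to the general allocator (modelled by a plain bytearray).

SMALL_LIMIT = 128
GRAIN = 8  # round sizes up to multiples of 8; one free list per class

free_lists = [[] for _ in range(SMALL_LIMIT // GRAIN)]

def pool_alloc(size):
    if size == 0 or size > SMALL_LIMIT:
        return bytearray(size)          # the rare "big" case: real malloc
    cls = (size + GRAIN - 1) // GRAIN - 1
    if free_lists[cls]:
        return free_lists[cls].pop()    # fast path: recycle a block
    return bytearray((cls + 1) * GRAIN)

def pool_free(block):
    size = len(block)
    if 0 < size <= SMALL_LIMIT and size % GRAIN == 0:
        free_lists[size // GRAIN - 1].append(block)
    # larger blocks would be handed back to the general allocator
```

Freeing and then reallocating a 100-byte block costs a list pop instead of a trip through malloc(), which is the whole point of the scheme.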

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Markus Kohler
1999-05-06 12:15:11 UTC
Permalink
Post by Christian Tismer
Hi Guys, I think we are missing something
Post by Greg Ewing
Also, it allocates each stack frame as a separate object.
So that's at least two memory allocation/free operations
per procedure call.
...
iirc, the self people discovered that two optimizations
-- method and block in-lining
...
-- call-site caching
This is all wonderful stuff. But I think we are
missing one point, and if I didn't go completely nuts
last night, there might be some fairly low hanging
fruit which could save 30% of runtime, at least speaking of NT.
After Markus' recommendation, I loaded a trial version of
Visual Quantify, a tool for timing of compiled code.
This tool works with optimized code very well, finds
every system call and so on.
I configured Quantify to ignore all .dll files but python15.dll
and msvcrt.dll, to get an idea where the core spends
most of its time.
Test 1: Running python.exe with the commandline
-c "from test.pystone import *;map (lambda x:main(), range(10))"
Result time: 22.36% malloc, 14.35% free.
Running pystone on HP-UX the result for 1.5.2c is 3.37% for malloc and
3.95% for free :-(


Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Markus Kohler
1999-05-05 13:35:42 UTC
Permalink
Post by Christian Tismer
Hi Guys, I think we are missing something
Post by Greg Ewing
Also, it allocates each stack frame as a separate object.
So that's at least two memory allocation/free operations
per procedure call.
...
iirc, the self people discovered that two optimizations
-- method and block in-lining
...
-- call-site caching
This is all wonderful stuff. But I think we are
missing one point, and if I didn't go completely nuts
last night, there might be some fairly low hanging
fruit which could save 30% of runtime, at least speaking of NT.
After Markus' recommendation, I loaded a trial version of
Visual Quantify, a tool for timing of compiled code.
This tool works with optimized code very well, finds
every system call and so on.
I configured Quantify to ignore all .dll files but python15.dll
and msvcrt.dll, to get an idea where the core spends
most of its time.
Test 1: Running python.exe with the commandline
-c "from test.pystone import *;map (lambda x:main(), range(10))"
Result time: 22.36% malloc, 14.35% free.
Test 2: Running pythonw.exe and Idle, and removing again
all system calls and Tcl/Tk calls, just keeping the two dlls as
Result time: 19.47% malloc, 12.29% free.
Which tells me that if we could avoid most of these calls
and provide a very fast allocation scheme for small blocks,
I believe a single extension module would gain 25% of win!
I think I have seen that too. But as far as I can remember It didn't
appear in all my test cases. I would need to check that.

Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Markus Kohler
1999-05-05 12:18:54 UTC
Permalink
Post by Greg Ewing
From profiling python 1.5.2c I found that python's main problem is that
calling functions seems to be very costly.
This is just a guess, but one contributor to that might be
the way that it packs all the arguments up into a tuple,
and then immediately unpacks them again upon entering
the function.
Aha ...
Post by Greg Ewing
Also, it allocates each stack frame as a separate object.
So that's at least two memory allocation/free operations
per procedure call.
one could build a custom allocation routine that would preallocate
a block of memory. If the frame objects are the same size this could
eliminate most of the allocation/deallocation time.
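The preallocation idea can be illustrated with a simple object pool (a hypothetical sketch; CPython's actual frame free list lives in frameobject.c, quoted elsewhere in this thread):

```python
# Object-pool sketch: frames of identical shape are recycled through a
# free list instead of being allocated and freed on every call.

class Frame:
    """Stand-in for an interpreter stack frame."""
    def __init__(self):
        self.locals = {}

_free_frames = []      # recycled, ready-to-reuse Frame objects

def new_frame():
    if _free_frames:
        return _free_frames.pop()   # fast path: no allocation at all
    return Frame()

def release_frame(frame):
    frame.locals.clear()            # drop references before recycling
    _free_frames.append(frame)
```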
Post by Greg Ewing
A more efficient way would be to use one big stack for
all procedure calls, and just leave the parameters on
the stack in the usual case where a procedure with a
fixed number of arguments is called without any keyword
args, etc.
Could you explain a bit more about the keyword args?
What's the problem with them?

Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Markus Kohler
1999-05-05 12:55:32 UTC
Permalink
Post by Markus Kohler
Post by Greg Ewing
Also, it allocates each stack frame as a separate object.
So that's at least two memory allocation/free operations
per procedure call.
one could build a custom allocation routine that would preallocate
a block of memory. If the frame objects are the same size this could
eliminate most of the allocation/deallocation time.
/* Stack frames are allocated and deallocated at a considerable rate.
In an attempt to improve the speed of function calls, we maintain a
separate free list of stack frames
...
iirc, the self people discovered that two optimizations
-- method and block in-lining
or in other words, avoid creating new call frames whenever
possible. Python is already a bit better off than Smalltalk
and Self (since the structural verbs don't create new
blocks).
-- call-site caching
that is, for each call-site in the source program, keep track
of the target used the last time. instead of doing a dynamic
lookup, just jump there and let the method check that "self"
is what it expected.
This is what I was trying to describe in my proposal.
Unfortunately I'm a poor German and English is not my native language,
and "call-site" was the key word I couldn't remember.
not sure if/how the latter can be combined with Python's
current semantics (iirc, self uses lots of extra points to be
able to update things when you dynamically modify classes
and individual instances...)
I think it can be applied.

There are actually two strategies: one is to keep track of the target
used the last time, and the other is just to keep the first target.
The latter has the advantage of not needing a lot of updates, but is
probably less accurate.
If you only track one target, all this is not very difficult to
implement. Invalidating caches if the code for the call-site changes
should be pretty easy.

What Self does is far more difficult. It records the last <n> targets.
It does that to inline code which makes
recompiling code necessary in case the inlined code changes.

MUCH more difficult ...

Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Fredrik Lundh
1999-05-05 12:50:33 UTC
Permalink
Post by Markus Kohler
Post by Greg Ewing
Also, it allocates each stack frame as a separate object.
So that's at least two memory allocation/free operations
per procedure call.
one could build a custom allocation routine that would preallocate
a block of memory. If the frame objects are the same size this could
eliminate most of the allocation/deallocation time.
use the source, luke:

from frameobject.c:

/* Stack frames are allocated and deallocated at a considerable rate.
In an attempt to improve the speed of function calls, we maintain a
separate free list of stack frames

...

iirc, the self people discovered that two optimizations
dominated everything:

-- method and block in-lining

or in other words, avoid creating new call frames whenever
possible. Python is already a bit better off than Smalltalk
and Self (since the structural verbs don't create new
blocks).

-- call-site caching

that is, for each call-site in the source program, keep track
of the target used the last time. instead of doing a dynamic
lookup, just jump there and let the method check that "self"
is what it expected.

not sure if/how the latter can be combined with Python's
current semantics (iirc, self uses lots of extra points to be
able to update things when you dynamically modify classes
and individual instances...)
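In Python terms, a monomorphic version of such a call-site cache might look like the following sketch (invented names; this illustrates the idea, not how Self or CPython actually implement it):

```python
# One cache per call site: remember the class seen last time and the
# method it resolved to; only fall back to a dynamic lookup when the
# receiver's class changes.

class CallSiteCache:
    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_class = None
        self.cached_func = None

    def call(self, obj, *args):
        cls = obj.__class__
        if cls is not self.cached_class:     # cache miss: dynamic lookup
            self.cached_class = cls
            self.cached_func = getattr(cls, self.method_name)
        return self.cached_func(obj, *args)  # cache hit: direct call

class A:
    def greet(self):
        return "A"

class B:
    def greet(self):
        return "B"

site = CallSiteCache("greet")   # one of these per call site
```

Invalidating such caches when a class is modified at runtime is exactly the complication with Python's semantics mentioned above.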

</F>
Guido van Rossum
1999-05-04 23:53:31 UTC
Permalink
Post by Greg Ewing
From profiling python 1.5.2c I found that python's main problem is that
calling functions seems to be very costly.
This is just a guess, but one contributor to that might be
the way that it packs all the arguments up into a tuple,
and then immediately unpacks them again upon entering
the function.
Ever since 1.4 it doesn't do that any more (except when using apply()).
Post by Greg Ewing
Also, it allocates each stack frame as a separate object.
So that's at least two memory allocation/free operations
per procedure call.
And the stack frame allocations are optimized.
Post by Greg Ewing
A more efficient way would be to use one big stack for
all procedure calls, and just leave the parameters on
the stack in the usual case where a procedure with a
fixed number of arguments is called without any keyword
args, etc.
Some of this is done already -- a copy takes place but it's very fast.
The stack frames can't be allocated as a true stack because (1) they
are variable-length and (2) some stack frames may survive (through
tracebacks etc.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum
1999-05-05 23:09:45 UTC
Permalink
Maybe there could be an alternative interface
PyObject *myFunc(int argc, PyObject *argv[]);
for the case where the function has a fixed
arg list. Plus a PyArg_ParseArgv function or
something to go with it.
This seems like a good idea. It will take some work to fully support
it so I'm not sure it will happen in 1.6 but it seems right for 2.0!

--Guido van Rossum (home page: http://www.python.org/~guido/)
Greg Ewing
1999-05-05 22:10:43 UTC
Permalink
Post by Guido van Rossum
Ever since 1.4 it doesn't do that any more (except when using apply()).
Ah, that's good to know.

However, when calling a C function, it still
creates a tuple of args, doesn't it? That's
always struck me as a bit inefficient - sort
of niggles at me every time I write an
extension.

Maybe there could be an alternative interface
for C functions:

PyObject *myFunc(int argc, PyObject *argv[]);

for the case where the function has a fixed
arg list. Plus a PyArg_ParseArgv function or
something to go with it.

Greg
tomjenkins
1999-04-26 20:47:29 UTC
Permalink
In article <613145F79272D211914B0020AFF6401914DAD8 at gandalf.digicool.com>,
There are also some general optimizations that can be used in
places where speed is an issue, such as avoiding repeated
attribute lookups (esp. in loops). This version of your read_write
function uses the same basic algorithm, but forgoes re for more
specific tools (slicing, string.split) and has some examples of
optimizations to minimize attribute lookups. I haven't timed it
or anything, but I'd be surprised if it wasn't noticeably
faster.
Brian, just to followup on your post I profiled his original code and yours:
PII 450, 128M, WinNT
Original: 5.126 seconds
Your Ver: 1.512 seconds

Tom

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own
Christian Tismer
1999-05-07 15:20:57 UTC
Permalink
[various, about malloc/free performance]
[Christian Tismer]
...
A malloc just for Python - would it need locking at all?
Since the interpreter is locking already, I guess malloc
would be always called in a safe way.
Right. Note that Vladimir has already put a great deal of work into this
area; see
http://starship.python.net/crew/vlad/pymalloc/
IIRC, paradoxically this yielded bigger speedups on non-Windows systems.
windows-vm-flushes-most-recently-used<0.9-wink>-ly y'rs - tim
Thanks a lot. Now I have so many ways to go that I cannot
decide. Vlad uses explicit bins, dedicated storage. While
this can be very fast, it is a generalization of Guido's
pre-allocated arrays for some kinds of objects. Nobody
can tell me if this is the way to go, or if it will
under circumstances yield much fragmentation. Doug Lea's
code is more general, since his bins are linked lists of
free space which can coalesce again. Takes more time and
less fragmentation. And J.C. Wippler gave me a very fast
allocator for small objects which uses a power of two
size scheme, with memory blocks arranged in a way that
their size is encoded in the address (portable!).

I guess finding out "the right way"(TM) is NP-complete.

If I knew in advance how long certain objects will live,
then I would be able to put them into the proper place.
Does it make sense to do statistics on object lifetime?
But the only known thing is allocation size.

Also, Python knows a lot more than malloc: A list(dict)
is always two things: A header which cannot move, and
a body which can move and grow and shrink. Could make
sense to use different strategies for heads and bodies.
Bad again with strings and tuples which have varying
size and are not moveable. I would be much happier if
we paid the one level of indirection but had all the
varying sizes moveable, since this would allow for some
compaction as well.

But I know it is too late :-)

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Markus Kohler
1999-05-03 08:41:32 UTC
Permalink
Python-1.5.2c1 .... 0.52 seconds
Squeak .... 0.13 seconds.
Marko> I wouldn't put too much meaning in this number. You can't
Marko> say how big the startup costs are for a python
Marko> vs. squeak.

I'm not really interested in the startup costs. One can make them near
zero for squeak, by compiling the image into an executable. These are
the times without startup costs.

Marko> You mentioned Garbage Collection. There are other factors,
Marko> that might make other (longer running?) examples very
Marko> different.

I ran this example 100 times. I think that's long enough.

Marko> Still it would be nice, if python were faster.

Agreed. And profiling with quantify (thanks for the nice support)
seems to suggest that most time in Python is spent in the bytecode for
function calls. I took only a quick look at the code, but it seems to me
that the code for calling a function is pretty complex.

Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Guido van Rossum
1999-05-03 17:29:09 UTC
Permalink
Post by Markus Kohler
Python-1.5.2c1 .... 0.52 seconds
Squeak .... 0.13 seconds.
Marko> I wouldn't put too much meaning in this number. You can't
Marko> say how big the startup costs are for a python
Marko> vs. squeak.
I'm not really interested in the startup costs. One can make them near
zero for squeak, by compiling the image into an executable. These are
the times without startup costs.
Markus, I think Marko was asking how you measured the times. If you
used something like "time python bench.py" you were measuring Python's
(considerable) start-up time, and you are misrepresenting Python's
speed. On the other hand, if you used Python's time.clock():
t1 = time.clock()
bench()
t2 = time.clock()
you're fine. Without this information your numbers are meaningless
(at least to skeptics :-).

--Guido van Rossum (home page: http://www.python.org/~guido/)
Markus Kohler
1999-05-04 10:17:18 UTC
Permalink
"Marko" == Marko Schulz
Python-1.5.2c1 .... 0.52 seconds
Squeak .... 0.13 seconds.
Marko> I wouldn't put too much meaning in this number. You can't
Marko> say how big the startup costs are for a python vs. squeak.
I'm not really interested in the startup costs. One can make
them near zero for squeak, by compiling the image into an
executable. These are the times without startup costs.
Guido> Markus, I think Marko was asking how you measured the
Guido> times. If you used something like "time python bench.py"
Guido> you were measuring Python's (considerable) start-up time,
Guido> and you are misrepresenting Python's speed. On the other
Guido> hand, if you used Python's time.clock(): t1 = time.clock()
Guido> bench() t2 = time.clock() you're fine. Without this
Guido> information your numbers are meaningless (at least to
Guido> skeptics :-).

Ok, I admit that I also tested with time python bench.py.
But the result is in this case almost the same, because I repeated the
test 100 times. That means the test runs almost one minute, startup time
becomes unimportant for such a small test.

Remeasuring with time gave me the same result.

By the way the call I used is tak(18, 12, 6). I forgot to mention that in
my first post.
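For reference, here is how such a measurement can be done so that interpreter start-up is excluded, per Guido's advice. The tak definition below is the Takeuchi function as usually written in the Lisp benchmarks; Markus doesn't show his exact code, so treat this as an assumption:

```python
import time

def tak(x, y, z):
    # Takeuchi function: almost nothing but function calls, which is
    # exactly what makes it a good call-overhead benchmark.
    if y >= x:
        return z
    return tak(tak(x - 1, y, z),
               tak(y - 1, z, x),
               tak(z - 1, x, y))

start = time.perf_counter()   # the 1999 thread used time.clock()
result = tak(18, 12, 6)
elapsed = time.perf_counter() - start
print(result, elapsed)
```

Timing only the call between the two clock reads is what makes the numbers comparable across languages with very different start-up costs.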

Anyway, what I just want to say is that there is room for improvement and that
it's worth the effort.
I also measured the function call speed of the language 'ruby' that
R. Beherends mentioned here. It runs the function call speed test
almost at the same speed as python.

Markus
--
Markus Kohler mailto:markus_kohler at hp.com
Tim Peters
1999-05-02 16:09:58 UTC
Permalink
[Michael Hudson, writes some seriously disturbed code for capturing
current bindings of non-local names]

Ooh -- that is cute!
Post by Michael Hudson
...
A word of warning: this code is nasty, *nasty*, NASTY. Possibly the
most horrible thing you will see perpetrated in Python this year.
Watch out, or Gordon will revive a metaclass thread <wink>.
Post by Michael Hudson
It applies regular expressions to strings of bytecode...
Didn't need to, though. That is, string.replace would have worked as well
here. Regardless, it's vulnerable to picking up an inappropriate
bit-pattern by unlikely accident in either case, so should be made
bulletproof in that respect. Believe the attached does that (and is a bit
more frugal by appending values to the consts only if they're actually
referenced, and only once when they are).
Post by Michael Hudson
I made Python core repeatedly when debugging it.
That's why "new" is an undocumented module <0.1 wink>.
Post by Michael Hudson
However, it works. The returned functions are very fast. I wrote this
package because I wanted to avoid both the tackiness of the `default
argument hack' and the performance penalty of using classes to fake
closures.
Not to mention writing functions to write functions that get dynamically
compiled -- this is a reasonable hack for expert use, and with any luck at
all Guido will make it completely useless in Python2 <wink>.
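For contrast, the `default argument hack' mentioned above looks like this (a generic illustration, not code from Michael's package):

```python
def double(x):
    return 2 * x

def apply_all(values, double=double):
    # The default is evaluated once, at def time, so inside the loop
    # 'double' is a fast local lookup instead of a global one.
    return [double(v) for v in values]
```

It works, but it pollutes the function's signature, which is why people keep looking for alternatives.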
Post by Michael Hudson
... [getting more extreme] ...
The thing is, this would replace two argumented bytecodes with one,
changing the length of the codestring and you'd need to recompute
jumps. I haven't had the bloody-mindedness to get this to work yet.
At least Skip Montanaro has written a bytecode optimizer with reusable code
for all this kind of manipulation; offhand I don't have a URL, though.

rewriting-from-scratch-is-a-noble-python-tradition<wink>-ly y'rs - tim


A safer replacement for munge_code and hardcoded VM constants:

from dis import opname, HAVE_ARGUMENT
import string
LOAD_CONST = opname.index("LOAD_CONST")
LOAD_GLOBAL = opname.index("LOAD_GLOBAL")
del opname

def munge_code(code, vars):
    codelist = list(code.co_code)
    names = list(code.co_names)
    consts = list(code.co_consts)
    nameindex2value = {}
    for name, value in vars.items():
        try:
            index = names.index(name)
        except ValueError:
            pass
        else:
            nameindex2value[index] = value

    # Replace LOAD_GLOBAL instructions referring to names in vars w/
    # LOAD_CONST instructions referring to the name's current binding.
    nameindex2constindex = {}
    i, n = 0, len(codelist)
    while i < n:
        c = codelist[i]
        i = i+1
        op = ord(c)
        if op < HAVE_ARGUMENT:
            continue
        i = i+2
        if op != LOAD_GLOBAL:
            continue
        oparg = 256 * ord(codelist[i-1]) + ord(codelist[i-2])
        if not nameindex2value.has_key(oparg):
            # this name wasn't in vars
            continue
        if nameindex2constindex.has_key(oparg):
            index = nameindex2constindex[oparg]
        else:
            # first time this name is being replaced; create a slot
            # for it in the consts vector
            nameindex2constindex[oparg] = index = len(consts)
            consts.append(nameindex2value[oparg])
        codelist[i-3] = chr(LOAD_CONST)
        codelist[i-2] = chr(index & 0xff)
        codelist[i-1] = chr(index >> 8)
    assert i == n

    # copy_code_with_changes comes from the package under discussion
    return copy_code_with_changes(
        code,
        consts=tuple(consts),
        code=string.join(codelist, ''))
Gordon McMillan
1999-05-02 19:26:34 UTC
Permalink
Post by Tim Peters
[Michael Hudson, writes some seriously disturbed code for capturing
current bindings of non-local names]
Ooh -- that is cute!
Post by Michael Hudson
...
A word of warning: this code is nasty, *nasty*, NASTY. Possibly the
most horrible thing you will see perpetrated in Python this year.
Watch out, or Gordon will revive a metaclass thread <wink>.
>>> class A:
...     pass
...
>>> class B:
...     def __init__(self):
...         self.__class__.__bases__ = (A,)
...
>>> b = B()
>>> b.__class__
<class __main__.B at 7f8590>
>>> b.__class__.__bases__
(<class __main__.A at 7f7c30>,)

Oh crap. Now I really will have to kill you, Tim.

For those completely mystified, and with the urge to splatter their
remaining gray matter all over the walls...

Around last June, Tim & I goaded each other into creating a metaclass
hack that actually re-wrote the class-deriving-from-our-metaclass.
The normal metaclass trick is to proxy the user's class, so that the
magic is performed at runtime by "normal" getattr / setattr hacks. We
actually rewrote the user's class and built the magic into it. The
problem was that it only worked once - on the class directly deriving
from our metaclass. Sub-sub-classes got no magic. We both concluded
that the main obstacle was the (1.5.1) inability to assign to
__bases__.

When 1.5.2 came out, I studiously managed to ignore the fact that one
can now assign to __bases__. And I _was_ following this thread from a
nice safe distance, since there were enough psychos involved already.
Grrr...

Well, Tim, you can take some comfort in the fact that I _will_ get
that free dinner out of you before sending you down to count the
fishies at the bottom of the Charles.

unless-Michael-or-Christian-want-to-play-the-sucker-on-this-one-ly
y'rs

- Gordon
Travis Oliphant
1999-05-03 03:37:22 UTC
Permalink
Exactly how slow are we talking about here?
I haven't noticed a problem with Python's speed. I used to do a lot of
interactive work in MATLAB (which is VERY slow). So from my perspective I
was getting a huge boost in speed. :-)
I want to use Python for an interactive program.
Anecdotal evidence from me says no problem.
David Steuber
1999-05-01 23:12:57 UTC
Permalink
Exactly how slow are we talking about here? I haven't done any of my
own testing yet. The tiny script that produces my signature runs fast
enough not to be noticed as different from an ordinary signature file
(it writes to a named pipe, code was posted here before).

I want to use Python for an interactive program. Speed is nice, but
it only has to be fast enough to not frustrate the user. I also
expect to use it for prototyping. The plan is to use the Python
interpreter in the application for an extension and control language.
Again, speed is nice, but it may not be critical.
--
David Steuber | s/trashcan/david/ if you wish to reply by mail

Help fight continental drift.
Christian Tismer
1999-05-06 16:48:17 UTC
Permalink
Post by Christian Tismer
<snip/>
Running pystone on HP-UX the result for 1.5.2c is 3.37% for malloc and
3.95% for free :-(
Are you absolutely sure?
/snip snip/
I find Markus' numbers just-slightly-surprising. I don't find your
numbers for Windows surprising at all. After all, there are some
software companies who charge $800 or so for a drop in replacement,
and claim 30-50% speed ups. Allowing for marketing-inflation, that's
just about right.
Hmm. Seems to me as if I could do the same thing for Python.
(Actually, there are, or at least were, certain pathological cases
in which multithreading on a multiprocessor Windows box would slow
to a crawl because of poorly designed locking within malloc / free.)
A malloc just for Python - would it need locking at all?
Since the interpreter is locking already, I guess malloc
would be always called in a safe way.
(And, in MS's defense, let me point out that their debugging versions
are very nice - they let you track down rogue pointers / premature
frees much faster than most other libs.)
Absolutely, this is nice. Although I still have a bad
memory error with Python running through many Excel files,
and with the debugging library, the error just doesn't
appear :-(

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
geek+
1999-05-07 16:22:14 UTC
Permalink
Post by Christian Tismer
Thanks a lot. Now I have so many ways to go that I cannot
decide. Vlad uses explicit bins, dedicated storage. While
[chomp]
Post by Christian Tismer
Also, Python knows a lot more than malloc: A list(dict)
is always two things: A header which cannot move, and
a body which can move and grow and shrink. Could make
sense to use different strategies for heads and bodies.
Bad again with strings and tuples which have varying
size and are not moveable. I would be much happier if
we would pay the one level of indirection, but we
had all the varying sizes moveable, since this would
allow for some compaction as well.
This actually seems to have potential for all kinds of objects where
there is a separation between the header (PyObject*) and "everything
else". Of course, it *still* makes allocation more difficult.
--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia. |
| After all, why else would it support only single inheritance?? |
=====================================================================
| Finger geek at cmu.edu for my public key. |
=====================================================================
Vladimir Marangozov
1999-05-07 19:53:38 UTC
Permalink
[about optimizing Python's memory management]
Post by Christian Tismer
Thanks a lot. Now I have so many ways to go that I cannot
decide.
If you promise to think twice before jumping into memory
management, I promise to help you and try to answer all your
questions. ;-)

Hint: before choosing a way to go, (1) look at the stats,
(2) decide where you want to go, (3) tell us why &
then go for it.
Post by Christian Tismer
Vlad uses explicit bins, dedicated storage. While
this can be very fast, it is a generalization of Guido's
pre-allocated arrays for some kinds of objects.
Not quite. It does more than it seems to do.
Post by Christian Tismer
Nobody can tell me
Ok. I won't tell you then ;-)
Post by Christian Tismer
if this is the way to go,
This is one way to go.
Post by Christian Tismer
or if it will under circumstances yield much fragmentation.
Under circumstances, all allocators yield much fragmentation.
Post by Christian Tismer
Doug Lea's
code is more general, since his bins are linked lists of
free space which can coalesce again. Takes more time and
less fragmentation.
DL's malloc is a general purpose allocator. It performs well
in the average case. The average case is not Python's case
(see the stats), so when you have 95% of small allocs, where
70% do not exceed 32 bytes, DL's allocator wastes a lot of
space for the links it stores in each block. Besides, as you
noticed, it takes more time.

You'd certainly want to do better, wouldn't you?
You want a special purpose allocator for Python, right?
If yes, DL's strategy is not optimal for you "as is".
Post by Christian Tismer
And J.C. Wippler gave me a very fast
allocator for small objects which uses a power of two
size scheme, with memory blocks arranged in a way that
their size is encoded in the address (portable!).
Must be a BSD-malloc variant. I think it wastes too much memory
in the Python case. But it's fast, yes (if you don't start
paging in&out).
Post by Christian Tismer
I guess finding out "the right way"(TM) is NP-complete.
Good guess.
Post by Christian Tismer
If I knew in advance how long certain objects will live,
The problem wouldn't be NP-complete :-)
Post by Christian Tismer
then I would be able to put them into the proper place.
Does it make sense to do statistics on object lifetime?
But the only known thing is allocation size.
...
But I know it is too late :-)
It's never too late. You've discovered so far 1/4 of the problems a
Python malloc should solve in order to be acceptable ;-)

Still want to dig into memory management?

Remember that whatever you do, you deal with virtual memory, so lots
of things aren't under control. These things are different and specific
to each OS.

Still want to dig into memory management? ;-)
--
Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Gaetan Corneau
1999-05-06 13:42:54 UTC
Permalink
Pardon me for jumping in,
(GNU malloc has a reputation for being the fastest malloc arround;
Microsoft's doesn't :-)

I have been working on MRP software about a year ago, and we improved
performances by over 30% (measured on NT and HP-UX) just by using "memory
pools". We used C++ and redefined "new" and "delete" to get memory from our
own memory manager. The memory manager's job is to minimize real (os) memory
allocation/deallocation by keeping blocks handy. It uses more RAM, but it is
much faster and reduces fragmentation. This was very important for us, since
the MRP allocates tens of millions of memory blocks in a single run.

I suppose such a manager could be made for Python, but with more
configuration options to keep RAM consumption at a reasonable level.
______________________________________________________
Gaetan Corneau
Software Developer (System integration Team)
BaaN Supply Chain Solutions
E-mail: Gaetan_Corneau at baan.com
Compuserve: Gaetan_Corneau at compuserve.com
ICQ Number: 7395494
Tel: (418) 654-1454 ext. 252
______________________________________________________
"Profanity is the one language all programmers know best"
Tim Peters
1999-05-08 07:35:04 UTC
Permalink
[various, about malloc/free performance, Tim points to
http://starship.python.net/crew/vlad/pymalloc/
]

[Christian Tismer]
Post by Christian Tismer
Thanks a lot. Now I have so many ways to go that I cannot
decide. Vlad uses explicit bins, dedicated storage. While
this can be very fast, it is a generalization of Guido's
pre-allocated arrays for some kinds of objects. Nobody
can tell me if this is the way to go, or if it will
under circumstances yield much fragmentation.
Vladimir can tell you -- but you have to let him <wink>. It's A Theorem
that any non-relocating allocator can be provoked into bad fragmentation,
even if the only block sizes ever requested are 1 and 2 (units of your
choice). Of course some schemes fragment worse than others, but before
worrying about that determine whether fragmentation is a real problem for
Python programs. Whichever way you answer, it's wrong <wink>.
Post by Christian Tismer
...
If I knew in advance how long certain objects will live,
then I would be able to put them into the proper place.
Yes: on the stack for starters, and exempted from ref-counting entirely.

For the rest, I defer to Vlad's wise response. With all due respect, he's
spent more time investigating *Python's* memory behavior than Doug Lea, JC,
you, me and Guido combined.

not-to-mention-larry-wall-ly y'rs - tim
Christian Tismer
1999-05-07 15:21:51 UTC
Permalink
[various, about malloc/free performance]
[Christian Tismer]
...
A malloc just for Python - would it need locking at all?
Since the interpreter is locking already, I guess malloc
would be always called in a safe way.
Right. Note that Vladimir has already put a great deal of work into this
area; see
http://starship.python.net/crew/vlad/pymalloc/
IIRC, paradoxically this yielded bigger speedups on non-Windows systems.
windows-vm-flushes-most-recently-used<0.9-wink>-ly y'rs - tim
Thanks a lot. Now I have so many ways to go that I cannot
decide. Vlad uses explicit bins, dedicated storage. While
this can be very fast, it is a generalization of Guido's
pre-allocated arrays for some kinds of objects. Nobody
can tell me if this is the way to go, or if it will
under circumstances yield much fragmentation. Doug Lea's
code is more general, since his bins are linked lists of
free space which can coalesce again. That takes more time
but gives less fragmentation. And J.C. Wippler gave me a very fast
allocator for small objects which uses a power of two
size scheme, with memory blocks arranged in a way that
their size is encoded in the address (portable!).

I guess finding out "the right way"(TM) is NP-complete.

If I knew in advance how long certain objects will live,
then I would be able to put them into the proper place.
Does it make sense to do statistics on object lifetime?
But the only known thing is allocation size.

Also, Python knows a lot more than malloc: a list (or dict)
is always two things: a header which cannot move, and
a body which can move, grow and shrink. It could make
sense to use different strategies for heads and bodies.
It's bad again with strings and tuples, which have varying
sizes and are not movable. I would be much happier if
we paid the one level of indirection but had all the
variable-sized parts movable, since this would
allow for some compaction as well.

But I know it is too late :-)

ciao - chris
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Tim Peters
1999-05-07 02:30:52 UTC
Permalink
[various, about malloc/free performance]

[Christian Tismer]
...
A malloc just for Python - would it need locking at all?
Since the interpreter is locking already, I guess malloc
would be always called in a safe way.
Right. Note that Vladimir has already put a great deal of work into this
area; see

http://starship.python.net/crew/vlad/pymalloc/

IIRC, paradoxically this yielded bigger speedups on non-Windows systems.

windows-vm-flushes-most-recently-used<0.9-wink>-ly y'rs - tim
