[SciPy-dev] SciPy Foundation
Joe Harrington
2009-07-31 17:06:37 UTC
About sixteen months ago, I launched the SciPy Documentation Project
and its Marathon. Dozens pitched in and now numpy docs are rapidly
approaching a professional level. The "pink wave" ("Needs Review"
status) is at 56% today! There is consensus among doc writers that
much of the rest can be labeled in the "unimportant" category, so
we're close to starting the review push (hold your fire, there is a
web site mod to be done first).

We're also nearing the end of the summer, and it's time to look ahead.
The path for docs is clear, but the path for SciPy is not. I think
our weakest area right now is organization of the project. There is
no consensus-based plan for improvement of the whole toward a stated
goal, no centralized coordination of work, and no funded work focused
on many of our weaknesses, notwithstanding my doc effort and what
Enthought does for code.

I define success as popular adoption in preference to commercial
packages. I believe in vote-with-your-feet: this goal will not be
reached until all aspects of the package and its presentation to the
world exceed those of our commercial competition. Scipy is now a
grass roots effort, but that takes it only so far. Other projects,
such as OpenOffice and Sage, don't follow this model and do produce
quality products that compete with commercial offerings, at least on
open-source platforms. Before we can even hope for that, we have to
do the following:

- Docs
  - Rest of numpy reference pages reviewed and proofed or marked unimportant
  - Scipy reference pages
  - User manual for the whole toolstack
  - Multiple commercial books
- Packaging
  - Personal Package Archive or equivalent for every release of every
    OS for the full toolstack (There are tools that do this but we
    don't use them. NSF requires Metronome - http://nmi.cs.wisc.edu/
    - for funding most development grants, so right now we're not even
    on NSF's radar.)
  - Track record of having the whole toolstack installation "just
    work" in a few command lines or clicks for *everyone*
  - Regular, scheduled releases of numpy and scipy
  - Coordinated releases of numpy, scipy, and stable scikits into PPA system
- Public communication
  - A real marketing plan
  - Executing on that plan
  - Web site geared toward multiple audiences, run by experts at that
    kind of communication
  - More webinars, conference booths, training, aimed at all levels
  - Demos, testimonials, topical forums, all showcased
- Code
  - A full design review for numpy 2.0
    - No more inconsistencies like median(), lacking "out", degrees
      option for angle functions?
    - Trimming of financial functions, maybe others, from numpy?
    - Package structure review (eliminate "fromnumeric"?)
    - Goal that this be the last breakage for numpy API (the real 1.0)
  - Scipy
    - Is it maintainable? should it be broken up?
    - Clear code addition path (or decide never to add more)
    - Docs (see above)
  - Add-on packages
    - Both existence of and good indexing/integration/support for
      field-specific packages
    - Clearer development path for new packages
    - Central hosting system for packages (svn, mailing lists, web,
      build integration, etc.)
    - Simultaneous releases of stable packages along with numpy/scipy

I posted a basic improvement plan some years back. The core ideas
have not changed; it is linked from the bottom of
http://scipy.org/Developer_Zone. I chose our major weakness to begin
with and started the doc project, using some money I could justify
spending simply for the utility of docs for my own research. I funded
the work of two doc coordinators, one each this summer and last.
Looking at http://docs.scipy.org/numpy/stats/, you can see that when a
doc coordinator was being paid (summers), work got done. When not,
then not. Without publicly announcing what these guys made, I'll be
the first to admit that it wasn't a lot. Yet, those small sums bought
a huge contribution to numpy through the work of several dozen
volunteers and the major contributions of a few.

My conclusion is that active and constant coordination is central to
motivating volunteer work, and that without a salary we cannot depend
on coordination remaining active. On the other hand, I have heard
Enthought's leaders bemoan the high cost of devoting employee time to
this project, and the low returns available from selling support to
universities and non-profit research institutes. Their leadership has
moved us forward, particularly in the area of code, but has not
provided the momentum necessary to carry us forward on all fronts. It
is time for the public and education sectors to kick in some resources
and organizational leadership. We are, after all, benefitting
immensely.

Since the cost of employee time is not so high for us in the public
and education sectors, I propose to continue hiring people like Stefan
and David as UCF employees or contractors, and to expand to hiring
others in areas like packaging and marketing, provided that funding
for those hires can be found. However, my grant situation is no
longer as rich as it has been the past two years, and the needs going
forward are greater than in the past if we're now to tackle all the
points above. So, I will not be hiring another doc guru from my
research grants next year.

I am confident that others are willing to pitch in financially, but
few will pitch in a full FTE, and we need several. We can (and will)
set up a donations site, but donation sites tend to receive pizza
money unless a sugar daddy comes along. Those benefitting most from
the software, notably education, non-profit research, and government
institutions, are *forbidden* from making donations by the terms of
their grants. NSF doesn't give you money so you can give it away.

We need to provide services they can buy on subcontract and a means
for handling payments from them. Selling support does not solve the
problem, as that requires spending most of the income on servicing
that particular client. Rather, we need to sell a chunk of
documentation or the packaging of a particular release, and then
provide the product not just to that client but to everyone.

We can also propose directly for federal and corporate grant funds. I
have spoken with several NASA and NSF program managers and with
Google's Federal Accounts Representative, and the possibilities for
funding are good. But, I am not going to do this alone. We need a
strong proposal team to be credible.

So, I am seeking a group that is willing to work with me to put up the
infrastructure of a funded project, to write grant proposals, and to
coordinate a financial effort. Members of this group must have a
track record of funded grants, business success, foundation support,
etc. We might call it the SciPy Foundation. It could be based at
UCF, which has a low overhead rate and has infrastructure (like an HR
staff), or it might be independent if we can find a good director
willing to devote significant time for relatively low pay compared to
what they can likely make elsewhere. I would envision hiring
permanent coordinators for docs, packaging, and marketing
communications. Enthought appears to have code covered by virtue of
having hired Travis, Robert, etc.; how to integrate that with this
effort is an open question but not a difficult one, I think, as code
is our strongest asset at this point.

I invite discussion of this approach and the task list above on the
scipy-***@scipy.org mailing list. If you are seeing this post
elsewhere, please reply only on scipy-***@scipy.org.

If you are eligible to lead funding proposals and are interested in
participating in grant writing and management activities related to
work in our weak areas, please contact me directly.

Thanks,

--jh--
Prof. Joseph Harrington
Planetary Sciences Group
Department of Physics
MAP 414
4000 Central Florida Blvd.
University of Central Florida
Orlando, FL 32816-2385
***@physics.ucf.edu
planets.ucf.edu
Robert Kern
2009-07-31 19:27:14 UTC
 Enthought appears to have code covered by virtue of
having hired Travis, Robert, etc.;
Eh, what? We work on numpy and scipy in our spare time, just like
everyone else. There are rare occasions when a client wants to fund a
particular feature, or we need to fix a bug in the course of our work,
but that's a far cry from having "code covered".
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
Joe Harrington
2009-07-31 21:04:46 UTC
Post by Robert Kern
 Enthought appears to have code covered by virtue of
having hired Travis, Robert, etc.;
Eh, what? We work on numpy and scipy in our spare time, just like
everyone else. There are rare occasions when a client wants to fund a
particular feature, or we need to fix a bug in the course of our work,
but that's a far cry from having "code covered".
Then please accept my profusest apologies! Eric mentioned to me that
Enthought had paid significantly for scipy development and I thought
that meant a portion of developers' time. Perhaps this was just in
the past.

--jh--
Robert Kern
2009-07-31 21:13:48 UTC
Post by Robert Kern
 Enthought appears to have code covered by virtue of
having hired Travis, Robert, etc.;
Eh, what? We work on numpy and scipy in our spare time, just like
everyone else. There are rare occasions when a client wants to fund a
particular feature, or we need to fix a bug in the course of our work,
but that's a far cry from having "code covered".
Then please accept my profusest apologies!  Eric mentioned to me that
Enthought had paid significantly for scipy development and I thought
that meant a portion of developers' time.  Perhaps this was just in
the past.
Still do; it's just not part of our daily duties and is usually
focused on what we need, not general maintenance. Not to mention the
infrastructure support.
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
David Goldsmith
2009-07-31 19:37:58 UTC
Interesting... Now I'm curious to know how many others thought Enthought employees were paid to "keep the code covered"?

DG
_______________________________________________
Scipy-dev mailing list
http://mail.scipy.org/mailman/listinfo/scipy-dev
Charles R Harris
2009-07-31 19:59:41 UTC
Post by David Goldsmith
Interesting... Now I'm curious to know how many others thought Enthought
employees were paid to "keep the code covered"?
I always figured they had to scramble to pay the bills. Making a small
company go isn't easy-peasy.

Chuck
David Goldsmith
2009-07-31 20:13:26 UTC
Understood and agreed, but is your point that since code maintenance and generation would fall under the category of "capital," not "operations," consequently our default assumption as outsiders should be that they do not invest in it (except when "operations" necessitate)?

DG
Charles R Harris
2009-07-31 20:49:09 UTC
Post by David Goldsmith
Understood and agreed, but is your point that since code maintenance and
generation would fall under the category of "capital," not "operations,"
consequently our default assumption as outsiders should be that they do not
invest in it (except when "operations" necessitate)?
I believe they host the svn servers and pay for the bandwidth, so that is a
significant investment. They also hire folks from the community who, after
all, need to make a living. As to direct investment in code development I
think Robert covered it. But I don't know much about what Enthought does, so
if you want a definitive statement you will need to ask them.

Chuck
Dag Sverre Seljebotn
2009-08-01 09:57:15 UTC
I am going to play the devil's advocate here -- I'm not into this in order
to make myself enemies, I just have some sincere questions.
Post by Joe Harrington
I define success as popular adoption in preference to commercial
packages. I believe in vote-with-your-feet: this goal will not be
reached until all aspects of the package and its presentation to the
world exceed those of our commercial competition. Scipy is now a
grass roots effort, but that takes it only so far. Other projects,
such as OpenOffice and Sage, don't follow this model and do produce
quality products that compete with commercial offerings, at least on
open-source platforms. Before we can even hope for that, we have to
<snip>
Post by Joe Harrington
- Public communication
  - A real marketing plan
  - Executing on that plan
  - Web site geared toward multiple audiences, run by experts at that
    kind of communication
  - More webinars, conference booths, training, aimed at all levels
  - Demos, testimonials, topical forums, all showcased
A thing OpenOffice.org and Sage both have is a very clear sense of
direction and a clearly stated goal.

SciPy might also have that for all I know, but I must admit I haven't
understood what it is in the past year following the SciPy and NumPy
lists, and reading the SciPy site. But I have seen email threads asking
what the SciPy goal is, without any clear resolution (?).

The website says this: "SciPy is open-source software for mathematics,
science, and engineering."

Which of course says nothing at all. Someone asked me what SciPy is the
other day, and while I more or less "know" when I'd try to look in SciPy
for an algorithm (instead of going to, say, R, or netlib.org, or
whatever), I was more or less forced to say that it is a "dumping ground
for various algorithms people have found useful, with the link being them
being either written in Python or wrapped for Python".

That's probably an unfair description -- the point is: If one needs to
formulate a two- or three-liner about SciPy, what would it be? Is it a
goal to reimplement stuff in SciPy that's (for instance) already thriving
in the open source R community, or is that not a goal? And so on.

You might feel this is going off-topic, but I somehow feel that a very
clear sense of direction is paramount when talking of these issues -- just
look at the Sage project.

Dag Sverre
j***@gmail.com
2009-08-01 12:47:36 UTC
On Sat, Aug 1, 2009 at 5:57 AM, Dag Sverre
Post by Dag Sverre Seljebotn
I am going to play the devil's advocate here -- I'm not into this in order
to make myself enemies, I just have some sincere questions.
Post by Joe Harrington
I define success as popular adoption in preference to commercial
packages.  I believe in vote-with-your-feet: this goal will not be
reached until all aspects of the package and its presentation to the
world exceed those of our commercial competition.  Scipy is now a
grass roots effort, but that takes it only so far.  Other projects,
such as OpenOffice and Sage, don't follow this model and do produce
quality products that compete with commercial offerings, at least on
open-source platforms.  Before we can even hope for that, we have to
<snip>
Post by Joe Harrington
- Public communication
  - A real marketing plan
  - Executing on that plan
  - Web site geared toward multiple audiences, run by experts at that
    kind of communication
  - More webinars, conference booths, training, aimed at all levels
  - Demos, testimonials, topical forums, all showcased
A thing OpenOffice.org and Sage both have is a very clear sense of
direction and a clearly stated goal.
SciPy might also have that for all I know, but I must admit I haven't
understood what it is in the past year following the SciPy and NumPy
lists, and reading the SciPy site. But I have seen email threads asking
what the SciPy goal is, without any clear resolution (?).
The website says this: "SciPy is open-source software for mathematics,
science, and engineering."
Which of course says nothing at all. Someone asked me what SciPy is the
other day, and while I more or less "know" when I'd try to look in SciPy
for an algorithm (instead of going to, say, R, or netlib.org, or
whatever), I was more or less forced to say that it is a "dumping ground
for various algorithms people have found useful, with the link being them
being either written in Python or wrapped for Python".
I think scipy is pretty much the same as a collection of matlab tool
boxes, either with more enhanced basic numerical algorithms (linalg,
special, optimize, interpolate, sparse, fft, spatial) or toolboxes
with wider applicability (stats including cluster, odr and maxentropy,
signal, ndimage+stsci?). This misses weave.

Which algorithms are actually included, and some of the structure, still
reflect the "dumping ground for various algorithms people have found
useful". And some parts don't look very used.

There is still a lot of cleaning and testing to do, but the
description as analogy to matlab toolboxes is pretty accurate, if a
description by analogy is allowed. E.g. to understand more of
scipy.signal, I started to read the help for matlab's signal toolbox.

That's my impression of scipy after working my way through some parts
of it in the last year.
Post by Dag Sverre Seljebotn
That's probably an unfair description -- the point is: If one needs to
formulate a two- or three-liner about SciPy, what would it be? Is it a
goal to reimplement stuff in SciPy that's (for instance) already thriving
in the open source R community, or is that not a goal? And so on.
For stats, I consider matlab and maybe gauss for econometrics as
benchmark, not the coverage of a specialized language/package like R,
but I'm no statistician and I don't know anyone personally that uses
R.

Josef
Post by Dag Sverre Seljebotn
You might feel this is going off-topic, but I somehow feel that a very
clear sense of direction is paramount when talking of these issues -- just
look at the Sage project.
Dag Sverre
Joe Harrington
2009-08-01 16:20:17 UTC
Post by Dag Sverre Seljebotn
I have seen email threads asking
what the SciPy goal is, without any clear resolution (?).
How's this for a goal/mission statement (for SciPy, IDL, and Matlab):

(The toolstack) is a professional-quality numerical computation and
visualization environment that supports convenient handling of
numerical arrays, provides a rich set of basic tools and algorithms
for science and engineering, and supports a variety of both general
and discipline-specific application software. It is easy for
numerically savvy teens to learn, but rich enough to support the most
complex of professional applications. It can be run both
non-interactively and interactively, with the latter featuring both
GUI and rich command-line interfaces. It comes with full
documentation, is easy to install and run on all popular platforms,
has a strong online user community spanning all disciplines, and has
commercial support and consulting.

For SciPy, I'd replace the part after the last comma with "is free and
open-source, supports cloud computing, and has options for commercial
user support and consulting." One could add to the list of general
features, such as symbolic manipulation, parallel processing, etc.,
but it's already getting long.

For SciPy, some of this, of course, is not yet true, which is the
point of the current thread.

Another way of looking at it:

For me, SciPy is a replacement for IDL that improves on it in some
areas. No more, but no less. That doesn't say what it *is*, since it
just begs the question, "what is IDL", but it does identify the space
I'd like to see SciPy occupy. It occupies most of the space IDL
occupied for me now, except for a few crucial areas. The main one is
that enough of my colleagues use it that I can exchange codes with
them. A code written in an interpreted language that your colleague
does not use is not useful to them. If it's not useful to them, then
the interest in your contribution is that much smaller. So, my goal
is to make SciPy (the toolstack, not the package) *to them* be what
IDL is to them today. That is a lot more than what IDL is to me,
since I have more of a knack for computers than most of my colleagues.
They need a one-touch install, hold-your-hand docs, GUIs, and so
forth. They are also less interested in the linguistic improvements
of Python over IDL. Or, they are until they really get coding, which
is long after they make the decision to give it a spin. This is a
good thing in a way, since it means that once they try it, they
*really* like it. Most current SciPy users, I think, are savvy enough
about computers that we can work around the shortcomings, but the next
round of adopters will always be less savvy than the last, on the
whole, hence the need for better and lower-level docs, professional
packaging on all platforms, etc.

--jh--
Tommy Grav
2009-08-01 17:11:09 UTC
Post by Joe Harrington
Post by Dag Sverre Seljebotn
I have seen email threads asking
what the SciPy goal is, without any clear resolution (?).
For me, SciPy is a replacement for IDL that improves on it in some
areas. No more, but no less.
I have been using python, numpy and matplotlib for a few years as part
of my astronomy research. While I find numpy and matplotlib extremely
useful, scipy just doesn't seem to help me much. I think the problem is
that it is very unfocused. To me scipy is not a replacement for IDL; it
is a python implementation of Numerical Recipes, but because of its lack
of focus it has become very chaotic. So far I have only found use for
optimize.leastsq and spatial.KDTree from scipy. Packages like pyfits,
pyraf, AstLib, etc. take care of the more astronomy-related problems. So
I would personally like to see scipy become a package that binds the
numpy package to the more field-specific packages, by providing
numerical methods that are broadly applicable in many fields (i.e.
least-squares minimization, a KDTree implementation, Runge-Kutta and
other types of integration schemes, differential equation solvers and so
on).

Making scipy into a tool for science and engineering is in my opinion
too broad a goal. Making it into a set of tools that are usable in many
fields, and thus supporting development of field-specific packages, is
again in my opinion the way to go. It narrows the focus and makes the
project more self-contained.

Cheers
Tommy Grav
+----------------------------------------------------------------------------+
Associate Researcher @ Dept. of Physics and Astronomy
Johns Hopkins University
+----------------------------------------------------------------------------+
***@pha.jhu.edu
(410) 516-7683
http://web.mac.com/tgrav/Astronomy/Welcome.html
+----------------------------------------------------------------------------+
Dag Sverre Seljebotn
2009-08-01 17:49:21 UTC
Post by Joe Harrington
Post by Dag Sverre Seljebotn
I have seen email threads asking
what the SciPy goal is, without any clear resolution (?).
For me, SciPy is a replacement for IDL that improves on it in some
areas. No more, but no less. That doesn't say what it *is*, since it
just begs the question, "what is IDL", but it does identify the space
I'd like to see SciPy occupy. It occupies most of the space IDL
occupied for me now, except for a few crucial areas. The main one is
that enough of my colleagues use it that I can exchange codes with
them. A code written in an interpreted language that your colleague
does not use is not useful to them. If it's not useful to them, then
the interest in your contribution is that much smaller. So, my goal
is to make SciPy (the toolstack, not the package) *to them* be what
IDL is to them today. That is a lot more than what IDL is to me,
since I have more of a knack for computers than most of my colleagues.
They need a one-touch install, hold-your-hand docs, GUIs, and so
forth. They are also less interested in the linguistic improvements
of Python over IDL. Or, they are until they really get coding, which
is long after they make the decision to give it a spin. This is a
good thing in a way, since it means that once they try it, they
*really* like it. Most current SciPy users, I think, are savvy enough
about computers that we can work around the shortcomings, but the next
round of adopters will always be less savvy than the last, on the
whole, hence the need for better and lower-level docs, professional
packaging on all platforms, etc.
I really, really want what you seem to want too. BUT, I'll continue my
criticism, in the hope that something may come out of it.

What you mention above seems to be A LOT of work (in particular
"professional packaging on all platforms"), and as others have mentioned
partly in conflict with the way people tend to view SciPy currently, and
so on.

As you say it is indeed the whole stack that is important. Still, part of
what you write seems to be an effort to do what many are doing already:
- EPD
- Sage (currently maths focused, but it does bundle SciPy and integrating
it better would )
- SPD (Sage without some of the math libs)
- Python(x,y)

These all bundle SciPy, but also set up the whole stack, and can focus on
the whole picture.

Are you saying that you just want to do it better than these, through a
foundation? Wouldn't it be better to direct any funding through one of
these existing candidates?

This post I've written on the Sage list is very related and is about SciPy
vs. Sage:
http://groups.google.com/group/sage-devel/msg/78e2a2032042d35b

The parent thread is a bit long but lots of related material in there:
http://groups.google.com/group/sage-devel/browse_thread/thread/bef2010f45984730/78e2a2032042d35b?#78e2a2032042d35b

Dag Sverre
Gael Varoquaux
2009-08-01 22:52:16 UTC
Post by Dag Sverre Seljebotn
As you say it is indeed the whole stack that is important. Still, part of
- EPD
- Sage (currently maths focused, but it does bundle SciPy and integrating
it better would )
- SPD (Sage without some of the math libs)
- Python(x,y)
These all bundle SciPy, but also set up the whole stack, and can focus on
the whole picture.
Are you saying that you just want to do it better than these, through a
foundation? Wouldn't it be better to direct any funding through one of
these existing candidates?
This post I've written on the Sage list is very related and is about SciPy
http://groups.google.com/group/sage-devel/msg/78e2a2032042d35b
I am jumping in this discussion (something that I have been trying to
avoid, because such discussions are very hard to drive to a useful
point). I'll try to write a clear e-mail, to the point, however, as the
previous discussion you are pointing to does not reflect my needs.

On the various use cases and users
===================================

I think that the discussion on the Sage mailing list, and a few points of
the last e-mails I have seen on this mailing list, miss a very important
point for many users of the scipy stack that I see around me:

We want a tool, or a set of tools, to build our own entry points. We want
more than an IDE like Matlab or Mathematica. We want to be able to use the
tools separately, to do data mining on server logs, to build custom
applications for, e.g., medical image analysis, or to control a physics
experiment (there are a lot of talks at the scipy conference this year on
this). Most of the scipy users are "even more applied than applied math"
(golly, this sounds almost dirty ;> ).

Building a reusable stack is why we need the tools to be broken up into
separate features. Scipy as a community and an umbrella project may benefit from
an IDE, like matlab, or a web interface like the amazing one Sage has,
but we don't want to bundle these features with the core numerical tools
of scipy.

Now this might actually concern only a fraction of users. Many users
(including me) mostly use the scipy tool stack as a matlab/mathematica
replacement. However, these users are not the main code contributors. If
somebody develops an algorithm he wants to ship or to share, chances are
he wants it not to be bound to a heavy platform, but more to a light core
(hey, numpy is even shipped by default on Mac OS X and many linux
distributions nowadays).


An integrated environment as an entry point
==============================================

Besides building a good set of tools and their documentation, we need to
address two separate issues to make life easier for users: building an
integrated environment (what I call an entry point) and building
distributions. It is tempting to do both at the same time, however, I
think that if we collapse the two problems, we are going in the wrong
direction: I want to be able to reuse the underlying technology of the
integrated environment, for instance to build an astronomy-specific IDE,
and I want to be able to contribute modules to it even if those modules
are not distributed together.

Like many people, my working environment is IPython. It suits my needs,
and I get scientific results using it. However, I can see that it is not
the best solution to guide a beginner. Inspired by matlab, IDL or
mathematica, we have been dreaming of having an IDE for a long while.
Last year, Enthought paid me to start work on making IPython
GUI-friendly, to supply one of the missing bricks for assembling the tool
stack into an IDE. I have been unable to work on this for a year, as it is
not a priority for my research, but the effort lives on in the IPython
repository, and it would be great to see IDEs built upon it, and improve
it.

An IDE for easy scientific development with Python would bring together
tools such as a shell, easy access to documentation, and an editor
(reinventing any one of these components might not be necessary). There
is EPDLab, which is being developed in the ETS repository. I love the
technology stack that it is built upon (ETS provides good tools for
building GUIs, and IPython provides a very handy and powerful command
line), and I am thus full of hope for EPDLab. I can see however that
people might be afraid of using it, let alone contributing to it, as it
bears strong Enthought branding. This is a pity, because in this case we
have the chance of having a company's interest lying in the same
direction as the community.

For a web environment, the Sage notebook is amazing. Unfortunately last
time I looked, it was GPL licensed, which renders it unsuitable for my use,
as the tools we use at the lab must be BSD, in order to be able to build
(eventually) medical imaging products from them one day.

But, from a more pragmatic point of view the simplest thing to do to make
it easier for a beginner to get started, would be to improve the
documentation on the web. I am not thinking of the package-specific
documentation, but more describing how things fit together: giving the
workflow, and pointing to the various main packages used for different
things. We already have a lot of material on the webpages, but this
material is not as 'sexy' as it could be, and not as to-the-point as
possible. Sure, this is a lot of work too.

Building standard distributions
=================================

I am a huge fan of distributions. Every large applied lab I know ends up
building a distribution mechanism. Without standard distributions, we
cannot reuse each other's distribution efforts, and we also have huge
friction in reusing each other's tools: installing on your computer may
be easy, but if you have to worry whether your non-technical users will
succeed in installing a tool, you start wondering whether you want to
rely on the tool, or whether you are going to reimplement it.

However, the other side of the problem is that distributions could end up
developing tools that make use of the tight integration that they provide
to solve numerical or usability problems quicker, while locking the users
in the distribution. If I want to integrate an algorithm developed by
another lab in a medical imaging platform, I cannot afford to drag in
Sage, just like I cannot afford R or Matlab, as they are dependencies that
are too big. (An IDE that works only on a distribution is not one that I
will rely on for teaching.) This is why I believe that every single piece
of code in a distribution should be usable outside of this distribution
(and I applaud the SPD effort started by Ondrej and the SAGE guys).

Concrete suggestions to ease the progress
==========================================

Of course providing a consistent environment is a hard problem, but
hey, this is a problem many of us face. I believe that we are making
progress with many encouraging projects such as Sage, EPD, Python(x,y),
or SPD. Establishing scientific environments in Python is an ambitious
project; there will not be a one-size-fits-all solution and having many
different approaches is healthy, as long as we keep it friendly and learn
from all the efforts. I strongly believe that we will be getting more and
more satisfactory solutions in the coming years.

Specifically, I would love to see an official umbrella project for
BSD-licensed tools for building scientific projects with Python. As the
"scipy" name is well branded (through the website, and the conference),
we could call this the 'scipy project'. I would personally like to limit
wheel reinvention and have preferred solutions for the various bricks (I
am thinking of the unfortunate Chaco versus Matplotlib situation, where I
have to depend on both libraries that complement each other).

Back to the scipy foundation idea
==================================

The idea of the scipy foundation is an idea that has been floating around
for a while. If it is manned by a variety of people who express the wills
and needs of users and developers of the scipy ecosystem, it could be a
great thing. But I see two road blocks: first, as Robert points out,
telling somebody what to do will not achieve anything. I am already way
too busy scratching my own itches. Second, who will find the time to take
care of this?

And now, I have to catch up on sleep.

Gaël
David Cournapeau
2009-08-02 00:39:48 UTC
Permalink
On Sun, Aug 2, 2009 at 7:52 AM, Gael
Post by Gael Varoquaux
Back to the scipy foundation idea
==================================
The idea of the scipy foundation is an idea that has been floating around
for a while. If it is manned by a variety of people who express the wills
and needs of users and developers of the scipy ecosystem, it could be a
great thing. But I see two road blocks: first, as Robert points out,
telling somebody what to do will not achieve anything.
Having a foundation does not, by itself, mean telling people
what to do. It is just a way to have a single point of entry for
people who want to interact with the community, and to have the legal
right to collect money.
Post by Gael Varoquaux
I am already way
too busy scratching my own itches. Second, who will find the time to take
care of this?
There is an inherent amount of bureaucracy involved with those things,
but it does not always have to be done by the same people, and I think
rotation works better here than for code.

David
Ondrej Certik
2009-08-03 21:32:31 UTC
Permalink
On Sat, Aug 1, 2009 at 4:52 PM, Gael
Varoquaux<***@normalesup.org> wrote:
[...]
Post by Gael Varoquaux
For a web environment, the Sage notebook is amazing. Unfortunately last
time I looked, it was GPL licensed, which renders it improper for my use,
as the tools we use at the lab must be BSD, in order to be able to build
(eventually) medical imaging products from them one day.
Actually, in this thread:

http://groups.google.com/group/sage-devel/browse_thread/thread/65ca1e0489a0a980/

most (if not all) contributors to the Sage notebook agreed to release
their code as BSD.

The same goes for the build system: William is positive about licensing
it as BSD too. So we can get lots done by working on these things
together with Sage.

Ondrej
Gael Varoquaux
2009-08-03 21:36:49 UTC
Permalink
I can see that a lot of good things are coming out of Sage (the current
Cython development frenzy was clearly helped by the needs of Sage). It is
really nice to see our community (I am talking in the sense of a
scientific Python community, agnostic of tools and distribution) growing.

Cheers to these guys, that notebook is really amazing!

Gaël
Ondrej Certik
2009-08-03 21:54:08 UTC
Permalink
Yep. And Cython is under a BSD-like (resp. Apache) license too, so I think
that for these basic tools that everyone needs (cython/notebook/build
infrastructure) Sage is not against BSD at all.

Ondrej
Sebastian Walter
2009-08-04 08:00:26 UTC
Permalink
2 cents from an outsider who thought about contributing to
scipy/scikits (but didn't (yet)):

I think it is a good idea to make scipy easy to use for beginners.
However, after reading this thread, I have the impression that it is
not the goal to provide state of the art algorithms but rather making
Scipy as popular as possible by putting money and effort into the
"marketing" of Scipy.
Don't get me wrong, I think there are some good reasons why a project
should strive for a large user base. Some of the best projects are
popular.
Alas, correlation does not imply causality.

I, for instance, would rather see more effort go into getting
state-of-the-art algorithms implemented in Scipy, because that's something
that would make a real difference in my research work. Of course,
targeting the "clueless Matlab" users is quite pointless if that is
what you are after.
IMHO the way to go is to convince experts to implement their research
prototypes as part of scipy.
Then you really get some "killer applications". I could name a few
people who are coding some cool state of the art algorithms but waste
so much time because they started coding directly in C++. In the
meantime, they could have implemented the algorithms in Python _and_
in C++. If scipy had something really good that Matlab etc. do not
have: guess what ppl would do....

What would you need to get experts to contribute to scipy instead of
hacking their prototype in Matlab or C++?
I can't speak for everyone, so I'll just say what I think (and feel):
I would instantly start "contributing research prototypes" to scipy if
scipy offered:
1) an easy, modular and flexible build system (fortran, c, c++, D,
swig, boost:python, cython,...)
2) very low entry barrier for possible contributors:
a simple checkout, then ./manage.py startapp mycoolmodule
and everything is ready to go ( "Start coding in 5 minutes!")
3) a distributed version control system (e.g. git). SVN really scares me off...
4) standardized unit tests
5) automated documentation generation

Then I could simply
1) fork the master branch
2) ./manage.py startapp mycoolmodule
3) adjust config files that were written in ./scipy/mycoolmodule/config.py
4) start coding
5) share the experimental code with collaborators or interested users
who are not afraid to use experimental code
6) eventually, when the project has matured, hope that it gets
included in the master branch
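
To make the "startapp" step above concrete, here is a minimal sketch of
what such a scaffolding command could do. Everything in it is an
assumption for illustration: scipy ships no manage.py, the function name
startapp is borrowed from the command line quoted above, and only config.py
is named in the post (the other file names are hypothetical).

```python
import os
import tempfile

# Hypothetical skeleton. Only config.py is named in the post; the other
# file names are assumptions made for this sketch.
SKELETON = {
    "__init__.py": '"""{name}: experimental module."""\n',
    "config.py": "# Build configuration for {name} (placeholder).\nSOURCES = []\n",
    os.path.join("tests", "__init__.py"): "",
    os.path.join("tests", "test_{name}.py"): (
        "def test_placeholder():\n"
        "    assert True\n"
    ),
}


def startapp(root, name):
    """Create a package skeleton for `name` under `root`; return created paths."""
    if not name.isidentifier():
        raise ValueError("module name must be a valid identifier: %r" % name)
    package_dir = os.path.join(root, name)
    if os.path.exists(package_dir):
        # Refuse to overwrite an existing module.
        raise FileExistsError(package_dir)
    created = []
    for relative, template in SKELETON.items():
        path = os.path.join(package_dir, relative.format(name=name))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(template.format(name=name))
        created.append(path)
    return sorted(created)


if __name__ == "__main__":
    # Demonstrate in a scratch directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as scratch:
        for path in startapp(scratch, "mycoolmodule"):
            print(os.path.relpath(path, scratch))
```

The point of the design is that a new contributor gets a config file and a
passing placeholder test from the first minute, so the framework nudges
toward tested code rather than relying on conventions being read somewhere.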


hope that made sense,
Sebastian
_______________________________________________
Scipy-dev mailing list
http://mail.scipy.org/mailman/listinfo/scipy-dev
Gael Varoquaux
2009-08-04 08:18:25 UTC
Permalink
Post by Sebastian Walter
Me for instance, would rather like to see more efforts to get state of
the art algorithms to be implemented in Scipy because that's something
that would make a real difference in my research work.
On this side, we are hiring a talented engineer to work on machine
learning in scipy, via the scikit learn. We already have the algorithms;
it is a question of QAing them, integrating them into the scikit,
writing docs and making releases.

Gaël
David Cournapeau
2009-08-04 08:35:02 UTC
Permalink
Post by Sebastian Walter
2 cents from an outsider who thought about contributing to
I think it is a good idea to make scipy easy to use for beginners.
However, after reading this thread, I have the impression that it is
not the goal to provide state of the art algorithms but rather making
Scipy as popular as possible by putting money and effort into the
"marketing" of Scipy.
Don't get me wrong, I think there are some good reasons why a project
should strive for a large user base. Some of the best projects are
popular.
Alas, correlation does not imply causality.
I, for instance, would rather see more effort go into getting
state-of-the-art algorithms implemented in Scipy, because that's something
that would make a real difference in my research work. Of course,
targeting the "clueless Matlab" users is quite pointless if that is
what you are after.
One point which has not been mentioned concerning matlab-like
environments - maybe it is obvious and everyone implicitly acknowledges
it, but Mathworks is a 30-year-old company, with > 1000 people today.

Building something like matlab, with a good GUI and top notch
documentation, takes a huge amount of resources, of which the 'useful'
code is only a fraction. I of course don't know the details of matlab's
implementation, but I know that for music-oriented software (which needs
a good UI to sell well, and has non-trivial computational requirements,
so the comparison is not totally stupid), the graphical code is 80 % of
the code. This ratio is consistent with the big open source audio
software as well (ardour, rosegarden). Worse, being cross platform
makes the problem much more difficult. For the music software market, mac
os x is rarely ignored (~ 40-50% of the market I believe), so people
need to support two platforms, and that's really a lot of work. For
scientific software, I think you can go the non native route for the
graphical toolkit, though.

Also, very few open source projects are successful as far as good GUIs
are concerned (I don't want to enter into a debate here, but there are
good documents/studies on this topic). You need financial incentive for
this, so only projects backed up by big companies have managed to pull it
off.

IOW, I am pretty pessimistic about being a 'matlab' clone. We should
rather shoot for what makes numpy/scipy better (extensibility, cross
platform, actual language, etc...), because really, matlab will always
be a much better matlab than us. Price and licensing are not good enough
to justify migration - if what you want is a free matlab clone, why not
use octave or scilab anyway?

That does NOT mean that we should not aim at making the software more
accessible. I (and I guess other developers too) am definitely interested
in a more product-like, integrated stack, to make the barrier of entry
lower. I for example am really tired of the installation problems
consistently reported. I feel like we cover mac os x and windows pretty
well now, but the linux situation is still dreadful. I have a few ideas
on how to improve the situation, but they all require quite a bit of
work/infrastructure. I hope that soon, the scenario "I see this cool
python script on the internet, it requires this numpy/scipy thing, can I
try it in 2 minutes ?" will be a reality.
Post by Sebastian Walter
Then you really get some "killer applications". I could name a few
people who are coding some cool state of the art algorithms but waste
so much time because they started coding directly in C++. In the
meantime, they could have implemented the algorithms in Python _and_
in C++. If scipy had something really good that Matlab etc. do not
have: guess what ppl would do....
Yes, there are a lot of people who still don't know that there are
languages outside Fortran, C and C++. In my field, I still see some
people who implement parsers in C...
Post by Sebastian Walter
1) an easy, modular and flexible build system (fortran, c, c++, D,
swig, boost:python, cython,...)
You mean like numscons :) ? Adding D support to numscons should be easy.
For example, I added initial cython support in a couple of minutes
during the cython talk at SciPy08; adding new languages is relatively
easy thanks to scons.
Post by Sebastian Walter
a simple checkout, then ./manage.py startapp mycoolmodule
and everything is ready to go ( "Start coding in 5 minutes!")
there are various pieces to enable this (in place build, develop command
of setuptools, virtualenv/pip/easy_install), but yes, the situation is
kind of messy. For scikits, that's not so difficult - you should be
able to implement a trivial scikit by copying the scikits.example
package and starting from there.

One problem is that it is technically impossible to build in place and
test in one go because of a nose limitation ATM (for some reason, nose
fails to import a package if it is in the current directory).
Post by Sebastian Walter
3) a distributed version control system (e.g. git). SVN really scares me off...
That's a sensitive issue, I think we should avoid starting this one here
:) Needless to say, you can use git-svn - several core developers use it
for numpy/scipy dev, and we distribute an official import:

http://projects.scipy.org/numpy/browse_git

At least I have not touched svn for numpy/scipy development for > 6
months now, except to check releases when I tag them.
Post by Sebastian Walter
4) standardized unit tests
What do you mean exactly here? We use nose for testing; what do you
consider "non standard"?
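
For readers unfamiliar with the nose convention mentioned above, here is
a minimal sketch of what a nose-style test module looks like. The
trapezoid function is a hypothetical stand-in for the code under test, not
something taken from scipy; the only convention relied on is that nose
collects functions whose names start with "test" and reports a raised
AssertionError as a failure.

```python
import math


def trapezoid(func, a, b, n=100):
    """Integrate func over [a, b] with the composite trapezoid rule.

    Hypothetical function under test, defined here only so that the
    example is self-contained.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h


# Plain functions with plain asserts: no base class is needed.
def test_linear_is_exact():
    # The trapezoid rule integrates linear functions exactly.
    assert abs(trapezoid(lambda x: 2.0 * x, 0.0, 1.0, n=4) - 1.0) < 1e-12


def test_sine_converges():
    # The integral of sin over [0, pi] is 2.
    assert abs(trapezoid(math.sin, 0.0, math.pi, n=1000) - 2.0) < 1e-5


def test_rejects_bad_n():
    try:
        trapezoid(math.sin, 0.0, 1.0, n=0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for n=0")
```

Because the tests are ordinary functions, they can also be called directly
from an interpreter session, which keeps the entry barrier low.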
Post by Sebastian Walter
5) automated documentation generation
It is almost automated now - but an example for scikits is missing in
the example package :)

cheers,

David
Sebastian Walter
2009-08-04 09:25:55 UTC
Permalink
Just enumerating what I think would be useful to attract high quality
contributors. I'm aware that scipy has already a lot of the features
(which is nice).
But it would be even nicer to have a really low entry barrier and have
a framework that guides you to write good (and documented) code with
extensive unit tests, just like the big web frameworks (Django, RoR,
...)
It has to be a win-win situation for both the community and the developer.
David Cournapeau
2009-08-01 16:21:10 UTC
Permalink
Hi Joe,
Post by Joe Harrington
I define success as popular adoption in preference to commercial
packages.  I believe in vote-with-your-feet: this goal will not be
reached until all aspects of the package and its presentation to the
world exceed those of our commercial competition.  Scipy is now a
grass roots effort, but that takes it only so far.  Other projects,
such as OpenOffice and Sage, don't follow this model and do produce
quality products that compete with commercial offerings, at least on
open-source platforms.
I am not sure openoffice is a good example, but I share the sentiment
that something is missing in the organization of the community.

I think it is very important to keep in mind that in any open source
project, telling people what to do does not work well. Not everybody
shares the same goals, or is interested in scipy in the same way,
etc... So any structure should help people do what they want for
scipy's sake, but above all, should not alienate anyone who would have
worked on scipy otherwise. It may just be rhetoric, but saying that
"it would be nice for scipy to have this goal" instead of "we should
do this" matters IMHO.

Some of the things I am missing:
- no quantifiable feedback from users: if we want to work on a set of
features, we cannot prioritize. Likewise, we have very few
statistics on usage, platforms, etc... OTOH, this is often hard to
obtain for open source projects.
- a scipy foundation: several times already, I have been asked
privately to add some feature to scipy, generally things which
take a few hours max, in exchange for some money. It is too much of a
hassle to set up things to get money for a few hours' work, and
frankly, for a few hours, I would prefer to ask people to give money
to a scipy foundation instead. Something like the R foundation
(http://www.r-project.org/foundation/main.html). A foundation with a
legal status would make the situation much easier w.r.t donations I
believe. It should not be that hard to set up.
- website: I think the root of the problem is lack of a dedicated
person for it, a person with design skills ideally, to design a
coherent graphic "chart" (not sure about the exact English word),
etc... I don't know how to get volunteers for this: it seems like many
projects manage to have such volunteers.
Post by Joe Harrington
- Packaging
 - Personal Package Archive or equivalent for every release of every
   OS for the full toolstack (There are tools that do this but we
   don't use them.  NSF requires Metronome - http://nmi.cs.wisc.edu/
   - for funding most development grants, so right now we're not even
   on NSF's radar.)
 - Track record of having the whole toolstack installation "just
   work" in a few command lines or clicks for *everyone*
 - Regular, scheduled releases of numpy and scipy
 - Coordinated releases of numpy, scipy, and stable scikits into PPA system
The problem of packaging is that it is hard to do well, but has no
technically challenging part in it. And it usually does not fall under
"scratching one's own itch", because once you know how to build the
software, you are done and usually want to start using the damn thing.
Worse, it needs to be done every time (every release). So this is
fundamentally different from docs: having done great packaging work
for version N is useless after N+1 is out. It does not make sense to
pay someone to do it once.

Having some infrastructure would help: for example, something which
automatically builds packages on a set of supported platforms. It has
to be 100 % automatic, so that pushing one button gets you the sources,
builds the package, installs it, and tests it. This costs money and time
to set up.
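
As a rough illustration of the "one button" idea above, the sketch below
runs a sequence of named steps in order and stops at the first failure.
It is only a sketch under assumptions: the function name run_pipeline and
the step names are hypothetical, and the steps are stand-in callables. In
a real setup each step would presumably wrap an external command, for
instance through subprocess.check_call.

```python
def run_pipeline(steps):
    """Run (name, callable) steps in order; stop at the first failure.

    Returns a list of (name, status) pairs, where status is "ok",
    "failed", or "skipped" (for steps after a failure).
    """
    report = []
    failed = False
    for name, action in steps:
        if failed:
            # A later step makes no sense once an earlier one has failed.
            report.append((name, "skipped"))
            continue
        try:
            action()
        except Exception:
            failed = True
            report.append((name, "failed"))
        else:
            report.append((name, "ok"))
    return report


if __name__ == "__main__":
    def broken_build():
        # Stand-in for a build step that fails.
        raise RuntimeError("compiler not found")

    steps = [
        ("fetch sources", lambda: None),
        ("build package", broken_build),
        ("install", lambda: None),
        ("run tests", lambda: None),
    ]
    for name, status in run_pipeline(steps):
        print("%-14s %s" % (name, status))
```

Keeping the report as plain data makes it easy to run the same pipeline on
several platforms and compare the results afterwards.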
Post by Joe Harrington
- Public communication
 - A real marketing plan
 - Executing on that plan
 - Web site geared toward multiple audiences, run by experts at that
   kind of communication
 - More webinars, conference booths, training, aimed at all levels
 - Demos, testimonials, topical forums, all showcased
Concerning communication with users, I think that the mailing lists do
not work well. They are ok for development, but they kinda suck for
helping average users. Since I have been working on the dark side for
numpy/scipy (windows), I have been regularly using stackoverflow to ask
about some obscure windows stuff. stackoverflow is a mix between a FAQ
and wikipedia. It works extremely well, and the user experience is way
above anything I have seen in this vein. Something like this
for scipy/numpy would be extremely useful I believe. It is vastly
superior to ML or wiki for focused problems ("how to do this in
matlab", "how to install on this linux distribution", etc...).

As an example of usage, R has recently used the main website so that
the most upvoted N R questions would be answered by R core developers
(during an R conference I believe). This all feels much better than ML
to me (again, as far as average user usage is concerned, not for
developer communication).

One website to handle all the user community, no need for complicated
forum rules and all (everything works with search and tags).
Stackoverflow works without any fixed hierarchy for many times more
participants than we will ever have, and much broader topics than ours.

They will soon have a dedicated solution for custom websites using the
same stack - maybe something can be worked on as an open source
project.

David
Joe Harrington
2009-08-01 17:10:45 UTC
Permalink
[Replying only on scipy-dev, per the original post.]
Post by David Cournapeau
I think it is very important to keep in mind that in any open source
project, telling people what to do does not work well. Not everybody
shares the same goals, or is interested in scipy in the same way,
etc... So any structure should help people do what they want for
scipy's sake, but above all, should not alienate anyone who would
have worked on scipy otherwise. It may just be rhetoric, but saying
that "it would be nice for scipy to have this goal" instead of "we
should do this" matters IMHO.
I think (hope!) that everyone understands that anything posted here is
a personal opinion and that none of us feels we are in a position to
give orders. Nobody is boss or supervisor to the whole list. When I
write, "We need...," of course I am writing "It is my opinion that we
need," etc., but that gets tedious both to write and to read. Visions
should be bold.

That said, there do need to be goals, standards, etc. Those do
translate into telling people what to do. I think the key point is
that it must be the community, not any individual, that does the
telling. For example, we are engaged in a discussion of a plan I
floated. The list I posted is "my plan", but already we've added code
to the funding umbrella and no doubt there will be more changes (I
fully expected Robert Kern to flip out about my suggestion to remove
functions from numpy...maybe he didn't read that far...I expect to
lose that one.:-). I think that once it's the community's plan, we
can say no to contributions that don't fit, that conflict with others,
that are too slow or insufficient, and so on, because we will have the
critical mass to replace those contributions with ones the community
thinks are better. We see this already with the vigilant rejection of
change requests to the numpy API and the review comment system on the
doc wiki. We can and have to say no occasionally, to maintain our
direction and our standards. We just have to be careful about it and
make sure it is based on established community goals and norms, not
one person's random opinion.

More on some of your other points later...

--jh--
David Goldsmith
2009-08-01 21:56:52 UTC
Permalink
Post by Tommy Grav
Making scipy into a tool for science and engineering is in my opinion
too broad a goal. Making it into a set of tools that are usable in many
fields, and thus supporting development of field-specific packages, is
again in my opinion the way to go.
Please clarify what you see as the difference between these two - to me, on the surface of it, your goal statement is no more "focused" nor "self-contained" than Joe's. Perhaps if you clarify what you see as the differences, we all may discover that your vision and Joe's actually aren't that far apart.

DG
Post by Tommy Grav
It narrows the focus and makes the project more self contained.
Cheers
Tommy Grav
+----------------------------------------------------------------------------+
Johns Hopkins University
+----------------------------------------------------------------------------+
(410) 516-7683
http://web.mac.com/tgrav/Astronomy/Welcome.html
+----------------------------------------------------------------------------+
_______________________________________________
Scipy-dev mailing list
http://mail.scipy.org/mailman/listinfo/scipy-dev
Tommy Grav
2009-08-02 12:32:56 UTC
Permalink
Post by David Goldsmith
Post by Tommy Grav
Making scipy into a tool for science and engineering is in my opinion
a to broad a goal. Making into a set of tools that are useable in many
fields and thus supporting development of field specific packages is in
again my opinion the way to go.
Please clarify what you see as the difference between these two - to me,
on the surface of it, your goal statement is no more "focused" nor
"self-contained" than Joe's. Perhaps if you clarify what you see as the
differences, we all may discover that your vision and Joe's actually
aren't that far apart.
I don't think that Joe and I are that far apart either. My point (very
badly formulated) was that trying to make scipy be a replacement for IDL
or matlab is in my opinion not the right goal. IDL in particular has a
lot of field specific code available in it. I would like to see a
structure where scipy provides the underlying code needed by many fields
(like the Numerical Recipes codes) but stays away from providing field
specific code. Also scipy should not venture into GUI or provide an
interactive environment like IDL (there are other packages that provide
this).

Just my opinion
Tommy Grav
David Goldsmith
2009-08-02 18:58:11 UTC
Permalink
Post by Tommy Grav
I don't think that Joe and I are that far apart either. My point (very
badly formulated) was that trying to make scipy be a replacement for IDL
or matlab is in my opinion not the right goal. IDL in particular has a
lot of field specific code available in it. I would like to see a
structure where scipy provides the underlaying code needed by many fields
(like the Numerical Recipes codes) but stay away from providing field
specific code. Also scipy should not venture into GUI or provide an
interactive environment like IDL (there are other packages that provide
this).
Just my opinion
Tommy Grav
OK, that helps. :-)

Fine goal (between the two, I choose to remain neutral for now), but one comment: you say avoid a GUI, but the kind of "tool set" you describe would greatly benefit from (dare I say require) some sort of UI that makes it "easy" for the uninitiated (at the very least) to find the specific resources they need; IMO, for example, the UI LAPACK provides for this is a good example of how *not* to do it.

DG
Neil Martinsen-Burrell
2009-08-02 20:44:49 UTC
Permalink
Post by David Goldsmith
Post by Tommy Grav
I don't think that Joe and I are that far apart either. My point
(very badly formulated) was that trying to make scipy be a
replacement for IDL or matlab is in my opinion not the right goal.
IDL in particular has a lot of field specific code available in it.
I would like to see a structure where scipy provides the
underlaying code needed by many fields (like the Numerical Recipes
codes) but stay away from providing field specific code. Also
scipy should not venture into GUI or provide an interactive
environment like IDL (there are other packages that provide this).
Just my opinion Tommy Grav
OK, that helps. :-)
Fine goal (between the two, I choose to remain neutral for now), but
one comment: you say avoid a GUI, but the kind of "tool set" you
describe would greatly benefit from (dare I say require) some sort of
UI that makes it "easy" for the uninitiated (at the very least) to
find the specific resources they need; IMO, for example, the UI
LAPACK provides for this is a good example of how *not* to do it.
This may be an instrumentality on the way to the "Goal of Scipy"
(whatever that is) but I wanted to mention here the importance of
reaching students with SciPy. Software vendors know this: if a student
learns about a certain type of computing using your software, then they
are likely to continue using your software throughout their career.
Matlab has been stupendously good at this sort of marketing in
engineering schools, where learning Matlab is seen by some as a
*required* part of the curriculum, due to its industry dominance.

Apropos of David's point about the relevance of a GUI, I think that in
addition to the packaging, documentation and communication aspects of
Joe's plan, an easy-to-install environment for interactive computation
is important for teaching students with SciPy. When I taught an
undergraduate class on Markov chains using numpy and scipy, it was hard
for students to install scipy. Once they had it installed, they were
able to be moderately productive in IDLE, but they missed some of the
features of IPython (command completion, saved inputs and output). An
interactive Python environment that allowed access to documentation, an
editor and a rich interpreter would have made the uptake much easier for
students.

In the past, Alan has spoken strongly about the importance of the matrix
class for teaching linear algebra and I want to echo his message about
the importance of pedagogical usability for the continued adoption of
the SciPy stack. Students who start using software in their classes
will continue using that software throughout their careers, particularly
so for something such as SciPy which has some significant advantages
over its better-known competitors. I think that there is a tendency for
active researchers to underestimate the importance of
undergraduate-level learning and I hope that in this discussion, we will
keep in mind the singular importance of that young audience.

-Neil
nicky van foreest
2009-08-02 20:55:46 UTC
Permalink
Hi,

my 2 cts: I completely agree. I try to "force" Python upon my
students, and advise them to use Python(x,y). In my opinion this package
is certainly a step in the right direction as it makes using
python/numpy/scipy very easy. Hopefully I am helping to raise zealots
(of the right type :-) ) with this.

bye

Nicky
a***@ajackson.org
2009-08-02 21:03:44 UTC
Permalink
---------------8< snip -------------------
Post by Neil Martinsen-Burrell
This may be an instrumentality on the way to the "Goal of Scipy"
(whatever that is) but I wanted to mention here the importance of
reaching students with SciPy. Software vendors know this: if a student
learns about a certain type of computing using your software, then they
are likely to continue using your software throughout their career.
Matlab has been stupendously good at this sort of marketing in
engineering schools, where learning Matlab is seen by some as a
*required* part of the curriculum, due to its industry dominance.
---------------8< snip -------------------
Post by Neil Martinsen-Burrell
-Neil
I'd like to echo these comments. I have been working for several years now to
get people in my company to use python, numpy, scipy, etc, and have made
progress, but the biggest battle I fight is with the Matlab people. Pretty
nearly every person we hire just out of school, and every summer intern comes
to us as a Matlab user, due to the excellent job Matlab has done with cheap
academic licenses. Anything we can do to get professors to start using the
python suite so that their students will learn and use it will pay great
dividends in the future.

- Alan
--
-----------------------------------------------------------------------
| Alan K. Jackson | To see a World in a Grain of Sand |
| ***@ajackson.org | And a Heaven in a Wild Flower, |
| www.ajackson.org | Hold Infinity in the palm of your hand |
| Houston, Texas | And Eternity in an hour. - Blake |
-----------------------------------------------------------------------
Gael Varoquaux
2009-08-02 21:09:42 UTC
Permalink
Post by Neil Martinsen-Burrell
I think that there is a tendency for active researchers to
underestimate the importance of undergraduate-level learning and I
hope that in this discussion, we will keep in mind the singular
importance of that young audience.
That's all good and nice. I agree with you it is important, and I am very
happy to hear people talking about this, because it makes me hope that we
will be getting more help to do this.

If I work my ass off on an IDE, or more simply a GUI frontend, it won't
help me get more work done, which means shooting papers out, to be
cynical, and, in a few years, I will most likely not be doing any
scientific Python anymore. On the other hand, if I work on something that
is useful for my day to day work, I get some traction at the lab, and my
sleepless face is more easily forgiven. If I build an IDE that is of no
use to our work, nobody cares, and for a good reason.

This is not to say that we shouldn't be working on the IDE, I believe
that I am one of the people that have actually written code to do this,
but there is a lot of work to be done here, and working on making sure
that we have a shell to do this, and interactive plotting, and good
documentation is part of this work, and can be reused for direct research
interests. Writing docs is also something that can help a lot, does not
require extensive technical knowledge and takes a lot of time.

Actually, I must point out that I am quite unhappy, because I am very
tired: I have spent the weekend fixing bugs on various open source
projects (nipy and mayavi) and answering complicated user questions. I
find it unfair to be told that we are underestimating the importance of
ease of use and ease of learning. This simply takes a lot of time
and some of us are working on it.

Gaël
Neil Martinsen-Burrell
2009-08-03 14:29:35 UTC
Permalink
Post by Gael Varoquaux
I think that there is a tendency for active researchers to
underestimate the importance of undergraduate-level learning and I
hope that in this discussion, we will keep in mind the singular
importance of that young audience.
That's all good and nice. I agree with you it is important, and I am very
happy to hear people talking about this, because it makes me hope that we
will be getting more help to do this.
As I have time to spare apart from the teaching and researching duties
that I need to do to keep *my* job, I am glad to volunteer my time for
this effort. I have some things in mind for making Scipy accessible as
a module within a Numerical Analysis or Scientific Computing course that
I hope to work on within this calendar year.
Post by Gael Varoquaux
If I work my ass off on an IDE, or more simply a GUI frontend, it won't
help me get more work done, which means shooting papers out, to be
cynical, and, in a few years, I will most likely not be doing any
scientific Python anymore. On the other hand, if I work on something that
is useful for my day to day work, I get some traction at the lab, and my
sleepless face is more easily forgiven. If I build an IDE that is of no
use to our work, nobody cares, and for a good reason.
This is not to say that we shouldn't be working on the IDE, I believe
that I am one of the people that have actually written code to do this,
but there is a lot of work to be done here, and working on making sure
that we have a shell to do this, and interactive plotting, and good
documentation is part of this work, and can be reused for direct research
interests. Writing docs is also something that can help a lot, does not
require extensive technical knowledge and takes a lot of time.
Indeed, you have highlighted one of the difficulties in depending on
active domain scientists to create software projects: scratching one's
itch is not selfish, but necessary for their career. As Joe mentioned
about the Doc marathon funded through some of his grants, as his
granting situation gets tighter, the funding that he is able to devote
to SciPy development is drying up. I think that this is a persuasive
argument for the establishment of a SciPy Foundation which can provide
the organizational structure to pay willing developers for some of the
code which they develop. In doing so, we provide a system of rewards
(however small) that is an alternative to the scientific career track.
Post by Gael Varoquaux
Actually, I must point out that I am quite unhappy, because I am very
tired, I have spent the week end fixing bugs on various open source
projects (nipy and mayavi) and answering complicated users questions. I
find that to be told that we are underestimating the importance of ease
of use and ease of learning is unfair. This simply takes a lot of time
and some of us are working on it.
I certainly appreciate your work fixing bugs on open-source scientific
projects. Thank you. It was not my intent to say that anyone is
"underestimating the importance of ease of use and ease of learning".
My intent was to highlight an audience for Scipy that has significant
importance for future uptake.

-Neil
Charles R Harris
2009-08-03 17:44:38 UTC
Permalink
Post by Neil Martinsen-Burrell
Post by Gael Varoquaux
I think that there is a tendency for active researchers to
underestimate the importance of undergraduate-level learning and I
hope that in this discussion, we will keep in mind the singular
importance of that young audience.
That's all good and nice. I agree with you it is important, and I am very
happy to hear people talking about this, because it makes me hope that we
will be getting more help to do this.
As I have time to spare apart from the teaching and researching duties
that I need to do to keep *my* job, I am glad to volunteer my time for
this effort. I have some things in mind for making Scipy accessible as
a module within a Numerical Analysis or Scientific Computing course that
I hope to work on within this calendar year.
Post by Gael Varoquaux
If I work my ass off on an IDE, or more simply a GUI frontend, it won't
help me get more work done, which means shooting papers out, to be
cynical, and, in a few years, I will most likely not be doing any
scientific Python anymore. On the other hand, if I work on something that
is useful for my day to day work, I get some traction at the lab, and my
sleepless face is more easily forgiven. If I build an IDE that is of no
use to our work, nobody cares, and for a good reason.
This is not to say that we shouldn't be working on the IDE, I believe
that I am one of the people that have actually written code to do this,
but there is a lot of work to be done here, and working on making sure
that we have a shell to do this, and interactive plotting, and good
documentation is part of this work, and can be reused for direct research
interests. Writing docs is also something that can help a lot, does not
require extensive technical knowledge and takes a lot of time.
Indeed, you have highlighted one of the difficulties in depending on
active domain scientists to create software projects: scratching one's
itch is not selfish, but necessary for their career.
Linus on selfishness, apropos of Microsoft contributing driver code to Linux:

I agree that it's driven by selfish reasons, but that's how all open source
code gets written! We all "scratch our own itches". It's why I started
Linux, it's why I started git, and it's why I am still involved. It's the
reason for everybody to end up in open source, to some degree.

So complaining about the fact that Microsoft picked a selfish area to work
on is just silly. Of course they picked an area that helps them. That's the
point of open source - the ability to make the code better for your
particular needs, whoever the 'your' in question happens to be.

Does anybody complain when hardware companies write drivers for the hardware
they produce? No. That would be crazy. Does anybody complain when IBM funds
all the POWER development, and works on enterprise features because they
sell into the enterprise? No. That would be insane.

So the people who complain about Microsoft writing drivers for their own
virtualization model should take a long look in the mirror and ask
themselves why they are being so hypocritical.

Chuck
Tommy Grav
2009-08-02 21:22:54 UTC
Permalink
Post by Neil Martinsen-Burrell
This may be an instrumentality on the way to the "Goal of Scipy"
(whatever that is) but I wanted to mention here the importance of
reaching students with SciPy. Software vendors know this: if a student
learns about a certain type of computing using your software, then they
are likely to continue using your software throughout their career.
Matlab has been stupendously good at this sort of marketing in
engineering schools, where learning Matlab is seen by some as a
*required* part of the curriculum, due to its industry dominance.
Apropos of David's point about the relevance of a GUI, I think that in
addition to the packaging, documentation and communication aspects of
Joe's plan, an easy-to-install environment for interactive computation
is important for teaching students with SciPy. When I taught an
undergraduate class on Markov chains using numpy and scipy, it was hard
for students to install scipy. Once they had it installed, they were
able to be moderately productive in IDLE, but they missed some of the
features of IPython (command completion, saved inputs and output). An
interactive Python environment that allowed access to documentation, an
editor and a rich interpreter would have made the uptake much easier for
students.
I agree with what you are saying, but I don't think scipy is the right
package for this. The scipy package should in my opinion be like numpy, a
self contained package of methods that are frequently used in science and
engineering. In a sense it should provide the applied math. Then one can
have separate packages providing interpreters à la matlab and IDL that
sit on top of the scipy package and other more field specific packages. I
think that in thinking of scipy as a replacement for IDL and Matlab the
project becomes too broad reaching and it gets harder to get everyone to
pull in approximately the same direction.
Post by Neil Martinsen-Burrell
In the past, Alan has spoken strongly about the importance of the matrix
class for teaching linear algebra and I want to echo his message about
the importance of pedagogical usability for the continued adoption of
the SciPy stack. Students who start using software in their classes
will continue using that software throughout their careers, particularly
so for something such as SciPy which has some significant advantages
over its better-known competitors. I think that there is a tendency for
active researchers to underestimate the importance of
undergraduate-level learning and I hope that in this discussion, we will
keep in mind the singular importance of that young audience.
I agree again, but I also think that students should learn how to code
in Python, not in Sage/Python(x,y)/Scipy. The more of the core language
the student learns the more powerful all the tools become.

Tommy
David Goldsmith
2009-08-02 22:22:56 UTC
Permalink
Not wishing to turn this into a mutual admiration society, but "+10" vis-a-vis everything Neil said! ;-)

DG

PS: I do feel, however, that a UI as rich as Neil implies, though it should be wholly supported both materially and "in spirit" by whatever entity takes responsibility for SciPy, should be "separable" from the SciPy "core," so that the latter is deliverable with either a "rich" UI or a "serviceable" one.
Prabhu Ramachandran
2009-08-03 17:20:28 UTC
Permalink
Post by Joe Harrington
About sixteen months ago, I launched the SciPy Documentation Project
and its Marathon. Dozens pitched in and now numpy docs are rapidly
approaching a professional level. The "pink wave" ("Needs Review"
status) is at 56% today! There is consensus among doc writers that
much of the rest can be labeled in the "unimportant" category, so
we're close to starting the review push (hold your fire, there is a
web site mod to be done first).
We're also nearing the end of the summer, and it's time to look ahead.
The path for docs is clear, but the path for SciPy is not. I think
our weakest area right now is organization of the project. There is
no consensus-based plan for improvement of the whole toward a stated
goal, no centralized coordination of work, and no funded work focused
on many of our weaknesses, notwithstanding my doc effort and what
Enthought does for code.
Thank you for your efforts!

I believe I will be able to help this effort in various ways over the
next few years from India as part of a large government grant. I do not
have the time to discuss it here at the moment but I will be at SciPy09
and would love to discuss it there in person. I will also be talking
briefly about our overall goals there. Specifically see:

http://conference.scipy.org/abstract?id=13

regards,
prabhu
David Goldsmith
2009-08-03 17:33:27 UTC
Permalink
Thanks, Prabhu, this looks very promising! I look forward to your talk!

DG
David Goldsmith
2009-08-04 18:53:20 UTC
Permalink
At this point I think the question becomes: do we let the (clear) fact that there is not a single set of priorities for where SciPy should be headed (which I do not see as a bad thing at this stage) get in the way of the community moving on *some* proposal (e.g., Joe's, with mods) for *some* "not-for-profit entity" (e.g., a "SciPy Foundation," the original topic of this thread) that will function as an institutional resource for furthering whichever priorities for SciPy should bubble to the surface? In other words, this thread is diverging (into territory necessary to discuss, yes), but can we at least agree (a semi-rhetorical question because I think the answer is clearly "yes") that something along the lines of a "SciPy Foundation" would be useful, certainly for helping us move SciPy where we want it to go, but perhaps also for helping us decide where as well?

DG
Subject: Re: [SciPy-dev] SciPy Foundation
Date: Tuesday, August 4, 2009, 2:25 AM
On Tue, Aug 4, 2009 at 10:35 AM, David
Post by Sebastian Walter
2 cents from an outsider who thought about contributing to
I think it is a good idea to make scipy easy to use for beginners.
However, after reading this thread, I have the impression that the
goal is not to provide state of the art algorithms but rather to make
Scipy as popular as possible by putting money and effort into the
"marketing" of Scipy.
Don't get me wrong, I think there are some good reasons why a project
should strive for a large user base. Some of the best projects are
popular.
Alas, correlation does not imply causality.
I, for instance, would rather see more effort put into getting state of
the art algorithms implemented in Scipy, because that's something that
would make a real difference in my research work. Of course, targeting
the "clueless Matlab" users is quite pointless if that is what you are
after.
Post by David Cournapeau
One point which has not been mentioned concerning a matlab-like
environment - maybe it is obvious and everyone implicitly acknowledges
it, but Mathworks is a 30-year-old company, with > 1000 people today.
Building something like matlab, with a good GUI and top notch
documentation, takes a huge amount of resources, of which the 'useful'
code is only a fraction. I of course don't know the details of the
matlab implementation, but I know that for music oriented software
(which needs a good UI to sell well, and has non trivial computational
requirements, so the comparison is not totally stupid), the graphical
code is 80% of the code. This ratio is consistent with the big open
source audio software as well (ardour, rosegarden). Worse, being cross
platform makes the problem much more difficult. For the music software
market, mac os x is rarely ignored (~ 40-50% of the market I believe),
so people need to support two platforms, and that's really a lot of
work. For scientific software, I think you can go the non native route
for the graphical toolkit, though.
Also, very few open source projects are successful as far as good GUIs
are concerned (I don't want to enter into a debate here, but there are
good documents/studies on this topic). You need financial incentive for
this, so only projects backed by big companies have managed to pull it
off.
IOW, I am pretty pessimistic about being a 'matlab' clone. We should
rather shoot for what makes numpy/scipy better (extensibility, cross
platform, actual language, etc...), because really, matlab will always
be a much better matlab than us. Price and licensing are not good enough
to justify migration - if what you want is a free matlab clone, why not
use octave or scilab anyway.
That does NOT mean that we should not aim at making the software more
accessible. I (and I guess other developers) am definitely interested
in a more product-like, integrated stack, to make the barrier of entry
lower. I for example am really tired of the installation problems
consistently reported. I feel like we cover mac os x and windows pretty
well now, but the linux situation is still dreadful. I have a few ideas
on how to improve the situation, but they all require quite a bit of
work/infrastructure. I hope that soon, the scenario "I see this cool
python script on the internet, it requires this numpy/scipy thing, can I
try it in 2 minutes ?" will be a reality.
Post by Sebastian Walter
Then you really get some "killer applications". I could name a few
people who are coding some cool state of the art algorithms but waste
so much time because they started coding directly in C++. In the
meantime, they could have implemented the algorithms in Python _and_
in C++. If scipy had something really good that Matlab etc. do not
have: guess what ppl would do....
Post by David Cournapeau
Yes, there are a lot of people who still don't know that there are
languages outside Fortran, C and C++. In my field, I still see some
people who implement parsers in C...
Post by Sebastian Walter
1) an easy, modular and flexible build system (fortran, c, c++, D,
swig, boost:python, cython,...)
Post by David Cournapeau
you mean like numscons :) ? Adding D support to numscons should be easy.
For example, I added initial cython support in a couple of minutes
during the cython talk at SciPy08; adding new languages is relatively
easy thanks to scons.
Post by Sebastian Walter
2) very low entry barrier for possible
   a simple checkout, then ./manage.py startapp mycoolmodule
   and everything is ready to go ("Start coding in 5 minutes!")
Post by David Cournapeau
there are various pieces to enable this (in place build, develop command
of setuptools, virtualenv/pip/easy_install), but yes, the situation is
kind of messy. For scikits, that's not so difficult - you should be
able to implement a trivial scikit by copying the scikits.example
package and starting from there.
One problem is that it is technically impossible to build in place and
test in one go because of a nose limitation ATM (for some reason, nose
fails to import a package if it is in the current directory).
Post by Sebastian Walter
3) a distributed version control system (e.g. git). SVN really scares
me off...
Post by David Cournapeau
That's a sensitive issue, I think we should avoid starting this one here
:) Needless to say, you can use git-svn - several core developers use it
for numpy/scipy dev, and we distribute an official
http://projects.scipy.org/numpy/browse_git
At least I have not touched svn for numpy/scipy development for > 6
months now, except to check releases when I tag them.
Post by Sebastian Walter
4) standardized unit tests
Post by David Cournapeau
What do you mean exactly here ? We use nose for testing, what do you
consider "non standard".
Post by Sebastian Walter
5) automated documentation generation
Post by David Cournapeau
It is almost automated now - but an example for scikits is missing in
the example package :)
Post by Sebastian Walter
Just enumerating what I think would be useful to attract high quality
contributors. I'm aware that scipy already has a lot of the features
(which is nice).
But it would be even nicer to have a really low entry barrier and have
a framework that guides you to write good (and documented) code with
extensive unit tests, just like the big web frameworks (Django, RoR,
...)
It has to be a win-win situation for both the community and the
developer.
Post by David Cournapeau
cheers,
David
Robert Kern
2009-08-04 19:37:01 UTC
Permalink
At this point I think the question becomes: do we let the (clear) fact that there is not a single set of priorities for where SciPy should be headed (which I do not see as a bad thing at this stage) get in the way of the community moving on *some* proposal (e.g., Joe's, with mods) for *some* "not-for-profit entity" (e.g., a "SciPy Foundation," the original topic of this thread) that will function as an institutional resource for furthering whichever priorities for SciPy should bubble to the surface?  In other words, this thread is diverging (into territory necessary to discuss, yes), but can we at least agree (a semi-rhetorical question because I think the answer is clearly "yes") that something along the lines of a "SciPy Foundation" would be useful, certainly for helping us move SciPy where we want it to go, but perhaps also for helping us decide where as well?
Perhaps a new name would be in order. I think a lot of the
disagreement in vision arises from the fact that a number of the very
good ideas about how to encourage the use of Python in the sciences,
which could be implemented by the people involved in
SciPy-the-project, are being conflated with scipy-the-package. Things
like IDEs and GUIs and applications do not fit into scipy-the-package
as it currently exists, and changing scipy-the-package such that they
do fit in deteriorates what scipy-the-package is good at now.

Personally, I see scipy-the-package as something very close in spirit
to what GSL is to C: a library of quality numerical algorithms useful
to science and engineering. scipy-the-package is not everything that
is required to advance Python's use in the sciences. It can't be. A
single Python package is the wrong technology for delivering all of
that functionality.

I think we need to step back and question the question itself. Perhaps
we should not be asking "where should scipy(-the-package) be heading?"
but "what do we need to do to advance Python's use in the sciences?" I
don't think a Foundation helps the former much, but I do think the
latter would be an excellent mission for one. scipy-the-package is a
component of what the Foundation might work on, but I think it would
make a huge mistake if it fixated on scipy-the-package and assumed
that all of the work it does needs to be jammed into
scipy-the-package.
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
Gael Varoquaux
2009-08-04 19:41:00 UTC
Permalink
I fully agree with your analysis Robert.

I had this discussion with Eric, and he did mention that it would be
useful if the name was reminiscent of 'SciPy', because it is a highly
visible name.

Should we have a BOF on that at the SciPy conference? Mailing list
discussions tend to go in a circle.
Robert Kern
2009-08-04 19:45:32 UTC
Permalink
On Tue, Aug 4, 2009 at 14:41, Gael
Post by Gael Varoquaux
I fully agree with your analysis Robert.
I had this discussion with Eric, and he did mention that it would be
useful if the name was reminiscent of 'SciPy', because it is a highly
visible name.
Should we have a BOF on that at the SciPy conference? Mailing list
discussions tend to go in a circle.
We could get a bikeshed, some paint, and some brushes. Everyone who
wants to contribute an idea must paint it on the bikeshed.

I like it.

Anyways, it could probably even be called the SciPy Foundation as long
as the introductory material was very explicit about its relationship
to scipy-the-package and the founding members use language carefully.
Tricky, but doable.
--
Robert Kern
Robert Kern
2009-08-04 19:48:10 UTC
Permalink
Post by Robert Kern
We could get a bikeshed, some paint, and some brushes. Everyone who
wants to contribute an idea must paint it on the bikeshed.
In their preferred color, of course.
--
Robert Kern
Ondrej Certik
2009-08-04 21:28:52 UTC
Permalink
Post by Robert Kern
Post by Robert Kern
We could get a bikeshed, some paint, and some brushes. Everyone who
wants to contribute an idea must paint it on the bikeshed.
In their preferred color, of course.
Maybe everyone should bring a bike too.

Ondrej
Charles R Harris
2009-08-04 21:47:21 UTC
Permalink
Post by Ondrej Certik
Post by Robert Kern
Post by Robert Kern
We could get a bikeshed, some paint, and some brushes. Everyone who
wants to contribute an idea must paint it on the bikeshed.
In their preferred color, of course.
Maybe everyone should bring a bike too.
It would be nice if the hotels would offer rental bikes. As it is, the
bike stores are far enough away that getting a bike and dropping it off
on departure is too much hassle.

Chuck
Gael Varoquaux
2009-08-04 21:49:41 UTC
Permalink
Post by Ondrej Certik
Post by Robert Kern
Post by Robert Kern
We could get a bikeshed, some paint, and some brushes. Everyone who
wants to contribute an idea must paint it on the bikeshed.
In their preferred color, of course.
Maybe everyone should bring a bike too.
It would be nice if the hotels would offer rental bikes. As is, the bike
stores are far enough away that getting the bike and dropping it off on
departure, is too much hassle.
The good news is that you'll find a bikeshed to shelter your bike when
you get there.

Actually, I suggest some people bring bikesheds too, as I am unsure of
my favorite color.

Gaël
Joe Harrington
2009-08-16 14:40:51 UTC
Permalink
I've finally had time to look at all the replies to this thread.
There were dozens, so rather than quoting and responding to everyone
individually, I'll summarize. The short version is that due to an
early misunderstanding, we spent a lot of bandwidth generating
agreement that masqueraded as dissent! In the end, I think we have
general agreement (and no specific dissent) to the idea of an
organization dedicated to development of scientific tools in Python
and gathering and disbursing funds to that end. We even agree on our
major priorities. So, I propose that we move forward with planning.
There's a BoF proposal at the end.


Here's the longer version:

1. Objection: The mission statement stuffs too much into one package.
The scipy package doesn't need a GUI! (Long post by Gael 2009-08-01
22:52:16, shorter one by Robert 2009-08-04 19:37:01, many others.)

My apologies to these fine gentlemen and others who discussed on this
threadlet, but this was a bit of a bandwidth waster since I started my
proposed mission statement with "(The toolstack)", not "SciPy" or
"scipy". Of course nobody would go to such lengths just for one
package, nor propose stuffing so much into it that exists elsewhere
and is in wide use already (GUIs, interactive shells, etc.). We're
talking broadly about scientific use of Python.

Robert proposed a name change to avoid such ambiguity. SciPython?
SciPyStack? Py4Sci? Scientific Python is taken. I really prefer
SciPy, as it has branding already, but perhaps SciPyStack is ok
informally. I think we're stuck with SciPy for formal docs, web site,
etc., just like JPL (which has not studied the propulsion of jets or
rocket engines for decades). What this means is:

a. POSTERS BE CLEAR: specify package or toolstack when you talk about
scipy. Use "SciPy" for the toolstack and "scipy" for the package, but
don't rely on that alone. (note: I did this!)

b. RESPONDENTS BE CAREFUL: double-check what the poster wrote before
replying if it's about "scipy" or "SciPy".


2. It's important for the package structure to be light.

Yes! I am not proposing to change the package structure at all.
People need to be able to pick and choose, and it needs to be light
for many reasons, such as OLPC.

However, as a practical matter, I know of *nobody* who is a heavy user
and who does not install a significant number of packages. We install
about 15 python-related packages now for our group. It has become a
nightmare that takes my very experienced system manager, an Ubuntu
developer with a PhD in computer science, several days. Basically, if
you want everything current (e.g., to get recent docs in numpy, or HDF
libraries that actually work), it is hard to do a consistent build
without doing a lot of patching. Clearly, most potential users cannot
tolerate that, or even do it.

So, I would like to see packaging *coordination* such that a
monolithic install is as trivial for the user as it is to install one
package.
Johann Cohen-Tanugi
2009-08-16 17:01:13 UTC
Permalink
Hi Joe, just one quick comment: I really think that you cannot use the
scipy name without creating misunderstandings down the line. It is
crazy in my mind to rely on two upper/lowercase letters to differentiate
two different "objects". I do not like the package/toolstack distinction
either. For one thing, you may get more confusion from non-native
English speakers than you really wish!
Why not Py4Science? It does convey the ultimate goal of this effort, and
I have only seen it in the context of ipython and matplotlib: the first
hit from Google is http://ipython.scipy.org/moin/Py4Science which was a
practical workshop on python usage for scientific work (I think the
content still lives in matplotlib SVN).

anyway, my two cents.....

Johann
Robert Kern
2009-08-16 17:09:07 UTC
Permalink
 EPD focuses on Windows.
Uh, no.
--
Robert Kern
Johann Cohen-Tanugi
2009-08-16 17:09:07 UTC
Permalink
Maybe a confusion with Python(x,y)? I believe it has a very short
release cycle (at least in the recent past), which seems to be what Joe
would like to see happening for the toolstack, so that many different
flavors of science can easily have an updated set of packages that meet
their needs. I guess EPD has a much longer release cycle.....

Johann
Post by Robert Kern
EPD focuses on Windows.
Uh, no.