Discussion:
[Numpy-discussion] Moving away from svn ?
David Cournapeau
2008-01-04 11:24:04 UTC
Hi,

First things first, happy new year to all !

Having recently felt the pain of using subversion merge, I was
wondering how people feel about moving away from subversion and
using a better system, such as mercurial or bzr (I will talk about bzr
because that's the one I know best, but this discussion is really
about using something better than subversion, not so much about bzr).
I think this could be an important step forward, and it is somewhat related
to the discussions on scikits and co.
As some of you are certainly aware, there has been a recent trend
towards so-called Distributed Version Control Systems (DVCS). I won't go
into the details, because they vary from system to system, and I am in
no position to explain technical details. But for people who are
wondering, here is a small description of DVCS, and why I think this can
be a significant step forward for numpy/scipy. You can skip it if you
already know about them.

What is a DVCS
==============

A DVCS, contrary to centralized systems like CVS or SVN, has no
technical concept of a central repository which everybody pulls changes
from and pushes changes to. Instead, DVCS are centered around the branch
concept: each branch contains a local copy of the history. As a consequence:

1 you can do most traditional svn/cvs operations locally, and
disconnected from the network (getting the log, getting annotations,
committing, branching, merging from other branches).
2 because the branch is local, no access rights are needed: anybody can
jump in and commit to a local branch. Of course, integration in an official
numpy branch would need some special approval.

This also has the following consequence: since branching/merging is
such a key point of DVCS, merging actually works with DVCS. In
particular, merging the same changes several times works, and you
certainly do not have to go through the whole svn madness of tracking versions.

For more information, here are some links which go much deeper:
- some discussion from Keith Packard, the maintainer of X.org:
http://keithp.com/blogs/Tyrannical_SCM_selection/,
http://keithp.com/blogs/Repository_Formats_Matter/
- Linus Torvalds on the advantages of git, the DVCS he wrote for
linux development, versus svn for kde (long, but it makes all the
points really clearly):
http://lists.kde.org/?l=kde-core-devel&m=118764715705846&w=2

Why use a DVCS ?
================

Some people argue that DVCS are intrinsically more complicated, which is
something I really don't understand. I've been programming 'seriously'
for only about 2-3 years, and I find bzr much easier to use and set
up than subversion; the key point, I think, is that I started using DVCS
before centralized ones. Some things which are utterly complicated with
subversion are trivial with bzr: merging, going back in the
history (say that at rev 150, you realize that everything since rev 140 is
rubbish, and you want to go back: this is extremely tedious to do with
subversion). Basically, most of the things which are the reasons why we
use a VCS in the first place are easier with a DVCS than with a centralized
VCS (at least as far as svn is concerned). Also:

- For a casual user who wants to use the latest development version instead of a
release, getting it from a bzr repository, a git repository, a mercurial
repository or a svn repository is extremely similar. It is one step in
all cases.

- For casual developers: being able to use branches means that they can
implement their new features in a change-set oriented way instead of
one big patch. Also, bzr enables things like uncommit if you made a
mistake and want to go back. More generally, going back in history is
much easier.

- For core developers: I personally find the ability to use branches for
each new feature to be extremely useful. It makes me feel much safer
when I do something. I am not afraid of doing something totally stupid
which may end up screwing other people.

And finally, I find the ability to do things locally to be really
pleasant, and it enables workflows not really possible with systems such
as SVN. In particular, I work at three distant places every week, and
the ability to work while travelling, and the trivial
synchronization between computers, are definitely helpful. Instantaneous
log and annotations are also really useful IMHO.

Which DVCS ?
============

The three which keep coming up are:
- git (the one used for linux kernel development). That's the one I
know the least (only from a user point of view, never used it for
development). It is supposed to be more powerful and more complicated than
the others. It is also known to be really fast (the kernel is not a
small codebase for sure).
- mercurial: started at the same time as git. It is written in python
except for a few things written in C. It is reasonably fast, and has
recently been selected for some big projects, in particular by Sun
(openJDK, openSolaris, open Netbeans).
- bzr: also written in python. Sponsored by Canonical, the company
behind Ubuntu. It has just reached version 1.0. The focus is on the
UI; it handles renaming really well. It has a vibrant community, with
dedicated developers working on it; it has the reputation of being slow,
which was somewhat true previously, but in my experience, it is on par
with mercurial, at least for local operations. Anyway, it is not a
problem for numpy or scipy, which are small codebases (a few thousand
files, a few thousand revisions).

Problems:
=========

Assuming people think it is worth trying out, I mainly see two problems:
- importing the current history
- integration with trac

For bzr, I can say that the bzr-svn plugin works really well; in
particular, it can import numpy and scipy repositories with the whole
history. I am using it regularly as a proxy between local bzr and the
scipy and scikits trunk. Incidentally, this makes it possible for me to
give numbers if numbers are needed wrt bzr's speed, repository size, etc...

For mercurial, I tried one method once which did not go really far, but
I did not try really hard; anyway, I think people at enthought use
mercurial a lot, so they would know better.

Integration with trac is the real problem, I think. According to one bzr
developer, the trac model (0.10, the last released one) is really based
around the subversion notion of a repository, which does not fit well with
mercurial and bzr. I don't know if this is true for the not yet released
0.11. If bzr is considered a possible candidate, I can get more
information from bzr developers. What is the experience wrt trac of
enthought developers ?

This email is already getting pretty long, so to conclude, I think a DVCS
would be helpful for future development of numpy/scipy. I believe it
would both enable easier participation from different people and enable
safer development schedules, etc... What do other people think ? Would
it be worthwhile to discuss the issues further, and how to resolve
them ?

cheers,

David

P.S: I would be willing to take care of the bzr side of things:
trying conversion, setting up experimental repositories for trial, and
asking the bzr community for advice.
dmitrey
2008-01-04 11:48:49 UTC
As for me, I would wait until DVCS become more popular than svn. Jumping
often from one VCS to another isn't a good idea; moreover, it's not
clear for now which DVCS will displace the others and become standard (being
installed in many OSes by default).

Also, I would prefer my changes (for example to my openopt) being available 24
hours/day immediately after I have committed them; also, keeping them too
long only on my HDD makes data more vulnerable - computer viruses, frequent
power cuts, other possible causes of data loss.
// my 2 cents

Regards, D.
Neal Becker
2008-01-04 11:56:51 UTC
Post by dmitrey
As for me, I would wait until DVCS become more popular than svn. Jumping
often from one VCS to another isn't a good idea; moreover, it's not
clear for now which DVCS will displace the others and become standard (being
installed in many OSes by default).
Also, I would prefer my changes (for example to my openopt) being available 24
hours/day immediately after I have committed them; also, keeping them too
long only on my HDD makes data more vulnerable - computer viruses, frequent
power cuts, other possible causes of data loss.
// my 2 cents
You misunderstand dvcs. There is no problem maintaining a centralized copy.
I believe the well known projects that have adopted dvcs all maintain a
canonical centralized copy.
David Cournapeau
2008-01-04 11:54:13 UTC
Post by dmitrey
As for me, I would wait until DVCS become more popular than svn. Jumping
often from one VCS to another isn't a good idea; moreover, it's not
clear for now which DVCS will displace the others and become standard (being
installed in many OSes by default).
I don't think one will become standard. Git will stay for sure, since it
is used and developed by kernel hackers; it is used by at least two big
open source projects: linux and xorg (as well as many freedesktop
projects). bzr is pushed really hard by Canonical, and I don't think
Canonical will be going away soon. Mercurial is used by Sun for all its
open sourced projects. I don't see why this would be a problem.

On Linux, getting open source software is trivial, and windows has
never been distributed with any VCS :) Only mac os X has svn by
default (if you install the dev tools at least). But as long as binary
installers are available, I don't see that as a big problem either.

Now, the points you raised (concerning the popularity) have direct
consequences on the availability of third party tools, which certainly
is a problem to consider (GUI, etc...). As far as bzr is concerned, I
would say that's the core problem (Gui on windows, integration with trac).
Post by dmitrey
Also, I would prefer my changes (for example to my openopt) being available 24
hours/day immediately after I have committed them; also, keeping them too
long only on my HDD makes data more vulnerable - computer viruses, frequent
power cuts, other possible causes of data loss.
Nothing prevents you from putting the changes on a backup server: for
example, bzr supports the concept of sending any committed change to a
'bound branch' automatically, to have a more svn-like workflow.

I certainly agree that changing the VCS is a big change, and requires a
lot of thinking, though. I am not suggesting that we change next week.

cheers,

David
Gael Varoquaux
2008-01-04 12:11:59 UTC
Post by David Cournapeau
I certainly agree that changing the VCS is a big change, and requires a
lot of thinking, though. I am not suggesting that we change next week.
In the meantime, could you tell us more about how you use bzr with
svn? This seems like a good transitional option.

Gaël
Neal Becker
2008-01-04 12:17:53 UTC
Post by Gael Varoquaux
Post by David Cournapeau
I certainly agree that changing the VCS is a big change, and requires a
lot of thinking, though. I am not suggesting that we change next week.
In the meantime, could you tell us more about how you use bzr with
svn? This seems like a good transitional option.
Gaël
Mercurial has a very powerful extension called 'convert'.
http://www.selenic.com/mercurial/wiki/index.cgi/ConvertExtension
David Cournapeau
2008-01-04 12:21:21 UTC
Post by Gael Varoquaux
Post by David Cournapeau
I certainly agree that changing the VCS is a big change, and requires a
lot of thinking, though. I am not suggesting that we change next week.
In the meantime, could you tell us more about how you use bzr with
svn? This seems like a good transitional option.
Once you have installed bzr-svn, you can import the whole scikits trunk using
the svn-import command. This will create a shared repository (that is, no
working tree is actually put in scikits.bzr; this means that having the
whole repository is quite cheap storage-wise: as an example, the whole
scipy history takes around 68 MB, which is less than a svn checkout with
the working tree). Once you have the shared repository, you no longer need to
care that it was imported from svn. It is exactly the same workflow as
a native bzr shared repository.

The problems:
- importing is really slow, because you need to get the history per
revision; this is a network-bound operation (for local svn mirrors, on a
macbook, I can import around 15 revisions/second for matplotlib, at
which point it becomes CPU bound). That's why I asked for information
about the svn server, to be able to use svnsync (which makes a local mirror of a
svn repository) for thorough experiments on my own.
- because of the above, it would be really bad if many people started
to import directly, because of the burden on the svn server.
- bzr-svn uses a different format than usual bzr. UI wise, it does
not change anything, but this means it performs worse than the
format used since bzr 0.92 (this is an over-simplification, because you
can have a better format).
- It does not work right now with numpy because of a mistake I made plus a
bzr-svn bug, which should be easily solved, though. It works with scipy
and scikits.

David
Matthieu Brucher
2008-01-04 14:26:52 UTC
Post by David Cournapeau
Post by Gael Varoquaux
In the meantime, could you tell us more about how you use bzr with
svn? This seems like a good transitional option.
Once you have installed bzr-svn, you can import the whole scikits trunk using
the svn-import command.
This works OK on Linux, but on Windows, the packages needed by bzr-svn
(the python wrappers that are in the usual python-subversion package) are in
the Subversion trunk (1.5). So we have to compile them first.
Besides this, I'm starting to use bazaar (in fact it's the successor of arch)
for a small project of mine hosted on launchpad.net, and it works great. As
David stated, the only problem is the UI: on Linux, I mainly use the
command line because olive-gtk is buggy and not really user friendly (there
is room for improvement; it even shows deprecation warnings because it uses a
pre-0.18 bzr API). On Windows, I tried to use it as well; I saw that there
might be a TortoiseBZR program, but I didn't try it.

Matthieu
--
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
Stefan van der Walt
2008-01-04 15:09:50 UTC
Hi Matthieu
Post by Matthieu Brucher
Besides this, I'm starting to use bazaar (in fact it's the successor of arch)
for a small project of mine hosted on launchpad.net, and it works great. As
Note that bzr refers to bazaar-ng (new generation), which is not the
same as the bazaar which succeeded arch/tla.

Regards
Stéfan
David Cournapeau
2008-01-04 15:24:33 UTC
Post by Matthieu Brucher
Post by David Cournapeau
Post by Gael Varoquaux
In the meantime, could you tell us more about how you use bzr with
svn? This seems like a good transitional option.
Once you have installed bzr-svn, you can import the whole scikits trunk using
the svn-import command.
This works OK on Linux, but on Windows, the packages needed by bzr-svn
(the python wrappers that are in the usual python-subversion package) are in
the Subversion trunk (1.5). So we have to compile them first.
Yes, there is indeed a problem with bzr-svn in that respect on
windows. People are trying to improve the situation, though (at least
one developer of bzr is a windows user, by which I mean he mainly uses
windows, and bzr-svn is one of the top priorities for obvious reasons).

But the transition Gael talked about is not about everyone using
bzr-svn; this would be overkill and a waste of resources. The
solution would be to have a bzr mirror of svn, hosted somewhere on
scipy.org, so that we use bzr, on a bzr repository, i.e. only the
mirror system would need bzr-svn.
Post by Matthieu Brucher
Besides this, I'm starting to use bazaar (in fact it's the successor of arch)
for a small project of mine hosted on launchpad.net, and it works great. As
David stated, the only problem is the UI: on Linux, I mainly use the
command line because olive-gtk is buggy and not really user friendly (there
is room for improvement; it even shows deprecation warnings because it uses a
pre-0.18 bzr API). On Windows, I tried to use it as well; I saw that there
might be a TortoiseBZR program, but I didn't try it.
As a linux user, I don't see the point of a GUI for most tasks related
to bzr :) I sometimes use bzr-gtk, though. On windows, it seems that
qbzr is the most windows-friendly. I don't know the state of
TortoiseBzr, but I will find out soon, since I have to set up a system
to coordinate change at my lab, and I intend to test more thoroughly
trac+bzr+TortoiseBzr.

David
Matthieu Brucher
2008-01-04 15:50:52 UTC
Post by David Cournapeau
As a linux user, I don't see the point of a GUI for most tasks related
to bzr :) I sometimes use bzr-gtk, though. On windows, it seems that
qbzr is the most windows-friendly. I don't know the state of
TortoiseBzr, but I will find out soon, since I have to set up a system
to coordinate change at my lab, and I intend to test more thoroughly
trac+bzr+TortoiseBzr.
I tried to test qbzr, but I could not find where and how to launch it, and it
seemed less advanced than olive, so I stopped trying :| But as you said,
a lot can be done on the command line (and should be done this way, even for
SVN).

Matthieu
--
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
Neal Becker
2008-01-04 11:57:29 UTC
There is a mercurial plugin for trac.
David Cournapeau
2008-01-04 11:56:31 UTC
Post by Neal Becker
There is a mercurial plugin for trac.
as well as a bzr one. The problem is more related to performance issues
(cheap things in svn are not cheap in DVCS, and vice-versa). For
example, the trac-bzr plugin is really slow for timelines (it takes
almost one second on a local server !); I have not tried the
trac-mercurial one much.

cheers,

David
Ondrej Certik
2008-01-04 12:25:55 UTC
Post by David Cournapeau
Post by Neal Becker
There is a mercurial plugin for trac.
as well as a bzr one. The problem is more related to performance issues
(cheap things in svn are not cheap in DVCS, and vice-versa). For
example, the trac-bzr plugin is really slow for timelines (it takes
almost one second on a local server !); I have not tried the
trac-mercurial one much.
We switched from svn to Mercurial in SymPy, I wrote some info here:

http://code.google.com/p/sympy/wiki/Mercurial

But basically, once you try "svn merge" and you go through all the pain
and then try a DVCS (I only use mercurial and git), you never want to go back.

Our central repo is here:

http://hg.sympy.org/sympy/

and I can fully recommend it. We converted all our svn history to
it, so now, I frequently
browse the history of sympy (because every clone of the repo has it)
if I need to look at something. I never did that with svn, because
it's painfully slow.

We were basically only deciding between git and Mercurial, but we
chose mercurial, because

* we are python guys and Mercurial is in python+C, very nicely written
and they accept patches (Kirill, one sympy developer, has posted
several already, to implement features he was missing - he used to use
darcs before)
* Sage uses it

Ondrej
David Cournapeau
2008-01-04 12:47:33 UTC
Post by Ondrej Certik
Post by David Cournapeau
Post by Neal Becker
There is a mercurial plugin for trac.
as well as a bzr one. The problem is more related to performance issues
(cheap things in svn are not cheap in DVCS, and vice-versa). For
example, the trac-bzr plugin is really slow for timelines (it takes
almost one second on a local server !); I have not tried the
trac-mercurial one much.
http://code.google.com/p/sympy/wiki/Mercurial
But basically, once you try "svn merge" and you go through all the pain
and then try a DVCS (I only use mercurial and git), you never want to go back.
Imagine the pain in the other direction, which was my experience :) I
actually did not believe at first that it was so bad, and thought I was
doing something wrong. At least, it certainly convinced me that SVN was
not easier than DVCS.
Post by Ondrej Certik
http://hg.sympy.org/sympy/
and I can fully recommend it. We converted all our svn history to
it, so now, I frequently
browse the history of sympy (because every clone of the repo has it)
if I need to look at something. I never did that with svn, because
it's painfully slow.
I am not familiar with sympy: you are not using trac at all ? Also, how
did you convert the svn history ?

I like mercurial's way of showing branches and co; bzr does not have
anything like that out of the box (there are separate projects to show
sources; there is also launchpad of course, but since it is not open
source, I do not even consider it for numpy/scipy).

On the other hand, the bzr community is more user-friendly: the tool is
easier to use I think, the graphical tools are more advanced, at least
from my experience.
Post by Ondrej Certik
We were basically only deciding between git and Mercurial, but we
chose mercurial, because
* we are python guys and Mercurial is in python+C, very nicely written
and they accept patches (Kirill, one sympy developer, has posted
several already, to implement features he was missing - he used to use
darcs before)
* Sage uses it
For some time, the big problem of bzr was speed. But bzr accomplished
quite a lot in the last year: the first time I used mercurial, the speed
difference was obvious; it is not so true anymore (they 'feel' the same,
basically, but I have not used mercurial extensively, at least compared
to bzr).

So I think it really boils down to the difficulty of the transition, the
availability of third party tools (and also the tool used by other
projects similar to numpy, as you mentioned).

cheers,

David
Ondrej Certik
2008-01-04 14:16:10 UTC
Post by David Cournapeau
Imagine the pain in the other direction, which was my experience :) I
actually did not believe at first that it was so bad, and thought I was
doing something wrong. At least, it certainly convinced me that SVN was
not easier than DVCS.
It would have made me sick. :)
Post by David Cournapeau
I am not familiar with sympy: you are not using trac at all ? Also, how
We use googlecode:

http://code.google.com/p/sympy/

It works nicely for us.
Post by David Cournapeau
did you convert the svn history ?
Using the mercurial extension. Kirill submitted some patches, so that
branches and tags are converted too.
Post by David Cournapeau
I like mercurial's way of showing branches and co; bzr does not have
anything like that out of the box (there are separate projects to show
sources; there is also launchpad of course, but since it is not open
source, I do not even consider it for numpy/scipy).
On the other hand, the bzr community is more user-friendly: the tool is
easier to use I think, the graphical tools are more advanced, at least
from my experience.
I never used bzr, so I cannot judge.
Post by David Cournapeau
Post by Ondrej Certik
We were basically only deciding between git and Mercurial, but we
chose mercurial, because
* we are python guys and Mercurial is in python+C, very nicely written
and they accept patches (Kirill, one sympy developer, has posted
several already, to implement features he was missing - he used to use
darcs before)
* Sage uses it
For some time, the big problem of bzr was speed. But bzr accomplished
quite a lot in the last year: the first time I used mercurial, the speed
difference was obvious; it is not so true anymore (they 'feel' the same,
basically, but I have not used mercurial extensively, at least compared
to bzr).
So I think it really boils down to the difficulty of the transition, the
availability of third party tools (and also the tool used by other
projects similar to numpy, as you mentioned).
I know that bzr is used by Canonical, but I think socially, you should
choose either
mercurial or git. Those are imho the most widespread DVCS.

As I said, Mercurial being Python+C was very important for us,
so that we can easily fix bugs and implement new functionality in mercurial.

Also the commands of mercurial are very similar to svn.


Ondrej
Stefan van der Walt
2008-01-04 15:22:02 UTC
Hi David
Post by David Cournapeau
First things first, happy new year to all !
Happy new year! It's been great so far :)
Post by David Cournapeau
Having recently felt the pain to use subversion merge, I was
wondering about people's feeling on moving away from subversion and
using a better system, ala mercurial or bzr (I will talk about bzr
because that's the one I know the most, but this discussion is really
about using something better than subversion, not that much about bzr).
I think this could be an important step forward, and is somewhat related
to the discusions on scikits and co.
I have a couple of questions, that you may be able to answer more
quickly than what I could, googling for a few hours:

1) Is it easy to setup bzr so that many people have submit-access to
the main-branch, i.e. so that we don't need a central "patch-manager"?

2) With a DRCS, we have to check out the whole repository, instead of
the current sources only. Currently, the history amounts to roughly
70Mb, but that includes files that have been deleted etc. Is there
any way to compact the repository, or to say "let's only go 100
revisions back, and for the rest query the main branch"? I'm just
worried that, some day in the future, a user will need to do an
extremely large checkout to hack on a fairly small codebase.

3) Which of bzr, mercurial and git has the best merging capabilities?
I heard a while back that git does not try to be too clever about it,
while the others do -- I wonder how that worked out.

I am very fond of the distributed model, and use it for my own
development, too. Regardless, I would still like to hear more from
people who have used it on a larger scale.

Regards
Stéfan
David Cournapeau
2008-01-04 15:45:49 UTC
Post by Stefan van der Walt
Hi David
Post by David Cournapeau
First things first, happy new year to all !
Happy new year! It's been great so far :)
Post by David Cournapeau
Having recently felt the pain to use subversion merge, I was
wondering about people's feeling on moving away from subversion and
using a better system, ala mercurial or bzr (I will talk about bzr
because that's the one I know the most, but this discussion is really
about using something better than subversion, not that much about bzr).
I think this could be an important step forward, and is somewhat related
to the discusions on scikits and co.
I have a couple of questions, that you may be able to answer more
1) Is it easy to setup bzr so that many people have submit-access to
the main-branch, i.e. so that we don't need a central "patch-manager"?
If you read Linus' email I linked, he says a word about that, as
well as several related problems (having several "main branches", and
problems related to access/regression testing/build bot). I don't know
much about it, but the bzr team uses a piece of software which tracks merge
requests. The trunk is thus read only (in the sense that nobody pushes
changes to it), and pulls changes from the software tracking merge
requests.

https://launchpad.net/pqm

My understanding, but I have never used that feature, is that
mercurial has something similar.
Post by Stefan van der Walt
2) With a DRCS, we have to check out the whole repository, instead of
the current sources only. Currently, the history amounts to roughly
70Mb, but that includes files that have been deleted etc. Is there
any way to compact the repository, or to say "let's only go 100
revisions back, and for the rest query the main branch"? I'm just
worried that, some day in the future, a user will need to do an
extremely large checkout to hack on a fairly small codebase.
Git has this feature, bzr not yet. I don't know about mercurial. In
itself, it is not so much of a problem because the amount of data is not
so big compared to a svn checkout. As I mentioned previously, for
the scikits and scipy imports, the whole history imported in bzr is smaller
than a svn checkout ! It depends a lot on the codebase details, but
again, my understanding is that the repository formats used by DVCS
make it quite efficient to get the whole history.

This is the reason why I linked Keith Packard's blog: in his
discussion about why repository formats matter, there is the story of
the 2.7 GB mozilla CVS repository going to 8.2 GB when imported in
svn, going back to 450 MB when imported with git (compared to a 350 MB
checkout !).
Post by Stefan van der Walt
3) Which of bzr, mercurial and git has the best merging capabilities?
I heard a while back that git does not try to be too clever about it,
while the others do -- I wonder how that worked out.
I don't know. Mercurial historically used an external merge tool, I
think. I can just say that most of the time, bzr merge works without
any problem, whereas svn merge never works. I basically gave up on using
merge with svn: for my scons related work, I just checked out from
svn, created a bzr repository from it, and made my changes inside bzr,
because I was fed up with svnmerge failing every single time I wanted
to pull merges from other branches. Maybe I am too stupid, but I
don't see how anyone would want to use the merge command from svn.

cheers,

David
Chris Barker
2008-01-05 07:21:19 UTC
hmmm. Everyone posting so far seems to be positive on this idea, but I'm
not so sure. A few thoughts:

1) change is bad. It may be worth it, but this decision needs to be made
very differently than if we were starting from scratch.

2) apparently svn merge sucks compared to other merge technology. svn
(and cvs before it) is the only system I've used, and yes, merging is
painful, but I have to say that it appeared to be painful because it's a
very hard problem. Can anyone comment on why these other systems seem so
much better? Does it have anything to do with Centralized vs.
Distributed at all?

3) I read Linus' post -- he's quite articulate. However, it seems that
most of his arguments really applied primarily to large projects --
where there really will be a lot of different "central" versions. This
is very, very, important to the Linux kernel, and probably good for kde,
but scipy is a monstrously smaller community. And it's not a question of
number of devs -- but rather number of versions.

This makes me think it really comes down to a better merge -- is there a
way to address that problem with svn? maybe the svnmerge.py that Russel
suggested?

4) SVN is very, very, popular. Lots of folks use it, they use it on all
common platforms, and there are tons of clients for it. I work with a
bunch of folks that really don't like a command line (for programmers, I
think that's just plain weird, but there you go). I could never sell a
VCS that didn't have a decent GUI client on Windows and OS-X.
Post by Charles R Harris
Sometimes it is the little niggling things that matter, in this case
line breaks. Hg (and probably bzr) stores everything as binary, so if
someone uses an editor that breaks lines with CR or CR+LF instead of the
good 'ol unixy LF, there might be a lot of whitespace updates coming in to
the repository.
Good point. In fact, line break translation was one of the features I
loved about cvs -- too bad svn doesn't do it by default, but it can be
made to. This is definitely one of the niggling things that matter!
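
To make the line-break concern concrete, here is a minimal Python sketch. It is only an illustration of the idea and makes no claim about how svn, hg or bzr handle line endings internally; the function names are hypothetical. It shows that a file saved with Windows-style breaks differs byte for byte from the unixy version even though nothing else has changed.

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Convert CR+LF and lone CR line endings to LF (hypothetical helper)."""
    # Replace CR+LF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def only_line_endings_differ(old: bytes, new: bytes) -> bool:
    """True when two versions differ byte-wise but are equal once normalized."""
    return old != new and normalize_line_endings(old) == normalize_line_endings(new)


unix = b"import numpy\nprint(numpy.pi)\n"
windows = b"import numpy\r\nprint(numpy.pi)\r\n"

# A tool that stores files as raw bytes sees every line as changed here.
print(only_line_endings_differ(unix, windows))  # True
```

A check like this could run before committing, to catch whitespace-only changes before they reach the repository.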

-Chris
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

***@noaa.gov
Robert Kern
2008-01-05 07:38:01 UTC
Post by Chris Barker
hmmm. Everyone posting so far seems to be positive on this idea, but I'm
1) change is bad. It may be worth it, but this decision needs to be made
very differently than if we were starting from scratch.
2) apparently svn merge sucks compared to other merge technology. svn
(and cvs before it) is the only system I'm used, and yes, merging is
painful, but I have to say that it appeared to be painful because it's a
very hard problem. Can anyone comment on why these other systems seem so
much better? Does it have anything to do with Centralized vs.
Distributed at all?
Tangentially, yes. DVCSes need to keep track of more information in order to be
distributed. That information is extremely useful for managing merges properly.
Centralized systems could track this information, but they don't *need* to in
order to be functional, so they mostly haven't, yet.

For each revision, the DVCS knows what revisions it derives from. SVN does not
keep this information. SVN primarily just knows the text diffs from revision to
revision. In particular, if I have a long-lived branch, I am going to merge in
changes from the trunk while I'm working on it. When I go to merge the branch
back into the trunk, I need to know which trunk-revisions I've already merged
into the branch. SVN does not track this information. Tools like svnmerge.py
track some of this information at the expense of some added clumsiness.
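
A rough sketch of why recording "what revisions it derives from" helps: if every revision stores its parents, the set of trunk revisions not yet merged into a branch falls out of a simple ancestry walk. This is a toy model under stated assumptions, not the storage format of any real DVCS; the revision names and the helper functions ancestors and missing_from are hypothetical.

```python
# Hypothetical history: each revision maps to the revisions it derives from.
parents = {
    "t1": [],
    "t2": ["t1"],
    "b1": ["t2"],        # branch created from trunk revision t2
    "t3": ["t2"],
    "b2": ["b1", "t3"],  # branch merges trunk revision t3
    "t4": ["t3"],
    "b3": ["b2"],
}


def ancestors(rev, parents):
    """Return rev plus every revision reachable through parent links."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def missing_from(target, source, parents):
    """Revisions in source's history that target has not incorporated yet."""
    return ancestors(source, parents) - ancestors(target, parents)


# Trunk work the branch still lacks, and branch work the trunk still lacks.
unmerged_trunk = missing_from("b3", "t4", parents)
unmerged_branch = missing_from("t4", "b3", parents)
```

Because t3 is already an ancestor of the branch head, it is not offered again when merging, which is the bookkeeping that has to be done by hand (or by svnmerge.py) with plain svn.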

It's worth noting that SVN 1.5 will be tracking such information. But that
release is a long ways off.
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
David Cournapeau
2008-01-06 04:56:29 UTC
Post by Robert Kern
Post by Chris Barker
hmmm. Everyone posting so far seems to be positive on this idea, but I'm
1) change is bad. It may be worth it, but this decision needs to be made
very differently than if we were starting from scratch.
2) apparently svn merge sucks compared to other merge technology. svn
(and cvs before it) is the only system I'm used, and yes, merging is
painful, but I have to say that it appeared to be painful because it's a
very hard problem. Can anyone comment on why these other systems seem so
much better? Does it have anything to do with Centralized vs.
Distributed at all?
Tangentially, yes. DVCSes need to keep track of more information in order to be
distributed. That information is extremely useful for managing merges properly.
Centralized systems could track this information, but they don't *need* to in
order to be functional, so they mostly haven't, yet.
For each revision, the DVCS knows what revisions it derives from. SVN does not
keep this information. SVN primarily just knows the text diffs from revision to
revision. In particular, if I have a long-lived branch, I am going to merge in
changes from the trunk while I'm working on it. When I go to merge the branch
back into the trunk, I need to know which trunk-revisions I've already merged
into the branch. SVN does not track this information. Tools like svnmerge.py
track some of this information at the expense of some added clumsiness.
It's worth noting that SVN 1.5 will be tracking such information. But that
release is a long ways off.
My understanding, but I do not follow svn much, is that in 1.5, you
will only get what svnmerge gives you today.

David
Robert Kern
2008-01-06 05:34:59 UTC
Post by David Cournapeau
Post by Robert Kern
It's worth noting that SVN 1.5 will be tracking such information. But that
release is a long ways off.
My understanding, but I do not follow svn much, is that in 1.5, you
will only get what svnmerge gives you today.
I suspect that much of the remaining fragility when using svnmerge is linked to
the fact that it must exist outside of the core. It needs to manage its
information through SVN properties which are themselves revision-controlled and
occasionally need merging themselves if there are multiple branches being
tracked. Once this information is integrated into the core of SVN itself, I
conjecture that, as far as merging is concerned, SVN 1.5 will be mostly on par
with the average DVCS.

For a brief rundown of the differences of 1.5's merge tracking and svnmerge:

http://blogs.open.collab.net/svn/2007/10/subversion-15-m.html
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
David Cournapeau
2008-01-06 04:54:26 UTC
Post by Chris Barker
hmmm. Everyone posting so far seems to be positive on this idea, but I'm
1) change is bad. It may be worth it, but this decision needs to be made
very differently than if we were starting from scratch.
Change for the sake of change is bad. I thought I highlighted in my
email that the difficult point was how to make the change (transition,
importing the history, etc...), but instead, the discussion quickly slipped to
using mercurial. I would have preferred to see what people thought was
important on how to proceed, but we all prefer to speak about which
tool to use instead :)
Post by Chris Barker
2) apparently svn merge sucks compared to other merge technology. svn
(and cvs before it) is the only system I'm used, and yes, merging is
painful, but I have to say that it appeared to be painful because it's a
very hard problem. Can anyone comment on why these other systems seem so
much better? Does it have anything to do with Centralized vs.
Distributed at all?
Merge is a hard problem, but DVCS have to solve it to be of any use.
Post by Chris Barker
3) I read Linus' post -- he's quite articulate. However, it seems that
most of his arguments really applied primarily to large projects --
where there really will be a lot of different "central" versions. This
is very, very, important to the Linux kernel, and probably good for kde,
but scipy is a monstrously smaller community. And it's not a question of
number of devs -- but rather number of versions.
This makes me thing it really comes down to a better merge -- is there a
way to address that problem with svn? maybe the svnmerge.py that Russel
suggested?
svnmerge just does not cut it. When I was saying that merge in svn
does not work, I was not even considering basic svn merge, but
svnmerge. svnmerge does not change much: merge still fails more often
than not, and you have to do a lot of manual things. In DVCS, merge is
one command, and you do not need to initialize anything when you start.

But DVCS is much more than better merge, and it has nothing to do with
the size of the project. As I said, the whole concept of sandbox,
trying new things, is made harder by using svn; it really gets in the
way, instead of helping us.
Post by Chris Barker
4) SVN is very, very, popular. Lots of folks use it, they use it on all
common platforms, and there are tons of clients for it. I work with a
bunch of folks that really don't like a command line (for programmers, I
think that's just plain weird, but there you go). I could never sell a
VCS that didn't have a decent GUI client on Windows and OS-X.
I don't understand this argument: do your co-workers use scipy now but
would not if the source code were kept under a VCS which has no
GUI? scipy and python are fundamentally command line tools, and you
cannot contribute to scipy without using the command line. We do not
require python or distutils to have a GUI, do we?

cheers,

David
Christopher Barker
2008-01-06 07:11:11 UTC
Post by David Cournapeau
Post by Chris Barker
4) SVN is very, very, popular. Lots of folks use it, they use it on all
common platforms, and there are tons of clients for it. I work with a
bunch of folks that really don't like a command line (for programmers, I
think that's just plain weird, but there you go). I could never sell a
VCS that didn't have a decent GUI client on Windows and OS-X.
I don't understand this argument: do your co-workers use scipy now but
would not if the code source would be kept under a VCS which has no
GUI ?
Sorry, that wasn't the point. My co-workers (and most folks) use
numpy/scipy by downloading binaries, or maybe tarballs, so what VCS is
used is, indeed, pretty irrelevant to most users.

Would good multi-platform GUI support of the VCS make for more
contributors to numpy/scipy? I don't know. At least at the numpy level
the kinds of folks that are likely to contribute are probably
command-line savvy, etc. (though the command line on Windows really,
really sucks!)

My point was that there is a benefit to using a widely known and used
tool that lots of folks are using for their own projects as well. I use
SVN every day, and I'm far more likely to go get and look at SVN head of
some open source project because of that. I'm kind of dreading that I'll
soon need to figure out svn, mercurial, and who knows what other VCSs in
order to keep up with the various projects I like to keep up with.

The discussion here has been compelling, however. I see these benefits:

1) Better merge. svn merge does suck, so that would be great. It sounds
like it may well get better, but no time soon.

2) Easier to create sub-versions of the main tree -- experimental
projects that groups of folks can work on. I started out, from reading
Linus' note, thinking that that was really only beneficial for
kernel-scale projects, but I now see that it could be nice for much
smaller scale projects as well. I do still think scipy could probably
support the few of those that will really happen with svn branches, but
then we're back to that merge problem again!

-Chris
--
Christopher Barker, Ph.D.
Stefan van der Walt
2008-01-04 15:53:55 UTC
Hi David
Post by David Cournapeau
Having recently felt the pain to use subversion merge, I was
[...]

Also take a look at

https://launchpad.net/pqm

I don't know why we haven't set up a hook like this a long time ago in
SVN, it just makes so much sense -- no checkins without the unit tests
passing!
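
The idea of a gate in front of the repository can be sketched in a few lines. This is not how PQM itself is implemented, only an illustration of "no checkins without the unit tests passing"; run_checks, gated_commit and the example checks are hypothetical names, and a real hook would run the project's actual test suite.

```python
def run_checks(checks):
    """Run each zero-argument check and collect (name, message) for failures."""
    failures = []
    for check in checks:
        try:
            check()
        except AssertionError as exc:
            failures.append((check.__name__, str(exc)))
    return failures


def gated_commit(checks, do_commit):
    """Call do_commit only if every check passes; return the failures found."""
    failures = run_checks(checks)
    if not failures:
        do_commit()
    return failures


def check_addition():
    assert 1 + 1 == 2, "addition is broken"


def check_broken_feature():
    assert 1 + 1 == 3, "feature is broken"


committed = []
ok = gated_commit([check_addition], lambda: committed.append("r1"))
rejected = gated_commit([check_addition, check_broken_feature],
                        lambda: committed.append("r2"))
```

Here the first changeset lands and the second is refused, with the list of failing checks handed back to the submitter.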

Regards
Stéfan
Charles R Harris
2008-01-04 16:30:00 UTC
Post by David Cournapeau
Hi,
First things first, happy new year to all !
Having recently felt the pain to use subversion merge, I was
wondering about people's feeling on moving away from subversion and
using a better system, ala mercurial or bzr (I will talk about bzr
because that's the one I know the most, but this discussion is really
about using something better than subversion, not that much about bzr).
I think this could be an important step forward, and is somewhat related
to the discusions on scikits and co.
As some of you are certainly aware, there has been a recent trend
towards so called Distributed Version Control Systems (DVCS). I won't go
into the details, because it varies from system to system, and I am in
no position to explain technical details. But for people who are
wondering, here is a small description of DVCS, and why I think this can
be a significant step forward for numpy/scipy. You can skip it if you
know about them
I like Mercurial and use it a lot, but I'm not convinced we have enough
developers and code to justify the pain of changing the VCS at this time.
SVN generally works well and has good support on Windows through tortoise.
Mercurial also has tortoise support these days, but I haven't ever used it
and can't comment on it. In fact, I've never even used Mercurial on windows,
perhaps someone can relate their experiences with it. I suppose a shift
might be justified if there is a lot of branch merging and such in our
future. Anyone know what folks are working in branches?

Chuck
David Cournapeau
2008-01-04 16:56:49 UTC
Post by Charles R Harris
I like Mercurial and use it a lot, but I'm not convinced we have enough
developers and code to justify the pain of changing the VCS at this time.
I don't understand the number of developers argument: on most of the
projects I am working on, I am the only developer, and I much prefer
bzr to svn, although for reasons which are not really relevant to a
project like numpy/scipy.
Post by Charles R Harris
SVN generally works well and has good support on Windows through tortoise.
That's where I don't agree: I don't think svn works really well. As
long as you use it as a history backup, it works ok, but that's it.
The non-functional merge makes branching almost useless, and reverting
back in time is extremely cumbersome.
Post by Charles R Harris
Mercurial also has tortoise support these days, but I haven't ever used it
and can't comment on it. In fact, I've never even used Mercurial on windows,
perhaps someone can relate their experiences with it. I suppose a shift
might be justified if there is a lot of branch merging and such in our
future. Anyone know what folks are working in branches?
Well, I started this discussion because of the scikits discussion. A
typical use of branches is for sandboxes: it makes a lot of sense to
use branches instead of sandboxes. Also, when branching actually
works, you really start using many branches: I do it all the time on
all my projects, and I am the only developer on most of them. It means
that you commit smaller changes (because committing does not mean
making your changes available to the trunk), and instead of
submitting one big changeset, you actually submit a series of small
changes. This really makes a big difference IMHO. Also, things like
log, blame are actually usable, since they are much faster on DVCS.

For something like scipy (less for numpy), where many people develop
different things, I think it really makes a lot of sense to use a
DVCS. I actually think scipy to be more distributed in nature than
many open source projects (again, this is much less true for numpy,
IMHO).

David
Ondrej Certik
2008-01-04 18:45:06 UTC
Post by David Cournapeau
Post by Charles R Harris
I like Mercurial and use it a lot, but I'm not convinced we have enough
developers and code to justify the pain of changing the VCS at this time.
I don't understand the number of developers argument: on most of the
projects I am working on, I am the only developer, and I much prefer
bzr to svn, although for reasons which are not really relevant to a
project like numpy/scipy.
Post by Charles R Harris
SVN generally works well and has good support on Windows through tortoise.
That's where I don't agree: I don't think svn works really well. As
long as you use it as an history backup, it works ok, but that's it.
The non functional merge makes branching almost useless, reverting
back in time is extremely cumbersome,
Post by Charles R Harris
Mercurial also has tortoise support these days, but I haven't ever used it
and can't comment on it. In fact, I've never even used Mercurial on windows,
perhaps someone can relate their experiences with it. I suppose a shift
might be justified if there is a lot of branch merging and such in our
future. Anyone know what folks are working in branches?
Well, I started this discussion because of the scikits discussion. A
typical use of branches is for sandboxes: it makes a lot of sense to
use branches instead of sandboxes. Also, when branching actually
works, you really start using many branches: I do it all the time on
all my projects, and I am the only developer on most of them. It means
that you commit smaller changes (because comitting does not mean
makeing your changes available to the trunk), and instead of
submitting one big changeset, you actually submit a serie of small
changes. This really makes a big difference IMHO. Also, things like
log, blame are actually usable, since they are much faster on DVCS.
For something like scipy (less for numpy), where many people develop
different things, I think it really makes a lot of sense to use a
DVCS. I actually think scipy to be more distributed in nature than
many open source projects (again, this is much less true for numpy,
IMHO).
David is 100% right, I fully support this. I would be just repeating
what he says.

Charles actually made another point in favor of Mercurial - it works
on Windows (at least people say so), while git does not so much (at least
people say so). I never use Windows myself, so I don't know.

Subversion sucks not only at merging, but especially when
providing patches. Most people don't have access to the
repo, and not being able to commit locally (= work incrementally) is just
bad. So, I use mercurial even when providing patches for svn.

Ondrej
Fernando Perez
2008-01-04 18:58:07 UTC
Post by Ondrej Certik
David is 100% right, I fully support this. I would be just repeating
what he says.
Charles actually said another point in favor of Mercurial - it works
on Windows (at least people say so), while git not that much (at least
people say so). I never use Windows myself, so I don't know.
FWIW, we (ipython) have also gone around a few times on this, and
would like (at some point) to switch to a DVCS as well. I think the
benefits are many, so I won't rehash it here, others have done it
well.

One point that hasn't been mentioned is how useful a DVCS is when
doing dev sprints: people can work and sync off their own private
repos without touching SVN, with lots and lots of cross-developer
information flow that doesn't affect the main server or even other
devs. In fact, when doing sprints I always end up making a local hg
repo just for that purpose, and then committing back to svn upstream
at the end of the sprint.

As much as git looks really good, the Windows issue is, I think, a
deal killer: last I checked support was poor, and I think our core dev
tools should be truly, 100% cross-platform without any discrimination
(kinda-sorta-works on platform X isn't enough).

My vote so far is for hg, for performance reasons but also partly
because sage and sympy already use it, two projects I'm likely to
interact a lot with and that are squarely in line with the
ipython/numpy/scipy/matplotlib world. Since they went first and made
the choice, I'm happy to let that be a factor in my decision. I'd
rather use a tool that others in the same community are also using,
especially when the choice is a sound one on technical merit alone.

Just my 1e-2...

f
David Cournapeau
2008-01-04 19:21:46 UTC
Post by Fernando Perez
Post by Ondrej Certik
David is 100% right, I fully support this. I would be just repeating
what he says.
Charles actually said another point in favor of Mercurial - it works
on Windows (at least people say so), while git not that much (at least
people say so). I never use Windows myself, so I don't know.
FWIW, we (ipython) have also gone around a few times on this, and
would like (at some point to switch to a DVCS as well). I think the
benefits are many, so I won't rehash it here, others have done it
well.
One point that hasn't been mentioned is how useful a DVCS is when
doing dev sprints: people can work and sync off their own private
repos without touching SVN, with lots and lots of cross-developer
information flow that doesn't affect the main server or even other
devs. In fact, when doing sprints I always end up making a local hg
repo just for that purpose, and then committing back to svn upstream
at the end of the sprint.
As much as git looks really good, the Windows issue is, I think, a
deal killer: last I checked support was poor, and I think our core dev
tools should be truly, 100% cross-platform without any discrimination
(kinda-sorta-works on platform X isn't enough).
I agree. This is not enough, but for me, the following are non negotiable:
- the tool must work on unix, mac os X and windows
- the tool must be open source.

I guess everyone agrees on those points anyway.
Post by Fernando Perez
My vote so far is for hg, for performance reasons but also partly
because sage and sympy already use it, two projects I'm likely to
interact a lot with and that are squarely in line with the
ipython/numpy/scipy/matplotlib world. Since they went first and made
the choice, I'm happy to let that be a factor in my decision. I'd
rather use a tool that others in the same community are also using,
especially when the choice is a sound one on technical merit alone.
I understand the "sympy uses it" reason, it is definitely a factor.
But I would rather have a more thorough study on the merits of each
system. For example, being a user of bzr for a year and a half now, I
think I have a pretty good idea on how it works, and its advantages.
We could then decide on a set of attributes to compare, and people who
know about one tool could then tell about it.

Performance-wise, hg and bzr really are comparable nowadays for
common, local operations. I don't think it is a relevant parameter for
the hg vs bzr choice anymore, especially for scipy/numpy which are small
projects (I have bzr imports of scipy and scikits, so I can give some
numbers if you need them). Third party tools, special abilities (svn
import, storage efficiency, special commands, etc...) are more
important I think.

David
Charles R Harris
2008-01-04 19:50:31 UTC
Post by David Cournapeau
Post by Fernando Perez
Post by Ondrej Certik
David is 100% right, I fully support this. I would be just repeating
what he says.
Charles actually said another point in favor of Mercurial - it works
on Windows (at least people say so), while git not that much (at least
people say so). I never use Windows myself, so I don't know.
FWIW, we (ipython) have also gone around a few times on this, and
would like (at some point to switch to a DVCS as well). I think the
benefits are many, so I won't rehash it here, others have done it
well.
One point that hasn't been mentioned is how useful a DVCS is when
doing dev sprints: people can work and sync off their own private
repos without touching SVN, with lots and lots of cross-developer
information flow that doesn't affect the main server or even other
devs. In fact, when doing sprints I always end up making a local hg
repo just for that purpose, and then committing back to svn upstream
at the end of the sprint.
As much as git looks really good, the Windows issue is, I think, a
deal killer: last I checked support was poor, and I think our core dev
tools should be truly, 100% cross-platform without any discrimination
(kinda-sorta-works on platform X isn't enough).
- the tool must work on unix, mac os X and windows
- the tool must be open source.
I guess everyone agrees on those points anyway.
Post by Fernando Perez
My vote so far is for hg, for performance reasons but also partly
because sage and sympy already use it, two projects I'm likely to
interact a lot with and that are squarely in line with the
ipython/numpy/scipy/matplotlib world. Since they went first and made
the choice, I'm happy to let that be a factor in my decision. I'd
rather use a tool that others in the same community are also using,
especially when the choice is a sound one on technical merit alone.
I understand the "sumpy uses it" reason, it is definitely a factor.
But I would rather have a more thorough study on the merits of each
system. For example, being a user of bzr for a year and a half now, I
think I have a pretty good idea on how it works, and its advantages.
We could then decide on a set of attributes to compare, and people who
knows about one tool could then tell about it.
At this point, it might be more efficient to ask if anyone has objections or
knows of any problems. I suspect that both hg and bzr probably do most of
what is needed. My own preference is hg because I have several years
experience with it, it has a long history with trac, and it is in pretty
widespread use.

Chuck
Fernando Perez
2008-01-04 19:52:04 UTC
Post by David Cournapeau
I understand the "sumpy uses it" reason, it is definitely a factor.
But I would rather have a more thorough study on the merits of each
system. For example, being a user of bzr for a year and a half now, I
think I have a pretty good idea on how it works, and its advantages.
We could then decide on a set of attributes to compare, and people who
knows about one tool could then tell about it.
Performances-wise, hg and bzr really are comparable nowadays for
common, local operations. I don't think it is a relevant parameter for
the hg vs bzr choice anymor, specially for scipy/numpy which are small
projects (I have bzr imports of scipy and scikits, so I can give some
numbers if you need them). Third party tools, special abilities (svn
import, storage efficiency, special commands, etc...) are more
important I think
Absolutely. That's why I said above "when the choice is a sound one
on technical merit alone". At the time (for sage/sympy) the bzr/hg
choice was unmistakably in favor of hg. Things might be different
today.

Incidentally, the emacs guys seem to be worrying about the same thing:

http://thread.gmane.org/gmane.emacs.devel/85893

If they actually do the work of comparing tools, that work may be
useful for us. I'm pretty sure that any tool that can handle the
entire history of emacs can chew on numpy/scipy/ipython/matplotlib
*combined* for breakfast.

Cheers,

f
Charles R Harris
2008-01-04 20:36:09 UTC
Post by Fernando Perez
Post by David Cournapeau
I understand the "sumpy uses it" reason, it is definitely a factor.
But I would rather have a more thorough study on the merits of each
system. For example, being a user of bzr for a year and a half now, I
think I have a pretty good idea on how it works, and its advantages.
We could then decide on a set of attributes to compare, and people who
knows about one tool could then tell about it.
Performances-wise, hg and bzr really are comparable nowadays for
common, local operations. I don't think it is a relevant parameter for
the hg vs bzr choice anymor, specially for scipy/numpy which are small
projects (I have bzr imports of scipy and scikits, so I can give some
numbers if you need them). Third party tools, special abilities (svn
import, storage efficiency, special commands, etc...) are more
important I think
Absolutely. That's why I said above "when the choice is a sound one
on technical merit alone". At the time (for sage/sympy) the bzr/hg
choice was unmistakably in favor of hg. Things might be different
today.
http://thread.gmane.org/gmane.emacs.devel/85893
If they actually do the work of comparing tools, that work may be
useful for us. I'm pretty sure that any tool that can handle the
entire history of emacs can chew on numpy/scipy/ipython/matplotlib
*combined* for breakfast.
A quick google for benchmarks shows that a year ago, hg was a bit faster and
generated smaller repositories than bzr, but I don't think the difference is
enough to matter. Git is 10-20 times faster than either for a lot of things,
but Linus was definitely focused on speed, which is easy to understand if
you look at the churn in the kernel. Anyway, I suspect that, technically,
both bzr and hg are suitable choices. I'm not sure esr is correct that it is
unlikely that both are going to last long term; bazaar (the ancestor of bzr)
is used for Ubuntu. But the two are similar and fill the same niche, so I
expect that one or the other will become dominant in the wild. And hg seems
to have the advantage of a head start and not being as tightly tied to
Linux.

Chuck
David Cournapeau
2008-01-04 21:05:45 UTC
Post by Charles R Harris
A quick google for benchmarks show that a year ago, hg was a bit faster and
generated smaller repositories than bzr, but I don't think the difference is
enough to matter.
Forget a year ago, because as far as bzr is concerned, they got much
faster (several times faster for common operations like
commit/branch/log/merge).
Post by Charles R Harris
but Linus was definitely focused on speed, which is easy to understand if
you look at the churn in the kernel. Anyway, I suspect that, technically,
both bzr and hg are suitable choices. I'm not sure esr correct that it is
unlikely that both are going to last long term, bazaar (the ancestor of bzr)
is used for Ubuntu. But the two are similar and fill the same niche, so I
expect that one or the other will become dominant in the wild. And hg seems
to have the advantage of a head start and not being as tightly tied to
Linux.
bzr is not tied to linux. They have always had win32 binaries, TortoiseBzr
has a longer history than the mercurial one, and as said previously,
at least one bzr developer is mainly a windows user. I don't want
to sound like I defend bzr, because honestly, I don't care about which
one is used, but so far, the arguments I heard against bzr do not
reflect my experience at all.

One thing that bzr works hard on is the general UI, and the explicit
support for several workflows (with moderately advanced concepts such
as shared repositories and bound branches: for example, with a branch A
bound to branch B, a commit is first pushed to branch B and, if
successful, applied to A; for centralized workflows, this makes things
easier). I honestly do not know if this is significant. bzr claims its
merge capability is better: I do not know if this is true, or if that
matters at all.
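
For readers unfamiliar with bound branches, the commit ordering described above can be modelled roughly as follows. This is a toy sketch and not bzr's implementation or API; the Branch and BoundBranch classes and the accept callback (standing in for network or permission failures) are hypothetical.

```python
class Branch:
    """A branch reduced to an ordered list of revision ids."""

    def __init__(self, name, accept=lambda revision: True):
        self.name = name
        self.revisions = []
        self._accept = accept  # hypothetical hook simulating a refusal

    def append(self, revision):
        if not self._accept(revision):
            raise RuntimeError(f"{self.name} refused {revision}")
        self.revisions.append(revision)


class BoundBranch(Branch):
    """Branch A bound to a master branch B."""

    def __init__(self, name, master):
        super().__init__(name)
        self.master = master

    def commit(self, revision):
        # The commit goes to the master first; if that raises, the
        # local branch is left untouched.
        self.master.append(revision)
        self.append(revision)


central = Branch("B")
local = BoundBranch("A", central)
local.commit("r1")

read_only = Branch("B-readonly", accept=lambda revision: False)
stuck = BoundBranch("A2", read_only)
try:
    stuck.commit("r1")
except RuntimeError:
    pass
```

After this runs, both central and local hold r1, while stuck stays empty because its master refused the commit. That is what makes a bound branch behave like a centralized svn-style checkout.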

I would rather discuss those than "bzr is tied to linux", because I
don't think such claims are based on accurate or recent information. As I
said, I have bzr imports of scipy and scikits, and I could easily do
the same for hg and make them available for everybody to play with.

David
Ondrej Certik
2008-01-04 21:17:33 UTC
Post by David Cournapeau
Post by Charles R Harris
A quick google for benchmarks show that a year ago, hg was a bit faster and
generated smaller repositories than bzr, but I don't think the difference is
enough to matter.
Forget a year ago, because as far as bzr is concerned, they got much
faster (several times faster for common operations like
commit/branch/log/merge).
Post by Charles R Harris
but Linus was definitely focused on speed, which is easy to understand if
you look at the churn in the kernel. Anyway, I suspect that, technically,
both bzr and hg are suitable choices. I'm not sure esr correct that it is
unlikely that both are going to last long term, bazaar (the ancestor of bzr)
is used for Ubuntu. But the two are similar and fill the same niche, so I
expect that one or the other will become dominant in the wild. And hg seems
to have the advantage of a head start and not being as tightly tied to
Linux.
bzr is not tied to linux. They always have win32 binaries, TortoiseBzr
has a longer history than the mercurial one, and as said previously,
one developer of bzr at least is mainly a windows user. I don't want
to sound like I defend bzr, because honestly, I don't care about which
one is used, but so far, the arguments I heard against bzr do not
reflect my experience at all.
One thing that bzr tries hard is the general UI, and the explicit
support for several workflows (with moderately advanced concepts such
as shared repositories, bound branches: for example, with a branch A
bound to branch B, a commit is first pushed on branch B, and if
successfull, applied to A; for centralized worflows, this makes things
easier). I honestly do not know if this is significant. bzr claims its
merge capability is better: I do not know if this is true, or if that
matters at all.
I would rather discuss those than "bzr is tied to linux", because I
don't think they are based on accurate or recent informations. As I
said, I have bzr imports of scipy and scikits, and I could easily to
the same for hg, make them available for everybody to play with.
Instead of devising our own arguments, read this:

http://bazaar-vcs.org/BzrVsHg

and the mercurial response therein.

Ondrej
David Cournapeau
2008-01-04 21:35:29 UTC
Post by Ondrej Certik
Post by David Cournapeau
Post by Charles R Harris
A quick google for benchmarks show that a year ago, hg was a bit faster and
generated smaller repositories than bzr, but I don't think the difference is
enough to matter.
Forget a year ago, because as far as bzr is concerned, they got much
faster (several times faster for common operations like
commit/branch/log/merge).
Post by Charles R Harris
but Linus was definitely focused on speed, which is easy to understand if
you look at the churn in the kernel. Anyway, I suspect that, technically,
both bzr and hg are suitable choices. I'm not sure esr correct that it is
unlikely that both are going to last long term, bazaar (the ancestor of bzr)
is used for Ubuntu. But the two are similar and fill the same niche, so I
expect that one or the other will become dominant in the wild. And hg seems
to have the advantage of a head start and not being as tightly tied to
Linux.
bzr is not tied to linux. They always have win32 binaries, TortoiseBzr
has a longer history than the mercurial one, and as said previously,
one developer of bzr at least is mainly a windows user. I don't want
to sound like I defend bzr, because honestly, I don't care about which
one is used, but so far, the arguments I heard against bzr do not
reflect my experience at all.
One thing that bzr tries hard is the general UI, and the explicit
support for several workflows (with moderately advanced concepts such
as shared repositories, bound branches: for example, with a branch A
bound to branch B, a commit is first pushed on branch B, and if
successfull, applied to A; for centralized worflows, this makes things
easier). I honestly do not know if this is significant. bzr claims its
merge capability is better: I do not know if this is true, or if that
matters at all.
I would rather discuss those than "bzr is tied to linux", because I
don't think they are based on accurate or recent informations. As I
said, I have bzr imports of scipy and scikits, and I could easily to
the same for hg, make them available for everybody to play with.
http://bazaar-vcs.org/BzrVsHg
and the mercurial response therein.
I personally do not find those pages (both mercurial and bzr) really
informative. The bzr page sounds too much like an advertisement, and the
answer from mercurial's team, logically, sounds like an aggressive
defense.

David
_______________________________________________
Numpy-discussion mailing list
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Charles R Harris
2008-01-04 21:41:56 UTC
Permalink
<snip>
http://bazaar-vcs.org/BzrVsHg
and the mercurial response therein.
I saw that, but thought it was more marketing than technical. It turned me off,
actually; the last thing I want is someone selling me toothpaste recommended by
nine out of ten dentists.

Chuck
David Cournapeau
2008-01-04 22:09:20 UTC
Permalink
Post by Charles R Harris
<snip>
http://bazaar-vcs.org/BzrVsHg
and the mercurial response therein.
I saw that, but thought it is more marketing than technical. Turned me off,
actually, last thing I want is someone selling me toothpaste recommended by
nine out of ten dentists.
I agree. I find those pages to be really bad, actually. For better
information, you should read the mailing lists of the respective
projects.

David
Matthew Brett
2008-01-04 22:15:33 UTC
Permalink
Post by David Cournapeau
I agree. I find those pages to be really bad, actually. To have better
informations, you should get into the mailing list of the respective
projects.
Just to extend this holiday special:

I found the mozilla DVCS discussion informative:

http://weblogs.mozillazine.org/preed/2007/04/version_control_system_shootou_1.html

(but graphically embarrassing)

Matthew
David Cournapeau
2008-01-04 22:25:14 UTC
Permalink
Post by Matthew Brett
Post by David Cournapeau
I agree. I find those pages to be really bad, actually. To have better
informations, you should get into the mailing list of the respective
projects.
http://weblogs.mozillazine.org/preed/2007/04/version_control_system_shootou_1.html
(but graphically embarrassing)
The open solaris project documented their choice, too:

http://www.opensolaris.org/os/community/tools/scm/history/

Contrary to mozilla, solaris is using hg as the main VCS. Again, a
thing to keep in mind with the mozilla comparison is that bzr has improved
a lot speed-wise, and that mozilla is several orders of magnitude bigger
than scipy will ever be.

To test the svn import by hg, svnsync seems to be necessary: I don't
get anywhere by using hg convert directly on the scipy repository, whereas
bzr-svn does.
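
For readers who want to try the svnsync route, here is a minimal sketch of the steps as a dry run. It only prints the commands. The URL and paths are hypothetical placeholders, the helper names are made up for illustration, and the file:// URL construction assumes a POSIX-style path. The commands themselves (svnadmin create, svnsync init/sync, hg convert with the convert extension enabled) are the standard ones, but this is not necessarily the exact sequence used for the scipy import.

```python
import os
import shlex
import stat


def install_revprop_hook(mirror_dir):
    """Create a pre-revprop-change hook that always succeeds.

    svnsync refuses to write to a mirror unless this hook exists and exits 0.
    """
    hook = os.path.join(mirror_dir, "hooks", "pre-revprop-change")
    os.makedirs(os.path.dirname(hook), exist_ok=True)
    with open(hook, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")
    # Make the hook executable for the owner.
    os.chmod(hook, os.stat(hook).st_mode | stat.S_IXUSR)
    return hook


def mirror_and_convert_commands(svn_url, mirror_dir, hg_dir):
    """Return the commands for a local svn mirror followed by an hg import."""
    # Assumes a POSIX-style absolute path, so the URL becomes file:///...
    mirror_url = "file://" + os.path.abspath(mirror_dir)
    return [
        ["svnadmin", "create", mirror_dir],
        # install_revprop_hook(mirror_dir) must run here, before svnsync init.
        ["svnsync", "init", mirror_url, svn_url],
        ["svnsync", "sync", mirror_url],
        # The convert extension ships with Mercurial but is off by default.
        ["hg", "--config", "extensions.convert=", "convert", mirror_url, hg_dir],
    ]


if __name__ == "__main__":
    # Hypothetical URL and paths, for illustration only.
    for cmd in mirror_and_convert_commands(
        "http://svn.example.org/scipy", "/tmp/scipy-svn-mirror", "/tmp/scipy-hg"
    ):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Converting from a local mirror rather than over the network also means the import can be rerun cheaply while experimenting.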

cheers,

David
Rafael Villar Burke
2008-01-06 16:49:34 UTC
Permalink
Post by David Cournapeau
http://www.opensolaris.org/os/community/tools/scm/history/
Contrary to mozilla, solaris is using hg as the main VCS.
Mozilla will be using mercurial (hg) too, but decided to do the full transition
after the next release.

AFAICT, they have fully synched copies of their code in a mercurial repository.

Regards,

Rafael
David Cournapeau
2008-01-06 17:22:46 UTC
Permalink
Post by Rafael Villar Burke
Post by David Cournapeau
http://www.opensolaris.org/os/community/tools/scm/history/
Contrary to mozilla, solaris is using hg as the main VCS.
Mozilla will be using mercurial (hg) too, but decided to do the full transition
after the next release.
AFAICT, they have fully synched copies of their code in a mercurial repository.
Yes. When I said that open solaris was using hg as their main VCS, I
meant today. Mozilla does not use mercurial yet; note that when
solaris (and mozilla ?) made their choice, bzr had problems with large
code bases, which is less of a problem now (bzr has focused heavily on
performance in the last 6 months, and it shows). I have imported
scikits and scipy in both mercurial and bzr, and for some operations,
bzr is now faster (in my limited experience).

To be frank, I did not realize that mercurial was that popular (which
makes it more of an argument than I initially thought: I assumed -
wrongly it seems - that both had a similar user-base)

cheers,

David
Eric Firing
2008-01-06 18:11:02 UTC
Permalink
David Cournapeau wrote:
[...]
Post by David Cournapeau
To be frank, I did not realize that mercurial was that popular (which
makes it more of an argument than I initially thought: I assumed -
wrongly it seems - that both had a similar user-base)
David,

One reason that they apparently do not is that mercurial has been more
stable and usable for a longer period. It was more than two years ago
that I decided to begin using a VCS for local projects. I surveyed the
field, concluded I wanted a DVCS, and gradually got the sense that
mercurial was already usable without too much risk of ending up stranded
by major design changes or death of the project. Bzr was in the early
development stages, breaking away from the original bazaar which was
being abandoned, so it was simply not a reasonable choice at the time.

Eric
Eric Firing
2008-01-04 21:31:14 UTC
Permalink
I have been using mercurial for some time now. I just discovered that
the introductory documentation has been improved and consolidated in an
online book-in-progress: http://hgbook.red-bean.com/hgbook.html

Eric
Post by David Cournapeau
Post by Charles R Harris
A quick google for benchmarks show that a year ago, hg was a bit faster and
generated smaller repositories than bzr, but I don't think the difference is
enough to matter.
Forget a year ago, because as far as bzr is concerned, they got much
faster (several times faster for common operations like
commit/branch/log/merge).
Post by Charles R Harris
but Linus was definitely focused on speed, which is easy to understand if
you look at the churn in the kernel. Anyway, I suspect that, technically,
both bzr and hg are suitable choices. I'm not sure esr correct that it is
unlikely that both are going to last long term, bazaar (the ancestor of bzr)
is used for Ubuntu. But the two are similar and fill the same niche, so I
expect that one or the other will become dominant in the wild. And hg seems
to have the advantage of a head start and not being as tightly tied to
Linux.
bzr is not tied to linux. They always have win32 binaries, TortoiseBzr
has a longer history than the mercurial one, and as said previously,
one developer of bzr at least is mainly a windows user. I don't want
to sound like I defend bzr, because honestly, I don't care about which
one is used, but so far, the arguments I heard against bzr do not
reflect my experience at all.
One thing that bzr tries hard is the general UI, and the explicit
support for several workflows (with moderately advanced concepts such
as shared repositories, bound branches: for example, with a branch A
bound to branch B, a commit is first pushed on branch B, and if
successfull, applied to A; for centralized worflows, this makes things
easier). I honestly do not know if this is significant. bzr claims its
merge capability is better: I do not know if this is true, or if that
matters at all.
I would rather discuss those than "bzr is tied to linux", because I
don't think they are based on accurate or recent informations. As I
said, I have bzr imports of scipy and scikits, and I could easily to
the same for hg, make them available for everybody to play with.
David
Charles R Harris
2008-01-04 21:36:21 UTC
Permalink
Post by Charles R Harris
Post by Charles R Harris
A quick google for benchmarks show that a year ago, hg was a bit faster and
generated smaller repositories than bzr, but I don't think the difference is
enough to matter.
Forget a year ago, because as far as bzr is concerned, they got much
faster (several times faster for common operations like
commit/branch/log/merge).
Sure, that's why I mentioned the time. Bzr used to claim better directory
renames than hg, but that is not the case since version 9.4. So on and so
forth. They are both moving targets.

Post by David Cournapeau
bzr is not tied to linux.
It is, in that development is funded by Canonical, but I haven't used either
on windows, so don't have any idea how they compare in that regard.
Post by David Cournapeau
They always have win32 binaries, TortoiseBzr
has a longer history than the mercurial one, and as said previously,
one developer of bzr at least is mainly a windows user. I don't want
to sound like I defend bzr, because honestly, I don't care about which
one is used, but so far, the arguments I heard against bzr do not
reflect my experience at all.
One thing that bzr tries hard is the general UI, and the explicit
support for several workflows (with moderately advanced concepts such
as shared repositories, bound branches: for example, with a branch A
bound to branch B, a commit is first pushed on branch B, and if
successfull, applied to A; for centralized worflows, this makes things
easier).
Hg has always recommended a similar process: clone the repository, push your
changes to the clone, fix what needs fixing, and commit. It's not an atomic
operation, though. I don't know where things are in that regard at the
moment.

Chuck
Robert Kern
2008-01-04 22:09:08 UTC
Permalink
Post by David Cournapeau
bzr is not tied to linux.
It is, in that development is funded by Canonical, but I haven't used
either on windows, so don't have any idea how they compare in that regard.
In that sense, it's all pretty much a wash between the three. Selenic initially
developed Mercurial in the aftermath of the Linux kernel Bitkeeper foofoorah,
and they continue to use it to manage their kernel modules.

If we want to talk about Windows support, we should stick to more concrete facts
(like the availability of Windows shell integration, etc.) instead of vague
inferences.
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
David Cournapeau
2008-01-04 22:16:04 UTC
Permalink
Post by Charles R Harris
Post by David Cournapeau
Post by Charles R Harris
A quick google for benchmarks show that a year ago, hg was a bit faster and
generated smaller repositories than bzr, but I don't think the difference is
enough to matter.
Forget a year ago, because as far as bzr is concerned, they got much
faster (several times faster for common operations like
commit/branch/log/merge).
Sure, that's why I mentioned the time. Bzr used to claim better directory
renames than hg, but that is not the case since version 9.4. So on and so
forth. They are both moving targets.
Yes, I agree on the moving targets.
Post by Charles R Harris
Post by David Cournapeau
bzr is not tied to linux.
It is, in that development is funded by Canonical, but I haven't used either
on windows, so don't have any idea how they compare in that regard.
Being funded by Canonical does not mean it is tied to linux. For
example, some people are working on handling case insensitive fs,
which is not something a tied-to-linux tool would care about. That's
not something we care about, though.

Both hg and bzr are developed mainly on linux, by linux developers
(typically, the fact that neither of them has an official, complete GUI
shows that they were not born on windows). bzr works as well on
windows as it does on linux (I use bzr on windows for the numscons
project, for example), but AFAIK, so does mercurial, so this is not an
important point.
Post by Charles R Harris
Hg has always recommended a similar process: clone the repository, push your
changes to the clone, fix what needs fixing, and commit. It's not an atomic
operation, though. I don't know where things are in that regard at the
moment.
The point is really about the one operation; otherwise, any DVCS can
do it. Honestly, I do not find it such a useful feature, but maybe it
shows when used by many different people ?

IMHO, the only really important points are how to convert the current
history, and trac integration. All other differences look quite minor
to me.

David
Charles R Harris
2008-01-04 21:02:16 UTC
Permalink
Post by Fernando Perez
Post by David Cournapeau
I understand the "sumpy uses it" reason, it is definitely a factor.
But I would rather have a more thorough study on the merits of each
system. For example, being a user of bzr for a year and a half now, I
think I have a pretty good idea on how it works, and its advantages.
We could then decide on a set of attributes to compare, and people who
knows about one tool could then tell about it.
Performances-wise, hg and bzr really are comparable nowadays for
common, local operations. I don't think it is a relevant parameter for
the hg vs bzr choice anymor, specially for scipy/numpy which are small
projects (I have bzr imports of scipy and scikits, so I can give some
numbers if you need them). Third party tools, special abilities (svn
import, storage efficiency, special commands, etc...) are more
important I think
Absolutely. That's why I said above "when the choice is a sound one
on technical merit alone". At the time (for sage/sympy) the bzr/hg
choice was unmistakably in favor of hg. Things might be different
today.
Sometimes it is the little niggling things that matter, in this case line
breaks. Hg (and probably bzr) stores everything as binary, so if someone
uses an editor that breaks lines with CR or CR+LF instead of the good ol'
unixy LF, there might be a lot of whitespace updates coming in to the
repository. I wonder if there is a way to put an automatic filter in place
for that sort of thing?
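
To make the question concrete, here is a minimal sketch of the normalization such a filter would have to perform. It is not an hg or bzr API, and the function names are made up for illustration. How a given tool would call something like this (a commit hook, an encode/decode filter) depends on the tool, and binary files would have to be excluded first.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR+LF and lone CR line endings to LF."""
    # Replace CR+LF first so the CR of a CR+LF pair is not converted twice.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def needs_normalizing(data: bytes) -> bool:
    """Report whether the data contains any CR byte at all."""
    return b"\r" in data


if __name__ == "__main__":
    sample = b"import numpy\r\nx = 1\ry = 2\n"
    if needs_normalizing(sample):
        print(normalize_newlines(sample))
```

A check like needs_normalizing could also be used to reject a commit instead of silently rewriting it, which avoids surprising people whose editor produced the CR bytes.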

Chuck
Post by Fernando Perez
http://thread.gmane.org/gmane.emacs.devel/85893
If they actually do the work of comparing tools, that work may be
useful for us. I'm pretty sure that any tool that can handle the
entire history of emacs can chew on numpy/scipy/ipython/matplotlib
*combined* for breakfast.
Cheers,
f
Giorgos Keramidas
2008-01-06 03:27:00 UTC
Permalink
Post by Fernando Perez
http://thread.gmane.org/gmane.emacs.devel/85893
If they actually do the work of comparing tools, that work may be
useful for us. I'm pretty sure that any tool that can handle the
entire history of emacs can chew on numpy/scipy/ipython/matplotlib
*combined* for breakfast.
The discussions on emacs-devel are interesting indeed. I've been
keeping my own local Emacs patches (to make it build on FreeBSD as a
port) in a Mercurial repository for a fair amount of time now, and it
seems to work "ok".

One interesting datapoint is that the entire history of the "HEAD"
branch of the repository of Emacs can fit in 141 MB, which is smaller
than a full checkout of Emacs plus object code after a build, and it
takes less than 2 seconds to look at the log of the first commit done in
CVS ever:

kobe % du -sk ~/tmp/emacs-src
187436 /home/keramida/tmp/emacs-src
kobe % pwd
/home/keramida/hg/emacs/gnu
kobe % du -sk .hg
141620 .hg
kobe % /usr/bin/time hg log -r0
changeset: 0:c67c006134ec
user: jimb
date: Thu Apr 18 00:48:29 1985 +0000
summary: entered into RCS

1.72 real 1.40 user 0.29 sys
kobe %

I'm sure that this is far from a "killer feature", but at least it shows
that there's no huge obstacle in using Hg to check-out and browse the
history of a repository the size of Emacs' CVS tree.

I don't know if Emacs will use Mercurial or some other DVCS, but it
should certainly be "do-able".

Just my two cents,
Giorgos
David M. Cooke
2008-01-05 19:08:46 UTC
Permalink
Post by Fernando Perez
My vote so far is for hg, for performance reasons but also partly
because sage and sympy already use it, two projects I'm likely to
interact a lot with and that are squarely in line with the
ipython/numpy/scipy/matplotlib world. Since they went first and made
the choice, I'm happy to let that be a factor in my decision. I'd
rather use a tool that others in the same community are also using,
especially when the choice is a sound one on technical merit alone.
Just my 1e-2...
+1 on mercurial. It's what I use these days (previously, I used darcs,
which I still like for its patch-handling semantics, but its
dependence on Haskell, and the dreaded exponential-time merge are a
bit of a pain).

One thing that can help is an official Mercurial mirror of the
subversion repository. IIRC, sharing changegroups or pulling patches
between hg repos requires that they have a common ancestor repo (as
opposed to two developers independently converting the svn repo).
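
A toy model of why independent conversions do not mix, under the assumption that a changeset id is a hash over the parent id plus the changeset's metadata and content. This is a simplification for illustration, not Mercurial's actual on-disk format, and the function names are made up.

```python
import hashlib

NULL_ID = "0" * 40


def changeset_id(parent_id, content, user, date):
    """Hash the parent id together with the metadata and content."""
    payload = "\n".join([parent_id, user, date, content]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def convert(revisions, user_map):
    """Convert a linear svn-like history, returning one id per revision."""
    parent = NULL_ID
    ids = []
    for rev in revisions:
        user = user_map.get(rev["author"], rev["author"])
        parent = changeset_id(parent, rev["content"], user, rev["date"])
        ids.append(parent)
    return ids


if __name__ == "__main__":
    # Hypothetical two-revision history.
    history = [
        {"author": "alice", "date": "2008-01-01", "content": "add setup.py"},
        {"author": "bob", "date": "2008-01-02", "content": "fix typo"},
    ]
    same_a = convert(history, {})
    same_b = convert(history, {})
    other = convert(history, {"alice": "Alice <alice@example.org>"})
    print(same_a == same_b)  # identical settings give identical ids
    print(same_a == other)   # a different author mapping changes every id
```

Because each id depends on its parent's id, a single difference in the first converted revision changes every later id, so the two repositories share no changesets as far as the tool can tell. An official mirror avoids this by giving everybody the same converted history to clone.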
--
|>|\/|<
/------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|***@physics.mcmaster.ca
Fernando Perez
2008-01-05 19:15:41 UTC
Permalink
Post by David M. Cooke
Post by Fernando Perez
My vote so far is for hg, for performance reasons but also partly
because sage and sympy already use it, two projects I'm likely to
interact a lot with and that are squarely in line with the
ipython/numpy/scipy/matplotlib world. Since they went first and made
the choice, I'm happy to let that be a factor in my decision. I'd
rather use a tool that others in the same community are also using,
especially when the choice is a sound one on technical merit alone.
Just my 1e-2...
+1 on mercurial. It's what I use these days (previously, I used darcs,
which I still like for its patch-handling semantics, but its
dependence on Haskell, and the dreaded exponential-time merge are a
bit of a pain).
Regarding the 'record' capabilities of darcs which were indeed very
nice, here's something that was recently mentioned on the sage list:

"""
I noticed that Mercurial 0.9.5 has a "record" extension that mimics the
darcs record functionality of interactively asking what changes you want
to commit out of a file. I know there was discussion of this a while ago.

Reference:

http://www.selenic.com/pipermail/mercurial/2007-October/015150.html
under the New extensions heading. See also
http://www.selenic.com/mercurial/wiki/index.cgi/RecordExtension

Anyways, I'm just posting this as an FYI. It might be nice to expose
this functionality to sage, if we haven't already.

Thanks,

Jason
"""

Cheers,

f
Ondrej Certik
2008-01-05 19:59:50 UTC
Permalink
Post by Fernando Perez
Post by David M. Cooke
Post by Fernando Perez
My vote so far is for hg, for performance reasons but also partly
because sage and sympy already use it, two projects I'm likely to
interact a lot with and that are squarely in line with the
ipython/numpy/scipy/matplotlib world. Since they went first and made
the choice, I'm happy to let that be a factor in my decision. I'd
rather use a tool that others in the same community are also using,
especially when the choice is a sound one on technical merit alone.
Just my 1e-2...
+1 on mercurial. It's what I use these days (previously, I used darcs,
which I still like for its patch-handling semantics, but its
dependence on Haskell, and the dreaded exponential-time merge are a
bit of a pain).
Regarding the 'record' capabilities of darcs which were indeed very
nice, here's something that was recently mentioned on the sage list:
"""
I noticed that Mercurial 0.9.5 has a "record" extension that mimics the
darcs record functionality of interactively asking what changes you want
to commit out of a file. I know there was discussion of this a while ago.
http://www.selenic.com/pipermail/mercurial/2007-October/015150.html
under the New extensions heading. See also
http://www.selenic.com/mercurial/wiki/index.cgi/RecordExtension
Anyways, I'm just posting this as an FYI. It might be nice to expose
this functionality to sage, if we haven't already.
Thanks,
Jason
"""
Kirill (a sympy developer) has also sent patches for qrecord (record
for mercurial queues)

http://www.selenic.com/pipermail/mercurial-devel/2007-December/003953.html

Ondrej
Travis E. Oliphant
2008-01-05 21:00:21 UTC
Permalink
Post by David M. Cooke
Post by Fernando Perez
My vote so far is for hg, for performance reasons but also partly
because sage and sympy already use it, two projects I'm likely to
interact a lot with and that are squarely in line with the
ipython/numpy/scipy/matplotlib world. Since they went first and made
the choice, I'm happy to let that be a factor in my decision. I'd
rather use a tool that others in the same community are also using,
especially when the choice is a sound one on technical merit alone.
Just my 1e-2...
+1 on mercurial. It's what I use these days (previously, I used darcs,
which I still like for its patch-handling semantics, but its
dependence on Haskell, and the dreaded exponential-time merge are a
bit of a pain).
I don't think it is time to move wholesale to something like Mercurial
or bzr. I would prefer it if all of the Enthought-hosted projects
moved to the (new) system at once, which is not going to happen in the
short term (but long term of course it's an open question).

But, having an official mirror is a useful thing to explore.

I suspect there are others with serious reservations about jumping off
of SVN just now (just when a lot of people have finally figured out how
to use it). If there is an Hg mirror that allows others to use mercurial
at the same time then that sounds like a good idea to me.


-Travis O.
Stefan van der Walt
2008-01-05 23:25:00 UTC
Permalink
Post by Travis E. Oliphant
I suspect there are others with serious reservations about jumping off
of SVN just now (just when a lot of people have finally figured out how
to use it).
I recall something you said to David last week, regarding merges with
SVN: that a person never knows how to do it until *after* they've done
it! We often make branches in scipy and numpy, and stand to gain a lot
from a distributed RCS.

Once a person knows how to use SVN, it doesn't take much effort at all
to learn bzr or hg (even the commands are often the same). The main
change is a mind-shift: that branches are now a lot friendlier, and
that they are accessible to everybody.

At the end of 2005, back when I was still working with Octave, we had
a discussion on the merits of switching over to Subversion. That
conversation never went anywhere, which is the reason you can still
obtain Octave today using

cvs -d :ext:***@www.octave.org:/cvs

I know there are reservations about doing the switch *right now*,
which is fine -- we must just not wait too long.

Regards
Stéfan
Fernando Perez
2008-01-05 23:47:38 UTC
Permalink
I'd like to briefly provide a different perspective on this question,
which is not a technical one but a more social/process one.

It seems to me (but I could be wrong; this is opinion, not research!)
that a DVCS encourages a more open participation model for newcomers.
Since anyone with a checkout has the same tree, there is no more 'us
vs. them' in the sense of 'developers vs users'. Yes, with SVN anyone
can track trunk or branches and submit a patch, but there's a distinct
asymmetry in the process that DVCS remove (obviously, even with a DVCS
model there will always be a canonical repository that is considered
official, and to which only a group with commit rights can push
changes).

In addition, DVCS allow more easily the creation of subgroups of
parallel developers who share their branches and explore ideas,
subprojects, optimizations, etc. With a DVCS, anyone can join such a
subgroup, contribute, and if that idea bears fruit, it's easy to fold
it back into the official trunk. SVN doesn't really lend itself well
at all to this type of approach, and I think it therefore tends to
lower the amount of intellectual exploration a project is likely to do
during its lifetime.

So I'd venture that a DVCS can benefit a project in the long run by
lowering the tunneling energy required to make the user->developer
transition. Given how users who make this transition are the life and
blood of any open source project, I'd argue that anything that helps
this is worth considering.

Obviously the above is not an argument for doing anything *now*, as
for many reasons now may not be the right time. But it is to me a
compelling argument for taking the step, leaving only the when and
which specific tool as decisions to be appropriately determined.

Of course, I could be fully wrong, since the above is little more than
common-sense-sounding speculation.

Cheers,

f
Bill Baxter
2008-01-05 23:55:00 UTC
Permalink
Post by Stefan van der Walt
I recall something you said to David last week, regarding merges with
SVN: that a person never knows how to do it until *after* you've done
it! We often make branches in scipy and numpy, and stand a lot to
gain from a distributed RCS.
Once a person knows how to use SVN, it doesn't take much effort at all
to learn bzr or hg (even the commands are often the same). The main
change is a mind-shift: that branches are now a lot friendlier, and
that they are accessable to everybody.
I understand that DVCS's do merging better. But what I don't really
understand is why this is an inherent advantage of DVCS. Isn't it
just a matter of the current crop of DVCS's implementing a better
merge algorithm than SVN? The SVN guys seem to be competent, so if
you just give them time surely they will eventually incorporate these
better merging algorithms into SVN. Who wouldn't want better merging?

--bb
Ondrej Certik
2008-01-06 00:02:40 UTC
Permalink
Post by Bill Baxter
Post by Stefan van der Walt
I recall something you said to David last week, regarding merges with
SVN: that a person never knows how to do it until *after* you've done
it! We often make branches in scipy and numpy, and stand a lot to
gain from a distributed RCS.
Once a person knows how to use SVN, it doesn't take much effort at all
to learn bzr or hg (even the commands are often the same). The main
change is a mind-shift: that branches are now a lot friendlier, and
that they are accessable to everybody.
I understand that DVCS's do merging better. But what I don't really
understand is why this is an inherent advantage of DVCS. Isnt it
just a matter of the current crop of DVCS's implementing a better
merge algorithm than SVN? The SVN guys seem to be competent, so if
you just give them time surely they will eventually incorporate these
better merging algorithms into SVN. Who wouldn't want better merging?
It's not just about merging. But anyway, all the arguments have already been
made in this thread. I fully agree with both Davids and Fernando.

So let's set up an official mercurial mirror that will automatically download
all svn commits.

That way, we can easily work with Mercurial, clone the repos, browse history,
everything. Review patches. And then, when the patches are reviewed, instead
of pushing them to the Mercurial repo, they will be committed using svn.
No big deal, everyone is happy.

We did it too in sympy, at the beginning, because we
were afraid of switching (it was mainly me, who was afraid, because
I am very conservative, that's why I use Debian:). But then, once you try
DVCS, you never want to come back.

Ondrej
Robert Kern
2008-01-06 00:56:21 UTC
Permalink
Post by Bill Baxter
Post by Stefan van der Walt
I recall something you said to David last week, regarding merges with
SVN: that a person never knows how to do it until *after* you've done
it! We often make branches in scipy and numpy, and stand a lot to
gain from a distributed RCS.
Once a person knows how to use SVN, it doesn't take much effort at all
to learn bzr or hg (even the commands are often the same). The main
change is a mind-shift: that branches are now a lot friendlier, and
that they are accessable to everybody.
I understand that DVCS's do merging better. But what I don't really
understand is why this is an inherent advantage of DVCS. Isnt it
just a matter of the current crop of DVCS's implementing a better
merge algorithm than SVN?
No, it's not the algorithm itself. It's the information that the VCS tracks
about files and revisions.
Post by Bill Baxter
The SVN guys seem to be competent, so if
you just give them time surely they will eventually incorporate these
better merging algorithms into SVN. Who wouldn't want better merging?
Yes, such support is on its way in 1.5. Unfortunately, that release is most
likely years away.
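
To illustrate the point about tracked information, here is a toy model in which each revision records its parents, so a merge commit has two. The revision names and helper functions are made up; real tools store much more than this. With the parent links recorded, the tool can work out by itself which revisions of a branch have not been merged yet, whereas with svn today the user has to remember and supply the revision range by hand.

```python
def ancestors(parents, rev):
    """Return rev plus every revision reachable through parent links."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def unmerged(parents, source, target):
    """Revisions in the history of source that target does not have yet."""
    return ancestors(parents, source) - ancestors(parents, target)


if __name__ == "__main__":
    # Hypothetical history: a trunk (t*), a branch (b*), and one merge (m1).
    history = {
        "t1": [],
        "t2": ["t1"],
        "b1": ["t1"],
        "b2": ["b1"],
        "m1": ["t2", "b2"],  # first merge of the branch into trunk
        "b3": ["b2"],        # the branch keeps moving after the merge
    }
    print(sorted(unmerged(history, "b2", "t2")))  # ['b1', 'b2']
    print(sorted(unmerged(history, "b3", "m1")))  # ['b3']
```

The second merge only has to consider b3, because m1 already records b2 as a parent. That is why merging the same branch repeatedly works without manual bookkeeping.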
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
David Cournapeau
2008-01-06 07:20:50 UTC
Permalink
Post by Travis E. Oliphant
Post by David M. Cooke
Post by Fernando Perez
My vote so far is for hg, for performance reasons but also partly
because sage and sympy already use it, two projects I'm likely to
interact a lot with and that are squarely in line with the
ipython/numpy/scipy/matplotlib world. Since they went first and made
the choice, I'm happy to let that be a factor in my decision. I'd
rather use a tool that others in the same community are also using,
especially when the choice is a sound one on technical merit alone.
Just my 1e-2...
+1 on mercurial. It's what I use these days (previously, I used darcs,
which I still like for its patch-handling semantics, but its
dependence on Haskell, and the dreaded exponential-time merge are a
bit of a pain).
I don't think it is time to move wholesale to something like Mercurial
or bzr. I would prefer it if all of the Enthought-hosted projects
moved to the (new) system at once, which is not going to happen in the
short term (but long term of course it's an open question).
How would you define short term and long term ? I understand the need to
plan things carefully, and this takes time. But I am not sure I
understand why we should not move now: what will change in the
future that will make the change better then than now ? I thought about
targeting the change for 1.1, because it gives one precise time target,
and gives us around 6 months, but this can be later if it takes more time.

I have not thought about all the details, but I had something like the
following plan in mind:
- first, importing the different trunks into the different contenders
- making read-only mirrors of those imports so that people can try them out
- making tutorials so that people who do not know the tools
can start in a few minutes
- getting a clear idea of the differences between the tools, and making
a list of the different requirements (speed, availability on platforms,
third-party tools, GUI, etc.).

It seems mercurial is the tool of choice for almost everybody; I don't
have any problem with that, except that I would like to see more
arguments than just "I am using mercurial and it works" (the fact that
most contributors know mercurial certainly is an argument in favor of
it, though). Having a list of requirements and how each tool fulfills
each of them would be helpful, no? I am certainly willing to do all the
above for bzr, and it seems doing it for mercurial won't be any problem
since so many people already know it.
Post by Travis E. Oliphant
But, having an official mirror is a useful thing to explore.
I suspect there are others with serious reservations about jumping off
of SVN just now (just when a lot of people have finally figured out how
to use it).
Using bzr is easier than svn IMHO, and knowing bzr, I knew how to use
mercurial for basic things in 5 minutes (by basic I mean checking out
code, branching, committing and getting patches):

bzr co http://bzr.scipy.org/bzr/numpy -> get the code
bzr st -> see the changes
bzr diff -> get a patch
bzr ci -> commit

Basically, you have to change svn to bzr :) And this is really similar
in mercurial. Things like browsing the history or merging are a bit
more complicated because there is no notion of a global revision number
anymore, but this won't matter to most people.
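A small sketch of the "change svn to bzr" point, as a lookup table. The bzr column is taken from the commands listed above; the svn and hg columns are an assumption about the closest standard subcommand in each tool, and the helper name command_for is made up for illustration.

```python
# Rough equivalents for the everyday tasks listed above.
# The bzr entries come from the post; the svn and hg entries are
# assumed to be the closest standard subcommand in each tool.
EQUIVALENTS = {
    "get the code":    {"svn": "checkout", "bzr": "co",   "hg": "clone"},
    "see the changes": {"svn": "status",   "bzr": "st",   "hg": "status"},
    "get a patch":     {"svn": "diff",     "bzr": "diff", "hg": "diff"},
    "commit":          {"svn": "commit",   "bzr": "ci",   "hg": "commit"},
}


def command_for(tool, task, target=""):
    """Build the command line a user of `tool` would type for `task`."""
    subcommand = EQUIVALENTS[task][tool]
    # Drop the target when it is empty so no trailing space is produced.
    return " ".join(part for part in (tool, subcommand, target) if part)


print(command_for("bzr", "get the code", "http://bzr.scipy.org/bzr/numpy"))
print(command_for("hg", "see the changes"))
```

The table only covers the basic tasks named in the post; history browsing and merging differ more between the tools and are left out on purpose.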
Post by Travis E. Oliphant
If there is an Hg mirror that allows other to use mercurial
at the same time then that sounds like a good idea to me.
Is it possible to get the svn dump for people willing to do the import
(doing it from the dump is not the only way, but is certainly the
fastest)? I can do the import for bzr; I can do it in mercurial too if
nobody else jumps in, but since I am less familiar with mercurial, it
would be better for someone else to do it.

cheers,

David
Robert Kern
2008-01-06 08:05:25 UTC
Permalink
Post by Travis E. Oliphant
I don't think it is time to move wholesale to something like Mercurial
or bzr. I would prefer it if all of the Enthought-hosted projects
moved to the (new) system at once, which is not going to happen in the
short term (but long term of course it's an open question).
I think that's irrelevant. There is absolutely no reason that we should force
all of the Enthought-hosted projects to move in sync. We would have reason if we
were being asked to host a different centralized VCS with a complicated server,
but hosting Mercurial or bzr is nearly trivial. We already do it for me:

http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi

The remaining thing we would have to support is the Trac integration. While not
as trivial as simply hosting the repositories, it's not a very large commitment.
Bill Baxter
2008-01-06 09:34:25 UTC
Permalink
http://www.selenic.com/mercurial/wiki/index.cgi/MergeProgram

This is a bit puzzling. I understand better merging isn't the only
reason to choose DVCS, but the above page basically says that
Mercurial just uses whatever external merge program it can find. So
the file-level merging sounds like it must really be no different from
other VCSs.

So it is really just proper merging of directory renames and the like
that makes it superior?

--bb
Robert Kern
2008-01-06 09:38:44 UTC
Permalink
Post by Bill Baxter
http://www.selenic.com/mercurial/wiki/index.cgi/MergeProgram
This is a bit puzzling. I understand better merging isn't the only
reason to choose DVCS, but the above page basically says that
Mercurial just uses whatever external merge program it can find. So
the file-level merging sounds like it must really be no different from
other VCSs.
So it is really just proper merging of directory renames and the like
that make it superior?
No. If you'll pardon my repeating myself:

"""
DVCSes need to keep track of more information in order to be
distributed. That information is extremely useful for managing merges properly.
Centralized systems could track this information, but they don't *need* to in
order to be functional, so they mostly haven't, yet.

For each revision, the DVCS knows what revisions it derives from. SVN does not
keep this information. SVN primarily just knows the text diffs from revision to
revision. In particular, if I have a long-lived branch, I am going to merge in
changes from the trunk while I'm working on it. When I go to merge the branch
back into the trunk, I need to know which trunk-revisions I've already merged
into the branch. SVN does not track this information. Tools like svnmerge.py
track some of this information at the expense of some added clumsiness.
"""
--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
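The bookkeeping described in the quoted explanation can be sketched as a graph of revisions that each record their parents. This is only an illustration under made-up assumptions: the revision names (t1..t4 for the trunk, b1..b3 for a branch) and the helpers ancestors and missing are hypothetical, and no real tool is claimed to store its history this way.

```python
# Hypothetical history: each revision maps to the revisions it derives from.
PARENTS = {
    "t1": [],
    "t2": ["t1"],
    "t3": ["t2"],
    "t4": ["t3"],
    "b1": ["t1"],        # branch created from trunk revision t1
    "b2": ["b1", "t2"],  # trunk merged into the branch, up to t2
    "b3": ["b2"],
}


def ancestors(rev, parents=PARENTS):
    """Return rev plus every revision reachable through parent links."""
    seen = set()
    stack = [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def missing(source, target, parents=PARENTS):
    """Revisions in source's history that target has not incorporated."""
    return sorted(ancestors(source, parents) - ancestors(target, parents))


print(missing("t4", "b3"))  # ['t3', 't4']: trunk work the branch still lacks
print(missing("b3", "t4"))  # ['b1', 'b2', 'b3']: branch work the trunk lacks
```

Because b2 records t2 as a parent, the question "which trunk revisions have I already merged into the branch" is answered by a set difference, with no separate record kept by hand.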
David Cournapeau
2008-01-06 09:38:56 UTC
Permalink
Post by Robert Kern
Post by Bill Baxter
http://www.selenic.com/mercurial/wiki/index.cgi/MergeProgram
This is a bit puzzling. I understand better merging isn't the only
reason to choose DVCS, but the above page basically says that
Mercurial just uses whatever external merge program it can find. So
the file-level merging sounds like it must really be no different from
other VCSs.
So it is really just proper merging of directory renames and the like
that make it superior?
"""
DVCSes need to keep track of more information in order to be
distributed. That information is extremely useful for managing merges properly.
Centralized systems could track this information, but they don't *need* to in
order to be functional, so they mostly haven't, yet.
For each revision, the DVCS knows what revisions it derives from. SVN does not
keep this information. SVN primarily just knows the text diffs from revision to
revision. In particular, if I have a long-lived branch, I am going to merge in
changes from the trunk while I'm working on it. When I go to merge the branch
back into the trunk, I need to know which trunk-revisions I've already merged
into the branch. SVN does not track this information. Tools like svnmerge.py
track some of this information at the expense of some added clumsiness.
"""
Does good merging depend only on the above? Martin Pool, one of the
bzr programmers, wrote this article two years ago:

http://sourcefrog.net/weblog/software/vc/derivatives.html

which I found both enlightening and easy to follow.

cheers,

David
Robert Kern
2008-01-06 10:09:17 UTC
Permalink
Post by David Cournapeau
Does good merging depend only on the above? Martin Pool, one of the
bzr programmers, wrote this article two years ago:
http://sourcefrog.net/weblog/software/vc/derivatives.html
which I found both enlightening and easy to follow.
My terminology was fuzzy/incorrect. By "revision," I meant "changeset" rather
than "snapshot."

The main thrust of that article is about the value of viewing VCS history as a
sequence of changesets rather than a sequence of snapshots. FWIW, svnmerge and
SVN 1.5 Merge Tracking are changeset-oriented rather than snapshot-oriented.
Bill Baxter
2008-01-06 11:03:49 UTC
Permalink
Post by Robert Kern
Post by Bill Baxter
http://www.selenic.com/mercurial/wiki/index.cgi/MergeProgram
This is a bit puzzling. I understand better merging isn't the only
reason to choose DVCS, but the above page basically says that
Mercurial just uses whatever external merge program it can find. So
the file-level merging sounds like it must really be no different from
other VCSs.
So it is really just proper merging of directory renames and the like
that make it superior?
"""
DVCSes need to keep track of more information in order to be
distributed. That information is extremely useful for managing merges properly.
Centralized systems could track this information, but they don't *need* to in
order to be functional, so they mostly haven't, yet.
For each revision, the DVCS knows what revisions it derives from. SVN does not
keep this information. SVN primarily just knows the text diffs from revision to
revision. In particular, if I have a long-lived branch, I am going to merge in
changes from the trunk while I'm working on it. When I go to merge the branch
back into the trunk, I need to know which trunk-revisions I've already merged
into the branch. SVN does not track this information. Tools like svnmerge.py
track some of this information at the expense of some added clumsiness.
"""
Ok. Sorry for not reading that more closely. So what you're saying is that
the magic is in deciding exactly which revisions of which files
to run the merge program on?

--bb
Robert Kern
2008-01-06 11:10:11 UTC
Permalink
Post by Bill Baxter
Post by Robert Kern
Post by Bill Baxter
http://www.selenic.com/mercurial/wiki/index.cgi/MergeProgram
This is a bit puzzling. I understand better merging isn't the only
reason to choose DVCS, but the above page basically says that
Mercurial just uses whatever external merge program it can find. So
the file-level merging sounds like it must really be no different from
other VCSs.
So it is really just proper merging of directory renames and the like
that make it superior?
"""
DVCSes need to keep track of more information in order to be
distributed. That information is extremely useful for managing merges properly.
Centralized systems could track this information, but they don't *need* to in
order to be functional, so they mostly haven't, yet.
For each revision, the DVCS knows what revisions it derives from. SVN does not
keep this information. SVN primarily just knows the text diffs from revision to
revision. In particular, if I have a long-lived branch, I am going to merge in
changes from the trunk while I'm working on it. When I go to merge the branch
back into the trunk, I need to know which trunk-revisions I've already merged
into the branch. SVN does not track this information. Tools like svnmerge.py
track some of this information at the expense of some added clumsiness.
"""
Ok. Sorry for not reading that closer. So what you're saying is that
the magic is in the deciding of exactly which revisions of which files
to run the merge program on?
That's the main idea.
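A minimal sketch of that idea, under stated assumptions: the history, the file contents, and the helpers merge_base, merge_file and merge are all hypothetical, the "merge program" is reduced to a whole-file rule, and criss-cross histories are ignored. It only illustrates why the choice of base revision matters, not how any particular tool implements it.

```python
# Hypothetical history (revision -> parents) and file contents per revision.
PARENTS = {
    "t1": [], "t2": ["t1"], "t3": ["t2"], "t4": ["t3"],
    "b1": ["t1"], "b2": ["b1", "t2"], "b3": ["b2"],
}
SNAPSHOTS = {
    "t1": {"a.py": "v0", "b.py": "v1"},
    "t2": {"a.py": "v1", "b.py": "v1"},
    "t4": {"a.py": "v2", "b.py": "v1"},  # trunk changed a.py again
    "b3": {"a.py": "v1", "b.py": "v2"},  # branch itself only changed b.py
}


def ancestors(rev):
    """Return rev plus every revision reachable through parent links."""
    seen, stack = set(), [rev]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def merge_base(a, b):
    """Pick the common ancestor with the longest history."""
    common = sorted(ancestors(a) & ancestors(b))
    return max(common, key=lambda rev: len(ancestors(rev)))


def merge_file(base, ours, theirs):
    """Stand-in for the external merge program, at whole-file granularity."""
    if ours == theirs or theirs == base:
        return ours
    if ours == base:
        return theirs
    return "CONFLICT"  # both sides differ from the base and from each other


def merge(ours, theirs, base):
    """Merge every file of two revisions relative to the given base."""
    return {
        name: merge_file(SNAPSHOTS[base][name],
                         SNAPSHOTS[ours][name],
                         SNAPSHOTS[theirs][name])
        for name in SNAPSHOTS[ours]
    }


base = merge_base("b3", "t4")
print(base)                       # t2, because b2 already merged t2
print(merge("b3", "t4", base))    # {'a.py': 'v2', 'b.py': 'v2'}
print(merge("b3", "t4", "t1"))    # {'a.py': 'CONFLICT', 'b.py': 'v2'}
```

With the base taken from the recorded ancestry (t2), both files merge cleanly. With the stale base t1, the same inputs produce a conflict on a.py, a file the branch never edited directly.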
Travis E. Oliphant
2008-01-06 19:41:06 UTC
Permalink
Post by Robert Kern
Post by Travis E. Oliphant
I don't think it is time to move wholesale to something like Mercurial
or bzr. I would prefer it if all of the Enthought-hosted projects
moved to the (new) system at once, which is not going to happen in the
short term (but long term of course it's an open question).
I think that's irrelevant. There is absolutely no reason that we should force
all of the Enthought-hosted projects to move in sync.
It is relevant, because I have my hands in all of them. I don't want to
have to keep up with too many different systems at once. Perhaps it's
selfish and immaterial to others, but it is relevant to me.
Post by Robert Kern
We would have reason if we
were being asked to host a different centralized VCS with a complicated server,
but hosting Mercurial or bzr is nearly trivial. We already do it for me:
http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi
The remaining thing we would have to support is the Trac integration. While not
as trivial as simply hosting the repositories, it's not a very large commitment.
Trac integration is exactly what I'm thinking about. If it is easy,
then it is not a big deal, but I have not seen arguments or evidence
that it is the case (or suggestions of something to use other than Trac
-- which is also a bigger deal for moving everything at once).

-Travis O.
David Cournapeau
2008-01-07 12:59:14 UTC
Permalink
Post by Travis E. Oliphant
Post by Robert Kern
Post by Travis E. Oliphant
I don't think it is time to move wholesale to something like Mercurial
or bzr. I would prefer it if all of the Enthought-hosted projects
moved to the (new) system at once, which is not going to happen in the
short term (but long term of course it's an open question).
I think that's irrelevant. There is absolutely no reason that we should force
all of the Enthought-hosted projects to move in sync.
It is relevant, because I have my hands in all of them. I don't want to
have to keep up with too many different systems at once. Perhaps it's
selfish and immaterial to others, but it is relevant to me.
I can understand the reluctance, but neither mercurial nor bzr is
difficult to use. The time required to learn one of them would
quickly become negligible compared to the time gained.
Post by Travis E. Oliphant
Post by Robert Kern
We would have reason if we
were being asked to host a different centralized VCS with a complicated server,
but hosting Mercurial or bzr is nearly trivial. We already do it for me:
http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi
The remaining thing we would have to support is the Trac integration. While not
as trivial as simply hosting the repositories, it's not a very large commitment.
Trac integration is exactly what I'm thinking about. If it is easy,
then it is not a big deal, but I have not seen arguments or evidence
that it is the case (or suggestions of something to use other than Trac
-- which is also a bigger deal for moving everything at once).
There are trac plugins for both bzr and mercurial. For bzr, I was
pointed to the following project, which uses trac (0.10.4) with the trac+bzr plugin:

http://bugs.bitlbee.org/bitlbee/browser

You seemed to be open to the idea of a mirror: what about a
mercurial/bzr mirror of numpy and scipy, plus some kind of trac mirror (I
don't think it would be a problem to have a mirror to see the
branches; having a read-only ticket viewer may be more difficult)? It
is just a matter of having one or two volunteers plus some help from people
who have access to the servers (to get the svn dump and so on).

cheers,

David

Stefan van der Walt
2008-01-04 19:03:15 UTC
Permalink
Post by Ondrej Certik
Charles actually said another point in favor of Mercurial - it works
on Windows (at least people say so), while git not that much (at least
people say so). I never use Windows myself, so I don't know.
Note that bzr also runs under Windows, and is also written in
Python+C. Here is the URL I referred to this
afternoon on IRC, regarding the diff-algorithm:

http://bramcohen.livejournal.com/37690.html

Regards
Stéfan
Russell E. Owen
2008-01-04 19:51:11 UTC
Permalink
Post by David Cournapeau
Post by Charles R Harris
I like Mercurial and use it a lot, but I'm not convinced we have enough
developers and code to justify the pain of changing the VCS at this time.
I don't understand the number of developers argument: on most of the
projects I am working on, I am the only developer, and I much prefer
bzr to svn, although for reasons which are not really relevant to a
project like numpy/scipy.
Post by Charles R Harris
SVN generally works well and has good support on Windows through tortoise.
That's where I don't agree: I don't think svn works really well. As
long as you use it as a history backup, it works ok, but that's it.
The non-functional merge makes branching almost useless, reverting
back in time is extremely cumbersome,
I am a bit puzzled by the vitriol about merging with svn. svn's built-in
merge is a joke but svnmerge.py works reasonably well (especially newer
versions of svnmerge.py; I use rev 26317 and the version included in the
current svn 1.4.6 should be even more recent).

I agree that reverting a file to an older version is clumsy using svn.

-- Russell
David Cournapeau
2008-01-04 20:45:53 UTC
Permalink
Post by Russell E. Owen
I am a bit puzzled by the vitriol about merging with svn. svn's built-in
merge is a joke but svnmerge.py works reasonably well (especially newer
versions of svnmerge.py; I use rev 26317 and the version included in the
current svn 1.4.6 should be even more recent)
I don't know which version I used. But svnmerge is not that practical:
you have to say explicitly which repository you want to track,
and the merge almost always fails. Several times, I had conflicts with
files I did not touch, which does not make any sense to me. Merging a
file modified in my branch and in another branch almost always gave me
a conflict. In bzr (and this is really similar in hg at least), when
you merge something, you just say bzr merge SOURCE, and that's it. No
svnsync init, no svnmerge avail / svnmerge merge cycles, etc.

David
Jarrod Millman
2008-01-04 18:56:12 UTC
Permalink
In general I think that this is a good direction to go in. My general
preference would be to use git or mercurial.

I haven't had time to read the entire thread, but since I won't get a
chance to catch up on this thread until much later today -- here are
my concerns:
1. We use as vanilla a version of Trac as possible. In particular,
we should avoid using experimental plugins.
2. Before making any major changes on the scipy.org server we first
get the server upgraded and cleaned up.

Thanks,
--
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
David Cournapeau
2008-01-04 19:04:54 UTC
Permalink
Post by Jarrod Millman
In general I think that this is a good direction to go in. My general
preference would be to use git or mercurial.
I haven't had time to read the entire thread, but since I won't get a
chance to catch up on this thread until much later today -- here are
my concerns:
1. We use as vanilla a version of Trac as possible. In particular,
we should avoid using experimental plugins.
2. Before making any major changes on the scipy.org server we first
get the server upgraded and cleaned up.
I personally think that this is a change which should be carefully
planned. Typically, I would not think about doing it before, say, the
1.1 release of numpy (that is, having several months at least to test
things and let people get used to it).

My email should really be understood as a proposal for a roadmap, and
how to proceed (having mirrors in bzr, hg, etc., testing with an
experimental trac which would run in parallel with the main one,
etc.).

cheers,

David
Ivan Vilata i Balaguer
2008-01-04 20:29:54 UTC
Permalink
Post by David Cournapeau
[...]
Integration with trac is the real problem, I think. According to one bzr
developer, the trac model (0.10, the last released one) is really based
around the subversion notion of a repository, which does not fit well with
mercurial and bzr. I don't know if this is true for the not yet released
0.11. If bzr is considered a possible candidate, I can get more
information from bzr developers.
[...]
My main concern about Trac and bzr integration is that (please correct
me if I'm wrong or outdated) Trac doesn't seem to support multiple
bzr branches, even if they are under the same shared repository. This
would limit Trac to following only one bzr branch, say the trunk
(leaving tags and branches out of its control). If this was to be
avoided by creating a single bzr branch with all branches and tags, we
may be facing a size problem, since bzr lacks cheap copying operations
right now (but support is planned). Anyway, this last approach isn't
very appropriate with a DVCS.

(Multiple bzr branch support in Trac would be really wonderful!)

::

Ivan Vilata i Balaguer >qo< http://www.carabos.com/
Cárabos Coop. V. V V Enjoy Data
""