David Cournapeau
2008-01-04 11:24:04 UTC
Hi,
First things first, happy new year to all!
Having recently felt the pain of using subversion merge, I was
wondering how people feel about moving away from subversion and
using a better system, such as mercurial or bzr (I will talk about bzr
because that's the one I know best, but this discussion is really
about using something better than subversion, not so much about bzr).
I think this could be an important step forward, and it is somewhat
related to the discussions on scikits and co.
As some of you are certainly aware, there has been a recent trend
towards so-called Distributed Version Control Systems (DVCS). I won't go
into the details, because they vary from system to system, and I am in
no position to explain the technical details. But for people who are
wondering, here is a short description of DVCS, and why I think they can
be a significant step forward for numpy/scipy. You can skip it if you
already know about them.
What is a DVCS
==============
A DVCS, contrary to centralized systems like CVS or SVN, has no
technical concept of a central repository which everybody pulls changes
from and pushes changes to. Instead, DVCS are centered around the branch
concept: each branch contains a local copy of the history. As a
consequence:
1. You can do most traditional svn/cvs operations locally, while
disconnected from the network (getting the log, getting annotations,
committing, branching, merging from other branches).
2. Because the branch is local, no access rights are needed: anybody can
jump in and commit to a local branch. Of course, integration into an
official numpy branch would need some special approval.
Also, this has the following consequence: since branching/merging is
such a key point of DVCS, merging actually works with DVCS. In
particular, merging the same changes several times works, and you
certainly do not have to go through the whole svn madness of tracking
versions.
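To illustrate why merging the same changes twice is harmless, here is a
minimal sketch in Python. It is only a toy model, not how bzr or any
other DVCS is actually implemented: a branch is assumed to be a plain
set of revision ids, and the merge() function and the revision names
are made up for the example.

```python
# Toy model (an assumption for illustration, not bzr's real algorithm):
# a branch is a set of revision ids, and a merge brings over only the
# revisions that the target branch does not already have.

def merge(target, source):
    """Merge source into target; return (new_target, newly_added)."""
    missing = source - target
    return target | missing, missing

trunk = {"r1", "r2"}
feature = trunk | {"f1", "f2"}   # feature branch started from trunk

# First merge: the two feature revisions are brought into trunk.
trunk, added = merge(trunk, feature)
first_added = sorted(added)

# Second merge of the same branch: nothing is left to bring over.
trunk, added = merge(trunk, feature)
second_added = sorted(added)

print(first_added)   # ['f1', 'f2']
print(second_added)  # []
```

Because each branch carries its own history, the tool can work out by
itself what has already been merged, instead of asking you to remember
revision ranges.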
For more information, here are some links which go much deeper:
- some discussion from Keith Packard, the maintainer of X.org:
http://keithp.com/blogs/Tyrannical_SCM_selection/,
http://keithp.com/blogs/Repository_Formats_Matter/
- Linus Torvalds on the advantages of git, the DVCS he wrote for
Linux development, versus svn for KDE (long, but it makes all the
points really clearly):
http://lists.kde.org/?l=kde-core-devel&m=118764715705846&w=2
Why use a DVCS?
===============
Some people argue that DVCS are intrinsically more complicated, which is
something I really don't understand. I've been programming 'seriously'
for only about 2-3 years, and I find bzr much easier to use and set up
than subversion; the key point, I think, is that I started using DVCS
before centralized ones. Some things which are utterly complicated with
subversion are trivial with bzr: merging, and going back in history
(say you are at rev 150, you realize that everything since rev 140 is
rubbish, and you want to go back: this is extremely tedious to do with
subversion). Basically, most of the things which are the reasons why we
use a VCS in the first place are easier with a DVCS than with a
centralized VCS (at least as far as svn is concerned). Also:
- For a casual user who wants to use the latest development version
instead of a release, getting it from a bzr repository, a git
repository, a mercurial repository or a svn repository is extremely
similar. It is one step in all cases.
- For casual developers: being able to use branches means that they can
implement their new features in a change-set oriented way instead of
one big patch. Also, bzr enables things like uncommit if you made a
mistake and want to go back. More generally, going back in history is
much easier.
- For core developers: I personally find the ability to use a branch for
each new feature to be extremely useful. It makes me feel much safer
when I do something. I am not afraid of doing something totally stupid
which may end up screwing things up for other people.
And finally, I find the ability to do things locally to be really
pleasant, and it enables workflows not really possible with systems such
as SVN. In particular, I work at three distant places every week, and
the ability to work while in transit, together with the trivial
synchronization between computers, is definitely helpful. Instantaneous
log and annotations are also really useful IMHO.
Which DVCS?
===========
The three which keep coming up are:
- git (the one used for Linux kernel development). That's the one I
know the least (only from a user point of view, never used it for
development). It is supposed to be more powerful but more complicated
than the others. It is also known to be really fast (the kernel is not
a small codebase for sure).
- mercurial: started at the same time as git. It is written in Python
except for a few parts written in C. It is reasonably fast, and has
recently been selected for some big projects, in particular by Sun
(OpenJDK, OpenSolaris, NetBeans).
- bzr: also written in Python. Sponsored by Canonical, the company
behind Ubuntu. It has just reached version 1.0. The focus is on the
UI; it handles renaming really well. It has a vibrant community, with
dedicated developers working on it. It has the reputation of being slow,
which was somewhat true previously, but in my experience it is on par
with mercurial, at least for local operations. Anyway, speed is not a
problem for numpy or scipy, which are small codebases (a few thousand
files, a few thousand revisions).
Problems:
=========
Assuming people think it is worth trying out, I mainly see two problems:
- importing the current history
- integration with trac
For bzr, I can say that the bzr-svn plugin works really well; in
particular, it can import the numpy and scipy repositories with the
whole history. I am using it regularly as a proxy between local bzr
branches and the scipy and scikits trunk. Incidentally, this makes it
possible for me to give numbers, if numbers are needed, on bzr's speed,
repository size, etc.
For mercurial, I tried one import method once, which did not get very
far, but I did not try really hard; anyway, I think people at Enthought
use mercurial a lot, so they would know better.
Integration with trac is the real problem, I think. According to one bzr
developer, trac's model (0.10, the latest release) is really based
around subversion's notion of a repository, which does not fit well with
mercurial and bzr. I don't know if this is still true for the not yet
released 0.11. If bzr is considered a possible candidate, I can get more
information from the bzr developers. What is the experience of Enthought
developers with trac?
This email is already getting pretty long, so to conclude, I think a
DVCS would be helpful for the future development of numpy/scipy. I
believe it would both enable easier participation from different people
and allow safer development schedules, etc. What do other people think?
Would it be worthwhile to discuss the issues further, and how to resolve
them?
cheers,
David
P.S.: I would be willing to take care of the bzr side of things:
trying the conversion, setting up experimental repositories for trial,
and asking the bzr community for advice.