Subversion, decentralized version control, and the future.

Discussion:

Karl Fogel

2007-06-28 23:56:46 UTC

I've been wanting to post this for a while, but was waiting for the
dust from Linus Torvalds's GIT talk to settle first. Eric Raymond's
recent post thanking the Subversion team gives me the excuse I needed
to finally sit down and write this :-). (Eric's post is at
http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=128106.)

In his talk, Torvalds explained why he thinks decentralized version
control systems (like GIT and Mercurial) are the way of the future,
and why he thinks Subversion got it all wrong. I think that's a
misanalysis, and will describe why below. Unfortunately, Torvalds
also indulged in a childish presentation style that distracted from
his useful technical criticisms of Subversion. Since I'd like to use
some of his arguments as a jumping-off point for thoughts on
Subversion's future, here they are in brief:

* Optimizing merging is as important as optimizing branching (if
not more so).

* Speed matters: when a common operation goes from thirty seconds
to half a second, that changes the whole way you work.

* Having all history locally (or at least as much history as you
need for a given operation) is useful.

* Reducing the thickness of the "commit access" wall is good for
development. Torvalds didn't make this argument terribly well,
so I'll try to restate what I think was his point:

The important question is, who can put changes into the
repository that the project is publishing releases from? This
should not be confused with commit access in the technical sense.
Instead, think of it this way: committing is just a way to
connect changes to other changes, and you shouldn't need my trust
in order to connect your changes to anything you want to connect
them to. The real question is, when and how do I include your
changes in my release? So the issue isn't commit access, it's
having trust networks and convenient methods of change selection.
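
The distinction in that last point, between recording a change and
choosing it for a release, can be sketched in a few lines of Python.
This is a toy model under loud assumptions: it is not how GIT,
Mercurial, or Subversion store anything, and the names Change, commit,
and select_for_release are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A change that records its author and the changes it builds on."""
    ident: str
    author: str
    parents: tuple = ()


def commit(store, ident, author, parents=()):
    """Connect a new change to existing ones. No trust check happens here."""
    missing = [p for p in parents if p not in store]
    if missing:
        raise KeyError(f"unknown parent(s): {missing}")
    store[ident] = Change(ident, author, tuple(parents))
    return store[ident]


def select_for_release(store, tip, trusted):
    """Walk history back from `tip`.

    Returns (all change ids reached, ids whose author is not trusted).
    Trust is decided here, at selection time, not at commit time.
    """
    seen, stack, untrusted = set(), [tip], []
    while stack:
        ident = stack.pop()
        if ident in seen:
            continue
        seen.add(ident)
        change = store[ident]
        if change.author not in trusted:
            untrusted.append(ident)
        stack.extend(change.parents)
    return sorted(seen), sorted(untrusted)


# Hypothetical authors: anyone can commit, but only some lines of
# history pass the release manager's trust check.
store = {}
commit(store, "c1", "alice")
commit(store, "c2", "mallory", parents=["c1"])
commit(store, "c3", "bob", parents=["c1"])
ok_line = select_for_release(store, "c3", trusted={"alice", "bob"})
bad_line = select_for_release(store, "c2", trusted={"alice", "bob"})
```

In this sketch commit never consults the trusted set; only
select_for_release does, which is the separation the argument above is
after.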

When I talked to Brian Fitzpatrick about this, he listed three things
as top priorities:

- Faster. Subversion does need to be faster for many ops.
- Offline commits.
- Local branches.

I would add "better merging", but basically agree with Fitz (note that
we're getting much-improved merging in Subversion 1.5).

Now, since decentralized is formally a superset of centralized, one
way to get all those things is just to use a decentralized system.
That doesn't guarantee "faster", of course, but we already know that
Mercurial and GIT have good performance, so no problem there.

But there's another factor...

True decentralized systems are really hard for most people to wrap
their heads around.

Those of us who work in version control, who think about branching and
merging and change algebra and diff3/diff4 algorithms all day, often
forget this. One of Subversion's biggest advantages, and one of the
reasons it's taking over the world, is that it's really easy to
understand. There's a repository; you check stuff out; you modify the
stuff; you check it back in. Comprehensibility is a big part of
survivability: wherever Subversion goes from here, it must not become
so complex that it can't be explained in five minutes or less with no
questions from the audience. (I've witnessed enough explanations of
decentralized systems to know that their learning curve is generally a
bit steeper, though it may well be worth it.)

So we should acquire some of the characteristics of decentralized
systems (SVK will be a useful guide here, as will GIT and Mercurial
and other systems). But, in the words of Eric Gillespie in IRC just
now, we shouldn't morph into an "also-ran DVCS"; the result would be
hard to maintain and impossible to explain.

Sure, centralized VC can be viewed as a particular restricted mode of
decentralized VC. But in practice it won't work out very well to have
a natively decentralized VC that most users configure to be
centralized. For many organizations, including open source projects,
centralization is a feature: you want changes (and branches) to end up
in the master repository sooner rather than later, so they'll be
visible to everyone, so they'll be backed up, so they'll go through
the central hook system, etc. It focuses the community on a shared
object (Ben Collins-Sussman makes this argument in more detail at
http://blog.red-bean.com/sussman/?p=20).

A general tool configured to behave in a specific way is never quite
as natural to use as a tool designed for that specific use in the
first place. In other words, Subversion can -- will have to -- take
on some of the features of decentralized VC systems, but it will never
be as good a decentralized system as they are. By the same token, a
decentralized system can be configured to work like a centralized one,
but will never be as good at it as Subversion is. The trick for us is
to keep the centralization feature without some of the limitations
that have traditionally come with centralization.

Concretely, what does this mean?

One of Subversion's flaws (mea culpa) is that we didn't realize the
usefulness of having symmetrical functionality on the client and
server sides. The working copy should really be a repository, even if
it's not always going to store all the history available on the server
side (with some projects, you really can't, it's too big).
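
One way to picture a working copy that is itself a repository, yet
holds only part of the history, is a local store that answers from its
own revisions when it can and asks the server otherwise. The following
is a rough Python sketch, not a design proposal: PartialRepo, the
fetch_from_server callable, and the dict-backed storage are all
assumptions for illustration.

```python
class PartialRepo:
    """A local repository that keeps only some revisions of the history."""

    def __init__(self, fetch_from_server, keep=2):
        self._fetch = fetch_from_server  # hypothetical callable: revnum -> content
        self._keep = keep                # max revisions retained locally
        self._local = {}                 # revnum -> content
        self.server_hits = 0             # how often we had to go to the server

    def get(self, revnum):
        """Return a revision, contacting the server only on a local miss."""
        if revnum in self._local:
            return self._local[revnum]
        self.server_hits += 1
        content = self._fetch(revnum)
        self._local[revnum] = content
        self._evict()
        return content

    def _evict(self):
        """Drop the oldest revisions once more than `keep` are stored."""
        while len(self._local) > self._keep:
            del self._local[min(self._local)]

    def local_revisions(self):
        return sorted(self._local)


# A dict stands in for the server-side repository.
server = {1: "first draft", 2: "second draft", 3: "third draft"}
repo = PartialRepo(server.__getitem__, keep=2)
repo.get(3)
repo.get(3)   # served locally, no server round trip
repo.get(2)
repo.get(1)   # fetched, then evicted again because it is the oldest
```

Operations that touch only recent revisions stay local and fast, while
old history remains reachable at the cost of a round trip.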

So we're going to need a working copy rewrite. We knew that; in fact
we've talked about rewriting the repository to use something like
Mercurial's revlog format, for various reasons, and about using that
kind of repository for working copies as well.

We also have to be faster. Fortunately, we've pretty much agreed,
IIRC, that we're willing to punt on subdirectory detachability in
working copies in order to get performance improvements.

And now I'm going to hand-wave on a lot of details. I don't mean to
start the Subversion 2.0 design thread now, just to offer some
thoughts on general goals. We don't need to let labels guide our
thinking ("We are a centralized system!" / "We are a decentralized
system!"). We do need to recognize that users are not interested in
becoming version control experts, and we need to pay close attention
to what they actually want, as opposed to what experts might want them
to want.

Case in point: what's the most popular feature added to Subversion
after the 1.0 release? Probably file locking (the ultimate
centralized feature, by the way). Yes, the heavy-duty developers wish
for better merging, and I don't blame them. But from watching users@
and irc.freenode.net/#svn, talking to companies that do Subversion
support, and from doing some Subversion consulting myself, I think
locking was actually a more important feature. (Of course, we have it
already, so that doesn't change anything about Subversion's future;
I'm just making a point about what's important to users.)

Subversion's phenomenal adoption rate (*) isn't due to being the only
game in town. We never were, if you count the proprietary systems,
and we're even less so now that the open source version control world
has become so fertile. The reason Subversion is taking over the world
is that it is tremendously user-focused, and that it provides
well-documented APIs that enable other developers to write software on
top of Subversion. We should copy what we need from the decentralized
systems, but remember that most users don't know or care whether a
system is centralized or decentralized -- their ideal system is one
they don't notice. Let's keep our eye on the ball, so they don't have
to.

-Karl

(*) http://subversion.tigris.org/svn-dav-securityspace-survey.html

2007-06-29 01:08:09 UTC