Discussion:
gcc branches?
Per Bothner
2002-12-03 00:47:27 UTC
Permalink
Any ETA on when the gcc trunk will be open again for
non-bug-fixes? I've got two space-reduction patches
I'd like to check in soon, and I'm trying to figure out
whether to check in to the 3_4-basic-improvements-branch
or wait until the trunk is open.

Will someone automatically merge in changes from the 3.4
branch into the trunk, or are each of us responsible for
our own merges? I hope the former, though if there are
any conflicts that may be difficult.
--
--Per Bothner
***@bothner.com http://www.bothner.com/per/
Zack Weinberg
2002-12-03 01:02:54 UTC
Permalink
Post by Per Bothner
Will someone automatically merge in changes from the 3.4
branch into the trunk, or are each of us responsible for
our own merges? I hope the former, though if there are
any conflicts that may be difficult.
I have been doing trunk->branch merges all along, and I intend to
merge the entire branch back to the trunk at once, shortly after the
3.3 branch is created. It will need testing though. (I've got x86,
sparc, and hppa test platforms lined up; other architectures would be
helpful.)

zw
Per Bothner
2002-12-03 00:57:47 UTC
Permalink
Post by Zack Weinberg
I have been doing trunk->branch merges all along, and I intend to
merge the entire branch back to the trunk at once, shortly after the
3.3 branch is created. It will need testing though. (I've got x86,
sparc, and hppa test platforms lined up; other architectures would be
helpful.)
Great! Have you (or Mark?) tested merging the 3.4 branch with the
cp-parser-branch, since (I gather) we'll be merging that in too
as soon as the trunk opens up?
--
--Per Bothner
***@bothner.com http://www.bothner.com/per/
Zack Weinberg
2002-12-03 01:31:33 UTC
Permalink
Post by Per Bothner
Post by Zack Weinberg
I have been doing trunk->branch merges all along, and I intend to
merge the entire branch back to the trunk at once, shortly after the
3.3 branch is created. It will need testing though. (I've got x86,
sparc, and hppa test platforms lined up; other architectures would be
helpful.)
Great! Have you (or Mark?) tested merging the 3.4 branch with the
cp-parser-branch, since (I gather) we'll be merging that in too
as soon as the trunk opens up?
Not to my knowledge. Mark's been playing that branch close to his
chest.

zw
Tom Lord
2002-12-03 02:03:32 UTC
Permalink
The following exchange is typical of many that cross the gcc list, and
thus, at least iconically, it represents a lot of people's time and
money.
Post by Per Bothner
Any ETA on when the gcc trunk will be open again for
non-bug-fixes? I've got two space-reduction patches
I'd like to check in soon, and I'm trying to figure out
whether to check in to the 3_4-basic-improvements-branch
or wait until the trunk is open.
Will someone automatically merge in changes from the 3.4
branch into the trunk, or are each of us responsible for
our own merges? I hope the former, though if there are
any conflicts that may be difficult.
I have been doing trunk->branch merges all along, and I
intend to merge the entire branch back to the trunk at once,
shortly after the 3.3 branch is created. It will need
testing though. (I've got x86, sparc, and hppa test
platforms lined up; other architectures would be helpful.)
If the project was using `arch', I think you'd have an easier time.

Since it is cheap and easy to create branches (spanning repositories),
you (per) would have the option of forking the trunk and committing
your changes there. By hand, or (better), automatically given some
infrastructure work first, as the trunk evolves, you could have the
evolving trunk re-merged nightly with your changes so that any
conflicts are noticed early, giving you some flexibility about when
you fix them (or about who fixes them). When the freeze is off,
merging the changes back to the trunk would go smoothly. (Of course
you can do similar things using any revision control system, arch
simply makes it easier than most and is designed with infrastructure
automation in mind.)
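
To make the nightly re-merge idea concrete, here is a rough sketch of
such a cron job. It is only an illustration: the branch name and
working-tree path are made up, and "arch-merge-from"/"arch-commit" are
hypothetical placeholder commands standing in for whatever the real
arch interface provides.

    #!/usr/bin/env python
    # Nightly re-merge sketch (hypothetical): keep a feature branch in
    # sync with the evolving trunk so conflicts surface early.
    import subprocess
    import sys

    TRUNK = "gcc--devel--3.4"            # made-up trunk version name
    FEATURE_TREE = "/home/dev/wd-space"  # made-up working-tree path

    def run(*cmd):
        # Run a command inside the feature branch's working tree.
        return subprocess.call(list(cmd), cwd=FEATURE_TREE)

    def nightly_merge():
        # Pull the latest trunk changes into the feature branch.
        if run("arch-merge-from", TRUNK) != 0:   # placeholder command
            print("merge from %s left conflicts; please resolve" % TRUNK)
            return 1
        # Clean merge: record it so later merges know what was applied.
        return run("arch-commit", "-s", "nightly merge of %s" % TRUNK)

    if __name__ == "__main__":
        sys.exit(nightly_merge())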

I've been thinking it makes sense to automate things as follows
(though feedback from gcc developers might change my mind):

*) developers have their own repositories

Particularly, developers working on new features rather than those
doing release integration/engineering. One advantage of this is
that each feature developer can work on a branch of the trunk, using
`star-merge' as described above, to catch conflicts as early as
desired. Another advantage is that feature developers can decide to
merge their branches with one another early, simplifying integration
into the trunk down the road -- shortening freeze periods. Another
advantage is that early adopters of features can gain access to them
earlier, more easily.


*) the gcc web site catalogs all those repositories

You can browse and review features under development, for example.
You can get email notifications as a feature evolves. You can run a
cron job that will detect (early) when two features-in-progress
touch overlapping files and are likely to conflict. These features
can be offered even to developers who do not have write access to
the primary repository.
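
As a sketch of that overlap check (illustration only -- the branch
names and file lists below are invented; a real cron job would get
them by diffing each catalogued branch against the trunk):

    from itertools import combinations

    # Made-up example data standing in for "files each branch touches
    # relative to the trunk".
    TOUCHED = {
        "space-reduction": {"gcc/tree.c", "gcc/stor-layout.c"},
        "cp-parser":       {"gcc/cp/parse.y", "gcc/cp/decl.c"},
        "opt-cleanup":     {"gcc/tree.c", "gcc/toplev.c"},
    }

    def report_overlaps(touched):
        # Print every pair of branches whose touched-file sets intersect.
        for a, b in combinations(sorted(touched), 2):
            common = touched[a] & touched[b]
            if common:
                print("%s and %s both touch: %s"
                      % (a, b, ", ".join(sorted(common))))

    report_overlaps(TOUCHED)
    # -> opt-cleanup and space-reduction both touch: gcc/tree.c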


*) favored developers get automated testing

By adding a header field to some of your log messages, you can
trigger overnight testing of your branches. This facility can be
safely extended even to developers who do not have write access to
the primary repository -- and in that way, it can reduce the
workload on those who do have write access and have the job of
merging in other people's patches.


*) trunk merges get automated vetting

By adding a header field to some of your log messages, you can
declare that a branch is ready to be merged into the trunk.
This implies, for example, that you have resolved all conflicts
on your branch -- that if the trunk remains stable, the merge will
be clean. Automatically, then, the merge can be validated and
tested before, finally, being committed to the trunk.
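
A sketch of how such a header might be picked up by the automation
(the "Ready-For-Trunk:" field name is invented here for illustration;
arch defines no such header):

    def wants_trunk_merge(log_message):
        # Treat the RFC-822-style lines before the first blank line as
        # the header block and look for the declaration there.
        for line in log_message.splitlines():
            if not line.strip():
                break          # end of the header block
            key, _, value = line.partition(":")
            if key.strip().lower() == "ready-for-trunk":
                return value.strip().lower() in ("yes", "true", "1")
        return False

    # Example log message that would trigger automated vetting.
    example = "Summary: shrink debug tables\nReady-For-Trunk: yes\n\nDetails ..."
    assert wants_trunk_merge(example)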

Of course, if two developers publish conflicting changes at the same
time, human intervention is necessary for at least one of those
change sets -- but here, too, arch can help: by helping to
automatically detect the condition, to notify the relevant
developers, and to provide them with good tools for fixing the
conflicts.


I also note that "CVS is wedged" posts occur frequently on the gcc
list. With a modest up-front investment to finish `arch 1.0' and test
the heck out of it, I think it is clear (from the designs of both
systems) that arch's admin costs will be lower -- repositories will
break less often (and perhaps, essentially, never). There has been
some sentiment on this list to move to svn -- I encourage design
inspections and comparisons (contemplating admin costs and robustness)
there as well.

Finally,
Post by Per Bothner
Great! Have you (or Mark?) tested merging the 3.4 branch
with the cp-parser-branch
Such testing is straightforward to automate with arch and a status
inquiry like this one should be answered by automatically generated
web pages.


-t
Per Bothner
2002-12-06 21:29:11 UTC
Permalink
From what I hear, arch is a much more visionary and conceptually
powerful framework. However, the "engineering" is lacking. So we
have a choice between:
* Sticking with CVS.
* Switching to Subversion, once it is solid enough.
This is basically an "improved CVS".
* Switching to arch (after making it suitable).
* Switching to bitkeeper (not an option for gcc).

So the choice (in, say, a year or so) is between subversion and arch.
I suspect it will be subversion, just because subversion has had more
resources put into making it solid, efficient, maintainable, and
"production quality".

It would be nice to make arch equally "production quality", but
that takes a lot of work. I hope you can find volunteers to do
that work. Perhaps "stealing" as much code from subversions may
be worth considering. I.e. merge the ieads of arch into the
framework of subversions? (I say this without know either code-base,
so it probably doesn't make sense.)
--
--Per Bothner
***@bothner.com http://www.bothner.com/per/
Tom Lord
2002-12-07 10:29:36 UTC
Permalink
The following was initially the concluding paragraph, but since I
think it is the most important, let me lead off with it:

`arch' is fun, cool, interesting, simple, and has a clear path to
making your and lot of other people's lives better. So let's get off
the posing and resume being hackers -- please dig into it -- I predict
you'll come to dig it. We'll find a way to solve the funding problems
as we go along.

Regards,
-t



And, true to form, here's the vitriol:


When one or both get to the stage that they are serious
contenders to replace cvs, then I'd like someone whose
judgement I trust (it could be me, I guess) to evaluate both.


The situation is more complicated than that.

`arch', unless something changes, will _not_ reach that stage. In
spite of your protests, I think you are in a position to help fix
that.

`arch' _could_ reach that stage, for less money than svn by my
estimation, and producing a far better end result (that is much
needed). And this is fairly easy to determine, if you look into it in
reasonable depth.

So I must, unfortunately, ask people such as you to put their
momentary convenience aside, and do a little planning and looking
ahead in this area. Is not GCC engineering process a legitimate
concern of the SC?


But until then, I think it is premature for either myself or the
gcc steering committee to spend much time evaluating them (unless
personally interested, of course).

If you stick to that, you are doing the community, the market, and the
free software movement a disservice. Surely planning is an important
function of the SC. Surely you recognize that effective
(i.e. proactive, positively consequential) planning is important at
this juncture.


Convincing me won't really help you or arch. I'm not an opinion-maker
in the [presumably relevant part of] Free Software world.

Hopefully this is false, given your SC membership. If it is not, then
I think it is time to ask some questions about what your duty as an SC
member is -- and about the role of the SC overall.



If I were, I'd be rich from Kawa - which a very few people rave
about.

Surely you aren't asking for a project-endorsement quid-pro-quo.
Actually, not "surely" at all -- one can see such q-p-q's operational
elsewhere -- so why not here, eh?

Really, I can't begin to fathom how Kawa enters into this discussion
at all, unless as a possible implementation language for an `arch'.

In general, your modern western euro royalist approach to SC duties
("deserving infinite deference, yet responsible for nothing") is, to
put it lightly, disheartening.

Regards,
-t
Per Bothner
2002-12-07 16:58:21 UTC
Permalink
Post by Tom Lord
The following was initially the concluding paragraph, but since I
`arch' is fun, cool, interesting, simple, and has a clear path to
making your and lot of other people's lives better. So let's get off
the posing and resume being hackers -- please dig into it -- I predict
you'll come to dig it. We'll find a way to solve the funding problems
as we go along.
Fine. However, I personally have my own "fun, cool, interesting"
project(s) that I'm already working on, or that I want to work on.
The same applies to many of us, or we have no time/interest in
hacking on new projects after our day jobs.
Post by Tom Lord
`arch', unless something changes, will _not_ reach that stage. In
spite of your protests, I think you are in a position to help fix
that.
Other big projects have managed to become solid and useful using
people's spare time.
Post by Tom Lord
So I must, unfortunately, ask people such as you to put their
momentary convenience aside, and do a little planning and looking
ahead in this area.
You can ask it, but you cannot expect it.
Post by Tom Lord
Is not GCC engineering process a legitimate
concern of the SC?
No, it is a legitimate concern of the entire gcc community. It is
*not* the SC's job to initiate, lead, or fund projects. It is the
SC's job to adjudicate between competing proposals, if the technical
leadership (a much less formal group of people) does not agree.
Post by Tom Lord
If you stick to that, you are doing the community, the market, and the
free software movement a disservice.
The Free Software movement will just have to manage. We've all done
plenty for it, and can't do everything.
Post by Tom Lord
Surely planning is an important function of the SC.
Only to a very limited extent. Since we have no means of
enforcement or sources of funding, and depend entirely on
volunteers, there is a limit to what planning can do.
Post by Tom Lord
Surely you recognize that effective
(i.e. proactive, positively consequentlal) planning is important at
this juncture.
It may be important, but I don't think it's going to happen.
We tend to be more re-active.

I'd love to have a "Gcc Foundation" (within the FSF of course) that had
a real budget, and real plans. I believe this has been tried before,
and may yet succeed. But I don't have the time, skills, or inclination
to do all the politicking needed.
Post by Tom Lord
Convincing me won't really help you or arch. I'm not an opinion-maker
in the [presumably relevant part of] Free Software world.
Hopefully this is false, given your SC membership.
The SC as a whole may have name recognition and influence comparable
to (say) Larry Wall or Linus Torvalds. Individually, none of us have
anywhere close to that level. That's a fact. Many of us do have a
fair level of name recognition and respect. People will talk with us,
but that doesn't mean they will give us money. Remember, we are still
in a recession. Discretionary spending is very hard to "sell".
Post by Tom Lord
If it is not, then
I think it is time to ask some questions about what your duty as an SC
member is -- and about the role of the SC overall.
Ask ahead. Change requires someone with the will and energy to do
better/different.

Setting up a more pro-active "foundation" has been tried. That doesn't
mean it can't succeed or shouldn't be tried again.
Post by Tom Lord
If I were, I'd be rich from Kawa - which a very few people rave
about.
Really, I can't begin to fathom how Kawa enters into this discussion
at all, unless as a possible implementation language for an `arch'.
It was an example of an established project close to my heart that has
technical superiority (I and others think) and happy users, but I still
can't raise enough money for it to pay myself a decent wage. So while
you have my sympathy, I can't offer you more.
Post by Tom Lord
In general, your modern western euro royalist approach to SC duties
("deserving infinite deference, yet responsible for nothing") is, to
put it lightly, disheartening.
You're free to start a revolution.
--
--Per Bothner
***@bothner.com http://www.bothner.com/per/
Tom Lord
2002-12-07 20:54:33 UTC
Permalink
Post by Per Bothner
Post by Tom Lord
Surely planning is an important function of the SC.
Only to a very limited extent. Since we have no means
of enforcement or sources of funding, and depend
entirely on volunteers, there is a limit to what
planning can do.
Right. I recognize that. SC legitimacy is historically meritocratic,
although there is a tiny bit of practical authority originating out of
the distribution of CVS and web site passwords. Mostly it was the
EGCS foo and subsequent nearly-monotonic improvements to GCC that gave
the SC its standing. Y'all seem to be proud of that and that seems
appropriate to me.

What comes after that success?

Some of the volunteers the SC works with are corporations and part of
their contribution is funded work. I think it's legitimate for the SC
to approach those companies about how to make their spending more
efficient for them, and more effective for gcc as a whole. The SC is
far better positioned, politically, than I or any other individual to
have that kind of conversation with those volunteer corporations.

If you have some individual volunteers that want to make a long-term
quasi-commitment to your project, it's customary to give some guidance
about what work needs doing -- not just a task list, but a real
negotiation/planning session that helps to produce a task list. The
corporations are volunteers who are making long-term quasi-commitments
to spend a decent amount of money on GCC.
Post by Per Bothner
Remember, we are still in a recession. Discretionary
spending is very hard to "sell".
Many business leaders/educational materials, etc. will tell you that
during a recession, R&D spending and marketing are the two things you
want to protect and even increase. It's part of how recessions get
fixed. Canceling product lines that aren't going to make it, forming
mergers, tuning productivity gains -- those are the ways to align
spending to reduced revenue.
Post by Per Bothner
I'd love to have a "Gcc Foundation" (within the FSF of
course)
I proposed that a while back and the feedback was almost universally
negative. I think it strikes people as too much like socialism, or
something: there's a definite resistance to handing over substantial
budgets to NPOs. There is something appealing about the companies who
fund much of GCC spending their own money directly. People like
capitalist freedoms.

So I'm not proposing a big SC budget today.

Here's an analogy to what I'm proposing: let's suppose it was the
80s and GCC was an in-house project at a large vendor. Let's suppose
that GCC had a small team of project-leads from various divisions --
the SC -- who held engineering, not management titles (i.e., they have
little or no budgets). In that circumstance, it would be quite
ordinary and useful for the project leads to suggest plans and
budgeting strategies to the managers, and for that to be an important
factor in how budgeting happens. Since the project spans divisions,
there'd be some politics among the managers to figure out which
budgets provide what parts of the funding -- but the SC would be
providing project-perspective guidance about how to coordinate that
budgeting.

Now, GCC isn't an in-house project, and the management in question
doesn't span divisions -- it spans companies. But it's still useful
to try to coordinate spending among those managers more directly than
at the level of todo list / patch acceptance / release management.

GCC is interesting in particular because, historically, it is a
"template project" -- a successful commercialization of free software
that people try to emulate with other projects.

That's all very general. With `arch', it gets a bit more concrete
because what I've been proposing are tools that specifically address
lowering the costs and raising the quality of source-based cooperation
among the volunteers. That's an issue that is demonstrably useful for
gcc (scan the past year of the dev list on the topics of CVS, testing,
and branch management). And it's an issue that, if addressed
cleanly, can benefit other projects as well.

In economic theory, the volunteer corporations like projects like GCC
because of the low "transaction costs" for inter-corporate
cooperation. They should be receptive to plans that reinforce and
improve that, especially if the plans can help other projects as well.
The FSF, meanwhile, likes volunteer corporations because they
contribute useful work: it should be receptive to plans that can
improve the quality and quantity of what volunteer corporations can do
per dollar spent.



Fine. However, I personally have my own "fun, cool, interesting"
project(s) that I'm already working on, or that I want to work on.
The same applies to many of us, or we have no time/interest in
hacking on new projects after our day jobs.

While a big SC budget isn't popular, perhaps (as with executive
boards) SC members should receive honorariums or stipends for their
SC activities.




[Kawa] was an example of an established project close to my heart
that has technical superiority (I and others think) and happy
users, but I still can't raise enough money for it to pay
myself a decent wage. So while you have my sympathy, I can't
offer you more.

Ok. Kawa may be symptomatic of another (related) problem: that the
free software businesses haven't (for the most part) figured out how
to invest in strategic, practical research in any kind of systematic
way.

In general, there's too much "magic cauldron" thinking implicit in how
the industry is relating to free software these days.

Regards,
-t
Tom Lord
2002-12-07 07:08:22 UTC
Permalink
(I say this without knowing either code-base, so it probably
doesn't make sense.)
That's a bug. If you ask focused questions, I will be happy to answer
them. It shouldn't take very many hours to grok the situation. In
person, I'd bet it would take one or two (since I recall you as being
smart/competent). Do you like good beer?
[From what I hear] the "engineering" [in `arch']
is lacking.
Yes, well, bullshit rumours are like that.

Ask some serious questions or read the `arch' source code. I'll be
happy to help with your evaluation because I am confident that if you
are not foolish about it, I'll "win".

I can also help a little bit by providing some critical perspective
about svn.

-t
Per Bothner
2002-12-07 08:01:03 UTC
Permalink
Post by Tom Lord
That's a bug. If you ask focused questions, I will be happy to answer
them. It shouldn't take very many hours to grok the situation.
It wouldn't do much good, since I'm not in a position to either
finance or recommend either svn or arch. When one or both get to
the stage that they are serious contenders to replace cvs, then
I'd like someone whose judgement I trust (it could be me, I guess)
to evaluate both. But until then, I think it is premature for
either myself or the gcc steering committee to spend much time
evaluating them (unless personally interested, of course).
Post by Tom Lord
[From what I hear] the "engineering" [in `arch']
is lacking.
Yes, well, bullshit rumours are like that.
Perception rules, often. Overcoming incorrect perception is
possible, but takes a lot of work.
Post by Tom Lord
Ask some serious questions or read the `arch' source code. I'll be
happy to help with your evaluation because I am confident that if you
are not foolish about it, I'll "win".
Convincing me won't really help you or arch. I'm not an opinion-maker
in the Free Software world. If I were, I'd be rich from Kawa - which
a very few people rave about.
--
--Per Bothner
***@bothner.com http://www.bothner.com/per/
Phil Edwards
2002-12-07 22:48:47 UTC
Permalink
Post by Tom Lord
[From what I hear] the "engineering" [in `arch']
is lacking.
Yes, well, bullshit rumours are like that.
Ask some serious questions or read the `arch' source code. I'll be
happy to help with your evaluation because I am confident that if you
are not foolish about it, I'll "win".
This is why I distrust arch; I distrust the mentality of the authors
behind it. Anyone who says, "certainly, let's discuss it, and if you aren't
stupid, you'll agree with me," is too arrogant to be bothered to work with.
Post by Tom Lord
I can also help a little bit by providing some critical perspective
about svn.
So, you're qualified to dispense criticism of /other/ tools, but criticism
of /your/ tool is "bullshit rumor".

Sorry, Tom. You aren't going to convince me to stop working on GCC and
work on arch instead; I don't have the time or the interest.

I'm pleased with the leadership provided by the SC; they take a light
touch in a community of volunteers. If I came home from work, sat down to
do some volunteer hacking, and was ordered by an arrogant, heavy-handed,
"you must focus on project <foo> now or else" SC, I would cordially invite
them to perform a certain anatomical impossibility, and take my resources
elsewhere. I don't believe the SC have the time or the interest to hack
arch code either. Nor do they have secret caches of funding.


I'm still trying to figure out what exactly -- concrete suggestions, now --
what exactly you want us to do, given the constraints of a) no extra time,
and b) no money.


Phil
--
I would therefore like to posit that computing's central challenge, viz. "How
not to make a mess of it," has /not/ been met.
- Edsger Dijkstra, 1930-2002
Tom Lord
2002-12-07 23:40:25 UTC
Permalink
Post by Tom Lord
Ask some serious questions or read the `arch' source code.
I'll be happy to help with your evaluation because I am
confident that if you are not foolish about it, I'll "win".
This is why I distrust arch; I distrust the mentality of the
authors behind it. Anyone who says, "certainly, let's discuss
it, and if you aren't stupid, you'll agree with me," is too
arrogant to be bothered to work with.

Wow. That's not what I'm saying. My bad for casually using the word
"foolish" on a mailing list, I suppose.

I'm saying:

I already know that you (SC members) are smart. I already know
that arch is good and the project is very useful to you. What's
been consistently lacking for many months now is a serious
discussion of the issues and technology domain. There's a
communication disconnect, it seems to me. In this particular
instance, not repairing that disconnect, will be seen in
retrospect as foolish.

"foolish" in contexts like this doesn't seem to me to be a personal
judgement about people's competence. Instead, it's just me, putting
some substantial part of what little good reputation I might have on
the line, explicitly. It's me saying: I'm not bullshitting here and
I'm not even just speculating -- this is real, and important, and
deserves your focused attention.

I could equally well have said:

I am confident that if you spend a bit of time digging into
these issues -- such that I become confident you really are
thinking about the design space and its implications -- that
you'll come to agree with me that `arch' is a no-brainer, and
highly desirable technology for you.

It's a good heuristic for busy project leads to be dismissive by
default. My "foolish" comment is just stressing my judgement that
this is a case where that default response is inappropriate. Yes,
yes, I know you don't have time for 99% of anything off your main
track -- I'm insisting that this is in that 1%. Here, I'll use strong
language (e.g. "foolish"), to demonstrate my insistence. This is a
bit like that guy (was it Larry Wall?) who (carefully, so as not to
cause injury) threw and smashed a coffee-cup against a wall during a
wet-space meeting -- to signal, unambiguously: "Ok, I'm insisting
now."

So, again: "foolish" in this context is meant to underscore my
confidence -- not to denigrate others. Where I come from, this
idiomatic usage is common among engineers and understood positively
with a smirk and a "well, ok then". It's an instance of engineering
machismo functioning properly. Sadly, it has been my experience that
this elegant use of conversational valence is easily confused with
random email-based flamage.


So, you're qualified to dispense criticism of /other/ tools,
but criticism of /your/ tool is "bullshit rumor".

Yes.

Some criticisms (not the ones offered here) of `arch' are not
bullshit. But neither are they fatal: rather, they are part of why
arch needs just a bit of commercial investment to finish the job.
I have a number of such criticisms myself. This is part of the transition
from strategic R&D to tactical execution.

Yes, I am quite well qualified to say a thing or two about the
approach being taken by svn. Under some circumstances, it is my
social duty to speak up.


I'm pleased with the leadership provided by the SC; they take a
light touch in a community of volunteers. If I came home from
work, sat down to do some volunteer hacking, and was ordered by
an arrogant, heavy-handed, "you must focus on project <foo> now
or else" SC, I would cordially invite them to perform a certain
anitomical impossibility, and take my resources elsewhere.

Well, of course! I'm not asking the SC to change their relation to
you, an individual. I am asking them to first, start to understand
`arch' and the related projects I've been advocating and how those
things relate to the project; second, we can start to figure out
together how to relate this to the volunteer _corporations_.

If all goes well, it will become easier for you, as an individual, to
contribute as you like. Your time is valuable. `arch' can make
your life more fun, and your contributions more effective.


I'm still trying to figure out what exactly -- concrete
suggestions, now -- what exactly you want us to do, given the
constraints of a) no extra time, and b) no money.

This list is advertised as the best way to communicate with the SC.
The SC is the best way to communicate with the corporations.

"It was uphill both ways. And we liked it that way.",
-t
Tom Lord
2002-12-08 01:51:10 UTC
Permalink
So, again: "foolish" in this context is meant to underscore my
confidence -- not to denigrate others.


I'd like to add that:

I wish I had had sufficient resources to perfectly prepare the
ultimate presentation of why `arch' is both good and deeply relevant,
and sustain its presentation on my web site.

I didn't. So I must request some genuine uptake[*] and professional
courtesy, instead: good conversation is cheaper than perfect
educational materials. The source, and some reasonably happy users
are available.

-t

[*] "genuine uptake"

A technical term from feminist philosophy that is hopefully
comprehensible even when applied outside that context. Within
feminist philosophy, it is often used in analysis of the origins of
justified anger.
Kai Henningsen
2002-12-08 13:23:00 UTC
Permalink
Post by Tom Lord
[*] "genuine uptake"
A technical term from feminist philosophy that is hopefully
comprehensible even when applied outside that context. Within
feminist philosophy, it is often used in analysis of the origins of
justified anger.
Sorry, reads like Chinese to me, either in or outside "feminist
philosophy" (which, in itself, sounds like a rather doubtful proposition
to me).

MfG Kai
Stan Shebs
2002-12-08 21:42:15 UTC
Permalink
[...] I am asking [the SC] to first, start to understand
`arch' and the related projects I've been advocating and how those
things relate to the project; second, we can start to figure out
together how to relate this to the volunteer _corporations_.
You'd do better to find an individual or subgroup who will get excited
about arch, use/improve it, and advocate it to other GCC developers.
IIRC, Larry McVoy got the Linux PPC port folks to start using BitKeeper
first, and then they helped sell it to the other kernel developers.

It would also help to be more specific about how arch will help *me*.
Despite the loose talk about altruism, 99% of GCC developers are really
doing it for selfish reasons - money, fame, joie de hack, scratching
an itch, whatever. We only cooperate because we can do more working
together than separately.

Now, almost all of *my* merge difficulties have been because Apple
changes to GCC are logically contradictory to FSF code. Does arch
include an intelligent merging component that is smart about C and
can figure out which pieces of FSF code need to be overridden by
Apple code, even if the FSF code changed? If not, then for *me*
it doesn't have any advantage over CVS, and there's no point in
trying to sell it to me.

Technology advocacy is like any other kind of selling; if the
customer doesn't buy, it's your failure, not the customer's.

Stan
Bruce Stephens
2002-12-08 22:17:49 UTC
Permalink
Stan Shebs <***@apple.com> writes:

[...]
Post by Stan Shebs
Now, almost all of *my* merge difficulties have been because Apple
changes to GCC are logically contradictory to FSF code. Does arch
include an intelligent merging component that is smart about C and
can figure out which pieces of FSF code need to be overridden by
Apple code, even if the FSF code changed?
No, but (like other modern CM systems) it remembers what's been
merged. So with CVS, you develop on a branch (presumably updating
with whats on the head), and when you want to update the head, CVS
does textual guessing to decide which apparent conflicts are real ones
(and this usually works pretty well, to be fair).

Arch remembers the updates you've done on your branch, so when you
want to update the head, you'll get fewer spurious conflicts.

So with arch, it makes sense to keep branches up to date with respect
to each other---it makes future merges easier. With CVS, it's
sometimes the opposite.

(Arch knows about file renames and things, too, which would matter for
some projects, but possibly not gcc.)
Post by Stan Shebs
If not, then for *me* it doesn't have any advantage over CVS, and
there's no point in trying to sell it to me.
Maybe not. It's hard to say without looking in more detail and trying
experiments.

[...]
Joseph S. Myers
2002-12-09 00:19:26 UTC
Permalink
Post by Stan Shebs
Now, almost all of *my* merge difficulties have been because Apple
changes to GCC are logically contradictory to FSF code. Does arch
Similarly, a common occurrence is that a new target, developed for some
time in an external tree, is merged in, but it follows older coding
standards and does not take account of global cleanups done to the main
tree in the mean time. How could arch tell that, say, FRV's xm-frv.h file
ought to have been removed when xm-files.h were removed generally from the
main tree, and the presence of such a file is a conflict, or that a
particular target macro should have been removed (except that we use
#pragma GCC poison to ensure that part), or that a certain coding style is
obsolete, or that all occurrences of a spelling error had been fixed, or
that something now needs documenting?

That sort of problem is what *I* generally see as merge problems - failure
to follow coding standards, especially as regards documentation
(including comments), when patches are submitted, even though the issues
involved have been cleaned up in the tree before. (And I include in this
small patches that didn't need a branch - if a patch is submitted that
includes some particular Texinfo error, when all such were previously
fixed, there's a logical conflict in that the new code would have been
fixed if it had been there previously.) The basic mechanics of merging to
and from branches (a rare event, in any case) seem to work fine; the
logical aspects of keeping track of cleanups and refinements to coding
standards (involving in principle remembering on the order of a GB of mail
to the lists since the start of EGCS, though we try to include
documentation to reduce the effects of the impracticability of remembering
all the mail) need continual reminders to patch submitters to include
docs, or testcases, or that something in their patch is obsolete, or
misspelt, or will be ugly in the printed manual, or bad style.
--
Joseph S. Myers
***@cam.ac.uk
Joel Sherrill
2002-12-09 00:49:40 UTC
Permalink
Post by Joseph S. Myers
Post by Stan Shebs
Now, almost all of *my* merge difficulties have been because Apple
changes to GCC are logically contradictory to FSF code. Does arch
Similarly, a common occurrence is that a new target, developed for some
time in an external tree, is merged in, but it follows older coding
standards and does not take account of global cleanups done to the main
tree in the mean time. How could arch tell that, say, FRV's xm-frv.h file
ought to have been removed when xm-files.h were removed generally from the
main tree, and the presence of such a file is a conflict, or that a
particular target macro should have been removed (except that we use
#pragma GCC poison to ensure that part), or that a certain coding style is
obsolete, or that all occurrences of a spelling error had been fixed, or
that something now needs documenting?
If you find something that can really help on these issues, gcc is not
the only project that has problems like this. I have used CVS on
applications that had small teams and still saw global fixes not
accounted for when someone merged their long checked out work. All it
takes is for someone to take a significant length of time to work on
something while the rest of the source base moves forward.

I know that checking for problems like this in submissions to RTEMS is
tedious and error prone. I have to know what version they worked with
and try to account for global changes that might be in the mix.
Post by Joseph S. Myers
--
Joseph S. Myers
--
Joel Sherrill, Ph.D. Director of Research & Development
***@OARcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985
Tom Tromey
2002-12-09 07:43:58 UTC
Permalink
Joseph> That sort of problem is what *I* generally see as merge
Joseph> problems - failure to follow coding standards, especially as
Joseph> regards to documentation (including comments), when patches
Joseph> are submitted, even though the issues involved have been
Joseph> cleaned up in the tree before.

I agree. But as you point out it's hard to keep up with all the
changes that go through.

In automake our approach has been to automate the checking of such
changes. For instance, if we discover a new systemic error (sh and
make bugs occasionally pop up that require a sweep across all of
automake), we add a new entry to the `maintainer-check' target. Then
you can run `make maintainer-check' to see if anything has regressed.
Maintainers are expected to keep the code maintainer-check-clean.
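
A sketch of what such a check can look like (in Python rather than
automake's actual make rule, and with made-up patterns -- not our real
checks):

    import re
    import sys
    from pathlib import Path

    # Each entry: (description, pattern that must never reappear once
    # it has been swept from the tree).  These examples are invented.
    CHECKS = [
        ("obsolete xm- host fragment", re.compile(r"\bxm-[a-z0-9]+\.h\b")),
        ("known misspelling",          re.compile(r"\brecieve\b")),
    ]

    def maintainer_check(root="."):
        failures = 0
        for path in Path(root).rglob("*.c"):
            text = path.read_text(errors="replace")
            for description, pattern in CHECKS:
                if pattern.search(text):
                    print("%s: %s" % (path, description))
                    failures += 1
        return failures

    if __name__ == "__main__":
        sys.exit(1 if maintainer_check() else 0)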

This is basically the same idea as poisoning identifiers in system.h.
As we've seen with warnings, if we can't automate it, there will be
regressions.

Tom
Tom Lord
2002-12-09 11:04:26 UTC
Permalink
Now, almost all of *my* merge difficulties have been because
Apple changes to GCC are logically contradictory to FSF code.
Does arch include an intelligent merging component that is
smart about C

Ok, I'm stumped, perhaps because I don't know the details of what you
are describing. Why do you think a tool specifically knowledgeable
about C is needed here?

If you have some fixed or slowly evolving deltas from FSF branches,
then arch can help you ... but that has nothing to do with C in
particular.

-t
Stan Shebs
2002-12-10 03:14:55 UTC
Permalink
Post by Stan Shebs
Now, almost all of *my* merge difficulties have been because
Apple changes to GCC are logically contradictory to FSF code.
Does arch include an intelligent merging component that is
smart about C
Ok, I'm stumped, perhaps because I don't know the details of what you
are describing. Why do you think a tool specifically knowledgeable
about C is needed here?
To take an example of something real-life, we have additional warning
flags not in FSF sources because of the collective spoon-gagging
response when they were suggested. Now when Neil Booth reorganized
option flags into c-opts.c (a good thing), I had to go and
rewrite our local patches to fit this new scheme, deleting some code
(yay) and moving other bits into c-opts.c. All of this was manual
work.

Sure, it's unreasonable to expect any CM system to somehow divine the
meaning of an arbitrary change, and then do an arbitrary rewrite of a
local patch to work with it, but that is what consumed the great
majority of my time when merging. It may be more reasonable to
expect some class of patches to work - perhaps a local patch could be
made to follow the function it's patching, if the function is moved
to another file - but my real point is that at least for me, a change
of CM system would not have had much impact on the tasks that took
up the largest portion of my time.

Stan
Robert Dewar
2002-12-08 09:26:10 UTC
Permalink
Post by Tom Lord
This list is advertised as the best way to communicate with the SC.
The SC is the best way to communicate with the corporations.
(first I will put on my hat as president of a corporation devoted to
the use of Free Software).

This has things upside down. We do things because of input from our
customers. if someone tells us "gee, we would really like to have
arch working nicely with Ada, and we have $$$ to backup the request",
we will be very glad to talk and look. But I am afraid we are not
about to pay any attention to someone on the SC (or the whole SC
for that matter) exhorting us to use our resources on arch because
they think it would be a "good thing".

Robert Dewar
Ada Core Technologies

P.S. The idea that R&D does not get cut back during a recession is at best
wishful thinking, and at worst hopelessly naive. Now at ACT, it is definitely
the case that we have not cut back on R&D, but that's because the recession
has not affected us significantly, and we continue to grow steadily.

I really think that if you want arch (or any other technology) to succeed,
convince some large scale users that they want it!
Tom Lord
2002-12-08 09:56:19 UTC
Permalink
Positivity:

(first I will put on my hat as president of a corporation
devoted to the use of Free Software).

Is "second" your hacker/engineer hat? That's the guy I want to talk
with. When you start asking questions about patch set formats and
their implications in context, then I'll know you are on the right
track. When you start asking questions about global namespaces for
revisions (changesets), by then we'll be getting along famously. When
you start asking about how to automate various aspects of the GCC
process in an arch-based framework, I'll [censored] and send you to
heaven.


Pointed vitriol:

This has things upside down. We do things because of input from
our customers.


Hmm. Too bad it's _so_ unfashionable to talk about engineering ethics
-- otherwise I'd be able to flame you for that comment in the manner
it deserves without having to endure lots of stupid replies. Customers
are hyper-super-double-plus-ultimate-that's-why-we-are-here important,
but their demands do not trump fundamentals. Our relationship with customers
must be a two-way street.


P.S. The idea that R&D does not get cut back during a
recession is at best wishful thinking

Not from what I have read.


But I am afraid we are not about to pay

Yeah, but, you're comparatively dinky anyway, right? I don't mean
that as an insult. You're a successful but relatively tiny corp? If
so, I'm not talking to you (while you're wearing that hat). I do hope
to make your life better as a side effect, though -- and to help
others succeed at similar scales. Hurray for human-scale corps! You
go girl (flippancy aside: tiny corps are really cool, if you ask me).

IBM, HP/C -- even all the way down to RHAT .... corps at that scale
can afford this. It's even very far from a large line-item for them.


"If a thing is worth doing...",
-t
Robert Dewar
2002-12-08 10:24:13 UTC
Permalink
Post by Tom Lord
Is "second" your hacker/engineer hat? That's the guy I want to talk
with. When you start asking questions about patch set formats and
their implications in context, then I'll know you are on the right
track. When you start asking questions about global namespaces for
revisions (changesets), by then we'll be getting along famously. When
you start asking about how to automate various aspects of the GCC
process in an arch-based framework, I'll [censored] and send you to
heaven.
I was responding specifically to your idea that the SC could approach
corporations to get funding. This part of your previous message was
about money, not about hacking. And I was specifically addressing
that request. Once again, if you want to extract funds from corporations
big or small you can't do it by exhortation, you have to show value unless
you are specifically asking for charity-type handouts. Corporations
certainly do make such contributions, but there is a long line :-)
Post by Tom Lord
Yeah, but, you're comparatively dinky anyway, right? I don't mean
that as an insult. You're a successful but relatively tiny corp? If
so, I'm not talking to you (while you're wearing that hat). I do hope
to make your life better as a side effect, though -- and to help
others succeed at similar scales. Hurray for human-scale corps! You
go girl (flippancy aside: tiny corps are really cool, if you ask me).
Now I begin to get some real sense of why you get nowhere. You seem
to think you can get people to help you by insulting them. I have seen
you do this to the SC, and to individuals. It's a strange way to try
to win friends and supporters.


In fact a successful small corporation like ACT potentially is a much
better friend for you than RH or IBM, or any other large public company
that must answer to stockholders or outside investors.

We have 35 engineers who are very competent and entirely devoted to
the continued development of free software. We spend a lot of resources
in developing FS (for example, our current development of GPS).
Post by Tom Lord
Hmm. Too bad it's _so_ unfashionable to talk about engineering ethics
-- otherwise I'd be able to flame you for that comment in the manner
it deserves without having to endure lots of stupid replies. Customers
are hyper-super-double-plus-ultimate-that's-why-we-are-here important,
but their demands do not trump fundamentals. Our relationship with customers
must be a two-way street.
I find this statement quite arrogant. I trust our customers a lot more
frankly than I trust you, since you so obviously have a (big) axe to
grind. Yes, I understand that you think this project is wonderful and
essential and valuable. Well so far that does not distinguish it from
dozens of other projects ranging in worth from dubious to useful. You
have to convince others of this fact, and you may be competent at
software development (or not, I have no idea), but for sure your
competence in persuading other people is minimal. It just won't do to
call people unethical and stupid for disagreement with you.

If this interchange had managed to convince me that arch was of interest,
then I would be quite happy to have a look to see whether it might meet
our customers' needs. Obviously customer needs are often expressed in very
general terms (we need a good IDE, we need a good CM system etc). And in
such cases, we definitely play an active role in suggesting (and developing)
appropriate solutions.

It really doesn't sound like you need $10 million and a team of 50 engineers
for this project. On the contrary it sounds like a relatively small
investment of effort by competent people could make a big difference. But
you seem more interested in fulminating than illuminating.

I have really learned nothing about arch from the thread that would entice
me to take a closer look, on the contrary, it has left a rather negative
impression. The idea that one should look at source code to figure out
what it is about is in particular a bit absurd.

It is rather sad to see what may possibly (for all I know) be a really
valuable project hampered and sabotaged by incompetent advocacy.
Tom Lord
2002-12-08 11:23:06 UTC
Permalink
Post by Tom Lord
Yeah, but, you're comparatively dinky anyway, right? I don't mean
that as an insult. You're a successful but relatively tiny corp? If
so, I'm not talking to you (while you're wearing that hat). I do hope
to make your life better as a side effect, though -- and to help
others succeed at similar scales. Hurray for human-scale corps! You
go girl (flippancy aside: tiny corps are really cool, if you ask me).
Now I begin to get some real sense of why you get nowhere. You
seem to to think you can get people to help you by insulting
them.


My gawd man: how are you possibly insulted?

Is ACT much larger than I think it is (you go on to confirm that it is
not)?

I don't know much at all about ACT. I infer it is smaller than those
others that I named. I infer that it is successful. I stated, based
on those assumptions, that I think it belongs to a really cool class
of corps and I meant that. Good for you! The comment you objected to
is praise, not insult.

I'm "not talking to" ACT because, at your scale, my R&D funding needs
are too big for you and not central enough to your mission. I don't
want to waste your time by pretending otherwise.

How the heck did you manage to be insulted by my calling your corp
cool, when assessed in terms of its comparatively dinky size? I
wonder if you aren't eager to read insult into any little thing,
Cabron [pop music reference].
Post by Tom Lord
We have 35 engineers who are very competent and entirely
devoted to the continued development of free software. We
spend a lot of resources in developing FS (for example, our
current development of GPS).
That's pretty much what I'd guessed. I'll reiterate: you go girl!
That's cool. I admire you. Human scaled, competent, successful:
neat! Sheesh. Are you just flipping out over my use of the word
"dinky"?



[on engineering ethics]
Post by Tom Lord
I find this statement quite arrogant. I trust our customers
a lot more frankly than I trust you
You don't have to "trust" me or your customers. We can talk over the
issues, preferably in person (since vulnerabilities are part of the
topic, and since bandwidth helps), and then you can trust yourself.


It really doesn't sound like you need $10 million and a team
of 50 engineers for this project. On the contrary it sounds
like a relatively small investment of effort by competent
people could make a big difference. But you seem more
interested in fulminating than illuminating.


I've been estimating about 6 engineers for a year -- a little over
$1M. And that's to produce a really top-shelf 1.0. In total, that's
about 10x less than McVoy's estimate of $12M to compete with BK.


It is rather sad to see what may possibly (for all I know) be
a really valuable project hanmpered and sabotaged by
incompetent advocacy.

I've started to believe that there is no variation on advocacy that
could possibly succeed given presumptions such as you have exhibited.
It is interesting to try to trace those presumptions back to their
origins (*cough*cygnus). Yet another "bash on Tom" day, I guess.



"I am small but I am strong
I'll get it on with you
If you want me to
What else can I do" -- from "Cabron" by Red Hot Chili Peppers

-t
Robert Dewar
2002-12-08 11:37:11 UTC
Permalink
Post by Tom Lord
That's pretty much what I'd guessed. I'll reiterate: you go girl!
neat! Sheesh. Are you just flipping out over my use of the word
"dinky"?
No, it is just the entire style of your presentation.
Post by Tom Lord
I've started to believe that there is no variation on advocacy that
could possibly succeed given presumptions such as you have exhibited.
It is interesting to try to trace those presumptions back to their
origins (*cough*cygnus). Yet another "bash on Tom" day, I guess.
I would tend to agree if it is you doing the advocacy. My best advice,
find someone who knows how to approach other people successfully.
Post by Tom Lord
I don't know much at all about ACT
So I see :-)
Post by Tom Lord
I'm "not talking to" ACT because, at your scale, my R&D funding needs
are too big for you and not central enough to your mission.
Well how do you know? Given the previous quote?
In fact CM and revision control systems are quite critical to many of our
customers. We have several customers managing systems with tens of thousands
of files and millions of lines of code. Remember that the niche Ada occupies
is large scale mission critical systems.

Perhaps you are missing an opportunity here, though I must say the phrase
"my R&D" funding needs is worryingly personal, and as I said earlier, if
the intent of this thread was to encourage people to look at arch, it
has not worked with me.
Tom Lord
2002-12-08 22:06:31 UTC
Permalink
Post by Robert Dewar
No, it is just the entire style of your presentation.
In fact CM and revision control systems are quite critical to
many of our customers. We have several customers managing
systems with tens of thousands of files and millions of lines
of code.
[...]
Post by Robert Dewar
Perhaps you are missing an opportunity here
[...]
Post by Robert Dewar
if the intent of this thread was to encourage people to look
at arch, it has not worked with me.
I'm inexperienced in sales, but from what I read, the right thing here
is for me to solicit from you much more information about what you
think your (or your customers') needs are -- then if `arch' fits, I can
state why in your terms (and if not, thank you for your time and take
my leave). Ok?

So, I'm listening. For both the GCC project and ACT's customers, what
do you (and others on this list) initially think is important in
source management technology, especially, but not limited to revision
control and adjacent tools?

I said "initially" because I'm wondering how to proceed if you list
requirements that I think are buggy in one way or another. Is it
"good style" to point that out if it occurs?

I encourage you to spend a little time answering these questions.
There are currently three or four serious revision control projects in
the free software world (OpenCM, svn, arch, and metacvs), all in the
later stages of initial development. A lot of people, besides just
me, can probably benefit from your (and other GCC developers') input
-- and your input can help make sure you get better tools down the
road.

I have some observations that I hope your answers might begin to
address. These are observations of facts I think are relevant; I'm
assuming it's "good style" to stop there rather than to try to turn
these into leading questions. These observations include (in no
particular order):

1) There are frequent reports on this list of glitches with
the current CVS repository.

2) GCC, more than many projects, relies on a distributed
testing effort, which mostly applies to the HEAD revision
and to release candidates. Most of this testing is done
by hand.

3) Judging by the messages on this list, there is some tension
between the release cycle and feature development -- some
issues around what is merged when, and around the impact of
freezes.

4) GCC, more than many projects, makes use of a formal review
process for incoming patches.

5) Mark and the CodeSourcery crew seem to do a lot of fairly
mechanical work by hand to operate the release cycle.

6) People often do some archaeology to understand how
performance and quality of generated code are evolving:
they work up experiments comparing older releases to newer,
and comparing various combinations of patches.

7) Questions about which patches relate to which issues in the
issue database are fairly common.

8) There have been a few controversies from GCC "customers"
arising out of whether they can use the latest release, or
whether they should release non-official versions.

9) Distributed testing occurs mostly on the HEAD -- which
means that the HEAD breaks on various targets, fairly
frequently.

10) The utility of the existing revision control set up to
people who lack write access is distinctly less than
the utility to people with write access.

11) Some efforts, such as overhauling the build process, will
probably benefit from a switch to rev ctl. systems that
support tree rearrangements.

12) The GCC project is heavily invested in a particular
testing framework.

13) GCC, more than many projects, makes very heavy use of
development on branches.

-t
DJ Delorie
2002-12-08 22:44:55 UTC
Permalink
Post by Tom Lord
There are currently three or four serious revision control projects in
the free software world (OpenCM, svn, arch, and metacvs),
You forgot to list RCS and CVS.
Post by Tom Lord
2) GCC, more than many projects, relies on a distributed
testing effort, which mostly applies to the HEAD revision
and to release candidates. Most of this testing is done
by hand.
All my testing is automated.
Post by Tom Lord
3) Judging by the messages on this list, there is some tension
between the release cycle and feature development -- some
issues around what is merged when, and around the impact of
freezes.
I don't see how any revision management system can fix this. This is
a people problem.
Post by Tom Lord
9) Distributed testing occurs mostly on the HEAD -- which
means that the HEAD breaks on various targets, fairly
frequently.
No, more testing on head means that head *works* more often. The
other branches are just as broken, we just don't know about it yet.
Post by Tom Lord
10) The utility of the existing revision control set up to
people who lack write access is distinctly less than
the utility to people with write access.
This is a good thing. We don't want them to be able to do all the
things write-access people can do. That's the whole point.
Post by Tom Lord
11) Some efforts, such as overhauling the build process, will
probably benefit from a switch to rev ctl. systems that
support tree rearrangements.
Like CVS? It supports trees.
David S. Miller
2002-12-08 22:47:41 UTC
Permalink
I think if one is going to try and promote a source management system,
I'm pretty sure performance alone would be enough to convince a lot of
people.

After using bitkeeper for just a week or two, I nearly stopped doing
much GCC development simply because CVS is such a dinosaur. It's like
driving a model-T on a US interstate highway or the autobahn. It's
truly that painful to use.

So if arch can provide the same kind of improvement, promote that part
of it.
Bruce Stephens
2002-12-08 23:11:14 UTC
Permalink
Post by David S. Miller
I think if one is going to try and promote a source management system,
I'm pretty sure performance alone would be enough to convince a lot of
people.
After using bitkeeper for just a week or two, I nearly stopped doing
much GCC development simply because CVS is such a dinosaur. It's like
driving a model-T on a US interstate highway or the autobahn. It's
truly that painful to use.
So if arch can provide the same kind of improvement, promote that part
of it.
I think it can't, at the moment.

However, that's an interesting point: what do you do with CVS and with
BitKeeper? What operations are performance-critical for you?

(My intuition is that arch has concentrated on operations which are
relatively uncommon, such as branch merging and the like, relying on a
revision library for operations which seem to me more common---like
"cvs log", "cvs diff", and the like (or rather their moral equivalents
in a configuration based CM). The catch is that the revision library
is expensive in disk terms---arguably not a problem, since disk space
is cheap, but even so. But my intuition may be wrong, so what about
CVS seems slow to you, compared with BitKeeper?)
David S. Miller
2002-12-09 00:45:51 UTC
Permalink
From: Bruce Stephens <***@cenderis.demon.co.uk>
Date: Sun, 08 Dec 2002 23:11:14 +0000

However, that's an interesting point: what do you do with CVS and with
BitKeeper? What operations are performance-critical for you?

I think CVS's weak performance points are so well understood
by other people that they can comment as well as or better
than me :-)

Operations on a branch are painful, so someone can start there :-)
Bruce Stephens
2002-12-08 22:56:48 UTC
Permalink
DJ Delorie <***@redhat.com> writes:

[...]
Post by DJ Delorie
Post by Tom Lord
10) The utility of the existing revision control set up to
people who lack write access is distinctly less than
the utility to people with write access.
This is a good thing. We don't want them to be able to do all the
things write-access people can do. That's the whole point.
Not on the central repository, no. But it might be that people
(people without write access to the main repository) could usefully
keep branches on their own repository (perhaps merging the patches in
at some stage). With CVS, that's not possible, but with a distributed
CM system it would be.
Post by DJ Delorie
Post by Tom Lord
11) Some efforts, such as overhauling the build process, will
probably benefit from a switch to rev ctl. systems that
support tree rearrangements.
Like CVS? It supports trees.
It doesn't handle renaming files or directories. There are ways to do
both, but you lose something, whatever you choose to do.
Joseph S. Myers
2002-12-09 00:07:44 UTC
Permalink
Post by Bruce Stephens
Not on the central repository, no. But it might be that people
(people without write access to the main repository) could usefully
keep branches on their own repository (perhaps merging the patches in
at some stage). With CVS, that's not possible, but with a distributed
CM system it would be.
Distributed CM could be a mixed blessing. Sometimes when people merge
development from a branch to mainline, the mainline ChangeLog just says
"See ChangeLog.foobar on foobar-branch for details." (though I don't think
this is a proper form of ChangeLog for such changes; the ChangeLog should
describe the changes made to mainline following the usual standards). If
the branch sat on someone's machine elsewhere, there's then a lot of
potential for losing this information later if the machine goes away,
fails, etc. - whereas the main repository is at least rsyncable and
rsynced by various people.

(Such problems could be avoided if there were a mechanism by which such
branches of interest - probably including any discussed on the list - could
be "adopted" into the main repository, so that their history (maintained
on some other machine) is regularly made available from the main rsyncable
repository and isn't lost if the originating machine goes away. This
applies even to branches that don't get merged to mainline (superseded by
other branches, etc.) but which are of relevance to historical discussions
on the lists.)

There is one notable problem with CVS's handling of users without write
access: they can't do "cvs add" to generate diffs with added files, though
they can fake its local effects. I don't know whether svn fixes this.
--
Joseph S. Myers
***@cam.ac.uk
Tom Lord
2002-12-09 00:22:22 UTC
Permalink
Thanks for the replies so far. These are helpful.

My intention is to read these over, take lots of notes, and make a
succinct-as-possible, coalesced reply. I'll also (so far) reply
individually to shebs' issue with merging (since it sounds like an
interesting and relevant technical problem). If there's some other
issue you'd like to see pulled out from a coalesced reply, please say
so explicitly.

One quick request: someone said "Hey, testing is already automated."
Can I please see a slight elaboration on the form and function of that
automation? (I have some idea, but maybe there's something I've
overlooked. What I _think_ I know already is that `make test' works,
and that there's some infrastructure for mailing in `make test' output
and having it show up on a web site. Presumably individual testers
have their own scripts for that. I'm not aware that there is any
infrastructure for easily testing arbitrary combinations of patches,
but one comment implied that there is. Someone mentioned QMtest,
which can really tighten-up that automation -- but last I heard,
prospects for its adoption by GCC were slim: has that changed?)


Still listening,
-t
Craig Rodrigues
2002-12-09 05:03:59 UTC
Permalink
Post by Tom Lord
One quick request: someone said "Hey, testing is already automated."
Can I please see a slight elaboration on the form and function of that
automation?
As far as I can tell, there are a number of people who run
daily (or frequent) builds of GCC on a few platforms. They
use the output of "make test", which kicks off some tests which
use the DejaGNU testing framework, and post their output
to the gcc-testresults mailing list:

http://gcc.gnu.org/ml/gcc-testresults/

CodeSourcery has been working on converting the GCC testsuite
over to QMTest:
http://gcc.gnu.org/ml/gcc/2002-05/msg01978.html

While the existing GCC testing process has its benefits, I don't
think it is perfect. It would be great if someone had some
positive ideas towards improving the GCC testing process.

To give you some ideas of some of the problems in the current
process, Wolfgang Bangerth has informed me that he has
identified 71 current C++ regressions from GCC 3.2 in the mainline
branch of GCC, based on reading reports in GNATS.
Granted, many of these regressions might be related and duplicates, but
still, that is quite a number of regressions to track down and fix.
I'd be very interested in any ideas which could improve this process.
--
Craig Rodrigues
http://www.gis.net/~craigr
***@attbi.com
Phil Edwards
2002-12-08 23:45:20 UTC
Permalink
Post by Tom Lord
I said "initially" because I'm wondering how to proceed if you list
requirements that I think are buggy in one way or another. Is it
"good style" to point that out if it occurs?
It's more likely that they understand the requirements better than you do,
so it would be /better/ style if you said, "could you elaborate on this,
here are my questions," rather than, "no, /your/ requirements are buggy."
Post by Tom Lord
1) There are frequent reports on this list of glitches with
the current CVS repository.
IIRC, these have all been caused by non-CVS problems. (E.g., disks filled
up, mail server getting hammered and DoS'ing the other services, etc.)
Post by Tom Lord
2) GCC, more than many projects, relies on a distributed
testing effort, which mostly applies to the HEAD revision
and to release candidates. Most of this testing is done
by hand.
I'll borrow one of your choice phrases and call this a bullshit rumor.
It's nearly all automated.
Post by Tom Lord
3) Judging by the messages on this list, there is some tension
between the release cycle and feature development -- some
issues around what is merged when, and around the impact of
freezes.
Yes. I don't see how the choice of revision control software makes a
difference here. The limiting resource here is people-hours.
Post by Tom Lord
4) GCC, more than many projects, makes use of a formal review
process for incoming patches.
Yes.
Post by Tom Lord
5) Mark and the CodeSourcery crew seem to do a lot of fairly
mechanical work by hand to operate the release cycle.
Perhaps you haven't looked at contrib/* and maintainer-scripts/* lately?
Releases and weekly snapshots are all done with those.
Post by Tom Lord
6) People often do some archaeology to understand how
performance and quality of generated code are evolving:
they work up experiments comparing older releases to newer,
and comparing various combinations of patches.
Yes. This is also automated, e.g., Diego's SPEC2000 pages.
Post by Tom Lord
7) Questions about which patches relate to which issues in the
issue database are fairly common.
*shrug* When a patch is committed with a PR number in the log, the
issue database takes notice of it. That's something that we added with
a CVS plugin.
Post by Tom Lord
8) There have been a few controversies from GCC "customers"
arising out of whether they can use the latest release, or
whether they should release non-official versions.
Yes. What does this have to do with revision control software? Anybody
using open source can make this same decision.
Post by Tom Lord
9) Distributed testing occurs mostly on the HEAD -- which
means that the HEAD breaks on various targets, fairly
frequently.
Uh, no. Exactly backwards.
Post by Tom Lord
10) The utility of the existing revision control set up to
people who lack write access is distinctly less than
the utility to people with write access.
Well, duh.
Post by Tom Lord
11) Some efforts, such as overhauling the build process, will
probably benefit from a switch to rev ctl. systems that
support tree rearrangements.
Probably.
Post by Tom Lord
12) The GCC project is heavily invested in a particular
testing framework.
Yes. Well, that plus the new QMtest, which looks to be far superior.
Post by Tom Lord
13) GCC, more than many projects, makes very heavy use of
development on branches.
Yes.
--
I would therefore like to posit that computing's central challenge, viz. "How
not to make a mess of it," has /not/ been met.
- Edsger Dijkstra, 1930-2002
Zack Weinberg
2002-12-09 00:55:14 UTC
Permalink
Post by Phil Edwards
Post by Tom Lord
1) There are frequent reports on this list of glitches with
the current CVS repository.
IIRC, these have all been caused by non-CVS problems. (E.g., disks
filled up, mail server getting hammered and DoS'ing the other
services, etc.)
There is one situation that used to come up a lot which is CVS's
fault: a 'cvs server' process dies without removing its lock files,
wedging that directory for everyone else until it's manually removed.
I believe this has been dealt with by some patches to the server plus
a cron job that looks for stale locks; however, a version control
system that could not get into a wedged state like that would be useful.
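
(For concreteness, the stale-lock sweep amounts to roughly the sketch
below -- a minimal illustration only, assuming the usual '#cvs.lock' /
'#cvs.rfl*' / '#cvs.wfl*' lock names, a hypothetical repository path,
and a 30-minute age cutoff; the actual gcc.gnu.org cron job may well
look different.)

#!/usr/bin/env python
# Sweep a CVS repository for stale lock files left behind by dead
# 'cvs server' processes; meant to be run from cron.  The repository
# path, lock names, and cutoff are assumptions for illustration.

import os
import shutil
import time

REPO_ROOT = "/cvs/gcc"          # hypothetical repository path
MAX_AGE = 30 * 60               # treat locks older than 30 minutes as stale

def is_lock(name):
    # CVS uses a '#cvs.lock' directory plus '#cvs.rfl*' / '#cvs.wfl*' files.
    return name == "#cvs.lock" or name.startswith(("#cvs.rfl", "#cvs.wfl"))

def sweep(root):
    now = time.time()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_lock(name):
                path = os.path.join(dirpath, name)
                if now - os.path.getmtime(path) > MAX_AGE:
                    os.remove(path)
                    print("removed stale lock file", path)
        for name in list(dirnames):
            if is_lock(name):
                path = os.path.join(dirpath, name)
                if now - os.path.getmtime(path) > MAX_AGE:
                    shutil.rmtree(path)
                    dirnames.remove(name)   # don't descend into what we deleted
                    print("removed stale lock directory", path)

if __name__ == "__main__":
    sweep(REPO_ROOT)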
Post by Phil Edwards
Post by Tom Lord
3) Judging by the messages on this list, there is some tension
between the release cycle and feature development -- some
issues around what is merged when, and around the impact of
freezes.
Yes. I don't see how the choice of revision control software makes a
difference here. The limiting resource here is people-hours.
CVS makes working on branches quite difficult. I suspect that a
system that made it easier would mean that people were a bit more
comfortable about doing development on branches for long periods of
time.
Post by Phil Edwards
Post by Tom Lord
4) GCC, more than many projects, makes use of a formal review
process for incoming patches.
Yes.
This is a strength, but with a downside -- patches can and do get
lost. We advise people to re-send patches at intervals, but some
sort of automated patch-tracker would probably be helpful. I don't
think the version control system can help much here (but see below).
Post by Phil Edwards
Post by Tom Lord
5) Mark and the CodeSourcery crew seem to do a lot of fairly
mechanical work by hand to operate the release cycle.
Perhaps you haven't looked at contrib/* and maintainer-scripts/* lately?
Releases and weekly snapshots are all done with those.
I do a fair amount of by-hand work merging the trunk into the
basic-improvements-branch. Some, but not all, of that work could be
facilitated with a better version control system. See below.
Post by Phil Edwards
Post by Tom Lord
11) Some efforts, such as overhauling the build process, will
probably benefit from a switch to rev ctl. systems that
support tree rearrangements.
Probably.
I have several changes in mind which I have not done largely because
CVS lacks the ability to version renames. To be specific: move cpplib
to the top level; move gcc/intl to the top level and sync it with the
version of that directory in the src repository; move the C front end
to a language subdirectory like the others; move the Ada runtime
library to the top level.

I'm not saying that I would definitely have done all of these changes
by now if we were using a version control system that handled renames;
only that the lack of rename support is a major barrier to them.

* * *

I'm now going to list the requirements which I would place on a
replacement for CVS, in rough decreasing order of importance. I
haven't done any research to back them up -- this is just off the top
of my head (but having thought about the issue quite a bit).

0. Must be at least as reliable and at least as portable as CVS. GCC
is a very large development effort. We can't afford to lose
contributors because their preferred platform is shut out, nor can
we afford to lose work due to bugs, and we *especially* cannot risk
a system which has not been audited for security exposures. It
would be relatively easy to give much stronger data integrity
guarantees than CVS currently manages:

0a. All data stored in the repository is under an end-to-end
checksum. All data transmitted over the network is independently
checksummed (yes, redundant with TCP-layer checksums). CVS does
no checksumming at all.

0b. Anonymous repository access is done under a user ID that has only
OS-level read privileges on the repository's files. This cannot
be done with (unpatched) CVS.

0c. Remote write operations on the repository intrinsically require
the use of a protocol which makes strong cryptographic integrity
and authority guarantees. CVS can be set up like this, but it's
not built into the design.

0d. The data stored in the repository cannot be modified by
unprivileged local users except by going through the version
control system. Presently I could take 'vi' to one of the ,v
files in /cvs/gcc and break it thoroughly, or sneak something into
the file content, and leave no trace.

1. Must be at least as fast as CVS for all operations, and should be
substantially faster for all operations where CVS uses a braindead
algorithm. I would venture to guess that everyone's #1 complaint
about CVS is the amount of time we waste waiting for it to complete
this or that request. To be more specific:

1a. Efficient network protocol. Specifically, a network protocol that,
for *all* operations, transmits a volume of data proportional --
with a small constant! -- to the size of the diff involved, *not*
the total size of all the files touched by the diff involved, as
CVS does.

1b. Efficient tags and branches. It should be possible to create
either by creating *one* metadata record, rather than touching
every single file in the repository.

1c. Efficient delta storage algorithm, such that checking in a change
on the tip of a branch is not orders of magnitude slower than
checking in a change on the tip of the trunk. There are several
sane ways to do this.

1d. Efficient method for extracting a logical change after the fact,
no matter how many files it touched. (Currently the easiest way
to do this is: hunt through the gcc-cvs archive until you find the
message describing the checkin you care about, then use wget on
all of the per-file diff URLs in the list and glue them all
together. Slow, painful, doesn't always work; a sketch of this
by-hand approach follows this list.)

2. Should support this laundry list of features, none of which is
known to CVS. Most of them would be useful independent of the
others, though there's not much point to 2b without 2a, nor 2e
without 2d.

2a. Atomic application of a logical change that touches many files,
possibly not all in the same directory. (This is commonly known as
a "change set".) One checkin log per change set is adequate.

2b. Ability to back out an entire change set just as atomically as it
went in.

2c. Ability to rename a file, including the ability for a file to have
different names on different branches.

2d. Automatically remember that a merge occurred from branch A to
branch B; later, when a second merge occurs from A to B, don't
apply those changes again.

2e. Understand the notion of a single-delta merge, either applying
just one change from branch A to branch B, or removing just one
change formerly on branch A ("subtractive merge").

2f. Perform conflict resolution by automatic formation of
microbranches.

3. Should allow a user without commit privileges to generate a change
set, making arbitrary changes to the repository (none of this "you
can edit files and generate diffs but you can't add or delete
files" nonsense), which can be applied by a user who does have
commit privileges, and when the original author does an update
he/she doesn't get spurious conflicts.

4. The repository's on-disk data should be stored in a highly compact
format, to the maximum extent possible and consonant with being
fast. Being fast is much more important; however, GCC's CVS
repository is ~800MB in size and compresses down to ~100MB. You
can do interesting things (like keep a copy of the entire
repository on every developer's personal hard disk, as Bitkeeper
does) with a 100MB repository that are not so practical when it's
closer to a gigabyte.

5. Should have the ability to generate ChangeLog files automagically
from the checkin comments. (When merging to basic-improvements I
normally spend more time fixing up the ChangeLogs than anything
else. Except maybe waiting for 'cvs tag' and 'cvs update -j...'.)
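
(Re 1d above: the by-hand extraction boils down to something like this
minimal sketch.  It only automates the "wget and glue" step; finding
the gcc-cvs message and copying out its per-file diff URLs is still
manual, and the URLs passed in here are placeholders.)

#!/usr/bin/env python
# Glue the per-file diffs of one logical checkin into a single patch.
# The diff URLs are pasted in by hand from the gcc-cvs message; this
# script cannot discover them on its own.

import sys
from urllib.request import urlopen

def glue(diff_urls, out=sys.stdout):
    # Fetch each per-file diff and write them out back to back.
    for url in diff_urls:
        with urlopen(url) as resp:
            out.write(resp.read().decode("utf-8", "replace"))
            out.write("\n")

if __name__ == "__main__":
    glue(sys.argv[1:])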

zw
Phil Edwards
2002-12-09 09:46:12 UTC
Permalink
Post by Zack Weinberg
I'm now going to list the requirements which I would place on a
replacement for CVS, in rough decreasing order of importance. I
haven't done any research to back them up -- this is just off the top
of my head (but having thought about the issue quite a bit).
With the exception of 2[def] and 5, I believe subversion does all of those.
I handle 5 with a wrapper script for checkins.
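
(A minimal sketch of that kind of wrapper, assuming the commit message
is already written in ChangeLog body style; the identity string and
layout are invented for illustration, and this is not the actual
script.)

#!/usr/bin/env python
# Checkin wrapper: prepend a GNU-style ChangeLog entry built from the
# commit message, then run the real 'cvs commit'.  Author identity and
# file layout are placeholders.

import subprocess
import sys
import time

AUTHOR = "J. Random Hacker  <hacker@example.org>"   # placeholder identity

def prepend_changelog(message, path="ChangeLog"):
    date = time.strftime("%Y-%m-%d")
    body = "\n".join("\t" + line if line else ""
                     for line in message.splitlines())
    entry = "%s  %s\n\n%s\n\n" % (date, AUTHOR, body)
    try:
        old = open(path).read()
    except IOError:
        old = ""
    with open(path, "w") as f:
        f.write(entry + old)

def main():
    message = sys.argv[1] if len(sys.argv) > 1 else "* somefile.c: Tweak."
    prepend_changelog(message)
    subprocess.call(["cvs", "commit", "-m", message])

if __name__ == "__main__":
    main()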


Phil
--
I would therefore like to posit that computing's central challenge, viz. "How
not to make a mess of it," has /not/ been met.
- Edsger Dijkstra, 1930-2002
Joseph S. Myers
2002-12-09 09:08:35 UTC
Permalink
Post by Zack Weinberg
0a. All data stored in the repository is under an end-to-end
checksum. All data transmitted over the network is independently
checksummed (yes, redundant with TCP-layer checksums). CVS does
no checksumming at all.
Doesn't SSH?

(And CVS does checksum checkouts/updates: if after applying a diff in cvs
update the file checksum doesn't match, it warns and regets the whole
file, which can indicate something was broken in the latest checkin to the
file (yielding a bogus delta). This is however highly suboptimal - it
should be an error not a warning (with a warning sent to the repository
maintainers) and lots more checksumming should be done. In addition:

0aa. Checksums stored in the repository format for all file revisions,
deltas, log messages etc., with an easy way to verify them - to detect
corruption early.)
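
(To make 0aa concrete, the verification pass could be as simple as the
sketch below -- assuming a hypothetical layout in which a 'manifest'
file records a SHA-1 for every stored object; no existing system's
real on-disk format is implied.)

#!/usr/bin/env python
# Verify stored repository objects against a checksum manifest.
# Hypothetical layout: 'manifest' holds lines of the form
# '<hex sha1>  <relative path>', one per stored revision/delta/log.

import hashlib
import os
import sys

def verify(repo_root, manifest="manifest"):
    bad = []
    with open(os.path.join(repo_root, manifest)) as f:
        for line in f:
            if not line.strip():
                continue
            want, rel = line.split(None, 1)
            rel = rel.strip()
            with open(os.path.join(repo_root, rel), "rb") as obj:
                got = hashlib.sha1(obj.read()).hexdigest()
            if got != want:
                bad.append(rel)
    return bad

if __name__ == "__main__":
    corrupt = verify(sys.argv[1] if len(sys.argv) > 1 else ".")
    for rel in corrupt:
        print("CORRUPT:", rel)
    sys.exit(1 if corrupt else 0)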
Post by Zack Weinberg
5. Should have the ability to generate ChangeLog files automagically
from the checkin comments. (When merging to basic-improvements I
normally spend more time fixing up the ChangeLogs than anything
else. Except maybe waiting for 'cvs tag' and 'cvs update -j...'.)
The normal current practice here is for branch ChangeLogs to be kept in a
separate file, not the ChangeLogs that need merging from mainline. (In
the case of BIB the branch ChangeLog then goes on the top of the mainline
one (with an overall "merge from BIB" comment) when the merge back to
mainline is done. For branches developing new features a new ChangeLog
entry describing the overall logical effect of the branch changes, not the
details of how that state was reached, is more appropriate.)
--
Joseph S. Myers
***@cam.ac.uk
Walter Landry
2002-12-09 23:04:28 UTC
Permalink
Hi,
Post by Zack Weinberg
I'm now going to list the requirements which I would place on a
replacement for CVS, in rough decreasing order of importance. I
haven't done any research to back them up -- this is just off the top
of my head (but having thought about the issue quite a bit).
0. Must be at least as reliable
To my knowledge, arch doesn't have any real reliability problems. Tom
didn't make a fast implementation, but it is reliable. There is a bug
database [1] with all of the bugs that I can think of, so you can
decide for yourself whether it is reliable.
Post by Zack Weinberg
and at least as portable as CVS.
Arch currently doesn't work on 64 bit machines. The problem is in the
non-shell parts. It is not an insurmountable problem, it is just that
no one has taken the time to try to fix the bugs.

Someone got it running under cygwin once, but the patches have
disappeared. It wasn't usable there. Too slow.

Otherwise, it seems to work on posix machines.
Post by Zack Weinberg
GCC is a very large development effort. We can't afford to lose
contributors because their preferred platform is shut out, nor
can we afford to lose work due to bugs, and we *especially*
cannot risk a system which has not been audited for security
exposures. It would be relatively easy to give much stronger
arch doesn't interact at all with root. The remote repositories are
all done with sftp, ftp, and http, which is as secure as those servers
are.
Post by Zack Weinberg
0a. All data stored in the repository is under an end-to-end
checksum. All data transmitted over the network is independently
checksummed (yes, redundant with TCP-layer checksums). CVS does
no checksumming at all.
Sort of. Patches are gzipped, which have their own checksum, but
there isn't any way to make sure that what you get is the same
thing as what you put in. That is, there are some individual
checksums, but no end-to-end checksum.
Post by Zack Weinberg
0b. Anonymous repository access is done under a user ID that has only
OS-level read privileges on the repository's files. This cannot
be done with (unpatched) CVS.
Is http access good enough?
Post by Zack Weinberg
0c. Remote write operations on the repository intrinsically require
the use of a protocol which makes strong cryptographic integrity
and authority guarantees. CVS can be set up like this, but it's
not built into the design.
Currently, we allow writeable ftp servers and sftp servers. If we
disallowed writeable ftp servers, would that be good enough? (Don't
tempt me. I've considered it in the past.)
Post by Zack Weinberg
0d. The data stored in the repository cannot be modified by
unprivileged local users except by going through the version
control system. Presently I could take 'vi' to one of the ,v
files in /cvs/gcc and break it thoroughly, or sneak something into
the file content, and leave no trace.
There is no interaction with root, so if you own the archive, you can
always do what you want. To get anything approaching this, you have
to deal with PGP signatures, SHA hashes, and the like. OpenCM is
probably the only group (including BitKeeper) that even comes close to
doing this right.
Post by Zack Weinberg
1. Must be at least as fast as CVS for all operations, and should be
substantially faster for all operations where CVS uses a braindead
algorithm. I would venture to guess that everyone's #1 complaint
about CVS is the amount of time we waste waiting for it to complete
Arch is slow, slow, slow. Don't let Tom beguile you into thinking
that it is even reasonably fast right now. It isn't. It is a subject
of great interest to the developers, but we're not there yet. Part of
this is the shell implementation. Once certain parts are rewritten in
a compiled language, it should get _much_ better.
Post by Zack Weinberg
1a. Efficient network protocol. Specifically, a network protocol that,
for *all* operations, transmits a volume of data proportional --
with a small constant! -- to the size of the diff involved, *not*
the total size of all the files touched by the diff involved, as
CVS does.
Arch has this, although some of the implementations could do with a
little improvement (e.g. the mirroring script seems to take forever).
Post by Zack Weinberg
1b. Efficient tags and branches. It should be possible to create
either by creating *one* metadata record, rather than touching
every single file in the repository.
Don't know. I haven't looked at the actual implementation. There
isn't a fundamental reason why not, though.
Post by Zack Weinberg
1c. Efficient delta storage algorithm, such that checking in a change
on the tip of a branch is not orders of magnitude slower than
checking in a change on the tip of the trunk. There are several
sane ways to do this.
Arch has this
Post by Zack Weinberg
1d. Efficient method for extracting a logical change after the fact,
no matter how many files it touched. (Currently the easiest way
to do this is: hunt through the gcc-cvs archive until you find the
message describing the checkin you care about, then use wget on
all of the per-file diff URLs in the list and glue them all
together. Slow, painful, doesn't always work.)
Arch has this
Post by Zack Weinberg
2. Should support this laundry list of features, none of which is
known to CVS. Most of them would be useful independent of the
others, though there's not much point to 2b without 2a, nor 2e
without 2d.
2a. Atomic application of a logical change that touches many files,
possibly not all in the same directory. (This is commonly known as
a "change set".) One checkin log per change set is adequate.
Arch has this. It's why I started using it.
Post by Zack Weinberg
2b. Ability to back out an entire change set just as atomically as it
went in.
In theory, easy to do (just a few rm's and an mv). There are larger
policy questions, though (Do we want to allow that?). Some day, I may
just hack something together that does that.
Post by Zack Weinberg
2c. Ability to rename a file, including the ability for a file to have
different names on different branches.
Arch has this
Post by Zack Weinberg
2d. Automatically remember that a merge occurred from branch A to
branch B; later, when a second merge occurs from A to B, don't
apply those changes again.
Arch has this
Post by Zack Weinberg
2e. Understand the notion of a single-delta merge, either applying
just one change from branch A to branch B, or removing just one
change formerly on branch A ("subtractive merge").
Single delta forward merges are no problem. Reverse merges are more
difficult. This is one of those "lurking design issues" that I
mentioned earlier.
Post by Zack Weinberg
2f. Perform conflict resolution by automatic formation of
microbranches.
I'm not quite sure what you mean here.
Post by Zack Weinberg
3. Should allow a user without commit privileges to generate a change
set, making arbitrary changes to the repository (none of this "you
can edit files and generate diffs but you can't add or delete
files" nonsense), which can be applied by a user who does have
commit privileges, and when the original author does an update
he/she doesn't get spurious conflicts.
Are you thinking of sending patches by email? Arch doesn't have that.
Post by Zack Weinberg
4. The repository's on-disk data should be stored in a highly compact
format, to the maximum extent possible and consonant with being
fast. Being fast is much more important; however, GCC's CVS
repository is ~800MB in size and compresses down to ~100MB. You
can do interesting things (like keep a copy of the entire
repository on every developer's personal hard disk, as Bitkeeper
does) with a 100MB repository that are not so practical when it's
closer to a gigabyte.
Arch stores the repository as tar.gz of the initial revision, plus
tar.gz of the patches. This will be about as compact as anything.

The problem comes when you want to get older revisions. If you're at
patch-51, getting patch-48 means starting from patch-0 and applying
all 48 patches. This can be sped up by saving entire trees along the
way, but that kills the "highly compact format".
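
(A minimal sketch of that trade-off, with invented file names rather
than arch's real archive layout, assuming a full tree is cached every
so many revisions:)

#!/usr/bin/env python
# Rebuild an old revision from a base tree plus a chain of patches,
# starting from the nearest cached full tree.  The names
# ('base.tar.gz', 'cached-N.tar.gz', 'patch-N.diff') are invented for
# this sketch and are not arch's actual on-disk format.

import os
import subprocess
import tarfile

def nearest_cached(target, archive_dir):
    # Highest cached full tree at or below the wanted revision (0 = base).
    for rev in range(target, 0, -1):
        if os.path.exists(os.path.join(archive_dir, "cached-%d.tar.gz" % rev)):
            return rev
    return 0

def checkout(target, archive_dir, workdir):
    start = nearest_cached(target, archive_dir)
    tree = "base.tar.gz" if start == 0 else "cached-%d.tar.gz" % start
    tarfile.open(os.path.join(archive_dir, tree)).extractall(workdir)
    # Without cached trees, start is always 0 and this loop runs
    # `target` times -- exactly the patch-0..patch-48 walk described above.
    for rev in range(start + 1, target + 1):
        diff = os.path.join(archive_dir, "patch-%d.diff" % rev)
        subprocess.check_call(["patch", "-p1", "-i", diff], cwd=workdir)
    return workdir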
Post by Zack Weinberg
5. Should have the ability to generate ChangeLog files automagically
from the checkin comments. (When merging to basic-improvements I
normally spend more time fixing up the ChangeLogs than anything
else. Except maybe waiting for 'cvs tag' and 'cvs update -j...'.)
That apparently works, although I've never used it.

By the way, I thought that your comments were quite illuminating, so I
put them up on the arch web site [2].

I also think that Tom should stop telling everyone to work on arch.
At this point, it just causes more trouble than any help I'll get.

Regards,
Walter Landry
***@ucsd.edu

[1] http://bugs.fifthvision.net:8080/
[2] http://www.fifthvision.net/open/bin/view/Arch/GccHackers
Joseph S. Myers
2002-12-09 23:20:05 UTC
Permalink
Post by Walter Landry
arch doesn't interact at all with root. The remote repositories are
all done with sftp, ftp, and http, which is as secure as those servers
are.
Is this - for anonymous access - _plain_ HTTP, or HTTP + WebDAV + DeltaV
which svn uses? One problem there was with SVN - it may have been fixed
by now, and a fix would be necessary for it to be usable for GCC - was its
use of HTTP and HTTPS (for write access); these tend to be heavily
controlled by firewalls and the ability to tunnel over SSH (with just that
one port needing to be open) would be necessary. "Transparent" proxies
may pass plain HTTP OK, but not the WebDAV/DeltaV extensions SVN needs.
Post by Walter Landry
Post by Zack Weinberg
0d. The data stored in the repository cannot be modified by
unprivileged local users except by going through the version
control system. Presently I could take 'vi' to one of the ,v
files in /cvs/gcc and break it thoroughly, or sneak something into
the file content, and leave no trace.
There is no interaction with root, so if you own the archive, you can
always do what you want. To get anything approaching this, you have
to deal with PGP signatures, SHA hashes, and the like. OpenCM is
probably the only group (including BitKeeper) that even comes close to
doing this right.
This sort of thing has been done simply by a modified setuid (to a cvs
user, not root) cvs binary so users can't access the repository directly,
only through that binary. More generically, with a reasonable protocol
for local repository access it should be possible to use GNU userv to
separate the repository from the users.
Post by Walter Landry
Post by Zack Weinberg
2b. Ability to back out an entire change set just as atomically as it
went in.
In theory, easy to do (just a few rm's and an mv). There are larger
policy questions, though (Do we want to allow that?). Some day, I may
just hack something together that does that.
A change set is applied. It turns out to have problems, so needs to be
reverted - common enough. Of course the version history and ChangeLog
shows both the original application and reversion. The reversion might in
fact be of the original change set and a series of subsequent failed
attempts at patching it up. But intermediate unrelated changes to the
tree should not be backed out in the process.
Post by Walter Landry
Post by Zack Weinberg
3. Should allow a user without commit privileges to generate a change
set, making arbitrary changes to the repository (none of this "you
can edit files and generate diffs but you can't add or delete
files" nonsense), which can be applied by a user who does have
commit privileges, and when the original author does an update
he/she doesn't get spurious conflicts.
Are you thinking of sending patches by email? Arch doesn't have that.
Patches by email (with distributed patch review by multiple people reading
gcc-patches, including those who can't actually approve the patch) is the
normal way GCC development works. Presume that most contributors will not
want to deal with security issues of making any local repository
accessible to other machines, even if it's on a permanently connected
machine and local firewalls or policy don't prevent this.

A patch for use with a better version control system would need to include
some encoding for that system of renames / deletes / ... - but that needs
to be just as human-readable as context diffs / unidiffs are.
--
Joseph S. Myers
***@cam.ac.uk
Walter Landry
2002-12-10 00:43:25 UTC
Permalink
Post by Joseph S. Myers
Post by Walter Landry
arch doesn't interact at all with root. The remote repositories are
all done with sftp, ftp, and http, which is as secure as those servers
are.
Is this - for anonymous access - _plain_ HTTP, or HTTP + WebDAV + DeltaV
which svn uses? One problem there was with SVN - it may have been fixed
by now, and a fix would be necessary for it to be usable for GCC - was its
use of HTTP and HTTPS (for write access); these tend to be heavily
controlled by firewalls and the ability to tunnel over SSH (with just that
one port needing to be open) would be necessary. "Transparent" proxies
may pass plain HTTP OK, but not the WebDAV/DeltaV extensions SVN needs.
Anonymous access requires HTTP + WebDAV (no DeltaV). However, the
set of WebDAV commands needed is much smaller than what subversion
needs. It just needs whatever anonymous ftp has that http doesn't (I
believe PROPFIND is one). In particular, you can run a server using
apache 1.3.
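
(For the curious, the extra bit beyond plain GET is roughly a PROPFIND
per directory listing; a minimal sketch follows, against a made-up
host and archive path rather than any real server.)

#!/usr/bin/env python
# Issue a WebDAV PROPFIND (Depth: 1) to list one directory of an
# archive served over plain HTTP + mod_dav.  Host and path are
# placeholders, not a real arch archive.

import http.client

HOST = "archive.example.org"        # placeholder
PATH = "/archives/gcc--devo/"       # placeholder

def list_directory(host, path):
    conn = http.client.HTTPConnection(host)
    conn.request("PROPFIND", path, body=None, headers={"Depth": "1"})
    resp = conn.getresponse()
    # A 207 Multi-Status response carries an XML body naming each
    # member of the collection.
    print(resp.status, resp.reason)
    print(resp.read().decode("utf-8", "replace"))
    conn.close()

if __name__ == "__main__":
    list_directory(HOST, PATH)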
Post by Joseph S. Myers
Post by Walter Landry
Post by Zack Weinberg
0d. The data stored in the repository cannot be modified by
unprivileged local users except by going through the version
control system. Presently I could take 'vi' to one of the ,v
files in /cvs/gcc and break it thoroughly, or sneak something into
the file content, and leave no trace.
There is no interaction with root, so if you own the archive, you can
always do what you want. To get anything approaching this, you have
to deal with PGP signatures, SHA hashes, and the like. OpenCM is
probably the only group (including BitKeeper) that even comes close to
doing this right.
This sort of thing has been done simply by a modified setuid (to a cvs
user, not root) cvs binary so users can't access the repository directly,
only through that binary. More generically, with a reasonable protocol
for local repository access it should be possible to use GNU userv to
separate the repository from the users.
This is a different security model. Arch is secure because it doesn't
depend on having privileged access. For example, there is an "rm
-rf" command built into arch.

I have a feeling that you are thinking of how CVS handles things, with
a centralized server. Part of the whole point of arch is that there
is no centralized server. So, for example, I can develop arch
independently of whether Tom thinks that I am worthy enough to do so.
I can screw up my archive as much as I want (and I have), and Tom can
be blissfully unaware. Easy merging is what makes this possible.

So you don't, in general, have a repository that is writeable by more
than one person.
Post by Joseph S. Myers
Post by Walter Landry
Post by Zack Weinberg
2b. Ability to back out an entire change set just as atomically as it
went in.
In theory, easy to do (just a few rm's and an mv). There are larger
policy questions, though (Do we want to allow that?). Some day, I may
just hack something together that does that.
A change set is applied. It turns out to have problems, so needs to be
reverted - common enough. Of course the version history and ChangeLog
shows both the original application and reversion. The reversion might in
fact be of the original change set and a series of subsequent failed
attempts at patching it up. But intermediate unrelated changes to the
tree should not be backed out in the process.
Getting what you really want means being able to reverse our patches.
Then you could simply unapply a patch. But that isn't possible right
now, and is not going to be done real soon. That would require
someone who is actually working on the code to understand the current
patch format.

Regards,
Walter Landry
***@ucsd.edu
Joseph S. Myers
2002-12-10 01:05:16 UTC
Permalink
Post by Walter Landry
Anonymous access requires HTTP + WebDAV (no DeltaV). However, the
set of WebDAV commands needed is much smaller than what subversion
needs. It just needs whatever anonymous ftp has that http doesn't (I
believe PROPFIND is one). In particular, you can run a server using
apache 1.3.
I'm sure some "transparent" proxies will fail to pass even that (though
WebDAV may be better supported by them than DeltaV). This is similar to
Zack's first point - just as any new system must be no less portable to
running on different systems, it must be no less portable to working
through networks restricted in different ways.
Post by Walter Landry
I have a feeling that you are thinking of how CVS handles things, with
a centralized server. Part of the whole point of arch is that there
is no centralized server. So, for example, I can develop arch
independently of whether Tom thinks that I am worthy enough to do so.
I can screw up my archive as much as I want (and I have), and Tom can
be blissfully unaware. Easy merging is what makes this possible.
So you don't, in general, have a repository that is writeable by more
than one person.
For GCC there clearly needs to be some server that has the mainline of
development we advertise on our web pages for users, from which release
branches are made, which has some vague notions of the machine being
securely maintained, having adequate bandwidth, having some backup
procedure, having maintainers for the server keeping it up reliably,
having a reasonable expectation that the development lines in there will
still be available in 20 years' time when current developers have lost
interest. (gcc.gnu.org presents a remarkably good impression of this to
the outside world, considering how it operates purely by volunteer
effort.)

There may be many other servers - private and public - but some server
provides the line of development that gets branched into new releases, and
inevitably multiple people may write to that line. (I'm also presuming -
see <http://gcc.gnu.org/ml/gcc/2002-12/msg00436.html> - that all the
developments in any third party repository that get discussed on the lists
should be mirrored into this main one to give some hope of long term
survival and availability. In developing GCC with list archives and
version control we are simultaneously acting as curators of the history of
GCC development, which means attempting to preserve that history for
posterity (a period beyond the involvement of any one individual).)
--
Joseph S. Myers
***@cam.ac.uk
Walter Landry
2002-12-10 01:54:47 UTC
Permalink
Post by Joseph S. Myers
Post by Walter Landry
Anonymous access requires HTTP + WebDAV (no DeltaV). However, the
set of WebDAV commands needed is much smaller than what subversion
needs. It just needs whatever anonymous ftp has that http doesn't (I
believe PROPFIND is one). In particular, you can run a server using
apache 1.3.
I'm sure some "transparent" proxies will fail to pass even that (though
WebDAV may be better supported by them than DeltaV). This is similar to
Zack's first point - just as any new system must be no less portable to
running on different systems, it must be no less portable to working
through networks restricted in different ways.
Well, there is anonymous ftp. But if all you have is plain http, I
would think that you would have problems checking out from CVS as
well.
Post by Joseph S. Myers
Post by Walter Landry
So you don't, in general, have a repository that is writeable by more
than one person.
For GCC there clearly needs to be some server that has the mainline of
development we advertise on our web pages for users, from which release
branches are made, which has some vague notions of the machine being
securely maintained, having adequate bandwidth, having some backup
procedure, having maintainers for the server keeping it up reliably,
having a reasonable expectation that the development lines in there will
still be available in 20 years' time when current developers have lost
interest. (gcc.gnu.org presents a remarkably good impression of this to
the outside world, considering how it operates purely by volunteer
effort.)
That, presumably, would be the release manager's branch.
Periodically, people would say, "feature X is implemented on branch
Y". If the release manager trusts them, then he does a simple update.
If there is no trust, then the release manager can review the patches.
In any case, assuming the submitter knows what they are doing, the
patch will apply cleanly. It would be very quick. If it doesn't
apply cleanly, then the release manager sends a curt note to the
submitter (perhaps automatically) or tries to resolve it himself.
This is how the Linux kernel development works, although a release
manager wouldn't have to do as much work as Linus does.

Regards,
Walter Landry
***@ucsd.edu
Joseph S. Myers
2002-12-10 02:56:58 UTC
Permalink
Post by Walter Landry
Well, there is anonymous ftp. But if all you have is plain http, I
would think that you would have problems checking out from CVS as
well.
FTP is probably useful in most such cases (for arch, since I didn't think
svn provided FTP transport; and I don't know about the other systems
mentioned). The case I was thinking of is the common situation where most
outgoing ports are free but port 80 is redirected through a "transparent"
proxy to save ISP bandwidth. (In such situations, a few other outgoing
ports such as 25 are probably proxied but are irrelevant here.) That's one
common (consumer) situation, and FTP and HTTPS probably work there, but
another (corporate) situation may well have HTTPS tied down more tightly.
Where people have got pserver or ssh allowed through their firewall, there
may be more problems with protocols used for other purposes and restricted
or proxied for other reasons.

(Some blocks might be avoided by choosing nonstandard ports, but then
everyone is likely to choose different ports and create more confusion.
Arranging that both write and anonymous access is tunnelled over ssh - as
some sites do for anonymous CVS access - simplifies things.)
Post by Walter Landry
That, presumably, would be the release manager's branch.
Periodically, people would say, "feature X is implemented on branch
Y". If the release manager trusts them, then he does a simple update.
If there is no trust, then the release manager can review the patches.
In any case, assuming the submitter knows what they are doing, the
patch will apply cleanly. It would be very quick. If it doesn't
apply cleanly, then the release manager sends a curt note to the
submitter (perhaps automatically) or tries to resolve it himself.
This is how the Linux kernel development works, although a release
manager wouldn't have to do as much work as Linus does.
There are about 100 people applying patches to the mainline (half
maintainers of some of the code who can apply some patches without review,
half needing review for all nonobvious patches). Having the release
manager manually handle the patches from all 100 people is not a sensible
scalable solution for GCC; the expectation is that anyone producing a
reasonable number of good patches will get write access which reduces the
reviewers' effort (to needing only to review the patch, not apply it) and
means that the version control logs clearly show which user was
responsible for a patch by who checked it in (the case of someone else,
named in the log message, being responsible, being the exceptional case).
Note that the 50 or so maintainers all do some patch review; it's only at
a late stage on the actual release branches that the review is
concentrated in the release manager.

(You might then say that the release manager could have a bot
automatically applying patches from developers who now have write access,
but this has no real advantages over them all having write access and a
lot of fragility added in.)

The Linux model of one person controlling everything going into the
mainline is exceptional; GCC, *BSD, etc., have many committers to mainline
(the rules for who commits where with what review varying) and as Zack
explains <http://gcc.gnu.org/ml/gcc/2002-12/msg00492.html> (albeit missing
the footnote [1] on where releases are made from) this mainline on a
master server will remain central, with new developments normally going
there rapidly except for various major work on longer-term surrounding
branches. Zack notes that practically the main use of a distributed
system would be for individual developers to do their work offline, not to
move the main repository for communication between developers off a single
machine (though depending on the system, developers may naturally have
repository mirrors) - it is not in general the case that development takes
place on always-online systems or systems which can allow remote access to
their repositories.

I expect for most branches it will also be most convenient for the master
server to host them. The exceptions are most likely to be for
developments that aren't considered politically appropriate for mainstream
GCC, or those that aren't assigned to the FSF or may have other legal
problems, or those done under NDA (albeit legally dodgy), or corporate
developments whose public visibility too early would give away sensitive
information or which a customer would like to have before they go public
(i.e., work eventually destined for public GCC unless too specialised or
ugly, where the customer and company would be free under the GPL to
release the work early but choose not to). In general, I expect most
development would be on the central server, except for small-scale
individual development (often offline) on personal servers and corporate
development on internal systems that definitely will not be accessible to
the public.

This is, of course, all just hypothesis about how GCC development would
work with distributed CM, but it seems a reasonable extrapolation
supposing we start from wanting to preserve security, accessibility and
long-term survival of all the version history of developments that
presently go in the public repository (mainline and branches).
--
Joseph S. Myers
***@cam.ac.uk
Zack Weinberg
2002-12-10 07:26:01 UTC
Permalink
Post by Joseph S. Myers
The Linux model of one person controlling everything going into the
mainline is exceptional; GCC, *BSD, etc., have many committers to
mainline (the rules for who commits where with what review varying)
and as Zack explains <http://gcc.gnu.org/ml/gcc/2002-12/msg00492.html>
(albeit missing the footnote [1] on where releases are made from)
this mainline on a master server will remain central, with new
developments normally going there rapidly except for various major
work on longer-term surrounding branches.
The missing footnote was going to be an argument that the Linux model
is not just exceptional, but pathological. Not something I think we
should emulate with GCC, and not something I consider worth designing
a version control system to support.

Linux is a large project - 4.3 million lines of code - but only one
person has commit privileges on the official tree, for any given
release branch. No matter how good their tools are, this cannot be
expected to scale, and indeed it does not. I have not actually
measured it, but the appearance of the traffic on linux-kernel is that
Linus drops patches on the floor just as often as he did before he
started using Bitkeeper. However, Bitkeeper facilitates other people
maintaining their own semi-official versions of the tree, in which
some of these patches get sucked up. That is bad. It means users
have to choose between N different variants; as time goes by it
becomes increasingly difficult to put them all back together again;
eventually will come a point where critical feature A is available
only in tree A, critical feature B is available only in tree B, and
the implementations conflict, because no one's exerting adequate
centripetal force.

Possibly I am too pessimistic.

zw
Tom Lord
2002-12-10 08:31:36 UTC
Permalink
Zack:

Linux is a large project - 4.3 million lines of code - but only one
person has commit privileges on the official tree, for any given
release branch. No matter how good their tools are, this cannot be
expected to scale, and indeed it does not.

I hope you'll have a look at the process automation scenario in my
reply to Joseph S. Myers ("new patch of replies (B)").

-t
Phil Edwards
2002-12-10 20:05:13 UTC
Permalink
Post by Zack Weinberg
the implementations conflict, because no one's exerting adequate
centripetal force.
Heh. I never thought I'd hear that term applied to software development.
I like it.



Phil
--
I would therefore like to posit that computing's central challenge, viz. "How
not to make a mess of it," has /not/ been met.
- Edsger Dijkstra, 1930-2002
Mark Mielke
2002-12-11 03:42:24 UTC
Permalink
Post by Zack Weinberg
Linux is a large project - 4.3 million lines of code - but only one
person has commit privileges on the official tree, for any given
release branch. No matter how good their tools are, this cannot be
expected to scale, and indeed it does not. I have not actually
measured it, but the appearance of the traffic on linux-kernel is that
Linus drops patches on the floor just as often as he did before he
started using Bitkeeper. However, Bitkeeper facilitates other people
maintaining their own semi-official versions of the tree, in which
some of these patches get sucked up. That is bad. It means users
have to choose between N different variants; as time goes by it
becomes increasingly difficult to put them all back together again;
eventually will come a point where critical feature A is available
only in tree A, critical feature B is available only in tree B, and
the implementations conflict, because no one's exerting adequate
centripetal force.
Possibly I am too pessimistic.
Actually, the model used for Linux provides substantial freedom. Since
no single site is the 'central' site, development can be fully
distributed. Changes can be merged back and forth on demand, and
remote users require no resources to run, other than the resources to
periodically synchronize the data.

Unfortunately -- this freedom (as always) comes with a price. The
price is that the fully distributed model means that there is no
enforced regulation. There is no control, and the same freedom that
allows anybody to create a variant, allows them to keep a variant.

The models are substantially different, however, I would suggest that
neither is wrong in the generic sense.

The only questions that really matters are: 1) are you more
comfortable in a regulated environment, and if so, then 2) are you
willing to live with the limitations that a regulated environment
gives you? Some of these limitations include the need to maintain
contact with a central repository of some sort, and the need for
processing at a central repository of some sort.

Personally, I'm with you in that I prefer regulation and enforcement.
It keeps me from fsck'ing up my own data.

mark
--
***@mielke.cc/***@ncf.ca/***@nortelnetworks.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/
David S. Miller
2002-12-11 03:41:05 UTC
Permalink
From: Mark Mielke <***@mark.mielke.cc>
Date: Tue, 10 Dec 2002 22:42:24 -0500
Post by Zack Weinberg
Linux is a large project - 4.3 million lines of code - but only one
person has commit privileges on the official tree, for any given
release branch. No matter how good their tools are, this cannot be
expected to scale, and indeed it does not. I have not actually
measured it, but the appearance of the traffic on linux-kernel is that
Linus drops patches on the floor just as often as he did before he
started using Bitkeeper. However, Bitkeeper facilitates other people
maintaining their own semi-official versions of the tree, in which
some of these patches get sucked up. That is bad. It means users
have to choose between N different variants; as time goes by it
becomes increasingly difficult to put them all back together again;
eventually will come a point where critical feature A is available
only in tree A, critical feature B is available only in tree B, and
the implementations conflict, because no one's exerting adequate
centripetal force.
Possibly I am too pessimistic.
Actually, the model used for Linux provides substantial freedom. Since
no single site is the 'central' site, development can be fully
distributed. Changes can be merged back and forth on demand, and
remote users require no resources to run, other than the resources to
periodically synchronize the data.

I think some assessments are wrong here.

Linus does get more patches applied these days, and less gets
dropped on the floor.

Near the end of November, as we were approaching the feature
freeze deadline, he was merging on the order of 4MB of code
per day if not more.

What really ends up happening also is that Linus begins to trust
people with entire subsystems. So when Linus pulls changes from
their BK tree, he can see if they touch any files outside of their
areas of responsibility.

Linus used to drop my work often, and I would just retransmit until
he took it. Now with BitKeeper, I honestly can't remember the last
time he silently dropped a code push I sent to him.

The big win with BitKeeper is the whole disconnected operation bit.

When the net goes down, I can't check RCS history and make diffs
against older versions of files in the gcc tree.

With Bitkeeper I have all the revision history in my cloned tree so
there is zero need for me to ever go out onto the network to do work
until I want to share my changes with other people. This also
decreases the load on the machine with the "master" repository.

There is nothing about this which makes it incompatible with how GCC
works today. So if arch and/or subversion can support the kind of
model BitKeeper can, we'd set it up like so:

1) gcc.gnu.org would still hold the "master" repository
2) there would be trusted people with write permission who
could thusly push their changes into the master tree

Releases and tagging would still be done by someone like Mark
except it hopefully wouldn't take several hours to do it :-)
Phil Edwards
2002-12-11 03:57:24 UTC
Permalink
Post by David S. Miller
When the net goes down, I can't check RCS history and make diffs
against older versions of files in the gcc tree.
I just rsync the repository and do everything but checkins locally.
Very very fast.
Post by David S. Miller
With Bitkeeper I have all the revision history in my cloned tree so
there is zero need for me to ever go out onto the network to do work
until I want to share my changes with other people. This also
decreases the load on the machine with the "master" repository.
So does the rsync-repo technique.


Phil
--
I would therefore like to posit that computing's central challenge, viz. "How
not to make a mess of it," has /not/ been met.
- Edsger Dijkstra, 1930-2002
David S. Miller
2002-12-11 03:58:36 UTC
Permalink
From: Phil Edwards <***@jaj.com>
Date: Tue, 10 Dec 2002 22:57:24 -0500
Post by David S. Miller
With Bitkeeper I have all the revision history in my cloned tree so
there is zero need for me to ever go out onto the network to do work
until I want to share my changes with other people. This also
decreases the load on the machine with the "master" repository.
So does the rsync-repo technique.

That's not distributed source management, that's "I copy the entire
master tree onto my computer."

If you make modifications to your local rsync'd master tree, you can't
transparently push those changes to other people unless you set up
anoncvs on your computer and tell them "use this as your master repo
instead of gcc.gnu.org to get my changes".

That's bolted onto the side, not part of the design.
Phil Edwards
2002-12-11 14:23:32 UTC
Permalink
Post by David S. Miller
Date: Tue, 10 Dec 2002 22:57:24 -0500
Post by David S. Miller
With Bitkeeper I have all the revision history in my cloned tree so
there is zero need for me to ever go out onto the network to do work
until I want to share my changes with other people. This also
decreases the load on the machine with the "master" repository.
So does the rsync-repo technique.
That's not distributed source management, that's "I copy the entire
master tree onto my computer."
I'm not claiming otherwise. I'm simply offering a tip to make life easier
for current users in the current situation. What I said is still true
with regards to the paragraph I quoted.


Phil
--
I would therefore like to posit that computing's central challenge, viz. "How
not to make a mess of it," has /not/ been met.
- Edsger Dijkstra, 1930-2002
Linus Torvalds
2002-12-14 21:05:57 UTC
Permalink
[ See the blurb about OpenCM at the end. ]
Post by Zack Weinberg
Linux is a large project - 4.3 million lines of code - but only one
person has commit privileges on the official tree, for any given
release branch.
No. That's not how it works.

Linux, unlike _every_ other project I know of, has always actively
encouraged "personal/vendor branches", and that is in fact how 99% of
all development has happened.

Most development happens in trees that have _nothing_ to do with the
official tree. To me, the whole CVS model (many branches in one
centralized repository) is just incredibly broken, and you should
realize that that isn't how Linux has ever worked.

My tree is often called the "official" tree, but what it really is is
just a base tree that many people maintain their own forks from.
This is fundamentally _more_ scalable than the CVS mess that is gcc,
since it much more easily allows for very radical branches that do not
need any centralized permission from me.

Think of it this way: in gcc, the egcs split was a very painful thing.
In Linux, those kinds of splits (people doing what they think is right,
_without_ support from the official maintainers) is how _everything_ gets
done. Linux is a "constantly forking" project, and that's how development
very fundamentally happens.

And a fork is a lot more scalable than a branch. It's also a lot more
powerful: it gives _full_ rights to the forker. That implies that a
forked source tree should be a first-class citizen, not just something
that was copied off somebody else's CVS tree. The BitKeeper "clone"
thing is a beautiful implementation of the Linux development model.
Post by Zack Weinberg
No matter how good their tools are, this cannot be
expected to scale, and indeed it does not.
Sorry, but you're wrong. Probably simply because you're too used to the
broken CVS model.

I would like to point out that Linux development has scaled a lot better
than gcc, to a larger source base (it's 5+ M lines) with much more
fundamental programming issues (concurrency etc). I will bet you that
the Linux kernel merges are a lot bigger than the gcc ones, that
development happens faster, and that there are more independent
developers working on their own versions of Linux than there are of gcc.

There aren't just a handful of branches, there are _hundreds_. Many of
them end up not being interesting, or never get merged back. And
_none_ of them required write access to my tree.

I'd also like to point out that Linux has _never_ had a flap like the
gcc/egcs/emacs/xemacs splits. Exactly because of the _much_ more
scalable approach of just fundamentally always having had a distributed
development model that allows _anybody_ to contribute easily, instead of
having a model that makes certain people have "special powers".

In short, _my_ tree is _not_ the same thing as the gcc CVS sources.
Post by Zack Weinberg
I have not actually
measured it, but the appearance of the traffic on linux-kernel is that
Linus drops patches on the floor just as often as he did before he
started using Bitkeeper.
Measure the number of changes accepted, and I bet the Linux kernel
approach had an order of magnitude more changes than gcc has _ever_ had.
Even before using Bitkeeper.

The proof is in the pudding - care to compare real numbers, and compare
sizes of weekly merged patches? I bet gcc will come in _far_ behind.
Post by Zack Weinberg
However, Bitkeeper facilitates other people
maintaining their own semi-official versions of the tree, in which
some of these patches get sucked up. That is bad.
No. Have you ever used Bitkeeper? Really _used_ it?

I've used both bitkeeper and CVS (I refuse to touch CVS with a ten-foot
pole for my "fun" projects, but I've used CVS for big projects at work),
and I can tell you, CVS doesn't even come _close_. Not even with
various wrapper helper tools to make things like CVS branches look even
remotely palatable.

The part that you're missing, simply because you've probably used CVS
for too long, is the _distributed_ nature of Bitkeeper, and of Linux
development. Repeat after me: "There is no single tree". Everything is
distributed.

Any source control system that has "write access" issues is
fundamentally broken in my opinion. Your repository should be _yours_,
and nobody else's. There is no "write access". There is only some way
to expedite a merge between two repositories. The source control
management should make it easy for you to export your changes to other
repositories. In fact, it should make it easy for you to have many
different repositories - for different things you're working on.

Bitkeeper does this very well. It's _the_ reason I use bitkeeper. BK
does other things too, but they all pale to the _fundamental_ idea of
each repository being a thing unto itself, and having no stupid
"branches", but simply having truly distributed repositories. Some
people think that is an "offline" feature, but nothing could be further
from the truth. The _real_ issue about independent repositories is that
it makes it possible to do truly independent development, and makes
notions like branches such an outdated idea.

Projects like Subversion never seem to have really _understood_ the
notion of true distributed repositories. And by not understanding them,
like you they miss the whole point of truly scalable development.
Development that scales _past_ the notion of one central repository.
Post by Zack Weinberg
Possibly I am too pessimistic.
No. You're not pessimistic, you just don't _understand_.

You don't have to believe me. Believe the numbers. Look at which
project gets more done. And realize that even before Linux used
Bitkeeper, it used the truly distributed _model_. The model is
independent from what SCM you use, although some SCMs obviously cannot
support some models (and CVS in particular forces its users to use a
particularly broken model).

Btw, I realize that there's no way in hell gcc will use bitkeeper. I'm
not trying to push that. I'm just hoping that if gcc does change to
something smarter than CVS, it would change to something that is truly
distributed, and doesn't have that broken "branch" notion, or the notion
of needing write permissions to some stupid central repository in order
to enjoy the idea of SCM.

Looking at the current projects out there, the only one that looks like
it has more than half a clue is "OpenCM". It doesn't seem to really do
the distributed thing right, but at least from what I've seen it looks
like they have the right notions about doing it.

The OpenCM project seems to still believe that distribution is just
about "disconnected commits" rather than understanding that if you do
distributed repositories right you shouldn't have branches at all
(instead of a branch, you should just have a _different_ repository),
but they at least seem to understand the importance of true
distribution. I hope gcc developers are giving it a look.

Linus
Tom Lord
2002-12-14 21:43:48 UTC
Permalink
The OpenCM project seems to still believe that distribution is
just about "disconnected commits" rather than understanding
that if you do distributed repositories right you shouldn't
have branches at all (instead of a branch, you should just have
a _different_ repository),

Branches can span repository boundaries just fine, and that's a nice
way to keep useful track of the history that relates the two forks.
Distribution is orthogonal to branching. Two repositories can be
separately administered and fully severable, yet branches can usefully
exist between them. For the "branched from" repository, this is a
passive, read-only operation.


Looking at the current projects out there, the only one that
looks like it has more than half a clue is "OpenCM". It
doesn't seem to really do the distributed thing right, but at
least from what I've seen it looks like they have the right
notions about doing it.

What aspect of arch has you confused? or, alternatively, what flaw
do you see in arch's approach to distribution?


The _real_ issue about independent repositories is that it
makes it possible to do truly independent development, and
makes notions like branches such an outdated idea.

Arranging that one line is a branch of another (even when they are in
two independent, severable repositories) facilitates simpler and more
accurate queries about how their histories relate. Among other
things, such queries can (a) take more of the drudgery out of some
common merging tasks, (b) better facilitate process automation when
the forks are, in fact, contributing to a common line of development.

GCC development faces a problem which Linux kernel development, you
seem to have said elsewhere, avoids by social means: it has direct and
appreciated contributors to the mainline who, nevertheless, are asked
to contribute their changes indirectly through a formal review and
testing process (rather than through, say, a "trusted lieutenant" --
in other words, in GCC, the work pool of the "lieutenants"
("maintainers", actually) is collected and shared among them in
flexible, fine-grained ways that are performed with considerable
discipline). Distribution _with_ branches can be a boon to those
contributors and the maintainers.

Overall -- I don't think there can be _that_ much contrast between the
GCC and LK development processes. GCC is a bit like LK, except that
instead of a Linus, GCC has a team. That team needs (and has) tools
to make them effective as the "virtual linus". (Some of us have ideas
for even better tools :-) That there is less of a tendency for 3rd
parties to throw up their arms and make their own forks may not have
quite the implications you assert: the natures of the two systems and
the uses they are put to make comparison very difficult.

-t
Linus Torvalds
2002-12-15 00:16:31 UTC
Permalink
Post by Tom Lord
What aspect of arch has you confused? or, alternatively, what flaw
do you see in arch's approach to distribution?
To be honest, I tried arch back when I was testing different SCM's for the
kernel, and even just the setup confused me enough that I never got past
that phase. I suspect I just tried it too early in the development cycle,
and that turned me off it.

Also, the oft-repeated performance issues have kept me wary about arch.
Bitkeeper is quite fast, but even so Larry and friends actually ended up
having to make some major performance improvements to the bk-3 release
simply because they were taken by surprise at just _how_ much data the
kernel SCM ends up needing to process.

I realize that there are a lot of advantages to keeping to high-level
scripting languages for the SCM, but it's also quite important to try to
avoid making the SCM itself be a distraction from a performance
standpoint. However, since I never got very far with arch, I really only
parrot what I've heard from others about its performance, so this may be
unfair.

Linus
Tom Lord
2002-12-15 01:11:44 UTC
Permalink
Post by Linus Torvalds
Also, the oft-repeated performance issues have kept me wary
about arch.
Fair if you're evaluating it from the "should I start using this
tomorrow" perspective (don't).

I think most of us who are fairly deep into arch think these problems
have straightforward solutions, and my goal is to try to find a
solution to the resource crisis that keeps me from finishing the work.

I realize that there are a lot of advantages to keeping to
high-level scripting languages for the SCM, but it's also
quite important to try to avoid making the SCM itself be a
distraction from a performance standpoint. However, since I
never got very far with arch, I really only parrot what I've
heard from others about its performance, so this may be
unfair.

The prototype/reference implementation of arch _is_ a mixture of shell
scripts and small C programs. I think the enforced simplicity is very
good for the architecture and I'm quite optimistic about the future
performance potential of this code.

arch is tiny, and I'm encouraging alternative implementations for a
variety of purposes. I hear that (have some salt grains with this)
someone is working on one in C++, and someone else on one in Python.
A Perl translation was made, but work on it seems to have stopped
(perhaps because the author changed work contexts) around the time it
was starting to function.

It is not quite accurate to say "the current implementation is slow
because it uses sh" -- some sh parts need recasting in C, many don't,
some of the admin tweaks that improve performance need to be made more
automatic ... things like that. It's an optimizable prototype that
has not been prematurely optimized.

Just reading what you say here: the arch design has everything you
like about BK and probably a bit more to boot. It's just a resource
problem to get it to a 1.0 that is as comfortable to adopt as you've
found BK. Rumours that that will require $12M are exaggerated by, in
my estimate, about a factor of 10.

To be honest, I tried arch back when I was testing different
SCM's for the kernel, and even just the setup confused me
enough that I never got past that phase. I suspect I just
tried it too early in the development cycle, and that turned
me off it.

Perhaps. The currently active developers seem to be giving a lot of
attention to encapsulating matters such as that in convenience
commands layered over the core.


-t
Neil Booth
2002-12-14 22:15:36 UTC
Permalink
Linus Torvalds wrote:-
Post by Linus Torvalds
The part that you're missing, simply because you've probably used CVS
for too long, is the _distributed_ nature of Bitkeeper, and of Linux
development. Repeat after me: "There is no single tree". Everything is
distributed.
Uh, careful, Zack wrote parts of Bitkeeper, including designing the network
protocols IIRC.

Neil.
Zack Weinberg
2002-12-14 23:00:51 UTC
Permalink
Post by Neil Booth
Linus Torvalds wrote:-
Post by Linus Torvalds
The part that you're missing, simply because you've probably used CVS
for too long, is the _distributed_ nature of Bitkeeper, and of Linux
development. Repeat after me: "There is no single tree". Everything is
distributed.
Uh, careful, Zack wrote parts of Bitkeeper, including designing the network
protocols IIRC.
It is my understanding that the network protocol I designed is no
longer in use, and good riddance, it was my first try at such things
and I didn't know what I was doing.

But yes, I worked on Bitkeeper for about six months in 2000, so I do
know what its architecture is like.

zw
Momchil Velikov
2002-12-14 23:39:59 UTC
Permalink
Linus> My tree is often called the "official" tree, but what it
Linus> really is is just a base tree that many people maintain
Linus> their own forks from. This is fundamentally _more_

Err, haven't you noticed that this is the tree that many (all) people
want to merge their forks into? I think this is "what it really is".

When evaluating a SCM tool, IMHO, the most important thing is the ease of
merges - remove the need for later merges and any sophisticated "fork"
tool boils down to a "cp -R".

Linus> scalable than the CVS mess that is gcc, since it much more
Linus> easily allows for very radical branches that do not need
Linus> any centralized permission from me.

I surely have a "fork" of GCC and I ain't got nobody's permission.
Permission is needed not when forking, but when merging.

Linus> Think of it this way: in gcc, the egcs split was a very
Linus> painful thing. In Linux, those kinds of splits (people
Linus> doing what they think is right, _without_ support from the
Linus> official maintainers) is how _everything_ gets done. Linux
Linus> is a "constantly forking" project, and that's how
Linus> development very fundamentally happens.

Linus> And a fork is a lot more scalable than a branch. It's also

There's no difference, unless by "branch" and "fork" you mean the
corresponding implementations in CVS and BK of one and the same
development model.

Linus> I would like to point out that Linux development has scaled
Linus> a lot better than gcc, to a larger source base (it's 5+ M
Linus> lines) with much more fundamental programming issues
Linus> (concurrency etc). I will bet you that the Linux kernel
Linus> merges are a lot bigger than the gcc ones, that development
Linus> happens faster, and that there are more independent
Linus> developers working on their own versions of Linux than
Linus> there are of gcc.

How about a different view on the subject?

IMHO a good metric of the complexity of a particular problem/domain is
the overall ability of the mankind to cope with it.

Thus, what you describe may be due to the fact that there are a lot
more people capable of kernel programming than people capable of
compiler programming, IOW, that most kernel programming requires rather
basic programming knowledge, compared to most compiler programming.

No?

Linus> Any source control system that has "write access" issues is
Linus> fundamentally broken in my opinion. Your repository should
Linus> be _yours_, and nobody elses. There is no "write access".
Linus> There is only some way to expedite a merge between two
Linus> repositories. The source control management should make it
Linus> easy for you to export your changes to other repositories.

A SCM should facilitate collaboration. Any SCM that requires single
person's permission for modifications to the source base (e.g. by
having only private repositories) is broken beyond repair and scalable
exactly like a BitKeeper^WBKL.

~velco
Linus Torvalds
2002-12-14 23:32:51 UTC
Permalink
Post by Momchil Velikov
I surely have a "fork" of GCC and I ain't got nobody's permission.
Permission is needed not when forking, but when merging.
But the point is, the "CVS mentality" means that a fork is harder to merge
than a branch, and you often lose all development history when you merge a
fork as a result of this (yeah, you can do a _lot_ of work, and try to
also merge the SCM information on a fork merge, but it's almost never done
because it is so painful).

That's why I think the CVS mentality sucks. You have only branches that
are "first-class" citizens, and they need write permission to create and
are very expensive to create. Note: I'm not saying they are slow - that's
just a particular CVS implementation detail. By "expensive" I mean that
they cannot easily be created and thrown away, so with the "CVS mentality"
those branches only get created for "big" things.

And the "cheap" branches (private check-outs that don't need write
permissions and can be done by others) lose all access to real source
control except the ability to track the original. Two of the cheap
branches cannot track each other in any sane way. And they have no
revision history at all even internally.

Yet it is the _cheap_ branches that should be the true first-class
citizen. Potentially throw-away code that may end up being really really
useful, but might just be a crazy pipe-dream. The experimental stuff that
would _really_ want to have nice source control.

And the "CVS mentality" totally makes that impossible. Subversion seems to
be only a "better CVS", and hasn't gotten away from that mentality, which
is sad.
Post by Momchil Velikov
Linus> And a fork is a lot more scalable than a branch. It's also
There's no difference, unless by "branch" and "fork" you mean the
corresponding implementations in CVS and BK of one and the same
development model.
Basically, by "branch" I mean something that fundamentally is part of the
"official site". If a branch has to be part of the official site, then a
branch is BY DEFINITION useless for 99% of developers. Such branches
SHOULD NOT EXIST, since they are fundamentally against the notion of open
development!

A "fork" is something where people can just take the tree and do their own
thing to it. Forking simply doesn't work with the CVS mentality, yet
forking is clearly what true open development requires.
Post by Momchil Velikov
IMHO a good metric of the complexity of a particular problem/domain is
the overall ability of the mankind to cope with it.
Thus, what you describe, may be due to the fact that people capable of
kernel programming are a lot more than people capable of compiler
programming, IOW, that most kernel programming requires rather basic
programming knowledge, compared to most compilers programming.
No ?
No.

Sure, you can want to live in your own world, and try to keep the
riff-raff out. That's the argument I hear from a lot of commercial
developers ("we don't want random hackers playing with our code, we don't
believe they can do as good a job as our highly paid professionals").

The argument is crap. It was crap for the kernel, it's crap for gcc. The
only reason you think "anybody" can program kernels is the fact that Linux
has _shown_ that anybody can do so. If gcc had less of a restrictive model
for accepting patches, you'd have a lot more random people who would do
them, I bet. But gcc development not only has the "CVS mentality", it has
the "FSF disease" with the paperwork crap and copyright assignment crap.

So you keep outsiders out, and then you say it's because they couldn't do
what you can do anyway.

Crap crap crap arguments. Trust me, there are more intelligent people out
there than you believe, and they can do a hell of a lot better work than
you currently allow them to do. Often with very little formal schooling.
Post by Momchil Velikov
A SCM should facilitate collaboration. Any SCM that requires single
person's permission for modifications to the source base (e.g. by
having only private repositories) is broken beyond repair and scalable
exactly like a BitKeeper^WBKL.
But you don't _understand_. BK allows hundreds of people to work on the
same repository, if you want to. You just give them BK accounts on the
machine, the same way you do with CVS.

But that's not the scalable way to do things. The _scalable_ thing is to
let everybody have their own tree, and _not_ have that "one common point"
disease. You have the networking people working on their networking trees
_without_ merging back to me, because they have their own development
branches that simply aren't ready yet, for example. Having a single point
for work like that is WRONG, and it's definitely _not_ scalable.

Linus
Momchil Velikov
2002-12-15 12:02:14 UTC
Permalink
Post by Momchil Velikov
A SCM should facilitate collaboration. Any SCM that requires
single person's permission for modifications to the source base
(e.g. by having only private repositories) is broken beyond
repair and scalable exactly like a BitKeeper^WBKL.
Linus> But you don't _understand_. BK allows hundreds of people to
Linus> work on the same repository, if you want to. You just give
Linus> them BK accounts on the machine, the same way you do with
Linus> CVS.

Ah, I _do_ understand that this is possible. I _do_ understand very
well that there's no "Linux Kernel Project", but there is "Linus's
kernel tree" . You seem to not understand that there _is_ "GCC
Project" as well as "GNU Project".

Linus> But that's not the scalable way to do things. The
Linus> _scalable_ thing is to let everybody have their own tree,

In that case you don't need a SCM at all - you can do pretty well with
a few simple utilities to maintain a number of hardlinked trees (a
rough sketch of the "share" step follows the list):

"cp -Rl" - branch, tag, fork, whatever
"share <src> <dst>" - make identical files hardlinks
"unshare <path>" - make <path> a file with one link (recursively)
"dmerge <old> <new> <mine> - same as merge(1), but for trees
"diff -r" - make a changeset
"mail" - send a changeset
"patch" - apply a changeset
"rm -rf" - transaction rollback, so we have atomicity, see :)

Linus> and _not_ have that "one common point" disease. You have

This is not a disease, it is a _natural_ consequence of
_collaboration_. And collaboration is an _absolute necessity_ when you
are above a certain degree of coupling between the components. A change
in the network stack can hardly affect the operation of the ATA driver
- however this is not the case in GCC [1]. Changes in a particular
phase _do_ affect other phases, and this is not a coincidence - it is a
consequence from the fact that GCC components are tightly coupled by
the virtue of working on a common data structure.

The degree of module coupling can be characterized as follows [2] (from
loose to tight):

Degree Description
------ -----------
0 Independent - no coupling

1 Data coupling - interaction between the
modules is with simple, unstructured data
types, via interface functions.

3 Template coupling - interface function
parameters include structured data types.

4 Common data - when modules share common data
structure.

5 Control - when one module controls others with
flags, switches, command codes, etc.

The Linux kernel tends to be in {0, 1, 3}. GCC tends to be in {4, 5}.

IOW, GCC components are roughly from three to five times more tightly
coupled than Linux kernel components.

My point is that the Linux kernel development model [3], while
obviously successful, is not necessarily adequate for other projects,
particularly for GCC.

~velco


[1] AFAICT, I'm not a GCC developer.

[2] there may be other metrics, I've found that one adequate, though
YMMV.

[3] And, yes, I claim I fully understand it, at least I fully
understand what _you_ want it to be.
Momchil Velikov
2002-12-15 12:21:34 UTC
Permalink
Linus> Crap crap crap arguments. Trust me, there are more
Linus> intelligent people out there than you believe, and they can
Linus> do a hell of a lot better work than you currently allow
Linus> them to do. Often with very little formal schooling.

Yes, there are lots of intelligent people out there, but while
intelligence is usually sufficient for working on a kernel, working on
a compiler requires _knowledge_ (whether formal or not).

~velco
Linus Torvalds
2002-12-15 18:45:56 UTC
Permalink
Post by Momchil Velikov
Linus> Crap crap crap arguments. Trust me, there are more
Linus> intelligent people out there than you believe, and they can
Linus> do a hell of a lot better work than you currently allow
Linus> them to do. Often with very little formal schooling.
But yes, there are lots of intelligent people out there, but while
intelligence is usually sufficient for working on a kernel, working on
a compiler requires _knowledge_ (no matter formal or not).
Blaah. I _bet_ that is not true.

I actually had my own gcc tree for Linux kernel development back when I
started, mostly because I just enjoyed it and found the compiler
interesting. I added builtins for things like memcpy() etc because I cared
(and it was more fun that writing assembly language library routines), and
because gcc at that time didn't have hardly any support for things like
that.

I didn't understand the whole compiler, BUT THAT DID NOT MATTER. The same
way that most Linux kernel developers don't understand the whole kernel,
and do not even need to. Sure, you need people with specialized knowledge
for specialized areas (designing the big picture etc), but that's a small
small part of it.

To paraphrase, programming is often 1% inspiration and 99% perspiration.

In short, your argument is elitist and simply _wrong_. It's true that to
create a whole compiler you need a whole lot of knowledge, but that's true
of any project - including operating systems. But that doesn't matter,
because there isn't "one" person who needs to know everything.

Linus
Momchil Velikov
2002-12-15 21:41:04 UTC
Permalink
Linus> In short, your argument is elitist and simply _wrong_.

*shrug* That's my explanation of what I observe - more people develop
kernels than compilers. A particular compiler's development model or
patch review and acceptance policy do not matter at all - if they are
an obstacle, people's creativity will be redirected somewhere else.

I may be wrong. But I'm yet to hear a more credible explanation for
this simple fact.

~velco
Pop Sébastian
2002-12-15 22:15:58 UTC
Permalink
Post by Momchil Velikov
Linus> In short, your argument is elitist and simply _wrong_.
*shrug* That's my explanation to what I observe - more people develop
kernels than compilers. Particular compiler's development model or
patch review and acceptance policy do not matter at all - if they are
an obstacle people's creativity would be redirected somewhere else.
I may be wrong. But I'm yet to hear a more credible explanation for
this simple fact.
Maybe it's true because for writing compiler optimizations one
should have some knowledge of mathematics. Most of the new techniques
developed for optimizing compilers use abstract representations
based on mathematical objects (such as graphs, lattices, vector spaces,
polyhedra, ...)

Maybe we're wrong but the percentage of mathematicians who contribute
to GCC could be slightly bigger than for LK.

Sebastian
Linus Torvalds
2002-12-15 23:45:59 UTC
Permalink
Post by Pop Sébastian
Post by Momchil Velikov
I may be wrong. But I'm yet to hear a more credible explanation for
this simple fact.
Maybe it's true because for writing compiler optimizations one
should have some knowledge in mathematics.
Naah. It's simple - kernels are just sexier.

Seriously, I think it's just that a kernel tends to have more different
_kinds_ of problems, and thus tend to attract different kinds of people,
and more of them.

Compilers are complicated, no doubt about that, but the complicated stuff
tends to be mostly of the same type (ie largely fairly algorithmic
transformations for the different optimization passes). In kernels, you
have many _different_ kinds of issues, and as a result you'll find more
people who are interested in one of them. So you'll find people who care
about filesystems, or people who care about memory management, or people
who find it interesting to do concurrency work or IO paths.

That is obviously also why the kernel ends up being a lot of lines of
code. I think it's about an order of magnitude bigger in size than all of
gcc - not because it is an order of magnitude more complex, obviously, but
simply because it has many more parts to it. And that directly translates
to more pieces that people can cut their teeth on.
Post by Pop Sébastian
Maybe we're wrong but the percentage of mathematicians who contribute
to GCC could be slightly bigger than for LK.
I don't think you're wrong per se. The "transformation" kind of code is
just much more common in a compiler, and the kind of people who work on it
are more likely to be the mathematical kind of people. It's not the only
part of gcc, obviously (I think parsing is underrated, and I'm happy that
the preprocessing front-end has gotten so much attention in the last few
years), but it's one of the bigger parts.

And people clearly seek out projects that satisfy their interests.

Linus
Bruce Stephens
2002-12-16 00:29:36 UTC
Permalink
Linus Torvalds <***@transmeta.com> writes:

[...]
Post by Linus Torvalds
That is obviously also why the kernel ends up being a lot of lines of
code. I think it's about an order of magnitude bigger in size than all of
gcc - not because it is an order of magnitude more complex, obviously, but
simply because it has many more parts to it. And that directly translates
to more pieces that people can cut their teeth on.
The gcc tree I have seems to have 4145483 lines, whereas the 2.4.20
kernel seems to have 4841227 lines. (Not lines of code; this includes
all files in the unbuilt tree (including CVS directories for gcc,
although this is probably trivial), and it includes comments and files
which are not code. In the gcc case, it may include some generated
files; I'm not sure how Ada builds nowadays.)

Excluding the gcc testsuites, gcc has 3848080 lines. So gcc (the
whole of gcc, with all its languages) seems to be a bit smaller than
the kernel, but probably not by an order of magnitude.

This is reinforced by "du -s": the gcc tree takes up 187144K, the
kernel takes up 170676K. None of this is particularly precise,
obviously, but it points to the two projects (with all their combined
bits) being not too dissimilar in size. Which is a possibly
interesting coincidence. (The 2.5 kernel may be much bigger; I
haven't looked. The tarballs don't look *that* much bigger, however.)

[...]
Linus Torvalds
2002-12-16 00:47:52 UTC
Permalink
Post by Bruce Stephens
The gcc tree I have seems to have 4145483 lines
Hmm, might be my mistake. I only have an old and possibly pared-down tree
online. However, I also counted lines differently: I only counted *.[chS]
files, and you may have counted everything (the gcc changelogs and .texi
files in particular are _huge_ if you have a full complement of them
there).

What does "find . -name '*.[chS]' | xargs cat | wc" say?

(But you're right - I should _at_least_ count the .md files too, so my
count was at least as bogus as I suspect yours was)
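
For completeness, a variant of that count which would also pick up the
.md machine descriptions (same caveats as above about what it misses):

  find . \( -name '*.[chS]' -o -name '*.md' \) -print | xargs cat | wc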

Linus
Bruce Stephens
2002-12-16 01:09:17 UTC
Permalink
Post by Linus Torvalds
Post by Bruce Stephens
The gcc tree I have seems to have 4145483 lines
Hmm, might be my mistake. I only have an old and possibly pared-down tree
online. However, I also counted lines differently: I only counted *.[chS]
files, and you may have counted everything (the gcc changelogs and .texi
files in particular are _huge_ if you have a full complement of them
there).
The ChangeLog files give a total of 306201 lines. texi files (and
info files) add another 426482. So that's a lot, yes. (In terms of
the size of the project, probably the texi files at least ought to be
counted, just as the stuff in Documentation ought to be counted in
some way for the Linux kernel. But not the generated .info files.)
Post by Linus Torvalds
What does "find . -name '*.[chS]' | xargs cat | wc" say?
1445809 5700810 43690421

But that doesn't include the ada or java files (or the C++ standard
library). Quite possibly it doesn't include some Objective C runtime
source, too.
Post by Linus Torvalds
(But you're right - I should _at_least_ count the .md files too, so
my count was at least as bogus as I suspect yours was)
Sure. It's all pretty meaningless---I think the two projects happen
to be approximately the same size (with the Linux kernel bigger), but
I don't think it's anything other than coincidence. gcc/ada accounts
for about 800K lines, for example, and that's relatively recent, IIRC.
Diego Novillo
2002-12-16 16:16:57 UTC
Permalink
Post by Linus Torvalds
Post by Bruce Stephens
The gcc tree I have seems to have 4145483 lines
Hmm, might be my mistake. I only have an old and possibly pared-down tree
online. However, I also counted lines differently: I only counted *.[chS]
files, and you may have counted everything (the gcc changelogs and .texi
files in particular are _huge_ if you have a full complement of them
there).
Output of sloccount on a relatively recent snapshot:

-----------------------------------------------------------------------------
SLOC Directory SLOC-by-Language (Sorted)
1274221 gcc ansic=839349,ada=298101,cpp=73596,yacc=23251,asm=20244,
fortran=6934,exp=4706,sh=4430,objc=2751,lex=559,perl=189,awk=111
225571 libjava java=131300,cpp=65054,ansic=27198,exp=1213,perl=782,
awk=24
67452 libstdc++-v3 cpp=49425,ansic=17270,sh=525,exp=193,awk=39
34729 boehm-gc ansic=25682,sh=7631,cpp=972,asm=444
21798 libiberty ansic=21495,perl=283,sed=20
11657 top_dir sh=11657
10376 libbanshee ansic=10376
10358 libf2c ansic=10037,fortran=321
9581 zlib ansic=8309,asm=712,cpp=560
8904 libffi ansic=5545,asm=3359
8002 libobjc ansic=7233,objc=397,cpp=372
3721 contrib cpp=2306,sh=935,perl=324,awk=67,lisp=59,ansic=30
3074 libmudflap ansic=3074
2506 fastjar ansic=2325,sh=181
1463 include ansic=1432,cpp=31
667 maintainer-scripts sh=667
0 config (none)
0 CVS (none)
0 INSTALL (none)


Totals grouped by language (dominant language first):
ansic: 979355 (57.81%)
ada: 298101 (17.60%)
cpp: 192316 (11.35%)
java: 131300 (7.75%)
sh: 26026 (1.54%)
asm: 24759 (1.46%)
yacc: 23251 (1.37%)
fortran: 7255 (0.43%)
exp: 6112 (0.36%)
objc: 3148 (0.19%)
perl: 1578 (0.09%)
lex: 559 (0.03%)
awk: 241 (0.01%)
lisp: 59 (0.00%)
sed: 20 (0.00%)


Total Physical Source Lines of Code (SLOC) = 1,694,080
Development Effort Estimate, Person-Years (Person-Months) = 491.37 (5,896.47)
(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 5.64 (67.73)
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 87.06
Total Estimated Cost to Develop = $ 66,377,705
(average salary = $56,286/year, overhead = 2.40).
SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL.
Please credit this data as "generated using 'SLOCCount' by David A. Wheeler."
-----------------------------------------------------------------------------


Diego.
Pop Sébastian
2002-12-17 10:08:26 UTC
Permalink
For comparison I've run sloccount on LK:

$ sloccount ./linux-2.5.52
[...]

SLOC Directory SLOC-by-Language (Sorted)
1664092 drivers ansic=1659643,asm=1949,yacc=1177,perl=813,lex=352,
sh=158
678895 arch ansic=507796,asm=170311,sh=624,awk=119,perl=45
365490 include ansic=364696,cpp=794
340797 fs ansic=340797
261122 sound ansic=260940,asm=182
193052 net ansic=193052
14814 kernel ansic=14814
13523 mm ansic=13523
11086 scripts ansic=6830,perl=1339,cpp=1218,yacc=531,tcl=509,lex=359,
sh=285,awk=15
6988 crypto ansic=6988
6083 lib ansic=6083
2740 ipc ansic=2740
1787 init ansic=1787
1748 Documentation sh=898,ansic=567,lisp=218,perl=65
1081 security ansic=1081
119 usr ansic=119
0 top_dir (none)


Totals grouped by language (dominant language first):
ansic: 3381456 (94.89%)
asm: 172442 (4.84%)
perl: 2262 (0.06%)
cpp: 2012 (0.06%)
sh: 1965 (0.06%)
yacc: 1708 (0.05%)
lex: 711 (0.02%)
tcl: 509 (0.01%)
lisp: 218 (0.01%)
awk: 134 (0.00%)




Total Physical Source Lines of Code (SLOC) = 3,563,417
Development Effort Estimate, Person-Years (Person-Months) = 1,072.73 (12,872.75)
(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 7.59 (91.12)
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 141.27
Total Estimated Cost to Develop = $ 144,911,083
(average salary = $56,286/year, overhead = 2.40).
SLOCCount is Open Source Software/Free Software, licensed under the FSF GPL.
Please credit this data as "generated using 'SLOCCount' by David A. Wheeler."
Tom Lord
2002-12-17 12:02:19 UTC
Permalink
GCC:

Total Estimated Cost to Develop = $ 66,377,705

LK:

Total Estimated Cost to Develop = $ 144,911,083


and:

(average salary = $56,286/year, overhead = 2.40).


(That's an appallingly low average salary, btw., and a needlessly
large overhead. If we're thinking of a nearly 20 year average, maybe
it's not _too_ badly removed from reality, but it's not a realistic
basis for planning moving forward.)

Someone did a sloccount run on a bunch of my 1-man-effort software,
developed over about 10 calendar years, and the person-years count was
surprisingly accurate.

In general, there is something of a business crisis in the free
software world. It's particularly noticeable around businesses based
on linux distributions.

Those distributions represent a huge amount of unpaid work.
Businesses using them got some free help bootstrapping themselves into
now favorable positions. So, not only did they get the unpaid work
for free (as in beer), they traded that for favorable market positions
that raise the barrier of entry to new competitors. While in theory
"anyone" can start selling their own distro, in reality, there's only
a few established companies and investors with deep pockets who have
any chance in this area.

So what's the crisis? Well, those freeloaders aren't exactly being
aggressive about figuring out how to sustain the free software movement
with R&D investment. Companies spend a little on public projects,
sure, but the number of employees participating, industry-wide, amounts
to a few tens of people, and total industry-wide corporate donations to
code-generating individuals and NPOs come to no more than seven figures
per year. When they do
spend on public projects, it is most often for very narrow tactical
purposes -- not to make the ecology of projects healthier overall. In
significant proportions, they spend R&D money on entirely in-house
projects that, while rooted in free software, benefit nobody but the
companies themselves.

You know, it's easy to make a few quarters for your business unit when
you, in essence, cheat.

So the crisis is that in the medium term, as engineering businesses
go, these aren't sustainable models. And when they start leading
volunteers and soaking up volunteer work for their own aims, and
capturing mind-share in the press, one has to start to wonder whether
they aren't, overall, doing more harm than good. And then there are
some social justice and labor issues....

Bill Gates, when he says that free software is a threat to innovation,
is currently correct. UnAmerican? You bet!

And, btw, surprise!: In the free software world, corporate GCC hackers
are the relative fat cats. Go figure.


-t
Stan Shebs
2002-12-17 22:22:43 UTC
Permalink
Post by Tom Lord
And, btw, surprise!: In the free software world, corporate GCC hackers
are the relative fat cats. Go figure.
That's because GCC hackers are doing things that are worth serious
amounts of money to people that have it to spend. Apple has signed
up with GCC because it solves more of Apple's problems more cheaply
than the several proprietary possibilities, and having made it part
of Mac OS X, Apple's overall corporate health is now partly dependent
on GCC continuing to be a good compiler, and on fixing remaining
problems, such as slowness.

If you were able to convince Apple mgmt that you could make GCC
10x faster not using precompiled headers, I think you could name
your price and get hired the same day; that's how important the
problem is to Apple. (You're going to have to be really convincing
though; our mgmt has listened to a hundred pitches already.)

Speaking more generally, the folks that get paid to do free software
are the ones who are solving the problems of people with the money.
It's up to us to be clever enough to figure out to solve the specific
problems in a way that improves architecture and infrastructure.
That was a key but underappreciated aspect of Cygnus' development
contracts; we would always try to go after projects that included
infrastructure improvement, but if necessary we would do something
that was random but lucrative and use the profits to pay for
generic work.

To put it more simply, find a rich person with an itch, and offer
to scratch it for them. :-)

Stan
Stan Shebs
2002-12-17 23:19:48 UTC
Permalink
Post by Stan Shebs
Speaking more generally, the folks that get paid to do free
software are the ones who are solving the problems of people
with the money. It's up to us to be clever enough to figure
out to solve the specific problems in a way that improves
architecture and infrastructure. That was a key but
underappreciated aspect of Cygnus' development contracts; we
would always try to go after projects that included
infrastructure improvement, but if necessary we would do
something that was random but lucrative and use the profits to
pay for generic work.
Was it customers who underappreciated that? or was that a selling
point?
Sometimes it was a selling point, sometimes the concept was too subtle
for the customer to grasp. In the mid-90s, a good percentage of time
still had to be spent explaining free software, reassuring people that
GCC didn't cause its output to be GPLed, etc. It was interesting to see
how much variation there was among customers, and also how important it
was to have actual sales people in the process - engineers left to
themselves would rathole on side issues and never get around to the
actual dealmaking.

Stan

Mike Stump
2002-12-17 01:12:25 UTC
Permalink
Post by Linus Torvalds
That is obviously also why the kernel ends up being a lot of lines of
code. I think it's about an order of magnitude bigger in size than all of
gcc
bash-2.05a$ find gcc -type f -print | xargs cat | wc -l
4084979

[ ducking ]
Stan Shebs
2002-12-16 00:47:54 UTC
Permalink
Post by Linus Torvalds
[...]
If gcc had less of a restrictive model
for accepting patches, you'd have a lot more random people who would do
them, I bet.
I can assure you that there are lots of random GCC patches and forks
out there, some of them drastically divergent from the main version.
(I myself have been responsible for a few of them.)

Nobody is being stopped from forking GCC and promoting their own
versions. A large number of GCC developers have chosen to cooperate
more closely on a single tree because we've empirically determined
that we get a better quality compiler that way. Choice of source
management systems is a minor detail, not a make-or-break issue.
Post by Linus Torvalds
But gcc development not only has the "CVS mentality", it has
the "FSF disease" with the paperwork crap and copyright assignment crap.
If AT&T had come down on GNU in the 80s the way that they did on
BSD in the early 90s, you wouldn't have had any software to go
with your kernel. RMS is much smarter than you seem to think.

Stan
Zack Weinberg
2002-12-10 01:46:00 UTC
Permalink
I'm keeping around a lot of context. Scroll down.
Post by Walter Landry
Post by Joseph S. Myers
Post by Walter Landry
Post by Zack Weinberg
0d. The data stored in the repository cannot be modified by
unprivileged local users except by going through the version
control system. Presently I could take 'vi' to one of the ,v
files in /cvs/gcc and break it thoroughly, or sneak something into
the file content, and leave no trace.
There is no interaction with root, so if you own the archive, you can
always do what you want. To get anything approaching this, you have
to deal with PGP signatures, SHA hashes, and the like. OpenCM is
probably the only group (including BitKeeper) that even comes close to
doing this right.
This sort of thing has been done simply by a modified setuid (to a cvs
user, not root) cvs binary so users can't access the repository directly,
only through that binary. More generically, with a reasonable protocol
for local repository access it should be possible to use GNU userv to
separate the repository from the users.
This is a different security model. Arch is secure because it doesn't
depend on having privileged access. For example, there is an "rm
-rf" command built into arch.
I have a feeling that you are thinking of how CVS handles things, with
a centralized server. Part of the whole point of arch is that there
is no centralized server. So, for example, I can develop arch
independently of whether Tom thinks that I am worthy enough to do so.
I can screw up my archive as much as I want (and I have), and Tom can
be blissfully unaware. Easy merging is what makes this possible.
So you don't, in general, have a repository that is writeable by more
than one person.
Let me be specific about the problem I'm worried about.

As Joseph pointed out, GCC development is and will be centered around
a 'master' server. If we wind up using a distributed system,
individual developers will take advantage of it to do offline work,
but the master repository will still act as a communication nexus
between us all, and official releases will be cut from there. I doubt
anyone will do releases except from there.[1] The security of this
master server is mission-critical.

The present situation, with CVS pserver enabled for read-only
anonymous access, and write privilege available via CVS-over-ssh, has
two potentially exploitable vulnerabilities that should be easy to
address in a new system.

_Imprimis_, the CVS pserver requires write privileges on the CVS
repository directories, even if it is providing only read access.
Therefore, if the 'anoncvs' user is somehow compromised -- for
instance, by a buffer overflow bug in the pserver itself -- the
attacker could potentially modify any of the ,v files stored in the
repository. This was what I was talking about with my point 0c. It
sounds like all the replacements for CVS have addressed this, by
allowing the anoncvs-equivalent server process to run as a user that
doesn't have OS-level write privileges on the repository.

_Secundus_, CVS-over-ssh operates by invoking 'cvs server' on the
repository host -- running under the user ID of the invoker, who must
have an account on the repository host. It can't perform any
operations that the invoking user can't. Which means that the
invoking user must also have OS-level write privileges on the
repository. Now, such users are _supposed_ to be able to check in
changes to the repository, but they _aren't_ supposed to be able to
modify the ,v files with a text editor. The distinction is crucial.
If the account of a user with write privileges is compromised, and
used to check in a malicious change, the version history is intact,
the change will be easily detected, and we can simply back out the
malice. If the account of a user with write privileges is compromised
and used to hand-edit a malicious change into a ,v file, it's quite
possible that this will go undetected until after half the binaries on the
planet are untrustworthy. It is this latter scenario I would like to
be impossible.

There are several possible ways to do that. One way is the way
Perforce does it: _all_ access, even local access, goes through p4d,
and p4d can run under its own user ID and be the only user ID with
write access to the repository. Another way, and perhaps a cleverer
one, is OpenCM's way, where the SHA of the file content is the
file's identity, so a malicious change will not even be picked up.
(Please correct me if I misunderstand.) Of course, that provides no
insulation against an attacker using a compromised account to execute
"rm -fr /path/to/repository", but *that* problem is best solved with
backups, because a disk failure could have the same effect and there's
nothing software can do about that.
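
To make the content-hash idea concrete, a toy sketch (the objects/
layout and names here are invented, not OpenCM's actual format): if
every stored object is named by the SHA-1 of its contents, a
hand-edited object no longer hashes to its own name, and a periodic
sweep will flag it:

  # store: name each object after the hash of its contents
  sum=`sha1sum "$obj" | cut -d' ' -f1`
  cp "$obj" objects/$sum

  # verify: anything whose contents no longer hash to its name has been
  # tampered with (or corrupted on disk)
  for f in objects/*; do
      sum=`sha1sum "$f" | cut -d' ' -f1`
      [ "`basename $f`" = "$sum" ] || echo "MODIFIED: $f"
  done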

zw
Branko Čibej
2002-12-11 09:09:56 UTC
Permalink
Post by Joseph S. Myers
Post by Walter Landry
arch doesn't interact at all with root. The remote repositories are
all done with sftp, ftp, and http, which is as secure as those servers
are.
Is this - for anonymous access - _plain_ HTTP, or HTTP + WebDAV + DeltaV
which svn uses? One problem there was with SVN - it may have been fixed
by now, and a fix would be necessary for it to be usable for GCC - was its
use of HTTP and HTTPS (for write access); these tend to be heavily
controlled by firewalls and the ability to tunnel over SSH (with just that
one port needing to be open) would be necessary. "Transparent" proxies
may pass plain HTTP OK, but not the WebDAV/DeltaV extensions SVN needs.
There is now a new repository access layer in place that can be easily
piped over SSH and doesn't require Apache on the server side. It's not
as well tested yet, of course.
--
Brane Čibej <***@xbc.nu> http://www.xbc.nu/brane/
Joseph S. Myers
2002-12-09 00:49:49 UTC
Permalink
Post by Tom Lord
1) There are frequent reports on this list of glitches with
the current CVS repository.
The most common problem relates to the fileattr performance optimization.
There are known causes, and a known workaround (remove the cache files
when the problem occurs).

Other problems (occasional repository corruption) may often relate to
hardware problems. BK uses extensive checksumming to detect such failures
early (since early detection means backups can more easily be found); the
RCS format has no checksums. I don't know what svn or arch do here.

There are particular issues that are relevant to GCC (and other CVS users)
that SVN addresses or intends to address as a "better CVS":

* Proper file renaming support.
* Atomic checkins across multiple files (rarely a problem).
* O(1) performance of tag and branch operations. (A major issue for the
snapshot script; when the machine is loaded it can take hours to tag the
tree with the per-snapshot tag, remove the old gcc_latest snapshot tag and
apply the new one (writing to every ,v file several times). Part of the
problem, however, is waiting on locks in each directory, and reducing the
extent to which locks are needed (e.g. avoiding them for anonymous
checkouts) and the time for which they are held would help.)
* Performance of operations (checkout, update, ...) on branches (reading
every file in the tree; the cache mentioned above avoids this problem for
HEAD only).
* cvs update -d and modules (more an issue with merged gcc and src trees)
(I don't know whether svn does modules yet).

I haven't seen an obvious need for major changes in branch merging or
distributed repositories, but people making heavy use of branches may well
have a use for better tools there. It's just that something (a)
supporting file renames and (b) having much better performance (including
on branches) and (c) having better reliability would solve most of the
problems for most of the users. (Not all problems for all users, better
tools aiming towards that are still useful if they don't cause more
trouble in the common case. Checkout, update, diff, annotate, commit
shouldn't be made any more complicated.)
Post by Tom Lord
2) GCC, more than many projects, relies on a distributed
testing effort, which mostly applies to the HEAD revision
and to release candidates. Most of this testing is done
by hand.
Better tools are useful here (I always want more testing and more
testcases) but it isn't much to do with version control, rather with
processing the test results into a coherent form (there used to be a
database driven from gcc-testresults) and getting people to fix
regressions they cause (not a problem lately, but there have been long
periods with the regression tester showing regressions staying for weeks).
Post by Tom Lord
7) Questions about which patches relate to which issues in the
issue database are fairly common.
Better tools may help if they encourage volunteers to do the boring task
of going through incoming bug reports and checking they include enough
information to reproduce them and can be reproduced. But that's a matter
of the long-delayed Bugzilla transition (delayed by human time to set up a
new machine, not by lack of better version control) possibly linked with
some system for bug reports to have enough well-defined fields for
automatic testing.
Post by Tom Lord
9) Distributed testing occurs mostly on the HEAD -- which
means that the HEAD breaks on various targets, fairly
frequently.
It means that HEAD breakage is frequently detected.
Post by Tom Lord
11) Some efforts, such as overhauling the build process, will
probably benefit from a switch to rev ctl. systems that
support tree rearrangements.
I think it's better to just do renames the CVS way (delete and add) now,
rather than waiting, then when changing make the repository conversion
tool smart enough to handle most of the renames that have taken place in
the GCC repository.
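
For the record, the "CVS way" of renaming is just this (a sketch; the
file names are invented):

  mv old-name.c new-name.c
  cvs remove old-name.c
  cvs add new-name.c
  cvs commit -m "Rename old-name.c to new-name.c."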

Better tools such as svn or arch may be useful, but we're not CM
developers so it's just a matter of evaluating such tools when they are
ready (do all the common things CVS does just as easily, are reliable
enough, have good enough (preferably better than CVS) performance for what
we do, solve some of the problems with CVS). Indications (such as above)
of problems with CVS for GCC aren't particularly important, since the main
problems with CVS are well known and affect GCC much as they affect other
projects.
--
Joseph S. Myers
***@cam.ac.uk
Branko Čibej
2002-12-11 09:04:46 UTC
Permalink
Post by Joseph S. Myers
* cvs update -d and modules (more an issue with merged gcc and src trees)
(I don't know whether svn does modules yet).
Subversion does modules a lot better than CVS, if I do say so myself. See

http://svnbook.red-bean.com/book.html#svn-ch-6-sect-3
--
Brane Čibej <***@xbc.nu> http://www.xbc.nu/brane/
Zack Weinberg
2002-12-09 17:40:15 UTC
Permalink
Post by Zack Weinberg
0a. All data stored in the repository is under an end-to-end
checksum. All data transmitted over the network is independently
checksummed (yes, redundant with TCP-layer checksums). CVS does
no checksumming at all.
Doesn't SSH?
I assume it has to, since cryptography usually requires that.
(And CVS does checksum checkouts/updates: if, after applying a diff in cvs
update, the file checksum doesn't match, it warns and re-fetches the whole
file, which can indicate that something was broken in the latest checkin to
the file, yielding a bogus delta.)
I didn't know that. But, as you say, it's not nearly enough. (When
was the last time we got a block of binary zeroes in a ,v file and
nobody noticed for months?)
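
A rough sketch of the idea just described (not CVS's actual code): apply the
delta, verify the whole-file checksum the server sent, and fall back to
re-fetching the complete file on a mismatch. The apply_delta and
fetch_full_file helpers are hypothetical placeholders.

    import hashlib

    def checked_update(local_text, delta, expected_md5, apply_delta, fetch_full_file):
        """Apply a delta, then verify the result against the server's checksum;
        on a mismatch (e.g. a bogus delta from a corrupted ,v file), warn and
        fall back to transferring the whole revision."""
        patched = apply_delta(local_text, delta)
        if hashlib.md5(patched.encode()).hexdigest() == expected_md5:
            return patched
        print("checksum failure after patching; re-fetching the entire file")
        return fetch_full_file()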
0aa. Checksums stored in the repository format for all file
revisions, deltas, log messages etc., with an easy way to verify
them - to detect corruption early.
Worth pointing out that subversion doesn't do as much checksumming as
we'd like, either.
The normal current practice here is for branch ChangeLogs to be kept
in a separate file, not the ChangeLogs that need merging from
mainline. (In the case of BIB the branch ChangeLog then goes on the
top of the mainline one (with an overall "merge from BIB" comment)
when the merge back to mainline is done. For branches developing
new features a new ChangeLog entry describing the overall logical
effect of the branch changes, not the details of how that state was
reached, is more appropriate.)
Unfortunately, this is not how BIB was done, and I'm stuck with the
way it is being done now (the normal ChangeLog files are used, and I
resolve the conflict on every merge). Next time around, it would
certainly be easier to use a separate file -- but better still to
avoid maintaining the files at all.

zw
Jack Lloyd
2002-12-09 18:56:09 UTC
Permalink
Post by Zack Weinberg
0aa. Checksums stored in the repository format for all file
revisions, deltas, log messages etc., with an easy way to verify
them - to detect corruption early.
Worth pointing out that subversion doesn't do as much checksumming as
we'd like, either.
OpenCM (opencm.org) does really good checksumming; everything is based on
strong hashes (and RSA signatures where needed). In particular, your 0d
requirement is met in a way that no other CM system (that I've heard of)
can match. Nobody (not even root) can substitute one file for another or
pull similar nastiness. Well, unless they can break SHA-1 in a really
serious way.
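
A loose illustration of the general idea, not OpenCM's actual on-disk
format: if every object is stored under the SHA-1 of its contents, then
substituting different contents under the same name is caught by simply
re-hashing.

    import hashlib
    import os

    def store(blob: bytes, objdir: str = "objects") -> str:
        """Write a blob under the hex SHA-1 of its contents; return that name."""
        name = hashlib.sha1(blob).hexdigest()
        os.makedirs(objdir, exist_ok=True)
        with open(os.path.join(objdir, name), "wb") as f:
            f.write(blob)
        return name

    def verify(name: str, objdir: str = "objects") -> bool:
        """Re-hash the stored contents; any substitution shows up as a mismatch."""
        with open(os.path.join(objdir, name), "rb") as f:
            return hashlib.sha1(f.read()).hexdigest() == name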

I'll mention I work on OpenCM (it's my day job), and additionally I promise
I won't go endlessly promoting it on the list. I'd be happy to answer any
questions off list if you like, but this is the first and last time I'll
bring it up here.

-Jack
Robert Dewar
2002-12-12 13:17:07 UTC
Permalink
Post by Phil Edwards
Heh. I never thought I'd hear that term applied to software development.
I like it.
(term = centripetal)

Perhaps that's because the image/allusion is entirely unclear :-)
Nathanael Nerode
2002-12-15 01:42:55 UTC
Permalink
Post by Linus Torvalds
But the point is, the "CVS mentality" means that a fork is harder to
merge than a branch, and you often lose all development history when
you merge a
fork as a result of this (yeah, you can do a _lot_ of work, and try to
also merge the SCM information on a fork merge, but it's almost never
done because it is so painful).
In GCC, we've been known to lose development history when we merge a
branch, and merging branches has been incredibly painful. So I'm not
sure merging forks is actually harder; merging branches may be. ;-)

Fork merges get submitted as a series of patches (which then need to get
approved), and associated ChangeLog entries. They go in pretty cleanly.
The fork developer can track his/her own internal change history however
he or she likes, but generally will submit an 'expurgated' history for
merging, devoid of the false starts, which makes the patches a lot easier
to review. This is in fact an argument in favor of losing
development history. ;-D
Post by Linus Torvalds
And the "CVS mentality" totally makes that impossible. Subversion seems
to be only a "better CVS", and hasn't gotten away from that mentality,
which is sad.
Well, Subversion aims only to be a "better CVS", according to its
mission statement. Frankly, a 'better CVS' would help a lot for GCC.
The GCC development plan actively encourages the use of branches for most
development (as opposed to bug fixes). But CVS makes it less than easy
and less than fast.

In addition, CVS makes setting up an anonymously accessible repository
into a pain in the neck; in this case 'forking' to your own repos has a
stupid and unnecessary overhead. Theoretically this should be easier
with Subversion, so there should be more private repositories floating
around.

I think that there are a few very small features which would make
Subversion fairly effective for the typical use case of "I have a
branch, I track a particular 'mainline', and intermittently I merge into
the mainline", even if the 'branch' is in a different repos from the
'mainline', and the 'mainline' is a branch. But I said that on the svn
list...
Post by Linus Torvalds
Yet it is the _cheap_ branches that should be the true first-class
citizen. Potentially throw-away code that may end up being really
really useful, but might just be a crazy pipe-dream. The experimental
stuff that would _really_ want to have nice source control.
Interestingly, I tend to find that this sort of stuff is exactly what
*doesn't* need source control; source control simply obscures the
process by exposing too much development history, much of which has no
relevance to the current version. Or did you mean code that already
works, and is being refined, rather than code in the 'rewrite from
scratch every two weeks' stage?

--Nathanael
Linus Torvalds
2002-12-15 04:13:35 UTC
Permalink
Post by Nathanael Nerode
In GCC, we've been known to lose development history when we merge a
branch, and merging branches has been incredibly painful. So I'm not
sure merging forks is actually harder; merging branches may be. ;-)
Heh. That's a sad statement about CVS branches in itself.
Post by Nathanael Nerode
Fork merges get submitted as a series of patches (which then need to get
approved), and associated ChangeLog entries. They go in pretty cleanly.
This is actually not that different from the "old" Linux way, ie the SCM
does _nothing_ for merging stuff. It certainly worked fine for me, and
it's how about half of the Linux developers still work.

The advantage of SCM-assisted merges is really that when you trust
the other side, it becomes a non-issue. So to some degree you might as
well think of an SCM-assisted merge as having "write access" to the tree,
except it's a one-time event rather than a continuing process (but
unlike CVS write access it doesn't _need_ to be constant, since both
sides have access to their own SCM structures on their own, and don't
need to merge all the time).
Post by Nathanael Nerode
The fork developer can track his/her own internal change history however
he or she likes, but generally will submit an 'expurgated' history for
merging, devoid of the false starts, which makes the patches a lot easier
to review. This is in fact an argument in favor of losing
development history. ;-D
We do that with BK too, occasionally. It's sometimes just cleaner to
create a new clone with a "cleaned up" revision history. It's not needed
all that often, but I certainly agree that sometimes you just don't want
to see all the mistakes people initially made.

It's also needed in BK for things like merging from two totally
different repositories - you can't auto-merge just one fix from a
Linux-2.4.x BK tree into a 2.5.x BK tree, for example (when you merge in
BK, you merge _everything_ in the two repositories). So those have to
be done as patches, kind of like the clean-up thing.
Post by Nathanael Nerode
Post by Linus Torvalds
Yet it is the _cheap_ branches that should be the true first-class
citizen. Potentially throw-away code that may end up being really
really useful, but might just be a crazy pipe-dream. The experimental
stuff that would _really_ want to have nice source control.
Interestingly, I tend to find that this sort of stuff is exactly what
*doesn't* need source control; source control simply obscures the
process by exposing too much development history, much of which has no
relevance to the current version. Or did you mean code that already
works, and is being refined, rather than code in the 'rewrite from
scratch every two weeks' stage?
I personally find that _most_ changes by far tend to be fairly small and
simple, and take a few hours or days to do. Yet at the same time, you
want to have access to a lot of the SCM functionality (commit one set of
changes as "phase 1 - preparation", "phase 2 - update filesystems" etc).

At the same time, the tree often doesn't work until all phases are done,
so you do NOT want to commit "phase 1" to the CVS head - and creating a
CVS branch for something that is really not a big project is clearly not
something most people want to do. The pain of the branch is bigger than
it's worth.

And THIS is where the distributed repository nature of BK really shines.
It's a totally everyday thing, not something odd or special. You can
work in your own repository, with all the SCM tools, and document your
changes as you make them (and undo something if you notice it was
wrong). Yet you do _not_ need to pollute anything that somebody else
might be working on.

And then, when you're ready, you just push your changes to some other
tree (in BK it's an atomic operation to push _multiple_ changesets), and
tell others that you're done.

See? I'm not talking about a big six-month project. I'm talking about
something that potentially is just a few hours. You might do your first
cut, and check it into your tree, verify that it works, and then you
might want to go back and make another few check-ins to handle other
cases.

In gcc terms, let's say that you change the layout of something
fundamental, and you first make sure that the C front-end works. You
check that in and test it (on the C side only) as one stage. Only when
you're happy with that do you even _bother_ to start editing the C++ and
other front-ends.

With distributed trees, it's easy to make these kinds of multi-stage
things. Because nobody else sees what you're doing until you actually
decide to export the thing. With CVS, it's a total _disaster_ to do
this (and the way everybody works is to do all the work _without_ SCM
support, and then try to do one large check-in).
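
A bare sketch of that multi-stage workflow, with a made-up 'dvcs' command
standing in for BK or any similar tool (the subcommand names and the push
URL are placeholders, not real syntax):

    import subprocess

    DVCS = "dvcs"  # placeholder for bk or any distributed SCM; not a real command

    def commit_stage(message):
        """Record one self-contained stage in the private repository only."""
        subprocess.run([DVCS, "commit", "-m", message], check=True)

    def publish(remote):
        """Push all accumulated changesets to the shared tree in one operation."""
        subprocess.run([DVCS, "push", remote], check=True)

    # Stages that never touch anyone else's tree until the final push:
    commit_stage("phase 1 - change the fundamental layout, fix the C front end")
    commit_stage("phase 2 - update the C++ and other front ends")
    publish("scm://shared.example.org/mainline")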

Linus
Tom Lord
2002-12-15 04:48:28 UTC
Permalink
The advantage of the SCM-assisted merges is really that when
you trust the other side, it becomes a non-issue.

It helps even when you don't implicitly trust the other side.

A remote, less-than-implicitly-trusted developer submits a patch. You
kick it back with comments. Meanwhile, your mainline has gone on.

Before resubmitting, that developer has to update his patch to reflect
the new head-of-mainline.

If that remote developer has his own repository that is nonetheless a
true, first-class branch of your mainline, then he can use SCM-assisted
merges to keep his patch up-to-date.

A similar case occurs if you accept his patch, but then there's still
more to be done with it -- further development. In that case, there's
effectively back-and-forth merging between your mainline and his
remote branch. "star topology merging" handles exactly that case.

In relation to these features, it's interesting to read the recent
narrowly-on-topic traffic on the gcc list (Mark and Zack's
coordination with everyone else, comments about forming intermediate
merges, and calls for help with testing branches). I think that some
of the issues around synchronizing work would be somewhat relaxed by
applying these features; intermediate merges would be well
facilitated and _partially_ automated (regardless of the access rights of
the authors); testing branches could be made more effectively
automated (again, orthogonally to access rights).

-t
Zack Weinberg
2002-12-15 07:12:23 UTC
Permalink
Post by Tom Lord
In relation to these features, it's interesting to read the recent
narrowly-on-topic traffic on the gcc list (Mark and Zack's
coordination with everyone else, comments about forming intermediate
merges, and calls for help with testing branches). I think that some
of the issues around synchronizing work would be somewhat relaxed by
applying these features; intermediate merges would be well
facilitated and _partially_ automated (regardless of the access rights of
the authors); testing branches could be made more effectively
automated (again, orthogonally to access rights).
Oh, no kidding. I totally agree that a distributed repository system
is the way to go long term, and you'll notice that easy branching was
close to the top of my requirements list.

My comment about the Linux kernel development process was intended as
a comment about the way that project operates, which I happen to have
concerns about, but I'm pretty sure that's the way it has always
operated, independent of version-control system in use.
Post by Tom Lord
Many people want to check stuff into the CVS tree _not_ because they
really want everybody to see the thing, but because they use the CVS tree
as a way to communicate with a few other people working on the same thing.
That's where the "single main repository" _really_ falls down.
And this is the assumption underlying the part of the kernel process
that I have concerns about. It's true that mostly only a small number
of people work on any given chunk of a large piece of software, but I
strongly disagree that they should go off by themselves and merge into
the main tree only when they're done, as the default mode of
development. The effect from everyone else's point of view is that
they pop up and inflict a huge indigestible glob of code on everyone,
which invariably has not been tested thoroughly enough and breaks
stuff. Just watching linux-kernel suggests that this happens all the
time. The USAGI patches, for instance, or Rusty's module rewrite.
We've had our share of this in GCC, too; the 'subreg_byte' changes
were maintained at enormous effort separate from the main tree for
_years_ before they accumulated enough evidence that they wouldn't break
anything.

Contrariwise, forcing people to do their work in the main tree means
that they have to do it incrementally, more eyes at least skim the
code, and it naturally gets adequate testing as other people try to do
their own work. Also, I know that every time I've had to stop and
think how to do a transition incrementally, the end result has been
better off for it. Al Viro's rewrite of the VFS layer is the obvious
example of this, or Jan Hubicka's jump optimizer overhaul.

As a user of the kernel, I am also concerned about the apparent trend
toward having many variant trees not for development, but to present
divergent sets of features and bugfixes. It strikes me as a recipe
for user confusion and vendor malfeasance.

zw
Tom Lord
2002-12-15 07:43:12 UTC
Permalink
It's true that mostly only a small number of people work on any
given chunk of a large piece of software, but I strongly
disagree that they should go off by themselves and merge into
the main tree only when they're done, as the default mode of
development.


Well, my ideal is that changes to the mainline should occur only
_after_ they have verifiably passed all the available tests on a wide
range of platforms (a process that can be fully automated) and the
changes have passed senior engineer reviews (a process that can be
facilitated by substantial automated assistance). Mainlines should
increase in quality in a strictly monotonic fashion -- that's the
essence of what "gatekeeper management" is all about. Neither GCC nor
lk has that property -- though better tools can do much to put us
there. With good tools, the release manager can ultimately be
replaced by shell scripts.


As a user of the kernel, I am also concerned about the apparent
trend toward having many variant trees not for development, but
to present divergent sets of features and bugfixes. It strikes
me as a recipe for user confusion and vendor malfeasance.

Don't pull back -- go Furthur. There are a lot of good side effects
that can be obtained by giving each _end-customer_ (or nearly so)
their own source fork -- not _only_ feature divergence, but also (and
especially) risk management and emergency preparedness. Divergent
fixes and bugfixes aren't the end of the world. On the contrary,
they're a good thing.

Centralizing (in a small number of vendors) binary preparation for
millions (or for thousands of enterprises) is, in and of itself,
"vendor malfeasance". It creates vulnerabilities, slows down
responsiveness to customer needs, and enables monopolistic "lock-in".

The idea that either of these projects is way ahead of the other in
terms of process is, I think, not quite right. They're both about
equally far from the ideal.

But sure, I'm describing an ideal point that's on the horizon -- not
a change you should try to make next week. Just something to stroll
towards.

-t
Nathanael Nerode
2002-12-16 02:08:50 UTC
Permalink
Post by Linus Torvalds
Naah. It's simple - kernels are just sexier.
That I believe. :-)
Post by Linus Torvalds
Seriously, I think it's just that a kernel tends to have more different
_kinds_ of problems, and thus tend to attract different kinds of
people, and more of them.
I really don't think so. A kernel has more easy problems, in a
certain sense: more problems that people with little background can get
started on. A compiler has *lots* of different types of problems, but
they're almost all rather complicated and hard to start on, for a reason
I mention at the end of this.
Post by Linus Torvalds
people who are interested in one of them. So you'll find people who
care about filesystems,
We got that in compilers...
Post by Linus Torvalds
or people who care about memory management, or
We got that...
Post by Linus Torvalds
people who find it interesting to do concurrency work
We got that...
Post by Linus Torvalds
or IO paths.
We got that...

Heh. I think the real reason why fewer people work on compilers is
one of abstraction. To oversimplify, a kernel just *does* things.
A compiler is one level more abstract; it takes input and converts it
into code which does things. It takes some practice to get comfortable
with, and good at, that style of coding, and it's significantly harder
to debug. While I'm sure there's some kernel code like that, *most*
compiler code is like this. Given GCC's design, we even have two
levels of abstraction: the build system compiles and runs programs
to generate a compiler for a specific host-target pair. This is code
to generate code to generate code.

--Nathanael
Robert Dewar
2002-12-16 10:16:29 UTC
Permalink
Post by Tom Lord
Well, my ideal is that changes to the mainline should occur only
_after_ they have verifiably passed all the available tests on a wide
range of platforms (a process that can be fully automated) and the
changes have passed senior engineer reviews (a process that can be
facilitated by substantial automated assistance). Mainlines should
increase in quality in a strictly monotonic fashion -- that's the
essence of what "gatekeeper management" is all about. Neither GCC nor
lk have that property -- though better tools can do much to put us
there. With good tools, the release manager can ultimately be
replaced by shell scripts.
With GNAT, we let everyone within ACT, which is quite a diverse set of folks
(about 35 in all), change anything in the mainline, but we guarantee the
monotonic property (I agree this is crucial) by enforcing fairly strenuous
requirements on anyone making a change. No change of any kind (not even
something that is "obviously" safe) is allowed without first doing a complete
bootstrap and running the entire regression suite (which is pretty
comprehensive at this stage). Now, we only require this on one target
for changes that are expected to be target independent, so it is possible
to have unanticipated hits on other targets. We deal with this by building
the system on all targets every night and running the regression suites on
all targets every night. If the reports in the morning indicate a problem,
then it is all hands on deck to fix it.

When we get GNAT properly integrated into GCC, which involves several
things still to be done:

1. We need to get to a release point internally where the GCC 3 based GNAT
passes all regression tests etc. We are close to this, and expecting to
do a beta release in January on selected targets (should include Solaris,
Windows, GNU/Linux, HPUX).

2. We need to get the sources and our internal source procedures more
amenable to GCC style (e.g. we have removed the version numbers from
our sources, and adjusted all our scripts for this change recently).

3. We need to establish the ACATS test suite so that anyone can run it. This
is not as comprehensive as our internal test suite (which is not distributable
since it is mostly proprietary code).

4. We need to set up procedures so we can run and test changes that others
make against our internal test suite.

... then hopefully we can duplicate at least some of these procedures
so that others outside ACT can follow a similar path. We regard
this kind of automatic testing as absolutely crucial.
Post by Tom Lord
With good tools, the release manager can ultimately be
replaced by shell scripts.
I don't believe that, based on our experience where we have elaborate
scripts that try to automate everything, but you still need a release
manager to coordinate activities and check that everything is working
as expected.
Tom Lord
2002-12-16 11:05:51 UTC
Permalink
with GNAT, we let everyone within ACT, which is quite a diverse
set of folks about 35 in all, change anything in the mainline,
but we guarantee the monotonic properly (I agree this is
crucial) by enforcing fairly strenuous requirements on anyone
doing a change. [....]

Thanks for the report.

A lot of the thinking behind arch is to scale up and simplify adopting
practices such as you describe so that they are applied by default to
pretty much all of the free software (and "open source") projects in
the world. With your 35, you have social pressures and the power of
the employer to enforce restrictions like "run the tests before
committing to mainline" -- but wouldn't it be nice if that were
automated: so a developer could hit the "try to test and merge" button
before going home for the night, coming back in the morning to either
a commit email or a list of test failures -- and if you _didn't_ have
to write all your own tools for that automation because they were just
there already, such that setting up a new project with these
properties was as easy as creating a project on Savannah currently is
(or, easier :).
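
A small sketch of the kind of "try to test and merge" driver being imagined
here, assuming a test script and a merge script already exist; the
./run-testsuite and ./merge-to-mainline commands, the addresses and the mail
host are all hypothetical:

    import smtplib
    import subprocess
    from email.message import EmailMessage

    def try_test_and_merge(developer):
        """Run the test suite against the candidate tree; merge to mainline only
        if everything passes, otherwise mail the failures back to the developer."""
        result = subprocess.run(["./run-testsuite"], capture_output=True, text=True)
        msg = EmailMessage()
        msg["From"] = "merge-daemon@localhost"
        msg["To"] = developer
        if result.returncode == 0:
            subprocess.run(["./merge-to-mainline"], check=True)
            msg["Subject"] = "merged to mainline"
            msg.set_content("All tests passed; your changes were committed.")
        else:
            msg["Subject"] = "test failures; nothing merged"
            msg.set_content(result.stdout + result.stderr)
        with smtplib.SMTP("localhost") as s:
            s.send_message(msg)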
Post by Tom Lord
With good tools, the release manager can ultimately be
replaced by shell scripts.
I don't believe that, based on our experience where we have
elaborate scripts that try to automate everything, but you still
need a release manager to coordinate activities and check that
everything is working as expected.

Yeah, I said it badly. Observing gcc, Mark makes lots of judgment
calls and sets focus and picks dates and things like that. I only
meant that a lot of the mechanical source and repository manipulation
chores could be far better automated, rather than foisted off onto poor
ol' Zack :)

I have a question about ACT: how much value do you place on "clean
changesets", and how much more would you place on them if they were easier
to manipulate?

Feature branches of GCC get smooshed into big merges on mainline, and
I'm wondering whether others share my feeling that keeping changes
independent is worthwhile. Maybe this applies more to the kernel, where
long-lived distribution forks regularly differ in their choices of
which changes to apply.

It's an interesting question for a rev ctl implementor because, if
maintaining clean changesets is a big deal, then I think you want some
high-level features to help maintain them over time. For example, if
one change appears on mainline, then months later, a fix related to
that change appears -- then you want to help cherry-pickers associate
those two and combine them. Another example: under some
circumstances, you want to recognize textually distinct merges that do
nothing more than combine equal sets of changes as, in some sense,
equivalent (as in: I did the same merge as him, but in my own way --
even though I don't have _his_ patch for that merge, don't treat my
tree as if it were missing that patch).

My second paragraph above says:

A lot of the thinking behind arch is to scale up practices such
as you describe so that they are applied by default to pretty
much all of the free software (and "open source") projects in
the world.

That's part of why I'm pretty frustrated. I think that is "obviously"
doable and "obviously" generates an (admittedly hard to measure in
conventional ways) ROI on R&D investment in arch. Nobody seems to
know how to do that in industry, though -- or even how to consider the
correctness of my assessment. People know how to do things like "buy
Rational".

-t
Tom Lord
2002-12-16 11:26:34 UTC
Permalink
Post by Tom Lord
Yeah, I said it badly. Observing gcc, Mark makes lots of
judgment calls and sets focus and picks dates and things like
that.
And even there, maybe I said it badly again. I'm really not trying to
sell Mark short. "Generally keeps on top of things in gcc
development (no small task)" might be a better description -- i.e.,
not just throwing darts at a calendar and so forth.

-t
Florian Weimer
2002-12-17 00:16:10 UTC
Permalink
Post by Tom Lord
A lot of the thinking behind arch is to scale up and simplify adopting
practices such as you describe so that they are applied by default to
pretty much all of the free software (and "open source") projects in
the world. With your 35, you have social pressures and the power of
the employer to enforce restrictions like "run the tests before
committing to mainline" -- but wouldn't it be nice if that were
automated: so a developer could hit the "try to test and merge" button
before going home for the night, coming back in the morning to either
a commit email or a list of test failures -- and if you _didn't_ have
to write all your own tools for that automation because they were just
there already, such that setting up a new project with these
properties was as easy as creating a project on Savannah currently is
(or, easier :).
Well, you have been able to do this with Aegis for a couple of years now,
but I don't see Aegis being adopted by the free software crowd. Most of
its members seem to have strong reservations about processes that
are enforced by software. ;-)
Momchil Velikov
2002-12-17 08:17:24 UTC
Permalink
Florian> Well, you can do this using Aegis for a couple of years now, but I
Florian> don't see that Aegis is adopted by the free software crowd. Most of
Florian> its members seem to have strong reservations regarding processes which
Florian> are enforced by software. ;-)

And rightfully so. Aegis' model of assigning "changes" to developers
doesn't quite fit volunteer-driven projects.

~velco
Daniel Egger
2002-12-17 18:49:12 UTC
Permalink
Post by Momchil Velikov
And rightfully so. Aegis' model of "change" assignments to developers
doesn't quite fit the volunteer driven projects.
And it's a pain in the rear to use. Frankly, _I_ want a system consisting
of _one_ Swiss-army-knife tool and not dozens of more or less obviously
christened tools cluttering up /usr/bin. The system has to be easy to
explore, not necessarily easy to set up, and the basic features have to be
easily accessible; cvs and subversion provide that, bk not to the same
extent, but Aegis is far down the drain in this regard.
I claim that having a CM tool which is trivial to get started with is
important so as not to scare possible volunteers away; as a rule of thumb,
no more than 3 (better, 2) easily explainable steps should be necessary
to retrieve a specific version.
--
Servus,
Daniel
Tom Lord
2002-12-17 08:53:08 UTC
Permalink
Florian> Well, you can do this using Aegis for a couple of
Florian> years now, but I don't see that Aegis is adopted by
Florian> the free software crowd. Most of its members seem to
Florian> have strong reservations regarding processes which are
Florian> enforced by software. ;-)
Post by Momchil Velikov
And rightfully so. Aegis' model of "change" assignments to
developers doesn't quite fit the volunteer driven projects.
Eh.

The volunteers on some really key projects tend to be either
corporations (or corporate-sponsored hackers) or just really
serious-minded individual hackers.

I think audiences like that can appreciate a little bondage and
discipline from their tools if the overall effect is to make their
life easier and more pleasant and their work-product quality higher.
Well, actually, I don't just _think_ this, I _know_ it because that's
part of how GCC works already.

Moreover, those key projects look to me like trend setters. They can
raise the bar for other projects by leading by example.

I agree that Aegis isn't quite the right fit for the development
practices we see in free software projects -- but (a) process
enforcement isn't in and of itself the problem (relentless, fixed-form
process enforcement and inflexibility is), and (b) I've at least
casually tried to interest Peter Miller in contributing some lessons
from Aegis to the arch superstructure. I think it's a good fit.

All: Not to gush, but: The engineering process quality on gcc is so
damn high, especially comparatively. I feel _very_ self-conscious
making so much noise on this list -- so please realize that it's a
very considered decision to do so. In an ideal universe I'd just be
able to say "Here's arch 1.0, I'm sure you'll like it", but fiscal
reality means I have to advocate earlier in the development cycle. I
know there's no global GCC budget or anything like that, but there is
a human network here and it does connect to people that can spend
money, with relatively few degrees of separation.

-t
Richard Kenner
2002-12-16 12:42:35 UTC
Permalink
but wouldn't it be nice if that were automated: so a developer could
hit the "try to test and merge" button before going home for the
night, coming back in the morning to either a commit email or a list
of test failures

I'm not sure I like that kind of automation because of the potentially
unknown delay in the testing process (what if the queue that runs the
tests got stuck?). I'd want to be able to know and control exactly
*when* the change went in.
Tom Lord
2002-12-16 12:54:58 UTC
Permalink
but wouldn't it be nice if that were automated: so a developer
could hit the "try to test and merge" button before going home
for the night, coming back in the morning to either a commit
email or a list of test failures

I'm not sure I like that kind of automation because of the
potentially unknown delay in the testing process (what if the
queue that runs the tests got stuck). I'd want to be able to know
and control exactly *when* the change went in.


I think your worry is premature, but understandable. We're bumping up
against the limits of email.

It's hard to explain abstractions through specific instantiations, but
harder to explain (and be heard explaining) abstractions through
abstract language. So in forums like this, I think I tend to state
instances, and hope readers form the underlying abstractions for
themselves. I'm aiming for the SYNC! or "Aha!" experience.

So, I say "going home for the night, coming back in the morning" --
but there's a less specific abstraction behind that I'm pointing at.

Computers keep getting cheaper and faster. I think in a few short
years, we developers will each have tons of them for day-to-day work.
What Savannah does for the maddening crowd asynchronously and slowly,
you and I will have synchronously and quickly. (Which is a related
observation, if you think about it.)


-t