Discussion:
[fossil-users] Why Hash
Scott Doctor
2015-09-09 08:16:04 UTC
Why does Fossil use a hash for an entry's identity instead of sequential
numbering? Seems simply using the rowid of the associated database table
would be more meaningful and practical than those long strings of
arbitrary numbers.
--
------------
Scott Doctor
***@scottdoctor.com
---------------------
Stanislav Paskalev
2015-09-09 08:17:52 UTC
Because it is distributed. If sequential numbering is used, two
disconnected clients can happen to use the same number for different
artifacts.
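A minimal sketch of the collision Stan describes, in Python. The `Clone` class is hypothetical (not Fossil code): each clone hands out the next integer from a local counter and also keys every artifact by the SHA1 of its bytes. Two disconnected clones end up reusing the same number for different artifacts, while the content hashes stay distinct.

```python
import hashlib


class Clone:
    """A toy repository clone that numbers artifacts with a local counter."""

    def __init__(self):
        self.next_id = 1
        self.by_number = {}  # local sequential number -> content
        self.by_hash = {}    # SHA1 hex digest -> content

    def commit(self, content: bytes):
        number = self.next_id
        self.next_id += 1
        digest = hashlib.sha1(content).hexdigest()
        self.by_number[number] = content
        self.by_hash[digest] = content
        return number, digest


alice, bob = Clone(), Clone()
a_num, a_hash = alice.commit(b"Alice: fix linker parameter file")
b_num, b_hash = bob.commit(b"Bob: increase speed threshold")

# Both offline clones picked number 1 for different artifacts...
print(a_num == b_num)    # True
# ...but the content hashes differ, so a later sync can tell them apart.
print(a_hash == b_hash)  # False
```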

Regards,
Stan
Stanislav Paskalev
_______________________________________________
fossil-users mailing list
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Stephan Beal
2015-09-09 08:18:13 UTC
Post by Scott Doctor
Why does Fossil use a hash for an entries identity instead of sequential
numbering? Seems simply using the rowid of the associated database table
would be more meaningful and practical than those long strings of arbitrary
numbers.
A DVCS cannot use sequential numbering. It's impossible.
--
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
Luca Ferrari
2015-09-09 11:19:42 UTC
Post by Scott Doctor
Why does Fossil use a hash for an entries identity instead of sequential
numbering? Seems simply using the rowid of the associated database table
would be more meaningful and practical than those long strings of arbitrary
numbers.
Some DVCS, like hg, use both a hash and a sequential number. The
latter is simpler for keeping track of things locally, but the hash is used
when things get distributed to the outside world.
So, in the end, you need a UUID or something like it for your
repository to be able to connect to other repositories.

Luca
Richard Hipp
2015-09-09 11:24:33 UTC
Post by Luca Ferrari
Some DVCS, like hg, use both an hash and a sequential number.
Fossil also uses both the SHA1 hash and a sequential number. But only
the hash is guaranteed to be stable (unchanged over the life of the
project) and the same across all clones of the repo. So only the hash
is shown and used.

You can see the sequence number for your repository by appending the
"showid" query parameter to the /timeline page. Example:

https://www.fossil-scm.org/fossil/timeline?y=ci&showid
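A rough illustration of the two identifiers Richard describes, using Python's built-in `sqlite3` module. The `blob` table with `rid` and `uuid` columns below is an assumption made for illustration, not a claim about Fossil's exact schema. Two clones that receive the same artifacts in a different order agree on every hash but not on the local row numbers, which is why only the hash is shown and used.

```python
import hashlib
import sqlite3


def make_clone(artifacts):
    """Store artifacts in arrival order; SQLite assigns the local row numbers."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE, content BLOB)")
    for content in artifacts:
        uuid = hashlib.sha1(content).hexdigest()
        db.execute("INSERT INTO blob(uuid, content) VALUES (?, ?)", (uuid, content))
    return db


def rid_to_uuid(db):
    """Local row number -> global content hash."""
    return dict(db.execute("SELECT rid, uuid FROM blob ORDER BY rid"))


artifacts = [b"check-in A", b"check-in B", b"check-in C"]
clone1 = make_clone(artifacts)
clone2 = make_clone(list(reversed(artifacts)))  # same data, different sync order

map1, map2 = rid_to_uuid(clone1), rid_to_uuid(clone2)
print(set(map1.values()) == set(map2.values()))  # True: same hashes everywhere
print(map1[1] == map2[1])                        # False: rid 1 is a different artifact
```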
--
D. Richard Hipp
***@sqlite.org
Stephan Beal
2015-09-09 11:28:53 UTC
Post by Richard Hipp
Fossil also uses both the SHA1 hash and a sequential number. But only
the hash is guaranteed to be stable (unchanged over the life of the
project) and the same across all clones of the repo. So only the hash
is shown and used.
You can see the sequence number for your repository by appending the
https://www.fossil-scm.org/fossil/timeline?y=ci&showid
But be warned that they are not incremented in steps of 1. Any and all blob
data put into the db gets its own number, so commits are normally spaced out
several numbers apart: their file contents get added to the db before
their accompanying commit blob does, and each piece of that content gets a
(local/transient) number as well. Never, ever rely on those values
(colloquially called "RIDs") for anything useful.
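A toy model of the spacing Stephan describes, assuming each file's content is stored as its own blob before the commit's own blob. The `store` and `commit` helpers and the simplified manifest text are hypothetical; real Fossil manifests carry more information. Unchanged file content is not stored twice, which is why the gap between commit numbers varies.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT UNIQUE, content BLOB)")


def store(content: bytes) -> int:
    """Insert one blob (if it is new) and return its local row number."""
    uuid = hashlib.sha1(content).hexdigest()
    db.execute("INSERT OR IGNORE INTO blob(uuid, content) VALUES (?, ?)", (uuid, content))
    return db.execute("SELECT rid FROM blob WHERE uuid = ?", (uuid,)).fetchone()[0]


def commit(files: dict, comment: str) -> int:
    """Store every file blob first, then a simplified manifest blob listing them."""
    lines = [f"F {name} {hashlib.sha1(data).hexdigest()}" for name, data in sorted(files.items())]
    for data in files.values():
        store(data)
    manifest = "\n".join([f"C {comment}", *lines]).encode()
    return store(manifest)


first = commit({"a.c": b"int a;", "b.c": b"int b;"}, "initial")
second = commit({"a.c": b"int a = 1;", "b.c": b"int b;"}, "edit a.c")
print(first, second)  # 3 5: file blobs consumed the numbers in between
```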
Richard Hipp
2015-09-09 11:31:03 UTC
Post by Stephan Beal
But be warned that they are not incremented in steps of 1. Any and all blob
data in to the db gets its own number, so commits are normally spaced out
several numbers apart because their file content get added to the db before
their accompanying commit blob does, and that content each gets a
(local/transient) number as well. Never, ever rely on those values
(colloquially called "RIDs") for anything useful.
Indeed. The showid query parameter is undocumented and is intended
for debugging only.
Ron W
2015-09-09 18:19:04 UTC
Post by Luca Ferrari
Some DVCS, like hg, use both an hash and a sequential number.
As I recall (been a few years since I last used hg), the numbers were
"relative" to the output of hg's equivalent to "timeline".

Assuming I am remembering correctly, if Fossil had this feature, you could
do something like:

$ fossil timeline -N -n 3
0 [d28be5063a] *CURRENT* Fix linker parameter file
1 [10a5af61c1] Alt code for HS interface
2 [5250e3796e] Increase speed threshold
$ fossil info 1
uuid: 10a5af61c1fc25060ad428de9c82e3615b45f6c8 ...

The numbers, of course, could change after any sync or commit.
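A minimal sketch of the relative numbering Ron imagines. Fossil has no such `-N` option (the message itself is hypothetical), so the function below just indexes a newest-first list holding the three example hashes; `ffff000000` is a made-up later check-in. Index 0 is always the newest entry, so the same index names a different check-in once anything newer arrives.

```python
def resolve_relative(timeline, n):
    """Map a relative index (0 = newest) to a hash in a newest-first timeline."""
    if not 0 <= n < len(timeline):
        raise IndexError(f"no check-in at relative position {n}")
    return timeline[n]


timeline = ["d28be5063a", "10a5af61c1", "5250e3796e"]  # newest first
print(resolve_relative(timeline, 1))  # 10a5af61c1

# After a sync or commit adds a newer check-in, every index shifts by one.
timeline.insert(0, "ffff000000")  # hypothetical new check-in
print(resolve_relative(timeline, 1))  # d28be5063a
```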
j. van den hoff
2015-09-09 19:12:33 UTC
Post by Ron W
As I recall (been a few years since I last used hg), the numbers were
"relative" to the output of hg's equivalent to "timeline".
Assuming I am remembering correctly, if Fossil had this feature, you could
$ fossil timeline -N -n 3
0 [d28be5063a] *CURRENT* Fix linker parameter file
1 [10a5af61c1] Alt code for HS interface
2 [5250e3796e] Increase speed threshold
$ fossil info 1
uuid: 10a5af61c1fc25060ad428de9c82e3615b45f6c8 ...
The numbers, of course, could change after any sync or commit.
in a breach of promise to myself to never again argue in favour of this
functionality on the fossil mailing list (it came up a few times over the
last years):

having simple chronological checkin numbers as an alternative way of
specifying checkins _locally_ just the way hg has done for years would be
a *good* thing. simply because for most projects (all the small ones out
there) specifying chronological numbers is shorter/easier than specifying
(unique, min. 4-digit prefixes of) sha1 hashes. and the "chronological
property" is helpful in itself, e.g. in comparing 'current vs.
previous' checkin. and until checkin 9999 it's at least break-even in terms
of typing effort. the fact that those chronological checkin numbers are a
local property of each clone/checkout rather than of the repo proper is
beside the point in my view: it is true but mostly irrelevant. I concede
that there might arise confusion if people are really not aware of the
potential ambiguity of those chronological numbers across different clones
if they start to argue about a certain checkin. but when interacting with
fossil it cannot have adverse effects afaics. rather the opposite.
--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Baruch Burstein
2015-09-09 20:43:07 UTC
Post by j. van den hoff
in a breach of promise to myself to never again argue in favour of this
functionality on the fossil mailing list (it came up a few times over the
having simple chronological checkin numbers as an alternative way of
specifying checkins _locally_ just the way hg has done for years would be
a *good* thing. simply because for most projects (all the small ones out
there) specifying chronological numbers is shorter/easier than specifying
(unique min 4-digits prefixes of) sha1 hashes. and the "chronologic
property" itself is helpful in itself, e.g in comparing 'current vs.
previous' checkin. and until checkin 9999 its at least break even in terms
of typing effort. the fact that those chronological checkin numbers are a
local property of each clone/checkout rather than of the repo proper is
beside the point in my view: it is true but mostly irrelevant. I concede
that there might arise confusion if people are really not aware of the
potential ambiguity of those chronological numbers across different clones
if they start to argue about a certain checkin. but when interacting with
fossil it cannot have adverse effects afaiks. rather the opposite.
If I understand correctly, the way fossil is designed could cause the
numbers to change *even locally* upon a rebuild, or even just a sync. This
would probably get confusing.
--
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı
Stephan Beal
2015-09-10 06:05:09 UTC
Post by Baruch Burstein
If I understand correctly, the way fossil is designed could cause the
numbers to change *even locally* upon a rebuild, or even just a sync. This
would probably get confusing.
Correct. And if i'm not mistaken, if you rebuild with the --randomize
option then the order could get even weirder.

(@Joerg: i was trying to remember who it was who used to ask for this
feature ;)
j. van den hoff
2015-09-10 13:17:53 UTC
Post by Stephan Beal
Post by Baruch Burstein
If I understand correctly, the way fossil is designed could cause the
numbers to change *even locally* upon a rebuild, or even just a sync. This
would probably get confusing.
Correct. And if i'm not mistaken, if you rebuild with the --randomize
option then the order could get even weirder.
well, I'm only talking about the ordinal numbers chronologically
enumerating the timeline checkin(!) entries. this enumeration will not
change as a consequence of rebuild, right? it might change after a sync
against some remote repo if there are incoming checkins chronologically
interleaved with my own, sure, but so what? the relative numbers would be
just a (somewhat "volatile") convenience measure _locally_. and I agree
with another recent post that this would primarily concern the CLI. what I
mean: go ask some hg users when they last used sha1 hashes for
specifying checkins in their interaction with hg (which supports both the
ordinals and the hashes for doing so) and how often the presence of
those numbers confused communication with other developers in their
project. I'm quite sure they _never_ specify sha1 hashes to denote
checkins in any small to medium-sized project below 10^4 checkins
(currently this still includes fossil itself). not so sure about the "communication"
issue if users forget the potentially 'volatile' nature of the relative
enumeration, but this just can't possibly be a big issue ...

I therefore just maintain it would be "nice to have". it sure ain't a
killer feature, I admit ...
Post by Stephan Beal
feature ;)
I plead guilty ;-). and will now keep quiet again regarding this issue ....
Michal Suchanek
2015-09-10 13:29:24 UTC
On 10 September 2015 at 15:17, j. van den hoff
well, I'm only talking about the ordinal numbers chronologically enumerating
the timeline checkin(!) entries. this enumeration will not change as a
consequence of rebuild, right? it might change after a sync against some
remote repo if there are incoming checkins chronologically interleaved with
my own, sure, but so what? the relative numbers would be just a (somewhat
"volatile") convenience measure _locally_. and I agree with another recent
post that this would primarily concern the CLI. what I mean: go ask some hg
users when they last did use sha1 hashes for specifying checkins in their
interaction with hg (which supports both the ordinals as well as the hashes
for doing so) and how often the presence of those numbers confused
communication with other developers in their project. I'm quite sure they
_never_ specify sha1 hashes to denote checkins in any small to medium-sized
project below 10^4 checkins (currently
this still includes fossil itself). not so sure about the "communication"
issue if users forget the potentially 'volatile' nature of the relative
enumeration, but this just can't possibly be a big issue ...
I therefore just maintain it would be "nice to have". it sure ain't a killer
feature, I admit ...
Given that fossil does not support history rewriting by design, the
commit number on a particular branch counting from root is unique and
stable per branch across all repos.

If you release from a single master branch you have a monotonic
snapshot number.

When you use multiple branches you need to add the branch name to have a
stable unique identifier.

This is not viable, e.g., for git with rebasing.
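A sketch of the per-branch numbering Michal proposes, assuming every check-in records one primary parent and a branch name (the check-in names and dictionary layout are invented for illustration). The number is the count of primary-parent steps back to where the branch starts, so it depends only on recorded history.

```python
def branch_number(checkins, cid):
    """Count check-ins from the start of cid's branch along primary parents."""
    branch = checkins[cid]["branch"]
    n = 0
    while cid is not None and checkins[cid]["branch"] == branch:
        n += 1
        cid = checkins[cid]["parent"]
    return n


checkins = {
    "a1": {"parent": None, "branch": "trunk"},
    "a2": {"parent": "a1", "branch": "trunk"},
    "b1": {"parent": "a2", "branch": "feature"},
    "a3": {"parent": "a2", "branch": "trunk"},
    "b2": {"parent": "b1", "branch": "feature"},
}
ids = {cid: f"{c['branch']}:{branch_number(checkins, cid)}" for cid, c in checkins.items()}
print(ids)
# {'a1': 'trunk:1', 'a2': 'trunk:2', 'b1': 'feature:1', 'a3': 'trunk:3', 'b2': 'feature:2'}
```

In this small linear example every branch-name-plus-number label is unique; a fork that keeps two check-ins on one branch name is the case where that stops holding.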

Thanks

Michal
Martin Gagnon
2015-09-10 14:54:30 UTC
Post by Michal Suchanek
Given that fossil does not support history rewriting by design the
commit number on a particular branch counting from root is unique and
stable per branch across all repos.
If you release from a single master branch you have a monotonous
snapshot number.
When you use multiple branches you need to add branch name to have
stable unique identifier.
This is not viable eg. for git with rebasing.
Even in fossil it could be a problem: it cannot rewrite history, but a
branch is just a tag that can change. The identifier will change
after checkins are moved onto a different branch.
--
Martin G.
Michal Suchanek
2015-09-10 15:05:13 UTC
Post by Martin Gagnon
Post by Michal Suchanek
Given that fossil does not support history rewriting by design the
commit number on a particular branch counting from root is unique and
stable per branch across all repos.
If you release from a single master branch you have a monotonous
snapshot number.
When you use multiple branches you need to add branch name to have
stable unique identifier.
This is not viable eg. for git with rebasing.
Even in fossil it could be a problem, it cannot re-write history but a
branch is just a tag that can change. The identifier will change
after moving checkins on a branch.
If you can remove the tag that denotes the branch name from a branch
and put it on another branch, then the repo-unique identifiers will
change since the branch name part will change, of course. The
branch-unique identifier is stable, however. It can, for example, denote
commits on the currently checked-out branch just fine.

And even with the ability to rename branches, the branch name + commit
number identifier is unique at any given time. It is just not stable with
respect to branch renames, which is understandable.

Thanks

Michal
j. van den hoff
2015-09-10 17:03:41 UTC
Post by Martin Gagnon
Post by Michal Suchanek
Given that fossil does not support history rewriting by design the
commit number on a particular branch counting from root is unique and
stable per branch across all repos.
If you release from a single master branch you have a monotonous
snapshot number.
When you use multiple branches you need to add branch name to have
stable unique identifier.
This is not viable eg. for git with rebasing.
Even in fossil it could be a problem, it cannot re-write history but a
branch is just a tag that can change. The identifier will change
after moving checkins on a branch.
is it not much simpler? the timeline of all checkins in any given checkout
has a well-defined immutable chronological order (as displayed by `fossil
timeline -t ci': and since fossil knows this order it could easily
enumerate the checkins just fine...). just enumerating them from "old to
new" yields the rank/ordinal/sequential number we are talking about that
might serve as replacement of the hashes for any fossil command where
those need to be specified. the enumeration just is not unique across
clones that are not completely in sync/identical. but the mapping of these
numbers to sha1 hashes in _my_ clone (i.e. the sequence of checkins
displayed in the timeline) of the project might only change (as far as I
can see) if a sync injects "remote" checkins into the timeline that are
interleaved with my own (instead of just being "newer" than any of mine).
that's all. so the mapping `rank <--> sha1' indeed can change (that's why
the rank cannot completely replace the hashes for uniquely identifying a
checkin) due to this "chronological interleaving" of checkins in different
clones of the project. but that's all (the mapping would even be identical
across all clones being completely in sync at the considered point in
time). and it really is just irrelevant for the simple envisaged
convenience measure: being able to use the ranks instead of the hashes for
identifying checkins in _my_ clone when interacting with fossil. but if
implementing this seems not worth the effort to the devs, so be it.
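A minimal sketch of the enumeration described above, assuming each check-in is known only by a (shortened) hash and a commit time; all values are invented. Ordinals come from sorting the clone's check-ins from oldest to newest. A sync that only adds newer check-ins leaves existing ordinals alone, while one that brings in an interleaved, older check-in shifts every ordinal after it.

```python
def ordinals(checkins):
    """Number check-ins 1..N from oldest to newest by time (hash breaks ties)."""
    ordered = sorted(checkins.items(), key=lambda item: (item[1], item[0]))
    return {h: i for i, (h, _ts) in enumerate(ordered, start=1)}


mine = {"d28be5": 300, "10a5af": 200, "5250e3": 100}  # hash -> commit time
before = ordinals(mine)

# Case 1: the remote only has a check-in newer than all of mine.
after_newer = ordinals({**mine, "77aa01": 400})
# Case 2: the remote has a check-in that falls between two of mine.
after_interleaved = ordinals({**mine, "c0ffee": 150})

print(before)             # {'5250e3': 1, '10a5af': 2, 'd28be5': 3}
print(after_newer)        # {'5250e3': 1, '10a5af': 2, 'd28be5': 3, '77aa01': 4}
print(after_interleaved)  # {'5250e3': 1, 'c0ffee': 2, '10a5af': 3, 'd28be5': 4}
```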
Baruch Burstein
2015-09-10 17:39:49 UTC
Post by j. van den hoff
and it really is just irrelevant for the simple envisaged convenience
measure: being able to use the ranks instead of the hashes for identifying
checkins in _my_ clone when interacting with fossil.
I am starting to agree. When I used hg, I didn't usually even remember the
local numbers. I would usually look them up in the timeline of recent
checkins, and then use them for diffs/branches/rollbacks/whatnot. It was
just easier than hashes. So the renumbering would not be critical.
Martin Gagnon
2015-09-10 18:16:52 UTC
Post by Baruch Burstein
I am starting to agree. When I used hg, I didn't usually even remember the
local numbers. I would usually look them up in the timeline of recent
checkins, and then use them for diffs/branches/rollbacks/whatnot. It was
just easier than hashes. So the renumbering would not be critical.
I agree, but the only *potential* problem would be when people blindly
use the sequential number when posting links on a mailing list or forum.
It could become confusing when the link points to another valid check-in,
but not the intended one.
j. van den hoff
2015-09-10 18:26:46 UTC
Post by Martin Gagnon
I agree, but the only *potential* problem would be when people blindly
use the sequential number when posting links on mailing list or forum.
It could become confusing when the link point to another valid link,
but not the good one.
yes, beyond "checkin 1000" this could happen (if by chance there is some
checkin whose sha1 hash starts with those 4 digits). but I would argue
that when posting links or talking about checkins on mailing lists it
simply should be considered mandatory to use the hashes. should not be
_that_ much of a pedagogical challenge to drive that point home ;-)
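A sketch of the ambiguity discussed here, using a hypothetical resolver that accepts a token either as an ordinal (the proposed feature, which Fossil does not have) or as a hash prefix of at least 4 hex digits, as mentioned earlier in the thread. A token like `1000` is both a valid number and a valid hex prefix, so this resolver reports every candidate instead of guessing; the hashes and the ordinal mapping are made up.

```python
import string


def candidates(token, by_ordinal, hashes):
    """Return every check-in a token could name: as an ordinal and as a hash prefix."""
    found = set()
    if token.isdigit() and int(token) in by_ordinal:
        found.add(by_ordinal[int(token)])
    if len(token) >= 4 and all(c in string.hexdigits for c in token):
        found.update(h for h in hashes if h.startswith(token.lower()))
    return found


hashes = ["1000ab12cd", "5250e3796e", "d28be5063a"]
by_ordinal = {1000: "5250e3796e"}  # hypothetical: the 1000th check-in

print(sorted(candidates("1000", by_ordinal, hashes)))  # ['1000ab12cd', '5250e3796e']
print(sorted(candidates("d28b", by_ordinal, hashes)))  # ['d28be5063a']
```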
Noam Postavsky
2015-09-10 17:23:25 UTC
Post by Michal Suchanek
Given that fossil does not support history rewriting by design the
commit number on a particular branch counting from root is unique and
stable per branch across all repos.
If you release from a single master branch you have a monotonous
snapshot number.
When you use multiple branches you need to add branch name to have
stable unique identifier.
This is not viable eg. for git with rebasing.
I think (accidental) forks in fossil would also break the uniqueness
of the numbering scheme.

For example, see figure 3 of
http://fossil-scm.org/xfer/doc/trunk/www/branching.wiki

Both check-ins 3 and 4 are equidistant from the root. More complicated
cases with differing numbers of check-ins on each side of the fork are
possible.
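A sketch of Noam's objection, assuming the same primary-parent counting idea along a branch name (check-in names are invented, loosely following the fork shape the message refers to). When a fork leaves two check-ins on the same branch at the same distance from the root, the branch name plus number no longer picks out a single check-in.

```python
from collections import defaultdict


def branch_number(checkins, cid):
    """Count check-ins from the start of cid's branch along primary parents."""
    branch = checkins[cid]["branch"]
    n = 0
    while cid is not None and checkins[cid]["branch"] == branch:
        n += 1
        cid = checkins[cid]["parent"]
    return n


# An accidental fork: c3 and c4 both have c2 as parent and stay on trunk.
checkins = {
    "c1": {"parent": None, "branch": "trunk"},
    "c2": {"parent": "c1", "branch": "trunk"},
    "c3": {"parent": "c2", "branch": "trunk"},
    "c4": {"parent": "c2", "branch": "trunk"},
}
labels = defaultdict(list)
for cid, info in checkins.items():
    labels[f"{info['branch']}:{branch_number(checkins, cid)}"].append(cid)

print({label: ids for label, ids in labels.items() if len(ids) > 1})
# {'trunk:3': ['c3', 'c4']}
```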
Michal Suchanek
2015-09-11 07:57:48 UTC
On 10 September 2015 at 19:23, Noam Postavsky
Post by Noam Postavsky
I think (accidental) forks in fossil would also break the uniqueness
of the numbering scheme.
For example see figure 3 of
http://fossil-scm.org/xfer/doc/trunk/www/branching.wiki
Both check-ins 3 and 4 are equidistant from the root.
And each is on a different branch.

When you create the merge checkin 5, you create it on a particular
branch, and it gets the next number along that branch even if it
merges multiple checkins from the other branch.
Post by Noam Postavsky
More complicated
cases with differing numbers of check-ins on each side of the fork are
possible.
And in each case the per-branch numbering is exactly defined. And when
you have some master branch on a master repo from which you cut
snapshot releases, you get monotonic numbering.

Thanks

Michal
Noam Postavsky
2015-09-11 15:13:57 UTC
Post by Michal Suchanek
On 10 September 2015 at 19:23, Noam Postavsky
Post by Noam Postavsky
For example see figure 3 of
http://fossil-scm.org/xfer/doc/trunk/www/branching.wiki
Both check-ins 3 and 4 are equidistant from the root.
And each is on a differnt branch.
This is a fork, not an intentional branch, so both sides are on the
same branch. Figure 4 shows intentional branching.
Michal Suchanek
2015-09-11 20:04:06 UTC
On 11 September 2015 at 17:13, Noam Postavsky
Post by Noam Postavsky
Post by Michal Suchanek
And each is on a differnt branch.
This is a fork, not an intentional branch, so both sides are on the
same branch. Figure 4 shows intentional branching.
That does not really matter. Intentional or not, it is a branch and has
to be merged before both commits appear on the same branch. Then they
both get unique numbers, too.

Thanks

Michal
Scott Doctor
2015-09-11 20:15:42 UTC
Stephan Beal
2015-09-11 20:18:20 UTC
Post by Scott Doctor
I am getting confuzzled. Could someone explain the difference between a
leaf, branch, and fork.
In fossil a branch and fork are technically the same thing, the terms are
just used in different contexts (branch = intentional, fork =
unintentional). leaf means...

http://fossil-scm.org/index.html/doc/trunk/www/branching.wiki

"A leaf is a check-in with no children in the same branch."
Noam Postavsky
2015-09-11 20:34:26 UTC
Post by Michal Suchanek
That does not really matter. Intentional or not it is a branch and has
to be merged before both commits appear on the same branch. Then they
both get unique number, too.
Okay, if you define branch that way then the problem is that both
branches happen to have the same name. And yes, you can always assign
unique (to a single repo) numbers, they just won't be as nicely
ordered. I guess that's not so bad if forking is rare.
Warren Young
2015-09-11 21:28:04 UTC
Post by Noam Postavsky
Okay, if you define branch that way…
It isn’t a question of philosophical semantics. Stephan is telling you a fact about how Fossil behaves, not offering a fuzzy definition.

Maybe I’m being overly sensitive about your choice of words, but in my world, definitions are fluid, may change over time, and usually don’t capture the entire sense of a concept even while the definition is current.

This aspect of Fossil’s behavior, by contrast, is *highly* unlikely to change, since doing so would probably break any nontrivial existing repository.
the problem is that both
branches happen to have the same name.
Keep in mind that branch names and tags are secondary things in Fossil. They’re merely labels on the underlying truth: the artifacts, their relationships, etc. The only reason two branches with the same name is a problem is that we humans prefer to call things by name rather than by artifact ID.

But, cases like this are one situation where you really do need to know that ID, or “hash” as this thread calls it.

It isn’t really a hash since it isn’t computed from the contents of the artifact. It’s just a random number, expressed as a long hex string. It *looks* like a hash, but it isn’t. Proof:

cd ~/tmp
f new ../x.fossil
f new ../y.fossil
f open ../x.fossil
touch foo
f add foo
f ci -m .
f close
f open ../y.fossil
f add foo
f ci -m .
f close

Notice that the “hashes” change value in both identical cases: the project codes are different, the initial checkout ID is different, and the checkin ID for “foo” in both cases is different, even though it hasn’t changed in any way.

When you create a fork, the proper way to heal it back into a single branch is to merge one of the two halves into the other. After the fork is healed, “f up branch-name” will give you the tip of that branch, which is now unambiguous because there is only one branch tip again.

And to heal that branch, you will need the artifact ID at the tip of the other side of the fork.

So that’s why you need to know about artifact IDs. :)
I guess that's not so bad if forking is rare.
I think the only time I’ve created a fork is when working offline, so that autosync can’t save me from creating accidental forks.
Noam Postavsky
2015-09-11 22:40:40 UTC
Post by Warren Young
Okay, if you define branch that way…
It isn’t a question of philosophical semantics. Stephan is telling you a fact about how Fossil behaves, not offering a fuzzy definition.
According to http://fossil-scm.org/xfer/doc/trunk/www/branching.wiki

A branch is a set of check-ins with the same value for their
"branch" property.

Which is different from the definitions that Michal (whom I was
replying to) and Stephan were using. But that's really not a problem
because, as you said

[...] definitions are fluid, may change over time, and usually
don’t capture the entire sense of a concept even while the definition
is current.

I entirely agree with this.
Post by Warren Young
This aspect of Fossil’s behavior, by contrast, is *highly* unlikely to change, since doing so would probably break any nontrivial existing repository.
I don't think there was any disagreement over the facts of Fossil's
current behaviour, just a different usage of the word "branch".
[...]
Post by Warren Young
Notice that the “hashes” change value in both identical cases: the project codes are different, the initial checkout ID is different, and the checkin ID for “foo” in both cases is different, even though it hasn’t changed in any way.
Actually it's not the project code, it's the date. Try

cd ~/tmp
f new --date-override 2999-01-01 ../x.fossil
f new --date-override 2999-01-01 ../y.fossil
f open ../x.fossil
touch foo
f add foo
f ci --date-override 2999-01-01 -m .
f close
f open ../y.fossil
f add foo
f ci --date-override 2999-01-01 -m .
f close
Warren Young
2015-09-11 22:49:49 UTC
Post by Noam Postavsky
Post by Warren Young
Okay, if you define branch that way…
It isn’t a question of philosophical semantics. Stephan is telling you a fact about how Fossil behaves, not offering a fuzzy definition.
According to http://fossil-scm.org/xfer/doc/trunk/www/branching.wiki
A branch is a set of check-ins with the same value for their
"branch" property.
Which is different from the definitions that Michal (whom I was
replying to) and Stephan were using.
The documentation is correct, but that’s why Fossil gets annoyed at you when you accidentally create a fork: “fossil up branch-name” becomes ambiguous, because there are two (or more!) tips to choose from. After healing the fork, there is only one.

While the entire branch’s content may include a healed fork, it is usually only the tip that matters at any one time, when you are giving commands to Fossil involving that branch name. If you mean something farther up the branch than the tip, you are giving checkin IDs, not the branch name.
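A toy model makes the ambiguity concrete. The Python sketch below is not Fossil's code. It treats each check-in as a node with a branch name and a list of parents (primary parent first, then merge parents), and it calls a check-in a leaf when no check-in on the same branch lists it as a parent. The check-in names and the exact shape of the fork are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Checkin:
    name: str                                     # stand-in for a real check-in ID
    branch: str                                   # value of the "branch" property
    parents: list = field(default_factory=list)   # primary parent first, then merge parents


def leaves(checkins, branch):
    """Names of check-ins on `branch` that no check-in on the same branch lists as a parent."""
    by_name = {c.name: c for c in checkins}
    has_child = set()
    for c in checkins:
        for p in c.parents:
            if by_name[p].branch == c.branch:
                has_child.add(p)
    return sorted(c.name for c in checkins if c.branch == branch and c.name not in has_child)


# An accidental fork: check-ins 3 and 4 both descend from 2, both on trunk.
history = [
    Checkin("1", "trunk"),
    Checkin("2", "trunk", ["1"]),
    Checkin("3", "trunk", ["2"]),
    Checkin("4", "trunk", ["2"]),
]
print(leaves(history, "trunk"))   # ['3', '4']: "fossil up trunk" has two candidates

# Heal the fork: check-in 5 has 4 as its primary parent and merges in 3.
history.append(Checkin("5", "trunk", ["4", "3"]))
print(leaves(history, "trunk"))   # ['5']: one tip again
```

Counting merge parents as children is what makes the merge "heal" the fork in this model.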
Post by Noam Postavsky
Actually it's not the project code, it's the date.
Most interesting!

Thank you for the enlightenment.
Ron W
2015-09-11 22:46:05 UTC
It isn’t really a hash since it isn’t computed from the contents of the
artifact. It’s just a random number, expressed as a long hex string. It
*looks* like a hash, but it isn’t. Proof:
cd ~/tmp
f new ../x.fossil
f new ../y.fossil
f open ../x.fossil
touch foo
f add foo
f ci -m .
f close
f open ../y.fossil
f add foo
f ci -m .
f close
Notice that the “hashes” change value in both identical cases: the project
codes are different, the initial checkout ID is different, and the checkin
ID for “foo” in both cases is different, even though it hasn’t changed in
any way.
The commit ID really is a hash. It is the hash of the manifest artifact.
The manifest's 'D Card' has the date/time stamp of the commit. Also, the
manifest's 'P card' refers to the parent commit(s). Therefore, the commit
IDs of otherwise identical child commits will be different.

The 'foo' artifacts in the 2 repos, however, should have the same artifact
IDs.
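Ron's explanation, and Noam's `--date-override` result, can be illustrated with a deliberately simplified manifest. The `build_manifest` helper below is hypothetical. It emits only a D card, one F card per file and an optional P card, whereas a real Fossil manifest has more cards and stricter formatting. The point is that a file's ID depends only on its bytes, while the check-in ID (the SHA-1 of the manifest text) changes whenever the D-card timestamp does.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def build_manifest(date: str, files: dict, parent: str = "") -> bytes:
    """Hypothetical, simplified manifest: a D card, one F card per file, optional P card."""
    lines = [f"D {date}"]
    lines += [f"F {name} {sha1_hex(files[name])}" for name in sorted(files)]
    if parent:
        lines.append(f"P {parent}")
    return ("\n".join(lines) + "\n").encode()


files = {"foo": b""}                                  # an empty file, as from `touch foo`
a = build_manifest("2015-09-11T21:28:04.000", files)
b = build_manifest("2015-09-11T21:28:05.000", files)  # same content, one second later
c = build_manifest("2015-09-11T21:28:04.000", files)  # same content, same date as `a`

print(sha1_hex(files["foo"]))      # da39a3ee5e6b4b0d3255bfef95601890afd80709 in any repo
print(sha1_hex(a) == sha1_hex(b))  # False: only the D card differs
print(sha1_hex(a) == sha1_hex(c))  # True: identical manifests, identical check-in ID
```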
Warren Young
2015-09-11 22:57:46 UTC
The commit ID really is a hash. It is the hash of the manifest artifact. The manifest's 'D Card' has the date/time stamp of the commit. Also, the manifest's 'P card' refers to the parent commit(s). Therefore, the commit IDs of otherwise identical child commits will be different.
That certainly explains it.

I wonder if this is an implementation detail leaking through into the UI, though. Under what conditions, except for Noam’s contrived example with hardcoded dates, is there a useful distinction between “hash” — implying a number that you could reliably recompute given all the input data — and “random number”?

Short of crawling the DB, you don’t have all the input data, so what does it matter how that hex string was computed?

For instance, why even mention “SHA1 Hash” on the checkin details page in fossil ui, from src/info.c? Why not something more generic, like “checkin ID”?

While looking into this, I see evidence of past historical wrangling here: blob.uuid, for example, even though sizeof(SHA1) != sizeof(UUID). I guess Fossil once used MD5 for these ID values, too, and not just for integrity checksumming?
Ron W
2015-09-11 23:27:36 UTC
Post by Warren Young
I wonder if this is an implementation detail leaking through into the UI,
though. Under what conditions, except for Noam’s contrived example with
hardcoded dates, is there a useful distinction between “hash” — implying a
number that you could reliably recompute given all the input data — and
“random number”?
Short of crawling the DB, you don’t have all the input data, so what does
it matter how that hex string was computed?
It is an implementation detail leaking through, but with a mitigating
reason. One of the things many people don't like about DVCSs in general is
the long commit IDs. They will ask why isn't it a small number like, for
example, SVN uses. Then they ask how do we know it's really unique? By
telling them (in simplified terms) how it is computed, they more readily
accept the need for such large IDs.
Warren Young
2015-09-12 00:23:09 UTC
They will ask why isn't it a small number like, for example, SVN uses.
Solution: use tags. :)
Then they ask how do we know it's really unique? By telling them (in simplified terms) how it is computed, they more readily accept the need for such large IDs.
It’s too bad UUID has a fixed-length meaning already, otherwise you could just tell them it’s a UUID-160, and leave it at that.
Stephan Beal
2015-09-12 08:26:52 UTC
For instance, why even mention “SHA1 Hash” on the checkin details page in
fossil ui, from src/info.c? Why not something more generic, like “checkin
ID”?
The checkin ID is the hash of the manifest for the checkin. e.g. try:

[***@host:~/cvs/fossil/cwal]$ f info
...
checkout: c56006cd3df5400a38de9b0b7e9797f8e2f3a999 2015-08-14 15:17:09 UTC
...

[***@host:~/cvs/fossil/cwal]$ f artifact c56006cd3df5400a38de9b0b7e9797f8e2f3a999
B f88c7a2382edcc51d004f133fa72fc462814b200
C minor\stest\scode\stweaks.
D 2015-08-14T15:17:09.841
F s2/s2.c 46dc5ced9a243b20571781a6b4ac232e79ca1e27
F s2/s2.h 7ab0c333784a684f608938b676a6c1c6c09ec170
F s2/s2_ops.c 69d17c0d0b017e19d1b0c87a75153c55841a02db
F s2/unit/070-000-enum.s2 19c28bc16ea5e5a30953307344740566ca373ac4
P 3aa469f15c41f9920c0b43ae47acc3da4fc821e7
R 1f0ece1803f7f3776ace86d686725d62
U stephan
Z bfdac81f2888e26d15908ba724a50cf1

[***@host:~/cvs/fossil/cwal]$ f artifact c56006cd3df5400a38de9b0b7e9797f8e2f3a999 | sha1sum -
c56006cd3df5400a38de9b0b7e9797f8e2f3a999 -

Note that the input hash and the output hash are identical.
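The manifest Stephan printed is plain text, with one card per line and each card starting with a single letter. The Python sketch below splits it into cards and decodes the C-card comment. It assumes Fossil's argument escaping uses backslash-s for a space, backslash-n for a newline and a doubled backslash for a backslash, and it leaves any other escape untouched. It does not try to recompute the check-in ID, because the archived copy may not be byte-identical to the original artifact.

```python
ESCAPES = {"s": " ", "n": "\n", "\\": "\\"}   # only the escapes assumed above


def defossilize(arg: str) -> str:
    """Undo the assumed card-argument escaping; unknown escapes pass through unchanged."""
    out, i = [], 0
    while i < len(arg):
        if arg[i] == "\\" and i + 1 < len(arg) and arg[i + 1] in ESCAPES:
            out.append(ESCAPES[arg[i + 1]])
            i += 2
        else:
            out.append(arg[i])
            i += 1
    return "".join(out)


def parse_cards(text: str) -> list:
    """Split a manifest into (card letter, [arguments]) pairs, one per non-empty line."""
    cards = []
    for line in text.splitlines():
        if line:
            letter, _, rest = line.partition(" ")
            cards.append((letter, rest.split(" ")))
    return cards


manifest = r"""B f88c7a2382edcc51d004f133fa72fc462814b200
C minor\stest\scode\stweaks.
D 2015-08-14T15:17:09.841
F s2/s2.c 46dc5ced9a243b20571781a6b4ac232e79ca1e27
F s2/s2.h 7ab0c333784a684f608938b676a6c1c6c09ec170
F s2/s2_ops.c 69d17c0d0b017e19d1b0c87a75153c55841a02db
F s2/unit/070-000-enum.s2 19c28bc16ea5e5a30953307344740566ca373ac4
P 3aa469f15c41f9920c0b43ae47acc3da4fc821e7
R 1f0ece1803f7f3776ace86d686725d62
U stephan
Z bfdac81f2888e26d15908ba724a50cf1
"""

cards = parse_cards(manifest)
comment = next(defossilize(args[0]) for letter, args in cards if letter == "C")
files = [args[0] for letter, args in cards if letter == "F"]
print(comment)   # minor test code tweaks.
print(files)     # the four file names from the F cards
```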
blob.uuid, for example, even though sizeof(SHA1) != sizeof(UUID). I guess
Fossil once used MD5 for these ID values, too, and not just for integrity
checksumming?
MD5 is used in a small number of places for (IIRC) speed purposes. IIRC
it's only used in the calculation of the R-card (purely an integrity-checking
measure).
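For comparison, our reading of Fossil's file-format document is that MD5 also appears in the Z card that ends every control artifact. Its argument is the MD5 of all preceding lines, including the newline just before the `Z`. Treat that definition as an assumption to check against the spec. A minimal Python sketch of writing and checking such a card:

```python
import hashlib


def add_z_card(body: str) -> str:
    """Append a Z card holding the MD5 of everything before it (assumed definition)."""
    if not body.endswith("\n"):
        body += "\n"
    return body + "Z " + hashlib.md5(body.encode()).hexdigest() + "\n"


def z_card_ok(artifact: str) -> bool:
    """Recompute the MD5 over the lines before the final Z card and compare."""
    head, sep, tail = artifact.rpartition("\nZ ")
    if not sep:
        return False
    return tail.strip() == hashlib.md5((head + "\n").encode()).hexdigest()


artifact = add_z_card("D 2015-08-14T15:17:09.841\nU stephan\n")
print(z_card_ok(artifact))                                # True
print(z_card_ok(artifact.replace("stephan", "mallory")))  # False: body no longer matches
```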
Warren Young
2015-09-14 17:46:50 UTC
Post by Warren Young
For instance, why even mention “SHA1 Hash” on the checkin details page in fossil ui, from src/info.c? Why not something more generic, like “checkin ID”?
The checkin ID is the hash of the manifest for the checkin.
Yes, I know that. The question is not, “Why is the checkin ID a SHA-1 hash?” The question is, “Why does this UI web page have to *say* that it is a SHA-1 hash?”

If this page just said “checkin ID,” what would be lost?

What would be gained is that people wouldn’t be trying to work out how to match sha1sum commands to Fossil output, and Fossil would be free to switch to a different algorithm later if that seemed like a good idea.

And indeed, maybe it is a good idea, since SHA-1 is nearing its EOL for cryptographic use:

https://www.google.com/?q=sha-1%20end%20of%20life
Scott Robison
2015-09-14 18:11:04 UTC
Post by Warren Young
For instance, why even mention “SHA1 Hash” on the checkin details page
in fossil ui, from src/info.c? Why not something more generic, like
“checkin ID”?
The checkin ID is the hash of the manifest for the checkin.
Yes, I know that. The question is not, “Why is the checkin ID a SHA-1
hash?” The question is, “Why does this UI web page have to *say* that it
is a SHA-1 hash?”
If this page just said “checkin ID,” what would be lost?
What would be gained is that people wouldn’t be trying to work out how to
match sha1sum commands to Fossil output, and Fossil would be free to switch
to a different algorithm later if that seemed like a good idea.
Is this really a problem? Given that the checkin ID is generated from a
structured manifest file which is generated in part from sha1 hash values
from all included artifacts, it seems intractable to create a deliberately
colliding hash.
Post by Warren Young
https://www.google.com/?q=sha-1%20end%20of%20life
Except fossil doesn't use it for cryptographic security. For secure
communications, sure, make the change. For "deterministic generation of
identifiers with low probability of collision" it still seems safe enough.
If people need more security, they should probably be using GPG to sign
commits.

If the powers that be want to make a change of algorithm for ID generation,
that'd be fine. I just don't see any urgency myself in non-cryptographic
applications.
--
Scott Robison
Warren Young
2015-09-14 19:01:58 UTC
Post by Warren Young
Fossil would be free to switch to a different algorithm later if that seemed like a good idea.
Is this really a problem? Given that the checkin ID is generated from a structured manifest file which is generated in part from sha1 hash values from all included artifacts, it seems intractable to create a deliberately colliding hash.
If I were a black hat — and please realize that I have zero practice trying to be one, so assume that a real black hat would be as much better at this as Mario Andretti is better than me at driving really fast — and I wanted to attack someone else’s Fossil repo, I would consider its use of SHA-1 as at least “hopeful.”

The first line of defense is the passwords of valid committers, which presumably contain much less than 160 bits of entropy. All you need to do is find one weak password. And if that seems like an impossible thing to you, you haven’t been paying attention to the computer security news.

So now you have checkin privileges on someone else’s Fossil repo. Now what? Obviously you could just commit evil code to the trunk, but it would be much neater if you could insert it into an arbitrary point in the checkin tree, if for no other reason than to hide it from the timeline page, to reduce your chances of getting caught.

So yes, the question really does become, how difficult is it to forge a consistent yet bogus SHA-1 hash? If the crypto folk are worried about it — and a more conservative bunch of computer scientists you will not find — I’d say there is probably cause to be worried.

Let me restate that last point, to be doubly clear: If Bruce “security theater” Schneier is worried about SHA-1, *I* am worried about SHA-1.

https://www.schneier.com/blog/archives/2005/02/sha1_broken.html
https://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html
https://www.schneier.com/blog/archives/2012/10/when_will_we_se.html
https://konklone.com/post/why-google-is-hurrying-the-web-to-kill-sha-1

The first two links talk about an attack that made it possible to generate a hash collision with difficult-to-obtain levels of technology…in 2005. That’s 6 Moore’s Law generations ago, which comes to about a factor of 100 in CPU cycles per dollar.

The third link gives a budgetary estimate of what it took to attack SHA-1 in 2012, with projections into the future that do not include an estimated rate of change in attack effectiveness. Attacks never get weaker, only stronger.

If you’re only thinking of maladjusted individuals and bottom-feeding criminal gangs doing this, you probably haven’t considered that there might be at least one major world government which would like to covertly insert a bit of code into a widely-used open source project.
Scott Robison
2015-09-14 19:10:08 UTC
Post by Warren Young
Post by Scott Robison
Post by Warren Young
Fossil would be free to switch to a different algorithm later if that
seemed like a good idea.
Post by Scott Robison
Is this really a problem? Given that the checkin ID is generated from a
structured manifest file which is generated in part from sha1 hash values
from all included artifacts, it seems intractable to create a deliberately
colliding hash.
If I were a black hat — and please realize that I have zero practice
trying to be one, so assume that a real black hat would be as much better
at this as Mario Andretti is better than me at driving really fast — and I
wanted to attack someone else’s Fossil repo, I would consider its use of
SHA-1 as at least “hopeful.”
The first line of defense is the passwords of valid committers, which
presumably contain much less than 160 bits of entropy. All you need to do
is find one weak password. And if that seems like an impossible thing to
you, you haven’t been paying attention to the computer security news.
Fair enough.
Post by Warren Young
So now you have checkin privileges on someone else’s Fossil repo. Now
what? Obviously you could just commit evil code to the trunk, but it would
be much neater if you could insert it into an arbitrary point in the
checkin tree, if for no other reason than to hide it from the timeline
page, to reduce your chances of getting caught.
So yes, the question really does become, how difficult is it to forge a
consistent yet bogus SHA-1 hash? If the crypto folk are worried about it —
and a more conservative bunch of computer scientists you will not find —
I’d say there is probably cause to be worried.
Also fair enough. Though there would be the additional difficulty (though I
don't know how difficult it would be) to convince the canonical repository
to replace an old checkin with a crafted checkin. This seems unlikely to me
given that the receiving repo (as I understand it) will say "I already have
that ID, what about the next one".
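The "I already have that ID" behaviour Scott describes is a general property of content-addressed stores. Here is a toy Python model of it. It is an assumption about the technique, not Fossil's sync code. An incoming artifact is accepted only if its ID matches its content and the ID is not already present, so even an exact SHA-1 collision could not displace an artifact that the receiving repository already holds.

```python
import hashlib


class ArtifactStore:
    """Toy content-addressed store keyed by SHA-1 (illustrative, not Fossil's code)."""

    def __init__(self):
        self._blobs = {}

    def receive(self, artifact_id: str, content: bytes) -> bool:
        """Store an incoming artifact; return True only if something new was stored."""
        if hashlib.sha1(content).hexdigest() != artifact_id:
            return False              # ID does not describe this content: reject it
        if artifact_id in self._blobs:
            return False              # "I already have that ID": keep the original
        self._blobs[artifact_id] = content
        return True

    def get(self, artifact_id: str) -> bytes:
        return self._blobs[artifact_id]


store = ArtifactStore()
good = b"int main(void){ return 0; }\n"
good_id = hashlib.sha1(good).hexdigest()
print(store.receive(good_id, good))          # True: first copy is stored
print(store.receive(good_id, good))          # False: duplicate ignored
print(store.receive(good_id, b"evil code"))  # False: content does not match the ID
```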
Post by Warren Young
Let me restate that last point, to be doubly clear: If Bruce “security
theater” Schneier is worried about SHA-1, *I* am worried about SHA-1.
https://www.schneier.com/blog/archives/2005/02/sha1_broken.html
https://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html
https://www.schneier.com/blog/archives/2012/10/when_will_we_se.html
https://konklone.com/post/why-google-is-hurrying-the-web-to-kill-sha-1
The first two links talk about an attack that made it possible to generate
a hash collision with difficult-to-obtain levels of technology
in 2005.
That’s 6 Moore’s Law generations ago, which comes to about a factor of 100
in CPU cycles per dollar.
The third link gives a budgetary estimate of what it took to attack SHA-1
in 2012, with projections into the future that do not include an estimated
rate of change in attack effectiveness. Attacks never get weaker, only
stronger.
If you’re only thinking of maladjusted individuals and bottom-feeding
criminal gangs doing this, you probably haven’t considered that there might
be at least one major world government which would like to covertly insert
a bit of code into a widely-used open source project.
I wasn't really thinking of who might want to do it, just that sha1 isn't
being used for cryptographic security and that would be covered by other
means (GPG for example).

Thanks for the thoughtful response vs the (all too often on the internet)
approach of questioning my parentage or intellect. :)
--
Scott Robison
Ron W
2015-09-14 22:05:11 UTC
Post by Scott Robison
I wasn't really thinking of who might want to do it, just that sha1 isn't
being used for cryptographic security and that would be covered by other
means (GPG for example).
The hashes can be important for verifying the integrity of the repository.
Even when not "signing" commits, a secure hash is still valuable. The more
secure the hash, the harder it is to hide corruption.

Also, the description of the "PGP command" setting says "Command used to
clear-sign manifests at check-in." This suggests that only the manifest
itself is signed. Therefore, the GPG signature relies on the hashes (in the
manifest) generated by Fossil.
Ron W
2015-09-14 22:24:37 UTC
The question is, “Why does this UI web page have to *say* that it is a
SHA-1 hash?”
If this page just said “checkin ID,” what would be lost?
As far as VCS functionality, nothing.

On the other hand, many projects publish the hashes for their release
packages so that people can verify the package is correct.

The Fossil manifest's hash takes that to another level. Verify the
manifest, then verify each file listed in the manifest.
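The two-step check Ron describes can be sketched in Python. `verify_checkin` and its `read_file` callback are hypothetical names. The sketch assumes a baseline manifest whose F cards all carry a SHA-1 and whose file names contain no escaped characters, and it uses made-up file contents.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def verify_checkin(checkin_id: str, manifest: bytes, read_file) -> list:
    """Check the manifest against its ID, then each F-card file against its hash.
    `read_file(name) -> bytes` is supplied by the caller (e.g. it reads a checkout)."""
    if sha1_hex(manifest) != checkin_id:
        return ["manifest does not match the check-in ID"]
    problems = []
    for line in manifest.decode().splitlines():
        parts = line.split(" ")
        if parts[0] == "F" and len(parts) >= 3:   # assumes unescaped names (no spaces)
            name, file_id = parts[1], parts[2]
            if sha1_hex(read_file(name)) != file_id:
                problems.append(f"{name} does not match {file_id}")
    return problems


# Usage with made-up content standing in for a real release.
tree = {"README": b"hello\n", "main.c": b"int main(void){return 0;}\n"}
manifest = "".join(f"F {n} {sha1_hex(tree[n])}\n" for n in sorted(tree)).encode()
checkin_id = sha1_hex(manifest)

print(verify_checkin(checkin_id, manifest, tree.__getitem__))  # []
tree["main.c"] = b"int main(void){return 1;}\n"                 # corrupt one file
print(verify_checkin(checkin_id, manifest, tree.__getitem__))  # one problem: main.c
```

Because the check-in ID covers the F-card hashes, trusting one 40-digit ID is enough to check every file it lists.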
What would be gained is that people wouldn’t be trying to work out how to
match sha1sum commands to Fossil output,
fossil artifact id | sha1sum -
sha1sum path/to/file
and Fossil would be free to switch to a different algorithm later if that
seemed like a good idea.
Fossil can still switch hash algorithms. Existing repos probably remain
with SHA1, while new repos would use the new algorithm. Not impossible to
convert a repo, but all IDs would change. Any use of old IDs could utilize
tags generated during the conversion.

Even a mixed hash repo could exist if a version (or hash) card were
introduced.
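One way to picture Ron's "version (or hash) card" is a per-manifest declaration of the algorithm. In the Python sketch below, the `H` card is entirely hypothetical, because Fossil defines no such card, and its argument is simply a `hashlib` algorithm name. Manifests without the card keep the current SHA-1 meaning, so old and new check-ins could coexist in one repository.

```python
import hashlib

# Hypothetical "H <algorithm>" card; the letter and its argument format are invented here.
DEFAULT_ALGORITHM = "sha1"   # manifests without the card keep today's meaning


def manifest_algorithm(manifest: str) -> str:
    for line in manifest.splitlines():
        if line.startswith("H "):
            return line[2:].strip()
    return DEFAULT_ALGORITHM


def file_ids_ok(manifest: str, files: dict) -> bool:
    """Check every F card using whichever algorithm the manifest declares."""
    algorithm = manifest_algorithm(manifest)
    for line in manifest.splitlines():
        parts = line.split(" ")
        if parts[0] == "F" and len(parts) >= 3:
            if hashlib.new(algorithm, files[parts[1]]).hexdigest() != parts[2]:
                return False
    return True


files = {"foo": b"same bytes in both repos\n"}
old_style = f"F foo {hashlib.sha1(files['foo']).hexdigest()}\n"
new_style = f"F foo {hashlib.sha3_256(files['foo']).hexdigest()}\nH sha3_256\n"
print(file_ids_ok(old_style, files), file_ids_ok(new_style, files))  # True True
```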
Warren Young
2015-09-14 22:40:09 UTC
Post by Ron W
Post by Warren Young
What would be gained is that people wouldn’t be trying to work out how to match sha1sum commands to Fossil output,
fossil artifact id | sha1sum -
sha1sum path/to/file
See, that just proves the point: the “SHA1 Hash” line on the /info page gives you a *checkin* ID, not an artifact ID. (Yes, yes, I know there are artifact IDs below that, but we’re not talking about them.)

In fact, this whole artifact ID vs checkin ID distinction completely flew over my head until recently. I had to re-read the file format wiki document again in the context of this discussion in order to finally grasp it.

I think I might have gotten over that hump a bit quicker if the UI was explicit about saying “checkin ID” and “artifact ID” instead of just saying, “Here’s some SHA-1 hashes, enjoy!”
Post by Ron W
Fossil still can switch hash algorithms. Existing repos probably remain with SHA1, while new repos would use the new algorithm. Not impossible to convert a repo, but all IDs would change. Any use of old IDs could utilize tags generated during the conversion.
Even a mixed hash repo could exist if a version (or hash) card were introduced.
glibc-based Linux systems cope with this problem in /etc/shadow by tagging the hash: a prefix of $1$ means it’s the old MD5 hash that replaced the ancient crypt(3) algorithm long ago, whereas the Linux box nearest to you probably uses $6$ by default, meaning SHA-512.

man 3 crypt for details.
Warren Young
2015-09-14 23:02:02 UTC
Post by Warren Young
glibc-based Linux systems cope with this problem in /etc/shadow by tagging the hash
I just learned that this isn’t a Linux-specific thing, that it is in fact a pseudostandard also used on the BSDs and in various other places:

http://pythonhosted.org/passlib/modular_crypt_format.html
Stephan Beal
2015-09-15 05:53:47 UTC
Post by Warren Young
output, and Fossil would be free to switch to a different algorithm later
if that seemed like a good idea.
Indeed, fossil's model allows any hash to be used, but it is not possible
to change the hash without a near-complete overhaul of fossil (and its
docs), nor without invalidating every repo in existence, so it's highly
unlikely to ever happen. Supporting two hash variants in one fossil binary
would likely prove to be problematic (and would require a major overhaul).
Post by Warren Young
https://www.google.com/?q=sha-1%20end%20of%20life
Fossil does not use it in a cryptographic context, so i would argue that
that's not relevant for fossil's continued use. Fossil only uses sha-1 to
define/determine content identity. (There are long threads somewhere in the
list archives about the chances of hash collision. Management summary: not
likely to happen for many human generations.)
Scott Doctor
2015-09-15 06:28:19 UTC
What are the items that are used to calculate the hash? Is the
hash salted?

Stephan Beal
2015-09-15 06:34:54 UTC
What are the items that are used to calculate the hash? Is the hash salted?
For files/blobs, only their content is hashed (their name/timestamp/etc.,
if any, is not used). No salt is used. If i'm not mistaken (but might be),
a salt is irrelevant (or unnecessary) in a non-cryptographic context.

For passwords a combination of inputs is used: the project code (random hex
bytes), user name, and plain-text password.
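Here is a sketch of the password case in Python. The exact layout is an assumption on our part: project code, login and password joined with `/` and hashed with SHA-1. Check Fossil's source before relying on it. The project codes below are made up. In effect, the project code and login act like a salt, so the same password yields different stored values in different repositories or for different users.

```python
import hashlib


def shared_secret(password: str, login: str, project_code: str) -> str:
    # Assumed input layout "<project code>/<login>/<password>"; not confirmed by the thread.
    return hashlib.sha1(f"{project_code}/{login}/{password}".encode()).hexdigest()


repo_a = "0123456789abcdef"   # hypothetical project codes
repo_b = "fedcba9876543210"

same_pw_other_repo = shared_secret("hunter2", "alice", repo_a) == shared_secret("hunter2", "alice", repo_b)
same_pw_other_user = shared_secret("hunter2", "alice", repo_a) == shared_secret("hunter2", "bob", repo_a)
print(same_pw_other_repo, same_pw_other_user)   # False False
```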
Ron W
2015-09-15 17:17:21 UTC
Post by Stephan Beal
For files/blobs, only their content is hashed (their name/timestamp/etc.,
if any, is not used). No salt is used. If i'm not mistaken (but might be),
a salt is irrelevant (or unnecessary) in a non-cryptographic context.
FYI, salts are mainly used for hashing passwords and authentication
tokens, so that identical inputs (say, two users with the same password)
produce different hashes.

When using hashes to identify and/or check the integrity of documents,
salting doesn't really add to the security of either the hash or the
document.
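The contrast Ron draws shows up directly in code. In the Python sketch below, an unsalted content hash gives every repository the same ID for the same bytes, which is what content-addressed syncing needs. A salted digest can still verify the data if the salt is stored with it, but two copies of the same document no longer agree, so it cannot serve as a shared identifier. The helper names are illustrative only.

```python
import hashlib
import os


def content_id(data: bytes) -> str:
    """Unsalted: the same bytes get the same ID in every repository."""
    return hashlib.sha1(data).hexdigest()


def salted_digest(data: bytes) -> str:
    """Salted: a fresh random salt is stored alongside the digest."""
    salt = os.urandom(16)
    return salt.hex() + ":" + hashlib.sha1(salt + data).hexdigest()


def salted_ok(stored: str, data: bytes) -> bool:
    salt_hex, digest = stored.split(":")
    return hashlib.sha1(bytes.fromhex(salt_hex) + data).hexdigest() == digest


doc = b"release-1.0.tar.gz contents\n"
print(content_id(doc) == content_id(doc))      # True: usable as a shared identifier
s1, s2 = salted_digest(doc), salted_digest(doc)
print(s1 == s2)                                # False (with overwhelming probability)
print(salted_ok(s1, doc), salted_ok(s2, doc))  # True True: each still verifies
```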

Fossil, by using hashes as identifiers, also provides some integrity
checking of the stored documents. While Fossil is *not* generally
considered a cryptographic tool, the integrity checking it implicitly
provides could be considered a cryptographic feature.

Also, as I pointed out in an earlier post, the description of how Fossil uses
GPG, PGP or a similar tool implies that only the manifest gets signed. Therefore,
the signature might rely on hashes generated by Fossil.

Because of the way signatures work, if GPG were using SHA1, it would give
the same result as encrypting the manifest ID with the user's private key.
Of course, GPG can use newer, presumably better, algorithms.
Ron W
2015-09-15 17:47:18 UTC
Post by Stephan Beal
Indeed, fossil's model allows any hash to be used, but it is not possible
to change the hash without a near-complete overhaul of fossil (and its
docs), nor without invalidating every repo in existence, so it's highly
unlikely to ever happen. Supporting two hash variants in one fossil binary
would likely prove to be problematic (and would require a major overhaul).
What parts of Fossil's source I have looked at seem to be well structured. I
would be surprised if changing the hash algorithm required changing the
internals of 1, maybe 2 functions.

As for repo compatibility, if nothing else, there could be a build option to
select the hash algorithm to use.

It would also be a good idea that repos using other than SHA1 have a new
card in their manifests to indicate (and identify) use of a different hash
algorithm.
Warren Young
2015-09-15 23:05:19 UTC
it is not possible to change the hash without a near-complete overhaul of fossil (and its docs)
I’ve already addressed the documentation/UI issue repeatedly above: The fact that Fossil uses SHA-1 should be a hidden implementation detail, unimportant to anyone but those working on the lowest-level parts of Fossil.

(Plus those working on compatible software such as libfossil and FUEL.)
Supporting two hash variants in one fossil binary would likely prove to be problematic (and would require a major overhaul).
Why can’t an artifact’s or checkin’s hash be tagged in MCF fashion, so that when Fossil checks the hashes, it knows which algorithm to use at each step?

Many other systems support multiple encryption and digest algorithms, and many of those can switch mid-stream to a different algorithm. This is known tech.
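The MCF-style alternative puts the tag on each ID instead of in the manifest. The Python sketch below invents `$sha1$` and `$sha3-256$` prefixes purely for illustration, because Fossil has no such format. It treats an untagged ID as legacy SHA-1, which is the same trick that lets old untagged crypt(3) hashes keep working next to `$1$` and `$6$` ones.

```python
import hashlib

# Hypothetical tag registry in the spirit of crypt(3)'s "$1$", "$6$" prefixes.
TAGS = {"sha1": "sha1", "sha3-256": "sha3_256"}


def tagged_id(content: bytes, tag: str) -> str:
    return f"${tag}${hashlib.new(TAGS[tag], content).hexdigest()}"


def id_matches(artifact_id: str, content: bytes) -> bool:
    """Pick the algorithm from the ID's prefix, recompute, and compare."""
    if not artifact_id.startswith("$"):
        # Untagged IDs are read as legacy SHA-1, so existing repos stay valid.
        return hashlib.sha1(content).hexdigest() == artifact_id
    parts = artifact_id.split("$", 2)   # ["", tag, digest]
    if len(parts) != 3 or parts[1] not in TAGS:
        return False
    return hashlib.new(TAGS[parts[1]], content).hexdigest() == parts[2]


data = b"foo\n"
legacy = hashlib.sha1(data).hexdigest()
modern = tagged_id(data, "sha3-256")
print(id_matches(legacy, data), id_matches(modern, data))   # True True
```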
Fossil does not use it in a cryptographic context
That’s a true non sequitur. Fossil uses SHA-1 as a kind of message authentication, the very sort of thing that HTTPS certificates use it for.

Therefore, either Fossil’s use of SHA-1 is not like HTTPS certs in some respect I do not understand, or Google is wrong to be trying to push the web world off SHA-1 authenticated HTTPS certs.
There are long threads somewhere in the list archives about the chances of hash collision. Management summary: not likely to happen for many human generations.
If you mean posts like this one

http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg05979.html

then the prior discussion was all about accidental collision. I’m talking instead about motivated, well-trained, intelligent, well-funded attackers purposely attempting to engineer a collision. Not the same thing at all.

If you were going to point me instead to a different thread with the value 2^80 or (heaven forfend, 2^160) in it anywhere, you’d be pointing to something almost certainly not written by a cryptographer. That complexity only applied when SHA-1 had no known weaknesses.

The Chinese attack from 2005 reduces the attack complexity to about 2^69 operations. The Stevens attack from 2011 reduces the attack complexity even further, to between 2^60.3 and 2^65.3 operations. Add to that the improvement from Moore’s Law and you’re talking about 5 to 7 orders of magnitude improvement.
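Warren's estimate can be checked with a little arithmetic. The baseline is 2^80, the generic birthday bound for finding a collision in a 160-bit hash. The attack costs are the figures quoted above, and the hardware factor is his "about 100" (six doublings is strictly 64). A short Python calculation:

```python
import math

GENERIC_BITS = 80    # generic birthday bound for collisions in a 160-bit hash
MOORE_FACTOR = 100   # Warren's rounding; six doublings is strictly 2**6 = 64


def speedup_orders(attack_bits: float, moore: float = MOORE_FACTOR) -> float:
    """log10 of (generic work / attack work) times the hardware improvement."""
    return (GENERIC_BITS - attack_bits) * math.log10(2) + math.log10(moore)


for label, bits in [("2005 attack", 69), ("2011 attack, best", 60.3), ("2011 attack, worst", 65.3)]:
    print(f"{label}: about 10^{speedup_orders(bits):.1f}")
# 2005 attack: about 10^5.3
# 2011 attack, best: about 10^7.9
# 2011 attack, worst: about 10^6.4
```

The results land between roughly 10^5 and 10^8, consistent with the "5 to 7 orders of magnitude" estimate, with the best case coming out a little higher.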

Obviously the world’s HTTPS traffic is a far bigger target than public-facing Fossil repos, so Fossil’s urgency to get off SHA-1 should be lower. That said, attacks only get better, and Moore’s Law still has steam in it, at least for embarrassingly-parallel applications like hashing.
Scott Robison
2015-09-16 01:01:21 UTC
it is not possible to change the hash without a near-complete overhaul
of fossil (and its docs)
I’ve already addressed the documentation/UI issue repeatedly above: The
fact that Fossil uses SHA-1 should be a hidden implementation detail,
unimportant to anyone but those working on the lowest-level parts of Fossil.
(Plus those working on compatible software such as libfossil and FUEL.)
I don't dispute that the "implementation detail" of SHA-1 isn't needed in the
interface / user docs. I don't think it is nearly as big a problem as you
do though, but I could be wrong.
Supporting two hash variants in one fossil binary would likely prove to
be problematic (and would require a major overhaul).
Why can’t an artifact’s or checkin’s hash be tagged in MCF fashion, so
that when Fossil checks the hashes, it knows which algorithm to use at each
step?
Many other systems support multiple encryption and digest algorithms, and
many of those can switch mid-stream to a different algorithm. This is
known tech.
I see the problem less as "can it be done" than as "what would existing fossil
implementations do with the data format changes that would be required"? I
certainly think a fossil 2.0 spec should probably accommodate such changes.
Fossil does not use it in a cryptographic context
That’s a true non sequitur. Fossil uses SHA-1 as a kind of message
authentication, the very sort of thing that HTTPS certificates use it for.
Therefore, either Fossil’s use of SHA-1 is not like HTTPS certs in some
respect I do not understand, or Google is wrong to be trying to push the
web world off SHA-1 authenticated HTTPS certs.
I think calling it a non sequitur is not completely fair, though admittedly
it depends on your point of view. SHA-1 was a convenient algorithm, already
in use by git & mercurial at least, for this very same sort of computation.
It wasn't intended to provide cryptographic security, it was designed to
take a blob of data and create a pseudo-random looking string of 40 hex
digits that would be highly unlikely to collide with anything being done by
anyone else. The fact that it can be used to detect errors in the original
data is more akin to the CRC in an ethernet frame than cryptographic
security. It is more likely to detect accidental corruption than deliberate
corruption. Perhaps better signing / validation of artifacts should be
added to the fossil 2.0 list.
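As an illustration of the property described above, here is a minimal Python sketch using only the standard `hashlib` module (the helper names are hypothetical, not Fossil's). Any blob maps to a 40-hex-digit SHA-1 name, and flipping a single bit produces an unrelated name, which is why the identifier also catches accidental corruption.

```python
import hashlib

def artifact_id(blob: bytes) -> str:
    """Return the SHA-1 of a blob as 40 lowercase hex digits."""
    return hashlib.sha1(blob).hexdigest()

def flip_bit(blob: bytes, bit: int) -> bytes:
    """Simulate accidental corruption by flipping one bit of the blob."""
    data = bytearray(blob)
    data[bit // 8] ^= 1 << (bit % 8)
    return bytes(data)

original = b"int main(void){ return 0; }\n"
damaged = flip_bit(original, 3)

print(artifact_id(original))
print(artifact_id(damaged))
print(artifact_id(original) == artifact_id(damaged))  # False: corruption detected
```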

I do see your points about interested parties trying to create a collision,
but even if they managed that, simply gaining commit access to the master
repository as mentioned the other day would not be adequate; the
maliciously modified artifact would be rejected by fossil (as I understand
it) as a duplicate / already received artifact. In order to impact the
official / master / canonical repository, someone would have to gain access
to the file so that it could be modified and presented to the world. I find
it far more likely that someone would fork the repo and contaminate it that
way, rebuilding it from scratch, and finding ways to induce parties to use
*that* version of the library instead of the blessed repo.
There are long threads somewhere in the list archives about the chances
of hash collision. Management summary: not likely to happen for many human
generations.
If you mean posts like this one
http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg05979.html
then the prior discussion was all about accidental collision. I’m talking
instead about motivated, well-trained, intelligent, well-funded attackers
purposely attempting to engineer a collision. Not the same thing at all.
And avoiding accidental collision was the initial intent. The fact that it
can be used as "a kind of message authentication" does not mean that was
how it was intended to be used.

If the fact that some algorithm is cryptographically weak means that it
should be replaced, then we have a lot of work to do:

* ethernet uses a 32 bit CRC; how much internet traffic goes through
ethernet? Can't really change that because of backward compatibility.
* rsync uses MD5 & a 32 bit rolling checksum / CRC (Adler-32 if I remember
correctly). Can't really change that because of backward compatibility.

There are many more examples, these are the first two that came to mind.
And they are admittedly not fair examples, given they are used as part of a
protocol vs part of a durable long lasting repository structure, but they
are examples of "cryptographically insecure" algorithms being used
effectively to detect accidental corruption vs deliberate shenanigans.
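The Ethernet example can be made concrete. A CRC is excellent at catching random bit errors but offers no resistance to deliberate edits: for equal-length messages, CRC-32 satisfies crc(a XOR b XOR c) = crc(a) XOR crc(b) XOR crc(c). The short check below uses Python's standard `zlib.crc32`; the sample messages are made up for illustration.

```python
import zlib

def xor_bytes(*msgs: bytes) -> bytes:
    """XOR several equal-length byte strings together."""
    if len({len(m) for m in msgs}) != 1:
        raise ValueError("messages must have equal length")
    out = bytearray(len(msgs[0]))
    for m in msgs:
        for i, byte in enumerate(m):
            out[i] ^= byte
    return bytes(out)

a = b"pay alice $100.00"
b = b"pay mallory $9999"
c = b"pay carol  $42.00"

# For equal-length inputs CRC-32 is affine over XOR, so the checksum of an
# edited message can be predicted without any search at all.
lhs = zlib.crc32(xor_bytes(a, b, c))
rhs = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
print(lhs == rhs)  # True
```

No comparable algebraic shortcut is known for SHA-1, which is the difference between a checksum and a cryptographically strong digest.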

Again, I concede your point about bad actors trying to create deliberate
collisions, but even in so doing there is far more to do than just "push an
update". Given the widespread use of SHA-1 in DVCS systems, and the use of
GPG signatures to authenticate commits, I think it would be reasonable to
enhance the cryptographic security in a future version of fossil. If what
is desired is not "cryptographic security" but rather "excellent but not
perfect hashing to create distributed unique identifiers" then SHA-1 will
continue to work for a very long time.
--
Scott Robison
Warren Young
2015-09-16 01:46:21 UTC
Post by Scott Robison
I think calling it a non sequitur is not completely fair
Stephan stated that Fossil isn’t doing cryptography, therefore SHA-1 doesn’t have to be replaced. Cryptography and message authentication are not the same thing.

It’s like pointing out that the bald tires on the car do not need to be replaced because we don’t require that the car be able to climb trees.
Post by Scott Robison
It wasn't intended to provide cryptographic security
I’m probably just being pedantic, but now you’re doing it, too.

“Cryptographic security” implies encryption, which is not being done here.

The proper phrasing is “cryptographically strong message digest algorithm.” The reference to cryptography is only an indicator that the use of a given MD algorithm can be used with some given cryptosystem without compromising its integrity.
Post by Scott Robison
It is more likely to detect accidental corruption than deliberate corruption.
I thought that’s what the MD5 bits were for.

My sense from reading the file format wiki page is that the SHA-1 bits ensure that blob B, which is intended to appear in the timeline between blobs A and C, was almost certainly inserted into the database at time T_b, where T_a <= T_b <= T_c. That is, it is primarily a guarantor of checkin ordering.

That’s why I’ve been framing the risk as one of potential insertion of a timeline item way in the past.

That may be a bogus risk for other reasons, though, since you’d also have to work out how to change all the deltas.

It’s also occurred to me since my previous post that all the work needed to generate a bogus SHA-1 hash for an HTTPS cert only has to be done once, at which point you now have a reusable cert good for months or years. The work needed to attack a single timeline entry in Fossil is a one-shot deal: to attack two different nodes in the timeline, you need to do 2x the work.
Post by Scott Robison
simply gaining commit access to the master repository as mentioned the other day would not be adequate; the maliciously modified artifact would be rejected by fossil
I’m no expert in Fossil’s inner workings, and I have no interest in trying to attack it.

I’m just aware that Bruce Schneier and Google’s crypto geeks know things I do not, and I use that awareness to guide my own design decisions. The last hash-based system I designed used SHA-256. :)
Post by Scott Robison
I find it far more likely that someone would fork the repo and contaminate it that way, rebuilding it from scratch, and finding ways to induce parties to use *that* version of the library instead of the blessed repo.
Clearly so.

Always attack the weakest link first, if possible.
Post by Scott Robison
* ethernet uses a 32 bit CRC; how much internet traffic goes through ethernet? Can't really change that because of backward compatibility.
* rsync uses MD5 & a 32 bit rolling checksum / CRC (Adler-32 if I remember correctly). Can't really change that because of backward compatibility.
That’s why TLS exists.

TLS doesn’t solve any weaknesses with Fossil’s use of SHA-1, though. It just prevents you from MITM-ing an existing TCP connection. Once you’ve got a TCP connection to the Fossil server, well, *then* what? That’s the purpose of this sub-thread.
Post by Scott Robison
Given the widespread use of SHA-1 in DVCS systems, and the use of GPG signatures to authenticate commits, I think it would be reasonable to enhance the cryptographic security in a future version of fossil.
Indeed, perhaps Fossil should just wait and see what Git does, if anything. Github is a much bigger target for this sort of thing, if there is a “thing” here at all.
Richard Hipp
2015-09-16 02:18:06 UTC
Post by Warren Young
Post by Scott Robison
It is more likely to detect accidental corruption than deliberate corruption.
I thought that’s what the MD5 bits were for.
MD5 is 128 bits versus 160 for SHA1. That's why I picked SHA1.
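The digest sizes are easy to confirm with Python's standard `hashlib`: MD5 produces 128 bits and SHA-1 produces 160. The generic birthday bound (roughly 2^(n/2) hash evaluations to find some collision in an n-bit digest) is included only as standard background; it is not stated in the thread.

```python
import hashlib

def digest_facts(name: str):
    """Digest size in bits, hex length, and the generic birthday-bound exponent."""
    h = hashlib.new(name)
    bits = h.digest_size * 8
    return bits, len(h.hexdigest()), bits // 2

for name in ("md5", "sha1"):
    bits, hex_len, birthday = digest_facts(name)
    print(f"{name}: {bits} bits, {hex_len} hex digits, ~2^{birthday} work for a generic collision")
```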

Tell me: suppose tomorrow somebody publishes a trivial preimage attack
against SHA1 - a program that will generate a file that has any SHA1
you want. (That's unlikely. No such program exists for even things
like MD4. But suppose.)

What attacks could you mount against Fossil using such a tool?

To put it another way, what problem would you solve by changing Fossil
to use the latest wizbang cryptographic hash function?
--
D. Richard Hipp
***@sqlite.org
Warren Young
2015-09-16 02:23:00 UTC
Post by Richard Hipp
what problem would you solve by changing Fossil
to use the latest wizbang cryptographic hash function?
All I’m pointing out here is that we will shortly get to the time where it is economically feasible to forge arbitrary SHA-1 authenticated messages.

As for what you can do with that, that’s obviously something for you to say, as well as those of your associates who also know the internal structure of the system.

I am not going to go off and craft some kind of exploit just to be able to win this argument. If you aren’t worried after learning about the known weaknesses of SHA-1, then you aren’t worried. End of thread. :)
Scott Robison
2015-09-16 03:16:37 UTC
Post by Scott Robison
I think calling it a non sequitur is not completely fair
Stephan stated that Fossil isn’t doing cryptography, therefore SHA-1
doesn’t have to be replaced. Cryptography and message authentication are
not the same thing.
It’s like pointing out that the bald tires on the car do not need to be
replaced because we don’t require that the car be able to climb trees.
Post by Scott Robison
It wasn't intended to provide cryptographic security
I’m probably just being pedantic, but now you’re doing it, too.
“Cryptographic security” implies encryption, which is not being done here.
The proper phrasing is “cryptographically strong message digest
algorithm.” The reference to cryptography is only an indicator that the
use of a given MD algorithm can be used with some given cryptosystem
without compromising its integrity.
As I understand "cryptography" to be defined it means "the practice and
study of techniques for secure communication in the presence of third
parties". If that is correct, message authentication is most assuredly a
use of "cryptography to securely identify authenticity". If I'm using a
term incorrectly, my apologies.
Post by Scott Robison
It is more likely to detect accidental corruption than deliberate
corruption.
I thought that’s what the MD5 bits were for.
My sense from reading the file format wiki page is that the SHA-1 bits
ensure that blob B, which is intended to appear in the timeline between
blobs A and C, was almost certainly inserted into the database at time T_b,
where T_a <= T_b <= T_c. That is, it is primarily a guarantor of checkin
ordering.
That’s why I’ve been framing the risk as one of potential insertion of a
timeline item way in the past.
That may be a bogus risk for other reasons, though, since you’d also have
to work out how to change all the deltas.
The SHA-1 bits ensure (virtually guarantee) that blob B has a unique
identity so that two contributors don't allocate ID 42 at the same time
creating a collision in commit IDs. Artifacts are unordered and can
originally come into a repository from any source in any order. It is
primarily a guarantor of checkin identity, and the individual cards in the
manifest control timeline order, date, time, author, etc.
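A minimal sketch of why no central counter is needed, assuming a toy in-memory store (`ArtifactStore` is hypothetical, not Fossil's API). Two disconnected contributors name artifacts by content, so they cannot hand out the same ID for different blobs, and syncing in either direction, in any order, converges on the same set.

```python
import hashlib

class ArtifactStore:
    """Toy content-addressed store: an artifact's ID is the SHA-1 of its bytes."""

    def __init__(self):
        self.blobs = {}

    def add(self, blob: bytes) -> str:
        aid = hashlib.sha1(blob).hexdigest()
        self.blobs.setdefault(aid, blob)  # re-adding identical content is a no-op
        return aid

    def sync_from(self, other: "ArtifactStore") -> None:
        """Pull every artifact we lack; arrival order does not matter."""
        for blob in other.blobs.values():
            self.add(blob)

alice, bob = ArtifactStore(), ArtifactStore()
id_a = alice.add(b"fix linker parameter file\n")
id_b = bob.add(b"increase speed threshold\n")  # no coordination with alice needed
alice.sync_from(bob)
bob.sync_from(alice)
print(alice.blobs.keys() == bob.blobs.keys())  # True
```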
It’s also occurred to me since my previous post that all the work needed
to generate a bogus SHA-1 hash for an HTTPS cert only has to be done once,
at which point you now have a reusable cert good for months or years. The
work needed to attack a single timeline entry in Fossil is a one-shot deal:
to attack two different nodes in the timeline, you need to do 2x the work.
Given that the commit ID is a hash of the manifest, and most of the cards
in the manifest are F cards, in theory all you have to do is find a useful
collision with the SHA-1 hash of any file artifact. If you could modify one
file without changing its SHA-1 hash, everything else in the database would
still match its hash. Of course, if it is too far back in the history
(probably even a single merge behind tip) it may never be noticed because
the project has moved on. And the difficulty still exists of getting it
into the master repository.
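The chain described above can be sketched with a deliberately simplified manifest containing only `F <name> <hash>` lines (real Fossil manifests carry more card types in a specific format, so treat this layout as an assumption). The check-in ID depends on file contents only through their hashes, so a second preimage for any one file hash would leave every downstream ID unchanged.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def toy_manifest(files: dict) -> bytes:
    """Simplified manifest: one 'F <name> <sha1>' line per file, sorted by name."""
    lines = [f"F {name} {sha1_hex(content)}" for name, content in sorted(files.items())]
    return ("\n".join(lines) + "\n").encode()

def checkin_id(files: dict) -> str:
    """The check-in ID hashes the manifest, which refers to files only by hash."""
    return sha1_hex(toy_manifest(files))

files = {"src/main.c": b"int main(void){return 0;}\n", "README": b"hello\n"}
before = checkin_id(files)
files["src/main.c"] = b"int main(void){return 1;}\n"  # any genuine edit
after = checkin_id(files)
print(before != after)  # True: the ID changes unless the new file collides with the old hash
```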
Post by Scott Robison
simply gaining commit access to the master repository as mentioned the
other day would not be adequate; the maliciously modified artifact would be
rejected by fossil
I’m no expert in Fossil’s inner workings, and I have no interest in trying
to attack it.
I’m just aware that Bruce Schneier and Google’s crypto geeks know things I
do not, and I use that awareness to guide my own design decisions. The
last hash-based system I designed used SHA-256. :)
I have a lot of respect for Schneier. I would not consider using SHA-1 in a
security sensitive environment today. This use is less about security and
more about non-cryptographic hashing, where an (at one time) cryptographic
strength hash happened to be satisfactory for the needs at hand.
Post by Scott Robison
I find it far more likely that someone would fork the repo and
contaminate it that way, rebuilding it from scratch, and finding ways to
induce parties to use *that* version of the library instead of the blessed
repo.
Clearly so.
Always attack the weakest link first, if possible.
Post by Scott Robison
* ethernet uses a 32 bit CRC; how much internet traffic goes through
ethernet? Can't really change that because of backward compatibility.
Post by Scott Robison
* rsync uses MD5 & a 32 bit rolling checksum / CRC (Adler-32 if I
remember correctly). Can't really change that because of backward
compatibility.
That’s why TLS exists.
TLS doesn’t solve any weaknesses with Fossil’s use of SHA-1, though. It
just prevents you from MITM-ing an existing TCP connection. Once you’ve
got a TCP connection to the Fossil server, well, *then* what? That’s the
purpose of this sub-thread.
But GPG could solve any weaknesses with Fossil's use of SHA-1, though. It
won't prevent a determined party from deconstructing a repo, making
whatever changes are desired, and rebuilding a believable facsimile that
unwary parties might trust. The rebuilt repo could even have fraudulent GPG
signatures attached just to make it feel more legit to people who don't
really check such things.
Post by Scott Robison
Given the widespread use of SHA-1 in DVCS systems, and the use of GPG
signatures to authenticate commits, I think it would be reasonable to
enhance the cryptographic security in a future version of fossil.
Indeed, perhaps Fossil should just wait and see what Git does, if
anything. Github is a much bigger target for this sort of thing, if there
is a “thing” here at all.
I'm sure fossil will address any shortcomings before git. HA! :)
--
Scott Robison
Michal Suchanek
2015-09-16 09:23:27 UTC
Post by Scott Robison
But GPG could solve any weaknesses with Fossil's use of SHA-1, though. It
won't prevent a determined party from deconstructing a repo, making whatever
changes are desired, and rebuilding a believable facsimile that unwary
parties might trust. The rebuilt repo could even have fraudulent GPG
signatures attached just to make it feel more legit to people who don't
really check such things.
It has been pointed out that when using GPG to sign checkins, only the
manifests are signed, and the manifests are linked to the rest of the
content (actual file blobs or previous checkins) only by the weak SHA-1
hashes.

So while it is possible to use PGP with fossil it gives only a false
sense of security until fossil itself uses crypto grade hash to link
its internal artifact structure.

Using a stronger or configurable hash for the internal linking of
artifacts would make it possible to verify the authenticity of a copy
of a signed repo even from an unknown source, so long as the signatures
are valid.

As actual signed repos are rare, this is not really a strong use case. On
the other hand, they may be rare because there is no real point.
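Michal's point can be sketched as follows, assuming (hypothetically) that the manifest links files by SHA3-256 and that an HMAC stands in for a real GPG signature purely to keep the example self-contained with Python's standard `hmac` and `hashlib`. An HMAC uses a shared secret rather than a public key, so this models only the verification chain, not real signing. The signature vouches for the manifest bytes; everything else is trusted only as far as the linking hash resists second preimages.

```python
import hashlib
import hmac

def sha3_hex(data: bytes) -> str:
    return hashlib.sha3_256(data).hexdigest()

def make_manifest(files: dict) -> bytes:
    """Hypothetical manifest: one 'F <name> <sha3-256>' line per file, sorted by name."""
    return "".join(f"F {n} {sha3_hex(c)}\n" for n, c in sorted(files.items())).encode()

def sign(key: bytes, manifest: bytes) -> bytes:
    """Stand-in for a detached GPG signature (an HMAC, NOT what Fossil or GPG do)."""
    return hmac.new(key, manifest, hashlib.sha256).digest()

def verify_copy(key: bytes, manifest: bytes, signature: bytes, files: dict) -> bool:
    """Accept a copy of unknown origin only if the signature and every file hash match."""
    if not hmac.compare_digest(sign(key, manifest), signature):
        return False
    expected = {}
    for line in manifest.decode().splitlines():
        _card, name, digest = line.split(" ")  # toy format: names contain no spaces
        expected[name] = digest
    return expected.keys() == files.keys() and all(
        sha3_hex(files[name]) == digest for name, digest in expected.items()
    )

key = b"maintainer-secret"  # hypothetical signing key
files = {"README": b"hello\n", "main.c": b"int main(void){return 0;}\n"}
manifest = make_manifest(files)
sig = sign(key, manifest)
print(verify_copy(key, manifest, sig, files))                           # True
print(verify_copy(key, manifest, sig, {**files, "main.c": b"evil\n"}))  # False
```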

Thanks

Michal
Scott Robison
2015-09-16 14:44:36 UTC
Post by Michal Suchanek
Post by Scott Robison
But GPG could solve any weaknesses with Fossil's use of SHA-1, though. It
won't prevent a determined party from deconstructing a repo, making whatever
changes are desired, and rebuilding a believable facsimile that unwary
parties might trust. The rebuilt repo could even have fraudulent GPG
signatures attached just to make it feel more legit to people who don't
really check such things.
It has been pointed out that when using GPG to sign checkins, only the
manifests are signed, and the manifests are linked to the rest of the
content (actual file blobs or previous checkins) only by the weak SHA-1
hashes.
Right, I didn't mean "GPG can fix this today with the current
implementation in fossil". Just that it could be used to authenticate the
source of global repo state.

If we accept that sha1 is used for nothing more than identification and a
way to validate an artifact as having not been accidentally modified, then
clearly another means of authentication is necessary if it is a required
feature. I accept that all artifacts should be signed for such a feature
and that it is not happening at this time.
Post by Michal Suchanek
So while it is possible to use PGP with fossil it gives only a false
sense of security until fossil itself uses crypto grade hash to link
its internal artifact structure.
Using a stronger or configurable hash for the internal linking of
artifacts would make it possible to verify the authenticity of a copy
of a signed repo even from an unknown source, so long as the signatures
are valid.
As actual signed repos are rare, this is not really a strong use case. On
the other hand, they may be rare because there is no real point.
I think they are rare because signing and verifying is a pain and we trust
the official versions of repos. Arguably we should not. Even if GPG were
being used completely and effectively, how can we be sure someone's private
keys weren't compromised?

We've talked on list before about how (with regard to computers) nothing is
perfect, everything is statistically flawed in some way making it less than
100% guaranteed to work properly. I think fossil's (and other dvcs) use of
sha1 fits in this category. It isn't perfect, but it is close enough for
the use case.
Post by Michal Suchanek
Thanks
Michal
Stephan Beal
2015-09-16 05:38:16 UTC
cryptographic security, it was designed to take a blob of data and create
a pseudo-random looking string of 40 hex digits that would be highly
unlikely to collide with anything being done by anyone else.
To add to that - a collision is not a problem so long as it doesn't happen
in the same context. If 2 independent repos end up with the same hash for 2
distinct blobs, _no big deal_. Nothing evil happens there. Even if one were
to try to feed "that other" blob into "the other" repo, the R-card
calculation (done using md5) would then change, invalidating any checkins.
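Stephan's R-card argument relies on a second, MD5-based checksum over the whole check-in. A rough sketch, assuming the layout described in Fossil's file-format document (for each file in sorted name order: the name, a space, the size in decimal, a newline, then the content, all fed into one MD5); treat the exact byte layout and sort order as assumptions to confirm against that document.

```python
import hashlib

def r_card(files: dict) -> str:
    """MD5 over every file in sorted name order (byte layout assumed, see lead-in)."""
    md5 = hashlib.md5()
    for name in sorted(files):
        content = files[name]
        md5.update(f"{name} {len(content)}\n".encode())
        md5.update(content)
    return md5.hexdigest()

files = {"README": b"hello\n", "main.c": b"int main(void){return 0;}\n"}
swapped = {**files, "main.c": b"int main(void){return 1;}\n"}
print(r_card(files))
print(r_card(files) != r_card(swapped))  # True
```

A substituted blob would therefore need to match its own SHA-1 name and also leave this whole-check-in MD5 unchanged.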

I do see your points about interested parties trying to create a collision,
but even if they managed that, simply gaining commit access to the master
repository as mentioned the other day would not be adequate; the
maliciously modified artifact would be rejected by fossil (as I understand
it) as a duplicate / already received artifact.
Even if they locally inject it, the R-card calculation would see it, as it
is (i will naively assert) "impossible" that both the sha1 and md5 could
both be made to match in a collision of a non-empty blob. (The empty-blob
case is an interesting one, though, if only intellectually.)
Again, I concede your point about bad actors trying to create deliberate
collisions, but even in so doing there is far more to do than just "push an
update".
+1.
--
----- stephan beal
Noam Postavsky
2015-09-16 14:03:30 UTC
Post by Stephan Beal
(i will naively assert) "impossible" that both the sha1 and md5 could
both be made to match in a collision of a non-empty blob.
This might be too optimistic. According to Antoine Joux in
"Multicollisions in Iterated Hash Functions. Application to Cascaded
Constructions" [1]

A natural construction to build large hash values is to concatenate
several smaller hashes. For example, given two hash functions F and G, it
seems reasonable given a message M to form the large hash value
(F(M)||G(M)). In this construction, F and G can either be two completely
different hash functions or two slightly different instances of the same
hash function. If F and G are good iterated hash functions with no attack
better than the generic birthday paradox attack, we claim that the hash
function F||G obtained by concatenating F and G is not really more secure
that F or G by itself. Moreover, this result applies both to collision
resistance, preimage resistance and second preimage resistance.
[...]
Another generalization of the above attack is also worth noting. In [14],
B. Schneier described a different way of building a long hash from a hash
function F. In this method, F(M) is concatenated with G(F(M)||M) (or
G(M||F(M))). At first view, this is more complicated than the F||G
construction. However, the very same attack can be applied.
[...]
One can also study a related question, how does the security of the
concatenated hash F||G behaves, when F and G have non-generic attacks
better than the birthday paradox collision search? In that case, can F||G
be significantly more secure than the best of F and G?
[...] if G also admits a shortcut attack (as in section 3), it is unclear
whether the two shortcut attacks may be used together to improve the
composed attack against F||G. Yet, some other type of attacks against G
can be integrated into a better composed attack on F||G. ... Thus, it is
safer to assume that F||G is essentially as secure as the best of F and
G, no more.

[1] http://link.springer.com/chapter/10.1007%2F978-3-540-28628-8_19
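The core trick in Joux's paper can be demonstrated at toy scale. This is an illustration under heavy assumptions, not an attack on real SHA-1 or MD5: `compress` is a hypothetical 16-bit compression function made by truncating SHA-256, `toy_F` iterates it Merkle-Damgård style without padding, and `toy_G` is an unrelated 16-bit truncation of MD5. Chaining 12 single-block collisions in F costs about 12 small birthday searches yet yields 2^12 = 4096 messages sharing one F value; a birthday search among those for a G collision then gives a collision on the 32-bit concatenation F||G, typically for a few thousand hash calls rather than the roughly 2^16 a generic 32-bit birthday search would need.

```python
import hashlib
from itertools import product

STATE_BYTES = 2  # toy 16-bit chaining value so collisions are cheap to find
BLOCK_BYTES = 8

def compress(h: bytes, block: bytes) -> bytes:
    """Hypothetical toy compression function: truncated SHA-256 of state||block."""
    return hashlib.sha256(h + block).digest()[:STATE_BYTES]

def toy_F(message: bytes) -> bytes:
    """Iterated (Merkle-Damgard style, unpadded) hash over fixed-size blocks."""
    h = bytes(STATE_BYTES)
    for i in range(0, len(message), BLOCK_BYTES):
        h = compress(h, message[i:i + BLOCK_BYTES])
    return h

def toy_G(message: bytes) -> bytes:
    """An unrelated second toy hash: truncated MD5 of the whole message."""
    return hashlib.md5(message).digest()[:STATE_BYTES]

def block_collision(h: bytes):
    """Birthday search: two distinct blocks taking state h to the same next state."""
    seen, counter = {}, 0
    while True:  # terminates within 2**16 + 1 tries by pigeonhole
        block = counter.to_bytes(BLOCK_BYTES, "big")
        nxt = compress(h, block)
        if nxt in seen:
            return seen[nxt], block, nxt
        seen[nxt] = block
        counter += 1

def f_multicollision(stages: int):
    """Chain single-block collisions: any one-block-per-stage choice has the same F."""
    h, pairs = bytes(STATE_BYTES), []
    for _ in range(stages):
        b0, b1, h = block_collision(h)
        pairs.append((b0, b1))
    return pairs

def f_and_g_collision(stages: int = 12):
    """Search the F-multicollision for a G collision, i.e. a collision on F||G."""
    pairs = f_multicollision(stages)
    seen = {}
    for choice in product((0, 1), repeat=stages):
        msg = b"".join(pairs[i][bit] for i, bit in enumerate(choice))
        g = toy_G(msg)
        if g in seen:
            return seen[g], msg
        seen[g] = msg
    return None  # astronomically unlikely with 4096 candidates and a 16-bit G

m1, m2 = f_and_g_collision()
print(m1 != m2, toy_F(m1) == toy_F(m2), toy_G(m1) == toy_G(m2))
```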
Richard Hipp
2015-09-16 14:13:02 UTC
Post by Noam Postavsky
Post by Stephan Beal
(i will naively assert) "impossible" that both the sha1 and md5 could
both be made to match in a collision of a non-empty blob.
This might be too optimistic. According to Antoine Joux in
"Multicollisions in Iterated Hash Functions. Application to Cascaded
Constructions" [1]
[1] http://link.springer.com/chapter/10.1007%2F978-3-540-28628-8_19
Fascinating. Thanks for the link!
--
D. Richard Hipp
***@sqlite.org
Stephan Beal
2015-09-16 05:33:26 UTC
it is not possible to change the hash without a near-complete overhaul
of fossil (and its docs)
I’ve already addressed the documentation/UI issue repeatedly above: The
fact that Fossil uses SHA-1 should be a hidden implementation detail,
unimportant to anyone but those working on the lowest-level parts of Fossil.
There are lots of assumptions in many places about which hashes are being
used, and their properties (e.g. length and being made up solely of
lowercase hex).
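To illustrate the kind of baked-in assumption Stephan mentions, here is a hypothetical helper (not taken from Fossil or libfossil) that validates artifact IDs as exactly 40 lowercase hex digits and resolves abbreviated prefixes of at least 4 digits. Every hard-coded `40` and the hex-only pattern would need revisiting if a longer hash were introduced.

```python
import hashlib
import re

FULL_ID = re.compile(r"[0-9a-f]{40}")  # hard-coded assumption: SHA-1, lowercase hex
HEX = re.compile(r"[0-9a-f]+")
MIN_PREFIX = 4                         # shortest abbreviation accepted in this sketch

def is_artifact_id(s: str) -> bool:
    return FULL_ID.fullmatch(s) is not None

def resolve_prefix(prefix: str, known_ids) -> str:
    """Expand a unique hex prefix to a full ID, or raise ValueError."""
    prefix = prefix.lower()
    if len(prefix) < MIN_PREFIX or HEX.fullmatch(prefix) is None:
        raise ValueError(f"not a usable prefix: {prefix!r}")
    matches = [i for i in known_ids if i.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} artifacts")
    return matches[0]

ids = ["10a5af61c1fc25060ad428de9c82e3615b45f6c8",  # full ID quoted in this thread
       hashlib.sha1(b"a").hexdigest(),
       hashlib.sha1(b"b").hexdigest()]
print(resolve_prefix("10A5", ids))
print(is_artifact_id("ab" * 32))  # False: a 64-digit name fails the 40-digit check
```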
(Plus those working on compatible software such as libfossil and FUEL.)
And i can say from my work on libfossil that this is so ;).
Supporting two hash variants in one fossil binary would likely prove to
be problematic (and would require a major overhaul).
Why can’t an artifact’s or checkin’s hash be tagged in MCF fashion, so
that when Fossil checks the hashes, it knows which algorithm to use at each
step?
It "could" be done, but it would essentially require duplicate code paths
for much of the existing code and would not be directly compatible with
existing repos (which would have to keep using sha1).
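A minimal sketch of the MCF-style idea, assuming a hypothetical `algorithm:hexdigest` tag (this is not a Fossil format) and Python's `hashlib`. It also hints at Stephan's cost argument: untagged legacy IDs need a default, and every consumer of an ID has to go through the dispatcher.

```python
import hashlib

ALGORITHMS = {"sha1": hashlib.sha1, "sha3-256": hashlib.sha3_256}
LEGACY_DEFAULT = "sha1"  # untagged IDs from existing repos are treated as SHA-1

def tag(blob: bytes, algo: str = "sha3-256") -> str:
    """Produce a hypothetical 'algorithm:hexdigest' identifier."""
    return f"{algo}:{ALGORITHMS[algo](blob).hexdigest()}"

def verify(tagged_id: str, blob: bytes) -> bool:
    """Choose the algorithm from the tag (or the legacy default) and re-hash the blob."""
    algo, sep, digest = tagged_id.partition(":")
    if not sep:  # no tag: an old-style bare hex ID
        algo, digest = LEGACY_DEFAULT, tagged_id
    if algo not in ALGORITHMS:
        raise ValueError(f"unknown hash algorithm {algo!r}")
    return ALGORITHMS[algo](blob).hexdigest() == digest

blob = b"hello\n"
print(verify(tag(blob), blob))                       # True
print(verify(hashlib.sha1(blob).hexdigest(), blob))  # True, via the legacy default
print(verify(tag(blob, "sha1"), b"tampered\n"))      # False
```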
Many other systems support multiple encryption and digest algorithms, and
many of those can switch mid-stream to a different algorithm. This is
known tech.
Sure, it's conceivable, but it's more trouble than it's worth. There's no
use case for fossil where such a move would simplify its usage in any way.
then the prior discussion was all about accidental collision. I’m talking
instead about motivated, well-trained, intelligent, well-funded attackers
purposely attempting to engineer a collision. Not the same thing at all.
Wake me up when that happens. It hasn't happened yet and there is little
reason to suspect that it ever will.
The Chinese attack from 2005 reduces the attack complexity to about 2^69
operations. The Stevens attack from 2011 reduces the attack complexity
even further, to between 2^60.3 and 2^65.3 operations. Add to that the
improvement from Moore’s Law and you’re talking about 5 to 7 orders of
magnitude improvement.
Again - a hypothetical problem. (A) nobody has anything to gain by
maliciously injecting content into a fossil repo and (B) i would have to
see it happen to believe it.
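A quick check of the attack figures quoted above, measuring each against the generic birthday bound of about 2^80 operations for a 160-bit digest (that baseline is an assumption; the message does not state one).

```python
import math

GENERIC_BITS = 80  # assumed baseline: birthday bound for a 160-bit digest

def orders_saved(attack_bits: float) -> float:
    """log10(generic work / attack work), with both given as powers of two."""
    return (GENERIC_BITS - attack_bits) * math.log10(2)

estimates = [("2005 attack", 69), ("2011 attack, upper estimate", 65.3),
             ("2011 attack, lower estimate", 60.3)]
for label, bits in estimates:
    print(f"{label}: 2^{bits} -> {orders_saved(bits):.1f} orders of magnitude below 2^80")
```

On that baseline the cryptanalytic improvements alone come to roughly 3.3 to 5.9 decimal orders of magnitude, so the "5 to 7" figure presumably relies on the Moore's Law factor for the remainder.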
Obviously the world’s HTTPS traffic is a far bigger target than
public-facing Fossil repos, so Fossil’s urgency to get off SHA-1 should be
lower. That said, attacks only get better, and Moore’s Law still has steam
in it, at least for embarrassingly-parallel applications like hashing.
Show me one such successful attack on fossil and i'll be _all ears_.
--
----- stephan beal
Joerg Sonnenberger
2015-09-12 12:49:23 UTC
Post by Warren Young
I wonder if this is an implementation detail leaking through into the
UI, though. Under what conditions, except for Noam’s contrived example
with hardcoded dates, is there a useful distinction between “hash” —
implying a number that you could reliably recompute given all the input
data — and “random number”?
The SHA1 hash is just a deterministic random number for this purpose.
Making it deterministic just means that the identifier can also double
as checksum.
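Joerg's distinction in a few lines of Python (standard `hashlib` and `uuid` only): a random identifier such as `uuid.uuid4()` is just as unlikely to collide, but only a deterministic hash can be recomputed from the content later, which is what lets the name double as a checksum.

```python
import hashlib
import uuid

content = b"Fix linker parameter file\n"

random_names = {uuid.uuid4().hex for _ in range(2)}                 # new value every call
hash_names = {hashlib.sha1(content).hexdigest() for _ in range(2)}  # same value every call

def still_matches(name: str, blob: bytes) -> bool:
    """Only a content-derived name can be re-checked against the blob later."""
    return hashlib.sha1(blob).hexdigest() == name

print(len(random_names), len(hash_names))  # 2 1
```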

Joerg
Ron W
2015-09-10 15:29:15 UTC
Post by j. van den hoff
well, I'm only talking about the ordinal numbers chronologically
enumerating the timeline checkin(!) entries. this enumeration will not
change as a consequence of rebuild, right? it might change after a sync
against some remote repo if there are incoming checkins chronologically
interleaved with my own, sure, but so what? the relative numbers would be
just a (somewhat "volatile") convenience measure _locally_. and I agree
with another recent post that this would primarily concern the CLI. what I
mean: go ask some hg users when they last did use sha1 hashes for
specifying checkins in their interaction with hg (which supports both the
ordinals as well as the hashes for doing so) and how often the presence of
those numbers confused communication with other developers in their
project. I'm quite sure they _never_ specify sha1 hashes to denote checkins
in any small to medium-sized project below 10^4 checkins (currently
this still includes fossil itself). not so sure about the "communication"
issue if users forget the potentially 'volatile' nature of the relative
enumeration, but this just can't possibly be a big issue ...
For the project (a few years ago) I used Hg, yes, for my *local*
interaction with Hg, I did use the sequential numbers to save typing (or
copy/paste) the hashes.

When communicating with others, I used the hash.

In my earlier post, I had misremembered Hg's numbering. I had thought it
was in reverse order. But it is actually in commit order.
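The "volatile" local enumeration discussed above can be sketched as follows, assuming a hypothetical scheme that numbers the locally known check-ins by commit time (this models the proposal in the quoted post, not Mercurial's actual numbering rules). A sync that brings in an older, interleaved check-in renumbers everything after it, while the hashes stay the same.

```python
def ordinals(checkins: dict) -> dict:
    """Map check-in ID -> 1-based ordinal, ordered by commit time (hypothetical scheme)."""
    ordered = sorted(checkins, key=checkins.get)
    return {cid: n for n, cid in enumerate(ordered, start=1)}

# abbreviated check-in ID -> commit timestamp (timestamps are made up)
local = {"5250e3796e": 100, "10a5af61c1": 200, "d28be5063a": 300}
before = ordinals(local)

# a sync brings in a colleague's check-in made at t=150 (hypothetical ID)
synced = {**local, "0f0f0f0f0f": 150}
after = ordinals(synced)

print(before["d28be5063a"], after["d28be5063a"])  # 3 4
```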

Personally, I would find some kind of relative specification more useful.
For example, if I could say "fossil gdiff --from cur-3" and get a diff
between the current check out and the revision 3 commits before the
revision the check out is from - along the same branch. If that would go
beyond the beginning of the branch, then continue back into the "parent"
branch from the branch point.

Revising my earlier example to more closely reflect this "relative"
concept:

$ fossil timeline -N -n 3
0 [d28be5063a] *CURRENT* Fix linker parameter file
-1 [10a5af61c1] Alt code for HS interface
-2 [5250e3796e] Increase speed threshold
$ fossil info -r cur-1
uuid: 10a5af61c1fc25060ad428de9c82e3615b45f6c8 ...
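The `cur-N` syntax above is a proposal, not an existing Fossil feature. A minimal sketch of how it could resolve, assuming a hypothetical map from each check-in to its primary parent, is to step back N times; following the primary parent naturally continues from a branch into the branch it forked from, matching the behaviour described.

```python
def resolve_relative(spec: str, current: str, primary_parent: dict) -> str:
    """Resolve 'cur' or 'cur-N' by following primary parents N times (hypothetical)."""
    if spec == "cur":
        steps = 0
    elif spec.startswith("cur-") and spec[4:].isdigit():
        steps = int(spec[4:])
    else:
        raise ValueError(f"unsupported spec {spec!r}")
    node = current
    for _ in range(steps):
        if node not in primary_parent:
            raise ValueError(f"{spec!r} goes past the first check-in")
        node = primary_parent[node]
    return node

# child -> primary parent, using the abbreviated IDs from the timeline above
parents = {"d28be5063a": "10a5af61c1", "10a5af61c1": "5250e3796e"}
print(resolve_relative("cur-1", "d28be5063a", parents))  # 10a5af61c1
```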
Konstantin Khomoutov
2015-09-10 15:58:59 UTC
On Thu, 10 Sep 2015 11:29:15 -0400
Ron W <***@gmail.com> wrote:

[...]
Post by Ron W
Personally, I would find some kind of relative specification more
useful. For example, if I could say "fossil gdiff --from cur-3" and
get a diff between the current check out and the revision 3 commits
before the revision the check out is from - along the same branch.
That's what Git does (see ^N and ~N revision suffixes) [1].
Actually, its revision modifiers can do way more than that.

Mercurial's revsets [2] are also quite powerful.

1. https://www.kernel.org/pub/software/scm/git/docs/gitrevisions.html
2. http://hg.intevation.org/mercurial/crew/file/tip/mercurial/help/revsets.txt
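As described in gitrevisions [1], <rev>^<n> selects the n-th parent of a commit (^0 is the commit itself), while <rev>~<n> goes n generations back following first parents only. The two differ only at merges. The model below is a toy illustration of those rules over a hypothetical four-commit graph; it is not Git's own code.

```python
# Toy commit graph (hypothetical): commit -> parents, first parent first.
# M merges A (first parent) and B (second parent); both descend from R.
PARENTS = {"M": ["A", "B"], "A": ["R"], "B": ["R"], "R": []}


def caret(parents, rev, n=1):
    """rev^n: the n-th parent of rev (1-based); rev^0 is rev itself."""
    if n == 0:
        return rev
    return parents[rev][n - 1]


def tilde(parents, rev, n=1):
    """rev~n: the ancestor n generations back, following first parents only."""
    for _ in range(n):
        rev = parents[rev][0]
    return rev


print(caret(PARENTS, "M", 2), tilde(PARENTS, "M", 2))  # B R
```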
paul
2015-09-09 21:03:59 UTC
Post by j. van den hoff
Post by Ron W
Post by Luca Ferrari
Some DVCS, like hg, use both an hash and a sequential number.
As I recall (been a few years since I last used hg), the numbers were
"relative" to the output of hg's equivalent to "timeline".
Assuming I am remembering correctly, if Fossil had this feature, you could
$ fossil timeline -N -n 3
0 [d28be5063a] *CURRENT* Fix linker parameter file
1 [10a5af61c1] Alt code for HS interface
2 [5250e3796e] Increase speed threshold
$ fossil info 1
uuid: 10a5af61c1fc25060ad428de9c82e3615b45f6c8 ...
The numbers, of course, could change after any sync or commit.
in a breach of promise to myself to never again argue in favour of
this functionality on the fossil mailing list (it came up a few times
having simple chronological checkin numbers as an alternative way of
specifying checkins _locally_ just the way hg has done for years
would be a *good* thing. simply because for most projects (all the
small ones out there) specifying chronological numbers is
shorter/easier than specifying (unique min 4-digits prefixes of) sha1
hashes. and the "chronologic property" is helpful in its own right,
e.g. in comparing 'current vs. previous' checkin. and until checkin
9999 it's at least break-even in terms of typing effort. the fact that
those chronological checkin numbers are a local property of each
clone/checkout rather than of the repo proper is beside the point in
my view: it is true but mostly irrelevant. I concede that there might
arise confusion if people are really not aware of the potential
ambiguity of those chronological numbers across different clones if
they start to argue about a certain checkin. but when interacting with
fossil it cannot have adverse effects afaics. rather the opposite
Sounds to me like you need a GUI.

I need to get my backside into gear and finish this:
www.p-code.org/fcommit. But you could try it out, because it's very
convenient for doing what you describe.

It's got a history dialog with links down the left. As you click on the
links a diff window updates with the changes for that checkin. If you
Ctrl and click on a link it will change colour to yellow. Ctrl-click on
another link and it will also change to yellow. Press Ctrl-D, then
press OK to see a diff between the two checkins.

Some things are just more convenient with a GUI.
Barry Arthur
2015-09-10 01:42:53 UTC
The latest Fuel 2.0 is also quite usable now.
https://fuel-scm.org/fossil/home
Jacek Cała
2015-09-10 10:44:30 UTC
Post by paul
Post by Ron W
Post by Ron W
Assuming I am remembering correctly, if Fossil had this feature, you
could
$ fossil timeline -N -n 3
0 [d28be5063a] *CURRENT* Fix linker parameter file
1 [10a5af61c1] Alt code for HS interface
2 [5250e3796e] Increase speed threshold
$ fossil info 1
uuid: 10a5af61c1fc25060ad428de9c82e3615b45f6c8 ...
The numbers, of course, could change after any sync or commit.
Sounds to me like you need a GUI.
Some things are just more convenient with a GUI.
Personally, I don't think it is only a GUI thing. Fossil delivers both
sides, the CLI client and the server, in a neat single executable. The
ordered numbering is just a client-side convenience. A long time ago I
proposed, and almost implemented, numbering for changes, so you could do
a selective commit with a range of files like 1-5,7. There was little
interest in that feature, so I gave up.
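Selecting files by a range such as 1-5,7 needs only a small parser over the numbered change list. The sketch below is one way to do it, assuming a 1-based numbered listing; parse_ordinals and select_files are hypothetical helpers, not part of Fossil, and the file names are made up.

```python
def parse_ordinals(spec):
    """Turn a spec such as "1-5,7" into a sorted list of 1-based ordinals."""
    chosen = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate "1-3,,7" or a trailing comma
        if "-" in part:
            # int("") raises ValueError, so "3-" and "-3" are rejected too
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo < 1 or lo > hi:
                raise ValueError(f"bad range {part!r}")
            chosen.update(range(lo, hi + 1))
        else:
            n = int(part)
            if n < 1:
                raise ValueError(f"bad ordinal {part!r}")
            chosen.add(n)
    return sorted(chosen)


def select_files(numbered_files, spec):
    """Pick entries from a numbered listing (hypothetical helper)."""
    picked = []
    for n in parse_ordinals(spec):
        if n > len(numbered_files):
            raise ValueError(f"no file numbered {n}")
        picked.append(numbered_files[n - 1])
    return picked


files = [f"f{i}.c" for i in range(1, 9)]  # hypothetical changed files
print(select_files(files, "1-5,7"))
# ['f1.c', 'f2.c', 'f3.c', 'f4.c', 'f5.c', 'f7.c']
```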

All in all, I think it would be nice to add these little things to the
console client, so the need for the GUI is only for those who really hate
console.

Cheers,
Jacek
Baruch Burstein
2015-09-10 13:56:49 UTC
Post by Jacek Cała
All in all, I think it would be nice to add these little things to the
console client, so the need for the GUI is only for those who really hate
console.
Some of us (yes, even some programmers) think of it the other way round...
--
˙uʍop-ǝpısdn sı ɹoʇıuoɯ ɹnoʎ 'sıɥʇ pɐǝɹ uɐɔ noʎ ɟı
paul
2015-09-10 14:41:15 UTC
Post by Jacek Cała
All in all, I think it would be nice to add these little things to
the console client, so the need for the GUI is only for those who
really hate console.
Some of us (yes, even some programmers) think of it the other way round...
I mostly use Linux and the command line, but for SCM I prefer to use a
GUI. On Windows, I probably do hate its console :)
Ross Berteig
2015-09-10 18:30:43 UTC
.... Long time ago I was
trying to propose and (almost) implemented numbering for changes, so you
could do selective commit with range of files like 1-5,7. There was
little interest in that feature, so I gave up.
All in all, I think it would be nice to add these little things to the
console client, so the need for the GUI is only for those who really
hate console.
I realized mid change the other day that the repo I was in had a bunch
of IDE private project data files that had been checked in. (Rant: Just
when will IDE authors learn to prominently document their project
databases to make interoperation with any VCS easier?)

I've learned to take care of that sort of housekeeping when I notice it,
so I went and did fossil rm (and fixed ignore-glob) to nip them off.
Then I had a current checkout where fossil changes listed the eight
files marked DELETED, an ADDED line for ignore-glob (apparently I hadn't
really set this repo up right yet), and four EDITED for the actual change.

I've never been certain how to check in just a delete, rename or merge,
so the easy answer in this case was to remember that the stash exists
and use that to hold the real work for a minute or two while I checked
in the structural changes.

But numbering the output of fossil changes with simple ordinals that can
be used by an "immediately following" fossil commit would have been a
clear and direct way of saying that to the command line. Those ordinals
would have to be really transient, and would likely be invalidated by nearly
anything that changes a file, but for that specific sequence of fossil
changes followed immediately by fossil ci they could be friendly.
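One way to keep such transient ordinals safe is to remember a digest of the exact listing the user saw and refuse to honour ordinals if the current listing no longer matches. A minimal sketch, assuming the listing is a list of (status, path) pairs; fingerprint and resolve are hypothetical helpers, and the paths are made up to echo the scenario above. This digest only notices files appearing, disappearing, or changing status; a stricter version could also fold in each file's size or modification time.

```python
import hashlib


def fingerprint(changes):
    """Digest of the (status, path) listing that was shown with ordinals."""
    h = hashlib.sha1()
    for status, path in changes:
        h.update(f"{status}\t{path}\n".encode("utf-8"))
    return h.hexdigest()


def resolve(changes_now, ordinals, fp_shown):
    """Map 1-based ordinals to paths, refusing if the listing has changed."""
    if fingerprint(changes_now) != fp_shown:
        raise RuntimeError("change list differs from the numbered listing; list it again")
    paths = []
    for i in ordinals:
        if not 1 <= i <= len(changes_now):
            raise ValueError(f"no change numbered {i}")
        paths.append(changes_now[i - 1][1])
    return paths


shown = [  # hypothetical listing
    ("DELETED", "ide/cache.db"),
    ("ADDED", ".fossil-settings/ignore-glob"),
    ("EDITED", "src/main.c"),
]
fp = fingerprint(shown)
print(resolve(shown, [1, 2], fp))
# ['ide/cache.db', '.fossil-settings/ignore-glob']
```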

But the stash worked for me, and was in some sense safer because once
the right files were stashed, I couldn't accidentally mix them into any
checkin until they were popped back out.
--
Ross Berteig ***@CheshireEng.com
Cheshire Engineering Corp. http://www.CheshireEng.com/
Ron W
2015-09-10 20:03:43 UTC
I realized mid change the other day that the repo I was in had a bunch of
IDE private project data files that had been checked in. (Rant: Just when
will IDE authors learn to prominently document their project databases to
make interoperation with any VCS easier?)
You mean so you can more easily ignore them or mark them as "binary"?

Fossil itself has no IDE integration features.

But numbering the output of fossil changes with simple ordinals that can be
used by an "immediately following" fossil commit would have been a clear
and direct way of saying that to the command line. Those ordinals would
have to be really transient,
Which is why I like the idea of relative specifications like "cur-3",
"cur+2" or "tip-1"
and likely are invalidated by nearly anything that changes a file, but for
that specific sequence of fossil changes then fossil ci it could be
friendly.
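Relative specifications like "cur-3", "cur+2" or "tip-1" split into a base name and a signed offset. Stepping backward can follow primary parents, but stepping forward is only well defined while there is a single child. The sketch below shows both under stated assumptions: the grammar, the tiny graph with a fork at b, and the BASES mapping are all hypothetical, and none of this is existing Fossil syntax.

```python
import re

# Hypothetical grammar: a base name plus an optional signed offset.
SPEC = re.compile(r"(cur|tip)([+-]\d+)?")

# Hypothetical graph: a -> b, and b forks into c and x.
PARENTS = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"]}
CHILDREN = {"a": ["b"], "b": ["c", "x"]}
BASES = {"cur": "a", "tip": "c"}  # where "cur" and "tip" point in this toy repo


def parse_relative(spec):
    """'cur-3' -> ('cur', -3); 'tip' -> ('tip', 0)."""
    m = SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"not a relative spec: {spec!r}")
    return m.group(1), int(m.group(2) or 0)


def step(parents, children, node, offset):
    """Move back (negative offset) via primary parents, or forward via children."""
    while offset < 0:
        ps = parents.get(node, [])
        if not ps:
            raise ValueError(f"{node} has no parent")
        node, offset = ps[0], offset + 1
    while offset > 0:
        kids = children.get(node, [])
        if len(kids) != 1:
            # forward is ambiguous at a fork and impossible at a leaf
            raise ValueError(f"{node} has {len(kids)} children")
        node, offset = kids[0], offset - 1
    return node


def resolve_spec(spec):
    base, offset = parse_relative(spec)
    return step(PARENTS, CHILDREN, BASES[base], offset)


print(resolve_spec("tip-1"), resolve_spec("cur+1"))  # b b
```

Here "cur+2" raises an error because b has two children; a real design would have to pick a rule for forks (for example, stay on the same branch).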
Ross Berteig
2015-09-10 21:00:18 UTC
Post by Ron W
....
(Rant: Just when will IDE authors learn to prominently document
their project databases to make interoperation with any VCS easier?)
You mean so you can more easily ignore them or mark them as "binary"?
Yes, both, as needed.

This chip vendor supplied IDE is typical in my experience. It has a
project description file in some arbitrary textual format. It lists all
the source files, build parameters, and everything else needed to
actually build the HEX file you program into the embedded system.

Because it must have made the IDE developer's life easier, it also
creates a pile of undocumented binary and XML files. Those should not be
checked in as they are rewritten frequently while using the IDE and
don't control the actual build.

When you create a new project in the IDE, you get all of these files and
no description of which ones are original source code and which are
build products, so there is no way to know that the repository really
contains everything needed to reproduce the build, and nothing that is
going to change just because someone sneezed.

(For example, this particular project builds three libraries and an
application. Of the 13 files belonging to the IDE, 5 hold lists of build
products, source files, build steps, and project options and clearly
belong in the repository. The remaining 8 include one INI file, 3 XML
files, and 4 SQLite databases. All those remaining 8 files are modified
just by opening the project in the IDE even if nothing else is done. All
appear to be recreated if missing, and hence need not be checked in.)

(Another rant: at least this IDE doesn't re-write the real project
description file in a fresh and arbitrary order every time you launch
the IDE. I worked with one that did that and also insisted on writing
all file names as fully qualified path names. It created extra chaff on
every checkin, and made it impossible to have multiple open checkouts on
the same PC without every checkin having guaranteed merge conflicts.
This added a lot of friction for my customer, and made it more difficult
than needed to deliver source code snapshots to my customer that could
be built by anyone other than me.)

IDE makers really should clearly document what files make up their
project metadata, and be careful to keep cosmetic parameters and caches
in separate files from the core project description.
Post by Ron W
Fossil itself has no IDE integration features.
Nope. And it really shouldn't need them.

The only thing the fossil community could do is keep some wiki pages
documenting what files are source code for various use cases so we had a
place to go look when forced to deal with a new and "wonderful" IDE.
Other notes that might be helpful would include ideas for things to put
in the *-glob settings, any other settings that prove helpful, ideas for
forcing the IDE to a sane(r) directory layout, notes on how to make it
cooperate with other build systems if possible, etc.
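A wiki page like the one suggested could pair each IDE with a list of patterns for the ignore-glob setting, which accepts a comma- or newline-separated list of globs. The sketch below shows how such a list might be sanity-checked against a project's files before committing. The patterns and file names are hypothetical, loosely modelled on the INI, XML and SQLite files described above, and Python's fnmatchcase is only an approximation of Fossil's own glob matching.

```python
from fnmatch import fnmatchcase

# Hypothetical ignore list for an IDE like the one described above.
IGNORE_GLOB = """
*.ini
.ide/*.xml
.ide/*.sqlite
"""


def parse_globs(text):
    """Split a comma- or newline-separated glob list, dropping blanks."""
    return [g.strip() for g in text.replace(",", "\n").splitlines() if g.strip()]


def is_ignored(path, globs):
    """True if any glob matches the path (fnmatch's '*' also crosses '/')."""
    return any(fnmatchcase(path, g) for g in globs)


files = [  # hypothetical project files
    "app.proj", "lib1.proj", ".ide/layout.xml",
    ".ide/index.sqlite", "workspace.ini", "src/main.c",
]
globs = parse_globs(IGNORE_GLOB)
for f in files:
    print(("ignore " if is_ignored(f, globs) else "keep   ") + f)
```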
--
Ross Berteig ***@CheshireEng.com
Cheshire Engineering Corp. http://www.CheshireEng.com/