it is not possible to change the hash without a near-complete overhaul
of fossil (and its docs)
I've already addressed the documentation/UI issue repeatedly above: The
fact that Fossil uses SHA-1 should be a hidden implementation detail,
unimportant to anyone but those working on the lowest-level parts of Fossil.
(Plus those working on compatible software such as libfossil and FUEL.)
I don't dispute that the "implementation detail" of SHA-1 isn't needed in the
interface / user docs. I don't think it is nearly as big a problem as you
do, though I could be wrong.
Supporting two hash variants in one fossil binary would likely prove to
be problematic (and would require a major overhaul).
Why can't an artifact's or checkin's hash be tagged in MCF fashion, so
that when Fossil checks the hashes, it knows which algorithm to use at each
step?
Many other systems support multiple encryption and digest algorithms, and
many of those can switch mid-stream to a different algorithm. This is
known tech.
I see the problem less as "can it be done" than as "what would existing fossil
implementations do with the data format changes that would be required?" I
certainly think a fossil 2.0 spec should probably accommodate such changes.
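
As a rough illustration of the MCF-style tagging idea discussed above, here
is a minimal Python sketch. The "$algo$hexdigest" layout, the ALGORITHMS
table, and the tag_hash/verify helpers are all hypothetical and are not part
of any Fossil format; the point is only that a verifier can pick the
algorithm from the tag at each step.

```python
import hashlib

# Hypothetical tag names mapped to hashlib constructors (not a Fossil format).
ALGORITHMS = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}


def tag_hash(algo: str, data: bytes) -> str:
    """Return a tagged digest of the assumed form '$algo$hexdigest'."""
    return f"${algo}${ALGORITHMS[algo](data).hexdigest()}"


def verify(tagged: str, data: bytes) -> bool:
    """Check data against a tagged digest, choosing the algorithm by its tag."""
    _, algo, digest = tagged.split("$", 2)
    if algo not in ALGORITHMS:
        raise ValueError(f"unknown hash tag: {algo}")
    return ALGORITHMS[algo](data).hexdigest() == digest


old_style = tag_hash("sha1", b"artifact body")
new_style = tag_hash("sha256", b"artifact body")
print(verify(old_style, b"artifact body"), verify(new_style, b"artifact body"))
```

An older implementation that only knows "sha1" would hit the ValueError
branch on a "sha256" tag, which is exactly the compatibility question raised
above.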
Fossil does not use it in a cryptographic context
That's a true non sequitur. Fossil uses SHA-1 as a kind of message
authentication, the very sort of thing that HTTPS certificates use it for.
Therefore, either Fossil's use of SHA-1 is not like HTTPS certs in some
respect I do not understand, or Google is wrong to be trying to push the
web world off SHA-1 authenticated HTTPS certs.
I think calling it a non sequitur is not completely fair, though admittedly
it depends on your point of view. SHA-1 was a convenient algorithm, already
in use by git & mercurial at least, for this very same sort of computation.
Its use here wasn't intended to provide cryptographic security; it was chosen to
take a blob of data and create a pseudo-random looking string of 40 hex
digits that would be highly unlikely to collide with anything being done by
anyone else. The fact that it can be used to detect errors in the original
data is more akin to the CRC in an ethernet frame than cryptographic
security. It is more likely to detect accidental corruption than deliberate
corruption. Perhaps better signing / validation of artifacts should be
added to the fossil 2.0 list.
I do see your points about interested parties trying to create a collision,
but even if they managed that, simply gaining commit access to the master
repository as mentioned the other day would not be adequate; the
maliciously modified artifact would be rejected by fossil (as I understand
it) as a duplicate / already received artifact. In order to impact the
official / master / canonical repository, someone would have to gain access
to the file so that it could be modified and presented to the world. I find
it far more likely that someone would fork the repo and contaminate it that
way, rebuilding it from scratch, and finding ways to induce parties to use
*that* version of the library instead of the blessed repo.
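
The "rejected as a duplicate" reasoning above can be modelled with a toy
content-addressed store. This is only a sketch of the behavior as described
in this message, not Fossil's actual code; ToyRepo, receive, and the
deliberately weak stand-in hash used to force a collision are hypothetical.

```python
import hashlib


def sha1_id(data: bytes) -> str:
    """Artifact ID as 40 hex digits of SHA-1."""
    return hashlib.sha1(data).hexdigest()


class ToyRepo:
    """Toy content-addressed store (hypothetical, not Fossil's real code)."""

    def __init__(self, hash_fn=sha1_id):
        self.hash_fn = hash_fn
        self.artifacts = {}

    def receive(self, data: bytes) -> bool:
        """Store data under its ID; return False if that ID already exists."""
        artifact_id = self.hash_fn(data)
        if artifact_id in self.artifacts:
            return False  # treated as already received, content is ignored
        self.artifacts[artifact_id] = data
        return True


# Force a "collision" with a weak stand-in hash (the length of the data),
# since producing a real SHA-1 collision is the hard part being discussed.
repo = ToyRepo(hash_fn=lambda d: str(len(d)))
print(repo.receive(b"good code"))  # first artifact with this ID is stored
print(repo.receive(b"evil code"))  # same ID, so the push is dropped
print(repo.artifacts["9"])         # the original content is still there
```

Under that assumption, a colliding artifact pushed later never replaces the
one the repository already holds.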
There are long threads somewhere in the list archives about the chances
of hash collision. Management summary: not likely to happen for many human
generations.
If you mean posts like this one
http://www.mail-archive.com/fossil-users%40lists.fossil-scm.org/msg05979.html
then the prior discussion was all about accidental collision. I'm talking
instead about motivated, well-trained, intelligent, well-funded attackers
purposely attempting to engineer a collision. Not the same thing at all.
And avoiding accidental collision was the initial intent. The fact that it
can be used as "a kind of message authentication" does not mean that was
how it was intended to be used.
If the fact that some algorithm is cryptographically weak means that it
should be replaced, then we have a lot of work to do:
* ethernet uses a 32 bit CRC; how much internet traffic goes through
ethernet? Can't really change that because of backward compatibility.
* rsync uses MD5 & a 32 bit rolling checksum / CRC (Adler-32 if I remember
correctly). Can't really change that because of backward compatibility.
There are many more examples, these are the first two that came to mind.
And they are admittedly not fair examples, given they are used as part of a
protocol vs part of a durable long lasting repository structure, but they
are examples of "cryptographically insecure" algorithms being used
effectively to detect accidental corruption vs deliberate shenanigans.
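
To make the "accidental corruption" point concrete, here is a small Python
sketch using the standard zlib module's crc32 and adler32 functions. The
flip_bit helper and the sample payload are made up for illustration; this
is not how Ethernet or rsync are implemented, only the same class of
checksum applied to a single flipped bit.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated line noise)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


original = b"an ordinary frame payload"
damaged = flip_bit(original, 3)

# Both checksums change when a single bit is flipped by accident.
print(zlib.crc32(original) != zlib.crc32(damaged))
print(zlib.adler32(original) != zlib.adler32(damaged))
```

Neither checksum is meant to resist someone who deliberately crafts data to
match, which is the distinction being drawn above.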
Again, I concede your point about bad actors trying to create deliberate
collisions, but even in so doing there is far more to do than just "push an
update". Given the widespread use of SHA-1 in DVCS systems, and the use of
GPG signatures to authenticate commits, I think it would be reasonable to
enhance the cryptographic security in a future version of fossil. If what
is desired is not "cryptographic security" but rather "excellent but not
perfect hashing to create distributed unique identifiers" then SHA-1 will
continue to work for a very long time.
--
Scott Robison