* RFC: Another proposed hash function transition plan
@ 2017-03-04 1:12 Jonathan Nieder
2017-03-05 2:35 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Jonathan Nieder @ 2017-03-04 1:12 UTC (permalink / raw)
To: git; +Cc: sbeller, bmwill, jonathantanmy, peff, Linus Torvalds
Hi,
This past week we came up with this idea for what a transition to a new
hash function for Git would look like. I'd be interested in your
thoughts (especially if you can make them as comments on the document,
which makes it easier to address them and update the document).
This document is still in flux but I thought it best to send it out
early to start getting feedback.
We tried to incorporate some thoughts from the thread
http://public-inbox.org/git/20170223164306.spg2avxzukkggrpb@kitenet.net
but it is a little long so it is easy to imagine we've missed
some things already discussed there.
You can use the doc URL
https://goo.gl/gh2Mzc
to view the latest version and comment.
Thoughts welcome, as always.
Git hash function transition
============================
Status: Draft
Last Updated: 2017-03-03
Objective
---------
Migrate Git from SHA-1 to a stronger hash function.
Background
----------
The Git version control system can be thought of as a content
addressable filesystem. It uses the SHA-1 hash function to name
content. For example, files, trees, and commits are referred to by
hash values, unlike in traditional version control systems, where
files or versions are referred to by sequential numbers. The use of a
hash function to address content delivers a few advantages:
* Integrity checking is easy. Bit flips, for example, are easily
detected, as the hash of corrupted content does not match its name.
* Lookup of objects is fast.
Using a cryptographically secure hash function brings additional advantages:
* Object names can be signed and third parties can trust the hash to
address the signed object and all objects it references.
* Communication using the Git protocol and out-of-band communication
methods gain a short string that can be used to reliably address
stored content.
Over time some flaws in SHA-1 have been discovered by security
researchers. https://shattered.io demonstrated a practical SHA-1 hash
collision. As a result, SHA-1 cannot be considered cryptographically
secure any more. This impacts the communication of hash values because
we cannot trust that a given hash value represents the known good
version of content that the speaker intended.
SHA-1 still possesses the other desired properties, such as fast
object lookup and safe error detection, but other hash functions that
are believed to be cryptographically secure are equally suitable.
Goals
-----
1. The transition to SHA256 can be done one local repository at a time.
a. Requiring no action by any other party.
b. A SHA256 repository can communicate with SHA-1 Git servers and
clients (push/fetch).
c. Users can use SHA-1 and SHA256 identifiers for objects
interchangeably.
d. New signed objects make use of a stronger hash function than
SHA-1 for their security guarantees.
2. Allow a complete transition away from SHA-1.
a. Local metadata for SHA-1 compatibility can be dropped in a
repository if compatibility with SHA-1 is no longer needed.
3. Maintainability throughout the process.
a. The object format is kept simple and consistent.
b. Creation of a generalized repository conversion tool.
Non-Goals
---------
1. Add SHA256 support to Git protocol. This is valuable and the
logical next step but it is out of scope for this initial design.
2. Transparently improving the security of existing SHA-1 signed
objects.
3. Intermixing objects using multiple hash functions in a single
repository.
4. Taking the opportunity to fix other bugs in git's formats and
protocols.
5. Shallow clones and fetches into a SHA256 repository. (This will
change when we add SHA256 support to Git protocol.)
6. Skip fetching some submodules of a project into a SHA256
repository. (This also depends on SHA256 support in Git protocol.)
Overview
--------
We introduce a new repository format extension `sha256`. Repositories
with this extension enabled use SHA256 instead of SHA-1 to name their
objects. This affects both object names and object content --- both
the names of objects and all references to other objects within an
object are switched to the new hash function.
sha256 repositories cannot be read by older versions of Git.
Alongside the packfile, a sha256 repository stores a bidirectional
mapping between sha256 and sha1 object names. The mapping is generated
locally and can be verified using "git fsck". Object lookups use this
mapping to allow naming objects using either their sha1 or sha256
names interchangeably.
"git cat-file" and "git hash-object" gain options to display a sha256
object in its sha1 form and write a sha256 object given its sha1 form.
This requires all objects referenced by that object to be present in
the object database so that they can be named using the appropriate
name (using the bidirectional hash mapping).
Fetches from a SHA-1 based server convert the fetched objects into
sha256 form and record the mapping in the bidirectional mapping table
(see below for details). Pushes to a SHA-1 based server convert the
objects being pushed into sha1 form so the server does not have to be
aware of the hash function the client is using.
Detailed Design
---------------
Object names
~~~~~~~~~~~~
Objects can be named by their 40 hexadecimal digit sha1-name or 64
hexadecimal digit sha256-name, plus names derived from those (see
gitrevisions(7)).
The sha1-name of an object is the SHA-1 of the concatenation of its
type, length, a nul byte, and the object's sha1-content. This is the
traditional <sha1> used in Git to name objects.
The sha256-name of an object is the SHA-256 of the concatenation of
its type, length, a nul byte, and the object's sha256-content.
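Both naming rules can be illustrated with a short Python sketch (an
illustration only, not how Git itself computes names; real Git hashes
streamed content in C):

```python
import hashlib

def object_name(obj_type: str, content: bytes, algo: str) -> str:
    # An object's name is the hash of: type, SP, decimal length, NUL,
    # then the object's content in the corresponding representation.
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.new(algo, header + content).hexdigest()

blob = b"hello\n"
sha1_name = object_name("blob", blob, "sha1")      # 40 hex digits
sha256_name = object_name("blob", blob, "sha256")  # 64 hex digits
```

For a blob the sha1-content and sha256-content are identical, so only
the hash function differs; for trees, commits, and tags the hashed
content itself differs, as described in the next section.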
Object format
~~~~~~~~~~~~~
Objects are stored using a compressed representation of their
sha256-content. The sha256-content of an object is the same as its
sha1-content, except that:
* objects referenced by the object are named using their sha256-names
instead of sha1-names
* signed tags, commits, and merges of signed tags get some additional
fields (see below)
The format allows round-trip conversion between sha256-content and
sha1-content.
Loose objects use zlib compression and packed objects use the packed
format described in Documentation/technical/pack-format.txt, just like
today.
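A minimal sketch of the loose-object storage just described (helper
names are hypothetical; error handling omitted):

```python
import hashlib
import zlib

def write_loose(obj_type: str, content: bytes) -> tuple[str, bytes]:
    # A loose object stores zlib-compressed "<type> <len>\0<content>";
    # its sha256-name is the SHA-256 of the same uncompressed bytes.
    store = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    return hashlib.sha256(store).hexdigest(), zlib.compress(store)

def read_loose(compressed: bytes) -> tuple[str, bytes]:
    # Split the header back off to recover the type and content.
    store = zlib.decompress(compressed)
    header, content = store.split(b"\x00", 1)
    return header.split(b" ")[0].decode(), content
```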
Translation table
~~~~~~~~~~~~~~~~~
A fast bidirectional mapping between sha1-names and sha256-names of
all local objects in the repository is kept on disk. The exact format
of that mapping is to be determined.
All operations that make new objects (e.g., "git commit") add the new
objects to the translation table.
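The on-disk format is left open above, but the operations the mapping
must support can be modeled with a toy in-memory stand-in (hypothetical
class, not a proposed implementation):

```python
class TranslationTable:
    """Toy in-memory model of the sha1 <-> sha256 name mapping."""

    def __init__(self) -> None:
        self._to_sha256: dict[str, str] = {}
        self._to_sha1: dict[str, str] = {}

    def add(self, sha1_name: str, sha256_name: str) -> None:
        # Every newly created or converted object gets one entry.
        self._to_sha256[sha1_name] = sha256_name
        self._to_sha1[sha256_name] = sha1_name

    def to_sha256(self, sha1_name: str) -> str:
        return self._to_sha256[sha1_name]

    def to_sha1(self, sha256_name: str) -> str:
        return self._to_sha1[sha256_name]
```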
Reading an object's sha1-content
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The sha1-content of an object can be read by converting all
sha256-names its sha256-content references to sha1-names using the
translation table. There is an additional minor transformation needed
for signed tags, commits, and merges (see below).
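A crude sketch of that conversion (a real implementation would rewrite
fields according to each object type's format rather than
pattern-match; `to_sha1` stands in for a translation-table lookup):

```python
import re

def to_sha1_content(sha256_content: bytes, to_sha1) -> bytes:
    # Replace each 64-hex-digit sha256-name with the mapped
    # 40-hex-digit sha1-name.
    return re.sub(
        rb"[0-9a-f]{64}",
        lambda m: to_sha1(m.group().decode()).encode(),
        sha256_content,
    )
```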
Fetch
~~~~~
Fetching from a SHA-1 based server requires translating between SHA-1
and SHA-256 based representations on the fly.
SHA-1s named in the ref advertisement can be translated to SHA-256 and
looked up as local objects using the translation table.
Negotiation proceeds as today. Any "have"s or "want"s generated
locally are converted to SHA-1 before being sent to the server, and
SHA-1s mentioned by the server are converted to SHA-256 when looking
them up locally.
After negotiation, the server sends a packfile containing the
requested objects. We convert the packfile to SHA-256 format using the
following steps:
1. index-pack: inflate each object in the packfile and compute its
SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
objects the client has locally. These objects can be looked up using
the translation table and their sha1-content read as described above
to resolve the deltas.
2. topological sort: starting at the "want"s from the negotiation
phase, walk through objects in the pack and emit a list of them in
topologically sorted order. (This list only contains objects
reachable from the "wants". If the pack from the server contained
additional extraneous objects, then they will be discarded.)
3. convert to sha256: open a new (sha256) packfile. Read the
topologically sorted list just generated in reverse order. For each
object, inflate its sha1-content, convert to sha256-content, and
write it to the sha256 pack. Write an idx file for this pack and
include the new sha1<->sha256 mapping entry in the translation
table.
4. clean up: remove the SHA-1 based pack file obtained from the
server, the index from step 1, and the topologically sorted list from
step 2.
Step 3 requires every object referenced by the new object to be in the
translation table. This is why the topological sort step is necessary.
As an optimization, step 1 can write a file describing what objects
each object it has inflated from the packfile references. This makes
the topological sort in step 2 possible without inflating the objects
in the packfile for a second time. The objects need to be inflated
again in step 3, for a total of two inflations.
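Step 2's ordering requirement --- each object appears before everything
it references, so that reading the list in reverse converts referenced
objects first --- can be sketched as a post-order walk (hypothetical
names; `refs_of` stands in for reading an object's references from the
file written in step 1):

```python
def topo_sort(wants, refs_of):
    # Depth-first, post-order walk from the "want"s: each object is
    # emitted after everything it references, then the list is
    # reversed so the wants come first. Step 3 reads it in reverse,
    # converting referenced objects before the objects that name them.
    seen, order = set(), []

    def visit(obj):
        if obj in seen:
            return
        seen.add(obj)
        for referenced in refs_of(obj):
            visit(referenced)
        order.append(obj)

    for want in wants:
        visit(want)
    return list(reversed(order))
```

A production version would need an iterative walk rather than
recursion to cope with deep histories.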
Push
~~~~
Push is simpler than fetch because the objects referenced by the
pushed objects are already in the translation table. The sha1-content
of each object being pushed can be read as described in the "Reading
an object's sha1-content" section to generate the pack written by git
send-pack.
Signed Objects
~~~~~~~~~~~~~~
Commits
^^^^^^^
Commits currently have the following sequence of header lines:
"tree" SP object-name
("parent" SP object-name)*
"author" SP ident
"committer" SP ident
("mergetag" SP object-content)?
("gpgsig" SP pgp-signature)?
We introduce new header lines "hash" and "nohash" that come after the
"gpgsig" field. No "hash" or "nohash" lines may appear unless the
"gpgsig" field is present.
Hash lines have the form
"hash" SP hash-function SP field SP alternate-object-name
Nohash lines have the form
"nohash" SP hash-function
There are only two recognized values of hash-function: "sha1" and
"sha256". "git fsck" will tolerate values of hash-function it does not
recognize, as long as they do not come before either of those two. All
"nohash" lines come before all "hash" lines. Any "hash sha1" lines
must come before all "hash sha256" lines, and likewise for nohash. The
Git project determines any future supported hash-functions that can
come after those two and their order.
There can be at most one "nohash <hash-function>" line per hash
function, indicating that this hash function should not be used when
checking the commit's signature.
There is one "hash <hash-function>" line for each tree or parent field
in the commit object header. The hash lines record object names for
those trees and parents using the indicated hash function, to be used
when checking the commit's signature.
TODO: simplify signature rules, handle the mergetag field better.
sha256-content of signed commits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The sha256-content of a commit with a "gpgsig" header can include no
"hash" or "nohash" lines; a "nohash sha256" line together with "hash
sha1" lines; or just "hash sha1" lines.
Examples:
1. tree 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
parent e094bc809626f0a401a40d75c56df478e546902ff812772c4594265203b23980
parent 1059dab4748aa33b86dad5ca97357bd322abaa558921255623fbddd066bb3315
author A U Thor <author@example.com> 1465982009 +0000
committer C O Mitter <committer@example.com> 1465982009 +0000
gpgsig ...
2. tree 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
parent e094bc809626f0a401a40d75c56df478e546902ff812772c4594265203b23980
parent 1059dab4748aa33b86dad5ca97357bd322abaa558921255623fbddd066bb3315
author A U Thor <author@example.com> 1465982009 +0000
committer C O Mitter <committer@example.com> 1465982009 +0000
gpgsig ...
nohash sha256
hash sha1 tree c7b1cff039a93f3600a1d18b82d26688668c7dea
hash sha1 parent c33429be94b5f2d3ee9b0adad223f877f174b05d
hash sha1 parent 04b871796dc0420f8e7561a895b52484b701d51a
3. tree 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
parent e094bc809626f0a401a40d75c56df478e546902ff812772c4594265203b23980
parent 1059dab4748aa33b86dad5ca97357bd322abaa558921255623fbddd066bb3315
author A U Thor <author@example.com> 1465982009 +0000
committer C O Mitter <committer@example.com> 1465982009 +0000
gpgsig ...
hash sha1 tree c7b1cff039a93f3600a1d18b82d26688668c7dea
hash sha1 parent c33429be94b5f2d3ee9b0adad223f877f174b05d
hash sha1 parent 04b871796dc0420f8e7561a895b52484b701d51a
sha1-content of signed commits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The sha1-content of a commit with a "gpgsig" header can contain a
"nohash sha1" line together with "hash sha256" lines; no "hash" or
"nohash" lines; or just "hash sha256" lines.
Examples:
1. tree c7b1cff039a93f3600a1d18b82d26688668c7dea
parent c33429be94b5f2d3ee9b0adad223f877f174b05d
parent 04b871796dc0420f8e7561a895b52484b701d51a
author A U Thor <author@example.com> 1465982009 +0000
committer C O Mitter <committer@example.com> 1465982009 +0000
gpgsig ...
nohash sha1
hash sha256 tree 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
hash sha256 parent e094bc809626f0a401a40d75c56df478e546902ff812772c4594265203b23980
hash sha256 parent 1059dab4748aa33b86dad5ca97357bd322abaa558921255623fbddd066bb3315
2. tree c7b1cff039a93f3600a1d18b82d26688668c7dea
parent c33429be94b5f2d3ee9b0adad223f877f174b05d
parent 04b871796dc0420f8e7561a895b52484b701d51a
author A U Thor <author@example.com> 1465982009 +0000
committer C O Mitter <committer@example.com> 1465982009 +0000
gpgsig ...
3. tree c7b1cff039a93f3600a1d18b82d26688668c7dea
parent c33429be94b5f2d3ee9b0adad223f877f174b05d
parent 04b871796dc0420f8e7561a895b52484b701d51a
author A U Thor <author@example.com> 1465982009 +0000
committer C O Mitter <committer@example.com> 1465982009 +0000
gpgsig ...
hash sha256 tree 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
hash sha256 parent e094bc809626f0a401a40d75c56df478e546902ff812772c4594265203b23980
hash sha256 parent 1059dab4748aa33b86dad5ca97357bd322abaa558921255623fbddd066bb3315
Converting signed commits
^^^^^^^^^^^^^^^^^^^^^^^^^
To convert the sha1-content of a signed commit to its sha256-content:
1. Change "tree" and "parent" lines to use the sha256-names of
referenced objects, as with unsigned commits.
2. If there is a "mergetag" field, convert it from sha1-content to
sha256-content, as with unsigned commits with a mergetag (see the
"Mergetag" section below).
3. Unless there is a "nohash sha1" line, add a full set of "hash sha1
<field> <sha1>" lines indicating the sha1-names of the tree and
parents.
4. Remove any "hash sha256 <field> <sha256>" lines. If no such lines
were present, add a "nohash sha256" line.
Converting the sha256-content of a signed commit to sha1-content uses
the same process with sha1 and sha256 switched.
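Steps 3 and 4 can be sketched for the header lines of a commit being
converted from sha1-content to sha256-content (hypothetical helper; it
only computes the new "nohash"/"hash" lines, which the caller would
splice in after the "gpgsig" field):

```python
def compat_lines(sha1_header_lines, tree_sha1, parent_sha1s):
    # Steps 3-4: build the "nohash"/"hash" lines of the
    # sha256-content from the sha1-content header lines and the
    # sha1-names of the commit's tree and parents.
    out = []
    if "nohash sha1" not in sha1_header_lines:        # step 3
        out.append(f"hash sha1 tree {tree_sha1}")
        out.extend(f"hash sha1 parent {p}" for p in parent_sha1s)
    if not any(l.startswith("hash sha256 ") for l in sha1_header_lines):
        out.insert(0, "nohash sha256")                # step 4
    return out
```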
Verifying signed commit signatures
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If the commit has a "hash sha1" line (or is sha1-content without a
"nohash sha1" line): check that the signature matches the sha1-content
with gpgsig field stripped out.
Otherwise: check that the signature matches the sha1-content with
gpgsig, nohash, tree, and parents fields stripped out.
With the examples above, the signed payloads are
1. author A U Thor <author@example.com> 1465982009 +0000
committer C O Mitter <committer@example.com> 1465982009 +0000
hash sha256 tree 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
hash sha256 parent e094bc809626f0a401a40d75c56df478e546902ff812772c4594265203b23980
hash sha256 parent 1059dab4748aa33b86dad5ca97357bd322abaa558921255623fbddd066bb3315
2. tree c7b1cff039a93f3600a1d18b82d26688668c7dea
parent c33429be94b5f2d3ee9b0adad223f877f174b05d
parent 04b871796dc0420f8e7561a895b52484b701d51a
author A U Thor <author@example.com> 1465982009 +0000
committer C O Mitter <committer@example.com> 1465982009 +0000
3. tree c7b1cff039a93f3600a1d18b82d26688668c7dea
parent c33429be94b5f2d3ee9b0adad223f877f174b05d
parent 04b871796dc0420f8e7561a895b52484b701d51a
author A U Thor <author@example.com> 1465982009 +0000
committer C O Mitter <committer@example.com> 1465982009 +0000
hash sha256 tree 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
hash sha256 parent e094bc809626f0a401a40d75c56df478e546902ff812772c4594265203b23980
hash sha256 parent 1059dab4748aa33b86dad5ca97357bd322abaa558921255623fbddd066bb3315
Current versions of "git verify-commit" can verify examples (2) and (3)
(but not (1)).
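The two verification rules amount to choosing which fields to strip
before checking the signature. A rough sketch (simplified: it treats
"gpgsig" as a single line, whereas real signatures continue across
indented lines):

```python
def signed_payload(sha1_content: str) -> str:
    # If a "nohash sha1" line is present (and no "hash sha1" line),
    # the sha1 names themselves are not covered by the signature, so
    # tree/parent/nohash lines are stripped along with gpgsig.
    lines = sha1_content.splitlines()
    strip_names = ("nohash sha1" in lines
                   and not any(l.startswith("hash sha1 ") for l in lines))

    def keep(line: str) -> bool:
        if line.startswith("gpgsig"):
            return False
        if strip_names and line.startswith(("tree ", "parent ", "nohash")):
            return False
        return True

    return "".join(l + "\n" for l in lines if keep(l))
```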
Tags
~~~~
Tags currently have the following sequence of header lines:
"object" SP object-name
"type" SP type
"tag" SP identifier
"tagger" SP ident
A tag's signature, if it exists, is in the message body.
We introduce new header lines "nohash" and "hash" that come after the
"tagger" field. No "nohash" or "hash" lines may appear unless the
message body contains a PGP signature.
As with commits, "nohash" lines have the form "nohash
<hash-function>", indicating that this hash function should not be
used when checking the tag's signature.
"hash" lines have the form
"hash" SP hash-function SP alternate-object-name
This records the pointed-to object name using the indicated hash
function, to be used when checking the tag's signature.
As with commits, "sha1" and "sha256" are the only permitted values of
hash-function and can only appear in that order for a field when they
appear. There can be at most one "nohash" line, and it comes before
any "hash" lines. There can be only one "hash" line for a given hash
function.
sha256-content of signed tags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The sha256-content of a signed tag can include no "hash" or "nohash"
lines; a "nohash sha256" line together with a "hash sha1 <sha1>" line;
or just a "hash sha1 <sha1>" line.
Examples:
1. object 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
type tree
tag v1.0
tagger C O Mitter <committer@example.com> 1465981006 +0000
Tag Demo v1.0
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
...
2. object 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
type tree
tag v1.0
tagger C O Mitter <committer@example.com> 1465981006 +0000
nohash sha256
hash sha1 c7b1cff039a93f3600a1d18b82d26688668c7dea
Tag Demo v1.0
-----BEGIN PGP SIGNATURE-----
...
3. object 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
type tree
tag v1.0
tagger C O Mitter <committer@example.com> 1465981006 +0000
hash sha1 c7b1cff039a93f3600a1d18b82d26688668c7dea
Tag Demo v1.0
...
sha1-content of signed tags
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The sha1-content of a signed tag can include a "nohash sha1" line
together with a "hash sha256 <sha256>" line; no "nohash" or "hash"
lines; or just a "hash sha256 <sha256>" line.
Examples:
1. object c7b1cff039a93f3600a1d18b82d26688668c7dea
...
tagger C O Mitter <committer@example.com> 1465981006 +0000
nohash sha1
hash sha256 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
Tag Demo v1.0
-----BEGIN PGP SIGNATURE-----
...
2. object c7b1cff039a93f3600a1d18b82d26688668c7dea
...
tagger C O Mitter <committer@example.com> 1465981006 +0000
Tag Demo v1.0
-----BEGIN PGP SIGNATURE-----
...
3. object c7b1cff039a93f3600a1d18b82d26688668c7dea
...
tagger C O Mitter <committer@example.com> 1465981006 +0000
hash sha256 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
Tag Demo v1.0
-----BEGIN PGP SIGNATURE-----
...
Signed tags can be converted between sha1-content and sha256-content
using the same process as signed commits.
Verifying signed tags
^^^^^^^^^^^^^^^^^^^^^
As with commits, if the tag has a "hash sha1" line (or is sha1-content
without a "nohash sha1" line): check that the signature matches the
sha1-content with the PGP signature stripped out.
Otherwise: check that the signature matches the sha1-content with
nohash and object fields and PGP signature stripped out.
Mergetag signatures
~~~~~~~~~~~~~~~~~~~
The mergetag field in the sha1-content of a commit contains the
sha1-content of a tag that was merged by that commit.
The mergetag field in the sha256-content of the same commit contains
the sha256-content of the same tag.
Submodules
~~~~~~~~~~
To convert recorded submodule pointers, you need to have the converted
submodule repository in place. The bidirectional mapping of the
submodule can be used to look up the new hash.
Caveats
-------
Shallow clone and submodules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Because conversion requires all referenced objects to be available in
the locally generated translation table, this design does not support
shallow clones or unfetched submodules.
Protocol improvements might allow lifting this restriction.
Alternatives considered
-----------------------
Upgrading everyone working on a particular project on a flag day
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Projects like the Linux kernel are large and complex enough that
flipping the switch for the whole project and everything based on its
repository at once is infeasible.
Not only would all developers and server operators supporting
developers have to switch on the same flag day, but supporting tooling
(continuous integration, code review, bug trackers, etc) would have to
be adapted as well. This also makes it difficult to get early feedback
from some project participants testing before it is time for mass
adoption.
Using hash functions in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(e.g. https://public-inbox.org/git/22708.8913.864049.452252@chiark.greenend.org.uk/ )
Objects newly created would be addressed by the new hash, but inside
such an object (e.g. a commit) it would still be possible to address
objects using the old hash function. This approach has downsides:
* You cannot trust its history (needed for bisectability) in the
future without further work
* Maintenance burden as the number of supported hash functions grows
(they will never go away, so they accumulate). In this proposal, by
comparison, converted objects lose all references to SHA-1 except
where needed to verify signatures.
* Re: RFC: Another proposed hash function transition plan
2017-03-04 1:12 RFC: Another proposed hash function transition plan Jonathan Nieder
@ 2017-03-05 2:35 ` Linus Torvalds
2017-03-06 0:26 ` brian m. carlson
0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2017-03-05 2:35 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Git Mailing List, Stefan Beller, bmwill, jonathantanmy, Jeff King
On Fri, Mar 3, 2017 at 5:12 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> This document is still in flux but I thought it best to send it out
> early to start getting feedback.
This actually looks very reasonable if you can implement it cleanly
enough. In many ways the "convert entirely to a new 256-bit hash" is
the cleanest model, and interoperability was at least my personal
concern. Maybe your model solves it (devil in the details), in which
case I really like it.
I do think that if you end up essentially converting the objects
without really having any true backwards compatibility at the object
layer (just the translation code), you should seriously look at doing
some other changes at the same time. Like not using zlib compression,
it really is very slow.
Btw, I do think the particular choice of hash should still be on the
table. sha-256 may be the obvious first choice, but there are
definitely a few reasons to consider alternatives, especially if it's
a complete switch-over like this.
One is large-file behavior - a parallel (or tree) mode could improve
on that noticeably. BLAKE2 does have special support for that, for
example. And SHA-256 does have known attacks compared to SHA-3-256 or
BLAKE2 - whether that is due to age or due to more effort, I can't
really judge. But if we're switching away from SHA1 due to known
attacks, it does feel like we should be careful.
Linus
* Re: RFC: Another proposed hash function transition plan
2017-03-05 2:35 ` Linus Torvalds
@ 2017-03-06 0:26 ` brian m. carlson
2017-03-06 18:24 ` Brandon Williams
0 siblings, 1 reply; 23+ messages in thread
From: brian m. carlson @ 2017-03-06 0:26 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jonathan Nieder, Git Mailing List, Stefan Beller, bmwill,
jonathantanmy, Jeff King
On Sat, Mar 04, 2017 at 06:35:38PM -0800, Linus Torvalds wrote:
> On Fri, Mar 3, 2017 at 5:12 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> >
> > This document is still in flux but I thought it best to send it out
> > early to start getting feedback.
>
> This actually looks very reasonable if you can implement it cleanly
> enough. In many ways the "convert entirely to a new 256-bit hash" is
> the cleanest model, and interoperability was at least my personal
> concern. Maybe your model solves it (devil in the details), in which
> case I really like it.
If you think you can do it, I'm all for it.
> Btw, I do think the particular choice of hash should still be on the
> table. sha-256 may be the obvious first choice, but there are
> definitely a few reasons to consider alternatives, especially if it's
> a complete switch-over like this.
>
> One is large-file behavior - a parallel (or tree) mode could improve
> on that noticeably. BLAKE2 does have special support for that, for
> example. And SHA-256 does have known attacks compared to SHA-3-256 or
> BLAKE2 - whether that is due to age or due to more effort, I can't
> really judge. But if we're switching away from SHA1 due to known
> attacks, it does feel like we should be careful.
I agree with Linus on this. SHA-256 is the slowest option, and it's the
one with the most advanced cryptanalysis. SHA-3-256 is faster on 64-bit
machines (which, as we've seen on the list, is the overwhelming majority
of machines using Git), and even BLAKE2b-256 is stronger.
Doing this all over again in another couple years should also be a
non-goal.
--
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
* Re: RFC: Another proposed hash function transition plan
2017-03-06 0:26 ` brian m. carlson
@ 2017-03-06 18:24 ` Brandon Williams
2017-06-15 10:30 ` Which hash function to use, was " Johannes Schindelin
0 siblings, 1 reply; 23+ messages in thread
From: Brandon Williams @ 2017-03-06 18:24 UTC (permalink / raw)
To: brian m. carlson, Linus Torvalds, Jonathan Nieder,
Git Mailing List, Stefan Beller, jonathantanmy, Jeff King
On 03/06, brian m. carlson wrote:
> On Sat, Mar 04, 2017 at 06:35:38PM -0800, Linus Torvalds wrote:
> > On Fri, Mar 3, 2017 at 5:12 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> > >
> > > This document is still in flux but I thought it best to send it out
> > > early to start getting feedback.
> >
> > This actually looks very reasonable if you can implement it cleanly
> > enough. In many ways the "convert entirely to a new 256-bit hash" is
> > the cleanest model, and interoperability was at least my personal
> > concern. Maybe your model solves it (devil in the details), in which
> > case I really like it.
>
> If you think you can do it, I'm all for it.
>
> > Btw, I do think the particular choice of hash should still be on the
> > table. sha-256 may be the obvious first choice, but there are
> > definitely a few reasons to consider alternatives, especially if it's
> > a complete switch-over like this.
> >
> > One is large-file behavior - a parallel (or tree) mode could improve
> > on that noticeably. BLAKE2 does have special support for that, for
> > example. And SHA-256 does have known attacks compared to SHA-3-256 or
> > BLAKE2 - whether that is due to age or due to more effort, I can't
> > really judge. But if we're switching away from SHA1 due to known
> > attacks, it does feel like we should be careful.
>
> I agree with Linus on this. SHA-256 is the slowest option, and it's the
> one with the most advanced cryptanalysis. SHA-3-256 is faster on 64-bit
> machines (which, as we've seen on the list, is the overwhelming majority
> of machines using Git), and even BLAKE2b-256 is stronger.
>
> Doing this all over again in another couple years should also be a
> non-goal.
I agree that when we decide to move to a new algorithm we should
select one which we plan on using for as long as possible (much longer
than a couple years). While writing the document we simply used
"sha256" because it was more tangible and easier to reference.
> --
> brian m. carlson / brian with sandals: Houston, Texas, US
> +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: https://keybase.io/bk2204
--
Brandon Williams
* Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-03-06 18:24 ` Brandon Williams
@ 2017-06-15 10:30 ` Johannes Schindelin
2017-06-15 11:05 ` Mike Hommey
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Johannes Schindelin @ 2017-06-15 10:30 UTC (permalink / raw)
To: Brandon Williams
Cc: brian m. carlson, Linus Torvalds, Jonathan Nieder,
Git Mailing List, Stefan Beller, jonathantanmy, Jeff King,
Junio Hamano
Hi,
I thought it better to revive this old thread rather than start a new
thread, so as to automatically reach everybody who chimed in originally.
On Mon, 6 Mar 2017, Brandon Williams wrote:
> On 03/06, brian m. carlson wrote:
>
> > On Sat, Mar 04, 2017 at 06:35:38PM -0800, Linus Torvalds wrote:
> >
> > > Btw, I do think the particular choice of hash should still be on the
> > > table. sha-256 may be the obvious first choice, but there are
> > > definitely a few reasons to consider alternatives, especially if
> > > it's a complete switch-over like this.
> > >
> > > One is large-file behavior - a parallel (or tree) mode could improve
> > > on that noticeably. BLAKE2 does have special support for that, for
> > > example. And SHA-256 does have known attacks compared to SHA-3-256
> > > or BLAKE2 - whether that is due to age or due to more effort, I
> > > can't really judge. But if we're switching away from SHA1 due to
> > > known attacks, it does feel like we should be careful.
> >
> > I agree with Linus on this. SHA-256 is the slowest option, and it's
> > the one with the most advanced cryptanalysis. SHA-3-256 is faster on
> > 64-bit machines (which, as we've seen on the list, is the overwhelming
> > majority of machines using Git), and even BLAKE2b-256 is stronger.
> >
> > Doing this all over again in another couple years should also be a
> > non-goal.
>
> I agree that when we decide to move to a new algorithm that we should
> select one which we plan on using for as long as possible (much longer
> than a couple years). While writing the document we simply used
> "sha256" because it was more tangible and easier to reference.
The SHA-1 transition *requires* a knob telling Git that the current
repository uses a hash function different from SHA-1.
It would make *a whole lot of sense* to make that knob *not* Boolean,
but to specify *which* hash function is in use.
That way, it will be easier to switch another time when it becomes
necessary.
And it will also make it easier for interested parties to use a different
hash function in their infrastructure if they want.
And it lifts part of that burden that we have to consider *very carefully*
which function to pick. We still should be more careful than in 2005, when
Git was born, and when, incidentally, the first attacks on SHA-1
became known. We were just lucky for almost 12 years.
Now, with Dunning-Kruger in mind, I feel that my degree in mathematics
equips me with *just enough* competence to know just how little *even I*
know about cryptography.
The smart thing to do, therefore, was to get involved in this discussion
and act as Lt Tawney Madison between us Git developers and experts in
cryptography.
It just so happens that I work at a company with access to excellent
cryptographers, and as we own the largest Git repository on the planet, we
have a vested interest in ensuring Git's continued success.
After a couple of conversations with a couple of experts who I cannot
thank enough for their time and patience, let alone their knowledge about
this matter, it would appear that we may not have had a complete enough
picture yet to even start to make the decision on the hash function to
use.
From what I read, pretty much everybody who participated in the discussion
was aware that the essential question is: performance vs security.
It turns out that we can have essentially both.
SHA-256 is most likely the best-studied hash function we currently know
about (*maybe* SHA3-256 has been studied slightly more, but only
slightly). All the experts in the field banged on it with multiple sticks
and other weapons. And so far, they only found one weakness that does not
even apply to Git's usage [*1*]. For cryptography experts, this is the
ultimate measure of security: if something has been attacked that
intensely, by that many experts, for that long, with that little effect,
it is the best we have at this time.
And since SHA-256 has become the standard, and more importantly: since
SHA-256 was explicitly designed to allow for relatively inexpensive
hardware acceleration, this is what we will soon have: hardware support in
the form of, say, special CPU instructions. (That is what I meant by: we
can have performance *and* security.)
This is a rather important point to stress, by the way: BLAKE's design is
apparently *not* friendly to CPU instruction implementations. Meaning that
SHA-256 will be faster than BLAKE (and even than BLAKE2) once the Intel
and AMD CPUs with hardware support for SHA-256 become common.
I also heard something really worrisome about BLAKE2 that makes me want to
stay away from it (in addition to the difficulty it poses for hardware
acceleration): to compete in the SHA-3 contest, BLAKE added complexity so
that it would be roughly on par with its competitors. To allow for faster
execution in software, this complexity was *removed* from BLAKE to create
BLAKE2, making it weaker than SHA-256.
Another important point to consider is that SHA-256 implementations are
everywhere. Think, for example, how difficult we would make things for,
say, JGit or go-git if we chose a less common hash function.
As to KangarooTwelve: it has seen substantially less cryptanalysis than
SHA-256 and SHA3-256. That does not necessarily mean that it is weaker,
but it means that we simply cannot know whether it is as strong. On that
basis alone, I would already reject it, and then there are far fewer
implementations, too.
When it comes to choosing SHA-256 vs SHA3-256, I would like to point out
that hardware acceleration for SHA3-256 lies a lot farther in the future
than SHA-256 support. And according to the experts I asked, they are roughly equally
secure as far as Git's usage is concerned, even if the SHA-3 contest
provided SHA3-256 with even fiercer cryptanalysis than SHA-256.
In short: my takeaway from the conversations with cryptography experts was
that SHA-256 would be the best choice for now, and that we should make
sure that the next switch is not as painful as this one (read: we should
not repeat the mistake of hard-coding the new hash function into Git as
much as we hard-coded SHA-1 into it).
Ciao,
Johannes
Footnote *1*: SHA-256, like all hash functions whose output is essentially
the entire internal state, is susceptible to a so-called "length
extension attack", where the hash of a secret+message can be used to
generate the hash of secret+message+piggyback without knowing the secret.
This is not the case for Git: only visible data are hashed. The type of
attacks Git has to worry about is very different from the length extension
attacks, and it is highly unlikely that that weakness of SHA-256 leads to,
say, a collision attack.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 10:30 ` Which hash function to use, was " Johannes Schindelin
@ 2017-06-15 11:05 ` Mike Hommey
2017-06-15 13:01 ` Jeff King
2017-06-15 17:36 ` Brandon Williams
2017-06-15 19:13 ` Jonathan Nieder
2 siblings, 1 reply; 23+ messages in thread
From: Mike Hommey @ 2017-06-15 11:05 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Brandon Williams, brian m. carlson, Linus Torvalds,
Jonathan Nieder, Git Mailing List, Stefan Beller, jonathantanmy,
Jeff King, Junio Hamano
On Thu, Jun 15, 2017 at 12:30:46PM +0200, Johannes Schindelin wrote:
> Footnote *1*: SHA-256, as all hash functions whose output is essentially
> the entire internal state, are susceptible to a so-called "length
> extension attack", where the hash of a secret+message can be used to
> generate the hash of secret+message+piggyback without knowing the secret.
> This is not the case for Git: only visible data are hashed. The type of
> attacks Git has to worry about is very different from the length extension
> attacks, and it is highly unlikely that that weakness of SHA-256 leads to,
> say, a collision attack.
What do the experts think of SHA512/256, which completely removes the
concerns over length extension attacks? (Which I'd argue is better than
sweeping them under the carpet.)
Mike
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 11:05 ` Mike Hommey
@ 2017-06-15 13:01 ` Jeff King
2017-06-15 16:30 ` Ævar Arnfjörð Bjarmason
2017-06-15 21:10 ` Mike Hommey
0 siblings, 2 replies; 23+ messages in thread
From: Jeff King @ 2017-06-15 13:01 UTC (permalink / raw)
To: Mike Hommey
Cc: Johannes Schindelin, Brandon Williams, brian m. carlson,
Linus Torvalds, Jonathan Nieder, Git Mailing List, Stefan Beller,
jonathantanmy, Junio Hamano
On Thu, Jun 15, 2017 at 08:05:18PM +0900, Mike Hommey wrote:
> On Thu, Jun 15, 2017 at 12:30:46PM +0200, Johannes Schindelin wrote:
> > Footnote *1*: SHA-256, as all hash functions whose output is essentially
> > the entire internal state, are susceptible to a so-called "length
> > extension attack", where the hash of a secret+message can be used to
> > generate the hash of secret+message+piggyback without knowing the secret.
> > This is not the case for Git: only visible data are hashed. The type of
> > attacks Git has to worry about is very different from the length extension
> > attacks, and it is highly unlikely that that weakness of SHA-256 leads to,
> > say, a collision attack.
>
> What do the experts think or SHA512/256, which completely removes the
> concerns over length extension attack? (which I'd argue is better than
> sweeping them under the carpet)
I don't think it's sweeping them under the carpet. Git does not use the
hash as a MAC, so length extension attacks aren't a thing (and even if
we later wanted to use the same algorithm as a MAC, the HMAC
construction is a well-studied technique for dealing with it).
That said, SHA-512 is typically a little faster than SHA-256 on 64-bit
platforms. I don't know if that will change with the advent of hardware
instructions oriented towards SHA-256.
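The HMAC point above can be illustrated with a short sketch (using Python's standard `hashlib` and `hmac` modules; the key and message are arbitrary examples): a naive `H(secret || message)` tag built from a Merkle-Damgard hash like SHA-256 is what length extension targets, while the HMAC construction nests two hash invocations and is the standard, well-studied fix.

```python
import hashlib
import hmac

secret = b"example-key"
message = b"example-message"

# Naive keyed hash: H(secret || message). For Merkle-Damgard hashes such
# as SHA-256 this construction is what length extension attacks target.
# Git never does this, since it hashes no secrets at all.
naive_tag = hashlib.sha256(secret + message).hexdigest()

# HMAC (RFC 2104): the nested construction that avoids length extension,
# should the same hash ever be needed as a MAC.
hmac_tag = hmac.new(secret, message, hashlib.sha256).hexdigest()

print(naive_tag)
print(hmac_tag)
```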
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 13:01 ` Jeff King
@ 2017-06-15 16:30 ` Ævar Arnfjörð Bjarmason
2017-06-15 19:34 ` Johannes Schindelin
2017-06-15 21:10 ` Mike Hommey
1 sibling, 1 reply; 23+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-15 16:30 UTC (permalink / raw)
To: Jeff King
Cc: Mike Hommey, Johannes Schindelin, Brandon Williams,
brian m. carlson, Linus Torvalds, Jonathan Nieder,
Git Mailing List, Stefan Beller, jonathantanmy, Junio Hamano
On Thu, Jun 15 2017, Jeff King jotted:
> On Thu, Jun 15, 2017 at 08:05:18PM +0900, Mike Hommey wrote:
>
>> On Thu, Jun 15, 2017 at 12:30:46PM +0200, Johannes Schindelin wrote:
>> > Footnote *1*: SHA-256, as all hash functions whose output is essentially
>> > the entire internal state, are susceptible to a so-called "length
>> > extension attack", where the hash of a secret+message can be used to
>> > generate the hash of secret+message+piggyback without knowing the secret.
>> > This is not the case for Git: only visible data are hashed. The type of
>> > attacks Git has to worry about is very different from the length extension
>> > attacks, and it is highly unlikely that that weakness of SHA-256 leads to,
>> > say, a collision attack.
>>
>> What do the experts think or SHA512/256, which completely removes the
>> concerns over length extension attack? (which I'd argue is better than
>> sweeping them under the carpet)
>
> I don't think it's sweeping them under the carpet. Git does not use the
> hash as a MAC, so length extension attacks aren't a thing (and even if
> we later wanted to use the same algorithm as a MAC, the HMAC
> construction is a well-studied technique for dealing with it).
>
> That said, SHA-512 is typically a little faster than SHA-256 on 64-bit
> platforms. I don't know if that will change with the advent of hardware
> instructions oriented towards SHA-256.
Quoting my own
CACBZZX7JRA2niwt9wsGAxnzS+gWS8hTUgzWm8NaY1gs87o8xVQ@mail.gmail.com sent
~2 weeks ago to the list:
On Fri, Jun 2, 2017 at 7:54 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
[...]
> 4. When choosing a hash function, people may argue about performance.
> It would be useful for run some benchmarks for git (running
> the test suite, t/perf tests, etc) using a variety of hash
> functions as input to such a discussion.
To the extent that such benchmarks matter, it seems prudent to heavily
weigh them in favor of whatever seems to be likely to be the more
common hash function going forward, since those are likely to get
faster through future hardware acceleration.
E.g. Intel announced Goldmont last year which according to one SHA-1
implementation improved from 9.5 cycles per byte to 2.7 cpb[1]. They
only have acceleration for SHA-1 and SHA-256[2]
1. https://github.com/weidai11/cryptopp/issues/139#issuecomment-264283385
2. https://en.wikipedia.org/wiki/Goldmont
Maybe someone else knows of better numbers / benchmarks, but such a
reduction in cpb likely makes it faster than SHA-512.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 16:30 ` Ævar Arnfjörð Bjarmason
@ 2017-06-15 19:34 ` Johannes Schindelin
2017-06-15 21:59 ` Adam Langley
0 siblings, 1 reply; 23+ messages in thread
From: Johannes Schindelin @ 2017-06-15 19:34 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Jeff King, Mike Hommey, Brandon Williams, brian m. carlson,
Linus Torvalds, Jonathan Nieder, Git Mailing List, Stefan Beller,
jonathantanmy, Junio Hamano
[-- Attachment #1: Type: text/plain, Size: 4489 bytes --]
Hi,
On Thu, 15 Jun 2017, Ævar Arnfjörð Bjarmason wrote:
> On Thu, Jun 15 2017, Jeff King jotted:
>
> > On Thu, Jun 15, 2017 at 08:05:18PM +0900, Mike Hommey wrote:
> >
> >> On Thu, Jun 15, 2017 at 12:30:46PM +0200, Johannes Schindelin wrote:
> >>
> >> > Footnote *1*: SHA-256, as all hash functions whose output is
> >> > essentially the entire internal state, are susceptible to a
> >> > so-called "length extension attack", where the hash of a
> >> > secret+message can be used to generate the hash of
> >> > secret+message+piggyback without knowing the secret. This is not
> >> > the case for Git: only visible data are hashed. The type of attacks
> >> > Git has to worry about is very different from the length extension
> >> > attacks, and it is highly unlikely that that weakness of SHA-256
> >> > leads to, say, a collision attack.
> >>
> >> What do the experts think or SHA512/256, which completely removes the
> >> concerns over length extension attack? (which I'd argue is better than
> >> sweeping them under the carpet)
> >
> > I don't think it's sweeping them under the carpet. Git does not use the
> > hash as a MAC, so length extension attacks aren't a thing (and even if
> > we later wanted to use the same algorithm as a MAC, the HMAC
> > construction is a well-studied technique for dealing with it).
I really tried to drive that point home, as it had been made very clear to
me that the length extension attack is something that Git need not concern
itself with.
The length extension attack *only* comes into play when there are secrets
that are hashed. In that case, one would not want others to be able to
produce a valid hash *without* knowing the secrets. And SHA-256 allows one
to "reconstruct" the internal state (which is the hash value) in order to
continue at any point, i.e. if the hash for secret+message is known, it is
easy to calculate the hash for secret+message+addition, without knowing
the secret at all.
That is exactly *not* the case with Git. In Git, what we want to hash is
known in its entirety. If the hash value were not identical to the
internal state, it would be easy enough to reconstruct, because *there are
no secrets*.
So please understand that even the direction that the length extension
attack takes is completely different than the direction any attack would
have to take that weakens SHA-256 for Git's purposes. As far as Git's
usage is concerned, SHA-256 has no known weaknesses.
It is *really, really, really* important to understand this before going
on to suggest another hash function such as SHA-512/256 (i.e. SHA-512
truncated to 256 bits), based only on that perceived weakness of SHA-256.
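The "no secrets" argument is easy to see from how Git actually names objects: everything that goes into the hash is public content plus a public header. A minimal sketch (still using SHA-1, as current Git does):

```python
import hashlib

# Git names a blob by hashing a header plus the complete, known content:
#   sha1("blob <size-in-bytes>\0<content>")
# Every byte hashed is visible data, so there is no secret prefix whose
# internal hash state an attacker could usefully extend.
content = b"hello\n"
header = b"blob %d\x00" % len(content)
oid = hashlib.sha1(header + content).hexdigest()
print(oid)  # matches `echo hello | git hash-object --stdin`
```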
> > That said, SHA-512 is typically a little faster than SHA-256 on 64-bit
> > platforms. I don't know if that will change with the advent of
> > hardware instructions oriented towards SHA-256.
>
> Quoting my own
> CACBZZX7JRA2niwt9wsGAxnzS+gWS8hTUgzWm8NaY1gs87o8xVQ@mail.gmail.com sent
> ~2 weeks ago to the list:
>
> On Fri, Jun 2, 2017 at 7:54 PM, Jonathan Nieder <jrnieder@gmail.com>
> wrote:
> [...]
> > 4. When choosing a hash function, people may argue about performance.
> > It would be useful for run some benchmarks for git (running
> > the test suite, t/perf tests, etc) using a variety of hash
> > functions as input to such a discussion.
>
> To the extent that such benchmarks matter, it seems prudent to heavily
> weigh them in favor of whatever seems to be likely to be the more
> common hash function going forward, since those are likely to get
> faster through future hardware acceleration.
>
> E.g. Intel announced Goldmont last year which according to one SHA-1
> implementation improved from 9.5 cycles per byte to 2.7 cpb[1]. They
> only have acceleration for SHA-1 and SHA-256[2]
>
> 1. https://github.com/weidai11/cryptopp/issues/139#issuecomment-264283385
>
> 2. https://en.wikipedia.org/wiki/Goldmont
>
> Maybe someone else knows of better numbers / benchmarks, but such a
> reduction in CBP likely makes it faster than SHA-512.
Very, very likely faster than SHA-512.
I'd like to stress explicitly that the Intel SHA extensions do *not* cover
SHA-512:
https://en.wikipedia.org/wiki/Intel_SHA_extensions
In other words, once those extensions become commonplace, SHA-256 will be
faster than SHA-512, hands down.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 19:34 ` Johannes Schindelin
@ 2017-06-15 21:59 ` Adam Langley
2017-06-15 22:41 ` brian m. carlson
0 siblings, 1 reply; 23+ messages in thread
From: Adam Langley @ 2017-06-15 21:59 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Ævar Arnfjörð Bjarmason, Jeff King, Mike Hommey,
Brandon Williams, brian m. carlson, Linus Torvalds,
Jonathan Nieder, Git Mailing List, Stefan Beller, Jonathan Tan,
Junio Hamano
(I was asked to comment a few points in public by Jonathan.)
I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
K12, etc are all secure to the extent that I don't believe that making
comparisons between them on that axis is meaningful. Thus I think the
question is primarily concerned with performance and implementation
availability.
I think any of the above would be reasonable choices. I don't believe
that length-extension is a concern here.
SHA-512/256 will be faster than SHA-256 on 64-bit systems in software.
The graph at https://blake2.net/ suggests a 50% speedup on Skylake. On
my Ivy Bridge system, it's about 20%.
(SHA-512/256 does not enjoy the same availability in common libraries however.)
Both Intel and ARM have SHA-256 instructions defined. I've not seen
good benchmarks of them yet, but they will make SHA-256 faster than
SHA-512 when available. However, it's very possible that something
like BLAKE2bp will still be faster. Of course, BLAKE2bp does not enjoy
the ubiquity of SHA-256, but nor do you have to wait years for the CPU
population to advance for high performance.
So, overall, none of these choices should obviously be excluded. The
considerations at this point are not cryptographic and the tradeoff
between implementation ease and performance is one that the git
community would have to make.
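AGL's software-speed observation is easy to check on one's own machine with a rough timing sketch (standard library only; absolute numbers are machine-dependent, and this says nothing about hardware-accelerated SHA-256 via SHA-NI or ARMv8 instructions). On many 64-bit CPUs the SHA-512 compression function, which underlies SHA-512/256, comes out ahead in software:

```python
import hashlib
import timeit

# Hash 1 MiB of data repeatedly and compare wall-clock time for the two
# compression functions. Results vary by CPU and OpenSSL build.
payload = b"\x00" * (1 << 20)

t256 = timeit.timeit(lambda: hashlib.sha256(payload).digest(), number=50)
t512 = timeit.timeit(lambda: hashlib.sha512(payload).digest(), number=50)
print(f"sha256: {t256:.3f}s  sha512: {t512:.3f}s")
```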
Cheers
AGL
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 21:59 ` Adam Langley
@ 2017-06-15 22:41 ` brian m. carlson
2017-06-15 23:36 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 23+ messages in thread
From: brian m. carlson @ 2017-06-15 22:41 UTC (permalink / raw)
To: Adam Langley
Cc: Johannes Schindelin, Ævar Arnfjörð Bjarmason,
Jeff King, Mike Hommey, Brandon Williams, Linus Torvalds,
Jonathan Nieder, Git Mailing List, Stefan Beller, Jonathan Tan,
Junio Hamano
[-- Attachment #1: Type: text/plain, Size: 2755 bytes --]
On Thu, Jun 15, 2017 at 02:59:57PM -0700, Adam Langley wrote:
> (I was asked to comment a few points in public by Jonathan.)
>
> I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
> K12, etc are all secure to the extent that I don't believe that making
> comparisons between them on that axis is meaningful. Thus I think the
> question is primarily concerned with performance and implementation
> availability.
>
> I think any of the above would be reasonable choices. I don't believe
> that length-extension is a concern here.
>
> SHA-512/256 will be faster than SHA-256 on 64-bit systems in software.
> The graph at https://blake2.net/ suggests a 50% speedup on Skylake. On
> my Ivy Bridge system, it's about 20%.
>
> (SHA-512/256 does not enjoy the same availability in common libraries however.)
>
> Both Intel and ARM have SHA-256 instructions defined. I've not seen
> good benchmarks of them yet, but they will make SHA-256 faster than
> SHA-512 when available. However, it's very possible that something
> like BLAKE2bp will still be faster. Of course, BLAKE2bp does not enjoy
> the ubiquity of SHA-256, but nor do you have to wait years for the CPU
> population to advance for high performance.
SHA-256 acceleration exists for some existing Intel platforms already.
However, they're not practically present on anything but servers at the
moment, and so I don't think the acceleration of SHA-256 is something we
should consider.
The SUPERCOP benchmarks tell me that generally, on 64-bit systems where
acceleration is not available, SHA-256 is the slowest, followed by
SHA3-256. BLAKE2b is the fastest.
If our goal is performance, then I would argue BLAKE2b-256 is the best
choice. It is secure and extremely fast. It does have the benefit that
we get to tell people that by moving away from SHA-1, they will get a
performance boost, pretty much no matter what the system.
BLAKE2bp may be faster, but it introduces additional implementation
complexity. I'm not sure crypto libraries will implement it, but then
again, OpenSSL only implements BLAKE2b-512 at the moment. I don't care
much either way, but we should add good tests to exercise the
implementation thoroughly. We're generally going to need to ship our
own implementation anyway.
I've argued that SHA3-256 probably has the longest life and good
unaccelerated performance, and for that reason, I've preferred it. But
if AGL says that they're all secure (and I generally think he knows
what he's talking about), we could consider performance more.
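As a side note on implementation availability: a 256-bit BLAKE2b is already reachable from, e.g., Python's standard library, since `hashlib.blake2b` takes a `digest_size` parameter. A minimal sketch:

```python
import hashlib

# BLAKE2b with a 32-byte (256-bit) digest, i.e. "BLAKE2b-256"; no
# external crypto library needed in Python 3.6+.
h = hashlib.blake2b(b"hello\n", digest_size=32)
digest = h.hexdigest()
print(digest)
```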
--
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 22:41 ` brian m. carlson
@ 2017-06-15 23:36 ` Ævar Arnfjörð Bjarmason
2017-06-16 0:17 ` brian m. carlson
0 siblings, 1 reply; 23+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-15 23:36 UTC (permalink / raw)
To: brian m. carlson, Adam Langley, Johannes Schindelin,
Ævar Arnfjörð Bjarmason, Jeff King, Mike Hommey,
Brandon Williams, Linus Torvalds, Jonathan Nieder,
Git Mailing List, Stefan Beller, Jonathan Tan, Junio Hamano
On Fri, Jun 16, 2017 at 12:41 AM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> On Thu, Jun 15, 2017 at 02:59:57PM -0700, Adam Langley wrote:
>> (I was asked to comment a few points in public by Jonathan.)
>>
>> I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
>> K12, etc are all secure to the extent that I don't believe that making
>> comparisons between them on that axis is meaningful. Thus I think the
>> question is primarily concerned with performance and implementation
>> availability.
>>
>> I think any of the above would be reasonable choices. I don't believe
>> that length-extension is a concern here.
>>
>> SHA-512/256 will be faster than SHA-256 on 64-bit systems in software.
>> The graph at https://blake2.net/ suggests a 50% speedup on Skylake. On
>> my Ivy Bridge system, it's about 20%.
>>
>> (SHA-512/256 does not enjoy the same availability in common libraries however.)
>>
>> Both Intel and ARM have SHA-256 instructions defined. I've not seen
>> good benchmarks of them yet, but they will make SHA-256 faster than
>> SHA-512 when available. However, it's very possible that something
>> like BLAKE2bp will still be faster. Of course, BLAKE2bp does not enjoy
>> the ubiquity of SHA-256, but nor do you have to wait years for the CPU
>> population to advance for high performance.
>
> SHA-256 acceleration exists for some existing Intel platforms already.
> However, they're not practically present on anything but servers at the
> moment, and so I don't think the acceleration of SHA-256 is a
> something we should consider.
Whatever next-gen hash Git ends up with is going to be in use for
decades, so what hardware acceleration exists in consumer products
right now is practically irrelevant, but what acceleration is likely
to exist for the lifetime of the hash existing *is* relevant.
So I don't follow the argument that we shouldn't weigh future HW
acceleration highly just because you can't easily buy a laptop today
with these features.
Aside from that I think you've got this backwards, it's AMD that's
adding SHA acceleration to their high-end Ryzen chips[1] but Intel is
starting at the lower end this year with Goldmont which'll be in
lower-end consumer devices[2]. If you read the github issue I linked
to upthread[3] you can see that the cryptopp devs already tested their
SHA accelerated code on a consumer Celeron[4] recently.
I don't think Intel has announced the SHA extensions for future Xeon
releases, but it seems a given that they're going to have it there as
well. Have there ever been x86 extensions that weren't eventually
portable across the entire line, or that ended up being removed from
x86 once introduced?
In any case, I think by the time we're ready to follow-up the current
hash refactoring efforts with actually changing the hash
implementation many of us are likely to have laptops with these
extensions, making this easy to test.
1. https://en.wikipedia.org/wiki/Intel_SHA_extensions
2. https://en.wikipedia.org/wiki/Goldmont
3. https://github.com/weidai11/cryptopp/issues/139#issuecomment-264283385
4. https://ark.intel.com/products/95594/Intel-Celeron-Processor-J3455-2M-Cache-up-to-2_3-GHz
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 23:36 ` Ævar Arnfjörð Bjarmason
@ 2017-06-16 0:17 ` brian m. carlson
2017-06-16 6:25 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 23+ messages in thread
From: brian m. carlson @ 2017-06-16 0:17 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Adam Langley, Johannes Schindelin, Jeff King, Mike Hommey,
Brandon Williams, Linus Torvalds, Jonathan Nieder,
Git Mailing List, Stefan Beller, Jonathan Tan, Junio Hamano
[-- Attachment #1: Type: text/plain, Size: 3810 bytes --]
On Fri, Jun 16, 2017 at 01:36:13AM +0200, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Jun 16, 2017 at 12:41 AM, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > SHA-256 acceleration exists for some existing Intel platforms already.
> > However, they're not practically present on anything but servers at the
> > moment, and so I don't think the acceleration of SHA-256 is a
> > something we should consider.
>
> Whatever next-gen hash Git ends up with is going to be in use for
> decades, so what hardware acceleration exists in consumer products
> right now is practically irrelevant, but what acceleration is likely
> to exist for the lifetime of the hash existing *is* relevant.
The life of MD5 was about 23 years (introduction to first document
collision). SHA-1 had about 22. Decades, yes, but just barely. SHA-2
was introduced in 2001, and by the same estimate, we're a little over
halfway through its life.
> So I don't follow the argument that we shouldn't weigh future HW
> acceleration highly just because you can't easily buy a laptop today
> with these features.
>
> Aside from that I think you've got this backwards, it's AMD that's
> adding SHA acceleration to their high-end Ryzen chips[1] but Intel is
> starting at the lower end this year with Goldmont which'll be in
> lower-end consumer devices[2]. If you read the github issue I linked
> to upthread[3] you can see that the cryptopp devs already tested their
> SHA accelerated code on a consumer Celeron[4] recently.
>
> I don't think Intel has announced the SHA extensions for future Xeon
> releases, but it seems given that they're going to have it there as
> well. Have there every been x86 extensions that aren't eventually
> portable across the entire line, or that they've ended up removing
> from x86 once introduced?
>
> In any case, I think by the time we're ready to follow-up the current
> hash refactoring efforts with actually changing the hash
> implementation many of us are likely to have laptops with these
> extensions, making this easy to test.
I think you underestimate the life of hardware and software. I have
servers running KVM development instances that have been running since
at least 2012. Those machines are not scheduled for replacement anytime
soon.
Whatever we deploy within the next year is going to run on existing
hardware for probably a decade, whether we want it to or not. Most of
those machines don't have acceleration.
Furthermore, you need a reasonably modern crypto library to get hardware
acceleration. OpenSSL has only recently gained support for it. RHEL 7
does not currently support it, and probably never will. That OS is
going to be around for the next 6 years.
If we're optimizing for performance, I don't want to optimize for the
latest, greatest machines. Those machines are going to outperform
everything else either way. I'd rather optimize for something which
performs well on the whole everywhere. There are a lot of developers
who have older machines, for cost reasons or otherwise.
Here are some stats (cycles/byte for long messages):

                     SHA-256   BLAKE2b
  Ryzen                 1.89      3.06
  Knight's Landing     19.00      5.65
  Cortex-A72            1.99      5.48
  Cortex-A57           11.81      5.47
  Cortex-A7            28.19     15.16
In other words, BLAKE2b performs well uniformly across a wide variety of
architectures even without acceleration. I'd rather tell people that
upgrading to a new hash algorithm is a performance win either way, not
just if they have the latest hardware.
--
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-16 0:17 ` brian m. carlson
@ 2017-06-16 6:25 ` Ævar Arnfjörð Bjarmason
2017-06-16 13:24 ` Johannes Schindelin
0 siblings, 1 reply; 23+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-16 6:25 UTC (permalink / raw)
To: brian m. carlson
Cc: Adam Langley, Johannes Schindelin, Jeff King, Mike Hommey,
Brandon Williams, Linus Torvalds, Jonathan Nieder,
Git Mailing List, Stefan Beller, Jonathan Tan, Junio Hamano
On Fri, Jun 16 2017, brian m. carlson jotted:
> On Fri, Jun 16, 2017 at 01:36:13AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> On Fri, Jun 16, 2017 at 12:41 AM, brian m. carlson
>> <sandals@crustytoothpaste.net> wrote:
>> > SHA-256 acceleration exists for some existing Intel platforms already.
>> > However, they're not practically present on anything but servers at the
>> > moment, and so I don't think the acceleration of SHA-256 is a
>> > something we should consider.
>>
>> Whatever next-gen hash Git ends up with is going to be in use for
>> decades, so what hardware acceleration exists in consumer products
>> right now is practically irrelevant, but what acceleration is likely
>> to exist for the lifetime of the hash existing *is* relevant.
>
> The life of MD5 was about 23 years (introduction to first document
> collision). SHA-1 had about 22. Decades, yes, but just barely. SHA-2
> was introduced in 2001, and by the same estimate, we're a little over
> halfway through its life.
I'm talking about the lifetime of SHA-1 or $newhash's use in Git. As our
continued use of SHA-1 demonstrates, the window of practical hash
function use extends well beyond the window from introduction to
published breakage.
It's also telling that SHA-1, which any cryptographer would have waved
you off from since around 2011, is just now, in 2017, getting widely
deployed HW acceleration. The practical use of hash functions far
exceeds their recommended use in new projects.
>> So I don't follow the argument that we shouldn't weigh future HW
>> acceleration highly just because you can't easily buy a laptop today
>> with these features.
>>
>> Aside from that I think you've got this backwards, it's AMD that's
>> adding SHA acceleration to their high-end Ryzen chips[1] but Intel is
>> starting at the lower end this year with Goldmont which'll be in
>> lower-end consumer devices[2]. If you read the github issue I linked
>> to upthread[3] you can see that the cryptopp devs already tested their
>> SHA accelerated code on a consumer Celeron[4] recently.
>>
>> I don't think Intel has announced the SHA extensions for future Xeon
>> releases, but it seems a given that they're going to have it there as
>> well. Have there ever been x86 extensions that aren't eventually
>> portable across the entire line, or that they've ended up removing
>> from x86 once introduced?
>>
>> In any case, I think by the time we're ready to follow up the current
>> hash refactoring efforts with actually changing the hash
>> implementation many of us are likely to have laptops with these
>> extensions, making this easy to test.
>
> I think you underestimate the life of hardware and software. I have
> servers running KVM development instances that have been running since
> at least 2012. Those machines are not scheduled for replacement anytime
> soon.
>
> Whatever we deploy within the next year is going to run on existing
> hardware for probably a decade, whether we want it to or not. Most of
> those machines don't have acceleration.
To clarify, I'm not dismissing the need to consider existing hardware
without these acceleration functions or future processors without
them. I don't think that makes any sense, we need to keep those in mind.
I was replying to a bit in your comment where you (it seems to me) were
making the claim that we shouldn't consider the HW acceleration of
certain hash functions either.
Clearly both need to be considered.
> Furthermore, you need a reasonably modern crypto library to get hardware
> acceleration. OpenSSL has only recently gained support for it. RHEL 7
> does not currently support it, and probably never will. That OS is
> going to be around for the next 6 years.
>
> If we're optimizing for performance, I don't want to optimize for the
> latest, greatest machines. Those machines are going to outperform
> everything else either way. I'd rather optimize for something which
> performs well on the whole everywhere. There are a lot of developers
> who have older machines, for cost reasons or otherwise.
We have real data showing that the intersection between people who care
about the hash slowing down and those who can't afford the latest
hardware is pretty much nil.
I.e. in 2.13.0 SHA-1 got slower, and pretty much nobody noticed or cared
except Johannes Schindelin, myself & Christian Couder. This is because
in practice hashing only becomes a bottleneck on huge monorepos that
need to e.g. re-hash the contents of a huge index.
> Here are some stats (cycles/byte for long messages):
>
>                    SHA-256   BLAKE2b
> Ryzen                 1.89      3.06
> Knight's Landing     19.00      5.65
> Cortex-A72            1.99      5.48
> Cortex-A57           11.81      5.47
> Cortex-A7            28.19     15.16
>
> In other words, BLAKE2b performs well uniformly across a wide variety of
> architectures even without acceleration. I'd rather tell people that
> upgrading to a new hash algorithm is a performance win either way, not
> just if they have the latest hardware.
Yup, all of those need to be considered, although given my comment above
about big repos a 40% improvement on Ryzen (a processor likely to be
used for big repos) stands out. Where are those numbers from, and is
that with or without HW accel for SHA-256 on Ryzen?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-16 6:25 ` Ævar Arnfjörð Bjarmason
@ 2017-06-16 13:24 ` Johannes Schindelin
2017-06-16 17:38 ` Adam Langley
2017-06-16 20:42 ` Jeff King
0 siblings, 2 replies; 23+ messages in thread
From: Johannes Schindelin @ 2017-06-16 13:24 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: brian m. carlson, Adam Langley, Jeff King, Mike Hommey,
Brandon Williams, Linus Torvalds, Jonathan Nieder,
Git Mailing List, Stefan Beller, Jonathan Tan, Junio Hamano
Hi,
On Fri, 16 Jun 2017, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Jun 16 2017, brian m. carlson jotted:
>
> > On Fri, Jun 16, 2017 at 01:36:13AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >
> >> So I don't follow the argument that we shouldn't weigh future HW
> >> acceleration highly just because you can't easily buy a laptop today
> >> with these features.
> >>
> >> Aside from that I think you've got this backwards, it's AMD that's
> >> adding SHA acceleration to their high-end Ryzen chips[1] but Intel is
> >> starting at the lower end this year with Goldmont which'll be in
> >> lower-end consumer devices[2]. If you read the github issue I linked
> >> to upthread[3] you can see that the cryptopp devs already tested
> >> their SHA accelerated code on a consumer Celeron[4] recently.
> >>
> >> I don't think Intel has announced the SHA extensions for future Xeon
> >> releases, but it seems a given that they're going to have it there as
> >> well. Have there ever been x86 extensions that aren't eventually
> >> portable across the entire line, or that they've ended up removing
> >> from x86 once introduced?
> >>
> >> In any case, I think by the time we're ready to follow up the current
> >> hash refactoring efforts with actually changing the hash
> >> implementation many of us are likely to have laptops with these
> >> extensions, making this easy to test.
> >
> > I think you underestimate the life of hardware and software. I have
> > servers running KVM development instances that have been running since
> > at least 2012. Those machines are not scheduled for replacement
> > anytime soon.
> >
> > Whatever we deploy within the next year is going to run on existing
> > hardware for probably a decade, whether we want it to or not. Most of
> > those machines don't have acceleration.
>
> To clarify, I'm not dismissing the need to consider existing hardware
> without these acceleration functions or future processors without them.
> I don't think that makes any sense, we need to keep those in mind.
>
> I was replying to a bit in your comment where you (it seems to me) were
> making the claim that we shouldn't consider the HW acceleration of
> certain hash functions either.
Yes, I also had the impression that it stressed the status quo quite a bit
too much.
We know for a fact that SHA-256 acceleration is coming to consumer CPUs.
We know of no plans for any of the other mentioned hash functions to
hardware-accelerate them in consumer CPUs.
And remember: for those who are affected most (humongous monorepos, source
code hosters), upgrading hardware is less of an issue than having a secure
hash function for the rest of us.
And while I am really thankful that Adam chimed in, I think he would agree
that BLAKE2 is a purposefully weakened version of BLAKE, for the benefit
of speed (with the caveat that one of my experts disagrees that BLAKE2b
would be faster than hardware-accelerated SHA-256). And while BLAKE has
seen roughly equivalent cryptanalysis as Keccak (which became SHA-3),
BLAKE2 has not.
That makes me *very* uneasy about choosing BLAKE2.
> > Furthermore, you need a reasonably modern crypto library to get hardware
> > acceleration. OpenSSL has only recently gained support for it. RHEL 7
> > does not currently support it, and probably never will. That OS is
> > going to be around for the next 6 years.
> >
> > If we're optimizing for performance, I don't want to optimize for the
> > latest, greatest machines. Those machines are going to outperform
> > everything else either way. I'd rather optimize for something which
> > performs well on the whole everywhere. There are a lot of developers
> > who have older machines, for cost reasons or otherwise.
>
> We have real data showing that the intersection between people who care
> about the hash slowing down and those who can't afford the latest
> hardware is pretty much nil.
>
> I.e. in 2.13.0 SHA-1 got slower, and pretty much nobody noticed or cared
> except Johannes Schindelin, myself & Christian Couder. This is because
> in practice hashing only becomes a bottleneck on huge monorepos that
> need to e.g. re-hash the contents of a huge index.
Indeed. I am still concerned about that. As you mention, though, it really
only affects users of ginormous monorepos, and of course source code
hosters.
The jury's still out on how much it impacts my colleagues, by the way.
I have no doubt that Visual Studio Team Services, GitHub and Atlassian
will eventually end up with FPGAs for hash computation. So that's that.
Side note: BLAKE is actually *not* friendly to hardware acceleration, I
have been told by one cryptography expert. In contrast, the Keccak team
claims SHA3-256 to be the easiest to hardware-accelerate, making it "a
green cryptographic primitive":
http://keccak.noekeon.org/is_sha3_slow.html
> > Here are some stats (cycles/byte for long messages):
> >
> >                    SHA-256   BLAKE2b
> > Ryzen                 1.89      3.06
> > Knight's Landing     19.00      5.65
> > Cortex-A72            1.99      5.48
> > Cortex-A57           11.81      5.47
> > Cortex-A7            28.19     15.16
> >
> > In other words, BLAKE2b performs well uniformly across a wide variety of
> > architectures even without acceleration. I'd rather tell people that
> > upgrading to a new hash algorithm is a performance win either way, not
> > just if they have the latest hardware.
>
> Yup, all of those need to be considered, although given my comment above
> about big repos a 40% improvement on Ryzen (a processor likely to be
> used for big repos) stands out. Where are those numbers from, and is
> that with or without HW accel for SHA-256 on Ryzen?
When it comes to BLAKE2, I would actually strongly suggest considering
the number of attempts to break it. Or rather, how much less attention
it got than, say, SHA-256.
In any case, I have been encouraged to stress the importance of
"crypto-agility", i.e. the ability to switch to another algorithm when the
current one gets broken "enough".
And I am delighted that that is exactly the direction we are going. In
other words, even if I still think (backed up by the experts on whose
knowledge I lean heavily to form my opinions) that SHA-256 would be the
best choice for now, it should be relatively easy to offer BLAKE2b support
for (and by [*1*]) those who want it.
Ciao,
Dscho
Footnote *1*: I say that the support for BLAKE2b should come from those
parties who desire it also because it is not as ubiquitous as SHA-256.
Hence, it would add the burden of having a performant and reasonably
bug-free implementation in Git's source tree. IIUC OpenSSL added BLAKE2b
support only in OpenSSL 1.1.0; the 1.0.2 line (which is still in use in
many places, e.g. Git for Windows' SDK) does not have it, meaning: Git's
implementation would be the one *everybody* relies on, with *no*
fall-back.
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-16 13:24 ` Johannes Schindelin
@ 2017-06-16 17:38 ` Adam Langley
2017-06-16 20:52 ` Junio C Hamano
2017-06-16 20:42 ` Jeff King
1 sibling, 1 reply; 23+ messages in thread
From: Adam Langley @ 2017-06-16 17:38 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Ævar Arnfjörð Bjarmason, brian m. carlson,
Jeff King, Mike Hommey, Brandon Williams, Linus Torvalds,
Jonathan Nieder, Git Mailing List, Stefan Beller, Jonathan Tan,
Junio Hamano
On Fri, Jun 16, 2017 at 6:24 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> And while I am really thankful that Adam chimed in, I think he would agree
> that BLAKE2 is a purposefully weakened version of BLAKE, for the benefit
> of speed
That is correct.
Although it is worth keeping in mind that the analysis results from the
SHA-3 process informed this rebalancing. Indeed, NIST proposed[1] to
do the same with Keccak before stamping it as SHA-3 (although it
ultimately did not, in the context of public feeling in late 2013). The
Keccak team have essentially done the same with K12. Thus there is
evidence of a fairly widespread belief that the SHA-3 parameters were
excessively cautious.
[1] https://docs.google.com/file/d/0BzRYQSHuuMYOQXdHWkRiZXlURVE/edit, slide 48
> (with the caveat that one of my experts disagrees that BLAKE2b
> would be faster than hardware-accelerated SHA-256).
The numbers given above for SHA-256 on Ryzen and Cortex-A72 must be
with hardware acceleration and I thank Brian Carlson for digging them
up as I hadn't seen them before.
I suggested above that BLAKE2bp (note the p at the end) might be
faster than hardware SHA-256, and that appears to be plausible based on
benchmarks[2] of that function. (With the caveat that those numbers are
for Haswell and Skylake and so cannot be directly compared with Ryzen.)
K12 reports similar speeds on Skylake[3] and thus is also plausibly
faster than hardware SHA-256.
[2] https://github.com/sneves/blake2-avx2
[3] http://keccak.noekeon.org/KangarooTwelve.pdf
However, as I'm not a git developer, I've no opinion on whether the
cost of carrying implementations of these functions is worth the speed
vs using SHA-256, which can be assumed to be supported everywhere
already.
Cheers
AGL
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-16 17:38 ` Adam Langley
@ 2017-06-16 20:52 ` Junio C Hamano
2017-06-16 21:12 ` Junio C Hamano
0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2017-06-16 20:52 UTC (permalink / raw)
To: Adam Langley
Cc: Johannes Schindelin, Ævar Arnfjörð Bjarmason,
brian m. carlson, Jeff King, Mike Hommey, Brandon Williams,
Linus Torvalds, Jonathan Nieder, Git Mailing List, Stefan Beller,
Jonathan Tan
Adam Langley <agl@google.com> writes:
> However, as I'm not a git developer, I've no opinion on whether the
> cost of carrying implementations of these functions is worth the speed
> vs using SHA-256, which can be assumed to be supported everywhere
> already.
Thanks.
My impression from this thread is that even though fast may be
better than slow, ubiquity trumps it for our use case, as long as
the thing is not absurdly and unusably slow, of course. Which makes
me lean towards something older/more established like SHA-256, and
it would be a very nice bonus if it gets hardware acceleration more
widely than others ;-)
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-16 20:52 ` Junio C Hamano
@ 2017-06-16 21:12 ` Junio C Hamano
2017-06-16 21:24 ` Jonathan Nieder
0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2017-06-16 21:12 UTC (permalink / raw)
To: Adam Langley
Cc: Johannes Schindelin, Ævar Arnfjörð Bjarmason,
brian m. carlson, Jeff King, Mike Hommey, Brandon Williams,
Linus Torvalds, Jonathan Nieder, Git Mailing List, Stefan Beller,
Jonathan Tan
Junio C Hamano <gitster@pobox.com> writes:
> Adam Langley <agl@google.com> writes:
>
>> However, as I'm not a git developer, I've no opinion on whether the
>> cost of carrying implementations of these functions is worth the speed
>> vs using SHA-256, which can be assumed to be supported everywhere
>> already.
>
> Thanks.
>
> My impression from this thread is that even though fast may be
> better than slow, ubiquity trumps it for our use case, as long as
> the thing is not absurdly and unusably slow, of course. Which makes
> me lean towards something older/more established like SHA-256, and
> it would be a very nice bonus if it gets hardware acceleration more
> widely than others ;-)
Ah, I recall one thing that was mentioned but not discussed much in
the thread: possible use of tree-hashing to exploit multiple cores
hashing a large-ish payload. As long as it is OK to pick a sound
tree hash coding on top of any (secure) underlying hash function,
I do not think the use of tree-hashing should affect which exact
underlying hash function is to be used. I also am not convinced
that we really want tree hashing (some codepaths that deal with a large
payload want to stream the data in a single pass from head to tail)
in the context of Git, but I am not a crypto person, so ...
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-16 21:12 ` Junio C Hamano
@ 2017-06-16 21:24 ` Jonathan Nieder
2017-06-16 21:39 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 23+ messages in thread
From: Jonathan Nieder @ 2017-06-16 21:24 UTC (permalink / raw)
To: Junio C Hamano
Cc: Adam Langley, Johannes Schindelin,
Ævar Arnfjörð Bjarmason, brian m. carlson,
Jeff King, Mike Hommey, Brandon Williams, Linus Torvalds,
Git Mailing List, Stefan Beller, Jonathan Tan
Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>> Adam Langley <agl@google.com> writes:
>>> However, as I'm not a git developer, I've no opinion on whether the
>>> cost of carrying implementations of these functions is worth the speed
>>> vs using SHA-256, which can be assumed to be supported everywhere
>>> already.
>>
>> Thanks.
>>
>> My impression from this thread is that even though fast may be
>> better than slow, ubiquity trumps it for our use case, as long as
>> the thing is not absurdly and unusably slow, of course. Which makes
>> me lean towards something older/more established like SHA-256, and
>> it would be a very nice bonus if it gets hardware acceleration more
>> widely than others ;-)
>
> Ah, I recall one thing that was mentioned but not discussed much in
> the thread: possible use of tree-hashing to exploit multiple cores
> hashing a large-ish payload. As long as it is OK to pick a sound
> tree hash coding on top of any (secure) underlying hash function,
> I do not think the use of tree-hashing should affect which exact
> underlying hash function is to be used. I also am not convinced
> that we really want tree hashing (some codepaths that deal with a large
> payload want to stream the data in a single pass from head to tail)
> in the context of Git, but I am not a crypto person, so ...
Tree hashing also affects single-core performance because of the
availability of SIMD instructions.
That is how software implementations of e.g. blake2bp-256 and
SHA-256x16[1] are able to have competitive performance with (slightly
better performance than, at least in some cases) hardware
implementations of SHA-256.
It is also satisfying that we have options like these that are faster
than SHA-1.
All that said, SHA-256 seems like a fine choice, despite its worse
performance. The wide availability of reasonable-quality
implementations (e.g. in Java you can use
'MessageDigest.getInstance("SHA-256")') makes it a very tempting one.
Part of the reason I suggested previously that it would be helpful to
try to benchmark Git with various hash functions (which didn't go over
well, for some reason) is that it makes these comparisons more
concrete. Without measuring, it is hard to get a sense of the
distribution of input sizes and how much practical effect the
differences we are talking about have.
Thanks,
Jonathan
[1] https://eprint.iacr.org/2012/476.pdf
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-16 21:24 ` Jonathan Nieder
@ 2017-06-16 21:39 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 23+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-16 21:39 UTC (permalink / raw)
To: Jonathan Nieder
Cc: Junio C Hamano, Adam Langley, Johannes Schindelin,
brian m. carlson, Jeff King, Mike Hommey, Brandon Williams,
Linus Torvalds, Git Mailing List, Stefan Beller, Jonathan Tan
On Fri, Jun 16 2017, Jonathan Nieder jotted:
> Part of the reason I suggested previously that it would be helpful to
> try to benchmark Git with various hash functions (which didn't go over
> well, for some reason) is that it makes these comparisons more
> concrete. Without measuring, it is hard to get a sense of the
> distribution of input sizes and how much practical effect the
> differences we are talking about have.
It would be great to have such benchmarks (I probably missed the "didn't
go over well" part), but FWIW you can get pretty close to this right now
in git by running various t/perf benchmarks with
BLKSHA1/OPENSSL/SHA1DC.
Between the three of those (particularly SHA1DC being slower than
OpenSSL) you get a similar performance difference as some SHA-1
v.s. SHA-256 benchmarks I've seen, so to the extent that we have
existing performance tests it's revealing to see what's slower & faster.
It makes a particularly big difference for e.g. p3400-rebase.sh.
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-16 13:24 ` Johannes Schindelin
2017-06-16 17:38 ` Adam Langley
@ 2017-06-16 20:42 ` Jeff King
2017-06-19 9:26 ` Johannes Schindelin
1 sibling, 1 reply; 23+ messages in thread
From: Jeff King @ 2017-06-16 20:42 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Ævar Arnfjörð Bjarmason, brian m. carlson,
Adam Langley, Mike Hommey, Brandon Williams, Linus Torvalds,
Jonathan Nieder, Git Mailing List, Stefan Beller, Jonathan Tan,
Junio Hamano
On Fri, Jun 16, 2017 at 03:24:19PM +0200, Johannes Schindelin wrote:
> I have no doubt that Visual Studio Team Services, GitHub and Atlassian
> will eventually end up with FPGAs for hash computation. So that's that.
I actually doubt this from the GitHub side. Hash performance is not even
on our radar as a bottleneck. In most cases the problem is touching
uncompressed data _at all_, not computing the hash over it (so things
like reusing on-disk deltas are really important).
-Peff
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-16 20:42 ` Jeff King
@ 2017-06-19 9:26 ` Johannes Schindelin
0 siblings, 0 replies; 23+ messages in thread
From: Johannes Schindelin @ 2017-06-19 9:26 UTC (permalink / raw)
To: Jeff King
Cc: Ævar Arnfjörð Bjarmason, brian m. carlson,
Adam Langley, Mike Hommey, Brandon Williams, Linus Torvalds,
Jonathan Nieder, Git Mailing List, Stefan Beller, Jonathan Tan,
Junio Hamano
Hi Peff,
On Fri, 16 Jun 2017, Jeff King wrote:
> On Fri, Jun 16, 2017 at 03:24:19PM +0200, Johannes Schindelin wrote:
>
> > I have no doubt that Visual Studio Team Services, GitHub and Atlassian
> > will eventually end up with FPGAs for hash computation. So that's
> > that.
>
> I actually doubt this from the GitHub side. Hash performance is not even
> on our radar as a bottleneck. In most cases the problem is touching
> uncompressed data _at all_, not computing the hash over it (so things
> like reusing on-disk deltas are really important).
Thanks for pointing that out! As a mainly client-side person, I rarely get
insights into the server side...
Ciao,
Dscho
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 13:01 ` Jeff King
2017-06-15 16:30 ` Ævar Arnfjörð Bjarmason
@ 2017-06-15 21:10 ` Mike Hommey
2017-06-16 4:30 ` Jeff King
1 sibling, 1 reply; 23+ messages in thread
From: Mike Hommey @ 2017-06-15 21:10 UTC (permalink / raw)
To: Jeff King
Cc: Johannes Schindelin, Brandon Williams, brian m. carlson,
Linus Torvalds, Jonathan Nieder, Git Mailing List, Stefan Beller,
jonathantanmy, Junio Hamano
On Thu, Jun 15, 2017 at 09:01:45AM -0400, Jeff King wrote:
> On Thu, Jun 15, 2017 at 08:05:18PM +0900, Mike Hommey wrote:
>
> > On Thu, Jun 15, 2017 at 12:30:46PM +0200, Johannes Schindelin wrote:
> > > Footnote *1*: SHA-256, like all hash functions whose output is essentially
> > > the entire internal state, is susceptible to a so-called "length
> > > extension attack", where the hash of a secret+message can be used to
> > > generate the hash of secret+message+piggyback without knowing the secret.
> > > This is not the case for Git: only visible data are hashed. The type of
> > > attacks Git has to worry about is very different from the length extension
> > > attacks, and it is highly unlikely that that weakness of SHA-256 leads to,
> > > say, a collision attack.
> >
> > What do the experts think of SHA-512/256, which completely removes the
> > concerns over length extension attack? (which I'd argue is better than
> > sweeping them under the carpet)
>
> I don't think it's sweeping them under the carpet. Git does not use the
> hash as a MAC, so length extension attacks aren't a thing (and even if
> we later wanted to use the same algorithm as a MAC, the HMAC
> construction is a well-studied technique for dealing with it).
AIUI, length extension does make brute force collision attacks (which,
really, is what Shattered was) cheaper by allowing one to create the
collision with a small message and extend it later.
This might not be a credible threat against git, but if we go by that
standard, post-Shattered SHA-1 is still fine for git. As a matter of
fact, MD5 would also be fine: there is still, to this day, no preimage
attack against them.
Mike
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 21:10 ` Mike Hommey
@ 2017-06-16 4:30 ` Jeff King
0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2017-06-16 4:30 UTC (permalink / raw)
To: Mike Hommey
Cc: Johannes Schindelin, Brandon Williams, brian m. carlson,
Linus Torvalds, Jonathan Nieder, Git Mailing List, Stefan Beller,
jonathantanmy, Junio Hamano
On Fri, Jun 16, 2017 at 06:10:22AM +0900, Mike Hommey wrote:
> > > What do the experts think of SHA-512/256, which completely removes the
> > > concerns over length extension attack? (which I'd argue is better than
> > > sweeping them under the carpet)
> >
> > I don't think it's sweeping them under the carpet. Git does not use the
> > hash as a MAC, so length extension attacks aren't a thing (and even if
> > we later wanted to use the same algorithm as a MAC, the HMAC
> > construction is a well-studied technique for dealing with it).
>
> AIUI, length extension does make brute force collision attacks (which,
> really, is what Shattered was) cheaper by allowing one to create the
> collision with a small message and extend it later.
>
> This might not be a credible threat against git, but if we go by that
> standard, post-Shattered SHA-1 is still fine for git. As a matter of
> fact, MD5 would also be fine: there is still, to this day, no preimage
> attack against them.
I think collision attacks are of interest to Git. But I would think
2^128 would be enough (TBH, 2^80 probably would have been enough for
SHA-1; it was the weaknesses that brought that down by a factor of a
million that made it a problem).
-Peff
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 10:30 ` Which hash function to use, was " Johannes Schindelin
2017-06-15 11:05 ` Mike Hommey
@ 2017-06-15 17:36 ` Brandon Williams
2017-06-15 19:20 ` Junio C Hamano
2017-06-15 19:13 ` Jonathan Nieder
2 siblings, 1 reply; 23+ messages in thread
From: Brandon Williams @ 2017-06-15 17:36 UTC (permalink / raw)
To: Johannes Schindelin
Cc: brian m. carlson, Linus Torvalds, Jonathan Nieder,
Git Mailing List, Stefan Beller, jonathantanmy, Jeff King,
Junio Hamano
On 06/15, Johannes Schindelin wrote:
> Hi,
>
> I thought it better to revive this old thread rather than start a new
> thread, so as to automatically reach everybody who chimed in originally.
>
> On Mon, 6 Mar 2017, Brandon Williams wrote:
>
> > On 03/06, brian m. carlson wrote:
> >
> > > On Sat, Mar 04, 2017 at 06:35:38PM -0800, Linus Torvalds wrote:
> > >
> > > > Btw, I do think the particular choice of hash should still be on the
> > > > table. sha-256 may be the obvious first choice, but there are
> > > > definitely a few reasons to consider alternatives, especially if
> > > > it's a complete switch-over like this.
> > > >
> > > > One is large-file behavior - a parallel (or tree) mode could improve
> > > > on that noticeably. BLAKE2 does have special support for that, for
> > > > example. And SHA-256 does have known attacks compared to SHA-3-256
> > > > or BLAKE2 - whether that is due to age or due to more effort, I
> > > > can't really judge. But if we're switching away from SHA1 due to
> > > > known attacks, it does feel like we should be careful.
> > >
> > > I agree with Linus on this. SHA-256 is the slowest option, and it's
> > > the one with the most advanced cryptanalysis. SHA-3-256 is faster on
> > > 64-bit machines (which, as we've seen on the list, is the overwhelming
> > > majority of machines using Git), and even BLAKE2b-256 is stronger.
> > >
> > > Doing this all over again in another couple years should also be a
> > > non-goal.
> >
> > I agree that when we decide to move to a new algorithm that we should
> > select one which we plan on using for as long as possible (much longer
> > than a couple years). While writing the document we simply used
> > "sha256" because it was more tangible and easier to reference.
>
> The SHA-1 transition *requires* a knob telling Git that the current
> repository uses a hash function different from SHA-1.
>
> It would make *a whole lot of sense* to make that knob *not* Boolean,
> but to specify *which* hash function is in use.
100% agree on this point. I believe the current plan is to have the
hashing function used for a repository be a repository format extension
which would be a value (most likely a string like 'sha1', 'sha256',
'blake2', etc) stored in a repository's .git/config. This way, upon
startup git will die or ignore a repository which uses a hashing
function which it does not recognize or was not compiled to handle.
I hope (and expect) that the end product of this transition is a nice,
clean hashing API and interface with sufficient abstractions such that
if I wanted to switch to a different hashing function I would just need
to implement the interface with the new hashing function and ensure that
'verify_repository_format' allows the new function.
>
> That way, it will be easier to switch another time when it becomes
> necessary.
>
> And it will also make it easier for interested parties to use a different
> hash function in their infrastructure if they want.
>
> And it lifts part of that burden that we have to consider *very carefully*
> which function to pick. We still should be more careful than in 2005, when
> Git was born and when, incidentally, the first attacks on SHA-1
> became known, of course. We were just lucky for almost 12 years.
>
> Now, with Dunning-Kruger in mind, I feel that my degree in mathematics
> equips me with *just enough* competence to know just how little *even I*
> know about cryptography.
>
> The smart thing to do, hence, was to get involved in this discussion and
> act as Lt Tawney Madison between us Git developers and experts in
> cryptography.
>
> It just so happens that I work at a company with access to excellent
> cryptographers, and as we own the largest Git repository on the planet, we
> have a vested interest in ensuring Git's continued success.
>
> After a couple of conversations with a couple of experts who I cannot
> thank enough for their time and patience, let alone their knowledge about
> this matter, it would appear that we may not have had a complete enough
> picture yet to even start to make the decision on the hash function to
> use.
>
--
Brandon Williams
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 17:36 ` Brandon Williams
@ 2017-06-15 19:20 ` Junio C Hamano
0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2017-06-15 19:20 UTC (permalink / raw)
To: Brandon Williams
Cc: Johannes Schindelin, brian m. carlson, Linus Torvalds,
Jonathan Nieder, Git Mailing List, Stefan Beller, jonathantanmy,
Jeff King
Brandon Williams <bmwill@google.com> writes:
>> It would make a whole lot of sense to make that knob not Boolean,
>> but to specify which hash function is in use.
>
> 100% agree on this point. I believe the current plan is to have the
> hashing function used for a repository be a repository format extension
> which would be a value (most likely a string like 'sha1', 'sha256',
> 'black2', etc) stored in a repository's .git/config. This way, upon
> startup git will die or ignore a repository which uses a hashing
> function which it does not recognize or does not compiled to handle.
>
> I hope (and expect) that the end product of this transition is a nice,
> clean hashing API and interface with sufficient abstractions such that
> if I wanted to switch to a different hashing function I would just need
> to implement the interface with the new hashing function and ensure that
> 'verify_repository_format' allows the new function.
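As a concrete illustration of the per-repository knob Brandon describes, the config might carry something like the following. The extension name "objectFormat" and the repositoryformatversion bump shown here are hypothetical; the exact spelling had not been settled at the time of this thread:

```ini
# .git/config -- illustrative sketch only; the extension name
# "objectFormat" and the version bump are not final.
[core]
	repositoryformatversion = 1
[extensions]
	objectFormat = sha256
```

A Git that does not understand the extension would refuse to touch the repository, which is exactly the "die or ignore" behavior described above.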
Yup. I thought that part had already been agreed upon, but it is a
good thing that somebody is writing it down (perhaps "again", if not
"for the first time").
Thanks.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Which hash function to use, was Re: RFC: Another proposed hash function transition plan
2017-06-15 10:30 ` Which hash function to use, was " Johannes Schindelin
2017-06-15 11:05 ` Mike Hommey
2017-06-15 17:36 ` Brandon Williams
@ 2017-06-15 19:13 ` Jonathan Nieder
2 siblings, 0 replies; 23+ messages in thread
From: Jonathan Nieder @ 2017-06-15 19:13 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Brandon Williams, brian m. carlson, Linus Torvalds,
Git Mailing List, Stefan Beller, jonathantanmy, Jeff King,
Junio Hamano
Hi Dscho,
Johannes Schindelin wrote:
> From what I read, pretty much everybody who participated in the discussion
> was aware that the essential question is: performance vs security.
I don't completely agree with this framing. The essential question is:
how to get the right security properties without abysmal performance.
> It turns out that we can have essentially both.
>
> SHA-256 is most likely the best-studied hash function we currently know
[... etc ...]
Thanks for a thoughtful restart to the discussion. This is much more
concrete than your previous objections about process, and that is very
helpful.
In the interest of transparency: here are my current questions for
cryptographers to whom I have forwarded this thread. Several of these
questions involve predictions or opinions, so in my ideal world we'd
want multiple, well reasoned answers to them. Please feel free to
forward them to appropriate people or add more.
1. Now it sounds like SHA-512/256 is the safest choice (see also Mike
Hommey's response to Dscho's message). Please poke holes in my
understanding.
2. Would you be willing to weigh in publicly on the mailing list? I
think that would be the most straightforward way to move this
forward (and it would give you a chance to ask relevant questions,
etc). Feel free to contact me privately if you have any questions
about how this particular mailing list works.
3. On the speed side, Dscho states "SHA-256 will be faster than BLAKE
(and even than BLAKE2) once the Intel and AMD CPUs with hardware
support for SHA-256 become common." Do you agree?
4. On the security side, Dscho states "to compete in the SHA-3
contest, BLAKE added complexity so that it would be roughly on par
with its competitors. To allow for faster execution in software,
this complexity was *removed* from BLAKE to create BLAKE2, making
it weaker than SHA-256." Putting aside the historical questions,
do you agree with this "weaker than" claim?
5. On the security side, Dscho states, "The type of attacks Git has to
worry about is very different from the length extension attacks,
and it is highly unlikely that that weakness of SHA-256 leads to,
say, a collision attack", and Jeff King states, "Git does not use
the hash as a MAC, so length extension attacks aren't a thing (and
even if we later wanted to use the same algorithm as a MAC, the
HMAC construction is a well-studied technique for dealing with
it)." Is this correct in spirit? Is SHA-256 equally strong to
SHA-512/256 for Git's purposes, or are the increased bits of
internal state (or other differences) relevant? How would you
compare the two functions' properties?
6. On the speed side, Jeff King states "That said, SHA-512 is
typically a little faster than SHA-256 on 64-bit platforms. I
don't know if that will change with the advent of hardware
instructions oriented towards SHA-256." Thoughts?
7. If the answer to (2) is "no", do I have permission to quote or
paraphrase your replies that were given here?
Thanks, sincerely,
Jonathan
Thread overview: 23+ messages
2017-03-04 1:12 RFC: Another proposed hash function transition plan Jonathan Nieder
2017-03-05 2:35 ` Linus Torvalds
2017-03-06 0:26 ` brian m. carlson
2017-03-06 18:24 ` Brandon Williams
2017-06-15 10:30 ` Which hash function to use, was " Johannes Schindelin
2017-06-15 11:05 ` Mike Hommey
2017-06-15 13:01 ` Jeff King
2017-06-15 16:30 ` Ævar Arnfjörð Bjarmason
2017-06-15 19:34 ` Johannes Schindelin
2017-06-15 21:59 ` Adam Langley
2017-06-15 22:41 ` brian m. carlson
2017-06-15 23:36 ` Ævar Arnfjörð Bjarmason
2017-06-16 0:17 ` brian m. carlson
2017-06-16 6:25 ` Ævar Arnfjörð Bjarmason
2017-06-16 13:24 ` Johannes Schindelin
2017-06-16 17:38 ` Adam Langley
2017-06-16 20:52 ` Junio C Hamano
2017-06-16 21:12 ` Junio C Hamano
2017-06-16 21:24 ` Jonathan Nieder
2017-06-16 21:39 ` Ævar Arnfjörð Bjarmason
2017-06-16 20:42 ` Jeff King
2017-06-19 9:26 ` Johannes Schindelin
2017-06-15 21:10 ` Mike Hommey
2017-06-16 4:30 ` Jeff King
2017-06-15 17:36 ` Brandon Williams
2017-06-15 19:20 ` Junio C Hamano
2017-06-15 19:13 ` Jonathan Nieder
Git Mailing List Archive on lore.kernel.org