archive mirror
 help / color / mirror / Atom feed
To: "Ævar Arnfjörð Bjarmason" <>
Cc: Junio C Hamano <>,
	"brian m. carlson" <>,
Subject: Re: Is the sha256 object format experimental or not?
Date: Fri, 14 May 2021 11:10:09 -0700	[thread overview]
Message-ID: <20210514181009.GB16542@localhost> (raw)
In-Reply-To: <>

On 14.05.2021 10:49, Ævar Arnfjörð Bjarmason wrote:
>I agree insofar that I don't see a good reason for us to support some
>plethora of hash algorithms, but I wouldn't have objections to adding
>more if people find them useful for some reason. See e.g. [1] for an

I think Git should not try to do any cryptographic operations at all and
rely on external tools that are implemented properly and hardended.
Implementing cryptography isn't just about translating the algorithm
into code but also getting memory security correct, file handling
correct, input security correct, control flow correct (equal cost
multi-path), etc, etc. Most of the cryptography libraries aren't
designed to be misuse resistant. The only one I know of that has that as
a top-line requirement is Hyperledger Ursa [1].

I would like to see us remove all cryptography code (e.g. digests,
digital signatures, etc) from Git and rely on external tools entirely.
If we store the cryptographic material in a self-describing format that
identifies the associated tool as well as the cryptographic data, then
Git can be completely agnostic.

>But I really don't see how anything you've said would present a
>technical hurdle once we have SHA-1<->SHA-256 interop in a good enough
>state. At that point we'll support re-hashing on arrival of content
>hashed with algorithm X into Y, with a local lookup table between X<=>Y.
>So if somebody wants to maintain content hashed with algorithm Z locally
>we should easily be able to support that. The "diversity of naming"
>won't matter past that local repository, any mention of Z will be
>translated to X or Y on fetch/push.

Using self-describing formats allows us to honor history and keep old
object names as they and eliminate all of this added complications you
describe. I think there is a lot of room for errors to creep in when
collaborators have copies of the same repo and they have local mappings
between different hashing algorithms. How is this not setting up for a
combinatorial explosion of data? If the canonical repo uses SHA1 and one
contributor uses SHA2-512, another uses Blake2b-256, and yet another
uses SHA3-384, won't they all have to maintain six different translation
tables for all objects? SHA1 <=> SHA2-512, SHA1 <=> Blake2b-256, SHA1
<=> SHA3-384, SHA2-512 <=> Blake2b-256, SHA2-512 <=> SHA3-384, and
Blake2b-256 <=> SHA3-384? I guess that's your motivation for not
allowing algorithmic agility.

The way around this is to use self-describing formats and external
tools. Git repo copies wouldn't be required to have only *one* algorithm
naming all objects, requiring the translation tables. Instead Git repos
would/could have heterogeneous object names, each one with a single name
generated with a different digest algorithm. Git would simply consider
those names as plain strings and validating those strings requires
talking to the correct external tool, sending the name string and the
object data and reading back the result.

I think this is a much better approach because:

1. It creates algorithmic agility in a way that isn't top-down and heavy

2. It eliminates the need for all of the translation tables and
round-tripping complexity.

3. It empowers maintainers to decide which algorithms can/must be used
when naming objcts in a given repo. Merge hooks, CI/CD checks and
etiquette guides can be used to enforce this.

4. Git's attack surface becomes smaller (a very good thing) and limited
to doing IPC to external tools correctly and securely (easy) instead of
trying to get cryptography client code correct (very difficult).

One other thing to consider is that there are new tools being developed
that do similar things as Git that do have algorithmic agility and use
self-describing cryptographic primitives. Late-binding trust is now a
best practice and has been for quite some time. Many people rely upon
Git and I think we should keep up with the best practices.


  reply	other threads:[~2021-05-14 18:10 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-08  2:22 Preserving the ability to have both SHA1 and SHA256 signatures dwh
2021-05-08  6:39 ` Christian Couder
2021-05-08  6:56   ` Junio C Hamano
2021-05-08  8:03     ` Felipe Contreras
2021-05-08 10:11       ` Stefan Moch
2021-05-08 11:12         ` Junio C Hamano
2021-05-09  0:19 ` brian m. carlson
2021-05-10 12:22   ` Is the sha256 object format experimental or not? Ævar Arnfjörð Bjarmason
2021-05-10 22:42     ` brian m. carlson
2021-05-13 20:29       ` dwh
2021-05-13 20:49         ` Konstantin Ryabitsev
2021-05-13 23:47           ` dwh
2021-05-14 13:45             ` Konstantin Ryabitsev
2021-05-14 17:39               ` dwh
2021-05-13 21:03         ` Junio C Hamano
2021-05-13 23:26           ` dwh
2021-05-14  8:49           ` Ævar Arnfjörð Bjarmason
2021-05-14 18:10             ` dwh [this message]
2021-05-18  5:32         ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210514181009.GB16542@localhost \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).