All of lore.kernel.org
 help / color / mirror / Atom feed
From: dwh@linuxprogrammer.org
To: Junio C Hamano <gitster@pobox.com>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	git@vger.kernel.org
Subject: Re: Is the sha256 object format experimental or not?
Date: Thu, 13 May 2021 16:26:14 -0700	[thread overview]
Message-ID: <20210513232614.GF11882@localhost> (raw)
In-Reply-To: <xmqqo8de9wis.fsf@gitster.g>

On 14.05.2021 06:03, Junio C Hamano wrote:
>dwh@linuxprogrammer.org writes:
>
>> I think Git should externalize the calculation of object digests just
>> like it externalizes the calcualtion of object digital signatures.
>
>The hashing algorithms used to generate object names has
>requirements fundamentally different from that of digital
>signatures.  I strongly suspect that that fact would change the
>equation when you rethink what you said above.

I agree with you. Object names are exactly that: names. Names for
resources/data must be persistent, as well as global in scope and
uniqueness, and autonomously assigned. What this means is that once an
object has a name, that name shall never change as long as the object
remains unchanged. The names must be unique in the scope of all objects
(e.g. all copies of a repo) and generated without coordination.

Calculating object names using a digest algorithm meets all of these
requirements. Choosing a strong digest algorithm creates a strong
cryptographic binding between the name and the object contents. Using
self-describing digests allows for a repo to switch digest algorithms at
arbitrary points in the history.

I think that objects named with SHA1 digests should remain named with
the SHA1 digest. I do *not* advocate going back and rewriting history
to change all of the object names to a digest with a different
algorithm. Git is a provenance log and history matters. I recommend
preserving all existing names, even if they were created with known-weak
digest algorithms, and making the change to a new algorithm at a
specific point in time (e.g. at a tag). Using self-describing digest
encoding and externalizing digest calculation future-proofs
repositories and allows for preservation of history while allowing
algorithm agility.

To illustrate my point, I envision that a repos could have a history
like this:

object 2923f6fa36614586ea09b4424b438915cc1b9b67 (naked SHA1)
  |
<many objects named with SHA1>
  |
object 5f167fb6b3e96273b564fff0b041fb94fee4d3de (naked SHA1)
  |
<modify Git to ext. digest calculation and self-desc encoding>
  |
object 98c2e1c0965e60b0f137577ac5dd0a5c96ce224d (naked SHA1)
  |
<many objects named with SHA1>
  |
<a project decides to switch to SHA2-256, maybe marked in a tag>
  |
object IAOdLVxteOxQwKa-xn8yCBUkuPkjAqcuQ2V7fKAlao8o (self-desc.SHA2-256)
  |
<many objects named with self-describing SHA2-256 digests>
  |
<a project decices to switch to SHA3-256, maybe marked in a tag>
  |
object EK832G0PFhBFf-Dfgr205UKpUMqmVXJX9ltLwQo4Awct (self-desc.SHA3-256)
  |
<many objects named with self-descring SHA3-256 digests>
  .
  .
  .

Neither decision to switch to SHA2-256 nor to SHA3-256 would require any
code changes. If we continue down the current SHA-256 road, we will have
to repeat that multi-year effort in the future to switch to SHA3 or
something else. Most importantly, the choice of digest algorithm would
be left up to the maintainers of a given repo and not limited to the
algorithms we have hard coded into Git.

Brian's work on the SHA-256 switch is valuable. We can leverage a lot of
it to switch to externalized digest calculation and self-describing
digests and never have to worry about doing that again.

Cheers!
Dave

  reply	other threads:[~2021-05-13 23:26 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-08  2:22 Preserving the ability to have both SHA1 and SHA256 signatures dwh
2021-05-08  6:39 ` Christian Couder
2021-05-08  6:56   ` Junio C Hamano
2021-05-08  8:03     ` Felipe Contreras
2021-05-08 10:11       ` Stefan Moch
2021-05-08 11:12         ` Junio C Hamano
2021-05-09  0:19 ` brian m. carlson
2021-05-10 12:22   ` Is the sha256 object format experimental or not? Ævar Arnfjörð Bjarmason
2021-05-10 22:42     ` brian m. carlson
2021-05-13 20:29       ` dwh
2021-05-13 20:49         ` Konstantin Ryabitsev
2021-05-13 23:47           ` dwh
2021-05-14 13:45             ` Konstantin Ryabitsev
2021-05-14 17:39               ` dwh
2021-05-13 21:03         ` Junio C Hamano
2021-05-13 23:26           ` dwh [this message]
2021-05-14  8:49           ` Ævar Arnfjörð Bjarmason
2021-05-14 18:10             ` dwh
2021-05-18  5:32         ` Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210513232614.GF11882@localhost \
    --to=dwh@linuxprogrammer.org \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.