Cryptographic hash agnostic git (was: Re: Preserving the ability to have both SHA1 and SHA256 signatures)

From: Bagas Sanjaya <bagasdotme@gmail.com>
To: Felipe Contreras <felipe.contreras@gmail.com>,
	Personal Sam Smith <sam@samuelsmith.org>,
	dwh@linuxprogrammer.org
Cc: git@vger.kernel.org
Subject: Cryptographic hash agnostic git (was: Re: Preserving the ability to have both SHA1 and SHA256 signatures)
Date: Mon, 17 May 2021 13:49:06 +0700
Message-ID: <649946ac-b080-7f8b-3777-0be6556547e9@gmail.com>
In-Reply-To: <60a1e1cc5b8b6_11206d20830@natae.notmuch>

On 17/05/21 10.23, Felipe Contreras wrote:
> Personal Sam Smith wrote:
>> One of the essential properties of any good cryptographic system is
>> what is called cryptographic algorithm agility. Without it the system
>> cannot easily adapt to new attacks and newly discovered weaknesses in
>> cryptographic algorithms. Self-describing cryptographic primitives are
>> the most convenient enabler for cryptographic agility. One advantage
>> of signed hash chained provenance logs is that the whole log must be
>> compromised not merely one part of it. Such a log that exhibits
>> agility especially through self-describing primitives is self-healing
>> in sense that new appendages to the log may use stronger crypto
>> primitives which protect earlier entries in the log that use weaker
>> primitives. This makes the log (or any such agile self-describing
>> verifiable data structure) future proof. It is the best practice for
>> designing distributed (over the internet) zero trust computing
>> applications.
> 
> This is way above my pay grade, but let me try to interpret the above.
> 
> If we have a repository with two digest algorithms:
> 
>   1. SHA-1 (broken)
>   2. BLAKE2b (considered non-compromised)
> 
> We may not be confident in the SHA-1 history (1), but as long as we have
> BLAKE2b history (2), we can be confident in that.
> 
> The delta between when SHA-1 was broken, and the switch to BLAKE2b
> happened, is when the repository could be potentially compromised.
> 
> So, it's in the best interest of the repository owners to switch to the
> non-compromised version as soon as possible. In fact, it would be better
> if the switch happened *BEFORE* SHA-1 was broken.
> 
> This is why algorithm agility is important.
> 
> 
> But this is not sufficient, because BLAKE2b could get
> compromised in the future. The repository owners need to be thinking
> ahead to the time when they'll need to make yet another algorithm
> switch.
> 
> When such a time comes, they need their infrastructure to be able to
> perform the switch as fast as possible. If possible, right after they've
> finalized their decision.
> 
> 
> So, if I can summarize your and dwh's proposal: git should be
> cryptographic-digest-algorithm-agnostic.
>

But SHA-256 support in Git is still in progress, unfortunately. What if
someday SHA-1 is broken completely and we still haven't switched to a
stronger hash?

Anyway, besides SHA-256 and BLAKE2b, there is also the SHA-3 family, which
supports digest lengths from 224 bits to 512 bits. If Git wants to support
SHA-3 hashed objects, which bit length should we use? I prefer 256 bits,
because it's a nice trade-off between performance (speed) and security
(resistance to attacks).
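
To make the size question concrete, here is a minimal sketch using Python's
hashlib (only as a convenient illustration, not anything Git itself uses).
It lists the four fixed-length SHA-3 variants and the object ID length each
would imply. The helper names digest_bits and hex_length are mine.

```python
import hashlib

# Fixed-length SHA-3 variants as exposed by Python's hashlib.
SHA3_VARIANTS = ("sha3_224", "sha3_256", "sha3_384", "sha3_512")


def digest_bits(name: str) -> int:
    """Return the digest length in bits for a hashlib algorithm name."""
    return hashlib.new(name).digest_size * 8


def hex_length(name: str) -> int:
    """Return how many hex characters an object ID of this hash would need."""
    return hashlib.new(name).digest_size * 2


if __name__ == "__main__":
    # SHA-1 and SHA-256 are included only for comparison.
    for name in SHA3_VARIANTS + ("sha1", "sha256"):
        print(f"{name:10s} {digest_bits(name):4d} bits  {hex_length(name):3d} hex chars")
```

A 256-bit SHA-3 digest has the same length as a SHA-256 one, so object IDs
would stay at 64 hex characters either way.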

> 
> So far this makes sense to me.
> 
> The only problem comes when you consider day-to-day operations, which to
> be honest have been totally uninterrupted by 15 years of using SHA-1.
> 
> At this point it's worth noting that if the git project has a maxim, it
> would be a single word: "performance". Nothing else matters.
> 
> So, if you suggest to switch from SHA-1 to SHA-256, that's fine; as long
> as you can guarantee that *performance* is not affected. This is the
> work brian m. carlson seems to have been doing.
> 
> On the other hand, what dwh seemed to suggest is to support every digest
> algorithm on the horizon--without regard to how that would affect
> performance--and as expected that didn't land very smoothly.
> 
> 
> But I don't think the two approaches are incompatible.
> 
> All we have to do is reconcile two facts:
> 
>    1. The ability for users to switch to a new digest is important
>    2. We don't want users to be switching algorithms every other commit
> 
> If git can switch the digest algorithm on a per-repository basis, I
> don't think anybody would have a problem with that.
> 
> Git could support SHA-1, SHA-256, and BLAKE2b as of today. The
> repository owners can decide which algorithm to choose today, and their
> past history would not be affected.
> 

In reality, many users just use the Git packaged by their distribution,
and depending on the distro release, that version can be considerably older
than the latest upstream release. So we need to consider that as well.
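
As a rough illustration of what a per-repository digest choice changes, here
is a sketch of how an object ID is derived: the hash is taken over a
"<type> <size>" header, a NUL byte, and the content. The function name
object_id is mine, and using BLAKE2b here is purely hypothetical, since Git
does not implement it.

```python
import hashlib


def object_id(obj_type: str, content: bytes, algo: str = "sha1") -> str:
    """Hash a Git-style object: '<type> <size>' + NUL header, then the content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    h = hashlib.new(algo)
    h.update(header + content)
    return h.hexdigest()


if __name__ == "__main__":
    data = b"hello\n"
    # Same content, different digest algorithm: only the ID changes.
    for algo in ("sha1", "sha256", "blake2b"):
        print(f"{algo:8s} {object_id('blob', data, algo)}")
```

With the default SHA-1, this gives the same ID as
"echo hello | git hash-object --stdin". The content is identical under every
algorithm; only the ID (and its length) differs, which is why an older Git
that knows only one algorithm cannot read a repository created with another.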

> This is future-proof, and would make repository owners be able to make
> that decision, not git.
> 
> If at some point in the future people want to start to get ready for
> SHA-4, that could be introduced to the git core, *before* people want to
> make such transition, and *after* the project has made sure such change
> does not impact on performance.
> 
> Or am I missing something?
> 
> Cheers.
> 

Another remark: currently we roll our own hash algorithm implementations,
but industry best practice says that we should instead use third-party
libraries to do the job (OpenSSL or similar).

The problem with offloading hash algorithm implementations to third-party
libraries is that some (or most) distributions stay on a specific version of
a library for several years, with only (backported) bug fixes added.
This makes algorithm agility harder to achieve, because we must wait
until ALL distributions support our target algorithms in their libraries.
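
Python's hashlib has an analogous split, which may help picture the problem:
some algorithms are always shipped, while others depend on the crypto
library the interpreter was built against. A small sketch follows; the
WANTED list and the availability helper are my own, chosen for illustration.

```python
import hashlib

# Algorithms discussed in this thread, by their hashlib names.
WANTED = ("sha1", "sha256", "sha3_256", "blake2b")


def availability(names):
    """Map each algorithm name to 'guaranteed', 'available', or 'missing'."""
    result = {}
    for name in names:
        if name in hashlib.algorithms_guaranteed:
            # Shipped with every build of this Python version.
            result[name] = "guaranteed"
        elif name in hashlib.algorithms_available:
            # Present only because the linked crypto library provides it.
            result[name] = "available"
        else:
            result[name] = "missing"
    return result


if __name__ == "__main__":
    for name, status in availability(WANTED).items():
        print(f"{name:10s} {status}")
```

Anything that lands only in the "available" bucket is at the mercy of
whatever library version the distribution ships.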

-- 
An old man doll... just what I always wanted! - Clara