* SHA-256 transition
@ 2022-06-20 22:51 Stephen Smith
  2022-06-20 23:13 ` rsbecker
  2022-06-21 10:25 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 21+ messages in thread
From: Stephen Smith @ 2022-06-20 22:51 UTC (permalink / raw)
  To: git

What is the current status of the SHA-1 to SHA-256 transition?   Is the 
transition far enough along that users should start changing over to the new 
format?  






* RE: SHA-256 transition
  2022-06-20 22:51 SHA-256 transition Stephen Smith
@ 2022-06-20 23:13 ` rsbecker
  2022-06-21 10:25 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 21+ messages in thread
From: rsbecker @ 2022-06-20 23:13 UTC (permalink / raw)
  To: 'Stephen Smith', 'git'

On June 20, 2022 6:51 PM, Stephen Smith wrote:
>What is the current status of the SHA-1 to SHA-256 transition?   Is the
>transition far enough along that users should start changing over to the new
>format?

I had the same question at a conference last week. Could not answer it so am
curious about the plan.




* Re: SHA-256 transition
  2022-06-20 22:51 SHA-256 transition Stephen Smith
  2022-06-20 23:13 ` rsbecker
@ 2022-06-21 10:25 ` Ævar Arnfjörð Bjarmason
  2022-06-21 13:18   ` rsbecker
  2022-06-22  0:29   ` brian m. carlson
  1 sibling, 2 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-21 10:25 UTC (permalink / raw)
  To: Stephen Smith; +Cc: git, Jeff King


On Mon, Jun 20 2022, Stephen Smith wrote:

> What is the current status of the SHA-1 to SHA-256 transition?   Is the 
> transition far enough along that users should start changing over to the new 
> format?  

Just my 0.02, not the official project line or anything:

I wouldn't recommend that anyone use it for anything serious at the
moment. As far as I can tell the only users (if any) are currently
(some) people who work on git itself.

The status of it is, I think it's fair to say, that it /should/ work
100% (or at least 99.99%?) as far as git itself is concerned.

I.e. you can "init" a SHA-256 repository, all our in-repo tooling
etc. will work with it. We run full CI tests with a SHA-256 test suite,
and it's passing.
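
To make the difference concrete: a SHA-256 repository can be created
with "git init --object-format=sha256", and the object IDs in it are
derived the same way as before, only with a different hash. Below is a
minimal Python sketch of how a blob ID is computed under either
algorithm (the header "blob <size>\0" followed by the contents). The
helper name blob_oid is made up for illustration and is not part of git.

```python
import hashlib


def blob_oid(data: bytes, algo: str = "sha1") -> str:
    """Return the hex object ID a blob with these contents would get.

    Hypothetical helper for illustration only; `algo` is "sha1" or "sha256".
    """
    # Git hashes a small header plus the raw contents, not the contents alone.
    header = b"blob %d\x00" % len(data)
    return hashlib.new(algo, header + data).hexdigest()


if __name__ == "__main__":
    content = b"hello\n"
    print(len(blob_oid(content, "sha1")))    # number of hex digits in a SHA-1 ID
    print(len(blob_oid(content, "sha256")))  # number of hex digits in a SHA-256 ID
```

The IDs grow from 40 to 64 hex digits, which is the root of most of the
tooling breakage discussed below.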

But the reason I'd still say "no" on the technical/UX side is:

 * The inter-op between SHA-256 and SHA-1 repositories is still
   nonexistent, except for a one-off import. I.e. we don't have any
   graceful way to migrate an existing repository.

 * For new repositories I think you'll probably want to eventually push
   it to one of the online git hosting providers, none of which (as far
   as I'm aware) support SHA-256 repos.

 * Even if not, any local git tooling that's not part of git.git is
   likely to break, often for trivial reasons like expecting SHA-1 sized
   hashes in the output, but if you start using it for your repositories
   and use such tools you're very likely to be the first person to run
   into bugs in those areas.

But more importantly (and note that these views are definitely *not*
shared by some other project members, so take it with a grain of salt):
There just isn't any compelling selling point to migrate to SHA-256 in
the near or foreseeable future for a given individual user of git.

The reason we started the SHA-1 -> $newhash transition (it wasn't known
that it would be SHA-256 at the time) was in response to
https://shattered.io, although it had been discussed before, e.g. in
the thread starting at [1] in 2012.

We've since migrated our default hash function from SHA-1 to SHA-1DC
(except on vanilla OSX, see [2]). It's a variant of SHA-1, implemented
by the same researchers, that detects the SHAttered attack. I'm not
aware of a current viable SHA-1 collision against the variant of SHA-1
that we actually use these days.

But even assuming for the sake of argument that we were using a much
weaker and easier to break hash (say MD4 or MD5) most users still
wouldn't have much or anything to worry about in practice.

Discovering a hash collision is only the first step in attacking a Git
repository. This aspect has been discussed many times on-list, but
e.g. [3] is one such thread.

The above is really *not* meant to poo-poo the whole notion of switching
to a new hash. We're making good progress on it, although I think the
really hard part UX-wise is left (online migration).

Likewise I'd be really surprised if, given the progress of that work, the
average Git user isn't going to be using not-SHA-1 with Git in 15-20
years, if it's even still around at that time as a relevant VCS.

But should even advanced git users be spending time on migrating their
data at this point?

No, I don't think so given all of the above, and I really think we
should carefully consider all of the trade-offs involved before
recommending that the average user of git migrate over.

1. https://lore.kernel.org/git/CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com/
2. https://lore.kernel.org/git/cover-0.5-00000000000-20220422T094624Z-avarab@gmail.com/
3. https://lore.kernel.org/git/CACBZZX65Kbp8N9X9UtBfJca7U1T0m-VtKZeKM5q9mhyCR7dwGg@mail.gmail.com/






* RE: SHA-256 transition
  2022-06-21 10:25 ` Ævar Arnfjörð Bjarmason
@ 2022-06-21 13:18   ` rsbecker
  2022-06-21 18:14     ` Ævar Arnfjörð Bjarmason
  2022-06-22  0:29   ` brian m. carlson
  1 sibling, 1 reply; 21+ messages in thread
From: rsbecker @ 2022-06-21 13:18 UTC (permalink / raw)
  To: 'Ævar Arnfjörð Bjarmason',
	'Stephen Smith'
  Cc: 'git', 'Jeff King'

On June 21, 2022 6:25 AM, Ævar Arnfjörð Bjarmason wrote:
>On Mon, Jun 20 2022, Stephen Smith wrote:
>
>> What is the current status of the SHA-1 to SHA-256 transition?   Is the
>> transition far enough along that users should start changing over to
>> the new format?
>
>Just my 0.02, not the official project line or anything:
>
>I wouldn't recommend that anyone use it for anything serious at the moment, as
>far as I can tell the only users (if any) are currently
>(some) people work on git itself.
>
>The status of it is, I think it's fair to say, that it /should/ work 100% (or at least
>99.99%?) as far as git itself is concerned.
>
>I.e. you can "init" a SHA-256 repository, all our in-repo tooling etc. will work with it.
>We run full CI tests with a SHA-256 test suite, and it's passing.
>
>But the reason I'd still say "no" on the technical/UX side is:
>
> * The inter-op between SHA-256 and SHA-1 repositories is still
>   nonexistent, except for a one-off import. I.e. we don't have any
>   graceful way to migrate an existing repository.
>
> * For new repositories I think you'll probably want to eventually push
>   it to one of the online git hosting providers, none of which (as far
>   as I'm aware) support SHA-256 repos.
>
> * Even if not, any local git tooling that's not part of git.git is
>   likely to break, often for trivial reasons like expecting SHA-1 sized
>   hashes in the output, but if you start using it for your repositories
>   and use such tools you're very likely to be the first person to run
>   into bugs in those areas.
>
>But more importantly (and note that these views are definitely *not* shared by
>some other project members, so take it with a grain of salt):
>There just isn't any compelling selling point to migrate to SHA-256 in the near or
>foreseeable future for a given individual user of git.
>
>The reason we started the SHA-1 -> $newhash (it wasn't known that it would be
>SHA-256 at the time) was in response to https://shattered.io; Although it had
>been discussed before, e.g. the thread starting at [1] in 2012.
>
>We've since migrated our default hash function from SHA-1 to SHA-1DC (except
>on vanilla OSX, see [2]). It's a variant SHA-1 that detects the SHAttered attack
>implemented by the same researchers. I'm not aware of a current viable SHA-1
>collision against the variant of SHA-1 that we actually use these days.
>
>But even assuming for the sake of argument that we were using a much weaker
>and easier to break hash (say MD4 or MD5) most users still wouldn't have much or
>anything to worry about in practice.
>
>Discovering a hash collision is only the first step in attacking a Git repository. This
>aspect has been discussed many times on-list, but e.g. [3] is one such thread.
>
>The above is really *not* meant to poo-poo the whole notion of switching to a
>new hash. We're making good progress on it, although I think the really hard part
>UX-wise is left (online migration).
>
>Likewise I'd be really surprised if given the progress of that work the average Git
>user isn't going to be using not-SHA-1 with Git in 15-20 years, of it's even still
>around at that time as a relevant VCS.
>
>But should even advanced git users be spending time on migrating their data at
>this point?
>
>No, I don't think so given all of the above, and I really think we should carefully
>consider all of the trade-offs involved before recommending that the average
>user of git migrate over.
>
>1. https://lore.kernel.org/git/CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com/
>2. https://lore.kernel.org/git/cover-0.5-00000000000-20220422T094624Z-avarab@gmail.com/
>3. https://lore.kernel.org/git/CACBZZX65Kbp8N9X9UtBfJca7U1T0m-VtKZeKM5q9mhyCR7dwGg@mail.gmail.com/
>

Adding my own 0.02, what some of us are facing is resistance to adopting git in our or client organizations because of the presence of SHA-1. There are organizations where SHA-1 is blanket banned across the board, regardless of its use. It is sometimes possible to educate our way out of the situation, as above, by showing that SHA-1 is not really vulnerable except as described above (which arguably applies to any hash given enough computing power) and in in-flight communication scenarios and cryptographic use. But getting around this blanket ban is a serious amount of work, and I have very recently seen customers move to older, much less functional (or useful) VCS platforms just because of SHA-1.

I also think the comment about git in 15-20 years is a bit concerning if we are making decisions on that basis. Having written code in the mid 1980s that is still alive and relevant today, once processes are put in place, customers are very reluctant to move. I expect git to continue to be relevant for a long time, particularly if it is actively maintained by a motivated team.

IMO, the SHA-1 to SHA-256 (or other hash) migration should receive more attention, which I am willing to give, but I think it requires a deeper discussion. Arguably, if GitHub were to offer SHA-256 repos, I am 99% certain you will see much wider adoption.

--Randall



* Re: SHA-256 transition
  2022-06-21 13:18   ` rsbecker
@ 2022-06-21 18:14     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-21 18:14 UTC (permalink / raw)
  To: rsbecker; +Cc: 'Stephen Smith', 'git', Jeff King


On Tue, Jun 21 2022, rsbecker@nexbridge.com wrote:

> On June 21, 2022 6:25 AM, Ævar Arnfjörð Bjarmason wrote:
>>On Mon, Jun 20 2022, Stephen Smith wrote:
>>
>>> What is the current status of the SHA-1 to SHA-256 transition?   Is the
>>> transition far enough along that users should start changing over to
>>> the new format?
>>
>>Just my 0.02, not the official project line or anything:
>>
>>I wouldn't recommend that anyone use it for anything serious at the moment, as
>>far as I can tell the only users (if any) are currently
>>(some) people work on git itself.
>>
>>The status of it is, I think it's fair to say, that it /should/ work 100% (or at least
>>99.99%?) as far as git itself is concerned.
>>
>>I.e. you can "init" a SHA-256 repository, all our in-repo tooling etc. will work with it.
>>We run full CI tests with a SHA-256 test suite, and it's passing.
>>
>>But the reason I'd still say "no" on the technical/UX side is:
>>
>> * The inter-op between SHA-256 and SHA-1 repositories is still
>>   nonexistent, except for a one-off import. I.e. we don't have any
>>   graceful way to migrate an existing repository.
>>
>> * For new repositories I think you'll probably want to eventually push
>>   it to one of the online git hosting providers, none of which (as far
>>   as I'm aware) support SHA-256 repos.
>>
>> * Even if not, any local git tooling that's not part of git.git is
>>   likely to break, often for trivial reasons like expecting SHA-1 sized
>>   hashes in the output, but if you start using it for your repositories
>>   and use such tools you're very likely to be the first person to run
>>   into bugs in those areas.
>>
>>But more importantly (and note that these views are definitely *not* shared by
>>some other project members, so take it with a grain of salt):
>>There just isn't any compelling selling point to migrate to SHA-256 in the near or
>>foreseeable future for a given individual user of git.
>>
>>The reason we started the SHA-1 -> $newhash (it wasn't known that it would be
>>SHA-256 at the time) was in response to https://shattered.io; Although it had
>>been discussed before, e.g. the thread starting at [1] in 2012.
>>
>>We've since migrated our default hash function from SHA-1 to SHA-1DC (except
>>on vanilla OSX, see [2]). It's a variant SHA-1 that detects the SHAttered attack
>>implemented by the same researchers. I'm not aware of a current viable SHA-1
>>collision against the variant of SHA-1 that we actually use these days.
>>
>>But even assuming for the sake of argument that we were using a much weaker
>>and easier to break hash (say MD4 or MD5) most users still wouldn't have much or
>>anything to worry about in practice.
>>
>>Discovering a hash collision is only the first step in attacking a Git repository. This
>>aspect has been discussed many times on-list, but e.g. [3] is one such thread.
>>
>>The above is really *not* meant to poo-poo the whole notion of switching to a
>>new hash. We're making good progress on it, although I think the really hard part
>>UX-wise is left (online migration).
>>
>>Likewise I'd be really surprised if given the progress of that work the average Git
>>user isn't going to be using not-SHA-1 with Git in 15-20 years, of it's even still
>>around at that time as a relevant VCS.
>>
>>But should even advanced git users be spending time on migrating their data at
>>this point?
>>
>>No, I don't think so given all of the above, and I really think we should carefully
>>consider all of the trade-offs involved before recommending that the average
>>user of git migrate over.
>>
>>1. https://lore.kernel.org/git/CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com/
>>2. https://lore.kernel.org/git/cover-0.5-00000000000-20220422T094624Z-avarab@gmail.com/
>>3. https://lore.kernel.org/git/CACBZZX65Kbp8N9X9UtBfJca7U1T0m-VtKZeKM5q9mhyCR7dwGg@mail.gmail.com/
>>
>
> Adding my own 0.02, what some of us are facing is resistance to
> adopting git in our or client organizations because of the presence of
> SHA-1. There are organizations where SHA-1 is blanket banned across
> the board - regardless of its use. While it is sometimes possible to
> educate of out the situation, as above, and show that SHA-1 is not
> really vulnerable except as above, which arguably applies to any hash
> given enough computing power, and in in-flight communication scenarios
> and cryptographic use.  Getting around this blanket ban is a serious
> amount of work and I have very recently seen customers move to older
> much less functional (or useful) VCS platforms just because of SHA-1.

I'm not sure if we're talking past one another, or if you're just using
this thread to raise a tangential topic.

I understood the question to be closer to "is it ready for normal users,
and should we generally recommend it?". Not whether a fully functioning
and integrated into the wider ecosystem git SHA-256 would be useful to
anyone.

Clearly it would be useful to you, but for that question I'd think that
your experience here is one more datapoint in the "it's not really
ready" column.

I.e. if SHA-1 is a pain for you why not just use SHA-256? That's of
course rhetorical, you and I know why you and I are not using it, which
is what I was trying to get across here.

> I also think the comment about git in 15-20 years is a bit concerning
> if we are making decisions on that basis. Having written code in the
> mid 1980s that is still alive and relevant today, once processes are
> put in place, customers are very reluctant to move. I expect git to
> continue to be relevant for a long time, particularly if it is
> actively maintained by a motivated team.

I meant that I hope to be using git with SHA-256 in my daily workflow
around that time, at least. I'd probably have been more optimistic in
2017, but it's now been around 5 years since SHAttered and well, here we
are. So big migrations of infrastructure-level software take time.

But even if you read that as saying (which I didn't mean) that we
couldn't expect git to be around by then, that probably also wouldn't
be such a big deal.

Plenty of people were fully invested in Subversion around 2003 or so,
and what system were those people using 15 years later in 2018 ? :)

I hope git has more staying power than that, but if it doesn't then it's
probably for the best, as whatever new system will replace it will be
worthwhile enough to justify the migration pain.

> IMO, the SHA-1 to SHA-256 (or other hash) migration should receive
> more attention, which I am willing to give, but I think it requires a
> deeper discussion.

I think the overall state at this point is more "requires work/patches"
than "requires [deeper] discussion". I.e. I think having some
bidirectional mapping of SHA-1<->SHA-256 (as discussed in the hash
transition doc) was up next, and hashing out all the UX issues around
that.
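
To illustrate what "bidirectional mapping" means here, a minimal Python
sketch follows. It is only a toy in-memory model: the class name OidMap
and its methods are made up for this example, it is not the design from
the hash transition doc, and a real implementation would have to
persist the mapping on disk and keep it in sync with the object store.

```python
class OidMap:
    """Hypothetical two-way map between SHA-1 and SHA-256 object IDs."""

    def __init__(self):
        self._sha1_to_sha256 = {}
        self._sha256_to_sha1 = {}

    def add(self, sha1, sha256):
        """Record that the same object has these two names."""
        if len(sha1) != 40 or len(sha256) != 64:
            raise ValueError("expected a 40-digit SHA-1 and a 64-digit SHA-256 ID")
        # Refuse to silently overwrite an existing, different pairing.
        if (self._sha1_to_sha256.get(sha1, sha256) != sha256
                or self._sha256_to_sha1.get(sha256, sha1) != sha1):
            raise ValueError("conflicting mapping")
        self._sha1_to_sha256[sha1] = sha256
        self._sha256_to_sha1[sha256] = sha1

    def translate(self, oid):
        """Return the other algorithm's ID, picking the direction by length."""
        if len(oid) == 40:
            return self._sha1_to_sha256[oid]
        if len(oid) == 64:
            return self._sha256_to_sha1[oid]
        raise ValueError("not a full SHA-1 or SHA-256 object ID")
```

Even this toy version hints at the UX questions: what to do with
abbreviated IDs, and with objects that are known under only one name.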

I'm not sure what the state of patches (if any) is in that area.

> Arguably, if GitHub were to offer SHA-256 repos, I am 99% certain you
> will see much wider adoption.

I hope you're right, but I'm really not so certain myself.

Even if we and the wider ecosystem magically get 100% of the
technological aspect right I think there'll still be emergent pain from
any such transition that'll outweigh any gains for many existing repos.

E.g. if you'll need to store objects twice for existing clients and
maintain a mapping how is any hosting provider that charges you for
storage space for your repositories going to handle that?

And there'll inevitably be some time of confusion etc. as repositories
are migrated.

Anyone who's gone through e.g. a CVS->SVN->Git migration with a large
organization will know what I mean. A Git->Git migration will be less
painful, but probably never pain-free.

I think it says a lot that the people most concerned about this (and
this may just be my confirmation bias) seem the least familiar with how
any potential issues with SHA-1 might affect Git in particular.

Or, as in your case, people who are at the receiving end of "checklist
compliance" droids :)

Which makes me wonder (and I am partially serious) whether it would help
if we officially stated that we're simply not using SHA-1 anymore.

Which is the case both in the mathematical sense
(sha1collisiondetection won't return the same outputs for the same
inputs as "real" SHA-1), and in the sense that actually matters.

I.e. at least part of the urgency with SHA-1 migrations is because of
SHAttered specifically, but not entirely, as it's thought that SHA-1
variants might have other future vulnerabilities.

But that last bit is an area where I'm way less comfortable giving
anybody advice on, so take that with an even bigger grain of salt.


* Re: SHA-256 transition
  2022-06-21 10:25 ` Ævar Arnfjörð Bjarmason
  2022-06-21 13:18   ` rsbecker
@ 2022-06-22  0:29   ` brian m. carlson
  2022-06-23  0:45     ` Stephen Smith
                       ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: brian m. carlson @ 2022-06-22  0:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Stephen Smith, git, Jeff King

On 2022-06-21 at 10:25:01, Ævar Arnfjörð Bjarmason wrote:
> 
> But the reason I'd still say "no" on the technical/UX side is:
> 
>  * The inter-op between SHA-256 and SHA-1 repositories is still
>    nonexistent, except for a one-off import. I.e. we don't have any
>    graceful way to migrate an existing repository.

True, but that doesn't mean that new repositories couldn't use SHA-256.

>  * For new repositories I think you'll probably want to eventually push
>    it to one of the online git hosting providers, none of which (as far
>    as I'm aware) support SHA-256 repos.

This, in my view, is the only compelling reason not to use it for new
repositories.

>  * Even if not, any local git tooling that's not part of git.git is
>    likely to break, often for trivial reasons like expecting SHA-1 sized
>    hashes in the output, but if you start using it for your repositories
>    and use such tools you're very likely to be the first person to run
>    into bugs in those areas.

It's my hope to see libgit2 working on SHA-256 repositories in the
relatively near future.

> But more importantly (and note that these views are definitely *not*
> shared by some other project members, so take it with a grain of salt):
> There just isn't any compelling selling point to migrate to SHA-256 in
> the near or foreseeable future for a given individual user of git.

I wholly disagree.  SHA-1 is obsolete, and as soon as hosting providers
support SHA-256, all new repositories should be SHA-256.  There is no
other defensible reason to continue to use SHA-1 today.

> The reason we started the SHA-1 -> $newhash (it wasn't known that it
> would be SHA-256 at the time) was in response to https://shattered.io;
> Although it had been discussed before, e.g. the thread starting at [1]
> in 2012.
> 
> We've since migrated our default hash function from SHA-1 to SHA-1DC
> (except on vanilla OSX, see [2]). It's a variant SHA-1 that detects the
> SHAttered attack implemented by the same researchers. I'm not aware of a
> current viable SHA-1 collision against the variant of SHA-1 that we
> actually use these days.

That's true, but that still doesn't let you store the data.  There is
some data that you can't store in a SHA-1 repository, and SHA-1DC is
extremely slow.  Using SHA-256 can make things like indexing packs
substantially faster.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA



* Re: SHA-256 transition
  2022-06-22  0:29   ` brian m. carlson
@ 2022-06-23  0:45     ` Stephen Smith
  2022-06-23  1:44       ` brian m. carlson
  2022-06-23 22:21     ` Ævar Arnfjörð Bjarmason
  2022-06-24 10:52     ` Jeff King
  2 siblings, 1 reply; 21+ messages in thread
From: Stephen Smith @ 2022-06-23  0:45 UTC (permalink / raw)
  To: brian m. carlson, Ævar Arnfjörð Bjarmason,
	Stephen Smith, git, Jeff King

On Tuesday, June 21, 2022 5:29:59 PM MST brian m. carlson wrote:
> On 2022-06-21 at 10:25:01, Ævar Arnfjörð Bjarmason wrote:
> > But the reason I'd still say "no" on the technical/UX side is:
> >  * The inter-op between SHA-256 and SHA-1 repositories is still
> >  
> >    nonexistent, except for a one-off import. I.e. we don't have any
> >    graceful way to migrate an existing repository.
> 
> True, but that doesn't meant that new repositories couldn't use SHA-256.

So, any idea when a graceful way to migrate a repository might show up?

> 
> >  * For new repositories I think you'll probably want to eventually push
> >  
> >    it to one of the online git hosting providers, none of which (as far
> >    as I'm aware) support SHA-256 repos.
> 
> This, in my view, is the only compelling reason not to use it for new
> repositories.

Which is a reason to send patches by email. 






* Re: SHA-256 transition
  2022-06-23  0:45     ` Stephen Smith
@ 2022-06-23  1:44       ` brian m. carlson
  2022-06-23 15:32         ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: brian m. carlson @ 2022-06-23  1:44 UTC (permalink / raw)
  To: Stephen Smith; +Cc: Ævar Arnfjörð Bjarmason, git, Jeff King

On 2022-06-23 at 00:45:40, Stephen Smith wrote:
> On Tuesday, June 21, 2022 5:29:59 PM MST brian m. carlson wrote:
> > On 2022-06-21 at 10:25:01, Ævar Arnfjörð Bjarmason wrote:
> > > But the reason I'd still say "no" on the technical/UX side is:
> > >  * The inter-op between SHA-256 and SHA-1 repositories is still
> > >  
> > >    nonexistent, except for a one-off import. I.e. we don't have any
> > >    graceful way to migrate an existing repository.
> > 
> > True, but that doesn't meant that new repositories couldn't use SHA-256.
> 
> So, any idea when a graceful way to migrate a repository might show up?

I'm hoping that my employer will give me time to work on this in the
future.  Perhaps I'll have more to show on this closer to the last
quarter of the year.

At the moment I happen to be very busy in my personal life, so I'm not
finding a great deal of time to code much of anything.  But if that
changes, I'll try to get back to it.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA



* Re: SHA-256 transition
  2022-06-23  1:44       ` brian m. carlson
@ 2022-06-23 15:32         ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2022-06-23 15:32 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Stephen Smith, Ævar Arnfjörð Bjarmason, git, Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2022-06-23 at 00:45:40, Stephen Smith wrote:
>> On Tuesday, June 21, 2022 5:29:59 PM MST brian m. carlson wrote:
>> > On 2022-06-21 at 10:25:01, Ævar Arnfjörð Bjarmason wrote:
>> > > But the reason I'd still say "no" on the technical/UX side is:
>> > >  * The inter-op between SHA-256 and SHA-1 repositories is still
>> > >  
>> > >    nonexistent, except for a one-off import. I.e. we don't have any
>> > >    graceful way to migrate an existing repository.
>> > 
>> > True, but that doesn't meant that new repositories couldn't use SHA-256.
>> 
>> So, any idea when a graceful way to migrate a repository might show up?
>
> I'm hoping that my employer will give me time to work on this in the
> future.  Perhaps I'll have more to show on this closer to the last
> quarter of the year.
>
> At the moment I happen to be very busy in my personal life, so I'm not
> finding a great deal of time to code much of anything.  But if that
> changes, I'll try to get back to it.

Great ;-).  Thanks.


* Re: SHA-256 transition
  2022-06-22  0:29   ` brian m. carlson
  2022-06-23  0:45     ` Stephen Smith
@ 2022-06-23 22:21     ` Ævar Arnfjörð Bjarmason
  2022-06-24  0:29       ` Kyle Meyer
  2022-06-24  1:03       ` Stephen Smith
  2022-06-24 10:52     ` Jeff King
  2 siblings, 2 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-23 22:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Stephen Smith, git, Jeff King, Kyle Meyer


On Wed, Jun 22 2022, brian m. carlson wrote:

> On 2022-06-21 at 10:25:01, Ævar Arnfjörð Bjarmason wrote:
>> 
>> But the reason I'd still say "no" on the technical/UX side is:
>> 
>>  * The inter-op between SHA-256 and SHA-1 repositories is still
>>    nonexistent, except for a one-off import. I.e. we don't have any
>>    graceful way to migrate an existing repository.
>
> True, but that doesn't meant that new repositories couldn't use SHA-256.

Indeed, and people who know enough about its state can (and in some
cases probably should) use it.

I took the start of the thread to be a question about the state of the
SHA-1 -> SHA-256 transition, and what we should be generally
recommending to users at this point.

>>  * For new repositories I think you'll probably want to eventually push
>>    it to one of the online git hosting providers, none of which (as far
>>    as I'm aware) support SHA-256 repos.
>
> This, in my view, is the only compelling reason not to use it for new
> repositories.

I think it's certainly the main one, given that most people's workflows
around Git are heavily forge-based.

>>  * Even if not, any local git tooling that's not part of git.git is
>>    likely to break, often for trivial reasons like expecting SHA-1 sized
>>    hashes in the output, but if you start using it for your repositories
>>    and use such tools you're very likely to be the first person to run
>>    into bugs in those areas.
>
> It's my hope to see libgit2 working on SHA-256 repositories in the
> relatively near future.

I was referring to the very long tail of tooling here.

E.g. I use magit with Emacs, and last I checked it would puke on
SHA-256. But checking again it seems someone patched it in January of
this year to e.g. change "{40}" in regexes to "{40,}", so in theory it
should work now (but I didn't try actually using it in that mode).
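
As a rough illustration of that class of breakage (in Python rather
than magit's Emacs Lisp, and not magit's actual code), here is how a
pattern hardcoded to 40 hex digits rejects a SHA-256 object ID while
the relaxed "{40,}" form accepts it. The names below are invented for
the example, and the SHA-256 ID is a placeholder, not a real object.

```python
import re

# Assumes every object ID is exactly 40 hex digits, i.e. SHA-1 only.
STRICT_SHA1 = re.compile(r"[0-9a-f]{40}")
# Relaxed form: 40 or more hex digits, so 64-digit SHA-256 IDs also match.
HASH_AGNOSTIC = re.compile(r"[0-9a-f]{40,}")


def looks_like_oid(text: str, pattern: re.Pattern) -> bool:
    """Return True if the whole string matches the given object ID pattern."""
    return pattern.fullmatch(text) is not None


sha1_oid = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # the empty blob under SHA-1
sha256_oid = "0" * 64                                   # placeholder 64-digit ID

if __name__ == "__main__":
    print(looks_like_oid(sha1_oid, STRICT_SHA1))      # True
    print(looks_like_oid(sha256_oid, STRICT_SHA1))    # False
    print(looks_like_oid(sha256_oid, HASH_AGNOSTIC))  # True
```

Note that "{40,}" also accepts lengths that are neither hash; a stricter
tool could match exactly 40 or exactly 64 digits instead.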

We even still have UI code shipped as part of git.git itself that only
supports SHA-1, e.g. git-gui's "blame" feature. We were discussing some
patches for that late last year, but they didn't make it in:
https://lore.kernel.org/git/20211011121757.627-1-carenas@gmail.com/

Any individual tool like that isn't critical, but I'd think that there's
a large long tail of tooling that git users are likely to interact with,
which for the most part isn't ready.

I looked at "tig"'s source now, which I only very occasionally use, and
it still has SHA-1 sized constants hardcoded etc...

Of course that's a chicken & egg problem, and at some point we'll need
more brave early adopters. I'm only trying to relay the ground truth of
what the state is now, for someone who might not be aware of the
potential trouble they're getting themselves into.

>> But more importantly (and note that these views are definitely *not*
>> shared by some other project members, so take it with a grain of salt):
>> There just isn't any compelling selling point to migrate to SHA-256 in
>> the near or foreseeable future for a given individual user of git.
>
> I wholly disagree.  SHA-1 is obsolete, and as soon as hosting providers
> support SHA-256, all new repositories should be SHA-256.  There is no
> other defensible reason to continue to use SHA-1 today.

I really don't think we disagree on the need to move away from SHA-1 to
SHA-256. I'm only attempting to summarize the practical threat, and how
users might rightly weigh that against other concerns.

NIST deprecated SHA-1 in 2011. I think it's safe given Git's growth that
most people who've used Git started using it after that date, so clearly
there's a large disconnect between official hash algorithm
recommendations and how that translates to practical concerns.

>> The reason we started the SHA-1 -> $newhash (it wasn't known that it
>> would be SHA-256 at the time) was in response to https://shattered.io;
>> Although it had been discussed before, e.g. the thread starting at [1]
>> in 2012.
>> 
>> We've since migrated our default hash function from SHA-1 to SHA-1DC
>> (except on vanilla OSX, see [2]). It's a variant SHA-1 that detects the
>> SHAttered attack implemented by the same researchers. I'm not aware of a
>> current viable SHA-1 collision against the variant of SHA-1 that we
>> actually use these days.
>
> That's true, but that still doesn't let you store the data.  There is
> some data that you can't store in a SHA-1 repository, [...]

That's correct, and I don't think it has come up before, but has anyone
actually wanted to do that? I.e. people aren't generating these
collisions accidentally, they're crafted.

If we did want to store those we could change the hardcoded
-DSHA1DC_INIT_SAFE_HASH_DEFAULT=0 to "1", now it's set up to just die if
it finds a collision, but it could be made to return the "safe hash".

Of course doing so would mean going all-in on SHA1DC, i.e. such a
repository couldn't interop with our optional OpenSSL and other vanilla
SHA-1 backends.

> [...]and SHA-1DC is extremely slow.  Using SHA-256 can make things
> like indexing packs substantially faster.

Yeah, there are a lot of advantages. We could also safely use hardware
acceleration.

Really, I'm not meaning to poo-poo SHA-256 here, just to provide some
summary of the current state a user might expect.

I do think even this is mostly a fringe benefit in practice. I feel that
pain when I e.g. clone chromium.git, but once I pay that one-off cost
it's mostly not a bottleneck you notice on incremental push/fetch. You
pay for it on "repack", but that's in the background for most users.

It sure would make hosting providers happy though...

We have discussed having our cake & eating it too here in the
past. I.e. we could safely use say OpenSSL SHA-1 for "repack", as
long as we kept state and only did so for objects reachable from tips
that we'd already validated with SHA-1DC.
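
As a rough illustration of that idea (a sketch only, not how git does or
would implement it): record which tips have already been checked with
SHA-1DC, walk everything reachable from them, and allow the faster
backend only for objects in that set. The toy object graph, the backend
labels and the names reachable() and pick_backend() are all hypothetical.

```python
from collections import deque


def reachable(graph, tips):
    """Return every object id reachable from the given tips (breadth-first)."""
    seen = set(tips)
    queue = deque(tips)
    while queue:
        oid = queue.popleft()
        for child in graph.get(oid, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def pick_backend(oid, trusted):
    """Use the fast hash only for objects already covered by a validated tip."""
    return "fast-sha1" if oid in trusted else "sha1dc"


# Toy graph: commits point at parent commits and trees, trees point at blobs.
graph = {
    "c1": ["t1"],
    "c2": ["c1", "t2"],
    "c3": ["c2", "t3"],
    "t1": ["b1"],
    "t2": ["b1", "b2"],
    "t3": ["b3"],
}

# Assumption: "c2" is a tip that was previously validated with SHA-1DC.
trusted = reachable(graph, ["c2"])

for oid in ("b1", "b3", "c3"):
    print(oid, pick_backend(oid, trusted))
```

Anything introduced after the validated tip (here "c3", "t3" and "b3")
still goes through the collision-detecting hash.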

I think it's a datapoint that even those of us who've noticed the hash
slowdown have found it painful, but not so painful that we've
invested the effort in even relatively low-hanging-fruit workarounds for
the problem.

...

Finally, I'd really like to thank you for all your work on SHA-256 so
far, and really hope that none of what I've said here is discouraging in
any way. This thread has received some attention outside this ML (on
LWN), so I wanted to clarify some of the points above. Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: SHA-256 transition
  2022-06-23 22:21     ` Ævar Arnfjörð Bjarmason
@ 2022-06-24  0:29       ` Kyle Meyer
  2022-06-24  1:03       ` Stephen Smith
  1 sibling, 0 replies; 21+ messages in thread
From: Kyle Meyer @ 2022-06-24  0:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: brian m. carlson, Stephen Smith, git, Jeff King

Ævar Arnfjörð Bjarmason writes:

> E.g. I use magit with Emacs, and last I checked it would puke on
> SHA-256. But checking again it seems someone patched it in January of
> this year to e.g. change "{40}" in regexes to "{40,}", so in theory it
> should work now (but I didn't try actually using it in that mode).

Yeah, I gave it some testing as I made those adjustments [*], but "in
theory it should work" is about my level of confidence too.  If you're
experimenting with SHA-256 repos and find spots where Magit chokes,
opening issues on Magit's side would be very appreciated.

[*] https://github.com/magit/magit/pull/4585
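
For readers wondering why the "{40}" to "{40,}" change matters, here is a
small sketch. The patterns are written for illustration and are not
copied from Magit; the 64-digit value is a made-up placeholder, not a
real object name.

```python
import re

# Hypothetical patterns: one hardcodes the SHA-1 length, one allows longer ids.
SHA1_ONLY = re.compile(r"[0-9a-f]{40}")
SHA1_OR_LONGER = re.compile(r"[0-9a-f]{40,}")

sha1_oid = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # 40 hex digits
sha256_oid = "ab" * 32                                  # 64 hex digits (placeholder)

for name, oid in (("sha1", sha1_oid), ("sha256", sha256_oid)):
    strict = SHA1_ONLY.fullmatch(oid) is not None
    relaxed = SHA1_OR_LONGER.fullmatch(oid) is not None
    print(f"{name}: strict={strict} relaxed={relaxed}")
```

The strict pattern rejects the 64-digit name outright, which is the kind
of failure a tool with SHA-1 sized constants would show.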

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: SHA-256 transition
  2022-06-23 22:21     ` Ævar Arnfjörð Bjarmason
  2022-06-24  0:29       ` Kyle Meyer
@ 2022-06-24  1:03       ` Stephen Smith
  2022-06-24  1:19         ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 21+ messages in thread
From: Stephen Smith @ 2022-06-24  1:03 UTC (permalink / raw)
  To: brian m. carlson, Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Kyle Meyer

On Thursday, June 23, 2022 3:21:05 PM MST Ævar Arnfjörð Bjarmason wrote:
> Finally, I'd really like to thank you for all your work on SHA-256 so
> far, and really hope that none of what I've said here is discouraging in
> any way. This thread has received some attention outside this ML (on
> LWN), so I wanted to clarify some of the points above. Thanks!

I had looked on LWN before I started the thread to see if anything was being 
discussed and it wasn't.

I tend to be an early adopter.   I hadn't seen any new commits in the main git 
repository in a while and was beginning to wonder if it had been abandoned.   
This thread has convinced me that isn't the case, but rather that the main person 
doing the development is busy.

I too want to say thank you (Brian) for your hard work.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: SHA-256 transition
  2022-06-24  1:03       ` Stephen Smith
@ 2022-06-24  1:19         ` Ævar Arnfjörð Bjarmason
  2022-06-24 14:42           ` Jonathan Corbet
  0 siblings, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-24  1:19 UTC (permalink / raw)
  To: Stephen Smith; +Cc: brian m. carlson, git, Jeff King, Kyle Meyer


On Thu, Jun 23 2022, Stephen Smith wrote:

> On Thursday, June 23, 2022 3:21:05 PM MST Ævar Arnfjörð Bjarmason wrote:
>> Finally, I'd really like to thank you for all your work on SHA-256 so
>> far, and really hope that none of what I've said here is discouraging in
>> any way. This thread has received some attention outside this ML (on
>> LWN), so I wanted to clarify some of the points above. Thanks!
>
> I had looked on LWN before I started the thread to see if anything was being 
> discussed and it wasn't.

It wouldn't have helped, as I'm referring to LWN having written an
article about this thread that you started :)

It's part of an ongoing series they've had about Git's SHA-256
transition.

Given how LWN makes money I don't know if it's OK to link to it, but
it's easy enough to find and/or subscribe to LWN.

> I tend to be an early adopter.   I hadn't seen any new commits in the main git 
> repository in a while and was beginning to wonder if it had been abandoned.   
> This thread has convinced me that isn't the case, but rather that the main person 
> doing the development is busy.

It was a good discussion, and I'm happy you started it.

I think I've mentioned in some past discussions that it would be nice to
have some "gitsecurity" user-facing documentation, and one thing such a
thing could include is information that helped users to make an informed
decision about how much (if at all) they should be worrying about issues
arising from what hash they're using Git with.

But some documentation on the questions raised here would also be good,
i.e. "should I use the new hash?", which we could keep somewhat
up-to-date, and e.g. talk about the approximate state of major
third-party software, such as the forges.

Currently the closest thing we have to that is the rather sparse and
scary "THIS OPTION IS EXPERIMENTAL" in git-init(1) when talking about
--object-format=sha256.

> I too want to say thank you (Brian) for your hard work.   

And thank you for using & being interested in git, and contributing to
the ML!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: SHA-256 transition
  2022-06-22  0:29   ` brian m. carlson
  2022-06-23  0:45     ` Stephen Smith
  2022-06-23 22:21     ` Ævar Arnfjörð Bjarmason
@ 2022-06-24 10:52     ` Jeff King
  2022-06-24 15:49       ` Ævar Arnfjörð Bjarmason
  2022-06-25  8:53       ` brian m. carlson
  2 siblings, 2 replies; 21+ messages in thread
From: Jeff King @ 2022-06-24 10:52 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Ævar Arnfjörð Bjarmason, Stephen Smith, git

On Wed, Jun 22, 2022 at 12:29:59AM +0000, brian m. carlson wrote:

> > We've since migrated our default hash function from SHA-1 to SHA-1DC
> > (except on vanilla OSX, see [2]). It's a variant SHA-1 that detects the
> > SHAttered attack implemented by the same researchers. I'm not aware of a
> > current viable SHA-1 collision against the variant of SHA-1 that we
> > actually use these days.
> 
> That's true, but that still doesn't let you store the data.  There is
> some data that you can't store in a SHA-1 repository, and SHA-1DC is
> extremely slow.  Using SHA-256 can make things like indexing packs
> substantially faster.

I'm curious if you have numbers on this. I naively converted linux.git
to sha256 by doing "fast-export | fast-import" (the latter in a sha256
repo, of course, and then both repacked with "-f --window=250" to get
reasonable apples-to-apples packs).

Running "index-pack --verify" on the result takes about the same time
(this is on an 8-core system, hence the real/user differences):

  [sha1dc]
  real	2m43.754s
  user	10m52.452s
  sys	0m36.745s

  [sha256]
  real	2m41.884s
  user	12m23.344s
  sys	0m35.222s

The sha256 repo actually has about 10% fewer objects (I didn't
investigate, but this is perhaps due to cutting out tags and a few other
things to convince fast-export to finish running). I'm not sure about
the extra user time (multicore timings here are funny because of
frequency scaling, so I think the "real" line is more interesting). So
sha256 actually comes out a bit worse here. On the other hand, this is
just using our blk_SHA256 implementation. There may be faster
alternatives (including ones with hardware support).
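
To put the two runs side by side, the sketch below converts the bash
"time" figures quoted above into seconds and prints sha256 relative to
sha1dc. The helper name parse_bash_time() is made up for this example;
the numbers are the ones from the listing.

```python
import re


def parse_bash_time(text):
    """Convert a bash `time` value such as '10m52.452s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


runs = {
    "sha1dc": {"real": "2m43.754s", "user": "10m52.452s", "sys": "0m36.745s"},
    "sha256": {"real": "2m41.884s", "user": "12m23.344s", "sys": "0m35.222s"},
}

seconds = {
    name: {field: parse_bash_time(value) for field, value in timings.items()}
    for name, timings in runs.items()
}

# Ratios above 1.0 mean sha256 took longer than sha1dc for that field.
real_ratio = seconds["sha256"]["real"] / seconds["sha1dc"]["real"]
user_ratio = seconds["sha256"]["user"] / seconds["sha1dc"]["user"]
print(f"real: {real_ratio:.3f}x  user: {user_ratio:.3f}x")
```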

I wouldn't be at all surprised if the difference isn't substantial in
the long run, though. The repo is on the order of 100GB of object data.
That's a lot to hash, but it's also just a lot to deal with at all (zlib
inflating, applying deltas, etc).

Anyway, this is a pretty rough cut at an experiment. I was mostly
curious if you had done something more advanced, and/or gotten different
results.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: SHA-256 transition
  2022-06-24  1:19         ` Ævar Arnfjörð Bjarmason
@ 2022-06-24 14:42           ` Jonathan Corbet
  0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Corbet @ 2022-06-24 14:42 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Stephen Smith
  Cc: brian m. carlson, git, Jeff King, Kyle Meyer

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> It wouldn't have helped, as I'm referring to LWN having written an
> article about this thread that you started :)
>
> It's part of an ongoing series they've had about Git's SHA-256
> transition.
>
> Given how LWN makes money I don't know if it's OK to link to it, but
> it's easy enough to find and/or subscribe to LWN.

Heh...it's not like it hasn't been widely distributed thus far...:)

  https://lwn.net/SubscriberLink/898522/68ddb300e7eba05d/

Thanks,

jon

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: SHA-256 transition
  2022-06-24 10:52     ` Jeff King
@ 2022-06-24 15:49       ` Ævar Arnfjörð Bjarmason
  2022-06-25  8:53       ` brian m. carlson
  1 sibling, 0 replies; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-24 15:49 UTC (permalink / raw)
  To: Jeff King; +Cc: brian m. carlson, Stephen Smith, git


On Fri, Jun 24 2022, Jeff King wrote:

> On Wed, Jun 22, 2022 at 12:29:59AM +0000, brian m. carlson wrote:
>
>> > We've since migrated our default hash function from SHA-1 to SHA-1DC
>> > (except on vanilla OSX, see [2]). It's a variant SHA-1 that detects the
>> > SHAttered attack implemented by the same researchers. I'm not aware of a
>> > current viable SHA-1 collision against the variant of SHA-1 that we
>> > actually use these days.
>> 
>> That's true, but that still doesn't let you store the data.  There is
>> some data that you can't store in a SHA-1 repository, and SHA-1DC is
>> extremely slow.  Using SHA-256 can make things like indexing packs
>> substantially faster.
>
> I'm curious if you have numbers on this. I naively converted linux.git
> to sha256 by doing "fast-export | fast-import" (the latter in a sha256
> repo, of course, and then both repacked with "-f --window=250" to get
> reasonable apples-to-apples packs).
>
> Running "index-pack --verify" on the result takes about the same time
> (this is on an 8-core system, hence the real/user differences):
>
>   [sha1dc]
>   real	2m43.754s
>   user	10m52.452s
>   sys	0m36.745s
>
>   [sha256]
>   real	2m41.884s
>   user	12m23.344s
>   sys	0m35.222s
>
> The sha256 repo actually has about 10% fewer objects (I didn't
> investigate, but this is perhaps due to cutting out tags and a few other
> things to convince fast-export to finish running). I'm not sure about
> the extra user time (multicore timings here are funny because of
> frequency scaling, so I think the "real" line is more interesting). So
> sha256 actually comes out a bit worse here. On the other hand, this is
> just using our blk_SHA256 implementation. There may be faster
> alternatives (including ones with hardware support).
>
> I wouldn't be at all surprised if the difference isn't substantial in
> the long run, though. The repo is on the order of 100GB of object data.
> That's a lot to hash, but it's also just a lot to deal with at all (zlib
> inflating, applying deltas, etc).
>
> Anyway, this is a pretty rough cut at an experiment. I was mostly
> curious if you had done something more advanced, and/or gotten different
> results.

I haven't checked or verified this, but
https://www.marc-stevens.nl/research/#software claims:

    Counter-cryptanalysis: New improved release SHA-1 collision
    detection library, which protects against twice as many SHA-1 attack
    classes (disturbance vectors), but is 9 times faster than previous
    version. Speed is now 1.87 times normal SHA-1. It is currently used
    among others by Git, GitHub, GMail, Google Drive and Microsoft
    OneDrive.

And looking at the OID you initially imported for sha1dc (and my later
submodule import) we've always had what seems to have been that
performance improvement, which I think (but I didn't have time to
benchmark) is:
https://github.com/cr-marcstevens/sha1collisiondetection/pull/20

*But* there was also this later performance work:
https://github.com/cr-marcstevens/sha1collisiondetection/pull/30; see
also this comment:
https://github.com/cr-marcstevens/sha1collisiondetection/commit/33a694a9ee1b79c24be45f9eab5ac0e1aeeaf271

And then if you look at the sha1collisiondetection repo the latest tag
is stable-v1.0.3, which pre-dates that (but not the original perf work),
and was tagged in 2017. There were a lot of commits since then.

I wasn't able to find any third party package using DC_SHA1_EXTERNAL,
but I wonder if any performance tests with sha1dc in the wild are using
some older version, which from the looks of it might have had a
performance regression on x86...



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: SHA-256 transition
  2022-06-24 10:52     ` Jeff King
  2022-06-24 15:49       ` Ævar Arnfjörð Bjarmason
@ 2022-06-25  8:53       ` brian m. carlson
  2022-06-26  0:09         ` Plan for SHA-256 repos to support SHA-1? Eric W. Biederman
  2022-07-01 18:00         ` SHA-256 transition Jeff King
  1 sibling, 2 replies; 21+ messages in thread
From: brian m. carlson @ 2022-06-25  8:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, Stephen Smith, git


On 2022-06-24 at 10:52:36, Jeff King wrote:
> On Wed, Jun 22, 2022 at 12:29:59AM +0000, brian m. carlson wrote:
> 
> > > We've since migrated our default hash function from SHA-1 to SHA-1DC
> > > (except on vanilla OSX, see [2]). It's a variant SHA-1 that detects the
> > > SHAttered attack implemented by the same researchers. I'm not aware of a
> > > current viable SHA-1 collision against the variant of SHA-1 that we
> > > actually use these days.
> > 
> > That's true, but that still doesn't let you store the data.  There is
> > some data that you can't store in a SHA-1 repository, and SHA-1DC is
> > extremely slow.  Using SHA-256 can make things like indexing packs
> > substantially faster.
> 
> I'm curious if you have numbers on this. I naively converted linux.git
> to sha256 by doing "fast-export | fast-import" (the latter in a sha256
> repo, of course, and then both repacked with "-f --window=250" to get
> reasonable apples-to-apples packs).

I did the same thing, except I just did a regular gc and not a custom
repack, and I created both a SHA-1 and SHA-256 repo from the same
original.

> Running "index-pack --verify" on the result takes about the same time
> (this is on an 8-core system, hence the real/user differences):
> 
>   [sha1dc]
>   real	2m43.754s
>   user	10m52.452s
>   sys	0m36.745s
> 
>   [sha256]
>   real	2m41.884s
>   user	12m23.344s
>   sys	0m35.222s

Here are my results:

[sha256]
time ~/checkouts/git/git index-pack --verify .git/objects/pack/pack-*.pack
~/checkouts/git/git index-pack --verify .git/objects/pack/pack-*.pack  2768.42s user 181.00s system 185% cpu 26:31.70 total

[sha1dc]
time ~/checkouts/git/git index-pack --verify .git/objects/pack/pack-*.pack
~/checkouts/git/git index-pack --verify .git/objects/pack/pack-*.pack  3041.28s user 184.84s system 199% cpu 26:54.74 total

Note that in my case, I'm using an accelerated hardware-based SHA-256
implementation (Nettle, which I will send a patch for soon).  This is a
brand new ThinkPad X1 Carbon Gen 10 with an i7-1280P (with 20 "cores" of
different sizes).

So this is about 9% faster in terms of total CPU usage on SHA-256 with
that implementation.  The wallclock time is less impressive here.
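
The arithmetic behind that figure can be reproduced as follows. This is
only a sketch using the user, system and total times from the two
outputs above; the helper names are invented for the example.

```python
# Figures copied from the two `time` outputs above, converted to seconds.
sha256 = {"user": 2768.42, "system": 181.00, "wall": 26 * 60 + 31.70}
sha1dc = {"user": 3041.28, "system": 184.84, "wall": 26 * 60 + 54.74}


def cpu_seconds(run):
    """Total CPU time is user time plus system time."""
    return run["user"] + run["system"]


def percent_saved(new, old):
    """How much smaller `new` is than `old`, as a percentage of `old`."""
    return 100.0 * (old - new) / old


cpu_saving = percent_saved(cpu_seconds(sha256), cpu_seconds(sha1dc))
wall_saving = percent_saved(sha256["wall"], sha1dc["wall"])

print(f"CPU time saved with SHA-256:   {cpu_saving:.1f}%")
print(f"wall-clock saved with SHA-256: {wall_saving:.1f}%")
```

The CPU saving lands a little under 9 percent, while the wall-clock
saving is between 1 and 2 percent.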

Of course, it might be slower in software, but considering that AMD has
had SHA-NI for some time, newer Intel processors have it, and ARM also
has SHA-2 acceleration instructions, it's likely it will be faster on
most recent machines assuming it's compiled appropriately.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Plan for SHA-256 repos to support SHA-1?
  2022-06-25  8:53       ` brian m. carlson
@ 2022-06-26  0:09         ` Eric W. Biederman
  2022-06-26  0:27           ` Junio C Hamano
  2022-07-01 18:00         ` SHA-256 transition Jeff King
  1 sibling, 1 reply; 21+ messages in thread
From: Eric W. Biederman @ 2022-06-26  0:09 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Jeff King, Ævar Arnfjörð Bjarmason, Stephen Smith, git


Is there at this point a solid plan for how SHA-256 repos will support
access by SHA-1-only clients?

I remember reading a discussion of having a table somewhere that would
translate SHA-256 to SHA-1 when needed.

I had a brainstorm which is probably the uninformed opinion of an
outsider.

I was thinking that in server settings a well-packed pack of all of the
objects is kept to make it quick for git clone to do its work.

I was thinking perhaps in a repo that wanted to support access from
SHA-1 clients it might make sense to have three packs instead of the
standard one.

A pack of all of the blobs with no oid references.  So that either
a SHA-256 or a SHA-1 client could consume it (modulo header changes that
are needed).

The pack of blobs could have both an ordinary SHA-256 index and a SHA-1
index.

Then there could be two packs of metadata (aka trees and commits and
tags that embed oids).  One pack in SHA-256 and one pack in SHA-1.

Then with a little header surgery git clone could be served with
sendfile by gluing the pack of blobs and the pack of metadata objects
together.


In the normal end user client case that doesn't seem to make a lot of
sense as all that is needed is to figure out which oid to use and always
display SHA-256.

My naivete suggests that just keeping the SHA-1 metadata in a SHA-256
repo could be simple enough to implement that it would allow the
transition to start happening, and it could be optimized away later.
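
To make the "same blob, two indexes" idea concrete: a blob's object name
is the hash of a short header plus its content, so the same bytes can be
named under either algorithm. The sketch below builds a small
SHA-256 to SHA-1 lookup table in memory. It is not the on-disk format
described in git's transition document, and blob_oid() and
build_translation() are names invented for this example.

```python
import hashlib


def blob_oid(data: bytes, algo: str) -> str:
    """Name a blob the way git does: hash 'blob <size>\\0' followed by the content."""
    header = b"blob %d\0" % len(data)
    return hashlib.new(algo, header + data).hexdigest()


def build_translation(blobs):
    """Map each blob's SHA-256 name to its SHA-1 name."""
    return {blob_oid(data, "sha256"): blob_oid(data, "sha1") for data in blobs}


table = build_translation([b"", b"hello\n"])

for sha256_name, sha1_name in table.items():
    print(sha256_name, "->", sha1_name)
```

Trees, commits and tags are harder because they embed other object
names, which is why the proposal above keeps a separate metadata pack
per hash.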

Eric

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Plan for SHA-256 repos to support SHA-1?
  2022-06-26  0:09         ` Plan for SHA-256 repos to support SHA-1? Eric W. Biederman
@ 2022-06-26  0:27           ` Junio C Hamano
  2022-06-26 15:19             ` brian m. carlson
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2022-06-26  0:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: brian m. carlson, Jeff King,
	Ævar Arnfjörð Bjarmason, Stephen Smith, git

On Sat, Jun 25, 2022 at 5:10 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Is there at this point a solid plan for how SHA-256 repos will support
> access by SHA-1-only clients?
>
> I remember reading a discussion of having a table somewhere that would
> translate SHA-256 to SHA-1 when needed.

Documentation/technical/hash-function-transition.txt has fleshed out
the necessary details?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Plan for SHA-256 repos to support SHA-1?
  2022-06-26  0:27           ` Junio C Hamano
@ 2022-06-26 15:19             ` brian m. carlson
  0 siblings, 0 replies; 21+ messages in thread
From: brian m. carlson @ 2022-06-26 15:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Eric W. Biederman, Jeff King,
	Ævar Arnfjörð Bjarmason, Stephen Smith, git


On 2022-06-26 at 00:27:57, Junio C Hamano wrote:
> On Sat, Jun 25, 2022 at 5:10 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Is there at this point a solid plan for how SHA-256 repos will support
> > access by SHA-1-only clients?
> >
> > I remember reading a discussion of having a table somewhere that would
> > translate SHA-256 to SHA-1 when needed.
> 
> Documentation/technical/hash-function-transition.txt has fleshed out
> the necessary details?

Yup.  The design there sounds very simple and it is, conceptually, but
practically implementing it is quite complex.

You can pull the in-progress work from transition-interop on my GitHub
remote to see where some of the complexity lies.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: SHA-256 transition
  2022-06-25  8:53       ` brian m. carlson
  2022-06-26  0:09         ` Plan for SHA-256 repos to support SHA-1? Eric W. Biederman
@ 2022-07-01 18:00         ` Jeff King
  1 sibling, 0 replies; 21+ messages in thread
From: Jeff King @ 2022-07-01 18:00 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Ævar Arnfjörð Bjarmason, Stephen Smith, git

On Sat, Jun 25, 2022 at 08:53:53AM +0000, brian m. carlson wrote:

> > I'm curious if you have numbers on this. I naively converted linux.git
> > to sha256 by doing "fast-export | fast-import" (the latter in a sha256
> > repo, of course, and then both repacked with "-f --window=250" to get
> > reasonable apples-to-apples packs).
> 
> I did the same thing, except I just did a regular gc and not a custom
> repack, and I created both a SHA-1 and SHA-256 repo from the same
> original.

That _might_ influence your timings a bit, just because the fast-import
packs have lousy deltas. I think my linux.git was something like 6GB
from fast-import, packed down to 1.5GB after "repack -f".

But I'm not sure if it would change the direction of the trend of what
you were measuring, only the magnitude. We'll hash the same bytes in
either case, but in the fast-import pack we'd spend more time on zlib
inflating and less time on delta reconstruction. Which one is more
expensive probably depends on a lot of factors, but it's entirely
possible that running your test after a "repack -f" would actually show
a greater change between the two cases.

> Here are my results:
> 
> [sha256]
> time ~/checkouts/git/git index-pack --verify .git/objects/pack/pack-*.pack
> ~/checkouts/git/git index-pack --verify .git/objects/pack/pack-*.pack  2768.42s user 181.00s system 185% cpu 26:31.70 total
> 
> [sha1dc]
> time ~/checkouts/git/git index-pack --verify .git/objects/pack/pack-*.pack
> ~/checkouts/git/git index-pack --verify .git/objects/pack/pack-*.pack  3041.28s user 184.84s system 199% cpu 26:54.74 total
> 
> Note that in my case, I'm using an accelerated hardware-based SHA-256
> implementation (Nettle, which I will send a patch for soon).  This is a
> brand new ThinkPad X1 Carbon Gen 10 with an i7-1280P (with 20 "cores" of
> different sizes).

OK, that probably explains the difference in results we saw. Thanks for
sharing your numbers.  I think that's pretty "apples to apples" since
we'd hope that sha256 will eventually be accelerated, but sha1dc never
will be.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-07-01 18:00 UTC | newest]

Thread overview: 21+ messages
2022-06-20 22:51 SHA-256 transition Stephen Smith
2022-06-20 23:13 ` rsbecker
2022-06-21 10:25 ` Ævar Arnfjörð Bjarmason
2022-06-21 13:18   ` rsbecker
2022-06-21 18:14     ` Ævar Arnfjörð Bjarmason
2022-06-22  0:29   ` brian m. carlson
2022-06-23  0:45     ` Stephen Smith
2022-06-23  1:44       ` brian m. carlson
2022-06-23 15:32         ` Junio C Hamano
2022-06-23 22:21     ` Ævar Arnfjörð Bjarmason
2022-06-24  0:29       ` Kyle Meyer
2022-06-24  1:03       ` Stephen Smith
2022-06-24  1:19         ` Ævar Arnfjörð Bjarmason
2022-06-24 14:42           ` Jonathan Corbet
2022-06-24 10:52     ` Jeff King
2022-06-24 15:49       ` Ævar Arnfjörð Bjarmason
2022-06-25  8:53       ` brian m. carlson
2022-06-26  0:09         ` Plan for SHA-256 repos to support SHA-1? Eric W. Biederman
2022-06-26  0:27           ` Junio C Hamano
2022-06-26 15:19             ` brian m. carlson
2022-07-01 18:00         ` SHA-256 transition Jeff King
