* [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
@ 2020-08-14 12:21 ` Martin Ågren
2020-08-14 17:28 ` Junio C Hamano
2020-08-14 12:21 ` [PATCH 2/5] index-format.txt: document SHA-256 index format Martin Ågren
` (5 subsequent siblings)
6 siblings, 1 reply; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano
Document that in SHA-1 repositories, we use SHA-1 for "want"s and
"have"s, and in SHA-256 repositories, we use SHA-256.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Documentation/technical/http-protocol.txt | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 51a79e63de..507f28f9b3 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -401,8 +401,9 @@ at all in the request stream:
The stream is terminated by a pkt-line flush (`0000`).
A single "want" or "have" command MUST have one hex formatted
-SHA-1 as its value. Multiple SHA-1s MUST be sent by sending
-multiple commands.
+object name as its value. Multiple object names MUST be sent by sending
+multiple commands. (An object name is a SHA-1 hash in a SHA-1 repo
+and a SHA-256 hash in a SHA-256 repo.)
The `have` list is created by popping the first 32 commits
from `c_pending`. Less can be supplied if `c_pending` empties.
--
2.28.0.277.g9b3c35fffd
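A minimal sketch of the "want" framing this patch documents, for readers unfamiliar with pkt-lines. It assumes the usual pkt-line convention (a 4-digit hex length that counts itself) and lowercase hex object names of 40 characters for SHA-1 or 64 for SHA-256. The helper names `pkt_line` and `want_line` are illustrative only and are not Git functions.

```python
FLUSH_PKT = b"0000"  # pkt-line flush that terminates the stream

HEX_DIGITS = set("0123456789abcdef")


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its pkt-line length (the 4 length bytes included)."""
    length = len(payload) + 4
    if length > 0xFFFF:
        raise ValueError("payload too long for a single pkt-line")
    return b"%04x" % length + payload


def want_line(object_name: str) -> bytes:
    """Build one "want" command carrying exactly one hex object name."""
    # Assumption: 40 hex digits for a SHA-1 repo, 64 for a SHA-256 repo.
    if len(object_name) not in (40, 64) or not set(object_name) <= HEX_DIGITS:
        raise ValueError("expected a 40- or 64-digit lowercase hex object name")
    return pkt_line(b"want " + object_name.encode("ascii") + b"\n")
```

Several object names are sent as several `want_line()` results, followed by `FLUSH_PKT`.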
* Re: [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
2020-08-14 12:21 ` [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
@ 2020-08-14 17:28 ` Junio C Hamano
2020-08-14 20:23 ` brian m. carlson
0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 17:28 UTC (permalink / raw)
To: Martin Ågren; +Cc: brian m. carlson, git, Derrick Stolee
Martin Ågren <martin.agren@gmail.com> writes:
> Document that in SHA-1 repositories, we use SHA-1 for "want"s and
> "have"s, and in SHA-256 repositories, we use SHA-256.
Ehh, doesn't this directly contradict the transition plan of "on the
wire everything will use SHA-1 version for now?"
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> Documentation/technical/http-protocol.txt | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
> index 51a79e63de..507f28f9b3 100644
> --- a/Documentation/technical/http-protocol.txt
> +++ b/Documentation/technical/http-protocol.txt
> @@ -401,8 +401,9 @@ at all in the request stream:
> The stream is terminated by a pkt-line flush (`0000`).
>
> A single "want" or "have" command MUST have one hex formatted
> -SHA-1 as its value. Multiple SHA-1s MUST be sent by sending
> -multiple commands.
> +object name as its value. Multiple object names MUST be sent by sending
> +multiple commands. (An object name is a SHA-1 hash in a SHA-1 repo
> +and a SHA-256 hash in a SHA-256 repo.)
>
> The `have` list is created by popping the first 32 commits
> from `c_pending`. Less can be supplied if `c_pending` empties.
* Re: [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
2020-08-14 17:28 ` Junio C Hamano
@ 2020-08-14 20:23 ` brian m. carlson
2020-08-14 20:32 ` Martin Ågren
2020-08-14 20:39 ` Junio C Hamano
0 siblings, 2 replies; 35+ messages in thread
From: brian m. carlson @ 2020-08-14 20:23 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Martin Ågren, git, Derrick Stolee
On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
> Martin Ågren <martin.agren@gmail.com> writes:
>
> > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
> > "have"s, and in SHA-256 repositories, we use SHA-256.
>
> Ehh, doesn't this directly contradict the transition plan of "on the
> wire everything will use SHA-1 version for now?"
SHA-256 repositories interoperate currently using SHA-256 object IDs.
It was originally intended that we wouldn't update the protocol, but
that leads to much of the testsuite failing since it's impossible to
move objects from one place to another.
If we wanted to be more pedantically correct and optimize for the
future, we could say that the values use the format negotiated by the
"object-format" protocol extension and SHA-1 otherwise.
--
brian m. carlson: Houston, Texas, US
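The rule suggested in this message ("the format negotiated by the object-format extension, and SHA-1 otherwise") could be sketched as follows. This is an illustration under assumptions: `capabilities` stands for the list of capability strings both ends have already agreed on, and the function names are hypothetical, not part of Git.

```python
HEX_LEN = {"sha1": 40, "sha256": 64}  # hex digits per object name


def wire_object_format(capabilities):
    """Return the negotiated object format, defaulting to SHA-1."""
    for cap in capabilities:
        if cap.startswith("object-format="):
            return cap.split("=", 1)[1]
    return "sha1"


def is_valid_object_name(value, capabilities):
    """Check that value has the right shape for the negotiated format."""
    algo = wire_object_format(capabilities)
    if algo not in HEX_LEN:
        raise ValueError("unknown object format: " + algo)
    return len(value) == HEX_LEN[algo] and all(
        c in "0123456789abcdef" for c in value
    )
```

With this rule a 64-digit name is only acceptable once `object-format=sha256` has been negotiated; without the capability only 40-digit names pass.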
* Re: [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
2020-08-14 20:23 ` brian m. carlson
@ 2020-08-14 20:32 ` Martin Ågren
2020-08-14 20:39 ` Junio C Hamano
1 sibling, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 20:32 UTC (permalink / raw)
To: brian m. carlson, Junio C Hamano, Martin Ågren,
Git Mailing List, Derrick Stolee
On Fri, 14 Aug 2020 at 22:23, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
> > Martin Ågren <martin.agren@gmail.com> writes:
> >
> > > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
> > > "have"s, and in SHA-256 repositories, we use SHA-256.
> >
> > Ehh, doesn't this directly contradict the transition plan of "on the
> > wire everything will use SHA-1 version for now?"
Yes, the transition plan would probably need updating there. I'm just
trying to document what we have.
> SHA-256 repositories interoperate currently using SHA-256 object IDs.
> It was originally intended that we wouldn't update the protocol, but
> that leads to much of the testsuite failing since it's impossible to
> move objects from one place to another.
>
> If we wanted to be more pedantically correct and optimize for the
> future, we could say that the values use the format negotiated by the
> "object-format" protocol extension and SHA-1 otherwise.
Hmm, I didn't think of that. Would we ever regret having painted
such a "big" picture and wish to refine it somehow, compared to
being admittedly fairly narrow, as I am here, and loosening things later?
I'll think about it, but I think I could go either way.
Martin
* Re: [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
2020-08-14 20:23 ` brian m. carlson
2020-08-14 20:32 ` Martin Ågren
@ 2020-08-14 20:39 ` Junio C Hamano
2020-08-14 20:47 ` Junio C Hamano
1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 20:39 UTC (permalink / raw)
To: brian m. carlson; +Cc: Martin Ågren, git, Derrick Stolee
"brian m. carlson" <sandals@crustytoothpaste.net> writes:
> On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
>> Martin Ågren <martin.agren@gmail.com> writes:
>>
>> > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
>> > "have"s, and in SHA-256 repositories, we use SHA-256.
>>
>> Ehh, doesn't this directly contradict the transition plan of "on the
>> wire everything will use SHA-1 version for now?"
>
> SHA-256 repositories interoperate currently using SHA-256 object IDs.
> It was originally intended that we wouldn't update the protocol, but
> that leads to much of the testsuite failing since it's impossible to
> move objects from one place to another.
>
> If we wanted to be more pedantically correct and optimize for the
> future, we could say that the values use the format negotiated by the
> "object-format" protocol extension and SHA-1 otherwise.
Yup. I think a reasonable evolution path is
0) everything on the wire is SHA-1 and no local operation knows
SHA-256 (i.e. a few releases ago)
1) local operations are either SHA-1 or SHA-256 but not both.
On the wire, only the protocol for SHA-1 repositories is
defined, so SHA-256 repositories cannot talk with anybody
using any official protocol, but a "borked" SHA-1 protocol
that naturally extends the object name width exists and
SHA-256 repositories can interoperate with each other. This
will be a backward compatibility nightmare, as Git in a
SHA-256 repository that tries to talk to a SHA-1 repository
will fail, but without grace (i.e. the current situation).
2) on-the-wire protocol gains just one new capability to safely
unleash SHA-256 repositories to talk to the wider world. The
"borked" SHA-1 protocol above will become official when the
object-format=sha256 capability is negotiated by both ends.
At this stage, SHA-256 repositories still cannot talk with
SHA-1 repositories, but at least they can talk among
themselves as long as they use new-enough version of Git that
knows about the new capability.
3) on-the-fly SHA-1 vs SHA-256 migration gets implemented.
SHA-256 repositories trying to talk to somebody else, after
discovering that the other end lacks the object-format=sha256
capability, convert their SHA-256 objects to SHA-1 on the fly,
and vice versa. Between SHA-256 repositories, the capability
above in 2) will allow native conversation with SHA-256.
Reaching 3) may be a lot of work, but at least we should get to 2)
to be able to safely let SHA-256 repositories talk to the outside
world. (Yes, I consider it OK for SHA-256 repositories to talk among
themselves in a private setting in the current state.) Reaching 2)
would be a good milestone, and also a test towards the eventual goal
of reaching 3), with much smaller effort.
Thanks.
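The four stages listed above can be summarized as a small lookup. This is only a reading aid under stated assumptions: the outcome labels ("native", "graceful-failure", and so on) and the function name `interop` are invented for illustration and do not correspond to anything in Git.

```python
ALGOS = {"sha1", "sha256"}


def interop(stage, local, remote):
    """Outcome of a fetch/push between two repos at a given stage (0-3)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2 or 3")
    if not {local, remote} <= ALGOS:
        raise ValueError("unknown hash algorithm")
    if stage == 0:
        # Stage 0: nothing knows SHA-256 at all.
        if (local, remote) != ("sha1", "sha1"):
            raise ValueError("stage 0 only has SHA-1 repositories")
        return "native"
    if local == remote:
        # Stage 1: SHA-256 peers use the unofficial widened protocol.
        if stage == 1 and local == "sha256":
            return "native-unofficial"
        return "native"
    # Mixed SHA-1 / SHA-256 conversation.
    return {
        1: "ungraceful-failure",
        2: "graceful-failure",
        3: "converted-on-the-fly",
    }[stage]
```

For example, `interop(2, "sha256", "sha1")` yields the graceful failure described in 2), while the same pair at stage 3 is converted on the fly.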
* Re: [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
2020-08-14 20:39 ` Junio C Hamano
@ 2020-08-14 20:47 ` Junio C Hamano
0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 20:47 UTC (permalink / raw)
To: brian m. carlson; +Cc: Martin Ågren, git, Derrick Stolee
Junio C Hamano <gitster@pobox.com> writes:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
>>> Martin Ågren <martin.agren@gmail.com> writes:
>>>
>>> > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
>>> > "have"s, and in SHA-256 repositories, we use SHA-256.
>>>
>>> Ehh, doesn't this directly contradict the transition plan of "on the
>>> wire everything will use SHA-1 version for now?"
>>
>> SHA-256 repositories interoperate currently using SHA-256 object IDs.
>> It was originally intended that we wouldn't update the protocol, but
>> that leads to much of the testsuite failing since it's impossible to
>> move objects from one place to another.
>>
>> If we wanted to be more pedantically correct and optimize for the
>> future, we could say that the values use the format negotiated by the
>> "object-format" protocol extension and SHA-1 otherwise.
Yes, that's wonderful. I was confused when I described the
evolution path. We would still want to eventually do the on-the-fly
migration over the wire to make SHA-1 and SHA-256 repositories
interoperate, but at least we can already allow SHA-256 repositories
to safely attempt to talk to SHA-1 repositories and gracefully fail.
Thanks.
* [PATCH 2/5] index-format.txt: document SHA-256 index format
2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
2020-08-14 12:21 ` [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
@ 2020-08-14 12:21 ` Martin Ågren
2020-08-14 12:28 ` Derrick Stolee
2020-08-14 12:21 ` [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
` (4 subsequent siblings)
6 siblings, 1 reply; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano
Similar to a recent commit, document that in SHA-1 repositories, we use
SHA-1 and in SHA-256 repositories, we use SHA-256, then replace all
other uses of "SHA-1" with something more neutral.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Documentation/technical/index-format.txt | 27 +++++++++++++-----------
1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index faa25c5c52..827ece2ed1 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -3,8 +3,11 @@ Git index format
== The Git index file has the following format
- All binary numbers are in network byte order. Version 2 is described
- here unless stated otherwise.
+ All binary numbers are in network byte order.
+ In a repository using the traditional SHA-1, checksums and object IDs
+ (object names) mentioned below are all computed using SHA-1. Similarly,
+ in SHA-256 repositories, these values are computed using SHA-256.
+ Version 2 is described here unless stated otherwise.
- A 12-byte header consisting of
@@ -32,7 +35,7 @@ Git index format
Extension data
- - 160-bit SHA-1 over the content of the index file before this
+ - 160-bit hash checksum over the content of the index file before this
checksum.
== Index entry
@@ -80,7 +83,7 @@ Git index format
32-bit file size
This is the on-disk size from stat(2), truncated to 32-bit.
- 160-bit SHA-1 for the represented object
+ 160-bit object name for the represented object
A 16-bit 'flags' field split into (high to low bits)
@@ -211,8 +214,8 @@ Git index format
The extension consists of:
- - 160-bit SHA-1 of the shared index file. The shared index file path
- is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
+ - Hash of the shared index file. The shared index file path
+ is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
index does not require a shared index file.
- An ewah-encoded delete bitmap, each bit represents an entry in the
@@ -253,10 +256,10 @@ Git index format
- 32-bit dir_flags (see struct dir_struct)
- - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
+ - Hash of $GIT_DIR/info/exclude. A null hash means the file
does not exist.
- - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
+ - Hash of core.excludesfile. A null hash means the file does
not exist.
- NUL-terminated string of per-dir exclude file name. This usually
@@ -285,13 +288,13 @@ The remaining data of each directory block is grouped by type:
- An ewah bitmap, the n-th bit records "check-only" bit of
read_directory_recursive() for the n-th directory.
- - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
+ - An ewah bitmap, the n-th bit indicates whether hash and stat data
is valid for the n-th directory and exists in the next data.
- An array of stat data. The n-th data corresponds with the n-th
"one" bit in the previous ewah bitmap.
- - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
+ - An array of hashes. The n-th hash corresponds with the n-th "one" bit
in the previous ewah bitmap.
- One NUL.
@@ -330,12 +333,12 @@ The remaining data of each directory block is grouped by type:
- 32-bit offset to the end of the index entries
- - 160-bit SHA-1 over the extension types and their sizes (but not
+ - Hash over the extension types and their sizes (but not
their contents). E.g. if we have "TREE" extension that is N-bytes
long, "REUC" extension that is M-bytes long, followed by "EOIE",
then the hash would be:
- SHA-1("TREE" + <binary representation of N> +
+ Hash("TREE" + <binary representation of N> +
"REUC" + <binary representation of M>)
== Index Entry Offset Table
--
2.28.0.277.g9b3c35fffd
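To illustrate the trailing checksum described in this patch ("hash checksum over the content of the index file before this checksum"), here is a rough sketch. It assumes the checksum is simply the last 20 bytes (SHA-1) or 32 bytes (SHA-256) of the file; the function names are hypothetical and this is not how Git's own reader is structured.

```python
import hashlib


def split_index_checksum(data: bytes, algo: str):
    """Split raw index bytes into (content, trailing checksum)."""
    size = hashlib.new(algo).digest_size  # 20 for sha1, 32 for sha256
    if len(data) < size:
        raise ValueError("data shorter than one checksum")
    return data[:-size], data[-size:]


def index_checksum_ok(data: bytes, algo: str) -> bool:
    """True if the trailer equals the hash of everything before it."""
    content, trailer = split_index_checksum(data, algo)
    return hashlib.new(algo, content).digest() == trailer
```

The caller has to know the repository's hash algorithm up front, since the width of the trailer depends on it.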
* Re: [PATCH 2/5] index-format.txt: document SHA-256 index format
2020-08-14 12:21 ` [PATCH 2/5] index-format.txt: document SHA-256 index format Martin Ågren
@ 2020-08-14 12:28 ` Derrick Stolee
2020-08-14 14:05 ` Martin Ågren
0 siblings, 1 reply; 35+ messages in thread
From: Derrick Stolee @ 2020-08-14 12:28 UTC (permalink / raw)
To: Martin Ågren, brian m. carlson; +Cc: git, Junio C Hamano
On 8/14/2020 8:21 AM, Martin Ågren wrote:
> Similar to a recent commit, document that in SHA-1 repositories, we use
> SHA-1 and in SHA-256 repositories, we use SHA-256, then replace all
> other uses of "SHA-1" with something more neutral.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> Documentation/technical/index-format.txt | 27 +++++++++++++-----------
> 1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
> index faa25c5c52..827ece2ed1 100644
> --- a/Documentation/technical/index-format.txt
> +++ b/Documentation/technical/index-format.txt
> @@ -3,8 +3,11 @@ Git index format
>
> == The Git index file has the following format
>
> - All binary numbers are in network byte order. Version 2 is described
> - here unless stated otherwise.
> + All binary numbers are in network byte order.
> + In a repository using the traditional SHA-1, checksums and object IDs
> + (object names) mentioned below are all computed using SHA-1. Similarly,
> + in SHA-256 repositories, these values are computed using SHA-256.
> + Version 2 is described here unless stated otherwise.
>
> - A 12-byte header consisting of
>
> @@ -32,7 +35,7 @@ Git index format
>
> Extension data
>
> - - 160-bit SHA-1 over the content of the index file before this
> + - 160-bit hash checksum over the content of the index file before this
> checksum.
If this hash is flexible, then "160-bit" is not correct anymore, right?
> == Index entry
> @@ -80,7 +83,7 @@ Git index format
> 32-bit file size
> This is the on-disk size from stat(2), truncated to 32-bit.
>
> - 160-bit SHA-1 for the represented object
> + 160-bit object name for the represented object
Same here. The later instances of "160-bit" were dropped.
> A 16-bit 'flags' field split into (high to low bits)
>
> @@ -211,8 +214,8 @@ Git index format
>
> The extension consists of:
>
> - - 160-bit SHA-1 of the shared index file. The shared index file path
> - is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
> + - Hash of the shared index file. The shared index file path
> + is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
> index does not require a shared index file.
>
> - An ewah-encoded delete bitmap, each bit represents an entry in the
> @@ -253,10 +256,10 @@ Git index format
>
> - 32-bit dir_flags (see struct dir_struct)
>
> - - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
> + - Hash of $GIT_DIR/info/exclude. A null hash means the file
> does not exist.
>
> - - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
> + - Hash of core.excludesfile. A null hash means the file does
> not exist.
>
> - NUL-terminated string of per-dir exclude file name. This usually
> @@ -285,13 +288,13 @@ The remaining data of each directory block is grouped by type:
> - An ewah bitmap, the n-th bit records "check-only" bit of
> read_directory_recursive() for the n-th directory.
>
> - - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
> + - An ewah bitmap, the n-th bit indicates whether hash and stat data
> is valid for the n-th directory and exists in the next data.
>
> - An array of stat data. The n-th data corresponds with the n-th
> "one" bit in the previous ewah bitmap.
>
> - - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
> + - An array of hashes. The n-th hash corresponds with the n-th "one" bit
> in the previous ewah bitmap.
>
> - One NUL.
> @@ -330,12 +333,12 @@ The remaining data of each directory block is grouped by type:
>
> - 32-bit offset to the end of the index entries
>
> - - 160-bit SHA-1 over the extension types and their sizes (but not
> + - Hash over the extension types and their sizes (but not
> their contents). E.g. if we have "TREE" extension that is N-bytes
> long, "REUC" extension that is M-bytes long, followed by "EOIE",
> then the hash would be:
>
> - SHA-1("TREE" + <binary representation of N> +
> + Hash("TREE" + <binary representation of N> +
> "REUC" + <binary representation of M>)
>
> == Index Entry Offset Table
>
Thanks,
-Stolee
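The EOIE example quoted above, Hash("TREE" + <binary representation of N> + "REUC" + <binary representation of M>), might look like this in code. The sketch assumes that each size is a 32-bit integer in network byte order (the document only says that all binary numbers are in network byte order), and `eoie_hash` is an illustrative name rather than a Git function.

```python
import hashlib
import struct


def eoie_hash(extensions, algo):
    """Hash the extension types and sizes, but not their contents.

    extensions: iterable of (4-byte signature, size in bytes) pairs,
    in the order the extensions appear before "EOIE".
    """
    h = hashlib.new(algo)
    for signature, size in extensions:
        if len(signature) != 4:
            raise ValueError("extension signature must be 4 bytes")
        # Assumption: size is stored as a 32-bit big-endian integer.
        h.update(signature + struct.pack(">I", size))
    return h.digest()
```

The result is 20 bytes with `"sha1"` and 32 bytes with `"sha256"`, which is why a fixed "160-bit" label no longer fits once the hash is flexible.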
* Re: [PATCH 2/5] index-format.txt: document SHA-256 index format
2020-08-14 12:28 ` Derrick Stolee
@ 2020-08-14 14:05 ` Martin Ågren
0 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 14:05 UTC (permalink / raw)
To: Derrick Stolee; +Cc: brian m. carlson, Git Mailing List, Junio C Hamano
On Fri, 14 Aug 2020 at 14:28, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/14/2020 8:21 AM, Martin Ågren wrote:
> > - - 160-bit SHA-1 over the content of the index file before this
> > + - 160-bit hash checksum over the content of the index file before this
> > checksum.
>
> If this hash is flexible, then "160-bit" is not correct anymore, right?
>
> > - 160-bit SHA-1 for the represented object
> > + 160-bit object name for the represented object
>
> Same here. The later instances of "160-bit" were dropped.
Thanks for pointing out these errors.
Martin
* [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
2020-08-14 12:21 ` [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
2020-08-14 12:21 ` [PATCH 2/5] index-format.txt: document SHA-256 index format Martin Ågren
@ 2020-08-14 12:21 ` Martin Ågren
2020-08-14 12:31 ` Derrick Stolee
2020-08-14 17:33 ` Junio C Hamano
2020-08-14 12:21 ` [PATCH 4/5] shallow.txt: document SHA-256 shallow format Martin Ågren
` (3 subsequent siblings)
6 siblings, 2 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano
Two of our extensions contain "sha1" in their names, but that's
historical. The "want"s will take object names that are not necessarily
SHA-1s. Make this clear, but also make it clear how there's still just
one correct hash algo: These extensions don't somehow make the "want"s
take object names derived using *any* hash algorithm.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Documentation/technical/protocol-capabilities.txt | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 36ccd14f97..47f1b30090 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -324,15 +324,18 @@ allow-tip-sha1-in-want
----------------------
If the upload-pack server advertises this capability, fetch-pack may
-send "want" lines with SHA-1s that exist at the server but are not
-advertised by upload-pack.
+send "want" lines with object names that exist at the server but are not
+advertised by upload-pack. (Note that the name of the capability
+contains "sha1", but that it's more general than that: in SHA-1
+repositories, the "want" lines provide SHA-1 values, but in SHA-256
+repositories, they provide SHA-256 values.)
allow-reachable-sha1-in-want
----------------------------
If the upload-pack server advertises this capability, fetch-pack may
-send "want" lines with SHA-1s that exist at the server but are not
-advertised by upload-pack.
+send "want" lines with object names that exist at the server but are not
+advertised by upload-pack. (Same remark about "sha1" as above.)
push-cert=<nonce>
-----------------
--
2.28.0.277.g9b3c35fffd
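As a rough client-side illustration of the two capabilities touched by this patch: a fetching client may only "want" an unadvertised object name if the server advertised one of them. The sketch below is an assumption-laden simplification (the function name is hypothetical, and the server's own tip or reachability check is not modeled). The object name is treated as an opaque string, so the same logic applies to SHA-1 and SHA-256 names.

```python
UNADVERTISED_WANT_CAPS = {
    "allow-tip-sha1-in-want",
    "allow-reachable-sha1-in-want",
}


def may_send_want(object_name, advertised, server_caps):
    """Decide whether the client may send a "want" for object_name.

    advertised: object names listed in the server's ref advertisement.
    server_caps: capability strings advertised by upload-pack.
    """
    if object_name in advertised:
        return True
    # Despite "sha1" in the names, the capabilities are hash-agnostic.
    return bool(UNADVERTISED_WANT_CAPS & set(server_caps))
```

Even when this returns true for an unadvertised name, the server can still reject the request if the object is not a tip or not reachable.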
* Re: [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
2020-08-14 12:21 ` [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
@ 2020-08-14 12:31 ` Derrick Stolee
2020-08-14 14:05 ` Martin Ågren
2020-08-14 17:33 ` Junio C Hamano
1 sibling, 1 reply; 35+ messages in thread
From: Derrick Stolee @ 2020-08-14 12:31 UTC (permalink / raw)
To: Martin Ågren, brian m. carlson; +Cc: git, Junio C Hamano
On 8/14/2020 8:21 AM, Martin Ågren wrote:
> Two of our extensions contain "sha1" in their names, but that's
> historical. The "want"s will take object names that are not necessarily
> SHA-1s. Make this clear, but also make it clear how there's still just
> one correct hash algo: These extensions don't somehow make the "want"s
> take object names derived using *any* hash algorithm.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> Documentation/technical/protocol-capabilities.txt | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
> index 36ccd14f97..47f1b30090 100644
> --- a/Documentation/technical/protocol-capabilities.txt
> +++ b/Documentation/technical/protocol-capabilities.txt
> @@ -324,15 +324,18 @@ allow-tip-sha1-in-want
> ----------------------
>
> If the upload-pack server advertises this capability, fetch-pack may
> -send "want" lines with SHA-1s that exist at the server but are not
> -advertised by upload-pack.
> +send "want" lines with object names that exist at the server but are not
> +advertised by upload-pack. (Note that the name of the capability
> +contains "sha1", but that it's more general than that: in SHA-1
> +repositories, the "want" lines provide SHA-1 values, but in SHA-256
> +repositories, they provide SHA-256 values.)
>
> allow-reachable-sha1-in-want
> ----------------------------
>
> If the upload-pack server advertises this capability, fetch-pack may
> -send "want" lines with SHA-1s that exist at the server but are not
> -advertised by upload-pack.
> +send "want" lines with object names that exist at the server but are not
> +advertised by upload-pack. (Same remark about "sha1" as above.)
This "as above" is brittle to future changes. I think it
could be improved with
(As in "allow-tip-sha1-in-want", the "sha1" in this capability
refers to object names, not the hash algorithm chosen for the
repository.)
Or, just repeat the same note again.
Thanks,
-Stolee
* Re: [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
2020-08-14 12:31 ` Derrick Stolee
@ 2020-08-14 14:05 ` Martin Ågren
0 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 14:05 UTC (permalink / raw)
To: Derrick Stolee; +Cc: brian m. carlson, Git Mailing List, Junio C Hamano
On Fri, 14 Aug 2020 at 14:31, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/14/2020 8:21 AM, Martin Ågren wrote:
> >
> > If the upload-pack server advertises this capability, fetch-pack may
> > -send "want" lines with SHA-1s that exist at the server but are not
> > -advertised by upload-pack.
> > +send "want" lines with object names that exist at the server but are not
> > +advertised by upload-pack. (Note that the name of the capability
> > +contains "sha1", but that it's more general than that: in SHA-1
> > +repositories, the "want" lines provide SHA-1 values, but in SHA-256
> > +repositories, they provide SHA-256 values.)
> >
> > allow-reachable-sha1-in-want
> > ----------------------------
> >
> > If the upload-pack server advertises this capability, fetch-pack may
> > -send "want" lines with SHA-1s that exist at the server but are not
> > -advertised by upload-pack.
> > +send "want" lines with object names that exist at the server but are not
> > +advertised by upload-pack. (Same remark about "sha1" as above.)
>
> This "as above" is brittle to future changes.
Fair enough. :-) I actually thought this might be *less* brittle, since
we wouldn't need to do any additional changes twice.
> I think it
> could be improved with
>
> (As in "allow-tip-sha1-in-want", the "sha1" in this capability
> refers to object names, not the hash algorithm chosen for the
> repository.)
>
> Or, just repeat the same note again.
These two paragraphs are identical before this patch, so it might make
sense not to change that property. Thanks.
Martin
* Re: [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
2020-08-14 12:21 ` [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
2020-08-14 12:31 ` Derrick Stolee
@ 2020-08-14 17:33 ` Junio C Hamano
2020-08-14 20:35 ` Martin Ågren
1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 17:33 UTC (permalink / raw)
To: Martin Ågren; +Cc: brian m. carlson, git, Derrick Stolee
Martin Ågren <martin.agren@gmail.com> writes:
> Two of our extensions contain "sha1" in their names, but that's
> historical. The "want"s will take object names that are not necessarily
> SHA-1s. Make this clear, but also make it clear how there's still just
> one correct hash algo: These extensions don't somehow make the "want"s
> take object names derived using *any* hash algorithm.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> Documentation/technical/protocol-capabilities.txt | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
> index 36ccd14f97..47f1b30090 100644
> --- a/Documentation/technical/protocol-capabilities.txt
> +++ b/Documentation/technical/protocol-capabilities.txt
> @@ -324,15 +324,18 @@ allow-tip-sha1-in-want
> ----------------------
>
> If the upload-pack server advertises this capability, fetch-pack may
> -send "want" lines with SHA-1s that exist at the server but are not
> -advertised by upload-pack.
> +send "want" lines with object names that exist at the server but are not
> +advertised by upload-pack. (Note that the name of the capability
> +contains "sha1", but that it's more general than that: in SHA-1
> +repositories, the "want" lines provide SHA-1 values, but in SHA-256
> +repositories, they provide SHA-256 values.)
I think we should have either a new sha256 capability or a more
generic hash-algo capability whose value can be set to sha256.
Neither the connection initiators nor the acceptors should talk
in sha256 until both ends agreed to do so.
I cannot think of any other way to make sure hosting sites can serve
projects that migrate at different paces. Per project, you might be
able to have a flag day. You cannot have a flag day that spans the
world.
* Re: [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
2020-08-14 17:33 ` Junio C Hamano
@ 2020-08-14 20:35 ` Martin Ågren
2020-08-14 20:43 ` Junio C Hamano
0 siblings, 1 reply; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 20:35 UTC (permalink / raw)
To: Junio C Hamano; +Cc: brian m. carlson, Git Mailing List, Derrick Stolee
On Fri, 14 Aug 2020 at 19:33, Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin Ågren <martin.agren@gmail.com> writes:
>
> > Two of our extensions contain "sha1" in their names, but that's
> > historical. The "want"s will take object names that are not necessarily
> > SHA-1s. Make this clear, but also make it clear how there's still just
> > one correct hash algo: These extensions don't somehow make the "want"s
> > take object names derived using *any* hash algorithm.
> >
> > Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> > ---
> > Documentation/technical/protocol-capabilities.txt | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
> > index 36ccd14f97..47f1b30090 100644
> > --- a/Documentation/technical/protocol-capabilities.txt
> > +++ b/Documentation/technical/protocol-capabilities.txt
> > @@ -324,15 +324,18 @@ allow-tip-sha1-in-want
> > ----------------------
> >
> > If the upload-pack server advertises this capability, fetch-pack may
> > -send "want" lines with SHA-1s that exist at the server but are not
> > -advertised by upload-pack.
> > +send "want" lines with object names that exist at the server but are not
> > +advertised by upload-pack. (Note that the name of the capability
> > +contains "sha1", but that it's more general than that: in SHA-1
> > +repositories, the "want" lines provide SHA-1 values, but in SHA-256
> > +repositories, they provide SHA-256 values.)
>
> I think we should have either a new sha256 capability or a more
> generic hash-algo capability whose value can be set to sha256.
> Neither the connection initiators nor the acceptors should talk
> in sha256 until both ends agreed to do so.
I think we should, and I think we do. I haven't dug into the details,
but "object-format" looks like it's just that.
Maybe instead of SHA-1 and SHA-256, this should talk about "whatever has
been negotiated through 'object-format', or SHA-1", similar to brian's
suggestion elsewhere.
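To illustrate the "whatever has been negotiated through 'object-format', or SHA-1" wording, here is a minimal Python sketch. The helper names (`negotiated_format`, `is_valid_object_name`) and the `HEX_LEN` table are hypothetical and not taken from Git's code. The sketch assumes only that the capability is spelled `object-format=<algo>` and that SHA-1 and SHA-256 object names are 40 and 64 hex digits long.

```python
# Hex digits per object name for the two object formats discussed here.
HEX_LEN = {"sha1": 40, "sha256": 64}


def negotiated_format(capabilities):
    """Return the algorithm from an 'object-format=<algo>' capability.

    Falls back to 'sha1' when the capability is absent, mirroring the
    "(default SHA-1)" wording suggested in this thread.
    """
    for cap in capabilities:
        if cap.startswith("object-format="):
            return cap.split("=", 1)[1]
    return "sha1"


def is_valid_object_name(name, object_format):
    """Check that `name` is lowercase hex of the length the format requires."""
    expected = HEX_LEN[object_format]
    return len(name) == expected and all(c in "0123456789abcdef" for c in name)
```

With this, a 40-digit name is accepted only when no `object-format=sha256` capability was seen, which is the "one correct hash algo per connection" property the documentation is trying to state.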
> I cannot think of any other way to make sure hosting sites can serve
> projects that migrate at a different pace. Per project, you might be
> able to have a flag day. You cannot have a flag day that spans the
> world.
Yeah, that makes sense.
Martin
* Re: [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
2020-08-14 20:35 ` Martin Ågren
@ 2020-08-14 20:43 ` Junio C Hamano
0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 20:43 UTC (permalink / raw)
To: Martin Ågren; +Cc: brian m. carlson, Git Mailing List, Derrick Stolee
Martin Ågren <martin.agren@gmail.com> writes:
>> I think we should have either a new sha256 capability or a more
>> generic hash-algo capability whose value can be set to sha256.
>> Neither the connection initiators nor the acceptors should talk
>> in sha256 until both ends agreed to do so.
>
> I think we should, and I think we do. I haven't dug into the details,
> but "object-format" looks like it's just that.
Ah, yes, my thinko.
> Maybe instead of SHA-1 and SHA-256, this should talk about "whatever has
> been negotiated through 'object-format', or SHA-1", similar to brian's
> suggestion elsewhere.
Yup, that would be wonderful.
Thanks.
* [PATCH 4/5] shallow.txt: document SHA-256 shallow format
2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
` (2 preceding siblings ...)
2020-08-14 12:21 ` [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
@ 2020-08-14 12:21 ` Martin Ågren
2020-08-14 12:21 ` [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description Martin Ågren
` (2 subsequent siblings)
6 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano
Similar to recent commits, document that in SHA-1 repositories, we use
SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Documentation/technical/shallow.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt
index 01dedfe9ff..f3738baa0f 100644
--- a/Documentation/technical/shallow.txt
+++ b/Documentation/technical/shallow.txt
@@ -13,7 +13,7 @@ pretend as if they are root commits (e.g. "git log" traversal
stops after showing them; "git fsck" does not complain saying
the commits listed on their "parent" lines do not exist).
-Each line contains exactly one SHA-1. When read, a commit_graft
+Each line contains exactly one object name. When read, a commit_graft
will be constructed, which has nr_parent < 0 to make it easier
to discern from user provided grafts.
--
2.28.0.277.g9b3c35fffd
* [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description
2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
` (3 preceding siblings ...)
2020-08-14 12:21 ` [PATCH 4/5] shallow.txt: document SHA-256 shallow format Martin Ågren
@ 2020-08-14 12:21 ` Martin Ågren
2020-08-14 12:37 ` Derrick Stolee
2020-08-14 20:28 ` [PATCH 0/5] more SHA-256 documentation brian m. carlson
2020-08-15 16:05 ` [PATCH v2 0/4] " Martin Ågren
6 siblings, 1 reply; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano
We say that value 1 means "SHA-1", but in fact, it means "whatever
the_hash_algo is", see commit c166599862 ("commit-graph: convert to
using the_hash_algo", 2018-11-14).
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
If we want to be more fine-grained in the future, we'll need to say,
e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferably, bump the
version number.
I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It
could be implemented as "easily" as "if (value_from_header !=
value_from_the_hash_algo) die(...);" for now. Might that pay off in the
long run?
This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're
fine.
[1] https://lore.kernel.org/git/da077fb0-14bb-b84f-c526-d759ebc9f5eb@gmail.com/
Documentation/technical/commit-graph-format.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 440541045d..3535426d32 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -42,8 +42,8 @@ HEADER:
1-byte version number:
Currently, the only valid version is 1.
- 1-byte Hash Version (1 = SHA-1)
- We infer the hash length (H) from this value.
+ 1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo)
+ We infer the hash length (H) from the hash algo derived from this value.
1-byte number (C) of "chunks"
--
2.28.0.277.g9b3c35fffd
* Re: [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description
2020-08-14 12:21 ` [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description Martin Ågren
@ 2020-08-14 12:37 ` Derrick Stolee
2020-08-14 14:10 ` Martin Ågren
0 siblings, 1 reply; 35+ messages in thread
From: Derrick Stolee @ 2020-08-14 12:37 UTC (permalink / raw)
To: Martin Ågren, brian m. carlson
Cc: git, Junio C Hamano, Taylor Blau, Abhishek Kumar
On 8/14/2020 8:21 AM, Martin Ågren wrote:
> We say that value 1 means "SHA-1", but in fact, it means "whatever
> the_hash_algo is", see commit c166599862 ("commit-graph: convert to
> using the_hash_algo", 2018-11-14).
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
> If we want to be more fine-grained in the future, we'll need to say,
> e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferably, bump the
> version number.
>
> I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It
> could be implemented as "easily" as "if (value_from_header !=
> value_from_the_hash_algo) die(...);" for now. Might that pay off in the
> long run?
>
> This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're
> fine.
I think that was the intention of the byte, but that is not what ended
up happening. If we want that to be the case, then we should do that
work as part of the 2.29 cycle before we release with the ability to
create SHA-256 repos (which will lock the commit-graph format for these
repos).
(By "we" I mean that I would try to do this work in a way that minimizes
conflicts with the current commit-graph work in flight [1] [2].)
[1] https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/cover.1597178914.git.me@ttaylorr.com/
> [1] https://lore.kernel.org/git/da077fb0-14bb-b84f-c526-d759ebc9f5eb@gmail.com/
>
> Documentation/technical/commit-graph-format.txt | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
> index 440541045d..3535426d32 100644
> --- a/Documentation/technical/commit-graph-format.txt
> +++ b/Documentation/technical/commit-graph-format.txt
> @@ -42,8 +42,8 @@ HEADER:
> 1-byte version number:
> Currently, the only valid version is 1.
>
> - 1-byte Hash Version (1 = SHA-1)
> - We infer the hash length (H) from this value.
> + 1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo)
> + We infer the hash length (H) from the hash algo derived from this value.
If we are _not_ changing the format to have a meaningful value in
this byte, then this documentation should be updated to state that
this byte must always have value 1, as it does not provide any
information.
Thanks,
-Stolee
* Re: [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description
2020-08-14 12:37 ` Derrick Stolee
@ 2020-08-14 14:10 ` Martin Ågren
0 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 14:10 UTC (permalink / raw)
To: Derrick Stolee
Cc: brian m. carlson, Git Mailing List, Junio C Hamano, Taylor Blau,
Abhishek Kumar
On Fri, 14 Aug 2020 at 14:37, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/14/2020 8:21 AM, Martin Ågren wrote:
> > We say that value 1 means "SHA-1", but in fact, it means "whatever
> > the_hash_algo is", see commit c166599862 ("commit-graph: convert to
> > using the_hash_algo", 2018-11-14).
> >
> > Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> > ---
> > If we want to be more fine-grained in the future, we'll need to say,
> > e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferably, bump the
> > version number.
> >
> > I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It
> > could be implemented as "easily" as "if (value_from_header !=
> > value_from_the_hash_algo) die(...);" for now. Might that pay off in the
> > long run?
> >
> > This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're
> > fine.
>
> I think that was the intention of the byte, but that is not what ended
> up happening.
When I wrote this patch, I did go with "fix <thing>" rather than
"document SHA-256". For the other patches, it's sort of obvious how
those formats are so old that it's no wonder they assumed SHA-1. But
here, we did go to some trouble to try and future-proof things and
already had NewHash in mind. So that calls for "fix <thing>". But I'm
more and more starting to think that it's the implementation that should
be fixed and that the docs should just be extended to add "2 means
SHA-256".
> If we want that to be the case, then we should do that
> work as part of the 2.29 cycle before we release with the ability to
> create SHA-256 repos (which will lock the commit-graph format for these
> repos).
Part of my reasoning behind [3] was that in exactly a situation like this,
we'd be able to say
With extensions.objectFormat=sha256, Git 2.29-2.30 will barf at the
commit-graph files that Git 2.31+ generates, and the other way around.
Users will be able to remove "old" files and regenerate them, and
shouldn't use a mixed environment.
and know that those users knew this might happen.
But certainly, if we can avoid it altogether by changing behavior
already in 2.29, that would be better.
[3] https://lore.kernel.org/git/20200806202358.2265705-1-martin.agren@gmail.com/
> (By "we" I mean that I would try to do this work in a way that minimizes
> conflicts with the current commit-graph work in flight [1] [2].)
None of those seems to touch `oid_version()`, so if we can just tweak
that function to return 1 or 2 instead of always 1, I guess that's one
way.
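As a rough illustration of that idea, the Python sketch below models an `oid_version()` that returns 1 or 2 and the "if (value_from_header != value_from_the_hash_algo) die(...)" check mentioned earlier. It is not Git's C implementation: the 1 = SHA-1, 2 = SHA-256 mapping is the proposal from this thread, `check_header` is a hypothetical helper, and the `CGPH` signature is taken from commit-graph-format.txt rather than from the hunks quoted here.

```python
import struct

# Proposed mapping under discussion (an assumption, not settled behavior).
HASH_VERSION = {"sha1": 1, "sha256": 2}


def oid_version(object_format):
    """Return the Hash Version byte for the repository's object format."""
    return HASH_VERSION[object_format]


def check_header(header, object_format):
    """Validate the first 7 header bytes and return the chunk count (C)."""
    # 4-byte signature, 1-byte version, 1-byte Hash Version, 1-byte chunk count.
    signature, version, hash_version, num_chunks = struct.unpack(
        ">4sBBB", header[:7]
    )
    if signature != b"CGPH":
        raise ValueError("not a commit-graph file")
    if version != 1:
        raise ValueError("unsupported commit-graph version %d" % version)
    if hash_version != oid_version(object_format):
        # Equivalent of the die(...) in the pseudo-code above.
        raise ValueError("commit-graph hash version does not match repository")
    return num_chunks
```

Under this scheme a commit-graph written in a SHA-256 repository would be rejected, rather than misread, when opened with SHA-1 expectations.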
> [1] https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@gmail.com/
>
> [2] https://lore.kernel.org/git/cover.1597178914.git.me@ttaylorr.com/
>
> >
> > - 1-byte Hash Version (1 = SHA-1)
> > - We infer the hash length (H) from this value.
> > + 1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo)
> > + We infer the hash length (H) from the hash algo derived from this value.
>
> If we are _not_ changing the format to have a meaningful value in
> this byte, then this documentation should be updated to state that
> this byte must always have value 1, as it does not provide any
> information.
We could still go
1 means whatever extensions.objectFormat says,
2 means SHA-1,
3 means SHA-256,
...
But maybe that would just be crazy.
Thanks for all your comments.
Martin
* Re: [PATCH 0/5] more SHA-256 documentation
2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
` (4 preceding siblings ...)
2020-08-14 12:21 ` [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description Martin Ågren
@ 2020-08-14 20:28 ` brian m. carlson
2020-08-15 16:05 ` [PATCH v2 0/4] " Martin Ågren
6 siblings, 0 replies; 35+ messages in thread
From: brian m. carlson @ 2020-08-14 20:28 UTC (permalink / raw)
To: Martin Ågren; +Cc: git, Derrick Stolee, Junio C Hamano
On 2020-08-14 at 12:21:41, Martin Ågren wrote:
> Hi brian,
>
> On Fri, 14 Aug 2020 at 00:49, brian m. carlson <sandals@crustytoothpaste.net> wrote:
> >
> > As was pointed out recently, some of our documentation doesn't properly
> > reflect the new support for SHA-256. This series updates the pack and
> > index documentation to reflect that these formats can handle SHA-256,
> > and updates the transition plan to reflect what we've implemented and
> > what the next steps are.
>
> Thanks, this looks great. Now we're making clear what it is we intend to
> be doing.
>
> What about these additional patches on top? These are based on my
> understanding, but hopefully they're not *too* wrong. I'm a bit hesitant
> about the final patch and it would be interesting to know what you
> think.
I think Stolee has a series so that the final patch isn't necessary, and
other than the things he mentioned in this thread, I think these would
be fine on top.
--
brian m. carlson: Houston, Texas, US
* [PATCH v2 0/4] more SHA-256 documentation
2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
` (5 preceding siblings ...)
2020-08-14 20:28 ` [PATCH 0/5] more SHA-256 documentation brian m. carlson
@ 2020-08-15 16:05 ` Martin Ågren
2020-08-15 16:05 ` [PATCH v2 1/4] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
` (3 more replies)
6 siblings, 4 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-15 16:05 UTC (permalink / raw)
To: git; +Cc: brian m. carlson, Derrick Stolee, Junio C Hamano
Thanks brian, Stolee and Junio for your comments on the initial
submission. Changes since v1:
* Dropped some "160-bit" I had missed.
* Refer to the 'object-format' capability in a few spots rather than
discussing SHA-1 vs SHA-256 repos.
* Dropped the final patch, since Stolee has submitted a patch (series)
for changing the implementation instead.
These could be part of bc/sha-256-doc-updates, since they are quite
similar in spirit, or go in separately, so these series don't need to
hold each other hostage. Whatever Junio and brian prefer will be fine
by me.
Martin
Martin Ågren (4):
http-protocol.txt: document SHA-256 "want"/"have" format
index-format.txt: document SHA-256 index format
protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
shallow.txt: document SHA-256 shallow format
Documentation/technical/http-protocol.txt | 5 +--
Documentation/technical/index-format.txt | 34 ++++++++++---------
.../technical/protocol-capabilities.txt | 12 ++++---
Documentation/technical/shallow.txt | 2 +-
4 files changed, 30 insertions(+), 23 deletions(-)
Range-diff against v1:
1: fcb26c81be ! 1: 2e9f6b9294 http-protocol.txt: document SHA-256 "want"/"have" format
@@ Metadata
## Commit message ##
http-protocol.txt: document SHA-256 "want"/"have" format
- Document that in SHA-1 repositories, we use SHA-1 for "want"s and
- "have"s, and in SHA-256 repositories, we use SHA-256.
+ Document that rather than always naming objects using SHA-1, we should
+ use whatever has been negotiated using the object-format capability.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
@@ Documentation/technical/http-protocol.txt: at all in the request stream:
-SHA-1 as its value. Multiple SHA-1s MUST be sent by sending
-multiple commands.
+object name as its value. Multiple object names MUST be sent by sending
-+multiple commands. (An object name is a SHA-1 hash in a SHA-1 repo
-+and a SHA-256 hash in a SHA-256 repo.)
++multiple commands. Object names MUST be given using the object format
++negotiated through the `object-format` capability (default SHA-1).
The `have` list is created by popping the first 32 commits
from `c_pending`. Less can be supplied if `c_pending` empties.
2: 5c13a9478a ! 2: 14bd0d9362 index-format.txt: document SHA-256 index format
@@ Metadata
## Commit message ##
index-format.txt: document SHA-256 index format
- Similar to a recent commit, document that in SHA-1 repositories, we use
- SHA-1 and in SHA-256 repositories, we use SHA-256, then replace all
- other uses of "SHA-1" with something more neutral.
+ Document that in SHA-1 repositories, we use SHA-1 and in SHA-256
+ repositories, we use SHA-256, then replace all other uses of "SHA-1"
+ with something more neutral. Avoid referring to "160-bit" hash values.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
@@ Documentation/technical/index-format.txt: Git index format
Extension data
- - 160-bit SHA-1 over the content of the index file before this
-+ - 160-bit hash checksum over the content of the index file before this
- checksum.
+- checksum.
++ - Hash checksum over the content of the index file before this checksum.
== Index entry
+
@@ Documentation/technical/index-format.txt: Git index format
32-bit file size
This is the on-disk size from stat(2), truncated to 32-bit.
- 160-bit SHA-1 for the represented object
-+ 160-bit object name for the represented object
++ Object name for the represented object
A 16-bit 'flags' field split into (high to low bits)
+@@ Documentation/technical/index-format.txt: Git index format
+
+ - A newline (ASCII 10); and
+
+- - 160-bit object name for the object that would result from writing
+- this span of index as a tree.
++ - Object name for the object that would result from writing this span
++ of index as a tree.
+
+ An entry can be in an invalidated state and is represented by having
+ a negative number in the entry_count field. In this case, there is no
+@@ Documentation/technical/index-format.txt: Git index format
+ stage 1 to 3 (a missing stage is represented by "0" in this field);
+ and
+
+- - At most three 160-bit object names of the entry in stages from 1 to 3
++ - At most three object names of the entry in stages from 1 to 3
+ (nothing is written for a missing stage).
+
+ === Split index
@@ Documentation/technical/index-format.txt: Git index format
The extension consists of:
3: 82e5c67b7c ! 3: 2e82be9e36 protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
@@ Metadata
## Commit message ##
protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
- Two of our extensions contain "sha1" in their names, but that's
- historical. The "want"s will take object names that are not necessarily
- SHA-1s. Make this clear, but also make it clear how there's still just
- one correct hash algo: These extensions don't somehow make the "want"s
- take object names derived using *any* hash algorithm.
+ Two of our capabilities contain "sha1" in their names, but that's
+ historical. Clarify that object names are still to be given using
+ whatever object format has been negotiated using the "object-format"
+ capability.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
@@ Documentation/technical/protocol-capabilities.txt: allow-tip-sha1-in-want
-send "want" lines with SHA-1s that exist at the server but are not
-advertised by upload-pack.
+send "want" lines with object names that exist at the server but are not
-+advertised by upload-pack. (Note that the name of the capability
-+contains "sha1", but that it's more general than that: in SHA-1
-+repositories, the "want" lines provide SHA-1 values, but in SHA-256
-+repositories, they provide SHA-256 values.)
++advertised by upload-pack. For historical reasons, the name of this
++capability contains "sha1". Object names are always given using the
++object format negotiated through the 'object-format' capability.
allow-reachable-sha1-in-want
----------------------------
@@ Documentation/technical/protocol-capabilities.txt: allow-tip-sha1-in-want
-send "want" lines with SHA-1s that exist at the server but are not
-advertised by upload-pack.
+send "want" lines with object names that exist at the server but are not
-+advertised by upload-pack. (Same remark about "sha1" as above.)
++advertised by upload-pack. For historical reasons, the name of this
++capability contains "sha1". Object names are always given using the
++object format negotiated through the 'object-format' capability.
push-cert=<nonce>
-----------------
4: bcfbdd25e5 ! 4: 8680fc1af6 shallow.txt: document SHA-256 shallow format
@@ Metadata
## Commit message ##
shallow.txt: document SHA-256 shallow format
- Similar to recent commits, document that in SHA-1 repositories, we use
- SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256.
+ Similar to recent commits, document that we list object names rather
+ than SHA-1s.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
5: f95e3f65e7 < -: ---------- commit-graph-format.txt: fix "Hash Version" description
--
2.28.0.297.g1956fa8f8d
* [PATCH v2 1/4] http-protocol.txt: document SHA-256 "want"/"have" format
2020-08-15 16:05 ` [PATCH v2 0/4] " Martin Ågren
@ 2020-08-15 16:05 ` Martin Ågren
2020-08-15 16:06 ` [PATCH v2 2/4] index-format.txt: document SHA-256 index format Martin Ågren
` (2 subsequent siblings)
3 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-15 16:05 UTC (permalink / raw)
To: git; +Cc: brian m. carlson, Derrick Stolee, Junio C Hamano
Document that rather than always naming objects using SHA-1, we should
use whatever has been negotiated using the object-format capability.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Documentation/technical/http-protocol.txt | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 51a79e63de..96d89ea9b2 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -401,8 +401,9 @@ at all in the request stream:
The stream is terminated by a pkt-line flush (`0000`).
A single "want" or "have" command MUST have one hex formatted
-SHA-1 as its value. Multiple SHA-1s MUST be sent by sending
-multiple commands.
+object name as its value. Multiple object names MUST be sent by sending
+multiple commands. Object names MUST be given using the object format
+negotiated through the `object-format` capability (default SHA-1).
The `have` list is created by popping the first 32 commits
from `c_pending`. Less can be supplied if `c_pending` empties.
--
2.28.0.297.g1956fa8f8d
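For readers unfamiliar with the wire format, the following Python sketch shows how "want" commands with one hex object name each could be framed as pkt-lines and terminated by a flush (`0000`). It is a simplification under stated assumptions: `pkt_line` and `want_request` are hypothetical helper names, and the capability list that normally rides on the first "want" line is left out.

```python
FLUSH = b"0000"  # pkt-line flush, as in the documented request stream


def pkt_line(payload):
    """Frame `payload` with a 4-hex-digit length that counts the prefix too."""
    return ("%04x" % (len(payload) + 4)).encode("ascii") + payload


def want_request(object_names):
    """Build one 'want' pkt-line per object name, followed by a flush."""
    lines = [pkt_line(b"want " + name.encode("ascii") + b"\n")
             for name in object_names]
    return b"".join(lines) + FLUSH
```

The length prefix makes the difference between object formats visible on the wire: a "want" line carries 40 hex digits for SHA-1 and 64 for SHA-256, so its pkt-line length is `0032` in the first case and `004a` in the second.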
* [PATCH v2 2/4] index-format.txt: document SHA-256 index format
2020-08-15 16:05 ` [PATCH v2 0/4] " Martin Ågren
2020-08-15 16:05 ` [PATCH v2 1/4] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
@ 2020-08-15 16:06 ` Martin Ågren
2020-08-15 16:06 ` [PATCH v2 3/4] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
2020-08-15 16:06 ` [PATCH v2 4/4] shallow.txt: document SHA-256 shallow format Martin Ågren
3 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-15 16:06 UTC (permalink / raw)
To: git; +Cc: brian m. carlson, Derrick Stolee, Junio C Hamano
Document that in SHA-1 repositories, we use SHA-1 and in SHA-256
repositories, we use SHA-256, then replace all other uses of "SHA-1"
with something more neutral. Avoid referring to "160-bit" hash values.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Documentation/technical/index-format.txt | 34 +++++++++++++-----------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index faa25c5c52..f9a3644711 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -3,8 +3,11 @@ Git index format
== The Git index file has the following format
- All binary numbers are in network byte order. Version 2 is described
- here unless stated otherwise.
+ All binary numbers are in network byte order.
+ In a repository using the traditional SHA-1, checksums and object IDs
+ (object names) mentioned below are all computed using SHA-1. Similarly,
+ in SHA-256 repositories, these values are computed using SHA-256.
+ Version 2 is described here unless stated otherwise.
- A 12-byte header consisting of
@@ -32,8 +35,7 @@ Git index format
Extension data
- - 160-bit SHA-1 over the content of the index file before this
- checksum.
+ - Hash checksum over the content of the index file before this checksum.
== Index entry
@@ -80,7 +82,7 @@ Git index format
32-bit file size
This is the on-disk size from stat(2), truncated to 32-bit.
- 160-bit SHA-1 for the represented object
+ Object name for the represented object
A 16-bit 'flags' field split into (high to low bits)
@@ -160,8 +162,8 @@ Git index format
- A newline (ASCII 10); and
- - 160-bit object name for the object that would result from writing
- this span of index as a tree.
+ - Object name for the object that would result from writing this span
+ of index as a tree.
An entry can be in an invalidated state and is represented by having
a negative number in the entry_count field. In this case, there is no
@@ -198,7 +200,7 @@ Git index format
stage 1 to 3 (a missing stage is represented by "0" in this field);
and
- - At most three 160-bit object names of the entry in stages from 1 to 3
+ - At most three object names of the entry in stages from 1 to 3
(nothing is written for a missing stage).
=== Split index
@@ -211,8 +213,8 @@ Git index format
The extension consists of:
- - 160-bit SHA-1 of the shared index file. The shared index file path
- is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
+ - Hash of the shared index file. The shared index file path
+ is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
index does not require a shared index file.
- An ewah-encoded delete bitmap, each bit represents an entry in the
@@ -253,10 +255,10 @@ Git index format
- 32-bit dir_flags (see struct dir_struct)
- - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
+ - Hash of $GIT_DIR/info/exclude. A null hash means the file
does not exist.
- - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
+ - Hash of core.excludesfile. A null hash means the file does
not exist.
- NUL-terminated string of per-dir exclude file name. This usually
@@ -285,13 +287,13 @@ The remaining data of each directory block is grouped by type:
- An ewah bitmap, the n-th bit records "check-only" bit of
read_directory_recursive() for the n-th directory.
- - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
+ - An ewah bitmap, the n-th bit indicates whether hash and stat data
is valid for the n-th directory and exists in the next data.
- An array of stat data. The n-th data corresponds with the n-th
"one" bit in the previous ewah bitmap.
- - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
+ - An array of hashes. The n-th hash corresponds with the n-th "one" bit
in the previous ewah bitmap.
- One NUL.
@@ -330,12 +332,12 @@ The remaining data of each directory block is grouped by type:
- 32-bit offset to the end of the index entries
- - 160-bit SHA-1 over the extension types and their sizes (but not
+ - Hash over the extension types and their sizes (but not
their contents). E.g. if we have "TREE" extension that is N-bytes
long, "REUC" extension that is M-bytes long, followed by "EOIE",
then the hash would be:
- SHA-1("TREE" + <binary representation of N> +
+ Hash("TREE" + <binary representation of N> +
"REUC" + <binary representation of M>)
== Index Entry Offset Table
--
2.28.0.297.g1956fa8f8d
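As a small illustration of why "160-bit" is dropped from the description of the trailing checksum, this Python sketch verifies the hash at the end of an index file for either object format. It is only a sketch: `index_checksum_ok` and `HASH_BYTES` are hypothetical names, and the digest sizes (20 bytes for SHA-1, 32 for SHA-256) are the only facts it relies on.

```python
import hashlib

# Size in bytes of the trailing checksum for each object format.
HASH_BYTES = {"sha1": 20, "sha256": 32}


def index_checksum_ok(data, object_format):
    """Check the hash checksum over the content of the index file before it."""
    n = HASH_BYTES[object_format]
    body, trailer = data[:-n], data[-n:]
    return hashlib.new(object_format, body).digest() == trailer
```

The reader has to know the repository's object format before it can even locate the checksum, which is why the documentation now says the values are computed with the repository's hash rather than with a fixed 160-bit one.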
* [PATCH v2 3/4] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
2020-08-15 16:05 ` [PATCH v2 0/4] " Martin Ågren
2020-08-15 16:05 ` [PATCH v2 1/4] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
2020-08-15 16:06 ` [PATCH v2 2/4] index-format.txt: document SHA-256 index format Martin Ågren
@ 2020-08-15 16:06 ` Martin Ågren
2020-08-15 16:06 ` [PATCH v2 4/4] shallow.txt: document SHA-256 shallow format Martin Ågren
3 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-15 16:06 UTC (permalink / raw)
To: git; +Cc: brian m. carlson, Derrick Stolee, Junio C Hamano
Two of our capabilities contain "sha1" in their names, but that's
historical. Clarify that object names are still to be given using
whatever object format has been negotiated using the "object-format"
capability.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Documentation/technical/protocol-capabilities.txt | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 36ccd14f97..124d716807 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -324,15 +324,19 @@ allow-tip-sha1-in-want
----------------------
If the upload-pack server advertises this capability, fetch-pack may
-send "want" lines with SHA-1s that exist at the server but are not
-advertised by upload-pack.
+send "want" lines with object names that exist at the server but are not
+advertised by upload-pack. For historical reasons, the name of this
+capability contains "sha1". Object names are always given using the
+object format negotiated through the 'object-format' capability.
allow-reachable-sha1-in-want
----------------------------
If the upload-pack server advertises this capability, fetch-pack may
-send "want" lines with SHA-1s that exist at the server but are not
-advertised by upload-pack.
+send "want" lines with object names that exist at the server but are not
+advertised by upload-pack. For historical reasons, the name of this
+capability contains "sha1". Object names are always given using the
+object format negotiated through the 'object-format' capability.
push-cert=<nonce>
-----------------
--
2.28.0.297.g1956fa8f8d
* [PATCH v2 4/4] shallow.txt: document SHA-256 shallow format
2020-08-15 16:05 ` [PATCH v2 0/4] " Martin Ågren
` (2 preceding siblings ...)
2020-08-15 16:06 ` [PATCH v2 3/4] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
@ 2020-08-15 16:06 ` Martin Ågren
3 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-15 16:06 UTC (permalink / raw)
To: git; +Cc: brian m. carlson, Derrick Stolee, Junio C Hamano
Similar to recent commits, document that we list object names rather
than SHA-1s.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
Documentation/technical/shallow.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt
index 01dedfe9ff..f3738baa0f 100644
--- a/Documentation/technical/shallow.txt
+++ b/Documentation/technical/shallow.txt
@@ -13,7 +13,7 @@ pretend as if they are root commits (e.g. "git log" traversal
stops after showing them; "git fsck" does not complain saying
the commits listed on their "parent" lines do not exist).
-Each line contains exactly one SHA-1. When read, a commit_graft
+Each line contains exactly one object name. When read, a commit_graft
will be constructed, which has nr_parent < 0 to make it easier
to discern from user provided grafts.
--
2.28.0.297.g1956fa8f8d