git.vger.kernel.org archive mirror
* [PATCH 0/2] Documentation updates for SHA-256
@ 2020-08-13 22:48 brian m. carlson
  2020-08-13 22:49 ` [PATCH 1/2] docs: document SHA-256 pack and indices brian m. carlson
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: brian m. carlson @ 2020-08-13 22:48 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

As was pointed out recently, some of our documentation doesn't properly
reflect the new support for SHA-256.  This series updates the pack and
index documentation to reflect that these formats can handle SHA-256,
and updates the transition plan to reflect what we've implemented and
what the next steps are.

brian m. carlson (2):
  docs: document SHA-256 pack and indices
  docs: fix step in transition plan

 .../technical/hash-function-transition.txt    |  2 +-
 Documentation/technical/pack-format.txt       | 36 +++++++++++--------
 2 files changed, 22 insertions(+), 16 deletions(-)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/2] docs: document SHA-256 pack and indices
  2020-08-13 22:48 [PATCH 0/2] Documentation updates for SHA-256 brian m. carlson
@ 2020-08-13 22:49 ` brian m. carlson
  2020-08-14  2:26   ` Derrick Stolee
  2020-08-13 22:49 ` [PATCH 2/2] docs: fix step in transition plan brian m. carlson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: brian m. carlson @ 2020-08-13 22:49 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Now that we have SHA-256 support for packs and indices, let's document
that in SHA-256 repositories, we use SHA-256 instead of SHA-1 for object
names and checksums.  Instead of duplicating this information throughout
the document, let's just document that in SHA-1 repositories, we use
SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/technical/pack-format.txt | 36 ++++++++++++++-----------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index d3a142c652..f4c8d94f73 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -1,6 +1,12 @@
 Git pack format
 ===============
 
+== Checksums and object IDs
+
+In a repository using the traditional SHA-1, pack checksums, index checksums,
+and object IDs (object names) mentioned below are all computed using SHA-1.
+Similarly, in SHA-256 repositories, these values are computed using SHA-256.
+
 == pack-*.pack files have the following format:
 
    - A header appears at the beginning and consists of the following:
@@ -26,7 +32,7 @@ Git pack format
 
      (deltified representation)
      n-byte type and length (3-bit type, (n-1)*7+4-bit length)
-     20-byte base object name if OBJ_REF_DELTA or a negative relative
+     base object name if OBJ_REF_DELTA or a negative relative
 	 offset from the delta object's position in the pack if this
 	 is an OBJ_OFS_DELTA object
      compressed delta data
@@ -34,7 +40,7 @@ Git pack format
      Observation: length of each object is encoded in a variable
      length format and is not constrained to 32-bit or anything.
 
-  - The trailer records 20-byte SHA-1 checksum of all of the above.
+  - The trailer records a pack checksum of all of the above.
 
 === Object types
 
@@ -58,8 +64,8 @@ ofs-delta and ref-delta, which is only valid in a pack file.
 
 Both ofs-delta and ref-delta store the "delta" to be applied to
 another object (called 'base object') to reconstruct the object. The
-difference between them is, ref-delta directly encodes 20-byte base
-object name. If the base object is in the same pack, ofs-delta encodes
+difference between them is, ref-delta directly encodes base object
+name. If the base object is in the same pack, ofs-delta encodes
 the offset of the base object in the pack instead.
 
 The base object could also be deltified if it's in the same pack.
@@ -143,14 +149,14 @@ This is the instruction reserved for future expansion.
     object is stored in the packfile as the offset from the
     beginning.
 
-    20-byte object name.
+    one object name of the appropriate size.
 
   - The file is concluded with a trailer:
 
-    A copy of the 20-byte SHA-1 checksum at the end of
-    corresponding packfile.
+    A copy of the pack checksum at the end of the corresponding
+    packfile.
 
-    20-byte SHA-1-checksum of all of the above.
+    Index checksum of all of the above.
 
 Pack Idx file:
 
@@ -198,7 +204,7 @@ Pack file entry: <+
         If it is not DELTA, then deflated bytes (the size above
 		is the size before compression).
 	If it is REF_DELTA, then
-	  20-byte base object name SHA-1 (the size above is the
+	  base object name (the size above is the
 		size of the delta data that follows).
           delta data, deflated.
 	If it is OFS_DELTA, then
@@ -227,9 +233,9 @@ Pack file entry: <+
 
   - A 256-entry fan-out table just like v1.
 
-  - A table of sorted 20-byte SHA-1 object names.  These are
-    packed together without offset values to reduce the cache
-    footprint of the binary search for a specific object name.
+  - A table of sorted object names.  These are packed together
+    without offset values to reduce the cache footprint of the
+    binary search for a specific object name.
 
   - A table of 4-byte CRC32 values of the packed object data.
     This is new in v2 so compressed data can be copied directly
@@ -248,10 +254,10 @@ Pack file entry: <+
 
   - The same trailer as a v1 pack file:
 
-    A copy of the 20-byte SHA-1 checksum at the end of
+    A copy of the pack checksum at the end of
     corresponding packfile.
 
-    20-byte SHA-1-checksum of all of the above.
+    Index checksum of all of the above.
 
 == multi-pack-index (MIDX) files have the following format:
 
@@ -329,4 +335,4 @@ CHUNK DATA:
 
 TRAILER:
 
-	20-byte SHA1-checksum of the above contents.
+	Index checksum of the above contents.

^ permalink raw reply	[flat|nested] 35+ messages in thread
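
The trailer rule in the patch above can be sketched as follows. This is a minimal illustration, not Git's implementation: it assumes the 12-byte pack header described in pack-format.txt (the "PACK" signature, a 4-byte version, a 4-byte object count), and the names `HASHES`, `build_empty_pack`, and `verify_pack_trailer` are hypothetical helpers invented for the example.

```python
import hashlib
import struct

# Hypothetical lookup table; the keys are illustrative labels only.
HASHES = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}


def build_empty_pack(algo: str) -> bytes:
    """Return a pack with zero objects: 12-byte header plus trailer checksum."""
    header = b"PACK" + struct.pack(">II", 2, 0)  # signature, version 2, 0 objects
    return header + HASHES[algo](header).digest()


def verify_pack_trailer(pack: bytes, algo: str) -> bool:
    """Check that the trailer equals the hash of everything before it."""
    hash_len = HASHES[algo]().digest_size  # 20 for SHA-1, 32 for SHA-256
    body, trailer = pack[:-hash_len], pack[-hash_len:]
    return HASHES[algo](body).digest() == trailer
```

Nothing in the pack itself says which hash produced the trailer, so the caller has to supply `algo`, just as the new "Checksums and object IDs" section says the repository's hash determines these values.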

* [PATCH 2/2] docs: fix step in transition plan
  2020-08-13 22:48 [PATCH 0/2] Documentation updates for SHA-256 brian m. carlson
  2020-08-13 22:49 ` [PATCH 1/2] docs: document SHA-256 pack and indices brian m. carlson
@ 2020-08-13 22:49 ` brian m. carlson
  2020-08-14 12:21   ` Martin Ågren
  2020-08-14  2:33 ` [PATCH 0/2] Documentation updates for SHA-256 Derrick Stolee
  2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
  3 siblings, 1 reply; 35+ messages in thread
From: brian m. carlson @ 2020-08-13 22:49 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

One of the required steps for the objectFormat extension is to implement
the loose object index.  However, without support for
compatObjectFormat, we don't even know if the loose object index is
needed, so it makes sense to move that step to the compatObjectFormat
section.  Do so.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/technical/hash-function-transition.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt
index 5b2db3be1e..6fd20ebbc2 100644
--- a/Documentation/technical/hash-function-transition.txt
+++ b/Documentation/technical/hash-function-transition.txt
@@ -650,7 +650,6 @@ Some initial steps can be implemented independently of one another:
 
 The first user-visible change is the introduction of the objectFormat
 extension (without compatObjectFormat). This requires:
-- implementing the loose-object-idx
 - teaching fsck about this mode of operation
 - using the hash function API (vtable) when computing object names
 - signing objects and verifying signatures
@@ -658,6 +657,7 @@ extension (without compatObjectFormat). This requires:
   repository
 
 Next comes introduction of compatObjectFormat:
+- implementing the loose-object-idx
 - translating object names between object formats
 - translating object content between object formats
 - generating and verifying signatures in the compat format

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] docs: document SHA-256 pack and indices
  2020-08-13 22:49 ` [PATCH 1/2] docs: document SHA-256 pack and indices brian m. carlson
@ 2020-08-14  2:26   ` Derrick Stolee
  0 siblings, 0 replies; 35+ messages in thread
From: Derrick Stolee @ 2020-08-14  2:26 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Martin Ågren

On 8/13/2020 6:49 PM, brian m. carlson wrote:
> Now that we have SHA-256 support for packs and indices, let's document
> that in SHA-256 repositories, we use SHA-256 instead of SHA-1 for object
> names and checksums.  Instead of duplicating this information throughout
> the document, let's just document that in SHA-1 repositories, we use
> SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  Documentation/technical/pack-format.txt | 36 ++++++++++++++-----------
>  1 file changed, 21 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
> index d3a142c652..f4c8d94f73 100644
> --- a/Documentation/technical/pack-format.txt
> +++ b/Documentation/technical/pack-format.txt
> @@ -1,6 +1,12 @@
>  Git pack format
>  ===============
>  
> +== Checksums and object IDs
> +
> +In a repository using the traditional SHA-1, pack checksums, index checksums,
> +and object IDs (object names) mentioned below are all computed using SHA-1.
> +Similarly, in SHA-256 repositories, these values are computed using SHA-256.
> +
>  == pack-*.pack files have the following format:
>  
>     - A header appears at the beginning and consists of the following:
> @@ -26,7 +32,7 @@ Git pack format
>  
>       (deltified representation)
>       n-byte type and length (3-bit type, (n-1)*7+4-bit length)
> -     20-byte base object name if OBJ_REF_DELTA or a negative relative
> +     base object name if OBJ_REF_DELTA or a negative relative
>  	 offset from the delta object's position in the pack if this
>  	 is an OBJ_OFS_DELTA object
>       compressed delta data
> @@ -34,7 +40,7 @@ Git pack format
>       Observation: length of each object is encoded in a variable
>       length format and is not constrained to 32-bit or anything.
>  
> -  - The trailer records 20-byte SHA-1 checksum of all of the above.
> +  - The trailer records a pack checksum of all of the above.
>  
>  === Object types
>  
> @@ -58,8 +64,8 @@ ofs-delta and ref-delta, which is only valid in a pack file.
>  
>  Both ofs-delta and ref-delta store the "delta" to be applied to
>  another object (called 'base object') to reconstruct the object. The
> -difference between them is, ref-delta directly encodes 20-byte base
> -object name. If the base object is in the same pack, ofs-delta encodes
> +difference between them is, ref-delta directly encodes base object
> +name. If the base object is in the same pack, ofs-delta encodes
>  the offset of the base object in the pack instead.
>  
>  The base object could also be deltified if it's in the same pack.
> @@ -143,14 +149,14 @@ This is the instruction reserved for future expansion.
>      object is stored in the packfile as the offset from the
>      beginning.
>  
> -    20-byte object name.
> +    one object name of the appropriate size.
>  
>    - The file is concluded with a trailer:
>  
> -    A copy of the 20-byte SHA-1 checksum at the end of
> -    corresponding packfile.
> +    A copy of the pack checksum at the end of the corresponding
> +    packfile.
>  
> -    20-byte SHA-1-checksum of all of the above.
> +    Index checksum of all of the above.
>  
>  Pack Idx file:
>  
> @@ -198,7 +204,7 @@ Pack file entry: <+
>          If it is not DELTA, then deflated bytes (the size above
>  		is the size before compression).
>  	If it is REF_DELTA, then
> -	  20-byte base object name SHA-1 (the size above is the
> +	  base object name (the size above is the
>  		size of the delta data that follows).
>            delta data, deflated.
>  	If it is OFS_DELTA, then
> @@ -227,9 +233,9 @@ Pack file entry: <+
>  
>    - A 256-entry fan-out table just like v1.
>  
> -  - A table of sorted 20-byte SHA-1 object names.  These are
> -    packed together without offset values to reduce the cache
> -    footprint of the binary search for a specific object name.
> +  - A table of sorted object names.  These are packed together
> +    without offset values to reduce the cache footprint of the
> +    binary search for a specific object name.
>  
>    - A table of 4-byte CRC32 values of the packed object data.
>      This is new in v2 so compressed data can be copied directly
> @@ -248,10 +254,10 @@ Pack file entry: <+
>  
>    - The same trailer as a v1 pack file:
>  
> -    A copy of the 20-byte SHA-1 checksum at the end of
> +    A copy of the pack checksum at the end of
>      corresponding packfile.
>  
> -    20-byte SHA-1-checksum of all of the above.
> +    Index checksum of all of the above.
>  
>  == multi-pack-index (MIDX) files have the following format:
>  
> @@ -329,4 +335,4 @@ CHUNK DATA:
>  
>  TRAILER:
>  
> -	20-byte SHA1-checksum of the above contents.
> +	Index checksum of the above contents.

I immediately got concerned and looked at the existing MIDX
format to see how we deal with hash length. The contents
missing from this diff are suitably vague about the hash length:

  "The OIDs for all objects in the MIDX are stored..."

These changes are good.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/2] Documentation updates for SHA-256
  2020-08-13 22:48 [PATCH 0/2] Documentation updates for SHA-256 brian m. carlson
  2020-08-13 22:49 ` [PATCH 1/2] docs: document SHA-256 pack and indices brian m. carlson
  2020-08-13 22:49 ` [PATCH 2/2] docs: fix step in transition plan brian m. carlson
@ 2020-08-14  2:33 ` Derrick Stolee
  2020-08-14  4:47   ` Junio C Hamano
  2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
  3 siblings, 1 reply; 35+ messages in thread
From: Derrick Stolee @ 2020-08-14  2:33 UTC (permalink / raw)
  To: brian m. carlson, git; +Cc: Martin Ågren

On 8/13/2020 6:48 PM, brian m. carlson wrote:
> As was pointed out recently, some of our documentation doesn't properly
> reflect the new support for SHA-256.

Just to bring up some subtlety in Martin's message [1], there is some
valid concern that the existing file formats are not correctly versioned
"in a vacuum." When they are in a repository that has SHA-256 set as its
hash algorithm, Git interprets these file formats correctly, but if a
*.pack file (and its *.idx) happened to be copied into the pack directory
of a Git repository still in SHA-1 mode, then Git would get confused and
probably fail miserably.

Is that really a concern? Maybe, but also Git will never move data like
that. The main thing is to focus on compatibility of the .git directory
as a whole (and the protocol, as we move into inter-operability mode).

[1] https://lore.kernel.org/git/CAN0heSptiJL9d86ZeNPMUaZeTA68juwTyf3K-uWR=K-vt=1Hrg@mail.gmail.com/

>  This series updates the pack and
> index documentation to reflect that these formats can handle SHA-256,
> and updates the transition plan to reflect what we've implemented and
> what the next steps are.

These patches are good to help clarify these formats in the new world.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 35+ messages in thread
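
The "in a vacuum" concern can be illustrated with a toy v2 pack index. This is a rough sketch under stated assumptions: the layout follows the v2 description in pack-format.txt (magic number, version, 256-entry fan-out, object names, CRC32 table, offset table, trailer), the CRC and offset tables are zero-filled placeholders, and `build_idx_v2` and `read_names` are hypothetical helpers, not Git functions.

```python
import hashlib
import struct

IDX_MAGIC = b"\xfftOc"
FANOUT_END = 8 + 256 * 4  # magic + version, then 256 four-byte fan-out entries


def build_idx_v2(names, hash_fn):
    """Build a toy v2 index for `names`; CRCs and offsets are zeroed."""
    names = sorted(names)
    # fanout[b] counts the object names whose first byte is <= b
    fanout = [sum(1 for n in names if n[0] <= b) for b in range(256)]
    body = IDX_MAGIC + struct.pack(">I", 2) + struct.pack(">256I", *fanout)
    body += b"".join(names)
    body += b"\x00" * (4 * len(names))  # CRC32 table (placeholder)
    body += b"\x00" * (4 * len(names))  # 4-byte offset table (placeholder)
    body += hash_fn(b"").digest()       # stand-in for the pack checksum
    return body + hash_fn(body).digest()  # index checksum of all of the above


def read_names(idx, hash_len):
    """Slice the object-name table, trusting the caller's hash length."""
    count = struct.unpack(">I", idx[FANOUT_END - 4:FANOUT_END])[0]
    return [idx[FANOUT_END + i * hash_len:FANOUT_END + (i + 1) * hash_len]
            for i in range(count)]


names = [hashlib.sha256(s).digest() for s in (b"a", b"b")]
idx = build_idx_v2(names, hashlib.sha256)
```

Reading `idx` with `hash_len=32` recovers the original names, while `hash_len=20` silently yields different byte strings. The file carries no field naming the hash, which is why the reader depends on the surrounding repository to interpret it.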

* Re: [PATCH 0/2] Documentation updates for SHA-256
  2020-08-14  2:33 ` [PATCH 0/2] Documentation updates for SHA-256 Derrick Stolee
@ 2020-08-14  4:47   ` Junio C Hamano
  2020-08-14 20:20     ` brian m. carlson
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14  4:47 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: brian m. carlson, git, Martin Ågren

Derrick Stolee <stolee@gmail.com> writes:

> Is that really a concern? Maybe, but also Git will never move data like
> that.

I would say that we can safely say that this year ;-) as dumb HTTP
would be mostly dead.

> The main thing is to focus on compatibility of the .git directory
> as a whole (and the protocol, as we move into inter-operability mode).
>
> [1] https://lore.kernel.org/git/CAN0heSptiJL9d86ZeNPMUaZeTA68juwTyf3K-uWR=K-vt=1Hrg@mail.gmail.com/
>
>>  This series updates the pack and
>> index documentation to reflect that these formats can handle SHA-256,
>> and updates the transition plan to reflect what we've implemented and
>> what the next steps are.
>
> These patches are good to help clarify these formats in the new world.

Yup.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] docs: fix step in transition plan
  2020-08-13 22:49 ` [PATCH 2/2] docs: fix step in transition plan brian m. carlson
@ 2020-08-14 12:21   ` Martin Ågren
  0 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List

On Fri, 14 Aug 2020 at 00:49, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> One of the required steps for the objectFormat extension is to implement
> the loose object index.  However, without support for
> compatObjectFormat, we don't even know if the loose object index is
> needed, so it makes sense to move that step to the compatObjectFormat
> section.  Do so.

This makes sense to me. I know I thought out loud before that maybe
there's some intention here and more specifically, maybe we want to
*know* that this loose-object-idx is always there. But we'd still need
to tiptoe around it in a SHA-1 repo, so even if we'd know that all
proper SHA-256 repos have been generating such a file since day 1, that
probably wouldn't help us much in terms of implementation/fallback
strategies and whatnot.

Martin

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 0/5] more SHA-256 documentation
  2020-08-13 22:48 [PATCH 0/2] Documentation updates for SHA-256 brian m. carlson
                   ` (2 preceding siblings ...)
  2020-08-14  2:33 ` [PATCH 0/2] Documentation updates for SHA-256 Derrick Stolee
@ 2020-08-14 12:21 ` Martin Ågren
  2020-08-14 12:21   ` [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
                     ` (6 more replies)
  3 siblings, 7 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano

Hi brian,

On Fri, 14 Aug 2020 at 00:49, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>
> As was pointed out recently, some of our documentation doesn't properly
> reflect the new support for SHA-256.  This series updates the pack and
> index documentation to reflect that these formats can handle SHA-256,
> and updates the transition plan to reflect what we've implemented and
> what the next steps are.

Thanks, this looks great. Now we're making clear what it is we intend to
be doing.

What about these additional patches on top? These are based on my
understanding, but hopefully they're not *too* wrong. I'm a bit hesitant
about the final patch and it would be interesting to know what you
think.

Martin Ågren (5):
  http-protocol.txt: document SHA-256 "want"/"have" format
  index-format.txt: document SHA-256 index format
  protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
  shallow.txt: document SHA-256 shallow format
  commit-graph-format.txt: fix "Hash Version" description

 .../technical/commit-graph-format.txt         |  4 +--
 Documentation/technical/http-protocol.txt     |  5 ++--
 Documentation/technical/index-format.txt      | 27 ++++++++++---------
 .../technical/protocol-capabilities.txt       | 11 +++++---
 Documentation/technical/shallow.txt           |  2 +-
 5 files changed, 28 insertions(+), 21 deletions(-)

-- 
2.28.0.277.g9b3c35fffd


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
  2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
@ 2020-08-14 12:21   ` Martin Ågren
  2020-08-14 17:28     ` Junio C Hamano
  2020-08-14 12:21   ` [PATCH 2/5] index-format.txt: document SHA-256 index format Martin Ågren
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano

Document that in SHA-1 repositories, we use SHA-1 for "want"s and
"have"s, and in SHA-256 repositories, we use SHA-256.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Documentation/technical/http-protocol.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 51a79e63de..507f28f9b3 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -401,8 +401,9 @@ at all in the request stream:
 The stream is terminated by a pkt-line flush (`0000`).
 
 A single "want" or "have" command MUST have one hex formatted
-SHA-1 as its value.  Multiple SHA-1s MUST be sent by sending
-multiple commands.
+object name as its value.  Multiple object names MUST be sent by sending
+multiple commands. (An object name is a SHA-1 hash in a SHA-1 repo
+and a SHA-256 hash in a SHA-256 repo.)
 
 The `have` list is created by popping the first 32 commits
 from `c_pending`.  Less can be supplied if `c_pending` empties.
-- 
2.28.0.277.g9b3c35fffd


^ permalink raw reply	[flat|nested] 35+ messages in thread
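
As a small sketch of the rule in this patch, the snippet below frames a single "want" command as a pkt-line and checks that its value is one hex object name of the length the repository's hash implies. It assumes the usual pkt-line framing (four hex digits of length that count themselves) and leaves out the capability list that a first "want" line can carry; `pkt_line`, `want_line`, and `HEX_LENGTHS` are hypothetical names for illustration.

```python
import re

HEX_LENGTHS = {"sha1": 40, "sha256": 64}  # hex digits per object name


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its 4-hex-digit length (the 4 length bytes included)."""
    return b"%04x" % (len(payload) + 4) + payload


def want_line(object_name: str, repo_algo: str) -> bytes:
    """Build one "want" command carrying exactly one hex object name."""
    if not re.fullmatch(r"[0-9a-f]{%d}" % HEX_LENGTHS[repo_algo], object_name):
        raise ValueError(f"not a {repo_algo} object name: {object_name!r}")
    return pkt_line(b"want " + object_name.encode("ascii") + b"\n")


FLUSH_PKT = b"0000"  # terminates the stream
```

A 40-digit name is rejected when `repo_algo` is `"sha256"` and vice versa, mirroring the point that there is only one valid object-name format per repository.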

* [PATCH 2/5] index-format.txt: document SHA-256 index format
  2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
  2020-08-14 12:21   ` [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
@ 2020-08-14 12:21   ` Martin Ågren
  2020-08-14 12:28     ` Derrick Stolee
  2020-08-14 12:21   ` [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano

Similar to a recent commit, document that in SHA-1 repositories, we use
SHA-1 and in SHA-256 repositories, we use SHA-256, then replace all
other uses of "SHA-1" with something more neutral.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Documentation/technical/index-format.txt | 27 +++++++++++++-----------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index faa25c5c52..827ece2ed1 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -3,8 +3,11 @@ Git index format
 
 == The Git index file has the following format
 
-  All binary numbers are in network byte order. Version 2 is described
-  here unless stated otherwise.
+  All binary numbers are in network byte order.
+  In a repository using the traditional SHA-1, checksums and object IDs
+  (object names) mentioned below are all computed using SHA-1.  Similarly,
+  in SHA-256 repositories, these values are computed using SHA-256.
+  Version 2 is described here unless stated otherwise.
 
    - A 12-byte header consisting of
 
@@ -32,7 +35,7 @@ Git index format
 
      Extension data
 
-   - 160-bit SHA-1 over the content of the index file before this
+   - 160-bit hash checksum over the content of the index file before this
      checksum.
 
 == Index entry
@@ -80,7 +83,7 @@ Git index format
   32-bit file size
     This is the on-disk size from stat(2), truncated to 32-bit.
 
-  160-bit SHA-1 for the represented object
+  160-bit object name for the represented object
 
   A 16-bit 'flags' field split into (high to low bits)
 
@@ -211,8 +214,8 @@ Git index format
 
   The extension consists of:
 
-  - 160-bit SHA-1 of the shared index file. The shared index file path
-    is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
+  - Hash of the shared index file. The shared index file path
+    is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
     index does not require a shared index file.
 
   - An ewah-encoded delete bitmap, each bit represents an entry in the
@@ -253,10 +256,10 @@ Git index format
 
   - 32-bit dir_flags (see struct dir_struct)
 
-  - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
+  - Hash of $GIT_DIR/info/exclude. A null hash means the file
     does not exist.
 
-  - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
+  - Hash of core.excludesfile. A null hash means the file does
     not exist.
 
   - NUL-terminated string of per-dir exclude file name. This usually
@@ -285,13 +288,13 @@ The remaining data of each directory block is grouped by type:
   - An ewah bitmap, the n-th bit records "check-only" bit of
     read_directory_recursive() for the n-th directory.
 
-  - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
+  - An ewah bitmap, the n-th bit indicates whether hash and stat data
     is valid for the n-th directory and exists in the next data.
 
   - An array of stat data. The n-th data corresponds with the n-th
     "one" bit in the previous ewah bitmap.
 
-  - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
+  - An array of hashes. The n-th hash corresponds with the n-th "one" bit
     in the previous ewah bitmap.
 
   - One NUL.
@@ -330,12 +333,12 @@ The remaining data of each directory block is grouped by type:
 
   - 32-bit offset to the end of the index entries
 
-  - 160-bit SHA-1 over the extension types and their sizes (but not
+  - Hash over the extension types and their sizes (but not
 	their contents).  E.g. if we have "TREE" extension that is N-bytes
 	long, "REUC" extension that is M-bytes long, followed by "EOIE",
 	then the hash would be:
 
-	SHA-1("TREE" + <binary representation of N> +
+	Hash("TREE" + <binary representation of N> +
 		"REUC" + <binary representation of M>)
 
 == Index Entry Offset Table
-- 
2.28.0.277.g9b3c35fffd


^ permalink raw reply	[flat|nested] 35+ messages in thread
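
The hash in the last hunk above, over the extension types and their sizes, might be computed along these lines. This is a sketch only: it assumes that "binary representation of N" means the 32-bit extension size in network byte order (the index format stores extension sizes that way), and `eoie_hash` is a hypothetical helper rather than a Git function.

```python
import hashlib
import struct


def eoie_hash(extensions, hash_fn):
    """Hash each extension's 4-byte type and 32-bit size, not its contents."""
    h = hash_fn()
    for ext_type, size in extensions:
        h.update(ext_type + struct.pack(">I", size))  # network byte order
    return h.digest()


# "TREE" of N = 10 bytes followed by "REUC" of M = 20 bytes, as in the example.
sha1_value = eoie_hash([(b"TREE", 10), (b"REUC", 20)], hashlib.sha1)
sha256_value = eoie_hash([(b"TREE", 10), (b"REUC", 20)], hashlib.sha256)
```

Passing the hash constructor in as a parameter matches the neutral wording of the patch: the same computation gives a 20-byte value in a SHA-1 repository and a 32-byte value in a SHA-256 one.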

* [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
  2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
  2020-08-14 12:21   ` [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
  2020-08-14 12:21   ` [PATCH 2/5] index-format.txt: document SHA-256 index format Martin Ågren
@ 2020-08-14 12:21   ` Martin Ågren
  2020-08-14 12:31     ` Derrick Stolee
  2020-08-14 17:33     ` Junio C Hamano
  2020-08-14 12:21   ` [PATCH 4/5] shallow.txt: document SHA-256 shallow format Martin Ågren
                     ` (3 subsequent siblings)
  6 siblings, 2 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano

Two of our extensions contain "sha1" in their names, but that's
historical. The "want"s will take object names that are not necessarily
SHA-1s. Make this clear, but also make it clear how there's still just
one correct hash algo: These extensions don't somehow make the "want"s
take object names derived using *any* hash algorithm.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Documentation/technical/protocol-capabilities.txt | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 36ccd14f97..47f1b30090 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -324,15 +324,18 @@ allow-tip-sha1-in-want
 ----------------------
 
 If the upload-pack server advertises this capability, fetch-pack may
-send "want" lines with SHA-1s that exist at the server but are not
-advertised by upload-pack.
+send "want" lines with object names that exist at the server but are not
+advertised by upload-pack. (Note that the name of the capability
+contains "sha1", but that it's more general than that: in SHA-1
+repositories, the "want" lines provide SHA-1 values, but in SHA-256
+repositories, they provide SHA-256 values.)
 
 allow-reachable-sha1-in-want
 ----------------------------
 
 If the upload-pack server advertises this capability, fetch-pack may
-send "want" lines with SHA-1s that exist at the server but are not
-advertised by upload-pack.
+send "want" lines with object names that exist at the server but are not
+advertised by upload-pack. (Same remark about "sha1" as above.)
 
 push-cert=<nonce>
 -----------------
-- 
2.28.0.277.g9b3c35fffd


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 4/5] shallow.txt: document SHA-256 shallow format
  2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
                     ` (2 preceding siblings ...)
  2020-08-14 12:21   ` [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
@ 2020-08-14 12:21   ` Martin Ågren
  2020-08-14 12:21   ` [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description Martin Ågren
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano

Similar to recent commits, document that in SHA-1 repositories, we use
SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Documentation/technical/shallow.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt
index 01dedfe9ff..f3738baa0f 100644
--- a/Documentation/technical/shallow.txt
+++ b/Documentation/technical/shallow.txt
@@ -13,7 +13,7 @@ pretend as if they are root commits (e.g. "git log" traversal
 stops after showing them; "git fsck" does not complain saying
 the commits listed on their "parent" lines do not exist).
 
-Each line contains exactly one SHA-1. When read, a commit_graft
+Each line contains exactly one object name. When read, a commit_graft
 will be constructed, which has nr_parent < 0 to make it easier
 to discern from user provided grafts.
 
-- 
2.28.0.277.g9b3c35fffd


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description
  2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
                     ` (3 preceding siblings ...)
  2020-08-14 12:21   ` [PATCH 4/5] shallow.txt: document SHA-256 shallow format Martin Ågren
@ 2020-08-14 12:21   ` Martin Ågren
  2020-08-14 12:37     ` Derrick Stolee
  2020-08-14 20:28   ` [PATCH 0/5] more SHA-256 documentation brian m. carlson
  2020-08-15 16:05   ` [PATCH v2 0/4] " Martin Ågren
  6 siblings, 1 reply; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 12:21 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Derrick Stolee, Junio C Hamano

We say that value 1 means "SHA-1", but in fact, it means "whatever
the_hash_algo is", see commit c166599862 ("commit-graph: convert to
using the_hash_algo", 2018-11-14).

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 If we want to be more fine-grained in the future, we'll need to say,
 e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferably, bump the
 version number.

 I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It
 could be implemented as "easily" as "if (value_from_header !=
 value_from_the_hash_algo) die(...);" for now. Might that pay off in the
 long run?

 This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're
 fine.

 [1] https://lore.kernel.org/git/da077fb0-14bb-b84f-c526-d759ebc9f5eb@gmail.com/

 Documentation/technical/commit-graph-format.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 440541045d..3535426d32 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -42,8 +42,8 @@ HEADER:
   1-byte version number:
       Currently, the only valid version is 1.
 
-  1-byte Hash Version (1 = SHA-1)
-      We infer the hash length (H) from this value.
+  1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo)
+      We infer the hash length (H) from the hash algo derived from this value.
 
   1-byte number (C) of "chunks"
 
-- 
2.28.0.277.g9b3c35fffd



* Re: [PATCH 2/5] index-format.txt: document SHA-256 index format
  2020-08-14 12:21   ` [PATCH 2/5] index-format.txt: document SHA-256 index format Martin Ågren
@ 2020-08-14 12:28     ` Derrick Stolee
  2020-08-14 14:05       ` Martin Ågren
  0 siblings, 1 reply; 35+ messages in thread
From: Derrick Stolee @ 2020-08-14 12:28 UTC (permalink / raw)
  To: Martin Ågren, brian m. carlson; +Cc: git, Junio C Hamano

On 8/14/2020 8:21 AM, Martin Ågren wrote:
> Similar to a recent commit, document that in SHA-1 repositories, we use
> SHA-1 and in SHA-256 repositories, we use SHA-256, then replace all
> other uses of "SHA-1" with something more neutral.
> 
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  Documentation/technical/index-format.txt | 27 +++++++++++++-----------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
> index faa25c5c52..827ece2ed1 100644
> --- a/Documentation/technical/index-format.txt
> +++ b/Documentation/technical/index-format.txt
> @@ -3,8 +3,11 @@ Git index format
>  
>  == The Git index file has the following format
>  
> -  All binary numbers are in network byte order. Version 2 is described
> -  here unless stated otherwise.
> +  All binary numbers are in network byte order.
> +  In a repository using the traditional SHA-1, checksums and object IDs
> +  (object names) mentioned below are all computed using SHA-1.  Similarly,
> +  in SHA-256 repositories, these values are computed using SHA-256.
> +  Version 2 is described here unless stated otherwise.
>  
>     - A 12-byte header consisting of
>  
> @@ -32,7 +35,7 @@ Git index format
>  
>       Extension data
>  
> -   - 160-bit SHA-1 over the content of the index file before this
> +   - 160-bit hash checksum over the content of the index file before this
>       checksum.

If this hash is flexible, then "160-bit" is not correct anymore, right?

>  == Index entry
> @@ -80,7 +83,7 @@ Git index format
>    32-bit file size
>      This is the on-disk size from stat(2), truncated to 32-bit.
>  
> -  160-bit SHA-1 for the represented object
> +  160-bit object name for the represented object

Same here. The later instances of "160-bit" were dropped.
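
To make the size point concrete, here is a minimal sketch of how the
width of these fields follows from the hash algorithm. The names are
hypothetical stand-ins for illustration and are not Git's own
identifiers; only the sizes themselves (20 bytes for SHA-1, 32 bytes
for SHA-256) are facts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for this sketch; not Git's own names. */
enum sketch_hash_algo {
	SKETCH_HASH_SHA1,
	SKETCH_HASH_SHA256
};

/* Size in bytes of a raw (binary) hash value. */
static size_t sketch_hash_rawsz(enum sketch_hash_algo algo)
{
	assert(algo == SKETCH_HASH_SHA1 || algo == SKETCH_HASH_SHA256);
	return algo == SKETCH_HASH_SHA256 ? 32 : 20;
}

/* The same size in bits: 160 for SHA-1, 256 for SHA-256. */
static size_t sketch_hash_bits(enum sketch_hash_algo algo)
{
	return sketch_hash_rawsz(algo) * 8;
}
```

With a helper like this, a fixed "160-bit" only describes the SHA-1
case, which is why a neutral wording without the bit count fits both
kinds of repository.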

>    A 16-bit 'flags' field split into (high to low bits)
>  
> @@ -211,8 +214,8 @@ Git index format
>  
>    The extension consists of:
>  
> -  - 160-bit SHA-1 of the shared index file. The shared index file path
> -    is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
> +  - Hash of the shared index file. The shared index file path
> +    is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
>      index does not require a shared index file.
>  
>    - An ewah-encoded delete bitmap, each bit represents an entry in the
> @@ -253,10 +256,10 @@ Git index format
>  
>    - 32-bit dir_flags (see struct dir_struct)
>  
> -  - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
> +  - Hash of $GIT_DIR/info/exclude. A null hash means the file
>      does not exist.
>  
> -  - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
> +  - Hash of core.excludesfile. A null hash means the file does
>      not exist.
>  
>    - NUL-terminated string of per-dir exclude file name. This usually
> @@ -285,13 +288,13 @@ The remaining data of each directory block is grouped by type:
>    - An ewah bitmap, the n-th bit records "check-only" bit of
>      read_directory_recursive() for the n-th directory.
>  
> -  - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
> +  - An ewah bitmap, the n-th bit indicates whether hash and stat data
>      is valid for the n-th directory and exists in the next data.
>  
>    - An array of stat data. The n-th data corresponds with the n-th
>      "one" bit in the previous ewah bitmap.
>  
> -  - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
> +  - An array of hashes. The n-th hash corresponds with the n-th "one" bit
>      in the previous ewah bitmap.
>  
>    - One NUL.
> @@ -330,12 +333,12 @@ The remaining data of each directory block is grouped by type:
>  
>    - 32-bit offset to the end of the index entries
>  
> -  - 160-bit SHA-1 over the extension types and their sizes (but not
> +  - Hash over the extension types and their sizes (but not
>  	their contents).  E.g. if we have "TREE" extension that is N-bytes
>  	long, "REUC" extension that is M-bytes long, followed by "EOIE",
>  	then the hash would be:
>  
> -	SHA-1("TREE" + <binary representation of N> +
> +	Hash("TREE" + <binary representation of N> +
>  		"REUC" + <binary representation of M>)
>  
>  == Index Entry Offset Table
> 

Thanks,
-Stolee




* Re: [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
  2020-08-14 12:21   ` [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
@ 2020-08-14 12:31     ` Derrick Stolee
  2020-08-14 14:05       ` Martin Ågren
  2020-08-14 17:33     ` Junio C Hamano
  1 sibling, 1 reply; 35+ messages in thread
From: Derrick Stolee @ 2020-08-14 12:31 UTC (permalink / raw)
  To: Martin Ågren, brian m. carlson; +Cc: git, Junio C Hamano

On 8/14/2020 8:21 AM, Martin Ågren wrote:
> Two of our extensions contain "sha1" in their names, but that's
> historical. The "want"s will take object names that are not necessarily
> SHA-1s. Make this clear, but also make it clear how there's still just
> one correct hash algo: These extensions don't somehow make the "want"s
> take object names derived using *any* hash algorithm.
> 
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  Documentation/technical/protocol-capabilities.txt | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
> index 36ccd14f97..47f1b30090 100644
> --- a/Documentation/technical/protocol-capabilities.txt
> +++ b/Documentation/technical/protocol-capabilities.txt
> @@ -324,15 +324,18 @@ allow-tip-sha1-in-want
>  ----------------------
>  
>  If the upload-pack server advertises this capability, fetch-pack may
> -send "want" lines with SHA-1s that exist at the server but are not
> -advertised by upload-pack.
> +send "want" lines with object names that exist at the server but are not
> +advertised by upload-pack. (Note that the name of the capability
> +contains "sha1", but that it's more general than that: in SHA-1
> +repositories, the "want" lines provide SHA-1 values, but in SHA-256
> +repositories, they provide SHA-256 values.)
>  
>  allow-reachable-sha1-in-want
>  ----------------------------
>  
>  If the upload-pack server advertises this capability, fetch-pack may
> -send "want" lines with SHA-1s that exist at the server but are not
> -advertised by upload-pack.
> +send "want" lines with object names that exist at the server but are not
> +advertised by upload-pack. (Same remark about "sha1" as above.)

This "as above" is brittle to future changes. I think it
could be improved with

	(As in "allow-tip-sha1-in-want", the "sha1" in this capability
	refers to object names, not the hash algorithm chosen for the
	repository.)

Or, just repeat the same note again.

Thanks,
-Stolee


* Re: [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description
  2020-08-14 12:21   ` [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description Martin Ågren
@ 2020-08-14 12:37     ` Derrick Stolee
  2020-08-14 14:10       ` Martin Ågren
  0 siblings, 1 reply; 35+ messages in thread
From: Derrick Stolee @ 2020-08-14 12:37 UTC (permalink / raw)
  To: Martin Ågren, brian m. carlson
  Cc: git, Junio C Hamano, Taylor Blau, Abhishek Kumar

On 8/14/2020 8:21 AM, Martin Ågren wrote:
> We say that value 1 means "SHA-1", but in fact, it means "whatever
> the_hash_algo is", see commit c166599862 ("commit-graph: convert to
> using the_hash_algo", 2018-11-14).
> 
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  If we want to be more fine-grained in the future, we'll need to say,
>  e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferably, bump the
>  version number.
> 
>  I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It
>  could be implemented as "easily" as "if (value_from_header !=
>  value_from_the_hash_algo) die(...);" for now. Might that pay off in the
>  long run?
> 
>  This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're
>  fine.

I think that was the intention of the byte, but that is not what ended
up happening. If we want that to be the case, then we should do that
work as part of the 2.29 cycle before we release with the ability to
create SHA-256 repos (which will lock the commit-graph format for these
repos).

(By "we" I mean that I would try to do this work in a way that minimizes
conflicts with the current commit-graph work in flight [1] [2].)

[1] https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@gmail.com/

[2] https://lore.kernel.org/git/cover.1597178914.git.me@ttaylorr.com/

>  [1] https://lore.kernel.org/git/da077fb0-14bb-b84f-c526-d759ebc9f5eb@gmail.com/
> 
>  Documentation/technical/commit-graph-format.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
> index 440541045d..3535426d32 100644
> --- a/Documentation/technical/commit-graph-format.txt
> +++ b/Documentation/technical/commit-graph-format.txt
> @@ -42,8 +42,8 @@ HEADER:
>    1-byte version number:
>        Currently, the only valid version is 1.
>  
> -  1-byte Hash Version (1 = SHA-1)
> -      We infer the hash length (H) from this value.
> +  1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo)
> +      We infer the hash length (H) from the hash algo derived from this value.

If we are _not_ changing the format to have a meaningful value in
this byte, then this documentation should be updated to state that
this byte must always have value 1, as it does not provide any
information.

Thanks,
-Stolee


* Re: [PATCH 2/5] index-format.txt: document SHA-256 index format
  2020-08-14 12:28     ` Derrick Stolee
@ 2020-08-14 14:05       ` Martin Ågren
  0 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 14:05 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: brian m. carlson, Git Mailing List, Junio C Hamano

On Fri, 14 Aug 2020 at 14:28, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/14/2020 8:21 AM, Martin Ågren wrote:
> > -   - 160-bit SHA-1 over the content of the index file before this
> > +   - 160-bit hash checksum over the content of the index file before this
> >       checksum.
>
> If this hash is flexible, then "160-bit" is not correct anymore, right?
>
> > -  160-bit SHA-1 for the represented object
> > +  160-bit object name for the represented object
>
> Same here. The later instances of "160-bit" were dropped.

Thanks for pointing out these errors.


Martin


* Re: [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
  2020-08-14 12:31     ` Derrick Stolee
@ 2020-08-14 14:05       ` Martin Ågren
  0 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 14:05 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: brian m. carlson, Git Mailing List, Junio C Hamano

On Fri, 14 Aug 2020 at 14:31, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/14/2020 8:21 AM, Martin Ågren wrote:
> >
> >  If the upload-pack server advertises this capability, fetch-pack may
> > -send "want" lines with SHA-1s that exist at the server but are not
> > -advertised by upload-pack.
> > +send "want" lines with object names that exist at the server but are not
> > +advertised by upload-pack. (Note that the name of the capability
> > +contains "sha1", but that it's more general than that: in SHA-1
> > +repositories, the "want" lines provide SHA-1 values, but in SHA-256
> > +repositories, they provide SHA-256 values.)
> >
> >  allow-reachable-sha1-in-want
> >  ----------------------------
> >
> >  If the upload-pack server advertises this capability, fetch-pack may
> > -send "want" lines with SHA-1s that exist at the server but are not
> > -advertised by upload-pack.
> > +send "want" lines with object names that exist at the server but are not
> > +advertised by upload-pack. (Same remark about "sha1" as above.)
>
> This "as above" is brittle to future changes.

Fair enough. :-) I actually thought this might be *less* brittle, since
we wouldn't need to do any additional changes twice.

> I think it
> could be improved with
>
>         (As in "allow-tip-sha1-in-want", the "sha1" in this capability
>         refers to object names, not the hash algorithm chosen for the
>         repository.)
>
> Or, just repeat the same note again.

These two paragraphs are identical before this patch, so it might make
sense not to change that property. Thanks.


Martin


* Re: [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description
  2020-08-14 12:37     ` Derrick Stolee
@ 2020-08-14 14:10       ` Martin Ågren
  0 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 14:10 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: brian m. carlson, Git Mailing List, Junio C Hamano, Taylor Blau,
	Abhishek Kumar

On Fri, 14 Aug 2020 at 14:37, Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/14/2020 8:21 AM, Martin Ågren wrote:
> > We say that value 1 means "SHA-1", but in fact, it means "whatever
> > the_hash_algo is", see commit c166599862 ("commit-graph: convert to
> > using the_hash_algo", 2018-11-14).
> >
> > Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> > ---
> >  If we want to be more fine-grained in the future, we'll need to say,
> >  e.g., "2 means SHA-1, 3 means SHA-256" or, perhaps preferably, bump the
> >  version number.
> >
> >  I wonder: Should we instead say "1 means SHA-1, 2 means SHA-256"? It
> >  could be implemented as "easily" as "if (value_from_header !=
> >  value_from_the_hash_algo) die(...);" for now. Might that pay off in the
> >  long run?
> >
> >  This relates to Stolee's "in a vacuum" comment [1] ... so maybe we're
> >  fine.
>
> I think that was the intention of the byte, but that is not what ended
> up happening.

When I wrote this patch, I did go with "fix <thing>" rather than
"document SHA-256". For the other patches, it's sort of obvious how
those formats are so old that it's no wonder they assumed SHA-1. But
here, we did go to some trouble to try and future-proof things and
already had NewHash in mind. So that calls for "fix <thing>". But I'm
more and more starting to think that it's the implementation that should
be fixed and that the docs should just be extended to add "2 means
SHA-256".

> If we want that to be the case, then we should do that
> work as part of the 2.29 cycle before we release with the ability to
> create SHA-256 repos (which will lock the commit-graph format for these
> repos).

Part of my reasoning behind [3] was that in exactly a situation like this,
we'd be able to say

  With extensions.objectFormat=sha256, Git 2.29-2.30 will barf at the
  commit-graph files that Git 2.31+ generates, and the other way around.
  Users will be able to remove "old" files and regenerate them, and
  shouldn't use a mixed environment.

and know that those users knew this might happen.

But certainly, if we can avoid it altogether by changing behavior
already in 2.29, that would be better.

[3] https://lore.kernel.org/git/20200806202358.2265705-1-martin.agren@gmail.com/

> (By "we" I mean that I would try to do this work in a way that minimizes
> conflicts with the current commit-graph work in flight [1] [2].)

None of those seems to touch `oid_version()`, so if we can just tweak
that function to return 1 or 2 instead of always 1, I guess that's one
way.
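
A minimal sketch of that idea, under the assumption that the header
byte becomes 1 for SHA-1 and 2 for SHA-256. The enum and the function
names below are hypothetical stand-ins, not the actual commit-graph
code; the comparison mirrors the "value_from_header !=
value_from_the_hash_algo" check mentioned in the patch notes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical identifiers for this sketch; not Git's own names. */
enum sketch_hash_algo {
	SKETCH_HASH_SHA1,
	SKETCH_HASH_SHA256
};

/* Header byte for the given algorithm: 1 for SHA-1, 2 for SHA-256. */
static uint8_t sketch_oid_version(enum sketch_hash_algo algo)
{
	assert(algo == SKETCH_HASH_SHA1 || algo == SKETCH_HASH_SHA256);
	return algo == SKETCH_HASH_SHA256 ? 2 : 1;
}

/*
 * Return 1 if the byte read from the file header matches the
 * repository's hash algorithm, 0 otherwise.  A real reader would
 * refuse to use the file when this returns 0.
 */
static int sketch_hash_version_ok(uint8_t value_from_header,
				  enum sketch_hash_algo repo_algo)
{
	return value_from_header == sketch_oid_version(repo_algo);
}
```

The point of the check is that a file written for one algorithm is
rejected up front in a repository using the other, rather than being
misparsed with the wrong hash length.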

> [1] https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@gmail.com/
>
> [2] https://lore.kernel.org/git/cover.1597178914.git.me@ttaylorr.com/
>
> >
> > -  1-byte Hash Version (1 = SHA-1)
> > -      We infer the hash length (H) from this value.
> > +  1-byte Hash Version (1 = SHA-1 in SHA-1 repo, SHA-256 in SHA-256 repo)
> > +      We infer the hash length (H) from the hash algo derived from this value.
>
> If we are _not_ changing the format to have a meaningful value in
> this byte, then this documentation should be updated to state that
> this byte must always have value 1, as it does not provide any
> information.

We could still go

  1 means whatever extensions.objectFormat says,
  2 means SHA-1,
  3 means SHA-256,
  ...

But maybe that would just be crazy.

Thanks for all your comments
Martin


* Re: [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
  2020-08-14 12:21   ` [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
@ 2020-08-14 17:28     ` Junio C Hamano
  2020-08-14 20:23       ` brian m. carlson
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 17:28 UTC (permalink / raw)
  To: Martin Ågren; +Cc: brian m. carlson, git, Derrick Stolee

Martin Ågren <martin.agren@gmail.com> writes:

> Document that in SHA-1 repositories, we use SHA-1 for "want"s and
> "have"s, and in SHA-256 repositories, we use SHA-256.

Ehh, doesn't this directly contradict the transition plan of "on the
wire everything will use SHA-1 version for now?"



> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  Documentation/technical/http-protocol.txt | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
> index 51a79e63de..507f28f9b3 100644
> --- a/Documentation/technical/http-protocol.txt
> +++ b/Documentation/technical/http-protocol.txt
> @@ -401,8 +401,9 @@ at all in the request stream:
>  The stream is terminated by a pkt-line flush (`0000`).
>  
>  A single "want" or "have" command MUST have one hex formatted
> -SHA-1 as its value.  Multiple SHA-1s MUST be sent by sending
> -multiple commands.
> +object name as its value.  Multiple object names MUST be sent by sending
> +multiple commands. (An object name is a SHA-1 hash in a SHA-1 repo
> +and a SHA-256 hash in a SHA-256 repo.)
>  
>  The `have` list is created by popping the first 32 commits
>  from `c_pending`.  Less can be supplied if `c_pending` empties.


* Re: [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
  2020-08-14 12:21   ` [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
  2020-08-14 12:31     ` Derrick Stolee
@ 2020-08-14 17:33     ` Junio C Hamano
  2020-08-14 20:35       ` Martin Ågren
  1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 17:33 UTC (permalink / raw)
  To: Martin Ågren; +Cc: brian m. carlson, git, Derrick Stolee

Martin Ågren <martin.agren@gmail.com> writes:

> Two of our extensions contain "sha1" in their names, but that's
> historical. The "want"s will take object names that are not necessarily
> SHA-1s. Make this clear, but also make it clear how there's still just
> one correct hash algo: These extensions don't somehow make the "want"s
> take object names derived using *any* hash algorithm.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  Documentation/technical/protocol-capabilities.txt | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
> index 36ccd14f97..47f1b30090 100644
> --- a/Documentation/technical/protocol-capabilities.txt
> +++ b/Documentation/technical/protocol-capabilities.txt
> @@ -324,15 +324,18 @@ allow-tip-sha1-in-want
>  ----------------------
>  
>  If the upload-pack server advertises this capability, fetch-pack may
> -send "want" lines with SHA-1s that exist at the server but are not
> -advertised by upload-pack.
> +send "want" lines with object names that exist at the server but are not
> +advertised by upload-pack. (Note that the name of the capability
> +contains "sha1", but that it's more general than that: in SHA-1
> +repositories, the "want" lines provide SHA-1 values, but in SHA-256
> +repositories, they provide SHA-256 values.)

I think we should have either a new sha256 capability or a more
generic hash-algo capability whose value can be set to sha256.
Neither the connection initiators nor the acceptors should talk
in sha256 until both ends have agreed to do so.

I cannot think of any other way to make sure hosting sites can serve
projects that migrate at different paces.  Per project, you might be
able to have a flag day.  You cannot have a flag day that spans the
world.



* Re: [PATCH 0/2] Documentation updates for SHA-256
  2020-08-14  4:47   ` Junio C Hamano
@ 2020-08-14 20:20     ` brian m. carlson
  2020-08-14 20:25       ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: brian m. carlson @ 2020-08-14 20:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git, Martin Ågren

On 2020-08-14 at 04:47:19, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
> > Is that really a concern? Maybe, but also Git will never move data like
> > that.
> 
> I would say that we can safely say that this year ;-) as dumb HTTP
> would be mostly dead.

We do fetch the refs first for dumb HTTP so last I checked, we correctly
detected this case and failed.  I'd personally be happy to let the
DAV-based protocol die, but there are folks who like it.
-- 
brian m. carlson: Houston, Texas, US


* Re: [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
  2020-08-14 17:28     ` Junio C Hamano
@ 2020-08-14 20:23       ` brian m. carlson
  2020-08-14 20:32         ` Martin Ågren
  2020-08-14 20:39         ` Junio C Hamano
  0 siblings, 2 replies; 35+ messages in thread
From: brian m. carlson @ 2020-08-14 20:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Ågren, git, Derrick Stolee

On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
> Martin Ågren <martin.agren@gmail.com> writes:
> 
> > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
> > "have"s, and in SHA-256 repositories, we use SHA-256.
> 
> Ehh, doesn't this directly contradict the transition plan of "on the
> wire everything will use SHA-1 version for now?"

SHA-256 repositories interoperate currently using SHA-256 object IDs.
It was originally intended that we wouldn't update the protocol, but
that leads to much of the testsuite failing since it's impossible to
move objects from one place to another.

If we wanted to be more pedantically correct and optimize for the
future, we could say that the values use the format negotiated by the
"object-format" protocol extension and SHA-1 otherwise.
-- 
brian m. carlson: Houston, Texas, US


* Re: [PATCH 0/2] Documentation updates for SHA-256
  2020-08-14 20:20     ` brian m. carlson
@ 2020-08-14 20:25       ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 20:25 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Derrick Stolee, git, Martin Ågren

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2020-08-14 at 04:47:19, Junio C Hamano wrote:
>> Derrick Stolee <stolee@gmail.com> writes:
>> 
>> > Is that really a concern? Maybe, but also Git will never move data like
>> > that.
>> 
>> I would say that we can safely say that this year ;-) as dumb HTTP
>> would be mostly dead.
>
> We do fetch the refs first for dumb HTTP so last I checked, we correctly
> detected this case and failed.  I'd personally be happy to let the
> DAV-based protocol die, but there are folks who like it.

I didn't mean DAV.  

The oldest dumb HTTP code grabs all packfiles listed in
objects/info/packs and there is nothing to prevent folks from
running the current client to fetch from SHA-256 repository into a
SHA-1 repository.  The resulting packfiles, which carry no version
number identifying the hash they use, would be very hard to use.




* Re: [PATCH 0/5] more SHA-256 documentation
  2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
                     ` (4 preceding siblings ...)
  2020-08-14 12:21   ` [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description Martin Ågren
@ 2020-08-14 20:28   ` brian m. carlson
  2020-08-15 16:05   ` [PATCH v2 0/4] " Martin Ågren
  6 siblings, 0 replies; 35+ messages in thread
From: brian m. carlson @ 2020-08-14 20:28 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git, Derrick Stolee, Junio C Hamano

On 2020-08-14 at 12:21:41, Martin Ågren wrote:
> Hi brian,
> 
> On Fri, 14 Aug 2020 at 00:49, brian m. carlson <sandals@crustytoothpaste.net> wrote:
> >
> > As was pointed out recently, some of our documentation doesn't properly
> > reflect the new support for SHA-256.  This series updates the pack and
> > index documentation to reflect that these formats can handle SHA-256,
> > and updates the transition plan to reflect what we've implemented and
> > what the next steps are.
> 
> Thanks, this looks great. Now we're making clear what it is we intend to
> be doing.
> 
> What about these additional patches on top? These are based on my
> understanding, but hopefully they're not *too* wrong. I'm a bit hesitant
> about the final patch and it would be interesting to know what you
> think.

I think Stolee has a series so that the final patch isn't necessary, and
other than the things he mentioned in this thread, I think these would
be fine on top.
-- 
brian m. carlson: Houston, Texas, US


* Re: [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
  2020-08-14 20:23       ` brian m. carlson
@ 2020-08-14 20:32         ` Martin Ågren
  2020-08-14 20:39         ` Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 20:32 UTC (permalink / raw)
  To: brian m. carlson, Junio C Hamano, Martin Ågren,
	Git Mailing List, Derrick Stolee

On Fri, 14 Aug 2020 at 22:23, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
> > Martin Ågren <martin.agren@gmail.com> writes:
> >
> > > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
> > > "have"s, and in SHA-256 repositories, we use SHA-256.
> >
> > Ehh, doesn't this directly contradict the transition plan of "on the
> > wire everything will use SHA-1 version for now?"

Yes, the transition plan would probably need updating there. I'm just
trying to document what we have.

> SHA-256 repositories interoperate currently using SHA-256 object IDs.
> It was originally intended that we wouldn't update the protocol, but
> that leads to much of the testsuite failing since it's impossible to
> move objects from one place to another.
>
> If we wanted to be more pedantically correct and optimize for the
> future, we could say that the values use the format negotiated by the
> "object-format" protocol extension and SHA-1 otherwise.

Hmm, I didn't think of that. Would we ever regret that we've painted
such a "big" picture and wish to refine it somehow? The alternative is
to be admittedly fairly narrow, as I am here, and loosen things later.
I'll think about it, but I think I could go either way.

Martin


* Re: [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
  2020-08-14 17:33     ` Junio C Hamano
@ 2020-08-14 20:35       ` Martin Ågren
  2020-08-14 20:43         ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Martin Ågren @ 2020-08-14 20:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: brian m. carlson, Git Mailing List, Derrick Stolee

On Fri, 14 Aug 2020 at 19:33, Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin Ågren <martin.agren@gmail.com> writes:
>
> > Two of our extensions contain "sha1" in their names, but that's
> > historical. The "want"s will take object names that are not necessarily
> > SHA-1s. Make this clear, but also make it clear how there's still just
> > one correct hash algo: These extensions don't somehow make the "want"s
> > take object names derived using *any* hash algorithm.
> >
> > Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> > ---
> >  Documentation/technical/protocol-capabilities.txt | 11 +++++++----
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
> > index 36ccd14f97..47f1b30090 100644
> > --- a/Documentation/technical/protocol-capabilities.txt
> > +++ b/Documentation/technical/protocol-capabilities.txt
> > @@ -324,15 +324,18 @@ allow-tip-sha1-in-want
> >  ----------------------
> >
> >  If the upload-pack server advertises this capability, fetch-pack may
> > -send "want" lines with SHA-1s that exist at the server but are not
> > -advertised by upload-pack.
> > +send "want" lines with object names that exist at the server but are not
> > +advertised by upload-pack. (Note that the name of the capability
> > +contains "sha1", but that it's more general than that: in SHA-1
> > +repositories, the "want" lines provide SHA-1 values, but in SHA-256
> > +repositories, they provide SHA-256 values.)
>
> I think we should have either a new sha256 capability or a more
> generic hash-algo capability whose value can be set to sha256.
> Neither the connection initiators nor the acceptors should talk
> in sha256 until both ends have agreed to do so.

I think we should, and I think we do. I haven't dug into the details,
but "object-format" looks like it's just that.

Maybe instead of SHA-1 and SHA-256, this should talk about "whatever has
been negotiated through 'object-format', or SHA-1", similar to brian's
suggestion elsewhere.

> I cannot think of any other way to make sure hosting sites can serve
> projects that migrate at a different pace.  Per project, you might be
> able to have a flag day.  You cannot have a flag day that spans the
> world.

Yeah, that makes sense.


Martin


* Re: [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
  2020-08-14 20:23       ` brian m. carlson
  2020-08-14 20:32         ` Martin Ågren
@ 2020-08-14 20:39         ` Junio C Hamano
  2020-08-14 20:47           ` Junio C Hamano
  1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 20:39 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Martin Ågren, git, Derrick Stolee

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
>> Martin Ågren <martin.agren@gmail.com> writes:
>> 
>> > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
>> > "have"s, and in SHA-256 repositories, we use SHA-256.
>> 
>> Ehh, doesn't this directly contradict the transition plan of "on the
>> wire everything will use SHA-1 version for now?"
>
> SHA-256 repositories interoperate currently using SHA-256 object IDs.
> It was originally intended that we wouldn't update the protocol, but
> that leads to much of the testsuite failing since it's impossible to
> move objects from one place to another.
>
> If we wanted to be more pedantically correct and optimize for the
> future, we could say that the values use the format negotiated by the
> "object-format" protocol extension and SHA-1 otherwise.

Yup.  I think a reasonable evolution path is

    0) everything on the wire is SHA-1 and no local operation knows
       SHA-256 (i.e. a few releases ago)

    1) local operations are either SHA-1 or SHA-256 but not both.
       On the wire, only the protocol for SHA-1 repositories is
       defined, so SHA-256 repositories cannot talk with anybody
       using any official protocol, but a "borked" SHA-1 protocol
       that naturally extends the object name width exists and
       SHA-256 repositories can interoperate with each other.  This
       will be a backward compatibility nightmare, as Git in a
       SHA-256 repository that tries to talk to a SHA-1 repository
       will fail, but without grace (i.e. the current situation).

    2) on-the-wire protocol gains just one new capability to safely
       unleash SHA-256 repositories to talk to the wider world.  The
       "borked" SHA-1 protocol above will become official when the
       object-format=sha256 capability is negotiated by both ends.
       At this stage, SHA-256 repositories still cannot talk with
       SHA-1 repositories, but at least they can talk among
       themselves as long as they use a new-enough version of Git that
       knows about the new capability.

    3) on-the-fly SHA-1 vs SHA-256 migration gets implemented.
       SHA-256 repositories trying to talk to somebody else, after
       discovering that the other end lacks the object-format=sha256
       capability, convert their SHA-256 objects to SHA-1 on the fly
       and vice versa.  Between SHA-256 repositories, the capability
       above in 2) will allow native conversation with SHA-256.
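
A minimal sketch of the check that stage 2) implies, assuming the
peer's capability list has already been split into strings.  The names
advertised_format, negotiate and ObjectFormatMismatch are hypothetical
and are not Git's API; the only detail taken from this thread is that a
missing object-format capability means SHA-1.

```python
# Illustrative only: these names are assumptions, not part of Git.
DEFAULT_FORMAT = "sha1"


class ObjectFormatMismatch(Exception):
    """Raised when the two ends do not use the same object format."""


def advertised_format(capabilities):
    """Return the format named by 'object-format=<algo>', else SHA-1."""
    for cap in capabilities:
        name, sep, value = cap.partition("=")
        if name == "object-format" and sep:
            return value
    # No object-format capability: the peer is assumed to speak SHA-1.
    return DEFAULT_FORMAT


def negotiate(local_format, remote_capabilities):
    """Agree on an object format or fail before any object name is sent."""
    remote_format = advertised_format(remote_capabilities)
    if remote_format != local_format:
        raise ObjectFormatMismatch(
            f"local repository uses {local_format}, "
            f"remote uses {remote_format}"
        )
    return local_format
```

With this shape, a SHA-256 repository talking to a peer that never
mentions object-format fails up front instead of sending object names
the peer cannot parse.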

Reaching 3) may be a lot of work, but we should at least get to 2)
so that SHA-256 repositories can safely talk to the outside world.
(Yes, I consider it OK for SHA-256 repositories to talk among
themselves in a private setting in the current state.)  Getting to 2)
would be a good milestone and also a test towards the eventual goal
of reaching 3), with much smaller effort.

Thanks.



* Re: [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
  2020-08-14 20:35       ` Martin Ågren
@ 2020-08-14 20:43         ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 20:43 UTC (permalink / raw)
  To: Martin Ågren; +Cc: brian m. carlson, Git Mailing List, Derrick Stolee

Martin Ågren <martin.agren@gmail.com> writes:

>> I think we should have either a new sha256 capability or a more
>> generic hash-algo capability whose value can be set to sha256.
>> Neither the connection initiators nor the acceptors should talk
>> in sha256 until both ends have agreed to do so.
>
> I think we should, and I think we do. I haven't dug into the details,
> but "object-format" looks like it's just that.

Ah, yes, my thinko.

> Maybe instead of SHA-1 and SHA-256, this should talk about "whatever has
> been negotiated through 'object-format', or SHA-1", similar to brian's
> suggestion elsewhere.

Yup, that would be wonderful.

Thanks.


* Re: [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format
  2020-08-14 20:39         ` Junio C Hamano
@ 2020-08-14 20:47           ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2020-08-14 20:47 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Martin Ågren, git, Derrick Stolee

Junio C Hamano <gitster@pobox.com> writes:

> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>> On 2020-08-14 at 17:28:27, Junio C Hamano wrote:
>>> Martin Ågren <martin.agren@gmail.com> writes:
>>> 
>>> > Document that in SHA-1 repositories, we use SHA-1 for "want"s and
>>> > "have"s, and in SHA-256 repositories, we use SHA-256.
>>> 
>>> Ehh, doesn't this directly contradict the transition plan of "on the
>>> wire everything will use SHA-1 version for now?"
>>
>> SHA-256 repositories interoperate currently using SHA-256 object IDs.
>> It was originally intended that we wouldn't update the protocol, but
>> that leads to much of the testsuite failing since it's impossible to
>> move objects from one place to another.
>>
>> If we wanted to be more pedantically correct and optimize for the
>> future, we could say that the values use the format negotiated by the
>> "object-format" protocol extension and SHA-1 otherwise.

Yes, that's wonderful.  I was confused when I described the
evolution path.  We would still want to eventually do the on-the-fly
migration over the wire to make SHA-1 and SHA-256 repositories
interoperate, but at least we can already let SHA-256 repositories
safely attempt to talk to SHA-1 repositories and fail gracefully.

Thanks.



* [PATCH v2 0/4] more SHA-256 documentation
  2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
                     ` (5 preceding siblings ...)
  2020-08-14 20:28   ` [PATCH 0/5] more SHA-256 documentation brian m. carlson
@ 2020-08-15 16:05   ` Martin Ågren
  2020-08-15 16:05     ` [PATCH v2 1/4] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
                       ` (3 more replies)
  6 siblings, 4 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-15 16:05 UTC (permalink / raw)
  To: git; +Cc: brian m. carlson, Derrick Stolee, Junio C Hamano

Thanks brian, Stolee and Junio for your comments on the initial
submission. Changes since v1:

 * Dropped some "160-bit" I had missed.

 * Refer to the 'object-format' capability in a few spots rather than
   discussing SHA-1 vs SHA-256 repos.

 * Dropped the final patch, since Stolee has submitted a patch (series)
   for changing the implementation instead.

These could be part of bc/sha-256-doc-updates, since they are quite
similar in spirit, or go in separately, so these series don't need to
hold each other hostage. Whatever Junio and brian prefer will be fine
by me.

Martin

Martin Ågren (4):
  http-protocol.txt: document SHA-256 "want"/"have" format
  index-format.txt: document SHA-256 index format
  protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
  shallow.txt: document SHA-256 shallow format

 Documentation/technical/http-protocol.txt     |  5 +--
 Documentation/technical/index-format.txt      | 34 ++++++++++---------
 .../technical/protocol-capabilities.txt       | 12 ++++---
 Documentation/technical/shallow.txt           |  2 +-
 4 files changed, 30 insertions(+), 23 deletions(-)

Range-diff against v1:
1:  fcb26c81be ! 1:  2e9f6b9294 http-protocol.txt: document SHA-256 "want"/"have" format
    @@ Metadata
      ## Commit message ##
         http-protocol.txt: document SHA-256 "want"/"have" format
     
    -    Document that in SHA-1 repositories, we use SHA-1 for "want"s and
    -    "have"s, and in SHA-256 repositories, we use SHA-256.
    +    Document that rather than always naming objects using SHA-1, we should
    +    use whatever has been negotiated using the object-format capability.
     
         Signed-off-by: Martin Ågren <martin.agren@gmail.com>
     
    @@ Documentation/technical/http-protocol.txt: at all in the request stream:
     -SHA-1 as its value.  Multiple SHA-1s MUST be sent by sending
     -multiple commands.
     +object name as its value.  Multiple object names MUST be sent by sending
    -+multiple commands. (An object name is a SHA-1 hash in a SHA-1 repo
    -+and a SHA-256 hash in a SHA-256 repo.)
    ++multiple commands. Object names MUST be given using the object format
    ++negotiated through the `object-format` capability (default SHA-1).
      
      The `have` list is created by popping the first 32 commits
      from `c_pending`.  Less can be supplied if `c_pending` empties.
2:  5c13a9478a ! 2:  14bd0d9362 index-format.txt: document SHA-256 index format
    @@ Metadata
      ## Commit message ##
         index-format.txt: document SHA-256 index format
     
    -    Similar to a recent commit, document that in SHA-1 repositories, we use
    -    SHA-1 and in SHA-256 repositories, we use SHA-256, then replace all
    -    other uses of "SHA-1" with something more neutral.
    +    Document that in SHA-1 repositories, we use SHA-1 and in SHA-256
    +    repositories, we use SHA-256, then replace all other uses of "SHA-1"
    +    with something more neutral. Avoid referring to "160-bit" hash values.
     
         Signed-off-by: Martin Ågren <martin.agren@gmail.com>
     
    @@ Documentation/technical/index-format.txt: Git index format
           Extension data
      
     -   - 160-bit SHA-1 over the content of the index file before this
    -+   - 160-bit hash checksum over the content of the index file before this
    -      checksum.
    +-     checksum.
    ++   - Hash checksum over the content of the index file before this checksum.
      
      == Index entry
    + 
     @@ Documentation/technical/index-format.txt: Git index format
        32-bit file size
          This is the on-disk size from stat(2), truncated to 32-bit.
      
     -  160-bit SHA-1 for the represented object
    -+  160-bit object name for the represented object
    ++  Object name for the represented object
      
        A 16-bit 'flags' field split into (high to low bits)
      
    +@@ Documentation/technical/index-format.txt: Git index format
    + 
    +   - A newline (ASCII 10); and
    + 
    +-  - 160-bit object name for the object that would result from writing
    +-    this span of index as a tree.
    ++  - Object name for the object that would result from writing this span
    ++    of index as a tree.
    + 
    +   An entry can be in an invalidated state and is represented by having
    +   a negative number in the entry_count field. In this case, there is no
    +@@ Documentation/technical/index-format.txt: Git index format
    +     stage 1 to 3 (a missing stage is represented by "0" in this field);
    +     and
    + 
    +-  - At most three 160-bit object names of the entry in stages from 1 to 3
    ++  - At most three object names of the entry in stages from 1 to 3
    +     (nothing is written for a missing stage).
    + 
    + === Split index
     @@ Documentation/technical/index-format.txt: Git index format
      
        The extension consists of:
3:  82e5c67b7c ! 3:  2e82be9e36 protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
    @@ Metadata
      ## Commit message ##
         protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
     
    -    Two of our extensions contain "sha1" in their names, but that's
    -    historical. The "want"s will take object names that are not necessarily
    -    SHA-1s. Make this clear, but also make it clear how there's still just
    -    one correct hash algo: These extensions don't somehow make the "want"s
    -    take object names derived using *any* hash algorithm.
    +    Two of our capabilities contain "sha1" in their names, but that's
    +    historical. Clarify that object names are still to be given using
    +    whatever object format has been negotiated using the "object-format"
    +    capability.
     
         Signed-off-by: Martin Ågren <martin.agren@gmail.com>
     
    @@ Documentation/technical/protocol-capabilities.txt: allow-tip-sha1-in-want
     -send "want" lines with SHA-1s that exist at the server but are not
     -advertised by upload-pack.
     +send "want" lines with object names that exist at the server but are not
    -+advertised by upload-pack. (Note that the name of the capability
    -+contains "sha1", but that it's more general than that: in SHA-1
    -+repositories, the "want" lines provide SHA-1 values, but in SHA-256
    -+repositories, they provide SHA-256 values.)
    ++advertised by upload-pack. For historical reasons, the name of this
    ++capability contains "sha1". Object names are always given using the
    ++object format negotiated through the 'object-format' capability.
      
      allow-reachable-sha1-in-want
      ----------------------------
    @@ Documentation/technical/protocol-capabilities.txt: allow-tip-sha1-in-want
     -send "want" lines with SHA-1s that exist at the server but are not
     -advertised by upload-pack.
     +send "want" lines with object names that exist at the server but are not
    -+advertised by upload-pack. (Same remark about "sha1" as above.)
    ++advertised by upload-pack. For historical reasons, the name of this
    ++capability contains "sha1". Object names are always given using the
    ++object format negotiated through the 'object-format' capability.
      
      push-cert=<nonce>
      -----------------
4:  bcfbdd25e5 ! 4:  8680fc1af6 shallow.txt: document SHA-256 shallow format
    @@ Metadata
      ## Commit message ##
         shallow.txt: document SHA-256 shallow format
     
    -    Similar to recent commits, document that in SHA-1 repositories, we use
    -    SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256.
    +    Similar to recent commits, document that we list object names rather
    +    than SHA-1s.
     
         Signed-off-by: Martin Ågren <martin.agren@gmail.com>
     
5:  f95e3f65e7 < -:  ---------- commit-graph-format.txt: fix "Hash Version" description
-- 
2.28.0.297.g1956fa8f8d



* [PATCH v2 1/4] http-protocol.txt: document SHA-256 "want"/"have" format
  2020-08-15 16:05   ` [PATCH v2 0/4] " Martin Ågren
@ 2020-08-15 16:05     ` Martin Ågren
  2020-08-15 16:06     ` [PATCH v2 2/4] index-format.txt: document SHA-256 index format Martin Ågren
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-15 16:05 UTC (permalink / raw)
  To: git; +Cc: brian m. carlson, Derrick Stolee, Junio C Hamano

Document that rather than always naming objects using SHA-1, we should
use whatever has been negotiated using the object-format capability.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Documentation/technical/http-protocol.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt
index 51a79e63de..96d89ea9b2 100644
--- a/Documentation/technical/http-protocol.txt
+++ b/Documentation/technical/http-protocol.txt
@@ -401,8 +401,9 @@ at all in the request stream:
 The stream is terminated by a pkt-line flush (`0000`).
 
 A single "want" or "have" command MUST have one hex formatted
-SHA-1 as its value.  Multiple SHA-1s MUST be sent by sending
-multiple commands.
+object name as its value.  Multiple object names MUST be sent by sending
+multiple commands. Object names MUST be given using the object format
+negotiated through the `object-format` capability (default SHA-1).
 
 The `have` list is created by popping the first 32 commits
 from `c_pending`.  Less can be supplied if `c_pending` empties.
-- 
2.28.0.297.g1956fa8f8d
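
To illustrate the rule in the patch above, here is a rough sketch of
building "want"/"have" commands whose object name matches the negotiated
format.  The helper names and the HEX_LENGTHS table are hypothetical;
the sketch assumes the usual pkt-line framing (four hex digits of length
that count the prefix itself) and assumes lowercase hex object names.

```python
# Illustrative only: helper names are assumptions, not part of Git.
HEX_LENGTHS = {"sha1": 40, "sha256": 64}  # hex digits per object name
FLUSH_PKT = b"0000"                       # terminates the stream


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length as four hex digits."""
    return b"%04x" % (len(payload) + 4) + payload


def command_line(command: str, object_name: str,
                 object_format: str = "sha1") -> bytes:
    """Build one 'want' or 'have' pkt-line carrying one object name."""
    if command not in ("want", "have"):
        raise ValueError(f"unsupported command: {command!r}")
    expected = HEX_LENGTHS[object_format]
    is_hex = all(c in "0123456789abcdef" for c in object_name)
    if len(object_name) != expected or not is_hex:
        raise ValueError(
            f"{object_format} object names need {expected} hex digits"
        )
    return pkt_line(f"{command} {object_name}\n".encode("ascii"))


def request_stream(wants, object_format="sha1") -> bytes:
    """One command per object name, followed by the flush packet."""
    lines = [command_line("want", w, object_format) for w in wants]
    return b"".join(lines) + FLUSH_PKT
```

Sending several object names means calling command_line() once per
name, which mirrors the "multiple commands" requirement in the text.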



* [PATCH v2 2/4] index-format.txt: document SHA-256 index format
  2020-08-15 16:05   ` [PATCH v2 0/4] " Martin Ågren
  2020-08-15 16:05     ` [PATCH v2 1/4] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
@ 2020-08-15 16:06     ` Martin Ågren
  2020-08-15 16:06     ` [PATCH v2 3/4] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
  2020-08-15 16:06     ` [PATCH v2 4/4] shallow.txt: document SHA-256 shallow format Martin Ågren
  3 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-15 16:06 UTC (permalink / raw)
  To: git; +Cc: brian m. carlson, Derrick Stolee, Junio C Hamano

Document that in SHA-1 repositories, we use SHA-1 and in SHA-256
repositories, we use SHA-256, then replace all other uses of "SHA-1"
with something more neutral. Avoid referring to "160-bit" hash values.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Documentation/technical/index-format.txt | 34 +++++++++++++-----------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index faa25c5c52..f9a3644711 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -3,8 +3,11 @@ Git index format
 
 == The Git index file has the following format
 
-  All binary numbers are in network byte order. Version 2 is described
-  here unless stated otherwise.
+  All binary numbers are in network byte order.
+  In a repository using the traditional SHA-1, checksums and object IDs
+  (object names) mentioned below are all computed using SHA-1.  Similarly,
+  in SHA-256 repositories, these values are computed using SHA-256.
+  Version 2 is described here unless stated otherwise.
 
    - A 12-byte header consisting of
 
@@ -32,8 +35,7 @@ Git index format
 
      Extension data
 
-   - 160-bit SHA-1 over the content of the index file before this
-     checksum.
+   - Hash checksum over the content of the index file before this checksum.
 
 == Index entry
 
@@ -80,7 +82,7 @@ Git index format
   32-bit file size
     This is the on-disk size from stat(2), truncated to 32-bit.
 
-  160-bit SHA-1 for the represented object
+  Object name for the represented object
 
   A 16-bit 'flags' field split into (high to low bits)
 
@@ -160,8 +162,8 @@ Git index format
 
   - A newline (ASCII 10); and
 
-  - 160-bit object name for the object that would result from writing
-    this span of index as a tree.
+  - Object name for the object that would result from writing this span
+    of index as a tree.
 
   An entry can be in an invalidated state and is represented by having
   a negative number in the entry_count field. In this case, there is no
@@ -198,7 +200,7 @@ Git index format
     stage 1 to 3 (a missing stage is represented by "0" in this field);
     and
 
-  - At most three 160-bit object names of the entry in stages from 1 to 3
+  - At most three object names of the entry in stages from 1 to 3
     (nothing is written for a missing stage).
 
 === Split index
@@ -211,8 +213,8 @@ Git index format
 
   The extension consists of:
 
-  - 160-bit SHA-1 of the shared index file. The shared index file path
-    is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the
+  - Hash of the shared index file. The shared index file path
+    is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
     index does not require a shared index file.
 
   - An ewah-encoded delete bitmap, each bit represents an entry in the
@@ -253,10 +255,10 @@ Git index format
 
   - 32-bit dir_flags (see struct dir_struct)
 
-  - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file
+  - Hash of $GIT_DIR/info/exclude. A null hash means the file
     does not exist.
 
-  - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does
+  - Hash of core.excludesfile. A null hash means the file does
     not exist.
 
   - NUL-terminated string of per-dir exclude file name. This usually
@@ -285,13 +287,13 @@ The remaining data of each directory block is grouped by type:
   - An ewah bitmap, the n-th bit records "check-only" bit of
     read_directory_recursive() for the n-th directory.
 
-  - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data
+  - An ewah bitmap, the n-th bit indicates whether hash and stat data
     is valid for the n-th directory and exists in the next data.
 
   - An array of stat data. The n-th data corresponds with the n-th
     "one" bit in the previous ewah bitmap.
 
-  - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit
+  - An array of hashes. The n-th hash corresponds with the n-th "one" bit
     in the previous ewah bitmap.
 
   - One NUL.
@@ -330,12 +332,12 @@ The remaining data of each directory block is grouped by type:
 
   - 32-bit offset to the end of the index entries
 
-  - 160-bit SHA-1 over the extension types and their sizes (but not
+  - Hash over the extension types and their sizes (but not
 	their contents).  E.g. if we have "TREE" extension that is N-bytes
 	long, "REUC" extension that is M-bytes long, followed by "EOIE",
 	then the hash would be:
 
-	SHA-1("TREE" + <binary representation of N> +
+	Hash("TREE" + <binary representation of N> +
 		"REUC" + <binary representation of M>)
 
 == Index Entry Offset Table
-- 
2.28.0.297.g1956fa8f8d
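
The "Hash(...)" example at the end of the patch above can be sketched
as follows.  This is only an illustration: the function name eoie_hash
is hypothetical, and it assumes each extension size is written as a
32-bit integer in network byte order, in line with the document's
statement that all binary numbers are in network byte order.

```python
import hashlib
import struct


def eoie_hash(extensions, object_format="sha1"):
    """Hash extension types and sizes, but not their contents.

    `extensions` is a sequence of (signature, size) pairs such as
    (b"TREE", n), listed in the order they appear in the index.
    """
    h = hashlib.new(object_format)  # "sha1" or "sha256"
    for signature, size in extensions:
        h.update(signature)
        # Assumption: 32-bit unsigned size, network (big-endian) order.
        h.update(struct.pack(">I", size))
    return h.digest()
```

The same extension list yields a 20-byte digest in a SHA-1 repository
and a 32-byte digest in a SHA-256 repository, which is why the patch
drops the "160-bit" wording.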



* [PATCH v2 3/4] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256
  2020-08-15 16:05   ` [PATCH v2 0/4] " Martin Ågren
  2020-08-15 16:05     ` [PATCH v2 1/4] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
  2020-08-15 16:06     ` [PATCH v2 2/4] index-format.txt: document SHA-256 index format Martin Ågren
@ 2020-08-15 16:06     ` Martin Ågren
  2020-08-15 16:06     ` [PATCH v2 4/4] shallow.txt: document SHA-256 shallow format Martin Ågren
  3 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-15 16:06 UTC (permalink / raw)
  To: git; +Cc: brian m. carlson, Derrick Stolee, Junio C Hamano

Two of our capabilities contain "sha1" in their names, but that's
historical. Clarify that object names are still to be given using
whatever object format has been negotiated using the "object-format"
capability.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Documentation/technical/protocol-capabilities.txt | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 36ccd14f97..124d716807 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -324,15 +324,19 @@ allow-tip-sha1-in-want
 ----------------------
 
 If the upload-pack server advertises this capability, fetch-pack may
-send "want" lines with SHA-1s that exist at the server but are not
-advertised by upload-pack.
+send "want" lines with object names that exist at the server but are not
+advertised by upload-pack. For historical reasons, the name of this
+capability contains "sha1". Object names are always given using the
+object format negotiated through the 'object-format' capability.
 
 allow-reachable-sha1-in-want
 ----------------------------
 
 If the upload-pack server advertises this capability, fetch-pack may
-send "want" lines with SHA-1s that exist at the server but are not
-advertised by upload-pack.
+send "want" lines with object names that exist at the server but are not
+advertised by upload-pack. For historical reasons, the name of this
+capability contains "sha1". Object names are always given using the
+object format negotiated through the 'object-format' capability.
 
 push-cert=<nonce>
 -----------------
-- 
2.28.0.297.g1956fa8f8d



* [PATCH v2 4/4] shallow.txt: document SHA-256 shallow format
  2020-08-15 16:05   ` [PATCH v2 0/4] " Martin Ågren
                       ` (2 preceding siblings ...)
  2020-08-15 16:06     ` [PATCH v2 3/4] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
@ 2020-08-15 16:06     ` Martin Ågren
  3 siblings, 0 replies; 35+ messages in thread
From: Martin Ågren @ 2020-08-15 16:06 UTC (permalink / raw)
  To: git; +Cc: brian m. carlson, Derrick Stolee, Junio C Hamano

Similar to recent commits, document that we list object names rather
than SHA-1s.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 Documentation/technical/shallow.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/technical/shallow.txt b/Documentation/technical/shallow.txt
index 01dedfe9ff..f3738baa0f 100644
--- a/Documentation/technical/shallow.txt
+++ b/Documentation/technical/shallow.txt
@@ -13,7 +13,7 @@ pretend as if they are root commits (e.g. "git log" traversal
 stops after showing them; "git fsck" does not complain saying
 the commits listed on their "parent" lines do not exist).
 
-Each line contains exactly one SHA-1. When read, a commit_graft
+Each line contains exactly one object name. When read, a commit_graft
 will be constructed, which has nr_parent < 0 to make it easier
 to discern from user provided grafts.
 
-- 
2.28.0.297.g1956fa8f8d
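
As a small illustration of "exactly one object name" per line, the
sketch below reads the text of a shallow file and checks every line
against the repository's object format.  The function name
parse_shallow and the HEX_LENGTHS table are hypothetical; this is not
how Git itself reads the file.

```python
# Illustrative only: names are assumptions, not part of Git.
HEX_LENGTHS = {"sha1": 40, "sha256": 64}  # hex digits per object name
HEX_DIGITS = set("0123456789abcdef")


def parse_shallow(text, object_format="sha1"):
    """Return the object names listed in a shallow file, one per line."""
    expected = HEX_LENGTHS[object_format]
    names = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if len(line) != expected or not set(line) <= HEX_DIGITS:
            raise ValueError(
                f"line {lineno}: expected one {object_format} "
                f"object name ({expected} hex digits)"
            )
        names.append(line)
    return names
```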



end of thread, other threads:[~2020-08-15 22:01 UTC | newest]

Thread overview: 35+ messages
2020-08-13 22:48 [PATCH 0/2] Documentation updates for SHA-256 brian m. carlson
2020-08-13 22:49 ` [PATCH 1/2] docs: document SHA-256 pack and indices brian m. carlson
2020-08-14  2:26   ` Derrick Stolee
2020-08-13 22:49 ` [PATCH 2/2] docs: fix step in transition plan brian m. carlson
2020-08-14 12:21   ` Martin Ågren
2020-08-14  2:33 ` [PATCH 0/2] Documentation updates for SHA-256 Derrick Stolee
2020-08-14  4:47   ` Junio C Hamano
2020-08-14 20:20     ` brian m. carlson
2020-08-14 20:25       ` Junio C Hamano
2020-08-14 12:21 ` [PATCH 0/5] more SHA-256 documentation Martin Ågren
2020-08-14 12:21   ` [PATCH 1/5] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
2020-08-14 17:28     ` Junio C Hamano
2020-08-14 20:23       ` brian m. carlson
2020-08-14 20:32         ` Martin Ågren
2020-08-14 20:39         ` Junio C Hamano
2020-08-14 20:47           ` Junio C Hamano
2020-08-14 12:21   ` [PATCH 2/5] index-format.txt: document SHA-256 index format Martin Ågren
2020-08-14 12:28     ` Derrick Stolee
2020-08-14 14:05       ` Martin Ågren
2020-08-14 12:21   ` [PATCH 3/5] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
2020-08-14 12:31     ` Derrick Stolee
2020-08-14 14:05       ` Martin Ågren
2020-08-14 17:33     ` Junio C Hamano
2020-08-14 20:35       ` Martin Ågren
2020-08-14 20:43         ` Junio C Hamano
2020-08-14 12:21   ` [PATCH 4/5] shallow.txt: document SHA-256 shallow format Martin Ågren
2020-08-14 12:21   ` [PATCH 5/5] commit-graph-format.txt: fix "Hash Version" description Martin Ågren
2020-08-14 12:37     ` Derrick Stolee
2020-08-14 14:10       ` Martin Ågren
2020-08-14 20:28   ` [PATCH 0/5] more SHA-256 documentation brian m. carlson
2020-08-15 16:05   ` [PATCH v2 0/4] " Martin Ågren
2020-08-15 16:05     ` [PATCH v2 1/4] http-protocol.txt: document SHA-256 "want"/"have" format Martin Ågren
2020-08-15 16:06     ` [PATCH v2 2/4] index-format.txt: document SHA-256 index format Martin Ågren
2020-08-15 16:06     ` [PATCH v2 3/4] protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Martin Ågren
2020-08-15 16:06     ` [PATCH v2 4/4] shallow.txt: document SHA-256 shallow format Martin Ågren
