Git Mailing List Archive on lore.kernel.org
* [PATCH 00/44] SHA-256 part 2/3: protocol functionality
@ 2020-05-13  0:53 brian m. carlson
  2020-05-13  0:53 ` [PATCH 01/44] t1050: match object ID paths in a hash-insensitive way brian m. carlson
                   ` (45 more replies)
  0 siblings, 46 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This is part 2 of 3 of the SHA-256 work.  This series, which is
unfortunately longer than I'd like, adds all of the protocol logic to
work with SHA-256 repositories.

It was originally planned that we would not upgrade the protocol and
would use SHA-1 for all protocol functionality until some point in the
future.  However, doing that requires a huge amount of additional work
(probably incorporating several hundred more patches which are not yet
written) and it's not possible to get the test suite to even come close
to passing without a way to fetch and push repositories.  I therefore
decided that implementing an object-format extension was the best way
forward.

This series provides object-format extensions for both the original and
v2 protocols, including full documentation.  Helpers, such as
git-remote-https, also learn capabilities to pass the object-format
extension back and forth, and to query its state.  The code is designed
to allow multiple object-format extensions to be provided if the server
supports multiple algorithms for one repo and to default to SHA-1 if no
object-format extension is provided.

The other two cases are the dumb HTTP protocol and bundles, both of
which have no object-format extension (because they provide no
capabilities) and are therefore distinguished solely by their hash
length.  We will have problems if in the future we need to use another
256-bit algorithm, but I plan to be improvident and hope that we'll move
to longer algorithms in the future to cover ourselves for post-quantum
security.
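
As a rough illustration of what "distinguished solely by their hash
length" means, here is a minimal sketch.  It is not code from this
series: algo_from_hex_len() and the sketch_algo enum are hypothetical
names, and it assumes each line starts with a hex object ID.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical identifiers for this sketch; Git's real table is hash_algos[]. */
enum sketch_algo { SKETCH_UNKNOWN, SKETCH_SHA1, SKETCH_SHA256 };

/*
 * Guess the hash algorithm from the run of hex digits that starts a line,
 * as a transport without capabilities (dumb HTTP, bundles) has to.
 */
enum sketch_algo algo_from_hex_len(const char *line)
{
	size_t n = 0;

	assert(line);
	while (isxdigit((unsigned char)line[n]))
		n++;
	if (n == 40)
		return SKETCH_SHA1;	/* 160 bits */
	if (n == 64)
		return SKETCH_SHA256;	/* any other 256-bit hash would look the same */
	return SKETCH_UNKNOWN;
}
```

The second branch is where the ambiguity mentioned above would show up:
a length check alone cannot tell two different 256-bit algorithms apart.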

Clone support is necessarily a little tricky because we are initializing
a repository and then fetching refs, at which point we learn what hash
algorithm the remote side supports.  We work around this by calling the
code that updates the hash algorithm and repository version a second
time to rewrite that data once we know what version we're using.  This
is the most robust way I could approach this problem, but it is still a
little ugly.

As mentioned, this series is longer than I'd like, but it is complete:
this is all the SHA-256 protocol work.  Additional future series include
one last series of test fixes (28 patches) plus six final patches in the
series that enables SHA-256 support.

brian m. carlson (44):
  t1050: match object ID paths in a hash-insensitive way
  Documentation: document v1 protocol object-format capability
  connect: have ref processing code take struct packet_reader
  wrapper: add function to compare strings with different NUL
    termination
  remote: advertise the object-format capability on the server side
  connect: add function to parse multiple v1 capability values
  connect: add function to fetch value of a v2 server capability
  pkt-line: add a member for hash algorithm
  transport: add a hash algorithm member
  connect: add function to detect supported v1 hash functions
  send-pack: detect when the server doesn't support our hash
  connect: make parse_feature_value extern
  fetch-pack: detect when the server doesn't support our hash
  connect: detect algorithm when fetching refs
  builtin/receive-pack: detect when the server doesn't support our hash
  docs: update remote helper docs for object-format extensions
  transport-helper: implement object-format extensions
  remote-curl: implement object-format extensions
  builtin/clone: initialize hash algorithm properly
  t5562: pass object-format in synthesized test data
  t5704: send object-format capability with SHA-256
  fetch-pack: parse and advertise the object-format capability
  setup: set the_repository's hash algo when checking format
  t3200: mark assertion with SHA1 prerequisite
  packfile: compute and use the index CRC offset
  t5302: modernize test formatting
  builtin/show-index: provide options to determine hash algo
  t1302: expect repo format version 1 for SHA-256
  Documentation/technical: document object-format for protocol v2
  connect: pass full packet reader when parsing v2 refs
  connect: parse v2 refs with correct hash algorithm
  serve: advertise object-format capability for protocol v2
  t5500: make hash independent
  builtin/ls-remote: initialize repository based on fetch
  remote-curl: detect algorithm for dumb HTTP by size
  builtin/index-pack: add option to specify hash algorithm
  t1050: pass algorithm to index-pack when outside repo
  remote-curl: avoid truncating refs with ls-remote
  t/helper: initialize the repository for test-sha1-array
  t5702: offer an object-format capability in the test
  t5703: use object-format serve option
  t5300: pass --object-format to git index-pack
  bundle: detect hash algorithm when reading refs
  remote-testgit: adapt for object-format

 Documentation/gitremote-helpers.txt           |  33 +-
 .../technical/protocol-capabilities.txt       |  16 +-
 Documentation/technical/protocol-v2.txt       |   9 +
 builtin/clone.c                               |   9 +
 builtin/index-pack.c                          |  11 +-
 builtin/ls-remote.c                           |   4 +
 builtin/receive-pack.c                        |  10 +
 builtin/show-index.c                          |  29 +-
 bundle.c                                      |  22 +-
 bundle.h                                      |   1 +
 connect.c                                     | 136 +++++--
 connect.h                                     |   3 +
 fetch-pack.c                                  |  14 +
 git-compat-util.h                             |   2 +
 git.c                                         |   2 +-
 object-store.h                                |   1 +
 packfile.c                                    |   1 +
 pkt-line.c                                    |   1 +
 pkt-line.h                                    |   3 +
 remote-curl.c                                 |  46 ++-
 send-pack.c                                   |   6 +
 serve.c                                       |  27 ++
 setup.c                                       |   1 +
 t/helper/test-oid-array.c                     |   3 +
 t/t1050-large.sh                              |   6 +-
 t/t1302-repo-version.sh                       |   6 +-
 t/t3200-branch.sh                             |   2 +-
 t/t5300-pack-object.sh                        |   9 +-
 t/t5302-pack-index.sh                         | 360 +++++++++---------
 t/t5500-fetch-pack.sh                         |   5 +-
 t/t5562-http-backend-content-length.sh        |  14 +-
 t/t5701-git-serve.sh                          |  28 +-
 t/t5702-protocol-v2.sh                        |   2 +
 t/t5703-upload-pack-ref-in-want.sh            |  19 +-
 t/t5704-protocol-violations.sh                |  12 +
 t/t5801/git-remote-testgit                    |   6 +
 t/test-lib.sh                                 |   1 +
 transport-helper.c                            |  24 +-
 transport.c                                   |  18 +-
 transport.h                                   |   8 +
 upload-pack.c                                 |   3 +-
 wrapper.c                                     |  12 +
 42 files changed, 670 insertions(+), 255 deletions(-)



* [PATCH 01/44] t1050: match object ID paths in a hash-insensitive way
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13  0:53 ` [PATCH 02/44] Documentation: document v1 protocol object-format capability brian m. carlson
                   ` (44 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

The pattern here looking for failures is specific to SHA-1.  Let's
create a variable that matches the regex or glob pattern for a path
within the objects directory.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1050-large.sh | 2 +-
 t/test-lib.sh    | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 184b479a21..7f88ea07c2 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -64,7 +64,7 @@ test_expect_success 'add a large file or two' '
 	test $count = 1 &&
 	cnt=$(git show-index <"$idx" | wc -l) &&
 	test $cnt = 2 &&
-	for l in .git/objects/??/??????????????????????????????????????
+	for l in .git/objects/$OIDPATH_REGEX
 	do
 		test_path_is_file "$l" || continue
 		bad=t
diff --git a/t/test-lib.sh b/t/test-lib.sh
index baf94546da..77e9a60fcb 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1428,6 +1428,7 @@ test_oid_init
 
 ZERO_OID=$(test_oid zero)
 OID_REGEX=$(echo $ZERO_OID | sed -e 's/0/[0-9a-f]/g')
+OIDPATH_REGEX=$(test_oid_to_path $ZERO_OID | sed -e 's/0/[0-9a-f]/g')
 EMPTY_TREE=$(test_oid empty_tree)
 EMPTY_BLOB=$(test_oid empty_blob)
 _z40=$ZERO_OID


* [PATCH 02/44] Documentation: document v1 protocol object-format capability
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
  2020-05-13  0:53 ` [PATCH 01/44] t1050: match object ID paths in a hash-insensitive way brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13 19:28   ` Martin Ågren
  2020-05-13  0:53 ` [PATCH 03/44] connect: have ref processing code take struct packet_reader brian m. carlson
                   ` (43 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Document a capability that indicates which hash algorithms are in use by
both sides of a remote connection.  Use the term "object-format", since
this is the term used for the repository extension as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 .../technical/protocol-capabilities.txt          | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 2b267c0da6..026c42f86a 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -176,6 +176,21 @@ agent strings are purely informative for statistics and debugging
 purposes, and MUST NOT be used to programmatically assume the presence
 or absence of particular features.
 
+object-format
+-------------
+
+This capability, which takes a hash algorithm as an argument, indicates
+that the server supports the given hash algorithm.  It may be sent
+multiple times; if so, the first one given is the one used in the ref
+advertisement.
+
+When provided by the client, this indicates that it intends to use the
+given hash algorithm to communicate.  The algorithm provided must be one
+that the server supports.
+
+If this capability is not provided, it is assumed that the only
+supported algorithm is SHA-1.
+
 symref
 ------
 
@@ -189,7 +204,6 @@ refs being sent.
 
 Clients MAY use the parameters from this capability to select the proper initial
 branch when cloning a repository.
-
 shallow
 -------
 


* [PATCH 03/44] connect: have ref processing code take struct packet_reader
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
  2020-05-13  0:53 ` [PATCH 01/44] t1050: match object ID paths in a hash-insensitive way brian m. carlson
  2020-05-13  0:53 ` [PATCH 02/44] Documentation: document v1 protocol object-format capability brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13 19:30   ` Martin Ågren
  2020-05-13  0:53 ` [PATCH 04/44] wrapper: add function to compare strings with different NUL termination brian m. carlson
                   ` (42 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

In a future patch, we'll want to access multiple members from struct
packet_reader when parsing references.  Therefore, have the ref parsing
code take a pointer to struct packet_reader instead of passing multiple
arguments to each function.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/connect.c b/connect.c
index 23013c6344..641388a766 100644
--- a/connect.c
+++ b/connect.c
@@ -204,8 +204,9 @@ static void annotate_refs_with_symref_info(struct ref *ref)
 	string_list_clear(&symref, 0);
 }
 
-static void process_capabilities(const char *line, int *len)
+static void process_capabilities(struct packet_reader *reader, int *len)
 {
+	const char *line = reader->line;
 	int nul_location = strlen(line);
 	if (nul_location == *len)
 		return;
@@ -213,8 +214,9 @@ static void process_capabilities(const char *line, int *len)
 	*len = nul_location;
 }
 
-static int process_dummy_ref(const char *line)
+static int process_dummy_ref(const struct packet_reader *reader)
 {
+	const char *line = reader->line;
 	struct object_id oid;
 	const char *name;
 
@@ -234,9 +236,11 @@ static void check_no_capabilities(const char *line, int len)
 			line + strlen(line));
 }
 
-static int process_ref(const char *line, int len, struct ref ***list,
-		       unsigned int flags, struct oid_array *extra_have)
+static int process_ref(const struct packet_reader *reader, int len,
+		       struct ref ***list, unsigned int flags,
+		       struct oid_array *extra_have)
 {
+	const char *line = reader->line;
 	struct object_id old_oid;
 	const char *name;
 
@@ -260,9 +264,10 @@ static int process_ref(const char *line, int len, struct ref ***list,
 	return 1;
 }
 
-static int process_shallow(const char *line, int len,
+static int process_shallow(const struct packet_reader *reader, int len,
 			   struct oid_array *shallow_points)
 {
+	const char *line = reader->line;
 	const char *arg;
 	struct object_id old_oid;
 
@@ -315,20 +320,20 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 
 		switch (state) {
 		case EXPECTING_FIRST_REF:
-			process_capabilities(reader->line, &len);
-			if (process_dummy_ref(reader->line)) {
+			process_capabilities(reader, &len);
+			if (process_dummy_ref(reader)) {
 				state = EXPECTING_SHALLOW;
 				break;
 			}
 			state = EXPECTING_REF;
 			/* fallthrough */
 		case EXPECTING_REF:
-			if (process_ref(reader->line, len, &list, flags, extra_have))
+			if (process_ref(reader, len, &list, flags, extra_have))
 				break;
 			state = EXPECTING_SHALLOW;
 			/* fallthrough */
 		case EXPECTING_SHALLOW:
-			if (process_shallow(reader->line, len, shallow_points))
+			if (process_shallow(reader, len, shallow_points))
 				break;
 			die(_("protocol error: unexpected '%s'"), reader->line);
 		case EXPECTING_DONE:


* [PATCH 04/44] wrapper: add function to compare strings with different NUL termination
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (2 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 03/44] connect: have ref processing code take struct packet_reader brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13 19:32   ` Martin Ågren
  2020-05-13  0:53 ` [PATCH 05/44] remote: advertise the object-format capability on the server side brian m. carlson
                   ` (41 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When parsing capabilities for the pack protocol, there are times we'll
want to compare the value of a capability to a NUL-terminated string.
Since the data we're reading will be space-terminated, not
NUL-terminated, we need a function that compares the two strings, but
also checks that they're the same length.  Otherwise, if we used strncmp
to compare these strings, we might accidentally accept a parameter that
was a prefix of the expected value.

Add a function, xstrncmpz, that takes a NUL-terminated string and a
non-NUL-terminated string, plus a length, and compares them, ensuring
that they are the same length.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
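Illustration only, not part of the patch: a small standalone sketch of
the prefix problem described above.  xstrncmpz() is copied from the diff
below; value_matches() is a hypothetical helper, and the capability
strings are made up.

```c
#include <assert.h>
#include <string.h>

/* Same logic as the function added by this patch, copied for illustration. */
int xstrncmpz(const char *s, const char *t, size_t len)
{
	int res = strncmp(s, t, len);
	if (res)
		return res;
	return s[len] == '\0' ? 0 : 1;
}

/*
 * Hypothetical helper: does the whitespace-terminated value at the start
 * of `caps` name exactly the algorithm `want`?
 */
int value_matches(const char *want, const char *caps)
{
	size_t len;

	assert(want && caps);
	len = strcspn(caps, " \t\n");
	return !xstrncmpz(want, caps, len);
}
```

With a received value of "sha2", strncmp("sha256", "sha2 ...", 4) returns
zero, so a bare strncmp would accept it; value_matches("sha256", "sha2 ...")
rejects it because the expected string does not end after four characters.
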
 git-compat-util.h |  2 ++
 wrapper.c         | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index 8ba576e81e..6503deb171 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -868,6 +868,8 @@ char *xgetcwd(void);
 FILE *fopen_for_writing(const char *path);
 FILE *fopen_or_warn(const char *path, const char *mode);
 
+int xstrncmpz(const char *s, const char *t, size_t len);
+
 /*
  * FREE_AND_NULL(ptr) is like free(ptr) followed by ptr = NULL. Note
  * that ptr is used twice, so don't pass e.g. ptr++.
diff --git a/wrapper.c b/wrapper.c
index 3a1c0e0526..15a09740e7 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -430,6 +430,18 @@ int xmkstemp(char *filename_template)
 	return fd;
 }
 
+/*
+ * Like strncmp, but only return zero if s is NUL-terminated and exactly len
+ * characters long.  If it is not, consider it greater than t.
+ */
+int xstrncmpz(const char *s, const char *t, size_t len)
+{
+	int res = strncmp(s, t, len);
+	if (res)
+		return res;
+	return s[len] == '\0' ? 0 : 1;
+}
+
 /* Adapted from libiberty's mkstemp.c. */
 
 #undef TMP_MAX


* [PATCH 05/44] remote: advertise the object-format capability on the server side
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (3 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 04/44] wrapper: add function to compare strings with different NUL termination brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13  0:53 ` [PATCH 06/44] connect: add function to parse multiple v1 capability values brian m. carlson
                   ` (40 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Advertise the current hash algorithm in use by using the object-format
capability as part of the ref advertisement.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/receive-pack.c | 1 +
 upload-pack.c          | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d37ab776b3..a4159b559e 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -248,6 +248,7 @@ static void show_ref(const char *path, const struct object_id *oid)
 			strbuf_addf(&cap, " push-cert=%s", push_cert_nonce);
 		if (advertise_push_options)
 			strbuf_addstr(&cap, " push-options");
+		strbuf_addf(&cap, " object-format=%s", the_hash_algo->name);
 		strbuf_addf(&cap, " agent=%s", git_user_agent_sanitized());
 		packet_write_fmt(1, "%s %s%c%s\n",
 			     oid_to_hex(oid), path, 0, cap.buf);
diff --git a/upload-pack.c b/upload-pack.c
index 902d0ad5e1..df6cb51db7 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1005,7 +1005,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		struct strbuf symref_info = STRBUF_INIT;
 
 		format_symref_info(&symref_info, cb_data);
-		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s agent=%s\n",
+		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s object-format=%s agent=%s\n",
 			     oid_to_hex(oid), refname_nons,
 			     0, capabilities,
 			     (allow_unadvertised_object_request & ALLOW_TIP_SHA1) ?
@@ -1015,6 +1015,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 			     stateless_rpc ? " no-done" : "",
 			     symref_info.buf,
 			     allow_filter ? " filter" : "",
+			     the_hash_algo->name,
 			     git_user_agent_sanitized());
 		strbuf_release(&symref_info);
 	} else {


* [PATCH 06/44] connect: add function to parse multiple v1 capability values
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (4 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 05/44] remote: advertise the object-format capability on the server side brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13  0:53 ` [PATCH 07/44] connect: add function to fetch value of a v2 server capability brian m. carlson
                   ` (39 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

In a capability response, we can have multiple symref entries.  In the
future, we will also allow for multiple hash algorithms to be specified.
To avoid duplication, expand the parse_feature_value function to take an
optional offset where the parsing should begin next time.  Add a wrapper
function that allows us to query the next server feature value, and use
it in the existing symref parsing code.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
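Illustration only, not part of the patch: a simplified, standalone
sketch of walking repeated "feature=value" entries with a caller-held
offset.  next_value() is a hypothetical stand-in, not the
parse_feature_value() in the diff below; in this sketch the offset is
always kept relative to the start of the list.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return the next value of "feature=value" in a whitespace-separated list,
 * searching from *offset, and move *offset just past that value.
 * Returns NULL when there are no further entries.
 */
const char *next_value(const char *list, const char *feature,
		       int *len, int *offset)
{
	const char *p;
	size_t flen;

	assert(list && feature && *feature && len && offset);
	flen = strlen(feature);
	p = list + *offset;
	while ((p = strstr(p, feature)) != NULL) {
		/* Accept only a whole word that is followed by '='. */
		if ((p == list || isspace((unsigned char)p[-1])) &&
		    p[flen] == '=') {
			const char *value = p + flen + 1;

			*len = (int)strcspn(value, " \t\n");
			*offset = (int)(value + *len - list);
			return value;
		}
		p++;
	}
	return NULL;
}
```

A caller starts with an offset of 0 and calls next_value() in a loop
until it returns NULL, which is the shape the symref loop takes after
this change.
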
 connect.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/connect.c b/connect.c
index 641388a766..4027fd4677 100644
--- a/connect.c
+++ b/connect.c
@@ -18,7 +18,8 @@
 
 static char *server_capabilities_v1;
 static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
-static const char *parse_feature_value(const char *, const char *, int *);
+static const char *parse_feature_value(const char *, const char *, int *, int *);
+static const char *next_server_feature_value(const char *feature, int *len, int *offset);
 
 static int check_ref(const char *name, unsigned int flags)
 {
@@ -180,17 +181,16 @@ static void parse_one_symref_info(struct string_list *symref, const char *val, i
 static void annotate_refs_with_symref_info(struct ref *ref)
 {
 	struct string_list symref = STRING_LIST_INIT_DUP;
-	const char *feature_list = server_capabilities_v1;
+	int offset = 0;
 
-	while (feature_list) {
+	while (1) {
 		int len;
 		const char *val;
 
-		val = parse_feature_value(feature_list, "symref", &len);
+		val = next_server_feature_value("symref", &len, &offset);
 		if (!val)
 			break;
 		parse_one_symref_info(&symref, val, len);
-		feature_list = val + 1;
 	}
 	string_list_sort(&symref);
 
@@ -452,7 +452,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	return list;
 }
 
-static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp)
+static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
 {
 	int len;
 
@@ -460,6 +460,8 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 		return NULL;
 
 	len = strlen(feature);
+	if (offset)
+		feature_list += *offset;
 	while (*feature_list) {
 		const char *found = strstr(feature_list, feature);
 		if (!found)
@@ -474,9 +476,14 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 			}
 			/* feature with a value (e.g., "agent=git/1.2.3") */
 			else if (*value == '=') {
+				int end;
+
 				value++;
+				end = strcspn(value, " \t\n");
 				if (lenp)
-					*lenp = strcspn(value, " \t\n");
+					*lenp = end;
+				if (offset)
+					*offset = value + end - feature_list;
 				return value;
 			}
 			/*
@@ -491,12 +498,17 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 
 int parse_feature_request(const char *feature_list, const char *feature)
 {
-	return !!parse_feature_value(feature_list, feature, NULL);
+	return !!parse_feature_value(feature_list, feature, NULL, NULL);
+}
+
+static const char *next_server_feature_value(const char *feature, int *len, int *offset)
+{
+	return parse_feature_value(server_capabilities_v1, feature, len, offset);
 }
 
 const char *server_feature_value(const char *feature, int *len)
 {
-	return parse_feature_value(server_capabilities_v1, feature, len);
+	return parse_feature_value(server_capabilities_v1, feature, len, NULL);
 }
 
 int server_supports(const char *feature)


* [PATCH 07/44] connect: add function to fetch value of a v2 server capability
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (5 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 06/44] connect: add function to parse multiple v1 capability values brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13 19:37   ` Martin Ågren
  2020-05-13  0:53 ` [PATCH 08/44] pkt-line: add a member for hash algorithm brian m. carlson
                   ` (38 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

So far in protocol v2, none of the server capabilities that carry values
have had values we were interested in parsing.  For example,
we receive but ignore the agent value.

However, in a future commit, we're going to want to parse out the value
of a server capability.  To make this easy, add a function,
server_feature_v2, that can fetch the value provided as part of the
server capability.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
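Illustration only, not part of the patch: a standalone sketch of the
same kind of lookup over a plain string array instead of Git's
argv_array.  feature_value() and the capability strings are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find "name=value" among n capability strings and point *value at the
 * part after '='.  Returns 1 if found, 0 otherwise.
 */
int feature_value(const char *const *caps, size_t n, const char *name,
		  const char **value)
{
	size_t i, nlen;

	assert(caps && name && value);
	nlen = strlen(name);
	for (i = 0; i < n; i++) {
		/* Requiring '=' keeps "object" from matching "object-format=..." */
		if (!strncmp(caps[i], name, nlen) && caps[i][nlen] == '=') {
			*value = caps[i] + nlen + 1;
			return 1;
		}
	}
	return 0;
}
```

As with the function in the diff, a capability that is present but has
no value (for example a bare "ls-refs") is reported as not found.
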
 connect.c | 15 +++++++++++++++
 connect.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/connect.c b/connect.c
index 4027fd4677..4df9e77206 100644
--- a/connect.c
+++ b/connect.c
@@ -84,6 +84,21 @@ int server_supports_v2(const char *c, int die_on_error)
 	return 0;
 }
 
+int server_feature_v2(const char *c, const char **v)
+{
+	int i;
+
+	for (i = 0; i < server_capabilities_v2.argc; i++) {
+		const char *out;
+		if (skip_prefix(server_capabilities_v2.argv[i], c, &out) &&
+		    (*out == '=')) {
+			*v = out + 1;
+			return 1;
+		}
+	}
+	return 0;
+}
+
 int server_supports_feature(const char *c, const char *feature,
 			    int die_on_error)
 {
diff --git a/connect.h b/connect.h
index 5f2382e018..4d76a6017d 100644
--- a/connect.h
+++ b/connect.h
@@ -19,6 +19,7 @@ struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
 int server_supports_v2(const char *c, int die_on_error);
+int server_feature_v2(const char *c, const char **v);
 int server_supports_feature(const char *c, const char *feature,
 			    int die_on_error);
 


* [PATCH 08/44] pkt-line: add a member for hash algorithm
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (6 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 07/44] connect: add function to fetch value of a v2 server capability brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13  0:53 ` [PATCH 09/44] transport: add a hash algorithm member brian m. carlson
                   ` (37 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Add a member for the hash algorithm currently in use to the packet
reader so it can parse references correctly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 pkt-line.c | 1 +
 pkt-line.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index a0e87b1e81..a4aea075de 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -479,6 +479,7 @@ void packet_reader_init(struct packet_reader *reader, int fd,
 	reader->buffer_size = sizeof(packet_buffer);
 	reader->options = options;
 	reader->me = "git";
+	reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
 }
 
 enum packet_read_status packet_reader_read(struct packet_reader *reader)
diff --git a/pkt-line.h b/pkt-line.h
index fef3a0d792..4cd9435e9a 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -166,6 +166,9 @@ struct packet_reader {
 
 	unsigned use_sideband : 1;
 	const char *me;
+
+	/* hash algorithm in use */
+	const struct git_hash_algo *hash_algo;
 };
 
 /*


* [PATCH 09/44] transport: add a hash algorithm member
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (7 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 08/44] pkt-line: add a member for hash algorithm brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13  0:53 ` [PATCH 10/44] connect: add function to detect supported v1 hash functions brian m. carlson
                   ` (36 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When connecting to a remote system, we need to know what hash algorithm
it will be using to talk to us.  Add a hash_algo member to struct
transport and add a function to read this data from the transport
object.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 transport.c | 8 ++++++++
 transport.h | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/transport.c b/transport.c
index 15f5ba4e8f..b43d985f90 100644
--- a/transport.c
+++ b/transport.c
@@ -311,6 +311,7 @@ static struct ref *handshake(struct transport *transport, int for_push,
 		BUG("unknown protocol version");
 	}
 	data->got_remote_heads = 1;
+	transport->hash_algo = reader.hash_algo;
 
 	if (reader.line_peeked)
 		BUG("buffer must be empty at the end of handshake()");
@@ -996,9 +997,16 @@ struct transport *transport_get(struct remote *remote, const char *url)
 			ret->smart_options->receivepack = remote->receivepack;
 	}
 
+	ret->hash_algo = &hash_algos[GIT_HASH_SHA1];
+
 	return ret;
 }
 
+const struct git_hash_algo *transport_get_hash_algo(struct transport *transport)
+{
+	return transport->hash_algo;
+}
+
 int transport_set_option(struct transport *transport,
 			 const char *name, const char *value)
 {
diff --git a/transport.h b/transport.h
index 4298c855be..2a9f96c05a 100644
--- a/transport.h
+++ b/transport.h
@@ -115,6 +115,8 @@ struct transport {
 	struct git_transport_options *smart_options;
 
 	enum transport_family family;
+
+	const struct git_hash_algo *hash_algo;
 };
 
 #define TRANSPORT_PUSH_ALL			(1<<0)
@@ -243,6 +245,12 @@ int transport_push(struct repository *repo,
 const struct ref *transport_get_remote_refs(struct transport *transport,
 					    const struct argv_array *ref_prefixes);
 
+/*
+ * Fetch the hash algorithm used by a remote.
+ *
+ * This can only be called after fetching the remote refs.
+ */
+const struct git_hash_algo *transport_get_hash_algo(struct transport *transport);
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_pack(struct transport *transport);
 int transport_disconnect(struct transport *transport);


* [PATCH 10/44] connect: add function to detect supported v1 hash functions
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (8 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 09/44] transport: add a hash algorithm member brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13 19:39   ` Martin Ågren
  2020-05-13  0:53 ` [PATCH 11/44] send-pack: detect when the server doesn't support our hash brian m. carlson
                   ` (35 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Add a function, server_supports_hash, to see if the remote server
supports a particular hash algorithm when speaking protocol v1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 24 ++++++++++++++++++++++++
 connect.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/connect.c b/connect.c
index 4df9e77206..cb69aafe2c 100644
--- a/connect.c
+++ b/connect.c
@@ -511,6 +511,30 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 	return NULL;
 }
 
+int server_supports_hash(const char *desired, int *feature_supported)
+{
+	int offset = 0;
+	int len, found = 0;
+	const char *hash;
+
+	hash = next_server_feature_value("object-format", &len, &offset);
+	if (feature_supported)
+		*feature_supported = !!hash;
+	if (!hash) {
+		hash = hash_algos[GIT_HASH_SHA1].name;
+		len = strlen(hash);
+	}
+	while (hash) {
+		if (!xstrncmpz(desired, hash, len))
+			found = 1;
+
+		if (found)
+			return 1;
+		hash = next_server_feature_value("object-format", &len, &offset);
+	}
+	return 0;
+}
+
 int parse_feature_request(const char *feature_list, const char *feature)
 {
 	return !!parse_feature_value(feature_list, feature, NULL, NULL);
diff --git a/connect.h b/connect.h
index 4d76a6017d..fc75d6a457 100644
--- a/connect.h
+++ b/connect.h
@@ -18,6 +18,7 @@ int url_is_local_not_ssh(const char *url);
 struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
+int server_supports_hash(const char *desired, int *feature_supported);
 int server_supports_v2(const char *c, int die_on_error);
 int server_feature_v2(const char *c, const char **v);
 int server_supports_feature(const char *c, const char *feature,
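
The matching logic of server_supports_hash above can be sketched outside
of Git as follows.  This is only an illustration under stated
assumptions: same_name, next_value and supports_hash are hypothetical
names, the capability list is taken to be a plain space-separated
string, and the SHA-1 fallback mirrors the patch's behaviour when no
object-format value is advertised.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does NUL-terminated `s` equal the `len` bytes at `t`? */
static int same_name(const char *s, const char *t, size_t len)
{
	return strlen(s) == len && !memcmp(s, t, len);
}

/*
 * Hypothetical helper: find the next "<name>=<value>" token in a
 * space-separated capability string, starting at *offset.  Returns a
 * pointer to the value (not NUL-terminated) and stores its length in *len.
 */
static const char *next_value(const char *caps, const char *name,
			      size_t *len, size_t *offset)
{
	size_t nlen = strlen(name);
	const char *p = caps + *offset;

	while (*p) {
		const char *end = p + strcspn(p, " ");
		const char *next = *end ? end + 1 : end;

		if ((size_t)(end - p) > nlen && !strncmp(p, name, nlen) &&
		    p[nlen] == '=') {
			*len = (size_t)(end - (p + nlen + 1));
			*offset = (size_t)(next - caps);
			return p + nlen + 1;
		}
		p = next;
	}
	*offset = (size_t)(p - caps);
	return NULL;
}

/* Returns 1 if `desired` is among the advertised object formats. */
static int supports_hash(const char *caps, const char *desired, int *advertised)
{
	size_t len = 0, offset = 0;
	const char *hash;

	assert(caps && desired);
	hash = next_value(caps, "object-format", &len, &offset);
	if (advertised)
		*advertised = hash != NULL;
	if (!hash)
		return !strcmp(desired, "sha1"); /* nothing advertised: assume SHA-1 */
	for (; hash; hash = next_value(caps, "object-format", &len, &offset))
		if (same_name(desired, hash, len))
			return 1;
	return 0;
}
```

Because each value is compared by exact length, a server advertising
only "sha256" is not treated as supporting a hypothetical "sha25".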

* [PATCH 11/44] send-pack: detect when the server doesn't support our hash
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (9 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 10/44] connect: add function to detect supported v1 hash functions brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13 19:41   ` Martin Ågren
  2020-05-13  0:53 ` [PATCH 12/44] connect: make parse_feature_value extern brian m. carlson
                   ` (34 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Detect when the server doesn't support our hash algorithm and abort.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 send-pack.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/send-pack.c b/send-pack.c
index d1b7edc995..fb037568a9 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -362,6 +362,7 @@ int send_pack(struct send_pack_args *args,
 	int atomic_supported = 0;
 	int use_push_options = 0;
 	int push_options_supported = 0;
+	int object_format_supported = 0;
 	unsigned cmds_sent = 0;
 	int ret;
 	struct async demux;
@@ -388,6 +389,9 @@ int send_pack(struct send_pack_args *args,
 	if (server_supports("push-options"))
 		push_options_supported = 1;
 
+	if (!server_supports_hash(the_hash_algo->name, &object_format_supported))
+		die(_("the receiving end does not support this repository's hash algorithm"));
+
 	if (args->push_cert != SEND_PACK_PUSH_CERT_NEVER) {
 		int len;
 		push_cert_nonce = server_feature_value("push-cert", &len);
@@ -428,6 +432,8 @@ int send_pack(struct send_pack_args *args,
 		strbuf_addstr(&cap_buf, " atomic");
 	if (use_push_options)
 		strbuf_addstr(&cap_buf, " push-options");
+	if (object_format_supported)
+		strbuf_addf(&cap_buf, " object-format=%s", the_hash_algo->name);
 	if (agent_supported)
 		strbuf_addf(&cap_buf, " agent=%s", git_user_agent_sanitized());
 

* [PATCH 12/44] connect: make parse_feature_value extern
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (10 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 11/44] send-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13 19:48   ` Martin Ågren
  2020-05-13  0:53 ` [PATCH 13/44] fetch-pack: detect when the server doesn't support our hash brian m. carlson
                   ` (33 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

We're going to be using this function in other files, so no longer mark
this function static.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 3 +--
 connect.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/connect.c b/connect.c
index cb69aafe2c..511a069304 100644
--- a/connect.c
+++ b/connect.c
@@ -18,7 +18,6 @@
 
 static char *server_capabilities_v1;
 static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
-static const char *parse_feature_value(const char *, const char *, int *, int *);
 static const char *next_server_feature_value(const char *feature, int *len, int *offset);
 
 static int check_ref(const char *name, unsigned int flags)
@@ -467,7 +466,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	return list;
 }
 
-static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
+const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
 {
 	int len;
 
diff --git a/connect.h b/connect.h
index fc75d6a457..81935a0f2a 100644
--- a/connect.h
+++ b/connect.h
@@ -19,6 +19,7 @@ struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
 int server_supports_hash(const char *desired, int *feature_supported);
+const char *parse_feature_value(const char *, const char *, int *, int *);
 int server_supports_v2(const char *c, int die_on_error);
 int server_feature_v2(const char *c, const char **v);
 int server_supports_feature(const char *c, const char *feature,

* [PATCH 13/44] fetch-pack: detect when the server doesn't support our hash
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (11 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 12/44] connect: make parse_feature_value extern brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13  0:53 ` [PATCH 14/44] connect: detect algorithm when fetching refs brian m. carlson
                   ` (32 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Detect when the server doesn't support our hash algorithm and abort.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fetch-pack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index f73a2ce6cb..1d277190e7 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1039,6 +1039,8 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		print_verbose(args, _("Server supports %s"), "deepen-relative");
 	else if (args->deepen_relative)
 		die(_("Server does not support --deepen"));
+	if (!server_supports_hash(the_hash_algo->name, NULL))
+		die(_("Server does not support this repository's object format"));
 
 	if (!args->no_dependents) {
 		mark_complete_and_common_ref(negotiator, args, &ref);

* [PATCH 14/44] connect: detect algorithm when fetching refs
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (12 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 13/44] fetch-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-16 10:40   ` Martin Ågren
  2020-05-13  0:53 ` [PATCH 15/44] builtin/receive-pack: detect when the server doesn't support our hash brian m. carlson
                   ` (31 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

If we're fetching refs, detect the hash algorithm and parse the refs
using that algorithm.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/connect.c b/connect.c
index 511a069304..a39b843589 100644
--- a/connect.c
+++ b/connect.c
@@ -220,12 +220,24 @@ static void annotate_refs_with_symref_info(struct ref *ref)
 
 static void process_capabilities(struct packet_reader *reader, int *len)
 {
+	const char *feat_val;
+	int feat_len;
+	int hash_algo;
 	const char *line = reader->line;
 	int nul_location = strlen(line);
 	if (nul_location == *len)
 		return;
 	server_capabilities_v1 = xstrdup(line + nul_location + 1);
 	*len = nul_location;
+
+	feat_val = server_feature_value("object-format", &feat_len);
+	if (feat_val) {
+		char *hash_name = xstrndup(feat_val, feat_len);
+		hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo != GIT_HASH_UNKNOWN)
+			reader->hash_algo = &hash_algos[hash_algo];
+		free(hash_name);
+	}
 }
 
 static int process_dummy_ref(const struct packet_reader *reader)
@@ -234,7 +246,7 @@ static int process_dummy_ref(const struct packet_reader *reader)
 	struct object_id oid;
 	const char *name;
 
-	if (parse_oid_hex(line, &oid, &name))
+	if (parse_oid_hex_algop(line, &oid, &name, reader->hash_algo))
 		return 0;
 	if (*name != ' ')
 		return 0;
@@ -258,7 +270,7 @@ static int process_ref(const struct packet_reader *reader, int len,
 	struct object_id old_oid;
 	const char *name;
 
-	if (parse_oid_hex(line, &old_oid, &name))
+	if (parse_oid_hex_algop(line, &old_oid, &name, reader->hash_algo))
 		return 0;
 	if (*name != ' ')
 		return 0;
@@ -270,7 +282,7 @@ static int process_ref(const struct packet_reader *reader, int len,
 		die(_("protocol error: unexpected capabilities^{}"));
 	} else if (check_ref(name, flags)) {
 		struct ref *ref = alloc_ref(name);
-		oidcpy(&ref->old_oid, &old_oid);
+		memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);
 		**list = ref;
 		*list = &ref->next;
 	}
@@ -288,7 +300,7 @@ static int process_shallow(const struct packet_reader *reader, int len,
 	if (!skip_prefix(line, "shallow ", &arg))
 		return 0;
 
-	if (get_oid_hex(arg, &old_oid))
+	if (get_oid_hex_algop(arg, &old_oid, reader->hash_algo))
 		die(_("protocol error: expected shallow sha-1, got '%s'"), arg);
 	if (!shallow_points)
 		die(_("repository on the other end cannot be shallow"));
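
As a rough illustration of what process_capabilities does with the
first advertised line, the sketch below splits a v0/v1 ref line at its
NUL byte and derives the expected hex object ID length from the
object-format capability.  The names split_capabilities, hex_len_for
and advertised_hex_len are hypothetical and not part of Git; like the
patch, the sketch keeps the SHA-1 default when the capability is absent
or names an unknown algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * The first ref line looks like "<oid> <refname>\0<cap> <cap> ...".
 * Returns a pointer to the capabilities, or NULL if the line has none.
 * *ref_len receives the length of the "<oid> <refname>" part.
 */
static const char *split_capabilities(const char *line, size_t pktlen,
				      size_t *ref_len)
{
	const char *nul;

	assert(line && ref_len);
	nul = memchr(line, '\0', pktlen);
	*ref_len = nul ? (size_t)(nul - line) : pktlen;
	return nul ? nul + 1 : NULL;
}

/* Hex object ID length for a named algorithm; 0 if the name is unknown. */
static size_t hex_len_for(const char *name, size_t len)
{
	if (len == 4 && !memcmp(name, "sha1", 4))
		return 40;
	if (len == 6 && !memcmp(name, "sha256", 6))
		return 64;
	return 0;
}

/* Use the first object-format value; default to SHA-1's 40 hex digits. */
static size_t advertised_hex_len(const char *caps)
{
	static const char key[] = "object-format=";
	const size_t klen = sizeof(key) - 1;
	size_t hexlen = 40;

	while (caps && *caps) {
		size_t tok = strcspn(caps, " ");

		if (tok > klen && !memcmp(caps, key, klen)) {
			size_t n = hex_len_for(caps + klen, tok - klen);
			if (n)
				hexlen = n; /* unknown names keep the default */
			break;
		}
		caps += tok;
		if (*caps == ' ')
			caps++;
	}
	return hexlen;
}
```

The capabilities have to be read before any object ID on the line is
parsed, because the number of hex digits to consume depends on them.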

* [PATCH 15/44] builtin/receive-pack: detect when the server doesn't support our hash
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (13 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 14/44] connect: detect algorithm when fetching refs brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-16 10:41   ` Martin Ågren
  2020-05-13  0:53 ` [PATCH 16/44] docs: update remote helper docs for object-format extensions brian m. carlson
                   ` (30 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Detect when the client requests an object format that doesn't match our
repository's hash algorithm and abort.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/receive-pack.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index a4159b559e..8755fa2463 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1624,6 +1624,8 @@ static struct command *read_head_info(struct packet_reader *reader,
 		linelen = strlen(reader->line);
 		if (linelen < reader->pktlen) {
 			const char *feature_list = reader->line + linelen + 1;
+			const char *hash;
+			int len = 0;
 			if (parse_feature_request(feature_list, "report-status"))
 				report_status = 1;
 			if (parse_feature_request(feature_list, "side-band-64k"))
@@ -1636,6 +1638,13 @@ static struct command *read_head_info(struct packet_reader *reader,
 			if (advertise_push_options
 			    && parse_feature_request(feature_list, "push-options"))
 				use_push_options = 1;
+			hash = parse_feature_value(feature_list, "object-format", &len, NULL);
+			if (!hash) {
+				hash = hash_algos[GIT_HASH_SHA1].name;
+				len = strlen(hash);
+			}
+			if (xstrncmpz(the_hash_algo->name, hash, len))
+				die("error: unsupported object format '%.*s'", len, hash);
 		}
 
 		if (!strcmp(reader->line, "push-cert")) {

* [PATCH 16/44] docs: update remote helper docs for object-format extensions
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (14 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 15/44] builtin/receive-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13  0:53 ` [PATCH 17/44] transport-helper: implement " brian m. carlson
                   ` (29 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Update the remote helper docs to document the object-format extensions
we will implement in remote-curl and the transport helper code shortly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitremote-helpers.txt | 33 +++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index f48a031dc3..26f32e4421 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -238,6 +238,9 @@ the remote repository.
 	`--signed-tags=verbatim` to linkgit:git-fast-export[1].  In the
 	absence of this capability, Git will use `--signed-tags=warn-strip`.
 
+'object-format'::
+	This indicates that the helper is able to interact with the remote
+	side using an explicit hash algorithm extension.
 
 
 COMMANDS
@@ -257,12 +260,14 @@ Support for this command is mandatory.
 'list'::
 	Lists the refs, one per line, in the format "<value> <name>
 	[<attr> ...]". The value may be a hex sha1 hash, "@<dest>" for
-	a symref, or "?" to indicate that the helper could not get the
-	value of the ref. A space-separated list of attributes follows
-	the name; unrecognized attributes are ignored. The list ends
-	with a blank line.
+	a symref, ":<keyword> <value>" for a key-value pair, or
+	"?" to indicate that the helper could not get the value of the
+	ref. A space-separated list of attributes follows the name;
+	unrecognized attributes are ignored. The list ends with a
+	blank line.
 +
 See REF LIST ATTRIBUTES for a list of currently defined attributes.
+See REF LIST KEYWORDS for a list of currently defined keywords.
 +
 Supported if the helper has the "fetch" or "import" capability.
 
@@ -430,6 +435,18 @@ attributes are defined.
 	This ref is unchanged since the last import or fetch, although
 	the helper cannot necessarily determine what value that produced.
 
+REF LIST KEYWORDS
+-----------------
+
+The 'list' command may produce a list of key-value pairs.
+The following keys are defined.
+
+'object-format'::
+	The refs are using the given hash algorithm.  This keyword is only
+	used if the server and client both support the object-format
+	extension.
+
+
 OPTIONS
 -------
 
@@ -514,6 +531,14 @@ set by Git if the remote helper has the 'option' capability.
 	transaction.  If successful, all refs will be updated, or none will.  If the
 	remote side does not support this capability, the push will fail.
 
+'option object-format' {'true'|algorithm}::
+	If 'true', indicate that the caller wants hash algorithm information
+	to be passed back from the remote.  This mode is used when fetching
+	refs.
++
+If set to an algorithm, indicate that the caller wants to interact with
+the remote side using that algorithm.
+
 SEE ALSO
 --------
 linkgit:git-remote[1]
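
To make the documented 'list' output format concrete, here is a small
sketch of how a consumer might classify each line and pull out the
value of a ":<keyword> <value>" pair.  The enum and the functions
classify_list_line and keyword_value are hypothetical names used only
for illustration; they are not Git's transport-helper API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum list_line {
	LIST_END,           /* blank line: end of the list */
	LIST_KEYWORD,       /* ":<keyword> <value>" */
	LIST_SYMREF,        /* "@<dest> <name>" */
	LIST_UNKNOWN_VALUE, /* "? <name>": helper could not get the value */
	LIST_OID            /* "<hex object ID> <name>" */
};

static enum list_line classify_list_line(const char *line)
{
	assert(line);
	switch (line[0]) {
	case '\0': return LIST_END;
	case ':':  return LIST_KEYWORD;
	case '@':  return LIST_SYMREF;
	case '?':  return LIST_UNKNOWN_VALUE;
	default:   return LIST_OID;
	}
}

/* For ":<keyword> <value>", return <value> if the keyword matches, else NULL. */
static const char *keyword_value(const char *line, const char *keyword)
{
	size_t klen = strlen(keyword);

	if (line[0] != ':' || strncmp(line + 1, keyword, klen) ||
	    line[1 + klen] != ' ')
		return NULL;
	return line + 1 + klen + 1;
}
```

A consumer would need to see the ":object-format" line before it parses
any ref value, since the value's expected length depends on the
algorithm.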

* [PATCH 17/44] transport-helper: implement object-format extensions
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (15 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 16/44] docs: update remote helper docs for object-format extensions brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13  0:53 ` [PATCH 18/44] remote-curl: " brian m. carlson
                   ` (28 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Implement the object-format extensions that let us determine the hash
algorithm in use when pushing or pulling data.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 transport-helper.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index a46afcb69d..ae33b0eea7 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -32,7 +32,8 @@ struct helper_data {
 		signed_tags : 1,
 		check_connectivity : 1,
 		no_disconnect_req : 1,
-		no_private_update : 1;
+		no_private_update : 1,
+		object_format : 1;
 
 	/*
 	 * As an optimization, the transport code may invoke fetch before
@@ -207,6 +208,8 @@ static struct child_process *get_helper(struct transport *transport)
 			data->import_marks = xstrdup(arg);
 		} else if (starts_with(capname, "no-private-update")) {
 			data->no_private_update = 1;
+		} else if (starts_with(capname, "object-format")) {
+			data->object_format = 1;
 		} else if (mandatory) {
 			die(_("unknown mandatory capability %s; this remote "
 			      "helper probably needs newer version of Git"),
@@ -1103,6 +1106,12 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 	data->get_refs_list_called = 1;
 	helper = get_helper(transport);
 
+	if (data->object_format) {
+		write_str_in_full(helper->in, "option object-format\n");
+		if (recvline(data, &buf) || strcmp(buf.buf, "ok"))
+			exit(128);
+	}
+
 	if (data->push && for_push)
 		write_str_in_full(helper->in, "list for-push\n");
 	else
@@ -1115,6 +1124,17 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 
 		if (!*buf.buf)
 			break;
+		else if (buf.buf[0] == ':') {
+			const char *value;
+			if (skip_prefix(buf.buf, ":object-format ", &value)) {
+				int algo = hash_algo_by_name(value);
+				if (algo == GIT_HASH_UNKNOWN)
+					die(_("unsupported object format '%s'"),
+					    value);
+				transport->hash_algo = &hash_algos[algo];
+			}
+			continue;
+		}
 
 		eov = strchr(buf.buf, ' ');
 		if (!eov)
@@ -1127,7 +1147,7 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 		if (buf.buf[0] == '@')
 			(*tail)->symref = xstrdup(buf.buf + 1);
 		else if (buf.buf[0] != '?')
-			get_oid_hex(buf.buf, &(*tail)->old_oid);
+			get_oid_hex_algop(buf.buf, &(*tail)->old_oid, transport->hash_algo);
 		if (eon) {
 			if (has_attribute(eon + 1, "unchanged")) {
 				(*tail)->status |= REF_STATUS_UPTODATE;

* [PATCH 18/44] remote-curl: implement object-format extensions
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (16 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 17/44] transport-helper: implement " brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-13  0:53 ` [PATCH 19/44] builtin/clone: initialize hash algorithm properly brian m. carlson
                   ` (27 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Implement the object-format extensions that let us determine the hash
algorithm in use when pushing, pulling, and fetching.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 remote-curl.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 1c9aa3d0ab..3ed0dfec1b 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -41,7 +41,9 @@ struct options {
 		deepen_relative : 1,
 		from_promisor : 1,
 		no_dependents : 1,
-		atomic : 1;
+		atomic : 1,
+		object_format : 1;
+	const struct git_hash_algo *hash_algo;
 };
 static struct options options;
 static struct string_list cas_options = STRING_LIST_INIT_DUP;
@@ -190,6 +192,16 @@ static int set_option(const char *name, const char *value)
 	} else if (!strcmp(name, "filter")) {
 		options.filter = xstrdup(value);
 		return 0;
+	} else if (!strcmp(name, "object-format")) {
+		int algo;
+		options.object_format = 1;
+		if (strcmp(value, "true")) {
+			algo = hash_algo_by_name(value);
+			if (algo == GIT_HASH_UNKNOWN)
+				die("unknown object format '%s'", value);
+			options.hash_algo = &hash_algos[algo];
+		}
+		return 0;
 	} else {
 		return 1 /* unsupported */;
 	}
@@ -231,6 +243,7 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	case protocol_v0:
 		get_remote_heads(&reader, &list, for_push ? REF_NORMAL : 0,
 				 NULL, &heads->shallow);
+		options.hash_algo = reader.hash_algo;
 		break;
 	case protocol_unknown_version:
 		BUG("unknown protocol version");
@@ -509,6 +522,9 @@ static struct ref *get_refs(int for_push)
 static void output_refs(struct ref *refs)
 {
 	struct ref *posn;
+	if (options.object_format && options.hash_algo) {
+		printf(":object-format %s\n", options.hash_algo->name);
+	}
 	for (posn = refs; posn; posn = posn->next) {
 		if (posn->symref)
 			printf("@%s %s\n", posn->symref, posn->name);
@@ -1439,6 +1455,7 @@ int cmd_main(int argc, const char **argv)
 			printf("option\n");
 			printf("push\n");
 			printf("check-connectivity\n");
+			printf("object-format\n");
 			printf("\n");
 			fflush(stdout);
 		} else if (skip_prefix(buf.buf, "stateless-connect ", &arg)) {

* [PATCH 19/44] builtin/clone: initialize hash algorithm properly
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (17 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 18/44] remote-curl: " brian m. carlson
@ 2020-05-13  0:53 ` brian m. carlson
  2020-05-16 10:48   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 20/44] t5562: pass object-format in synthesized test data brian m. carlson
                   ` (26 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:53 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When performing a clone, we don't know what hash algorithm the other end
will support.  Currently, we don't support fetching data belonging to a
different algorithm, so we must know what algorithm the remote side is
using in order to properly initialize the repository.  We can know that
only after fetching the refs, so if the remote side has any references,
use that information to reinitialize the repository with the correct
hash algorithm information.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/clone.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index cb48a291ca..f27d38bc8e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1217,6 +1217,15 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	refs = transport_get_remote_refs(transport, &ref_prefixes);
 
 	if (refs) {
+		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
+
+		/*
+		 * Now that we know what algorithm the remote side is using,
+		 * let's set ours to the same thing.
+		 */
+		initialize_repository_version(hash_algo);
+		repo_set_hash_algo(the_repository, hash_algo);
+
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
 		/*
 		 * transport_get_remote_refs() may return refs with null sha-1

* [PATCH 20/44] t5562: pass object-format in synthesized test data
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (18 preceding siblings ...)
  2020-05-13  0:53 ` [PATCH 19/44] builtin/clone: initialize hash algorithm properly brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 10:55   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 21/44] t5704: send object-format capability with SHA-256 brian m. carlson
                   ` (25 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Ensure that we pass the object-format capability in the synthesized test
data so that this test works with algorithms other than SHA-1.

Also add a test using the old data when we're using SHA-1, so that we
can be sure that we preserve backwards compatibility with servers not
offering the object-format capability.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5562-http-backend-content-length.sh | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 3f4ac71f83..f508d4d449 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -46,6 +46,7 @@ ssize_b100dots() {
 }
 
 test_expect_success 'setup' '
+	test_oid_init &&
 	HTTP_CONTENT_ENCODING="identity" &&
 	export HTTP_CONTENT_ENCODING &&
 	git config http.receivepack true &&
@@ -62,8 +63,8 @@ test_expect_success 'setup' '
 	test_copy_bytes 10 <fetch_body >fetch_body.trunc &&
 	hash_next=$(git commit-tree -p HEAD -m next HEAD^{tree}) &&
 	{
-		printf "%s %s refs/heads/newbranch\\0report-status\\n" \
-			"$ZERO_OID" "$hash_next" | packetize &&
+		printf "%s %s refs/heads/newbranch\\0report-status object-format=%s\\n" \
+			"$ZERO_OID" "$hash_next" "$(test_oid algo)" | packetize &&
 		printf 0000 &&
 		echo "$hash_next" | git pack-objects --stdout
 	} >push_body &&
@@ -117,6 +118,15 @@ test_expect_success GZIP 'push plain' '
 	test_cmp act.head exp.head
 '
 
+test_expect_success GZIP 'push plain with SHA-1' '
+	test_when_finished "git branch -D newbranch" &&
+	test_http_env receive push_body &&
+	verify_http_result "200 OK" &&
+	git rev-parse newbranch >act.head &&
+	echo "$hash_next" >exp.head &&
+	test_cmp act.head exp.head
+'
+
 test_expect_success 'push plain truncated' '
 	test_http_env receive push_body.trunc &&
 	! verify_http_result "200 OK"

* [PATCH 21/44] t5704: send object-format capability with SHA-256
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (19 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 20/44] t5562: pass object-format in synthesized test data brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:02   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 22/44] fetch-pack: parse and advertise the object-format capability brian m. carlson
                   ` (24 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When we speak protocol v2 in this test, we must pass the object-format
header if the algorithm is not SHA-1.  Otherwise, git upload-pack fails
because the hash algorithm doesn't match and not because we've failed to
speak the protocol correctly.  Pass the header so that our assertions
test what we're really interested in.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5704-protocol-violations.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t5704-protocol-violations.sh b/t/t5704-protocol-violations.sh
index 950cfb21fe..47e78932b9 100755
--- a/t/t5704-protocol-violations.sh
+++ b/t/t5704-protocol-violations.sh
@@ -6,9 +6,20 @@ communications if the other side says something unexpected. We are mostly
 making sure that we do not segfault or otherwise behave badly.'
 . ./test-lib.sh
 
+# If we don't print the object format, we'll fail for a spurious reason: the
+# mismatched object format.
+print_object_format () {
+	local algo=$(test_oid algo) &&
+	if test "$algo" != "sha1"
+	then
+		packetize "object-format=$algo"
+	fi
+}
+
 test_expect_success 'extra delim packet in v2 ls-refs args' '
 	{
 		packetize command=ls-refs &&
+		print_object_format &&
 		printf 0001 &&
 		# protocol expects 0000 flush here
 		printf 0001
@@ -21,6 +32,7 @@ test_expect_success 'extra delim packet in v2 ls-refs args' '
 test_expect_success 'extra delim packet in v2 fetch args' '
 	{
 		packetize command=fetch &&
+		print_object_format &&
 		printf 0001 &&
 		# protocol expects 0000 flush here
 		printf 0001
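
For readers unfamiliar with the framing these tests build by hand: a
pkt-line is the payload preceded by four hex digits giving the total
length, including those four digits.  "0000" (flush) and "0001" (delim)
are special packets with no payload.  The sketch below encodes one
payload line; pkt_line is a hypothetical name, not the test suite's
packetize helper, and the 65520-byte cap is the usual pkt-line maximum.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<4 hex digits><payload>" into `out`.
 * Returns the number of bytes written (excluding the trailing NUL),
 * or -1 if the packet is too large or does not fit in `out`.
 */
static int pkt_line(char *out, size_t outsz, const char *payload)
{
	size_t len;
	int n;

	assert(out && payload);
	len = strlen(payload) + 4; /* the length prefix counts itself */
	if (len > 65520)
		return -1;
	n = snprintf(out, outsz, "%04zx%s", len, payload);
	return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

For example, the 20-byte payload "object-format=sha256" becomes
"0018object-format=sha256", since 20 + 4 = 24 = 0x18.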

* [PATCH 22/44] fetch-pack: parse and advertise the object-format capability
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (20 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 21/44] t5704: send object-format capability with SHA-256 brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:03   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 23/44] setup: set the_repository's hash algo when checking format brian m. carlson
                   ` (23 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Parse the server's object-format capability and respond accordingly,
dying if there is a mismatch.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fetch-pack.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index 1d277190e7..3a48ed4b13 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1179,6 +1179,7 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
 			      int sideband_all, int seen_ack)
 {
 	int ret = 0;
+	const char *hash_name;
 	struct strbuf req_buf = STRBUF_INIT;
 
 	if (server_supports_v2("fetch", 1))
@@ -1193,6 +1194,17 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
 					 args->server_options->items[i].string);
 	}
 
+	if (server_feature_v2("object-format", &hash_name)) {
+		int hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo_by_ptr(the_hash_algo) != hash_algo)
+			die(_("mismatched algorithms: client %s; server %s"),
+			    the_hash_algo->name, hash_name);
+		packet_buf_write(&req_buf, "object-format=%s", the_hash_algo->name);
+	}
+	else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1)
+		die(_("the server does not support algorithm '%s'"),
+		    the_hash_algo->name);
+
 	packet_buf_delim(&req_buf);
 	if (args->use_thin_pack)
 		packet_buf_write(&req_buf, "thin-pack");
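
The decision made in the hunk above can be summarized as a small pure
function.  This is a sketch only: fmt_decision and
negotiate_object_format are hypothetical names, and algorithms are
compared by name where the patch compares algorithm indices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum fmt_decision {
	FMT_SEND,       /* server advertised our algorithm: send object-format */
	FMT_OMIT,       /* no advertisement and we use SHA-1: send nothing */
	FMT_MISMATCH,   /* server advertised a different algorithm: abort */
	FMT_UNSUPPORTED /* no advertisement and we are not SHA-1: abort */
};

/* `server_algo` is NULL when the server did not advertise object-format. */
static enum fmt_decision negotiate_object_format(const char *client_algo,
						 const char *server_algo)
{
	assert(client_algo);
	if (server_algo)
		return strcmp(client_algo, server_algo) ? FMT_MISMATCH : FMT_SEND;
	return strcmp(client_algo, "sha1") ? FMT_UNSUPPORTED : FMT_OMIT;
}
```

Only the FMT_OMIT case stays silent on the wire, which is what keeps a
SHA-1 client working against servers that predate the capability.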

* [PATCH 23/44] setup: set the_repository's hash algo when checking format
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (21 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 22/44] fetch-pack: parse and advertise the object-format capability brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:03   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 24/44] t3200: mark assertion with SHA1 prerequisite brian m. carlson
                   ` (22 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When we're checking the repository's format, set the hash algorithm at
the same time.  This ensures that we perform a suitable initialization
early enough to avoid confusing any parts of the code.  If we defer
until later, we can end up with portions of the code which are confused
about the hash algorithm, resulting in segfaults.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/setup.c b/setup.c
index 65fe5ecefb..019a1c6367 100644
--- a/setup.c
+++ b/setup.c
@@ -1273,6 +1273,7 @@ void check_repository_format(struct repository_format *fmt)
 		fmt = &repo_fmt;
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
+	repo_set_hash_algo(the_repository, fmt->hash_algo);
 	clear_repository_format(&repo_fmt);
 }
 

* [PATCH 24/44] t3200: mark assertion with SHA1 prerequisite
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (22 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 23/44] setup: set the_repository's hash algo when checking format brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:04   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 25/44] packfile: compute and use the index CRC offset brian m. carlson
                   ` (21 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

One of the assertions in this test checks that git branch -m works
even without a .git/config file.  However, if the repository requires
configuration extensions, for example because it uses a non-SHA-1
algorithm, this assertion will fail.  Mark the assertion as requiring SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t3200-branch.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 411a70b0ce..2a3fedc6b0 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -402,7 +402,7 @@ EOF
 
 mv .git/config .git/config-saved
 
-test_expect_success 'git branch -m q q2 without config should succeed' '
+test_expect_success SHA1 'git branch -m q q2 without config should succeed' '
 	git branch -m q q2 &&
 	git branch -m q2 q
 '

* [PATCH 25/44] packfile: compute and use the index CRC offset
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (23 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 24/44] t3200: mark assertion with SHA1 prerequisite brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:12   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 26/44] t5302: modernize test formatting brian m. carlson
                   ` (20 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Both v2 pack index files and the v3 format specified as part of the
NewHash work have similar data starting at the CRC table.  Much of the
existing code wants to read either this table or the offset entries
following it, and in doing so computes the offset each time.

In order to share as much code as possible between v2 and v3, compute
the offset of the CRC table and store it when the pack is opened.  Use
this value to compute offsets not only to the CRC table, but also to the
offset entries beyond it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
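Note, not part of the commit: the arithmetic behind crc_offset can be
worked through with a small sketch.  The function names are hypothetical,
and the sketch uses size_t where the diff below stores the value in a
uint32_t; the layout itself (8-byte header, 256-entry fan-out table, one
object ID per object, then one CRC32 per object) is the one the removed
comments in builtin/index-pack.c describe.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the CRC32 table: 8-byte header, fan-out table, object ID table. */
size_t idx_crc_offset(uint32_t nr_objects, size_t hashsz)
{
	return 8 + 4 * 256 + (size_t)nr_objects * hashsz;
}

/* Byte offset of the 4-byte offset table, which follows one CRC32 per object. */
size_t idx_offset_table(uint32_t nr_objects, size_t hashsz)
{
	return idx_crc_offset(nr_objects, hashsz) + (size_t)nr_objects * 4;
}
```

For example, with 100 objects the CRC table starts at byte 3032 for SHA-1
(20-byte object IDs) and at byte 4232 for SHA-256 (32-byte object IDs); in
both cases the offset table starts 400 bytes later.
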
 builtin/index-pack.c | 6 +-----
 object-store.h       | 1 +
 packfile.c           | 1 +
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f176dd28c8..7bea1fba52 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1555,13 +1555,9 @@ static void read_v2_anomalous_offsets(struct packed_git *p,
 {
 	const uint32_t *idx1, *idx2;
 	uint32_t i;
-	const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t);
 
 	/* The address of the 4-byte offset table */
-	idx1 = (((const uint32_t *)p->index_data)
-		+ 2 /* 8-byte header */
-		+ 256 /* fan out */
-		+ hashwords * p->num_objects /* object ID table */
+	idx1 = (((const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset))
 		+ p->num_objects /* CRC32 table */
 		);
 
diff --git a/object-store.h b/object-store.h
index d1e490f203..f439d47af8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -70,6 +70,7 @@ struct packed_git {
 	size_t index_size;
 	uint32_t num_objects;
 	uint32_t num_bad_objects;
+	uint32_t crc_offset;
 	unsigned char *bad_object_sha1;
 	int index_version;
 	time_t mtime;
diff --git a/packfile.c b/packfile.c
index f4e752996d..6ab5233613 100644
--- a/packfile.c
+++ b/packfile.c
@@ -178,6 +178,7 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 		     */
 		    (sizeof(off_t) <= 4))
 			return error("pack too large for current definition of off_t in %s", path);
+		p->crc_offset = 8 + 4 * 256 + nr * hashsz;
 	}
 
 	p->index_version = version;

* [PATCH 26/44] t5302: modernize test formatting
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (24 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 25/44] packfile: compute and use the index CRC offset brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 27/44] builtin/show-index: provide options to determine hash algo brian m. carlson
                   ` (19 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Our style these days is to place the description and the opening quote
of the body on the same line as test_expect_success (if it fits), to
place the trailing quote on a line by itself after the body, and to use
tabs.  Since we're going to be making several significant changes to
this test, modernize the style to aid in readability of the subsequent
patches.

This patch should have no functional change.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5302-pack-index.sh | 360 +++++++++++++++++++++---------------------
 1 file changed, 184 insertions(+), 176 deletions(-)

diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index ad07f2f7fc..8981c9b90e 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -7,65 +7,65 @@ test_description='pack index with 64-bit offsets and object CRC'
 . ./test-lib.sh
 
 test_expect_success 'setup' '
-     test_oid_init &&
-     rawsz=$(test_oid rawsz) &&
-     rm -rf .git &&
-     git init &&
-     git config pack.threads 1 &&
-     i=1 &&
-     while test $i -le 100
-     do
-         iii=$(printf '%03i' $i)
-	 test-tool genrandom "bar" 200 > wide_delta_$iii &&
-	 test-tool genrandom "baz $iii" 50 >> wide_delta_$iii &&
-	 test-tool genrandom "foo"$i 100 > deep_delta_$iii &&
-	 test-tool genrandom "foo"$(expr $i + 1) 100 >> deep_delta_$iii &&
-	 test-tool genrandom "foo"$(expr $i + 2) 100 >> deep_delta_$iii &&
-         echo $iii >file_$iii &&
-	 test-tool genrandom "$iii" 8192 >>file_$iii &&
-         git update-index --add file_$iii deep_delta_$iii wide_delta_$iii &&
-         i=$(expr $i + 1) || return 1
-     done &&
-     { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
-     git update-index --add file_101 &&
-     tree=$(git write-tree) &&
-     commit=$(git commit-tree $tree </dev/null) && {
-	 echo $tree &&
-	 git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\)	.*/\\1/"
-     } >obj-list &&
-     git update-ref HEAD $commit
+	test_oid_init &&
+	rawsz=$(test_oid rawsz) &&
+	rm -rf .git &&
+	git init &&
+	git config pack.threads 1 &&
+	i=1 &&
+	while test $i -le 100
+	do
+		iii=$(printf '%03i' $i)
+		test-tool genrandom "bar" 200 > wide_delta_$iii &&
+		test-tool genrandom "baz $iii" 50 >> wide_delta_$iii &&
+		test-tool genrandom "foo"$i 100 > deep_delta_$iii &&
+		test-tool genrandom "foo"$(expr $i + 1) 100 >> deep_delta_$iii &&
+		test-tool genrandom "foo"$(expr $i + 2) 100 >> deep_delta_$iii &&
+		echo $iii >file_$iii &&
+		test-tool genrandom "$iii" 8192 >>file_$iii &&
+		git update-index --add file_$iii deep_delta_$iii wide_delta_$iii &&
+		i=$(expr $i + 1) || return 1
+	done &&
+	{ echo 101 && test-tool genrandom 100 8192; } >file_101 &&
+	git update-index --add file_101 &&
+	tree=$(git write-tree) &&
+	commit=$(git commit-tree $tree </dev/null) && {
+		echo $tree &&
+		git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\)	.*/\\1/"
+	} >obj-list &&
+	git update-ref HEAD $commit
 '
 
-test_expect_success \
-    'pack-objects with index version 1' \
-    'pack1=$(git pack-objects --index-version=1 test-1 <obj-list) &&
-     git verify-pack -v "test-1-${pack1}.pack"'
+test_expect_success 'pack-objects with index version 1' '
+	pack1=$(git pack-objects --index-version=1 test-1 <obj-list) &&
+	git verify-pack -v "test-1-${pack1}.pack"
+'
 
-test_expect_success \
-    'pack-objects with index version 2' \
-    'pack2=$(git pack-objects --index-version=2 test-2 <obj-list) &&
-     git verify-pack -v "test-2-${pack2}.pack"'
+test_expect_success 'pack-objects with index version 2' '
+	pack2=$(git pack-objects --index-version=2 test-2 <obj-list) &&
+	git verify-pack -v "test-2-${pack2}.pack"
+'
 
-test_expect_success \
-    'both packs should be identical' \
-    'cmp "test-1-${pack1}.pack" "test-2-${pack2}.pack"'
+test_expect_success 'both packs should be identical' '
+	cmp "test-1-${pack1}.pack" "test-2-${pack2}.pack"
+'
 
-test_expect_success \
-    'index v1 and index v2 should be different' \
-    '! cmp "test-1-${pack1}.idx" "test-2-${pack2}.idx"'
+test_expect_success 'index v1 and index v2 should be different' '
+	! cmp "test-1-${pack1}.idx" "test-2-${pack2}.idx"
+'
 
-test_expect_success \
-    'index-pack with index version 1' \
-    'git index-pack --index-version=1 -o 1.idx "test-1-${pack1}.pack"'
+test_expect_success 'index-pack with index version 1' '
+	git index-pack --index-version=1 -o 1.idx "test-1-${pack1}.pack"
+'
 
-test_expect_success \
-    'index-pack with index version 2' \
-    'git index-pack --index-version=2 -o 2.idx "test-1-${pack1}.pack"'
+test_expect_success 'index-pack with index version 2' '
+	git index-pack --index-version=2 -o 2.idx "test-1-${pack1}.pack"
+'
 
-test_expect_success \
-    'index-pack results should match pack-objects ones' \
-    'cmp "test-1-${pack1}.idx" "1.idx" &&
-     cmp "test-2-${pack2}.idx" "2.idx"'
+test_expect_success 'index-pack results should match pack-objects ones' '
+	cmp "test-1-${pack1}.idx" "1.idx" &&
+	cmp "test-2-${pack2}.idx" "2.idx"
+'
 
 test_expect_success 'index-pack --verify on index version 1' '
 	git index-pack --verify "test-1-${pack1}.pack"
@@ -75,13 +75,13 @@ test_expect_success 'index-pack --verify on index version 2' '
 	git index-pack --verify "test-2-${pack2}.pack"
 '
 
-test_expect_success \
-    'pack-objects --index-version=2, is not accepted' \
-    'test_must_fail git pack-objects --index-version=2, test-3 <obj-list'
+test_expect_success 'pack-objects --index-version=2, is not accepted' '
+	test_must_fail git pack-objects --index-version=2, test-3 <obj-list
+'
 
-test_expect_success \
-    'index v2: force some 64-bit offsets with pack-objects' \
-    'pack3=$(git pack-objects --index-version=2,0x40000 test-3 <obj-list)'
+test_expect_success 'index v2: force some 64-bit offsets with pack-objects' '
+	pack3=$(git pack-objects --index-version=2,0x40000 test-3 <obj-list)
+'
 
 if msg=$(git verify-pack -v "test-3-${pack3}.pack" 2>&1) ||
 	! (echo "$msg" | grep "pack too large .* off_t")
@@ -91,21 +91,21 @@ else
 	say "# skipping tests concerning 64-bit offsets"
 fi
 
-test_expect_success OFF64_T \
-    'index v2: verify a pack with some 64-bit offsets' \
-    'git verify-pack -v "test-3-${pack3}.pack"'
+test_expect_success OFF64_T 'index v2: verify a pack with some 64-bit offsets' '
+	git verify-pack -v "test-3-${pack3}.pack"
+'
 
-test_expect_success OFF64_T \
-    '64-bit offsets: should be different from previous index v2 results' \
-    '! cmp "test-2-${pack2}.idx" "test-3-${pack3}.idx"'
+test_expect_success OFF64_T '64-bit offsets: should be different from previous index v2 results' '
+	! cmp "test-2-${pack2}.idx" "test-3-${pack3}.idx"
+'
 
-test_expect_success OFF64_T \
-    'index v2: force some 64-bit offsets with index-pack' \
-    'git index-pack --index-version=2,0x40000 -o 3.idx "test-1-${pack1}.pack"'
+test_expect_success OFF64_T 'index v2: force some 64-bit offsets with index-pack' '
+	git index-pack --index-version=2,0x40000 -o 3.idx "test-1-${pack1}.pack"
+'
 
-test_expect_success OFF64_T \
-    '64-bit offsets: index-pack result should match pack-objects one' \
-    'cmp "test-3-${pack3}.idx" "3.idx"'
+test_expect_success OFF64_T '64-bit offsets: index-pack result should match pack-objects one' '
+	cmp "test-3-${pack3}.idx" "3.idx"
+'
 
 test_expect_success OFF64_T 'index-pack --verify on 64-bit offset v2 (cheat)' '
 	# This cheats by knowing which lower offset should still be encoded
@@ -120,135 +120,143 @@ test_expect_success OFF64_T 'index-pack --verify on 64-bit offset v2' '
 # returns the object number for given object in given pack index
 index_obj_nr()
 {
-    idx_file=$1
-    object_sha1=$2
-    nr=0
-    git show-index < $idx_file |
-    while read offs sha1 extra
-    do
-      nr=$(($nr + 1))
-      test "$sha1" = "$object_sha1" || continue
-      echo "$(($nr - 1))"
-      break
-    done
+	idx_file=$1
+	object_sha1=$2
+	nr=0
+	git show-index < $idx_file |
+	while read offs sha1 extra
+	do
+	  nr=$(($nr + 1))
+	  test "$sha1" = "$object_sha1" || continue
+	  echo "$(($nr - 1))"
+	  break
+	done
 }
 
 # returns the pack offset for given object as found in given pack index
 index_obj_offset()
 {
-    idx_file=$1
-    object_sha1=$2
-    git show-index < $idx_file | grep $object_sha1 |
-    ( read offs extra && echo "$offs" )
+	idx_file=$1
+	object_sha1=$2
+	git show-index < $idx_file | grep $object_sha1 |
+	( read offs extra && echo "$offs" )
 }
 
-test_expect_success \
-    '[index v1] 1) stream pack to repository' \
-    'git index-pack --index-version=1 --stdin < "test-1-${pack1}.pack" &&
-     git prune-packed &&
-     git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
-     cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
-     cmp "test-1-${pack1}.idx"  ".git/objects/pack/pack-${pack1}.idx"'
+test_expect_success '[index v1] 1) stream pack to repository' '
+	git index-pack --index-version=1 --stdin < "test-1-${pack1}.pack" &&
+	git prune-packed &&
+	git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
+	cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
+	cmp "test-1-${pack1}.idx"	".git/objects/pack/pack-${pack1}.idx"
+'
 
 test_expect_success \
-    '[index v1] 2) create a stealth corruption in a delta base reference' \
-    '# This test assumes file_101 is a delta smaller than 16 bytes.
-     # It should be against file_100 but we substitute its base for file_099
-     sha1_101=$(git hash-object file_101) &&
-     sha1_099=$(git hash-object file_099) &&
-     offs_101=$(index_obj_offset 1.idx $sha1_101) &&
-     nr_099=$(index_obj_nr 1.idx $sha1_099) &&
-     chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
-     recordsz=$((rawsz + 4)) &&
-     dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
-        if=".git/objects/pack/pack-${pack1}.idx" \
-        skip=$((4 + 256 * 4 + $nr_099 * recordsz)) \
-        bs=1 count=$rawsz conv=notrunc &&
-     git cat-file blob $sha1_101 > file_101_foo1'
+	'[index v1] 2) create a stealth corruption in a delta base reference' '
+	# This test assumes file_101 is a delta smaller than 16 bytes.
+	# It should be against file_100 but we substitute its base for file_099
+	sha1_101=$(git hash-object file_101) &&
+	sha1_099=$(git hash-object file_099) &&
+	offs_101=$(index_obj_offset 1.idx $sha1_101) &&
+	nr_099=$(index_obj_nr 1.idx $sha1_099) &&
+	chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
+	recordsz=$((rawsz + 4)) &&
+	dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
+	       if=".git/objects/pack/pack-${pack1}.idx" \
+	       skip=$((4 + 256 * 4 + $nr_099 * recordsz)) \
+	       bs=1 count=$rawsz conv=notrunc &&
+	git cat-file blob $sha1_101 > file_101_foo1
+'
 
 test_expect_success \
-    '[index v1] 3) corrupted delta happily returned wrong data' \
-    'test -f file_101_foo1 && ! cmp file_101 file_101_foo1'
+	'[index v1] 3) corrupted delta happily returned wrong data' '
+	test -f file_101_foo1 && ! cmp file_101 file_101_foo1
+'
 
 test_expect_success \
-    '[index v1] 4) confirm that the pack is actually corrupted' \
-    'test_must_fail git fsck --full $commit'
+	'[index v1] 4) confirm that the pack is actually corrupted' '
+	test_must_fail git fsck --full $commit
+'
 
 test_expect_success \
-    '[index v1] 5) pack-objects happily reuses corrupted data' \
-    'pack4=$(git pack-objects test-4 <obj-list) &&
-     test -f "test-4-${pack4}.pack"'
+	'[index v1] 5) pack-objects happily reuses corrupted data' '
+	pack4=$(git pack-objects test-4 <obj-list) &&
+	test -f "test-4-${pack4}.pack"
+'
+
+test_expect_success '[index v1] 6) newly created pack is BAD !' '
+	test_must_fail git verify-pack -v "test-4-${pack4}.pack"
+'
+
+test_expect_success '[index v2] 1) stream pack to repository' '
+	rm -f .git/objects/pack/* &&
+	git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
+	git prune-packed &&
+	git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
+	cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
+	cmp "test-2-${pack1}.idx"	".git/objects/pack/pack-${pack1}.idx"
+'
 
 test_expect_success \
-    '[index v1] 6) newly created pack is BAD !' \
-    'test_must_fail git verify-pack -v "test-4-${pack4}.pack"'
+	'[index v2] 2) create a stealth corruption in a delta base reference' '
+	# This test assumes file_101 is a delta smaller than 16 bytes.
+	# It should be against file_100 but we substitute its base for file_099
+	sha1_101=$(git hash-object file_101) &&
+	sha1_099=$(git hash-object file_099) &&
+	offs_101=$(index_obj_offset 1.idx $sha1_101) &&
+	nr_099=$(index_obj_nr 1.idx $sha1_099) &&
+	chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
+	dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
+		if=".git/objects/pack/pack-${pack1}.idx" \
+		skip=$((8 + 256 * 4 + $nr_099 * rawsz)) \
+		bs=1 count=$rawsz conv=notrunc &&
+	git cat-file blob $sha1_101 > file_101_foo2
+'
 
 test_expect_success \
-    '[index v2] 1) stream pack to repository' \
-    'rm -f .git/objects/pack/* &&
-     git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
-     git prune-packed &&
-     git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
-     cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
-     cmp "test-2-${pack1}.idx"  ".git/objects/pack/pack-${pack1}.idx"'
+	'[index v2] 3) corrupted delta happily returned wrong data' '
+	test -f file_101_foo2 && ! cmp file_101 file_101_foo2
+'
 
 test_expect_success \
-    '[index v2] 2) create a stealth corruption in a delta base reference' \
-    '# This test assumes file_101 is a delta smaller than 16 bytes.
-     # It should be against file_100 but we substitute its base for file_099
-     sha1_101=$(git hash-object file_101) &&
-     sha1_099=$(git hash-object file_099) &&
-     offs_101=$(index_obj_offset 1.idx $sha1_101) &&
-     nr_099=$(index_obj_nr 1.idx $sha1_099) &&
-     chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
-     dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
-        if=".git/objects/pack/pack-${pack1}.idx" \
-        skip=$((8 + 256 * 4 + $nr_099 * rawsz)) \
-        bs=1 count=$rawsz conv=notrunc &&
-     git cat-file blob $sha1_101 > file_101_foo2'
+	'[index v2] 4) confirm that the pack is actually corrupted' '
+	test_must_fail git fsck --full $commit
+'
 
 test_expect_success \
-    '[index v2] 3) corrupted delta happily returned wrong data' \
-    'test -f file_101_foo2 && ! cmp file_101 file_101_foo2'
+	'[index v2] 5) pack-objects refuses to reuse corrupted data' '
+	test_must_fail git pack-objects test-5 <obj-list &&
+	test_must_fail git pack-objects --no-reuse-object test-6 <obj-list
+'
 
 test_expect_success \
-    '[index v2] 4) confirm that the pack is actually corrupted' \
-    'test_must_fail git fsck --full $commit'
-
-test_expect_success \
-    '[index v2] 5) pack-objects refuses to reuse corrupted data' \
-    'test_must_fail git pack-objects test-5 <obj-list &&
-     test_must_fail git pack-objects --no-reuse-object test-6 <obj-list'
-
-test_expect_success \
-    '[index v2] 6) verify-pack detects CRC mismatch' \
-    'rm -f .git/objects/pack/* &&
-     git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
-     git verify-pack ".git/objects/pack/pack-${pack1}.pack" &&
-     obj=$(git hash-object file_001) &&
-     nr=$(index_obj_nr ".git/objects/pack/pack-${pack1}.idx" $obj) &&
-     chmod +w ".git/objects/pack/pack-${pack1}.idx" &&
-     printf xxxx | dd of=".git/objects/pack/pack-${pack1}.idx" conv=notrunc \
-        bs=1 count=4 seek=$((8 + 256 * 4 + $(wc -l <obj-list) * rawsz + $nr * 4)) &&
-     ( while read obj
-       do git cat-file -p $obj >/dev/null || exit 1
-       done <obj-list ) &&
-     test_must_fail git verify-pack ".git/objects/pack/pack-${pack1}.pack"
+	'[index v2] 6) verify-pack detects CRC mismatch' '
+	rm -f .git/objects/pack/* &&
+	git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
+	git verify-pack ".git/objects/pack/pack-${pack1}.pack" &&
+	obj=$(git hash-object file_001) &&
+	nr=$(index_obj_nr ".git/objects/pack/pack-${pack1}.idx" $obj) &&
+	chmod +w ".git/objects/pack/pack-${pack1}.idx" &&
+	printf xxxx | dd of=".git/objects/pack/pack-${pack1}.idx" conv=notrunc \
+		bs=1 count=4 seek=$((8 + 256 * 4 + $(wc -l <obj-list) * rawsz + $nr * 4)) &&
+	 ( while read obj
+	   do git cat-file -p $obj >/dev/null || exit 1
+	   done <obj-list ) &&
+	test_must_fail git verify-pack ".git/objects/pack/pack-${pack1}.pack"
 '
 
 test_expect_success 'running index-pack in the object store' '
-    rm -f .git/objects/pack/* &&
-    cp test-1-${pack1}.pack .git/objects/pack/pack-${pack1}.pack &&
-    (
-	cd .git/objects/pack &&
-	git index-pack pack-${pack1}.pack
-    ) &&
-    test -f .git/objects/pack/pack-${pack1}.idx
+	rm -f .git/objects/pack/* &&
+	cp test-1-${pack1}.pack .git/objects/pack/pack-${pack1}.pack &&
+	(
+		cd .git/objects/pack &&
+		git index-pack pack-${pack1}.pack
+	) &&
+	test -f .git/objects/pack/pack-${pack1}.idx
 '
 
 test_expect_success 'index-pack --strict warns upon missing tagger in tag' '
-    sha=$(git rev-parse HEAD) &&
-    cat >wrong-tag <<EOF &&
+	sha=$(git rev-parse HEAD) &&
+	cat >wrong-tag <<EOF &&
 object $sha
 type commit
 tag guten tag
@@ -256,18 +264,18 @@ tag guten tag
 This is an invalid tag.
 EOF
 
-    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
-    pack1=$(echo $tag $sha | git pack-objects tag-test) &&
-    echo remove tag object &&
-    thirtyeight=${tag#??} &&
-    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
-    git index-pack --strict tag-test-${pack1}.pack 2>err &&
-    grep "^warning:.* expected .tagger. line" err
+	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	pack1=$(echo $tag $sha | git pack-objects tag-test) &&
+	echo remove tag object &&
+	thirtyeight=${tag#??} &&
+	rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
+	git index-pack --strict tag-test-${pack1}.pack 2>err &&
+	grep "^warning:.* expected .tagger. line" err
 '
 
 test_expect_success 'index-pack --fsck-objects also warns upon missing tagger in tag' '
-    git index-pack --fsck-objects tag-test-${pack1}.pack 2>err &&
-    grep "^warning:.* expected .tagger. line" err
+	git index-pack --fsck-objects tag-test-${pack1}.pack 2>err &&
+	grep "^warning:.* expected .tagger. line" err
 '
 
 test_done

* [PATCH 27/44] builtin/show-index: provide options to determine hash algo
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (25 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 26/44] t5302: modernize test formatting brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-18 16:20   ` Junio C Hamano
  2020-05-13  0:54 ` [PATCH 28/44] t1302: expect repo format version 1 for SHA-256 brian m. carlson
                   ` (18 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

It's possible to use a variety of index formats with show-index, and we
need a way to indicate the hash algorithm which is in use for a
particular index we'd like to show.  Default to using the value for the
repository we're in by calling setup_git_directory_gently, and allow
overriding it by using a --hash argument.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
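Note, not part of the commit: the selection order described above (an
explicit --hash value wins, otherwise the repository's algorithm is used)
can be sketched as a standalone lookup.  The table, the struct, and the
name hashsz_for_option are hypothetical; the real code goes through
hash_algo_by_name and repo_set_hash_algo, and dies on an unknown name
rather than returning 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of algorithm names and their raw object ID sizes in bytes. */
struct hash_info {
	const char *name;
	size_t rawsz;
};

static const struct hash_info known_hashes[] = {
	{ "sha1", 20 },
	{ "sha256", 32 },
};

/*
 * name:     value given to --hash, or NULL if the option was not used.
 * fallback: raw size of the algorithm of the repository we are in.
 * Returns the raw object ID size to use, or 0 if the name is unknown.
 */
size_t hashsz_for_option(const char *name, size_t fallback)
{
	size_t i;

	if (!name)
		return fallback;
	for (i = 0; i < sizeof(known_hashes) / sizeof(known_hashes[0]); i++)
		if (!strcmp(name, known_hashes[i].name))
			return known_hashes[i].rawsz;
	return 0;
}
```
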
 builtin/show-index.c | 29 ++++++++++++++++++++++++-----
 git.c                |  2 +-
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/builtin/show-index.c b/builtin/show-index.c
index 0826f6a5a2..ebfa2e9abd 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -1,9 +1,12 @@
 #include "builtin.h"
 #include "cache.h"
 #include "pack.h"
+#include "parse-options.h"
 
-static const char show_index_usage[] =
-"git show-index";
+static const char *const show_index_usage[] = {
+	"git show-index [--hash=HASH]",
+	NULL
+};
 
 int cmd_show_index(int argc, const char **argv, const char *prefix)
 {
@@ -11,10 +14,26 @@ int cmd_show_index(int argc, const char **argv, const char *prefix)
 	unsigned nr;
 	unsigned int version;
 	static unsigned int top_index[256];
-	const unsigned hashsz = the_hash_algo->rawsz;
+	unsigned hashsz;
+	const char *hash_name = NULL;
+	int hash_algo;
+	const struct option show_index_options[] = {
+		OPT_STRING(0, "hash", &hash_name, N_("hash"),
+			   N_("specify the hash algorithm to use")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, show_index_options, show_index_usage, 0);
+
+	if (hash_name) {
+		hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo == GIT_HASH_UNKNOWN)
+			die(_("Unknown hash algorithm"));
+		repo_set_hash_algo(the_repository, hash_algo);
+	}
+
+	hashsz = the_hash_algo->rawsz;
 
-	if (argc != 1)
-		usage(show_index_usage);
 	if (fread(top_index, 2 * 4, 1, stdin) != 1)
 		die("unable to read header");
 	if (top_index[0] == htonl(PACK_IDX_SIGNATURE)) {
diff --git a/git.c b/git.c
index 2e4efb4ff0..e53e8159a2 100644
--- a/git.c
+++ b/git.c
@@ -573,7 +573,7 @@ static struct cmd_struct commands[] = {
 	{ "shortlog", cmd_shortlog, RUN_SETUP_GENTLY | USE_PAGER },
 	{ "show", cmd_show, RUN_SETUP },
 	{ "show-branch", cmd_show_branch, RUN_SETUP },
-	{ "show-index", cmd_show_index },
+	{ "show-index", cmd_show_index, RUN_SETUP_GENTLY },
 	{ "show-ref", cmd_show_ref, RUN_SETUP },
 	{ "sparse-checkout", cmd_sparse_checkout, RUN_SETUP | NEED_WORK_TREE },
 	{ "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE },

* [PATCH 28/44] t1302: expect repo format version 1 for SHA-256
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (26 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 27/44] builtin/show-index: provide options to determine hash algo brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 29/44] Documentation/technical: document object-format for protocol v2 brian m. carlson
                   ` (17 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When using SHA-256, we need to take advantage of the extensions section
in the config file, so we need to use repository format version 1.
Update the test to look for the correct value.

Note that test_oid produces a value without a trailing newline, so use
echo to ensure we print a trailing newline to compare it correctly
against the actual results.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1302-repo-version.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t1302-repo-version.sh b/t/t1302-repo-version.sh
index ce4cff13bb..d60c042ce8 100755
--- a/t/t1302-repo-version.sh
+++ b/t/t1302-repo-version.sh
@@ -8,6 +8,10 @@ test_description='Test repository version check'
 . ./test-lib.sh
 
 test_expect_success 'setup' '
+	test_oid_cache <<-\EOF &&
+	version sha1:0
+	version sha256:1
+	EOF
 	cat >test.patch <<-\EOF &&
 	diff --git a/test.txt b/test.txt
 	new file mode 100644
@@ -23,7 +27,7 @@ test_expect_success 'setup' '
 '
 
 test_expect_success 'gitdir selection on normal repos' '
-	echo 0 >expect &&
+	echo $(test_oid version) >expect &&
 	git config core.repositoryformatversion >actual &&
 	git -C test config core.repositoryformatversion >actual2 &&
 	test_cmp expect actual &&

* [PATCH 29/44] Documentation/technical: document object-format for protocol v2
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (27 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 28/44] t1302: expect repo format version 1 for SHA-256 brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 30/44] connect: pass full packet reader when parsing v2 refs brian m. carlson
                   ` (16 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Document the object-format extension for protocol v2.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
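Note, not part of the commit: a client-side reading of the rule documented
below (use the advertised value, and assume SHA-1 when the capability is
absent) might look like the following sketch.  The function name and the
representation of the advertisement as a NULL-terminated array of strings
are assumptions for illustration, not Git's actual data structures.

```c
#include <stddef.h>
#include <string.h>

/*
 * caps: NULL-terminated array of capability lines as advertised by the
 *       server, e.g. "object-format=sha256".
 * Returns the advertised object-format value, or "sha1" if the server did
 * not advertise the capability.
 */
const char *advertised_object_format(const char *const *caps)
{
	static const char key[] = "object-format=";
	const size_t keylen = sizeof(key) - 1;

	for (; *caps; caps++)
		if (!strncmp(*caps, key, keylen))
			return *caps + keylen;
	return "sha1";
}
```
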
 Documentation/technical/protocol-v2.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 7e3766cafb..107e421fb7 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -453,3 +453,12 @@ included in a request.  This is done by sending each option as a
 a request.
 
 The provided options must not contain a NUL or LF character.
+
+ object-format
+~~~~~~~~~~~~~~~
+
+The server can advertise the `object-format` capability with a value `X` (in the
+form `object-format=X`) to notify the client that the server is able to deal
+with objects using hash algorithm X.  If not specified, the server is assumed to
+only handle SHA-1.  If the client would like to use a hash algorithm other than
+SHA-1, it should specify its object-format string.

* [PATCH 30/44] connect: pass full packet reader when parsing v2 refs
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (28 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 29/44] Documentation/technical: document object-format for protocol v2 brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:13   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 31/44] connect: parse v2 refs with correct hash algorithm brian m. carlson
                   ` (15 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When we're parsing refs, we need to know not only what the line we're
parsing is, but also the hash algorithm we should use to parse it, which
is stored in the reader object.  Pass the packet reader object through
to the protocol v2 ref parsing function.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/connect.c b/connect.c
index a39b843589..bb4bc4ab7f 100644
--- a/connect.c
+++ b/connect.c
@@ -373,7 +373,7 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 }
 
 /* Returns 1 when a valid ref has been added to `list`, 0 otherwise */
-static int process_ref_v2(const char *line, struct ref ***list)
+static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 {
 	int ret = 1;
 	int i = 0;
@@ -381,6 +381,7 @@ static int process_ref_v2(const char *line, struct ref ***list)
 	struct ref *ref;
 	struct string_list line_sections = STRING_LIST_INIT_DUP;
 	const char *end;
+	const char *line = reader->line;
 
 	/*
 	 * Ref lines have a number of fields which are space deliminated.  The
@@ -466,9 +467,10 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	}
 	packet_flush(fd_out);
 
+
 	/* Process response from server */
 	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
-		if (!process_ref_v2(reader->line, &list))
+		if (!process_ref_v2(reader, &list))
 			die(_("invalid ls-refs response: %s"), reader->line);
 	}
 


* [PATCH 31/44] connect: parse v2 refs with correct hash algorithm
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (29 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 30/44] connect: pass full packet reader when parsing v2 refs brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:14   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 32/44] serve: advertise object-format capability for protocol v2 brian m. carlson
                   ` (14 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When using protocol v2, we need to know what hash algorithm is used by
the remote end.  See if the server has sent us an object-format
capability, and if so, use it to determine the hash algorithm in use and
set that value in the packet reader.  Parse the refs using this
algorithm.

Note that we use memcpy instead of oidcpy for copying values, since
oidcpy is intentionally limited to the current hash algorithm length,
and the copy will be too short if the server side uses SHA-256 but the
client side has not had a repository set up (and therefore defaults to
SHA-1).

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/connect.c b/connect.c
index bb4bc4ab7f..4e6462e52f 100644
--- a/connect.c
+++ b/connect.c
@@ -394,7 +394,7 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 		goto out;
 	}
 
-	if (parse_oid_hex(line_sections.items[i++].string, &old_oid, &end) ||
+	if (parse_oid_hex_algop(line_sections.items[i++].string, &old_oid, &end, reader->hash_algo) ||
 	    *end) {
 		ret = 0;
 		goto out;
@@ -402,7 +402,7 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 
 	ref = alloc_ref(line_sections.items[i++].string);
 
-	oidcpy(&ref->old_oid, &old_oid);
+	memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);
 	**list = ref;
 	*list = &ref->next;
 
@@ -415,7 +415,8 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 			struct object_id peeled_oid;
 			char *peeled_name;
 			struct ref *peeled;
-			if (parse_oid_hex(arg, &peeled_oid, &end) || *end) {
+			if (parse_oid_hex_algop(arg, &peeled_oid, &end,
+						reader->hash_algo) || *end) {
 				ret = 0;
 				goto out;
 			}
@@ -423,7 +424,8 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 			peeled_name = xstrfmt("%s^{}", ref->name);
 			peeled = alloc_ref(peeled_name);
 
-			oidcpy(&peeled->old_oid, &peeled_oid);
+			memcpy(peeled->old_oid.hash, peeled_oid.hash,
+			       reader->hash_algo->rawsz);
 			**list = peeled;
 			*list = &peeled->next;
 
@@ -442,6 +444,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     const struct string_list *server_options)
 {
 	int i;
+	const char *hash_name;
 	*list = NULL;
 
 	if (server_supports_v2("ls-refs", 1))
@@ -450,6 +453,14 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	if (server_supports_v2("agent", 0))
 		packet_write_fmt(fd_out, "agent=%s", git_user_agent_sanitized());
 
+	if (server_feature_v2("object-format", &hash_name)) {
+		int hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo == GIT_HASH_UNKNOWN)
+			die(_("unknown object format '%s' specified by server"), hash_name);
+		reader->hash_algo = &hash_algos[hash_algo];
+		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
+	}
+
 	if (server_options && server_options->nr &&
 	    server_supports_v2("server-option", 1))
 		for (i = 0; i < server_options->nr; i++)
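The reason for preferring memcpy with an explicit length can be shown with a small standalone sketch. The `sketch_*` types below are simplified stand-ins invented for this example (they are not Git's `struct object_id` or `struct git_hash_algo`); the point is only that a copy sized by the wrong algorithm drops the tail of a SHA-256 value.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_RAWSZ 32 /* large enough for a SHA-256 value */

/* Simplified stand-ins for illustration only. */
struct sketch_oid { unsigned char hash[SKETCH_MAX_RAWSZ]; };
struct sketch_algo { const char *name; size_t rawsz; };

static const struct sketch_algo sketch_sha1 = { "sha1", 20 };
static const struct sketch_algo sketch_sha256 = { "sha256", 32 };

/* Copy exactly as many bytes as the given algorithm uses. */
static void sketch_oidcpy_algop(struct sketch_oid *dst,
				const struct sketch_oid *src,
				const struct sketch_algo *algo)
{
	assert(algo->rawsz <= SKETCH_MAX_RAWSZ);
	memcpy(dst->hash, src->hash, algo->rawsz);
}
```

Copying a 32-byte value with `sketch_sha1` leaves bytes 20 through 31 of the destination untouched, which is the truncation the commit message describes when the client defaults to SHA-1.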


* [PATCH 32/44] serve: advertise object-format capability for protocol v2
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (30 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 31/44] connect: parse v2 refs with correct hash algorithm brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:15   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 33/44] t5500: make hash independent brian m. carlson
                   ` (13 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

In order to communicate the hash algorithm supported by the server side, add
support for advertising the object-format capability.  We check that the
client side sends us an identical algorithm if it sends us its own
object-format capability, and assume it speaks SHA-1 if not.

In the test, when we're using an algorithm other than SHA-1, we need to
specify the algorithm in use so we don't get a failure with an "unknown
format" message. Add a wrapper function that specifies this header if
required.  Skip specifying this header for SHA-1 to test that it works
both with and without this header.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 serve.c              | 27 +++++++++++++++++++++++++++
 t/t5701-git-serve.sh | 28 ++++++++++++++++++++--------
 2 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/serve.c b/serve.c
index 317256c1a4..7ab7807fef 100644
--- a/serve.c
+++ b/serve.c
@@ -22,6 +22,14 @@ static int agent_advertise(struct repository *r,
 	return 1;
 }
 
+static int object_format_advertise(struct repository *r,
+				   struct strbuf *value)
+{
+	if (value)
+		strbuf_addstr(value, r->hash_algo->name);
+	return 1;
+}
+
 struct protocol_capability {
 	/*
 	 * The name of the capability.  The server uses this name when
@@ -57,6 +65,7 @@ static struct protocol_capability capabilities[] = {
 	{ "ls-refs", always_advertise, ls_refs },
 	{ "fetch", upload_pack_advertise, upload_pack_v2 },
 	{ "server-option", always_advertise, NULL },
+	{ "object-format", object_format_advertise, NULL },
 };
 
 static void advertise_capabilities(void)
@@ -153,6 +162,22 @@ int has_capability(const struct argv_array *keys, const char *capability,
 	return 0;
 }
 
+static void check_algorithm(struct repository *r, struct argv_array *keys)
+{
+	int client = GIT_HASH_SHA1, server = hash_algo_by_ptr(r->hash_algo);
+	const char *algo_name;
+
+	if (has_capability(keys, "object-format", &algo_name)) {
+		client = hash_algo_by_name(algo_name);
+		if (client == GIT_HASH_UNKNOWN)
+			die("unknown object format '%s'", algo_name);
+	}
+
+	if (client != server)
+		die("mismatched object format: server %s; client %s\n",
+		    r->hash_algo->name, hash_algos[client].name);
+}
+
 enum request_state {
 	PROCESS_REQUEST_KEYS,
 	PROCESS_REQUEST_DONE,
@@ -223,6 +248,8 @@ static int process_request(void)
 	if (!command)
 		die("no command requested");
 
+	check_algorithm(the_repository, &keys);
+
 	command->command(the_repository, &keys, &reader);
 
 	argv_array_clear(&keys);
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index ffb9613885..bcb6453ae3 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -4,13 +4,24 @@ test_description='test protocol v2 server commands'
 
 . ./test-lib.sh
 
+write_command () {
+	echo "command=$1"
+
+	if test "$(test_oid algo)" != sha1
+	then
+		echo "object-format=$(test_oid algo)"
+	fi
+}
+
 test_expect_success 'test capability advertisement' '
+	test_oid_init &&
 	cat >expect <<-EOF &&
 	version 2
 	agent=git/$(git version | cut -d" " -f3)
 	ls-refs
 	fetch=shallow
 	server-option
+	object-format=$(test_oid algo)
 	0000
 	EOF
 
@@ -45,6 +56,7 @@ test_expect_success 'request invalid capability' '
 test_expect_success 'request with no command' '
 	test-tool pkt-line pack >in <<-EOF &&
 	agent=git/test
+	object-format=$(test_oid algo)
 	0000
 	EOF
 	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
@@ -53,7 +65,7 @@ test_expect_success 'request with no command' '
 
 test_expect_success 'request invalid command' '
 	test-tool pkt-line pack >in <<-EOF &&
-	command=foo
+	$(write_command foo)
 	agent=git/test
 	0000
 	EOF
@@ -73,7 +85,7 @@ test_expect_success 'setup some refs and tags' '
 
 test_expect_success 'basics of ls-refs' '
 	test-tool pkt-line pack >in <<-EOF &&
-	command=ls-refs
+	$(write_command ls-refs)
 	0000
 	EOF
 
@@ -95,7 +107,7 @@ test_expect_success 'basics of ls-refs' '
 
 test_expect_success 'basic ref-prefixes' '
 	test-tool pkt-line pack >in <<-EOF &&
-	command=ls-refs
+	$(write_command ls-refs)
 	0001
 	ref-prefix refs/heads/master
 	ref-prefix refs/tags/one
@@ -115,7 +127,7 @@ test_expect_success 'basic ref-prefixes' '
 
 test_expect_success 'refs/heads prefix' '
 	test-tool pkt-line pack >in <<-EOF &&
-	command=ls-refs
+	$(write_command ls-refs)
 	0001
 	ref-prefix refs/heads/
 	0000
@@ -135,7 +147,7 @@ test_expect_success 'refs/heads prefix' '
 
 test_expect_success 'peel parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
-	command=ls-refs
+	$(write_command ls-refs)
 	0001
 	peel
 	ref-prefix refs/tags/
@@ -156,7 +168,7 @@ test_expect_success 'peel parameter' '
 
 test_expect_success 'symrefs parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
-	command=ls-refs
+	$(write_command ls-refs)
 	0001
 	symrefs
 	ref-prefix refs/heads/
@@ -177,7 +189,7 @@ test_expect_success 'symrefs parameter' '
 
 test_expect_success 'sending server-options' '
 	test-tool pkt-line pack >in <<-EOF &&
-	command=ls-refs
+	$(write_command ls-refs)
 	server-option=hello
 	server-option=world
 	0001
@@ -199,7 +211,7 @@ test_expect_success 'unexpected lines are not allowed in fetch request' '
 	git init server &&
 
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	this-is-not-a-command
 	0000
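The server-side check described in this patch can be summarized with a standalone sketch. The names `sketch_check_algorithm` and `enum sketch_check` are hypothetical and use plain strings instead of Git's algorithm table; the real code calls die() rather than returning a status.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_check { SKETCH_OK, SKETCH_UNKNOWN, SKETCH_MISMATCH };

/* Only the two algorithm names discussed in this series are recognized. */
static int sketch_known(const char *name)
{
	return !strcmp(name, "sha1") || !strcmp(name, "sha256");
}

/*
 * client_format is NULL when the client sent no object-format line,
 * in which case the client is assumed to speak SHA-1.
 */
static enum sketch_check sketch_check_algorithm(const char *server_format,
						const char *client_format)
{
	assert(server_format != NULL);
	if (!client_format)
		client_format = "sha1";
	else if (!sketch_known(client_format))
		return SKETCH_UNKNOWN;
	return strcmp(server_format, client_format) ? SKETCH_MISMATCH : SKETCH_OK;
}
```

Under these assumptions a SHA-256 server rejects a request that omits the line, which is why the tests add it through write_command whenever the algorithm under test is not SHA-1.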


* [PATCH 33/44] t5500: make hash independent
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (31 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 32/44] serve: advertise object-format capability for protocol v2 brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 34/44] builtin/ls-remote: initialize repository based on fetch brian m. carlson
                   ` (12 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This test has hard-coded pkt-lines with object IDs.  The pkt-line
lengths necessarily differ between hash algorithms, so generate these
lines with the packetize helper so they're always the right size.  In
addition, we will require an object-format capability for SHA-256, so
pass that capability on to the upload-pack process.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5500-fetch-pack.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 52dd1a688c..8fee99ecfb 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -871,9 +871,10 @@ test_expect_success 'shallow since with commit graph and already-seen commit' '
 
 	GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
 	0012command=fetch
+	$(echo "object-format=$(test_oid algo)" | packetize)
 	00010013deepen-since 1
-	0032want $(git rev-parse other)
-	0032have $(git rev-parse master)
+	$(echo "want $(git rev-parse other)" | packetize)
+	$(echo "have $(git rev-parse master)" | packetize)
 	0000
 	EOF
 	)
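To see why the hard-coded prefixes had to go, here is a minimal sketch of what a packetize step computes: a four-digit hexadecimal length that counts the prefix itself plus the payload. The function name `sketch_packetize` is an assumption for this example and is not the test suite's shell helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<4 hex digits><payload>" into out.
 * Returns the total pkt-line length, or -1 if it does not fit.
 */
static int sketch_packetize(const char *payload, char *out, size_t outsz)
{
	size_t total = strlen(payload) + 4; /* the length covers the prefix too */

	assert(payload != NULL && out != NULL);
	if (total > 65520 || total + 1 > outsz)
		return -1;
	snprintf(out, outsz, "%04x%s", (unsigned int)total, payload);
	return (int)total;
}
```

For "want ", a 40-character SHA-1 hex ID, and a trailing newline, the total is 4 + 5 + 40 + 1 = 50, or 0032 in hex, matching the old hard-coded lines. With a 64-character SHA-256 ID the total becomes 74, or 004a, so a fixed 0032 prefix would be wrong.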


* [PATCH 34/44] builtin/ls-remote: initialize repository based on fetch
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (32 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 33/44] t5500: make hash independent brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:16   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 35/44] remote-curl: detect algorithm for dumb HTTP by size brian m. carlson
                   ` (11 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

ls-remote may or may not operate within a repository, and as such will
not have been initialized with the repository's hash algorithm.  Even if
it were, the remote side could be using a different algorithm and we
would still want to display those refs properly.  Find the hash
algorithm used by the remote side by querying the transport object and
set our hash algorithm accordingly.

Without this change, if the remote side is using SHA-256, we truncate
the refs to 40 hex characters, since that's the length of the default
hash algorithm (SHA-1).

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/ls-remote.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 6ef519514b..3a4dd12903 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -118,6 +118,10 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		transport->server_options = &server_options;
 
 	ref = transport_get_remote_refs(transport, &ref_prefixes);
+	if (ref) {
+		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
+		repo_set_hash_algo(the_repository, hash_algo);
+	}
 	if (transport_disconnect(transport)) {
 		UNLEAK(sorting);
 		return 1;


* [PATCH 35/44] remote-curl: detect algorithm for dumb HTTP by size
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (33 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 34/44] builtin/ls-remote: initialize repository based on fetch brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:17   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 36/44] builtin/index-pack: add option to specify hash algorithm brian m. carlson
                   ` (10 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When reading the info/refs file for a repository, we have no explicit
way to detect which hash algorithm is in use because the file doesn't
provide one. Detect the hash algorithm in use by the size of the first
object ID.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 remote-curl.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 3ed0dfec1b..35275b42e9 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -252,6 +252,19 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	return list;
 }
 
+static const struct git_hash_algo *detect_hash_algo(struct discovery *heads)
+{
+	const char *p = memchr(heads->buf, '\t', heads->len);
+	int algo;
+	if (!p)
+		return NULL;
+
+	algo = hash_algo_by_length((p - heads->buf) / 2);
+	if (algo == GIT_HASH_UNKNOWN)
+		return NULL;
+	return &hash_algos[algo];
+}
+
 static struct ref *parse_info_refs(struct discovery *heads)
 {
 	char *data, *start, *mid;
@@ -262,6 +275,12 @@ static struct ref *parse_info_refs(struct discovery *heads)
 	struct ref *ref = NULL;
 	struct ref *last_ref = NULL;
 
+	options.hash_algo = detect_hash_algo(heads);
+	if (!options.hash_algo)
+		die("%sinfo/refs not valid: could not determine hash algorithm; "
+		    "is this a git repository?",
+		    url.buf);
+
 	data = heads->buf;
 	start = NULL;
 	mid = data;
@@ -272,13 +291,13 @@ static struct ref *parse_info_refs(struct discovery *heads)
 		if (data[i] == '\t')
 			mid = &data[i];
 		if (data[i] == '\n') {
-			if (mid - start != the_hash_algo->hexsz)
+			if (mid - start != options.hash_algo->hexsz)
 				die(_("%sinfo/refs not valid: is this a git repository?"),
 				    transport_anonymize_url(url.buf));
 			data[i] = 0;
 			ref_name = mid + 1;
 			ref = alloc_ref(ref_name);
-			get_oid_hex(start, &ref->old_oid);
+			get_oid_hex_algop(start, &ref->old_oid, options.hash_algo);
 			if (!refs)
 				refs = ref;
 			if (last_ref)
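The detection idea can be sketched outside of Git as follows. `sketch_detect_algo` is a hypothetical function that returns an algorithm name instead of a `struct git_hash_algo` pointer, and it compares the hex length directly rather than halving it and looking up the raw length as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: guess the hash algorithm from the length of the
 * first object ID in an info/refs buffer ("<hex-oid>\t<refname>\n"...).
 * Returns NULL if there is no tab or the length matches no known algorithm.
 */
static const char *sketch_detect_algo(const char *buf, size_t len)
{
	const char *tab;

	assert(buf != NULL);
	tab = memchr(buf, '\t', len);
	if (!tab)
		return NULL;

	switch ((size_t)(tab - buf)) {
	case 40:
		return "sha1";   /* 20 raw bytes */
	case 64:
		return "sha256"; /* 32 raw bytes */
	default:
		return NULL;
	}
}
```

As the cover letter notes, this approach cannot tell apart two algorithms with the same output length.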


* [PATCH 36/44] builtin/index-pack: add option to specify hash algorithm
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (34 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 35/44] remote-curl: detect algorithm for dumb HTTP by size brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-16 11:18   ` Martin Ågren
  2020-05-13  0:54 ` [PATCH 37/44] t1050: pass algorithm to index-pack when outside repo brian m. carlson
                   ` (9 subsequent siblings)
  45 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

git index-pack is usually run in a repository, but need not be. Since
packs don't contain information on the algorithm in use, instead
relying on context, add an option to index-pack to tell it which one
we're using in case someone runs it outside of a repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/index-pack.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7bea1fba52..89f4962a00 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1760,6 +1760,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 					die(_("bad %s"), arg);
 			} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
 				max_input_size = strtoumax(arg, NULL, 10);
+			} else if (skip_prefix(arg, "--object-format=", &arg)) {
+				int hash_algo = hash_algo_by_name(arg);
+				if (hash_algo == GIT_HASH_UNKNOWN)
+					die(_("unknown hash algorithm '%s'"), arg);
+				repo_set_hash_algo(the_repository, hash_algo);
 			} else
 				usage(index_pack_usage);
 			continue;


* [PATCH 37/44] t1050: pass algorithm to index-pack when outside repo
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (35 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 36/44] builtin/index-pack: add option to specify hash algorithm brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 38/44] remote-curl: avoid truncating refs with ls-remote brian m. carlson
                   ` (8 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When outside a repository, git index-pack is unable to guess the hash
algorithm in use for a pack, since packs don't contain any information
on the algorithm in use. Pass an option to index-pack to help it out in
this test.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1050-large.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 7f88ea07c2..6a56d1ca24 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -12,6 +12,7 @@ file_size () {
 }
 
 test_expect_success setup '
+	test_oid_init &&
 	# clone does not allow us to pass core.bigfilethreshold to
 	# new repos, so set core.bigfilethreshold globally
 	git config --global core.bigfilethreshold 200k &&
@@ -177,7 +178,8 @@ test_expect_success 'git-show a large file' '
 
 test_expect_success 'index-pack' '
 	git clone file://"$(pwd)"/.git foo &&
-	GIT_DIR=non-existent git index-pack --strict --verify foo/.git/objects/pack/*.pack
+	GIT_DIR=non-existent git index-pack --object-format=$(test_oid algo) \
+		--strict --verify foo/.git/objects/pack/*.pack
 '
 
 test_expect_success 'repack' '


* [PATCH 38/44] remote-curl: avoid truncating refs with ls-remote
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (36 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 37/44] t1050: pass algorithm to index-pack when outside repo brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 39/44] t/helper: initialize the repository for test-sha1-array brian m. carlson
                   ` (7 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Normally, the remote-curl transport helper is aware of the hash
algorithm we're using because we're in a repo with the appropriate hash
algorithm set. However, when using git ls-remote outside of a
repository, we won't have initialized the hash algorithm properly, so
use hash_to_hex_algop to print the ref corresponding to the algorithm
we've detected.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 remote-curl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 35275b42e9..9808e53182 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -548,7 +548,9 @@ static void output_refs(struct ref *refs)
 		if (posn->symref)
 			printf("@%s %s\n", posn->symref, posn->name);
 		else
-			printf("%s %s\n", oid_to_hex(&posn->old_oid), posn->name);
+			printf("%s %s\n", hash_to_hex_algop(posn->old_oid.hash,
+							    options.hash_algo),
+					  posn->name);
 	}
 	printf("\n");
 	fflush(stdout);
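The effect of passing the algorithm explicitly when printing can be illustrated with a small sketch. `sketch_hash_to_hex` is a hypothetical stand-in that takes the raw size as a parameter; it is not Git's hash_to_hex_algop, which takes an algorithm pointer and manages its own output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: hex-encode rawsz bytes of hash into out,
 * which must have room for 2 * rawsz + 1 bytes.
 */
static void sketch_hash_to_hex(const unsigned char *hash, size_t rawsz,
			       char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	assert(hash != NULL && out != NULL);
	for (i = 0; i < rawsz; i++) {
		out[2 * i] = digits[hash[i] >> 4];
		out[2 * i + 1] = digits[hash[i] & 0xf];
	}
	out[2 * rawsz] = '\0';
}
```

Called with a raw size of 20 on a 32-byte value, the sketch emits only 40 hex characters, which corresponds to the truncation seen when the default (SHA-1) length is used for a SHA-256 ref.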


* [PATCH 39/44] t/helper: initialize the repository for test-sha1-array
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (37 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 38/44] remote-curl: avoid truncating refs with ls-remote brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 40/44] t5702: offer an object-format capability in the test brian m. carlson
                   ` (6 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

test-sha1-array uses the_hash_algo under the hood. Since t0064 wants to
use the value that is correct for the hash algorithm that we're testing,
make sure the test helper initializes the repository to set
the_hash_algo correctly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/helper/test-oid-array.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/helper/test-oid-array.c b/t/helper/test-oid-array.c
index ce9fd5f091..b16cd0b11b 100644
--- a/t/helper/test-oid-array.c
+++ b/t/helper/test-oid-array.c
@@ -12,6 +12,9 @@ int cmd__oid_array(int argc, const char **argv)
 {
 	struct oid_array array = OID_ARRAY_INIT;
 	struct strbuf line = STRBUF_INIT;
+	int nongit_ok;
+
+	setup_git_directory_gently(&nongit_ok);
 
 	while (strbuf_getline(&line, stdin) != EOF) {
 		const char *arg;


* [PATCH 40/44] t5702: offer an object-format capability in the test
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (38 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 39/44] t/helper: initialize the repository for test-sha1-array brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 41/44] t5703: use object-format serve option brian m. carlson
                   ` (5 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

In order to make this test work with SHA-256, offer an object-format
capability so that both sides use the same algorithm.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5702-protocol-v2.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 5039e66dc4..116358b9ac 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -13,6 +13,7 @@ start_git_daemon --export-all --enable=receive-pack
 daemon_parent=$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent
 
 test_expect_success 'create repo to be served by git-daemon' '
+	test_oid_init &&
 	git init "$daemon_parent" &&
 	test_commit -C "$daemon_parent" one
 '
@@ -394,6 +395,7 @@ test_expect_success 'even with handcrafted request, filter does not work if not
 	# Custom request that tries to filter even though it is not advertised.
 	test-tool pkt-line pack >in <<-EOF &&
 	command=fetch
+	object-format=$(test_oid algo)
 	0001
 	want $(git -C server rev-parse master)
 	filter blob:none


* [PATCH 41/44] t5703: use object-format serve option
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (39 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 40/44] t5702: offer an object-format capability in the test brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 42/44] t5300: pass --object-format to git index-pack brian m. carlson
                   ` (4 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When we're using an algorithm other than SHA-1, we need to specify the
algorithm in use so we don't get a failure with an "unknown format"
message. Add a wrapper function that specifies this header if required.
Skip specifying this header for SHA-1 to test that it works both with and
without this header.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5703-upload-pack-ref-in-want.sh | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh
index a34460f7d8..afe7f7f919 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -27,6 +27,15 @@ check_output () {
 	test_cmp sorted_commits actual_commits
 }
 
+write_command () {
+	echo "command=$1"
+
+	if test "$(test_oid algo)" != sha1
+	then
+		echo "object-format=$(test_oid algo)"
+	fi
+}
+
 # c(o/foo) d(o/bar)
 #        \ /
 #         b   e(baz)  f(master)
@@ -62,7 +71,7 @@ test_expect_success 'config controls ref-in-want advertisement' '
 
 test_expect_success 'invalid want-ref line' '
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/non-existent
@@ -83,7 +92,7 @@ test_expect_success 'basic want-ref' '
 
 	oid=$(git rev-parse a) &&
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/master
@@ -107,7 +116,7 @@ test_expect_success 'multiple want-ref lines' '
 
 	oid=$(git rev-parse b) &&
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/o/foo
@@ -129,7 +138,7 @@ test_expect_success 'mix want and want-ref' '
 	git rev-parse e f >expected_commits &&
 
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/master
@@ -152,7 +161,7 @@ test_expect_success 'want-ref with ref we already have commit for' '
 
 	oid=$(git rev-parse c) &&
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/o/foo


* [PATCH 42/44] t5300: pass --object-format to git index-pack
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (40 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 41/44] t5703: use object-format serve option brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 43/44] bundle: detect hash algorithm when reading refs brian m. carlson
                   ` (3 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

git index-pack by default reads the repository to determine the object
format. However, when outside of a repository, it's necessary to specify
the hash algorithm in use so that the pack can be properly indexed. Add
an --object-format argument when invoking git index-pack outside of a
repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5300-pack-object.sh | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 410a09b0dd..746cdb626e 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -12,7 +12,8 @@ TRASH=$(pwd)
 
 test_expect_success \
     'setup' \
-    'rm -f .git/index* &&
+    'test_oid_init &&
+     rm -f .git/index* &&
      perl -e "print \"a\" x 4096;" > a &&
      perl -e "print \"b\" x 4096;" > b &&
      perl -e "print \"c\" x 4096;" > c &&
@@ -412,18 +413,18 @@ test_expect_success 'set up pack for non-repo tests' '
 '
 
 test_expect_success 'index-pack --stdin complains of non-repo' '
-	nongit test_must_fail git index-pack --stdin <foo.pack &&
+	nongit test_must_fail git index-pack --object-format=$(test_oid algo) --stdin <foo.pack &&
 	test_path_is_missing non-repo/.git
 '
 
 test_expect_success 'index-pack <pack> works in non-repo' '
-	nongit git index-pack ../foo.pack &&
+	nongit git index-pack --object-format=$(test_oid algo) ../foo.pack &&
 	test_path_is_file foo.idx
 '
 
 test_expect_success 'index-pack --strict <pack> works in non-repo' '
 	rm -f foo.idx &&
-	nongit git index-pack --strict ../foo.pack &&
+	nongit git index-pack --strict --object-format=$(test_oid algo) ../foo.pack &&
 	test_path_is_file foo.idx
 '
 


* [PATCH 43/44] bundle: detect hash algorithm when reading refs
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (41 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 42/44] t5300: pass --object-format to git index-pack brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-13  0:54 ` [PATCH 44/44] remote-testgit: adapt for object-format brian m. carlson
                   ` (2 subsequent siblings)
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Much like with the dumb HTTP transport, there isn't a way to explicitly
specify the hash algorithm when dealing with a bundle, so detect the
algorithm based on the length of the object IDs in the prerequisites and
ref advertisements.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 bundle.c    | 22 +++++++++++++++++++++-
 bundle.h    |  1 +
 transport.c | 10 ++++++++--
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/bundle.c b/bundle.c
index 99439e07a1..2a0d744d3f 100644
--- a/bundle.c
+++ b/bundle.c
@@ -23,6 +23,17 @@ static void add_to_ref_list(const struct object_id *oid, const char *name,
 	list->nr++;
 }
 
+static const struct git_hash_algo *detect_hash_algo(struct strbuf *buf)
+{
+	size_t len = strcspn(buf->buf, " \n");
+	int algo;
+
+	algo = hash_algo_by_length(len / 2);
+	if (algo == GIT_HASH_UNKNOWN)
+		return NULL;
+	return &hash_algos[algo];
+}
+
 static int parse_bundle_header(int fd, struct bundle_header *header,
 			       const char *report_path)
 {
@@ -52,12 +63,21 @@ static int parse_bundle_header(int fd, struct bundle_header *header,
 		}
 		strbuf_rtrim(&buf);
 
+		if (!header->hash_algo) {
+			header->hash_algo = detect_hash_algo(&buf);
+			if (!header->hash_algo) {
+				error(_("unknown hash algorithm length"));
+				status = -1;
+				break;
+			}
+		}
+
 		/*
 		 * Tip lines have object name, SP, and refname.
 		 * Prerequisites have object name that is optionally
 		 * followed by SP and subject line.
 		 */
-		if (parse_oid_hex(buf.buf, &oid, &p) ||
+		if (parse_oid_hex_algop(buf.buf, &oid, &p, header->hash_algo) ||
 		    (*p && !isspace(*p)) ||
 		    (!is_prereq && !*p)) {
 			if (report_path)
diff --git a/bundle.h b/bundle.h
index ceab0c7475..2dc9442024 100644
--- a/bundle.h
+++ b/bundle.h
@@ -15,6 +15,7 @@ struct ref_list {
 struct bundle_header {
 	struct ref_list prerequisites;
 	struct ref_list references;
+	const struct git_hash_algo *hash_algo;
 };
 
 int is_bundle(const char *path, int quiet);
diff --git a/transport.c b/transport.c
index b43d985f90..38a432be69 100644
--- a/transport.c
+++ b/transport.c
@@ -143,6 +143,9 @@ static struct ref *get_refs_from_bundle(struct transport *transport,
 	data->fd = read_bundle_header(transport->url, &data->header);
 	if (data->fd < 0)
 		die(_("could not read bundle '%s'"), transport->url);
+
+	transport->hash_algo = data->header.hash_algo;
+
 	for (i = 0; i < data->header.references.nr; i++) {
 		struct ref_list_entry *e = data->header.references.list + i;
 		struct ref *ref = alloc_ref(e->name);
@@ -157,11 +160,14 @@ static int fetch_refs_from_bundle(struct transport *transport,
 			       int nr_heads, struct ref **to_fetch)
 {
 	struct bundle_transport_data *data = transport->data;
+	int ret;
 
 	if (!data->get_refs_from_bundle_called)
 		get_refs_from_bundle(transport, 0, NULL);
-	return unbundle(the_repository, &data->header, data->fd,
-			transport->progress ? BUNDLE_VERBOSE : 0);
+	ret = unbundle(the_repository, &data->header, data->fd,
+			   transport->progress ? BUNDLE_VERBOSE : 0);
+	transport->hash_algo = data->header.hash_algo;
+	return ret;
 }
 
 static int close_bundle(struct transport *transport)
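
A minimal standalone sketch of the length-based detection described in
the commit message above. The enum and function names are hypothetical
stand-ins and not git's API. Unlike the patch, which halves the length
and asks hash_algo_by_length(), this version compares the hex length
directly, and it assumes any leading '-' prerequisite marker has already
been removed from the line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for git's hash algorithm identifiers. */
enum sketch_algo {
	SKETCH_HASH_UNKNOWN = 0,
	SKETCH_HASH_SHA1,
	SKETCH_HASH_SHA256
};

/*
 * Guess the algorithm from the hex object ID that starts a bundle
 * header line.  The ID ends at the first space or newline.
 */
static enum sketch_algo sketch_detect_algo(const char *line)
{
	size_t hexlen = strcspn(line, " \n");

	switch (hexlen) {
	case 40:	/* 20 raw bytes: SHA-1 */
		return SKETCH_HASH_SHA1;
	case 64:	/* 32 raw bytes: SHA-256 */
		return SKETCH_HASH_SHA256;
	default:
		return SKETCH_HASH_UNKNOWN;
	}
}
```

Because only the length is inspected, two algorithms with the same
output size could not be told apart this way, which is the limitation
the cover letter mentions for bundles and dumb HTTP.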


* [PATCH 44/44] remote-testgit: adapt for object-format
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (42 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 43/44] bundle: detect hash algorithm when reading refs brian m. carlson
@ 2020-05-13  0:54 ` brian m. carlson
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
  45 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

When using an algorithm other than SHA-1, we need the remote helper to
advertise support for the object-format extension and provide
information back to us so that we can properly parse refs and return
data. Ensure that the test remote helper understands these extensions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5801/git-remote-testgit | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
index 6b9f0b5dc7..1544d6dc6b 100755
--- a/t/t5801/git-remote-testgit
+++ b/t/t5801/git-remote-testgit
@@ -52,9 +52,11 @@ do
 		test -n "$GIT_REMOTE_TESTGIT_SIGNED_TAGS" && echo "signed-tags"
 		test -n "$GIT_REMOTE_TESTGIT_NO_PRIVATE_UPDATE" && echo "no-private-update"
 		echo 'option'
+		echo 'object-format'
 		echo
 		;;
 	list)
+		echo ":object-format $(git rev-parse --show-object-format=storage)"
 		git for-each-ref --format='? %(refname)' 'refs/heads/' 'refs/tags/'
 		head=$(git symbolic-ref HEAD)
 		echo "@$head HEAD"
@@ -139,6 +141,10 @@ do
 			test $val = "true" && force="true" || force=
 			echo "ok"
 			;;
+		object-format)
+			test $val = "true" && object_format="true" || object_format=
+			echo "ok"
+			;;
 		*)
 			echo "unsupported"
 			;;


* Re: [PATCH 02/44] Documentation: document v1 protocol object-format capability
  2020-05-13  0:53 ` [PATCH 02/44] Documentation: document v1 protocol object-format capability brian m. carlson
@ 2020-05-13 19:28   ` Martin Ågren
  2020-05-14  1:12     ` Junio C Hamano
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-13 19:28 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:

> @@ -189,7 +204,6 @@ refs being sent.
>
>  Clients MAY use the parameters from this capability to select the proper initial
>  branch when cloning a repository.
> -
>  shallow
>  -------

Looks like a spurious line deletion snuck in.


Martin


* Re: [PATCH 03/44] connect: have ref processing code take struct packet_reader
  2020-05-13  0:53 ` [PATCH 03/44] connect: have ref processing code take struct packet_reader brian m. carlson
@ 2020-05-13 19:30   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-13 19:30 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> In a future patch, we'll want to access multiple members from struct
> packet_reader when parsing references.  Therefore, have the ref parsing
> code take pointers to struct packet_reader instead of having to pass multiple
> arguments to each function.

Makes sense.

> -static void process_capabilities(const char *line, int *len)
> +static void process_capabilities(struct packet_reader *reader, int *len)
>  {
> +       const char *line = reader->line;
>         int nul_location = strlen(line);
>         if (nul_location == *len)
>                 return;

"line+len" made it pretty obvious that they belonged together.
"reader+len" not so much. Your patch does minimize the change. Would
s/len/linelen/ be worth the extra churn? Possibly not. Right now, at
least we're pretty consistent about using "len" -- if this ends up as a
mixture of "linelen" and "len" I think it's worse, overall.


Martin


* Re: [PATCH 04/44] wrapper: add function to compare strings with different NUL termination
  2020-05-13  0:53 ` [PATCH 04/44] wrapper: add function to compare strings with different NUL termination brian m. carlson
@ 2020-05-13 19:32   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-13 19:32 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 8ba576e81e..6503deb171 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -868,6 +868,8 @@ char *xgetcwd(void);
>  FILE *fopen_for_writing(const char *path);
>  FILE *fopen_or_warn(const char *path, const char *mode);
>
> +int xstrncmpz(const char *s, const char *t, size_t len);
> +
>  /*
>   * FREE_AND_NULL(ptr) is like free(ptr) followed by ptr = NULL. Note
>   * that ptr is used twice, so don't pass e.g. ptr++.
> diff --git a/wrapper.c b/wrapper.c
> index 3a1c0e0526..15a09740e7 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -430,6 +430,18 @@ int xmkstemp(char *filename_template)
>         return fd;
>  }
>
> +/*
> + * Like strncmp, but only return zero if s is NUL-terminated and exactly len
> + * characters long.  If it is not, consider it greater than t.
> + */

I think this comment would be easier to find in the .h file.

And since I'm already commenting...

> +int xstrncmpz(const char *s, const char *t, size_t len)
> +{
> +       int res = strncmp(s, t, len);
> +       if (res)
> +               return res;
> +       return s[len] == '\0' ? 0 : 1;
> +}
> +
>  /* Adapted from libiberty's mkstemp.c. */
>
>  #undef TMP_MAX

It's not entirely obvious from the context, but this function is
inserted between some "tmp" stuff and some other "tmp" stuff. I don't
think we need to bikeshed its exact home, but maybe "close to other
string stuff", or at least not in the middle of the "tmp" section.
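
To make the two suggestions concrete, here is a small standalone sketch
with the comment sitting on the declaration, as it would in a header.
The body is the same logic as the quoted xstrncmpz(); the name
sketch_strncmpz is only there so the example compiles on its own. The
note about t is my reading of the code, not something the patch states.

```c
#include <string.h>

/*
 * Like strncmp, but only return zero if s is NUL-terminated and exactly
 * len characters long.  If it is not, consider it greater than t.
 *
 * Assumption: t holds at least len characters before any NUL, as it
 * does when it points into a longer capability string.
 */
static int sketch_strncmpz(const char *s, const char *t, size_t len);

static int sketch_strncmpz(const char *s, const char *t, size_t len)
{
	int res = strncmp(s, t, len);
	if (res)
		return res;
	/* The first len bytes match; s must end right here to be equal. */
	return s[len] == '\0' ? 0 : 1;
}
```

With t pointing at "sha1 agent=git" and len 4, s = "sha1" compares
equal, while s = "sha1x" does not, even though plain strncmp would
return zero for both.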


Martin


* Re: [PATCH 07/44] connect: add function to fetch value of a v2 server capability
  2020-05-13  0:53 ` [PATCH 07/44] connect: add function to fetch value of a v2 server capability brian m. carlson
@ 2020-05-13 19:37   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-13 19:37 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:58, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> +int server_feature_v2(const char *c, const char **v)
> +{
> +       int i;
> +
> +       for (i = 0; i < server_capabilities_v2.argc; i++) {
> +               const char *out;
> +               if (skip_prefix(server_capabilities_v2.argv[i], c, &out) &&
> +                   (*out == '=')) {
> +                       *v = out + 1;
> +                       return 1;
> +               }
> +       }
> +       return 0;
> +}
> +

This looks like it was based on `server_supports_feature()`, which
explains the "1 means yup got it, 0 means no match". The name of
`server_supports_feature()` does suggest the boolean nature of return
value. For this new function, I would perhaps have expected "0 means
success, negative means error". That said, I'm not familiar with
connect.c. Let's see how this is used...

>  int server_supports_feature(const char *c, const char *feature,
>                             int die_on_error)
>  {

Just a thought:

Maybe this existing function could learn to take a pointer (or NULL) and
assign to it if we have a '=' (possibly even requiring a '=' if this new
pointer is non-NULL). I dunno, maybe two similar functions are better
after all than having one with modes like that.
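
For illustration only, a standalone sketch of what such a "pointer or
NULL" lookup could look like. The function name, the NULL-terminated
array argument and the exact matching rules are all hypothetical; this
is not the signature or behavior of either function in connect.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical combined lookup over a NULL-terminated capability list.
 * With value == NULL, both "name" and "name=..." count as a match.
 * With value != NULL, only "name=..." matches, and *value is set to
 * the text after the '='.
 */
static int sketch_find_capability(const char *const *caps, const char *name,
				  const char **value)
{
	size_t namelen = strlen(name);
	size_t i;

	for (i = 0; caps[i]; i++) {
		const char *rest;

		if (strncmp(caps[i], name, namelen))
			continue;
		rest = caps[i] + namelen;
		if (*rest == '=') {
			if (value)
				*value = rest + 1;
			return 1;
		}
		if (!*rest && !value)
			return 1;
	}
	return 0;
}
```

The two modes hiding behind one argument are exactly the kind of thing
that might argue for keeping two separate functions instead.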


Martin


* Re: [PATCH 10/44] connect: add function to detect supported v1 hash functions
  2020-05-13  0:53 ` [PATCH 10/44] connect: add function to detect supported v1 hash functions brian m. carlson
@ 2020-05-13 19:39   ` Martin Ågren
  2020-05-13 22:49     ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-13 19:39 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> Add a function, server_supports_hash, to see if the remote server
> supports a particular hash algorithm when speaking protocol v1.

> +int server_supports_hash(const char *desired, int *feature_supported)
> +{
> +       int offset = 0;
> +       int len, found = 0;
> +       const char *hash;
> +
> +       hash = next_server_feature_value("object-format", &len, &offset);
> +       if (feature_supported)
> +               *feature_supported = !!hash;

If we got something, anything, the server supports this feature. It just
remains to be seen whether it supports the exact algorithm we're after.

> +       if (!hash) {
> +               hash = hash_algos[GIT_HASH_SHA1].name;
> +               len = strlen(hash);
> +       }

OK, if the server doesn't say anything, we fall back to SHA-1. If it's
the desired one, we'll return 1 accordingly below.

> +       while (hash) {
> +               if (!xstrncmpz(desired, hash, len))
> +                       found = 1;
> +
> +               if (found)
> +                       return 1;

I first thought this structure was because this loop body would learn to
do something else later in the series. But this is it. This looks like
it could just be "if (!xstrncmpz(...)) return 1;" and drop "found".



> +               hash = next_server_feature_value("object-format", &len, &offset);
> +       }
> +       return 0;
> +}
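
A standalone sketch of the loop with "found" dropped, to show that the
early return is all that is needed. Everything named sketch_* is
hypothetical: sketch_next_value() is a simplified stand-in for
next_server_feature_value() that takes the capability list as an
argument, and the SHA-1 fallback is written as a direct string
comparison rather than by substituting the name and length.

```c
#include <string.h>

/*
 * Scan the space-separated list from *offset for "feature=value".
 * On a match, return a pointer to the value, store its length in *len
 * and advance *offset past it.  Return NULL when nothing is left.
 */
static const char *sketch_next_value(const char *list, const char *feature,
				     size_t *len, size_t *offset)
{
	size_t flen = strlen(feature);
	const char *p = list + *offset;

	while (*p) {
		size_t toklen = strcspn(p, " ");

		if (toklen > flen && !strncmp(p, feature, flen) &&
		    p[flen] == '=') {
			*len = toklen - flen - 1;
			*offset = (size_t)(p - list) + toklen;
			return p + flen + 1;
		}
		p += toklen;
		p += strspn(p, " ");
	}
	return NULL;
}

/* Same comparison as the xstrncmpz() added in patch 4. */
static int sketch_strncmpz(const char *s, const char *t, size_t len)
{
	int res = strncmp(s, t, len);
	if (res)
		return res;
	return s[len] == '\0' ? 0 : 1;
}

/* Return 1 if "desired" is advertised; a silent server implies "sha1". */
static int sketch_supports_hash(const char *list, const char *desired)
{
	size_t len, offset = 0;
	const char *hash;

	hash = sketch_next_value(list, "object-format", &len, &offset);
	if (!hash)
		return !strcmp(desired, "sha1");
	while (hash) {
		if (!sketch_strncmpz(desired, hash, len))
			return 1;
		hash = sketch_next_value(list, "object-format", &len, &offset);
	}
	return 0;
}
```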

Martin


* Re: [PATCH 11/44] send-pack: detect when the server doesn't support our hash
  2020-05-13  0:53 ` [PATCH 11/44] send-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-05-13 19:41   ` Martin Ågren
  2020-05-13 22:52     ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-13 19:41 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:58, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Detect when the server doesn't support our hash algorithm and abort.

> +       if (!server_supports_hash(the_hash_algo->name, &object_format_supported))
> +               die(_("the receiving end does not support this repository's hash algorithm"));

I suppose this isn't the long-term wanted behavior? Would this be where
we would later learn to realize that "oh, crap, we need to
convert/translate on the fly"?

> @@ -428,6 +432,8 @@ int send_pack(struct send_pack_args *args,
>                 strbuf_addstr(&cap_buf, " atomic");
>         if (use_push_options)
>                 strbuf_addstr(&cap_buf, " push-options");
> +       if (object_format_supported)
> +               strbuf_addf(&cap_buf, " object-format=%s", the_hash_algo->name);

This isn't advertised in the log message: If we do detect support, go on
to reply with our choice of object format / hash algo name.


Martin


* Re: [PATCH 12/44] connect: make parse_feature_value extern
  2020-05-13  0:53 ` [PATCH 12/44] connect: make parse_feature_value extern brian m. carlson
@ 2020-05-13 19:48   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-13 19:48 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> We're going to be using this function in other files, so no longer mark
> this function static.
>
>  static char *server_capabilities_v1;
>  static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
> -static const char *parse_feature_value(const char *, const char *, int *, int *);
>  static const char *next_server_feature_value(const char *feature, int *len, int *offset);

> -static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
> +const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
>  {

> --- a/connect.h
> +++ b/connect.h
> +const char *parse_feature_value(const char *, const char *, int *, int *);

This "char *, int *" comes from the forward-declaration above, which is
now dropped. Now that this is a header file for everyone to use, I think
these parameters should be named, at least, but even better would be
some documentation. ;-)

I'll stop reading here. I'm not familiar with the technical details here
(i.e., where you'd be most interested in review), so I've just left some
more or less superficial comments.

One thing I've noticed is that there are relatively few tests so far. I
suppose it could be hard to trigger things before everything is properly
plugged through. But maybe at least various error paths could be
exercised already at this point, such as in the previous patch I
commented on.

So far I feel like I'm following along ok and I have a feeling I know
where this is leading up to. Nicely done so far.


Martin


* Re: [PATCH 10/44] connect: add function to detect supported v1 hash functions
  2020-05-13 19:39   ` Martin Ågren
@ 2020-05-13 22:49     ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13 22:49 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Jonathan Tan



On 2020-05-13 at 19:39:41, Martin Ågren wrote:
> On Wed, 13 May 2020 at 02:56, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > +       while (hash) {
> > +               if (!xstrncmpz(desired, hash, len))
> > +                       found = 1;
> > +
> > +               if (found)
> > +                       return 1;
> 
> I first thought this structure was because this loop body would learn to
> do something else later in the series. But this is it. This looks like
> it could just be "if (!xstrncmpz(...)) return 1;" and drop "found".

Yeah, I think it could.  I originally didn't have the helper and the
code was pretty hideous, so I probably forgot to simplify when I used
the helper again.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204



* Re: [PATCH 11/44] send-pack: detect when the server doesn't support our hash
  2020-05-13 19:41   ` Martin Ågren
@ 2020-05-13 22:52     ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-13 22:52 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Jonathan Tan



On 2020-05-13 at 19:41:15, Martin Ågren wrote:
> On Wed, 13 May 2020 at 02:58, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > Detect when the server doesn't support our hash algorithm and abort.
> 
> > +       if (!server_supports_hash(the_hash_algo->name, &object_format_supported))
> > +               die(_("the receiving end does not support this repository's hash algorithm"));
> 
> I suppose this isn't the long-term wanted behavior? Would this be where
> we would later learn to realize that "oh, crap, we need to
> convert/translate on the fly"?

Yes, this would be the point at which we'd decide whether we could
support the remote side's algorithm and decide to rewrite objects.  We
might still fail, such as if we're SHA-256 only without a lookup table
and the remote side is SHA-1, but theoretically we'd do the conversion
here.

> > @@ -428,6 +432,8 @@ int send_pack(struct send_pack_args *args,
> >                 strbuf_addstr(&cap_buf, " atomic");
> >         if (use_push_options)
> >                 strbuf_addstr(&cap_buf, " push-options");
> > +       if (object_format_supported)
> > +               strbuf_addf(&cap_buf, " object-format=%s", the_hash_algo->name);
> 
> This isn't advertised in the log message: If we do detect support, go on
> to reply with our choice of object format / hash algo name.

I'll update the message.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204



* Re: [PATCH 02/44] Documentation: document v1 protocol object-format capability
  2020-05-13 19:28   ` Martin Ågren
@ 2020-05-14  1:12     ` Junio C Hamano
  2020-05-15 23:22       ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Junio C Hamano @ 2020-05-14  1:12 UTC (permalink / raw)
  To: Martin Ågren; +Cc: brian m. carlson, Git Mailing List, Jonathan Tan

Martin Ågren <martin.agren@gmail.com> writes:

> On Wed, 13 May 2020 at 02:56, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
>
>> @@ -189,7 +204,6 @@ refs being sent.
>>
>>  Clients MAY use the parameters from this capability to select the proper initial
>>  branch when cloning a repository.
>> -
>>  shallow
>>  -------
>
> Looks like a spurious line deletion snuck in.

Indeed.  I wonder if that is why our documentation build fails near
the tip of 'pu'.



* Re: [PATCH 02/44] Documentation: document v1 protocol object-format capability
  2020-05-14  1:12     ` Junio C Hamano
@ 2020-05-15 23:22       ` brian m. carlson
  2020-05-16  0:02         ` Junio C Hamano
  0 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-15 23:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Ågren, Git Mailing List, Jonathan Tan



On 2020-05-14 at 01:12:19, Junio C Hamano wrote:
> Martin Ågren <martin.agren@gmail.com> writes:
> 
> > On Wed, 13 May 2020 at 02:56, brian m. carlson
> > <sandals@crustytoothpaste.net> wrote:
> >
> >> @@ -189,7 +204,6 @@ refs being sent.
> >>
> >>  Clients MAY use the parameters from this capability to select the proper initial
> >>  branch when cloning a repository.
> >> -
> >>  shallow
> >>  -------
> >
> > Looks like a spurious line deletion snuck in.
> 
> Indeed.  I wonder if that is why our documentation build fails near
> the tip of 'pu'.

I'll definitely do a reroll this weekend and kick a basic doc build off
before I send it out.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204



* Re: [PATCH 02/44] Documentation: document v1 protocol object-format capability
  2020-05-15 23:22       ` brian m. carlson
@ 2020-05-16  0:02         ` Junio C Hamano
  0 siblings, 0 replies; 175+ messages in thread
From: Junio C Hamano @ 2020-05-16  0:02 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Martin Ågren, Git Mailing List, Jonathan Tan

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2020-05-14 at 01:12:19, Junio C Hamano wrote:
>> Martin Ågren <martin.agren@gmail.com> writes:
>> 
>> > On Wed, 13 May 2020 at 02:56, brian m. carlson
>> > <sandals@crustytoothpaste.net> wrote:
>> >
>> >> @@ -189,7 +204,6 @@ refs being sent.
>> >>
>> >>  Clients MAY use the parameters from this capability to select the proper initial
>> >>  branch when cloning a repository.
>> >> -
>> >>  shallow
>> >>  -------
>> >
>> > Looks like a spurious line deletion snuck in.
>> 
>> Indeed.  I wonder if that is why our documentation build fails near
>> the tip of 'pu'.
>
> I'll definitely do a reroll this weekend and kick a basic doc build off
> before I send it out.

FWIW, I've fixed it up on my end so the documentation build of 'pu'
has been working OK.


* Re: [PATCH 14/44] connect: detect algorithm when fetching refs
  2020-05-13  0:53 ` [PATCH 14/44] connect: detect algorithm when fetching refs brian m. carlson
@ 2020-05-16 10:40   ` Martin Ågren
  2020-05-16 19:59     ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 10:40 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:57, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> If we're fetching refs, detect the hash algorithm and parse the refs
> using that algorithm.

As the added documentation from patch 2 says, if there are multiple
"object-format" capabilities, "the first one given is the one used in
the ref advertisement". And that's what you implement below.

Explaining that in this commit message and/or referring to "a recent
commit" (patch 2) and/or adding that documentation here, not back then,
would have avoided some confusion on my part, and perhaps also for
future readers.

I don't have a strong opinion on which of those is better, I just think
you could somehow make that a bit clearer here.

>  static void process_capabilities(struct packet_reader *reader, int *len)
>  {
> +       const char *feat_val;
> +       int feat_len;
> +       int hash_algo;

You could reduce the scope of `hash_algo`.

>         const char *line = reader->line;
>         int nul_location = strlen(line);
>         if (nul_location == *len)
>                 return;
>         server_capabilities_v1 = xstrdup(line + nul_location + 1);
>         *len = nul_location;
> +
> +       feat_val = server_feature_value("object-format", &feat_len);
> +       if (feat_val) {
> +               char *hash_name = xstrndup(feat_val, feat_len);
> +               hash_algo = hash_algo_by_name(hash_name);
> +               if (hash_algo != GIT_HASH_UNKNOWN)
> +                       reader->hash_algo = &hash_algos[hash_algo];
> +               free(hash_name);
> +       }
>  }

xstrndup is needed because we're not guaranteed a terminating NUL. You
remember to call free afterwards. Ok.

If we don't get any "object-format", we do basically nothing here and
`reader->hash_algo` will remain as whatever it already is. The docs from
patch 2 promise that this will be handled as "SHA-1" -- would it be more
robust if we did a similar fallback dance as you do elsewhere?

  feat_val = ...;
  if (!feat_val) {
          feat_val = hash_algos[GIT_HASH_SHA1].name;
          feat_len = strlen(feat_val);
  }
  char *hash_name = ...
  ...

You do initialize `reader->hash_algo` in patch 8, so I don't think this
changes anything now. Maybe it's just premature future-proofing (if such
a thing exists).
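
Spelled out as a standalone sketch, the fallback could look roughly like
this. All names here are hypothetical (demo_algo, demo_algo_by_name and
demo_pick_algo are not git functions), and plain malloc stands in for
xstrndup so the example has no dependencies.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for git's hash algorithm identifiers. */
enum demo_algo { DEMO_UNKNOWN = 0, DEMO_SHA1, DEMO_SHA256 };

static enum demo_algo demo_algo_by_name(const char *name)
{
	if (!strcmp(name, "sha1"))
		return DEMO_SHA1;
	if (!strcmp(name, "sha256"))
		return DEMO_SHA256;
	return DEMO_UNKNOWN;
}

/*
 * feat_val points into a longer capability string and is not
 * NUL-terminated at feat_len.  NULL means no "object-format" was
 * advertised, which is treated as SHA-1.
 */
static enum demo_algo demo_pick_algo(const char *feat_val, size_t feat_len)
{
	enum demo_algo algo;
	char *name;

	if (!feat_val) {
		feat_val = "sha1";
		feat_len = strlen(feat_val);
	}
	/* Copy so the name can be looked up as a NUL-terminated string. */
	name = malloc(feat_len + 1);
	if (!name)
		return DEMO_UNKNOWN;
	memcpy(name, feat_val, feat_len);
	name[feat_len] = '\0';
	algo = demo_algo_by_name(name);
	free(name);
	return algo;
}
```

The point is only that the "absent means SHA-1" rule ends up visible at
the place where the capability is parsed, instead of depending on how
the reader was initialized.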



Martin


* Re: [PATCH 15/44] builtin/receive-pack: detect when the server doesn't support our hash
  2020-05-13  0:53 ` [PATCH 15/44] builtin/receive-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-05-16 10:41   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 10:41 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> +                       const char *hash;
> +                       int len = 0;

Micronit: These new variables are used in tandem. I could see these as
"NULL, 0" or both uninitialized, but this is a mixture. That's a really
small nit, of course. (Maybe you needed to fight a compiler warning?)

> +                       hash = parse_feature_value(feature_list, "object-format", &len, NULL);
> +                       if (!hash) {
> +                               hash = hash_algos[GIT_HASH_SHA1].name;
> +                               len = strlen(hash);
> +                       }
> +                       if (xstrncmpz(the_hash_algo->name, hash, len))
> +                               die("error: unsupported object format '%s'", hash);

Ok, this is a familiar pattern by now: if we get nothing, behave as if
we got SHA-1.


Martin


* Re: [PATCH 19/44] builtin/clone: initialize hash algorithm properly
  2020-05-13  0:53 ` [PATCH 19/44] builtin/clone: initialize hash algorithm properly brian m. carlson
@ 2020-05-16 10:48   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 10:48 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:57, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> When performing a clone, we don't know what hash algorithm the other end
> will support.  Currently, we don't support fetching data belonging to a
> different algorithm, so we must know what algorithm the remote side is
> using in order to properly initialize the repository.  We can know that
> only after fetching the refs, so if the remote side has any references,
> use that information to reinitialize the repository with the correct
> hash algorithm information.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  builtin/clone.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index cb48a291ca..f27d38bc8e 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1217,6 +1217,15 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>         refs = transport_get_remote_refs(transport, &ref_prefixes);
>
>         if (refs) {
> +               int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
> +
> +               /*
> +                * Now that we know what algorithm the remote side is using,
> +                * let's set ours to the same thing.
> +                */
> +               initialize_repository_version(hash_algo);

This made me go "huh". It's not really new in this series, it's just
that from `initialize_repository_version(int)` I would have expected the
argument to be the, well, repository version, not a hash algo
identifier. But it all makes sense once you realize that the function is
"please initialize the repository version based on this stuff that I
give you" where, currently, the only input is a hash algo. (I see that
Han-Wen's reftable series adds another parameter here.)

> +               repo_set_hash_algo(the_repository, hash_algo);

I first wondered whether all calls to `repo_set_hash_algo()` would want
to be preceded by `initialize_repository_version()`, which might call
for the latter being called by the former. But I guess not. Various
users of `repo_set_hash_algo()` -- not that there would be a lot of them
-- might want to do similar updating and/or sanity checks, but the exact
details would differ.


Martin


* Re: [PATCH 20/44] t5562: pass object-format in synthesized test data
  2020-05-13  0:54 ` [PATCH 20/44] t5562: pass object-format in synthesized test data brian m. carlson
@ 2020-05-16 10:55   ` Martin Ågren
  2020-05-16 19:50     ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 10:55 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Ensure that we pass the object-format capability in the synthesized test
> data so that this test works with algorithms other than SHA-1.

Right.

> In addition, add a test using the old data for when we're
> using SHA-1 so that we can be sure that we preserve backwards
> compatibility with servers not offering the object-format capability.

I'll have some questions on this below.

> @@ -62,8 +63,8 @@ test_expect_success 'setup' '
>         test_copy_bytes 10 <fetch_body >fetch_body.trunc &&
>         hash_next=$(git commit-tree -p HEAD -m next HEAD^{tree}) &&
>         {
> -               printf "%s %s refs/heads/newbranch\\0report-status\\n" \
> -                       "$ZERO_OID" "$hash_next" | packetize &&
> +               printf "%s %s refs/heads/newbranch\\0report-status object-format=%s\\n" \
> +                       "$ZERO_OID" "$hash_next" "$(test_oid algo)" | packetize &&
>                 printf 0000 &&
>                 echo "$hash_next" | git pack-objects --stdout
>         } >push_body &&

Makes sense.

> @@ -117,6 +118,15 @@ test_expect_success GZIP 'push plain' '
>         test_cmp act.head exp.head
>  '
>
> +test_expect_success GZIP 'push plain with SHA-1' '
> +       test_when_finished "git branch -D newbranch" &&
> +       test_http_env receive push_body &&
> +       verify_http_result "200 OK" &&
> +       git rev-parse newbranch >act.head &&
> +       echo "$hash_next" >exp.head &&
> +       test_cmp act.head exp.head
> +'
> +

Hmmm. Isn't this an exact copy of the 'push plain' test immediately
preceding it? The commit message talks about using the "old data"
(i.e., without "object-format=%s"?). Should this test use a variant of
push_body where we're not adding "object-format"? I'm not sure I grok
what exactly we want to test here. And does it really belong in
t/t*-content-length.sh?


Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 21/44] t5704: send object-format capability with SHA-256
  2020-05-13  0:54 ` [PATCH 21/44] t5704: send object-format capability with SHA-256 brian m. carlson
@ 2020-05-16 11:02   ` Martin Ågren
  2020-05-16 19:14     ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:02 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> When we speak protocol v2 in this test, we must pass the object-format
> header if the algorithm is not SHA-1.  Otherwise, git upload-pack fails
> because the hash algorithm doesn't match and not because we've failed to
> speak the protocol correctly.  Pass the header so that our assertions
> test what we're really interested in.

> +# If we don't print the object format, we'll fail for a spurious reason: the
> +# mismatched object format.
> +print_object_format () {
> +       local algo=$(test_oid algo) &&
> +       if test "$algo" != "sha1"
> +       then
> +               packetize "object-format=$algo"
> +       fi
> +}
> +
>  test_expect_success 'extra delim packet in v2 ls-refs args' '
>         {
>                 packetize command=ls-refs &&
> +               print_object_format &&
>                 printf 0001 &&
>                 # protocol expects 0000 flush here
>                 printf 0001
> @@ -21,6 +32,7 @@ test_expect_success 'extra delim packet in v2 ls-refs args' '
>  test_expect_success 'extra delim packet in v2 fetch args' '
>         {
>                 packetize command=fetch &&
> +               print_object_format &&
>                 printf 0001 &&
>                 # protocol expects 0000 flush here
>                 printf 0001

So we need to pass this capability for the SHA-256 tests to run ok. But
if we start passing "object-format=sha1" unconditionally at this point
in the series, the tests will fail:

  error: 'grep expected flush after ls-refs arguments err' didn't find a match in:
  fatal: unknown capability 'object-format=sha1'

That is, we don't yet actually implement "object-format" handling. So
this will still fail with SHA-256 ("unknown capability"), just that once
the implementation is in place, the SHA-256 tests will pass (as will the
normal SHA-1 runs). Do I understand that correctly?

Or put differently, by the end of the series, we can do this:

diff --git a/t/t5704-protocol-violations.sh b/t/t5704-protocol-violations.sh
index 47e78932b9..22993812e2 100755
--- a/t/t5704-protocol-violations.sh
+++ b/t/t5704-protocol-violations.sh
@@ -6,14 +6,11 @@ communications if the other side says something unexpected. We are mostly
 making sure that we do not segfault or otherwise behave badly.'
 . ./test-lib.sh

-# If we don't print the object format, we'll fail for a spurious reason: the
-# mismatched object format.
+# If we don't print the object format, we might fail for a spurious reason:
+# the mismatched object format.
 print_object_format () {
        local algo=$(test_oid algo) &&
-       if test "$algo" != "sha1"
-       then
-               packetize "object-format=$algo"
-       fi
+       packetize "object-format=$algo"
 }

 test_expect_success 'extra delim packet in v2 ls-refs args' '

Should we? (And if we do, we might as well drop this function and inline
the whole thing, IMHO.)


Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 22/44] fetch-pack: parse and advertise the object-format capability
  2020-05-13  0:54 ` [PATCH 22/44] fetch-pack: parse and advertise the object-format capability brian m. carlson
@ 2020-05-16 11:03   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:03 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Parse the server's object-format capability and respond accordingly,
> dying if there is a mismatch.

> +       if (server_feature_v2("object-format", &hash_name)) {
> +               int hash_algo = hash_algo_by_name(hash_name);
> +               if (hash_algo_by_ptr(the_hash_algo) != hash_algo)
> +                       die(_("mismatched algorithms: client %s; server %s"),
> +                           the_hash_algo->name, hash_name);
> +               packet_write_fmt(fd_out, "object-format=%s", the_hash_algo->name);
> +       }
> +       else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1)
> +               die(_("the server does not support algorithm '%s'"),
> +                   the_hash_algo->name);

Micronit: "} else if (...) {", i.e., join to one line and add braces.


Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 23/44] setup: set the_repository's hash algo when checking format
  2020-05-13  0:54 ` [PATCH 23/44] setup: set the_repository's hash algo when checking format brian m. carlson
@ 2020-05-16 11:03   ` Martin Ågren
  2020-05-16 19:29     ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:03 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> When we're checking the repository's format, set the hash algorithm at
> the same time.  This ensures that we perform a suitable initialization
> early enough to avoid confusing any parts of the code.  If we defer
> until later, we can end up with portions of the code which are confused
> about the hash algorithm, resulting in segfaults.

This doesn't make a difference as long as you just use SHA-1, right?
That is, this isn't a bug in the first half of this series nor in
v2.27-rc0 as long as you stick to SHA-1?


> --- a/setup.c
> +++ b/setup.c
> @@ -1273,6 +1273,7 @@ void check_repository_format(struct repository_format *fmt)
>                 fmt = &repo_fmt;
>         check_repository_format_gently(get_git_dir(), fmt, NULL);
>         startup_info->have_repository = 1;
> +       repo_set_hash_algo(the_repository, fmt->hash_algo);
>         clear_repository_format(&repo_fmt);
>  }

Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 24/44] t3200: mark assertion with SHA1 prerequisite
  2020-05-13  0:54 ` [PATCH 24/44] t3200: mark assertion with SHA1 prerequisite brian m. carlson
@ 2020-05-16 11:04   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:04 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> One of the test assertions in this test checks that git branch -m works
> even without a .git/config file.  However, if the repository requires
> configuration extensions, such as because it uses a non-SHA-1 algorithm,
> this assertion will fail.  Mark the assertion as requiring SHA-1.

Makes sense.

> -test_expect_success 'git branch -m q q2 without config should succeed' '
> +test_expect_success SHA1 'git branch -m q q2 without config should succeed' '
>         git branch -m q q2 &&
>         git branch -m q2 q
>  '

Going forward, we might need config files for other reasons (reftable?),
meaning this would become "SHA1,!REFTABLE". So maybe this should be
"!CONFIG_EXTENSIONS" or "CONFIG_LESS". I think this is ok for now,
though. When/if someone needs to make another fix like this here -- or
at the very least the *third* time around -- that's when we should think
a bit bigger.


Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 25/44] packfile: compute and use the index CRC offset
  2020-05-13  0:54 ` [PATCH 25/44] packfile: compute and use the index CRC offset brian m. carlson
@ 2020-05-16 11:12   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:12 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Both v2 pack index files and the v3 format specified as part of the
> NewHash work have similar data starting at the CRC table.  Much of the
> existing code wants to read either this table or the offset entries
> following it, and in doing so computes the offset each time.
>
> In order to share as much code between v2 and v3, compute the offset of
> the CRC table and store it when the pack is opened.  Use this value to
> compute offsets to not only the CRC table, but to the offset entries
> beyond it.

> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1555,13 +1555,9 @@ static void read_v2_anomalous_offsets(struct packed_git *p,
>  {
>         const uint32_t *idx1, *idx2;
>         uint32_t i;
> -       const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t);
>
>         /* The address of the 4-byte offset table */
> -       idx1 = (((const uint32_t *)p->index_data)
> -               + 2 /* 8-byte header */
> -               + 256 /* fan out */
> -               + hashwords * p->num_objects /* object ID table */
> +       idx1 = (((const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset))
>                 + p->num_objects /* CRC32 table */
>                 );

This counts in four-byte words (so `+ 2` skips ahead 8B as the comment
notes). And that's why we need to use "rawsz/4".

Not new in this patch, but that outer pair of parentheses just makes
this harder to read, IMHO. I keep scanning back and forth wondering,
"where is this whole thing going to get multiplied or something?"

  idx1 = (const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset)
         + p->num_objects /* CRC32 table */;

The double-casting can be avoided with something like this, but I'm not
sure it's really any better:

  idx1 = (const uint32_t *)p->index_data
         + p->crc_offset/sizeof(uint32_t)
         + p->num_objects /* CRC32 table */;
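
A minimal sketch (not part of the patch) for checking that the two
spellings above land on the same 4-byte word. `struct fake_pack` and the
function names are made up for illustration; only the field names mirror
the quoted code. The word-wise form assumes `crc_offset` is a multiple
of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct packed_git fields used here. */
struct fake_pack {
	const void *index_data;
	uint32_t num_objects;
	size_t crc_offset; /* byte offset of the CRC32 table */
};

/* Byte-wise offset first, then back to 4-byte words (double cast). */
static const uint32_t *offsets_via_bytes(const struct fake_pack *p)
{
	return (const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset)
	       + p->num_objects /* CRC32 table */;
}

/* Word-wise throughout; only valid if crc_offset is 4-byte aligned. */
static const uint32_t *offsets_via_words(const struct fake_pack *p)
{
	assert(p->crc_offset % sizeof(uint32_t) == 0);
	return (const uint32_t *)p->index_data
	       + p->crc_offset / sizeof(uint32_t)
	       + p->num_objects /* CRC32 table */;
}
```

Both functions skip the CRC32 table by adding `num_objects` words, so
they differ only in how they reach the start of that table.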

> --- a/packfile.c
> +++ b/packfile.c
> @@ -178,6 +178,7 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
>                      */
>                     (sizeof(off_t) <= 4))
>                         return error("pack too large for current definition of off_t in %s", path);
> +               p->crc_offset = 8 + 4 * 256 + nr * hashsz;
>         }
>
>         p->index_version = version;

It doesn't fit in the context, but `nr` will be assigned to
`p->num_objects`. And now we can just use `hashsz` without dividing by
4, so this does the same calculation as the old one above.
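
To spell out that equivalence, here is a small sketch comparing the old
word count with the new byte offset. The function names are invented
for this example and the code is not taken from git; it assumes the
hash size is a multiple of 4, which holds for both SHA-1 (20) and
SHA-256 (32).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old style: count 4-byte words up to the start of the CRC32 table. */
static size_t crc_table_start_in_words(uint32_t nr, unsigned int hashsz)
{
	const uint32_t hashwords = hashsz / sizeof(uint32_t);

	assert(hashsz % sizeof(uint32_t) == 0);
	return 2 /* 8-byte header */
	     + 256 /* fan out */
	     + (size_t)hashwords * nr /* object ID table */;
}

/* New style: the same position expressed directly in bytes. */
static size_t crc_table_start_in_bytes(uint32_t nr, unsigned int hashsz)
{
	return 8 + 4 * 256 + (size_t)nr * hashsz;
}
```

Multiplying the word count by 4 gives the byte offset for any object
count, which is why the division by 4 is no longer needed.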


Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 30/44] connect: pass full packet reader when parsing v2 refs
  2020-05-13  0:54 ` [PATCH 30/44] connect: pass full packet reader when parsing v2 refs brian m. carlson
@ 2020-05-16 11:13   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:13 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> @@ -466,9 +467,10 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
>         }
>         packet_flush(fd_out);
>
> +

Ah, this is where that line from patch 2 went. ;-)


>         /* Process response from server */
>         while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
> -               if (!process_ref_v2(reader->line, &list))
> +               if (!process_ref_v2(reader, &list))
>                         die(_("invalid ls-refs response: %s"), reader->line);
>         }

Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 31/44] connect: parse v2 refs with correct hash algorithm
  2020-05-13  0:54 ` [PATCH 31/44] connect: parse v2 refs with correct hash algorithm brian m. carlson
@ 2020-05-16 11:14   ` Martin Ågren
  2020-05-17 22:37     ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:14 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:58, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> When using protocol v2, we need to know what hash algorithm is used by
> the remote end.  See if the server has sent us an object-format
> capability, and if so, use it to determine the hash algorithm in use and
> set that value in the packet reader.  Parse the refs using this
> algorithm.
>
> Note that we use memcpy instead of oidcpy for copying values, since
> oidcpy is intentionally limited to the current hash algorithm length,
> and the copy will be too short if the server side uses SHA-256 but the
> client side has not had a repository set up (and therefore defaults to
> SHA-1).

> -       oidcpy(&ref->old_oid, &old_oid);
> +       memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);

Might an `oidcpy_algop()` prove useful over time?

  oidcpy_algop(&ref->old_oid, &old_oid, reader->hash_algo);
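
A rough sketch of what such a helper could look like. `oidcpy_algop()`
is hypothetical here, and the two structs are simplified stand-ins
rather than git's real `struct object_id` and `struct git_hash_algo`;
the point is only that the copy length comes from the algorithm passed
in, not from the current default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RAWSZ 32 /* large enough for a raw SHA-256 value */

/* Simplified stand-ins for git's structures (assumption). */
struct object_id {
	unsigned char hash[MAX_RAWSZ];
};

struct hash_algo {
	const char *name;
	size_t rawsz; /* raw hash length in bytes */
};

/* Hypothetical helper: copy exactly algop->rawsz bytes of the hash. */
static void oidcpy_algop(struct object_id *dst, const struct object_id *src,
			 const struct hash_algo *algop)
{
	assert(algop->rawsz <= MAX_RAWSZ);
	memcpy(dst->hash, src->hash, algop->rawsz);
}
```

With a 20-byte algorithm the trailing 12 bytes of `dst` are left
untouched, so a caller mixing algorithms would still need to clear them.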

> @@ -442,6 +444,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
>                              const struct string_list *server_options)
>  {
>         int i;
> +       const char *hash_name;
>         *list = NULL;
>
>         if (server_supports_v2("ls-refs", 1))
> @@ -450,6 +453,14 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
>         if (server_supports_v2("agent", 0))
>                 packet_write_fmt(fd_out, "agent=%s", git_user_agent_sanitized());
>
> +       if (server_feature_v2("object-format", &hash_name)) {
> +               int hash_algo = hash_algo_by_name(hash_name);
> +               if (hash_algo == GIT_HASH_UNKNOWN)
> +                       die(_("unknown object format '%s' specified by server"), hash_name);
> +               reader->hash_algo = &hash_algos[hash_algo];
> +               packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
> +       }
> +

(Similar to an earlier comment I made, if we don't see any
"object-format", we rely on `reader->hash_algo` to have been properly
set up (which it has) and to not have been modified since (which we
could probably rely on, hmm?).)



Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 32/44] serve: advertise object-format capability for protocol v2
  2020-05-13  0:54 ` [PATCH 32/44] serve: advertise object-format capability for protocol v2 brian m. carlson
@ 2020-05-16 11:15   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:15 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> In order to communicate the protocol supported by the server side, add
> support for advertising the object-format capability.  We check that the
> client side sends us an identical algorithm if it sends us its own
> object-format capability, and assume it speaks SHA-1 if not.
>
> In the test, when we're using an algorithm other than SHA-1, we need to
> specify the algorithm in use so we don't get a failure with an "unknown
> format" message. Add a wrapper function that specifies this header if
> required.  Skip specifying this header for SHA-1 to test that it works
> both with and without this header.

This last sentence sort of answers an earlier question I asked: should we
stop special-casing in the test and just always write the capability? I
can see your point here, but it only applies if you actually go to the
trouble of running the tests both with SHA-1 and SHA-256, right?

That is, I wonder if we shouldn't always pass the "object-format"
capability in the tests and, if we have the SHA-1 prereq, execute a
dedicated test where we do not pass it and verify that we default
correctly. Hmm?

> +write_command () {
> +       echo "command=$1"
> +
> +       if test "$(test_oid algo)" != sha1
> +       then
> +               echo "object-format=$(test_oid algo)"
> +       fi
> +}
> +
>  test_expect_success 'test capability advertisement' '
> +       test_oid_init &&
>         cat >expect <<-EOF &&
>         version 2
>         agent=git/$(git version | cut -d" " -f3)
>         ls-refs
>         fetch=shallow
>         server-option
> +       object-format=$(test_oid algo)
>         0000
>         EOF
>
> @@ -45,6 +56,7 @@ test_expect_success 'request invalid capability' '
>  test_expect_success 'request with no command' '
>         test-tool pkt-line pack >in <<-EOF &&
>         agent=git/test
> +       object-format=$(test_oid algo)
>         0000
>         EOF
>         test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&

In these two tests, we give "object-format" unconditionally, meaning
that in a SHA-1 run, we don't *always* skip passing in the capability.
So that's good. Should we verify that the implementation acts on the
"object-format=sha1" capability? Can we? The server should behave as
if it wasn't passed in at all, so I'm not sure how we could do that.

But that brings me to another point: Shouldn't we try to test the whole
"mismatched object format" detection by passing in "sha1" in a SHA-256
build and "sha256" with SHA-1? I suppose a `test_oid wrong_algo` could
come in handy in lots of negative tests that we'll want to add
throughout. Or maybe that doesn't quite fit the long-term goal.


Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 34/44] builtin/ls-remote: initialize repository based on fetch
  2020-05-13  0:54 ` [PATCH 34/44] builtin/ls-remote: initialize repository based on fetch brian m. carlson
@ 2020-05-16 11:16   ` Martin Ågren
  2020-05-16 20:28     ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:16 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:58, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> ls-remote may or may not operate within a repository, and as such will
> not have been initialized with the repository's hash algorithm.  Even if
> it were, the remote side could be using a different algorithm and we
> would still want to display those refs properly.  Find the hash
> algorithm used by the remote side by querying the transport object and
> set our hash algorithm accordingly.
>
> Without this change, if the remote side is using SHA-256, we truncate
> the refs to 40 hex characters, since that's the length of the default
> hash algorithm (SHA-1).

Could we add a test that passes now but would have failed before?

>         ref = transport_get_remote_refs(transport, &ref_prefixes);
> +       if (ref) {
> +               int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
> +               repo_set_hash_algo(the_repository, hash_algo);
> +       }

This will modify `the_hash_algo`. Quoting commit 78a6766802 ("Integrate
hash algorithm support with repo setup", 2017-11-12):

  Add a constant, the_hash_algo, which points to the hash_algo structure
  pointer in the repository global.  Note that this is the hash which is
  used to serialize data to disk, not the hash which is used to display
  items to the user.  The transition plan anticipates that these may be
  different.  We can add an additional element in the future (say,
  ui_hash_algo) to provide for this case.

Don't we violate that here? Is it mostly luck that we can go on to list
what we want to list and that we will never write to disk based on
`the_hash_algo` being "wrong"(?)? Or am I missing something?


Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 35/44] remote-curl: detect algorithm for dumb HTTP by size
  2020-05-13  0:54 ` [PATCH 35/44] remote-curl: detect algorithm for dumb HTTP by size brian m. carlson
@ 2020-05-16 11:17   ` Martin Ågren
  0 siblings, 0 replies; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:17 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:57, brian m. carlson
<sandals@crustytoothpaste.net> wrote:

> +       options.hash_algo = detect_hash_algo(heads);
> +       if (!options.hash_algo)
> +               die("%sinfo/refs not valid: could not determine hash algorithm; "
> +                   "is this a git repository?",
> +                   url.buf);

Should this use `transport_anonymize_url()`?

>                 if (data[i] == '\n') {
> -                       if (mid - start != the_hash_algo->hexsz)
> +                       if (mid - start != options.hash_algo->hexsz)
>                                 die(_("%sinfo/refs not valid: is this a git repository?"),
>                                     transport_anonymize_url(url.buf));

Like here and elsewhere.

>                         data[i] = 0;
>                         ref_name = mid + 1;
>                         ref = alloc_ref(ref_name);
> -                       get_oid_hex(start, &ref->old_oid);
> +                       get_oid_hex_algop(start, &ref->old_oid, options.hash_algo);

Other than that, looks ok.
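
For readers unfamiliar with the idea, the following is a minimal sketch
of detecting the algorithm from the length of the first object ID in an
info/refs line ("<hex-oid>\t<refname>\n"). It is not the patch's
`detect_hash_algo()`; `struct algo_info`, `known_algos` and
`detect_by_hex_length()` are names invented for this example, and it
does not verify that the characters are hex digits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the git_hash_algo fields used here. */
struct algo_info {
	const char *name;
	size_t hexsz; /* length of an object ID in hex characters */
};

static const struct algo_info known_algos[] = {
	{ "sha1", 40 },
	{ "sha256", 64 },
};

/*
 * Return the algorithm whose hex length matches the text before the
 * first tab in `line`, or NULL if there is no tab or no match.
 */
static const struct algo_info *detect_by_hex_length(const char *line)
{
	const char *tab;
	size_t i, len;

	assert(line);
	tab = strchr(line, '\t');
	if (!tab)
		return NULL;
	len = (size_t)(tab - line);
	for (i = 0; i < sizeof(known_algos) / sizeof(known_algos[0]); i++)
		if (known_algos[i].hexsz == len)
			return &known_algos[i];
	return NULL;
}
```

As the cover letter notes, this only works while no two supported
algorithms share a length.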



Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 36/44] builtin/index-pack: add option to specify hash algorithm
  2020-05-13  0:54 ` [PATCH 36/44] builtin/index-pack: add option to specify hash algorithm brian m. carlson
@ 2020-05-16 11:18   ` Martin Ågren
  2020-05-16 20:47     ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-16 11:18 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Jonathan Tan

On Wed, 13 May 2020 at 02:56, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> git index-pack is usually run in a repository, but need not be. Since
> packs don't contain information on the algorithm in use, instead
> relying on context, add an option to index-pack to tell it which one
> we're using in case someone runs it outside of a repository.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  builtin/index-pack.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 7bea1fba52..89f4962a00 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1760,6 +1760,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>                                         die(_("bad %s"), arg);
>                         } else if (skip_prefix(arg, "--max-input-size=", &arg)) {
>                                 max_input_size = strtoumax(arg, NULL, 10);
> +                       } else if (skip_prefix(arg, "--object-format=", &arg)) {
> +                               int hash_algo = hash_algo_by_name(arg);
> +                               if (hash_algo == GIT_HASH_UNKNOWN)
> +                                       die(_("unknown hash algorithm '%s'"), arg);
> +                               repo_set_hash_algo(the_repository, hash_algo);
>                         } else

Patch 27 added `--hash` to `git show-index` and I almost commented on
"hash" vs "object-format". In the end I figured the object format was a
more technical (protocol) term. But now I wonder. Should we try to align
such options from the start? Or is there perhaps a reason for those
different approaches?

Similar to an earlier patch where we modify `the_hash_algo` like this, I
feel a bit nervous. What happens if you pass in a "wrong" algo here,
i.e., SHA-1 in a SHA-256 repo? Or, given the motivation in the commit
message, should this only be allowed if we really *are* outside a repo?


Martin

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 21/44] t5704: send object-format capability with SHA-256
  2020-05-16 11:02   ` Martin Ågren
@ 2020-05-16 19:14     ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-16 19:14 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Jonathan Tan


On 2020-05-16 at 11:02:48, Martin Ågren wrote:
> So we need to pass this capability for the SHA-256 tests to run ok. But
> if we start passing "object-format=sha1" unconditionally at this point
> in the series, the tests will fail:
> 
>   error: 'grep expected flush after ls-refs arguments err' didn't find a match in:
>   fatal: unknown capability 'object-format=sha1'
> 
> That is, we don't yet actually implement "object-format" handling. So
> this will still fail with SHA-256 ("unknown capability"), just that once
> the implementation is in place, the SHA-256 tests will pass (as will the
> normal SHA-1 runs). Do I understand that correctly?

Yes, that's correct.

> Or put differently, by the end of the series, we can do this:
> 
> diff --git a/t/t5704-protocol-violations.sh b/t/t5704-protocol-violations.sh
> index 47e78932b9..22993812e2 100755
> --- a/t/t5704-protocol-violations.sh
> +++ b/t/t5704-protocol-violations.sh
> @@ -6,14 +6,11 @@ communications if the other side says something unexpected. We are mostly
>  making sure that we do not segfault or otherwise behave badly.'
>  . ./test-lib.sh
> 
> -# If we don't print the object format, we'll fail for a spurious reason: the
> -# mismatched object format.
> +# If we don't print the object format, we might fail for a spurious reason:
> +# the mismatched object format.
>  print_object_format () {
>         local algo=$(test_oid algo) &&
> -       if test "$algo" != "sha1"
> -       then
> -               packetize "object-format=$algo"
> -       fi
> +       packetize "object-format=$algo"
>  }
> 
>  test_expect_success 'extra delim packet in v2 ls-refs args' '
> 
> Should we? (And if we do, we might as well drop this function and inline
> the whole thing, IMHO.)

We certainly can.  I'll move this later on in the series so that we
can simplify the code.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 23/44] setup: set the_repository's hash algo when checking format
  2020-05-16 11:03   ` Martin Ågren
@ 2020-05-16 19:29     ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-16 19:29 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Jonathan Tan


On 2020-05-16 at 11:03:56, Martin Ågren wrote:
> On Wed, 13 May 2020 at 02:56, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > When we're checking the repository's format, set the hash algorithm at
> > the same time.  This ensures that we perform a suitable initialization
> > early enough to avoid confusing any parts of the code.  If we defer
> > until later, we can end up with portions of the code which are confused
> > about the hash algorithm, resulting in segfaults.
> 
> This doesn't make a difference as long as you just use SHA-1, right?
> That is, this isn't a bug in the first half of this series nor in
> v2.27-rc0 as long as you stick to SHA-1?

Correct, because the default is SHA-1 if no algorithm is specified.
I'll update the commit message to reflect that this affects only
SHA-256.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 20/44] t5562: pass object-format in synthesized test data
  2020-05-16 10:55   ` Martin Ågren
@ 2020-05-16 19:50     ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-16 19:50 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Jonathan Tan


On 2020-05-16 at 10:55:33, Martin Ågren wrote:
> On Wed, 13 May 2020 at 02:56, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > Ensure that we pass the object-format capability in the synthesized test
> > data so that this test works with algorithms other than SHA-1.
> 
> Right.
> 
> > In addition, add an additional test using the old data for when we're
> > using SHA-1 so that we can be sure that we preserve backwards
> > compatibility with servers not offering the object-format capability.
> 
> I'll have some questions on this below.

I think this got dropped in the rebase.

> Hmmm. Isn't this an exact copy of the 'push plain' test immediately
> preceding it? The commit message talks about using the "old data"
> (i.e., without "object-format=%s"?). Should this test use a variant of
> push_body where we're not adding "object-format"? I'm not sure I grok
> what exactly we want to test here. And does it really belong in
> t/t*-content-length.sh?

It is.  I'll probably drop this part of the patch.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

^ permalink raw reply	[flat|nested] 175+ messages in thread

* Re: [PATCH 14/44] connect: detect algorithm when fetching refs
  2020-05-16 10:40   ` Martin Ågren
@ 2020-05-16 19:59     ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-16 19:59 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Jonathan Tan


On 2020-05-16 at 10:40:11, Martin Ågren wrote:
> On Wed, 13 May 2020 at 02:57, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > If we're fetching refs, detect the hash algorithm and parse the refs
> > using that algorithm.
> 
> As the added documentation from patch 2 says, if there are multiple
> "object-format" capabilities, "the first one given is the one used in
> the ref advertisement". And that's what you implement below.
> 
> Explaining that in this commit message and/or referring to "a recent
> commit" (patch 2) and/or adding that documentation here, not back then,
> would have avoided some confusion on my part, and perhaps also for
> future readers.

I'll try to reword to improve things.

> >  static void process_capabilities(struct packet_reader *reader, int *len)
> >  {
> > +       const char *feat_val;
> > +       int feat_len;
> > +       int hash_algo;
> 
> You could reduce the scope of `hash_algo`.

Can do.

> >         const char *line = reader->line;
> >         int nul_location = strlen(line);
> >         if (nul_location == *len)
> >                 return;
> >         server_capabilities_v1 = xstrdup(line + nul_location + 1);
> >         *len = nul_location;
> > +
> > +       feat_val = server_feature_value("object-format", &feat_len);
> > +       if (feat_val) {
> > +               char *hash_name = xstrndup(feat_val, feat_len);
> > +               hash_algo = hash_algo_by_name(hash_name);
> > +               if (hash_algo != GIT_HASH_UNKNOWN)
> > +                       reader->hash_algo = &hash_algos[hash_algo];
> > +               free(hash_name);
> > +       }
> >  }
> 
> xstrndup is needed because we're not guaranteed a terminating NUL. You
> remember to call free afterwards. Ok.
> 
> If we don't get any "object-format", we do basically nothing here and
> `reader->hash_algo` will remain as whatever it already is. The docs from
> patch 2 promise that this will be handled as "SHA-1" -- would it be more
> robust if we did a similar fallback dance as you do elsewhere?
> 
>   feat_val = ...;
>   if (!feat_val) {
>           feat_val = hash_algos[GIT_HASH_SHA1].name;
>           feat_len = strlen(feat_val);
>   }
>   char *hash_name = ...
>   ...

Yeah, I can do that.
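
For readers following along, the fallback being agreed on here can be
sketched in isolation.  This is only a minimal illustration under stated
assumptions, not the code in connect.c: find_feature_value(),
algo_by_name(), algo_from_capabilities() and the algo_names table are
hypothetical stand-ins for server_feature_value(), hash_algo_by_name()
and hash_algos[], and the capability list is assumed to be
space-separated.  What the caller does with an unknown name is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for git's GIT_HASH_* constants. */
enum { ALGO_UNKNOWN = 0, ALGO_SHA1 = 1, ALGO_SHA256 = 2 };

static const char *const algo_names[] = { NULL, "sha1", "sha256" };

/*
 * Find the first "<feature>=<value>" entry in a space-separated capability
 * list.  Returns a pointer to the value (NOT NUL-terminated) and stores its
 * length in *len, or returns NULL if the feature is absent.
 */
static const char *find_feature_value(const char *caps, const char *feature,
				      size_t *len)
{
	size_t flen;
	const char *p;

	assert(caps && feature && len);
	flen = strlen(feature);
	for (p = caps; *p; ) {
		size_t toklen = strcspn(p, " ");

		if (toklen > flen && !strncmp(p, feature, flen) &&
		    p[flen] == '=') {
			*len = toklen - flen - 1;
			return p + flen + 1;
		}
		p += toklen;
		while (*p == ' ')
			p++;
	}
	return NULL;
}

/* Map a length-delimited name to an algorithm, or ALGO_UNKNOWN. */
static int algo_by_name(const char *name, size_t len)
{
	int i;

	for (i = ALGO_SHA1; i <= ALGO_SHA256; i++)
		if (strlen(algo_names[i]) == len &&
		    !strncmp(algo_names[i], name, len))
			return i;
	return ALGO_UNKNOWN;
}

/* No object-format capability means SHA-1; the first one given wins. */
static int algo_from_capabilities(const char *caps)
{
	size_t len;
	const char *val = find_feature_value(caps, "object-format", &len);

	if (!val)
		return ALGO_SHA1;
	return algo_by_name(val, len);
}
```

Because the value is compared by length, no temporary NUL-terminated
copy is needed in this sketch; the patch under review uses xstrndup()
and free() for the same purpose.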
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: [PATCH 34/44] builtin/ls-remote: initialize repository based on fetch
  2020-05-16 11:16   ` Martin Ågren
@ 2020-05-16 20:28     ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-16 20:28 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Jonathan Tan


On 2020-05-16 at 11:16:46, Martin Ågren wrote:
> On Wed, 13 May 2020 at 02:58, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > ls-remote may or may not operate within a repository, and as such will
> > not have been initialized with the repository's hash algorithm.  Even if
> > it were, the remote side could be using a different algorithm and we
> > would still want to display those refs properly.  Find the hash
> > algorithm used by the remote side by querying the transport object and
> > set our hash algorithm accordingly.
> >
> > Without this change, if the remote side is using SHA-256, we truncate
> > the refs to 40 hex characters, since that's the length of the default
> > hash algorithm (SHA-1).
> 
> Could we add a test that passes now but would have failed before?

The existing tests that call "git ls-remote" actually fail with SHA-256
if we don't do this, specifically "ls-remote works outside repository"
in t5512.  That's the thing with a lot of this series: our existing test
suite is enormously effective at catching these things, but writing a
new test is hard because we can't actually instantiate a SHA-256
repository (because then users could, and it's broken until the end of
the series).  Perhaps unsurprisingly, that's how I found this problem.

So while I would love to write a test for this case, I can't without
allowing users to corrupt and destroy their data in the meantime (or
tacking the final six commits onto this series).

> >         ref = transport_get_remote_refs(transport, &ref_prefixes);
> > +       if (ref) {
> > +               int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
> > +               repo_set_hash_algo(the_repository, hash_algo);
> > +       }
> 
> This will modify `the_hash_algo`. Quoting commit 78a6766802 ("Integrate
> hash algorithm support with repo setup", 2017-11-12):
> 
>   Add a constant, the_hash_algo, which points to the hash_algo structure
>   pointer in the repository global.  Note that this is the hash which is
>   used to serialize data to disk, not the hash which is used to display
>   items to the user.  The transition plan anticipates that these may be
>   different.  We can add an additional element in the future (say,
>   ui_hash_algo) to provide for this case.
> 
> Don't we violate that here? Is it mostly luck that we can go on to list
> what we want to list and that we will never write to disk based on
> `the_hash_algo` being "wrong"(?)? Or am I missing something?

We do violate that and we also rely on it never having any effect on our
current repository.  Unfortunately, as things stand now, we don't
support multiple hash algorithms in the same running binary, and we
can't until we allow a member of struct object_id to vary based on the
hash algorithm.  That work is coming in a future series (after we have a
fully functioning SHA-256 stage 4 implementation), but at this point,
I'm still working through all of the crashes we get from random places
where we make assumptions about initializing things, so it's not a
straightforward fix.

For now, I think this is the best we can do without major additional
surgery to the codebase.  I'm fine with stating that git ls-remote can
read the repository (to parse remotes) but can't write to it, since
that's the behavior users will expect anyway.  I'll update the commit
message to reflect that wart and assumption, since it would be good to
document it.
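
To illustrate the truncation mentioned in the commit message: the
printed length of an object ID is derived from the raw size of whichever
algorithm is currently selected.  The following is a minimal sketch
under that assumption; raw_to_hex() and the *_RAWSZ macros are
hypothetical names for illustration, not git's actual helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_RAWSZ 20
#define SHA256_RAWSZ 32
#define MAX_RAWSZ SHA256_RAWSZ

/*
 * Write the first `rawsz` bytes of `hash` as lowercase hex into `out`,
 * which must have room for 2 * rawsz + 1 bytes.  Returns the hex length.
 */
static size_t raw_to_hex(char *out, const unsigned char *hash, size_t rawsz)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	assert(out && hash && rawsz <= MAX_RAWSZ);
	for (i = 0; i < rawsz; i++) {
		out[2 * i] = digits[hash[i] >> 4];
		out[2 * i + 1] = digits[hash[i] & 0xf];
	}
	out[2 * rawsz] = '\0';
	return 2 * rawsz;
}
```

Formatting a 32-byte SHA-256 value with the SHA-1 raw size yields only
40 hex characters, which is the symptom ls-remote showed before the
algorithm was taken from the transport.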
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: [PATCH 36/44] builtin/index-pack: add option to specify hash algorithm
  2020-05-16 11:18   ` Martin Ågren
@ 2020-05-16 20:47     ` brian m. carlson
  2020-05-17 18:16       ` Martin Ågren
  0 siblings, 1 reply; 175+ messages in thread
From: brian m. carlson @ 2020-05-16 20:47 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Jonathan Tan


On 2020-05-16 at 11:18:12, Martin Ågren wrote:
> On Wed, 13 May 2020 at 02:56, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > git index-pack is usually run in a repository, but need not be. Since
> > packs don't contain information on the algorithm in use, instead
> > relying on context, add an option to index-pack to tell it which one
> > we're using in case someone runs it outside of a repository.
> >
> > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> > ---
> >  builtin/index-pack.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> > index 7bea1fba52..89f4962a00 100644
> > --- a/builtin/index-pack.c
> > +++ b/builtin/index-pack.c
> > @@ -1760,6 +1760,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
> >                                         die(_("bad %s"), arg);
> >                         } else if (skip_prefix(arg, "--max-input-size=", &arg)) {
> >                                 max_input_size = strtoumax(arg, NULL, 10);
> > +                       } else if (skip_prefix(arg, "--object-format=", &arg)) {
> > +                               int hash_algo = hash_algo_by_name(arg);
> > +                               if (hash_algo == GIT_HASH_UNKNOWN)
> > +                                       die(_("unknown hash algorithm '%s'"), arg);
> > +                               repo_set_hash_algo(the_repository, hash_algo);
> >                         } else
> 
> Patch 27 added `--hash` to `git show-index` and I almost commented on
> "hash" vs "object-format". In the end I figured the object format was a
> more technical (protocol) term. But now I wonder. Should we try to align
> such options from the start? Or is there perhaps a reason for those
> different approaches?

I'll bring them into sync.

> Similar to an earlier patch where we modify `the_hash_algo` like this, I
> feel a bit nervous. What happens if you pass in a "wrong" algo here,
> i.e., SHA-1 in a SHA-256 repo? Or, given the motivation in the commit
> message, should this only be allowed if we really *are* outside a repo?

Unfortunately, we can't prevent the user from being inside repository A,
which is SHA-1, while invoking git index-pack on repository B, which is
SHA-256.  That is valid without --stdin, if uncommon, and it needs to be
supported.  I can prevent it from being used with --stdin, though.

If you pass in a wrong algorithm, we usually blow up with an inflate
error because we consume more bytes than expected with our ref deltas.
I'm not aware of any cases where we segfault or access invalid memory;
we just blow up in a nonobvious way.  That's true, too, if you manually
tamper with the algorithm in extensions.objectformat; usually we blow up
(but not segfault) because the index is "corrupt".
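
As a rough illustration of why a wrong algorithm tends to surface as an
inflate error: in a REF_DELTA entry the base object ID sits between the
type/size header and the zlib stream, so its assumed length decides
where inflation starts.  This is a sketch only; ref_delta_zlib_offset()
and the *_RAWSZ macros are hypothetical names, not functions from
index-pack.

```c
#include <assert.h>
#include <stddef.h>

#define SHA1_RAWSZ 20
#define SHA256_RAWSZ 32

/*
 * For a REF_DELTA entry, the base object ID follows the type/size header
 * and the zlib stream follows the base object ID.  Return the offset of
 * the zlib stream relative to the start of the entry.
 */
static size_t ref_delta_zlib_offset(size_t header_len, size_t rawsz)
{
	assert(rawsz == SHA1_RAWSZ || rawsz == SHA256_RAWSZ);
	return header_len + rawsz;
}
```

With the wrong raw size the computed offset is off by 12 bytes in one
direction or the other, so the inflater is handed bytes that are not the
start of a zlib stream.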
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: [PATCH 36/44] builtin/index-pack: add option to specify hash algorithm
  2020-05-16 20:47     ` brian m. carlson
@ 2020-05-17 18:16       ` Martin Ågren
  2020-05-17 20:52         ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Martin Ågren @ 2020-05-17 18:16 UTC (permalink / raw)
  To: brian m. carlson, Martin Ågren, Git Mailing List, Jonathan Tan

On Sat, 16 May 2020 at 22:47, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2020-05-16 at 11:18:12, Martin Ågren wrote:
> > On Wed, 13 May 2020 at 02:56, brian m. carlson
> > <sandals@crustytoothpaste.net> wrote:
> > >
> > > git index-pack is usually run in a repository, but need not be. Since
> > > packs don't contain information on the algorithm in use, instead
> > > relying on context, add an option to index-pack to tell it which one
> > > we're using in case someone runs it outside of a repository.
>
> > Similar to an earlier patch where we modify `the_hash_algo` like this, I
> > feel a bit nervous. What happens if you pass in a "wrong" algo here,
> > i.e., SHA-1 in a SHA-256 repo? Or, given the motivation in the commit
> > message, should this only be allowed if we really *are* outside a repo?
>
> Unfortunately, we can't prevent the user from being inside repository A,
> which is SHA-1, while invoking git index-pack on repository B, which is
> SHA-256.

Ah, I see.

>  That is valid without --stdin, if uncommon, and it needs to be
> supported.  I can prevent it from being used with --stdin, though.

Hmm, that might make sense. I suppose it could quickly get out of
control with bug reports coming in along the lines of "if I do this
really crazy git index-pack invocation, I manage to mess things up". The
easiest way to address this might be through documentation, i.e., "don't
use this option", "for internal use" or even "to be used by the test
suite only" for which there is even precedent in git-index-pack(1).

On the other hand, if we need to detect such hash mismatch even once the
SHA-256 work is 100% complete, then I suppose we really should try a
bit to catch bad invocations.

As a tangent, I see that v2.27.0 will come with `git init
--object-format=<format>` and `GIT_DEFAULT_HASH_ALGORITHM`. The docs for
the former mention "(if enabled)". Should we add something more scary
to those to make it clear that they shouldn't be used and that you
basically shouldn't even try to figure out how to enable them? I can
already see the tweets and blog posts a few weeks from now about how you
can build Git from source setting a single switch, run

  git init --object-format=sha256

and you're in the future! Which will just lead to pain some days or
weeks later.... "I've done lots of work. How do I convert my repo to
SHA-1 so I can share it?"...

We've added "experimental" things before and tried to document the
experimental nature. Maybe here we're not even "experimental" -- more
like "if you use this in production, you *will* suffer"?

> If you pass in a wrong algorithm, we usually blow up with an inflate
> error because we consume more bytes than expected with our ref deltas.
> I'm not aware of any cases where we segfault or access invalid memory;
> we just blow up in a nonobvious way.  That's true, too, if you manually
> tamper with the algorithm in extensions.objectformat; usually we blow up
> (but not segfault) because the index is "corrupt".

Ok, I see. I suppose "some time", we could tweak error messages to hint
about an object-format mismatch, but I don't think that needs to block
your work here now.

Martin


* Re: [PATCH 36/44] builtin/index-pack: add option to specify hash algorithm
  2020-05-17 18:16       ` Martin Ågren
@ 2020-05-17 20:52         ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-17 20:52 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Jonathan Tan


On 2020-05-17 at 18:16:37, Martin Ågren wrote:
> On Sat, 16 May 2020 at 22:47, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >  That is valid without --stdin, if uncommon, and it needs to be
> > supported.  I can prevent it from being used with --stdin, though.
> 
> Hmm, that might make sense. I suppose it could quickly get out of
> control with bug reports coming in along the lines of "if I do this
> really crazy git index-pack invocation, I manage to mess things up". The
> easiest way to address this might be through documentation, i.e., "don't
> use this option", "for internal use" or even "to be used by the test
> suite only" for which there is even precedent in git-index-pack(1).
> 
> On the other hand, if we need to detect such hash mismatch even once the
> SHA-256 work is 100% complete, then I suppose we really should try a
> bit to catch bad invocations.

I can add documentation and a warning there.

If we actually verified the checksum at the end of the pack first, then
we'd be able to distinguish the two cases, because we'd try to compute a
clearly invalid hash over the body, and the likelihood of it matching
would be very small.  We don't at the moment, for reasons I'm unclear
about, but it's probably performance.

> As a tangent, I see that v2.27.0 will come with `git init
> --object-format=<format>` and `GIT_DEFAULT_HASH_ALGORITHM`. The docs for
> the former mention "(if enabled)". Should we add something more scary
> to those to make it clear that they shouldn't be used and that you
> basically shouldn't even try to figure out how to enable them? I can
> already see the tweets and blog posts a few weeks from now about how you
> can build Git from source setting a single switch, run
> 
>   git init --object-format=sha256
> 
> and you're in the future! Which will just lead to pain some days or
> weeks later.... "I've done lots of work. How do I convert my repo to
> SHA-1 so I can share it?"...
> 
> We've added "experimental" things before and tried to document the
> experimental nature. Maybe here we're not even "experimental" -- more
> like "if you use this in production, you *will* suffer"?

Well, the option is there, but it produces the following:

  % git init --object-format=sha256
  fatal: The hash algorithm sha256 is not supported in this build.

which can be distinguished from this:

  % git init --object-format=blake2b
  fatal: unknown hash algorithm 'blake2b'

Right now it's pretty broken without this series, so you can't use it.
I mean, you have the source and can remove the check, but it doesn't
work as it stands, so I'm not too worried about people trying to do that
at the moment.  I'll sneak in some documentation for the end product,
though.
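
The two failure modes shown above can be modelled as a name lookup
followed by an "is it enabled" check.  This is a minimal sketch, not the
code in init-db; known_algos, the `enabled` flag and
check_object_format() are hypothetical and only mirror the two messages
quoted in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum algo_status { ALGO_OK, ALGO_NOT_ENABLED, ALGO_UNKNOWN_NAME };

struct algo_entry {
	const char *name;
	int enabled;	/* hypothetical build-time switch */
};

static const struct algo_entry known_algos[] = {
	{ "sha1", 1 },
	{ "sha256", 0 },	/* recognised, but not usable yet */
};

/*
 * ALGO_NOT_ENABLED corresponds to "not supported in this build",
 * ALGO_UNKNOWN_NAME to "unknown hash algorithm".
 */
static enum algo_status check_object_format(const char *name)
{
	size_t i;

	assert(name);
	for (i = 0; i < sizeof(known_algos) / sizeof(known_algos[0]); i++)
		if (!strcmp(known_algos[i].name, name))
			return known_algos[i].enabled ? ALGO_OK
						      : ALGO_NOT_ENABLED;
	return ALGO_UNKNOWN_NAME;
}
```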
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: [PATCH 31/44] connect: parse v2 refs with correct hash algorithm
  2020-05-16 11:14   ` Martin Ågren
@ 2020-05-17 22:37     ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-17 22:37 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Jonathan Tan


On 2020-05-16 at 11:14:16, Martin Ågren wrote:
> On Wed, 13 May 2020 at 02:58, brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > When using protocol v2, we need to know what hash algorithm is used by
> > the remote end.  See if the server has sent us an object-format
> > capability, and if so, use it to determine the hash algorithm in use and
> > set that value in the packet reader.  Parse the refs using this
> > algorithm.
> >
> > Note that we use memcpy instead of oidcpy for copying values, since
> > oidcpy is intentionally limited to the current hash algorithm length,
> > and the copy will be too short if the server side uses SHA-256 but the
> > client side has not had a repository set up (and therefore defaults to
> > SHA-1).
> 
> > -       oidcpy(&ref->old_oid, &old_oid);
> > +       memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);
> 
> Might an `oidcpy_algop()` prove useful over time?
> 
>   oidcpy_algop(&ref->old_oid, &old_oid, reader->hash_algo);

I think I can just omit this chunk, because oidcpy now copies the entire
struct for speed.
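
The difference between the two copying strategies can be sketched as
follows.  This is an illustration under assumptions, not git's
definitions: struct oid, oid_copy_rawsz() and oid_copy_full() are
hypothetical stand-ins for struct object_id and the two flavours of
oidcpy being discussed.

```c
#include <assert.h>
#include <string.h>

#define SHA1_RAWSZ 20
#define SHA256_RAWSZ 32
#define MAX_RAWSZ SHA256_RAWSZ

/* Hypothetical stand-in for struct object_id: sized for the largest hash. */
struct oid {
	unsigned char hash[MAX_RAWSZ];
};

/* Copy only as many bytes as the "current" algorithm uses. */
static void oid_copy_rawsz(struct oid *dst, const struct oid *src, size_t rawsz)
{
	assert(dst && src && rawsz <= MAX_RAWSZ);
	memcpy(dst->hash, src->hash, rawsz);
}

/* Copy the whole buffer, independent of any algorithm. */
static void oid_copy_full(struct oid *dst, const struct oid *src)
{
	assert(dst && src);
	*dst = *src;
}
```

If the length-limited variant is called with the SHA-1 size on a
SHA-256 value, the last 12 bytes of the destination are left untouched;
the whole-struct copy has no such dependency on the current algorithm.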
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* Re: [PATCH 27/44] builtin/show-index: provide options to determine hash algo
  2020-05-13  0:54 ` [PATCH 27/44] builtin/show-index: provide options to determine hash algo brian m. carlson
@ 2020-05-18 16:20   ` Junio C Hamano
  2020-05-19  0:31     ` brian m. carlson
  0 siblings, 1 reply; 175+ messages in thread
From: Junio C Hamano @ 2020-05-18 16:20 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Jonathan Tan

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> It's possible to use a variety of index formats with show-index, and we
> need a way to indicate the hash algorithm which is in use for a
> particular index we'd like to show.  Default to using the value for the
> repository we're in by calling setup_git_directory_gently, and allow
> overriding it by using a --hash argument.

I think you meant to say that "show-index" does not autodetect what
hash algorithm is used from its input, and the new argument is a way
for the user to help the command when the hash algorithm is
different from what is used in the current repository?

I ask because I found that your version can be read to say that
"show-index" can show the contents of a given pack index using any
hash algorithm we support, and the user can specify --hash=SHA-256
when running the command on a pack .idx that uses SHA-1 object names
to auto-convert it, and readers wouldn't be able to guess which was
meant with only the above five lines.

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  builtin/show-index.c | 29 ++++++++++++++++++++++++-----
>  git.c                |  2 +-
>  2 files changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/show-index.c b/builtin/show-index.c
> index 0826f6a5a2..ebfa2e9abd 100644
> --- a/builtin/show-index.c
> +++ b/builtin/show-index.c
> @@ -1,9 +1,12 @@
>  #include "builtin.h"
>  #include "cache.h"
>  #include "pack.h"
> +#include "parse-options.h"
>  
> -static const char show_index_usage[] =
> -"git show-index";
> +static const char *const show_index_usage[] = {
> +	"git show-index [--hash=HASH]",
> +	NULL
> +};

Do we say --hash=SHA-1 etc. or --hash-algo=SHA-256 in other places?
Would the word "hash" alone be clear enough that it does not refer to
a specific "hash" value but the name of an algorithm?

The generating side seems to use "index-pack --object-format=<algo>"
and the transport seems to use a capability "object-format=<algo>",
neither of which is directly visible to the end users, but I think
they follow "git init --object-format=<algo>", so we are consistent
there.

Perhaps we should follow suit here, too?

>  int cmd_show_index(int argc, const char **argv, const char *prefix)
>  {
> @@ -11,10 +14,26 @@ int cmd_show_index(int argc, const char **argv, const char *prefix)
>  	unsigned nr;
>  	unsigned int version;
>  	static unsigned int top_index[256];
> -	const unsigned hashsz = the_hash_algo->rawsz;
> +	unsigned hashsz;
> +	const char *hash_name = NULL;
> +	int hash_algo;
> +	const struct option show_index_options[] = {
> +		OPT_STRING(0, "hash", &hash_name, N_("hash"),
> +			   N_("specify the hash algorithm to use")),

init-db has an entry identical to this one, except that the second
token given to the macro is "object-format" instead of "hash".  Both
may want to change what's inside N_() to "hash algorithm".

> +		OPT_END()
> +	};
> +
> +	argc = parse_options(argc, argv, prefix, show_index_options, show_index_usage, 0);
> +
> +	if (hash_name) {
> +		hash_algo = hash_algo_by_name(hash_name);
> +		if (hash_algo == GIT_HASH_UNKNOWN)
> +			die(_("Unknown hash algorithm"));
> +		repo_set_hash_algo(the_repository, hash_algo);
> +	}
> +
> +	hashsz = the_hash_algo->rawsz;
>  
> -	if (argc != 1)
> -		usage(show_index_usage);
>  	if (fread(top_index, 2 * 4, 1, stdin) != 1)
>  		die("unable to read header");
>  	if (top_index[0] == htonl(PACK_IDX_SIGNATURE)) {
> diff --git a/git.c b/git.c
> index 2e4efb4ff0..e53e8159a2 100644
> --- a/git.c
> +++ b/git.c
> @@ -573,7 +573,7 @@ static struct cmd_struct commands[] = {
>  	{ "shortlog", cmd_shortlog, RUN_SETUP_GENTLY | USE_PAGER },
>  	{ "show", cmd_show, RUN_SETUP },
>  	{ "show-branch", cmd_show_branch, RUN_SETUP },
> -	{ "show-index", cmd_show_index },
> +	{ "show-index", cmd_show_index, RUN_SETUP_GENTLY },

Hmph, this is not necessary to support peeking an .idx file in
another repository that uses a different hash algorithm than ours
(we do need the --hash=<algo> override to tell that the algo is
different from what we read from our repository settings).  Is this
absolutely necessary?

Ah, I am misreading the patch.  We didn't even do setup but we now
optionally do, in order to see if we are in a repository and what
object format it uses to give the default value to --hash=<algo>
when the argument is not given.  The need for RUN_SETUP_GENTLY
is understandable.

As we do not take any path argument on the command line, the other
side effect of setup_git_directory() that takes us up to the top
level of the working tree does not hurt us, either, so this is a
good change, I think.

Thanks.



>  	{ "show-ref", cmd_show_ref, RUN_SETUP },
>  	{ "sparse-checkout", cmd_sparse_checkout, RUN_SETUP | NEED_WORK_TREE },
>  	{ "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE },


* Re: [PATCH 27/44] builtin/show-index: provide options to determine hash algo
  2020-05-18 16:20   ` Junio C Hamano
@ 2020-05-19  0:31     ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-19  0:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jonathan Tan


On 2020-05-18 at 16:20:22, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > It's possible to use a variety of index formats with show-index, and we
> > need a way to indicate the hash algorithm which is in use for a
> > particular index we'd like to show.  Default to using the value for the
> > repository we're in by calling setup_git_directory_gently, and allow
> > overriding it by using a --hash argument.
> 
> I think you meant to say that "show-index" does not autodetect what
> hash algorithm is used from its input, and the new argument is a way
> for the user to help the command when the hash algorithm is
> different from what is used in the current repository?

Correct.

> I ask because I found that your version can be read to say that
> "show-index" can show the contents of a given pack index using any
> hash algorithm we support, and the user can specify --hash=SHA-256
> when running the command on a pack .idx that uses SHA-1 object names
> to auto-convert it, and readers wouldn't be able to guess which was
> meant with only the above five lines.

No, that's definitely not what I meant.  I'll adjust the commit message
to make this clearer.

> Do we say --hash=SHA-1 etc. or --hash-algo=SHA-256 in other places?
> Would the word "hash" alone be clear enough that it does not refer to
> a specific "hash" value but the name of an algorithm?
> 
> The generating side seems to use "index-pack --object-format=<algo>"
> and the transport seems to use a capability "object-format=<algo>",
> neither of which is directly visible to the end users, but I think
> they follow "git init --object-format=<algo>", so we are consistent
> there.
> 
> Perhaps we should follow suit here, too?

Yeah, as I mentioned to Martin elsewhere in the thread, I'm going to
make this consistent and use --object-format.

> > diff --git a/git.c b/git.c
> > index 2e4efb4ff0..e53e8159a2 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -573,7 +573,7 @@ static struct cmd_struct commands[] = {
> >  	{ "shortlog", cmd_shortlog, RUN_SETUP_GENTLY | USE_PAGER },
> >  	{ "show", cmd_show, RUN_SETUP },
> >  	{ "show-branch", cmd_show_branch, RUN_SETUP },
> > -	{ "show-index", cmd_show_index },
> > +	{ "show-index", cmd_show_index, RUN_SETUP_GENTLY },
> 
> Hmph, this is not necessary to support peeking an .idx file in
> another repository that uses a different hash algorithm than ours
> (we do need the --hash=<algo> override to tell that the algo is
> different from what we read from our repository settings).  Is this
> absolutely necessary?
> 
> Ah, I am misreading the patch.  We didn't even do setup but we now
> optionally do, in order to see if we are in a repository and what
> object format it uses to give the default value to --hash=<algo>
> when the argument is not given.  The need for RUN_SETUP_GENTLY
> is understandable.

Yes, this is designed to make us do the right thing when we're in a
repository (e.g., with --stdin) by autodetecting the algorithm in use
but not fail when we're outside of a repository.  I'll update the commit
message to make this a lot clearer, since I clearly omitted a lot of
things that were in my head when writing this.
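
The intended precedence (explicit option, then the enclosing
repository, then the SHA-1 default) can be sketched like this.  It is a
minimal illustration under assumptions, not the code in show-index.c:
pick_algo(), algo_by_name() and the ALGO_* constants are hypothetical,
and the repository's algorithm is passed in rather than discovered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for git's GIT_HASH_* constants. */
enum { ALGO_UNKNOWN = 0, ALGO_SHA1 = 1, ALGO_SHA256 = 2 };

static int algo_by_name(const char *name)
{
	if (!strcmp(name, "sha1"))
		return ALGO_SHA1;
	if (!strcmp(name, "sha256"))
		return ALGO_SHA256;
	return ALGO_UNKNOWN;
}

/*
 * opt_name:  value of the command-line option, or NULL if not given
 * repo_algo: algorithm of the enclosing repository, or ALGO_UNKNOWN when
 *            run outside of one
 */
static int pick_algo(const char *opt_name, int repo_algo)
{
	assert(repo_algo >= ALGO_UNKNOWN && repo_algo <= ALGO_SHA256);
	if (opt_name)
		return algo_by_name(opt_name);	/* ALGO_UNKNOWN: caller dies */
	if (repo_algo != ALGO_UNKNOWN)
		return repo_algo;
	return ALGO_SHA1;
}
```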
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


* [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (43 preceding siblings ...)
  2020-05-13  0:54 ` [PATCH 44/44] remote-testgit: adapt for object-format brian m. carlson
@ 2020-05-25 19:58 ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 01/44] t1050: match object ID paths in a hash-insensitive way brian m. carlson
                     ` (43 more replies)
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
  45 siblings, 44 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

This is part 2 of 3 of the SHA-256 work.  This series adds all of the
protocol logic to work with SHA-256 repositories.

Changes from v1:
* Fix spurious line additions and deletions.
* Rename len to linelen for easier understanding.
* Move the documentation comment for xstrncmpz to the header.
* Drop a useless variable (found).
* Update several commit messages to better explain things as suggested
  by Junio and Martin.
* Name the parameters for parse_feature_value for better documentation.
* Reduce the scope of variables when possible.
* Add explicit handling for missing object-format capabilities.
* Rename all new options to --object-format.
* Use oidcpy where possible.
* Test more failure cases.
* Have index-pack fail if --stdin and --object-format are both
  specified.
* Move and simplify t5704.
* Other miscellaneous cleanups to respond to review feedback.

Range-diff below.

brian m. carlson (44):
  t1050: match object ID paths in a hash-insensitive way
  Documentation: document v1 protocol object-format capability
  connect: have ref processing code take struct packet_reader
  wrapper: add function to compare strings with different NUL
    termination
  remote: advertise the object-format capability on the server side
  connect: add function to parse multiple v1 capability values
  connect: add function to fetch value of a v2 server capability
  pkt-line: add a member for hash algorithm
  transport: add a hash algorithm member
  connect: add function to detect supported v1 hash functions
  send-pack: detect when the server doesn't support our hash
  connect: make parse_feature_value extern
  fetch-pack: detect when the server doesn't support our hash
  connect: detect algorithm when fetching refs
  builtin/receive-pack: detect when the server doesn't support our hash
  docs: update remote helper docs for object-format extensions
  transport-helper: implement object-format extensions
  remote-curl: implement object-format extensions
  builtin/clone: initialize hash algorithm properly
  t5562: pass object-format in synthesized test data
  fetch-pack: parse and advertise the object-format capability
  setup: set the_repository's hash algo when checking format
  t3200: mark assertion with SHA1 prerequisite
  packfile: compute and use the index CRC offset
  t5302: modernize test formatting
  builtin/show-index: provide options to determine hash algo
  t1302: expect repo format version 1 for SHA-256
  Documentation/technical: document object-format for protocol v2
  connect: pass full packet reader when parsing v2 refs
  connect: parse v2 refs with correct hash algorithm
  serve: advertise object-format capability for protocol v2
  t5500: make hash independent
  builtin/ls-remote: initialize repository based on fetch
  remote-curl: detect algorithm for dumb HTTP by size
  builtin/index-pack: add option to specify hash algorithm
  t1050: pass algorithm to index-pack when outside repo
  remote-curl: avoid truncating refs with ls-remote
  t/helper: initialize the repository for test-sha1-array
  t5702: offer an object-format capability in the test
  t5703: use object-format serve option
  t5704: send object-format capability with SHA-256
  t5300: pass --object-format to git index-pack
  bundle: detect hash algorithm when reading refs
  remote-testgit: adapt for object-format

 Documentation/git-index-pack.txt              |   8 +
 Documentation/git-show-index.txt              |  11 +-
 Documentation/gitremote-helpers.txt           |  33 +-
 .../technical/protocol-capabilities.txt       |  15 +
 Documentation/technical/protocol-v2.txt       |   9 +
 builtin/clone.c                               |   9 +
 builtin/index-pack.c                          |  14 +-
 builtin/ls-remote.c                           |   4 +
 builtin/receive-pack.c                        |  10 +
 builtin/show-index.c                          |  29 +-
 bundle.c                                      |  22 +-
 bundle.h                                      |   1 +
 connect.c                                     | 138 +++++--
 connect.h                                     |   3 +
 fetch-pack.c                                  |  14 +
 git-compat-util.h                             |   6 +
 git.c                                         |   2 +-
 object-store.h                                |   1 +
 packfile.c                                    |   1 +
 pkt-line.c                                    |   1 +
 pkt-line.h                                    |   3 +
 remote-curl.c                                 |  46 ++-
 send-pack.c                                   |   6 +
 serve.c                                       |  27 ++
 setup.c                                       |   1 +
 t/helper/test-oid-array.c                     |   3 +
 t/t1050-large.sh                              |   6 +-
 t/t1302-repo-version.sh                       |   6 +-
 t/t3200-branch.sh                             |   2 +-
 t/t5300-pack-object.sh                        |   9 +-
 t/t5302-pack-index.sh                         | 360 +++++++++---------
 t/t5500-fetch-pack.sh                         |   5 +-
 t/t5562-http-backend-content-length.sh        |   5 +-
 t/t5701-git-serve.sh                          |  25 ++
 t/t5702-protocol-v2.sh                        |   2 +
 t/t5703-upload-pack-ref-in-want.sh            |  19 +-
 t/t5704-protocol-violations.sh                |   2 +
 t/t5801/git-remote-testgit                    |   6 +
 t/test-lib.sh                                 |   1 +
 transport-helper.c                            |  24 +-
 transport.c                                   |  18 +-
 transport.h                                   |   8 +
 upload-pack.c                                 |   3 +-
 wrapper.c                                     |   8 +
 44 files changed, 678 insertions(+), 248 deletions(-)

Range-diff against v1:
 1:  82a0a5beae =  1:  5878fe6a98 t1050: match object ID paths in a hash-insensitive way
 2:  95e84f6457 !  2:  402864eaa3 Documentation: document v1 protocol object-format capability
    @@ Documentation/technical/protocol-capabilities.txt: agent strings are purely info
      symref
      ------
      
    -@@ Documentation/technical/protocol-capabilities.txt: refs being sent.
    - 
    - Clients MAY use the parameters from this capability to select the proper initial
    - branch when cloning a repository.
    --
    - shallow
    - -------
    - 
 3:  7c82e91a11 !  3:  d124692e2f connect: have ref processing code take struct packet_reader
    @@ Commit message
         code take pointers to struct reader instead of having to pass multiple
         arguments to each function.
     
    +    Rename the len variable to "linelen" to make it clearer what the
    +    variable does in light of the variable change.
    +
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
      ## connect.c ##
    @@ connect.c: static void annotate_refs_with_symref_info(struct ref *ref)
      }
      
     -static void process_capabilities(const char *line, int *len)
    -+static void process_capabilities(struct packet_reader *reader, int *len)
    ++static void process_capabilities(struct packet_reader *reader, int *linelen)
      {
     +	const char *line = reader->line;
      	int nul_location = strlen(line);
    - 	if (nul_location == *len)
    +-	if (nul_location == *len)
    ++	if (nul_location == *linelen)
      		return;
    -@@ connect.c: static void process_capabilities(const char *line, int *len)
    - 	*len = nul_location;
    + 	server_capabilities_v1 = xstrdup(line + nul_location + 1);
    +-	*len = nul_location;
    ++	*linelen = nul_location;
      }
      
     -static int process_dummy_ref(const char *line)
 4:  a78234de04 <  -:  ---------- wrapper: add function to compare strings with different NUL termination
 -:  ---------- >  4:  cce29662b4 wrapper: add function to compare strings with different NUL termination
 5:  628ecec99a =  5:  3b207e304b remote: advertise the object-format capability on the server side
 6:  9990767072 =  6:  235d7f5b8f connect: add function to parse multiple v1 capability values
 7:  5ce2b7afde =  7:  0324e126b1 connect: add function to fetch value of a v2 server capability
 8:  e5d58b48f3 =  8:  cdba3122ce pkt-line: add a member for hash algorithm
 9:  bce9ba0538 =  9:  c8233c3b42 transport: add a hash algorithm member
10:  2d016e3870 ! 10:  b9273c4021 connect: add function to detect supported v1 hash functions
    @@ connect.c: static const char *parse_feature_value(const char *feature_list, cons
     +int server_supports_hash(const char *desired, int *feature_supported)
     +{
     +	int offset = 0;
    -+	int len, found = 0;
    ++	int len;
     +	const char *hash;
     +
     +	hash = next_server_feature_value("object-format", &len, &offset);
    @@ connect.c: static const char *parse_feature_value(const char *feature_list, cons
     +	}
     +	while (hash) {
     +		if (!xstrncmpz(desired, hash, len))
    -+			found = 1;
    -+
    -+		if (found)
     +			return 1;
    ++
     +		hash = next_server_feature_value("object-format", &len, &offset);
     +	}
     +	return 0;
11:  9fdc67b825 ! 11:  e2d37b75c8 send-pack: detect when the server doesn't support our hash
    @@ Commit message
         send-pack: detect when the server doesn't support our hash
     
         Detect when the server doesn't support our hash algorithm and abort.
    +    If the server does support our hash, advertise it as part of our
    +    capabilities.
     
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
12:  91a1fb0a7d ! 12:  602734cbbb connect: make parse_feature_value extern
    @@ connect.h: struct packet_reader;
      enum protocol_version discover_version(struct packet_reader *reader);
      
      int server_supports_hash(const char *desired, int *feature_supported);
    -+const char *parse_feature_value(const char *, const char *, int *, int *);
    ++const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset);
      int server_supports_v2(const char *c, int die_on_error);
      int server_feature_v2(const char *c, const char **v);
      int server_supports_feature(const char *c, const char *feature,
13:  fd82e5f755 = 13:  d97fa2c8aa fetch-pack: detect when the server doesn't support our hash
14:  b62f751fe4 ! 14:  ba052f1da7 connect: detect algorithm when fetching refs
    @@ Commit message
         If we're fetching refs, detect the hash algorithm and parse the refs
         using that algorithm.
     
    +    As mentioned in the documentation, if multiple versions of the
    +    object-format capability are provided, we use the first.  No known
    +    implementation supports multiple algorithms now, but they may in the
    +    future.
    +
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
      ## connect.c ##
     @@ connect.c: static void annotate_refs_with_symref_info(struct ref *ref)
      
    - static void process_capabilities(struct packet_reader *reader, int *len)
    + static void process_capabilities(struct packet_reader *reader, int *linelen)
      {
     +	const char *feat_val;
     +	int feat_len;
    -+	int hash_algo;
      	const char *line = reader->line;
      	int nul_location = strlen(line);
    - 	if (nul_location == *len)
    + 	if (nul_location == *linelen)
      		return;
      	server_capabilities_v1 = xstrdup(line + nul_location + 1);
    - 	*len = nul_location;
    + 	*linelen = nul_location;
     +
     +	feat_val = server_feature_value("object-format", &feat_len);
     +	if (feat_val) {
     +		char *hash_name = xstrndup(feat_val, feat_len);
    -+		hash_algo = hash_algo_by_name(hash_name);
    ++		int hash_algo = hash_algo_by_name(hash_name);
     +		if (hash_algo != GIT_HASH_UNKNOWN)
     +			reader->hash_algo = &hash_algos[hash_algo];
     +		free(hash_name);
    ++	} else {
    ++		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
     +	}
      }
      
15:  29b4219411 ! 15:  661d94d4de builtin/receive-pack: detect when the server doesn't support our hash
    @@ builtin/receive-pack.c: static struct command *read_head_info(struct packet_read
      		linelen = strlen(reader->line);
      		if (linelen < reader->pktlen) {
      			const char *feature_list = reader->line + linelen + 1;
    -+			const char *hash;
    ++			const char *hash = NULL;
     +			int len = 0;
      			if (parse_feature_request(feature_list, "report-status"))
      				report_status = 1;
16:  f8eb8c96f8 = 16:  fd8b85390c docs: update remote helper docs for object-format extensions
17:  93bf7005a8 = 17:  32285e611f transport-helper: implement object-format extensions
18:  ed75c102a3 = 18:  a33d1ed9a0 remote-curl: implement object-format extensions
19:  bf16872e73 = 19:  fffdf0780d builtin/clone: initialize hash algorithm properly
20:  ce77713343 ! 20:  f616f85b4b t5562: pass object-format in synthesized test data
    @@ t/t5562-http-backend-content-length.sh: test_expect_success 'setup' '
      		printf 0000 &&
      		echo "$hash_next" | git pack-objects --stdout
      	} >push_body &&
    -@@ t/t5562-http-backend-content-length.sh: test_expect_success GZIP 'push plain' '
    - 	test_cmp act.head exp.head
    - '
    - 
    -+test_expect_success GZIP 'push plain with SHA-1' '
    -+	test_when_finished "git branch -D newbranch" &&
    -+	test_http_env receive push_body &&
    -+	verify_http_result "200 OK" &&
    -+	git rev-parse newbranch >act.head &&
    -+	echo "$hash_next" >exp.head &&
    -+	test_cmp act.head exp.head
    -+'
    -+
    - test_expect_success 'push plain truncated' '
    - 	test_http_env receive push_body.trunc &&
    - 	! verify_http_result "200 OK"
21:  e4dd90fa9d <  -:  ---------- t5704: send object-format capability with SHA-256
22:  626d6e9018 ! 21:  eca43da42e fetch-pack: parse and advertise the object-format capability
    @@ fetch-pack.c: static int send_fetch_request(struct fetch_negotiator *negotiator,
     +			die(_("mismatched algorithms: client %s; server %s"),
     +			    the_hash_algo->name, hash_name);
     +		packet_write_fmt(fd_out, "object-format=%s", the_hash_algo->name);
    -+	}
    -+	else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1)
    ++	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
     +		die(_("the server does not support algorithm '%s'"),
     +		    the_hash_algo->name);
    ++	}
     +
      	packet_buf_delim(&req_buf);
      	if (args->use_thin_pack)
23:  8c675b5117 ! 22:  22c1a62e10 setup: set the_repository's hash algo when checking format
    @@ Commit message
         the same time.  This ensures that we perform a suitable initialization
         early enough to avoid confusing any parts of the code.  If we defer
         until later, we can end up with portions of the code which are confused
    -    about the hash algorithm, resulting in segfaults.
    +    about the hash algorithm, resulting in segfaults when working with
    +    SHA-256 repositories.
     
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
24:  b714d4accc = 23:  7c7f2263d5 t3200: mark assertion with SHA1 prerequisite
25:  eacde58fda = 24:  ee8a71a926 packfile: compute and use the index CRC offset
26:  81bb8cdb18 = 25:  6afecf0b09 t5302: modernize test formatting
27:  8623f21715 ! 26:  99a847ba4e builtin/show-index: provide options to determine hash algo
    @@ Metadata
      ## Commit message ##
         builtin/show-index: provide options to determine hash algo
     
    -    It's possible to use a variety of index formats with show-index, and we
    -    need a way to indicate the hash algorithm which is in use for a
    -    particular index we'd like to show.  Default to using the value for the
    -    repository we're in by calling setup_git_directory_gently, and allow
    -    overriding it by using a --hash argument.
    +    show-index is capable of reading any possible index file whether or not
    +    the index is inside a repository.  However, because our index files lack
    +    metadata about the hash algorithm in use, it's not possible to
    +    autodetect the algorithm that a particular index file is using.
    +
    +    In order to allow us to read index files of any algorithm, let's set up
    +    the .git directory gently so that we default to the algorithm for the
    +    current repository, and add an --object-format option to allow users to
    +    override this setting and continue to run show-index outside of a
    +    repository altogether.  Let's also document this new option so that
    +    people can find it and use it.
     
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
    + ## Documentation/git-show-index.txt ##
    +@@ Documentation/git-show-index.txt: git-show-index - Show packed archive index
    + SYNOPSIS
    + --------
    + [verse]
    +-'git show-index'
    ++'git show-index' [--object-format=<hash-algorithm>]
    + 
    + 
    + DESCRIPTION
    +@@ Documentation/git-show-index.txt: Note that you can get more information on a packfile by calling
    + linkgit:git-verify-pack[1]. However, as this command considers only the
    + index file itself, it's both faster and more flexible.
    + 
    ++OPTIONS
    ++-------
    ++
    ++--object-format=<hash-algorithm>::
    ++	Specify the given object format (hash algorithm) for the index file.  The
    ++	valid values are 'sha1' and (if enabled) 'sha256'.  The default is the
    ++	algorithm for the current repository (set by `extensions.objectFormat`), or
    +	'sha1' if no value is set or outside a repository.
    ++
    + GIT
    + ---
    + Part of the linkgit:git[1] suite
    +
      ## builtin/show-index.c ##
     @@
      #include "builtin.h"
    @@ builtin/show-index.c
     -static const char show_index_usage[] =
     -"git show-index";
     +static const char *const show_index_usage[] = {
    -+	"git show-index [--hash=HASH]",
    ++	"git show-index [--object-format=<hash-algorithm>]",
     +	NULL
     +};
      
    @@ builtin/show-index.c: int cmd_show_index(int argc, const char **argv, const char
     +	const char *hash_name = NULL;
     +	int hash_algo;
     +	const struct option show_index_options[] = {
    -+		OPT_STRING(0, "hash", &hash_name, N_("hash"),
    ++		OPT_STRING(0, "object-format", &hash_name, N_("hash-algorithm"),
     +			   N_("specify the hash algorithm to use")),
     +		OPT_END()
     +	};
28:  bb3d2f566a = 27:  9f7c7bafaf t1302: expect repo format version 1 for SHA-256
29:  cc25069cb6 = 28:  d0ea597d63 Documentation/technical: document object-format for protocol v2
30:  efdac6383f ! 29:  51848df542 connect: pass full packet reader when parsing v2 refs
    @@ connect.c: static int process_ref_v2(const char *line, struct ref ***list)
      	/*
      	 * Ref lines have a number of fields which are space deliminated.  The
     @@ connect.c: struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
    - 	}
    - 	packet_flush(fd_out);
      
    -+
      	/* Process response from server */
      	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
     -		if (!process_ref_v2(reader->line, &list))
31:  602405e436 ! 30:  b57361f3b8 connect: parse v2 refs with correct hash algorithm
    @@ Commit message
         set that value in the packet reader.  Parse the refs using this
         algorithm.
     
    -    Note that we use memcpy instead of oidcpy for copying values, since
    -    oidcpy is intentionally limited to the current hash algorithm length,
    -    and the copy will be too short if the server side uses SHA-256 but the
    -    client side has not had a repository set up (and therefore defaults to
    -    SHA-1).
    -
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
      ## connect.c ##
    +@@ connect.c: static int process_ref(const struct packet_reader *reader, int len,
    + 		die(_("protocol error: unexpected capabilities^{}"));
    + 	} else if (check_ref(name, flags)) {
    + 		struct ref *ref = alloc_ref(name);
    +-		memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);
    ++		oidcpy(&ref->old_oid, &old_oid);
    + 		**list = ref;
    + 		*list = &ref->next;
    + 	}
     @@ connect.c: static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
      		goto out;
      	}
32:  96236ac9ae ! 31:  a0c0f0f7a3 serve: advertise object-format capability for protocol v2
    @@ Commit message
     
         In the test, when we're using an algorithm other than SHA-1, we need to
         specify the algorithm in use so we don't get a failure with an "unknown
    -    format" message. Add a wrapper function that specifies this header if
    -    required.  Skip specifying this header for SHA-1 to test that it works
    -    both with and without this header.
    +    format" message.  Add a test that we handle a mismatched algorithm.
    +    Remove the test_oid_init call since it's no longer necessary.
     
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
    + ## connect.c ##
    +@@ connect.c: struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
    + 			die(_("unknown object format '%s' specified by server"), hash_name);
    + 		reader->hash_algo = &hash_algos[hash_algo];
    + 		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
    ++	} else {
    ++		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
    + 	}
    + 
    + 	if (server_options && server_options->nr &&
    +
      ## serve.c ##
     @@ serve.c: static int agent_advertise(struct repository *r,
      	return 1;
    @@ serve.c: static int process_request(void)
     
      ## t/t5701-git-serve.sh ##
     @@ t/t5701-git-serve.sh: test_description='test protocol v2 server commands'
    - 
      . ./test-lib.sh
      
    -+write_command () {
    -+	echo "command=$1"
    -+
    -+	if test "$(test_oid algo)" != sha1
    -+	then
    -+		echo "object-format=$(test_oid algo)"
    -+	fi
    -+}
    -+
      test_expect_success 'test capability advertisement' '
    -+	test_oid_init &&
    ++	test_oid_cache <<-EOF &&
    ++	wrong_algo sha1:sha256
    ++	wrong_algo sha256:sha1
    ++	EOF
      	cat >expect <<-EOF &&
      	version 2
      	agent=git/$(git version | cut -d" " -f3)
    @@ t/t5701-git-serve.sh: test_expect_success 'request invalid capability' '
      	EOF
      	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
     @@ t/t5701-git-serve.sh: test_expect_success 'request with no command' '
    - 
      test_expect_success 'request invalid command' '
      	test-tool pkt-line pack >in <<-EOF &&
    --	command=foo
    -+	$(write_command foo)
    + 	command=foo
    ++	object-format=$(test_oid algo)
      	agent=git/test
      	0000
      	EOF
    -@@ t/t5701-git-serve.sh: test_expect_success 'setup some refs and tags' '
    +@@ t/t5701-git-serve.sh: test_expect_success 'request invalid command' '
    + 	test_i18ngrep "invalid command" err
    + '
      
    ++test_expect_success 'wrong object-format' '
    ++	test-tool pkt-line pack >in <<-EOF &&
    ++	command=fetch
    ++	agent=git/test
    ++	object-format=$(test_oid wrong_algo)
    ++	0000
    ++	EOF
    ++	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
    ++	test_i18ngrep "mismatched object format" err
    ++'
    ++
    + # Test the basics of ls-refs
    + #
    + test_expect_success 'setup some refs and tags' '
    +@@ t/t5701-git-serve.sh: test_expect_success 'setup some refs and tags' '
      test_expect_success 'basics of ls-refs' '
      	test-tool pkt-line pack >in <<-EOF &&
    --	command=ls-refs
    -+	$(write_command ls-refs)
    + 	command=ls-refs
    ++	object-format=$(test_oid algo)
      	0000
      	EOF
      
     @@ t/t5701-git-serve.sh: test_expect_success 'basics of ls-refs' '
    - 
      test_expect_success 'basic ref-prefixes' '
      	test-tool pkt-line pack >in <<-EOF &&
    --	command=ls-refs
    -+	$(write_command ls-refs)
    + 	command=ls-refs
    ++	object-format=$(test_oid algo)
      	0001
      	ref-prefix refs/heads/master
      	ref-prefix refs/tags/one
     @@ t/t5701-git-serve.sh: test_expect_success 'basic ref-prefixes' '
    - 
      test_expect_success 'refs/heads prefix' '
      	test-tool pkt-line pack >in <<-EOF &&
    --	command=ls-refs
    -+	$(write_command ls-refs)
    + 	command=ls-refs
    ++	object-format=$(test_oid algo)
      	0001
      	ref-prefix refs/heads/
      	0000
     @@ t/t5701-git-serve.sh: test_expect_success 'refs/heads prefix' '
    - 
      test_expect_success 'peel parameter' '
      	test-tool pkt-line pack >in <<-EOF &&
    --	command=ls-refs
    -+	$(write_command ls-refs)
    + 	command=ls-refs
    ++	object-format=$(test_oid algo)
      	0001
      	peel
      	ref-prefix refs/tags/
     @@ t/t5701-git-serve.sh: test_expect_success 'peel parameter' '
    - 
      test_expect_success 'symrefs parameter' '
      	test-tool pkt-line pack >in <<-EOF &&
    --	command=ls-refs
    -+	$(write_command ls-refs)
    + 	command=ls-refs
    ++	object-format=$(test_oid algo)
      	0001
      	symrefs
      	ref-prefix refs/heads/
     @@ t/t5701-git-serve.sh: test_expect_success 'symrefs parameter' '
    - 
      test_expect_success 'sending server-options' '
      	test-tool pkt-line pack >in <<-EOF &&
    --	command=ls-refs
    -+	$(write_command ls-refs)
    + 	command=ls-refs
    ++	object-format=$(test_oid algo)
      	server-option=hello
      	server-option=world
      	0001
     @@ t/t5701-git-serve.sh: test_expect_success 'unexpected lines are not allowed in fetch request' '
    - 	git init server &&
      
      	test-tool pkt-line pack >in <<-EOF &&
    --	command=fetch
    -+	$(write_command fetch)
    + 	command=fetch
    ++	object-format=$(test_oid algo)
      	0001
      	this-is-not-a-command
      	0000
33:  57f3bbb709 = 32:  1694f3f838 t5500: make hash independent
34:  8242e65747 ! 33:  902b394667 builtin/ls-remote: initialize repository based on fetch
    @@ Commit message
         the refs to 40 hex characters, since that's the length of the default
         hash algorithm (SHA-1).
     
    +    Note that technically this is not a correct setting of the repository
    +    hash algorithm since, if we are in a repository, it might be one of a
    +    different hash algorithm from the remote side.  However, our current
    +    code paths don't handle multiple algorithms and won't for some time, so
    +    this is the best we can do.  We rely on the fact that ls-remote never
    +    modifies the current repository, which is a reasonable assumption to
    +    make.
    +
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
      ## builtin/ls-remote.c ##
35:  c5664b646f ! 34:  cc12b9b51f remote-curl: detect algorithm for dumb HTTP by size
    @@ Commit message
         provide one. Detect the hash algorithm in use by the size of the first
         object ID.
     
    +    We anonymize the URL like elsewhere in the function in case the user has
    +    decided to include a secret in the URL.
    +
         Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
     
      ## remote-curl.c ##
    @@ remote-curl.c: static struct ref *parse_info_refs(struct discovery *heads)
     +	if (!options.hash_algo)
     +		die("%sinfo/refs not valid: could not determine hash algorithm; "
     +		    "is this a git repository?",
    -+		    url.buf);
    ++		    transport_anonymize_url(url.buf));
     +
      	data = heads->buf;
      	start = NULL;
36:  31cd59a221 <  -:  ---------- builtin/index-pack: add option to specify hash algorithm
 -:  ---------- > 35:  b5425c9f54 builtin/index-pack: add option to specify hash algorithm
37:  658b787e8c = 36:  5c70c24d7a t1050: pass algorithm to index-pack when outside repo
38:  64429337ba = 37:  460d6008e8 remote-curl: avoid truncating refs with ls-remote
39:  cde2128520 = 38:  60a98d9b53 t/helper: initialize the repository for test-sha1-array
40:  0af00a7681 = 39:  b66c3ead37 t5702: offer an object-format capability in the test
41:  74278d4c1c = 40:  af43274a1f t5703: use object-format serve option
 -:  ---------- > 41:  f5085b1f3f t5704: send object-format capability with SHA-256
42:  4f735c8bb5 = 42:  a1b01babda t5300: pass --object-format to git index-pack
43:  3854f70427 = 43:  dbb5f7195e bundle: detect hash algorithm when reading refs
44:  103be1f4d6 = 44:  6c823bbe68 remote-testgit: adapt for object-format
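
As an illustration of the length-based detection that the dumb HTTP
protocol and bundles rely on (they carry no object-format capability, so
the size of the first object ID is the only signal), here is a minimal
sketch.  It is not the code from remote-curl.c or bundle.c.  It assumes
each line starts with a lowercase hex object ID, and the names
sketch_algo, hex_prefix_len, and detect_algo_by_oid_len are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers for this sketch; these are not Git's constants. */
enum sketch_algo { SKETCH_UNKNOWN = 0, SKETCH_SHA1, SKETCH_SHA256 };

/* Length of the leading run of lowercase hex digits in line. */
static size_t hex_prefix_len(const char *line)
{
	return strspn(line, "0123456789abcdef");
}

/* Map the hex length of the first object ID on a line to an algorithm. */
enum sketch_algo detect_algo_by_oid_len(const char *line)
{
	switch (hex_prefix_len(line)) {
	case 40:
		return SKETCH_SHA1;	/* 160-bit digest, 40 hex digits */
	case 64:
		return SKETCH_SHA256;	/* 256-bit digest, 64 hex digits */
	default:
		return SKETCH_UNKNOWN;	/* caller should treat this as an error */
	}
}
```

A caller would pass the first line of info/refs (or of a bundle's ref
list) and give up if the result is SKETCH_UNKNOWN.  As the cover letter
notes, this approach cannot tell two different 256-bit algorithms apart.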


* [PATCH v2 01/44] t1050: match object ID paths in a hash-insensitive way
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 02/44] Documentation: document v1 protocol object-format capability brian m. carlson
                     ` (42 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

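The pattern used here to look for failures is specific to SHA-1, since
it hard-codes the length of a SHA-1 object ID.  Let's create a variable,
OIDPATH_REGEX, that holds a pattern, usable as either a regex or a glob,
which matches a path within the objects directory.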

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1050-large.sh | 2 +-
 t/test-lib.sh    | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 184b479a21..7f88ea07c2 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -64,7 +64,7 @@ test_expect_success 'add a large file or two' '
 	test $count = 1 &&
 	cnt=$(git show-index <"$idx" | wc -l) &&
 	test $cnt = 2 &&
-	for l in .git/objects/??/??????????????????????????????????????
+	for l in .git/objects/$OIDPATH_REGEX
 	do
 		test_path_is_file "$l" || continue
 		bad=t
diff --git a/t/test-lib.sh b/t/test-lib.sh
index d36b6ddc62..5c65c3e26c 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1414,6 +1414,7 @@ test_oid_init
 
 ZERO_OID=$(test_oid zero)
 OID_REGEX=$(echo $ZERO_OID | sed -e 's/0/[0-9a-f]/g')
+OIDPATH_REGEX=$(test_oid_to_path $ZERO_OID | sed -e 's/0/[0-9a-f]/g')
 EMPTY_TREE=$(test_oid empty_tree)
 EMPTY_BLOB=$(test_oid empty_blob)
 _z40=$ZERO_OID


* [PATCH v2 02/44] Documentation: document v1 protocol object-format capability
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 01/44] t1050: match object ID paths in a hash-insensitive way brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 03/44] connect: have ref processing code take struct packet_reader brian m. carlson
                     ` (41 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Document a capability that indicates which hash algorithms are in use by
both sides of a remote connection.  Use the term "object-format", since
this is the term used for the repository extension as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/technical/protocol-capabilities.txt | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 2b267c0da6..36ccd14f97 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -176,6 +176,21 @@ agent strings are purely informative for statistics and debugging
 purposes, and MUST NOT be used to programmatically assume the presence
 or absence of particular features.
 
+object-format
+-------------
+
+This capability, which takes a hash algorithm as an argument, indicates
+that the server supports the given hash algorithm.  It may be sent
+multiple times; if so, the first one given is the one used in the ref
+advertisement.
+
+When provided by the client, this indicates that it intends to use the
+given hash algorithm to communicate.  The algorithm provided must be one
+that the server supports.
+
+If this capability is not provided, it is assumed that the only
+supported algorithm is SHA-1.
+
 symref
 ------
 

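To make the documented semantics concrete, the following sketch checks
whether a desired algorithm appears among the object-format entries of a
v1 capability list, and falls back to SHA-1 when no entry is present.
It is an illustration under stated assumptions, not the series' own
server_supports_hash: the capability list is assumed to be a single
space-separated string, and next_object_format and caps_support_hash are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find the next "object-format=<value>" entry in a space-separated
 * capability list, starting at *offset.  On success, return a pointer to
 * the value, store its length in *len, and advance *offset past it.
 * Return NULL when no further entry exists.
 */
const char *next_object_format(const char *caps, size_t *len, size_t *offset)
{
	static const char key[] = "object-format=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = caps + *offset;

	while (*p) {
		size_t toklen = strcspn(p, " ");

		if (toklen >= keylen && !strncmp(p, key, keylen)) {
			*len = toklen - keylen;
			*offset = (size_t)(p + toklen - caps);
			return p + keylen;
		}
		p += toklen;
		p += strspn(p, " ");	/* skip the separator */
	}
	*offset = (size_t)(p - caps);
	return NULL;
}

/* Return 1 if desired is advertised; with no entries, only "sha1" matches. */
int caps_support_hash(const char *caps, const char *desired)
{
	size_t len = 0, offset = 0;
	const char *val = next_object_format(caps, &len, &offset);

	if (!val)
		return !strcmp(desired, "sha1");
	for (; val; val = next_object_format(caps, &len, &offset))
		/* Compare lengths too, so "sha25" does not match "sha256". */
		if (strlen(desired) == len && !memcmp(desired, val, len))
			return 1;
	return 0;
}
```

The first value returned by next_object_format corresponds to the
algorithm used in the ref advertisement; later values only indicate
additional algorithms the server is able to speak.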

* [PATCH v2 03/44] connect: have ref processing code take struct packet_reader
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 01/44] t1050: match object ID paths in a hash-insensitive way brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 02/44] Documentation: document v1 protocol object-format capability brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 04/44] wrapper: add function to compare strings with different NUL termination brian m. carlson
                     ` (40 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

In a future patch, we'll want to access multiple members from struct
packet_reader when parsing references.  Therefore, have the ref parsing
code take pointers to struct packet_reader instead of having to pass
multiple arguments to each function.

Rename the len parameter of process_capabilities to "linelen" to make it
clearer that it holds the length of the line, now that the line itself
is no longer passed in as an argument.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/connect.c b/connect.c
index 23013c6344..ccc5274189 100644
--- a/connect.c
+++ b/connect.c
@@ -204,17 +204,19 @@ static void annotate_refs_with_symref_info(struct ref *ref)
 	string_list_clear(&symref, 0);
 }
 
-static void process_capabilities(const char *line, int *len)
+static void process_capabilities(struct packet_reader *reader, int *linelen)
 {
+	const char *line = reader->line;
 	int nul_location = strlen(line);
-	if (nul_location == *len)
+	if (nul_location == *linelen)
 		return;
 	server_capabilities_v1 = xstrdup(line + nul_location + 1);
-	*len = nul_location;
+	*linelen = nul_location;
 }
 
-static int process_dummy_ref(const char *line)
+static int process_dummy_ref(const struct packet_reader *reader)
 {
+	const char *line = reader->line;
 	struct object_id oid;
 	const char *name;
 
@@ -234,9 +236,11 @@ static void check_no_capabilities(const char *line, int len)
 			line + strlen(line));
 }
 
-static int process_ref(const char *line, int len, struct ref ***list,
-		       unsigned int flags, struct oid_array *extra_have)
+static int process_ref(const struct packet_reader *reader, int len,
+		       struct ref ***list, unsigned int flags,
+		       struct oid_array *extra_have)
 {
+	const char *line = reader->line;
 	struct object_id old_oid;
 	const char *name;
 
@@ -260,9 +264,10 @@ static int process_ref(const char *line, int len, struct ref ***list,
 	return 1;
 }
 
-static int process_shallow(const char *line, int len,
+static int process_shallow(const struct packet_reader *reader, int len,
 			   struct oid_array *shallow_points)
 {
+	const char *line = reader->line;
 	const char *arg;
 	struct object_id old_oid;
 
@@ -315,20 +320,20 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 
 		switch (state) {
 		case EXPECTING_FIRST_REF:
-			process_capabilities(reader->line, &len);
-			if (process_dummy_ref(reader->line)) {
+			process_capabilities(reader, &len);
+			if (process_dummy_ref(reader)) {
 				state = EXPECTING_SHALLOW;
 				break;
 			}
 			state = EXPECTING_REF;
 			/* fallthrough */
 		case EXPECTING_REF:
-			if (process_ref(reader->line, len, &list, flags, extra_have))
+			if (process_ref(reader, len, &list, flags, extra_have))
 				break;
 			state = EXPECTING_SHALLOW;
 			/* fallthrough */
 		case EXPECTING_SHALLOW:
-			if (process_shallow(reader->line, len, shallow_points))
+			if (process_shallow(reader, len, shallow_points))
 				break;
 			die(_("protocol error: unexpected '%s'"), reader->line);
 		case EXPECTING_DONE:
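
For readers unfamiliar with why process_capabilities compares strlen
against the line length: in the v1 protocol the first ref line carries
the capability list after an embedded NUL.  The split can be sketched in
isolation as below.  This is a simplified stand-in, not the function
from connect.c; it assumes the buffer holds *linelen payload bytes
followed by a terminating NUL, and split_capabilities is a hypothetical
name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split the payload of the first ref line, which is laid out as
 * "<oid> <refname>\0<capabilities>".  Return a pointer to the capability
 * list and shrink *linelen to cover only the ref part, or return NULL and
 * leave *linelen untouched when there is no embedded NUL.
 */
const char *split_capabilities(const char *payload, size_t *linelen)
{
	size_t nul_location = strlen(payload);

	if (nul_location == *linelen)
		return NULL;		/* no embedded NUL, so no capabilities */
	*linelen = nul_location;	/* the ref part ends at the NUL */
	return payload + nul_location + 1;
}
```

Unlike this sketch, the code in the patch stores a copy of the
capability list in server_capabilities_v1 rather than returning a
pointer into the packet buffer.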


* [PATCH v2 04/44] wrapper: add function to compare strings with different NUL termination
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (2 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 03/44] connect: have ref processing code take struct packet_reader brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 05/44] remote: advertise the object-format capability on the server side brian m. carlson
                     ` (39 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When parsing capabilities for the pack protocol, there are times we'll
want to compare the value of a capability to a NUL-terminated string.
Since the data we're reading will be space-terminated, not
NUL-terminated, we need a function that compares the two strings, but
also checks that they're the same length.  Otherwise, if we used strncmp
to compare these strings, we might accidentally accept a parameter that
was a prefix of the expected value.

Add a function, xstrncmpz, that takes a NUL-terminated string and a
non-NUL-terminated string, plus a length, and compares them, ensuring
that they are the same length.
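
As a reader's illustration of the prefix problem, the following minimal
sketch copies the xstrncmpz logic from the diff below and wraps it in a
hypothetical helper, token_equals, which is not part of this patch.  It
assumes the first len bytes of the token contain no NUL, which holds for
space-terminated capability values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same logic as the xstrncmpz added by this patch, copied for illustration. */
static int xstrncmpz(const char *s, const char *t, size_t len)
{
	int res = strncmp(s, t, len);
	if (res)
		return res;
	/* The first len bytes match; s must also end exactly here. */
	return s[len] == '\0' ? 0 : 1;
}

/*
 * Hypothetical helper (not in the patch): does the space-terminated token
 * starting at `token` equal the NUL-terminated string `expected`?
 */
static int token_equals(const char *expected, const char *token)
{
	size_t len;

	assert(expected != NULL && token != NULL);
	len = strcspn(token, " ");
	return !xstrncmpz(expected, token, len);
}
```

With plain strncmp, comparing "sha256" against the 4-byte token "sha2"
returns zero, so the truncated value would be accepted; the extra check
on s[len] rejects it.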

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 git-compat-util.h | 6 ++++++
 wrapper.c         | 8 ++++++++
 2 files changed, 14 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index a73632e8e4..5637114b8d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -868,6 +868,12 @@ char *xgetcwd(void);
 FILE *fopen_for_writing(const char *path);
 FILE *fopen_or_warn(const char *path, const char *mode);
 
+/*
+ * Like strncmp, but only return zero if s is NUL-terminated and exactly len
+ * characters long.  If it is not, consider it greater than t.
+ */
+int xstrncmpz(const char *s, const char *t, size_t len);
+
 /*
  * FREE_AND_NULL(ptr) is like free(ptr) followed by ptr = NULL. Note
  * that ptr is used twice, so don't pass e.g. ptr++.
diff --git a/wrapper.c b/wrapper.c
index 3a1c0e0526..4ff4a9c3db 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -105,6 +105,14 @@ char *xstrndup(const char *str, size_t len)
 	return xmemdupz(str, p ? p - str : len);
 }
 
+int xstrncmpz(const char *s, const char *t, size_t len)
+{
+	int res = strncmp(s, t, len);
+	if (res)
+		return res;
+	return s[len] == '\0' ? 0 : 1;
+}
+
 void *xrealloc(void *ptr, size_t size)
 {
 	void *ret;

* [PATCH v2 05/44] remote: advertise the object-format capability on the server side
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (3 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 04/44] wrapper: add function to compare strings with different NUL termination brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 06/44] connect: add function to parse multiple v1 capability values brian m. carlson
                     ` (38 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Advertise the current hash algorithm in use by using the object-format
capability as part of the ref advertisement.
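
For readers unfamiliar with the wire format, here is a rough sketch of
what the payload of the first advertised ref looks like once the
capability is appended.  The function format_first_ref and its
parameters are made up for illustration; the real code uses
packet_write_fmt and also adds the pkt-line length prefix, which this
sketch leaves out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the payload of a first ref-advertisement line into `buf`:
 *   "<oid> <refname>\0<caps> object-format=<algo>\n"
 * Returns the payload length (including the embedded NUL), or -1 if
 * `buf` is too small.  Illustration only; not a function in git.
 */
static int format_first_ref(char *buf, size_t bufsz, const char *oid_hex,
			    const char *refname, const char *caps,
			    const char *algo)
{
	int n;

	assert(buf && oid_hex && refname && caps && algo);
	n = snprintf(buf, bufsz, "%s %s%c%s object-format=%s\n",
		     oid_hex, refname, 0, caps, algo);
	if (n < 0 || (size_t)n >= bufsz)
		return -1;
	return n;
}
```

The NUL byte separates the ref from the capability list, so a client
that stops reading at the NUL still sees an ordinary ref line.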

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/receive-pack.c | 1 +
 upload-pack.c          | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index ea3d0f01af..4ffa501dce 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -249,6 +249,7 @@ static void show_ref(const char *path, const struct object_id *oid)
 			strbuf_addf(&cap, " push-cert=%s", push_cert_nonce);
 		if (advertise_push_options)
 			strbuf_addstr(&cap, " push-options");
+		strbuf_addf(&cap, " object-format=%s", the_hash_algo->name);
 		strbuf_addf(&cap, " agent=%s", git_user_agent_sanitized());
 		packet_write_fmt(1, "%s %s%c%s\n",
 			     oid_to_hex(oid), path, 0, cap.buf);
diff --git a/upload-pack.c b/upload-pack.c
index 0478bff3e7..636911fec5 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1008,7 +1008,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		struct strbuf symref_info = STRBUF_INIT;
 
 		format_symref_info(&symref_info, cb_data);
-		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s agent=%s\n",
+		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s object-format=%s agent=%s\n",
 			     oid_to_hex(oid), refname_nons,
 			     0, capabilities,
 			     (allow_unadvertised_object_request & ALLOW_TIP_SHA1) ?
@@ -1018,6 +1018,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 			     stateless_rpc ? " no-done" : "",
 			     symref_info.buf,
 			     allow_filter ? " filter" : "",
+			     the_hash_algo->name,
 			     git_user_agent_sanitized());
 		strbuf_release(&symref_info);
 	} else {

* [PATCH v2 06/44] connect: add function to parse multiple v1 capability values
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (4 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 05/44] remote: advertise the object-format capability on the server side brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 07/44] connect: add function to fetch value of a v2 server capability brian m. carlson
                     ` (37 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

In a capability response, we can have multiple symref entries.  In the
future, we will also allow for multiple hash algorithms to be specified.
To avoid duplication, expand the parse_feature_value function to take an
optional offset where the parsing should begin next time.  Add a wrapper
function that allows us to query the next server feature value, and use
it in the existing symref parsing code.
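
The resumable scan can be sketched as follows.  This is a simplified
stand-in rather than the patched parse_feature_value: the names
next_value and count_values are invented here, valueless features are
not handled, and the offset is always measured from the start of the
whole list, which is an assumption about what a caller resuming a scan
needs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of the next "feature=value" entry found
 * at or after *offset, store its length in *lenp, and advance *offset
 * past it.  Returns NULL when no further entry exists.
 */
static const char *next_value(const char *list, const char *feature,
			      int *lenp, int *offset)
{
	size_t flen = strlen(feature);
	const char *p = list + *offset;

	assert(flen > 0);
	while (*p) {
		const char *found = strstr(p, feature);
		if (!found)
			return NULL;
		/* Only match at the start of a whitespace-separated token. */
		if ((found == list || isspace((unsigned char)found[-1])) &&
		    found[flen] == '=') {
			const char *value = found + flen + 1;
			*lenp = (int)strcspn(value, " \t\n");
			*offset = (int)(value + *lenp - list);
			return value;
		}
		p = found + 1;
	}
	return NULL;
}

/* Count how many values the list carries for `feature`. */
static int count_values(const char *list, const char *feature)
{
	int offset = 0, len, n = 0;

	while (next_value(list, feature, &len, &offset))
		n++;
	return n;
}
```

Because the caller owns the offset, the same loop shape works for
symref today and for repeated object-format entries later.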

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/connect.c b/connect.c
index ccc5274189..2b55a32d4d 100644
--- a/connect.c
+++ b/connect.c
@@ -18,7 +18,8 @@
 
 static char *server_capabilities_v1;
 static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
-static const char *parse_feature_value(const char *, const char *, int *);
+static const char *parse_feature_value(const char *, const char *, int *, int *);
+static const char *next_server_feature_value(const char *feature, int *len, int *offset);
 
 static int check_ref(const char *name, unsigned int flags)
 {
@@ -180,17 +181,16 @@ static void parse_one_symref_info(struct string_list *symref, const char *val, i
 static void annotate_refs_with_symref_info(struct ref *ref)
 {
 	struct string_list symref = STRING_LIST_INIT_DUP;
-	const char *feature_list = server_capabilities_v1;
+	int offset = 0;
 
-	while (feature_list) {
+	while (1) {
 		int len;
 		const char *val;
 
-		val = parse_feature_value(feature_list, "symref", &len);
+		val = next_server_feature_value("symref", &len, &offset);
 		if (!val)
 			break;
 		parse_one_symref_info(&symref, val, len);
-		feature_list = val + 1;
 	}
 	string_list_sort(&symref);
 
@@ -452,7 +452,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	return list;
 }
 
-static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp)
+static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
 {
 	int len;
 
@@ -460,6 +460,8 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 		return NULL;
 
 	len = strlen(feature);
+	if (offset)
+		feature_list += *offset;
 	while (*feature_list) {
 		const char *found = strstr(feature_list, feature);
 		if (!found)
@@ -474,9 +476,14 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 			}
 			/* feature with a value (e.g., "agent=git/1.2.3") */
 			else if (*value == '=') {
+				int end;
+
 				value++;
+				end = strcspn(value, " \t\n");
 				if (lenp)
-					*lenp = strcspn(value, " \t\n");
+					*lenp = end;
+				if (offset)
+					*offset = value + end - feature_list;
 				return value;
 			}
 			/*
@@ -491,12 +498,17 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 
 int parse_feature_request(const char *feature_list, const char *feature)
 {
-	return !!parse_feature_value(feature_list, feature, NULL);
+	return !!parse_feature_value(feature_list, feature, NULL, NULL);
+}
+
+static const char *next_server_feature_value(const char *feature, int *len, int *offset)
+{
+	return parse_feature_value(server_capabilities_v1, feature, len, offset);
 }
 
 const char *server_feature_value(const char *feature, int *len)
 {
-	return parse_feature_value(server_capabilities_v1, feature, len);
+	return parse_feature_value(server_capabilities_v1, feature, len, NULL);
 }
 
 int server_supports(const char *feature)

* [PATCH v2 07/44] connect: add function to fetch value of a v2 server capability
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (5 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 06/44] connect: add function to parse multiple v1 capability values brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 08/44] pkt-line: add a member for hash algorithm brian m. carlson
                     ` (36 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

So far in protocol v2, none of the server capabilities that carry a
value have had a value we were interested in parsing.  For example, we
receive but ignore the agent value.

However, in a future commit, we're going to want to parse out the value
of a server capability.  To make this easy, add a function,
server_feature_v2, that can fetch the value provided as part of the
server capability.
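
The lookup amounts to a prefix match followed by a check for '='.  A
self-contained sketch, with the capability array passed in explicitly
instead of read from the file-level argv_array (the name
capability_value and its signature are assumptions made for this
example):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up `name` in an array of "name" or "name=value" capability
 * strings.  Returns 1 and points *value at the text after '=' when the
 * capability is present with a value; returns 0 otherwise.
 */
static int capability_value(const char *const *caps, size_t ncaps,
			    const char *name, const char **value)
{
	size_t nlen = strlen(name);
	size_t i;

	assert(value != NULL);
	for (i = 0; i < ncaps; i++) {
		/* Requiring '=' right after the name rejects longer names. */
		if (!strncmp(caps[i], name, nlen) && caps[i][nlen] == '=') {
			*value = caps[i] + nlen + 1;
			return 1;
		}
	}
	return 0;
}
```

A capability advertised without a value, such as "ls-refs", is reported
as absent by this kind of lookup, so callers that only need to know
whether it exists would still use server_supports_v2.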

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 15 +++++++++++++++
 connect.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/connect.c b/connect.c
index 2b55a32d4d..ad0e4e8e56 100644
--- a/connect.c
+++ b/connect.c
@@ -84,6 +84,21 @@ int server_supports_v2(const char *c, int die_on_error)
 	return 0;
 }
 
+int server_feature_v2(const char *c, const char **v)
+{
+	int i;
+
+	for (i = 0; i < server_capabilities_v2.argc; i++) {
+		const char *out;
+		if (skip_prefix(server_capabilities_v2.argv[i], c, &out) &&
+		    (*out == '=')) {
+			*v = out + 1;
+			return 1;
+		}
+	}
+	return 0;
+}
+
 int server_supports_feature(const char *c, const char *feature,
 			    int die_on_error)
 {
diff --git a/connect.h b/connect.h
index 5f2382e018..4d76a6017d 100644
--- a/connect.h
+++ b/connect.h
@@ -19,6 +19,7 @@ struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
 int server_supports_v2(const char *c, int die_on_error);
+int server_feature_v2(const char *c, const char **v);
 int server_supports_feature(const char *c, const char *feature,
 			    int die_on_error);
 

* [PATCH v2 08/44] pkt-line: add a member for hash algorithm
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (6 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 07/44] connect: add function to fetch value of a v2 server capability brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 09/44] transport: add a hash algorithm member brian m. carlson
                     ` (35 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Add a member for the hash algorithm currently in use to the packet
reader so it can parse references correctly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 pkt-line.c | 1 +
 pkt-line.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index a0e87b1e81..a4aea075de 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -479,6 +479,7 @@ void packet_reader_init(struct packet_reader *reader, int fd,
 	reader->buffer_size = sizeof(packet_buffer);
 	reader->options = options;
 	reader->me = "git";
+	reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
 }
 
 enum packet_read_status packet_reader_read(struct packet_reader *reader)
diff --git a/pkt-line.h b/pkt-line.h
index fef3a0d792..4cd9435e9a 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -166,6 +166,9 @@ struct packet_reader {
 
 	unsigned use_sideband : 1;
 	const char *me;
+
+	/* hash algorithm in use */
+	const struct git_hash_algo *hash_algo;
 };
 
 /*

* [PATCH v2 09/44] transport: add a hash algorithm member
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (7 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 08/44] pkt-line: add a member for hash algorithm brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 10/44] connect: add function to detect supported v1 hash functions brian m. carlson
                     ` (34 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When connecting to a remote system, we need to know what hash algorithm
it will be using to talk to us.  Add a hash_algo member to struct
transport and add a function to read this data from the transport
object.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 transport.c | 8 ++++++++
 transport.h | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/transport.c b/transport.c
index 15f5ba4e8f..b43d985f90 100644
--- a/transport.c
+++ b/transport.c
@@ -311,6 +311,7 @@ static struct ref *handshake(struct transport *transport, int for_push,
 		BUG("unknown protocol version");
 	}
 	data->got_remote_heads = 1;
+	transport->hash_algo = reader.hash_algo;
 
 	if (reader.line_peeked)
 		BUG("buffer must be empty at the end of handshake()");
@@ -996,9 +997,16 @@ struct transport *transport_get(struct remote *remote, const char *url)
 			ret->smart_options->receivepack = remote->receivepack;
 	}
 
+	ret->hash_algo = &hash_algos[GIT_HASH_SHA1];
+
 	return ret;
 }
 
+const struct git_hash_algo *transport_get_hash_algo(struct transport *transport)
+{
+	return transport->hash_algo;
+}
+
 int transport_set_option(struct transport *transport,
 			 const char *name, const char *value)
 {
diff --git a/transport.h b/transport.h
index 4298c855be..2a9f96c05a 100644
--- a/transport.h
+++ b/transport.h
@@ -115,6 +115,8 @@ struct transport {
 	struct git_transport_options *smart_options;
 
 	enum transport_family family;
+
+	const struct git_hash_algo *hash_algo;
 };
 
 #define TRANSPORT_PUSH_ALL			(1<<0)
@@ -243,6 +245,12 @@ int transport_push(struct repository *repo,
 const struct ref *transport_get_remote_refs(struct transport *transport,
 					    const struct argv_array *ref_prefixes);
 
+/*
+ * Fetch the hash algorithm used by a remote.
+ *
+ * This can only be called after fetching the remote refs.
+ */
+const struct git_hash_algo *transport_get_hash_algo(struct transport *transport);
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_pack(struct transport *transport);
 int transport_disconnect(struct transport *transport);

* [PATCH v2 10/44] connect: add function to detect supported v1 hash functions
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (8 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 09/44] transport: add a hash algorithm member brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 11/44] send-pack: detect when the server doesn't support our hash brian m. carlson
                     ` (33 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Add a function, server_supports_hash, to see if the remote server
supports a particular hash algorithm when speaking protocol v1.
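
The decision rule, including the fallback to SHA-1 for servers that
advertise no object-format at all, can be sketched like this.  The
function peer_supports_hash is hypothetical and takes the advertised
values as an already-split array, whereas the real code walks the raw
capability string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide whether a peer supports `desired`, given the object-format
 * values it advertised (nformats may be 0).  A peer that advertises
 * none is treated as SHA-1 only.  If `advertised` is non-NULL, it
 * reports whether any value was advertised.
 */
static int peer_supports_hash(const char *const *formats, size_t nformats,
			      const char *desired, int *advertised)
{
	size_t i;

	assert(desired != NULL);
	if (advertised)
		*advertised = nformats > 0;
	if (!nformats)
		return !strcmp(desired, "sha1");
	for (i = 0; i < nformats; i++)
		if (!strcmp(desired, formats[i]))
			return 1;
	return 0;
}
```

Reporting separately whether the capability was advertised lets a
caller avoid echoing object-format back to a server that predates it.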

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 22 ++++++++++++++++++++++
 connect.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/connect.c b/connect.c
index ad0e4e8e56..eaa13b41bb 100644
--- a/connect.c
+++ b/connect.c
@@ -511,6 +511,28 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 	return NULL;
 }
 
+int server_supports_hash(const char *desired, int *feature_supported)
+{
+	int offset = 0;
+	int len;
+	const char *hash;
+
+	hash = next_server_feature_value("object-format", &len, &offset);
+	if (feature_supported)
+		*feature_supported = !!hash;
+	if (!hash) {
+		hash = hash_algos[GIT_HASH_SHA1].name;
+		len = strlen(hash);
+	}
+	while (hash) {
+		if (!xstrncmpz(desired, hash, len))
+			return 1;
+
+		hash = next_server_feature_value("object-format", &len, &offset);
+	}
+	return 0;
+}
+
 int parse_feature_request(const char *feature_list, const char *feature)
 {
 	return !!parse_feature_value(feature_list, feature, NULL, NULL);
diff --git a/connect.h b/connect.h
index 4d76a6017d..fc75d6a457 100644
--- a/connect.h
+++ b/connect.h
@@ -18,6 +18,7 @@ int url_is_local_not_ssh(const char *url);
 struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
+int server_supports_hash(const char *desired, int *feature_supported);
 int server_supports_v2(const char *c, int die_on_error);
 int server_feature_v2(const char *c, const char **v);
 int server_supports_feature(const char *c, const char *feature,

* [PATCH v2 11/44] send-pack: detect when the server doesn't support our hash
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (9 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 10/44] connect: add function to detect supported v1 hash functions brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 12/44] connect: make parse_feature_value extern brian m. carlson
                     ` (32 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Detect when the server doesn't support our hash algorithm and abort.
If the server does support our hash, advertise it as part of our
capabilities.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 send-pack.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/send-pack.c b/send-pack.c
index 0abee22283..02aefcb08e 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -363,6 +363,7 @@ int send_pack(struct send_pack_args *args,
 	int atomic_supported = 0;
 	int use_push_options = 0;
 	int push_options_supported = 0;
+	int object_format_supported = 0;
 	unsigned cmds_sent = 0;
 	int ret;
 	struct async demux;
@@ -389,6 +390,9 @@ int send_pack(struct send_pack_args *args,
 	if (server_supports("push-options"))
 		push_options_supported = 1;
 
+	if (!server_supports_hash(the_hash_algo->name, &object_format_supported))
+		die(_("the receiving end does not support this repository's hash algorithm"));
+
 	if (args->push_cert != SEND_PACK_PUSH_CERT_NEVER) {
 		int len;
 		push_cert_nonce = server_feature_value("push-cert", &len);
@@ -429,6 +433,8 @@ int send_pack(struct send_pack_args *args,
 		strbuf_addstr(&cap_buf, " atomic");
 	if (use_push_options)
 		strbuf_addstr(&cap_buf, " push-options");
+	if (object_format_supported)
+		strbuf_addf(&cap_buf, " object-format=%s", the_hash_algo->name);
 	if (agent_supported)
 		strbuf_addf(&cap_buf, " agent=%s", git_user_agent_sanitized());
 

* [PATCH v2 12/44] connect: make parse_feature_value extern
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (10 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 11/44] send-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:58   ` [PATCH v2 13/44] fetch-pack: detect when the server doesn't support our hash brian m. carlson
                     ` (31 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

We're going to be using this function in other files, so no longer mark
this function static.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 3 +--
 connect.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/connect.c b/connect.c
index eaa13b41bb..397fad7e32 100644
--- a/connect.c
+++ b/connect.c
@@ -18,7 +18,6 @@
 
 static char *server_capabilities_v1;
 static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
-static const char *parse_feature_value(const char *, const char *, int *, int *);
 static const char *next_server_feature_value(const char *feature, int *len, int *offset);
 
 static int check_ref(const char *name, unsigned int flags)
@@ -467,7 +466,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	return list;
 }
 
-static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
+const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
 {
 	int len;
 
diff --git a/connect.h b/connect.h
index fc75d6a457..ace074dcb6 100644
--- a/connect.h
+++ b/connect.h
@@ -19,6 +19,7 @@ struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
 int server_supports_hash(const char *desired, int *feature_supported);
+const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset);
 int server_supports_v2(const char *c, int die_on_error);
 int server_feature_v2(const char *c, const char **v);
 int server_supports_feature(const char *c, const char *feature,

* [PATCH v2 13/44] fetch-pack: detect when the server doesn't support our hash
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (11 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 12/44] connect: make parse_feature_value extern brian m. carlson
@ 2020-05-25 19:58   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 14/44] connect: detect algorithm when fetching refs brian m. carlson
                     ` (30 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:58 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Detect when the server doesn't support our hash algorithm and abort.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fetch-pack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index 7eaa19d7c1..34c339a5fe 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1040,6 +1040,8 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		print_verbose(args, _("Server supports %s"), "deepen-relative");
 	else if (args->deepen_relative)
 		die(_("Server does not support --deepen"));
+	if (!server_supports_hash(the_hash_algo->name, NULL))
+		die(_("Server does not support this repository's object format"));
 
 	if (!args->no_dependents) {
 		mark_complete_and_common_ref(negotiator, args, &ref);

* [PATCH v2 14/44] connect: detect algorithm when fetching refs
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (12 preceding siblings ...)
  2020-05-25 19:58   ` [PATCH v2 13/44] fetch-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 15/44] builtin/receive-pack: detect when the server doesn't support our hash brian m. carlson
                     ` (29 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

If we're fetching refs, detect the hash algorithm and parse the refs
using that algorithm.

As mentioned in the documentation, if the object-format capability is
provided more than once, we use the first value.  No known
implementation supports multiple algorithms now, but they may in the
future.
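
To see why the algorithm has to be known before the refs are parsed,
consider this sketch of length-aware ref-line validation.  The helpers
hex_len_for and ref_name_in_line are invented for the example; git
itself uses parse_oid_hex_algop with the reader's hash_algo.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hex length of an object ID for the named algorithm, or 0 if unknown. */
static size_t hex_len_for(const char *algo)
{
	if (!strcmp(algo, "sha1"))
		return 40;
	if (!strcmp(algo, "sha256"))
		return 64;
	return 0;
}

/*
 * Check that `line` has the shape "<hex-oid> <refname>" for the given
 * algorithm; on success return a pointer to the ref name, else NULL.
 */
static const char *ref_name_in_line(const char *line, const char *algo)
{
	size_t hexsz = hex_len_for(algo);
	size_t i;

	assert(line != NULL);
	if (!hexsz)
		return NULL;
	/* A NUL is not a hex digit, so short lines stop the loop safely. */
	for (i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)line[i]))
			return NULL;
	if (line[i] != ' ')
		return NULL;
	return line + hexsz + 1;
}
```

A 64-character object ID fails the SHA-1 check at the separator, and a
40-character one fails the SHA-256 check partway through, so parsing
with the wrong algorithm is rejected rather than misread.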

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/connect.c b/connect.c
index 397fad7e32..915f1736a0 100644
--- a/connect.c
+++ b/connect.c
@@ -220,12 +220,25 @@ static void annotate_refs_with_symref_info(struct ref *ref)
 
 static void process_capabilities(struct packet_reader *reader, int *linelen)
 {
+	const char *feat_val;
+	int feat_len;
 	const char *line = reader->line;
 	int nul_location = strlen(line);
 	if (nul_location == *linelen)
 		return;
 	server_capabilities_v1 = xstrdup(line + nul_location + 1);
 	*linelen = nul_location;
+
+	feat_val = server_feature_value("object-format", &feat_len);
+	if (feat_val) {
+		char *hash_name = xstrndup(feat_val, feat_len);
+		int hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo != GIT_HASH_UNKNOWN)
+			reader->hash_algo = &hash_algos[hash_algo];
+		free(hash_name);
+	} else {
+		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
+	}
 }
 
 static int process_dummy_ref(const struct packet_reader *reader)
@@ -234,7 +247,7 @@ static int process_dummy_ref(const struct packet_reader *reader)
 	struct object_id oid;
 	const char *name;
 
-	if (parse_oid_hex(line, &oid, &name))
+	if (parse_oid_hex_algop(line, &oid, &name, reader->hash_algo))
 		return 0;
 	if (*name != ' ')
 		return 0;
@@ -258,7 +271,7 @@ static int process_ref(const struct packet_reader *reader, int len,
 	struct object_id old_oid;
 	const char *name;
 
-	if (parse_oid_hex(line, &old_oid, &name))
+	if (parse_oid_hex_algop(line, &old_oid, &name, reader->hash_algo))
 		return 0;
 	if (*name != ' ')
 		return 0;
@@ -270,7 +283,7 @@ static int process_ref(const struct packet_reader *reader, int len,
 		die(_("protocol error: unexpected capabilities^{}"));
 	} else if (check_ref(name, flags)) {
 		struct ref *ref = alloc_ref(name);
-		oidcpy(&ref->old_oid, &old_oid);
+		memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);
 		**list = ref;
 		*list = &ref->next;
 	}
@@ -288,7 +301,7 @@ static int process_shallow(const struct packet_reader *reader, int len,
 	if (!skip_prefix(line, "shallow ", &arg))
 		return 0;
 
-	if (get_oid_hex(arg, &old_oid))
+	if (get_oid_hex_algop(arg, &old_oid, reader->hash_algo))
 		die(_("protocol error: expected shallow sha-1, got '%s'"), arg);
 	if (!shallow_points)
 		die(_("repository on the other end cannot be shallow"));

* [PATCH v2 15/44] builtin/receive-pack: detect when the server doesn't support our hash
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (13 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 14/44] connect: detect algorithm when fetching refs brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 16/44] docs: update remote helper docs for object-format extensions brian m. carlson
                     ` (28 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

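Detect when the pushing client requests an object format other than the
one this repository uses, and abort.  A client that sends no
object-format capability is assumed to be using SHA-1.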

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/receive-pack.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 4ffa501dce..d43663bb0a 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1625,6 +1625,8 @@ static struct command *read_head_info(struct packet_reader *reader,
 		linelen = strlen(reader->line);
 		if (linelen < reader->pktlen) {
 			const char *feature_list = reader->line + linelen + 1;
+			const char *hash = NULL;
+			int len = 0;
 			if (parse_feature_request(feature_list, "report-status"))
 				report_status = 1;
 			if (parse_feature_request(feature_list, "side-band-64k"))
@@ -1637,6 +1639,13 @@ static struct command *read_head_info(struct packet_reader *reader,
 			if (advertise_push_options
 			    && parse_feature_request(feature_list, "push-options"))
 				use_push_options = 1;
+			hash = parse_feature_value(feature_list, "object-format", &len, NULL);
+			if (!hash) {
+				hash = hash_algos[GIT_HASH_SHA1].name;
+				len = strlen(hash);
+			}
+			if (xstrncmpz(the_hash_algo->name, hash, len))
+				die("error: unsupported object format '%s'", hash);
 		}
 
 		if (!strcmp(reader->line, "push-cert")) {

* [PATCH v2 16/44] docs: update remote helper docs for object-format extensions
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (14 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 15/44] builtin/receive-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 17/44] transport-helper: implement " brian m. carlson
                     ` (27 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Update the remote helper docs to document the object-format extensions
we will implement in remote-curl and the transport helper code shortly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitremote-helpers.txt | 33 +++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index f48a031dc3..26f32e4421 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -238,6 +238,9 @@ the remote repository.
 	`--signed-tags=verbatim` to linkgit:git-fast-export[1].  In the
 	absence of this capability, Git will use `--signed-tags=warn-strip`.
 
+'object-format'::
+	This indicates that the helper is able to interact with the remote
+	side using an explicit hash algorithm extension.
 
 
 COMMANDS
@@ -257,12 +260,14 @@ Support for this command is mandatory.
 'list'::
 	Lists the refs, one per line, in the format "<value> <name>
 	[<attr> ...]". The value may be a hex sha1 hash, "@<dest>" for
-	a symref, or "?" to indicate that the helper could not get the
-	value of the ref. A space-separated list of attributes follows
-	the name; unrecognized attributes are ignored. The list ends
-	with a blank line.
+	a symref, ":<keyword> <value>" for a key-value pair, or
+	"?" to indicate that the helper could not get the value of the
+	ref. A space-separated list of attributes follows the name;
+	unrecognized attributes are ignored. The list ends with a
+	blank line.
 +
 See REF LIST ATTRIBUTES for a list of currently defined attributes.
+See REF LIST KEYWORDS for a list of currently defined keywords.
 +
 Supported if the helper has the "fetch" or "import" capability.
 
@@ -430,6 +435,18 @@ attributes are defined.
 	This ref is unchanged since the last import or fetch, although
 	the helper cannot necessarily determine what value that produced.
 
+REF LIST KEYWORDS
+-----------------
+
+The 'list' command may produce a list of key-value pairs.
+The following keys are defined.
+
+'object-format'::
+	The refs are using the given hash algorithm.  This keyword is only
+	used if the server and client both support the object-format
+	extension.
+
+
 OPTIONS
 -------
 
@@ -514,6 +531,14 @@ set by Git if the remote helper has the 'option' capability.
 	transaction.  If successful, all refs will be updated, or none will.  If the
 	remote side does not support this capability, the push will fail.
 
+'option object-format' {'true'|algorithm}::
+	If 'true', indicate that the caller wants hash algorithm information
+	to be passed back from the remote.  This mode is used when fetching
+	refs.
++
+If set to an algorithm, indicate that the caller wants to interact with
+the remote side using that algorithm.
+
 SEE ALSO
 --------
 linkgit:git-remote[1]
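
As an illustration of the 'list' output format documented in this patch,
the following C sketch classifies one response line by its first
character and extracts the value of a ":<keyword> <value>" pair.  The
names classify_list_line and list_keyword_value are hypothetical; they
are not Git functions, and this is not how transport-helper.c is
structured.

```c
#include <assert.h>
#include <string.h>

/* Kinds of line a helper may print in response to "list". */
enum list_line_kind {
	LIST_END,      /* blank line: end of the list */
	LIST_KEYWORD,  /* ":<keyword> <value>" */
	LIST_SYMREF,   /* "@<dest> <name>" */
	LIST_UNKNOWN,  /* "? <name>": value could not be determined */
	LIST_VALUE     /* "<hex object ID> <name>" */
};

/* Hypothetical helper: classify one line (trailing newline removed). */
enum list_line_kind classify_list_line(const char *line)
{
	switch (line[0]) {
	case '\0': return LIST_END;
	case ':':  return LIST_KEYWORD;
	case '@':  return LIST_SYMREF;
	case '?':  return LIST_UNKNOWN;
	default:   return LIST_VALUE;
	}
}

/*
 * Hypothetical helper: if line is ":<keyword> <value>" for the given
 * keyword, return a pointer to the value; otherwise return NULL.
 */
const char *list_keyword_value(const char *line, const char *keyword)
{
	size_t len = strlen(keyword);

	if (line[0] != ':' || strncmp(line + 1, keyword, len) ||
	    line[1 + len] != ' ')
		return NULL;
	return line + 1 + len + 1;
}
```

A caller would check for LIST_KEYWORD before attempting to parse a ref,
so that keywords it does not recognize can simply be skipped.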


* [PATCH v2 17/44] transport-helper: implement object-format extensions
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (15 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 16/44] docs: update remote helper docs for object-format extensions brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 18/44] remote-curl: " brian m. carlson
                     ` (26 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Implement the object-format extensions that let us determine the hash
algorithm in use when pushing or pulling data.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 transport-helper.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index a46afcb69d..ae33b0eea7 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -32,7 +32,8 @@ struct helper_data {
 		signed_tags : 1,
 		check_connectivity : 1,
 		no_disconnect_req : 1,
-		no_private_update : 1;
+		no_private_update : 1,
+		object_format : 1;
 
 	/*
 	 * As an optimization, the transport code may invoke fetch before
@@ -207,6 +208,8 @@ static struct child_process *get_helper(struct transport *transport)
 			data->import_marks = xstrdup(arg);
 		} else if (starts_with(capname, "no-private-update")) {
 			data->no_private_update = 1;
+		} else if (starts_with(capname, "object-format")) {
+			data->object_format = 1;
 		} else if (mandatory) {
 			die(_("unknown mandatory capability %s; this remote "
 			      "helper probably needs newer version of Git"),
@@ -1103,6 +1106,12 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 	data->get_refs_list_called = 1;
 	helper = get_helper(transport);
 
+	if (data->object_format) {
+		write_str_in_full(helper->in, "option object-format\n");
+		if (recvline(data, &buf) || strcmp(buf.buf, "ok"))
+			exit(128);
+	}
+
 	if (data->push && for_push)
 		write_str_in_full(helper->in, "list for-push\n");
 	else
@@ -1115,6 +1124,17 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 
 		if (!*buf.buf)
 			break;
+		else if (buf.buf[0] == ':') {
+			const char *value;
+			if (skip_prefix(buf.buf, ":object-format ", &value)) {
+				int algo = hash_algo_by_name(value);
+				if (algo == GIT_HASH_UNKNOWN)
+					die(_("unsupported object format '%s'"),
+					    value);
+				transport->hash_algo = &hash_algos[algo];
+			}
+			continue;
+		}
 
 		eov = strchr(buf.buf, ' ');
 		if (!eov)
@@ -1127,7 +1147,7 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 		if (buf.buf[0] == '@')
 			(*tail)->symref = xstrdup(buf.buf + 1);
 		else if (buf.buf[0] != '?')
-			get_oid_hex(buf.buf, &(*tail)->old_oid);
+			get_oid_hex_algop(buf.buf, &(*tail)->old_oid, transport->hash_algo);
 		if (eon) {
 			if (has_attribute(eon + 1, "unchanged")) {
 				(*tail)->status |= REF_STATUS_UPTODATE;
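
The reason the object ID parser now needs the negotiated algorithm is
that the expected hex length depends on it.  The sketch below shows the
idea with a small stand-in table.  struct algo_info, algo_by_name and
is_valid_oid_hex are illustrative names only; Git's real table is
hash_algos[] and its real parser is get_oid_hex_algop(), neither of
which is reproduced here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for a hash algorithm table; not Git's struct. */
struct algo_info {
	const char *name;
	size_t hexsz; /* length of a hex object ID */
};

static const struct algo_info algos[] = {
	{ "sha1", 40 },
	{ "sha256", 64 },
};

/* Return the entry for name, or NULL if the algorithm is not known. */
const struct algo_info *algo_by_name(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(algos) / sizeof(algos[0]); i++)
		if (!strcmp(algos[i].name, name))
			return &algos[i];
	return NULL;
}

/*
 * Return 1 if line starts with a hex object ID of exactly the length
 * required by algo, followed by a space or the end of the string.
 */
int is_valid_oid_hex(const char *line, const struct algo_info *algo)
{
	size_t i;

	for (i = 0; i < algo->hexsz; i++)
		if (!isxdigit((unsigned char)line[i]))
			return 0;
	return line[algo->hexsz] == ' ' || line[algo->hexsz] == '\0';
}
```

With this shape, a 40-digit value is rejected once ":object-format
sha256" has been seen, and a 64-digit value is rejected under SHA-1.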


* [PATCH v2 18/44] remote-curl: implement object-format extensions
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (16 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 17/44] transport-helper: implement " brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 19/44] builtin/clone: initialize hash algorithm properly brian m. carlson
                     ` (25 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Implement the object-format extensions that let us determine the hash
algorithm in use when pushing, pulling, and fetching.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 remote-curl.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 1c9aa3d0ab..3ed0dfec1b 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -41,7 +41,9 @@ struct options {
 		deepen_relative : 1,
 		from_promisor : 1,
 		no_dependents : 1,
-		atomic : 1;
+		atomic : 1,
+		object_format : 1;
+	const struct git_hash_algo *hash_algo;
 };
 static struct options options;
 static struct string_list cas_options = STRING_LIST_INIT_DUP;
@@ -190,6 +192,16 @@ static int set_option(const char *name, const char *value)
 	} else if (!strcmp(name, "filter")) {
 		options.filter = xstrdup(value);
 		return 0;
+	} else if (!strcmp(name, "object-format")) {
+		int algo;
+		options.object_format = 1;
+		if (strcmp(value, "true")) {
+			algo = hash_algo_by_name(value);
+			if (algo == GIT_HASH_UNKNOWN)
+				die("unknown object format '%s'", value);
+			options.hash_algo = &hash_algos[algo];
+		}
+		return 0;
 	} else {
 		return 1 /* unsupported */;
 	}
@@ -231,6 +243,7 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	case protocol_v0:
 		get_remote_heads(&reader, &list, for_push ? REF_NORMAL : 0,
 				 NULL, &heads->shallow);
+		options.hash_algo = reader.hash_algo;
 		break;
 	case protocol_unknown_version:
 		BUG("unknown protocol version");
@@ -509,6 +522,9 @@ static struct ref *get_refs(int for_push)
 static void output_refs(struct ref *refs)
 {
 	struct ref *posn;
+	if (options.object_format && options.hash_algo) {
+		printf(":object-format %s\n", options.hash_algo->name);
+	}
 	for (posn = refs; posn; posn = posn->next) {
 		if (posn->symref)
 			printf("@%s %s\n", posn->symref, posn->name);
@@ -1439,6 +1455,7 @@ int cmd_main(int argc, const char **argv)
 			printf("option\n");
 			printf("push\n");
 			printf("check-connectivity\n");
+			printf("object-format\n");
 			printf("\n");
 			fflush(stdout);
 		} else if (skip_prefix(buf.buf, "stateless-connect ", &arg)) {


* [PATCH v2 19/44] builtin/clone: initialize hash algorithm properly
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (17 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 18/44] remote-curl: " brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 20/44] t5562: pass object-format in synthesized test data brian m. carlson
                     ` (24 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When performing a clone, we don't know what hash algorithm the other end
will support.  Currently, we don't support fetching data belonging to a
different algorithm, so we must know what algorithm the remote side is
using in order to properly initialize the repository.  We can know that
only after fetching the refs, so if the remote side has any references,
use that information to reinitialize the repository with the correct
hash algorithm information.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/clone.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index cb48a291ca..f27d38bc8e 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1217,6 +1217,15 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	refs = transport_get_remote_refs(transport, &ref_prefixes);
 
 	if (refs) {
+		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
+
+		/*
+		 * Now that we know what algorithm the remote side is using,
+		 * let's set ours to the same thing.
+		 */
+		initialize_repository_version(hash_algo);
+		repo_set_hash_algo(the_repository, hash_algo);
+
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
 		/*
 		 * transport_get_remote_refs() may return refs with null sha-1


* [PATCH v2 20/44] t5562: pass object-format in synthesized test data
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (18 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 19/44] builtin/clone: initialize hash algorithm properly brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 21/44] fetch-pack: parse and advertise the object-format capability brian m. carlson
                     ` (23 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Ensure that we pass the object-format capability in the synthesized test
data so that this test works with algorithms other than SHA-1.

In addition, add a test using the old data for when we're
using SHA-1 so that we can be sure that we preserve backwards
compatibility with servers not offering the object-format capability.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5562-http-backend-content-length.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 3f4ac71f83..c6ec625497 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -46,6 +46,7 @@ ssize_b100dots() {
 }
 
 test_expect_success 'setup' '
+	test_oid_init &&
 	HTTP_CONTENT_ENCODING="identity" &&
 	export HTTP_CONTENT_ENCODING &&
 	git config http.receivepack true &&
@@ -62,8 +63,8 @@ test_expect_success 'setup' '
 	test_copy_bytes 10 <fetch_body >fetch_body.trunc &&
 	hash_next=$(git commit-tree -p HEAD -m next HEAD^{tree}) &&
 	{
-		printf "%s %s refs/heads/newbranch\\0report-status\\n" \
-			"$ZERO_OID" "$hash_next" | packetize &&
+		printf "%s %s refs/heads/newbranch\\0report-status object-format=%s\\n" \
+			"$ZERO_OID" "$hash_next" "$(test_oid algo)" | packetize &&
 		printf 0000 &&
 		echo "$hash_next" | git pack-objects --stdout
 	} >push_body &&
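
The test above pipes the command line through a packetize helper so that
it is framed as a pkt-line: four hex digits giving the total length
(payload plus the four-byte header), followed by the payload.  A minimal
C sketch of that framing follows.  The function name pkt_line_encode and
the PKT_LINE_MAX macro are made up for this example and are not Git's
API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Largest total pkt-line length: 65516 bytes of data plus the header. */
#define PKT_LINE_MAX 65520

/*
 * Write payload as one pkt-line into out.  The length is passed
 * explicitly because the payload may contain a NUL byte, as the
 * capability separator in the test above does.
 * Returns the number of bytes written, or -1 if it does not fit.
 */
int pkt_line_encode(char *out, size_t outsz, const char *payload, size_t len)
{
	char header[5];
	size_t total = len + 4;

	if (total > PKT_LINE_MAX || total > outsz)
		return -1;
	snprintf(header, sizeof(header), "%04zx", total);
	memcpy(out, header, 4);
	memcpy(out + 4, payload, len);
	return (int)total;
}
```

For the push command in the test, the payload would be the two object
IDs and the ref name, a NUL byte, and then "report-status
object-format=<algo>" with a trailing newline.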


* [PATCH v2 21/44] fetch-pack: parse and advertise the object-format capability
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (19 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 20/44] t5562: pass object-format in synthesized test data brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 22/44] setup: set the_repository's hash algo when checking format brian m. carlson
                     ` (22 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Parse the server's object-format capability and respond accordingly,
dying if there is a mismatch.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fetch-pack.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index 34c339a5fe..ecfb8c3cf5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1180,6 +1180,7 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
 			      int sideband_all, int seen_ack)
 {
 	int ret = 0;
+	const char *hash_name;
 	struct strbuf req_buf = STRBUF_INIT;
 
 	if (server_supports_v2("fetch", 1))
@@ -1194,6 +1195,17 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
 					 args->server_options->items[i].string);
 	}
 
+	if (server_feature_v2("object-format", &hash_name)) {
+		int hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo_by_ptr(the_hash_algo) != hash_algo)
+			die(_("mismatched algorithms: client %s; server %s"),
+			    the_hash_algo->name, hash_name);
+		packet_write_fmt(fd_out, "object-format=%s", the_hash_algo->name);
+	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
+		die(_("the server does not support algorithm '%s'"),
+		    the_hash_algo->name);
+	}
+
 	packet_buf_delim(&req_buf);
 	if (args->use_thin_pack)
 		packet_buf_write(&req_buf, "thin-pack");
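
The decision made in the hunk above has four outcomes, which the
following C sketch spells out.  It compares algorithm names directly,
whereas the patch compares algorithm identifiers obtained from
hash_algo_by_name(); the enum and the function negotiate_object_format
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum negotiate_result {
	NEGOTIATE_OK_SEND,     /* server advertised our algorithm: send object-format=<name> */
	NEGOTIATE_OK_IMPLICIT, /* nothing advertised and we use SHA-1: send nothing */
	NEGOTIATE_MISMATCH,    /* server advertised a different algorithm */
	NEGOTIATE_UNSUPPORTED  /* nothing advertised but we are not using SHA-1 */
};

/*
 * client_algo is the name of the local algorithm; server_algo is the
 * value of the server's object-format capability, or NULL if the
 * server did not advertise one.
 */
enum negotiate_result negotiate_object_format(const char *client_algo,
					      const char *server_algo)
{
	if (server_algo)
		return strcmp(client_algo, server_algo) ?
			NEGOTIATE_MISMATCH : NEGOTIATE_OK_SEND;
	return strcmp(client_algo, "sha1") ?
		NEGOTIATE_UNSUPPORTED : NEGOTIATE_OK_IMPLICIT;
}
```

The two failure cases correspond to the two die() calls in the patch;
the implicit case is what keeps older SHA-1 servers working.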


* [PATCH v2 22/44] setup: set the_repository's hash algo when checking format
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (20 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 21/44] fetch-pack: parse and advertise the object-format capability brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 23/44] t3200: mark assertion with SHA1 prerequisite brian m. carlson
                     ` (21 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When we're checking the repository's format, set the hash algorithm at
the same time.  This ensures that we perform a suitable initialization
early enough to avoid confusing any parts of the code.  If we defer
until later, we can end up with portions of the code which are confused
about the hash algorithm, resulting in segfaults when working with
SHA-256 repositories.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/setup.c b/setup.c
index 65fe5ecefb..019a1c6367 100644
--- a/setup.c
+++ b/setup.c
@@ -1273,6 +1273,7 @@ void check_repository_format(struct repository_format *fmt)
 		fmt = &repo_fmt;
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
+	repo_set_hash_algo(the_repository, fmt->hash_algo);
 	clear_repository_format(&repo_fmt);
 }
 


* [PATCH v2 23/44] t3200: mark assertion with SHA1 prerequisite
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (21 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 22/44] setup: set the_repository's hash algo when checking format brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 24/44] packfile: compute and use the index CRC offset brian m. carlson
                     ` (20 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

One of the test assertions in this test checks that git branch -m works
even without a .git/config file.  However, if the repository requires
configuration extensions (for example, because it uses a non-SHA-1
algorithm), this assertion will fail.  Mark the assertion as requiring
SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t3200-branch.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 411a70b0ce..2a3fedc6b0 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -402,7 +402,7 @@ EOF
 
 mv .git/config .git/config-saved
 
-test_expect_success 'git branch -m q q2 without config should succeed' '
+test_expect_success SHA1 'git branch -m q q2 without config should succeed' '
 	git branch -m q q2 &&
 	git branch -m q2 q
 '


* [PATCH v2 24/44] packfile: compute and use the index CRC offset
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (22 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 23/44] t3200: mark assertion with SHA1 prerequisite brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 25/44] t5302: modernize test formatting brian m. carlson
                     ` (19 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Both v2 pack index files and the v3 format specified as part of the
NewHash work have similar data starting at the CRC table.  Much of the
existing code wants to read either this table or the offset entries
following it, and in doing so computes the offset each time.

To share as much code as possible between v2 and v3, compute the offset
of the CRC table and store it when the pack is opened.  Use this value
to compute offsets not only to the CRC table but also to the offset
entries beyond it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/index-pack.c | 6 +-----
 object-store.h       | 1 +
 packfile.c           | 1 +
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f176dd28c8..7bea1fba52 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1555,13 +1555,9 @@ static void read_v2_anomalous_offsets(struct packed_git *p,
 {
 	const uint32_t *idx1, *idx2;
 	uint32_t i;
-	const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t);
 
 	/* The address of the 4-byte offset table */
-	idx1 = (((const uint32_t *)p->index_data)
-		+ 2 /* 8-byte header */
-		+ 256 /* fan out */
-		+ hashwords * p->num_objects /* object ID table */
+	idx1 = (((const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset))
 		+ p->num_objects /* CRC32 table */
 		);
 
diff --git a/object-store.h b/object-store.h
index d1e490f203..f439d47af8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -70,6 +70,7 @@ struct packed_git {
 	size_t index_size;
 	uint32_t num_objects;
 	uint32_t num_bad_objects;
+	uint32_t crc_offset;
 	unsigned char *bad_object_sha1;
 	int index_version;
 	time_t mtime;
diff --git a/packfile.c b/packfile.c
index f4e752996d..6ab5233613 100644
--- a/packfile.c
+++ b/packfile.c
@@ -178,6 +178,7 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 		     */
 		    (sizeof(off_t) <= 4))
 			return error("pack too large for current definition of off_t in %s", path);
+		p->crc_offset = 8 + 4 * 256 + nr * hashsz;
 	}
 
 	p->index_version = version;
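
The arithmetic behind crc_offset can be checked in isolation.  The
sketch below recomputes the byte offset of the CRC32 table and of the
4-byte offset table that follows it in a v2 index, using the same terms
as the patch.  The function names are invented for this example, and
the sketch uses 64-bit arithmetic only to keep the illustration free of
overflow concerns; the patch itself stores the value in a uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the CRC32 table in a v2 pack index. */
uint64_t idx_v2_crc_offset(uint32_t nr, uint32_t hashsz)
{
	return 8                        /* 8-byte header */
	     + 4 * 256                  /* fan-out table */
	     + (uint64_t)nr * hashsz;   /* object ID table */
}

/* Byte offset of the 4-byte offset table, right after the CRC32 table. */
uint64_t idx_v2_offset_table(uint32_t nr, uint32_t hashsz)
{
	return idx_v2_crc_offset(nr, hashsz)
	     + (uint64_t)nr * 4;        /* one CRC32 per object */
}
```

For 100 objects this gives a CRC table at byte 3032 with SHA-1 (20-byte
hashes) and at byte 4232 with SHA-256 (32-byte hashes).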


* [PATCH v2 25/44] t5302: modernize test formatting
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (23 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 24/44] packfile: compute and use the index CRC offset brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 26/44] builtin/show-index: provide options to determine hash algo brian m. carlson
                     ` (18 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Our style these days is to place the description and the opening quote
of the body on the same line as test_expect_success (if it fits), to
place the trailing quote on a line by itself after the body, and to use
tabs.  Since we're going to be making several significant changes to
this test, modernize the style to aid in readability of the subsequent
patches.

This patch should have no functional change.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5302-pack-index.sh | 360 +++++++++++++++++++++---------------------
 1 file changed, 184 insertions(+), 176 deletions(-)

diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index ad07f2f7fc..8981c9b90e 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -7,65 +7,65 @@ test_description='pack index with 64-bit offsets and object CRC'
 . ./test-lib.sh
 
 test_expect_success 'setup' '
-     test_oid_init &&
-     rawsz=$(test_oid rawsz) &&
-     rm -rf .git &&
-     git init &&
-     git config pack.threads 1 &&
-     i=1 &&
-     while test $i -le 100
-     do
-         iii=$(printf '%03i' $i)
-	 test-tool genrandom "bar" 200 > wide_delta_$iii &&
-	 test-tool genrandom "baz $iii" 50 >> wide_delta_$iii &&
-	 test-tool genrandom "foo"$i 100 > deep_delta_$iii &&
-	 test-tool genrandom "foo"$(expr $i + 1) 100 >> deep_delta_$iii &&
-	 test-tool genrandom "foo"$(expr $i + 2) 100 >> deep_delta_$iii &&
-         echo $iii >file_$iii &&
-	 test-tool genrandom "$iii" 8192 >>file_$iii &&
-         git update-index --add file_$iii deep_delta_$iii wide_delta_$iii &&
-         i=$(expr $i + 1) || return 1
-     done &&
-     { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
-     git update-index --add file_101 &&
-     tree=$(git write-tree) &&
-     commit=$(git commit-tree $tree </dev/null) && {
-	 echo $tree &&
-	 git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\)	.*/\\1/"
-     } >obj-list &&
-     git update-ref HEAD $commit
+	test_oid_init &&
+	rawsz=$(test_oid rawsz) &&
+	rm -rf .git &&
+	git init &&
+	git config pack.threads 1 &&
+	i=1 &&
+	while test $i -le 100
+	do
+		iii=$(printf '%03i' $i)
+		test-tool genrandom "bar" 200 > wide_delta_$iii &&
+		test-tool genrandom "baz $iii" 50 >> wide_delta_$iii &&
+		test-tool genrandom "foo"$i 100 > deep_delta_$iii &&
+		test-tool genrandom "foo"$(expr $i + 1) 100 >> deep_delta_$iii &&
+		test-tool genrandom "foo"$(expr $i + 2) 100 >> deep_delta_$iii &&
+		echo $iii >file_$iii &&
+		test-tool genrandom "$iii" 8192 >>file_$iii &&
+		git update-index --add file_$iii deep_delta_$iii wide_delta_$iii &&
+		i=$(expr $i + 1) || return 1
+	done &&
+	{ echo 101 && test-tool genrandom 100 8192; } >file_101 &&
+	git update-index --add file_101 &&
+	tree=$(git write-tree) &&
+	commit=$(git commit-tree $tree </dev/null) && {
+		echo $tree &&
+		git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\)	.*/\\1/"
+	} >obj-list &&
+	git update-ref HEAD $commit
 '
 
-test_expect_success \
-    'pack-objects with index version 1' \
-    'pack1=$(git pack-objects --index-version=1 test-1 <obj-list) &&
-     git verify-pack -v "test-1-${pack1}.pack"'
+test_expect_success 'pack-objects with index version 1' '
+	pack1=$(git pack-objects --index-version=1 test-1 <obj-list) &&
+	git verify-pack -v "test-1-${pack1}.pack"
+'
 
-test_expect_success \
-    'pack-objects with index version 2' \
-    'pack2=$(git pack-objects --index-version=2 test-2 <obj-list) &&
-     git verify-pack -v "test-2-${pack2}.pack"'
+test_expect_success 'pack-objects with index version 2' '
+	pack2=$(git pack-objects --index-version=2 test-2 <obj-list) &&
+	git verify-pack -v "test-2-${pack2}.pack"
+'
 
-test_expect_success \
-    'both packs should be identical' \
-    'cmp "test-1-${pack1}.pack" "test-2-${pack2}.pack"'
+test_expect_success 'both packs should be identical' '
+	cmp "test-1-${pack1}.pack" "test-2-${pack2}.pack"
+'
 
-test_expect_success \
-    'index v1 and index v2 should be different' \
-    '! cmp "test-1-${pack1}.idx" "test-2-${pack2}.idx"'
+test_expect_success 'index v1 and index v2 should be different' '
+	! cmp "test-1-${pack1}.idx" "test-2-${pack2}.idx"
+'
 
-test_expect_success \
-    'index-pack with index version 1' \
-    'git index-pack --index-version=1 -o 1.idx "test-1-${pack1}.pack"'
+test_expect_success 'index-pack with index version 1' '
+	git index-pack --index-version=1 -o 1.idx "test-1-${pack1}.pack"
+'
 
-test_expect_success \
-    'index-pack with index version 2' \
-    'git index-pack --index-version=2 -o 2.idx "test-1-${pack1}.pack"'
+test_expect_success 'index-pack with index version 2' '
+	git index-pack --index-version=2 -o 2.idx "test-1-${pack1}.pack"
+'
 
-test_expect_success \
-    'index-pack results should match pack-objects ones' \
-    'cmp "test-1-${pack1}.idx" "1.idx" &&
-     cmp "test-2-${pack2}.idx" "2.idx"'
+test_expect_success 'index-pack results should match pack-objects ones' '
+	cmp "test-1-${pack1}.idx" "1.idx" &&
+	cmp "test-2-${pack2}.idx" "2.idx"
+'
 
 test_expect_success 'index-pack --verify on index version 1' '
 	git index-pack --verify "test-1-${pack1}.pack"
@@ -75,13 +75,13 @@ test_expect_success 'index-pack --verify on index version 2' '
 	git index-pack --verify "test-2-${pack2}.pack"
 '
 
-test_expect_success \
-    'pack-objects --index-version=2, is not accepted' \
-    'test_must_fail git pack-objects --index-version=2, test-3 <obj-list'
+test_expect_success 'pack-objects --index-version=2, is not accepted' '
+	test_must_fail git pack-objects --index-version=2, test-3 <obj-list
+'
 
-test_expect_success \
-    'index v2: force some 64-bit offsets with pack-objects' \
-    'pack3=$(git pack-objects --index-version=2,0x40000 test-3 <obj-list)'
+test_expect_success 'index v2: force some 64-bit offsets with pack-objects' '
+	pack3=$(git pack-objects --index-version=2,0x40000 test-3 <obj-list)
+'
 
 if msg=$(git verify-pack -v "test-3-${pack3}.pack" 2>&1) ||
 	! (echo "$msg" | grep "pack too large .* off_t")
@@ -91,21 +91,21 @@ else
 	say "# skipping tests concerning 64-bit offsets"
 fi
 
-test_expect_success OFF64_T \
-    'index v2: verify a pack with some 64-bit offsets' \
-    'git verify-pack -v "test-3-${pack3}.pack"'
+test_expect_success OFF64_T 'index v2: verify a pack with some 64-bit offsets' '
+	git verify-pack -v "test-3-${pack3}.pack"
+'
 
-test_expect_success OFF64_T \
-    '64-bit offsets: should be different from previous index v2 results' \
-    '! cmp "test-2-${pack2}.idx" "test-3-${pack3}.idx"'
+test_expect_success OFF64_T '64-bit offsets: should be different from previous index v2 results' '
+	! cmp "test-2-${pack2}.idx" "test-3-${pack3}.idx"
+'
 
-test_expect_success OFF64_T \
-    'index v2: force some 64-bit offsets with index-pack' \
-    'git index-pack --index-version=2,0x40000 -o 3.idx "test-1-${pack1}.pack"'
+test_expect_success OFF64_T 'index v2: force some 64-bit offsets with index-pack' '
+	git index-pack --index-version=2,0x40000 -o 3.idx "test-1-${pack1}.pack"
+'
 
-test_expect_success OFF64_T \
-    '64-bit offsets: index-pack result should match pack-objects one' \
-    'cmp "test-3-${pack3}.idx" "3.idx"'
+test_expect_success OFF64_T '64-bit offsets: index-pack result should match pack-objects one' '
+	cmp "test-3-${pack3}.idx" "3.idx"
+'
 
 test_expect_success OFF64_T 'index-pack --verify on 64-bit offset v2 (cheat)' '
 	# This cheats by knowing which lower offset should still be encoded
@@ -120,135 +120,143 @@ test_expect_success OFF64_T 'index-pack --verify on 64-bit offset v2' '
 # returns the object number for given object in given pack index
 index_obj_nr()
 {
-    idx_file=$1
-    object_sha1=$2
-    nr=0
-    git show-index < $idx_file |
-    while read offs sha1 extra
-    do
-      nr=$(($nr + 1))
-      test "$sha1" = "$object_sha1" || continue
-      echo "$(($nr - 1))"
-      break
-    done
+	idx_file=$1
+	object_sha1=$2
+	nr=0
+	git show-index < $idx_file |
+	while read offs sha1 extra
+	do
+	  nr=$(($nr + 1))
+	  test "$sha1" = "$object_sha1" || continue
+	  echo "$(($nr - 1))"
+	  break
+	done
 }
 
 # returns the pack offset for given object as found in given pack index
 index_obj_offset()
 {
-    idx_file=$1
-    object_sha1=$2
-    git show-index < $idx_file | grep $object_sha1 |
-    ( read offs extra && echo "$offs" )
+	idx_file=$1
+	object_sha1=$2
+	git show-index < $idx_file | grep $object_sha1 |
+	( read offs extra && echo "$offs" )
 }
 
-test_expect_success \
-    '[index v1] 1) stream pack to repository' \
-    'git index-pack --index-version=1 --stdin < "test-1-${pack1}.pack" &&
-     git prune-packed &&
-     git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
-     cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
-     cmp "test-1-${pack1}.idx"  ".git/objects/pack/pack-${pack1}.idx"'
+test_expect_success '[index v1] 1) stream pack to repository' '
+	git index-pack --index-version=1 --stdin < "test-1-${pack1}.pack" &&
+	git prune-packed &&
+	git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
+	cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
+	cmp "test-1-${pack1}.idx"	".git/objects/pack/pack-${pack1}.idx"
+'
 
 test_expect_success \
-    '[index v1] 2) create a stealth corruption in a delta base reference' \
-    '# This test assumes file_101 is a delta smaller than 16 bytes.
-     # It should be against file_100 but we substitute its base for file_099
-     sha1_101=$(git hash-object file_101) &&
-     sha1_099=$(git hash-object file_099) &&
-     offs_101=$(index_obj_offset 1.idx $sha1_101) &&
-     nr_099=$(index_obj_nr 1.idx $sha1_099) &&
-     chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
-     recordsz=$((rawsz + 4)) &&
-     dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
-        if=".git/objects/pack/pack-${pack1}.idx" \
-        skip=$((4 + 256 * 4 + $nr_099 * recordsz)) \
-        bs=1 count=$rawsz conv=notrunc &&
-     git cat-file blob $sha1_101 > file_101_foo1'
+	'[index v1] 2) create a stealth corruption in a delta base reference' '
+	# This test assumes file_101 is a delta smaller than 16 bytes.
+	# It should be against file_100 but we substitute its base for file_099
+	sha1_101=$(git hash-object file_101) &&
+	sha1_099=$(git hash-object file_099) &&
+	offs_101=$(index_obj_offset 1.idx $sha1_101) &&
+	nr_099=$(index_obj_nr 1.idx $sha1_099) &&
+	chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
+	recordsz=$((rawsz + 4)) &&
+	dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
+	       if=".git/objects/pack/pack-${pack1}.idx" \
+	       skip=$((4 + 256 * 4 + $nr_099 * recordsz)) \
+	       bs=1 count=$rawsz conv=notrunc &&
+	git cat-file blob $sha1_101 > file_101_foo1
+'
 
 test_expect_success \
-    '[index v1] 3) corrupted delta happily returned wrong data' \
-    'test -f file_101_foo1 && ! cmp file_101 file_101_foo1'
+	'[index v1] 3) corrupted delta happily returned wrong data' '
+	test -f file_101_foo1 && ! cmp file_101 file_101_foo1
+'
 
 test_expect_success \
-    '[index v1] 4) confirm that the pack is actually corrupted' \
-    'test_must_fail git fsck --full $commit'
+	'[index v1] 4) confirm that the pack is actually corrupted' '
+	test_must_fail git fsck --full $commit
+'
 
 test_expect_success \
-    '[index v1] 5) pack-objects happily reuses corrupted data' \
-    'pack4=$(git pack-objects test-4 <obj-list) &&
-     test -f "test-4-${pack4}.pack"'
+	'[index v1] 5) pack-objects happily reuses corrupted data' '
+	pack4=$(git pack-objects test-4 <obj-list) &&
+	test -f "test-4-${pack4}.pack"
+'
+
+test_expect_success '[index v1] 6) newly created pack is BAD !' '
+	test_must_fail git verify-pack -v "test-4-${pack4}.pack"
+'
+
+test_expect_success '[index v2] 1) stream pack to repository' '
+	rm -f .git/objects/pack/* &&
+	git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
+	git prune-packed &&
+	git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
+	cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
+	cmp "test-2-${pack1}.idx"	".git/objects/pack/pack-${pack1}.idx"
+'
 
 test_expect_success \
-    '[index v1] 6) newly created pack is BAD !' \
-    'test_must_fail git verify-pack -v "test-4-${pack4}.pack"'
+	'[index v2] 2) create a stealth corruption in a delta base reference' '
+	# This test assumes file_101 is a delta smaller than 16 bytes.
+	# It should be against file_100 but we substitute its base for file_099
+	sha1_101=$(git hash-object file_101) &&
+	sha1_099=$(git hash-object file_099) &&
+	offs_101=$(index_obj_offset 1.idx $sha1_101) &&
+	nr_099=$(index_obj_nr 1.idx $sha1_099) &&
+	chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
+	dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
+		if=".git/objects/pack/pack-${pack1}.idx" \
+		skip=$((8 + 256 * 4 + $nr_099 * rawsz)) \
+		bs=1 count=$rawsz conv=notrunc &&
+	git cat-file blob $sha1_101 > file_101_foo2
+'
 
 test_expect_success \
-    '[index v2] 1) stream pack to repository' \
-    'rm -f .git/objects/pack/* &&
-     git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
-     git prune-packed &&
-     git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
-     cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
-     cmp "test-2-${pack1}.idx"  ".git/objects/pack/pack-${pack1}.idx"'
+	'[index v2] 3) corrupted delta happily returned wrong data' '
+	test -f file_101_foo2 && ! cmp file_101 file_101_foo2
+'
 
 test_expect_success \
-    '[index v2] 2) create a stealth corruption in a delta base reference' \
-    '# This test assumes file_101 is a delta smaller than 16 bytes.
-     # It should be against file_100 but we substitute its base for file_099
-     sha1_101=$(git hash-object file_101) &&
-     sha1_099=$(git hash-object file_099) &&
-     offs_101=$(index_obj_offset 1.idx $sha1_101) &&
-     nr_099=$(index_obj_nr 1.idx $sha1_099) &&
-     chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
-     dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
-        if=".git/objects/pack/pack-${pack1}.idx" \
-        skip=$((8 + 256 * 4 + $nr_099 * rawsz)) \
-        bs=1 count=$rawsz conv=notrunc &&
-     git cat-file blob $sha1_101 > file_101_foo2'
+	'[index v2] 4) confirm that the pack is actually corrupted' '
+	test_must_fail git fsck --full $commit
+'
 
 test_expect_success \
-    '[index v2] 3) corrupted delta happily returned wrong data' \
-    'test -f file_101_foo2 && ! cmp file_101 file_101_foo2'
+	'[index v2] 5) pack-objects refuses to reuse corrupted data' '
+	test_must_fail git pack-objects test-5 <obj-list &&
+	test_must_fail git pack-objects --no-reuse-object test-6 <obj-list
+'
 
 test_expect_success \
-    '[index v2] 4) confirm that the pack is actually corrupted' \
-    'test_must_fail git fsck --full $commit'
-
-test_expect_success \
-    '[index v2] 5) pack-objects refuses to reuse corrupted data' \
-    'test_must_fail git pack-objects test-5 <obj-list &&
-     test_must_fail git pack-objects --no-reuse-object test-6 <obj-list'
-
-test_expect_success \
-    '[index v2] 6) verify-pack detects CRC mismatch' \
-    'rm -f .git/objects/pack/* &&
-     git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
-     git verify-pack ".git/objects/pack/pack-${pack1}.pack" &&
-     obj=$(git hash-object file_001) &&
-     nr=$(index_obj_nr ".git/objects/pack/pack-${pack1}.idx" $obj) &&
-     chmod +w ".git/objects/pack/pack-${pack1}.idx" &&
-     printf xxxx | dd of=".git/objects/pack/pack-${pack1}.idx" conv=notrunc \
-        bs=1 count=4 seek=$((8 + 256 * 4 + $(wc -l <obj-list) * rawsz + $nr * 4)) &&
-     ( while read obj
-       do git cat-file -p $obj >/dev/null || exit 1
-       done <obj-list ) &&
-     test_must_fail git verify-pack ".git/objects/pack/pack-${pack1}.pack"
+	'[index v2] 6) verify-pack detects CRC mismatch' '
+	rm -f .git/objects/pack/* &&
+	git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
+	git verify-pack ".git/objects/pack/pack-${pack1}.pack" &&
+	obj=$(git hash-object file_001) &&
+	nr=$(index_obj_nr ".git/objects/pack/pack-${pack1}.idx" $obj) &&
+	chmod +w ".git/objects/pack/pack-${pack1}.idx" &&
+	printf xxxx | dd of=".git/objects/pack/pack-${pack1}.idx" conv=notrunc \
+		bs=1 count=4 seek=$((8 + 256 * 4 + $(wc -l <obj-list) * rawsz + $nr * 4)) &&
+	 ( while read obj
+	   do git cat-file -p $obj >/dev/null || exit 1
+	   done <obj-list ) &&
+	test_must_fail git verify-pack ".git/objects/pack/pack-${pack1}.pack"
 '
 
 test_expect_success 'running index-pack in the object store' '
-    rm -f .git/objects/pack/* &&
-    cp test-1-${pack1}.pack .git/objects/pack/pack-${pack1}.pack &&
-    (
-	cd .git/objects/pack &&
-	git index-pack pack-${pack1}.pack
-    ) &&
-    test -f .git/objects/pack/pack-${pack1}.idx
+	rm -f .git/objects/pack/* &&
+	cp test-1-${pack1}.pack .git/objects/pack/pack-${pack1}.pack &&
+	(
+		cd .git/objects/pack &&
+		git index-pack pack-${pack1}.pack
+	) &&
+	test -f .git/objects/pack/pack-${pack1}.idx
 '
 
 test_expect_success 'index-pack --strict warns upon missing tagger in tag' '
-    sha=$(git rev-parse HEAD) &&
-    cat >wrong-tag <<EOF &&
+	sha=$(git rev-parse HEAD) &&
+	cat >wrong-tag <<EOF &&
 object $sha
 type commit
 tag guten tag
@@ -256,18 +264,18 @@ tag guten tag
 This is an invalid tag.
 EOF
 
-    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
-    pack1=$(echo $tag $sha | git pack-objects tag-test) &&
-    echo remove tag object &&
-    thirtyeight=${tag#??} &&
-    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
-    git index-pack --strict tag-test-${pack1}.pack 2>err &&
-    grep "^warning:.* expected .tagger. line" err
+	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	pack1=$(echo $tag $sha | git pack-objects tag-test) &&
+	echo remove tag object &&
+	thirtyeight=${tag#??} &&
+	rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
+	git index-pack --strict tag-test-${pack1}.pack 2>err &&
+	grep "^warning:.* expected .tagger. line" err
 '
 
 test_expect_success 'index-pack --fsck-objects also warns upon missing tagger in tag' '
-    git index-pack --fsck-objects tag-test-${pack1}.pack 2>err &&
-    grep "^warning:.* expected .tagger. line" err
+	git index-pack --fsck-objects tag-test-${pack1}.pack 2>err &&
+	grep "^warning:.* expected .tagger. line" err
 '
 
 test_done


* [PATCH v2 26/44] builtin/show-index: provide options to determine hash algo
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (24 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 25/44] t5302: modernize test formatting brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 27/44] t1302: expect repo format version 1 for SHA-256 brian m. carlson
                     ` (17 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

show-index is capable of reading any possible index file whether or not
the index is inside a repository.  However, because our index files lack
metadata about the hash algorithm in use, it's not possible to
autodetect the algorithm that a particular index file is using.

In order to allow us to read index files of any algorithm, let's set up
the .git directory gently so that we default to the algorithm for the
current repository, and add an --object-format option to allow users to
override this setting and continue to run show-index outside of a
repository altogether.  Let's also document this new option so that
people can find it and use it.
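
The precedence described above can be sketched in shell.  This is only
an illustration of the intended behaviour, not the C code in the patch:
resolve_object_format is a hypothetical helper, and its two arguments
stand in for the --object-format value and the repository's
extensions.objectFormat setting.

```shell
# Hypothetical helper: choose the hash algorithm name in the order the
# commit message describes.
#   $1: value given with --object-format (empty if not given)
#   $2: extensions.objectFormat of the current repository
#       (empty when unset or when run outside a repository)
resolve_object_format () {
	if test -n "$1"
	then
		printf '%s\n' "$1"
	elif test -n "$2"
	then
		printf '%s\n' "$2"
	else
		printf 'sha1\n'
	fi
}

resolve_object_format "" ""           # outside a repository, no option
resolve_object_format "" sha256       # inside a SHA-256 repository
resolve_object_format sha1 sha256     # option overrides the repository
```

With the patch applied, reading a SHA-256 index outside a repository
would presumably look like "git show-index --object-format=sha256
<some.idx", where some.idx is a placeholder file name.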

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git-show-index.txt | 11 ++++++++++-
 builtin/show-index.c             | 29 ++++++++++++++++++++++++-----
 git.c                            |  2 +-
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-show-index.txt b/Documentation/git-show-index.txt
index 424e4ba84c..39b1d8eaa1 100644
--- a/Documentation/git-show-index.txt
+++ b/Documentation/git-show-index.txt
@@ -9,7 +9,7 @@ git-show-index - Show packed archive index
 SYNOPSIS
 --------
 [verse]
-'git show-index'
+'git show-index' [--object-format=<hash-algorithm>]
 
 
 DESCRIPTION
@@ -36,6 +36,15 @@ Note that you can get more information on a packfile by calling
 linkgit:git-verify-pack[1]. However, as this command considers only the
 index file itself, it's both faster and more flexible.
 
+OPTIONS
+-------
+
+--object-format=<hash-algorithm>::
+	Specify the given object format (hash algorithm) for the index file.  The
+	valid values are 'sha1' and (if enabled) 'sha256'.  The default is the
+	algorithm for the current repository (set by `extensions.objectFormat`), or
+	'sha1' if no value is set or outside a repository.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/show-index.c b/builtin/show-index.c
index 0826f6a5a2..8106b03a6b 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -1,9 +1,12 @@
 #include "builtin.h"
 #include "cache.h"
 #include "pack.h"
+#include "parse-options.h"
 
-static const char show_index_usage[] =
-"git show-index";
+static const char *const show_index_usage[] = {
+	"git show-index [--object-format=<hash-algorithm>]",
+	NULL
+};
 
 int cmd_show_index(int argc, const char **argv, const char *prefix)
 {
@@ -11,10 +14,26 @@ int cmd_show_index(int argc, const char **argv, const char *prefix)
 	unsigned nr;
 	unsigned int version;
 	static unsigned int top_index[256];
-	const unsigned hashsz = the_hash_algo->rawsz;
+	unsigned hashsz;
+	const char *hash_name = NULL;
+	int hash_algo;
+	const struct option show_index_options[] = {
+		OPT_STRING(0, "object-format", &hash_name, N_("hash-algorithm"),
+			   N_("specify the hash algorithm to use")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, show_index_options, show_index_usage, 0);
+
+	if (hash_name) {
+		hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo == GIT_HASH_UNKNOWN)
+			die(_("Unknown hash algorithm"));
+		repo_set_hash_algo(the_repository, hash_algo);
+	}
+
+	hashsz = the_hash_algo->rawsz;
 
-	if (argc != 1)
-		usage(show_index_usage);
 	if (fread(top_index, 2 * 4, 1, stdin) != 1)
 		die("unable to read header");
 	if (top_index[0] == htonl(PACK_IDX_SIGNATURE)) {
diff --git a/git.c b/git.c
index a2d337eed7..2f021b97f3 100644
--- a/git.c
+++ b/git.c
@@ -574,7 +574,7 @@ static struct cmd_struct commands[] = {
 	{ "shortlog", cmd_shortlog, RUN_SETUP_GENTLY | USE_PAGER },
 	{ "show", cmd_show, RUN_SETUP },
 	{ "show-branch", cmd_show_branch, RUN_SETUP },
-	{ "show-index", cmd_show_index },
+	{ "show-index", cmd_show_index, RUN_SETUP_GENTLY },
 	{ "show-ref", cmd_show_ref, RUN_SETUP },
 	{ "sparse-checkout", cmd_sparse_checkout, RUN_SETUP | NEED_WORK_TREE },
 	{ "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE },


* [PATCH v2 27/44] t1302: expect repo format version 1 for SHA-256
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (25 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 26/44] builtin/show-index: provide options to determine hash algo brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 28/44] Documentation/technical: document object-format for protocol v2 brian m. carlson
                     ` (16 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When using SHA-256, we need to take advantage of the extensions section
in the config file, so we need to use repository format version 1.
Update the test to look for the correct value.

Note that test_oid produces a value without a trailing newline, so use
echo to ensure we print a trailing newline to compare it correctly
against the actual results.
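
The newline issue can be seen with a small stand-in.  fake_test_oid
below is a hypothetical substitute for test_oid (assumed here only to
print its value with no trailing newline, as stated above); it is not
the real helper from the test library.

```shell
# Hypothetical stand-in for test_oid: prints the value for the current
# algorithm without a trailing newline.
fake_test_oid () {
	printf '%s' 1
}

# Raw output is a single byte, so it would not match the line that
# "git config" prints.
fake_test_oid | wc -c

# Wrapping it in echo appends the newline that test_cmp expects.
echo $(fake_test_oid) | wc -c
```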

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1302-repo-version.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t1302-repo-version.sh b/t/t1302-repo-version.sh
index ce4cff13bb..d60c042ce8 100755
--- a/t/t1302-repo-version.sh
+++ b/t/t1302-repo-version.sh
@@ -8,6 +8,10 @@ test_description='Test repository version check'
 . ./test-lib.sh
 
 test_expect_success 'setup' '
+	test_oid_cache <<-\EOF &&
+	version sha1:0
+	version sha256:1
+	EOF
 	cat >test.patch <<-\EOF &&
 	diff --git a/test.txt b/test.txt
 	new file mode 100644
@@ -23,7 +27,7 @@ test_expect_success 'setup' '
 '
 
 test_expect_success 'gitdir selection on normal repos' '
-	echo 0 >expect &&
+	echo $(test_oid version) >expect &&
 	git config core.repositoryformatversion >actual &&
 	git -C test config core.repositoryformatversion >actual2 &&
 	test_cmp expect actual &&


* [PATCH v2 28/44] Documentation/technical: document object-format for protocol v2
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (26 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 27/44] t1302: expect repo format version 1 for SHA-256 brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 29/44] connect: pass full packet reader when parsing v2 refs brian m. carlson
                     ` (15 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Document the object-format extension for protocol v2.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/technical/protocol-v2.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 7e3766cafb..107e421fb7 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -453,3 +453,12 @@ included in a request.  This is done by sending each option as a
 a request.
 
 The provided options must not contain a NUL or LF character.
+
+ object-format
+~~~~~~~~~~~~~~~
+
+The server can advertise the `object-format` capability with a value `X` (in the
+form `object-format=X`) to notify the client that the server is able to deal
+with objects using hash algorithm X.  If not specified, the server is assumed to
+only handle SHA-1.  If the client would like to use a hash algorithm other than
+SHA-1, it should specify its object-format string.


* [PATCH v2 29/44] connect: pass full packet reader when parsing v2 refs
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (27 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 28/44] Documentation/technical: document object-format for protocol v2 brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 30/44] connect: parse v2 refs with correct hash algorithm brian m. carlson
                     ` (14 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When we're parsing refs, we need to know not only what the line we're
parsing is, but also the hash algorithm we should use to parse it, which
is stored in the reader object.  Pass the packet reader object through
to the protocol v2 ref parsing function.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/connect.c b/connect.c
index 915f1736a0..1d05bc56ed 100644
--- a/connect.c
+++ b/connect.c
@@ -374,7 +374,7 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 }
 
 /* Returns 1 when a valid ref has been added to `list`, 0 otherwise */
-static int process_ref_v2(const char *line, struct ref ***list)
+static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 {
 	int ret = 1;
 	int i = 0;
@@ -382,6 +382,7 @@ static int process_ref_v2(const char *line, struct ref ***list)
 	struct ref *ref;
 	struct string_list line_sections = STRING_LIST_INIT_DUP;
 	const char *end;
+	const char *line = reader->line;
 
 	/*
 	 * Ref lines have a number of fields which are space deliminated.  The
@@ -469,7 +470,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 
 	/* Process response from server */
 	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
-		if (!process_ref_v2(reader->line, &list))
+		if (!process_ref_v2(reader, &list))
 			die(_("invalid ls-refs response: %s"), reader->line);
 	}
 


* [PATCH v2 30/44] connect: parse v2 refs with correct hash algorithm
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (28 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 29/44] connect: pass full packet reader when parsing v2 refs brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 31/44] serve: advertise object-format capability for protocol v2 brian m. carlson
                     ` (13 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When using protocol v2, we need to know what hash algorithm is used by
the remote end.  See if the server has sent us an object-format
capability, and if so, use it to determine the hash algorithm in use and
set that value in the packet reader.  Parse the refs using this
algorithm.
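
As a rough sketch of the client-side decision, the following shell
function scans capability lines for object-format.  It is an
illustration only: server_object_format is a hypothetical name, the
input is assumed to be already stripped of pkt-line framing, and the
fallback to sha1 follows the protocol documentation rather than this
particular patch.

```shell
# Hypothetical helper: read capability lines on stdin and print the
# hash algorithm the server announced (sha1 if it announced none).
server_object_format () {
	algo=sha1
	while read -r cap
	do
		case "$cap" in
		object-format=*)
			algo=${cap#object-format=}
			;;
		esac
	done
	case "$algo" in
	sha1|sha256)
		printf '%s\n' "$algo"
		;;
	*)
		echo "unknown object format '$algo' specified by server" >&2
		return 1
		;;
	esac
}

printf '%s\n' "agent=git/test" "ls-refs" "object-format=sha256" |
	server_object_format
```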

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/connect.c b/connect.c
index 1d05bc56ed..66650ff2d3 100644
--- a/connect.c
+++ b/connect.c
@@ -283,7 +283,7 @@ static int process_ref(const struct packet_reader *reader, int len,
 		die(_("protocol error: unexpected capabilities^{}"));
 	} else if (check_ref(name, flags)) {
 		struct ref *ref = alloc_ref(name);
-		memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);
+		oidcpy(&ref->old_oid, &old_oid);
 		**list = ref;
 		*list = &ref->next;
 	}
@@ -395,7 +395,7 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 		goto out;
 	}
 
-	if (parse_oid_hex(line_sections.items[i++].string, &old_oid, &end) ||
+	if (parse_oid_hex_algop(line_sections.items[i++].string, &old_oid, &end, reader->hash_algo) ||
 	    *end) {
 		ret = 0;
 		goto out;
@@ -403,7 +403,7 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 
 	ref = alloc_ref(line_sections.items[i++].string);
 
-	oidcpy(&ref->old_oid, &old_oid);
+	memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);
 	**list = ref;
 	*list = &ref->next;
 
@@ -416,7 +416,8 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 			struct object_id peeled_oid;
 			char *peeled_name;
 			struct ref *peeled;
-			if (parse_oid_hex(arg, &peeled_oid, &end) || *end) {
+			if (parse_oid_hex_algop(arg, &peeled_oid, &end,
+						reader->hash_algo) || *end) {
 				ret = 0;
 				goto out;
 			}
@@ -424,7 +425,8 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 			peeled_name = xstrfmt("%s^{}", ref->name);
 			peeled = alloc_ref(peeled_name);
 
-			oidcpy(&peeled->old_oid, &peeled_oid);
+			memcpy(peeled->old_oid.hash, peeled_oid.hash,
+			       reader->hash_algo->rawsz);
 			**list = peeled;
 			*list = &peeled->next;
 
@@ -443,6 +445,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     const struct string_list *server_options)
 {
 	int i;
+	const char *hash_name;
 	*list = NULL;
 
 	if (server_supports_v2("ls-refs", 1))
@@ -451,6 +454,14 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	if (server_supports_v2("agent", 0))
 		packet_write_fmt(fd_out, "agent=%s", git_user_agent_sanitized());
 
+	if (server_feature_v2("object-format", &hash_name)) {
+		int hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo == GIT_HASH_UNKNOWN)
+			die(_("unknown object format '%s' specified by server"), hash_name);
+		reader->hash_algo = &hash_algos[hash_algo];
+		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
+	}
+
 	if (server_options && server_options->nr &&
 	    server_supports_v2("server-option", 1))
 		for (i = 0; i < server_options->nr; i++)


* [PATCH v2 31/44] serve: advertise object-format capability for protocol v2
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (29 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 30/44] connect: parse v2 refs with correct hash algorithm brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 32/44] t5500: make hash independent brian m. carlson
                     ` (12 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

In order to communicate the hash algorithm supported by the server side, add
support for advertising the object-format capability.  We check that the
client side sends us an identical algorithm if it sends us its own
object-format capability, and assume it speaks SHA-1 if not.
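
The comparison described above can be sketched in shell.  This is a
simplified illustration, not the serve.c code: the function takes plain
algorithm names as arguments (a hypothetical interface) and leaves out
the separate check for unknown algorithm names.

```shell
# Hypothetical shell rendering of the server-side check.
#   $1: the server repository's algorithm name
#   $2: the client's object-format value (empty if it sent none)
check_algorithm () {
	server=$1
	client=${2:-sha1}	# no capability sent: assume SHA-1
	if test "$client" != "$server"
	then
		echo "mismatched object format: server $server; client $client" >&2
		return 1
	fi
}

check_algorithm sha256 sha256 && echo "request accepted"
check_algorithm sha256 "" 2>/dev/null || echo "request rejected"
```

Under this sketch a client that omits the capability is only accepted
by a SHA-1 server, which matches the default described above.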

In the test, when we're using an algorithm other than SHA-1, we need to
specify the algorithm in use so we don't get a failure with an "unknown
format" message.  Add a test that we handle a mismatched algorithm.
Remove the test_oid_init call since it's no longer necessary.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c            |  2 ++
 serve.c              | 27 +++++++++++++++++++++++++++
 t/t5701-git-serve.sh | 25 +++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/connect.c b/connect.c
index 66650ff2d3..2ada5b5161 100644
--- a/connect.c
+++ b/connect.c
@@ -460,6 +460,8 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			die(_("unknown object format '%s' specified by server"), hash_name);
 		reader->hash_algo = &hash_algos[hash_algo];
 		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
+	} else {
+		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
 	}
 
 	if (server_options && server_options->nr &&
diff --git a/serve.c b/serve.c
index 317256c1a4..7ab7807fef 100644
--- a/serve.c
+++ b/serve.c
@@ -22,6 +22,14 @@ static int agent_advertise(struct repository *r,
 	return 1;
 }
 
+static int object_format_advertise(struct repository *r,
+				   struct strbuf *value)
+{
+	if (value)
+		strbuf_addstr(value, r->hash_algo->name);
+	return 1;
+}
+
 struct protocol_capability {
 	/*
 	 * The name of the capability.  The server uses this name when
@@ -57,6 +65,7 @@ static struct protocol_capability capabilities[] = {
 	{ "ls-refs", always_advertise, ls_refs },
 	{ "fetch", upload_pack_advertise, upload_pack_v2 },
 	{ "server-option", always_advertise, NULL },
+	{ "object-format", object_format_advertise, NULL },
 };
 
 static void advertise_capabilities(void)
@@ -153,6 +162,22 @@ int has_capability(const struct argv_array *keys, const char *capability,
 	return 0;
 }
 
+static void check_algorithm(struct repository *r, struct argv_array *keys)
+{
+	int client = GIT_HASH_SHA1, server = hash_algo_by_ptr(r->hash_algo);
+	const char *algo_name;
+
+	if (has_capability(keys, "object-format", &algo_name)) {
+		client = hash_algo_by_name(algo_name);
+		if (client == GIT_HASH_UNKNOWN)
+			die("unknown object format '%s'", algo_name);
+	}
+
+	if (client != server)
+		die("mismatched object format: server %s; client %s",
+		    r->hash_algo->name, hash_algos[client].name);
+}
+
 enum request_state {
 	PROCESS_REQUEST_KEYS,
 	PROCESS_REQUEST_DONE,
@@ -223,6 +248,8 @@ static int process_request(void)
 	if (!command)
 		die("no command requested");
 
+	check_algorithm(the_repository, &keys);
+
 	command->command(the_repository, &keys, &reader);
 
 	argv_array_clear(&keys);
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index ffb9613885..a1f5fdc9fd 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -5,12 +5,17 @@ test_description='test protocol v2 server commands'
 . ./test-lib.sh
 
 test_expect_success 'test capability advertisement' '
+	test_oid_cache <<-EOF &&
+	wrong_algo sha1:sha256
+	wrong_algo sha256:sha1
+	EOF
 	cat >expect <<-EOF &&
 	version 2
 	agent=git/$(git version | cut -d" " -f3)
 	ls-refs
 	fetch=shallow
 	server-option
+	object-format=$(test_oid algo)
 	0000
 	EOF
 
@@ -45,6 +50,7 @@ test_expect_success 'request invalid capability' '
 test_expect_success 'request with no command' '
 	test-tool pkt-line pack >in <<-EOF &&
 	agent=git/test
+	object-format=$(test_oid algo)
 	0000
 	EOF
 	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
@@ -54,6 +60,7 @@ test_expect_success 'request with no command' '
 test_expect_success 'request invalid command' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=foo
+	object-format=$(test_oid algo)
 	agent=git/test
 	0000
 	EOF
@@ -61,6 +68,17 @@ test_expect_success 'request invalid command' '
 	test_i18ngrep "invalid command" err
 '
 
+test_expect_success 'wrong object-format' '
+	test-tool pkt-line pack >in <<-EOF &&
+	command=fetch
+	agent=git/test
+	object-format=$(test_oid wrong_algo)
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	test_i18ngrep "mismatched object format" err
+'
+
 # Test the basics of ls-refs
 #
 test_expect_success 'setup some refs and tags' '
@@ -74,6 +92,7 @@ test_expect_success 'setup some refs and tags' '
 test_expect_success 'basics of ls-refs' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	0000
 	EOF
 
@@ -96,6 +115,7 @@ test_expect_success 'basics of ls-refs' '
 test_expect_success 'basic ref-prefixes' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	0001
 	ref-prefix refs/heads/master
 	ref-prefix refs/tags/one
@@ -116,6 +136,7 @@ test_expect_success 'basic ref-prefixes' '
 test_expect_success 'refs/heads prefix' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	0001
 	ref-prefix refs/heads/
 	0000
@@ -136,6 +157,7 @@ test_expect_success 'refs/heads prefix' '
 test_expect_success 'peel parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	0001
 	peel
 	ref-prefix refs/tags/
@@ -157,6 +179,7 @@ test_expect_success 'peel parameter' '
 test_expect_success 'symrefs parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	0001
 	symrefs
 	ref-prefix refs/heads/
@@ -178,6 +201,7 @@ test_expect_success 'symrefs parameter' '
 test_expect_success 'sending server-options' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	server-option=hello
 	server-option=world
 	0001
@@ -200,6 +224,7 @@ test_expect_success 'unexpected lines are not allowed in fetch request' '
 
 	test-tool pkt-line pack >in <<-EOF &&
 	command=fetch
+	object-format=$(test_oid algo)
 	0001
 	this-is-not-a-command
 	0000


* [PATCH v2 32/44] t5500: make hash independent
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (30 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 31/44] serve: advertise object-format capability for protocol v2 brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 33/44] builtin/ls-remote: initialize repository based on fetch brian m. carlson
                     ` (11 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

This test has hard-coded pkt-lines with object IDs.  The pkt-line
lengths necessarily differ between hash algorithms, so generate these
lines with the packetize helper so they're always the right size.  In
addition, we will require an object-format capability for SHA-256, so
pass that capability on to the upload-pack process.
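
To see why the hard-coded prefixes cannot work for both algorithms,
here is a small sketch of the length arithmetic.  pkt_line is a
hypothetical stand-in written for this example, not the packetize
helper from the test library; it assumes an ASCII payload and a
trailing LF, as produced by echo.

```shell
# Hypothetical stand-in: prefix a payload with its pkt-line length.
# The length counts the 4-byte length field, the payload, and the LF.
pkt_line () {
	printf '%04x%s\n' $((${#1} + 5)) "$1"
}

# Placeholder object IDs of the right width for each algorithm.
sha1_oid=$(printf '%040d' 0)
sha256_oid=$(printf '%064d' 0)

pkt_line "want $sha1_oid"      # prefix 0032: 4 + 5 + 40 + 1 = 50
pkt_line "want $sha256_oid"    # prefix 004a: 4 + 5 + 64 + 1 = 74
```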

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5500-fetch-pack.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 8c54e34ef1..dfed113247 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -871,9 +871,10 @@ test_expect_success 'shallow since with commit graph and already-seen commit' '
 
 	GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
 	0012command=fetch
+	$(echo "object-format=$(test_oid algo)" | packetize)
 	00010013deepen-since 1
-	0032want $(git rev-parse other)
-	0032have $(git rev-parse master)
+	$(echo "want $(git rev-parse other)" | packetize)
+	$(echo "have $(git rev-parse master)" | packetize)
 	0000
 	EOF
 	)


* [PATCH v2 33/44] builtin/ls-remote: initialize repository based on fetch
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (31 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 32/44] t5500: make hash independent brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 34/44] remote-curl: detect algorithm for dumb HTTP by size brian m. carlson
                     ` (10 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

ls-remote may or may not operate within a repository, and as such will
not have been initialized with the repository's hash algorithm.  Even if
it were, the remote side could be using a different algorithm and we
would still want to display those refs properly.  Find the hash
algorithm used by the remote side by querying the transport object and
set our hash algorithm accordingly.

Without this change, if the remote side is using SHA-256, we truncate
the refs to 40 hex characters, since that's the length of the default
hash algorithm (SHA-1).

Note that technically this is not a correct setting of the repository
hash algorithm since, if we are in a repository, it might be one of a
different hash algorithm from the remote side.  However, our current
code paths don't handle multiple algorithms and won't for some time, so
this is the best we can do.  We rely on the fact that ls-remote never
modifies the current repository, which is a reasonable assumption to
make.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/ls-remote.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 6ef519514b..3a4dd12903 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -118,6 +118,10 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		transport->server_options = &server_options;
 
 	ref = transport_get_remote_refs(transport, &ref_prefixes);
+	if (ref) {
+		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
+		repo_set_hash_algo(the_repository, hash_algo);
+	}
 	if (transport_disconnect(transport)) {
 		UNLEAK(sorting);
 		return 1;


* [PATCH v2 34/44] remote-curl: detect algorithm for dumb HTTP by size
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (32 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 33/44] builtin/ls-remote: initialize repository based on fetch brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 35/44] builtin/index-pack: add option to specify hash algorithm brian m. carlson
                     ` (9 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When reading the info/refs file for a repository, we have no explicit
way to detect which hash algorithm is in use because the file doesn't
provide one. Detect the hash algorithm in use by the size of the first
object ID.
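
As a rough standalone sketch of the idea (the enum and function names
here are hypothetical stand-ins, not Git's API), detection amounts to
measuring the hex object ID that precedes the first tab in the
"<oid>\t<refname>\n" lines of info/refs:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Git's hash algorithm identifiers. */
enum sketch_algo { SKETCH_UNKNOWN = 0, SKETCH_SHA1 = 1, SKETCH_SHA256 = 2 };

/*
 * Guess the algorithm from the first line of an info/refs buffer by
 * the number of hex characters before the first tab.
 */
enum sketch_algo sketch_detect_algo(const char *buf, size_t len)
{
	const char *tab = memchr(buf, '\t', len);
	size_t hexlen;

	if (!tab)
		return SKETCH_UNKNOWN;	/* no tab: no refs, or not info/refs */

	hexlen = (size_t)(tab - buf);
	if (hexlen == 40)
		return SKETCH_SHA1;	/* 20 raw bytes */
	if (hexlen == 64)
		return SKETCH_SHA256;	/* 32 raw bytes */
	return SKETCH_UNKNOWN;
}
```

The sketch hard-codes the two lengths for brevity; the patch itself
asks hash_algo_by_length() so that any algorithm Git knows about is
recognized.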

We anonymize the URL like elsewhere in the function in case the user has
decided to include a secret in the URL.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 remote-curl.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 3ed0dfec1b..617edfedf4 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -252,6 +252,19 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	return list;
 }
 
+static const struct git_hash_algo *detect_hash_algo(struct discovery *heads)
+{
+	const char *p = memchr(heads->buf, '\t', heads->len);
+	int algo;
+	if (!p)
+		return NULL;
+
+	algo = hash_algo_by_length((p - heads->buf) / 2);
+	if (algo == GIT_HASH_UNKNOWN)
+		return NULL;
+	return &hash_algos[algo];
+}
+
 static struct ref *parse_info_refs(struct discovery *heads)
 {
 	char *data, *start, *mid;
@@ -262,6 +275,12 @@ static struct ref *parse_info_refs(struct discovery *heads)
 	struct ref *ref = NULL;
 	struct ref *last_ref = NULL;
 
+	options.hash_algo = detect_hash_algo(heads);
+	if (!options.hash_algo)
+		die("%sinfo/refs not valid: could not determine hash algorithm; "
+		    "is this a git repository?",
+		    transport_anonymize_url(url.buf));
+
 	data = heads->buf;
 	start = NULL;
 	mid = data;
@@ -272,13 +291,13 @@ static struct ref *parse_info_refs(struct discovery *heads)
 		if (data[i] == '\t')
 			mid = &data[i];
 		if (data[i] == '\n') {
-			if (mid - start != the_hash_algo->hexsz)
+			if (mid - start != options.hash_algo->hexsz)
 				die(_("%sinfo/refs not valid: is this a git repository?"),
 				    transport_anonymize_url(url.buf));
 			data[i] = 0;
 			ref_name = mid + 1;
 			ref = alloc_ref(ref_name);
-			get_oid_hex(start, &ref->old_oid);
+			get_oid_hex_algop(start, &ref->old_oid, options.hash_algo);
 			if (!refs)
 				refs = ref;
 			if (last_ref)


* [PATCH v2 35/44] builtin/index-pack: add option to specify hash algorithm
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (33 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 34/44] remote-curl: detect algorithm for dumb HTTP by size brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 36/44] t1050: pass algorithm to index-pack when outside repo brian m. carlson
                     ` (8 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

git index-pack is usually run in a repository, but need not be. Since
packs don't contain information on the algorithm in use, instead
relying on context, add an option to index-pack to tell it which one
we're using in case someone runs it outside of a repository.  Since
using --stdin necessarily implies a repository, don't allow specifying
an object format when --stdin is provided, to prevent users from passing
an option that won't work.  Add documentation for this option.
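
The option handling can be pictured with the following standalone
sketch.  It is an illustration only: struct sketch_opts,
sketch_parse_args and the return codes are hypothetical, and the real
code uses skip_prefix(), hash_algo_by_name() and die() instead.

```c
#include <string.h>

/* Hypothetical option state; not Git's own structure. */
struct sketch_opts {
	int from_stdin;
	const char *object_format;	/* NULL when the option was not given */
};

/*
 * Returns 0 on success, -1 for an unknown option or algorithm name,
 * and -2 when --object-format is combined with --stdin.
 */
int sketch_parse_args(int argc, const char **argv, struct sketch_opts *opts)
{
	static const char prefix[] = "--object-format=";
	int i;

	opts->from_stdin = 0;
	opts->object_format = NULL;

	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--stdin")) {
			opts->from_stdin = 1;
		} else if (!strncmp(argv[i], prefix, strlen(prefix))) {
			const char *name = argv[i] + strlen(prefix);

			if (strcmp(name, "sha1") && strcmp(name, "sha256"))
				return -1;	/* unknown hash algorithm */
			opts->object_format = name;
		} else {
			return -1;	/* unknown option */
		}
	}
	/* The conflict is checked after the loop so argument order is irrelevant. */
	if (opts->from_stdin && opts->object_format)
		return -2;
	return 0;
}
```

Checking the conflict once all arguments have been seen mirrors the
patch, which performs the test next to the other --stdin sanity checks
rather than inside the parsing loop.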

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git-index-pack.txt | 8 ++++++++
 builtin/index-pack.c             | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index d5b7560bfe..9316d9a80b 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -93,6 +93,14 @@ OPTIONS
 --max-input-size=<size>::
 	Die, if the pack is larger than <size>.
 
+--object-format=<hash-algorithm>::
+	Specify the given object format (hash algorithm) for the pack.  The valid
+	values are 'sha1' and (if enabled) 'sha256'.  The default is the algorithm for
+	the current repository (set by `extensions.objectFormat`), or 'sha1' if no
+	value is set or outside a repository.
++
+This option cannot be used with --stdin.
+
 NOTES
 -----
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7bea1fba52..f865666db9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1667,6 +1667,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	unsigned char pack_hash[GIT_MAX_RAWSZ];
 	unsigned foreign_nr = 1;	/* zero is a "good" value, assume bad */
 	int report_end_of_input = 0;
+	int hash_algo = 0;
 
 	/*
 	 * index-pack never needs to fetch missing objects except when
@@ -1760,6 +1761,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 					die(_("bad %s"), arg);
 			} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
 				max_input_size = strtoumax(arg, NULL, 10);
+			} else if (skip_prefix(arg, "--object-format=", &arg)) {
+				hash_algo = hash_algo_by_name(arg);
+				if (hash_algo == GIT_HASH_UNKNOWN)
+					die(_("unknown hash algorithm '%s'"), arg);
+				repo_set_hash_algo(the_repository, hash_algo);
 			} else
 				usage(index_pack_usage);
 			continue;
@@ -1776,6 +1782,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		die(_("--fix-thin cannot be used without --stdin"));
 	if (from_stdin && !startup_info->have_repository)
 		die(_("--stdin requires a git repository"));
+	if (from_stdin && hash_algo)
+		die(_("--object-format cannot be used with --stdin"));
 	if (!index_name && pack_name)
 		index_name = derive_filename(pack_name, "idx", &index_name_buf);
 


* [PATCH v2 36/44] t1050: pass algorithm to index-pack when outside repo
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (34 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 35/44] builtin/index-pack: add option to specify hash algorithm brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 37/44] remote-curl: avoid truncating refs with ls-remote brian m. carlson
                     ` (7 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When outside a repository, git index-pack is unable to guess the hash
algorithm in use for a pack, since packs don't contain any information
on the algorithm in use. Pass an option to index-pack to help it out in
this test.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1050-large.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 7f88ea07c2..6a56d1ca24 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -12,6 +12,7 @@ file_size () {
 }
 
 test_expect_success setup '
+	test_oid_init &&
 	# clone does not allow us to pass core.bigfilethreshold to
 	# new repos, so set core.bigfilethreshold globally
 	git config --global core.bigfilethreshold 200k &&
@@ -177,7 +178,8 @@ test_expect_success 'git-show a large file' '
 
 test_expect_success 'index-pack' '
 	git clone file://"$(pwd)"/.git foo &&
-	GIT_DIR=non-existent git index-pack --strict --verify foo/.git/objects/pack/*.pack
+	GIT_DIR=non-existent git index-pack --object-format=$(test_oid algo) \
+		--strict --verify foo/.git/objects/pack/*.pack
 '
 
 test_expect_success 'repack' '


* [PATCH v2 37/44] remote-curl: avoid truncating refs with ls-remote
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (35 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 36/44] t1050: pass algorithm to index-pack when outside repo brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 38/44] t/helper: initialize the repository for test-sha1-array brian m. carlson
                     ` (6 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Normally, the remote-curl transport helper is aware of the hash
algorithm we're using because we're in a repo with the appropriate hash
algorithm set. However, when using git ls-remote outside of a
repository, we won't have initialized the hash algorithm properly, so
use hash_to_hex_algop to print the ref corresponding to the algorithm
we've detected.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 remote-curl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 617edfedf4..c077323008 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -548,7 +548,9 @@ static void output_refs(struct ref *refs)
 		if (posn->symref)
 			printf("@%s %s\n", posn->symref, posn->name);
 		else
-			printf("%s %s\n", oid_to_hex(&posn->old_oid), posn->name);
+			printf("%s %s\n", hash_to_hex_algop(posn->old_oid.hash,
+							    options.hash_algo),
+					  posn->name);
 	}
 	printf("\n");
 	fflush(stdout);


* [PATCH v2 38/44] t/helper: initialize the repository for test-sha1-array
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (36 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 37/44] remote-curl: avoid truncating refs with ls-remote brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 39/44] t5702: offer an object-format capability in the test brian m. carlson
                     ` (5 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

test-sha1-array uses the_hash_algo under the hood. Since t0064 wants to
use the value that is correct for the hash algorithm that we're testing,
make sure the test helper initializes the repository to set
the_hash_algo correctly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/helper/test-oid-array.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/helper/test-oid-array.c b/t/helper/test-oid-array.c
index ce9fd5f091..b16cd0b11b 100644
--- a/t/helper/test-oid-array.c
+++ b/t/helper/test-oid-array.c
@@ -12,6 +12,9 @@ int cmd__oid_array(int argc, const char **argv)
 {
 	struct oid_array array = OID_ARRAY_INIT;
 	struct strbuf line = STRBUF_INIT;
+	int nongit_ok;
+
+	setup_git_directory_gently(&nongit_ok);
 
 	while (strbuf_getline(&line, stdin) != EOF) {
 		const char *arg;


* [PATCH v2 39/44] t5702: offer an object-format capability in the test
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (37 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 38/44] t/helper: initialize the repository for test-sha1-array brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 40/44] t5703: use object-format serve option brian m. carlson
                     ` (4 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

In order to make this test work with SHA-256, offer an object-format
capability so that both sides use the same algorithm.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5702-protocol-v2.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 5039e66dc4..116358b9ac 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -13,6 +13,7 @@ start_git_daemon --export-all --enable=receive-pack
 daemon_parent=$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent
 
 test_expect_success 'create repo to be served by git-daemon' '
+	test_oid_init &&
 	git init "$daemon_parent" &&
 	test_commit -C "$daemon_parent" one
 '
@@ -394,6 +395,7 @@ test_expect_success 'even with handcrafted request, filter does not work if not
 	# Custom request that tries to filter even though it is not advertised.
 	test-tool pkt-line pack >in <<-EOF &&
 	command=fetch
+	object-format=$(test_oid algo)
 	0001
 	want $(git -C server rev-parse master)
 	filter blob:none


* [PATCH v2 40/44] t5703: use object-format serve option
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (38 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 39/44] t5702: offer an object-format capability in the test brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 41/44] t5704: send object-format capability with SHA-256 brian m. carlson
                     ` (3 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When we're using an algorithm other than SHA-1, we need to specify the
algorithm in use so we don't get a failure with an "unknown format"
message. Add a wrapper function that specifies this header if required.
Skip specifying this header for SHA-1 to test that it works both with and
without this header.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5703-upload-pack-ref-in-want.sh | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh
index a34460f7d8..afe7f7f919 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -27,6 +27,15 @@ check_output () {
 	test_cmp sorted_commits actual_commits
 }
 
+write_command () {
+	echo "command=$1"
+
+	if test "$(test_oid algo)" != sha1
+	then
+		echo "object-format=$(test_oid algo)"
+	fi
+}
+
 # c(o/foo) d(o/bar)
 #        \ /
 #         b   e(baz)  f(master)
@@ -62,7 +71,7 @@ test_expect_success 'config controls ref-in-want advertisement' '
 
 test_expect_success 'invalid want-ref line' '
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/non-existent
@@ -83,7 +92,7 @@ test_expect_success 'basic want-ref' '
 
 	oid=$(git rev-parse a) &&
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/master
@@ -107,7 +116,7 @@ test_expect_success 'multiple want-ref lines' '
 
 	oid=$(git rev-parse b) &&
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/o/foo
@@ -129,7 +138,7 @@ test_expect_success 'mix want and want-ref' '
 	git rev-parse e f >expected_commits &&
 
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/master
@@ -152,7 +161,7 @@ test_expect_success 'want-ref with ref we already have commit for' '
 
 	oid=$(git rev-parse c) &&
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/o/foo


* [PATCH v2 41/44] t5704: send object-format capability with SHA-256
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (39 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 40/44] t5703: use object-format serve option brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 42/44] t5300: pass --object-format to git index-pack brian m. carlson
                     ` (2 subsequent siblings)
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When we speak protocol v2 in this test, we must pass the object-format
header if the algorithm is not SHA-1.  Otherwise, git upload-pack fails
because the hash algorithm doesn't match and not because we've failed to
speak the protocol correctly.  Pass the header so that our assertions
test what we're really interested in.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5704-protocol-violations.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t5704-protocol-violations.sh b/t/t5704-protocol-violations.sh
index 950cfb21fe..5c941949b9 100755
--- a/t/t5704-protocol-violations.sh
+++ b/t/t5704-protocol-violations.sh
@@ -9,6 +9,7 @@ making sure that we do not segfault or otherwise behave badly.'
 test_expect_success 'extra delim packet in v2 ls-refs args' '
 	{
 		packetize command=ls-refs &&
+		packetize "object-format=$(test_oid algo)" &&
 		printf 0001 &&
 		# protocol expects 0000 flush here
 		printf 0001
@@ -21,6 +22,7 @@ test_expect_success 'extra delim packet in v2 ls-refs args' '
 test_expect_success 'extra delim packet in v2 fetch args' '
 	{
 		packetize command=fetch &&
+		packetize "object-format=$(test_oid algo)" &&
 		printf 0001 &&
 		# protocol expects 0000 flush here
 		printf 0001


* [PATCH v2 42/44] t5300: pass --object-format to git index-pack
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (40 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 41/44] t5704: send object-format capability with SHA-256 brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 43/44] bundle: detect hash algorithm when reading refs brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 44/44] remote-testgit: adapt for object-format brian m. carlson
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

git index-pack by default reads the repository to determine the object
format. However, when outside of a repository, it's necessary to specify
the hash algorithm in use so that the pack can be properly indexed. Add
an --object-format argument when invoking git index-pack outside of a
repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5300-pack-object.sh | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 410a09b0dd..746cdb626e 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -12,7 +12,8 @@ TRASH=$(pwd)
 
 test_expect_success \
     'setup' \
-    'rm -f .git/index* &&
+    'test_oid_init &&
+     rm -f .git/index* &&
      perl -e "print \"a\" x 4096;" > a &&
      perl -e "print \"b\" x 4096;" > b &&
      perl -e "print \"c\" x 4096;" > c &&
@@ -412,18 +413,18 @@ test_expect_success 'set up pack for non-repo tests' '
 '
 
 test_expect_success 'index-pack --stdin complains of non-repo' '
-	nongit test_must_fail git index-pack --stdin <foo.pack &&
+	nongit test_must_fail git index-pack --object-format=$(test_oid algo) --stdin <foo.pack &&
 	test_path_is_missing non-repo/.git
 '
 
 test_expect_success 'index-pack <pack> works in non-repo' '
-	nongit git index-pack ../foo.pack &&
+	nongit git index-pack --object-format=$(test_oid algo) ../foo.pack &&
 	test_path_is_file foo.idx
 '
 
 test_expect_success 'index-pack --strict <pack> works in non-repo' '
 	rm -f foo.idx &&
-	nongit git index-pack --strict ../foo.pack &&
+	nongit git index-pack --strict --object-format=$(test_oid algo) ../foo.pack &&
 	test_path_is_file foo.idx
 '
 


* [PATCH v2 43/44] bundle: detect hash algorithm when reading refs
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (41 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 42/44] t5300: pass --object-format to git index-pack brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  2020-05-25 19:59   ` [PATCH v2 44/44] remote-testgit: adapt for object-format brian m. carlson
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

Much like with the dumb HTTP transport, there isn't a way to explicitly
specify the hash algorithm when dealing with a bundle, so detect the
algorithm based on the length of the object IDs in the prerequisites and
ref advertisements.
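
A standalone sketch of the same idea for bundle header lines follows.
The function name is hypothetical, and the handling of a leading '-' on
prerequisite lines is an assumption made so the sketch works on raw
lines; the patch itself only measures up to the first space or newline.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the raw hash size in bytes implied by a
 * bundle header line, or 0 if the object ID length is not recognized.
 * Tip lines look like "<oid> <refname>"; prerequisite lines are assumed
 * here to look like "-<oid>" optionally followed by " <subject>".
 */
size_t sketch_bundle_rawsz(const char *line)
{
	size_t hexlen;

	if (*line == '-')	/* assumed prerequisite marker */
		line++;

	/* The object ID ends at the first space, newline, or end of string. */
	hexlen = strcspn(line, " \n");
	if (hexlen == 40)
		return 20;	/* SHA-1 */
	if (hexlen == 64)
		return 32;	/* SHA-256 */
	return 0;
}
```

Because strcspn() also stops at the terminating NUL, a prerequisite
line consisting of nothing but an object ID is measured correctly.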

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 bundle.c    | 22 +++++++++++++++++++++-
 bundle.h    |  1 +
 transport.c | 10 ++++++++--
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/bundle.c b/bundle.c
index 99439e07a1..2a0d744d3f 100644
--- a/bundle.c
+++ b/bundle.c
@@ -23,6 +23,17 @@ static void add_to_ref_list(const struct object_id *oid, const char *name,
 	list->nr++;
 }
 
+static const struct git_hash_algo *detect_hash_algo(struct strbuf *buf)
+{
+	size_t len = strcspn(buf->buf, " \n");
+	int algo;
+
+	algo = hash_algo_by_length(len / 2);
+	if (algo == GIT_HASH_UNKNOWN)
+		return NULL;
+	return &hash_algos[algo];
+}
+
 static int parse_bundle_header(int fd, struct bundle_header *header,
 			       const char *report_path)
 {
@@ -52,12 +63,21 @@ static int parse_bundle_header(int fd, struct bundle_header *header,
 		}
 		strbuf_rtrim(&buf);
 
+		if (!header->hash_algo) {
+			header->hash_algo = detect_hash_algo(&buf);
+			if (!header->hash_algo) {
+				error(_("unknown hash algorithm length"));
+				status = -1;
+				break;
+			}
+		}
+
 		/*
 		 * Tip lines have object name, SP, and refname.
 		 * Prerequisites have object name that is optionally
 		 * followed by SP and subject line.
 		 */
-		if (parse_oid_hex(buf.buf, &oid, &p) ||
+		if (parse_oid_hex_algop(buf.buf, &oid, &p, header->hash_algo) ||
 		    (*p && !isspace(*p)) ||
 		    (!is_prereq && !*p)) {
 			if (report_path)
diff --git a/bundle.h b/bundle.h
index ceab0c7475..2dc9442024 100644
--- a/bundle.h
+++ b/bundle.h
@@ -15,6 +15,7 @@ struct ref_list {
 struct bundle_header {
 	struct ref_list prerequisites;
 	struct ref_list references;
+	const struct git_hash_algo *hash_algo;
 };
 
 int is_bundle(const char *path, int quiet);
diff --git a/transport.c b/transport.c
index b43d985f90..38a432be69 100644
--- a/transport.c
+++ b/transport.c
@@ -143,6 +143,9 @@ static struct ref *get_refs_from_bundle(struct transport *transport,
 	data->fd = read_bundle_header(transport->url, &data->header);
 	if (data->fd < 0)
 		die(_("could not read bundle '%s'"), transport->url);
+
+	transport->hash_algo = data->header.hash_algo;
+
 	for (i = 0; i < data->header.references.nr; i++) {
 		struct ref_list_entry *e = data->header.references.list + i;
 		struct ref *ref = alloc_ref(e->name);
@@ -157,11 +160,14 @@ static int fetch_refs_from_bundle(struct transport *transport,
 			       int nr_heads, struct ref **to_fetch)
 {
 	struct bundle_transport_data *data = transport->data;
+	int ret;
 
 	if (!data->get_refs_from_bundle_called)
 		get_refs_from_bundle(transport, 0, NULL);
-	return unbundle(the_repository, &data->header, data->fd,
-			transport->progress ? BUNDLE_VERBOSE : 0);
+	ret = unbundle(the_repository, &data->header, data->fd,
+			   transport->progress ? BUNDLE_VERBOSE : 0);
+	transport->hash_algo = data->header.hash_algo;
+	return ret;
 }
 
 static int close_bundle(struct transport *transport)


* [PATCH v2 44/44] remote-testgit: adapt for object-format
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (42 preceding siblings ...)
  2020-05-25 19:59   ` [PATCH v2 43/44] bundle: detect hash algorithm when reading refs brian m. carlson
@ 2020-05-25 19:59   ` brian m. carlson
  43 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-05-25 19:59 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren

When using an algorithm other than SHA-1, we need the remote helper to
advertise support for the object-format extension and provide
information back to us so that we can properly parse refs and return
data. Ensure that the test remote helper understands these extensions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5801/git-remote-testgit | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
index 6b9f0b5dc7..1544d6dc6b 100755
--- a/t/t5801/git-remote-testgit
+++ b/t/t5801/git-remote-testgit
@@ -52,9 +52,11 @@ do
 		test -n "$GIT_REMOTE_TESTGIT_SIGNED_TAGS" && echo "signed-tags"
 		test -n "$GIT_REMOTE_TESTGIT_NO_PRIVATE_UPDATE" && echo "no-private-update"
 		echo 'option'
+		echo 'object-format'
 		echo
 		;;
 	list)
+		echo ":object-format $(git rev-parse --show-object-format=storage)"
 		git for-each-ref --format='? %(refname)' 'refs/heads/' 'refs/tags/'
 		head=$(git symbolic-ref HEAD)
 		echo "@$head HEAD"
@@ -139,6 +141,10 @@ do
 			test $val = "true" && force="true" || force=
 			echo "ok"
 			;;
+		object-format)
+			test $val = "true" && object_format="true" || object_format=
+			echo "ok"
+			;;
 		*)
 			echo "unsupported"
 			;;


* [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality
  2020-05-13  0:53 [PATCH 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                   ` (44 preceding siblings ...)
  2020-05-25 19:58 ` [PATCH v2 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
@ 2020-06-19 17:55 ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 01/44] t1050: match object ID paths in a hash-insensitive way brian m. carlson
                     ` (44 more replies)
  45 siblings, 45 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

This is part 2 of 3 of the SHA-256 work.  This series adds all of the
protocol logic to work with SHA-256 repositories.

v3 fixes a bug in patch 34 which prevented cloning an empty repository
with the dumb HTTP protocol.  We look up the hash algorithm by length of
the data in the info/refs file and if we have no refs, we have no
entries.

Previously, we just failed and complained, which isn't really helpful,
nor is it backward compatible.  So now we use whatever the default is
for the current repository.  That means we honor GIT_DEFAULT_HASH or git
clone -c, and default to SHA-1 otherwise.  Users are encouraged to
switch to the smart protocol if they need to distinguish the remote
side's hash algorithm when the repository is empty.

There are tests for the default hash behavior, but not for git clone -c,
because the extensions.objectformat option doesn't exist yet.  I have
tested that it does indeed work, though.

Otherwise, this series is the same as v2 except for a rebase (for my
convenience and Junio's).

Changes from v2:
* Rebase onto master.
* Fix cloning an empty repository with the dumb HTTP protocol.

Changes from v1:
* Fix spurious line additions and deletions.
* Rename len to linelen for easier understanding.
* Move the documentation comment for xstrncmpz to the header.
* Drop a useless variable (found).
* Update several commit messages to better explain things as suggested
  by Junio and Martin.
* Name the parameters for parse_feature_value for better documentation.
* Reduce the scope of variables when possible.
* Add explicit handling for missing object-format capabilities.
* Rename all new options to --object-format.
* Use oidcpy where possible.
* Test more failure cases.
* Have index-pack fail if --stdin and --object-format are both
  specified.
* Move and simplify t5704.
* Other miscellaneous cleanups to respond to review feedback.

Range-diff below.

brian m. carlson (44):
  t1050: match object ID paths in a hash-insensitive way
  Documentation: document v1 protocol object-format capability
  connect: have ref processing code take struct packet_reader
  wrapper: add function to compare strings with different NUL
    termination
  remote: advertise the object-format capability on the server side
  connect: add function to parse multiple v1 capability values
  connect: add function to fetch value of a v2 server capability
  pkt-line: add a member for hash algorithm
  transport: add a hash algorithm member
  connect: add function to detect supported v1 hash functions
  send-pack: detect when the server doesn't support our hash
  connect: make parse_feature_value extern
  fetch-pack: detect when the server doesn't support our hash
  connect: detect algorithm when fetching refs
  builtin/receive-pack: detect when the server doesn't support our hash
  docs: update remote helper docs for object-format extensions
  transport-helper: implement object-format extensions
  remote-curl: implement object-format extensions
  builtin/clone: initialize hash algorithm properly
  t5562: pass object-format in synthesized test data
  fetch-pack: parse and advertise the object-format capability
  setup: set the_repository's hash algo when checking format
  t3200: mark assertion with SHA1 prerequisite
  packfile: compute and use the index CRC offset
  t5302: modernize test formatting
  builtin/show-index: provide options to determine hash algo
  t1302: expect repo format version 1 for SHA-256
  Documentation/technical: document object-format for protocol v2
  connect: pass full packet reader when parsing v2 refs
  connect: parse v2 refs with correct hash algorithm
  serve: advertise object-format capability for protocol v2
  t5500: make hash independent
  builtin/ls-remote: initialize repository based on fetch
  remote-curl: detect algorithm for dumb HTTP by size
  builtin/index-pack: add option to specify hash algorithm
  t1050: pass algorithm to index-pack when outside repo
  remote-curl: avoid truncating refs with ls-remote
  t/helper: initialize the repository for test-sha1-array
  t5702: offer an object-format capability in the test
  t5703: use object-format serve option
  t5704: send object-format capability with SHA-256
  t5300: pass --object-format to git index-pack
  bundle: detect hash algorithm when reading refs
  remote-testgit: adapt for object-format

 Documentation/git-index-pack.txt              |   8 +
 Documentation/git-show-index.txt              |  11 +-
 Documentation/gitremote-helpers.txt           |  33 +-
 .../technical/protocol-capabilities.txt       |  15 +
 Documentation/technical/protocol-v2.txt       |   9 +
 builtin/clone.c                               |   9 +
 builtin/index-pack.c                          |  14 +-
 builtin/ls-remote.c                           |   4 +
 builtin/receive-pack.c                        |  10 +
 builtin/show-index.c                          |  29 +-
 bundle.c                                      |  22 +-
 bundle.h                                      |   1 +
 connect.c                                     | 138 +++++--
 connect.h                                     |   3 +
 fetch-pack.c                                  |  14 +
 git-compat-util.h                             |   6 +
 git.c                                         |   2 +-
 object-store.h                                |   1 +
 packfile.c                                    |   1 +
 pkt-line.c                                    |   1 +
 pkt-line.h                                    |   3 +
 remote-curl.c                                 |  46 ++-
 send-pack.c                                   |   6 +
 serve.c                                       |  27 ++
 setup.c                                       |   1 +
 t/helper/test-oid-array.c                     |   3 +
 t/t1050-large.sh                              |   6 +-
 t/t1302-repo-version.sh                       |   6 +-
 t/t3200-branch.sh                             |   2 +-
 t/t5300-pack-object.sh                        |   9 +-
 t/t5302-pack-index.sh                         | 360 +++++++++---------
 t/t5500-fetch-pack.sh                         |   5 +-
 t/t5550-http-fetch-dumb.sh                    |  18 +
 t/t5562-http-backend-content-length.sh        |   5 +-
 t/t5701-git-serve.sh                          |  25 ++
 t/t5702-protocol-v2.sh                        |   2 +
 t/t5703-upload-pack-ref-in-want.sh            |  19 +-
 t/t5704-protocol-violations.sh                |   2 +
 t/t5801/git-remote-testgit                    |   6 +
 t/test-lib.sh                                 |   1 +
 transport-helper.c                            |  24 +-
 transport.c                                   |  18 +-
 transport.h                                   |   8 +
 upload-pack.c                                 |   3 +-
 wrapper.c                                     |   8 +
 45 files changed, 696 insertions(+), 248 deletions(-)

Range-diff against v2:
 1:  5878fe6a98 =  1:  3504602e31 t1050: match object ID paths in a hash-insensitive way
 2:  402864eaa3 =  2:  150ccddb98 Documentation: document v1 protocol object-format capability
 3:  d124692e2f =  3:  b86ec9fffe connect: have ref processing code take struct packet_reader
 4:  cce29662b4 =  4:  f048e638e5 wrapper: add function to compare strings with different NUL termination
 5:  3b207e304b !  5:  99261e8221 remote: advertise the object-format capability on the server side
    @@ upload-pack.c
     @@ upload-pack.c: static int send_ref(const char *refname, const struct object_id *oid,
      		struct strbuf symref_info = STRBUF_INIT;
      
    - 		format_symref_info(&symref_info, cb_data);
    + 		format_symref_info(&symref_info, &data->symref);
     -		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s agent=%s\n",
     +		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s object-format=%s agent=%s\n",
      			     oid_to_hex(oid), refname_nons,
      			     0, capabilities,
      			     (allow_unadvertised_object_request & ALLOW_TIP_SHA1) ?
     @@ upload-pack.c: static int send_ref(const char *refname, const struct object_id *oid,
    - 			     stateless_rpc ? " no-done" : "",
    + 			     data->stateless_rpc ? " no-done" : "",
      			     symref_info.buf,
      			     allow_filter ? " filter" : "",
     +			     the_hash_algo->name,
 6:  235d7f5b8f =  6:  5504199a26 connect: add function to parse multiple v1 capability values
 7:  0324e126b1 =  7:  59d1b463bf connect: add function to fetch value of a v2 server capability
 8:  cdba3122ce =  8:  1b2789cab7 pkt-line: add a member for hash algorithm
 9:  c8233c3b42 =  9:  971d05e2c7 transport: add a hash algorithm member
10:  b9273c4021 = 10:  7b90abd41a connect: add function to detect supported v1 hash functions
11:  e2d37b75c8 = 11:  578676762d send-pack: detect when the server doesn't support our hash
12:  602734cbbb = 12:  131e98603a connect: make parse_feature_value extern
13:  d97fa2c8aa = 13:  a786478005 fetch-pack: detect when the server doesn't support our hash
14:  ba052f1da7 = 14:  3436a6db7b connect: detect algorithm when fetching refs
15:  661d94d4de = 15:  c8d0760e3f builtin/receive-pack: detect when the server doesn't support our hash
16:  fd8b85390c = 16:  944bf6ab9a docs: update remote helper docs for object-format extensions
17:  32285e611f = 17:  9f072d34dc transport-helper: implement object-format extensions
18:  a33d1ed9a0 = 18:  2bdc53a8d9 remote-curl: implement object-format extensions
19:  fffdf0780d = 19:  87c6cd32f7 builtin/clone: initialize hash algorithm properly
20:  f616f85b4b = 20:  e2cc4d34fe t5562: pass object-format in synthesized test data
21:  eca43da42e = 21:  34b712a983 fetch-pack: parse and advertise the object-format capability
22:  22c1a62e10 = 22:  3e43a7d314 setup: set the_repository's hash algo when checking format
23:  7c7f2263d5 = 23:  fcf0ef64f0 t3200: mark assertion with SHA1 prerequisite
24:  ee8a71a926 = 24:  192578b5dd packfile: compute and use the index CRC offset
25:  6afecf0b09 = 25:  a560ed3194 t5302: modernize test formatting
26:  99a847ba4e = 26:  9a079f99a0 builtin/show-index: provide options to determine hash algo
27:  9f7c7bafaf = 27:  56d97c9c94 t1302: expect repo format version 1 for SHA-256
28:  d0ea597d63 = 28:  8c25898ded Documentation/technical: document object-format for protocol v2
29:  51848df542 = 29:  39af318fb0 connect: pass full packet reader when parsing v2 refs
30:  b57361f3b8 ! 30:  5b334cb9b9 connect: parse v2 refs with correct hash algorithm
    @@ connect.c: static int process_ref_v2(struct packet_reader *reader, struct ref **
      			*list = &peeled->next;
      
     @@ connect.c: struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
    - 			     const struct string_list *server_options)
    + 			     int stateless_rpc)
      {
      	int i;
     +	const char *hash_name;
31:  a0c0f0f7a3 = 31:  955ea0b4cb serve: advertise object-format capability for protocol v2
32:  1694f3f838 = 32:  5172e56115 t5500: make hash independent
33:  902b394667 = 33:  79b576238f builtin/ls-remote: initialize repository based on fetch
34:  cc12b9b51f ! 34:  0ecf5c1a1f remote-curl: detect algorithm for dumb HTTP by size
    @@ Commit message
         provide one. Detect the hash algorithm in use by the size of the first
         object ID.
     
    +    If we have an empty repository, we don't know what the hash algorithm is
    +    on the remote side, so default to whatever the local side has
    +    configured.  Without doing this, we cannot clone an empty repository
    +    since we don't know its hash algorithm.  Test this case appropriately,
    +    since we currently have no tests for cloning an empty repository with
    +    the dumb HTTP protocol.
    +
         We anonymize the URL like elsewhere in the function in case the user has
         decided to include a secret in the URL.
     
    @@ remote-curl.c: static struct ref *parse_git_refs(struct discovery *heads, int fo
     +	const char *p = memchr(heads->buf, '\t', heads->len);
     +	int algo;
     +	if (!p)
    -+		return NULL;
    ++		return the_hash_algo;
     +
     +	algo = hash_algo_by_length((p - heads->buf) / 2);
     +	if (algo == GIT_HASH_UNKNOWN)
    @@ remote-curl.c: static struct ref *parse_info_refs(struct discovery *heads)
      			if (!refs)
      				refs = ref;
      			if (last_ref)
    +
    + ## t/t5550-http-fetch-dumb.sh ##
    +@@ t/t5550-http-fetch-dumb.sh: test_expect_success 'create password-protected repository' '
    + 	       "$HTTPD_DOCUMENT_ROOT_PATH/auth/dumb/repo.git"
    + '
    + 
    ++test_expect_success 'create empty remote repository' '
    ++	git init --bare "$HTTPD_DOCUMENT_ROOT_PATH/empty.git" &&
    ++	(cd "$HTTPD_DOCUMENT_ROOT_PATH/empty.git" &&
    ++	 mkdir -p hooks &&
    ++	 write_script "hooks/post-update" <<-\EOF &&
    ++	 exec git update-server-info
    ++	EOF
    ++	 hooks/post-update
    ++	)
    ++'
    ++
    ++test_expect_success 'empty dumb HTTP repository has default hash algorithm' '
    ++	test_when_finished "rm -fr clone-empty" &&
    ++	git clone $HTTPD_URL/dumb/empty.git clone-empty &&
    ++	git -C clone-empty rev-parse --show-object-format >empty-format &&
    ++	test "$(cat empty-format)" = "$(test_oid algo)"
    ++'
    ++
    + setup_askpass_helper
    + 
    + test_expect_success 'cloning password-protected repository can fail' '
35:  b5425c9f54 = 35:  95bc1ef6fb builtin/index-pack: add option to specify hash algorithm
36:  5c70c24d7a = 36:  89c389100e t1050: pass algorithm to index-pack when outside repo
37:  460d6008e8 = 37:  bd93aabe6b remote-curl: avoid truncating refs with ls-remote
38:  60a98d9b53 = 38:  a8e31600cc t/helper: initialize the repository for test-sha1-array
39:  b66c3ead37 = 39:  d217758a83 t5702: offer an object-format capability in the test
40:  af43274a1f = 40:  6d1964142c t5703: use object-format serve option
41:  f5085b1f3f = 41:  84cad9b3ba t5704: send object-format capability with SHA-256
42:  a1b01babda = 42:  1e96dbf3dc t5300: pass --object-format to git index-pack
43:  dbb5f7195e = 43:  14cb067334 bundle: detect hash algorithm when reading refs
44:  6c823bbe68 = 44:  816d08eb2e remote-testgit: adapt for object-format
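
The length-based detection described in the range-diff for patch 34 can
be sketched as follows.  This is an illustration under stated
assumptions, not the code in remote-curl.c: enum hash_kind and
detect_hash are hypothetical names, and the sketch compares hex lengths
(40 for SHA-1, 64 for SHA-256) directly rather than calling
hash_algo_by_length.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for git's hash algorithm identifiers. */
enum hash_kind { HASH_UNKNOWN, HASH_SHA1, HASH_SHA256 };

/*
 * Guess the hash algorithm from the first line of an info/refs body,
 * which has the form "<hex-oid>\t<refname>\n".  An empty body has no
 * tab and therefore carries no information, so fall back to the
 * caller's local algorithm in that case.
 */
static enum hash_kind detect_hash(const char *buf, size_t len,
				  enum hash_kind local)
{
	const char *tab;

	assert(buf || !len);
	tab = len ? memchr(buf, '\t', len) : NULL;
	if (!tab)
		return local;
	/* The distance to the first tab is the hex length of the OID. */
	switch (tab - buf) {
	case 40: return HASH_SHA1;
	case 64: return HASH_SHA256;
	default: return HASH_UNKNOWN;
	}
}
```

Falling back to the local algorithm for an empty body mirrors the change
in v3 that makes cloning an empty repository over dumb HTTP work.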

* [PATCH v3 01/44] t1050: match object ID paths in a hash-insensitive way
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 02/44] Documentation: document v1 protocol object-format capability brian m. carlson
                     ` (43 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

The pattern here looking for failures is specific to SHA-1.  Let's
create a variable that matches the regex or glob pattern for a path
within the objects directory.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1050-large.sh | 2 +-
 t/test-lib.sh    | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 184b479a21..7f88ea07c2 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -64,7 +64,7 @@ test_expect_success 'add a large file or two' '
 	test $count = 1 &&
 	cnt=$(git show-index <"$idx" | wc -l) &&
 	test $cnt = 2 &&
-	for l in .git/objects/??/??????????????????????????????????????
+	for l in .git/objects/$OIDPATH_REGEX
 	do
 		test_path_is_file "$l" || continue
 		bad=t
diff --git a/t/test-lib.sh b/t/test-lib.sh
index dbc027ff26..618a7c8d5b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1414,6 +1414,7 @@ test_oid_init
 
 ZERO_OID=$(test_oid zero)
 OID_REGEX=$(echo $ZERO_OID | sed -e 's/0/[0-9a-f]/g')
+OIDPATH_REGEX=$(test_oid_to_path $ZERO_OID | sed -e 's/0/[0-9a-f]/g')
 EMPTY_TREE=$(test_oid empty_tree)
 EMPTY_BLOB=$(test_oid empty_blob)
 _z40=$ZERO_OID

* [PATCH v3 02/44] Documentation: document v1 protocol object-format capability
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 01/44] t1050: match object ID paths in a hash-insensitive way brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 03/44] connect: have ref processing code take struct packet_reader brian m. carlson
                     ` (42 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Document a capability that indicates which hash algorithms are in use by
both sides of a remote connection.  Use the term "object-format", since
this is the term used for the repository extension as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
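
A minimal sketch of the rule documented below, for illustration only.
server_supports_hash is a hypothetical helper, not a function in this
series, and the capability strings in the note after it are made up.  It
treats a capability list with no object-format entry as SHA-1 only, as
the new text specifies.

```c
#include <assert.h>
#include <string.h>

/*
 * Decide whether a v1 capability list advertises support for the hash
 * algorithm `want` (for example "sha1" or "sha256").  The capability
 * may appear more than once; any matching entry counts.
 */
static int server_supports_hash(const char *caps, const char *want)
{
	static const char key[] = "object-format=";
	const char *p = caps;
	size_t wlen;
	int seen = 0;

	assert(caps && want);
	wlen = strlen(want);
	while ((p = strstr(p, key)) != NULL) {
		const char *val = p + strlen(key);
		size_t vlen = strcspn(val, " \t\n");

		/* Ignore matches that do not start a list entry. */
		if (p == caps || p[-1] == ' ') {
			seen = 1;
			/* Compare whole values so "sha2" never matches "sha256". */
			if (vlen == wlen && !memcmp(val, want, wlen))
				return 1;
		}
		p = val + vlen;
	}
	/* No object-format entry at all: only SHA-1 is assumed. */
	return seen ? 0 : !strcmp(want, "sha1");
}
```

For example, server_supports_hash("multi_ack object-format=sha256",
"sha256") returns 1, while the same list queried for "sha1" returns 0.
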
 Documentation/technical/protocol-capabilities.txt | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt
index 2b267c0da6..36ccd14f97 100644
--- a/Documentation/technical/protocol-capabilities.txt
+++ b/Documentation/technical/protocol-capabilities.txt
@@ -176,6 +176,21 @@ agent strings are purely informative for statistics and debugging
 purposes, and MUST NOT be used to programmatically assume the presence
 or absence of particular features.
 
+object-format
+-------------
+
+This capability, which takes a hash algorithm as an argument, indicates
+that the server supports the given hash algorithms.  It may be sent
+multiple times; if so, the first one given is the one used in the ref
+advertisement.
+
+When provided by the client, this indicates that it intends to use the
+given hash algorithm to communicate.  The algorithm provided must be one
+that the server supports.
+
+If this capability is not provided, it is assumed that the only
+supported algorithm is SHA-1.
+
 symref
 ------
 

* [PATCH v3 03/44] connect: have ref processing code take struct packet_reader
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 01/44] t1050: match object ID paths in a hash-insensitive way brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 02/44] Documentation: document v1 protocol object-format capability brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 04/44] wrapper: add function to compare strings with different NUL termination brian m. carlson
                     ` (41 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

In a future patch, we'll want to access multiple members from struct
packet_reader when parsing references.  Therefore, have the ref parsing
code take a pointer to struct packet_reader instead of passing multiple
arguments to each function.

Rename the len parameter of process_capabilities to "linelen" to make
it clearer what it holds now that the line itself comes from the reader.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/connect.c b/connect.c
index 0df45a1108..e66e779ebd 100644
--- a/connect.c
+++ b/connect.c
@@ -205,17 +205,19 @@ static void annotate_refs_with_symref_info(struct ref *ref)
 	string_list_clear(&symref, 0);
 }
 
-static void process_capabilities(const char *line, int *len)
+static void process_capabilities(struct packet_reader *reader, int *linelen)
 {
+	const char *line = reader->line;
 	int nul_location = strlen(line);
-	if (nul_location == *len)
+	if (nul_location == *linelen)
 		return;
 	server_capabilities_v1 = xstrdup(line + nul_location + 1);
-	*len = nul_location;
+	*linelen = nul_location;
 }
 
-static int process_dummy_ref(const char *line)
+static int process_dummy_ref(const struct packet_reader *reader)
 {
+	const char *line = reader->line;
 	struct object_id oid;
 	const char *name;
 
@@ -235,9 +237,11 @@ static void check_no_capabilities(const char *line, int len)
 			line + strlen(line));
 }
 
-static int process_ref(const char *line, int len, struct ref ***list,
-		       unsigned int flags, struct oid_array *extra_have)
+static int process_ref(const struct packet_reader *reader, int len,
+		       struct ref ***list, unsigned int flags,
+		       struct oid_array *extra_have)
 {
+	const char *line = reader->line;
 	struct object_id old_oid;
 	const char *name;
 
@@ -261,9 +265,10 @@ static int process_ref(const char *line, int len, struct ref ***list,
 	return 1;
 }
 
-static int process_shallow(const char *line, int len,
+static int process_shallow(const struct packet_reader *reader, int len,
 			   struct oid_array *shallow_points)
 {
+	const char *line = reader->line;
 	const char *arg;
 	struct object_id old_oid;
 
@@ -317,20 +322,20 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 
 		switch (state) {
 		case EXPECTING_FIRST_REF:
-			process_capabilities(reader->line, &len);
-			if (process_dummy_ref(reader->line)) {
+			process_capabilities(reader, &len);
+			if (process_dummy_ref(reader)) {
 				state = EXPECTING_SHALLOW;
 				break;
 			}
 			state = EXPECTING_REF;
 			/* fallthrough */
 		case EXPECTING_REF:
-			if (process_ref(reader->line, len, &list, flags, extra_have))
+			if (process_ref(reader, len, &list, flags, extra_have))
 				break;
 			state = EXPECTING_SHALLOW;
 			/* fallthrough */
 		case EXPECTING_SHALLOW:
-			if (process_shallow(reader->line, len, shallow_points))
+			if (process_shallow(reader, len, shallow_points))
 				break;
 			die(_("protocol error: unexpected '%s'"), reader->line);
 		case EXPECTING_DONE:

* [PATCH v3 04/44] wrapper: add function to compare strings with different NUL termination
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (2 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 03/44] connect: have ref processing code take struct packet_reader brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 05/44] remote: advertise the object-format capability on the server side brian m. carlson
                     ` (40 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When parsing capabilities for the pack protocol, there are times we'll
want to compare the value of a capability to a NUL-terminated string.
Since the data we're reading will be space-terminated, not
NUL-terminated, we need a function that compares the two strings, but
also checks that they're the same length.  Otherwise, if we used strncmp
to compare these strings, we might accidentally accept a parameter that
was a prefix of the expected value.

Add a function, xstrncmpz, that takes a NUL-terminated string and a
non-NUL-terminated string, plus a length, and compares them, ensuring
that they are the same length.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
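
To illustrate the prefix problem described above, here is a small sketch.
xstrncmpz is copied from the wrapper.c hunk below; value_matches is a
hypothetical helper added only for this example, and the capability
strings in the note after it are made up.

```c
#include <assert.h>
#include <string.h>

/* Same logic as the xstrncmpz added to wrapper.c in this patch. */
static int xstrncmpz(const char *s, const char *t, size_t len)
{
	int res = strncmp(s, t, len);
	if (res)
		return res;
	return s[len] == '\0' ? 0 : 1;
}

/*
 * Hypothetical helper: is the space-terminated capability value
 * val[0..len) exactly the NUL-terminated string `expected`?
 */
static int value_matches(const char *expected, const char *val, size_t len)
{
	assert(expected && val);
	return !xstrncmpz(expected, val, len);
}
```

With val pointing at "sha2 agent=x" and len 4, strncmp("sha256", val, 4)
returns 0 and would wrongly accept the truncated value, whereas
value_matches("sha256", val, 4) is false because "sha256" is longer than
four characters.
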
 git-compat-util.h | 6 ++++++
 wrapper.c         | 8 ++++++++
 2 files changed, 14 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index a73632e8e4..5637114b8d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -868,6 +868,12 @@ char *xgetcwd(void);
 FILE *fopen_for_writing(const char *path);
 FILE *fopen_or_warn(const char *path, const char *mode);
 
+/*
+ * Like strncmp, but only return zero if s is NUL-terminated and exactly len
+ * characters long.  If it is not, consider it greater than t.
+ */
+int xstrncmpz(const char *s, const char *t, size_t len);
+
 /*
  * FREE_AND_NULL(ptr) is like free(ptr) followed by ptr = NULL. Note
  * that ptr is used twice, so don't pass e.g. ptr++.
diff --git a/wrapper.c b/wrapper.c
index 3a1c0e0526..4ff4a9c3db 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -105,6 +105,14 @@ char *xstrndup(const char *str, size_t len)
 	return xmemdupz(str, p ? p - str : len);
 }
 
+int xstrncmpz(const char *s, const char *t, size_t len)
+{
+	int res = strncmp(s, t, len);
+	if (res)
+		return res;
+	return s[len] == '\0' ? 0 : 1;
+}
+
 void *xrealloc(void *ptr, size_t size)
 {
 	void *ret;

* [PATCH v3 05/44] remote: advertise the object-format capability on the server side
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (3 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 04/44] wrapper: add function to compare strings with different NUL termination brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 06/44] connect: add function to parse multiple v1 capability values brian m. carlson
                     ` (39 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Advertise the current hash algorithm in use by using the object-format
capability as part of the ref advertisement.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/receive-pack.c | 1 +
 upload-pack.c          | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index ea3d0f01af..4ffa501dce 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -249,6 +249,7 @@ static void show_ref(const char *path, const struct object_id *oid)
 			strbuf_addf(&cap, " push-cert=%s", push_cert_nonce);
 		if (advertise_push_options)
 			strbuf_addstr(&cap, " push-options");
+		strbuf_addf(&cap, " object-format=%s", the_hash_algo->name);
 		strbuf_addf(&cap, " agent=%s", git_user_agent_sanitized());
 		packet_write_fmt(1, "%s %s%c%s\n",
 			     oid_to_hex(oid), path, 0, cap.buf);
diff --git a/upload-pack.c b/upload-pack.c
index 401c9e6c4b..a72eef9d14 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -1063,7 +1063,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		struct strbuf symref_info = STRBUF_INIT;
 
 		format_symref_info(&symref_info, &data->symref);
-		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s agent=%s\n",
+		packet_write_fmt(1, "%s %s%c%s%s%s%s%s%s object-format=%s agent=%s\n",
 			     oid_to_hex(oid), refname_nons,
 			     0, capabilities,
 			     (allow_unadvertised_object_request & ALLOW_TIP_SHA1) ?
@@ -1073,6 +1073,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 			     data->stateless_rpc ? " no-done" : "",
 			     symref_info.buf,
 			     allow_filter ? " filter" : "",
+			     the_hash_algo->name,
 			     git_user_agent_sanitized());
 		strbuf_release(&symref_info);
 	} else {

* [PATCH v3 06/44] connect: add function to parse multiple v1 capability values
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (4 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 05/44] remote: advertise the object-format capability on the server side brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 07/44] connect: add function to fetch value of a v2 server capability brian m. carlson
                     ` (38 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

In a capability response, we can have multiple symref entries.  In the
future, we will also allow for multiple hash algorithms to be specified.
To avoid duplication, expand the parse_feature_value function to take an
optional offset where the parsing should begin next time.  Add a wrapper
function that allows us to query the next server feature value, and use
it in the existing symref parsing code.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
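
The offset-based iteration described above can be sketched as follows.
This is a simplified illustration, not the code in connect.c:
next_feature_value is a hypothetical name, it takes the capability list
as a parameter instead of reading server_capabilities_v1, and it only
handles features that carry a value.  The sketch keeps the offset
relative to the start of the list.

```c
#include <assert.h>
#include <string.h>

/*
 * Find the next "feature=value" entry in a space-separated list,
 * starting the search at *offset.  On success, return a pointer to the
 * value, store its length in *lenp, and advance *offset past the value
 * so that the next call resumes from there.  Return NULL when no
 * further entry exists.
 */
static const char *next_feature_value(const char *list, const char *feature,
				      int *lenp, int *offset)
{
	size_t flen;
	const char *p;

	assert(list && feature && lenp && offset);
	flen = strlen(feature);
	p = list + *offset;
	while (*p) {
		const char *found = strstr(p, feature);
		if (!found)
			return NULL;
		/* Only accept a match at the start of a list entry. */
		if ((found == list || found[-1] == ' ') && found[flen] == '=') {
			const char *value = found + flen + 1;
			*lenp = (int)strcspn(value, " \t\n");
			*offset = (int)(value + *lenp - list);
			return value;
		}
		p = found + flen;
	}
	return NULL;
}
```

A caller starts with offset = 0 and loops until NULL is returned, which
is the shape annotate_refs_with_symref_info takes after this patch.
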
 connect.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/connect.c b/connect.c
index e66e779ebd..d30c637ab3 100644
--- a/connect.c
+++ b/connect.c
@@ -18,7 +18,8 @@
 
 static char *server_capabilities_v1;
 static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
-static const char *parse_feature_value(const char *, const char *, int *);
+static const char *parse_feature_value(const char *, const char *, int *, int *);
+static const char *next_server_feature_value(const char *feature, int *len, int *offset);
 
 static int check_ref(const char *name, unsigned int flags)
 {
@@ -181,17 +182,16 @@ static void parse_one_symref_info(struct string_list *symref, const char *val, i
 static void annotate_refs_with_symref_info(struct ref *ref)
 {
 	struct string_list symref = STRING_LIST_INIT_DUP;
-	const char *feature_list = server_capabilities_v1;
+	int offset = 0;
 
-	while (feature_list) {
+	while (1) {
 		int len;
 		const char *val;
 
-		val = parse_feature_value(feature_list, "symref", &len);
+		val = next_server_feature_value("symref", &len, &offset);
 		if (!val)
 			break;
 		parse_one_symref_info(&symref, val, len);
-		feature_list = val + 1;
 	}
 	string_list_sort(&symref);
 
@@ -468,7 +468,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	return list;
 }
 
-static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp)
+static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
 {
 	int len;
 
@@ -476,6 +476,8 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 		return NULL;
 
 	len = strlen(feature);
+	if (offset)
+		feature_list += *offset;
 	while (*feature_list) {
 		const char *found = strstr(feature_list, feature);
 		if (!found)
@@ -490,9 +492,14 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 			}
 			/* feature with a value (e.g., "agent=git/1.2.3") */
 			else if (*value == '=') {
+				int end;
+
 				value++;
+				end = strcspn(value, " \t\n");
 				if (lenp)
-					*lenp = strcspn(value, " \t\n");
+					*lenp = end;
+				if (offset)
+					*offset = value + end - feature_list;
 				return value;
 			}
 			/*
@@ -507,12 +514,17 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 
 int parse_feature_request(const char *feature_list, const char *feature)
 {
-	return !!parse_feature_value(feature_list, feature, NULL);
+	return !!parse_feature_value(feature_list, feature, NULL, NULL);
+}
+
+static const char *next_server_feature_value(const char *feature, int *len, int *offset)
+{
+	return parse_feature_value(server_capabilities_v1, feature, len, offset);
 }
 
 const char *server_feature_value(const char *feature, int *len)
 {
-	return parse_feature_value(server_capabilities_v1, feature, len);
+	return parse_feature_value(server_capabilities_v1, feature, len, NULL);
 }
 
 int server_supports(const char *feature)

* [PATCH v3 07/44] connect: add function to fetch value of a v2 server capability
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (5 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 06/44] connect: add function to parse multiple v1 capability values brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 08/44] pkt-line: add a member for hash algorithm brian m. carlson
                     ` (37 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

So far in protocol v2, none of the server capabilities that carry
values have had a value we needed to parse.  For example, we receive but
ignore the agent value.

However, in a future commit, we're going to want to parse out the value
of a server capability.  To make this easy, add a function,
server_feature_v2, that can fetch the value provided as part of the
server capability.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
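
A sketch of the lookup this function performs, for illustration only.
feature_value_v2 is a hypothetical stand-in: the real server_feature_v2
walks the file-local server_capabilities_v2 array and uses skip_prefix,
while this version takes the array as a parameter and uses strncmp.  The
capability strings in the note after it are made up.

```c
#include <assert.h>
#include <string.h>

/*
 * Search an array of "name" or "name=value" capability strings.
 * Return 1 and point *value just past "name=" if the capability is
 * present with a value; return 0 otherwise.
 */
static int feature_value_v2(const char *const *caps, size_t ncaps,
			    const char *name, const char **value)
{
	size_t i, nlen;

	assert(caps && name && value);
	nlen = strlen(name);
	for (i = 0; i < ncaps; i++) {
		/* Require '=' right after the name to rule out prefixes. */
		if (!strncmp(caps[i], name, nlen) && caps[i][nlen] == '=') {
			*value = caps[i] + nlen + 1;
			return 1;
		}
	}
	return 0;
}
```

Given the entries "agent=git/2.27.0", "ls-refs" and
"object-format=sha256", looking up "object-format" yields "sha256",
while "ls-refs" (no value) and "object" (only a prefix) both return 0.
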
 connect.c | 15 +++++++++++++++
 connect.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/connect.c b/connect.c
index d30c637ab3..2cded78b0a 100644
--- a/connect.c
+++ b/connect.c
@@ -84,6 +84,21 @@ int server_supports_v2(const char *c, int die_on_error)
 	return 0;
 }
 
+int server_feature_v2(const char *c, const char **v)
+{
+	int i;
+
+	for (i = 0; i < server_capabilities_v2.argc; i++) {
+		const char *out;
+		if (skip_prefix(server_capabilities_v2.argv[i], c, &out) &&
+		    (*out == '=')) {
+			*v = out + 1;
+			return 1;
+		}
+	}
+	return 0;
+}
+
 int server_supports_feature(const char *c, const char *feature,
 			    int die_on_error)
 {
diff --git a/connect.h b/connect.h
index 235bc66254..88702fdd17 100644
--- a/connect.h
+++ b/connect.h
@@ -19,6 +19,7 @@ struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
 int server_supports_v2(const char *c, int die_on_error);
+int server_feature_v2(const char *c, const char **v);
 int server_supports_feature(const char *c, const char *feature,
 			    int die_on_error);
 

* [PATCH v3 08/44] pkt-line: add a member for hash algorithm
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (6 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 07/44] connect: add function to fetch value of a v2 server capability brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 09/44] transport: add a hash algorithm member brian m. carlson
                     ` (36 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Add a member for the hash algorithm currently in use to the packet
reader so it can parse references correctly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 pkt-line.c | 1 +
 pkt-line.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 8f9bc68ee2..844c253ccd 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -490,6 +490,7 @@ void packet_reader_init(struct packet_reader *reader, int fd,
 	reader->buffer_size = sizeof(packet_buffer);
 	reader->options = options;
 	reader->me = "git";
+	reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
 }
 
 enum packet_read_status packet_reader_read(struct packet_reader *reader)
diff --git a/pkt-line.h b/pkt-line.h
index 5b373fe4cd..8c90daa59e 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -177,6 +177,9 @@ struct packet_reader {
 
 	unsigned use_sideband : 1;
 	const char *me;
+
+	/* hash algorithm in use */
+	const struct git_hash_algo *hash_algo;
 };
 
 /*

* [PATCH v3 09/44] transport: add a hash algorithm member
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (7 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 08/44] pkt-line: add a member for hash algorithm brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 10/44] connect: add function to detect supported v1 hash functions brian m. carlson
                     ` (35 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When connecting to a remote system, we need to know what hash algorithm
it will be using to talk to us.  Add a hash_algo member to struct
transport and add a function to read this data from the transport
object.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 transport.c | 8 ++++++++
 transport.h | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/transport.c b/transport.c
index 7d50c502ad..a016f41702 100644
--- a/transport.c
+++ b/transport.c
@@ -312,6 +312,7 @@ static struct ref *handshake(struct transport *transport, int for_push,
 		BUG("unknown protocol version");
 	}
 	data->got_remote_heads = 1;
+	transport->hash_algo = reader.hash_algo;
 
 	if (reader.line_peeked)
 		BUG("buffer must be empty at the end of handshake()");
@@ -988,9 +989,16 @@ struct transport *transport_get(struct remote *remote, const char *url)
 			ret->smart_options->receivepack = remote->receivepack;
 	}
 
+	ret->hash_algo = &hash_algos[GIT_HASH_SHA1];
+
 	return ret;
 }
 
+const struct git_hash_algo *transport_get_hash_algo(struct transport *transport)
+{
+	return transport->hash_algo;
+}
+
 int transport_set_option(struct transport *transport,
 			 const char *name, const char *value)
 {
diff --git a/transport.h b/transport.h
index 4298c855be..2a9f96c05a 100644
--- a/transport.h
+++ b/transport.h
@@ -115,6 +115,8 @@ struct transport {
 	struct git_transport_options *smart_options;
 
 	enum transport_family family;
+
+	const struct git_hash_algo *hash_algo;
 };
 
 #define TRANSPORT_PUSH_ALL			(1<<0)
@@ -243,6 +245,12 @@ int transport_push(struct repository *repo,
 const struct ref *transport_get_remote_refs(struct transport *transport,
 					    const struct argv_array *ref_prefixes);
 
+/*
+ * Fetch the hash algorithm used by a remote.
+ *
+ * This can only be called after fetching the remote refs.
+ */
+const struct git_hash_algo *transport_get_hash_algo(struct transport *transport);
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_pack(struct transport *transport);
 int transport_disconnect(struct transport *transport);


* [PATCH v3 10/44] connect: add function to detect supported v1 hash functions
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (8 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 09/44] transport: add a hash algorithm member brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 11/44] send-pack: detect when the server doesn't support our hash brian m. carlson
                     ` (34 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Add a function, server_supports_hash, to see if the remote server
supports a particular hash algorithm when speaking protocol v1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
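For illustration only, a minimal standalone sketch of the lookup described
above, assuming a space-separated v1 capability string.  The names
next_value and caps_support_hash are hypothetical and are not Git's API;
the patch itself uses next_server_feature_value() and xstrncmpz().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find the next "feature=value" entry in a
 * space-separated capability list, starting at *offset.  Returns a
 * pointer to the value and stores its length in *len, or NULL if no
 * further entry exists.
 */
static const char *next_value(const char *caps, const char *feature,
			      size_t *len, size_t *offset)
{
	size_t flen = strlen(feature);
	const char *p = caps + *offset;

	while (*p) {
		const char *end = p + strcspn(p, " ");

		if ((size_t)(end - p) > flen &&
		    !strncmp(p, feature, flen) && p[flen] == '=') {
			*len = (size_t)(end - p) - flen - 1;
			*offset = (size_t)(end - caps);
			return p + flen + 1;
		}
		p = end + strspn(end, " ");
	}
	*offset = (size_t)(p - caps);
	return NULL;
}

/*
 * Returns 1 if "desired" is among the advertised object-format values.
 * If no object-format capability is advertised at all, only "sha1" is
 * accepted, mirroring the SHA-1 default in the patch.
 */
int caps_support_hash(const char *caps, const char *desired, int *advertised)
{
	size_t len, offset = 0;
	const char *hash = next_value(caps, "object-format", &len, &offset);

	if (advertised)
		*advertised = hash != NULL;
	if (!hash)
		return !strcmp(desired, "sha1");
	for (; hash; hash = next_value(caps, "object-format", &len, &offset))
		if (strlen(desired) == len && !strncmp(desired, hash, len))
			return 1;
	return 0;
}
```

The comparison checks both the length and the bytes, so a truncated value
such as "sha25" does not match "sha256".
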
 connect.c | 22 ++++++++++++++++++++++
 connect.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/connect.c b/connect.c
index 2cded78b0a..a52b038865 100644
--- a/connect.c
+++ b/connect.c
@@ -527,6 +527,28 @@ static const char *parse_feature_value(const char *feature_list, const char *fea
 	return NULL;
 }
 
+int server_supports_hash(const char *desired, int *feature_supported)
+{
+	int offset = 0;
+	int len;
+	const char *hash;
+
+	hash = next_server_feature_value("object-format", &len, &offset);
+	if (feature_supported)
+		*feature_supported = !!hash;
+	if (!hash) {
+		hash = hash_algos[GIT_HASH_SHA1].name;
+		len = strlen(hash);
+	}
+	while (hash) {
+		if (!xstrncmpz(desired, hash, len))
+			return 1;
+
+		hash = next_server_feature_value("object-format", &len, &offset);
+	}
+	return 0;
+}
+
 int parse_feature_request(const char *feature_list, const char *feature)
 {
 	return !!parse_feature_value(feature_list, feature, NULL, NULL);
diff --git a/connect.h b/connect.h
index 88702fdd17..c53976f7ec 100644
--- a/connect.h
+++ b/connect.h
@@ -18,6 +18,7 @@ int url_is_local_not_ssh(const char *url);
 struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
+int server_supports_hash(const char *desired, int *feature_supported);
 int server_supports_v2(const char *c, int die_on_error);
 int server_feature_v2(const char *c, const char **v);
 int server_supports_feature(const char *c, const char *feature,


* [PATCH v3 11/44] send-pack: detect when the server doesn't support our hash
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (9 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 10/44] connect: add function to detect supported v1 hash functions brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 12/44] connect: make parse_feature_value extern brian m. carlson
                     ` (33 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Detect when the server doesn't support our hash algorithm and abort.
If the server does support our hash, advertise it as part of our
capabilities.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
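For illustration only, a small sketch of the "advertise only what the
server offered" rule, assuming the client always sends report-status.
build_push_caps is a hypothetical name, not a function in send-pack.c, and
the "advertised" flag stands in for the value that server_supports_hash()
reports through its second argument.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the client's capability string into buf.
 * The object-format capability is added only if the server advertised
 * it, so that older servers never see a capability they do not know.
 * Returns 0 on success, -1 if buf is too small.
 */
int build_push_caps(char *buf, size_t size, int advertised,
		    const char *hash_name)
{
	int n;

	if (advertised)
		n = snprintf(buf, size, " report-status object-format=%s",
			     hash_name);
	else
		n = snprintf(buf, size, " report-status");
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```
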
 send-pack.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/send-pack.c b/send-pack.c
index 0abee22283..02aefcb08e 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -363,6 +363,7 @@ int send_pack(struct send_pack_args *args,
 	int atomic_supported = 0;
 	int use_push_options = 0;
 	int push_options_supported = 0;
+	int object_format_supported = 0;
 	unsigned cmds_sent = 0;
 	int ret;
 	struct async demux;
@@ -389,6 +390,9 @@ int send_pack(struct send_pack_args *args,
 	if (server_supports("push-options"))
 		push_options_supported = 1;
 
+	if (!server_supports_hash(the_hash_algo->name, &object_format_supported))
+		die(_("the receiving end does not support this repository's hash algorithm"));
+
 	if (args->push_cert != SEND_PACK_PUSH_CERT_NEVER) {
 		int len;
 		push_cert_nonce = server_feature_value("push-cert", &len);
@@ -429,6 +433,8 @@ int send_pack(struct send_pack_args *args,
 		strbuf_addstr(&cap_buf, " atomic");
 	if (use_push_options)
 		strbuf_addstr(&cap_buf, " push-options");
+	if (object_format_supported)
+		strbuf_addf(&cap_buf, " object-format=%s", the_hash_algo->name);
 	if (agent_supported)
 		strbuf_addf(&cap_buf, " agent=%s", git_user_agent_sanitized());
 


* [PATCH v3 12/44] connect: make parse_feature_value extern
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (10 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 11/44] send-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 13/44] fetch-pack: detect when the server doesn't support our hash brian m. carlson
                     ` (32 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

We're going to use parse_feature_value in other files, so stop marking
it static and declare it in connect.h.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 3 +--
 connect.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/connect.c b/connect.c
index a52b038865..8f70a05699 100644
--- a/connect.c
+++ b/connect.c
@@ -18,7 +18,6 @@
 
 static char *server_capabilities_v1;
 static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
-static const char *parse_feature_value(const char *, const char *, int *, int *);
 static const char *next_server_feature_value(const char *feature, int *len, int *offset);
 
 static int check_ref(const char *name, unsigned int flags)
@@ -483,7 +482,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	return list;
 }
 
-static const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
+const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset)
 {
 	int len;
 
diff --git a/connect.h b/connect.h
index c53976f7ec..c53586e929 100644
--- a/connect.h
+++ b/connect.h
@@ -19,6 +19,7 @@ struct packet_reader;
 enum protocol_version discover_version(struct packet_reader *reader);
 
 int server_supports_hash(const char *desired, int *feature_supported);
+const char *parse_feature_value(const char *feature_list, const char *feature, int *lenp, int *offset);
 int server_supports_v2(const char *c, int die_on_error);
 int server_feature_v2(const char *c, const char **v);
 int server_supports_feature(const char *c, const char *feature,


* [PATCH v3 13/44] fetch-pack: detect when the server doesn't support our hash
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (11 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 12/44] connect: make parse_feature_value extern brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 14/44] connect: detect algorithm when fetching refs brian m. carlson
                     ` (31 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Detect when the server doesn't support our hash algorithm and abort.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fetch-pack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index d8bbf45ee2..c090030680 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1040,6 +1040,8 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		print_verbose(args, _("Server supports %s"), "deepen-relative");
 	else if (args->deepen_relative)
 		die(_("Server does not support --deepen"));
+	if (!server_supports_hash(the_hash_algo->name, NULL))
+		die(_("Server does not support this repository's object format"));
 
 	if (!args->no_dependents) {
 		mark_complete_and_common_ref(negotiator, args, &ref);


* [PATCH v3 14/44] connect: detect algorithm when fetching refs
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (12 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 13/44] fetch-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 15/44] builtin/receive-pack: detect when the server doesn't support our hash brian m. carlson
                     ` (30 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

If we're fetching refs, detect the hash algorithm and parse the refs
using that algorithm.

As mentioned in the documentation, if the server advertises more than
one object-format capability, we use the first.  No known implementation
supports multiple algorithms now, but they may in the future.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
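For illustration only, a standalone sketch of the detection step, assuming
the first advertised ref line has the form "<oid> <name>\0<capabilities>"
and is NUL-terminated after the capabilities.  oid_hexsz_from_first_line
is a hypothetical name.  Unlike the patch, which keeps the reader's
existing algorithm when the name is unknown, the sketch returns 0 in that
case.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the hex length of object IDs implied by
 * the first ref line: 40 for SHA-1, 64 for SHA-256, 0 for an algorithm
 * this sketch does not know.  linelen is the full packet length,
 * including any capabilities after the embedded NUL.
 */
size_t oid_hexsz_from_first_line(const char *line, size_t linelen)
{
	static const char key[] = "object-format=";
	const size_t klen = sizeof(key) - 1;
	size_t nul = strlen(line);
	const char *p;

	if (nul == linelen)
		return 40;	/* no capability list: assume SHA-1 */

	for (p = line + nul + 1; *p; p += strspn(p, " ")) {
		size_t toklen = strcspn(p, " ");

		if (toklen >= klen && !strncmp(p, key, klen)) {
			const char *val = p + klen;
			size_t vlen = toklen - klen;

			/* The first object-format value wins. */
			if (vlen == 4 && !strncmp(val, "sha1", 4))
				return 40;
			if (vlen == 6 && !strncmp(val, "sha256", 6))
				return 64;
			return 0;
		}
		p += toklen;
	}
	return 40;	/* capability absent: assume SHA-1 */
}
```
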
 connect.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/connect.c b/connect.c
index 8f70a05699..b6e110cb24 100644
--- a/connect.c
+++ b/connect.c
@@ -221,12 +221,25 @@ static void annotate_refs_with_symref_info(struct ref *ref)
 
 static void process_capabilities(struct packet_reader *reader, int *linelen)
 {
+	const char *feat_val;
+	int feat_len;
 	const char *line = reader->line;
 	int nul_location = strlen(line);
 	if (nul_location == *linelen)
 		return;
 	server_capabilities_v1 = xstrdup(line + nul_location + 1);
 	*linelen = nul_location;
+
+	feat_val = server_feature_value("object-format", &feat_len);
+	if (feat_val) {
+		char *hash_name = xstrndup(feat_val, feat_len);
+		int hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo != GIT_HASH_UNKNOWN)
+			reader->hash_algo = &hash_algos[hash_algo];
+		free(hash_name);
+	} else {
+		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
+	}
 }
 
 static int process_dummy_ref(const struct packet_reader *reader)
@@ -235,7 +248,7 @@ static int process_dummy_ref(const struct packet_reader *reader)
 	struct object_id oid;
 	const char *name;
 
-	if (parse_oid_hex(line, &oid, &name))
+	if (parse_oid_hex_algop(line, &oid, &name, reader->hash_algo))
 		return 0;
 	if (*name != ' ')
 		return 0;
@@ -259,7 +272,7 @@ static int process_ref(const struct packet_reader *reader, int len,
 	struct object_id old_oid;
 	const char *name;
 
-	if (parse_oid_hex(line, &old_oid, &name))
+	if (parse_oid_hex_algop(line, &old_oid, &name, reader->hash_algo))
 		return 0;
 	if (*name != ' ')
 		return 0;
@@ -271,7 +284,7 @@ static int process_ref(const struct packet_reader *reader, int len,
 		die(_("protocol error: unexpected capabilities^{}"));
 	} else if (check_ref(name, flags)) {
 		struct ref *ref = alloc_ref(name);
-		oidcpy(&ref->old_oid, &old_oid);
+		memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);
 		**list = ref;
 		*list = &ref->next;
 	}
@@ -289,7 +302,7 @@ static int process_shallow(const struct packet_reader *reader, int len,
 	if (!skip_prefix(line, "shallow ", &arg))
 		return 0;
 
-	if (get_oid_hex(arg, &old_oid))
+	if (get_oid_hex_algop(arg, &old_oid, reader->hash_algo))
 		die(_("protocol error: expected shallow sha-1, got '%s'"), arg);
 	if (!shallow_points)
 		die(_("repository on the other end cannot be shallow"));


* [PATCH v3 15/44] builtin/receive-pack: detect when the server doesn't support our hash
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (13 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 14/44] connect: detect algorithm when fetching refs brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 16/44] docs: update remote helper docs for object-format extensions brian m. carlson
                     ` (29 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Detect when the pushing client requests an object format that does not
match this repository's hash algorithm and abort.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
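For illustration only, a standalone sketch of the check, assuming the
value points into a larger capability string and is therefore bounded by a
length rather than NUL-terminated at its end.  check_client_format is a
hypothetical name; the patch performs the comparison with xstrncmpz().

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical check: "value" is the client's object-format value and
 * "len" its length; a NULL value means the client sent none, in which
 * case SHA-1 is assumed.  Returns 0 if it matches repo_algo; otherwise
 * writes a message into err and returns -1.
 */
int check_client_format(const char *repo_algo, const char *value, size_t len,
			char *err, size_t errsz)
{
	if (!value) {
		value = "sha1";
		len = strlen(value);
	}
	if (strlen(repo_algo) == len && !memcmp(repo_algo, value, len))
		return 0;
	/* %.*s prints only the bounded value, not what follows it. */
	snprintf(err, errsz, "unsupported object format '%.*s'",
		 (int)len, value);
	return -1;
}
```
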
 builtin/receive-pack.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 4ffa501dce..d43663bb0a 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1625,6 +1625,8 @@ static struct command *read_head_info(struct packet_reader *reader,
 		linelen = strlen(reader->line);
 		if (linelen < reader->pktlen) {
 			const char *feature_list = reader->line + linelen + 1;
+			const char *hash = NULL;
+			int len = 0;
 			if (parse_feature_request(feature_list, "report-status"))
 				report_status = 1;
 			if (parse_feature_request(feature_list, "side-band-64k"))
@@ -1637,6 +1639,13 @@ static struct command *read_head_info(struct packet_reader *reader,
 			if (advertise_push_options
 			    && parse_feature_request(feature_list, "push-options"))
 				use_push_options = 1;
+			hash = parse_feature_value(feature_list, "object-format", &len, NULL);
+			if (!hash) {
+				hash = hash_algos[GIT_HASH_SHA1].name;
+				len = strlen(hash);
+			}
+			if (xstrncmpz(the_hash_algo->name, hash, len))
+				die("error: unsupported object format '%.*s'", len, hash);
 		}
 
 		if (!strcmp(reader->line, "push-cert")) {


* [PATCH v3 16/44] docs: update remote helper docs for object-format extensions
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (14 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 15/44] builtin/receive-pack: detect when the server doesn't support our hash brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 17/44] transport-helper: implement " brian m. carlson
                     ` (28 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Update the remote helper docs to document the object-format extensions
we will implement in remote-curl and the transport helper code shortly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/gitremote-helpers.txt | 33 +++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
index 93baeeb029..6f1e269ae4 100644
--- a/Documentation/gitremote-helpers.txt
+++ b/Documentation/gitremote-helpers.txt
@@ -238,6 +238,9 @@ the remote repository.
 	`--signed-tags=verbatim` to linkgit:git-fast-export[1].  In the
 	absence of this capability, Git will use `--signed-tags=warn-strip`.
 
+'object-format'::
+	This indicates that the helper is able to interact with the remote
+	side using an explicit hash algorithm extension.
 
 
 COMMANDS
@@ -257,12 +260,14 @@ Support for this command is mandatory.
 'list'::
 	Lists the refs, one per line, in the format "<value> <name>
 	[<attr> ...]". The value may be a hex sha1 hash, "@<dest>" for
-	a symref, or "?" to indicate that the helper could not get the
-	value of the ref. A space-separated list of attributes follows
-	the name; unrecognized attributes are ignored. The list ends
-	with a blank line.
+	a symref, ":<keyword> <value>" for a key-value pair, or
+	"?" to indicate that the helper could not get the value of the
+	ref. A space-separated list of attributes follows the name;
+	unrecognized attributes are ignored. The list ends with a
+	blank line.
 +
 See REF LIST ATTRIBUTES for a list of currently defined attributes.
+See REF LIST KEYWORDS for a list of currently defined keywords.
 +
 Supported if the helper has the "fetch" or "import" capability.
 
@@ -432,6 +437,18 @@ attributes are defined.
 	This ref is unchanged since the last import or fetch, although
 	the helper cannot necessarily determine what value that produced.
 
+REF LIST KEYWORDS
+-----------------
+
+The 'list' command may produce a list of key-value pairs.
+The following keys are defined.
+
+'object-format'::
+	The refs are using the given hash algorithm.  This keyword is only
+	used if the server and client both support the object-format
+	extension.
+
+
 OPTIONS
 -------
 
@@ -516,6 +533,14 @@ set by Git if the remote helper has the 'option' capability.
 	transaction.  If successful, all refs will be updated, or none will.  If the
 	remote side does not support this capability, the push will fail.
 
+'option object-format' {'true'|algorithm}::
+	If 'true', indicate that the caller wants hash algorithm information
+	to be passed back from the remote.  This mode is used when fetching
+	refs.
++
+If set to an algorithm, indicate that the caller wants to interact with
+the remote side using that algorithm.
+
 SEE ALSO
 --------
 linkgit:git-remote[1]


* [PATCH v3 17/44] transport-helper: implement object-format extensions
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (15 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 16/44] docs: update remote helper docs for object-format extensions brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 18/44] remote-curl: " brian m. carlson
                     ` (27 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Implement the object-format extensions that let us determine the hash
algorithm in use when pushing or pulling data.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
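For illustration only, a standalone sketch of how one line of a helper's
"list" output can be classified once ":<keyword> <value>" lines exist.
The enum and classify_list_line are hypothetical names, not part of
transport-helper.c.

```c
#include <string.h>

enum list_line { LINE_END, LINE_KEYWORD, LINE_REF };

/*
 * Hypothetical classifier for one line of "list" output.  For
 * ":object-format <algo>" lines, *algo is pointed at the algorithm
 * name; any other keyword line is recognized but otherwise ignored,
 * and *algo is left untouched.
 */
enum list_line classify_list_line(const char *line, const char **algo)
{
	static const char key[] = ":object-format ";

	if (!*line)
		return LINE_END;	/* a blank line ends the list */
	if (line[0] != ':')
		return LINE_REF;	/* "<value> <name> [<attr> ...]" */
	if (!strncmp(line, key, sizeof(key) - 1))
		*algo = line + sizeof(key) - 1;
	return LINE_KEYWORD;
}
```

A caller would look up the returned name and reject unknown algorithms
before parsing any object IDs on the following ref lines.
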
 transport-helper.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index a46afcb69d..ae33b0eea7 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -32,7 +32,8 @@ struct helper_data {
 		signed_tags : 1,
 		check_connectivity : 1,
 		no_disconnect_req : 1,
-		no_private_update : 1;
+		no_private_update : 1,
+		object_format : 1;
 
 	/*
 	 * As an optimization, the transport code may invoke fetch before
@@ -207,6 +208,8 @@ static struct child_process *get_helper(struct transport *transport)
 			data->import_marks = xstrdup(arg);
 		} else if (starts_with(capname, "no-private-update")) {
 			data->no_private_update = 1;
+		} else if (starts_with(capname, "object-format")) {
+			data->object_format = 1;
 		} else if (mandatory) {
 			die(_("unknown mandatory capability %s; this remote "
 			      "helper probably needs newer version of Git"),
@@ -1103,6 +1106,12 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 	data->get_refs_list_called = 1;
 	helper = get_helper(transport);
 
+	if (data->object_format) {
+		write_str_in_full(helper->in, "option object-format\n");
+		if (recvline(data, &buf) || strcmp(buf.buf, "ok"))
+			exit(128);
+	}
+
 	if (data->push && for_push)
 		write_str_in_full(helper->in, "list for-push\n");
 	else
@@ -1115,6 +1124,17 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 
 		if (!*buf.buf)
 			break;
+		else if (buf.buf[0] == ':') {
+			const char *value;
+			if (skip_prefix(buf.buf, ":object-format ", &value)) {
+				int algo = hash_algo_by_name(value);
+				if (algo == GIT_HASH_UNKNOWN)
+					die(_("unsupported object format '%s'"),
+					    value);
+				transport->hash_algo = &hash_algos[algo];
+			}
+			continue;
+		}
 
 		eov = strchr(buf.buf, ' ');
 		if (!eov)
@@ -1127,7 +1147,7 @@ static struct ref *get_refs_list_using_list(struct transport *transport,
 		if (buf.buf[0] == '@')
 			(*tail)->symref = xstrdup(buf.buf + 1);
 		else if (buf.buf[0] != '?')
-			get_oid_hex(buf.buf, &(*tail)->old_oid);
+			get_oid_hex_algop(buf.buf, &(*tail)->old_oid, transport->hash_algo);
 		if (eon) {
 			if (has_attribute(eon + 1, "unchanged")) {
 				(*tail)->status |= REF_STATUS_UPTODATE;


* [PATCH v3 18/44] remote-curl: implement object-format extensions
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (16 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 17/44] transport-helper: implement " brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 19/44] builtin/clone: initialize hash algorithm properly brian m. carlson
                     ` (26 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Implement the object-format extensions that let us determine the hash
algorithm in use when pushing, pulling, and fetching.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
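For illustration only, a standalone sketch of the two forms of the
object-format option: "true" asks for the remote's algorithm to be
reported, and any other value names an explicit algorithm.  struct
fmt_option and set_object_format are hypothetical names, and the sketch
hard-codes the two algorithm names instead of calling
hash_algo_by_name().

```c
#include <stddef.h>
#include <string.h>

struct fmt_option {
	int requested;		/* caller asked for object-format handling */
	const char *algo;	/* explicit algorithm, or NULL if only "true" */
};

/*
 * Hypothetical option handler.  Returns 0 on success and -1 for an
 * algorithm this sketch does not know (the patch dies instead).
 */
int set_object_format(struct fmt_option *opt, const char *value)
{
	opt->requested = 1;
	if (!strcmp(value, "true"))
		return 0;
	if (strcmp(value, "sha1") && strcmp(value, "sha256"))
		return -1;
	opt->algo = value;
	return 0;
}
```
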
 remote-curl.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 75532a8bae..f0203547c5 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -41,7 +41,9 @@ struct options {
 		deepen_relative : 1,
 		from_promisor : 1,
 		no_dependents : 1,
-		atomic : 1;
+		atomic : 1,
+		object_format : 1;
+	const struct git_hash_algo *hash_algo;
 };
 static struct options options;
 static struct string_list cas_options = STRING_LIST_INIT_DUP;
@@ -190,6 +192,16 @@ static int set_option(const char *name, const char *value)
 	} else if (!strcmp(name, "filter")) {
 		options.filter = xstrdup(value);
 		return 0;
+	} else if (!strcmp(name, "object-format")) {
+		int algo;
+		options.object_format = 1;
+		if (strcmp(value, "true")) {
+			algo = hash_algo_by_name(value);
+			if (algo == GIT_HASH_UNKNOWN)
+				die("unknown object format '%s'", value);
+			options.hash_algo = &hash_algos[algo];
+		}
+		return 0;
 	} else {
 		return 1 /* unsupported */;
 	}
@@ -231,6 +243,7 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	case protocol_v0:
 		get_remote_heads(&reader, &list, for_push ? REF_NORMAL : 0,
 				 NULL, &heads->shallow);
+		options.hash_algo = reader.hash_algo;
 		break;
 	case protocol_unknown_version:
 		BUG("unknown protocol version");
@@ -509,6 +522,9 @@ static struct ref *get_refs(int for_push)
 static void output_refs(struct ref *refs)
 {
 	struct ref *posn;
+	if (options.object_format && options.hash_algo) {
+		printf(":object-format %s\n", options.hash_algo->name);
+	}
 	for (posn = refs; posn; posn = posn->next) {
 		if (posn->symref)
 			printf("@%s %s\n", posn->symref, posn->name);
@@ -1499,6 +1515,7 @@ int cmd_main(int argc, const char **argv)
 			printf("option\n");
 			printf("push\n");
 			printf("check-connectivity\n");
+			printf("object-format\n");
 			printf("\n");
 			fflush(stdout);
 		} else if (skip_prefix(buf.buf, "stateless-connect ", &arg)) {


* [PATCH v3 19/44] builtin/clone: initialize hash algorithm properly
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (17 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 18/44] remote-curl: " brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 20/44] t5562: pass object-format in synthesized test data brian m. carlson
                     ` (25 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When performing a clone, we don't know what hash algorithm the other end
will support.  Currently, we don't support fetching data belonging to a
different algorithm, so we must know what algorithm the remote side is
using in order to properly initialize the repository.  We can know that
only after fetching the refs, so if the remote side has any references,
use that information to reinitialize the repository with the correct
hash algorithm information.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/clone.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index 2a8e3aaaed..e3519a8355 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1220,6 +1220,15 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	refs = transport_get_remote_refs(transport, &ref_prefixes);
 
 	if (refs) {
+		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
+
+		/*
+		 * Now that we know what algorithm the remote side is using,
+		 * let's set ours to the same thing.
+		 */
+		initialize_repository_version(hash_algo);
+		repo_set_hash_algo(the_repository, hash_algo);
+
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
 		/*
 		 * transport_get_remote_refs() may return refs with null sha-1


* [PATCH v3 20/44] t5562: pass object-format in synthesized test data
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (18 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 19/44] builtin/clone: initialize hash algorithm properly brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 21/44] fetch-pack: parse and advertise the object-format capability brian m. carlson
                     ` (24 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Ensure that we pass the object-format capability in the synthesized test
data so that this test works with algorithms other than SHA-1.

In addition, add a test that uses the old data when we're using SHA-1
so that we can be sure that we preserve backwards compatibility with
servers not offering the object-format capability.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5562-http-backend-content-length.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 3f4ac71f83..c6ec625497 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -46,6 +46,7 @@ ssize_b100dots() {
 }
 
 test_expect_success 'setup' '
+	test_oid_init &&
 	HTTP_CONTENT_ENCODING="identity" &&
 	export HTTP_CONTENT_ENCODING &&
 	git config http.receivepack true &&
@@ -62,8 +63,8 @@ test_expect_success 'setup' '
 	test_copy_bytes 10 <fetch_body >fetch_body.trunc &&
 	hash_next=$(git commit-tree -p HEAD -m next HEAD^{tree}) &&
 	{
-		printf "%s %s refs/heads/newbranch\\0report-status\\n" \
-			"$ZERO_OID" "$hash_next" | packetize &&
+		printf "%s %s refs/heads/newbranch\\0report-status object-format=%s\\n" \
+			"$ZERO_OID" "$hash_next" "$(test_oid algo)" | packetize &&
 		printf 0000 &&
 		echo "$hash_next" | git pack-objects --stdout
 	} >push_body &&


* [PATCH v3 21/44] fetch-pack: parse and advertise the object-format capability
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (19 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 20/44] t5562: pass object-format in synthesized test data brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 22/44] setup: set the_repository's hash algo when checking format brian m. carlson
                     ` (23 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Parse the server's object-format capability and respond accordingly,
dying if there is a mismatch.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
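For illustration only, a standalone sketch of the decision made before a
v2 fetch request is sent, with algorithm names passed as plain strings.
v2_format_problem is a hypothetical name, not a function in fetch-pack.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check.  server_algo is the value of the server's
 * object-format capability, or NULL if the server did not advertise
 * one.  Returns NULL if the fetch may proceed, otherwise a short
 * description of the problem.
 */
const char *v2_format_problem(const char *client_algo,
			      const char *server_algo)
{
	if (server_algo)
		return strcmp(client_algo, server_algo) ?
			"mismatched algorithms" : NULL;
	/* No capability advertised: only a SHA-1 client can proceed. */
	return strcmp(client_algo, "sha1") ?
		"server does not support algorithm" : NULL;
}
```

Only in the first branch, when the server advertised the capability and
the algorithms match, does the client echo "object-format=<algo>" back in
its request.
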
 fetch-pack.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index c090030680..7e58f295f5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1180,6 +1180,7 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
 			      int sideband_all, int seen_ack)
 {
 	int ret = 0;
+	const char *hash_name;
 	struct strbuf req_buf = STRBUF_INIT;
 
 	if (server_supports_v2("fetch", 1))
@@ -1194,6 +1195,17 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
 					 args->server_options->items[i].string);
 	}
 
+	if (server_feature_v2("object-format", &hash_name)) {
+		int hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo_by_ptr(the_hash_algo) != hash_algo)
+			die(_("mismatched algorithms: client %s; server %s"),
+			    the_hash_algo->name, hash_name);
+		packet_buf_write(&req_buf, "object-format=%s", the_hash_algo->name);
+	} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
+		die(_("the server does not support algorithm '%s'"),
+		    the_hash_algo->name);
+	}
+
 	packet_buf_delim(&req_buf);
 	if (args->use_thin_pack)
 		packet_buf_write(&req_buf, "thin-pack");


* [PATCH v3 22/44] setup: set the_repository's hash algo when checking format
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (20 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 21/44] fetch-pack: parse and advertise the object-format capability brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 23/44] t3200: mark assertion with SHA1 prerequisite brian m. carlson
                     ` (22 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When we're checking the repository's format, set the hash algorithm at
the same time.  This ensures that we perform a suitable initialization
early enough to avoid confusing any parts of the code.  If we defer
until later, we can end up with portions of the code which are confused
about the hash algorithm, resulting in segfaults when working with
SHA-256 repositories.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/setup.c b/setup.c
index 65fe5ecefb..019a1c6367 100644
--- a/setup.c
+++ b/setup.c
@@ -1273,6 +1273,7 @@ void check_repository_format(struct repository_format *fmt)
 		fmt = &repo_fmt;
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
+	repo_set_hash_algo(the_repository, fmt->hash_algo);
 	clear_repository_format(&repo_fmt);
 }
 

* [PATCH v3 23/44] t3200: mark assertion with SHA1 prerequisite
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (21 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 22/44] setup: set the_repository's hash algo when checking format brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 24/44] packfile: compute and use the index CRC offset brian m. carlson
                     ` (21 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

One of the assertions in this test checks that git branch -m works
even without a .git/config file.  However, if the repository requires
configuration extensions, for example because it uses a non-SHA-1
algorithm, this assertion will fail.  Mark the assertion as requiring SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t3200-branch.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 411a70b0ce..2a3fedc6b0 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -402,7 +402,7 @@ EOF
 
 mv .git/config .git/config-saved
 
-test_expect_success 'git branch -m q q2 without config should succeed' '
+test_expect_success SHA1 'git branch -m q q2 without config should succeed' '
 	git branch -m q q2 &&
 	git branch -m q2 q
 '

* [PATCH v3 24/44] packfile: compute and use the index CRC offset
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (22 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 23/44] t3200: mark assertion with SHA1 prerequisite brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 25/44] t5302: modernize test formatting brian m. carlson
                     ` (20 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Both v2 pack index files and the v3 format specified as part of the
NewHash work have similar data starting at the CRC table.  Much of the
existing code wants to read either this table or the offset entries
following it, and in doing so computes the offset each time.

In order to share as much code as possible between v2 and v3, compute
the offset of the CRC table and store it when the pack is opened.  Use
this value to compute offsets not only to the CRC table, but also to the
offset entries beyond it.
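
As a rough illustration of the arithmetic involved, the layout of a v2 index
up to the offset table can be written as two small functions.  The names
idx_crc_offset() and idx_offset_table() are hypothetical, and the sketch uses
size_t where the patch stores the value in a uint32_t field.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the CRC32 table in a v2 pack index. */
size_t idx_crc_offset(uint32_t nr_objects, size_t hashsz)
{
	return 8                            /* header: signature + version */
	     + 4 * 256                      /* fan-out table */
	     + (size_t)nr_objects * hashsz; /* object ID table */
}

/* Byte offset of the 4-byte offset table, which follows the CRC32 table. */
size_t idx_offset_table(uint32_t nr_objects, size_t hashsz)
{
	/* One 4-byte CRC32 entry per object sits between the two tables. */
	return idx_crc_offset(nr_objects, hashsz) + (size_t)nr_objects * 4;
}
```

For a pack of 10 objects, the CRC table starts at byte 1232 with 20-byte
(SHA-1) object IDs and at byte 1352 with 32-byte (SHA-256) object IDs.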

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/index-pack.c | 6 +-----
 object-store.h       | 1 +
 packfile.c           | 1 +
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f176dd28c8..7bea1fba52 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1555,13 +1555,9 @@ static void read_v2_anomalous_offsets(struct packed_git *p,
 {
 	const uint32_t *idx1, *idx2;
 	uint32_t i;
-	const uint32_t hashwords = the_hash_algo->rawsz / sizeof(uint32_t);
 
 	/* The address of the 4-byte offset table */
-	idx1 = (((const uint32_t *)p->index_data)
-		+ 2 /* 8-byte header */
-		+ 256 /* fan out */
-		+ hashwords * p->num_objects /* object ID table */
+	idx1 = (((const uint32_t *)((const uint8_t *)p->index_data + p->crc_offset))
 		+ p->num_objects /* CRC32 table */
 		);
 
diff --git a/object-store.h b/object-store.h
index d1e490f203..f439d47af8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -70,6 +70,7 @@ struct packed_git {
 	size_t index_size;
 	uint32_t num_objects;
 	uint32_t num_bad_objects;
+	uint32_t crc_offset;
 	unsigned char *bad_object_sha1;
 	int index_version;
 	time_t mtime;
diff --git a/packfile.c b/packfile.c
index f4e752996d..6ab5233613 100644
--- a/packfile.c
+++ b/packfile.c
@@ -178,6 +178,7 @@ int load_idx(const char *path, const unsigned int hashsz, void *idx_map,
 		     */
 		    (sizeof(off_t) <= 4))
 			return error("pack too large for current definition of off_t in %s", path);
+		p->crc_offset = 8 + 4 * 256 + nr * hashsz;
 	}
 
 	p->index_version = version;

* [PATCH v3 25/44] t5302: modernize test formatting
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (23 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 24/44] packfile: compute and use the index CRC offset brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 26/44] builtin/show-index: provide options to determine hash algo brian m. carlson
                     ` (19 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Our style these days is to place the description and the opening quote
of the body on the same line as test_expect_success (if it fits), to
place the trailing quote on a line by itself after the body, and to use
tabs.  Since we're going to be making several significant changes to
this test, modernize the style to aid in readability of the subsequent
patches.

This patch should have no functional change.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5302-pack-index.sh | 360 +++++++++++++++++++++---------------------
 1 file changed, 184 insertions(+), 176 deletions(-)

diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index ad07f2f7fc..8981c9b90e 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -7,65 +7,65 @@ test_description='pack index with 64-bit offsets and object CRC'
 . ./test-lib.sh
 
 test_expect_success 'setup' '
-     test_oid_init &&
-     rawsz=$(test_oid rawsz) &&
-     rm -rf .git &&
-     git init &&
-     git config pack.threads 1 &&
-     i=1 &&
-     while test $i -le 100
-     do
-         iii=$(printf '%03i' $i)
-	 test-tool genrandom "bar" 200 > wide_delta_$iii &&
-	 test-tool genrandom "baz $iii" 50 >> wide_delta_$iii &&
-	 test-tool genrandom "foo"$i 100 > deep_delta_$iii &&
-	 test-tool genrandom "foo"$(expr $i + 1) 100 >> deep_delta_$iii &&
-	 test-tool genrandom "foo"$(expr $i + 2) 100 >> deep_delta_$iii &&
-         echo $iii >file_$iii &&
-	 test-tool genrandom "$iii" 8192 >>file_$iii &&
-         git update-index --add file_$iii deep_delta_$iii wide_delta_$iii &&
-         i=$(expr $i + 1) || return 1
-     done &&
-     { echo 101 && test-tool genrandom 100 8192; } >file_101 &&
-     git update-index --add file_101 &&
-     tree=$(git write-tree) &&
-     commit=$(git commit-tree $tree </dev/null) && {
-	 echo $tree &&
-	 git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\)	.*/\\1/"
-     } >obj-list &&
-     git update-ref HEAD $commit
+	test_oid_init &&
+	rawsz=$(test_oid rawsz) &&
+	rm -rf .git &&
+	git init &&
+	git config pack.threads 1 &&
+	i=1 &&
+	while test $i -le 100
+	do
+		iii=$(printf '%03i' $i)
+		test-tool genrandom "bar" 200 > wide_delta_$iii &&
+		test-tool genrandom "baz $iii" 50 >> wide_delta_$iii &&
+		test-tool genrandom "foo"$i 100 > deep_delta_$iii &&
+		test-tool genrandom "foo"$(expr $i + 1) 100 >> deep_delta_$iii &&
+		test-tool genrandom "foo"$(expr $i + 2) 100 >> deep_delta_$iii &&
+		echo $iii >file_$iii &&
+		test-tool genrandom "$iii" 8192 >>file_$iii &&
+		git update-index --add file_$iii deep_delta_$iii wide_delta_$iii &&
+		i=$(expr $i + 1) || return 1
+	done &&
+	{ echo 101 && test-tool genrandom 100 8192; } >file_101 &&
+	git update-index --add file_101 &&
+	tree=$(git write-tree) &&
+	commit=$(git commit-tree $tree </dev/null) && {
+		echo $tree &&
+		git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\)	.*/\\1/"
+	} >obj-list &&
+	git update-ref HEAD $commit
 '
 
-test_expect_success \
-    'pack-objects with index version 1' \
-    'pack1=$(git pack-objects --index-version=1 test-1 <obj-list) &&
-     git verify-pack -v "test-1-${pack1}.pack"'
+test_expect_success 'pack-objects with index version 1' '
+	pack1=$(git pack-objects --index-version=1 test-1 <obj-list) &&
+	git verify-pack -v "test-1-${pack1}.pack"
+'
 
-test_expect_success \
-    'pack-objects with index version 2' \
-    'pack2=$(git pack-objects --index-version=2 test-2 <obj-list) &&
-     git verify-pack -v "test-2-${pack2}.pack"'
+test_expect_success 'pack-objects with index version 2' '
+	pack2=$(git pack-objects --index-version=2 test-2 <obj-list) &&
+	git verify-pack -v "test-2-${pack2}.pack"
+'
 
-test_expect_success \
-    'both packs should be identical' \
-    'cmp "test-1-${pack1}.pack" "test-2-${pack2}.pack"'
+test_expect_success 'both packs should be identical' '
+	cmp "test-1-${pack1}.pack" "test-2-${pack2}.pack"
+'
 
-test_expect_success \
-    'index v1 and index v2 should be different' \
-    '! cmp "test-1-${pack1}.idx" "test-2-${pack2}.idx"'
+test_expect_success 'index v1 and index v2 should be different' '
+	! cmp "test-1-${pack1}.idx" "test-2-${pack2}.idx"
+'
 
-test_expect_success \
-    'index-pack with index version 1' \
-    'git index-pack --index-version=1 -o 1.idx "test-1-${pack1}.pack"'
+test_expect_success 'index-pack with index version 1' '
+	git index-pack --index-version=1 -o 1.idx "test-1-${pack1}.pack"
+'
 
-test_expect_success \
-    'index-pack with index version 2' \
-    'git index-pack --index-version=2 -o 2.idx "test-1-${pack1}.pack"'
+test_expect_success 'index-pack with index version 2' '
+	git index-pack --index-version=2 -o 2.idx "test-1-${pack1}.pack"
+'
 
-test_expect_success \
-    'index-pack results should match pack-objects ones' \
-    'cmp "test-1-${pack1}.idx" "1.idx" &&
-     cmp "test-2-${pack2}.idx" "2.idx"'
+test_expect_success 'index-pack results should match pack-objects ones' '
+	cmp "test-1-${pack1}.idx" "1.idx" &&
+	cmp "test-2-${pack2}.idx" "2.idx"
+'
 
 test_expect_success 'index-pack --verify on index version 1' '
 	git index-pack --verify "test-1-${pack1}.pack"
@@ -75,13 +75,13 @@ test_expect_success 'index-pack --verify on index version 2' '
 	git index-pack --verify "test-2-${pack2}.pack"
 '
 
-test_expect_success \
-    'pack-objects --index-version=2, is not accepted' \
-    'test_must_fail git pack-objects --index-version=2, test-3 <obj-list'
+test_expect_success 'pack-objects --index-version=2, is not accepted' '
+	test_must_fail git pack-objects --index-version=2, test-3 <obj-list
+'
 
-test_expect_success \
-    'index v2: force some 64-bit offsets with pack-objects' \
-    'pack3=$(git pack-objects --index-version=2,0x40000 test-3 <obj-list)'
+test_expect_success 'index v2: force some 64-bit offsets with pack-objects' '
+	pack3=$(git pack-objects --index-version=2,0x40000 test-3 <obj-list)
+'
 
 if msg=$(git verify-pack -v "test-3-${pack3}.pack" 2>&1) ||
 	! (echo "$msg" | grep "pack too large .* off_t")
@@ -91,21 +91,21 @@ else
 	say "# skipping tests concerning 64-bit offsets"
 fi
 
-test_expect_success OFF64_T \
-    'index v2: verify a pack with some 64-bit offsets' \
-    'git verify-pack -v "test-3-${pack3}.pack"'
+test_expect_success OFF64_T 'index v2: verify a pack with some 64-bit offsets' '
+	git verify-pack -v "test-3-${pack3}.pack"
+'
 
-test_expect_success OFF64_T \
-    '64-bit offsets: should be different from previous index v2 results' \
-    '! cmp "test-2-${pack2}.idx" "test-3-${pack3}.idx"'
+test_expect_success OFF64_T '64-bit offsets: should be different from previous index v2 results' '
+	! cmp "test-2-${pack2}.idx" "test-3-${pack3}.idx"
+'
 
-test_expect_success OFF64_T \
-    'index v2: force some 64-bit offsets with index-pack' \
-    'git index-pack --index-version=2,0x40000 -o 3.idx "test-1-${pack1}.pack"'
+test_expect_success OFF64_T 'index v2: force some 64-bit offsets with index-pack' '
+	git index-pack --index-version=2,0x40000 -o 3.idx "test-1-${pack1}.pack"
+'
 
-test_expect_success OFF64_T \
-    '64-bit offsets: index-pack result should match pack-objects one' \
-    'cmp "test-3-${pack3}.idx" "3.idx"'
+test_expect_success OFF64_T '64-bit offsets: index-pack result should match pack-objects one' '
+	cmp "test-3-${pack3}.idx" "3.idx"
+'
 
 test_expect_success OFF64_T 'index-pack --verify on 64-bit offset v2 (cheat)' '
 	# This cheats by knowing which lower offset should still be encoded
@@ -120,135 +120,143 @@ test_expect_success OFF64_T 'index-pack --verify on 64-bit offset v2' '
 # returns the object number for given object in given pack index
 index_obj_nr()
 {
-    idx_file=$1
-    object_sha1=$2
-    nr=0
-    git show-index < $idx_file |
-    while read offs sha1 extra
-    do
-      nr=$(($nr + 1))
-      test "$sha1" = "$object_sha1" || continue
-      echo "$(($nr - 1))"
-      break
-    done
+	idx_file=$1
+	object_sha1=$2
+	nr=0
+	git show-index < $idx_file |
+	while read offs sha1 extra
+	do
+	  nr=$(($nr + 1))
+	  test "$sha1" = "$object_sha1" || continue
+	  echo "$(($nr - 1))"
+	  break
+	done
 }
 
 # returns the pack offset for given object as found in given pack index
 index_obj_offset()
 {
-    idx_file=$1
-    object_sha1=$2
-    git show-index < $idx_file | grep $object_sha1 |
-    ( read offs extra && echo "$offs" )
+	idx_file=$1
+	object_sha1=$2
+	git show-index < $idx_file | grep $object_sha1 |
+	( read offs extra && echo "$offs" )
 }
 
-test_expect_success \
-    '[index v1] 1) stream pack to repository' \
-    'git index-pack --index-version=1 --stdin < "test-1-${pack1}.pack" &&
-     git prune-packed &&
-     git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
-     cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
-     cmp "test-1-${pack1}.idx"  ".git/objects/pack/pack-${pack1}.idx"'
+test_expect_success '[index v1] 1) stream pack to repository' '
+	git index-pack --index-version=1 --stdin < "test-1-${pack1}.pack" &&
+	git prune-packed &&
+	git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
+	cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
+	cmp "test-1-${pack1}.idx"	".git/objects/pack/pack-${pack1}.idx"
+'
 
 test_expect_success \
-    '[index v1] 2) create a stealth corruption in a delta base reference' \
-    '# This test assumes file_101 is a delta smaller than 16 bytes.
-     # It should be against file_100 but we substitute its base for file_099
-     sha1_101=$(git hash-object file_101) &&
-     sha1_099=$(git hash-object file_099) &&
-     offs_101=$(index_obj_offset 1.idx $sha1_101) &&
-     nr_099=$(index_obj_nr 1.idx $sha1_099) &&
-     chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
-     recordsz=$((rawsz + 4)) &&
-     dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
-        if=".git/objects/pack/pack-${pack1}.idx" \
-        skip=$((4 + 256 * 4 + $nr_099 * recordsz)) \
-        bs=1 count=$rawsz conv=notrunc &&
-     git cat-file blob $sha1_101 > file_101_foo1'
+	'[index v1] 2) create a stealth corruption in a delta base reference' '
+	# This test assumes file_101 is a delta smaller than 16 bytes.
+	# It should be against file_100 but we substitute its base for file_099
+	sha1_101=$(git hash-object file_101) &&
+	sha1_099=$(git hash-object file_099) &&
+	offs_101=$(index_obj_offset 1.idx $sha1_101) &&
+	nr_099=$(index_obj_nr 1.idx $sha1_099) &&
+	chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
+	recordsz=$((rawsz + 4)) &&
+	dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
+	       if=".git/objects/pack/pack-${pack1}.idx" \
+	       skip=$((4 + 256 * 4 + $nr_099 * recordsz)) \
+	       bs=1 count=$rawsz conv=notrunc &&
+	git cat-file blob $sha1_101 > file_101_foo1
+'
 
 test_expect_success \
-    '[index v1] 3) corrupted delta happily returned wrong data' \
-    'test -f file_101_foo1 && ! cmp file_101 file_101_foo1'
+	'[index v1] 3) corrupted delta happily returned wrong data' '
+	test -f file_101_foo1 && ! cmp file_101 file_101_foo1
+'
 
 test_expect_success \
-    '[index v1] 4) confirm that the pack is actually corrupted' \
-    'test_must_fail git fsck --full $commit'
+	'[index v1] 4) confirm that the pack is actually corrupted' '
+	test_must_fail git fsck --full $commit
+'
 
 test_expect_success \
-    '[index v1] 5) pack-objects happily reuses corrupted data' \
-    'pack4=$(git pack-objects test-4 <obj-list) &&
-     test -f "test-4-${pack4}.pack"'
+	'[index v1] 5) pack-objects happily reuses corrupted data' '
+	pack4=$(git pack-objects test-4 <obj-list) &&
+	test -f "test-4-${pack4}.pack"
+'
+
+test_expect_success '[index v1] 6) newly created pack is BAD !' '
+	test_must_fail git verify-pack -v "test-4-${pack4}.pack"
+'
+
+test_expect_success '[index v2] 1) stream pack to repository' '
+	rm -f .git/objects/pack/* &&
+	git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
+	git prune-packed &&
+	git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
+	cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
+	cmp "test-2-${pack1}.idx"	".git/objects/pack/pack-${pack1}.idx"
+'
 
 test_expect_success \
-    '[index v1] 6) newly created pack is BAD !' \
-    'test_must_fail git verify-pack -v "test-4-${pack4}.pack"'
+	'[index v2] 2) create a stealth corruption in a delta base reference' '
+	# This test assumes file_101 is a delta smaller than 16 bytes.
+	# It should be against file_100 but we substitute its base for file_099
+	sha1_101=$(git hash-object file_101) &&
+	sha1_099=$(git hash-object file_099) &&
+	offs_101=$(index_obj_offset 1.idx $sha1_101) &&
+	nr_099=$(index_obj_nr 1.idx $sha1_099) &&
+	chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
+	dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
+		if=".git/objects/pack/pack-${pack1}.idx" \
+		skip=$((8 + 256 * 4 + $nr_099 * rawsz)) \
+		bs=1 count=$rawsz conv=notrunc &&
+	git cat-file blob $sha1_101 > file_101_foo2
+'
 
 test_expect_success \
-    '[index v2] 1) stream pack to repository' \
-    'rm -f .git/objects/pack/* &&
-     git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
-     git prune-packed &&
-     git count-objects | ( read nr rest && test "$nr" -eq 1 ) &&
-     cmp "test-1-${pack1}.pack" ".git/objects/pack/pack-${pack1}.pack" &&
-     cmp "test-2-${pack1}.idx"  ".git/objects/pack/pack-${pack1}.idx"'
+	'[index v2] 3) corrupted delta happily returned wrong data' '
+	test -f file_101_foo2 && ! cmp file_101 file_101_foo2
+'
 
 test_expect_success \
-    '[index v2] 2) create a stealth corruption in a delta base reference' \
-    '# This test assumes file_101 is a delta smaller than 16 bytes.
-     # It should be against file_100 but we substitute its base for file_099
-     sha1_101=$(git hash-object file_101) &&
-     sha1_099=$(git hash-object file_099) &&
-     offs_101=$(index_obj_offset 1.idx $sha1_101) &&
-     nr_099=$(index_obj_nr 1.idx $sha1_099) &&
-     chmod +w ".git/objects/pack/pack-${pack1}.pack" &&
-     dd of=".git/objects/pack/pack-${pack1}.pack" seek=$(($offs_101 + 1)) \
-        if=".git/objects/pack/pack-${pack1}.idx" \
-        skip=$((8 + 256 * 4 + $nr_099 * rawsz)) \
-        bs=1 count=$rawsz conv=notrunc &&
-     git cat-file blob $sha1_101 > file_101_foo2'
+	'[index v2] 4) confirm that the pack is actually corrupted' '
+	test_must_fail git fsck --full $commit
+'
 
 test_expect_success \
-    '[index v2] 3) corrupted delta happily returned wrong data' \
-    'test -f file_101_foo2 && ! cmp file_101 file_101_foo2'
+	'[index v2] 5) pack-objects refuses to reuse corrupted data' '
+	test_must_fail git pack-objects test-5 <obj-list &&
+	test_must_fail git pack-objects --no-reuse-object test-6 <obj-list
+'
 
 test_expect_success \
-    '[index v2] 4) confirm that the pack is actually corrupted' \
-    'test_must_fail git fsck --full $commit'
-
-test_expect_success \
-    '[index v2] 5) pack-objects refuses to reuse corrupted data' \
-    'test_must_fail git pack-objects test-5 <obj-list &&
-     test_must_fail git pack-objects --no-reuse-object test-6 <obj-list'
-
-test_expect_success \
-    '[index v2] 6) verify-pack detects CRC mismatch' \
-    'rm -f .git/objects/pack/* &&
-     git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
-     git verify-pack ".git/objects/pack/pack-${pack1}.pack" &&
-     obj=$(git hash-object file_001) &&
-     nr=$(index_obj_nr ".git/objects/pack/pack-${pack1}.idx" $obj) &&
-     chmod +w ".git/objects/pack/pack-${pack1}.idx" &&
-     printf xxxx | dd of=".git/objects/pack/pack-${pack1}.idx" conv=notrunc \
-        bs=1 count=4 seek=$((8 + 256 * 4 + $(wc -l <obj-list) * rawsz + $nr * 4)) &&
-     ( while read obj
-       do git cat-file -p $obj >/dev/null || exit 1
-       done <obj-list ) &&
-     test_must_fail git verify-pack ".git/objects/pack/pack-${pack1}.pack"
+	'[index v2] 6) verify-pack detects CRC mismatch' '
+	rm -f .git/objects/pack/* &&
+	git index-pack --index-version=2 --stdin < "test-1-${pack1}.pack" &&
+	git verify-pack ".git/objects/pack/pack-${pack1}.pack" &&
+	obj=$(git hash-object file_001) &&
+	nr=$(index_obj_nr ".git/objects/pack/pack-${pack1}.idx" $obj) &&
+	chmod +w ".git/objects/pack/pack-${pack1}.idx" &&
+	printf xxxx | dd of=".git/objects/pack/pack-${pack1}.idx" conv=notrunc \
+		bs=1 count=4 seek=$((8 + 256 * 4 + $(wc -l <obj-list) * rawsz + $nr * 4)) &&
+	 ( while read obj
+	   do git cat-file -p $obj >/dev/null || exit 1
+	   done <obj-list ) &&
+	test_must_fail git verify-pack ".git/objects/pack/pack-${pack1}.pack"
 '
 
 test_expect_success 'running index-pack in the object store' '
-    rm -f .git/objects/pack/* &&
-    cp test-1-${pack1}.pack .git/objects/pack/pack-${pack1}.pack &&
-    (
-	cd .git/objects/pack &&
-	git index-pack pack-${pack1}.pack
-    ) &&
-    test -f .git/objects/pack/pack-${pack1}.idx
+	rm -f .git/objects/pack/* &&
+	cp test-1-${pack1}.pack .git/objects/pack/pack-${pack1}.pack &&
+	(
+		cd .git/objects/pack &&
+		git index-pack pack-${pack1}.pack
+	) &&
+	test -f .git/objects/pack/pack-${pack1}.idx
 '
 
 test_expect_success 'index-pack --strict warns upon missing tagger in tag' '
-    sha=$(git rev-parse HEAD) &&
-    cat >wrong-tag <<EOF &&
+	sha=$(git rev-parse HEAD) &&
+	cat >wrong-tag <<EOF &&
 object $sha
 type commit
 tag guten tag
@@ -256,18 +264,18 @@ tag guten tag
 This is an invalid tag.
 EOF
 
-    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
-    pack1=$(echo $tag $sha | git pack-objects tag-test) &&
-    echo remove tag object &&
-    thirtyeight=${tag#??} &&
-    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
-    git index-pack --strict tag-test-${pack1}.pack 2>err &&
-    grep "^warning:.* expected .tagger. line" err
+	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	pack1=$(echo $tag $sha | git pack-objects tag-test) &&
+	echo remove tag object &&
+	thirtyeight=${tag#??} &&
+	rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
+	git index-pack --strict tag-test-${pack1}.pack 2>err &&
+	grep "^warning:.* expected .tagger. line" err
 '
 
 test_expect_success 'index-pack --fsck-objects also warns upon missing tagger in tag' '
-    git index-pack --fsck-objects tag-test-${pack1}.pack 2>err &&
-    grep "^warning:.* expected .tagger. line" err
+	git index-pack --fsck-objects tag-test-${pack1}.pack 2>err &&
+	grep "^warning:.* expected .tagger. line" err
 '
 
 test_done

* [PATCH v3 26/44] builtin/show-index: provide options to determine hash algo
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (24 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 25/44] t5302: modernize test formatting brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 27/44] t1302: expect repo format version 1 for SHA-256 brian m. carlson
                     ` (18 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

show-index is capable of reading any possible index file whether or not
the index is inside a repository.  However, because our index files lack
metadata about the hash algorithm in use, it's not possible to
autodetect the algorithm that a particular index file is using.

In order to allow us to read index files of any algorithm, let's set up
the .git directory gently so that we default to the algorithm for the
current repository, and add an --object-format option to allow users to
override this setting and continue to run show-index outside of a
repository altogether.  Let's also document this new option so that
people can find it and use it.
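
The precedence described above (explicit option first, then the repository's
setting, then SHA-1) can be sketched as a standalone function.  The name
resolve_hashsz() and its parameters are hypothetical; the patch itself gets
the same effect by calling repo_set_hash_algo() and reading
the_hash_algo->rawsz.

```c
#include <string.h>

/*
 * option:      value of --object-format, or NULL if not given.
 * repo_format: algorithm of the current repository, or NULL when running
 *              outside a repository.
 * Returns the raw object ID size in bytes, or 0 for an unknown algorithm.
 */
unsigned resolve_hashsz(const char *option, const char *repo_format)
{
	const char *name = option ? option : repo_format ? repo_format : "sha1";

	if (!strcmp(name, "sha1"))
		return 20;
	if (!strcmp(name, "sha256"))
		return 32;
	return 0; /* the caller is expected to die() here */
}
```

The raw size matters because show-index uses it as the width of each object
ID entry it reads from the index.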

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git-show-index.txt | 11 ++++++++++-
 builtin/show-index.c             | 29 ++++++++++++++++++++++++-----
 git.c                            |  2 +-
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-show-index.txt b/Documentation/git-show-index.txt
index 424e4ba84c..39b1d8eaa1 100644
--- a/Documentation/git-show-index.txt
+++ b/Documentation/git-show-index.txt
@@ -9,7 +9,7 @@ git-show-index - Show packed archive index
 SYNOPSIS
 --------
 [verse]
-'git show-index'
+'git show-index' [--object-format=<hash-algorithm>]
 
 
 DESCRIPTION
@@ -36,6 +36,15 @@ Note that you can get more information on a packfile by calling
 linkgit:git-verify-pack[1]. However, as this command considers only the
 index file itself, it's both faster and more flexible.
 
+OPTIONS
+-------
+
+--object-format=<hash-algorithm>::
+	Specify the given object format (hash algorithm) for the index file.  The
+	valid values are 'sha1' and (if enabled) 'sha256'.  The default is the
+	algorithm for the current repository (set by `extensions.objectFormat`), or
+	'sha1' if no value is set or outside a repository.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/show-index.c b/builtin/show-index.c
index 0826f6a5a2..8106b03a6b 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -1,9 +1,12 @@
 #include "builtin.h"
 #include "cache.h"
 #include "pack.h"
+#include "parse-options.h"
 
-static const char show_index_usage[] =
-"git show-index";
+static const char *const show_index_usage[] = {
+	"git show-index [--object-format=<hash-algorithm>]",
+	NULL
+};
 
 int cmd_show_index(int argc, const char **argv, const char *prefix)
 {
@@ -11,10 +14,26 @@ int cmd_show_index(int argc, const char **argv, const char *prefix)
 	unsigned nr;
 	unsigned int version;
 	static unsigned int top_index[256];
-	const unsigned hashsz = the_hash_algo->rawsz;
+	unsigned hashsz;
+	const char *hash_name = NULL;
+	int hash_algo;
+	const struct option show_index_options[] = {
+		OPT_STRING(0, "object-format", &hash_name, N_("hash-algorithm"),
+			   N_("specify the hash algorithm to use")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, show_index_options, show_index_usage, 0);
+
+	if (hash_name) {
+		hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo == GIT_HASH_UNKNOWN)
+			die(_("Unknown hash algorithm"));
+		repo_set_hash_algo(the_repository, hash_algo);
+	}
+
+	hashsz = the_hash_algo->rawsz;
 
-	if (argc != 1)
-		usage(show_index_usage);
 	if (fread(top_index, 2 * 4, 1, stdin) != 1)
 		die("unable to read header");
 	if (top_index[0] == htonl(PACK_IDX_SIGNATURE)) {
diff --git a/git.c b/git.c
index a2d337eed7..2f021b97f3 100644
--- a/git.c
+++ b/git.c
@@ -574,7 +574,7 @@ static struct cmd_struct commands[] = {
 	{ "shortlog", cmd_shortlog, RUN_SETUP_GENTLY | USE_PAGER },
 	{ "show", cmd_show, RUN_SETUP },
 	{ "show-branch", cmd_show_branch, RUN_SETUP },
-	{ "show-index", cmd_show_index },
+	{ "show-index", cmd_show_index, RUN_SETUP_GENTLY },
 	{ "show-ref", cmd_show_ref, RUN_SETUP },
 	{ "sparse-checkout", cmd_sparse_checkout, RUN_SETUP | NEED_WORK_TREE },
 	{ "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE },

* [PATCH v3 27/44] t1302: expect repo format version 1 for SHA-256
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (25 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 26/44] builtin/show-index: provide options to determine hash algo brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 28/44] Documentation/technical: document object-format for protocol v2 brian m. carlson
                     ` (17 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When using SHA-256, we need to take advantage of the extensions section
in the config file, so we need to use repository format version 1.
Update the test to look for the correct value.

Note that test_oid produces a value without a trailing newline, so use
echo to ensure we print a trailing newline to compare it correctly
against the actual results.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1302-repo-version.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t1302-repo-version.sh b/t/t1302-repo-version.sh
index ce4cff13bb..d60c042ce8 100755
--- a/t/t1302-repo-version.sh
+++ b/t/t1302-repo-version.sh
@@ -8,6 +8,10 @@ test_description='Test repository version check'
 . ./test-lib.sh
 
 test_expect_success 'setup' '
+	test_oid_cache <<-\EOF &&
+	version sha1:0
+	version sha256:1
+	EOF
 	cat >test.patch <<-\EOF &&
 	diff --git a/test.txt b/test.txt
 	new file mode 100644
@@ -23,7 +27,7 @@ test_expect_success 'setup' '
 '
 
 test_expect_success 'gitdir selection on normal repos' '
-	echo 0 >expect &&
+	echo $(test_oid version) >expect &&
 	git config core.repositoryformatversion >actual &&
 	git -C test config core.repositoryformatversion >actual2 &&
 	test_cmp expect actual &&

* [PATCH v3 28/44] Documentation/technical: document object-format for protocol v2
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (26 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 27/44] t1302: expect repo format version 1 for SHA-256 brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 29/44] connect: pass full packet reader when parsing v2 refs brian m. carlson
                     ` (16 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Document the object-format extension for protocol v2.
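
As a rough illustration of the rule being documented (the capability is
advertised as `object-format=X`, and its absence means SHA-1), a client-side
lookup might look like the sketch below.  The function name
`object_format_from_caps` and the array-of-strings representation are
assumptions made for this example only; they are not part of git's code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: scan a list of advertised capability strings and
 * return the value of "object-format=X".  If the capability was not
 * advertised, fall back to "sha1", as the documentation describes.
 */
const char *object_format_from_caps(const char *const *caps, size_t nr)
{
	static const char key[] = "object-format=";
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!strncmp(caps[i], key, sizeof(key) - 1))
			return caps[i] + sizeof(key) - 1;
	}
	return "sha1"; /* not advertised: the server is assumed to be SHA-1 only */
}
```

Given the advertisement `agent=...`, `ls-refs`, `object-format=sha256`, this
returns "sha256"; given an advertisement without the capability, it returns
"sha1".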

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/technical/protocol-v2.txt | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 3996d70891..b288df7ed7 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -455,3 +455,12 @@ included in a request.  This is done by sending each option as a
 a request.
 
 The provided options must not contain a NUL or LF character.
+
+ object-format
+~~~~~~~~~~~~~~~
+
+The server can advertise the `object-format` capability with a value `X` (in the
+form `object-format=X`) to notify the client that the server is able to deal
+with objects using hash algorithm X.  If not specified, the server is assumed to
+only handle SHA-1.  If the client would like to use a hash algorithm other than
+SHA-1, it should specify its object-format string.


* [PATCH v3 29/44] connect: pass full packet reader when parsing v2 refs
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (27 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 28/44] Documentation/technical: document object-format for protocol v2 brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 30/44] connect: parse v2 refs with correct hash algorithm brian m. carlson
                     ` (15 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When we're parsing refs, we need to know not only what the line we're
parsing is, but also the hash algorithm we should use to parse it, which
is stored in the reader object.  Pass the packet reader object through
to the protocol v2 ref parsing function.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/connect.c b/connect.c
index b6e110cb24..320cc2016d 100644
--- a/connect.c
+++ b/connect.c
@@ -376,7 +376,7 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 }
 
 /* Returns 1 when a valid ref has been added to `list`, 0 otherwise */
-static int process_ref_v2(const char *line, struct ref ***list)
+static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 {
 	int ret = 1;
 	int i = 0;
@@ -384,6 +384,7 @@ static int process_ref_v2(const char *line, struct ref ***list)
 	struct ref *ref;
 	struct string_list line_sections = STRING_LIST_INIT_DUP;
 	const char *end;
+	const char *line = reader->line;
 
 	/*
 	 * Ref lines have a number of fields which are space deliminated.  The
@@ -482,7 +483,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 
 	/* Process response from server */
 	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
-		if (!process_ref_v2(reader->line, &list))
+		if (!process_ref_v2(reader, &list))
 			die(_("invalid ls-refs response: %s"), reader->line);
 	}
 


* [PATCH v3 30/44] connect: parse v2 refs with correct hash algorithm
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (28 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 29/44] connect: pass full packet reader when parsing v2 refs brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 31/44] serve: advertise object-format capability for protocol v2 brian m. carlson
                     ` (14 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When using protocol v2, we need to know what hash algorithm is used by
the remote end.  See if the server has sent us an object-format
capability, and if so, use it to determine the hash algorithm in use and
set that value in the packet reader.  Parse the refs using this
algorithm.
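
To make the dependency on the algorithm concrete, here is a minimal,
self-contained sketch of why a ref line cannot be parsed without knowing the
hash in use: the object ID occupies 40 hex digits under SHA-1 and 64 under
SHA-256.  The `algo_info` table and both function names are stand-ins invented
for this example; the patch itself uses git's own `hash_algos` table and
`parse_oid_hex_algop`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for git's hash algorithm table; not the real struct. */
struct algo_info {
	const char *name;
	size_t rawsz;	/* binary object ID length in bytes */
	size_t hexsz;	/* hexadecimal object ID length in characters */
};

static const struct algo_info known_algos[] = {
	{ "sha1", 20, 40 },
	{ "sha256", 32, 64 },
};

/* Look up an algorithm by its advertised name; NULL if it is unknown. */
const struct algo_info *algo_by_name(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(known_algos) / sizeof(known_algos[0]); i++)
		if (!strcmp(name, known_algos[i].name))
			return &known_algos[i];
	return NULL;
}

/*
 * Return 1 if `line` starts with exactly algo->hexsz hex digits followed
 * by a space (the shape of an ls-refs line), 0 otherwise.  The loop stops
 * at the first non-hex character, so it never reads past the terminator.
 */
int ref_line_matches_algo(const char *line, const struct algo_info *algo)
{
	size_t i;

	for (i = 0; i < algo->hexsz; i++)
		if (!isxdigit((unsigned char)line[i]))
			return 0;
	return line[algo->hexsz] == ' ';
}
```

A line carrying a 40-digit object ID matches the "sha1" entry but not the
"sha256" one, which is the failure mode this patch avoids by consulting the
algorithm stored in the reader.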

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/connect.c b/connect.c
index 320cc2016d..e6cf2f8dc4 100644
--- a/connect.c
+++ b/connect.c
@@ -284,7 +284,7 @@ static int process_ref(const struct packet_reader *reader, int len,
 		die(_("protocol error: unexpected capabilities^{}"));
 	} else if (check_ref(name, flags)) {
 		struct ref *ref = alloc_ref(name);
-		memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);
+		oidcpy(&ref->old_oid, &old_oid);
 		**list = ref;
 		*list = &ref->next;
 	}
@@ -397,7 +397,7 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 		goto out;
 	}
 
-	if (parse_oid_hex(line_sections.items[i++].string, &old_oid, &end) ||
+	if (parse_oid_hex_algop(line_sections.items[i++].string, &old_oid, &end, reader->hash_algo) ||
 	    *end) {
 		ret = 0;
 		goto out;
@@ -405,7 +405,7 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 
 	ref = alloc_ref(line_sections.items[i++].string);
 
-	oidcpy(&ref->old_oid, &old_oid);
+	memcpy(ref->old_oid.hash, old_oid.hash, reader->hash_algo->rawsz);
 	**list = ref;
 	*list = &ref->next;
 
@@ -418,7 +418,8 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 			struct object_id peeled_oid;
 			char *peeled_name;
 			struct ref *peeled;
-			if (parse_oid_hex(arg, &peeled_oid, &end) || *end) {
+			if (parse_oid_hex_algop(arg, &peeled_oid, &end,
+						reader->hash_algo) || *end) {
 				ret = 0;
 				goto out;
 			}
@@ -426,7 +427,8 @@ static int process_ref_v2(struct packet_reader *reader, struct ref ***list)
 			peeled_name = xstrfmt("%s^{}", ref->name);
 			peeled = alloc_ref(peeled_name);
 
-			oidcpy(&peeled->old_oid, &peeled_oid);
+			memcpy(peeled->old_oid.hash, peeled_oid.hash,
+			       reader->hash_algo->rawsz);
 			**list = peeled;
 			*list = &peeled->next;
 
@@ -456,6 +458,7 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			     int stateless_rpc)
 {
 	int i;
+	const char *hash_name;
 	*list = NULL;
 
 	if (server_supports_v2("ls-refs", 1))
@@ -464,6 +467,14 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 	if (server_supports_v2("agent", 0))
 		packet_write_fmt(fd_out, "agent=%s", git_user_agent_sanitized());
 
+	if (server_feature_v2("object-format", &hash_name)) {
+		int hash_algo = hash_algo_by_name(hash_name);
+		if (hash_algo == GIT_HASH_UNKNOWN)
+			die(_("unknown object format '%s' specified by server"), hash_name);
+		reader->hash_algo = &hash_algos[hash_algo];
+		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
+	}
+
 	if (server_options && server_options->nr &&
 	    server_supports_v2("server-option", 1))
 		for (i = 0; i < server_options->nr; i++)


* [PATCH v3 31/44] serve: advertise object-format capability for protocol v2
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (29 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 30/44] connect: parse v2 refs with correct hash algorithm brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 32/44] t5500: make hash independent brian m. carlson
                     ` (13 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

In order to communicate the hash algorithm supported by the server side,
add support for advertising the object-format capability.  We check that
the client side sends us an identical algorithm if it sends us its own
object-format capability, and assume it speaks SHA-1 if not.

In the test, when we're using an algorithm other than SHA-1, we need to
specify the algorithm in use so we don't get a failure with an "unknown
format" message.  Add a test that we handle a mismatched algorithm.
Remove the test_oid_init call since it's no longer necessary.
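
The server-side check described above can be summarized with the following
sketch.  It is a simplified model, not the code in serve.c: the enum, the
function name `check_client_algo`, and the use of plain strings instead of
git's algorithm indices are all assumptions made for illustration, and it
returns a status where the real code calls die().

```c
#include <stddef.h>
#include <string.h>

enum algo_check {
	ALGO_OK = 0,
	ALGO_UNKNOWN = -1,	/* client named an algorithm we do not know */
	ALGO_MISMATCH = -2	/* client and server algorithms differ */
};

/*
 * `client_algo` is the value of the client's object-format capability,
 * or NULL if the client did not send one, in which case it is assumed
 * to speak SHA-1.
 */
enum algo_check check_client_algo(const char *server_algo,
				  const char *client_algo)
{
	if (!client_algo)
		client_algo = "sha1";
	if (strcmp(client_algo, "sha1") && strcmp(client_algo, "sha256"))
		return ALGO_UNKNOWN;
	return strcmp(server_algo, client_algo) ? ALGO_MISMATCH : ALGO_OK;
}
```

Under this model a SHA-256 server rejects a request that omits the
capability, which is why the tests in this patch add an object-format line to
each request.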

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 connect.c            |  2 ++
 serve.c              | 27 +++++++++++++++++++++++++++
 t/t5701-git-serve.sh | 25 +++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/connect.c b/connect.c
index e6cf2f8dc4..e0d5b9fee0 100644
--- a/connect.c
+++ b/connect.c
@@ -473,6 +473,8 @@ struct ref **get_remote_refs(int fd_out, struct packet_reader *reader,
 			die(_("unknown object format '%s' specified by server"), hash_name);
 		reader->hash_algo = &hash_algos[hash_algo];
 		packet_write_fmt(fd_out, "object-format=%s", reader->hash_algo->name);
+	} else {
+		reader->hash_algo = &hash_algos[GIT_HASH_SHA1];
 	}
 
 	if (server_options && server_options->nr &&
diff --git a/serve.c b/serve.c
index c046926ba1..fbd2fcdfb5 100644
--- a/serve.c
+++ b/serve.c
@@ -22,6 +22,14 @@ static int agent_advertise(struct repository *r,
 	return 1;
 }
 
+static int object_format_advertise(struct repository *r,
+				   struct strbuf *value)
+{
+	if (value)
+		strbuf_addstr(value, r->hash_algo->name);
+	return 1;
+}
+
 struct protocol_capability {
 	/*
 	 * The name of the capability.  The server uses this name when
@@ -57,6 +65,7 @@ static struct protocol_capability capabilities[] = {
 	{ "ls-refs", always_advertise, ls_refs },
 	{ "fetch", upload_pack_advertise, upload_pack_v2 },
 	{ "server-option", always_advertise, NULL },
+	{ "object-format", object_format_advertise, NULL },
 };
 
 static void advertise_capabilities(void)
@@ -153,6 +162,22 @@ int has_capability(const struct argv_array *keys, const char *capability,
 	return 0;
 }
 
+static void check_algorithm(struct repository *r, struct argv_array *keys)
+{
+	int client = GIT_HASH_SHA1, server = hash_algo_by_ptr(r->hash_algo);
+	const char *algo_name;
+
+	if (has_capability(keys, "object-format", &algo_name)) {
+		client = hash_algo_by_name(algo_name);
+		if (client == GIT_HASH_UNKNOWN)
+			die("unknown object format '%s'", algo_name);
+	}
+
+	if (client != server)
+		die("mismatched object format: server %s; client %s\n",
+		    r->hash_algo->name, hash_algos[client].name);
+}
+
 enum request_state {
 	PROCESS_REQUEST_KEYS,
 	PROCESS_REQUEST_DONE,
@@ -225,6 +250,8 @@ static int process_request(void)
 	if (!command)
 		die("no command requested");
 
+	check_algorithm(the_repository, &keys);
+
 	command->command(the_repository, &keys, &reader);
 
 	argv_array_clear(&keys);
diff --git a/t/t5701-git-serve.sh b/t/t5701-git-serve.sh
index ffb9613885..a1f5fdc9fd 100755
--- a/t/t5701-git-serve.sh
+++ b/t/t5701-git-serve.sh
@@ -5,12 +5,17 @@ test_description='test protocol v2 server commands'
 . ./test-lib.sh
 
 test_expect_success 'test capability advertisement' '
+	test_oid_cache <<-EOF &&
+	wrong_algo sha1:sha256
+	wrong_algo sha256:sha1
+	EOF
 	cat >expect <<-EOF &&
 	version 2
 	agent=git/$(git version | cut -d" " -f3)
 	ls-refs
 	fetch=shallow
 	server-option
+	object-format=$(test_oid algo)
 	0000
 	EOF
 
@@ -45,6 +50,7 @@ test_expect_success 'request invalid capability' '
 test_expect_success 'request with no command' '
 	test-tool pkt-line pack >in <<-EOF &&
 	agent=git/test
+	object-format=$(test_oid algo)
 	0000
 	EOF
 	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
@@ -54,6 +60,7 @@ test_expect_success 'request with no command' '
 test_expect_success 'request invalid command' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=foo
+	object-format=$(test_oid algo)
 	agent=git/test
 	0000
 	EOF
@@ -61,6 +68,17 @@ test_expect_success 'request invalid command' '
 	test_i18ngrep "invalid command" err
 '
 
+test_expect_success 'wrong object-format' '
+	test-tool pkt-line pack >in <<-EOF &&
+	command=fetch
+	agent=git/test
+	object-format=$(test_oid wrong_algo)
+	0000
+	EOF
+	test_must_fail test-tool serve-v2 --stateless-rpc 2>err <in &&
+	test_i18ngrep "mismatched object format" err
+'
+
 # Test the basics of ls-refs
 #
 test_expect_success 'setup some refs and tags' '
@@ -74,6 +92,7 @@ test_expect_success 'setup some refs and tags' '
 test_expect_success 'basics of ls-refs' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	0000
 	EOF
 
@@ -96,6 +115,7 @@ test_expect_success 'basics of ls-refs' '
 test_expect_success 'basic ref-prefixes' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	0001
 	ref-prefix refs/heads/master
 	ref-prefix refs/tags/one
@@ -116,6 +136,7 @@ test_expect_success 'basic ref-prefixes' '
 test_expect_success 'refs/heads prefix' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	0001
 	ref-prefix refs/heads/
 	0000
@@ -136,6 +157,7 @@ test_expect_success 'refs/heads prefix' '
 test_expect_success 'peel parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	0001
 	peel
 	ref-prefix refs/tags/
@@ -157,6 +179,7 @@ test_expect_success 'peel parameter' '
 test_expect_success 'symrefs parameter' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	0001
 	symrefs
 	ref-prefix refs/heads/
@@ -178,6 +201,7 @@ test_expect_success 'symrefs parameter' '
 test_expect_success 'sending server-options' '
 	test-tool pkt-line pack >in <<-EOF &&
 	command=ls-refs
+	object-format=$(test_oid algo)
 	server-option=hello
 	server-option=world
 	0001
@@ -200,6 +224,7 @@ test_expect_success 'unexpected lines are not allowed in fetch request' '
 
 	test-tool pkt-line pack >in <<-EOF &&
 	command=fetch
+	object-format=$(test_oid algo)
 	0001
 	this-is-not-a-command
 	0000


* [PATCH v3 32/44] t5500: make hash independent
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (30 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 31/44] serve: advertise object-format capability for protocol v2 brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 33/44] builtin/ls-remote: initialize repository based on fetch brian m. carlson
                     ` (12 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

This test has hard-coded pkt-lines with object IDs.  The pkt-line
lengths necessarily differ between hash algorithms, so generate these
lines with the packetize helper so they're always the right size.  In
addition, we will require an object-format capability for SHA-256, so
pass that capability on to the upload-pack process.
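
For readers unfamiliar with the framing, the sketch below shows the
arithmetic that makes the hard-coded prefixes hash dependent: a pkt-line
starts with four hex digits giving the total length, including those four
digits.  The function `packetize_line` is a hypothetical C analogue written
for this example; the test itself uses the shell helper named packetize.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write `payload` to `out` as a pkt-line: a 4-digit lowercase hex length
 * (payload length plus the 4 length digits) followed by the payload.
 * Returns the total length, or -1 if the line would not fit in `out` or
 * in the 4-digit length field.
 */
int packetize_line(const char *payload, char *out, size_t outsz)
{
	size_t len = strlen(payload) + 4;

	if (len > 0xffff || len + 1 > outsz)
		return -1;
	snprintf(out, outsz, "%04zx%s", len, payload);
	return (int)len;
}
```

With a trailing newline in the payload, "want " plus a 40-digit SHA-1 object
ID is 46 bytes, so the prefix is 0032 (50); with a 64-digit SHA-256 object ID
the payload is 70 bytes and the prefix becomes 004a (74).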

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5500-fetch-pack.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 8c54e34ef1..dfed113247 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -871,9 +871,10 @@ test_expect_success 'shallow since with commit graph and already-seen commit' '
 
 	GIT_PROTOCOL=version=2 git upload-pack . <<-EOF >/dev/null
 	0012command=fetch
+	$(echo "object-format=$(test_oid algo)" | packetize)
 	00010013deepen-since 1
-	0032want $(git rev-parse other)
-	0032have $(git rev-parse master)
+	$(echo "want $(git rev-parse other)" | packetize)
+	$(echo "have $(git rev-parse master)" | packetize)
 	0000
 	EOF
 	)


* [PATCH v3 33/44] builtin/ls-remote: initialize repository based on fetch
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (31 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 32/44] t5500: make hash independent brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 34/44] remote-curl: detect algorithm for dumb HTTP by size brian m. carlson
                     ` (11 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

ls-remote may or may not operate within a repository, and as such will
not have been initialized with the repository's hash algorithm.  Even if
it were, the remote side could be using a different algorithm and we
would still want to display those refs properly.  Find the hash
algorithm used by the remote side by querying the transport object and
set our hash algorithm accordingly.

Without this change, if the remote side is using SHA-256, we truncate
the refs to 40 hex characters, since that's the length of the default
hash algorithm (SHA-1).

Note that technically this is not a correct setting of the repository
hash algorithm since, if we are in a repository, it might use a
different hash algorithm from the remote side.  However, our current
code paths don't handle multiple algorithms and won't for some time, so
this is the best we can do.  We rely on the fact that ls-remote never
modifies the current repository, which is a reasonable assumption to
make.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/ls-remote.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 6ef519514b..3a4dd12903 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -118,6 +118,10 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		transport->server_options = &server_options;
 
 	ref = transport_get_remote_refs(transport, &ref_prefixes);
+	if (ref) {
+		int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
+		repo_set_hash_algo(the_repository, hash_algo);
+	}
 	if (transport_disconnect(transport)) {
 		UNLEAK(sorting);
 		return 1;


* [PATCH v3 34/44] remote-curl: detect algorithm for dumb HTTP by size
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (32 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 33/44] builtin/ls-remote: initialize repository based on fetch brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 35/44] builtin/index-pack: add option to specify hash algorithm brian m. carlson
                     ` (10 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When reading the info/refs file for a repository, we have no explicit
way to detect which hash algorithm is in use because the file doesn't
provide one. Detect the hash algorithm in use by the size of the first
object ID.

If we have an empty repository, we don't know what the hash algorithm is
on the remote side, so default to whatever the local side has
configured.  Without doing this, we cannot clone an empty repository
since we don't know its hash algorithm.  Test this case appropriately,
since we currently have no tests for cloning an empty repository with
the dumb HTTP protocol.
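
The detection rule can be illustrated in isolation as follows.  This is a
simplified model of the approach, assuming only the two algorithms discussed
in this series; the function name `detect_algo_name` and the string return
values are inventions of this example, whereas the patch works with git's
algorithm table and hash_algo_by_length.

```c
#include <stddef.h>
#include <string.h>

/*
 * Guess the hash algorithm of an info/refs buffer from the number of hex
 * digits before the first tab.  An empty buffer (no tab at all) yields
 * `fallback`, modelling the "use the local default" rule for empty
 * repositories.  Returns NULL when the length matches no known algorithm.
 */
const char *detect_algo_name(const char *buf, size_t len, const char *fallback)
{
	const char *tab = memchr(buf, '\t', len);

	if (!tab)
		return fallback;
	switch (tab - buf) {
	case 40:
		return "sha1";
	case 64:
		return "sha256";
	default:
		return NULL;
	}
}
```

Because the decision rests on length alone, a second 256-bit algorithm would
be indistinguishable from SHA-256 here, which is the limitation the cover
letter acknowledges for the dumb HTTP protocol.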

We anonymize the URL like elsewhere in the function in case the user has
decided to include a secret in the URL.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 remote-curl.c              | 23 +++++++++++++++++++++--
 t/t5550-http-fetch-dumb.sh | 18 ++++++++++++++++++
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index f0203547c5..e666845d9d 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -252,6 +252,19 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	return list;
 }
 
+static const struct git_hash_algo *detect_hash_algo(struct discovery *heads)
+{
+	const char *p = memchr(heads->buf, '\t', heads->len);
+	int algo;
+	if (!p)
+		return the_hash_algo;
+
+	algo = hash_algo_by_length((p - heads->buf) / 2);
+	if (algo == GIT_HASH_UNKNOWN)
+		return NULL;
+	return &hash_algos[algo];
+}
+
 static struct ref *parse_info_refs(struct discovery *heads)
 {
 	char *data, *start, *mid;
@@ -262,6 +275,12 @@ static struct ref *parse_info_refs(struct discovery *heads)
 	struct ref *ref = NULL;
 	struct ref *last_ref = NULL;
 
+	options.hash_algo = detect_hash_algo(heads);
+	if (!options.hash_algo)
+		die("%sinfo/refs not valid: could not determine hash algorithm; "
+		    "is this a git repository?",
+		    transport_anonymize_url(url.buf));
+
 	data = heads->buf;
 	start = NULL;
 	mid = data;
@@ -272,13 +291,13 @@ static struct ref *parse_info_refs(struct discovery *heads)
 		if (data[i] == '\t')
 			mid = &data[i];
 		if (data[i] == '\n') {
-			if (mid - start != the_hash_algo->hexsz)
+			if (mid - start != options.hash_algo->hexsz)
 				die(_("%sinfo/refs not valid: is this a git repository?"),
 				    transport_anonymize_url(url.buf));
 			data[i] = 0;
 			ref_name = mid + 1;
 			ref = alloc_ref(ref_name);
-			get_oid_hex(start, &ref->old_oid);
+			get_oid_hex_algop(start, &ref->old_oid, options.hash_algo);
 			if (!refs)
 				refs = ref;
 			if (last_ref)
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 50485300eb..e57716bacd 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -50,6 +50,24 @@ test_expect_success 'create password-protected repository' '
 	       "$HTTPD_DOCUMENT_ROOT_PATH/auth/dumb/repo.git"
 '
 
+test_expect_success 'create empty remote repository' '
+	git init --bare "$HTTPD_DOCUMENT_ROOT_PATH/empty.git" &&
+	(cd "$HTTPD_DOCUMENT_ROOT_PATH/empty.git" &&
+	 mkdir -p hooks &&
+	 write_script "hooks/post-update" <<-\EOF &&
+	 exec git update-server-info
+	EOF
+	 hooks/post-update
+	)
+'
+
+test_expect_success 'empty dumb HTTP repository has default hash algorithm' '
+	test_when_finished "rm -fr clone-empty" &&
+	git clone $HTTPD_URL/dumb/empty.git clone-empty &&
+	git -C clone-empty rev-parse --show-object-format >empty-format &&
+	test "$(cat empty-format)" = "$(test_oid algo)"
+'
+
 setup_askpass_helper
 
 test_expect_success 'cloning password-protected repository can fail' '


* [PATCH v3 35/44] builtin/index-pack: add option to specify hash algorithm
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (33 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 34/44] remote-curl: detect algorithm for dumb HTTP by size brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 36/44] t1050: pass algorithm to index-pack when outside repo brian m. carlson
                     ` (9 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

git index-pack is usually run in a repository, but need not be. Since
packs don't contain information on the algorithm in use, instead
relying on context, add an option to index-pack to tell it which one
we're using in case someone runs it outside of a repository.  Since
using --stdin necessarily implies a repository, don't allow specifying
an object format if --stdin is provided, to prevent users from passing
an option that won't work.  Add documentation for this option.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git-index-pack.txt | 8 ++++++++
 builtin/index-pack.c             | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index d5b7560bfe..9316d9a80b 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -93,6 +93,14 @@ OPTIONS
 --max-input-size=<size>::
 	Die, if the pack is larger than <size>.
 
+--object-format=<hash-algorithm>::
+	Specify the given object format (hash algorithm) for the pack.  The valid
+	values are 'sha1' and (if enabled) 'sha256'.  The default is the algorithm for
+	the current repository (set by `extensions.objectFormat`), or 'sha1' if no
+	value is set or outside a repository.
++
+This option cannot be used with --stdin.
+
 NOTES
 -----
 
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7bea1fba52..f865666db9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1667,6 +1667,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	unsigned char pack_hash[GIT_MAX_RAWSZ];
 	unsigned foreign_nr = 1;	/* zero is a "good" value, assume bad */
 	int report_end_of_input = 0;
+	int hash_algo = 0;
 
 	/*
 	 * index-pack never needs to fetch missing objects except when
@@ -1760,6 +1761,11 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 					die(_("bad %s"), arg);
 			} else if (skip_prefix(arg, "--max-input-size=", &arg)) {
 				max_input_size = strtoumax(arg, NULL, 10);
+			} else if (skip_prefix(arg, "--object-format=", &arg)) {
+				hash_algo = hash_algo_by_name(arg);
+				if (hash_algo == GIT_HASH_UNKNOWN)
+					die(_("unknown hash algorithm '%s'"), arg);
+				repo_set_hash_algo(the_repository, hash_algo);
 			} else
 				usage(index_pack_usage);
 			continue;
@@ -1776,6 +1782,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		die(_("--fix-thin cannot be used without --stdin"));
 	if (from_stdin && !startup_info->have_repository)
 		die(_("--stdin requires a git repository"));
+	if (from_stdin && hash_algo)
+		die(_("--object-format cannot be used with --stdin"));
 	if (!index_name && pack_name)
 		index_name = derive_filename(pack_name, "idx", &index_name_buf);
 


* [PATCH v3 36/44] t1050: pass algorithm to index-pack when outside repo
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (34 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 35/44] builtin/index-pack: add option to specify hash algorithm brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 37/44] remote-curl: avoid truncating refs with ls-remote brian m. carlson
                     ` (8 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When outside a repository, git index-pack is unable to guess the hash
algorithm in use for a pack, since packs don't contain any information
on the algorithm in use. Pass an option to index-pack to help it out in
this test.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t1050-large.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 7f88ea07c2..6a56d1ca24 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -12,6 +12,7 @@ file_size () {
 }
 
 test_expect_success setup '
+	test_oid_init &&
 	# clone does not allow us to pass core.bigfilethreshold to
 	# new repos, so set core.bigfilethreshold globally
 	git config --global core.bigfilethreshold 200k &&
@@ -177,7 +178,8 @@ test_expect_success 'git-show a large file' '
 
 test_expect_success 'index-pack' '
 	git clone file://"$(pwd)"/.git foo &&
-	GIT_DIR=non-existent git index-pack --strict --verify foo/.git/objects/pack/*.pack
+	GIT_DIR=non-existent git index-pack --object-format=$(test_oid algo) \
+		--strict --verify foo/.git/objects/pack/*.pack
 '
 
 test_expect_success 'repack' '


* [PATCH v3 37/44] remote-curl: avoid truncating refs with ls-remote
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (35 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 36/44] t1050: pass algorithm to index-pack when outside repo brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 38/44] t/helper: initialize the repository for test-sha1-array brian m. carlson
                     ` (7 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Normally, the remote-curl transport helper is aware of the hash
algorithm we're using because we're in a repo with the appropriate hash
algorithm set. However, when using git ls-remote outside of a
repository, we won't have initialized the hash algorithm properly, so
use hash_to_hex_algop to print the ref corresponding to the algorithm
we've detected.
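
The truncation being avoided comes down to how many raw bytes are converted
to hex.  The sketch below is an assumption-laden illustration, not git's
hash_to_hex_algop: the name `raw_to_hex` and its explicit `rawsz` parameter
are made up for this example to show that the output length follows the
algorithm's raw size rather than a process-wide default.

```c
#include <stddef.h>

/*
 * Convert `rawsz` bytes of `raw` to lowercase hex in `out`, which must
 * have room for 2 * rawsz + 1 characters.  Passing 20 (SHA-1) for an
 * object ID that is really 32 bytes (SHA-256) silently drops the tail.
 */
void raw_to_hex(const unsigned char *raw, size_t rawsz, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < rawsz; i++) {
		out[2 * i] = digits[raw[i] >> 4];
		out[2 * i + 1] = digits[raw[i] & 0xf];
	}
	out[2 * rawsz] = '\0';
}
```

Calling it with a raw size of 20 on a 32-byte object ID yields 40 hex
characters instead of 64, which mirrors the symptom seen with ls-remote
outside a repository.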

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 remote-curl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index e666845d9d..5cbc6e5002 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -548,7 +548,9 @@ static void output_refs(struct ref *refs)
 		if (posn->symref)
 			printf("@%s %s\n", posn->symref, posn->name);
 		else
-			printf("%s %s\n", oid_to_hex(&posn->old_oid), posn->name);
+			printf("%s %s\n", hash_to_hex_algop(posn->old_oid.hash,
+							    options.hash_algo),
+					  posn->name);
 	}
 	printf("\n");
 	fflush(stdout);


* [PATCH v3 38/44] t/helper: initialize the repository for test-sha1-array
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (36 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 37/44] remote-curl: avoid truncating refs with ls-remote brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 39/44] t5702: offer an object-format capability in the test brian m. carlson
                     ` (6 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

test-sha1-array uses the_hash_algo under the hood. Since t0064 wants to
use the value that is correct for the hash algorithm that we're testing,
make sure the test helper initializes the repository to set
the_hash_algo correctly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/helper/test-oid-array.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/t/helper/test-oid-array.c b/t/helper/test-oid-array.c
index ce9fd5f091..b16cd0b11b 100644
--- a/t/helper/test-oid-array.c
+++ b/t/helper/test-oid-array.c
@@ -12,6 +12,9 @@ int cmd__oid_array(int argc, const char **argv)
 {
 	struct oid_array array = OID_ARRAY_INIT;
 	struct strbuf line = STRBUF_INIT;
+	int nongit_ok;
+
+	setup_git_directory_gently(&nongit_ok);
 
 	while (strbuf_getline(&line, stdin) != EOF) {
 		const char *arg;

^ permalink raw reply	[flat|nested] 175+ messages in thread

* [PATCH v3 39/44] t5702: offer an object-format capability in the test
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (37 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 38/44] t/helper: initialize the repository for test-sha1-array brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 40/44] t5703: use object-format serve option brian m. carlson
                     ` (5 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

In order to make this test work with SHA-256, offer an object-format
capability so that both sides use the same algorithm.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5702-protocol-v2.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 8da65e60de..63f425bbad 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -13,6 +13,7 @@ start_git_daemon --export-all --enable=receive-pack
 daemon_parent=$GIT_DAEMON_DOCUMENT_ROOT_PATH/parent
 
 test_expect_success 'create repo to be served by git-daemon' '
+	test_oid_init &&
 	git init "$daemon_parent" &&
 	test_commit -C "$daemon_parent" one
 '
@@ -394,6 +395,7 @@ test_expect_success 'even with handcrafted request, filter does not work if not
 	# Custom request that tries to filter even though it is not advertised.
 	test-tool pkt-line pack >in <<-EOF &&
 	command=fetch
+	object-format=$(test_oid algo)
 	0001
 	want $(git -C server rev-parse master)
 	filter blob:none


* [PATCH v3 40/44] t5703: use object-format serve option
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (38 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 39/44] t5702: offer an object-format capability in the test brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 41/44] t5704: send object-format capability with SHA-256 brian m. carlson
                     ` (4 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When we're using an algorithm other than SHA-1, we need to specify the
algorithm in use so we don't get a failure with an "unknown format"
message. Add a wrapper function that specifies this header if required.
Skip specifying this header for SHA-1 to test that it works both with and
without this header.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5703-upload-pack-ref-in-want.sh | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh
index 92ad5eeec0..748282f058 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -27,6 +27,15 @@ check_output () {
 	test_cmp sorted_commits actual_commits
 }
 
+write_command () {
+	echo "command=$1"
+
+	if test "$(test_oid algo)" != sha1
+	then
+		echo "object-format=$(test_oid algo)"
+	fi
+}
+
 # c(o/foo) d(o/bar)
 #        \ /
 #         b   e(baz)  f(master)
@@ -65,7 +74,7 @@ test_expect_success 'config controls ref-in-want advertisement' '
 
 test_expect_success 'invalid want-ref line' '
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/non-existent
@@ -86,7 +95,7 @@ test_expect_success 'basic want-ref' '
 
 	oid=$(git rev-parse a) &&
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/master
@@ -110,7 +119,7 @@ test_expect_success 'multiple want-ref lines' '
 
 	oid=$(git rev-parse b) &&
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/o/foo
@@ -132,7 +141,7 @@ test_expect_success 'mix want and want-ref' '
 	git rev-parse e f >expected_commits &&
 
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/master
@@ -155,7 +164,7 @@ test_expect_success 'want-ref with ref we already have commit for' '
 
 	oid=$(git rev-parse c) &&
 	test-tool pkt-line pack >in <<-EOF &&
-	command=fetch
+	$(write_command fetch)
 	0001
 	no-progress
 	want-ref refs/heads/o/foo
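
For readers following along outside the Git test suite, here is a minimal sketch of what the wrapper emits for each algorithm. The `test_oid` function below is a stand-in written for this illustration (the real helper lives in Git's test library), and `TEST_ALGO` is a hypothetical variable used only to switch the algorithm.

```shell
# Stand-in for Git's test_oid helper (an assumption for this sketch): it only
# answers "algo", taking the value from TEST_ALGO and defaulting to sha1.
test_oid () {
	case "$1" in
	algo) echo "${TEST_ALGO:-sha1}" ;;
	*) return 1 ;;
	esac
}

# Same logic as the write_command wrapper added by the patch.
write_command () {
	echo "command=$1"

	if test "$(test_oid algo)" != sha1
	then
		echo "object-format=$(test_oid algo)"
	fi
}

# SHA-1: only the command line is written.
(TEST_ALGO=sha1; write_command fetch)

# SHA-256: the object-format line follows the command line.
(TEST_ALGO=sha256; write_command fetch)
```

Leaving the header out for SHA-1 means the existing SHA-1 runs keep exercising the "no object-format" path, while SHA-256 runs exercise the explicit one.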


* [PATCH v3 41/44] t5704: send object-format capability with SHA-256
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (39 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 40/44] t5703: use object-format serve option brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:55   ` [PATCH v3 42/44] t5300: pass --object-format to git index-pack brian m. carlson
                     ` (3 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When we speak protocol v2 in this test, we must pass the object-format
header if the algorithm is not SHA-1.  Otherwise, git upload-pack fails
because the hash algorithm doesn't match and not because we've failed to
speak the protocol correctly.  Pass the header so that our assertions
test what we're really interested in.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5704-protocol-violations.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t5704-protocol-violations.sh b/t/t5704-protocol-violations.sh
index 950cfb21fe..5c941949b9 100755
--- a/t/t5704-protocol-violations.sh
+++ b/t/t5704-protocol-violations.sh
@@ -9,6 +9,7 @@ making sure that we do not segfault or otherwise behave badly.'
 test_expect_success 'extra delim packet in v2 ls-refs args' '
 	{
 		packetize command=ls-refs &&
+		packetize "object-format=$(test_oid algo)" &&
 		printf 0001 &&
 		# protocol expects 0000 flush here
 		printf 0001
@@ -21,6 +22,7 @@ test_expect_success 'extra delim packet in v2 ls-refs args' '
 test_expect_success 'extra delim packet in v2 fetch args' '
 	{
 		packetize command=fetch &&
+		packetize "object-format=$(test_oid algo)" &&
 		printf 0001 &&
 		# protocol expects 0000 flush here
 		printf 0001
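
As a rough illustration of what these requests look like on the wire, the sketch below frames payloads in pkt-line format: four hex digits giving the total length (payload plus the four length bytes), followed by the payload. The function name `pkt_line` is hypothetical and is not the test suite's `packetize`; it assumes ASCII payloads, since `${#1}` counts characters rather than bytes.

```shell
# Hypothetical pkt-line framer: prints the 4-hex-digit length prefix
# (payload length + 4) followed by the payload, with no trailing newline.
pkt_line () {
	printf '%04x%s' "$((4 + ${#1}))" "$1"
}

# A well-formed v2 ls-refs request header for a SHA-256 repository:
# command, object-format capability, delimiter (0001), then flush (0000).
{
	pkt_line "command=ls-refs"
	pkt_line "object-format=sha256"
	printf 0001
	printf 0000
}
echo
```

The special packets `0001` (delimiter) and `0000` (flush) carry no payload, which is why the tests emit them with a plain `printf`.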


* [PATCH v3 42/44] t5300: pass --object-format to git index-pack
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (40 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 41/44] t5704: send object-format capability with SHA-256 brian m. carlson
@ 2020-06-19 17:55   ` brian m. carlson
  2020-06-19 17:56   ` [PATCH v3 43/44] bundle: detect hash algorithm when reading refs brian m. carlson
                     ` (2 subsequent siblings)
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

git index-pack by default reads the repository to determine the object
format. However, when outside of a repository, it's necessary to specify
the hash algorithm in use so that the pack can be properly indexed. Add
an --object-format argument when invoking git index-pack outside of a
repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5300-pack-object.sh | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 410a09b0dd..746cdb626e 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -12,7 +12,8 @@ TRASH=$(pwd)
 
 test_expect_success \
     'setup' \
-    'rm -f .git/index* &&
+    'test_oid_init &&
+     rm -f .git/index* &&
      perl -e "print \"a\" x 4096;" > a &&
      perl -e "print \"b\" x 4096;" > b &&
      perl -e "print \"c\" x 4096;" > c &&
@@ -412,18 +413,18 @@ test_expect_success 'set up pack for non-repo tests' '
 '
 
 test_expect_success 'index-pack --stdin complains of non-repo' '
-	nongit test_must_fail git index-pack --stdin <foo.pack &&
+	nongit test_must_fail git index-pack --object-format=$(test_oid algo) --stdin <foo.pack &&
 	test_path_is_missing non-repo/.git
 '
 
 test_expect_success 'index-pack <pack> works in non-repo' '
-	nongit git index-pack ../foo.pack &&
+	nongit git index-pack --object-format=$(test_oid algo) ../foo.pack &&
 	test_path_is_file foo.idx
 '
 
 test_expect_success 'index-pack --strict <pack> works in non-repo' '
 	rm -f foo.idx &&
-	nongit git index-pack --strict ../foo.pack &&
+	nongit git index-pack --strict --object-format=$(test_oid algo) ../foo.pack &&
 	test_path_is_file foo.idx
 '
 


* [PATCH v3 43/44] bundle: detect hash algorithm when reading refs
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (41 preceding siblings ...)
  2020-06-19 17:55   ` [PATCH v3 42/44] t5300: pass --object-format to git index-pack brian m. carlson
@ 2020-06-19 17:56   ` brian m. carlson
  2020-06-19 17:56   ` [PATCH v3 44/44] remote-testgit: adapt for object-format brian m. carlson
  2020-06-19 21:09   ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality Junio C Hamano
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

Much like with the dumb HTTP transport, there isn't a way to explicitly
specify the hash algorithm when dealing with a bundle, so detect the
algorithm based on the length of the object IDs in the prerequisites and
ref advertisements.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 bundle.c    | 22 +++++++++++++++++++++-
 bundle.h    |  1 +
 transport.c | 10 ++++++++--
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/bundle.c b/bundle.c
index 99439e07a1..2a0d744d3f 100644
--- a/bundle.c
+++ b/bundle.c
@@ -23,6 +23,17 @@ static void add_to_ref_list(const struct object_id *oid, const char *name,
 	list->nr++;
 }
 
+static const struct git_hash_algo *detect_hash_algo(struct strbuf *buf)
+{
+	size_t len = strcspn(buf->buf, " \n");
+	int algo;
+
+	algo = hash_algo_by_length(len / 2);
+	if (algo == GIT_HASH_UNKNOWN)
+		return NULL;
+	return &hash_algos[algo];
+}
+
 static int parse_bundle_header(int fd, struct bundle_header *header,
 			       const char *report_path)
 {
@@ -52,12 +63,21 @@ static int parse_bundle_header(int fd, struct bundle_header *header,
 		}
 		strbuf_rtrim(&buf);
 
+		if (!header->hash_algo) {
+			header->hash_algo = detect_hash_algo(&buf);
+			if (!header->hash_algo) {
+				error(_("unknown hash algorithm length"));
+				status = -1;
+				break;
+			}
+		}
+
 		/*
 		 * Tip lines have object name, SP, and refname.
 		 * Prerequisites have object name that is optionally
 		 * followed by SP and subject line.
 		 */
-		if (parse_oid_hex(buf.buf, &oid, &p) ||
+		if (parse_oid_hex_algop(buf.buf, &oid, &p, header->hash_algo) ||
 		    (*p && !isspace(*p)) ||
 		    (!is_prereq && !*p)) {
 			if (report_path)
diff --git a/bundle.h b/bundle.h
index ceab0c7475..2dc9442024 100644
--- a/bundle.h
+++ b/bundle.h
@@ -15,6 +15,7 @@ struct ref_list {
 struct bundle_header {
 	struct ref_list prerequisites;
 	struct ref_list references;
+	const struct git_hash_algo *hash_algo;
 };
 
 int is_bundle(const char *path, int quiet);
diff --git a/transport.c b/transport.c
index a016f41702..b255f123c0 100644
--- a/transport.c
+++ b/transport.c
@@ -143,6 +143,9 @@ static struct ref *get_refs_from_bundle(struct transport *transport,
 	data->fd = read_bundle_header(transport->url, &data->header);
 	if (data->fd < 0)
 		die(_("could not read bundle '%s'"), transport->url);
+
+	transport->hash_algo = data->header.hash_algo;
+
 	for (i = 0; i < data->header.references.nr; i++) {
 		struct ref_list_entry *e = data->header.references.list + i;
 		struct ref *ref = alloc_ref(e->name);
@@ -157,11 +160,14 @@ static int fetch_refs_from_bundle(struct transport *transport,
 			       int nr_heads, struct ref **to_fetch)
 {
 	struct bundle_transport_data *data = transport->data;
+	int ret;
 
 	if (!data->get_refs_from_bundle_called)
 		get_refs_from_bundle(transport, 0, NULL);
-	return unbundle(the_repository, &data->header, data->fd,
-			transport->progress ? BUNDLE_VERBOSE : 0);
+	ret = unbundle(the_repository, &data->header, data->fd,
+			   transport->progress ? BUNDLE_VERBOSE : 0);
+	transport->hash_algo = data->header.hash_algo;
+	return ret;
 }
 
 static int close_bundle(struct transport *transport)
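
The length-based detection can be sketched in shell as follows. This is only an illustration of the idea, not the C code in the patch: the function takes one bundle header line, drops a leading `-` (the prerequisite marker), and maps the hex length of the first field to an algorithm name. Like the C helper, it does not check that the characters are valid hex; that is left to the later parse step.

```shell
# Illustrative shell counterpart of the length check: 40 hex digits means
# SHA-1, 64 means SHA-256, anything else is reported as unknown (status 1).
detect_hash_algo () {
	line=${1#-}        # strip the prerequisite marker, if present
	oid=${line%% *}    # keep everything before the first space

	case ${#oid} in
	40) echo sha1 ;;
	64) echo sha256 ;;
	*) return 1 ;;
	esac
}

# All-zero object IDs of each length, generated to avoid miscounting.
detect_hash_algo "$(printf '%040d' 0) refs/heads/main"
detect_hash_algo "-$(printf '%064d' 0) subject of a prerequisite"
detect_hash_algo "abc refs/heads/main" || echo "unknown hash algorithm length"
```

As the cover letter notes, this approach cannot tell apart two algorithms with the same output length.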


* [PATCH v3 44/44] remote-testgit: adapt for object-format
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (42 preceding siblings ...)
  2020-06-19 17:56   ` [PATCH v3 43/44] bundle: detect hash algorithm when reading refs brian m. carlson
@ 2020-06-19 17:56   ` brian m. carlson
  2020-06-19 21:09   ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality Junio C Hamano
  44 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-19 17:56 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Martin Ågren

When using an algorithm other than SHA-1, we need the remote helper to
advertise support for the object-format extension and provide
information back to us so that we can properly parse refs and return
data. Ensure that the test remote helper understands these extensions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t5801/git-remote-testgit | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
index 6b9f0b5dc7..1544d6dc6b 100755
--- a/t/t5801/git-remote-testgit
+++ b/t/t5801/git-remote-testgit
@@ -52,9 +52,11 @@ do
 		test -n "$GIT_REMOTE_TESTGIT_SIGNED_TAGS" && echo "signed-tags"
 		test -n "$GIT_REMOTE_TESTGIT_NO_PRIVATE_UPDATE" && echo "no-private-update"
 		echo 'option'
+		echo 'object-format'
 		echo
 		;;
 	list)
+		echo ":object-format $(git rev-parse --show-object-format=storage)"
 		git for-each-ref --format='? %(refname)' 'refs/heads/' 'refs/tags/'
 		head=$(git symbolic-ref HEAD)
 		echo "@$head HEAD"
@@ -139,6 +141,10 @@ do
 			test $val = "true" && force="true" || force=
 			echo "ok"
 			;;
+		object-format)
+			test $val = "true" && object_format="true" || object_format=
+			echo "ok"
+			;;
 		*)
 			echo "unsupported"
 			;;
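
To show the exchange in isolation, here is a stripped-down helper loop. It is a hypothetical miniature, not git-remote-testgit: the function name `helper` and its algorithm argument are made up for the example, and it lists no refs. Unlike the patch, which always prints the `:object-format` line, this sketch prints it only after the caller has enabled the option; that gating is an assumption of the sketch.

```shell
# Hypothetical miniature remote helper: reads commands on stdin and answers
# the three that matter for the object-format extension.
helper () {
	algo=$1
	object_format=

	while read -r cmd arg val
	do
		case "$cmd" in
		capabilities)
			echo "option"
			echo "object-format"
			echo
			;;
		option)
			case "$arg" in
			object-format)
				test "$val" = "true" && object_format="true" || object_format=
				echo "ok"
				;;
			*)
				echo "unsupported"
				;;
			esac
			;;
		list)
			if test -n "$object_format"
			then
				echo ":object-format $algo"
			fi
			# A real helper would print its refs here.
			echo
			;;
		esac
	done
}

printf 'capabilities\noption object-format true\nlist\n' | helper sha256
```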


* Re: [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality
  2020-06-19 17:55 ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality brian m. carlson
                     ` (43 preceding siblings ...)
  2020-06-19 17:56   ` [PATCH v3 44/44] remote-testgit: adapt for object-format brian m. carlson
@ 2020-06-19 21:09   ` Junio C Hamano
  2020-06-20  1:33     ` brian m. carlson
  44 siblings, 1 reply; 175+ messages in thread
From: Junio C Hamano @ 2020-06-19 21:09 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Martin Ågren

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> This is part 2 of 3 of the SHA-256 work.  This series adds all of the
> protocol logic to work with SHA-256 repositories.
>
> v3 fixes a bug in patch 34 which prevented cloning an empty repository
> with the dumb HTTP protocol.  We look up the hash algorithm by length of
> the data in the info/refs file and if we have no refs, we have no
> entries.
>
> Previously, we just failed and complained, which isn't really helpful,
> nor is it backward compatible.  So now we use whatever the default is
> for the current repository.  That means we honor GIT_DEFAULT_HASH or git
> clone -c, and default to SHA-1 otherwise.  Users are encouraged to
> switch to the smart protocol if they need to distinguish the remote
> side's hash algorithm when the repository is empty.
>
> There are tests for the default hash behavior, but not for git clone -c,
> because the extensions.objectformat option doesn't exist yet.  I have
> tested that it does indeed work, though.
>
> Otherwise, this series is the same as v2 except for a rebase (for my
> convenience and Junio's).

Not mine, though.  Keeping the same base is easier to see the
incremental difference.

It wasn't too cumbersome to rebase back on the same base as what was
queued (and to make sure that the result, when merged to 'master',
matches the result of applying all these patches directly on top of
'master'), though ;-)

In any case, the updated step 34 made sense to me.  Thanks.
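
A small sketch of the fallback described in the quoted cover letter, for illustration only (it is not the remote-curl code): the hypothetical function `detect_info_refs_algo` reads info/refs-style data on stdin, and when there are no refs it falls back to `GIT_DEFAULT_HASH`, or to SHA-1 if that is unset. The `git clone -c` case is not modeled.

```shell
# Hypothetical helper: reads info/refs-style data ("<oid><TAB><refname>") on
# stdin and prints the hash algorithm implied by the first object ID.
detect_info_refs_algo () {
	tab=$(printf '\t')
	line=
	IFS= read -r line || :

	if test -z "$line"
	then
		# No refs at all: nothing to measure, so use the local
		# default (GIT_DEFAULT_HASH if set, otherwise SHA-1).
		echo "${GIT_DEFAULT_HASH:-sha1}"
		return 0
	fi

	oid=${line%%"$tab"*}
	case ${#oid} in
	40) echo sha1 ;;
	64) echo sha256 ;;
	*) return 1 ;;
	esac
}

# Empty repository: prints the local default.
printf '' | detect_info_refs_algo

# One ref with a 64-digit object ID: prints sha256.
printf '%064d\trefs/heads/main\n' 0 | detect_info_refs_algo
```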



* Re: [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality
  2020-06-19 21:09   ` [PATCH v3 00/44] SHA-256 part 2/3: protocol functionality Junio C Hamano
@ 2020-06-20  1:33     ` brian m. carlson
  0 siblings, 0 replies; 175+ messages in thread
From: brian m. carlson @ 2020-06-20  1:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Martin Ågren



On 2020-06-19 at 21:09:33, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > Otherwise, this series is the same as v2 except for a rebase (for my
> > convenience and Junio's).
> 
> Not mine, though.  Keeping the same base is easier to see the
> incremental difference.

Okay, sorry about that.  It does make it more convenient for me
eventually (since I get to resolve conflicts more incrementally), but I
don't usually have to worry about that until the series hits master, so
I can hold off.  I'm not rebasing as many patches anymore, so it's less
of a problem for me.

> It wasn't too cumbersome to rebase back on the same base as what was
> queued (and to make sure that the result, when merged to 'master',
> matches the result of applying all these patches directly on top of
> 'master'), though ;-)
> 
> In any case, the updated step 34 made sense to me.  Thanks.

Yeah, I discovered it the other day when updating another project to
deal with a SHA-256 Git, and I happen to be on vacation today, so I
thought I'd send out a quick fix.  I was surprised to learn that we had
no tests for cloning empty repositories, but here we are.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


