Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3
@ 2020-01-13 12:47 brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 01/22] hex: introduce parsing variants taking hash algorithms brian m. carlson
                   ` (21 more replies)
  0 siblings, 22 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

This is an RFC series for part 1 of 3 of a SHA-256 implementation.  It
contains a few interesting pieces, and there are also some pieces it
does not contain.

First, it contains the pieces necessary to set up repositories and write
_but not read_ extensions.objectFormat.  In other words, you can create
a SHA-256 repository, but will be unable to read it.  We also
intentionally fail any use of SHA-256 in a repository except when
DEVELOPER=1 is set to discourage anyone from wiring it up, since at this
point it is nonfunctional.

It additionally contains code to adjust the setup code as necessary,
handle repository version 1 in worktrees, properly handle signatures in
tags and commits, and import and export submodules using SHA-256
(required for converting the Git repository).

The setup code now learns an environment variable to specify a default
hash.  This is useful for us because otherwise the test suite would need
every call to "git init" (which is a lot) to add a command line
argument, which would be untidy and burdensome.  It is not recommended
for everyday use.

Furthermore, this contains direct calls to test_oid_init in the test
suite setup code, since the alternative is duplicating those values.
I've opted not to remove the calls in the tests because we have other
topics in flight that may conflict with doing that, but I plan to send a
follow-up patch which does that at the end.  The _z40 variable is
persisted there for compatibility with master and will be dropped once
my current test series in next hits master.

There are also things it does not contain.  As mentioned, it lacks
support for reading extensions.objectFormat at all.  It lacks support
for cloning, fetching, and pushing, which while considered non-goals in
the transition plan, are required for the test suite to even come close
to passing.  That code, both for the original protocol and v2, would be
in part 2.  That code does not provide interoperability between SHA-1
and SHA-256 repositories, which will be the subject of the next major
chunk I do.  Part 3 contains the portions making the
extensions.objectFormat option functional and permitting us to run tests
against the new version.

This series, of course, lacks the test suite fixes which will be
required for the test suite to pass.  Those will be coming in two
further series, one of which I plan to send out soon.

Because this series sets up (and documents!) a useless option (which is
a large footgun) and because I'd like feedback about this approach, this
series is RFC.  I'd also like to know if you think anything is missing
outside of the items I've mentioned, because part 3 will result in the
test suite depending on SHA-256 support and therefore any structural
changes will be difficult to make at that point.  And of course, any
other feedback about this series is certainly welcome.

There may be other things that are interesting about this series, but
this cover letter is too small to contain them, so I encourage you to
look at the series for yourself.

If you'd like to see what the entire series looks like when complete,
you're welcome to inspect the transition-stage-4 branch at
https://github.com/bk2204/git.git.

brian m. carlson (22):
  hex: introduce parsing variants taking hash algorithms
  hex: add functions to parse hex object IDs in any algorithm
  repository: require a build flag to use SHA-256
  t: use hash-specific lookup tables to define test constants
  t6300: abstract away SHA-1-specific constants
  t6300: make hash algorithm independent
  t/helper/test-dump-split-index: initialize git repository
  t/helper: initialize repository if necessary
  t/helper: make repository tests hash independent
  setup: allow check_repository_format to read repository format
  builtin/init-db: allow specifying hash algorithm on command line
  builtin/init-db: add environment variable for new repo hash
  init-db: move writing repo version into a function
  worktree: allow repository version 1
  commit: use expected signature header for SHA-256
  gpg-interface: improve interface for parsing tags
  tag: store SHA-256 signatures in a header
  fast-import: permit reading multiple marks files
  fast-import: add helper function for inserting mark object entries
  fast-import: make find_marks work on any mark set
  fast-import: add a generic function to iterate over marks
  fast-import: add options for rewriting submodules

 Documentation/git-fast-import.txt |  20 +++
 Documentation/git-init.txt        |   7 +-
 Documentation/git.txt             |   6 +
 builtin/clone.c                   |   2 +-
 builtin/commit.c                  |   2 +-
 builtin/fmt-merge-msg.c           |  26 +++-
 builtin/init-db.c                 |  70 +++++++--
 builtin/mktag.c                   |  14 ++
 builtin/receive-pack.c            |   4 +-
 builtin/tag.c                     |  20 ++-
 cache.h                           |  25 ++-
 commit.c                          |  58 +++++--
 commit.h                          |   8 +
 config.mak.dev                    |   2 +
 fast-import.c                     | 246 ++++++++++++++++++++++--------
 gpg-interface.c                   |  17 ++-
 gpg-interface.h                   |   9 +-
 hex.c                             |  57 ++++++-
 log-tree.c                        |  14 +-
 path.c                            |   2 +-
 ref-filter.c                      |  23 ++-
 repository.c                      |   4 +
 sequencer.c                       |   2 +-
 setup.c                           |   6 +-
 t/helper/test-dump-split-index.c  |   2 +
 t/helper/test-repository.c        |  14 +-
 t/t1450-fsck.sh                   |  24 +++
 t/t5801-remote-helpers.sh         |   4 +-
 t/t6300-for-each-ref.sh           |  61 +++++---
 t/t7004-tag.sh                    |   8 +-
 t/t7030-verify-tag.sh             |  17 +++
 t/t7510-signed-commit.sh          |  16 +-
 t/t9300-fast-import.sh            | 109 +++++++++++++
 t/test-lib.sh                     |  29 ++--
 tag.c                             |  15 +-
 worktree.c                        |  10 +-
 36 files changed, 758 insertions(+), 195 deletions(-)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 01/22] hex: introduce parsing variants taking hash algorithms
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 02/22] hex: add functions to parse hex object IDs in any algorithm brian m. carlson
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

Introduce variants of get_oid_hex and parse_oid_hex that parse an
arbitrary hash algorithm, implementing internal functions to avoid
duplication.  These functions can be used in the transport code to parse
refs properly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h |  7 +++++++
 hex.c   | 35 +++++++++++++++++++++++++++--------
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index cbfaead23a..493d57febe 100644
--- a/cache.h
+++ b/cache.h
@@ -1480,6 +1480,9 @@ int set_disambiguate_hint_config(const char *var, const char *value);
 int get_sha1_hex(const char *hex, unsigned char *sha1);
 int get_oid_hex(const char *hex, struct object_id *sha1);
 
+/* Like get_oid_hex, but for an arbitrary hash algorithm. */
+int get_oid_hex_algop(const char *hex, struct object_id *oid, const struct git_hash_algo *algop);
+
 /*
  * Read `len` pairs of hexadecimal digits from `hex` and write the
  * values to `binary` as `len` bytes. Return 0 on success, or -1 if
@@ -1515,6 +1518,10 @@ char *oid_to_hex(const struct object_id *oid);						/* same static buffer */
  */
 int parse_oid_hex(const char *hex, struct object_id *oid, const char **end);
 
+/* Like parse_oid_hex, but for an arbitrary hash algorithm. */
+int parse_oid_hex_algop(const char *hex, struct object_id *oid, const char **end,
+			const struct git_hash_algo *algo);
+
 /*
  * This reads short-hand syntax that not only evaluates to a commit
  * object name, but also can act as if the end user spelled the name
diff --git a/hex.c b/hex.c
index fd7f00c43f..10e24dc2e4 100644
--- a/hex.c
+++ b/hex.c
@@ -47,30 +47,49 @@ int hex_to_bytes(unsigned char *binary, const char *hex, size_t len)
 	return 0;
 }
 
-int get_sha1_hex(const char *hex, unsigned char *sha1)
+static int get_hash_hex_algop(const char *hex, unsigned char *hash,
+			      const struct git_hash_algo *algop)
 {
 	int i;
-	for (i = 0; i < the_hash_algo->rawsz; i++) {
+	for (i = 0; i < algop->rawsz; i++) {
 		int val = hex2chr(hex);
 		if (val < 0)
 			return -1;
-		*sha1++ = val;
+		*hash++ = val;
 		hex += 2;
 	}
 	return 0;
 }
 
+int get_sha1_hex(const char *hex, unsigned char *sha1)
+{
+	return get_hash_hex_algop(hex, sha1, the_hash_algo);
+}
+
+int get_oid_hex_algop(const char *hex, struct object_id *oid,
+		      const struct git_hash_algo *algop)
+{
+	return get_hash_hex_algop(hex, oid->hash, algop);
+}
+
 int get_oid_hex(const char *hex, struct object_id *oid)
 {
-	return get_sha1_hex(hex, oid->hash);
+	return get_oid_hex_algop(hex, oid, the_hash_algo);
+}
+
+int parse_oid_hex_algop(const char *hex, struct object_id *oid,
+			const char **end,
+			const struct git_hash_algo *algop)
+{
+	int ret = get_hash_hex_algop(hex, oid->hash, algop);
+	if (!ret)
+		*end = hex + algop->hexsz;
+	return ret;
 }
 
 int parse_oid_hex(const char *hex, struct object_id *oid, const char **end)
 {
-	int ret = get_oid_hex(hex, oid);
-	if (!ret)
-		*end = hex + the_hash_algo->hexsz;
-	return ret;
+	return parse_oid_hex_algop(hex, oid, end, the_hash_algo);
 }
 
 char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash,

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 02/22] hex: add functions to parse hex object IDs in any algorithm
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 01/22] hex: introduce parsing variants taking hash algorithms brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-15 21:40   ` Junio C Hamano
  2020-01-13 12:47 ` [RFC PATCH 03/22] repository: require a build flag to use SHA-256 brian m. carlson
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

There are some places where we need to parse a hex object ID in any
algorithm without knowing beforehand which algorithm is in use. An
example is when parsing fast-import marks.

Add a get_oid_hex_any to parse an object ID and return the algorithm it
belongs to, and additionally add parse_oid_hex_any which is the
equivalent change for parse_oid_hex. If the object is not parseable, we
return GIT_HASH_UNKNOWN.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h | 10 ++++++++++
 hex.c   | 22 ++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/cache.h b/cache.h
index 493d57febe..6c094c3210 100644
--- a/cache.h
+++ b/cache.h
@@ -1522,6 +1522,16 @@ int parse_oid_hex(const char *hex, struct object_id *oid, const char **end);
 int parse_oid_hex_algop(const char *hex, struct object_id *oid, const char **end,
 			const struct git_hash_algo *algo);
 
+
+/*
+ * These functions work like get_oid_hex and parse_oid_hex, but they will parse
+ * a hex value for any algorithm. The algorithm is detected based on the length
+ * and the algorithm in use is returned. If this is not a hex object ID in any
+ * algorithm, returns GIT_HASH_UNKNOWN.
+ */
+int get_oid_hex_any(const char *hex, struct object_id *oid);
+int parse_oid_hex_any(const char *hex, struct object_id *oid, const char **end);
+
 /*
  * This reads short-hand syntax that not only evaluates to a commit
  * object name, but also can act as if the end user spelled the name
diff --git a/hex.c b/hex.c
index 10e24dc2e4..da51e64929 100644
--- a/hex.c
+++ b/hex.c
@@ -72,6 +72,20 @@ int get_oid_hex_algop(const char *hex, struct object_id *oid,
 	return get_hash_hex_algop(hex, oid->hash, algop);
 }
 
+/*
+ * NOTE: This function relies on hash algorithms being in order from shortest
+ * length to longest length.
+ */
+int get_oid_hex_any(const char *hex, struct object_id *oid)
+{
+	int i;
+	for (i = GIT_HASH_NALGOS - 1; i > 0; i--) {
+		if (!get_hash_hex_algop(hex, oid->hash, &hash_algos[i]))
+			return i;
+	}
+	return GIT_HASH_UNKNOWN;
+}
+
 int get_oid_hex(const char *hex, struct object_id *oid)
 {
 	return get_oid_hex_algop(hex, oid, the_hash_algo);
@@ -87,6 +101,14 @@ int parse_oid_hex_algop(const char *hex, struct object_id *oid,
 	return ret;
 }
 
+int parse_oid_hex_any(const char *hex, struct object_id *oid, const char **end)
+{
+	int ret = get_oid_hex_any(hex, oid);
+	if (ret)
+		*end = hex + hash_algos[ret].hexsz;
+	return ret;
+}
+
 int parse_oid_hex(const char *hex, struct object_id *oid, const char **end)
 {
 	return parse_oid_hex_algop(hex, oid, end, the_hash_algo);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 03/22] repository: require a build flag to use SHA-256
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 01/22] hex: introduce parsing variants taking hash algorithms brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 02/22] hex: add functions to parse hex object IDs in any algorithm brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 04/22] t: use hash-specific lookup tables to define test constants brian m. carlson
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

At this point, SHA-256 support is experimental and some behavior may
change. To avoid surprising unsuspecting users, require a build flag,
ENABLE_SHA256, to allow use of a non-SHA-1 algorithm. Enable this flag
by default when the DEVELOPER make flag is set so that contributors can
test this case adequately.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 config.mak.dev | 2 ++
 repository.c   | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/config.mak.dev b/config.mak.dev
index bf1f3fcdee..2e19435915 100644
--- a/config.mak.dev
+++ b/config.mak.dev
@@ -17,6 +17,8 @@ DEVELOPER_CFLAGS += -Wstrict-prototypes
 DEVELOPER_CFLAGS += -Wunused
 DEVELOPER_CFLAGS += -Wvla
 
+DEVELOPER_CFLAGS += -DENABLE_SHA256
+
 ifndef COMPILER_FEATURES
 COMPILER_FEATURES := $(shell ./detect-compiler $(CC))
 endif
diff --git a/repository.c b/repository.c
index a4174ddb06..6f7f6f002b 100644
--- a/repository.c
+++ b/repository.c
@@ -89,6 +89,10 @@ void repo_set_gitdir(struct repository *repo,
 void repo_set_hash_algo(struct repository *repo, int hash_algo)
 {
 	repo->hash_algo = &hash_algos[hash_algo];
+#ifndef ENABLE_SHA256
+	if (hash_algo != GIT_HASH_SHA1)
+		die(_("The hash algorithm %s is not supported in this build."), repo->hash_algo->name);
+#endif
 }
 
 /*

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 04/22] t: use hash-specific lookup tables to define test constants
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (2 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 03/22] repository: require a build flag to use SHA-256 brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 05/22] t6300: abstract away SHA-1-specific constants brian m. carlson
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

In the future, we'll allow developers to run the testsuite with a hash
algorithm of their choice.  To make this easier, compute the fixed
constants using test_oid. Move the constant initialization down below
the point where test-lib-functions.sh is loaded so the functions are
defined.

Note that we don't provide a value for the OID_REGEX value directly
because writing a large number of instances of "[0-9a-f]" in the
oid-info files is unwieldy and there isn't a way to compute it based on
those values. Instead, compute it based on ZERO_OID.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/test-lib.sh | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 44df51be8f..0b2566ad98 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -494,21 +494,6 @@ case $(echo $GIT_TRACE |tr "[A-Z]" "[a-z]") in
 	;;
 esac
 
-# Convenience
-#
-# A regexp to match 5, 35 and 40 hexdigits
-_x05='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
-_x35="$_x05$_x05$_x05$_x05$_x05$_x05$_x05"
-_x40="$_x35$_x05"
-
-# Zero SHA-1
-_z40=0000000000000000000000000000000000000000
-
-OID_REGEX="$_x40"
-ZERO_OID=$_z40
-EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904
-EMPTY_BLOB=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
-
 # Line feed
 LF='
 '
@@ -1382,6 +1367,20 @@ then
 	fi
 fi
 
+# Convenience
+# A regexp to match 5, 35 and 40 hexdigits
+_x05='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
+_x35="$_x05$_x05$_x05$_x05$_x05$_x05$_x05"
+_x40="$_x35$_x05"
+
+test_oid_init
+
+ZERO_OID=$(test_oid zero)
+OID_REGEX=$(echo $ZERO_OID | sed -e 's/0/[0-9a-f]/g')
+EMPTY_TREE=$(test_oid empty_tree)
+EMPTY_BLOB=$(test_oid empty_blob)
+_z40=$ZERO_OID
+
 # Provide an implementation of the 'yes' utility; the upper bound
 # limit is there to help Windows that cannot stop this loop from
 # wasting cycles when the downstream stops reading, so do not be

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 05/22] t6300: abstract away SHA-1-specific constants
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (3 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 04/22] t: use hash-specific lookup tables to define test constants brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 06/22] t6300: make hash algorithm independent brian m. carlson
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

Adjust the test so that it computes variables for object IDs instead of
using hard-coded hashes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t6300-for-each-ref.sh | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 9c910ce746..2406b93d35 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -20,6 +20,10 @@ setdate_and_increment () {
 }
 
 test_expect_success setup '
+	test_oid_cache <<-EOF &&
+	disklen sha1:138
+	disklen sha256:154
+	EOF
 	setdate_and_increment &&
 	echo "Using $datestamp" > one &&
 	git add one &&
@@ -50,6 +54,9 @@ test_atom() {
 	"
 }
 
+hexlen=$(test_oid hexsz)
+disklen=$(test_oid disklen)
+
 test_atom head refname refs/heads/master
 test_atom head refname: refs/heads/master
 test_atom head refname:short master
@@ -82,9 +89,9 @@ test_atom head push:rstrip=-1 refs
 test_atom head push:strip=1 remotes/myfork/master
 test_atom head push:strip=-1 master
 test_atom head objecttype commit
-test_atom head objectsize 171
-test_atom head objectsize:disk 138
-test_atom head deltabase 0000000000000000000000000000000000000000
+test_atom head objectsize $((131 + hexlen))
+test_atom head objectsize:disk $disklen
+test_atom head deltabase $ZERO_OID
 test_atom head objectname $(git rev-parse refs/heads/master)
 test_atom head objectname:short $(git rev-parse --short refs/heads/master)
 test_atom head objectname:short=1 $(git rev-parse --short=1 refs/heads/master)
@@ -125,11 +132,11 @@ test_atom tag refname:short testtag
 test_atom tag upstream ''
 test_atom tag push ''
 test_atom tag objecttype tag
-test_atom tag objectsize 154
-test_atom tag objectsize:disk 138
-test_atom tag '*objectsize:disk' 138
-test_atom tag deltabase 0000000000000000000000000000000000000000
-test_atom tag '*deltabase' 0000000000000000000000000000000000000000
+test_atom tag objectsize $((114 + hexlen))
+test_atom tag objectsize:disk $disklen
+test_atom tag '*objectsize:disk' $disklen
+test_atom tag deltabase $ZERO_OID
+test_atom tag '*deltabase' $ZERO_OID
 test_atom tag objectname $(git rev-parse refs/tags/testtag)
 test_atom tag objectname:short $(git rev-parse --short refs/tags/testtag)
 test_atom head objectname:short=1 $(git rev-parse --short=1 refs/heads/master)
@@ -139,7 +146,7 @@ test_atom tag parent ''
 test_atom tag numparent ''
 test_atom tag object $(git rev-parse refs/tags/testtag^0)
 test_atom tag type 'commit'
-test_atom tag '*objectname' 'ea122842f48be4afb2d1fc6a4b96c05885ab7463'
+test_atom tag '*objectname' $(git rev-parse refs/tags/testtag^{})
 test_atom tag '*objecttype' 'commit'
 test_atom tag author ''
 test_atom tag authorname ''

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 06/22] t6300: make hash algorithm independent
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (4 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 05/22] t6300: abstract away SHA-1-specific constants brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 07/22] t/helper/test-dump-split-index: initialize git repository brian m. carlson
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

One of the git for-each-ref tests asks to sort by object ID.  However,
when sorted, the order of the refs differs between SHA-1 and SHA-256.
Sort the expected output so that the test passes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/t6300-for-each-ref.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 2406b93d35..b3c1092338 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -650,7 +650,7 @@ test_atom refs/tags/signed-long contents "subject line
 body contents
 $sig"
 
-cat >expected <<EOF
+sort >expected <<EOF
 $(git rev-parse refs/tags/bogo) <committer@example.com> refs/tags/bogo
 $(git rev-parse refs/tags/master) <committer@example.com> refs/tags/master
 EOF

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 07/22] t/helper/test-dump-split-index: initialize git repository
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (5 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 06/22] t6300: make hash algorithm independent brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 08/22] t/helper: initialize repository if necessary brian m. carlson
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

In this test helper, we read the index.  In order to have the proper
hash algorithm set up, we must call setup_git_directory.  Do so, so that
the test works when extensions.objectFormat is set.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/helper/test-dump-split-index.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/helper/test-dump-split-index.c b/t/helper/test-dump-split-index.c
index 63c689d6ee..a209880eb3 100644
--- a/t/helper/test-dump-split-index.c
+++ b/t/helper/test-dump-split-index.c
@@ -13,6 +13,8 @@ int cmd__dump_split_index(int ac, const char **av)
 	struct split_index *si;
 	int i;
 
+	setup_git_directory();
+
 	do_read_index(&the_index, av[1], 1);
 	printf("own %s\n", oid_to_hex(&the_index.oid));
 	si = the_index.split_index;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 08/22] t/helper: initialize repository if necessary
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (6 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 07/22] t/helper/test-dump-split-index: initialize git repository brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 09/22] t/helper: make repository tests hash independent brian m. carlson
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

The repository helper is used in t5318 to read commit graphs whether
we're in a repository or not. However, without a repository, we have no
way to properly initialize the hash algorithm, meaning that the file is
misread.

Fix this by calling setup_git_directory_gently, which will read the
environment variable the testsuite sets to ensure that the correct hash
algorithm is set.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/helper/test-repository.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
index f7f8618445..ecc768e4cb 100644
--- a/t/helper/test-repository.c
+++ b/t/helper/test-repository.c
@@ -75,6 +75,10 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
 
 int cmd__repository(int argc, const char **argv)
 {
+	int nongit_ok = 0;
+
+	setup_git_directory_gently(&nongit_ok);
+
 	if (argc < 2)
 		die("must have at least 2 arguments");
 	if (!strcmp(argv[1], "parse_commit_in_graph")) {

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 09/22] t/helper: make repository tests hash independent
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (7 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 08/22] t/helper: initialize repository if necessary brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 10/22] setup: allow check_repository_format to read repository format brian m. carlson
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

This test currently hard-codes the hash algorithm as SHA-1 when calling
repo_set_hash_algo so that the_hash_algo is properly initialized.
However, this does not work with SHA-256 repositories. Read the
repository value that repo_init has read into the local repository
variable and set the algorithm based on that value.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 t/helper/test-repository.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
index ecc768e4cb..56f0e3c1be 100644
--- a/t/helper/test-repository.c
+++ b/t/helper/test-repository.c
@@ -19,12 +19,11 @@ static void test_parse_commit_in_graph(const char *gitdir, const char *worktree,
 
 	memset(the_repository, 0, sizeof(*the_repository));
 
-	/* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */
-	repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
-
 	if (repo_init(&r, gitdir, worktree))
 		die("Couldn't init repo");
 
+	repo_set_hash_algo(the_repository, hash_algo_by_ptr(r.hash_algo));
+
 	c = lookup_commit(&r, commit_oid);
 
 	if (!parse_commit_in_graph(&r, c))
@@ -50,12 +49,11 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
 
 	memset(the_repository, 0, sizeof(*the_repository));
 
-	/* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */
-	repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
-
 	if (repo_init(&r, gitdir, worktree))
 		die("Couldn't init repo");
 
+	repo_set_hash_algo(the_repository, hash_algo_by_ptr(r.hash_algo));
+
 	c = lookup_commit(&r, commit_oid);
 
 	/*

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 10/22] setup: allow check_repository_format to read repository format
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (8 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 09/22] t/helper: make repository tests hash independent brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 11/22] builtin/init-db: allow specifying hash algorithm on command line brian m. carlson
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

In some cases, we will want to not only check the repository format, but
extract the information that we've gained.  To do so, allow
check_repository_format to take a pointer to struct repository_format.
Allow passing NULL for this argument if we're not interested in the
information, and pass NULL for all existing callers.  A future patch
will make use of this information.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/init-db.c | 2 +-
 cache.h           | 4 +++-
 path.c            | 2 +-
 setup.c           | 6 ++++--
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 944ec77fe1..b11f07064d 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -378,7 +378,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * config file, so this will not fail.  What we are catching
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
-	check_repository_format();
+	check_repository_format(NULL);
 
 	reinit = create_default_files(template_dir, original_git_dir);
 
diff --git a/cache.h b/cache.h
index 6c094c3210..75f95f6f10 100644
--- a/cache.h
+++ b/cache.h
@@ -1086,8 +1086,10 @@ int verify_repository_format(const struct repository_format *format,
  * and die if it is a version we don't understand. Generally one would
  * set_git_dir() before calling this, and use it only for "are we in a valid
  * repo?".
+ *
+ * If successful and fmt is not NULL, fill fmt with data.
  */
-void check_repository_format(void);
+void check_repository_format(struct repository_format *fmt);
 
 #define MTIME_CHANGED	0x0001
 #define CTIME_CHANGED	0x0002
diff --git a/path.c b/path.c
index a76eec8b96..18d96c8760 100644
--- a/path.c
+++ b/path.c
@@ -851,7 +851,7 @@ const char *enter_repo(const char *path, int strict)
 
 	if (is_git_directory(".")) {
 		set_git_dir(".");
-		check_repository_format();
+		check_repository_format(NULL);
 		return path;
 	}
 
diff --git a/setup.c b/setup.c
index e2a479a64f..57c3e98d7b 100644
--- a/setup.c
+++ b/setup.c
@@ -1235,10 +1235,12 @@ int git_config_perm(const char *var, const char *value)
 	return -(i & 0666);
 }
 
-void check_repository_format(void)
+void check_repository_format(struct repository_format *fmt)
 {
 	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
-	check_repository_format_gently(get_git_dir(), &repo_fmt, NULL);
+	if (!fmt)
+		fmt = &repo_fmt;
+	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	clear_repository_format(&repo_fmt);
 }

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 11/22] builtin/init-db: allow specifying hash algorithm on command line
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (9 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 10/22] setup: allow check_repository_format to read repository format brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 12/22] builtin/init-db: add environment variable for new repo hash brian m. carlson
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

Allow the user to specify the hash algorithm on the command line by
using the --object-format option to git init.  Validate that the user is
not attempting to reinitialize a repository with a different hash
algorithm.  Ensure that if we are writing a non-SHA-1 repository that we
set the repository version to 1 and write the objectFormat extension.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git-init.txt |  7 +++++-
 builtin/clone.c            |  2 +-
 builtin/init-db.c          | 47 +++++++++++++++++++++++++++++++++-----
 cache.h                    |  3 ++-
 4 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-init.txt b/Documentation/git-init.txt
index 32880aafb0..adc6adfd38 100644
--- a/Documentation/git-init.txt
+++ b/Documentation/git-init.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git init' [-q | --quiet] [--bare] [--template=<template_directory>]
-	  [--separate-git-dir <git dir>]
+	  [--separate-git-dir <git dir>] [--object-format=<format]
 	  [--shared[=<permissions>]] [directory]
 
 
@@ -48,6 +48,11 @@ Only print error and warning messages; all other output will be suppressed.
 Create a bare repository. If `GIT_DIR` environment is not set, it is set to the
 current working directory.
 
+--object-format=<format>::
+
+Specify the given object format (hash algorithm) for the repository.  The valid
+values are 'sha1' and (if enabled) 'sha256'.  'sha1' is the default.
+
 --template=<template_directory>::
 
 Specify the directory from which templates will be used.  (See the "TEMPLATE
diff --git a/builtin/clone.c b/builtin/clone.c
index 0fc89ae2b9..df895133b3 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1096,7 +1096,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index b11f07064d..acb1fa1ad9 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -177,7 +177,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 }
 
 static int create_default_files(const char *template_path,
-				const char *original_git_dir)
+				const char *original_git_dir,
+				const struct repository_format *fmt)
 {
 	struct stat st1;
 	struct strbuf buf = STRBUF_INIT;
@@ -187,6 +188,7 @@ static int create_default_files(const char *template_path,
 	int reinit;
 	int filemode;
 	struct strbuf err = STRBUF_INIT;
+	int repo_version = GIT_REPO_VERSION;
 
 	/* Just look for `init.templatedir` */
 	init_db_template_dir = NULL; /* re-set in case it was set before */
@@ -244,11 +246,18 @@ static int create_default_files(const char *template_path,
 			exit(1);
 	}
 
+	if (fmt->hash_algo != GIT_HASH_SHA1)
+		repo_version = GIT_REPO_VERSION_READ;
+
 	/* This forces creation of new config file */
 	xsnprintf(repo_version_string, sizeof(repo_version_string),
-		  "%d", GIT_REPO_VERSION);
+		  "%d", repo_version);
 	git_config_set("core.repositoryformatversion", repo_version_string);
 
+	if (fmt->hash_algo != GIT_HASH_SHA1)
+		git_config_set("extensions.objectformat",
+			       hash_algos[fmt->hash_algo].name);
+
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
 	filemode = TEST_FILEMODE;
@@ -340,12 +349,26 @@ static void separate_git_dir(const char *git_dir, const char *git_link)
 	write_file(git_link, "gitdir: %s", git_dir);
 }
 
+static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash)
+{
+	/*
+	 * If we already have an initialized repo, don't allow the user to
+	 * specify a different algorithm, as that could cause corruption.
+	 * Otherwise, if the user has specified one on the command line, use it.
+	 */
+	if (repo_fmt->version >= 0 && hash != GIT_HASH_UNKNOWN && hash != repo_fmt->hash_algo)
+		die(_("attempt to reinitialize repository with different hash"));
+	else if (hash != GIT_HASH_UNKNOWN)
+		repo_fmt->hash_algo = hash;
+}
+
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, unsigned int flags)
+	    const char *template_dir, int hash, unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
 	char *original_git_dir = real_pathdup(git_dir, 1);
+	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 
 	if (real_git_dir) {
 		struct stat st;
@@ -378,9 +401,11 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * config file, so this will not fail.  What we are catching
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
-	check_repository_format(NULL);
+	check_repository_format(&repo_fmt);
 
-	reinit = create_default_files(template_dir, original_git_dir);
+	validate_hash_algorithm(&repo_fmt, hash);
+
+	reinit = create_default_files(template_dir, original_git_dir, &repo_fmt);
 
 	create_object_directory();
 
@@ -482,6 +507,8 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *work_tree;
 	const char *template_dir = NULL;
 	unsigned int flags = 0;
+	const char *object_format = NULL;
+	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
 		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
 				N_("directory from which templates will be used")),
@@ -494,6 +521,8 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
+		OPT_STRING(0, "object-format", &object_format, N_("hash"),
+			   N_("specify the hash algorithm to use")),
 		OPT_END()
 	};
 
@@ -546,6 +575,12 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 		free(cwd);
 	}
 
+	if (object_format) {
+		hash_algo = hash_algo_by_name(object_format);
+		if (hash_algo == GIT_HASH_UNKNOWN)
+			die(_("unknown hash algorithm '%s'"), object_format);
+	}
+
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
 
@@ -597,5 +632,5 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
 }
diff --git a/cache.h b/cache.h
index 75f95f6f10..14321f55da 100644
--- a/cache.h
+++ b/cache.h
@@ -627,7 +627,8 @@ int path_inside_repo(const char *prefix, const char *path);
 #define INIT_DB_EXIST_OK 0x0002
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, unsigned int flags);
+	    const char *template_dir, int hash_algo,
+	    unsigned int flags);
 
 void sanitize_stdfds(void);
 int daemonize(void);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 12/22] builtin/init-db: add environment variable for new repo hash
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (10 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 11/22] builtin/init-db: allow specifying hash algorithm on command line brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 13/22] init-db: move writing repo version into a function brian m. carlson
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

For the foreseeable future, SHA-1 will be the default algorithm for Git.
However, when running the testsuite, we want to be able to test an
arbitrary algorithm. It would be quite burdensome and very untidy to
have to specify the algorithm we'd like to test every time we
initialized a new repository somewhere in the testsuite, so add an
environment variable to allow us to specify the default hash algorithm
for Git.

This has the benefit that we can set it once for the entire testsuite
and not have to think about it. In the future, users can also use it to
set the default for their repositories if they would like to do so.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git.txt | 6 ++++++
 builtin/init-db.c     | 9 +++++++++
 2 files changed, 15 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index b1597ac002..d0e83fb7d7 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -479,6 +479,12 @@ double-quotes and respecting backslash escapes. E.g., the value
 	details. This variable has lower precedence than other path
 	variables such as GIT_INDEX_FILE, GIT_OBJECT_DIRECTORY...
 
+`GIT_DEFAULT_HASH_ALGORITHM`::
+	If this variable is set, the default hash algorithm for new
+	repositories will be set to this value. This value is currently
+	ignored when cloning; the setting of the remote repository
+	is used instead. The default is "sha1".
+
 Git Commits
 ~~~~~~~~~~~
 `GIT_AUTHOR_NAME`::
diff --git a/builtin/init-db.c b/builtin/init-db.c
index acb1fa1ad9..7b385aacfd 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -20,6 +20,8 @@
 #define TEST_FILEMODE 1
 #endif
 
+#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
+
 static int init_is_bare_repository = 0;
 static int init_shared_repository = -1;
 static const char *init_db_template_dir;
@@ -351,6 +353,7 @@ static void separate_git_dir(const char *git_dir, const char *git_link)
 
 static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash)
 {
+	const char *env = getenv(GIT_DEFAULT_HASH_ENVIRONMENT);
 	/*
 	 * If we already have an initialized repo, don't allow the user to
 	 * specify a different algorithm, as that could cause corruption.
@@ -360,6 +363,12 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 		die(_("attempt to reinitialize repository with different hash"));
 	else if (hash != GIT_HASH_UNKNOWN)
 		repo_fmt->hash_algo = hash;
+	else if (env) {
+		int env_algo = hash_algo_by_name(env);
+		if (env_algo == GIT_HASH_UNKNOWN)
+			die(_("unknown hash algorithm '%s'"), env);
+		repo_fmt->hash_algo = env_algo;
+	}
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 13/22] init-db: move writing repo version into a function
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (11 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 12/22] builtin/init-db: add environment variable for new repo hash brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 14/22] worktree: allow repository version 1 brian m. carlson
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

When we perform a clone, we won't know the remote side's hash algorithm
until we've read the heads.  Consequently, we'll need to rewrite the
repository format version and hash algorithm once we know what the
remote side has.  Move the code that does this into its own function so
that we can call it from clone in the future.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/init-db.c | 32 +++++++++++++++++++-------------
 cache.h           |  1 +
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 7b385aacfd..9df9a54d20 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,6 +178,24 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
+void initialize_repository_version(int hash_algo)
+{
+	char repo_version_string[10];
+	int repo_version = GIT_REPO_VERSION;
+
+	if (hash_algo != GIT_HASH_SHA1)
+		repo_version = GIT_REPO_VERSION_READ;
+
+	/* This forces creation of new config file */
+	xsnprintf(repo_version_string, sizeof(repo_version_string),
+		  "%d", repo_version);
+	git_config_set("core.repositoryformatversion", repo_version_string);
+
+	if (hash_algo != GIT_HASH_SHA1)
+		git_config_set("extensions.objectformat",
+			       hash_algos[hash_algo].name);
+}
+
 static int create_default_files(const char *template_path,
 				const char *original_git_dir,
 				const struct repository_format *fmt)
@@ -185,12 +203,10 @@ static int create_default_files(const char *template_path,
 	struct stat st1;
 	struct strbuf buf = STRBUF_INIT;
 	char *path;
-	char repo_version_string[10];
 	char junk[2];
 	int reinit;
 	int filemode;
 	struct strbuf err = STRBUF_INIT;
-	int repo_version = GIT_REPO_VERSION;
 
 	/* Just look for `init.templatedir` */
 	init_db_template_dir = NULL; /* re-set in case it was set before */
@@ -248,17 +264,7 @@ static int create_default_files(const char *template_path,
 			exit(1);
 	}
 
-	if (fmt->hash_algo != GIT_HASH_SHA1)
-		repo_version = GIT_REPO_VERSION_READ;
-
-	/* This forces creation of new config file */
-	xsnprintf(repo_version_string, sizeof(repo_version_string),
-		  "%d", repo_version);
-	git_config_set("core.repositoryformatversion", repo_version_string);
-
-	if (fmt->hash_algo != GIT_HASH_SHA1)
-		git_config_set("extensions.objectformat",
-			       hash_algos[fmt->hash_algo].name);
+	initialize_repository_version(fmt->hash_algo);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
diff --git a/cache.h b/cache.h
index 14321f55da..a47a9f5dbd 100644
--- a/cache.h
+++ b/cache.h
@@ -629,6 +629,7 @@ int path_inside_repo(const char *prefix, const char *path);
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
 	    unsigned int flags);
+void initialize_repository_version(int hash_algo);
 
 void sanitize_stdfds(void);
 int daemonize(void);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 14/22] worktree: allow repository version 1
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (12 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 13/22] init-db: move writing repo version into a function brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 15/22] commit: use expected signature header for SHA-256 brian m. carlson
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

Git supports both repository versions 0 and 1.  These formats are
identical except for the presence of extensions.  When using an
extension, such as for a different hash algorithm, a check for only
version 0 causes the check to fail.  Instead, call
verify_repository_format to verify that we have an appropriate version
and no unknown extensions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 worktree.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/worktree.c b/worktree.c
index 5b4793caa3..d1d23aadb4 100644
--- a/worktree.c
+++ b/worktree.c
@@ -449,7 +449,7 @@ const struct worktree *find_shared_symref(const char *symref,
 int submodule_uses_worktrees(const char *path)
 {
 	char *submodule_gitdir;
-	struct strbuf sb = STRBUF_INIT;
+	struct strbuf sb = STRBUF_INIT, err = STRBUF_INIT;
 	DIR *dir;
 	struct dirent *d;
 	int ret = 0;
@@ -463,18 +463,16 @@ int submodule_uses_worktrees(const char *path)
 	get_common_dir_noenv(&sb, submodule_gitdir);
 	free(submodule_gitdir);
 
-	/*
-	 * The check below is only known to be good for repository format
-	 * version 0 at the time of writing this code.
-	 */
 	strbuf_addstr(&sb, "/config");
 	read_repository_format(&format, sb.buf);
-	if (format.version != 0) {
+	if (verify_repository_format(&format, &err)) {
+		strbuf_release(&err);
 		strbuf_release(&sb);
 		clear_repository_format(&format);
 		return 1;
 	}
 	clear_repository_format(&format);
+	strbuf_release(&err);
 
 	/* Replace config by worktrees. */
 	strbuf_setlen(&sb, sb.len - strlen("config"));

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 15/22] commit: use expected signature header for SHA-256
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (13 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 14/22] worktree: allow repository version 1 brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 16/22] gpg-interface: improve interface for parsing tags brian m. carlson
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

The transition plan anticipates that we will allow signatures using
multiple algorithms in a single commit. In order to do so, we need to
use a different header per algorithm so that it will be obvious over
which data to compute the signature.

The transition plan specifies that we should use "gpgsig-sha256", so
wire up the commit code such that it can write and parse the current
algorithm, and it can remove the headers for any algorithm when creating
a new commit. Add tests to ensure that we write using the right header
and that git fsck doesn't reject these commits.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/commit.c         |  2 +-
 commit.c                 | 30 +++++++++++++++++++++++-------
 sequencer.c              |  2 +-
 t/t1450-fsck.sh          | 24 ++++++++++++++++++++++++
 t/t7510-signed-commit.sh | 16 +++++++++++++---
 5 files changed, 62 insertions(+), 12 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index aa1332308a..22b75492bb 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1654,7 +1654,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	}
 
 	if (amend) {
-		const char *exclude_gpgsig[2] = { "gpgsig", NULL };
+		const char *exclude_gpgsig[3] = { "gpgsig", "gpgsig-sha256", NULL };
 		extra = read_commit_extra_headers(current_head, exclude_gpgsig);
 	} else {
 		struct commit_extra_header **tail = &extra;
diff --git a/commit.c b/commit.c
index 434ec030d6..e903ba3c79 100644
--- a/commit.c
+++ b/commit.c
@@ -961,14 +961,22 @@ struct commit *get_fork_point(const char *refname, struct commit *commit)
 	return ret;
 }
 
-static const char gpg_sig_header[] = "gpgsig";
-static const int gpg_sig_header_len = sizeof(gpg_sig_header) - 1;
+/*
+ * Indexed by hash algorithm identifier.
+ */
+static const char *gpg_sig_headers[] = {
+	NULL,
+	"gpgsig",
+	"gpgsig-sha256",
+};
 
 static int do_sign_commit(struct strbuf *buf, const char *keyid)
 {
 	struct strbuf sig = STRBUF_INIT;
 	int inspos, copypos;
 	const char *eoh;
+	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(the_hash_algo)];
+	int gpg_sig_header_len = strlen(gpg_sig_header);
 
 	/* find the end of the header */
 	eoh = strstr(buf->buf, "\n\n");
@@ -1010,6 +1018,8 @@ int parse_signed_commit(const struct commit *commit,
 	const char *buffer = get_commit_buffer(commit, &size);
 	int in_signature, saw_signature = -1;
 	const char *line, *tail;
+	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(the_hash_algo)];
+	int gpg_sig_header_len = strlen(gpg_sig_header);
 
 	line = buffer;
 	tail = buffer + size;
@@ -1056,11 +1066,17 @@ int remove_signature(struct strbuf *buf)
 
 		if (in_signature && line[0] == ' ')
 			sig_end = next;
-		else if (starts_with(line, gpg_sig_header) &&
-			 line[gpg_sig_header_len] == ' ') {
-			sig_start = line;
-			sig_end = next;
-			in_signature = 1;
+		else if (starts_with(line, "gpgsig")) {
+			int i;
+			for (i = 1; i < GIT_HASH_NALGOS; i++) {
+				const char *p;
+				if (skip_prefix(line, gpg_sig_headers[i], &p) &&
+				    *p == ' ') {
+					sig_start = line;
+					sig_end = next;
+					in_signature = 1;
+				}
+			}
 		} else {
 			if (*line == '\n')
 				/* dump the whole remainder of the buffer */
diff --git a/sequencer.c b/sequencer.c
index 763ccbbc45..6603cf5c54 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1392,7 +1392,7 @@ static int try_to_commit(struct repository *r,
 	if (parse_head(r, &current_head))
 		return -1;
 	if (flags & AMEND_MSG) {
-		const char *exclude_gpgsig[] = { "gpgsig", NULL };
+		const char *exclude_gpgsig[] = { "gpgsig", "gpgsig-sha256", NULL };
 		const char *out_enc = get_commit_output_encoding();
 		const char *message = logmsg_reencode(current_head, NULL,
 						      out_enc);
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 02478bc4ec..70a8307154 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -133,6 +133,30 @@ test_expect_success 'other worktree HEAD link pointing at a funny place' '
 	test_i18ngrep "worktrees/other/HEAD points to something strange" out
 '
 
+test_expect_success 'commit with multiple signatures is okay' '
+	git cat-file commit HEAD >basis &&
+	cat >sigs <<-EOF &&
+	gpgsig -----BEGIN PGP SIGNATURE-----
+	  VGhpcyBpcyBub3QgcmVhbGx5IGEgc2lnbmF0dXJlLg==
+	  -----END PGP SIGNATURE-----
+	gpgsig-sha256 -----BEGIN PGP SIGNATURE-----
+	  VGhpcyBpcyBub3QgcmVhbGx5IGEgc2lnbmF0dXJlLg==
+	  -----END PGP SIGNATURE-----
+	EOF
+	sed -e "/^committer/q" basis >okay &&
+	cat sigs >>okay &&
+	echo >>okay &&
+	sed -e "1,/^$/d" basis >>okay &&
+	cat okay &&
+	new=$(git hash-object -t commit -w --stdin <okay) &&
+	test_when_finished "remove_object $new" &&
+	git update-ref refs/heads/bogus "$new" &&
+	test_when_finished "git update-ref -d refs/heads/bogus" &&
+	git fsck 2>out &&
+	cat out &&
+	! grep "commit $new" out
+'
+
 test_expect_success 'email without @ is okay' '
 	git cat-file commit HEAD >basis &&
 	sed "s/@/AT/" basis >okay &&
diff --git a/t/t7510-signed-commit.sh b/t/t7510-signed-commit.sh
index 682b23a068..c4b07240f2 100755
--- a/t/t7510-signed-commit.sh
+++ b/t/t7510-signed-commit.sh
@@ -6,6 +6,11 @@ GNUPGHOME_NOT_USED=$GNUPGHOME
 . "$TEST_DIRECTORY/lib-gpg.sh"
 
 test_expect_success GPG 'create signed commits' '
+	test_oid_cache <<-\EOF &&
+	header sha1:gpgsig
+	header sha256:gpgsig-sha256
+	EOF
+
 	test_when_finished "test_unconfig commit.gpgsign" &&
 
 	echo 1 >file && git add file &&
@@ -140,6 +145,11 @@ test_expect_success GPG 'verify signatures with --raw' '
 	)
 '
 
+test_expect_success GPG 'proper header is used for hash algorithm' '
+	git cat-file commit fourth-signed >output &&
+	grep "^$(test_oid header) -----BEGIN PGP SIGNATURE-----" output
+'
+
 test_expect_success GPG 'show signed commit with signature' '
 	git show -s initial >commit &&
 	git show -s --show-signature initial >show &&
@@ -147,7 +157,7 @@ test_expect_success GPG 'show signed commit with signature' '
 	git cat-file commit initial >cat &&
 	grep -v -e "gpg: " -e "Warning: " show >show.commit &&
 	grep -e "gpg: " -e "Warning: " show >show.gpg &&
-	grep -v "^ " cat | grep -v "^gpgsig " >cat.commit &&
+	grep -v "^ " cat | grep -v "^$(test_oid header) " >cat.commit &&
 	test_cmp show.commit commit &&
 	test_cmp show.gpg verify.2 &&
 	test_cmp cat.commit verify.1
@@ -260,10 +270,10 @@ test_expect_success GPG 'check config gpg.format values' '
 test_expect_success GPG 'detect fudged commit with double signature' '
 	sed -e "/gpgsig/,/END PGP/d" forged1 >double-base &&
 	sed -n -e "/gpgsig/,/END PGP/p" forged1 | \
-		sed -e "s/^gpgsig//;s/^ //" | gpg --dearmor >double-sig1.sig &&
+		sed -e "s/^$(test_oid header)//;s/^ //" | gpg --dearmor >double-sig1.sig &&
 	gpg -o double-sig2.sig -u 29472784 --detach-sign double-base &&
 	cat double-sig1.sig double-sig2.sig | gpg --enarmor >double-combined.asc &&
-	sed -e "s/^\(-.*\)ARMORED FILE/\1SIGNATURE/;1s/^/gpgsig /;2,\$s/^/ /" \
+	sed -e "s/^\(-.*\)ARMORED FILE/\1SIGNATURE/;1s/^/$(test_oid header) /;2,\$s/^/ /" \
 		double-combined.asc > double-gpgsig &&
 	sed -e "/committer/r double-gpgsig" double-base >double-commit &&
 	git hash-object -w -t commit double-commit >double-commit.commit &&

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 16/22] gpg-interface: improve interface for parsing tags
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (14 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 15/22] commit: use expected signature header for SHA-256 brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 17/22] tag: store SHA-256 signatures in a header brian m. carlson
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

We have a function which parses a buffer with a signature at the end,
parse_signature, and this function is used for signed tags.  However,
the transition plan has SHA-256 tags using a header, which is a
materially different syntax.  The current interface is not suitable for
parsing such tags.

Adjust the parse_signature interface to store the parsed data in two
strbufs and turn the existing function into parse_signed_buffer.  The
latter is still used in places where we want to strip off the signature
in a SHA-1 tag or in places where we know we always have a signed
buffer, such as push certs.

Adjust all the callers to deal with this new interface.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/fmt-merge-msg.c | 26 ++++++++++++++++++--------
 builtin/receive-pack.c  |  4 ++--
 builtin/tag.c           | 16 ++++++++++++----
 commit.c                |  9 ++++++---
 gpg-interface.c         | 13 ++++++++++++-
 gpg-interface.h         |  9 ++++++++-
 log-tree.c              | 14 ++++++++------
 ref-filter.c            | 18 ++++++++++++++----
 tag.c                   | 15 ++++++++-------
 9 files changed, 88 insertions(+), 36 deletions(-)

diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index 05a92c59d8..29f647e2d9 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -472,6 +472,7 @@ static void fmt_tag_signature(struct strbuf *tagbuf,
 			      const char *buf,
 			      unsigned long len)
 {
+
 	const char *tag_body = strstr(buf, "\n\n");
 	if (tag_body) {
 		tag_body += 2;
@@ -492,24 +493,31 @@ static void fmt_merge_msg_sigs(struct strbuf *out)
 	for (i = 0; i < origins.nr; i++) {
 		struct object_id *oid = origins.items[i].util;
 		enum object_type type;
-		unsigned long size, len;
-		char *buf = read_object_file(oid, &type, &size);
+		unsigned long size;
+		char *buf = read_object_file(oid, &type, &size), *orig = buf;
 		struct signature_check sigc = { 0 };
+		struct strbuf payload = STRBUF_INIT;
+		struct strbuf signature = STRBUF_INIT;
 		struct strbuf sig = STRBUF_INIT;
+		size_t len = size;
 
 		if (!buf || type != OBJ_TAG)
 			goto next;
-		len = parse_signature(buf, size);
-
-		if (size == len)
-			; /* merely annotated */
-		else if (!check_signature(buf, len, buf + len, size - len,
+		if (!parse_signature(buf, size, &payload, &signature))
+			len = size; /* merely annotated */
+		else if (!check_signature(payload.buf, payload.len,
+					  signature.buf, signature.len,
 					  &sigc)) {
 			strbuf_addstr(&sig, sigc.gpg_output);
 			signature_check_clear(&sigc);
 		} else
 			strbuf_addstr(&sig, "gpg verification failed.\n");
 
+		if (payload.len) {
+			buf = payload.buf;
+			len = payload.len;
+		}
+
 		if (!tag_number++) {
 			fmt_tag_signature(&tagbuf, &sig, buf, len);
 			first_tag = i;
@@ -531,8 +539,10 @@ static void fmt_merge_msg_sigs(struct strbuf *out)
 			fmt_tag_signature(&tagbuf, &sig, buf, len);
 		}
 		strbuf_release(&sig);
+		strbuf_release(&payload);
+		strbuf_release(&signature);
 	next:
-		free(buf);
+		free(orig);
 	}
 	if (tagbuf.len) {
 		strbuf_addch(out, '\n');
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 411e0b4d99..4c814c1963 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -636,7 +636,7 @@ static void prepare_push_cert_sha1(struct child_process *proc)
 
 		memset(&sigcheck, '\0', sizeof(sigcheck));
 
-		bogs = parse_signature(push_cert.buf, push_cert.len);
+		bogs = parse_signed_buffer(push_cert.buf, push_cert.len);
 		check_signature(push_cert.buf, bogs, push_cert.buf + bogs,
 				push_cert.len - bogs, &sigcheck);
 
@@ -1568,7 +1568,7 @@ static void queue_commands_from_cert(struct command **tail,
 		die("malformed push certificate %.*s", 100, push_cert->buf);
 	else
 		boc += 2;
-	eoc = push_cert->buf + parse_signature(push_cert->buf, push_cert->len);
+	eoc = push_cert->buf + parse_signed_buffer(push_cert->buf, push_cert->len);
 
 	while (boc < eoc) {
 		const char *eol = memchr(boc, '\n', eoc - boc);
diff --git a/builtin/tag.c b/builtin/tag.c
index e0a4c25382..6b95c6a2ea 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -174,11 +174,17 @@ static void write_tag_body(int fd, const struct object_id *oid)
 {
 	unsigned long size;
 	enum object_type type;
-	char *buf, *sp;
+	char *buf, *sp, *orig;
+	struct strbuf payload = STRBUF_INIT;
+	struct strbuf signature = STRBUF_INIT;
 
-	buf = read_object_file(oid, &type, &size);
+	orig = buf = read_object_file(oid, &type, &size);
 	if (!buf)
 		return;
+	if (parse_signature(buf, size, &payload, &signature)) {
+		buf = payload.buf;
+		size = payload.len;
+	}
 	/* skip header */
 	sp = strstr(buf, "\n\n");
 
@@ -187,9 +193,11 @@ static void write_tag_body(int fd, const struct object_id *oid)
 		return;
 	}
 	sp += 2; /* skip the 2 LFs */
-	write_or_die(fd, sp, parse_signature(sp, buf + size - sp));
+	write_or_die(fd, sp, buf + size - sp);
 
-	free(buf);
+	free(orig);
+	strbuf_release(&payload);
+	strbuf_release(&signature);
 }
 
 static int build_tag_object(struct strbuf *buf, int sign, struct object_id *result)
diff --git a/commit.c b/commit.c
index e903ba3c79..d44f0c1c26 100644
--- a/commit.c
+++ b/commit.c
@@ -1097,8 +1097,10 @@ static void handle_signed_tag(struct commit *parent, struct commit_extra_header
 	struct merge_remote_desc *desc;
 	struct commit_extra_header *mergetag;
 	char *buf;
-	unsigned long size, len;
+	unsigned long size;
 	enum object_type type;
+	struct strbuf payload = STRBUF_INIT;
+	struct strbuf signature = STRBUF_INIT;
 
 	desc = merge_remote_util(parent);
 	if (!desc || !desc->obj)
@@ -1106,8 +1108,7 @@ static void handle_signed_tag(struct commit *parent, struct commit_extra_header
 	buf = read_object_file(&desc->obj->oid, &type, &size);
 	if (!buf || type != OBJ_TAG)
 		goto free_return;
-	len = parse_signature(buf, size);
-	if (size == len)
+	if (!parse_signature(buf, size, &payload, &signature))
 		goto free_return;
 	/*
 	 * We could verify this signature and either omit the tag when
@@ -1126,6 +1127,8 @@ static void handle_signed_tag(struct commit *parent, struct commit_extra_header
 
 	**tail = mergetag;
 	*tail = &mergetag->next;
+	strbuf_release(&payload);
+	strbuf_release(&signature);
 	return;
 
 free_return:
diff --git a/gpg-interface.c b/gpg-interface.c
index 5134ce2780..727a657a5b 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -294,7 +294,7 @@ void print_signature_buffer(const struct signature_check *sigc, unsigned flags)
 		fputs(output, stderr);
 }
 
-size_t parse_signature(const char *buf, size_t size)
+size_t parse_signed_buffer(const char *buf, size_t size)
 {
 	size_t len = 0;
 	size_t match = size;
@@ -310,6 +310,17 @@ size_t parse_signature(const char *buf, size_t size)
 	return match;
 }
 
+int parse_signature(const char *buf, size_t size, struct strbuf *payload, struct strbuf *signature)
+{
+	size_t match = parse_signed_buffer(buf, size);
+	if (match != size) {
+		strbuf_add(payload, buf, match);
+		strbuf_add(signature, buf + match, size - match);
+		return 1;
+	}
+	return 0;
+}
+
 void set_signing_key(const char *key)
 {
 	free(configured_signing_key);
diff --git a/gpg-interface.h b/gpg-interface.h
index 93cc3aff5c..c5686fa31b 100644
--- a/gpg-interface.h
+++ b/gpg-interface.h
@@ -29,13 +29,20 @@ struct signature_check {
 
 void signature_check_clear(struct signature_check *sigc);
 
+/*
+ * Look at a GPG signed tag object.  If such a signature exists, store it in
+ * signature and the signed content in payload.  Return 1 if a signature was
+ * found, and 0 otherwise.
+ */
+int parse_signature(const char *buf, size_t size, struct strbuf *payload, struct strbuf *signature);
+
 /*
  * Look at GPG signed content (e.g. a signed tag object), whose
  * payload is followed by a detached signature on it.  Return the
  * offset where the embedded detached signature begins, or the end of
  * the data when there is no such signature.
  */
-size_t parse_signature(const char *buf, size_t size);
+size_t parse_signed_buffer(const char *buf, size_t size);
 
 /*
  * Create a detached signature for the contents of "buffer" and append
diff --git a/log-tree.c b/log-tree.c
index 4e32638de8..06c8ec1e4e 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -499,7 +499,9 @@ static int show_one_mergetag(struct commit *commit,
 	struct strbuf verify_message;
 	struct signature_check sigc = { 0 };
 	int status, nth;
-	size_t payload_size, gpg_message_offset;
+	size_t gpg_message_offset;
+	struct strbuf payload = STRBUF_INIT;
+	struct strbuf signature = STRBUF_INIT;
 
 	hash_object_file(extra->value, extra->len, type_name(OBJ_TAG), &oid);
 	tag = lookup_tag(the_repository, &oid);
@@ -522,13 +524,11 @@ static int show_one_mergetag(struct commit *commit,
 			    "parent #%d, tagged '%s'\n", nth + 1, tag->tag);
 	gpg_message_offset = verify_message.len;
 
-	payload_size = parse_signature(extra->value, extra->len);
 	status = -1;
-	if (extra->len > payload_size) {
+	if (parse_signature(extra->value, extra->len, &payload, &signature)) {
 		/* could have a good signature */
-		if (!check_signature(extra->value, payload_size,
-				     extra->value + payload_size,
-				     extra->len - payload_size, &sigc)) {
+		if (!check_signature(payload.buf, payload.len,
+				     signature.buf, signature.len, &sigc)) {
 			strbuf_addstr(&verify_message, sigc.gpg_output);
 			signature_check_clear(&sigc);
 			status = 0; /* good */
@@ -539,6 +539,8 @@ static int show_one_mergetag(struct commit *commit,
 
 	show_sig_lines(opt, status, verify_message.buf);
 	strbuf_release(&verify_message);
+	strbuf_release(&payload);
+	strbuf_release(&signature);
 	return 0;
 }
 
diff --git a/ref-filter.c b/ref-filter.c
index 6867e33648..212f1165bb 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1161,7 +1161,13 @@ static void find_subpos(const char *buf,
 			unsigned long *nonsiglen,
 			const char **sig, unsigned long *siglen)
 {
+	struct strbuf payload = STRBUF_INIT;
+	struct strbuf signature = STRBUF_INIT;
 	const char *eol;
+	const char *end = buf + strlen(buf);
+	const char *sigstart;
+
+
 	/* skip past header until we hit empty line */
 	while (*buf && *buf != '\n') {
 		eol = strchrnul(buf, '\n');
@@ -1174,13 +1180,14 @@ static void find_subpos(const char *buf,
 		buf++;
 
 	/* parse signature first; we might not even have a subject line */
-	*sig = buf + parse_signature(buf, strlen(buf));
-	*siglen = strlen(*sig);
+	parse_signature(buf, end - buf, &payload, &signature);
+	*sig = strbuf_detach(&signature, siglen);
+	sigstart = buf + parse_signed_buffer(buf, strlen(buf));
 
 	/* subject is first non-empty line */
 	*sub = buf;
 	/* subject goes to first empty line */
-	while (buf < *sig && *buf && *buf != '\n') {
+	while (buf < sigstart && *buf && *buf != '\n') {
 		eol = strchrnul(buf, '\n');
 		if (*eol)
 			eol++;
@@ -1196,7 +1203,7 @@ static void find_subpos(const char *buf,
 		buf++;
 	*body = buf;
 	*bodylen = strlen(buf);
-	*nonsiglen = *sig - buf;
+	*nonsiglen = sigstart - buf;
 }
 
 /*
@@ -1234,6 +1241,7 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 		struct used_atom *atom = &used_atom[i];
 		const char *name = atom->name;
 		struct atom_value *v = &val[i];
+
 		if (!!deref != (*name == '*'))
 			continue;
 		if (deref)
@@ -1273,6 +1281,8 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			v->s = strbuf_detach(&s, NULL);
 		} else if (atom->u.contents.option == C_BARE)
 			v->s = xstrdup(subpos);
+
+		free((void *)sigpos);
 	}
 }
 
diff --git a/tag.c b/tag.c
index 71b544467e..5d04506d10 100644
--- a/tag.c
+++ b/tag.c
@@ -13,26 +13,27 @@ const char *tag_type = "tag";
 static int run_gpg_verify(const char *buf, unsigned long size, unsigned flags)
 {
 	struct signature_check sigc;
-	size_t payload_size;
+	struct strbuf payload = STRBUF_INIT;
+	struct strbuf signature = STRBUF_INIT;
 	int ret;
 
 	memset(&sigc, 0, sizeof(sigc));
 
-	payload_size = parse_signature(buf, size);
-
-	if (size == payload_size) {
+	if (!parse_signature(buf, size, &payload, &signature)) {
 		if (flags & GPG_VERIFY_VERBOSE)
-			write_in_full(1, buf, payload_size);
+			write_in_full(1, buf, size);
 		return error("no signature found");
 	}
 
-	ret = check_signature(buf, payload_size, buf + payload_size,
-				size - payload_size, &sigc);
+	ret = check_signature(payload.buf, payload.len, signature.buf,
+				signature.len, &sigc);
 
 	if (!(flags & GPG_VERIFY_OMIT_STATUS))
 		print_signature_buffer(&sigc, flags);
 
 	signature_check_clear(&sigc);
+	strbuf_release(&payload);
+	strbuf_release(&signature);
 	return ret;
 }
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 17/22] tag: store SHA-256 signatures in a header
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (15 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 16/22] gpg-interface: improve interface for parsing tags brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 18/22] fast-import: permit reading multiple marks files brian m. carlson
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

In the future, we'll want to allow a user to sign both the SHA-1 version
of a tag and the SHA-256 version at the same time.  Since for SHA-1 the
signature is appended to the tag message, we must use a different way to
allow multiple signature.

The transition plan envisions this using a gpgsig-sha256 header, much as
for commits.  Refactor the commit code that performs parsing of this
header and use it for tags that use SHA-256.  Check that we get tags in
the correct format depending on what algorithm we're using.

Note that currently we have no way to rewrite an object into another
hash algorithm, and therefore we don't have a way to verify the
signatures of the other hash algorithm.  Because of the way the
signatures are stored, we'll reject commits signed with both algorithms,
which is essential for security.  If we allowed both signatures despite
not being able to verify them and one signature were invalid, we'd end
up with a security problem.

There are, however, a few things to note.

In the ref-filter code, we hoist the signature parsing above the blank
line delimiting headers and body so we can find the signature when using
SHA-256.  For similar reasons, t6300 no longer emits the signature as
part of the body since it's no longer part of the body.

We mark a test for exporting signatures with remote helpers as SHA-1
only.  Since we no longer have signatures in the body (those are for
SHA-1 only), they cannot be exported this way.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/mktag.c           | 14 ++++++++++++++
 builtin/tag.c             |  4 +++-
 commit.c                  | 19 +++++++++++++++----
 commit.h                  |  8 ++++++++
 gpg-interface.c           | 16 ++++++++++------
 ref-filter.c              |  7 +++----
 t/t5801-remote-helpers.sh |  4 +++-
 t/t6300-for-each-ref.sh   | 34 +++++++++++++++++++++++++---------
 t/t7004-tag.sh            |  8 +++++++-
 t/t7030-verify-tag.sh     | 17 +++++++++++++++++
 10 files changed, 105 insertions(+), 26 deletions(-)

diff --git a/builtin/mktag.c b/builtin/mktag.c
index 6fb7dc8578..6c568b4d74 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -141,6 +141,20 @@ static int verify_tag(char *buffer, unsigned long size)
 			(uintmax_t) (tagger_line - buffer));
 	tagger_line += 6;
 
+	if (hash_algo_by_ptr(the_hash_algo) == GIT_HASH_SHA256 &&
+		!memcmp(tagger_line, "gpgsig-sha256 ", 14)) {
+		char *p = strpbrk(tagger_line + 1, "\n");
+		if (!p)
+			return error("char%"PRIuMAX": could not find end of line",
+				(uintmax_t) (tagger_line - buffer));
+		tagger_line = p + 1;
+		while (*tagger_line == ' ' && (p = strpbrk(tagger_line, "\n")))
+			tagger_line = p + 1;
+		if (!p)
+			return error("char%"PRIuMAX": could not find end of line",
+				(uintmax_t) (tagger_line - buffer));
+	}
+
 	/* Verify the blank line separating the header from the body */
 	if (*tagger_line != '\n')
 		return error("char%"PRIuMAX": trailing garbage in tag header",
diff --git a/builtin/tag.c b/builtin/tag.c
index 6b95c6a2ea..ab6b5044e6 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -128,7 +128,9 @@ static int verify_tag(const char *name, const char *ref,
 
 static int do_sign(struct strbuf *buffer)
 {
-	return sign_buffer(buffer, buffer, get_signing_key());
+	if (hash_algo_by_ptr(the_hash_algo) == GIT_HASH_SHA1)
+		return sign_buffer(buffer, buffer, get_signing_key());
+	return sign_with_header(buffer, get_signing_key());
 }
 
 static const char tag_template[] =
diff --git a/commit.c b/commit.c
index d44f0c1c26..ded396b69d 100644
--- a/commit.c
+++ b/commit.c
@@ -970,7 +970,7 @@ static const char *gpg_sig_headers[] = {
 	"gpgsig-sha256",
 };
 
-static int do_sign_commit(struct strbuf *buf, const char *keyid)
+int sign_with_header(struct strbuf *buf, const char *keyid)
 {
 	struct strbuf sig = STRBUF_INIT;
 	int inspos, copypos;
@@ -1010,12 +1010,24 @@ static int do_sign_commit(struct strbuf *buf, const char *keyid)
 	return 0;
 }
 
+
+
 int parse_signed_commit(const struct commit *commit,
 			struct strbuf *payload, struct strbuf *signature)
 {
-
 	unsigned long size;
 	const char *buffer = get_commit_buffer(commit, &size);
+	int ret = parse_buffer_signed_by_header(buffer, size, payload, signature);
+
+	unuse_commit_buffer(commit, buffer);
+	return ret;
+}
+
+int parse_buffer_signed_by_header(const char *buffer,
+				  unsigned long size,
+				  struct strbuf *payload,
+				  struct strbuf *signature)
+{
 	int in_signature, saw_signature = -1;
 	const char *line, *tail;
 	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(the_hash_algo)];
@@ -1048,7 +1060,6 @@ int parse_signed_commit(const struct commit *commit,
 		}
 		line = next;
 	}
-	unuse_commit_buffer(commit, buffer);
 	return saw_signature;
 }
 
@@ -1488,7 +1499,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 	if (encoding_is_utf8 && !verify_utf8(&buffer))
 		fprintf(stderr, _(commit_utf8_warn));
 
-	if (sign_commit && do_sign_commit(&buffer, sign_commit)) {
+	if (sign_commit && sign_with_header(&buffer, sign_commit)) {
 		result = -1;
 		goto out;
 	}
diff --git a/commit.h b/commit.h
index 221cdaa34b..257acc857b 100644
--- a/commit.h
+++ b/commit.h
@@ -392,4 +392,12 @@ int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void
 LAST_ARG_MUST_BE_NULL
 int run_commit_hook(int editor_is_used, const char *index_file, const char *name, ...);
 
+/* Sign a commit or tag buffer, storing the result in a header. */
+int sign_with_header(struct strbuf *buf, const char *keyid);
+/* Parse the signature out of a header. */
+int parse_buffer_signed_by_header(const char *buffer,
+				  unsigned long size,
+				  struct strbuf *payload,
+				  struct strbuf *signature);
+
 #endif /* COMMIT_H */
diff --git a/gpg-interface.c b/gpg-interface.c
index 727a657a5b..961bdb7224 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "commit.h"
 #include "config.h"
 #include "run-command.h"
 #include "strbuf.h"
@@ -312,13 +313,16 @@ size_t parse_signed_buffer(const char *buf, size_t size)
 
 int parse_signature(const char *buf, size_t size, struct strbuf *payload, struct strbuf *signature)
 {
-	size_t match = parse_signed_buffer(buf, size);
-	if (match != size) {
-		strbuf_add(payload, buf, match);
-		strbuf_add(signature, buf + match, size - match);
-		return 1;
+	if (hash_algo_by_ptr(the_hash_algo) == GIT_HASH_SHA1) {
+		size_t match = parse_signed_buffer(buf, size);
+		if (match != size) {
+			strbuf_add(payload, buf, match);
+			strbuf_add(signature, buf + match, size - match);
+			return 1;
+		}
+		return 0;
 	}
-	return 0;
+	return parse_buffer_signed_by_header(buf, size, payload, signature);
 }
 
 void set_signing_key(const char *key)
diff --git a/ref-filter.c b/ref-filter.c
index 212f1165bb..933530a14c 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1167,6 +1167,8 @@ static void find_subpos(const char *buf,
 	const char *end = buf + strlen(buf);
 	const char *sigstart;
 
+	/* parse signature first; we might not even have a subject line */
+	parse_signature(buf, end - buf, &payload, &signature);
 
 	/* skip past header until we hit empty line */
 	while (*buf && *buf != '\n') {
@@ -1178,9 +1180,6 @@ static void find_subpos(const char *buf,
 	/* skip any empty lines */
 	while (*buf == '\n')
 		buf++;
-
-	/* parse signature first; we might not even have a subject line */
-	parse_signature(buf, end - buf, &payload, &signature);
 	*sig = strbuf_detach(&signature, siglen);
 	sigstart = buf + parse_signed_buffer(buf, strlen(buf));
 
@@ -1267,7 +1266,7 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, void *buf)
 			v->s = xmemdupz(sigpos, siglen);
 		else if (atom->u.contents.option == C_LINES) {
 			struct strbuf s = STRBUF_INIT;
-			const char *contents_end = bodylen + bodypos - siglen;
+			const char *contents_end = bodypos + nonsiglen;
 
 			/*  Size is the length of the message after removing the signature */
 			append_lines(&s, subpos, contents_end - subpos, atom->u.contents.nlines);
diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh
index 121e5c6edb..801802be9d 100755
--- a/t/t5801-remote-helpers.sh
+++ b/t/t5801-remote-helpers.sh
@@ -192,7 +192,9 @@ test_expect_success GPG 'push signed tag' '
 	test_must_fail compare_refs local signed-tag server signed-tag
 '
 
-test_expect_success GPG 'push signed tag with signed-tags capability' '
+# SHA-256 signatures are stored in the header and can't be round-tripped through
+# fast-export.
+test_expect_success GPG,SHA1 'push signed tag with signed-tags capability' '
 	(cd local &&
 	git checkout master &&
 	git tag -s -m signed-tag signed-tag-2 &&
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index b3c1092338..caeabfb293 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -54,6 +54,17 @@ test_atom() {
 	"
 }
 
+test_atom_hash () {
+	local val
+	if [ "$(test_oid algo)" = sha1 ]
+	then
+		val="$3"
+	else
+		val="$4"
+	fi
+	test_atom "$1" "$2" "$val"
+}
+
 hexlen=$(test_oid hexsz)
 disklen=$(test_oid disklen)
 
@@ -625,30 +636,35 @@ sig='-----BEGIN PGP SIGNATURE-----
 PREREQ=GPG
 test_atom refs/tags/signed-empty subject ''
 test_atom refs/tags/signed-empty contents:subject ''
-test_atom refs/tags/signed-empty body "$sig"
+test_atom_hash refs/tags/signed-empty body "$sig" ''
 test_atom refs/tags/signed-empty contents:body ''
 test_atom refs/tags/signed-empty contents:signature "$sig"
-test_atom refs/tags/signed-empty contents "$sig"
+test_atom_hash refs/tags/signed-empty contents "$sig" ''
 
 test_atom refs/tags/signed-short subject 'subject line'
 test_atom refs/tags/signed-short contents:subject 'subject line'
-test_atom refs/tags/signed-short body "$sig"
+test_atom_hash refs/tags/signed-short body "$sig" ''
 test_atom refs/tags/signed-short contents:body ''
 test_atom refs/tags/signed-short contents:signature "$sig"
-test_atom refs/tags/signed-short contents "subject line
-$sig"
+test_atom_hash refs/tags/signed-short contents "subject line
+$sig" 'subject line
+'
 
 test_atom refs/tags/signed-long subject 'subject line'
 test_atom refs/tags/signed-long contents:subject 'subject line'
-test_atom refs/tags/signed-long body "body contents
-$sig"
+test_atom_hash refs/tags/signed-long body "body contents
+$sig" 'body contents
+'
 test_atom refs/tags/signed-long contents:body 'body contents
 '
 test_atom refs/tags/signed-long contents:signature "$sig"
-test_atom refs/tags/signed-long contents "subject line
+test_atom_hash refs/tags/signed-long contents "subject line
 
 body contents
-$sig"
+$sig" 'subject line
+
+body contents
+'
 
 sort >expected <<EOF
 $(git rev-parse refs/tags/bogo) <committer@example.com> refs/tags/bogo
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 6db92bd3ba..bd74b2d7e0 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -425,7 +425,13 @@ test_expect_success \
 # creating annotated tags:
 
 get_tag_msg () {
-	git cat-file tag "$1" | sed -e "/BEGIN PGP/q"
+	if [ "$(test_oid algo)" = sha1 ]
+	then
+		git cat-file tag "$1" | sed -e "/BEGIN PGP/q"
+	else
+
+		git cat-file tag "$1" | sed -e '/^gpgsig-sha256/{s/^gpgsig-sha256 //;h;d};/^ /d;${p;x;/^$/d}'
+	fi
 }
 
 # run test_tick before committing always gives the time in that timezone
diff --git a/t/t7030-verify-tag.sh b/t/t7030-verify-tag.sh
index 8f077bea60..d339512da2 100755
--- a/t/t7030-verify-tag.sh
+++ b/t/t7030-verify-tag.sh
@@ -79,6 +79,23 @@ test_expect_success GPG 'verify and show signatures' '
 	)
 '
 
+test_expect_success GPG 'signature has expected format' '
+	for tag in initial second merge fourth-signed sixth-signed seventh-signed
+	do
+		if [ "$(test_oid algo)"	= sha1 ]
+		then
+			git cat-file tag seventh-signed >output &&
+			! grep gpgsig output &&
+			grep "^-----BEGIN PGP SIGNATURE-----" output
+		else
+			git cat-file tag seventh-signed >output &&
+			grep gpgsig-sha256 output &&
+			! grep "^-----BEGIN PGP SIGNATURE-----" output
+		fi &&
+		echo $tag OK || exit 1
+	done
+'
+
 test_expect_success GPGSM 'verify and show signatures x509' '
 	git verify-tag ninth-signed-x509 2>actual &&
 	grep "Good signature from" actual &&

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 18/22] fast-import: permit reading multiple marks files
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (16 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 17/22] tag: store SHA-256 signatures in a header brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 19/22] fast-import: add helper function for inserting mark object entries brian m. carlson
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

In the future, we'll want to read marks files for submodules as well.
Refactor the existing code to make it possible to read multiple marks
files, each into their own marks set.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index b8b65a801c..b9ecd89699 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -493,9 +493,8 @@ static char *pool_strdup(const char *s)
 	return r;
 }
 
-static void insert_mark(uintmax_t idnum, struct object_entry *oe)
+static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry *oe)
 {
-	struct mark_set *s = marks;
 	while ((idnum >> s->shift) >= 1024) {
 		s = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
 		s->shift = marks->shift + 10;
@@ -919,7 +918,7 @@ static int store_object(
 
 	e = insert_object(&oid);
 	if (mark)
-		insert_mark(mark, e);
+		insert_mark(marks, mark, e);
 	if (e->idx.offset) {
 		duplicate_count_by_type[type]++;
 		return 1;
@@ -1117,7 +1116,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 	e = insert_object(&oid);
 
 	if (mark)
-		insert_mark(mark, e);
+		insert_mark(marks, mark, e);
 
 	if (e->idx.offset) {
 		duplicate_count_by_type[OBJ_BLOB]++;
@@ -1712,16 +1711,9 @@ static void dump_marks(void)
 	}
 }
 
-static void read_marks(void)
+static void read_mark_file(struct mark_set *s, FILE *f)
 {
 	char line[512];
-	FILE *f = fopen(import_marks_file, "r");
-	if (f)
-		;
-	else if (import_marks_file_ignore_missing && errno == ENOENT)
-		goto done; /* Marks file does not exist */
-	else
-		die_errno("cannot read '%s'", import_marks_file);
 	while (fgets(line, sizeof(line), f)) {
 		uintmax_t mark;
 		char *end;
@@ -1747,8 +1739,20 @@ static void read_marks(void)
 			e->pack_id = MAX_PACK_ID;
 			e->idx.offset = 1; /* just not zero! */
 		}
-		insert_mark(mark, e);
+		insert_mark(s, mark, e);
 	}
+}
+
+static void read_marks(void)
+{
+	FILE *f = fopen(import_marks_file, "r");
+	if (f)
+		;
+	else if (import_marks_file_ignore_missing && errno == ENOENT)
+		goto done; /* Marks file does not exist */
+	else
+		die_errno("cannot read '%s'", import_marks_file);
+	read_mark_file(marks, f);
 	fclose(f);
 done:
 	import_marks_file_done = 1;
@@ -3130,7 +3134,7 @@ static void parse_alias(void)
 		die(_("Expected 'to' command, got %s"), command_buf.buf);
 	e = find_object(&b.oid);
 	assert(e);
-	insert_mark(next_mark, e);
+	insert_mark(marks, next_mark, e);
 }
 
 static char* make_fast_import_path(const char *path)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 19/22] fast-import: add helper function for inserting mark object entries
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (17 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 18/22] fast-import: permit reading multiple marks files brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 20/22] fast-import: make find_marks work on any mark set brian m. carlson
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

Currently, everything we want to insert into a mark set is an object
entry. However, in the future, we will want to insert objects of other
types. Teach read_mark_file to take a function pointer which helps us
insert the object we want into our mark set.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index b9ecd89699..3ce4a04473 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -131,6 +131,8 @@ struct recent_command {
 	char *buf;
 };
 
+typedef void (*mark_set_inserter_t)(struct mark_set *s, struct object_id *oid, uintmax_t mark);
+
 /* Configured limits on output */
 static unsigned long max_depth = 50;
 static off_t max_packsize;
@@ -1711,14 +1713,30 @@ static void dump_marks(void)
 	}
 }
 
-static void read_mark_file(struct mark_set *s, FILE *f)
+static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
+{
+	struct object_entry *e;
+	e = find_object(oid);
+	if (!e) {
+		enum object_type type = oid_object_info(the_repository,
+							oid, NULL);
+		if (type < 0)
+			die("object not found: %s", oid_to_hex(oid));
+		e = insert_object(oid);
+		e->type = type;
+		e->pack_id = MAX_PACK_ID;
+		e->idx.offset = 1; /* just not zero! */
+	}
+	insert_mark(s, mark, e);
+}
+
+static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inserter)
 {
 	char line[512];
 	while (fgets(line, sizeof(line), f)) {
 		uintmax_t mark;
 		char *end;
 		struct object_id oid;
-		struct object_entry *e;
 
 		end = strchr(line, '\n');
 		if (line[0] != ':' || !end)
@@ -1728,18 +1746,7 @@ static void read_mark_file(struct mark_set *s, FILE *f)
 		if (!mark || end == line + 1
 			|| *end != ' ' || get_oid_hex(end + 1, &oid))
 			die("corrupt mark line: %s", line);
-		e = find_object(&oid);
-		if (!e) {
-			enum object_type type = oid_object_info(the_repository,
-								&oid, NULL);
-			if (type < 0)
-				die("object not found: %s", oid_to_hex(&oid));
-			e = insert_object(&oid);
-			e->type = type;
-			e->pack_id = MAX_PACK_ID;
-			e->idx.offset = 1; /* just not zero! */
-		}
-		insert_mark(s, mark, e);
+		inserter(s, &oid, mark);
 	}
 }
 
@@ -1752,7 +1759,7 @@ static void read_marks(void)
 		goto done; /* Marks file does not exist */
 	else
 		die_errno("cannot read '%s'", import_marks_file);
-	read_mark_file(marks, f);
+	read_mark_file(marks, f, insert_object_entry);
 	fclose(f);
 done:
 	import_marks_file_done = 1;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 20/22] fast-import: make find_marks work on any mark set
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (18 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 19/22] fast-import: add helper function for inserting mark object entries brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 21/22] fast-import: add a generic function to iterate over marks brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 22/22] fast-import: add options for rewriting submodules brian m. carlson
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

In the future, we'll use multiple different mark sets with this
function, so make it take an argument that points to the mark set to
operate on.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 3ce4a04473..8aaa7f6289 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -517,10 +517,9 @@ static void insert_mark(struct mark_set *s, uintmax_t idnum, struct object_entry
 	s->data.marked[idnum] = oe;
 }
 
-static struct object_entry *find_mark(uintmax_t idnum)
+static void *find_mark(struct mark_set *s, uintmax_t idnum)
 {
 	uintmax_t orig_idnum = idnum;
-	struct mark_set *s = marks;
 	struct object_entry *oe = NULL;
 	if ((idnum >> s->shift) < 1024) {
 		while (s && s->shift) {
@@ -2225,7 +2224,7 @@ static void file_change_m(const char *p, struct branch *b)
 	}
 
 	if (*p == ':') {
-		oe = find_mark(parse_mark_ref_space(&p));
+		oe = find_mark(marks, parse_mark_ref_space(&p));
 		oidcpy(&oid, &oe->idx.oid);
 	} else if (skip_prefix(p, "inline ", &p)) {
 		inline_data = 1;
@@ -2399,7 +2398,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 	/* Now parse the notemodify command. */
 	/* <dataref> or 'inline' */
 	if (*p == ':') {
-		oe = find_mark(parse_mark_ref_space(&p));
+		oe = find_mark(marks, parse_mark_ref_space(&p));
 		oidcpy(&oid, &oe->idx.oid);
 	} else if (skip_prefix(p, "inline ", &p)) {
 		inline_data = 1;
@@ -2420,7 +2419,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 		oidcpy(&commit_oid, &s->oid);
 	} else if (*p == ':') {
 		uintmax_t commit_mark = parse_mark_ref_eol(p);
-		struct object_entry *commit_oe = find_mark(commit_mark);
+		struct object_entry *commit_oe = find_mark(marks, commit_mark);
 		if (commit_oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", commit_mark);
 		oidcpy(&commit_oid, &commit_oe->idx.oid);
@@ -2524,7 +2523,7 @@ static int parse_objectish(struct branch *b, const char *objectish)
 		oidcpy(&b->branch_tree.versions[1].oid, t);
 	} else if (*objectish == ':') {
 		uintmax_t idnum = parse_mark_ref_eol(objectish);
-		struct object_entry *oe = find_mark(idnum);
+		struct object_entry *oe = find_mark(marks, idnum);
 		if (oe->type != OBJ_COMMIT)
 			die("Mark :%" PRIuMAX " not a commit", idnum);
 		if (!oideq(&b->oid, &oe->idx.oid)) {
@@ -2588,7 +2587,7 @@ static struct hash_list *parse_merge(unsigned int *count)
 			oidcpy(&n->oid, &s->oid);
 		else if (*from == ':') {
 			uintmax_t idnum = parse_mark_ref_eol(from);
-			struct object_entry *oe = find_mark(idnum);
+			struct object_entry *oe = find_mark(marks, idnum);
 			if (oe->type != OBJ_COMMIT)
 				die("Mark :%" PRIuMAX " not a commit", idnum);
 			oidcpy(&n->oid, &oe->idx.oid);
@@ -2762,7 +2761,7 @@ static void parse_new_tag(const char *arg)
 	} else if (*from == ':') {
 		struct object_entry *oe;
 		from_mark = parse_mark_ref_eol(from);
-		oe = find_mark(from_mark);
+		oe = find_mark(marks, from_mark);
 		type = oe->type;
 		oidcpy(&oid, &oe->idx.oid);
 	} else if (!get_oid(from, &oid)) {
@@ -2920,7 +2919,7 @@ static void parse_get_mark(const char *p)
 	if (*p != ':')
 		die("Not a mark: %s", p);
 
-	oe = find_mark(parse_mark_ref_eol(p));
+	oe = find_mark(marks, parse_mark_ref_eol(p));
 	if (!oe)
 		die("Unknown mark: %s", command_buf.buf);
 
@@ -2935,7 +2934,7 @@ static void parse_cat_blob(const char *p)
 
 	/* cat-blob SP <object> LF */
 	if (*p == ':') {
-		oe = find_mark(parse_mark_ref_eol(p));
+		oe = find_mark(marks, parse_mark_ref_eol(p));
 		if (!oe)
 			die("Unknown mark: %s", command_buf.buf);
 		oidcpy(&oid, &oe->idx.oid);
@@ -3010,7 +3009,7 @@ static struct object_entry *parse_treeish_dataref(const char **p)
 	struct object_entry *e;
 
 	if (**p == ':') {	/* <mark> */
-		e = find_mark(parse_mark_ref_space(p));
+		e = find_mark(marks, parse_mark_ref_space(p));
 		if (!e)
 			die("Unknown mark: %s", command_buf.buf);
 		oidcpy(&oid, &e->idx.oid);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 21/22] fast-import: add a generic function to iterate over marks
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (19 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 20/22] fast-import: make find_marks work on any mark set brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  2020-01-13 12:47 ` [RFC PATCH 22/22] fast-import: add options for rewriting submodules brian m. carlson
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

Currently, we can iterate over marks only to dump them to a file. In the
future, we'll want to perform an arbitrary operation over the items of a
mark set. Add a function, for_each_mark, that iterates over marks in a
set and performs an arbitrary callback function for each mark. Switch
the mark dumping routine to use this function now that it's available.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 fast-import.c | 50 ++++++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 8aaa7f6289..6711f71ba7 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -132,6 +132,7 @@ struct recent_command {
 };
 
 typedef void (*mark_set_inserter_t)(struct mark_set *s, struct object_id *oid, uintmax_t mark);
+typedef void (*each_mark_fn_t)(uintmax_t mark, void *obj, void *cbp);
 
 /* Configured limits on output */
 static unsigned long max_depth = 50;
@@ -232,6 +233,29 @@ static void parse_get_mark(const char *p);
 static void parse_cat_blob(const char *p);
 static void parse_ls(const char *p, struct branch *b);
 
+static void for_each_mark(struct mark_set *m, uintmax_t base, each_mark_fn_t callback, void *p)
+{
+	uintmax_t k;
+	if (m->shift) {
+		for (k = 0; k < 1024; k++) {
+			if (m->data.sets[k])
+				for_each_mark(m->data.sets[k], base + (k << m->shift), callback, p);
+		}
+	} else {
+		for (k = 0; k < 1024; k++) {
+			if (m->data.marked[k])
+				callback(base + k, m->data.marked[k], p);
+		}
+	}
+}
+
+static void dump_marks_fn(uintmax_t mark, void *object, void *cbp) {
+	struct object_entry *e = object;
+	FILE *f = cbp;
+
+	fprintf(f, ":%" PRIuMAX " %s\n", mark, oid_to_hex(&e->idx.oid));
+}
+
 static void write_branch_report(FILE *rpt, struct branch *b)
 {
 	fprintf(rpt, "%s:\n", b->name);
@@ -260,8 +284,6 @@ static void write_branch_report(FILE *rpt, struct branch *b)
 	fputc('\n', rpt);
 }
 
-static void dump_marks_helper(FILE *, uintmax_t, struct mark_set *);
-
 static void write_crash_report(const char *err)
 {
 	char *loc = git_pathdup("fast_import_crash_%"PRIuMAX, (uintmax_t) getpid());
@@ -340,7 +362,7 @@ static void write_crash_report(const char *err)
 	if (export_marks_file)
 		fprintf(rpt, "  exported to %s\n", export_marks_file);
 	else
-		dump_marks_helper(rpt, 0, marks);
+		for_each_mark(marks, 0, dump_marks_fn, rpt);
 
 	fputc('\n', rpt);
 	fputs("-------------------\n", rpt);
@@ -1655,26 +1677,6 @@ static void dump_tags(void)
 	strbuf_release(&err);
 }
 
-static void dump_marks_helper(FILE *f,
-	uintmax_t base,
-	struct mark_set *m)
-{
-	uintmax_t k;
-	if (m->shift) {
-		for (k = 0; k < 1024; k++) {
-			if (m->data.sets[k])
-				dump_marks_helper(f, base + (k << m->shift),
-					m->data.sets[k]);
-		}
-	} else {
-		for (k = 0; k < 1024; k++) {
-			if (m->data.marked[k])
-				fprintf(f, ":%" PRIuMAX " %s\n", base + k,
-					oid_to_hex(&m->data.marked[k]->idx.oid));
-		}
-	}
-}
-
 static void dump_marks(void)
 {
 	struct lock_file mark_lock = LOCK_INIT;
@@ -1704,7 +1706,7 @@ static void dump_marks(void)
 		return;
 	}
 
-	dump_marks_helper(f, 0, marks);
+	for_each_mark(marks, 0, dump_marks_fn, f);
 	if (commit_lock_file(&mark_lock)) {
 		failure |= error_errno("Unable to write file %s",
 				       export_marks_file);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 22/22] fast-import: add options for rewriting submodules
  2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
                   ` (20 preceding siblings ...)
  2020-01-13 12:47 ` [RFC PATCH 21/22] fast-import: add a generic function to iterate over marks brian m. carlson
@ 2020-01-13 12:47 ` brian m. carlson
  21 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-13 12:47 UTC (permalink / raw)
  To: git

When converting a repository using submodules from one hash algorithm to
another, it is necessary to rewrite the submodules from the old
algorithm to the new algorithm, since only references to submodules, not
their contents, are written to the fast-export stream. Without rewriting
the submodules, fast-import fails with an "Invalid dataref" error when
encountering a submodule in another algorithm.

Add a pair of options, --rewrite-submodules-from and
--rewrite-submodules-to, that take a list of marks produced by
fast-export and fast-import, respectively, when processing the
submodule. Use these marks to map the submodule commits from the old
algorithm to the new algorithm.

We read marks into two corresponding struct mark_set objects and then
perform a mapping from the old to the new using a hash table. This lets
us reuse the same mark parsing code that is used elsewhere and allows us
to efficiently read and match marks based on their ID, since mark files
need not be sorted.

Note that because we're using a khash table for the object IDs, and this
table copies values of struct object_id instead of taking references to
them, it's necessary to zero the struct object_id values that we use to
insert and look up in the table. Otherwise, we would end up with SHA-1
values that don't match because of whatever stack garbage might be left
in the unused area.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 Documentation/git-fast-import.txt |  20 ++++++
 fast-import.c                     | 112 ++++++++++++++++++++++++++++--
 t/t9300-fast-import.sh            | 109 +++++++++++++++++++++++++++++
 3 files changed, 236 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 7889f95940..77c6b3d001 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -122,6 +122,26 @@ Locations of Marks Files
 Relative and non-relative marks may be combined by interweaving
 --(no-)-relative-marks with the --(import|export)-marks= options.
 
+Submodule Rewriting
+~~~~~~~~~~~~~~~~~~~
+
+--rewrite-submodules-from=<name>:<file>::
+--rewrite-submodules-to=<name>:<file>::
+  Rewrite the object IDs for the submodule specified by <name> from the values
+	used in the from <file> to those used in the to <file>. The from marks should
+	have been created by `git fast-export`, and the to marks should have been
+	created by `git fast-import` when importing that same submodule.
++
+<name> may be any arbitrary string not containing a colon character, but the
+same value must be used with both options when specifying corresponding marks.
+Multiple submodules may be specified with different values for <name>. It is an
+error not to use these options in corresponding pairs.
++
+These options are primarily useful when converting a repository from one hash
+algorithm to another; without them, fast-import will fail if it encounters a
+submodule because it has no way of writing the object ID into the new hash
+algorithm.
+
 Performance and Compression Tuning
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/fast-import.c b/fast-import.c
index 6711f71ba7..202dda11a6 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -18,6 +18,7 @@
 #include "object-store.h"
 #include "mem-pool.h"
 #include "commit-reach.h"
+#include "khash.h"
 
 #define PACK_ID_BITS 16
 #define MAX_PACK_ID ((1<<PACK_ID_BITS)-1)
@@ -53,6 +54,7 @@ struct object_entry_pool {
 
 struct mark_set {
 	union {
+		struct object_id *oids[1024];
 		struct object_entry *marked[1024];
 		struct mark_set *sets[1024];
 	} data;
@@ -225,6 +227,11 @@ static int allow_unsafe_features;
 /* Signal handling */
 static volatile sig_atomic_t checkpoint_requested;
 
+/* Submodule marks */
+static struct string_list sub_marks_from = STRING_LIST_INIT_DUP;
+static struct string_list sub_marks_to = STRING_LIST_INIT_DUP;
+static kh_oid_map_t *sub_oid_map;
+
 /* Where to write output of cat-blob commands */
 static int cat_blob_fd = STDOUT_FILENO;
 
@@ -1731,6 +1738,11 @@ static void insert_object_entry(struct mark_set *s, struct object_id *oid, uintm
 	insert_mark(s, mark, e);
 }
 
+static void insert_oid_entry(struct mark_set *s, struct object_id *oid, uintmax_t mark)
+{
+	insert_mark(s, mark, xmemdupz(oid, sizeof(*oid)));
+}
+
 static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inserter)
 {
 	char line[512];
@@ -1739,13 +1751,17 @@ static void read_mark_file(struct mark_set *s, FILE *f, mark_set_inserter_t inse
 		char *end;
 		struct object_id oid;
 
+		/* Ensure SHA-1 objects are padded with zeros. */
+		memset(oid.hash, 0, sizeof(oid.hash));
+
 		end = strchr(line, '\n');
 		if (line[0] != ':' || !end)
 			die("corrupt mark line: %s", line);
 		*end = 0;
 		mark = strtoumax(line + 1, &end, 10);
 		if (!mark || end == line + 1
-			|| *end != ' ' || get_oid_hex(end + 1, &oid))
+			|| *end != ' '
+			|| get_oid_hex_any(end + 1, &oid) == GIT_HASH_UNKNOWN)
 			die("corrupt mark line: %s", line);
 		inserter(s, &oid, mark);
 	}
@@ -2146,6 +2162,30 @@ static uintmax_t change_note_fanout(struct tree_entry *root,
 	return do_change_note_fanout(root, root, hex_oid, 0, path, 0, fanout);
 }
 
+static int parse_mapped_oid_hex(const char *hex, struct object_id *oid, const char **end)
+{
+	int algo;
+	khiter_t it;
+
+	/* Make SHA-1 object IDs have all-zero padding. */
+	memset(oid->hash, 0, sizeof(oid->hash));
+
+	algo = parse_oid_hex_any(hex, oid, end);
+	if (algo == GIT_HASH_UNKNOWN)
+		return -1;
+
+	it = kh_get_oid_map(sub_oid_map, *oid);
+	/* No such object? */
+	if (it == kh_end(sub_oid_map)) {
+		/* If we're using the same algorithm, pass it through. */
+		if (hash_algos[algo].format_id == the_hash_algo->format_id)
+			return 0;
+		return -1;
+	}
+	oidcpy(oid, kh_value(sub_oid_map, it));
+	return 0;
+}
+
 /*
  * Given a pointer into a string, parse a mark reference:
  *
@@ -2232,7 +2272,7 @@ static void file_change_m(const char *p, struct branch *b)
 		inline_data = 1;
 		oe = NULL; /* not used with inline_data, but makes gcc happy */
 	} else {
-		if (parse_oid_hex(p, &oid, &p))
+		if (parse_mapped_oid_hex(p, &oid, &p))
 			die("Invalid dataref: %s", command_buf.buf);
 		oe = find_object(&oid);
 		if (*p++ != ' ')
@@ -2406,7 +2446,7 @@ static void note_change_n(const char *p, struct branch *b, unsigned char *old_fa
 		inline_data = 1;
 		oe = NULL; /* not used with inline_data, but makes gcc happy */
 	} else {
-		if (parse_oid_hex(p, &oid, &p))
+		if (parse_mapped_oid_hex(p, &oid, &p))
 			die("Invalid dataref: %s", command_buf.buf);
 		oe = find_object(&oid);
 		if (*p++ != ' ')
@@ -2941,7 +2981,7 @@ static void parse_cat_blob(const char *p)
 			die("Unknown mark: %s", command_buf.buf);
 		oidcpy(&oid, &oe->idx.oid);
 	} else {
-		if (parse_oid_hex(p, &oid, &p))
+		if (parse_mapped_oid_hex(p, &oid, &p))
 			die("Invalid dataref: %s", command_buf.buf);
 		if (*p)
 			die("Garbage after SHA1: %s", command_buf.buf);
@@ -3005,6 +3045,42 @@ static struct object_entry *dereference(struct object_entry *oe,
 	return find_object(oid);
 }
 
+static void insert_mapped_mark(uintmax_t mark, void *object, void *cbp)
+{
+	struct object_id *fromoid = object;
+	struct object_id *tooid = find_mark(cbp, mark);
+	int ret;
+	khiter_t it;
+
+	it = kh_put_oid_map(sub_oid_map, *fromoid, &ret);
+	/* We've already seen this object. */
+	if (ret == 0)
+		return;
+	kh_value(sub_oid_map, it) = tooid;
+}
+
+static void build_mark_map_one(struct mark_set *from, struct mark_set *to)
+{
+	for_each_mark(from, 0, insert_mapped_mark, to);
+}
+
+static void build_mark_map(struct string_list *from, struct string_list *to)
+{
+	struct string_list_item *fromp, *top;
+
+	sub_oid_map = kh_init_oid_map();
+
+	for_each_string_list_item(fromp, from) {
+		top = string_list_lookup(to, fromp->string);
+		if (!fromp->util) {
+			die(_("Missing from marks for submodule '%s'"), fromp->string);
+		} else if (!top || !top->util) {
+			die(_("Missing to marks for submodule '%s'"), fromp->string);
+		}
+		build_mark_map_one(fromp->util, top->util);
+	}
+}
+
 static struct object_entry *parse_treeish_dataref(const char **p)
 {
 	struct object_id oid;
@@ -3016,7 +3092,7 @@ static struct object_entry *parse_treeish_dataref(const char **p)
 			die("Unknown mark: %s", command_buf.buf);
 		oidcpy(&oid, &e->idx.oid);
 	} else {	/* <sha1> */
-		if (parse_oid_hex(*p, &oid, p))
+		if (parse_mapped_oid_hex(*p, &oid, p))
 			die("Invalid dataref: %s", command_buf.buf);
 		e = find_object(&oid);
 		if (*(*p)++ != ' ')
@@ -3222,6 +3298,26 @@ static void option_export_pack_edges(const char *edges)
 	pack_edges = xfopen(edges, "a");
 }
 
+static void option_rewrite_submodules(const char *arg, struct string_list *list)
+{
+	struct mark_set *ms;
+	FILE *fp;
+	char *s = xstrdup(arg);
+	char *f = strchr(s, ':');
+	if (!f)
+		die(_("Expected format name:filename for submodule rewrite option"));
+	*f = '\0';
+	f++;
+	ms = xcalloc(1, sizeof(*ms));
+	string_list_insert(list, s)->util = ms;
+
+	fp = fopen(f, "r");
+	if (!fp)
+		die_errno("cannot read '%s'", f);
+	read_mark_file(ms, fp, insert_oid_entry);
+	fclose(fp);
+}
+
 static int parse_one_option(const char *option)
 {
 	if (skip_prefix(option, "max-pack-size=", &option)) {
@@ -3284,6 +3380,11 @@ static int parse_one_feature(const char *feature, int from_stream)
 		option_export_marks(arg);
 	} else if (!strcmp(feature, "alias")) {
 		; /* Don't die - this feature is supported */
+	} else if (skip_prefix(feature, "rewrite-submodules-to=", &arg)) {
+		option_rewrite_submodules(arg, &sub_marks_to);
+	} else if (skip_prefix(feature, "rewrite-submodules-from=", &arg)) {
+		option_rewrite_submodules(arg, &sub_marks_from);
+	} else if (skip_prefix(feature, "rewrite-submodules-from=", &arg)) {
 	} else if (!strcmp(feature, "get-mark")) {
 		; /* Don't die - this feature is supported */
 	} else if (!strcmp(feature, "cat-blob")) {
@@ -3389,6 +3490,7 @@ static void parse_argv(void)
 	seen_data_command = 1;
 	if (import_marks_file)
 		read_marks();
+	build_mark_map(&sub_marks_from, &sub_marks_to);
 }
 
 int cmd_main(int argc, const char **argv)
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index ae9950a9c2..22c6c27763 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3382,4 +3382,113 @@ test_expect_success 'X: handling encoding' '
 	git log -1 --format=%B encoding | grep $(printf "\317\200")
 '
 
+###
+### series Y (submodules and hash algorithms)
+###
+
+cat >Y-sub-input <<\Y_INPUT_END
+blob
+mark :1
+data 4
+foo
+
+reset refs/heads/master
+commit refs/heads/master
+mark :2
+author Full Name <user@company.tld> 1000000000 +0100
+committer Full Name <user@company.tld> 1000000000 +0100
+data 24
+Test submodule commit 1
+M 100644 :1 file
+
+blob
+mark :3
+data 8
+foo
+bar
+
+commit refs/heads/master
+mark :4
+author Full Name <user@company.tld> 1000000001 +0100
+committer Full Name <user@company.tld> 1000000001 +0100
+data 24
+Test submodule commit 2
+from :2
+M 100644 :3 file
+Y_INPUT_END
+
+# Note that the submodule object IDs are intentionally not translated.
+cat >Y-main-input <<\Y_INPUT_END
+blob
+mark :1
+data 4
+foo
+
+reset refs/heads/master
+commit refs/heads/master
+mark :2
+author Full Name <user@company.tld> 2000000000 +0100
+committer Full Name <user@company.tld> 2000000000 +0100
+data 14
+Test commit 1
+M 100644 :1 file
+
+blob
+mark :3
+data 73
+[submodule "sub1"]
+	path = sub1
+	url = https://void.example.com/main.git
+
+commit refs/heads/master
+mark :4
+author Full Name <user@company.tld> 2000000001 +0100
+committer Full Name <user@company.tld> 2000000001 +0100
+data 14
+Test commit 2
+from :2
+M 100644 :3 .gitmodules
+M 160000 0712c5be7cf681388e355ef47525aaf23aee1a6d sub1
+
+blob
+mark :5
+data 8
+foo
+bar
+
+commit refs/heads/master
+mark :6
+author Full Name <user@company.tld> 2000000002 +0100
+committer Full Name <user@company.tld> 2000000002 +0100
+data 14
+Test commit 3
+from :4
+M 100644 :5 file
+M 160000 ff729f5e62f72c0c3978207d9a80e5f3a65f14d7 sub1
+Y_INPUT_END
+
+cat >Y-marks <<\Y_INPUT_END
+:2 0712c5be7cf681388e355ef47525aaf23aee1a6d
+:4 ff729f5e62f72c0c3978207d9a80e5f3a65f14d7
+Y_INPUT_END
+
+test_expect_success 'Y: setup' '
+	test_oid_cache <<-EOF
+	Ymaster sha1:9afed2f9161ddf416c0a1863b8b0725b00070504
+	Ymaster sha256:c0a1010da1df187b2e287654793df01b464bd6f8e3f17fc1481a7dadf84caee3
+	EOF
+'
+
+test_expect_success 'Y: rewrite submodules' '
+	git init main1 &&
+	(
+		cd main1 &&
+		git init sub2 &&
+		git -C sub2 fast-import --export-marks=../sub2-marks <../Y-sub-input &&
+		git fast-import --rewrite-submodules-from=sub:../Y-marks \
+			--rewrite-submodules-to=sub:sub2-marks <../Y-main-input &&
+		test "$(git rev-parse master)" = "$(test_oid Ymaster)"
+	)
+'
+
 test_done

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 02/22] hex: add functions to parse hex object IDs in any algorithm
  2020-01-13 12:47 ` [RFC PATCH 02/22] hex: add functions to parse hex object IDs in any algorithm brian m. carlson
@ 2020-01-15 21:40   ` Junio C Hamano
  2020-01-16  0:22     ` brian m. carlson
  0 siblings, 1 reply; 25+ messages in thread
From: Junio C Hamano @ 2020-01-15 21:40 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> +/*
> + * NOTE: This function relies on hash algorithms being in order from shortest
> + * length to longest length.
> + */
> +int get_oid_hex_any(const char *hex, struct object_id *oid)
> +{
> +	int i;
> +	for (i = GIT_HASH_NALGOS - 1; i > 0; i--) {
> +		if (!get_hash_hex_algop(hex, oid->hash, &hash_algos[i]))
> +			return i;
> +	}
> +	return GIT_HASH_UNKNOWN;
> +}

Two rather obvious questions are

 - what if we have more than one algos that produce hashes of the
   same length?

 - it feels that GIT_HASH_UNKNOWN being 0 wastes the first/zeroth
   element in the hash_algos[] array.

In the future, I would imagine that we would want to be able to say
"here I have a dozen hexdigits that is an abbreviated SHA2 hash",
and we would use some syntax (e.g. "sha2:123456123456") for that.
Would this function be at the layer that would be extended later to
support such a syntax, or would we have a layer higher than this to
do so?



>  int get_oid_hex(const char *hex, struct object_id *oid)
>  {
>  	return get_oid_hex_algop(hex, oid, the_hash_algo);
> @@ -87,6 +101,14 @@ int parse_oid_hex_algop(const char *hex, struct object_id *oid,
>  	return ret;
>  }
>  
> +int parse_oid_hex_any(const char *hex, struct object_id *oid, const char **end)
> +{
> +	int ret = get_oid_hex_any(hex, oid);
> +	if (ret)
> +		*end = hex + hash_algos[ret].hexsz;
> +	return ret;
> +}
> +
>  int parse_oid_hex(const char *hex, struct object_id *oid, const char **end)
>  {
>  	return parse_oid_hex_algop(hex, oid, end, the_hash_algo);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 02/22] hex: add functions to parse hex object IDs in any algorithm
  2020-01-15 21:40   ` Junio C Hamano
@ 2020-01-16  0:22     ` brian m. carlson
  0 siblings, 0 replies; 25+ messages in thread
From: brian m. carlson @ 2020-01-16  0:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2379 bytes --]

On 2020-01-15 at 21:40:54, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > +/*
> > + * NOTE: This function relies on hash algorithms being in order from shortest
> > + * length to longest length.
> > + */
> > +int get_oid_hex_any(const char *hex, struct object_id *oid)
> > +{
> > +	int i;
> > +	for (i = GIT_HASH_NALGOS - 1; i > 0; i--) {
> > +		if (!get_hash_hex_algop(hex, oid->hash, &hash_algos[i]))
> > +			return i;
> > +	}
> > +	return GIT_HASH_UNKNOWN;
> > +}
> 
> Two rather obvious questions are
> 
>  - what if we have more than one algos that produce hashes of the
>    same length?

Than we have a problem that we'll have to deal with then.  There are a
handful of functions that essentially document all these locations and
we'll have to decide how we fix them.

Notably, the dumb HTTP protocol doesn't provide any capability
advertisement, so it's not possible to ask the remote server which
algorithm it's using or negotiate a different one.  Bundles and
get-tar-commit-id are the other problem cases.

Granted, I did very much try to limit these cases as much as possible,
and most of our more modern code doesn't have this problem, but in some
cases it's just unavoidable.  I feel like with only three uses, doing
this won't be mortgaging our future too much.

>  - it feels that GIT_HASH_UNKNOWN being 0 wastes the first/zeroth
>    element in the hash_algos[] array.

I actually think it's really useful to have it this way, because then
it's easy to check for a valid hash algorithm by a comparison against 0.

> In the future, I would imagine that we would want to be able to say
> "here I have a dozen hexdigits that is an abbreviated SHA2 hash",
> and we would use some syntax (e.g. "sha2:123456123456") for that.
> Would this function be at the layer that would be extended later to
> support such a syntax, or would we have a layer higher than this to
> do so?

That's going to be at a different layer.  We'll have the ^{sha1} and
^{sha256} disambiguators that can be used with the normal revision
parsing syntax, and we'll handle the ambiguity there if one isn't
provided.  As for output, we'll only produce output in one algorithm at
a time so ambiguity isn't a problem there.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, back to index

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-13 12:47 [RFC PATCH 00/22] SHA-256 stage 4 implementation, part 1/3 brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 01/22] hex: introduce parsing variants taking hash algorithms brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 02/22] hex: add functions to parse hex object IDs in any algorithm brian m. carlson
2020-01-15 21:40   ` Junio C Hamano
2020-01-16  0:22     ` brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 03/22] repository: require a build flag to use SHA-256 brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 04/22] t: use hash-specific lookup tables to define test constants brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 05/22] t6300: abstract away SHA-1-specific constants brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 06/22] t6300: make hash algorithm independent brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 07/22] t/helper/test-dump-split-index: initialize git repository brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 08/22] t/helper: initialize repository if necessary brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 09/22] t/helper: make repository tests hash independent brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 10/22] setup: allow check_repository_format to read repository format brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 11/22] builtin/init-db: allow specifying hash algorithm on command line brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 12/22] builtin/init-db: add environment variable for new repo hash brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 13/22] init-db: move writing repo version into a function brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 14/22] worktree: allow repository version 1 brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 15/22] commit: use expected signature header for SHA-256 brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 16/22] gpg-interface: improve interface for parsing tags brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 17/22] tag: store SHA-256 signatures in a header brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 18/22] fast-import: permit reading multiple marks files brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 19/22] fast-import: add helper function for inserting mark object entries brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 20/22] fast-import: make find_marks work on any mark set brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 21/22] fast-import: add a generic function to iterate over marks brian m. carlson
2020-01-13 12:47 ` [RFC PATCH 22/22] fast-import: add options for rewriting submodules brian m. carlson

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git