git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo`
@ 2024-04-19  9:51 Patrick Steinhardt
  2024-04-19  9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
                   ` (13 more replies)
  0 siblings, 14 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2808 bytes --]

Hi,

when starting up, Git will initialize `the_repository` by calling
`initialize_the_repository()`. Part of this is also that we set the hash
algo of `the_repository` to SHA1, which implicitly sets `the_hash_algo`
because it is a macro expanding to `the_repository->hash_algo`.

Usually, we eventually set up the correct hash algorithm here once we
have properly set up `the_repository` via `setup_git_directory()`. But
in some commands we actually don't require a Git repository, and thus we
will leave the SHA1 hash algorithm in place.

This has led to some subtle bugs when the context really asks for a
SHA256 repository, which this patch series corrects for most of the
part. Some commands need further work, like for example git-diff(1),
where the user might want to have the ability to pick a hash function
when run outside of a repository.

Ultimately, the last patch then drops the setup of the fallback hash
algorithm completely. This will cause `the_hash_algo` to be a `NULL`
pointer unless explicitly configured, and thus we now start to crash
when it gets accessed without doing so beforehand. This is a rather
risky thing to do, but it does catch bugs where we might otherwise
accidentally do the wrong thing. And even though I think it is the right
thing to do even conceptually, I'd be okay to drop it if folks think
that the risk is not worth it.

Patrick

Patrick Steinhardt (11):
  path: harden validation of HEAD with non-standard hashes
  parse-options-cb: only abbreviate hashes when hash algo is known
  attr: don't recompute default attribute source
  attr: fix BUG() when parsing attrs outside of repo
  remote-curl: fix parsing of detached SHA256 heads
  builtin/rev-parse: allow shortening to more than 40 hex characters
  builtin/blame: don't access potentially unitialized `the_hash_algo`
  builtin/bundle: abort "verify" early when there is no repository
  builtin/diff: explicitly set hash algo when there is no repo
  builtin/shortlog: don't set up revisions without repo
  repository: stop setting SHA1 as the default object hash

 attr.c                     | 31 ++++++++++++++++++++++---------
 builtin/blame.c            |  5 ++---
 builtin/bundle.c           |  5 +++++
 builtin/diff.c             |  9 +++++++++
 builtin/rev-parse.c        |  5 ++---
 builtin/shortlog.c         |  2 +-
 parse-options-cb.c         |  3 ++-
 path.c                     |  2 +-
 remote-curl.c              | 19 ++++++++++++++++++-
 repository.c               |  2 --
 t/t0003-attributes.sh      | 15 +++++++++++++++
 t/t0040-parse-options.sh   | 17 +++++++++++++++++
 t/t1500-rev-parse.sh       |  6 ++++++
 t/t5550-http-fetch-dumb.sh | 15 +++++++++++++++
 14 files changed, 115 insertions(+), 21 deletions(-)

-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 01/11] path: harden validation of HEAD with non-standard hashes
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-19 19:03   ` brian m. carlson
  2024-04-22 16:15   ` Junio C Hamano
  2024-04-19  9:51 ` [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2277 bytes --]

The `validate_headref()` function takes a path to a supposed "HEAD" file
and checks whether its format is something that we understand. It is
used as part of our repository discovery to check whether a specific
directory is a Git directory or not.

Part of the validation is a check for a detached HEAD that contains a
plain object ID. To do this validation we use `get_oid_hex()`, which
relies on `the_hash_algo`. At this point in time the hash algo cannot
yet be initialized though because we didn't yet read the Git config.
Consequently, it will always be the SHA1 hash algorithm.

In practice this works alright because `get_oid_hex()` only ends up
checking whether the prefix of the buffer is a valid object ID. And
because SHA1 is shorter than SHA256, the function will successfully
parse SHA256 object IDs, as well.

It is somewhat fragile though and not really the intent to only check
for SHA1. With this in mind, harden the code to use `get_oid_hex_any()`
to check whether the "HEAD" file parses as any known hash.

One might be hard pressed to tighten the check even further and fully
validate the file contents, not only the prefix. In practice though that
wouldn't make a lot of sense as it could be that the repository uses a
hash function that produces longer hashes than SHA256, but which the
current version of Git doesn't understand yet. We'd still want to detect
the repository as proper Git repository in that case, and we will fail
eventually with a proper error message that the hash isn't understood
when trying to set up the repostiory format.

It follows that we could just leave the current code intact, as in
practice the code change doesn't have any user visible impact. But it
also prepares us for `the_hash_algo` being unset when there is no
repositroy.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 path.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/path.c b/path.c
index 67229edb9c..cc02165530 100644
--- a/path.c
+++ b/path.c
@@ -693,7 +693,7 @@ int validate_headref(const char *path)
 	/*
 	 * Is this a detached HEAD?
 	 */
-	if (!get_oid_hex(buffer, &oid))
+	if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN)
 		return 0;
 
 	return -1;
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
  2024-04-19  9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-23  0:30   ` Justin Tobler
  2024-04-19  9:51 ` [PATCH 03/11] attr: don't recompute default attribute source Patrick Steinhardt
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2444 bytes --]

The `OPT__ABBREV()` option can be used to add an option that abbreviates
object IDs. When given an length longer than `the_hash_algo->hexsz`,
then it will instead set the length to that maximum length.

It may not always be guaranteed that we have `the_hash_algo` initialized
properly as the hash algortihm can only be set up after we have set up
`the_repository`. In that case, the hash would always be truncated to
the hex length of SHA1, which may not be what the user desires.

In practice it's not a problem as all commands that use `OPT__ABBREV()`
also have `RUN_SETUP` set and thus cannot work without a repository.
Consequently, both `the_repository` and `the_hash_algo` would be
properly set up.

Regardless of that, harden the code to not truncate the length when we
didn't set up a repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 parse-options-cb.c       |  3 ++-
 t/t0040-parse-options.sh | 17 +++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/parse-options-cb.c b/parse-options-cb.c
index bdc7fae497..d99d688d3c 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -7,6 +7,7 @@
 #include "environment.h"
 #include "gettext.h"
 #include "object-name.h"
+#include "setup.h"
 #include "string-list.h"
 #include "strvec.h"
 #include "oid-array.h"
@@ -29,7 +30,7 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset)
 				     opt->long_name);
 		if (v && v < MINIMUM_ABBREV)
 			v = MINIMUM_ABBREV;
-		else if (v > the_hash_algo->hexsz)
+		else if (startup_info->have_repository && v > the_hash_algo->hexsz)
 			v = the_hash_algo->hexsz;
 	}
 	*(int *)(opt->value) = v;
diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
index 8bb2a8b453..45a773642f 100755
--- a/t/t0040-parse-options.sh
+++ b/t/t0040-parse-options.sh
@@ -176,6 +176,23 @@ test_expect_success 'long options' '
 	test_cmp expect output
 '
 
+test_expect_success 'abbreviate to something longer than SHA1 length' '
+	cat >expect <<-EOF &&
+	boolean: 0
+	integer: 0
+	magnitude: 0
+	timestamp: 0
+	string: (not set)
+	abbrev: 100
+	verbose: -1
+	quiet: 0
+	dry run: no
+	file: (not set)
+	EOF
+	test-tool parse-options --abbrev=100 >output &&
+	test_cmp expect output
+'
+
 test_expect_success 'missing required value' '
 	cat >expect <<-\EOF &&
 	error: switch `s'\'' requires a value
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 03/11] attr: don't recompute default attribute source
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
  2024-04-19  9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
  2024-04-19  9:51 ` [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-23  0:32   ` Justin Tobler
  2024-04-19  9:51 ` [PATCH 04/11] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2889 bytes --]

The `default_attr_source()` function lazily computes the attr source
supposedly once, only. This is done via a static variable `attr_source`
that contains the resolved object ID of the attr source's tree. If the
variable is the null object ID then we try to look up the attr source,
otherwise we skip over it.

This has approach is flawed though: the variable will never be set to
anything else but the null object ID in case there is no attr source.
Consequently, we re-compute the information on every call. And in the
worst case, when we silently ignore bad trees, this will cause us to try
and look up the treeish every single time.

Improve this by introducing a separate variable `has_attr_source` to
track whether we already computed the attr source and, if so, whether we
have an attr source or not.

This also allows us to convert the `ignore_bad_attr_tree` to not be
static anymore as the code will only be executed once anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 attr.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/attr.c b/attr.c
index 679e42258c..9d911aeb31 100644
--- a/attr.c
+++ b/attr.c
@@ -1206,15 +1206,16 @@ static void collect_some_attrs(struct index_state *istate,
 }
 
 static const char *default_attr_source_tree_object_name;
-static int ignore_bad_attr_tree;
 
 void set_git_attr_source(const char *tree_object_name)
 {
 	default_attr_source_tree_object_name = xstrdup(tree_object_name);
 }
 
-static void compute_default_attr_source(struct object_id *attr_source)
+static int compute_default_attr_source(struct object_id *attr_source)
 {
+	int ignore_bad_attr_tree = 0;
+
 	if (!default_attr_source_tree_object_name)
 		default_attr_source_tree_object_name = getenv(GIT_ATTR_SOURCE_ENVIRONMENT);
 
@@ -1230,22 +1231,28 @@ static void compute_default_attr_source(struct object_id *attr_source)
 		ignore_bad_attr_tree = 1;
 	}
 
-	if (!default_attr_source_tree_object_name || !is_null_oid(attr_source))
-		return;
+	if (!default_attr_source_tree_object_name)
+		return 0;
 
 	if (repo_get_oid_treeish(the_repository,
 				 default_attr_source_tree_object_name,
-				 attr_source) && !ignore_bad_attr_tree)
-		die(_("bad --attr-source or GIT_ATTR_SOURCE"));
+				 attr_source)) {
+		if (!ignore_bad_attr_tree)
+			die(_("bad --attr-source or GIT_ATTR_SOURCE"));
+		return 0;
+	}
+
+	return 1;
 }
 
 static struct object_id *default_attr_source(void)
 {
 	static struct object_id attr_source;
+	static int has_attr_source = -1;
 
-	if (is_null_oid(&attr_source))
-		compute_default_attr_source(&attr_source);
-	if (is_null_oid(&attr_source))
+	if (has_attr_source < 0)
+		has_attr_source = compute_default_attr_source(&attr_source);
+	if (!has_attr_source)
 		return NULL;
 	return &attr_source;
 }
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 04/11] attr: fix BUG() when parsing attrs outside of repo
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2024-04-19  9:51 ` [PATCH 03/11] attr: don't recompute default attribute source Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-19  9:51 ` [PATCH 05/11] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2065 bytes --]

If either the `--attr-source` option or the `GIT_ATTR_SOURCE` envvar are
set, then `compute_default_attr_source()` will try to look up the value
as a treeish. It is possible to hit that function while outside of a Git
repository though, for example when using `git grep --no-index`. In that
case, Git will hit a bug because we try to look up the main ref store
outside of a repository.

Handle the case gracefully and detect when we try to look up an attr
source without a repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 attr.c                |  6 ++++++
 t/t0003-attributes.sh | 15 +++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/attr.c b/attr.c
index 9d911aeb31..4bd09bcb4b 100644
--- a/attr.c
+++ b/attr.c
@@ -1234,6 +1234,12 @@ static int compute_default_attr_source(struct object_id *attr_source)
 	if (!default_attr_source_tree_object_name)
 		return 0;
 
+	if (!startup_info->have_repository) {
+		if (!ignore_bad_attr_tree)
+			die(_("cannot use --attr-source or GIT_ATTR_SOURCE without repo"));
+		return 0;
+	}
+
 	if (repo_get_oid_treeish(the_repository,
 				 default_attr_source_tree_object_name,
 				 attr_source)) {
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index 774b52c298..3efdec54dd 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -428,6 +428,21 @@ test_expect_success 'precedence of --attr-source, GIT_ATTR_SOURCE, then attr.tre
 	)
 '
 
+test_expect_success 'diff without repository with attr source' '
+	mkdir -p "$TRASH_DIRECTORY/outside/nongit" &&
+	(
+		cd "$TRASH_DIRECTORY/outside/nongit" &&
+		GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/outside" &&
+		export GIT_CEILING_DIRECTORIES &&
+		touch file &&
+		cat >expect <<-EOF &&
+		fatal: cannot use --attr-source or GIT_ATTR_SOURCE without repo
+		EOF
+		test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --no-index foo file 2>err &&
+		test_cmp expect err
+	)
+'
+
 test_expect_success 'bare repository: with --source' '
 	(
 		cd bare.git &&
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 05/11] remote-curl: fix parsing of detached SHA256 heads
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2024-04-19  9:51 ` [PATCH 04/11] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-19  9:51 ` [PATCH 06/11] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3671 bytes --]

The dumb HTTP transport tries to read the remote HEAD reference by
downloading the "HEAD" file and then parsing it via `http_fetch_ref()`.
This function will either parse the file as an object ID in case it is
exactly `the_hash_algo->hexsz` long, or otherwise it will check whether
the reference starts with "ref :" and parse it as a symbolic ref.

This is broken when parsing detached HEADs of a remote SHA256 repository
because we never update `the_hash_algo` to the discovered remote object
hash. Consequently, `the_hash_algo` will always be the fallback SHA1
hash algorithm, which will cause us to fail parsing HEAD altogteher when
it contains a SHA256 object ID.

Fix this issue by setting up `the_hash_algo` via `repo_set_hash_algo()`.
While at it, let's make the expected SHA1 fallback explicit in our code,
which also addresses an upcoming issue where we are going to remove the
SHA1 fallback for `the_hash_algo`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 remote-curl.c              | 19 ++++++++++++++++++-
 t/t5550-http-fetch-dumb.sh | 15 +++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 0b6d7815fd..004b707fdf 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -266,12 +266,23 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	return list;
 }
 
+/*
+ * Try to detect the hash algorithm used by the remote repository when using
+ * the dumb HTTP transport. As dumb transports cannot tell us the object hash
+ * directly have to derive it from the advertised ref lengths.
+ */
 static const struct git_hash_algo *detect_hash_algo(struct discovery *heads)
 {
 	const char *p = memchr(heads->buf, '\t', heads->len);
 	int algo;
+
+	/*
+	 * In case the remote has no refs we have no way to reliably determine
+	 * the object hash used by that repository. In that case we simply fall
+	 * back to SHA1, which may or may not be correct.
+	 */
 	if (!p)
-		return the_hash_algo;
+		return &hash_algos[GIT_HASH_SHA1];
 
 	algo = hash_algo_by_length((p - heads->buf) / 2);
 	if (algo == GIT_HASH_UNKNOWN)
@@ -295,6 +306,12 @@ static struct ref *parse_info_refs(struct discovery *heads)
 		    "is this a git repository?",
 		    transport_anonymize_url(url.buf));
 
+	/*
+	 * Set the repository's hash algo to whatever we have just detected.
+	 * This ensures that we can correctly parse the remote references.
+	 */
+	repo_set_hash_algo(the_repository, hash_algo_by_ptr(options.hash_algo));
+
 	data = heads->buf;
 	start = NULL;
 	mid = data;
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 4c3b32785d..5f16cbc58d 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -55,6 +55,21 @@ test_expect_success 'list refs from outside any repository' '
 	test_cmp expect actual
 '
 
+
+test_expect_success 'list detached HEAD from outside any repository' '
+	git clone --mirror "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
+		"$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" \
+		update-ref --no-deref HEAD refs/heads/main &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" update-server-info &&
+	cat >expect <<-EOF &&
+	$(git rev-parse main)	HEAD
+	$(git rev-parse main)	refs/heads/main
+	EOF
+	nongit git ls-remote "$HTTPD_URL/dumb/repo-detached.git" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'create password-protected repository' '
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH/auth/dumb/" &&
 	cp -Rf "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 06/11] builtin/rev-parse: allow shortening to more than 40 hex characters
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2024-04-19  9:51 ` [PATCH 05/11] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-19  9:51 ` [PATCH 07/11] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1995 bytes --]

The `--short=` option for git-rev-parse(1) allows the user to specify
to how many characters object IDs should be shortened to. The option is
broken though for SHA256 repositories because we set the maximum allowed
hash size to `the_hash_algo->hexsz` before we have even set up the repo.
Consequently, `the_hash_algo` will always be SHA1 and thus we truncate
every hash after at most 40 characters.

Fix this by accessing `the_hash_algo` only after we have set up the
repo.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/rev-parse.c  | 5 ++---
 t/t1500-rev-parse.sh | 6 ++++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 624182e507..d7b87c605e 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -687,7 +687,6 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 	const char *name = NULL;
 	struct object_context unused;
 	struct strbuf buf = STRBUF_INIT;
-	const int hexsz = the_hash_algo->hexsz;
 	int seen_end_of_options = 0;
 	enum format_type format = FORMAT_DEFAULT;
 
@@ -863,8 +862,8 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				abbrev = strtoul(arg, NULL, 10);
 				if (abbrev < MINIMUM_ABBREV)
 					abbrev = MINIMUM_ABBREV;
-				else if (hexsz <= abbrev)
-					abbrev = hexsz;
+				else if ((int)the_hash_algo->hexsz <= abbrev)
+					abbrev = the_hash_algo->hexsz;
 				continue;
 			}
 			if (!strcmp(arg, "--sq")) {
diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
index a669e592f1..30c31918fd 100755
--- a/t/t1500-rev-parse.sh
+++ b/t/t1500-rev-parse.sh
@@ -304,4 +304,10 @@ test_expect_success 'rev-parse --bisect includes bad, excludes good' '
 	test_cmp expect actual
 '
 
+test_expect_success '--short= truncates to the actual hash length' '
+	git rev-parse HEAD >expect &&
+	git rev-parse --short=100 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 07/11] builtin/blame: don't access potentially unitialized `the_hash_algo`
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2024-04-19  9:51 ` [PATCH 06/11] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-19  9:51 ` [PATCH 08/11] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2001 bytes --]

We access `the_hash_algo` in git-blame(1) before we have executed
`parse_options_start()`, which may not be properly set up in case we
have no repository. This is fine for most of the part because all the
call paths that lead to it (git-blame(1), git-annotate(1) as well as
git-pick-axe(1)) specify `RUN_SETUP` and thus require a repository.

There is one exception though, namely when passing `-h` to print the
help. Here we will access `the_hash_algo` even if there is no repo.
This works fine right now because `the_hash_algo` gets sets up to point
to the SHA1 algorithm via `initialize_repository()`. But we're about to
stop doing this, and thus the code would lead to a `NULL` pointer
exception.

Prepare the code for this and only access `the_hash_algo` after we are
sure that there is a proper repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/blame.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 9aa74680a3..e325825936 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -915,7 +915,6 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	struct range_set ranges;
 	unsigned int range_i;
 	long anchor;
-	const int hexsz = the_hash_algo->hexsz;
 	long num_lines = 0;
 	const char *str_usage = cmd_is_annotate ? annotate_usage : blame_usage;
 	const char **opt_usage = cmd_is_annotate ? annotate_opt_usage : blame_opt_usage;
@@ -973,11 +972,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	} else if (show_progress < 0)
 		show_progress = isatty(2);
 
-	if (0 < abbrev && abbrev < hexsz)
+	if (0 < abbrev && abbrev < (int)the_hash_algo->hexsz)
 		/* one more abbrev length is needed for the boundary commit */
 		abbrev++;
 	else if (!abbrev)
-		abbrev = hexsz;
+		abbrev = the_hash_algo->hexsz;
 
 	if (revs_file && read_ancestry(revs_file))
 		die_errno("reading graft file '%s' failed", revs_file);
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 08/11] builtin/bundle: abort "verify" early when there is no repository
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (6 preceding siblings ...)
  2024-04-19  9:51 ` [PATCH 07/11] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-19  9:51 ` [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1526 bytes --]

Verifying a bundle requires us to have a repository. This is encoded in
`verify_bundle()`, which will return an error if there is no repository.
We call `open_bundle()` before we call `verify_bundle()` though, which
already performs some verifications even though we may ultimately abort
due to a missing repository.

This is problematic because `open_bundle()` already reads the bundle
header and verifies that it contains a properly formatted hash. When
there is no repository we have no clue what hash function to expect
though, so we always end up assuming SHA1 here, which may or may not be
correct. Furthermore, we are about to stop initializing `the_hash_algo`
when there is no repository, which will lead to segfaults.

Check early on whether we have a repository to fix this issue.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/bundle.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/bundle.c b/builtin/bundle.c
index 3ad11dc5d0..d5d41a8f67 100644
--- a/builtin/bundle.c
+++ b/builtin/bundle.c
@@ -140,6 +140,11 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) {
 			builtin_bundle_verify_usage, options, &bundle_file);
 	/* bundle internals use argv[1] as further parameters */
 
+	if (!startup_info->have_repository) {
+		ret = error(_("need a repository to verify a bundle"));
+		goto cleanup;
+	}
+
 	if ((bundle_fd = open_bundle(bundle_file, &header, &name)) < 0) {
 		ret = 1;
 		goto cleanup;
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (7 preceding siblings ...)
  2024-04-19  9:51 ` [PATCH 08/11] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-22 18:41   ` Junio C Hamano
  2024-04-19  9:51 ` [PATCH 10/11] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1811 bytes --]

The git-diff(1) command can be used outside repositories to diff two
files with each other. But even if there is no repository we will end up
hashing the files that we are diffing so that we can print the "index"
line:

```
diff --git a/a b/b
index 7898192..6178079 100644
--- a/a
+++ b/b
@@ -1 +1 @@
-a
+b
```

We implicitly use SHA1 to calculate the hash here, which is because
`the_repository` gets initialized with SHA1 during the startup routine.
We are about to stop doing this though such that `the_repository` only
ever has a hash function when it was properly initialized via a repo's
configuration.

To give full control to our users, we would ideally add a new switch to
git-diff(1) that allows them to specify the hash function when executed
outside of a repository. But for now, we only convert the code to make
this explicit such that we can stop setting the default hash algorithm
for `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/diff.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/diff.c b/builtin/diff.c
index 6e196e0c7d..58ec7e5da2 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -465,6 +465,15 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 			no_index = DIFF_NO_INDEX_IMPLICIT;
 	}
 
+	/*
+	 * When operating outside of a Git repository we need to have a hash
+	 * algorithm at hand so that we can generate the blob hashes. We
+	 * default to SHA1 here, but may eventually want to change this to be
+	 * configurable via a command line option.
+	 */
+	if (nongit)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	init_diff_ui_defaults();
 	git_config(git_diff_ui_config, NULL);
 	prefix = precompose_argv_prefix(argc, argv, prefix);
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 10/11] builtin/shortlog: don't set up revisions without repo
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (8 preceding siblings ...)
  2024-04-19  9:51 ` [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-23  0:35   ` Justin Tobler
  2024-04-19  9:51 ` [PATCH 11/11] repository: stop setting SHA1 as the default object hash Patrick Steinhardt
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]

It is possible to run git-shortlog(1) outside of a repository by passing
it output from git-log(1) via standard input. Obviously, as there is no
repository in that context, it is thus unsupported to pass any revisions
as arguments.

Reghardless of that we still end up calling `setup_revisions()`. While
that works alright, it is somewhat strange. Furthermore, this is about
to cause problems when we unset the default object hash.

Refactor the code to only call `setup_revisions()` when we have a
repository. This is safe to do as we already verify that there are no
arguments when running outside of a repository anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/shortlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 3c7cd2d6ef..d4daf31e22 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -435,7 +435,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 		usage_with_options(shortlog_usage, options);
 	}
 
-	if (setup_revisions(argc, argv, &rev, NULL) != 1) {
+	if (!nongit && setup_revisions(argc, argv, &rev, NULL) != 1) {
 		error(_("unrecognized argument: %s"), argv[1]);
 		usage_with_options(shortlog_usage, options);
 	}
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 11/11] repository: stop setting SHA1 as the default object hash
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (9 preceding siblings ...)
  2024-04-19  9:51 ` [PATCH 10/11] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt
@ 2024-04-19  9:51 ` Patrick Steinhardt
  2024-04-19 19:12 ` [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` brian m. carlson
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-19  9:51 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1694 bytes --]

During the startup of Git, we call `initialize_the_repository()` to set
up `the_repository` as well as `the_index`. Part of this setup is also
to set the default object hash of the repository to SHA1. This has the
effect that `the_hash_algo` is getting initialized to SHA1, as well.
This default hash algorithm eventually gets overridden by most Git
commands via `setup_git_directory()`, which also detects the actual hash
algorithm used by the repository.

There are some commands though that don't access a repository at all, or
at a later point only, and thus retain the default hash function for
some amount of time. As some of the the preceding commits demonstrate,
this can lead to subtle issues when we access `the_hash_algo` when no
repository has been set up.

Address this issue by dropping the set up of the default hash algorithm
completely. The effect of this is that `the_hash_algo` will map to a
`NULL` pointer and thus cause Git to crash when something tries to
access the hash algorithm without it being properly initialized. It thus
forces all Git commands to explicitly set up the hash algorithm in case
there is no repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 repository.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/repository.c b/repository.c
index e15b416944..b65b1a8c8b 100644
--- a/repository.c
+++ b/repository.c
@@ -35,8 +35,6 @@ void initialize_the_repository(void)
 	the_repo.parsed_objects = parsed_object_pool_new();
 
 	index_state_init(&the_index, the_repository);
-
-	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
 }
 
 static void expand_base_dir(char **out, const char *in,
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/11] path: harden validation of HEAD with non-standard hashes
  2024-04-19  9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
@ 2024-04-19 19:03   ` brian m. carlson
  2024-04-22  4:56     ` Patrick Steinhardt
  2024-04-22 16:15   ` Junio C Hamano
  1 sibling, 1 reply; 53+ messages in thread
From: brian m. carlson @ 2024-04-19 19:03 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 432 bytes --]

On 2024-04-19 at 09:51:10, Patrick Steinhardt wrote:
> It follows that we could just leave the current code intact, as in
> practice the code change doesn't have any user visible impact. But it
> also prepares us for `the_hash_algo` being unset when there is no
> repositroy.

The patch looks fine and is well explained, but you have a typo
("repositroy").
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo`
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (10 preceding siblings ...)
  2024-04-19  9:51 ` [PATCH 11/11] repository: stop setting SHA1 as the default object hash Patrick Steinhardt
@ 2024-04-19 19:12 ` brian m. carlson
  2024-04-19 19:16   ` Junio C Hamano
  2024-04-22  4:56   ` Patrick Steinhardt
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
  13 siblings, 2 replies; 53+ messages in thread
From: brian m. carlson @ 2024-04-19 19:12 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2039 bytes --]

On 2024-04-19 at 09:51:03, Patrick Steinhardt wrote:
> Hi,
> 
> when starting up, Git will initialize `the_repository` by calling
> `initialize_the_repository()`. Part of this is also that we set the hash
> algo of `the_repository` to SHA1, which implicitly sets `the_hash_algo`
> because it is a macro expanding to `the_repository->hash_algo`.
> 
> Usually, we eventually set up the correct hash algorithm here once we
> have properly set up `the_repository` via `setup_git_directory()`. But
> in some commands we actually don't require a Git repository, and thus we
> will leave the SHA1 hash algorithm in place.
> 
> This has led to some subtle bugs when the context really asks for a
> SHA256 repository, which this patch series corrects for most of the
> part. Some commands need further work, like for example git-diff(1),
> where the user might want to have the ability to pick a hash function
> when run outside of a repository.
> 
> Ultimately, the last patch then drops the setup of the fallback hash
> algorithm completely. This will cause `the_hash_algo` to be a `NULL`
> pointer unless explicitly configured, and thus we now start to crash
> when it gets accessed without doing so beforehand. This is a rather
> risky thing to do, but it does catch bugs where we might otherwise
> accidentally do the wrong thing. And even though I think it is the right
> thing to do even conceptually, I'd be okay to drop it if folks think
> that the risk is not worth it.

I've taken a look, and other than the minor typo I noted, this seems
fine.  I'm in favour of getting rid of the SHA-1 default, even though my
gut tells me we might find a bug or two along the way where things
aren't initialized properly.  I still think that'll be okay and it's
worth doing, since it'll help us prepare for the case in the future
where we want to switch the default and also for libification, where we
won't want to make those kinds of assumptions.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo`
  2024-04-19 19:12 ` [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` brian m. carlson
@ 2024-04-19 19:16   ` Junio C Hamano
  2024-04-22  4:56   ` Patrick Steinhardt
  1 sibling, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2024-04-19 19:16 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Patrick Steinhardt, git

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I've taken a look, and other than the minor typo I noted, this seems
> fine.  I'm in favour of getting rid of the SHA-1 default, even though my
> gut tells me we might find a bug or two along the way where things
> aren't initialized properly.  I still think that'll be okay and it's
> worth doing, since it'll help us prepare for the case in the future
> where we want to switch the default and also for libification, where we
> won't want to make those kinds of assumptions.

Yup, thanks for reviewing (and thanks Patrick for writing the
series).

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo`
  2024-04-19 19:12 ` [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` brian m. carlson
  2024-04-19 19:16   ` Junio C Hamano
@ 2024-04-22  4:56   ` Patrick Steinhardt
  1 sibling, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-22  4:56 UTC (permalink / raw)
  To: brian m. carlson, git

[-- Attachment #1: Type: text/plain, Size: 2517 bytes --]

On Fri, Apr 19, 2024 at 07:12:59PM +0000, brian m. carlson wrote:
> On 2024-04-19 at 09:51:03, Patrick Steinhardt wrote:
> > Hi,
> > 
> > when starting up, Git will initialize `the_repository` by calling
> > `initialize_the_repository()`. Part of this is also that we set the hash
> > algo of `the_repository` to SHA1, which implicitly sets `the_hash_algo`
> > because it is a macro expanding to `the_repository->hash_algo`.
> > 
> > Usually, we eventually set up the correct hash algorithm here once we
> > have properly set up `the_repository` via `setup_git_directory()`. But
> > in some commands we actually don't require a Git repository, and thus we
> > will leave the SHA1 hash algorithm in place.
> > 
> > This has led to some subtle bugs when the context really asks for a
> > SHA256 repository, which this patch series corrects for most of the
> > part. Some commands need further work, like for example git-diff(1),
> > where the user might want to have the ability to pick a hash function
> > when run outside of a repository.
> > 
> > Ultimately, the last patch then drops the setup of the fallback hash
> > algorithm completely. This will cause `the_hash_algo` to be a `NULL`
> > pointer unless explicitly configured, and thus we now start to crash
> > when it gets accessed without doing so beforehand. This is a rather
> > risky thing to do, but it does catch bugs where we might otherwise
> > accidentally do the wrong thing. And even though I think it is the right
> > thing to do even conceptually, I'd be okay to drop it if folks think
> > that the risk is not worth it.
> 
> I've taken a look, and other than the minor typo I noted, this seems
> fine.  I'm in favour of getting rid of the SHA-1 default, even though my
> gut tells me we might find a bug or two along the way where things
> aren't initialized properly.  I still think that'll be okay and it's
> worth doing, since it'll help us prepare for the case in the future
> where we want to switch the default and also for libification, where we
> won't want to make those kinds of assumptions.

Yeah, I would expect there to be a few more bugs that I just didn't
spot. But I think if we do this early in the next release cycle it would
give us plenty of time to detect any such issues. And in the worst case
we would still be able to revert the last patch of this series. So all
in all it hopefully shouldn't be all that bad and prepares us for the
future.

Thanks for your review!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/11] path: harden validation of HEAD with non-standard hashes
  2024-04-19 19:03   ` brian m. carlson
@ 2024-04-22  4:56     ` Patrick Steinhardt
  0 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-22  4:56 UTC (permalink / raw)
  To: brian m. carlson, git

[-- Attachment #1: Type: text/plain, Size: 558 bytes --]

On Fri, Apr 19, 2024 at 07:03:33PM +0000, brian m. carlson wrote:
> On 2024-04-19 at 09:51:10, Patrick Steinhardt wrote:
> > It follows that we could just leave the current code intact, as in
> > practice the code change doesn't have any user visible impact. But it
> > also prepares us for `the_hash_algo` being unset when there is no
> > repositroy.
> 
> The patch looks fine and is well explained, but you have a typo
> ("repositroy").

Thanks, fixed! I'll refrain from sending out another version just to
address this typo though.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/11] path: harden validation of HEAD with non-standard hashes
  2024-04-19  9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
  2024-04-19 19:03   ` brian m. carlson
@ 2024-04-22 16:15   ` Junio C Hamano
  2024-04-23  4:50     ` Patrick Steinhardt
  1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2024-04-22 16:15 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> The `validate_headref()` function takes a path to a supposed "HEAD" file
> and checks whether its format is something that we understand. It is
> used as part of our repository discovery to check whether a specific
> directory is a Git directory or not.
>
> Part of the validation is a check for a detached HEAD that contains a
> plain object ID. To do this validation we use `get_oid_hex()`, which
> relies on `the_hash_algo`. At this point in time the hash algo cannot
> yet be initialized though because we didn't yet read the Git config.
> Consequently, it will always be the SHA1 hash algorithm.
>
> In practice this works alright because `get_oid_hex()` only ends up
> checking whether the prefix of the buffer is a valid object ID. And
> because SHA1 is shorter than SHA256, the function will successfully
> parse SHA256 object IDs, as well.
>
> It is somewhat fragile though and not really the intent to only check
> for SHA1. With this in mind, harden the code to use `get_oid_hex_any()`
> to check whether the "HEAD" file parses as any known hash.

All makes sense, and given the above, I strongly suspect that we
would want to make the validate_headref() function a file-scope
static in setup.c to make sure it is only called/callable from the
repository discovery codepath.  Especially that if somebody calls
this function again after we find out that the repository uses
SHA-256, we would want to let the caller know that the detached HEAD
records SHA-1 and we are in an inconsistent state.

> It follows that we could just leave the current code intact, as in
> practice the code change doesn't have any user visible impact. But it
> also prepares us for `the_hash_algo` being unset when there is no
> repositroy.

Or perhaps we use get_oid_hex_any() != GIT_HASH_UNKNOWN when
the_hash_algo has not been determined, and use !get_oid_hex() after
we have determined which algorithm we are using?  It may be what you
did in a later step in the series, so let me read on.

Thanks.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  path.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/path.c b/path.c
> index 67229edb9c..cc02165530 100644
> --- a/path.c
> +++ b/path.c
> @@ -693,7 +693,7 @@ int validate_headref(const char *path)
>  	/*
>  	 * Is this a detached HEAD?
>  	 */
> -	if (!get_oid_hex(buffer, &oid))
> +	if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN)
>  		return 0;
>  
>  	return -1;

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo
  2024-04-19  9:51 ` [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt
@ 2024-04-22 18:41   ` Junio C Hamano
  0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2024-04-22 18:41 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> The git-diff(1) command can be used outside repositories to diff two
> files with each other. But even if there is no repository we will end up
> hashing the files that we are diffing so that we can print the "index"
> line:
>
> ```
> diff --git a/a b/b
> index 7898192..6178079 100644
> --- a/a
> +++ b/b
> @@ -1 +1 @@
> -a
> +b
> ```

This will break "git am"; it is customary to indent such sample
patch block in a log message.

> We implicitly use SHA1 to calculate the hash here, which is because
> `the_repository` gets initialized with SHA1 during the startup routine.
> We are about to stop doing this though such that `the_repository` only
> ever has a hash function when it was properly initialized via a repo's
> configuration.
>
> To give full control to our users, we would ideally add a new switch to
> git-diff(1) that allows them to specify the hash function when executed
> outside of a repository. But for now, we only convert the code to make
> this explicit such that we can stop setting the default hash algorithm
> for `the_repository`.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/diff.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/builtin/diff.c b/builtin/diff.c
> index 6e196e0c7d..58ec7e5da2 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -465,6 +465,15 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
>  			no_index = DIFF_NO_INDEX_IMPLICIT;
>  	}
>  
> +	/*
> +	 * When operating outside of a Git repository we need to have a hash
> +	 * algorithm at hand so that we can generate the blob hashes. We
> +	 * default to SHA1 here, but may eventually want to change this to be
> +	 * configurable via a command line option.
> +	 */
> +	if (nongit)
> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +
>  	init_diff_ui_defaults();
>  	git_config(git_diff_ui_config, NULL);
>  	prefix = precompose_argv_prefix(argc, argv, prefix);

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known
  2024-04-19  9:51 ` [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt
@ 2024-04-23  0:30   ` Justin Tobler
  0 siblings, 0 replies; 53+ messages in thread
From: Justin Tobler @ 2024-04-23  0:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 24/04/19 11:51AM, Patrick Steinhardt wrote:
> The `OPT__ABBREV()` option can be used to add an option that abbreviates
> object IDs. When given an length longer than `the_hash_algo->hexsz`,

s/an length/a length/

> then it will instead set the length to that maximum length.
> 
> It may not always be guaranteed that we have `the_hash_algo` initialized
> properly as the hash algortihm can only be set up after we have set up

s/algortihm/algorithm/

> `the_repository`. In that case, the hash would always be truncated to
> the hex length of SHA1, which may not be what the user desires.
> 
> In practice it's not a problem as all commands that use `OPT__ABBREV()`
> also have `RUN_SETUP` set and thus cannot work without a repository.
> Consequently, both `the_repository` and `the_hash_algo` would be
> properly set up.
> 
> Regardless of that, harden the code to not truncate the length when we
> didn't set up a repository.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
...

A couple of typos I noticed.

-Justin

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 03/11] attr: don't recompute default attribute source
  2024-04-19  9:51 ` [PATCH 03/11] attr: don't recompute default attribute source Patrick Steinhardt
@ 2024-04-23  0:32   ` Justin Tobler
  0 siblings, 0 replies; 53+ messages in thread
From: Justin Tobler @ 2024-04-23  0:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 24/04/19 11:51AM, Patrick Steinhardt wrote:
> The `default_attr_source()` function lazily computes the attr source
> supposedly once, only. This is done via a static variable `attr_source`
> that contains the resolved object ID of the attr source's tree. If the
> variable is the null object ID then we try to look up the attr source,
> otherwise we skip over it.
> 
> This has approach is flawed though: the variable will never be set to

Remove "has"

-Justin

> anything else but the null object ID in case there is no attr source.
> Consequently, we re-compute the information on every call. And in the
> worst case, when we silently ignore bad trees, this will cause us to try
> and look up the treeish every single time.
> 
> Improve this by introducing a separate variable `has_attr_source` to
> track whether we already computed the attr source and, if so, whether we
> have an attr source or not.
> 
> This also allows us to convert the `ignore_bad_attr_tree` to not be
> static anymore as the code will only be executed once anyway.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 10/11] builtin/shortlog: don't set up revisions without repo
  2024-04-19  9:51 ` [PATCH 10/11] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt
@ 2024-04-23  0:35   ` Justin Tobler
  0 siblings, 0 replies; 53+ messages in thread
From: Justin Tobler @ 2024-04-23  0:35 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 24/04/19 11:51AM, Patrick Steinhardt wrote:
> It is possible to run git-shortlog(1) outside of a repository by passing
> it output from git-log(1) via standard input. Obviously, as there is no
> repository in that context, it is thus unsupported to pass any revisions
> as arguments.
> 
> Reghardless of that we still end up calling `setup_revisions()`. While

s/Reghardless/Regardless/

> that works alright, it is somewhat strange. Furthermore, this is about
> to cause problems when we unset the default object hash.
> 
> Refactor the code to only call `setup_revisions()` when we have a
> repository. This is safe to do as we already verify that there are no
> arguments when running outside of a repository anyway.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/11] path: harden validation of HEAD with non-standard hashes
  2024-04-22 16:15   ` Junio C Hamano
@ 2024-04-23  4:50     ` Patrick Steinhardt
  2024-04-23 16:54       ` Junio C Hamano
  0 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  4:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3075 bytes --]

On Mon, Apr 22, 2024 at 09:15:41AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > The `validate_headref()` function takes a path to a supposed "HEAD" file
> > and checks whether its format is something that we understand. It is
> > used as part of our repository discovery to check whether a specific
> > directory is a Git directory or not.
> >
> > Part of the validation is a check for a detached HEAD that contains a
> > plain object ID. To do this validation we use `get_oid_hex()`, which
> > relies on `the_hash_algo`. At this point in time the hash algo cannot
> > yet be initialized though because we didn't yet read the Git config.
> > Consequently, it will always be the SHA1 hash algorithm.
> >
> > In practice this works alright because `get_oid_hex()` only ends up
> > checking whether the prefix of the buffer is a valid object ID. And
> > because SHA1 is shorter than SHA256, the function will successfully
> > parse SHA256 object IDs, as well.
> >
> > It is somewhat fragile though and not really the intent to only check
> > for SHA1. With this in mind, harden the code to use `get_oid_hex_any()`
> > to check whether the "HEAD" file parses as any known hash.
> 
> All makes sense, and given the above, I strongly suspect that we
> would want to make the validate_headref() function a file-scope
> static in setup.c to make sure it is only called/callable from the
> repository discovery codepath.  Especially that if somebody calls
> this function again after we find out that the repository uses
> SHA-256, we would want to let the caller know that the detached HEAD
> records SHA-1 and we are in an inconsistent state.

Fair enough, we can definitely do so. It only has a single callsite
anyway.

> > It follows that we could just leave the current code intact, as in
> > practice the code change doesn't have any user visible impact. But it
> > also prepares us for `the_hash_algo` being unset when there is no
> > repositroy.
> 
> Or perhaps we use get_oid_hex_any() != GIT_HASH_UNKNOWN when
> the_hash_algo has not been determined, and use !get_oid_hex() after
> we have determined which algorithm we are using?  It may be what you
> did in a later step in the series, so let me read on.

I didn't, and given that it's only used from `is_git_directory()` it
would probably not make much sense, either. I'll rather add another
patch on top that moves the function around to clarify that it is only
expected to be called from "setup.c".

Patrick

> Thanks.
> 
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  path.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/path.c b/path.c
> > index 67229edb9c..cc02165530 100644
> > --- a/path.c
> > +++ b/path.c
> > @@ -693,7 +693,7 @@ int validate_headref(const char *path)
> >  	/*
> >  	 * Is this a detached HEAD?
> >  	 */
> > -	if (!get_oid_hex(buffer, &oid))
> > +	if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN)
> >  		return 0;
> >  
> >  	return -1;

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo`
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (11 preceding siblings ...)
  2024-04-19 19:12 ` [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` brian m. carlson
@ 2024-04-23  5:07 ` Patrick Steinhardt
  2024-04-23  5:07   ` [PATCH v2 01/12] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
                     ` (12 more replies)
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
  13 siblings, 13 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:07 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 6510 bytes --]

Hi,

this is the second version of my patch series that causes us to stop
relying on the SHA1 default hash.

Changes compared to v1:

    - Various typo fixes in commit messages.

    - Added another patch that moves `validate_headref()` into "setup.c"
      to clarify that it is only used during repository discovery.

    - Indented a diff in a commit message so that git-am(1) is happy.

Thanks!

Patrick

Patrick Steinhardt (12):
  path: harden validation of HEAD with non-standard hashes
  path: move `validate_headref()` to its only user
  parse-options-cb: only abbreviate hashes when hash algo is known
  attr: don't recompute default attribute source
  attr: fix BUG() when parsing attrs outside of repo
  remote-curl: fix parsing of detached SHA256 heads
  builtin/rev-parse: allow shortening to more than 40 hex characters
  builtin/blame: don't access potentially unitialized `the_hash_algo`
  builtin/bundle: abort "verify" early when there is no repository
  builtin/diff: explicitly set hash algo when there is no repo
  builtin/shortlog: don't set up revisions without repo
  repository: stop setting SHA1 as the default object hash

 attr.c                     | 31 +++++++++++++++-------
 builtin/blame.c            |  5 ++--
 builtin/bundle.c           |  5 ++++
 builtin/diff.c             |  9 +++++++
 builtin/rev-parse.c        |  5 ++--
 builtin/shortlog.c         |  2 +-
 parse-options-cb.c         |  3 ++-
 path.c                     | 53 --------------------------------------
 path.h                     |  1 -
 remote-curl.c              | 19 +++++++++++++-
 repository.c               |  2 --
 setup.c                    | 53 ++++++++++++++++++++++++++++++++++++++
 t/t0003-attributes.sh      | 15 +++++++++++
 t/t0040-parse-options.sh   | 17 ++++++++++++
 t/t1500-rev-parse.sh       |  6 +++++
 t/t5550-http-fetch-dumb.sh | 15 +++++++++++
 16 files changed, 167 insertions(+), 74 deletions(-)

Range-diff against v1:
 1:  aa4d6f508b !  1:  a986b464d3 path: harden validation of HEAD with non-standard hashes
    @@ Commit message
         current version of Git doesn't understand yet. We'd still want to detect
         the repository as proper Git repository in that case, and we will fail
         eventually with a proper error message that the hash isn't understood
    -    when trying to set up the repostiory format.
    +    when trying to set up the repository format.
     
         It follows that we could just leave the current code intact, as in
         practice the code change doesn't have any user visible impact. But it
         also prepares us for `the_hash_algo` being unset when there is no
    -    repositroy.
    +    repository.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
 -:  ---------- >  2:  a347c7e6ca path: move `validate_headref()` to its only user
 2:  5daaaed2b9 !  3:  c0a15b2fa6 parse-options-cb: only abbreviate hashes when hash algo is known
    @@ Commit message
         parse-options-cb: only abbreviate hashes when hash algo is known
     
         The `OPT__ABBREV()` option can be used to add an option that abbreviates
    -    object IDs. When given an length longer than `the_hash_algo->hexsz`,
    -    then it will instead set the length to that maximum length.
    +    object IDs. When given a length longer than `the_hash_algo->hexsz`, then
    +    it will instead set the length to that maximum length.
     
         It may not always be guaranteed that we have `the_hash_algo` initialized
    -    properly as the hash algortihm can only be set up after we have set up
    +    properly as the hash algorithm can only be set up after we have set up
         `the_repository`. In that case, the hash would always be truncated to
         the hex length of SHA1, which may not be what the user desires.
     
 3:  ae91a27ffc !  4:  1b5f904eed attr: don't recompute default attribute source
    @@ Commit message
         variable is the null object ID then we try to look up the attr source,
         otherwise we skip over it.
     
    -    This has approach is flawed though: the variable will never be set to
    +    This approach is flawed though: the variable will never be set to
         anything else but the null object ID in case there is no attr source.
         Consequently, we re-compute the information on every call. And in the
         worst case, when we silently ignore bad trees, this will cause us to try
 4:  53c8e1cd7c =  5:  26909daca4 attr: fix BUG() when parsing attrs outside of repo
 5:  32a429fb60 =  6:  0b99184f50 remote-curl: fix parsing of detached SHA256 heads
 6:  9cb7baa50c =  7:  ccfda3c2d2 builtin/rev-parse: allow shortening to more than 40 hex characters
 7:  e189a4ad15 =  8:  1813e7eb5c builtin/blame: don't access potentially unitialized `the_hash_algo`
 8:  bc4bda3508 =  9:  31182a1fc6 builtin/bundle: abort "verify" early when there is no repository
 9:  39e56dab62 ! 10:  78e19d0a1b builtin/diff: explicitly set hash algo when there is no repo
    @@ Commit message
         hashing the files that we are diffing so that we can print the "index"
         line:
     
    -    ```
    -    diff --git a/a b/b
    -    index 7898192..6178079 100644
    -    --- a/a
    -    +++ b/b
    -    @@ -1 +1 @@
    -    -a
    -    +b
    -    ```
    +        ```
    +        diff --git a/a b/b
    +        index 7898192..6178079 100644
    +        --- a/a
    +        +++ b/b
    +        @@ -1 +1 @@
    +        -a
    +        +b
    +        ```
     
         We implicitly use SHA1 to calculate the hash here, which is because
         `the_repository` gets initialized with SHA1 during the startup routine.
10:  508e28ed1e ! 11:  51bcddbc31 builtin/shortlog: don't set up revisions without repo
    @@ Commit message
         repository in that context, it is thus unsupported to pass any revisions
         as arguments.
     
    -    Reghardless of that we still end up calling `setup_revisions()`. While
    +    Regardless of that we still end up calling `setup_revisions()`. While
         that works alright, it is somewhat strange. Furthermore, this is about
         to cause problems when we unset the default object hash.
     
11:  f86a6ff3ba = 12:  e8126371e1 repository: stop setting SHA1 as the default object hash

base-commit: 21306a098c3f174ad4c2a5cddb9069ee27a548b0
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 01/12] path: harden validation of HEAD with non-standard hashes
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
@ 2024-04-23  5:07   ` Patrick Steinhardt
  2024-04-23  5:07   ` [PATCH v2 02/12] path: move `validate_headref()` to its only user Patrick Steinhardt
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:07 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2279 bytes --]

The `validate_headref()` function takes a path to a supposed "HEAD" file
and checks whether its format is something that we understand. It is
used as part of our repository discovery to check whether a specific
directory is a Git directory or not.

Part of the validation is a check for a detached HEAD that contains a
plain object ID. To do this validation we use `get_oid_hex()`, which
relies on `the_hash_algo`. At this point in time the hash algo cannot
yet be initialized though because we didn't yet read the Git config.
Consequently, it will always be the SHA1 hash algorithm.

In practice this works alright because `get_oid_hex()` only ends up
checking whether the prefix of the buffer is a valid object ID. And
because SHA1 is shorter than SHA256, the function will successfully
parse SHA256 object IDs, as well.

It is somewhat fragile though and not really the intent to only check
for SHA1. With this in mind, harden the code to use `get_oid_hex_any()`
to check whether the "HEAD" file parses as any known hash.

One might be hard pressed to tighten the check even further and fully
validate the file contents, not only the prefix. In practice though that
wouldn't make a lot of sense as it could be that the repository uses a
hash function that produces longer hashes than SHA256, but which the
current version of Git doesn't understand yet. We'd still want to detect
the repository as proper Git repository in that case, and we will fail
eventually with a proper error message that the hash isn't understood
when trying to set up the repository format.

It follows that we could just leave the current code intact, as in
practice the code change doesn't have any user visible impact. But it
also prepares us for `the_hash_algo` being unset when there is no
repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 path.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/path.c b/path.c
index 67229edb9c..cc02165530 100644
--- a/path.c
+++ b/path.c
@@ -693,7 +693,7 @@ int validate_headref(const char *path)
 	/*
 	 * Is this a detached HEAD?
 	 */
-	if (!get_oid_hex(buffer, &oid))
+	if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN)
 		return 0;
 
 	return -1;
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 02/12] path: move `validate_headref()` to its only user
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
  2024-04-23  5:07   ` [PATCH v2 01/12] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
@ 2024-04-23  5:07   ` Patrick Steinhardt
  2024-04-23  5:07   ` [PATCH v2 03/12] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:07 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 3965 bytes --]

While `validate_headref()` is only called from `is_git_directory()` in
"setup.c", it is currently implemented in "path.c". Move it over such
that it becomes clear that it is only really used during setup in order
to discover repositories.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 path.c  | 53 -----------------------------------------------------
 path.h  |  1 -
 setup.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 54 deletions(-)

diff --git a/path.c b/path.c
index cc02165530..bd6e25245d 100644
--- a/path.c
+++ b/path.c
@@ -5,7 +5,6 @@
 #include "abspath.h"
 #include "environment.h"
 #include "gettext.h"
-#include "hex.h"
 #include "repository.h"
 #include "strbuf.h"
 #include "string-list.h"
@@ -647,58 +646,6 @@ void strbuf_git_common_path(struct strbuf *sb,
 	va_end(args);
 }
 
-int validate_headref(const char *path)
-{
-	struct stat st;
-	char buffer[256];
-	const char *refname;
-	struct object_id oid;
-	int fd;
-	ssize_t len;
-
-	if (lstat(path, &st) < 0)
-		return -1;
-
-	/* Make sure it is a "refs/.." symlink */
-	if (S_ISLNK(st.st_mode)) {
-		len = readlink(path, buffer, sizeof(buffer)-1);
-		if (len >= 5 && !memcmp("refs/", buffer, 5))
-			return 0;
-		return -1;
-	}
-
-	/*
-	 * Anything else, just open it and try to see if it is a symbolic ref.
-	 */
-	fd = open(path, O_RDONLY);
-	if (fd < 0)
-		return -1;
-	len = read_in_full(fd, buffer, sizeof(buffer)-1);
-	close(fd);
-
-	if (len < 0)
-		return -1;
-	buffer[len] = '\0';
-
-	/*
-	 * Is it a symbolic ref?
-	 */
-	if (skip_prefix(buffer, "ref:", &refname)) {
-		while (isspace(*refname))
-			refname++;
-		if (starts_with(refname, "refs/"))
-			return 0;
-	}
-
-	/*
-	 * Is this a detached HEAD?
-	 */
-	if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN)
-		return 0;
-
-	return -1;
-}
-
 static struct passwd *getpw_str(const char *username, size_t len)
 {
 	struct passwd *pw;
diff --git a/path.h b/path.h
index ea96487b29..c3bc8617bd 100644
--- a/path.h
+++ b/path.h
@@ -173,7 +173,6 @@ const char *git_path_fetch_head(struct repository *r);
 const char *git_path_shallow(struct repository *r);
 
 int ends_with_path_components(const char *path, const char *components);
-int validate_headref(const char *ref);
 
 int calc_shared_perm(int mode);
 int adjust_shared_perm(const char *path);
diff --git a/setup.c b/setup.c
index f4b32f76e3..7c996659bd 100644
--- a/setup.c
+++ b/setup.c
@@ -4,6 +4,7 @@
 #include "environment.h"
 #include "exec-cmd.h"
 #include "gettext.h"
+#include "hex.h"
 #include "object-name.h"
 #include "refs.h"
 #include "repository.h"
@@ -341,6 +342,58 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 	return ret;
 }
 
+static int validate_headref(const char *path)
+{
+	struct stat st;
+	char buffer[256];
+	const char *refname;
+	struct object_id oid;
+	int fd;
+	ssize_t len;
+
+	if (lstat(path, &st) < 0)
+		return -1;
+
+	/* Make sure it is a "refs/.." symlink */
+	if (S_ISLNK(st.st_mode)) {
+		len = readlink(path, buffer, sizeof(buffer)-1);
+		if (len >= 5 && !memcmp("refs/", buffer, 5))
+			return 0;
+		return -1;
+	}
+
+	/*
+	 * Anything else, just open it and try to see if it is a symbolic ref.
+	 */
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return -1;
+	len = read_in_full(fd, buffer, sizeof(buffer)-1);
+	close(fd);
+
+	if (len < 0)
+		return -1;
+	buffer[len] = '\0';
+
+	/*
+	 * Is it a symbolic ref?
+	 */
+	if (skip_prefix(buffer, "ref:", &refname)) {
+		while (isspace(*refname))
+			refname++;
+		if (starts_with(refname, "refs/"))
+			return 0;
+	}
+
+	/*
+	 * Is this a detached HEAD?
+	 */
+	if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN)
+		return 0;
+
+	return -1;
+}
+
 /*
  * Test if it looks like we're at a git directory.
  * We want to see:
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 03/12] parse-options-cb: only abbreviate hashes when hash algo is known
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
  2024-04-23  5:07   ` [PATCH v2 01/12] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
  2024-04-23  5:07   ` [PATCH v2 02/12] path: move `validate_headref()` to its only user Patrick Steinhardt
@ 2024-04-23  5:07   ` Patrick Steinhardt
  2024-04-23  5:07   ` [PATCH v2 04/12] attr: don't recompute default attribute source Patrick Steinhardt
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:07 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2445 bytes --]

The `OPT__ABBREV()` option can be used to add an option that abbreviates
object IDs. When given a length longer than `the_hash_algo->hexsz`, then
it will instead set the length to that maximum length.

It may not always be guaranteed that we have `the_hash_algo` initialized
properly as the hash algorithm can only be set up after we have set up
`the_repository`. In that case, the hash would always be truncated to
the hex length of SHA1, which may not be what the user desires.

In practice it's not a problem as all commands that use `OPT__ABBREV()`
also have `RUN_SETUP` set and thus cannot work without a repository.
Consequently, both `the_repository` and `the_hash_algo` would be
properly set up.

Regardless of that, harden the code to not truncate the length when we
didn't set up a repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 parse-options-cb.c       |  3 ++-
 t/t0040-parse-options.sh | 17 +++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/parse-options-cb.c b/parse-options-cb.c
index bdc7fae497..d99d688d3c 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -7,6 +7,7 @@
 #include "environment.h"
 #include "gettext.h"
 #include "object-name.h"
+#include "setup.h"
 #include "string-list.h"
 #include "strvec.h"
 #include "oid-array.h"
@@ -29,7 +30,7 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset)
 				     opt->long_name);
 		if (v && v < MINIMUM_ABBREV)
 			v = MINIMUM_ABBREV;
-		else if (v > the_hash_algo->hexsz)
+		else if (startup_info->have_repository && v > the_hash_algo->hexsz)
 			v = the_hash_algo->hexsz;
 	}
 	*(int *)(opt->value) = v;
diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
index 8bb2a8b453..45a773642f 100755
--- a/t/t0040-parse-options.sh
+++ b/t/t0040-parse-options.sh
@@ -176,6 +176,23 @@ test_expect_success 'long options' '
 	test_cmp expect output
 '
 
+test_expect_success 'abbreviate to something longer than SHA1 length' '
+	cat >expect <<-EOF &&
+	boolean: 0
+	integer: 0
+	magnitude: 0
+	timestamp: 0
+	string: (not set)
+	abbrev: 100
+	verbose: -1
+	quiet: 0
+	dry run: no
+	file: (not set)
+	EOF
+	test-tool parse-options --abbrev=100 >output &&
+	test_cmp expect output
+'
+
 test_expect_success 'missing required value' '
 	cat >expect <<-\EOF &&
 	error: switch `s'\'' requires a value
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 04/12] attr: don't recompute default attribute source
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2024-04-23  5:07   ` [PATCH v2 03/12] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt
@ 2024-04-23  5:07   ` Patrick Steinhardt
  2024-04-23  5:07   ` [PATCH v2 05/12] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:07 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2887 bytes --]

The `default_attr_source()` function lazily computes the attr source
supposedly once, only. This is done via a static variable `attr_source`
that contains the resolved object ID of the attr source's tree. If the
variable is the null object ID then we try to look up the attr source,
otherwise we skip over it.

This approach is flawed though: the variable will never be set to
anything else but the null object ID in case there is no attr source.
Consequently, we re-compute the information on every call. And in the
worst case, when we silently ignore bad trees, this will cause us to try
and look up the treeish every single time.

Improve this by introducing a separate variable `has_attr_source` to
track whether we already computed the attr source and, if so, whether we
have an attr source or not.

This also allows us to convert the `ignore_bad_attr_tree` to not be
static anymore as the code will only be executed once anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 attr.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/attr.c b/attr.c
index 679e42258c..9d911aeb31 100644
--- a/attr.c
+++ b/attr.c
@@ -1206,15 +1206,16 @@ static void collect_some_attrs(struct index_state *istate,
 }
 
 static const char *default_attr_source_tree_object_name;
-static int ignore_bad_attr_tree;
 
 void set_git_attr_source(const char *tree_object_name)
 {
 	default_attr_source_tree_object_name = xstrdup(tree_object_name);
 }
 
-static void compute_default_attr_source(struct object_id *attr_source)
+static int compute_default_attr_source(struct object_id *attr_source)
 {
+	int ignore_bad_attr_tree = 0;
+
 	if (!default_attr_source_tree_object_name)
 		default_attr_source_tree_object_name = getenv(GIT_ATTR_SOURCE_ENVIRONMENT);
 
@@ -1230,22 +1231,28 @@ static void compute_default_attr_source(struct object_id *attr_source)
 		ignore_bad_attr_tree = 1;
 	}
 
-	if (!default_attr_source_tree_object_name || !is_null_oid(attr_source))
-		return;
+	if (!default_attr_source_tree_object_name)
+		return 0;
 
 	if (repo_get_oid_treeish(the_repository,
 				 default_attr_source_tree_object_name,
-				 attr_source) && !ignore_bad_attr_tree)
-		die(_("bad --attr-source or GIT_ATTR_SOURCE"));
+				 attr_source)) {
+		if (!ignore_bad_attr_tree)
+			die(_("bad --attr-source or GIT_ATTR_SOURCE"));
+		return 0;
+	}
+
+	return 1;
 }
 
 static struct object_id *default_attr_source(void)
 {
 	static struct object_id attr_source;
+	static int has_attr_source = -1;
 
-	if (is_null_oid(&attr_source))
-		compute_default_attr_source(&attr_source);
-	if (is_null_oid(&attr_source))
+	if (has_attr_source < 0)
+		has_attr_source = compute_default_attr_source(&attr_source);
+	if (!has_attr_source)
 		return NULL;
 	return &attr_source;
 }
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 05/12] attr: fix BUG() when parsing attrs outside of repo
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2024-04-23  5:07   ` [PATCH v2 04/12] attr: don't recompute default attribute source Patrick Steinhardt
@ 2024-04-23  5:07   ` Patrick Steinhardt
  2024-04-23  5:07   ` [PATCH v2 06/12] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:07 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2067 bytes --]

If either the `--attr-source` option or the `GIT_ATTR_SOURCE` envvar are
set, then `compute_default_attr_source()` will try to look up the value
as a treeish. It is possible to hit that function while outside of a Git
repository though, for example when using `git grep --no-index`. In that
case, Git will hit a bug because we try to look up the main ref store
outside of a repository.

Handle the case gracefully and detect when we try to look up an attr
source without a repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 attr.c                |  6 ++++++
 t/t0003-attributes.sh | 15 +++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/attr.c b/attr.c
index 9d911aeb31..4bd09bcb4b 100644
--- a/attr.c
+++ b/attr.c
@@ -1234,6 +1234,12 @@ static int compute_default_attr_source(struct object_id *attr_source)
 	if (!default_attr_source_tree_object_name)
 		return 0;
 
+	if (!startup_info->have_repository) {
+		if (!ignore_bad_attr_tree)
+			die(_("cannot use --attr-source or GIT_ATTR_SOURCE without repo"));
+		return 0;
+	}
+
 	if (repo_get_oid_treeish(the_repository,
 				 default_attr_source_tree_object_name,
 				 attr_source)) {
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index 774b52c298..3efdec54dd 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -428,6 +428,21 @@ test_expect_success 'precedence of --attr-source, GIT_ATTR_SOURCE, then attr.tre
 	)
 '
 
+test_expect_success 'diff without repository with attr source' '
+	mkdir -p "$TRASH_DIRECTORY/outside/nongit" &&
+	(
+		cd "$TRASH_DIRECTORY/outside/nongit" &&
+		GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/outside" &&
+		export GIT_CEILING_DIRECTORIES &&
+		touch file &&
+		cat >expect <<-EOF &&
+		fatal: cannot use --attr-source or GIT_ATTR_SOURCE without repo
+		EOF
+		test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --no-index foo file 2>err &&
+		test_cmp expect err
+	)
+'
+
 test_expect_success 'bare repository: with --source' '
 	(
 		cd bare.git &&
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 06/12] remote-curl: fix parsing of detached SHA256 heads
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2024-04-23  5:07   ` [PATCH v2 05/12] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt
@ 2024-04-23  5:07   ` Patrick Steinhardt
  2024-04-23  5:07   ` [PATCH v2 07/12] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:07 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 3673 bytes --]

The dumb HTTP transport tries to read the remote HEAD reference by
downloading the "HEAD" file and then parsing it via `http_fetch_ref()`.
This function will either parse the file as an object ID in case it is
exactly `the_hash_algo->hexsz` long, or otherwise it will check whether
the reference starts with "ref :" and parse it as a symbolic ref.

This is broken when parsing detached HEADs of a remote SHA256 repository
because we never update `the_hash_algo` to the discovered remote object
hash. Consequently, `the_hash_algo` will always be the fallback SHA1
hash algorithm, which will cause us to fail parsing HEAD altogteher when
it contains a SHA256 object ID.

Fix this issue by setting up `the_hash_algo` via `repo_set_hash_algo()`.
While at it, let's make the expected SHA1 fallback explicit in our code,
which also addresses an upcoming issue where we are going to remove the
SHA1 fallback for `the_hash_algo`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 remote-curl.c              | 19 ++++++++++++++++++-
 t/t5550-http-fetch-dumb.sh | 15 +++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 0b6d7815fd..004b707fdf 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -266,12 +266,23 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	return list;
 }
 
+/*
+ * Try to detect the hash algorithm used by the remote repository when using
+ * the dumb HTTP transport. As dumb transports cannot tell us the object hash
+ * directly have to derive it from the advertised ref lengths.
+ */
 static const struct git_hash_algo *detect_hash_algo(struct discovery *heads)
 {
 	const char *p = memchr(heads->buf, '\t', heads->len);
 	int algo;
+
+	/*
+	 * In case the remote has no refs we have no way to reliably determine
+	 * the object hash used by that repository. In that case we simply fall
+	 * back to SHA1, which may or may not be correct.
+	 */
 	if (!p)
-		return the_hash_algo;
+		return &hash_algos[GIT_HASH_SHA1];
 
 	algo = hash_algo_by_length((p - heads->buf) / 2);
 	if (algo == GIT_HASH_UNKNOWN)
@@ -295,6 +306,12 @@ static struct ref *parse_info_refs(struct discovery *heads)
 		    "is this a git repository?",
 		    transport_anonymize_url(url.buf));
 
+	/*
+	 * Set the repository's hash algo to whatever we have just detected.
+	 * This ensures that we can correctly parse the remote references.
+	 */
+	repo_set_hash_algo(the_repository, hash_algo_by_ptr(options.hash_algo));
+
 	data = heads->buf;
 	start = NULL;
 	mid = data;
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 4c3b32785d..5f16cbc58d 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -55,6 +55,21 @@ test_expect_success 'list refs from outside any repository' '
 	test_cmp expect actual
 '
 
+
+test_expect_success 'list detached HEAD from outside any repository' '
+	git clone --mirror "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
+		"$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" \
+		update-ref --no-deref HEAD refs/heads/main &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" update-server-info &&
+	cat >expect <<-EOF &&
+	$(git rev-parse main)	HEAD
+	$(git rev-parse main)	refs/heads/main
+	EOF
+	nongit git ls-remote "$HTTPD_URL/dumb/repo-detached.git" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'create password-protected repository' '
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH/auth/dumb/" &&
 	cp -Rf "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 07/12] builtin/rev-parse: allow shortening to more than 40 hex characters
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2024-04-23  5:07   ` [PATCH v2 06/12] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt
@ 2024-04-23  5:07   ` Patrick Steinhardt
  2024-04-23  5:08   ` [PATCH v2 08/12] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:07 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1997 bytes --]

The `--short=` option for git-rev-parse(1) allows the user to specify
to how many characters object IDs should be shortened to. The option is
broken though for SHA256 repositories because we set the maximum allowed
hash size to `the_hash_algo->hexsz` before we have even set up the repo.
Consequently, `the_hash_algo` will always be SHA1 and thus we truncate
every hash after at most 40 characters.

Fix this by accessing `the_hash_algo` only after we have set up the
repo.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/rev-parse.c  | 5 ++---
 t/t1500-rev-parse.sh | 6 ++++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 624182e507..d7b87c605e 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -687,7 +687,6 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 	const char *name = NULL;
 	struct object_context unused;
 	struct strbuf buf = STRBUF_INIT;
-	const int hexsz = the_hash_algo->hexsz;
 	int seen_end_of_options = 0;
 	enum format_type format = FORMAT_DEFAULT;
 
@@ -863,8 +862,8 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				abbrev = strtoul(arg, NULL, 10);
 				if (abbrev < MINIMUM_ABBREV)
 					abbrev = MINIMUM_ABBREV;
-				else if (hexsz <= abbrev)
-					abbrev = hexsz;
+				else if ((int)the_hash_algo->hexsz <= abbrev)
+					abbrev = the_hash_algo->hexsz;
 				continue;
 			}
 			if (!strcmp(arg, "--sq")) {
diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
index a669e592f1..30c31918fd 100755
--- a/t/t1500-rev-parse.sh
+++ b/t/t1500-rev-parse.sh
@@ -304,4 +304,10 @@ test_expect_success 'rev-parse --bisect includes bad, excludes good' '
 	test_cmp expect actual
 '
 
+test_expect_success '--short= truncates to the actual hash length' '
+	git rev-parse HEAD >expect &&
+	git rev-parse --short=100 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 08/12] builtin/blame: don't access potentially unitialized `the_hash_algo`
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2024-04-23  5:07   ` [PATCH v2 07/12] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt
@ 2024-04-23  5:08   ` Patrick Steinhardt
  2024-04-23  5:08   ` [PATCH v2 09/12] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2003 bytes --]

We access `the_hash_algo` in git-blame(1) before we have executed
`parse_options_start()`, which may not be properly set up in case we
have no repository. This is fine for most of the part because all the
call paths that lead to it (git-blame(1), git-annotate(1) as well as
git-pick-axe(1)) specify `RUN_SETUP` and thus require a repository.

There is one exception though, namely when passing `-h` to print the
help. Here we will access `the_hash_algo` even if there is no repo.
This works fine right now because `the_hash_algo` gets sets up to point
to the SHA1 algorithm via `initialize_repository()`. But we're about to
stop doing this, and thus the code would lead to a `NULL` pointer
exception.

Prepare the code for this and only access `the_hash_algo` after we are
sure that there is a proper repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/blame.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 9aa74680a3..e325825936 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -915,7 +915,6 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	struct range_set ranges;
 	unsigned int range_i;
 	long anchor;
-	const int hexsz = the_hash_algo->hexsz;
 	long num_lines = 0;
 	const char *str_usage = cmd_is_annotate ? annotate_usage : blame_usage;
 	const char **opt_usage = cmd_is_annotate ? annotate_opt_usage : blame_opt_usage;
@@ -973,11 +972,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	} else if (show_progress < 0)
 		show_progress = isatty(2);
 
-	if (0 < abbrev && abbrev < hexsz)
+	if (0 < abbrev && abbrev < (int)the_hash_algo->hexsz)
 		/* one more abbrev length is needed for the boundary commit */
 		abbrev++;
 	else if (!abbrev)
-		abbrev = hexsz;
+		abbrev = the_hash_algo->hexsz;
 
 	if (revs_file && read_ancestry(revs_file))
 		die_errno("reading graft file '%s' failed", revs_file);
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 09/12] builtin/bundle: abort "verify" early when there is no repository
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2024-04-23  5:08   ` [PATCH v2 08/12] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt
@ 2024-04-23  5:08   ` Patrick Steinhardt
  2024-04-23  5:08   ` [PATCH v2 10/12] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1528 bytes --]

Verifying a bundle requires us to have a repository. This is encoded in
`verify_bundle()`, which will return an error if there is no repository.
We call `open_bundle()` before we call `verify_bundle()` though, which
already performs some verifications even though we may ultimately abort
due to a missing repository.

This is problematic because `open_bundle()` already reads the bundle
header and verifies that it contains a properly formatted hash. When
there is no repository we have no clue what hash function to expect
though, so we always end up assuming SHA1 here, which may or may not be
correct. Furthermore, we are about to stop initializing `the_hash_algo`
when there is no repository, which will lead to segfaults.

Check early on whether we have a repository to fix this issue.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/bundle.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/bundle.c b/builtin/bundle.c
index 3ad11dc5d0..d5d41a8f67 100644
--- a/builtin/bundle.c
+++ b/builtin/bundle.c
@@ -140,6 +140,11 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) {
 			builtin_bundle_verify_usage, options, &bundle_file);
 	/* bundle internals use argv[1] as further parameters */
 
+	if (!startup_info->have_repository) {
+		ret = error(_("need a repository to verify a bundle"));
+		goto cleanup;
+	}
+
 	if ((bundle_fd = open_bundle(bundle_file, &header, &name)) < 0) {
 		ret = 1;
 		goto cleanup;
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 10/12] builtin/diff: explicitly set hash algo when there is no repo
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2024-04-23  5:08   ` [PATCH v2 09/12] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt
@ 2024-04-23  5:08   ` Patrick Steinhardt
  2024-04-23  5:08   ` [PATCH v2 11/12] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1849 bytes --]

The git-diff(1) command can be used outside repositories to diff two
files with each other. But even if there is no repository we will end up
hashing the files that we are diffing so that we can print the "index"
line:

    ```
    diff --git a/a b/b
    index 7898192..6178079 100644
    --- a/a
    +++ b/b
    @@ -1 +1 @@
    -a
    +b
    ```

We implicitly use SHA1 to calculate the hash here, which is because
`the_repository` gets initialized with SHA1 during the startup routine.
We are about to stop doing this though such that `the_repository` only
ever has a hash function when it was properly initialized via a repo's
configuration.

To give full control to our users, we would ideally add a new switch to
git-diff(1) that allows them to specify the hash function when executed
outside of a repository. But for now, we only convert the code to make
this explicit such that we can stop setting the default hash algorithm
for `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/diff.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/diff.c b/builtin/diff.c
index 6e196e0c7d..58ec7e5da2 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -465,6 +465,15 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 			no_index = DIFF_NO_INDEX_IMPLICIT;
 	}
 
+	/*
+	 * When operating outside of a Git repository we need to have a hash
+	 * algorithm at hand so that we can generate the blob hashes. We
+	 * default to SHA1 here, but may eventually want to change this to be
+	 * configurable via a command line option.
+	 */
+	if (nongit)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	init_diff_ui_defaults();
 	git_config(git_diff_ui_config, NULL);
 	prefix = precompose_argv_prefix(argc, argv, prefix);
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 11/12] builtin/shortlog: don't set up revisions without repo
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2024-04-23  5:08   ` [PATCH v2 10/12] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt
@ 2024-04-23  5:08   ` Patrick Steinhardt
  2024-04-23  5:08   ` [PATCH v2 12/12] repository: stop setting SHA1 as the default object hash Patrick Steinhardt
  2024-04-27 22:09   ` [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` Junio C Hamano
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1300 bytes --]

It is possible to run git-shortlog(1) outside of a repository by passing
it output from git-log(1) via standard input. Obviously, as there is no
repository in that context, it is thus unsupported to pass any revisions
as arguments.

Regardless of that we still end up calling `setup_revisions()`. While
that works alright, it is somewhat strange. Furthermore, this is about
to cause problems when we unset the default object hash.

Refactor the code to only call `setup_revisions()` when we have a
repository. This is safe to do as we already verify that there are no
arguments when running outside of a repository anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/shortlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 3c7cd2d6ef..d4daf31e22 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -435,7 +435,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 		usage_with_options(shortlog_usage, options);
 	}
 
-	if (setup_revisions(argc, argv, &rev, NULL) != 1) {
+	if (!nongit && setup_revisions(argc, argv, &rev, NULL) != 1) {
 		error(_("unrecognized argument: %s"), argv[1]);
 		usage_with_options(shortlog_usage, options);
 	}
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 12/12] repository: stop setting SHA1 as the default object hash
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
                     ` (10 preceding siblings ...)
  2024-04-23  5:08   ` [PATCH v2 11/12] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt
@ 2024-04-23  5:08   ` Patrick Steinhardt
  2024-04-27 22:09   ` [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` Junio C Hamano
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-23  5:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1696 bytes --]

During the startup of Git, we call `initialize_the_repository()` to set
up `the_repository` as well as `the_index`. Part of this setup is also
to set the default object hash of the repository to SHA1. This has the
effect that `the_hash_algo` is getting initialized to SHA1, as well.
This default hash algorithm eventually gets overridden by most Git
commands via `setup_git_directory()`, which also detects the actual hash
algorithm used by the repository.

There are some commands though that don't access a repository at all, or
at a later point only, and thus retain the default hash function for
some amount of time. As some of the the preceding commits demonstrate,
this can lead to subtle issues when we access `the_hash_algo` when no
repository has been set up.

Address this issue by dropping the set up of the default hash algorithm
completely. The effect of this is that `the_hash_algo` will map to a
`NULL` pointer and thus cause Git to crash when something tries to
access the hash algorithm without it being properly initialized. It thus
forces all Git commands to explicitly set up the hash algorithm in case
there is no repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 repository.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/repository.c b/repository.c
index e15b416944..b65b1a8c8b 100644
--- a/repository.c
+++ b/repository.c
@@ -35,8 +35,6 @@ void initialize_the_repository(void)
 	the_repo.parsed_objects = parsed_object_pool_new();
 
 	index_state_init(&the_index, the_repository);
-
-	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
 }
 
 static void expand_base_dir(char **out, const char *in,
-- 
2.45.0-rc0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 01/11] path: harden validation of HEAD with non-standard hashes
  2024-04-23  4:50     ` Patrick Steinhardt
@ 2024-04-23 16:54       ` Junio C Hamano
  0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2024-04-23 16:54 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

>> All makes sense, and given the above, I strongly suspect that we
>> would want to make the validate_headref() function a file-scope
>> static in setup.c to make sure it is only called/callable from the
>> repository discovery codepath.  Especially that if somebody calls
>> this function again after we find out that the repository uses
>> SHA-256, we would want to let the caller know that the detached HEAD
>> records SHA-1 and we are in an inconsistent state.
>
> Fair enough, we can definitely do so. It only has a single callsite
> anyway.

I was wondering if I was missing a future plans to call it from
other code paths that are triggered after the set-up was done.  If
that is not the case, we should do so.  It matters more for the
future callers than the current ones.  They _all_ have to be aware
of the deliberate looseness of the check here.

Thanks.  

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo`
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
                     ` (11 preceding siblings ...)
  2024-04-23  5:08   ` [PATCH v2 12/12] repository: stop setting SHA1 as the default object hash Patrick Steinhardt
@ 2024-04-27 22:09   ` Junio C Hamano
  2024-04-29  6:05     ` Patrick Steinhardt
  12 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2024-04-27 22:09 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, brian m. carlson, Justin Tobler, Josh Steadmon

Patrick Steinhardt <ps@pks.im> writes:

[cc: added Josh as the commit-graph fuzzer was his creation].

> this is the second version of my patch series that causes us to stop
> relying on the SHA1 default hash.

With this topic merged, 'seen' fails "fuzz smoke test"; I think this

    https://github.com/git/git/actions/runs/8807729398/job/24175445340

is the first merge of this topic into 'seen' where "fuzz smoke test"
started failing.

With the merge of the topic from 'seen' reverted tentatively,

    https://github.com/git/git/actions/runs/8862811497/job/24336185541

the same test seems happy.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo`
  2024-04-27 22:09   ` [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` Junio C Hamano
@ 2024-04-29  6:05     ` Patrick Steinhardt
  0 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Justin Tobler, Josh Steadmon

[-- Attachment #1: Type: text/plain, Size: 1658 bytes --]

On Sat, Apr 27, 2024 at 03:09:43PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> [cc: added Josh as the commit-graph fuzzer was his creation].
> 
> > this is the second version of my patch series that causes us to stop
> > relying on the SHA1 default hash.
> 
> With this topic merged, 'seen' fails "fuzz smoke test"; I think this
> 
>     https://github.com/git/git/actions/runs/8807729398/job/24175445340
> 
> is the first merge of this topic into 'seen' where "fuzz smoke test"
> started failing.
> 
> With the merge of the topic from 'seen' reverted tentatively,
> 
>     https://github.com/git/git/actions/runs/8862811497/job/24336185541
> 
> the same test seems happy.

Indeed, thanks for the heads up! Another test gap that we have in the
GitLab CI setup. I'll add a separate patch series to plug this gap and
add the job to GitLab CI, too.

The fix for the failure is easy:

    diff --git a/oss-fuzz/fuzz-commit-graph.c b/oss-fuzz/fuzz-commit-graph.c
    index 2992079dd9..94ecbb9242 100644
    --- a/oss-fuzz/fuzz-commit-graph.c
    +++ b/oss-fuzz/fuzz-commit-graph.c
    @@ -18,6 +18,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
         * touching the disk to keep the individual fuzz-test cases as fast as
         * possible.
         */
    +	repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
        the_repository->settings.commit_graph_generation_version = 2;
        the_repository->settings.commit_graph_read_changed_paths = 1;
        g = parse_commit_graph(&the_repository->settings, (void *)data, size);

I'll send a v3.

Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v3 00/13] Stop relying on SHA1 fallback for `the_hash_algo`
  2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
                   ` (12 preceding siblings ...)
  2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
@ 2024-04-29  6:34 ` Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
                     ` (12 more replies)
  13 siblings, 13 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 3161 bytes --]

Hi,

this is the third version of my patch series that stops relying on the
SHA1 fallback configured for `the_hash_algo`.

There's only a single change compared to v2, which is a new patch that
fixes a segfault in the commit-graph fuzzer.

Thanks!

Patrick

Patrick Steinhardt (13):
  path: harden validation of HEAD with non-standard hashes
  path: move `validate_headref()` to its only user
  parse-options-cb: only abbreviate hashes when hash algo is known
  attr: don't recompute default attribute source
  attr: fix BUG() when parsing attrs outside of repo
  remote-curl: fix parsing of detached SHA256 heads
  builtin/rev-parse: allow shortening to more than 40 hex characters
  builtin/blame: don't access potentially unitialized `the_hash_algo`
  builtin/bundle: abort "verify" early when there is no repository
  builtin/diff: explicitly set hash algo when there is no repo
  builtin/shortlog: don't set up revisions without repo
  oss-fuzz/commit-graph: set up hash algorithm
  repository: stop setting SHA1 as the default object hash

 attr.c                       | 31 +++++++++++++++------
 builtin/blame.c              |  5 ++--
 builtin/bundle.c             |  5 ++++
 builtin/diff.c               |  9 ++++++
 builtin/rev-parse.c          |  5 ++--
 builtin/shortlog.c           |  2 +-
 oss-fuzz/fuzz-commit-graph.c |  1 +
 parse-options-cb.c           |  3 +-
 path.c                       | 53 ------------------------------------
 path.h                       |  1 -
 remote-curl.c                | 19 ++++++++++++-
 repository.c                 |  2 --
 setup.c                      | 53 ++++++++++++++++++++++++++++++++++++
 t/t0003-attributes.sh        | 15 ++++++++++
 t/t0040-parse-options.sh     | 17 ++++++++++++
 t/t1500-rev-parse.sh         |  6 ++++
 t/t5550-http-fetch-dumb.sh   | 15 ++++++++++
 17 files changed, 168 insertions(+), 74 deletions(-)

Range-diff against v2:
 1:  a986b464d3 =  1:  5134f35cda path: harden validation of HEAD with non-standard hashes
 2:  a347c7e6ca =  2:  589b6a99ef path: move `validate_headref()` to its only user
 3:  c0a15b2fa6 =  3:  9a63c445d2 parse-options-cb: only abbreviate hashes when hash algo is known
 4:  1b5f904eed =  4:  929bacbfce attr: don't recompute default attribute source
 5:  26909daca4 =  5:  8f20aec1ee attr: fix BUG() when parsing attrs outside of repo
 6:  0b99184f50 =  6:  53439067a1 remote-curl: fix parsing of detached SHA256 heads
 7:  ccfda3c2d2 =  7:  1f74960760 builtin/rev-parse: allow shortening to more than 40 hex characters
 8:  1813e7eb5c =  8:  2d985abca1 builtin/blame: don't access potentially unitialized `the_hash_algo`
 9:  31182a1fc6 =  9:  f3b23d28aa builtin/bundle: abort "verify" early when there is no repository
10:  78e19d0a1b = 10:  7577b6b96c builtin/diff: explicitly set hash algo when there is no repo
11:  51bcddbc31 = 11:  509c79d1d3 builtin/shortlog: don't set up revisions without repo
 -:  ---------- > 12:  660f976129 oss-fuzz/commit-graph: set up hash algorithm
12:  e8126371e1 = 13:  95909c2da5 repository: stop setting SHA1 as the default object hash
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v3 01/13] path: harden validation of HEAD with non-standard hashes
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
@ 2024-04-29  6:34   ` Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 02/13] path: move `validate_headref()` to its only user Patrick Steinhardt
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2279 bytes --]

The `validate_headref()` function takes a path to a supposed "HEAD" file
and checks whether its format is something that we understand. It is
used as part of our repository discovery to check whether a specific
directory is a Git directory or not.

Part of the validation is a check for a detached HEAD that contains a
plain object ID. To do this validation we use `get_oid_hex()`, which
relies on `the_hash_algo`. At this point in time the hash algo cannot
yet be initialized though because we didn't yet read the Git config.
Consequently, it will always be the SHA1 hash algorithm.

In practice this works alright because `get_oid_hex()` only ends up
checking whether the prefix of the buffer is a valid object ID. And
because SHA1 is shorter than SHA256, the function will successfully
parse SHA256 object IDs, as well.

It is somewhat fragile though and not really the intent to only check
for SHA1. With this in mind, harden the code to use `get_oid_hex_any()`
to check whether the "HEAD" file parses as any known hash.

One might be hard pressed to tighten the check even further and fully
validate the file contents, not only the prefix. In practice though that
wouldn't make a lot of sense as it could be that the repository uses a
hash function that produces longer hashes than SHA256, but which the
current version of Git doesn't understand yet. We'd still want to detect
the repository as proper Git repository in that case, and we will fail
eventually with a proper error message that the hash isn't understood
when trying to set up the repository format.

It follows that we could just leave the current code intact, as in
practice the code change doesn't have any user visible impact. But it
also prepares us for `the_hash_algo` being unset when there is no
repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 path.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/path.c b/path.c
index 67229edb9c..cc02165530 100644
--- a/path.c
+++ b/path.c
@@ -693,7 +693,7 @@ int validate_headref(const char *path)
 	/*
 	 * Is this a detached HEAD?
 	 */
-	if (!get_oid_hex(buffer, &oid))
+	if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN)
 		return 0;
 
 	return -1;
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 02/13] path: move `validate_headref()` to its only user
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
@ 2024-04-29  6:34   ` Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 03/13] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 3965 bytes --]

While `validate_headref()` is only called from `is_git_directory()` in
"setup.c", it is currently implemented in "path.c". Move it over such
that it becomes clear that it is only really used during setup in order
to discover repositories.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 path.c  | 53 -----------------------------------------------------
 path.h  |  1 -
 setup.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 54 deletions(-)

diff --git a/path.c b/path.c
index cc02165530..bd6e25245d 100644
--- a/path.c
+++ b/path.c
@@ -5,7 +5,6 @@
 #include "abspath.h"
 #include "environment.h"
 #include "gettext.h"
-#include "hex.h"
 #include "repository.h"
 #include "strbuf.h"
 #include "string-list.h"
@@ -647,58 +646,6 @@ void strbuf_git_common_path(struct strbuf *sb,
 	va_end(args);
 }
 
-int validate_headref(const char *path)
-{
-	struct stat st;
-	char buffer[256];
-	const char *refname;
-	struct object_id oid;
-	int fd;
-	ssize_t len;
-
-	if (lstat(path, &st) < 0)
-		return -1;
-
-	/* Make sure it is a "refs/.." symlink */
-	if (S_ISLNK(st.st_mode)) {
-		len = readlink(path, buffer, sizeof(buffer)-1);
-		if (len >= 5 && !memcmp("refs/", buffer, 5))
-			return 0;
-		return -1;
-	}
-
-	/*
-	 * Anything else, just open it and try to see if it is a symbolic ref.
-	 */
-	fd = open(path, O_RDONLY);
-	if (fd < 0)
-		return -1;
-	len = read_in_full(fd, buffer, sizeof(buffer)-1);
-	close(fd);
-
-	if (len < 0)
-		return -1;
-	buffer[len] = '\0';
-
-	/*
-	 * Is it a symbolic ref?
-	 */
-	if (skip_prefix(buffer, "ref:", &refname)) {
-		while (isspace(*refname))
-			refname++;
-		if (starts_with(refname, "refs/"))
-			return 0;
-	}
-
-	/*
-	 * Is this a detached HEAD?
-	 */
-	if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN)
-		return 0;
-
-	return -1;
-}
-
 static struct passwd *getpw_str(const char *username, size_t len)
 {
 	struct passwd *pw;
diff --git a/path.h b/path.h
index ea96487b29..c3bc8617bd 100644
--- a/path.h
+++ b/path.h
@@ -173,7 +173,6 @@ const char *git_path_fetch_head(struct repository *r);
 const char *git_path_shallow(struct repository *r);
 
 int ends_with_path_components(const char *path, const char *components);
-int validate_headref(const char *ref);
 
 int calc_shared_perm(int mode);
 int adjust_shared_perm(const char *path);
diff --git a/setup.c b/setup.c
index f4b32f76e3..7c996659bd 100644
--- a/setup.c
+++ b/setup.c
@@ -4,6 +4,7 @@
 #include "environment.h"
 #include "exec-cmd.h"
 #include "gettext.h"
+#include "hex.h"
 #include "object-name.h"
 #include "refs.h"
 #include "repository.h"
@@ -341,6 +342,58 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
 	return ret;
 }
 
+static int validate_headref(const char *path)
+{
+	struct stat st;
+	char buffer[256];
+	const char *refname;
+	struct object_id oid;
+	int fd;
+	ssize_t len;
+
+	if (lstat(path, &st) < 0)
+		return -1;
+
+	/* Make sure it is a "refs/.." symlink */
+	if (S_ISLNK(st.st_mode)) {
+		len = readlink(path, buffer, sizeof(buffer)-1);
+		if (len >= 5 && !memcmp("refs/", buffer, 5))
+			return 0;
+		return -1;
+	}
+
+	/*
+	 * Anything else, just open it and try to see if it is a symbolic ref.
+	 */
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return -1;
+	len = read_in_full(fd, buffer, sizeof(buffer)-1);
+	close(fd);
+
+	if (len < 0)
+		return -1;
+	buffer[len] = '\0';
+
+	/*
+	 * Is it a symbolic ref?
+	 */
+	if (skip_prefix(buffer, "ref:", &refname)) {
+		while (isspace(*refname))
+			refname++;
+		if (starts_with(refname, "refs/"))
+			return 0;
+	}
+
+	/*
+	 * Is this a detached HEAD?
+	 */
+	if (get_oid_hex_any(buffer, &oid) != GIT_HASH_UNKNOWN)
+		return 0;
+
+	return -1;
+}
+
 /*
  * Test if it looks like we're at a git directory.
  * We want to see:
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 03/13] parse-options-cb: only abbreviate hashes when hash algo is known
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 02/13] path: move `validate_headref()` to its only user Patrick Steinhardt
@ 2024-04-29  6:34   ` Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 04/13] attr: don't recompute default attribute source Patrick Steinhardt
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2445 bytes --]

The `OPT__ABBREV()` option can be used to add an option that abbreviates
object IDs. When given a length longer than `the_hash_algo->hexsz`, then
it will instead set the length to that maximum length.

It may not always be guaranteed that we have `the_hash_algo` initialized
properly as the hash algorithm can only be set up after we have set up
`the_repository`. In that case, the hash would always be truncated to
the hex length of SHA1, which may not be what the user desires.

In practice it's not a problem as all commands that use `OPT__ABBREV()`
also have `RUN_SETUP` set and thus cannot work without a repository.
Consequently, both `the_repository` and `the_hash_algo` would be
properly set up.

Regardless of that, harden the code to not truncate the length when we
didn't set up a repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 parse-options-cb.c       |  3 ++-
 t/t0040-parse-options.sh | 17 +++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/parse-options-cb.c b/parse-options-cb.c
index bdc7fae497..d99d688d3c 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -7,6 +7,7 @@
 #include "environment.h"
 #include "gettext.h"
 #include "object-name.h"
+#include "setup.h"
 #include "string-list.h"
 #include "strvec.h"
 #include "oid-array.h"
@@ -29,7 +30,7 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset)
 				     opt->long_name);
 		if (v && v < MINIMUM_ABBREV)
 			v = MINIMUM_ABBREV;
-		else if (v > the_hash_algo->hexsz)
+		else if (startup_info->have_repository && v > the_hash_algo->hexsz)
 			v = the_hash_algo->hexsz;
 	}
 	*(int *)(opt->value) = v;
diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
index 8bb2a8b453..45a773642f 100755
--- a/t/t0040-parse-options.sh
+++ b/t/t0040-parse-options.sh
@@ -176,6 +176,23 @@ test_expect_success 'long options' '
 	test_cmp expect output
 '
 
+test_expect_success 'abbreviate to something longer than SHA1 length' '
+	cat >expect <<-EOF &&
+	boolean: 0
+	integer: 0
+	magnitude: 0
+	timestamp: 0
+	string: (not set)
+	abbrev: 100
+	verbose: -1
+	quiet: 0
+	dry run: no
+	file: (not set)
+	EOF
+	test-tool parse-options --abbrev=100 >output &&
+	test_cmp expect output
+'
+
 test_expect_success 'missing required value' '
 	cat >expect <<-\EOF &&
 	error: switch `s'\'' requires a value
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 04/13] attr: don't recompute default attribute source
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2024-04-29  6:34   ` [PATCH v3 03/13] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt
@ 2024-04-29  6:34   ` Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 05/13] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2887 bytes --]

The `default_attr_source()` function lazily computes the attr source
supposedly once, only. This is done via a static variable `attr_source`
that contains the resolved object ID of the attr source's tree. If the
variable is the null object ID then we try to look up the attr source,
otherwise we skip over it.

This approach is flawed though: the variable will never be set to
anything else but the null object ID in case there is no attr source.
Consequently, we re-compute the information on every call. And in the
worst case, when we silently ignore bad trees, this will cause us to try
and look up the treeish every single time.

Improve this by introducing a separate variable `has_attr_source` to
track whether we already computed the attr source and, if so, whether we
have an attr source or not.

This also allows us to convert the `ignore_bad_attr_tree` to not be
static anymore as the code will only be executed once anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 attr.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/attr.c b/attr.c
index 679e42258c..9d911aeb31 100644
--- a/attr.c
+++ b/attr.c
@@ -1206,15 +1206,16 @@ static void collect_some_attrs(struct index_state *istate,
 }
 
 static const char *default_attr_source_tree_object_name;
-static int ignore_bad_attr_tree;
 
 void set_git_attr_source(const char *tree_object_name)
 {
 	default_attr_source_tree_object_name = xstrdup(tree_object_name);
 }
 
-static void compute_default_attr_source(struct object_id *attr_source)
+static int compute_default_attr_source(struct object_id *attr_source)
 {
+	int ignore_bad_attr_tree = 0;
+
 	if (!default_attr_source_tree_object_name)
 		default_attr_source_tree_object_name = getenv(GIT_ATTR_SOURCE_ENVIRONMENT);
 
@@ -1230,22 +1231,28 @@ static void compute_default_attr_source(struct object_id *attr_source)
 		ignore_bad_attr_tree = 1;
 	}
 
-	if (!default_attr_source_tree_object_name || !is_null_oid(attr_source))
-		return;
+	if (!default_attr_source_tree_object_name)
+		return 0;
 
 	if (repo_get_oid_treeish(the_repository,
 				 default_attr_source_tree_object_name,
-				 attr_source) && !ignore_bad_attr_tree)
-		die(_("bad --attr-source or GIT_ATTR_SOURCE"));
+				 attr_source)) {
+		if (!ignore_bad_attr_tree)
+			die(_("bad --attr-source or GIT_ATTR_SOURCE"));
+		return 0;
+	}
+
+	return 1;
 }
 
 static struct object_id *default_attr_source(void)
 {
 	static struct object_id attr_source;
+	static int has_attr_source = -1;
 
-	if (is_null_oid(&attr_source))
-		compute_default_attr_source(&attr_source);
-	if (is_null_oid(&attr_source))
+	if (has_attr_source < 0)
+		has_attr_source = compute_default_attr_source(&attr_source);
+	if (!has_attr_source)
 		return NULL;
 	return &attr_source;
 }
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 05/13] attr: fix BUG() when parsing attrs outside of repo
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2024-04-29  6:34   ` [PATCH v3 04/13] attr: don't recompute default attribute source Patrick Steinhardt
@ 2024-04-29  6:34   ` Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 06/13] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2067 bytes --]

If either the `--attr-source` option or the `GIT_ATTR_SOURCE` envvar are
set, then `compute_default_attr_source()` will try to look up the value
as a treeish. It is possible to hit that function while outside of a Git
repository though, for example when using `git grep --no-index`. In that
case, Git will hit a bug because we try to look up the main ref store
outside of a repository.

Handle the case gracefully and detect when we try to look up an attr
source without a repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 attr.c                |  6 ++++++
 t/t0003-attributes.sh | 15 +++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/attr.c b/attr.c
index 9d911aeb31..4bd09bcb4b 100644
--- a/attr.c
+++ b/attr.c
@@ -1234,6 +1234,12 @@ static int compute_default_attr_source(struct object_id *attr_source)
 	if (!default_attr_source_tree_object_name)
 		return 0;
 
+	if (!startup_info->have_repository) {
+		if (!ignore_bad_attr_tree)
+			die(_("cannot use --attr-source or GIT_ATTR_SOURCE without repo"));
+		return 0;
+	}
+
 	if (repo_get_oid_treeish(the_repository,
 				 default_attr_source_tree_object_name,
 				 attr_source)) {
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index 774b52c298..3efdec54dd 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -428,6 +428,21 @@ test_expect_success 'precedence of --attr-source, GIT_ATTR_SOURCE, then attr.tre
 	)
 '
 
+test_expect_success 'diff without repository with attr source' '
+	mkdir -p "$TRASH_DIRECTORY/outside/nongit" &&
+	(
+		cd "$TRASH_DIRECTORY/outside/nongit" &&
+		GIT_CEILING_DIRECTORIES="$TRASH_DIRECTORY/outside" &&
+		export GIT_CEILING_DIRECTORIES &&
+		touch file &&
+		cat >expect <<-EOF &&
+		fatal: cannot use --attr-source or GIT_ATTR_SOURCE without repo
+		EOF
+		test_must_fail env GIT_ATTR_SOURCE=HEAD git grep --no-index foo file 2>err &&
+		test_cmp expect err
+	)
+'
+
 test_expect_success 'bare repository: with --source' '
 	(
 		cd bare.git &&
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 06/13] remote-curl: fix parsing of detached SHA256 heads
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2024-04-29  6:34   ` [PATCH v3 05/13] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt
@ 2024-04-29  6:34   ` Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 3673 bytes --]

The dumb HTTP transport tries to read the remote HEAD reference by
downloading the "HEAD" file and then parsing it via `http_fetch_ref()`.
This function will either parse the file as an object ID in case it is
exactly `the_hash_algo->hexsz` long, or otherwise it will check whether
the reference starts with "ref :" and parse it as a symbolic ref.

This is broken when parsing detached HEADs of a remote SHA256 repository
because we never update `the_hash_algo` to the discovered remote object
hash. Consequently, `the_hash_algo` will always be the fallback SHA1
hash algorithm, which will cause us to fail parsing HEAD altogteher when
it contains a SHA256 object ID.

Fix this issue by setting up `the_hash_algo` via `repo_set_hash_algo()`.
While at it, let's make the expected SHA1 fallback explicit in our code,
which also addresses an upcoming issue where we are going to remove the
SHA1 fallback for `the_hash_algo`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 remote-curl.c              | 19 ++++++++++++++++++-
 t/t5550-http-fetch-dumb.sh | 15 +++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/remote-curl.c b/remote-curl.c
index 0b6d7815fd..004b707fdf 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -266,12 +266,23 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
 	return list;
 }
 
+/*
+ * Try to detect the hash algorithm used by the remote repository when using
+ * the dumb HTTP transport. As dumb transports cannot tell us the object hash
+ * directly have to derive it from the advertised ref lengths.
+ */
 static const struct git_hash_algo *detect_hash_algo(struct discovery *heads)
 {
 	const char *p = memchr(heads->buf, '\t', heads->len);
 	int algo;
+
+	/*
+	 * In case the remote has no refs we have no way to reliably determine
+	 * the object hash used by that repository. In that case we simply fall
+	 * back to SHA1, which may or may not be correct.
+	 */
 	if (!p)
-		return the_hash_algo;
+		return &hash_algos[GIT_HASH_SHA1];
 
 	algo = hash_algo_by_length((p - heads->buf) / 2);
 	if (algo == GIT_HASH_UNKNOWN)
@@ -295,6 +306,12 @@ static struct ref *parse_info_refs(struct discovery *heads)
 		    "is this a git repository?",
 		    transport_anonymize_url(url.buf));
 
+	/*
+	 * Set the repository's hash algo to whatever we have just detected.
+	 * This ensures that we can correctly parse the remote references.
+	 */
+	repo_set_hash_algo(the_repository, hash_algo_by_ptr(options.hash_algo));
+
 	data = heads->buf;
 	start = NULL;
 	mid = data;
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 4c3b32785d..5f16cbc58d 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -55,6 +55,21 @@ test_expect_success 'list refs from outside any repository' '
 	test_cmp expect actual
 '
 
+
+test_expect_success 'list detached HEAD from outside any repository' '
+	git clone --mirror "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
+		"$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" \
+		update-ref --no-deref HEAD refs/heads/main &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/repo-detached.git" update-server-info &&
+	cat >expect <<-EOF &&
+	$(git rev-parse main)	HEAD
+	$(git rev-parse main)	refs/heads/main
+	EOF
+	nongit git ls-remote "$HTTPD_URL/dumb/repo-detached.git" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'create password-protected repository' '
 	mkdir -p "$HTTPD_DOCUMENT_ROOT_PATH/auth/dumb/" &&
 	cp -Rf "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2024-04-29  6:34   ` [PATCH v3 06/13] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt
@ 2024-04-29  6:34   ` Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1997 bytes --]

The `--short=` option for git-rev-parse(1) allows the user to specify
to how many characters object IDs should be shortened to. The option is
broken though for SHA256 repositories because we set the maximum allowed
hash size to `the_hash_algo->hexsz` before we have even set up the repo.
Consequently, `the_hash_algo` will always be SHA1 and thus we truncate
every hash after at most 40 characters.

Fix this by accessing `the_hash_algo` only after we have set up the
repo.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/rev-parse.c  | 5 ++---
 t/t1500-rev-parse.sh | 6 ++++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 624182e507..d7b87c605e 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -687,7 +687,6 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 	const char *name = NULL;
 	struct object_context unused;
 	struct strbuf buf = STRBUF_INIT;
-	const int hexsz = the_hash_algo->hexsz;
 	int seen_end_of_options = 0;
 	enum format_type format = FORMAT_DEFAULT;
 
@@ -863,8 +862,8 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				abbrev = strtoul(arg, NULL, 10);
 				if (abbrev < MINIMUM_ABBREV)
 					abbrev = MINIMUM_ABBREV;
-				else if (hexsz <= abbrev)
-					abbrev = hexsz;
+				else if ((int)the_hash_algo->hexsz <= abbrev)
+					abbrev = the_hash_algo->hexsz;
 				continue;
 			}
 			if (!strcmp(arg, "--sq")) {
diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
index a669e592f1..30c31918fd 100755
--- a/t/t1500-rev-parse.sh
+++ b/t/t1500-rev-parse.sh
@@ -304,4 +304,10 @@ test_expect_success 'rev-parse --bisect includes bad, excludes good' '
 	test_cmp expect actual
 '
 
+test_expect_success '--short= truncates to the actual hash length' '
+	git rev-parse HEAD >expect &&
+	git rev-parse --short=100 HEAD >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo`
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2024-04-29  6:34   ` [PATCH v3 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt
@ 2024-04-29  6:34   ` Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 09/13] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2003 bytes --]

We access `the_hash_algo` in git-blame(1) before we have executed
`parse_options_start()`, which may not be properly set up in case we
have no repository. This is fine for most of the part because all the
call paths that lead to it (git-blame(1), git-annotate(1) as well as
git-pick-axe(1)) specify `RUN_SETUP` and thus require a repository.

There is one exception though, namely when passing `-h` to print the
help. Here we will access `the_hash_algo` even if there is no repo.
This works fine right now because `the_hash_algo` gets sets up to point
to the SHA1 algorithm via `initialize_repository()`. But we're about to
stop doing this, and thus the code would lead to a `NULL` pointer
exception.

Prepare the code for this and only access `the_hash_algo` after we are
sure that there is a proper repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/blame.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 9aa74680a3..e325825936 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -915,7 +915,6 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	struct range_set ranges;
 	unsigned int range_i;
 	long anchor;
-	const int hexsz = the_hash_algo->hexsz;
 	long num_lines = 0;
 	const char *str_usage = cmd_is_annotate ? annotate_usage : blame_usage;
 	const char **opt_usage = cmd_is_annotate ? annotate_opt_usage : blame_opt_usage;
@@ -973,11 +972,11 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	} else if (show_progress < 0)
 		show_progress = isatty(2);
 
-	if (0 < abbrev && abbrev < hexsz)
+	if (0 < abbrev && abbrev < (int)the_hash_algo->hexsz)
 		/* one more abbrev length is needed for the boundary commit */
 		abbrev++;
 	else if (!abbrev)
-		abbrev = hexsz;
+		abbrev = the_hash_algo->hexsz;
 
 	if (revs_file && read_ancestry(revs_file))
 		die_errno("reading graft file '%s' failed", revs_file);
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 09/13] builtin/bundle: abort "verify" early when there is no repository
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2024-04-29  6:34   ` [PATCH v3 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt
@ 2024-04-29  6:34   ` Patrick Steinhardt
  2024-04-29  6:34   ` [PATCH v3 10/13] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1528 bytes --]

Verifying a bundle requires us to have a repository. This is encoded in
`verify_bundle()`, which will return an error if there is no repository.
We call `open_bundle()` before we call `verify_bundle()` though, which
already performs some verifications even though we may ultimately abort
due to a missing repository.

This is problematic because `open_bundle()` already reads the bundle
header and verifies that it contains a properly formatted hash. When
there is no repository we have no clue what hash function to expect
though, so we always end up assuming SHA1 here, which may or may not be
correct. Furthermore, we are about to stop initializing `the_hash_algo`
when there is no repository, which will lead to segfaults.

Check early on whether we have a repository to fix this issue.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/bundle.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/bundle.c b/builtin/bundle.c
index 3ad11dc5d0..d5d41a8f67 100644
--- a/builtin/bundle.c
+++ b/builtin/bundle.c
@@ -140,6 +140,11 @@ static int cmd_bundle_verify(int argc, const char **argv, const char *prefix) {
 			builtin_bundle_verify_usage, options, &bundle_file);
 	/* bundle internals use argv[1] as further parameters */
 
+	if (!startup_info->have_repository) {
+		ret = error(_("need a repository to verify a bundle"));
+		goto cleanup;
+	}
+
 	if ((bundle_fd = open_bundle(bundle_file, &header, &name)) < 0) {
 		ret = 1;
 		goto cleanup;
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 10/13] builtin/diff: explicitly set hash algo when there is no repo
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2024-04-29  6:34   ` [PATCH v3 09/13] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt
@ 2024-04-29  6:34   ` Patrick Steinhardt
  2024-04-29  6:35   ` [PATCH v3 11/13] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1849 bytes --]

The git-diff(1) command can be used outside repositories to diff two
files with each other. But even if there is no repository we will end up
hashing the files that we are diffing so that we can print the "index"
line:

    ```
    diff --git a/a b/b
    index 7898192..6178079 100644
    --- a/a
    +++ b/b
    @@ -1 +1 @@
    -a
    +b
    ```

We implicitly use SHA1 to calculate the hash here, which is because
`the_repository` gets initialized with SHA1 during the startup routine.
We are about to stop doing this though such that `the_repository` only
ever has a hash function when it was properly initialized via a repo's
configuration.

To give full control to our users, we would ideally add a new switch to
git-diff(1) that allows them to specify the hash function when executed
outside of a repository. But for now, we only convert the code to make
this explicit such that we can stop setting the default hash algorithm
for `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/diff.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/builtin/diff.c b/builtin/diff.c
index 6e196e0c7d..58ec7e5da2 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -465,6 +465,15 @@ int cmd_diff(int argc, const char **argv, const char *prefix)
 			no_index = DIFF_NO_INDEX_IMPLICIT;
 	}
 
+	/*
+	 * When operating outside of a Git repository we need to have a hash
+	 * algorithm at hand so that we can generate the blob hashes. We
+	 * default to SHA1 here, but may eventually want to change this to be
+	 * configurable via a command line option.
+	 */
+	if (nongit)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	init_diff_ui_defaults();
 	git_config(git_diff_ui_config, NULL);
 	prefix = precompose_argv_prefix(argc, argv, prefix);
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 11/13] builtin/shortlog: don't set up revisions without repo
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2024-04-29  6:34   ` [PATCH v3 10/13] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt
@ 2024-04-29  6:35   ` Patrick Steinhardt
  2024-04-29  6:35   ` [PATCH v3 12/13] oss-fuzz/commit-graph: set up hash algorithm Patrick Steinhardt
  2024-04-29  6:35   ` [PATCH v3 13/13] repository: stop setting SHA1 as the default object hash Patrick Steinhardt
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1300 bytes --]

It is possible to run git-shortlog(1) outside of a repository by passing
it output from git-log(1) via standard input. Obviously, as there is no
repository in that context, it is thus unsupported to pass any revisions
as arguments.

Regardless of that we still end up calling `setup_revisions()`. While
that works alright, it is somewhat strange. Furthermore, this is about
to cause problems when we unset the default object hash.

Refactor the code to only call `setup_revisions()` when we have a
repository. This is safe to do as we already verify that there are no
arguments when running outside of a repository anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/shortlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 3c7cd2d6ef..d4daf31e22 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -435,7 +435,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 		usage_with_options(shortlog_usage, options);
 	}
 
-	if (setup_revisions(argc, argv, &rev, NULL) != 1) {
+	if (!nongit && setup_revisions(argc, argv, &rev, NULL) != 1) {
 		error(_("unrecognized argument: %s"), argv[1]);
 		usage_with_options(shortlog_usage, options);
 	}
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 12/13] oss-fuzz/commit-graph: set up hash algorithm
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
                     ` (10 preceding siblings ...)
  2024-04-29  6:35   ` [PATCH v3 11/13] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt
@ 2024-04-29  6:35   ` Patrick Steinhardt
  2024-04-29  6:35   ` [PATCH v3 13/13] repository: stop setting SHA1 as the default object hash Patrick Steinhardt
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1117 bytes --]

Our fuzzing setups don't work in a proper repository, but only use the
in-memory configured `the_repository`. Consequently, we never go through
the full repository setup procedures and thus do not set up the hash
algo used by the repository.

The commit-graph fuzzer does rely on a properly initialized hash algo
though. Initialize it explicitly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 oss-fuzz/fuzz-commit-graph.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/oss-fuzz/fuzz-commit-graph.c b/oss-fuzz/fuzz-commit-graph.c
index 2992079dd9..94ecbb9242 100644
--- a/oss-fuzz/fuzz-commit-graph.c
+++ b/oss-fuzz/fuzz-commit-graph.c
@@ -18,6 +18,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
 	 * touching the disk to keep the individual fuzz-test cases as fast as
 	 * possible.
 	 */
+	repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
 	the_repository->settings.commit_graph_generation_version = 2;
 	the_repository->settings.commit_graph_read_changed_paths = 1;
 	g = parse_commit_graph(&the_repository->settings, (void *)data, size);
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 13/13] repository: stop setting SHA1 as the default object hash
  2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
                     ` (11 preceding siblings ...)
  2024-04-29  6:35   ` [PATCH v3 12/13] oss-fuzz/commit-graph: set up hash algorithm Patrick Steinhardt
@ 2024-04-29  6:35   ` Patrick Steinhardt
  12 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2024-04-29  6:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, brian m. carlson, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1696 bytes --]

During the startup of Git, we call `initialize_the_repository()` to set
up `the_repository` as well as `the_index`. Part of this setup is also
to set the default object hash of the repository to SHA1. This has the
effect that `the_hash_algo` is getting initialized to SHA1, as well.
This default hash algorithm eventually gets overridden by most Git
commands via `setup_git_directory()`, which also detects the actual hash
algorithm used by the repository.

There are some commands though that don't access a repository at all, or
at a later point only, and thus retain the default hash function for
some amount of time. As some of the the preceding commits demonstrate,
this can lead to subtle issues when we access `the_hash_algo` when no
repository has been set up.

Address this issue by dropping the set up of the default hash algorithm
completely. The effect of this is that `the_hash_algo` will map to a
`NULL` pointer and thus cause Git to crash when something tries to
access the hash algorithm without it being properly initialized. It thus
forces all Git commands to explicitly set up the hash algorithm in case
there is no repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 repository.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/repository.c b/repository.c
index e15b416944..b65b1a8c8b 100644
--- a/repository.c
+++ b/repository.c
@@ -35,8 +35,6 @@ void initialize_the_repository(void)
 	the_repo.parsed_objects = parsed_object_pool_new();
 
 	index_state_init(&the_index, the_repository);
-
-	repo_set_hash_algo(&the_repo, GIT_HASH_SHA1);
 }
 
 static void expand_base_dir(char **out, const char *in,
-- 
2.45.0-rc1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2024-04-29  6:35 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-19  9:51 [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` Patrick Steinhardt
2024-04-19  9:51 ` [PATCH 01/11] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
2024-04-19 19:03   ` brian m. carlson
2024-04-22  4:56     ` Patrick Steinhardt
2024-04-22 16:15   ` Junio C Hamano
2024-04-23  4:50     ` Patrick Steinhardt
2024-04-23 16:54       ` Junio C Hamano
2024-04-19  9:51 ` [PATCH 02/11] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt
2024-04-23  0:30   ` Justin Tobler
2024-04-19  9:51 ` [PATCH 03/11] attr: don't recompute default attribute source Patrick Steinhardt
2024-04-23  0:32   ` Justin Tobler
2024-04-19  9:51 ` [PATCH 04/11] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt
2024-04-19  9:51 ` [PATCH 05/11] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt
2024-04-19  9:51 ` [PATCH 06/11] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt
2024-04-19  9:51 ` [PATCH 07/11] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt
2024-04-19  9:51 ` [PATCH 08/11] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt
2024-04-19  9:51 ` [PATCH 09/11] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt
2024-04-22 18:41   ` Junio C Hamano
2024-04-19  9:51 ` [PATCH 10/11] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt
2024-04-23  0:35   ` Justin Tobler
2024-04-19  9:51 ` [PATCH 11/11] repository: stop setting SHA1 as the default object hash Patrick Steinhardt
2024-04-19 19:12 ` [PATCH 00/11] Stop relying on SHA1 fallback for `the_hash_algo` brian m. carlson
2024-04-19 19:16   ` Junio C Hamano
2024-04-22  4:56   ` Patrick Steinhardt
2024-04-23  5:07 ` [PATCH v2 00/12] " Patrick Steinhardt
2024-04-23  5:07   ` [PATCH v2 01/12] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
2024-04-23  5:07   ` [PATCH v2 02/12] path: move `validate_headref()` to its only user Patrick Steinhardt
2024-04-23  5:07   ` [PATCH v2 03/12] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt
2024-04-23  5:07   ` [PATCH v2 04/12] attr: don't recompute default attribute source Patrick Steinhardt
2024-04-23  5:07   ` [PATCH v2 05/12] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt
2024-04-23  5:07   ` [PATCH v2 06/12] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt
2024-04-23  5:07   ` [PATCH v2 07/12] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt
2024-04-23  5:08   ` [PATCH v2 08/12] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt
2024-04-23  5:08   ` [PATCH v2 09/12] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt
2024-04-23  5:08   ` [PATCH v2 10/12] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt
2024-04-23  5:08   ` [PATCH v2 11/12] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt
2024-04-23  5:08   ` [PATCH v2 12/12] repository: stop setting SHA1 as the default object hash Patrick Steinhardt
2024-04-27 22:09   ` [PATCH v2 00/12] Stop relying on SHA1 fallback for `the_hash_algo` Junio C Hamano
2024-04-29  6:05     ` Patrick Steinhardt
2024-04-29  6:34 ` [PATCH v3 00/13] " Patrick Steinhardt
2024-04-29  6:34   ` [PATCH v3 01/13] path: harden validation of HEAD with non-standard hashes Patrick Steinhardt
2024-04-29  6:34   ` [PATCH v3 02/13] path: move `validate_headref()` to its only user Patrick Steinhardt
2024-04-29  6:34   ` [PATCH v3 03/13] parse-options-cb: only abbreviate hashes when hash algo is known Patrick Steinhardt
2024-04-29  6:34   ` [PATCH v3 04/13] attr: don't recompute default attribute source Patrick Steinhardt
2024-04-29  6:34   ` [PATCH v3 05/13] attr: fix BUG() when parsing attrs outside of repo Patrick Steinhardt
2024-04-29  6:34   ` [PATCH v3 06/13] remote-curl: fix parsing of detached SHA256 heads Patrick Steinhardt
2024-04-29  6:34   ` [PATCH v3 07/13] builtin/rev-parse: allow shortening to more than 40 hex characters Patrick Steinhardt
2024-04-29  6:34   ` [PATCH v3 08/13] builtin/blame: don't access potentially unitialized `the_hash_algo` Patrick Steinhardt
2024-04-29  6:34   ` [PATCH v3 09/13] builtin/bundle: abort "verify" early when there is no repository Patrick Steinhardt
2024-04-29  6:34   ` [PATCH v3 10/13] builtin/diff: explicitly set hash algo when there is no repo Patrick Steinhardt
2024-04-29  6:35   ` [PATCH v3 11/13] builtin/shortlog: don't set up revisions without repo Patrick Steinhardt
2024-04-29  6:35   ` [PATCH v3 12/13] oss-fuzz/commit-graph: set up hash algorithm Patrick Steinhardt
2024-04-29  6:35   ` [PATCH v3 13/13] repository: stop setting SHA1 as the default object hash Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).