git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix use of uninitialized hash algos
@ 2024-05-13  7:15 Patrick Steinhardt
  2024-05-13  7:15 ` [PATCH 1/2] builtin/patch-id: fix uninitialized hash function Patrick Steinhardt
                   ` (6 more replies)
  0 siblings, 7 replies; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  7:15 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2068 bytes --]

Hi,

with c8aed5e8da (repository: stop setting SHA1 as the default object
hash, 2024-05-07), we have stopped setting up the default hash function
for `the_repository`. This change was done so that we stop implicitly
using SHA1 in places where we don't really intend to. Instead, code
where we try to access `the_hash_algo` without having `the_repository`
properly initialized will now crash hard.

I have found two more cases where this can now be triggered:

  - git-patch-id(1) can read diffs from stdin.

  - git-hash-object(1) can hash data from stdin.

Both cases can work without a repository, and if they don't have one
they will now crash.

I still consider it a good thing that we did the change regardless of
those crashes. In the case of git-patch-id(1) I would claim that using
`the_hash_algo` is wrong in the first place, as patch IDs should be
stable and are documented to always use SHA1. Thus, patch IDs in SHA256
repos are essentially broken. And in the case of git-hash-object(1), we
should expose a command line option to let the user specify the object
hash. So both cases demonstrate that there is room for improvement.

If these cases keep on popping up and we don't feel comfortable with it,
then we can still decide to drop c8aed5e8da. The remainder of the topic
that this commit was part of should in that case stay though, as those
are real bug fixes. We could then re-try in a subsequent release cycle.
But for now I don't think this would be warranted yet.

This topic depends on js/ps/undecided-is-not-necessarily-sha1 at
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07).

Thanks!

Patrick

Patrick Steinhardt (2):
  builtin/patch-id: fix uninitialized hash function
  builtin/hash-object: fix uninitialized hash function

 builtin/hash-object.c  |  7 +++++++
 builtin/patch-id.c     | 13 +++++++++++++
 t/t1007-hash-object.sh |  6 ++++++
 t/t4204-patch-id.sh    | 34 ++++++++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+)

-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 1/2] builtin/patch-id: fix uninitialized hash function
  2024-05-13  7:15 [PATCH 0/2] Fix use of uninitialized hash algos Patrick Steinhardt
@ 2024-05-13  7:15 ` Patrick Steinhardt
  2024-05-13  7:15 ` [PATCH 2/2] builtin/hash-object: " Patrick Steinhardt
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  7:15 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 4066 bytes --]

In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have adapted `initialize_repository()` to no longer set
up a default hash function. As this function is also used to set up
`the_repository`, the consequence is that `the_hash_algo` will now by
default be a `NULL` pointer unless the hash algorithm was configured
properly. This is done as a mechanism to detect cases where we may be
using the wrong hash function by accident.

This change now causes git-patch-id(1) to segfault when it's run outside
of a repository. As this command can read diffs from stdin, it does not
necessarily need a repository, but then relies on `the_hash_algo` to
compute the patch ID itself.

It is somewhat dubious that git-patch-id(1) relies on `the_hash_algo` in
the first place. Quoting its manpage:

    A "patch ID" is nothing but a sum of SHA-1 of the file diffs
    associated with a patch, with line numbers ignored. As such, it’s
    "reasonably stable", but at the same time also reasonably unique,
    i.e., two patches that have the same "patch ID" are almost
    guaranteed to be the same thing.

We explicitly document patch IDs to be using SHA-1. Furthermore, patch
IDs are supposed to be stable for most of the part. But even with the
same input, the patch IDs will now be different depending on the repo's
configured object hash.

Work around the issue by setting up SHA-1 when there was no startup
repository for now. This is arguably not the correct fix, but for now we
rather want to focus on getting the segfault fixed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/patch-id.c  | 13 +++++++++++++
 t/t4204-patch-id.sh | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 3894d2b970..be5a85e71c 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -5,6 +5,7 @@
 #include "hash.h"
 #include "hex.h"
 #include "parse-options.h"
+#include "setup.h"
 
 static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
@@ -237,6 +238,18 @@ int cmd_patch_id(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
 			     patch_id_usage, 0);
 
+	/*
+	 * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
+	 * it means that the hash algorithm now depends on the object hash of
+	 * the repository, even though git-patch-id(1) clearly defines that
+	 * patch IDs always use SHA1.
+	 *
+	 * TODO: This hack should be removed in favor of converting the code
+	 *       that computes patch IDs to always use SHA1.
+	 */
+	if (!startup_info->have_repository)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	generate_id_list(opts ? opts > 1 : config.stable,
 			 opts ? opts == 3 : config.verbatim);
 	return 0;
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index a7fa94ce0a..605faea0c7 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -310,4 +310,38 @@ test_expect_success 'patch-id handles diffs with one line of before/after' '
 	test_config patchid.stable true &&
 	calc_patch_id diffu1stable <diffu1
 '
+
+test_expect_failure 'patch-id computes same ID with different object hashes' '
+	test_when_finished "rm -rf repo-sha1 repo-sha256" &&
+
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+
+	git init --object-format=sha1 repo-sha1 &&
+	git -C repo-sha1 patch-id <diff >patch-id-sha1 &&
+	git init --object-format=sha256 repo-sha256 &&
+	git -C repo-sha256 patch-id <diff >patch-id-sha256 &&
+	test_cmp patch-id-sha1 patch-id-sha256
+'
+
+test_expect_success 'patch-id without repository' '
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+	nongit git patch-id <diff
+'
+
 test_done
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 2/2] builtin/hash-object: fix uninitialized hash function
  2024-05-13  7:15 [PATCH 0/2] Fix use of uninitialized hash algos Patrick Steinhardt
  2024-05-13  7:15 ` [PATCH 1/2] builtin/patch-id: fix uninitialized hash function Patrick Steinhardt
@ 2024-05-13  7:15 ` Patrick Steinhardt
  2024-05-14  0:16   ` Junio C Hamano
  2024-05-13 16:01 ` [PATCH 0/2] Fix use of uninitialized hash algos Junio C Hamano
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  7:15 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1816 bytes --]

The git-hash-object(1) command allows users to hash an object even
without a repository. Starting with c8aed5e8da (repository: stop setting
SHA1 as the default object hash, 2024-05-07), this will make us hit an
uninitialized hash function, which subsequently leads to a segfault.

Fix this by falling back to SHA-1 explicitly when running outside of a
Git repository. Eventually, we should expose this function as a command
line option to the users so that they can pick which object hash to use
by themselves.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/hash-object.c  | 7 +++++++
 t/t1007-hash-object.sh | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 82ca6d2bfd..0855f4f8aa 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -123,6 +123,13 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	else
 		prefix = setup_git_directory_gently(&nongit);
 
+	/*
+	 * TODO: Allow the hash algorithm to be configured by the user via a
+	 *       command line option when not using `-w`.
+	 */
+	if (nongit)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	if (vpath && prefix) {
 		vpath_free = prefix_filename(prefix, vpath);
 		vpath = vpath_free;
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64aea38486..4c138c6ca4 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -260,4 +260,10 @@ test_expect_success '--literally with extra-long type' '
 	echo example | git hash-object -t $t --literally --stdin
 '
 
+test_expect_success '--stdin outside of repository' '
+	nongit git hash-object --stdin <hello >actual &&
+	echo "$(test_oid hello)" >expect &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/2] Fix use of uninitialized hash algos
  2024-05-13  7:15 [PATCH 0/2] Fix use of uninitialized hash algos Patrick Steinhardt
  2024-05-13  7:15 ` [PATCH 1/2] builtin/patch-id: fix uninitialized hash function Patrick Steinhardt
  2024-05-13  7:15 ` [PATCH 2/2] builtin/hash-object: " Patrick Steinhardt
@ 2024-05-13 16:01 ` Junio C Hamano
  2024-05-13 18:36   ` Junio C Hamano
  2024-05-13 19:21 ` [PATCH v2 0/4] Fix use of uninitialized hash algorithms Junio C Hamano
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 16:01 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> with c8aed5e8da (repository: stop setting SHA1 as the default object
> hash, 2024-05-07), we have stopped setting up the default hash function
> for `the_repository`. This change was done so that we stop implicitly
> using SHA1 in places where we don't really intend to. Instead, code
> where we try to access `the_hash_algo` without having `the_repository`
> properly initialized will now crash hard.
>
> I have found two more cases where this can now be triggered:
>
>   - git-patch-id(1) can read diffs from stdin.
>
>   - git-hash-object(1) can hash data from stdin.
>
> Both cases can work without a repository, and if they don't have one
> they will now crash.

Perhaps we should double-check with all commands that are designed
to be able to work outside a repository, e.g. "git apply", "git grep
--no-index", "git diff --no-index" (tried to be exhausitive without
consulting documentation, so the list is not exhausitive at all).

> I still consider it a good thing that we did the change regardless of
> those crashes. In the case of git-patch-id(1) I would claim that using
> `the_hash_algo` is wrong in the first place, as patch IDs should be
> stable and are documented to always use SHA1. Thus, patch IDs in SHA256
> repos are essentially broken. And in the case of git-hash-object(1), we
> should expose a command line option to let the user specify the object
> hash. So both cases demonstrate that there is room for improvement.

It is good that the topic is kept outside 'master' (and it is in
'next' to give the topic a bit wider exposure than merely in 'seen'
and the list archive).

We may want a test file that explicitly make commands that ought
to work outside a repository actually run outside a repository,
making use of the GIT_CEILING_DIRECTORIES mechanism, something along
the lines of the attached.

 t/t1517-outside-repo.sh | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git c/t/t1517-outside-repo.sh w/t/t1517-outside-repo.sh
new file mode 100755
index 0000000000..4c595c2ff7
--- /dev/null
+++ w/t/t1517-outside-repo.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+test_description='check random commands outside repo'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'set up a non-repo directory and test file' '
+	GIT_CEILING_DIRECTORIES=$(pwd) &&
+	export GIT_CEILING_DIRECTORIES &&
+	mkdir non-repo &&
+	(
+		cd non-repo &&
+		# confirm that git does not find a repo
+		test_must_fail git rev-parse --git-dir
+	) &&
+	test_write_lines one two three four >nums &&
+	git add nums &&
+	cp nums nums.old &&
+	test_write_lines five >>nums &&
+	git diff >sample.patch
+'
+
+test_expect_success 'apply a patch outside repository' '
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git apply ../sample.patch
+	) &&
+	test_cmp nums non-repo/nums
+'
+
+test_expect_success 'compute a patch-id outside repository' '
+	git patch-id <sample.patch >patch-id.expect &&
+	(
+		cd non-repo &&
+		git patch-id <../sample.patch >../patch-id.actual
+	) &&
+	test_cmp patch-id.expect patch-id.actual
+'
+
+test_done

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/2] Fix use of uninitialized hash algos
  2024-05-13 16:01 ` [PATCH 0/2] Fix use of uninitialized hash algos Junio C Hamano
@ 2024-05-13 18:36   ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 18:36 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Kyle Lippincott, Jonathan Nieder

Junio C Hamano <gitster@pobox.com> writes:

>> I still consider it a good thing that we did the change regardless of
>> those crashes. In the case of git-patch-id(1) I would claim that using
>> `the_hash_algo` is wrong in the first place, as patch IDs should be
>> stable and are documented to always use SHA1. Thus, patch IDs in SHA256
>> repos are essentially broken. And in the case of git-hash-object(1), we
>> should expose a command line option to let the user specify the object
>> hash. So both cases demonstrate that there is room for improvement.
>
> It is good that the topic is kept outside 'master' (and it is in
> 'next' to give the topic a bit wider exposure than merely in 'seen'
> and the list archive).
>
> We may want a test file that explicitly make commands that ought
> to work outside a repository actually run outside a repository,
> making use of the GIT_CEILING_DIRECTORIES mechanism, something along
> the lines of the attached.

Another thought.  Perhaps we should do, in the meantime, the
attached patch?

Without your "fix patch-id" patch, one of the tests in 1517 will
fail, and the other fails as there is no fix posted yet, but with
the GIT_TEST_DEFAULT_HASH_IS_SHA1 set explicit to 1, they keep
passing.  This way, when we run out tests, we will always leave the
default hash uninitialized to catch more errors, the binary built
out of our source by default will crash to help our users catch more
errors for us, yet when they really really need to, the users can
set and export GIT_TEST_DEFAULT_HASH_IS_SHA1=1 to keep assuming that
SHA1 is the hash algorithm when there is no specified.


 repository.c            | 24 ++++++++++++++++++++++++
 t/t1517-outside-repo.sh | 42 ++++++++++++++++++++++++++++++++++++++++++
 t/test-lib.sh           |  4 ++++
 3 files changed, 70 insertions(+)

diff --git c/repository.c w/repository.c
index 15c10015b0..6f67966e35 100644
--- c/repository.c
+++ w/repository.c
@@ -26,6 +26,30 @@ void initialize_repository(struct repository *repo)
 	repo->parsed_objects = parsed_object_pool_new();
 	ALLOC_ARRAY(repo->index, 1);
 	index_state_init(repo->index, repo);
+
+	/*
+	 * Unfortunately, we need to keep this hack around for the time being:
+	 *
+	 *   - Not setting up the hash algorithm for `the_repository` leads to
+	 *     crashes because `the_hash_algo` is a macro that expands to
+	 *     `the_repository->hash_algo`. So if Git commands try to access
+	 *     `the_hash_algo` without a Git directory we crash.
+	 *
+	 *   - Setting up the hash algorithm to be SHA1 by default breaks other
+	 *     commands when running with SHA256.
+	 *
+	 * This is another point in case why having global state is a bad idea.
+	 * Eventually, we should remove this hack and stop setting the hash
+	 * algorithm in this function altogether. Instead, it should only ever
+	 * be set via our repository setup procedures. But that requires more
+	 * work.
+	 *
+	 * As we discover the code paths that need fixing, we may remove this
+	 * code completely, but we are not there yet.
+	 */
+	if (repo == the_repository &&
+	    git_env_bool("GIT_TEST_DEFAULT_HASH_IS_SHA1", 0))
+		repo_set_hash_algo(repo, GIT_HASH_SHA1);
 }
 
 static void expand_base_dir(char **out, const char *in,
diff --git c/t/t1517-outside-repo.sh w/t/t1517-outside-repo.sh
new file mode 100755
index 0000000000..4c595c2ff7
--- /dev/null
+++ w/t/t1517-outside-repo.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+test_description='check random commands outside repo'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'set up a non-repo directory and test file' '
+	GIT_CEILING_DIRECTORIES=$(pwd) &&
+	export GIT_CEILING_DIRECTORIES &&
+	mkdir non-repo &&
+	(
+		cd non-repo &&
+		# confirm that git does not find a repo
+		test_must_fail git rev-parse --git-dir
+	) &&
+	test_write_lines one two three four >nums &&
+	git add nums &&
+	cp nums nums.old &&
+	test_write_lines five >>nums &&
+	git diff >sample.patch
+'
+
+test_expect_success 'apply a patch outside repository' '
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git apply ../sample.patch
+	) &&
+	test_cmp nums non-repo/nums
+'
+
+test_expect_success 'compute a patch-id outside repository' '
+	git patch-id <sample.patch >patch-id.expect &&
+	(
+		cd non-repo &&
+		git patch-id <../sample.patch >../patch-id.actual
+	) &&
+	test_cmp patch-id.expect patch-id.actual
+'
+
+test_done
diff --git c/t/test-lib.sh w/t/test-lib.sh
index 79d3e0e7d9..36d311fb4a 100644
--- c/t/test-lib.sh
+++ w/t/test-lib.sh
@@ -542,6 +542,10 @@ export EDITOR
 
 GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-sha1}"
 export GIT_DEFAULT_HASH
+
+GIT_TEST_DEFAULT_HASH_IS_SHA1=0
+export GIT_TEST_DEFAULT_HASH_IS_SHA1
+
 GIT_DEFAULT_REF_FORMAT="${GIT_TEST_DEFAULT_REF_FORMAT:-files}"
 export GIT_DEFAULT_REF_FORMAT
 GIT_TEST_MERGE_ALGORITHM="${GIT_TEST_MERGE_ALGORITHM:-ort}"

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v2 0/4] Fix use of uninitialized hash algorithms
  2024-05-13  7:15 [PATCH 0/2] Fix use of uninitialized hash algos Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2024-05-13 16:01 ` [PATCH 0/2] Fix use of uninitialized hash algos Junio C Hamano
@ 2024-05-13 19:21 ` Junio C Hamano
  2024-05-13 19:21   ` [PATCH v2 1/4] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
                     ` (4 more replies)
  2024-05-13 22:41 ` [PATCH v3 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
                   ` (2 subsequent siblings)
  6 siblings, 5 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 19:21 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

The series adds an escape hatch to "the default hash algorithm is
no longer SHA-1; it is an error to rely on the default" by
introducing an undocumented GIT_TEST_DEFAULT_HASH_IS_SHA1=1
environment variable that users can set after reporting an failure
they saw to us.

We can see that Patrick's two fixes to patch-id and hash-object
corrects the breakage as the series progresses.

Junio C Hamano (2):
  setup: add an escape hatch for "no more default hash algorithm" change
  t1517: test commands that are designed to be run outside repository

Patrick Steinhardt (2):
  builtin/patch-id: fix uninitialized hash function
  builtin/hash-object: fix uninitialized hash function

 builtin/hash-object.c   |  7 +++++
 builtin/patch-id.c      | 13 +++++++++
 repository.c            | 24 ++++++++++++++++
 t/t1007-hash-object.sh  |  6 ++++
 t/t1517-outside-repo.sh | 61 +++++++++++++++++++++++++++++++++++++++++
 t/t4204-patch-id.sh     | 34 +++++++++++++++++++++++
 t/test-lib.sh           |  4 +++
 7 files changed, 149 insertions(+)
 create mode 100755 t/t1517-outside-repo.sh

-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v2 1/4] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-13 19:21 ` [PATCH v2 0/4] Fix use of uninitialized hash algorithms Junio C Hamano
@ 2024-05-13 19:21   ` Junio C Hamano
  2024-05-13 19:48     ` Kyle Lippincott
  2024-05-13 19:21   ` [PATCH v2 2/4] t1517: test commands that are designed to be run outside repository Junio C Hamano
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 19:21 UTC (permalink / raw)
  To: git

Partially revert c8aed5e8 (repository: stop setting SHA1 as the
default object hash, 2024-05-07), to keep end-user systems still
broken when we have gap in our test coverage but yet give them an
escape hatch to set the GIT_TEST_DEFAULT_HASH_IS_SHA1=1 environment
variable to revert to the previous behaviour.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 repository.c  | 24 ++++++++++++++++++++++++
 t/test-lib.sh |  4 ++++
 2 files changed, 28 insertions(+)

diff --git a/repository.c b/repository.c
index 15c10015b0..6f67966e35 100644
--- a/repository.c
+++ b/repository.c
@@ -26,6 +26,30 @@ void initialize_repository(struct repository *repo)
 	repo->parsed_objects = parsed_object_pool_new();
 	ALLOC_ARRAY(repo->index, 1);
 	index_state_init(repo->index, repo);
+
+	/*
+	 * Unfortunately, we need to keep this hack around for the time being:
+	 *
+	 *   - Not setting up the hash algorithm for `the_repository` leads to
+	 *     crashes because `the_hash_algo` is a macro that expands to
+	 *     `the_repository->hash_algo`. So if Git commands try to access
+	 *     `the_hash_algo` without a Git directory we crash.
+	 *
+	 *   - Setting up the hash algorithm to be SHA1 by default breaks other
+	 *     commands when running with SHA256.
+	 *
+	 * This is another point in case why having global state is a bad idea.
+	 * Eventually, we should remove this hack and stop setting the hash
+	 * algorithm in this function altogether. Instead, it should only ever
+	 * be set via our repository setup procedures. But that requires more
+	 * work.
+	 *
+	 * As we discover the code paths that need fixing, we may remove this
+	 * code completely, but we are not there yet.
+	 */
+	if (repo == the_repository &&
+	    git_env_bool("GIT_TEST_DEFAULT_HASH_IS_SHA1", 0))
+		repo_set_hash_algo(repo, GIT_HASH_SHA1);
 }
 
 static void expand_base_dir(char **out, const char *in,
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 79d3e0e7d9..36d311fb4a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -542,6 +542,10 @@ export EDITOR
 
 GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-sha1}"
 export GIT_DEFAULT_HASH
+
+GIT_TEST_DEFAULT_HASH_IS_SHA1=0
+export GIT_TEST_DEFAULT_HASH_IS_SHA1
+
 GIT_DEFAULT_REF_FORMAT="${GIT_TEST_DEFAULT_REF_FORMAT:-files}"
 export GIT_DEFAULT_REF_FORMAT
 GIT_TEST_MERGE_ALGORITHM="${GIT_TEST_MERGE_ALGORITHM:-ort}"
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v2 2/4] t1517: test commands that are designed to be run outside repository
  2024-05-13 19:21 ` [PATCH v2 0/4] Fix use of uninitialized hash algorithms Junio C Hamano
  2024-05-13 19:21   ` [PATCH v2 1/4] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
@ 2024-05-13 19:21   ` Junio C Hamano
  2024-05-13 19:57     ` Kyle Lippincott
  2024-05-13 19:21   ` [PATCH v2 3/4] builtin/patch-id: fix uninitialized hash function Junio C Hamano
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 19:21 UTC (permalink / raw)
  To: git

A few commands, like "git apply" and "git patch-id", have been
broken with a recent change to stop setting the default hash
algorithm to SHA-1.  Test them and fix them in later commits.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t1517-outside-repo.sh | 61 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100755 t/t1517-outside-repo.sh

diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
new file mode 100755
index 0000000000..e0fd495ec1
--- /dev/null
+++ b/t/t1517-outside-repo.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='check random commands outside repo'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'set up a non-repo directory and test file' '
+	GIT_CEILING_DIRECTORIES=$(pwd) &&
+	export GIT_CEILING_DIRECTORIES &&
+	mkdir non-repo &&
+	(
+		cd non-repo &&
+		# confirm that git does not find a repo
+		test_must_fail git rev-parse --git-dir
+	) &&
+	test_write_lines one two three four >nums &&
+	git add nums &&
+	cp nums nums.old &&
+	test_write_lines five >>nums &&
+	git diff >sample.patch
+'
+
+test_expect_failure 'compute a patch-id outside repository' '
+	git patch-id <sample.patch >patch-id.expect &&
+	(
+		cd non-repo &&
+		git patch-id <../sample.patch >../patch-id.actual
+	) &&
+	test_cmp patch-id.expect patch-id.actual
+'
+
+test_expect_failure 'hash-object outside repository' '
+	git hash-object --stdin <sample.patch >hash.expect &&
+	(
+		cd non-repo &&
+		git hash-object --stdin <../sample.patch >../hash.actual
+	) &&
+	test_cmp hash.expect hash.actual
+'
+
+test_expect_failure 'apply a patch outside repository' '
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git apply ../sample.patch
+	) &&
+	test_cmp nums non-repo/nums
+'
+
+test_expect_success 'grep outside repository' '
+	git grep --cached two >expect &&
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git grep --no-index two >../actual
+	) &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v2 3/4] builtin/patch-id: fix uninitialized hash function
  2024-05-13 19:21 ` [PATCH v2 0/4] Fix use of uninitialized hash algorithms Junio C Hamano
  2024-05-13 19:21   ` [PATCH v2 1/4] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
  2024-05-13 19:21   ` [PATCH v2 2/4] t1517: test commands that are designed to be run outside repository Junio C Hamano
@ 2024-05-13 19:21   ` Junio C Hamano
  2024-05-13 19:21   ` [PATCH v2 4/4] builtin/hash-object: " Junio C Hamano
  2024-05-13 21:28   ` [PATCH 5/4] apply: " Junio C Hamano
  4 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 19:21 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

From: Patrick Steinhardt <ps@pks.im>

In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have adapted `initialize_repository()` to no longer set
up a default hash function. As this function is also used to set up
`the_repository`, the consequence is that `the_hash_algo` will now by
default be a `NULL` pointer unless the hash algorithm was configured
properly. This is done as a mechanism to detect cases where we may be
using the wrong hash function by accident.

This change now causes git-patch-id(1) to segfault when it's run outside
of a repository. As this command can read diffs from stdin, it does not
necessarily need a repository, but then relies on `the_hash_algo` to
compute the patch ID itself.

It is somewhat dubious that git-patch-id(1) relies on `the_hash_algo` in
the first place. Quoting its manpage:

    A "patch ID" is nothing but a sum of SHA-1 of the file diffs
    associated with a patch, with line numbers ignored. As such, it’s
    "reasonably stable", but at the same time also reasonably unique,
    i.e., two patches that have the same "patch ID" are almost
    guaranteed to be the same thing.

We explicitly document patch IDs to be using SHA-1. Furthermore, patch
IDs are supposed to be stable for most of the part. But even with the
same input, the patch IDs will now be different depending on the repo's
configured object hash.

Work around the issue by setting up SHA-1 when there was no startup
repository for now. This is arguably not the correct fix, but for now we
rather want to focus on getting the segfault fixed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/patch-id.c      | 13 +++++++++++++
 t/t1517-outside-repo.sh |  2 +-
 t/t4204-patch-id.sh     | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 3894d2b970..be5a85e71c 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -5,6 +5,7 @@
 #include "hash.h"
 #include "hex.h"
 #include "parse-options.h"
+#include "setup.h"
 
 static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
@@ -237,6 +238,18 @@ int cmd_patch_id(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
 			     patch_id_usage, 0);
 
+	/*
+	 * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
+	 * it means that the hash algorithm now depends on the object hash of
+	 * the repository, even though git-patch-id(1) clearly defines that
+	 * patch IDs always use SHA1.
+	 *
+	 * TODO: This hack should be removed in favor of converting the code
+	 *       that computes patch IDs to always use SHA1.
+	 */
+	if (!startup_info->have_repository)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	generate_id_list(opts ? opts > 1 : config.stable,
 			 opts ? opts == 3 : config.verbatim);
 	return 0;
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index e0fd495ec1..ac5f3191cc 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -21,7 +21,7 @@ test_expect_success 'set up a non-repo directory and test file' '
 	git diff >sample.patch
 '
 
-test_expect_failure 'compute a patch-id outside repository' '
+test_expect_success 'compute a patch-id outside repository' '
 	git patch-id <sample.patch >patch-id.expect &&
 	(
 		cd non-repo &&
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index a7fa94ce0a..605faea0c7 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -310,4 +310,38 @@ test_expect_success 'patch-id handles diffs with one line of before/after' '
 	test_config patchid.stable true &&
 	calc_patch_id diffu1stable <diffu1
 '
+
+test_expect_failure 'patch-id computes same ID with different object hashes' '
+	test_when_finished "rm -rf repo-sha1 repo-sha256" &&
+
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+
+	git init --object-format=sha1 repo-sha1 &&
+	git -C repo-sha1 patch-id <diff >patch-id-sha1 &&
+	git init --object-format=sha256 repo-sha256 &&
+	git -C repo-sha256 patch-id <diff >patch-id-sha256 &&
+	test_cmp patch-id-sha1 patch-id-sha256
+'
+
+test_expect_success 'patch-id without repository' '
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+	nongit git patch-id <diff
+'
+
 test_done
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v2 4/4] builtin/hash-object: fix uninitialized hash function
  2024-05-13 19:21 ` [PATCH v2 0/4] Fix use of uninitialized hash algorithms Junio C Hamano
                     ` (2 preceding siblings ...)
  2024-05-13 19:21   ` [PATCH v2 3/4] builtin/patch-id: fix uninitialized hash function Junio C Hamano
@ 2024-05-13 19:21   ` Junio C Hamano
  2024-05-13 21:28   ` [PATCH 5/4] apply: " Junio C Hamano
  4 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 19:21 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

From: Patrick Steinhardt <ps@pks.im>

The git-hash-object(1) command allows users to hash an object even
without a repository. Starting with c8aed5e8da (repository: stop setting
SHA1 as the default object hash, 2024-05-07), this will make us hit an
uninitialized hash function, which subsequently leads to a segfault.

Fix this by falling back to SHA-1 explicitly when running outside of a
Git repository. Eventually, we should expose this function as a command
line option to the users so that they can pick which object hash to use
by themselves.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/hash-object.c   | 7 +++++++
 t/t1007-hash-object.sh  | 6 ++++++
 t/t1517-outside-repo.sh | 2 +-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 82ca6d2bfd..0855f4f8aa 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -123,6 +123,13 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	else
 		prefix = setup_git_directory_gently(&nongit);
 
+	/*
+	 * TODO: Allow the hash algorithm to be configured by the user via a
+	 *       command line option when not using `-w`.
+	 */
+	if (nongit)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	if (vpath && prefix) {
 		vpath_free = prefix_filename(prefix, vpath);
 		vpath = vpath_free;
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64aea38486..4c138c6ca4 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -260,4 +260,10 @@ test_expect_success '--literally with extra-long type' '
 	echo example | git hash-object -t $t --literally --stdin
 '
 
+test_expect_success '--stdin outside of repository' '
+	nongit git hash-object --stdin <hello >actual &&
+	echo "$(test_oid hello)" >expect &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index ac5f3191cc..854bb8f343 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -30,7 +30,7 @@ test_expect_success 'compute a patch-id outside repository' '
 	test_cmp patch-id.expect patch-id.actual
 '
 
-test_expect_failure 'hash-object outside repository' '
+test_expect_success 'hash-object outside repository' '
 	git hash-object --stdin <sample.patch >hash.expect &&
 	(
 		cd non-repo &&
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 1/4] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-13 19:21   ` [PATCH v2 1/4] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
@ 2024-05-13 19:48     ` Kyle Lippincott
  0 siblings, 0 replies; 62+ messages in thread
From: Kyle Lippincott @ 2024-05-13 19:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, May 13, 2024 at 12:21 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Partially revert c8aed5e8 (repository: stop setting SHA1 as the
> default object hash, 2024-05-07), to keep end-user systems still
> broken when we have gap in our test coverage but yet give them an
> escape hatch to set the GIT_TEST_DEFAULT_HASH_IS_SHA1=1 environment
> variable to revert to the previous behaviour.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  repository.c  | 24 ++++++++++++++++++++++++

Nice idea, this concept LG.

>  t/test-lib.sh |  4 ++++
>  2 files changed, 28 insertions(+)
>
> diff --git a/repository.c b/repository.c
> index 15c10015b0..6f67966e35 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -26,6 +26,30 @@ void initialize_repository(struct repository *repo)
>         repo->parsed_objects = parsed_object_pool_new();
>         ALLOC_ARRAY(repo->index, 1);
>         index_state_init(repo->index, repo);
> +
> +       /*
> +        * Unfortunately, we need to keep this hack around for the time being:

Nit: the comment says we need to keep it around, but it's actually
defaulted to off. We may want to add clarification of the current
status to the comment (or replace the entire comment with a comment
describing that we added a hack to the hack, and that it should be
removed "soon").

> +        *
> +        *   - Not setting up the hash algorithm for `the_repository` leads to
> +        *     crashes because `the_hash_algo` is a macro that expands to
> +        *     `the_repository->hash_algo`. So if Git commands try to access
> +        *     `the_hash_algo` without a Git directory we crash.
> +        *
> +        *   - Setting up the hash algorithm to be SHA1 by default breaks other
> +        *     commands when running with SHA256.
> +        *
> +        * This is another point in case why having global state is a bad idea.
> +        * Eventually, we should remove this hack and stop setting the hash
> +        * algorithm in this function altogether. Instead, it should only ever
> +        * be set via our repository setup procedures. But that requires more
> +        * work.
> +        *
> +        * As we discover the code paths that need fixing, we may remove this
> +        * code completely, but we are not there yet.
> +        */
> +       if (repo == the_repository &&
> +           git_env_bool("GIT_TEST_DEFAULT_HASH_IS_SHA1", 0))
> +               repo_set_hash_algo(repo, GIT_HASH_SHA1);
>  }
>
>  static void expand_base_dir(char **out, const char *in,
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 79d3e0e7d9..36d311fb4a 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -542,6 +542,10 @@ export EDITOR
>
>  GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-sha1}"
>  export GIT_DEFAULT_HASH
> +
> +GIT_TEST_DEFAULT_HASH_IS_SHA1=0

This is the default as being established in this commit, correct?
Should we add a comment saying this is the default and describe why we
have it here anyway?

> +export GIT_TEST_DEFAULT_HASH_IS_SHA1
> +
>  GIT_DEFAULT_REF_FORMAT="${GIT_TEST_DEFAULT_REF_FORMAT:-files}"
>  export GIT_DEFAULT_REF_FORMAT
>  GIT_TEST_MERGE_ALGORITHM="${GIT_TEST_MERGE_ALGORITHM:-ort}"
> --
> 2.45.0-145-g3e4a232f6e
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 2/4] t1517: test commands that are designed to be run outside repository
  2024-05-13 19:21   ` [PATCH v2 2/4] t1517: test commands that are designed to be run outside repository Junio C Hamano
@ 2024-05-13 19:57     ` Kyle Lippincott
  2024-05-13 20:33       ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Kyle Lippincott @ 2024-05-13 19:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, May 13, 2024 at 12:21 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> A few commands, like "git apply" and "git patch-id", have been
> broken with a recent change to stop setting the default hash
> algorithm to SHA-1.  Test them and fix them in later commits.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  t/t1517-outside-repo.sh | 61 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
>  create mode 100755 t/t1517-outside-repo.sh
>
> diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
> new file mode 100755
> index 0000000000..e0fd495ec1
> --- /dev/null
> +++ b/t/t1517-outside-repo.sh
> @@ -0,0 +1,61 @@
> +#!/bin/sh
> +
> +test_description='check random commands outside repo'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +test_expect_success 'set up a non-repo directory and test file' '
> +       GIT_CEILING_DIRECTORIES=$(pwd) &&
> +       export GIT_CEILING_DIRECTORIES &&
> +       mkdir non-repo &&
> +       (
> +               cd non-repo &&
> +               # confirm that git does not find a repo
> +               test_must_fail git rev-parse --git-dir
> +       ) &&
> +       test_write_lines one two three four >nums &&
> +       git add nums &&
> +       cp nums nums.old &&
> +       test_write_lines five >>nums &&
> +       git diff >sample.patch
> +'
> +
> +test_expect_failure 'compute a patch-id outside repository' '

Do we only expect failure because of a temporary condition (the bug
that is mentioned in the commit message)? If so, we should probably
add a TODO, FIXME, or some other similar style of comment that
describes that this should be fixed. This way, if the patch series to
fix the issue doesn't materialize, people don't read the test file and
think that these commands aren't supported outside of a repository.

Do we have a way of catching the specific failure mode? i.e. if it
crashes, is there a test_expect_crash? I'm thinking that it might be
nice to be more specific about what kind of failure we expect, this
way if it fails in a different way before we convert these to
test_expect_success, the test fails (due to the unexpected change).
I'm assuming that for crashes of this type there's no good/portable
way of verifying that it's a specific crash, but having the test check
for a difference between an exit code that indicates a signal was
raised and an exit code that indicates that the process returned
"naturally" after an unsuccessful execution might be feasible, if we
already have such a mechanism. Adding one just for this series doesn't
seem justified.

> +       git patch-id <sample.patch >patch-id.expect &&
> +       (
> +               cd non-repo &&
> +               git patch-id <../sample.patch >../patch-id.actual
> +       ) &&
> +       test_cmp patch-id.expect patch-id.actual
> +'
> +
> +test_expect_failure 'hash-object outside repository' '
> +       git hash-object --stdin <sample.patch >hash.expect &&
> +       (
> +               cd non-repo &&
> +               git hash-object --stdin <../sample.patch >../hash.actual
> +       ) &&
> +       test_cmp hash.expect hash.actual
> +'
> +
> +test_expect_failure 'apply a patch outside repository' '
> +       (
> +               cd non-repo &&
> +               cp ../nums.old nums &&
> +               git apply ../sample.patch
> +       ) &&
> +       test_cmp nums non-repo/nums
> +'
> +
> +test_expect_success 'grep outside repository' '
> +       git grep --cached two >expect &&
> +       (
> +               cd non-repo &&
> +               cp ../nums.old nums &&
> +               git grep --no-index two >../actual
> +       ) &&
> +       test_cmp expect actual
> +'
> +
> +test_done
> --
> 2.45.0-145-g3e4a232f6e
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 2/4] t1517: test commands that are designed to be run outside repository
  2024-05-13 19:57     ` Kyle Lippincott
@ 2024-05-13 20:33       ` Junio C Hamano
  2024-05-13 21:00         ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 20:33 UTC (permalink / raw)
  To: Kyle Lippincott; +Cc: git

Kyle Lippincott <spectral@google.com> writes:

> Do we only expect failure because of a temporary condition (the bug
> that is mentioned in the commit message)? If so, we should probably
> add a TODO, FIXME, or some other similar style of comment that
> describes that this should be fixed.

test_expect_failure is description enough for that purpose.

If a command should NOT work outside the project we will write a
test more like so:

	test_expect_success 'foo does not work outside' '
		... prepare that $cwd is outside ...
		test_must_fail git foo
	'

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 2/4] t1517: test commands that are designed to be run outside repository
  2024-05-13 20:33       ` Junio C Hamano
@ 2024-05-13 21:00         ` Junio C Hamano
  2024-05-13 21:07           ` Kyle Lippincott
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 21:00 UTC (permalink / raw)
  To: Kyle Lippincott; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Kyle Lippincott <spectral@google.com> writes:
>
>> Do we only expect failure because of a temporary condition (the bug
>> that is mentioned in the commit message)? If so, we should probably
>> add a TODO, FIXME, or some other similar style of comment that
>> describes that this should be fixed.
>
> test_expect_failure is description enough for that purpose.

We say this in t/README:

 - test_expect_failure [<prereq>] <message> <script>

   This is NOT the opposite of test_expect_success, but is used
   to mark a test that demonstrates a known breakage.  Unlike
   the usual test_expect_success tests, which say "ok" on
   success and "FAIL" on failure, this will say "FIXED" on
   success and "still broken" on failure.  Failures from these
   tests won't cause -i (immediate) to stop.

Which means that when somebody rans out of things to do, grepping
for test_expect_failure may give them a good place to start ;-).

Note that there were a few very rare occasions that what was marked
as "known breakage" with test_expect_failure turned out to be what
was working as intended.

Thanks.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v2 2/4] t1517: test commands that are designed to be run outside repository
  2024-05-13 21:00         ` Junio C Hamano
@ 2024-05-13 21:07           ` Kyle Lippincott
  0 siblings, 0 replies; 62+ messages in thread
From: Kyle Lippincott @ 2024-05-13 21:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, May 13, 2024 at 2:00 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Kyle Lippincott <spectral@google.com> writes:
> >
> >> Do we only expect failure because of a temporary condition (the bug
> >> that is mentioned in the commit message)? If so, we should probably
> >> add a TODO, FIXME, or some other similar style of comment that
> >> describes that this should be fixed.
> >
> > test_expect_failure is description enough for that purpose.
>
> We say this in t/README:
>
>  - test_expect_failure [<prereq>] <message> <script>
>
>    This is NOT the opposite of test_expect_success, but is used
>    to mark a test that demonstrates a known breakage.  Unlike
>    the usual test_expect_success tests, which say "ok" on
>    success and "FAIL" on failure, this will say "FIXED" on
>    success and "still broken" on failure.  Failures from these
>    tests won't cause -i (immediate) to stop.

Got it, thanks for explaining. With that, this change looks good to me.

>
> Which means that when somebody rans out of things to do, grepping
> for test_expect_failure may give them a good place to start ;-).
>
> Note that there were a few very rare occasions that what was marked
> as "known breakage" with test_expect_failure turned out to be what
> was working as intended.
>
> Thanks.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 5/4] apply: fix uninitialized hash function
  2024-05-13 19:21 ` [PATCH v2 0/4] Fix use of uninitialized hash algorithms Junio C Hamano
                     ` (3 preceding siblings ...)
  2024-05-13 19:21   ` [PATCH v2 4/4] builtin/hash-object: " Junio C Hamano
@ 2024-05-13 21:28   ` Junio C Hamano
  4 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 21:28 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

"git apply" can work outside a repository as a better "GNU patch",
but when it does so, it still assumed that it can access
the_hash_algo, which is no longer true in the new world order.

Make sure we explicitly fall back to SHA-1 algorithm for backward
compatibility.  Just like we argued for hash-objects, we probably
should have a mechanism to specify the hash algorithm used outside
a repository.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * After finishing these steps, it started to dawn on me that we
   probably can instead add GIT_DEFAULT_HASH_ALGO environment
   variable for exactly that purpose.  Then the escape hatch I
   introduced in [1/4] can become just that.

 builtin/apply.c         | 4 ++++
 t/t1517-outside-repo.sh | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 861a01910c..e9175f820f 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -1,6 +1,7 @@
 #include "builtin.h"
 #include "gettext.h"
 #include "repository.h"
+#include "hash.h"
 #include "apply.h"
 
 static const char * const apply_usage[] = {
@@ -18,6 +19,9 @@ int cmd_apply(int argc, const char **argv, const char *prefix)
 	if (init_apply_state(&state, the_repository, prefix))
 		exit(128);
 
+	if (!the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	argc = apply_parse_options(argc, argv,
 				   &state, &force_apply, &options,
 				   apply_usage);
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index 854bb8f343..6f32a40c6d 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -39,7 +39,7 @@ test_expect_success 'hash-object outside repository' '
 	test_cmp hash.expect hash.actual
 '
 
-test_expect_failure 'apply a patch outside repository' '
+test_expect_success 'apply a patch outside repository' '
 	(
 		cd non-repo &&
 		cp ../nums.old nums &&
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v3 0/5] Fix use of uninitialized hash algorithms
  2024-05-13  7:15 [PATCH 0/2] Fix use of uninitialized hash algos Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2024-05-13 19:21 ` [PATCH v2 0/4] Fix use of uninitialized hash algorithms Junio C Hamano
@ 2024-05-13 22:41 ` Junio C Hamano
  2024-05-13 22:41   ` [PATCH v3 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
                     ` (4 more replies)
  2024-05-14  1:14 ` [PATCH v4 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
  2024-05-20 23:14 ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
  6 siblings, 5 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 22:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

In this round, the escape hatch is no longer the escape hatch, but
also serves as an end-user knob to specify what hash algorithm to
use when there is no repository to tell us which one to use.

We can see that Patrick's two fixes to patch-id and hash-object, as
well as my fix to apply, correct the breakage as the series
progresses.

Junio C Hamano (3):
  setup: add an escape hatch for "no more default hash algorithm" change
  t1517: test commands that are designed to be run outside repository
  apply: fix uninitialized hash function

Patrick Steinhardt (2):
  builtin/patch-id: fix uninitialized hash function
  builtin/hash-object: fix uninitialized hash function

 builtin/apply.c         |  4 +++
 builtin/hash-object.c   |  3 ++
 builtin/patch-id.c      | 13 +++++++++
 repository.c            | 32 +++++++++++++++++++++
 t/t1007-hash-object.sh  |  6 ++++
 t/t1517-outside-repo.sh | 64 +++++++++++++++++++++++++++++++++++++++++
 t/t4204-patch-id.sh     | 34 ++++++++++++++++++++++
 7 files changed, 156 insertions(+)
 create mode 100755 t/t1517-outside-repo.sh

-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v3 1/5] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-13 22:41 ` [PATCH v3 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
@ 2024-05-13 22:41   ` Junio C Hamano
  2024-05-13 22:41   ` [PATCH v3 2/5] t1517: test commands that are designed to be run outside repository Junio C Hamano
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 22:41 UTC (permalink / raw)
  To: git

Partially revert c8aed5e8 (repository: stop setting SHA1 as the
default object hash, 2024-05-07), to keep end-user systems still
broken when we have gap in our test coverage but yet give them an
escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGORITHM environment
variable to "sha1" in order to revert to the previous behaviour.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 repository.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/repository.c b/repository.c
index 15c10015b0..2001c760dc 100644
--- a/repository.c
+++ b/repository.c
@@ -19,6 +19,20 @@
 static struct repository the_repo;
 struct repository *the_repository = &the_repo;
 
+static void set_default_hash_algo(struct repository *repo)
+{
+	const char *hash_name;
+	int algo;
+
+	hash_name = getenv("GIT_DEFAULT_HASH_ALGORITHM");
+	if (!hash_name)
+		return;
+	algo = hash_algo_by_name(hash_name);
+	if (algo == GIT_HASH_UNKNOWN)
+		return;
+	repo_set_hash_algo(repo, algo);
+}
+
 void initialize_repository(struct repository *repo)
 {
 	repo->objects = raw_object_store_new();
@@ -26,6 +40,24 @@ void initialize_repository(struct repository *repo)
 	repo->parsed_objects = parsed_object_pool_new();
 	ALLOC_ARRAY(repo->index, 1);
 	index_state_init(repo->index, repo);
+
+	/*
+	 * When a command runs inside a repository, it learns what
+	 * hash algorithm is in use from the repository, but some
+	 * commands are designed to work outside a repository, yet
+	 * they want to access the_hash_algo, if only for the length
+	 * of the hashed value to see if their input looks like a
+	 * plausible hash value.
+	 *
+	 * We are in the process of identifying the codepaths and
+	 * giving them an appropriate default individually; any
+	 * unconverted codepath that tries to access the_hash_algo
+	 * will thus fail.  The end-users however have an escape hatch
+	 * to set GIT_DEFAULT_HASH_ALGORITHM environment variable to
+	 * "sha1" get back the old behaviour of defaulting to SHA-1.
+	 */
+	if (repo == the_repository)
+		set_default_hash_algo(repo);
 }
 
 static void expand_base_dir(char **out, const char *in,
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v3 2/5] t1517: test commands that are designed to be run outside repository
  2024-05-13 22:41 ` [PATCH v3 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
  2024-05-13 22:41   ` [PATCH v3 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
@ 2024-05-13 22:41   ` Junio C Hamano
  2024-05-13 22:41   ` [PATCH v3 3/5] builtin/patch-id: fix uninitialized hash function Junio C Hamano
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 22:41 UTC (permalink / raw)
  To: git

A few commands, like "git apply" and "git patch-id", have been
broken with a recent change to stop setting the default hash
algorithm to SHA-1.  Test them and fix them in later commits.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t1517-outside-repo.sh | 64 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100755 t/t1517-outside-repo.sh

diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
new file mode 100755
index 0000000000..16d9714c27
--- /dev/null
+++ b/t/t1517-outside-repo.sh
@@ -0,0 +1,64 @@
+#!/bin/sh
+
+test_description='check random commands outside repo'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+# Do not use the end-user workaround for this test.
+# GIT_DEFAULT_HASH_ALGORITHM=sha1; export GIT_DEFAULT_HASH_ALGORITHM
+
+test_expect_success 'set up a non-repo directory and test file' '
+	GIT_CEILING_DIRECTORIES=$(pwd) &&
+	export GIT_CEILING_DIRECTORIES &&
+	mkdir non-repo &&
+	(
+		cd non-repo &&
+		# confirm that git does not find a repo
+		test_must_fail git rev-parse --git-dir
+	) &&
+	test_write_lines one two three four >nums &&
+	git add nums &&
+	cp nums nums.old &&
+	test_write_lines five >>nums &&
+	git diff >sample.patch
+'
+
+test_expect_failure 'compute a patch-id outside repository' '
+	git patch-id <sample.patch >patch-id.expect &&
+	(
+		cd non-repo &&
+		git patch-id <../sample.patch >../patch-id.actual
+	) &&
+	test_cmp patch-id.expect patch-id.actual
+'
+
+test_expect_failure 'hash-object outside repository' '
+	git hash-object --stdin <sample.patch >hash.expect &&
+	(
+		cd non-repo &&
+		git hash-object --stdin <../sample.patch >../hash.actual
+	) &&
+	test_cmp hash.expect hash.actual
+'
+
+test_expect_failure 'apply a patch outside repository' '
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git apply ../sample.patch
+	) &&
+	test_cmp nums non-repo/nums
+'
+
+test_expect_success 'grep outside repository' '
+	git grep --cached two >expect &&
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git grep --no-index two >../actual
+	) &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v3 3/5] builtin/patch-id: fix uninitialized hash function
  2024-05-13 22:41 ` [PATCH v3 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
  2024-05-13 22:41   ` [PATCH v3 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
  2024-05-13 22:41   ` [PATCH v3 2/5] t1517: test commands that are designed to be run outside repository Junio C Hamano
@ 2024-05-13 22:41   ` Junio C Hamano
  2024-05-13 23:11     ` Junio C Hamano
  2024-05-13 22:41   ` [PATCH v3 4/5] builtin/hash-object: " Junio C Hamano
  2024-05-13 22:41   ` [PATCH v3 5/5] apply: " Junio C Hamano
  4 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 22:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

From: Patrick Steinhardt <ps@pks.im>

In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have adapted `initialize_repository()` to no longer set
up a default hash function. As this function is also used to set up
`the_repository`, the consequence is that `the_hash_algo` will now by
default be a `NULL` pointer unless the hash algorithm was configured
properly. This is done as a mechanism to detect cases where we may be
using the wrong hash function by accident.

This change now causes git-patch-id(1) to segfault when it's run outside
of a repository. As this command can read diffs from stdin, it does not
necessarily need a repository, but then relies on `the_hash_algo` to
compute the patch ID itself.

It is somewhat dubious that git-patch-id(1) relies on `the_hash_algo` in
the first place. Quoting its manpage:

    A "patch ID" is nothing but a sum of SHA-1 of the file diffs
    associated with a patch, with line numbers ignored. As such, it’s
    "reasonably stable", but at the same time also reasonably unique,
    i.e., two patches that have the same "patch ID" are almost
    guaranteed to be the same thing.

We explicitly document patch IDs to be using SHA-1. Furthermore, patch
IDs are supposed to be stable for most of the part. But even with the
same input, the patch IDs will now be different depending on the repo's
configured object hash.

Work around the issue by setting up SHA-1 when there was no startup
repository for now. This is arguably not the correct fix, but for now we
rather want to focus on getting the segfault fixed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/patch-id.c      | 13 +++++++++++++
 t/t1517-outside-repo.sh |  2 +-
 t/t4204-patch-id.sh     | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 3894d2b970..e6ae89beab 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -5,6 +5,7 @@
 #include "hash.h"
 #include "hex.h"
 #include "parse-options.h"
+#include "setup.h"
 
 static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
@@ -237,6 +238,18 @@ int cmd_patch_id(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
 			     patch_id_usage, 0);
 
+	/*
+	 * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
+	 * it means that the hash algorithm now depends on the object hash of
+	 * the repository, even though git-patch-id(1) clearly defines that
+	 * patch IDs always use SHA1.
+	 *
+	 * NEEDSWORK: This hack should be removed in favor of converting
+	 * the code that computes patch IDs to always use SHA1.
+	 */
+	if (!startup_info->have_repository)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	generate_id_list(opts ? opts > 1 : config.stable,
 			 opts ? opts == 3 : config.verbatim);
 	return 0;
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index 16d9714c27..f1fd5c9888 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -24,7 +24,7 @@ test_expect_success 'set up a non-repo directory and test file' '
 	git diff >sample.patch
 '
 
-test_expect_failure 'compute a patch-id outside repository' '
+test_expect_success 'compute a patch-id outside repository' '
 	git patch-id <sample.patch >patch-id.expect &&
 	(
 		cd non-repo &&
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index a7fa94ce0a..605faea0c7 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -310,4 +310,38 @@ test_expect_success 'patch-id handles diffs with one line of before/after' '
 	test_config patchid.stable true &&
 	calc_patch_id diffu1stable <diffu1
 '
+
+test_expect_failure 'patch-id computes same ID with different object hashes' '
+	test_when_finished "rm -rf repo-sha1 repo-sha256" &&
+
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+
+	git init --object-format=sha1 repo-sha1 &&
+	git -C repo-sha1 patch-id <diff >patch-id-sha1 &&
+	git init --object-format=sha256 repo-sha256 &&
+	git -C repo-sha256 patch-id <diff >patch-id-sha256 &&
+	test_cmp patch-id-sha1 patch-id-sha256
+'
+
+test_expect_success 'patch-id without repository' '
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+	nongit git patch-id <diff
+'
+
 test_done
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v3 4/5] builtin/hash-object: fix uninitialized hash function
  2024-05-13 22:41 ` [PATCH v3 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
                     ` (2 preceding siblings ...)
  2024-05-13 22:41   ` [PATCH v3 3/5] builtin/patch-id: fix uninitialized hash function Junio C Hamano
@ 2024-05-13 22:41   ` Junio C Hamano
  2024-05-13 23:13     ` Junio C Hamano
  2024-05-13 22:41   ` [PATCH v3 5/5] apply: " Junio C Hamano
  4 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 22:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

From: Patrick Steinhardt <ps@pks.im>

The git-hash-object(1) command allows users to hash an object even
without a repository. Starting with c8aed5e8da (repository: stop setting
SHA1 as the default object hash, 2024-05-07), this will make us hit an
uninitialized hash function, which subsequently leads to a segfault.

Fix this by falling back to SHA-1 explicitly when running outside of a
Git repository. Users can use GIT_DEFAULT_HASH_ALGORITHM environment to
specify what hash algorithm they want, so arguably this code should not
be needed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/hash-object.c   | 3 +++
 t/t1007-hash-object.sh  | 6 ++++++
 t/t1517-outside-repo.sh | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 82ca6d2bfd..c767414a0c 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -123,6 +123,9 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	else
 		prefix = setup_git_directory_gently(&nongit);
 
+	if (nongit && !the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	if (vpath && prefix) {
 		vpath_free = prefix_filename(prefix, vpath);
 		vpath = vpath_free;
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64aea38486..4c138c6ca4 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -260,4 +260,10 @@ test_expect_success '--literally with extra-long type' '
 	echo example | git hash-object -t $t --literally --stdin
 '
 
+test_expect_success '--stdin outside of repository' '
+	nongit git hash-object --stdin <hello >actual &&
+	echo "$(test_oid hello)" >expect &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index f1fd5c9888..80261e43c7 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -33,7 +33,7 @@ test_expect_success 'compute a patch-id outside repository' '
 	test_cmp patch-id.expect patch-id.actual
 '
 
-test_expect_failure 'hash-object outside repository' '
+test_expect_success 'hash-object outside repository' '
 	git hash-object --stdin <sample.patch >hash.expect &&
 	(
 		cd non-repo &&
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v3 5/5] apply: fix uninitialized hash function
  2024-05-13 22:41 ` [PATCH v3 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
                     ` (3 preceding siblings ...)
  2024-05-13 22:41   ` [PATCH v3 4/5] builtin/hash-object: " Junio C Hamano
@ 2024-05-13 22:41   ` Junio C Hamano
  4 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 22:41 UTC (permalink / raw)
  To: git

"git apply" can work outside a repository as a better "GNU patch",
but when it does so, it still assumed that it can access
the_hash_algo, which is no longer true in the new world order.

Make sure we explicitly fall back to SHA-1 algorithm for backward
compatibility.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/apply.c         | 4 ++++
 t/t1517-outside-repo.sh | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 861a01910c..e9175f820f 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -1,6 +1,7 @@
 #include "builtin.h"
 #include "gettext.h"
 #include "repository.h"
+#include "hash.h"
 #include "apply.h"
 
 static const char * const apply_usage[] = {
@@ -18,6 +19,9 @@ int cmd_apply(int argc, const char **argv, const char *prefix)
 	if (init_apply_state(&state, the_repository, prefix))
 		exit(128);
 
+	if (!the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	argc = apply_parse_options(argc, argv,
 				   &state, &force_apply, &options,
 				   apply_usage);
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index 80261e43c7..89d89cf3f0 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -42,7 +42,7 @@ test_expect_success 'hash-object outside repository' '
 	test_cmp hash.expect hash.actual
 '
 
-test_expect_failure 'apply a patch outside repository' '
+test_expect_success 'apply a patch outside repository' '
 	(
 		cd non-repo &&
 		cp ../nums.old nums &&
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 3/5] builtin/patch-id: fix uninitialized hash function
  2024-05-13 22:41   ` [PATCH v3 3/5] builtin/patch-id: fix uninitialized hash function Junio C Hamano
@ 2024-05-13 23:11     ` Junio C Hamano
  2024-05-14  4:31       ` Patrick Steinhardt
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 23:11 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

Junio C Hamano <gitster@pobox.com> writes:

> We explicitly document patch IDs to be using SHA-1. Furthermore, patch
> IDs are supposed to be stable for most of the part. But even with the
> same input, the patch IDs will now be different depending on the repo's
> configured object hash.
>
> Work around the issue by setting up SHA-1 when there was no startup
> repository for now. This is arguably not the correct fix, but for now we
> rather want to focus on getting the segfault fixed.

I tend to agree that the use of SHA-256 in patch-id computation is a
regression when we added the SHA-256 support.  Even with the
GIT_DEFAULT_HASH_ALGORITHM fallback, we cannot fix it in an
initialized SHA-256 repository.

We should fix it but I agree that is probably outside the scope of
the "oops, we leave the hash algorithm totally uninitialized" fix.
> +	/*
> +	 * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
> +	 * it means that the hash algorithm now depends on the object hash of
> +	 * the repository, even though git-patch-id(1) clearly defines that
> +	 * patch IDs always use SHA1.
> +	 *
> +	 * NEEDSWORK: This hack should be removed in favor of converting
> +	 * the code that computes patch IDs to always use SHA1.
> +	 */
> +	if (!startup_info->have_repository)
> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);

Hmph, in other places I did

	if (!the_hash_algo)
		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);

to find the case where we need a reasonable default.

Is there a practical difference?  If there isn't we should
standardise one and use the same test consistently everywhere.

Not that it matters for this particular case, where we in the longer
term should be hardcoding the use of SHA-1 even in SHA-256 repository
for the pupose of computing the patch-id.

Thanks.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 4/5] builtin/hash-object: fix uninitialized hash function
  2024-05-13 22:41   ` [PATCH v3 4/5] builtin/hash-object: " Junio C Hamano
@ 2024-05-13 23:13     ` Junio C Hamano
  2024-05-14  4:32       ` Patrick Steinhardt
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-13 23:13 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

Junio C Hamano <gitster@pobox.com> writes:

> From: Patrick Steinhardt <ps@pks.im>
>
> The git-hash-object(1) command allows users to hash an object even
> without a repository. Starting with c8aed5e8da (repository: stop setting
> SHA1 as the default object hash, 2024-05-07), this will make us hit an
> uninitialized hash function, which subsequently leads to a segfault.
>
> Fix this by falling back to SHA-1 explicitly when running outside of a
> Git repository. Users can use GIT_DEFAULT_HASH_ALGORITHM environment to
> specify what hash algorithm they want, so arguably this code should not
> be needed.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/hash-object.c   | 3 +++
>  t/t1007-hash-object.sh  | 6 ++++++
>  t/t1517-outside-repo.sh | 2 +-
>  3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/hash-object.c b/builtin/hash-object.c
> index 82ca6d2bfd..c767414a0c 100644
> --- a/builtin/hash-object.c
> +++ b/builtin/hash-object.c
> @@ -123,6 +123,9 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
>  	else
>  		prefix = setup_git_directory_gently(&nongit);
>  
> +	if (nongit && !the_hash_algo)
> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);

This is slightly different from your version, which always used SHA-1
when nongit is true, in the hope that it would make it more useful if
the user can say "GIT_DEFAULT_HASH_ALGORITHM=sha256 git hash-objects"
outside a repository.  I am not 100% convinced it is better or we
rather should aim for predictability that you'd always and only get
SHA-1 outside a repository.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] builtin/hash-object: fix uninitialized hash function
  2024-05-13  7:15 ` [PATCH 2/2] builtin/hash-object: " Patrick Steinhardt
@ 2024-05-14  0:16   ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14  0:16 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> diff --git a/builtin/hash-object.c b/builtin/hash-object.c
> index 82ca6d2bfd..0855f4f8aa 100644
> --- a/builtin/hash-object.c
> +++ b/builtin/hash-object.c
> @@ -123,6 +123,13 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
>  	else
>  		prefix = setup_git_directory_gently(&nongit);
>  
> +	/*
> +	 * TODO: Allow the hash algorithm to be configured by the user via a
> +	 *       command line option when not using `-w`.
> +	 */
> +	if (nongit)
> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);

This breaks t1007 when

        linux-sha256)
                export GIT_TEST_DEFAULT_HASH=sha256

is in effect.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 0/5] Fix use of uninitialized hash algorithms
  2024-05-13  7:15 [PATCH 0/2] Fix use of uninitialized hash algos Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2024-05-13 22:41 ` [PATCH v3 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
@ 2024-05-14  1:14 ` Junio C Hamano
  2024-05-14  1:14   ` [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
                     ` (4 more replies)
  2024-05-20 23:14 ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
  6 siblings, 5 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14  1:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

A change recently merged to 'next' stops us from defaulting to using
SHA-1 unless other code (like a logic early in the start-up sequence
to see what hash is being used in the repository we are working in)
explicitly sets it, leading to a (deliberate) crash of "git" when we
forgot to cover certain code paths.

It turns out we have a few.  Notable ones are all operations that
are designed to work outside a repository.  We should go over all
such code paths and give them a reasonable default when there is one
available (e.g. for historical reasons, patch-id is documented to
work with SHA-1 hashes, so arguably it should do so everywhere, not
just in SHA-1 repositories, but outside any repository), and when
there is none, we can use the existing GIT_DEFAULT_HASH environment
variable to let the user control the choice.

In this fourth iteration, I used the existing GIT_DEFAULT_HASH
environment variable that has been used to allow users to opt into
an early experiment of SHA-256 support.  I saw breakages in SHA-256
CI jobs for earlier rounds of this series, but hopefully this is now
good to go.  Knock wood...

Junio C Hamano (3):
  setup: add an escape hatch for "no more default hash algorithm" change
  t1517: test commands that are designed to be run outside repository
  apply: fix uninitialized hash function

Patrick Steinhardt (2):
  builtin/patch-id: fix uninitialized hash function
  builtin/hash-object: fix uninitialized hash function

 builtin/apply.c         |  4 +++
 builtin/hash-object.c   |  3 ++
 builtin/patch-id.c      | 13 +++++++++
 environment.h           |  1 +
 repository.c            | 40 +++++++++++++++++++++++++++
 setup.c                 |  2 --
 t/t1007-hash-object.sh  |  6 ++++
 t/t1517-outside-repo.sh | 61 +++++++++++++++++++++++++++++++++++++++++
 t/t4204-patch-id.sh     | 34 +++++++++++++++++++++++
 9 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100755 t/t1517-outside-repo.sh

Range-diff against v3:
1:  7adbf0cc5f ! 1:  3038f73e1c setup: add an escape hatch for "no more default hash algorithm" change
    @@ Commit message
         Partially revert c8aed5e8 (repository: stop setting SHA1 as the
         default object hash, 2024-05-07), to keep end-user systems still
         broken when we have gap in our test coverage but yet give them an
    -    escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGORITHM environment
    -    variable to "sha1" in order to revert to the previous behaviour.
    +    escape hatch to set the GIT_DEFAULT_HASH environment variable to
    +    "sha1" in order to revert to the previous behaviour.  This variable
    +    has been in use for using SHA-256 hash by default, and it should be
    +    a better fit than inventing a new and test-only knob.
     
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
    + ## environment.h ##
    +@@ environment.h: const char *getenv_safe(struct strvec *argv, const char *name);
    + #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
    + #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
    + #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
    ++#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
    + 
    + /*
    +  * Environment variable used in handshaking the wire protocol.
    +
      ## repository.c ##
    +@@
    + #include "git-compat-util.h"
    + #include "abspath.h"
    ++#include "environment.h"
    + #include "repository.h"
    + #include "object-store-ll.h"
    + #include "config.h"
     @@
      static struct repository the_repo;
      struct repository *the_repository = &the_repo;
    @@ repository.c
     +	const char *hash_name;
     +	int algo;
     +
    -+	hash_name = getenv("GIT_DEFAULT_HASH_ALGORITHM");
    ++	hash_name = getenv(GIT_DEFAULT_HASH_ENVIRONMENT);
     +	if (!hash_name)
     +		return;
     +	algo = hash_algo_by_name(hash_name);
    ++
    ++	/*
    ++	 * NEEDSWORK: after all, falling back to SHA-1 by assigning
    ++	 * GIT_HASH_SHA1 to algo here, instead of returning, may give
    ++	 * us better behaviour.
    ++	 */
     +	if (algo == GIT_HASH_UNKNOWN)
     +		return;
    ++
     +	repo_set_hash_algo(repo, algo);
     +}
     +
    @@ repository.c: void initialize_repository(struct repository *repo)
     +	 * giving them an appropriate default individually; any
     +	 * unconverted codepath that tries to access the_hash_algo
     +	 * will thus fail.  The end-users however have an escape hatch
    -+	 * to set GIT_DEFAULT_HASH_ALGORITHM environment variable to
    -+	 * "sha1" get back the old behaviour of defaulting to SHA-1.
    ++	 * to set GIT_DEFAULT_HASH environment variable to "sha1" get
    ++	 * back the old behaviour of defaulting to SHA-1.
     +	 */
     +	if (repo == the_repository)
     +		set_default_hash_algo(repo);
      }
      
      static void expand_base_dir(char **out, const char *in,
    +
    + ## setup.c ##
    +@@ setup.c: int daemonize(void)
    + #define TEST_FILEMODE 1
    + #endif
    + 
    +-#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
    +-
    + static void copy_templates_1(struct strbuf *path, struct strbuf *template_path,
    + 			     DIR *dir)
    + {
2:  c40b2bc7b5 ! 2:  e56ae68961 t1517: test commands that are designed to be run outside repository
    @@ t/t1517-outside-repo.sh (new)
     +TEST_PASSES_SANITIZE_LEAK=true
     +. ./test-lib.sh
     +
    -+# Do not use the end-user workaround for this test.
    -+# GIT_DEFAULT_HASH_ALGORITHM=sha1; export GIT_DEFAULT_HASH_ALGORITHM
    -+
     +test_expect_success 'set up a non-repo directory and test file' '
     +	GIT_CEILING_DIRECTORIES=$(pwd) &&
     +	export GIT_CEILING_DIRECTORIES &&
3:  48d83fe1fd ! 3:  e1ae0e95fc builtin/patch-id: fix uninitialized hash function
    @@ builtin/patch-id.c: int cmd_patch_id(int argc, const char **argv, const char *pr
     +	 * NEEDSWORK: This hack should be removed in favor of converting
     +	 * the code that computes patch IDs to always use SHA1.
     +	 */
    -+	if (!startup_info->have_repository)
    ++	if (!the_hash_algo)
     +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
     +
      	generate_id_list(opts ? opts > 1 : config.stable,
4:  a5bf0dfb0a ! 4:  1c8ce0d024 builtin/hash-object: fix uninitialized hash function
    @@ Commit message
         SHA1 as the default object hash, 2024-05-07), this will make us hit an
         uninitialized hash function, which subsequently leads to a segfault.
     
    -    Fix this by falling back to SHA-1 explicitly when running outside of a
    -    Git repository. Users can use GIT_DEFAULT_HASH_ALGORITHM environment to
    -    specify what hash algorithm they want, so arguably this code should not
    -    be needed.
    +    Fix this by falling back to SHA-1 explicitly when running outside of
    +    a Git repository. Users can use GIT_DEFAULT_HASH environment to
    +    specify what hash algorithm they want, so arguably this code should
    +    not be needed.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
5:  2b9e486c3f = 5:  5b0ec43757 apply: fix uninitialized hash function

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-14  1:14 ` [PATCH v4 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
@ 2024-05-14  1:14   ` Junio C Hamano
  2024-05-14  4:32     ` Patrick Steinhardt
  2024-05-14 17:19     ` Junio C Hamano
  2024-05-14  1:14   ` [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository Junio C Hamano
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14  1:14 UTC (permalink / raw)
  To: git

Partially revert c8aed5e8 (repository: stop setting SHA1 as the
default object hash, 2024-05-07), to keep end-user systems still
broken when we have gap in our test coverage but yet give them an
escape hatch to set the GIT_DEFAULT_HASH environment variable to
"sha1" in order to revert to the previous behaviour.  This variable
has been in use for using SHA-256 hash by default, and it should be
a better fit than inventing a new and test-only knob.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 environment.h |  1 +
 repository.c  | 40 ++++++++++++++++++++++++++++++++++++++++
 setup.c       |  2 --
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/environment.h b/environment.h
index 05fd94d7be..deaa29408f 100644
--- a/environment.h
+++ b/environment.h
@@ -56,6 +56,7 @@ const char *getenv_safe(struct strvec *argv, const char *name);
 #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
 #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
 #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
+#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
 
 /*
  * Environment variable used in handshaking the wire protocol.
diff --git a/repository.c b/repository.c
index 15c10015b0..f912ee9a7c 100644
--- a/repository.c
+++ b/repository.c
@@ -1,5 +1,6 @@
 #include "git-compat-util.h"
 #include "abspath.h"
+#include "environment.h"
 #include "repository.h"
 #include "object-store-ll.h"
 #include "config.h"
@@ -19,6 +20,27 @@
 static struct repository the_repo;
 struct repository *the_repository = &the_repo;
 
+static void set_default_hash_algo(struct repository *repo)
+{
+	const char *hash_name;
+	int algo;
+
+	hash_name = getenv(GIT_DEFAULT_HASH_ENVIRONMENT);
+	if (!hash_name)
+		return;
+	algo = hash_algo_by_name(hash_name);
+
+	/*
+	 * NEEDSWORK: after all, falling back to SHA-1 by assigning
+	 * GIT_HASH_SHA1 to algo here, instead of returning, may give
+	 * us better behaviour.
+	 */
+	if (algo == GIT_HASH_UNKNOWN)
+		return;
+
+	repo_set_hash_algo(repo, algo);
+}
+
 void initialize_repository(struct repository *repo)
 {
 	repo->objects = raw_object_store_new();
@@ -26,6 +48,24 @@ void initialize_repository(struct repository *repo)
 	repo->parsed_objects = parsed_object_pool_new();
 	ALLOC_ARRAY(repo->index, 1);
 	index_state_init(repo->index, repo);
+
+	/*
+	 * When a command runs inside a repository, it learns what
+	 * hash algorithm is in use from the repository, but some
+	 * commands are designed to work outside a repository, yet
+	 * they want to access the_hash_algo, if only for the length
+	 * of the hashed value to see if their input looks like a
+	 * plausible hash value.
+	 *
+	 * We are in the process of identifying the codepaths and
+	 * giving them an appropriate default individually; any
+	 * unconverted codepath that tries to access the_hash_algo
+	 * will thus fail.  The end-users however have an escape hatch
+	 * to set GIT_DEFAULT_HASH environment variable to "sha1" get
+	 * back the old behaviour of defaulting to SHA-1.
+	 */
+	if (repo == the_repository)
+		set_default_hash_algo(repo);
 }
 
 static void expand_base_dir(char **out, const char *in,
diff --git a/setup.c b/setup.c
index 7c996659bd..d084703465 100644
--- a/setup.c
+++ b/setup.c
@@ -1840,8 +1840,6 @@ int daemonize(void)
 #define TEST_FILEMODE 1
 #endif
 
-#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
-
 static void copy_templates_1(struct strbuf *path, struct strbuf *template_path,
 			     DIR *dir)
 {
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository
  2024-05-14  1:14 ` [PATCH v4 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
  2024-05-14  1:14   ` [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
@ 2024-05-14  1:14   ` Junio C Hamano
  2024-05-14  4:32     ` Patrick Steinhardt
  2024-05-14  1:14   ` [PATCH v4 3/5] builtin/patch-id: fix uninitialized hash function Junio C Hamano
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14  1:14 UTC (permalink / raw)
  To: git

A few commands, like "git apply" and "git patch-id", have been
broken with a recent change to stop setting the default hash
algorithm to SHA-1.  Test them and fix them in later commits.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t1517-outside-repo.sh | 61 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100755 t/t1517-outside-repo.sh

diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
new file mode 100755
index 0000000000..e0fd495ec1
--- /dev/null
+++ b/t/t1517-outside-repo.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='check random commands outside repo'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'set up a non-repo directory and test file' '
+	GIT_CEILING_DIRECTORIES=$(pwd) &&
+	export GIT_CEILING_DIRECTORIES &&
+	mkdir non-repo &&
+	(
+		cd non-repo &&
+		# confirm that git does not find a repo
+		test_must_fail git rev-parse --git-dir
+	) &&
+	test_write_lines one two three four >nums &&
+	git add nums &&
+	cp nums nums.old &&
+	test_write_lines five >>nums &&
+	git diff >sample.patch
+'
+
+test_expect_failure 'compute a patch-id outside repository' '
+	git patch-id <sample.patch >patch-id.expect &&
+	(
+		cd non-repo &&
+		git patch-id <../sample.patch >../patch-id.actual
+	) &&
+	test_cmp patch-id.expect patch-id.actual
+'
+
+test_expect_failure 'hash-object outside repository' '
+	git hash-object --stdin <sample.patch >hash.expect &&
+	(
+		cd non-repo &&
+		git hash-object --stdin <../sample.patch >../hash.actual
+	) &&
+	test_cmp hash.expect hash.actual
+'
+
+test_expect_failure 'apply a patch outside repository' '
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git apply ../sample.patch
+	) &&
+	test_cmp nums non-repo/nums
+'
+
+test_expect_success 'grep outside repository' '
+	git grep --cached two >expect &&
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git grep --no-index two >../actual
+	) &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 3/5] builtin/patch-id: fix uninitialized hash function
  2024-05-14  1:14 ` [PATCH v4 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
  2024-05-14  1:14   ` [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
  2024-05-14  1:14   ` [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository Junio C Hamano
@ 2024-05-14  1:14   ` Junio C Hamano
  2024-05-14  1:14   ` [PATCH v4 4/5] builtin/hash-object: " Junio C Hamano
  2024-05-14  1:14   ` [PATCH v4 5/5] apply: " Junio C Hamano
  4 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14  1:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

From: Patrick Steinhardt <ps@pks.im>

In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have adapted `initialize_repository()` to no longer set
up a default hash function. As this function is also used to set up
`the_repository`, the consequence is that `the_hash_algo` will now by
default be a `NULL` pointer unless the hash algorithm was configured
properly. This is done as a mechanism to detect cases where we may be
using the wrong hash function by accident.

This change now causes git-patch-id(1) to segfault when it's run outside
of a repository. As this command can read diffs from stdin, it does not
necessarily need a repository, but then relies on `the_hash_algo` to
compute the patch ID itself.

It is somewhat dubious that git-patch-id(1) relies on `the_hash_algo` in
the first place. Quoting its manpage:

    A "patch ID" is nothing but a sum of SHA-1 of the file diffs
    associated with a patch, with line numbers ignored. As such, it’s
    "reasonably stable", but at the same time also reasonably unique,
    i.e., two patches that have the same "patch ID" are almost
    guaranteed to be the same thing.

We explicitly document patch IDs to be using SHA-1. Furthermore, patch
IDs are supposed to be stable for most of the part. But even with the
same input, the patch IDs will now be different depending on the repo's
configured object hash.

Work around the issue by setting up SHA-1 when there was no startup
repository for now. This is arguably not the correct fix, but for now we
rather want to focus on getting the segfault fixed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/patch-id.c      | 13 +++++++++++++
 t/t1517-outside-repo.sh |  2 +-
 t/t4204-patch-id.sh     | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 3894d2b970..583099cacf 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -5,6 +5,7 @@
 #include "hash.h"
 #include "hex.h"
 #include "parse-options.h"
+#include "setup.h"
 
 static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
@@ -237,6 +238,18 @@ int cmd_patch_id(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
 			     patch_id_usage, 0);
 
+	/*
+	 * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
+	 * it means that the hash algorithm now depends on the object hash of
+	 * the repository, even though git-patch-id(1) clearly defines that
+	 * patch IDs always use SHA1.
+	 *
+	 * NEEDSWORK: This hack should be removed in favor of converting
+	 * the code that computes patch IDs to always use SHA1.
+	 */
+	if (!the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	generate_id_list(opts ? opts > 1 : config.stable,
 			 opts ? opts == 3 : config.verbatim);
 	return 0;
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index e0fd495ec1..ac5f3191cc 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -21,7 +21,7 @@ test_expect_success 'set up a non-repo directory and test file' '
 	git diff >sample.patch
 '
 
-test_expect_failure 'compute a patch-id outside repository' '
+test_expect_success 'compute a patch-id outside repository' '
 	git patch-id <sample.patch >patch-id.expect &&
 	(
 		cd non-repo &&
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index a7fa94ce0a..605faea0c7 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -310,4 +310,38 @@ test_expect_success 'patch-id handles diffs with one line of before/after' '
 	test_config patchid.stable true &&
 	calc_patch_id diffu1stable <diffu1
 '
+
+test_expect_failure 'patch-id computes same ID with different object hashes' '
+	test_when_finished "rm -rf repo-sha1 repo-sha256" &&
+
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+
+	git init --object-format=sha1 repo-sha1 &&
+	git -C repo-sha1 patch-id <diff >patch-id-sha1 &&
+	git init --object-format=sha256 repo-sha256 &&
+	git -C repo-sha256 patch-id <diff >patch-id-sha256 &&
+	test_cmp patch-id-sha1 patch-id-sha256
+'
+
+test_expect_success 'patch-id without repository' '
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+	nongit git patch-id <diff
+'
+
 test_done
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 4/5] builtin/hash-object: fix uninitialized hash function
  2024-05-14  1:14 ` [PATCH v4 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
                     ` (2 preceding siblings ...)
  2024-05-14  1:14   ` [PATCH v4 3/5] builtin/patch-id: fix uninitialized hash function Junio C Hamano
@ 2024-05-14  1:14   ` Junio C Hamano
  2024-05-17 23:49     ` Junio C Hamano
  2024-05-14  1:14   ` [PATCH v4 5/5] apply: " Junio C Hamano
  4 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14  1:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

From: Patrick Steinhardt <ps@pks.im>

The git-hash-object(1) command allows users to hash an object even
without a repository. Starting with c8aed5e8da (repository: stop setting
SHA1 as the default object hash, 2024-05-07), this will make us hit an
uninitialized hash function, which subsequently leads to a segfault.

Fix this by falling back to SHA-1 explicitly when running outside of
a Git repository. Users can use GIT_DEFAULT_HASH environment to
specify what hash algorithm they want, so arguably this code should
not be needed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/hash-object.c   | 3 +++
 t/t1007-hash-object.sh  | 6 ++++++
 t/t1517-outside-repo.sh | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 82ca6d2bfd..c767414a0c 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -123,6 +123,9 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	else
 		prefix = setup_git_directory_gently(&nongit);
 
+	if (nongit && !the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	if (vpath && prefix) {
 		vpath_free = prefix_filename(prefix, vpath);
 		vpath = vpath_free;
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64aea38486..4c138c6ca4 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -260,4 +260,10 @@ test_expect_success '--literally with extra-long type' '
 	echo example | git hash-object -t $t --literally --stdin
 '
 
+test_expect_success '--stdin outside of repository' '
+	nongit git hash-object --stdin <hello >actual &&
+	echo "$(test_oid hello)" >expect &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index ac5f3191cc..854bb8f343 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -30,7 +30,7 @@ test_expect_success 'compute a patch-id outside repository' '
 	test_cmp patch-id.expect patch-id.actual
 '
 
-test_expect_failure 'hash-object outside repository' '
+test_expect_success 'hash-object outside repository' '
 	git hash-object --stdin <sample.patch >hash.expect &&
 	(
 		cd non-repo &&
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v4 5/5] apply: fix uninitialized hash function
  2024-05-14  1:14 ` [PATCH v4 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
                     ` (3 preceding siblings ...)
  2024-05-14  1:14   ` [PATCH v4 4/5] builtin/hash-object: " Junio C Hamano
@ 2024-05-14  1:14   ` Junio C Hamano
  4 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14  1:14 UTC (permalink / raw)
  To: git

"git apply" can work outside a repository as a better "GNU patch",
but when it does so, it still assumed that it can access
the_hash_algo, which is no longer true in the new world order.

Make sure we explicitly fall back to SHA-1 algorithm for backward
compatibility.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/apply.c         | 4 ++++
 t/t1517-outside-repo.sh | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 861a01910c..e9175f820f 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -1,6 +1,7 @@
 #include "builtin.h"
 #include "gettext.h"
 #include "repository.h"
+#include "hash.h"
 #include "apply.h"
 
 static const char * const apply_usage[] = {
@@ -18,6 +19,9 @@ int cmd_apply(int argc, const char **argv, const char *prefix)
 	if (init_apply_state(&state, the_repository, prefix))
 		exit(128);
 
+	if (!the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	argc = apply_parse_options(argc, argv,
 				   &state, &force_apply, &options,
 				   apply_usage);
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index 854bb8f343..6f32a40c6d 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -39,7 +39,7 @@ test_expect_success 'hash-object outside repository' '
 	test_cmp hash.expect hash.actual
 '
 
-test_expect_failure 'apply a patch outside repository' '
+test_expect_success 'apply a patch outside repository' '
 	(
 		cd non-repo &&
 		cp ../nums.old nums &&
-- 
2.45.0-145-g3e4a232f6e


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 3/5] builtin/patch-id: fix uninitialized hash function
  2024-05-13 23:11     ` Junio C Hamano
@ 2024-05-14  4:31       ` Patrick Steinhardt
  2024-05-14 15:52         ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-14  4:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1411 bytes --]

On Mon, May 13, 2024 at 04:11:01PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
[snip]
> > +	/*
> > +	 * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
> > +	 * it means that the hash algorithm now depends on the object hash of
> > +	 * the repository, even though git-patch-id(1) clearly defines that
> > +	 * patch IDs always use SHA1.
> > +	 *
> > +	 * NEEDSWORK: This hack should be removed in favor of converting
> > +	 * the code that computes patch IDs to always use SHA1.
> > +	 */
> > +	if (!startup_info->have_repository)
> > +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> 
> Hmph, in other places I did
> 
> 	if (!the_hash_algo)
> 		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> 
> to find the case where we need a reasonable default.
> 
> Is there a practical difference?  If there isn't we should
> standardise one and use the same test consistently everywhere.
> 
> Not that it matters for this particular case, where we in the longer
> term should be hardcoding the use of SHA-1 even in SHA-256 repository
> for the pupose of computing the patch-id.

To the best of my knowledge there isn't. What I prefer about my approach
is that it explicitly points out that this is conditional on whether or
not we have a repository. But in the end I don't mind much which of both
versions we use.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-14  1:14   ` [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
@ 2024-05-14  4:32     ` Patrick Steinhardt
  2024-05-14 15:05       ` Junio C Hamano
  2024-05-14 17:19     ` Junio C Hamano
  1 sibling, 1 reply; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-14  4:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1762 bytes --]

On Mon, May 13, 2024 at 06:14:33PM -0700, Junio C Hamano wrote:
[snip]
> diff --git a/repository.c b/repository.c
> index 15c10015b0..f912ee9a7c 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -1,5 +1,6 @@
>  #include "git-compat-util.h"
>  #include "abspath.h"
> +#include "environment.h"
>  #include "repository.h"
>  #include "object-store-ll.h"
>  #include "config.h"
> @@ -19,6 +20,27 @@
>  static struct repository the_repo;
>  struct repository *the_repository = &the_repo;
>  
> +static void set_default_hash_algo(struct repository *repo)
> +{
> +	const char *hash_name;
> +	int algo;
> +
> +	hash_name = getenv(GIT_DEFAULT_HASH_ENVIRONMENT);
> +	if (!hash_name)
> +		return;
> +	algo = hash_algo_by_name(hash_name);
> +
> +	/*
> +	 * NEEDSWORK: after all, falling back to SHA-1 by assigning
> +	 * GIT_HASH_SHA1 to algo here, instead of returning, may give
> +	 * us better behaviour.
> +	 */
> +	if (algo == GIT_HASH_UNKNOWN)
> +		return;
> +
> +	repo_set_hash_algo(repo, algo);
> +}

The problem with reusing "GIT_DEFAULT_HASH" is that we unconditionally
set it in our test suite in "test-lib.sh". This will have the effect
that we will never hit segfaults in our tests because we always end up
setting up the default hash, whereas our users now will.

I would propose to revert this back to the first iteration you had,
where the workaround only enables the SHA1 fallback. No users have yet
complained about the inability to pick the hash algo outside of a repo,
indicating that it's not widely used. And when they complain, there is
more motivation to fix this properly by adding a `--object-hash=` switch
to the respective commands so that a user can pick the desired object
hash.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository
  2024-05-14  1:14   ` [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository Junio C Hamano
@ 2024-05-14  4:32     ` Patrick Steinhardt
  2024-05-14 15:08       ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-14  4:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1506 bytes --]

On Mon, May 13, 2024 at 06:14:34PM -0700, Junio C Hamano wrote:
> A few commands, like "git apply" and "git patch-id", have been
> broken with a recent change to stop setting the default hash
> algorithm to SHA-1.  Test them and fix them in later commits.

Is there a specific reason why this needs a whole patch suite, as
opposed to adding the tests to the respective test suites of the
commands?

> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  t/t1517-outside-repo.sh | 61 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
>  create mode 100755 t/t1517-outside-repo.sh
> 
> diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
> new file mode 100755
> index 0000000000..e0fd495ec1
> --- /dev/null
> +++ b/t/t1517-outside-repo.sh
> @@ -0,0 +1,61 @@
> +#!/bin/sh
> +
> +test_description='check random commands outside repo'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +test_expect_success 'set up a non-repo directory and test file' '
> +	GIT_CEILING_DIRECTORIES=$(pwd) &&
> +	export GIT_CEILING_DIRECTORIES &&
> +	mkdir non-repo &&
> +	(
> +		cd non-repo &&
> +		# confirm that git does not find a repo
> +		test_must_fail git rev-parse --git-dir
> +	) &&
> +	test_write_lines one two three four >nums &&
> +	git add nums &&
> +	cp nums nums.old &&
> +	test_write_lines five >>nums &&
> +	git diff >sample.patch
> +'

We have the "nongit" command that does most of this for us.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 4/5] builtin/hash-object: fix uninitialized hash function
  2024-05-13 23:13     ` Junio C Hamano
@ 2024-05-14  4:32       ` Patrick Steinhardt
  2024-05-14 15:55         ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-14  4:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2370 bytes --]

On Mon, May 13, 2024 at 04:13:19PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > From: Patrick Steinhardt <ps@pks.im>
> >
> > The git-hash-object(1) command allows users to hash an object even
> > without a repository. Starting with c8aed5e8da (repository: stop setting
> > SHA1 as the default object hash, 2024-05-07), this will make us hit an
> > uninitialized hash function, which subsequently leads to a segfault.
> >
> > Fix this by falling back to SHA-1 explicitly when running outside of a
> > Git repository. Users can use GIT_DEFAULT_HASH_ALGORITHM environment to
> > specify what hash algorithm they want, so arguably this code should not
> > be needed.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > ---
> >  builtin/hash-object.c   | 3 +++
> >  t/t1007-hash-object.sh  | 6 ++++++
> >  t/t1517-outside-repo.sh | 2 +-
> >  3 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/hash-object.c b/builtin/hash-object.c
> > index 82ca6d2bfd..c767414a0c 100644
> > --- a/builtin/hash-object.c
> > +++ b/builtin/hash-object.c
> > @@ -123,6 +123,9 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
> >  	else
> >  		prefix = setup_git_directory_gently(&nongit);
> >  
> > +	if (nongit && !the_hash_algo)
> > +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> 
> This is slightly different from your version, which always used SHA-1
> when nongit is true, in the hope that it would make it more useful if
> the user can say "GIT_DEFAULT_HASH_ALGORITHM=sha256 git hash-objects"
> outside a repository.  I am not 100% convinced it is better or we
> rather should aim for predictability that you'd always and only get
> SHA-1 outside a repository.

I'd prefer the latter -- always use SHA-1. As you say, it's easier to
understand and doesn't create implicit mechanisms that we'll have to
maintain going forward. Also, users didn't have a desire yet to pick a
different algorithm than SHA-1, which probably also comes from the fact
that SHA-256 repositories are still scarce.

Eventually, we should then add a new option `--object-hash=` to
git-hash-object(1) and other commands that may run outside of a Git
repository to let the user pick their desired hash.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-14  4:32     ` Patrick Steinhardt
@ 2024-05-14 15:05       ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14 15:05 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> The problem with reusing "GIT_DEFAULT_HASH" is that we unconditionally
> set it in our test suite in "test-lib.sh". This will have the effect
> that we will never hit segfaults in our tests because we always end up
> setting up the default hash, whereas our users now will.

I tried the version you posted with only SHA-1 fallback and it
failed with SHA256 CI jobs, and eventually came up with using the
existing environment variable.

The variable is the perfect fit in the longer term for our purpose.
It is what the end-users will set, not for papering over remaining
bugs from the "no longer there is a fallback default" change, but
for telling Git that they want to use sha256 repositories.

With GIT_DEFAULT_HASH set to sha256, we will be testing a
configuration that is very close to those end-users.

There probably are four cases we need to check:

 - In a repository, invocations of commands that should honor the
   hash function that is in use by the repository (i.e. this is true
   for most commands and their invocations).

 - In a repository, invocations of commands that should ignore the
   repository settings and always use SHA-1 (i.e. proposed fix to
   patch-id).

 - Outside a repository, invocations with GIT_DEFAULT_HASH set
   should probably parallel the above two, as if GIT_DEFAULT_HASH
   came from the repository.

 - Outside a repository, invocations without GIT_DEFAULT_HASH set
   should all default to SHA-1???

The last combination cannot be tested UNLESS we unset GIT_DEFAULT_HASH
that is given by test-lib.sh, but that can easily be arranged, now
we have one primary/central place that tests out-of-repository
invocations of various commands in t1517.

And using the existing environment variable has an added benefit
that we do not have to add an extra option to random commands.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository
  2024-05-14  4:32     ` Patrick Steinhardt
@ 2024-05-14 15:08       ` Junio C Hamano
  2024-05-15 12:24         ` Patrick Steinhardt
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14 15:08 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> Is there a specific reason why this needs a whole patch suite, as
> opposed to adding the tests to the respective test suites of the
> commands?

Yes, testing out-of-repository operations needs certain trick and
people forget to write such tests using the GIT_CEILING_DIRECTORIES
mechanism.  Having one place where we have an enumeration of
commands that are designed to be usable outside repository is a
handy way to make sure that we have enough test coverage.  It would
make it easy to control how GIT_DEFAULT_HASH environment is set
during these tests to have them in all one place.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 3/5] builtin/patch-id: fix uninitialized hash function
  2024-05-14  4:31       ` Patrick Steinhardt
@ 2024-05-14 15:52         ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14 15:52 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

>> Hmph, in other places I did
>> 
>> 	if (!the_hash_algo)
>> 		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
>> 
>> to find the case where we need a reasonable default.
>> 
>> Is there a practical difference?  If there isn't we should
>> standardise one and use the same test consistently everywhere.
>> ...
>
> To the best of my knowledge there isn't. What I prefer about my approach
> is that it explicitly points out that this is conditional on whether or
> not we have a repository. But in the end I don't mind much which of both
> versions we use.

Ah, that makes sense, and it is quite subjective but makes certain
sense.

The reason I prefered to check "the_hash_algo" is very much the
opposite.  In this particular decision to call (or not call)
set_hash_algo(), we only care if the_hash_algo is not yet set, and
that is why the_hash_algo is checked.

Specifically, we do not care *why* it is still unset; in the current
codebase, the most likely reason why we do not have the_hash_algo
set might be that we haven't found the repository yet, but we do not
have to rely on that assumption to hold true.  It would help
maintainability into the future where the_hash_algo is set already
before we come here when outside a repository, or vice versa.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v3 4/5] builtin/hash-object: fix uninitialized hash function
  2024-05-14  4:32       ` Patrick Steinhardt
@ 2024-05-14 15:55         ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14 15:55 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> I'd prefer the latter -- always use SHA-1. As you say, it's easier to
> understand and doesn't create implicit mechanisms that we'll have to
> maintain going forward. Also, users didn't have a desire yet to pick a
> different algorithm than SHA-1, which probably also comes from the fact
> that SHA-256 repositories are still scarce.
> ...
> Eventually, we should then add a new option `--object-hash=` to
> git-hash-object(1) and other commands that may run outside of a Git
> repository to let the user pick their desired hash.

If we were planning to allow using something other than SHA-1 in the
future, then I do not think I agree with your argument.  We can use
the same GIT_DEFAULT_HASH mechanism to specify what hash to use without
later adding an extra option just fine.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-14  1:14   ` [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
  2024-05-14  4:32     ` Patrick Steinhardt
@ 2024-05-14 17:19     ` Junio C Hamano
  2024-05-15 12:23       ` Patrick Steinhardt
  2024-05-16 15:31       ` Junio C Hamano
  1 sibling, 2 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-14 17:19 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster@pobox.com> writes:

> Partially revert c8aed5e8 (repository: stop setting SHA1 as the
> default object hash, 2024-05-07), to keep end-user systems still
> broken when we have gap in our test coverage but yet give them an
> escape hatch to set the GIT_DEFAULT_HASH environment variable to
> "sha1" in order to revert to the previous behaviour.  This variable
> has been in use for using SHA-256 hash by default, and it should be
> a better fit than inventing a new and test-only knob.

Having done all of this, I actually am very tempted to add the
"always default to SHA-1" back as a fallback position to the
set_default_hash_algo() function.  We know we are going to get the
right hash algorithm when working in the repository, so the only
case the default matters in practice is when working outside the
repository.

We already have such a custom code for "git diff --no-index", and we
are adding a few more back in here, but they can disappear if we had
code to set the fallback default when GIT_DEFAULT_HASH does not
exist here.  The "always use SHA-1 regardless of the hash used by
the repository" code like "patch-id" should not depend on such a
fallback default but should have its own code to explicitly set it.

As the user can tweak what algorithm they want if the wanted to, it
does not sound too bad to have a fallback default when the user did
not choose.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-14 17:19     ` Junio C Hamano
@ 2024-05-15 12:23       ` Patrick Steinhardt
  2024-05-16 15:31       ` Junio C Hamano
  1 sibling, 0 replies; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-15 12:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2361 bytes --]

On Tue, May 14, 2024 at 10:19:01AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Partially revert c8aed5e8 (repository: stop setting SHA1 as the
> > default object hash, 2024-05-07), to keep end-user systems still
> > broken when we have gap in our test coverage but yet give them an
> > escape hatch to set the GIT_DEFAULT_HASH environment variable to
> > "sha1" in order to revert to the previous behaviour.  This variable
> > has been in use for using SHA-256 hash by default, and it should be
> > a better fit than inventing a new and test-only knob.
> 
> Having done all of this, I actually am very tempted to add the
> "always default to SHA-1" back as a fallback position to the
> set_default_hash_algo() function.  We know we are going to get the
> right hash algorithm when working in the repository, so the only
> case the default matters in practice is when working outside the
> repository.
> 
> We already have such a custom code for "git diff --no-index", and we
> are adding a few more back in here, but they can disappear if we had
> code to set the fallback default when GIT_DEFAULT_HASH does not
> exist here.  The "always use SHA-1 regardless of the hash used by
> the repository" code like "patch-id" should not depend on such a
> fallback default but should have its own code to explicitly set it.
> 
> As the user can tweak what algorithm they want if the wanted to, it
> does not sound too bad to have a fallback default when the user did
> not choose.

I think that this is going into the wrong direction. The patches that we
have added surface real issues. If we now unconditionally add back the
fallback to SHA-1, then we only punt those issues further down the road
to the point in time where we drop `the_hash_algo` and/or
`the_repository`. All the issues that were surfaced until now are
technical debt, and the proposed fixes have been documented with a
"TODO" item that they need more work.

In some cases it's as simple as adding in an option to the respective
command that lets the user pick the hash algorithm, and I do not think
that GIT_DEFAULT_HASH is a proper replacement for that. In other cases
like for git-patch-id(1) the issue runs deeper and we need to refactor
the whole subsystem to stop relying on `the_hash_algo` altogether.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository
  2024-05-14 15:08       ` Junio C Hamano
@ 2024-05-15 12:24         ` Patrick Steinhardt
  2024-05-15 14:15           ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-15 12:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 887 bytes --]

On Tue, May 14, 2024 at 08:08:19AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Is there a specific reason why this needs a whole patch suite, as
> > opposed to adding the tests to the respective test suites of the
> > commands?
> 
> Yes, testing out-of-repository operations needs certain trick and
> people forget to write such tests using the GIT_CEILING_DIRECTORIES
> mechanism.  Having one place where we have an enumeration of
> commands that are designed to be usable outside repository is a
> handy way to make sure that we have enough test coverage.  It would
> make it easy to control how GIT_DEFAULT_HASH environment is set
> during these tests to have them in all one place.

We already have the "nogit" command that neatly encapsulates all of this
logic, so the trickery is contained in a single spot in practice.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository
  2024-05-15 12:24         ` Patrick Steinhardt
@ 2024-05-15 14:15           ` Junio C Hamano
  2024-05-15 14:25             ` Patrick Steinhardt
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-15 14:15 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> On Tue, May 14, 2024 at 08:08:19AM -0700, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> 
>> > Is there a specific reason why this needs a whole patch suite, as
>> > opposed to adding the tests to the respective test suites of the
>> > commands?
>> 
>> Yes, testing out-of-repository operations needs certain trick and
>> people forget to write such tests using the GIT_CEILING_DIRECTORIES
>> mechanism.  Having one place where we have an enumeration of
>> commands that are designed to be usable outside repository is a
>> handy way to make sure that we have enough test coverage.  It would
>> make it easy to control how GIT_DEFAULT_HASH environment is set
>> during these tests to have them in all one place.
>
> We already have the "nogit" command that neatly encapsulates all of this
> logic, so the trickery is contained in a single spot in practice.

Heh, you asked for "a" specific reason, and I listed three.  The
tests that are spread across many scripts make it harder to see if
we have enough coverage for the out-of-repository operations, and
the use of "nongit" helper does not change the equation, does it?


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository
  2024-05-15 14:15           ` Junio C Hamano
@ 2024-05-15 14:25             ` Patrick Steinhardt
  2024-05-15 15:40               ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-15 14:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1549 bytes --]

On Wed, May 15, 2024 at 07:15:49AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > On Tue, May 14, 2024 at 08:08:19AM -0700, Junio C Hamano wrote:
> >> Patrick Steinhardt <ps@pks.im> writes:
> >> 
> >> > Is there a specific reason why this needs a whole patch suite, as
> >> > opposed to adding the tests to the respective test suites of the
> >> > commands?
> >> 
> >> Yes, testing out-of-repository operations needs certain trick and
> >> people forget to write such tests using the GIT_CEILING_DIRECTORIES
> >> mechanism.  Having one place where we have an enumeration of
> >> commands that are designed to be usable outside repository is a
> >> handy way to make sure that we have enough test coverage.  It would
> >> make it easy to control how GIT_DEFAULT_HASH environment is set
> >> during these tests to have them in all one place.
> >
> > We already have the "nogit" command that neatly encapsulates all of this
> > logic, so the trickery is contained in a single spot in practice.
> 
> Heh, you asked for "a" specific reason, and I listed three.

Hm. I guess I got thrown off a bit as this was written in a single
paragraph, so my mind didn't process it as different reasons :)

> The tests that are spread across many scripts make it harder to see if
> we have enough coverage for the out-of-repository operations,

Put this way I in fact agree with you.

> and the use of "nongit" helper does not change the equation, does it?

No, it doesn't really.

Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository
  2024-05-15 14:25             ` Patrick Steinhardt
@ 2024-05-15 15:40               ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-15 15:40 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

>> The tests that are spread across many scripts make it harder to see if
>> we have enough coverage for the out-of-repository operations,
>
> Put this way I in fact agree with you.
>
>> and the use of "nongit" helper does not change the equation, does it?
>
> No, it doesn't really.

Of course the other side of the coin is that having all tests about
a single command (say, "git mailmap") in the same script for both
in-repo and out-of-repo operations is also handy in a different way.

I haven't found a good way to balance the two.  Obviously duplicating
the same test in two scripts (one collection for the same command,
the other collection out-of repo operation of various commands) is
something I really do want to avoid, so...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-14 17:19     ` Junio C Hamano
  2024-05-15 12:23       ` Patrick Steinhardt
@ 2024-05-16 15:31       ` Junio C Hamano
  1 sibling, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-16 15:31 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster@pobox.com> writes:

> Having done all of this, I actually am very tempted to add the
> "always default to SHA-1" back as a fallback position to the
> set_default_hash_algo() function.  We know we are going to get the
> right hash algorithm when working in the repository, so the only
> case the default matters in practice is when working outside the
> repository.

Not really.  It does not add anything to help either real world or
our tests.  The current test setting is already bad enough in that,
unlike in the real world settings, even tests with the SHA-1
algorithm has GIT_DEFAULT_HASH environment variable set, which means
that such a "if the environment variable is not set, further fall
back to SHA-1" does not do anything.

Unless we change t/test-lib.sh not to set GIT_DEFAULT_HASH tweaking
the fallback default in repository.c:set_default_hash_algo() based
on GIT_DEFAULT_HASH would not be a workable solution.

I wanted to arrange things so that the end-user exectuion by default
has an extra fallback (perhaps to SHA-1, or GIT_DEFAULT_HASH) to
avoid disrupting their real-world use, which we can disable in our
tests to expose code paths that still rely on the "default" set when
in-core repository struct gets initialized, but that is not possible
without changing the way t/test-lib.sh uses GIT_DEFAULT_HASH, it
seems.  So the arrangement unfortunately has to be "we have no
default, and bugs will break the real-world uses as well as tests
the same way.  The real-world users have to export an extra
'workaround' environment variable to force "default" to SHA-1 (or
GIT_DEFAULT_HASH) --- which may be "workable" but very far from being
intuitive.  They can set GIT_DEFAULT_HASH but to make it effective
everywhere, including the "default" given by set_default_hash_algo(),
they need to set this other "workaround" thing.

> We already have such a custom code for "git diff --no-index", and we
> are adding a few more back in here, but they can disappear if we had
> code to set the fallback default when GIT_DEFAULT_HASH does not
> exist here.

While I think a manual setting of the_hash_algo in "diff --no-index"
code path should not hardcode "SHA-1" but instead use the hash
specified by the GIT_DEFAULT_HASH environment to be consistent with
the use of "git" by the same parent process that had that variable
exported to the environment, that should not be done globally in
repository.c:set_default_hash_algo().

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 4/5] builtin/hash-object: fix uninitialized hash function
  2024-05-14  1:14   ` [PATCH v4 4/5] builtin/hash-object: " Junio C Hamano
@ 2024-05-17 23:49     ` Junio C Hamano
  2024-05-20 21:19       ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-17 23:49 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

Junio C Hamano <gitster@pobox.com> writes:

> From: Patrick Steinhardt <ps@pks.im>
>
> The git-hash-object(1) command allows users to hash an object even
> without a repository. Starting with c8aed5e8da (repository: stop setting
> SHA1 as the default object hash, 2024-05-07), this will make us hit an
> uninitialized hash function, which subsequently leads to a segfault.
>
> Fix this by falling back to SHA-1 explicitly when running outside of
> a Git repository. Users can use GIT_DEFAULT_HASH environment to
> specify what hash algorithm they want, so arguably this code should
> not be needed.

By the way, this breaks the expectation of t1007 and t1517 when run
with GIT_DEFAULT_HASH=sha256 (I recall that they passed with the
earlier iteration of my "fall back to GIT_DEFAULT_HASH" in [1/5],
but we decided abusing GIT_DEFAULT_HASH is a bad idea).

https://github.com/git/git/actions/runs/9135149292/job/25122031380

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 4/5] builtin/hash-object: fix uninitialized hash function
  2024-05-17 23:49     ` Junio C Hamano
@ 2024-05-20 21:19       ` Junio C Hamano
  2024-05-20 22:45         ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-20 21:19 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> From: Patrick Steinhardt <ps@pks.im>
>>
>> The git-hash-object(1) command allows users to hash an object even
>> without a repository. Starting with c8aed5e8da (repository: stop setting
>> SHA1 as the default object hash, 2024-05-07), this will make us hit an
>> uninitialized hash function, which subsequently leads to a segfault.
>>
>> Fix this by falling back to SHA-1 explicitly when running outside of
>> a Git repository. Users can use GIT_DEFAULT_HASH environment to
>> specify what hash algorithm they want, so arguably this code should
>> not be needed.
>
> By the way, this breaks the expectation of t1007 and t1517 when run
> with GIT_DEFAULT_HASH=sha256 (I recall that they passed with the
> earlier iteration of my "fall back to GIT_DEFAULT_HASH" in [1/5],
> but we decided abusing GIT_DEFAULT_HASH is a bad idea).

The breakage in 1517 turns out to be a thinko on my part.  Outside a
repository, we do use SHA-1 with our "if the_hash_algo is not set
yet, default to SHA-1" in patch-id and hash-object but the way I
prepared the expected output was to use whatever GIT_TEST_DEFAULT_HASH
was in use.  A fix would be to go outside a repository and force the
hash with GIT_DEFAULT_HASH to sha1 when preparing the expected output.

I haven't looked at the breakage in 1007 yet, though.

diff --git i/t/t1517-outside-repo.sh w/t/t1517-outside-repo.sh
index 6f32a40c6d..6363c8e3c4 100755
--- i/t/t1517-outside-repo.sh
+++ w/t/t1517-outside-repo.sh
@@ -21,22 +21,24 @@ test_expect_success 'set up a non-repo directory and test file' '
 	git diff >sample.patch
 '
 
-test_expect_success 'compute a patch-id outside repository' '
-	git patch-id <sample.patch >patch-id.expect &&
+test_expect_success 'compute a patch-id outside repository (default to SHA-1)' '
 	(
 		cd non-repo &&
-		git patch-id <../sample.patch >../patch-id.actual
-	) &&
-	test_cmp patch-id.expect patch-id.actual
+		GIT_DEFAULT_HASH=sha1 \
+		git patch-id <../sample.patch >patch-id.expect &&
+		git patch-id <../sample.patch >patch-id.actual &&
+		test_cmp patch-id.expect patch-id.actual
+	)
 '
 
-test_expect_success 'hash-object outside repository' '
-	git hash-object --stdin <sample.patch >hash.expect &&
+test_expect_success 'hash-object outside repository (default to SHA-1)' '
 	(
 		cd non-repo &&
-		git hash-object --stdin <../sample.patch >../hash.actual
-	) &&
-	test_cmp hash.expect hash.actual
+		GIT_DEFAULT_HASH=sha1 \
+		git hash-object --stdin <../sample.patch >hash.expect &&
+		git hash-object --stdin <../sample.patch >hash.actual &&
+		test_cmp hash.expect hash.actual
+	)
 '
 
 test_expect_success 'apply a patch outside repository' '


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v4 4/5] builtin/hash-object: fix uninitialized hash function
  2024-05-20 21:19       ` Junio C Hamano
@ 2024-05-20 22:45         ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-20 22:45 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

Junio C Hamano <gitster@pobox.com> writes:

> I haven't looked at the breakage in 1007 yet, though.

This turned out to be almost trivial.  A fix relative to an earlier
version is that the call to test_oid helper needs to explicitly ask
for SHA-1 variant, as the command invocation outside a repository
uses SHA-1 (not due to falling back to a hardcoded default, but by
an explicit fallback in the "git hash-object" itself.

I'll send a v5 of the whole series sometime later (if I have time it
may happen today but otherwise tomorrow).

Thanks.

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64aea38486..d73a5cc237 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -260,4 +260,10 @@ test_expect_success '--literally with extra-long type' '
 	echo example | git hash-object -t $t --literally --stdin
 '
 
+test_expect_success '--stdin outside of repository (uses SHA-1)' '
+	nongit git hash-object --stdin <hello >actual &&
+	echo "$(test_oid --hash=sha1 hello)" >expect &&
+	test_cmp expect actual
+'
+
 test_done

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 82ca6d2bfd..c767414a0c 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -123,6 +123,9 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	else
 		prefix = setup_git_directory_gently(&nongit);
 
+	if (nongit && !the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	if (vpath && prefix) {
 		vpath_free = prefix_filename(prefix, vpath);
 		vpath = vpath_free;

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v5 0/5] Fix use of uninitialized hash algorithms
  2024-05-13  7:15 [PATCH 0/2] Fix use of uninitialized hash algos Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2024-05-14  1:14 ` [PATCH v4 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
@ 2024-05-20 23:14 ` Junio C Hamano
  2024-05-20 23:14   ` [PATCH v5 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
                     ` (5 more replies)
  6 siblings, 6 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-20 23:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

A change recently merged to 'next' stops us from defaulting to using
SHA-1 unless other code (like a logic early in the start-up sequence
to see what hash is being used in the repository we are working in)
explicitly sets it, leading to a (deliberate) crash of "git" when we
forgot to cover certain code paths.

It turns out we have a few.  Notable ones are all operations that
are designed to work outside a repository.  We should go over all
such code paths and give them a reasonable default when there is one
available (e.g. for historical reasons, patch-id is documented to
work with SHA-1 hashes, so arguably it, or at least when it is
invoked with the "--stable" option, should do so everywhere, not
just in SHA-1 repositories, but in SHA-256 repository or outside any
repository).  In the meantime, if an end-user hits such a "bug"
before we can fix it, it would be nice to give them an escape hatch
to restore the historical behaviour of falling back to use SHA-1.

These patches are designed to apply on a merge of c8aed5e8
(repository: stop setting SHA1 as the default object hash,
2024-05-07) into 3e4a232f (The third batch, 2024-05-13), which has
been the same base throughout the past iterations.

In this fifth iteration:

 - The first step no longer falls back to GIT_DEFAULT_HASH; the
   escape hatch is a dedicated GIT_TEST_DEFAULT_HASH_ALGO
   environment variable, but hopefully we do not have to advertise
   it all that often.

 - The second step has been simplified somewhat to use the "nongit"
   helper when we only need to run a single "git" command in t1517.
   The way the expected output files were prepared in the previous
   versions did not correctly force use of SHA-1 algorithm, which
   has been corrected.  The third step and fourth step for t1517
   continue to be "flip expect_failure to expect_success", but you
   can see context differences in the range-diff.

 - The fourth step also has a fix for t1007 where the previous
   iterations did not correctly force use of SHA-1 to prepare the
   expected output.

Otherwise this round should be ready, modulo possible typoes.


Junio C Hamano (3):
  setup: add an escape hatch for "no more default hash algorithm" change
  t1517: test commands that are designed to be run outside repository
  apply: fix uninitialized hash function

Patrick Steinhardt (2):
  builtin/patch-id: fix uninitialized hash function
  builtin/hash-object: fix uninitialized hash function

 builtin/apply.c         |  4 +++
 builtin/hash-object.c   |  3 +++
 builtin/patch-id.c      | 13 +++++++++
 repository.c            | 44 ++++++++++++++++++++++++++++++
 t/t1007-hash-object.sh  |  6 +++++
 t/t1517-outside-repo.sh | 59 +++++++++++++++++++++++++++++++++++++++++
 t/t4204-patch-id.sh     | 34 ++++++++++++++++++++++++
 7 files changed, 163 insertions(+)
 create mode 100755 t/t1517-outside-repo.sh

-- 
2.45.1-216-g4365c6fcf9

1:  3038f73e1c ! 1:  f5e6868a61 setup: add an escape hatch for "no more default hash algorithm" change
    @@ Commit message
         Partially revert c8aed5e8 (repository: stop setting SHA1 as the
         default object hash, 2024-05-07), to keep end-user systems still
         broken when we have gap in our test coverage but yet give them an
    -    escape hatch to set the GIT_DEFAULT_HASH environment variable to
    -    "sha1" in order to revert to the previous behaviour.  This variable
    -    has been in use for using SHA-256 hash by default, and it should be
    -    a better fit than inventing a new and test-only knob.
    +    escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGO environment
    +    variable to "sha1" in order to revert to the previous behaviour.
     
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    +    Due to the way the end-user facing GIT_DEFAULT_HASH environment
    +    variable is used in our test suite, we unfortunately cannot reuse it
    +    for this purpose.
     
    - ## environment.h ##
    -@@ environment.h: const char *getenv_safe(struct strvec *argv, const char *name);
    - #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
    - #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
    - #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE"
    -+#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
    - 
    - /*
    -  * Environment variable used in handshaking the wire protocol.
    +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## repository.c ##
    -@@
    - #include "git-compat-util.h"
    - #include "abspath.h"
    -+#include "environment.h"
    - #include "repository.h"
    - #include "object-store-ll.h"
    - #include "config.h"
     @@
      static struct repository the_repo;
      struct repository *the_repository = &the_repo;
      
    ++/*
    ++ * An escape hatch: if we hit a bug in the production code that fails
    ++ * to set an appropriate hash algorithm (most likely to happen when
    ++ * running outside a repository), we can tell the user who reported
    ++ * the crash to set the environment variable to "sha1" (all lowercase)
    ++ * to revert to the historical behaviour of defaulting to SHA-1.
    ++ */
     +static void set_default_hash_algo(struct repository *repo)
     +{
     +	const char *hash_name;
     +	int algo;
     +
    -+	hash_name = getenv(GIT_DEFAULT_HASH_ENVIRONMENT);
    ++	hash_name = getenv("GIT_TEST_DEFAULT_HASH_ALGO");
     +	if (!hash_name)
     +		return;
     +	algo = hash_algo_by_name(hash_name);
    -+
    -+	/*
    -+	 * NEEDSWORK: after all, falling back to SHA-1 by assigning
    -+	 * GIT_HASH_SHA1 to algo here, instead of returning, may give
    -+	 * us better behaviour.
    -+	 */
     +	if (algo == GIT_HASH_UNKNOWN)
     +		return;
     +
    @@ repository.c: void initialize_repository(struct repository *repo)
     +	 * of the hashed value to see if their input looks like a
     +	 * plausible hash value.
     +	 *
    -+	 * We are in the process of identifying the codepaths and
    ++	 * We are in the process of identifying such code paths and
     +	 * giving them an appropriate default individually; any
    -+	 * unconverted codepath that tries to access the_hash_algo
    ++	 * unconverted code paths that try to access the_hash_algo
     +	 * will thus fail.  The end-users however have an escape hatch
    -+	 * to set GIT_DEFAULT_HASH environment variable to "sha1" get
    -+	 * back the old behaviour of defaulting to SHA-1.
    ++	 * to set GIT_TEST_DEFAULT_HASH_ALGO environment variable to
    ++	 * "sha1" to get back the old behaviour of defaulting to SHA-1.
    ++	 *
    ++	 * This escape hatch is deliberately kept unadvertised, so
    ++	 * that they see crashes and we can get a report before
    ++	 * telling them about it.
     +	 */
     +	if (repo == the_repository)
     +		set_default_hash_algo(repo);
      }
      
      static void expand_base_dir(char **out, const char *in,
    -
    - ## setup.c ##
    -@@ setup.c: int daemonize(void)
    - #define TEST_FILEMODE 1
    - #endif
    - 
    --#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
    --
    - static void copy_templates_1(struct strbuf *path, struct strbuf *template_path,
    - 			     DIR *dir)
    - {
2:  e56ae68961 ! 2:  f3f9bf0480 t1517: test commands that are designed to be run outside repository
    @@ t/t1517-outside-repo.sh (new)
     +	git diff >sample.patch
     +'
     +
    -+test_expect_failure 'compute a patch-id outside repository' '
    -+	git patch-id <sample.patch >patch-id.expect &&
    -+	(
    -+		cd non-repo &&
    -+		git patch-id <../sample.patch >../patch-id.actual
    -+	) &&
    ++test_expect_failure 'compute a patch-id outside repository (uses SHA-1)' '
    ++	nongit env GIT_DEFAULT_HASH=sha1 \
    ++		git patch-id <sample.patch >patch-id.expect &&
    ++	nongit \
    ++		git patch-id <sample.patch >patch-id.actual &&
     +	test_cmp patch-id.expect patch-id.actual
     +'
     +
    -+test_expect_failure 'hash-object outside repository' '
    -+	git hash-object --stdin <sample.patch >hash.expect &&
    -+	(
    -+		cd non-repo &&
    -+		git hash-object --stdin <../sample.patch >../hash.actual
    -+	) &&
    ++test_expect_failure 'hash-object outside repository (uses SHA-1)' '
    ++	nongit env GIT_DEFAULT_HASH=sha1 \
    ++		git hash-object --stdin <sample.patch >hash.expect &&
    ++	nongit \
    ++		git hash-object --stdin <sample.patch >hash.actual &&
     +	test_cmp hash.expect hash.actual
     +'
     +
3:  e1ae0e95fc ! 3:  246cf395d0 builtin/patch-id: fix uninitialized hash function
    @@ t/t1517-outside-repo.sh: test_expect_success 'set up a non-repo directory and te
      	git diff >sample.patch
      '
      
    --test_expect_failure 'compute a patch-id outside repository' '
    -+test_expect_success 'compute a patch-id outside repository' '
    - 	git patch-id <sample.patch >patch-id.expect &&
    - 	(
    - 		cd non-repo &&
    +-test_expect_failure 'compute a patch-id outside repository (uses SHA-1)' '
    ++test_expect_success 'compute a patch-id outside repository (uses SHA-1)' '
    + 	nongit env GIT_DEFAULT_HASH=sha1 \
    + 		git patch-id <sample.patch >patch-id.expect &&
    + 	nongit \
     
      ## t/t4204-patch-id.sh ##
     @@ t/t4204-patch-id.sh: test_expect_success 'patch-id handles diffs with one line of before/after' '
4:  1c8ce0d024 ! 4:  fe14490548 builtin/hash-object: fix uninitialized hash function
    @@ t/t1007-hash-object.sh: test_expect_success '--literally with extra-long type' '
      	echo example | git hash-object -t $t --literally --stdin
      '
      
    -+test_expect_success '--stdin outside of repository' '
    ++test_expect_success '--stdin outside of repository (uses SHA-1)' '
     +	nongit git hash-object --stdin <hello >actual &&
    -+	echo "$(test_oid hello)" >expect &&
    ++	echo "$(test_oid --hash=sha1 hello)" >expect &&
     +	test_cmp expect actual
     +'
     +
      test_done
     
      ## t/t1517-outside-repo.sh ##
    -@@ t/t1517-outside-repo.sh: test_expect_success 'compute a patch-id outside repository' '
    +@@ t/t1517-outside-repo.sh: test_expect_success 'compute a patch-id outside repository (uses SHA-1)' '
      	test_cmp patch-id.expect patch-id.actual
      '
      
    --test_expect_failure 'hash-object outside repository' '
    -+test_expect_success 'hash-object outside repository' '
    - 	git hash-object --stdin <sample.patch >hash.expect &&
    - 	(
    - 		cd non-repo &&
    +-test_expect_failure 'hash-object outside repository (uses SHA-1)' '
    ++test_expect_success 'hash-object outside repository (uses SHA-1)' '
    + 	nongit env GIT_DEFAULT_HASH=sha1 \
    + 		git hash-object --stdin <sample.patch >hash.expect &&
    + 	nongit \
5:  5b0ec43757 ! 5:  c6d5e68374 apply: fix uninitialized hash function
    @@ builtin/apply.c: int cmd_apply(int argc, const char **argv, const char *prefix)
      				   apply_usage);
     
      ## t/t1517-outside-repo.sh ##
    -@@ t/t1517-outside-repo.sh: test_expect_success 'hash-object outside repository' '
    +@@ t/t1517-outside-repo.sh: test_expect_success 'hash-object outside repository (uses SHA-1)' '
      	test_cmp hash.expect hash.actual
      '
      

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH v5 1/5] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-20 23:14 ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
@ 2024-05-20 23:14   ` Junio C Hamano
  2024-05-21  7:57     ` Patrick Steinhardt
  2024-05-20 23:14   ` [PATCH v5 2/5] t1517: test commands that are designed to be run outside repository Junio C Hamano
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-20 23:14 UTC (permalink / raw)
  To: git

Partially revert c8aed5e8 (repository: stop setting SHA1 as the
default object hash, 2024-05-07), to keep end-user systems still
broken when we have gap in our test coverage but yet give them an
escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGO environment
variable to "sha1" in order to revert to the previous behaviour.

Due to the way the end-user facing GIT_DEFAULT_HASH environment
variable is used in our test suite, we unfortunately cannot reuse it
for this purpose.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 repository.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/repository.c b/repository.c
index 15c10015b0..c62e329878 100644
--- a/repository.c
+++ b/repository.c
@@ -19,6 +19,28 @@
 static struct repository the_repo;
 struct repository *the_repository = &the_repo;
 
+/*
+ * An escape hatch: if we hit a bug in the production code that fails
+ * to set an appropriate hash algorithm (most likely to happen when
+ * running outside a repository), we can tell the user who reported
+ * the crash to set the environment variable to "sha1" (all lowercase)
+ * to revert to the historical behaviour of defaulting to SHA-1.
+ */
+static void set_default_hash_algo(struct repository *repo)
+{
+	const char *hash_name;
+	int algo;
+
+	hash_name = getenv("GIT_TEST_DEFAULT_HASH_ALGO");
+	if (!hash_name)
+		return;
+	algo = hash_algo_by_name(hash_name);
+	if (algo == GIT_HASH_UNKNOWN)
+		return;
+
+	repo_set_hash_algo(repo, algo);
+}
+
 void initialize_repository(struct repository *repo)
 {
 	repo->objects = raw_object_store_new();
@@ -26,6 +48,28 @@ void initialize_repository(struct repository *repo)
 	repo->parsed_objects = parsed_object_pool_new();
 	ALLOC_ARRAY(repo->index, 1);
 	index_state_init(repo->index, repo);
+
+	/*
+	 * When a command runs inside a repository, it learns what
+	 * hash algorithm is in use from the repository, but some
+	 * commands are designed to work outside a repository, yet
+	 * they want to access the_hash_algo, if only for the length
+	 * of the hashed value to see if their input looks like a
+	 * plausible hash value.
+	 *
+	 * We are in the process of identifying such code paths and
+	 * giving them an appropriate default individually; any
+	 * unconverted code paths that try to access the_hash_algo
+	 * will thus fail.  The end-users however have an escape hatch
+	 * to set GIT_TEST_DEFAULT_HASH_ALGO environment variable to
+	 * "sha1" to get back the old behaviour of defaulting to SHA-1.
+	 *
+	 * This escape hatch is deliberately kept unadvertised, so
+	 * that they see crashes and we can get a report before
+	 * telling them about it.
+	 */
+	if (repo == the_repository)
+		set_default_hash_algo(repo);
 }
 
 static void expand_base_dir(char **out, const char *in,
-- 
2.45.1-216-g4365c6fcf9


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v5 2/5] t1517: test commands that are designed to be run outside repository
  2024-05-20 23:14 ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
  2024-05-20 23:14   ` [PATCH v5 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
@ 2024-05-20 23:14   ` Junio C Hamano
  2024-05-20 23:14   ` [PATCH v5 3/5] builtin/patch-id: fix uninitialized hash function Junio C Hamano
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-20 23:14 UTC (permalink / raw)
  To: git

A few commands, like "git apply" and "git patch-id", have been
broken with a recent change to stop setting the default hash
algorithm to SHA-1.  Test them and fix them in later commits.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t1517-outside-repo.sh | 59 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)
 create mode 100755 t/t1517-outside-repo.sh

diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
new file mode 100755
index 0000000000..389974d9fb
--- /dev/null
+++ b/t/t1517-outside-repo.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+
+test_description='check random commands outside repo'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'set up a non-repo directory and test file' '
+	GIT_CEILING_DIRECTORIES=$(pwd) &&
+	export GIT_CEILING_DIRECTORIES &&
+	mkdir non-repo &&
+	(
+		cd non-repo &&
+		# confirm that git does not find a repo
+		test_must_fail git rev-parse --git-dir
+	) &&
+	test_write_lines one two three four >nums &&
+	git add nums &&
+	cp nums nums.old &&
+	test_write_lines five >>nums &&
+	git diff >sample.patch
+'
+
+test_expect_failure 'compute a patch-id outside repository (uses SHA-1)' '
+	nongit env GIT_DEFAULT_HASH=sha1 \
+		git patch-id <sample.patch >patch-id.expect &&
+	nongit \
+		git patch-id <sample.patch >patch-id.actual &&
+	test_cmp patch-id.expect patch-id.actual
+'
+
+test_expect_failure 'hash-object outside repository (uses SHA-1)' '
+	nongit env GIT_DEFAULT_HASH=sha1 \
+		git hash-object --stdin <sample.patch >hash.expect &&
+	nongit \
+		git hash-object --stdin <sample.patch >hash.actual &&
+	test_cmp hash.expect hash.actual
+'
+
+test_expect_failure 'apply a patch outside repository' '
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git apply ../sample.patch
+	) &&
+	test_cmp nums non-repo/nums
+'
+
+test_expect_success 'grep outside repository' '
+	git grep --cached two >expect &&
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git grep --no-index two >../actual
+	) &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.45.1-216-g4365c6fcf9


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v5 3/5] builtin/patch-id: fix uninitialized hash function
  2024-05-20 23:14 ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
  2024-05-20 23:14   ` [PATCH v5 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
  2024-05-20 23:14   ` [PATCH v5 2/5] t1517: test commands that are designed to be run outside repository Junio C Hamano
@ 2024-05-20 23:14   ` Junio C Hamano
  2024-05-20 23:14   ` [PATCH v5 4/5] builtin/hash-object: " Junio C Hamano
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-20 23:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

From: Patrick Steinhardt <ps@pks.im>

In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have adapted `initialize_repository()` to no longer set
up a default hash function. As this function is also used to set up
`the_repository`, the consequence is that `the_hash_algo` will now by
default be a `NULL` pointer unless the hash algorithm was configured
properly. This is done as a mechanism to detect cases where we may be
using the wrong hash function by accident.

This change now causes git-patch-id(1) to segfault when it's run outside
of a repository. As this command can read diffs from stdin, it does not
necessarily need a repository, but then relies on `the_hash_algo` to
compute the patch ID itself.

It is somewhat dubious that git-patch-id(1) relies on `the_hash_algo` in
the first place. Quoting its manpage:

    A "patch ID" is nothing but a sum of SHA-1 of the file diffs
    associated with a patch, with line numbers ignored. As such, it’s
    "reasonably stable", but at the same time also reasonably unique,
    i.e., two patches that have the same "patch ID" are almost
    guaranteed to be the same thing.

We explicitly document patch IDs to be using SHA-1. Furthermore, patch
IDs are supposed to be stable for most of the part. But even with the
same input, the patch IDs will now be different depending on the repo's
configured object hash.

Work around the issue by setting up SHA-1 when there was no startup
repository for now. This is arguably not the correct fix, but for now we
rather want to focus on getting the segfault fixed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/patch-id.c      | 13 +++++++++++++
 t/t1517-outside-repo.sh |  2 +-
 t/t4204-patch-id.sh     | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 3894d2b970..583099cacf 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -5,6 +5,7 @@
 #include "hash.h"
 #include "hex.h"
 #include "parse-options.h"
+#include "setup.h"
 
 static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
@@ -237,6 +238,18 @@ int cmd_patch_id(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
 			     patch_id_usage, 0);
 
+	/*
+	 * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
+	 * it means that the hash algorithm now depends on the object hash of
+	 * the repository, even though git-patch-id(1) clearly defines that
+	 * patch IDs always use SHA1.
+	 *
+	 * NEEDSWORK: This hack should be removed in favor of converting
+	 * the code that computes patch IDs to always use SHA1.
+	 */
+	if (!the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	generate_id_list(opts ? opts > 1 : config.stable,
 			 opts ? opts == 3 : config.verbatim);
 	return 0;
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index 389974d9fb..278ef57b3a 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -21,7 +21,7 @@ test_expect_success 'set up a non-repo directory and test file' '
 	git diff >sample.patch
 '
 
-test_expect_failure 'compute a patch-id outside repository (uses SHA-1)' '
+test_expect_success 'compute a patch-id outside repository (uses SHA-1)' '
 	nongit env GIT_DEFAULT_HASH=sha1 \
 		git patch-id <sample.patch >patch-id.expect &&
 	nongit \
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index a7fa94ce0a..605faea0c7 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -310,4 +310,38 @@ test_expect_success 'patch-id handles diffs with one line of before/after' '
 	test_config patchid.stable true &&
 	calc_patch_id diffu1stable <diffu1
 '
+
+test_expect_failure 'patch-id computes same ID with different object hashes' '
+	test_when_finished "rm -rf repo-sha1 repo-sha256" &&
+
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+
+	git init --object-format=sha1 repo-sha1 &&
+	git -C repo-sha1 patch-id <diff >patch-id-sha1 &&
+	git init --object-format=sha256 repo-sha256 &&
+	git -C repo-sha256 patch-id <diff >patch-id-sha256 &&
+	test_cmp patch-id-sha1 patch-id-sha256
+'
+
+test_expect_success 'patch-id without repository' '
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+	nongit git patch-id <diff
+'
+
 test_done
-- 
2.45.1-216-g4365c6fcf9


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v5 4/5] builtin/hash-object: fix uninitialized hash function
  2024-05-20 23:14 ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
                     ` (2 preceding siblings ...)
  2024-05-20 23:14   ` [PATCH v5 3/5] builtin/patch-id: fix uninitialized hash function Junio C Hamano
@ 2024-05-20 23:14   ` Junio C Hamano
  2024-05-20 23:14   ` [PATCH v5 5/5] apply: " Junio C Hamano
  2024-05-21  7:58   ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Patrick Steinhardt
  5 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-20 23:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt

From: Patrick Steinhardt <ps@pks.im>

The git-hash-object(1) command allows users to hash an object even
without a repository. Starting with c8aed5e8da (repository: stop setting
SHA1 as the default object hash, 2024-05-07), this will make us hit an
uninitialized hash function, which subsequently leads to a segfault.

Fix this by falling back to SHA-1 explicitly when running outside of
a Git repository. Users can use GIT_DEFAULT_HASH environment to
specify what hash algorithm they want, so arguably this code should
not be needed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/hash-object.c   | 3 +++
 t/t1007-hash-object.sh  | 6 ++++++
 t/t1517-outside-repo.sh | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 82ca6d2bfd..c767414a0c 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -123,6 +123,9 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	else
 		prefix = setup_git_directory_gently(&nongit);
 
+	if (nongit && !the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	if (vpath && prefix) {
 		vpath_free = prefix_filename(prefix, vpath);
 		vpath = vpath_free;
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64aea38486..d73a5cc237 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -260,4 +260,10 @@ test_expect_success '--literally with extra-long type' '
 	echo example | git hash-object -t $t --literally --stdin
 '
 
+test_expect_success '--stdin outside of repository (uses SHA-1)' '
+	nongit git hash-object --stdin <hello >actual &&
+	echo "$(test_oid --hash=sha1 hello)" >expect &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index 278ef57b3a..2d8982d61a 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -29,7 +29,7 @@ test_expect_success 'compute a patch-id outside repository (uses SHA-1)' '
 	test_cmp patch-id.expect patch-id.actual
 '
 
-test_expect_failure 'hash-object outside repository (uses SHA-1)' '
+test_expect_success 'hash-object outside repository (uses SHA-1)' '
 	nongit env GIT_DEFAULT_HASH=sha1 \
 		git hash-object --stdin <sample.patch >hash.expect &&
 	nongit \
-- 
2.45.1-216-g4365c6fcf9


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH v5 5/5] apply: fix uninitialized hash function
  2024-05-20 23:14 ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
                     ` (3 preceding siblings ...)
  2024-05-20 23:14   ` [PATCH v5 4/5] builtin/hash-object: " Junio C Hamano
@ 2024-05-20 23:14   ` Junio C Hamano
  2024-05-21  7:58     ` Patrick Steinhardt
  2024-05-21  7:58   ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Patrick Steinhardt
  5 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-20 23:14 UTC (permalink / raw)
  To: git

"git apply" can work outside a repository as a better "GNU patch",
but when it does so, it still assumed that it can access
the_hash_algo, which is no longer true in the new world order.

Make sure we explicitly fall back to SHA-1 algorithm for backward
compatibility.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/apply.c         | 4 ++++
 t/t1517-outside-repo.sh | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 861a01910c..e9175f820f 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -1,6 +1,7 @@
 #include "builtin.h"
 #include "gettext.h"
 #include "repository.h"
+#include "hash.h"
 #include "apply.h"
 
 static const char * const apply_usage[] = {
@@ -18,6 +19,9 @@ int cmd_apply(int argc, const char **argv, const char *prefix)
 	if (init_apply_state(&state, the_repository, prefix))
 		exit(128);
 
+	if (!the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	argc = apply_parse_options(argc, argv,
 				   &state, &force_apply, &options,
 				   apply_usage);
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index 2d8982d61a..557808ffa7 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -37,7 +37,7 @@ test_expect_success 'hash-object outside repository (uses SHA-1)' '
 	test_cmp hash.expect hash.actual
 '
 
-test_expect_failure 'apply a patch outside repository' '
+test_expect_success 'apply a patch outside repository' '
 	(
 		cd non-repo &&
 		cp ../nums.old nums &&
-- 
2.45.1-216-g4365c6fcf9


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH v5 1/5] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-20 23:14   ` [PATCH v5 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
@ 2024-05-21  7:57     ` Patrick Steinhardt
  2024-05-21 15:59       ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-21  7:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 942 bytes --]

On Mon, May 20, 2024 at 04:14:30PM -0700, Junio C Hamano wrote:
> Partially revert c8aed5e8 (repository: stop setting SHA1 as the
> default object hash, 2024-05-07), to keep end-user systems still
> broken when we have gap in our test coverage but yet give them an
> escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGO environment
> variable to "sha1" in order to revert to the previous behaviour.
> 
> Due to the way the end-user facing GIT_DEFAULT_HASH environment
> variable is used in our test suite, we unfortunately cannot reuse it
> for this purpose.

Okay, so this now really only is an escape hatch for users as we do not
specify it in our tests anymore. It does make me wonder whether we want
to name the envvar accordingly, but don't mind it much either way. It is
only intended to be a stop gap solution anyway that we can eventually
drop once we are sufficiently sure that there is no further breakage.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v5 5/5] apply: fix uninitialized hash function
  2024-05-20 23:14   ` [PATCH v5 5/5] apply: " Junio C Hamano
@ 2024-05-21  7:58     ` Patrick Steinhardt
  2024-05-21 13:36       ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-21  7:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1951 bytes --]

On Mon, May 20, 2024 at 04:14:34PM -0700, Junio C Hamano wrote:
> "git apply" can work outside a repository as a better "GNU patch",
> but when it does so, it still assumed that it can access
> the_hash_algo, which is no longer true in the new world order.
> 
> Make sure we explicitly fall back to SHA-1 algorithm for backward
> compatibility.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/apply.c         | 4 ++++
>  t/t1517-outside-repo.sh | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/apply.c b/builtin/apply.c
> index 861a01910c..e9175f820f 100644
> --- a/builtin/apply.c
> +++ b/builtin/apply.c
> @@ -1,6 +1,7 @@
>  #include "builtin.h"
>  #include "gettext.h"
>  #include "repository.h"
> +#include "hash.h"
>  #include "apply.h"
>  
>  static const char * const apply_usage[] = {
> @@ -18,6 +19,9 @@ int cmd_apply(int argc, const char **argv, const char *prefix)
>  	if (init_apply_state(&state, the_repository, prefix))
>  		exit(128);
>  
> +	if (!the_hash_algo)
> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +

Do we also want to add a comment here that mentions that we may want to
make this configureable via a command line option, like we have in the
preceding commits?

Patrick

>  	argc = apply_parse_options(argc, argv,
>  				   &state, &force_apply, &options,
>  				   apply_usage);
> diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
> index 2d8982d61a..557808ffa7 100755
> --- a/t/t1517-outside-repo.sh
> +++ b/t/t1517-outside-repo.sh
> @@ -37,7 +37,7 @@ test_expect_success 'hash-object outside repository (uses SHA-1)' '
>  	test_cmp hash.expect hash.actual
>  '
>  
> -test_expect_failure 'apply a patch outside repository' '
> +test_expect_success 'apply a patch outside repository' '
>  	(
>  		cd non-repo &&
>  		cp ../nums.old nums &&
> -- 
> 2.45.1-216-g4365c6fcf9
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v5 0/5] Fix use of uninitialized hash algorithms
  2024-05-20 23:14 ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
                     ` (4 preceding siblings ...)
  2024-05-20 23:14   ` [PATCH v5 5/5] apply: " Junio C Hamano
@ 2024-05-21  7:58   ` Patrick Steinhardt
  2024-05-21 18:07     ` Junio C Hamano
  5 siblings, 1 reply; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-21  7:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2498 bytes --]

On Mon, May 20, 2024 at 04:14:29PM -0700, Junio C Hamano wrote:
> A change recently merged to 'next' stops us from defaulting to using
> SHA-1 unless other code (like a logic early in the start-up sequence
> to see what hash is being used in the repository we are working in)
> explicitly sets it, leading to a (deliberate) crash of "git" when we
> forgot to cover certain code paths.
> 
> It turns out we have a few.  Notable ones are all operations that
> are designed to work outside a repository.  We should go over all
> such code paths and give them a reasonable default when there is one
> available (e.g. for historical reasons, patch-id is documented to
> work with SHA-1 hashes, so arguably it, or at least when it is
> invoked with the "--stable" option, should do so everywhere, not
> just in SHA-1 repositories, but in SHA-256 repository or outside any
> repository).  In the meantime, if an end-user hits such a "bug"
> before we can fix it, it would be nice to give them an escape hatch
> to restore the historical behaviour of falling back to use SHA-1.
> 
> These patches are designed to apply on a merge of c8aed5e8
> (repository: stop setting SHA1 as the default object hash,
> 2024-05-07) into 3e4a232f (The third batch, 2024-05-13), which has
> been the same base throughout the past iterations.
> 
> In this fifth iteration:
> 
>  - The first step no longer falls back to GIT_DEFAULT_HASH; the
>    escape hatch is a dedicated GIT_TEST_DEFAULT_HASH_ALGO
>    environment variable, but hopefully we do not have to advertise
>    it all that often.
> 
>  - The second step has been simplified somewhat to use the "nongit"
>    helper when we only need to run a single "git" command in t1517.
>    The way the expected output files were prepared in the previous
>    versions did not correctly force use of SHA-1 algorithm, which
>    has been corrected.  The third step and fourth step for t1517
>    continue to be "flip expect_failure to expect_success", but you
>    can see context differences in the range-diff.
> 
>  - The fourth step also has a fix for t1007 where the previous
>    iterations did not correctly force use of SHA-1 to prepare the
>    expected output.
> 
> Otherwise this round should be ready, modulo possible typoes.

I have two smallish comments, but neither of them really have to be
addressed. Overall I very much agree with this iteration and think that
it's the right way to go.

Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v5 5/5] apply: fix uninitialized hash function
  2024-05-21  7:58     ` Patrick Steinhardt
@ 2024-05-21 13:36       ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-21 13:36 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

>> +	if (!the_hash_algo)
>> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
>> +
>
> Do we also want to add a comment here that mentions that we may want to
> make this configureable via a command line option, like we have in the
> preceding commits?

We may want a comment here that says 

    we could to redo the "apply.c" machinery to make this arbitrary
    fallback unnecessary, but the benefit to do so is dubious and
    the risk of breaking the code is probably not worth the effort.

When working as "better GNU patch" mode without a repository, we
should not and do not use the_hash_algo for the purpose of hashing
at all.  We do not do the binary diff (because we cannot grab the
preimage object out of the object store (that does not exist) to
apply the delta to form the postimage, we do not do the 3-way
fallback using the preimage blob object names that appear on the
"index" lines.  As far as I recall, the only thing we use
the_hash_algo for is for the max length of the hash to ask "is this
hexadecimal string a plausible looking object name?  We are parsing
a line that started with 'index ' and trying to see if the line
syntactically looks like a valid 'index' line" and the like.  If we
assume SHA-1 and if somebody tries to injest a SHA-256 --full-index
patch, that logic may say "nah, we didn't find a valid 'index'
header", but I think we'll just leave the fields like old-object
blank, which does not affect anything because we won't do "apply
--3way" anyway.

So we _could_ identify such places and tell the code "when
the_hash_algo is NULL, instead of the_hash_algo->hexsz, use 0 as
hexsz" (this example is from apply.c:gitdiff_index()) and it indeed
was the approach I tried in my unpublished draft before the first
version I posted.  It quickly got ugly.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v5 1/5] setup: add an escape hatch for "no more default hash algorithm" change
  2024-05-21  7:57     ` Patrick Steinhardt
@ 2024-05-21 15:59       ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2024-05-21 15:59 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> On Mon, May 20, 2024 at 04:14:30PM -0700, Junio C Hamano wrote:
>> Partially revert c8aed5e8 (repository: stop setting SHA1 as the
>> default object hash, 2024-05-07), to keep end-user systems still
>> broken when we have gap in our test coverage but yet give them an
>> escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGO environment
>> variable to "sha1" in order to revert to the previous behaviour.
>> 
>> Due to the way the end-user facing GIT_DEFAULT_HASH environment
>> variable is used in our test suite, we unfortunately cannot reuse it
>> for this purpose.
>
> Okay, so this now really only is an escape hatch for users as we do not
> specify it in our tests anymore. It does make me wonder whether we want
> to name the envvar accordingly, but don't mind it much either way. It is
> only intended to be a stop gap solution anyway that we can eventually
> drop once we are sufficiently sure that there is no further breakage.

Yes, I think the name of that escape hatch variable should be
irrelevant by the time we are reasonably confident that we could
remove it ;-)

But the proposed log message should make that intention more clear.
How does this read?

    ... give them an escape hatch ... in order to revert to the
    previous behaviour, just in case we haven't done a thorough job
    in fixing the fallout from c8aed5e8.  After we build confidence,
    we should remove the escape hatch support, but we are not there
    yet after only fixing three commands (hash-object, apply, and
    patch-id) in this series.

Thanks.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v5 0/5] Fix use of uninitialized hash algorithms
  2024-05-21  7:58   ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Patrick Steinhardt
@ 2024-05-21 18:07     ` Junio C Hamano
  2024-05-22  4:51       ` Patrick Steinhardt
  0 siblings, 1 reply; 62+ messages in thread
From: Junio C Hamano @ 2024-05-21 18:07 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> I have two smallish comments, but neither of them really have to be
> addressed. Overall I very much agree with this iteration and think that
> it's the right way to go.

I've locally done the following locally but it probably does not
need to be resent to the list before merging down to 'next'.


1:  b23a93597c ! 1:  d3b2ff75fd setup: add an escape hatch for "no more default hash algorithm" change
    @@ Commit message
         default object hash, 2024-05-07), to keep end-user systems still
         broken when we have gap in our test coverage but yet give them an
         escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGO environment
    -    variable to "sha1" in order to revert to the previous behaviour.
    +    variable to "sha1" in order to revert to the previous behaviour, in
    +    case we haven't done a thorough job in fixing the fallout from
    +    c8aed5e8.  After we build confidence, we should remove the escape
    +    hatch support, but we are not there yet after only fixing three
    +    commands (hash-object, apply, and patch-id) in this series.
     
         Due to the way the end-user facing GIT_DEFAULT_HASH environment
         variable is used in our test suite, we unfortunately cannot reuse it
2:  6a20370944 = 2:  abece6e970 t1517: test commands that are designed to be run outside repository
3:  fa258c5d47 = 3:  4a1c95931f builtin/patch-id: fix uninitialized hash function
4:  164d340cbe = 4:  8d058b8024 builtin/hash-object: fix uninitialized hash function
5:  bd0246eb51 ! 5:  4674ab682d apply: fix uninitialized hash function
    @@ Commit message
         Make sure we explicitly fall back to SHA-1 algorithm for backward
         compatibility.
     
    +    It is of dubious value to make this configurable to other hash
    +    algorithms, as the code does not use the_hash_algo for hashing
    +    purposes when working outside a repository (which is how
    +    the_hash_algo is left to NULL)---it is only used to learn the max
    +    length of the hash when parsing the object names on the "index"
    +    line, but failing to parse the "index" line is not a hard failure,
    +    and the program does not support operations like applying binary
    +    patches and --3way fallback that requires object access outside a
    +    repository.
    +
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## builtin/apply.c ##
    @@ builtin/apply.c: int cmd_apply(int argc, const char **argv, const char *prefix)
      	if (init_apply_state(&state, the_repository, prefix))
      		exit(128);
      
    ++	/*
    ++	 * We could to redo the "apply.c" machinery to make this
    ++	 * arbitrary fallback unnecessary, but it is dubious that it
    ++	 * is worth the effort.
    ++	 * cf. https://lore.kernel.org/git/xmqqcypfcmn4.fsf@gitster.g/
    ++	 */
     +	if (!the_hash_algo)
     +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
     +

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH v5 0/5] Fix use of uninitialized hash algorithms
  2024-05-21 18:07     ` Junio C Hamano
@ 2024-05-22  4:51       ` Patrick Steinhardt
  0 siblings, 0 replies; 62+ messages in thread
From: Patrick Steinhardt @ 2024-05-22  4:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 477 bytes --]

On Tue, May 21, 2024 at 11:07:12AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > I have two smallish comments, but neither of them really have to be
> > addressed. Overall I very much agree with this iteration and think that
> > it's the right way to go.
> 
> I've locally done the following locally but it probably does not
> need to be resent to the list before merging down to 'next'.

Thanks, the diff looks good to me.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2024-05-22  4:51 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-13  7:15 [PATCH 0/2] Fix use of uninitialized hash algos Patrick Steinhardt
2024-05-13  7:15 ` [PATCH 1/2] builtin/patch-id: fix uninitialized hash function Patrick Steinhardt
2024-05-13  7:15 ` [PATCH 2/2] builtin/hash-object: " Patrick Steinhardt
2024-05-14  0:16   ` Junio C Hamano
2024-05-13 16:01 ` [PATCH 0/2] Fix use of uninitialized hash algos Junio C Hamano
2024-05-13 18:36   ` Junio C Hamano
2024-05-13 19:21 ` [PATCH v2 0/4] Fix use of uninitialized hash algorithms Junio C Hamano
2024-05-13 19:21   ` [PATCH v2 1/4] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
2024-05-13 19:48     ` Kyle Lippincott
2024-05-13 19:21   ` [PATCH v2 2/4] t1517: test commands that are designed to be run outside repository Junio C Hamano
2024-05-13 19:57     ` Kyle Lippincott
2024-05-13 20:33       ` Junio C Hamano
2024-05-13 21:00         ` Junio C Hamano
2024-05-13 21:07           ` Kyle Lippincott
2024-05-13 19:21   ` [PATCH v2 3/4] builtin/patch-id: fix uninitialized hash function Junio C Hamano
2024-05-13 19:21   ` [PATCH v2 4/4] builtin/hash-object: " Junio C Hamano
2024-05-13 21:28   ` [PATCH 5/4] apply: " Junio C Hamano
2024-05-13 22:41 ` [PATCH v3 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
2024-05-13 22:41   ` [PATCH v3 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
2024-05-13 22:41   ` [PATCH v3 2/5] t1517: test commands that are designed to be run outside repository Junio C Hamano
2024-05-13 22:41   ` [PATCH v3 3/5] builtin/patch-id: fix uninitialized hash function Junio C Hamano
2024-05-13 23:11     ` Junio C Hamano
2024-05-14  4:31       ` Patrick Steinhardt
2024-05-14 15:52         ` Junio C Hamano
2024-05-13 22:41   ` [PATCH v3 4/5] builtin/hash-object: " Junio C Hamano
2024-05-13 23:13     ` Junio C Hamano
2024-05-14  4:32       ` Patrick Steinhardt
2024-05-14 15:55         ` Junio C Hamano
2024-05-13 22:41   ` [PATCH v3 5/5] apply: " Junio C Hamano
2024-05-14  1:14 ` [PATCH v4 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
2024-05-14  1:14   ` [PATCH v4 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
2024-05-14  4:32     ` Patrick Steinhardt
2024-05-14 15:05       ` Junio C Hamano
2024-05-14 17:19     ` Junio C Hamano
2024-05-15 12:23       ` Patrick Steinhardt
2024-05-16 15:31       ` Junio C Hamano
2024-05-14  1:14   ` [PATCH v4 2/5] t1517: test commands that are designed to be run outside repository Junio C Hamano
2024-05-14  4:32     ` Patrick Steinhardt
2024-05-14 15:08       ` Junio C Hamano
2024-05-15 12:24         ` Patrick Steinhardt
2024-05-15 14:15           ` Junio C Hamano
2024-05-15 14:25             ` Patrick Steinhardt
2024-05-15 15:40               ` Junio C Hamano
2024-05-14  1:14   ` [PATCH v4 3/5] builtin/patch-id: fix uninitialized hash function Junio C Hamano
2024-05-14  1:14   ` [PATCH v4 4/5] builtin/hash-object: " Junio C Hamano
2024-05-17 23:49     ` Junio C Hamano
2024-05-20 21:19       ` Junio C Hamano
2024-05-20 22:45         ` Junio C Hamano
2024-05-14  1:14   ` [PATCH v4 5/5] apply: " Junio C Hamano
2024-05-20 23:14 ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Junio C Hamano
2024-05-20 23:14   ` [PATCH v5 1/5] setup: add an escape hatch for "no more default hash algorithm" change Junio C Hamano
2024-05-21  7:57     ` Patrick Steinhardt
2024-05-21 15:59       ` Junio C Hamano
2024-05-20 23:14   ` [PATCH v5 2/5] t1517: test commands that are designed to be run outside repository Junio C Hamano
2024-05-20 23:14   ` [PATCH v5 3/5] builtin/patch-id: fix uninitialized hash function Junio C Hamano
2024-05-20 23:14   ` [PATCH v5 4/5] builtin/hash-object: " Junio C Hamano
2024-05-20 23:14   ` [PATCH v5 5/5] apply: " Junio C Hamano
2024-05-21  7:58     ` Patrick Steinhardt
2024-05-21 13:36       ` Junio C Hamano
2024-05-21  7:58   ` [PATCH v5 0/5] Fix use of uninitialized hash algorithms Patrick Steinhardt
2024-05-21 18:07     ` Junio C Hamano
2024-05-22  4:51       ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).