All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] builtin/clone: allow remote helpers to detect repo
@ 2024-02-27 14:27 Patrick Steinhardt
  2024-02-27 14:27 ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Patrick Steinhardt
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2024-02-27 14:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Mike Hommey

[-- Attachment #1: Type: text/plain, Size: 1212 bytes --]

Hi,

this patch series addresses a regression reported by Mike in Git v2.44
where remote helpers cannot access the Git repository anymore when
running git-clone(1).

The root cause of this is that we have started to initialize the refdb
at a later point, after the helper is spawned. This is required such
that we can initialize it with the correct object format, which fixes
clones of SHA256 repositories with the reftable format and when using
bundles.

The proposed fix here is to partially initialize the refdb with just
enough data such that it can be discovered. The fix isn't pretty, but
addresses the issue. I also couldn't come up with a better idea than
this.

Patrick

Patrick Steinhardt (2):
  refs/reftable: don't fail empty transactions in repo without HEAD
  builtin/clone: allow remote helpers to detect repo

 builtin/clone.c            | 46 ++++++++++++++++++++++++++++++++++++++
 refs/reftable-backend.c    |  1 +
 setup.c                    |  9 +++++++-
 t/t0610-reftable-basics.sh | 13 +++++++++++
 t/t5801/git-remote-testgit |  5 +++++
 5 files changed, 73 insertions(+), 1 deletion(-)


base-commit: a2082dbdd315aa4dd3f315545e5b3ab3b3e2d894
-- 
2.44.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD
  2024-02-27 14:27 [PATCH 0/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
@ 2024-02-27 14:27 ` Patrick Steinhardt
  2024-02-28 11:32   ` Karthik Nayak
  2024-03-01  1:20   ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Justin Tobler
  2024-02-27 14:27 ` [PATCH 2/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
  2024-02-27 20:58 ` [PATCH 0/2] " Junio C Hamano
  2 siblings, 2 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2024-02-27 14:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Mike Hommey

[-- Attachment #1: Type: text/plain, Size: 2409 bytes --]

Under normal circumstances, it shouldn't ever happen that a repository
has no HEAD reference. In fact, git-update-ref(1) would fail any request
to delete the HEAD reference, and a newly initialized repository always
pre-creates it, too.

But in the next commit, we are going to change git-clone(1) to partially
initialize the refdb just up to the point where remote helpers can find
the repository. With that change, we are going to run into a situation
where repositories have no refs at all.

Now there is a very particular edge case in this situation: when
preparing an empty ref transacton, we end up returning whatever value
`read_ref_without_reload()` returned to the caller. Under normal
conditions this would be fine: "HEAD" should usually exist, and thus the
function would return `0`. But if "HEAD" doesn't exist, the function
returns a positive value which we end up returning to the caller.

Fix this bug by resetting the return code to `0` and add a test.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c    |  1 +
 t/t0610-reftable-basics.sh | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index a14f2ad7f4..45568818f0 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -821,6 +821,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
 				      &head_referent, &head_type);
 	if (ret < 0)
 		goto done;
+	ret = 0;
 
 	for (i = 0; i < transaction->nr; i++) {
 		struct ref_update *u = transaction->updates[i];
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 6a131e40b8..c5f4d23433 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -328,6 +328,19 @@ test_expect_success 'ref transaction: writes are synced' '
 	EOF
 '
 
+test_expect_success 'ref transaction: empty transaction in empty repo' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	test_commit -C repo --no-tag A &&
+	COMMIT=$(git -C repo rev-parse HEAD) &&
+	git -C repo update-ref -d refs/heads/main &&
+	test-tool -C repo ref-store main delete-refs REF_NO_DEREF msg HEAD &&
+	git -C repo update-ref --stdin <<-EOF
+	prepare
+	commit
+	EOF
+'
+
 test_expect_success 'pack-refs: compacts tables' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
-- 
2.44.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] builtin/clone: allow remote helpers to detect repo
  2024-02-27 14:27 [PATCH 0/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
  2024-02-27 14:27 ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Patrick Steinhardt
@ 2024-02-27 14:27 ` Patrick Steinhardt
  2024-02-27 21:31   ` Junio C Hamano
  2024-05-03  2:04   ` Mike Hommey
  2024-02-27 20:58 ` [PATCH 0/2] " Junio C Hamano
  2 siblings, 2 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2024-02-27 14:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Mike Hommey

[-- Attachment #1: Type: text/plain, Size: 6333 bytes --]

In 18c9cb7524 (builtin/clone: create the refdb with the correct object
format, 2023-12-12), we have changed git-clone(1) so that it delays
creation of the refdb until after it has learned about the remote's
object format. This change was required for the reftable backend, which
encodes the object format into the tables. So if we pre-initialized the
refdb with the default object format, but the remote uses a different
object format than that, then the resulting tables would have encoded
the wrong object format.

This change unfortunately breaks remote helpers which try to access the
repository that is about to be created. Because the refdb has not yet
been initialized at the point where we spawn the remote helper, we also
don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by
the remote helper which try to access the repository would fail because
it cannot be discovered.

This is essentially a chicken-and-egg problem: we cannot initialize the
refdb because we don't know about the object format. But we cannot learn
about the object format because the remote helper may be unable to
access the partially-initialized repository.

Ideally, we would address this issue via capabilities. But the remote
helper protocol is not structured in a way that guarantees that the
capability announcement happens before the remote helper tries to access
the repository.

Instead, fix this issue by partially initializing the refdb up to the
point where it becomes discoverable by Git commands.

Reported-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/clone.c            | 46 ++++++++++++++++++++++++++++++++++++++
 setup.c                    |  9 +++++++-
 t/t5801/git-remote-testgit |  5 +++++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index bad1b70ce8..5d7f112125 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -926,6 +926,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	struct ref *mapped_refs = NULL;
 	const struct ref *ref;
 	struct strbuf key = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
 	struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT;
 	struct transport *transport = NULL;
 	const char *src_ref_prefix = "refs/heads/";
@@ -1125,6 +1126,50 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		git_dir = real_git_dir;
 	}
 
+	/*
+	 * We have a chicken-and-egg situation between initializing the refdb
+	 * and spawning transport helpers:
+	 *
+	 *   - Initializing the refdb requires us to know about the object
+	 *     format. We thus have to spawn the transport helper to learn
+	 *     about it.
+	 *
+	 *   - The transport helper may want to access the Git repository. But
+	 *     because the refdb has not been initialized, we don't have "HEAD"
+	 *     or "refs/". Thus, the helper cannot find the Git repository.
+	 *
+	 * Ideally, we would have structured the helper protocol such that it's
+	 * mandatory for the helper to first announce its capabilities without
+	 * yet assuming a fully initialized repository. Like that, we could
+	 * have added a "lazy-refdb-init" capability that announces whether the
+	 * helper is ready to handle not-yet-initialized refdbs. If any helper
+	 * didn't support them, we would have fully initialized the refdb with
+	 * the SHA1 object format, but later on bailed out if we found out that
+	 * the remote repository used a different object format.
+	 *
+	 * But we didn't, and thus we use the following workaround to partially
+	 * initialize the repository's refdb such that it can be discovered by
+	 * Git commands. To do so, we:
+	 *
+	 *   - Create an invalid HEAD ref pointing at "refs/heads/.invalid".
+	 *
+	 *   - Create the "refs/" directory.
+	 *
+	 *   - Set up the ref storage format and repository version as
+	 *     required.
+	 *
+	 * This is sufficient for Git commands to discover the Git directory.
+	 */
+	initialize_repository_version(GIT_HASH_UNKNOWN,
+				      the_repository->ref_storage_format, 1);
+
+	strbuf_addf(&buf, "%s/HEAD", git_dir);
+	write_file(buf.buf, "ref: refs/heads/.invalid");
+
+	strbuf_reset(&buf);
+	strbuf_addf(&buf, "%s/refs", git_dir);
+	safe_create_dir(buf.buf, 1);
+
 	/*
 	 * additional config can be injected with -c, make sure it's included
 	 * after init_db, which clears the entire config environment.
@@ -1453,6 +1498,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	free(remote_name);
 	strbuf_release(&reflog_msg);
 	strbuf_release(&branch_top);
+	strbuf_release(&buf);
 	strbuf_release(&key);
 	free_refs(mapped_refs);
 	free_refs(remote_head_points_at);
diff --git a/setup.c b/setup.c
index b69b1cbc2a..e3b76e84b5 100644
--- a/setup.c
+++ b/setup.c
@@ -1889,6 +1889,13 @@ void initialize_repository_version(int hash_algo,
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
 
+	/*
+	 * Note that we initialize the repository version to 1 when the ref
+	 * storage format is unknown. This is on purpose so that we can add the
+	 * correct object format to the config during git-clone(1). The format
+	 * version will get adjusted by git-clone(1) once it has learned about
+	 * the remote repository's format.
+	 */
 	if (hash_algo != GIT_HASH_SHA1 ||
 	    ref_storage_format != REF_STORAGE_FORMAT_FILES)
 		repo_version = GIT_REPO_VERSION_READ;
@@ -1898,7 +1905,7 @@ void initialize_repository_version(int hash_algo,
 		  "%d", repo_version);
 	git_config_set("core.repositoryformatversion", repo_version_string);
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
 		git_config_set("extensions.objectformat",
 			       hash_algos[hash_algo].name);
 	else if (reinit)
diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
index 1544d6dc6b..bcfb358c51 100755
--- a/t/t5801/git-remote-testgit
+++ b/t/t5801/git-remote-testgit
@@ -12,6 +12,11 @@ url=$2
 
 dir="$GIT_DIR/testgit/$alias"
 
+if ! git rev-parse --is-inside-git-dir
+then
+	exit 1
+fi
+
 h_refspec="refs/heads/*:refs/testgit/$alias/heads/*"
 t_refspec="refs/tags/*:refs/testgit/$alias/tags/*"
 
-- 
2.44.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] builtin/clone: allow remote helpers to detect repo
  2024-02-27 14:27 [PATCH 0/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
  2024-02-27 14:27 ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Patrick Steinhardt
  2024-02-27 14:27 ` [PATCH 2/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
@ 2024-02-27 20:58 ` Junio C Hamano
  2024-02-27 21:33   ` Junio C Hamano
  2 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2024-02-27 20:58 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Mike Hommey

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> this patch series addresses a regression reported by Mike in Git v2.44
> where remote helpers cannot access the Git repository anymore when
> running git-clone(1).
> ...
>  builtin/clone.c            | 46 ++++++++++++++++++++++++++++++++++++++
>  refs/reftable-backend.c    |  1 +

Sorry, but this confuses me.  Was a regression really in v2.44.0,
where refs/reftable-backend.c did not even exist?  If so why does a
fix for it need to touch that file?

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] builtin/clone: allow remote helpers to detect repo
  2024-02-27 14:27 ` [PATCH 2/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
@ 2024-02-27 21:31   ` Junio C Hamano
  2024-03-04  6:44     ` Patrick Steinhardt
  2024-05-03  2:04   ` Mike Hommey
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2024-02-27 21:31 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Mike Hommey

Patrick Steinhardt <ps@pks.im> writes:

> Instead, fix this issue by partially initializing the refdb up to the
> point where it becomes discoverable by Git commands.

As you mentioned earlier, this indeed is ugly, but I agree that
there is no other reasonable way out.  I am also unsure if this is a
viable workaround in the longer term, but it should do for now.

> +	/*
> +	 * We have a chicken-and-egg situation between initializing the refdb
> +	 * and spawning transport helpers:
> +	 *
> +	 *   - Initializing the refdb requires us to know about the object
> +	 *     format. We thus have to spawn the transport helper to learn
> +	 *     about it.
> +	 *   - The transport helper may want to access the Git repository. But
> +	 *     because the refdb has not been initialized, we don't have "HEAD"
> +	 *     or "refs/". Thus, the helper cannot find the Git repository.
> +	 *
> +	 * Ideally, we would have structured the helper protocol such that it's
> +	 * mandatory for the helper to first announce its capabilities without
> +	 * yet assuming a fully initialized repository. Like that, we could
> +	 * have added a "lazy-refdb-init" capability that announces whether the
> +	 * helper is ready to handle not-yet-initialized refdbs. If any helper
> +	 * didn't support them, we would have fully initialized the refdb with
> +	 * the SHA1 object format, but later on bailed out if we found out that
> +	 * the remote repository used a different object format.
> +	 * But we didn't, and thus we use the following workaround to partially
> +	 * initialize the repository's refdb such that it can be discovered by
> +	 * Git commands. To do so, we:
> +	 *
> +	 *   - Create an invalid HEAD ref pointing at "refs/heads/.invalid".
> +	 *
> +	 *   - Create the "refs/" directory.
> +	 *
> +	 *   - Set up the ref storage format and repository version as
> +	 *     required.

"as required" by whom and in what way?  

"The code to recognize a repository requires them to be set already,
but they do not have to be the real value---we just assign random
valid values for now, let remote helper do its work and then set the
real values after they are done" would be a plausible interpretation
of the above.  Is that what is going on?

> +	 * This is sufficient for Git commands to discover the Git directory.
> +	 */
> +	initialize_repository_version(GIT_HASH_UNKNOWN,
> +				      the_repository->ref_storage_format, 1);

OK, so we start with "we do not know it yet" here.

> +	strbuf_addf(&buf, "%s/HEAD", git_dir);
> +	write_file(buf.buf, "ref: refs/heads/.invalid");
> +
> +	strbuf_reset(&buf);
> +	strbuf_addf(&buf, "%s/refs", git_dir);
> +	safe_create_dir(buf.buf, 1);
> +
>  	/*
>  	 * additional config can be injected with -c, make sure it's included
>  	 * after init_db, which clears the entire config environment.
> @@ -1453,6 +1498,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	free(remote_name);
>  	strbuf_release(&reflog_msg);
>  	strbuf_release(&branch_top);
> +	strbuf_release(&buf);
>  	strbuf_release(&key);
>  	free_refs(mapped_refs);
>  	free_refs(remote_head_points_at);
> diff --git a/setup.c b/setup.c
> index b69b1cbc2a..e3b76e84b5 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1889,6 +1889,13 @@ void initialize_repository_version(int hash_algo,
>  	char repo_version_string[10];
>  	int repo_version = GIT_REPO_VERSION;
>  
> +	/*
> +	 * Note that we initialize the repository version to 1 when the ref
> +	 * storage format is unknown. This is on purpose so that we can add the
> +	 * correct object format to the config during git-clone(1). The format
> +	 * version will get adjusted by git-clone(1) once it has learned about
> +	 * the remote repository's format.
> +	 */
>  	if (hash_algo != GIT_HASH_SHA1 ||
>  	    ref_storage_format != REF_STORAGE_FORMAT_FILES)
>  		repo_version = GIT_REPO_VERSION_READ;
> @@ -1898,7 +1905,7 @@ void initialize_repository_version(int hash_algo,
>  		  "%d", repo_version);
>  	git_config_set("core.repositoryformatversion", repo_version_string);
>  
> -	if (hash_algo != GIT_HASH_SHA1)
> +	if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
>  		git_config_set("extensions.objectformat",
>  			       hash_algos[hash_algo].name);
>  	else if (reinit)

Here we refrain from writing extensions.objectformat in the
repository when we do not know what hash function is being used.
Without this change, we would instead remove extensions.objectformat,
which does not sound too bad, as "reinit" is passed by the new call
we saw above and this "else if (reinit)" will do exactly that anyway.

In any case, after we finish talking with the other side over the
transport, we make another call to initialize_repository_version()
with the real objectformat and everything will be fine.

The ealier description of the implementation idea made it sound
really yucky, but I do not think the solution presented here is too
bad.

> diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
> index 1544d6dc6b..bcfb358c51 100755
> --- a/t/t5801/git-remote-testgit
> +++ b/t/t5801/git-remote-testgit
> @@ -12,6 +12,11 @@ url=$2
>  
>  dir="$GIT_DIR/testgit/$alias"
>  
> +if ! git rev-parse --is-inside-git-dir
> +then
> +	exit 1
> +fi
> +
>  h_refspec="refs/heads/*:refs/testgit/$alias/heads/*"
>  t_refspec="refs/tags/*:refs/testgit/$alias/tags/*"

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] builtin/clone: allow remote helpers to detect repo
  2024-02-27 20:58 ` [PATCH 0/2] " Junio C Hamano
@ 2024-02-27 21:33   ` Junio C Hamano
  2024-03-04  6:44     ` Patrick Steinhardt
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2024-02-27 21:33 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Mike Hommey

Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> Hi,
>>
>> this patch series addresses a regression reported by Mike in Git v2.44
>> where remote helpers cannot access the Git repository anymore when
>> running git-clone(1).
>> ...
>>  builtin/clone.c            | 46 ++++++++++++++++++++++++++++++++++++++
>>  refs/reftable-backend.c    |  1 +
>
> Sorry, but this confuses me.  Was a regression really in v2.44.0,
> where refs/reftable-backend.c did not even exist?  If so why does a
> fix for it need to touch that file?
>
> Thanks.

I guess [2/2] alone is the fix to be applied directly on top of v2.44.0
and eventually be merged to 'maint' to become v2.44.1 release, while
[1/2] is necessary to adjust reftable backend when such a fix is
merged to more recent codebase that already has the reftable
backend?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD
  2024-02-27 14:27 ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Patrick Steinhardt
@ 2024-02-28 11:32   ` Karthik Nayak
  2024-03-04  6:44     ` Patrick Steinhardt
  2024-03-01  1:20   ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Justin Tobler
  1 sibling, 1 reply; 18+ messages in thread
From: Karthik Nayak @ 2024-02-28 11:32 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Junio C Hamano, Mike Hommey

[-- Attachment #1: Type: text/plain, Size: 2395 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> Under normal circumstances, it shouldn't ever happen that a repository
> has no HEAD reference. In fact, git-update-ref(1) would fail any request
> to delete the HEAD reference, and a newly initialized repository always
> pre-creates it, too.
>
> But in the next commit, we are going to change git-clone(1) to partially
> initialize the refdb just up to the point where remote helpers can find
> the repository. With that change, we are going to run into a situation
> where repositories have no refs at all.
>
> Now there is a very particular edge case in this situation: when
> preparing an empty ref transacton, we end up returning whatever value
> `read_ref_without_reload()` returned to the caller. Under normal
> conditions this would be fine: "HEAD" should usually exist, and thus the
> function would return `0`. But if "HEAD" doesn't exist, the function
> returns a positive value which we end up returning to the caller.
>
> Fix this bug by resetting the return code to `0` and add a test.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/reftable-backend.c    |  1 +
>  t/t0610-reftable-basics.sh | 13 +++++++++++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index a14f2ad7f4..45568818f0 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -821,6 +821,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  				      &head_referent, &head_type);
>  	if (ret < 0)
>  		goto done;
> +	ret = 0;
>

So this is not really a problem in this function, it's more of that
`refs.c:ref_transaction_prepare` checks if `ret` is non-zero.

Nit: would be nice to have a comment about why overriding this value is
ok.

>  	for (i = 0; i < transaction->nr; i++) {
>  		struct ref_update *u = transaction->updates[i];
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 6a131e40b8..c5f4d23433 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -328,6 +328,19 @@ test_expect_success 'ref transaction: writes are synced' '
>  	EOF
>  '
>
> +test_expect_success 'ref transaction: empty transaction in empty repo' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	test_commit -C repo --no-tag A &&
> +	COMMIT=$(git -C repo rev-parse HEAD) &&

why do we do this?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD
  2024-02-27 14:27 ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Patrick Steinhardt
  2024-02-28 11:32   ` Karthik Nayak
@ 2024-03-01  1:20   ` Justin Tobler
  2024-03-04  6:44     ` Patrick Steinhardt
  1 sibling, 1 reply; 18+ messages in thread
From: Justin Tobler @ 2024-03-01  1:20 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Mike Hommey

On 24/02/27 03:27PM, Patrick Steinhardt wrote:
> Under normal circumstances, it shouldn't ever happen that a repository
> has no HEAD reference. In fact, git-update-ref(1) would fail any request
> to delete the HEAD reference, and a newly initialized repository always
> pre-creates it, too.
> 
> But in the next commit, we are going to change git-clone(1) to partially
> initialize the refdb just up to the point where remote helpers can find
> the repository. With that change, we are going to run into a situation
> where repositories have no refs at all.
> 
> Now there is a very particular edge case in this situation: when
> preparing an empty ref transacton, we end up returning whatever value
> `read_ref_without_reload()` returned to the caller. Under normal
> conditions this would be fine: "HEAD" should usually exist, and thus the
> function would return `0`. But if "HEAD" doesn't exist, the function
> returns a positive value which we end up returning to the caller.

In what context are reference transactions being created before the
refdb is fully initialized? Is this in regards to repositories
initialized with the reftables backend and used during execution of a
remote-helper? I was originally under the impression that a partially
initalized refdb would appear as the reffile backend before being fully
initialized.
> 
> Fix this bug by resetting the return code to `0` and add a test.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>

-Justin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD
  2024-03-01  1:20   ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Justin Tobler
@ 2024-03-04  6:44     ` Patrick Steinhardt
  0 siblings, 0 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2024-03-04  6:44 UTC (permalink / raw)
  To: git, Junio C Hamano, Mike Hommey

[-- Attachment #1: Type: text/plain, Size: 1693 bytes --]

On Thu, Feb 29, 2024 at 07:20:42PM -0600, Justin Tobler wrote:
> On 24/02/27 03:27PM, Patrick Steinhardt wrote:
> > Under normal circumstances, it shouldn't ever happen that a repository
> > has no HEAD reference. In fact, git-update-ref(1) would fail any request
> > to delete the HEAD reference, and a newly initialized repository always
> > pre-creates it, too.
> > 
> > But in the next commit, we are going to change git-clone(1) to partially
> > initialize the refdb just up to the point where remote helpers can find
> > the repository. With that change, we are going to run into a situation
> > where repositories have no refs at all.
> > 
> > Now there is a very particular edge case in this situation: when
> > preparing an empty ref transacton, we end up returning whatever value
> > `read_ref_without_reload()` returned to the caller. Under normal
> > conditions this would be fine: "HEAD" should usually exist, and thus the
> > function would return `0`. But if "HEAD" doesn't exist, the function
> > returns a positive value which we end up returning to the caller.
> 
> In what context are reference transactions being created before the
> refdb is fully initialized? Is this in regards to repositories
> initialized with the reftables backend and used during execution of a
> remote-helper? I was originally under the impression that a partially
> initalized refdb would appear as the reffile backend before being fully
> initialized.

The only case where this can happen is during git-clone(1).

Patrick

> > Fix this bug by resetting the return code to `0` and add a test.
> > 
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> 
> -Justin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD
  2024-02-28 11:32   ` Karthik Nayak
@ 2024-03-04  6:44     ` Patrick Steinhardt
  2024-03-04 16:28       ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick Steinhardt @ 2024-03-04  6:44 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Junio C Hamano, Mike Hommey

[-- Attachment #1: Type: text/plain, Size: 2845 bytes --]

On Wed, Feb 28, 2024 at 03:32:35AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Under normal circumstances, it shouldn't ever happen that a repository
> > has no HEAD reference. In fact, git-update-ref(1) would fail any request
> > to delete the HEAD reference, and a newly initialized repository always
> > pre-creates it, too.
> >
> > But in the next commit, we are going to change git-clone(1) to partially
> > initialize the refdb just up to the point where remote helpers can find
> > the repository. With that change, we are going to run into a situation
> > where repositories have no refs at all.
> >
> > Now there is a very particular edge case in this situation: when
> > preparing an empty ref transacton, we end up returning whatever value
> > `read_ref_without_reload()` returned to the caller. Under normal
> > conditions this would be fine: "HEAD" should usually exist, and thus the
> > function would return `0`. But if "HEAD" doesn't exist, the function
> > returns a positive value which we end up returning to the caller.
> >
> > Fix this bug by resetting the return code to `0` and add a test.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  refs/reftable-backend.c    |  1 +
> >  t/t0610-reftable-basics.sh | 13 +++++++++++++
> >  2 files changed, 14 insertions(+)
> >
> > diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> > index a14f2ad7f4..45568818f0 100644
> > --- a/refs/reftable-backend.c
> > +++ b/refs/reftable-backend.c
> > @@ -821,6 +821,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
> >  				      &head_referent, &head_type);
> >  	if (ret < 0)
> >  		goto done;
> > +	ret = 0;
> >
> 
> So this is not really a problem in this function, it's more of that
> `refs.c:ref_transaction_prepare` checks if `ret` is non-zero.

Well, yes. I'd claim that it is a problem in this function because it
returns positive even though the transaction was prepared successfully.

> Nit: would be nice to have a comment about why overriding this value is
> ok.

True.

> >  	for (i = 0; i < transaction->nr; i++) {
> >  		struct ref_update *u = transaction->updates[i];
> > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> > index 6a131e40b8..c5f4d23433 100755
> > --- a/t/t0610-reftable-basics.sh
> > +++ b/t/t0610-reftable-basics.sh
> > @@ -328,6 +328,19 @@ test_expect_success 'ref transaction: writes are synced' '
> >  	EOF
> >  '
> >
> > +test_expect_success 'ref transaction: empty transaction in empty repo' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	test_commit -C repo --no-tag A &&
> > +	COMMIT=$(git -C repo rev-parse HEAD) &&
> 
> why do we do this?

Oh, true, this isn't actually needed.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] builtin/clone: allow remote helpers to detect repo
  2024-02-27 21:33   ` Junio C Hamano
@ 2024-03-04  6:44     ` Patrick Steinhardt
  0 siblings, 0 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2024-03-04  6:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Mike Hommey

[-- Attachment #1: Type: text/plain, Size: 1067 bytes --]

On Tue, Feb 27, 2024 at 01:33:55PM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> >> Hi,
> >>
> >> this patch series addresses a regression reported by Mike in Git v2.44
> >> where remote helpers cannot access the Git repository anymore when
> >> running git-clone(1).
> >> ...
> >>  builtin/clone.c            | 46 ++++++++++++++++++++++++++++++++++++++
> >>  refs/reftable-backend.c    |  1 +
> >
> > Sorry, but this confuses me.  Was a regression really in v2.44.0,
> > where refs/reftable-backend.c did not even exist?  If so why does a
> > fix for it need to touch that file?
> >
> > Thanks.
> 
> I guess [2/2] alone is the fix to be applied directly on top of v2.44.0
> and eventually be merged to 'maint' to become v2.44.1 release, while
> [1/2] is necessary to adjust reftable backend when such a fix is
> merged to more recent codebase that already has the reftable
> backend?

Yes, exactly! Sorry, should've explained this more thoroughly.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] builtin/clone: allow remote helpers to detect repo
  2024-02-27 21:31   ` Junio C Hamano
@ 2024-03-04  6:44     ` Patrick Steinhardt
  2024-03-04 16:17       ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick Steinhardt @ 2024-03-04  6:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Mike Hommey

[-- Attachment #1: Type: text/plain, Size: 6140 bytes --]

On Tue, Feb 27, 2024 at 01:31:48PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Instead, fix this issue by partially initializing the refdb up to the
> > point where it becomes discoverable by Git commands.
> 
> As you mentioned earlier, this indeed is ugly, but I agree that
> there is no other reasonable way out.  I am also unsure if this is a
> viable workaround in the longer term, but it should do for now.
> 
> > +	/*
> > +	 * We have a chicken-and-egg situation between initializing the refdb
> > +	 * and spawning transport helpers:
> > +	 *
> > +	 *   - Initializing the refdb requires us to know about the object
> > +	 *     format. We thus have to spawn the transport helper to learn
> > +	 *     about it.
> > +	 *   - The transport helper may want to access the Git repository. But
> > +	 *     because the refdb has not been initialized, we don't have "HEAD"
> > +	 *     or "refs/". Thus, the helper cannot find the Git repository.
> > +	 *
> > +	 * Ideally, we would have structured the helper protocol such that it's
> > +	 * mandatory for the helper to first announce its capabilities without
> > +	 * yet assuming a fully initialized repository. Like that, we could
> > +	 * have added a "lazy-refdb-init" capability that announces whether the
> > +	 * helper is ready to handle not-yet-initialized refdbs. If any helper
> > +	 * didn't support them, we would have fully initialized the refdb with
> > +	 * the SHA1 object format, but later on bailed out if we found out that
> > +	 * the remote repository used a different object format.
> > +	 * But we didn't, and thus we use the following workaround to partially
> > +	 * initialize the repository's refdb such that it can be discovered by
> > +	 * Git commands. To do so, we:
> > +	 *
> > +	 *   - Create an invalid HEAD ref pointing at "refs/heads/.invalid".
> > +	 *
> > +	 *   - Create the "refs/" directory.
> > +	 *
> > +	 *   - Set up the ref storage format and repository version as
> > +	 *     required.
> 
> "as required" by whom and in what way?  
> 
> "The code to recognize a repository requires them to be set already,
> but they do not have to be the real value---we just assign random
> valid values for now, let remote helper do its work and then set the
> real values after they are done" would be a plausible interpretation
> of the above.  Is that what is going on?

Partially. While we cannot yet determine the object format, we do know
the ref storage format as it is specified by the caller when invoking
git-clone(1) with the `--ref-format=` switch, not by the remote repo.
In that case, we'd have to set up the ref format accordingly and also
set the repository version to "1".

Patrick

> > +	 * This is sufficient for Git commands to discover the Git directory.
> > +	 */
> > +	initialize_repository_version(GIT_HASH_UNKNOWN,
> > +				      the_repository->ref_storage_format, 1);
> 
> OK, so we start with "we do not know it yet" here.
> 
> > +	strbuf_addf(&buf, "%s/HEAD", git_dir);
> > +	write_file(buf.buf, "ref: refs/heads/.invalid");
> > +
> > +	strbuf_reset(&buf);
> > +	strbuf_addf(&buf, "%s/refs", git_dir);
> > +	safe_create_dir(buf.buf, 1);
> > +
> >  	/*
> >  	 * additional config can be injected with -c, make sure it's included
> >  	 * after init_db, which clears the entire config environment.
> > @@ -1453,6 +1498,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> >  	free(remote_name);
> >  	strbuf_release(&reflog_msg);
> >  	strbuf_release(&branch_top);
> > +	strbuf_release(&buf);
> >  	strbuf_release(&key);
> >  	free_refs(mapped_refs);
> >  	free_refs(remote_head_points_at);
> > diff --git a/setup.c b/setup.c
> > index b69b1cbc2a..e3b76e84b5 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -1889,6 +1889,13 @@ void initialize_repository_version(int hash_algo,
> >  	char repo_version_string[10];
> >  	int repo_version = GIT_REPO_VERSION;
> >  
> > +	/*
> > +	 * Note that we initialize the repository version to 1 when the ref
> > +	 * storage format is unknown. This is on purpose so that we can add the
> > +	 * correct object format to the config during git-clone(1). The format
> > +	 * version will get adjusted by git-clone(1) once it has learned about
> > +	 * the remote repository's format.
> > +	 */
> >  	if (hash_algo != GIT_HASH_SHA1 ||
> >  	    ref_storage_format != REF_STORAGE_FORMAT_FILES)
> >  		repo_version = GIT_REPO_VERSION_READ;
> > @@ -1898,7 +1905,7 @@ void initialize_repository_version(int hash_algo,
> >  		  "%d", repo_version);
> >  	git_config_set("core.repositoryformatversion", repo_version_string);
> >  
> > -	if (hash_algo != GIT_HASH_SHA1)
> > +	if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
> >  		git_config_set("extensions.objectformat",
> >  			       hash_algos[hash_algo].name);
> >  	else if (reinit)
> 
> Here we refrain from writing extensions.objectformat in the
> repository when we do not know what hash function is being used.
> Without this change, we would instead remove extensions.objectformat,
> which does not sound too bad, as "reinit" is passed by the new call
> we saw above and this "else if (reinit)" will do exactly that anyway.
> 
> In any case, after we finish talking with the other side over the
> transport, we make another call to initialize_repository_version()
> with the real objectformat and everything will be fine.
> 
> The ealier description of the implementation idea made it sound
> really yucky, but I do not think the solution presented here is too
> bad.
> 
> > diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
> > index 1544d6dc6b..bcfb358c51 100755
> > --- a/t/t5801/git-remote-testgit
> > +++ b/t/t5801/git-remote-testgit
> > @@ -12,6 +12,11 @@ url=$2
> >  
> >  dir="$GIT_DIR/testgit/$alias"
> >  
> > +if ! git rev-parse --is-inside-git-dir
> > +then
> > +	exit 1
> > +fi
> > +
> >  h_refspec="refs/heads/*:refs/testgit/$alias/heads/*"
> >  t_refspec="refs/tags/*:refs/testgit/$alias/tags/*"

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] builtin/clone: allow remote helpers to detect repo
  2024-03-04  6:44     ` Patrick Steinhardt
@ 2024-03-04 16:17       ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2024-03-04 16:17 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Mike Hommey

Patrick Steinhardt <ps@pks.im> writes:

>> > +	 * Git commands. To do so, we:
>> > +	 *
>> > +	 *   - Create an invalid HEAD ref pointing at "refs/heads/.invalid".
>> > +	 *
>> > +	 *   - Create the "refs/" directory.
>> > +	 *
>> > +	 *   - Set up the ref storage format and repository version as
>> > +	 *     required.
>> 
>> "as required" by whom and in what way?  
>> 
>> "The code to recognize a repository requires them to be set already,
>> but they do not have to be the real value---we just assign random
>> valid values for now, let remote helper do its work and then set the
>> real values after they are done" would be a plausible interpretation
>> of the above.  Is that what is going on?
>
> Partially. While we cannot yet determine the object format, we do know
> the ref storage format as it is specified by the caller when invoking
> git-clone(1) with the `--ref-format=` switch, not by the remote repo.
> In that case, we'd have to set up the ref format accordingly and also
> set the repository version to "1".

Ah, of course.  Unlike the object format, we do not say "the other
end uses reftable, so we should do the same" and we do not have to,
thanks to some artificial limitations added to the reftable backend
not to exceed the expressiveness of the files backend.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD
  2024-03-04  6:44     ` Patrick Steinhardt
@ 2024-03-04 16:28       ` Junio C Hamano
  2024-03-05 11:43         ` Patrick Steinhardt
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2024-03-04 16:28 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, git, Mike Hommey

Patrick Steinhardt <ps@pks.im> writes:

>> > Now there is a very particular edge case in this situation: when
>> > preparing an empty ref transacton, we end up returning whatever value
>> > `read_ref_without_reload()` returned to the caller. Under normal
>> > conditions this would be fine: "HEAD" should usually exist, and thus the
>> > function would return `0`. But if "HEAD" doesn't exist, the function
>> > returns a positive value which we end up returning to the caller.
>> >
>> > Fix this bug by resetting the return code to `0` and add a test.

So this _will_ surface as a bug when the other change in the series
is applied, but it nevertheless is worth fixing independently of the
other one, because ...

>> > @@ -821,6 +821,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>> >  				      &head_referent, &head_type);
>> >  	if (ret < 0)
>> >  		goto done;
>> > +	ret = 0;

... after "if the refs->err records an error already, skip
everything we do and return to the caller", we should take the
ownership of what we return (which will be in "ret") from now on.

So the current code uses "ret" as an uninitialized variable, even
not technically so because it is "initialized" to "refs->err"
upfront, and this is like a fix of uninitialized variable use.

>> So this is not really a problem in this function, it's more of that
>> `refs.c:ref_transaction_prepare` checks if `ret` is non-zero.
>
> Well, yes. I'd claim that it is a problem in this function because it
> returns positive even though the transaction was prepared successfully.
>
>> Nit: would be nice to have a comment about why overriding this value is
>> ok.
>
> True.

Yup.  It seems we will see a v2 for updating the test code as well,
so I'll assume that you'd explain this as an independent fix (as
well as a required preliminary fix for the other one).

Thanks, both.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD
  2024-03-04 16:28       ` Junio C Hamano
@ 2024-03-05 11:43         ` Patrick Steinhardt
  2024-03-05 15:59           ` Junio C Hamano
  2024-03-06 11:17           ` [PATCH] t0610: remove unused variable assignment Patrick Steinhardt
  0 siblings, 2 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2024-03-05 11:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, git, Mike Hommey

[-- Attachment #1: Type: text/plain, Size: 3296 bytes --]

On Mon, Mar 04, 2024 at 08:28:17AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> > Now there is a very particular edge case in this situation: when
> >> > preparing an empty ref transacton, we end up returning whatever value
> >> > `read_ref_without_reload()` returned to the caller. Under normal
> >> > conditions this would be fine: "HEAD" should usually exist, and thus the
> >> > function would return `0`. But if "HEAD" doesn't exist, the function
> >> > returns a positive value which we end up returning to the caller.
> >> >
> >> > Fix this bug by resetting the return code to `0` and add a test.
> 
> So this _will_ surface as a bug when the other change in the series
> is applied, but it nevertheless is worth fixing independently of the
> other one, because ...
> 
> >> > @@ -821,6 +821,7 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
> >> >  				      &head_referent, &head_type);
> >> >  	if (ret < 0)
> >> >  		goto done;
> >> > +	ret = 0;
> 
> ... after "if the refs->err records an error already, skip
> everything we do and return to the caller", we should take the
> ownership of what we return (which will be in "ret") from now on.
> 
> So the current code uses "ret" as an uninitialized variable, even
> not technically so because it is "initialized" to "refs->err"
> upfront, and this is like a fix of uninitialized variable use.

The problem is a bit different. We call `read_ref_without_reload()` to
look up the "HEAD" ref, which will return a positive value in case the
ref wasn't found. This is customary in the reftable library: positive
error values indicate that an iter is over, and thus by extension that a
value was not found. It's fine though if the ref doesn't exist, and we
handle that case gracefully.

The only exception is when the transaction is also empty. In that case,
we skip the loop and thus end up not assigning to `ret` anymore. Thus,
the positive error code we still have in `ret` from the failed "HEAD"
lookup gets returned to the caller, which is wrong.

So it's not uninitialized, it rather is stale.

But yes, the bug _can_ be hit independently of the second patch in this
series. It's just really unlikely as a repo without "HEAD" is considered
to be broken anyway.

> >> So this is not really a problem in this function, it's more of that
> >> `refs.c:ref_transaction_prepare` checks if `ret` is non-zero.
> >
> > Well, yes. I'd claim that it is a problem in this function because it
> > returns positive even though the transaction was prepared successfully.
> >
> >> Nit: would be nice to have a comment about why overriding this value is
> >> ok.
> >
> > True.
> 
> Yup.  It seems we will see a v2 for updating the test code as well,
> so I'll assume that you'd explain this as an independent fix (as
> well as a required preliminary fix for the other one).

I see that the patch series has been merged to "next" a few days ago
though, and is slated to be merged to "master". That's why I refrained
from sending a v2.

I can send a follow-up patch to remove the useless variable assignment
in the test, but other than that I don't think anything needs to change
here. Or did I miss anything else?

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD
  2024-03-05 11:43         ` Patrick Steinhardt
@ 2024-03-05 15:59           ` Junio C Hamano
  2024-03-06 11:17           ` [PATCH] t0610: remove unused variable assignment Patrick Steinhardt
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2024-03-05 15:59 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, git, Mike Hommey

Patrick Steinhardt <ps@pks.im> writes:

> So it's not uninitialized, it rather is stale.

What I wanted to say with "technically it is not" was that these two
are moral equivalents in the code around there after the early return
happened ;-).

> I can send a follow-up patch to remove the useless variable assignment
> in the test, but other than that I don't think anything needs to change
> here.

OK, thanks.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] t0610: remove unused variable assignment
  2024-03-05 11:43         ` Patrick Steinhardt
  2024-03-05 15:59           ` Junio C Hamano
@ 2024-03-06 11:17           ` Patrick Steinhardt
  1 sibling, 0 replies; 18+ messages in thread
From: Patrick Steinhardt @ 2024-03-06 11:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 936 bytes --]

In b0f6b6b523 (refs/reftable: don't fail empty transactions in repo
without HEAD, 2024-02-27), we have added a new test to t0610. This test
contains a useless assignment to a variable that is never actually used.
Remove it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t0610-reftable-basics.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index c5f4d23433..686781192e 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -332,7 +332,6 @@ test_expect_success 'ref transaction: empty transaction in empty repo' '
 	test_when_finished "rm -rf repo" &&
 	git init repo &&
 	test_commit -C repo --no-tag A &&
-	COMMIT=$(git -C repo rev-parse HEAD) &&
 	git -C repo update-ref -d refs/heads/main &&
 	test-tool -C repo ref-store main delete-refs REF_NO_DEREF msg HEAD &&
 	git -C repo update-ref --stdin <<-EOF
-- 
2.44.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] builtin/clone: allow remote helpers to detect repo
  2024-02-27 14:27 ` [PATCH 2/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
  2024-02-27 21:31   ` Junio C Hamano
@ 2024-05-03  2:04   ` Mike Hommey
  1 sibling, 0 replies; 18+ messages in thread
From: Mike Hommey @ 2024-05-03  2:04 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano

Hey, it's me again, coming back way too late to figure out that this fix
had an adverse side effect in a corner case.

Here's a simplified reproducer:
```
$ cat > git-remote-foo <<EOF
#!/bin/sh
# capabilities
echo option
echo import
echo push
echo
# option progress true
echo ok
# option verbosity 1
echo ok
# list
echo
EOF
$ chmod +x git-remote-foo
$ PATH=$PWD:$PATH git clone foo::bar
Cloning into 'bar'...
created HEAD
warning: this remote helper should implement refspec capability
created reference db
warning: You appear to have cloned an empty repository.
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: 	git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: 	git branch -m <name>
```

That hint is the adverse side effect. It shouldn't be shown.

It's happening because `builtin/clone.c` does this:
	`create_reference_database(the_repository->ref_storage_format, NULL, 1);`

`create_reference` calls `is_reinit`, which returns true because there is
HEAD, and proceeds to skip the block behind `if (!reinit)`.

Before the change, that code block that is now skipped would have called
`git_default_branch_name(quiet)` (where `quiet` is 1, per the above call
to `create_reference_database`). In turn
`git_default_branch_name(quiet)` would have a) skipped the initial
branch hint (because quiet), and b) cached the branch name for subsequent
`git_default_branch_name` calls.

Then later in `builtin/clone.c`, we end up in this code:
	`branch = git_default_branch_name(0);`

Before the change, this would get the cache value from the previous
call, but now it has to compute it, and because 0 is passed, it's not
quiet about the initial branch hint.

The non-remote-helper case goes through with this without a problem
because it ends up in the transport_ls_refs_options.unborn_head_target
case and never calls `git_default_branch_name(0)`.

Mike

On Tue, Feb 27, 2024 at 03:27:44PM +0100, Patrick Steinhardt wrote:
> In 18c9cb7524 (builtin/clone: create the refdb with the correct object
> format, 2023-12-12), we have changed git-clone(1) so that it delays
> creation of the refdb until after it has learned about the remote's
> object format. This change was required for the reftable backend, which
> encodes the object format into the tables. So if we pre-initialized the
> refdb with the default object format, but the remote uses a different
> object format than that, then the resulting tables would have encoded
> the wrong object format.
> 
> This change unfortunately breaks remote helpers which try to access the
> repository that is about to be created. Because the refdb has not yet
> been initialized at the point where we spawn the remote helper, we also
> don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by
> the remote helper which try to access the repository would fail because
> it cannot be discovered.
> 
> This is essentially a chicken-and-egg problem: we cannot initialize the
> refdb because we don't know about the object format. But we cannot learn
> about the object format because the remote helper may be unable to
> access the partially-initialized repository.
> 
> Ideally, we would address this issue via capabilities. But the remote
> helper protocol is not structured in a way that guarantees that the
> capability announcement happens before the remote helper tries to access
> the repository.
> 
> Instead, fix this issue by partially initializing the refdb up to the
> point where it becomes discoverable by Git commands.
> 
> Reported-by: Mike Hommey <mh@glandium.org>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/clone.c            | 46 ++++++++++++++++++++++++++++++++++++++
>  setup.c                    |  9 +++++++-
>  t/t5801/git-remote-testgit |  5 +++++
>  3 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index bad1b70ce8..5d7f112125 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -926,6 +926,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	struct ref *mapped_refs = NULL;
>  	const struct ref *ref;
>  	struct strbuf key = STRBUF_INIT;
> +	struct strbuf buf = STRBUF_INIT;
>  	struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT;
>  	struct transport *transport = NULL;
>  	const char *src_ref_prefix = "refs/heads/";
> @@ -1125,6 +1126,50 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  		git_dir = real_git_dir;
>  	}
>  
> +	/*
> +	 * We have a chicken-and-egg situation between initializing the refdb
> +	 * and spawning transport helpers:
> +	 *
> +	 *   - Initializing the refdb requires us to know about the object
> +	 *     format. We thus have to spawn the transport helper to learn
> +	 *     about it.
> +	 *
> +	 *   - The transport helper may want to access the Git repository. But
> +	 *     because the refdb has not been initialized, we don't have "HEAD"
> +	 *     or "refs/". Thus, the helper cannot find the Git repository.
> +	 *
> +	 * Ideally, we would have structured the helper protocol such that it's
> +	 * mandatory for the helper to first announce its capabilities without
> +	 * yet assuming a fully initialized repository. Like that, we could
> +	 * have added a "lazy-refdb-init" capability that announces whether the
> +	 * helper is ready to handle not-yet-initialized refdbs. If any helper
> +	 * didn't support them, we would have fully initialized the refdb with
> +	 * the SHA1 object format, but later on bailed out if we found out that
> +	 * the remote repository used a different object format.
> +	 *
> +	 * But we didn't, and thus we use the following workaround to partially
> +	 * initialize the repository's refdb such that it can be discovered by
> +	 * Git commands. To do so, we:
> +	 *
> +	 *   - Create an invalid HEAD ref pointing at "refs/heads/.invalid".
> +	 *
> +	 *   - Create the "refs/" directory.
> +	 *
> +	 *   - Set up the ref storage format and repository version as
> +	 *     required.
> +	 *
> +	 * This is sufficient for Git commands to discover the Git directory.
> +	 */
> +	initialize_repository_version(GIT_HASH_UNKNOWN,
> +				      the_repository->ref_storage_format, 1);
> +
> +	strbuf_addf(&buf, "%s/HEAD", git_dir);
> +	write_file(buf.buf, "ref: refs/heads/.invalid");
> +
> +	strbuf_reset(&buf);
> +	strbuf_addf(&buf, "%s/refs", git_dir);
> +	safe_create_dir(buf.buf, 1);
> +
>  	/*
>  	 * additional config can be injected with -c, make sure it's included
>  	 * after init_db, which clears the entire config environment.
> @@ -1453,6 +1498,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	free(remote_name);
>  	strbuf_release(&reflog_msg);
>  	strbuf_release(&branch_top);
> +	strbuf_release(&buf);
>  	strbuf_release(&key);
>  	free_refs(mapped_refs);
>  	free_refs(remote_head_points_at);
> diff --git a/setup.c b/setup.c
> index b69b1cbc2a..e3b76e84b5 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1889,6 +1889,13 @@ void initialize_repository_version(int hash_algo,
>  	char repo_version_string[10];
>  	int repo_version = GIT_REPO_VERSION;
>  
> +	/*
> +	 * Note that we initialize the repository version to 1 when the ref
> +	 * storage format is unknown. This is on purpose so that we can add the
> +	 * correct object format to the config during git-clone(1). The format
> +	 * version will get adjusted by git-clone(1) once it has learned about
> +	 * the remote repository's format.
> +	 */
>  	if (hash_algo != GIT_HASH_SHA1 ||
>  	    ref_storage_format != REF_STORAGE_FORMAT_FILES)
>  		repo_version = GIT_REPO_VERSION_READ;
> @@ -1898,7 +1905,7 @@ void initialize_repository_version(int hash_algo,
>  		  "%d", repo_version);
>  	git_config_set("core.repositoryformatversion", repo_version_string);
>  
> -	if (hash_algo != GIT_HASH_SHA1)
> +	if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
>  		git_config_set("extensions.objectformat",
>  			       hash_algos[hash_algo].name);
>  	else if (reinit)
> diff --git a/t/t5801/git-remote-testgit b/t/t5801/git-remote-testgit
> index 1544d6dc6b..bcfb358c51 100755
> --- a/t/t5801/git-remote-testgit
> +++ b/t/t5801/git-remote-testgit
> @@ -12,6 +12,11 @@ url=$2
>  
>  dir="$GIT_DIR/testgit/$alias"
>  
> +if ! git rev-parse --is-inside-git-dir
> +then
> +	exit 1
> +fi
> +
>  h_refspec="refs/heads/*:refs/testgit/$alias/heads/*"
>  t_refspec="refs/tags/*:refs/testgit/$alias/tags/*"
>  
> -- 
> 2.44.0
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-05-03  2:23 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-27 14:27 [PATCH 0/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
2024-02-27 14:27 ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Patrick Steinhardt
2024-02-28 11:32   ` Karthik Nayak
2024-03-04  6:44     ` Patrick Steinhardt
2024-03-04 16:28       ` Junio C Hamano
2024-03-05 11:43         ` Patrick Steinhardt
2024-03-05 15:59           ` Junio C Hamano
2024-03-06 11:17           ` [PATCH] t0610: remove unused variable assignment Patrick Steinhardt
2024-03-01  1:20   ` [PATCH 1/2] refs/reftable: don't fail empty transactions in repo without HEAD Justin Tobler
2024-03-04  6:44     ` Patrick Steinhardt
2024-02-27 14:27 ` [PATCH 2/2] builtin/clone: allow remote helpers to detect repo Patrick Steinhardt
2024-02-27 21:31   ` Junio C Hamano
2024-03-04  6:44     ` Patrick Steinhardt
2024-03-04 16:17       ` Junio C Hamano
2024-05-03  2:04   ` Mike Hommey
2024-02-27 20:58 ` [PATCH 0/2] " Junio C Hamano
2024-02-27 21:33   ` Junio C Hamano
2024-03-04  6:44     ` Patrick Steinhardt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.