git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Posible bug with GIT_DEFAULT_HASH during clone
@ 2020-09-11 15:17 Matheus Tavares
  2020-09-11 23:20 ` brian m. carlson
  2020-09-11 23:38 ` [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH brian m. carlson
  0 siblings, 2 replies; 15+ messages in thread
From: Matheus Tavares @ 2020-09-11 15:17 UTC (permalink / raw)
  To: git; +Cc: sandals, martin.agren

Hi, everyone

Documentation/git.txt mentions that GIT_DEFAULT_HASH is ignored during
clone, but I think it may not be *totally* ignored, sometimes leaving
the config file on the cloned repo in an inconsistent state.

To reproduce this (tested with current `master` and `seen`):

git init test
echo F>test/F
git -C test add F
git -C test commit -m F
export GIT_DEFAULT_HASH=sha256
git clone test test-clone
git -C test-clone status

Which outputs:
fatal: repo version is 0, but v1-only extensions found:
        objectformat

From what I could see under gdb, the steps leading to this are:

- First, this call chain gets the GIT_DEFAULT_HASH value:
  cmd_clone() > init_db() > validate_hash_algorithm().

- Then, init_db() calls create_default_files(), which calls
  initialize_repository_version() with GIT_HASH_SHA256, setting these
  configs:
  * extensions.objectFormat=sha256
  * core.repositoryFormatVersion=1

- Finally, cmd_clone() later uses the return of
  transport_get_hash_algo() to call initialize_repository_version()
  again, but with GIT_HASH_SHA1, setting:
  * core.repositoryFormatVersion=0

So we end up with the repository format version as 0 but the
objectFormat extension is present.

Thanks,
Matheus


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Posible bug with GIT_DEFAULT_HASH during clone
  2020-09-11 15:17 Posible bug with GIT_DEFAULT_HASH during clone Matheus Tavares
@ 2020-09-11 23:20 ` brian m. carlson
  2020-09-11 23:38 ` [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH brian m. carlson
  1 sibling, 0 replies; 15+ messages in thread
From: brian m. carlson @ 2020-09-11 23:20 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, martin.agren

[-- Attachment #1: Type: text/plain, Size: 2196 bytes --]

On 2020-09-11 at 15:17:17, Matheus Tavares wrote:
> Hi, everyone
> 
> Documentation/git.txt mentions that GIT_DEFAULT_HASH is ignored during
> clone, but I think it may not be *totally* ignored, sometimes leaving
> the config file on the cloned repo in an inconsistent state.
> 
> To reproduce this (tested with current `master` and `seen`):
> 
> git init test
> echo F>test/F
> git -C test add F
> git -C test commit -m F
> export GIT_DEFAULT_HASH=sha256
> git clone test test-clone
> git -C test-clone status
> 
> Which outputs:
> fatal: repo version is 0, but v1-only extensions found:
>         objectformat
> 
> From what I could see under gdb, the steps leading to this are:
> 
> - First, this call chain gets the GIT_DEFAULT_HASH value:
>   cmd_clone() > init_db() > validate_hash_algorithm().
> 
> - Then, init_db() calls create_default_files(), which calls
>   initialize_repository_version() with GIT_HASH_SHA256, setting these
>   configs:
>   * extensions.objectFormat=sha256
>   * core.repositoryFormatVersion=1
> 
> - Finally, cmd_clone() later uses the return of
>   transport_get_hash_algo() to call initialize_repository_version()
>   again, but with GIT_HASH_SHA1, setting:
>   * core.repositoryFormatVersion=0
> 
> So we end up with the repository format version as 0 but the
> objectFormat extension is present.

Thanks for the bug report and reproduction steps.  There are a couple
possibilities here:

* First, we just initialize the repository with SHA-1.  When I saw this,
  I tried that, but it breaks anytime we have an empty repository (and
  with partial clone, apparently).  So I don't think this is going to
  work.
* We take a reinitialize flag in initialize_repository_version and then
  set the algorithm to "sha1".  This is extremely easy, but it also
  poses some compatibility problems, because older versions of Git won't
  handle this configuration.
* We take a reinitialize flag and clear the value, which seems to be the
  best way forward.  I'll verify that this doesn't pose any unforeseen
  problems elsewhere in the testsuite and then send a patch.
-- 
brian m. carlson: Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-11 15:17 Posible bug with GIT_DEFAULT_HASH during clone Matheus Tavares
  2020-09-11 23:20 ` brian m. carlson
@ 2020-09-11 23:38 ` brian m. carlson
  2020-09-12  3:24   ` Taylor Blau
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: brian m. carlson @ 2020-09-11 23:38 UTC (permalink / raw)
  To: git; +Cc: Matheus Tavares

If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256".  This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).

This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using.  We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.

We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be.  And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.

Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1.  This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.

Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/clone.c   |  2 +-
 builtin/init-db.c |  6 ++++--
 cache.h           |  2 +-
 t/t5601-clone.sh  | 10 ++++++++++
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index b087ee40c2..925a2e3dd6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1235,7 +1235,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		 * Now that we know what algorithm the remote side is using,
 		 * let's set ours to the same thing.
 		 */
-		initialize_repository_version(hash_algo);
+		initialize_repository_version(hash_algo, 1);
 		repo_set_hash_algo(the_repository, hash_algo);
 
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
diff --git a/builtin/init-db.c b/builtin/init-db.c
index cd3e760541..57e15e93da 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -179,7 +179,7 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo, int reinit)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -195,6 +195,8 @@ void initialize_repository_version(int hash_algo)
 	if (hash_algo != GIT_HASH_SHA1)
 		git_config_set("extensions.objectformat",
 			       hash_algos[hash_algo].name);
+	else if (reinit)
+		git_config_set("extensions.objectformat", NULL);
 }
 
 static int create_default_files(const char *template_path,
@@ -277,7 +279,7 @@ static int create_default_files(const char *template_path,
 		free(ref);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, 0);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
diff --git a/cache.h b/cache.h
index cee8aa5dc3..c0072d43b1 100644
--- a/cache.h
+++ b/cache.h
@@ -629,7 +629,7 @@ int path_inside_repo(const char *prefix, const char *path);
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
 	    const char *initial_branch, unsigned int flags);
-void initialize_repository_version(int hash_algo);
+void initialize_repository_version(int hash_algo, int reinit);
 
 void sanitize_stdfds(void);
 int daemonize(void);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 15fb64c18d..570d989795 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -631,6 +631,16 @@ test_expect_success CASE_INSENSITIVE_FS 'colliding file detection' '
 	test_i18ngrep "the following paths have collided" icasefs/warning
 '
 
+test_expect_success 'clone with GIT_DEFAULT_HASH' '
+	(
+		sane_unset GIT_DEFAULT_HASH &&
+		git init test
+	) &&
+	test_commit -C test foo &&
+	git clone test test-clone &&
+	git -C test-clone status
+'
+
 partial_clone_server () {
 	       SERVER="$1" &&
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-11 23:38 ` [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH brian m. carlson
@ 2020-09-12  3:24   ` Taylor Blau
  2020-09-12 19:52     ` brian m. carlson
  2020-09-15  1:58   ` [PATCH v2] " brian m. carlson
  2020-09-20 22:35   ` [PATCH v3] " brian m. carlson
  2 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2020-09-12  3:24 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Matheus Tavares

On Fri, Sep 11, 2020 at 11:38:15PM +0000, brian m. carlson wrote:
> Neither of those are appealing, so let's tell the repository
> initialization code if we're doing a reinit like this, and if so, to
> clear the extension if we're using SHA-1.  This makes sure we produce a
> valid and functional repository and doesn't break any of our other use
> cases.

Your explanation makes sense, and what you're proposing here seems like
a very reasonable path forward. I can't think of anything that would
cause it not to work as expected.

> -void initialize_repository_version(int hash_algo);
> +void initialize_repository_version(int hash_algo, int reinit);

I'm not a huge fan of adding a 'reinit' parameter to a function that
itself begins with the word 'initialize' (why wouldn't you call
'reinitialize_repository_version()' instead?), but seeing as there are
only a couple of callers, maybe it is OK.

Alternatively, I certainly wouldn't complain if you did introduce a new
function and updated the call-site that passes reinit as 1.

> +test_expect_success 'clone with GIT_DEFAULT_HASH' '
> +	(
> +		sane_unset GIT_DEFAULT_HASH &&
> +		git init test
> +	) &&
> +	test_commit -C test foo &&
> +	git clone test test-clone &&
> +	git -C test-clone status
> +'
> +

This test looks very reasonable, and certainly demonstrates the bug and
fix. Thanks.

With or without my nitpick:

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-12  3:24   ` Taylor Blau
@ 2020-09-12 19:52     ` brian m. carlson
  2020-09-14 21:37       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: brian m. carlson @ 2020-09-12 19:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Matheus Tavares

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]

On 2020-09-12 at 03:24:48, Taylor Blau wrote:
> On Fri, Sep 11, 2020 at 11:38:15PM +0000, brian m. carlson wrote:
> > -void initialize_repository_version(int hash_algo);
> > +void initialize_repository_version(int hash_algo, int reinit);
> 
> I'm not a huge fan of adding a 'reinit' parameter to a function that
> itself begins with the word 'initialize' (why wouldn't you call
> 'reinitialize_repository_version()' instead?), but seeing as there are
> only a couple of callers, maybe it is OK.
> 
> Alternatively, I certainly wouldn't complain if you did introduce a new
> function and updated the call-site that passes reinit as 1.

I thought about introducing a new function, but since it would share
almost all of the code, it seemed a bit wasteful, even if the function
is small.  We do have only two callers, I believe, since I recall
making this function non-static and calling it from clone, so I think
it's okay.

> > +test_expect_success 'clone with GIT_DEFAULT_HASH' '
> > +	(
> > +		sane_unset GIT_DEFAULT_HASH &&
> > +		git init test
> > +	) &&
> > +	test_commit -C test foo &&
> > +	git clone test test-clone &&
> > +	git -C test-clone status
> > +'
> > +
> 
> This test looks very reasonable, and certainly demonstrates the bug and
> fix. Thanks.

This is essentially Matheus's testcase.  I considered using env -u to
avoid the subshell but POSIX doesn't specify that, so we have a
subshell.
-- 
brian m. carlson: Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-12 19:52     ` brian m. carlson
@ 2020-09-14 21:37       ` Junio C Hamano
  2020-09-14 21:44         ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2020-09-14 21:37 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Taylor Blau, git, Matheus Tavares

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> Alternatively, I certainly wouldn't complain if you did introduce a new
>> function and updated the call-site that passes reinit as 1.
>
> I thought about introducing a new function, but since it would share
> almost all of the code, it seemed a bit wasteful, even if the function
> is small.  We do have only two callers, I believe, since I recall
> making this function non-static and calling it from clone, so I think
> it's okay.

Perhaps.

FWIW, this seems to have strange interaction with something in
'seen' when merged; I suspect it is the topic that mucks with the
set-up sequence for "git clone", but didn't check.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-14 21:37       ` Junio C Hamano
@ 2020-09-14 21:44         ` Junio C Hamano
  2020-09-15  1:32           ` brian m. carlson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2020-09-14 21:44 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Taylor Blau, git, Matheus Tavares

Junio C Hamano <gitster@pobox.com> writes:

> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>>> Alternatively, I certainly wouldn't complain if you did introduce a new
>>> function and updated the call-site that passes reinit as 1.
>>
>> I thought about introducing a new function, but since it would share
>> almost all of the code, it seemed a bit wasteful, even if the function
>> is small.  We do have only two callers, I believe, since I recall
>> making this function non-static and calling it from clone, so I think
>> it's okay.
>
> Perhaps.
>
> FWIW, this seems to have strange interaction with something in
> 'seen' when merged; I suspect it is the topic that mucks with the
> set-up sequence for "git clone", but didn't check.

Actually I have to take it back.  I have this directly on top of
v2.28.0 and it already breaks tests big time.  For example, here is
how "cd t && sh t0021-conversion.sh -i -v" breaks:

    ...
    Cloning into 'repo-cloned'...
    fatal: could not unset 'extensions.objectformat'
    fatal: the remote end hung up unexpectedly
    not ok 25 - delayed checkout in process filter

The story is the same if it is applied on top of 'master' (which is
expected, as we haven't done anything to affect this area since
v2.28.0).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-14 21:44         ` Junio C Hamano
@ 2020-09-15  1:32           ` brian m. carlson
  0 siblings, 0 replies; 15+ messages in thread
From: brian m. carlson @ 2020-09-15  1:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, Matheus Tavares

[-- Attachment #1: Type: text/plain, Size: 836 bytes --]

On 2020-09-14 at 21:44:45, Junio C Hamano wrote:
> Actually I have to take it back.  I have this directly on top of
> v2.28.0 and it already breaks tests big time.  For example, here is
> how "cd t && sh t0021-conversion.sh -i -v" breaks:
> 
>     ...
>     Cloning into 'repo-cloned'...
>     fatal: could not unset 'extensions.objectformat'
>     fatal: the remote end hung up unexpectedly
>     not ok 25 - delayed checkout in process filter
> 
> The story is the same if it is applied on top of 'master' (which is
> expected, as we haven't done anything to affect this area since
> v2.28.0).

Ah, yes.  I build with GIT_TEST_DEFAULT_HASH=sha256, but if you run the
tests with SHA-1, it fails.  Feel free to drop it for now and I'll send
out a v2 soon.  Sorry about that.
-- 
brian m. carlson: Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-11 23:38 ` [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH brian m. carlson
  2020-09-12  3:24   ` Taylor Blau
@ 2020-09-15  1:58   ` brian m. carlson
  2020-09-15  4:31     ` Junio C Hamano
  2020-09-20 22:35   ` [PATCH v3] " brian m. carlson
  2 siblings, 1 reply; 15+ messages in thread
From: brian m. carlson @ 2020-09-15  1:58 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Matheus Tavares

If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256".  This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).

This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using.  We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.

We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be.  And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.

Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1.  This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.

Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Changes since v1:
* Use git_config_set_gently to make things work with SHA-1 repos as well
  as SHA-256 repos.

Diff-intervalle contre v1 :
1:  32d3357460 ! 1:  1becbbbb50 builtin/clone: avoid failure with GIT_DEFAULT_HASH
    @@ builtin/init-db.c: void initialize_repository_version(int hash_algo)
      		git_config_set("extensions.objectformat",
      			       hash_algos[hash_algo].name);
     +	else if (reinit)
    -+		git_config_set("extensions.objectformat", NULL);
    ++		git_config_set_gently("extensions.objectformat", NULL);
      }
      
      static int create_default_files(const char *template_path,

 builtin/clone.c   |  2 +-
 builtin/init-db.c |  6 ++++--
 cache.h           |  2 +-
 t/t5601-clone.sh  | 10 ++++++++++
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index b087ee40c2..925a2e3dd6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1235,7 +1235,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		 * Now that we know what algorithm the remote side is using,
 		 * let's set ours to the same thing.
 		 */
-		initialize_repository_version(hash_algo);
+		initialize_repository_version(hash_algo, 1);
 		repo_set_hash_algo(the_repository, hash_algo);
 
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
diff --git a/builtin/init-db.c b/builtin/init-db.c
index cd3e760541..01bc648d41 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -179,7 +179,7 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo, int reinit)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -195,6 +195,8 @@ void initialize_repository_version(int hash_algo)
 	if (hash_algo != GIT_HASH_SHA1)
 		git_config_set("extensions.objectformat",
 			       hash_algos[hash_algo].name);
+	else if (reinit)
+		git_config_set_gently("extensions.objectformat", NULL);
 }
 
 static int create_default_files(const char *template_path,
@@ -277,7 +279,7 @@ static int create_default_files(const char *template_path,
 		free(ref);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, 0);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
diff --git a/cache.h b/cache.h
index cee8aa5dc3..c0072d43b1 100644
--- a/cache.h
+++ b/cache.h
@@ -629,7 +629,7 @@ int path_inside_repo(const char *prefix, const char *path);
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
 	    const char *initial_branch, unsigned int flags);
-void initialize_repository_version(int hash_algo);
+void initialize_repository_version(int hash_algo, int reinit);
 
 void sanitize_stdfds(void);
 int daemonize(void);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 15fb64c18d..570d989795 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -631,6 +631,16 @@ test_expect_success CASE_INSENSITIVE_FS 'colliding file detection' '
 	test_i18ngrep "the following paths have collided" icasefs/warning
 '
 
+test_expect_success 'clone with GIT_DEFAULT_HASH' '
+	(
+		sane_unset GIT_DEFAULT_HASH &&
+		git init test
+	) &&
+	test_commit -C test foo &&
+	git clone test test-clone &&
+	git -C test-clone status
+'
+
 partial_clone_server () {
 	       SERVER="$1" &&
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-15  1:58   ` [PATCH v2] " brian m. carlson
@ 2020-09-15  4:31     ` Junio C Hamano
  2020-09-15 22:51       ` brian m. carlson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2020-09-15  4:31 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Taylor Blau, Matheus Tavares

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
> "sha256", then we can end up with a repository where the repository
> format version is 0 but the extensions.objectformat key is set to
> "sha256".  This is both wrong (the user has a SHA-1 repository) and
> nonfunctional (because the extension cannot be used in a v0 repository).
>
> This happens because in a clone, we initially set up the repository, and
> then change its algorithm based on what the remote side tells us it's
> using.  We've initially set up the repository as SHA-256 in this case,
> and then later on reset the repository version without clearing the
> extension.
>
> We could just always set the extension in this case, but that would mean
> that our SHA-1 repositories weren't compatible with older Git versions,
> even though there's no reason why they shouldn't be.  And we also don't
> want to initialize the repository as SHA-1 initially, since that means
> if we're cloning an empty repository, we'll have failed to honor the
> GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
> SHA-256 repository.
>
> Neither of those are appealing, so let's tell the repository
> initialization code if we're doing a reinit like this, and if so, to
> clear the extension if we're using SHA-1.  This makes sure we produce a
> valid and functional repository and doesn't break any of our other use
> cases.
>
> Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
> Changes since v1:
> * Use git_config_set_gently to make things work with SHA-1 repos as well
>   as SHA-256 repos.

Hmph, the reason why v1's bug weren't caught was because it was only
tested with GIT_TEST_DEFAULT_HASH=sha256, right?  I am wondering if
adding two new tests that run the same end-user scenario except
for the choice of hash algorithms would be a good way to ensure this
will stay fixed.  Am I mistaken?

Thanks anyway for a quick turnaround.

> Diff-intervalle contre v1 :
> 1:  32d3357460 ! 1:  1becbbbb50 builtin/clone: avoid failure with GIT_DEFAULT_HASH
>     @@ builtin/init-db.c: void initialize_repository_version(int hash_algo)
>       		git_config_set("extensions.objectformat",
>       			       hash_algos[hash_algo].name);
>      +	else if (reinit)
>     -+		git_config_set("extensions.objectformat", NULL);
>     ++		git_config_set_gently("extensions.objectformat", NULL);
>       }
>       
>       static int create_default_files(const char *template_path,
>
>  builtin/clone.c   |  2 +-
>  builtin/init-db.c |  6 ++++--
>  cache.h           |  2 +-
>  t/t5601-clone.sh  | 10 ++++++++++
>  4 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index b087ee40c2..925a2e3dd6 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1235,7 +1235,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  		 * Now that we know what algorithm the remote side is using,
>  		 * let's set ours to the same thing.
>  		 */
> -		initialize_repository_version(hash_algo);
> +		initialize_repository_version(hash_algo, 1);
>  		repo_set_hash_algo(the_repository, hash_algo);
>  
>  		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index cd3e760541..01bc648d41 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -179,7 +179,7 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
>  	return 1;
>  }
>  
> -void initialize_repository_version(int hash_algo)
> +void initialize_repository_version(int hash_algo, int reinit)
>  {
>  	char repo_version_string[10];
>  	int repo_version = GIT_REPO_VERSION;
> @@ -195,6 +195,8 @@ void initialize_repository_version(int hash_algo)
>  	if (hash_algo != GIT_HASH_SHA1)
>  		git_config_set("extensions.objectformat",
>  			       hash_algos[hash_algo].name);
> +	else if (reinit)
> +		git_config_set_gently("extensions.objectformat", NULL);
>  }
>  
>  static int create_default_files(const char *template_path,
> @@ -277,7 +279,7 @@ static int create_default_files(const char *template_path,
>  		free(ref);
>  	}
>  
> -	initialize_repository_version(fmt->hash_algo);
> +	initialize_repository_version(fmt->hash_algo, 0);
>  
>  	/* Check filemode trustability */
>  	path = git_path_buf(&buf, "config");
> diff --git a/cache.h b/cache.h
> index cee8aa5dc3..c0072d43b1 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -629,7 +629,7 @@ int path_inside_repo(const char *prefix, const char *path);
>  int init_db(const char *git_dir, const char *real_git_dir,
>  	    const char *template_dir, int hash_algo,
>  	    const char *initial_branch, unsigned int flags);
> -void initialize_repository_version(int hash_algo);
> +void initialize_repository_version(int hash_algo, int reinit);
>  
>  void sanitize_stdfds(void);
>  int daemonize(void);
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index 15fb64c18d..570d989795 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -631,6 +631,16 @@ test_expect_success CASE_INSENSITIVE_FS 'colliding file detection' '
>  	test_i18ngrep "the following paths have collided" icasefs/warning
>  '
>  
> +test_expect_success 'clone with GIT_DEFAULT_HASH' '
> +	(
> +		sane_unset GIT_DEFAULT_HASH &&
> +		git init test
> +	) &&
> +	test_commit -C test foo &&
> +	git clone test test-clone &&
> +	git -C test-clone status
> +'
> +
>  partial_clone_server () {
>  	       SERVER="$1" &&
>  

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-15  4:31     ` Junio C Hamano
@ 2020-09-15 22:51       ` brian m. carlson
  0 siblings, 0 replies; 15+ messages in thread
From: brian m. carlson @ 2020-09-15 22:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, Matheus Tavares

[-- Attachment #1: Type: text/plain, Size: 597 bytes --]

On 2020-09-15 at 04:31:14, Junio C Hamano wrote:
> > Changes since v1:
> > * Use git_config_set_gently to make things work with SHA-1 repos as well
> >   as SHA-256 repos.
> 
> Hmph, the reason why v1's bug weren't caught was because it was only
> tested with GIT_TEST_DEFAULT_HASH=sha256, right?  I am wondering if
> adding two new tests that run the same end-user scenario except
> for the choice of hash algorithms would be a good way to ensure this
> will stay fixed.  Am I mistaken?

Sure, I can do that.  I'll try to get a v3 out soon.
-- 
brian m. carlson: Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-11 23:38 ` [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH brian m. carlson
  2020-09-12  3:24   ` Taylor Blau
  2020-09-15  1:58   ` [PATCH v2] " brian m. carlson
@ 2020-09-20 22:35   ` brian m. carlson
  2020-09-21  4:27     ` Junio C Hamano
  2 siblings, 1 reply; 15+ messages in thread
From: brian m. carlson @ 2020-09-20 22:35 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Matheus Tavares

If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256".  This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).

This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using.  We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.

We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be.  And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.

Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1.  This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.

Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
Diff-intervalle contre v2 :
1:  1becbbbb50 ! 1:  004a7b86f8 builtin/clone: avoid failure with GIT_DEFAULT_HASH
    @@ t/t5601-clone.sh: test_expect_success CASE_INSENSITIVE_FS 'colliding file detect
     +test_expect_success 'clone with GIT_DEFAULT_HASH' '
     +	(
     +		sane_unset GIT_DEFAULT_HASH &&
    -+		git init test
    ++		git init --object-format=sha1 test-sha1 &&
    ++		git init --object-format=sha256 test-sha256
     +	) &&
    -+	test_commit -C test foo &&
    -+	git clone test test-clone &&
    -+	git -C test-clone status
    ++	test_commit -C test-sha1 foo &&
    ++	test_commit -C test-sha256 foo &&
    ++	GIT_DEFAULT_HASH=sha1 git clone test-sha256 test-clone-sha256 &&
    ++	GIT_DEFAULT_HASH=sha256 git clone test-sha1 test-clone-sha1 &&
    ++	git -C test-clone-sha1 status &&
    ++	git -C test-clone-sha256 status
     +'
     +
      partial_clone_server () {

 builtin/clone.c   |  2 +-
 builtin/init-db.c |  6 ++++--
 cache.h           |  2 +-
 t/t5601-clone.sh  | 14 ++++++++++++++
 4 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index b087ee40c2..925a2e3dd6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1235,7 +1235,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		 * Now that we know what algorithm the remote side is using,
 		 * let's set ours to the same thing.
 		 */
-		initialize_repository_version(hash_algo);
+		initialize_repository_version(hash_algo, 1);
 		repo_set_hash_algo(the_repository, hash_algo);
 
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
diff --git a/builtin/init-db.c b/builtin/init-db.c
index cd3e760541..01bc648d41 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -179,7 +179,7 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo, int reinit)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -195,6 +195,8 @@ void initialize_repository_version(int hash_algo)
 	if (hash_algo != GIT_HASH_SHA1)
 		git_config_set("extensions.objectformat",
 			       hash_algos[hash_algo].name);
+	else if (reinit)
+		git_config_set_gently("extensions.objectformat", NULL);
 }
 
 static int create_default_files(const char *template_path,
@@ -277,7 +279,7 @@ static int create_default_files(const char *template_path,
 		free(ref);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, 0);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
diff --git a/cache.h b/cache.h
index cee8aa5dc3..c0072d43b1 100644
--- a/cache.h
+++ b/cache.h
@@ -629,7 +629,7 @@ int path_inside_repo(const char *prefix, const char *path);
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
 	    const char *initial_branch, unsigned int flags);
-void initialize_repository_version(int hash_algo);
+void initialize_repository_version(int hash_algo, int reinit);
 
 void sanitize_stdfds(void);
 int daemonize(void);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 15fb64c18d..b6c8312da1 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -631,6 +631,20 @@ test_expect_success CASE_INSENSITIVE_FS 'colliding file detection' '
 	test_i18ngrep "the following paths have collided" icasefs/warning
 '
 
+test_expect_success 'clone with GIT_DEFAULT_HASH' '
+	(
+		sane_unset GIT_DEFAULT_HASH &&
+		git init --object-format=sha1 test-sha1 &&
+		git init --object-format=sha256 test-sha256
+	) &&
+	test_commit -C test-sha1 foo &&
+	test_commit -C test-sha256 foo &&
+	GIT_DEFAULT_HASH=sha1 git clone test-sha256 test-clone-sha256 &&
+	GIT_DEFAULT_HASH=sha256 git clone test-sha1 test-clone-sha1 &&
+	git -C test-clone-sha1 status &&
+	git -C test-clone-sha256 status
+'
+
 partial_clone_server () {
 	       SERVER="$1" &&
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-20 22:35   ` [PATCH v3] " brian m. carlson
@ 2020-09-21  4:27     ` Junio C Hamano
  2020-09-22  9:17       ` brian m. carlson
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2020-09-21  4:27 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Taylor Blau, Matheus Tavares

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> +test_expect_success 'clone with GIT_DEFAULT_HASH' '
> +	(
> +		sane_unset GIT_DEFAULT_HASH &&
> +		git init --object-format=sha1 test-sha1 &&
> +		git init --object-format=sha256 test-sha256
> +	) &&
> +	test_commit -C test-sha1 foo &&
> +	test_commit -C test-sha256 foo &&

Unfortunately, the 'foo' commit is created in test-sha1, but the
next step to create 'foo' in test-sha256 fails with

        fatal: unknown repository extensions found:
                objectformat

> +	GIT_DEFAULT_HASH=sha1 git clone test-sha256 test-clone-sha256 &&
> +	GIT_DEFAULT_HASH=sha256 git clone test-sha1 test-clone-sha1 &&
> +	git -C test-clone-sha1 status &&
> +	git -C test-clone-sha256 status
> +'
> +
>  partial_clone_server () {
>  	       SERVER="$1" &&
>  

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-21  4:27     ` Junio C Hamano
@ 2020-09-22  9:17       ` brian m. carlson
  2020-09-22 16:27         ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: brian m. carlson @ 2020-09-22  9:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, Matheus Tavares

[-- Attachment #1: Type: text/plain, Size: 1861 bytes --]

On 2020-09-21 at 04:27:14, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > +test_expect_success 'clone with GIT_DEFAULT_HASH' '
> > +	(
> > +		sane_unset GIT_DEFAULT_HASH &&
> > +		git init --object-format=sha1 test-sha1 &&
> > +		git init --object-format=sha256 test-sha256
> > +	) &&
> > +	test_commit -C test-sha1 foo &&
> > +	test_commit -C test-sha256 foo &&
> 
> Unfortunately, the 'foo' commit is created in test-sha1, but the
> next step to create 'foo' in test-sha256 fails with
> 
>         fatal: unknown repository extensions found:
>                 objectformat

I'm not seeing that with this series based on master
(385c171a018f2747b329bcfa6be8eda1709e5abd).  I'm doing this:

  make -j6 all && (cd t && GIT_TEST_DEFAULT_HASH=sha256 ./t5601-*.sh --verbose)
  make -j6 all && (cd t && GIT_TEST_DEFAULT_HASH=sha1 ./t5601-*.sh --verbose)

And getting this output:

  Initialized empty Git repository in /home/bmc/checkouts/git/t/trash directory.t5601-clone/test-sha1/.git/
  Initialized empty Git repository in /home/bmc/checkouts/git/t/trash directory.t5601-clone/test-sha256/.git/
  [master (root-commit) 946e985] foo
   Author: A U Thor <author@example.com>
   1 file changed, 1 insertion(+)
   create mode 100644 foo.t
  [master (root-commit) ff872d8] foo
   Author: A U Thor <author@example.com>
   1 file changed, 1 insertion(+)
   create mode 100644 foo.t
  Cloning into 'test-clone-sha256'...
  done.
  Cloning into 'test-clone-sha1'...
  done.
  On branch master
  Your branch is up to date with 'origin/master'.
  
  nothing to commit, working tree clean
  On branch master
  Your branch is up to date with 'origin/master'.
  
  nothing to commit, working tree clean

Is there something I'm missing here?
-- 
brian m. carlson: Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] builtin/clone: avoid failure with GIT_DEFAULT_HASH
  2020-09-22  9:17       ` brian m. carlson
@ 2020-09-22 16:27         ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2020-09-22 16:27 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Taylor Blau, Matheus Tavares

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2020-09-21 at 04:27:14, Junio C Hamano wrote:
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> 
>> > +test_expect_success 'clone with GIT_DEFAULT_HASH' '
>> > +	(
>> > +		sane_unset GIT_DEFAULT_HASH &&
>> > +		git init --object-format=sha1 test-sha1 &&
>> > +		git init --object-format=sha256 test-sha256
>> > +	) &&
>> > +	test_commit -C test-sha1 foo &&
>> > +	test_commit -C test-sha256 foo &&
>> 
>> Unfortunately, the 'foo' commit is created in test-sha1, but the
>> next step to create 'foo' in test-sha256 fails with
>> 
>>         fatal: unknown repository extensions found:
>>                 objectformat
>
> I'm not seeing that with this series based on master
> (385c171a018f2747b329bcfa6be8eda1709e5abd).  I'm doing this:
>
>   make -j6 all && (cd t && GIT_TEST_DEFAULT_HASH=sha256 ./t5601-*.sh --verbose)
>   make -j6 all && (cd t && GIT_TEST_DEFAULT_HASH=sha1 ./t5601-*.sh --verbose)
>
> And getting this output:
>
>   Initialized empty Git repository in /home/bmc/checkouts/git/t/trash directory.t5601-clone/test-sha1/.git/
>   Initialized empty Git repository in /home/bmc/checkouts/git/t/trash directory.t5601-clone/test-sha256/.git/
>   [master (root-commit) 946e985] foo
>    Author: A U Thor <author@example.com>
>    1 file changed, 1 insertion(+)
>    create mode 100644 foo.t
>   [master (root-commit) ff872d8] foo
>    Author: A U Thor <author@example.com>
>    1 file changed, 1 insertion(+)
>    create mode 100644 foo.t
>   Cloning into 'test-clone-sha256'...
>   done.
>   Cloning into 'test-clone-sha1'...
>   done.
>   On branch master
>   Your branch is up to date with 'origin/master'.
>   
>   nothing to commit, working tree clean
>   On branch master
>   Your branch is up to date with 'origin/master'.
>   
>   nothing to commit, working tree clean
>
> Is there something I'm missing here?

I think the issue was because I queued the patch on the same base as
the previous round, which was v2.28.0.  Whe applied to 385c171a
(Fifteenth batch, 2020-09-18), I get the same as the above.

Sorry about the noise.  Will queue again.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-09-22 16:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-11 15:17 Posible bug with GIT_DEFAULT_HASH during clone Matheus Tavares
2020-09-11 23:20 ` brian m. carlson
2020-09-11 23:38 ` [PATCH] builtin/clone: avoid failure with GIT_DEFAULT_HASH brian m. carlson
2020-09-12  3:24   ` Taylor Blau
2020-09-12 19:52     ` brian m. carlson
2020-09-14 21:37       ` Junio C Hamano
2020-09-14 21:44         ` Junio C Hamano
2020-09-15  1:32           ` brian m. carlson
2020-09-15  1:58   ` [PATCH v2] " brian m. carlson
2020-09-15  4:31     ` Junio C Hamano
2020-09-15 22:51       ` brian m. carlson
2020-09-20 22:35   ` [PATCH v3] " brian m. carlson
2020-09-21  4:27     ` Junio C Hamano
2020-09-22  9:17       ` brian m. carlson
2020-09-22 16:27         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).