git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix fsck --name-objects bug
@ 2021-02-10 18:01 Johannes Schindelin via GitGitGadget
  2021-02-10 18:01 ` [PATCH 1/2] t1450: robustify `remove_object()` Johannes Schindelin via GitGitGadget
  2021-02-10 18:01 ` [PATCH 2/2] fsck --name-objects: be more careful parsing generation numbers Johannes Schindelin via GitGitGadget
  0 siblings, 2 replies; 7+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2021-02-10 18:01 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

As described in https://github.com/gitgitgadget/git/pull/873, I managed to
corrupt my Git checkout in a rather thorough manner yesterday, and it took
me a long time to undo the damage. One of my tools for that should have been
git fsck --name-objects, but that command produced bogus output: It said
that a commit with the name ~2 had a missing tree, but ~2 is not even a
legal rev name.

Turns out that this is an ancient bug, and the fact that nobody complained
about it suggests to me that the --name-objects probably has exactly one
user, and he uses it only every four years, when he manages to hose his Git
checkout.

Johannes Schindelin (2):
  t1450: robustify `remove_object()`
  fsck --name-objects: be more careful parsing generation numbers

 fsck.c          |  5 +++++
 t/t1450-fsck.sh | 26 ++++++++++++--------------
 2 files changed, 17 insertions(+), 14 deletions(-)


base-commit: 7397ca33730626f682845f8691b39c305535611e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-874%2Fdscho%2Ffix-fsck-name-objects-generation-parsing-bug-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-874/dscho/fix-fsck-name-objects-generation-parsing-bug-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/874
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] t1450: robustify `remove_object()`
  2021-02-10 18:01 [PATCH 0/2] Fix fsck --name-objects bug Johannes Schindelin via GitGitGadget
@ 2021-02-10 18:01 ` Johannes Schindelin via GitGitGadget
  2021-02-10 20:36   ` Junio C Hamano
  2021-02-10 18:01 ` [PATCH 2/2] fsck --name-objects: be more careful parsing generation numbers Johannes Schindelin via GitGitGadget
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2021-02-10 18:01 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This function can be simplified by using the `test_oid_to_path()`
helper, which incidentally also makes it more robust by not relying on
the exact file system layout of the loose object files.

While at it, do not define those functions in a test case, it buys us
nothing.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1450-fsck.sh | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 02478bc4ece2..779f700ac4a0 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -41,17 +41,13 @@ test_expect_success 'HEAD is part of refs, valid objects appear valid' '
 # specific corruption you test afterwards, lest a later test trip over
 # it.
 
-test_expect_success 'setup: helpers for corruption tests' '
-	sha1_file() {
-		remainder=${1#??} &&
-		firsttwo=${1%$remainder} &&
-		echo ".git/objects/$firsttwo/$remainder"
-	} &&
+sha1_file () {
+	git rev-parse --git-path objects/$(test_oid_to_path "$1")
+}
 
-	remove_object() {
-		rm "$(sha1_file "$1")"
-	}
-'
+remove_object() {
+	rm "$(sha1_file "$1")"
+}
 
 test_expect_success 'object with bad sha1' '
 	sha=$(echo blob | git hash-object -w --stdin) &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] fsck --name-objects: be more careful parsing generation numbers
  2021-02-10 18:01 [PATCH 0/2] Fix fsck --name-objects bug Johannes Schindelin via GitGitGadget
  2021-02-10 18:01 ` [PATCH 1/2] t1450: robustify `remove_object()` Johannes Schindelin via GitGitGadget
@ 2021-02-10 18:01 ` Johannes Schindelin via GitGitGadget
  2021-02-10 20:38   ` Junio C Hamano
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2021-02-10 18:01 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In 7b35efd734e (fsck_walk(): optionally name objects on the go,
2016-07-17), the `fsck` machinery learned to optionally name the
objects, so that it is easier to see what part of the repository is in a
bad shape, say, when objects are missing.

To save on complexity, this machinery uses a parser to determine the
name of a parent given a commit's name: any `~<n>` suffix is parsed and
the parent's name is formed from the prefix together with `~<n+1>`.

However, this parser has a bug: if it finds a suffix `<n>` that is _not_
`~<n>`, it will mistake the empty string for the prefix and `<n>` for
the generation number. In other words, it will generate a name of the
form `~<bogus-number>`.

Let's fix this.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsck.c          |  5 +++++
 t/t1450-fsck.sh | 10 ++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fsck.c b/fsck.c
index 73f30773f28a..83d727c6fe33 100644
--- a/fsck.c
+++ b/fsck.c
@@ -461,6 +461,11 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
 				generation += power * (name[--len] - '0');
 			if (power > 1 && len && name[len - 1] == '~')
 				name_prefix_len = len - 1;
+			else {
+				/* Maybe a non-first parent, e.g. HEAD^2 */
+				generation = 0;
+				name_prefix_len = len;
+			}
 		}
 	}
 
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 779f700ac4a0..bfa3588f37ab 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -607,13 +607,15 @@ test_expect_success 'fsck --name-objects' '
 	git init name-objects &&
 	(
 		cd name-objects &&
+		git config core.logAllRefUpdates false &&
 		test_commit julius caesar.t &&
-		test_commit augustus &&
-		test_commit caesar &&
+		test_commit augustus44 &&
+		test_commit caesar  &&
 		remove_object $(git rev-parse julius:caesar.t) &&
-		test_must_fail git fsck --name-objects >out &&
 		tree=$(git rev-parse --verify julius:) &&
-		test_i18ngrep "$tree (refs/tags/julius:" out
+		git tag -d julius &&
+		test_must_fail git fsck --name-objects >out &&
+		test_i18ngrep "$tree (refs/tags/augustus44\\^:" out
 	)
 '
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] t1450: robustify `remove_object()`
  2021-02-10 18:01 ` [PATCH 1/2] t1450: robustify `remove_object()` Johannes Schindelin via GitGitGadget
@ 2021-02-10 20:36   ` Junio C Hamano
  2021-02-10 23:20     ` Taylor Blau
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2021-02-10 20:36 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> -test_expect_success 'setup: helpers for corruption tests' '
> -	sha1_file() {
> -		remainder=${1#??} &&
> -		firsttwo=${1%$remainder} &&
> -		echo ".git/objects/$firsttwo/$remainder"
> -	} &&
> +sha1_file () {
> +	git rev-parse --git-path objects/$(test_oid_to_path "$1")
> +}

Yeah, back when 90cf590f (fsck: optionally show more helpful info
for broken links, 2016-07-17) originally introduced this pattern,
we didn't have nicely abstracted helper, but now we do, and there
is no reason not to use it.  Nice.

> -	remove_object() {
> -		rm "$(sha1_file "$1")"
> -	}
> -'
> +remove_object() {

Just like you did for the other one, let's insert SP before () for
consistency here.

> +	rm "$(sha1_file "$1")"
> +}
>  
>  test_expect_success 'object with bad sha1' '
>  	sha=$(echo blob | git hash-object -w --stdin) &&

Nicely done.  

Reviewed-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] fsck --name-objects: be more careful parsing generation numbers
  2021-02-10 18:01 ` [PATCH 2/2] fsck --name-objects: be more careful parsing generation numbers Johannes Schindelin via GitGitGadget
@ 2021-02-10 20:38   ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2021-02-10 20:38 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In 7b35efd734e (fsck_walk(): optionally name objects on the go,
> 2016-07-17), the `fsck` machinery learned to optionally name the
> objects, so that it is easier to see what part of the repository is in a
> bad shape, say, when objects are missing.
>
> To save on complexity, this machinery uses a parser to determine the
> name of a parent given a commit's name: any `~<n>` suffix is parsed and
> the parent's name is formed from the prefix together with `~<n+1>`.
>
> However, this parser has a bug: if it finds a suffix `<n>` that is _not_
> `~<n>`, it will mistake the empty string for the prefix and `<n>` for
> the generation number. In other words, it will generate a name of the
> form `~<bogus-number>`.
>
> Let's fix this.

Thanks; will queue.

>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  fsck.c          |  5 +++++
>  t/t1450-fsck.sh | 10 ++++++----
>  2 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/fsck.c b/fsck.c
> index 73f30773f28a..83d727c6fe33 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -461,6 +461,11 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
>  				generation += power * (name[--len] - '0');
>  			if (power > 1 && len && name[len - 1] == '~')
>  				name_prefix_len = len - 1;
> +			else {
> +				/* Maybe a non-first parent, e.g. HEAD^2 */
> +				generation = 0;
> +				name_prefix_len = len;
> +			}
>  		}
>  	}
>  
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 779f700ac4a0..bfa3588f37ab 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -607,13 +607,15 @@ test_expect_success 'fsck --name-objects' '
>  	git init name-objects &&
>  	(
>  		cd name-objects &&
> +		git config core.logAllRefUpdates false &&
>  		test_commit julius caesar.t &&
> -		test_commit augustus &&
> -		test_commit caesar &&
> +		test_commit augustus44 &&
> +		test_commit caesar  &&
>  		remove_object $(git rev-parse julius:caesar.t) &&
> -		test_must_fail git fsck --name-objects >out &&
>  		tree=$(git rev-parse --verify julius:) &&
> -		test_i18ngrep "$tree (refs/tags/julius:" out
> +		git tag -d julius &&
> +		test_must_fail git fsck --name-objects >out &&
> +		test_i18ngrep "$tree (refs/tags/augustus44\\^:" out
>  	)
>  '

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] t1450: robustify `remove_object()`
  2021-02-10 20:36   ` Junio C Hamano
@ 2021-02-10 23:20     ` Taylor Blau
  2021-02-11  0:10       ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Taylor Blau @ 2021-02-10 23:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

On Wed, Feb 10, 2021 at 12:36:19PM -0800, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > -test_expect_success 'setup: helpers for corruption tests' '
> > -	sha1_file() {
> > -		remainder=${1#??} &&
> > -		firsttwo=${1%$remainder} &&
> > -		echo ".git/objects/$firsttwo/$remainder"
> > -	} &&
> > +sha1_file () {
> > +	git rev-parse --git-path objects/$(test_oid_to_path "$1")
> > +}
>
> Yeah, back when 90cf590f (fsck: optionally show more helpful info
> for broken links, 2016-07-17) originally introduced this pattern,
> we didn't have nicely abstracted helper, but now we do, and there
> is no reason not to use it.  Nice.

This has nothing to do with this series, but I do notice a number of
other uses of test_oid_to_path that are doing this exact thing. In fact,
many of them don't use "git rev-parse --git-path", which would be
better.

I wonder if it's worth a clean-up on top to consolidate all of those
"combine the loose object path of the object with xyz OID and the
$GIT_DIR/objects directory".

In either case -- and I think I'm pretty clearly being pedantic at this
point -- do you suppose it's worthwhile to rename sha1_file to something
that doesn't have sha1 in it?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] t1450: robustify `remove_object()`
  2021-02-10 23:20     ` Taylor Blau
@ 2021-02-11  0:10       ` Junio C Hamano
  0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2021-02-11  0:10 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Johannes Schindelin via GitGitGadget, git, Johannes Schindelin

Taylor Blau <me@ttaylorr.com> writes:

> On Wed, Feb 10, 2021 at 12:36:19PM -0800, Junio C Hamano wrote:
>> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
>> writes:
>>
>> > -test_expect_success 'setup: helpers for corruption tests' '
>> > -	sha1_file() {
>> > -		remainder=${1#??} &&
>> > -		firsttwo=${1%$remainder} &&
>> > -		echo ".git/objects/$firsttwo/$remainder"
>> > -	} &&
>> > +sha1_file () {
>> > +	git rev-parse --git-path objects/$(test_oid_to_path "$1")
>> > +}
>>
>> Yeah, back when 90cf590f (fsck: optionally show more helpful info
>> for broken links, 2016-07-17) originally introduced this pattern,
>> we didn't have nicely abstracted helper, but now we do, and there
>> is no reason not to use it.  Nice.
>
> This has nothing to do with this series, but I do notice a number of
> other uses of test_oid_to_path that are doing this exact thing. In fact,
> many of them don't use "git rev-parse --git-path", which would be
> better.
>
> I wonder if it's worth a clean-up on top to consolidate all of those
> "combine the loose object path of the object with xyz OID and the
> $GIT_DIR/objects directory".
>
> In either case -- and I think I'm pretty clearly being pedantic at this
> point -- do you suppose it's worthwhile to rename sha1_file to something
> that doesn't have sha1 in it?

Possibly.  That is probably outside the scope of this topic, but we
see such SHA -> HASH clean-up patches in different places, and this
certainly is a fair game for such a clean-up, I would think.

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-02-11  0:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-10 18:01 [PATCH 0/2] Fix fsck --name-objects bug Johannes Schindelin via GitGitGadget
2021-02-10 18:01 ` [PATCH 1/2] t1450: robustify `remove_object()` Johannes Schindelin via GitGitGadget
2021-02-10 20:36   ` Junio C Hamano
2021-02-10 23:20     ` Taylor Blau
2021-02-11  0:10       ` Junio C Hamano
2021-02-10 18:01 ` [PATCH 2/2] fsck --name-objects: be more careful parsing generation numbers Johannes Schindelin via GitGitGadget
2021-02-10 20:38   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).