* [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57 ` Ævar Arnfjörð Bjarmason
2021-09-16 19:40 ` Taylor Blau
2021-09-07 10:57 ` [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
` (22 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.
We should instead simply use something like this test_create_repo
pattern. It's both less verbose and makes things easier to debug, as a
failing test can have its state left behind under -d without
damaging the state for other tests.
But let's punt on that general refactoring and just change this one
test, since I'm going to change it further in subsequent commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 34 ++++++++++++++++------------------
1 file changed, 16 insertions(+), 18 deletions(-)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..7becab5ba1e 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
rm "$(sha1_file "$1")"
}
-test_expect_success 'object with bad sha1' '
- sha=$(echo blob | git hash-object -w --stdin) &&
- old=$(test_oid_to_path "$sha") &&
- new=$(dirname $old)/$(test_oid ff_2) &&
- sha="$(dirname $new)$(basename $new)" &&
- mv .git/objects/$old .git/objects/$new &&
- test_when_finished "remove_object $sha" &&
- git update-index --add --cacheinfo 100644 $sha foo &&
- test_when_finished "git read-tree -u --reset HEAD" &&
- tree=$(git write-tree) &&
- test_when_finished "remove_object $tree" &&
- cmt=$(echo bogus | git commit-tree $tree) &&
- test_when_finished "remove_object $cmt" &&
- git update-ref refs/heads/bogus $cmt &&
- test_when_finished "git update-ref -d refs/heads/bogus" &&
-
- test_must_fail git fsck 2>out &&
- test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+ git init --bare hash-mismatch &&
+ (
+ cd hash-mismatch &&
+ oid=$(echo blob | git hash-object -w --stdin) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
+ test_must_fail git fsck 2>out &&
+ test_i18ngrep "$oid.*corrupt" out
+ )
'
test_expect_success 'branch pointing to non-commit' '
--
2.33.0.815.g21c7aaf6073
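The new test manufactures its hash mismatch from the loose-object layout, in which an object ID is split into a two-hex-digit directory name and a file name. Below is a minimal POSIX shell sketch of that path arithmetic, runnable without git. `oid_to_path` and `path_to_oid` are hypothetical stand-ins for `test_oid_to_path` and for the inline `$(dirname $new)$(basename $new)` step, and the all-"f" file name is an assumed stand-in for whatever `test_oid ff_2` returns.

```shell
#!/bin/sh
# Hypothetical stand-in for test_oid_to_path: a loose object is stored
# at objects/<first two hex digits>/<remaining hex digits>.
oid_to_path () {
	printf '%s/%s\n' "$(printf '%s' "$1" | cut -c1-2)" \
		"$(printf '%s' "$1" | cut -c3-)"
}

# Reverse mapping, as done inline in the test: glue the directory and
# file name back together to get the object ID the path claims.
path_to_oid () {
	printf '%s%s\n' "$(dirname "$1")" "$(basename "$1")"
}

# SHA-1 of the empty blob, used only as sample input.
oid=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
old=$(oid_to_path "$oid")

# Keep the directory but replace the file name with all "f"s (assumed
# stand-in for $(test_oid ff_2)). The file's contents would still hash
# to $oid, but its path now claims a different ID.
new=$(dirname "$old")/$(basename "$old" | sed 's/./f/g')
claimed=$(path_to_oid "$new")

echo "$old"     # e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
echo "$claimed" # e6 followed by 38 "f"s
```

Moving the object file from `$old` to `$new`, as the test's `mv` does, leaves its contents hashing to the original ID while its path claims `$claimed`; that disagreement is what the test expects `git fsck` to fail on.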
* Re: [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo
2021-09-07 10:57 ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-16 19:40 ` Taylor Blau
2021-09-17 9:27 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 19:40 UTC
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:57:56PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Refactor one of the fsck tests to use a throwaway repository. It's a
> pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
> teardown of a test so we're not leaving corrupt content for the next
> test.
OK. I seem to recall you advocating against this pattern elsewhere[1], but
this is a good example of why it can sometimes make tests much easier to
write: you don't have to reason about what leaks out of running a test.
[1]: https://lore.kernel.org/git/87zgsnj0q0.fsf@evledraar.gmail.com/,
although after re-reading it, it looks like you were more focused on the
unnecessary "rm -fr repo" there and not the "git init +
test_when_finished rm -fr" pattern.
> -test_expect_success 'object with bad sha1' '
> - sha=$(echo blob | git hash-object -w --stdin) &&
> - old=$(test_oid_to_path "$sha") &&
> - new=$(dirname $old)/$(test_oid ff_2) &&
> - sha="$(dirname $new)$(basename $new)" &&
> - mv .git/objects/$old .git/objects/$new &&
> - test_when_finished "remove_object $sha" &&
> - git update-index --add --cacheinfo 100644 $sha foo &&
> - test_when_finished "git read-tree -u --reset HEAD" &&
> - tree=$(git write-tree) &&
> - test_when_finished "remove_object $tree" &&
> - cmt=$(echo bogus | git commit-tree $tree) &&
> - test_when_finished "remove_object $cmt" &&
> - git update-ref refs/heads/bogus $cmt &&
> - test_when_finished "git update-ref -d refs/heads/bogus" &&
> -
> - test_must_fail git fsck 2>out &&
> - test_i18ngrep "$sha.*corrupt" out
> +test_expect_success 'object with hash mismatch' '
> + git init --bare hash-mismatch &&
> + (
> + cd hash-mismatch &&
> + oid=$(echo blob | git hash-object -w --stdin) &&
> + old=$(test_oid_to_path "$oid") &&
> + new=$(dirname $old)/$(test_oid ff_2) &&
> + oid="$(dirname $new)$(basename $new)" &&
> + mv objects/$old objects/$new &&
> + git update-index --add --cacheinfo 100644 $oid foo &&
> + tree=$(git write-tree) &&
> + cmt=$(echo bogus | git commit-tree $tree) &&
> + git update-ref refs/heads/bogus $cmt &&
> + test_must_fail git fsck 2>out &&
> + test_i18ngrep "$oid.*corrupt" out
> + )
> '
This all looks fine to me. The translation is s/sha/oid/ plus removing all
of the now-unnecessary test_when_finished calls.
But the test_i18ngrep (which isn't new) could probably also stand to get
cleaned up and converted to a normal grep.
Thanks,
Taylor
* Re: [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo
2021-09-16 19:40 ` Taylor Blau
@ 2021-09-17 9:27 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-17 9:27 UTC
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Thu, Sep 16 2021, Taylor Blau wrote:
> On Tue, Sep 07, 2021 at 12:57:56PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Refactor one of the fsck tests to use a throwaway repository. It's a
>> pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
>> teardown of a test so we're not leaving corrupt content for the next
>> test.
>
> OK. I seem to recall you advocating against this pattern elsewhere[1], but
> this is a good example of why it can sometimes make tests much easier to
> write: you don't have to reason about what leaks out of running a test.
>
> [1]: https://lore.kernel.org/git/87zgsnj0q0.fsf@evledraar.gmail.com/,
> although after re-reading it, it looks like you were more focused on the
> unnecessary "rm -fr repo" there and not the "git init +
> test_when_finished rm -fr" pattern.
I was referring to a different pattern there; I replied in some detail at
https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/
>> -test_expect_success 'object with bad sha1' '
>> - sha=$(echo blob | git hash-object -w --stdin) &&
>> - old=$(test_oid_to_path "$sha") &&
>> - new=$(dirname $old)/$(test_oid ff_2) &&
>> - sha="$(dirname $new)$(basename $new)" &&
>> - mv .git/objects/$old .git/objects/$new &&
>> - test_when_finished "remove_object $sha" &&
>> - git update-index --add --cacheinfo 100644 $sha foo &&
>> - test_when_finished "git read-tree -u --reset HEAD" &&
>> - tree=$(git write-tree) &&
>> - test_when_finished "remove_object $tree" &&
>> - cmt=$(echo bogus | git commit-tree $tree) &&
>> - test_when_finished "remove_object $cmt" &&
>> - git update-ref refs/heads/bogus $cmt &&
>> - test_when_finished "git update-ref -d refs/heads/bogus" &&
>> -
>> - test_must_fail git fsck 2>out &&
>> - test_i18ngrep "$sha.*corrupt" out
>> +test_expect_success 'object with hash mismatch' '
>> + git init --bare hash-mismatch &&
>> + (
>> + cd hash-mismatch &&
>> + oid=$(echo blob | git hash-object -w --stdin) &&
>> + old=$(test_oid_to_path "$oid") &&
>> + new=$(dirname $old)/$(test_oid ff_2) &&
>> + oid="$(dirname $new)$(basename $new)" &&
>> + mv objects/$old objects/$new &&
>> + git update-index --add --cacheinfo 100644 $oid foo &&
>> + tree=$(git write-tree) &&
>> + cmt=$(echo bogus | git commit-tree $tree) &&
>> + git update-ref refs/heads/bogus $cmt &&
>> + test_must_fail git fsck 2>out &&
>> + test_i18ngrep "$oid.*corrupt" out
>> + )
>> '
>
> This all looks fine to me. The translation is s/sha/oid/ plus removing all
> of the now-unnecessary test_when_finished calls.
>
> But the test_i18ngrep (which isn't new) could probably also stand to get
> cleaned up and converted to a normal grep.
Thanks, I missed that one!
* [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
2021-09-07 10:57 ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57 ` Ævar Arnfjörð Bjarmason
2021-09-16 19:51 ` Taylor Blau
2021-09-07 10:57 ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
` (21 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Fix a blindspot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.
This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 7becab5ba1e..f10d6f7b7e8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
test_i18ngrep "bad index file" errors
'
+test_expect_success 'fsck hard errors on an invalid object type' '
+ git init --bare garbage-type &&
+ empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+ garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+ test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
+ test_cmp err.expect err.actual &&
+ test_must_be_empty out.actual
+'
+
test_done
--
2.33.0.815.g21c7aaf6073
* Re: [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type
2021-09-07 10:57 ` [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-16 19:51 ` Taylor Blau
2021-09-17 9:39 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 19:51 UTC
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:57:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a blindspot in the fsck tests by checking what we do when we
> encounter an unknown "garbage" type produced with hash-object's
> --literally option.
>
> This behavior needs to be improved, which'll be done in subsequent
> patches, but for now let's test for the current behavior.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> t/t1450-fsck.sh | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 7becab5ba1e..f10d6f7b7e8 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
> test_i18ngrep "bad index file" errors
> '
>
> +test_expect_success 'fsck hard errors on an invalid object type' '
> + git init --bare garbage-type &&
I wondered whether it was really possible to not cover this, since I
figured such a test may have just been hiding elsewhere. But we really
do seem to be lacking coverage. So, adding this test is good.
> + empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
> + garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
I'm nitpicking, but I find the -C garbage-type pattern less than ideal
for two reasons:
- It makes every line longer (since "-C garbage-type" is wider than an
8-wide tab, even indenting this in a subshell would take up fewer
characters visually)
- It pollutes the current directory with things like "err.expect" and
"err.actual" that have nothing to do with the current directory (and
much more to do with the garbage-type repository within it).
So I don't care, really, but it may be better to just put all of this in
a subshell.
Thanks,
Taylor
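The second point above comes down to where redirections resolve: they are opened by the calling shell, so with `git -C` they land in the current directory, while inside a `(cd ... && ...)` subshell they land next to the repository. A minimal POSIX shell sketch, runnable without git; `run_in` is a hypothetical helper that only mimics `git -C`, and `pwd` stands in for a real git command.

```shell
#!/bin/sh
# Hypothetical helper mimicking "git -C DIR ...": the command runs inside
# DIR, but any redirection is opened by the calling shell beforehand, so
# output files land in the caller's directory.
run_in () {
	(cd "$1" && shift && "$@")
}

work=$(mktemp -d) &&
cd "$work" &&
mkdir garbage-type &&

# "-C" style: out.actual is created in the top-level directory.
run_in garbage-type pwd >out.actual &&

# Subshell style: the redirection is opened after the cd, so this
# out.actual is created inside garbage-type/.
(
	cd garbage-type &&
	pwd >out.actual
) &&

ls out.actual garbage-type/out.actual
```

Both `out.actual` files exist afterwards: the one from `run_in` in the top-level directory, the one from the subshell inside `garbage-type/`. Keeping the files next to the repository avoids the pollution, at the cost of no longer being able to copy a failing command and re-run it verbatim from the top-level directory.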
* Re: [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type
2021-09-16 19:51 ` Taylor Blau
@ 2021-09-17 9:39 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-17 9:39 UTC
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Thu, Sep 16 2021, Taylor Blau wrote:
> On Tue, Sep 07, 2021 at 12:57:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Fix a blindspot in the fsck tests by checking what we do when we
>> encounter an unknown "garbage" type produced with hash-object's
>> --literally option.
>>
>> This behavior needs to be improved, which'll be done in subsequent
>> patches, but for now let's test for the current behavior.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>> t/t1450-fsck.sh | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
>> index 7becab5ba1e..f10d6f7b7e8 100755
>> --- a/t/t1450-fsck.sh
>> +++ b/t/t1450-fsck.sh
>> @@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
>> test_i18ngrep "bad index file" errors
>> '
>>
>> +test_expect_success 'fsck hard errors on an invalid object type' '
>> + git init --bare garbage-type &&
>
> I wondered whether it was really possible to not cover this, since I
> figured such a test may have just been hiding elsewhere. But we really
> do seem to be lacking coverage. So, adding this test is good.
>
>> + empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
>> + garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
>
> I'm nitpicking, but I find the -C garbage-type pattern less than ideal
> for two reasons:
>
> - It makes every line longer (since "-C garbage-type" is wider than an
> 8-wide tab, even indenting this in a subshell would take up fewer
> characters visually)
>
> - It pollutes the current directory with things like "err.expect" and
> "err.actual" that have nothing to do with the current directory (and
> much more to do with the garbage-type repository within it).
>
> So I don't care, really, but it may be better to just put all of this in
> a subshell.
Yes, it does look much nicer like that. Thanks!
I have informed/strong opinions about some aspects of the style I use,
like the teardown/setup pattern noted in [1], but for other stuff like
this I think I was just following the pattern of some recent test I'd
read.
Well, one advantage of using "git -C" is that if it fails, you can cd
to the trash directory and run the command you saw fail as-is without
cd-ing further. In that case the "polluting" is a feature: you can
consistently cat the top-level expect/actual files.
But I think on balance having the test itself be easier to read is more
important, so I'm going with the subshell.
1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/
* [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
2021-09-07 10:57 ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
2021-09-07 10:57 ` [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57 ` Ævar Arnfjörð Bjarmason
2021-09-16 19:57 ` Taylor Blau
2021-09-07 10:57 ` [PATCH v6 04/22] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
` (20 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Test for what happens when the -t and -s flags are asked to operate on
a missing object. This extends tests added in 3e370f9faf0 (t1006: add
tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
-s flags are the only ones that can be combined with
--allow-unknown-type, so let's test with and without that flag.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..3a7b138fe4e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,6 +315,33 @@ test_expect_success '%(deltabase) reports packed delta bases' '
}
'
+missing_oid=$(test_oid deadbeef)
+test_expect_success 'error on type of missing object' '
+ cat >expect.err <<-\EOF &&
+ fatal: git cat-file: could not get object info
+ EOF
+ test_must_fail git cat-file -t $missing_oid >out 2>err &&
+ test_must_be_empty out &&
+ test_cmp expect.err err &&
+
+ test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
+ test_must_be_empty out &&
+ test_cmp expect.err err
+'
+
+test_expect_success 'error on size of missing object' '
+ cat >expect.err <<-\EOF &&
+ fatal: git cat-file: could not get object info
+ EOF
+ test_must_fail git cat-file -s $missing_oid >out 2>err &&
+ test_must_be_empty out &&
+ test_cmp expect.err err &&
+
+ test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
+ test_must_be_empty out &&
+ test_cmp expect.err err
+'
+
bogus_type="bogus"
bogus_content="bogus"
bogus_size=$(strlen "$bogus_content")
--
2.33.0.815.g21c7aaf6073
* Re: [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
2021-09-07 10:57 ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-09-16 19:57 ` Taylor Blau
2021-09-16 20:01 ` Taylor Blau
2021-09-16 22:52 ` Ævar Arnfjörð Bjarmason
0 siblings, 2 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 19:57 UTC
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:57:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Test for what happens when the -t and -s flags are asked to operate on
> a missing object. This extends tests added in 3e370f9faf0 (t1006: add
> tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
> -s flags are the only ones that can be combined with
> --allow-unknown-type, so let's test with and without that flag.
I'm a little skeptical of having tests for all four combinations of `-t`
or `-s`, with and without `--allow-unknown-type`.
Testing both the presence and absence of `--allow-unknown-type` seems
useful to me, but I'm not sure what testing `-t` and `-s` separately
buys us.
(If you really feel the need to test both, I'd encourage looping like:
for arg in -t -s
do
test_must_fail git cat-file $arg $missing_oid >out 2>err &&
test_must_be_empty out &&
test_cmp expect.err err &&
test_must_fail git cat-file $arg --allow-unknown-type $missing_oid >out 2>err &&
test_must_be_empty out &&
test_cmp expect.err err
done &&
but I would be equally or perhaps even happier to just have one of the
two tests).
Thanks,
Taylor
* Re: [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
2021-09-16 19:57 ` Taylor Blau
@ 2021-09-16 20:01 ` Taylor Blau
2021-09-16 22:52 ` Ævar Arnfjörð Bjarmason
1 sibling, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 20:01 UTC
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Thu, Sep 16, 2021 at 03:57:30PM -0400, Taylor Blau wrote:
> On Tue, Sep 07, 2021 at 12:57:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > Test for what happens when the -t and -s flags are asked to operate on
> > a missing object. This extends tests added in 3e370f9faf0 (t1006: add
> > tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
> > -s flags are the only ones that can be combined with
> > --allow-unknown-type, so let's test with and without that flag.
>
> I'm a little skeptical of having tests for all four combinations of `-t`
> or `-s`, with and without `--allow-unknown-type`.
Ah. Reading the next patch makes me feel even more certain of this
advice. Consider squashing this and the next patch with my suggestion
to use a loop below?
Thanks,
Taylor
* Re: [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
2021-09-16 19:57 ` Taylor Blau
2021-09-16 20:01 ` Taylor Blau
@ 2021-09-16 22:52 ` Ævar Arnfjörð Bjarmason
1 sibling, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-16 22:52 UTC
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Thu, Sep 16 2021, Taylor Blau wrote:
> On Tue, Sep 07, 2021 at 12:57:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Test for what happens when the -t and -s flags are asked to operate on
>> a missing object. This extends tests added in 3e370f9faf0 (t1006: add
>> tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
>> -s flags are the only ones that can be combined with
>> --allow-unknown-type, so let's test with and without that flag.
>
> I'm a little skeptical of having tests for all four combinations of `-t`
> or `-s`, with and without `--allow-unknown-type`.
>
> Testing both the presence and absence of `--allow-unknown-type` seems
> useful to me, but I'm not sure what testing `-t` and `-s` separately
> buys us.
>
> (If you really feel the need to test both, I'd encourage looping like:
Thanks, I'll try to simplify it.
> for arg in -t -s
> do
> test_must_fail git cat-file $arg $missing_oid >out 2>err &&
> test_must_be_empty out &&
> test_cmp expect.err err &&
>
> test_must_fail git cat-file $arg --allow-unknown-type $missing_oid >out 2>err &&
> test_must_be_empty out &&
> test_cmp expect.err err
> done &&
>
> but I would be equally or perhaps even happier to just have one of the
> two tests).
As written, a loop like that is equivalent to just this (i.e. inlining
arg=-s):
test_must_fail git cat-file -s $missing_oid >out 2>err &&
test_must_be_empty out &&
test_cmp expect.err err &&
test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
test_must_be_empty out &&
test_cmp expect.err err
:)
I.e. unless you end &&-chains in loops in the test framework with an
"|| return 1", you're only testing the last iteration. Aside from
whatever I'm doing here, I generally prefer to either just spell it out
twice (if small enough), or:
for arg in -t -s
do
test_expect_success '...' "[... use $arg ...]"
done
Both of those nicely avoid that easy-to-make mistake.
We've got some in-tree tests that are broken this way, at least the one
added in 4cf67869b2a (list-objects.c: don't segfault for missing cmdline
objects, 2018-12-05). But I think I'll leave that for a #leftoverbits
submission given my outstanding patch queue... oh, there's another one
in t1010-mktree.sh... :)
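The &&-chain pitfall described above is easy to reproduce in plain POSIX shell: a `for` loop's exit status is that of the last command it ran, so a failing assertion in an earlier iteration is silently discarded. A minimal sketch follows; `check_without_return` and `check_with_return` are hypothetical helper names, and `test "$arg" = pass` stands in for a real assertion such as `test_cmp`.

```shell
#!/bin/sh
# Hypothetical demo of the &&-chain-in-a-loop pitfall; "test" stands in
# for a real assertion such as test_cmp.

# Buggy: the loop's exit status is that of the last iteration only,
# so the failing first iteration ("fail") is silently ignored.
check_without_return () {
	for arg in fail pass
	do
		test "$arg" = pass
	done
}

# Fixed: "|| return 1" aborts the function as soon as any iteration fails.
check_with_return () {
	for arg in fail pass
	do
		test "$arg" = pass || return 1
	done
}

if check_without_return
then
	echo "without || return 1: passed (failure masked)"
fi

if ! check_with_return
then
	echo "with || return 1: failed (failure caught)"
fi
```

When such a loop runs inside a `( ... )` subshell rather than a function, `|| exit 1` does the same job.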
* [PATCH v6 04/22] cat-file tests: test that --allow-unknown-type isn't on by default
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (2 preceding siblings ...)
2021-09-07 10:57 ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:58 ` [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
` (19 subsequent siblings)
23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Fix a blindspot in the tests for the --allow-unknown-type feature
added in 39e4ae38804 (cat-file: teach cat-file a
'--allow-unknown-type' option, 2015-05-03). We should check that
--allow-unknown-type isn't on by default.
Before this change, all the tests would succeed if --allow-unknown-type
were on by default. Let's fix that by asserting that -t and -s die on a
"garbage" type without --allow-unknown-type.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 3a7b138fe4e..5e05ea0861e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -347,6 +347,20 @@ bogus_content="bogus"
bogus_size=$(strlen "$bogus_content")
bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+
+ test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+ test_cmp err.expect err.actual &&
+ test_must_be_empty out.actual &&
+
+ test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+ test_cmp err.expect err.actual &&
+ test_must_be_empty out.actual
+'
+
test_expect_success "Type of broken object is correct" '
echo $bogus_type >expect &&
git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -363,6 +377,21 @@ bogus_content="bogus"
bogus_size=$(strlen "$bogus_content")
bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
+ cat >err.expect <<-EOF &&
+ error: unable to unpack $bogus_sha1 header
+ fatal: git cat-file: could not get object info
+ EOF
+
+ test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+ test_cmp err.expect err.actual &&
+ test_must_be_empty out.actual &&
+
+ test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+ test_cmp err.expect err.actual &&
+ test_must_be_empty out.actual
+'
+
test_expect_success "Type of broken object is correct when type is large" '
echo $bogus_type >expect &&
git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
--
2.33.0.815.g21c7aaf6073
* [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (3 preceding siblings ...)
2021-09-07 10:57 ` [PATCH v6 04/22] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-16 20:40 ` Taylor Blau
2021-09-07 10:58 ` [PATCH v6 06/22] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
` (18 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Fix a blindspot in the tests for the "rev-list --disk-usage" feature
added in 16950f8384a (rev-list: add --disk-usage option for
calculating disk usage, 2021-02-09) to test for what happens when it's
asked to calculate the disk usage of invalid object types.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t6115-rev-list-du.sh | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..edb2ed55846 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,15 @@ check_du HEAD
check_du --objects HEAD
check_du --objects HEAD^..HEAD
+test_expect_success 'setup garbage repository' '
+ git clone --bare . garbage.git &&
+ garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
+ git -C garbage.git rev-list --objects --all --disk-usage &&
+
+ # Manually create a ref because "update-ref", "tag" etc. have
+ # no corresponding --literally option.
+ echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
+ test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
+'
+
test_done
--
2.33.0.815.g21c7aaf6073
* Re: [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types
2021-09-07 10:58 ` [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-16 20:40 ` Taylor Blau
2021-09-17 11:59 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 20:40 UTC
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:00PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a blindspot in the tests for the "rev-list --disk-usage" feature
> added in 16950f8384a (rev-list: add --disk-usage option for
> calculating disk usage, 2021-02-09) to test for what happens when it's
> asked to calculate the disk usage of invalid object types.
I'm not sure that I agree this is a blindspot, or at least one worth
testing. Is the goal to add tests to every Git command that might have
to do something with a corrupt object and make sure that it is handled
correctly?
I'm not sure that doing so would be useful, or at the very least that it
would be worth the effort. That's not to say I'm not interested in
having tests fail when we don't handle corrupt objects correctly;
rather, I think there are so many parts of Git that might touch a
corrupt object that trying to test all of them seems like a losing
battle.
Assuming that this is a useful direction, though...
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> t/t6115-rev-list-du.sh | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
> index b4aef32b713..edb2ed55846 100755
> --- a/t/t6115-rev-list-du.sh
> +++ b/t/t6115-rev-list-du.sh
> @@ -48,4 +48,15 @@ check_du HEAD
> check_du --objects HEAD
> check_du --objects HEAD^..HEAD
>
> +test_expect_success 'setup garbage repository' '
> + git clone --bare . garbage.git &&
Since this is cloned within the working directory, should we bother to
clean it up to avoid interfering with future tests?
> + garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
> + git -C garbage.git rev-list --objects --all --disk-usage &&
> +
> + # Manually create a ref because "update-ref", "tag" etc. have
> + # no corresponding --literally option.
> + echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
> + test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
See also my earlier comment about this being much more readable in a
sub-shell.
Thanks,
Taylor
* Re: [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types
2021-09-16 20:40 ` Taylor Blau
@ 2021-09-17 11:59 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-17 11:59 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Thu, Sep 16 2021, Taylor Blau wrote:
> On Tue, Sep 07, 2021 at 12:58:00PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Fix a blindspot in the tests for the "rev-list --disk-usage" feature
>> added in 16950f8384a (rev-list: add --disk-usage option for
>> calculating disk usage, 2021-02-09) to test for what happens when it's
>> asked to calculate the disk usage of invalid object types.
>
> I'm not sure that I agree this is a blindspot, or at least one worth
> testing. Is the goal to add tests to every Git command that might have
> to do something with a corrupt object and make sure that it is handled
> correctly?
>
> I'm not sure that doing so would be useful, or at the very least that
> it would be worth the effort. [...] I think there are so many parts of
> Git that might touch a corrupt object that trying to test all of them
> seems like a losing battle.
I'll drop it, since it doesn't have anything directly to do with this
series. It slipped in from the follow-up work I meant to send after
this series.
This isn't just any random command that might come across an invalid
object, though; it specifically reports object sizes. Once we change
it to not die, we'll want to see how it handles invalid objects. Will
the disk size be reported as -1? 0? ~0?
> [...]
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>> t/t6115-rev-list-du.sh | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
>> index b4aef32b713..edb2ed55846 100755
>> --- a/t/t6115-rev-list-du.sh
>> +++ b/t/t6115-rev-list-du.sh
>> @@ -48,4 +48,15 @@ check_du HEAD
>> check_du --objects HEAD
>> check_du --objects HEAD^..HEAD
>>
>> +test_expect_success 'setup garbage repository' '
>> + git clone --bare . garbage.git &&
>
> Since this is cloned within the working directory, should we bother to
> clean this up to avoid munging with future tests?
In general (and I've said this in some other replies too) I think not:
if an individual test picks a unique name for its data, it doesn't need
to bother with test_when_finished, and can leave the cleanup to the
eventual removal of the trash directory.
>> + garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
>> + git -C garbage.git rev-list --objects --all --disk-usage &&
>> +
>> + # Manually create a ref because "update-ref", "tag" etc. have
>> + # no corresponding --literally option.
>> + echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
>> + test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
>
> See also my earlier comment about this being much more readable in a
> sub-shell.
*nod*
* [PATCH v6 06/22] cat-file tests: add corrupt loose object test
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (4 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:58 ` [PATCH v6 07/22] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
` (17 subsequent siblings)
23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Fix a blindspot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5e05ea0861e..8f3516db188 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -404,6 +404,58 @@ test_expect_success "Size of large broken object is correct when type is large"
test_cmp expect actual
'
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+ git init --bare corrupt-loose.git &&
+ (
+ cd corrupt-loose.git &&
+
+ # Setup and create the empty blob and its path
+ empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+ git hash-object -w --stdin </dev/null &&
+
+ # Create another blob and its path
+ echo other >other.blob &&
+ other_blob=$(git hash-object -w --stdin <other.blob) &&
+ other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+ # Before the swap the size is 0
+ cat >out.expect <<-EOF &&
+ 0
+ EOF
+ git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # Swap the two to corrupt the repository
+ mv -f "$other_path" "$empty_path" &&
+ test_must_fail git fsck 2>err.fsck &&
+ grep "hash mismatch" err.fsck &&
+
+ # confirm that cat-file is reading the new swapped-in
+ # blob...
+ cat >out.expect <<-EOF &&
+ blob
+ EOF
+ git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # ... since it has a different size now.
+ cat >out.expect <<-EOF &&
+ 6
+ EOF
+ git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # So far "cat-file" has been happy to spew the found
+ # content out as-is. Try to make it zlib-invalid.
+ mv -f other.blob "$empty_path" &&
+ test_must_fail git fsck 2>err.fsck &&
+ grep "^error: inflate: data stream error (" err.fsck
+ )
+'
+
# Tests for git cat-file --follow-symlinks
test_expect_success 'prep for symlink tests' '
echo_without_newline "$hello_content" >morx &&
--
2.33.0.815.g21c7aaf6073
* [PATCH v6 07/22] cat-file tests: test for current --allow-unknown-type behavior
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (5 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 06/22] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:58 ` [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
` (16 subsequent siblings)
23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.
1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8f3516db188..98729f1edfc 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -361,6 +361,46 @@ test_expect_success 'die on broken object under -t and -s without --allow-unknow
test_must_be_empty out.actual
'
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+ git cat-file -e $bogus_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+ test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+ test_must_fail git cat-file -p $bogus_sha1 &&
+ test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type "bogus"
+ EOF
+ test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+ echo $bogus_sha1 >bogus-oid &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+
+ test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+ test_cmp err.expect err.actual &&
+
+ test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+ test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+ test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
test_expect_success "Type of broken object is correct" '
echo $bogus_type >expect &&
git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -372,6 +412,27 @@ test_expect_success "Size of broken object is correct" '
git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
test_cmp expect actual
'
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+ cat >expect <<-EOF &&
+ $bogus_type
+ EOF
+ git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ test_cmp expect actual &&
+
+ # Create it manually, as "git replace" will die on bogus
+ # types.
+ head=$(git rev-parse --verify HEAD) &&
+ mkdir -p .git/refs/replace &&
+ echo $head >.git/refs/replace/$bogus_sha1 &&
+
+ cat >expect <<-EOF &&
+ commit
+ EOF
+ git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ test_cmp expect actual
+'
+
bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
bogus_content="bogus"
bogus_size=$(strlen "$bogus_content")
--
2.33.0.815.g21c7aaf6073
* [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (6 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 07/22] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-16 21:29 ` Taylor Blau
2021-09-07 10:58 ` [PATCH v6 09/22] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
` (15 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
When the loose_object_info() function returns an error, stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.
That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.
Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.
Having read the code paths involved carefully, I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.
This might introduce a subtle bug if a caller of
oid_object_info_extended() inspects its "typep" and expects a
meaningful value after the function returned -1.
Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the -1 case is OBJ_BAD.
Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
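A minimal, self-contained sketch of the caller contract this relies on.
All names here (sketch_loose_object_info(), struct sketch_info, the
SKETCH_OBJ_* values) are hypothetical stand-ins, not git's API. After
the change, an error is reported only through the return value, so a
caller has to check that before reading "typep" or "sizep":

```c
#include <assert.h>

/* Hypothetical stand-ins for git's enum object_type and struct object_info. */
enum sketch_type { SKETCH_OBJ_BAD = -1, SKETCH_OBJ_NONE = 0, SKETCH_OBJ_BLOB = 3 };

struct sketch_info {
	enum sketch_type *typep; /* optional out-parameter */
	unsigned long *sizep;    /* optional out-parameter */
};

/*
 * Models loose_object_info() after this patch: on error it returns -1
 * and no longer stores SKETCH_OBJ_BAD through oi->typep.
 */
static int sketch_loose_object_info(int corrupt, struct sketch_info *oi)
{
	if (corrupt)
		return -1;
	if (oi->typep)
		*oi->typep = SKETCH_OBJ_BLOB;
	if (oi->sizep)
		*oi->sizep = 6;
	return 0;
}

/* Safe caller: consult the return value before any out-parameter. */
static int sketch_blob_size(int corrupt, unsigned long *size)
{
	enum sketch_type type = SKETCH_OBJ_NONE;
	struct sketch_info oi = { .typep = &type, .sizep = size };

	if (sketch_loose_object_info(corrupt, &oi) < 0)
		return -1; /* "type" and "*size" carry no meaning here */
	return type == SKETCH_OBJ_BLOB ? 0 : -1;
}
```

In this sketch the error path simply leaves the out-parameters alone; a
real caller should not rely even on that, only on the return value.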
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/object-file.c b/object-file.c
index a8be8994814..bda3497d5ca 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1503,8 +1503,6 @@ static int loose_object_info(struct repository *r,
git_inflate_end(&stream);
munmap(map, mapsize);
- if (status && oi->typep)
- *oi->typep = status;
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
--
2.33.0.815.g21c7aaf6073
* Re: [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero
2021-09-07 10:58 ` [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:29 ` Taylor Blau
2021-09-16 21:56 ` Jeff King
0 siblings, 1 reply; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 21:29 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
> When the loose_object_info() function returns an error stop faking up
> the "oi->typep" to OBJ_BAD. Let the return value of the function
> itself suffice. This code cleanup simplifies subsequent changes.
The obvious danger (which you mention) is that somebody relies on what
typep points to, and reads it even after a non-zero return has
propagated up from this function.
Hopefully nobody is, but this change makes me a little uncomfortable
nonetheless, since there are so many potential callers (even though this
function has only one caller, it doesn't take long before the number of
indirect callers explodes).
So it would be nice if we could do without it, but you claim that it
simplifies changes that happen later on. So let's continue to see if we
really do need it...
Thanks,
Taylor
* Re: [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero
2021-09-16 21:29 ` Taylor Blau
@ 2021-09-16 21:56 ` Jeff King
0 siblings, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-09-16 21:56 UTC (permalink / raw)
To: Taylor Blau
Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
Jonathan Tan, Andrei Rybak
On Thu, Sep 16, 2021 at 05:29:30PM -0400, Taylor Blau wrote:
> On Tue, Sep 07, 2021 at 12:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > When the loose_object_info() function returns an error stop faking up
> > the "oi->typep" to OBJ_BAD. Let the return value of the function
> > itself suffice. This code cleanup simplifies subsequent changes.
>
> The obvious danger (which you mention) is that somebody is relying on
> what typep points to, and is reading it even if we returned non-zero
> from whatever called this function.
>
> Hopefully nobody is, but this change makes me a little uncomfortable
> nonetheless, since there are so many potential callers (even though this
> function has only one caller, it doesn't take long before the number of
> indirect callers explodes).
>
> So it would be nice if we could do without it, but you claim that it
> simplifies changes that happen later on. So let's continue to see if we
> really do need it...
I'm actually reasonably comfortable with this patch. If we return an
error from the *_object_info() functions, then I think all bets are off
on what is in the resulting object_info struct. E.g., we'd already leave
sizep uninitialized in such a case.
It feels like oi->typep may be a little bit special because we conflate
"error" and "type" in the return from oid_object_info(). But
oid_object_info_extended() does not do that, and the innards of
oid_object_info() do the right thing.
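To illustrate that distinction, a hypothetical sketch (none of these
names exist in git): an "extended" lookup that reports errors only via
its return value, and a "simple" wrapper that folds the error into the
returned type, the way oid_object_info() returns OBJ_BAD. The -1 and 3
values mirror git's OBJ_BAD and OBJ_BLOB; everything else is assumed.

```c
#include <assert.h>

/* Hypothetical stand-ins; -1 and 3 mirror git's OBJ_BAD and OBJ_BLOB. */
enum sketch_type { SKETCH_OBJ_BAD = -1, SKETCH_OBJ_BLOB = 3 };

struct sketch_info {
	enum sketch_type *typep;
};

/* "Extended" style: status in the return value, type via oi->typep. */
static int sketch_info_extended(int found, struct sketch_info *oi)
{
	if (!found)
		return -1; /* *oi->typep is left untouched */
	if (oi->typep)
		*oi->typep = SKETCH_OBJ_BLOB;
	return 0;
}

/* "Simple" style: the error is folded into the returned type. */
static enum sketch_type sketch_info_simple(int found)
{
	enum sketch_type type;
	struct sketch_info oi = { .typep = &type };

	if (sketch_info_extended(found, &oi) < 0)
		return SKETCH_OBJ_BAD;
	return type; /* only read after a successful call */
}
```

Only the wrapper needs to produce a "bad" type; the extended call never
has to fake one up.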
Of course we _have_ been setting typep in this way for a while, so it's
worth making sure nobody is depending on it. Notably packed_object_info()
does not behave in this way; if it hits an error, typep may be left
unset. So any oid_object_info_extended() callers depending on this were
already potentially buggy. I'd be OK with a quick sweep of the hits of
"git grep typep" here.
I just did that, and all the sites look pretty reasonable (they call
oid_object_info_extended() and bail as soon as they see that it fails).
-Peff
* [PATCH v6 09/22] cache.h: move object functions to object-store.h
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (7 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-16 21:33 ` Taylor Blau
2021-09-07 10:58 ` [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
` (14 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 10 ----------
object-store.h | 9 +++++++++
2 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/cache.h b/cache.h
index d23de693680..11a04a93436 100644
--- a/cache.h
+++ b/cache.h
@@ -1313,16 +1313,6 @@ char *xdg_cache_home(const char *filename);
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
- void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
extern const signed char hexval_table[256];
static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index d24915ced1b..eb4876ec983 100644
--- a/object-store.h
+++ b/object-store.h
@@ -485,4 +485,13 @@ int for_each_object_in_pack(struct packed_git *p,
int for_each_packed_object(each_packed_object_fn, void *,
enum for_each_object_flags flags);
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+ unsigned long mapsize, void *buffer,
+ unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+ void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
#endif /* OBJECT_STORE_H */
--
2.33.0.815.g21c7aaf6073
* Re: [PATCH v6 09/22] cache.h: move object functions to object-store.h
2021-09-07 10:58 ` [PATCH v6 09/22] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:33 ` Taylor Blau
0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 21:33 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Move the declaration of some ancient object functions added in
> e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
> 2005-06-01) from cache.h to object-store.h. This continues work
> started in cbd53a2193d (object-store: move object access functions to
> object-store.h, 2018-05-15).
This builds with DEVELOPER=1, likely as a result of all of the includes
on object-store.h added in cbd53a2193d.
> diff --git a/cache.h b/cache.h
> index d23de693680..11a04a93436 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1313,16 +1313,6 @@ char *xdg_cache_home(const char *filename);
>
> int git_open_cloexec(const char *name, int flags);
> #define git_open(name) git_open_cloexec(name, O_RDONLY)
> -int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
> -int parse_loose_header(const char *hdr, unsigned long *sizep);
> -
> -int check_object_signature(struct repository *r, const struct object_id *oid,
> - void *buf, unsigned long size, const char *type);
> -
> -int finalize_object_file(const char *tmpfile, const char *filename);
> -
> -/* Helper to check and "touch" a file */
I'm fine to drop this comment, by the way, since it does not add any
explanation to what a function called check_and_freshen_file() might do
;).
> -int check_and_freshen_file(const char *fn, int freshen);
Everything else looks fine, although it's unclear how this is related to
the rest of your series.
Thanks,
Taylor
* [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (8 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 09/22] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-16 21:39 ` Taylor Blau
2021-09-07 10:58 ` [PATCH v6 11/22] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
` (13 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Make the parse_loose_header_extended() function public under the
parse_loose_header() name, and remove the old parse_loose_header()
wrapper. The only direct user of the wrapper outside of object-file.c
itself was in streaming.c; that caller can simply pass the required
"struct object_info *" instead.
This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change, as
we could simply use parse_loose_header_extended() there, but it leaves
the API in a better end state.
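For illustration, a simplified sketch of the new-style signature from a
caller's side. The struct and function names are hypothetical, only
"blob" headers are recognized, and "flags" is accepted but ignored.
Like git's parser, it wants the size in canonical decimal ("06" is
rejected) followed directly by the NUL that ends the header:

```c
#include <assert.h>
#include <string.h>

/* Minimal stand-in: git's struct object_info has many more members. */
struct sketch_object_info {
	unsigned long *sizep;
};

enum { SKETCH_OBJ_BLOB = 3 };

/* Hypothetical, simplified parser with the new-style signature. */
static int sketch_parse_loose_header(const char *hdr,
				     struct sketch_object_info *oi,
				     unsigned int flags)
{
	unsigned long size;

	(void)flags; /* unused in this sketch */
	if (strncmp(hdr, "blob ", 5))
		return -1;
	hdr += 5;

	/* First digit; a non-digit wraps to a huge value and is rejected. */
	size = (unsigned long)(*hdr++ - '0');
	if (size > 9)
		return -1;
	if (size) { /* a leading "0" may not be followed by more digits */
		while (*hdr >= '0' && *hdr <= '9')
			size = size * 10 + (unsigned long)(*hdr++ - '0');
	}
	if (oi->sizep)
		*oi->sizep = size;
	return *hdr ? -1 : SKETCH_OBJ_BLOB;
}

/* What a former "parse_loose_header(hdr, &size)" caller now writes. */
static int sketch_read_header(const char *hdr, unsigned long *size)
{
	struct sketch_object_info oi = { .sizep = size };

	return sketch_parse_loose_header(hdr, &oi, 0);
}
```

The wrapper at the bottom is the same two-line adaptation the patch
makes in streaming.c and read_loose_object().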
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 21 ++++++++-------------
object-store.h | 3 ++-
streaming.c | 5 ++++-
3 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/object-file.c b/object-file.c
index bda3497d5ca..7a47af68bd8 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1363,8 +1363,9 @@ static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr,
+ struct object_info *oi,
+ unsigned int flags)
{
const char *type_buf = hdr;
unsigned long size;
@@ -1424,14 +1425,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
return *hdr ? -1 : type;
}
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
- struct object_info oi = OBJECT_INFO_INIT;
-
- oi.sizep = sizep;
- return parse_loose_header_extended(hdr, &oi, 0);
-}
-
static int loose_object_info(struct repository *r,
const struct object_id *oid,
struct object_info *oi, int flags)
@@ -1486,10 +1479,10 @@ static int loose_object_info(struct repository *r,
if (status < 0)
; /* Do nothing */
else if (hdrbuf.len) {
- if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+ if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
oid_to_hex(oid));
- } else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+ } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
status = error(_("unable to parse %s header"), oid_to_hex(oid));
if (status >= 0 && oi->contentp) {
@@ -2573,6 +2566,8 @@ int read_loose_object(const char *path,
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
+ struct object_info oi = OBJECT_INFO_INIT;
+ oi.sizep = size;
*contents = NULL;
@@ -2587,7 +2582,7 @@ int read_loose_object(const char *path,
goto out;
}
- *type = parse_loose_header(hdr, size);
+ *type = parse_loose_header(hdr, &oi, 0);
if (*type < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index eb4876ec983..25e641a606f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -488,7 +488,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
int unpack_loose_header(git_zstream *stream, unsigned char *map,
unsigned long mapsize, void *buffer,
unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags);
int check_object_signature(struct repository *r, const struct object_id *oid,
void *buf, unsigned long size, const char *type);
int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
const struct object_id *oid,
enum object_type *type)
{
+ struct object_info oi = OBJECT_INFO_INIT;
+ oi.sizep = &st->size;
+
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapsize,
st->u.loose.hdr,
sizeof(st->u.loose.hdr)) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+ (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
return -1;
--
2.33.0.815.g21c7aaf6073
* Re: [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public
2021-09-07 10:58 ` [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:39 ` Taylor Blau
0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 21:39 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:05PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Make the parse_loose_header_extended() function public and remove the
> parse_loose_header() wrapper. The only direct user of it outside of
> object-file.c itself was in streaming.c, that caller can simply pass
> the required "struct object-info *" instead.
>
> This change is being done in preparation for teaching
> read_loose_object() to accept a flag to pass to
> parse_loose_header(). It isn't strictly necessary for that change, we
> could simply use parse_loose_header_extended() there, but will leave
> the API in a better end state.
All seems reasonable. I agree that this is not a necessary step, but at
least the clean-up is self-contained and an easy enough read.
The flag that read_loose_object() is going to start passing to
parse_loose_header() is left a bit vague, but I'll continue reading to
figure out what it is.
Thanks,
Taylor
* [PATCH v6 11/22] object-file.c: add missing braces to loose_object_info()
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (9 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:58 ` [PATCH v6 12/22] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
` (12 subsequent siblings)
23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Change the formatting in loose_object_info() to conform with our usual
coding style:
When there are multiple arms to a conditional and some of them
require braces, enclose even a single line block in braces for
consistency -- Documentation/CodingGuidelines
This formatting-only change makes a subsequent commit easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/object-file.c b/object-file.c
index 7a47af68bd8..878a4298c9b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1473,17 +1473,20 @@ static int loose_object_info(struct repository *r,
if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
status = error(_("unable to unpack %s header with --allow-unknown-type"),
oid_to_hex(oid));
- } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+ } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
- if (status < 0)
- ; /* Do nothing */
- else if (hdrbuf.len) {
+ }
+
+ if (status < 0) {
+ /* Do nothing */
+ } else if (hdrbuf.len) {
if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
oid_to_hex(oid));
- } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
+ } else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
status = error(_("unable to parse %s header"), oid_to_hex(oid));
+ }
if (status >= 0 && oi->contentp) {
*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1492,8 +1495,9 @@ static int loose_object_info(struct repository *r,
git_inflate_end(&stream);
status = -1;
}
- } else
+ } else {
git_inflate_end(&stream);
+ }
munmap(map, mapsize);
if (oi->sizep == &size_scratch)
--
2.33.0.815.g21c7aaf6073
* [PATCH v6 12/22] object-file.c: simplify unpack_loose_short_header()
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (10 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 11/22] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:58 ` [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
` (11 subsequent siblings)
23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.
The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).
Most of its code was copy/pasted between it, unpack_loose_header() and
unpack_loose_short_header(). We now have a single unpack_loose_header()
function which instead accepts an optional "struct strbuf *".
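As a rough illustration of the "optional output buffer" interface described above, here is a minimal sketch. It assumes no zlib: the `src` array stands in for the inflated stream, `struct growbuf` is a hypothetical stand-in for git's `struct strbuf`, and `read_header()` is an illustrative name, not git's real unpack_loose_header().
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for git's "struct strbuf". */
struct growbuf {
	char *buf;
	size_t len;
};

static void growbuf_add(struct growbuf *gb, const char *data, size_t n)
{
	char *p = realloc(gb->buf, gb->len + n + 1);

	if (!p)
		abort();
	memcpy(p + gb->len, data, n);
	gb->len += n;
	p[gb->len] = '\0';
	gb->buf = p;
}

/*
 * Hypothetical stand-in for unpack_loose_header(): "src" plays the
 * role of the inflated stream and is consumed in chunks of at most
 * "bufsiz" bytes (bufsiz must be non-zero). Returns 0 once a NUL
 * terminator has been seen, -1 if the header did not fit in "buf" and
 * no "full" buffer was supplied, or if the input ran out first.
 */
static int read_header(const char *src, size_t srclen,
		       char *buf, size_t bufsiz, struct growbuf *full)
{
	size_t n = srclen < bufsiz ? srclen : bufsiz;
	size_t pos;

	memcpy(buf, src, n);
	if (memchr(buf, '\0', n))
		return 0;	/* the whole header fit in the first chunk */
	if (!full)
		return -1;	/* too long, and the caller did not opt in */

	/* Keep the partial chunk, then append further chunks until a NUL. */
	growbuf_add(full, buf, n);
	for (pos = n; pos < srclen; pos += n) {
		n = srclen - pos < bufsiz ? srclen - pos : bufsiz;
		memcpy(buf, src + pos, n);
		growbuf_add(full, buf, n);
		if (memchr(buf, '\0', n))
			return 0;
	}
	return -1;	/* ran out of input without finding the NUL */
}
```
Passing NULL for `full` mirrors the old unpack_loose_header() behavior of failing on an over-long header, while passing a buffer mirrors the old unpack_loose_header_to_strbuf().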
I think the remaining unpack_loose_header() function could be
simplified further. We're carrying some complexity just to be able to
emit a garbage type longer than MAX_HEADER_LEN; we could instead just
say "we found a garbage type <first 32 bytes>...". But let's leave the
current behavior in place for now.
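The simpler alternative floated above (echo only the first 32 bytes of a garbage type) could look roughly like this. This is a hypothetical sketch: `format_garbage_type()` and `SHOWN_TYPE_LEN` are made-up names, and the message wording is taken from the paragraph above rather than from git.
```c
#include <stdio.h>

/* Hypothetical cap on how much of a bogus type name we echo back. */
#define SHOWN_TYPE_LEN 32

/*
 * Write a "garbage type" message into "out", showing at most
 * SHOWN_TYPE_LEN bytes of "type" and appending "..." when it was cut.
 * "type" need not be NUL-terminated. Returns what snprintf() returns.
 */
static int format_garbage_type(char *out, size_t outsz,
			       const char *type, size_t type_len)
{
	if (type_len > SHOWN_TYPE_LEN)
		return snprintf(out, outsz, "we found a garbage type %.*s...",
				SHOWN_TYPE_LEN, type);
	return snprintf(out, outsz, "we found a garbage type %.*s",
			(int)type_len, type);
}
```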
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 60 ++++++++++++++++++--------------------------------
object-store.h | 14 +++++++++++-
streaming.c | 3 ++-
3 files changed, 37 insertions(+), 40 deletions(-)
diff --git a/object-file.c b/object-file.c
index 878a4298c9b..2dd4cdd1ae0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1233,11 +1233,12 @@ void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
}
-static int unpack_loose_short_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+ unsigned char *map, unsigned long mapsize,
+ void *buffer, unsigned long bufsiz,
+ struct strbuf *header)
{
- int ret;
+ int status;
/* Get the data stream */
memset(stream, 0, sizeof(*stream));
@@ -1248,44 +1249,25 @@ static int unpack_loose_short_header(git_zstream *stream,
git_inflate_init(stream);
obj_read_unlock();
- ret = git_inflate(stream, 0);
+ status = git_inflate(stream, 0);
obj_read_lock();
-
- return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz)
-{
- int status = unpack_loose_short_header(stream, map, mapsize,
- buffer, bufsiz);
-
if (status < Z_OK)
return status;
- /* Make sure we have the terminating NUL */
- if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
- return -1;
- return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *header)
-{
- int status;
-
- status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
- if (status < Z_OK)
- return -1;
-
/*
* Check if entire header is unpacked in the first iteration.
*/
if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
return 0;
+ /*
+ * We have a header longer than MAX_HEADER_LEN. The "header"
+ * here is only non-NULL when we run "cat-file
+ * --allow-unknown-type".
+ */
+ if (!header)
+ return -1;
+
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
* result out to header, and then append the result of further
@@ -1433,9 +1415,11 @@ static int loose_object_info(struct repository *r,
unsigned long mapsize;
void *map;
git_zstream stream;
+ int hdr_ret;
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
oidclr(oi->delta_base_oid);
@@ -1469,11 +1453,10 @@ static int loose_object_info(struct repository *r,
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
- if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
- if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
- status = error(_("unable to unpack %s header with --allow-unknown-type"),
- oid_to_hex(oid));
- } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+
+ hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ allow_unknown ? &hdrbuf : NULL);
+ if (hdr_ret < 0) {
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
}
@@ -2581,7 +2564,8 @@ int read_loose_object(const char *path,
goto out;
}
- if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+ if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ NULL) < 0) {
error(_("unable to unpack header of %s"), path);
goto out;
}
diff --git a/object-store.h b/object-store.h
index 25e641a606f..4064710ae29 100644
--- a/object-store.h
+++ b/object-store.h
@@ -485,9 +485,21 @@ int for_each_object_in_pack(struct packed_git *p,
int for_each_packed_object(each_packed_object_fn, void *,
enum for_each_object_flags flags);
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
int unpack_loose_header(git_zstream *stream, unsigned char *map,
unsigned long mapsize, void *buffer,
- unsigned long bufsiz);
+ unsigned long bufsiz, struct strbuf *hdrbuf);
int parse_loose_header(const char *hdr, struct object_info *oi,
unsigned int flags);
int check_object_signature(struct repository *r, const struct object_id *oid,
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapped,
st->u.loose.mapsize,
st->u.loose.hdr,
- sizeof(st->u.loose.hdr)) < 0) ||
+ sizeof(st->u.loose.hdr),
+ NULL) < 0) ||
(parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
--
2.33.0.815.g21c7aaf6073
^ permalink raw reply related [flat|nested] 245+ messages in thread
* [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header()
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (11 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 12/22] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-16 21:58 ` Taylor Blau
2021-09-07 10:58 ` [PATCH v6 14/22] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
` (10 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
This minor formatting change serves to make a subsequent patch easier
to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/object-file.c b/object-file.c
index 2dd4cdd1ae0..7c6a865a6c0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1404,7 +1404,10 @@ int parse_loose_header(const char *hdr,
/*
* The length must be followed by a zero byte
*/
- return *hdr ? -1 : type;
+ if (*hdr)
+ return -1;
+
+ return type;
}
static int loose_object_info(struct repository *r,
--
2.33.0.815.g21c7aaf6073
^ permalink raw reply related [flat|nested] 245+ messages in thread
* Re: [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header()
2021-09-07 10:58 ` [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:58 ` Taylor Blau
0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-16 21:58 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:08PM +0200, Ævar Arnfjörð Bjarmason wrote:
> This minor formatting change serves to make a subsequent patch easier
> to read.
Hmm. I'm not sure if I agree.
As far as I can tell from reading the subsequent patch, this is designed
to make it easier to add a comment above "return type" that pertains
just to the case when !*hdr.
I think it would have been fine to go from the ternary to this style
with the comment in a single patch. But I also think it would have been
fine to add the comment above the ternary and instead start it off by
saying "when !*hdr, ...".
Thanks,
Taylor
^ permalink raw reply [flat|nested] 245+ messages in thread
* [PATCH v6 14/22] object-file.c: stop dying in parse_loose_header()
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (12 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-17 2:32 ` Taylor Blau
2021-09-07 10:58 ` [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
` (9 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Start the libification of parse_loose_header() by making it return
error codes and data instead of invoking die() by itself. For now
we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller, but in subsequent
commits we'll also libify those.
Since the refactoring of parse_loose_header_extended() into
parse_loose_header() in an earlier commit, its interface no longer
accepts an "unsigned long *sizep". Rather, it accepts a "struct
object_info *", and that structure will be populated with information
about the object.
It thus makes sense to further libify the interface so that it stops
calling die() when it encounters OBJ_BAD, and instead rely on its
callers to check the populated "oi->typep".
Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE; we can instead do
that check in loose_object_info().
This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.
In another case added in c84a1f3ed4d (sha1_file: refactor read_object,
2017-06-21) (but the behavior pre-dated that), we checked for
"status >= 0", because at that point "status" had become the return
value of parse_loose_header(), i.e. a non-negative "enum object_type"
(unless it returned -1, a.k.a. OBJ_BAD).
Now that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info"), we don't need
to conflate these two cases in its callers.
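A minimal sketch of the new contract, assuming simplified stand-ins (`enum obj_type`, `struct info`, `parse_header()` and `check_header()` are all hypothetical; only the numeric type values mirror git's `enum object_type`): the parser returns -1 only for a malformed "<type> <len>\0" header, reports an unknown type through the out-parameter, and leaves the die-or-not decision to its caller. Size overflow checking is omitted for brevity.
```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins; the numeric values mirror git's enum object_type. */
enum obj_type {
	T_BAD = -1,
	T_COMMIT = 1,
	T_TREE = 2,
	T_BLOB = 3,
	T_TAG = 4
};

/* Hypothetical, cut-down version of "struct object_info". */
struct info {
	enum obj_type *typep;
	unsigned long *sizep;
};

static enum obj_type type_from_name(const char *name, size_t len)
{
	static const struct { const char *name; enum obj_type type; } types[] = {
		{ "commit", T_COMMIT }, { "tree", T_TREE },
		{ "blob", T_BLOB }, { "tag", T_TAG },
	};
	size_t i;

	for (i = 0; i < sizeof(types) / sizeof(types[0]); i++)
		if (strlen(types[i].name) == len &&
		    !memcmp(types[i].name, name, len))
			return types[i].type;
	return T_BAD;
}

/*
 * Parse "<type> <len>\0". Returns -1 only if the format is wrong. An
 * unknown <type> still returns 0, with T_BAD stored in *typep.
 */
static int parse_header(const char *hdr, struct info *oi)
{
	const char *type_buf = hdr;
	size_t type_len;
	unsigned long size = 0;

	while (*hdr && *hdr != ' ')
		hdr++;
	if (*hdr != ' ')
		return -1;
	type_len = (size_t)(hdr - type_buf);
	hdr++;	/* skip the space */

	if (*hdr < '0' || *hdr > '9')
		return -1;
	while (*hdr >= '0' && *hdr <= '9')
		size = size * 10 + (unsigned long)(*hdr++ - '0');
	if (*hdr)
		return -1;	/* the length must be followed by the NUL */

	if (oi->typep)
		*oi->typep = type_from_name(type_buf, type_len);
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}

/* Caller-side policy, analogous to what loose_object_info() now does. */
static int check_header(const char *hdr, int allow_unknown)
{
	enum obj_type type;
	unsigned long size;
	struct info oi = { &type, &size };

	if (parse_header(hdr, &oi) < 0)
		return -1;	/* malformed header */
	if (type == T_BAD && !allow_unknown)
		return -2;	/* git die()s with "invalid object type" here */
	return 0;
}
```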
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 53 ++++++++++++++++++++++++++------------------------
object-store.h | 13 +++++++++++--
streaming.c | 4 +++-
3 files changed, 42 insertions(+), 28 deletions(-)
diff --git a/object-file.c b/object-file.c
index 7c6a865a6c0..d656960422d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1345,9 +1345,7 @@ static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
-int parse_loose_header(const char *hdr,
- struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
{
const char *type_buf = hdr;
unsigned long size;
@@ -1369,15 +1367,6 @@ int parse_loose_header(const char *hdr,
type = type_from_string_gently(type_buf, type_len, 1);
if (oi->type_name)
strbuf_add(oi->type_name, type_buf, type_len);
- /*
- * Set type to 0 if its an unknown object and
- * we're obtaining the type using '--allow-unknown-type'
- * option.
- */
- if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
- type = 0;
- else if (type < 0)
- die(_("invalid object type"));
if (oi->typep)
*oi->typep = type;
@@ -1407,7 +1396,11 @@ int parse_loose_header(const char *hdr,
if (*hdr)
return -1;
- return type;
+ /*
+ * The format is valid, but the type may still be bogus. The
+ * Caller needs to check its oi->typep.
+ */
+ return 0;
}
static int loose_object_info(struct repository *r,
@@ -1422,6 +1415,8 @@ static int loose_object_info(struct repository *r,
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ enum object_type type_scratch;
+ int parsed_header = 0;
int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
@@ -1453,6 +1448,8 @@ static int loose_object_info(struct repository *r,
if (!oi->sizep)
oi->sizep = &size_scratch;
+ if (!oi->typep)
+ oi->typep = &type_scratch;
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
@@ -1463,18 +1460,20 @@ static int loose_object_info(struct repository *r,
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
}
-
- if (status < 0) {
- /* Do nothing */
- } else if (hdrbuf.len) {
- if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
- status = error(_("unable to parse %s header with --allow-unknown-type"),
- oid_to_hex(oid));
- } else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
- status = error(_("unable to parse %s header"), oid_to_hex(oid));
+ if (!status) {
+ if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
+ /*
+ * oi->{sizep,typep} are meaningless unless
+ * parse_loose_header() returns >= 0.
+ */
+ parsed_header = 1;
+ else
+ status = error(_("unable to parse %s header"), oid_to_hex(oid));
}
+ if (!allow_unknown && parsed_header && *oi->typep < 0)
+ die(_("invalid object type"));
- if (status >= 0 && oi->contentp) {
+ if (parsed_header && oi->contentp) {
*oi->contentp = unpack_loose_rest(&stream, hdr,
*oi->sizep, oid);
if (!*oi->contentp) {
@@ -1489,6 +1488,8 @@ static int loose_object_info(struct repository *r,
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
+ if (oi->typep == &type_scratch)
+ oi->typep = NULL;
oi->whence = OI_LOOSE;
return (status < 0) ? status : 0;
}
@@ -2557,6 +2558,7 @@ int read_loose_object(const char *path,
git_zstream stream;
char hdr[MAX_HEADER_LEN];
struct object_info oi = OBJECT_INFO_INIT;
+ oi.typep = type;
oi.sizep = size;
*contents = NULL;
@@ -2573,12 +2575,13 @@ int read_loose_object(const char *path,
goto out;
}
- *type = parse_loose_header(hdr, &oi, 0);
- if (*type < 0) {
+ if (parse_loose_header(hdr, &oi) < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
}
+ if (*type < 0)
+ die(_("invalid object type"));
if (*type == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 4064710ae29..584bf5556af 100644
--- a/object-store.h
+++ b/object-store.h
@@ -500,8 +500,17 @@ int for_each_packed_object(each_packed_object_fn, void *,
int unpack_loose_header(git_zstream *stream, unsigned char *map,
unsigned long mapsize, void *buffer,
unsigned long bufsiz, struct strbuf *hdrbuf);
-int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags);
+
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
+int parse_loose_header(const char *hdr, struct object_info *oi);
+
int check_object_signature(struct repository *r, const struct object_id *oid,
void *buf, unsigned long size, const char *type);
int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..c3dc241d6a5 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
{
struct object_info oi = OBJECT_INFO_INIT;
oi.sizep = &st->size;
+ oi.typep = type;
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
@@ -235,7 +236,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.hdr,
sizeof(st->u.loose.hdr),
NULL) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
+ (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
+ *type < 0) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
return -1;
--
2.33.0.815.g21c7aaf6073
^ permalink raw reply related [flat|nested] 245+ messages in thread
* Re: [PATCH v6 14/22] object-file.c: stop dying in parse_loose_header()
2021-09-07 10:58 ` [PATCH v6 14/22] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
@ 2021-09-17 2:32 ` Taylor Blau
0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17 2:32 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:09PM +0200, Ævar Arnfjörð Bjarmason wrote:
> It thus makes sense to further libify the interface so that it stops
> calling die() when it encounters OBJ_BAD, and instead rely on its
> callers to check the populated "oi->typep".
Hmm. I thought we got rid of this behavior in a previous commit? Perhaps
I'm thinking of something else, but I would certainly appreciate a
clarification :).
> @@ -1369,15 +1367,6 @@ int parse_loose_header(const char *hdr,
> type = type_from_string_gently(type_buf, type_len, 1);
> if (oi->type_name)
> strbuf_add(oi->type_name, type_buf, type_len);
> - /*
> - * Set type to 0 if its an unknown object and
> - * we're obtaining the type using '--allow-unknown-type'
> - * option.
> - */
> - if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
> - type = 0;
> - else if (type < 0)
> - die(_("invalid object type"));
Good, this part moved to loose_object_info() as you said it would.
> @@ -1463,18 +1460,20 @@ static int loose_object_info(struct repository *r,
> status = error(_("unable to unpack %s header"),
> oid_to_hex(oid));
> }
> -
> - if (status < 0) {
> - /* Do nothing */
> - } else if (hdrbuf.len) {
> - if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
> - status = error(_("unable to parse %s header with --allow-unknown-type"),
> - oid_to_hex(oid));
> - } else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
> - status = error(_("unable to parse %s header"), oid_to_hex(oid));
> + if (!status) {
> + if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
> + /*
> + * oi->{sizep,typep} are meaningless unless
> + * parse_loose_header() returns >= 0.
> + */
This double negative is a little confusing. Clearer to say:
"oi->{size,type}p is meaningless if parse_loose_header() returns < 0"?
But I was also a little confused to see that the expression we are
checking here is just that parse_loose_header() returned zero. What
about other positive values?
I think we should either update the comment to say "unless it returns
zero" or the conditional expression to check for >= 0.
> diff --git a/object-store.h b/object-store.h
> index 4064710ae29..584bf5556af 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -500,8 +500,17 @@ int for_each_packed_object(each_packed_object_fn, void *,
> int unpack_loose_header(git_zstream *stream, unsigned char *map,
> unsigned long mapsize, void *buffer,
> unsigned long bufsiz, struct strbuf *hdrbuf);
> -int parse_loose_header(const char *hdr, struct object_info *oi,
> - unsigned int flags);
> +
> +/**
> + * parse_loose_header() parses the starting "<type> <len>\0" of an
> + * object. If it doesn't follow that format -1 is returned. To check
> + * the validity of the <type> populate the "typep" in the "struct
> + * object_info". It will be OBJ_BAD if the object type is unknown. The
> + * parsed <len> can be retrieved via "oi->sizep", and from there
> + * passed to unpack_loose_rest().
> + */
> +int parse_loose_header(const char *hdr, struct object_info *oi);
OK, I guess this must be what I was confused about earlier (that I
thought we didn't support reading typep if returning OBJ_BAD). But it
seems odd to me that we would get rid of it elsewhere, yet continue
using this pattern here.
Or am I mistaken that the two are different?
Thanks,
Taylor
^ permalink raw reply [flat|nested] 245+ messages in thread
* [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info()
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (13 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 14/22] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-17 2:35 ` Taylor Blau
2021-09-07 10:58 ` [PATCH v6 16/22] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
` (8 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
An earlier version of the preceding commit had a subtle bug where our
"type_scratch" (later assigned to "oi->typep") would be uninitialized
and used in the "!allow_unknown" case, at which point it would contain
a nonsensical value if we'd failed to call parse_loose_header().
The preceding commit introduced a "parsed_header" variable to check
for this case, but I think we can do better: let's carry an
"oi_header" variable initially set to NULL, and only set it to "oi"
once we're past parse_loose_header().
This is functionally the same thing, but hopefully makes it even more
obvious in the future that we must not access "typep" and "sizep" (or
"type_name") unless parse_loose_header() succeeds, whereas accessing
fields set before that point (such as "disk_sizep") is OK.
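The "oi_header" idea can be sketched in isolation as follows. The names here (`struct info`, `parse()`, `parsed_size()`) are hypothetical, and `parse()` just fills in placeholder values; the point is only that the scratch type and size are reachable solely through a pointer that stays NULL until parsing succeeds.
```c
#include <stddef.h>

/* Hypothetical, cut-down "struct object_info". */
struct info {
	int *typep;
	unsigned long *sizep;
};

/*
 * Hypothetical parser: treats any non-empty header as valid and fills
 * in placeholder values. It writes nothing when it fails.
 */
static int parse(const char *hdr, struct info *oi)
{
	if (!hdr || !*hdr)
		return -1;
	*oi->typep = 3;
	*oi->sizep = 42;
	return 0;
}

/* Return the parsed size, or 0 if the header could not be parsed. */
static unsigned long parsed_size(const char *hdr)
{
	int type_scratch;		/* uninitialized until parse() succeeds */
	unsigned long size_scratch;
	struct info oi = { &type_scratch, &size_scratch };
	struct info *oi_header = NULL;	/* non-NULL only after a successful parse */

	if (!parse(hdr, &oi))
		oi_header = &oi;

	/*
	 * Reading the scratch values through "oi_header" forces a NULL
	 * check, so the uninitialized values are never looked at.
	 */
	if (oi_header && *oi_header->typep >= 0)
		return *oi_header->sizep;
	return 0;
}
```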
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/object-file.c b/object-file.c
index d656960422d..ae6a37ab5fb 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1416,7 +1416,7 @@ static int loose_object_info(struct repository *r,
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
enum object_type type_scratch;
- int parsed_header = 0;
+ struct object_info *oi_header = NULL;
int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
@@ -1464,18 +1464,20 @@ static int loose_object_info(struct repository *r,
if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
/*
* oi->{sizep,typep} are meaningless unless
- * parse_loose_header() returns >= 0.
+ * parse_loose_header() returns >= 0. Let's
+ * access them as "oi_header" (just an alias
+ * for "oi") below to make that intent clear.
*/
- parsed_header = 1;
+ oi_header = oi;
else
status = error(_("unable to parse %s header"), oid_to_hex(oid));
}
- if (!allow_unknown && parsed_header && *oi->typep < 0)
+ if (!allow_unknown && oi_header && *oi_header->typep < 0)
die(_("invalid object type"));
- if (parsed_header && oi->contentp) {
+ if (oi_header && oi->contentp) {
*oi->contentp = unpack_loose_rest(&stream, hdr,
- *oi->sizep, oid);
+ *oi_header->sizep, oid);
if (!*oi->contentp) {
git_inflate_end(&stream);
status = -1;
--
2.33.0.815.g21c7aaf6073
^ permalink raw reply related [flat|nested] 245+ messages in thread
* Re: [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info()
2021-09-07 10:58 ` [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-09-17 2:35 ` Taylor Blau
0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17 2:35 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> An earlier version of the preceding commit had a subtle bug where our
> "type_scratch" (later assigned to "oi->typep") would be uninitialized
> and used in the "!allow_unknown" case, at which point it would contain
> a nonsensical value if we'd failed to call parse_loose_header().
>
> The preceding commit introduced "parsed_header" variable to check for
> this case, but I think we can do better, let's carry a "oi_header"
> variable initially set to NULL, and only set it to "oi" once we're
> past parse_loose_header().
Everything in this patch seems OK to me.
For what it's worth, I think that this could likely have been folded
into the previous commit. I was a little surprised to see
parsed_header go away right after I had spent a minute or two
thinking about what it was for.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 245+ messages in thread
* [PATCH v6 16/22] object-file.c: return -1, not "status" from unpack_loose_header()
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (14 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:58 ` [PATCH v6 17/22] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
` (7 subsequent siblings)
23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.
See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".
At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").
However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.
So let's do the minor cleanup of also changing this function to return
a -1.
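A tiny sketch of the normalization, for illustration only. The MY_Z_* constants are local copies of a few of zlib.h's status values (Z_OK is 0, Z_STREAM_END is 1, and errors are negative, e.g. Z_STREAM_ERROR is -2), and `header_inflate_result()` is a hypothetical name. One side effect worth noting: because Z_STREAM_ERROR happens to be -2, passing raw zlib codes through would become ambiguous as soon as a distinct -2 return value is given another meaning, such as "header too long".
```c
/* Local copies of a few zlib.h status codes (same numeric values). */
enum {
	MY_Z_OK = 0,
	MY_Z_STREAM_END = 1,
	MY_Z_STREAM_ERROR = -2,
	MY_Z_DATA_ERROR = -3,
	MY_Z_BUF_ERROR = -5
};

/*
 * Map the result of an inflate step to this function's own contract:
 * 0 to carry on, -1 for any zlib failure. The specific Z_* code is
 * dropped; per this series, git's zlib.c wrapper already reports it.
 */
static int header_inflate_result(int zstatus)
{
	if (zstatus < MY_Z_OK)
		return -1;
	return 0;
}
```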
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/object-file.c b/object-file.c
index ae6a37ab5fb..11df4485147 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1252,7 +1252,7 @@ int unpack_loose_header(git_zstream *stream,
status = git_inflate(stream, 0);
obj_read_lock();
if (status < Z_OK)
- return status;
+ return -1;
/*
* Check if entire header is unpacked in the first iteration.
--
2.33.0.815.g21c7aaf6073
^ permalink raw reply related [flat|nested] 245+ messages in thread
* [PATCH v6 17/22] object-file.c: return -2 on "header too long" in unpack_loose_header()
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (15 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 16/22] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:58 ` [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
` (6 subsequent siblings)
23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.
As a test added earlier in this series in t1006-cat-file.sh shows,
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return -2 saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".
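A hedged sketch of the resulting caller-side handling, with hypothetical names throughout (`unpack_hdr()`, `report()`, `HDR_MAX`). HDR_MAX is set to 32 to match the "exceeds 32 bytes" message in the updated test. It models "header too long" as no NUL terminator within the first HDR_MAX bytes, which is a simplification of what unpack_loose_header() does with a real zlib stream.
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for MAX_HEADER_LEN (32, per the t1006 expectation). */
#define HDR_MAX 32

/*
 * Hypothetical header check: -1 for unusable input, -2 if no NUL
 * terminator appears within the first HDR_MAX bytes, 0 otherwise.
 */
static int unpack_hdr(const char *data, size_t len)
{
	if (!data || !len)
		return -1;
	if (!memchr(data, '\0', len < HDR_MAX ? len : HDR_MAX))
		return -2;
	return 0;
}

/* Turn the return code into a message, mirroring the switch in the patch. */
static int report(const char *data, size_t len, char *msg, size_t msgsz)
{
	int ret = unpack_hdr(data, len);

	switch (ret) {
	case 0:
		msg[0] = '\0';
		return 0;
	case -1:
		snprintf(msg, msgsz, "unable to unpack header");
		return -1;
	case -2:
		snprintf(msg, msgsz, "header too long, exceeds %d bytes", HDR_MAX);
		return -1;
	default:
		/* git calls BUG() here; abort() is the closest stdlib analogue */
		fprintf(stderr, "BUG: unknown unpack_hdr() value %d\n", ret);
		abort();
	}
}
```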
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 16 +++++++++++++---
object-store.h | 6 ++++--
t/t1006-cat-file.sh | 2 +-
3 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/object-file.c b/object-file.c
index 11df4485147..0cb5287d3ef 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1266,7 +1266,7 @@ int unpack_loose_header(git_zstream *stream,
* --allow-unknown-type".
*/
if (!header)
- return -1;
+ return -2;
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
@@ -1287,7 +1287,7 @@ int unpack_loose_header(git_zstream *stream,
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status != Z_STREAM_END);
- return -1;
+ return -2;
}
static void *unpack_loose_rest(git_zstream *stream,
@@ -1456,9 +1456,19 @@ static int loose_object_info(struct repository *r,
hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
allow_unknown ? &hdrbuf : NULL);
- if (hdr_ret < 0) {
+ switch (hdr_ret) {
+ case 0:
+ break;
+ case -1:
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
+ break;
+ case -2:
+ status = error(_("header for %s too long, exceeds %d bytes"),
+ oid_to_hex(oid), MAX_HEADER_LEN);
+ break;
+ default:
+ BUG("unknown hdr_ret value %d", hdr_ret);
}
if (!status) {
if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
diff --git a/object-store.h b/object-store.h
index 584bf5556af..e896b813f24 100644
--- a/object-store.h
+++ b/object-store.h
@@ -489,13 +489,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
* unpack_loose_header() initializes the data stream needed to unpack
* a loose object header.
*
- * Returns 0 on success. Returns negative values on error.
+ * Returns 0 on success. Returns negative values on error. If the
+ * header exceeds MAX_HEADER_LEN -2 will be returned.
*
* It will only parse up to MAX_HEADER_LEN bytes unless an optional
* "hdrbuf" argument is non-NULL. This is intended for use with
* OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
* reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), -2 will still be returned from this
+ * function to indicate that the header was too long.
*/
int unpack_loose_header(git_zstream *stream, unsigned char *map,
unsigned long mapsize, void *buffer,
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 98729f1edfc..43a9f4e7f0c 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
cat >err.expect <<-EOF &&
- error: unable to unpack $bogus_sha1 header
+ error: header for $bogus_sha1 too long, exceeds 32 bytes
fatal: git cat-file: could not get object info
EOF
--
2.33.0.815.g21c7aaf6073
^ permalink raw reply related [flat|nested] 245+ messages in thread
* [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header()
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (16 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 17/22] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-17 2:45 ` Taylor Blau
2021-09-07 10:58 ` [PATCH v6 19/22] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
` (5 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
In the preceding commits we changed and documented
unpack_loose_header() from returning any negative value or zero to only
-2, -1 or 0. Let's instead add an "enum unpack_loose_header_result"
type and use it, and have the compiler assert that we're exhaustively
covering all return values. This gets rid of the need for having a
"default" BUG() case in loose_object_info().
I'm on the fence about whether this is more readable or worth it, but
since it was suggested in [1] to do this let's go for it.
1. https://lore.kernel.org/git/20210527175433.2673306-1-jonathantanmy@google.com/
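A minimal, self-contained sketch of why the enum return type helps. When a
"switch" over an enum has no "default" label, GCC's -Wswitch (enabled by
-Wall) warns if any enumerator is left unhandled. That is the compile-time
check described above. The names here (hdr_result, describe_hdr_result)
are hypothetical stand-ins, not Git's actual code:

```c
/* Hypothetical stand-in for enum unpack_loose_header_result. */
enum hdr_result {
	HDR_RESULT_BAD_TOO_LONG = -2,
	HDR_RESULT_BAD = -1,
	HDR_RESULT_OK = 0,
};

/*
 * Deliberately no "default:" label. If a new enumerator is added to
 * enum hdr_result and not handled here, -Wswitch warns at compile time.
 */
const char *describe_hdr_result(enum hdr_result r)
{
	switch (r) {
	case HDR_RESULT_OK:
		return "ok";
	case HDR_RESULT_BAD:
		return "unable to unpack header";
	case HDR_RESULT_BAD_TOO_LONG:
		return "header too long";
	}
	/* Only reachable if r holds a value outside the enum's range. */
	return "unknown";
}
```

The trade-off is that a value outside the enum can still reach the
function at runtime, so a fallback after the switch remains useful.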
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 20 ++++++++++----------
object-store.h | 27 ++++++++++++++++++++-------
streaming.c | 27 ++++++++++++++++-----------
3 files changed, 46 insertions(+), 28 deletions(-)
diff --git a/object-file.c b/object-file.c
index 0cb5287d3ef..9484c7ce2be 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1233,10 +1233,12 @@ void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
}
-int unpack_loose_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz,
- struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+ unsigned long mapsize,
+ void *buffer,
+ unsigned long bufsiz,
+ struct strbuf *header)
{
int status;
@@ -1411,7 +1413,7 @@ static int loose_object_info(struct repository *r,
unsigned long mapsize;
void *map;
git_zstream stream;
- int hdr_ret;
+ enum unpack_loose_header_result hdr_ret;
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
@@ -1457,18 +1459,16 @@ static int loose_object_info(struct repository *r,
hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
allow_unknown ? &hdrbuf : NULL);
switch (hdr_ret) {
- case 0:
+ case UNPACK_LOOSE_HEADER_RESULT_OK:
break;
- case -1:
+ case UNPACK_LOOSE_HEADER_RESULT_BAD:
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
break;
- case -2:
+ case UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG:
status = error(_("header for %s too long, exceeds %d bytes"),
oid_to_hex(oid), MAX_HEADER_LEN);
break;
- default:
- BUG("unknown hdr_ret value %d", hdr_ret);
}
if (!status) {
if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
diff --git a/object-store.h b/object-store.h
index e896b813f24..ac55b02f15a 100644
--- a/object-store.h
+++ b/object-store.h
@@ -485,23 +485,36 @@ int for_each_object_in_pack(struct packed_git *p,
int for_each_packed_object(each_packed_object_fn, void *,
enum for_each_object_flags flags);
+enum unpack_loose_header_result {
+ UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG = -2,
+ UNPACK_LOOSE_HEADER_RESULT_BAD = -1,
+ UNPACK_LOOSE_HEADER_RESULT_OK,
+
+};
+
/**
* unpack_loose_header() initializes the data stream needed to unpack
* a loose object header.
*
- * Returns 0 on success. Returns negative values on error. If the
- * header exceeds MAX_HEADER_LEN -2 will be returned.
+ * Returns UNPACK_LOOSE_HEADER_RESULT_OK on success. Returns
+ * UNPACK_LOOSE_HEADER_RESULT_BAD values on error, or if the header
+ * exceeds MAX_HEADER_LEN UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG will
+ * be returned.
*
* It will only parse up to MAX_HEADER_LEN bytes unless an optional
* "hdrbuf" argument is non-NULL. This is intended for use with
* OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
* reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header(), -2 will still be returned from this
- * function to indicate that the header was too long.
+ * with parse_loose_header(), UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG
+ * will still be returned from this function to indicate that the
+ * header was too long.
*/
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+ unsigned long mapsize,
+ void *buffer,
+ unsigned long bufsiz,
+ struct strbuf *hdrbuf);
/**
* parse_loose_header() parses the starting "<type> <len>\0" of an
diff --git a/streaming.c b/streaming.c
index c3dc241d6a5..3e5045c004d 100644
--- a/streaming.c
+++ b/streaming.c
@@ -224,24 +224,25 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
enum object_type *type)
{
struct object_info oi = OBJECT_INFO_INIT;
+ enum unpack_loose_header_result hdr_ret;
oi.sizep = &st->size;
oi.typep = type;
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
- if ((unpack_loose_header(&st->z,
- st->u.loose.mapped,
- st->u.loose.mapsize,
- st->u.loose.hdr,
- sizeof(st->u.loose.hdr),
- NULL) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
- *type < 0) {
- git_inflate_end(&st->z);
- munmap(st->u.loose.mapped, st->u.loose.mapsize);
- return -1;
+ hdr_ret = unpack_loose_header(&st->z, st->u.loose.mapped,
+ st->u.loose.mapsize, st->u.loose.hdr,
+ sizeof(st->u.loose.hdr), NULL);
+ switch (hdr_ret) {
+ case UNPACK_LOOSE_HEADER_RESULT_OK:
+ break;
+ case UNPACK_LOOSE_HEADER_RESULT_BAD:
+ case UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG:
+ goto error;
}
+ if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
+ goto error;
st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
st->u.loose.hdr_avail = st->z.total_out;
@@ -250,6 +251,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->read = read_istream_loose;
return 0;
+error:
+ git_inflate_end(&st->z);
+ munmap(st->u.loose.mapped, st->u.loose.mapsize);
+ return -1;
}
--
2.33.0.815.g21c7aaf6073
* Re: [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header()
2021-09-07 10:58 ` [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-17 2:45 ` Taylor Blau
0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17 2:45 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:13PM +0200, Ævar Arnfjörð Bjarmason wrote:
> In the preceding commits we changed and documented
> unpack_loose_header() from returning any negative value or zero to only
> -2, -1 or 0. Let's instead add an "enum unpack_loose_header_result"
> type and use it, and have the compiler assert that we're exhaustively
> covering all return values. This gets rid of the need for having a
> "default" BUG() case in loose_object_info().
>
> I'm on the fence about whether this is more readable or worth it, but
> since it was suggested in [1] to do this let's go for it.
:-). The first hunk is quite a long line, but I think that only suggests
the enum has a long name. I also can't think of anything shorter, so I
think what you have is just fine.
I do think that this is an improvement in readability, and for what it's
worth I am a fan of the previous two changes as well.
As a workflow comment, I would have perhaps done these conversions a
little earlier, maybe in these steps:
- First a patch to introduce unpack_loose_header_result with just OK
and BAD, and then convert all callers that return negative numbers
to return BAD (and all others to return OK).
- Then a second patch to convert some of the BAD returns into
BAD_TOO_LONG.
That gets things done in two patches, instead of three, at the cost of a
slightly more complicated first patch. But I think you also get some
more insight into why we're making the change in the first place instead
of having to read through a couple of commits to get there.
In any case, what you have is certainly fine, and I don't think that one
approach is any better or worse than the other. Just mentioning it in
case it's something you may try in the future.
This patch looks good.
Thanks,
Taylor
* [PATCH v6 19/22] fsck: don't hard die on invalid object types
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (17 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-17 3:37 ` Taylor Blau
2021-09-07 10:58 ` [PATCH v6 20/22] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
` (4 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Change the error fsck emits on invalid object types, such as:
$ git hash-object --stdin -w -t garbage --literally </dev/null
<OID>
From the very ungraceful error of:
$ git fsck
fatal: invalid object type
$
To:
$ git fsck
error: hash mismatch for <OID_PATH> (expected <OID>)
error: <OID>: object corrupt or missing: <OID_PATH>
[ the rest of the fsck output here, i.e. it didn't hard die ]
We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The test that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it. See f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for the introduction of read_loose_object().
Why are we complaining about a "hash mismatch" for an object of a type
we don't know about? We shouldn't. This is the bare minimal change
needed to not make fsck hard die on a repository that's been corrupted
in this manner. In subsequent commits we'll teach fsck to recognize
this particular type of corruption and emit a better error message.
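The shift from die() to error() can be pictured with a small sketch. Each
per-object check reports a problem and returns -1, and the caller records
the failure and moves on to the next object instead of the whole process
exiting. Everything here is hypothetical (check_object, check_all, the
ALLOW_UNKNOWN_TYPE flag standing in for OBJECT_INFO_ALLOW_UNKNOWN_TYPE,
and the hard-coded type list). It only mirrors the shape of
read_loose_object() and fsck_loose(), not their code:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flag, standing in for OBJECT_INFO_ALLOW_UNKNOWN_TYPE. */
#define ALLOW_UNKNOWN_TYPE (1u << 0)

static int is_known_type(const char *type)
{
	return !strcmp(type, "blob") || !strcmp(type, "tree") ||
	       !strcmp(type, "commit") || !strcmp(type, "tag");
}

/*
 * Report a problem on stderr and return -1 rather than exiting, so the
 * caller can keep going. With ALLOW_UNKNOWN_TYPE the unknown type is
 * passed through for the caller to report in its own way.
 */
int check_object(const char *type, unsigned int flags)
{
	if (!(flags & ALLOW_UNKNOWN_TYPE) && !is_known_type(type)) {
		fprintf(stderr, "error: object declares an unknown type '%s'\n",
			type);
		return -1;
	}
	return 0;
}

/* Check every object and return how many of them failed. */
int check_all(const char **types, size_t n, unsigned int flags)
{
	int errors = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (check_object(types[i], flags) < 0)
			errors++; /* keep checking other objects */
	return errors;
}
```

A caller in this style would exit non-zero at the end if check_all()
returned a positive count, after all objects have been examined.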
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fsck.c | 3 ++-
object-file.c | 11 ++++++++---
object-store.h | 3 ++-
t/t1450-fsck.sh | 14 +++++++-------
4 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..082dadd5629 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -601,7 +601,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
void *contents;
int eaten;
- if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+ if (read_loose_object(path, oid, &type, &size, &contents,
+ OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
errors_found |= ERROR_OBJECT;
error(_("%s: object corrupt or missing: %s"),
oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 9484c7ce2be..0e6937fad73 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2562,7 +2562,8 @@ int read_loose_object(const char *path,
const struct object_id *expected_oid,
enum object_type *type,
unsigned long *size,
- void **contents)
+ void **contents,
+ unsigned int oi_flags)
{
int ret = -1;
void *map = NULL;
@@ -2570,6 +2571,7 @@ int read_loose_object(const char *path,
git_zstream stream;
char hdr[MAX_HEADER_LEN];
struct object_info oi = OBJECT_INFO_INIT;
+ int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
oi.typep = type;
oi.sizep = size;
@@ -2592,8 +2594,11 @@ int read_loose_object(const char *path,
git_inflate_end(&stream);
goto out;
}
- if (*type < 0)
- die(_("invalid object type"));
+ if (!allow_unknown && *type < 0) {
+ error(_("header for %s declares an unknown type"), path);
+ git_inflate_end(&stream);
+ goto out;
+ }
if (*type == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index ac55b02f15a..c268662f5ba 100644
--- a/object-store.h
+++ b/object-store.h
@@ -253,7 +253,8 @@ int read_loose_object(const char *path,
const struct object_id *expected_oid,
enum object_type *type,
unsigned long *size,
- void **contents);
+ void **contents,
+ unsigned int oi_flags);
/* Retry packed storage after checking packed and loose storage */
#define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f10d6f7b7e8..d8303db9709 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,16 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
test_i18ngrep "bad index file" errors
'
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
git init --bare garbage-type &&
empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
- cat >err.expect <<-\EOF &&
- fatal: invalid object type
- EOF
- test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
- test_cmp err.expect err.actual &&
- test_must_be_empty out.actual
+ test_must_fail git -C garbage-type fsck >out 2>err &&
+ grep -e "^error" -e "^fatal" err >errors &&
+ test_line_count = 2 errors &&
+ grep "error: hash mismatch for" err &&
+ grep "$garbage_blob: object corrupt or missing:" err &&
+ grep "dangling blob $empty_blob" out
'
test_done
--
2.33.0.815.g21c7aaf6073
* Re: [PATCH v6 19/22] fsck: don't hard die on invalid object types
2021-09-07 10:58 ` [PATCH v6 19/22] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-17 3:37 ` Taylor Blau
0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17 3:37 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:14PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Change the error fsck emits on invalid object types, such as:
>
> $ git hash-object --stdin -w -t garbage --literally </dev/null
> <OID>
>
> From the very ungraceful error of:
>
> $ git fsck
> fatal: invalid object type
> $
>
> To:
>
> $ git fsck
> error: hash mismatch for <OID_PATH> (expected <OID>)
> error: <OID>: object corrupt or missing: <OID_PATH>
> [ the rest of the fsck output here, i.e. it didn't hard die ]
Great. I don't love the second error (since it doesn't really give the
user any new information when read after the first) but that's fsck's
fault, and not your patch's.
> To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
> flag from read_loose_object() through to parse_loose_header(). Since
> the read_loose_object() function is only used in builtin/fsck.c we can
> simply change it. See f6371f92104 (sha1_file: add read_loose_object()
> function, 2017-01-13) for the introduction of read_loose_object().
>
> Why are we complaining about a "hash mismatch" for an object of a type
> we don't know about? We shouldn't. This is the bare minimal change
> needed to not make fsck hard die on a repository that's been corrupted
> in this manner. In subsequent commits we'll teach fsck to recognize
> this particular type of corruption and emit a better error message.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> builtin/fsck.c | 3 ++-
> object-file.c | 11 ++++++++---
> object-store.h | 3 ++-
> t/t1450-fsck.sh | 14 +++++++-------
> 4 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index b42b6fe21f7..082dadd5629 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -601,7 +601,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> void *contents;
> int eaten;
>
> - if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
> + if (read_loose_object(path, oid, &type, &size, &contents,
> + OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
> errors_found |= ERROR_OBJECT;
> error(_("%s: object corrupt or missing: %s"),
> oid_to_hex(oid), path);
> diff --git a/object-file.c b/object-file.c
> index 9484c7ce2be..0e6937fad73 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2562,7 +2562,8 @@ int read_loose_object(const char *path,
> const struct object_id *expected_oid,
> enum object_type *type,
> unsigned long *size,
> - void **contents)
> + void **contents,
> + unsigned int oi_flags)
> {
> int ret = -1;
> void *map = NULL;
> @@ -2570,6 +2571,7 @@ int read_loose_object(const char *path,
> git_zstream stream;
> char hdr[MAX_HEADER_LEN];
> struct object_info oi = OBJECT_INFO_INIT;
> + int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
> oi.typep = type;
> oi.sizep = size;
>
> @@ -2592,8 +2594,11 @@ int read_loose_object(const char *path,
> git_inflate_end(&stream);
> goto out;
> }
> - if (*type < 0)
> - die(_("invalid object type"));
> + if (!allow_unknown && *type < 0) {
> + error(_("header for %s declares an unknown type"), path);
> + git_inflate_end(&stream);
> + goto out;
> + }
Hmm. I'm not sure that I see a new test for this error (which may be
uninteresting, in which case it is fine to skip).
>
> -test_expect_success 'fsck hard errors on an invalid object type' '
> +test_expect_success 'fsck error and recovery on invalid object type' '
> git init --bare garbage-type &&
> empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
> garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
> - cat >err.expect <<-\EOF &&
> - fatal: invalid object type
> - EOF
> - test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
> - test_cmp err.expect err.actual &&
> - test_must_be_empty out.actual
> + test_must_fail git -C garbage-type fsck >out 2>err &&
> + grep -e "^error" -e "^fatal" err >errors &&
> + test_line_count = 2 errors &&
> + grep "error: hash mismatch for" err &&
> + grep "$garbage_blob: object corrupt or missing:" err &&
> + grep "dangling blob $empty_blob" out
> '
Great.
Thanks,
Taylor
* [PATCH v6 20/22] object-store.h: move read_loose_object() below 'struct object_info'
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (18 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 19/22] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-07 10:58 ` [PATCH v6 21/22] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
` (3 subsequent siblings)
23 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Move the declaration of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it; moving it avoids the need for a forward declaration
of the struct.
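The C rule behind this: if a prototype mentions "struct object_info *"
before that struct tag has been declared at file scope, the tag's scope is
only the parameter list. GCC then warns that the struct "will not be
visible outside of this definition or declaration", and the later
definition is a different type. A minimal illustration with hypothetical
names (struct info, read_thing, read_other_thing) of the two ways around
it:

```c
/*
 * Option 1: a forward declaration puts the tag at file scope, so the
 * prototype refers to the same type as the definition further down.
 */
struct info;
int read_thing(const char *path, struct info *oi);

struct info {
	unsigned long size;
};

/*
 * Option 2 (the approach taken in this patch): declare functions that
 * take the struct after its definition, so no forward declaration is
 * needed at all.
 */
int read_other_thing(const char *path, struct info *oi);

int read_thing(const char *path, struct info *oi)
{
	(void)path; /* unused in this sketch */
	oi->size = 0;
	return 0;
}
```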
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-store.h | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/object-store.h b/object-store.h
index c268662f5ba..dc638335e7d 100644
--- a/object-store.h
+++ b/object-store.h
@@ -242,20 +242,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
int force_object_loose(const struct object_id *oid, time_t mtime);
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
- const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents,
- unsigned int oi_flags);
-
/* Retry packed storage after checking packed and loose storage */
#define HAS_OBJECT_RECHECK_PACKED 1
@@ -396,6 +382,20 @@ int oid_object_info_extended(struct repository *r,
const struct object_id *,
struct object_info *, unsigned flags);
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+ const struct object_id *expected_oid,
+ enum object_type *type,
+ unsigned long *size,
+ void **contents,
+ unsigned int oi_flags);
+
/*
* Iterate over the files in the loose-object parts of the object
* directory "path", triggering the following callbacks:
--
2.33.0.815.g21c7aaf6073
* [PATCH v6 21/22] fsck: report invalid types recorded in objects
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (19 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 20/22] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-17 3:57 ` Taylor Blau
2021-09-07 10:58 ` [PATCH v6 22/22] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
` (2 subsequent siblings)
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Continue the work in the preceding commit and improve the error on:
$ git hash-object --stdin -w -t garbage --literally </dev/null
$ git fsck
error: hash mismatch for <OID_PATH> (expected <OID>)
error: <OID>: object corrupt or missing: <OID_PATH>
[ other fsck output ]
To instead emit:
$ git fsck
error: <OID>: object is of unknown type 'garbage': <OID_PATH>
[ other fsck output ]
The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall through from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.
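Why fsck can now print the bogus type name: a loose object header has the
form "<type> <len>\0" (as documented for parse_loose_header()). A parser
can therefore copy out the type string even when it doesn't map to a
known type, and signal "unknown" with a negative type. The sketch below
is hypothetical (struct toy_info, parse_toy_header, and a fixed-size
buffer in place of Git's strbuf). It illustrates passing one info struct
with optional output fields, not Git's implementation:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of object types; negative means "unknown". */
enum toy_type { TOY_UNKNOWN = -1, TOY_COMMIT = 1, TOY_TREE, TOY_BLOB, TOY_TAG };

/* One struct carrying optional outputs, in the spirit of struct object_info. */
struct toy_info {
	enum toy_type *typep;   /* may be NULL */
	unsigned long *sizep;   /* may be NULL */
	char *type_name;        /* may be NULL; receives the raw type string */
	size_t type_name_len;   /* capacity of type_name, including the NUL */
};

static enum toy_type type_from_name(const char *name, size_t len)
{
	static const char *names[] = { "commit", "tree", "blob", "tag" };
	size_t i;

	for (i = 0; i < 4; i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (enum toy_type)(TOY_COMMIT + i);
	return TOY_UNKNOWN;
}

/*
 * Parse "<type> <len>" from a NUL-terminated header. Returns 0 for a
 * well-formed header (even if the type is unknown), -1 otherwise.
 */
int parse_toy_header(const char *hdr, struct toy_info *oi)
{
	const char *sp = strchr(hdr, ' ');
	char *end;
	unsigned long size;
	size_t len;

	if (!sp || sp == hdr)
		return -1;
	len = (size_t)(sp - hdr);

	if (sp[1] < '0' || sp[1] > '9')
		return -1;
	size = strtoul(sp + 1, &end, 10);
	if (*end != '\0')
		return -1;

	/* Keep the raw type name so the caller can report it. */
	if (oi->type_name && oi->type_name_len) {
		size_t n = len < oi->type_name_len - 1 ? len : oi->type_name_len - 1;
		memcpy(oi->type_name, hdr, n);
		oi->type_name[n] = '\0';
	}
	if (oi->typep)
		*oi->typep = type_from_name(hdr, len);
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}
```

With "garbage 0" this yields TOY_UNKNOWN together with the name "garbage",
which is what a caller needs to emit a message like "object is of unknown
type 'garbage'".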
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fsck.c | 22 ++++++++++++++++++----
object-file.c | 13 +++++--------
object-store.h | 4 ++--
t/t1450-fsck.sh | 24 +++++++++++++++++++++---
4 files changed, 46 insertions(+), 17 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 082dadd5629..07af0434db6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
unsigned long size;
void *contents;
int eaten;
-
- if (read_loose_object(path, oid, &type, &size, &contents,
- OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
- errors_found |= ERROR_OBJECT;
+ struct strbuf sb = STRBUF_INIT;
+ unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+ struct object_info oi;
+ int found = 0;
+ oi.type_name = &sb;
+ oi.sizep = &size;
+ oi.typep = &type;
+
+ if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+ found |= ERROR_OBJECT;
error(_("%s: object corrupt or missing: %s"),
oid_to_hex(oid), path);
+ }
+ if (type < 0) {
+ found |= ERROR_OBJECT;
+ error(_("%s: object is of unknown type '%s': %s"),
+ oid_to_hex(oid), sb.buf, path);
+ }
+ if (found) {
+ errors_found |= ERROR_OBJECT;
return 0; /* keep checking other objects */
}
diff --git a/object-file.c b/object-file.c
index 0e6937fad73..f4850ba62b4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2560,9 +2560,8 @@ static int check_stream_oid(git_zstream *stream,
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
void **contents,
+ struct object_info *oi,
unsigned int oi_flags)
{
int ret = -1;
@@ -2570,10 +2569,9 @@ int read_loose_object(const char *path,
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
- struct object_info oi = OBJECT_INFO_INIT;
int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
- oi.typep = type;
- oi.sizep = size;
+ enum object_type *type = oi->typep;
+ unsigned long *size = oi->sizep;
*contents = NULL;
@@ -2589,7 +2587,7 @@ int read_loose_object(const char *path,
goto out;
}
- if (parse_loose_header(hdr, &oi) < 0) {
+ if (parse_loose_header(hdr, oi) < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
@@ -2611,8 +2609,7 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size,
- type_name(*type))) {
+ *contents, *size, oi->type_name->buf)) {
error(_("hash mismatch for %s (expected %s)"), path,
oid_to_hex(expected_oid));
free(*contents);
diff --git a/object-store.h b/object-store.h
index dc638335e7d..f3045148b89 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
/*
* Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
* type, and size. If the object is a blob, then "contents" may return NULL,
* to allow streaming of large blobs.
*
@@ -391,9 +392,8 @@ int oid_object_info_extended(struct repository *r,
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
void **contents,
+ struct object_info *oi,
unsigned int oi_flags);
/*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index d8303db9709..da2658155c7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
)
'
+test_expect_success 'object with hash and type mismatch' '
+ git init --bare hash-type-mismatch &&
+ (
+ cd hash-type-mismatch &&
+ oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
+ test_must_fail git fsck 2>out &&
+ grep "^error: hash mismatch for " out &&
+ grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+ )
+'
+
test_expect_success 'branch pointing to non-commit' '
git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -869,9 +888,8 @@ test_expect_success 'fsck error and recovery on invalid object type' '
garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
test_must_fail git -C garbage-type fsck >out 2>err &&
grep -e "^error" -e "^fatal" err >errors &&
- test_line_count = 2 errors &&
- grep "error: hash mismatch for" err &&
- grep "$garbage_blob: object corrupt or missing:" err &&
+ test_line_count = 1 errors &&
+ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
grep "dangling blob $empty_blob" out
'
--
2.33.0.815.g21c7aaf6073
* Re: [PATCH v6 21/22] fsck: report invalid types recorded in objects
2021-09-07 10:58 ` [PATCH v6 21/22] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-09-17 3:57 ` Taylor Blau
0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17 3:57 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Continue the work in the preceding commit and improve the error on:
>
> $ git hash-object --stdin -w -t garbage --literally </dev/null
> $ git fsck
> error: hash mismatch for <OID_PATH> (expected <OID>)
> error: <OID>: object corrupt or missing: <OID_PATH>
> [ other fsck output ]
>
> To instead emit:
>
> $ git fsck
> error: <OID>: object is of unknown type 'garbage': <OID_PATH>
> [ other fsck output ]
>
> The complaint about a "hash mismatch" was simply an emergent property
> of how we'd fall through from read_loose_object() into fsck_loose()
> when we didn't get the data we expected. Now we'll correctly note that
> the object type is invalid.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> builtin/fsck.c | 22 ++++++++++++++++++----
> object-file.c | 13 +++++--------
> object-store.h | 4 ++--
> t/t1450-fsck.sh | 24 +++++++++++++++++++++---
> 4 files changed, 46 insertions(+), 17 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 082dadd5629..07af0434db6 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> unsigned long size;
> void *contents;
> int eaten;
> -
> - if (read_loose_object(path, oid, &type, &size, &contents,
> - OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
> - errors_found |= ERROR_OBJECT;
> + struct strbuf sb = STRBUF_INIT;
> + unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
> + struct object_info oi;
> + int found = 0;
> + oi.type_name = &sb;
> + oi.sizep = &size;
> + oi.typep = &type;
> +
> + if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
OK, now we pass a struct object_info instead of pointers to type and
size separately. Makes sense.
> + found |= ERROR_OBJECT;
And found tracks the error we found when trying to read this loose
object, if any. Having a separate variable makes sense, since we only
want to avoid calling fsck_obj() if we found any errors for this object
while trying to call read_loose_object().
> error(_("%s: object corrupt or missing: %s"),
> oid_to_hex(oid), path);
> + }
> + if (type < 0) {
> + found |= ERROR_OBJECT;
> + error(_("%s: object is of unknown type '%s': %s"),
> + oid_to_hex(oid), sb.buf, path);
> + }
> + if (found) {
> + errors_found |= ERROR_OBJECT;
Perhaps errors_found |= found ?
> return 0; /* keep checking other objects */
> }
>
> diff --git a/object-file.c b/object-file.c
> index 0e6937fad73..f4850ba62b4 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2560,9 +2560,8 @@ static int check_stream_oid(git_zstream *stream,
>
> int read_loose_object(const char *path,
> const struct object_id *expected_oid,
> - enum object_type *type,
> - unsigned long *size,
> void **contents,
> + struct object_info *oi,
> unsigned int oi_flags)
All of the changes in this function make perfect sense, except...
> {
> int ret = -1;
> @@ -2570,10 +2569,9 @@ int read_loose_object(const char *path,
> unsigned long mapsize;
> git_zstream stream;
> char hdr[MAX_HEADER_LEN];
> - struct object_info oi = OBJECT_INFO_INIT;
> int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
> - oi.typep = type;
> - oi.sizep = size;
> + enum object_type *type = oi->typep;
> + unsigned long *size = oi->sizep;
...I see that size is used in check_object_signature(), but I don't see
any uses for type. Am I missing it?
The tests look good to me.
Thanks,
Taylor
* [PATCH v6 22/22] fsck: report invalid object type-path combinations
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (20 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 21/22] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58 ` Ævar Arnfjörð Bjarmason
2021-09-17 4:06 ` Taylor Blau
2021-09-17 4:08 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
23 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Ævar Arnfjörð Bjarmason
Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.
Before this change we'd prefix the error with a not-an-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.
Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ mv objects/e6/ objects/e7
Would emit ("[...]" used to abbreviate the OIDs):
$ git fsck
error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]
Now we'll instead emit:
error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
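The old "e79d[...]" value comes from the loose object layout. The path is built from the first two hex digits of the OID as a fan-out directory and the remaining digits as the file name, so gluing the directory and file name back together after the move yields an "OID" nobody hashed. Below is a minimal sketch of that derivation. hex_from_loose_path() is a hypothetical helper, not git's code, and SHA-1 (40 hex characters) is assumed.

```c
#include <string.h>

#define HEXSZ 40 /* assumed SHA-1 hex length; SHA-256 would be 64 */

/*
 * Hypothetical helper: rebuild the hex "OID" implied by a loose object
 * path of the form ".../xx/yyyy...". Returns 0 on success, or -1 if the
 * path does not have the expected shape.
 */
static int hex_from_loose_path(const char *path, char out[HEXSZ + 1])
{
	const char *file = strrchr(path, '/');
	const char *dir;

	if (!file || file - path < 2)
		return -1;
	dir = file - 2; /* the two-hex-digit fan-out directory */
	file++;         /* skip the '/' separator */
	if (strlen(file) != HEXSZ - 2)
		return -1;
	memcpy(out, dir, 2);
	memcpy(out + 2, file, HEXSZ - 2);
	out[HEXSZ] = '\0';
	return 0;
}
```

Fed "./objects/e7/9de29bb2d1d6434b8b29ae775ad8c2e48c5391", the helper reconstructs "e79de29bb2d1d6434b8b29ae775ad8c2e48c5391", i.e. the bogus e79d[...] from the old error message. The real OID of the empty blob, e69d[...], is only recoverable by hashing the contents.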
Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ mv objects/83 objects/84
As noted in earlier commits, we'd simply die early in those cases
until preceding commits fixed the hard die on an invalid object type:
$ git fsck
fatal: invalid object type
Now we'll instead emit sensible error messages:
$ git fsck
error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]
In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or, worse,
misreport whatever issue we encountered. So let's use the new bug()
function to ferry a return code up to fsck_loose() in that case.
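The sentinel idea can be sketched in isolation. This is a simplified model, not git's code. struct oid, null_oid_value, check_signature() and classify() are hypothetical stand-ins for struct object_id, null_oid(), check_object_signature() and the fsck_loose() logic. The "hash" is a toy XOR fold rather than SHA-1, and assert() plays the role of BUG(). The caller pre-fills the out-parameter with the all-zero sentinel, so "still null" means "we never got far enough to hash". It reports a hash-path mismatch only when something was hashed and the result differs from the expected OID.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RAWSZ 20 /* assumed SHA-1 raw length */

/* Simplified stand-in for struct object_id. */
struct oid {
	unsigned char hash[RAWSZ];
};

static const struct oid null_oid_value; /* all zeros, like null_oid() */

static int oid_eq(const struct oid *a, const struct oid *b)
{
	return !memcmp(a->hash, b->hash, RAWSZ);
}

/*
 * Hypothetical analogue of check_object_signature(): "hash" buf into
 * *real_oidp (if the caller passed one) and compare with *expected.
 * Returns 0 on match, or -1 on mismatch or when there is nothing to hash.
 */
static int check_signature(const struct oid *expected, const unsigned char *buf,
			   size_t len, struct oid *real_oidp)
{
	struct oid tmp;
	struct oid *real = real_oidp ? real_oidp : &tmp; /* optional out-parameter */
	size_t i;

	if (!buf)
		return -1; /* never hashed: the caller's sentinel stays untouched */

	/* Toy XOR fold, NOT a real digest; the marker byte keeps it non-null. */
	memset(real, 0, sizeof(*real));
	real->hash[RAWSZ - 1] = 1;
	for (i = 0; i < len; i++)
		real->hash[i % (RAWSZ - 1)] ^= buf[i];

	return oid_eq(expected, real) ? 0 : -1;
}

enum verdict { VERDICT_OK, VERDICT_HASH_PATH_MISMATCH, VERDICT_CORRUPT };

static enum verdict classify(const struct oid *expected,
			     const unsigned char *buf, size_t len)
{
	struct oid real = null_oid_value; /* sentinel: "did not get far enough to hash" */

	if (!check_signature(expected, buf, len, &real))
		return VERDICT_OK;

	/* Analogue of the BUG() check: with contents, a failure must have hashed. */
	assert(!buf || !oid_eq(&real, &null_oid_value));

	if (!oid_eq(&real, &null_oid_value) && !oid_eq(&real, expected))
		return VERDICT_HASH_PATH_MISMATCH;
	return VERDICT_CORRUPT;
}
```

Callers that don't care about the real OID pass NULL, which is why the other check_object_signature() call sites in this patch only gain a trailing NULL argument.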
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fast-export.c | 2 +-
builtin/fsck.c | 13 +++++++++----
builtin/index-pack.c | 2 +-
builtin/mktag.c | 3 ++-
object-file.c | 21 ++++++++++++---------
object-store.h | 4 +++-
object.c | 4 ++--
pack-check.c | 3 ++-
t/t1006-cat-file.sh | 2 +-
t/t1450-fsck.sh | 8 +++++---
10 files changed, 38 insertions(+), 24 deletions(-)
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3c20f164f0f..48a3b6a7f8f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
if (!buf)
die("could not read blob %s", oid_to_hex(oid));
if (check_object_signature(the_repository, oid, buf, size,
- type_name(type)) < 0)
+ type_name(type), NULL) < 0)
die("oid mismatch in blob %s", oid_to_hex(oid));
object = parse_object_buffer(the_repository, oid, type,
size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 07af0434db6..158b9dac9b3 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
struct strbuf sb = STRBUF_INIT;
unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
struct object_info oi;
+ struct object_id real_oid = *null_oid();
int found = 0;
oi.type_name = &sb;
oi.sizep = &size;
oi.typep = &type;
- if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+ if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
found |= ERROR_OBJECT;
- error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
+ if (!oideq(&real_oid, oid))
+ error(_("%s: hash-path mismatch, found at: %s"),
+ oid_to_hex(&real_oid), path);
+ else
+ error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
}
if (type < 0) {
found |= ERROR_OBJECT;
error(_("%s: object is of unknown type '%s': %s"),
- oid_to_hex(oid), sb.buf, path);
+ oid_to_hex(&real_oid), sb.buf, path);
}
if (found) {
errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8336466865c..9f540e0236a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1419,7 +1419,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
if (check_object_signature(the_repository, &d->oid,
data, size,
- type_name(type)))
+ type_name(type), NULL))
die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
repl = lookup_replace_object(the_repository, tagged_oid);
ret = check_object_signature(the_repository, repl,
- buffer, size, type_name(*tagged_type));
+ buffer, size, type_name(*tagged_type),
+ NULL);
free(buffer);
return ret;
diff --git a/object-file.c b/object-file.c
index f4850ba62b4..07b3e4d9b4b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1062,9 +1062,11 @@ void *xmmap(void *start, size_t length,
* the streaming interface and rehash it to do the same.
*/
int check_object_signature(struct repository *r, const struct object_id *oid,
- void *map, unsigned long size, const char *type)
+ void *map, unsigned long size, const char *type,
+ struct object_id *real_oidp)
{
- struct object_id real_oid;
+ struct object_id tmp;
+ struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
enum object_type obj_type;
struct git_istream *st;
git_hash_ctx c;
@@ -1072,8 +1074,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
int hdrlen;
if (map) {
- hash_object_file(r->hash_algo, map, size, type, &real_oid);
- return !oideq(oid, &real_oid) ? -1 : 0;
+ hash_object_file(r->hash_algo, map, size, type, real_oid);
+ return !oideq(oid, real_oid) ? -1 : 0;
}
st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1098,9 +1100,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
break;
r->hash_algo->update_fn(&c, buf, readlen);
}
- r->hash_algo->final_oid_fn(&real_oid, &c);
+ r->hash_algo->final_oid_fn(real_oid, &c);
close_istream(st);
- return !oideq(oid, &real_oid) ? -1 : 0;
+ return !oideq(oid, real_oid) ? -1 : 0;
}
int git_open_cloexec(const char *name, int flags)
@@ -2560,6 +2562,7 @@ static int check_stream_oid(git_zstream *stream,
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
struct object_info *oi,
unsigned int oi_flags)
@@ -2609,9 +2612,9 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size, oi->type_name->buf)) {
- error(_("hash mismatch for %s (expected %s)"), path,
- oid_to_hex(expected_oid));
+ *contents, *size, oi->type_name->buf, real_oid)) {
+ if (oideq(real_oid, null_oid()))
+ BUG("should only get OID mismatch errors with mapped contents");
free(*contents);
goto out;
}
diff --git a/object-store.h b/object-store.h
index f3045148b89..3c4ada23f5d 100644
--- a/object-store.h
+++ b/object-store.h
@@ -392,6 +392,7 @@ int oid_object_info_extended(struct repository *r,
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
struct object_info *oi,
unsigned int oi_flags);
@@ -528,7 +529,8 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
int parse_loose_header(const char *hdr, struct object_info *oi);
int check_object_signature(struct repository *r, const struct object_id *oid,
- void *buf, unsigned long size, const char *type);
+ void *buf, unsigned long size, const char *type,
+ struct object_id *real_oidp);
int finalize_object_file(const char *tmpfile, const char *filename);
int check_and_freshen_file(const char *fn, int freshen);
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
(!obj && repo_has_object_file(r, oid) &&
oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
- if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+ if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
error(_("hash mismatch %s"), oid_to_hex(oid));
return NULL;
}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
buffer = repo_read_object_file(r, oid, &type, &size);
if (buffer) {
if (check_object_signature(r, repl, buffer, size,
- type_name(type)) < 0) {
+ type_name(type), NULL) < 0) {
free(buffer);
error(_("hash mismatch %s"), oid_to_hex(repl));
return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
oid_to_hex(&oid), p->pack_name,
(uintmax_t)entries[i].offset);
- else if (check_object_signature(r, &oid, data, size, type_name(type)))
+ else if (check_object_signature(r, &oid, data, size,
+ type_name(type), NULL))
err = error("packed %s from %s is corrupt",
oid_to_hex(&oid), p->pack_name);
else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 43a9f4e7f0c..39fe11bc92c 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -490,7 +490,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
# Swap the two to corrupt the repository
mv -f "$other_path" "$empty_path" &&
test_must_fail git fsck 2>err.fsck &&
- grep "hash mismatch" err.fsck &&
+ grep "hash-path mismatch" err.fsck &&
# confirm that cat-file is reading the new swapped-in
# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index da2658155c7..7d0d57564b5 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
(
cd hash-mismatch &&
oid=$(echo blob | git hash-object -w --stdin) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
cmt=$(echo bogus | git commit-tree $tree) &&
git update-ref refs/heads/bogus $cmt &&
test_must_fail git fsck 2>out &&
- test_i18ngrep "$oid.*corrupt" out
+ grep "$oldoid: hash-path mismatch, found at: .*$new" out
)
'
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
(
cd hash-type-mismatch &&
oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
cmt=$(echo bogus | git commit-tree $tree) &&
git update-ref refs/heads/bogus $cmt &&
test_must_fail git fsck 2>out &&
- grep "^error: hash mismatch for " out &&
- grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+ grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+ grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
)
'
--
2.33.0.815.g21c7aaf6073
* Re: [PATCH v6 22/22] fsck: report invalid object type-path combinations
2021-09-07 10:58 ` [PATCH v6 22/22] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-17 4:06 ` Taylor Blau
0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17 4:06 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:58:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Improve the error that's emitted in cases where we find a loose object
> we parse, but which isn't at the location we expect it to be.
>
> Before this change we'd prefix the error with a not-a-OID derived from
> the path at which the object was found, due to an emergent behavior in
> how we'd end up with an "OID" in these codepaths.
>
> Now we'll instead say what object we hashed, and what path it was
> found at. Before this patch series e.g.:
>
> $ git hash-object --stdin -w -t blob </dev/null
> e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
> $ mv objects/e6/ objects/e7
>
> Would emit ("[...]" used to abbreviate the OIDs):
>
> git fsck
> error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
> error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]
>
> Now we'll instead emit:
>
> error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
Lovely!
> @@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> struct strbuf sb = STRBUF_INIT;
> unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
> struct object_info oi;
> + struct object_id real_oid = *null_oid();
> int found = 0;
> oi.type_name = &sb;
> oi.sizep = &size;
> oi.typep = &type;
>
> - if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
> + if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
> found |= ERROR_OBJECT;
> - error(_("%s: object corrupt or missing: %s"),
> - oid_to_hex(oid), path);
> + if (!oideq(&real_oid, oid))
> + error(_("%s: hash-path mismatch, found at: %s"),
> + oid_to_hex(&real_oid), path);
> + else
> + error(_("%s: object corrupt or missing: %s"),
> + oid_to_hex(oid), path);
Nice; this is the important part that this patch is changing, and the
logic is very nice. It goes from "anytime read_loose_object fails,
it's an error" to "it's still an error, but we can handle the case where
the real OID and the one we expected were different separately from
generic corruption".
> }
> if (type < 0) {
> found |= ERROR_OBJECT;
> error(_("%s: object is of unknown type '%s': %s"),
> - oid_to_hex(oid), sb.buf, path);
> + oid_to_hex(&real_oid), sb.buf, path);
Could go either way on this hunk, but I think that I err slightly on
your side now that we have access to the "real_oid".
The rest of the code and test changes in this patch look good to me.
Thanks,
Taylor
* Re: [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (21 preceding siblings ...)
2021-09-07 10:58 ` [PATCH v6 22/22] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-17 4:08 ` Taylor Blau
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
23 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-17 4:08 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 07, 2021 at 12:57:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
> This improves fsck error reporting, see the examples in the commit
> messages of 19/22, 21/22 and 22/22. To get there I've lib-ified more
> things in object-file.c and the general object APIs, i.e. now we'll
> return error codes instead of calling die() in these cases.
>
> This series has been in "needs review" state for a while. This re-roll
> is mainly to bump it for the list's attention, but while I was at it I
> addressed a point Jonathan Tan raised in a previous round: use an
> enum instead of int for the unpack_loose_header() return value.
I took a thorough look through this series, and left a handful of minor
comments. I didn't spot any glaring issues, and think that this series
is in pretty good shape.
I do admit there was quite a large number of patches to get to the
couple of changes at the end. I left some thoughts throughout about places
where I would have combined things or presented them in a different
order.
I don't think you should spend much time changing the structure now that
it's been looked at closely; these are just some idle thoughts for
other large series you might send in the future.
Thanks,
Taylor
* [PATCH v7 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
2021-09-07 10:57 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (22 preceding siblings ...)
2021-09-17 4:08 ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
` (17 more replies)
23 siblings, 18 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
This improves fsck error reporting, see the examples in the commit
messages of 16/17 and 17/17. To get there I've lib-ified more things
in object-file.c and the general object APIs, i.e. now we'll return
error codes instead of calling die() in these cases.
v6 of this got a very detailed review from Taylor Blau (thanks a
lot!); for v6 see:
https://lore.kernel.org/git/cover-v6-00.22-00000000000-20210907T104558Z-avarab@gmail.com/
This should address all of the things brought up, and more. After
leaving this series for a while I came up with ways to simplify it
even more, so now it's 17 instead of 22 patches!
So things like:
* The move of functions from cache.h to object-store.h is gone, that
still makes sense to do, but can be left for later.
* A large part of the mid-series is squashed together and
re-arranged, e.g. I moved the migration of unpack_loose_header() to
an enum earlier, which simplified later steps. The 15/17 rewrite
of much of parse_loose_header() is now much simpler.
* I attempted to address the comments about the tests with some
for-loops and boilerplate-generated testing, maybe some of it's a
bit too ugly, but it's both less copy/pasting now, and more cases
(e.g. the "cat-file -p" case) are tested.
* We now test for what happens/error reporting when we append garbage
to a loose object.
* Many more small changes / improvements / simplifications, see the
range-diff below, but given its size perhaps a re-read is easier...
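As a rough illustration of the enum-return approach (returning an error code the caller can act on instead of calling die()), here is a hypothetical header checker. The names HDR_OK, HDR_BAD, HDR_TOO_LONG, parse_header(), describe() and the MAX_HDR_LEN limit are all made up for this sketch. They are modeled on, but are not, the series' unpack_loose_header() and its ULHR_* values (ULHR_TOO_LONG is named in the patch list above). The sketch assumes a loose object header of the form "<type> <size>" followed by a NUL.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HDR_LEN 32 /* assumed limit for this sketch */

/* Hypothetical result codes, modeled on the series' ULHR_* enum. */
enum hdr_result {
	HDR_OK,
	HDR_BAD,
	HDR_TOO_LONG,
};

/*
 * Hypothetical: check that buf (len bytes) starts with a
 * "<type> <size>\0" header and copy it into hdr. Instead of die()-ing,
 * report what went wrong and let the caller decide.
 */
static enum hdr_result parse_header(const char *buf, size_t len,
				    char hdr[MAX_HDR_LEN])
{
	const char *nul = memchr(buf, '\0', len < MAX_HDR_LEN ? len : MAX_HDR_LEN);

	if (!nul)
		return len >= MAX_HDR_LEN ? HDR_TOO_LONG : HDR_BAD;
	if (!memchr(buf, ' ', (size_t)(nul - buf)))
		return HDR_BAD; /* no "<type> <size>" separator */
	memcpy(hdr, buf, (size_t)(nul - buf) + 1);
	return HDR_OK;
}

/* A caller can map each case to its own message instead of dying. */
static const char *describe(enum hdr_result r)
{
	switch (r) {
	case HDR_OK:
		return "ok";
	case HDR_TOO_LONG:
		return "header too long";
	case HDR_BAD:
	default:
		return "unable to parse header";
	}
}
```

Using an enum rather than a bare int means a "too long" header can be reported distinctly from a plain parse failure, and the compiler can warn about unhandled cases in a switch.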
Ævar Arnfjörð Bjarmason (17):
fsck tests: add test for fsck-ing an unknown type
fsck tests: refactor one test to use a sub-repo
fsck tests: test current hash/type mismatch behavior
fsck tests: test for garbage appended to a loose object
cat-file tests: move bogus_* variable declarations earlier
cat-file tests: test for missing/bogus object with -t, -s and -p
cat-file tests: add corrupt loose object test
cat-file tests: test for current --allow-unknown-type behavior
object-file.c: don't set "typep" when returning non-zero
object-file.c: return -1, not "status" from unpack_loose_header()
object-file.c: make parse_loose_header_extended() public
object-file.c: simplify unpack_loose_short_header()
object-file.c: use "enum" return type for unpack_loose_header()
object-file.c: return ULHR_TOO_LONG on "header too long"
object-file.c: stop dying in parse_loose_header()
fsck: don't hard die on invalid object types
fsck: report invalid object type-path combinations
builtin/fast-export.c | 2 +-
builtin/fsck.c | 28 +++++-
builtin/index-pack.c | 2 +-
builtin/mktag.c | 3 +-
cache.h | 45 ++++++++-
object-file.c | 176 +++++++++++++++------------------
object-store.h | 7 +-
object.c | 4 +-
pack-check.c | 3 +-
streaming.c | 27 +++--
t/oid-info/oid | 2 +
t/t1006-cat-file.sh | 223 +++++++++++++++++++++++++++++++++++++++---
t/t1450-fsck.sh | 99 +++++++++++++++----
13 files changed, 463 insertions(+), 158 deletions(-)
Range-diff against v6:
2: 9072eef3be3 ! 1: 752cef556c2 fsck tests: add test for fsck-ing an unknown type
@@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
+test_expect_success 'fsck hard errors on an invalid object type' '
+ git init --bare garbage-type &&
-+ empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
-+ garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
-+ cat >err.expect <<-\EOF &&
-+ fatal: invalid object type
-+ EOF
-+ test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
-+ test_cmp err.expect err.actual &&
-+ test_must_be_empty out.actual
++ (
++ cd garbage-type &&
++
++ empty=$(git hash-object --stdin -w -t blob </dev/null) &&
++ garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
++
++ cat >err.expect <<-\EOF &&
++ fatal: invalid object type
++ EOF
++ test_must_fail git fsck >out 2>err &&
++ test_cmp err.expect err &&
++ test_must_be_empty out
++ )
+'
+
test_done
1: ebe89f65354 ! 2: 612003bdd2c fsck tests: refactor one test to use a sub-repo
@@ Commit message
teardown of a tests so we're not leaving corrupt content for the next
test.
- We should instead simply use something like this test_create_repo
- pattern. It's both less verbose, and makes things easier to debug as a
- failing test can have their state left behind under -d without
- damaging the state for other tests.
+ We can instead use the pattern of creating a named sub-repository,
+ then we don't have to worry about cleaning up after ourselves, nobody
+ will care what state the broken "hash-mismatch" repository is after
+ this test runs.
- But let's punt on that general refactoring and just change this one
- test, I'm going to change it further in subsequent commits.
+ See [1] for related discussion on various "modern" test patterns that
+ can be used to avoid verbosity and increase reliability.
+
+ 1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
@@ t/t1450-fsck.sh: remove_object () {
- test_when_finished "remove_object $cmt" &&
- git update-ref refs/heads/bogus $cmt &&
- test_when_finished "git update-ref -d refs/heads/bogus" &&
--
-- test_must_fail git fsck 2>out &&
-- test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+ git init --bare hash-mismatch &&
+ (
+ cd hash-mismatch &&
+
+- test_must_fail git fsck 2>out &&
+- test_i18ngrep "$sha.*corrupt" out
+ oid=$(echo blob | git hash-object -w --stdin) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
++
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
++
+ test_must_fail git fsck 2>out &&
-+ test_i18ngrep "$oid.*corrupt" out
++ grep "$oid.*corrupt" out
+ )
'
3: d442a309178 ! 3: 1e40a4235e9 cat-file tests: test for missing object with -t and -s
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- cat-file tests: test for missing object with -t and -s
+ fsck tests: test current hash/type mismatch behavior
- Test for what happens when the -t and -s flags are asked to operate on
- a missing object, this extends tests added in 3e370f9faf0 (t1006: add
- tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
- -s flags are the only ones that can be combined with
- --allow-unknown-type, so let's test with and without that flag.
+ If fsck we move an object around between .git/objects/?? directories
+ to simulate a hash mismatch "git fsck" will currently hard die() in
+ object-file.c. This behavior will be fixed in subsequent commits, but
+ let's test for it as-is for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
- ## t/t1006-cat-file.sh ##
-@@ t/t1006-cat-file.sh: test_expect_success '%(deltabase) reports packed delta bases' '
- }
+ ## t/t1450-fsck.sh ##
+@@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
+ )
'
-+missing_oid=$(test_oid deadbeef)
-+test_expect_success 'error on type of missing object' '
-+ cat >expect.err <<-\EOF &&
-+ fatal: git cat-file: could not get object info
-+ EOF
-+ test_must_fail git cat-file -t $missing_oid >out 2>err &&
-+ test_must_be_empty out &&
-+ test_cmp expect.err err &&
++test_expect_success 'object with hash and type mismatch' '
++ git init --bare hash-type-mismatch &&
++ (
++ cd hash-type-mismatch &&
+
-+ test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
-+ test_must_be_empty out &&
-+ test_cmp expect.err err
-+'
++ oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
++ old=$(test_oid_to_path "$oid") &&
++ new=$(dirname $old)/$(test_oid ff_2) &&
++ oid="$(dirname $new)$(basename $new)" &&
+
-+test_expect_success 'error on size of missing object' '
-+ cat >expect.err <<-\EOF &&
-+ fatal: git cat-file: could not get object info
-+ EOF
-+ test_must_fail git cat-file -s $missing_oid >out 2>err &&
-+ test_must_be_empty out &&
-+ test_cmp expect.err err &&
++ mv objects/$old objects/$new &&
++ git update-index --add --cacheinfo 100644 $oid foo &&
++ tree=$(git write-tree) &&
++ cmt=$(echo bogus | git commit-tree $tree) &&
++ git update-ref refs/heads/bogus $cmt &&
+
-+ test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
-+ test_must_be_empty out &&
-+ test_cmp expect.err err
++ cat >expect <<-\EOF &&
++ fatal: invalid object type
++ EOF
++ test_must_fail git fsck 2>actual &&
++ test_cmp expect actual
++ )
+'
+
- bogus_type="bogus"
- bogus_content="bogus"
- bogus_size=$(strlen "$bogus_content")
+ test_expect_success 'branch pointing to non-commit' '
+ git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
+ test_when_finished "git update-ref -d refs/heads/invalid" &&
5: 82db40ebf8a ! 4: 854991c1543 rev-list tests: test for behavior with invalid object types
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- rev-list tests: test for behavior with invalid object types
+ fsck tests: test for garbage appended to a loose object
- Fix a blindspot in the tests for the "rev-list --disk-usage" feature
- added in 16950f8384a (rev-list: add --disk-usage option for
- calculating disk usage, 2021-02-09) to test for what happens when it's
- asked to calculate the disk usage of invalid object types.
+ There wasn't any output tests for this scenario, let's ensure that we
+ don't regress on it in the changes that come after this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
- ## t/t6115-rev-list-du.sh ##
-@@ t/t6115-rev-list-du.sh: check_du HEAD
- check_du --objects HEAD
- check_du --objects HEAD^..HEAD
+ ## t/t1450-fsck.sh ##
+@@ t/t1450-fsck.sh: test_expect_success 'object with hash and type mismatch' '
+ )
+ '
-+test_expect_success 'setup garbage repository' '
-+ git clone --bare . garbage.git &&
-+ garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
-+ git -C garbage.git rev-list --objects --all --disk-usage &&
++test_expect_success POSIXPERM 'zlib corrupt loose object output ' '
++ git init --bare corrupt-loose-output &&
++ (
++ cd corrupt-loose-output &&
++ oid=$(git hash-object -w --stdin --literally </dev/null) &&
++ oidf=objects/$(test_oid_to_path "$oid") &&
++ chmod 755 $oidf &&
++ echo extra garbage >>$oidf &&
+
-+ # Manually create a ref because "update-ref", "tag" etc. have
-+ # no corresponding --literally option.
-+ echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
-+ test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
++ cat >expect.error <<-EOF &&
++ error: garbage at end of loose object '\''$oid'\''
++ error: unable to unpack contents of ./$oidf
++ error: $oid: object corrupt or missing: ./$oidf
++ EOF
++ test_must_fail git fsck 2>actual &&
++ grep ^error: actual >error &&
++ test_cmp expect.error error
++ )
+'
+
- test_done
+ test_expect_success 'branch pointing to non-commit' '
+ git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
+ test_when_finished "git update-ref -d refs/heads/invalid" &&
19: ad1614dbb8d ! 5: fc93c2c2530 fsck: don't hard die on invalid object types
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- fsck: don't hard die on invalid object types
+ cat-file tests: move bogus_* variable declarations earlier
- Change the error fsck emits on invalid object types, such as:
-
- $ git hash-object --stdin -w -t garbage --literally </dev/null
- <OID>
-
- From the very ungraceful error of:
-
- $ git fsck
- fatal: invalid object type
- $
-
- To:
-
- $ git fsck
- error: hash mismatch for <OID_PATH> (expected <OID>)
- error: <OID>: object corrupt or missing: <OID_PATH>
- [ the rest of the fsck output here, i.e. it didn't hard die ]
-
- We'll still exit with non-zero, but now we'll finish the rest of the
- traversal. The tests that's being added here asserts that we'll still
- complain about other fsck issues (e.g. an unrelated dangling blob).
-
- To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
- flag from read_loose_object() through to parse_loose_header(). Since
- the read_loose_object() function is only used in builtin/fsck.c we can
- simply change it. See f6371f92104 (sha1_file: add read_loose_object()
- function, 2017-01-13) for the introduction of read_loose_object().
-
- Why are we complaining about a "hash mismatch" for an object of a type
- we don't know about? We shouldn't. This is the bare minimal change
- needed to not make fsck hard die on a repository that's been corrupted
- in this manner. In subsequent commits we'll teach fsck to recognize
- this particular type of corruption and emit a better error message.
+ Change the short/long bogus bogus object type variables into a form
+ where the two sets can be used concurrently. This'll be used by
+ subsequently added tests.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
- ## builtin/fsck.c ##
-@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
- void *contents;
- int eaten;
+ ## t/t1006-cat-file.sh ##
+@@ t/t1006-cat-file.sh: test_expect_success '%(deltabase) reports packed delta bases' '
+ }
+ '
-- if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
-+ if (read_loose_object(path, oid, &type, &size, &contents,
-+ OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
- errors_found |= ERROR_OBJECT;
- error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
-
- ## object-file.c ##
-@@ object-file.c: int read_loose_object(const char *path,
- const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
-- void **contents)
-+ void **contents,
-+ unsigned int oi_flags)
- {
- int ret = -1;
- void *map = NULL;
-@@ object-file.c: int read_loose_object(const char *path,
- git_zstream stream;
- char hdr[MAX_HEADER_LEN];
- struct object_info oi = OBJECT_INFO_INIT;
-+ int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
- oi.typep = type;
- oi.sizep = size;
+-bogus_type="bogus"
+-bogus_content="bogus"
+-bogus_size=$(strlen "$bogus_content")
+-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
++test_expect_success 'setup bogus data' '
++ bogus_short_type="bogus" &&
++ bogus_short_content="bogus" &&
++ bogus_short_size=$(strlen "$bogus_short_content") &&
++ bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
++
++ bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
++ bogus_long_content="bogus" &&
++ bogus_long_size=$(strlen "$bogus_long_content") &&
++ bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
++'
-@@ object-file.c: int read_loose_object(const char *path,
- git_inflate_end(&stream);
- goto out;
- }
-- if (*type < 0)
-- die(_("invalid object type"));
-+ if (!allow_unknown && *type < 0) {
-+ error(_("header for %s declares an unknown type"), path);
-+ git_inflate_end(&stream);
-+ goto out;
-+ }
+ test_expect_success "Type of broken object is correct" '
+- echo $bogus_type >expect &&
+- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
++ echo $bogus_short_type >expect &&
++ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual
+ '
- if (*type == OBJ_BLOB && *size > big_file_threshold) {
- if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
-
- ## object-store.h ##
-@@ object-store.h: int read_loose_object(const char *path,
- const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
-- void **contents);
-+ void **contents,
-+ unsigned int oi_flags);
+ test_expect_success "Size of broken object is correct" '
+- echo $bogus_size >expect &&
+- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
++ echo $bogus_short_size >expect &&
++ git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual
+ '
+-bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
+-bogus_content="bogus"
+-bogus_size=$(strlen "$bogus_content")
+-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
- /* Retry packed storage after checking packed and loose storage */
- #define HAS_OBJECT_RECHECK_PACKED 1
-
- ## t/t1450-fsck.sh ##
-@@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
- test_i18ngrep "bad index file" errors
+ test_expect_success "Type of broken object is correct when type is large" '
+- echo $bogus_type >expect &&
+- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
++ echo $bogus_long_type >expect &&
++ git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
+ test_cmp expect actual
'
--test_expect_success 'fsck hard errors on an invalid object type' '
-+test_expect_success 'fsck error and recovery on invalid object type' '
- git init --bare garbage-type &&
- empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
- garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
-- cat >err.expect <<-\EOF &&
-- fatal: invalid object type
-- EOF
-- test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
-- test_cmp err.expect err.actual &&
-- test_must_be_empty out.actual
-+ test_must_fail git -C garbage-type fsck >out 2>err &&
-+ grep -e "^error" -e "^fatal" err >errors &&
-+ test_line_count = 2 errors &&
-+ grep "error: hash mismatch for" err &&
-+ grep "$garbage_blob: object corrupt or missing:" err &&
-+ grep "dangling blob $empty_blob" out
+ test_expect_success "Size of large broken object is correct when type is large" '
+- echo $bogus_size >expect &&
+- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
++ echo $bogus_long_size >expect &&
++ git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
+ test_cmp expect actual
'
- test_done
4: 0358273022f ! 6: 051088aa114 cat-file tests: test that --allow-unknown-type isn't on by default
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- cat-file tests: test that --allow-unknown-type isn't on by default
+ cat-file tests: test for missing/bogus object with -t, -s and -p
- Fix a blindspot in the tests for the --allow-unknown-type feature
- added in 39e4ae38804 (cat-file: teach cat-file a
- '--allow-unknown-type' option, 2015-05-03). We should check that
- --allow-unknown-type isn't on by default.
+ When we look up a missing object with cat_one_file() what error we
+ print out currently depends on whether we'll error out early in
+ get_oid_with_context(), or if we'll get an error later from
+ oid_object_info_extended().
- Before this change all the tests would succeed if --allow-unknown-type
- was on by default, let's fix that by asserting that -t and -s die on a
- "garbage" type without --allow-unknown-type.
+ The --allow-unknown-type flag then changes whether we pass the
+ "OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or
+ not.
+
+ The "-p" flag is yet another special-case in printing the same output
+ on the deadbeef OID as we'd emit on the deadbeef_short OID for the
+ "-s" and "-t" options, it also doesn't support the
+ "--allow-unknown-type" flag at all.
+
+ Let's test the combination of the two sets of [-t, -s, -p] and
+ [--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
+ in not supplying it), as well as a [missing,bogus] object pair.
+
+ This extends tests added in 3e370f9faf0 (t1006: add tests for git
+ cat-file --allow-unknown-type, 2015-05-03).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
+ ## t/oid-info/oid ##
+@@ t/oid-info/oid: numeric sha1:0123456789012345678901234567890123456789
+ numeric sha256:0123456789012345678901234567890123456789012345678901234567890123
+ deadbeef sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+ deadbeef sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
++deadbeef_short sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
++deadbeef_short sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
+
## t/t1006-cat-file.sh ##
-@@ t/t1006-cat-file.sh: bogus_content="bogus"
- bogus_size=$(strlen "$bogus_content")
- bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+@@ t/t1006-cat-file.sh: test_expect_success 'setup bogus data' '
+ bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+ '
-+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
-+ cat >err.expect <<-\EOF &&
-+ fatal: invalid object type
-+ EOF
++for arg1 in '' --allow-unknown-type
++do
++ for arg2 in -s -t -p
++ do
++ if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p"
++ then
++ continue
++ fi
++
+
-+ test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
-+ test_cmp err.expect err.actual &&
-+ test_must_be_empty out.actual &&
++ test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" '
++ cat >expect <<-\EOF &&
++ fatal: invalid object type
++ EOF
+
-+ test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
-+ test_cmp err.expect err.actual &&
-+ test_must_be_empty out.actual
-+'
++ if test "$arg1" = "--allow-unknown-type"
++ then
++ git cat-file $arg1 $arg2 $bogus_short_sha1
++ else
++ test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
++ test_must_be_empty out &&
++ test_cmp expect actual
++ fi
++ '
++
++ test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" '
++ if test "$arg2" = "-p"
++ then
++ cat >expect <<-EOF
++ error: unable to unpack $bogus_long_sha1 header
++ fatal: Not a valid object name $bogus_long_sha1
++ EOF
++ else
++ cat >expect <<-EOF
++ error: unable to unpack $bogus_long_sha1 header
++ fatal: git cat-file: could not get object info
++ EOF
++ fi &&
++
++ if test "$arg1" = "--allow-unknown-type"
++ then
++ git cat-file $arg1 $arg2 $bogus_short_sha1
++ else
++ test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
++ test_must_be_empty out &&
++ test_cmp expect actual
++ fi
++ '
++
++ test_expect_success "cat-file $arg1 $arg2 error on missing short OID" '
++ cat >expect.err <<-EOF &&
++ fatal: Not a valid object name $(test_oid deadbeef_short)
++ EOF
++ test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual &&
++ test_must_be_empty out
++ '
++
++ test_expect_success "cat-file $arg1 $arg2 error on missing full OID" '
++ if test "$arg2" = "-p"
++ then
++ cat >expect.err <<-EOF
++ fatal: Not a valid object name $(test_oid deadbeef)
++ EOF
++ else
++ cat >expect.err <<-\EOF
++ fatal: git cat-file: could not get object info
++ EOF
++ fi &&
++ test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual &&
++ test_must_be_empty out &&
++ test_cmp expect.err err.actual
++ '
++ done
++done
+
test_expect_success "Type of broken object is correct" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
-@@ t/t1006-cat-file.sh: bogus_content="bogus"
- bogus_size=$(strlen "$bogus_content")
- bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
-
-+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
-+ cat >err.expect <<-EOF &&
-+ error: unable to unpack $bogus_sha1 header
-+ fatal: git cat-file: could not get object info
-+ EOF
-+
-+ test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
-+ test_cmp err.expect err.actual &&
-+ test_must_be_empty out.actual &&
-+
-+ test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
-+ test_cmp err.expect err.actual &&
-+ test_must_be_empty out.actual
-+'
-+
- test_expect_success "Type of broken object is correct when type is large" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_short_type >expect &&
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
6: d1ffd21acc5 = 7: 20bd81c1af0 cat-file tests: add corrupt loose object test
7: 22ab12c2282 ! 8: cd1d52b8a07 cat-file tests: test for current --allow-unknown-type behavior
@@ Commit message
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## t/t1006-cat-file.sh ##
-@@ t/t1006-cat-file.sh: test_expect_success 'die on broken object under -t and -s without --allow-unknow
- test_must_be_empty out.actual
- '
+@@ t/t1006-cat-file.sh: do
+ done
+ done
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
-+ git cat-file -e $bogus_sha1
++ git cat-file -e $bogus_short_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
-+ test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
++ test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
-+ test_must_fail git cat-file -p $bogus_sha1 &&
-+ test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
++ test_must_fail git cat-file -p $bogus_short_sha1 &&
++ test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type "bogus"
+ EOF
-+ test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
++ test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
-+ echo $bogus_sha1 >bogus-oid &&
++ echo $bogus_short_sha1 >bogus-oid &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
@@ t/t1006-cat-file.sh: test_expect_success 'die on broken object under -t and -s w
+ test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+ test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
-+
- test_expect_success "Type of broken object is correct" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
-@@ t/t1006-cat-file.sh: test_expect_success "Size of broken object is correct" '
- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
- test_cmp expect actual
- '
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+ cat >expect <<-EOF &&
-+ $bogus_type
++ $bogus_short_type
+ EOF
-+ git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
++ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual &&
+
+ # Create it manually, as "git replace" will die on bogus
+ # types.
+ head=$(git rev-parse --verify HEAD) &&
++ test_when_finished "rm -rf .git/refs/replace" &&
+ mkdir -p .git/refs/replace &&
-+ echo $head >.git/refs/replace/$bogus_sha1 &&
++ echo $head >.git/refs/replace/$bogus_short_sha1 &&
+
+ cat >expect <<-EOF &&
+ commit
+ EOF
-+ git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
++ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual
+'
+
- bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
- bogus_content="bogus"
- bogus_size=$(strlen "$bogus_content")
+ test_expect_success "Type of broken object is correct" '
+ echo $bogus_short_type >expect &&
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
8: 38e4266772d = 9: d9f5adfc74b object-file.c: don't set "typep" when returning non-zero
9: 5b9278e7bb4 < -: ----------- cache.h: move object functions to object-store.h
16: 9e7dbfb4aa3 ! 10: 51d14bc9274 object-file.c: return -1, not "status" from unpack_loose_header()
@@ Commit message
## object-file.c ##
@@ object-file.c: int unpack_loose_header(git_zstream *stream,
- status = git_inflate(stream, 0);
- obj_read_lock();
+ buffer, bufsiz);
+
if (status < Z_OK)
- return status;
+ return -1;
- /*
- * Check if entire header is unpacked in the first iteration.
+ /* Make sure we have the terminating NUL */
+ if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
10: b15ad53414b ! 11: f43cfd8a5ed object-file.c: make parse_loose_header_extended() public
@@ Commit message
could simply use parse_loose_header_extended() there, but will leave
the API in a better end state.
+ It would be a better end-state to have already moved the declaration
+ of these functions to object-store.h to avoid the forward declaration
+ of "struct object_info" in cache.h, but let's leave that cleanup for
+ some other time.
+
+ 1. https://lore.kernel.org/git/patch-v6-09.22-5b9278e7bb4-20210907T104559Z-avarab@gmail.com/
+
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
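The "struct object_info;" forward declaration is what lets cache.h name "struct object_info *" in the new parse_loose_header() prototype without the full definition. Without it, a struct first mentioned inside a parameter list is scoped to that prototype only, and compilers such as GCC warn about it. A minimal standalone sketch of the pattern follows; "struct demo_object_info" and demo_parse_header() are hypothetical stand-ins, not git's code, and the parser is simplified to a NUL-terminated "<type> <size>" string.

```c
#include <stdlib.h>
#include <string.h>

/* ---- "header" part: an incomplete type is enough for pointer params ---- */
struct demo_object_info; /* hypothetical stand-in for struct object_info */
int demo_parse_header(const char *hdr, struct demo_object_info *oi,
		      unsigned int flags);

/* ---- "implementation" part: the full definition lives here ---- */
struct demo_object_info {
	char type[64];
	unsigned long size;
};

/*
 * Parse "<type> <size>" from a NUL-terminated header into *oi.
 * Returns 0 on success, -1 on a malformed header.
 * "flags" mirrors the shape of the prototype but is unused in this sketch.
 */
int demo_parse_header(const char *hdr, struct demo_object_info *oi,
		      unsigned int flags)
{
	const char *sp = strchr(hdr, ' ');
	char *end;
	size_t type_len;

	(void)flags;
	if (!sp)
		return -1;
	type_len = (size_t)(sp - hdr);
	if (!type_len || type_len >= sizeof(oi->type))
		return -1;
	memcpy(oi->type, hdr, type_len);
	oi->type[type_len] = '\0';

	/* The size must be a decimal number running to the end of the string. */
	oi->size = strtoul(sp + 1, &end, 10);
	if (end == sp + 1 || *end != '\0')
		return -1;
	return 0;
}
```

Passing a struct pointer plus flags, rather than a bare "unsigned long *sizep", lets one call report both the type and the size.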
+ ## cache.h ##
+@@ cache.h: char *xdg_cache_home(const char *filename);
+ int git_open_cloexec(const char *name, int flags);
+ #define git_open(name) git_open_cloexec(name, O_RDONLY)
+ int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
+-int parse_loose_header(const char *hdr, unsigned long *sizep);
++struct object_info;
++int parse_loose_header(const char *hdr, struct object_info *oi,
++ unsigned int flags);
+
+ int check_object_signature(struct repository *r, const struct object_id *oid,
+ void *buf, unsigned long size, const char *type);
+
## object-file.c ##
@@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
@@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
*/
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
- unsigned int flags)
-+int parse_loose_header(const char *hdr,
-+ struct object_info *oi,
++int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags)
{
const char *type_buf = hdr;
@@ object-file.c: int read_loose_object(const char *path,
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
- ## object-store.h ##
-@@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
- int unpack_loose_header(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz);
--int parse_loose_header(const char *hdr, unsigned long *sizep);
-+int parse_loose_header(const char *hdr, struct object_info *oi,
-+ unsigned int flags);
- int check_object_signature(struct repository *r, const struct object_id *oid,
- void *buf, unsigned long size, const char *type);
- int finalize_object_file(const char *tmpfile, const char *filename);
-
## streaming.c ##
@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
const struct object_id *oid,
11: 326eb74545d < -: ----------- object-file.c: add missing braces to loose_object_info()
12: 4f829e9b727 ! 12: 50d938f7f3c object-file.c: simplify unpack_loose_short_header()
@@ Commit message
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
+ ## cache.h ##
+@@ cache.h: char *xdg_cache_home(const char *filename);
+
+ int git_open_cloexec(const char *name, int flags);
+ #define git_open(name) git_open_cloexec(name, O_RDONLY)
+-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
++
++/**
++ * unpack_loose_header() initializes the data stream needed to unpack
++ * a loose object header.
++ *
++ * Returns 0 on success. Returns negative values on error.
++ *
++ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
++ * "hdrbuf" argument is non-NULL. This is intended for use with
++ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
++ * reporting. The full header will be extracted to "hdrbuf" for use
++ * with parse_loose_header().
++ */
++int unpack_loose_header(git_zstream *stream, unsigned char *map,
++ unsigned long mapsize, void *buffer,
++ unsigned long bufsiz, struct strbuf *hdrbuf);
+ struct object_info;
+ int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags);
+
## object-file.c ##
@@ object-file.c: void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
@@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
- int status = unpack_loose_short_header(stream, map, mapsize,
- buffer, bufsiz);
-
- if (status < Z_OK)
- return status;
-
+- if (status < Z_OK)
+- return -1;
+-
- /* Make sure we have the terminating NUL */
- if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
- return -1;
@@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
- int status;
-
- status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
-- if (status < Z_OK)
-- return -1;
--
- /*
- * Check if entire header is unpacked in the first iteration.
- */
+ if (status < Z_OK)
+ return -1;
+
+@@ object-file.c: static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
return 0;
@@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
* buffer[0..bufsiz] was not large enough. Copy the partial
* result out to header, and then append the result of further
@@ object-file.c: static int loose_object_info(struct repository *r,
- unsigned long mapsize;
- void *map;
- git_zstream stream;
-+ int hdr_ret;
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
@@ object-file.c: static int loose_object_info(struct repository *r,
- if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
- status = error(_("unable to unpack %s header with --allow-unknown-type"),
- oid_to_hex(oid));
-- } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+- } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+
-+ hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
-+ allow_unknown ? &hdrbuf : NULL);
-+ if (hdr_ret < 0) {
++ if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
++ allow_unknown ? &hdrbuf : NULL) < 0)
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
- }
+ if (status < 0)
@@ object-file.c: int read_loose_object(const char *path,
goto out;
}
@@ object-file.c: int read_loose_object(const char *path,
goto out;
}
- ## object-store.h ##
-@@ object-store.h: int for_each_object_in_pack(struct packed_git *p,
- int for_each_packed_object(each_packed_object_fn, void *,
- enum for_each_object_flags flags);
-
-+/**
-+ * unpack_loose_header() initializes the data stream needed to unpack
-+ * a loose object header.
-+ *
-+ * Returns 0 on success. Returns negative values on error.
-+ *
-+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
-+ * "hdrbuf" argument is non-NULL. This is intended for use with
-+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
-+ * reporting. The full header will be extracted to "hdrbuf" for use
-+ * with parse_loose_header().
-+ */
- int unpack_loose_header(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
-- unsigned long bufsiz);
-+ unsigned long bufsiz, struct strbuf *hdrbuf);
- int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags);
- int check_object_signature(struct repository *r, const struct object_id *oid,
-
## streaming.c ##
@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapped,
13: 90489d9e6ec < -: ----------- object-file.c: split up ternary in parse_loose_header()
18: 1b7173a5b5b ! 13: 755fde00b46 object-file.c: use "enum" return type for unpack_loose_header()
@@ Metadata
## Commit message ##
object-file.c: use "enum" return type for unpack_loose_header()
- In the preceding commits we changed and documented
- unpack_loose_header() from return any negative value or zero, to only
- -2, -1 or 0. Let's instead add an "enum unpack_loose_header_result"
- type and use it, and have the compiler assert that we're exhaustively
- covering all return values. This gets rid of the need for having a
- "default" BUG() case in loose_object_info().
+ In a preceding commit we changed and documented unpack_loose_header()
+ from its previous behavior of returning any negative value or zero, to
+ only -1 or 0.
- I'm on the fence about whether this is more readable or worth it, but
- since it was suggested in [1] to do this let's go for it.
-
- 1. https://lore.kernel.org/git/20210527175433.2673306-1-jonathantanmy@google.com/
+ Let's add an "enum unpack_loose_header_result" type and use it for
+ these return values, and have the compiler assert that we're
+ exhaustively covering all of them.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
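The "compiler asserts that we're exhaustively covering all of them" benefit depends on leaving out a "default:" label. GCC's -Wswitch (enabled by -Wall), and Clang by default, warn when a switch over an enum value leaves an enumerator unhandled. A minimal sketch, with hypothetical DEMO_* names standing in for the ULHR_* values:

```c
/* Hypothetical stand-in for enum unpack_loose_header_result. */
enum demo_result {
	DEMO_OK,
	DEMO_BAD,
	DEMO_TOO_LONG,
};

/*
 * No "default:" label on purpose: if a new enumerator is added and not
 * handled here, -Wswitch reports it at compile time.
 */
static const char *demo_describe(enum demo_result r)
{
	switch (r) {
	case DEMO_OK:
		return "ok";
	case DEMO_BAD:
		return "unable to unpack header";
	case DEMO_TOO_LONG:
		return "header too long";
	}
	/* Only reached for out-of-range values; keeps -Wreturn-type quiet. */
	return "unreachable";
}
```

Under this setup, adding a new enumerator flags every switch that still needs a case for it, such as the ones in loose_object_info() and open_istream_loose().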
+ ## cache.h ##
+@@ cache.h: int git_open_cloexec(const char *name, int flags);
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+- * Returns 0 on success. Returns negative values on error.
++ * Returns:
++ *
++ * - ULHR_OK on success
++ * - ULHR_BAD on error
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+@@ cache.h: int git_open_cloexec(const char *name, int flags);
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
+-int unpack_loose_header(git_zstream *stream, unsigned char *map,
+- unsigned long mapsize, void *buffer,
+- unsigned long bufsiz, struct strbuf *hdrbuf);
++enum unpack_loose_header_result {
++ ULHR_OK,
++ ULHR_BAD,
++};
++enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
++ unsigned char *map,
++ unsigned long mapsize,
++ void *buffer,
++ unsigned long bufsiz,
++ struct strbuf *hdrbuf);
++
+ struct object_info;
+ int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags);
+
## object-file.c ##
@@ object-file.c: void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
@@ object-file.c: void *map_loose_object(struct repository *r,
{
int status;
+@@ object-file.c: int unpack_loose_header(git_zstream *stream,
+ status = git_inflate(stream, 0);
+ obj_read_lock();
+ if (status < Z_OK)
+- return -1;
++ return ULHR_BAD;
+
+ /*
+ * Check if entire header is unpacked in the first iteration.
+ */
+ if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
+- return 0;
++ return ULHR_OK;
+
+ /*
+ * We have a header longer than MAX_HEADER_LEN. The "header"
+@@ object-file.c: int unpack_loose_header(git_zstream *stream,
+ * --allow-unknown-type".
+ */
+ if (!header)
+- return -1;
++ return ULHR_BAD;
+
+ /*
+ * buffer[0..bufsiz] was not large enough. Copy the partial
+@@ object-file.c: int unpack_loose_header(git_zstream *stream,
+ stream->next_out = buffer;
+ stream->avail_out = bufsiz;
+ } while (status != Z_STREAM_END);
+- return -1;
++ return ULHR_BAD;
+ }
+
+ static void *unpack_loose_rest(git_zstream *stream,
@@ object-file.c: static int loose_object_info(struct repository *r,
- unsigned long mapsize;
- void *map;
- git_zstream stream;
-- int hdr_ret;
-+ enum unpack_loose_header_result hdr_ret;
- char hdr[MAX_HEADER_LEN];
- struct strbuf hdrbuf = STRBUF_INIT;
- unsigned long size_scratch;
-@@ object-file.c: static int loose_object_info(struct repository *r,
- hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
- allow_unknown ? &hdrbuf : NULL);
- switch (hdr_ret) {
-- case 0:
-+ case UNPACK_LOOSE_HEADER_RESULT_OK:
- break;
-- case -1:
-+ case UNPACK_LOOSE_HEADER_RESULT_BAD:
+ if (oi->disk_sizep)
+ *oi->disk_sizep = mapsize;
+
+- if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+- allow_unknown ? &hdrbuf : NULL) < 0)
++ switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
++ allow_unknown ? &hdrbuf : NULL)) {
++ case ULHR_OK:
++ break;
++ case ULHR_BAD:
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
- break;
-- case -2:
-+ case UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG:
- status = error(_("header for %s too long, exceeds %d bytes"),
- oid_to_hex(oid), MAX_HEADER_LEN);
- break;
-- default:
-- BUG("unknown hdr_ret value %d", hdr_ret);
- }
- if (!status) {
- if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
-
- ## object-store.h ##
-@@ object-store.h: int for_each_object_in_pack(struct packed_git *p,
- int for_each_packed_object(each_packed_object_fn, void *,
- enum for_each_object_flags flags);
-
-+enum unpack_loose_header_result {
-+ UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG = -2,
-+ UNPACK_LOOSE_HEADER_RESULT_BAD = -1,
-+ UNPACK_LOOSE_HEADER_RESULT_OK,
-+
-+};
+- if (status < 0)
+- ; /* Do nothing */
+- else if (hdrbuf.len) {
++ break;
++ }
+
- /**
- * unpack_loose_header() initializes the data stream needed to unpack
- * a loose object header.
- *
-- * Returns 0 on success. Returns negative values on error. If the
-- * header exceeds MAX_HEADER_LEN -2 will be returned.
-+ * Returns UNPACK_LOOSE_HEADER_RESULT_OK on success. Returns
-+ * UNPACK_LOOSE_HEADER_RESULT_BAD values on error, or if the header
-+ * exceeds MAX_HEADER_LEN UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG will
-+ * be returned.
- *
- * It will only parse up to MAX_HEADER_LEN bytes unless an optional
- * "hdrbuf" argument is non-NULL. This is intended for use with
- * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
- * reporting. The full header will be extracted to "hdrbuf" for use
-- * with parse_loose_header(), -2 will still be returned from this
-- * function to indicate that the header was too long.
-+ * with parse_loose_header(), UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG
-+ * will still be returned from this function to indicate that the
-+ * header was too long.
- */
--int unpack_loose_header(git_zstream *stream, unsigned char *map,
-- unsigned long mapsize, void *buffer,
-- unsigned long bufsiz, struct strbuf *hdrbuf);
-+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
-+ unsigned char *map,
-+ unsigned long mapsize,
-+ void *buffer,
-+ unsigned long bufsiz,
-+ struct strbuf *hdrbuf);
-
- /**
- * parse_loose_header() parses the starting "<type> <len>\0" of an
++ if (status < 0) {
++ /* Do nothing */
++ } else if (hdrbuf.len) {
+ if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
+ status = error(_("unable to parse %s header with --allow-unknown-type"),
+ oid_to_hex(oid));
## streaming.c ##
@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
- enum object_type *type)
- {
- struct object_info oi = OBJECT_INFO_INIT;
-+ enum unpack_loose_header_result hdr_ret;
- oi.sizep = &st->size;
- oi.typep = type;
-
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct reposi
- st->u.loose.hdr,
- sizeof(st->u.loose.hdr),
- NULL) < 0) ||
-- (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
-- *type < 0) {
+- (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
- git_inflate_end(&st->z);
- munmap(st->u.loose.mapped, st->u.loose.mapsize);
- return -1;
-+ hdr_ret = unpack_loose_header(&st->z, st->u.loose.mapped,
-+ st->u.loose.mapsize, st->u.loose.hdr,
-+ sizeof(st->u.loose.hdr), NULL);
-+ switch (hdr_ret) {
-+ case UNPACK_LOOSE_HEADER_RESULT_OK:
++ switch (unpack_loose_header(&st->z, st->u.loose.mapped,
++ st->u.loose.mapsize, st->u.loose.hdr,
++ sizeof(st->u.loose.hdr), NULL)) {
++ case ULHR_OK:
+ break;
-+ case UNPACK_LOOSE_HEADER_RESULT_BAD:
-+ case UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG:
++ case ULHR_BAD:
+ goto error;
}
-+ if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
++ if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+ goto error;
st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
17: f28c4f0dfb4 ! 14: 522d71eb19d object-file.c: return -2 on "header too long" in unpack_loose_header()
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- object-file.c: return -2 on "header too long" in unpack_loose_header()
+ object-file.c: return ULHR_TOO_LONG on "header too long"
Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
@@ Commit message
As a test added earlier in this series in t1006-cat-file.sh shows
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
- stack. Let's instead just return -2 saying we ran into the
+ stack. Let's instead just return ULHR_TOO_LONG saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
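The following simplified sketch illustrates the three outcomes and the "hdrbuf" contract, in which the full header is still handed back while "too long" is reported. It works on already-inflated bytes instead of a zlib stream. The names, the caller-supplied "full" buffer standing in for a strbuf, and the rule that input ending without a NUL counts as "bad" are assumptions made for illustration. The 32-byte limit mirrors git's MAX_HEADER_LEN.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_HEADER_LEN 32 /* mirrors git's MAX_HEADER_LEN */

/* Hypothetical stand-in for enum unpack_loose_header_result. */
enum demo_header_result {
	DHR_OK,
	DHR_BAD,
	DHR_TOO_LONG,
};

/*
 * Classify an already-inflated "<type> <size>\0" header.
 * data/len: the decompressed bytes available.
 * full/full_size: optional buffer (stand-in for "hdrbuf") that receives
 * the complete NUL-terminated header when it exceeds the limit.
 */
static enum demo_header_result demo_unpack_header(const char *data, size_t len,
						  char *full, size_t full_size)
{
	size_t limit = len < DEMO_MAX_HEADER_LEN ? len : DEMO_MAX_HEADER_LEN;
	const char *nul;

	/* Common case: the terminating NUL fits within the fixed limit. */
	if (memchr(data, '\0', limit))
		return DHR_OK;

	/* Input ended within the limit without any NUL: malformed. */
	if (len <= DEMO_MAX_HEADER_LEN)
		return DHR_BAD;

	/* Longer than the limit: optionally copy out the whole header. */
	nul = memchr(data, '\0', len);
	if (full && nul && (size_t)(nul - data) < full_size)
		memcpy(full, data, (size_t)(nul - data) + 1);

	/* Still report "too long" so the caller can emit a specific error. */
	return DHR_TOO_LONG;
}
```

A caller would map each value to its own message, as the loose_object_info() change in this patch does with "header for %s too long, exceeds %d bytes".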
+ ## cache.h ##
+@@ cache.h: int git_open_cloexec(const char *name, int flags);
+ *
+ * - ULHR_OK on success
+ * - ULHR_BAD on error
++ * - ULHR_TOO_LONG if the header was too long
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+- * with parse_loose_header().
++ * with parse_loose_header(), ULHR_TOO_LONG will still be returned
++ * from this function to indicate that the header was too long.
+ */
+ enum unpack_loose_header_result {
+ ULHR_OK,
+ ULHR_BAD,
++ ULHR_TOO_LONG,
+ };
+ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+
## object-file.c ##
-@@ object-file.c: int unpack_loose_header(git_zstream *stream,
+@@ object-file.c: enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
* --allow-unknown-type".
*/
if (!header)
-- return -1;
-+ return -2;
+- return ULHR_BAD;
++ return ULHR_TOO_LONG;
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
-@@ object-file.c: int unpack_loose_header(git_zstream *stream,
+@@ object-file.c: enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status != Z_STREAM_END);
-- return -1;
-+ return -2;
+- return ULHR_BAD;
++ return ULHR_TOO_LONG;
}
static void *unpack_loose_rest(git_zstream *stream,
@@ object-file.c: static int loose_object_info(struct repository *r,
-
- hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
- allow_unknown ? &hdrbuf : NULL);
-- if (hdr_ret < 0) {
-+ switch (hdr_ret) {
-+ case 0:
-+ break;
-+ case -1:
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
-+ break;
-+ case -2:
+ break;
++ case ULHR_TOO_LONG:
+ status = error(_("header for %s too long, exceeds %d bytes"),
+ oid_to_hex(oid), MAX_HEADER_LEN);
+ break;
-+ default:
-+ BUG("unknown hdr_ret value %d", hdr_ret);
}
- if (!status) {
- if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
+
+ if (status < 0) {
- ## object-store.h ##
-@@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
- * unpack_loose_header() initializes the data stream needed to unpack
- * a loose object header.
- *
-- * Returns 0 on success. Returns negative values on error.
-+ * Returns 0 on success. Returns negative values on error. If the
-+ * header exceeds MAX_HEADER_LEN -2 will be returned.
- *
- * It will only parse up to MAX_HEADER_LEN bytes unless an optional
- * "hdrbuf" argument is non-NULL. This is intended for use with
- * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
- * reporting. The full header will be extracted to "hdrbuf" for use
-- * with parse_loose_header().
-+ * with parse_loose_header(), -2 will still be returned from this
-+ * function to indicate that the header was too long.
- */
- int unpack_loose_header(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
+ ## streaming.c ##
+@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
+ case ULHR_OK:
+ break;
+ case ULHR_BAD:
++ case ULHR_TOO_LONG:
+ goto error;
+ }
+ if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
## t/t1006-cat-file.sh ##
-@@ t/t1006-cat-file.sh: bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
-
- test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
- cat >err.expect <<-EOF &&
-- error: unable to unpack $bogus_sha1 header
-+ error: header for $bogus_sha1 too long, exceeds 32 bytes
- fatal: git cat-file: could not get object info
- EOF
-
+@@ t/t1006-cat-file.sh: do
+ if test "$arg2" = "-p"
+ then
+ cat >expect <<-EOF
+- error: unable to unpack $bogus_long_sha1 header
++ error: header for $bogus_long_sha1 too long, exceeds 32 bytes
+ fatal: Not a valid object name $bogus_long_sha1
+ EOF
+ else
+ cat >expect <<-EOF
+- error: unable to unpack $bogus_long_sha1 header
++ error: header for $bogus_long_sha1 too long, exceeds 32 bytes
+ fatal: git cat-file: could not get object info
+ EOF
+ fi &&
14: 7c9819d37c5 ! 15: 1ca875395c1 object-file.c: stop dying in parse_loose_header()
@@ Metadata
## Commit message ##
object-file.c: stop dying in parse_loose_header()
- Start the libification of parse_loose_header() by making it return
- error codes and data instead of invoking die() by itself. For now
- we'll move the relevant die() call to loose_object_info() and
- read_loose_object() to keep this change smaller, but in subsequent
- commits we'll also libify those.
+ Make parse_loose_header() return error codes and data instead of
+ invoking die() by itself.
- Since the refactoring of parse_loose_header_extended() into
- parse_loose_header() in an earlier commit, its interface accepts a
- "unsigned long *sizep". Rather it accepts a "struct object_info *",
- that structure will be populated with information about the object.
+ For now we'll move the relevant die() call to loose_object_info() and
+ read_loose_object() to keep this change smaller. In a subsequent
+ commit we'll make read_loose_object() return an error code instead of
+ dying. We should also address the "allow_unknown" case (which should be
+ moved to builtin/cat-file.c), but for now I'll leave it.
- It thus makes sense to further libify the interface so that it stops
- calling die() when it encounters OBJ_BAD, and instead rely on its
- callers to check the populated "oi->typep".
+ To make parse_loose_header() not die(), change its prototype to
+ accept a "struct object_info *" instead of the "unsigned long *sizep"
+ it accepted before. Its callers can now check the populated
+ "oi->typep".
Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE, we can instead do
@@ Commit message
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.
- In another case added in c84a1f3ed4d (sha1_file: refactor read_object,
- 2017-06-21) (but the behavior pre-dated that) we did checks of "status
- >= 0", because at that point "status" had become the return value of
- parse_loose_header(). I.e. a non-negative "enum object_type" (unless
- we -1, aka. OBJ_BAD).
+ Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
+ objects, 2017-04-01) the return value of loose_object_info() (then
+ named sha1_loose_object_info()) had been a "status" variable that could
+ be any negative value, as we were expecting to return the "enum
+ object_type".
- Now that parse_loose_header() will return 0 on success instead of the
+ The only negative type happens to be OBJ_BAD, but the code still
+ assumed that more might be added. This was then used later in
+ e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
+ that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.
+ Since parse_loose_header() doesn't need to return an arbitrary
+ "status" we only need to treat its "ret < 0" specially, but can
+ idiomatically overwrite it with our own error() return. This along
+ with having made unpack_loose_header() return an "enum
+ unpack_loose_header_result" in an earlier commit means that we can
+ move the previously nested if/else cases mostly into the "ULHR_OK"
+ branch of the "switch" statement.
+
+ We should be less silent if we reach that "status = -1" branch, which
+ happens if we've got trailing garbage in loose objects, see
+ f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
+ for a better way to handle it. For now let's punt on it; a subsequent
+ commit will address that edge case.
+
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
+ ## cache.h ##
+@@ cache.h: enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned long bufsiz,
+ struct strbuf *hdrbuf);
+
++/**
++ * parse_loose_header() parses the starting "<type> <len>\0" of an
++ * object. If it doesn't follow that format -1 is returned. To check
++ * the validity of the <type> populate the "typep" in the "struct
++ * object_info". It will be OBJ_BAD if the object type is unknown. The
++ * parsed <len> can be retrieved via "oi->sizep", and from there
++ * passed to unpack_loose_rest().
++ */
+ struct object_info;
+-int parse_loose_header(const char *hdr, struct object_info *oi,
+- unsigned int flags);
++int parse_loose_header(const char *hdr, struct object_info *oi);
+
+ int check_object_signature(struct repository *r, const struct object_id *oid,
+ void *buf, unsigned long size, const char *type);
+
## object-file.c ##
@@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
--int parse_loose_header(const char *hdr,
-- struct object_info *oi,
+-int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
{
const char *type_buf = hdr;
unsigned long size;
-@@ object-file.c: int parse_loose_header(const char *hdr,
+@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi,
type = type_from_string_gently(type_buf, type_len, 1);
if (oi->type_name)
strbuf_add(oi->type_name, type_buf, type_len);
@@ object-file.c: int parse_loose_header(const char *hdr,
if (oi->typep)
*oi->typep = type;
-@@ object-file.c: int parse_loose_header(const char *hdr,
- if (*hdr)
- return -1;
-
-- return type;
+@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi,
+ /*
+ * The length must be followed by a zero byte
+ */
+- return *hdr ? -1 : type;
++ if (*hdr)
++ return -1;
++
+ /*
+ * The format is valid, but the type may still be bogus. The
+ * Caller needs to check its oi->typep.
@@ object-file.c: static int loose_object_info(struct repository *r,
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ enum object_type type_scratch;
-+ int parsed_header = 0;
int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
@@ object-file.c: static int loose_object_info(struct repository *r,
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
@@ object-file.c: static int loose_object_info(struct repository *r,
+ switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ allow_unknown ? &hdrbuf : NULL)) {
+ case ULHR_OK:
++ if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0)
++ status = error(_("unable to parse %s header"), oid_to_hex(oid));
++ else if (!allow_unknown && *oi->typep < 0)
++ die(_("invalid object type"));
++
++ if (!oi->contentp)
++ break;
++ *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
++ if (*oi->contentp)
++ goto cleanup;
++
++ status = -1;
+ break;
+ case ULHR_BAD:
status = error(_("unable to unpack %s header"),
- oid_to_hex(oid));
+@@ object-file.c: static int loose_object_info(struct repository *r,
+ break;
}
--
+
- if (status < 0) {
- /* Do nothing */
- } else if (hdrbuf.len) {
- if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
- status = error(_("unable to parse %s header with --allow-unknown-type"),
- oid_to_hex(oid));
-- } else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
+- } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
- status = error(_("unable to parse %s header"), oid_to_hex(oid));
-+ if (!status) {
-+ if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
-+ /*
-+ * oi->{sizep,typep} are meaningless unless
-+ * parse_loose_header() returns >= 0.
-+ */
-+ parsed_header = 1;
-+ else
-+ status = error(_("unable to parse %s header"), oid_to_hex(oid));
- }
-+ if (!allow_unknown && parsed_header && *oi->typep < 0)
-+ die(_("invalid object type"));
-
+-
- if (status >= 0 && oi->contentp) {
-+ if (parsed_header && oi->contentp) {
- *oi->contentp = unpack_loose_rest(&stream, hdr,
- *oi->sizep, oid);
- if (!*oi->contentp) {
-@@ object-file.c: static int loose_object_info(struct repository *r,
+- *oi->contentp = unpack_loose_rest(&stream, hdr,
+- *oi->sizep, oid);
+- if (!*oi->contentp) {
+- git_inflate_end(&stream);
+- status = -1;
+- }
+- } else
+- git_inflate_end(&stream);
+-
++ git_inflate_end(&stream);
++cleanup:
+ munmap(map, mapsize);
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
+ if (oi->typep == &type_scratch)
+ oi->typep = NULL;
oi->whence = OI_LOOSE;
- return (status < 0) ? status : 0;
+- return (status < 0) ? status : 0;
++ return status;
}
+
+ int obj_read_use_lock = 0;
@@ object-file.c: int read_loose_object(const char *path,
git_zstream stream;
char hdr[MAX_HEADER_LEN];
@@ object-file.c: int read_loose_object(const char *path,
if (*type == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
- ## object-store.h ##
-@@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
- int unpack_loose_header(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *hdrbuf);
--int parse_loose_header(const char *hdr, struct object_info *oi,
-- unsigned int flags);
-+
-+/**
-+ * parse_loose_header() parses the starting "<type> <len>\0" of an
-+ * object. If it doesn't follow that format -1 is returned. To check
-+ * the validity of the <type> populate the "typep" in the "struct
-+ * object_info". It will be OBJ_BAD if the object type is unknown. The
-+ * parsed <len> can be retrieved via "oi->sizep", and from there
-+ * passed to unpack_loose_rest().
-+ */
-+int parse_loose_header(const char *hdr, struct object_info *oi);
-+
- int check_object_signature(struct repository *r, const struct object_id *oid,
- void *buf, unsigned long size, const char *type);
- int finalize_object_file(const char *tmpfile, const char *filename);
-
## streaming.c ##
@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
{
@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct reposi
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
- st->u.loose.hdr,
- sizeof(st->u.loose.hdr),
- NULL) < 0) ||
-- (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
-+ (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
-+ *type < 0) {
- git_inflate_end(&st->z);
- munmap(st->u.loose.mapped, st->u.loose.mapsize);
- return -1;
+ case ULHR_TOO_LONG:
+ goto error;
+ }
+- if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
++ if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
+ goto error;
+
+ st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
15: 3fb660ff944 < -: ----------- object-file.c: guard against future bugs in loose_object_info()
20: 3bf3cf2299d < -: ----------- object-store.h: move read_loose_object() below 'struct object_info'
21: 974f650cddf ! 16: d38067feab3 fsck: report invalid types recorded in objects
@@ Metadata
Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## Commit message ##
- fsck: report invalid types recorded in objects
+ fsck: don't hard die on invalid object types
- Continue the work in the preceding commit and improve the error on:
+ Change the error fsck emits on invalid object types, such as:
$ git hash-object --stdin -w -t garbage --literally </dev/null
+ <OID>
+
+ From the very ungraceful error of:
+
$ git fsck
- error: hash mismatch for <OID_PATH> (expected <OID>)
- error: <OID>: object corrupt or missing: <OID_PATH>
- [ other fsck output ]
+ fatal: invalid object type
+ $
- To instead emit:
+ To:
$ git fsck
error: <OID>: object is of unknown type 'garbage': <OID_PATH>
[ other fsck output ]
- The complaint about a "hash mismatch" was simply an emergent property
- of how we'd fall though from read_loose_object() into fsck_loose()
- when we didn't get the data we expected. Now we'll correctly note that
- the object type is invalid.
+ We'll still exit with a non-zero status, but now we'll finish the rest
+ of the traversal. The test that's being added here asserts that we'll
+ still complain about other fsck issues (e.g. an unrelated dangling blob).
+
+ To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
+ flag from read_loose_object() through to parse_loose_header(). Since
+ the read_loose_object() function is only used in builtin/fsck.c we can
+ simply change it to accept a "struct object_info" (which contains the
+ OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
+ f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
+ for the introduction of read_loose_object().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
unsigned long size;
void *contents;
int eaten;
--
-- if (read_loose_object(path, oid, &type, &size, &contents,
-- OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-- errors_found |= ERROR_OBJECT;
+ struct strbuf sb = STRBUF_INIT;
-+ unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
-+ struct object_info oi;
-+ int found = 0;
++ struct object_info oi = OBJECT_INFO_INIT;
++ int err = 0;
+
+- if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+ oi.type_name = &sb;
+ oi.sizep = &size;
+ oi.typep = &type;
+
-+ if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
-+ found |= ERROR_OBJECT;
- error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
-+ }
-+ if (type < 0) {
-+ found |= ERROR_OBJECT;
-+ error(_("%s: object is of unknown type '%s': %s"),
-+ oid_to_hex(oid), sb.buf, path);
-+ }
-+ if (found) {
-+ errors_found |= ERROR_OBJECT;
++ if (read_loose_object(path, oid, &contents, &oi) < 0)
++ err = error(_("%s: object corrupt or missing: %s"),
++ oid_to_hex(oid), path);
++ if (type < 0)
++ err = error(_("%s: object is of unknown type '%s': %s"),
++ oid_to_hex(oid), sb.buf, path);
++ if (err) {
+ errors_found |= ERROR_OBJECT;
+- error(_("%s: object corrupt or missing: %s"),
+- oid_to_hex(oid), path);
return 0; /* keep checking other objects */
}
@@ object-file.c: static int check_stream_oid(git_zstream *stream,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents,
-+ struct object_info *oi,
- unsigned int oi_flags)
+- void **contents)
++ void **contents,
++ struct object_info *oi)
{
int ret = -1;
-@@ object-file.c: int read_loose_object(const char *path,
+ void *map = NULL;
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
- struct object_info oi = OBJECT_INFO_INIT;
- int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
- oi.typep = type;
- oi.sizep = size;
-+ enum object_type *type = oi->typep;
+ unsigned long *size = oi->sizep;
*contents = NULL;
@@ object-file.c: int read_loose_object(const char *path,
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
+ }
+- if (*type < 0)
+- die(_("invalid object type"));
+
+- if (*type == OBJ_BLOB && *size > big_file_threshold) {
++ if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
+ if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
+ goto out;
+ } else {
@@ object-file.c: int read_loose_object(const char *path,
goto out;
}
@@ object-file.c: int read_loose_object(const char *path,
free(*contents);
## object-store.h ##
-@@ object-store.h: int oid_object_info_extended(struct repository *r,
+@@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime);
/*
* Open the loose object at path, check its hash, and return the contents,
@@ object-store.h: int oid_object_info_extended(struct repository *r,
* type, and size. If the object is a blob, then "contents" may return NULL,
* to allow streaming of large blobs.
*
-@@ object-store.h: int oid_object_info_extended(struct repository *r,
+@@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime);
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents,
-+ struct object_info *oi,
- unsigned int oi_flags);
+- void **contents);
++ void **contents,
++ struct object_info *oi);
- /*
+ /* Retry packed storage after checking packed and loose storage */
+ #define HAS_OBJECT_RECHECK_PACKED 1
## t/t1450-fsck.sh ##
-@@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
- )
- '
+@@ t/t1450-fsck.sh: test_expect_success 'object with hash and type mismatch' '
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
-+test_expect_success 'object with hash and type mismatch' '
-+ git init --bare hash-type-mismatch &&
-+ (
-+ cd hash-type-mismatch &&
-+ oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
-+ old=$(test_oid_to_path "$oid") &&
-+ new=$(dirname $old)/$(test_oid ff_2) &&
-+ oid="$(dirname $new)$(basename $new)" &&
-+ mv objects/$old objects/$new &&
-+ git update-index --add --cacheinfo 100644 $oid foo &&
-+ tree=$(git write-tree) &&
-+ cmt=$(echo bogus | git commit-tree $tree) &&
-+ git update-ref refs/heads/bogus $cmt &&
+- cat >expect <<-\EOF &&
+- fatal: invalid object type
+- EOF
+- test_must_fail git fsck 2>actual &&
+- test_cmp expect actual
++
+ test_must_fail git fsck 2>out &&
+ grep "^error: hash mismatch for " out &&
+ grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
-+ )
-+'
-+
- test_expect_success 'branch pointing to non-commit' '
- git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
- test_when_finished "git update-ref -d refs/heads/invalid" &&
-@@ t/t1450-fsck.sh: test_expect_success 'fsck error and recovery on invalid object type' '
- garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
- test_must_fail git -C garbage-type fsck >out 2>err &&
- grep -e "^error" -e "^fatal" err >errors &&
-- test_line_count = 2 errors &&
-- grep "error: hash mismatch for" err &&
-- grep "$garbage_blob: object corrupt or missing:" err &&
-+ test_line_count = 1 errors &&
-+ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
- grep "dangling blob $empty_blob" out
+ )
+ '
+
+@@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
+ test_i18ngrep "bad index file" errors
+ '
+
+-test_expect_success 'fsck hard errors on an invalid object type' '
++test_expect_success 'fsck error and recovery on invalid object type' '
+ git init --bare garbage-type &&
+ (
+ cd garbage-type &&
+@@ t/t1450-fsck.sh: test_expect_success 'fsck hard errors on an invalid object type' '
+ fatal: invalid object type
+ EOF
+ test_must_fail git fsck >out 2>err &&
+- test_cmp err.expect err &&
+- test_must_be_empty out
++ grep -e "^error" -e "^fatal" err >errors &&
++ test_line_count = 1 errors &&
++ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
++ grep "dangling blob $empty_blob" out
+ )
'
22: 804673a17b0 ! 17: b07e892fc19 fsck: report invalid object type-path combinations
@@ Commit message
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
- In the case of check_object_signature() I don't really trust all the
- moving parts there to behave consistently, in the face of future
- refactorings. Getting it wrong would mean that we'd potentially emit
- no error at all on a failing check_object_signature(), or worse
- misreport whatever issue we encountered. So let's use the new bug()
- function to ferry and return code up to fsck_loose() in that case.
+ We need to add the "object corrupt or missing" special-case to deal
+ with cases where read_loose_object() will return an error before
+ completing check_object_signature(), e.g. if we have an error in
+ unpack_loose_rest() because we find garbage after the valid gzip
+ content:
+
+ $ git hash-object --stdin -w -t blob </dev/null
+ e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
+ $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
+ $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
+ $ git fsck
+ error: garbage at end of loose object 'e69d[...]'
+ error: unable to unpack contents of ./objects/e6/9d[...]
+ error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]
+
+ There is currently some weird messaging in the edge case when the two
+ are combined, i.e. because we're not explicitly passing along an error
+ state about this specific scenario from check_stream_oid() via
+ read_loose_object() we'll end up printing the null OID if an object is
+ of an unknown type *and* it can't be unpacked by zlib, e.g.:
+
+ $ git hash-object --stdin -w -t garbage --literally </dev/null
+ 8315a83d2acc4c174aed59430f9a9c4ed926440f
+ $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
+ $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
+ $ /usr/bin/git fsck
+ fatal: invalid object type
+ $ ~/g/git/git fsck
+ error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
+ error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
+ error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
+ error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
+ [...]
+
+ I think it's OK to leave that for future improvements, which would
+ involve enum-ifying more error state as we've done with "enum
+ unpack_loose_header_result" in preceding commits. In these
+ increasingly more obscure cases the worst that can happen is that
+ we'll get slightly nonsensical or inapplicable error messages.
+
+ There are other such potential edge cases, all of which might produce
+ some confusing messaging, but which are still handled correctly as far
+ as passing along errors goes. E.g. check_object_signature() may fail
+ while oideq(real_oid, null_oid()) is still true, which could happen if
+ it returns -1 due to the read_istream() call having failed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
@@ builtin/fast-export.c: static void export_blob(const struct object_id *oid)
## builtin/fsck.c ##
@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
+ struct object *obj;
+ enum object_type type;
+ unsigned long size;
+- void *contents;
++ unsigned char *contents = NULL;
+ int eaten;
struct strbuf sb = STRBUF_INIT;
- unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
- struct object_info oi;
+ struct object_info oi = OBJECT_INFO_INIT;
+- int err = 0;
+ struct object_id real_oid = *null_oid();
- int found = 0;
++ int ret;
+
oi.type_name = &sb;
oi.sizep = &size;
oi.typep = &type;
-- if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
-+ if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
- found |= ERROR_OBJECT;
-- error(_("%s: object corrupt or missing: %s"),
-- oid_to_hex(oid), path);
-+ if (!oideq(&real_oid, oid))
+- if (read_loose_object(path, oid, &contents, &oi) < 0)
+- err = error(_("%s: object corrupt or missing: %s"),
+- oid_to_hex(oid), path);
++ ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
++ if (ret < 0) {
++ if (contents && !oideq(&real_oid, oid))
+ error(_("%s: hash-path mismatch, found at: %s"),
+ oid_to_hex(&real_oid), path);
+ else
+ error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
- }
- if (type < 0) {
- found |= ERROR_OBJECT;
- error(_("%s: object is of unknown type '%s': %s"),
-- oid_to_hex(oid), sb.buf, path);
-+ oid_to_hex(&real_oid), sb.buf, path);
- }
- if (found) {
++ }
+ if (type < 0)
+- err = error(_("%s: object is of unknown type '%s': %s"),
+- oid_to_hex(oid), sb.buf, path);
+- if (err) {
++ ret = error(_("%s: object is of unknown type '%s': %s"),
++ oid_to_hex(&real_oid), sb.buf, path);
++ if (ret < 0) {
errors_found |= ERROR_OBJECT;
+ return 0; /* keep checking other objects */
+ }
## builtin/index-pack.c ##
@@ builtin/index-pack.c: static void fix_unresolved_deltas(struct hashfile *f)
@@ builtin/mktag.c: static int verify_object_in_tag(struct object_id *tagged_oid, i
return ret;
+ ## cache.h ##
+@@ cache.h: struct object_info;
+ int parse_loose_header(const char *hdr, struct object_info *oi);
+
+ int check_object_signature(struct repository *r, const struct object_id *oid,
+- void *buf, unsigned long size, const char *type);
++ void *buf, unsigned long size, const char *type,
++ struct object_id *real_oidp);
+
+ int finalize_object_file(const char *tmpfile, const char *filename);
+
+
## object-file.c ##
@@ object-file.c: void *xmmap(void *start, size_t length,
* the streaming interface and rehash it to do the same.
@@ object-file.c: static int check_stream_oid(git_zstream *stream,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
- struct object_info *oi,
- unsigned int oi_flags)
+ struct object_info *oi)
+ {
+@@ object-file.c: int read_loose_object(const char *path,
+ char hdr[MAX_HEADER_LEN];
+ unsigned long *size = oi->sizep;
+
+- *contents = NULL;
+-
+ map = map_loose_object_1(the_repository, path, NULL, &mapsize);
+ if (!map) {
+ error_errno(_("unable to mmap %s"), path);
@@ object-file.c: int read_loose_object(const char *path,
goto out;
}
@@ object-file.c: int read_loose_object(const char *path,
- error(_("hash mismatch for %s (expected %s)"), path,
- oid_to_hex(expected_oid));
+ *contents, *size, oi->type_name->buf, real_oid)) {
-+ if (oideq(real_oid, null_oid()))
-+ BUG("should only get OID mismatch errors with mapped contents");
free(*contents);
goto out;
}
## object-store.h ##
-@@ object-store.h: int oid_object_info_extended(struct repository *r,
+@@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime);
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
- struct object_info *oi,
- unsigned int oi_flags);
-@@ object-store.h: enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
- int parse_loose_header(const char *hdr, struct object_info *oi);
-
- int check_object_signature(struct repository *r, const struct object_id *oid,
-- void *buf, unsigned long size, const char *type);
-+ void *buf, unsigned long size, const char *type,
-+ struct object_id *real_oidp);
- int finalize_object_file(const char *tmpfile, const char *filename);
- int check_and_freshen_file(const char *fn, int freshen);
+ struct object_info *oi);
## object.c ##
@@ t/t1006-cat-file.sh: test_expect_success 'cat-file -t and -s on corrupt loose ob
## t/t1450-fsck.sh ##
@@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
- (
cd hash-mismatch &&
+
oid=$(echo blob | git hash-object -w --stdin) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
- cmt=$(echo bogus | git commit-tree $tree) &&
git update-ref refs/heads/bogus $cmt &&
+
test_must_fail git fsck 2>out &&
-- test_i18ngrep "$oid.*corrupt" out
+- grep "$oid.*corrupt" out
+ grep "$oldoid: hash-path mismatch, found at: .*$new" out
)
'
@@ t/t1450-fsck.sh: test_expect_success 'object with hash and type mismatch' '
- (
cd hash-type-mismatch &&
+
oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ t/t1450-fsck.sh: test_expect_success 'object with hash and type mismatch' '
- cmt=$(echo bogus | git commit-tree $tree) &&
- git update-ref refs/heads/bogus $cmt &&
+
+
test_must_fail git fsck 2>out &&
- grep "^error: hash mismatch for " out &&
- grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
--
2.33.0.1098.g29a6526ae47
^ permalink raw reply [flat|nested] 245+ messages in thread
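The range-diff above replaces unpack_loose_header()'s bare -1/-2 return codes with "enum unpack_loose_header_result" (ULHR_OK, ULHR_BAD, ULHR_TOO_LONG), so that callers can switch on the result and report a header exceeding MAX_HEADER_LEN separately from a generic unpack failure. What follows is a minimal standalone sketch of that pattern, not git's code: it skips zlib entirely and works on already-inflated bytes, and every name in it (sketch_unpack_header(), sketch_report_header(), SKETCH_HDR_*, SKETCH_MAX_HEADER_LEN) is hypothetical. Only the 32-byte limit is taken from the "exceeds 32 bytes" test expectations in the range-diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for MAX_HEADER_LEN (the tests above expect 32). */
#define SKETCH_MAX_HEADER_LEN 32

/* Hypothetical analogue of "enum unpack_loose_header_result". */
enum sketch_header_result {
	SKETCH_HDR_OK,       /* found the NUL-terminated header */
	SKETCH_HDR_BAD,      /* input ended before the terminating NUL */
	SKETCH_HDR_TOO_LONG, /* no NUL within SKETCH_MAX_HEADER_LEN bytes */
};

/*
 * Copy the "<type> <len>\0" header from "in" (in_len bytes, assumed to be
 * already inflated) into "out", which holds out_size bytes.
 */
static enum sketch_header_result sketch_unpack_header(const char *in,
						      size_t in_len,
						      char *out,
						      size_t out_size)
{
	size_t i;

	assert(out_size > 0);
	for (i = 0; i < in_len; i++) {
		if (i >= out_size || i >= SKETCH_MAX_HEADER_LEN)
			return SKETCH_HDR_TOO_LONG;
		out[i] = in[i];
		if (in[i] == '\0')
			return SKETCH_HDR_OK;
	}
	return SKETCH_HDR_BAD;
}

/* Caller: switch on the enum so each failure gets its own message. */
static int sketch_report_header(const char *name, const char *in, size_t in_len)
{
	char hdr[SKETCH_MAX_HEADER_LEN];

	switch (sketch_unpack_header(in, in_len, hdr, sizeof(hdr))) {
	case SKETCH_HDR_OK:
		printf("%s: header \"%s\"\n", name, hdr);
		return 0;
	case SKETCH_HDR_BAD:
		fprintf(stderr, "error: unable to unpack %s header\n", name);
		return -1;
	case SKETCH_HDR_TOO_LONG:
		fprintf(stderr, "error: header for %s too long, exceeds %d bytes\n",
			name, SKETCH_MAX_HEADER_LEN);
		return -1;
	}
	return -1; /* not reached for valid enum values */
}
```

Because the switch has no "default:" label, GCC's -Wswitch (enabled by -Wall) will warn if a new enumerator is added without a matching case, which is one practical benefit of returning an enum rather than magic negative integers.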
* [PATCH v7 01/17] fsck tests: add test for fsck-ing an unknown type
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
` (16 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.
This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..969bfbbdd8f 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -865,4 +865,21 @@ test_expect_success 'detect corrupt index file in fsck' '
test_i18ngrep "bad index file" errors
'
+test_expect_success 'fsck hard errors on an invalid object type' '
+ git init --bare garbage-type &&
+ (
+ cd garbage-type &&
+
+ empty=$(git hash-object --stdin -w -t blob </dev/null) &&
+ garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+ test_must_fail git fsck >out 2>err &&
+ test_cmp err.expect err &&
+ test_must_be_empty out
+ )
+'
+
test_done
--
2.33.0.1098.g29a6526ae47
* [PATCH v7 02/17] fsck tests: refactor one test to use a sub-repo
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
` (15 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.
We can instead use the pattern of creating a named sub-repository.
Then we don't have to worry about cleaning up after ourselves: nobody
will care what state the broken "hash-mismatch" repository is in after
this test runs.
See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.
1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 969bfbbdd8f..f8edd15abf8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,25 @@ remove_object () {
rm "$(sha1_file "$1")"
}
-test_expect_success 'object with bad sha1' '
- sha=$(echo blob | git hash-object -w --stdin) &&
- old=$(test_oid_to_path "$sha") &&
- new=$(dirname $old)/$(test_oid ff_2) &&
- sha="$(dirname $new)$(basename $new)" &&
- mv .git/objects/$old .git/objects/$new &&
- test_when_finished "remove_object $sha" &&
- git update-index --add --cacheinfo 100644 $sha foo &&
- test_when_finished "git read-tree -u --reset HEAD" &&
- tree=$(git write-tree) &&
- test_when_finished "remove_object $tree" &&
- cmt=$(echo bogus | git commit-tree $tree) &&
- test_when_finished "remove_object $cmt" &&
- git update-ref refs/heads/bogus $cmt &&
- test_when_finished "git update-ref -d refs/heads/bogus" &&
+test_expect_success 'object with hash mismatch' '
+ git init --bare hash-mismatch &&
+ (
+ cd hash-mismatch &&
- test_must_fail git fsck 2>out &&
- test_i18ngrep "$sha.*corrupt" out
+ oid=$(echo blob | git hash-object -w --stdin) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
+
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
+
+ test_must_fail git fsck 2>out &&
+ grep "$oid.*corrupt" out
+ )
'
test_expect_success 'branch pointing to non-commit' '
--
2.33.0.1098.g29a6526ae47
* [PATCH v7 03/17] fsck tests: test current hash/type mismatch behavior
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
` (14 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
If we move a loose object to a new path under .git/objects/ to simulate
a hash mismatch, "git fsck" will currently hard die() in
object-file.c. This behavior will be fixed in subsequent commits, but
let's test for it as-is for now.
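To make the trick concrete: a loose object's path is derived from its
hex object ID, with the first two digits naming the objects/?? fan-out
directory and the rest naming the file. The test keeps the directory
but gives the file a new name, so the path now claims a different ID
than the one the content hashes to. The sketch below models only that
path arithmetic; the helper names are hypothetical and are not Git
functions (the test itself uses test_oid_to_path and $(test_oid ff_2)).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers. A loose object lives at objects/XX/YYYY...,
 * where XX is the first two hex digits of its object ID (the fan-out
 * directory) and YYYY... is the rest.
 */
static void oid_to_loose_path(char *out, size_t outsz, const char *hex)
{
    assert(strlen(hex) > 2);
    snprintf(out, outsz, "%.2s/%s", hex, hex + 2);
}

/* The ID a path claims: directory name followed by file name, which is
 * what oid="$(dirname $new)$(basename $new)" computes in the test. */
static void loose_path_to_oid(char *out, size_t outsz, const char *path)
{
    assert(strlen(path) > 3 && path[2] == '/');
    snprintf(out, outsz, "%.2s%s", path, path + 3);
}

/* Keep the fan-out directory but rename the file, like
 * new=$(dirname $old)/$(test_oid ff_2) in the test. */
static void rename_in_fanout_dir(char *out, size_t outsz, const char *path,
                                 const char *new_tail)
{
    assert(strlen(path) > 2 && path[2] == '/');
    snprintf(out, outsz, "%.2s/%s", path, new_tail);
}
```

After the rename, loose_path_to_oid() on the new path yields an ID
that no longer matches the hash of the file's contents, which is the
inconsistency the test wants "git fsck" to run into.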
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f8edd15abf8..175ed304637 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -69,6 +69,30 @@ test_expect_success 'object with hash mismatch' '
)
'
+test_expect_success 'object with hash and type mismatch' '
+ git init --bare hash-type-mismatch &&
+ (
+ cd hash-type-mismatch &&
+
+ oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
+
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
+
+ cat >expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+ test_must_fail git fsck 2>actual &&
+ test_cmp expect actual
+ )
+'
+
test_expect_success 'branch pointing to non-commit' '
git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
test_when_finished "git update-ref -d refs/heads/invalid" &&
--
2.33.0.1098.g29a6526ae47
* [PATCH v7 04/17] fsck tests: test for garbage appended to a loose object
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (2 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
` (13 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
There weren't any tests of the output for this scenario; let's ensure
that we don't regress on it in the changes that come after this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 175ed304637..bd696d21dba 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -93,6 +93,26 @@ test_expect_success 'object with hash and type mismatch' '
)
'
+test_expect_success POSIXPERM 'zlib corrupt loose object output ' '
+ git init --bare corrupt-loose-output &&
+ (
+ cd corrupt-loose-output &&
+ oid=$(git hash-object -w --stdin --literally </dev/null) &&
+ oidf=objects/$(test_oid_to_path "$oid") &&
+ chmod 755 $oidf &&
+ echo extra garbage >>$oidf &&
+
+ cat >expect.error <<-EOF &&
+ error: garbage at end of loose object '\''$oid'\''
+ error: unable to unpack contents of ./$oidf
+ error: $oid: object corrupt or missing: ./$oidf
+ EOF
+ test_must_fail git fsck 2>actual &&
+ grep ^error: actual >error &&
+ test_cmp expect.error error
+ )
+'
+
test_expect_success 'branch pointing to non-commit' '
git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
test_when_finished "git update-ref -d refs/heads/invalid" &&
--
2.33.0.1098.g29a6526ae47
* [PATCH v7 05/17] cat-file tests: move bogus_* variable declarations earlier
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (3 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
` (12 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Change the short/long bogus object type variables into a form where
the two sets can be used concurrently. This will be used by tests
added in subsequent commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 35 +++++++++++++++++++----------------
1 file changed, 19 insertions(+), 16 deletions(-)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..ea6a53d425b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,36 +315,39 @@ test_expect_success '%(deltabase) reports packed delta bases' '
}
'
-bogus_type="bogus"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+test_expect_success 'setup bogus data' '
+ bogus_short_type="bogus" &&
+ bogus_short_content="bogus" &&
+ bogus_short_size=$(strlen "$bogus_short_content") &&
+ bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
+
+ bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
+ bogus_long_content="bogus" &&
+ bogus_long_size=$(strlen "$bogus_long_content") &&
+ bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+'
test_expect_success "Type of broken object is correct" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_short_type >expect &&
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
test_cmp expect actual
'
test_expect_success "Size of broken object is correct" '
- echo $bogus_size >expect &&
- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_short_size >expect &&
+ git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
test_cmp expect actual
'
-bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
test_expect_success "Type of broken object is correct when type is large" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_long_type >expect &&
+ git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
test_cmp expect actual
'
test_expect_success "Size of large broken object is correct when type is large" '
- echo $bogus_size >expect &&
- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_long_size >expect &&
+ git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
test_cmp expect actual
'
--
2.33.0.1098.g29a6526ae47
* [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (4 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-21 3:30 ` Taylor Blau
2021-09-20 19:04 ` [PATCH v7 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
` (11 subsequent siblings)
17 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
When we look up a missing object with cat_one_file(), the error we
print out currently depends on whether we error out early in
get_oid_with_context(), or get an error later from
oid_object_info_extended().
The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or
not.
The "-p" flag is yet another special case: for the deadbeef OID it
prints the same output that "-s" and "-t" emit for the deadbeef_short
OID. It also doesn't support the "--allow-unknown-type" flag at all.
Let's test the combinations of the two sets [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), skipping "-p" with "--allow-unknown-type", for
both a missing and a bogus object, each referred to by a short and a
full OID.
This extends tests added in 3e370f9faf0 (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/oid-info/oid | 2 ++
t/t1006-cat-file.sh | 75 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 77 insertions(+)
diff --git a/t/oid-info/oid b/t/oid-info/oid
index a754970523c..ecffa9045f9 100644
--- a/t/oid-info/oid
+++ b/t/oid-info/oid
@@ -27,3 +27,5 @@ numeric sha1:0123456789012345678901234567890123456789
numeric sha256:0123456789012345678901234567890123456789012345678901234567890123
deadbeef sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
deadbeef sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef_short sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
+deadbee_short sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ea6a53d425b..af59613250b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -327,6 +327,81 @@ test_expect_success 'setup bogus data' '
bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
'
+for arg1 in '' --allow-unknown-type
+do
+ for arg2 in -s -t -p
+ do
+ if test $arg1 = "--allow-unknown-type" && test "$arg2" = "-p"
+ then
+ continue
+ fi
+
+
+ test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" '
+ cat >expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+
+ if test "$arg1" = "--allow-unknown-type"
+ then
+ git cat-file $arg1 $arg2 $bogus_short_sha1
+ else
+ test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
+ test_must_be_empty out &&
+ test_cmp expect actual
+ fi
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" '
+ if test "$arg2" = "-p"
+ then
+ cat >expect <<-EOF
+ error: unable to unpack $bogus_long_sha1 header
+ fatal: Not a valid object name $bogus_long_sha1
+ EOF
+ else
+ cat >expect <<-EOF
+ error: unable to unpack $bogus_long_sha1 header
+ fatal: git cat-file: could not get object info
+ EOF
+ fi &&
+
+ if test "$arg1" = "--allow-unknown-type"
+ then
+ git cat-file $arg1 $arg2 $bogus_short_sha1
+ else
+ test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
+ test_must_be_empty out &&
+ test_cmp expect actual
+ fi
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on missing short OID" '
+ cat >expect.err <<-EOF &&
+ fatal: Not a valid object name $(test_oid deadbeef_short)
+ EOF
+ test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual &&
+ test_must_be_empty out
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on missing full OID" '
+ if test "$arg2" = "-p"
+ then
+ cat >expect.err <<-EOF
+ fatal: Not a valid object name $(test_oid deadbeef)
+ EOF
+ else
+ cat >expect.err <<-\EOF
+ fatal: git cat-file: could not get object info
+ EOF
+ fi &&
+ test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual &&
+ test_must_be_empty out &&
+ test_cmp expect.err err.actual
+ '
+ done
+done
+
test_expect_success "Type of broken object is correct" '
echo $bogus_short_type >expect &&
git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
--
2.33.0.1098.g29a6526ae47
* Re: [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
2021-09-20 19:04 ` [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
@ 2021-09-21 3:30 ` Taylor Blau
0 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-21 3:30 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Mon, Sep 20, 2021 at 09:04:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> diff --git a/t/oid-info/oid b/t/oid-info/oid
> index a754970523c..ecffa9045f9 100644
> --- a/t/oid-info/oid
> +++ b/t/oid-info/oid
> @@ -27,3 +27,5 @@ numeric sha1:0123456789012345678901234567890123456789
> numeric sha256:0123456789012345678901234567890123456789012345678901234567890123
> deadbeef sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> deadbeef sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
> +deadbeef_short sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
> +deadbee_short sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
This jumped out at me while I was reading it. In the second line,
s/deadbee_short/deadbeef_short/ ?
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index ea6a53d425b..af59613250b 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -327,6 +327,81 @@ test_expect_success 'setup bogus data' '
> bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
> '
>
> +for arg1 in '' --allow-unknown-type
> +do
> + for arg2 in -s -t -p
> + do
This is quite the loop! I appreciate the extra thoroughness, although it
may come at some extra cost of intertwining all of these combinations of
tests together.
But that may be warranted, since they are all related. But it's not a
full matrix of all possible combinations; e.g., "--allow-unknown-type"
does not go with "-p".
So this may be the best that we can do. It's definitely a mouthful, but
I think it's overall an easier read than what we had in the previous
version. And it's definitely more thorough, which is good. Thanks for
spending the time improving this test.
Thanks,
Taylor
* [PATCH v7 07/17] cat-file tests: add corrupt loose object test
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (5 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
` (10 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Fix a blind spot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that, when we can't decode a loose object
with zlib, we emit an error from zlib.c.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index af59613250b..8bbc34efb0c 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -426,6 +426,58 @@ test_expect_success "Size of large broken object is correct when type is large"
test_cmp expect actual
'
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+ git init --bare corrupt-loose.git &&
+ (
+ cd corrupt-loose.git &&
+
+ # Setup and create the empty blob and its path
+ empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+ git hash-object -w --stdin </dev/null &&
+
+ # Create another blob and its path
+ echo other >other.blob &&
+ other_blob=$(git hash-object -w --stdin <other.blob) &&
+ other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+ # Before the swap the size is 0
+ cat >out.expect <<-EOF &&
+ 0
+ EOF
+ git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # Swap the two to corrupt the repository
+ mv -f "$other_path" "$empty_path" &&
+ test_must_fail git fsck 2>err.fsck &&
+ grep "hash mismatch" err.fsck &&
+
+ # confirm that cat-file is reading the new swapped-in
+ # blob...
+ cat >out.expect <<-EOF &&
+ blob
+ EOF
+ git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # ... since it has a different size now.
+ cat >out.expect <<-EOF &&
+ 6
+ EOF
+ git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # So far "cat-file" has been happy to spew the found
+ # content out as-is. Try to make it zlib-invalid.
+ mv -f other.blob "$empty_path" &&
+ test_must_fail git fsck 2>err.fsck &&
+ grep "^error: inflate: data stream error (" err.fsck
+ )
+'
+
# Tests for git cat-file --follow-symlinks
test_expect_success 'prep for symlink tests' '
echo_without_newline "$hello_content" >morx &&
--
2.33.0.1098.g29a6526ae47
* [PATCH v7 08/17] cat-file tests: test for current --allow-unknown-type behavior
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (6 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
` (9 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Add more tests for the current --allow-unknown-type behavior. As noted
in [1], I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.
1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8bbc34efb0c..269ab7e4729 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -402,6 +402,67 @@ do
done
done
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+ git cat-file -e $bogus_short_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+ test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+ test_must_fail git cat-file -p $bogus_short_sha1 &&
+ test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type "bogus"
+ EOF
+ test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+ echo $bogus_short_sha1 >bogus-oid &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+
+ test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+ test_cmp err.expect err.actual &&
+
+ test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+ test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+ test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+ cat >expect <<-EOF &&
+ $bogus_short_type
+ EOF
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual &&
+
+ # Create it manually, as "git replace" will die on bogus
+ # types.
+ head=$(git rev-parse --verify HEAD) &&
+ test_when_finished "rm -rf .git/refs/replace" &&
+ mkdir -p .git/refs/replace &&
+ echo $head >.git/refs/replace/$bogus_short_sha1 &&
+
+ cat >expect <<-EOF &&
+ commit
+ EOF
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual
+'
+
test_expect_success "Type of broken object is correct" '
echo $bogus_short_type >expect &&
git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
--
2.33.0.1098.g29a6526ae47
* [PATCH v7 09/17] object-file.c: don't set "typep" when returning non-zero
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (7 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
` (8 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
When the loose_object_info() function returns an error, stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.
That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.
Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.
Having read the code paths involved carefully, I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.
This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.
Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.
Having read the code for all the callers of these functions, I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
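The calling convention this relies on can be sketched as follows: the
"extended" lookup reports failure only through its return value and
leaves its optional out-parameters untouched on error, while a simple
wrapper turns that failure into OBJ_BAD as its return value. This is
a minimal model, loosely following how oid_object_info() wraps
oid_object_info_extended(); the struct, enum and function names are
hypothetical stand-ins (the enum values match Git's object types).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum object_type and struct object_info. */
enum sketch_type { SK_OBJ_BAD = -1, SK_OBJ_NONE = 0, SK_OBJ_BLOB = 3 };

struct sketch_object_info {
    enum sketch_type *typep;  /* optional out-parameter */
    unsigned long *sizep;     /* optional out-parameter */
};

/*
 * Extended-style lookup: returns 0 on success and -1 on error. On error
 * the out-parameters are left untouched, so a caller must check the
 * return value before reading *typep or *sizep.
 */
static int sketch_info_extended(int object_ok, struct sketch_object_info *oi)
{
    assert(oi != NULL);
    if (!object_ok)
        return -1;  /* no "*oi->typep = OBJ_BAD" faking-up here */
    if (oi->typep)
        *oi->typep = SK_OBJ_BLOB;
    if (oi->sizep)
        *oi->sizep = 6;
    return 0;
}

/* Simple-style wrapper: the return value is the type itself, so failure
 * is reported as OBJ_BAD without reading any out-parameter. */
static enum sketch_type sketch_info_simple(int object_ok, unsigned long *sizep)
{
    enum sketch_type type;
    struct sketch_object_info oi = { &type, sizep };

    if (sketch_info_extended(object_ok, &oi) < 0)
        return SK_OBJ_BAD;
    return type;
}
```

A caller of the extended form that reads *typep without first checking
for -1 would see whatever it initialized the variable to, which is the
kind of subtle bug discussed above.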
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/object-file.c b/object-file.c
index a8be8994814..bda3497d5ca 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1503,8 +1503,6 @@ static int loose_object_info(struct repository *r,
git_inflate_end(&stream);
munmap(map, mapsize);
- if (status && oi->typep)
- *oi->typep = status;
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
--
2.33.0.1098.g29a6526ae47
* [PATCH v7 10/17] object-file.c: return -1, not "status" from unpack_loose_header()
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (8 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
` (7 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.
See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".
At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").
However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.
So let's do the minor cleanup of also changing this function to return
a -1.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/object-file.c b/object-file.c
index bda3497d5ca..774ec8c866f 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1262,7 +1262,7 @@ int unpack_loose_header(git_zstream *stream,
buffer, bufsiz);
if (status < Z_OK)
- return status;
+ return -1;
/* Make sure we have the terminating NUL */
if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
--
2.33.0.1098.g29a6526ae47
* [PATCH v7 11/17] object-file.c: make parse_loose_header_extended() public
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (9 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
` (6 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Make the parse_loose_header_extended() function public under the
parse_loose_header() name, replacing the old wrapper of that name. The
only direct user of the wrapper outside of object-file.c itself was in
streaming.c; that caller can simply pass the required
"struct object_info *" instead.
This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change (we
could simply use parse_loose_header_extended() there), but it will
leave the API in a better end state.
It would be a better end-state to have already moved the declaration
of these functions to object-store.h to avoid the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.
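For reference, a loose object's inflated header is the type name, a
space, the size in decimal and a terminating NUL (e.g. "blob 6"). The
following is a minimal, simplified sketch of a parser with the
new-style signature: an info struct with optional out-parameters plus
a flags word, instead of a bare size pointer. The struct, enum and
flag names are hypothetical stand-ins (the flag is modeled on
OBJECT_INFO_ALLOW_UNKNOWN_TYPE), and where the sketch returns -1 for an
unknown type, the tests earlier in this series show Git currently
dying with "fatal: invalid object type".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the numeric values match Git's object types. */
enum hdr_type {
    HDR_BAD = -1, HDR_NONE = 0,
    HDR_COMMIT = 1, HDR_TREE = 2, HDR_BLOB = 3, HDR_TAG = 4
};

/* Hypothetical flag modeled on OBJECT_INFO_ALLOW_UNKNOWN_TYPE. */
#define HDR_ALLOW_UNKNOWN_TYPE 1u

struct hdr_info {
    enum hdr_type *typep;   /* optional out-parameter */
    unsigned long *sizep;   /* optional out-parameter */
};

static enum hdr_type hdr_type_from_name(const char *name, size_t len)
{
    static const char *const names[] = { "commit", "tree", "blob", "tag" };
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
        if (strlen(names[i]) == len && !memcmp(names[i], name, len))
            return (enum hdr_type)(i + 1);
    return HDR_BAD;
}

/*
 * Parse a NUL-terminated "<type> <size>" header. Returns the type on
 * success and -1 on a malformed header; the out-parameters are only
 * written on success. Overflow of the size is not checked here.
 */
static int parse_header_sketch(const char *hdr, struct hdr_info *oi,
                               unsigned int flags)
{
    const char *type_buf = hdr;
    enum hdr_type type;
    unsigned long size;

    assert(hdr && oi);

    /* The type name runs up to the first space. */
    while (*hdr && *hdr != ' ')
        hdr++;
    if (*hdr != ' ')
        return -1;
    type = hdr_type_from_name(type_buf, (size_t)(hdr - type_buf));
    hdr++;

    if (type == HDR_BAD) {
        if (!(flags & HDR_ALLOW_UNKNOWN_TYPE))
            return -1;
        type = HDR_NONE;  /* tolerated, but not a known type */
    }

    /* The size is decimal with no leading zeros, so "0" stands alone. */
    if (*hdr < '0' || *hdr > '9')
        return -1;
    size = (unsigned long)(*hdr++ - '0');
    if (size)
        while (*hdr >= '0' && *hdr <= '9')
            size = size * 10 + (unsigned long)(*hdr++ - '0');

    /* Anything between the size and the NUL makes the header invalid. */
    if (*hdr)
        return -1;

    if (oi->typep)
        *oi->typep = type;
    if (oi->sizep)
        *oi->sizep = size;
    return (int)type;
}
```

With this shape, a former caller of the "sizep"-only wrapper just sets
the one out-parameter it cares about and passes 0 for the flags.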
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 4 +++-
object-file.c | 20 +++++++-------------
streaming.c | 5 ++++-
3 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/cache.h b/cache.h
index d23de693680..33cacbd22ac 100644
--- a/cache.h
+++ b/cache.h
@@ -1314,7 +1314,9 @@ char *xdg_cache_home(const char *filename);
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+struct object_info;
+int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags);
int check_object_signature(struct repository *r, const struct object_id *oid,
void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 774ec8c866f..33a01ac203f 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1363,8 +1363,8 @@ static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags)
{
const char *type_buf = hdr;
unsigned long size;
@@ -1424,14 +1424,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
return *hdr ? -1 : type;
}
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
- struct object_info oi = OBJECT_INFO_INIT;
-
- oi.sizep = sizep;
- return parse_loose_header_extended(hdr, &oi, 0);
-}
-
static int loose_object_info(struct repository *r,
const struct object_id *oid,
struct object_info *oi, int flags)
@@ -1486,10 +1478,10 @@ static int loose_object_info(struct repository *r,
if (status < 0)
; /* Do nothing */
else if (hdrbuf.len) {
- if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+ if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
oid_to_hex(oid));
- } else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+ } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
status = error(_("unable to parse %s header"), oid_to_hex(oid));
if (status >= 0 && oi->contentp) {
@@ -2573,6 +2565,8 @@ int read_loose_object(const char *path,
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
+ struct object_info oi = OBJECT_INFO_INIT;
+ oi.sizep = size;
*contents = NULL;
@@ -2587,7 +2581,7 @@ int read_loose_object(const char *path,
goto out;
}
- *type = parse_loose_header(hdr, size);
+ *type = parse_loose_header(hdr, &oi, 0);
if (*type < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
const struct object_id *oid,
enum object_type *type)
{
+ struct object_info oi = OBJECT_INFO_INIT;
+ oi.sizep = &st->size;
+
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapsize,
st->u.loose.hdr,
sizeof(st->u.loose.hdr)) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+ (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
return -1;
--
2.33.0.1098.g29a6526ae47
^ permalink raw reply related [flat|nested] 245+ messages in thread
* [PATCH v7 12/17] object-file.c: simplify unpack_loose_short_header()
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (10 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
` (5 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.
The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).
Its code was mostly copy/pasted from unpack_loose_header() and
unpack_loose_short_header(). We now have a single
unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.
I think the remaining unpack_loose_header() function could be further
simplified. We're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN, and we could instead just
say "we found a garbage type <first 32 bytes>...". But let's leave
the current behavior in place for now.
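The shape of the merged function, with its optional output parameter, can be sketched roughly as follows. This is a hypothetical simplification. `src` stands for bytes that have already been inflated; the real function pulls them from zlib via `git_inflate()`, possibly over several rounds. `struct growbuf` is a made-up minimal replacement for git's `strbuf`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable buffer standing in for git's strbuf. */
struct growbuf {
	char *buf;
	size_t len;
};

static int growbuf_add(struct growbuf *sb, const char *data, size_t n)
{
	char *p = realloc(sb->buf, sb->len + n + 1);

	if (!p)
		return -1;
	memcpy(p + sb->len, data, n);
	sb->len += n;
	p[sb->len] = '\0'; /* keep it NUL-terminated, like a strbuf */
	sb->buf = p;
	return 0;
}

/*
 * One function with an optional "hdrbuf". Copy up to "bufsiz" bytes of
 * the (already inflated) "src" into "buf", and succeed if the header's
 * NUL is among them. Otherwise the header is too long for "buf": fail,
 * unless the caller passed "hdrbuf", in which case collect the whole
 * header there.
 */
int unpack_header_sketch(const char *src, size_t srclen,
			 char *buf, size_t bufsiz, struct growbuf *hdrbuf)
{
	size_t n = srclen < bufsiz ? srclen : bufsiz;
	const char *nul;

	assert(bufsiz > 0);
	memcpy(buf, src, n);
	if (memchr(buf, '\0', n))
		return 0;
	if (!hdrbuf)
		return -1;
	nul = memchr(src, '\0', srclen);
	if (!nul)
		return -1; /* no terminator anywhere: corrupt */
	return growbuf_add(hdrbuf, src, (size_t)(nul - src));
}
```

Callers that don't care about over-long headers pass `NULL`. This is what read_loose_object() and streaming.c do in the diff below.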
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 17 ++++++++++++++-
object-file.c | 58 ++++++++++++++++++---------------------------------
streaming.c | 3 ++-
3 files changed, 38 insertions(+), 40 deletions(-)
diff --git a/cache.h b/cache.h
index 33cacbd22ac..9ad81d452ad 100644
--- a/cache.h
+++ b/cache.h
@@ -1313,7 +1313,22 @@ char *xdg_cache_home(const char *filename);
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
+
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+ unsigned long mapsize, void *buffer,
+ unsigned long bufsiz, struct strbuf *hdrbuf);
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi,
unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 33a01ac203f..8dd35f768bb 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1233,11 +1233,12 @@ void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
}
-static int unpack_loose_short_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+ unsigned char *map, unsigned long mapsize,
+ void *buffer, unsigned long bufsiz,
+ struct strbuf *header)
{
- int ret;
+ int status;
/* Get the data stream */
memset(stream, 0, sizeof(*stream));
@@ -1248,35 +1249,8 @@ static int unpack_loose_short_header(git_zstream *stream,
git_inflate_init(stream);
obj_read_unlock();
- ret = git_inflate(stream, 0);
+ status = git_inflate(stream, 0);
obj_read_lock();
-
- return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz)
-{
- int status = unpack_loose_short_header(stream, map, mapsize,
- buffer, bufsiz);
-
- if (status < Z_OK)
- return -1;
-
- /* Make sure we have the terminating NUL */
- if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
- return -1;
- return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *header)
-{
- int status;
-
- status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
if (status < Z_OK)
return -1;
@@ -1286,6 +1260,14 @@ static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
return 0;
+ /*
+ * We have a header longer than MAX_HEADER_LEN. The "header"
+ * here is only non-NULL when we run "cat-file
+ * --allow-unknown-type".
+ */
+ if (!header)
+ return -1;
+
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
* result out to header, and then append the result of further
@@ -1435,6 +1417,7 @@ static int loose_object_info(struct repository *r,
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
oidclr(oi->delta_base_oid);
@@ -1468,11 +1451,9 @@ static int loose_object_info(struct repository *r,
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
- if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
- if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
- status = error(_("unable to unpack %s header with --allow-unknown-type"),
- oid_to_hex(oid));
- } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+
+ if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ allow_unknown ? &hdrbuf : NULL) < 0)
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
if (status < 0)
@@ -2576,7 +2557,8 @@ int read_loose_object(const char *path,
goto out;
}
- if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+ if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ NULL) < 0) {
error(_("unable to unpack header of %s"), path);
goto out;
}
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapped,
st->u.loose.mapsize,
st->u.loose.hdr,
- sizeof(st->u.loose.hdr)) < 0) ||
+ sizeof(st->u.loose.hdr),
+ NULL) < 0) ||
(parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
--
2.33.0.1098.g29a6526ae47
^ permalink raw reply related [flat|nested] 245+ messages in thread
* [PATCH v7 13/17] object-file.c: use "enum" return type for unpack_loose_header()
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (11 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
` (4 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
In a preceding commit we changed unpack_loose_header() from its
previous behavior of returning any negative value or zero to returning
only -1 or 0, and documented that.
Let's add an "enum unpack_loose_header_result" type and use it for
these return values, and have the compiler assert that we're
exhaustively covering all of them.
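Here is a small sketch of what the enum buys. If the `switch` has no `default:` label, GCC and Clang's `-Wswitch` (enabled by `-Wall`) warn when an enumerator is left unhandled. The `report()` helper is hypothetical; it only mimics the error text used in `loose_object_info()` in the diff below.

```c
#include <assert.h>
#include <stdio.h>

/* Same enumerators as the enum added to cache.h in this patch. */
enum unpack_loose_header_result {
	ULHR_OK,
	ULHR_BAD,
};

/*
 * Hypothetical caller. With no "default:" label, adding a new
 * enumerator without a matching "case" makes -Wswitch warn here.
 */
int report(enum unpack_loose_header_result r, const char *hex)
{
	int status = 0;

	assert(hex != NULL);
	switch (r) {
	case ULHR_OK:
		break;
	case ULHR_BAD:
		fprintf(stderr, "error: unable to unpack %s header\n", hex);
		status = -1; /* git's error() also returns -1 */
		break;
	}
	return status;
}
```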
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 19 +++++++++++++++----
object-file.c | 34 +++++++++++++++++++++-------------
streaming.c | 23 +++++++++++++----------
3 files changed, 49 insertions(+), 27 deletions(-)
diff --git a/cache.h b/cache.h
index 9ad81d452ad..90dde86828e 100644
--- a/cache.h
+++ b/cache.h
@@ -1318,7 +1318,10 @@ int git_open_cloexec(const char *name, int flags);
* unpack_loose_header() initializes the data stream needed to unpack
* a loose object header.
*
- * Returns 0 on success. Returns negative values on error.
+ * Returns:
+ *
+ * - ULHR_OK on success
+ * - ULHR_BAD on error
*
* It will only parse up to MAX_HEADER_LEN bytes unless an optional
* "hdrbuf" argument is non-NULL. This is intended for use with
@@ -1326,9 +1329,17 @@ int git_open_cloexec(const char *name, int flags);
* reporting. The full header will be extracted to "hdrbuf" for use
* with parse_loose_header().
*/
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result {
+ ULHR_OK,
+ ULHR_BAD,
+};
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+ unsigned long mapsize,
+ void *buffer,
+ unsigned long bufsiz,
+ struct strbuf *hdrbuf);
+
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi,
unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 8dd35f768bb..b214a152ca8 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1233,10 +1233,12 @@ void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
}
-int unpack_loose_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz,
- struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+ unsigned long mapsize,
+ void *buffer,
+ unsigned long bufsiz,
+ struct strbuf *header)
{
int status;
@@ -1252,13 +1254,13 @@ int unpack_loose_header(git_zstream *stream,
status = git_inflate(stream, 0);
obj_read_lock();
if (status < Z_OK)
- return -1;
+ return ULHR_BAD;
/*
* Check if entire header is unpacked in the first iteration.
*/
if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
- return 0;
+ return ULHR_OK;
/*
* We have a header longer than MAX_HEADER_LEN. The "header"
@@ -1266,7 +1268,7 @@ int unpack_loose_header(git_zstream *stream,
* --allow-unknown-type".
*/
if (!header)
- return -1;
+ return ULHR_BAD;
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
@@ -1287,7 +1289,7 @@ int unpack_loose_header(git_zstream *stream,
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status != Z_STREAM_END);
- return -1;
+ return ULHR_BAD;
}
static void *unpack_loose_rest(git_zstream *stream,
@@ -1452,13 +1454,19 @@ static int loose_object_info(struct repository *r,
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
- if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
- allow_unknown ? &hdrbuf : NULL) < 0)
+ switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ allow_unknown ? &hdrbuf : NULL)) {
+ case ULHR_OK:
+ break;
+ case ULHR_BAD:
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
- if (status < 0)
- ; /* Do nothing */
- else if (hdrbuf.len) {
+ break;
+ }
+
+ if (status < 0) {
+ /* Do nothing */
+ } else if (hdrbuf.len) {
if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
oid_to_hex(oid));
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..6df0247a4cb 100644
--- a/streaming.c
+++ b/streaming.c
@@ -229,17 +229,16 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
- if ((unpack_loose_header(&st->z,
- st->u.loose.mapped,
- st->u.loose.mapsize,
- st->u.loose.hdr,
- sizeof(st->u.loose.hdr),
- NULL) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
- git_inflate_end(&st->z);
- munmap(st->u.loose.mapped, st->u.loose.mapsize);
- return -1;
+ switch (unpack_loose_header(&st->z, st->u.loose.mapped,
+ st->u.loose.mapsize, st->u.loose.hdr,
+ sizeof(st->u.loose.hdr), NULL)) {
+ case ULHR_OK:
+ break;
+ case ULHR_BAD:
+ goto error;
}
+ if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+ goto error;
st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
st->u.loose.hdr_avail = st->z.total_out;
@@ -248,6 +247,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->read = read_istream_loose;
return 0;
+error:
+ git_inflate_end(&st->z);
+ munmap(st->u.loose.mapped, st->u.loose.mapsize);
+ return -1;
}
--
2.33.0.1098.g29a6526ae47
^ permalink raw reply related [flat|nested] 245+ messages in thread
* [PATCH v7 14/17] object-file.c: return ULHR_TOO_LONG on "header too long"
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (12 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
` (3 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.
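The two failure modes can be told apart roughly as sketched below. `classify()` is a hypothetical stand-in for the no-`hdrbuf` path of `unpack_loose_header()`, with `inflate_ok` modelling whether zlib succeeded. `describe()` is likewise hypothetical. `MAX_HEADER_LEN` is set to 32 to match the "exceeds 32 bytes" expected by the t1006-cat-file.sh change in this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HEADER_LEN 32

enum unpack_loose_header_result {
	ULHR_OK,
	ULHR_BAD,
	ULHR_TOO_LONG,
};

/*
 * Hypothetical model of unpack_loose_header() called without "hdrbuf":
 * "inflate_ok" says whether zlib succeeded, and "out"/"outlen" are the
 * bytes it produced into the fixed-size header buffer.
 */
enum unpack_loose_header_result classify(int inflate_ok, const char *out,
					 size_t outlen)
{
	if (!inflate_ok)
		return ULHR_BAD;
	if (memchr(out, '\0', outlen))
		return ULHR_OK; /* whole header fits */
	return ULHR_TOO_LONG; /* no NUL within the buffer */
}

/* Formats the caller's message into "msg"; returns -1 on error like error(). */
int describe(enum unpack_loose_header_result r, const char *hex,
	     char *msg, size_t msglen)
{
	assert(msg != NULL && msglen > 0);
	msg[0] = '\0';
	switch (r) {
	case ULHR_OK:
		return 0;
	case ULHR_BAD:
		snprintf(msg, msglen, "error: unable to unpack %s header", hex);
		return -1;
	case ULHR_TOO_LONG:
		snprintf(msg, msglen,
			 "error: header for %s too long, exceeds %d bytes",
			 hex, MAX_HEADER_LEN);
		return -1;
	}
	return -1; /* not reached for valid enumerators */
}
```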
As a test added earlier in this series in t1006-cat-file.sh shows,
we already correctly emit zlib errors from zlib.c in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG, saying we ran into the
MAX_HEADER_LEN limit, or ULHR_BAD for the generic "unable to unpack
<OID> header" case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 5 ++++-
object-file.c | 8 ++++++--
streaming.c | 1 +
t/t1006-cat-file.sh | 4 ++--
4 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/cache.h b/cache.h
index 90dde86828e..49b18f2755c 100644
--- a/cache.h
+++ b/cache.h
@@ -1322,16 +1322,19 @@ int git_open_cloexec(const char *name, int flags);
*
* - ULHR_OK on success
* - ULHR_BAD on error
+ * - ULHR_TOO_LONG if the header was too long
*
* It will only parse up to MAX_HEADER_LEN bytes unless an optional
* "hdrbuf" argument is non-NULL. This is intended for use with
* OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
* reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), ULHR_TOO_LONG will still be returned
+ * from this function to indicate that the header was too long.
*/
enum unpack_loose_header_result {
ULHR_OK,
ULHR_BAD,
+ ULHR_TOO_LONG,
};
enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
unsigned char *map,
diff --git a/object-file.c b/object-file.c
index b214a152ca8..ca4abe172ce 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1268,7 +1268,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
* --allow-unknown-type".
*/
if (!header)
- return ULHR_BAD;
+ return ULHR_TOO_LONG;
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
@@ -1289,7 +1289,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status != Z_STREAM_END);
- return ULHR_BAD;
+ return ULHR_TOO_LONG;
}
static void *unpack_loose_rest(git_zstream *stream,
@@ -1462,6 +1462,10 @@ static int loose_object_info(struct repository *r,
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
break;
+ case ULHR_TOO_LONG:
+ status = error(_("header for %s too long, exceeds %d bytes"),
+ oid_to_hex(oid), MAX_HEADER_LEN);
+ break;
}
if (status < 0) {
diff --git a/streaming.c b/streaming.c
index 6df0247a4cb..bd89c50e7b3 100644
--- a/streaming.c
+++ b/streaming.c
@@ -235,6 +235,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
case ULHR_OK:
break;
case ULHR_BAD:
+ case ULHR_TOO_LONG:
goto error;
}
if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 269ab7e4729..711dcc6d795 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -356,12 +356,12 @@ do
if test "$arg2" = "-p"
then
cat >expect <<-EOF
- error: unable to unpack $bogus_long_sha1 header
+ error: header for $bogus_long_sha1 too long, exceeds 32 bytes
fatal: Not a valid object name $bogus_long_sha1
EOF
else
cat >expect <<-EOF
- error: unable to unpack $bogus_long_sha1 header
+ error: header for $bogus_long_sha1 too long, exceeds 32 bytes
fatal: git cat-file: could not get object info
EOF
fi &&
--
2.33.0.1098.g29a6526ae47
^ permalink raw reply related [flat|nested] 245+ messages in thread
* [PATCH v7 15/17] object-file.c: stop dying in parse_loose_header()
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (13 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
` (2 subsequent siblings)
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Make parse_loose_header() return error codes and data instead of
invoking die() by itself.
For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (which should
be moved to builtin/cat-file.c), but I'll leave that for now.
To make parse_loose_header() not die(), change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated
"oi->typep".
Because of this we don't need to pass in the "unsigned int flags"
that we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE; we can do that check
in loose_object_info() instead.
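The split of responsibilities is that the parser only parses and the caller applies policy. It can be sketched as follows. The types and both functions are hypothetical simplifications, not git's code. The `-2` return stands in for the `die(_("invalid object type"))` that `loose_object_info()` still calls when unknown types aren't allowed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not git's real definitions. */
enum object_type { OBJ_BAD = -1, OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4 };

struct object_info {
	enum object_type *typep;
	unsigned long *sizep;
};

static enum object_type type_from_name(const char *name, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (enum object_type)(i + 1);
	return OBJ_BAD;
}

/*
 * Returns 0 if "<type> <len>" is well-formed and -1 otherwise. An
 * unknown <type> is not a parse error: it is reported as OBJ_BAD via
 * oi->typep, and the caller decides what to do about it.
 */
int parse_header_sketch(const char *hdr, struct object_info *oi)
{
	const char *type_buf = hdr;
	unsigned long size = 0;

	assert(oi != NULL && oi->typep != NULL && oi->sizep != NULL);
	while (*hdr && *hdr != ' ')
		hdr++;
	if (*hdr != ' ')
		return -1;
	*oi->typep = type_from_name(type_buf, (size_t)(hdr - type_buf));
	hdr++;
	if (!isdigit((unsigned char)*hdr))
		return -1;
	while (isdigit((unsigned char)*hdr))
		size = size * 10 + (unsigned long)(*hdr++ - '0');
	*oi->sizep = size;
	return *hdr ? -1 : 0;
}

/*
 * Caller-side policy, loosely modelled on loose_object_info() after
 * this patch: 0 on success, -1 for a malformed header, and -2 for an
 * unknown type that isn't allowed.
 */
int check_header(const char *hdr, int allow_unknown, enum object_type *type)
{
	unsigned long size;
	struct object_info oi = { type, &size };

	if (parse_header_sketch(hdr, &oi) < 0)
		return -1;
	if (!allow_unknown && *type < 0)
		return -2;
	return 0;
}
```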
This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.
Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that could
be any negative value, as we were expecting to return the "enum
object_type".
The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.
Since parse_loose_header() doesn't need to return an arbitrary
"status" we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.
We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects; see
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it; a subsequent
commit will address that edge case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 11 +++++++--
object-file.c | 67 +++++++++++++++++++++++++--------------------------
streaming.c | 3 ++-
3 files changed, 44 insertions(+), 37 deletions(-)
diff --git a/cache.h b/cache.h
index 49b18f2755c..23f0534b70e 100644
--- a/cache.h
+++ b/cache.h
@@ -1343,9 +1343,16 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
unsigned long bufsiz,
struct strbuf *hdrbuf);
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
struct object_info;
-int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags);
+int parse_loose_header(const char *hdr, struct object_info *oi);
int check_object_signature(struct repository *r, const struct object_id *oid,
void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index ca4abe172ce..1af914c19c6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1347,8 +1347,7 @@ static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
-int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
{
const char *type_buf = hdr;
unsigned long size;
@@ -1370,15 +1369,6 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
type = type_from_string_gently(type_buf, type_len, 1);
if (oi->type_name)
strbuf_add(oi->type_name, type_buf, type_len);
- /*
- * Set type to 0 if its an unknown object and
- * we're obtaining the type using '--allow-unknown-type'
- * option.
- */
- if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
- type = 0;
- else if (type < 0)
- die(_("invalid object type"));
if (oi->typep)
*oi->typep = type;
@@ -1405,7 +1395,14 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
/*
* The length must be followed by a zero byte
*/
- return *hdr ? -1 : type;
+ if (*hdr)
+ return -1;
+
+ /*
+ * The format is valid, but the type may still be bogus. The
+ * Caller needs to check its oi->typep.
+ */
+ return 0;
}
static int loose_object_info(struct repository *r,
@@ -1419,6 +1416,7 @@ static int loose_object_info(struct repository *r,
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ enum object_type type_scratch;
int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
@@ -1450,6 +1448,8 @@ static int loose_object_info(struct repository *r,
if (!oi->sizep)
oi->sizep = &size_scratch;
+ if (!oi->typep)
+ oi->typep = &type_scratch;
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
@@ -1457,6 +1457,18 @@ static int loose_object_info(struct repository *r,
switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
allow_unknown ? &hdrbuf : NULL)) {
case ULHR_OK:
+ if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0)
+ status = error(_("unable to parse %s header"), oid_to_hex(oid));
+ else if (!allow_unknown && *oi->typep < 0)
+ die(_("invalid object type"));
+
+ if (!oi->contentp)
+ break;
+ *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+ if (*oi->contentp)
+ goto cleanup;
+
+ status = -1;
break;
case ULHR_BAD:
status = error(_("unable to unpack %s header"),
@@ -1468,31 +1480,16 @@ static int loose_object_info(struct repository *r,
break;
}
- if (status < 0) {
- /* Do nothing */
- } else if (hdrbuf.len) {
- if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
- status = error(_("unable to parse %s header with --allow-unknown-type"),
- oid_to_hex(oid));
- } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
- status = error(_("unable to parse %s header"), oid_to_hex(oid));
-
- if (status >= 0 && oi->contentp) {
- *oi->contentp = unpack_loose_rest(&stream, hdr,
- *oi->sizep, oid);
- if (!*oi->contentp) {
- git_inflate_end(&stream);
- status = -1;
- }
- } else
- git_inflate_end(&stream);
-
+ git_inflate_end(&stream);
+cleanup:
munmap(map, mapsize);
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
+ if (oi->typep == &type_scratch)
+ oi->typep = NULL;
oi->whence = OI_LOOSE;
- return (status < 0) ? status : 0;
+ return status;
}
int obj_read_use_lock = 0;
@@ -2559,6 +2556,7 @@ int read_loose_object(const char *path,
git_zstream stream;
char hdr[MAX_HEADER_LEN];
struct object_info oi = OBJECT_INFO_INIT;
+ oi.typep = type;
oi.sizep = size;
*contents = NULL;
@@ -2575,12 +2573,13 @@ int read_loose_object(const char *path,
goto out;
}
- *type = parse_loose_header(hdr, &oi, 0);
- if (*type < 0) {
+ if (parse_loose_header(hdr, &oi) < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
}
+ if (*type < 0)
+ die(_("invalid object type"));
if (*type == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/streaming.c b/streaming.c
index bd89c50e7b3..fe54665d86e 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
{
struct object_info oi = OBJECT_INFO_INIT;
oi.sizep = &st->size;
+ oi.typep = type;
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
@@ -238,7 +239,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
case ULHR_TOO_LONG:
goto error;
}
- if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+ if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
goto error;
st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
--
2.33.0.1098.g29a6526ae47
^ permalink raw reply related [flat|nested] 245+ messages in thread
* [PATCH v7 16/17] fsck: don't hard die on invalid object types
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (14 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-20 19:04 ` [PATCH v7 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Change the error fsck emits on invalid object types, such as:
$ git hash-object --stdin -w -t garbage --literally </dev/null
<OID>
From the very ungraceful error of:
$ git fsck
fatal: invalid object type
$
To:
$ git fsck
error: <OID>: object is of unknown type 'garbage': <OID_PATH>
[ other fsck output ]
We'll still exit with a non-zero status, but now we'll finish the rest
of the traversal. The test being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().
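Here is a rough sketch of the resulting fsck behaviour: every bad object is reported and flagged, and the loop moves on. `struct fake_object` and `fsck_all()` are hypothetical. `ERROR_OBJECT` mirrors the name of the flag used in builtin/fsck.c, but its value here is an assumption. Each header is also given as a plain string rather than being read and inflated from disk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ERROR_OBJECT 01 /* assumed value; the name mirrors builtin/fsck.c */

struct fake_object {
	const char *hex; /* object ID in hex */
	const char *hdr; /* decoded "<type> <len>" header */
};

static int known_type(const char *t, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && !memcmp(names[i], t, len))
			return 1;
	return 0;
}

/*
 * Report each broken object, set the error bit, and keep checking the
 * rest instead of dying on the first unknown type. Returns how many
 * objects were reported.
 */
int fsck_all(const struct fake_object *objs, size_t n, int *errors_found)
{
	int reported = 0;

	assert(errors_found != NULL);
	for (size_t i = 0; i < n; i++) {
		const char *hdr = objs[i].hdr;
		const char *sp = strchr(hdr, ' ');

		if (!sp) {
			fprintf(stderr, "error: %s: object corrupt or missing\n",
				objs[i].hex);
		} else if (!known_type(hdr, (size_t)(sp - hdr))) {
			fprintf(stderr, "error: %s: object is of unknown type '%.*s'\n",
				objs[i].hex, (int)(sp - hdr), hdr);
		} else {
			continue; /* this object looks fine */
		}
		*errors_found |= ERROR_OBJECT;
		reported++;
	}
	return reported;
}
```

The key design point, as in the patch, is that an unknown type becomes data for the caller to report rather than a reason to stop the traversal.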
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fsck.c | 17 ++++++++++++++---
object-file.c | 18 ++++++------------
object-store.h | 6 +++---
t/t1450-fsck.sh | 17 +++++++++--------
4 files changed, 32 insertions(+), 26 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..3b046820750 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,11 +600,22 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
unsigned long size;
void *contents;
int eaten;
+ struct strbuf sb = STRBUF_INIT;
+ struct object_info oi = OBJECT_INFO_INIT;
+ int err = 0;
- if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+ oi.type_name = &sb;
+ oi.sizep = &size;
+ oi.typep = &type;
+
+ if (read_loose_object(path, oid, &contents, &oi) < 0)
+ err = error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
+ if (type < 0)
+ err = error(_("%s: object is of unknown type '%s': %s"),
+ oid_to_hex(oid), sb.buf, path);
+ if (err) {
errors_found |= ERROR_OBJECT;
- error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
return 0; /* keep checking other objects */
}
diff --git a/object-file.c b/object-file.c
index 1af914c19c6..be568ade95b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2546,18 +2546,15 @@ static int check_stream_oid(git_zstream *stream,
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents)
+ void **contents,
+ struct object_info *oi)
{
int ret = -1;
void *map = NULL;
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
- struct object_info oi = OBJECT_INFO_INIT;
- oi.typep = type;
- oi.sizep = size;
+ unsigned long *size = oi->sizep;
*contents = NULL;
@@ -2573,15 +2570,13 @@ int read_loose_object(const char *path,
goto out;
}
- if (parse_loose_header(hdr, &oi) < 0) {
+ if (parse_loose_header(hdr, oi) < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
}
- if (*type < 0)
- die(_("invalid object type"));
- if (*type == OBJ_BLOB && *size > big_file_threshold) {
+ if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
goto out;
} else {
@@ -2592,8 +2587,7 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size,
- type_name(*type))) {
+ *contents, *size, oi->type_name->buf)) {
error(_("hash mismatch for %s (expected %s)"), path,
oid_to_hex(expected_oid));
free(*contents);
diff --git a/object-store.h b/object-store.h
index b4dc6668aa2..e8b4d87b898 100644
--- a/object-store.h
+++ b/object-store.h
@@ -244,6 +244,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
/*
* Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
* type, and size. If the object is a blob, then "contents" may return NULL,
* to allow streaming of large blobs.
*
@@ -251,9 +252,8 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents);
+ void **contents,
+ struct object_info *oi);
/* Retry packed storage after checking packed and loose storage */
#define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bd696d21dba..167c319823a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -85,11 +85,10 @@ test_expect_success 'object with hash and type mismatch' '
cmt=$(echo bogus | git commit-tree $tree) &&
git update-ref refs/heads/bogus $cmt &&
- cat >expect <<-\EOF &&
- fatal: invalid object type
- EOF
- test_must_fail git fsck 2>actual &&
- test_cmp expect actual
+
+ test_must_fail git fsck 2>out &&
+ grep "^error: hash mismatch for " out &&
+ grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
)
'
@@ -910,7 +909,7 @@ test_expect_success 'detect corrupt index file in fsck' '
test_i18ngrep "bad index file" errors
'
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
git init --bare garbage-type &&
(
cd garbage-type &&
@@ -922,8 +921,10 @@ test_expect_success 'fsck hard errors on an invalid object type' '
fatal: invalid object type
EOF
test_must_fail git fsck >out 2>err &&
- test_cmp err.expect err &&
- test_must_be_empty out
+ grep -e "^error" -e "^fatal" err >errors &&
+ test_line_count = 1 errors &&
+ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+ grep "dangling blob $empty_blob" out
)
'
--
2.33.0.1098.g29a6526ae47
* [PATCH v7 17/17] fsck: report invalid object type-path combinations
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (15 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-20 19:04 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
17 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-20 19:04 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.
Before this change we'd prefix the error with a not-an-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.
Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series, e.g.:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ mv objects/e6/ objects/e7
Would emit (with "[...]" abbreviating the OIDs):
$ git fsck
error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]
Now we'll instead emit:
error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
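A hypothetical helper showing how a hex name can be reassembled from a loose-object path (two-character fan-out directory plus file name). After the "mv objects/e6/ objects/e7" above, the path-derived name starts with e79d even though the content still hashes to e69d; that path-derived name is the "not-an-OID" the old message printed. This is an illustrative sketch, not git's loose-object iterator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Rebuild a hex object name from a path such as
 * "./objects/e7/9de29b...". Returns 0 on success, or -1 if the
 * path does not end in "<2-char dir>/<rest>" or "out" is too small.
 */
static int sketch_path_to_hex(const char *path, char *out, size_t outlen)
{
	const char *slash = strrchr(path, '/');
	const char *dir;
	size_t rest_len;

	if (!slash || slash - path < 2)
		return -1;
	dir = slash - 2;
	if (dir > path && dir[-1] != '/')
		return -1; /* fan-out directory must be exactly two chars */
	rest_len = strlen(slash + 1);
	if (rest_len + 3 > outlen)
		return -1; /* 2 dir chars + rest + NUL */
	memcpy(out, dir, 2);
	memcpy(out + 2, slash + 1, rest_len + 1); /* copies the NUL too */
	return 0;
}
```

Comparing this path-derived name against the hash of the object's contents is what distinguishes "found at the wrong path" from "content is corrupt".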
Furthermore, we'll do the right thing when both the object type and
its location are bad, i.e. in this case:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ mv objects/83 objects/84
As noted in earlier commits, we'd simply die early in those cases,
until the preceding commits fixed the hard die on invalid object types:
$ git fsck
fatal: invalid object type
Now we'll instead emit sensible error messages:
$ git fsck
error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]
In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
We need to add the "object corrupt or missing" special case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ git fsck
error: garbage at end of loose object 'e69d[...]'
error: unable to unpack contents of ./objects/e6/9d[...]
error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]
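The sentinel logic can be sketched as follows, assuming a hypothetical fixed-size, SHA-1-sized OID type. real_oid starts out as the all-zero null OID and is only overwritten once the content has been hashed, so a read failure with a still-null real_oid falls back to "object corrupt or missing". Note that the builtin/fsck.c hunk in this patch keys that branch on contents being non-NULL; checking the sentinel directly, as here, is a simplification.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_RAWSZ 20 /* SHA-1 sized; an assumption for this sketch */

struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

static int sketch_is_null(const struct sketch_oid *oid)
{
	static const struct sketch_oid sketch_null; /* all zero bytes */
	return !memcmp(oid->hash, sketch_null.hash, SKETCH_RAWSZ);
}

enum sketch_report {
	SKETCH_OK,
	SKETCH_HASH_PATH_MISMATCH,
	SKETCH_CORRUPT_OR_MISSING,
};

/* Pick the message to print after read_loose_object()-style reading. */
static enum sketch_report sketch_classify(int read_ret,
					  const struct sketch_oid *expected,
					  const struct sketch_oid *real_oid)
{
	if (read_ret >= 0)
		return SKETCH_OK;
	/* Only a non-null real_oid proves we got as far as hashing. */
	if (!sketch_is_null(real_oid) &&
	    memcmp(real_oid->hash, expected->hash, SKETCH_RAWSZ))
		return SKETCH_HASH_PATH_MISMATCH;
	return SKETCH_CORRUPT_OR_MISSING;
}
```

In the "garbage at end of loose object" case above, unpacking fails before any hashing, so real_oid stays null and the fallback message is used.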
There is currently some weird messaging in the edge case where the two
are combined. Because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object(), we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
$ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
$ /usr/bin/git fsck
fatal: invalid object type
$ ~/g/git/git fsck
error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
[...]
I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.
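A sketch of what "enum-ifying more error state" might look like for read_loose_object(). The enum and its values are hypothetical (of these, only enum unpack_loose_header_result exists in this series); the point is that a caller could tell "never hashed" apart from "hashed, but mismatched" without inspecting a sentinel OID, and so would never print the null OID.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for a read_loose_object()-like function. */
enum sketch_read_loose_result {
	SKETCH_RLR_OK = 0,
	SKETCH_RLR_MMAP_FAILED,
	SKETCH_RLR_BAD_HEADER,
	SKETCH_RLR_UNPACK_FAILED,   /* e.g. garbage after the zlib stream */
	SKETCH_RLR_HASH_MISMATCH,
};

static const char *sketch_describe(enum sketch_read_loose_result r)
{
	switch (r) {
	case SKETCH_RLR_OK:
		return NULL;
	case SKETCH_RLR_MMAP_FAILED:
		return "unable to mmap";
	case SKETCH_RLR_BAD_HEADER:
		return "unable to parse header";
	case SKETCH_RLR_UNPACK_FAILED:
		return "unable to unpack contents";
	case SKETCH_RLR_HASH_MISMATCH:
		return "hash-path mismatch";
	}
	return "unknown error";
}

/* Only results reached after hashing carry a real OID worth printing. */
static int sketch_has_real_oid(enum sketch_read_loose_result r)
{
	return r == SKETCH_RLR_OK || r == SKETCH_RLR_HASH_MISMATCH;
}
```

With such a result code, the "unknown type *and* zlib failure" case would report the path-derived OID rather than 0000000000000000000000000000000000000000.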
There are other such potential edge cases, all of which might produce
some confusing messaging, but would still be handled correctly as far
as passing along errors goes. E.g. check_object_signature() may return
with oideq(real_oid, null_oid()) still true, which could happen if it
returns -1 because the read_istream() call failed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fast-export.c | 2 +-
builtin/fsck.c | 23 +++++++++++++++--------
builtin/index-pack.c | 2 +-
builtin/mktag.c | 3 ++-
cache.h | 3 ++-
object-file.c | 21 ++++++++++-----------
object-store.h | 1 +
object.c | 4 ++--
pack-check.c | 3 ++-
t/t1006-cat-file.sh | 2 +-
t/t1450-fsck.sh | 8 +++++---
11 files changed, 42 insertions(+), 30 deletions(-)
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 95e8e89e81f..8e2caf72819 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
if (!buf)
die("could not read blob %s", oid_to_hex(oid));
if (check_object_signature(the_repository, oid, buf, size,
- type_name(type)) < 0)
+ type_name(type), NULL) < 0)
die("oid mismatch in blob %s", oid_to_hex(oid));
object = parse_object_buffer(the_repository, oid, type,
size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3b046820750..d925cdbae5c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -598,23 +598,30 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
struct object *obj;
enum object_type type;
unsigned long size;
- void *contents;
+ unsigned char *contents = NULL;
int eaten;
struct strbuf sb = STRBUF_INIT;
struct object_info oi = OBJECT_INFO_INIT;
- int err = 0;
+ struct object_id real_oid = *null_oid();
+ int ret;
oi.type_name = &sb;
oi.sizep = &size;
oi.typep = &type;
- if (read_loose_object(path, oid, &contents, &oi) < 0)
- err = error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
+ ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
+ if (ret < 0) {
+ if (contents && !oideq(&real_oid, oid))
+ error(_("%s: hash-path mismatch, found at: %s"),
+ oid_to_hex(&real_oid), path);
+ else
+ error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
+ }
if (type < 0)
- err = error(_("%s: object is of unknown type '%s': %s"),
- oid_to_hex(oid), sb.buf, path);
- if (err) {
+ ret = error(_("%s: object is of unknown type '%s': %s"),
+ oid_to_hex(&real_oid), sb.buf, path);
+ if (ret < 0) {
errors_found |= ERROR_OBJECT;
return 0; /* keep checking other objects */
}
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 6cc48902170..17c4b1d3ead 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1415,7 +1415,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
if (check_object_signature(the_repository, &d->oid,
data, size,
- type_name(type)))
+ type_name(type), NULL))
die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
repl = lookup_replace_object(the_repository, tagged_oid);
ret = check_object_signature(the_repository, repl,
- buffer, size, type_name(*tagged_type));
+ buffer, size, type_name(*tagged_type),
+ NULL);
free(buffer);
return ret;
diff --git a/cache.h b/cache.h
index 23f0534b70e..44b11f52362 100644
--- a/cache.h
+++ b/cache.h
@@ -1355,7 +1355,8 @@ struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi);
int check_object_signature(struct repository *r, const struct object_id *oid,
- void *buf, unsigned long size, const char *type);
+ void *buf, unsigned long size, const char *type,
+ struct object_id *real_oidp);
int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/object-file.c b/object-file.c
index be568ade95b..ff0e465d556 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1062,9 +1062,11 @@ void *xmmap(void *start, size_t length,
* the streaming interface and rehash it to do the same.
*/
int check_object_signature(struct repository *r, const struct object_id *oid,
- void *map, unsigned long size, const char *type)
+ void *map, unsigned long size, const char *type,
+ struct object_id *real_oidp)
{
- struct object_id real_oid;
+ struct object_id tmp;
+ struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
enum object_type obj_type;
struct git_istream *st;
git_hash_ctx c;
@@ -1072,8 +1074,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
int hdrlen;
if (map) {
- hash_object_file(r->hash_algo, map, size, type, &real_oid);
- return !oideq(oid, &real_oid) ? -1 : 0;
+ hash_object_file(r->hash_algo, map, size, type, real_oid);
+ return !oideq(oid, real_oid) ? -1 : 0;
}
st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1098,9 +1100,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
break;
r->hash_algo->update_fn(&c, buf, readlen);
}
- r->hash_algo->final_oid_fn(&real_oid, &c);
+ r->hash_algo->final_oid_fn(real_oid, &c);
close_istream(st);
- return !oideq(oid, &real_oid) ? -1 : 0;
+ return !oideq(oid, real_oid) ? -1 : 0;
}
int git_open_cloexec(const char *name, int flags)
@@ -2546,6 +2548,7 @@ static int check_stream_oid(git_zstream *stream,
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
struct object_info *oi)
{
@@ -2556,8 +2559,6 @@ int read_loose_object(const char *path,
char hdr[MAX_HEADER_LEN];
unsigned long *size = oi->sizep;
- *contents = NULL;
-
map = map_loose_object_1(the_repository, path, NULL, &mapsize);
if (!map) {
error_errno(_("unable to mmap %s"), path);
@@ -2587,9 +2588,7 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size, oi->type_name->buf)) {
- error(_("hash mismatch for %s (expected %s)"), path,
- oid_to_hex(expected_oid));
+ *contents, *size, oi->type_name->buf, real_oid)) {
free(*contents);
goto out;
}
diff --git a/object-store.h b/object-store.h
index e8b4d87b898..77aa3d857cc 100644
--- a/object-store.h
+++ b/object-store.h
@@ -252,6 +252,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
struct object_info *oi);
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
(!obj && repo_has_object_file(r, oid) &&
oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
- if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+ if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
error(_("hash mismatch %s"), oid_to_hex(oid));
return NULL;
}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
buffer = repo_read_object_file(r, oid, &type, &size);
if (buffer) {
if (check_object_signature(r, repl, buffer, size,
- type_name(type)) < 0) {
+ type_name(type), NULL) < 0) {
free(buffer);
error(_("hash mismatch %s"), oid_to_hex(repl));
return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
oid_to_hex(&oid), p->pack_name,
(uintmax_t)entries[i].offset);
- else if (check_object_signature(r, &oid, data, size, type_name(type)))
+ else if (check_object_signature(r, &oid, data, size,
+ type_name(type), NULL))
err = error("packed %s from %s is corrupt",
oid_to_hex(&oid), p->pack_name);
else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 711dcc6d795..1f7cc0717b7 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -512,7 +512,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
# Swap the two to corrupt the repository
mv -f "$other_path" "$empty_path" &&
test_must_fail git fsck 2>err.fsck &&
- grep "hash mismatch" err.fsck &&
+ grep "hash-path mismatch" err.fsck &&
# confirm that cat-file is reading the new swapped-in
# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 167c319823a..eb0e772f098 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -54,6 +54,7 @@ test_expect_success 'object with hash mismatch' '
cd hash-mismatch &&
oid=$(echo blob | git hash-object -w --stdin) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ -65,7 +66,7 @@ test_expect_success 'object with hash mismatch' '
git update-ref refs/heads/bogus $cmt &&
test_must_fail git fsck 2>out &&
- grep "$oid.*corrupt" out
+ grep "$oldoid: hash-path mismatch, found at: .*$new" out
)
'
@@ -75,6 +76,7 @@ test_expect_success 'object with hash and type mismatch' '
cd hash-type-mismatch &&
oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ -87,8 +89,8 @@ test_expect_success 'object with hash and type mismatch' '
test_must_fail git fsck 2>out &&
- grep "^error: hash mismatch for " out &&
- grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+ grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+ grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
)
'
--
2.33.0.1098.g29a6526ae47
* [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
2021-09-20 19:04 ` [PATCH v7 00/17] " Ævar Arnfjörð Bjarmason
` (16 preceding siblings ...)
2021-09-20 19:04 ` [PATCH v7 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
` (18 more replies)
17 siblings, 19 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
This improves fsck error reporting; see the examples in the commit
messages of 16/17 and 17/17. To get there I've lib-ified more things
in object-file.c and the general object APIs, i.e. now we'll return
error codes instead of calling die() in these cases.
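For readers unfamiliar with the idiom: git's error() prints its message and returns -1, so library code can report a problem and propagate it in one step instead of calling die(), which exits the process. A minimal sketch, with a hypothetical sketch_error() standing in for error():

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for git's error(): print, then return -1. */
static int sketch_error(const char *fmt, ...)
{
	va_list ap;

	fputs("error: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return -1;
}

/* Library-style check: report and return instead of exiting. */
static int sketch_check_type(int type, const char *type_name)
{
	if (type < 0)
		return sketch_error("object is of unknown type '%s'",
				    type_name);
	return 0;
}
```

The caller, not the library, then decides whether the error is fatal; fsck chooses to keep going.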
v6 of this got a very detailed review from Taylor Blau (thanks a
lot!). For v6 see:
https://lore.kernel.org/git/cover-v6-00.22-00000000000-20210907T104558Z-avarab@gmail.com/
The v7 had a couple of trivial shell-scripting issues: a typo'd
test_oid variable, and a warning on a "test" comparison. For v7 see:
https://lore.kernel.org/git/cover-v7-00.17-00000000000-20210920T190304Z-avarab@gmail.com/
Ævar Arnfjörð Bjarmason (17):
fsck tests: add test for fsck-ing an unknown type
fsck tests: refactor one test to use a sub-repo
fsck tests: test current hash/type mismatch behavior
fsck tests: test for garbage appended to a loose object
cat-file tests: move bogus_* variable declarations earlier
cat-file tests: test for missing/bogus object with -t, -s and -p
cat-file tests: add corrupt loose object test
cat-file tests: test for current --allow-unknown-type behavior
object-file.c: don't set "typep" when returning non-zero
object-file.c: return -1, not "status" from unpack_loose_header()
object-file.c: make parse_loose_header_extended() public
object-file.c: simplify unpack_loose_short_header()
object-file.c: use "enum" return type for unpack_loose_header()
object-file.c: return ULHR_TOO_LONG on "header too long"
object-file.c: stop dying in parse_loose_header()
fsck: don't hard die on invalid object types
fsck: report invalid object type-path combinations
builtin/fast-export.c | 2 +-
builtin/fsck.c | 28 +++++-
builtin/index-pack.c | 2 +-
builtin/mktag.c | 3 +-
cache.h | 45 ++++++++-
object-file.c | 176 +++++++++++++++------------------
object-store.h | 7 +-
object.c | 4 +-
pack-check.c | 3 +-
streaming.c | 27 +++--
t/oid-info/oid | 2 +
t/t1006-cat-file.sh | 223 +++++++++++++++++++++++++++++++++++++++---
t/t1450-fsck.sh | 99 +++++++++++++++----
13 files changed, 463 insertions(+), 158 deletions(-)
Range-diff against v7:
1: 752cef556c2 = 1: b999ab695d9 fsck tests: add test for fsck-ing an unknown type
2: 612003bdd2c = 2: e01c21378a4 fsck tests: refactor one test to use a sub-repo
3: 1e40a4235e9 = 3: 93197a7bcee fsck tests: test current hash/type mismatch behavior
4: 854991c1543 = 4: 277188dd58d fsck tests: test for garbage appended to a loose object
5: fc93c2c2530 = 5: ab2ea1beaaf cat-file tests: move bogus_* variable declarations earlier
6: 051088aa114 ! 6: 91229b94fac cat-file tests: test for missing/bogus object with -t, -s and -p
@@ t/oid-info/oid: numeric sha1:0123456789012345678901234567890123456789
deadbeef sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
deadbeef sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef_short sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
-+deadbee_short sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
++deadbeef_short sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
## t/t1006-cat-file.sh ##
@@ t/t1006-cat-file.sh: test_expect_success 'setup bogus data' '
@@ t/t1006-cat-file.sh: test_expect_success 'setup bogus data' '
+do
+ for arg2 in -s -t -p
+ do
-+ if test $arg1 = "--allow-unknown-type" && test "$arg2" = "-p"
++ if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p"
+ then
+ continue
+ fi
7: 20bd81c1af0 = 7: 9e95e134d30 cat-file tests: add corrupt loose object test
8: cd1d52b8a07 = 8: 215f98ad369 cat-file tests: test for current --allow-unknown-type behavior
9: d9f5adfc74b = 9: 3e1df3594df object-file.c: don't set "typep" when returning non-zero
10: 51d14bc9274 = 10: b96828f3d5b object-file.c: return -1, not "status" from unpack_loose_header()
11: f43cfd8a5ed = 11: 273acb45517 object-file.c: make parse_loose_header_extended() public
12: 50d938f7f3c = 12: 314d34357dd object-file.c: simplify unpack_loose_short_header()
13: 755fde00b46 = 13: 07481bcb55c object-file.c: use "enum" return type for unpack_loose_header()
14: 522d71eb19d = 14: 42b8d135c8c object-file.c: return ULHR_TOO_LONG on "header too long"
15: 1ca875395c1 = 15: 106b7461ce9 object-file.c: stop dying in parse_loose_header()
16: d38067feab3 = 16: d01223ae322 fsck: don't hard die on invalid object types
17: b07e892fc19 = 17: 7f394a991a6 fsck: report invalid object type-path combinations
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 01/17] fsck tests: add test for fsck-ing an unknown type
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
` (17 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.
This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..969bfbbdd8f 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -865,4 +865,21 @@ test_expect_success 'detect corrupt index file in fsck' '
test_i18ngrep "bad index file" errors
'
+test_expect_success 'fsck hard errors on an invalid object type' '
+ git init --bare garbage-type &&
+ (
+ cd garbage-type &&
+
+ empty=$(git hash-object --stdin -w -t blob </dev/null) &&
+ garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+ test_must_fail git fsck >out 2>err &&
+ test_cmp err.expect err &&
+ test_must_be_empty out
+ )
+'
+
test_done
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 02/17] fsck tests: refactor one test to use a sub-repo
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
` (16 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.
We can instead create a named sub-repository. Then we don't have to
worry about cleaning up after ourselves: nobody will care what state
the broken "hash-mismatch" repository is in after this test runs.
See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.
1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 969bfbbdd8f..f8edd15abf8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,25 @@ remove_object () {
rm "$(sha1_file "$1")"
}
-test_expect_success 'object with bad sha1' '
- sha=$(echo blob | git hash-object -w --stdin) &&
- old=$(test_oid_to_path "$sha") &&
- new=$(dirname $old)/$(test_oid ff_2) &&
- sha="$(dirname $new)$(basename $new)" &&
- mv .git/objects/$old .git/objects/$new &&
- test_when_finished "remove_object $sha" &&
- git update-index --add --cacheinfo 100644 $sha foo &&
- test_when_finished "git read-tree -u --reset HEAD" &&
- tree=$(git write-tree) &&
- test_when_finished "remove_object $tree" &&
- cmt=$(echo bogus | git commit-tree $tree) &&
- test_when_finished "remove_object $cmt" &&
- git update-ref refs/heads/bogus $cmt &&
- test_when_finished "git update-ref -d refs/heads/bogus" &&
+test_expect_success 'object with hash mismatch' '
+ git init --bare hash-mismatch &&
+ (
+ cd hash-mismatch &&
- test_must_fail git fsck 2>out &&
- test_i18ngrep "$sha.*corrupt" out
+ oid=$(echo blob | git hash-object -w --stdin) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
+
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
+
+ test_must_fail git fsck 2>out &&
+ grep "$oid.*corrupt" out
+ )
'
test_expect_success 'branch pointing to non-commit' '
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 03/17] fsck tests: test current hash/type mismatch behavior
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
` (15 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
If we move an object around between .git/objects/?? directories to
simulate a hash mismatch, "git fsck" will currently hard die() in
object-file.c. This behavior will be fixed in subsequent commits, but
let's test for it as-is for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f8edd15abf8..175ed304637 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -69,6 +69,30 @@ test_expect_success 'object with hash mismatch' '
)
'
+test_expect_success 'object with hash and type mismatch' '
+ git init --bare hash-type-mismatch &&
+ (
+ cd hash-type-mismatch &&
+
+ oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
+
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
+
+ cat >expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+ test_must_fail git fsck 2>actual &&
+ test_cmp expect actual
+ )
+'
+
test_expect_success 'branch pointing to non-commit' '
git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
test_when_finished "git update-ref -d refs/heads/invalid" &&
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 04/17] fsck tests: test for garbage appended to a loose object
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (2 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
` (14 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
There weren't any output tests for this scenario; let's ensure that we
don't regress on it in the changes that come after this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 175ed304637..bd696d21dba 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -93,6 +93,26 @@ test_expect_success 'object with hash and type mismatch' '
)
'
+test_expect_success POSIXPERM 'zlib corrupt loose object output ' '
+ git init --bare corrupt-loose-output &&
+ (
+ cd corrupt-loose-output &&
+ oid=$(git hash-object -w --stdin --literally </dev/null) &&
+ oidf=objects/$(test_oid_to_path "$oid") &&
+ chmod 755 $oidf &&
+ echo extra garbage >>$oidf &&
+
+ cat >expect.error <<-EOF &&
+ error: garbage at end of loose object '\''$oid'\''
+ error: unable to unpack contents of ./$oidf
+ error: $oid: object corrupt or missing: ./$oidf
+ EOF
+ test_must_fail git fsck 2>actual &&
+ grep ^error: actual >error &&
+ test_cmp expect.error error
+ )
+'
+
test_expect_success 'branch pointing to non-commit' '
git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
test_when_finished "git update-ref -d refs/heads/invalid" &&
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 05/17] cat-file tests: move bogus_* variable declarations earlier
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (3 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
` (13 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Change the short/long bogus object type variables into a form
where the two sets can be used concurrently. This'll be used by
subsequently added tests.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 35 +++++++++++++++++++----------------
1 file changed, 19 insertions(+), 16 deletions(-)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..ea6a53d425b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,36 +315,39 @@ test_expect_success '%(deltabase) reports packed delta bases' '
}
'
-bogus_type="bogus"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+test_expect_success 'setup bogus data' '
+ bogus_short_type="bogus" &&
+ bogus_short_content="bogus" &&
+ bogus_short_size=$(strlen "$bogus_short_content") &&
+ bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
+
+ bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
+ bogus_long_content="bogus" &&
+ bogus_long_size=$(strlen "$bogus_long_content") &&
+ bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+'
test_expect_success "Type of broken object is correct" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_short_type >expect &&
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
test_cmp expect actual
'
test_expect_success "Size of broken object is correct" '
- echo $bogus_size >expect &&
- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_short_size >expect &&
+ git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
test_cmp expect actual
'
-bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
test_expect_success "Type of broken object is correct when type is large" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_long_type >expect &&
+ git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
test_cmp expect actual
'
test_expect_success "Size of large broken object is correct when type is large" '
- echo $bogus_size >expect &&
- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_long_size >expect &&
+ git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
test_cmp expect actual
'
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (4 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
` (12 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
When we look up a missing object with cat_one_file(), which error we
print currently depends on whether we error out early in
get_oid_with_context(), or get an error later from
oid_object_info_extended().
The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to oid_object_info_extended() or
not.
The "-p" flag is yet another special case: for the deadbeef OID it
prints the same output that the "-s" and "-t" options emit for the
deadbeef_short OID. It also doesn't support the
"--allow-unknown-type" flag at all.
Let's test the combination of the two sets of [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), as well as a [missing,bogus] object pair.
This extends tests added in 3e370f9faf0 (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).
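The expectations for missing objects encoded in the loop can be summarized as a small decision table. The sketch below is a hypothetical model derived from the tests' expected "fatal:" lines, not code from builtin/cat-file.c. Our reading is that a full-length hex OID parses without any object lookup, so the failure only surfaces later, where "-p" and "-t"/"-s" report it differently.

```c
#include <string.h>

/*
 * Hypothetical model of the "fatal:" message (minus the OID suffix) that
 * the tests expect from "git cat-file <opt> <oid>" when <oid> names no
 * object. Derived from the test expectations only.
 */
const char *expected_missing_error(const char *opt, int is_full_oid)
{
	/* An abbreviated OID that matches nothing fails name resolution. */
	if (!is_full_oid)
		return "Not a valid object name";
	/* A full hex OID resolves; -p reports the failed lookup as a bad name... */
	if (!strcmp(opt, "-p"))
		return "Not a valid object name";
	/* ...while -t and -s fail in the object-info lookup. */
	return "git cat-file: could not get object info";
}
```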
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/oid-info/oid | 2 ++
t/t1006-cat-file.sh | 75 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 77 insertions(+)
diff --git a/t/oid-info/oid b/t/oid-info/oid
index a754970523c..7547d2c7903 100644
--- a/t/oid-info/oid
+++ b/t/oid-info/oid
@@ -27,3 +27,5 @@ numeric sha1:0123456789012345678901234567890123456789
numeric sha256:0123456789012345678901234567890123456789012345678901234567890123
deadbeef sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
deadbeef sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef_short sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
+deadbeef_short sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ea6a53d425b..abf57339a29 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -327,6 +327,81 @@ test_expect_success 'setup bogus data' '
bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
'
+for arg1 in '' --allow-unknown-type
+do
+ for arg2 in -s -t -p
+ do
+ if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p"
+ then
+ continue
+ fi
+
+
+ test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" '
+ cat >expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+
+ if test "$arg1" = "--allow-unknown-type"
+ then
+ git cat-file $arg1 $arg2 $bogus_short_sha1
+ else
+ test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
+ test_must_be_empty out &&
+ test_cmp expect actual
+ fi
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" '
+ if test "$arg2" = "-p"
+ then
+ cat >expect <<-EOF
+ error: unable to unpack $bogus_long_sha1 header
+ fatal: Not a valid object name $bogus_long_sha1
+ EOF
+ else
+ cat >expect <<-EOF
+ error: unable to unpack $bogus_long_sha1 header
+ fatal: git cat-file: could not get object info
+ EOF
+ fi &&
+
+ if test "$arg1" = "--allow-unknown-type"
+ then
+ git cat-file $arg1 $arg2 $bogus_short_sha1
+ else
+ test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
+ test_must_be_empty out &&
+ test_cmp expect actual
+ fi
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on missing short OID" '
+ cat >expect.err <<-EOF &&
+ fatal: Not a valid object name $(test_oid deadbeef_short)
+ EOF
+ test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual &&
+ test_must_be_empty out &&
+ test_cmp expect.err err.actual
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on missing full OID" '
+ if test "$arg2" = "-p"
+ then
+ cat >expect.err <<-EOF
+ fatal: Not a valid object name $(test_oid deadbeef)
+ EOF
+ else
+ cat >expect.err <<-\EOF
+ fatal: git cat-file: could not get object info
+ EOF
+ fi &&
+ test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual &&
+ test_must_be_empty out &&
+ test_cmp expect.err err.actual
+ '
+ done
+done
+
test_expect_success "Type of broken object is correct" '
echo $bogus_short_type >expect &&
git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 07/17] cat-file tests: add corrupt loose object test
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (5 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
` (11 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Fix a blind spot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.
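Why do "-t" and "-s" still succeed after the swap, reporting "blob" and 6? A loose object inflates to a "<type> <size>" header, a NUL byte, then the payload, and the object ID is the hash over all of that. Answering "-t" or "-s" only needs the header, so the swapped-in file answers for itself while "fsck", which rehashes, notices the mismatch. The sketch below builds such a header; `format_loose_header()` is a hypothetical helper, not Git's API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<type> <size>" followed by a NUL into buf,
 * the prefix that is hashed together with the payload to form an OID.
 * Returns the header length including the NUL, or -1 if buf is too small.
 */
int format_loose_header(char *buf, size_t bufsiz, const char *type, size_t size)
{
	int len = snprintf(buf, bufsiz, "%s %zu", type, size);

	if (len < 0 || (size_t)len + 1 > bufsiz)
		return -1;
	return len + 1; /* snprintf already wrote the terminating NUL */
}
```

For the "other\n" blob used in the test, the payload is 6 bytes, so the header is "blob 6" plus its NUL.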
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index abf57339a29..15774979ad3 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -426,6 +426,58 @@ test_expect_success "Size of large broken object is correct when type is large"
test_cmp expect actual
'
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+ git init --bare corrupt-loose.git &&
+ (
+ cd corrupt-loose.git &&
+
+ # Setup and create the empty blob and its path
+ empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+ git hash-object -w --stdin </dev/null &&
+
+ # Create another blob and its path
+ echo other >other.blob &&
+ other_blob=$(git hash-object -w --stdin <other.blob) &&
+ other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+ # Before the swap the size is 0
+ cat >out.expect <<-EOF &&
+ 0
+ EOF
+ git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # Swap the two to corrupt the repository
+ mv -f "$other_path" "$empty_path" &&
+ test_must_fail git fsck 2>err.fsck &&
+ grep "hash mismatch" err.fsck &&
+
+ # confirm that cat-file is reading the new swapped-in
+ # blob...
+ cat >out.expect <<-EOF &&
+ blob
+ EOF
+ git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # ... since it has a different size now.
+ cat >out.expect <<-EOF &&
+ 6
+ EOF
+ git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # So far "cat-file" has been happy to spew the found
+ # content out as-is. Try to make it zlib-invalid.
+ mv -f other.blob "$empty_path" &&
+ test_must_fail git fsck 2>err.fsck &&
+ grep "^error: inflate: data stream error (" err.fsck
+ )
+'
+
# Tests for git cat-file --follow-symlinks
test_expect_success 'prep for symlink tests' '
echo_without_newline "$hello_content" >morx &&
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 08/17] cat-file tests: test for current --allow-unknown-type behavior
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (6 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
` (10 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.
1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 15774979ad3..5b16c69c286 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -402,6 +402,67 @@ do
done
done
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+ git cat-file -e $bogus_short_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+ test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+ test_must_fail git cat-file -p $bogus_short_sha1 &&
+ test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type "bogus"
+ EOF
+ test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+ echo $bogus_short_sha1 >bogus-oid &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+
+ test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+ test_cmp err.expect err.actual &&
+
+ test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+ test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+ test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+ cat >expect <<-EOF &&
+ $bogus_short_type
+ EOF
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual &&
+
+ # Create it manually, as "git replace" will die on bogus
+ # types.
+ head=$(git rev-parse --verify HEAD) &&
+ test_when_finished "rm -rf .git/refs/replace" &&
+ mkdir -p .git/refs/replace &&
+ echo $head >.git/refs/replace/$bogus_short_sha1 &&
+
+ cat >expect <<-EOF &&
+ commit
+ EOF
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual
+'
+
test_expect_success "Type of broken object is correct" '
echo $bogus_short_type >expect &&
git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 09/17] object-file.c: don't set "typep" when returning non-zero
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (7 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
` (9 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
When the loose_object_info() function returns an error, stop faking up
"oi->typep" as OBJ_BAD, and let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.
That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.
Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.
Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.
This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.
Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.
Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
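The contract this patch relies on, namely that callers check the return value before trusting any out-parameter, can be illustrated with a minimal sketch. `info_lookup_sketch()`, `struct info_sketch` and the `*_SKETCH` enum values are hypothetical stand-ins for oid_object_info_extended(), "struct object_info" and "enum object_type".

```c
/* Hypothetical stand-ins for "enum object_type" values. */
enum obj_type { OBJ_BAD_SKETCH = -1, OBJ_BLOB_SKETCH = 3 };

/* Hypothetical stand-in for "struct object_info". */
struct info_sketch {
	enum obj_type *typep; /* optional out-parameter */
};

/*
 * Only writes *typep on success. On failure it returns -1 and leaves
 * *typep untouched, so callers must check the return value first.
 */
int info_lookup_sketch(int object_ok, struct info_sketch *oi)
{
	if (!object_ok)
		return -1;
	if (oi->typep)
		*oi->typep = OBJ_BLOB_SKETCH;
	return 0;
}
```

A caller written against this contract initializes its own type variable and only reads it after a 0 return.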
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/object-file.c b/object-file.c
index be4f94ecf3b..766ba88b851 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1525,8 +1525,6 @@ static int loose_object_info(struct repository *r,
git_inflate_end(&stream);
munmap(map, mapsize);
- if (status && oi->typep)
- *oi->typep = status;
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 10/17] object-file.c: return -1, not "status" from unpack_loose_header()
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (8 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
` (8 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.
See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".
At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").
However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.
So let's do the minor cleanup of also changing this function to return
a -1.
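The check that remains after this change can be sketched in isolation. In the hypothetical `check_header_status()` below, the zlib status and the inflated bytes are passed in directly; the two `SKETCH_Z_*` macros mirror zlib's real values (`Z_OK` is 0 and `Z_DATA_ERROR` is -3) so the example compiles without zlib.h.

```c
#include <string.h>

/* Local mirrors of two zlib status codes; zlib reports errors as negative values. */
#define SKETCH_Z_OK 0
#define SKETCH_Z_DATA_ERROR (-3)

/*
 * Sketch of the post-inflate check in unpack_loose_header(): any zlib
 * error collapses to -1, and the inflated prefix must contain the NUL
 * that terminates the "<type> <size>" header.
 */
int check_header_status(int zstatus, const char *buf, size_t inflated_len)
{
	if (zstatus < SKETCH_Z_OK)
		return -1; /* the specific Z_* code is deliberately dropped */
	if (!memchr(buf, '\0', inflated_len))
		return -1; /* header not terminated within what we inflated */
	return 0;
}
```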
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/object-file.c b/object-file.c
index 766ba88b851..8475b128944 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1284,7 +1284,7 @@ int unpack_loose_header(git_zstream *stream,
buffer, bufsiz);
if (status < Z_OK)
- return status;
+ return -1;
/* Make sure we have the terminating NUL */
if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 11/17] object-file.c: make parse_loose_header_extended() public
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (9 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
` (7 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Make the parse_loose_header_extended() function public under the
parse_loose_header() name, and remove the old parse_loose_header()
wrapper. The only direct user of it outside of object-file.c itself
was in streaming.c; that caller can simply pass the required
"struct object_info *" instead.
This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change (we
could simply use parse_loose_header_extended() there), but it leaves
the API in a better end state.
It would be a better end-state to have already moved the declaration
of these functions to object-store.h to avoid the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.
1. https://lore.kernel.org/git/patch-v6-09.22-5b9278e7bb4-20210907T104559Z-avarab@gmail.com/
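Since callers now hand parse_loose_header() a "struct object_info" and opt into fields by setting pointers such as "oi.sizep", a rough model of that contract and of the header grammar may help. The C sketch below is an approximation for illustration only: `struct oi_sketch` and `parse_header_sketch()` are hypothetical names, only the size and the type-name length are reported, no type lookup is performed, and size overflow checks are omitted. As far as we can tell, Git's own parser likewise rejects non-canonical sizes such as "010".

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for Git's "struct object_info". */
struct oi_sketch {
	unsigned long *sizep; /* optional: filled in only when non-NULL */
	size_t *typelenp;     /* optional: length of the type name */
};

/*
 * Parse "<type> <size>" terminated by NUL. The size must be canonical
 * decimal ("06" is rejected) and be followed immediately by the NUL.
 * Returns 0 on success, -1 on a malformed header.
 */
int parse_header_sketch(const char *hdr, struct oi_sketch *oi)
{
	const char *type_start = hdr;
	unsigned long size;

	/* The type can be any length but must be followed by a space. */
	while (*hdr != ' ') {
		if (!*hdr)
			return -1;
		hdr++;
	}
	if (oi->typelenp)
		*oi->typelenp = (size_t)(hdr - type_start);
	hdr++; /* skip the space */

	/* First digit; a leading zero is only valid for a size of 0. */
	if (*hdr < '0' || *hdr > '9')
		return -1;
	size = (unsigned long)(*hdr++ - '0');
	if (size) {
		while (*hdr >= '0' && *hdr <= '9')
			size = size * 10 + (unsigned long)(*hdr++ - '0');
	}

	/* The size must be followed directly by the terminating NUL. */
	if (*hdr)
		return -1;
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}
```

A caller that only wants the size sets "sizep" and leaves "typelenp" NULL, much like streaming.c now sets only "oi.sizep".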
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 4 +++-
object-file.c | 20 +++++++-------------
streaming.c | 5 ++++-
3 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/cache.h b/cache.h
index f6295f3b048..35f254dae4a 100644
--- a/cache.h
+++ b/cache.h
@@ -1320,7 +1320,9 @@ char *xdg_cache_home(const char *filename);
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+struct object_info;
+int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags);
int check_object_signature(struct repository *r, const struct object_id *oid,
void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 8475b128944..6b91c4edcf6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1385,8 +1385,8 @@ static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags)
{
const char *type_buf = hdr;
unsigned long size;
@@ -1446,14 +1446,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
return *hdr ? -1 : type;
}
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
- struct object_info oi = OBJECT_INFO_INIT;
-
- oi.sizep = sizep;
- return parse_loose_header_extended(hdr, &oi, 0);
-}
-
static int loose_object_info(struct repository *r,
const struct object_id *oid,
struct object_info *oi, int flags)
@@ -1508,10 +1500,10 @@ static int loose_object_info(struct repository *r,
if (status < 0)
; /* Do nothing */
else if (hdrbuf.len) {
- if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+ if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
oid_to_hex(oid));
- } else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+ } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
status = error(_("unable to parse %s header"), oid_to_hex(oid));
if (status >= 0 && oi->contentp) {
@@ -2599,6 +2591,8 @@ int read_loose_object(const char *path,
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
+ struct object_info oi = OBJECT_INFO_INIT;
+ oi.sizep = size;
*contents = NULL;
@@ -2613,7 +2607,7 @@ int read_loose_object(const char *path,
goto out;
}
- *type = parse_loose_header(hdr, size);
+ *type = parse_loose_header(hdr, &oi, 0);
if (*type < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
const struct object_id *oid,
enum object_type *type)
{
+ struct object_info oi = OBJECT_INFO_INIT;
+ oi.sizep = &st->size;
+
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapsize,
st->u.loose.hdr,
sizeof(st->u.loose.hdr)) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+ (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
return -1;
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 12/17] object-file.c: simplify unpack_loose_short_header()
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (10 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
` (6 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.
The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).
Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.
I think the remaining unpack_loose_header() function could be further
simplified: we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN, and we could alternatively just
say "we found a garbage type <first 32 bytes>..." instead. But let's
leave the current behavior in place for now.
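The control flow of the combined function, as described above, can be approximated without zlib. In the hypothetical `unpack_header_sketch()` below, an already-inflated byte array stands in for the zlib stream and `struct growbuf` stands in for "struct strbuf". Git's real code keeps inflating in chunks rather than scanning the whole input at once, so treat this as a model of the decision points only.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for Git's "struct strbuf". */
struct growbuf {
	char *buf;
	size_t len;
};

/*
 * Copy at most bufsiz bytes of the inflated input "src" into "buffer".
 * If the header's NUL is among them, succeed. Otherwise the header is
 * too long: fail unless the caller passed "hdrbuf", in which case the
 * whole header (including its NUL) is gathered there.
 */
int unpack_header_sketch(const char *src, size_t srclen,
			 char *buffer, size_t bufsiz, struct growbuf *hdrbuf)
{
	size_t n = srclen < bufsiz ? srclen : bufsiz;
	const char *nul;

	memcpy(buffer, src, n);
	if (memchr(buffer, '\0', n))
		return 0;

	/* Header longer than bufsiz and no place to put the rest. */
	if (!hdrbuf)
		return -1;

	nul = memchr(src, '\0', srclen);
	if (!nul)
		return -1; /* no terminating NUL anywhere: malformed */

	hdrbuf->len = (size_t)(nul - src) + 1;
	hdrbuf->buf = malloc(hdrbuf->len);
	if (!hdrbuf->buf)
		return -1;
	memcpy(hdrbuf->buf, src, hdrbuf->len);
	return 0;
}
```

With a 33-byte type such as the "bogus_long_type" used in t1006-cat-file.sh, the header does not fit a small fixed buffer, so only a caller that supplies `hdrbuf` (the analogue of passing a strbuf under OBJECT_INFO_ALLOW_UNKNOWN_TYPE) gets it back.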
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 17 ++++++++++++++-
object-file.c | 58 ++++++++++++++++++---------------------------------
streaming.c | 3 ++-
3 files changed, 38 insertions(+), 40 deletions(-)
diff --git a/cache.h b/cache.h
index 35f254dae4a..d7189aed8fc 100644
--- a/cache.h
+++ b/cache.h
@@ -1319,7 +1319,22 @@ char *xdg_cache_home(const char *filename);
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
+
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+ unsigned long mapsize, void *buffer,
+ unsigned long bufsiz, struct strbuf *hdrbuf);
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi,
unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 6b91c4edcf6..1327872cbf4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,11 +1255,12 @@ void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
}
-static int unpack_loose_short_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+ unsigned char *map, unsigned long mapsize,
+ void *buffer, unsigned long bufsiz,
+ struct strbuf *header)
{
- int ret;
+ int status;
/* Get the data stream */
memset(stream, 0, sizeof(*stream));
@@ -1270,35 +1271,8 @@ static int unpack_loose_short_header(git_zstream *stream,
git_inflate_init(stream);
obj_read_unlock();
- ret = git_inflate(stream, 0);
+ status = git_inflate(stream, 0);
obj_read_lock();
-
- return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz)
-{
- int status = unpack_loose_short_header(stream, map, mapsize,
- buffer, bufsiz);
-
- if (status < Z_OK)
- return -1;
-
- /* Make sure we have the terminating NUL */
- if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
- return -1;
- return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *header)
-{
- int status;
-
- status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
if (status < Z_OK)
return -1;
@@ -1308,6 +1282,14 @@ static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
return 0;
+ /*
+ * We have a header longer than MAX_HEADER_LEN. The "header"
+ * here is only non-NULL when we run "cat-file
+ * --allow-unknown-type".
+ */
+ if (!header)
+ return -1;
+
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
* result out to header, and then append the result of further
@@ -1457,6 +1439,7 @@ static int loose_object_info(struct repository *r,
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
oidclr(oi->delta_base_oid);
@@ -1490,11 +1473,9 @@ static int loose_object_info(struct repository *r,
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
- if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
- if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
- status = error(_("unable to unpack %s header with --allow-unknown-type"),
- oid_to_hex(oid));
- } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+
+ if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ allow_unknown ? &hdrbuf : NULL) < 0)
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
if (status < 0)
@@ -2602,7 +2583,8 @@ int read_loose_object(const char *path,
goto out;
}
- if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+ if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ NULL) < 0) {
error(_("unable to unpack header of %s"), path);
goto out;
}
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapped,
st->u.loose.mapsize,
st->u.loose.hdr,
- sizeof(st->u.loose.hdr)) < 0) ||
+ sizeof(st->u.loose.hdr),
+ NULL) < 0) ||
(parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 13/17] object-file.c: use "enum" return type for unpack_loose_header()
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (11 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
` (5 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
In a preceding commit we changed and documented unpack_loose_header()
from its previous behavior of returning any negative value or zero, to
only -1 or 0.
Let's add an "enum unpack_loose_header_result" type and use it for
these return values, and have the compiler assert that we're
exhaustively covering all of them.
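A minimal sketch of the exhaustiveness check, using hypothetical names rather than the ones in cache.h. The switch below deliberately has no "default:" label, so GCC and Clang can warn (-Wswitch, which -Wall enables) if an enumerator is later added to the enum but not handled here.

```c
#include <assert.h>

/* Hypothetical stand-in with the same shape as "enum unpack_loose_header_result". */
enum sketch_result {
	SKETCH_OK,
	SKETCH_BAD,
};

/*
 * Map the enum back to the old 0/-1 convention. No "default:" label,
 * so -Wswitch flags any enumerator added later but not handled here.
 */
static int sketch_result_to_status(enum sketch_result r)
{
	switch (r) {
	case SKETCH_OK:
		return 0;
	case SKETCH_BAD:
		return -1;
	}
	/* Only reachable if "r" holds a value outside the enum. */
	assert(!"unhandled enum sketch_result value");
	return -1;
}
```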
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 19 +++++++++++++++----
object-file.c | 34 +++++++++++++++++++++-------------
streaming.c | 23 +++++++++++++----------
3 files changed, 49 insertions(+), 27 deletions(-)
diff --git a/cache.h b/cache.h
index d7189aed8fc..7239e20a625 100644
--- a/cache.h
+++ b/cache.h
@@ -1324,7 +1324,10 @@ int git_open_cloexec(const char *name, int flags);
* unpack_loose_header() initializes the data stream needed to unpack
* a loose object header.
*
- * Returns 0 on success. Returns negative values on error.
+ * Returns:
+ *
+ * - ULHR_OK on success
+ * - ULHR_BAD on error
*
* It will only parse up to MAX_HEADER_LEN bytes unless an optional
* "hdrbuf" argument is non-NULL. This is intended for use with
@@ -1332,9 +1335,17 @@ int git_open_cloexec(const char *name, int flags);
* reporting. The full header will be extracted to "hdrbuf" for use
* with parse_loose_header().
*/
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result {
+ ULHR_OK,
+ ULHR_BAD,
+};
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+ unsigned long mapsize,
+ void *buffer,
+ unsigned long bufsiz,
+ struct strbuf *hdrbuf);
+
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi,
unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 1327872cbf4..e0f508415dd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,10 +1255,12 @@ void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
}
-int unpack_loose_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz,
- struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+ unsigned long mapsize,
+ void *buffer,
+ unsigned long bufsiz,
+ struct strbuf *header)
{
int status;
@@ -1274,13 +1276,13 @@ int unpack_loose_header(git_zstream *stream,
status = git_inflate(stream, 0);
obj_read_lock();
if (status < Z_OK)
- return -1;
+ return ULHR_BAD;
/*
* Check if entire header is unpacked in the first iteration.
*/
if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
- return 0;
+ return ULHR_OK;
/*
* We have a header longer than MAX_HEADER_LEN. The "header"
@@ -1288,7 +1290,7 @@ int unpack_loose_header(git_zstream *stream,
* --allow-unknown-type".
*/
if (!header)
- return -1;
+ return ULHR_BAD;
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
@@ -1309,7 +1311,7 @@ int unpack_loose_header(git_zstream *stream,
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status != Z_STREAM_END);
- return -1;
+ return ULHR_BAD;
}
static void *unpack_loose_rest(git_zstream *stream,
@@ -1474,13 +1476,19 @@ static int loose_object_info(struct repository *r,
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
- if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
- allow_unknown ? &hdrbuf : NULL) < 0)
+ switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ allow_unknown ? &hdrbuf : NULL)) {
+ case ULHR_OK:
+ break;
+ case ULHR_BAD:
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
- if (status < 0)
- ; /* Do nothing */
- else if (hdrbuf.len) {
+ break;
+ }
+
+ if (status < 0) {
+ /* Do nothing */
+ } else if (hdrbuf.len) {
if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
oid_to_hex(oid));
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..6df0247a4cb 100644
--- a/streaming.c
+++ b/streaming.c
@@ -229,17 +229,16 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
- if ((unpack_loose_header(&st->z,
- st->u.loose.mapped,
- st->u.loose.mapsize,
- st->u.loose.hdr,
- sizeof(st->u.loose.hdr),
- NULL) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
- git_inflate_end(&st->z);
- munmap(st->u.loose.mapped, st->u.loose.mapsize);
- return -1;
+ switch (unpack_loose_header(&st->z, st->u.loose.mapped,
+ st->u.loose.mapsize, st->u.loose.hdr,
+ sizeof(st->u.loose.hdr), NULL)) {
+ case ULHR_OK:
+ break;
+ case ULHR_BAD:
+ goto error;
}
+ if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+ goto error;
st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
st->u.loose.hdr_avail = st->z.total_out;
@@ -248,6 +247,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->read = read_istream_loose;
return 0;
+error:
+ git_inflate_end(&st->z);
+ munmap(st->u.loose.mapped, st->u.loose.mapsize);
+ return -1;
}
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 14/17] object-file.c: return ULHR_TOO_LONG on "header too long"
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (12 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
` (4 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.
As a test added earlier in this series in t1006-cat-file.sh shows,
we'll already correctly emit zlib errors from zlib.c in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG to say we ran into the
MAX_HEADER_LEN limit, or ULHR_BAD for "unable to unpack <OID>
header".
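A minimal sketch of the resulting three-way decision, assuming MAX_HEADER_LEN is 32 (as the updated t1006 expectation suggests). The zlib step is modelled by a hypothetical "inflate_failed" flag instead of a real git_inflate() call, and all names are illustrative rather than git's.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_HEADER_LEN 32 /* assumed to match MAX_HEADER_LEN */

enum sketch_hdr_result {
	SKETCH_HDR_OK,
	SKETCH_HDR_BAD,
	SKETCH_HDR_TOO_LONG,
};

/*
 * "inflate_failed" stands in for git_inflate() returning < Z_OK;
 * "out"/"out_len" are the bytes the first inflate call produced.
 */
static enum sketch_hdr_result sketch_classify_header(int inflate_failed,
						      const char *out,
						      size_t out_len)
{
	size_t scan = out_len < SKETCH_MAX_HEADER_LEN ? out_len
						      : SKETCH_MAX_HEADER_LEN;

	assert(out || !out_len);
	if (inflate_failed)
		return SKETCH_HDR_BAD;
	/* The whole "<type> <len>\0" header fit in the first chunk. */
	if (scan && memchr(out, '\0', scan))
		return SKETCH_HDR_OK;
	/* No NUL within the limit: say "too long", not a generic error. */
	return SKETCH_HDR_TOO_LONG;
}
```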
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 5 ++++-
object-file.c | 8 ++++++--
streaming.c | 1 +
t/t1006-cat-file.sh | 4 ++--
4 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/cache.h b/cache.h
index 7239e20a625..8e05392fda8 100644
--- a/cache.h
+++ b/cache.h
@@ -1328,16 +1328,19 @@ int git_open_cloexec(const char *name, int flags);
*
* - ULHR_OK on success
* - ULHR_BAD on error
+ * - ULHR_TOO_LONG if the header was too long
*
* It will only parse up to MAX_HEADER_LEN bytes unless an optional
* "hdrbuf" argument is non-NULL. This is intended for use with
* OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
* reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), ULHR_TOO_LONG will still be returned
+ * from this function to indicate that the header was too long.
*/
enum unpack_loose_header_result {
ULHR_OK,
ULHR_BAD,
+ ULHR_TOO_LONG,
};
enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
unsigned char *map,
diff --git a/object-file.c b/object-file.c
index e0f508415dd..3589c5a2e33 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1290,7 +1290,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
* --allow-unknown-type".
*/
if (!header)
- return ULHR_BAD;
+ return ULHR_TOO_LONG;
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
@@ -1311,7 +1311,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status != Z_STREAM_END);
- return ULHR_BAD;
+ return ULHR_TOO_LONG;
}
static void *unpack_loose_rest(git_zstream *stream,
@@ -1484,6 +1484,10 @@ static int loose_object_info(struct repository *r,
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
break;
+ case ULHR_TOO_LONG:
+ status = error(_("header for %s too long, exceeds %d bytes"),
+ oid_to_hex(oid), MAX_HEADER_LEN);
+ break;
}
if (status < 0) {
diff --git a/streaming.c b/streaming.c
index 6df0247a4cb..bd89c50e7b3 100644
--- a/streaming.c
+++ b/streaming.c
@@ -235,6 +235,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
case ULHR_OK:
break;
case ULHR_BAD:
+ case ULHR_TOO_LONG:
goto error;
}
if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5b16c69c286..a5e7401af8b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -356,12 +356,12 @@ do
if test "$arg2" = "-p"
then
cat >expect <<-EOF
- error: unable to unpack $bogus_long_sha1 header
+ error: header for $bogus_long_sha1 too long, exceeds 32 bytes
fatal: Not a valid object name $bogus_long_sha1
EOF
else
cat >expect <<-EOF
- error: unable to unpack $bogus_long_sha1 header
+ error: header for $bogus_long_sha1 too long, exceeds 32 bytes
fatal: git cat-file: could not get object info
EOF
fi &&
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 15/17] object-file.c: stop dying in parse_loose_header()
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (13 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
` (3 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Make parse_loose_header() return error codes and data instead of
invoking die() by itself.
For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (it should be
moved to builtin/cat-file.c), but let's leave it for now.
To make parse_loose_header() not die(), change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated
"oi->typep".
Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE; we can instead do
that check in loose_object_info().
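The split of responsibilities can be sketched as a standalone parser. This assumes a cut-down stand-in for "struct object_info" and uses hypothetical names throughout (it is not the object-file.c code). A malformed header returns -1, while an unknown <type> is reported as a bad type through "typep" for the caller to act on. Overflow checking of <len> is omitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of git's object types; OBJ_BAD is -1 there too. */
enum sk_type { SK_BAD = -1, SK_COMMIT = 1, SK_TREE = 2, SK_BLOB = 3, SK_TAG = 4 };

/* Hypothetical, cut-down "struct object_info": NULL members are skipped. */
struct sk_info {
	enum sk_type *typep;
	unsigned long *sizep;
};

static enum sk_type sk_type_from_name(const char *name, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && !memcmp(names[i], name, len))
			return (enum sk_type)(i + 1);
	return SK_BAD;
}

/*
 * Returns -1 if "hdr" is not "<type> <len>\0", else 0. An unknown
 * <type> is not a format error: it is stored as SK_BAD in *typep and
 * the caller decides whether to die(), error() or allow it.
 */
static int sk_parse_header(const char *hdr, struct sk_info *oi)
{
	const char *type_buf = hdr;
	size_t type_len;
	unsigned long size;

	assert(hdr && oi);
	while (*hdr && *hdr != ' ')
		hdr++;
	if (*hdr != ' ')
		return -1;
	type_len = (size_t)(hdr - type_buf);
	hdr++;

	if (*hdr < '0' || *hdr > '9')
		return -1;
	size = (unsigned long)(*hdr++ - '0');
	/* After a leading '0' no more digits are read, so "012" fails below. */
	if (size) {
		while (*hdr >= '0' && *hdr <= '9')
			size = size * 10 + (unsigned long)(*hdr++ - '0');
	}

	/* The length must be followed by a NUL byte. */
	if (*hdr)
		return -1;
	if (oi->typep)
		*oi->typep = sk_type_from_name(type_buf, type_len);
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}
```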
This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.
Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that could
be any negative value, as we were expecting to return the "enum
object_type".
The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.
Since parse_loose_header() doesn't need to return an arbitrary
"status", we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.
We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects; see
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it; a subsequent
commit will address that edge case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 11 +++++++--
object-file.c | 67 +++++++++++++++++++++++++--------------------------
streaming.c | 3 ++-
3 files changed, 44 insertions(+), 37 deletions(-)
diff --git a/cache.h b/cache.h
index 8e05392fda8..6c5f00c82d5 100644
--- a/cache.h
+++ b/cache.h
@@ -1349,9 +1349,16 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
unsigned long bufsiz,
struct strbuf *hdrbuf);
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
struct object_info;
-int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags);
+int parse_loose_header(const char *hdr, struct object_info *oi);
int check_object_signature(struct repository *r, const struct object_id *oid,
void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 3589c5a2e33..a70669700d0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1369,8 +1369,7 @@ static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
-int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
{
const char *type_buf = hdr;
unsigned long size;
@@ -1392,15 +1391,6 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
type = type_from_string_gently(type_buf, type_len, 1);
if (oi->type_name)
strbuf_add(oi->type_name, type_buf, type_len);
- /*
- * Set type to 0 if its an unknown object and
- * we're obtaining the type using '--allow-unknown-type'
- * option.
- */
- if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
- type = 0;
- else if (type < 0)
- die(_("invalid object type"));
if (oi->typep)
*oi->typep = type;
@@ -1427,7 +1417,14 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
/*
* The length must be followed by a zero byte
*/
- return *hdr ? -1 : type;
+ if (*hdr)
+ return -1;
+
+ /*
+ * The format is valid, but the type may still be bogus. The
+ * Caller needs to check its oi->typep.
+ */
+ return 0;
}
static int loose_object_info(struct repository *r,
@@ -1441,6 +1438,7 @@ static int loose_object_info(struct repository *r,
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ enum object_type type_scratch;
int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
@@ -1472,6 +1470,8 @@ static int loose_object_info(struct repository *r,
if (!oi->sizep)
oi->sizep = &size_scratch;
+ if (!oi->typep)
+ oi->typep = &type_scratch;
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
@@ -1479,6 +1479,18 @@ static int loose_object_info(struct repository *r,
switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
allow_unknown ? &hdrbuf : NULL)) {
case ULHR_OK:
+ if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0)
+ status = error(_("unable to parse %s header"), oid_to_hex(oid));
+ else if (!allow_unknown && *oi->typep < 0)
+ die(_("invalid object type"));
+
+ if (!oi->contentp)
+ break;
+ *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+ if (*oi->contentp)
+ goto cleanup;
+
+ status = -1;
break;
case ULHR_BAD:
status = error(_("unable to unpack %s header"),
@@ -1490,31 +1502,16 @@ static int loose_object_info(struct repository *r,
break;
}
- if (status < 0) {
- /* Do nothing */
- } else if (hdrbuf.len) {
- if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
- status = error(_("unable to parse %s header with --allow-unknown-type"),
- oid_to_hex(oid));
- } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
- status = error(_("unable to parse %s header"), oid_to_hex(oid));
-
- if (status >= 0 && oi->contentp) {
- *oi->contentp = unpack_loose_rest(&stream, hdr,
- *oi->sizep, oid);
- if (!*oi->contentp) {
- git_inflate_end(&stream);
- status = -1;
- }
- } else
- git_inflate_end(&stream);
-
+ git_inflate_end(&stream);
+cleanup:
munmap(map, mapsize);
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
+ if (oi->typep == &type_scratch)
+ oi->typep = NULL;
oi->whence = OI_LOOSE;
- return (status < 0) ? status : 0;
+ return status;
}
int obj_read_use_lock = 0;
@@ -2585,6 +2582,7 @@ int read_loose_object(const char *path,
git_zstream stream;
char hdr[MAX_HEADER_LEN];
struct object_info oi = OBJECT_INFO_INIT;
+ oi.typep = type;
oi.sizep = size;
*contents = NULL;
@@ -2601,12 +2599,13 @@ int read_loose_object(const char *path,
goto out;
}
- *type = parse_loose_header(hdr, &oi, 0);
- if (*type < 0) {
+ if (parse_loose_header(hdr, &oi) < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
}
+ if (*type < 0)
+ die(_("invalid object type"));
if (*type == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/streaming.c b/streaming.c
index bd89c50e7b3..fe54665d86e 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
{
struct object_info oi = OBJECT_INFO_INIT;
oi.sizep = &st->size;
+ oi.typep = type;
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
@@ -238,7 +239,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
case ULHR_TOO_LONG:
goto error;
}
- if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+ if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
goto error;
st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 16/17] fsck: don't hard die on invalid object types
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (14 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-28 2:18 ` [PATCH v8 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
` (2 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Change the error fsck emits on invalid object types, such as:
$ git hash-object --stdin -w -t garbage --literally </dev/null
<OID>
From the very ungraceful error of:
$ git fsck
fatal: invalid object type
$
To:
$ git fsck
error: <OID>: object is of unknown type 'garbage': <OID_PATH>
[ other fsck output ]
We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The test that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
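A minimal sketch of the "report and keep going" shape. Hypothetical types stand in for fsck's real ones, and a negative "type" stands in for an unknown object type: each bad object sets a bit in the returned mask (so the exit status is still non-zero), and the loop continues instead of dying.

```c
#include <assert.h>
#include <stdio.h>

#define SK_ERROR_OBJECT 1 /* hypothetical stand-in for fsck's ERROR_OBJECT bit */

struct sk_loose {
	const char *path;
	int type; /* < 0 means "unknown type" in this sketch */
};

/* Report every bad object instead of dying on the first one. */
static int sk_check_loose(const struct sk_loose *objs, size_t nr,
			  size_t *checked)
{
	int errors_found = 0;

	assert(checked && (objs || !nr));
	*checked = 0;
	for (size_t i = 0; i < nr; i++) {
		(*checked)++;
		if (objs[i].type < 0) {
			fprintf(stderr, "error: object is of unknown type: %s\n",
				objs[i].path);
			errors_found |= SK_ERROR_OBJECT;
			continue; /* keep checking other objects */
		}
		/* A real checker would parse and connect the object here. */
	}
	return errors_found;
}
```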
To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fsck.c | 17 ++++++++++++++---
object-file.c | 18 ++++++------------
object-store.h | 6 +++---
t/t1450-fsck.sh | 17 +++++++++--------
4 files changed, 32 insertions(+), 26 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..3b046820750 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,11 +600,22 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
unsigned long size;
void *contents;
int eaten;
+ struct strbuf sb = STRBUF_INIT;
+ struct object_info oi = OBJECT_INFO_INIT;
+ int err = 0;
- if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+ oi.type_name = &sb;
+ oi.sizep = &size;
+ oi.typep = &type;
+
+ if (read_loose_object(path, oid, &contents, &oi) < 0)
+ err = error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
+ if (type < 0)
+ err = error(_("%s: object is of unknown type '%s': %s"),
+ oid_to_hex(oid), sb.buf, path);
+ if (err) {
errors_found |= ERROR_OBJECT;
- error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
return 0; /* keep checking other objects */
}
diff --git a/object-file.c b/object-file.c
index a70669700d0..fe95285f405 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2572,18 +2572,15 @@ static int check_stream_oid(git_zstream *stream,
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents)
+ void **contents,
+ struct object_info *oi)
{
int ret = -1;
void *map = NULL;
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
- struct object_info oi = OBJECT_INFO_INIT;
- oi.typep = type;
- oi.sizep = size;
+ unsigned long *size = oi->sizep;
*contents = NULL;
@@ -2599,15 +2596,13 @@ int read_loose_object(const char *path,
goto out;
}
- if (parse_loose_header(hdr, &oi) < 0) {
+ if (parse_loose_header(hdr, oi) < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
}
- if (*type < 0)
- die(_("invalid object type"));
- if (*type == OBJ_BLOB && *size > big_file_threshold) {
+ if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
goto out;
} else {
@@ -2618,8 +2613,7 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size,
- type_name(*type))) {
+ *contents, *size, oi->type_name->buf)) {
error(_("hash mismatch for %s (expected %s)"), path,
oid_to_hex(expected_oid));
free(*contents);
diff --git a/object-store.h b/object-store.h
index c5130d8baea..c90c41a07f7 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,6 +245,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
/*
* Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
* type, and size. If the object is a blob, then "contents" may return NULL,
* to allow streaming of large blobs.
*
@@ -252,9 +253,8 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents);
+ void **contents,
+ struct object_info *oi);
/* Retry packed storage after checking packed and loose storage */
#define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bd696d21dba..167c319823a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -85,11 +85,10 @@ test_expect_success 'object with hash and type mismatch' '
cmt=$(echo bogus | git commit-tree $tree) &&
git update-ref refs/heads/bogus $cmt &&
- cat >expect <<-\EOF &&
- fatal: invalid object type
- EOF
- test_must_fail git fsck 2>actual &&
- test_cmp expect actual
+
+ test_must_fail git fsck 2>out &&
+ grep "^error: hash mismatch for " out &&
+ grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
)
'
@@ -910,7 +909,7 @@ test_expect_success 'detect corrupt index file in fsck' '
test_i18ngrep "bad index file" errors
'
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
git init --bare garbage-type &&
(
cd garbage-type &&
@@ -922,8 +921,10 @@ test_expect_success 'fsck hard errors on an invalid object type' '
fatal: invalid object type
EOF
test_must_fail git fsck >out 2>err &&
- test_cmp err.expect err &&
- test_must_be_empty out
+ grep -e "^error" -e "^fatal" err >errors &&
+ test_line_count = 1 errors &&
+ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+ grep "dangling blob $empty_blob" out
)
'
--
2.33.0.1327.g9926af6cb02
* [PATCH v8 17/17] fsck: report invalid object type-path combinations
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (15 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-28 2:18 ` Ævar Arnfjörð Bjarmason
2021-09-29 19:50 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-28 2:18 UTC
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.
Before this change we'd prefix the error with a not-an-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.
Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ mv objects/e6/ objects/e7
Would emit ("[...]" used to abbreviate the OIDs):
$ git fsck
error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]
Now we'll instead emit:
error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ mv objects/83 objects/84
As noted in earlier commits, we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:
$ git fsck
fatal: invalid object type
Now we'll instead emit sensible error messages:
$ git fsck
error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]
In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
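A minimal sketch of the sentinel idea, assuming a 20-byte hash and hypothetical names (this is not the fsck.c or object-file.c code). "real" starts out all-zero and is only filled in once the object's contents were actually hashed. A non-null value that differs from the expected OID can then be reported as a hash-path mismatch, while anything else falls back to "object corrupt or missing".

```c
#include <assert.h>
#include <string.h>

#define SK_RAWSZ 20 /* assumed SHA-1 length */

struct sk_oid {
	unsigned char hash[SK_RAWSZ];
};

enum sk_verdict {
	SK_VERDICT_OK,
	SK_VERDICT_HASH_PATH_MISMATCH,
	SK_VERDICT_CORRUPT,
};

static int sk_oid_is_null(const struct sk_oid *oid)
{
	static const struct sk_oid null_oid; /* static storage: all zeros */

	return !memcmp(oid->hash, null_oid.hash, SK_RAWSZ);
}

/*
 * "real" is the sentinel: callers initialize it to all zeros and the
 * reader only fills it in after hashing the object's contents.
 */
static enum sk_verdict sk_classify_loose(const struct sk_oid *expected,
					 const struct sk_oid *real,
					 int read_failed)
{
	assert(expected && real);
	if (!read_failed)
		return SK_VERDICT_OK;
	if (!sk_oid_is_null(real) &&
	    memcmp(real->hash, expected->hash, SK_RAWSZ))
		return SK_VERDICT_HASH_PATH_MISMATCH;
	return SK_VERDICT_CORRUPT;
}
```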
We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ git fsck
error: garbage at end of loose object 'e69d[...]'
error: unable to unpack contents of ./objects/e6/9d[...]
error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]
There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
$ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
$ /usr/bin/git fsck
fatal: invalid object type
$ ~/g/git/git fsck
error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
[...]
I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.
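To make "enum-ifying" error state concrete, here is a minimal sketch, not git's code, of a parser for a simplified "<type> <size>" header that returns an enum instead of calling die() or returning a bare -1, so that a caller such as fsck could pick a precise message for each failure. The names parse_hdr(), enum parse_hdr_result and the PHR_* values are hypothetical, only loosely modeled on the ULHR_* values this series adds, and the max_len limit is a caller-chosen assumption rather than git's actual header limit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, loosely modeled on "enum unpack_loose_header_result". */
enum parse_hdr_result {
	PHR_OK = 0,
	PHR_BAD,      /* malformed header */
	PHR_TOO_LONG  /* header (or its type) doesn't fit the caller's limits */
};

/*
 * Parse a NUL-terminated "<type> <size>" header. On PHR_OK the type is
 * copied into type_out (capacity type_cap) and the size stored in *sizep;
 * on any failure neither out-parameter is touched.
 */
static enum parse_hdr_result parse_hdr(const char *hdr, size_t max_len,
				       char *type_out, size_t type_cap,
				       unsigned long *sizep)
{
	size_t n = 0, type_len;
	unsigned long size = 0;
	const char *sp, *p;

	assert(hdr && type_out && sizep);

	/* Don't scan more than max_len bytes looking for the terminator. */
	while (n <= max_len && hdr[n] != '\0')
		n++;
	if (n > max_len)
		return PHR_TOO_LONG;

	sp = strchr(hdr, ' ');
	if (!sp || sp == hdr)
		return PHR_BAD;
	type_len = (size_t)(sp - hdr);
	if (type_len >= type_cap)
		return PHR_TOO_LONG;

	p = sp + 1;
	if (!isdigit((unsigned char)*p))
		return PHR_BAD;
	while (isdigit((unsigned char)*p))
		size = size * 10 + (unsigned long)(*p++ - '0');
	if (*p != '\0')
		return PHR_BAD;

	/* Only write to the out-parameters once parsing has succeeded. */
	memcpy(type_out, hdr, type_len);
	type_out[type_len] = '\0';
	*sizep = size;
	return PHR_OK;
}
```

Writing the out-parameters only on success follows the same spirit as "object-file.c: don't set "typep" when returning non-zero" earlier in this series.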
There are other such potential edge cases, all of which might produce
some confusing messaging, but would still be handled correctly as far
as passing along errors goes. E.g. check_object_signature() may return
while oideq(real_oid, null_oid()) is still true (fsck initializes
real_oid to the null OID), which could happen if it returns -1 because
the read_istream() call failed.
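The new real_oidp parameter uses a common C idiom for an optional out-parameter: callers that don't care pass NULL, and the function writes into a local scratch variable instead. A minimal sketch of that idiom under stated assumptions: struct fake_oid, fnv1a() and check_signature() are hypothetical, and the toy 32-bit FNV-1a hash stands in for git's real SHA-1/SHA-256 object hashing. It also shows the edge case above: on an early failure nothing is hashed, so the caller's value (the null OID in fsck) is left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct object_id. */
struct fake_oid {
	uint32_t hash;
};

/* Toy FNV-1a hash; git uses SHA-1 or SHA-256, not this. */
static uint32_t fnv1a(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Returns 0 if buf hashes to *expected, -1 otherwise. If real_oidp is
 * non-NULL it receives the computed hash, but only when hashing actually
 * happened; on an early failure the caller's value is left as-is.
 */
static int check_signature(const struct fake_oid *expected,
			   const unsigned char *buf, size_t len,
			   struct fake_oid *real_oidp)
{
	struct fake_oid tmp;
	struct fake_oid *real_oid = real_oidp ? real_oidp : &tmp;

	assert(expected);
	if (!buf)
		return -1; /* e.g. a read error: nothing was hashed */
	real_oid->hash = fnv1a(buf, len);
	return real_oid->hash == expected->hash ? 0 : -1;
}
```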
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fast-export.c | 2 +-
builtin/fsck.c | 23 +++++++++++++++--------
builtin/index-pack.c | 2 +-
builtin/mktag.c | 3 ++-
cache.h | 3 ++-
object-file.c | 21 ++++++++++-----------
object-store.h | 1 +
object.c | 4 ++--
pack-check.c | 3 ++-
t/t1006-cat-file.sh | 2 +-
t/t1450-fsck.sh | 8 +++++---
11 files changed, 42 insertions(+), 30 deletions(-)
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 95e8e89e81f..8e2caf72819 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
if (!buf)
die("could not read blob %s", oid_to_hex(oid));
if (check_object_signature(the_repository, oid, buf, size,
- type_name(type)) < 0)
+ type_name(type), NULL) < 0)
die("oid mismatch in blob %s", oid_to_hex(oid));
object = parse_object_buffer(the_repository, oid, type,
size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3b046820750..d925cdbae5c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -598,23 +598,30 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
struct object *obj;
enum object_type type;
unsigned long size;
- void *contents;
+ unsigned char *contents = NULL;
int eaten;
struct strbuf sb = STRBUF_INIT;
struct object_info oi = OBJECT_INFO_INIT;
- int err = 0;
+ struct object_id real_oid = *null_oid();
+ int ret;
oi.type_name = &sb;
oi.sizep = &size;
oi.typep = &type;
- if (read_loose_object(path, oid, &contents, &oi) < 0)
- err = error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
+ ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
+ if (ret < 0) {
+ if (contents && !oideq(&real_oid, oid))
+ error(_("%s: hash-path mismatch, found at: %s"),
+ oid_to_hex(&real_oid), path);
+ else
+ error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
+ }
if (type < 0)
- err = error(_("%s: object is of unknown type '%s': %s"),
- oid_to_hex(oid), sb.buf, path);
- if (err) {
+ ret = error(_("%s: object is of unknown type '%s': %s"),
+ oid_to_hex(&real_oid), sb.buf, path);
+ if (ret < 0) {
errors_found |= ERROR_OBJECT;
return 0; /* keep checking other objects */
}
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7ce69c087ec..15ae406e6b7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1415,7 +1415,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
if (check_object_signature(the_repository, &d->oid,
data, size,
- type_name(type)))
+ type_name(type), NULL))
die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
repl = lookup_replace_object(the_repository, tagged_oid);
ret = check_object_signature(the_repository, repl,
- buffer, size, type_name(*tagged_type));
+ buffer, size, type_name(*tagged_type),
+ NULL);
free(buffer);
return ret;
diff --git a/cache.h b/cache.h
index 6c5f00c82d5..e2a203073ea 100644
--- a/cache.h
+++ b/cache.h
@@ -1361,7 +1361,8 @@ struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi);
int check_object_signature(struct repository *r, const struct object_id *oid,
- void *buf, unsigned long size, const char *type);
+ void *buf, unsigned long size, const char *type,
+ struct object_id *real_oidp);
int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/object-file.c b/object-file.c
index fe95285f405..49561e31551 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1084,9 +1084,11 @@ void *xmmap(void *start, size_t length,
* the streaming interface and rehash it to do the same.
*/
int check_object_signature(struct repository *r, const struct object_id *oid,
- void *map, unsigned long size, const char *type)
+ void *map, unsigned long size, const char *type,
+ struct object_id *real_oidp)
{
- struct object_id real_oid;
+ struct object_id tmp;
+ struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
enum object_type obj_type;
struct git_istream *st;
git_hash_ctx c;
@@ -1094,8 +1096,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
int hdrlen;
if (map) {
- hash_object_file(r->hash_algo, map, size, type, &real_oid);
- return !oideq(oid, &real_oid) ? -1 : 0;
+ hash_object_file(r->hash_algo, map, size, type, real_oid);
+ return !oideq(oid, real_oid) ? -1 : 0;
}
st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1120,9 +1122,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
break;
r->hash_algo->update_fn(&c, buf, readlen);
}
- r->hash_algo->final_oid_fn(&real_oid, &c);
+ r->hash_algo->final_oid_fn(real_oid, &c);
close_istream(st);
- return !oideq(oid, &real_oid) ? -1 : 0;
+ return !oideq(oid, real_oid) ? -1 : 0;
}
int git_open_cloexec(const char *name, int flags)
@@ -2572,6 +2574,7 @@ static int check_stream_oid(git_zstream *stream,
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
struct object_info *oi)
{
@@ -2582,8 +2585,6 @@ int read_loose_object(const char *path,
char hdr[MAX_HEADER_LEN];
unsigned long *size = oi->sizep;
- *contents = NULL;
-
map = map_loose_object_1(the_repository, path, NULL, &mapsize);
if (!map) {
error_errno(_("unable to mmap %s"), path);
@@ -2613,9 +2614,7 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size, oi->type_name->buf)) {
- error(_("hash mismatch for %s (expected %s)"), path,
- oid_to_hex(expected_oid));
+ *contents, *size, oi->type_name->buf, real_oid)) {
free(*contents);
goto out;
}
diff --git a/object-store.h b/object-store.h
index c90c41a07f7..17b072e5a19 100644
--- a/object-store.h
+++ b/object-store.h
@@ -253,6 +253,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
struct object_info *oi);
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
(!obj && repo_has_object_file(r, oid) &&
oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
- if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+ if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
error(_("hash mismatch %s"), oid_to_hex(oid));
return NULL;
}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
buffer = repo_read_object_file(r, oid, &type, &size);
if (buffer) {
if (check_object_signature(r, repl, buffer, size,
- type_name(type)) < 0) {
+ type_name(type), NULL) < 0) {
free(buffer);
error(_("hash mismatch %s"), oid_to_hex(repl));
return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
oid_to_hex(&oid), p->pack_name,
(uintmax_t)entries[i].offset);
- else if (check_object_signature(r, &oid, data, size, type_name(type)))
+ else if (check_object_signature(r, &oid, data, size,
+ type_name(type), NULL))
err = error("packed %s from %s is corrupt",
oid_to_hex(&oid), p->pack_name);
else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index a5e7401af8b..0f52ca9cc82 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -512,7 +512,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
# Swap the two to corrupt the repository
mv -f "$other_path" "$empty_path" &&
test_must_fail git fsck 2>err.fsck &&
- grep "hash mismatch" err.fsck &&
+ grep "hash-path mismatch" err.fsck &&
# confirm that cat-file is reading the new swapped-in
# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 167c319823a..eb0e772f098 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -54,6 +54,7 @@ test_expect_success 'object with hash mismatch' '
cd hash-mismatch &&
oid=$(echo blob | git hash-object -w --stdin) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ -65,7 +66,7 @@ test_expect_success 'object with hash mismatch' '
git update-ref refs/heads/bogus $cmt &&
test_must_fail git fsck 2>out &&
- grep "$oid.*corrupt" out
+ grep "$oldoid: hash-path mismatch, found at: .*$new" out
)
'
@@ -75,6 +76,7 @@ test_expect_success 'object with hash and type mismatch' '
cd hash-type-mismatch &&
oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ -87,8 +89,8 @@ test_expect_success 'object with hash and type mismatch' '
test_must_fail git fsck 2>out &&
- grep "^error: hash mismatch for " out &&
- grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+ grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+ grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
)
'
--
2.33.0.1327.g9926af6cb02
* Re: [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (16 preceding siblings ...)
2021-09-28 2:18 ` [PATCH v8 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-29 19:50 ` Taylor Blau
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
18 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-29 19:50 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak
On Tue, Sep 28, 2021 at 04:18:41AM +0200, Ævar Arnfjörð Bjarmason wrote:
> This improves fsck error reporting, see the examples in the commit
> messages of 16/17 and 17/17. To get there I've lib-ified more things
> in object-file.c and the general object APIs, i.e. now we'll return
> error codes instead of calling die() in these cases.
>
> v6 of this got a very detailed review from Taylor Blau (thanks a
> lot!), for the v6 see:
> https://lore.kernel.org/git/cover-v6-00.22-00000000000-20210907T104558Z-avarab@gmail.com/
>
> The v7 had a couple of trivial shellscripting issues, a typo'd
> test_oid variable, and a warning on a "test" comparison. For v7 see
> https://lore.kernel.org/git/cover-v7-00.17-00000000000-20210920T190304Z-avarab@gmail.com/
Thanks; I looked at the range-diff and it addresses both of my comments.
This series looks good to me.
Thanks,
Taylor
* [PATCH v9 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
2021-09-28 2:18 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
` (17 preceding siblings ...)
2021-09-29 19:50 ` [PATCH v8 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
` (18 more replies)
18 siblings, 19 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
This improves fsck error reporting, see the examples in the commit
messages of 16/17 and 17/17. To get there I've lib-ified more things
in object-file.c and the general object APIs, i.e. now we'll return
error codes instead of calling die() in these cases.
Status of this: Since v6 this series has been getting a thorough
review from Taylor Blau, thanks again Taylor! See [1] for the v8, [2]
for Taylor's ack on it, and [3] for my own status update regarding the
v8 on the last What's Cooking.
The only change since v8 is plugging a memory leak introduced in the
previous 16/17. I've been integrating my local pending patches with
some follow-up work for the in-flight ab/sanitize-leak-ci topic, which
is already proving quite useful.
1. https://lore.kernel.org/git/cover-v8-00.17-00000000000-20210928T021616Z-avarab@gmail.com/
2. https://lore.kernel.org/git/YVTDgJ7wFl9DCjS+@nand.local/
3. https://lore.kernel.org/git/87czotzaru.fsf@evledraar.gmail.com/
Ævar Arnfjörð Bjarmason (17):
fsck tests: add test for fsck-ing an unknown type
fsck tests: refactor one test to use a sub-repo
fsck tests: test current hash/type mismatch behavior
fsck tests: test for garbage appended to a loose object
cat-file tests: move bogus_* variable declarations earlier
cat-file tests: test for missing/bogus object with -t, -s and -p
cat-file tests: add corrupt loose object test
cat-file tests: test for current --allow-unknown-type behavior
object-file.c: don't set "typep" when returning non-zero
object-file.c: return -1, not "status" from unpack_loose_header()
object-file.c: make parse_loose_header_extended() public
object-file.c: simplify unpack_loose_short_header()
object-file.c: use "enum" return type for unpack_loose_header()
object-file.c: return ULHR_TOO_LONG on "header too long"
object-file.c: stop dying in parse_loose_header()
fsck: don't hard die on invalid object types
fsck: report invalid object type-path combinations
builtin/fast-export.c | 2 +-
builtin/fsck.c | 37 +++++--
builtin/index-pack.c | 2 +-
builtin/mktag.c | 3 +-
cache.h | 45 ++++++++-
object-file.c | 176 +++++++++++++++------------------
object-store.h | 7 +-
object.c | 4 +-
pack-check.c | 3 +-
streaming.c | 27 +++--
t/oid-info/oid | 2 +
t/t1006-cat-file.sh | 223 +++++++++++++++++++++++++++++++++++++++---
t/t1450-fsck.sh | 99 +++++++++++++++----
13 files changed, 468 insertions(+), 162 deletions(-)
Range-diff against v8:
1: b999ab695d9 = 1: 520732612f7 fsck tests: add test for fsck-ing an unknown type
2: e01c21378a4 = 2: af7086623fe fsck tests: refactor one test to use a sub-repo
3: 93197a7bcee = 3: 102bc4f0176 fsck tests: test current hash/type mismatch behavior
4: 277188dd58d = 4: ff7fc09d5a1 fsck tests: test for garbage appended to a loose object
5: ab2ea1beaaf = 5: 278df093239 cat-file tests: move bogus_* variable declarations earlier
6: 91229b94fac = 6: 290bf983590 cat-file tests: test for missing/bogus object with -t, -s and -p
7: 9e95e134d30 = 7: a41b2c571e5 cat-file tests: add corrupt loose object test
8: 215f98ad369 = 8: cedeb117330 cat-file tests: test for current --allow-unknown-type behavior
9: 3e1df3594df = 9: 6f0673d38c8 object-file.c: don't set "typep" when returning non-zero
10: b96828f3d5b = 10: 6637e8fd2ca object-file.c: return -1, not "status" from unpack_loose_header()
11: 273acb45517 = 11: 51db08ebbae object-file.c: make parse_loose_header_extended() public
12: 314d34357dd = 12: dffe5581f6f object-file.c: simplify unpack_loose_short_header()
13: 07481bcb55c = 13: eb7c949c8b7 object-file.c: use "enum" return type for unpack_loose_header()
14: 42b8d135c8c = 14: f4cc7271df7 object-file.c: return ULHR_TOO_LONG on "header too long"
15: 106b7461ce9 = 15: 25d6ec668d4 object-file.c: stop dying in parse_loose_header()
16: d01223ae322 ! 16: 6ce0414b2b7 fsck: don't hard die on invalid object types
@@ Commit message
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().
+ Since we're now passing in a "oi.type_name" we'll have to clean up the
+ allocated "strbuf sb". That we're doing it right is asserted by
+ e.g. the "fsck notices broken commit" test added in 03818a4a94c
+ (split_ident: parse timestamp from end of line, 2013-10-14). To do
+ that switch to a "goto cleanup" pattern, and while we're at it factor
+ out the already duplicated free(content) to use that pattern.
+
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## builtin/fsck.c ##
@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
errors_found |= ERROR_OBJECT;
- error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
- return 0; /* keep checking other objects */
+- return 0; /* keep checking other objects */
++ goto cleanup;
+ }
+
+ if (!contents && type != OBJ_BLOB)
+@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
+ errors_found |= ERROR_OBJECT;
+ error(_("%s: object could not be parsed: %s"),
+ oid_to_hex(oid), path);
+- if (!eaten)
+- free(contents);
+- return 0; /* keep checking other objects */
++ goto cleanup_eaten;
}
+ obj->flags &= ~(REACHABLE | SEEN);
+@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
+ if (fsck_obj(obj, contents, size))
+ errors_found |= ERROR_OBJECT;
+
++cleanup_eaten:
+ if (!eaten)
+ free(contents);
++cleanup:
++ strbuf_release(&sb);
+ return 0; /* keep checking other objects, even if we saw an error */
+ }
+
## object-file.c ##
@@ object-file.c: static int check_stream_oid(git_zstream *stream,
17: 7f394a991a6 ! 17: 8d926e41fc3 fsck: report invalid object type-path combinations
@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
+ oid_to_hex(&real_oid), sb.buf, path);
+ if (ret < 0) {
errors_found |= ERROR_OBJECT;
- return 0; /* keep checking other objects */
+ goto cleanup;
}
## builtin/index-pack.c ##
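The "goto cleanup" pattern described in the 16/17 commit message routes every early exit through one place that releases what was allocated. A minimal standalone sketch, assuming a plain heap buffer in place of git's strbuf and hypothetical check_one()/errors_found names; it is not the actual fsck_loose() code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int errors_found; /* hypothetical stand-in for fsck's error flags */

/* Always returns 0 so the caller keeps checking other objects. */
static int check_one(const char *type_name, int parse_ok)
{
	char *type_buf = NULL; /* stands in for "struct strbuf sb" */
	char *contents = NULL;

	assert(type_name);
	type_buf = malloc(strlen(type_name) + 1);
	if (!type_buf) {
		errors_found = 1;
		goto cleanup;
	}
	strcpy(type_buf, type_name);

	if (strcmp(type_buf, "blob") && strcmp(type_buf, "tree") &&
	    strcmp(type_buf, "commit") && strcmp(type_buf, "tag")) {
		fprintf(stderr, "error: object is of unknown type '%s'\n", type_buf);
		errors_found = 1;
		goto cleanup; /* contents was never allocated */
	}

	contents = malloc(16);
	if (!contents || !parse_ok) {
		errors_found = 1;
		goto cleanup_contents;
	}
	/* ... the object would be checked here; success falls through ... */

cleanup_contents:
	free(contents);
cleanup:
	free(type_buf); /* runs on every path, like strbuf_release(&sb) */
	return 0;
}
```

The labels are ordered from most to least allocated state, so jumping to a later label skips frees for resources that were never acquired (and free(NULL) is a no-op anyway).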
--
2.33.0.1374.g05459a61530
* [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 19:22 ` Andrei Rybak
2021-09-30 13:37 ` [PATCH v9 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
` (17 subsequent siblings)
18 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.
This behavior needs to be improved, which will be done in subsequent
patches, but for now let's test for the current behavior.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..969bfbbdd8f 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -865,4 +865,21 @@ test_expect_success 'detect corrupt index file in fsck' '
test_i18ngrep "bad index file" errors
'
+test_expect_success 'fsck hard errors on an invalid object type' '
+ git init --bare garbage-type &&
+ (
+ cd garbage-type &&
+
+ empty=$(git hash-object --stdin -w -t blob </dev/null) &&
+ garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+ test_must_fail git fsck >out 2>err &&
+ test_cmp err.expect err &&
+ test_must_be_empty out
+ )
+'
+
test_done
--
2.33.0.1374.g05459a61530
* Re: [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type
2021-09-30 13:37 ` [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-30 19:22 ` Andrei Rybak
2021-10-01 9:05 ` Ævar Arnfjörð Bjarmason
0 siblings, 1 reply; 245+ messages in thread
From: Andrei Rybak @ 2021-09-30 19:22 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason, git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Taylor Blau
On 30/09/2021 15:37, Ævar Arnfjörð Bjarmason wrote:
> Fix a blindspot in the fsck tests by checking what we do when we
> encounter an unknown "garbage" type produced with hash-object's
> --literally option.
>
> This behavior needs to be improved, which'll be done in subsequent
> patches, but for now let's test for the current behavior.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> t/t1450-fsck.sh | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 5071ac63a5b..969bfbbdd8f 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -865,4 +865,21 @@ test_expect_success 'detect corrupt index file in fsck' '
> test_i18ngrep "bad index file" errors
> '
>
> +test_expect_success 'fsck hard errors on an invalid object type' '
> + git init --bare garbage-type &&
> + (
> + cd garbage-type &&
> +
> + empty=$(git hash-object --stdin -w -t blob </dev/null) &&
> + garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
Patch 01/17 introduces two unused variables: "garbage" and "empty".
However, patch 16/17 introduces grep checks for "garbage_blob" and
"empty_blob". Aside from that, 't/test-lib.sh' already defines
$EMPTY_BLOB.
> +
> + cat >err.expect <<-\EOF &&
> + fatal: invalid object type
> + EOF
> + test_must_fail git fsck >out 2>err &&
> + test_cmp err.expect err &&
> + test_must_be_empty out
> + )
> +'
> +
> test_done
>
* Re: [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type
2021-09-30 19:22 ` Andrei Rybak
@ 2021-10-01 9:05 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:05 UTC (permalink / raw)
To: Andrei Rybak; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Taylor Blau
On Thu, Sep 30 2021, Andrei Rybak wrote:
> On 30/09/2021 15:37, Ævar Arnfjörð Bjarmason wrote:
>> Fix a blindspot in the fsck tests by checking what we do when we
>> encounter an unknown "garbage" type produced with hash-object's
>> --literally option.
>> This behavior needs to be improved, which'll be done in subsequent
>> patches, but for now let's test for the current behavior.
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>> t/t1450-fsck.sh | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
>> index 5071ac63a5b..969bfbbdd8f 100755
>> --- a/t/t1450-fsck.sh
>> +++ b/t/t1450-fsck.sh
>> @@ -865,4 +865,21 @@ test_expect_success 'detect corrupt index file in fsck' '
>> test_i18ngrep "bad index file" errors
>> '
>> +test_expect_success 'fsck hard errors on an invalid object type' '
>> + git init --bare garbage-type &&
>> + (
>> + cd garbage-type &&
>> +
>> + empty=$(git hash-object --stdin -w -t blob </dev/null) &&
>> + garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
>
> Patch 01/17 introduces two unused variables: "garbage" and "empty".
> However, patch 16/17 introduces grep checks for "garbage_blob" and
> "empty_blob". Aside from that, 't/test-lib.sh' already defines
> $EMPTY_BLOB.
Will fix in the v10 re-roll.
I think this is left over from an earlier version where I used
$empty. FWIW you do need it (or some other way to write the object)
even with $EMPTY_BLOB, since that's just the OID and doesn't give you
the object. You can write the /dev/null input and then use $EMPTY_BLOB,
but I thought using the output of hash-object was less confusing.
But in any case it isn't needed here, as you point out: we just need
to write the garbage object, we don't need either variable. Thanks!
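The difference between the OID and the object can be illustrated directly: in a SHA-1 repository a loose object's OID is the SHA-1 of a "<type> <size>" header, a NUL byte, and then the content, so $EMPTY_BLOB can be computed without any object existing on disk. A minimal self-contained sketch with a compact SHA-1 implementation, for illustration only (git has its own hashing code); loose_object_oid() and to_hex() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* SHA-1 compression function over one 64-byte block (FIPS 180-4). */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
	uint32_t w[80], a, b, c, d, e, f, k, t;
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
		       (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
	for (; i < 80; i++)
		w[i] = ROL(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

	a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
	for (i = 0; i < 80; i++) {
		if (i < 20) {
			f = (b & c) | (~b & d); k = 0x5A827999;
		} else if (i < 40) {
			f = b ^ c ^ d; k = 0x6ED9EBA1;
		} else if (i < 60) {
			f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC;
		} else {
			f = b ^ c ^ d; k = 0xCA62C1D6;
		}
		t = ROL(a, 5) + f + e + k + w[i];
		e = d; d = c; c = ROL(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* One-shot SHA-1 of msg into out[20]. */
static void sha1(const unsigned char *msg, size_t len, unsigned char out[20])
{
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
	uint64_t bits = (uint64_t)len * 8;
	unsigned char block[64];
	size_t i, rem;

	for (i = 0; i + 64 <= len; i += 64)
		sha1_block(h, msg + i);

	/* Pad with 0x80, zeros, then the 64-bit big-endian bit length. */
	rem = len - i;
	memset(block, 0, sizeof(block));
	memcpy(block, msg + i, rem);
	block[rem] = 0x80;
	if (rem >= 56) {
		sha1_block(h, block);
		memset(block, 0, sizeof(block));
	}
	for (i = 0; i < 8; i++)
		block[63 - i] = (unsigned char)(bits >> (8 * i));
	sha1_block(h, block);

	for (i = 0; i < 20; i++)
		out[i] = (unsigned char)(h[i / 4] >> (24 - 8 * (i % 4)));
}

static void to_hex(const unsigned char digest[20], char hex[41])
{
	int i;

	for (i = 0; i < 20; i++)
		sprintf(hex + 2 * i, "%02x", digest[i]);
}

/* Hypothetical helper: OID = SHA-1("<type> <size>" NUL content). */
static void loose_object_oid(const char *type, const void *data, size_t len,
			     char hex[41])
{
	unsigned char buf[256], digest[20];
	int hdrlen = snprintf((char *)buf, sizeof(buf), "%s %zu", type, len) + 1;

	assert(hdrlen > 0 && (size_t)hdrlen + len <= sizeof(buf));
	memcpy(buf + hdrlen, data, len); /* content follows the header's NUL */
	sha1(buf, (size_t)hdrlen + len, digest);
	to_hex(digest, hex);
}
```

With empty input this reproduces the e69de29b... and 8315a83d... OIDs that hash-object prints earlier in this thread. Computing the OID this way writes nothing to objects/, which is why the test still has to write the garbage object itself.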
* [PATCH v9 02/17] fsck tests: refactor one test to use a sub-repo
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
` (16 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.
We can instead create a named sub-repository. Then we don't have to
worry about cleaning up after ourselves: nobody will care what state
the broken "hash-mismatch" repository is in after this test runs.
See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.
1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 969bfbbdd8f..f8edd15abf8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,25 @@ remove_object () {
rm "$(sha1_file "$1")"
}
-test_expect_success 'object with bad sha1' '
- sha=$(echo blob | git hash-object -w --stdin) &&
- old=$(test_oid_to_path "$sha") &&
- new=$(dirname $old)/$(test_oid ff_2) &&
- sha="$(dirname $new)$(basename $new)" &&
- mv .git/objects/$old .git/objects/$new &&
- test_when_finished "remove_object $sha" &&
- git update-index --add --cacheinfo 100644 $sha foo &&
- test_when_finished "git read-tree -u --reset HEAD" &&
- tree=$(git write-tree) &&
- test_when_finished "remove_object $tree" &&
- cmt=$(echo bogus | git commit-tree $tree) &&
- test_when_finished "remove_object $cmt" &&
- git update-ref refs/heads/bogus $cmt &&
- test_when_finished "git update-ref -d refs/heads/bogus" &&
+test_expect_success 'object with hash mismatch' '
+ git init --bare hash-mismatch &&
+ (
+ cd hash-mismatch &&
- test_must_fail git fsck 2>out &&
- test_i18ngrep "$sha.*corrupt" out
+ oid=$(echo blob | git hash-object -w --stdin) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
+
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
+
+ test_must_fail git fsck 2>out &&
+ grep "$oid.*corrupt" out
+ )
'
test_expect_success 'branch pointing to non-commit' '
--
2.33.0.1374.g05459a61530
* [PATCH v9 03/17] fsck tests: test current hash/type mismatch behavior
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
` (15 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
If we move an object between .git/objects/?? directories to simulate
a hash mismatch, "git fsck" will currently hard die() in
object-file.c. This behavior will be fixed in subsequent commits, but
let's test for it as-is for now.
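Loose objects are stored at objects/<first two hex digits>/<remaining hex digits>, and the OID that fsck expects is derived from that path, so renaming the file makes the content's hash disagree with the OID implied by its location. A minimal sketch of both directions of that mapping, using hypothetical helper names that are not git's API; path_to_oid_hex() mirrors the test's oid="$(dirname $new)$(basename $new)" line.

```c
#include <assert.h>
#include <string.h>

/* "e69de29b..." -> "e6/9de29b..."; out must hold strlen(hex) + 2 bytes. */
static void oid_hex_to_path(const char *hex, char *out)
{
	size_t len = strlen(hex);

	assert(len > 2);
	out[0] = hex[0];
	out[1] = hex[1];
	out[2] = '/';
	memcpy(out + 3, hex + 2, len - 2 + 1); /* includes the NUL */
}

/* "e6/9de29b..." -> "e69de29b..."; returns -1 if path isn't "xx/rest". */
static int path_to_oid_hex(const char *path, char *out)
{
	size_t len = strlen(path);

	if (len < 4 || path[2] != '/')
		return -1;
	out[0] = path[0];
	out[1] = path[1];
	memcpy(out + 2, path + 3, len - 3 + 1); /* includes the NUL */
	return 0;
}
```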
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f8edd15abf8..175ed304637 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -69,6 +69,30 @@ test_expect_success 'object with hash mismatch' '
)
'
+test_expect_success 'object with hash and type mismatch' '
+ git init --bare hash-type-mismatch &&
+ (
+ cd hash-type-mismatch &&
+
+ oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
+
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
+
+ cat >expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+ test_must_fail git fsck 2>actual &&
+ test_cmp expect actual
+ )
+'
+
test_expect_success 'branch pointing to non-commit' '
git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
test_when_finished "git update-ref -d refs/heads/invalid" &&
--
2.33.0.1374.g05459a61530
* [PATCH v9 04/17] fsck tests: test for garbage appended to a loose object
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (2 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
` (14 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
There weren't any output tests for this scenario; let's ensure that we
don't regress on it in the changes that come after this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 175ed304637..bd696d21dba 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -93,6 +93,26 @@ test_expect_success 'object with hash and type mismatch' '
)
'
+test_expect_success POSIXPERM 'zlib corrupt loose object output ' '
+ git init --bare corrupt-loose-output &&
+ (
+ cd corrupt-loose-output &&
+ oid=$(git hash-object -w --stdin --literally </dev/null) &&
+ oidf=objects/$(test_oid_to_path "$oid") &&
+ chmod 755 $oidf &&
+ echo extra garbage >>$oidf &&
+
+ cat >expect.error <<-EOF &&
+ error: garbage at end of loose object '\''$oid'\''
+ error: unable to unpack contents of ./$oidf
+ error: $oid: object corrupt or missing: ./$oidf
+ EOF
+ test_must_fail git fsck 2>actual &&
+ grep ^error: actual >error &&
+ test_cmp expect.error error
+ )
+'
+
test_expect_success 'branch pointing to non-commit' '
git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
test_when_finished "git update-ref -d refs/heads/invalid" &&
--
2.33.0.1374.g05459a61530
* [PATCH v9 05/17] cat-file tests: move bogus_* variable declarations earlier
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (3 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
` (13 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Change the short/long bogus object type variables into a form where
the two sets can be used concurrently. This'll be used by subsequently
added tests.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 35 +++++++++++++++++++----------------
1 file changed, 19 insertions(+), 16 deletions(-)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..ea6a53d425b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,36 +315,39 @@ test_expect_success '%(deltabase) reports packed delta bases' '
}
'
-bogus_type="bogus"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+test_expect_success 'setup bogus data' '
+ bogus_short_type="bogus" &&
+ bogus_short_content="bogus" &&
+ bogus_short_size=$(strlen "$bogus_short_content") &&
+ bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
+
+ bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
+ bogus_long_content="bogus" &&
+ bogus_long_size=$(strlen "$bogus_long_content") &&
+ bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+'
test_expect_success "Type of broken object is correct" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_short_type >expect &&
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
test_cmp expect actual
'
test_expect_success "Size of broken object is correct" '
- echo $bogus_size >expect &&
- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_short_size >expect &&
+ git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
test_cmp expect actual
'
-bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
test_expect_success "Type of broken object is correct when type is large" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_long_type >expect &&
+ git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
test_cmp expect actual
'
test_expect_success "Size of large broken object is correct when type is large" '
- echo $bogus_size >expect &&
- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_long_size >expect &&
+ git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
test_cmp expect actual
'
--
2.33.0.1374.g05459a61530
* [PATCH v9 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (4 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
` (12 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
When we look up a missing object with cat_one_file(), the error we
print out currently depends on whether we error out early in
get_oid_with_context(), or get an error later from
oid_object_info_extended().
The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or
not.
The "-p" flag is yet another special case: on the deadbeef OID it
prints the same output as we'd emit on the deadbeef_short OID for the
"-s" and "-t" options. It also doesn't support the
"--allow-unknown-type" flag at all.
Let's test the combination of the two sets of [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), as well as a [missing,bogus] object pair.
This extends tests added in 3e370f9faf0 (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).
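Why the short and full OIDs take different error paths can be sketched as follows. deadbeef_short is exactly one hex digit shorter than a full object name (39 of 40 characters for SHA-1, 63 of 64 for SHA-256), so it has to be resolved as an abbreviation and fails with "Not a valid object name". For -s and -t, a full-length hex name is accepted as-is and only fails later, when its object info is looked up ("could not get object info"), which is what the tests below expect. The helper classify_oid_arg() and its enum are hypothetical illustrations, not git's API; the real get_oid_with_context() also handles refs, revision syntax and ambiguity.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hex lengths of full object names for SHA-1 and SHA-256. */
#define SHA1_HEXSZ 40
#define SHA256_HEXSZ 64

/* Hypothetical classification of an object name argument. */
enum oid_arg_kind {
	OID_ARG_NOT_HEX,     /* empty, or contains non-hex characters */
	OID_ARG_ABBREVIATED, /* hex, shorter than a full name: needs lookup */
	OID_ARG_FULL,        /* hex, exactly hexsz characters */
	OID_ARG_TOO_LONG,    /* hex, longer than a full name */
};

static enum oid_arg_kind classify_oid_arg(const char *arg, size_t hexsz)
{
	size_t len;

	assert(arg != NULL);
	len = strlen(arg);
	if (!len)
		return OID_ARG_NOT_HEX;
	for (size_t i = 0; i < len; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return OID_ARG_NOT_HEX;
	if (len < hexsz)
		return OID_ARG_ABBREVIATED;
	return len == hexsz ? OID_ARG_FULL : OID_ARG_TOO_LONG;
}
```

With the fixture values from t/oid-info/oid, the full deadbeef name classifies as OID_ARG_FULL and deadbeef_short as OID_ARG_ABBREVIATED under either hash.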
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/oid-info/oid | 2 ++
t/t1006-cat-file.sh | 75 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 77 insertions(+)
diff --git a/t/oid-info/oid b/t/oid-info/oid
index a754970523c..7547d2c7903 100644
--- a/t/oid-info/oid
+++ b/t/oid-info/oid
@@ -27,3 +27,5 @@ numeric sha1:0123456789012345678901234567890123456789
numeric sha256:0123456789012345678901234567890123456789012345678901234567890123
deadbeef sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
deadbeef sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef_short sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
+deadbeef_short sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ea6a53d425b..abf57339a29 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -327,6 +327,81 @@ test_expect_success 'setup bogus data' '
bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
'
+for arg1 in '' --allow-unknown-type
+do
+ for arg2 in -s -t -p
+ do
+ if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p"
+ then
+ continue
+ fi
+
+
+ test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" '
+ cat >expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+
+ if test "$arg1" = "--allow-unknown-type"
+ then
+ git cat-file $arg1 $arg2 $bogus_short_sha1
+ else
+ test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
+ test_must_be_empty out &&
+ test_cmp expect actual
+ fi
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" '
+ if test "$arg2" = "-p"
+ then
+ cat >expect <<-EOF
+ error: unable to unpack $bogus_long_sha1 header
+ fatal: Not a valid object name $bogus_long_sha1
+ EOF
+ else
+ cat >expect <<-EOF
+ error: unable to unpack $bogus_long_sha1 header
+ fatal: git cat-file: could not get object info
+ EOF
+ fi &&
+
+ if test "$arg1" = "--allow-unknown-type"
+ then
+ git cat-file $arg1 $arg2 $bogus_short_sha1
+ else
+ test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
+ test_must_be_empty out &&
+ test_cmp expect actual
+ fi
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on missing short OID" '
+ cat >expect.err <<-EOF &&
+ fatal: Not a valid object name $(test_oid deadbeef_short)
+ EOF
+ test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual &&
+ test_must_be_empty out
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on missing full OID" '
+ if test "$arg2" = "-p"
+ then
+ cat >expect.err <<-EOF
+ fatal: Not a valid object name $(test_oid deadbeef)
+ EOF
+ else
+ cat >expect.err <<-\EOF
+ fatal: git cat-file: could not get object info
+ EOF
+ fi &&
+ test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual &&
+ test_must_be_empty out &&
+ test_cmp expect.err err.actual
+ '
+ done
+done
+
test_expect_success "Type of broken object is correct" '
echo $bogus_short_type >expect &&
git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
--
2.33.0.1374.g05459a61530
* [PATCH v9 07/17] cat-file tests: add corrupt loose object test
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (5 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
` (11 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Fix a blind spot in the tests for "cat-file" (and, by proxy, the guts
of object-file.c) by testing that when we can't decode a loose object
with zlib, we'll emit an error from zlib.c.
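The sizes the new test expects follow from the loose object format: once inflated, a loose object starts with a "<type> <size>" header terminated by a NUL byte, followed by the content. After the swap, the file at the empty blob's path holds "other\n", whose header is "blob 6", which is why "cat-file -s" reports 6 instead of 0. A minimal sketch of building such a header; format_loose_header() is a hypothetical helper and the zlib deflate step is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<type> <size>" plus the terminating NUL into buf, as found at
 * the start of an inflated loose object. Returns the header length
 * including the NUL, or -1 if buf is too small.
 */
static int format_loose_header(char *buf, size_t bufsz,
			       const char *type, size_t content_len)
{
	int n;

	assert(buf != NULL && type != NULL);
	n = snprintf(buf, bufsz, "%s %zu", type, content_len);
	if (n < 0 || (size_t)n + 1 > bufsz)
		return -1; /* error, or output was truncated */
	/* snprintf() already wrote the NUL; count that byte too. */
	return n + 1;
}
```

The empty blob's header would be "blob 0" by the same rule.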
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index abf57339a29..15774979ad3 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -426,6 +426,58 @@ test_expect_success "Size of large broken object is correct when type is large"
test_cmp expect actual
'
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+ git init --bare corrupt-loose.git &&
+ (
+ cd corrupt-loose.git &&
+
+ # Setup and create the empty blob and its path
+ empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+ git hash-object -w --stdin </dev/null &&
+
+ # Create another blob and its path
+ echo other >other.blob &&
+ other_blob=$(git hash-object -w --stdin <other.blob) &&
+ other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+ # Before the swap the size is 0
+ cat >out.expect <<-EOF &&
+ 0
+ EOF
+ git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # Swap the two to corrupt the repository
+ mv -f "$other_path" "$empty_path" &&
+ test_must_fail git fsck 2>err.fsck &&
+ grep "hash mismatch" err.fsck &&
+
+ # confirm that cat-file is reading the new swapped-in
+ # blob...
+ cat >out.expect <<-EOF &&
+ blob
+ EOF
+ git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # ... since it has a different size now.
+ cat >out.expect <<-EOF &&
+ 6
+ EOF
+ git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # So far "cat-file" has been happy to spew the found
+ # content out as-is. Try to make it zlib-invalid.
+ mv -f other.blob "$empty_path" &&
+ test_must_fail git fsck 2>err.fsck &&
+ grep "^error: inflate: data stream error (" err.fsck
+ )
+'
+
# Tests for git cat-file --follow-symlinks
test_expect_success 'prep for symlink tests' '
echo_without_newline "$hello_content" >morx &&
--
2.33.0.1374.g05459a61530
* [PATCH v9 08/17] cat-file tests: test for current --allow-unknown-type behavior
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (6 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
` (10 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.
1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 15774979ad3..5b16c69c286 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -402,6 +402,67 @@ do
done
done
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+ git cat-file -e $bogus_short_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+ test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+ test_must_fail git cat-file -p $bogus_short_sha1 &&
+ test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type "bogus"
+ EOF
+ test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+ echo $bogus_short_sha1 >bogus-oid &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+
+ test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+ test_cmp err.expect err.actual &&
+
+ test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+ test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+ test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+ cat >expect <<-EOF &&
+ $bogus_short_type
+ EOF
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual &&
+
+ # Create it manually, as "git replace" will die on bogus
+ # types.
+ head=$(git rev-parse --verify HEAD) &&
+ test_when_finished "rm -rf .git/refs/replace" &&
+ mkdir -p .git/refs/replace &&
+ echo $head >.git/refs/replace/$bogus_short_sha1 &&
+
+ cat >expect <<-EOF &&
+ commit
+ EOF
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual
+'
+
test_expect_success "Type of broken object is correct" '
echo $bogus_short_type >expect &&
git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
--
2.33.0.1374.g05459a61530
* [PATCH v9 09/17] object-file.c: don't set "typep" when returning non-zero
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (7 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
` (9 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
When the loose_object_info() function returns an error, stop faking
up "oi->typep" as OBJ_BAD. Let the return value of the function itself
suffice. This code cleanup simplifies subsequent changes.
That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.
Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.
Having carefully read the code paths involved, I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.
This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.
Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.
Having read the code for all the callers of these functions, I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
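The contract described above, where callers check the return value first and treat out-parameters such as "typep" as meaningful only on success, can be sketched like this. The names (mini_object_info, mini_object_info_lookup(), the mini_type enum) are hypothetical and only model the shape of git's object_info, not its real definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in types; MINI_OBJ_NONE is a caller sentinel. */
enum mini_type { MINI_OBJ_BAD = -1, MINI_OBJ_NONE = 0, MINI_OBJ_BLOB = 3 };

struct mini_object_info {
	enum mini_type *typep; /* optional; NULL if the caller doesn't care */
};

/*
 * Returns 0 and fills in *oi->typep on success. On failure returns -1
 * and leaves *oi->typep untouched: the return value alone reports the
 * error, nothing is "faked up" as MINI_OBJ_BAD.
 */
static int mini_object_info_lookup(const char *type_name,
				   struct mini_object_info *oi)
{
	assert(oi != NULL);
	if (!type_name || strcmp(type_name, "blob"))
		return -1;
	if (oi->typep)
		*oi->typep = MINI_OBJ_BLOB;
	return 0;
}
```

A caller therefore only reads the type after seeing a 0 return.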
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/object-file.c b/object-file.c
index be4f94ecf3b..766ba88b851 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1525,8 +1525,6 @@ static int loose_object_info(struct repository *r,
git_inflate_end(&stream);
munmap(map, mapsize);
- if (status && oi->typep)
- *oi->typep = status;
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
--
2.33.0.1374.g05459a61530
* [PATCH v9 10/17] object-file.c: return -1, not "status" from unpack_loose_header()
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (8 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
` (8 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.
See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".
At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").
However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.
So let's do the minor cleanup of also changing this function to return
a -1.
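With this change the header check reduces to two failure conditions, both reported as a plain -1: a zlib status below Z_OK, or a successful inflate that did not produce the header's terminating NUL. A zlib-free sketch of that logic; the STANDIN_Z_* constants copy the values of zlib.h's Z_OK, Z_STREAM_END, Z_DATA_ERROR and Z_BUF_ERROR so it compiles without zlib, and check_inflated_header() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Same values as zlib.h's Z_OK, Z_STREAM_END, Z_DATA_ERROR, Z_BUF_ERROR. */
#define STANDIN_Z_OK 0
#define STANDIN_Z_STREAM_END 1
#define STANDIN_Z_DATA_ERROR (-3)
#define STANDIN_Z_BUF_ERROR (-5)

/*
 * status: what inflate returned; buf/produced: the bytes it wrote.
 * Returns 0 if we have a complete NUL-terminated header, -1 otherwise.
 * The specific Z_* error code is deliberately not propagated.
 */
static int check_inflated_header(int status, const char *buf, size_t produced)
{
	assert(buf != NULL);
	if (status < STANDIN_Z_OK)
		return -1;
	/* Make sure we have the terminating NUL */
	if (!memchr(buf, '\0', produced))
		return -1;
	return 0;
}
```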
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/object-file.c b/object-file.c
index 766ba88b851..8475b128944 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1284,7 +1284,7 @@ int unpack_loose_header(git_zstream *stream,
buffer, bufsiz);
if (status < Z_OK)
- return status;
+ return -1;
/* Make sure we have the terminating NUL */
if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
--
2.33.0.1374.g05459a61530
* [PATCH v9 11/17] object-file.c: make parse_loose_header_extended() public
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (9 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
` (7 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Make the parse_loose_header_extended() function public and remove the
parse_loose_header() wrapper. The only direct user of it outside of
object-file.c itself was in streaming.c; that caller can simply pass
the required "struct object_info *" instead.
This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change (we
could simply use parse_loose_header_extended() there), but it will
leave the API in a better end state.
It would be a better end-state to have already moved the declaration
of these functions to object-store.h to avoid the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.
1. https://lore.kernel.org/git/patch-v6-09.22-5b9278e7bb4-20210907T104559Z-avarab@gmail.com/
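After this change, a caller that only wants the size does what read_loose_object() and open_istream_loose() do in the diff: set "oi.sizep" and pass the struct. A heavily simplified, hypothetical parser in that style is sketched below. mini_info and mini_parse_header() are illustrative names; unlike the real parse_loose_header() it only knows the four standard types, takes no flags (so there is no --allow-unknown-type handling) and does no overflow checking on the size.

```c
#include <assert.h>
#include <string.h>

enum mini_type { MINI_BAD = -1, MINI_COMMIT = 1, MINI_TREE, MINI_BLOB, MINI_TAG };

struct mini_info {
	enum mini_type *typep; /* optional */
	unsigned long *sizep;  /* optional */
};

static enum mini_type mini_type_from_string(const char *s, size_t len)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };

	for (size_t i = 0; i < 4; i++)
		if (strlen(names[i]) == len && !memcmp(names[i], s, len))
			return (enum mini_type)(MINI_COMMIT + (int)i);
	return MINI_BAD;
}

/*
 * Parse "<type> <size>" from a NUL-terminated header. Returns the type,
 * or -1 on any malformation (unknown type, bad size, trailing bytes).
 * Out-parameters are only written on success.
 */
static int mini_parse_header(const char *hdr, struct mini_info *oi)
{
	const char *sp;
	enum mini_type type;
	unsigned long size = 0;

	assert(hdr != NULL && oi != NULL);
	sp = strchr(hdr, ' ');
	if (!sp)
		return -1;
	type = mini_type_from_string(hdr, (size_t)(sp - hdr));
	if (type == MINI_BAD)
		return -1;
	hdr = sp + 1;
	if (*hdr < '0' || *hdr > '9')
		return -1;
	/* A leading "0" means size 0 and must end the header. */
	if (*hdr == '0')
		hdr++;
	else
		while (*hdr >= '0' && *hdr <= '9')
			size = size * 10 + (unsigned long)(*hdr++ - '0');
	if (*hdr)
		return -1; /* trailing garbage */
	if (oi->typep)
		*oi->typep = type;
	if (oi->sizep)
		*oi->sizep = size;
	return type;
}
```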
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 4 +++-
object-file.c | 20 +++++++-------------
streaming.c | 5 ++++-
3 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/cache.h b/cache.h
index f6295f3b048..35f254dae4a 100644
--- a/cache.h
+++ b/cache.h
@@ -1320,7 +1320,9 @@ char *xdg_cache_home(const char *filename);
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+struct object_info;
+int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags);
int check_object_signature(struct repository *r, const struct object_id *oid,
void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 8475b128944..6b91c4edcf6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1385,8 +1385,8 @@ static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags)
{
const char *type_buf = hdr;
unsigned long size;
@@ -1446,14 +1446,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
return *hdr ? -1 : type;
}
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
- struct object_info oi = OBJECT_INFO_INIT;
-
- oi.sizep = sizep;
- return parse_loose_header_extended(hdr, &oi, 0);
-}
-
static int loose_object_info(struct repository *r,
const struct object_id *oid,
struct object_info *oi, int flags)
@@ -1508,10 +1500,10 @@ static int loose_object_info(struct repository *r,
if (status < 0)
; /* Do nothing */
else if (hdrbuf.len) {
- if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+ if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
oid_to_hex(oid));
- } else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+ } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
status = error(_("unable to parse %s header"), oid_to_hex(oid));
if (status >= 0 && oi->contentp) {
@@ -2599,6 +2591,8 @@ int read_loose_object(const char *path,
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
+ struct object_info oi = OBJECT_INFO_INIT;
+ oi.sizep = size;
*contents = NULL;
@@ -2613,7 +2607,7 @@ int read_loose_object(const char *path,
goto out;
}
- *type = parse_loose_header(hdr, size);
+ *type = parse_loose_header(hdr, &oi, 0);
if (*type < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
const struct object_id *oid,
enum object_type *type)
{
+ struct object_info oi = OBJECT_INFO_INIT;
+ oi.sizep = &st->size;
+
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapsize,
st->u.loose.hdr,
sizeof(st->u.loose.hdr)) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+ (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
return -1;
--
2.33.0.1374.g05459a61530
* [PATCH v9 12/17] object-file.c: simplify unpack_loose_short_header()
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (10 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
` (6 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.
The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).
Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.
I think the remaining unpack_loose_header() function could be further
simplified: we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN. We could alternatively just
say "we found a garbage type <first 32 bytes>..." instead. But let's
leave the current behavior in place for now.
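The combined function's contract is that the fixed-size buffer is always filled first, and only if it holds no NUL does the optional "header" argument matter: with NULL the call fails, otherwise the full header is collected there for parse_loose_header(). A zlib-free model of that control flow; model_unpack_header() is hypothetical, the "inflated" data is just an in-memory byte string, and a caller-provided overflow buffer stands in for struct strbuf.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical model of the combined unpack_loose_header() contract.
 * src/srclen: the already "inflated" object (header, NUL, content).
 * buf/bufsiz: fixed-size header buffer (MAX_HEADER_LEN in git).
 * overflow/overflow_sz: optional; NULL plays the role of a NULL strbuf.
 * Returns 0 on success, -1 on error.
 */
static int model_unpack_header(const char *src, size_t srclen,
			       char *buf, size_t bufsiz,
			       char *overflow, size_t overflow_sz)
{
	size_t n = srclen < bufsiz ? srclen : bufsiz;
	const char *nul;

	assert(src != NULL && buf != NULL);
	memcpy(buf, src, n);
	if (memchr(buf, '\0', n))
		return 0; /* common case: the header fits */

	/* Header longer than bufsiz: only continue if the caller asked. */
	if (!overflow)
		return -1;

	nul = memchr(src, '\0', srclen);
	if (!nul || (size_t)(nul - src) + 1 > overflow_sz)
		return -1;
	memcpy(overflow, src, (size_t)(nul - src) + 1);
	return 0;
}
```

The long type used here is the same bogus_long_type value as in t1006-cat-file.sh.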
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 17 ++++++++++++++-
object-file.c | 58 ++++++++++++++++++---------------------------------
streaming.c | 3 ++-
3 files changed, 38 insertions(+), 40 deletions(-)
diff --git a/cache.h b/cache.h
index 35f254dae4a..d7189aed8fc 100644
--- a/cache.h
+++ b/cache.h
@@ -1319,7 +1319,22 @@ char *xdg_cache_home(const char *filename);
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
+
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+ unsigned long mapsize, void *buffer,
+ unsigned long bufsiz, struct strbuf *hdrbuf);
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi,
unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 6b91c4edcf6..1327872cbf4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,11 +1255,12 @@ void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
}
-static int unpack_loose_short_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+ unsigned char *map, unsigned long mapsize,
+ void *buffer, unsigned long bufsiz,
+ struct strbuf *header)
{
- int ret;
+ int status;
/* Get the data stream */
memset(stream, 0, sizeof(*stream));
@@ -1270,35 +1271,8 @@ static int unpack_loose_short_header(git_zstream *stream,
git_inflate_init(stream);
obj_read_unlock();
- ret = git_inflate(stream, 0);
+ status = git_inflate(stream, 0);
obj_read_lock();
-
- return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz)
-{
- int status = unpack_loose_short_header(stream, map, mapsize,
- buffer, bufsiz);
-
- if (status < Z_OK)
- return -1;
-
- /* Make sure we have the terminating NUL */
- if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
- return -1;
- return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *header)
-{
- int status;
-
- status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
if (status < Z_OK)
return -1;
@@ -1308,6 +1282,14 @@ static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
return 0;
+ /*
+ * We have a header longer than MAX_HEADER_LEN. The "header"
+ * here is only non-NULL when we run "cat-file
+ * --allow-unknown-type".
+ */
+ if (!header)
+ return -1;
+
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
* result out to header, and then append the result of further
@@ -1457,6 +1439,7 @@ static int loose_object_info(struct repository *r,
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
oidclr(oi->delta_base_oid);
@@ -1490,11 +1473,9 @@ static int loose_object_info(struct repository *r,
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
- if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
- if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
- status = error(_("unable to unpack %s header with --allow-unknown-type"),
- oid_to_hex(oid));
- } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+
+ if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ allow_unknown ? &hdrbuf : NULL) < 0)
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
if (status < 0)
@@ -2602,7 +2583,8 @@ int read_loose_object(const char *path,
goto out;
}
- if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+ if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ NULL) < 0) {
error(_("unable to unpack header of %s"), path);
goto out;
}
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapped,
st->u.loose.mapsize,
st->u.loose.hdr,
- sizeof(st->u.loose.hdr)) < 0) ||
+ sizeof(st->u.loose.hdr),
+ NULL) < 0) ||
(parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
--
2.33.0.1374.g05459a61530
* [PATCH v9 13/17] object-file.c: use "enum" return type for unpack_loose_header()
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (11 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
` (5 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
In a preceding commit we changed and documented unpack_loose_header()
from its previous behavior of returning any negative value or zero, to
only -1 or 0.
Let's add an "enum unpack_loose_header_result" type and use it for
these return values, and have the compiler assert that we're
exhaustively covering all of them.
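A minimal sketch of the idea, using hypothetical names (`ulhr_sketch`, `ulhr_sketch_describe()`) rather than git's real enum. Because the switch has no `default:` label, GCC and Clang with `-Wall` (which enables `-Wswitch`) warn if a new enumerator is added without a matching `case`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for "enum unpack_loose_header_result". */
enum ulhr_sketch {
	ULHR_SKETCH_OK,
	ULHR_SKETCH_BAD,
};

/*
 * Map a result to a message. There is deliberately no "default:"
 * label, so the compiler can flag a missing case when the enum grows.
 */
static const char *ulhr_sketch_describe(enum ulhr_sketch r)
{
	switch (r) {
	case ULHR_SKETCH_OK:
		return "ok";
	case ULHR_SKETCH_BAD:
		return "unable to unpack header";
	}
	return NULL; /* only reached for out-of-range values */
}
```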
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 19 +++++++++++++++----
object-file.c | 34 +++++++++++++++++++++-------------
streaming.c | 23 +++++++++++++----------
3 files changed, 49 insertions(+), 27 deletions(-)
diff --git a/cache.h b/cache.h
index d7189aed8fc..7239e20a625 100644
--- a/cache.h
+++ b/cache.h
@@ -1324,7 +1324,10 @@ int git_open_cloexec(const char *name, int flags);
* unpack_loose_header() initializes the data stream needed to unpack
* a loose object header.
*
- * Returns 0 on success. Returns negative values on error.
+ * Returns:
+ *
+ * - ULHR_OK on success
+ * - ULHR_BAD on error
*
* It will only parse up to MAX_HEADER_LEN bytes unless an optional
* "hdrbuf" argument is non-NULL. This is intended for use with
@@ -1332,9 +1335,17 @@ int git_open_cloexec(const char *name, int flags);
* reporting. The full header will be extracted to "hdrbuf" for use
* with parse_loose_header().
*/
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result {
+ ULHR_OK,
+ ULHR_BAD,
+};
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+ unsigned long mapsize,
+ void *buffer,
+ unsigned long bufsiz,
+ struct strbuf *hdrbuf);
+
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi,
unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 1327872cbf4..e0f508415dd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,10 +1255,12 @@ void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
}
-int unpack_loose_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz,
- struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+ unsigned long mapsize,
+ void *buffer,
+ unsigned long bufsiz,
+ struct strbuf *header)
{
int status;
@@ -1274,13 +1276,13 @@ int unpack_loose_header(git_zstream *stream,
status = git_inflate(stream, 0);
obj_read_lock();
if (status < Z_OK)
- return -1;
+ return ULHR_BAD;
/*
* Check if entire header is unpacked in the first iteration.
*/
if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
- return 0;
+ return ULHR_OK;
/*
* We have a header longer than MAX_HEADER_LEN. The "header"
@@ -1288,7 +1290,7 @@ int unpack_loose_header(git_zstream *stream,
* --allow-unknown-type".
*/
if (!header)
- return -1;
+ return ULHR_BAD;
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
@@ -1309,7 +1311,7 @@ int unpack_loose_header(git_zstream *stream,
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status != Z_STREAM_END);
- return -1;
+ return ULHR_BAD;
}
static void *unpack_loose_rest(git_zstream *stream,
@@ -1474,13 +1476,19 @@ static int loose_object_info(struct repository *r,
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
- if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
- allow_unknown ? &hdrbuf : NULL) < 0)
+ switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ allow_unknown ? &hdrbuf : NULL)) {
+ case ULHR_OK:
+ break;
+ case ULHR_BAD:
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
- if (status < 0)
- ; /* Do nothing */
- else if (hdrbuf.len) {
+ break;
+ }
+
+ if (status < 0) {
+ /* Do nothing */
+ } else if (hdrbuf.len) {
if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
oid_to_hex(oid));
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..6df0247a4cb 100644
--- a/streaming.c
+++ b/streaming.c
@@ -229,17 +229,16 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
- if ((unpack_loose_header(&st->z,
- st->u.loose.mapped,
- st->u.loose.mapsize,
- st->u.loose.hdr,
- sizeof(st->u.loose.hdr),
- NULL) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
- git_inflate_end(&st->z);
- munmap(st->u.loose.mapped, st->u.loose.mapsize);
- return -1;
+ switch (unpack_loose_header(&st->z, st->u.loose.mapped,
+ st->u.loose.mapsize, st->u.loose.hdr,
+ sizeof(st->u.loose.hdr), NULL)) {
+ case ULHR_OK:
+ break;
+ case ULHR_BAD:
+ goto error;
}
+ if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+ goto error;
st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
st->u.loose.hdr_avail = st->z.total_out;
@@ -248,6 +247,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->read = read_istream_loose;
return 0;
+error:
+ git_inflate_end(&st->z);
+ munmap(st->u.loose.mapped, st->u.loose.mapsize);
+ return -1;
}
--
2.33.0.1374.g05459a61530
* [PATCH v9 14/17] object-file.c: return ULHR_TOO_LONG on "header too long"
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (12 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
` (4 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.
As a test added earlier in this series in t1006-cat-file.sh shows,
we already correctly emit zlib errors from zlib.c in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG saying we ran into the
MAX_HEADER_LEN limit, or ULHR_BAD for "unable to unpack <OID>
header".
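A rough sketch of the classification, under simplifying assumptions: the header is taken as already inflated, and the hypothetical `inflate_ok` flag stands in for the `status < Z_OK` check. `classify_header()` and the `ULHR_SKETCH_*` names are illustrative, not git's API; the 32-byte limit matches the "exceeds 32 bytes" message expected by the t1006 test.

```c
#include <assert.h>
#include <string.h>

/* Same limit as the "exceeds 32 bytes" message in t1006-cat-file.sh. */
#define SKETCH_MAX_HEADER_LEN 32

enum ulhr_sketch { ULHR_SKETCH_OK, ULHR_SKETCH_BAD, ULHR_SKETCH_TOO_LONG };

/*
 * A zlib failure is the generic BAD case; a header whose NUL
 * terminator does not appear within the first SKETCH_MAX_HEADER_LEN
 * bytes is reported as TOO_LONG instead.
 */
static enum ulhr_sketch classify_header(int inflate_ok, const char *buf,
					size_t len)
{
	size_t scan = len < SKETCH_MAX_HEADER_LEN ? len : SKETCH_MAX_HEADER_LEN;

	if (!inflate_ok)
		return ULHR_SKETCH_BAD;
	if (memchr(buf, '\0', scan))
		return ULHR_SKETCH_OK;
	return ULHR_SKETCH_TOO_LONG;
}
```

A caller can then pick a distinct message per case, as loose_object_info() does in the diff below.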
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 5 ++++-
object-file.c | 8 ++++++--
streaming.c | 1 +
t/t1006-cat-file.sh | 4 ++--
4 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/cache.h b/cache.h
index 7239e20a625..8e05392fda8 100644
--- a/cache.h
+++ b/cache.h
@@ -1328,16 +1328,19 @@ int git_open_cloexec(const char *name, int flags);
*
* - ULHR_OK on success
* - ULHR_BAD on error
+ * - ULHR_TOO_LONG if the header was too long
*
* It will only parse up to MAX_HEADER_LEN bytes unless an optional
* "hdrbuf" argument is non-NULL. This is intended for use with
* OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
* reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), ULHR_TOO_LONG will still be returned
+ * from this function to indicate that the header was too long.
*/
enum unpack_loose_header_result {
ULHR_OK,
ULHR_BAD,
+ ULHR_TOO_LONG,
};
enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
unsigned char *map,
diff --git a/object-file.c b/object-file.c
index e0f508415dd..3589c5a2e33 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1290,7 +1290,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
* --allow-unknown-type".
*/
if (!header)
- return ULHR_BAD;
+ return ULHR_TOO_LONG;
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
@@ -1311,7 +1311,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status != Z_STREAM_END);
- return ULHR_BAD;
+ return ULHR_TOO_LONG;
}
static void *unpack_loose_rest(git_zstream *stream,
@@ -1484,6 +1484,10 @@ static int loose_object_info(struct repository *r,
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
break;
+ case ULHR_TOO_LONG:
+ status = error(_("header for %s too long, exceeds %d bytes"),
+ oid_to_hex(oid), MAX_HEADER_LEN);
+ break;
}
if (status < 0) {
diff --git a/streaming.c b/streaming.c
index 6df0247a4cb..bd89c50e7b3 100644
--- a/streaming.c
+++ b/streaming.c
@@ -235,6 +235,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
case ULHR_OK:
break;
case ULHR_BAD:
+ case ULHR_TOO_LONG:
goto error;
}
if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5b16c69c286..a5e7401af8b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -356,12 +356,12 @@ do
if test "$arg2" = "-p"
then
cat >expect <<-EOF
- error: unable to unpack $bogus_long_sha1 header
+ error: header for $bogus_long_sha1 too long, exceeds 32 bytes
fatal: Not a valid object name $bogus_long_sha1
EOF
else
cat >expect <<-EOF
- error: unable to unpack $bogus_long_sha1 header
+ error: header for $bogus_long_sha1 too long, exceeds 32 bytes
fatal: git cat-file: could not get object info
EOF
fi &&
--
2.33.0.1374.g05459a61530
* [PATCH v9 15/17] object-file.c: stop dying in parse_loose_header()
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (13 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
` (3 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Make parse_loose_header() return error codes and data instead of
invoking die() by itself.
For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (which should be
moved to builtin/cat-file.c), but for now I'll leave it.
To make parse_loose_header() not die(), change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated
"oi->typep".
Because of this we no longer need to pass in the "unsigned int flags"
we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE; we can instead do
that check in loose_object_info().
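A simplified sketch of that split of responsibilities. The parser only reports whether "<type> <len>" is well-formed, hands back an unknown type as a BAD value, and leaves the policy decision to the caller. All names here (`sketch_parse_header()`, `struct sketch_info`, `sketch_check()`) are hypothetical; the type values mirror git's OBJ_* numbering, and unlike git this sketch does no overflow checking on <len>.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for enum object_type; SK_BAD plays the role of OBJ_BAD. */
enum sketch_type { SK_BAD = -1, SK_COMMIT = 1, SK_TREE, SK_BLOB, SK_TAG };

struct sketch_info {
	enum sketch_type *typep;
	unsigned long *sizep;
};

static enum sketch_type sketch_type_from_string(const char *s, size_t len)
{
	static const char *names[] = { NULL, "commit", "tree", "blob", "tag" };

	for (int i = SK_COMMIT; i <= SK_TAG; i++)
		if (strlen(names[i]) == len && !memcmp(names[i], s, len))
			return (enum sketch_type)i;
	return SK_BAD;
}

/* Returns -1 only for a malformed header; never dies on an unknown type. */
static int sketch_parse_header(const char *hdr, struct sketch_info *oi)
{
	const char *sp = strchr(hdr, ' ');
	unsigned long size = 0;

	if (!sp || sp == hdr)
		return -1;
	if (oi->typep)
		*oi->typep = sketch_type_from_string(hdr, (size_t)(sp - hdr));

	hdr = sp + 1;
	if (*hdr < '0' || *hdr > '9')
		return -1;
	while (*hdr >= '0' && *hdr <= '9')
		size = size * 10 + (unsigned long)(*hdr++ - '0');
	if (oi->sizep)
		*oi->sizep = size;
	return *hdr ? -1 : 0; /* <len> must be followed by the NUL byte */
}

/* The caller, not the parser, decides whether an unknown type is fatal. */
static int sketch_check(const char *hdr, int allow_unknown)
{
	enum sketch_type type;
	unsigned long size;
	struct sketch_info oi = { &type, &size };

	if (sketch_parse_header(hdr, &oi) < 0)
		return -1; /* malformed header */
	if (!allow_unknown && type == SK_BAD)
		return -2; /* well-formed, but unknown type */
	return 0;
}
```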
This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.
Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that could
be any negative value, as we were expecting to return the "enum
object_type".
The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.
Since parse_loose_header() doesn't need to return an arbitrary
"status" we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.
We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects, see
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it; a subsequent
commit will address that edge case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 11 +++++++--
object-file.c | 67 +++++++++++++++++++++++++--------------------------
streaming.c | 3 ++-
3 files changed, 44 insertions(+), 37 deletions(-)
diff --git a/cache.h b/cache.h
index 8e05392fda8..6c5f00c82d5 100644
--- a/cache.h
+++ b/cache.h
@@ -1349,9 +1349,16 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
unsigned long bufsiz,
struct strbuf *hdrbuf);
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
struct object_info;
-int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags);
+int parse_loose_header(const char *hdr, struct object_info *oi);
int check_object_signature(struct repository *r, const struct object_id *oid,
void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 3589c5a2e33..a70669700d0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1369,8 +1369,7 @@ static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
-int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
{
const char *type_buf = hdr;
unsigned long size;
@@ -1392,15 +1391,6 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
type = type_from_string_gently(type_buf, type_len, 1);
if (oi->type_name)
strbuf_add(oi->type_name, type_buf, type_len);
- /*
- * Set type to 0 if its an unknown object and
- * we're obtaining the type using '--allow-unknown-type'
- * option.
- */
- if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
- type = 0;
- else if (type < 0)
- die(_("invalid object type"));
if (oi->typep)
*oi->typep = type;
@@ -1427,7 +1417,14 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
/*
* The length must be followed by a zero byte
*/
- return *hdr ? -1 : type;
+ if (*hdr)
+ return -1;
+
+ /*
+ * The format is valid, but the type may still be bogus. The
+ * Caller needs to check its oi->typep.
+ */
+ return 0;
}
static int loose_object_info(struct repository *r,
@@ -1441,6 +1438,7 @@ static int loose_object_info(struct repository *r,
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ enum object_type type_scratch;
int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
@@ -1472,6 +1470,8 @@ static int loose_object_info(struct repository *r,
if (!oi->sizep)
oi->sizep = &size_scratch;
+ if (!oi->typep)
+ oi->typep = &type_scratch;
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
@@ -1479,6 +1479,18 @@ static int loose_object_info(struct repository *r,
switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
allow_unknown ? &hdrbuf : NULL)) {
case ULHR_OK:
+ if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0)
+ status = error(_("unable to parse %s header"), oid_to_hex(oid));
+ else if (!allow_unknown && *oi->typep < 0)
+ die(_("invalid object type"));
+
+ if (!oi->contentp)
+ break;
+ *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+ if (*oi->contentp)
+ goto cleanup;
+
+ status = -1;
break;
case ULHR_BAD:
status = error(_("unable to unpack %s header"),
@@ -1490,31 +1502,16 @@ static int loose_object_info(struct repository *r,
break;
}
- if (status < 0) {
- /* Do nothing */
- } else if (hdrbuf.len) {
- if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
- status = error(_("unable to parse %s header with --allow-unknown-type"),
- oid_to_hex(oid));
- } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
- status = error(_("unable to parse %s header"), oid_to_hex(oid));
-
- if (status >= 0 && oi->contentp) {
- *oi->contentp = unpack_loose_rest(&stream, hdr,
- *oi->sizep, oid);
- if (!*oi->contentp) {
- git_inflate_end(&stream);
- status = -1;
- }
- } else
- git_inflate_end(&stream);
-
+ git_inflate_end(&stream);
+cleanup:
munmap(map, mapsize);
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
+ if (oi->typep == &type_scratch)
+ oi->typep = NULL;
oi->whence = OI_LOOSE;
- return (status < 0) ? status : 0;
+ return status;
}
int obj_read_use_lock = 0;
@@ -2585,6 +2582,7 @@ int read_loose_object(const char *path,
git_zstream stream;
char hdr[MAX_HEADER_LEN];
struct object_info oi = OBJECT_INFO_INIT;
+ oi.typep = type;
oi.sizep = size;
*contents = NULL;
@@ -2601,12 +2599,13 @@ int read_loose_object(const char *path,
goto out;
}
- *type = parse_loose_header(hdr, &oi, 0);
- if (*type < 0) {
+ if (parse_loose_header(hdr, &oi) < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
}
+ if (*type < 0)
+ die(_("invalid object type"));
if (*type == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/streaming.c b/streaming.c
index bd89c50e7b3..fe54665d86e 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
{
struct object_info oi = OBJECT_INFO_INIT;
oi.sizep = &st->size;
+ oi.typep = type;
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
@@ -238,7 +239,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
case ULHR_TOO_LONG:
goto error;
}
- if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+ if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
goto error;
st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
--
2.33.0.1374.g05459a61530
* [PATCH v9 16/17] fsck: don't hard die on invalid object types
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (14 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 13:37 ` [PATCH v9 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
` (2 subsequent siblings)
18 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Change the error fsck emits on invalid object types, such as:
$ git hash-object --stdin -w -t garbage --literally </dev/null
<OID>
From the very ungraceful error of:
$ git fsck
fatal: invalid object type
$
To:
$ git fsck
error: <OID>: object is of unknown type 'garbage': <OID_PATH>
[ other fsck output ]
We'll still exit with a non-zero status, but now we'll finish the rest
of the traversal. The test that's being added here asserts that we'll
still complain about other fsck issues (e.g. an unrelated dangling blob).
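A toy sketch of "report and keep going": each bad object produces an error line and sets a bit in an accumulated mask instead of calling die(), and the mask decides the exit status at the end. `sketch_fsck_all()`, the `type_ok` array and `SKETCH_ERROR_OBJECT` are hypothetical, modelled loosely on the `errors_found |= ERROR_OBJECT` lines in the diff.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_ERROR_OBJECT 01 /* hypothetical error bit */

/*
 * Visit every object; "type_ok[i]" says whether object i has a valid
 * type. Failures are reported and accumulated, never fatal mid-walk.
 */
static int sketch_fsck_all(const int *type_ok, int n, int *checked)
{
	int errors_found = 0;

	*checked = 0;
	for (int i = 0; i < n; i++) {
		(*checked)++;
		if (!type_ok[i]) {
			fprintf(stderr, "error: object %d is of unknown type\n", i);
			errors_found |= SKETCH_ERROR_OBJECT;
		}
	}
	return errors_found; /* non-zero maps to a non-zero exit status */
}
```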
To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().
Since we're now passing in an "oi.type_name" we'll have to clean up the
allocated "strbuf sb". That we're doing it right is asserted by
e.g. the "fsck notices broken commit" test added in 03818a4a94c
(split_ident: parse timestamp from end of line, 2013-10-14). To do
that, switch to a "goto cleanup" pattern, and while we're at it factor
out the already duplicated free(contents) to use that pattern.
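A self-contained sketch of the "goto cleanup" shape, with hypothetical names: `sketch_fsck_one()` stands in for fsck_loose(), its flags simulate read_loose_object() or parse_object_buffer() failing, and a malloc'd `type_name` plays the role of "struct strbuf sb". Every exit path runs through the labels at the bottom, so the buffer is released exactly once. Unlike the real code, this sketch always frees "contents" (git skips that when the parser "ate" the buffer).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int sketch_releases; /* counts type_name releases for the checks */

static int sketch_fsck_one(int read_fails, int parse_fails)
{
	char *type_name = malloc(16);
	char *contents = NULL;
	int errors = 0;

	if (!type_name)
		return -1;
	strcpy(type_name, "garbage");

	if (read_fails) {
		errors++;
		goto cleanup;          /* nothing read yet: only free type_name */
	}

	contents = malloc(8);
	if (parse_fails) {
		errors++;
		goto cleanup_contents; /* analogous to "goto cleanup_eaten" */
	}
	/* ... fsck_obj() would inspect "contents" here ... */

cleanup_contents:
	free(contents);
cleanup:
	free(type_name);
	sketch_releases++;
	return errors;             /* the caller keeps checking other objects */
}
```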
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fsck.c | 26 +++++++++++++++++++-------
object-file.c | 18 ++++++------------
object-store.h | 6 +++---
t/t1450-fsck.sh | 17 +++++++++--------
4 files changed, 37 insertions(+), 30 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..623f8fc3194 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,12 +600,23 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
unsigned long size;
void *contents;
int eaten;
+ struct strbuf sb = STRBUF_INIT;
+ struct object_info oi = OBJECT_INFO_INIT;
+ int err = 0;
- if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+ oi.type_name = &sb;
+ oi.sizep = &size;
+ oi.typep = &type;
+
+ if (read_loose_object(path, oid, &contents, &oi) < 0)
+ err = error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
+ if (type < 0)
+ err = error(_("%s: object is of unknown type '%s': %s"),
+ oid_to_hex(oid), sb.buf, path);
+ if (err) {
errors_found |= ERROR_OBJECT;
- error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
- return 0; /* keep checking other objects */
+ goto cleanup;
}
if (!contents && type != OBJ_BLOB)
@@ -618,9 +629,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
errors_found |= ERROR_OBJECT;
error(_("%s: object could not be parsed: %s"),
oid_to_hex(oid), path);
- if (!eaten)
- free(contents);
- return 0; /* keep checking other objects */
+ goto cleanup_eaten;
}
obj->flags &= ~(REACHABLE | SEEN);
@@ -628,8 +637,11 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
if (fsck_obj(obj, contents, size))
errors_found |= ERROR_OBJECT;
+cleanup_eaten:
if (!eaten)
free(contents);
+cleanup:
+ strbuf_release(&sb);
return 0; /* keep checking other objects, even if we saw an error */
}
diff --git a/object-file.c b/object-file.c
index a70669700d0..fe95285f405 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2572,18 +2572,15 @@ static int check_stream_oid(git_zstream *stream,
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents)
+ void **contents,
+ struct object_info *oi)
{
int ret = -1;
void *map = NULL;
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
- struct object_info oi = OBJECT_INFO_INIT;
- oi.typep = type;
- oi.sizep = size;
+ unsigned long *size = oi->sizep;
*contents = NULL;
@@ -2599,15 +2596,13 @@ int read_loose_object(const char *path,
goto out;
}
- if (parse_loose_header(hdr, &oi) < 0) {
+ if (parse_loose_header(hdr, oi) < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
}
- if (*type < 0)
- die(_("invalid object type"));
- if (*type == OBJ_BLOB && *size > big_file_threshold) {
+ if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
goto out;
} else {
@@ -2618,8 +2613,7 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size,
- type_name(*type))) {
+ *contents, *size, oi->type_name->buf)) {
error(_("hash mismatch for %s (expected %s)"), path,
oid_to_hex(expected_oid));
free(*contents);
diff --git a/object-store.h b/object-store.h
index c5130d8baea..c90c41a07f7 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,6 +245,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
/*
* Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
* type, and size. If the object is a blob, then "contents" may return NULL,
* to allow streaming of large blobs.
*
@@ -252,9 +253,8 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents);
+ void **contents,
+ struct object_info *oi);
/* Retry packed storage after checking packed and loose storage */
#define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bd696d21dba..167c319823a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -85,11 +85,10 @@ test_expect_success 'object with hash and type mismatch' '
cmt=$(echo bogus | git commit-tree $tree) &&
git update-ref refs/heads/bogus $cmt &&
- cat >expect <<-\EOF &&
- fatal: invalid object type
- EOF
- test_must_fail git fsck 2>actual &&
- test_cmp expect actual
+
+ test_must_fail git fsck 2>out &&
+ grep "^error: hash mismatch for " out &&
+ grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
)
'
@@ -910,7 +909,7 @@ test_expect_success 'detect corrupt index file in fsck' '
test_i18ngrep "bad index file" errors
'
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
git init --bare garbage-type &&
(
cd garbage-type &&
@@ -922,8 +921,10 @@ test_expect_success 'fsck hard errors on an invalid object type' '
fatal: invalid object type
EOF
test_must_fail git fsck >out 2>err &&
- test_cmp err.expect err &&
- test_must_be_empty out
+ grep -e "^error" -e "^fatal" err >errors &&
+ test_line_count = 1 errors &&
+ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+ grep "dangling blob $empty_blob" out
)
'
--
2.33.0.1374.g05459a61530
* [PATCH v9 17/17] fsck: report invalid object type-path combinations
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (15 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-30 13:37 ` Ævar Arnfjörð Bjarmason
2021-09-30 21:01 ` Junio C Hamano
2021-09-30 19:06 ` [PATCH v9 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
18 siblings, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-30 13:37 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.
Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.
Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ mv objects/e6/ objects/e7
Would emit ("[...]" used to abbreviate the OIDs):
$ git fsck
error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]
Now we'll instead emit:
error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ mv objects/83 objects/84
As noted in earlier commits, we'd simply die early in those cases
until preceding commits fixed the hard die on an invalid object type:
$ git fsck
fatal: invalid object type
Now we'll instead emit sensible error messages:
$ git fsck
error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]
In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ git fsck
error: garbage at end of loose object 'e69d[...]'
error: unable to unpack contents of ./objects/e6/9d[...]
error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]
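A simplified sketch of the sentinel idea, with hypothetical names (`struct sketch_oid`, `sketch_pick_message()`): the "real" object ID starts out as all zeros, like `*null_oid()`, and is only trusted once it has been filled in. The explicit sentinel comparison is this sketch's own choice; in the patch, builtin/fsck.c gates the mismatch message on `contents && !oideq(&real_oid, oid)`.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_RAWSZ 20 /* SHA-1-sized IDs, for illustration only */

struct sketch_oid { unsigned char hash[SKETCH_RAWSZ]; };

/* All zeros: "we never got far enough to hash the object". */
static const struct sketch_oid sketch_null_oid;

static int sketch_oideq(const struct sketch_oid *a, const struct sketch_oid *b)
{
	return !memcmp(a->hash, b->hash, SKETCH_RAWSZ);
}

/* Pick the fsck message after read_loose_object() reported an error. */
static const char *sketch_pick_message(const struct sketch_oid *expected,
				       const struct sketch_oid *real,
				       int have_contents)
{
	if (have_contents && !sketch_oideq(real, &sketch_null_oid) &&
	    !sketch_oideq(real, expected))
		return "hash-path mismatch";
	return "object corrupt or missing";
}
```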
There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
$ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
$ /usr/bin/git fsck
fatal: invalid object type
$ ~/g/git/git fsck
error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
[...]
I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly more obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.
There are other such potential edge cases, all of which might produce
some confusing messaging, but still be handled correctly as far as
passing along errors goes. E.g. if check_object_signature() returns
and oideq(real_oid, null_oid()) is true, which could happen if it
returns -1 due to the read_istream() call having failed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fast-export.c | 2 +-
builtin/fsck.c | 23 +++++++++++++++--------
builtin/index-pack.c | 2 +-
builtin/mktag.c | 3 ++-
cache.h | 3 ++-
object-file.c | 21 ++++++++++-----------
object-store.h | 1 +
object.c | 4 ++--
pack-check.c | 3 ++-
t/t1006-cat-file.sh | 2 +-
t/t1450-fsck.sh | 8 +++++---
11 files changed, 42 insertions(+), 30 deletions(-)
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 95e8e89e81f..8e2caf72819 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
if (!buf)
die("could not read blob %s", oid_to_hex(oid));
if (check_object_signature(the_repository, oid, buf, size,
- type_name(type)) < 0)
+ type_name(type), NULL) < 0)
die("oid mismatch in blob %s", oid_to_hex(oid));
object = parse_object_buffer(the_repository, oid, type,
size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 623f8fc3194..980c26e3b25 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -598,23 +598,30 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
struct object *obj;
enum object_type type;
unsigned long size;
- void *contents;
+ unsigned char *contents = NULL;
int eaten;
struct strbuf sb = STRBUF_INIT;
struct object_info oi = OBJECT_INFO_INIT;
- int err = 0;
+ struct object_id real_oid = *null_oid();
+ int ret;
oi.type_name = &sb;
oi.sizep = &size;
oi.typep = &type;
- if (read_loose_object(path, oid, &contents, &oi) < 0)
- err = error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
+ ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
+ if (ret < 0) {
+ if (contents && !oideq(&real_oid, oid))
+ error(_("%s: hash-path mismatch, found at: %s"),
+ oid_to_hex(&real_oid), path);
+ else
+ error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
+ }
if (type < 0)
- err = error(_("%s: object is of unknown type '%s': %s"),
- oid_to_hex(oid), sb.buf, path);
- if (err) {
+ ret = error(_("%s: object is of unknown type '%s': %s"),
+ oid_to_hex(&real_oid), sb.buf, path);
+ if (ret < 0) {
errors_found |= ERROR_OBJECT;
goto cleanup;
}
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7ce69c087ec..15ae406e6b7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1415,7 +1415,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
if (check_object_signature(the_repository, &d->oid,
data, size,
- type_name(type)))
+ type_name(type), NULL))
die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
repl = lookup_replace_object(the_repository, tagged_oid);
ret = check_object_signature(the_repository, repl,
- buffer, size, type_name(*tagged_type));
+ buffer, size, type_name(*tagged_type),
+ NULL);
free(buffer);
return ret;
diff --git a/cache.h b/cache.h
index 6c5f00c82d5..e2a203073ea 100644
--- a/cache.h
+++ b/cache.h
@@ -1361,7 +1361,8 @@ struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi);
int check_object_signature(struct repository *r, const struct object_id *oid,
- void *buf, unsigned long size, const char *type);
+ void *buf, unsigned long size, const char *type,
+ struct object_id *real_oidp);
int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/object-file.c b/object-file.c
index fe95285f405..49561e31551 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1084,9 +1084,11 @@ void *xmmap(void *start, size_t length,
* the streaming interface and rehash it to do the same.
*/
int check_object_signature(struct repository *r, const struct object_id *oid,
- void *map, unsigned long size, const char *type)
+ void *map, unsigned long size, const char *type,
+ struct object_id *real_oidp)
{
- struct object_id real_oid;
+ struct object_id tmp;
+ struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
enum object_type obj_type;
struct git_istream *st;
git_hash_ctx c;
@@ -1094,8 +1096,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
int hdrlen;
if (map) {
- hash_object_file(r->hash_algo, map, size, type, &real_oid);
- return !oideq(oid, &real_oid) ? -1 : 0;
+ hash_object_file(r->hash_algo, map, size, type, real_oid);
+ return !oideq(oid, real_oid) ? -1 : 0;
}
st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1120,9 +1122,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
break;
r->hash_algo->update_fn(&c, buf, readlen);
}
- r->hash_algo->final_oid_fn(&real_oid, &c);
+ r->hash_algo->final_oid_fn(real_oid, &c);
close_istream(st);
- return !oideq(oid, &real_oid) ? -1 : 0;
+ return !oideq(oid, real_oid) ? -1 : 0;
}
int git_open_cloexec(const char *name, int flags)
@@ -2572,6 +2574,7 @@ static int check_stream_oid(git_zstream *stream,
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
struct object_info *oi)
{
@@ -2582,8 +2585,6 @@ int read_loose_object(const char *path,
char hdr[MAX_HEADER_LEN];
unsigned long *size = oi->sizep;
- *contents = NULL;
-
map = map_loose_object_1(the_repository, path, NULL, &mapsize);
if (!map) {
error_errno(_("unable to mmap %s"), path);
@@ -2613,9 +2614,7 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size, oi->type_name->buf)) {
- error(_("hash mismatch for %s (expected %s)"), path,
- oid_to_hex(expected_oid));
+ *contents, *size, oi->type_name->buf, real_oid)) {
free(*contents);
goto out;
}
diff --git a/object-store.h b/object-store.h
index c90c41a07f7..17b072e5a19 100644
--- a/object-store.h
+++ b/object-store.h
@@ -253,6 +253,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
struct object_info *oi);
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
(!obj && repo_has_object_file(r, oid) &&
oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
- if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+ if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
error(_("hash mismatch %s"), oid_to_hex(oid));
return NULL;
}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
buffer = repo_read_object_file(r, oid, &type, &size);
if (buffer) {
if (check_object_signature(r, repl, buffer, size,
- type_name(type)) < 0) {
+ type_name(type), NULL) < 0) {
free(buffer);
error(_("hash mismatch %s"), oid_to_hex(repl));
return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
oid_to_hex(&oid), p->pack_name,
(uintmax_t)entries[i].offset);
- else if (check_object_signature(r, &oid, data, size, type_name(type)))
+ else if (check_object_signature(r, &oid, data, size,
+ type_name(type), NULL))
err = error("packed %s from %s is corrupt",
oid_to_hex(&oid), p->pack_name);
else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index a5e7401af8b..0f52ca9cc82 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -512,7 +512,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
# Swap the two to corrupt the repository
mv -f "$other_path" "$empty_path" &&
test_must_fail git fsck 2>err.fsck &&
- grep "hash mismatch" err.fsck &&
+ grep "hash-path mismatch" err.fsck &&
# confirm that cat-file is reading the new swapped-in
# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 167c319823a..eb0e772f098 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -54,6 +54,7 @@ test_expect_success 'object with hash mismatch' '
cd hash-mismatch &&
oid=$(echo blob | git hash-object -w --stdin) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ -65,7 +66,7 @@ test_expect_success 'object with hash mismatch' '
git update-ref refs/heads/bogus $cmt &&
test_must_fail git fsck 2>out &&
- grep "$oid.*corrupt" out
+ grep "$oldoid: hash-path mismatch, found at: .*$new" out
)
'
@@ -75,6 +76,7 @@ test_expect_success 'object with hash and type mismatch' '
cd hash-type-mismatch &&
oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ -87,8 +89,8 @@ test_expect_success 'object with hash and type mismatch' '
test_must_fail git fsck 2>out &&
- grep "^error: hash mismatch for " out &&
- grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+ grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+ grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
)
'
--
2.33.0.1374.g05459a61530
* Re: [PATCH v9 17/17] fsck: report invalid object type-path combinations
2021-09-30 13:37 ` [PATCH v9 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-30 21:01 ` Junio C Hamano
0 siblings, 0 replies; 245+ messages in thread
From: Junio C Hamano @ 2021-09-30 21:01 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 623f8fc3194..980c26e3b25 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -598,23 +598,30 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> struct object *obj;
> enum object_type type;
> unsigned long size;
> - void *contents;
> + unsigned char *contents = NULL;
> int eaten;
> struct strbuf sb = STRBUF_INIT;
> struct object_info oi = OBJECT_INFO_INIT;
> - int err = 0;
> + struct object_id real_oid = *null_oid();
> + int ret;
>
> oi.type_name = &sb;
> oi.sizep = &size;
> oi.typep = &type;
>
> - if (read_loose_object(path, oid, &contents, &oi) < 0)
> - err = error(_("%s: object corrupt or missing: %s"),
> - oid_to_hex(oid), path);
> + ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
> + if (ret < 0) {
> + if (contents && !oideq(&real_oid, oid))
> + error(_("%s: hash-path mismatch, found at: %s"),
> + oid_to_hex(&real_oid), path);
> + else
> + error(_("%s: object corrupt or missing: %s"),
> + oid_to_hex(oid), path);
> + }
> if (type < 0)
> - err = error(_("%s: object is of unknown type '%s': %s"),
> - oid_to_hex(oid), sb.buf, path);
> - if (err) {
> + ret = error(_("%s: object is of unknown type '%s': %s"),
> + oid_to_hex(&real_oid), sb.buf, path);
> + if (ret < 0) {
> errors_found |= ERROR_OBJECT;
> goto cleanup;
> }
This is immediately touching up what 16/17 has introduced, which is
making it a bit harder to follow than necessary, so let's take the
whole postimage of 16+17.
> static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> {
> struct object *obj;
> enum object_type type;
> unsigned long size;
> unsigned char *contents = NULL;
> int eaten;
> struct strbuf sb = STRBUF_INIT;
> struct object_info oi = OBJECT_INFO_INIT;
> struct object_id real_oid = *null_oid();
> int ret;
>
> oi.type_name = &sb;
> oi.sizep = &size;
> oi.typep = &type;
>
> ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
> if (ret < 0) {
> if (contents && !oideq(&real_oid, oid))
> error(_("%s: hash-path mismatch, found at: %s"),
> oid_to_hex(&real_oid), path);
> else
> error(_("%s: object corrupt or missing: %s"),
> oid_to_hex(oid), path);
We can emit an error() message from either one of these. contents
may or may not be NULL, ret is negative, and we continue. Do we
know anything about the value of type at this point? IOW, will we
get into the body of the next "if (type < 0)" statement to overwrite
ret with -1?
> }
> if (type < 0)
> ret = error(_("%s: object is of unknown type '%s': %s"),
> oid_to_hex(&real_oid), sb.buf, path);
> if (ret < 0) {
> errors_found |= ERROR_OBJECT;
> goto cleanup;
In any case, we'd jump to clean-up if any of the above hold, so we'd
avoid hitting the next BUG().
> }
>
> if (!contents && type != OBJ_BLOB)
> BUG("read_loose_object streamed a non-blob");
>
> obj = parse_object_buffer(the_repository, oid, type, size,
> contents, &eaten);
>
> if (!obj) {
> errors_found |= ERROR_OBJECT;
> error(_("%s: object could not be parsed: %s"),
> oid_to_hex(oid), path);
> goto cleanup_eaten;
> }
>
> obj->flags &= ~(REACHABLE | SEEN);
> obj->flags |= HAS_OBJ;
> if (fsck_obj(obj, contents, size))
> errors_found |= ERROR_OBJECT;
>
> cleanup_eaten:
> if (!eaten)
> free(contents);
> cleanup:
> strbuf_release(&sb);
In the "goto cleanup" error case above, we haven't done anything
that would have caused the object contents to be eaten, and contents may
either point at an allocated memory or NULL (in the "hash-path
mismatch" case, we may have contents allocated but nobody has freed
it yet, leaking it).
I am wondering if, by initializing "eaten" to false, we could get rid
of one of the two labels we added in this series, which would fix this
leak as well, no?
> return 0; /* keep checking other objects, even if we saw an error */
> }
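Here is a minimal sketch of the single-label pattern suggested above. It assumes a hypothetical consume() that may take ownership of the buffer, loosely like parse_object_buffer() reporting through "eaten". Because eaten starts out false, one cleanup label frees the buffer on every path where nothing took ownership, including the early-error path that leaked. The functions check_one(), consume() and counted_free() and the two globals are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int frees;   /* counts real frees, for illustration only */
static char *owner; /* stand-in for an object that keeps the buffer */

static void counted_free(void *p)
{
	if (p)
		frees++;
	free(p);
}

/* Hypothetical consumer: on success it keeps buf and sets *eaten. */
static int consume(char *buf, int *eaten)
{
	if (buf[0] == '!')
		return -1; /* rejected: the caller still owns buf */
	owner = buf;
	*eaten = 1;
	return 0;
}

static int check_one(const char *input, int early_error)
{
	char *contents;
	int eaten = 0; /* false until something takes ownership */
	int ret = 0;

	contents = malloc(strlen(input) + 1);
	if (!contents)
		return -1;
	strcpy(contents, input);

	if (early_error) {
		/* e.g. a hash-path mismatch with contents allocated */
		ret = -1;
		goto cleanup;
	}
	if (consume(contents, &eaten) < 0)
		ret = -1;
cleanup:
	if (!eaten)
		counted_free(contents); /* one label covers every exit */
	return ret;
}
```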
* Re: [PATCH v9 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (16 preceding siblings ...)
2021-09-30 13:37 ` [PATCH v9 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-30 19:06 ` Taylor Blau
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
18 siblings, 0 replies; 245+ messages in thread
From: Taylor Blau @ 2021-09-30 19:06 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau
On Thu, Sep 30, 2021 at 03:37:05PM +0200, Ævar Arnfjörð Bjarmason wrote:
> The only change since v8 is the plugging of a memory leak introduced
> in the previous 16/17. I've been doing integration of my local pending
> patches using some follow-up work for the in-flight
> ab/sanitize-leak-ci topic, which is already proving quite useful.
Good catch, sorry that I missed it myself when reading the previous
version. The way you plugged the leak is sensible to me, so I'm happy
with this version, too.
Thanks,
Taylor
* [PATCH v10 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
2021-09-30 13:37 ` [PATCH v9 " Ævar Arnfjörð Bjarmason
` (17 preceding siblings ...)
2021-09-30 19:06 ` [PATCH v9 00/17] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
` (16 more replies)
18 siblings, 17 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
This improves fsck error reporting, see the examples in the commit
messages of 16/17 and 17/17. To get there I've lib-ified more things
in object-file.c and the general object APIs, i.e. now we'll return
error codes instead of calling die() in these cases.
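The lib-ification relies on error() returning -1, which is what lets the patches write "err = error(...)" or "ret = error(...)". Below is a minimal sketch. The hypothetical report_error() stands in for git's error(), and the hypothetical parse_type() reports and returns instead of calling die(). The 1 to 4 numbering mirrors git's commit/tree/blob/tag object types.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Like git's error(): print to stderr, then return -1 for the caller. */
static int report_error(const char *fmt, ...)
{
	va_list ap;

	fputs("error: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return -1;
}

/*
 * Hypothetical lib-ified parser: where a die() would have ended the
 * process, it now reports and returns -1 so the caller can keep going.
 */
static int parse_type(const char *name)
{
	static const char *const types[] = { "commit", "tree", "blob", "tag" };

	for (int i = 0; i < 4; i++)
		if (!strcmp(name, types[i]))
			return i + 1;
	return report_error("invalid object type '%s'", name);
}
```

A caller such as fsck can then record the failure and move on to the next object instead of exiting.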
This should fix the issues noted about v9[1]. I.e.:
A. Junio's right in [2] that the "type" can't be trusted after a
failed read_loose_object(): if we fail before we can parse it out,
it'll be uninitialized. It's now initialized to OBJ_NONE, so we can
trust it if it's set to something else.
B. I re-arranged much of 16/17 and 17/17 to make the diff (but not
the range-diff) smaller.
C. We now share a single "strbuf" across the whole fsck_loose() walk
for saving away the type name, instead of allocating a new one
each time. This is both a better memory usage pattern, and makes
fsck_loose() itself simpler.
It also allows for using much of the pre-image as-is, i.e. the
whole "goto cleanup" is gone. Likewise instead of "ret" and "err"
we just have the "err" variable now.
D. I fixed the redundant/left-over test setup noted by Andrei
Rybak[3].
1. http://lore.kernel.org/git/cover-v9-00.17-00000000000-20210930T133300Z-avarab@gmail.com
2. https://lore.kernel.org/git/xmqqsfxlaicg.fsf@gitster.g/
3. https://lore.kernel.org/git/78bab348-ba3a-7a27-e32e-6b75f91178db@gmail.com/
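Points A and C can be sketched together in C. Everything here is hypothetical: struct buf stands in for struct strbuf, read_object() for read_loose_object(), and check_object() for fsck_loose().

- The buffer lives in the callback data and is only reset per object, so its allocation is reused across the walk.
- type starts at TYPE_NONE, so it is only trusted once the reader actually wrote it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for struct strbuf (hypothetical, not git's API). */
struct buf {
	char *s;
	size_t len, alloc;
};

/* Empty the buffer but keep its allocation for the next object. */
static void buf_reset(struct buf *b)
{
	b->len = 0;
	if (b->s)
		b->s[0] = '\0';
}

/* Replace the contents, growing only when the string doesn't fit. */
static int buf_set(struct buf *b, const char *str)
{
	size_t n = strlen(str);

	if (n + 1 > b->alloc) {
		char *p = realloc(b->s, n + 1);

		if (!p)
			return -1;
		b->s = p;
		b->alloc = n + 1;
	}
	memcpy(b->s, str, n + 1);
	b->len = n;
	return 0;
}

static void buf_release(struct buf *b)
{
	free(b->s);
	b->s = NULL;
	b->len = b->alloc = 0;
}

/* Modeled on OBJ_BAD/OBJ_NONE/OBJ_BLOB; only the values needed here. */
enum type { TYPE_BAD = -1, TYPE_NONE = 0, TYPE_BLOB = 3 };

/* Callback data shared by every object in one walk. */
struct walk_cb {
	struct buf type_name;
	int corrupt, unknown_type;
};

/* Hypothetical reader: may fail before ever writing *typep. */
static int read_object(const char *hdr_type, int header_ok,
		       struct buf *name, enum type *typep)
{
	if (!header_ok || buf_set(name, hdr_type) < 0)
		return -1;
	*typep = strcmp(hdr_type, "blob") ? TYPE_BAD : TYPE_BLOB;
	return *typep < 0 ? -1 : 0;
}

static void check_object(const char *hdr_type, int header_ok,
			 struct walk_cb *cb)
{
	enum type type = TYPE_NONE; /* without this it could be garbage */
	int ret;

	buf_reset(&cb->type_name);
	ret = read_object(hdr_type, header_ok, &cb->type_name, &type);
	if (type != TYPE_NONE && type < 0)
		cb->unknown_type++; /* really parsed; name in cb->type_name */
	else if (ret < 0)
		cb->corrupt++;
}
```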
Ævar Arnfjörð Bjarmason (17):
fsck tests: add test for fsck-ing an unknown type
fsck tests: refactor one test to use a sub-repo
fsck tests: test current hash/type mismatch behavior
fsck tests: test for garbage appended to a loose object
cat-file tests: move bogus_* variable declarations earlier
cat-file tests: test for missing/bogus object with -t, -s and -p
cat-file tests: add corrupt loose object test
cat-file tests: test for current --allow-unknown-type behavior
object-file.c: don't set "typep" when returning non-zero
object-file.c: return -1, not "status" from unpack_loose_header()
object-file.c: make parse_loose_header_extended() public
object-file.c: simplify unpack_loose_short_header()
object-file.c: use "enum" return type for unpack_loose_header()
object-file.c: return ULHR_TOO_LONG on "header too long"
object-file.c: stop dying in parse_loose_header()
fsck: don't hard die on invalid object types
fsck: report invalid object type-path combinations
builtin/fast-export.c | 2 +-
builtin/fsck.c | 44 +++++++--
builtin/index-pack.c | 2 +-
builtin/mktag.c | 3 +-
cache.h | 45 ++++++++-
object-file.c | 176 +++++++++++++++------------------
object-store.h | 7 +-
object.c | 4 +-
pack-check.c | 3 +-
streaming.c | 27 +++--
t/oid-info/oid | 2 +
t/t1006-cat-file.sh | 223 +++++++++++++++++++++++++++++++++++++++---
t/t1450-fsck.sh | 97 ++++++++++++++----
13 files changed, 476 insertions(+), 159 deletions(-)
Range-diff against v9:
1: 520732612f7 ! 1: 00936435423 fsck tests: add test for fsck-ing an unknown type
@@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
+ (
+ cd garbage-type &&
+
-+ empty=$(git hash-object --stdin -w -t blob </dev/null) &&
-+ garbage=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
++ git hash-object --stdin -w -t garbage --literally </dev/null &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
2: af7086623fe = 2: 32a2f9cc0c9 fsck tests: refactor one test to use a sub-repo
3: 102bc4f0176 = 3: 00d661a6032 fsck tests: test current hash/type mismatch behavior
4: ff7fc09d5a1 = 4: a527e3b262c fsck tests: test for garbage appended to a loose object
5: 278df093239 = 5: 7a63d30aef3 cat-file tests: move bogus_* variable declarations earlier
6: 290bf983590 = 6: a563c7efe1c cat-file tests: test for missing/bogus object with -t, -s and -p
7: a41b2c571e5 = 7: c5affb65b7e cat-file tests: add corrupt loose object test
8: cedeb117330 = 8: 76f9888a6f7 cat-file tests: test for current --allow-unknown-type behavior
9: 6f0673d38c8 = 9: 85a91f43634 object-file.c: don't set "typep" when returning non-zero
10: 6637e8fd2ca = 10: 51eaa2e8479 object-file.c: return -1, not "status" from unpack_loose_header()
11: 51db08ebbae = 11: 5cd2ba830e9 object-file.c: make parse_loose_header_extended() public
12: dffe5581f6f = 12: 6899c6ec17a object-file.c: simplify unpack_loose_short_header()
13: eb7c949c8b7 = 13: a3bdd53d296 object-file.c: use "enum" return type for unpack_loose_header()
14: f4cc7271df7 = 14: 5a7c2855b50 object-file.c: return ULHR_TOO_LONG on "header too long"
15: 25d6ec668d4 = 15: 3ec9fee7ee9 object-file.c: stop dying in parse_loose_header()
16: 6ce0414b2b7 ! 16: 9b75ac7c8ed fsck: don't hard die on invalid object types
@@ Commit message
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().
- Since we're now passing in a "oi.type_name" we'll have to clean up the
- allocated "strbuf sb". That we're doing it right is asserted by
- e.g. the "fsck notices broken commit" test added in 03818a4a94c
- (split_ident: parse timestamp from end of line, 2013-10-14). To do
- that switch to a "goto cleanup" pattern, and while we're at it factor
- out the already duplicated free(content) to use that pattern.
+ Since we'll need a "struct strbuf" to hold the "type_name" let's pass
+ it to the for_each_loose_file_in_objdir() callback to avoid allocating
+ a new one for each loose object in the iteration. It also makes the
+ memory management simpler than sticking it in fsck_loose() itself, as
+ we'll only need to strbuf_reset() it, with no need to do a
+ strbuf_release() before each "return".
+
+ Before this commit we'd never check the "type" if read_loose_object()
+ failed, but now we do. We therefore need to initialize it to OBJ_NONE
+ to be able to tell the difference between e.g. its
+ unpack_loose_header() having failed, and us getting past that and into
+ parse_loose_header().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
## builtin/fsck.c ##
-@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
+@@ builtin/fsck.c: static void get_default_heads(void)
+ }
+ }
+
++struct for_each_loose_cb
++{
++ struct progress *progress;
++ struct strbuf obj_type;
++};
++
+ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
+ {
++ struct for_each_loose_cb *cb_data = data;
+ struct object *obj;
+- enum object_type type;
++ enum object_type type = OBJ_NONE;
unsigned long size;
void *contents;
int eaten;
-+ struct strbuf sb = STRBUF_INIT;
+ struct object_info oi = OBJECT_INFO_INIT;
+ int err = 0;
- if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
-+ oi.type_name = &sb;
++ strbuf_reset(&cb_data->obj_type);
++ oi.type_name = &cb_data->obj_type;
+ oi.sizep = &size;
+ oi.typep = &type;
+
+ if (read_loose_object(path, oid, &contents, &oi) < 0)
+ err = error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
-+ if (type < 0)
++ if (type != OBJ_NONE && type < 0)
+ err = error(_("%s: object is of unknown type '%s': %s"),
-+ oid_to_hex(oid), sb.buf, path);
-+ if (err) {
++ oid_to_hex(oid), cb_data->obj_type.buf, path);
++ if (err < 0) {
errors_found |= ERROR_OBJECT;
- error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
-- return 0; /* keep checking other objects */
-+ goto cleanup;
- }
-
- if (!contents && type != OBJ_BLOB)
-@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
- errors_found |= ERROR_OBJECT;
- error(_("%s: object could not be parsed: %s"),
- oid_to_hex(oid), path);
-- if (!eaten)
-- free(contents);
-- return 0; /* keep checking other objects */
-+ goto cleanup_eaten;
+ return 0; /* keep checking other objects */
}
- obj->flags &= ~(REACHABLE | SEEN);
-@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
- if (fsck_obj(obj, contents, size))
- errors_found |= ERROR_OBJECT;
+@@ builtin/fsck.c: static int fsck_cruft(const char *basename, const char *path, void *data)
+ return 0;
+ }
-+cleanup_eaten:
- if (!eaten)
- free(contents);
-+cleanup:
-+ strbuf_release(&sb);
- return 0; /* keep checking other objects, even if we saw an error */
+-static int fsck_subdir(unsigned int nr, const char *path, void *progress)
++static int fsck_subdir(unsigned int nr, const char *path, void *data)
+ {
++ struct for_each_loose_cb *cb_data = data;
++ struct progress *progress = cb_data->progress;
+ display_progress(progress, nr + 1);
+ return 0;
+ }
+@@ builtin/fsck.c: static int fsck_subdir(unsigned int nr, const char *path, void *progress)
+ static void fsck_object_dir(const char *path)
+ {
+ struct progress *progress = NULL;
++ struct for_each_loose_cb cb_data = {
++ .obj_type = STRBUF_INIT,
++ .progress = progress,
++ };
+
+ if (verbose)
+ fprintf_ln(stderr, _("Checking object directory"));
+@@ builtin/fsck.c: static void fsck_object_dir(const char *path)
+ progress = start_progress(_("Checking object directories"), 256);
+
+ for_each_loose_file_in_objdir(path, fsck_loose, fsck_cruft, fsck_subdir,
+- progress);
++ &cb_data);
+ display_progress(progress, 256);
+ stop_progress(&progress);
++ strbuf_release(&cb_data.obj_type);
}
+ static int fsck_head_link(const char *head_ref_name,
## object-file.c ##
@@ object-file.c: static int check_stream_oid(git_zstream *stream,
@@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
git init --bare garbage-type &&
(
cd garbage-type &&
-@@ t/t1450-fsck.sh: test_expect_success 'fsck hard errors on an invalid object type' '
+
+- git hash-object --stdin -w -t garbage --literally </dev/null &&
++ garbage_blob=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
+
+ cat >err.expect <<-\EOF &&
fatal: invalid object type
EOF
test_must_fail git fsck >out 2>err &&
@@ t/t1450-fsck.sh: test_expect_success 'fsck hard errors on an invalid object type
- test_must_be_empty out
+ grep -e "^error" -e "^fatal" err >errors &&
+ test_line_count = 1 errors &&
-+ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
-+ grep "dangling blob $empty_blob" out
++ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err
)
'
17: 8d926e41fc3 ! 17: 838df0a979b fsck: report invalid object type-path combinations
@@ builtin/fast-export.c: static void export_blob(const struct object_id *oid)
## builtin/fsck.c ##
@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
- struct object *obj;
- enum object_type type;
- unsigned long size;
-- void *contents;
-+ unsigned char *contents = NULL;
+ void *contents;
int eaten;
- struct strbuf sb = STRBUF_INIT;
struct object_info oi = OBJECT_INFO_INIT;
-- int err = 0;
+ struct object_id real_oid = *null_oid();
-+ int ret;
+ int err = 0;
- oi.type_name = &sb;
+ strbuf_reset(&cb_data->obj_type);
+@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
oi.sizep = &size;
oi.typep = &type;
- if (read_loose_object(path, oid, &contents, &oi) < 0)
- err = error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
-+ ret = read_loose_object(path, oid, &real_oid, (void **)&contents, &oi);
-+ if (ret < 0) {
++ if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
+ if (contents && !oideq(&real_oid, oid))
-+ error(_("%s: hash-path mismatch, found at: %s"),
-+ oid_to_hex(&real_oid), path);
++ err = error(_("%s: hash-path mismatch, found at: %s"),
++ oid_to_hex(&real_oid), path);
+ else
-+ error(_("%s: object corrupt or missing: %s"),
-+ oid_to_hex(oid), path);
++ err = error(_("%s: object corrupt or missing: %s"),
++ oid_to_hex(oid), path);
+ }
- if (type < 0)
-- err = error(_("%s: object is of unknown type '%s': %s"),
-- oid_to_hex(oid), sb.buf, path);
-- if (err) {
-+ ret = error(_("%s: object is of unknown type '%s': %s"),
-+ oid_to_hex(&real_oid), sb.buf, path);
-+ if (ret < 0) {
+ if (type != OBJ_NONE && type < 0)
+ err = error(_("%s: object is of unknown type '%s': %s"),
+- oid_to_hex(oid), cb_data->obj_type.buf, path);
++ oid_to_hex(&real_oid), cb_data->obj_type.buf,
++ path);
+ if (err < 0) {
errors_found |= ERROR_OBJECT;
- goto cleanup;
- }
+ return 0; /* keep checking other objects */
## builtin/index-pack.c ##
@@ builtin/index-pack.c: static void fix_unresolved_deltas(struct hashfile *f)
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 01/17] fsck tests: add test for fsck-ing an unknown type
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
` (15 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.
This behavior needs to be improved, which will be done in subsequent
patches, but for now let's test for the current behavior.
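For context, a loose object's inflated contents begin with a "<type> <size>" header terminated by a NUL byte. --literally lets hash-object write a header whose type name is not one of commit, tree, blob or tag.

A minimal, hypothetical parser shows the distinction the later patches need: a well-formed header with an unknown type name, which can still be reported by name, versus a malformed header. parse_header() and the T_* values are illustrative only, modeled on enum object_type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Modeled on enum object_type: OBJ_BAD = -1, OBJ_NONE = 0, ... */
enum obj_type {
	T_BAD = -1, T_NONE = 0, T_COMMIT = 1, T_TREE = 2, T_BLOB = 3, T_TAG = 4
};

/*
 * Parse a "<type> <size>" header (the NUL ends the C string). The type
 * name is copied into name even when unknown, so it can be reported.
 * Returns the type, T_BAD for an unknown name, T_NONE if malformed.
 */
static enum obj_type parse_header(const char *hdr, char *name,
				  size_t name_sz, unsigned long *size)
{
	static const char *const names[] = { "commit", "tree", "blob", "tag" };
	const char *sp = strchr(hdr, ' ');
	size_t n;
	char *end;

	if (!sp || sp == hdr)
		return T_NONE;
	n = (size_t)(sp - hdr);
	if (n >= name_sz)
		return T_NONE;
	memcpy(name, hdr, n);
	name[n] = '\0';

	if (sp[1] < '0' || sp[1] > '9')
		return T_NONE; /* size must start with a digit */
	*size = strtoul(sp + 1, &end, 10);
	if (*end != '\0')
		return T_NONE; /* trailing junk after the size */

	for (int i = 0; i < 4; i++)
		if (!strcmp(name, names[i]))
			return (enum obj_type)(i + 1);
	return T_BAD;
}
```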
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..beb233e91b1 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -865,4 +865,20 @@ test_expect_success 'detect corrupt index file in fsck' '
test_i18ngrep "bad index file" errors
'
+test_expect_success 'fsck hard errors on an invalid object type' '
+ git init --bare garbage-type &&
+ (
+ cd garbage-type &&
+
+ git hash-object --stdin -w -t garbage --literally </dev/null &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+ test_must_fail git fsck >out 2>err &&
+ test_cmp err.expect err &&
+ test_must_be_empty out
+ )
+'
+
test_done
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 02/17] fsck tests: refactor one test to use a sub-repo
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
` (14 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.
We can instead use the pattern of creating a named sub-repository.
Then we don't have to worry about cleaning up after ourselves: nobody
will care what state the broken "hash-mismatch" repository is in after
this test runs.
See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.
1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index beb233e91b1..b73bc2a2ec3 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,25 @@ remove_object () {
rm "$(sha1_file "$1")"
}
-test_expect_success 'object with bad sha1' '
- sha=$(echo blob | git hash-object -w --stdin) &&
- old=$(test_oid_to_path "$sha") &&
- new=$(dirname $old)/$(test_oid ff_2) &&
- sha="$(dirname $new)$(basename $new)" &&
- mv .git/objects/$old .git/objects/$new &&
- test_when_finished "remove_object $sha" &&
- git update-index --add --cacheinfo 100644 $sha foo &&
- test_when_finished "git read-tree -u --reset HEAD" &&
- tree=$(git write-tree) &&
- test_when_finished "remove_object $tree" &&
- cmt=$(echo bogus | git commit-tree $tree) &&
- test_when_finished "remove_object $cmt" &&
- git update-ref refs/heads/bogus $cmt &&
- test_when_finished "git update-ref -d refs/heads/bogus" &&
+test_expect_success 'object with hash mismatch' '
+ git init --bare hash-mismatch &&
+ (
+ cd hash-mismatch &&
- test_must_fail git fsck 2>out &&
- test_i18ngrep "$sha.*corrupt" out
+ oid=$(echo blob | git hash-object -w --stdin) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
+
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
+
+ test_must_fail git fsck 2>out &&
+ grep "$oid.*corrupt" out
+ )
'
test_expect_success 'branch pointing to non-commit' '
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 03/17] fsck tests: test current hash/type mismatch behavior
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 01/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 02/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
` (13 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
If we rename a loose object under .git/objects/ to simulate a hash
mismatch, and that object also has an invalid type, "git fsck" will
currently hard die() in object-file.c. This behavior will be fixed in
subsequent commits, but let's test for it as-is for now.
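For readers unfamiliar with the loose object layout these tests rely on: each loose object lives at objects/<first two hex digits>/<remaining hex digits>. Renaming the file therefore makes git look the object up under an ID that its contents no longer hash to. The helper below is a hypothetical sketch of that path mapping, roughly what the test suite's test_oid_to_path produces with an objects/ prefix added. It is not git code, and it assumes the ID is passed as a hex string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): build the loose-object path
 * for a hex object ID. The first two hex digits name the fan-out
 * directory under objects/, the remaining digits name the file.
 * Returns 0 on success, -1 if the ID is too short or "out" is too small.
 */
static int loose_object_path(const char *hex, char *out, size_t outsz)
{
	size_t len;
	int n;

	assert(hex != NULL && out != NULL);
	len = strlen(hex);
	if (len < 3)
		return -1;
	/* "%.2s" prints at most the first two characters of the ID */
	n = snprintf(out, outsz, "objects/%.2s/%s", hex, hex + 2);
	if (n < 0 || (size_t)n >= outsz)
		return -1; /* output was truncated */
	return 0;
}
```

For example, the SHA-1 "deadbeef" test OID maps to objects/de/adbeefdeadbeef...; the test moves the file to a different name in that directory, so the content no longer matches the name it is found under.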
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index b73bc2a2ec3..f9cabcecd14 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -69,6 +69,30 @@ test_expect_success 'object with hash mismatch' '
)
'
+test_expect_success 'object with hash and type mismatch' '
+ git init --bare hash-type-mismatch &&
+ (
+ cd hash-type-mismatch &&
+
+ oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ old=$(test_oid_to_path "$oid") &&
+ new=$(dirname $old)/$(test_oid ff_2) &&
+ oid="$(dirname $new)$(basename $new)" &&
+
+ mv objects/$old objects/$new &&
+ git update-index --add --cacheinfo 100644 $oid foo &&
+ tree=$(git write-tree) &&
+ cmt=$(echo bogus | git commit-tree $tree) &&
+ git update-ref refs/heads/bogus $cmt &&
+
+ cat >expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+ test_must_fail git fsck 2>actual &&
+ test_cmp expect actual
+ )
+'
+
test_expect_success 'branch pointing to non-commit' '
git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
test_when_finished "git update-ref -d refs/heads/invalid" &&
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 04/17] fsck tests: test for garbage appended to a loose object
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (2 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 03/17] fsck tests: test current hash/type mismatch behavior Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
` (12 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
There weren't any output tests for this scenario. Let's ensure that we
don't regress on it in the changes that come after this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1450-fsck.sh | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f9cabcecd14..281ff8bdd8e 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -93,6 +93,26 @@ test_expect_success 'object with hash and type mismatch' '
)
'
+test_expect_success POSIXPERM 'zlib corrupt loose object output ' '
+ git init --bare corrupt-loose-output &&
+ (
+ cd corrupt-loose-output &&
+ oid=$(git hash-object -w --stdin --literally </dev/null) &&
+ oidf=objects/$(test_oid_to_path "$oid") &&
+ chmod 755 $oidf &&
+ echo extra garbage >>$oidf &&
+
+ cat >expect.error <<-EOF &&
+ error: garbage at end of loose object '\''$oid'\''
+ error: unable to unpack contents of ./$oidf
+ error: $oid: object corrupt or missing: ./$oidf
+ EOF
+ test_must_fail git fsck 2>actual &&
+ grep ^error: actual >error &&
+ test_cmp expect.error error
+ )
+'
+
test_expect_success 'branch pointing to non-commit' '
git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
test_when_finished "git update-ref -d refs/heads/invalid" &&
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 05/17] cat-file tests: move bogus_* variable declarations earlier
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (3 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 04/17] fsck tests: test for garbage appended to a loose object Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
` (11 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Change the short/long bogus object type variables into a form where
the two sets can be used concurrently. This'll be used by subsequently
added tests.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 35 +++++++++++++++++++----------------
1 file changed, 19 insertions(+), 16 deletions(-)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..ea6a53d425b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,36 +315,39 @@ test_expect_success '%(deltabase) reports packed delta bases' '
}
'
-bogus_type="bogus"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
+test_expect_success 'setup bogus data' '
+ bogus_short_type="bogus" &&
+ bogus_short_content="bogus" &&
+ bogus_short_size=$(strlen "$bogus_short_content") &&
+ bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
+
+ bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
+ bogus_long_content="bogus" &&
+ bogus_long_size=$(strlen "$bogus_long_content") &&
+ bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+'
test_expect_success "Type of broken object is correct" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_short_type >expect &&
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
test_cmp expect actual
'
test_expect_success "Size of broken object is correct" '
- echo $bogus_size >expect &&
- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_short_size >expect &&
+ git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
test_cmp expect actual
'
-bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
-bogus_content="bogus"
-bogus_size=$(strlen "$bogus_content")
-bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
test_expect_success "Type of broken object is correct when type is large" '
- echo $bogus_type >expect &&
- git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_long_type >expect &&
+ git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
test_cmp expect actual
'
test_expect_success "Size of large broken object is correct when type is large" '
- echo $bogus_size >expect &&
- git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
+ echo $bogus_long_size >expect &&
+ git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
test_cmp expect actual
'
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (4 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 05/17] cat-file tests: move bogus_* variable declarations earlier Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
` (10 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
When we look up a missing object with cat_one_file(), the error we
print out currently depends on whether we error out early in
get_oid_with_context(), or get an error later from
oid_object_info_extended().
The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to oid_object_info_extended() or
not.
The "-p" flag is yet another special case. On the deadbeef OID it
prints the same output as we'd emit on the deadbeef_short OID for the
"-s" and "-t" options. It also doesn't support the
"--allow-unknown-type" flag at all.
Let's test the combination of the two sets of [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), as well as a [missing,bogus] object pair.
This extends tests added in 3e370f9faf0 (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/oid-info/oid | 2 ++
t/t1006-cat-file.sh | 75 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 77 insertions(+)
diff --git a/t/oid-info/oid b/t/oid-info/oid
index a754970523c..7547d2c7903 100644
--- a/t/oid-info/oid
+++ b/t/oid-info/oid
@@ -27,3 +27,5 @@ numeric sha1:0123456789012345678901234567890123456789
numeric sha256:0123456789012345678901234567890123456789012345678901234567890123
deadbeef sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
deadbeef sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef_short sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbee
+deadbeef_short sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbee
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index ea6a53d425b..abf57339a29 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -327,6 +327,81 @@ test_expect_success 'setup bogus data' '
bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
'
+for arg1 in '' --allow-unknown-type
+do
+ for arg2 in -s -t -p
+ do
+ if test "$arg1" = "--allow-unknown-type" && test "$arg2" = "-p"
+ then
+ continue
+ fi
+
+
+ test_expect_success "cat-file $arg1 $arg2 error on bogus short OID" '
+ cat >expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+
+ if test "$arg1" = "--allow-unknown-type"
+ then
+ git cat-file $arg1 $arg2 $bogus_short_sha1
+ else
+ test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
+ test_must_be_empty out &&
+ test_cmp expect actual
+ fi
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on bogus full OID" '
+ if test "$arg2" = "-p"
+ then
+ cat >expect <<-EOF
+ error: unable to unpack $bogus_long_sha1 header
+ fatal: Not a valid object name $bogus_long_sha1
+ EOF
+ else
+ cat >expect <<-EOF
+ error: unable to unpack $bogus_long_sha1 header
+ fatal: git cat-file: could not get object info
+ EOF
+ fi &&
+
+ if test "$arg1" = "--allow-unknown-type"
+ then
+ git cat-file $arg1 $arg2 $bogus_short_sha1
+ else
+ test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
+ test_must_be_empty out &&
+ test_cmp expect actual
+ fi
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on missing short OID" '
+ cat >expect.err <<-EOF &&
+ fatal: Not a valid object name $(test_oid deadbeef_short)
+ EOF
+ test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef_short) >out 2>err.actual &&
+ test_must_be_empty out
+ '
+
+ test_expect_success "cat-file $arg1 $arg2 error on missing full OID" '
+ if test "$arg2" = "-p"
+ then
+ cat >expect.err <<-EOF
+ fatal: Not a valid object name $(test_oid deadbeef)
+ EOF
+ else
+ cat >expect.err <<-\EOF
+ fatal: git cat-file: could not get object info
+ EOF
+ fi &&
+ test_must_fail git cat-file $arg1 $arg2 $(test_oid deadbeef) >out 2>err.actual &&
+ test_must_be_empty out &&
+ test_cmp expect.err err.actual
+ '
+ done
+done
+
test_expect_success "Type of broken object is correct" '
echo $bogus_short_type >expect &&
git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 07/17] cat-file tests: add corrupt loose object test
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (5 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 06/17] cat-file tests: test for missing/bogus object with -t, -s and -p Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
` (9 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Fix a blind spot in the tests for "cat-file" (and, by proxy, the guts
of object-file.c) by testing that when we can't decode a loose object
with zlib, we emit an error from zlib.c.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index abf57339a29..15774979ad3 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -426,6 +426,58 @@ test_expect_success "Size of large broken object is correct when type is large"
test_cmp expect actual
'
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+ git init --bare corrupt-loose.git &&
+ (
+ cd corrupt-loose.git &&
+
+ # Setup and create the empty blob and its path
+ empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+ git hash-object -w --stdin </dev/null &&
+
+ # Create another blob and its path
+ echo other >other.blob &&
+ other_blob=$(git hash-object -w --stdin <other.blob) &&
+ other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+ # Before the swap the size is 0
+ cat >out.expect <<-EOF &&
+ 0
+ EOF
+ git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # Swap the two to corrupt the repository
+ mv -f "$other_path" "$empty_path" &&
+ test_must_fail git fsck 2>err.fsck &&
+ grep "hash mismatch" err.fsck &&
+
+ # confirm that cat-file is reading the new swapped-in
+ # blob...
+ cat >out.expect <<-EOF &&
+ blob
+ EOF
+ git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # ... since it has a different size now.
+ cat >out.expect <<-EOF &&
+ 6
+ EOF
+ git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+ test_must_be_empty err.actual &&
+ test_cmp out.expect out.actual &&
+
+ # So far "cat-file" has been happy to spew the found
+ # content out as-is. Try to make it zlib-invalid.
+ mv -f other.blob "$empty_path" &&
+ test_must_fail git fsck 2>err.fsck &&
+ grep "^error: inflate: data stream error (" err.fsck
+ )
+'
+
# Tests for git cat-file --follow-symlinks
test_expect_success 'prep for symlink tests' '
echo_without_newline "$hello_content" >morx &&
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 08/17] cat-file tests: test for current --allow-unknown-type behavior
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (6 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 07/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
` (8 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Add more tests for the current --allow-unknown-type behavior. As noted
in [1], I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.
1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 15774979ad3..5b16c69c286 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -402,6 +402,67 @@ do
done
done
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+ git cat-file -e $bogus_short_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+ test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+ test_must_fail git cat-file -p $bogus_short_sha1 &&
+ test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type "bogus"
+ EOF
+ test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+ echo $bogus_short_sha1 >bogus-oid &&
+
+ cat >err.expect <<-\EOF &&
+ fatal: invalid object type
+ EOF
+
+ test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+ test_cmp err.expect err.actual &&
+
+ test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+ test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+ test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+ test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+ cat >expect <<-EOF &&
+ $bogus_short_type
+ EOF
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual &&
+
+ # Create it manually, as "git replace" will die on bogus
+ # types.
+ head=$(git rev-parse --verify HEAD) &&
+ test_when_finished "rm -rf .git/refs/replace" &&
+ mkdir -p .git/refs/replace &&
+ echo $head >.git/refs/replace/$bogus_short_sha1 &&
+
+ cat >expect <<-EOF &&
+ commit
+ EOF
+ git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+ test_cmp expect actual
+'
+
test_expect_success "Type of broken object is correct" '
echo $bogus_short_type >expect &&
git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 09/17] object-file.c: don't set "typep" when returning non-zero
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (7 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 08/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
` (7 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
When the loose_object_info() function returns an error, stop faking up
"oi->typep" as OBJ_BAD and let the return value of the function itself
suffice. This code cleanup simplifies subsequent changes.
That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.
Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.
Having read the code paths involved carefully, I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.
This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.
Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.
Having read the code for all the callers of these functions, I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
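To make the resulting contract concrete, here is a minimal sketch with hypothetical names. lookup_info() and struct info_request are illustrations, not git's API. On failure the function returns -1 and leaves every out-parameter untouched, so a caller has to check the return value before trusting *typep or *sizep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "struct object_info": optional out-params. */
struct info_request {
	int *typep;           /* may be NULL */
	unsigned long *sizep; /* may be NULL */
};

/*
 * Hypothetical lookup: "found", "type" and "size" simulate what a real
 * object lookup would discover. On failure it returns -1 without
 * writing any fake type (e.g. OBJ_BAD) into *req->typep.
 */
static int lookup_info(int found, int type, unsigned long size,
		       struct info_request *req)
{
	assert(req != NULL);
	if (!found)
		return -1; /* out-parameters are left exactly as they were */
	if (req->typep)
		*req->typep = type;
	if (req->sizep)
		*req->sizep = size;
	return 0;
}
```

With this design, the return value is the single source of truth for success. This matches how do_oid_object_info_extended() propagates the -1.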
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/object-file.c b/object-file.c
index be4f94ecf3b..766ba88b851 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1525,8 +1525,6 @@ static int loose_object_info(struct repository *r,
git_inflate_end(&stream);
munmap(map, mapsize);
- if (status && oi->typep)
- *oi->typep = status;
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 10/17] object-file.c: return -1, not "status" from unpack_loose_header()
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (8 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 09/17] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
` (6 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.
See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".
At the time d21f8426907 was written, there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later renamed to our current
"unpack_loose_header()").
However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.
So let's do the minor cleanup of also changing this function to return
a -1.
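Here is a minimal sketch of what the error handling at the end of unpack_loose_header() boils down to after this change. It uses stand-in constants instead of <zlib.h> (the real Z_OK is 0 and Z_DATA_ERROR is -3). check_header_status() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins so this sketch compiles without <zlib.h>. */
enum { SKETCH_Z_OK = 0, SKETCH_Z_DATA_ERROR = -3 };

/*
 * Hypothetical helper: "status" is what git_inflate() returned and
 * "inflated" is how many header bytes ended up in "buffer".
 */
static int check_header_status(int status, const char *buffer,
			       size_t inflated)
{
	assert(buffer != NULL);
	if (status < SKETCH_Z_OK)
		return -1; /* the specific zlib error code is not propagated */
	/* Make sure we have the terminating NUL */
	if (!memchr(buffer, '\0', inflated))
		return -1;
	return 0;
}
```

Callers only ever see 0 or -1, which is all they check for today.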
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/object-file.c b/object-file.c
index 766ba88b851..8475b128944 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1284,7 +1284,7 @@ int unpack_loose_header(git_zstream *stream,
buffer, bufsiz);
if (status < Z_OK)
- return status;
+ return -1;
/* Make sure we have the terminating NUL */
if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 11/17] object-file.c: make parse_loose_header_extended() public
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (9 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 10/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
` (5 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Make the parse_loose_header_extended() function public under the name
parse_loose_header(), removing the old parse_loose_header() wrapper.
The only direct user of it outside of object-file.c itself was in
streaming.c; that caller can simply pass the required
"struct object_info *" instead.
This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change (we
could simply use parse_loose_header_extended() there), but it will
leave the API in a better end state.
It would be a better end-state to have already moved the declaration
of these functions to object-store.h to avoid the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.
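For reference, a loose object header has the form "<type> <size>\0". The sketch below shows the calling convention this patch moves to: the caller hands in an object_info-like struct, and its optional sizep field says whether the caller wants the size. It is a simplified illustration under assumed names (sketch_parse_header(), struct sketch_info). Unlike git's parser it takes no flags, so it always rejects unknown types, and it does not reproduce every check git makes.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* These values match git's enum object_type for the four real types. */
enum sketch_type { T_COMMIT = 1, T_TREE = 2, T_BLOB = 3, T_TAG = 4 };

/* Hypothetical stand-in for "struct object_info" with one optional field. */
struct sketch_info {
	unsigned long *sizep; /* may be NULL */
};

/* Returns the type on success, -1 on a malformed header. */
static int sketch_parse_header(const char *hdr, struct sketch_info *oi)
{
	static const struct { const char *name; int type; } types[] = {
		{ "commit", T_COMMIT }, { "tree", T_TREE },
		{ "blob", T_BLOB }, { "tag", T_TAG },
	};
	const char *sp;
	unsigned long size = 0;
	int type = -1;
	size_t i;

	assert(hdr != NULL);
	sp = strchr(hdr, ' ');
	if (!sp)
		return -1;
	for (i = 0; i < sizeof(types) / sizeof(types[0]); i++)
		if (strlen(types[i].name) == (size_t)(sp - hdr) &&
		    !strncmp(types[i].name, hdr, (size_t)(sp - hdr)))
			type = types[i].type;
	if (type < 0)
		return -1; /* unknown type */

	hdr = sp + 1;
	if (*hdr < '0' || *hdr > '9')
		return -1; /* size must have at least one digit */
	while (*hdr >= '0' && *hdr <= '9') {
		unsigned long digit = (unsigned long)(*hdr++ - '0');
		if (size > (ULONG_MAX - digit) / 10)
			return -1; /* size would overflow */
		size = size * 10 + digit;
	}
	if (*hdr)
		return -1; /* trailing garbage before the NUL */
	if (oi && oi->sizep)
		*oi->sizep = size; /* only written on success */
	return type;
}
```

A caller that used to call parse_loose_header(hdr, &size) now sets oi.sizep = &size and passes &oi. This is what the streaming.c hunk does with st->size.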
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 4 +++-
object-file.c | 20 +++++++-------------
streaming.c | 5 ++++-
3 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/cache.h b/cache.h
index f6295f3b048..35f254dae4a 100644
--- a/cache.h
+++ b/cache.h
@@ -1320,7 +1320,9 @@ char *xdg_cache_home(const char *filename);
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+struct object_info;
+int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags);
int check_object_signature(struct repository *r, const struct object_id *oid,
void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 8475b128944..6b91c4edcf6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1385,8 +1385,8 @@ static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi,
+ unsigned int flags)
{
const char *type_buf = hdr;
unsigned long size;
@@ -1446,14 +1446,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
return *hdr ? -1 : type;
}
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
- struct object_info oi = OBJECT_INFO_INIT;
-
- oi.sizep = sizep;
- return parse_loose_header_extended(hdr, &oi, 0);
-}
-
static int loose_object_info(struct repository *r,
const struct object_id *oid,
struct object_info *oi, int flags)
@@ -1508,10 +1500,10 @@ static int loose_object_info(struct repository *r,
if (status < 0)
; /* Do nothing */
else if (hdrbuf.len) {
- if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+ if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
oid_to_hex(oid));
- } else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+ } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
status = error(_("unable to parse %s header"), oid_to_hex(oid));
if (status >= 0 && oi->contentp) {
@@ -2599,6 +2591,8 @@ int read_loose_object(const char *path,
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
+ struct object_info oi = OBJECT_INFO_INIT;
+ oi.sizep = size;
*contents = NULL;
@@ -2613,7 +2607,7 @@ int read_loose_object(const char *path,
goto out;
}
- *type = parse_loose_header(hdr, size);
+ *type = parse_loose_header(hdr, &oi, 0);
if (*type < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
const struct object_id *oid,
enum object_type *type)
{
+ struct object_info oi = OBJECT_INFO_INIT;
+ oi.sizep = &st->size;
+
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapsize,
st->u.loose.hdr,
sizeof(st->u.loose.hdr)) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+ (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
return -1;
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 12/17] object-file.c: simplify unpack_loose_short_header()
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (10 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 11/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
` (4 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.
The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).
Most of its code was duplicated in both unpack_loose_header() and
unpack_loose_short_header(). We now have a single unpack_loose_header()
function which accepts an optional "struct strbuf *" instead.
I think the remaining unpack_loose_header() function could be further
simplified. We're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN; we could alternatively just
say "we found a garbage type <first 32 bytes>..." instead. But let's
leave the current behavior in place for now.
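Here is a sketch of the shape of the combined function, under simplifying assumptions. An already-inflated byte array replaces the zlib stream, and a minimal hypothetical growbuf stands in for git's struct strbuf. A well-formed header that fits in the fixed buffer always succeeds. Longer ones, such as the "abcdefghijklmnopqrstuvwxyz1234679" type used in t1006-cat-file.sh, only succeed when the optional buffer is passed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal hypothetical stand-in for git's "struct strbuf". */
struct growbuf {
	char *buf;
	size_t len;
};

/* Append n bytes and keep the buffer NUL-terminated. */
static int growbuf_add(struct growbuf *sb, const char *data, size_t n)
{
	char *p = realloc(sb->buf, sb->len + n + 1);

	if (!p)
		return -1;
	memcpy(p + sb->len, data, n);
	sb->len += n;
	p[sb->len] = '\0';
	sb->buf = p;
	return 0;
}

/*
 * Hypothetical sketch: copy up to "bufsiz" bytes of the inflated header
 * into "buffer". If the terminating NUL isn't among them, fail unless
 * the optional "hdrbuf" was given, in which case the full header
 * (without its NUL) is collected there for later parsing.
 */
static int sketch_unpack_header(const char *src, size_t srclen,
				char *buffer, size_t bufsiz,
				struct growbuf *hdrbuf)
{
	size_t n = srclen < bufsiz ? srclen : bufsiz;
	const char *nul;

	assert(buffer != NULL);
	memcpy(buffer, src, n);
	if (memchr(buffer, '\0', n))
		return 0; /* the whole header fit in the fixed buffer */
	if (!hdrbuf)
		return -1; /* caller did not ask for over-long headers */

	nul = memchr(src + n, '\0', srclen - n);
	if (!nul)
		return -1; /* no terminating NUL anywhere */
	return growbuf_add(hdrbuf, src, (size_t)(nul - src));
}
```

Making the strbuf an optional argument lets one code path serve both the normal case and the --allow-unknown-type reporting case. Before, three near-copies of the inflate setup did this work.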
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 17 ++++++++++++++-
object-file.c | 58 ++++++++++++++++++---------------------------------
streaming.c | 3 ++-
3 files changed, 38 insertions(+), 40 deletions(-)
diff --git a/cache.h b/cache.h
index 35f254dae4a..d7189aed8fc 100644
--- a/cache.h
+++ b/cache.h
@@ -1319,7 +1319,22 @@ char *xdg_cache_home(const char *filename);
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
+
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+ unsigned long mapsize, void *buffer,
+ unsigned long bufsiz, struct strbuf *hdrbuf);
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi,
unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 6b91c4edcf6..1327872cbf4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,11 +1255,12 @@ void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
}
-static int unpack_loose_short_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+ unsigned char *map, unsigned long mapsize,
+ void *buffer, unsigned long bufsiz,
+ struct strbuf *header)
{
- int ret;
+ int status;
/* Get the data stream */
memset(stream, 0, sizeof(*stream));
@@ -1270,35 +1271,8 @@ static int unpack_loose_short_header(git_zstream *stream,
git_inflate_init(stream);
obj_read_unlock();
- ret = git_inflate(stream, 0);
+ status = git_inflate(stream, 0);
obj_read_lock();
-
- return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz)
-{
- int status = unpack_loose_short_header(stream, map, mapsize,
- buffer, bufsiz);
-
- if (status < Z_OK)
- return -1;
-
- /* Make sure we have the terminating NUL */
- if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
- return -1;
- return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *header)
-{
- int status;
-
- status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
if (status < Z_OK)
return -1;
@@ -1308,6 +1282,14 @@ static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map
if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
return 0;
+ /*
+ * We have a header longer than MAX_HEADER_LEN. The "header"
+ * here is only non-NULL when we run "cat-file
+ * --allow-unknown-type".
+ */
+ if (!header)
+ return -1;
+
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
* result out to header, and then append the result of further
@@ -1457,6 +1439,7 @@ static int loose_object_info(struct repository *r,
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
oidclr(oi->delta_base_oid);
@@ -1490,11 +1473,9 @@ static int loose_object_info(struct repository *r,
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
- if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
- if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
- status = error(_("unable to unpack %s header with --allow-unknown-type"),
- oid_to_hex(oid));
- } else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+
+ if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ allow_unknown ? &hdrbuf : NULL) < 0)
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
if (status < 0)
@@ -2602,7 +2583,8 @@ int read_loose_object(const char *path,
goto out;
}
- if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+ if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ NULL) < 0) {
error(_("unable to unpack header of %s"), path);
goto out;
}
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapped,
st->u.loose.mapsize,
st->u.loose.hdr,
- sizeof(st->u.loose.hdr)) < 0) ||
+ sizeof(st->u.loose.hdr),
+ NULL) < 0) ||
(parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
git_inflate_end(&st->z);
munmap(st->u.loose.mapped, st->u.loose.mapsize);
--
2.33.0.1375.g5eed55aa1b5
^ permalink raw reply related [flat|nested] 245+ messages in thread
* [PATCH v10 13/17] object-file.c: use "enum" return type for unpack_loose_header()
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (11 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 12/17] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
` (3 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
In a preceding commit we changed and documented unpack_loose_header()
from its previous behavior of returning any negative value or zero, to
only -1 or 0.
Let's add an "enum unpack_loose_header_result" type and use it for
these return values, and have the compiler assert that we're
exhaustively covering all of them.
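The exhaustiveness check relies on leaving out a "default:" label. Here is a minimal sketch of that pattern, using a hypothetical enum rather than the patch's own. With -Wswitch, which -Wall enables in GCC and Clang, adding an enumerator without a matching "case" produces a warning about the unhandled value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for "enum unpack_loose_header_result". */
enum hdr_result {
	HDR_RESULT_OK,
	HDR_RESULT_BAD,
};

/*
 * No "default:" label on purpose: if a new enumerator is added and
 * not handled here, -Wswitch flags this switch statement.
 */
static const char *describe(enum hdr_result r)
{
	switch (r) {
	case HDR_RESULT_OK:
		return "ok";
	case HDR_RESULT_BAD:
		return "unable to unpack header";
	}
	return "unreachable"; /* only for out-of-range values */
}
```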
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 19 +++++++++++++++----
object-file.c | 34 +++++++++++++++++++++-------------
streaming.c | 23 +++++++++++++----------
3 files changed, 49 insertions(+), 27 deletions(-)
diff --git a/cache.h b/cache.h
index d7189aed8fc..7239e20a625 100644
--- a/cache.h
+++ b/cache.h
@@ -1324,7 +1324,10 @@ int git_open_cloexec(const char *name, int flags);
* unpack_loose_header() initializes the data stream needed to unpack
* a loose object header.
*
- * Returns 0 on success. Returns negative values on error.
+ * Returns:
+ *
+ * - ULHR_OK on success
+ * - ULHR_BAD on error
*
* It will only parse up to MAX_HEADER_LEN bytes unless an optional
* "hdrbuf" argument is non-NULL. This is intended for use with
@@ -1332,9 +1335,17 @@ int git_open_cloexec(const char *name, int flags);
* reporting. The full header will be extracted to "hdrbuf" for use
* with parse_loose_header().
*/
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
- unsigned long mapsize, void *buffer,
- unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result {
+ ULHR_OK,
+ ULHR_BAD,
+};
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+ unsigned long mapsize,
+ void *buffer,
+ unsigned long bufsiz,
+ struct strbuf *hdrbuf);
+
struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi,
unsigned int flags);
diff --git a/object-file.c b/object-file.c
index 1327872cbf4..e0f508415dd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1255,10 +1255,12 @@ void *map_loose_object(struct repository *r,
return map_loose_object_1(r, NULL, oid, size);
}
-int unpack_loose_header(git_zstream *stream,
- unsigned char *map, unsigned long mapsize,
- void *buffer, unsigned long bufsiz,
- struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+ unsigned char *map,
+ unsigned long mapsize,
+ void *buffer,
+ unsigned long bufsiz,
+ struct strbuf *header)
{
int status;
@@ -1274,13 +1276,13 @@ int unpack_loose_header(git_zstream *stream,
status = git_inflate(stream, 0);
obj_read_lock();
if (status < Z_OK)
- return -1;
+ return ULHR_BAD;
/*
* Check if entire header is unpacked in the first iteration.
*/
if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
- return 0;
+ return ULHR_OK;
/*
* We have a header longer than MAX_HEADER_LEN. The "header"
@@ -1288,7 +1290,7 @@ int unpack_loose_header(git_zstream *stream,
* --allow-unknown-type".
*/
if (!header)
- return -1;
+ return ULHR_BAD;
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
@@ -1309,7 +1311,7 @@ int unpack_loose_header(git_zstream *stream,
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status != Z_STREAM_END);
- return -1;
+ return ULHR_BAD;
}
static void *unpack_loose_rest(git_zstream *stream,
@@ -1474,13 +1476,19 @@ static int loose_object_info(struct repository *r,
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
- if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
- allow_unknown ? &hdrbuf : NULL) < 0)
+ switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+ allow_unknown ? &hdrbuf : NULL)) {
+ case ULHR_OK:
+ break;
+ case ULHR_BAD:
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
- if (status < 0)
- ; /* Do nothing */
- else if (hdrbuf.len) {
+ break;
+ }
+
+ if (status < 0) {
+ /* Do nothing */
+ } else if (hdrbuf.len) {
if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
status = error(_("unable to parse %s header with --allow-unknown-type"),
oid_to_hex(oid));
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..6df0247a4cb 100644
--- a/streaming.c
+++ b/streaming.c
@@ -229,17 +229,16 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
return -1;
- if ((unpack_loose_header(&st->z,
- st->u.loose.mapped,
- st->u.loose.mapsize,
- st->u.loose.hdr,
- sizeof(st->u.loose.hdr),
- NULL) < 0) ||
- (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
- git_inflate_end(&st->z);
- munmap(st->u.loose.mapped, st->u.loose.mapsize);
- return -1;
+ switch (unpack_loose_header(&st->z, st->u.loose.mapped,
+ st->u.loose.mapsize, st->u.loose.hdr,
+ sizeof(st->u.loose.hdr), NULL)) {
+ case ULHR_OK:
+ break;
+ case ULHR_BAD:
+ goto error;
}
+ if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+ goto error;
st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
st->u.loose.hdr_avail = st->z.total_out;
@@ -248,6 +247,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
st->read = read_istream_loose;
return 0;
+error:
+ git_inflate_end(&st->z);
+ munmap(st->u.loose.mapped, st->u.loose.mapsize);
+ return -1;
}
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 14/17] object-file.c: return ULHR_TOO_LONG on "header too long"
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (12 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 13/17] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
` (2 subsequent siblings)
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.
As a test added earlier in this series in t1006-cat-file.sh shows,
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG saying we ran into the
MAX_HEADER_LEN limit, or ULHR_BAD for "unable to unpack <OID>
header".
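The three-way split can be sketched as follows. This is a simplified model, not git's implementation: it classifies already-inflated bytes instead of inflating in chunks. All SKETCH_* names are hypothetical. The 32-byte limit matches the "exceeds 32 bytes" expectation in the t1006 test below.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; the t1006 expectations show a 32-byte limit. */
#define SKETCH_MAX_HEADER_LEN 32

enum sketch_ulhr {
	SKETCH_ULHR_OK,
	SKETCH_ULHR_BAD,
	SKETCH_ULHR_TOO_LONG,
};

/*
 * Classify "avail" bytes of header data: OK if a NUL appears within the
 * first SKETCH_MAX_HEADER_LEN bytes, TOO_LONG if we have at least that
 * many bytes without one, BAD if the data is simply truncated.
 */
static enum sketch_ulhr classify_header(const char *hdr, size_t avail)
{
	size_t limit = avail < SKETCH_MAX_HEADER_LEN ? avail : SKETCH_MAX_HEADER_LEN;

	if (memchr(hdr, '\0', limit))
		return SKETCH_ULHR_OK;
	return avail < SKETCH_MAX_HEADER_LEN ? SKETCH_ULHR_BAD : SKETCH_ULHR_TOO_LONG;
}

/* Format a per-result message, mirroring the wording used in the patch. */
static void format_ulhr_error(enum sketch_ulhr r, const char *hex,
			      char *out, size_t outsz)
{
	assert(outsz > 0);
	out[0] = '\0';
	switch (r) {
	case SKETCH_ULHR_OK:
		break; /* nothing to report */
	case SKETCH_ULHR_BAD:
		snprintf(out, outsz, "unable to unpack %s header", hex);
		break;
	case SKETCH_ULHR_TOO_LONG:
		snprintf(out, outsz, "header for %s too long, exceeds %d bytes",
			 hex, SKETCH_MAX_HEADER_LEN);
		break;
	}
}
```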
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 5 ++++-
object-file.c | 8 ++++++--
streaming.c | 1 +
t/t1006-cat-file.sh | 4 ++--
4 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/cache.h b/cache.h
index 7239e20a625..8e05392fda8 100644
--- a/cache.h
+++ b/cache.h
@@ -1328,16 +1328,19 @@ int git_open_cloexec(const char *name, int flags);
*
* - ULHR_OK on success
* - ULHR_BAD on error
+ * - ULHR_TOO_LONG if the header was too long
*
* It will only parse up to MAX_HEADER_LEN bytes unless an optional
* "hdrbuf" argument is non-NULL. This is intended for use with
* OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
* reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), ULHR_TOO_LONG will still be returned
+ * from this function to indicate that the header was too long.
*/
enum unpack_loose_header_result {
ULHR_OK,
ULHR_BAD,
+ ULHR_TOO_LONG,
};
enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
unsigned char *map,
diff --git a/object-file.c b/object-file.c
index e0f508415dd..3589c5a2e33 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1290,7 +1290,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
* --allow-unknown-type".
*/
if (!header)
- return ULHR_BAD;
+ return ULHR_TOO_LONG;
/*
* buffer[0..bufsiz] was not large enough. Copy the partial
@@ -1311,7 +1311,7 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
stream->next_out = buffer;
stream->avail_out = bufsiz;
} while (status != Z_STREAM_END);
- return ULHR_BAD;
+ return ULHR_TOO_LONG;
}
static void *unpack_loose_rest(git_zstream *stream,
@@ -1484,6 +1484,10 @@ static int loose_object_info(struct repository *r,
status = error(_("unable to unpack %s header"),
oid_to_hex(oid));
break;
+ case ULHR_TOO_LONG:
+ status = error(_("header for %s too long, exceeds %d bytes"),
+ oid_to_hex(oid), MAX_HEADER_LEN);
+ break;
}
if (status < 0) {
diff --git a/streaming.c b/streaming.c
index 6df0247a4cb..bd89c50e7b3 100644
--- a/streaming.c
+++ b/streaming.c
@@ -235,6 +235,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
case ULHR_OK:
break;
case ULHR_BAD:
+ case ULHR_TOO_LONG:
goto error;
}
if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5b16c69c286..a5e7401af8b 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -356,12 +356,12 @@ do
if test "$arg2" = "-p"
then
cat >expect <<-EOF
- error: unable to unpack $bogus_long_sha1 header
+ error: header for $bogus_long_sha1 too long, exceeds 32 bytes
fatal: Not a valid object name $bogus_long_sha1
EOF
else
cat >expect <<-EOF
- error: unable to unpack $bogus_long_sha1 header
+ error: header for $bogus_long_sha1 too long, exceeds 32 bytes
fatal: git cat-file: could not get object info
EOF
fi &&
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 15/17] object-file.c: stop dying in parse_loose_header()
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (13 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 14/17] object-file.c: return ULHR_TOO_LONG on "header too long" Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Make parse_loose_header() return error codes and data instead of
invoking die() by itself.
For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (should be
moved to builtin/cat-file.c), but for now I'll be leaving it.
To make parse_loose_header() not die(), change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated
"oi->typep".
Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE; we can instead do
that check in loose_object_info().
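The new contract for parse_loose_header() can be sketched in isolation. It returns -1 only for a malformed "<type> <len>" header. An unknown type is reported through the type out-parameter, and the caller decides whether that is fatal. This is a simplified, hypothetical re-implementation, not git's code. The type table, the struct and the overflow check are my own assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical mirror of git's object types; S_OBJ_BAD marks an unknown name. */
enum sketch_type {
	S_OBJ_BAD = -1,
	S_OBJ_COMMIT = 1,
	S_OBJ_TREE = 2,
	S_OBJ_BLOB = 3,
	S_OBJ_TAG = 4,
};

struct sketch_info {
	enum sketch_type *typep;
	unsigned long *sizep;
};

static enum sketch_type sketch_type_from_name(const char *s, size_t len)
{
	static const struct {
		const char *name;
		enum sketch_type type;
	} names[] = {
		{ "commit", S_OBJ_COMMIT }, { "tree", S_OBJ_TREE },
		{ "blob", S_OBJ_BLOB }, { "tag", S_OBJ_TAG },
	};

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i].name) == len && !memcmp(names[i].name, s, len))
			return names[i].type;
	return S_OBJ_BAD;
}

/*
 * Parse "<type> <len>" (a NUL-terminated string). Returns -1 only if
 * the format is wrong; an unknown <type> is stored as S_OBJ_BAD in
 * oi->typep and the caller decides whether that is fatal.
 */
static int sketch_parse_header(const char *hdr, struct sketch_info *oi)
{
	const char *sp = strchr(hdr, ' ');
	const char *p;
	unsigned long size;

	if (!sp || sp == hdr)
		return -1;
	if (oi->typep)
		*oi->typep = sketch_type_from_name(hdr, (size_t)(sp - hdr));

	p = sp + 1;
	if (*p < '0' || *p > '9')
		return -1;
	size = (unsigned long)(*p++ - '0');
	if (size) { /* a leading "0" must be the whole length */
		while (*p >= '0' && *p <= '9') {
			unsigned long digit = (unsigned long)(*p++ - '0');

			if (size > (ULONG_MAX - digit) / 10)
				return -1; /* would overflow */
			size = size * 10 + digit;
		}
	}
	if (*p)
		return -1; /* the length must be followed by the NUL */
	if (oi->sizep)
		*oi->sizep = size;
	return 0;
}
```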
This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.
Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that
could be any negative value, as we were expecting to return the "enum
object_type".
The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.
Since parse_loose_header() doesn't need to return an arbitrary
"status" we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.
We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects; see
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it; a subsequent
commit will address that edge case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
cache.h | 11 +++++++--
object-file.c | 67 +++++++++++++++++++++++++--------------------------
streaming.c | 3 ++-
3 files changed, 44 insertions(+), 37 deletions(-)
diff --git a/cache.h b/cache.h
index 8e05392fda8..6c5f00c82d5 100644
--- a/cache.h
+++ b/cache.h
@@ -1349,9 +1349,16 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
unsigned long bufsiz,
struct strbuf *hdrbuf);
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
struct object_info;
-int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags);
+int parse_loose_header(const char *hdr, struct object_info *oi);
int check_object_signature(struct repository *r, const struct object_id *oid,
void *buf, unsigned long size, const char *type);
diff --git a/object-file.c b/object-file.c
index 3589c5a2e33..a70669700d0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1369,8 +1369,7 @@ static void *unpack_loose_rest(git_zstream *stream,
* too permissive for what we want to check. So do an anal
* object header parse by hand.
*/
-int parse_loose_header(const char *hdr, struct object_info *oi,
- unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
{
const char *type_buf = hdr;
unsigned long size;
@@ -1392,15 +1391,6 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
type = type_from_string_gently(type_buf, type_len, 1);
if (oi->type_name)
strbuf_add(oi->type_name, type_buf, type_len);
- /*
- * Set type to 0 if its an unknown object and
- * we're obtaining the type using '--allow-unknown-type'
- * option.
- */
- if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
- type = 0;
- else if (type < 0)
- die(_("invalid object type"));
if (oi->typep)
*oi->typep = type;
@@ -1427,7 +1417,14 @@ int parse_loose_header(const char *hdr, struct object_info *oi,
/*
* The length must be followed by a zero byte
*/
- return *hdr ? -1 : type;
+ if (*hdr)
+ return -1;
+
+ /*
+ * The format is valid, but the type may still be bogus. The
+ * Caller needs to check its oi->typep.
+ */
+ return 0;
}
static int loose_object_info(struct repository *r,
@@ -1441,6 +1438,7 @@ static int loose_object_info(struct repository *r,
char hdr[MAX_HEADER_LEN];
struct strbuf hdrbuf = STRBUF_INIT;
unsigned long size_scratch;
+ enum object_type type_scratch;
int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
if (oi->delta_base_oid)
@@ -1472,6 +1470,8 @@ static int loose_object_info(struct repository *r,
if (!oi->sizep)
oi->sizep = &size_scratch;
+ if (!oi->typep)
+ oi->typep = &type_scratch;
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
@@ -1479,6 +1479,18 @@ static int loose_object_info(struct repository *r,
switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
allow_unknown ? &hdrbuf : NULL)) {
case ULHR_OK:
+ if (parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0)
+ status = error(_("unable to parse %s header"), oid_to_hex(oid));
+ else if (!allow_unknown && *oi->typep < 0)
+ die(_("invalid object type"));
+
+ if (!oi->contentp)
+ break;
+ *oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
+ if (*oi->contentp)
+ goto cleanup;
+
+ status = -1;
break;
case ULHR_BAD:
status = error(_("unable to unpack %s header"),
@@ -1490,31 +1502,16 @@ static int loose_object_info(struct repository *r,
break;
}
- if (status < 0) {
- /* Do nothing */
- } else if (hdrbuf.len) {
- if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
- status = error(_("unable to parse %s header with --allow-unknown-type"),
- oid_to_hex(oid));
- } else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
- status = error(_("unable to parse %s header"), oid_to_hex(oid));
-
- if (status >= 0 && oi->contentp) {
- *oi->contentp = unpack_loose_rest(&stream, hdr,
- *oi->sizep, oid);
- if (!*oi->contentp) {
- git_inflate_end(&stream);
- status = -1;
- }
- } else
- git_inflate_end(&stream);
-
+ git_inflate_end(&stream);
+cleanup:
munmap(map, mapsize);
if (oi->sizep == &size_scratch)
oi->sizep = NULL;
strbuf_release(&hdrbuf);
+ if (oi->typep == &type_scratch)
+ oi->typep = NULL;
oi->whence = OI_LOOSE;
- return (status < 0) ? status : 0;
+ return status;
}
int obj_read_use_lock = 0;
@@ -2585,6 +2582,7 @@ int read_loose_object(const char *path,
git_zstream stream;
char hdr[MAX_HEADER_LEN];
struct object_info oi = OBJECT_INFO_INIT;
+ oi.typep = type;
oi.sizep = size;
*contents = NULL;
@@ -2601,12 +2599,13 @@ int read_loose_object(const char *path,
goto out;
}
- *type = parse_loose_header(hdr, &oi, 0);
- if (*type < 0) {
+ if (parse_loose_header(hdr, &oi) < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
}
+ if (*type < 0)
+ die(_("invalid object type"));
if (*type == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/streaming.c b/streaming.c
index bd89c50e7b3..fe54665d86e 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
{
struct object_info oi = OBJECT_INFO_INIT;
oi.sizep = &st->size;
+ oi.typep = type;
st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
if (!st->u.loose.mapped)
@@ -238,7 +239,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
case ULHR_TOO_LONG:
goto error;
}
- if (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)
+ if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
goto error;
st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
--
2.33.0.1375.g5eed55aa1b5
* [PATCH v10 16/17] fsck: don't hard die on invalid object types
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (14 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 15/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 9:16 ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
16 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Change the error fsck emits on invalid object types, such as:
$ git hash-object --stdin -w -t garbage --literally </dev/null
<OID>
From the very ungraceful error of:
$ git fsck
fatal: invalid object type
$
To:
$ git fsck
error: <OID>: object is of unknown type 'garbage': <OID_PATH>
[ other fsck output ]
We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The test that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().
Since we'll need a "struct strbuf" to hold the "type_name" let's pass
it to the for_each_loose_file_in_objdir() callback to avoid allocating
a new one for each loose object in the iteration. It also makes the
memory management simpler than sticking it in fsck_loose() itself, as
we'll only need to strbuf_reset() it, with no need to do a
strbuf_release() before each "return".
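Here is a minimal sketch of sharing one buffer across all callback invocations through the callback's "void *data". It uses a hypothetical iterator and struct rather than git's for_each_loose_file_in_objdir() and for_each_loose_cb. A fixed char array stands in for the reused strbuf, an assumption made to keep the example self-contained. Each entry overwrites the buffer, so nothing is allocated or freed per entry.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback state, in the spirit of the patch's for_each_loose_cb. */
struct loose_cb_sketch {
	int seen;             /* stands in for the progress counter */
	int unknown;          /* entries whose type name we did not recognise */
	char type_name[64];   /* one buffer, overwritten for every entry */
};

typedef int (*each_entry_fn)(const char *entry, void *data);

/* Hypothetical iterator: stops early if a callback returns non-zero. */
static int for_each_entry(const char **entries, size_t n,
			  each_entry_fn fn, void *data)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(entries[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

static int is_known_type(const char *name)
{
	return !strcmp(name, "blob") || !strcmp(name, "tree") ||
	       !strcmp(name, "commit") || !strcmp(name, "tag");
}

static int check_entry(const char *entry, void *data)
{
	struct loose_cb_sketch *cb = data;
	const char *sp = strchr(entry, ' ');
	int len = sp ? (int)(sp - entry) : (int)strlen(entry);

	/* Reuse the shared buffer: no per-entry allocation, nothing to free. */
	snprintf(cb->type_name, sizeof(cb->type_name), "%.*s", len, entry);
	cb->seen++;
	if (!is_known_type(cb->type_name))
		cb->unknown++;
	return 0; /* keep checking other entries, as fsck_loose() does */
}
```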
Before this commit we'd never check the "type" if read_loose_object()
failed, but now we do. We therefore need to initialize it to OBJ_NONE
to be able to tell the difference between e.g. its
unpack_loose_header() having failed, and us getting past that and into
parse_loose_header().
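The role of the OBJ_NONE initialization can be shown with a small, hypothetical model; none of these names are git's. The reader either fails before it ever writes the type, or parses a header whose type may be unknown. The sentinel lets the caller tell those cases apart without reading an uninitialized variable.

```c
#include <assert.h>

/* Hypothetical mirror of the relevant git object_type values. */
enum sketch_obj_type {
	SK_OBJ_BAD = -1,
	SK_OBJ_NONE = 0,
	SK_OBJ_BLOB = 3,
};

#define SK_ERR_CORRUPT 1
#define SK_ERR_UNKNOWN_TYPE 2

/*
 * Hypothetical reader: stage 1 fails before the header is parsed (so
 * *typep is never written), stage 2 parses a header with an unknown
 * type, anything else reads a blob.
 */
static int sketch_read_object(int stage, enum sketch_obj_type *typep)
{
	if (stage == 1)
		return -1;
	*typep = stage == 2 ? SK_OBJ_BAD : SK_OBJ_BLOB;
	return 0;
}

static int sketch_check_object(int stage)
{
	/* The sentinel makes "never reached the header" distinguishable. */
	enum sketch_obj_type type = SK_OBJ_NONE;
	int errs = 0;

	if (sketch_read_object(stage, &type) < 0)
		errs |= SK_ERR_CORRUPT;
	if (type != SK_OBJ_NONE && type < 0)
		errs |= SK_ERR_UNKNOWN_TYPE;
	return errs;
}
```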
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fsck.c | 37 +++++++++++++++++++++++++++++++------
object-file.c | 18 ++++++------------
object-store.h | 6 +++---
t/t1450-fsck.sh | 18 +++++++++---------
4 files changed, 49 insertions(+), 30 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..260210bf8a1 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -593,18 +593,36 @@ static void get_default_heads(void)
}
}
+struct for_each_loose_cb
+{
+ struct progress *progress;
+ struct strbuf obj_type;
+};
+
static int fsck_loose(const struct object_id *oid, const char *path, void *data)
{
+ struct for_each_loose_cb *cb_data = data;
struct object *obj;
- enum object_type type;
+ enum object_type type = OBJ_NONE;
unsigned long size;
void *contents;
int eaten;
+ struct object_info oi = OBJECT_INFO_INIT;
+ int err = 0;
- if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+ strbuf_reset(&cb_data->obj_type);
+ oi.type_name = &cb_data->obj_type;
+ oi.sizep = &size;
+ oi.typep = &type;
+
+ if (read_loose_object(path, oid, &contents, &oi) < 0)
+ err = error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
+ if (type != OBJ_NONE && type < 0)
+ err = error(_("%s: object is of unknown type '%s': %s"),
+ oid_to_hex(oid), cb_data->obj_type.buf, path);
+ if (err < 0) {
errors_found |= ERROR_OBJECT;
- error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
return 0; /* keep checking other objects */
}
@@ -640,8 +658,10 @@ static int fsck_cruft(const char *basename, const char *path, void *data)
return 0;
}
-static int fsck_subdir(unsigned int nr, const char *path, void *progress)
+static int fsck_subdir(unsigned int nr, const char *path, void *data)
{
+ struct for_each_loose_cb *cb_data = data;
+ struct progress *progress = cb_data->progress;
display_progress(progress, nr + 1);
return 0;
}
@@ -649,6 +669,10 @@ static int fsck_subdir(unsigned int nr, const char *path, void *progress)
static void fsck_object_dir(const char *path)
{
struct progress *progress = NULL;
+ struct for_each_loose_cb cb_data = {
+ .obj_type = STRBUF_INIT,
+ .progress = progress,
+ };
if (verbose)
fprintf_ln(stderr, _("Checking object directory"));
@@ -657,9 +681,10 @@ static void fsck_object_dir(const char *path)
progress = start_progress(_("Checking object directories"), 256);
for_each_loose_file_in_objdir(path, fsck_loose, fsck_cruft, fsck_subdir,
- progress);
+ &cb_data);
display_progress(progress, 256);
stop_progress(&progress);
+ strbuf_release(&cb_data.obj_type);
}
static int fsck_head_link(const char *head_ref_name,
diff --git a/object-file.c b/object-file.c
index a70669700d0..fe95285f405 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2572,18 +2572,15 @@ static int check_stream_oid(git_zstream *stream,
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents)
+ void **contents,
+ struct object_info *oi)
{
int ret = -1;
void *map = NULL;
unsigned long mapsize;
git_zstream stream;
char hdr[MAX_HEADER_LEN];
- struct object_info oi = OBJECT_INFO_INIT;
- oi.typep = type;
- oi.sizep = size;
+ unsigned long *size = oi->sizep;
*contents = NULL;
@@ -2599,15 +2596,13 @@ int read_loose_object(const char *path,
goto out;
}
- if (parse_loose_header(hdr, &oi) < 0) {
+ if (parse_loose_header(hdr, oi) < 0) {
error(_("unable to parse header of %s"), path);
git_inflate_end(&stream);
goto out;
}
- if (*type < 0)
- die(_("invalid object type"));
- if (*type == OBJ_BLOB && *size > big_file_threshold) {
+ if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
goto out;
} else {
@@ -2618,8 +2613,7 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size,
- type_name(*type))) {
+ *contents, *size, oi->type_name->buf)) {
error(_("hash mismatch for %s (expected %s)"), path,
oid_to_hex(expected_oid));
free(*contents);
diff --git a/object-store.h b/object-store.h
index c5130d8baea..c90c41a07f7 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,6 +245,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
/*
* Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
* type, and size. If the object is a blob, then "contents" may return NULL,
* to allow streaming of large blobs.
*
@@ -252,9 +253,8 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
- enum object_type *type,
- unsigned long *size,
- void **contents);
+ void **contents,
+ struct object_info *oi);
/* Retry packed storage after checking packed and loose storage */
#define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 281ff8bdd8e..faf0e98847b 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -85,11 +85,10 @@ test_expect_success 'object with hash and type mismatch' '
cmt=$(echo bogus | git commit-tree $tree) &&
git update-ref refs/heads/bogus $cmt &&
- cat >expect <<-\EOF &&
- fatal: invalid object type
- EOF
- test_must_fail git fsck 2>actual &&
- test_cmp expect actual
+
+ test_must_fail git fsck 2>out &&
+ grep "^error: hash mismatch for " out &&
+ grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
)
'
@@ -910,19 +909,20 @@ test_expect_success 'detect corrupt index file in fsck' '
test_i18ngrep "bad index file" errors
'
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
git init --bare garbage-type &&
(
cd garbage-type &&
- git hash-object --stdin -w -t garbage --literally </dev/null &&
+ garbage_blob=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&
cat >err.expect <<-\EOF &&
fatal: invalid object type
EOF
test_must_fail git fsck >out 2>err &&
- test_cmp err.expect err &&
- test_must_be_empty out
+ grep -e "^error" -e "^fatal" err >errors &&
+ test_line_count = 1 errors &&
+ grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err
)
'
--
2.33.0.1375.g5eed55aa1b5
^ permalink raw reply related [flat|nested] 245+ messages in thread
* [PATCH v10 17/17] fsck: report invalid object type-path combinations
2021-10-01 9:16 ` [PATCH v10 " Ævar Arnfjörð Bjarmason
` (15 preceding siblings ...)
2021-10-01 9:16 ` [PATCH v10 16/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-10-01 9:16 ` Ævar Arnfjörð Bjarmason
2021-10-01 22:14 ` Junio C Hamano
` (2 more replies)
16 siblings, 3 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 9:16 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.
Before this change we'd prefix the error with a not-an-OID derived from
the path at which the object was found, due to emergent behavior in
how we'd end up with an "OID" in these codepaths.
Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ mv objects/e6/ objects/e7
Would emit ("[...]" used to abbreviate the OIDs):
git fsck
error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]
Now we'll instead emit:
error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ mv objects/83 objects/84
As noted in earlier commits, we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object types:
$ git fsck
fatal: invalid object type
Now we'll instead emit sensible error messages:
$ git fsck
error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]
In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
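The sentinel idea above can be sketched in isolation. This is a minimal, hypothetical model, not git's code: toy_oid, toy_check_signature() and toy_classify() are made-up stand-ins for struct object_id, check_object_signature() and the decision in fsck_loose(), and the "hash" is a toy XOR checksum. The caller pre-fills the out-parameter with an all-zero OID; if it is still all-zero after a failure, hashing never happened, so the error is reported as corruption rather than as a hash-path mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 4-byte stand-in for git's struct object_id. */
struct toy_oid {
	unsigned char hash[4];
};

/* All zeroes: plays the role of null_oid() as a "never hashed" sentinel. */
static const struct toy_oid toy_null_oid;

static int toy_oideq(const struct toy_oid *a, const struct toy_oid *b)
{
	return !memcmp(a->hash, b->hash, sizeof(a->hash));
}

/*
 * Hypothetical stand-in for check_object_signature(): "hashes" the data
 * with a toy XOR checksum and stores the result in *real_oidp if the
 * caller asked for it. A NULL data pointer simulates a read error that
 * fails before any hash is computed, leaving *real_oidp untouched.
 */
static int toy_check_signature(const struct toy_oid *expected,
			       const unsigned char *data, size_t len,
			       struct toy_oid *real_oidp)
{
	struct toy_oid tmp;
	struct toy_oid *real = real_oidp ? real_oidp : &tmp;
	size_t i;

	if (!data)
		return -1;

	memset(real->hash, 0, sizeof(real->hash));
	for (i = 0; i < len; i++)
		real->hash[i % 4] ^= (unsigned char)(data[i] + 1);
	return toy_oideq(expected, real) ? 0 : -1;
}

enum toy_verdict {
	TOY_OK,
	TOY_HASH_PATH_MISMATCH,
	TOY_CORRUPT_OR_MISSING
};

static enum toy_verdict toy_classify(const struct toy_oid *expected,
				     const unsigned char *data, size_t len)
{
	struct toy_oid real = toy_null_oid; /* sentinel: "never hashed" */

	if (!toy_check_signature(expected, data, len, &real))
		return TOY_OK;
	if (!toy_oideq(&real, &toy_null_oid))
		return TOY_HASH_PATH_MISMATCH; /* hashed, got a different OID */
	return TOY_CORRUPT_OR_MISSING;         /* failed before hashing */
}
```

With this toy checksum, empty input also "hashes" to all zeroes and would be misclassified as corrupt; that is a limitation of the sketch, and it mirrors the kind of ambiguity the message acknowledges for real_oid staying equal to null_oid().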
We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ git fsck
error: garbage at end of loose object 'e69d[...]'
error: unable to unpack contents of ./objects/e6/9d[...]
error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]
There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
$ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
$ /usr/bin/git fsck
fatal: invalid object type
$ ~/g/git/git fsck
error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
[...]
I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly more obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.
There are other such potential edge cases, all of which might produce
some confusing messaging but are still handled correctly as far as
passing along errors goes, e.g. if check_object_signature() returns
and oideq(real_oid, null_oid()) is true, which could happen if it
returns -1 due to the read_istream() call having failed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fast-export.c | 2 +-
builtin/fsck.c | 15 +++++++++++----
builtin/index-pack.c | 2 +-
builtin/mktag.c | 3 ++-
cache.h | 3 ++-
object-file.c | 21 ++++++++++-----------
object-store.h | 1 +
object.c | 4 ++--
pack-check.c | 3 ++-
t/t1006-cat-file.sh | 2 +-
t/t1450-fsck.sh | 8 +++++---
11 files changed, 38 insertions(+), 26 deletions(-)
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 95e8e89e81f..8e2caf72819 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
if (!buf)
die("could not read blob %s", oid_to_hex(oid));
if (check_object_signature(the_repository, oid, buf, size,
- type_name(type)) < 0)
+ type_name(type), NULL) < 0)
die("oid mismatch in blob %s", oid_to_hex(oid));
object = parse_object_buffer(the_repository, oid, type,
size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 260210bf8a1..30a516da29e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -608,6 +608,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
void *contents;
int eaten;
struct object_info oi = OBJECT_INFO_INIT;
+ struct object_id real_oid = *null_oid();
int err = 0;
strbuf_reset(&cb_data->obj_type);
@@ -615,12 +616,18 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
oi.sizep = &size;
oi.typep = &type;
- if (read_loose_object(path, oid, &contents, &oi) < 0)
- err = error(_("%s: object corrupt or missing: %s"),
- oid_to_hex(oid), path);
+ if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
+ if (contents && !oideq(&real_oid, oid))
+ err = error(_("%s: hash-path mismatch, found at: %s"),
+ oid_to_hex(&real_oid), path);
+ else
+ err = error(_("%s: object corrupt or missing: %s"),
+ oid_to_hex(oid), path);
+ }
if (type != OBJ_NONE && type < 0)
err = error(_("%s: object is of unknown type '%s': %s"),
- oid_to_hex(oid), cb_data->obj_type.buf, path);
+ oid_to_hex(&real_oid), cb_data->obj_type.buf,
+ path);
if (err < 0) {
errors_found |= ERROR_OBJECT;
return 0; /* keep checking other objects */
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 7ce69c087ec..15ae406e6b7 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1415,7 +1415,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
if (check_object_signature(the_repository, &d->oid,
data, size,
- type_name(type)))
+ type_name(type), NULL))
die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
repl = lookup_replace_object(the_repository, tagged_oid);
ret = check_object_signature(the_repository, repl,
- buffer, size, type_name(*tagged_type));
+ buffer, size, type_name(*tagged_type),
+ NULL);
free(buffer);
return ret;
diff --git a/cache.h b/cache.h
index 6c5f00c82d5..e2a203073ea 100644
--- a/cache.h
+++ b/cache.h
@@ -1361,7 +1361,8 @@ struct object_info;
int parse_loose_header(const char *hdr, struct object_info *oi);
int check_object_signature(struct repository *r, const struct object_id *oid,
- void *buf, unsigned long size, const char *type);
+ void *buf, unsigned long size, const char *type,
+ struct object_id *real_oidp);
int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/object-file.c b/object-file.c
index fe95285f405..49561e31551 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1084,9 +1084,11 @@ void *xmmap(void *start, size_t length,
* the streaming interface and rehash it to do the same.
*/
int check_object_signature(struct repository *r, const struct object_id *oid,
- void *map, unsigned long size, const char *type)
+ void *map, unsigned long size, const char *type,
+ struct object_id *real_oidp)
{
- struct object_id real_oid;
+ struct object_id tmp;
+ struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
enum object_type obj_type;
struct git_istream *st;
git_hash_ctx c;
@@ -1094,8 +1096,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
int hdrlen;
if (map) {
- hash_object_file(r->hash_algo, map, size, type, &real_oid);
- return !oideq(oid, &real_oid) ? -1 : 0;
+ hash_object_file(r->hash_algo, map, size, type, real_oid);
+ return !oideq(oid, real_oid) ? -1 : 0;
}
st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1120,9 +1122,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
break;
r->hash_algo->update_fn(&c, buf, readlen);
}
- r->hash_algo->final_oid_fn(&real_oid, &c);
+ r->hash_algo->final_oid_fn(real_oid, &c);
close_istream(st);
- return !oideq(oid, &real_oid) ? -1 : 0;
+ return !oideq(oid, real_oid) ? -1 : 0;
}
int git_open_cloexec(const char *name, int flags)
@@ -2572,6 +2574,7 @@ static int check_stream_oid(git_zstream *stream,
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
struct object_info *oi)
{
@@ -2582,8 +2585,6 @@ int read_loose_object(const char *path,
char hdr[MAX_HEADER_LEN];
unsigned long *size = oi->sizep;
- *contents = NULL;
-
map = map_loose_object_1(the_repository, path, NULL, &mapsize);
if (!map) {
error_errno(_("unable to mmap %s"), path);
@@ -2613,9 +2614,7 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size, oi->type_name->buf)) {
- error(_("hash mismatch for %s (expected %s)"), path,
- oid_to_hex(expected_oid));
+ *contents, *size, oi->type_name->buf, real_oid)) {
free(*contents);
goto out;
}
diff --git a/object-store.h b/object-store.h
index c90c41a07f7..17b072e5a19 100644
--- a/object-store.h
+++ b/object-store.h
@@ -253,6 +253,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
*/
int read_loose_object(const char *path,
const struct object_id *expected_oid,
+ struct object_id *real_oid,
void **contents,
struct object_info *oi);
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
(!obj && repo_has_object_file(r, oid) &&
oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
- if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+ if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
error(_("hash mismatch %s"), oid_to_hex(oid));
return NULL;
}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
buffer = repo_read_object_file(r, oid, &type, &size);
if (buffer) {
if (check_object_signature(r, repl, buffer, size,
- type_name(type)) < 0) {
+ type_name(type), NULL) < 0) {
free(buffer);
error(_("hash mismatch %s"), oid_to_hex(repl));
return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
oid_to_hex(&oid), p->pack_name,
(uintmax_t)entries[i].offset);
- else if (check_object_signature(r, &oid, data, size, type_name(type)))
+ else if (check_object_signature(r, &oid, data, size,
+ type_name(type), NULL))
err = error("packed %s from %s is corrupt",
oid_to_hex(&oid), p->pack_name);
else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index a5e7401af8b..0f52ca9cc82 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -512,7 +512,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
# Swap the two to corrupt the repository
mv -f "$other_path" "$empty_path" &&
test_must_fail git fsck 2>err.fsck &&
- grep "hash mismatch" err.fsck &&
+ grep "hash-path mismatch" err.fsck &&
# confirm that cat-file is reading the new swapped-in
# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index faf0e98847b..6337236fd82 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -54,6 +54,7 @@ test_expect_success 'object with hash mismatch' '
cd hash-mismatch &&
oid=$(echo blob | git hash-object -w --stdin) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ -65,7 +66,7 @@ test_expect_success 'object with hash mismatch' '
git update-ref refs/heads/bogus $cmt &&
test_must_fail git fsck 2>out &&
- grep "$oid.*corrupt" out
+ grep "$oldoid: hash-path mismatch, found at: .*$new" out
)
'
@@ -75,6 +76,7 @@ test_expect_success 'object with hash and type mismatch' '
cd hash-type-mismatch &&
oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+ oldoid=$oid &&
old=$(test_oid_to_path "$oid") &&
new=$(dirname $old)/$(test_oid ff_2) &&
oid="$(dirname $new)$(basename $new)" &&
@@ -87,8 +89,8 @@ test_expect_success 'object with hash and type mismatch' '
test_must_fail git fsck 2>out &&
- grep "^error: hash mismatch for " out &&
- grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+ grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+ grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
)
'
--
2.33.0.1375.g5eed55aa1b5
* Re: [PATCH v10 17/17] fsck: report invalid object type-path combinations
2021-10-01 9:16 ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-10-01 22:14 ` Junio C Hamano
2021-10-01 22:33 ` Ævar Arnfjörð Bjarmason
2021-11-11 3:03 ` [PATCH v2] receive-pack: not receive pack file with large object Han Xin
2021-11-11 3:05 ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Han Xin
2 siblings, 1 reply; 245+ messages in thread
From: Junio C Hamano @ 2021-10-01 22:14 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: git, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 260210bf8a1..30a516da29e 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -615,12 +616,18 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> oi.sizep = &size;
> oi.typep = &type;
>
> - if (read_loose_object(path, oid, &contents, &oi) < 0)
> - err = error(_("%s: object corrupt or missing: %s"),
> - oid_to_hex(oid), path);
> + if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
> + if (contents && !oideq(&real_oid, oid))
> + err = error(_("%s: hash-path mismatch, found at: %s"),
> + oid_to_hex(&real_oid), path);
> + else
> + err = error(_("%s: object corrupt or missing: %s"),
> + oid_to_hex(oid), path);
> + }
> if (type != OBJ_NONE && type < 0)
> err = error(_("%s: object is of unknown type '%s': %s"),
> - oid_to_hex(oid), cb_data->obj_type.buf, path);
> + oid_to_hex(&real_oid), cb_data->obj_type.buf,
> + path);
> if (err < 0) {
> errors_found |= ERROR_OBJECT;
> return 0; /* keep checking other objects */
When we say "hash-path mismatch", we would have non-null contents,
presumably obtained from read_loose_object(). err is made negative
when we give that message, and we come here to return. Did we forget
to free "contents" in that case?
* Re: [PATCH v10 17/17] fsck: report invalid object type-path combinations
2021-10-01 22:14 ` Junio C Hamano
@ 2021-10-01 22:33 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-01 22:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau
On Fri, Oct 01 2021, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> diff --git a/builtin/fsck.c b/builtin/fsck.c
>> index 260210bf8a1..30a516da29e 100644
>> --- a/builtin/fsck.c
>> +++ b/builtin/fsck.c
>> @@ -615,12 +616,18 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>> oi.sizep = &size;
>> oi.typep = &type;
>>
>> - if (read_loose_object(path, oid, &contents, &oi) < 0)
>> - err = error(_("%s: object corrupt or missing: %s"),
>> - oid_to_hex(oid), path);
>> + if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
>> + if (contents && !oideq(&real_oid, oid))
>> + err = error(_("%s: hash-path mismatch, found at: %s"),
>> + oid_to_hex(&real_oid), path);
>> + else
>> + err = error(_("%s: object corrupt or missing: %s"),
>> + oid_to_hex(oid), path);
>> + }
>> if (type != OBJ_NONE && type < 0)
>> err = error(_("%s: object is of unknown type '%s': %s"),
>> - oid_to_hex(oid), cb_data->obj_type.buf, path);
>> + oid_to_hex(&real_oid), cb_data->obj_type.buf,
>> + path);
>> if (err < 0) {
>> errors_found |= ERROR_OBJECT;
>> return 0; /* keep checking other objects */
>
> When we say "hash-path mismatch", we would have non-null contents,
> presumably obtained from read_loose_object(). err is made negative
> when we give that message, and we come here to return. Did we forget
> to free "contents" in that case?
No, e.g. the "cat-file -t and -s on corrupt loose object" test added in
this series doesn't error with SANITIZE=leak.
This is because as we go through read_loose_object() we'll make our way
to unpack_loose_rest(), which will return that malloc'd buffer. So we
would leak it if we returned after that.
Except that in read_loose_object() we'll go on to call
check_object_signature() right afterwards. The expected OID is whatever
we inferred from the FS path, and the OID we saw is what we get from
hashing. That call will return non-zero, and we'll free() the
contents. The buffer isn't NULL'd, but we can't use it.
This is all behavior that pre-dates this series. I think it's a bit
stupid, and we should arguably do better about data recovery here, as
alluded to at the end of the commit message.
I.e. ideally we'd use what we know: we wanted OID A, so who cares if
we found it at path B? It hashes to A and completes the graph! Let's
just re-write it to A. Or maybe it's not worth it. Or we'd
want to optionally log the content we *do* find on such failures,
e.g. maybe the content is partial or whatever. I had some WIP work on
top of this that did that, e.g. to recover in cases where you append
garbage data at the end of an object (in which case we *do* have the
content and can recover, we just need to stop reading at that byte once
our OID matches, and re-write it out again).
But anyway, it doesn't work that way now, and this doesn't leak memory,
or as far as I can tell do the wrong thing in these various edge cases,
because "content is bad" is always synonymous with read_loose_object()
itself calling free().
Thanks a lot for the careful checking!
* Re: [PATCH v2] receive-pack: not receive pack file with large object
2021-10-01 9:16 ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-10-01 22:14 ` Junio C Hamano
@ 2021-11-11 3:03 ` Han Xin
2021-11-11 18:35 ` Junio C Hamano
2021-11-11 3:05 ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Han Xin
2 siblings, 1 reply; 245+ messages in thread
From: Han Xin @ 2021-11-11 3:03 UTC (permalink / raw)
To: avarab; +Cc: git, gitster, jonathantanmy, me, peff, rybak.a.v
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
...
> diff --git a/object-file.c b/object-file.c
> index fe95285f405..49561e31551 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1084,9 +1084,11 @@ void *xmmap(void *start, size_t length,
> * the streaming interface and rehash it to do the same.
> */
> int check_object_signature(struct repository *r, const struct object_id *oid,
> - void *map, unsigned long size, const char *type)
> + void *map, unsigned long size, const char *type,
> + struct object_id *real_oidp)
> {
> - struct object_id real_oid;
> + struct object_id tmp;
> + struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
> enum object_type obj_type;
> struct git_istream *st;
> git_hash_ctx c;
> @@ -1094,8 +1096,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
> int hdrlen;
>
> if (map) {
> - hash_object_file(r->hash_algo, map, size, type, &real_oid);
> - return !oideq(oid, &real_oid) ? -1 : 0;
> + hash_object_file(r->hash_algo, map, size, type, real_oid);
> + return !oideq(oid, real_oid) ? -1 : 0;
> }
>
> st = open_istream(r, oid, &obj_type, &size, NULL);
> @@ -1120,9 +1122,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
> break;
> r->hash_algo->update_fn(&c, buf, readlen);
> }
> - r->hash_algo->final_oid_fn(&real_oid, &c);
> + r->hash_algo->final_oid_fn(real_oid, &c);
> close_istream(st);
> - return !oideq(oid, &real_oid) ? -1 : 0;
> + return !oideq(oid, real_oid) ? -1 : 0;
> }
>
> int git_open_cloexec(const char *name, int flags)
> @@ -2572,6 +2574,7 @@ static int check_stream_oid(git_zstream *stream,
>
> int read_loose_object(const char *path,
> const struct object_id *expected_oid,
> + struct object_id *real_oid,
> void **contents,
> struct object_info *oi)
> {
> @@ -2582,8 +2585,6 @@ int read_loose_object(const char *path,
> char hdr[MAX_HEADER_LEN];
> unsigned long *size = oi->sizep;
>
> - *contents = NULL;
> -
Deleting "*contents = NULL;" here will cause a memory free error.
When reading a large loose blob (larger than big_file_threshold), it will enter the following block and *contents will not be set:
if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
goto out;
} else {
...
}
This test case can illustrate this problem:
test_expect_success 'fsck large loose blob' '
blob=$(echo large | git hash-object -w --stdin) &&
git -c core.bigfilethreshold=4 fsck
'
git(73697,0x1198f1e00) malloc: *** error for object 0x36: pointer being freed was not allocated
git(73697,0x1198f1e00) malloc: *** set a breakpoint in malloc_error_break to debug
./test-lib.sh: line 947: 73697 Abort trap: 6 git -c core.bigfilethreshold=4 fsck
> map = map_loose_object_1(the_repository, path, NULL, &mapsize);
> if (!map) {
> error_errno(_("unable to mmap %s"), path);
> @@ -2613,9 +2614,7 @@ int read_loose_object(const char *path,
> goto out;
> }
> if (check_object_signature(the_repository, expected_oid,
> - *contents, *size, oi->type_name->buf)) {
> - error(_("hash mismatch for %s (expected %s)"), path,
> - oid_to_hex(expected_oid));
> + *contents, *size, oi->type_name->buf, real_oid)) {
> free(*contents);
> goto out;
> }
...
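The report comes down to an out-parameter that one branch never assigns, combined with a caller that frees it unconditionally. Below is a hypothetical, self-contained sketch of that shape: toy_read_object() and toy_fsck_one() are invented names, and the length check only loosely mirrors the big_file_threshold streaming branch. Without the `*contents = NULL;` on entry, the free() in the caller would act on whatever the uninitialized pointer happened to hold.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reader: small payloads are copied into a freshly
 * allocated *contents, while payloads above big_threshold take a
 * "streaming" branch that never assigns *contents.
 */
static int toy_read_object(const char *data, size_t big_threshold,
			   void **contents)
{
	size_t len = strlen(data);

	*contents = NULL; /* keeps the caller's free() safe on every path */

	if (len > big_threshold)
		return 0; /* streamed and verified; nothing handed back */

	*contents = malloc(len + 1);
	if (!*contents)
		return -1;
	memcpy(*contents, data, len + 1);
	return 0;
}

/* Hypothetical caller that, like fsck_loose(), declares contents uninitialized. */
static int toy_fsck_one(const char *data, size_t big_threshold)
{
	void *contents;
	int ret = toy_read_object(data, big_threshold, &contents);

	free(contents); /* always NULL or a malloc()'d buffer here */
	return ret;
}
```

Resetting the out-parameter at the top of the callee, rather than on each branch, means a newly added early-return path cannot reintroduce the bug.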
* Re: [PATCH v10 17/17] fsck: report invalid object type-path combinations
2021-10-01 9:16 ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
2021-10-01 22:14 ` Junio C Hamano
2021-11-11 3:03 ` [PATCH v2] receive-pack: not receive pack file with large object Han Xin
@ 2021-11-11 3:05 ` Han Xin
2021-11-11 5:18 ` [PATCH 0/2] v2.34.0-rc2 regression: free() of uninitialized in ab/fsck-unexpected-type Ævar Arnfjörð Bjarmason
2 siblings, 1 reply; 245+ messages in thread
From: Han Xin @ 2021-11-11 3:05 UTC (permalink / raw)
To: avarab; +Cc: git, gitster, jonathantanmy, me, peff, rybak.a.v
* [PATCH 0/2] v2.34.0-rc2 regression: free() of uninitialized in ab/fsck-unexpected-type
2021-11-11 3:05 ` [PATCH v10 17/17] fsck: report invalid object type-path combinations Han Xin
@ 2021-11-11 5:18 ` Ævar Arnfjörð Bjarmason
2021-11-11 5:18 ` [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2 Ævar Arnfjörð Bjarmason
2021-11-11 5:18 ` [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller Ævar Arnfjörð Bjarmason
0 siblings, 2 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-11 5:18 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
On Thu, Nov 11 2021, Han Xin wrote:
> [...]
> Deleting "*contents = NULL;" here will cause a memory free error.
> When reading a large loose blob (larger than big_file_threshold), it will enter the following block and *contents will not be set:
>
> if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
> if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
> goto out;
> } else {
> ...
> }
>
>
> This test case can illustrate this problem:
>
> test_expect_success 'fsck large loose blob' '
> blob=$(echo large | git hash-object -w --stdin) &&
> git -c core.bigfilethreshold=4 fsck
> '
>
> git(73697,0x1198f1e00) malloc: *** error for object 0x36: pointer being freed was not allocated
> git(73697,0x1198f1e00) malloc: *** set a breakpoint in malloc_error_break to debug
> ./test-lib.sh: line 947: 73697 Abort trap: 6 git -c core.bigfilethreshold=4 fsck
Thanks a lot for the detailed report and test case. It looks like I've
got the dubious honor of the scariest caught-by-rc bug so far.
This series:
Ævar Arnfjörð Bjarmason (2):
object-file: fix SEGV on free() regression in v2.34.0-rc2
This is the most minimal fix for this issue. So Junio, if you'd like
to just pick this up for v2.34.0 you can peel just 1/2 off...
object-file: free(*contents) only in read_loose_object() caller
... a fix for a related issue. In ab/fsck-unexpected-type we stopped
die()-ing in object-name.c, so per SANITIZE=leak's accounting we
introduced a memory leak with the same variable we dealt with in 1/2.
But, IMO more importantly, by changing this code so that only one
function owns the free()-ing, it's much easier to reason about this
code.
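A minimal sketch of the "one owner" convention described above, assuming the callee hands back its buffer even when the check fails and the caller is the only place that calls free(). toy_read_verify() and toy_check() are hypothetical names; this is not the actual diff from 2/2. A side benefit, in line with the data-recovery idea raised earlier in the thread, is that the caller can still inspect the bytes it got when verification fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical callee: allocates *contents and compares it with the
 * expected payload, but never frees it, even on mismatch. Ownership of
 * whatever lands in *contents always passes to the caller.
 */
static int toy_read_verify(const char *data, const char *expected,
			   char **contents)
{
	size_t len = strlen(data);

	*contents = malloc(len + 1);
	if (!*contents)
		return -1;
	memcpy(*contents, data, len + 1);
	return strcmp(*contents, expected) ? -1 : 0;
}

/* Hypothetical caller: the single place that frees the buffer. */
static int toy_check(const char *data, const char *expected,
		     size_t *seen_len)
{
	char *contents;
	int ret = toy_read_verify(data, expected, &contents);

	/* Even on mismatch the buffer is still valid and can be inspected. */
	*seen_len = contents ? strlen(contents) : 0;
	free(contents);
	return ret;
}
```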
builtin/fsck.c | 3 ++-
object-file.c | 5 ++---
t/t1050-large.sh | 8 ++++++++
3 files changed, 12 insertions(+), 4 deletions(-)
--
2.34.0.rc2.795.g926201d1cc8
* [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2
2021-11-11 5:18 ` [PATCH 0/2] v2.34.0-rc2 regression: free() of uninitialized in ab/fsck-unexpected-type Ævar Arnfjörð Bjarmason
@ 2021-11-11 5:18 ` Ævar Arnfjörð Bjarmason
2021-11-11 15:18 ` Jeff King
2021-11-11 18:41 ` Junio C Hamano
2021-11-11 5:18 ` [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller Ævar Arnfjörð Bjarmason
1 sibling, 2 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-11 5:18 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
Fix a regression introduced in my 96e41f58fe1 (fsck: report invalid
object type-path combinations, 2021-10-01). When fsck-ing blobs larger
than core.bigFileThreshold we'd free() a pointer to uninitialized
memory.
This issue would have been caught by SANITIZE=address, but since it
involves core.bigFileThreshold, none of the existing tests in our test
suite covered it.
Running them with the "big_file_threshold" in "environment.c" changed
to say "6" would have shown this failure, but let's add a dedicated
test for this scenario based on Han Xin's report[1].
It would be a good follow-up change to add a GIT_TEST_* mode to run
all the tests with a low core.bigFileThreshold.
Currently a lot of them fail (but none due to SANITIZE=address)
because they make implicit assumptions about the current hardcoded
setting of core.bigFileThreshold.
Around half the failures are due to us assuming that files larger than
that are binary, see 6bf3b813486 (diff --stat: mark any file larger
than core.bigfilethreshold binary, 2014-08-16) and the comment added
in 12426e114b2 (diff: do not short-cut CHECK_SIZE_ONLY check in
diff_populate_filespec(), 2017-03-01). The rest seem to all be
pack/loose-related, i.e. they're assuming that something ends up as a
loose object or in a pack.
The bug was introduced between v9 and v10[2] of the fsck series merged
in 061a21d36d8 (Merge branch 'ab/fsck-unexpected-type', 2021-10-25).
1. https://lore.kernel.org/git/20211111030302.75694-1-hanxin.hx@alibaba-inc.com/
2. https://lore.kernel.org/git/cover-v10-00.17-00000000000-20211001T091051Z-avarab@gmail.com/
Reported-by: Han Xin <chiyutianyi@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
object-file.c | 2 ++
t/t1050-large.sh | 8 ++++++++
2 files changed, 10 insertions(+)
diff --git a/object-file.c b/object-file.c
index 02b79702748..ac476653a06 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2528,6 +2528,8 @@ int read_loose_object(const char *path,
char hdr[MAX_HEADER_LEN];
unsigned long *size = oi->sizep;
+ *contents = NULL;
+
map = map_loose_object_1(the_repository, path, NULL, &mapsize);
if (!map) {
error_errno(_("unable to mmap %s"), path);
diff --git a/t/t1050-large.sh b/t/t1050-large.sh
index 4bab6a513c5..6bc1d76fb10 100755
--- a/t/t1050-large.sh
+++ b/t/t1050-large.sh
@@ -17,6 +17,14 @@ test_expect_success setup '
export GIT_ALLOC_LIMIT
'
+test_expect_success 'enter "large" codepath, with small core.bigFileThreshold' '
+ test_when_finished "rm -rf repo" &&
+
+ git init --bare repo &&
+ echo large | git -C repo hash-object -w --stdin &&
+ git -C repo -c core.bigfilethreshold=4 fsck
+'
+
# add a large file with different settings
while read expect config
do
--
2.34.0.rc2.795.g926201d1cc8
* Re: [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2
2021-11-11 5:18 ` [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2 Ævar Arnfjörð Bjarmason
@ 2021-11-11 15:18 ` Jeff King
2021-11-11 18:41 ` Junio C Hamano
1 sibling, 0 replies; 245+ messages in thread
From: Jeff King @ 2021-11-11 15:18 UTC
To: Ævar Arnfjörð Bjarmason
Cc: git, Junio C Hamano, Han Xin, Jonathan Tan, Andrei Rybak, Taylor Blau
On Thu, Nov 11, 2021 at 06:18:55AM +0100, Ævar Arnfjörð Bjarmason wrote:
> diff --git a/object-file.c b/object-file.c
> index 02b79702748..ac476653a06 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2528,6 +2528,8 @@ int read_loose_object(const char *path,
> char hdr[MAX_HEADER_LEN];
> unsigned long *size = oi->sizep;
>
> + *contents = NULL;
> +
> map = map_loose_object_1(the_repository, path, NULL, &mapsize);
> if (!map) {
> error_errno(_("unable to mmap %s"), path);
OK, I agree this fixes the segfault, and is the minimal fix.
I do find the fact that fsck_loose() looks at "contents" after
read_loose_object() returns an error to be a bit questionable. It's a
recipe for confusion about what has happened, and who is supposed to
free what. Your v2 addresses the leak, but by just shifting more burden
to the caller. There's only one caller, so it's not too bad, but for a
public function, read_loose_object() has a lot of sharp edges.
Plus I think it fails to work as intended for streaming blobs (we do not
fill in "contents" at all in that case, so we can never say "hash-path
mismatch").
I understand you're trying to catch the case of "we actually opened the
file and computed the sha1 of its contents" from cases where we didn't
get that far. But since you initialize real_oid, it seems like it would
be better to see if anything was written to that.
I.e., something like:
diff --git a/builtin/fsck.c b/builtin/fsck.c
index d87c28a1cc..8f156ed9cd 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -617,18 +617,20 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
oi.typep = &type;
if (read_loose_object(path, oid, &real_oid, &contents, &oi) < 0) {
- if (contents && !oideq(&real_oid, oid))
+ if (!is_null_oid(&real_oid) && !oideq(&real_oid, oid))
err = error(_("%s: hash-path mismatch, found at: %s"),
oid_to_hex(&real_oid), path);
else
err = error(_("%s: object corrupt or missing: %s"),
oid_to_hex(oid), path);
+ errors_found |= ERROR_OBJECT;
+ return 0; /* keep checking other objects */
}
- if (type != OBJ_NONE && type < 0)
+ if (type != OBJ_NONE && type < 0) {
err = error(_("%s: object is of unknown type '%s': %s"),
oid_to_hex(&real_oid), cb_data->obj_type.buf,
path);
- if (err < 0) {
+ free(contents);
errors_found |= ERROR_OBJECT;
return 0; /* keep checking other objects */
}
(the "err" variable is now superfluous, but I left it in to keep the
diff smaller). And then it would be safe to just set "contents" in
read_loose_object() when we need it:
diff --git a/object-file.c b/object-file.c
index ac476653a0..5e8ff94fd4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2528,8 +2528,6 @@ int read_loose_object(const char *path,
char hdr[MAX_HEADER_LEN];
unsigned long *size = oi->sizep;
- *contents = NULL;
-
map = map_loose_object_1(the_repository, path, NULL, &mapsize);
if (!map) {
error_errno(_("unable to mmap %s"), path);
@@ -2549,6 +2547,7 @@ int read_loose_object(const char *path,
}
if (*oi->typep == OBJ_BLOB && *size > big_file_threshold) {
+ *contents = NULL;
if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
goto out;
} else {
That doesn't fix the hash-path mismatch problem for streaming, but it
sets us up to do so, if check_stream_oid() returned the real_oid it
computed.
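The contract suggested here, using real_oid rather than contents as the signal that the object was actually hashed, might look roughly like this in standalone form. Everything below (struct oid_sketch, read_sketch(), classify_sketch()) is hypothetical and only mirrors the shape of the fsck_loose() diff above; the toy "hash" is a single byte followed by zeroes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 20-byte ID standing in for git's struct object_id. */
struct oid_sketch {
	unsigned char hash[20];
};

static int oid_is_null_sketch(const struct oid_sketch *oid)
{
	static const struct oid_sketch null_oid; /* zero-initialized */

	return !memcmp(oid->hash, null_oid.hash, sizeof(oid->hash));
}

static int oid_eq_sketch(const struct oid_sketch *a, const struct oid_sketch *b)
{
	return !memcmp(a->hash, b->hash, sizeof(a->hash));
}

/*
 * Hypothetical reader: it writes *real_oid only after it has "hashed" the
 * object, so a still-null real_oid means we failed before getting that far.
 */
static int read_sketch(int can_open, unsigned char first_byte,
		       const struct oid_sketch *expected,
		       struct oid_sketch *real_oid)
{
	assert(expected && real_oid);
	if (!can_open)
		return -1; /* early failure: *real_oid left untouched */

	memset(real_oid->hash, 0, sizeof(real_oid->hash));
	real_oid->hash[0] = first_byte; /* toy stand-in for a computed hash */
	return oid_eq_sketch(real_oid, expected) ? 0 : -1;
}

enum check_result {
	CHECK_OK,
	CHECK_HASH_PATH_MISMATCH,
	CHECK_CORRUPT_OR_MISSING,
};

/* Caller in the style of the suggested fsck_loose() change. */
static enum check_result classify_sketch(int can_open, unsigned char first_byte,
					 const struct oid_sketch *expected)
{
	struct oid_sketch real_oid = { { 0 } }; /* starts out as the null ID */

	if (read_sketch(can_open, first_byte, expected, &real_oid) < 0) {
		if (!oid_is_null_sketch(&real_oid) &&
		    !oid_eq_sketch(&real_oid, expected))
			return CHECK_HASH_PATH_MISMATCH;
		return CHECK_CORRUPT_OR_MISSING;
	}
	return CHECK_OK;
}
```

The design relies on a genuine hash never being all zeroes, which is what lets the null ID act as a "not computed yet" marker; in this toy, a first_byte of 0 would break that assumption.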
All of this is much too large for an -rc fix, so we should take your
patch as-is. These are just thoughts I had while trying to figure out
if there were other problems caused by that same commit.
-Peff
* Re: [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2
2021-11-11 5:18 ` [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2 Ævar Arnfjörð Bjarmason
2021-11-11 15:18 ` Jeff King
@ 2021-11-11 18:41 ` Junio C Hamano
2021-11-13 9:00 ` Ævar Arnfjörð Bjarmason
1 sibling, 1 reply; 245+ messages in thread
From: Junio C Hamano @ 2021-11-11 18:41 UTC
To: Ævar Arnfjörð Bjarmason
Cc: git, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> Fix a regression introduced in my 96e41f58fe1 (fsck: report invalid
> object type-path combinations, 2021-10-01). When fsck-ing blobs larger
> than core.bigFileThreshold we'd free() a pointer to uninitialized
> memory.
s/d we'd/d, we'd/; no need to resend.
> This issue would have been caught by SANITIZE=address, but since it
> involves core.bigFileThreshold none of the existing tests in our test
> suite covered it.
s/d none/d, none/; likewise.
> Running them with the "big_file_threshold" in "environment.c" changed
> to say "6" would have shown this failure, but let's add a dedicated
> test for this scenario based on Han Xin's report[1].
Yeah, it is a good and focused test.
By the way, I do not think changing big_file_threshold _blindly_ to
smaller values, instead of in a focused test like this, is a good
idea in general. Some tests check if a file with a normal size that
is smaller than the threshold is correctly treated as a binary file,
and lowering threshold for them without understanding what they are
meant to test would trigger a "bug" that is not a bug at all, for
example.
> It would be a good follow-up change to add a GIT_TEST_* mode to run
> all the tests with a low core.bigFileThreshold threshold.
So, no, please don't do that.
> object-file.c | 2 ++
> t/t1050-large.sh | 8 ++++++++
> 2 files changed, 10 insertions(+)
>
> diff --git a/object-file.c b/object-file.c
> index 02b79702748..ac476653a06 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2528,6 +2528,8 @@ int read_loose_object(const char *path,
> char hdr[MAX_HEADER_LEN];
> unsigned long *size = oi->sizep;
>
> + *contents = NULL;
> +
> map = map_loose_object_1(the_repository, path, NULL, &mapsize);
> if (!map) {
> error_errno(_("unable to mmap %s"), path);
> diff --git a/t/t1050-large.sh b/t/t1050-large.sh
> index 4bab6a513c5..6bc1d76fb10 100755
> --- a/t/t1050-large.sh
> +++ b/t/t1050-large.sh
> @@ -17,6 +17,14 @@ test_expect_success setup '
> export GIT_ALLOC_LIMIT
> '
>
> +test_expect_success 'enter "large" codepath, with small core.bigFileThreshold' '
> + test_when_finished "rm -rf repo" &&
> +
> + git init --bare repo &&
> + echo large | git -C repo hash-object -w --stdin &&
> + git -C repo -c core.bigfilethreshold=4 fsck
> +'
> +
> # add a large file with different settings
> while read expect config
> do
* Re: [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2
2021-11-11 18:41 ` Junio C Hamano
@ 2021-11-13 9:00 ` Ævar Arnfjörð Bjarmason
0 siblings, 0 replies; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-13 9:00 UTC
To: Junio C Hamano
Cc: git, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau
On Thu, Nov 11 2021, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> Fix a regression introduced in my 96e41f58fe1 (fsck: report invalid
>> object type-path combinations, 2021-10-01). When fsck-ing blobs larger
>> than core.bigFileThreshold we'd free() a pointer to uninitialized
>> memory.
>
> s/d we'd/d, we'd/; no need to resend.
>
>> This issue would have been caught by SANITIZE=address, but since it
>> involves core.bigFileThreshold none of the existing tests in our test
>> suite covered it.
>
> s/d none/d, none/; likewise.
>
>> Running them with the "big_file_threshold" in "environment.c" changed
>> to say "6" would have shown this failure, but let's add a dedicated
>> test for this scenario based on Han Xin's report[1].
>
> Yeah, it is a good and focused test.
>
> By the way, I do not think changing big_file_threshold _blindly_ to
> smaller values, instead of in a focused test like this, is a good
> idea in general. Some tests check if a file with a normal size that
> is smaller than the threshold correctly is treated as a binary file,
> and lowering threshold for them without understanding what they are
> meant to test would trigger a "bug" that is not a bug at all, for
> example.
>
>> It would be a good follow-up change to add a GIT_TEST_* mode to run
>> all the tests with a low core.bigFileThreshold threshold.
>
> So, no, please don't do that.
Yes, it's probably not worth it, and I've got enough dragons to slay as
it is.
I took the commentary you added in 12426e114b2 (diff: do not short-cut
CHECK_SIZE_ONLY check in diff_populate_filespec(), 2017-03-01) as a
suggestion that we might be conflating too many things in
core.bigFileThreshold, but maybe that's just projecting.
I think that setting is probably too much of a kitchen-sink grab bag of
stuff for its own good. Any such GIT_TEST_* mode would, I think, need to
introduce another setting so that a low threshold doesn't also imply
"these files are binary".
That may or may not be a good idea in general, i.e. are there users who
mainly don't want these files considered for packing, but do want
"git diff" to work on them?
Anyway, even if that were split up, we'd still have the remaining tests
that assume something ends up loose or in a pack.
Fixing those is probably a good idea either way, so poking at this might
be a useful canary for someone. I haven't looked in any detail, but some
of them are probably inspecting .git/objects manually and could move to
"git rev-parse" or whatever.
The other half likely really do care about whether something ends up
loose or not, and would probably benefit from testing "both sides".
None of that's anything I'll pursue now, just idle thoughts from having
looked at these failures a bit, in case anyone's interested.
* [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller
2021-11-11 5:18 ` [PATCH 0/2] v2.34.0-rc2 regression: free() of uninitialized in ab/fsck-unexpected-type Ævar Arnfjörð Bjarmason
2021-11-11 5:18 ` [PATCH 1/2] object-file: fix SEGV on free() regression in v2.34.0-rc2 Ævar Arnfjörð Bjarmason
@ 2021-11-11 5:18 ` Ævar Arnfjörð Bjarmason
2021-11-11 18:54 ` Junio C Hamano
1 sibling, 1 reply; 245+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-11 5:18 UTC
To: git
Cc: Junio C Hamano, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak,
Taylor Blau, Ævar Arnfjörð Bjarmason
In the preceding commit, a free()-of-uninitialized-memory regression in
96e41f58fe1 (fsck: report invalid object type-path combinations,
2021-10-01) was fixed, but we'd still leak memory in fsck_loose().
Let's fix that issue too.
That issue was introduced in my 31deb28f5e0 (fsck: don't hard die on
invalid object types, 2021-10-01). It can be reproduced under
SANITIZE=leak with the test I added in 093fffdfbec (fsck tests: add
test for fsck-ing an unknown type, 2021-10-01):
./t1450-fsck.sh --run=84 -vixd
In some sense it's not a problem: we lost the same amount of memory in
terms of things malloc'd and not free'd. It just moved from the "still
reachable" to the "definitely lost" column in valgrind(1) nomenclature[1],
since we'd have die()'d before.
But now that we don't hard die() anymore in the library, let's properly
free() it. Doing so makes this code much easier to follow, since we'll
now have one function owning the freeing of the "contents" variable,
not two.
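A minimal sketch of the single-owner contract this patch moves to, using hypothetical names (read_caller_owned_sketch(), check_caller_owned_sketch()) rather than git's real functions: the callee never frees *contents, and the caller initializes it to NULL and frees it. It glosses over fsck_loose()'s success path, where the buffer is handed on for parsing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reader following the single-owner contract: it leaves
 * *contents untouched if it fails before reading anything, may hand back
 * a buffer even when it later returns an error, and never frees it.
 */
static int read_caller_owned_sketch(const char *data, int signature_ok,
				    void **contents)
{
	size_t len;

	assert(contents);
	if (!data)
		return -1; /* early failure, like a failed mmap: nothing allocated */

	len = strlen(data) + 1;
	*contents = malloc(len);
	if (!*contents)
		return -1;
	memcpy(*contents, data, len);

	/* A failed check no longer frees; the buffer still belongs to the caller. */
	return signature_ok ? 0 : -1;
}

/* Hypothetical caller: the one place that frees, on every path. */
static int check_caller_owned_sketch(const char *data, int signature_ok)
{
	void *contents = NULL; /* caller initializes, as fsck_loose() now does */
	int ret = read_caller_owned_sketch(data, signature_ok, &contents);

	free(contents); /* either NULL or a live buffer, so always safe */
	return ret;
}
```

Because the caller's NULL survives any early failure, "non-NULL after the call" always means "a live buffer I must free".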
For context on that memory management pattern, the read_loose_object()
function was added in f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) and subsequently used in c68b489e564 (fsck:
parse loose object paths directly, 2017-01-13). The pattern of making
both sides responsible for free()-ing the memory has been there in
this form since its inception.
1. https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/fsck.c | 3 ++-
object-file.c | 7 ++-----
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index d87c28a1cc4..27b9e78094d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -605,7 +605,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
struct object *obj;
enum object_type type = OBJ_NONE;
unsigned long size;
- void *contents;
+ void *contents = NULL;
int eaten;
struct object_info oi = OBJECT_INFO_INIT;
struct object_id real_oid = *null_oid();
@@ -630,6 +630,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
path);
if (err < 0) {
errors_found |= ERROR_OBJECT;
+ free(contents);
return 0; /* keep checking other objects */
}
diff --git a/object-file.c b/object-file.c
index ac476653a06..c3d866a287e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2528,8 +2528,6 @@ int read_loose_object(const char *path,
char hdr[MAX_HEADER_LEN];
unsigned long *size = oi->sizep;
- *contents = NULL;
-
map = map_loose_object_1(the_repository, path, NULL, &mapsize);
if (!map) {
error_errno(_("unable to mmap %s"), path);
@@ -2559,10 +2557,9 @@ int read_loose_object(const char *path,
goto out;
}
if (check_object_signature(the_repository, expected_oid,
- *contents, *size, oi->type_name->buf, real_oid)) {
- free(*contents);
+ *contents, *size,
+ oi->type_name->buf, real_oid))
goto out;
- }
}
ret = 0; /* everything checks out */
--
2.34.0.rc2.795.g926201d1cc8
* Re: [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller
2021-11-11 5:18 ` [PATCH 2/2] object-file: free(*contents) only in read_loose_object() caller Ævar Arnfjörð Bjarmason
@ 2021-11-11 18:54 ` Junio C Hamano
0 siblings, 0 replies; 245+ messages in thread
From: Junio C Hamano @ 2021-11-11 18:54 UTC
To: Ævar Arnfjörð Bjarmason
Cc: git, Han Xin, Jeff King, Jonathan Tan, Andrei Rybak, Taylor Blau
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index d87c28a1cc4..27b9e78094d 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -605,7 +605,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> struct object *obj;
> enum object_type type = OBJ_NONE;
> unsigned long size;
> - void *contents;
> + void *contents = NULL;
> int eaten;
> struct object_info oi = OBJECT_INFO_INIT;
> struct object_id real_oid = *null_oid();
> @@ -630,6 +630,7 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
> path);
> if (err < 0) {
> errors_found |= ERROR_OBJECT;
> + free(contents);
> return 0; /* keep checking other objects */
> }
>
> diff --git a/object-file.c b/object-file.c
> index ac476653a06..c3d866a287e 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2528,8 +2528,6 @@ int read_loose_object(const char *path,
> char hdr[MAX_HEADER_LEN];
> unsigned long *size = oi->sizep;
>
> - *contents = NULL;
> -
> map = map_loose_object_1(the_repository, path, NULL, &mapsize);
> if (!map) {
> error_errno(_("unable to mmap %s"), path);
> @@ -2559,10 +2557,9 @@ int read_loose_object(const char *path,
> goto out;
> }
> if (check_object_signature(the_repository, expected_oid,
> - *contents, *size, oi->type_name->buf, real_oid)) {
> - free(*contents);
> + *contents, *size,
> + oi->type_name->buf, real_oid))
> goto out;
> - }
> }
Yeah, I have to say that read_loose_object() freeing *contents
without clearing *contents to NULL, only because it wants to signal
whether the failure came from the check_object_signature() step, is
quite ugly. Making the caller responsible for freeing (in other words,
when the caller's *contents is non-NULL after the function returns, it
always points to a valid piece of memory to be freed) makes the
contract easier to explain.