All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] pack-objects: missing tests & --stdin-packs segfault fix
@ 2021-06-21 15:03 Ævar Arnfjörð Bjarmason
  2021-06-21 15:03 ` [PATCH 1/2] pack-objects tests: cover blindspots in stdin handling Ævar Arnfjörð Bjarmason
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-21 15:03 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

When re-rolling an unrelated series[1] dealing with pack-objects.c and
revision.c I discovered that we have some test blindspots, and that
the newly added --stdin-packs option in v2.32.0 will segfault if fed
garbage data.

This fixes the test blindspots, and 2/2 fixes the segfault.

As discussed in its commit message I'm being lazy about emitting the
error message. If you supply N bogus lines on stdin we'll error on the
first one, since the input is first sorted by the string-list.c
API. The test case for the error message relies on which of two SHA
lines sorts first, and I picked input that happens to sort the same
way under both SHA-1 and SHA-256.

Lazy, but I figured for this use-case it wasn't worth keeping track of
what line we saw when, or to refactor the parsing check on pack names
as we get input lines.

1. https://lore.kernel.org/git/cover-0.4-0000000000-20210617T105537Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (2):
  pack-objects tests: cover blindspots in stdin handling
  pack-objects: fix segfault in --stdin-packs option

 builtin/pack-objects.c |  10 ++++
 t/t5300-pack-object.sh | 103 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 113 insertions(+)

-- 
2.32.0.599.g3967b4fa4ac


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] pack-objects tests: cover blindspots in stdin handling
  2021-06-21 15:03 [PATCH 0/2] pack-objects: missing tests & --stdin-packs segfault fix Ævar Arnfjörð Bjarmason
@ 2021-06-21 15:03 ` Ævar Arnfjörð Bjarmason
  2021-06-21 15:03 ` [PATCH 2/2] pack-objects: fix segfault in --stdin-packs option Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-21 15:03 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Cover blindspots in the testing of stdin handling, including the
"!len" condition added in b5d97e6b0a0 (pack-objects: run rev-list
equivalent internally., 2006-09-04). The codepath taken with --revs
and read_object_list_from_stdin() acts differently in some of these
common cases, let's test for those.

The "--stdin --revs" test being added here stresses the combination of
--stdin-packs and the revision.c --stdin argument, some of this was
covered in a test added in 339bce27f4f (builtin/pack-objects.c: add
'--stdin-packs' option, 2021-02-22), but let's make sure that
GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS=true keeps erroring out about
--stdin, and it isn't picked up by the revision.c API's handling of
that option.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5300-pack-object.sh | 85 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 5c5e53f0be9..65e991e3706 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -34,6 +34,91 @@ test_expect_success 'setup' '
 	} >expect
 '
 
+test_expect_success 'setup pack-object <stdin' '
+	git init pack-object-stdin &&
+	test_commit -C pack-object-stdin one &&
+	test_commit -C pack-object-stdin two
+
+'
+
+test_expect_success 'pack-object <stdin parsing: basic [|--revs]' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+	EOF
+
+	git -C pack-object-stdin pack-objects basic-stdin <in &&
+	idx=$(echo pack-object-stdin/basic-stdin-*.idx) &&
+	git show-index <"$idx" >actual &&
+	test_line_count = 1 actual &&
+
+	git -C pack-object-stdin pack-objects --revs basic-stdin-revs <in &&
+	idx=$(echo pack-object-stdin/basic-stdin-revs-*.idx) &&
+	git show-index <"$idx" >actual &&
+	test_line_count = 3 actual
+'
+
+test_expect_success 'pack-object <stdin parsing: [|--revs] bad line' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+	garbage
+	$(git -C pack-object-stdin rev-parse two)
+	EOF
+
+	sed "s/^> //g" >err.expect <<-EOF &&
+	fatal: expected object ID, got garbage:
+	>  garbage
+
+	EOF
+	test_must_fail git -C pack-object-stdin pack-objects bad-line-stdin <in 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	cat >err.expect <<-EOF &&
+	fatal: bad revision '"'"'garbage'"'"'
+	EOF
+	test_must_fail git -C pack-object-stdin pack-objects --revs bad-line-stdin-revs <in 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'pack-object <stdin parsing: [|--revs] empty line' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+
+	$(git -C pack-object-stdin rev-parse two)
+	EOF
+
+	sed -e "s/^> //g" -e "s/Z$//g" >err.expect <<-EOF &&
+	fatal: expected object ID, got garbage:
+	>  Z
+
+	EOF
+	test_must_fail git -C pack-object-stdin pack-objects empty-line-stdin <in 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	git -C pack-object-stdin pack-objects --revs empty-line-stdin-revs <in &&
+	idx=$(echo pack-object-stdin/empty-line-stdin-revs-*.idx) &&
+	git show-index <"$idx" >actual &&
+	test_line_count = 3 actual
+'
+
+test_expect_success 'pack-object <stdin parsing: [|--revs] with --stdin' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+	$(git -C pack-object-stdin rev-parse two)
+	EOF
+
+	# There is the "--stdin-packs is incompatible with --revs"
+	# test below, but we should make sure that the revision.c
+	# --stdin is not picked up
+	cat >err.expect <<-EOF &&
+	fatal: disallowed abbreviated or ambiguous option '"'"'stdin'"'"'
+	EOF
+	test_must_fail git -C pack-object-stdin pack-objects stdin-with-stdin-option --stdin <in 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git -C pack-object-stdin pack-objects --stdin --revs stdin-with-stdin-option-revs 2>err.actual <in &&
+	test_cmp err.expect err.actual
+'
+
 # usage: check_deltas <stderr_from_pack_objects> <cmp_op> <nr_deltas>
 # e.g.: check_deltas stderr -gt 0
 check_deltas() {
-- 
2.32.0.599.g3967b4fa4ac


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] pack-objects: fix segfault in --stdin-packs option
  2021-06-21 15:03 [PATCH 0/2] pack-objects: missing tests & --stdin-packs segfault fix Ævar Arnfjörð Bjarmason
  2021-06-21 15:03 ` [PATCH 1/2] pack-objects tests: cover blindspots in stdin handling Ævar Arnfjörð Bjarmason
@ 2021-06-21 15:03 ` Ævar Arnfjörð Bjarmason
  2021-06-21 20:33   ` Taylor Blau
  2021-06-21 20:34 ` [PATCH 0/2] pack-objects: missing tests & --stdin-packs segfault fix Taylor Blau
  2021-07-09 10:13 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-21 15:03 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Fix a segfault in the --stdin-packs option added in
339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option,
2021-02-22). The read_packs_list_from_stdin() function didn't check
that the lines it was reading were valid packs, and thus when doing
the QSORT() with pack_mtime_cmp() we'd have a NULL "util" field.

The logic error was in assuming that we could iterate all packs and
annotate the excluded and included packs we got, as opposed to
checking the lines we got on stdin. There was a check for excluded
packs, but included packs were simply assumed to be valid.

As noted in the test we'll not report the first bad line, but whatever
line sorted first according to the string-list.c API. In this case I
think that's fine.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/pack-objects.c | 10 ++++++++++
 t/t5300-pack-object.sh | 18 ++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index de00adbb9e0..65579e09fe0 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3310,6 +3310,16 @@ static void read_packs_list_from_stdin(void)
 			item->util = p;
 	}
 
+	/*
+	 * Arguments we got on stdin may not even be packs. Check that
+	 * to avoid segfaulting later on in e.g. pack_mtime_cmp().
+	 */
+	for_each_string_list_item(item, &include_packs) {
+		struct packed_git *p = item->util;
+		if (!p)
+			die(_("could not find pack '%s'"), item->string);
+	}
+
 	/*
 	 * First handle all of the excluded packs, marking them as kept in-core
 	 * so that later calls to add_object_entry() discards any objects that
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 65e991e3706..330deec656b 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -119,6 +119,24 @@ test_expect_success 'pack-object <stdin parsing: [|--revs] with --stdin' '
 	test_cmp err.expect err.actual
 '
 
+test_expect_success 'pack-object <stdin parsing: --stdin-packs handles garbage' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+	$(git -C pack-object-stdin rev-parse two)
+	EOF
+
+	# We actually just report the first bad line in strcmp()
+	# order, it just so happens that we get the same result under
+	# SHA-1 and SHA-256 here. It does not really matter that we
+	# report the first bad item in this obscure case, so this
+	# oddity of the test is OK.
+	cat >err.expect <<-EOF &&
+	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
+	EOF
+	test_must_fail git -C pack-object-stdin pack-objects stdin-with-stdin-option --stdin-packs <in 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
 # usage: check_deltas <stderr_from_pack_objects> <cmp_op> <nr_deltas>
 # e.g.: check_deltas stderr -gt 0
 check_deltas() {
-- 
2.32.0.599.g3967b4fa4ac


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] pack-objects: fix segfault in --stdin-packs option
  2021-06-21 15:03 ` [PATCH 2/2] pack-objects: fix segfault in --stdin-packs option Ævar Arnfjörð Bjarmason
@ 2021-06-21 20:33   ` Taylor Blau
  0 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2021-06-21 20:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Derrick Stolee

On Mon, Jun 21, 2021 at 05:03:38PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a segfault in the --stdin-packs option added in
> 339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option,
> 2021-02-22). The read_packs_list_from_stdin() function didn't check
> that the lines it was reading were valid packs, and thus when doing
> the QSORT() with pack_mtime_cmp() we'd have a NULL "util" field.

It may be worth mentioning that the util pointer is used to associate
the names of included/excluded packs with the packed_git structs they
correspond to. I see it's mentioned in the very next paragraph, but it
may be helpful for other readers to see this information earlier.

> The logic error was in assuming that we could iterate all packs and
> annotate the excluded and included packs we got, as opposed to
> checking the lines we got on stdin. There was a check for excluded
> packs, but included packs were simply assumed to be valid.
>
> As noted in the test we'll not report the first bad line, but whatever
> line sorted first according to the string-list.c API. In this case I
> think that's fine.

Yeah. There isn't really a better way to do that since we don't have a
convenient function to look up packs by their name. Much more convenient
is to loop through all packs and assign them to entries in the
string_list one by one. That's O(n*log(n)), but it doesn't really matter
here since we expect n to be small-ish, and this is by far not the most
expensive part of writing a pack.

You could imagine doing something O(n^2) by looping through all packs
each time you receive a line of input. That performs worse, but arguably
provides a better experience when using this mode interactively. But
that is probably a relatively rare occurrence, so it likely doesn't
matter.

Equally, you could build a mapping from pack name to packed_git struct
ahead of time, and then do the lookups in constant time. That's linear,
of course, but you pay for it in memory. Honestly, the memory cost is
probably quite reasonable, but it may not be worth the effort, since I
suspect the vast majority of usage here is from 'git repack
--geometric'.


> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/pack-objects.c | 10 ++++++++++
>  t/t5300-pack-object.sh | 18 ++++++++++++++++++
>  2 files changed, 28 insertions(+)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index de00adbb9e0..65579e09fe0 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3310,6 +3310,16 @@ static void read_packs_list_from_stdin(void)
>  			item->util = p;
>  	}
>
> +	/*
> +	 * Arguments we got on stdin may not even be packs. Check that
> +	 * to avoid segfaulting later on in e.g. pack_mtime_cmp().
> +	 */

Could be worth adding "excluded packs are handled below".

> +	for_each_string_list_item(item, &include_packs) {
> +		struct packed_git *p = item->util;
> +		if (!p)
> +			die(_("could not find pack '%s'"), item->string);
> +	}
> +
>  	/*
>  	 * First handle all of the excluded packs, marking them as kept in-core

...and it may be worth updating this comment with s/First/Then.

> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> index 65e991e3706..330deec656b 100755
> --- a/t/t5300-pack-object.sh
> +++ b/t/t5300-pack-object.sh
> @@ -119,6 +119,24 @@ test_expect_success 'pack-object <stdin parsing: [|--revs] with --stdin' '
>  	test_cmp err.expect err.actual
>  '
>
> +test_expect_success 'pack-object <stdin parsing: --stdin-packs handles garbage' '
> +	cat >in <<-EOF &&
> +	$(git -C pack-object-stdin rev-parse one)
> +	$(git -C pack-object-stdin rev-parse two)
> +	EOF

It's not a big deal, but here-doc directly into `git pack-objects` is
much more common in t5300 than first redirecting it to a separate file.
I probably would have written (in a sub-shell to avoid -C
pack-object-stdin everywhere):


  cd pack-object-stdin &&
  test_must_fail git pack-objects --stdout --stdin-packs >/dev/null 2>actual <<-EOF
  $(git rev-parse one)
  $(git rev-parse two)
  EOF

Although the line is kind of long anyway (and it'd be even longer since
the subshell will get its own level of indentation). So I could entirely
buy that you did this for readability, which is fine by me.

> +
> +	# We actually just report the first bad line in strcmp()
> +	# order, it just so happens that we get the same result under
> +	# SHA-1 and SHA-256 here. It does not really matter that we
> +	# report the first bad item in this obscure case, so this
> +	# oddity of the test is OK.
> +	cat >err.expect <<-EOF &&
> +	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
> +	EOF
> +	test_must_fail git -C pack-object-stdin pack-objects stdin-with-stdin-option --stdin-packs <in 2>err.actual &&
> +	test_cmp err.expect err.actual

If we don't care which is reported (and it just so happens that we'll
get the first one in lexical order), I would be fine with

    test_i18ngrep "could not find pack" err.actual

too. It would be good to get rid of this comment and put it in the patch
message in more detail (instead of just referring to it as "[a]s noted
in the test".

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/2] pack-objects: missing tests & --stdin-packs segfault fix
  2021-06-21 15:03 [PATCH 0/2] pack-objects: missing tests & --stdin-packs segfault fix Ævar Arnfjörð Bjarmason
  2021-06-21 15:03 ` [PATCH 1/2] pack-objects tests: cover blindspots in stdin handling Ævar Arnfjörð Bjarmason
  2021-06-21 15:03 ` [PATCH 2/2] pack-objects: fix segfault in --stdin-packs option Ævar Arnfjörð Bjarmason
@ 2021-06-21 20:34 ` Taylor Blau
  2021-07-09 10:13 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
  3 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2021-06-21 20:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Derrick Stolee

On Mon, Jun 21, 2021 at 05:03:36PM +0200, Ævar Arnfjörð Bjarmason wrote:
> When re-rolling an unrelated series[1] dealing with pack-objects.c and
> revision.c I discovered that we have some test blindspots, and that
> the newly added --stdin-packs option in v2.32.0 will segfault if fed
> garbage data.
>
> This fixes the test blindspots, and 2/2 fixes the segfault.

Thanks. I took a close look at the second patch, and it looks good to me
with a few minor comments. The first patch looks good as well, although
I didn't look as closely.

> As discussed in its commit message I'm being lazy about emitting the
> error message. If you supply N bogus lines on stdin we'll error on the
> first one, since the input is first sorted by the string-list.c
> API. The test case for the error message relies on which of two SHA
> lines sorts first, and I picked input that happens to sort the same
> way under both SHA-1 and SHA-256.
>
> Lazy, but I figured for this use-case it wasn't worth keeping track of
> what line we saw when, or to refactor the parsing check on pack names
> as we get input lines.

Yeah. I think what you wrote is entirely reasonable, too. I suggested
some alternatives if you are feeling motivated to make the error
reporting nicer, but as you say, I think the vast majority of use-cases
don't care about the output.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 0/2] pack-objects: missing tests & --stdin-packs segfault fix
  2021-06-21 15:03 [PATCH 0/2] pack-objects: missing tests & --stdin-packs segfault fix Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2021-06-21 20:34 ` [PATCH 0/2] pack-objects: missing tests & --stdin-packs segfault fix Taylor Blau
@ 2021-07-09 10:13 ` Ævar Arnfjörð Bjarmason
  2021-07-09 10:13   ` [PATCH v2 1/2] pack-objects tests: cover blindspots in stdin handling Ævar Arnfjörð Bjarmason
  2021-07-09 10:13   ` [PATCH v2 2/2] pack-objects: fix segfault in --stdin-packs option Ævar Arnfjörð Bjarmason
  3 siblings, 2 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-09 10:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

When re-rolling an unrelated series[1] dealing with pack-objects.c and
revision.c I discovered that we have some test blindspots, and that
the newly added --stdin-packs option in v2.32.0 will segfault if fed
garbage data.

See
http://lore.kernel.org/git/cover-0.2-00000000000-20210621T145819Z-avarab@gmail.com
for the v1 of this series.

This hopefully addresses Taylor's comments in
https://lore.kernel.org/git/YND3h2l10PlnSNGJ@nand.local/

I didn't end up moving away from the "<in" pattern. I prefer it
because it makes manual inspection easier, and the tests above this
one used it consistently, so I left it in place.

I also ended up keeping the test_cmp for the reasons noted in the
updated 2/2 commit message. I.e. if we're commeting on this tricky
edge case let's have our tests assert that the code works that way,
and that the comment still holds.

Ævar Arnfjörð Bjarmason (2):
  pack-objects tests: cover blindspots in stdin handling
  pack-objects: fix segfault in --stdin-packs option

 builtin/pack-objects.c |  23 +++++++--
 t/t5300-pack-object.sh | 104 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 124 insertions(+), 3 deletions(-)

Range-diff against v1:
1:  dd2efd1cfb9 = 1:  8a4d4b820e7 pack-objects tests: cover blindspots in stdin handling
2:  a9702132385 ! 2:  c7315f2b378 pack-objects: fix segfault in --stdin-packs option
    @@ Commit message
     
         Fix a segfault in the --stdin-packs option added in
         339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option,
    -    2021-02-22). The read_packs_list_from_stdin() function didn't check
    -    that the lines it was reading were valid packs, and thus when doing
    -    the QSORT() with pack_mtime_cmp() we'd have a NULL "util" field.
    +    2021-02-22).
    +
    +    The read_packs_list_from_stdin() function didn't check that the lines
    +    it was reading were valid packs, and thus when doing the QSORT() with
    +    pack_mtime_cmp() we'd have a NULL "util" field. The "util" field is
    +    used to associate the names of included/excluded packs with the
    +    packed_git structs they correspond to.
     
         The logic error was in assuming that we could iterate all packs and
         annotate the excluded and included packs we got, as opposed to
    @@ Commit message
     
         As noted in the test we'll not report the first bad line, but whatever
         line sorted first according to the string-list.c API. In this case I
    -    think that's fine.
    +    think that's fine. There was further discussion of alternate
    +    approaches in [1].
    +
    +    Even though we're being lazy let's assert the line we do expect to get
    +    in the test, since whoever changes this code in the future might miss
    +    this case, and would want to update the test and comments.
    +
    +    1. http://lore.kernel.org/git/YND3h2l10PlnSNGJ@nand.local
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/pack-objects.c ##
     @@ builtin/pack-objects.c: static void read_packs_list_from_stdin(void)
    - 			item->util = p;
      	}
      
    -+	/*
    -+	 * Arguments we got on stdin may not even be packs. Check that
    -+	 * to avoid segfaulting later on in e.g. pack_mtime_cmp().
    + 	/*
    +-	 * First handle all of the excluded packs, marking them as kept in-core
    +-	 * so that later calls to add_object_entry() discards any objects that
    +-	 * are also found in excluded packs.
    ++	 * Arguments we got on stdin may not even be packs. First
    ++	 * check that to avoid segfaulting later on in
    ++	 * e.g. pack_mtime_cmp(), excluded packs are handled below.
    ++	 *
    ++	 * Since we first parsed our STDIN and then sorted the input
    ++	 * lines the pack we error on will be whatever line happens to
    ++	 * sort first. This is lazy, it's enough that we report one
    ++	 * bad case here, we don't need to report the first/last one,
    ++	 * or all of them.
     +	 */
     +	for_each_string_list_item(item, &include_packs) {
     +		struct packed_git *p = item->util;
    @@ builtin/pack-objects.c: static void read_packs_list_from_stdin(void)
     +			die(_("could not find pack '%s'"), item->string);
     +	}
     +
    - 	/*
    - 	 * First handle all of the excluded packs, marking them as kept in-core
    - 	 * so that later calls to add_object_entry() discards any objects that
    ++	/*
    ++	 * Then, handle all of the excluded packs, marking them as
    ++	 * kept in-core so that later calls to add_object_entry()
    ++	 * discards any objects that are also found in excluded packs.
    + 	 */
    + 	for_each_string_list_item(item, &exclude_packs) {
    + 		struct packed_git *p = item->util;
     
      ## t/t5300-pack-object.sh ##
     @@ t/t5300-pack-object.sh: test_expect_success 'pack-object <stdin parsing: [|--revs] with --stdin' '
    @@ t/t5300-pack-object.sh: test_expect_success 'pack-object <stdin parsing: [|--rev
     +	$(git -C pack-object-stdin rev-parse two)
     +	EOF
     +
    -+	# We actually just report the first bad line in strcmp()
    -+	# order, it just so happens that we get the same result under
    -+	# SHA-1 and SHA-256 here. It does not really matter that we
    -+	# report the first bad item in this obscure case, so this
    -+	# oddity of the test is OK.
    ++	# That we get "two" and not "one" has to do with OID
    ++	# ordering. It happens to be the same here under SHA-1 and
    ++	# SHA-256. See commentary in pack-objects.c
     +	cat >err.expect <<-EOF &&
     +	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
     +	EOF
    -+	test_must_fail git -C pack-object-stdin pack-objects stdin-with-stdin-option --stdin-packs <in 2>err.actual &&
    ++	test_must_fail git \
    ++		-C pack-object-stdin \
    ++		pack-objects stdin-with-stdin-option --stdin-packs \
    ++		<in 2>err.actual &&
     +	test_cmp err.expect err.actual
     +'
     +
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/2] pack-objects tests: cover blindspots in stdin handling
  2021-07-09 10:13 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
@ 2021-07-09 10:13   ` Ævar Arnfjörð Bjarmason
  2021-07-09 10:13   ` [PATCH v2 2/2] pack-objects: fix segfault in --stdin-packs option Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-09 10:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Cover blindspots in the testing of stdin handling, including the
"!len" condition added in b5d97e6b0a0 (pack-objects: run rev-list
equivalent internally., 2006-09-04). The codepath taken with --revs
and read_object_list_from_stdin() acts differently in some of these
common cases, let's test for those.

The "--stdin --revs" test being added here stresses the combination of
--stdin-packs and the revision.c --stdin argument, some of this was
covered in a test added in 339bce27f4f (builtin/pack-objects.c: add
'--stdin-packs' option, 2021-02-22), but let's make sure that
GIT_TEST_DISALLOW_ABBREVIATED_OPTIONS=true keeps erroring out about
--stdin, and it isn't picked up by the revision.c API's handling of
that option.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5300-pack-object.sh | 85 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 5c5e53f0be9..65e991e3706 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -34,6 +34,91 @@ test_expect_success 'setup' '
 	} >expect
 '
 
+test_expect_success 'setup pack-object <stdin' '
+	git init pack-object-stdin &&
+	test_commit -C pack-object-stdin one &&
+	test_commit -C pack-object-stdin two
+
+'
+
+test_expect_success 'pack-object <stdin parsing: basic [|--revs]' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+	EOF
+
+	git -C pack-object-stdin pack-objects basic-stdin <in &&
+	idx=$(echo pack-object-stdin/basic-stdin-*.idx) &&
+	git show-index <"$idx" >actual &&
+	test_line_count = 1 actual &&
+
+	git -C pack-object-stdin pack-objects --revs basic-stdin-revs <in &&
+	idx=$(echo pack-object-stdin/basic-stdin-revs-*.idx) &&
+	git show-index <"$idx" >actual &&
+	test_line_count = 3 actual
+'
+
+test_expect_success 'pack-object <stdin parsing: [|--revs] bad line' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+	garbage
+	$(git -C pack-object-stdin rev-parse two)
+	EOF
+
+	sed "s/^> //g" >err.expect <<-EOF &&
+	fatal: expected object ID, got garbage:
+	>  garbage
+
+	EOF
+	test_must_fail git -C pack-object-stdin pack-objects bad-line-stdin <in 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	cat >err.expect <<-EOF &&
+	fatal: bad revision '"'"'garbage'"'"'
+	EOF
+	test_must_fail git -C pack-object-stdin pack-objects --revs bad-line-stdin-revs <in 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'pack-object <stdin parsing: [|--revs] empty line' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+
+	$(git -C pack-object-stdin rev-parse two)
+	EOF
+
+	sed -e "s/^> //g" -e "s/Z$//g" >err.expect <<-EOF &&
+	fatal: expected object ID, got garbage:
+	>  Z
+
+	EOF
+	test_must_fail git -C pack-object-stdin pack-objects empty-line-stdin <in 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	git -C pack-object-stdin pack-objects --revs empty-line-stdin-revs <in &&
+	idx=$(echo pack-object-stdin/empty-line-stdin-revs-*.idx) &&
+	git show-index <"$idx" >actual &&
+	test_line_count = 3 actual
+'
+
+test_expect_success 'pack-object <stdin parsing: [|--revs] with --stdin' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+	$(git -C pack-object-stdin rev-parse two)
+	EOF
+
+	# There is the "--stdin-packs is incompatible with --revs"
+	# test below, but we should make sure that the revision.c
+	# --stdin is not picked up
+	cat >err.expect <<-EOF &&
+	fatal: disallowed abbreviated or ambiguous option '"'"'stdin'"'"'
+	EOF
+	test_must_fail git -C pack-object-stdin pack-objects stdin-with-stdin-option --stdin <in 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git -C pack-object-stdin pack-objects --stdin --revs stdin-with-stdin-option-revs 2>err.actual <in &&
+	test_cmp err.expect err.actual
+'
+
 # usage: check_deltas <stderr_from_pack_objects> <cmp_op> <nr_deltas>
 # e.g.: check_deltas stderr -gt 0
 check_deltas() {
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/2] pack-objects: fix segfault in --stdin-packs option
  2021-07-09 10:13 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
  2021-07-09 10:13   ` [PATCH v2 1/2] pack-objects tests: cover blindspots in stdin handling Ævar Arnfjörð Bjarmason
@ 2021-07-09 10:13   ` Ævar Arnfjörð Bjarmason
  2021-07-19 21:31     ` Taylor Blau
  1 sibling, 1 reply; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-09 10:13 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Jeff King, Derrick Stolee,
	Ævar Arnfjörð Bjarmason

Fix a segfault in the --stdin-packs option added in
339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option,
2021-02-22).

The read_packs_list_from_stdin() function didn't check that the lines
it was reading were valid packs, and thus when doing the QSORT() with
pack_mtime_cmp() we'd have a NULL "util" field. The "util" field is
used to associate the names of included/excluded packs with the
packed_git structs they correspond to.

The logic error was in assuming that we could iterate all packs and
annotate the excluded and included packs we got, as opposed to
checking the lines we got on stdin. There was a check for excluded
packs, but included packs were simply assumed to be valid.

As noted in the test we'll not report the first bad line, but whatever
line sorted first according to the string-list.c API. In this case I
think that's fine. There was further discussion of alternate
approaches in [1].

Even though we're being lazy let's assert the line we do expect to get
in the test, since whoever changes this code in the future might miss
this case, and would want to update the test and comments.

1. http://lore.kernel.org/git/YND3h2l10PlnSNGJ@nand.local

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/pack-objects.c | 23 ++++++++++++++++++++---
 t/t5300-pack-object.sh | 19 +++++++++++++++++++
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index de00adbb9e0..df49f656b96 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3311,9 +3311,26 @@ static void read_packs_list_from_stdin(void)
 	}
 
 	/*
-	 * First handle all of the excluded packs, marking them as kept in-core
-	 * so that later calls to add_object_entry() discards any objects that
-	 * are also found in excluded packs.
+	 * Arguments we got on stdin may not even be packs. First
+	 * check that to avoid segfaulting later on in
+	 * e.g. pack_mtime_cmp(), excluded packs are handled below.
+	 *
+	 * Since we first parsed our STDIN and then sorted the input
+	 * lines the pack we error on will be whatever line happens to
+	 * sort first. This is lazy, it's enough that we report one
+	 * bad case here, we don't need to report the first/last one,
+	 * or all of them.
+	 */
+	for_each_string_list_item(item, &include_packs) {
+		struct packed_git *p = item->util;
+		if (!p)
+			die(_("could not find pack '%s'"), item->string);
+	}
+
+	/*
+	 * Then, handle all of the excluded packs, marking them as
+	 * kept in-core so that later calls to add_object_entry()
+	 * discards any objects that are also found in excluded packs.
 	 */
 	for_each_string_list_item(item, &exclude_packs) {
 		struct packed_git *p = item->util;
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 65e991e3706..e13a8842075 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -119,6 +119,25 @@ test_expect_success 'pack-object <stdin parsing: [|--revs] with --stdin' '
 	test_cmp err.expect err.actual
 '
 
+test_expect_success 'pack-object <stdin parsing: --stdin-packs handles garbage' '
+	cat >in <<-EOF &&
+	$(git -C pack-object-stdin rev-parse one)
+	$(git -C pack-object-stdin rev-parse two)
+	EOF
+
+	# That we get "two" and not "one" has to do with OID
+	# ordering. It happens to be the same here under SHA-1 and
+	# SHA-256. See commentary in pack-objects.c
+	cat >err.expect <<-EOF &&
+	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
+	EOF
+	test_must_fail git \
+		-C pack-object-stdin \
+		pack-objects stdin-with-stdin-option --stdin-packs \
+		<in 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
 # usage: check_deltas <stderr_from_pack_objects> <cmp_op> <nr_deltas>
 # e.g.: check_deltas stderr -gt 0
 check_deltas() {
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] pack-objects: fix segfault in --stdin-packs option
  2021-07-09 10:13   ` [PATCH v2 2/2] pack-objects: fix segfault in --stdin-packs option Ævar Arnfjörð Bjarmason
@ 2021-07-19 21:31     ` Taylor Blau
  2021-07-20 11:55       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 11+ messages in thread
From: Taylor Blau @ 2021-07-19 21:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Derrick Stolee

On Fri, Jul 09, 2021 at 12:13:48PM +0200, Ævar Arnfjörð Bjarmason wrote:

Thanks for the update, and sorry that it took me so long to get to. I
see that this still hasn't quite made its way to 'next', so I'll just
add one comment.

> +test_expect_success 'pack-object <stdin parsing: --stdin-packs handles garbage' '
> +	cat >in <<-EOF &&
> +	$(git -C pack-object-stdin rev-parse one)
> +	$(git -C pack-object-stdin rev-parse two)
> +	EOF

I see that you left my suggestion to inline this here-doc with the
actual 'pack-objects' invocation below alone, which is fine. I think
that it does help the readability, too, since it separates the input
from the command its being fed to.

> +	# That we get "two" and not "one" has to do with OID
> +	# ordering. It happens to be the same here under SHA-1 and
> +	# SHA-256. See commentary in pack-objects.c
> +	cat >err.expect <<-EOF &&
> +	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
> +	EOF

On the other hand, crafting this err.expect with one of the object's
full OID still sits funny with me. I appreciate you checking that this
is the correct object to test with in SHA-1 and SHA-256 mode, but isn't
the point that we shouldn't be relying on which object comes out?

I think that dropping this down to just something like:

    grep 'could not find' err.actual

would be an improvement since it avoids the finicky shell quoting,
hardens this test in the event of a future change in hashing algorithm,
and brings the test more in line with the spirit of the patch itself
(which is to report some of its input, not necessarily the first one
given).


Thanks,
Taylor

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] pack-objects: fix segfault in --stdin-packs option
  2021-07-19 21:31     ` Taylor Blau
@ 2021-07-20 11:55       ` Ævar Arnfjörð Bjarmason
  2021-07-20 16:58         ` Taylor Blau
  0 siblings, 1 reply; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-20 11:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Derrick Stolee


On Mon, Jul 19 2021, Taylor Blau wrote:

> On Fri, Jul 09, 2021 at 12:13:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> Thanks for the update, and sorry that it took me so long to get to. I
> see that this still hasn't quite made its way to 'next', so I'll just
> add one comment.
>
>> +test_expect_success 'pack-object <stdin parsing: --stdin-packs handles garbage' '
>> +	cat >in <<-EOF &&
>> +	$(git -C pack-object-stdin rev-parse one)
>> +	$(git -C pack-object-stdin rev-parse two)
>> +	EOF
>
> I see that you left my suggestion to inline this here-doc with the
> actual 'pack-objects' invocation below alone, which is fine. I think
> that it does help the readability, too, since it separates the input
> from the command its being fed to.

Yeah, per CL:
    
    I didn't end up moving away from the "<in" pattern. I prefer it
    because it makes manual inspection easier, and the tests above this
    one used it consistently, so I left it in place.

>> +	# That we get "two" and not "one" has to do with OID
>> +	# ordering. It happens to be the same here under SHA-1 and
>> +	# SHA-256. See commentary in pack-objects.c
>> +	cat >err.expect <<-EOF &&
>> +	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
>> +	EOF
>
> On the other hand, crafting this err.expect with one of the object's
> full OID still sits funny with me. I appreciate you checking that this
> is the correct object to test with in SHA-1 and SHA-256 mode, but isn't
> the point that we shouldn't be relying on which object comes out?
>
> I think that dropping this down to just something like:
>
>     grep 'could not find' err.actual
>
> would be an improvement since it avoids the finicky shell quoting,
> hardens this test in the event of a future change in hashing algorithm,
> and brings the test more in line with the spirit of the patch itself
> (which is to report some of its input, not necessarily the first one
> given).

If we've got another hash transition (unlikely, at least near-ish term)
we can just look at this test again.

More plausibly it's a common pattern in our test suite that greps like
that elide actual problems, e.g. a loop printing the error N times, that
seems more likely in this case.

Do you mind if it's just left as it is?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] pack-objects: fix segfault in --stdin-packs option
  2021-07-20 11:55       ` Ævar Arnfjörð Bjarmason
@ 2021-07-20 16:58         ` Taylor Blau
  0 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2021-07-20 16:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, Junio C Hamano, Jeff King, Derrick Stolee

On Tue, Jul 20, 2021 at 01:55:31PM +0200, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Jul 19 2021, Taylor Blau wrote:
> > On Fri, Jul 09, 2021 at 12:13:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> +test_expect_success 'pack-object <stdin parsing: --stdin-packs handles garbage' '
> >> +	cat >in <<-EOF &&
> >> +	$(git -C pack-object-stdin rev-parse one)
> >> +	$(git -C pack-object-stdin rev-parse two)
> >> +	EOF
> >
> > I see that you left my suggestion to inline this here-doc with the
> > actual 'pack-objects' invocation below alone, which is fine. I think
> > that it does help the readability, too, since it separates the input
> > from the command its being fed to.
>
> Yeah, per CL:
>
>     I didn't end up moving away from the "<in" pattern. I prefer it
>     because it makes manual inspection easier, and the tests above this
>     one used it consistently, so I left it in place.

Ah, my apologies for reading right past it. Thanks!

> >> +	# That we get "two" and not "one" has to do with OID
> >> +	# ordering. It happens to be the same here under SHA-1 and
> >> +	# SHA-256. See commentary in pack-objects.c
> >> +	cat >err.expect <<-EOF &&
> >> +	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
> >> +	EOF
> >
> > On the other hand, crafting this err.expect with one of the object's
> > full OID still sits funny with me. I appreciate you checking that this
> > is the correct object to test with in SHA-1 and SHA-256 mode, but isn't
> > the point that we shouldn't be relying on which object comes out?
> >
> > I think that dropping this down to just something like:
> >
> >     grep 'could not find' err.actual
> >
> > would be an improvement since it avoids the finicky shell quoting,
> > hardens this test in the event of a future change in hashing algorithm,
> > and brings the test more in line with the spirit of the patch itself
> > (which is to report some of its input, not necessarily the first one
> > given).
>
> If we've got another hash transition (unlikely, at least near-ish term)
> we can just look at this test again.
>
> More plausibly it's a common pattern in our test suite that greps like
> that elide actual problems, e.g. a loop printing the error N times, that
> seems more likely in this case.

I understand the reasoning, but I'm not quite sure that I buy that that
is a likely defect here. Of course, that doesn't mean that stricter
tests aren't good for nothing, but it's a tradeoff.

> Do you mind if it's just left as it is?

No, I don't mind, but I do think that changing the test to be looser
would be an improvement. Of course, there are lots of unlikely/rare
things that could cause this test to break (like picking a different
hashing algorithm), but I think fundamentally this test disagrees in
spirit with the code.

I.e., it seems odd to me that the code says "well, we have to sort this
list before we can produce an error, so the error won't necessarily
correspond to your first line of input" but the test says "we should
check that exactly such-and-such an object is included in the error".

Obviously that is far from the worst outcome in the world, but it does
seem a little odd to me. In any case, I don't feel particularly strongly
about it, and (as I said much earlier in the thread) this probably is
all academic anyway, since I would expect the vast majority of uses of
'--stdin-packs' are from 'git repack --geometric' anyway.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-07-20 16:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-21 15:03 [PATCH 0/2] pack-objects: missing tests & --stdin-packs segfault fix Ævar Arnfjörð Bjarmason
2021-06-21 15:03 ` [PATCH 1/2] pack-objects tests: cover blindspots in stdin handling Ævar Arnfjörð Bjarmason
2021-06-21 15:03 ` [PATCH 2/2] pack-objects: fix segfault in --stdin-packs option Ævar Arnfjörð Bjarmason
2021-06-21 20:33   ` Taylor Blau
2021-06-21 20:34 ` [PATCH 0/2] pack-objects: missing tests & --stdin-packs segfault fix Taylor Blau
2021-07-09 10:13 ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2021-07-09 10:13   ` [PATCH v2 1/2] pack-objects tests: cover blindspots in stdin handling Ævar Arnfjörð Bjarmason
2021-07-09 10:13   ` [PATCH v2 2/2] pack-objects: fix segfault in --stdin-packs option Ævar Arnfjörð Bjarmason
2021-07-19 21:31     ` Taylor Blau
2021-07-20 11:55       ` Ævar Arnfjörð Bjarmason
2021-07-20 16:58         ` Taylor Blau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.