From: Taylor Blau <me@ttaylorr.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 2/2] pack-objects: fix segfault in --stdin-packs option
Date: Mon, 21 Jun 2021 16:33:11 -0400
Message-ID: <YND3h2l10PlnSNGJ@nand.local>
In-Reply-To: <patch-2.2-a9702132385-20210621T145819Z-avarab@gmail.com>

On Mon, Jun 21, 2021 at 05:03:38PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a segfault in the --stdin-packs option added in
> 339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option,
> 2021-02-22). The read_packs_list_from_stdin() function didn't check
> that the lines it was reading were valid packs, and thus when doing
> the QSORT() with pack_mtime_cmp() we'd have a NULL "util" field.

It may be worth mentioning that the util pointer is used to associate
the names of included/excluded packs with the packed_git structs they
correspond to. I see it's mentioned in the very next paragraph, but it
may be helpful for other readers to see this information earlier.
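
To make the role of that pointer concrete, here is a minimal standalone
sketch. The names "struct pack", "struct item", mtime_cmp() and
sort_by_mtime() are hypothetical stand-ins for illustration, not the
actual definitions in builtin/pack-objects.c, and the descending order
is an assumption of the sketch. The point is only that the comparator
reaches the pack through util, so every item must have one set before
sorting:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for packed_git and string_list_item. */
struct pack {
	const char *name;
	long mtime;
};

struct item {
	const char *string;
	void *util; /* the pack this name refers to, or NULL if none */
};

/*
 * Order items by descending pack mtime. The packs are only reachable
 * through util, so a NULL util is a bug by the time we get here.
 */
int mtime_cmp(const void *va, const void *vb)
{
	const struct pack *a = ((const struct item *)va)->util;
	const struct pack *b = ((const struct item *)vb)->util;

	assert(a && b);
	if (a->mtime < b->mtime)
		return 1;
	if (b->mtime < a->mtime)
		return -1;
	return 0;
}

void sort_by_mtime(struct item *items, size_t nr)
{
	qsort(items, nr, sizeof(*items), mtime_cmp);
}
```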

> The logic error was in assuming that we could iterate all packs and
> annotate the excluded and included packs we got, as opposed to
> checking the lines we got on stdin. There was a check for excluded
> packs, but included packs were simply assumed to be valid.
>
> As noted in the test we'll not report the first bad line, but whatever
> line sorted first according to the string-list.c API. In this case I
> think that's fine.

Yeah. There isn't really a better way to do that since we don't have a
convenient function to look up packs by their name. Much more convenient
is to loop through all packs and assign them to entries in the
string_list one by one. That's O(n*log(n)), but it doesn't really matter
here since we expect n to be small-ish, and this is by far not the most
expensive part of writing a pack.
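
Roughly, the approach looks like the following standalone sketch. It
uses hypothetical "struct pack" and "struct item" types and plain
qsort()/bsearch() in place of the string-list.c API, so treat it as an
illustration under those assumptions rather than the real code. It also
shows why the reported name is the first one in sorted order rather
than the first bad line of input:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for packed_git and string_list_item. */
struct pack {
	const char *name;
	long mtime;
};

struct item {
	const char *string;
	void *util; /* the matching pack, or NULL if none was found */
};

int item_cmp(const void *a, const void *b)
{
	const struct item *x = a, *y = b;
	return strcmp(x->string, y->string);
}

/*
 * Sort the requested names, then walk every known pack once and
 * binary-search for its name in the list: O(n log n) overall.
 */
void annotate_items(struct item *items, size_t nr_items,
		    struct pack *packs, size_t nr_packs)
{
	size_t i;

	qsort(items, nr_items, sizeof(*items), item_cmp);
	for (i = 0; i < nr_packs; i++) {
		struct item key = { packs[i].name, NULL };
		struct item *found = bsearch(&key, items, nr_items,
					     sizeof(*items), item_cmp);
		if (found)
			found->util = &packs[i];
	}
}

/*
 * Return the first name (in sorted order, not input order) that
 * matched no pack, or NULL if every item was annotated.
 */
const char *first_missing(const struct item *items, size_t nr_items)
{
	size_t i;

	assert(items || !nr_items);
	for (i = 0; i < nr_items; i++)
		if (!items[i].util)
			return items[i].string;
	return NULL;
}
```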

You could imagine doing something O(n^2) by looping through all packs
each time you receive a line of input. That performs worse, but arguably
provides a better experience when using this mode interactively. But
that is probably a relatively rare occurrence, so it likely doesn't
matter.
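
For comparison, a sketch of that quadratic variant, again standalone
and with a hypothetical "struct pack" and made-up helper names
(find_pack(), first_bad_line()). Each line is checked as it arrives, so
the first bad line in input order is the one that gets reported:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for packed_git. */
struct pack {
	const char *name;
};

/* Linear scan over all packs: O(n) per lookup. */
const struct pack *find_pack(const struct pack *packs, size_t nr,
			     const char *name)
{
	size_t i;

	assert(name);
	for (i = 0; i < nr; i++)
		if (!strcmp(packs[i].name, name))
			return &packs[i];
	return NULL;
}

/*
 * Check each input line as it arrives: O(n^2) in total. Returns the
 * index of the first line (in input order) that names no known pack,
 * or -1 if every line is valid.
 */
long first_bad_line(const char **lines, size_t nr_lines,
		    const struct pack *packs, size_t nr_packs)
{
	size_t i;

	for (i = 0; i < nr_lines; i++)
		if (!find_pack(packs, nr_packs, lines[i]))
			return (long)i;
	return -1;
}
```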

Equally, you could build a mapping from pack name to packed_git struct
ahead of time, and then do the lookups in constant time. That's linear,
of course, but you pay for it in memory. Honestly, the memory cost is
probably quite reasonable, but it may not be worth the effort, since I
suspect the vast majority of usage here is from 'git repack
--geometric'.
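
A sketch of what such a mapping could look like, as a small
open-addressing table keyed by pack name. Everything here ("struct
pack", "struct pack_map", hash_name(), pack_map_init(), pack_map_get())
is hypothetical and standalone. In git itself one would presumably
reach for an existing hashmap rather than hand-rolling one:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for packed_git. */
struct pack {
	const char *name;
};

struct pack_map {
	const struct pack **slots; /* NULL marks an empty slot */
	size_t nr_slots;           /* always a power of two */
};

/* djb2-style string hash; any reasonable hash would do. */
size_t hash_name(const char *s)
{
	unsigned long h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return (size_t)h;
}

/*
 * Build the table once: O(n) expected time and O(n) extra memory.
 * The table is kept at most half full, so probing always terminates.
 * Returns 0 on success, -1 if allocation fails.
 */
int pack_map_init(struct pack_map *map, const struct pack *packs, size_t nr)
{
	size_t i, nr_slots = 8;

	while (nr_slots < nr * 2)
		nr_slots *= 2;
	map->slots = calloc(nr_slots, sizeof(*map->slots));
	if (!map->slots)
		return -1;
	map->nr_slots = nr_slots;
	for (i = 0; i < nr; i++) {
		size_t pos = hash_name(packs[i].name) & (nr_slots - 1);

		while (map->slots[pos])
			pos = (pos + 1) & (nr_slots - 1);
		map->slots[pos] = &packs[i];
	}
	return 0;
}

/* Expected constant-time lookup by name; NULL if there is no such pack. */
const struct pack *pack_map_get(const struct pack_map *map, const char *name)
{
	size_t pos;

	assert(map->slots);
	pos = hash_name(name) & (map->nr_slots - 1);
	while (map->slots[pos]) {
		if (!strcmp(map->slots[pos]->name, name))
			return map->slots[pos];
		pos = (pos + 1) & (map->nr_slots - 1);
	}
	return NULL;
}
```

The caller owns map->slots and should free() it when done.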


> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/pack-objects.c | 10 ++++++++++
>  t/t5300-pack-object.sh | 18 ++++++++++++++++++
>  2 files changed, 28 insertions(+)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index de00adbb9e0..65579e09fe0 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3310,6 +3310,16 @@ static void read_packs_list_from_stdin(void)
>  			item->util = p;
>  	}
>
> +	/*
> +	 * Arguments we got on stdin may not even be packs. Check that
> +	 * to avoid segfaulting later on in e.g. pack_mtime_cmp().
> +	 */

Could be worth adding "excluded packs are handled below".

> +	for_each_string_list_item(item, &include_packs) {
> +		struct packed_git *p = item->util;
> +		if (!p)
> +			die(_("could not find pack '%s'"), item->string);
> +	}
> +
>  	/*
>  	 * First handle all of the excluded packs, marking them as kept in-core

...and it may be worth updating this comment with s/First/Then/.

> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> index 65e991e3706..330deec656b 100755
> --- a/t/t5300-pack-object.sh
> +++ b/t/t5300-pack-object.sh
> @@ -119,6 +119,24 @@ test_expect_success 'pack-object <stdin parsing: [|--revs] with --stdin' '
>  	test_cmp err.expect err.actual
>  '
>
> +test_expect_success 'pack-object <stdin parsing: --stdin-packs handles garbage' '
> +	cat >in <<-EOF &&
> +	$(git -C pack-object-stdin rev-parse one)
> +	$(git -C pack-object-stdin rev-parse two)
> +	EOF

It's not a big deal, but here-doc directly into `git pack-objects` is
much more common in t5300 than first redirecting it to a separate file.
I probably would have written (in a sub-shell to avoid -C
pack-object-stdin everywhere):


  (
    cd pack-object-stdin &&
    test_must_fail git pack-objects --stdout --stdin-packs >/dev/null 2>actual <<-EOF
    $(git rev-parse one)
    $(git rev-parse two)
    EOF
  )

Although the line is kind of long anyway (and it'd be even longer since
the subshell will get its own level of indentation). So I could entirely
buy that you did this for readability, which is fine by me.

> +
> +	# We actually just report the first bad line in strcmp()
> +	# order, it just so happens that we get the same result under
> +	# SHA-1 and SHA-256 here. It does not really matter that we
> +	# report the first bad item in this obscure case, so this
> +	# oddity of the test is OK.
> +	cat >err.expect <<-EOF &&
> +	fatal: could not find pack '"'"'$(git -C pack-object-stdin rev-parse two)'"'"'
> +	EOF
> +	test_must_fail git -C pack-object-stdin pack-objects stdin-with-stdin-option --stdin-packs <in 2>err.actual &&
> +	test_cmp err.expect err.actual

If we don't care which is reported (and it just so happens that we'll
get the first one in lexical order), I would be fine with

    test_i18ngrep "could not find pack" err.actual

too. It would be good to get rid of this comment and put it in the patch
message in more detail (instead of just referring to it as "[a]s noted
in the test").

Thanks,
Taylor
