All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	Hariom Verma <hariom18599@gmail.com>,
	ZheNing Hu <adlternative@gmail.com>
Subject: Re: [PATCH] [GSOC] cat-file: fix --batch report changed-type bug
Date: Sun, 30 May 2021 17:09:44 -0400	[thread overview]
Message-ID: <YLP/GEN0qIXvWEUn@coredump.intra.peff.net> (raw)
In-Reply-To: <pull.965.git.1622363366722.gitgitgadget@gmail.com>

On Sun, May 30, 2021 at 08:29:26AM +0000, ZheNing Hu via GitGitGadget wrote:

> From: ZheNing Hu <adlternative@gmail.com>
> 
> When we use `--batch` with no atoms formatting and use
> `--batch-all-objects` at the same time (e.g.
> `git cat-file --batch="batman" --batch-all-objects`),
> Git will exit and report "object xxx changed type!?".
> 
> This is because we have a format string which does not
> contain any atoms, so skip_object_info option will be
> set if we also use --batch-all-objects, and then
> `oid_object_info_extended()` will be skipped in
> `batch_object_write()`, it cause object type to not be
> collected. Therefore, it reported object type has changed.

Good find. I think this bug is mostly my fault, as I added the
skip_object_info flag but never thought to use it with --batch.

> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 5ebf13359e83..5f9578f9b86b 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -345,11 +345,12 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
>  		contents = read_object_file(oid, &type, &size);
>  		if (!contents)
>  			die("object %s disappeared", oid_to_hex(oid));
> -		if (type != data->type)
> -			die("object %s changed type!?", oid_to_hex(oid));
> -		if (data->info.sizep && size != data->size)
> -			die("object %s changed size!?", oid_to_hex(oid));
> -
> +		if (!(opt->all_objects && data->skip_object_info)) {
> +			if (type != data->type)
> +				die("object %s changed type!?", oid_to_hex(oid));
> +			if (data->info.sizep && size != data->size)
> +				die("object %s changed size!?", oid_to_hex(oid));
> +		}

Wouldn't checking data->skip_object_info be sufficient? It's only set
when opt->all_objects is set anyway. But more importantly, it is the
most direct root of the problem: we did not find out the type and size
earlier, so comparing anything against data->type is useless.

But that leads me to a follow-up question: what if we did give a format,
so skip_object_info isn't set, but it didn't include the type or size?

In the size code, it looks like we explicitly protect against this by
checking if data->info.sizep is set (i.e., did we request the size from
oid_object_info_extended). But it's not for the type.

So the assumption is that we do always fill in the type, even if the
user didn't ask for it. And that assumption is actually violated much
earlier. These two bits of code from the setup are out of order:

          if (opt->all_objects) {
                  struct object_info empty = OBJECT_INFO_INIT;
                  if (!memcmp(&data.info, &empty, sizeof(empty)))
                          data.skip_object_info = 1;
          }

          /*
           * If we are printing out the object, then always fill in the type,
           * since we will want to decide whether or not to stream.
           */
          if (opt->print_contents)
                  data.info.typep = &data.type;

We should not let skip_object_info kick in at all if opt->print_contents
is requested. And that causes other bugs outside of the spot you found.
We look at data->type earlier in print_object_or_die() to decide whether
or not to stream the contents, but we'll see garbage (fortunately we
zero-initialize the expand_data struct, so it's at least predictably
zero, and not random undefined behavior).

But I think we'd want to solve it by swapping the two conditionals I
showed above, which restores the assumption made in print_object_or_die().

> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 5d2dc99b74ad..9b0f1ae5ef8b 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -586,4 +586,10 @@ test_expect_success 'cat-file --unordered works' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'cat-file --batch="batman" with --batch-all-objects will work' '
> +	git -C all-two cat-file --batch-all-objects --batch="%(objectname)" | wc -l >expect &&
> +	git -C all-two cat-file --batch-all-objects --batch="batman" | wc -l >actual &&
> +	test_cmp expect actual
> +'

Is it worth testing both of these? The %(objectname) one will fail in
the same way (because we do not need to run oid_object_info() to get the
oid, which we already have). I'm OK doing both for better coverage, but
it may be worth mentioning either in a comment or in the commit message
that we expect both to fail, and why.

-Peff

  reply	other threads:[~2021-05-30 21:09 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-30  8:29 [PATCH] [GSOC] cat-file: fix --batch report changed-type bug ZheNing Hu via GitGitGadget
2021-05-30 21:09 ` Jeff King [this message]
2021-05-31 13:20   ` ZheNing Hu
2021-05-31 14:44     ` Jeff King
2021-05-31 15:32       ` ZheNing Hu
2021-05-31 16:07       ` Felipe Contreras
2021-06-01  1:49         ` Jeff King
2021-06-01  6:34           ` Felipe Contreras
2021-06-01 15:34             ` Jeff King
2021-06-01 16:42               ` Felipe Contreras
2021-06-01 12:46           ` ZheNing Hu
2021-05-30 21:15 ` Junio C Hamano
2021-05-30 21:36   ` Jeff King
2021-06-01  1:40     ` Junio C Hamano
2021-05-31 13:55   ` ZheNing Hu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YLP/GEN0qIXvWEUn@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=adlternative@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.