All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Cai <johncai86@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: "John Cai via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org, "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v2] cat-file: skip expanding default format
Date: Tue, 08 Mar 2022 17:06:41 -0500	[thread overview]
Message-ID: <52AB116A-E1A6-4609-AF13-A6C81581A511@gmail.com> (raw)
In-Reply-To: <YifSFQ8zEZimCkHl@nand.local>

Hi Taylor

On 8 Mar 2022, at 17:00, Taylor Blau wrote:

> On Tue, Mar 08, 2022 at 02:54:23AM +0000, John Cai via GitGitGadget wrote:
>>  /*
>>   * If "pack" is non-NULL, then "offset" is the byte offset within the pack from
>>   * which the object may be accessed (though note that we may also rely on
>> @@ -363,6 +371,11 @@ static void batch_object_write(const char *obj_name,
>>  			       struct packed_git *pack,
>>  			       off_t offset)
>>  {
>> +	struct strbuf type_name = STRBUF_INIT;
>> +
>> +	if (!opt->format)
>> +		data->info.type_name = &type_name;
>
> Hmmm, I'm not quite sure I understand why the extra string buffer is
> necessary here. When we do the first expansion with the DEFAULT_FORMAT,
> we set data->info.typep to point at data->type.
>
> So by the time we do our actual query (either with
> `packed_object_info()` or just `oid_object_info_extended()`), we will
> get the type filled into data->type.
>
> And we should be able to use that in print_default_format(), which saves
> us in a couple of ways:
>
>   - We don't have to (stack) allocate a string buffer, which then needs
>     to be cleaned up after printing each object.
>
>   - (More importantly) we can avoid the extra string copy through the
>     intermediate buffer by using type_name() directly.

Agree with your reasoning here.

>
> Here's a small patch on top that you could apply, if you're interested:
>
> --- >8 ---
>
>
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index ab9a49e13a..9f55afa75b 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -355,5 +355,5 @@ static int print_default_format(char *buf, int len, struct expand_data *data)
>  {
>  	return xsnprintf(buf, len, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid),
> -		 data->info.type_name->buf,
> +			 type_name(data->type),
>  		 (uintmax_t)*data->info.sizep);
>
> @@ -372,9 +372,4 @@ static void batch_object_write(const char *obj_name,
>  			       off_t offset)
>  {
> -	struct strbuf type_name = STRBUF_INIT;
> -
> -	if (!opt->format)
> -		data->info.type_name = &type_name;
> -
>  	if (!data->skip_object_info) {
>  		int ret;
> @@ -391,5 +386,5 @@ static void batch_object_write(const char *obj_name,
>  			       obj_name ? obj_name : oid_to_hex(&data->oid));
>  			fflush(stdout);
> -			goto cleanup;
> +			return;
>  		}
>  	}
> @@ -410,7 +405,4 @@ static void batch_object_write(const char *obj_name,
>  		batch_write(opt, "\n", 1);
>  	}
> -
> -cleanup:
> -	strbuf_release(&type_name);
>  }
>
> --- 8< ---
>
> On my copy of git.git., it shaves off around ~7ms that we're spending
> just copying type names back and forth.
>
>     (without the above)
>     Benchmark 1: git.compile cat-file --batch-check --batch-all-objects
>       Time (mean ± σ):     578.7 ms ±  12.7 ms    [User: 553.4 ms, System: 25.2 ms]
>       Range (min … max):   568.1 ms … 600.1 ms    10 runs
>
>     (with the above)
>     Benchmark 1: git.compile cat-file --batch-check --batch-all-objects
>       Time (mean ± σ):     571.5 ms ±   7.9 ms    [User: 544.0 ms, System: 27.3 ms]
>       Range (min … max):   558.8 ms … 583.2 ms    10 runs

Thanks for this suggestion and the benchmark! This is similar to what Junio suggested, and
I do think this is the right thing to do here.

>
> Thanks,
> Taylor

Thanks,
John

  reply	other threads:[~2022-03-08 22:06 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-04 21:37 [PATCH] cat-file: skip expanding default format John Cai via GitGitGadget
2022-03-07  5:56 ` Junio C Hamano
2022-03-07  6:11   ` Junio C Hamano
2022-03-07 17:41     ` John Cai
2022-03-07 12:15 ` Ævar Arnfjörð Bjarmason
2022-03-08  2:54 ` [PATCH v2] " John Cai via GitGitGadget
2022-03-08 16:59   ` Junio C Hamano
2022-03-08 19:01     ` John Cai
2022-03-08 22:00   ` Taylor Blau
2022-03-08 22:06     ` John Cai [this message]
2022-03-08 22:24     ` Taylor Blau
2022-03-08 22:45       ` John Cai
2022-03-08 22:08   ` [PATCH v3] " John Cai via GitGitGadget
2022-03-08 22:30     ` Taylor Blau
2022-03-08 23:09       ` John Cai
2022-03-08 23:34         ` John Cai
2022-03-15  2:40     ` [PATCH v4] " John Cai via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52AB116A-E1A6-4609-AF13-A6C81581A511@gmail.com \
    --to=johncai86@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.