All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: "Jeff King" <peff@peff.net>, 程洋 <chengyang@xiaomi.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	何浩 <hehao@xiaomi.com>, "Xin7 Ma 马鑫" <maxin7@xiaomi.com>,
	石奉兵 <shifengbing@xiaomi.com>, 凡军辉 <fanjunhui@xiaomi.com>,
	王汉基 <wanghanji@xiaomi.com>
Subject: Re: [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects
Date: Wed, 7 Sep 2022 10:36:14 -0400	[thread overview]
Message-ID: <6018c526-4641-8374-8802-cfda5be330c3@github.com> (raw)
In-Reply-To: <YxfSRkEiiP4TyZTM@coredump.intra.peff.net>

On 9/6/2022 7:05 PM, Jeff King wrote:
> This patch teaches both code paths to use the new SKIP_HASH_CHECK flag
> for parse_object(). You can see the speed-up in p5600, which does a
> blob:none clone followed by a checkout. The savings for git.git are
> modest:
> 
>   Test                          HEAD^             HEAD
>   ----------------------------------------------------------------------
>   5600.3: checkout of result    2.23(4.19+0.24)   1.72(3.79+0.18) -22.9%
> 
> But the savings scale with the number of bytes. So on a repository like
> linux.git with more files, we see more improvement (in both absolute and
> relative numbers):
> 
>   Test                          HEAD^                HEAD
>   ----------------------------------------------------------------------------
>   5600.3: checkout of result    51.62(77.26+2.76)    34.86(61.41+2.63) -32.5%
> 
> And here's an even more extreme case. This is the android gradle-plugin
> repository, whose tip checkout has ~3.7GB of files:
> 
>   Test                          HEAD^               HEAD
>   --------------------------------------------------------------------------
>   5600.3: checkout of result    79.51(90.84+5.55)   40.28(51.88+5.67) -49.3%
> 
> Keep in mind that these timings are of the whole checkout operation.

These numbers are _very_ impressive for this user-facing scenario.

>   Benchmark 1: GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input
>     Time (mean ± σ):     50.884 s ±  0.239 s    [User: 51.450 s, System: 1.726 s]
>     Range (min … max):   50.608 s … 51.025 s    3 runs
> 
>   Benchmark 2: GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input
>     Time (mean ± σ):      9.728 s ±  0.112 s    [User: 10.466 s, System: 1.535 s]
>     Range (min … max):    9.618 s …  9.842 s    3 runs
> 
>   Summary
>     'GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input' ran
>       5.23 ± 0.07 times faster than 'GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input'
> 
> So a server would see an 80% reduction in CPU serving the initial
> checkout of a partial clone for this repository.

Very nice!

> In both cases skipping the extra hashing on the server should be pretty
> safe. The client doesn't trust the server anyway, so it will re-hash all
> of the objects via index-pack.

Definitely safe for this target scenario.

> There is one thing to note, though: the
> change in get_reference() affects not just pack-objects, but rev-list,
> git-log, etc. We could use a flag to limit to index-pack here, but we
> may already skip hash checks in this instance. For commits, we'd skip
> anything we load via the commit-graph. And while before this commit we
> would check a blob fed directly to rev-list on the command-line, we'd
> skip checking that same blob if we found it by traversing a tree.

It's worth also mentioning that since you isolated the change to
get_reference(), it is only skipping the parse for the requested tips
of the revision walk. As we follow commits and trees to other objects
we do not use parse_object() but instead use the appropriate method
(parse_commit(), parse_tree(), and do not even parse blobs).

This is additionally confirmed that p0001-rev-list.sh has no measurable
change in results with this commit, even when disabling the commit-graph.

> The exception for both is if --verify-objects is used. In that case,
> we'll skip this optimization, and the new test makes sure we do this
> correctly.

The test for this is clever. Thanks for adding it.

Code and test look good.

Thanks,
-Stolee

  reply	other threads:[~2022-09-07 14:36 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-11  8:09 Partial-clone cause big performance impact on server 程洋
2022-08-11 17:22 ` Jonathan Tan
2022-08-13  7:55   ` 回复: [External Mail]Re: " 程洋
2022-08-13 11:41     ` 程洋
2022-08-15  5:16     ` ZheNing Hu
2022-08-15 13:15       ` 程洋
2022-08-12 12:21 ` Derrick Stolee
2022-08-14  6:48 ` Jeff King
2022-08-15 13:18   ` Derrick Stolee
2022-08-15 14:50     ` [External Mail]Re: " 程洋
2022-08-17 10:22     ` 程洋
2022-08-17 13:41       ` Derrick Stolee
2022-08-18  5:49         ` Jeff King
2022-09-01  6:53   ` 程洋
2022-09-01 16:19     ` Jeff King
2022-09-05 11:17       ` 程洋
2022-09-06 18:38         ` Jeff King
2022-09-06 22:58           ` [PATCH 0/3] speeding up on-demand fetch for blobs in partial clone Jeff King
2022-09-06 23:01             ` [PATCH 1/3] parse_object(): allow skipping hash check Jeff King
2022-09-07 14:15               ` Derrick Stolee
2022-09-07 20:44                 ` Jeff King
2022-09-06 23:05             ` [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects Jeff King
2022-09-07 14:36               ` Derrick Stolee [this message]
2022-09-07 14:45                 ` Derrick Stolee
2022-09-07 20:50                   ` Jeff King
2022-09-07 19:26               ` Junio C Hamano
2022-09-07 20:36                 ` Jeff King
2022-09-07 20:48                   ` [BUG] t1800: Fails for error text comparison rsbecker
2022-09-07 21:55                     ` Junio C Hamano
2022-09-07 22:23                       ` rsbecker
2022-09-07 21:02                   ` [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects Jeff King
2022-09-07 22:07                     ` Junio C Hamano
2022-09-08  5:04                       ` Jeff King
2022-09-08 16:41                         ` Junio C Hamano
2022-09-06 23:06             ` [PATCH 3/3] parse_object(): check commit-graph when skip_hash set Jeff King
2022-09-07 14:46               ` Derrick Stolee
2022-09-07 19:31               ` Junio C Hamano
2022-09-08 10:39                 ` [External Mail]Re: " 程洋
2022-09-08 18:42                   ` Jeff King
2022-09-07 14:48             ` [PATCH 0/3] speeding up on-demand fetch for blobs in partial clone Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6018c526-4641-8374-8802-cfda5be330c3@github.com \
    --to=derrickstolee@github.com \
    --cc=chengyang@xiaomi.com \
    --cc=fanjunhui@xiaomi.com \
    --cc=git@vger.kernel.org \
    --cc=hehao@xiaomi.com \
    --cc=maxin7@xiaomi.com \
    --cc=peff@peff.net \
    --cc=shifengbing@xiaomi.com \
    --cc=wanghanji@xiaomi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.