All of lore.kernel.org
 help / color / mirror / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Hariom verma" <hariom18599@gmail.com>
Subject: Re: [GSOC] How to improve the performance of git cat-file --batch
Date: Tue, 27 Jul 2021 09:37:44 +0800	[thread overview]
Message-ID: <CAOLTT8TdL7UhfVSOzbpmo-WFNrcKwmy=E720tNt4KM9o_p=keg@mail.gmail.com> (raw)
In-Reply-To: <CAP8UFD1szPO-qmusSqiLRWxynV0gcy8UsnZORNn0USYAmRUGVw@mail.gmail.com>

Christian Couder <christian.couder@gmail.com> 于2021年7月26日周一 下午5:38写道:
>
> On Sun, Jul 25, 2021 at 2:04 PM ZheNing Hu <adlternative@gmail.com> wrote:
> > Ævar Arnfjörð Bjarmason <avarab@gmail.com> 于2021年7月25日周日 上午5:23写道:
>
> > > Having skimmed it I'm a bit confused about this in reference to
> > > performance generally. I haven't looked into the case you're discussing,
> > > but as I noted in
> > > https://lore.kernel.org/git/87im1p6x34.fsf@evledraar.gmail.com/ the
> > > profiling clearly shows that the main problem is that you've added
> > > object lookups we skipped before.
> >
> > Yeah, you showed me last time that lookup_object() took up a lot of time.
>
> Could the document explain with some details why there are more calls
> to lookup_object()? For example it could take an example `git cat-file
> --batch ...` command (if possible a simple one), and say which
> functions like lookup_object() it was using (and how many times) to
> get the data it needs before using the ref-filter logic, and then the
> same information after using the ref-filter logic.
>

Sorry but this time I use gprof but can’t observe the same effect as before.
lookup_object() is indeed a part of the time overhead, but its proportion is
not very large this time.

> It could be nice if there were also some data about how much time used
> to be spent in lookup_object() and how much time is now spent there,
> and how this compares with the whole slowdown we are seeing. If Ævar
> already showed that, you can of course reuse what he already did.
>

This is my test for git cat-file --batch --batch-all-objects >/dev/null:

daab8a564: The fifth batch (upstream/master)

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 38.13      0.61     0.61  1968866     0.00     0.00  patch_delta
 13.75      0.83     0.22  6568488     0.00     0.00
unpack_object_header_buffer
 11.25      1.01     0.18   344036     0.00     0.00  unpack_entry
  7.50      1.13     0.12  1964667     0.00     0.00  hashmap_remove
  6.88      1.24     0.11  6153182     0.00     0.00  hashmap_get
  1.88      1.27     0.03  7746299     0.00     0.00  zlib_post_call
  1.88      1.30     0.03   842731     0.00     0.00  bsearch_hash
  1.88      1.33     0.03   827663     0.00     0.00  nth_packed_object_offset
  1.25      1.35     0.02 15385422     0.00     0.00  use_pack
  1.25      1.37     0.02  6236120     0.00     0.00  get_delta_base
  1.25      1.39     0.02  2581859     0.00     0.00  git_inflate_end
  1.25      1.41     0.02   826650     0.00     0.00
do_oid_object_info_extended
  1.25      1.43     0.02   826650     0.00     0.00  find_pack_entry
  1.25      1.45     0.02   825692     0.00     0.00  packed_to_object_type
  1.25      1.47     0.02   378521     0.00     0.00  get_size_from_delta


d3b5272a94: [GSOC] cat-file: reuse ref-filter logic

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 27.06      0.59     0.59  1968866     0.00     0.00  patch_delta
 16.51      0.95     0.36  2202293     0.00     0.00
unpack_object_header_buffer
 13.76      1.25     0.30  5327015     0.00     0.00  hashmap_get
 11.47      1.50     0.25   344036     0.00     0.00  unpack_entry
  8.72      1.69     0.19   521278     0.00     0.00  lookup_object
  4.13      1.78     0.09  1964667     0.00     0.00  hashmap_remove
  2.75      1.84     0.06   348709     0.00     0.00  get_object
  2.29      1.89     0.05        1     0.05     2.17  oid_array_for_each_unique
  1.38      1.92     0.03  6373452     0.00     0.00  use_pack
  0.92      1.94     0.02  2202293     0.00     0.00  unpack_compressed_entry
  0.92      1.96     0.02  1394836     0.00     0.00  grab_sub_body_contents
  0.92      1.98     0.02   348709     0.00     0.00  create_object
  0.92      2.00     0.02   348709     0.00     0.00  format_ref_array_item
  0.92      2.02     0.02    74557     0.00     0.00  fill_commit_graph_info

Because we called parse_object_buffer() in get_object(), lookup_object()
is called indirectly...

We can see that some functions are called the same times: patch_delta(),
unpack_entry(), hashmap_remove()... But after using my patch,
format_ref_array_item(), grab_sub_body_contents(), get_object(), lookup_object()
begin to occupy a certain proportion.

This is my test for git cat-file --batch-check --batch-all-objects >/dev/null:

daab8a564: The fifth batch (upstream/master)

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 47.83      0.11     0.11  3385670     0.00     0.00
unpack_object_header_buffer
 13.04      0.14     0.03  6941590     0.00     0.00  use_pack
  8.70      0.16     0.02  1046130     0.00     0.00  expand_format
  4.35      0.17     0.01   349013     0.00     0.00  oid_array_append
  4.35      0.18     0.01   348710     0.00     0.00  strbuf_expand
  4.35      0.19     0.01   348709     0.00     0.00  find_pack_entry
  4.35      0.20     0.01   348230     0.00     0.00  packed_to_object_type
  4.35      0.21     0.01   259719     0.00     0.00  git_inflate
  4.35      0.22     0.01        1    10.00   210.00  oid_array_for_each_unique
  4.35      0.23     0.01                             pack_basename

d3b5272a94: [GSOC] cat-file: reuse ref-filter logic

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 52.00      0.13     0.13  3385670     0.00     0.00
unpack_object_header_buffer
 16.00      0.17     0.04  6941590     0.00     0.00  use_pack
  8.00      0.19     0.02  3296680     0.00     0.00  get_delta_base
  4.00      0.20     0.01   348709     0.00     0.00  find_pack_entry
  4.00      0.21     0.01   348709     0.00     0.00  oid_to_hex
  4.00      0.22     0.01   348709     0.00     0.00  populate_value
  4.00      0.23     0.01   348230     0.00     0.00  packed_object_info
  4.00      0.24     0.01   348230     0.00     0.00  packed_to_object_type
  4.00      0.25     0.01                             void_hashcmp

Similarly, unpack_object_header_buffer(), use_pack(),
find_pack_entry(), packed_to_object_type(), they are still called same
times as before; populate_value(),
packed_object_info(), oid_to_hex() are new.

> The GIT_TRACE_PERFORMANCE, GIT_TRACE2_PERF env variables and the
> associated trace_*() and trace2_*() functions might help you with
> measuring how much time is spent in different functions.

Thanks.
--
ZheNing Hu

  reply	other threads:[~2021-07-27  1:37 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-24 14:22 [GSOC] How to improve the performance of git cat-file --batch ZheNing Hu
2021-07-24 21:20 ` Ævar Arnfjörð Bjarmason
2021-07-25 12:05   ` ZheNing Hu
2021-07-26  9:38     ` Christian Couder
2021-07-27  1:37       ` ZheNing Hu [this message]
2021-07-28  7:34         ` Christian Couder
2021-07-28 13:38           ` ZheNing Hu
2021-07-28 15:36             ` Christian Couder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOLTT8TdL7UhfVSOzbpmo-WFNrcKwmy=E720tNt4KM9o_p=keg@mail.gmail.com' \
    --to=adlternative@gmail.com \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hariom18599@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.