From: Christian Couder <christian.couder@gmail.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Hariom verma" <hariom18599@gmail.com>
Subject: Re: [GSOC] How to improve the performance of git cat-file --batch
Date: Wed, 28 Jul 2021 17:36:33 +0200	[thread overview]
Message-ID: <CAP8UFD1tS++xEMob7p5UeDQ_enoyy1aOQTG=28r-ge2k4n3ECw@mail.gmail.com> (raw)
In-Reply-To: <CAOLTT8QvtJ70X8mQx4K4gD0T=7i-ryd0QL81-QeSTqSWyHuWLQ@mail.gmail.com>

On Wed, Jul 28, 2021 at 3:38 PM ZheNing Hu <adlternative@gmail.com> wrote:

> OK, so we need an accurate count of lookup_object() calls, although
> the conclusion is obvious: 0 (upstream/master) and a big number (with
> my patch).

[...]

> The only remaining call is the one printed by git.c, which shows that
> with my patch we additionally call lookup_object() when the --batch
> option is used. According to the previous gprof results,
> lookup_object() accounts for 8.72% of the total time (though below you
> seem to think gprof is not reliable enough). This may be a place worth
> optimizing.

First, yeah, I hadn't seen the "calls" columns in your gprof reports.
Sorry! It's nice to see that your manual check with a trace_printf()
function gives the same result as gprof about this though.

Anyway, if you agree that it might be worth optimizing, then it would
be a good idea to explain why the ref-filter code makes so many
lookup_object() calls.

[...]

> > It would be nice if you could add a bit more detail about how
> > lookup_object() is called (both before and after the changes that
> > degrade performance).
>
> After letting git cat-file --batch reuse the ref-filter logic, we use
> get_object() to grab the object's data. Since we use the %(raw) atom,
> which requires grabbing the raw data of the object, oi->info.contentp
> is set, so parse_object_buffer() gets called in get_object().
> parse_object_buffer() calls lookup_commit(), lookup_blob(),
> lookup_tree(), and lookup_tag(), which in turn call lookup_object().
> As we have seen, lookup_object() seems to take a lot of time.

Not sure why you are talking about %(raw). Is the root of the issue
that we now use the %(raw) atom, or the way we implemented it?

> So let us think: can we skip parse_object_buffer() in some scenarios?
> parse_object_buffer() parses the object's data into a "struct object
> *obj"; we then feed obj to grab_values(), which passes it on to
> grab_tag_values() or grab_commit_values() to handle the logic for
> %(tag), %(type), %(object), %(tree), %(parent), and %(numparent).
>
> But with the default format, git cat-file --batch can avoid handling
> these atoms.
>
> Therefore, maybe we can skip parsing the object buffer if we really
> don't care about these atoms.

Yeah, maybe oi->info.contentp should be set only if the user specified
one of the atoms that really needs the content provided by
parse_object_buffer().

Thanks!
