From: ZheNing Hu <adlternative@gmail.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Hariom verma <hariom18599@gmail.com>,
	Git List <git@vger.kernel.org>
Subject: Re: [GSoC] Git Blog 11
Date: Thu, 5 Aug 2021 12:50:21 +0800
Message-ID: <CAOLTT8Sd6OCU_Ufrhqstz-Mw0Ej=9F2Y20BjPOpkgsuB5D-4Nw@mail.gmail.com>
In-Reply-To: <CAP8UFD3E9oR9E4S=f8iReKOnvVO_WrXVziyztHZJCiScUAxDRg@mail.gmail.com>

Christian Couder <christian.couder@gmail.com> wrote on Wed, Aug 4, 2021 at 4:57 PM:
>
> On Tue, Aug 3, 2021 at 4:48 AM ZheNing Hu <adlternative@gmail.com> wrote:
> >
> > ZheNing Hu <adlternative@gmail.com> wrote on Tue, Aug 3, 2021 at 10:37 AM:
> > >
> > > Christian Couder <christian.couder@gmail.com> wrote on Mon, Aug 2, 2021 at 2:25 PM:
> > > >
> > > > On Sun, Aug 1, 2021 at 8:45 AM ZheNing Hu <adlternative@gmail.com> wrote:
> > > >
> > > > > in some cases, this is the result of the performance test of
> > > > > `t/perf/p1006-cat-file.sh`:
> > > > >
> > > > > ```
> > > > > > Test                                        HEAD~             HEAD
> > > > > > ------------------------------------------------------------------------------------
> > > > > > 1006.2: cat-file --batch-check              0.10(0.09+0.00)   0.11(0.10+0.00) +10.0%
> > > > > > 1006.3: cat-file --batch-check with atoms   0.09(0.08+0.01)   0.09(0.06+0.03) +0.0%
> > > > > > 1006.4: cat-file --batch                    0.62(0.58+0.04)   0.57(0.54+0.03) -8.1%
> > > > > > 1006.5: cat-file --batch with atoms         0.63(0.60+0.02)   0.52(0.49+0.02) -17.5%
> > > > > ```
> > > > >
> > > > > > We can see that the performance of `git cat-file --batch` has
> > > > > > improved to a certain extent!
> > > >
> > > > Yeah, sure -8.1% or -17.5% is really nice! But why +10.0% for
> > > > `cat-file --batch-check`?
> > >
> > > I think it's not very important. Our optimization consists of skipping
> > > parse_object_buffer(), and since git cat-file --batch-check does not set
> > > oi->contentp by default, parse_object_buffer() is not executed there anyway.
>
> Do you think that if git cat-file --batch-check would set
> oi->contentp, there would be no performance regression for `cat-file
> --batch-check`?
> Could you test that?
>

Oh, I mean that git cat-file --batch-check with its default format
"%(objectname) %(objecttype) %(objectsize)" will not benefit from any
optimization. But if git cat-file --batch is given a format with
"%(contents)" or some other atoms, it will indeed be optimized. See 1006.4:

Test                                                 this tree         HEAD~
---------------------------------------------------------------------------------------------
1006.2: cat-file --batch-check                       0.15(0.12+0.02)   0.15(0.13+0.01) +0.0%
1006.3: cat-file --batch-check with basic atoms      0.12(0.10+0.01)   0.12(0.10+0.02) +0.0%
1006.4: cat-file --batch-check with contents atoms   0.66(0.63+0.02)   0.75(0.72+0.02) +13.6%
1006.5: cat-file --batch                             0.61(0.57+0.04)   0.70(0.65+0.05) +14.8%
1006.6: cat-file --batch with atoms                  0.58(0.57+0.01)   0.67(0.63+0.03) +15.5%
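
For readers of the table above: the last column appears to be the relative
change of the second column's elapsed time with respect to the first, so a
positive value means the second column (here HEAD~) is slower. A minimal
sketch of that arithmetic, assuming this is how the figure is derived; the
helper name relative_change_pct is hypothetical and not part of t/perf:

```c
/*
 * Relative change, in percent, of `other` with respect to `base`.
 * Hypothetical helper for illustration only; `base` is the elapsed
 * time in the first column and `other` the one in the second column.
 */
static double relative_change_pct(double base, double other)
{
	return (other - base) / base * 100.0;
}
```

For row 1006.4, relative_change_pct(0.66, 0.75) evaluates to roughly 13.6,
which agrees with the +13.6% shown.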

> > > Therefore, we did
> > > not optimize `git cat-file --batch-check` at all. 10% may be small enough
> > > for git cat-file --batch-check. Environmental noise alone could account for it...
> >
> > By the way, its performance may still be worse than "upstream/master", but it
> > will be better than before optimization.
>
> Nice that there is some improvement, but it would be better if it was
> similar to "upstream/master".
>

Agree.

> > Test                                        HEAD~             this tree
> > ------------------------------------------------------------------------------------
> > 1006.2: cat-file --batch-check              0.10(0.09+0.01)   0.09(0.08+0.01) -10.0%
> > 1006.3: cat-file --batch-check with atoms   0.09(0.07+0.02)   0.08(0.05+0.03) -11.1%
> > 1006.4: cat-file --batch                    0.61(0.59+0.02)   0.53(0.51+0.02) -13.1%
> > 1006.5: cat-file --batch with atoms         0.60(0.57+0.02)   0.52(0.49+0.03) -13.3%
>
> Yeah, your patch seems to be an overall improvement when the
> ref-filter code is used.
>
> > Test                                        upstream/master   this tree
> > ------------------------------------------------------------------------------------
> > 1006.2: cat-file --batch-check              0.08(0.07+0.01)   0.10(0.07+0.02) +25.0%
> > 1006.3: cat-file --batch-check with atoms   0.06(0.05+0.01)   0.08(0.08+0.00) +33.3%
> > 1006.4: cat-file --batch                    0.49(0.46+0.03)   0.53(0.50+0.03) +8.2%
> > 1006.5: cat-file --batch with atoms         0.48(0.45+0.03)   0.51(0.48+0.02) +6.3%
>
> This means that some further performance improvements are still needed
> both for --batch and --batch-check though.
>
> Have you tried to see, using gprof or something else, what is still
> degrading the performance compared to when the ref-filter code isn't
> used?

Yeah, gprof shows that the number of calls to strbuf_add() and xstrdup()
has increased after switching to the ref-filter logic. At the same time,
I noticed that grab_person() seems to be an area worth optimizing.
grab_person() uses its parameter "const char *who" for type comparison,
but since we added `enum atom_type` to ref-filter, we can use that for
some of the comparisons. There are also two for() loops in grab_person(),
which we can merge into one. With this patch [1], there is a slight
improvement in performance.
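
The first idea, matching on a precomputed enum instead of comparing the
"who" string for every atom, can be sketched roughly as follows. This is a
simplified illustration and not the code from [1]: `enum atom_kind`,
`struct used_atom_sketch` and both functions are hypothetical stand-ins for
ref-filter's real `enum atom_type` and its atom bookkeeping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for ref-filter's `enum atom_type`. */
enum atom_kind { KIND_AUTHOR, KIND_COMMITTER, KIND_TAGGER, KIND_OTHER };

/* Hypothetical atom record: the kind is filled in once, at format-parse time. */
struct used_atom_sketch {
	const char *name;      /* e.g. "authorname" */
	enum atom_kind kind;
};

/* String-based matching: one strncmp() per atom, per object. */
static int count_by_name(const struct used_atom_sketch *atoms, size_t nr,
			 const char *who)
{
	size_t wholen = strlen(who);
	size_t i;
	int hits = 0;

	for (i = 0; i < nr; i++)
		if (!strncmp(atoms[i].name, who, wholen))
			hits++;
	return hits;
}

/* Enum-based matching: one integer comparison per atom, per object. */
static int count_by_kind(const struct used_atom_sketch *atoms, size_t nr,
			 enum atom_kind kind)
{
	size_t i;
	int hits = 0;

	for (i = 0; i < nr; i++)
		if (atoms[i].kind == kind)
			hits++;
	return hits;
}
```

Both functions select the same atoms; the second only moves the string
work to the point where the format is parsed, which happens once rather
than once per object.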

Test                                                this tree         HEAD~
-------------------------------------------------------------------------------------------
1006.2: cat-file --batch-check                      0.14(0.13+0.01)   0.15(0.14+0.01) +7.1%
1006.3: cat-file --batch-check with atoms           0.12(0.10+0.01)   0.12(0.09+0.02) +0.0%
1006.4: cat-file --batch-check with contents atom   0.66(0.65+0.01)   0.66(0.64+0.02) +0.0%
1006.5: cat-file --batch                            0.60(0.57+0.02)   0.60(0.57+0.03) +0.0%
1006.6: cat-file --batch with atoms                 0.58(0.53+0.04)   0.58(0.56+0.02) +0.0%
1006.7: cat-file --batch with person atoms          0.59(0.57+0.02)   0.60(0.56+0.04) +1.7%

It’s also worth mentioning that grab_person() seems to repeat parsing
that parse_object_buffer() may already have done. parse_commit_buffer()
and parse_tag_buffer() have already parsed part of the object's content,
and that result is used by grab_tag_values() and grab_commit_values().
For the time being, I think this is a kind of shallow parsing; if we
could let parse_object_buffer() do in-depth parsing, it would be great.
We could save a lot of work in grab_person()... Of course this may be a
little difficult.
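
As an illustration of what "in-depth parsing" could mean here, the sketch
below walks a commit header once and remembers where the author and
committer lines start, so that later lookups would not need to search the
buffer again. It is only a sketch under assumptions: `struct person_lines`
and `find_person_lines()` are hypothetical names and do not exist in Git,
and real code would also have to handle tags and other edge cases.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cache of positions found during a single header scan. */
struct person_lines {
	const char *author;     /* just past "author ", or NULL if absent */
	const char *committer;  /* just past "committer ", or NULL if absent */
};

/*
 * Scan the header of a NUL-terminated commit buffer once. The header
 * ends at the first empty line, so text in the message body is never
 * mistaken for a header field.
 */
static void find_person_lines(const char *buf, struct person_lines *out)
{
	out->author = NULL;
	out->committer = NULL;

	while (*buf && *buf != '\n') {
		const char *eol = strchr(buf, '\n');

		if (!strncmp(buf, "author ", 7))
			out->author = buf + 7;
		else if (!strncmp(buf, "committer ", 10))
			out->committer = buf + 10;

		if (!eol)
			break;
		buf = eol + 1;
	}
}
```

With such a structure filled in at parse time, each person atom could
start from the stored pointer instead of rescanning the object.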

Thanks.
--
ZheNing Hu

[1]: https://github.com/adlternative/git/commit/cec0ee72e64d651c01d7a2a7fe17a4adab1ef0de
