From: Christian Couder <christian.couder@gmail.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: "Git List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Hariom verma" <hariom18599@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>
Subject: Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?
Date: Fri, 20 Aug 2021 18:13:19 +0200
Message-ID: <CAP8UFD0WB7FYtp9aX4qz5BmLiNz1S5PA1U8-cB8b9zRqdZHOjw@mail.gmail.com>
In-Reply-To: <CAOLTT8Qx3=C=MwRmKbrp=G5T_rQVcaLbZfzzO60m7P-_k1qh8A@mail.gmail.com>

Hi ZheNing,

On Thu, Aug 19, 2021 at 3:39 AM ZheNing Hu <adlternative@gmail.com> wrote:
>
> Hi, Christian and Hariom,
>
> I want to use this patch series as the final version of my GSoC project for now:
>
> https://github.com/adlternative/git/commits/cat-file-reuse-ref-filter-logic

I am still not very happy with the last patch in the series, but it can
be improved later.

> Neither the ref-filter-opt-code-logic branch nor the ref-filter-opt-perf
> patch series can currently carry its optimization over to git cat-file
> --batch, so the cat-file-reuse-ref-filter-logic branch is the most
> effective one for now.
>
> This is the final performance regression test result:
> Test                                        upstream/master   this tree
> ------------------------------------------------------------------------------------
> 1006.2: cat-file --batch-check              0.06(0.06+0.00)   0.08(0.07+0.00) +33.3%
> 1006.3: cat-file --batch-check with atoms   0.06(0.04+0.01)   0.06(0.06+0.00) +0.0%
> 1006.4: cat-file --batch                    0.49(0.47+0.02)   0.48(0.47+0.01) -2.0%
> 1006.5: cat-file --batch with atoms         0.48(0.44+0.03)   0.47(0.46+0.01) -2.1%
>
> git cat-file --batch has a performance improvement of about 2%.
> git cat-file --batch-check still has a performance gap of 33.3%.
>
> The performance degradation of git cat-file --batch-check is actually
> not very big.
>
> upstream/master (225bc32a98):
>
> $ hyperfine --warmup=10  "~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects"
> Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
>  Time (mean ± σ):     596.2 ms ±   5.7 ms    [User: 563.0 ms, System: 32.5 ms]
>  Range (min … max):   586.9 ms … 607.9 ms    10 runs
>
> cat-file-reuse-ref-filter-logic (709a0c5c12):
>
> $ hyperfine --warmup=10  "~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects"
> Benchmark #1: ~/git/bin-wrappers/git cat-file --batch-check --batch-all-objects
>  Time (mean ± σ):     601.3 ms ±   5.8 ms    [User: 566.9 ms, System: 33.9 ms]
>  Range (min … max):   596.7 ms … 613.3 ms    10 runs
>
> The execution times of git cat-file --batch-check differ by only a few
> milliseconds.

Yeah, it looks like less than 1% overhead.

Great work!

> But look at the execution time changes of git cat-file --batch:
>
> upstream/master (225bc32a98):
>
> $ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects >/dev/null
> /home/adl/git/bin-wrappers/git cat-file --batch --batch-all-objects >  24.61s user 0.30s system 99% cpu 24.908 total
>
> cat-file-reuse-ref-filter-logic (709a0c5c12):
>
> $ time ~/git/bin-wrappers/git cat-file --batch --batch-all-objects >/dev/null
> cat-file --batch --batch-all-objects > /dev/null  25.10s user 0.30s system 99% cpu 25.417 total
>
> The execution time has been reduced by nearly 0.5 seconds.

It looks like it has increased by 0.5s, not been reduced.
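
For reference, the relative changes implied by the quoted timings can be
checked with a few lines of Python. This is only a sketch: the helper name
relative_change is made up for illustration, and the inputs are the hyperfine
means and the "total" wall-clock times quoted above.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# hyperfine means for cat-file --batch-check, in milliseconds
batch_check = relative_change(596.2, 601.3)

# wall-clock totals for cat-file --batch, in seconds
batch = relative_change(24.908, 25.417)

# A positive sign means the new branch is slower than upstream/master.
print(f"--batch-check: {batch_check:+.2%}")
print(f"--batch:       {batch:+.2%}")
```

Both values come out positive, i.e. slower on the new branch: a bit under 1%
for --batch-check and about 2% for --batch.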

> Intuition
> tells me that the performance improvement of git cat-file --batch will be
> more important.
>
> In fact, the original git cat-file code adds the obtained object data
> directly to the output buffer, but with the ref-filter logic it has to
> copy the object data to the intermediate data (atom_value) first, and
> only then to the output buffer. At present, we cannot easily eliminate
> the intermediate data, because git for-each-ref --sort depends on it
> heavily, but we can reduce the overhead of copying or allocating memory
> as much as possible.

Ok.

> I had an idea that I didn't implement before: delayed evaluation of
> part of the data. More specifically, wait until the data is about to be
> added to the output buffer before forming the specific output content;
> this may be a way to bypass the intermediate data.

Yeah, that might be a good idea.
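
A minimal sketch of what such delayed evaluation could look like, written in
Python rather than Git's C purely for illustration. Every name here
(ATOM_WRITERS, format_eager, format_lazy, the dict standing in for an object)
is hypothetical and does not correspond to actual ref-filter code. The eager
path materialises one intermediate value per atom and then copies it to the
output; the lazy path lets each atom write straight into the output buffer.

```python
import io
from typing import Callable, Dict, List

# Hypothetical atom table: each atom knows how to write itself into a buffer.
ATOM_WRITERS: Dict[str, Callable[[dict, io.BytesIO], int]] = {
    "objectname": lambda obj, out: out.write(obj["oid"].encode()),
    "raw": lambda obj, out: out.write(obj["content"]),
}


def format_eager(obj: dict, atoms: List[str], out: io.BytesIO) -> None:
    """Build an intermediate value per atom first, then copy it to `out`."""
    values = []
    for name in atoms:
        tmp = io.BytesIO()              # stands in for the intermediate data
        ATOM_WRITERS[name](obj, tmp)
        values.append(tmp.getvalue())   # first copy: into the intermediate value
    for value in values:
        out.write(value)                # second copy: into the output buffer
        out.write(b"\n")


def format_lazy(obj: dict, atoms: List[str], out: io.BytesIO) -> None:
    """Evaluate each atom only when it is about to be written to `out`."""
    for name in atoms:
        ATOM_WRITERS[name](obj, out)    # single write, no intermediate value
        out.write(b"\n")


obj = {"oid": "225bc32a98", "content": b"blob data"}
eager_out, lazy_out = io.BytesIO(), io.BytesIO()
format_eager(obj, ["objectname", "raw"], eager_out)
format_lazy(obj, ["objectname", "raw"], lazy_out)
```

Both paths produce the same bytes. The sketch assumes that nothing needs the
values before output; a sort step would still have to materialise the atoms
it sorts on.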
