From: ZheNing Hu <adlternative@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: ZheNing Hu via GitGitGadget <gitgitgadget@gmail.com>,
	Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	Hariom Verma <hariom18599@gmail.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>, Jeff King <peff@peff.net>
Subject: Re: [PATCH 12/15] [GSOC] cat-file: reuse ref-filter logic
Date: Sat, 3 Jul 2021 19:45:30 +0800
Message-ID: <CAOLTT8RdujpQ2uKEWPyG0HGkUz_EsONw3hEZ6YAhpmQc5rgohA@mail.gmail.com>
In-Reply-To: <877di8al8n.fsf@evledraar.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2680 bytes --]

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on Fri, Jul 2, 2021 at 9:39 PM:
>
>
> On Thu, Jul 01 2021, ZheNing Hu via GitGitGadget wrote:
>
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > In order to let cat-file use ref-filter logic, let's do the
> > following:
> >
> > 1. Change the type of member `format` in struct `batch_options`
> > to `ref_format`, which we will pass to ref-filter later.
> > 2. Let `batch_objects()` add atoms to format, and use
> > `verify_ref_format()` to check atoms.
> > 3. Use `format_ref_array_item()` in `batch_object_write()` to
> > get the formatted data corresponding to the object. If the
> > return value of `format_ref_array_item()` is equal to zero,
> > use `batch_write()` to print object data; else if the return
> > value is less than zero, use `die()` to print the error message
> > and exit; else if return value is greater than zero, only print
> > the error message, but don't exit.
> > 4. Use free_ref_array_item_value() to free ref_array_item's
> > value.
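The three-way rule in step 3 can be illustrated with a small standalone C sketch. It is not the patch's code: classify_format_result(), handle_object() and enum batch_action are hypothetical names, and only the sign convention (0 = write, < 0 = die, > 0 = report and continue) comes from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical actions; only the sign convention comes from the patch. */
enum batch_action { BATCH_WRITE, BATCH_DIE, BATCH_WARN };

/* Map a format_ref_array_item()-style return value to an action:
 * 0 -> write the object data, < 0 -> fatal, > 0 -> report and continue. */
static enum batch_action classify_format_result(int ret)
{
	if (!ret)
		return BATCH_WRITE;
	if (ret < 0)
		return BATCH_DIE;
	return BATCH_WARN;
}

/* Hypothetical stand-in for the tail of batch_object_write(): 'formatted'
 * holds the formatted object data, 'err' the formatter's error message.
 * Returns 0 after writing, 1 after a non-fatal error. */
static int handle_object(int ret, const char *formatted, const char *err)
{
	switch (classify_format_result(ret)) {
	case BATCH_WRITE:
		fputs(formatted, stdout);
		return 0;
	case BATCH_DIE:
		assert(err != NULL);
		/* Like Git's die(): print "fatal: ..." and exit with status 128. */
		fprintf(stderr, "fatal: %s\n", err);
		exit(128);
	case BATCH_WARN:
	default:
		assert(err != NULL);
		/* Non-fatal: report the error and let the caller continue. */
		fprintf(stderr, "%s\n", err);
		return 1;
	}
}
```

For example, handle_object(1, "", "some error") prints the message to stderr and returns 1, so a batch loop could move on to the next object instead of exiting.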
> >
> > Most of the atoms in `for-each-ref --format` are now supported,
> > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> > `%(flag)`, `%(HEAD)`, because these atoms only make sense for objects
> > that are pointed to by a ref. The "for-each-ref" family can naturally use
> > these atoms, but not all objects are pointed to by a ref, so "cat-file"
> > cannot use them.
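A minimal sketch of such a rejection check, assuming atoms are compared by name with any ":modifier" suffix (as in "refname:short") ignored. The list of rejected atoms is taken from the description above; the function name atom_allowed_in_cat_file() and the lookup strategy are hypothetical, and ref-filter's actual atom parsing may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Atoms the patch description says cat-file rejects, because they only
 * make sense for objects reached through a ref. */
static const char *const ref_only_atoms[] = {
	"refname", "symref", "upstream", "push",
	"worktreepath", "flag", "HEAD",
};

/* Hypothetical helper: return 1 if the atom (without the surrounding
 * "%(" and ")", possibly with a ":modifier" suffix) may be used by
 * cat-file, 0 if it is ref-only. */
static int atom_allowed_in_cat_file(const char *atom)
{
	size_t name_len;
	size_t i;

	assert(atom != NULL);
	/* Compare only the atom name, ignoring any ":modifier" part. */
	name_len = strcspn(atom, ":");
	for (i = 0; i < sizeof(ref_only_atoms) / sizeof(ref_only_atoms[0]); i++) {
		if (strlen(ref_only_atoms[i]) == name_len &&
		    !strncmp(atom, ref_only_atoms[i], name_len))
			return 0;
	}
	return 1;
}
```

Comparing only the part before ":" means "refname" and "refname:short" are rejected alike, while object-level atoms such as "tree" or "author" pass.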
> >
> > The performance for `git cat-file --batch-all-objects
> > --batch-check` on the Git repository itself with performance
> > testing tool `hyperfine` changes from 669.4 ms ±  31.1 ms to
> > 1.134 s ±  0.063 s.
> >
> > The performance for `git cat-file --batch-all-objects --batch >/dev/null`
> > on the Git repository itself with performance testing
> > tool `time` changes from "27.37s user 0.29s system 98% cpu 28.089
> > total" to "33.69s user 1.54s system 87% cpu 40.258 total".
>
> This new feature is really nice, but that's a really bad performance
> regression. A lot of software in the wild relies on "cat-file --batch"
> to be *the* performant interface to git for mass-extraction of object
> data.
>

Thanks, this performance regression is indeed worrying.

> That's an increase of ~70% and ~20%, respectively. Have you dug into
> (e.g. with a profiler) where we're now spending all this time?
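As a quick arithmetic check on those percentages (a small helper written for this note, not part of Git): ~70% matches the hyperfine means, and ~20% matches the user CPU time of the `time` run, while its wall-clock total grew by roughly 43%.

```c
#include <assert.h>

/* Relative increase, in percent, from 'before' to 'after'. */
static double percent_increase(double before, double after)
{
	assert(before > 0.0); /* a zero baseline has no meaningful ratio */
	return (after - before) / before * 100.0;
}
```

With the quoted figures, percent_increase(669.4, 1134.0) is about 69.4 (hyperfine mean, in ms), percent_increase(27.37, 33.69) about 23.1 (user time) and percent_increase(28.089, 40.258) about 43.3 (wall-clock total).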

See the two attached performance flame graphs:
oid_object_info_extended() in get_object() is the main performance
bottleneck.

--
ZheNing Hu

[-- Attachment #2: cat-file-batch-batch-all-objects.svg --]
[-- Type: image/svg+xml, Size: 172910 bytes --]

[-- Attachment #3: cat-file-batch-check-batch-all-objects.svg --]
[-- Type: image/svg+xml, Size: 237495 bytes --]

