All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, ZheNing Hu <adlternative@gmail.com>
Subject: Re: [hacky PATCH 0/2] speeding up trivial for-each-ref invocations
Date: Mon, 6 Sep 2021 08:54:29 +0200	[thread overview]
Message-ID: <YTW7JV0sCq2rqvRT@ncase> (raw)
In-Reply-To: <YTNpQ7Od1U/5i0R7@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 3332 bytes --]

On Sat, Sep 04, 2021 at 08:40:35AM -0400, Jeff King wrote:
> Somebody complained to me off-list recently that for-each-ref was slow
> when doing a trivial listing (like refname + objectname) of a large
> number of refs, even though you could get mostly the same output by
> just dumping the packed-refs file.
> 
> So I was nerd-sniped into making this horrible series, which does speed
> up some cases. It was a combination of "how fast can we easily get it",
> plus I was curious _which_ parts of for-each-ref were actually slow.
> 
> In this version there are 2 patches, tested against 'git for-each-ref
> --format="%(objectname) %(refname)"' on a fully packed repo with 500k
> refs:
> 
>   - directly outputting items rather than building up a ref_array yields
>     about a 1.5x speedup
> 
>   - avoiding the whole atom-formatting system yields a 2.5x speedup on
>     top of that

Interesting, thanks for the PoC. Your 500k refs naturally triggered me
given that I have recently been trying to optimize such repos with lots
of refs, too. So I ran your series on my gitlab-org/gitlab repository
with 2.3M ref and saw similar speedups:

    Benchmark #1: jk-for-each-ref-speedup~2: git-for-each-ref --format='%(objectname) %(refname)'
      Time (mean ± σ):      2.756 s ±  0.013 s    [User: 2.416 s, System: 0.339 s]
      Range (min … max):    2.740 s …  2.774 s    5 runs

    Benchmark #2: jk-for-each-ref-speedup~: git-for-each-ref --format='%(objectname) %(refname)'
      Time (mean ± σ):      2.009 s ±  0.015 s    [User: 1.974 s, System: 0.035 s]
      Range (min … max):    1.984 s …  2.020 s    5 runs

    Benchmark #3: jk-for-each-ref-speedup: git-for-each-ref --format='%(objectname) %(refname)'
      Time (mean ± σ):     701.8 ms ±   4.2 ms    [User: 661.3 ms, System: 40.4 ms]
      Range (min … max):   696.1 ms … 706.9 ms    5 runs

    Summary
      'jk-for-each-ref-speedup: git-for-each-ref --format='%(objectname) %(refname)'' ran
        2.86 ± 0.03 times faster than 'jk-for-each-ref-speedup~: git-for-each-ref --format='%(objectname) %(refname)''
        3.93 ± 0.03 times faster than 'jk-for-each-ref-speedup~2: git-for-each-ref --format='%(objectname) %(refname)''

[snip]
> I'm not sure I'm really advocating that we should do something like
> this, though it is tempting because it does make some common queries
> much faster. But I wanted to share it here to give some insight on where
> the time may be going in ref-filter / for-each-ref.

Well, avoiding the sort like you do in #1 feels like the right thing to
do to me. I saw similar wins when optimizing the connectivity checks,
and it only makes sense that sorting becomes increasingly expensive as
your number of refs grows. And if we do skip the sort, then decreasing
the initial latency feels like the logical next step because we know we
can stream out things directly.

The quick-formatting is a different story altogether, as you rightly
point out. Having to explain why '%(objectname) %(refname)' is 2.5x
faster than '%(refname) %(objectname)' is probably nothing we'd want to
do. But in any case, your patch does point out that the current
formatting code could use some tweaking.

In any case, this is only my 2c. Thanks for this series!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      parent reply	other threads:[~2021-09-06  6:54 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-04 12:40 [hacky PATCH 0/2] speeding up trivial for-each-ref invocations Jeff King
2021-09-04 12:41 ` [PATCH 1/2] ref-filter: hacky "streaming" mode Jeff King
2021-09-05  8:20   ` ZheNing Hu
2021-09-05 13:04     ` Jeff King
2021-09-07  5:28       ` ZheNing Hu
2021-09-07 18:01         ` Jeff King
2021-09-09 14:45           ` ZheNing Hu
2021-09-10 14:26             ` Jeff King
2021-09-15 12:27               ` ZheNing Hu
2021-09-15 14:23                 ` ZheNing Hu
2021-09-16 21:45                   ` Jeff King
2021-09-20  7:42                     ` ZheNing Hu
2021-09-16 21:31                 ` Jeff King
2021-09-05 13:15     ` Jeff King
2021-09-07  5:42       ` ZheNing Hu
2021-09-04 12:42 ` [PATCH 2/2] ref-filter: implement "quick" formats Jeff King
2021-09-05  8:20   ` ZheNing Hu
2021-09-05 13:07     ` Jeff King
2021-09-06 13:34       ` ZheNing Hu
2021-09-07 20:06       ` Junio C Hamano
2021-09-05  8:19 ` [hacky PATCH 0/2] speeding up trivial for-each-ref invocations ZheNing Hu
2021-09-05 12:49   ` Jeff King
2021-09-06 13:30     ` ZheNing Hu
2021-09-07 17:28       ` Jeff King
2021-09-09 13:20         ` ZheNing Hu
2021-09-06  6:54 ` Patrick Steinhardt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YTW7JV0sCq2rqvRT@ncase \
    --to=ps@pks.im \
    --cc=adlternative@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.