From: Patrick Steinhardt <ps@pks.im> To: Jeff King <peff@peff.net> Cc: git@vger.kernel.org, ZheNing Hu <adlternative@gmail.com> Subject: Re: [hacky PATCH 0/2] speeding up trivial for-each-ref invocations Date: Mon, 6 Sep 2021 08:54:29 +0200 [thread overview] Message-ID: <YTW7JV0sCq2rqvRT@ncase> (raw) In-Reply-To: <YTNpQ7Od1U/5i0R7@coredump.intra.peff.net> [-- Attachment #1: Type: text/plain, Size: 3332 bytes --] On Sat, Sep 04, 2021 at 08:40:35AM -0400, Jeff King wrote: > Somebody complained to me off-list recently that for-each-ref was slow > when doing a trivial listing (like refname + objectname) of a large > number of refs, even though you could get mostly the same output by > just dumping the packed-refs file. > > So I was nerd-sniped into making this horrible series, which does speed > up some cases. It was a combination of "how fast can we easily get it", > plus I was curious _which_ parts of for-each-ref were actually slow. > > In this version there are 2 patches, tested against 'git for-each-ref > --format="%(objectname) %(refname)"' on a fully packed repo with 500k > refs: > > - directly outputting items rather than building up a ref_array yields > about a 1.5x speedup > > - avoiding the whole atom-formatting system yields a 2.5x speedup on > top of that Interesting, thanks for the PoC. Your 500k refs naturally triggered me given that I have recently been trying to optimize such repos with lots of refs, too. So I ran your series on my gitlab-org/gitlab repository with 2.3M ref and saw similar speedups: Benchmark #1: jk-for-each-ref-speedup~2: git-for-each-ref --format='%(objectname) %(refname)' Time (mean ± σ): 2.756 s ± 0.013 s [User: 2.416 s, System: 0.339 s] Range (min … max): 2.740 s … 2.774 s 5 runs Benchmark #2: jk-for-each-ref-speedup~: git-for-each-ref --format='%(objectname) %(refname)' Time (mean ± σ): 2.009 s ± 0.015 s [User: 1.974 s, System: 0.035 s] Range (min … max): 1.984 s … 2.020 s 5 runs Benchmark #3: jk-for-each-ref-speedup: git-for-each-ref --format='%(objectname) %(refname)' Time (mean ± σ): 701.8 ms ± 4.2 ms [User: 661.3 ms, System: 40.4 ms] Range (min … max): 696.1 ms … 706.9 ms 5 runs Summary 'jk-for-each-ref-speedup: git-for-each-ref --format='%(objectname) %(refname)'' ran 2.86 ± 0.03 times faster than 'jk-for-each-ref-speedup~: git-for-each-ref --format='%(objectname) %(refname)'' 3.93 ± 0.03 times faster than 'jk-for-each-ref-speedup~2: git-for-each-ref --format='%(objectname) %(refname)'' [snip] > I'm not sure I'm really advocating that we should do something like > this, though it is tempting because it does make some common queries > much faster. But I wanted to share it here to give some insight on where > the time may be going in ref-filter / for-each-ref. Well, avoiding the sort like you do in #1 feels like the right thing to do to me. I saw similar wins when optimizing the connectivity checks, and it only makes sense that sorting becomes increasingly expensive as your number of refs grows. And if we do skip the sort, then decreasing the initial latency feels like the logical next step because we know we can stream out things directly. The quick-formatting is a different story altogether, as you rightly point out. Having to explain why '%(objectname) %(refname)' is 2.5x faster than '%(refname) %(objectname)' is probably nothing we'd want to do. But in any case, your patch does point out that the current formatting code could use some tweaking. In any case, this is only my 2c. Thanks for this series! Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --]
prev parent reply other threads:[~2021-09-06 6:54 UTC|newest] Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-09-04 12:40 Jeff King 2021-09-04 12:41 ` [PATCH 1/2] ref-filter: hacky "streaming" mode Jeff King 2021-09-05 8:20 ` ZheNing Hu 2021-09-05 13:04 ` Jeff King 2021-09-07 5:28 ` ZheNing Hu 2021-09-07 18:01 ` Jeff King 2021-09-09 14:45 ` ZheNing Hu 2021-09-10 14:26 ` Jeff King 2021-09-15 12:27 ` ZheNing Hu 2021-09-15 14:23 ` ZheNing Hu 2021-09-16 21:45 ` Jeff King 2021-09-20 7:42 ` ZheNing Hu 2021-09-16 21:31 ` Jeff King 2021-09-05 13:15 ` Jeff King 2021-09-07 5:42 ` ZheNing Hu 2021-09-04 12:42 ` [PATCH 2/2] ref-filter: implement "quick" formats Jeff King 2021-09-05 8:20 ` ZheNing Hu 2021-09-05 13:07 ` Jeff King 2021-09-06 13:34 ` ZheNing Hu 2021-09-07 20:06 ` Junio C Hamano 2021-09-05 8:19 ` [hacky PATCH 0/2] speeding up trivial for-each-ref invocations ZheNing Hu 2021-09-05 12:49 ` Jeff King 2021-09-06 13:30 ` ZheNing Hu 2021-09-07 17:28 ` Jeff King 2021-09-09 13:20 ` ZheNing Hu 2021-09-06 6:54 ` Patrick Steinhardt [this message]
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=YTW7JV0sCq2rqvRT@ncase \ --to=ps@pks.im \ --cc=adlternative@gmail.com \ --cc=git@vger.kernel.org \ --cc=peff@peff.net \ --subject='Re: [hacky PATCH 0/2] speeding up trivial for-each-ref invocations' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).