From: Jeff King <peff@peff.net>
To: ZheNing Hu <adlternative@gmail.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH 1/2] ref-filter: hacky "streaming" mode
Date: Sun, 5 Sep 2021 09:15:04 -0400
Message-ID: <YTTC2IUO1ZmTOEoR@coredump.intra.peff.net>
In-Reply-To: <CAOLTT8Tka0nxHb3G9yb-fs8ue7RaPCUVSKi5PM+GY+rMjFRnog@mail.gmail.com>

On Sun, Sep 05, 2021 at 04:20:02PM +0800, ZheNing Hu wrote:

> > +       if (ref_cbdata->filter->streaming_format) {
> > +               pretty_print_ref(refname, oid, ref_cbdata->filter->streaming_format);
> 
> So we directly use pretty_print_ref() in streaming mode, OK.
> 
> > +       } else {
> > +               /*
> > +                * We do not open the object yet; sort may only need refname
> > +                * to do its job and the resulting list may yet to be pruned
> > +                * by maxcount logic.
> > +                */
> > +               ref = ref_array_push(ref_cbdata->array, refname, oid);
> > +               ref->commit = commit;
> > +               ref->flag = flag;
> > +               ref->kind = kind;
> > +       }
> >
> >         return 0;
> >  }
> 
> Therefore, in streaming mode, there is no need to push ref to
> ref_array, which can reduce the overhead of malloc(), free(),
> which makes sense.

By the way, one thing I wondered here: how much of the benefit is from
avoiding the ref_array, and how much is from skipping the sort entirely?

It turns out that most of it is from the latter. If I do this:

diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 89cb6307d4..037d5db814 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -78,7 +78,11 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
 	filter.name_patterns = argv;
 	filter.match_as_path = 1;
 	filter_refs(&array, &filter, FILTER_REFS_ALL | FILTER_REFS_INCLUDE_BROKEN);
-	ref_array_sort(sorting, &array);
+	/*
+	 * we should skip this only when we are using the default refname
+	 * sorting, but as an experimental hack, we'll just comment it out.
+	 */
+	// ref_array_sort(sorting, &array);
 
 	if (!maxcount || array.nr < maxcount)
 		maxcount = array.nr;

then the timings I get are:

  Benchmark #1: ./git.old for-each-ref --format='%(objectname) %(refname)'
    Time (mean ± σ):     341.4 ms ±   7.4 ms    [User: 299.8 ms, System: 41.6 ms]
    Range (min … max):   333.5 ms … 355.1 ms    10 runs
   
  Benchmark #2: ./git.new for-each-ref --format='%(objectname) %(refname)'
    Time (mean ± σ):     249.1 ms ±   5.7 ms    [User: 211.8 ms, System: 37.2 ms]
    Range (min … max):   245.9 ms … 267.0 ms    12 runs
   
  Summary
    './git.new for-each-ref --format='%(objectname) %(refname)'' ran
      1.37 ± 0.04 times faster than './git.old for-each-ref --format='%(objectname) %(refname)''
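
As a rough check on that summary line, the 1.37 is just the ratio of the
two means. The sketch below recomputes it and, assuming that speedups
compose multiplicatively, estimates what is left of the roughly 1.5x
improvement reported for the original patch once the sort is accounted
for. The function names are made up for illustration, and 1.5 is a
rounded figure, so treat the remainder as approximate.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup factor of a "new" timing relative to an "old" one. */
static double speedup(double old_ms, double new_ms)
{
	assert(new_ms > 0);
	return old_ms / new_ms;
}

/*
 * Assumption: speedups compose multiplicatively, so the part of a
 * total speedup not explained by one change is total / part.
 */
static double residual_speedup(double total, double part)
{
	assert(part > 0);
	return total / part;
}

/* Print the breakdown for the mean timings quoted above. */
static void print_breakdown(void)
{
	double no_sort = speedup(341.4, 249.1);

	printf("skipping the sort: %.2fx\n", no_sort);
	printf("left over from 1.5x: %.2fx\n",
	       residual_speedup(1.5, no_sort));
}
```

Calling print_breakdown() reports 1.37x for the sort and about 1.09x for
the remainder, which is what avoiding the ref_array would account for
under that assumption.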

So of the 1.5x improvement that the original patch showed, 1.37x comes
from skipping the sort of the already-sorted data. I suspect that has
less to do with the sorting itself, and more to do with the fact that
even just formatting "%(refname)" for each entry takes a non-trivial
amount of time.
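
For a non-hacky version, the condition mentioned in the comment above
("only when we are using the default refname sorting") might be checked
along these lines. This is only a sketch: struct sort_key and
sort_is_noop() are hypothetical stand-ins, not the real struct
ref_sorting or any existing ref-filter function, and it assumes the refs
arrive from iteration already ordered by refname.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed --sort key. */
struct sort_key {
	const char *atom;      /* e.g. "refname" */
	int reverse;           /* non-zero for "--sort=-<atom>" */
	int ignore_case;       /* non-zero for case-insensitive sorting */
	struct sort_key *next; /* secondary keys, if any */
};

/*
 * Return 1 if sorting by "keys" would leave a list that is already in
 * refname order unchanged, so that the sort could be skipped.
 */
static int sort_is_noop(const struct sort_key *keys)
{
	if (!keys)
		return 1; /* assumption: no key means default refname order */
	assert(keys->atom != NULL);
	if (keys->next)
		return 0; /* be conservative with multiple keys */
	if (keys->reverse || keys->ignore_case)
		return 0; /* either one can change the order */
	return !strcmp(keys->atom, "refname");
}
```

The caller would then run the sort only when sort_is_noop() returns 0.
Anything other than a plain, ascending, case-sensitive "refname" key is
treated as needing a real sort, which errs on the side of correctness.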

-Peff
