git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "Git List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH 2/2] ref-filter: implement "quick" formats
Date: Sun, 5 Sep 2021 16:20:07 +0800	[thread overview]
Message-ID: <CAOLTT8QYe3PBPxSH8CYY+FatSfT7C5m6nccR2xMZ1yxSDFh5OQ@mail.gmail.com> (raw)
In-Reply-To: <YTNps0YBOaRNvPzk@coredump.intra.peff.net>

Jeff King <peff@peff.net> 于2021年9月4日周六 下午8:42写道:
>
> Some commonly-used formats can be output _much_ faster than going
> through the usual atom-formatting code. E.g., "%(objectname) %(refname)"
> can just be a simple printf. This commit detects a few easy cases and
> uses a hard-coded output function instead.
>
> Note two things about the implementation:
>
>  - we could probably go further here. E.g., %(refname:lstrip) should be
>    easy-ish to optimize, too. Likewise, mixed-text like "delete
>    %(refname)" would be nice to have.
>
>  - the code is repetitive and enumerates all the cases, and it feels
>    like we ought to be able to generalize it more. But that's exactly
>    what the current formatting tries to do!
>
> So this whole thing is pretty horrible, and is a hack around the
> slowness of the whole used_atom system. It _should_ be possible to
> refactor that system to have roughly the same cost, but this will serve
> in the meantime.
>
> Here are some numbers ("stream" is Git with the streaming optimization
> from the previous commit, and "quick" is this commit):
>
>   Benchmark #1: ./git.stream for-each-ref --format='%(objectname) %(refname)'
>     Time (mean ± σ):     229.2 ms ±   6.6 ms    [User: 228.3 ms, System: 0.9 ms]
>     Range (min … max):   220.4 ms … 242.6 ms    13 runs
>
>   Benchmark #2: ./git.quick for-each-ref --format='%(objectname) %(refname)'
>     Time (mean ± σ):      94.8 ms ±   2.2 ms    [User: 93.5 ms, System: 1.4 ms]
>     Range (min … max):    90.8 ms …  98.2 ms    32 runs
>
>   Summary
>     './git.quick for-each-ref --format='%(objectname) %(refname)'' ran
>       2.42 ± 0.09 times faster than './git.stream for-each-ref --format='%(objectname) %(refname)''
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  ref-filter.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  ref-filter.h | 13 +++++++++++
>  2 files changed, 76 insertions(+)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 17b78b1d30..1efa3aadc8 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1009,6 +1009,37 @@ static int reject_atom(enum atom_type atom_type)
>         return atom_type == ATOM_REST;
>  }
>
> +static void set_up_quick_format(struct ref_format *format)
> +{
> +       /* quick formats don't handle any special quoting */
> +       if (format->quote_style)
> +               return;
> +
> +       /*
> +        * no atoms at all; this should be uncommon in real life, but it may be
> +        * interesting for benchmarking
> +        */
> +       if (!used_atom_cnt) {
> +               format->quick = REF_FORMAT_QUICK_VERBATIM;
> +               return;
> +       }
> +
> +       /*
> +        * It's tempting to look at used_atom here, but it's wrong to do so: we
> +        * need not only to be sure of the needed atoms, but also their order
> +        * and any verbatim parts of the format. So instead, let's just
> +        * hard-code some specific formats.
> +        */
> +       if (!strcmp(format->format, "%(refname)"))
> +               format->quick = REF_FORMAT_QUICK_REFNAME;
> +       else if (!strcmp(format->format, "%(objectname)"))
> +               format->quick = REF_FORMAT_QUICK_OBJECTNAME;
> +       else if (!strcmp(format->format, "%(refname) %(objectname)"))
> +               format->quick = REF_FORMAT_QUICK_REFNAME_OBJECTNAME;
> +       else if (!strcmp(format->format, "%(objectname) %(refname)"))
> +               format->quick = REF_FORMAT_QUICK_OBJECTNAME_REFNAME;
> +}
> +
>  /*
>   * Make sure the format string is well formed, and parse out
>   * the used atoms.
> @@ -1047,6 +1078,9 @@ int verify_ref_format(struct ref_format *format)
>         }
>         if (format->need_color_reset_at_eol && !want_color(format->use_color))
>                 format->need_color_reset_at_eol = 0;
> +
> +       set_up_quick_format(format);
> +
>         return 0;
>  }
>
> @@ -2617,6 +2651,32 @@ static void append_literal(const char *cp, const char *ep, struct ref_formatting
>         }
>  }
>
> +static int quick_ref_format(const struct ref_format *format,
> +                           const char *refname,
> +                           const struct object_id *oid)
> +{
> +       switch(format->quick) {
> +       case REF_FORMAT_QUICK_NONE:
> +               return -1;
> +       case REF_FORMAT_QUICK_VERBATIM:
> +               printf("%s\n", format->format);
> +               return 0;
> +       case REF_FORMAT_QUICK_REFNAME:
> +               printf("%s\n", refname);
> +               return 0;
> +       case REF_FORMAT_QUICK_OBJECTNAME:
> +               printf("%s\n", oid_to_hex(oid));
> +               return 0;
> +       case REF_FORMAT_QUICK_REFNAME_OBJECTNAME:
> +               printf("%s %s\n", refname, oid_to_hex(oid));
> +               return 0;
> +       case REF_FORMAT_QUICK_OBJECTNAME_REFNAME:
> +               printf("%s %s\n", oid_to_hex(oid), refname);
> +               return 0;
> +       }
> +       BUG("unknown ref_format_quick value: %d", format->quick);
> +}
> +

So as a fast path, we actually avoided format_ref_array_item() when we are using
%(objectname) and %(refname). But the problem is that it’s not very elegant
(using string compare), and it is no optimization for other atoms that
require in-depth
parsing. I remember the "fast path" used by Ævar last time, and it
seems that Junio doesn't
like them. [1][2]

>  int format_ref_array_item(struct ref_array_item *info,
>                           struct ref_format *format,
>                           struct strbuf *final_buf,
> @@ -2670,6 +2730,9 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
>         struct strbuf output = STRBUF_INIT;
>         struct strbuf err = STRBUF_INIT;
>
> +       if (!quick_ref_format(format, name, oid))
> +               return;
> +
>         ref_item = new_ref_array_item(name, oid);
>         ref_item->kind = ref_kind_from_refname(name);
>         if (format_ref_array_item(ref_item, format, &output, &err))
> diff --git a/ref-filter.h b/ref-filter.h
> index ecea1837a2..fde5c3a1cb 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -87,6 +87,19 @@ struct ref_format {
>
>         /* Internal state to ref-filter */
>         int need_color_reset_at_eol;
> +
> +       /*
> +        * Set by verify_ref_format(); if not NONE, we can skip the usual
> +        * formatting and use an optimized routine.
> +        */
> +       enum ref_format_quick {
> +               REF_FORMAT_QUICK_NONE = 0,
> +               REF_FORMAT_QUICK_VERBATIM,
> +               REF_FORMAT_QUICK_REFNAME,
> +               REF_FORMAT_QUICK_OBJECTNAME,
> +               REF_FORMAT_QUICK_REFNAME_OBJECTNAME,
> +               REF_FORMAT_QUICK_OBJECTNAME_REFNAME,
> +       } quick;
>  };
>
>  #define REF_FORMAT_INIT { .use_color = -1 }
> --
> 2.33.0.618.g5b11852304

[1]: https://lore.kernel.org/git/5903d02324f3275b3aa442bb3ca2602564c543dc.1626363626.git.gitgitgadget@gmail.com/
[2]: https://lore.kernel.org/git/87eecf8ork.fsf@evledraar.gmail.com/

  reply	other threads:[~2021-09-05  8:20 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-04 12:40 [hacky PATCH 0/2] speeding up trivial for-each-ref invocations Jeff King
2021-09-04 12:41 ` [PATCH 1/2] ref-filter: hacky "streaming" mode Jeff King
2021-09-05  8:20   ` ZheNing Hu
2021-09-05 13:04     ` Jeff King
2021-09-07  5:28       ` ZheNing Hu
2021-09-07 18:01         ` Jeff King
2021-09-09 14:45           ` ZheNing Hu
2021-09-10 14:26             ` Jeff King
2021-09-15 12:27               ` ZheNing Hu
2021-09-15 14:23                 ` ZheNing Hu
2021-09-16 21:45                   ` Jeff King
2021-09-20  7:42                     ` ZheNing Hu
2021-09-16 21:31                 ` Jeff King
2021-09-05 13:15     ` Jeff King
2021-09-07  5:42       ` ZheNing Hu
2021-09-04 12:42 ` [PATCH 2/2] ref-filter: implement "quick" formats Jeff King
2021-09-05  8:20   ` ZheNing Hu [this message]
2021-09-05 13:07     ` Jeff King
2021-09-06 13:34       ` ZheNing Hu
2021-09-07 20:06       ` Junio C Hamano
2021-09-05  8:19 ` [hacky PATCH 0/2] speeding up trivial for-each-ref invocations ZheNing Hu
2021-09-05 12:49   ` Jeff King
2021-09-06 13:30     ` ZheNing Hu
2021-09-07 17:28       ` Jeff King
2021-09-09 13:20         ` ZheNing Hu
2021-09-06  6:54 ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOLTT8QYe3PBPxSH8CYY+FatSfT7C5m6nccR2xMZ1yxSDFh5OQ@mail.gmail.com \
    --to=adlternative@gmail.com \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --subject='Re: [PATCH 2/2] ref-filter: implement "quick" formats' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).