From: ZheNing Hu <adlternative@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "Git List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH 2/2] ref-filter: implement "quick" formats
Date: Sun, 5 Sep 2021 16:20:07 +0800
Message-ID: <CAOLTT8QYe3PBPxSH8CYY+FatSfT7C5m6nccR2xMZ1yxSDFh5OQ@mail.gmail.com>
In-Reply-To: <YTNps0YBOaRNvPzk@coredump.intra.peff.net>

Jeff King <peff@peff.net> wrote on Sat, Sep 4, 2021 at 8:42 PM:
>
> Some commonly-used formats can be output _much_ faster than going
> through the usual atom-formatting code. E.g., "%(objectname) %(refname)"
> can just be a simple printf. This commit detects a few easy cases and
> uses a hard-coded output function instead.
>
> Note two things about the implementation:
>
>  - we could probably go further here. E.g., %(refname:lstrip) should be
>    easy-ish to optimize, too. Likewise, mixed-text like "delete
>    %(refname)" would be nice to have.
>
>  - the code is repetitive and enumerates all the cases, and it feels
>    like we ought to be able to generalize it more. But that's exactly
>    what the current formatting tries to do!
>
> So this whole thing is pretty horrible, and is a hack around the
> slowness of the whole used_atom system. It _should_ be possible to
> refactor that system to have roughly the same cost, but this will serve
> in the meantime.
>
> Here are some numbers ("stream" is Git with the streaming optimization
> from the previous commit, and "quick" is this commit):
>
>   Benchmark #1: ./git.stream for-each-ref --format='%(objectname) %(refname)'
>     Time (mean ± σ):     229.2 ms ±   6.6 ms    [User: 228.3 ms, System: 0.9 ms]
>     Range (min … max):   220.4 ms … 242.6 ms    13 runs
>
>   Benchmark #2: ./git.quick for-each-ref --format='%(objectname) %(refname)'
>     Time (mean ± σ):      94.8 ms ±   2.2 ms    [User: 93.5 ms, System: 1.4 ms]
>     Range (min … max):    90.8 ms …  98.2 ms    32 runs
>
>   Summary
>     './git.quick for-each-ref --format='%(objectname) %(refname)'' ran
>       2.42 ± 0.09 times faster than './git.stream for-each-ref --format='%(objectname) %(refname)''
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  ref-filter.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  ref-filter.h | 13 +++++++++++
>  2 files changed, 76 insertions(+)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index 17b78b1d30..1efa3aadc8 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1009,6 +1009,37 @@ static int reject_atom(enum atom_type atom_type)
>         return atom_type == ATOM_REST;
>  }
>
> +static void set_up_quick_format(struct ref_format *format)
> +{
> +       /* quick formats don't handle any special quoting */
> +       if (format->quote_style)
> +               return;
> +
> +       /*
> +        * no atoms at all; this should be uncommon in real life, but it may be
> +        * interesting for benchmarking
> +        */
> +       if (!used_atom_cnt) {
> +               format->quick = REF_FORMAT_QUICK_VERBATIM;
> +               return;
> +       }
> +
> +       /*
> +        * It's tempting to look at used_atom here, but it's wrong to do so: we
> +        * need not only to be sure of the needed atoms, but also their order
> +        * and any verbatim parts of the format. So instead, let's just
> +        * hard-code some specific formats.
> +        */
> +       if (!strcmp(format->format, "%(refname)"))
> +               format->quick = REF_FORMAT_QUICK_REFNAME;
> +       else if (!strcmp(format->format, "%(objectname)"))
> +               format->quick = REF_FORMAT_QUICK_OBJECTNAME;
> +       else if (!strcmp(format->format, "%(refname) %(objectname)"))
> +               format->quick = REF_FORMAT_QUICK_REFNAME_OBJECTNAME;
> +       else if (!strcmp(format->format, "%(objectname) %(refname)"))
> +               format->quick = REF_FORMAT_QUICK_OBJECTNAME_REFNAME;
> +}
> +
>  /*
>   * Make sure the format string is well formed, and parse out
>   * the used atoms.
> @@ -1047,6 +1078,9 @@ int verify_ref_format(struct ref_format *format)
>         }
>         if (format->need_color_reset_at_eol && !want_color(format->use_color))
>                 format->need_color_reset_at_eol = 0;
> +
> +       set_up_quick_format(format);
> +
>         return 0;
>  }
>
> @@ -2617,6 +2651,32 @@ static void append_literal(const char *cp, const char *ep, struct ref_formatting
>         }
>  }
>
> +static int quick_ref_format(const struct ref_format *format,
> +                           const char *refname,
> +                           const struct object_id *oid)
> +{
> +       switch(format->quick) {
> +       case REF_FORMAT_QUICK_NONE:
> +               return -1;
> +       case REF_FORMAT_QUICK_VERBATIM:
> +               printf("%s\n", format->format);
> +               return 0;
> +       case REF_FORMAT_QUICK_REFNAME:
> +               printf("%s\n", refname);
> +               return 0;
> +       case REF_FORMAT_QUICK_OBJECTNAME:
> +               printf("%s\n", oid_to_hex(oid));
> +               return 0;
> +       case REF_FORMAT_QUICK_REFNAME_OBJECTNAME:
> +               printf("%s %s\n", refname, oid_to_hex(oid));
> +               return 0;
> +       case REF_FORMAT_QUICK_OBJECTNAME_REFNAME:
> +               printf("%s %s\n", oid_to_hex(oid), refname);
> +               return 0;
> +       }
> +       BUG("unknown ref_format_quick value: %d", format->quick);
> +}
> +

So as a fast path, this avoids format_ref_array_item() entirely when the
format is just %(objectname) and/or %(refname). But the problem is that
it is not very elegant (it relies on string comparison against a few
hard-coded formats), and it does nothing to optimize other atoms that
require in-depth parsing. I remember the "fast path" that Ævar used last
time, and it seems that Junio did not like it. [1][2]
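
To make the limitation concrete, here is a minimal standalone sketch of
the exact-match detection, written as a lookup table instead of an
if/else chain. It is not code from the patch: the names quick_kind,
quick_table and classify_format are hypothetical, and the enum only
mirrors a subset of the patch's ref_format_quick values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a subset of the patch's enum ref_format_quick. */
enum quick_kind {
	QUICK_NONE = 0,
	QUICK_REFNAME,
	QUICK_OBJECTNAME,
	QUICK_REFNAME_OBJECTNAME,
	QUICK_OBJECTNAME_REFNAME
};

/* Only these exact format strings qualify for the fast path. */
static const struct {
	const char *format;
	enum quick_kind kind;
} quick_table[] = {
	{ "%(refname)", QUICK_REFNAME },
	{ "%(objectname)", QUICK_OBJECTNAME },
	{ "%(refname) %(objectname)", QUICK_REFNAME_OBJECTNAME },
	{ "%(objectname) %(refname)", QUICK_OBJECTNAME_REFNAME }
};

/*
 * Return the quick kind for a format string, or QUICK_NONE when the
 * caller has to fall back to the general atom-formatting code.
 */
static enum quick_kind classify_format(const char *format)
{
	size_t i;

	for (i = 0; i < sizeof(quick_table) / sizeof(quick_table[0]); i++)
		if (!strcmp(format, quick_table[i].format))
			return quick_table[i].kind;
	return QUICK_NONE;
}
```

With this sketch, "%(objectname) %(refname)" is recognized, but
"%(refname:lstrip=2)" or "delete %(refname)" come back as QUICK_NONE, so
any format outside the table still pays the full cost of the general
formatter.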

>  int format_ref_array_item(struct ref_array_item *info,
>                           struct ref_format *format,
>                           struct strbuf *final_buf,
> @@ -2670,6 +2730,9 @@ void pretty_print_ref(const char *name, const struct object_id *oid,
>         struct strbuf output = STRBUF_INIT;
>         struct strbuf err = STRBUF_INIT;
>
> +       if (!quick_ref_format(format, name, oid))
> +               return;
> +
>         ref_item = new_ref_array_item(name, oid);
>         ref_item->kind = ref_kind_from_refname(name);
>         if (format_ref_array_item(ref_item, format, &output, &err))
> diff --git a/ref-filter.h b/ref-filter.h
> index ecea1837a2..fde5c3a1cb 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -87,6 +87,19 @@ struct ref_format {
>
>         /* Internal state to ref-filter */
>         int need_color_reset_at_eol;
> +
> +       /*
> +        * Set by verify_ref_format(); if not NONE, we can skip the usual
> +        * formatting and use an optimized routine.
> +        */
> +       enum ref_format_quick {
> +               REF_FORMAT_QUICK_NONE = 0,
> +               REF_FORMAT_QUICK_VERBATIM,
> +               REF_FORMAT_QUICK_REFNAME,
> +               REF_FORMAT_QUICK_OBJECTNAME,
> +               REF_FORMAT_QUICK_REFNAME_OBJECTNAME,
> +               REF_FORMAT_QUICK_OBJECTNAME_REFNAME,
> +       } quick;
>  };
>
>  #define REF_FORMAT_INIT { .use_color = -1 }
> --
> 2.33.0.618.g5b11852304

[1]: https://lore.kernel.org/git/5903d02324f3275b3aa442bb3ca2602564c543dc.1626363626.git.gitgitgadget@gmail.com/
[2]: https://lore.kernel.org/git/87eecf8ork.fsf@evledraar.gmail.com/

