git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Anders Waldenborg <anders@0x63.nu>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	christian.couder@gmail.com, peff@peff.net,
	jonathantanmy@google.com
Subject: Re: [PATCH 0/5] pretty format %(trailers): improve machine readability
Date: Mon, 07 Dec 2020 09:53:22 +0100	[thread overview]
Message-ID: <87h7oyxail.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <87wnxwp15o.fsf@0x63.nu>


On Sat, Dec 05 2020, Anders Waldenborg wrote:

> Ævar Arnfjörð Bjarmason writes:
>
>> I started writing this on top of "master", but then saw the
>> outstanding series of other miscellaneous fixes to this
>> facility[1]. This is on top of that topic & rebased on master.
>>
>> Anders, any plans to re-roll yours? Otherwise the conflicts I'd have
>> on mine are easy to fix, so I can also submit it as a stand-alone.
>
> Yes, I have plans to do that. But have yet to carve out the required
> time from my copious spare time to actually do it.
>
> So please don't hold your breath waiting for me to do that.

Thanks. I sent a v2 of mine yesterday as
https://lore.kernel.org/git/20201206002449.31452-1-avarab@gmail.com/

As noted there the merge conflict with yours is trivial, so hopefully it
won't cause you much hassle if you re-roll while it's outstanding.

>> This series comes out of a discussion at work today (well, yesterday
>> at this point) where someone wanted to parse %(trailers) output. As
>> noted in 3/5 doing this is rather tedious now if you're trying to
>> unambiguously grap trailers as a stream of key-value pairs.
>>
>> So this series adds a "key_value_separator" and "keyonly" parameters,
>> and fixes a few bugs I saw along the way.
>
> Interesting. When adding "valueonly" I never consider it being used
> without "key". The trick you are doing with separate keyonly and
> valueonly is quite clever.
>
> I've only been doing machine parsing for explicit keys, things like:
> "%cn%x00%x00%an%x00%x00%(trailers:key=Reviewed-By,valueonly,unfold,separator=%x00)%x00%x00%(trailers:key=Backport-Reviewed-By,valueonly,unfold,separator=%x00)"
> (double-NUL to separate field, single-NUL to separate values within field).
>
> But I can't help wonder that if the goal just is to have a nice machine
> parsable format maybe it would be easier (both for user and
> implementation) to have a separate placeholder for "machine readable
> trailers" which by default emits in a format suitable for machine
> parsing. Something like a new "%(ztrailers)" (but with a better name)
> which simply emits a sequence of "<KEY> NUL <VAL> NUL" for each trailer

I think it's a bit tricky to make something general in the middle of all
the custom format printf-likes in the pretty format. E.g. some users
might want to use \0 as a delimiter for key-values, others \0\0
etc. because they used \0, or the other way around.

Maybe if there's a reason to extend the optimization it could be smarter
about detecting that you only wanted some fixed-string separator and
nothing else custom?

B.t.w. I tried just deleting the optimization for testing and it slowed
down by around 8% on linux.git according to an extended
p4205-log-pretty-formats.sh.

Looking at the code I wonder if there aren't other lower hanging
optimizations, e.g. it seems we call find_separator() on multiple passes
instead of saving it away, e.g. in the format_trailers_from_commit()
entry point if there's any custom options such as "unfold".

I also wonder if memory allocation is a bottleneck in the "git log"
path, but didn't have time to refactor & test it. For each commit the
walking machinery eventually calls the trailer.c code, which allocates &
free()'s internal structures that could be re-used for parsing the next
commit.

  reply	other threads:[~2020-12-07  8:54 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-25 21:26 [PATCH 00/21] trailer fixes Anders Waldenborg
2020-10-25 21:26 ` [PATCH 01/21] trailer: change token_{from,matches}_item into taking conf_info Anders Waldenborg
2020-10-26 11:56   ` Christian Couder
2020-10-25 21:26 ` [PATCH 02/21] trailer: don't use 'struct arg_item' for storing config Anders Waldenborg
2020-10-25 21:26 ` [PATCH 03/21] doc: mention canonicalization in git i-t manual Anders Waldenborg
2020-10-26 12:14   ` Christian Couder
2020-10-25 21:26 ` [PATCH 04/21] pretty: allow using aliases in %(trailer:key=xyz) Anders Waldenborg
2020-10-26 12:38   ` Christian Couder
2020-10-25 21:26 ` [PATCH 05/21] trailer: rename 'free_all' to 'free_all_trailer_items' Anders Waldenborg
2020-10-26 12:42   ` Christian Couder
2020-11-10 19:52     ` Jeff King
2020-10-25 21:26 ` [PATCH 06/21] t4205: add test for trailer in log with nonstandard separator Anders Waldenborg
2020-10-26 12:43   ` Christian Couder
2020-11-09 22:12     ` Anders Waldenborg
2020-11-10  7:55       ` Christian Couder
2020-11-10 19:54       ` Jeff King
2020-10-25 21:26 ` [PATCH 07/21] trailer: simplify 'arg_item' lifetime Anders Waldenborg
2020-10-25 21:26 ` [PATCH 08/21] trailer: keep track of conf in trailer_item Anders Waldenborg
2020-11-10 19:58   ` Jeff King
2020-10-25 21:26 ` [PATCH 09/21] trailer: refactor print_tok_val into taking item Anders Waldenborg
2020-10-25 21:26 ` [PATCH 10/21] trailer: move trailer token canonicalization print time Anders Waldenborg
2020-10-25 21:26 ` [PATCH 11/21] trailer: remember separator used in input Anders Waldenborg
2020-10-25 21:26 ` [PATCH 12/21] trailer: handle configured nondefault separators explicitly Anders Waldenborg
2020-11-10 20:06   ` Jeff King
2020-10-25 21:26 ` [PATCH 13/21] trailer: add option to make canonicalization optional Anders Waldenborg
2020-11-10 20:10   ` Jeff King
2020-10-25 21:26 ` [PATCH 14/21] trailer: move skipping of blank lines to own loop when finding trailer Anders Waldenborg
2020-10-25 21:26 ` [PATCH 15/21] trailer: factor out classify_trailer_line Anders Waldenborg
2020-10-25 21:26 ` [PATCH 16/21] t7513: add failing test for configured trailing line classification Anders Waldenborg
2020-10-25 21:26 ` [PATCH 17/21] trailer: don't treat line with prefix of known trailer as known Anders Waldenborg
2020-11-10 20:16   ` Jeff King
2020-10-25 21:26 ` [PATCH 18/21] trailer: factor out config lookup to separate function Anders Waldenborg
2020-10-25 21:26 ` [PATCH 19/21] trailer: move config lookup out of parse_trailer Anders Waldenborg
2020-10-25 21:26 ` [PATCH 20/21] trailer: add failing tests for matching trailers against input Anders Waldenborg
2020-10-25 21:26 ` [PATCH 21/21] trailer: only do prefix matching for configured trailers on commandline Anders Waldenborg
2020-11-10  7:44 ` [PATCH 00/21] trailer fixes Christian Couder
2020-12-05  1:39 ` [PATCH 0/5] pretty format %(trailers): improve machine readability Ævar Arnfjörð Bjarmason
2020-12-05 18:18   ` Anders Waldenborg
2020-12-07  8:53     ` Ævar Arnfjörð Bjarmason [this message]
2020-12-06  0:24   ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2020-12-09 15:52     ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2020-12-10 10:48       ` Christian Couder
2020-12-10 19:00         ` Junio C Hamano
2020-12-09 15:52     ` [PATCH v3 1/5] pretty format %(trailers) test: split a long line Ævar Arnfjörð Bjarmason
2020-12-09 15:52     ` [PATCH v3 2/5] pretty format %(trailers) doc: avoid repetition Ævar Arnfjörð Bjarmason
2020-12-10 19:01       ` Junio C Hamano
2020-12-09 15:52     ` [PATCH v3 3/5] pretty-format %(trailers): fix broken standalone "valueonly" Ævar Arnfjörð Bjarmason
2020-12-09 15:52     ` [PATCH v3 4/5] pretty format %(trailers): add a "keyonly" Ævar Arnfjörð Bjarmason
2020-12-09 15:52     ` [PATCH v3 5/5] pretty format %(trailers): add a "key_value_separator" Ævar Arnfjörð Bjarmason
2020-12-06  0:24   ` [PATCH v2 1/5] pretty format %(trailers) test: split a long line Ævar Arnfjörð Bjarmason
2020-12-06  0:24   ` [PATCH v2 2/5] pretty format %(trailers) doc: avoid repetition Ævar Arnfjörð Bjarmason
2020-12-07  9:09     ` Christian Couder
2020-12-06  0:24   ` [PATCH v2 3/5] pretty-format %(trailers): fix broken standalone "valueonly" Ævar Arnfjörð Bjarmason
2020-12-06  0:24   ` [PATCH v2 4/5] pretty format %(trailers): add a "keyonly" Ævar Arnfjörð Bjarmason
2020-12-07  9:17     ` Christian Couder
2020-12-06  0:24   ` [PATCH v2 5/5] pretty format %(trailers): add a "key_value_separator" Ævar Arnfjörð Bjarmason
2020-12-05  1:39 ` [PATCH 1/5] pretty format %(trailers) test: split a long line Ævar Arnfjörð Bjarmason
2020-12-05  1:39 ` [PATCH 2/5] pretty format %(trailers): avoid needless repetition Ævar Arnfjörð Bjarmason
2020-12-05  5:43   ` Christian Couder
2020-12-05  1:39 ` [PATCH 3/5] pretty format %(trailers): add a "keyonly" Ævar Arnfjörð Bjarmason
2020-12-05  6:11   ` Christian Couder
2020-12-05 12:26     ` Ævar Arnfjörð Bjarmason
2020-12-05  1:39 ` [PATCH 4/5] pretty-format %(trailers): fix broken standalone "valueonly" Ævar Arnfjörð Bjarmason
2020-12-05  6:46   ` Christian Couder
2020-12-05  1:39 ` [PATCH 5/5] pretty format %(trailers): add a "key_value_separator" Ævar Arnfjörð Bjarmason
2020-12-05  7:13   ` Christian Couder
2020-12-05  8:49     ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h7oyxail.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=anders@0x63.nu \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).