All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "René Scharfe" <l.s.r@web.de>,
	"Carlo Marcelo Arenas Belón" <carenas@gmail.com>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"Git List" <git@vger.kernel.org>,
	"Hamza Mahfooz" <someguy@effective-light.com>
Subject: Re: [PATCH 1/5] grep: stop modifying buffer in strip_timestamp
Date: Tue, 21 Sep 2021 23:02:31 +0200	[thread overview]
Message-ID: <87v92t8wlp.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <YUnrI10DXLElenKZ@coredump.intra.peff.net>


On Tue, Sep 21 2021, Jeff King wrote:

> On Tue, Sep 21, 2021 at 09:37:23AM +0200, René Scharfe wrote:
>
>> > @@ -965,9 +953,12 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol,
>> >  		bol += len;
>> >  		switch (p->field) {
>> >  		case GREP_HEADER_AUTHOR:
>> > -		case GREP_HEADER_COMMITTER:
>> > -			strip_timestamp(bol, &eol);
>> > +		case GREP_HEADER_COMMITTER: {
>> > +			char *em = memrchr(bol, '>', eol - bol);
>> > +			if (em)
>> > +				eol = em + 1;
>> 
>> The old code documents the intent via the function name.  The new one
>> goes into the nitty-gritty without further explanation, which I find
>> harder to read.
>
> Agreed. I do think the conversion is functionally correct, but it
> doesn't strike me as worth the change.

As far as some general improvement in thish area it seems to me that
this whole subthread is losing the forest for the trees.

We're operating a buffer that has "\n"'s in it, and then we loop and
split it up one line at a time, often just to match author OR
committer.

We then have things like commit_rewrite_person() in revision.c that's
preparing a "fake" commit just for grep.c's use, just for it to strip
those headers down again.

A real improvement in this area IMO would be trying to bridge the gap
between parse_commit_buffer() and grep.c's match_one_pattern().

I.e. we've even parsed this once before in commit.c's
parse_commit_date() by the time it gets to grep.c, now we're just doing
it again.

It probably makes sense to split up that commit.c code into something
that can give you structured output, i.e. headers with types and
start/end points for interesting data, then in grep.c we won't need a
strip_anything(), or strrchr() or memrchr() or whatever.

It would also be a lot faster for grepping if we could offload more of
this work to the regex engine, particularly if we've got a more capable
engine like PCREv2.

In many cases we're splitting lines ourselves, when we could have the
engine work in a multi-line mode, or translate the user's --author match
into something that can match the raw commit header. So have an implicit
/^author /m anchor if we're matching author headers, instead of having
grep.c string-twiddle that around one line at a time.

Just my 0.02.

  reply	other threads:[~2021-09-21 21:09 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-21  3:45 [PATCH 0/5] const-correctness in grep.c Jeff King
2021-09-21  3:46 ` [PATCH 1/5] grep: stop modifying buffer in strip_timestamp Jeff King
2021-09-21  5:18   ` Carlo Arenas
2021-09-21  5:24     ` Eric Sunshine
2021-09-21  5:40       ` Carlo Arenas
2021-09-21  5:43       ` Jeff King
2021-09-21  6:42         ` Carlo Marcelo Arenas Belón
2021-09-21  7:37           ` René Scharfe
2021-09-21 14:24             ` Jeff King
2021-09-21 21:02               ` Ævar Arnfjörð Bjarmason [this message]
2021-09-22 20:20                 ` Jeff King
2021-09-23  0:53                   ` Ævar Arnfjörð Bjarmason
2021-09-21  3:48 ` [PATCH 2/5] grep: stop modifying buffer in show_line() Jeff King
2021-09-21  4:22   ` Taylor Blau
2021-09-21  4:42     ` Jeff King
2021-09-21  4:45       ` Taylor Blau
2021-09-21  3:48 ` [PATCH 3/5] grep: stop modifying buffer in grep_source_1() Jeff King
2021-09-21  3:49 ` [PATCH 4/5] grep: mark "haystack" buffers as const Jeff King
2021-09-21 12:04   ` Ævar Arnfjörð Bjarmason
2021-09-21 14:27     ` Jeff King
2021-09-21  3:51 ` [PATCH 5/5] grep: store grep_source buffer " Jeff King
2021-09-21  4:30 ` [PATCH 0/5] const-correctness in grep.c Taylor Blau
2021-09-21 12:07 ` Ævar Arnfjörð Bjarmason
2021-09-21 14:49   ` Jeff King
2021-09-21 12:45 ` [PATCH 6/5] grep.c: mark eol/bol and derived as "const char * const" Ævar Arnfjörð Bjarmason
2021-09-21 14:53   ` Jeff King
2021-09-21 15:17     ` Ævar Arnfjörð Bjarmason
2021-09-21 19:18       ` Jeff King
2021-09-23 13:56         ` Ævar Arnfjörð Bjarmason
2021-09-24  4:22           ` Junio C Hamano
2021-09-22 19:02     ` Junio C Hamano
2021-09-22 18:57 ` [PATCH 0/5] const-correctness in grep.c Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v92t8wlp.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=carenas@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    --cc=someguy@effective-light.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.