All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Hamza Mahfooz <someguy@effective-light.com>
Subject: Re: [PATCH 0/5] const-correctness in grep.c
Date: Tue, 21 Sep 2021 10:49:37 -0400	[thread overview]
Message-ID: <YUnxAawpNjyb0z3o@coredump.intra.peff.net> (raw)
In-Reply-To: <87czp29l2c.fsf@evledraar.gmail.com>

On Tue, Sep 21, 2021 at 02:07:08PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This whole thing looks good to me. I only found a small whitespace nit
> in one of the patches. Did you consider following-up by having this code
> take const char*/const size_t pairs. E.g. starting with something like
> the below.

I do generally find ptr/len pairs to be easier to read, but they also
make it really easy to introduce subtle bugs. E.g., if you consume part
of a buffer, you have to tweak both the ptr and the len. So the current:

	while (word_char(bol[-1]) && bol < eol)
		bol++;

has to become:

	while (word_char(bol[-1] && len > 0) {
		bol++;
		len--;
	}

So I'd be hesitant to churn battle-tested code in such a way for what I
consider to be a pretty minor benefit.

I did notice the ugly use of "unsigned long" here in a few places
(rather than size_t). I do think it is worth fixing, but it seemed a
little too far to try to cram into this series (it's obviously touching
the same lines, but it's quite orthogonal semantically).

The other hesitation I had is that the source of this "unsigned long"
pattern is almost certainly the object code (which is much more
important to convert, as it blocks people from having >4GB objects on
Windows). So we might want to just wait for a larger conversion there.
OTOH, I don't think there is any downside to a partial conversion here
in the meantime (because size_t will always be at least as long as
"unsigned long" in practice).

> -static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
> +static int pcre2match(struct grep_pat *p, const char *line, const size_t len,
>  		regmatch_t *match, int eflags)
>  {
>  	int ret, flags = 0;
> @@ -448,11 +448,11 @@ static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
>  
>  	if (p->pcre2_jit_on)
>  		ret = pcre2_jit_match(p->pcre2_pattern, (unsigned char *)line,
> -				      eol - line, 0, flags, p->pcre2_match_data,
> +				      len, 0, flags, p->pcre2_match_data,
>  				      NULL);
>  	else
>  		ret = pcre2_match(p->pcre2_pattern, (unsigned char *)line,
> -				  eol - line, 0, flags, p->pcre2_match_data,
> +				  len, 0, flags, p->pcre2_match_data,
>  				  NULL);

Not related to your point, but these casts are funny now. They are meant
to cast to "unsigned char" pointers to match pcre's signature, but now
they are casting away const-ness, too. That might be worth fixing as
part of this series.

Though should they really be casting to PCRE2_SPTR? The types are opaque
in their API because of the weird multi-width thing, though I find it
hard to imagine us ever using the wider versions of the library.

-Peff

  reply	other threads:[~2021-09-21 14:49 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-21  3:45 [PATCH 0/5] const-correctness in grep.c Jeff King
2021-09-21  3:46 ` [PATCH 1/5] grep: stop modifying buffer in strip_timestamp Jeff King
2021-09-21  5:18   ` Carlo Arenas
2021-09-21  5:24     ` Eric Sunshine
2021-09-21  5:40       ` Carlo Arenas
2021-09-21  5:43       ` Jeff King
2021-09-21  6:42         ` Carlo Marcelo Arenas Belón
2021-09-21  7:37           ` René Scharfe
2021-09-21 14:24             ` Jeff King
2021-09-21 21:02               ` Ævar Arnfjörð Bjarmason
2021-09-22 20:20                 ` Jeff King
2021-09-23  0:53                   ` Ævar Arnfjörð Bjarmason
2021-09-21  3:48 ` [PATCH 2/5] grep: stop modifying buffer in show_line() Jeff King
2021-09-21  4:22   ` Taylor Blau
2021-09-21  4:42     ` Jeff King
2021-09-21  4:45       ` Taylor Blau
2021-09-21  3:48 ` [PATCH 3/5] grep: stop modifying buffer in grep_source_1() Jeff King
2021-09-21  3:49 ` [PATCH 4/5] grep: mark "haystack" buffers as const Jeff King
2021-09-21 12:04   ` Ævar Arnfjörð Bjarmason
2021-09-21 14:27     ` Jeff King
2021-09-21  3:51 ` [PATCH 5/5] grep: store grep_source buffer " Jeff King
2021-09-21  4:30 ` [PATCH 0/5] const-correctness in grep.c Taylor Blau
2021-09-21 12:07 ` Ævar Arnfjörð Bjarmason
2021-09-21 14:49   ` Jeff King [this message]
2021-09-21 12:45 ` [PATCH 6/5] grep.c: mark eol/bol and derived as "const char * const" Ævar Arnfjörð Bjarmason
2021-09-21 14:53   ` Jeff King
2021-09-21 15:17     ` Ævar Arnfjörð Bjarmason
2021-09-21 19:18       ` Jeff King
2021-09-23 13:56         ` Ævar Arnfjörð Bjarmason
2021-09-24  4:22           ` Junio C Hamano
2021-09-22 19:02     ` Junio C Hamano
2021-09-22 18:57 ` [PATCH 0/5] const-correctness in grep.c Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YUnxAawpNjyb0z3o@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=someguy@effective-light.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.