All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Hamza Mahfooz <someguy@effective-light.com>
Subject: Re: [PATCH 0/5] const-correctness in grep.c
Date: Tue, 21 Sep 2021 14:07:08 +0200	[thread overview]
Message-ID: <87czp29l2c.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <YUlVZk1xXulAqdef@coredump.intra.peff.net>


On Mon, Sep 20 2021, Jeff King wrote:

> While discussing [1], I noticed that the grep code mostly takes
> non-const buffers, even though it is conceptually a read-only operation
> to search in them. The culprit is a handful of spots that temporarily
> tie off NUL-terminated strings by overwriting a byte of the buffer and
> then restoring it. But I think we no longer need to do so these days,
> now that we have a regexec_buf() that can take a ptr/size pair.
>
> The first three patches are a bit repetitive, but I broke them up
> individually because they're the high-risk part. I.e., if my assumptions
> about needing the NUL are wrong, it could introduce a bug. But based on
> my reading of the code, plus running the test suite with ASan/UBSan, I
> feel reasonably confident.
>
> The last two are the bigger cleanups, but should obviously avoid any
> behavior changes.
>
>   [1/5]: grep: stop modifying buffer in strip_timestamp
>   [2/5]: grep: stop modifying buffer in show_line()
>   [3/5]: grep: stop modifying buffer in grep_source_1()
>   [4/5]: grep: mark "haystack" buffers as const
>   [5/5]: grep: store grep_source buffer as const
>
>  grep.c | 87 +++++++++++++++++++++++++++++-----------------------------
>  grep.h |  4 +--
>  2 files changed, 45 insertions(+), 46 deletions(-)
>
> -Peff
>
> [1] https://lore.kernel.org/git/YUk3zwuse56v76ze@coredump.intra.peff.net/

This whole thing looks good to me. I only found a small whitespace nit
in one of the patches. Did you consider following-up by having this code
take const char*/const size_t pairs. E.g. starting with something like
the below.

When this API is called it's called like that, and the regex functions
at the bottom expect that, but we have all the bol/eol twiddling in the
middle, which is often confusing because some functions pass the
pointers along as-is, and some modify them. So not for now, but I think
rolling with what I started here below would make sense for this file
eventually:

diff --git a/grep.c b/grep.c
index 14fe8a0fd23..f55ec5c0e09 100644
--- a/grep.c
+++ b/grep.c
@@ -436,7 +436,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	}
 }
 
-static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
+static int pcre2match(struct grep_pat *p, const char *line, const size_t len,
 		regmatch_t *match, int eflags)
 {
 	int ret, flags = 0;
@@ -448,11 +448,11 @@ static int pcre2match(struct grep_pat *p, const char *line, const char *eol,
 
 	if (p->pcre2_jit_on)
 		ret = pcre2_jit_match(p->pcre2_pattern, (unsigned char *)line,
-				      eol - line, 0, flags, p->pcre2_match_data,
+				      len, 0, flags, p->pcre2_match_data,
 				      NULL);
 	else
 		ret = pcre2_match(p->pcre2_pattern, (unsigned char *)line,
-				  eol - line, 0, flags, p->pcre2_match_data,
+				  len, 0, flags, p->pcre2_match_data,
 				  NULL);
 
 	if (ret < 0 && ret != PCRE2_ERROR_NOMATCH) {
@@ -909,15 +909,15 @@ static void show_name(struct grep_opt *opt, const char *name)
 }
 
 static int patmatch(struct grep_pat *p,
-		    const char *line, const char *eol,
+		    const char *line, const size_t len,
 		    regmatch_t *match, int eflags)
 {
 	int hit;
 
 	if (p->pcre2_pattern)
-		hit = !pcre2match(p, line, eol, match, eflags);
+		hit = !pcre2match(p, line, len, match, eflags);
 	else
-		hit = !regexec_buf(&p->regexp, line, eol - line, 1, match,
+		hit = !regexec_buf(&p->regexp, line, len, 1, match,
 				   eflags);
 
 	return hit;
@@ -976,7 +976,7 @@ static int match_one_pattern(struct grep_pat *p,
 	}
 
  again:
-	hit = patmatch(p, bol, eol, pmatch, eflags);
+	hit = patmatch(p, bol, eol - bol, pmatch, eflags);
 
 	if (hit && p->word_regexp) {
 		if ((pmatch[0].rm_so < 0) ||
@@ -1447,7 +1447,7 @@ static int look_ahead(struct grep_opt *opt,
 		int hit;
 		regmatch_t m;
 
-		hit = patmatch(p, bol, bol + *left_p, &m, 0);
+		hit = patmatch(p, bol, *left_p, &m, 0);
 		if (!hit || m.rm_so < 0 || m.rm_eo < 0)
 			continue;
 		if (earliest < 0 || m.rm_so < earliest)



  parent reply	other threads:[~2021-09-21 12:20 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-21  3:45 [PATCH 0/5] const-correctness in grep.c Jeff King
2021-09-21  3:46 ` [PATCH 1/5] grep: stop modifying buffer in strip_timestamp Jeff King
2021-09-21  5:18   ` Carlo Arenas
2021-09-21  5:24     ` Eric Sunshine
2021-09-21  5:40       ` Carlo Arenas
2021-09-21  5:43       ` Jeff King
2021-09-21  6:42         ` Carlo Marcelo Arenas Belón
2021-09-21  7:37           ` René Scharfe
2021-09-21 14:24             ` Jeff King
2021-09-21 21:02               ` Ævar Arnfjörð Bjarmason
2021-09-22 20:20                 ` Jeff King
2021-09-23  0:53                   ` Ævar Arnfjörð Bjarmason
2021-09-21  3:48 ` [PATCH 2/5] grep: stop modifying buffer in show_line() Jeff King
2021-09-21  4:22   ` Taylor Blau
2021-09-21  4:42     ` Jeff King
2021-09-21  4:45       ` Taylor Blau
2021-09-21  3:48 ` [PATCH 3/5] grep: stop modifying buffer in grep_source_1() Jeff King
2021-09-21  3:49 ` [PATCH 4/5] grep: mark "haystack" buffers as const Jeff King
2021-09-21 12:04   ` Ævar Arnfjörð Bjarmason
2021-09-21 14:27     ` Jeff King
2021-09-21  3:51 ` [PATCH 5/5] grep: store grep_source buffer " Jeff King
2021-09-21  4:30 ` [PATCH 0/5] const-correctness in grep.c Taylor Blau
2021-09-21 12:07 ` Ævar Arnfjörð Bjarmason [this message]
2021-09-21 14:49   ` Jeff King
2021-09-21 12:45 ` [PATCH 6/5] grep.c: mark eol/bol and derived as "const char * const" Ævar Arnfjörð Bjarmason
2021-09-21 14:53   ` Jeff King
2021-09-21 15:17     ` Ævar Arnfjörð Bjarmason
2021-09-21 19:18       ` Jeff King
2021-09-23 13:56         ` Ævar Arnfjörð Bjarmason
2021-09-24  4:22           ` Junio C Hamano
2021-09-22 19:02     ` Junio C Hamano
2021-09-22 18:57 ` [PATCH 0/5] const-correctness in grep.c Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87czp29l2c.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=someguy@effective-light.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.