Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 0/9] saner xdiff hunk callbacks
Date: Fri, 2 Nov 2018 02:31:56 -0400
Message-ID: <20181102063156.GA30252@sigill.intra.peff.net> (raw)

As part of my -Wunused-parameter hunt, I noticed that parse_hunk_header()
can read out-of-bounds in the array it is given. But it turns out not to
be a big problem because we only ever use it to parse lines that xdiff
has just generated.

This is fixable, and I'll show my patch to do so at the end of this
email. But it seems rather silly that we have xdiff generate a string
just so that we can parse it from within the same process. So instead I
improved the xdiff interface to pass the actual integers out, made use
of it as appropriate, and then in the final patch we can just drop the
buggy function entirely.

So that's this series, which I think is the better fix:

  [1/9]: xdiff: provide a separate emit callback for hunks
  [2/9]: xdiff-interface: provide a separate consume callback for hunks
  [3/9]: diff: avoid generating unused hunk header lines
  [4/9]: diff: discard hunk headers for patch-ids earlier
  [5/9]: diff: use hunk callback for word-diff
  [6/9]: combine-diff: use an xdiff hunk callback
  [7/9]: diff: convert --check to use a hunk callback
  [8/9]: range-diff: use a hunk callback
  [9/9]: xdiff-interface: drop parse_hunk_header()

 builtin/merge-tree.c |  3 +-
 builtin/rerere.c     |  3 +-
 combine-diff.c       | 67 ++++++++++++++++++++------------------
 diff.c               | 48 ++++++++++++++--------------
 diffcore-pickaxe.c   |  3 +-
 range-diff.c         | 10 +++++-
 xdiff-interface.c    | 76 ++++++++++++++++++--------------------------
 xdiff-interface.h    | 19 ++++++++---
 xdiff/xdiff.h        |  6 +++-
 xdiff/xutils.c       | 21 +++++++++---
 10 files changed, 141 insertions(+), 115 deletions(-)

For reference/comparison, here's the minimal fix patch.

-- >8 --
Subject: [PATCH DO NOT APPLY] xdiff-interface: refactor parse_hunk_header

The parse_hunk_header() function takes its input as a ptr/len pair, but
does not respect the "len" half, and instead treats it like a
NUL-terminated string. This works out in practice because we only ever
parse strings generated by xdiff. These do omit the trailing NUL, but
since they're always valid, we never parse past the newline.

Still, it would be easy to misuse it, so let's fix it.

While we're doing so, we can improve a few other things:

  - by using helpers like skip_prefix_mem(), we can do the whole parse
    within a conditional, avoiding the need to jump around (backwards,
    even!) to the error block with a goto. The result is hopefully much
    more readable.

  - the check for the final "@@" would trigger an error return code if
    it failed, but wouldn't actually emit an error message (like the
    rest of the parse errors do)

  - we used to blindly assume the caller checked that the string starts
    with "@@ -"; we now check it ourselves and let the caller know what
    we found

  - the input line does not need to be modified, and can be marked
    "const"

Signed-off-by: Jeff King <peff@peff.net>
---
The diff is pretty horrid, but hopefully the post-image is pleasant to
read.

 combine-diff.c    | 13 +++++----
 diff.c            |  5 ++--
 xdiff-interface.c | 68 +++++++++++++++++++++++++----------------------
 xdiff-interface.h |  6 ++---
 4 files changed, 50 insertions(+), 42 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 10155e0ec8..0318f39103 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -348,11 +348,14 @@ struct combine_diff_state {
 static void consume_line(void *state_, char *line, unsigned long len)
 {
 	struct combine_diff_state *state = state_;
-	if (5 < len && !memcmp("@@ -", line, 4)) {
-		if (parse_hunk_header(line, len,
-				      &state->ob, &state->on,
-				      &state->nb, &state->nn))
-			return;
+	switch (maybe_parse_hunk_header(line, len,
+					&state->ob, &state->on,
+					&state->nb, &state->nn)) {
+	case 0:
+		break; /* not a hunk header */
+	case -1:
+		return; /* parse error */
+	default:
 		state->lno = state->nb;
 		if (state->nn == 0) {
 			/* @@ -X,Y +N,0 @@ removed Y lines
diff --git a/diff.c b/diff.c
index 8647db3d30..af6c01c4d0 100644
--- a/diff.c
+++ b/diff.c
@@ -1921,8 +1921,9 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 	struct diff_options *opt = diff_words->opt;
 	const char *line_prefix;
 
-	if (line[0] != '@' || parse_hunk_header(line, len,
-			&minus_first, &minus_len, &plus_first, &plus_len))
+	if (maybe_parse_hunk_header(line, len,
+				    &minus_first, &minus_len,
+				    &plus_first, &plus_len) <= 0)
 		return;
 
 	assert(opt);
diff --git a/xdiff-interface.c b/xdiff-interface.c
index e7af95db86..f574ee49cd 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -14,49 +14,53 @@ struct xdiff_emit_state {
 	struct strbuf remainder;
 };
 
-static int parse_num(char **cp_p, int *num_p)
+static int parse_num(const char **cp_p, size_t *len, int *num_p)
 {
-	char *cp = *cp_p;
+	const char *cp = *cp_p;
+	const char *end = cp + *len;
 	int num = 0;
 
-	while ('0' <= *cp && *cp <= '9')
+	while (cp < end && '0' <= *cp && *cp <= '9')
 		num = num * 10 + *cp++ - '0';
 	if (!(cp - *cp_p))
-		return -1;
+		return 0;
+	*len -= cp - *cp_p;
 	*cp_p = cp;
 	*num_p = num;
-	return 0;
+	return 1;
 }
 
-int parse_hunk_header(char *line, int len,
-		      int *ob, int *on,
-		      int *nb, int *nn)
+static int parse_hunk_len(const char **cp_p, size_t *len, int *out)
 {
-	char *cp;
-	cp = line + 4;
-	if (parse_num(&cp, ob)) {
-	bad_line:
-		return error("malformed diff output: %s", line);
+	/* hunk lengths are optional; if omitted, default to 1 */
+	if (skip_prefix_mem(*cp_p, *len, ",", cp_p, len)) {
+		if (!parse_num(cp_p, len, out))
+			return 0;
+	} else {
+		*out = 1;
 	}
-	if (*cp == ',') {
-		cp++;
-		if (parse_num(&cp, on))
-			goto bad_line;
-	}
-	else
-		*on = 1;
-	if (*cp++ != ' ' || *cp++ != '+')
-		goto bad_line;
-	if (parse_num(&cp, nb))
-		goto bad_line;
-	if (*cp == ',') {
-		cp++;
-		if (parse_num(&cp, nn))
-			goto bad_line;
-	}
-	else
-		*nn = 1;
-	return -!!memcmp(cp, " @@", 3);
+
+	return 1;
+}
+
+int maybe_parse_hunk_header(const char *line, size_t len,
+			    int *ob, int *on,
+			    int *nb, int *nn)
+{
+	const char *cp;
+
+	if (!skip_prefix_mem(line, len, "@@ -", &cp, &len))
+		return 0;
+
+	if (!parse_num(&cp, &len, ob) ||
+	    !parse_hunk_len(&cp, &len, on) ||
+	    !skip_prefix_mem(cp, len, " +", &cp, &len) ||
+	    !parse_num(&cp, &len, nb) ||
+	    !parse_hunk_len(&cp, &len, nn) ||
+	    !skip_prefix_mem(cp, len, " @@", &cp, &len))
+		return error("malformed diff output: %s", line);
+
+	return 1;
 }
 
 static void consume_one(void *priv_, char *s, unsigned long size)
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 135fc05d72..cddd56bae6 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -17,9 +17,9 @@ int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
 		  xdiff_emit_consume_fn fn, void *consume_callback_data,
 		  xpparam_t const *xpp, xdemitconf_t const *xecfg);
-int parse_hunk_header(char *line, int len,
-		      int *ob, int *on,
-		      int *nb, int *nn);
+int maybe_parse_hunk_header(const char *line, size_t len,
+			    int *ob, int *on,
+			    int *nb, int *nn);
 int read_mmfile(mmfile_t *ptr, const char *filename);
 void read_mmblob(mmfile_t *ptr, const struct object_id *oid);
 int buffer_is_binary(const char *ptr, unsigned long size);
-- 
2.19.1.1336.g081079ac04


             reply index

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-02  6:31 Jeff King [this message]
2018-11-02  6:35 ` [PATCH 1/9] xdiff: provide a separate emit callback for hunks Jeff King
2018-11-02  6:35 ` [PATCH 2/9] xdiff-interface: provide a separate consume " Jeff King
2018-11-02  6:36 ` [PATCH 3/9] diff: avoid generating unused hunk header lines Jeff King
2018-11-02 19:50   ` Stefan Beller
2018-11-02 20:15     ` Jeff King
2018-11-03  0:32       ` Junio C Hamano
2018-11-02  6:36 ` [PATCH 4/9] diff: discard hunk headers for patch-ids earlier Jeff King
2018-11-02  6:37 ` [PATCH 5/9] diff: use hunk callback for word-diff Jeff King
2018-11-02  6:38 ` [PATCH 6/9] combine-diff: use an xdiff hunk callback Jeff King
2018-11-02  6:39 ` [PATCH 7/9] diff: convert --check to use a " Jeff King
2018-11-02  6:39 ` [PATCH 8/9] range-diff: " Jeff King
2018-11-02  6:40 ` [PATCH 9/9] xdiff-interface: drop parse_hunk_header() Jeff King
2018-11-02 11:48 ` [PATCH 0/9] saner xdiff hunk callbacks Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181102063156.GA30252@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git