All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 04/23] remember commit/tag parse failures
Date: Fri, 18 Oct 2019 00:47:21 -0400	[thread overview]
Message-ID: <20191018044721.GD17879@sigill.intra.peff.net> (raw)
In-Reply-To: <20191018044103.GA17625@sigill.intra.peff.net>

If we can't parse a commit, then parse_commit() will return an error
code. But it _also_ sets the "parsed" flag, which tells us not to bother
trying to re-parse the object. That means that subsequent parses have no
idea that the information in the struct may be bogus.  I.e., doing this:

  parse_commit(commit);
  ...
  if (parse_commit(commit) < 0)
          die("commit is broken");

will never trigger the die(). The second parse_commit() will see the
"parsed" flag and quietly return success.

There are two obvious ways to fix this:

  1. Stop setting "parsed" until we've successfully parsed.

  2. Keep a second "corrupt" flag to indicate that we saw an error (and
     when the parsed flag is set, return 0/-1 depending on the corrupt
     flag).

This patch does option 1. The obvious downside versus option 2 is that
we might continually re-parse a broken object. But in practice,
corruption like this is rare, and we typically die() or return an error
in the caller. So it's OK not to worry about optimizing for corruption.
And it's much simpler: we don't need to use an extra bit in the object
struct, and callers which check the "parsed" flag don't need to learn
about the corrupt bit, too.

There's no new test here, because this case is already covered in t5318.
Note that we do need to update the expected message there, because we
now detect the problem in the return from "parse_commit()", and not with
a separate check for a NULL tree. In fact, we can now ditch that
explicit tree check entirely, as we're covered robustly by this change
(and the previous recent change to treat a NULL tree as a parse error).

We'll also give tags the same treatment. I don't know offhand of any
cases where the problem can be triggered (it implies somebody ignoring a
parse error earlier in the process), but consistently returning an error
should cause the least surprise.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c          |  3 ---
 commit.c                | 14 +++++++++++++-
 t/t5318-commit-graph.sh |  2 +-
 tag.c                   | 12 +++++++++++-
 4 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index fc4a43b8d6..852b9c39e6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -855,9 +855,6 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 			die(_("unable to parse commit %s"),
 				oid_to_hex(&(*list)->object.oid));
 		tree = get_commit_tree_oid(*list);
-		if (!tree)
-			die(_("unable to get tree for %s"),
-				oid_to_hex(&(*list)->object.oid));
 		hashwrite(f, tree->hash, hash_len);
 
 		parent = (*list)->parents;
diff --git a/commit.c b/commit.c
index 810419a168..e12e7998ad 100644
--- a/commit.c
+++ b/commit.c
@@ -405,7 +405,18 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
+
+	if (item->parents) {
+		/*
+		 * Presumably this is leftover from an earlier failed parse;
+		 * clear it out in preparation for us re-parsing (we'll hit the
+		 * same error, but that's good, since it lets our caller know
+		 * the result cannot be trusted.
+		 */
+		free_commit_list(item->parents);
+		item->parents = NULL;
+	}
+
 	tail += size;
 	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
 			bufptr[tree_entry_len] != '\n')
@@ -462,6 +473,7 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	if (check_graph)
 		load_commit_graph_info(r, item);
 
+	item->object.parsed = 1;
 	return 0;
 }
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index d42b3efe39..127b404856 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -660,7 +660,7 @@ test_expect_success 'corrupt commit-graph write (missing tree)' '
 		git commit-tree -p "$broken" -m "good" "$tree" >good &&
 		test_must_fail git commit-graph write --stdin-commits \
 			<good 2>test_err &&
-		test_i18ngrep "unable to get tree for" test_err
+		test_i18ngrep "unable to parse commit" test_err
 	)
 '
 
diff --git a/tag.c b/tag.c
index 6a51efda8d..71b544467e 100644
--- a/tag.c
+++ b/tag.c
@@ -141,7 +141,16 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
+
+	if (item->tag) {
+		/*
+		 * Presumably left over from a previous failed parse;
+		 * clear it out in preparation for re-parsing (we'll probably
+		 * hit the same error, which lets us tell our current caller
+		 * about the problem).
+		 */
+		FREE_AND_NULL(item->tag);
+	}
 
 	if (size < the_hash_algo->hexsz + 24)
 		return -1;
@@ -192,6 +201,7 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	else
 		item->date = 0;
 
+	item->object.parsed = 1;
 	return 0;
 }
 
-- 
2.23.0.1228.gee29b05929


  parent reply	other threads:[~2019-10-18  4:54 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
2019-10-18  4:42 ` [PATCH 01/23] parse_commit_buffer(): treat lookup_commit() failure as parse error Jeff King
2019-10-24  3:37   ` Junio C Hamano
2019-10-24 18:01     ` Jeff King
2019-10-18  4:43 ` [PATCH 02/23] parse_commit_buffer(): treat lookup_tree() " Jeff King
2019-10-24 23:12   ` Jonathan Tan
2019-10-24 23:22     ` Jeff King
2019-10-18  4:45 ` [PATCH 03/23] parse_tag_buffer(): treat NULL tag pointer " Jeff King
2019-10-18  4:47 ` Jeff King [this message]
2019-10-24  3:51   ` [PATCH 04/23] remember commit/tag parse failures Junio C Hamano
2019-10-24 23:25   ` Jonathan Tan
2019-10-24 23:41     ` Jeff King
2019-10-18  4:48 ` [PATCH 05/23] fsck: stop checking commit->tree value Jeff King
2019-10-24  3:57   ` Junio C Hamano
2019-10-18  4:49 ` [PATCH 06/23] fsck: stop checking commit->parent counts Jeff King
2019-10-18  4:51 ` [PATCH 07/23] fsck: stop checking tag->tagged Jeff King
2019-10-18  4:54 ` [PATCH 08/23] fsck: require an actual buffer for non-blobs Jeff King
2019-10-18  4:56 ` [PATCH 09/23] fsck: unify object-name code Jeff King
2019-10-24  6:05   ` Junio C Hamano
2019-10-24 18:07     ` Jeff King
2019-10-25  3:23       ` Junio C Hamano
2019-10-25 21:20         ` Jeff King
2019-10-18  4:56 ` [PATCH 10/23] fsck_describe_object(): build on our get_object_name() primitive Jeff King
2019-10-24  6:06   ` Junio C Hamano
2019-10-18  4:57 ` [PATCH 11/23] fsck: use oids rather than objects for object_name API Jeff King
2019-10-18  4:58 ` [PATCH 12/23] fsck: don't require object structs for display functions Jeff King
2019-10-18  4:58 ` [PATCH 13/23] fsck: only provide oid/type in fsck_error callback Jeff King
2019-10-18  4:58 ` [PATCH 14/23] fsck: only require an oid for skiplist functions Jeff King
2019-10-18  4:59 ` [PATCH 15/23] fsck: don't require an object struct for report() Jeff King
2019-10-18  4:59 ` [PATCH 16/23] fsck: accept an oid instead of a "struct blob" for fsck_blob() Jeff King
2019-10-18  4:59 ` [PATCH 17/23] fsck: drop blob struct from fsck_finish() Jeff King
2019-10-18  5:00 ` [PATCH 18/23] fsck: don't require an object struct for fsck_ident() Jeff King
2019-10-18  5:00 ` [PATCH 19/23] fsck: don't require an object struct in verify_headers() Jeff King
2019-10-18  5:00 ` [PATCH 20/23] fsck: rename vague "oid" local variables Jeff King
2019-10-18  5:01 ` [PATCH 21/23] fsck: accept an oid instead of a "struct tag" for fsck_tag() Jeff King
2019-10-18  5:01 ` [PATCH 22/23] fsck: accept an oid instead of a "struct commit" for fsck_commit() Jeff King
2019-10-18  5:02 ` [PATCH 23/23] fsck: accept an oid instead of a "struct tree" for fsck_tree() Jeff King
2019-10-24 23:49 ` [PATCH 0/23] parsing and fsck cleanups Jonathan Tan
2019-10-25  3:11 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191018044721.GD17879@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.