All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] log_tree_diff: die when we fail to parse a commit
@ 2013-10-06  4:48 Jeff King
  2013-10-08 13:48 ` [PATCH 0/5] parse_commit cleanups Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2013-10-06  4:48 UTC (permalink / raw)
  To: git

We currently call parse_commit and then assume we can
dereference the resulting "tree" struct field. If parsing
failed, however, that field is NULL and we end up
segfaulting.

Instead of a segfault, let's print an error message and die
a little more gracefully.

Note that this should never happen in practice, but may
happen in a corrupt repository (or when accessing objects
not connected to the reachable history, whose parents may be
subject to pruning).

Signed-off-by: Jeff King <peff@peff.net>
---
Not a huge deal, since we are terminating the program either way. There
are other places in the code with a bare parse_commit that could
probably use the same treatment. I didn't investigate them, but they
could easily build on the parse_commit_or_die here if somebody wants to
follow up.

 commit.c   | 7 +++++++
 commit.h   | 1 +
 log-tree.c | 6 +++---
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index de16a3c..51a9bbc 100644
--- a/commit.c
+++ b/commit.c
@@ -341,6 +341,13 @@ int parse_commit(struct commit *item)
 	return ret;
 }
 
+void parse_commit_or_die(struct commit *item)
+{
+	if (parse_commit(item))
+		die("unable to parse commit %s",
+		    item ? sha1_to_hex(item->object.sha1) : "(null)");
+}
+
 int find_commit_subject(const char *commit_buffer, const char **subject)
 {
 	const char *eol;
diff --git a/commit.h b/commit.h
index bd841f4..934af88 100644
--- a/commit.h
+++ b/commit.h
@@ -49,6 +49,7 @@ int parse_commit(struct commit *item);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size);
 int parse_commit(struct commit *item);
+void parse_commit_or_die(struct commit *item);
 
 /* Find beginning and length of commit subject. */
 int find_commit_subject(const char *commit_buffer, const char **subject);
diff --git a/log-tree.c b/log-tree.c
index 8534d91..e958d07 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -734,7 +734,7 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log
 	if (!opt->diff && !DIFF_OPT_TST(&opt->diffopt, EXIT_WITH_STATUS))
 		return 0;
 
-	parse_commit(commit);
+	parse_commit_or_die(commit);
 	sha1 = commit->tree->object.sha1;
 
 	/* Root commit? */
@@ -759,7 +759,7 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log
 			 * parent, showing summary diff of the others
 			 * we merged _in_.
 			 */
-			parse_commit(parents->item);
+			parse_commit_or_die(parents->item);
 			diff_tree_sha1(parents->item->tree->object.sha1,
 				       sha1, "", &opt->diffopt);
 			log_tree_diff_flush(opt);
@@ -774,7 +774,7 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log
 	for (;;) {
 		struct commit *parent = parents->item;
 
-		parse_commit(parent);
+		parse_commit_or_die(parent);
 		diff_tree_sha1(parent->tree->object.sha1,
 			       sha1, "", &opt->diffopt);
 		log_tree_diff_flush(opt);
-- 
1.8.4.1.4.gf327177

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 0/5] parse_commit cleanups
  2013-10-06  4:48 [PATCH] log_tree_diff: die when we fail to parse a commit Jeff King
@ 2013-10-08 13:48 ` Jeff King
  2013-10-08 13:52   ` [PATCH 1/5] assume parse_commit checks commit->object.parsed Jeff King
                     ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Jeff King @ 2013-10-08 13:48 UTC (permalink / raw)
  To: git

On Sun, Oct 06, 2013 at 12:48:56AM -0400, Jeff King wrote:

> Instead of a segfault, let's print an error message and die
> a little more gracefully.
> [...]
> ---
> Not a huge deal, since we are terminating the program either way. There
> are other places in the code with a bare parse_commit that could
> probably use the same treatment. I didn't investigate them, but they
> could easily build on the parse_commit_or_die here if somebody wants to
> follow up.

Here are some follow-up patches that go on top. The first two are noop
cleanups. The third and fourth are not strictly noops, but are minor
behavior changes that should be strict improvements.

There are a number of unchecked parse_commit calls whose fallout is more
complicated. In many cases, we want to ignore an error (keeping in mind
that parse_commit will already have printed a message to stderr) and
keep going, in order to allow users to to get as much done as they can
in a broken repository.

The fifth patch deals with one of these cases. There are 8 more
unchecked calls after this series, some of which may want to die(), or
may want to be left alone; I didn't investigate deeply.

There are also some cases where we do not die, but perhaps should. In
general, I think it makes sense to let fetch-pack and rev-list work
around broken history, but probably not things like blame, which would
then provide subtly wrong answers. Still, I doubt it's a big deal in
practice, since corrupted repositories are relatively rare (and the
message to stderr does inform the user that something is wrong).

  [1/5]: assume parse_commit checks commit->object.parsed
  [2/5]: assume parse_commit checks for NULL commit
  [3/5]: use parse_commit_or_die instead of segfaulting
  [4/5]: use parse_commit_or_die instead of custom message
  [5/5]: checkout: do not die when leaving broken detached HEAD

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/5] assume parse_commit checks commit->object.parsed
  2013-10-08 13:48 ` [PATCH 0/5] parse_commit cleanups Jeff King
@ 2013-10-08 13:52   ` Jeff King
  2013-10-08 13:56   ` [PATCH 2/5] assume parse_commit checks for NULL commit Jeff King
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2013-10-08 13:52 UTC (permalink / raw)
  To: git

The parse_commit function will check the "parsed" flag of
the object and do nothing if it is set. There is no need
for callers to check the flag themselves, and doing so only
clutters the code.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/blame.c       | 3 +--
 builtin/name-rev.c    | 3 +--
 builtin/show-branch.c | 3 +--
 fetch-pack.c          | 8 +++-----
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 6da7233..5f1cb09 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1549,8 +1549,7 @@ static void assign_blame(struct scoreboard *sb, int opt)
 		 */
 		origin_incref(suspect);
 		commit = suspect->commit;
-		if (!commit->object.parsed)
-			parse_commit(commit);
+		parse_commit(commit);
 		if (reverse ||
 		    (!(commit->object.flags & UNINTERESTING) &&
 		     !(revs->max_age != -1 && commit->date < revs->max_age)))
diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 20fcf8c..23daaa7 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -27,8 +27,7 @@ static void name_rev(struct commit *commit,
 	struct commit_list *parents;
 	int parent_number = 1;
 
-	if (!commit->object.parsed)
-		parse_commit(commit);
+	parse_commit(commit);
 
 	if (commit->date < cutoff)
 		return;
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 001f29c..46902c3 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -227,8 +227,7 @@ static void join_revs(struct commit_list **list_p,
 			parents = parents->next;
 			if ((this_flag & flags) == flags)
 				continue;
-			if (!p->object.parsed)
-				parse_commit(p);
+			parse_commit(p);
 			if (mark_seen(p, seen_p) && !still_interesting)
 				extra--;
 			p->object.flags |= flags;
diff --git a/fetch-pack.c b/fetch-pack.c
index a0e0350..a141eb4 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -47,9 +47,8 @@ static void rev_list_push(struct commit *commit, int mark)
 	if (!(commit->object.flags & mark)) {
 		commit->object.flags |= mark;
 
-		if (!(commit->object.parsed))
-			if (parse_commit(commit))
-				return;
+		if (parse_commit(commit))
+			return;
 
 		prio_queue_put(&rev_list, commit);
 
@@ -128,8 +127,7 @@ static const unsigned char *get_rev(void)
 			return NULL;
 
 		commit = prio_queue_get(&rev_list);
-		if (!commit->object.parsed)
-			parse_commit(commit);
+		parse_commit(commit);
 		parents = commit->parents;
 
 		commit->object.flags |= POPPED;
-- 
1.8.4.1.4.gf327177

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/5] assume parse_commit checks for NULL commit
  2013-10-08 13:48 ` [PATCH 0/5] parse_commit cleanups Jeff King
  2013-10-08 13:52   ` [PATCH 1/5] assume parse_commit checks commit->object.parsed Jeff King
@ 2013-10-08 13:56   ` Jeff King
  2013-10-08 13:57   ` [PATCH 3/5] use parse_commit_or_die instead of segfaulting Jeff King
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2013-10-08 13:56 UTC (permalink / raw)
  To: git

The parse_commit function will check whether it was passed a
NULL commit pointer, and if so, return an error. There is no
need for callers to check this separately.

Signed-off-by: Jeff King <peff@peff.net>
---
There are a few cases left untouched, because they actually produce a
different result for the NULL case (e.g., printing a different error
message, as in read_tree_1).

Note also the case of:

  if (parent && parse_commit(parent) < 0)

in sequencer.c, which looks at first glance like it falls in this class,
but doesn't (it is specifically avoiding the NULL check and reporting an
error only when we actually have something to parse at all).

 builtin/branch.c | 2 +-
 builtin/commit.c | 4 ++--
 commit.c         | 2 +-
 notes-utils.c    | 2 +-
 sha1_name.c      | 2 --
 5 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index ad0f86d..491090f 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -496,7 +496,7 @@ static void add_verbose_info(struct strbuf *out, struct ref_item *item,
 	const char *sub = _(" **** invalid ref ****");
 	struct commit *commit = item->commit;
 
-	if (commit && !parse_commit(commit)) {
+	if (!parse_commit(commit)) {
 		pp_commit_easy(CMIT_FMT_ONELINE, commit, &subject);
 		sub = subject.buf;
 	}
diff --git a/builtin/commit.c b/builtin/commit.c
index 6ab4605..e89c519 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1338,7 +1338,7 @@ static void print_summary(const char *prefix, const unsigned char *sha1,
 	commit = lookup_commit(sha1);
 	if (!commit)
 		die(_("couldn't look up newly created commit"));
-	if (!commit || parse_commit(commit))
+	if (parse_commit(commit))
 		die(_("could not parse newly created commit"));
 
 	strbuf_addstr(&format, "format:%h] %s");
@@ -1525,7 +1525,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		current_head = NULL;
 	else {
 		current_head = lookup_commit_or_die(sha1, "HEAD");
-		if (!current_head || parse_commit(current_head))
+		if (parse_commit(current_head))
 			die(_("could not parse HEAD commit"));
 	}
 	argc = parse_and_validate_options(argc, argv, builtin_commit_options,
diff --git a/commit.c b/commit.c
index 51a9bbc..11509ff 100644
--- a/commit.c
+++ b/commit.c
@@ -79,7 +79,7 @@ struct commit *lookup_commit_reference_by_name(const char *name)
 	if (get_sha1_committish(name, sha1))
 		return NULL;
 	commit = lookup_commit_reference(sha1);
-	if (!commit || parse_commit(commit))
+	if (parse_commit(commit))
 		return NULL;
 	return commit;
 }
diff --git a/notes-utils.c b/notes-utils.c
index 9107c37..7bb3473 100644
--- a/notes-utils.c
+++ b/notes-utils.c
@@ -18,7 +18,7 @@ void create_notes_commit(struct notes_tree *t, struct commit_list *parents,
 		unsigned char parent_sha1[20];
 		if (!read_ref(t->ref, parent_sha1)) {
 			struct commit *parent = lookup_commit(parent_sha1);
-			if (!parent || parse_commit(parent))
+			if (parse_commit(parent))
 				die("Failed to find/parse commit %s", t->ref);
 			commit_list_insert(parent, &parents);
 		}
diff --git a/sha1_name.c b/sha1_name.c
index 0e5fe7f..1dfc401 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -582,8 +582,6 @@ static int get_parent(const char *name, int len,
 	if (ret)
 		return ret;
 	commit = lookup_commit_reference(sha1);
-	if (!commit)
-		return -1;
 	if (parse_commit(commit))
 		return -1;
 	if (!idx) {
-- 
1.8.4.1.4.gf327177

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/5] use parse_commit_or_die instead of segfaulting
  2013-10-08 13:48 ` [PATCH 0/5] parse_commit cleanups Jeff King
  2013-10-08 13:52   ` [PATCH 1/5] assume parse_commit checks commit->object.parsed Jeff King
  2013-10-08 13:56   ` [PATCH 2/5] assume parse_commit checks for NULL commit Jeff King
@ 2013-10-08 13:57   ` Jeff King
  2013-10-08 13:57   ` [PATCH 4/5] use parse_commit_or_die instead of custom message Jeff King
  2013-10-08 14:00   ` [PATCH 5/5] checkout: do not die when leaving broken detached HEAD Jeff King
  4 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2013-10-08 13:57 UTC (permalink / raw)
  To: git

Some unchecked calls to parse_commit should obviously die on
error, because their next step is to start looking at the
parsed fields, which will cause a segfault. These are
obvious candidates for parse_commit_or_die, which will be a
strict improvement in behavior.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/checkout.c    | 4 ++--
 builtin/fast-export.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0f57397..34a2e43 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -789,7 +789,7 @@ static int switch_branches(const struct checkout_opts *opts,
 		new->commit = old.commit;
 		if (!new->commit)
 			die(_("You are on a branch yet to be born"));
-		parse_commit(new->commit);
+		parse_commit_or_die(new->commit);
 	}
 
 	ret = merge_working_tree(opts, &old, new, &writeout_error);
@@ -952,7 +952,7 @@ static int parse_branchname_arg(int argc, const char **argv,
 		/* not a commit */
 		*source_tree = parse_tree_indirect(rev);
 	} else {
-		parse_commit(new->commit);
+		parse_commit_or_die(new->commit);
 		*source_tree = new->commit->tree;
 	}
 
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 78250ea..ea63052 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -287,7 +287,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev)
 
 	rev->diffopt.output_format = DIFF_FORMAT_CALLBACK;
 
-	parse_commit(commit);
+	parse_commit_or_die(commit);
 	author = strstr(commit->buffer, "\nauthor ");
 	if (!author)
 		die ("Could not find author in commit %s",
@@ -308,7 +308,7 @@ static void handle_commit(struct commit *commit, struct rev_info *rev)
 	if (commit->parents &&
 	    get_object_mark(&commit->parents->item->object) != 0 &&
 	    !full_tree) {
-		parse_commit(commit->parents->item);
+		parse_commit_or_die(commit->parents->item);
 		diff_tree_sha1(commit->parents->item->tree->object.sha1,
 			       commit->tree->object.sha1, "", &rev->diffopt);
 	}
-- 
1.8.4.1.4.gf327177

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/5] use parse_commit_or_die instead of custom message
  2013-10-08 13:48 ` [PATCH 0/5] parse_commit cleanups Jeff King
                     ` (2 preceding siblings ...)
  2013-10-08 13:57   ` [PATCH 3/5] use parse_commit_or_die instead of segfaulting Jeff King
@ 2013-10-08 13:57   ` Jeff King
  2013-10-08 14:00   ` [PATCH 5/5] checkout: do not die when leaving broken detached HEAD Jeff King
  4 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2013-10-08 13:57 UTC (permalink / raw)
  To: git

Many calls to parse_commit detect errors and die. In some
cases, the custom error messages are more useful than what
parse_commit_or_die could produce, because they give some
context, like which ref the commit came from. Some, however,
just say "invalid commit". Let's convert the latter to use
parse_commit_or_die; its message is slightly more informative,
and it makes the error more consistent throughout git.

Signed-off-by: Jeff King <peff@peff.net>
---
 shallow.c     | 3 +--
 upload-pack.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/shallow.c b/shallow.c
index cdf37d6..961cf6f 100644
--- a/shallow.c
+++ b/shallow.c
@@ -90,8 +90,7 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
 				cur_depth = *(int *)commit->util;
 			}
 		}
-		if (parse_commit(commit))
-			die("invalid commit");
+		parse_commit_or_die(commit);
 		cur_depth++;
 		if (cur_depth >= depth) {
 			commit_list_insert(commit, &result);
diff --git a/upload-pack.c b/upload-pack.c
index a6c54e0..abe6636 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -649,8 +649,7 @@ static void receive_needs(void)
 				/* make sure the real parents are parsed */
 				unregister_shallow(object->sha1);
 				object->parsed = 0;
-				if (parse_commit((struct commit *)object))
-					die("invalid commit");
+				parse_commit_or_die((struct commit *)object);
 				parents = ((struct commit *)object)->parents;
 				while (parents) {
 					add_object_array(&parents->item->object,
-- 
1.8.4.1.4.gf327177

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 5/5] checkout: do not die when leaving broken detached HEAD
  2013-10-08 13:48 ` [PATCH 0/5] parse_commit cleanups Jeff King
                     ` (3 preceding siblings ...)
  2013-10-08 13:57   ` [PATCH 4/5] use parse_commit_or_die instead of custom message Jeff King
@ 2013-10-08 14:00   ` Jeff King
  4 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2013-10-08 14:00 UTC (permalink / raw)
  To: git

If we move away from a detached HEAD that has broken or
corrupted commits, we might die in two places:

  1. Printing the "old HEAD was..." message.

  2. Printing the list of orphaned commits.

In both cases, we ignore the return value of parse_commit
and feed the resulting commit to the pretty-print machinery,
which will die() upon failing to read the commit object
itself.

Since both cases are ancillary to the real operation being
performed, let's be more robust and keep going. This lets
user more easily checkout away from broken history.

Note that the call in describe_detached_head is also used to
describe the new commit we are moving to. We would want to
die in that case, but that is already handled much earlier,
when we parse the commit to get the tree to move to.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/checkout.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 34a2e43..1df55c0 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -380,8 +380,8 @@ static void describe_detached_head(const char *msg, struct commit *commit)
 static void describe_detached_head(const char *msg, struct commit *commit)
 {
 	struct strbuf sb = STRBUF_INIT;
-	parse_commit(commit);
-	pp_commit_easy(CMIT_FMT_ONELINE, commit, &sb);
+	if (!parse_commit(commit))
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &sb);
 	fprintf(stderr, "%s %s... %s\n", msg,
 		find_unique_abbrev(commit->object.sha1, DEFAULT_ABBREV), sb.buf);
 	strbuf_release(&sb);
@@ -677,12 +677,12 @@ static void describe_one_orphan(struct strbuf *sb, struct commit *commit)
 
 static void describe_one_orphan(struct strbuf *sb, struct commit *commit)
 {
-	parse_commit(commit);
 	strbuf_addstr(sb, "  ");
 	strbuf_addstr(sb,
 		find_unique_abbrev(commit->object.sha1, DEFAULT_ABBREV));
 	strbuf_addch(sb, ' ');
-	pp_commit_easy(CMIT_FMT_ONELINE, commit, sb);
+	if (!parse_commit(commit))
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, sb);
 	strbuf_addch(sb, '\n');
 }
 
-- 
1.8.4.1.4.gf327177

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-10-09 16:37 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-06  4:48 [PATCH] log_tree_diff: die when we fail to parse a commit Jeff King
2013-10-08 13:48 ` [PATCH 0/5] parse_commit cleanups Jeff King
2013-10-08 13:52   ` [PATCH 1/5] assume parse_commit checks commit->object.parsed Jeff King
2013-10-08 13:56   ` [PATCH 2/5] assume parse_commit checks for NULL commit Jeff King
2013-10-08 13:57   ` [PATCH 3/5] use parse_commit_or_die instead of segfaulting Jeff King
2013-10-08 13:57   ` [PATCH 4/5] use parse_commit_or_die instead of custom message Jeff King
2013-10-08 14:00   ` [PATCH 5/5] checkout: do not die when leaving broken detached HEAD Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.