git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/23] parsing and fsck cleanups
@ 2019-10-18  4:41 Jeff King
  2019-10-18  4:42 ` [PATCH 01/23] parse_commit_buffer(): treat lookup_commit() failure as parse error Jeff King
                   ` (24 more replies)
  0 siblings, 25 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:41 UTC (permalink / raw)
  To: git

The thread starting at:

  https://public-inbox.org/git/xmqqo8zxnz0m.fsf@gitster-ct.c.googlers.com/

discusses some issues with our handling of corrupt objects, as well as
some weirdness in fsck. This series is my attempt to clean it up. The
number of patches is a little daunting, but the early ones are the most
interesting. The latter half is part of a big refactor/cleanup that's
mostly mechanical (and isn't strictly necessary; see below for
discussion).

  [01/23]: parse_commit_buffer(): treat lookup_commit() failure as parse error
  [02/23]: parse_commit_buffer(): treat lookup_tree() failure as parse error
  [03/23]: parse_tag_buffer(): treat NULL tag pointer as parse error
  [04/23]: remember commit/tag parse failures

    These ones are tightening up our parser to report failures more
    consistently. The first one definitely fixes a demonstrable bug, and
    I suspect the rest of them are fixing hard-to-trigger but lurking
    segfaults.

  [05/23]: fsck: stop checking commit->tree value
  [06/23]: fsck: stop checking commit->parent counts
  [07/23]: fsck: stop checking tag->tagged
  [08/23]: fsck: require an actual buffer for non-blobs

    These ones clean up weirdness where fsck is dependent on the results
    of parse_commit(), etc, rather than just looking at the buffer we
    gave it. I don't think they're _hurting_ anything, but it certainly
    makes following the fsck logic more confusing.

  [09/23]: fsck: unify object-name code

    Cleanup that fixes a few minor bugs.

  [10/23]: fsck_describe_object(): build on our get_object_name() primitive
  [11/23]: fsck: use oids rather than objects for object_name API
  [12/23]: fsck: don't require full object structs for display functions
  [13/23]: fsck: only provide oid/type in fsck_error callback
  [14/23]: fsck: only require an oid for skiplist functions
  [15/23]: fsck: don't require an object struct for report()
  [16/23]: fsck: accept an oid instead of a "struct blob" for fsck_blob()
  [17/23]: fsck: drop blob struct from fsck_finish()
  [18/23]: fsck: don't require an object struct for fsck_ident()
  [19/23]: fsck: don't require an object struct in verify_headers()
  [20/23]: fsck: rename vague "oid" local variables
  [21/23]: fsck: accept an oid instead of a "struct tag" for fsck_tag()
  [22/23]: fsck: accept an oid instead of a "struct commit" for fsck_commit()
  [23/23]: fsck: accept an oid instead of a "struct tree" for fsck_tree()

    This a string of refactors that ends up with all of the
    type-specific fsck functions not getting an object struct at all.
    My goal there was two-fold:

       - it makes it harder to introduce weirdness like we saw in
	 patches 5-8.

       - it _could_ make things less awkward for callers like index-pack
	 which don't necessarily have object structs. And at the end, we
	 basically have an fsck_object() that doesn't need an object
	 struct. But index-pack still calls fsck_walk(), which does (and
	 which relies on the parsed values to traverse). It's not
	 entirely clear to me whether index-pack needs to be doing
	 fsck_walk() in the first place, or if it should be relying on
	 the usual connectivity check.

	 So I'm undecided whether this is worth taking on its own, or if
	 trying to avoid object structs in the fsck code is just a
	 fool's errand. I do think the result isn't too bad to look at,
	 though and there are some minor improvements along the way
	 (e.g., patch 17 is able to drop some awkwardness).

    Most of the patches are pretty mechanical. There are so many because
    I split it by call stack layer. If A calls B calls C, then I
    converted "C" away from "struct object" first, which enables
    converting "B", and so on.

 builtin/fsck.c                         | 126 ++++----
 commit-graph.c                         |   3 -
 commit.c                               |  33 ++-
 fsck.c                                 | 386 +++++++++++--------------
 fsck.h                                 |  39 ++-
 t/t1450-fsck.sh                        |   2 +-
 t/t5318-commit-graph.sh                |   2 +-
 t/t6102-rev-list-unexpected-objects.sh |   2 +-
 tag.c                                  |  21 +-
 9 files changed, 312 insertions(+), 302 deletions(-)


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 01/23] parse_commit_buffer(): treat lookup_commit() failure as parse error
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
@ 2019-10-18  4:42 ` Jeff King
  2019-10-24  3:37   ` Junio C Hamano
  2019-10-18  4:43 ` [PATCH 02/23] parse_commit_buffer(): treat lookup_tree() " Jeff King
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:42 UTC (permalink / raw)
  To: git

While parsing the parents of a commit, if we are able to parse an actual
oid but lookup_commit() fails on it (because we previously saw it in
this process as a different object type), we silently omit the parent
and do not report any error to the caller.

The caller has no way of knowing this happened, because even an empty
parent list is a valid parse result. As a result, it's possible to fool
our "rev-list" connectivity check into accepting a corrupted set of
objects.

There's a test for this case already in t6102, but unfortunately it has
a slight error. It creates a broken commit with a parent line pointing
to a blob, and then checks that rev-list notices the problem in two
cases:

  1. the "lone" case: we traverse the broken commit by itself (here we
     try to actually load the blob from disk and find out that it's not
     a commit)

  2. the "seen" case: we parse the blob earlier in the process, and then
     when calling lookup_commit() we realize immediately that it's not a
     commit

The "seen" variant for this test mistakenly parsed another commit
instead of the blob, meaning that we were actually just testing the
"lone" case again. Changing that reveals the breakage (and shows that
this fixes it).

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c                               | 11 ++++++++---
 t/t6102-rev-list-unexpected-objects.sh |  2 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index 40890ae7ce..6467c9e175 100644
--- a/commit.c
+++ b/commit.c
@@ -432,8 +432,11 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 		if (graft && (graft->nr_parent < 0 || grafts_replace_parents))
 			continue;
 		new_parent = lookup_commit(r, &parent);
-		if (new_parent)
-			pptr = &commit_list_insert(new_parent, pptr)->next;
+		if (!new_parent)
+			return error("bad parent %s in commit %s",
+				     oid_to_hex(&parent),
+				     oid_to_hex(&item->object.oid));
+		pptr = &commit_list_insert(new_parent, pptr)->next;
 	}
 	if (graft) {
 		int i;
@@ -442,7 +445,9 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 			new_parent = lookup_commit(r,
 						   &graft->parent[i]);
 			if (!new_parent)
-				continue;
+				return error("bad graft parent %s in commit %s",
+					     oid_to_hex(&graft->parent[i]),
+					     oid_to_hex(&item->object.oid));
 			pptr = &commit_list_insert(new_parent, pptr)->next;
 		}
 	}
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 28611c978e..52cde097dd 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -52,7 +52,7 @@ test_expect_success 'traverse unexpected non-commit parent (lone)' '
 '
 
 test_expect_success 'traverse unexpected non-commit parent (seen)' '
-	test_must_fail git rev-list --objects $commit $broken_commit \
+	test_must_fail git rev-list --objects $blob $broken_commit \
 		>output 2>&1 &&
 	test_i18ngrep "not a commit" output
 '
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 02/23] parse_commit_buffer(): treat lookup_tree() failure as parse error
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
  2019-10-18  4:42 ` [PATCH 01/23] parse_commit_buffer(): treat lookup_commit() failure as parse error Jeff King
@ 2019-10-18  4:43 ` Jeff King
  2019-10-24 23:12   ` Jonathan Tan
  2019-10-18  4:45 ` [PATCH 03/23] parse_tag_buffer(): treat NULL tag pointer " Jeff King
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:43 UTC (permalink / raw)
  To: git

If parsing a commit yields a valid tree oid, but we've seen that same
oid as a non-tree in the same process, the resulting commit struct will
end up with a NULL tree pointer, but not otherwise report a parsing
failure.

That's perhaps convenient for callers which want to operate on even
partially corrupt commits (e.g., by still looking at the parents). But
it leaves a potential trap for most callers, who now have to manually
check for a NULL tree. Most do not, and it's likely that there are
possible segfaults lurking. I say "possible" because there are many
candidates, and I don't think it's worth following through on
reproducing them when we can just fix them all in one spot. And
certainly we _have_ seen real-world cases, such as the one fixed by
806278dead (commit-graph.c: handle corrupt/missing trees, 2019-09-05).

Note that we can't quite drop the check in the caller added by that
commit yet, as there's some subtlety with repeated parsings (which will
be addressed in a future commit).

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 6467c9e175..810419a168 100644
--- a/commit.c
+++ b/commit.c
@@ -401,6 +401,7 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	struct commit_graft *graft;
 	const int tree_entry_len = the_hash_algo->hexsz + 5;
 	const int parent_entry_len = the_hash_algo->hexsz + 7;
+	struct tree *tree;
 
 	if (item->object.parsed)
 		return 0;
@@ -412,7 +413,12 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	if (get_oid_hex(bufptr + 5, &parent) < 0)
 		return error("bad tree pointer in commit %s",
 			     oid_to_hex(&item->object.oid));
-	set_commit_tree(item, lookup_tree(r, &parent));
+	tree = lookup_tree(r, &parent);
+	if (!tree)
+		return error("bad tree pointer %s in commit %s",
+			     oid_to_hex(&parent),
+			     oid_to_hex(&item->object.oid));
+	set_commit_tree(item, tree);
 	bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */
 	pptr = &item->parents;
 
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 03/23] parse_tag_buffer(): treat NULL tag pointer as parse error
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
  2019-10-18  4:42 ` [PATCH 01/23] parse_commit_buffer(): treat lookup_commit() failure as parse error Jeff King
  2019-10-18  4:43 ` [PATCH 02/23] parse_commit_buffer(): treat lookup_tree() " Jeff King
@ 2019-10-18  4:45 ` Jeff King
  2019-10-18  4:47 ` [PATCH 04/23] remember commit/tag parse failures Jeff King
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:45 UTC (permalink / raw)
  To: git

When parsing a tag, we may end up with a NULL "tagged" field when
there's a type mismatch (e.g., the tag claims to point to object X as a
commit, but we previously saw X as a blob in the same process), but we
do not otherwise indicate a parse failure to the caller.

This is similar to the case discussed in the previous commit, where a
commit could end up with a NULL tree field: while slightly convenient
for callers who want to overlook a corrupt object, it means that normal
callers have to explicitly deal with this case (rather than just relying
on the return code from parsing). And most don't, leading to segfault
fixes like the one in c77722b3ea (use get_tagged_oid(), 2019-09-05).

Let's address this more centrally, by returning an error code from the
parse itself, which most callers would already notice (adventurous
callers are free to ignore the error and continue looking at the
struct).

This also covers the case where the tag contains a nonsensical "type"
field (there we produced a user-visible error but still returned success
to the caller; now we'll produce a slightly better message and return an
error).

Signed-off-by: Jeff King <peff@peff.net>
---
It's possible that c77722b3ea fixed all of the segfaults here, though of
course new callers must remember to use it instead of accessing
tag->tagged directly. We could _possibly_ get rid of that check now, but
that assumes that all of the caller notice parse_tag() or parse_object()
failing. Having both gives us belt-and-suspenders protection against a
segfault.

 tag.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tag.c b/tag.c
index bfa0e31435..6a51efda8d 100644
--- a/tag.c
+++ b/tag.c
@@ -167,10 +167,15 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	} else if (!strcmp(type, tag_type)) {
 		item->tagged = (struct object *)lookup_tag(r, &oid);
 	} else {
-		error("Unknown type %s", type);
-		item->tagged = NULL;
+		return error("unknown tag type '%s' in %s",
+			     type, oid_to_hex(&item->object.oid));
 	}
 
+	if (!item->tagged)
+		return error("bad tag pointer to %s in %s",
+			     oid_to_hex(&oid),
+			     oid_to_hex(&item->object.oid));
+
 	if (bufptr + 4 < tail && starts_with(bufptr, "tag "))
 		; 		/* good */
 	else
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 04/23] remember commit/tag parse failures
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (2 preceding siblings ...)
  2019-10-18  4:45 ` [PATCH 03/23] parse_tag_buffer(): treat NULL tag pointer " Jeff King
@ 2019-10-18  4:47 ` Jeff King
  2019-10-24  3:51   ` Junio C Hamano
  2019-10-24 23:25   ` Jonathan Tan
  2019-10-18  4:48 ` [PATCH 05/23] fsck: stop checking commit->tree value Jeff King
                   ` (20 subsequent siblings)
  24 siblings, 2 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:47 UTC (permalink / raw)
  To: git

If we can't parse a commit, then parse_commit() will return an error
code. But it _also_ sets the "parsed" flag, which tells us not to bother
trying to re-parse the object. That means that subsequent parses have no
idea that the information in the struct may be bogus.  I.e., doing this:

  parse_commit(commit);
  ...
  if (parse_commit(commit) < 0)
          die("commit is broken");

will never trigger the die(). The second parse_commit() will see the
"parsed" flag and quietly return success.

There are two obvious ways to fix this:

  1. Stop setting "parsed" until we've successfully parsed.

  2. Keep a second "corrupt" flag to indicate that we saw an error (and
     when the parsed flag is set, return 0/-1 depending on the corrupt
     flag).

This patch does option 1. The obvious downside versus option 2 is that
we might continually re-parse a broken object. But in practice,
corruption like this is rare, and we typically die() or return an error
in the caller. So it's OK not to worry about optimizing for corruption.
And it's much simpler: we don't need to use an extra bit in the object
struct, and callers which check the "parsed" flag don't need to learn
about the corrupt bit, too.

There's no new test here, because this case is already covered in t5318.
Note that we do need to update the expected message there, because we
now detect the problem in the return from "parse_commit()", and not with
a separate check for a NULL tree. In fact, we can now ditch that
explicit tree check entirely, as we're covered robustly by this change
(and the previous recent change to treat a NULL tree as a parse error).

We'll also give tags the same treatment. I don't know offhand of any
cases where the problem can be triggered (it implies somebody ignoring a
parse error earlier in the process), but consistently returning an error
should cause the least surprise.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c          |  3 ---
 commit.c                | 14 +++++++++++++-
 t/t5318-commit-graph.sh |  2 +-
 tag.c                   | 12 +++++++++++-
 4 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index fc4a43b8d6..852b9c39e6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -855,9 +855,6 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 			die(_("unable to parse commit %s"),
 				oid_to_hex(&(*list)->object.oid));
 		tree = get_commit_tree_oid(*list);
-		if (!tree)
-			die(_("unable to get tree for %s"),
-				oid_to_hex(&(*list)->object.oid));
 		hashwrite(f, tree->hash, hash_len);
 
 		parent = (*list)->parents;
diff --git a/commit.c b/commit.c
index 810419a168..e12e7998ad 100644
--- a/commit.c
+++ b/commit.c
@@ -405,7 +405,18 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
+
+	if (item->parents) {
+		/*
+		 * Presumably this is leftover from an earlier failed parse;
+		 * clear it out in preparation for us re-parsing (we'll hit the
+		 * same error, but that's good, since it lets our caller know
+		 * the result cannot be trusted.
+		 */
+		free_commit_list(item->parents);
+		item->parents = NULL;
+	}
+
 	tail += size;
 	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
 			bufptr[tree_entry_len] != '\n')
@@ -462,6 +473,7 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	if (check_graph)
 		load_commit_graph_info(r, item);
 
+	item->object.parsed = 1;
 	return 0;
 }
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index d42b3efe39..127b404856 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -660,7 +660,7 @@ test_expect_success 'corrupt commit-graph write (missing tree)' '
 		git commit-tree -p "$broken" -m "good" "$tree" >good &&
 		test_must_fail git commit-graph write --stdin-commits \
 			<good 2>test_err &&
-		test_i18ngrep "unable to get tree for" test_err
+		test_i18ngrep "unable to parse commit" test_err
 	)
 '
 
diff --git a/tag.c b/tag.c
index 6a51efda8d..71b544467e 100644
--- a/tag.c
+++ b/tag.c
@@ -141,7 +141,16 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
+
+	if (item->tag) {
+		/*
+		 * Presumably left over from a previous failed parse;
+		 * clear it out in preparation for re-parsing (we'll probably
+		 * hit the same error, which lets us tell our current caller
+		 * about the problem).
+		 */
+		FREE_AND_NULL(item->tag);
+	}
 
 	if (size < the_hash_algo->hexsz + 24)
 		return -1;
@@ -192,6 +201,7 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	else
 		item->date = 0;
 
+	item->object.parsed = 1;
 	return 0;
 }
 
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 05/23] fsck: stop checking commit->tree value
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (3 preceding siblings ...)
  2019-10-18  4:47 ` [PATCH 04/23] remember commit/tag parse failures Jeff King
@ 2019-10-18  4:48 ` Jeff King
  2019-10-24  3:57   ` Junio C Hamano
  2019-10-18  4:49 ` [PATCH 06/23] fsck: stop checking commit->parent counts Jeff King
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:48 UTC (permalink / raw)
  To: git

We check in fsck_commit_buffer() that commit->tree isn't NULL, which in
turn generally comes from a previous parse by parse_commit(). But this
isn't really accomplishing anything. The two things we might care about
are:

  - was there a syntactically valid "tree <oid>" line in the object? But
    we've just done our own parse in fsck_commit_buffer() to check this.

  - does it point to a valid tree object? But checking the "tree"
    pointer here doesn't actually accomplish that; it just shows that
    lookup_tree() didn't return NULL, which only means that we haven't
    yet seen that oid as a non-tree in this process.

    A real connectivity check would exhaustively walk all graph links,
    and we do that already in a separate function.

So this code isn't helping anything. And it makes the fsck code slightly
more confusing and rigid (e.g., it requires that any commit structs have
already been parsed). Let's drop it.

As a bit of history, the presence of this code looks like a leftover
from early fsck code (which did rely on parse_commit() to do most of the
parsing). The check comes from ff5ebe39b0 (Port fsck-cache to use
parsing functions, 2005-04-18), but we later added an explicit walk in
355885d531 (add generic, type aware object chain walker, 2008-02-25).

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fsck.c b/fsck.c
index cdb7d8db03..6dfc533fb0 100644
--- a/fsck.c
+++ b/fsck.c
@@ -800,11 +800,6 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	err = fsck_ident(&buffer, &commit->object, options);
 	if (err)
 		return err;
-	if (!get_commit_tree(commit)) {
-		err = report(options, &commit->object, FSCK_MSG_BAD_TREE, "could not load commit's tree %s", oid_to_hex(&tree_oid));
-		if (err)
-			return err;
-	}
 	if (memchr(buffer_begin, '\0', size)) {
 		err = report(options, &commit->object, FSCK_MSG_NUL_IN_COMMIT,
 			     "NUL byte in the commit object body");
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 06/23] fsck: stop checking commit->parent counts
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (4 preceding siblings ...)
  2019-10-18  4:48 ` [PATCH 05/23] fsck: stop checking commit->tree value Jeff King
@ 2019-10-18  4:49 ` Jeff King
  2019-10-18  4:51 ` [PATCH 07/23] fsck: stop checking tag->tagged Jeff King
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:49 UTC (permalink / raw)
  To: git

In 4516338243 (builtin-fsck: reports missing parent commits,
2008-02-25), we added code to check that fsck found the same number of
parents from parsing the commit itself as we see in the commit struct we
got from parse_commit_buffer(). Back then the rationale was that the
normal commit parser might skip some bad parents.

But earlier in this series, we started treating that reliably as a
parsing error, meaning that we'd complain about it before we even hit
the code in fsck.c.

Let's drop this code, which now makes fsck_commit_buffer() completely
independent of any parsed values in the commit struct (that's
conceptually cleaner, and also opens up more refactoring options).

Note that we can also drop the MISSING_PARENT and MISSING_GRAFT fsck
identifiers. This is no loss, as these would not trigger reliably
anyway.  We'd hit them only when lookup_commit() failed, which occurs
only if we happen to have seen the object with another type already in
the same process. In most cases, we'd actually run into the problem
during the connectivity walk, not here.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 23 +----------------------
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/fsck.c b/fsck.c
index 6dfc533fb0..a0f8ae7650 100644
--- a/fsck.c
+++ b/fsck.c
@@ -43,10 +43,8 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(MISSING_AUTHOR, ERROR) \
 	FUNC(MISSING_COMMITTER, ERROR) \
 	FUNC(MISSING_EMAIL, ERROR) \
-	FUNC(MISSING_GRAFT, ERROR) \
 	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
 	FUNC(MISSING_OBJECT, ERROR) \
-	FUNC(MISSING_PARENT, ERROR) \
 	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
 	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
 	FUNC(MISSING_TAG, ERROR) \
@@ -739,8 +737,7 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	unsigned long size, struct fsck_options *options)
 {
 	struct object_id tree_oid, oid;
-	struct commit_graft *graft;
-	unsigned parent_count, parent_line_count = 0, author_count;
+	unsigned author_count;
 	int err;
 	const char *buffer_begin = buffer;
 	const char *p;
@@ -763,24 +760,6 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 				return err;
 		}
 		buffer = p + 1;
-		parent_line_count++;
-	}
-	graft = lookup_commit_graft(the_repository, &commit->object.oid);
-	parent_count = commit_list_count(commit->parents);
-	if (graft) {
-		if (graft->nr_parent == -1 && !parent_count)
-			; /* shallow commit */
-		else if (graft->nr_parent != parent_count) {
-			err = report(options, &commit->object, FSCK_MSG_MISSING_GRAFT, "graft objects missing");
-			if (err)
-				return err;
-		}
-	} else {
-		if (parent_count != parent_line_count) {
-			err = report(options, &commit->object, FSCK_MSG_MISSING_PARENT, "parent objects missing");
-			if (err)
-				return err;
-		}
 	}
 	author_count = 0;
 	while (skip_prefix(buffer, "author ", &buffer)) {
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 07/23] fsck: stop checking tag->tagged
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (5 preceding siblings ...)
  2019-10-18  4:49 ` [PATCH 06/23] fsck: stop checking commit->parent counts Jeff King
@ 2019-10-18  4:51 ` Jeff King
  2019-10-18  4:54 ` [PATCH 08/23] fsck: require an actual buffer for non-blobs Jeff King
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:51 UTC (permalink / raw)
  To: git

Way back in 92d4c85d24 (fsck-cache: fix SIGSEGV on bad tag object,
2005-05-03), we added an fsck check that the "tagged" field of a tag
struct isn't NULL. But that was mainly protecting the printing code for
"--tags", and that code wasn't moved along with the check as part of
ba002f3b28 (builtin-fsck: move common object checking code to fsck.c,
2008-02-25).

It could also serve to detect type mismatch problems (where a tag points
to object X as a commit, but really X is a blob), but it couldn't do so
reliably (we'd call lookup_commit(X), but it will only notice the
problem if we happen to have previously called lookup_blob(X) in the
same process). And as of a commit earlier in this series, we'd consider
that a parse error and complain about the object even before getting to
this point anyway.

So let's drop this "tag->tagged" check. It's not helping anything, and
getting rid of it makes the function conceptually cleaner, as it really
is just checking the buffer we feed it. In fact, we can get rid of our
one-line wrapper and just unify fsck_tag() and fsck_tag_buffer().

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/fsck.c b/fsck.c
index a0f8ae7650..79ce3a97c8 100644
--- a/fsck.c
+++ b/fsck.c
@@ -798,8 +798,8 @@ static int fsck_commit(struct commit *commit, const char *data,
 	return ret;
 }
 
-static int fsck_tag_buffer(struct tag *tag, const char *data,
-	unsigned long size, struct fsck_options *options)
+static int fsck_tag(struct tag *tag, const char *data,
+		    unsigned long size, struct fsck_options *options)
 {
 	struct object_id oid;
 	int ret = 0;
@@ -893,17 +893,6 @@ static int fsck_tag_buffer(struct tag *tag, const char *data,
 	return ret;
 }
 
-static int fsck_tag(struct tag *tag, const char *data,
-	unsigned long size, struct fsck_options *options)
-{
-	struct object *tagged = tag->tagged;
-
-	if (!tagged)
-		return report(options, &tag->object, FSCK_MSG_BAD_TAG_OBJECT, "could not load tagged object");
-
-	return fsck_tag_buffer(tag, data, size, options);
-}
-
 struct fsck_gitmodules_data {
 	struct object *obj;
 	struct fsck_options *options;
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 08/23] fsck: require an actual buffer for non-blobs
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (6 preceding siblings ...)
  2019-10-18  4:51 ` [PATCH 07/23] fsck: stop checking tag->tagged Jeff King
@ 2019-10-18  4:54 ` Jeff King
  2019-10-18  4:56 ` [PATCH 09/23] fsck: unify object-name code Jeff King
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:54 UTC (permalink / raw)
  To: git

The fsck_object() function takes in a buffer, but also a "struct
object". The rules for using these vary between types:

  - for a commit, we'll use the provided buffer; if it's NULL, we'll
    fall back to get_commit_buffer(), which loads from either an
    in-memory cache or from disk. If the latter fails, we'd die(), which
    is non-ideal for fsck.

  - for a tag, a NULL buffer will fall back to loading the object from
    disk (and failure would lead to an fsck error)

  - for a tree, we _never_ look at the provided buffer, and always use
    tree->buffer

  - for a blob, we usually don't look at the buffer at all, unless it
    has been marked as a .gitmodule file. In that case we check the
    buffer given to us, or assume a NULL buffer is a very large blob
    (and complain about it)

This is much more complex than it needs to be. It turns out that nobody
ever feeds a NULL buffer that isn't a blob:

  - git-fsck calls fsck_object() only from fsck_obj(). That in turn is
    called by one of:

      - fsck_obj_buffer(), which is a callback to verify_pack(), which
	unpacks everything except large blobs into a buffer (see
	pack-check.c, lines 131-141).

      - fsck_loose(), which hits a BUG() on non-blobs with a NULL buffer
	(builtin/fsck.c, lines 639-640)

    And in either case, we'll have just called parse_object_buffer()
    anyway, which would segfault on a NULL buffer for commits or tags
    (not for trees, but it would install a NULL tree->buffer which would
    later cause a segfault)

  - git-index-pack asserts that the buffer is non-NULL unless the object
    is a blob (see builtin/index-pack.c, line 832)

  - git-unpack-objects always writes a non-NULL buffer into its
    obj_buffer hash, which is then fed to fsck_object(). (There is
    actually a funny thing here where it does not store blob buffers at
    all, nor does it call fsck on them; it does check any needed blobs
    via fsck_finish() though).

Let's make the rules simpler, which reduces the amount of code and gives
us more flexibility in refactoring the fsck code. The new rules are:

  - only blobs are allowed to pass a NULL buffer

  - we always use the provided buffer, never pulling information from
    the object struct

We don't have to adjust any callers, because they were already adhering
to these. Note that we do drop a few fsck identifiers for missing tags,
but that was all dead code (because nobody passed a NULL tag buffer).

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 51 +++++++++------------------------------------------
 fsck.h |  6 +++++-
 2 files changed, 14 insertions(+), 43 deletions(-)

diff --git a/fsck.c b/fsck.c
index 79ce3a97c8..347a0ef5c9 100644
--- a/fsck.c
+++ b/fsck.c
@@ -49,13 +49,11 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
 	FUNC(MISSING_TAG, ERROR) \
 	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TAG_OBJECT, ERROR) \
 	FUNC(MISSING_TREE, ERROR) \
 	FUNC(MISSING_TREE_OBJECT, ERROR) \
 	FUNC(MISSING_TYPE, ERROR) \
 	FUNC(MISSING_TYPE_ENTRY, ERROR) \
 	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
 	FUNC(TREE_NOT_SORTED, ERROR) \
 	FUNC(UNKNOWN_TYPE, ERROR) \
 	FUNC(ZERO_PADDED_DATE, ERROR) \
@@ -541,7 +539,9 @@ static int verify_ordered(unsigned mode1, const char *name1, unsigned mode2, con
 	return c1 < c2 ? 0 : TREE_UNORDERED;
 }
 
-static int fsck_tree(struct tree *item, struct fsck_options *options)
+static int fsck_tree(struct tree *item,
+		     const char *buffer, unsigned long size,
+		     struct fsck_options *options)
 {
 	int retval = 0;
 	int has_null_sha1 = 0;
@@ -558,7 +558,7 @@ static int fsck_tree(struct tree *item, struct fsck_options *options)
 	unsigned o_mode;
 	const char *o_name;
 
-	if (init_tree_desc_gently(&desc, item->buffer, item->size)) {
+	if (init_tree_desc_gently(&desc, buffer, size)) {
 		retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 		return retval;
 	}
@@ -733,8 +733,8 @@ static int fsck_ident(const char **ident, struct object *obj, struct fsck_option
 	return 0;
 }
 
-static int fsck_commit_buffer(struct commit *commit, const char *buffer,
-	unsigned long size, struct fsck_options *options)
+static int fsck_commit(struct commit *commit, const char *buffer,
+		       unsigned long size, struct fsck_options *options)
 {
 	struct object_id tree_oid, oid;
 	unsigned author_count;
@@ -788,47 +788,15 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	return 0;
 }
 
-static int fsck_commit(struct commit *commit, const char *data,
-	unsigned long size, struct fsck_options *options)
-{
-	const char *buffer = data ?  data : get_commit_buffer(commit, &size);
-	int ret = fsck_commit_buffer(commit, buffer, size, options);
-	if (!data)
-		unuse_commit_buffer(commit, buffer);
-	return ret;
-}
-
-static int fsck_tag(struct tag *tag, const char *data,
+static int fsck_tag(struct tag *tag, const char *buffer,
 		    unsigned long size, struct fsck_options *options)
 {
 	struct object_id oid;
 	int ret = 0;
-	const char *buffer;
-	char *to_free = NULL, *eol;
+	char *eol;
 	struct strbuf sb = STRBUF_INIT;
 	const char *p;
 
-	if (data)
-		buffer = data;
-	else {
-		enum object_type type;
-
-		buffer = to_free =
-			read_object_file(&tag->object.oid, &type, &size);
-		if (!buffer)
-			return report(options, &tag->object,
-				FSCK_MSG_MISSING_TAG_OBJECT,
-				"cannot read tag object");
-
-		if (type != OBJ_TAG) {
-			ret = report(options, &tag->object,
-				FSCK_MSG_TAG_OBJECT_NOT_TAG,
-				"expected tag got %s",
-			    type_name(type));
-			goto done;
-		}
-	}
-
 	ret = verify_headers(buffer, size, &tag->object, options);
 	if (ret)
 		goto done;
@@ -889,7 +857,6 @@ static int fsck_tag(struct tag *tag, const char *data,
 
 done:
 	strbuf_release(&sb);
-	free(to_free);
 	return ret;
 }
 
@@ -979,7 +946,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 	if (obj->type == OBJ_BLOB)
 		return fsck_blob((struct blob *)obj, data, size, options);
 	if (obj->type == OBJ_TREE)
-		return fsck_tree((struct tree *) obj, options);
+		return fsck_tree((struct tree *) obj, data, size, options);
 	if (obj->type == OBJ_COMMIT)
 		return fsck_commit((struct commit *) obj, (const char *) data,
 			size, options);
diff --git a/fsck.h b/fsck.h
index b95595ae5f..e479461075 100644
--- a/fsck.h
+++ b/fsck.h
@@ -52,7 +52,11 @@ struct fsck_options {
  *    0		everything OK
  */
 int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
-/* If NULL is passed for data, we assume the object is local and read it. */
+
+/*
+ * Blob objects my pass a NULL data pointer, which indicates they are too large
+ * to fit in memory. All other types must pass a real buffer.
+ */
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 09/23] fsck: unify object-name code
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (7 preceding siblings ...)
  2019-10-18  4:54 ` [PATCH 08/23] fsck: require an actual buffer for non-blobs Jeff King
@ 2019-10-18  4:56 ` Jeff King
  2019-10-24  6:05   ` Junio C Hamano
  2019-10-18  4:56 ` [PATCH 10/23] fsck_describe_object(): build on our get_object_name() primitive Jeff King
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:56 UTC (permalink / raw)
  To: git

Commit 90cf590f53 (fsck: optionally show more helpful info for broken
links, 2016-07-17) added a system for decorating objects with names. The
code is split across builtin/fsck.c (which gives the initial names) and
fsck.c (which adds to the names as it traverses the object graph). This
leads to some duplication, where both sites have near-identical
describe_object() functions (the difference being that the one in
builtin/fsck.c uses a circular array of buffers to allow multiple calls
in a single printf).

Let's provide a unified object_name API for fsck. That lets us drop the
duplication, as well as making the interface boundaries more clear
(which will let us refactor the implementation more in a future patch).

We'll leave describe_object() in builtin/fsck.c as a thin wrapper around
the new API, as it relies on a static global to make its many callers a
bit shorter.

We'll also convert the bare add_decoration() calls in builtin/fsck.c to
put_object_name(). This fixes two minor bugs:

  1. We leak many small strings. add_decoration() has a last-one-wins
     approach: it updates the decoration to the new string and returns
     the old one. But we ignore the return value, leaking the old
     string. This is quite common to trigger, since we look at reflogs:
     the tip of any ref will be described both by looking at the actual
     ref, as well as the latest reflog entry. So we'd always end up
     leaking one of those strings.

  2. The last-one-wins approach gives us lousy names. For instance, we
     first look at all of the refs, and then all of the reflogs. So
     rather than seeing "refs/heads/master", we're likely to overwrite
     it with "HEAD@{12345678}". We're generally better off using the
     first name we find.

     And indeed, the test in t1450 expects this ugly HEAD@{} name. After
     this patch, we've switched to using fsck_put_object_name()'s
     first-one-wins semantics, and we output the more human-friendly
     "refs/tags/julius" (and the test is updated accordingly).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  | 50 ++++++++----------------------
 fsck.c          | 82 ++++++++++++++++++++++++++++++-------------------
 fsck.h          | 24 +++++++++++++++
 t/t1450-fsck.sh |  2 +-
 4 files changed, 89 insertions(+), 69 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 18403a94fa..237643cc1d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -52,24 +52,7 @@ static int name_objects;
 
 static const char *describe_object(struct object *obj)
 {
-	static struct strbuf bufs[] = {
-		STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT
-	};
-	static int b = 0;
-	struct strbuf *buf;
-	char *name = NULL;
-
-	if (name_objects)
-		name = lookup_decoration(fsck_walk_options.object_names, obj);
-
-	buf = bufs + b;
-	b = (b + 1) % ARRAY_SIZE(bufs);
-	strbuf_reset(buf);
-	strbuf_addstr(buf, oid_to_hex(&obj->oid));
-	if (name)
-		strbuf_addf(buf, " (%s)", name);
-
-	return buf->buf;
+	return fsck_describe_object(&fsck_walk_options, obj);
 }
 
 static const char *printable_type(struct object *obj)
@@ -499,10 +482,10 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
 	if (!is_null_oid(oid)) {
 		obj = lookup_object(the_repository, oid);
 		if (obj && (obj->flags & HAS_OBJ)) {
-			if (timestamp && name_objects)
-				add_decoration(fsck_walk_options.object_names,
-					obj,
-					xstrfmt("%s@{%"PRItime"}", refname, timestamp));
+			if (timestamp)
+				fsck_put_object_name(&fsck_walk_options, obj,
+						     "%s@{%"PRItime"}",
+						     refname, timestamp);
 			obj->flags |= USED;
 			mark_object_reachable(obj);
 		} else if (!is_promisor_object(oid)) {
@@ -566,9 +549,8 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid,
 	}
 	default_refs++;
 	obj->flags |= USED;
-	if (name_objects)
-		add_decoration(fsck_walk_options.object_names,
-			obj, xstrdup(refname));
+	fsck_put_object_name(&fsck_walk_options,
+			     obj, "%s", refname);
 	mark_object_reachable(obj);
 
 	return 0;
@@ -742,9 +724,7 @@ static int fsck_cache_tree(struct cache_tree *it)
 			return 1;
 		}
 		obj->flags |= USED;
-		if (name_objects)
-			add_decoration(fsck_walk_options.object_names,
-				obj, xstrdup(":"));
+		fsck_put_object_name(&fsck_walk_options, obj, ":");
 		mark_object_reachable(obj);
 		if (obj->type != OBJ_TREE)
 			err |= objerror(obj, _("non-tree in cache-tree"));
@@ -830,8 +810,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	}
 
 	if (name_objects)
-		fsck_walk_options.object_names =
-			xcalloc(1, sizeof(struct decoration));
+		fsck_enable_object_names(&fsck_walk_options);
 
 	git_config(fsck_config, NULL);
 
@@ -890,9 +869,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 			}
 
 			obj->flags |= USED;
-			if (name_objects)
-				add_decoration(fsck_walk_options.object_names,
-					obj, xstrdup(arg));
+			fsck_put_object_name(&fsck_walk_options, obj,
+					     "%s", arg);
 			mark_object_reachable(obj);
 			continue;
 		}
@@ -928,10 +906,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 				continue;
 			obj = &blob->object;
 			obj->flags |= USED;
-			if (name_objects)
-				add_decoration(fsck_walk_options.object_names,
-					obj,
-					xstrfmt(":%s", active_cache[i]->name));
+			fsck_put_object_name(&fsck_walk_options, obj,
+					     ":%s", active_cache[i]->name);
 			mark_object_reachable(obj);
 		}
 		if (active_cache_tree)
diff --git a/fsck.c b/fsck.c
index 347a0ef5c9..ecd5957362 100644
--- a/fsck.c
+++ b/fsck.c
@@ -312,15 +312,21 @@ static int report(struct fsck_options *options, struct object *object,
 	return result;
 }
 
-static char *get_object_name(struct fsck_options *options, struct object *obj)
+void fsck_enable_object_names(struct fsck_options *options)
+{
+	if (!options->object_names)
+		options->object_names = xcalloc(1, sizeof(struct decoration));
+}
+
+const char *fsck_get_object_name(struct fsck_options *options, struct object *obj)
 {
 	if (!options->object_names)
 		return NULL;
 	return lookup_decoration(options->object_names, obj);
 }
 
-static void put_object_name(struct fsck_options *options, struct object *obj,
-	const char *fmt, ...)
+void fsck_put_object_name(struct fsck_options *options, struct object *obj,
+			  const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf buf = STRBUF_INIT;
@@ -337,17 +343,27 @@ static void put_object_name(struct fsck_options *options, struct object *obj,
 	va_end(ap);
 }
 
-static const char *describe_object(struct fsck_options *o, struct object *obj)
+const char *fsck_describe_object(struct fsck_options *options,
+				 struct object *obj)
 {
-	static struct strbuf buf = STRBUF_INIT;
-	char *name;
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, oid_to_hex(&obj->oid));
-	if (o->object_names && (name = lookup_decoration(o->object_names, obj)))
-		strbuf_addf(&buf, " (%s)", name);
+	static struct strbuf bufs[] = {
+		STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT
+	};
+	static int b = 0;
+	struct strbuf *buf;
+	char *name = NULL;
+
+	if (options->object_names)
+		name = lookup_decoration(options->object_names, obj);
+
+	buf = bufs + b;
+	b = (b + 1) % ARRAY_SIZE(bufs);
+	strbuf_reset(buf);
+	strbuf_addstr(buf, oid_to_hex(&obj->oid));
+	if (name)
+		strbuf_addf(buf, " (%s)", name);
 
-	return buf.buf;
+	return buf->buf;
 }
 
 static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *options)
@@ -360,7 +376,7 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 	if (parse_tree(tree))
 		return -1;
 
-	name = get_object_name(options, &tree->object);
+	name = fsck_get_object_name(options, &tree->object);
 	if (init_tree_desc_gently(&desc, tree->buffer, tree->size))
 		return -1;
 	while (tree_entry_gently(&desc, &entry)) {
@@ -373,20 +389,21 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 		if (S_ISDIR(entry.mode)) {
 			obj = (struct object *)lookup_tree(the_repository, &entry.oid);
 			if (name && obj)
-				put_object_name(options, obj, "%s%s/", name,
-					entry.path);
+				fsck_put_object_name(options, obj, "%s%s/",
+						     name, entry.path);
 			result = options->walk(obj, OBJ_TREE, data, options);
 		}
 		else if (S_ISREG(entry.mode) || S_ISLNK(entry.mode)) {
 			obj = (struct object *)lookup_blob(the_repository, &entry.oid);
 			if (name && obj)
-				put_object_name(options, obj, "%s%s", name,
-					entry.path);
+				fsck_put_object_name(options, obj, "%s%s",
+						     name, entry.path);
 			result = options->walk(obj, OBJ_BLOB, data, options);
 		}
 		else {
 			result = error("in tree %s: entry %s has bad mode %.6o",
-					describe_object(options, &tree->object), entry.path, entry.mode);
+				       fsck_describe_object(options, &tree->object),
+				       entry.path, entry.mode);
 		}
 		if (result < 0)
 			return result;
@@ -407,10 +424,10 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
 	if (parse_commit(commit))
 		return -1;
 
-	name = get_object_name(options, &commit->object);
+	name = fsck_get_object_name(options, &commit->object);
 	if (name)
-		put_object_name(options, &get_commit_tree(commit)->object,
-				"%s:", name);
+		fsck_put_object_name(options, &get_commit_tree(commit)->object,
+				     "%s:", name);
 
 	result = options->walk((struct object *)get_commit_tree(commit),
 			       OBJ_TREE, data, options);
@@ -441,13 +458,15 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
 			struct object *obj = &parents->item->object;
 
 			if (counter++)
-				put_object_name(options, obj, "%s^%d",
-					name, counter);
+				fsck_put_object_name(options, obj, "%s^%d",
+						     name, counter);
 			else if (generation > 0)
-				put_object_name(options, obj, "%.*s~%d",
-					name_prefix_len, name, generation + 1);
+				fsck_put_object_name(options, obj, "%.*s~%d",
+						     name_prefix_len, name,
+						     generation + 1);
 			else
-				put_object_name(options, obj, "%s^", name);
+				fsck_put_object_name(options, obj, "%s^",
+						     name);
 		}
 		result = options->walk((struct object *)parents->item, OBJ_COMMIT, data, options);
 		if (result < 0)
@@ -461,12 +480,12 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
 
 static int fsck_walk_tag(struct tag *tag, void *data, struct fsck_options *options)
 {
-	char *name = get_object_name(options, &tag->object);
+	const char *name = fsck_get_object_name(options, &tag->object);
 
 	if (parse_tag(tag))
 		return -1;
 	if (name)
-		put_object_name(options, tag->tagged, "%s", name);
+		fsck_put_object_name(options, tag->tagged, "%s", name);
 	return options->walk(tag->tagged, OBJ_ANY, data, options);
 }
 
@@ -488,7 +507,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options)
 	case OBJ_TAG:
 		return fsck_walk_tag((struct tag *)obj, data, options);
 	default:
-		error("Unknown object type for %s", describe_object(options, obj));
+		error("Unknown object type for %s",
+		      fsck_describe_object(options, obj));
 		return -1;
 	}
 }
@@ -962,10 +982,10 @@ int fsck_error_function(struct fsck_options *o,
 	struct object *obj, int msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
-		warning("object %s: %s", describe_object(o, obj), message);
+		warning("object %s: %s", fsck_describe_object(o, obj), message);
 		return 0;
 	}
-	error("object %s: %s", describe_object(o, obj), message);
+	error("object %s: %s", fsck_describe_object(o, obj), message);
 	return 1;
 }
 
diff --git a/fsck.h b/fsck.h
index e479461075..f6f0c40060 100644
--- a/fsck.h
+++ b/fsck.h
@@ -67,4 +67,28 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
  */
 int fsck_finish(struct fsck_options *options);
 
+/*
+ * Subsystem for storing human-readable names for each object.
+ *
+ * If fsck_enable_object_names() has not been called, all other functions are
+ * noops.
+ *
+ * Use put_object_name() to seed initial names (e.g. from refnames); the fsck
+ * code will extend that while walking trees, etc.
+ *
+ * Use get_object_name() to get a single name (or NULL if none). Or the more
+ * convenient describe_object(), which always produces an output string with
+ * the oid combined with the name (if any). Note that the return value points
+ * to a rotating array of static buffers, and may be invalidated by a
+ * subsequent call.
+ */
+void fsck_enable_object_names(struct fsck_options *options);
+const char *fsck_get_object_name(struct fsck_options *options,
+				 struct object *obj);
+__attribute__((format (printf,3,4)))
+void fsck_put_object_name(struct fsck_options *options, struct object *obj,
+			  const char *fmt, ...);
+const char *fsck_describe_object(struct fsck_options *options,
+				 struct object *obj);
+
 #endif
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 50d28e6fdb..7c7ff7e961 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -616,7 +616,7 @@ test_expect_success 'fsck --name-objects' '
 		remove_object $(git rev-parse julius:caesar.t) &&
 		test_must_fail git fsck --name-objects >out &&
 		tree=$(git rev-parse --verify julius:) &&
-		test_i18ngrep -E "$tree \((refs/heads/master|HEAD)@\{[0-9]*\}:" out
+		test_i18ngrep "$tree (refs/tags/julius:" out
 	)
 '
 
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 10/23] fsck_describe_object(): build on our get_object_name() primitive
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (8 preceding siblings ...)
  2019-10-18  4:56 ` [PATCH 09/23] fsck: unify object-name code Jeff King
@ 2019-10-18  4:56 ` Jeff King
  2019-10-24  6:06   ` Junio C Hamano
  2019-10-18  4:57 ` [PATCH 11/23] fsck: use oids rather than objects for object_name API Jeff King
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:56 UTC (permalink / raw)
  To: git

This isolates the implementation detail of using the decoration code to
our put/get functions.

Signed-off-by: Jeff King <peff@peff.net>
---
Arguably this could be squashed into the previous commit. By not doing
so, it made describe_object() more of a pure code movement.

 fsck.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/fsck.c b/fsck.c
index ecd5957362..b0c4de67c9 100644
--- a/fsck.c
+++ b/fsck.c
@@ -351,10 +351,7 @@ const char *fsck_describe_object(struct fsck_options *options,
 	};
 	static int b = 0;
 	struct strbuf *buf;
-	char *name = NULL;
-
-	if (options->object_names)
-		name = lookup_decoration(options->object_names, obj);
+	const char *name = fsck_get_object_name(options, obj);
 
 	buf = bufs + b;
 	b = (b + 1) % ARRAY_SIZE(bufs);
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 11/23] fsck: use oids rather than objects for object_name API
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (9 preceding siblings ...)
  2019-10-18  4:56 ` [PATCH 10/23] fsck_describe_object(): build on our get_object_name() primitive Jeff King
@ 2019-10-18  4:57 ` Jeff King
  2019-10-18  4:58 ` [PATCH 12/23] fsck: don't require object structs for display functions Jeff King
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:57 UTC (permalink / raw)
  To: git

We don't actually care about having object structs; we only need to look
up decorations by oid. Let's accept this more limited form, which will
give our callers more flexibility.

Note that the decoration API we rely on uses object structs itself (even
though it only looks at their oids). We can solve this by switching to
a kh_oid_map (we could also use the hashmap oidmap, but it's more
awkward for the simple case of just storing a void pointer).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c | 12 +++++-----
 fsck.c         | 61 ++++++++++++++++++++++++++++----------------------
 fsck.h         |  9 ++++----
 3 files changed, 45 insertions(+), 37 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 237643cc1d..66fa727c14 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -52,7 +52,7 @@ static int name_objects;
 
 static const char *describe_object(struct object *obj)
 {
-	return fsck_describe_object(&fsck_walk_options, obj);
+	return fsck_describe_object(&fsck_walk_options, &obj->oid);
 }
 
 static const char *printable_type(struct object *obj)
@@ -483,7 +483,7 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
 		obj = lookup_object(the_repository, oid);
 		if (obj && (obj->flags & HAS_OBJ)) {
 			if (timestamp)
-				fsck_put_object_name(&fsck_walk_options, obj,
+				fsck_put_object_name(&fsck_walk_options, oid,
 						     "%s@{%"PRItime"}",
 						     refname, timestamp);
 			obj->flags |= USED;
@@ -550,7 +550,7 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid,
 	default_refs++;
 	obj->flags |= USED;
 	fsck_put_object_name(&fsck_walk_options,
-			     obj, "%s", refname);
+			     oid, "%s", refname);
 	mark_object_reachable(obj);
 
 	return 0;
@@ -724,7 +724,7 @@ static int fsck_cache_tree(struct cache_tree *it)
 			return 1;
 		}
 		obj->flags |= USED;
-		fsck_put_object_name(&fsck_walk_options, obj, ":");
+		fsck_put_object_name(&fsck_walk_options, &it->oid, ":");
 		mark_object_reachable(obj);
 		if (obj->type != OBJ_TREE)
 			err |= objerror(obj, _("non-tree in cache-tree"));
@@ -869,7 +869,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 			}
 
 			obj->flags |= USED;
-			fsck_put_object_name(&fsck_walk_options, obj,
+			fsck_put_object_name(&fsck_walk_options, &oid,
 					     "%s", arg);
 			mark_object_reachable(obj);
 			continue;
@@ -906,7 +906,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 				continue;
 			obj = &blob->object;
 			obj->flags |= USED;
-			fsck_put_object_name(&fsck_walk_options, obj,
+			fsck_put_object_name(&fsck_walk_options, &obj->oid,
 					     ":%s", active_cache[i]->name);
 			mark_object_reachable(obj);
 		}
diff --git a/fsck.c b/fsck.c
index b0c4de67c9..124c0184d4 100644
--- a/fsck.c
+++ b/fsck.c
@@ -315,48 +315,56 @@ static int report(struct fsck_options *options, struct object *object,
 void fsck_enable_object_names(struct fsck_options *options)
 {
 	if (!options->object_names)
-		options->object_names = xcalloc(1, sizeof(struct decoration));
+		options->object_names = kh_init_oid_map();
 }
 
-const char *fsck_get_object_name(struct fsck_options *options, struct object *obj)
+const char *fsck_get_object_name(struct fsck_options *options,
+				 const struct object_id *oid)
 {
+	khiter_t pos;
 	if (!options->object_names)
 		return NULL;
-	return lookup_decoration(options->object_names, obj);
+	pos = kh_get_oid_map(options->object_names, *oid);
+	if (pos >= kh_end(options->object_names))
+		return NULL;
+	return kh_value(options->object_names, pos);
 }
 
-void fsck_put_object_name(struct fsck_options *options, struct object *obj,
+void fsck_put_object_name(struct fsck_options *options,
+			  const struct object_id *oid,
 			  const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf buf = STRBUF_INIT;
-	char *existing;
+	khiter_t pos;
+	int hashret;
 
 	if (!options->object_names)
 		return;
-	existing = lookup_decoration(options->object_names, obj);
-	if (existing)
+
+	pos = kh_put_oid_map(options->object_names, *oid, &hashret);
+	if (!hashret)
 		return;
 	va_start(ap, fmt);
 	strbuf_vaddf(&buf, fmt, ap);
-	add_decoration(options->object_names, obj, strbuf_detach(&buf, NULL));
+	kh_value(options->object_names, pos) = strbuf_detach(&buf, NULL);
 	va_end(ap);
 }
 
 const char *fsck_describe_object(struct fsck_options *options,
-				 struct object *obj)
+				 const struct object_id *oid)
 {
 	static struct strbuf bufs[] = {
 		STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT
 	};
 	static int b = 0;
 	struct strbuf *buf;
-	const char *name = fsck_get_object_name(options, obj);
+	const char *name = fsck_get_object_name(options, oid);
 
 	buf = bufs + b;
 	b = (b + 1) % ARRAY_SIZE(bufs);
 	strbuf_reset(buf);
-	strbuf_addstr(buf, oid_to_hex(&obj->oid));
+	strbuf_addstr(buf, oid_to_hex(oid));
 	if (name)
 		strbuf_addf(buf, " (%s)", name);
 
@@ -373,7 +381,7 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 	if (parse_tree(tree))
 		return -1;
 
-	name = fsck_get_object_name(options, &tree->object);
+	name = fsck_get_object_name(options, &tree->object.oid);
 	if (init_tree_desc_gently(&desc, tree->buffer, tree->size))
 		return -1;
 	while (tree_entry_gently(&desc, &entry)) {
@@ -386,20 +394,20 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 		if (S_ISDIR(entry.mode)) {
 			obj = (struct object *)lookup_tree(the_repository, &entry.oid);
 			if (name && obj)
-				fsck_put_object_name(options, obj, "%s%s/",
+				fsck_put_object_name(options, &entry.oid, "%s%s/",
 						     name, entry.path);
 			result = options->walk(obj, OBJ_TREE, data, options);
 		}
 		else if (S_ISREG(entry.mode) || S_ISLNK(entry.mode)) {
 			obj = (struct object *)lookup_blob(the_repository, &entry.oid);
 			if (name && obj)
-				fsck_put_object_name(options, obj, "%s%s",
+				fsck_put_object_name(options, &entry.oid, "%s%s",
 						     name, entry.path);
 			result = options->walk(obj, OBJ_BLOB, data, options);
 		}
 		else {
 			result = error("in tree %s: entry %s has bad mode %.6o",
-				       fsck_describe_object(options, &tree->object),
+				       fsck_describe_object(options, &tree->object.oid),
 				       entry.path, entry.mode);
 		}
 		if (result < 0)
@@ -421,9 +429,9 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
 	if (parse_commit(commit))
 		return -1;
 
-	name = fsck_get_object_name(options, &commit->object);
+	name = fsck_get_object_name(options, &commit->object.oid);
 	if (name)
-		fsck_put_object_name(options, &get_commit_tree(commit)->object,
+		fsck_put_object_name(options, get_commit_tree_oid(commit),
 				     "%s:", name);
 
 	result = options->walk((struct object *)get_commit_tree(commit),
@@ -452,18 +460,17 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
 
 	while (parents) {
 		if (name) {
-			struct object *obj = &parents->item->object;
+			struct object_id *oid = &parents->item->object.oid;
 
 			if (counter++)
-				fsck_put_object_name(options, obj, "%s^%d",
+				fsck_put_object_name(options, oid, "%s^%d",
 						     name, counter);
 			else if (generation > 0)
-				fsck_put_object_name(options, obj, "%.*s~%d",
+				fsck_put_object_name(options, oid, "%.*s~%d",
 						     name_prefix_len, name,
 						     generation + 1);
 			else
-				fsck_put_object_name(options, obj, "%s^",
-						     name);
+				fsck_put_object_name(options, oid, "%s^", name);
 		}
 		result = options->walk((struct object *)parents->item, OBJ_COMMIT, data, options);
 		if (result < 0)
@@ -477,12 +484,12 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
 
 static int fsck_walk_tag(struct tag *tag, void *data, struct fsck_options *options)
 {
-	const char *name = fsck_get_object_name(options, &tag->object);
+	const char *name = fsck_get_object_name(options, &tag->object.oid);
 
 	if (parse_tag(tag))
 		return -1;
 	if (name)
-		fsck_put_object_name(options, tag->tagged, "%s", name);
+		fsck_put_object_name(options, &tag->tagged->oid, "%s", name);
 	return options->walk(tag->tagged, OBJ_ANY, data, options);
 }
 
@@ -505,7 +512,7 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options)
 		return fsck_walk_tag((struct tag *)obj, data, options);
 	default:
 		error("Unknown object type for %s",
-		      fsck_describe_object(options, obj));
+		      fsck_describe_object(options, &obj->oid));
 		return -1;
 	}
 }
@@ -979,10 +986,10 @@ int fsck_error_function(struct fsck_options *o,
 	struct object *obj, int msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
-		warning("object %s: %s", fsck_describe_object(o, obj), message);
+		warning("object %s: %s", fsck_describe_object(o, &obj->oid), message);
 		return 0;
 	}
-	error("object %s: %s", fsck_describe_object(o, obj), message);
+	error("object %s: %s", fsck_describe_object(o, &obj->oid), message);
 	return 1;
 }
 
diff --git a/fsck.h b/fsck.h
index f6f0c40060..4397cab20f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -38,7 +38,7 @@ struct fsck_options {
 	unsigned strict:1;
 	int *msg_type;
 	struct oidset skiplist;
-	struct decoration *object_names;
+	kh_oid_map_t *object_names;
 };
 
 #define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
@@ -84,11 +84,12 @@ int fsck_finish(struct fsck_options *options);
  */
 void fsck_enable_object_names(struct fsck_options *options);
 const char *fsck_get_object_name(struct fsck_options *options,
-				 struct object *obj);
+				 const struct object_id *oid);
 __attribute__((format (printf,3,4)))
-void fsck_put_object_name(struct fsck_options *options, struct object *obj,
+void fsck_put_object_name(struct fsck_options *options,
+			  const struct object_id *oid,
 			  const char *fmt, ...);
 const char *fsck_describe_object(struct fsck_options *options,
-				 struct object *obj);
+				 const struct object_id *oid);
 
 #endif
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 12/23] fsck: don't require object structs for display functions
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (10 preceding siblings ...)
  2019-10-18  4:57 ` [PATCH 11/23] fsck: use oids rather than objects for object_name API Jeff King
@ 2019-10-18  4:58 ` Jeff King
  2019-10-18  4:58 ` [PATCH 13/23] fsck: only provide oid/type in fsck_error callback Jeff King
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:58 UTC (permalink / raw)
  To: git

Our printable_type() and describe_object() functions take whole object
structs, but they really only care about the oid and type. Let's take
those individually in order to give our callers more flexibility.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c | 69 +++++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 32 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 66fa727c14..59c77c1baa 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -50,23 +50,20 @@ static int name_objects;
 #define ERROR_REFS 010
 #define ERROR_COMMIT_GRAPH 020
 
-static const char *describe_object(struct object *obj)
+static const char *describe_object(const struct object_id *oid)
 {
-	return fsck_describe_object(&fsck_walk_options, &obj->oid);
+	return fsck_describe_object(&fsck_walk_options, oid);
 }
 
-static const char *printable_type(struct object *obj)
+static const char *printable_type(const struct object_id *oid,
+				  enum object_type type)
 {
 	const char *ret;
 
-	if (obj->type == OBJ_NONE) {
-		enum object_type type = oid_object_info(the_repository,
-							&obj->oid, NULL);
-		if (type > 0)
-			object_as_type(the_repository, obj, type, 0);
-	}
+	if (type == OBJ_NONE)
+		type = oid_object_info(the_repository, oid, NULL);
 
-	ret = type_name(obj->type);
+	ret = type_name(type);
 	if (!ret)
 		ret = _("unknown");
 
@@ -101,7 +98,8 @@ static int objerror(struct object *obj, const char *err)
 	errors_found |= ERROR_OBJECT;
 	/* TRANSLATORS: e.g. error in tree 01bfda: <more explanation> */
 	fprintf_ln(stderr, _("error in %s %s: %s"),
-		   printable_type(obj), describe_object(obj), err);
+		   printable_type(&obj->oid, obj->type),
+		   describe_object(&obj->oid), err);
 	return -1;
 }
 
@@ -112,12 +110,14 @@ static int fsck_error_func(struct fsck_options *o,
 	case FSCK_WARN:
 		/* TRANSLATORS: e.g. warning in tree 01bfda: <more explanation> */
 		fprintf_ln(stderr, _("warning in %s %s: %s"),
-			   printable_type(obj), describe_object(obj), message);
+			   printable_type(&obj->oid, obj->type),
+			   describe_object(&obj->oid), message);
 		return 0;
 	case FSCK_ERROR:
 		/* TRANSLATORS: e.g. error in tree 01bfda: <more explanation> */
 		fprintf_ln(stderr, _("error in %s %s: %s"),
-			   printable_type(obj), describe_object(obj), message);
+			   printable_type(&obj->oid, obj->type),
+			   describe_object(&obj->oid), message);
 		return 1;
 	default:
 		BUG("%d (FSCK_IGNORE?) should never trigger this callback", type);
@@ -138,7 +138,8 @@ static int mark_object(struct object *obj, int type, void *data, struct fsck_opt
 	if (!obj) {
 		/* ... these references to parent->fld are safe here */
 		printf_ln(_("broken link from %7s %s"),
-			  printable_type(parent), describe_object(parent));
+			  printable_type(&parent->oid, parent->type),
+			  describe_object(&parent->oid));
 		printf_ln(_("broken link from %7s %s"),
 			  (type == OBJ_ANY ? _("unknown") : type_name(type)),
 			  _("unknown"));
@@ -166,10 +167,10 @@ static int mark_object(struct object *obj, int type, void *data, struct fsck_opt
 		if (parent && !has_object_file(&obj->oid)) {
 			printf_ln(_("broken link from %7s %s\n"
 				    "              to %7s %s"),
-				  printable_type(parent),
-				  describe_object(parent),
-				  printable_type(obj),
-				  describe_object(obj));
+				  printable_type(&parent->oid, parent->type),
+				  describe_object(&parent->oid),
+				  printable_type(&obj->oid, obj->type),
+				  describe_object(&obj->oid));
 			errors_found |= ERROR_REACHABLE;
 		}
 		return 1;
@@ -275,8 +276,9 @@ static void check_reachable_object(struct object *obj)
 			return;
 		if (has_object_pack(&obj->oid))
 			return; /* it is in pack - forget about it */
-		printf_ln(_("missing %s %s"), printable_type(obj),
-			  describe_object(obj));
+		printf_ln(_("missing %s %s"),
+			  printable_type(&obj->oid, obj->type),
+			  describe_object(&obj->oid));
 		errors_found |= ERROR_REACHABLE;
 		return;
 	}
@@ -301,8 +303,9 @@ static void check_unreachable_object(struct object *obj)
 	 * since this is something that is prunable.
 	 */
 	if (show_unreachable) {
-		printf_ln(_("unreachable %s %s"), printable_type(obj),
-			  describe_object(obj));
+		printf_ln(_("unreachable %s %s"),
+			  printable_type(&obj->oid, obj->type),
+			  describe_object(&obj->oid));
 		return;
 	}
 
@@ -320,12 +323,13 @@ static void check_unreachable_object(struct object *obj)
 	 */
 	if (!(obj->flags & USED)) {
 		if (show_dangling)
-			printf_ln(_("dangling %s %s"), printable_type(obj),
-				  describe_object(obj));
+			printf_ln(_("dangling %s %s"),
+				  printable_type(&obj->oid, obj->type),
+				  describe_object(&obj->oid));
 		if (write_lost_and_found) {
 			char *filename = git_pathdup("lost-found/%s/%s",
 				obj->type == OBJ_COMMIT ? "commit" : "other",
-				describe_object(obj));
+				describe_object(&obj->oid));
 			FILE *f;
 
 			if (safe_create_leading_directories_const(filename)) {
@@ -338,7 +342,7 @@ static void check_unreachable_object(struct object *obj)
 				if (stream_blob_to_fd(fileno(f), &obj->oid, NULL, 1))
 					die_errno(_("could not write '%s'"), filename);
 			} else
-				fprintf(f, "%s\n", describe_object(obj));
+				fprintf(f, "%s\n", describe_object(&obj->oid));
 			if (fclose(f))
 				die_errno(_("could not finish '%s'"),
 					  filename);
@@ -357,7 +361,7 @@ static void check_unreachable_object(struct object *obj)
 static void check_object(struct object *obj)
 {
 	if (verbose)
-		fprintf_ln(stderr, _("Checking %s"), describe_object(obj));
+		fprintf_ln(stderr, _("Checking %s"), describe_object(&obj->oid));
 
 	if (obj->flags & REACHABLE)
 		check_reachable_object(obj);
@@ -415,7 +419,8 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size)
 
 	if (verbose)
 		fprintf_ln(stderr, _("Checking %s %s"),
-			   printable_type(obj), describe_object(obj));
+			   printable_type(&obj->oid, obj->type),
+			   describe_object(&obj->oid));
 
 	if (fsck_walk(obj, NULL, &fsck_obj_options))
 		objerror(obj, _("broken links"));
@@ -428,7 +433,7 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size)
 
 		if (!commit->parents && show_root)
 			printf_ln(_("root %s"),
-				  describe_object(&commit->object));
+				  describe_object(&commit->object.oid));
 	}
 
 	if (obj->type == OBJ_TAG) {
@@ -436,10 +441,10 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size)
 
 		if (show_tags && tag->tagged) {
 			printf_ln(_("tagged %s %s (%s) in %s"),
-				  printable_type(tag->tagged),
-				  describe_object(tag->tagged),
+				  printable_type(&tag->tagged->oid, tag->tagged->type),
+				  describe_object(&tag->tagged->oid),
 				  tag->tag,
-				  describe_object(&tag->object));
+				  describe_object(&tag->object.oid));
 		}
 	}
 
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 13/23] fsck: only provide oid/type in fsck_error callback
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (11 preceding siblings ...)
  2019-10-18  4:58 ` [PATCH 12/23] fsck: don't require object structs for display functions Jeff King
@ 2019-10-18  4:58 ` Jeff King
  2019-10-18  4:58 ` [PATCH 14/23] fsck: only require an oid for skiplist functions Jeff King
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:58 UTC (permalink / raw)
  To: git

None of the callbacks actually care about having a "struct object";
they're happy with just the oid and type information. So let's give
ourselves more flexibility to avoid having a "struct object" by just
passing the broken-down fields.

Note that the callback already takes a "type" field for the fsck message
type. We'll rename that to "msg_type" (and use "object_type" for the
object type) to make the distinction explicit.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c | 17 ++++++++++-------
 fsck.c         | 11 +++++++----
 fsck.h         |  6 ++++--
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 59c77c1baa..8d13794b14 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -104,23 +104,26 @@ static int objerror(struct object *obj, const char *err)
 }
 
 static int fsck_error_func(struct fsck_options *o,
-	struct object *obj, int type, const char *message)
+			   const struct object_id *oid,
+			   enum object_type object_type,
+			   int msg_type, const char *message)
 {
-	switch (type) {
+	switch (msg_type) {
 	case FSCK_WARN:
 		/* TRANSLATORS: e.g. warning in tree 01bfda: <more explanation> */
 		fprintf_ln(stderr, _("warning in %s %s: %s"),
-			   printable_type(&obj->oid, obj->type),
-			   describe_object(&obj->oid), message);
+			   printable_type(oid, object_type),
+			   describe_object(oid), message);
 		return 0;
 	case FSCK_ERROR:
 		/* TRANSLATORS: e.g. error in tree 01bfda: <more explanation> */
 		fprintf_ln(stderr, _("error in %s %s: %s"),
-			   printable_type(&obj->oid, obj->type),
-			   describe_object(&obj->oid), message);
+			   printable_type(oid, object_type),
+			   describe_object(oid), message);
 		return 1;
 	default:
-		BUG("%d (FSCK_IGNORE?) should never trigger this callback", type);
+		BUG("%d (FSCK_IGNORE?) should never trigger this callback",
+		    msg_type);
 	}
 }
 
diff --git a/fsck.c b/fsck.c
index 124c0184d4..c036ba09ab 100644
--- a/fsck.c
+++ b/fsck.c
@@ -305,7 +305,8 @@ static int report(struct fsck_options *options, struct object *object,
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-	result = options->error_func(options, object, msg_type, sb.buf);
+	result = options->error_func(options, &object->oid, object->type,
+				     msg_type, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
 
@@ -983,13 +984,15 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 }
 
 int fsck_error_function(struct fsck_options *o,
-	struct object *obj, int msg_type, const char *message)
+			const struct object_id *oid,
+			enum object_type object_type,
+			int msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
-		warning("object %s: %s", fsck_describe_object(o, &obj->oid), message);
+		warning("object %s: %s", fsck_describe_object(o, oid), message);
 		return 0;
 	}
-	error("object %s: %s", fsck_describe_object(o, &obj->oid), message);
+	error("object %s: %s", fsck_describe_object(o, oid), message);
 	return 1;
 }
 
diff --git a/fsck.h b/fsck.h
index 4397cab20f..f81aedf12f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -27,10 +27,12 @@ typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct f
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-	struct object *obj, int type, const char *message);
+			  const struct object_id *oid, enum object_type object_type,
+			  int msg_type, const char *message);
 
 int fsck_error_function(struct fsck_options *o,
-	struct object *obj, int type, const char *message);
+			const struct object_id *oid, enum object_type object_type,
+			int msg_type, const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 14/23] fsck: only require an oid for skiplist functions
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (12 preceding siblings ...)
  2019-10-18  4:58 ` [PATCH 13/23] fsck: only provide oid/type in fsck_error callback Jeff King
@ 2019-10-18  4:58 ` Jeff King
  2019-10-18  4:59 ` [PATCH 15/23] fsck: don't require an object struct for report() Jeff King
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:58 UTC (permalink / raw)
  To: git

The skiplist is inherently an oidset, so we don't need a full object
struct. Let's take just the oid to give our callers more flexibility.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fsck.c b/fsck.c
index c036ba09ab..2309c40a11 100644
--- a/fsck.c
+++ b/fsck.c
@@ -277,9 +277,10 @@ static void append_msg_id(struct strbuf *sb, const char *msg_id)
 	strbuf_addstr(sb, ": ");
 }
 
-static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
+static int object_on_skiplist(struct fsck_options *opts,
+			      const struct object_id *oid)
 {
-	return opts && obj && oidset_contains(&opts->skiplist, &obj->oid);
+	return opts && oid && oidset_contains(&opts->skiplist, oid);
 }
 
 __attribute__((format (printf, 4, 5)))
@@ -293,7 +294,7 @@ static int report(struct fsck_options *options, struct object *object,
 	if (msg_type == FSCK_IGNORE)
 		return 0;
 
-	if (object_on_skiplist(options, object))
+	if (object_on_skiplist(options, &object->oid))
 		return 0;
 
 	if (msg_type == FSCK_FATAL)
@@ -935,7 +936,7 @@ static int fsck_blob(struct blob *blob, const char *buf,
 		return 0;
 	oidset_insert(&gitmodules_done, &blob->object.oid);
 
-	if (object_on_skiplist(options, &blob->object))
+	if (object_on_skiplist(options, &blob->object.oid))
 		return 0;
 
 	if (!buf) {
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 15/23] fsck: don't require an object struct for report()
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (13 preceding siblings ...)
  2019-10-18  4:58 ` [PATCH 14/23] fsck: only require an oid for skiplist functions Jeff King
@ 2019-10-18  4:59 ` Jeff King
  2019-10-18  4:59 ` [PATCH 16/23] fsck: accept an oid instead of a "struct blob" for fsck_blob() Jeff King
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:59 UTC (permalink / raw)
  To: git

The report() function really only cares about the oid and type of the
object, not the full object struct. Let's convert it to take those two
items separately, which gives our callers more flexibility.

This makes some already-long lines even longer. I've mostly left them,
as our eventual goal is to shrink these down as we continue refactoring
(e.g., "&item->object" becomes "&item->object.oid, item->object.type",
but will eventually shrink down to "oid, type").

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 128 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 69 insertions(+), 59 deletions(-)

diff --git a/fsck.c b/fsck.c
index 2309c40a11..465247be71 100644
--- a/fsck.c
+++ b/fsck.c
@@ -283,9 +283,10 @@ static int object_on_skiplist(struct fsck_options *opts,
 	return opts && oid && oidset_contains(&opts->skiplist, oid);
 }
 
-__attribute__((format (printf, 4, 5)))
-static int report(struct fsck_options *options, struct object *object,
-	enum fsck_msg_id id, const char *fmt, ...)
+__attribute__((format (printf, 5, 6)))
+static int report(struct fsck_options *options,
+		  const struct object_id *oid, enum object_type object_type,
+		  enum fsck_msg_id id, const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
@@ -294,7 +295,7 @@ static int report(struct fsck_options *options, struct object *object,
 	if (msg_type == FSCK_IGNORE)
 		return 0;
 
-	if (object_on_skiplist(options, &object->oid))
+	if (object_on_skiplist(options, oid))
 		return 0;
 
 	if (msg_type == FSCK_FATAL)
@@ -306,7 +307,7 @@ static int report(struct fsck_options *options, struct object *object,
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-	result = options->error_func(options, &object->oid, object->type,
+	result = options->error_func(options, oid, object_type,
 				     msg_type, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
@@ -585,7 +586,7 @@ static int fsck_tree(struct tree *item,
 	const char *o_name;
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -611,13 +612,14 @@ static int fsck_tree(struct tree *item,
 			if (!S_ISLNK(mode))
 				oidset_insert(&gitmodules_found, oid);
 			else
-				retval += report(options, &item->object,
+				retval += report(options,
+						 &item->object.oid, item->object.type,
 						 FSCK_MSG_GITMODULES_SYMLINK,
 						 ".gitmodules is a symbolic link");
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 			break;
 		}
 
@@ -662,25 +664,25 @@ static int fsck_tree(struct tree *item,
 	}
 
 	if (has_null_sha1)
-		retval += report(options, &item->object, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, &item->object, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, &item->object, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
 	if (has_dot)
-		retval += report(options, &item->object, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_HAS_DOT, "contains '.'");
 	if (has_dotdot)
-		retval += report(options, &item->object, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_HAS_DOTDOT, "contains '..'");
 	if (has_dotgit)
-		retval += report(options, &item->object, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, &item->object, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, &item->object, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, &item->object, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, &item->object, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
 	return retval;
 }
 
@@ -693,7 +695,7 @@ static int verify_headers(const void *data, unsigned long size,
 	for (i = 0; i < size; i++) {
 		switch (buffer[i]) {
 		case '\0':
-			return report(options, obj,
+			return report(options, &obj->oid, obj->type,
 				FSCK_MSG_NUL_IN_HEADER,
 				"unterminated header: NUL at offset %ld", i);
 		case '\n':
@@ -711,7 +713,7 @@ static int verify_headers(const void *data, unsigned long size,
 	if (size && buffer[size - 1] == '\n')
 		return 0;
 
-	return report(options, obj,
+	return report(options, &obj->oid, obj->type,
 		FSCK_MSG_UNTERMINATED_HEADER, "unterminated header");
 }
 
@@ -725,28 +727,28 @@ static int fsck_ident(const char **ident, struct object *obj, struct fsck_option
 		(*ident)++;
 
 	if (*p == '<')
-		return report(options, obj, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
+		return report(options, &obj->oid, obj->type, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
 	p += strcspn(p, "<>\n");
 	if (*p == '>')
-		return report(options, obj, FSCK_MSG_BAD_NAME, "invalid author/committer line - bad name");
+		return report(options, &obj->oid, obj->type, FSCK_MSG_BAD_NAME, "invalid author/committer line - bad name");
 	if (*p != '<')
-		return report(options, obj, FSCK_MSG_MISSING_EMAIL, "invalid author/committer line - missing email");
+		return report(options, &obj->oid, obj->type, FSCK_MSG_MISSING_EMAIL, "invalid author/committer line - missing email");
 	if (p[-1] != ' ')
-		return report(options, obj, FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
+		return report(options, &obj->oid, obj->type, FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
 	p++;
 	p += strcspn(p, "<>\n");
 	if (*p != '>')
-		return report(options, obj, FSCK_MSG_BAD_EMAIL, "invalid author/committer line - bad email");
+		return report(options, &obj->oid, obj->type, FSCK_MSG_BAD_EMAIL, "invalid author/committer line - bad email");
 	p++;
 	if (*p != ' ')
-		return report(options, obj, FSCK_MSG_MISSING_SPACE_BEFORE_DATE, "invalid author/committer line - missing space before date");
+		return report(options, &obj->oid, obj->type, FSCK_MSG_MISSING_SPACE_BEFORE_DATE, "invalid author/committer line - missing space before date");
 	p++;
 	if (*p == '0' && p[1] != ' ')
-		return report(options, obj, FSCK_MSG_ZERO_PADDED_DATE, "invalid author/committer line - zero-padded date");
+		return report(options, &obj->oid, obj->type, FSCK_MSG_ZERO_PADDED_DATE, "invalid author/committer line - zero-padded date");
 	if (date_overflows(parse_timestamp(p, &end, 10)))
-		return report(options, obj, FSCK_MSG_BAD_DATE_OVERFLOW, "invalid author/committer line - date causes integer overflow");
+		return report(options, &obj->oid, obj->type, FSCK_MSG_BAD_DATE_OVERFLOW, "invalid author/committer line - date causes integer overflow");
 	if ((end == p || *end != ' '))
-		return report(options, obj, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date");
+		return report(options, &obj->oid, obj->type, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date");
 	p = end + 1;
 	if ((*p != '+' && *p != '-') ||
 	    !isdigit(p[1]) ||
@@ -754,7 +756,7 @@ static int fsck_ident(const char **ident, struct object *obj, struct fsck_option
 	    !isdigit(p[3]) ||
 	    !isdigit(p[4]) ||
 	    (p[5] != '\n'))
-		return report(options, obj, FSCK_MSG_BAD_TIMEZONE, "invalid author/committer line - bad time zone");
+		return report(options, &obj->oid, obj->type, FSCK_MSG_BAD_TIMEZONE, "invalid author/committer line - bad time zone");
 	p += 6;
 	return 0;
 }
@@ -772,16 +774,16 @@ static int fsck_commit(struct commit *commit, const char *buffer,
 		return -1;
 
 	if (!skip_prefix(buffer, "tree ", &buffer))
-		return report(options, &commit->object, FSCK_MSG_MISSING_TREE, "invalid format - expected 'tree' line");
+		return report(options, &commit->object.oid, commit->object.type, FSCK_MSG_MISSING_TREE, "invalid format - expected 'tree' line");
 	if (parse_oid_hex(buffer, &tree_oid, &p) || *p != '\n') {
-		err = report(options, &commit->object, FSCK_MSG_BAD_TREE_SHA1, "invalid 'tree' line format - bad sha1");
+		err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_BAD_TREE_SHA1, "invalid 'tree' line format - bad sha1");
 		if (err)
 			return err;
 	}
 	buffer = p + 1;
 	while (skip_prefix(buffer, "parent ", &buffer)) {
 		if (parse_oid_hex(buffer, &oid, &p) || *p != '\n') {
-			err = report(options, &commit->object, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
+			err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
 			if (err)
 				return err;
 		}
@@ -795,18 +797,18 @@ static int fsck_commit(struct commit *commit, const char *buffer,
 			return err;
 	}
 	if (author_count < 1)
-		err = report(options, &commit->object, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line");
+		err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line");
 	else if (author_count > 1)
-		err = report(options, &commit->object, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
+		err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
 	if (err)
 		return err;
 	if (!skip_prefix(buffer, "committer ", &buffer))
-		return report(options, &commit->object, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line");
+		return report(options, &commit->object.oid, commit->object.type, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line");
 	err = fsck_ident(&buffer, &commit->object, options);
 	if (err)
 		return err;
 	if (memchr(buffer_begin, '\0', size)) {
-		err = report(options, &commit->object, FSCK_MSG_NUL_IN_COMMIT,
+		err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_NUL_IN_COMMIT,
 			     "NUL byte in the commit object body");
 		if (err)
 			return err;
@@ -828,45 +830,46 @@ static int fsck_tag(struct tag *tag, const char *buffer,
 		goto done;
 
 	if (!skip_prefix(buffer, "object ", &buffer)) {
-		ret = report(options, &tag->object, FSCK_MSG_MISSING_OBJECT, "invalid format - expected 'object' line");
+		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_OBJECT, "invalid format - expected 'object' line");
 		goto done;
 	}
 	if (parse_oid_hex(buffer, &oid, &p) || *p != '\n') {
-		ret = report(options, &tag->object, FSCK_MSG_BAD_OBJECT_SHA1, "invalid 'object' line format - bad sha1");
+		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_BAD_OBJECT_SHA1, "invalid 'object' line format - bad sha1");
 		if (ret)
 			goto done;
 	}
 	buffer = p + 1;
 
 	if (!skip_prefix(buffer, "type ", &buffer)) {
-		ret = report(options, &tag->object, FSCK_MSG_MISSING_TYPE_ENTRY, "invalid format - expected 'type' line");
+		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_TYPE_ENTRY, "invalid format - expected 'type' line");
 		goto done;
 	}
 	eol = strchr(buffer, '\n');
 	if (!eol) {
-		ret = report(options, &tag->object, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
+		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
 		goto done;
 	}
 	if (type_from_string_gently(buffer, eol - buffer, 1) < 0)
-		ret = report(options, &tag->object, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
+		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
 	if (ret)
 		goto done;
 	buffer = eol + 1;
 
 	if (!skip_prefix(buffer, "tag ", &buffer)) {
-		ret = report(options, &tag->object, FSCK_MSG_MISSING_TAG_ENTRY, "invalid format - expected 'tag' line");
+		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_TAG_ENTRY, "invalid format - expected 'tag' line");
 		goto done;
 	}
 	eol = strchr(buffer, '\n');
 	if (!eol) {
-		ret = report(options, &tag->object, FSCK_MSG_MISSING_TAG, "invalid format - unexpected end after 'type' line");
+		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_TAG, "invalid format - unexpected end after 'type' line");
 		goto done;
 	}
 	strbuf_addf(&sb, "refs/tags/%.*s", (int)(eol - buffer), buffer);
 	if (check_refname_format(sb.buf, 0)) {
-		ret = report(options, &tag->object, FSCK_MSG_BAD_TAG_NAME,
-			   "invalid 'tag' name: %.*s",
-			   (int)(eol - buffer), buffer);
+		ret = report(options, &tag->object.oid, tag->object.type,
+			     FSCK_MSG_BAD_TAG_NAME,
+			     "invalid 'tag' name: %.*s",
+			     (int)(eol - buffer), buffer);
 		if (ret)
 			goto done;
 	}
@@ -874,7 +877,7 @@ static int fsck_tag(struct tag *tag, const char *buffer,
 
 	if (!skip_prefix(buffer, "tagger ", &buffer)) {
 		/* early tags do not contain 'tagger' lines; warn only */
-		ret = report(options, &tag->object, FSCK_MSG_MISSING_TAGGER_ENTRY, "invalid format - expected 'tagger' line");
+		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_TAGGER_ENTRY, "invalid format - expected 'tagger' line");
 		if (ret)
 			goto done;
 	}
@@ -905,19 +908,22 @@ static int fsck_gitmodules_fn(const char *var, const char *value, void *vdata)
 
 	name = xmemdupz(subsection, subsection_len);
 	if (check_submodule_name(name) < 0)
-		data->ret |= report(data->options, data->obj,
+		data->ret |= report(data->options,
+				    &data->obj->oid, data->obj->type,
 				    FSCK_MSG_GITMODULES_NAME,
 				    "disallowed submodule name: %s",
 				    name);
 	if (!strcmp(key, "url") && value &&
 	    looks_like_command_line_option(value))
-		data->ret |= report(data->options, data->obj,
+		data->ret |= report(data->options,
+				    &data->obj->oid, data->obj->type,
 				    FSCK_MSG_GITMODULES_URL,
 				    "disallowed submodule url: %s",
 				    value);
 	if (!strcmp(key, "path") && value &&
 	    looks_like_command_line_option(value))
-		data->ret |= report(data->options, data->obj,
+		data->ret |= report(data->options,
+				    &data->obj->oid, data->obj->type,
 				    FSCK_MSG_GITMODULES_PATH,
 				    "disallowed submodule path: %s",
 				    value);
@@ -945,7 +951,7 @@ static int fsck_blob(struct blob *blob, const char *buf,
 		 * blob too gigantic to load into memory. Let's just consider
 		 * that an error.
 		 */
-		return report(options, &blob->object,
+		return report(options, &blob->object.oid, blob->object.type,
 			      FSCK_MSG_GITMODULES_LARGE,
 			      ".gitmodules too large to parse");
 	}
@@ -956,7 +962,7 @@ static int fsck_blob(struct blob *blob, const char *buf,
 	config_opts.error_action = CONFIG_ERROR_SILENT;
 	if (git_config_from_mem(fsck_gitmodules_fn, CONFIG_ORIGIN_BLOB,
 				".gitmodules", buf, size, &data, &config_opts))
-		data.ret |= report(options, &blob->object,
+		data.ret |= report(options, &blob->object.oid, blob->object.type,
 				   FSCK_MSG_GITMODULES_PARSE,
 				   "could not parse gitmodules blob");
 
@@ -967,7 +973,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options)
 {
 	if (!obj)
-		return report(options, obj, FSCK_MSG_BAD_OBJECT_SHA1, "no valid object to fsck");
+		return report(options, NULL, OBJ_NONE, FSCK_MSG_BAD_OBJECT_SHA1, "no valid object to fsck");
 
 	if (obj->type == OBJ_BLOB)
 		return fsck_blob((struct blob *)obj, data, size, options);
@@ -980,8 +986,10 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 		return fsck_tag((struct tag *) obj, (const char *) data,
 			size, options);
 
-	return report(options, obj, FSCK_MSG_UNKNOWN_TYPE, "unknown type '%d' (internal fsck error)",
-			  obj->type);
+	return report(options, &obj->oid, obj->type,
+		      FSCK_MSG_UNKNOWN_TYPE,
+		      "unknown type '%d' (internal fsck error)",
+		      obj->type);
 }
 
 int fsck_error_function(struct fsck_options *o,
@@ -1016,7 +1024,7 @@ int fsck_finish(struct fsck_options *options)
 		blob = lookup_blob(the_repository, oid);
 		if (!blob) {
 			struct object *obj = lookup_unknown_object(oid);
-			ret |= report(options, obj,
+			ret |= report(options, &obj->oid, obj->type,
 				      FSCK_MSG_GITMODULES_BLOB,
 				      "non-blob found at .gitmodules");
 			continue;
@@ -1026,7 +1034,8 @@ int fsck_finish(struct fsck_options *options)
 		if (!buf) {
 			if (is_promisor_object(&blob->object.oid))
 				continue;
-			ret |= report(options, &blob->object,
+			ret |= report(options,
+				      &blob->object.oid, blob->object.type,
 				      FSCK_MSG_GITMODULES_MISSING,
 				      "unable to read .gitmodules blob");
 			continue;
@@ -1035,7 +1044,8 @@ int fsck_finish(struct fsck_options *options)
 		if (type == OBJ_BLOB)
 			ret |= fsck_blob(blob, buf, size, options);
 		else
-			ret |= report(options, &blob->object,
+			ret |= report(options,
+				      &blob->object.oid, blob->object.type,
 				      FSCK_MSG_GITMODULES_BLOB,
 				      "non-blob found at .gitmodules");
 		free(buf);
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 16/23] fsck: accept an oid instead of a "struct blob" for fsck_blob()
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (14 preceding siblings ...)
  2019-10-18  4:59 ` [PATCH 15/23] fsck: don't require an object struct for report() Jeff King
@ 2019-10-18  4:59 ` Jeff King
  2019-10-18  4:59 ` [PATCH 17/23] fsck: drop blob struct from fsck_finish() Jeff King
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:59 UTC (permalink / raw)
  To: git

We don't actually need any information from the object struct except its
oid (and the type, of course, but that's implicitly OBJ_BLOB). This
gives our callers more flexibility to drop the object structs, too.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/fsck.c b/fsck.c
index 465247be71..6e9640a1a6 100644
--- a/fsck.c
+++ b/fsck.c
@@ -890,7 +890,7 @@ static int fsck_tag(struct tag *tag, const char *buffer,
 }
 
 struct fsck_gitmodules_data {
-	struct object *obj;
+	const struct object_id *oid;
 	struct fsck_options *options;
 	int ret;
 };
@@ -909,21 +909,21 @@ static int fsck_gitmodules_fn(const char *var, const char *value, void *vdata)
 	name = xmemdupz(subsection, subsection_len);
 	if (check_submodule_name(name) < 0)
 		data->ret |= report(data->options,
-				    &data->obj->oid, data->obj->type,
+				    data->oid, OBJ_BLOB,
 				    FSCK_MSG_GITMODULES_NAME,
 				    "disallowed submodule name: %s",
 				    name);
 	if (!strcmp(key, "url") && value &&
 	    looks_like_command_line_option(value))
 		data->ret |= report(data->options,
-				    &data->obj->oid, data->obj->type,
+				    data->oid, OBJ_BLOB,
 				    FSCK_MSG_GITMODULES_URL,
 				    "disallowed submodule url: %s",
 				    value);
 	if (!strcmp(key, "path") && value &&
 	    looks_like_command_line_option(value))
 		data->ret |= report(data->options,
-				    &data->obj->oid, data->obj->type,
+				    data->oid, OBJ_BLOB,
 				    FSCK_MSG_GITMODULES_PATH,
 				    "disallowed submodule path: %s",
 				    value);
@@ -932,17 +932,17 @@ static int fsck_gitmodules_fn(const char *var, const char *value, void *vdata)
 	return 0;
 }
 
-static int fsck_blob(struct blob *blob, const char *buf,
+static int fsck_blob(const struct object_id *oid, const char *buf,
 		     unsigned long size, struct fsck_options *options)
 {
 	struct fsck_gitmodules_data data;
 	struct config_options config_opts = { 0 };
 
-	if (!oidset_contains(&gitmodules_found, &blob->object.oid))
+	if (!oidset_contains(&gitmodules_found, oid))
 		return 0;
-	oidset_insert(&gitmodules_done, &blob->object.oid);
+	oidset_insert(&gitmodules_done, oid);
 
-	if (object_on_skiplist(options, &blob->object.oid))
+	if (object_on_skiplist(options, oid))
 		return 0;
 
 	if (!buf) {
@@ -951,18 +951,18 @@ static int fsck_blob(struct blob *blob, const char *buf,
 		 * blob too gigantic to load into memory. Let's just consider
 		 * that an error.
 		 */
-		return report(options, &blob->object.oid, blob->object.type,
+		return report(options, oid, OBJ_BLOB,
 			      FSCK_MSG_GITMODULES_LARGE,
 			      ".gitmodules too large to parse");
 	}
 
-	data.obj = &blob->object;
+	data.oid = oid;
 	data.options = options;
 	data.ret = 0;
 	config_opts.error_action = CONFIG_ERROR_SILENT;
 	if (git_config_from_mem(fsck_gitmodules_fn, CONFIG_ORIGIN_BLOB,
 				".gitmodules", buf, size, &data, &config_opts))
-		data.ret |= report(options, &blob->object.oid, blob->object.type,
+		data.ret |= report(options, oid, OBJ_BLOB,
 				   FSCK_MSG_GITMODULES_PARSE,
 				   "could not parse gitmodules blob");
 
@@ -976,7 +976,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 		return report(options, NULL, OBJ_NONE, FSCK_MSG_BAD_OBJECT_SHA1, "no valid object to fsck");
 
 	if (obj->type == OBJ_BLOB)
-		return fsck_blob((struct blob *)obj, data, size, options);
+		return fsck_blob(&obj->oid, data, size, options);
 	if (obj->type == OBJ_TREE)
 		return fsck_tree((struct tree *) obj, data, size, options);
 	if (obj->type == OBJ_COMMIT)
@@ -1042,7 +1042,7 @@ int fsck_finish(struct fsck_options *options)
 		}
 
 		if (type == OBJ_BLOB)
-			ret |= fsck_blob(blob, buf, size, options);
+			ret |= fsck_blob(&blob->object.oid, buf, size, options);
 		else
 			ret |= report(options,
 				      &blob->object.oid, blob->object.type,
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 17/23] fsck: drop blob struct from fsck_finish()
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (15 preceding siblings ...)
  2019-10-18  4:59 ` [PATCH 16/23] fsck: accept an oid instead of a "struct blob" for fsck_blob() Jeff King
@ 2019-10-18  4:59 ` Jeff King
  2019-10-18  5:00 ` [PATCH 18/23] fsck: don't require an object struct for fsck_ident() Jeff King
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  4:59 UTC (permalink / raw)
  To: git

Since fsck_blob() no longer requires us to have a "struct blob", we
don't need to create one. Which also means we don't need to worry about
handling the case that lookup_blob() returns NULL (we'll still catch
wrongly-identified blobs when we read the actual object contents and
type from disk).

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/fsck.c b/fsck.c
index 6e9640a1a6..4ff0ceb4ac 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1013,7 +1013,6 @@ int fsck_finish(struct fsck_options *options)
 
 	oidset_iter_init(&gitmodules_found, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
-		struct blob *blob;
 		enum object_type type;
 		unsigned long size;
 		char *buf;
@@ -1021,31 +1020,22 @@ int fsck_finish(struct fsck_options *options)
 		if (oidset_contains(&gitmodules_done, oid))
 			continue;
 
-		blob = lookup_blob(the_repository, oid);
-		if (!blob) {
-			struct object *obj = lookup_unknown_object(oid);
-			ret |= report(options, &obj->oid, obj->type,
-				      FSCK_MSG_GITMODULES_BLOB,
-				      "non-blob found at .gitmodules");
-			continue;
-		}
-
 		buf = read_object_file(oid, &type, &size);
 		if (!buf) {
-			if (is_promisor_object(&blob->object.oid))
+			if (is_promisor_object(oid))
 				continue;
 			ret |= report(options,
-				      &blob->object.oid, blob->object.type,
+				      oid, OBJ_BLOB,
 				      FSCK_MSG_GITMODULES_MISSING,
 				      "unable to read .gitmodules blob");
 			continue;
 		}
 
 		if (type == OBJ_BLOB)
-			ret |= fsck_blob(&blob->object.oid, buf, size, options);
+			ret |= fsck_blob(oid, buf, size, options);
 		else
 			ret |= report(options,
-				      &blob->object.oid, blob->object.type,
+				      oid, type,
 				      FSCK_MSG_GITMODULES_BLOB,
 				      "non-blob found at .gitmodules");
 		free(buf);
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 18/23] fsck: don't require an object struct for fsck_ident()
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (16 preceding siblings ...)
  2019-10-18  4:59 ` [PATCH 17/23] fsck: drop blob struct from fsck_finish() Jeff King
@ 2019-10-18  5:00 ` Jeff King
  2019-10-18  5:00 ` [PATCH 19/23] fsck: don't require an object struct in verify_headers() Jeff King
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  5:00 UTC (permalink / raw)
  To: git

The only thing we do with the struct is pass its oid and type to
report(). We can just take those explicitly, which gives our callers
more flexibility.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/fsck.c b/fsck.c
index 4ff0ceb4ac..e1d06fb210 100644
--- a/fsck.c
+++ b/fsck.c
@@ -717,7 +717,9 @@ static int verify_headers(const void *data, unsigned long size,
 		FSCK_MSG_UNTERMINATED_HEADER, "unterminated header");
 }
 
-static int fsck_ident(const char **ident, struct object *obj, struct fsck_options *options)
+static int fsck_ident(const char **ident,
+		      const struct object_id *oid, enum object_type type,
+		      struct fsck_options *options)
 {
 	const char *p = *ident;
 	char *end;
@@ -727,28 +729,28 @@ static int fsck_ident(const char **ident, struct object *obj, struct fsck_option
 		(*ident)++;
 
 	if (*p == '<')
-		return report(options, &obj->oid, obj->type, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
+		return report(options, oid, type, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
 	p += strcspn(p, "<>\n");
 	if (*p == '>')
-		return report(options, &obj->oid, obj->type, FSCK_MSG_BAD_NAME, "invalid author/committer line - bad name");
+		return report(options, oid, type, FSCK_MSG_BAD_NAME, "invalid author/committer line - bad name");
 	if (*p != '<')
-		return report(options, &obj->oid, obj->type, FSCK_MSG_MISSING_EMAIL, "invalid author/committer line - missing email");
+		return report(options, oid, type, FSCK_MSG_MISSING_EMAIL, "invalid author/committer line - missing email");
 	if (p[-1] != ' ')
-		return report(options, &obj->oid, obj->type, FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
+		return report(options, oid, type, FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL, "invalid author/committer line - missing space before email");
 	p++;
 	p += strcspn(p, "<>\n");
 	if (*p != '>')
-		return report(options, &obj->oid, obj->type, FSCK_MSG_BAD_EMAIL, "invalid author/committer line - bad email");
+		return report(options, oid, type, FSCK_MSG_BAD_EMAIL, "invalid author/committer line - bad email");
 	p++;
 	if (*p != ' ')
-		return report(options, &obj->oid, obj->type, FSCK_MSG_MISSING_SPACE_BEFORE_DATE, "invalid author/committer line - missing space before date");
+		return report(options, oid, type, FSCK_MSG_MISSING_SPACE_BEFORE_DATE, "invalid author/committer line - missing space before date");
 	p++;
 	if (*p == '0' && p[1] != ' ')
-		return report(options, &obj->oid, obj->type, FSCK_MSG_ZERO_PADDED_DATE, "invalid author/committer line - zero-padded date");
+		return report(options, oid, type, FSCK_MSG_ZERO_PADDED_DATE, "invalid author/committer line - zero-padded date");
 	if (date_overflows(parse_timestamp(p, &end, 10)))
-		return report(options, &obj->oid, obj->type, FSCK_MSG_BAD_DATE_OVERFLOW, "invalid author/committer line - date causes integer overflow");
+		return report(options, oid, type, FSCK_MSG_BAD_DATE_OVERFLOW, "invalid author/committer line - date causes integer overflow");
 	if ((end == p || *end != ' '))
-		return report(options, &obj->oid, obj->type, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date");
+		return report(options, oid, type, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date");
 	p = end + 1;
 	if ((*p != '+' && *p != '-') ||
 	    !isdigit(p[1]) ||
@@ -756,7 +758,7 @@ static int fsck_ident(const char **ident, struct object *obj, struct fsck_option
 	    !isdigit(p[3]) ||
 	    !isdigit(p[4]) ||
 	    (p[5] != '\n'))
-		return report(options, &obj->oid, obj->type, FSCK_MSG_BAD_TIMEZONE, "invalid author/committer line - bad time zone");
+		return report(options, oid, type, FSCK_MSG_BAD_TIMEZONE, "invalid author/committer line - bad time zone");
 	p += 6;
 	return 0;
 }
@@ -792,7 +794,7 @@ static int fsck_commit(struct commit *commit, const char *buffer,
 	author_count = 0;
 	while (skip_prefix(buffer, "author ", &buffer)) {
 		author_count++;
-		err = fsck_ident(&buffer, &commit->object, options);
+		err = fsck_ident(&buffer, &commit->object.oid, commit->object.type, options);
 		if (err)
 			return err;
 	}
@@ -804,7 +806,7 @@ static int fsck_commit(struct commit *commit, const char *buffer,
 		return err;
 	if (!skip_prefix(buffer, "committer ", &buffer))
 		return report(options, &commit->object.oid, commit->object.type, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line");
-	err = fsck_ident(&buffer, &commit->object, options);
+	err = fsck_ident(&buffer, &commit->object.oid, commit->object.type, options);
 	if (err)
 		return err;
 	if (memchr(buffer_begin, '\0', size)) {
@@ -882,7 +884,7 @@ static int fsck_tag(struct tag *tag, const char *buffer,
 			goto done;
 	}
 	else
-		ret = fsck_ident(&buffer, &tag->object, options);
+		ret = fsck_ident(&buffer, &tag->object.oid, tag->object.type, options);
 
 done:
 	strbuf_release(&sb);
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 19/23] fsck: don't require an object struct in verify_headers()
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (17 preceding siblings ...)
  2019-10-18  5:00 ` [PATCH 18/23] fsck: don't require an object struct for fsck_ident() Jeff King
@ 2019-10-18  5:00 ` Jeff King
  2019-10-18  5:00 ` [PATCH 20/23] fsck: rename vague "oid" local variables Jeff King
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  5:00 UTC (permalink / raw)
  To: git

We only need the oid and type to pass on to report(). Let's accept the
broken-out parameters to give our callers more flexibility.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fsck.c b/fsck.c
index e1d06fb210..50c93200ed 100644
--- a/fsck.c
+++ b/fsck.c
@@ -687,7 +687,8 @@ static int fsck_tree(struct tree *item,
 }
 
 static int verify_headers(const void *data, unsigned long size,
-			  struct object *obj, struct fsck_options *options)
+			  const struct object_id *oid, enum object_type type,
+			  struct fsck_options *options)
 {
 	const char *buffer = (const char *)data;
 	unsigned long i;
@@ -695,7 +696,7 @@ static int verify_headers(const void *data, unsigned long size,
 	for (i = 0; i < size; i++) {
 		switch (buffer[i]) {
 		case '\0':
-			return report(options, &obj->oid, obj->type,
+			return report(options, oid, type,
 				FSCK_MSG_NUL_IN_HEADER,
 				"unterminated header: NUL at offset %ld", i);
 		case '\n':
@@ -713,7 +714,7 @@ static int verify_headers(const void *data, unsigned long size,
 	if (size && buffer[size - 1] == '\n')
 		return 0;
 
-	return report(options, &obj->oid, obj->type,
+	return report(options, oid, type,
 		FSCK_MSG_UNTERMINATED_HEADER, "unterminated header");
 }
 
@@ -772,7 +773,8 @@ static int fsck_commit(struct commit *commit, const char *buffer,
 	const char *buffer_begin = buffer;
 	const char *p;
 
-	if (verify_headers(buffer, size, &commit->object, options))
+	if (verify_headers(buffer, size, &commit->object.oid,
+			   commit->object.type, options))
 		return -1;
 
 	if (!skip_prefix(buffer, "tree ", &buffer))
@@ -827,7 +829,7 @@ static int fsck_tag(struct tag *tag, const char *buffer,
 	struct strbuf sb = STRBUF_INIT;
 	const char *p;
 
-	ret = verify_headers(buffer, size, &tag->object, options);
+	ret = verify_headers(buffer, size, &tag->object.oid, tag->object.type, options);
 	if (ret)
 		goto done;
 
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 20/23] fsck: rename vague "oid" local variables
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (18 preceding siblings ...)
  2019-10-18  5:00 ` [PATCH 19/23] fsck: don't require an object struct in verify_headers() Jeff King
@ 2019-10-18  5:00 ` Jeff King
  2019-10-18  5:01 ` [PATCH 21/23] fsck: accept an oid instead of a "struct tag" for fsck_tag() Jeff King
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  5:00 UTC (permalink / raw)
  To: git

In fsck_commit() and fsck_tag(), we have local "oid" variables used for
parsing parent and tagged-object oids. Let's give these more specific
names in preparation for the functions taking an "oid" parameter for the
object we're checking.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fsck.c b/fsck.c
index 50c93200ed..42e7d1f71f 100644
--- a/fsck.c
+++ b/fsck.c
@@ -767,7 +767,7 @@ static int fsck_ident(const char **ident,
 static int fsck_commit(struct commit *commit, const char *buffer,
 		       unsigned long size, struct fsck_options *options)
 {
-	struct object_id tree_oid, oid;
+	struct object_id tree_oid, parent_oid;
 	unsigned author_count;
 	int err;
 	const char *buffer_begin = buffer;
@@ -786,7 +786,7 @@ static int fsck_commit(struct commit *commit, const char *buffer,
 	}
 	buffer = p + 1;
 	while (skip_prefix(buffer, "parent ", &buffer)) {
-		if (parse_oid_hex(buffer, &oid, &p) || *p != '\n') {
+		if (parse_oid_hex(buffer, &parent_oid, &p) || *p != '\n') {
 			err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
 			if (err)
 				return err;
@@ -823,7 +823,7 @@ static int fsck_commit(struct commit *commit, const char *buffer,
 static int fsck_tag(struct tag *tag, const char *buffer,
 		    unsigned long size, struct fsck_options *options)
 {
-	struct object_id oid;
+	struct object_id tagged_oid;
 	int ret = 0;
 	char *eol;
 	struct strbuf sb = STRBUF_INIT;
@@ -837,7 +837,7 @@ static int fsck_tag(struct tag *tag, const char *buffer,
 		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_OBJECT, "invalid format - expected 'object' line");
 		goto done;
 	}
-	if (parse_oid_hex(buffer, &oid, &p) || *p != '\n') {
+	if (parse_oid_hex(buffer, &tagged_oid, &p) || *p != '\n') {
 		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_BAD_OBJECT_SHA1, "invalid 'object' line format - bad sha1");
 		if (ret)
 			goto done;
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 21/23] fsck: accept an oid instead of a "struct tag" for fsck_tag()
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (19 preceding siblings ...)
  2019-10-18  5:00 ` [PATCH 20/23] fsck: rename vague "oid" local variables Jeff King
@ 2019-10-18  5:01 ` Jeff King
  2019-10-18  5:01 ` [PATCH 22/23] fsck: accept an oid instead of a "struct commit" for fsck_commit() Jeff King
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  5:01 UTC (permalink / raw)
  To: git

We don't actually look at the tag struct in fsck_tag() beyond its oid
and type (which is obviously OBJ_TAG). Just taking an oid gives our
callers more flexibility to avoid creating or parsing a struct, and
makes it clear that we are fscking just what is in the buffer, not any
pre-parsed bits from the struct.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/fsck.c b/fsck.c
index 42e7d1f71f..38be501278 100644
--- a/fsck.c
+++ b/fsck.c
@@ -820,7 +820,7 @@ static int fsck_commit(struct commit *commit, const char *buffer,
 	return 0;
 }
 
-static int fsck_tag(struct tag *tag, const char *buffer,
+static int fsck_tag(const struct object_id *oid, const char *buffer,
 		    unsigned long size, struct fsck_options *options)
 {
 	struct object_id tagged_oid;
@@ -829,48 +829,48 @@ static int fsck_tag(struct tag *tag, const char *buffer,
 	struct strbuf sb = STRBUF_INIT;
 	const char *p;
 
-	ret = verify_headers(buffer, size, &tag->object.oid, tag->object.type, options);
+	ret = verify_headers(buffer, size, oid, OBJ_TAG, options);
 	if (ret)
 		goto done;
 
 	if (!skip_prefix(buffer, "object ", &buffer)) {
-		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_OBJECT, "invalid format - expected 'object' line");
+		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_OBJECT, "invalid format - expected 'object' line");
 		goto done;
 	}
 	if (parse_oid_hex(buffer, &tagged_oid, &p) || *p != '\n') {
-		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_BAD_OBJECT_SHA1, "invalid 'object' line format - bad sha1");
+		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_OBJECT_SHA1, "invalid 'object' line format - bad sha1");
 		if (ret)
 			goto done;
 	}
 	buffer = p + 1;
 
 	if (!skip_prefix(buffer, "type ", &buffer)) {
-		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_TYPE_ENTRY, "invalid format - expected 'type' line");
+		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TYPE_ENTRY, "invalid format - expected 'type' line");
 		goto done;
 	}
 	eol = strchr(buffer, '\n');
 	if (!eol) {
-		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
+		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TYPE, "invalid format - unexpected end after 'type' line");
 		goto done;
 	}
 	if (type_from_string_gently(buffer, eol - buffer, 1) < 0)
-		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
+		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
 	if (ret)
 		goto done;
 	buffer = eol + 1;
 
 	if (!skip_prefix(buffer, "tag ", &buffer)) {
-		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_TAG_ENTRY, "invalid format - expected 'tag' line");
+		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TAG_ENTRY, "invalid format - expected 'tag' line");
 		goto done;
 	}
 	eol = strchr(buffer, '\n');
 	if (!eol) {
-		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_TAG, "invalid format - unexpected end after 'type' line");
+		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TAG, "invalid format - unexpected end after 'type' line");
 		goto done;
 	}
 	strbuf_addf(&sb, "refs/tags/%.*s", (int)(eol - buffer), buffer);
 	if (check_refname_format(sb.buf, 0)) {
-		ret = report(options, &tag->object.oid, tag->object.type,
+		ret = report(options, oid, OBJ_TAG,
 			     FSCK_MSG_BAD_TAG_NAME,
 			     "invalid 'tag' name: %.*s",
 			     (int)(eol - buffer), buffer);
@@ -881,12 +881,12 @@ static int fsck_tag(struct tag *tag, const char *buffer,
 
 	if (!skip_prefix(buffer, "tagger ", &buffer)) {
 		/* early tags do not contain 'tagger' lines; warn only */
-		ret = report(options, &tag->object.oid, tag->object.type, FSCK_MSG_MISSING_TAGGER_ENTRY, "invalid format - expected 'tagger' line");
+		ret = report(options, oid, OBJ_TAG, FSCK_MSG_MISSING_TAGGER_ENTRY, "invalid format - expected 'tagger' line");
 		if (ret)
 			goto done;
 	}
 	else
-		ret = fsck_ident(&buffer, &tag->object.oid, tag->object.type, options);
+		ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
 
 done:
 	strbuf_release(&sb);
@@ -987,8 +987,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 		return fsck_commit((struct commit *) obj, (const char *) data,
 			size, options);
 	if (obj->type == OBJ_TAG)
-		return fsck_tag((struct tag *) obj, (const char *) data,
-			size, options);
+		return fsck_tag(&obj->oid, data, size, options);
 
 	return report(options, &obj->oid, obj->type,
 		      FSCK_MSG_UNKNOWN_TYPE,
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 22/23] fsck: accept an oid instead of a "struct commit" for fsck_commit()
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (20 preceding siblings ...)
  2019-10-18  5:01 ` [PATCH 21/23] fsck: accept an oid instead of a "struct tag" for fsck_tag() Jeff King
@ 2019-10-18  5:01 ` Jeff King
  2019-10-18  5:02 ` [PATCH 23/23] fsck: accept an oid instead of a "struct tree" for fsck_tree() Jeff King
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  5:01 UTC (permalink / raw)
  To: git

We don't actually look at the commit struct in fsck_commit() beyond its
oid and type (which is obviously OBJ_COMMIT). Just taking an oid gives
our callers more flexibility to avoid creating or parsing a struct, and
makes it clear that we are fscking just what is in the buffer, not any
pre-parsed bits from the struct.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/fsck.c b/fsck.c
index 38be501278..f8c5bbe891 100644
--- a/fsck.c
+++ b/fsck.c
@@ -764,8 +764,9 @@ static int fsck_ident(const char **ident,
 	return 0;
 }
 
-static int fsck_commit(struct commit *commit, const char *buffer,
-		       unsigned long size, struct fsck_options *options)
+static int fsck_commit(const struct object_id *oid,
+		       const char *buffer, unsigned long size,
+		       struct fsck_options *options)
 {
 	struct object_id tree_oid, parent_oid;
 	unsigned author_count;
@@ -773,21 +774,20 @@ static int fsck_commit(struct commit *commit, const char *buffer,
 	const char *buffer_begin = buffer;
 	const char *p;
 
-	if (verify_headers(buffer, size, &commit->object.oid,
-			   commit->object.type, options))
+	if (verify_headers(buffer, size, oid, OBJ_COMMIT, options))
 		return -1;
 
 	if (!skip_prefix(buffer, "tree ", &buffer))
-		return report(options, &commit->object.oid, commit->object.type, FSCK_MSG_MISSING_TREE, "invalid format - expected 'tree' line");
+		return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_TREE, "invalid format - expected 'tree' line");
 	if (parse_oid_hex(buffer, &tree_oid, &p) || *p != '\n') {
-		err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_BAD_TREE_SHA1, "invalid 'tree' line format - bad sha1");
+		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_TREE_SHA1, "invalid 'tree' line format - bad sha1");
 		if (err)
 			return err;
 	}
 	buffer = p + 1;
 	while (skip_prefix(buffer, "parent ", &buffer)) {
 		if (parse_oid_hex(buffer, &parent_oid, &p) || *p != '\n') {
-			err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
+			err = report(options, oid, OBJ_COMMIT, FSCK_MSG_BAD_PARENT_SHA1, "invalid 'parent' line format - bad sha1");
 			if (err)
 				return err;
 		}
@@ -796,23 +796,23 @@ static int fsck_commit(struct commit *commit, const char *buffer,
 	author_count = 0;
 	while (skip_prefix(buffer, "author ", &buffer)) {
 		author_count++;
-		err = fsck_ident(&buffer, &commit->object.oid, commit->object.type, options);
+		err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
 		if (err)
 			return err;
 	}
 	if (author_count < 1)
-		err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line");
+		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_AUTHOR, "invalid format - expected 'author' line");
 	else if (author_count > 1)
-		err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
+		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_MULTIPLE_AUTHORS, "invalid format - multiple 'author' lines");
 	if (err)
 		return err;
 	if (!skip_prefix(buffer, "committer ", &buffer))
-		return report(options, &commit->object.oid, commit->object.type, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line");
-	err = fsck_ident(&buffer, &commit->object.oid, commit->object.type, options);
+		return report(options, oid, OBJ_COMMIT, FSCK_MSG_MISSING_COMMITTER, "invalid format - expected 'committer' line");
+	err = fsck_ident(&buffer, oid, OBJ_COMMIT, options);
 	if (err)
 		return err;
 	if (memchr(buffer_begin, '\0', size)) {
-		err = report(options, &commit->object.oid, commit->object.type, FSCK_MSG_NUL_IN_COMMIT,
+		err = report(options, oid, OBJ_COMMIT, FSCK_MSG_NUL_IN_COMMIT,
 			     "NUL byte in the commit object body");
 		if (err)
 			return err;
@@ -984,8 +984,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 	if (obj->type == OBJ_TREE)
 		return fsck_tree((struct tree *) obj, data, size, options);
 	if (obj->type == OBJ_COMMIT)
-		return fsck_commit((struct commit *) obj, (const char *) data,
-			size, options);
+		return fsck_commit(&obj->oid, data, size, options);
 	if (obj->type == OBJ_TAG)
 		return fsck_tag(&obj->oid, data, size, options);
 
-- 
2.23.0.1228.gee29b05929


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 23/23] fsck: accept an oid instead of a "struct tree" for fsck_tree()
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (21 preceding siblings ...)
  2019-10-18  5:01 ` [PATCH 22/23] fsck: accept an oid instead of a "struct commit" for fsck_commit() Jeff King
@ 2019-10-18  5:02 ` Jeff King
  2019-10-24 23:49 ` [PATCH 0/23] parsing and fsck cleanups Jonathan Tan
  2019-10-25  3:11 ` Junio C Hamano
  24 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-18  5:02 UTC (permalink / raw)
  To: git

We don't actually look at the tree struct in fsck_tree() beyond its oid
and type (which is obviously OBJ_TREE). Just taking an oid gives our
callers more flexibility to avoid creating a struct, and makes it clear
that we are fscking just what is in the buffer, not any pre-parsed bits
from the struct.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fsck.c b/fsck.c
index f8c5bbe891..ac4ba4c8e8 100644
--- a/fsck.c
+++ b/fsck.c
@@ -566,7 +566,7 @@ static int verify_ordered(unsigned mode1, const char *name1, unsigned mode2, con
 	return c1 < c2 ? 0 : TREE_UNORDERED;
 }
 
-static int fsck_tree(struct tree *item,
+static int fsck_tree(const struct object_id *oid,
 		     const char *buffer, unsigned long size,
 		     struct fsck_options *options)
 {
@@ -586,7 +586,7 @@ static int fsck_tree(struct tree *item,
 	const char *o_name;
 
 	if (init_tree_desc_gently(&desc, buffer, size)) {
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 		return retval;
 	}
 
@@ -613,13 +613,13 @@ static int fsck_tree(struct tree *item,
 				oidset_insert(&gitmodules_found, oid);
 			else
 				retval += report(options,
-						 &item->object.oid, item->object.type,
+						 oid, OBJ_TREE,
 						 FSCK_MSG_GITMODULES_SYMLINK,
 						 ".gitmodules is a symbolic link");
 		}
 
 		if (update_tree_entry_gently(&desc)) {
-			retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
+			retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 			break;
 		}
 
@@ -664,25 +664,25 @@ static int fsck_tree(struct tree *item,
 	}
 
 	if (has_null_sha1)
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
 	if (has_full_path)
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_FULL_PATHNAME, "contains full pathnames");
 	if (has_empty_name)
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_EMPTY_NAME, "contains empty pathname");
 	if (has_dot)
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_HAS_DOT, "contains '.'");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOT, "contains '.'");
 	if (has_dotdot)
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_HAS_DOTDOT, "contains '..'");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTDOT, "contains '..'");
 	if (has_dotgit)
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_HAS_DOTGIT, "contains '.git'");
 	if (has_zero_pad)
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_ZERO_PADDED_FILEMODE, "contains zero-padded file modes");
 	if (has_bad_modes)
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_FILEMODE, "contains bad file modes");
 	if (has_dup_entries)
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_DUPLICATE_ENTRIES, "contains duplicate file entries");
 	if (not_properly_sorted)
-		retval += report(options, &item->object.oid, item->object.type, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
+		retval += report(options, oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted");
 	return retval;
 }
 
@@ -982,7 +982,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 	if (obj->type == OBJ_BLOB)
 		return fsck_blob(&obj->oid, data, size, options);
 	if (obj->type == OBJ_TREE)
-		return fsck_tree((struct tree *) obj, data, size, options);
+		return fsck_tree(&obj->oid, data, size, options);
 	if (obj->type == OBJ_COMMIT)
 		return fsck_commit(&obj->oid, data, size, options);
 	if (obj->type == OBJ_TAG)
-- 
2.23.0.1228.gee29b05929

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/23] parse_commit_buffer(): treat lookup_commit() failure as parse error
  2019-10-18  4:42 ` [PATCH 01/23] parse_commit_buffer(): treat lookup_commit() failure as parse error Jeff King
@ 2019-10-24  3:37   ` Junio C Hamano
  2019-10-24 18:01     ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2019-10-24  3:37 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> ...
> The "seen" variant for this test mistakenly parsed another commit
> instead of the blob, meaning that we were actually just testing the
> "lone" case again. Changing that reveals the breakage (and shows that
> this fixes it).
> ...
> @@ -52,7 +52,7 @@ test_expect_success 'traverse unexpected non-commit parent (lone)' '
>  '
>  
>  test_expect_success 'traverse unexpected non-commit parent (seen)' '
> -	test_must_fail git rev-list --objects $commit $broken_commit \
> +	test_must_fail git rev-list --objects $blob $broken_commit \
>  		>output 2>&1 &&
>  	test_i18ngrep "not a commit" output
>  '

Yikes.  Thanks for spotting.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/23] remember commit/tag parse failures
  2019-10-18  4:47 ` [PATCH 04/23] remember commit/tag parse failures Jeff King
@ 2019-10-24  3:51   ` Junio C Hamano
  2019-10-24 23:25   ` Jonathan Tan
  1 sibling, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2019-10-24  3:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> ...
> a separate check for a NULL tree. In fact, we can now ditch that
> explicit tree check entirely, as we're covered robustly by this change
> (and the previous recent change to treat a NULL tree as a parse error).
> ...
> @@ -855,9 +855,6 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
>  			die(_("unable to parse commit %s"),
>  				oid_to_hex(&(*list)->object.oid));
>  		tree = get_commit_tree_oid(*list);
> -		if (!tree)
> -			die(_("unable to get tree for %s"),
> -				oid_to_hex(&(*list)->object.oid));

The context before this hunk is to die when parse_commit(*list)
fails, and because a successful parse_commit() will always set a
non-NULL tree now, the null-ness check on its tree becomes unneeded,
which makes sense.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 05/23] fsck: stop checking commit->tree value
  2019-10-18  4:48 ` [PATCH 05/23] fsck: stop checking commit->tree value Jeff King
@ 2019-10-24  3:57   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2019-10-24  3:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> We check in fsck_commit_buffer() that commit->tree isn't NULL, which in
> ...
> So this code isn't helping anything. And it makes the fsck code slightly
> more confusing and rigid (e.g., it requires that any commit structs have
> already been parsed). Let's drop it.

Thanks; I recall we discussed this during a review of a topic.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 09/23] fsck: unify object-name code
  2019-10-18  4:56 ` [PATCH 09/23] fsck: unify object-name code Jeff King
@ 2019-10-24  6:05   ` Junio C Hamano
  2019-10-24 18:07     ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2019-10-24  6:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> +/*
> + * Subsystem for storing human-readable names for each object.
> + *
> + * If fsck_enable_object_names() has not been called, all other functions are
> + * noops.

I think it is clear from the context that put_ and get_ we see below
refer to functions whose true names have fsck_ prefix, but the fact
that this one has fsck_ spelled out disrupts that context.

IOW, s/fsck_// for consistency, perhaps.

> + * Use put_object_name() to seed initial names (e.g. from refnames); the fsck
> + * code will extend that while walking trees, etc.
> + *
> + * Use get_object_name() to get a single name (or NULL if none). Or the more
> + * convenient describe_object(), which always produces an output string with
> + * the oid combined with the name (if any). Note that the return value points
> + * to a rotating array of static buffers, and may be invalidated by a
> + * subsequent call.
> + */
> +void fsck_enable_object_names(struct fsck_options *options);
> +const char *fsck_get_object_name(struct fsck_options *options,
> +				 struct object *obj);
> +__attribute__((format (printf,3,4)))
> +void fsck_put_object_name(struct fsck_options *options, struct object *obj,
> +			  const char *fmt, ...);
> +const char *fsck_describe_object(struct fsck_options *options,
> +				 struct object *obj);
> +
>  #endif
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 50d28e6fdb..7c7ff7e961 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -616,7 +616,7 @@ test_expect_success 'fsck --name-objects' '
>  		remove_object $(git rev-parse julius:caesar.t) &&
>  		test_must_fail git fsck --name-objects >out &&
>  		tree=$(git rev-parse --verify julius:) &&
> -		test_i18ngrep -E "$tree \((refs/heads/master|HEAD)@\{[0-9]*\}:" out
> +		test_i18ngrep "$tree (refs/tags/julius:" out
>  	)
>  '

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 10/23] fsck_describe_object(): build on our get_object_name() primitive
  2019-10-18  4:56 ` [PATCH 10/23] fsck_describe_object(): build on our get_object_name() primitive Jeff King
@ 2019-10-24  6:06   ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2019-10-24  6:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> This isolates the implementation detail of using the decoration code to
> our put/get functions.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Arguably this could be squashed into the previous commit. By not doing
> so, it made describe_object() more of a pure code movement.

Indeed it did.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/23] parse_commit_buffer(): treat lookup_commit() failure as parse error
  2019-10-24  3:37   ` Junio C Hamano
@ 2019-10-24 18:01     ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-24 18:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Oct 24, 2019 at 12:37:32PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > ...
> > The "seen" variant for this test mistakenly parsed another commit
> > instead of the blob, meaning that we were actually just testing the
> > "lone" case again. Changing that reveals the breakage (and shows that
> > this fixes it).
> > ...
> > @@ -52,7 +52,7 @@ test_expect_success 'traverse unexpected non-commit parent (lone)' '
> >  '
> >  
> >  test_expect_success 'traverse unexpected non-commit parent (seen)' '
> > -	test_must_fail git rev-list --objects $commit $broken_commit \
> > +	test_must_fail git rev-list --objects $blob $broken_commit \
> >  		>output 2>&1 &&
> >  	test_i18ngrep "not a commit" output
> >  '
> 
> Yikes.  Thanks for spotting.

By the way, I shuffled this one to the front so that it could be taken
separately for "maint" if you want (but it has been this way for at
least a decade, so I don't think it's urgent for v2.24).

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 09/23] fsck: unify object-name code
  2019-10-24  6:05   ` Junio C Hamano
@ 2019-10-24 18:07     ` Jeff King
  2019-10-25  3:23       ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2019-10-24 18:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Oct 24, 2019 at 03:05:58PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > +/*
> > + * Subsystem for storing human-readable names for each object.
> > + *
> > + * If fsck_enable_object_names() has not been called, all other functions are
> > + * noops.
> 
> I think it is clear from the context that put_ and get_ we see below
> refer to functions whose true names have fsck_ prefix, but the fact
> that this one has fsck_ spelled out disrupts that context.
> 
> IOW, s/fsck_// for consistency, perhaps.

Heh. I originally wrote just "put" and "get" for brevity, but then
worried that wasn't clear enough. Let's just use the full names. We can
afford the extra few bytes, and it makes it easier if we later need to
grep for them or whatever. Here's a replacement patch (I won't send the
whole series just yet, as it seems it hasn't gotten a whole lot of
review so far).

-- >8 --
Subject: [PATCH] fsck: unify object-name code

Commit 90cf590f53 (fsck: optionally show more helpful info for broken
links, 2016-07-17) added a system for decorating objects with names. The
code is split across builtin/fsck.c (which gives the initial names) and
fsck.c (which adds to the names as it traverses the object graph). This
leads to some duplication, where both sites have near-identical
describe_object() functions (the difference being that the one in
builtin/fsck.c uses a circular array of buffers to allow multiple calls
in a single printf).

Let's provide a unified object_name API for fsck. That lets us drop the
duplication, as well as making the interface boundaries more clear
(which will let us refactor the implementation more in a future patch).

We'll leave describe_object() in builtin/fsck.c as a thin wrapper around
the new API, as it relies on a static global to make its many callers a
bit shorter.

We'll also convert the bare add_decoration() calls in builtin/fsck.c to
put_object_name(). This fixes two minor bugs:

  1. We leak many small strings. add_decoration() has a last-one-wins
     approach: it updates the decoration to the new string and returns
     the old one. But we ignore the return value, leaking the old
     string. This is quite common to trigger, since we look at reflogs:
     the tip of any ref will be described both by looking at the actual
     ref, as well as the latest reflog entry. So we'd always end up
     leaking one of those strings.

  2. The last-one-wins approach gives us lousy names. For instance, we
     first look at all of the refs, and then all of the reflogs. So
     rather than seeing "refs/heads/master", we're likely to overwrite
     it with "HEAD@{12345678}". We're generally better off using the
     first name we find.

     And indeed, the test in t1450 expects this ugly HEAD@{} name. After
     this patch, we've switched to using fsck_put_object_name()'s
     first-one-wins semantics, and we output the more human-friendly
     "refs/tags/julius" (and the test is updated accordingly).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  | 50 ++++++++----------------------
 fsck.c          | 82 ++++++++++++++++++++++++++++++-------------------
 fsck.h          | 24 +++++++++++++++
 t/t1450-fsck.sh |  2 +-
 4 files changed, 89 insertions(+), 69 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 18403a94fa..237643cc1d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -52,24 +52,7 @@ static int name_objects;
 
 static const char *describe_object(struct object *obj)
 {
-	static struct strbuf bufs[] = {
-		STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT
-	};
-	static int b = 0;
-	struct strbuf *buf;
-	char *name = NULL;
-
-	if (name_objects)
-		name = lookup_decoration(fsck_walk_options.object_names, obj);
-
-	buf = bufs + b;
-	b = (b + 1) % ARRAY_SIZE(bufs);
-	strbuf_reset(buf);
-	strbuf_addstr(buf, oid_to_hex(&obj->oid));
-	if (name)
-		strbuf_addf(buf, " (%s)", name);
-
-	return buf->buf;
+	return fsck_describe_object(&fsck_walk_options, obj);
 }
 
 static const char *printable_type(struct object *obj)
@@ -499,10 +482,10 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
 	if (!is_null_oid(oid)) {
 		obj = lookup_object(the_repository, oid);
 		if (obj && (obj->flags & HAS_OBJ)) {
-			if (timestamp && name_objects)
-				add_decoration(fsck_walk_options.object_names,
-					obj,
-					xstrfmt("%s@{%"PRItime"}", refname, timestamp));
+			if (timestamp)
+				fsck_put_object_name(&fsck_walk_options, obj,
+						     "%s@{%"PRItime"}",
+						     refname, timestamp);
 			obj->flags |= USED;
 			mark_object_reachable(obj);
 		} else if (!is_promisor_object(oid)) {
@@ -566,9 +549,8 @@ static int fsck_handle_ref(const char *refname, const struct object_id *oid,
 	}
 	default_refs++;
 	obj->flags |= USED;
-	if (name_objects)
-		add_decoration(fsck_walk_options.object_names,
-			obj, xstrdup(refname));
+	fsck_put_object_name(&fsck_walk_options,
+			     obj, "%s", refname);
 	mark_object_reachable(obj);
 
 	return 0;
@@ -742,9 +724,7 @@ static int fsck_cache_tree(struct cache_tree *it)
 			return 1;
 		}
 		obj->flags |= USED;
-		if (name_objects)
-			add_decoration(fsck_walk_options.object_names,
-				obj, xstrdup(":"));
+		fsck_put_object_name(&fsck_walk_options, obj, ":");
 		mark_object_reachable(obj);
 		if (obj->type != OBJ_TREE)
 			err |= objerror(obj, _("non-tree in cache-tree"));
@@ -830,8 +810,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	}
 
 	if (name_objects)
-		fsck_walk_options.object_names =
-			xcalloc(1, sizeof(struct decoration));
+		fsck_enable_object_names(&fsck_walk_options);
 
 	git_config(fsck_config, NULL);
 
@@ -890,9 +869,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 			}
 
 			obj->flags |= USED;
-			if (name_objects)
-				add_decoration(fsck_walk_options.object_names,
-					obj, xstrdup(arg));
+			fsck_put_object_name(&fsck_walk_options, obj,
+					     "%s", arg);
 			mark_object_reachable(obj);
 			continue;
 		}
@@ -928,10 +906,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 				continue;
 			obj = &blob->object;
 			obj->flags |= USED;
-			if (name_objects)
-				add_decoration(fsck_walk_options.object_names,
-					obj,
-					xstrfmt(":%s", active_cache[i]->name));
+			fsck_put_object_name(&fsck_walk_options, obj,
+					     ":%s", active_cache[i]->name);
 			mark_object_reachable(obj);
 		}
 		if (active_cache_tree)
diff --git a/fsck.c b/fsck.c
index 347a0ef5c9..ecd5957362 100644
--- a/fsck.c
+++ b/fsck.c
@@ -312,15 +312,21 @@ static int report(struct fsck_options *options, struct object *object,
 	return result;
 }
 
-static char *get_object_name(struct fsck_options *options, struct object *obj)
+void fsck_enable_object_names(struct fsck_options *options)
+{
+	if (!options->object_names)
+		options->object_names = xcalloc(1, sizeof(struct decoration));
+}
+
+const char *fsck_get_object_name(struct fsck_options *options, struct object *obj)
 {
 	if (!options->object_names)
 		return NULL;
 	return lookup_decoration(options->object_names, obj);
 }
 
-static void put_object_name(struct fsck_options *options, struct object *obj,
-	const char *fmt, ...)
+void fsck_put_object_name(struct fsck_options *options, struct object *obj,
+			  const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf buf = STRBUF_INIT;
@@ -337,17 +343,27 @@ static void put_object_name(struct fsck_options *options, struct object *obj,
 	va_end(ap);
 }
 
-static const char *describe_object(struct fsck_options *o, struct object *obj)
+const char *fsck_describe_object(struct fsck_options *options,
+				 struct object *obj)
 {
-	static struct strbuf buf = STRBUF_INIT;
-	char *name;
-
-	strbuf_reset(&buf);
-	strbuf_addstr(&buf, oid_to_hex(&obj->oid));
-	if (o->object_names && (name = lookup_decoration(o->object_names, obj)))
-		strbuf_addf(&buf, " (%s)", name);
+	static struct strbuf bufs[] = {
+		STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT
+	};
+	static int b = 0;
+	struct strbuf *buf;
+	char *name = NULL;
+
+	if (options->object_names)
+		name = lookup_decoration(options->object_names, obj);
+
+	buf = bufs + b;
+	b = (b + 1) % ARRAY_SIZE(bufs);
+	strbuf_reset(buf);
+	strbuf_addstr(buf, oid_to_hex(&obj->oid));
+	if (name)
+		strbuf_addf(buf, " (%s)", name);
 
-	return buf.buf;
+	return buf->buf;
 }
 
 static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *options)
@@ -360,7 +376,7 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 	if (parse_tree(tree))
 		return -1;
 
-	name = get_object_name(options, &tree->object);
+	name = fsck_get_object_name(options, &tree->object);
 	if (init_tree_desc_gently(&desc, tree->buffer, tree->size))
 		return -1;
 	while (tree_entry_gently(&desc, &entry)) {
@@ -373,20 +389,21 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 		if (S_ISDIR(entry.mode)) {
 			obj = (struct object *)lookup_tree(the_repository, &entry.oid);
 			if (name && obj)
-				put_object_name(options, obj, "%s%s/", name,
-					entry.path);
+				fsck_put_object_name(options, obj, "%s%s/",
+						     name, entry.path);
 			result = options->walk(obj, OBJ_TREE, data, options);
 		}
 		else if (S_ISREG(entry.mode) || S_ISLNK(entry.mode)) {
 			obj = (struct object *)lookup_blob(the_repository, &entry.oid);
 			if (name && obj)
-				put_object_name(options, obj, "%s%s", name,
-					entry.path);
+				fsck_put_object_name(options, obj, "%s%s",
+						     name, entry.path);
 			result = options->walk(obj, OBJ_BLOB, data, options);
 		}
 		else {
 			result = error("in tree %s: entry %s has bad mode %.6o",
-					describe_object(options, &tree->object), entry.path, entry.mode);
+				       fsck_describe_object(options, &tree->object),
+				       entry.path, entry.mode);
 		}
 		if (result < 0)
 			return result;
@@ -407,10 +424,10 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
 	if (parse_commit(commit))
 		return -1;
 
-	name = get_object_name(options, &commit->object);
+	name = fsck_get_object_name(options, &commit->object);
 	if (name)
-		put_object_name(options, &get_commit_tree(commit)->object,
-				"%s:", name);
+		fsck_put_object_name(options, &get_commit_tree(commit)->object,
+				     "%s:", name);
 
 	result = options->walk((struct object *)get_commit_tree(commit),
 			       OBJ_TREE, data, options);
@@ -441,13 +458,15 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
 			struct object *obj = &parents->item->object;
 
 			if (counter++)
-				put_object_name(options, obj, "%s^%d",
-					name, counter);
+				fsck_put_object_name(options, obj, "%s^%d",
+						     name, counter);
 			else if (generation > 0)
-				put_object_name(options, obj, "%.*s~%d",
-					name_prefix_len, name, generation + 1);
+				fsck_put_object_name(options, obj, "%.*s~%d",
+						     name_prefix_len, name,
+						     generation + 1);
 			else
-				put_object_name(options, obj, "%s^", name);
+				fsck_put_object_name(options, obj, "%s^",
+						     name);
 		}
 		result = options->walk((struct object *)parents->item, OBJ_COMMIT, data, options);
 		if (result < 0)
@@ -461,12 +480,12 @@ static int fsck_walk_commit(struct commit *commit, void *data, struct fsck_optio
 
 static int fsck_walk_tag(struct tag *tag, void *data, struct fsck_options *options)
 {
-	char *name = get_object_name(options, &tag->object);
+	const char *name = fsck_get_object_name(options, &tag->object);
 
 	if (parse_tag(tag))
 		return -1;
 	if (name)
-		put_object_name(options, tag->tagged, "%s", name);
+		fsck_put_object_name(options, tag->tagged, "%s", name);
 	return options->walk(tag->tagged, OBJ_ANY, data, options);
 }
 
@@ -488,7 +507,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options)
 	case OBJ_TAG:
 		return fsck_walk_tag((struct tag *)obj, data, options);
 	default:
-		error("Unknown object type for %s", describe_object(options, obj));
+		error("Unknown object type for %s",
+		      fsck_describe_object(options, obj));
 		return -1;
 	}
 }
@@ -962,10 +982,10 @@ int fsck_error_function(struct fsck_options *o,
 	struct object *obj, int msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
-		warning("object %s: %s", describe_object(o, obj), message);
+		warning("object %s: %s", fsck_describe_object(o, obj), message);
 		return 0;
 	}
-	error("object %s: %s", describe_object(o, obj), message);
+	error("object %s: %s", fsck_describe_object(o, obj), message);
 	return 1;
 }
 
diff --git a/fsck.h b/fsck.h
index e479461075..f6f0c40060 100644
--- a/fsck.h
+++ b/fsck.h
@@ -67,4 +67,28 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
  */
 int fsck_finish(struct fsck_options *options);
 
+/*
+ * Subsystem for storing human-readable names for each object.
+ *
+ * If fsck_enable_object_names() has not been called, all other functions are
+ * noops.
+ *
+ * Use put_object_name() to seed initial names (e.g. from refnames); the fsck
+ * code will extend that while walking trees, etc.
+ *
+ * Use get_object_name() to get a single name (or NULL if none). Or the more
+ * convenient describe_object(), which always produces an output string with
+ * the oid combined with the name (if any). Note that the return value points
+ * to a rotating array of static buffers, and may be invalidated by a
+ * subsequent call.
+ */
+void fsck_enable_object_names(struct fsck_options *options);
+const char *fsck_get_object_name(struct fsck_options *options,
+				 struct object *obj);
+__attribute__((format (printf,3,4)))
+void fsck_put_object_name(struct fsck_options *options, struct object *obj,
+			  const char *fmt, ...);
+const char *fsck_describe_object(struct fsck_options *options,
+				 struct object *obj);
+
 #endif
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 50d28e6fdb..7c7ff7e961 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -616,7 +616,7 @@ test_expect_success 'fsck --name-objects' '
 		remove_object $(git rev-parse julius:caesar.t) &&
 		test_must_fail git fsck --name-objects >out &&
 		tree=$(git rev-parse --verify julius:) &&
-		test_i18ngrep -E "$tree \((refs/heads/master|HEAD)@\{[0-9]*\}:" out
+		test_i18ngrep "$tree (refs/tags/julius:" out
 	)
 '
 
-- 
2.24.0.rc1.670.g28e7e9fef7


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/23] parse_commit_buffer(): treat lookup_tree() failure as parse error
  2019-10-18  4:43 ` [PATCH 02/23] parse_commit_buffer(): treat lookup_tree() " Jeff King
@ 2019-10-24 23:12   ` Jonathan Tan
  2019-10-24 23:22     ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Jonathan Tan @ 2019-10-24 23:12 UTC (permalink / raw)
  To: peff; +Cc: git, Jonathan Tan

> And
> certainly we _have_ seen real-world cases, such as the one fixed by
> 806278dead (commit-graph.c: handle corrupt/missing trees, 2019-09-05).
> 
> Note that we can't quite drop the check in the caller added by that
> commit yet, as there's some subtlety with repeated parsings (which will
> be addressed in a future commit).

I tried figuring out what the subtlety is by undoing the check you
describe, and did get a segfault in one of the tests, but couldn't
figure out exactly what is going on. Looking at the code, is it because
load_tree_for_commit() in commit-graph.c uses the return value of
lookup_tree() indiscriminately (which is the issue that this patch
fixes)?

This patch itself and patch 1 looks good (with the reasoning in the
commit message), of course.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/23] parse_commit_buffer(): treat lookup_tree() failure as parse error
  2019-10-24 23:12   ` Jonathan Tan
@ 2019-10-24 23:22     ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-24 23:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Thu, Oct 24, 2019 at 04:12:20PM -0700, Jonathan Tan wrote:

> > And
> > certainly we _have_ seen real-world cases, such as the one fixed by
> > 806278dead (commit-graph.c: handle corrupt/missing trees, 2019-09-05).
> > 
> > Note that we can't quite drop the check in the caller added by that
> > commit yet, as there's some subtlety with repeated parsings (which will
> > be addressed in a future commit).
> 
> I tried figuring out what the subtlety is by undoing the check you
> describe, and did get a segfault in one of the tests, but couldn't
> figure out exactly what is going on. Looking at the code, is it because
> load_tree_for_commit() in commit-graph.c uses the return value of
> lookup_tree() indiscriminately (which is the issue that this patch
> fixes)?
> 
> This patch itself and patch 1 looks good (with the reasoning in the
> commit message), of course.

It's the issue addressed in patch 4. The issue is that some _other_ code
already called parse_commit(), got an error, but then for whatever
reason ignored it. Now when we call parse_commit(), the "parsed" flag is
set, so it returns success even though the tree is still NULL.

That commit fixes the problem and does drop the extra tree check.

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/23] remember commit/tag parse failures
  2019-10-18  4:47 ` [PATCH 04/23] remember commit/tag parse failures Jeff King
  2019-10-24  3:51   ` Junio C Hamano
@ 2019-10-24 23:25   ` Jonathan Tan
  2019-10-24 23:41     ` Jeff King
  1 sibling, 1 reply; 39+ messages in thread
From: Jonathan Tan @ 2019-10-24 23:25 UTC (permalink / raw)
  To: peff; +Cc: git, Jonathan Tan

Firstly, this patch is not about remembering, but about not setting
anything, so I think that the title should be something like:

  commit, tag: set parsed only if no parsing error

Incidentally, the check that you mentioned in PATCH 02 is probably no
longer necessary. The tests all pass even with the following diff:

diff --git a/commit.c b/commit.c
index e12e7998ad..086011d944 100644
--- a/commit.c
+++ b/commit.c
@@ -359,7 +359,7 @@ struct tree *repo_get_commit_tree(struct repository *r,
 struct object_id *get_commit_tree_oid(const struct commit *commit)
 {
        struct tree *tree = get_commit_tree(commit);
-       return tree ? &tree->object.oid : NULL;
+       return &tree->object.oid;
 }

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 04/23] remember commit/tag parse failures
  2019-10-24 23:25   ` Jonathan Tan
@ 2019-10-24 23:41     ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-24 23:41 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Thu, Oct 24, 2019 at 04:25:46PM -0700, Jonathan Tan wrote:

> Firstly, this patch is not about remembering, but about not setting
> anything, so I think that the title should be something like:
> 
>   commit, tag: set parsed only if no parsing error

True. I had also played with actually remembering via a bit, which I
think is how I ended up thinking about it that way. You could argue that
it is "not forgetting", which is remembering. :) But I think your
suggested title is better.

> Incidentally, the check that you mentioned in PATCH 02 is probably no
> longer necessary. The tests all pass even with the following diff:
> 
> diff --git a/commit.c b/commit.c
> index e12e7998ad..086011d944 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -359,7 +359,7 @@ struct tree *repo_get_commit_tree(struct repository *r,
>  struct object_id *get_commit_tree_oid(const struct commit *commit)
>  {
>         struct tree *tree = get_commit_tree(commit);
> -       return tree ? &tree->object.oid : NULL;
> +       return &tree->object.oid;
>  }

Ah, I see the confusion you had earlier. The check I meant in patch 2
(and here) was the one in write_graph_chunk_data(), which checks for a
non-NULL tree even after we just saw a successful parse.

I agree that getting rid of the check in get_commit_tree_oid() is
unlikely to cause any bugs, but there are still cases where it could
help. Namely if I choose to ignore the parse failure (because I want to
see the parts of the commit struct that we did manage to get), then I'd
like to be able to ask whether we have a valid tree, like:

  oid = get_commit_tree_oid(commit);
  if (!oid)
	do something...

With the revert you showed above, that's dangerous, because we'd get a
bogus value like "8" (because the oid is offset 8 bytes in the struct
which we've dereferenced as NULL).

You could obviously use "get_commit_tree()" instead, which would let you
compare to NULL. But it seemed simpler to leave the extra safety in
place.

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/23] parsing and fsck cleanups
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (22 preceding siblings ...)
  2019-10-18  5:02 ` [PATCH 23/23] fsck: accept an oid instead of a "struct tree" for fsck_tree() Jeff King
@ 2019-10-24 23:49 ` Jonathan Tan
  2019-10-25  3:11 ` Junio C Hamano
  24 siblings, 0 replies; 39+ messages in thread
From: Jonathan Tan @ 2019-10-24 23:49 UTC (permalink / raw)
  To: peff; +Cc: git, Jonathan Tan

I've looked at the rest of the patch set and I think that this set is
worth taking.

>     This a string of refactors that ends up with all of the
>     type-specific fsck functions not getting an object struct at all.
>     My goal there was two-fold:
> 
>        - it makes it harder to introduce weirdness like we saw in
> 	 patches 5-8.
> 
>        - it _could_ make things less awkward for callers like index-pack
> 	 which don't necessarily have object structs. And at the end, we
> 	 basically have an fsck_object() that doesn't need an object
> 	 struct. But index-pack still calls fsck_walk(), which does (and
> 	 which relies on the parsed values to traverse). It's not
> 	 entirely clear to me whether index-pack needs to be doing
> 	 fsck_walk() in the first place, or if it should be relying on
> 	 the usual connectivity check.
> 
> 	 So I'm undecided whether this is worth taking on its own, or if
> 	 trying to avoid object structs in the fsck code is just a
> 	 fool's errand. I do think the result isn't too bad to look at,
> 	 though and there are some minor improvements along the way
> 	 (e.g., patch 17 is able to drop some awkwardness).

If we can partially avoid object structs in the fsck code, I think
that's an improvement too.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/23] parsing and fsck cleanups
  2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
                   ` (23 preceding siblings ...)
  2019-10-24 23:49 ` [PATCH 0/23] parsing and fsck cleanups Jonathan Tan
@ 2019-10-25  3:11 ` Junio C Hamano
  24 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2019-10-25  3:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

>     This a string of refactors that ends up with all of the
>     type-specific fsck functions not getting an object struct at all.
>     My goal there was two-fold:
>
>        - it makes it harder to introduce weirdness like we saw in
> 	 patches 5-8.

Yup.  I'd see that alone as a reason that makes these worth doing.

And it was a pleasant read.  Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 09/23] fsck: unify object-name code
  2019-10-24 18:07     ` Jeff King
@ 2019-10-25  3:23       ` Junio C Hamano
  2019-10-25 21:20         ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2019-10-25  3:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Heh. I originally wrote just "put" and "get" for brevity, but then
> worried that wasn't clear enough. Let's just use the full names. We can
> afford the extra few bytes, and it makes it easier if we later need to
> grep for them or whatever. Here's a replacement patch (I won't send the
> whole series just yet, as it seems it hasn't gotten a whole lot of
> review so far).

It seems that you sent the original instead X-<.

I'll just replace the original 09/23 with the following squashed in
for now.

Thanks.

 fsck.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fsck.h b/fsck.h
index f6f0c40060..6228f0b2d4 100644
--- a/fsck.h
+++ b/fsck.h
@@ -73,13 +73,13 @@ int fsck_finish(struct fsck_options *options);
  * If fsck_enable_object_names() has not been called, all other functions are
  * noops.
  *
- * Use put_object_name() to seed initial names (e.g. from refnames); the fsck
- * code will extend that while walking trees, etc.
+ * Use fsck_put_object_name() to seed initial names (e.g. from refnames); the
+ * fsck code will extend that while walking trees, etc.
  *
- * Use get_object_name() to get a single name (or NULL if none). Or the more
- * convenient describe_object(), which always produces an output string with
- * the oid combined with the name (if any). Note that the return value points
- * to a rotating array of static buffers, and may be invalidated by a
+ * Use fsck_get_object_name() to get a single name (or NULL if none). Or the
+ * more convenient describe_object(), which always produces an output string
+ * with the oid combined with the name (if any). Note that the return value
+ * points to a rotating array of static buffers, and may be invalidated by a
  * subsequent call.
  */
 void fsck_enable_object_names(struct fsck_options *options);

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 09/23] fsck: unify object-name code
  2019-10-25  3:23       ` Junio C Hamano
@ 2019-10-25 21:20         ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2019-10-25 21:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Oct 25, 2019 at 12:23:39PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Heh. I originally wrote just "put" and "get" for brevity, but then
> > worried that wasn't clear enough. Let's just use the full names. We can
> > afford the extra few bytes, and it makes it easier if we later need to
> > grep for them or whatever. Here's a replacement patch (I won't send the
> > whole series just yet, as it seems it hasn't gotten a whole lot of
> > review so far).
> 
> It seems that you sent the original instead X-<.
> 
> I'll just replace the original 09/23 with the following squashed in
> for now.

Doh. Perhaps you have a brown paper bag I could borrow, as well. :)

What you showed matches what I meant to send. If I can trouble you once
more on this series, can you swap out the subject for patch 4 ("remember
commit/tag parse failures", as suggested by Jonathan Tan? Here's a
replacement patch (really this time!), but the subject is the only
different thing:

-- >8 --
Subject: [PATCH] commit, tag: don't set parsed bit for parse failures

If we can't parse a commit, then parse_commit() will return an error
code. But it _also_ sets the "parsed" flag, which tells us not to bother
trying to re-parse the object. That means that subsequent parses have no
idea that the information in the struct may be bogus.  I.e., doing this:

  parse_commit(commit);
  ...
  if (parse_commit(commit) < 0)
          die("commit is broken");

will never trigger the die(). The second parse_commit() will see the
"parsed" flag and quietly return success.

There are two obvious ways to fix this:

  1. Stop setting "parsed" until we've successfully parsed.

  2. Keep a second "corrupt" flag to indicate that we saw an error (and
     when the parsed flag is set, return 0/-1 depending on the corrupt
     flag).

This patch does option 1. The obvious downside versus option 2 is that
we might continually re-parse a broken object. But in practice,
corruption like this is rare, and we typically die() or return an error
in the caller. So it's OK not to worry about optimizing for corruption.
And it's much simpler: we don't need to use an extra bit in the object
struct, and callers which check the "parsed" flag don't need to learn
about the corrupt bit, too.

There's no new test here, because this case is already covered in t5318.
Note that we do need to update the expected message there, because we
now detect the problem in the return from "parse_commit()", and not with
a separate check for a NULL tree. In fact, we can now ditch that
explicit tree check entirely, as we're covered robustly by this change
(and the previous recent change to treat a NULL tree as a parse error).

We'll also give tags the same treatment. I don't know offhand of any
cases where the problem can be triggered (it implies somebody ignoring a
parse error earlier in the process), but consistently returning an error
should cause the least surprise.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 commit-graph.c          |  3 ---
 commit.c                | 14 +++++++++++++-
 t/t5318-commit-graph.sh |  2 +-
 tag.c                   | 12 +++++++++++-
 4 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index fc4a43b8d6..852b9c39e6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -855,9 +855,6 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
 			die(_("unable to parse commit %s"),
 				oid_to_hex(&(*list)->object.oid));
 		tree = get_commit_tree_oid(*list);
-		if (!tree)
-			die(_("unable to get tree for %s"),
-				oid_to_hex(&(*list)->object.oid));
 		hashwrite(f, tree->hash, hash_len);
 
 		parent = (*list)->parents;
diff --git a/commit.c b/commit.c
index 810419a168..e12e7998ad 100644
--- a/commit.c
+++ b/commit.c
@@ -405,7 +405,18 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
+
+	if (item->parents) {
+		/*
+		 * Presumably this is leftover from an earlier failed parse;
+		 * clear it out in preparation for us re-parsing (we'll hit the
+		 * same error, but that's good, since it lets our caller know
+		 * the result cannot be trusted.
+		 */
+		free_commit_list(item->parents);
+		item->parents = NULL;
+	}
+
 	tail += size;
 	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
 			bufptr[tree_entry_len] != '\n')
@@ -462,6 +473,7 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	if (check_graph)
 		load_commit_graph_info(r, item);
 
+	item->object.parsed = 1;
 	return 0;
 }
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index d42b3efe39..127b404856 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -660,7 +660,7 @@ test_expect_success 'corrupt commit-graph write (missing tree)' '
 		git commit-tree -p "$broken" -m "good" "$tree" >good &&
 		test_must_fail git commit-graph write --stdin-commits \
 			<good 2>test_err &&
-		test_i18ngrep "unable to get tree for" test_err
+		test_i18ngrep "unable to parse commit" test_err
 	)
 '
 
diff --git a/tag.c b/tag.c
index 6a51efda8d..71b544467e 100644
--- a/tag.c
+++ b/tag.c
@@ -141,7 +141,16 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
+
+	if (item->tag) {
+		/*
+		 * Presumably left over from a previous failed parse;
+		 * clear it out in preparation for re-parsing (we'll probably
+		 * hit the same error, which lets us tell our current caller
+		 * about the problem).
+		 */
+		FREE_AND_NULL(item->tag);
+	}
 
 	if (size < the_hash_algo->hexsz + 24)
 		return -1;
@@ -192,6 +201,7 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	else
 		item->date = 0;
 
+	item->object.parsed = 1;
 	return 0;
 }
 
-- 
2.24.0.rc1.670.g28e7e9fef7


^ permalink raw reply related	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2019-10-25 21:20 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
2019-10-18  4:42 ` [PATCH 01/23] parse_commit_buffer(): treat lookup_commit() failure as parse error Jeff King
2019-10-24  3:37   ` Junio C Hamano
2019-10-24 18:01     ` Jeff King
2019-10-18  4:43 ` [PATCH 02/23] parse_commit_buffer(): treat lookup_tree() " Jeff King
2019-10-24 23:12   ` Jonathan Tan
2019-10-24 23:22     ` Jeff King
2019-10-18  4:45 ` [PATCH 03/23] parse_tag_buffer(): treat NULL tag pointer " Jeff King
2019-10-18  4:47 ` [PATCH 04/23] remember commit/tag parse failures Jeff King
2019-10-24  3:51   ` Junio C Hamano
2019-10-24 23:25   ` Jonathan Tan
2019-10-24 23:41     ` Jeff King
2019-10-18  4:48 ` [PATCH 05/23] fsck: stop checking commit->tree value Jeff King
2019-10-24  3:57   ` Junio C Hamano
2019-10-18  4:49 ` [PATCH 06/23] fsck: stop checking commit->parent counts Jeff King
2019-10-18  4:51 ` [PATCH 07/23] fsck: stop checking tag->tagged Jeff King
2019-10-18  4:54 ` [PATCH 08/23] fsck: require an actual buffer for non-blobs Jeff King
2019-10-18  4:56 ` [PATCH 09/23] fsck: unify object-name code Jeff King
2019-10-24  6:05   ` Junio C Hamano
2019-10-24 18:07     ` Jeff King
2019-10-25  3:23       ` Junio C Hamano
2019-10-25 21:20         ` Jeff King
2019-10-18  4:56 ` [PATCH 10/23] fsck_describe_object(): build on our get_object_name() primitive Jeff King
2019-10-24  6:06   ` Junio C Hamano
2019-10-18  4:57 ` [PATCH 11/23] fsck: use oids rather than objects for object_name API Jeff King
2019-10-18  4:58 ` [PATCH 12/23] fsck: don't require object structs for display functions Jeff King
2019-10-18  4:58 ` [PATCH 13/23] fsck: only provide oid/type in fsck_error callback Jeff King
2019-10-18  4:58 ` [PATCH 14/23] fsck: only require an oid for skiplist functions Jeff King
2019-10-18  4:59 ` [PATCH 15/23] fsck: don't require an object struct for report() Jeff King
2019-10-18  4:59 ` [PATCH 16/23] fsck: accept an oid instead of a "struct blob" for fsck_blob() Jeff King
2019-10-18  4:59 ` [PATCH 17/23] fsck: drop blob struct from fsck_finish() Jeff King
2019-10-18  5:00 ` [PATCH 18/23] fsck: don't require an object struct for fsck_ident() Jeff King
2019-10-18  5:00 ` [PATCH 19/23] fsck: don't require an object struct in verify_headers() Jeff King
2019-10-18  5:00 ` [PATCH 20/23] fsck: rename vague "oid" local variables Jeff King
2019-10-18  5:01 ` [PATCH 21/23] fsck: accept an oid instead of a "struct tag" for fsck_tag() Jeff King
2019-10-18  5:01 ` [PATCH 22/23] fsck: accept an oid instead of a "struct commit" for fsck_commit() Jeff King
2019-10-18  5:02 ` [PATCH 23/23] fsck: accept an oid instead of a "struct tree" for fsck_tree() Jeff King
2019-10-24 23:49 ` [PATCH 0/23] parsing and fsck cleanups Jonathan Tan
2019-10-25  3:11 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).