From: "René Scharfe" <l.s.r@web.de>
To: Brandon Williams <bwilliamseng@gmail.com>, git <git@vger.kernel.org>
Cc: Jeff King <peff@peff.net>
Subject: Re: invalid tree and commit object
Date: Sat, 9 May 2020 12:16:08 +0200
Message-ID: <d963242a-72f3-7f42-7c95-ea5148f74804@web.de>
In-Reply-To: <CALN-EhTpiLERuB16-WPZaLub6GdaRHJW8xDeaOEqSFtKe0kCYw@mail.gmail.com>

On 09.05.20 at 08:19, Brandon Williams wrote:
> Here's the setup:
>     tree c63d067eaeed0cbc68b7e4fdf40d267c6b152fe8
>     tree 6241ab2a5314798183b5c4ee8a7b0ccd12c651e6
>     blob 5e1c309dae7f45e0f39b1bf3ac3cd9db12e7d689
>
>     $ git ls-tree c63d067eaeed0cbc68b7e4fdf40d267c6b152fe8
>     100644 blob 5e1c309dae7f45e0f39b1bf3ac3cd9db12e7d689    hello
>     100644 blob 5e1c309dae7f45e0f39b1bf3ac3cd9db12e7d689    hello.c
>     040000 tree 6241ab2a5314798183b5c4ee8a7b0ccd12c651e6    hello

> Am I correct in assuming that this object is indeed invalid and should be
> rejected by fsck?

I'd say yes twice -- what good is a tree that you can't check out because
it contains a d/f conflict?
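
To illustrate why the two "hello" entries in the quoted listing are not neighbours: Git orders tree entries byte-wise by name, and a directory compares as if its name had a trailing "/". Since "." sorts before "/", "hello.c" lands between the file "hello" and the tree "hello". The o_mode/o_name bookkeeping visible in the patch context suggests the existing check only looks at the previous entry, which would explain why this case slips through. The following is a minimal standalone sketch of that ordering rule, not Git's actual code; the function name tree_entry_cmp and its parameters are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: compare two tree entries the way Git orders them.
 * Names are compared byte-wise; a directory is treated as if its name
 * ended in '/'. Returns <0, 0 or >0 like strcmp().
 */
static int tree_entry_cmp(const char *n1, int is_dir1,
			  const char *n2, int is_dir2)
{
	size_t len1, len2, len;
	unsigned char c1, c2;
	int cmp;

	assert(n1 && n2);
	len1 = strlen(n1);
	len2 = strlen(n2);
	len = len1 < len2 ? len1 : len2;

	/* Compare the common prefix first. */
	cmp = memcmp(n1, n2, len);
	if (cmp)
		return cmp;

	/* Next byte of each name, or NUL if that name ends here. */
	c1 = (unsigned char)n1[len];
	c2 = (unsigned char)n2[len];

	/* A directory whose name ends here sorts as if followed by '/'. */
	if (!c1 && is_dir1)
		c1 = '/';
	if (!c2 && is_dir2)
		c2 = '/';

	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
}
```

With this rule, file "hello" sorts before "hello.c", which in turn sorts before directory "hello", matching the ls-tree output quoted above.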

So I got curious if such trees might be in popular repos, wrote the patch
below and checked around a bit, but couldn't find any.

Is there a smarter way to check for duplicates?  One that doesn't need
allocations?  Perhaps by having a version of tree_entry_extract() that
seeks backwards somehow?

---
 fsck.c          | 10 ++++++++++
 t/t1450-fsck.sh | 16 ++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/fsck.c b/fsck.c
index 087a7f1ffc..f47b35fee8 100644
--- a/fsck.c
+++ b/fsck.c
@@ -587,6 +587,8 @@ static int fsck_tree(const struct object_id *oid,
 	struct tree_desc desc;
 	unsigned o_mode;
 	const char *o_name;
+	struct string_list names = STRING_LIST_INIT_NODUP;
+	size_t nr;

 	if (init_tree_desc_gently(&desc, buffer, size)) {
 		retval += report(options, oid, OBJ_TREE, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
@@ -680,8 +682,16 @@ static int fsck_tree(const struct object_id *oid,

 		o_mode = mode;
 		o_name = name;
+		string_list_append(&names, name);
 	}

+	nr = names.nr;
+	string_list_sort(&names);
+	string_list_remove_duplicates(&names, 0);
+	if (names.nr != nr)
+		has_dup_entries = 1;
+	string_list_clear(&names, 0);
+
 	if (has_null_sha1)
 		retval += report(options, oid, OBJ_TREE, FSCK_MSG_NULL_SHA1, "contains entries pointing to null sha1");
 	if (has_full_path)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 449ebc5657..91a6e34f38 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -257,6 +257,22 @@ test_expect_success 'tree object with duplicate entries' '
 	test_i18ngrep "error in tree .*contains duplicate file entries" out
 '

+test_expect_success 'tree object with duplicate names' '
+	test_when_finished "remove_object \$blob" &&
+	test_when_finished "remove_object \$tree" &&
+	test_when_finished "remove_object \$badtree" &&
+	blob=$(echo blob | git hash-object -w --stdin) &&
+	printf "100644 blob %s\t%s\n" $blob x.2 >tree &&
+	tree=$(git mktree <tree) &&
+	printf "100644 blob %s\t%s\n" $blob x.1 >badtree &&
+	printf "100644 blob %s\t%s\n" $blob x >>badtree &&
+	printf "040000 tree %s\t%s\n" $tree x >>badtree &&
+	badtree=$(git mktree <badtree) &&
+	test_must_fail git fsck 2>out &&
+	test_i18ngrep "$badtree" out &&
+	test_i18ngrep "error in tree .*contains duplicate file entries" out
+'
+
 test_expect_success 'unparseable tree object' '
 	test_oid_cache <<-\EOF &&
 	junk sha1:twenty-bytes-of-junk
--
2.26.2
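
For readers without Git's string_list API at hand, the check added to fsck_tree() above boils down to: collect all entry names, sort them, and see whether any two equal names end up next to each other. Here is a rough standalone sketch of that idea using only the C standard library. The names cmp_names and has_duplicate_names are invented for this example, and the sketch still allocates a scratch array, so it does not answer the question about an allocation-free approach.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C string pointers. */
static int cmp_names(const void *a, const void *b)
{
	const char *const *x = a;
	const char *const *y = b;

	return strcmp(*x, *y);
}

/*
 * Hypothetical helper: return 1 if any name occurs more than once,
 * 0 if all names are distinct, and -1 if the scratch array cannot
 * be allocated. The input array itself is left unmodified.
 */
static int has_duplicate_names(const char *const *names, size_t nr)
{
	const char **sorted;
	size_t i;
	int dup = 0;

	if (nr < 2)
		return 0;
	assert(names != NULL);

	/* Sort a copy of the pointers so the caller's order is kept. */
	sorted = malloc(nr * sizeof(*sorted));
	if (!sorted)
		return -1;
	memcpy(sorted, names, nr * sizeof(*sorted));
	qsort(sorted, nr, sizeof(*sorted), cmp_names);

	/* After sorting, equal names are adjacent. */
	for (i = 1; i < nr; i++) {
		if (!strcmp(sorted[i - 1], sorted[i])) {
			dup = 1;
			break;
		}
	}

	free(sorted);
	return dup;
}
```

Called with the three names from the quoted tree ("hello", "hello.c", "hello") it reports a duplicate, regardless of whether an entry is a blob or a tree, because only the names are compared.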
