From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id D0B6620773 for ; Mon, 16 Jan 2017 21:22:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750877AbdAPVWf (ORCPT ); Mon, 16 Jan 2017 16:22:35 -0500 Received: from cloud.peff.net ([104.130.231.41]:39836 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750834AbdAPVWe (ORCPT ); Mon, 16 Jan 2017 16:22:34 -0500 Received: (qmail 20058 invoked by uid 109); 16 Jan 2017 21:22:34 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 21:22:34 +0000 Received: (qmail 12281 invoked by uid 111); 16 Jan 2017 21:23:28 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 16:23:28 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 16 Jan 2017 16:22:31 -0500 Date: Mon, 16 Jan 2017 16:22:31 -0500 From: Jeff King To: git@vger.kernel.org Cc: Michael Haggerty , Johannes Schindelin Subject: [PATCH 0/6] fsck --connectivity-check misses some corruption Message-ID: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org I came across a repository today that was missing an object, and for which "git fsck" reported the error but "git fsck --connectivity-check" did not. It turns out that the shortcut taken by --connectivity-check violates some assumptions made by the rest of fsck (namely, that every object in the repo has a "struct object" loaded). And fsck being a generally neglected tool, I couldn't help but find several more bugs on the way. :) [1/6]: t1450: clean up sub-objects in duplicate-entry test [2/6]: fsck: report trees as dangling [3/6]: fsck: prepare dummy objects for --connectivity-check [4/6]: fsck: tighten error-checks of "git fsck " [5/6]: fsck: do not fallback "git fsck " to "git fsck" [6/6]: fsck: check HAS_OBJ more consistently builtin/fsck.c | 131 ++++++++++++++++++++++++++++++++++++++++++++------------ t/t1450-fsck.sh | 70 ++++++++++++++++++++++++++++-- 2 files changed, 171 insertions(+), 30 deletions(-) -Peff From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 619B820A17 for ; Mon, 16 Jan 2017 21:24:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751229AbdAPVYH (ORCPT ); Mon, 16 Jan 2017 16:24:07 -0500 Received: from cloud.peff.net ([104.130.231.41]:39846 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750968AbdAPVYG (ORCPT ); Mon, 16 Jan 2017 16:24:06 -0500 Received: (qmail 20138 invoked by uid 109); 16 Jan 2017 21:24:06 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 21:24:06 +0000 Received: (qmail 12324 invoked by uid 111); 16 Jan 2017 21:24:59 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 16:24:59 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 16 Jan 2017 16:24:03 -0500 Date: Mon, 16 Jan 2017 16:24:03 -0500 From: Jeff King To: git@vger.kernel.org Cc: Michael Haggerty , Johannes Schindelin Subject: [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test Message-ID: <20170116212403.l7ca7crmt47id3mu@sigill.intra.peff.net> References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org This test creates a multi-level set of trees, but its cleanup routine only removes the top-level tree. After the test finishes, the inner tree and the blob it points to remain, making the inner tree dangling. A later test ("cleaned up") verifies that we've removed any cruft and "git fsck" output is clean. This passes only because of a bug in git-fsck which fails to notice dangling trees. In preparation for fixing the bug, let's teach this earlier test to clean up after itself correctly. We have to remove the inner tree (and therefore the blob, too, which becomes dangling after removing that tree). Since the setup code happens inside a subshell, we can't just set a variable for each object. However, we can stuff all of the sha1s into the $T output variable, which is not used for anything except cleanup. Signed-off-by: Jeff King --- t/t1450-fsck.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index ee7d4736d..6eef8b28e 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -189,14 +189,16 @@ test_expect_success 'commit with NUL in header' ' ' test_expect_success 'tree object with duplicate entries' ' - test_when_finished "remove_object \$T" && + test_when_finished "for i in \$T; do remove_object \$i; done" && T=$( GIT_INDEX_FILE=test-index && export GIT_INDEX_FILE && rm -f test-index && >x && git add x && + git rev-parse :x && T=$(git write-tree) && + echo $T && ( git cat-file tree $T && git cat-file tree $T -- 2.11.0.642.gd6f8cda6c From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id AF34820A17 for ; Mon, 16 Jan 2017 21:25:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750964AbdAPVZi (ORCPT ); Mon, 16 Jan 2017 16:25:38 -0500 Received: from cloud.peff.net ([104.130.231.41]:39850 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750970AbdAPVZi (ORCPT ); Mon, 16 Jan 2017 16:25:38 -0500 Received: (qmail 20260 invoked by uid 109); 16 Jan 2017 21:25:37 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 21:25:37 +0000 Received: (qmail 12344 invoked by uid 111); 16 Jan 2017 21:26:31 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 16:26:31 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 16 Jan 2017 16:25:35 -0500 Date: Mon, 16 Jan 2017 16:25:35 -0500 From: Jeff King To: git@vger.kernel.org Cc: Michael Haggerty , Johannes Schindelin Subject: [PATCH 2/6] fsck: report trees as dangling Message-ID: <20170116212535.cohvikwkju5zehr4@sigill.intra.peff.net> References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org After checking connectivity, fsck looks through the list of any objects we've seen mentioned, and reports unreachable and un-"used" ones as dangling. However, it skips any object which is not marked as "parsed", as that is an object that we _don't_ have (but that somebody mentioned). Since 6e454b9a3 (clear parsed flag when we free tree buffers, 2013-06-05), that flag can't be relied on, and the correct method is to check the HAS_OBJ flag. The cleanup in that commit missed this callsite, though. As a result, we would generally fail to report dangling trees. We never noticed because there were no tests in this area (for trees or otherwise). Let's add some. Signed-off-by: Jeff King --- builtin/fsck.c | 2 +- t/t1450-fsck.sh | 27 +++++++++++++++++++++++++++ 2 files changed, 28 insertions(+), 1 deletion(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index f01b81eeb..3e67203f9 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -225,7 +225,7 @@ static void check_unreachable_object(struct object *obj) * to complain about it being unreachable (since it does * not exist). */ - if (!obj->parsed) + if (!(obj->flags & HAS_OBJ)) return; /* diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 6eef8b28e..e88ec7747 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -559,4 +559,31 @@ test_expect_success 'fsck --name-objects' ' ) ' +# for each of type, we have one version which is referenced by another object +# (and so while unreachable, not dangling), and another variant which really is +# dangling. +test_expect_success 'fsck notices dangling objects' ' + git init dangling && + ( + cd dangling && + blob=$(echo not-dangling | git hash-object -w --stdin) && + dblob=$(echo dangling | git hash-object -w --stdin) && + tree=$(printf "100644 blob %s\t%s\n" $blob one | git mktree) && + dtree=$(printf "100644 blob %s\t%s\n" $blob two | git mktree) && + commit=$(git commit-tree $tree) && + dcommit=$(git commit-tree -p $commit $tree) && + + cat >expect <<-EOF && + dangling blob $dblob + dangling commit $dcommit + dangling tree $dtree + EOF + + git fsck >actual && + # the output order is non-deterministic, as it comes from a hash + sort actual.sorted && + test_cmp expect actual.sorted + ) +' + test_done -- 2.11.0.642.gd6f8cda6c From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 311AE20A17 for ; Mon, 16 Jan 2017 21:34:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751832AbdAPVei (ORCPT ); Mon, 16 Jan 2017 16:34:38 -0500 Received: from cloud.peff.net ([104.130.231.41]:39864 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751680AbdAPVef (ORCPT ); Mon, 16 Jan 2017 16:34:35 -0500 Received: (qmail 20784 invoked by uid 109); 16 Jan 2017 21:34:23 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 21:34:23 +0000 Received: (qmail 12382 invoked by uid 111); 16 Jan 2017 21:35:17 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 16:35:17 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 16 Jan 2017 16:34:21 -0500 Date: Mon, 16 Jan 2017 16:34:21 -0500 From: Jeff King To: git@vger.kernel.org Cc: Michael Haggerty , Johannes Schindelin Subject: [PATCH 5/6] fsck: do not fallback "git fsck " to "git fsck" Message-ID: <20170116213420.d3e6ziu4gt3app4x@sigill.intra.peff.net> References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Since fsck tries to continue as much as it can after seeing an error, we still do the reachability check even if some heads we were given on the command-line are bogus. But if _none_ of the heads is is valid, we fallback to checking all refs and the index, which is not what the user asked for at all. Instead of checking "heads", the number of successful heads we got, check "argc" (which we know only has non-options in it, because parse_options removed the others). Signed-off-by: Jeff King --- builtin/fsck.c | 2 +- t/t1450-fsck.sh | 11 +++++++++++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index c7d0590e5..8ae065b2d 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -778,7 +778,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) * default ones from .git/refs. We also consider the index file * in this case (ie this implies --cache). */ - if (!heads) { + if (!argc) { get_default_heads(); keep_cache_objects = 1; } diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 2f3b05276..96b74dc9a 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -610,4 +610,15 @@ test_expect_success 'fsck $name notices bogus $name' ' test_must_fail git fsck $_z40 ' +test_expect_success 'bogus head does not fallback to all heads' ' + # set up a case that will cause a reachability complaint + echo to-be-deleted >foo && + git add foo && + blob=$(git rev-parse :foo) && + test_when_finished "git rm --cached foo" && + remove_object $blob && + test_must_fail git fsck $_z40 >out 2>&1 && + ! grep $blob out +' + test_done -- 2.11.0.642.gd6f8cda6c From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id CEAE020A17 for ; Mon, 16 Jan 2017 21:35:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751596AbdAPVfG (ORCPT ); Mon, 16 Jan 2017 16:35:06 -0500 Received: from cloud.peff.net ([104.130.231.41]:39870 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750803AbdAPVfE (ORCPT ); Mon, 16 Jan 2017 16:35:04 -0500 Received: (qmail 20797 invoked by uid 109); 16 Jan 2017 21:34:59 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 21:34:59 +0000 Received: (qmail 12401 invoked by uid 111); 16 Jan 2017 21:35:53 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 16:35:53 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 16 Jan 2017 16:34:57 -0500 Date: Mon, 16 Jan 2017 16:34:57 -0500 From: Jeff King To: git@vger.kernel.org Cc: Michael Haggerty , Johannes Schindelin Subject: [PATCH 6/6] fsck: check HAS_OBJ more consistently Message-ID: <20170116213457.n7twsvr2vozdpgcj@sigill.intra.peff.net> References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org There are two spots that call lookup_object() and assume that a non-NULL result means we have the object: 1. When we're checking the objects given to us by the user on the command line. 2. When we're checking if a reflog entry is valid. This generally follows fsck's mental model that we will have looked at and loaded a "struct object" for each object in the repository. But it misses one case: if another object _mentioned_ an object, but we didn't actually parse it or verify that it exists, it will still have a struct. It's not clear if this is a triggerable bug or not. Certainly the later parts of the reachability check need to be careful of this, and do so by checking the HAS_OBJ flag. But both of these steps happen before we start traversing, so probably we won't have followed any links yet. Still, it's easy enough to be defensive here. Signed-off-by: Jeff King --- builtin/fsck.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 8ae065b2d..28d3cbb14 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -398,7 +398,7 @@ static void fsck_handle_reflog_sha1(const char *refname, unsigned char *sha1, if (!is_null_sha1(sha1)) { obj = lookup_object(sha1); - if (obj) { + if (obj && (obj->flags & HAS_OBJ)) { if (timestamp && name_objects) add_decoration(fsck_walk_options.object_names, obj, @@ -755,7 +755,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) if (!get_sha1(arg, sha1)) { struct object *obj = lookup_object(sha1); - if (!obj) { + if (!obj || !(obj->flags & HAS_OBJ)) { error("%s: object missing", sha1_to_hex(sha1)); errors_found |= ERROR_OBJECT; continue; -- 2.11.0.642.gd6f8cda6c From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id CBDA320A17 for ; Mon, 16 Jan 2017 21:38:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751682AbdAPVi5 (ORCPT ); Mon, 16 Jan 2017 16:38:57 -0500 Received: from cloud.peff.net ([104.130.231.41]:39874 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751600AbdAPVis (ORCPT ); Mon, 16 Jan 2017 16:38:48 -0500 Received: (qmail 20605 invoked by uid 109); 16 Jan 2017 21:32:06 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 21:32:06 +0000 Received: (qmail 12370 invoked by uid 111); 16 Jan 2017 21:33:00 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 16:33:00 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 16 Jan 2017 16:32:04 -0500 Date: Mon, 16 Jan 2017 16:32:04 -0500 From: Jeff King To: git@vger.kernel.org Cc: Michael Haggerty , Johannes Schindelin Subject: [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check Message-ID: <20170116213204.e7ykwowqzafkexqd@sigill.intra.peff.net> References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Normally fsck makes a pass over all objects to check their integrity, and then follows up with a reachability check to make sure we have all of the referenced objects (and to know which ones are dangling). The latter checks for the HAS_OBJ flag in obj->flags to see if we found the object in the first pass. Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`, 2015-06-22) taught fsck to skip the initial pass, and to fallback to has_sha1_file() instead of the HAS_OBJ check. However, it converted only one HAS_OBJ check to use has_sha1_file(). But there are many other places in builtin/fsck.c that assume that the flag is set (or that lookup_object() will return an object at all). This leads to several bugs with --connectivity-only: 1. mark_object() will not queue objects for examination, so recursively following links from commits to trees, etc, did nothing. I.e., we were checking the reachability of hardly anything at all. 2. When a set of heads is given on the command-line, we use lookup_object() to see if they exist. But without the initial pass, we assume nothing exists. 3. When loading reflog entries, we do a similar lookup_object() check, and complain that the reflog is broken if the object doesn't exist in our hash. So in short, --connectivity-only is broken pretty badly, and will claim that your repository is fine when it's not. Presumably nobody noticed for a few reasons. One is that the embedded test does not actually test the recursive nature of the reachability check. All of the missing objects are still in the index, and we directly check items from the index. This patch modifies the test to delete the index, which shows off breakage (1). Another is that --connectivity-only just skips the initial pass for loose objects. So on a real repository, the packed objects were still checked correctly. But on the flipside, it means that "git fsck --connectivity-only" still checks the sha1 of all of the packed objects, nullifying its original purpose of being a faster git-fsck. And of course the final problem is that the bug only shows up when there _is_ corruption, which is rare. So anybody running "git fsck --connectivity-only" proactively would assume it was being thorough, when it was not. One possibility for fixing this is to find all of the spots that rely on HAS_OBJ and tweak them for the connectivity-only case. But besides the risk that we might miss a spot (and I found three already, corresponding to the three bugs above), there are other parts of fsck that _can't_ work without a full list of objects. E.g., the list of dangling objects. Instead, let's make the connectivity-only case look more like the normal case. Rather than skip the initial pass completely, we'll do an abbreviated one that sets up the HAS_OBJ flag for each object, without actually loading the object data. That's simple and fast, and we don't have to care about the connectivity_only flag in the rest of the code at all. While we're at it, let's make sure we treat loose and packed objects the same (i.e., setting up dummy objects for both and skipping the actual sha1 check). That makes the connectivity-only check actually fast on a real repo (40 seconds versus 180 seconds on my copy of linux.git). Signed-off-by: Jeff King --- The cmd_fsck() part of the diff is pretty nasty without "-b". I had originally planned to pull the "stop calling verify_pack()" bit out into its own patch. But doing this without treating loose and packed objects the same raises a lot of questions about why one and not the other. It seemed simpler to just explain it all together. builtin/fsck.c | 118 +++++++++++++++++++++++++++++++++++++++++++++----------- t/t1450-fsck.sh | 23 ++++++++++- 2 files changed, 117 insertions(+), 24 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 3e67203f9..f527d8a02 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -205,8 +205,6 @@ static void check_reachable_object(struct object *obj) if (!(obj->flags & HAS_OBJ)) { if (has_sha1_pack(obj->oid.hash)) return; /* it is in pack - forget about it */ - if (connectivity_only && has_object_file(&obj->oid)) - return; printf("missing %s %s\n", typename(obj->type), describe_object(obj)); errors_found |= ERROR_REACHABLE; @@ -584,6 +582,79 @@ static int fsck_cache_tree(struct cache_tree *it) return err; } +static void mark_object_for_connectivity(const unsigned char *sha1) +{ + struct object *obj = lookup_object(sha1); + + /* + * Setting the object type here isn't strictly necessary here for a + * connectivity check. In most cases, our walk will expect a certain + * type (e.g., a tree referencing a blob) and will use lookup_blob() to + * assign the type. But doing it here has two advantages: + * + * 1. When the fsck_walk code looks at objects that _don't_ come from + * links (e.g., the tip of a ref), it may complain about the + * "unknown object type". + * + * 2. This serves as a nice cross-check that the graph links are + * sane. So --connectivity-only does not check that the bits of + * blobs are not corrupted, but it _does_ check that 100644 tree + * entries point to blobs, and so forth. + * + * Unfortunately we can't just use parse_object() here, because the + * whole point of --connectivity-only is to avoid reading the object + * data more than necessary. + */ + if (!obj || obj->type == OBJ_NONE) { + enum object_type type = sha1_object_info(sha1, NULL); + switch (type) { + case OBJ_BAD: + error("%s: unable to read object type", + sha1_to_hex(sha1)); + break; + case OBJ_COMMIT: + obj = (struct object *)lookup_commit(sha1); + break; + case OBJ_TREE: + obj = (struct object *)lookup_tree(sha1); + break; + case OBJ_BLOB: + obj = (struct object *)lookup_blob(sha1); + break; + case OBJ_TAG: + obj = (struct object *)lookup_tag(sha1); + break; + default: + error("%s: unknown object type %d", + sha1_to_hex(sha1), type); + } + + if (!obj || obj->type == OBJ_NONE) { + errors_found |= ERROR_OBJECT; + return; + } + } + + obj->flags |= HAS_OBJ; +} + +static int mark_loose_for_connectivity(const unsigned char *sha1, + const char *path, + void *data) +{ + mark_object_for_connectivity(sha1); + return 0; +} + +static int mark_packed_for_connectivity(const unsigned char *sha1, + struct packed_git *pack, + uint32_t pos, + void *data) +{ + mark_object_for_connectivity(sha1); + return 0; +} + static char const * const fsck_usage[] = { N_("git fsck [] [...]"), NULL @@ -646,32 +717,35 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) prepare_alt_odb(); for (alt = alt_odb_list; alt; alt = alt->next) fsck_object_dir(alt->path); - } - if (check_full) { - struct packed_git *p; - uint32_t total = 0, count = 0; - struct progress *progress = NULL; + if (check_full) { + struct packed_git *p; + uint32_t total = 0, count = 0; + struct progress *progress = NULL; + + prepare_packed_git(); - prepare_packed_git(); + if (show_progress) { + for (p = packed_git; p; p = p->next) { + if (open_pack_index(p)) + continue; + total += p->num_objects; + } - if (show_progress) { + progress = start_progress(_("Checking objects"), total); + } for (p = packed_git; p; p = p->next) { - if (open_pack_index(p)) - continue; - total += p->num_objects; + /* verify gives error messages itself */ + if (verify_pack(p, fsck_obj_buffer, + progress, count)) + errors_found |= ERROR_PACK; + count += p->num_objects; } - - progress = start_progress(_("Checking objects"), total); - } - for (p = packed_git; p; p = p->next) { - /* verify gives error messages itself */ - if (verify_pack(p, fsck_obj_buffer, - progress, count)) - errors_found |= ERROR_PACK; - count += p->num_objects; + stop_progress(&progress); } - stop_progress(&progress); + } else { + for_each_loose_object(mark_loose_for_connectivity, NULL, 0); + for_each_packed_object(mark_packed_for_connectivity, NULL, 0); } heads = 0; diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index e88ec7747..c1b2dda33 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -523,9 +523,21 @@ test_expect_success 'fsck --connectivity-only' ' touch empty && git add empty && test_commit empty && + + # Drop the index now; we want to be sure that we + # recursively notice that we notice the broken objects + # because they are reachable from refs, not because + # they are in the index. + rm -f .git/index && + + # corrupt the blob, but in a way that we can still identify + # its type. That lets us see that --connectivity-only is + # not actually looking at the contents, but leaves it + # free to examine the type if it chooses. empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 && - rm -f $empty && - echo invalid >$empty && + blob=$(echo unrelated | git hash-object -w --stdin) && + mv $(sha1_file $blob) $empty && + test_must_fail git fsck --strict && git fsck --strict --connectivity-only && tree=$(git rev-parse HEAD:) && @@ -537,6 +549,13 @@ test_expect_success 'fsck --connectivity-only' ' ) ' +test_expect_success 'fsck --connectivity-only with explicit head' ' + ( + cd connectivity-only && + test_must_fail git fsck --connectivity-only $tree + ) +' + remove_loose_object () { sha1="$(git rev-parse "$1")" && remainder=${sha1#??} && -- 2.11.0.642.gd6f8cda6c From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id C90CF20A17 for ; Mon, 16 Jan 2017 21:40:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751475AbdAPVkN (ORCPT ); Mon, 16 Jan 2017 16:40:13 -0500 Received: from cloud.peff.net ([104.130.231.41]:39877 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750863AbdAPVkN (ORCPT ); Mon, 16 Jan 2017 16:40:13 -0500 Received: (qmail 20724 invoked by uid 109); 16 Jan 2017 21:33:31 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 21:33:31 +0000 Received: (qmail 12377 invoked by uid 111); 16 Jan 2017 21:34:25 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 16:34:25 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 16 Jan 2017 16:33:29 -0500 Date: Mon, 16 Jan 2017 16:33:29 -0500 From: Jeff King To: git@vger.kernel.org Cc: Michael Haggerty , Johannes Schindelin Subject: [PATCH 4/6] fsck: tighten error-checks of "git fsck " Message-ID: <20170116213329.jk26zvcp7erzfc6l@sigill.intra.peff.net> References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Instead of checking reachability from the refs, you can ask fsck to check from a particular set of heads. However, the error checking here is quite lax. In particular: 1. It claims lookup_object() will report an error, which is not true. It only does a hash lookup, and the user has no clue that their argument was skipped. 2. When either the name or sha1 cannot be resolved, we continue to exit with a successful error code, even though we didn't check what the user asked us to. This patch fixes both of these cases. Signed-off-by: Jeff King --- builtin/fsck.c | 7 +++++-- t/t1450-fsck.sh | 5 +++++ 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index f527d8a02..c7d0590e5 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -755,9 +755,11 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) if (!get_sha1(arg, sha1)) { struct object *obj = lookup_object(sha1); - /* Error is printed by lookup_object(). */ - if (!obj) + if (!obj) { + error("%s: object missing", sha1_to_hex(sha1)); + errors_found |= ERROR_OBJECT; continue; + } obj->used = 1; if (name_objects) @@ -768,6 +770,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) continue; } error("invalid parameter: expected sha1, got '%s'", arg); + errors_found |= ERROR_OBJECT; } /* diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index c1b2dda33..2f3b05276 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -605,4 +605,9 @@ test_expect_success 'fsck notices dangling objects' ' ) ' +test_expect_success 'fsck $name notices bogus $name' ' + test_must_fail git fsck bogus && + test_must_fail git fsck $_z40 +' + test_done -- 2.11.0.642.gd6f8cda6c From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id A896020A17 for ; Mon, 16 Jan 2017 23:33:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751342AbdAPXdN (ORCPT ); Mon, 16 Jan 2017 18:33:13 -0500 Received: from cloud.peff.net ([104.130.231.41]:39938 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751089AbdAPXdM (ORCPT ); Mon, 16 Jan 2017 18:33:12 -0500 Received: (qmail 29543 invoked by uid 109); 16 Jan 2017 23:26:30 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 23:26:30 +0000 Received: (qmail 13249 invoked by uid 111); 16 Jan 2017 23:27:24 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Mon, 16 Jan 2017 18:27:24 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Mon, 16 Jan 2017 18:26:28 -0500 Date: Mon, 16 Jan 2017 18:26:28 -0500 From: Jeff King To: git@vger.kernel.org Cc: Michael Haggerty , Johannes Schindelin Subject: Re: [PATCH 0/6] fsck --connectivity-check misses some corruption Message-ID: <20170116232627.hdzovhho3lnl4kdr@sigill.intra.peff.net> References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Jan 16, 2017 at 04:22:31PM -0500, Jeff King wrote: > I came across a repository today that was missing an object, and for > which "git fsck" reported the error but "git fsck --connectivity-check" > did not. It turns out that the shortcut taken by --connectivity-check > violates some assumptions made by the rest of fsck (namely, that every > object in the repo has a "struct object" loaded). > > And fsck being a generally neglected tool, I couldn't help but find > several more bugs on the way. :) > > [1/6]: t1450: clean up sub-objects in duplicate-entry test > [2/6]: fsck: report trees as dangling > [3/6]: fsck: prepare dummy objects for --connectivity-check > [4/6]: fsck: tighten error-checks of "git fsck " > [5/6]: fsck: do not fallback "git fsck " to "git fsck" > [6/6]: fsck: check HAS_OBJ more consistently > > builtin/fsck.c | 131 ++++++++++++++++++++++++++++++++++++++++++++------------ > t/t1450-fsck.sh | 70 ++++++++++++++++++++++++++++-- > 2 files changed, 171 insertions(+), 30 deletions(-) Oh, one thing I forgot to mention: I tried test-merging this with my other recent topic in the area, jk/loose-object-fsck. There are a few conflicts in the test script, but nothing hard to resolve (just both of them adding tests at the end, but both are careful to clean up after themselves). -Peff From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.9 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id C7C0720756 for ; Tue, 17 Jan 2017 20:57:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751042AbdAQU5S (ORCPT ); Tue, 17 Jan 2017 15:57:18 -0500 Received: from pb-smtp1.pobox.com ([64.147.108.70]:50093 "EHLO sasl.smtp.pobox.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751045AbdAQU5P (ORCPT ); Tue, 17 Jan 2017 15:57:15 -0500 Received: from sasl.smtp.pobox.com (unknown [127.0.0.1]) by pb-smtp1.pobox.com (Postfix) with ESMTP id 2C81F612D7; Tue, 17 Jan 2017 15:57:12 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; s=sasl; bh=6eVgEZb+XmQCZDRzzOnY7EwZL9s=; b=OoP29M Y41JyBM++R43Z2WCSMcQRXPKen3pUXQVzsHRfH6cvMj4Irpx+j0pW9RKq5fJQs3Z Htmz480M5Hk00iMWBsyTP9gbnfOYFeOHKihkcFvcG7GRtN7RywAmnIXOOjQpUy77 N4YxTUtv4XyiE6gasLRj72NX7kMynWrH8g5kE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; q=dns; s=sasl; b=jbfLTes8R61hoF4+W5Bmt+ZVKxuxXDlB FJZYF6CNmroZv3aujHVlFlWqrWUoYSQU1b4eN04f68yhCrkzBPdtx5O68HNKyW5O FJGEdS9ByvGITZswhHRYzBWevWUZcQFF862Drm0H6X/ZMNR6eXGsYfCDC6F2Xqjt d8Y36ksvQDw= Received: from pb-smtp1.nyi.icgroup.com (unknown [127.0.0.1]) by pb-smtp1.pobox.com (Postfix) with ESMTP id 1A698612D6; Tue, 17 Jan 2017 15:57:12 -0500 (EST) Received: from pobox.com (unknown [104.132.0.95]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by pb-smtp1.pobox.com (Postfix) with ESMTPSA id 7B343612D4; Tue, 17 Jan 2017 15:57:11 -0500 (EST) From: Junio C Hamano To: Jeff King Cc: git@vger.kernel.org, Michael Haggerty , Johannes Schindelin Subject: Re: [PATCH 2/6] fsck: report trees as dangling References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> <20170116212535.cohvikwkju5zehr4@sigill.intra.peff.net> Date: Tue, 17 Jan 2017 12:57:10 -0800 In-Reply-To: <20170116212535.cohvikwkju5zehr4@sigill.intra.peff.net> (Jeff King's message of "Mon, 16 Jan 2017 16:25:35 -0500") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.90 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Pobox-Relay-ID: 843FBE20-DCF7-11E6-AB1E-FE3F13518317-77302942!pb-smtp1.pobox.com Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Jeff King writes: > After checking connectivity, fsck looks through the list of > any objects we've seen mentioned, and reports unreachable > and un-"used" ones as dangling. However, it skips any object > which is not marked as "parsed", as that is an object that > we _don't_ have (but that somebody mentioned). > > Since 6e454b9a3 (clear parsed flag when we free tree > buffers, 2013-06-05), that flag can't be relied on, and the > correct method is to check the HAS_OBJ flag. The cleanup in > that commit missed this callsite, though. As a result, we > would generally fail to report dangling trees. > > We never noticed because there were no tests in this area > (for trees or otherwise). Let's add some. > > Signed-off-by: Jeff King > --- Makes sense, and the new test is very easy to read, too. Queued; thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.9 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 27F2D20756 for ; Tue, 17 Jan 2017 20:57:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751030AbdAQU5R (ORCPT ); Tue, 17 Jan 2017 15:57:17 -0500 Received: from pb-smtp2.pobox.com ([64.147.108.71]:54154 "EHLO sasl.smtp.pobox.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751023AbdAQU5P (ORCPT ); Tue, 17 Jan 2017 15:57:15 -0500 Received: from sasl.smtp.pobox.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id 271A061A06; Tue, 17 Jan 2017 15:52:46 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; s=sasl; bh=Iolid1TwW63T+FCrIx1suseVMl4=; b=MLFxlg yZyDURl23wCuXByA7vD6uIL9wb9BQhXsKVb9649heO39hhKr1LQ0Kvh4xJqo2JQ5 r1VcXXT0B0AM2abQPwraovfzNRKJ23E39i0LepXS96ZiYiHOVtlQWKRjDgdC6+eT xh6Nqjuo5MajbABHr8LxPSZ2NbKcUEIHfxSjo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; q=dns; s=sasl; b=KUgklUl+kK1kTOy0QqMVTbeGQiUEbL8n VTKHRC0Nks10UtbWsD6rG7hECIKJetxTbqAR6BxcSblhwPWAhje0uQJkpIZgll4S xxWGygiKszsgk9JqP+YOoCdD0oDcb87fSdoeLwiWgcLbEmyPcS3WL+v7E5d4Xctj JRzidqVbeeU= Received: from pb-smtp2.nyi.icgroup.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id 1F86A61A05; Tue, 17 Jan 2017 15:52:46 -0500 (EST) Received: from pobox.com (unknown [104.132.0.95]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by pb-smtp2.pobox.com (Postfix) with ESMTPSA id 7849B61A03; Tue, 17 Jan 2017 15:52:45 -0500 (EST) From: Junio C Hamano To: Jeff King Cc: git@vger.kernel.org, Michael Haggerty , Johannes Schindelin Subject: Re: [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> <20170116212403.l7ca7crmt47id3mu@sigill.intra.peff.net> Date: Tue, 17 Jan 2017 12:52:43 -0800 In-Reply-To: <20170116212403.l7ca7crmt47id3mu@sigill.intra.peff.net> (Jeff King's message of "Mon, 16 Jan 2017 16:24:03 -0500") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.90 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Pobox-Relay-ID: E5B23648-DCF6-11E6-9121-A7617B1B28F4-77302942!pb-smtp2.pobox.com Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Jeff King writes: > This test creates a multi-level set of trees, but its > cleanup routine only removes the top-level tree. After the > test finishes, the inner tree and the blob it points to > remain, making the inner tree dangling. > > A later test ("cleaned up") verifies that we've removed any > cruft and "git fsck" output is clean. This passes only > because of a bug in git-fsck which fails to notice dangling > trees. > > In preparation for fixing the bug, let's teach this earlier > test to clean up after itself correctly. We have to remove > the inner tree (and therefore the blob, too, which becomes > dangling after removing that tree). > > Since the setup code happens inside a subshell, we can't > just set a variable for each object. However, we can stuff > all of the sha1s into the $T output variable, which is not > used for anything except cleanup. > > Signed-off-by: Jeff King > --- > t/t1450-fsck.sh | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) Thanks. It is tempting to move this loop to remove_object, but that is not necessary while the user is only this one. > > diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh > index ee7d4736d..6eef8b28e 100755 > --- a/t/t1450-fsck.sh > +++ b/t/t1450-fsck.sh > @@ -189,14 +189,16 @@ test_expect_success 'commit with NUL in header' ' > ' > > test_expect_success 'tree object with duplicate entries' ' > - test_when_finished "remove_object \$T" && > + test_when_finished "for i in \$T; do remove_object \$i; done" && > T=$( > GIT_INDEX_FILE=test-index && > export GIT_INDEX_FILE && > rm -f test-index && > >x && > git add x && > + git rev-parse :x && > T=$(git write-tree) && > + echo $T && > ( > git cat-file tree $T && > git cat-file tree $T From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 1253C20756 for ; Tue, 17 Jan 2017 21:01:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751164AbdAQVBN (ORCPT ); Tue, 17 Jan 2017 16:01:13 -0500 Received: from cloud.peff.net ([104.130.231.41]:40477 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750888AbdAQVBM (ORCPT ); Tue, 17 Jan 2017 16:01:12 -0500 Received: (qmail 25697 invoked by uid 109); 17 Jan 2017 20:53:57 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Tue, 17 Jan 2017 20:53:57 +0000 Received: (qmail 21347 invoked by uid 111); 17 Jan 2017 20:54:51 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Tue, 17 Jan 2017 15:54:51 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 17 Jan 2017 15:53:54 -0500 Date: Tue, 17 Jan 2017 15:53:54 -0500 From: Jeff King To: Junio C Hamano Cc: git@vger.kernel.org, Michael Haggerty , Johannes Schindelin Subject: Re: [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test Message-ID: <20170117205354.wiczgyplbgsurn6e@sigill.intra.peff.net> References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> <20170116212403.l7ca7crmt47id3mu@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Tue, Jan 17, 2017 at 12:52:43PM -0800, Junio C Hamano wrote: > > Since the setup code happens inside a subshell, we can't > > just set a variable for each object. However, we can stuff > > all of the sha1s into the $T output variable, which is not > > used for anything except cleanup. > > > > Signed-off-by: Jeff King > > --- > > t/t1450-fsck.sh | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > Thanks. > > It is tempting to move this loop to remove_object, but that is not > necessary while the user is only this one. I agree it would be less gross. I avoided it because I knew that I hacked up remove_object() in the other topic. -Peff From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.9 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id C587B20756 for ; Tue, 17 Jan 2017 21:17:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750922AbdAQVRd (ORCPT ); Tue, 17 Jan 2017 16:17:33 -0500 Received: from pb-smtp1.pobox.com ([64.147.108.70]:53706 "EHLO sasl.smtp.pobox.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750952AbdAQVRc (ORCPT ); Tue, 17 Jan 2017 16:17:32 -0500 Received: from sasl.smtp.pobox.com (unknown [127.0.0.1]) by pb-smtp1.pobox.com (Postfix) with ESMTP id 060556179D; Tue, 17 Jan 2017 16:16:41 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; s=sasl; bh=Z4SjpBlPQsARJKVJWMUsq8lf4hU=; b=euA5pV o5kPP9RAxlyu6OBPmyMCLxVVdPol3gbHeu//j5CIq4soTNUCuNPevArB5Df0ZfWh 7wDd5/2OCXhEJ7N6BOeL/RHNbnuygwqKizzFYbnVE7wiLpOgWzecZx1m0XPdzJP0 lxSfo5ZyJ5cXjK5tWM95BNce50lKQatLP3DIY= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; q=dns; s=sasl; b=AQXVhJJn0LxH0F+WqRS1c0UifkahPeeJ tUMDe+7VaVFLbAua3kX59ST7t5Sshm1DGWuO0Ty0LDWrxZVPM7dBnGpXP9iMjPwl 0BZKl9kt12XN7ZRMh96dtyN8TwT15LTSfX1QlhrqUZt39OVnfVS51HoKXWiJsWMh uS87JcKkzo8= Received: from pb-smtp1.nyi.icgroup.com (unknown [127.0.0.1]) by pb-smtp1.pobox.com (Postfix) with ESMTP id F23416179C; Tue, 17 Jan 2017 16:16:40 -0500 (EST) Received: from pobox.com (unknown [104.132.0.95]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by pb-smtp1.pobox.com (Postfix) with ESMTPSA id 3DBF86179B; Tue, 17 Jan 2017 16:16:40 -0500 (EST) From: Junio C Hamano To: Jeff King Cc: git@vger.kernel.org, Michael Haggerty , Johannes Schindelin Subject: Re: [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> <20170116213204.e7ykwowqzafkexqd@sigill.intra.peff.net> Date: Tue, 17 Jan 2017 13:16:39 -0800 In-Reply-To: <20170116213204.e7ykwowqzafkexqd@sigill.intra.peff.net> (Jeff King's message of "Mon, 16 Jan 2017 16:32:04 -0500") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.90 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Pobox-Relay-ID: 3CE12444-DCFA-11E6-918E-FE3F13518317-77302942!pb-smtp1.pobox.com Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Jeff King writes: > + # Drop the index now; we want to be sure that we > + # recursively notice that we notice the broken objects > + # because they are reachable from refs, not because > + # they are in the index. Rephrase to reduce redundunt "notice"s? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.9 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 6264E20756 for ; Tue, 17 Jan 2017 21:17:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751033AbdAQVRn (ORCPT ); Tue, 17 Jan 2017 16:17:43 -0500 Received: from pb-smtp2.pobox.com ([64.147.108.71]:55468 "EHLO sasl.smtp.pobox.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751030AbdAQVRm (ORCPT ); Tue, 17 Jan 2017 16:17:42 -0500 Received: from sasl.smtp.pobox.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id 2213061F7B; Tue, 17 Jan 2017 16:17:42 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; s=sasl; bh=XZiq3/636722/obGOpGPrL7EOBU=; b=uJu+KY APy8iTwjTSP7vl5dHX2PXJy7d4h/yyWAY7BY0XtivrunVPGxjj0TakYAOmJJXg39 mVnWfQ80iIvbKl+H7Hvv4O4bFjEyQk/pI64K50QbWGI/taNz/PTR7OzrG4cDDMej TRa5o6BXiRrNhFwLzl1PVjkCbvszZQhthoUBo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; q=dns; s=sasl; b=K2k2jlSvhH9nCRVB4/5aRxS/lzBZif0d hLRftw5y/ks7jr1Osrnz7dDFj3WOLtU0uLkT6wlJ77TiwdhrbCiKOIZFGOZz2+ic 0CVz9pTPETo3j0mJulWXXD/fjK5jbXiTuR6h2TSxVDeyOltn1ekmbAgSn2miLFE7 Izyq4Eq1zjg= Received: from pb-smtp2.nyi.icgroup.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id 1A0AA61F7A; Tue, 17 Jan 2017 16:17:42 -0500 (EST) Received: from pobox.com (unknown [104.132.0.95]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by pb-smtp2.pobox.com (Postfix) with ESMTPSA id 5A80161F78; Tue, 17 Jan 2017 16:17:41 -0500 (EST) From: Junio C Hamano To: Jeff King Cc: git@vger.kernel.org, Michael Haggerty , Johannes Schindelin Subject: Re: [PATCH 4/6] fsck: tighten error-checks of "git fsck " References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> <20170116213329.jk26zvcp7erzfc6l@sigill.intra.peff.net> Date: Tue, 17 Jan 2017 13:17:39 -0800 In-Reply-To: <20170116213329.jk26zvcp7erzfc6l@sigill.intra.peff.net> (Jeff King's message of "Mon, 16 Jan 2017 16:33:29 -0500") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.90 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Pobox-Relay-ID: 614FB4F8-DCFA-11E6-9F4D-A7617B1B28F4-77302942!pb-smtp2.pobox.com Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Jeff King writes: > Instead of checking reachability from the refs, you can ask > fsck to check from a particular set of heads. However, the > error checking here is quite lax. In particular: > > 1. It claims lookup_object() will report an error, which > is not true. It only does a hash lookup, and the user > has no clue that their argument was skipped. > > 2. When either the name or sha1 cannot be resolved, we > continue to exit with a successful error code, even > though we didn't check what the user asked us to. > > This patch fixes both of these cases. > > Signed-off-by: Jeff King > --- Makes sense, too. Thanks. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.9 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id D4CB020756 for ; Tue, 17 Jan 2017 21:25:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751169AbdAQVWr (ORCPT ); Tue, 17 Jan 2017 16:22:47 -0500 Received: from pb-smtp2.pobox.com ([64.147.108.71]:60162 "EHLO sasl.smtp.pobox.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751031AbdAQVVs (ORCPT ); Tue, 17 Jan 2017 16:21:48 -0500 Received: from sasl.smtp.pobox.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id 53C1461F26; Tue, 17 Jan 2017 16:15:59 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; s=sasl; bh=9JTFHPq6XF8Wv0gHtDAI6pE9dNo=; b=XGlDqG IVNFW6CibnJq0+FODJjN2ZyJKxGgDoUHE9+z34uJyEvbtkfARQ0QLuYIuTlyPUWw naSebfkt8t3MFr4lx50aPgKWLUf9BN/qVaArlWkKDNG8F68MDsyL1idIefUr005N WeCByMR0hc9+vkqgcQOLPM81l8tubU7GgkBN4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; q=dns; s=sasl; b=rUSzqiKXly9hLYI4YM+vF0LW6VSyME1Q r/XYokiTry0M7ie9mD14gzA4hYdBcEtA1NuZFoBJMIGvdlJMSKoYKTcyvq5w0MMk TIdPsuUwLjGNRXE1Ln+kR6WXk34+MwxgLylnNO11t8Ig0FuCcwoKa6sd69AlzdBU t9LCvqBIO6I= Received: from pb-smtp2.nyi.icgroup.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id 4B97B61F25; Tue, 17 Jan 2017 16:15:59 -0500 (EST) Received: from pobox.com (unknown [104.132.0.95]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by pb-smtp2.pobox.com (Postfix) with ESMTPSA id A582661F24; Tue, 17 Jan 2017 16:15:58 -0500 (EST) From: Junio C Hamano To: Jeff King Cc: git@vger.kernel.org, Michael Haggerty , Johannes Schindelin Subject: Re: [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> <20170116213204.e7ykwowqzafkexqd@sigill.intra.peff.net> Date: Tue, 17 Jan 2017 13:15:57 -0800 In-Reply-To: <20170116213204.e7ykwowqzafkexqd@sigill.intra.peff.net> (Jeff King's message of "Mon, 16 Jan 2017 16:32:04 -0500") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.90 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Pobox-Relay-ID: 24190C24-DCFA-11E6-B040-A7617B1B28F4-77302942!pb-smtp2.pobox.com Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Jeff King writes: > +static void mark_object_for_connectivity(const unsigned char *sha1) > +{ > + struct object *obj = lookup_object(sha1); > + > + /* > + * Setting the object type here isn't strictly necessary here for a > + * connectivity check. Drop one of these "here"s? > The cmd_fsck() part of the diff is pretty nasty without > "-b". True. I also wonder if swapping the if/else arms make the end result and the patch easier to read. i.e. + if (connectivity_only) { + mark loose for connectivity; + mark packed for connectivity; + } else { ... existing code comes here reindented ... } But the patch makes sense. > diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh > index e88ec7747..c1b2dda33 100755 > --- a/t/t1450-fsck.sh > +++ b/t/t1450-fsck.sh > @@ -523,9 +523,21 @@ test_expect_success 'fsck --connectivity-only' ' > touch empty && > git add empty && > test_commit empty && > + > + # Drop the index now; we want to be sure that we > + # recursively notice that we notice the broken objects > + # because they are reachable from refs, not because > + # they are in the index. > + rm -f .git/index && > + > + # corrupt the blob, but in a way that we can still identify > + # its type. That lets us see that --connectivity-only is > + # not actually looking at the contents, but leaves it > + # free to examine the type if it chooses. > empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 && > - rm -f $empty && > - echo invalid >$empty && > + blob=$(echo unrelated | git hash-object -w --stdin) && > + mv $(sha1_file $blob) $empty && > + > test_must_fail git fsck --strict && > git fsck --strict --connectivity-only && > tree=$(git rev-parse HEAD:) && > @@ -537,6 +549,13 @@ test_expect_success 'fsck --connectivity-only' ' > ) > ' > > +test_expect_success 'fsck --connectivity-only with explicit head' ' > + ( > + cd connectivity-only && > + test_must_fail git fsck --connectivity-only $tree > + ) > +' Most of the earlier "tree=..." assignments are done in subshells, and it is not clear which tree this refers to. Is this the one that was written in 'rev-list --verify-objects with bad sha1' that has been removed in its when-finished handler? > remove_loose_object () { > sha1="$(git rev-parse "$1")" && > remainder=${sha1#??} && From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-5.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 6376220756 for ; Tue, 17 Jan 2017 21:34:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751261AbdAQVd6 (ORCPT ); Tue, 17 Jan 2017 16:33:58 -0500 Received: from cloud.peff.net ([104.130.231.41]:40525 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750929AbdAQVdq (ORCPT ); Tue, 17 Jan 2017 16:33:46 -0500 Received: (qmail 28158 invoked by uid 109); 17 Jan 2017 21:33:00 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Tue, 17 Jan 2017 21:33:00 +0000 Received: (qmail 22082 invoked by uid 111); 17 Jan 2017 21:33:54 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Tue, 17 Jan 2017 16:33:54 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Tue, 17 Jan 2017 16:32:57 -0500 Date: Tue, 17 Jan 2017 16:32:57 -0500 From: Jeff King To: Junio C Hamano Cc: git@vger.kernel.org, Michael Haggerty , Johannes Schindelin Subject: Re: [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check Message-ID: <20170117213257.hfqpkic4olav22z5@sigill.intra.peff.net> References: <20170116212231.ojoqzlajpszifaf3@sigill.intra.peff.net> <20170116213204.e7ykwowqzafkexqd@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Tue, Jan 17, 2017 at 01:15:57PM -0800, Junio C Hamano wrote: > Jeff King writes: > > > +static void mark_object_for_connectivity(const unsigned char *sha1) > > +{ > > + struct object *obj = lookup_object(sha1); > > + > > + /* > > + * Setting the object type here isn't strictly necessary here for a > > + * connectivity check. > > Drop one of these "here"s? Yeah. > > The cmd_fsck() part of the diff is pretty nasty without > > "-b". > > True. I also wonder if swapping the if/else arms make the end > result and the patch easier to read. i.e. > > + if (connectivity_only) { > + mark loose for connectivity; > + mark packed for connectivity; > + } else { > ... existing code comes here reindented ... > } > > But the patch makes sense. Yeah. It doesn't help the diff, but the end result is more readable. And the conditional lacks a negation, which is nice. Will change. > > +test_expect_success 'fsck --connectivity-only with explicit head' ' > > + ( > > + cd connectivity-only && > > + test_must_fail git fsck --connectivity-only $tree > > + ) > > +' > > Most of the earlier "tree=..." assignments are done in subshells, > and it is not clear which tree this refers to. Is this the one that > was written in 'rev-list --verify-objects with bad sha1' that has > been removed in its when-finished handler? Doh. It was meant to be the one from the earlier --connectivity-only check. But that is doubly silly, even, because the tree variable in that case is holding the filename, not the sha1. The right thing is to call rev-parse again, but of course that doesn't work because the repo has been corrupted. :-/ Here's a re-roll with the two changes above (and the notice thing you mentioned elsewhere), plus a test that actually checks the right thing (fails before this commit and passes after). If everything else looks good, you can queue it along with the rest. Otherwise if I need a re-roll, it will be in the next version. -- >8 -- Subject: fsck: prepare dummy objects for --connectivity-check Normally fsck makes a pass over all objects to check their integrity, and then follows up with a reachability check to make sure we have all of the referenced objects (and to know which ones are dangling). The latter checks for the HAS_OBJ flag in obj->flags to see if we found the object in the first pass. Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`, 2015-06-22) taught fsck to skip the initial pass, and to fallback to has_sha1_file() instead of the HAS_OBJ check. However, it converted only one HAS_OBJ check to use has_sha1_file(). But there are many other places in builtin/fsck.c that assume that the flag is set (or that lookup_object() will return an object at all). This leads to several bugs with --connectivity-only: 1. mark_object() will not queue objects for examination, so recursively following links from commits to trees, etc, did nothing. I.e., we were checking the reachability of hardly anything at all. 2. When a set of heads is given on the command-line, we use lookup_object() to see if they exist. But without the initial pass, we assume nothing exists. 3. When loading reflog entries, we do a similar lookup_object() check, and complain that the reflog is broken if the object doesn't exist in our hash. So in short, --connectivity-only is broken pretty badly, and will claim that your repository is fine when it's not. Presumably nobody noticed for a few reasons. One is that the embedded test does not actually test the recursive nature of the reachability check. All of the missing objects are still in the index, and we directly check items from the index. This patch modifies the test to delete the index, which shows off breakage (1). Another is that --connectivity-only just skips the initial pass for loose objects. So on a real repository, the packed objects were still checked correctly. But on the flipside, it means that "git fsck --connectivity-only" still checks the sha1 of all of the packed objects, nullifying its original purpose of being a faster git-fsck. And of course the final problem is that the bug only shows up when there _is_ corruption, which is rare. So anybody running "git fsck --connectivity-only" proactively would assume it was being thorough, when it was not. One possibility for fixing this is to find all of the spots that rely on HAS_OBJ and tweak them for the connectivity-only case. But besides the risk that we might miss a spot (and I found three already, corresponding to the three bugs above), there are other parts of fsck that _can't_ work without a full list of objects. E.g., the list of dangling objects. Instead, let's make the connectivity-only case look more like the normal case. Rather than skip the initial pass completely, we'll do an abbreviated one that sets up the HAS_OBJ flag for each object, without actually loading the object data. That's simple and fast, and we don't have to care about the connectivity_only flag in the rest of the code at all. While we're at it, let's make sure we treat loose and packed objects the same (i.e., setting up dummy objects for both and skipping the actual sha1 check). That makes the connectivity-only check actually fast on a real repo (40 seconds versus 180 seconds on my copy of linux.git). Signed-off-by: Jeff King --- builtin/fsck.c | 120 +++++++++++++++++++++++++++++++++++++++++++++----------- t/t1450-fsck.sh | 29 +++++++++++++- 2 files changed, 124 insertions(+), 25 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 3e67203f9..75e836e2f 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -205,8 +205,6 @@ static void check_reachable_object(struct object *obj) if (!(obj->flags & HAS_OBJ)) { if (has_sha1_pack(obj->oid.hash)) return; /* it is in pack - forget about it */ - if (connectivity_only && has_object_file(&obj->oid)) - return; printf("missing %s %s\n", typename(obj->type), describe_object(obj)); errors_found |= ERROR_REACHABLE; @@ -584,6 +582,79 @@ static int fsck_cache_tree(struct cache_tree *it) return err; } +static void mark_object_for_connectivity(const unsigned char *sha1) +{ + struct object *obj = lookup_object(sha1); + + /* + * Setting the object type here isn't strictly necessary for a + * connectivity check. In most cases, our walk will expect a certain + * type (e.g., a tree referencing a blob) and will use lookup_blob() to + * assign the type. But doing it here has two advantages: + * + * 1. When the fsck_walk code looks at objects that _don't_ come from + * links (e.g., the tip of a ref), it may complain about the + * "unknown object type". + * + * 2. This serves as a nice cross-check that the graph links are + * sane. So --connectivity-only does not check that the bits of + * blobs are not corrupted, but it _does_ check that 100644 tree + * entries point to blobs, and so forth. + * + * Unfortunately we can't just use parse_object() here, because the + * whole point of --connectivity-only is to avoid reading the object + * data more than necessary. + */ + if (!obj || obj->type == OBJ_NONE) { + enum object_type type = sha1_object_info(sha1, NULL); + switch (type) { + case OBJ_BAD: + error("%s: unable to read object type", + sha1_to_hex(sha1)); + break; + case OBJ_COMMIT: + obj = (struct object *)lookup_commit(sha1); + break; + case OBJ_TREE: + obj = (struct object *)lookup_tree(sha1); + break; + case OBJ_BLOB: + obj = (struct object *)lookup_blob(sha1); + break; + case OBJ_TAG: + obj = (struct object *)lookup_tag(sha1); + break; + default: + error("%s: unknown object type %d", + sha1_to_hex(sha1), type); + } + + if (!obj || obj->type == OBJ_NONE) { + errors_found |= ERROR_OBJECT; + return; + } + } + + obj->flags |= HAS_OBJ; +} + +static int mark_loose_for_connectivity(const unsigned char *sha1, + const char *path, + void *data) +{ + mark_object_for_connectivity(sha1); + return 0; +} + +static int mark_packed_for_connectivity(const unsigned char *sha1, + struct packed_git *pack, + uint32_t pos, + void *data) +{ + mark_object_for_connectivity(sha1); + return 0; +} + static char const * const fsck_usage[] = { N_("git fsck [] [...]"), NULL @@ -640,38 +711,41 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) git_config(fsck_config, NULL); fsck_head_link(); - if (!connectivity_only) { + if (connectivity_only) { + for_each_loose_object(mark_loose_for_connectivity, NULL, 0); + for_each_packed_object(mark_packed_for_connectivity, NULL, 0); + } else { fsck_object_dir(get_object_directory()); prepare_alt_odb(); for (alt = alt_odb_list; alt; alt = alt->next) fsck_object_dir(alt->path); - } - if (check_full) { - struct packed_git *p; - uint32_t total = 0, count = 0; - struct progress *progress = NULL; + if (check_full) { + struct packed_git *p; + uint32_t total = 0, count = 0; + struct progress *progress = NULL; + + prepare_packed_git(); - prepare_packed_git(); + if (show_progress) { + for (p = packed_git; p; p = p->next) { + if (open_pack_index(p)) + continue; + total += p->num_objects; + } - if (show_progress) { + progress = start_progress(_("Checking objects"), total); + } for (p = packed_git; p; p = p->next) { - if (open_pack_index(p)) - continue; - total += p->num_objects; + /* verify gives error messages itself */ + if (verify_pack(p, fsck_obj_buffer, + progress, count)) + errors_found |= ERROR_PACK; + count += p->num_objects; } - - progress = start_progress(_("Checking objects"), total); - } - for (p = packed_git; p; p = p->next) { - /* verify gives error messages itself */ - if (verify_pack(p, fsck_obj_buffer, - progress, count)) - errors_found |= ERROR_PACK; - count += p->num_objects; + stop_progress(&progress); } - stop_progress(&progress); } heads = 0; diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index e88ec7747..4d1c3ba66 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -523,9 +523,21 @@ test_expect_success 'fsck --connectivity-only' ' touch empty && git add empty && test_commit empty && + + # Drop the index now; we want to be sure that we + # recursively notice the broken objects + # because they are reachable from refs, not because + # they are in the index. + rm -f .git/index && + + # corrupt the blob, but in a way that we can still identify + # its type. That lets us see that --connectivity-only is + # not actually looking at the contents, but leaves it + # free to examine the type if it chooses. empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 && - rm -f $empty && - echo invalid >$empty && + blob=$(echo unrelated | git hash-object -w --stdin) && + mv $(sha1_file $blob) $empty && + test_must_fail git fsck --strict && git fsck --strict --connectivity-only && tree=$(git rev-parse HEAD:) && @@ -537,6 +549,19 @@ test_expect_success 'fsck --connectivity-only' ' ) ' +test_expect_success 'fsck --connectivity-only with explicit head' ' + rm -rf connectivity-only && + git init connectivity-only && + ( + cd connectivity-only && + test_commit foo && + rm -f .git/index && + tree=$(git rev-parse HEAD^{tree}) && + remove_object $(git rev-parse HEAD:foo.t) && + test_must_fail git fsck --connectivity-only $tree + ) +' + remove_loose_object () { sha1="$(git rev-parse "$1")" && remainder=${sha1#??} && -- 2.11.0.651.g41f4a5c4e