From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: [PATCH 3/3] gc: Clean garbage .bitmap files from pack dir Date: Fri, 18 Dec 2015 21:01:23 -0500 Message-ID: <20151219020123.GA31782@sigill.intra.peff.net> References: <20151215232534.GA30998@sigill.intra.peff.net> <1450483600-64091-1-git-send-email-dougk.ff7@gmail.com> <1450483600-64091-4-git-send-email-dougk.ff7@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: git@vger.kernel.org, sbeller@google.com, gitster@pobox.com To: Doug Kelly X-From: git-owner@vger.kernel.org Sat Dec 19 03:01:31 2015 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1aA6pu-0007Fq-Sk for gcvg-git-2@plane.gmane.org; Sat, 19 Dec 2015 03:01:31 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932199AbbLSCB0 (ORCPT ); Fri, 18 Dec 2015 21:01:26 -0500 Received: from cloud.peff.net ([50.56.180.127]:44482 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1752523AbbLSCB0 (ORCPT ); Fri, 18 Dec 2015 21:01:26 -0500 Received: (qmail 20444 invoked by uid 102); 19 Dec 2015 02:01:25 -0000 Received: from Unknown (HELO peff.net) (10.0.1.1) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Fri, 18 Dec 2015 20:01:25 -0600 Received: (qmail 2165 invoked by uid 107); 19 Dec 2015 02:01:34 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Fri, 18 Dec 2015 21:01:34 -0500 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Fri, 18 Dec 2015 21:01:23 -0500 Content-Disposition: inline In-Reply-To: <1450483600-64091-4-git-send-email-dougk.ff7@gmail.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Fri, Dec 18, 2015 at 06:06:40PM -0600, Doug Kelly wrote: > Similar to cleaning up excess .idx files, clean any garbage .bitmap > files that are not otherwise associated with any .idx/.pack files. > > Signed-off-by: Doug Kelly > --- > builtin/gc.c | 12 ++++++++++-- > t/t5304-prune.sh | 2 +- > 2 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/builtin/gc.c b/builtin/gc.c > index c583aad..7ddf071 100644 > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -58,8 +58,16 @@ static void clean_pack_garbage(void) > > static void report_pack_garbage(unsigned seen_bits, const char *path) > { > - if (seen_bits == PACKDIR_FILE_IDX) > - string_list_append(&pack_garbage, path); > + if (seen_bits & PACKDIR_FILE_IDX || > + seen_bits & PACKDIR_FILE_BITMAP) { > + const char *dot = strrchr(path, '.'); > + if (dot) { > + int baselen = dot - path + 1; > + if (!strcmp(path+baselen, "idx") || > + !strcmp(path+baselen, "bitmap")) > + string_list_append(&pack_garbage, path); > + } > + } > } Hmm. Thinking on this further, do we actually need to check seen_bits here at all? The original was trying to ask "is this a .idx file" by checking seen_bits. That was actually broken by the first patch in this series for some cases, as we might send more bits. E.g., if you have "foo.idx" and "foo.pack", this function will get called twice (once per file), but with seen_bits set to IDX|BITMAP for both cases. And we would not match the "==" above, and would therefore fail to trigger. That case is re-fixed by this patch, which is good. But I think seen_bits is not really telling us anything at this point. We know it's a garbage case, or else report_helper wouldn't have passed it along to us. But we care only about the extension in the path, which is what distinguishes each individual call to this function. So we can just check that. I also think the logic may be clearer if we handle each extension exhaustively, like: /* We know these are useless without the matching .pack */ if (ends_with(path, ".bitmap") || ends_with(path, ".idx")) { string_list_append(&pack_garbage, path); return; } /* * A pack without other files cannot be used, but should be saved, * as this is a recoverable situation (we may even see it racily * as new packs come into existence). */ if (ends_with(path, ".pack")) return; /* * A .keep file is useless without the matching pack, but it * _could_ contain information generated by the user. Let's keep it. * In the future, we may expand this to look for obvious leftover * receive-pack locks and drop them. */ if (ends_with(path, ".keep")) return; /* * A totally unrelated garbage file should be kept, to err * on the conservative side. */ if (seen_bits & PACKDIR_FILE_GARBAGE) return; /* * We have a file type that the garbage-reporting functions * know about but we don't. This function needs updating. */ die("BUG: report_pack_garbage confused"); > -test_expect_failure 'clean pack garbage with gc' ' > +test_expect_success 'clean pack garbage with gc' ' > test_when_finished "rm -f .git/objects/pack/fake*" && > test_when_finished "rm -f .git/objects/pack/foo*" && > : >.git/objects/pack/foo.keep && And I think here we should make sure that we are covering the above situations (and especially that we are keeping files that should be kept). -Peff