From: Jeff King <peff@peff.net>
To: Theodore Ts'o <tytso@mit.edu>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
    "Junio C Hamano" <gitster@pobox.com>,
    "Christoph Hellwig" <hch@lst.de>,
    "Linus Torvalds" <torvalds@linux-foundation.org>,
    "Git Mailing List" <git@vger.kernel.org>,
    linux-fsdevel <linux-fsdevel@vger.kernel.org>,
    "Chris Mason" <clm@fb.com>
Subject: Re: [PATCH] enable core.fsyncObjectFiles by default
Date: Mon, 22 Jan 2018 19:47:10 -0500
Message-ID: <20180123004710.GF26357@sigill.intra.peff.net>
In-Reply-To: <20180122180903.GB3513@thunk.org>

On Mon, Jan 22, 2018 at 01:09:03PM -0500, Theodore Ts'o wrote:

> > Wouldn't it also make gc pruning more expensive? Now you can repack
> > regularly and loose objects will be left out of the pack, and then just
> > rm'd, whereas now it would entail creating new packs (unless the whole
> > pack was objects meant for removal).
>
> The idea is that the cruft pack would be all objects that were no
> longer referenced. Hence the proposal that if they ever *are*
> accessed, they would be exploded to a loose object at that point. So
> in the common case, the GC would go quickly since the entire pack
> could just be rm'ed once it hit the designated expiry time.

I think Ævar is talking about the case of:

  1. You make 100 objects that aren't referenced. They're loose.

  2. You run git-gc. They're still too recent to be deleted.

Right now those recent loose objects sit loose, and have zero cost at
the time of gc. In a "cruft pack" world, you'd pay some I/O to copy
them into the cruft pack, and some CPU to zlib and delta-compress them.
I think that's probably fine, though.

That said, some of what you wrote left me confused, and wondering
whether we're all talking about the same idea. ;) Let me describe the
idea I had mentioned in another thread. Right now the behavior is
basically this:

If an unreachable object becomes referenced, it doesn't immediately get
exploded.
During the next gc, whatever new object referenced it would be one of:

  1. Reachable from refs, in which case it carries along the
     formerly-cruft object into the new pack, since it is now also
     reachable.

  2. Unreachable but still recent by mtime; we keep such objects, and
     anything they reference (now as unreachable, in this proposal in
     the cruft pack). Now these get either left loose, or exploded
     loose if they were previously packed.

  3. Unreachable and old. Both objects can be dropped totally.

The current strategy is to use the mtimes for "recent", and we use the
pack's mtime for every object in the pack.

So if we pack all the loose objects into a cruft pack, the mtime of the
cruft pack becomes the new gauge for "recent". And if we migrate objects
from old cruft pack to new cruft pack at each gc, then they'll keep
getting their mtimes refreshed, and we'll never drop them.

So we need to either:

  - keep per-object mtimes, so that old ones can age out (i.e., they'd
    hit case 3 and just not get migrated to either the new "real" pack
    or the new cruft pack).

  - keep multiple cruft packs, and let whole packs age out. But then
    cruft objects which get referenced again by other cruft have to get
    copied (not moved!) to new packs. That _probably_ doesn't happen all
    that often, so it might be OK.

> Another way of doing things would be to use the mtime of the cruft
> pack for the expiry time, and if the cruft pack is ever referenced,
> its mtime would get updated. Yet a third way would be to simply clear
> the "cruft" bit if it ever *is* referenced. In the common case, it
> would never be referenced, so it could just get deleted, but in the
> case where the user has manually "rescued" a set of commits (perhaps
> by explicitly setting a branch head to commit id found from a reflog),
> the objects would be saved.

I don't think we have to worry about "rescued" objects. Those are
reachable, so they'd get copied into the new "real" pack (and then their
cruft pack eventually deleted).
-Peff
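
The three gc cases and the mtime-refresh problem described above can be
sketched as a toy model. This is only an illustration under stated
assumptions, not how git implements gc: the names Obj, classify,
gc_single_cruft_pack and gc_per_object_mtime are hypothetical, objects
are reduced to a name, an mtime and a reachability flag, and "packing"
is modeled purely as a change of mtime.

```python
from dataclasses import dataclass

DAY = 86400  # seconds


@dataclass
class Obj:
    """Hypothetical stand-in for a git object (not git's data model)."""
    name: str
    mtime: int               # seconds; loose mtime or the owning pack's mtime
    reachable: bool = False


def classify(obj: Obj, now: int, expiry: int) -> str:
    """Map an object onto the three cases from the message."""
    if obj.reachable:
        return "pack"        # case 1: goes into the new "real" pack
    if now - obj.mtime < expiry:
        return "keep"        # case 2: unreachable but still recent
    return "drop"            # case 3: unreachable and old


def gc_single_cruft_pack(objs, now, expiry):
    """Assumed model: all kept cruft is rewritten into ONE new cruft pack,
    so every kept object inherits that pack's mtime (i.e. 'now')."""
    kept = []
    for o in objs:
        if classify(o, now, expiry) == "keep":
            kept.append(Obj(o.name, now))   # mtime refreshed by the repack
    return kept


def gc_per_object_mtime(objs, now, expiry):
    """Assumed model: each object keeps its own mtime across gc runs."""
    return [o for o in objs if classify(o, now, expiry) == "keep"]


if __name__ == "__main__":
    expiry = 14 * DAY
    single = [Obj("cruft", 0)]
    per_object = [Obj("cruft", 0)]
    for day in (7, 14, 21, 28):          # weekly gc runs
        now = day * DAY
        single = gc_single_cruft_pack(single, now, expiry)
        per_object = gc_per_object_mtime(per_object, now, expiry)
        print(day, len(single), len(per_object))
```

In this model, with a weekly gc and a two-week expiry, the object in
the single cruft pack is never older than one week at gc time and so
never reaches case 3, while the object with its own mtime ages out.
That is the reason for needing either per-object mtimes or multiple
cruft packs that expire as whole packs.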