From: Jeff King <peff@peff.net>
To: Theodore Ts'o <tytso@mit.edu>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
    "Junio C Hamano" <gitster@pobox.com>,
    "Christoph Hellwig" <hch@lst.de>,
    "Linus Torvalds" <torvalds@linux-foundation.org>,
    "Git Mailing List" <git@vger.kernel.org>,
    linux-fsdevel <linux-fsdevel@vger.kernel.org>,
    "Chris Mason" <clm@fb.com>
Subject: Re: [PATCH] enable core.fsyncObjectFiles by default
Date: Tue, 23 Jan 2018 11:17:38 -0500
Message-ID: <20180123161738.GC13068@sigill.intra.peff.net>
In-Reply-To: <20180123054553.GA21015@thunk.org>

On Tue, Jan 23, 2018 at 12:45:53AM -0500, Theodore Ts'o wrote:

> What I was thinking about instead is that in cases where we know we
> are likely to be creating a large number of loose objects (whether
> they are referenced or not), in a world where we will be calling
> fsync(2) after every single loose object being created, pack files
> start looking *way* more efficient.  So in general, if you know you
> will be creating N loose objects, where N is probably around 50 or
> so, you'll want to create a pack instead.
>
> One of those cases is "repack -A", and in that case the loose objects
> are all going to be not referenced, so it would be a "cruft pack".
> But there are many other cases, such as importing from another DVCS,
> where doing an fsync(2) after every loose object creation (and where
> I have sometimes seen it create them *all* loose, and not use a pack
> at all) is going to get extremely slow and painful.

Ah, I see. I think in the general case of git operations this is hard
(because most object writes don't know about the larger operation that
they're a part of). But I agree that those two are the low-hanging
fruit (imports should already be using fast-import, and "cruft packs"
are not too hard an idea to implement).

I agree that a cruft-pack implementation could just be for "repack -A",
and does not have to collect otherwise loose objects. I think part of my
confusion was that you and I are coming to the idea from different
angles: you care about minimizing fsyncs, and I'm interested in stopping
the problem where you have too many loose objects after running auto-gc.
So I care more about collecting those loose objects for that case.

> > So if we pack all the loose objects into a cruft pack, the mtime of the
> > cruft pack becomes the new gauge for "recent". And if we migrate objects
> > from old cruft pack to new cruft pack at each gc, then they'll keep
> > getting their mtimes refreshed, and we'll never drop them.
>
> Well, I was assuming that gc would be a special case which doesn't
> touch the mtime of the old cruft pack.  (Or more generally, any time
> an object gets copied out of the cruft pack, either to a loose
> object, or to another pack, the mtime on the source pack should not
> be touched.)

Right, that's the "you have multiple cruft packs" idea which has been
discussed[1] (each one just hangs around until its mtime expires, and
may duplicate objects found elsewhere).

That does end up with one pack per gc, which just punts the "too many
loose objects" problem to "too many packs". But unless the number of gc
runs you do is very high compared to the expiration time, we can
probably ignore that.

-Peff

[1] https://public-inbox.org/git/20170610080626.sjujpmgkli4muh7h@sigill.intra.peff.net/
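
To make the "multiple cruft packs" trade-off above concrete, here is a minimal sketch of the bookkeeping, not git's actual implementation. The names CruftPack, gc_run and simulate are hypothetical, and the model assumes each gc run writes exactly one new cruft pack and never rewrites (or touches the mtime of) an older one.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds


@dataclass(frozen=True)
class CruftPack:
    """Hypothetical model of a cruft pack; not a git data structure."""
    name: str
    mtime: float          # set once when the pack is written, never refreshed
    objects: frozenset    # ids of the unreachable objects it holds


def gc_run(packs, unreachable_loose, now, expire_after):
    """One simulated gc run under the "multiple cruft packs" scheme."""
    # A cruft pack just hangs around until its own mtime expires.
    kept = [p for p in packs if now - p.mtime < expire_after]
    # Loose unreachable objects go into a *new* pack; old packs are not
    # rewritten, so their objects are not kept alive by repeated gc runs.
    if unreachable_loose:
        kept.append(CruftPack(f"cruft-{int(now)}", now,
                              frozenset(unreachable_loose)))
    return kept


def simulate(num_runs, interval, expire_after):
    """Run gc every `interval` seconds, creating one unreachable object each time."""
    packs = []
    for i in range(num_runs):
        packs = gc_run(packs, {f"obj-{i}"}, i * interval, expire_after)
    return packs


packs = simulate(num_runs=30, interval=DAY, expire_after=14 * DAY)
print(len(packs))  # 14: roughly expire_after / interval packs survive
```

In this model the number of live cruft packs is bounded by about the expiration time divided by the gc interval, which is why "too many packs" only becomes a concern when gc runs are very frequent relative to the expiration window.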