git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Theodore Ts'o <tytso@mit.edu>
To: Jeff King <peff@peff.net>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Git Mailing List" <git@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"Chris Mason" <clm@fb.com>
Subject: Re: [PATCH] enable core.fsyncObjectFiles by default
Date: Tue, 23 Jan 2018 00:45:53 -0500	[thread overview]
Message-ID: <20180123054553.GA21015@thunk.org> (raw)
In-Reply-To: <20180123004710.GF26357@sigill.intra.peff.net>

On Mon, Jan 22, 2018 at 07:47:10PM -0500, Jeff King wrote:
> 
> I think Ævar is talking about the case of:
> 
>   1. You make 100 objects that aren't referenced. They're loose.
> 
>   2. You run git-gc. They're still too recent to be deleted.
> 
> Right now those recent loose objects sit loose, and have zero cost at
> the time of gc.  In a "cruft pack" world, you'd pay some I/O to copy
> them into the cruft pack, and some CPU to zlib and delta-compress them.
> I think that's probably fine, though.

I wasn't assuming that git-gc would create a cruft pack --- although I
guess it could.  As you say, recent loose objects have relatively zero
cost at the time of gc.  To the extent that the gc has to read lots of
loose files, there may be more seeks in the cold cache case, so there
is actually *some* cost to having the loose objects, but it's not
great.

What I was thinking about instead is that in cases where we know we
are likely to be creating a large number of loose objects (whether
they referenced or not), in a world where we will be calling fsync(2)
after every single loose object being created, pack files start
looking *way* more efficient.  So in general, if you know you will be
creating N loose objects, where N is probably around 50 or so, you'll
want to create a pack instead.

One of those cases is "repack -A", and in that case the loose objects
are all going tobe not referenced, so it would be a "cruft pack".  But
in many other cases where we might be importing from another DCVS,
which will be another case where doing an fsync(2) after every loose
object creation (and where I have sometimes seen it create them *all*
loose, and not use a pack at all), is going to get extremely slow and
painful.

> So if we pack all the loose objects into a cruft pack, the mtime of the
> cruft pack becomes the new gauge for "recent". And if we migrate objects
> from old cruft pack to new cruft pack at each gc, then they'll keep
> getting their mtimes refreshed, and we'll never drop them.

Well, I was assuming that gc would be a special case which doesn't the
mtime of the old cruft pack.  (Or more generally, any time an object
is gets copied out of the cruft pack, either to a loose object, or to
another pack, the mtime on the source pack should not be touched.)

	      	  	       	      	   	   - Ted

  reply	other threads:[~2018-01-23  5:46 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-17 18:48 [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
2018-01-17 19:04 ` Junio C Hamano
2018-01-17 19:35   ` Christoph Hellwig
2018-01-17 20:05     ` Andreas Schwab
2018-01-17 19:37   ` Matthew Wilcox
2018-01-17 19:42     ` Christoph Hellwig
2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
2018-01-17 22:07     ` Linus Torvalds
2018-01-17 22:25       ` Linus Torvalds
2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
2018-01-17 23:42         ` Linus Torvalds
2018-01-17 23:52       ` Theodore Ts'o
2018-01-17 23:57         ` Linus Torvalds
2018-01-18 16:27           ` Christoph Hellwig
2018-01-19 19:08             ` Junio C Hamano
2018-01-20 22:14               ` Theodore Ts'o
2018-01-20 22:27                 ` Junio C Hamano
2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
2018-01-22 18:09                     ` Theodore Ts'o
2018-01-23  0:47                       ` Jeff King
2018-01-23  5:45                         ` Theodore Ts'o [this message]
2018-01-23 16:17                           ` Jeff King
2018-01-23  0:25                     ` Jeff King
2018-01-21 21:32             ` Chris Mason
2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
2020-09-17 11:28           ` [RFC PATCH 0/2] should core.fsyncObjectFiles fsync the dir entry + docs Ævar Arnfjörð Bjarmason
2020-09-17 11:28           ` [RFC PATCH 1/2] sha1-file: fsync() loose dir entry when core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
2020-09-17 13:16             ` Jeff King
2020-09-17 15:09               ` Christoph Hellwig
2020-09-17 14:09             ` Christoph Hellwig
2020-09-17 14:55               ` Jeff King
2020-09-17 14:56                 ` Christoph Hellwig
2020-09-17 15:37                   ` Junio C Hamano
2020-09-17 17:12                     ` Jeff King
2020-09-17 20:37                       ` Taylor Blau
2020-09-22 10:42               ` Ævar Arnfjörð Bjarmason
2020-09-17 20:21             ` Johannes Sixt
2020-09-22  8:24               ` Ævar Arnfjörð Bjarmason
2020-11-19 11:38                 ` Johannes Schindelin
2020-09-17 11:28           ` [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant Ævar Arnfjörð Bjarmason
2020-09-17 14:12             ` Christoph Hellwig
2020-09-17 15:43             ` Junio C Hamano
2020-09-17 20:15               ` Johannes Sixt
2020-10-08  8:13               ` Johannes Schindelin
2020-10-08 15:57                 ` Ævar Arnfjörð Bjarmason
2020-10-08 18:53                   ` Junio C Hamano
2020-10-09 10:44                   ` Johannes Schindelin
2020-09-17 19:21             ` Marc Branchaud
2020-09-17 14:14           ` [PATCH] enable core.fsyncObjectFiles by default Christoph Hellwig
2020-09-17 15:30           ` Junio C Hamano
2018-01-17 20:55 ` Jeff King
2018-01-17 21:10   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2015-06-23 21:57 [PATCH] Enable " Stefan Beller
2015-06-23 22:21 ` Junio C Hamano
2015-06-23 23:29   ` Theodore Ts'o
2015-06-24  5:32     ` Junio C Hamano
2015-06-24 14:30       ` Theodore Ts'o
2015-06-24  1:07 ` Duy Nguyen
2015-06-24  3:37 ` Jeff King
2015-06-24  5:20   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180123054553.GA21015@thunk.org \
    --to=tytso@mit.edu \
    --cc=avarab@gmail.com \
    --cc=clm@fb.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).