Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Jeff King <peff@peff.net>
To: Theodore Ts'o <tytso@mit.edu>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Christoph Hellwig" <hch@lst.de>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Git Mailing List" <git@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"Chris Mason" <clm@fb.com>
Subject: Re: [PATCH] enable core.fsyncObjectFiles by default
Date: Mon, 22 Jan 2018 19:47:10 -0500
Message-ID: <20180123004710.GF26357@sigill.intra.peff.net> (raw)
In-Reply-To: <20180122180903.GB3513@thunk.org>

On Mon, Jan 22, 2018 at 01:09:03PM -0500, Theodore Ts'o wrote:

> > Wouldn't it also make gc pruning more expensive? Now you can repack
> > regularly and loose objects will be left out of the pack, and then just
> > rm'd, whereas now it would entail creating new packs (unless the whole
> > pack was objects meant for removal).
> 
> The idea is that the cruft pack would be all objects that were no
> longer referenced.  Hence the proposal that if they ever *are*
> accessed, they would be exploded to a loose object at that point.  So
> in the common case, the GC would go quickly since the entire pack
> could just be rm'ed once it hit the designated expiry time.

I think Ævar is talking about the case of:

  1. You make 100 objects that aren't referenced. They're loose.

  2. You run git-gc. They're still too recent to be deleted.

Right now those recent loose objects sit loose, and have zero cost at
the time of gc.  In a "cruft pack" world, you'd pay some I/O to copy
them into the cruft pack, and some CPU to zlib and delta-compress them.
I think that's probably fine, though.

That said, some of what you wrote left me confused, and whether we're
all talking about the same idea. ;) Let me describe the idea I had
mentioned in another thread.  Right now the behavior is basically this:

If an unreachable object becomes referenced, it doesn't immediately get
exploded. During the next gc, whatever new object referenced them would
be one of:

  1. Reachable from refs, in which case it carries along the
     formerly-cruft object into the new pack, since it is now also
     reachable.

  2. Unreachable but still recent by mtime; we keep such objects, and
     anything they reference (now as unreachable, in this proposal in
     the cruft pack). Now these get either left loose, or exploded loose
     if they were previously packed.

  3. Unreachable and old. Both objects can be dropped totally.

The current strategy is to use the mtimes for "recent", and we use the
pack's mtime for every object in the pack.

So if we pack all the loose objects into a cruft pack, the mtime of the
cruft pack becomes the new gauge for "recent". And if we migrate objects
from old cruft pack to new cruft pack at each gc, then they'll keep
getting their mtimes refreshed, and we'll never drop them.

So we need to either:

  - keep per-object mtimes, so that old ones can age out (i.e., they'd
    hit case 3 and just not get migrated to either the new "real" pack
    or the new cruft pack).

  - keep multiple cruft packs, and let whole packs age out. But then
    cruft objects which get referenced again by other cruft have to get
    copied (not moved!) to new packs. That _probably_ doesn't happen all
    that often, so it might be OK.

> Another way of doing things would be to use the mtime of the cruft
> pack for the expiry time, and if the curft pack is ever referenced,
> its mtime would get updated.  Yet a third way would be to simply clear
> the "cruft" bit if it ever *is* referenced.  In the common case, it
> would never be referenced, so it could just get deleted, but in the
> case where the user has manually "rescued" a set of commits (perhaps
> by explicitly setting a branch head to commit id found from a reflog),
> the objects would be saved.

I don't think we have to worry about "rescued" objects. Those are
reachable, so they'd get copied into the new "real" pack (and then their
cruft pack eventually deleted).

-Peff

  reply index

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-17 18:48 Christoph Hellwig
2018-01-17 19:04 ` Junio C Hamano
2018-01-17 19:35   ` Christoph Hellwig
2018-01-17 20:05     ` Andreas Schwab
2018-01-17 19:37   ` Matthew Wilcox
2018-01-17 19:42     ` Christoph Hellwig
2018-01-17 21:44   ` Ævar Arnfjörð Bjarmason
2018-01-17 22:07     ` Linus Torvalds
2018-01-17 22:25       ` Linus Torvalds
2018-01-17 23:16       ` Ævar Arnfjörð Bjarmason
2018-01-17 23:42         ` Linus Torvalds
2018-01-17 23:52       ` Theodore Ts'o
2018-01-17 23:57         ` Linus Torvalds
2018-01-18 16:27           ` Christoph Hellwig
2018-01-19 19:08             ` Junio C Hamano
2018-01-20 22:14               ` Theodore Ts'o
2018-01-20 22:27                 ` Junio C Hamano
2018-01-22 15:09                   ` Ævar Arnfjörð Bjarmason
2018-01-22 18:09                     ` Theodore Ts'o
2018-01-23  0:47                       ` Jeff King [this message]
2018-01-23  5:45                         ` Theodore Ts'o
2018-01-23 16:17                           ` Jeff King
2018-01-23  0:25                     ` Jeff King
2018-01-21 21:32             ` Chris Mason
2020-09-17 11:06         ` Ævar Arnfjörð Bjarmason
2020-09-17 14:14           ` Christoph Hellwig
2020-09-17 15:30           ` Junio C Hamano
2018-01-17 20:55 ` Jeff King
2018-01-17 21:10   ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2015-06-23 21:57 [PATCH] Enable " Stefan Beller
2015-06-23 22:21 ` Junio C Hamano
2015-06-23 23:29   ` Theodore Ts'o
2015-06-24  5:32     ` Junio C Hamano
2015-06-24 14:30       ` Theodore Ts'o
2015-06-24  1:07 ` Duy Nguyen
2015-06-24  3:37 ` Jeff King
2015-06-24  5:20   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180123004710.GF26357@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=clm@fb.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git