From: Jonathan Tan <jonathantanmy@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
	git@vger.kernel.org, peartben@gmail.com,
	christian.couder@gmail.com
Subject: Re: [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader
Date: Thu, 3 Aug 2017 12:08:49 -0700
Message-ID: <20170803120849.5f7382d3@twelve2.svl.corp.google.com>
In-Reply-To: <xmqq8tj1snfq.fsf@gitster.mtv.corp.google.com>

On Wed, 02 Aug 2017 13:51:37 -0700
Junio C Hamano <gitster@pobox.com> wrote:

> > The complication is in the "git gc" operation for the case (*).
> > Today, "git gc" uses a reachability walk to decide which objects to
> > remove --- an object referenced by no other object is fair game to
> > remove.  With (*), there is another kind of object that must not be
> > removed: if an object that I made, M, points to a missing/promised
> > object, O, pointed to by an object I fetched, F, then I cannot prune
> > F unless there is another fetched object present to anchor O.
> 
> Absolutely.  Lazy-objects support comes with certain cost and this
> is one of them.  
> 
> But I do not think it is realistic to expect that you can prune
> anything you fetched from the "other place" (i.e. the source
> 'lazy-objects' hook reads from).  After all, once they give out
> objects to their clients (like us in this case), they cannot prune
> it, if we take the "implicit promise" approach to avoid the cost to
> transmit and maintain a separate "object list".

By this, do you mean that the client cannot prune anything lazily loaded
from the server? If yes, I understand that the server cannot prune
anything (for the reasons that you describe), but I don't see how that
applies to the client.

> > For example: suppose I have a sparse checkout and run
> >
> > 	git fetch origin refs/pulls/x
> > 	git checkout -b topic FETCH_HEAD
> > 	echo "Some great modification" >> README
> > 	git add README
> > 	git commit --amend
> >
> > When I run "git gc", there is nothing pointing to the commit that was
> > pointed to by the remote ref refs/pulls/x, so it can be pruned.  I
> > would naively also expect that the tree pointed to by that commit
> > could be pruned.  But pruning it means pruning the promise that made
> > it permissible to lack various blobs that my topic branch refers to
> > that are outside the sparse checkout area.  So "git gc" must notice
> > that it is not safe to prune that tree.
> >
> > This feels hacky.  I prefer the promised object list over this
> > approach.
> 
> I think they are moral equivalents implemented differently with
> different assumptions.  The example we are discussing makes an extra
> assumption: In order to reduce the cost of transferring and
> maintaining the list, we assume that all objects that came during
> that transfer are implicitly "promised", i.e. everything behind each
> of these objects will later be available on demand.  How these
> objects are marked is up to the exact mechanism (my preference is to
> mark the resulting packfile as special; Jon Tan's message to which
> my message was a response alluded to using an alternate object
> store).  If you choose to maintain a separate "object list" and have
> the "other side" explicitly give it, perhaps you can lift that
> assumption and replace it with some other assumption that assumes
> less.

Marking might be an issue if we expect the lazy loader to emit one object
per requested hash, as in the current design. There would then be one
mark per object, with overhead similar to that of the promise list. (That
said, batching is possible: I plan to optimize common cases like
checkout, and have such a patch online for an older "promised blob"
design [1].)
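
To illustrate the overhead argument, here is a rough sketch in Python (not
Git code). It assumes, purely for illustration, that every loader
invocation produces exactly one marked unit (for example, one marked
packfile); marks_needed and batch_size are hypothetical names, not part
of the patch series.

```python
import math


def marks_needed(num_missing, batch_size=1):
    """Count marks, assuming each loader invocation yields one marked unit.

    batch_size=1 models the per-hash design: one invocation, and hence
    one mark, per missing object.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(num_missing / batch_size)


# A checkout that needs 1000 missing blobs, fetched one hash at a time
# versus in a single batch.
per_hash = marks_needed(1000)
batched = marks_needed(1000, batch_size=1000)
```

Under that assumption the per-hash case needs as many marks as there are
objects, which is where the comparison with the promise list comes from.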

Overhead could be reduced by embedding the mark in the packed and loose
objects themselves (instead of keeping a separate "catalog" of marked
files), but that would require a different object format. This seems more
complicated than using an alternate object store, hence my suggestion.
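
For reference, the pruning constraint from the quoted "git gc" discussion
can be sketched as follows. This is a simplified Python model, not Git
code: the object names, the edges map and the functions are all
hypothetical, and the rule is deliberately conservative (it keeps every
fetched object that directly references a needed missing object, rather
than just one anchor per missing object).

```python
def reachable(roots, edges, present):
    """Walk from roots; missing objects are recorded but not traversed."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid in present:
            stack.extend(edges.get(oid, ()))
    return seen


def objects_to_keep(roots, edges, present, fetched):
    """Objects gc must keep: live ones, plus fetched anchors of promises."""
    live = reachable(roots, edges, present)
    needed_missing = live - present
    keep = live & present
    for oid in fetched & present:
        # A fetched object pointing at a needed missing object is what
        # makes that object "promised", so it must survive the prune.
        if needed_missing & set(edges.get(oid, ())):
            keep.add(oid)
    return keep


# Model of the quoted example: M is the amended local commit, O is a blob
# outside the sparse checkout that was never downloaded.
edges = {
    "M": ["T_M"],
    "T_M": ["README_new", "O"],
    "F_commit": ["F_tree"],
    "F_tree": ["README_old", "O"],
}
present = {"M", "T_M", "README_new", "F_commit", "F_tree", "README_old"}
fetched = {"F_commit", "F_tree", "README_old"}

keep = objects_to_keep(["M"], edges, present, fetched)
pruned = present - keep
```

In this model the fetched commit and the old README blob can be pruned,
but the fetched tree has to stay because it is the only thing anchoring O.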

[1] https://github.com/jonathantanmy/git/commit/14f07d3f06bc3a1a2c9bca85adc8c42b230b9143
