From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	git@vger.kernel.org, peartben@gmail.com,
	christian.couder@gmail.com
Subject: Re: [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader
Date: Wed, 02 Aug 2017 13:51:37 -0700
Message-ID: <xmqq8tj1snfq.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20170802173857.GU13924@aiede.mtv.corp.google.com> (Jonathan Nieder's message of "Wed, 2 Aug 2017 10:38:57 -0700")

Jonathan Nieder <jrnieder@gmail.com> writes:

> Junio C Hamano wrote:
>> Jonathan Tan <jonathantanmy@google.com> writes:
>
>>> One possibility to conceptually have the same thing without the overhead
>>> of the list is to put the obtained-from-elsewhere objects into its own
>>> alternate object store, so that we can distinguish the two.
>>
>> Now you are talking.  Either a separate object store, or a packfile
>> that is specially marked as such, would work.
>
> Jonathan's not in today, so let me say a few more words about this
> approach.
>
> This approach implies a relaxed connectivity guarantee, by creating
> two classes of objects:
>
>  1. Objects that I made should satisfy the connectivity check.  They
>     can point to other objects I made, objects I fetched, or (*) objects
>     pointed to directly by objects I fetched.  More on (*) below.

Or objects that are referred to by objects I fetched.  If you
narrowly clone while omitting a subdirectory, update a file
that is outside the subdirectory, and create a new commit while
recording the same tree object name for the directory whose contents
you do not know (because you didn't fetch it), then it is OK for the
top-level tree of the resulting commit to point
at the tree that represents the subdirectory you never touched.
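
To make the relaxed check concrete, here is a minimal sketch in Python
(not git's actual C code).  It assumes a toy model in which each object
is an id mapped to the ids it refers to; the function name and the
object ids are hypothetical.  A reference from an object I made is
excused if its target is present locally or is referred to directly by
an object I fetched.

```python
from typing import Dict, List, Tuple

# Toy object graph: object id -> ids it refers to (hypothetical model).
ObjectGraph = Dict[str, List[str]]


def unexcused_references(local: ObjectGraph, fetched: ObjectGraph) -> List[Tuple[str, str]]:
    """Return (object, target) pairs where a locally made object points at
    something that is neither present nor implicitly promised."""
    present = set(local) | set(fetched)
    # Implicit promise: whatever a fetched object refers to is assumed
    # to be available on demand, even if it is missing locally.
    promised = {ref for refs in fetched.values() for ref in refs}
    return sorted(
        (oid, ref)
        for oid, refs in local.items()
        for ref in refs
        if ref not in present and ref not in promised
    )


# Narrow clone: "tree-subdir" was never fetched, but "tree-1" refers to it.
fetched = {
    "commit-1": ["tree-1"],
    "tree-1": ["blob-readme-v1", "tree-subdir"],
    "blob-readme-v1": [],
}
# New commit that changes only the README and reuses the subdirectory tree.
local = {
    "commit-2": ["commit-1", "tree-2"],
    "tree-2": ["blob-readme-v2", "tree-subdir"],
    "blob-readme-v2": [],
}
# Same history, but with a reference nobody promised.
broken = dict(local, **{"tree-2": ["blob-readme-v2", "tree-subdir", "blob-unknown"]})

ok_result = unexcused_references(local, fetched)
broken_result = unexcused_references(broken, fetched)
```

With these inputs the first call reports nothing, because the missing
subdirectory tree is covered by the fetched top-level tree, while the
second call flags the reference to "blob-unknown".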

> The complication is in the "git gc" operation for the case (*).
> Today, "git gc" uses a reachability walk to decide which objects to
> remove --- an object referenced by no other object is fair game to
> remove.  With (*), there is another kind of object that must not be
> removed: if an object that I made, M, points to a missing/promised
> object, O, pointed to by an object I fetched, F, then I cannot prune
> F unless there is another fetched object present to anchor O.

Absolutely.  Lazy-objects support comes with a certain cost and this
is one of them.

But I do not think it is realistic to expect that you can prune
anything you fetched from the "other place" (i.e. the source
'lazy-objects' hook reads from).  After all, once they give out
objects to their clients (like us in this case), they cannot prune
them, if we take the "implicit promise" approach to avoid the cost of
transmitting and maintaining a separate "object list".
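
The constraint on "git gc" can be sketched the same way.  This is a toy
Python model, not how git implements pruning; the function name is
hypothetical and the ids M, O and F follow the naming in the quoted
paragraph.  It is deliberately conservative: it keeps every fetched
object that anchors a missing object still needed by reachable local
objects, even when a second anchor would be enough.

```python
from typing import Dict, List, Set


def anchors_to_keep(reachable_local: Dict[str, List[str]],
                    fetched: Dict[str, List[str]]) -> Set[str]:
    """Return the fetched objects that gc must not prune, even if no ref
    reaches them, because they carry a promise that is still in use."""
    present = set(reachable_local) | set(fetched)
    # Missing objects that reachable local objects still point at.
    needed = {
        ref
        for refs in reachable_local.values()
        for ref in refs
        if ref not in present
    }
    # A fetched object is an anchor if it refers to any needed object.
    return {oid for oid, refs in fetched.items() if needed.intersection(refs)}


# M (made locally) points to missing O; fetched F also points to O.
# G is another fetched object that promises nothing M relies on.
reachable_local = {"M": ["O"]}
fetched = {"F": ["O"], "G": []}

keep = anchors_to_keep(reachable_local, fetched)
```

Here only F has to survive; G carries no promise that M depends on, so
a plain reachability walk may treat it as usual.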

> For example: suppose I have a sparse checkout and run
>
> 	git fetch origin refs/pulls/x
> 	git checkout -b topic FETCH_HEAD
> 	echo "Some great modification" >> README
> 	git add README
> 	git commit --amend
>
> When I run "git gc", there is nothing pointing to the commit that was
> pointed to by the remote ref refs/pulls/x, so it can be pruned.  I
> would naively also expect that the tree pointed to by that commit
> could be pruned.  But pruning it means pruning the promise that made
> it permissible to lack various blobs that my topic branch refers to
> that are outside the sparse checkout area.  So "git gc" must notice
> that it is not safe to prune that tree.
>
> This feels hacky.  I prefer the promised object list over this
> approach.

I think they are moral equivalents implemented differently with
different assumptions.  The example we are discussing makes an extra
assumption: In order to reduce the cost of transferring and
maintaining the list, we assume that all objects that came during
that transfer are implicitly "promised", i.e. everything behind each
of these objects will later be available on demand.  How these
objects are marked is up to the exact mechanism (my preference is to
mark the resulting packfile as special; Jon Tan's message to which
my message was a response alluded to using an alternate object
store).  If you choose to maintain a separate "object list" and have
the "other side" explicitly give it, perhaps you can lift that
assumption and replace it with some other assumption that assumes
less.
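
As a rough illustration of how the two approaches differ only in where
the set of promised ids comes from, here is a small Python sketch.  The
pack contents, the explicit list and the function name are all made up
for the example; neither reflects an actual git data format.

```python
from typing import Dict, List, Set


def implicit_promises(marked_pack: Dict[str, List[str]]) -> Set[str]:
    """Derive promises from a specially marked pack: every object it
    refers to but does not contain is assumed to be available later."""
    referenced = {ref for refs in marked_pack.values() for ref in refs}
    return referenced - set(marked_pack)


# Objects received in one transfer (object id -> ids it refers to).
marked_pack = {
    "tree-1": ["blob-a", "tree-subdir", "blob-big"],
    "blob-a": [],
}
# Hypothetical explicit list sent by the other side; it promises less.
explicit_list = {"tree-subdir"}

implicit = implicit_promises(marked_pack)
only_implicit = implicit - explicit_list
```

In this example the implicit rule promises both "tree-subdir" and
"blob-big", while the explicit list covers only "tree-subdir"; a
checker that takes the promised set as input works with either source.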

> Can you spell this out more?  To be clear, are you speaking as a
> reviewer or as the project maintainer?  In other words, if other
> reviewers are able to settle on a design that involves a relaxed
> guarantee for fsck in this mode that they can agree on, does this
> represent a veto meaning the patch can still not go through?

Consider it a veto over punting without making sure that we can
later come up with a solution to give such a guarantee.  I am not
getting a feeling that "other reviewers" are even seeking a "relaxed
guarantee"---all I've seen in the thread is to give up any guarantee
and to hope for the best.


