From: Jonathan Tan <jonathantanmy@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, peartben@gmail.com, christian.couder@gmail.com
Subject: Re: [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader
Date: Tue, 1 Aug 2017 17:19:44 -0700
Message-ID: <20170801171944.7690a63f@twelve2.svl.corp.google.com>
In-Reply-To: <xmqq379bgqlx.fsf@gitster.mtv.corp.google.com>

On Tue, 01 Aug 2017 10:11:38 -0700
Junio C Hamano <gitster@pobox.com> wrote:

> Let's step back a bit and think what already happens in the pre-
> lazy-object world.  We record cut-off commits when a depth limited
> clone is created in "shallow".  These essentially are promises,
> saying something like:
> 
>     Rest assured that everything in the history behind these commits
>     is on the other side and you can retrieve them by unshallowing.
> 
>     If you traverse from your local tips and find no missing objects
>     before reaching one of these commits, then you do not have any
>     local corruption you need to worry about.
> 
> the other end made to us, when the shallow clone was made.  And we
> take this promise and build more commits on top, and then we adjust
> these cut-off commits incrementally as we deepen our clone or make
> it even shallower.  For this assurance to work, we of course need to
> assume a bit more than what we assume for a complete clone, namely,
> the "other side" will hold onto the history behind these, i.e. does
> not rewind the tips it has already shown to us, or even if it does,
> the objects that are reachable from these cut-off points will
> somehow always be available to us on demand.
> 
> Can we do something similar, i.e. maintain minimum set of cut-off
> points and adjust that set incrementally, just sufficient to ensure
> the integrity of objects locally created and not yet safely stored
> away by pushing them to the "other side"?
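
To make the shallow analogy concrete, here is a minimal C sketch of the check described in the quote: walk from the local tips, do not look behind recorded cut-off points, and treat any missing object reached before a cut-off as local corruption. The `struct toy_obj` graph and `verify_up_to_cutoffs()` are hypothetical illustrations, not Git's real object or shallow-file APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy object graph, not Git's real data structures. */
#define MAX_OBJS 64
#define MAX_REFS 4

struct toy_obj {
	bool present;       /* do we have this object locally? */
	bool cutoff;        /* recorded as a "shallow"-style cut-off point */
	int nrefs;
	int refs[MAX_REFS]; /* indices of referenced objects */
};

/*
 * Walk from the given tips, never descending past a cut-off object
 * (everything behind it was promised by the other side). Returns true
 * if every object reached before a cut-off point is present locally.
 */
static bool verify_up_to_cutoffs(const struct toy_obj *objs, int nobjs,
				 const int *tips, int ntips)
{
	bool seen[MAX_OBJS] = { false };
	int stack[MAX_OBJS];
	int top = 0;

	assert(nobjs <= MAX_OBJS);
	for (int i = 0; i < ntips; i++) {
		if (!seen[tips[i]]) {
			seen[tips[i]] = true;
			stack[top++] = tips[i];
		}
	}
	while (top > 0) {
		int cur = stack[--top];

		if (!objs[cur].present)
			return false;   /* missing before any cut-off: corruption */
		if (objs[cur].cutoff)
			continue;       /* trust the promise; don't look behind it */
		for (int r = 0; r < objs[cur].nrefs; r++) {
			int next = objs[cur].refs[r];

			if (!seen[next]) {  /* each object is pushed at most once */
				seen[next] = true;
				stack[top++] = next;
			}
		}
	}
	return true;
}
```

Because each object is pushed at most once, the stack never exceeds the number of objects; the cost of the check is bounded by what lies in front of the cut-off points, not by the full history.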

At first glance, this suggestion (the "frontier" of what we have) seems
to incur less overhead than the original promise suggestion (the
"frontier" of what we don't have), but after some in-office discussion,
I suspect that this might not be the case. For example, one tree (that
we have) might reference many blobs (that we don't have), but at the
same time, many trees (that we have) might reference the same blob (that
we don't have), so neither frontier is consistently smaller than the
other. And the promise overhead was already judged to be too much, which
is why we moved away from it.

One possibility to get conceptually the same thing without the overhead
of the list is to put the obtained-from-elsewhere objects into their own
alternate object store, so that we can distinguish the two. I mentioned
this in my earlier e-mail but rejected it; after some more thought,
though, this might be sufficient. We might still need to iterate through
every object to know exactly what we can assume the remote to have, but
the "frontier" solution also needs this iteration, so we are no worse
off.
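
A rough C sketch of the separate-store idea, assuming each object is tagged with the (hypothetical) store it lives in: anything referenced by a remote-fetched object is assumed to be available from the remote, and only locally created objects must not dangle. The first pass iterates over every remote-fetched object, which is the iteration cost mentioned above. The names are illustrative and do not correspond to Git's alternates implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OBJS 64
#define MAX_REFS 4

/* Hypothetical: which object store (if any) holds an object. */
enum store { STORE_NONE, STORE_LOCAL, STORE_REMOTE };

struct toy_obj {
	enum store where;
	int nrefs;
	int refs[MAX_REFS];
};

/*
 * Return true if every reference from a locally created object is either
 * present in some store, or is referenced by a remote-fetched object (and
 * is therefore assumed to be retrievable from the remote).
 */
static bool check_local_objects(const struct toy_obj *objs, int nobjs)
{
	bool promised[MAX_OBJS] = { false };

	assert(nobjs <= MAX_OBJS);
	/* Pass 1: whatever remote-fetched objects point to is on the remote. */
	for (int i = 0; i < nobjs; i++)
		if (objs[i].where == STORE_REMOTE)
			for (int r = 0; r < objs[i].nrefs; r++)
				promised[objs[i].refs[r]] = true;

	/* Pass 2: locally created objects must not reference unknown objects. */
	for (int i = 0; i < nobjs; i++) {
		if (objs[i].where != STORE_LOCAL)
			continue;
		for (int r = 0; r < objs[i].nrefs; r++) {
			int t = objs[i].refs[r];

			if (objs[t].where == STORE_NONE && !promised[t])
				return false;
		}
	}
	return true;
}
```

This check is conservative: a local object that references a missing object no fetched object mentions is flagged, even if the remote happens to have it, because nothing local tells us so.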

Going back to the original use cases that motivated this (a monorepo
like Microsoft's and a large-blob repo like Android's), it might be
better simply to disable the connectivity check when
extensions.lazyObject is set (as you mentioned). This does change the
meaning of fsck, but that may be fine, since the "meaning" of the repo
(a view of another repo rather than a full repo) has changed too. This
patch set would then be more about ensuring that the lazy object loader
is not inadvertently run. As future work, we could add diagnostics that,
for example, attempt a walk anyway and print a list of missing SHA-1s.
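
The proposed diagnostic could look roughly like the following C sketch: instead of stopping at the first missing object, the walk prints every missing object it reaches, and a hypothetical `lazy` flag (standing in for `extensions.lazyObject`, not Git's actual config plumbing) decides whether missing objects count as an error. Object names are placeholder strings, not real SHA-1s.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_OBJS 64
#define MAX_REFS 4

struct toy_obj {
	const char *name;   /* placeholder for an object's SHA-1 */
	bool present;
	int nrefs;
	int refs[MAX_REFS];
};

/*
 * Walk from the tips and print every missing object reached, returning
 * how many there were. Missing objects are not descended into, since
 * their references are unknown locally.
 */
static int report_missing(const struct toy_obj *objs, int nobjs,
			  const int *tips, int ntips, FILE *out)
{
	bool seen[MAX_OBJS] = { false };
	int stack[MAX_OBJS];
	int top = 0, nmissing = 0;

	assert(nobjs <= MAX_OBJS);
	for (int i = 0; i < ntips; i++) {
		if (!seen[tips[i]]) {
			seen[tips[i]] = true;
			stack[top++] = tips[i];
		}
	}
	while (top > 0) {
		int cur = stack[--top];

		if (!objs[cur].present) {
			fprintf(out, "missing %s\n", objs[cur].name);
			nmissing++;
			continue;
		}
		for (int r = 0; r < objs[cur].nrefs; r++) {
			int next = objs[cur].refs[r];

			if (!seen[next]) {
				seen[next] = true;
				stack[top++] = next;
			}
		}
	}
	return nmissing;
}

/* Hypothetical fsck-style exit status: missing objects are fatal only
 * when the repository is not in lazy mode. */
static int check_connectivity(const struct toy_obj *objs, int nobjs,
			      const int *tips, int ntips, bool lazy, FILE *out)
{
	int nmissing = report_missing(objs, nobjs, tips, ntips, out);

	return (nmissing > 0 && !lazy) ? 1 : 0;
}
```

Keeping the report separate from the pass/fail decision lets the same walk serve both a strict full-repo fsck and a purely informational one for lazy repos.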

(I suspect that we will also need to disable the connectivity check for
things like "git fetch", which means that we won't be able to tell
locally whether the server sent us all the objects that we requested.
This might not be a problem, though, since the local repo already places
some measure of trust in the server.)

Thread overview: 40+ messages
2017-07-26 23:29 [RFC PATCH 0/4] Some patches for fsck for missing objects Jonathan Tan
2017-07-26 23:29 ` [RFC PATCH 1/4] environment, fsck: introduce lazyobject extension Jonathan Tan
2017-07-27 18:55   ` Junio C Hamano
2017-07-28 13:20     ` Ben Peart
2017-07-28 23:50     ` Jonathan Tan
2017-07-29  0:21       ` Junio C Hamano
2017-07-26 23:30 ` [RFC PATCH 2/4] fsck: support refs pointing to lazy objects Jonathan Tan
2017-07-27 18:59   ` Junio C Hamano
2017-07-27 23:50     ` Jonathan Tan
2017-07-28 13:29       ` Ben Peart
2017-07-28 20:08         ` [PATCH] tests: ensure fsck fails on corrupt packfiles Jonathan Tan
2017-07-26 23:30 ` [RFC PATCH 3/4] fsck: support referenced lazy objects Jonathan Tan
2017-07-27 19:17   ` Junio C Hamano
2017-07-27 23:50     ` Jonathan Tan
2017-07-29 16:04   ` Junio C Hamano
2017-07-26 23:30 ` [RFC PATCH 4/4] fsck: support lazy objects as CLI argument Jonathan Tan
2017-07-26 23:42 ` [RFC PATCH 0/4] Some patches for fsck for missing objects brian m. carlson
2017-07-27  0:24   ` Stefan Beller
2017-07-27 17:25   ` Jonathan Tan
2017-07-28 13:40     ` Ben Peart
2017-07-31 21:02 ` [PATCH v2 0/5] Fsck for lazy objects, and (now) actual invocation of loader Jonathan Tan
2017-07-31 21:21   ` Junio C Hamano
2017-07-31 23:05     ` Jonathan Tan
2017-08-01 17:11       ` Junio C Hamano
2017-08-01 17:45         ` Jonathan Nieder
2017-08-01 20:15           ` Junio C Hamano
2017-08-02  0:19         ` Jonathan Tan [this message]
2017-08-02 16:20           ` Junio C Hamano
2017-08-02 17:38             ` Jonathan Nieder
2017-08-02 20:51               ` Junio C Hamano
2017-08-02 22:13                 ` Jonathan Nieder
2017-08-03 19:08                 ` Jonathan Tan
2017-08-08 17:13   ` Ben Peart
2017-07-31 21:02 ` [PATCH v2 1/5] environment, fsck: introduce lazyobject extension Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 2/5] fsck: support refs pointing to lazy objects Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 3/5] fsck: support referenced " Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 4/5] fsck: support lazy objects as CLI argument Jonathan Tan
2017-07-31 21:02 ` [PATCH v2 5/5] sha1_file: support loading lazy objects Jonathan Tan
2017-07-31 21:29   ` Junio C Hamano
2017-08-08 20:20   ` Ben Peart
