All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Jeff Hostetler <jeffhost@microsoft.com>,
	git@vger.kernel.org, markbt@efaref.net, benpeart@microsoft.com,
	jonathantanmy@google.com
Subject: Re: [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks
Date: Thu, 9 Mar 2017 13:38:35 -0500	[thread overview]
Message-ID: <cb3418b8-8ecf-a38b-67d3-e73dc28d4f69@jeffhostetler.com> (raw)
In-Reply-To: <20170309075642.jy5o353ann524k7f@sigill.intra.peff.net>



On 3/9/2017 2:56 AM, Jeff King wrote:
> On Wed, Mar 08, 2017 at 03:10:54PM -0500, Jeff Hostetler wrote:
>
>>> Even though I do very much like the basic "high level" premise to
>>> omit often useless large blobs that are buried deep in the history
>>> we would not necessarily need from the initial cloning and
>>> subsequent fetches, I find it somewhat disturbing that the code
>>> "Assume"s that any missing blob is due to an previous partial clone.
>>> Adding this option smells like telling the users that they are not
>>> supposed to run "git fsck" because a partially cloned repository is
>>> inherently a corrupt repository.
>>>
>>> Can't we do a bit better?  If we want to make the world safer again,
>>> what additional complexity is required to allow us to tell the
>>> "missing by design" and "corrupt repository" apart?
>>
>> I'm open to suggestions here.  It would be nice to extend the
>> fetch-pack/upload-pack protocol to return a list of the SHAa
>> (and maybe the sizes) of the omitted blobs, so that a partial
>> clone or fetch would still be able to be integrity checked.
>
> Yeah, the early external-odb patches did this. It lets you do a more
> accurate fsck, and it also helps diff avoid faulting in large-object
> cases (because we can mark them as binary for "free" by comparing the
> size to big_file_threshold).
>
> So I think it makes a lot of sense in the large-blob case, where
> transmitting a type/size/sha1 tuple is way more efficient than sending
> the blob itself. But it's less clear for "sparse" cases where just
> enumerating the set of blobs may be prohibitively large.
>
> I have a feeling that the "sparse" thing needs to be handled separately
> from "partial". IOW, the client needs to tell the server "I'm only
> interested in the path foo/bar, so just send that". Then you don't find
> out about the types and sizes outside of that path, but you don't need
> to; the sparse path is stored locally and fsck knows to avoid looking
> into it.
>
> -Peff
>

That makes sense.  I'd like to get both concepts (by-size/special vs
sparse-file) in, but they don't really overlap that much (internally).
So I could see doing this in 2 separate efforts.

Thanks,
Jeff

  reply	other threads:[~2017-03-09 18:39 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-08 17:37 [PATCH 00/10] RFC Partial Clone and Fetch Jeff Hostetler
2017-03-08 17:37 ` [PATCH 01/10] pack-objects: eat CR in addition to LF after fgets Jeff Hostetler
2017-03-09  7:01   ` Jeff King
2017-03-09 15:46     ` Jeff Hostetler
2017-03-08 17:37 ` [PATCH 02/10] pack-objects: add --partial-by-size=n --partial-special Jeff Hostetler
2017-03-08 18:47   ` Junio C Hamano
2017-03-08 20:21     ` Jeff Hostetler
2017-03-09  7:04       ` Jeff King
2017-03-10 17:58         ` Brandon Williams
2017-03-10 18:03           ` Jeff King
2017-03-10 19:38             ` Junio C Hamano
2017-03-10 19:47               ` Jeff King
2017-03-09  7:31   ` Jeff King
2017-03-09 18:26     ` Jeff Hostetler
2017-03-08 17:37 ` [PATCH 03/10] pack-objects: test for --partial-by-size --partial-special Jeff Hostetler
2017-03-09  7:35   ` Jeff King
2017-03-09 18:11   ` Johannes Sixt
2017-03-08 17:37 ` [PATCH 04/10] upload-pack: add partial (sparse) fetch Jeff Hostetler
2017-03-09  7:48   ` Jeff King
2017-03-09 18:34     ` Jeff Hostetler
2017-03-09 19:09       ` Jeff King
2017-03-08 17:38 ` [PATCH 05/10] fetch-pack: add partial-by-size and partial-special Jeff Hostetler
2017-03-08 17:38 ` [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks Jeff Hostetler
2017-03-08 18:55   ` Junio C Hamano
2017-03-08 20:10     ` Jeff Hostetler
2017-03-09  7:56       ` Jeff King
2017-03-09 18:38         ` Jeff Hostetler [this message]
2017-03-08 17:38 ` [PATCH 07/10] index-pack: add --allow-partial option to relax blob existence checks Jeff Hostetler
2017-03-08 17:38 ` [PATCH 08/10] fetch: add partial-by-size and partial-special arguments Jeff Hostetler
2017-03-08 17:38 ` [PATCH 09/10] clone: " Jeff Hostetler
2017-03-08 17:38 ` [PATCH 10/10] ls-partial: created command to list missing blobs Jeff Hostetler
2017-03-08 18:50 [PATCH 00/10] RFC Partial Clone and Fetch git
2017-03-08 18:50 ` [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks git

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cb3418b8-8ecf-a38b-67d3-e73dc28d4f69@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=benpeart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=jonathantanmy@google.com \
    --cc=markbt@efaref.net \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.