git.vger.kernel.org archive mirror
From: Jeff Hostetler <git@jeffhostetler.com>
To: Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	git <git@vger.kernel.org>, Ben Peart <peartben@gmail.com>
Subject: Re: [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches)
Date: Tue, 3 Oct 2017 10:39:17 -0400
Message-ID: <cb818dcf-beab-10ac-5e58-f56377af4f6a@jeffhostetler.com>
In-Reply-To: <xmqq8tgsipi5.fsf@gitster.mtv.corp.google.com>



On 10/3/2017 4:50 AM, Junio C Hamano wrote:
> Christian Couder <christian.couder@gmail.com> writes:
> 
>> Could you give a bit more details about the use cases this is designed for?
>> It seems that when people review my work they want a lot of details
>> about the use cases, so I guess they would also be interested in
>> getting this kind of information for your work too.
>>
>> Could this support users who would be interested in lazily cloning
>> only one kind of files, for example *.jpeg?
> 
> I do not know about others, but the reason why I was not interested
> in finding out "use cases" is because the value of this series is
> use-case agnostic.
> 
> At least to me, the most interesting part of the series is that it
> allows you to receive a set of objects transferred from the other
> side that lack some of the objects that would otherwise be required to
> be here for connectivity purposes, and it introduces a mechanism to
> allow the object transfer layer, gc and fsck to work well together in
> the resulting repository that deliberately lacks some objects.  The
> transfer layer marks the objects obtained from a specific remote as
> such, and gc and fsck are taught not to try to follow a missing link
> or diagnose a missing link as an error, if a missing link is
> expected using the mark the transfer layer left.
> 
> and it does so in such a way that it is use-case agnostic.  The
> mechanism does not care how the objects to be omitted were chosen,
> and how the omission criteria were negotiated between the sender and
> the receiver of the pack.
> 
> I think the series comes with one filter that is size-based, but I
> view it as a technology demonstration.  It does not have to be the
> primary use case.  IOW, I do not think the series is meant as a
> declaration that size-based filtering is the most important thing
> and other omission criteria are less important.
> 
> You should be able to build path based omission (i.e. narrow clone)
> or blobtype based omission.  Depending on your needs, you may want
> different object omission criteria.  It is something you can build
> on top.  And the work done on transfer/gc/fsck in this series does
> not have to change to accommodate these different "use cases".
> 

Agreed.

There are lots of reasons for wanting partial clones.  We've been
flinging lots of RFCs at each other, each seeming to have a different
base assumption (small-blobs-only vs sparse-checkout vs <whatever>),
without reaching consensus or closure.

The main thing is to allow the client to use partial clone to request
a "subset", let the server deliver that "subset", and let the client
tooling deal with an incomplete view of the repo.

As I see it there are the following major parts to partial clone:
1. How to let git-clone (and later git-fetch) specify the desired
    subset of objects that it wants?  (A ref-relative request.)
2. How to let the server and git-pack-objects build that incomplete
    packfile?
3. How to remember in the local config that a partial clone (or
    fetch) was used and that missing objects should be expected?
4. How to dynamically fetch individual missing objects on demand?
    (Not a ref-relative request.)
5. How to augment the local ODB with partial clone information and
    let git-fsck (and friends) perform limited consistency checking?
6. How to bulk fetch missing objects (whether in a pre-verb
    hook or in unpack-tree)?
7. Miscellaneous issues (e.g. fixing code paths that fetch a
    missing object they don't really need).

My proposal [1] includes a generic filtering mechanism that handles 3
types of filtering and makes it easy to add other techniques as we
see fit.  It slips in at the list-objects / traverse_commit_list
level and hides all of the details from rev-list and pack-objects.
I have a follow-on proposal [2] that extends the filtering parameter
handling to git-clone, git-fetch, git-fetch-pack, git-upload-pack
and the pack protocol.  That takes care of items 1 and 2 above.
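
To illustrate the idea of a filter that plugs in at the traversal
level, here is a minimal Python sketch.  It is not code from either
patch series: the Obj record, the traverse() function, and the two
example filters (blob_limit and path_prefix) are hypothetical names
chosen for illustration.  The point is only that the walker stays the
same while the omission criterion is swapped out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Obj:
    """Toy stand-in for a git object (hypothetical, for illustration)."""
    oid: str
    kind: str                      # "commit", "tree", or "blob"
    size: int = 0
    path: str = ""
    children: List[str] = field(default_factory=list)


# A filter looks at one object and says whether the walk should emit it.
Filter = Callable[[Obj], bool]


def blob_limit(max_bytes: int) -> Filter:
    """Size-based criterion: omit blobs larger than max_bytes."""
    return lambda o: o.kind != "blob" or o.size <= max_bytes


def path_prefix(prefix: str) -> Filter:
    """Path-based criterion: omit blobs outside the given prefix."""
    return lambda o: o.kind != "blob" or o.path.startswith(prefix)


def traverse(odb: Dict[str, Obj], tips: List[str], keep: Filter) -> List[str]:
    """Walk everything reachable from tips; emit only what the filter keeps."""
    seen, out, stack = set(), [], list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        obj = odb[oid]
        if keep(obj):
            out.append(oid)
        # Reverse so children are visited in their listed order.
        stack.extend(reversed(obj.children))
    return out
```

A caller that builds a pack would pass the resulting object list on
unchanged, so it does not need to know which criterion produced it.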

Jonathan's proposal [3] includes code to update the local config,
dynamically fetch individual objects, and handle the local ODB and
fsck consistency checking.  That takes care of items 3, 4, and 5.
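
Items 3, 4, and 5 can be sketched in the same toy style.  The
PartialRepo class below is hypothetical and is not taken from [3]; the
method name is_promised() merely echoes the functionality mentioned in
this thread, whose real signature is not shown here.  The sketch
assumes the repository records which missing objects were deliberately
omitted, so that a consistency check can tell an expected gap from
real corruption, and so that a read can fetch a promised object on
demand.

```python
from typing import Callable, Dict, Iterable, List


class PartialRepo:
    """Toy object store that tolerates promised-but-missing objects."""

    def __init__(self, links: Dict[str, List[str]],
                 promised: Iterable[str],
                 fetch_one: Callable[[str], List[str]]):
        self.links = dict(links)       # oid -> child oids, local objects only
        self.promised = set(promised)  # oids the remote promised to supply
        self.fetch_one = fetch_one     # hypothetical callback: oid -> child oids

    def is_promised(self, oid: str) -> bool:
        return oid in self.promised

    def missing_unpromised(self, tips: List[str]) -> List[str]:
        """fsck-like walk: report only the gaps no promise accounts for."""
        errors, seen, stack = [], set(), list(tips)
        while stack:
            oid = stack.pop()
            if oid in seen:
                continue
            seen.add(oid)
            if oid not in self.links:
                if not self.is_promised(oid):
                    errors.append(oid)
                continue               # never follow a missing link
            stack.extend(self.links[oid])
        return sorted(errors)

    def get(self, oid: str) -> List[str]:
        """Read an object, lazily fetching it if it was promised."""
        if oid not in self.links:
            if not self.is_promised(oid):
                raise KeyError(f"missing object {oid}")
            self.links[oid] = self.fetch_one(oid)
        return self.links[oid]
```

Note that get() asks for a single object by id, which is what makes it
a non-ref-relative request in the sense of item 4.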

As was suggested above, I think we should merge our efforts:
using my filtering for 1 and 2 and Jonathan's code for 3, 4, and 5.
I would need to eliminate the "relax" options in favor of his
is_promised() functionality for index-pack and similar.  I would
also omit his blob-max-bytes changes from pack-objects, the protocol
and related commands.

That should be a good first step.

We both have thoughts on bulk fetching (mine in pre-verb hooks and
his in unpack-tree).  We don't need this immediately and can revisit
it once the above is working.
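
For reference, the motivation for bulk fetching can be shown with a
small sketch.  The function plan_bulk_fetch() and its batch_size
parameter are hypothetical and do not come from either proposal.  It
assumes a command can list the objects it is about to need, so the
missing ones can be requested in a few batches rather than one
round trip per object.

```python
from typing import Iterable, List, Set


def plan_bulk_fetch(needed: Iterable[str], present: Set[str],
                    batch_size: int) -> List[List[str]]:
    """Group the missing object ids into batches of at most batch_size."""
    # dict.fromkeys() drops duplicates while preserving first-seen order.
    missing = [oid for oid in dict.fromkeys(needed) if oid not in present]
    return [missing[i:i + batch_size]
            for i in range(0, len(missing), batch_size)]
```

Whether the list of needed objects is computed in a hook before the
command runs or inside the checkout machinery only changes who calls
such a function, not what it does.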

[1] https://github.com/jeffhostetler/git/pull/3
[2] https://github.com/jeffhostetler/git/pull/4
[3] https://github.com/jonathantanmy/git/tree/partialclone3

Thoughts?

Thanks,
Jeff

Thread overview: 28+ messages
2017-09-29 20:11 [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches) Jonathan Tan
2017-09-29 20:11 ` [PATCH 01/18] fsck: introduce partialclone extension Jonathan Tan
2017-09-29 20:11 ` [PATCH 02/18] fsck: support refs pointing to promisor objects Jonathan Tan
2017-09-29 20:11 ` [PATCH 03/18] fsck: support referenced " Jonathan Tan
2017-09-29 20:11 ` [PATCH 04/18] fsck: support promisor objects as CLI argument Jonathan Tan
2017-09-29 20:11 ` [PATCH 05/18] index-pack: refactor writing of .keep files Jonathan Tan
2017-09-29 20:11 ` [PATCH 06/18] introduce fetch-object: fetch one promisor object Jonathan Tan
2017-09-29 20:11 ` [PATCH 07/18] sha1_file: support lazily fetching missing objects Jonathan Tan
2017-10-12 14:42   ` Christian Couder
2017-10-12 15:45     ` Christian Couder
2017-09-29 20:11 ` [PATCH 08/18] rev-list: support termination at promisor objects Jonathan Tan
2017-09-29 20:11 ` [PATCH 09/18] gc: do not repack promisor packfiles Jonathan Tan
2017-09-29 20:11 ` [PATCH 10/18] pack-objects: rename want_.* to ignore_.* Jonathan Tan
2017-09-29 20:11 ` [PATCH 11/18] pack-objects: support --blob-max-bytes Jonathan Tan
2017-09-29 20:11 ` [PATCH 12/18] fetch-pack: support excluding large blobs Jonathan Tan
2017-09-29 20:11 ` [PATCH 13/18] fetch: refactor calculation of remote list Jonathan Tan
2017-09-29 20:11 ` [PATCH 14/18] fetch: support excluding large blobs Jonathan Tan
2017-09-29 20:11 ` [PATCH 15/18] clone: " Jonathan Tan
2017-09-29 20:11 ` [PATCH 16/18] clone: configure blobmaxbytes in created repos Jonathan Tan
2017-09-29 20:11 ` [PATCH 17/18] unpack-trees: batch fetching of missing blobs Jonathan Tan
2017-09-29 20:11 ` [PATCH 18/18] fetch-pack: restore save_commit_buffer after use Jonathan Tan
2017-09-29 21:08 ` [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches) Johannes Schindelin
2017-10-02  4:23 ` Junio C Hamano
2017-10-03  6:15 ` Christian Couder
2017-10-03  8:50   ` Junio C Hamano
2017-10-03 14:39     ` Jeff Hostetler [this message]
2017-10-03 23:42       ` Jonathan Tan
2017-10-04 13:30         ` Jeff Hostetler
