From: Junio C Hamano <gitster@pobox.com>
To: Eric Wong <e@80x24.org>
Cc: Kevin Wern <kevin.m.wern@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH 00/11] Resumable clone
Date: Wed, 28 Sep 2016 10:32:30 -0700
Message-ID: <xmqqy42cj5g1.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <xmqqshslkndk.fsf@gitster.mtv.corp.google.com> (Junio C. Hamano's message of "Tue, 27 Sep 2016 15:07:35 -0700")

Junio C Hamano <gitster@pobox.com> writes:

>>> git clone --resume <resumable_work_or_git_dir>
>>
>> I think calling "git fetch" should resume, actually.
>> It would reduce the learning curve and seems natural to me:
>> "fetch" is jabout grabbing whatever else appeared since the
>> last clone/fetch happened.
>
> I hate to say this, but it sounds to me like a terrible idea.  At the
> point when you need to resume, there is not even a ref for "fetch" to
> base its incremental work off of.  It is better to keep the knowledge
> of this "priming" dance inside "clone".  Hopefully the original "clone"
> whose connection was dropped in the middle would automatically
> attempt resuming, and "clone --resume" would not be needed as often.

After sleeping on this, I want to take the above back.

I think teaching "git fetch" about the "resume" part makes tons of
sense.

What "git clone" should have been was:

    * Parse command line arguments;

    * Create a new repository and go into it; this step would
      require us to have parsed the command line for --template,
      <directory>, --separate-git-dir, etc.

    * Talk to the remote and run get_remote_heads(), i.e. obtain the
      ls-remote output;

    * Decide what fetch refspec to use, which alternate object store
      to borrow from; this step would require us to have parsed the
      command line for --reference, --mirror, --origin, etc.;

    --- we'll insert something new here ---

    * Issue "git fetch" with the refspec determined above; this step
      would require us to have parsed the command line for --depth, etc.

    * Run "git checkout -b" to create an initial checkout; this step
      would require us to have parsed the command line for --branch,
      etc.
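
For what it is worth, here is a rough command-level sketch of that
sequence (illustrative only; the real code in builtin/clone.c does
all of this internally in C, and the options shown are just the
familiar ones, not an exhaustive mapping):

    git init --template=<template> --separate-git-dir=<gitdir> <dir>
    cd <dir>
    git ls-remote <url>          # get_remote_heads()
    git remote add origin <url>  # plus refspec/--reference/--origin set-up
    # --- we'll insert something new here ---
    git fetch --depth=<depth> origin
    git checkout -b <branch> origin/<branch>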

Even though the current code conceptually does the above, these
steps are not cleanly separated as such.  I think our update to gain
the "resumable clone" feature on the client side needs to start by
refactoring the current code to look like the above, before teaching
it anything about resuming.

Once we do that, we can insert an extra step before the step that
runs "git fetch" to optionally [*1*] grab the extra piece of
information Kevin's "prime-clone" service produces [*2*], and store
it in the "new repository" somewhere [*3*].
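
Just to make the "store it somewhere" part concrete, imagine (purely
hypothetically; this is not the layout Kevin's series actually uses)
that the new step leaves something like this behind in the freshly
created repository:

    $ cat .git/prime-clone
    url http://example.com/repo/pack-1234abcd.pack
    filetype pack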

And then, as you suggested, an updated "git fetch" can be taught to
notice the priming information left by the previous step, use it to
keep downloading the pack until the download succeeds, and then
index that pack to learn the tips that can be used as ".have"
entries in the request.  From the original server's point of view,
this fetch request would "want" the same set of objects, but would
appear as an incremental update.
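
The "keep downloading until it succeeds" part could conceptually be
as simple as a resumable HTTP download followed by indexing (again
only a sketch, not the actual implementation; $prime_url stands for
whatever URL the priming information recorded):

    # pick up from wherever the previous, interrupted attempt stopped
    until curl -C - -o .git/objects/pack/pack-prime.pack "$prime_url"
    do
        sleep 1
    done
    # indexing the pack is what lets us learn the tips it contains
    git index-pack .git/objects/pack/pack-prime.pack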

Of course, the final step that happens in "git clone", i.e. the
initial checkout, still needs to be done somehow if the user decides
to resume with "git fetch", as "git fetch" _never_ touches the
working tree.  So for that purpose, the primary end-user facing
interface may still have to be "git clone --resume <dir>".  That
would probably skip the first four steps in the above sequence as
well as the new "download priming information" step, and go directly
to the step that runs "git fetch".

I do agree that is a much better design, and the crucial decision
that makes it better is your making "git fetch" aware of the "ah, we
have instructions left in this repository on how to prime its object
store" information.

Thanks.


[Footnotes]

*1* It is debatable whether it would be an overall win to use the
    "first prime by grabbing a large packfile" approach when we are
    doing a shallow or single-branch clone, hence "optionally".  It
    is important to notice that at this point in the above sequence
    we already have enough information to base that decision on.

*2* As I said, I do not think it needs to be a separate new service;
    I suspect it may be a better design to carry it as a protocol
    extension.  At this point in the above sequence we have done the
    equivalent of ls-remote, and if we had designed a protocol
    extension to carry the information, we would already have it.
    If we use a separate new service, we can of course make a
    separate connection to ask for the "prime-clone" information.
    The way this piece of information is transmitted is of secondary
    importance.
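
    (For illustration only, and with the capability name and syntax
    made up on the spot: the first line of the current ref
    advertisement could carry it the same way other capabilities are
    carried, e.g.

        <sha1-of-HEAD> HEAD\0multi_ack side-band-64k prime-clone=<url>

    with the pkt-line length framing omitted here.)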

*3* In addition to the "prime-clone" information, we may need to
    store, around here, some information that is only known to
    "clone" (perhaps because it was given on the command line) to
    help the final "checkout -b" step know what to check out, in
    case the next "fetch" step is interrupted and killed.




Thread overview: 39+ messages
2016-09-16  0:12 [PATCH 00/11] Resumable clone Kevin Wern
2016-09-16  0:12 ` [PATCH 01/11] Resumable clone: create service git-prime-clone Kevin Wern
2016-09-16 20:53   ` Junio C Hamano
2016-09-28  4:40     ` Kevin Wern
2016-09-16  0:12 ` [PATCH 02/11] Resumable clone: add prime-clone endpoints Kevin Wern
2016-09-19 13:15   ` Duy Nguyen
2016-09-28  4:43     ` Kevin Wern
2016-09-16  0:12 ` [PATCH 03/11] pkt-line: create gentle packet_read_line functions Kevin Wern
2016-09-16 22:17   ` Junio C Hamano
2016-09-28  4:42     ` Kevin Wern
2016-09-16  0:12 ` [PATCH 04/11] Resumable clone: add prime-clone to remote-curl Kevin Wern
2016-09-19 13:52   ` Duy Nguyen
2016-09-28  6:45     ` Kevin Wern
2016-09-16  0:12 ` [PATCH 05/11] Resumable clone: add output parsing to connect.c Kevin Wern
2016-09-16  0:12 ` [PATCH 06/11] Resumable clone: implement transport_prime_clone Kevin Wern
2016-09-16  0:12 ` [PATCH 07/11] Resumable clone: add resumable download to http/curl Kevin Wern
2016-09-16 22:45   ` Junio C Hamano
2016-09-28  6:41     ` Kevin Wern
2016-09-16  0:12 ` [PATCH 08/11] Resumable clone: create transport_download_primer Kevin Wern
2016-09-16  0:12 ` [PATCH 09/11] path: add resumable marker Kevin Wern
2016-09-19 13:24   ` Duy Nguyen
2016-09-16  0:12 ` [PATCH 10/11] run command: add RUN_COMMAND_NO_STDOUT Kevin Wern
2016-09-16 23:07   ` Junio C Hamano
2016-09-18 19:22     ` Johannes Schindelin
2016-09-28  4:46     ` Kevin Wern
2016-09-28 17:54       ` Junio C Hamano
2016-09-28 18:06         ` Kevin Wern
2016-09-16  0:12 ` [PATCH 11/11] Resumable clone: implement primer logic in git-clone Kevin Wern
2016-09-16 23:32   ` Junio C Hamano
2016-09-28  5:49     ` Kevin Wern
2016-09-19 14:04   ` Duy Nguyen
2016-09-19 17:16     ` Junio C Hamano
2016-09-28  4:44     ` Kevin Wern
2016-09-16 20:47 ` [PATCH 00/11] Resumable clone Junio C Hamano
2016-09-27 21:51 ` Eric Wong
2016-09-27 22:07   ` Junio C Hamano
2016-09-28 17:32     ` Junio C Hamano [this message]
2016-09-28 18:22       ` Junio C Hamano
2016-09-28 20:46     ` Eric Wong
