From: Junio C Hamano
To: Eric Wong
Cc: Kevin Wern, git@vger.kernel.org
Subject: Re: [PATCH 00/11] Resumable clone
References: <1473984742-12516-1-git-send-email-kevin.m.wern@gmail.com> <20160927215143.GA32622@starla>
Date: Wed, 28 Sep 2016 10:32:30 -0700
In-Reply-To: (Junio C. Hamano's message of "Tue, 27 Sep 2016 15:07:35 -0700")

Junio C Hamano writes:

>>> git clone --resume
>>
>> I think calling "git fetch" should resume, actually.
>> It would reduce the learning curve and seems natural to me:
>> "fetch" is about grabbing whatever else appeared since the
>> last clone/fetch happened.
>
> I hate to say this but it sounds to me like a terrible idea.  At that
> point when you need to resume, there is not even a ref for "fetch" to
> base its incremental work off of.  It is better to keep the knowledge
> of this "priming" dance inside "clone".  Hopefully the original "clone"
> whose connection was disconnected in the middle would automatically
> attempt resuming and "clone --resume" would not be needed as often.

After sleeping on this, I want to take the above back.

I think teaching "git fetch" about the "resume" part makes tons of
sense.

What "git clone" should have been was:

 * Parse command line arguments;
 * Create a new repository and go into it; this step would require
   us to have parsed the command line for --template,
   --separate-git-dir, etc.
 * Talk to the remote and do get_remote_heads() aka ls-remote
   output;
 * Decide what fetch refspec to use, and which alternate object store
   to borrow from; this step would require us to have parsed the
   command line for --reference, --mirror, --origin, etc;

   --- we'll insert something new here ---

 * Issue "git fetch" with the refspec determined above; this step
   would require us to have parsed the command line for --depth, etc.
 * Run "git checkout -b" to create an initial checkout; this step
   would require us to have parsed the command line for --branch,
   etc.

Even though the current code conceptually does the above, these
steps are not cleanly separated as such.  I think our update to gain
the "resumable clone" feature on the client side needs to start by
refactoring the current code, before learning "resumable clone", to
look like the above.

Once we do that, we can insert an extra step before the step that
runs "git fetch" to optionally [*1*] grab the extra piece of
information Kevin's "prime-clone" service produces [*2*], and store
it in the "new repository" somewhere [*3*].

And then, as you suggested, an updated "git fetch" can be taught to
notice the priming information left by the previous step, and use it
to attempt to download the pack until success, and to index that
pack to learn the tips that can be used as ".have" entries in the
request.  From the original server's point of view, this fetch
request would "want" the same set of objects, but would appear as an
incremental update.

Of course, the final step that happens in "git clone", i.e. the
initial checkout, needs to be done somehow if your user decides to
resume with "git fetch", as "git fetch" _never_ touches the working
tree.  So for that purpose, the primary end-user facing interface
may still have to be "git clone --resume".  That would probably skip
the first four steps in the above sequence and the new "download
priming information" step, and go directly to the step that runs
"git fetch".
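
To make the sequence above concrete, here is a rough sketch in Python. It is only a model of the ordering, not git code: the step names, plan_clone(), primed_fetch(), the download/index_pack callbacks, and the max_attempts cap are all made up for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical step names mirroring the list above; none of these are
# real git functions.
STEPS: List[str] = [
    "parse_args",
    "init_repo",
    "ls_remote",
    "pick_refspec",
    "download_priming_info",  # the new, optional step
    "fetch",
    "checkout",
]


def plan_clone(resume: bool = False, use_priming: bool = True) -> List[str]:
    """Return the ordered steps one clone run would execute."""
    if resume:
        # The repository and the priming information already exist, so
        # only the fetch and the initial checkout remain.
        return ["fetch", "checkout"]
    return [s for s in STEPS if use_priming or s != "download_priming_info"]


def primed_fetch(
    url: str,
    download: Callable[[str, int], Tuple[bytes, bool]],
    index_pack: Callable[[bytes], List[str]],
    max_attempts: int = 5,  # assumption: give up eventually
) -> List[str]:
    """Download the priming pack, continuing from the bytes already held.

    download(url, offset) is assumed to return (chunk, complete).
    The return value is the list of tips to advertise as ".have".
    """
    data = b""
    for _ in range(max_attempts):
        chunk, complete = download(url, len(data))
        data += chunk
        if complete:
            return index_pack(data)
    raise RuntimeError("priming pack download did not complete")
```

The point of keeping primed_fetch() separate from plan_clone() is that either "git fetch" or "git clone --resume" could drive it; only the latter would go on to run the checkout step.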

I do agree that is a much better design, and the crucial design
decision that makes it a better design is your making "git fetch"
aware of this "ah, we have instructions left in this repository on
how to prime its object store" information.

Thanks.

[Footnotes]

*1* It is debatable if it would be an overall win to use the "first
    prime by grabbing a large packfile" clone if we are doing a
    shallow or single-branch clone, hence "optionally".  It is
    important to notice that we already have enough information to
    base the decision on at this point in the above sequence.

*2* As I said, I do not think it needs to be a separate new service,
    and I suspect it may be a better design to carry it over a
    protocol extension.  At this point in the above sequence, we have
    done an equivalent of ls-remote, and if we designed a protocol
    extension to carry the information, we should already have it.
    If we use a separate new service, we can of course make a
    separate connection to ask about "prime-clone" information.  The
    way this piece of information is transmitted is of secondary
    importance.

*3* In addition to the "prime-clone" information, we may also need to
    store at this point some information that is only known to
    "clone" (perhaps because it was given from the command line), to
    help the final "checkout -b" step know what to check out in case
    the next "fetch" step is interrupted and killed.
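
Footnotes *1* and *3* could be sketched roughly as below, again in Python and again purely as illustration. The file name "resume-state.json", the JSON layout, and the functions should_prime(), save_resume_state(), and load_resume_state() are assumptions of this sketch; nothing here reflects an existing git file format or API.

```python
import json
import os
from typing import Any, Dict, Optional

STATE_FILE = "resume-state.json"  # hypothetical name, not a real git file


def should_prime(depth: Optional[int], single_branch: bool) -> bool:
    """Footnote *1*: decide whether to grab the large packfile first.

    Assumption: skip priming for shallow or single-branch clones.  The
    email only says this case is debatable.
    """
    return depth is None and not single_branch


def save_resume_state(git_dir: str, prime_info: Dict[str, Any],
                      branch: Optional[str]) -> str:
    """Footnote *3*: record what a later fetch and checkout will need."""
    path = os.path.join(git_dir, STATE_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"prime": prime_info, "checkout_branch": branch}, f)
    # Rename into place so an interrupted write leaves no partial file.
    os.replace(tmp, path)
    return path


def load_resume_state(git_dir: str) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None if nothing was left behind."""
    try:
        with open(os.path.join(git_dir, STATE_FILE)) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

Writing the state before the "fetch" step starts is what lets a killed clone be picked up later without the user repeating options such as --branch.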