From: Emily Shaffer <emilyshaffer@google.com>
To: William Chen <williamchen32335@gmail.com>,
	Git List <git@vger.kernel.org>
Subject: Re: faster git clone
Date: Fri, 22 Jan 2021 11:53:21 -0800
Message-ID: <CAJoAoZkrYYz=1wKDtUKdewPGX9wr2Zwhhyq9kd5C2_KDn9UJ=w@mail.gmail.com>
In-Reply-To: <20210122030103.GA73465@gmail.com>

On Thu, Jan 21, 2021 at 7:01 PM William Chen <williamchen32335@gmail.com> wrote:
>
> Dear Emily,
>
> I see your excellent contribution to git clone. I hope that you are well.

Hi William, this is a question much better directed at the Git list as a whole.

>
> When I try to clone a repo of a large size from github, it is slow.
>
> $ git clone https://github.com/git/git
> ...
> remote: Enumerating objects: 56, done.
> remote: Counting objects: 100% (56/56), done.
> remote: Compressing objects: 100% (25/25), done.
> Receiving objects:  23% (70386/299751), 33.00 MiB | 450.00 KiB/s
>
> The following aria2c command, which can use multiple downloading threads, is much faster. Would you please let me know whether there is a way to speed up git clone (maybe by using parallelization)?

In general, it would be more compelling to see actual numbers than
"much faster", e.g. the outputs of `time git clone
https://github.com/git/git` and `time aria2c
https://github.com/git/git/archive/master.zip` - or even an estimation
from you, like, "I think clone takes a minute or two but aria does the
same thing in only a couple of seconds". "Much faster" means something
different to everyone :)
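
One way to collect such numbers is sketched below. The helper name
`elapsed_seconds` is made up for illustration, and it only has
whole-second resolution; the shell's built-in `time` works just as
well.

```shell
# Hypothetical helper (not part of Git or aria2): run a command,
# discard its output, and print the wall-clock duration in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Example invocations (these need network access, so they are left
# commented out here):
#   elapsed_seconds git clone https://github.com/git/git
#   elapsed_seconds aria2c https://github.com/git/git/archive/master.zip
```

Keep in mind that the two commands do not fetch the same data: the
clone transfers the full history, while the zip archive holds only a
snapshot of 'master'.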

>
> Your help is much appreciated! I look forward to hearing from you. Thanks.
>
> $ aria2c https://github.com/git/git/archive/master.zip
>
> 01/21 20:16:04 [NOTICE] Downloading 1 item(s)
>
> 01/21 20:16:04 [NOTICE] CUID#7 - Redirecting to https://codeload.github.com/git/git/zip/master

Right here it looks like your zip download redirects to a CDN or
something, which is probably better optimized for serving archives
than the Git server itself, so I would guess that has something to do
with it too.

> [#59b6a2 8.2MiB/0B CN:1 DL:3.8MiB]
> 01/21 20:16:08 [NOTICE] Download complete: /private/tmp/git-master.zip
>
> Download Results:
> gid   |stat|avg speed  |path/URI
> ======+====+===========+=======================================================
> 59b6a2|OK  |   2.9MiB/s|/private/tmp/git-master.zip
>
> Status Legend:
> (OK):download completed.

There are others on the list who are better able to explain this than
me. But I'd guess the upshot is that 'git clone
https://github.com/git/git' is asking a Git server, which is good at
Git repo management (e.g. accepting pushes, generating packfiles to
send you a specific object or branch, etc.) - but when you ask for
"git/git/archive/master.zip" you're getting the result of some work
the Git server already did a while ago to zip up the current 'master'
into an archive and give it to some other server.

We've done some other work[1] around enabling use of CDNs and prebuilt
chunks lately, but again, there are others on the list better able to
explain than me.

[1]: https://github.com/git/git/blob/master/Documentation/technical/packfile-uri.txt
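
A rough sketch of the client side of [1], going by that document: a
client only follows packfile URIs when it opts in through the
`fetch.uriProtocols` config variable, and only when the server
advertises them. The scratch repository below is a hypothetical
stand-in, and nothing here assumes that GitHub serves packfile URIs.

```shell
# Hypothetical scratch repository, used only to keep the setting local.
repo=$(mktemp -d)
git init --quiet "$repo"

# Opt in to downloading server-advertised packfiles over https.
git -C "$repo" config fetch.uriProtocols https

# Print the value back to confirm it was stored.
git -C "$repo" config --get fetch.uriProtocols
```

Without a server configured to advertise packfile URIs, this setting
has no visible effect on a fetch or clone.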
