* Any tips for improving the performance of cloning large repositories?
@ 2011-12-16 13:02 Alex Bennee
  2011-12-16 13:11 ` Hallvard Breien Furuseth
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Bennee @ 2011-12-16 13:02 UTC (permalink / raw)
  To: git

Hi,

We've migrated our old CVS repository into GIT without too many
issues. However now we are rolling out the usage of the new repository
we are hitting some performance bottlenecks, especially on the initial
clone (something our buildbot instance does a lot).

Our repo is large: my .git is around 2.5 GB, although the central repo
has a single 1.7 GB pack file. However, some machines handle the cloning
better than others. For one thing, the clone process seems to require
a large chunk of memory on the receiving side, which causes
problems when there is memory pressure.

I've tried tweaking the pack size from unlimited to 256m but this
seems to have increased the clone time as the receiving end attempts
to re-pack everything back into an uber-pack.

Another thing that I've noticed is very high systime on the receiving
machines, as Ethernet and disk I/O are heavily hit.

So what I'm looking for are some tips on how I can tweak
configurations to make the clone process a little less I/O and memory
heavy. Any suggestions?

One thing I did try was a rsync'ed local repo in /var/cache/repos
which the clone command used for reference with something like:

git clone --local --reference /var/cache/repo.git git://repo/repo.git

But that didn't help as it seems to copy the whole thing anyway.

-- 
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk

* Re: Any tips for improving the performance of cloning large repositories?
  2011-12-16 13:02 Any tips for improving the performance of cloning large repositories? Alex Bennee
@ 2011-12-16 13:11 ` Hallvard Breien Furuseth
  2011-12-16 13:39   ` Hallvard Breien Furuseth
  0 siblings, 1 reply; 7+ messages in thread
From: Hallvard Breien Furuseth @ 2011-12-16 13:11 UTC (permalink / raw)
  To: Alex Bennee; +Cc: git

Alex Bennee writes:
> We've migrated our old CVS repository into GIT without too many
> issues. However now we are rolling out the usage of the new repository
> we are hitting some performance bottlenecks, especially on the initial
> clone (something our buildbot instance does a lot).

Do you often need to clone from a remote?  Instead of cloning from a
local (git clone --mirror) which gets auto-updated from the remote.

Could the buildbot make do with a shallow repo, clone --depth <num>?
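
A minimal sketch of the shallow-clone idea, using a throwaway local
repository as a hypothetical stand-in for the real server (the paths,
identity and commit messages are made up for illustration). Note that
--depth is ignored for plain local paths, so the sketch uses a file://
URL; with a git:// URL the flag is passed the same way.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Hypothetical upstream with a little history (stand-in for git://repo/repo.git).
git init -q "$work/upstream"
for i in 1 2 3; do
    git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "commit $i"
done

# Shallow clone: only fetch the most recent commit.
git clone -q --depth 1 "file://$work/upstream" "$work/shallow"

# Count the commits reachable from HEAD in the shallow clone.
depth=$(git -C "$work/shallow" rev-list --count HEAD)
echo "commits visible in shallow clone: $depth"
```

Whether a buildbot can live with truncated history depends on what the
build does; anything that walks history (for example git describe) may
need a larger --depth value.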

-- 
Hallvard

* Re: Any tips for improving the performance of cloning large repositories?
  2011-12-16 13:11 ` Hallvard Breien Furuseth
@ 2011-12-16 13:39   ` Hallvard Breien Furuseth
  2011-12-16 14:14     ` Seth Robertson
  0 siblings, 1 reply; 7+ messages in thread
From: Hallvard Breien Furuseth @ 2011-12-16 13:39 UTC (permalink / raw)
  To: Alex Bennee; +Cc: git

I wrote:
> Do you often need to clone from a remote?  Instead of cloning from a
> local (git clone --mirror) which gets auto-updated from the remote.

Er, obviously not, since you tried that with rsync.  Create the mirror
with 'git clone --mirror', then update it with 'git fetch' rather than
rsync.
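
The mirror-then-fetch workflow could look roughly like the sketch below.
It is only an illustration: the temporary directories stand in for the
real upstream and for a cache location such as /var/cache/repos, and the
demo identity and commit messages are assumptions.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Hypothetical stand-in for the real upstream repository.
git init -q "$work/upstream"
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# One-time setup on the build host: create a bare mirror.
git clone -q --mirror "$work/upstream" "$work/mirror.git"

# Upstream moves on...
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "second"

# ...and the mirror is refreshed with git fetch (e.g. from cron), not rsync.
git -C "$work/mirror.git" fetch -q --prune

# The buildbot clones from the local mirror instead of the remote.
git clone -q "$work/mirror.git" "$work/build"
commits=$(git -C "$work/build" rev-list --count HEAD)
echo "commits in build clone: $commits"
```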

-- 
Hallvard

* Re: Any tips for improving the performance of cloning large repositories?
  2011-12-16 13:39   ` Hallvard Breien Furuseth
@ 2011-12-16 14:14     ` Seth Robertson
  2011-12-16 15:28       ` Alex Bennee
  0 siblings, 1 reply; 7+ messages in thread
From: Seth Robertson @ 2011-12-16 14:14 UTC (permalink / raw)
  To: Hallvard Breien Furuseth; +Cc: Alex Bennee, git


In message <hbf.20111216zcin@bombur.uio.no>, Hallvard Breien Furuseth writes:

    I wrote:
    > Do you often need to clone from a remote?  Instead of cloning from a
    > local (git clone --mirror) which gets auto-updated from the remote.

    Er, obviously not, since you tried that with rsync.  Create the mirror
    with 'git clone --mirror', then update it with 'git fetch' rather than
    rsync.

If you really need to perform a full clone from the buildbot with or
without a different working directory (for instance if you have
buildbots/checkout users running in parallel where multiple users need
a consistent HEAD for multiple sequential operations) then instead
consider cloning with --reference or --shared.  There are severe
restrictions on what you should do with aggressive sharing (man
git-clone), but if all you are doing is normal checkouts, tags,
commits, etc, then it would be just fine.  Of course remember to add a
remote for the real upstream if you are planning on pushing
changes/tags back.
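
As a rough illustration of the --reference variant (a sketch only; the
temporary directories are hypothetical stand-ins for the real upstream
and for a local cache like /var/cache/repos/repo.git):

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Hypothetical upstream and a local cache mirror of it.
git init -q "$work/upstream"
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git clone -q --mirror "$work/upstream" "$work/cache.git"

# Clone from the upstream URL, borrowing objects from the local cache.
# origin already points at the upstream here, so no extra remote is needed.
git clone -q --reference "$work/cache.git" "file://$work/upstream" "$work/build"

# The clone records the cache's object store as an alternate.
alternates=$(cat "$work/build/.git/objects/info/alternates")
echo "borrowing objects from: $alternates"
```

The new clone depends on the cache staying intact; pruning or deleting
objects in the cache can corrupt clones that borrow from it, which is
the kind of restriction the git-clone man page warns about.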

					-Seth Robertson

* Re: Any tips for improving the performance of cloning large repositories?
  2011-12-16 14:14     ` Seth Robertson
@ 2011-12-16 15:28       ` Alex Bennee
  2011-12-16 17:08         ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Bennee @ 2011-12-16 15:28 UTC (permalink / raw)
  To: Seth Robertson; +Cc: Hallvard Breien Furuseth, git

On 16 December 2011 14:14, Seth Robertson <in-gitvger@baka.org> wrote:
>
> In message <hbf.20111216zcin@bombur.uio.no>, Hallvard Breien Furuseth writes:
>
>    I wrote:
>    > Do you often need to clone from a remote?  Instead of cloning from a
>    > local (git clone --mirror) which gets auto-updated from the remote.
>
>    Er, obviously not, since you tried that with rsync.  Create the mirror
>    with 'git clone --mirror', then update it with 'git fetch' rather than
>    rsync.
>
> If you really need to perform a full clone from the buildbot with or
> without a different working directory (for instance if you have
> buildbots/checkout users running in parallel where multiple users need
> a consistent HEAD for multiple sequential operations) then instead
> consider cloning with --reference or --shared.

Well that's counter intuitive....

 - reverting the original repo to one big pack speeds up the clone
 - adding a --local --reference mirror slows it down

Timings:

14:41 ajb@vsbldhost/i686 [ajb] >time git clone git://engbot/repo.git test-clone-bigpack.git
Initialized empty Git repository in /scratch/ajb/test-clone-bigpack.git/.git/
remote: Counting objects: 371220, done.
remote: Compressing objects: 100% (88900/88900), done.
remote: Total 371220 (delta 274586), reused 371220 (delta 274586)
Receiving objects: 100% (371220/371220), 1.78 GiB | 20.10 MiB/s, done.
Resolving deltas: 100% (274586/274586), done.
Checking out files: 100% (42909/42909), done.

real    8m53.008s
user    2m53.151s
sys     7m16.339s

14:53 ajb@vsbldhost/i686 [ajb] >time git clone --local --reference /var/cache/repos/repo.git git://engbot/repo.git test-clone-local.git
Initialized empty Git repository in /scratch/ajb/test-clone-local.git/.git/
Checking out files: 100% (42909/42909), done.

real    14m6.333s
user    1m6.844s
sys     12m44.676s

Two things are odd. First, the clone "hung" at around 22% while
checking out the files for ~10 minutes before finishing the remaining
70% in a few seconds. Second, it seems that in both cases the systime is
quite high.

-- 
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk

* Re: Any tips for improving the performance of cloning large repositories?
  2011-12-16 15:28       ` Alex Bennee
@ 2011-12-16 17:08         ` Junio C Hamano
  2011-12-16 18:37           ` Alex Bennee
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2011-12-16 17:08 UTC (permalink / raw)
  To: Alex Bennee; +Cc: Seth Robertson, Hallvard Breien Furuseth, git

Alex Bennee <kernel-hacker@bennee.com> writes:

> Well that's counter intuitive....
>
>  - reverting the original repo to one big pack speeds up the clone
>  - adding a --local --reference mirror slows it down

Neither is. Read what "--local" says in the help text of clone. It
disables the git aware clever optimization.

* Re: Any tips for improving the performance of cloning large repositories?
  2011-12-16 17:08         ` Junio C Hamano
@ 2011-12-16 18:37           ` Alex Bennee
  0 siblings, 0 replies; 7+ messages in thread
From: Alex Bennee @ 2011-12-16 18:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Seth Robertson, Hallvard Breien Furuseth, git

On 16 December 2011 17:08, Junio C Hamano <gitster@pobox.com> wrote:
> Alex Bennee <kernel-hacker@bennee.com> writes:
>
>> Well that's counter intuitive....
>>
>>  - reverting the original repo to one big pack speeds up the clone
>>  - adding a --local --reference mirror slows it down
>
> Neither is. Read what "--local" says in the help text of clone. It
> disables the git aware clever optimization.

OK that's not how I read the man page:

       --local, -l
           When the repository to clone from is on a local machine,
           this flag bypasses the normal "git aware" transport
           mechanism and clones the repository by making a copy of
           HEAD and everything under objects and refs directories.

So this says it skips "git aware" (whatever that means)

           The files under .git/objects/ directory are hardlinked to
           save space when possible. This is now the default when
           the source repository is specified with /path/to/repo
           syntax, so it essentially is a no-op option. To force
           copying instead of hardlinking (which may be desirable if
           you are trying to make a back-up of your repository),
           but still avoid the usual "git aware" transport mechanism,
           --no-hardlinks can be used.

And this says that objects on the local file-system are hardlinked
(rather than copied), which I assumed was an optimal approach.

       --no-hardlinks
           Optimize the cloning process from a repository on a local
           filesystem by copying files under .git/objects
           directory.

I'm not sure how this is an optimization. This means more copying
rather than linking, right?
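
The difference between the two modes described in the man page text
above can be observed with a small experiment. This is only a sketch
under assumptions: the repositories are throwaway ones under a temporary
directory, everything lives on one filesystem, and the link count is
read from the second column of ls -l.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Hypothetical source repository with at least one loose object.
git init -q "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Plain path syntax: object files are hardlinked when possible (the default).
git clone -q "$work/src" "$work/linked"
# Same source, but force real copies of the object files.
git clone -q --no-hardlinks "$work/src" "$work/copied"

# Pick one loose object (relative path) and compare link counts in both clones.
obj=$(cd "$work/src/.git/objects" && find . -type f -path './[0-9a-f][0-9a-f]/*' | head -n 1)
linked=$(ls -l "$work/linked/.git/objects/$obj" | awk '{print $2}')
copied=$(ls -l "$work/copied/.git/objects/$obj" | awk '{print $2}')
echo "link count with hardlinks: $linked, with --no-hardlinks: $copied"
```

A link count above 1 in the first clone means the file is shared with
the source repository, while the --no-hardlinks clone holds its own copy.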

-- 
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk
