* Any tips for improving the performance of cloning large repositories?
From: Alex Bennee @ 2011-12-16 13:02 UTC
To: git
Hi,
We've migrated our old CVS repository into Git without too many
issues. However, now that we are rolling out the new repository,
we are hitting some performance bottlenecks, especially on the initial
clone (something our buildbot instance does a lot).
Our repo is large: my .git is around 2.5 GB, although the central repo
has a single 1.7 GB pack file. Some machines handle the cloning
better than others. For one thing, the clone process seems to require
a large gob of memory on the receiving side, which causes
problems when there is memory pressure.
I've tried tweaking the pack size limit from unlimited to 256m, but this
seems to have increased the clone time, as the receiving end attempts
to re-pack everything back into an uber-pack.
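A minimal sketch for inspecting and changing the pack layout described above, assuming a POSIX shell and a reasonably recent git; `demo` is a hypothetical scratch repository, not the repository from this thread. It applies a 256m limit through the `pack.packSizeLimit` config variable, then removes the limit and consolidates everything into a single pack with `git repack -a -d`. A toy repository this small fits in one pack either way, so only the commands carry over. The git-config documentation describes `pack.packSizeLimit` as affecting only packing to a file when repacking, not packs sent over the git:// protocol.

```shell
#!/bin/sh
# Sketch: inspect and consolidate the pack layout of a scratch repository.
# "demo" is a hypothetical repository used only for illustration.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Build a small repository with a few commits.
git init -q demo
cd demo
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# Repack with a 256m cap, as in the experiment described above.
git config pack.packSizeLimit 256m
git repack -a -d -q

# Remove the cap and consolidate everything into a single pack;
# -d also deletes redundant packs and loose objects.
git config --unset pack.packSizeLimit
git repack -a -d -q

# Count the resulting pack files and show object statistics.
packs=$(ls .git/objects/pack/*.pack | wc -l | tr -d ' ')
echo "packs: $packs"
git count-objects -v
```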
Another thing I've noticed is very high system (sys) time on the receiving
machines, as Ethernet and disk I/O are heavily hit.
So what I'm looking for are some tips on how I can tweak
configurations to make the clone process a little less I/O and memory
heavy. Any suggestions?
One thing I did try was an rsync'ed local repo in /var/cache/repos,
which the clone command used as a reference, with something like:
git clone --local --reference /var/cache/repo.git git://repo/repo.git
But that didn't help, as it seems to copy the whole thing anyway.
--
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk
* Re: Any tips for improving the performance of cloning large repositories?
From: Hallvard Breien Furuseth @ 2011-12-16 13:11 UTC
To: Alex Bennee; +Cc: git
Alex Bennee writes:
> We've migrated our old CVS repository into Git without too many
> issues. However, now that we are rolling out the new repository,
> we are hitting some performance bottlenecks, especially on the initial
> clone (something our buildbot instance does a lot).
Do you often need to clone from a remote, instead of cloning from a
local mirror (git clone --mirror) which gets auto-updated from the remote?
Could the buildbot make do with a shallow repo, clone --depth <num>?
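The shallow-clone suggestion can be tried out on a scratch repository. This is a minimal sketch, assuming a POSIX shell and a reasonably recent git (for `git -C`); the names `upstream` and `buildbot-shallow` are hypothetical. The upstream is addressed with a file:// URL on purpose: for a plain local path git takes the local copy shortcut and ignores --depth (it prints a warning saying so), so the URL form is needed to exercise the normal transport.

```shell
#!/bin/sh
# Sketch: shallow clone of a scratch "upstream" repository.
# All repository names here are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Upstream with five commits of history.
git init -q upstream
for i in 1 2 3 4 5; do
    echo "revision $i" > upstream/file.txt
    git -C upstream add file.txt
    git -C upstream commit -q -m "commit $i"
done

# Shallow clone: only the most recent commit is transferred.
# The file:// URL forces the regular transport, which honours --depth.
git clone -q --depth 1 "file://$tmp/upstream" buildbot-shallow

depth=$(git -C buildbot-shallow rev-list --count HEAD)
echo "commits in shallow clone: $depth"
```

The clone carries only the tip commit, so anything that needs older history (a `git log` past the cut-off, for instance) will not see it.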
--
Hallvard
* Re: Any tips for improving the performance of cloning large repositories?
From: Hallvard Breien Furuseth @ 2011-12-16 13:39 UTC
To: Alex Bennee; +Cc: git
I wrote:
> Do you often need to clone from a remote, instead of cloning from a
> local mirror (git clone --mirror) which gets auto-updated from the remote?
Er, obviously not, since you tried that with rsync. Create the mirror
with 'git clone --mirror', then update it with 'git fetch' rather than
rsync.
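The mirror-then-fetch workflow, sketched on scratch repositories. This is a minimal sketch assuming a POSIX shell and a reasonably recent git; `upstream` stands in for the central repository and `cache.git` for the mirror under /var/cache/repos, and both names are hypothetical. The mirror is created once with `git clone --mirror` and then refreshed with `git fetch`, which negotiates with the source and transfers only the objects the mirror does not already have.

```shell
#!/bin/sh
# Sketch: keep a local bare mirror up to date with git fetch.
# "upstream" and "cache.git" are hypothetical names.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q upstream
echo one > upstream/file.txt
git -C upstream add file.txt
git -C upstream commit -q -m "first"

# One-time setup: a bare mirror of all of upstream's refs.
git clone -q --mirror "$tmp/upstream" cache.git

# Upstream moves on...
echo two > upstream/file.txt
git -C upstream commit -q -am "second"

# Periodic refresh (for example from cron) using fetch instead of rsync.
# --prune also drops refs that were deleted upstream.
git -C cache.git fetch -q --prune

upstream_head=$(git -C upstream rev-parse HEAD)
mirror_head=$(git -C cache.git rev-parse HEAD)
echo "upstream: $upstream_head"
echo "mirror:   $mirror_head"
```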
--
Hallvard
* Re: Any tips for improving the performance of cloning large repositories?
From: Seth Robertson @ 2011-12-16 14:14 UTC
To: Hallvard Breien Furuseth; +Cc: Alex Bennee, git
In message <hbf.20111216zcin@bombur.uio.no>, Hallvard Breien Furuseth writes:
I wrote:
> Do you often need to clone from a remote, instead of cloning from a
> local mirror (git clone --mirror) which gets auto-updated from the remote?
Er, obviously not, since you tried that with rsync. Create the mirror
with 'git clone --mirror', then update it with 'git fetch' rather than
rsync.
If you really need to perform a full clone from the buildbot, with or
without a different working directory (for instance, if you have
buildbots/checkout users running in parallel, where multiple users need
a consistent HEAD for multiple sequential operations), then consider
cloning with --reference or --shared instead. There are severe
restrictions on what you should do with aggressive sharing (see man
git-clone), but if all you are doing is normal checkouts, tags,
commits, etc., then it should be just fine. Of course, remember to add a
remote for the real upstream if you are planning on pushing
changes/tags back.
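The --reference and --shared approaches can be sketched with scratch repositories. This is a minimal sketch, assuming a POSIX shell and a reasonably recent git; `upstream`, `cache.git`, `build-ref` and `build-shared` are hypothetical names. With `--reference`, the clone's origin is still the real upstream and objects already in the mirror are borrowed rather than transferred. With `--shared`, the clone's origin is the mirror itself, so a separate remote for the real upstream is added for pushing. Both record the borrowed object directory in `.git/objects/info/alternates`, which is why the git-clone manual cautions that objects later removed from the mirror (for example by pruning after branches are deleted) can leave dependent clones corrupt.

```shell
#!/bin/sh
# Sketch: checkouts that borrow objects from a local mirror.
# All repository names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q upstream
echo hello > upstream/file.txt
git -C upstream add file.txt
git -C upstream commit -q -m "initial"
git clone -q --mirror "$tmp/upstream" cache.git

# --reference: origin is the real upstream, but objects already in the
# mirror are borrowed through .git/objects/info/alternates.
git clone -q --reference "$tmp/cache.git" "file://$tmp/upstream" build-ref

# --shared: clone the mirror itself; its objects are shared, not copied.
git clone -q --shared "$tmp/cache.git" build-shared
# Here origin is the mirror, so add the real upstream for pushing back.
git -C build-shared remote add upstream "file://$tmp/upstream"

cat build-ref/.git/objects/info/alternates
cat build-shared/.git/objects/info/alternates
git -C build-shared remote -v
```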
-Seth Robertson
* Re: Any tips for improving the performance of cloning large repositories?
From: Alex Bennee @ 2011-12-16 15:28 UTC
To: Seth Robertson; +Cc: Hallvard Breien Furuseth, git
On 16 December 2011 14:14, Seth Robertson <in-gitvger@baka.org> wrote:
>
> In message <hbf.20111216zcin@bombur.uio.no>, Hallvard Breien Furuseth writes:
>
> I wrote:
> > Do you often need to clone from a remote, instead of cloning from a
> > local mirror (git clone --mirror) which gets auto-updated from the remote?
>
> Er, obviously not, since you tried that with rsync. Create the mirror
> with 'git clone --mirror', then update it with 'git fetch' rather than
> rsync.
>
> If you really need to perform a full clone from the buildbot, with or
> without a different working directory (for instance, if you have
> buildbots/checkout users running in parallel, where multiple users need
> a consistent HEAD for multiple sequential operations), then consider
> cloning with --reference or --shared instead.
Well, that's counter-intuitive...
- reverting the original repo to one big pack speeds up the clone
- adding a --local --reference mirror slows it down
Timings:
14:41 ajb@vsbldhost/i686 [ajb] >time git clone git://engbot/repo.git test-clone-bigpack.git
Initialized empty Git repository in /scratch/ajb/test-clone-bigpack.git/.git/
remote: Counting objects: 371220, done.
remote: Compressing objects: 100% (88900/88900), done.
remote: Total 371220 (delta 274586), reused 371220 (delta 274586)
Receiving objects: 100% (371220/371220), 1.78 GiB | 20.10 MiB/s, done.
Resolving deltas: 100% (274586/274586), done.
Checking out files: 100% (42909/42909), done.
real 8m53.008s
user 2m53.151s
sys 7m16.339s
14:53 ajb@vsbldhost/i686 [ajb] >time git clone --local --reference /var/cache/repos/repo.git git://engbot/repo.git test-clone-local.git
Initialized empty Git repository in /scratch/ajb/test-clone-local.git/.git/
Checking out files: 100% (42909/42909), done.
real 14m6.333s
user 1m6.844s
sys 12m44.676s
Two things are odd. The first is that the clone "hung" at around 22%
while checking out the files for ~10 minutes, before finishing the remaining
70% in a few seconds. Secondly, it seems that in both cases the sys time is
quite high.
--
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk
* Re: Any tips for improving the performance of cloning large repositories?
From: Junio C Hamano @ 2011-12-16 17:08 UTC
To: Alex Bennee; +Cc: Seth Robertson, Hallvard Breien Furuseth, git
Alex Bennee <kernel-hacker@bennee.com> writes:
> Well, that's counter-intuitive...
>
> - reverting the original repo to one big pack speeds up the clone
> - adding a --local --reference mirror slows it down
Neither is. Read what "--local" says in the help text of clone. It
disables the git aware clever optimization.
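The two mechanisms referred to here can be compared side by side. This is a minimal sketch, assuming a POSIX shell, a reasonably recent git, and a scratch directory on a filesystem that supports hardlinks; the repository names are hypothetical. `--no-local` forces the regular git-aware transport even for a local path, so the objects arrive as a freshly received pack; a plain-path clone instead takes the local shortcut and hardlinks the files under .git/objects. Counting object files with a link count above one shows the difference.

```shell
#!/bin/sh
# Sketch: local-path clone (hardlinks) versus the regular git transport.
# Repository names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q upstream
echo hello > upstream/file.txt
git -C upstream add file.txt
git -C upstream commit -q -m "initial"
git -C upstream gc -q   # pack the objects so there is a pack file to share

# --no-local: the regular git-aware transport; objects arrive as a new pack.
git clone -q --no-local upstream via-transport
# Plain local path: files under .git/objects are hardlinked where possible.
git clone -q upstream via-path

# Count object files whose link count is above one (shared inodes).
linked_transport=$(find via-transport/.git/objects -type f -links +1 | wc -l | tr -d ' ')
linked_path=$(find via-path/.git/objects -type f -links +1 | wc -l | tr -d ' ')
echo "hardlinked files, transport clone: $linked_transport"
echo "hardlinked files, path clone:      $linked_path"
```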
* Re: Any tips for improving the performance of cloning large repositories?
From: Alex Bennee @ 2011-12-16 18:37 UTC
To: Junio C Hamano; +Cc: Seth Robertson, Hallvard Breien Furuseth, git
On 16 December 2011 17:08, Junio C Hamano <gitster@pobox.com> wrote:
> Alex Bennee <kernel-hacker@bennee.com> writes:
>
>> Well, that's counter-intuitive...
>>
>> - reverting the original repo to one big pack speeds up the clone
>> - adding a --local --reference mirror slows it down
>
> Neither is. Read what "--local" says in the help text of clone. It
> disables the git aware clever optimization.
OK, that's not how I read the man page:
--local, -l
When the repository to clone from is on a local machine,
this flag bypasses the normal "git aware" transport
mechanism and clones the repository by making a copy of
HEAD and everything under objects and refs directories.
So this says it skips "git aware" (whatever that means).
The files under .git/objects/ directory are hardlinked to
save space when possible. This is now the default when
the source repository is specified with /path/to/repo
syntax, so it essentially is a no-op option. To force
copying instead of hardlinking (which may be desirable if
you are trying to make a back-up of your repository),
but still avoid the usual "git aware" transport mechanism,
--no-hardlinks can be used.
And this says that objects on the local file-system are hardlinked
(rather than copied), which I assumed was an optimal approach.
--no-hardlinks
Optimize the cloning process from a repository on a local
filesystem by copying files under .git/objects
directory.
I'm not sure how this is an optimization. Doesn't this mean more copying
rather than linking?
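The effect of --no-hardlinks can be checked directly. A minimal sketch, assuming a POSIX shell, a reasonably recent git, and a scratch directory on a filesystem that supports hardlinks; `origin-repo`, `copied` and `linked` are hypothetical names. Both clones below use the local shortcut rather than the git-aware transport; the only difference is that --no-hardlinks gives the clone its own copies of the object files instead of sharing inodes with the source, which is what makes it suitable for a back-up.

```shell
#!/bin/sh
# Sketch: default local clone (hardlinks) versus --no-hardlinks (copies).
# Repository names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

git init -q origin-repo
echo data > origin-repo/file.txt
git -C origin-repo add file.txt
git -C origin-repo commit -q -m "initial"
git -C origin-repo gc -q

# --no-hardlinks: object files are copied, so this clone owns its own.
git clone -q --no-hardlinks origin-repo copied
# Default local clone: object files are hardlinked to origin-repo's.
git clone -q origin-repo linked

count_linked() {
    # Number of object files that share an inode with another path.
    find "$1/.git/objects" -type f -links +1 | wc -l | tr -d ' '
}
n_copied=$(count_linked copied)
n_linked=$(count_linked linked)
echo "hardlinked object files: copied=$n_copied linked=$n_linked"
```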
--
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk