* git clone: very long "resolving deltas" phase
@ 2010-04-06 14:18 Vitaly Berov
2010-04-06 15:01 ` Matthieu Moy
2010-04-06 21:10 ` Nicolas Pitre
0 siblings, 2 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-06 14:18 UTC (permalink / raw)
To: git
We have quite a large repository, and "git clone" takes about 6 hours. The
"resolving deltas" phase takes most of that time.
What does git do at this stage, and how can we optimize it?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 14:18 git clone: very long "resolving deltas" phase Vitaly Berov
@ 2010-04-06 15:01 ` Matthieu Moy
2010-04-06 15:28 ` Vitaly Berov
` (2 more replies)
2010-04-06 21:10 ` Nicolas Pitre
1 sibling, 3 replies; 39+ messages in thread
From: Matthieu Moy @ 2010-04-06 15:01 UTC (permalink / raw)
To: Vitaly Berov; +Cc: git
Vitaly Berov <vitaly.berov@gmail.com> writes:
> We have quite a large repository, and "git clone" takes about 6 hours. The
> "resolving deltas" phase takes most of that time.
> What does git do at this stage, and how can we optimize it?
Does running "git gc" (long, but done once and for all) on the server
help?
--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 15:01 ` Matthieu Moy
@ 2010-04-06 15:28 ` Vitaly Berov
2010-04-06 15:29 ` Vitaly
2010-04-06 21:01 ` git clone: very long "resolving deltas" phase Nicolas Pitre
2 siblings, 0 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-06 15:28 UTC (permalink / raw)
To: git; +Cc: git
On 04/06/2010 07:01 PM, Matthieu Moy wrote:
> Vitaly Berov<vitaly.berov@gmail.com> writes:
>
>> We have quite a large repository, and "git clone" takes about 6 hours. The
>> "resolving deltas" phase takes most of that time.
>> What does git do at this stage, and how can we optimize it?
>
> Does running "git gc" (long, but done once and for all) on the server
> help?
>
Didn't try this one, but I'll give it a try, thanks.
And what does this stage do?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 15:01 ` Matthieu Moy
2010-04-06 15:28 ` Vitaly Berov
@ 2010-04-06 15:29 ` Vitaly
2010-04-06 15:32 ` Andreas Ericsson
2010-04-06 21:01 ` git clone: very long "resolving deltas" phase Nicolas Pitre
2 siblings, 1 reply; 39+ messages in thread
From: Vitaly @ 2010-04-06 15:29 UTC (permalink / raw)
To: Matthieu Moy; +Cc: git
I didn't try this, but I'll give it a try, thanks.
And what does this stage mean?
On 04/06/2010 07:01 PM, Matthieu Moy wrote:
> Vitaly Berov<vitaly.berov@gmail.com> writes:
>
>
>> We have quite a large repository, and "git clone" takes about 6 hours. The
>> "resolving deltas" phase takes most of that time.
>> What does git do at this stage, and how can we optimize it?
>>
> Does running "git gc" (long, but done once and for all) on the server
> help?
>
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 15:29 ` Vitaly
@ 2010-04-06 15:32 ` Andreas Ericsson
[not found] ` <q2mec874dac1004060850r5eaa41fak2ba9889d07794651@mail.gmail.com>
2010-04-06 21:05 ` Nicolas Pitre
0 siblings, 2 replies; 39+ messages in thread
From: Andreas Ericsson @ 2010-04-06 15:32 UTC (permalink / raw)
To: Vitaly; +Cc: Matthieu Moy, git
On 04/06/2010 05:29 PM, Vitaly wrote:
> I didn't try this, but I'll give it a try, thanks.
>
> And what does this stage mean?
>
It means the server is busy creating a packfile to send
over the wire. If you pack the repository before cloning
from it, deltas from the packfile will simply be copied
into the new pack. This will provide a huge speed boost,
so make sure to repack the repository on the server every
once in a while.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
[not found] ` <q2mec874dac1004060850r5eaa41fak2ba9889d07794651@mail.gmail.com>
@ 2010-04-06 15:56 ` Vitaly Berov
2010-04-06 21:09 ` Nicolas Pitre
0 siblings, 1 reply; 39+ messages in thread
From: Vitaly Berov @ 2010-04-06 15:56 UTC (permalink / raw)
To: Shawn Pearce; +Cc: Andreas Ericsson, git, Matthieu Moy
Why does git compute checksums on the client side? Isn't it already
calculated on the "server" side?
2010/4/6 Shawn Pearce <spearce@spearce.org>:
> Nope, the resolving deltas phase is about computing the checksums of each
> object on the client side of the connection. Repacking the server might
> have little impact on this phase, other than maybe to reduce the size and
> thus the disk io required to scan the entire pack.
>
> On Apr 6, 2010 9:32 AM, "Andreas Ericsson" <ae@op5.se> wrote:
>> On 04/06/2010 05:29 PM, Vitaly wrote:
>>> I didn't try this, but I'll give it a try, thanks.
>>>
>>> And what does this stage mean?
>>>
>>
>> It means the server is busy creating a packfile to send
>> over the wire. If you pack the repository before cloning
>> from it, deltas from the packfile will simply be copied
>> into the new pack. This will provide a huge speed boost,
>> so make sure to repack the repository on the server every
>> once in a while.
>>
>> --
>> Andreas Ericsson andreas.ericsson@op5.se
>> OP5 AB www.op5.se
>> Tel: +46 8-230225 Fax: +46 8-230231
>>
>> Considering the successes of the wars on alcohol, poverty, drugs and
>> terror, I think we should give some serious thought to declaring war
>> on peace.
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 15:01 ` Matthieu Moy
2010-04-06 15:28 ` Vitaly Berov
2010-04-06 15:29 ` Vitaly
@ 2010-04-06 21:01 ` Nicolas Pitre
2 siblings, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-06 21:01 UTC (permalink / raw)
To: Matthieu Moy; +Cc: Vitaly Berov, git
On Tue, 6 Apr 2010, Matthieu Moy wrote:
> Vitaly Berov <vitaly.berov@gmail.com> writes:
>
> > We have quite a large repository, and "git clone" takes about 6 hours. The
> > "resolving deltas" phase takes most of that time.
> > What does git do at this stage, and how can we optimize it?
>
> Does running "git gc" (long, but done once and for all) on the server
> help?
No, that won't help.
Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 15:32 ` Andreas Ericsson
[not found] ` <q2mec874dac1004060850r5eaa41fak2ba9889d07794651@mail.gmail.com>
@ 2010-04-06 21:05 ` Nicolas Pitre
2010-04-07 9:22 ` git clone: very long "resolving deltas" phase Marat Radchenko
1 sibling, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-06 21:05 UTC (permalink / raw)
To: Andreas Ericsson; +Cc: Vitaly, Matthieu Moy, git
On Tue, 6 Apr 2010, Andreas Ericsson wrote:
> On 04/06/2010 05:29 PM, Vitaly wrote:
> > I didn't try this, but I'll give it a try, thanks.
> >
> > And what does this stage mean?
> >
>
> It means the server is busy creating a packfile to send
> over the wire.
No.
The "Resolving deltas" phase is performed locally: Git expands all the
deltas in the received pack to find the actual SHA1 of each resulting
object, in order to create the pack index.
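To illustrate why every delta has to be expanded first: an object's name is the SHA-1 of a short header plus the object's full content, so the client cannot name an object it has not reconstructed. Below is a minimal Python sketch of that naming rule for blobs. The function name git_blob_id is made up for this illustration and is not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object name Git assigns to a blob with this content."""
    # Git hashes the header "<type> <size>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The same bytes fed to `git hash-object --stdin` should give this name.
    print(git_blob_id(b"hello\n"))
```

For a delta object, `content` is only available after the delta has been applied to its base, which is the work done during this phase.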
Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 15:56 ` Vitaly Berov
@ 2010-04-06 21:09 ` Nicolas Pitre
2010-04-07 5:54 ` Vitaly Berov
2010-04-07 5:55 ` Vitaly
0 siblings, 2 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-06 21:09 UTC (permalink / raw)
To: Vitaly Berov; +Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy
On Tue, 6 Apr 2010, Vitaly Berov wrote:
> Why does git compute checksums on the client side? Isn't it already
> calculated on the "server" side?
Yes. But Git clients can't trust the server like that.
The only way to make sure the server didn't send you crap data, or worse
maliciously altered data, is actually to not transfer any checksum data
but to recompute and validate the received payload locally.
This being said, you should never have to wait 6 hours for that phase to
complete. It is typically a matter of minutes if not seconds.
Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 14:18 git clone: very long "resolving deltas" phase Vitaly Berov
2010-04-06 15:01 ` Matthieu Moy
@ 2010-04-06 21:10 ` Nicolas Pitre
2010-04-07 5:57 ` Vitaly
1 sibling, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-06 21:10 UTC (permalink / raw)
To: Vitaly Berov; +Cc: git
On Tue, 6 Apr 2010, Vitaly Berov wrote:
> We have quite a large repository, and "git clone" takes about 6 hours. The
> "resolving deltas" phase takes most of that time.
This simply makes no sense.
Is this repository publicly clonable?
Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 21:09 ` Nicolas Pitre
@ 2010-04-07 5:54 ` Vitaly Berov
2010-04-07 8:00 ` Ilari Liusvaara
2010-04-07 5:55 ` Vitaly
1 sibling, 1 reply; 39+ messages in thread
From: Vitaly Berov @ 2010-04-07 5:54 UTC (permalink / raw)
To: git; +Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy
I suspected security reasons.
OK, we work in a trusted environment. How can we turn this behavior off?
Vitaly
On 04/07/2010 01:09 AM, Nicolas Pitre wrote:
> On Tue, 6 Apr 2010, Vitaly Berov wrote:
>
>> Why does git compute checksums on the client side? Isn't it already
>> calculated on the "server" side?
>
> Yes. But Git clients can't trust the server like that.
>
> The only way to make sure the server didn't send you crap data, or worse
> maliciously altered data, is actually to not transfer any checksum data
> but to recompute and validate the received payload locally.
>
> This being said, you should never have to wait 6 hours for that phase to
> complete. It is typically a matter of minutes if not seconds.
>
>
> Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 21:09 ` Nicolas Pitre
2010-04-07 5:54 ` Vitaly Berov
@ 2010-04-07 5:55 ` Vitaly
2010-04-07 12:42 ` Nicolas Pitre
1 sibling, 1 reply; 39+ messages in thread
From: Vitaly @ 2010-04-07 5:55 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy
I suspected security reasons.
OK, we work in a trusted environment. How can we turn this behavior off?
On 04/07/2010 01:09 AM, Nicolas Pitre wrote:
> On Tue, 6 Apr 2010, Vitaly Berov wrote:
>
>
>> Why does git compute checksums on the client side? Isn't it already
>> calculated on the "server" side?
>>
> Yes. But Git clients can't trust the server like that.
>
> The only way to make sure the server didn't send you crap data, or worse
> maliciously altered data, is actually to not transfer any checksum data
> but to recompute and validate the received payload locally.
>
> This being said, you should never have to wait 6 hours for that phase to
> complete. It is typically a matter of minutes if not seconds.
>
>
> Nicolas
>
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 21:10 ` Nicolas Pitre
@ 2010-04-07 5:57 ` Vitaly
2010-04-07 12:55 ` Nicolas Pitre
0 siblings, 1 reply; 39+ messages in thread
From: Vitaly @ 2010-04-07 5:57 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git
Hmm, what do you mean by "makes no sense"? It works the way it works: it
takes several hours.
No, we work in a trusted environment. Our repository isn't open to
external people.
On 04/07/2010 01:10 AM, Nicolas Pitre wrote:
> On Tue, 6 Apr 2010, Vitaly Berov wrote:
>
>
>> We have quite a large repository, and "git clone" takes about 6 hours. The
>> "resolving deltas" phase takes most of that time.
>>
> This simply makes no sense.
>
> Is this repository publicly clonable?
>
>
> Nicolas
>
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 5:54 ` Vitaly Berov
@ 2010-04-07 8:00 ` Ilari Liusvaara
2010-04-07 8:14 ` Vitaly
2010-04-07 14:08 ` Nicolas Pitre
0 siblings, 2 replies; 39+ messages in thread
From: Ilari Liusvaara @ 2010-04-07 8:00 UTC (permalink / raw)
To: Vitaly Berov; +Cc: git, Shawn Pearce, Andreas Ericsson, Matthieu Moy
On Wed, Apr 07, 2010 at 09:54:29AM +0400, Vitaly Berov wrote:
> I suspected the security reasons.
>
> Ok, we work in trusted environment. How can we turn this behavior off?
It can't be turned off. The protocol requires the client to recompute
hashes, as they are not explicitly available in the transport stream (they
must be inferred instead).
> >This being said, you should never have to wait 6 hours for that phase to
> >complete. It is typically a matter of minutes if not seconds.
The reasons why it might take 6 hours (offhand from memory):
- Extremely large repo
- Very large files in repo pushing client into swap.
-Ilari
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 8:00 ` Ilari Liusvaara
@ 2010-04-07 8:14 ` Vitaly
2010-04-07 9:00 ` Ilari Liusvaara
` (2 more replies)
2010-04-07 14:08 ` Nicolas Pitre
1 sibling, 3 replies; 39+ messages in thread
From: Vitaly @ 2010-04-07 8:14 UTC (permalink / raw)
To: Ilari Liusvaara; +Cc: git, Shawn Pearce, Andreas Ericsson, Matthieu Moy
Too bad...
Yes, we really have a very large repo with binary files.
So, as far as I understand, the fastest way is to use rsync or something
like it instead of "git clone".
P.S. By the way, how can I request a feature that incorporates hashes into
the transport stream for trusted environments?
On 04/07/2010 12:00 PM, Ilari Liusvaara wrote:
> On Wed, Apr 07, 2010 at 09:54:29AM +0400, Vitaly Berov wrote:
>
>> I suspected the security reasons.
>>
>> Ok, we work in trusted environment. How can we turn this behavior off?
>>
>
> It can't be turned off. Protocol requires client to recompute hashes
> as they are not explicitly available in transport stream (must be inferred
> instead).
>
>
>>> This being said, you should never have to wait 6 hours for that phase to
>>> complete. It is typically a matter of minutes if not seconds.
>>>
> The reasons why it might take 6 hours (offhand from memory):
>
> - Extremely large repo
> - Very large files in repo pushing client into swap.
>
> -Ilari
>
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 8:14 ` Vitaly
@ 2010-04-07 9:00 ` Ilari Liusvaara
2010-04-07 9:37 ` Jakub Narebski
2010-04-07 14:20 ` Nicolas Pitre
2 siblings, 0 replies; 39+ messages in thread
From: Ilari Liusvaara @ 2010-04-07 9:00 UTC (permalink / raw)
To: Vitaly; +Cc: git, Shawn Pearce, Andreas Ericsson, Matthieu Moy
On Wed, Apr 07, 2010 at 12:14:36PM +0400, Vitaly wrote:
> Too bad..
> Yes, we really have a very large repo with binary files.
Large binary files are the worst. I think that disabling deltification
('-delta' as attribute[*]) on them might actually help somewhat...
> P.S. Btw, how can I ask for a feature of incorporating hashes into
> transport stream in trusted environments?
On this mailing list. But as a tip: don't bother. It is by far too
large a change relative to any possible benefit.
[*] I think 'info/attributes' on the server influences whether
deltification is attempted on those objects or not.
-Ilari
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-06 21:05 ` Nicolas Pitre
@ 2010-04-07 9:22 ` Marat Radchenko
2010-04-07 14:40 ` Nicolas Pitre
0 siblings, 1 reply; 39+ messages in thread
From: Marat Radchenko @ 2010-04-07 9:22 UTC (permalink / raw)
To: git
Nicolas Pitre <nico <at> fluxnic.net> writes:
> The "Resolving deltas" is performed locally, when Git is actually
> expanding all the deltas in the received pack to find the actual SHA1 of
> the resulting object in order to create the pack index.
Is there any technical reason why it cannot be done simultaneously with
fetch (piped or whatever), instead of as a separate step after fetch?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 8:14 ` Vitaly
2010-04-07 9:00 ` Ilari Liusvaara
@ 2010-04-07 9:37 ` Jakub Narebski
2010-04-07 14:20 ` Nicolas Pitre
2 siblings, 0 replies; 39+ messages in thread
From: Jakub Narebski @ 2010-04-07 9:37 UTC (permalink / raw)
To: Vitaly; +Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy
Please do not top-post.
Vitaly <vitaly.berov@gmail.com> writes:
> Too bad..
> Yes, we really have a very large repo with binary files.
>
> So, as far as I understand, the fastest way is to use rsync or smth
> like this instead of "git clone".
>
> P.S. Btw, how can I ask for a feature of incorporating hashes into
> transport stream in trusted environments?
If you have very large binary files, perhaps the git-bigfiles fork would
help you: http://caca.zoy.org/wiki/git-bigfiles
--
Jakub Narebski
Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 5:55 ` Vitaly
@ 2010-04-07 12:42 ` Nicolas Pitre
0 siblings, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 12:42 UTC (permalink / raw)
To: Vitaly; +Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy
On Wed, 7 Apr 2010, Vitaly wrote:
> I suspected the security reasons.
>
> Ok, we work in trusted environment. How can we turn this behavior off?
You can't. This is fundamental to the Git native protocol.
Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 5:57 ` Vitaly
@ 2010-04-07 12:55 ` Nicolas Pitre
2010-04-09 6:50 ` Vitaly Berov
2010-04-10 13:25 ` Vitaly Berov
0 siblings, 2 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 12:55 UTC (permalink / raw)
To: Vitaly; +Cc: git
On Wed, 7 Apr 2010, Vitaly wrote:
> Hmm, what does it mean - "makes no sense"? It works as it works - for several
> hours.
As I said, several hours for this operation makes no sense. This should
take minutes if not seconds. *This* is what needs fixing.
> No, we work in a trusted environment. Our repository isn't open for external
> people.
I was asking that because that would have helped me (or any other Git
developer) analyse the issue and provide a fix.
OK then. What happens if you do the following on the server machine
where the repository is stored:
    git repack -a -f -d
How long does this take?
How long does the "Resolving deltas" take when cloning this repacked
repository? (don't wait more than 10 minutes for it).
If the "Resolving deltas" takes more than 10 minutes, could you capture
a strace dump from that process during a minute or so and post it here?
Hmmm. Is this on Linux or Windows?
Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 8:00 ` Ilari Liusvaara
2010-04-07 8:14 ` Vitaly
@ 2010-04-07 14:08 ` Nicolas Pitre
2010-04-07 14:29 ` Sverre Rabbelier
1 sibling, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 14:08 UTC (permalink / raw)
To: Ilari Liusvaara
Cc: Vitaly Berov, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy
On Wed, 7 Apr 2010, Ilari Liusvaara wrote:
> On Wed, Apr 07, 2010 at 09:54:29AM +0400, Vitaly Berov wrote:
> > I suspected the security reasons.
> >
> > Ok, we work in trusted environment. How can we turn this behavior off?
>
> It can't be turned off. Protocol requires client to recompute hashes
> as they are not explicitly available in transport stream (must be inferred
> instead).
>
> > >This being said, you should never have to wait 6 hours for that phase to
> > >complete. It is typically a matter of minutes if not seconds.
>
> The reasons why it might take 6 hours (offhand from memory):
>
> - Extremely large repo
Six hours is still way out of the expected computational requirement.
That's an expected time for an aggressive repack for example, where
_each_ delta is attempted against a different base up to 250 times.
But when indexing a fetched pack, each delta is expected to be computed
only once.
> - Very large files in repo pushing client into swap.
This shouldn't happen since commit 92392b4a, which provides a cap on
memory usage during the delta resolution process.
So without a look at the actual repository causing this pathological
behavior it is hard to guess what the issue and the required fix might
be.
Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 8:14 ` Vitaly
2010-04-07 9:00 ` Ilari Liusvaara
2010-04-07 9:37 ` Jakub Narebski
@ 2010-04-07 14:20 ` Nicolas Pitre
2010-04-07 14:35 ` Vitaly
2 siblings, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 14:20 UTC (permalink / raw)
To: Vitaly; +Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy
On Wed, 7 Apr 2010, Vitaly wrote:
> Too bad..
> Yes, we really have a very large repo with binary files.
>
> So, as far as I understand, the fastest way is to use rsync or smth like this
> instead of "git clone".
You should still be able to use 'git clone' with a rsync:// style URL.
> P.S. Btw, how can I ask for a feature of incorporating hashes into transport
> stream in trusted environments?
As I've been trying to explain repeatedly, this shouldn't be
needed. A real fix for the bad behavior would be in order before
papering over it.
If the large binary blobs are the source of the clone problem, then they
will cause the same problems with other commands such as 'git diff' or
even 'git checkout' later on. So that "feature" you're asking for is
misguided.
What you might try on your client machines is this:
    git config --global core.deltaBaseCacheLimit 256m
before doing a clone.
Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 14:08 ` Nicolas Pitre
@ 2010-04-07 14:29 ` Sverre Rabbelier
2010-04-07 14:37 ` Vitaly
0 siblings, 1 reply; 39+ messages in thread
From: Sverre Rabbelier @ 2010-04-07 14:29 UTC (permalink / raw)
To: Nicolas Pitre, Vitaly Berov
Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy
Heya,
On Wed, Apr 7, 2010 at 09:08, Nicolas Pitre <nico@fluxnic.net> wrote:
> This shouldn't happen since commit 92392b4a, which provides a cap on
> memory usage during the delta resolution process.
Which made me think of asking:
Vitaly, what version of git are you running? Both client and server side please.
--
Cheers,
Sverre Rabbelier
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 14:20 ` Nicolas Pitre
@ 2010-04-07 14:35 ` Vitaly
2010-04-07 14:55 ` Nicolas Pitre
0 siblings, 1 reply; 39+ messages in thread
From: Vitaly @ 2010-04-07 14:35 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy
On 04/07/2010 06:20 PM, Nicolas Pitre wrote:
>> P.S. Btw, how can I ask for a feature of incorporating hashes into transport
>> stream in trusted environments?
>>
> As I'm trying to make you understand repeatedly now, this shouldn't be
> needed. A real fix for the bad behavior would be in order before
> papering over it.
>
Nicolas, my post was written before I received your message about
reproducing and "stracing" the problem. I got your idea and am now
reproducing the problem.
My estimate is tomorrow (repack takes quite a long time).
Vitaly
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 14:29 ` Sverre Rabbelier
@ 2010-04-07 14:37 ` Vitaly
0 siblings, 0 replies; 39+ messages in thread
From: Vitaly @ 2010-04-07 14:37 UTC (permalink / raw)
To: Sverre Rabbelier
Cc: Nicolas Pitre, Ilari Liusvaara, git, Shawn Pearce,
Andreas Ericsson, Matthieu Moy
On 04/07/2010 06:29 PM, Sverre Rabbelier wrote:
> Heya,
>
> On Wed, Apr 7, 2010 at 09:08, Nicolas Pitre<nico@fluxnic.net> wrote:
>
>> This shouldn't happen since commit 92392b4a, which provides a cap on
>> memory usage during the delta resolution process.
>>
> Which made me think of asking:
>
> Vitaly, what version of git are you running? Both client and server side please.
>
git version 1.7.0.4, both sides.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 9:22 ` git clone: very long "resolving deltas" phase Marat Radchenko
@ 2010-04-07 14:40 ` Nicolas Pitre
0 siblings, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 14:40 UTC (permalink / raw)
To: Marat Radchenko; +Cc: git
On Wed, 7 Apr 2010, Marat Radchenko wrote:
> Nicolas Pitre <nico <at> fluxnic.net> writes:
> > The "Resolving deltas" is performed locally, when Git is actually
> > expanding all the deltas in the received pack to find the actual SHA1 of
> > the resulting object in order to create the pack index.
> Is there any technical reason why it cannot be done simultaneously with
> fetch (piped or whatever), instead of as a separate step after fetch?
The non-delta-compressed objects are indexed as they are received on the
wire. However, doing the same for delta objects would be quite suboptimal,
because:
1) The base object needed to resolve a given delta object might not have
   been received yet. In that case the delta has to be resolved later
   anyway, and finding out whether a just-received object might be a base
   for previously received objects is rather costly, and even impossible
   if that potential base object is itself a delta. So it is best to
   figure out the delta dependencies only once, at the end of the transfer.
2) When resolving deep delta chains, it is best to start from the root,
   i.e. create the result from a delta object and then recursively resolve
   all deltas that use this result as their base, rather than expanding
   deltas repeatedly, which would turn this process into exponential CPU
   usage. Again, this can be done only when all delta objects have been
   received.
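The second point can be sketched with a toy model. The Python below is only an illustration under stated assumptions: the "pack" is a plain dict, a "delta" simply appends bytes to its base (real Git deltas are copy/insert instructions), and none of the names come from Git's code. It counts delta applications when every object is expanded on its own versus when resolution starts at the roots.

```python
from collections import defaultdict


def apply_delta(base: bytes, payload: bytes) -> bytes:
    # Stand-in for a real delta: just append the payload to the base.
    return base + payload


def resolve_naive(pack):
    """Expand every object independently, re-walking its whole chain."""
    applications = 0

    def expand(name):
        nonlocal applications
        base, payload = pack[name]
        if base is None:
            return payload
        applications += 1
        return apply_delta(expand(base), payload)

    contents = {name: expand(name) for name in pack}
    return contents, applications


def resolve_root_first(pack):
    """Start from non-delta objects and resolve each delta exactly once."""
    children = defaultdict(list)
    for name, (base, _) in pack.items():
        if base is not None:
            children[base].append(name)

    contents, applications = {}, 0
    # Seed the work list with the full (non-delta) objects.
    stack = [(n, p) for n, (b, p) in pack.items() if b is None]
    while stack:
        name, content = stack.pop()
        contents[name] = content
        for child in children[name]:
            applications += 1
            stack.append((child, apply_delta(content, pack[child][1])))
    return contents, applications


if __name__ == "__main__":
    # One full object followed by a chain of 50 deltas.
    pack = {"o0": (None, b"x")}
    for i in range(1, 51):
        pack[f"o{i}"] = (f"o{i - 1}", b"y")
    print(resolve_naive(pack)[1], resolve_root_first(pack)[1])
```

On this single chain the root-first pass applies each delta once, while the independent expansion repeats the work of every shorter chain. The root-first pass needs the full set of base/child relationships up front, which is why it can only run once the whole pack has arrived.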
Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 14:35 ` Vitaly
@ 2010-04-07 14:55 ` Nicolas Pitre
2010-04-09 6:46 ` Vitaly Berov
0 siblings, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 14:55 UTC (permalink / raw)
To: Vitaly; +Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy
On Wed, 7 Apr 2010, Vitaly wrote:
> Nicolas, my post was written before I received your message about
> reproducing and "stracing" the problem. I got your idea and am now
> reproducing the problem.
No problem.
> My estimate is tomorrow (repack takes quite a long time).
The repack isn't so important. If it takes that long you might simply
interrupt it and strace the client when "resolving deltas" is looking to
be insanely long. In reality it is best if you don't repack as the
client needs to cope with whatever the server throws at it and repacking
your repo might hide the client issue.
Then playing with core.deltaBaseCacheLimit instead would be quite
interesting.
Nicolas
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 14:55 ` Nicolas Pitre
@ 2010-04-09 6:46 ` Vitaly Berov
2010-04-09 19:30 ` Nicolas Pitre
0 siblings, 1 reply; 39+ messages in thread
From: Vitaly Berov @ 2010-04-09 6:46 UTC (permalink / raw)
To: git; +Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy
Hi,
On 04/07/2010 06:55 PM, Nicolas Pitre wrote:
>
> Then playing with core.deltaBaseCacheLimit instead would be quite
> interesting.
It's difficult to play with parameters because the receiving objects
phase alone takes 1.5-2 hours. But I'll try "git config --global
core.deltaBaseCacheLimit 256m" as you recommended.
Vitaly
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-07 12:55 ` Nicolas Pitre
@ 2010-04-09 6:50 ` Vitaly Berov
2010-04-09 8:13 ` Matthieu Moy
2010-04-09 19:25 ` Nicolas Pitre
2010-04-10 13:25 ` Vitaly Berov
1 sibling, 2 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-09 6:50 UTC (permalink / raw)
To: git; +Cc: git
Hi,
On 04/07/2010 04:55 PM, Nicolas Pitre wrote:
>
> I was asking that because that would have helped me (or any other Git
> developer) analyse the issue and provide a fix.
>
> OK then. What happens if you do the following on the server machine
> where the repository is stored:
>
> git repack -a -f -d
>
> How long does this take?
>
> How long does the "Resolving deltas" take when cloning this repacked
> repository? (don't wait more than 10 minutes for it).
Nicolas, we didn't stop the process as you recommended, sorry about that.
So, the results: it took 37 hours. 20 hours were spent compressing objects
(delta compression using up to 4 threads) and 17 hours writing objects.
Almost all of the time the bottleneck was the CPU.
Number of objects: 3997548.
Size of the repository: ~57 GB.
> If the "Resolving deltas" takes more than 10 minutes, could you capture
> a strace dump from that process during a minute or so and post it here?
I'll capture strace later.
> Hmmm. Is this on Linux or Windows?
Short spec: Ubuntu 9.04 (64 bit), Intel(R) Core(TM)2 Quad CPU Q8400
2.66GHz, 8 GB of memory
By the way, we have a large number of binary files in our repo.
Vitaly
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-09 6:50 ` Vitaly Berov
@ 2010-04-09 8:13 ` Matthieu Moy
2010-04-09 19:18 ` Nicolas Pitre
2010-04-10 8:05 ` Vitaly Berov
2010-04-09 19:25 ` Nicolas Pitre
1 sibling, 2 replies; 39+ messages in thread
From: Matthieu Moy @ 2010-04-09 8:13 UTC (permalink / raw)
To: Vitaly Berov; +Cc: git
Vitaly Berov <vitaly.berov@gmail.com> writes:
> Objects amount: 3997548.
> Size of the repository: ~57Gb.
[...]
> By the way, we have a large amount of binary files in our rep.
This is clearly not the kind of repository Git is good at. I
encourage you to continue this discussion, and try to find a way to
get it working, but the standard approach (probably a "my 2 cents"
kind of advice, but ...) would be:
* Split your repo into smaller ones (submodules ...)
* Avoid versioning binary files
--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: git clone: very long "resolving deltas" phase
2010-04-09 8:13 ` Matthieu Moy
@ 2010-04-09 19:18 ` Nicolas Pitre
2010-04-10 8:05 ` Vitaly Berov
1 sibling, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-09 19:18 UTC (permalink / raw)
To: Matthieu Moy; +Cc: Vitaly Berov, git
On Fri, 9 Apr 2010, Matthieu Moy wrote:
> Vitaly Berov <vitaly.berov@gmail.com> writes:
>
> > Objects amount: 3997548.
> > Size of the repository: ~57Gb.
> [...]
> > By the way, we have a large amount of binary files in our rep.
>
> This is clearly not the kind of repository Git is good at. I
> encourage you to continue this discussion, and try to find a way to
> get it working, but the standard approach (probably a "my 2 cents"
> kind of advice, but ...) would be:
>
> * Split your repo into smaller ones (submodules ...)
>
> * Avoid versioning binary files
I still think that Git ought to "just work" with such a repository.
There are things that should be done for that, like applying the
core.bigFileThreshold configuration variable to more places, such as
delta compression, object creation, diff generation, etc.
Of course Git won't be as good at saving disk space in that case, but
when your repo is 57GB you probably don't care much if it grows to 80GB
but cloning it is twice as fast.
Yet, I still don't think the current issue with the receiving end of a
clone taking 6 hours in "Resolving deltas" is normal, independently of
core.bigFileThreshold.
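A minimal sketch of setting core.bigFileThreshold, for readers on a current Git where the variable is documented as the size above which files are stored without attempting delta compression. The 4m value and the throwaway repository are assumptions made for illustration; nothing in this thread recommends a specific number.

```shell
# Throwaway repository so the setting does not touch any real configuration.
bft_demo=$(mktemp -d)
git init -q "$bft_demo"

# 4m is a hypothetical threshold chosen only for this example.
git -C "$bft_demo" config core.bigFileThreshold 4m

# Read the value back to confirm what the repository will use.
big_file_threshold=$(git -C "$bft_demo" config --get core.bigFileThreshold)
echo "core.bigFileThreshold = $big_file_threshold"
```

A lower threshold trades disk space and transfer size for less delta work, which is the tradeoff described in the message above.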
Nicolas
* Re: git clone: very long "resolving deltas" phase
2010-04-09 6:50 ` Vitaly Berov
2010-04-09 8:13 ` Matthieu Moy
@ 2010-04-09 19:25 ` Nicolas Pitre
2010-04-10 7:58 ` Vitaly Berov
1 sibling, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-09 19:25 UTC (permalink / raw)
To: Vitaly Berov; +Cc: git
On Fri, 9 Apr 2010, Vitaly Berov wrote:
> > OK then. What happens if you do the following on the server machine
> > where the repository is stored:
> >
> > git repack -a -f -d
> >
> > How long does this take?
>
> So, the results: it took 37 hours. 20 hours is compressing objects (delta
> compression using up to 4 threads), 17 hours is writing objects. Almost all of
> the time the bottleneck was a CPU.
>
> Objects amount: 3997548.
> Size of the repository: ~57Gb.
OK. You probably have a size record. :-)
How big is the .pack file in .git/objects/pack/ ?
> By the way, we have a large amount of binary files in our rep.
How many? How big?
Nicolas
* Re: git clone: very long "resolving deltas" phase
2010-04-09 6:46 ` Vitaly Berov
@ 2010-04-09 19:30 ` Nicolas Pitre
2010-04-10 6:32 ` Vitaly Berov
0 siblings, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-09 19:30 UTC (permalink / raw)
To: Vitaly Berov
Cc: git, Ilari Liusvaara, Shawn Pearce, Andreas Ericsson, Matthieu Moy
On Fri, 9 Apr 2010, Vitaly Berov wrote:
> Hi,
>
> On 04/07/2010 06:55 PM, Nicolas Pitre wrote:
> >
> > Then playing with core.deltaBaseCacheLimit instead would be quite
> > interesting.
> It's difficult to play with parameters because only receiving objects phase
> takes 1.5-2 hours.
Huh... I guess that's over 100Mbps ethernet?
57GB / 1.5h -> approx 10MB/s
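The estimate above can be reproduced with shell integer arithmetic. This is only a sketch: it assumes 1GB = 1024MB and the lower bound of 1.5 hours, and the variable names are illustrative.

```shell
# Rough throughput implied by the figures quoted in this message.
repo_mb=$(( 57 * 1024 ))                       # ~57GB expressed in MB
transfer_seconds=$(( 90 * 60 ))                # 1.5 hours in seconds
rate_mb_s=$(( repo_mb / transfer_seconds ))    # integer division, rounds down
echo "approx ${rate_mb_s} MB/s"
```

A 100Mbps link carries at most 12.5MB/s, so a sustained rate of about 10MB/s is close to line rate for such a network.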
Nicolas
* Re: git clone: very long "resolving deltas" phase
2010-04-09 19:30 ` Nicolas Pitre
@ 2010-04-10 6:32 ` Vitaly Berov
0 siblings, 0 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-10 6:32 UTC (permalink / raw)
To: git; +Cc: git, Ilari Liusvaara, Shawn Pearce, Andreas Ericsson, Matthieu Moy
On 04/09/2010 11:30 PM, Nicolas Pitre wrote:
> On Fri, 9 Apr 2010, Vitaly Berov wrote:
>
>> Hi,
>>
>> On 04/07/2010 06:55 PM, Nicolas Pitre wrote:
>>>
>>> Then playing with core.deltaBaseCacheLimit instead would be quite
>>> interesting.
>> It's difficult to play with parameters because only receiving objects phase
>> takes 1.5-2 hours.
>
> Huh... I guess that's over 100Mbps ethernet?
>
> 57GB / 1.5h -> approx 10MB/s
Yes
Vitaly
* Re: git clone: very long "resolving deltas" phase
2010-04-09 19:25 ` Nicolas Pitre
@ 2010-04-10 7:58 ` Vitaly Berov
0 siblings, 0 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-10 7:58 UTC (permalink / raw)
To: git
On 04/09/2010 11:25 PM, Nicolas Pitre wrote:
>>
>> Objects amount: 3997548.
>> Size of the repository: ~57Gb.
>
> OK. You probably have a size record. :-)
That's game development. We have ~100 artists who produce text and
binary files as "sources". FYI, the "end client version" is ~2.5GB.
> How big is the .pack file in .git/objects/pack/ ?
~56Gb
>
>> By the way, we have a large amount of binary files in our rep.
>
> How many? How big?
Total number of files ~400000, number of binaries ~200000.
Distribution of sizes: 5% of 4M - 32K, 5% of 32K - 12K, 12K - 6K, 6K -
4K, 4K - 2.5K, 2.5K - 2.3K, 2.3K - 2K, the rest
Vitaly
P.S. By the way, msysgit can't handle this repository, blocker bug is:
http://code.google.com/p/msysgit/issues/detail?id=365&q=mmap&colspec=ID%20Type%20Status%20Priority%20Component%20Owner%20Summary.
So I'm thinking about stopping the evaluation, though I like git
(especially after a long subversion experience :))
* Re: git clone: very long "resolving deltas" phase
2010-04-09 8:13 ` Matthieu Moy
2010-04-09 19:18 ` Nicolas Pitre
@ 2010-04-10 8:05 ` Vitaly Berov
1 sibling, 0 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-10 8:05 UTC (permalink / raw)
To: git
On 04/09/2010 12:13 PM, Matthieu Moy wrote:
> Vitaly Berov<vitaly.berov@gmail.com> writes:
>
>> Objects amount: 3997548.
>> Size of the repository: ~57Gb.
> [...]
>> By the way, we have a large amount of binary files in our rep.
>
> This is clearly not the kind of repositories Git is good at.
Hmm.. I'm looking for a good repository because I'm tired of subversion,
and Perforce isn't an option either (very expensive and even more
uncomfortable). It seems like only Git/Mercurial are good options.
Can you recommend some other SCMs?
> I encourage you to continue this discussion, and try to find a way to
> get it working, but the standard approach (probably a "my 2 cents"
> kind of advices, but ...) would be:
>
> * Split your repo into smaller ones (submodules ...)
>
> * Avoid versionning binary files
I can't get rid of binary files because they are the "sources" of our
artists' work (they develop a game).
Splitting a repo can be an option, but it's very inconvenient for us.
Vitaly
* Re: git clone: very long "resolving deltas" phase
2010-04-07 12:55 ` Nicolas Pitre
2010-04-09 6:50 ` Vitaly Berov
@ 2010-04-10 13:25 ` Vitaly Berov
2010-04-11 0:50 ` Nicolas Pitre
1 sibling, 1 reply; 39+ messages in thread
From: Vitaly Berov @ 2010-04-10 13:25 UTC (permalink / raw)
To: git
Hi,
On 04/07/2010 04:55 PM, Nicolas Pitre wrote:
> On Wed, 7 Apr 2010, Vitaly wrote:
>
>
> OK then. What happens if you do the following on the server machine
> where the repository is stored:
>
> git repack -a -f -d
>
> How long does this take?
>
> If the "Resolving deltas" takes more than 10 minutes, could you capture
> a strace dump from that process during a minute or so and post it here?
Nicolas, I took strace and sent it to you personally.
Here is the extract (99% of strace is the same):
--------------
access("/home/vitaly/Projects/test/a1/.git/objects/0f/9a3d28766f8b767fb64166139dd65c079512de", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\324\275y\\Ni\374\377\177\256\323]\335Q\271S\332\220\"\n\241\10Q\10!$!d/\262"..., 214850, 8944159649) = 214850
access("/home/vitaly/Projects/test/a1/.git/objects/a5/5430cbc6674b56d7c2d2d81ef5b7d5c8ebdec8", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\354\275\vT\224U\0270<\347\231\v\363\250\244#\f0\"\"\"\312ETD\300af"..., 159502, 8944374506) = 159502
access("/home/vitaly/Projects/test/a1/.git/objects/e5/02b7d050d1b81ebc256234e303eac17116c9fb", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\324\274yX\24G\3607>\3353\263,\310\342\"7\2.\202\342\1\10\212\210\212\10\236x\341"..., 61131, 8944534014) = 61131
access("/home/vitaly/Projects/test/a1/.git/objects/5b/6bdba61771e5ba63ba8b43659db1612345c2eb", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\324\275\tX\216Y\3747~\237\323\366DOJiyR\236\210B*$!\311\276\223=[\n"..., 236685, 8944595152) = 236685
-----------------
To me, it looks very suspicious.
Vitaly
* Re: git clone: very long "resolving deltas" phase
2010-04-10 13:25 ` Vitaly Berov
@ 2010-04-11 0:50 ` Nicolas Pitre
2010-04-12 15:31 ` Vitaly
0 siblings, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-11 0:50 UTC (permalink / raw)
To: Vitaly Berov; +Cc: git
On Sat, 10 Apr 2010, Vitaly Berov wrote:
> Hi,
>
> On 04/07/2010 04:55 PM, Nicolas Pitre wrote:
> > On Wed, 7 Apr 2010, Vitaly wrote:
> >
> >
> > OK then. What happens if you do the following on the server machine
> > where the repository is stored:
> >
> > git repack -a -f -d
> >
> > How long does this take?
> >
> > If the "Resolving deltas" takes more than 10 minutes, could you capture
> > a strace dump from that process during a minute or so and post it here?
>
> Nicolas, I took strace and sent it to you personally.
>
> Here is the extract (99% of strace is the same):
> --------------
> access("/home/vitaly/Projects/test/a1/.git/objects/0f/9a3d28766f8b767fb64166139dd65c079512de", F_OK) = -1 ENOENT (No such file or directory)
> pread(5, "x\234\324\275y\\Ni\374\377\177\256\323]\335Q\271S\332\220\"\n\241\10Q\10!$!d/\262"..., 214850, 8944159649) = 214850
> access("/home/vitaly/Projects/test/a1/.git/objects/a5/5430cbc6674b56d7c2d2d81ef5b7d5c8ebdec8", F_OK) = -1 ENOENT (No such file or directory)
> pread(5, "x\234\354\275\vT\224U\0270<\347\231\v\363\250\244#\f0\"\"\"\312ETD\300af"..., 159502, 8944374506) = 159502
> access("/home/vitaly/Projects/test/a1/.git/objects/e5/02b7d050d1b81ebc256234e303eac17116c9fb", F_OK) = -1 ENOENT (No such file or directory)
> pread(5, "x\234\324\274yX\24G\3607>\3353\263,\310\342\"7\2.\202\342\1\10\212\210\212\10\236x\341"..., 61131, 8944534014) = 61131
> access("/home/vitaly/Projects/test/a1/.git/objects/5b/6bdba61771e5ba63ba8b43659db1612345c2eb", F_OK) = -1 ENOENT (No such file or directory)
> pread(5, "x\234\324\275\tX\216Y\3747~\237\323\366DOJiyR\236\210B*$!\311\276\223=[\n"..., 236685, 8944595152) = 236685
> -----------------
> As for me, it looks very suspicious.
It isn't. The pread() is performed for each delta object within the
received pack to be resolved, and then the access() is performed to make
sure the resolved delta doesn't match an object in loose form with the
same hash. Of course deltas are recursive, meaning that a delta might
refer to a base object which is itself a delta, and so on. And yet a
base object might have many delta objects referring to it. So without a
smart delta resolution ordering and caching, we'd end up with an
exponential number of pread calls. However the cache size is limited to
avoid memory exhaustion with deep and wide delta trees (that's the
core.deltaBaseCacheLimit config variable).
So from that strace capture you sent me, we can get:
$ grep access strace.txt | wc -l
3925
$ grep pread strace.txt | wc -l
4095
$ grep pread strace.txt | sort | uniq -d | wc -l
75
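The three pipelines above can be tried on a small stand-in for strace.txt. The sample lines below are made up for illustration and are far simpler than real strace output; the point is only that `sort | uniq -d` prints one copy of each line that occurs more than once, so counting its output gives the number of distinct reads that were repeated.

```shell
# Tiny stand-in for strace.txt; these lines are invented for the example.
sample_trace=$(mktemp)
cat > "$sample_trace" <<'EOF'
access("objects/aa/01", F_OK) = -1 ENOENT
pread(5, "x"..., 100, 1000) = 100
access("objects/bb/02", F_OK) = -1 ENOENT
pread(5, "x"..., 200, 2000) = 200
access("objects/cc/03", F_OK) = -1 ENOENT
pread(5, "x"..., 100, 1000) = 100
pread(5, "x"..., 300, 3000) = 300
EOF

# One access() per resolved delta, one pread() per object read.
access_calls=$(grep -c access "$sample_trace")
pread_calls=$(grep -c pread "$sample_trace")

# Identical pread() lines mean the same pack offset was read again.
repeated_reads=$(grep pread "$sample_trace" | sort | uniq -d | wc -l | tr -d ' ')

echo "$access_calls access, $pread_calls pread, $repeated_reads read more than once"
rm -f "$sample_trace"
```

Switching `uniq -d` to `uniq -cd` and sorting numerically, as done further down in this message, ranks the repeated reads by how often they occurred.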
So, given 3925 deltas to process, only 4095 objects were read, which is
not too bad. Still, 75 of them were read more than once, which means
they were evicted from the cache while they were still needed. The
core.deltaBaseCacheLimit could be increased to avoid those 75
duplicates. Let's have a look at a few of them:
$ grep pread strace.txt | sort | uniq -cd | sort -nr
20 pread(5, "x\234\354\275w\234\34\305\3210<;\263;\233\357nf\357v/I{\312\243\333=\235t\247p'"..., 1265653, 504922895) = 1265653
20 pread(5, "x\234\254}\7|\24E\373\377\315\354f\357r\227v\227\344.\275A`\271\\\200\20:\204^\245\203"..., 264956, 506188555) = 264956
6 pread(5, "x\234\274}\7xT\305\366\370\336\231\335\273\273\251\354f\263\233\36:\227d\3\201@ \224PB\257"..., 253102, 49016172335) = 253102
6 pread(5, "x\234\274\275\7\224\34\305\3618<;\263;\263\341\322\356\336\355^\222N\361\30\335\355)\235\20w\2"..., 506683, 48982212429) = 506683
6 pread(5, "x\234\254}\7x\34\305\25\360\336\336\335^Q\275;\351N\262d\313M\362\372tr\2231\226\1\27"..., 402609, 49245906707) = 402609
6 pread(5, "x\234\254\275\t|\24\305\3628\2763\323;\273I6\t\331lvs\21B\270\206\315&\1\2H@"..., 176754, 49246749832) = 176754
6 pread(5, "x\234\234}\7|T\305\366\377\336\331\315\246P\23H\205$t\226$t)\t\322\244\367\5\5\5\244"..., 236257, 49246513568) = 236257
6 pread(5, "x\234\224}\7|TE\360\377\355\356\225t\270\224K\207$@r\244\2\241\205\320{\21\10 \35\351"..., 204238, 49246309323) = 204238
5 pread(5, "x\234\264\275\7xT\305\0277\274\367\316\335\273%u7\311n*!\204rI6\t$\20\10\275\211"..., 233622, 49247108828) = 233622
5 pread(5, "x\234\254\275\7|T\305\363\0~\357\275\275w\227\334%\341\222\334]zB \341\270\\\n\204\26:"..., 182228, 49246926593) = 182228
5 pread(5, "x\234\234\274\5x\24\327\32\377\177\316\314fvf\26\22\226@\2\4\222l\4\"\33 \4\v\20\10"..., 70234, 49247342456) = 70234
4 pread(5, "x\234m{\t\\TU\373\377\3343\\\34\206a\270s/\273\10\303* \340 \240\250\250\203\"\232"..., 9345, 49425395631) = 9345
4 pread(5, "x\234-\326\177P\323e\34\300\3619`\342T\4\324)\23\1'\242\241\233c\300D:)~\232\250"..., 1211, 49248626398) = 1211
4 pread(5, "x\234\314\275w\234\24E\363\7\274\323\263;\273\227o\367\356v\357\270\304\35\341\3662p\204;r\16"..., 149400, 49425246225) = 149400
4 pread(5, "x\234\274\275\7|\34\305\3658\276\345n\257J\362\355Iw\262e[r_K'7al\31\343\2"..., 549072, 38602368468) = 549072
So... the first two objects are clearly a problem, as they are re-loaded
over and over. Given that their offsets are far away from the others, i.e.
relatively close to the beginning of the pack, they are probably quite high
in their delta hierarchy. And what's really bad is to see them at the
beginning of 10 pread() calls in a row, meaning that an entire delta
chain has to be replayed in order to get back all those base objects
that were evicted from the cache. That's clearly wasted CPU cycles, and
it shouldn't happen with a large enough value for
core.deltaBaseCacheLimit. Given that your files are "relatively" small,
i.e. in the 4MB range at most, the cache should be able to hold quite
a few of them. At the moment, with its 16MB limit, only a few of those
objects are enough to quickly evict many others from the cache.
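A minimal sketch of raising core.deltaBaseCacheLimit. The 256m value is an assumption picked for illustration (the message only establishes that 16MB is too small for this workload), and the throwaway repository exists only so the example does not touch real configuration.

```shell
# Throwaway repository used purely to demonstrate the setting.
cache_demo=$(mktemp -d)
git init -q "$cache_demo"

# 256m is a hypothetical value; tune it to the memory available on the client.
git -C "$cache_demo" config core.deltaBaseCacheLimit 256m

# Read the value back to confirm it was stored.
delta_cache_limit=$(git -C "$cache_demo" config --get core.deltaBaseCacheLimit)
echo "core.deltaBaseCacheLimit = $delta_cache_limit"
```

Since the delta resolution happens on the receiving side, the setting has to be in effect on the machine doing the clone. With a Git recent enough to support `git clone -c`, it can be passed up front as `git clone -c core.deltaBaseCacheLimit=256m <url>`; otherwise it can be set with `git config --global` before cloning.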
If this is still not good enough, then you could add a negative delta
attribute to those large binary files (see
http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html)
and repack the repository on the server. Of course that will make the
repository larger and the data transfer longer when cloning, but the
"resolving deltas" will be much faster. This is therefore a tradeoff.
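A sketch of what the negative delta attribute could look like. The `*.psd` and `*.tga` patterns and the file name `texture.psd` are hypothetical stand-ins for the real binary assets, and the demo runs in a throwaway repository.

```shell
# Throwaway repository with a working tree, for demonstration only.
attr_demo=$(mktemp -d)
git init -q "$attr_demo"

# Hypothetical patterns; substitute the extensions of the actual binary files.
printf '%s\n' '*.psd -delta' '*.tga -delta' > "$attr_demo/.gitattributes"

# check-attr reports how Git resolves the attribute for a given path.
delta_attr=$(git -C "$attr_demo" check-attr delta -- texture.psd)
echo "$delta_attr"

# On the real repository, a full repack would then rewrite the pack
# without deltas for the matching files:
#   git repack -a -d -f
```

A bare server repository has no working tree, so there the same patterns would go into the repository's `info/attributes` file instead of `.gitattributes`.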
Another solution which might be way more practical for users of such a
huge repository is simply to use a shallow clone. Surely those people
cloning this repository might not need the full history of the
repository. So you could simply use:
git clone --depth=10 ...
and have only the last 10 revisions transferred. Later on the
repository can be deepened by passing the --depth argument with a larger
value to the fetch command if need be.
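The clone-then-deepen sequence can be tried end to end on a scratch repository. This is a sketch under stated assumptions: the 15 empty commits, the `demo` identity and the depths 10 and 12 are all invented for the example, and a reasonably recent Git is assumed for `git -C` and `rev-list --count`.

```shell
# Scratch source repository with a short linear history.
shallow_src=$(mktemp -d)
shallow_dst=$(mktemp -d)
git init -q "$shallow_src"
for i in $(seq 1 15); do
    git -C "$shallow_src" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "commit $i"
done

# file:// forces the regular transport; --depth is ignored for plain local paths.
git clone -q --depth=10 "file://$shallow_src" "$shallow_dst/clone"
shallow_count=$(git -C "$shallow_dst/clone" rev-list --count HEAD)

# Deepen the history later by fetching with a larger --depth.
git -C "$shallow_dst/clone" fetch -q --depth=12
deepened_count=$(git -C "$shallow_dst/clone" rev-list --count HEAD)

echo "after clone: $shallow_count commits, after fetch: $deepened_count commits"
```

On a linear history the clone holds exactly the requested number of commits; with merges in the history, the count can be higher because depth is measured along each line of ancestry.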
Nicolas
* Re: git clone: very long "resolving deltas" phase
2010-04-11 0:50 ` Nicolas Pitre
@ 2010-04-12 15:31 ` Vitaly
0 siblings, 0 replies; 39+ messages in thread
From: Vitaly @ 2010-04-12 15:31 UTC (permalink / raw)
To: git
Hello,
On 04/11/2010 04:50 AM, Nicolas Pitre wrote:
> core.deltaBaseCacheLimit. Given that your files are "relatively" small
> i.e. in the 4MB range max, then the cache should be able to hold quite
> many of them. At the moment with its 16MB limit, only a few of those
> objects would evict many objects from the cache quickly.
>
> If this is still not good enough, then you could add a negative delta
> attribute to those large binary files (see
> http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html)
> and repack the repository on the server. Of course that will make the
> repository larger and the data transfer longer when cloning, but the
> "resolving deltas" will be much faster. This is therefore a tradeoff.
>
> Another solution which might be way more practical for users of such a
> huge repository is simply to use a shallow clone. Surely those people
> cloning this repository might not need the full history of the
> repository. So you could simply use:
>
> git clone --depth=10 ...
>
> and have only the last 10 revisions transferred. Later on the
> repository can be deepened by passing the --depth argument with a larger
> value to the fetch command if need be.
>
>
> Nicolas
>
>
Thanks for the comprehensive answer, Nicolas. Now I see 3 directions to work
on: cacheLimit, negative delta attributes and shortening the history
(actually, I don't think "clone --depth" is feasible in our environment,
but we can try to back up and just purge the history).
Thread overview: 39+ messages
2010-04-06 14:18 git clone: very long "resolving deltas" phase Vitaly Berov
2010-04-06 15:01 ` Matthieu Moy
2010-04-06 15:28 ` Vitaly Berov
2010-04-06 15:29 ` Vitaly
2010-04-06 15:32 ` Andreas Ericsson
[not found] ` <q2mec874dac1004060850r5eaa41fak2ba9889d07794651@mail.gmail.com>
2010-04-06 15:56 ` Vitaly Berov
2010-04-06 21:09 ` Nicolas Pitre
2010-04-07 5:54 ` Vitaly Berov
2010-04-07 8:00 ` Ilari Liusvaara
2010-04-07 8:14 ` Vitaly
2010-04-07 9:00 ` Ilari Liusvaara
2010-04-07 9:37 ` Jakub Narebski
2010-04-07 14:20 ` Nicolas Pitre
2010-04-07 14:35 ` Vitaly
2010-04-07 14:55 ` Nicolas Pitre
2010-04-09 6:46 ` Vitaly Berov
2010-04-09 19:30 ` Nicolas Pitre
2010-04-10 6:32 ` Vitaly Berov
2010-04-07 14:08 ` Nicolas Pitre
2010-04-07 14:29 ` Sverre Rabbelier
2010-04-07 14:37 ` Vitaly
2010-04-07 5:55 ` Vitaly
2010-04-07 12:42 ` Nicolas Pitre
2010-04-06 21:05 ` Nicolas Pitre
2010-04-07 9:22 ` git clone: very long "resolving deltas" phase Marat Radchenko
2010-04-07 14:40 ` Nicolas Pitre
2010-04-06 21:01 ` git clone: very long "resolving deltas" phase Nicolas Pitre
2010-04-06 21:10 ` Nicolas Pitre
2010-04-07 5:57 ` Vitaly
2010-04-07 12:55 ` Nicolas Pitre
2010-04-09 6:50 ` Vitaly Berov
2010-04-09 8:13 ` Matthieu Moy
2010-04-09 19:18 ` Nicolas Pitre
2010-04-10 8:05 ` Vitaly Berov
2010-04-09 19:25 ` Nicolas Pitre
2010-04-10 7:58 ` Vitaly Berov
2010-04-10 13:25 ` Vitaly Berov
2010-04-11 0:50 ` Nicolas Pitre
2010-04-12 15:31 ` Vitaly