* git clone: very long "resolving deltas" phase
@ 2010-04-06 14:18 Vitaly Berov
  2010-04-06 15:01 ` Matthieu Moy
  2010-04-06 21:10 ` Nicolas Pitre
  0 siblings, 2 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-06 14:18 UTC (permalink / raw)
  To: git

We have quite a large repository and "git clone" takes about 6 hours, with
"resolving deltas" taking most of the time.
What does git do at this stage, and how can we optimize it?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 14:18 git clone: very long "resolving deltas" phase Vitaly Berov
@ 2010-04-06 15:01 ` Matthieu Moy
  2010-04-06 15:28   ` Vitaly Berov
                     ` (2 more replies)
  2010-04-06 21:10 ` Nicolas Pitre
  1 sibling, 3 replies; 39+ messages in thread
From: Matthieu Moy @ 2010-04-06 15:01 UTC (permalink / raw)
  To: Vitaly Berov; +Cc: git

Vitaly Berov <vitaly.berov@gmail.com> writes:

> We have quite a large repository and "git clone" takes about 6 hours, with
> "resolving deltas" taking most of the time.
> What does git do at this stage, and how can we optimize it?

Does running "git gc" (long, but done once and for all) on the server
help?

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 15:01 ` Matthieu Moy
@ 2010-04-06 15:28   ` Vitaly Berov
  2010-04-06 15:29   ` Vitaly
  2010-04-06 21:01   ` git clone: very long "resolving deltas" phase Nicolas Pitre
  2 siblings, 0 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-06 15:28 UTC (permalink / raw)
  To: git; +Cc: git

On 04/06/2010 07:01 PM, Matthieu Moy wrote:
> Vitaly Berov<vitaly.berov@gmail.com>  writes:
>
>> We have quite a large repository and "git clone" takes about 6 hours, with
>> "resolving deltas" taking most of the time.
>> What does git do at this stage, and how can we optimize it?
>
> Does running "git gc" (long, but done once and for all) on the server
> help?
>
Didn't try this one, but I'll give it a try, thanks.

And what does this stage do?

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 15:01 ` Matthieu Moy
  2010-04-06 15:28   ` Vitaly Berov
@ 2010-04-06 15:29   ` Vitaly
  2010-04-06 15:32     ` Andreas Ericsson
  2010-04-06 21:01   ` git clone: very long "resolving deltas" phase Nicolas Pitre
  2 siblings, 1 reply; 39+ messages in thread
From: Vitaly @ 2010-04-06 15:29 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

I didn't try this, but I'll give it a try, thanks.

And what does this stage mean?

On 04/06/2010 07:01 PM, Matthieu Moy wrote:
> Vitaly Berov<vitaly.berov@gmail.com>  writes:
>
>    
>> We have quite a large repository and "git clone" takes about 6 hours, with
>> "resolving deltas" taking most of the time.
>> What does git do at this stage, and how can we optimize it?
>>      
> Does running "git gc" (long, but done once and for all) on the server
> help?
>
>    

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 15:29   ` Vitaly
@ 2010-04-06 15:32     ` Andreas Ericsson
       [not found]       ` <q2mec874dac1004060850r5eaa41fak2ba9889d07794651@mail.gmail.com>
  2010-04-06 21:05       ` Nicolas Pitre
  0 siblings, 2 replies; 39+ messages in thread
From: Andreas Ericsson @ 2010-04-06 15:32 UTC (permalink / raw)
  To: Vitaly; +Cc: Matthieu Moy, git

On 04/06/2010 05:29 PM, Vitaly wrote:
> I didn't try this, but I'll give it a try, thanks.
> 
> And what does this stage mean?
> 

It means the server is busy creating a packfile to send
over the wire. If you pack the repository before cloning
from it, deltas from the packfile will simply be copied
into the new pack. This will provide a huge speedboost,
so make sure to repack the repository on the server every
once in a while.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

* Re: git clone: very long "resolving deltas" phase
       [not found]       ` <q2mec874dac1004060850r5eaa41fak2ba9889d07794651@mail.gmail.com>
@ 2010-04-06 15:56         ` Vitaly Berov
  2010-04-06 21:09           ` Nicolas Pitre
  0 siblings, 1 reply; 39+ messages in thread
From: Vitaly Berov @ 2010-04-06 15:56 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Andreas Ericsson, git, Matthieu Moy

Why does git compute checksums on the client side? Aren't they already
calculated on the "server" side?

2010/4/6 Shawn Pearce <spearce@spearce.org>:
> Nope, the resolving deltas phase is about computing the checksums of each
> object on the client side of the connection.  Repacking the server might
> have little impact on this phase, other than maybe to reduce the size and
> thus the disk io required to scan the entire pack.
>
> On Apr 6, 2010 9:32 AM, "Andreas Ericsson" <ae@op5.se> wrote:
>> On 04/06/2010 05:29 PM, Vitaly wrote:
>>> I didn't try this, but I'll give it a try, thanks.
>>>
>>> And what does this stage mean?
>>>
>>
>> It means the server is busy creating a packfile to send
>> over the wire. If you pack the repository before cloning
>> from it, deltas from the packfile will simply be copied
>> into the new pack. This will provide a huge speedboost,
>> so make sure to repack the repository on the server every
>> once in a while.
>>
>> --
>> Andreas Ericsson andreas.ericsson@op5.se
>> OP5 AB www.op5.se
>> Tel: +46 8-230225 Fax: +46 8-230231
>>
>> Considering the successes of the wars on alcohol, poverty, drugs and
>> terror, I think we should give some serious thought to declaring war
>> on peace.
>> --
>> To unsubscribe from this list: send the line "unsubscribe git" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 15:01 ` Matthieu Moy
  2010-04-06 15:28   ` Vitaly Berov
  2010-04-06 15:29   ` Vitaly
@ 2010-04-06 21:01   ` Nicolas Pitre
  2 siblings, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-06 21:01 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Vitaly Berov, git

On Tue, 6 Apr 2010, Matthieu Moy wrote:

> Vitaly Berov <vitaly.berov@gmail.com> writes:
> 
> > We have quite a large repository and "git clone" takes about 6 hours, with
> > "resolving deltas" taking most of the time.
> > What does git do at this stage, and how can we optimize it?
> 
> Does running "git gc" (long, but done once and for all) on the server
> help?

No, that won't help.


Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 15:32     ` Andreas Ericsson
       [not found]       ` <q2mec874dac1004060850r5eaa41fak2ba9889d07794651@mail.gmail.com>
@ 2010-04-06 21:05       ` Nicolas Pitre
  2010-04-07  9:22         ` git clone: very long "resolving deltas" phase Marat Radchenko
  1 sibling, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-06 21:05 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Vitaly, Matthieu Moy, git

On Tue, 6 Apr 2010, Andreas Ericsson wrote:

> On 04/06/2010 05:29 PM, Vitaly wrote:
> > I didn't try this, but I'll give it a try, thanks.
> > 
> > And what does this stage mean?
> > 
> 
> It means the server is busy creating a packfile to send
> over the wire.

No.

The "Resolving deltas" is performed locally, when Git is actually 
expanding all the deltas in the received pack to find the actual SHA1 of 
the resulting object in order to create the pack index.


Nicolas
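
To illustrate what "expanding all the deltas" involves, here is a minimal Python sketch (not Git's own code) of applying a single delta. It assumes the delta layout described in Git's pack-format documentation: two base-128 size headers (base length, then result length), followed by opcodes where a byte with the high bit set copies a range out of the base object, and a byte from 1 to 127 inserts that many literal bytes. Only after this expansion can the client hash the result to learn the object's SHA-1.

```python
from typing import Tuple


def read_size(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a little-endian base-128 size; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear: this was the last size byte
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild a target object from its base and a Git-style delta."""
    src_size, pos = read_size(delta, 0)
    if src_size != len(base):
        raise ValueError("delta was made against a different base")
    dst_size, pos = read_size(delta, pos)

    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy: bits 0-3 say which offset bytes follow, bits 4-6 which size bytes.
            offset = size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (1 << (4 + i)):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            size = size or 0x10000  # an encoded size of 0 means 64 KiB
            if offset + size > len(base):
                raise ValueError("copy runs past the end of the base")
            out += base[offset:offset + size]
        elif cmd:
            # Insert: the next `cmd` bytes are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != dst_size:
        raise ValueError("result has the wrong size")
    return bytes(out)
```

For example, with base b"hello world", the delta bytes([0x0b, 0x09, 0x90, 0x06, 0x03]) + b"git" copies the first 6 bytes of the base and then inserts "git", producing b"hello git".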

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 15:56         ` Vitaly Berov
@ 2010-04-06 21:09           ` Nicolas Pitre
  2010-04-07  5:54             ` Vitaly Berov
  2010-04-07  5:55             ` Vitaly
  0 siblings, 2 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-06 21:09 UTC (permalink / raw)
  To: Vitaly Berov; +Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy

On Tue, 6 Apr 2010, Vitaly Berov wrote:

> Why does git compute checksums on the client side? Aren't they already
> calculated on the "server" side?

Yes.  But Git clients can't trust the server like that.

The only way to make sure the server didn't send you crap data, or worse 
maliciously altered data, is actually to not transfer any checksum data 
but to recompute and validate the received payload locally.

This being said, you should never have to wait 6 hours for that phase to 
complete.  It is typically a matter of minutes if not seconds.


Nicolas
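
Recomputing an object name locally amounts to hashing the expanded content behind a short header. A minimal Python sketch, assuming Git's standard "<type> <size>\0" object header and SHA-1 object names; verify_received is a hypothetical helper showing the kind of check a client can make against an ID it already expects (for instance a ref tip it asked for).

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 object name: hash of '<type> <size>\\0' followed by the content."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def verify_received(expected_id: str, obj_type: str, data: bytes) -> None:
    """Hypothetical check: trust only the ID computed locally, never one sent over the wire."""
    actual = git_object_id(obj_type, data)
    if actual != expected_id:
        raise ValueError(f"expected {expected_id}, got {actual}")
```

git_object_id("blob", b"hello\n") yields ce013625030ba8dba906f756967f9e9ca394464a, the same ID that `echo hello | git hash-object --stdin` prints.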

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 14:18 git clone: very long "resolving deltas" phase Vitaly Berov
  2010-04-06 15:01 ` Matthieu Moy
@ 2010-04-06 21:10 ` Nicolas Pitre
  2010-04-07  5:57   ` Vitaly
  1 sibling, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-06 21:10 UTC (permalink / raw)
  To: Vitaly Berov; +Cc: git

On Tue, 6 Apr 2010, Vitaly Berov wrote:

> We have quite a large repository and "git clone" takes about 6 hours, with
> "resolving deltas" taking most of the time.

This simply makes no sense.

Is this repository publicly clonable?


Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 21:09           ` Nicolas Pitre
@ 2010-04-07  5:54             ` Vitaly Berov
  2010-04-07  8:00               ` Ilari Liusvaara
  2010-04-07  5:55             ` Vitaly
  1 sibling, 1 reply; 39+ messages in thread
From: Vitaly Berov @ 2010-04-07  5:54 UTC (permalink / raw)
  To: git; +Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy

I suspected security reasons.

OK, we work in a trusted environment. How can we turn this behavior off?

Vitaly

On 04/07/2010 01:09 AM, Nicolas Pitre wrote:
> On Tue, 6 Apr 2010, Vitaly Berov wrote:
>
>> Why does git compute checksums on the client side? Aren't they already
>> calculated on the "server" side?
>
> Yes.  But Git clients can't trust the server like that.
>
> The only way to make sure the server didn't send you crap data, or worse
> maliciously altered data, is actually to not transfer any checksum data
> but to recompute and validate the received payload locally.
>
> This being said, you should never have to wait 6 hours for that phase to
> complete.  It is typically a matter of minutes if not seconds.
>
>
> Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 21:09           ` Nicolas Pitre
  2010-04-07  5:54             ` Vitaly Berov
@ 2010-04-07  5:55             ` Vitaly
  2010-04-07 12:42               ` Nicolas Pitre
  1 sibling, 1 reply; 39+ messages in thread
From: Vitaly @ 2010-04-07  5:55 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy

I suspected security reasons.

OK, we work in a trusted environment. How can we turn this behavior off?

On 04/07/2010 01:09 AM, Nicolas Pitre wrote:
> On Tue, 6 Apr 2010, Vitaly Berov wrote:
>
>    
>> Why does git compute checksums on the client side? Aren't they already
>> calculated on the "server" side?
>>      
> Yes.  But Git clients can't trust the server like that.
>
> The only way to make sure the server didn't send you crap data, or worse
> maliciously altered data, is actually to not transfer any checksum data
> but to recompute and validate the received payload locally.
>
> This being said, you should never have to wait 6 hours for that phase to
> complete.  It is typically a matter of minutes if not seconds.
>
>
> Nicolas
>
>    

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 21:10 ` Nicolas Pitre
@ 2010-04-07  5:57   ` Vitaly
  2010-04-07 12:55     ` Nicolas Pitre
  0 siblings, 1 reply; 39+ messages in thread
From: Vitaly @ 2010-04-07  5:57 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git

Hmm, what do you mean by "makes no sense"? It is what it is: it takes
several hours.

No, we work in a trusted environment. Our repository isn't open to
external people.

On 04/07/2010 01:10 AM, Nicolas Pitre wrote:
> On Tue, 6 Apr 2010, Vitaly Berov wrote:
>
>    
>> We have quite a large repository and "git clone" takes about 6 hours, with
>> "resolving deltas" taking most of the time.
>>      
> This simply makes no sense.
>
> Is this repository publicly clonable?
>
>
> Nicolas
>
>    

* Re: git clone: very long "resolving deltas" phase
  2010-04-07  5:54             ` Vitaly Berov
@ 2010-04-07  8:00               ` Ilari Liusvaara
  2010-04-07  8:14                 ` Vitaly
  2010-04-07 14:08                 ` Nicolas Pitre
  0 siblings, 2 replies; 39+ messages in thread
From: Ilari Liusvaara @ 2010-04-07  8:00 UTC (permalink / raw)
  To: Vitaly Berov; +Cc: git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Wed, Apr 07, 2010 at 09:54:29AM +0400, Vitaly Berov wrote:
> I suspected security reasons.
>
> OK, we work in a trusted environment. How can we turn this behavior off?
 
It can't be turned off. The protocol requires the client to recompute hashes,
as they are not explicitly available in the transport stream (they must be
inferred instead).

> >This being said, you should never have to wait 6 hours for that phase to
> >complete.  It is typically a matter of minutes if not seconds.

The reasons why it might take 6 hours (offhand from memory):

- Extremely large repo
- Very large files in repo pushing client into swap.

-Ilari
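
Why the names "must be inferred" is visible in the pack entry layout. As Git's pack-format documentation describes it, each entry starts with a variable-length header holding only a 3-bit type and the inflated size, followed by zlib data; only REF_DELTA entries carry a 20-byte SHA-1, and that names their base, not the entry itself. A minimal Python sketch of parsing that header (OBJ_TYPES and parse_entry_header are illustrative names, not Git's):

```python
from typing import Tuple

# Type codes from the pack format; 0 is invalid and 5 is reserved.
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag", 6: "ofs_delta", 7: "ref_delta"}


def parse_entry_header(buf: bytes, pos: int = 0) -> Tuple[str, int, int]:
    """Return (type, inflated size, offset of the data) for one pack entry."""
    byte = buf[pos]
    pos += 1
    code = (byte >> 4) & 0x7          # bits 4-6: object type
    if code not in OBJ_TYPES:
        raise ValueError(f"invalid object type {code}")
    size = byte & 0x0F                 # bits 0-3: low bits of the size
    shift = 4
    while byte & 0x80:                 # high bit set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    # Nothing above is an object name; only a ref_delta entry is followed by a
    # 20-byte SHA-1, and that one names its base, not the entry itself.
    return OBJ_TYPES[code], size, pos
```

For example, parse_entry_header(bytes([0xB4, 0x06])) returns ("blob", 100, 2): type 3, with size 4 from the first byte plus 6 << 4 from the second.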

* Re: git clone: very long "resolving deltas" phase
  2010-04-07  8:00               ` Ilari Liusvaara
@ 2010-04-07  8:14                 ` Vitaly
  2010-04-07  9:00                   ` Ilari Liusvaara
                                     ` (2 more replies)
  2010-04-07 14:08                 ` Nicolas Pitre
  1 sibling, 3 replies; 39+ messages in thread
From: Vitaly @ 2010-04-07  8:14 UTC (permalink / raw)
  To: Ilari Liusvaara; +Cc: git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

Too bad.
Yes, we really do have a very large repo with binary files.

So, as far as I understand, the fastest way is to use rsync or something like
that instead of "git clone".

P.S. By the way, how can I request a feature that puts the hashes into the
transport stream in trusted environments?

On 04/07/2010 12:00 PM, Ilari Liusvaara wrote:
> On Wed, Apr 07, 2010 at 09:54:29AM +0400, Vitaly Berov wrote:
>    
>> I suspected security reasons.
>>
>> OK, we work in a trusted environment. How can we turn this behavior off?
>>
>
> It can't be turned off. The protocol requires the client to recompute hashes,
> as they are not explicitly available in the transport stream (they must be
> inferred instead).
>
>    
>>> This being said, you should never have to wait 6 hours for that phase to
>>> complete.  It is typically a matter of minutes if not seconds.
>>>        
> The reasons why it might take 6 hours (offhand from memory):
>
> - Extremely large repo
> - Very large files in repo pushing client into swap.
>
> -Ilari
>
>    

* Re: git clone: very long "resolving deltas" phase
  2010-04-07  8:14                 ` Vitaly
@ 2010-04-07  9:00                   ` Ilari Liusvaara
  2010-04-07  9:37                   ` Jakub Narebski
  2010-04-07 14:20                   ` Nicolas Pitre
  2 siblings, 0 replies; 39+ messages in thread
From: Ilari Liusvaara @ 2010-04-07  9:00 UTC (permalink / raw)
  To: Vitaly; +Cc: git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Wed, Apr 07, 2010 at 12:14:36PM +0400, Vitaly wrote:
> Too bad.
> Yes, we really do have a very large repo with binary files.

Large binary files are the worst. I think that disabling deltification
('-delta' as an attribute[*]) on them might actually help somewhat...

> P.S. By the way, how can I request a feature that puts the hashes into the
> transport stream in trusted environments?

On this mailing list. But a tip: don't bother. It is far too large a
change relative to any possible benefit.

[*] I think 'info/attributes' on the server influences whether Git attempts
to deltify those objects or not.

-Ilari

* Re: git clone: very long "resolving deltas" phase
  2010-04-06 21:05       ` Nicolas Pitre
@ 2010-04-07  9:22         ` Marat Radchenko
  2010-04-07 14:40           ` Nicolas Pitre
  0 siblings, 1 reply; 39+ messages in thread
From: Marat Radchenko @ 2010-04-07  9:22 UTC (permalink / raw)
  To: git

Nicolas Pitre <nico <at> fluxnic.net> writes:
> The "Resolving deltas" is performed locally, when Git is actually 
> expanding all the deltas in the received pack to find the actual SHA1 of 
> the resulting object in order to create the pack index.
Is there any technical limitation why it cannot be done simultaneously with
fetch (piped or whatever), instead of as a separate step after fetch?

* Re: git clone: very long "resolving deltas" phase
  2010-04-07  8:14                 ` Vitaly
  2010-04-07  9:00                   ` Ilari Liusvaara
@ 2010-04-07  9:37                   ` Jakub Narebski
  2010-04-07 14:20                   ` Nicolas Pitre
  2 siblings, 0 replies; 39+ messages in thread
From: Jakub Narebski @ 2010-04-07  9:37 UTC (permalink / raw)
  To: Vitaly; +Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

Please do not top-post.

Vitaly <vitaly.berov@gmail.com> writes:

> Too bad.
> Yes, we really do have a very large repo with binary files.
>
> So, as far as I understand, the fastest way is to use rsync or something
> like that instead of "git clone".
>
> P.S. By the way, how can I request a feature that puts the hashes into the
> transport stream in trusted environments?

If you have very large binary files, perhaps git-bigfiles fork would
help you: http://caca.zoy.org/wiki/git-bigfiles

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: git clone: very long "resolving deltas" phase
  2010-04-07  5:55             ` Vitaly
@ 2010-04-07 12:42               ` Nicolas Pitre
  0 siblings, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 12:42 UTC (permalink / raw)
  To: Vitaly; +Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy

On Wed, 7 Apr 2010, Vitaly wrote:

> I suspected security reasons.
>
> OK, we work in a trusted environment. How can we turn this behavior off?

You can't.  This is fundamental to the Git native protocol.


Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-07  5:57   ` Vitaly
@ 2010-04-07 12:55     ` Nicolas Pitre
  2010-04-09  6:50       ` Vitaly Berov
  2010-04-10 13:25       ` Vitaly Berov
  0 siblings, 2 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 12:55 UTC (permalink / raw)
  To: Vitaly; +Cc: git

On Wed, 7 Apr 2010, Vitaly wrote:

> Hmm, what do you mean by "makes no sense"? It is what it is: it takes several
> hours.

As I said, several hours for this operation makes no sense.  This should
take minutes if not seconds.  *This* is what needs fixing.

> No, we work in a trusted environment. Our repository isn't open to external
> people.

I was asking that because that would have helped me (or any other Git 
developer) analyse the issue and provide a fix.

OK then.  What happens if you do the following on the server machine 
where the repository is stored:

	git repack -a -f -d

How long does this take?

How long does the "Resolving deltas" take when cloning this repacked 
repository? (don't wait more than 10 minutes for it).

If the "Resolving deltas" takes more than 10 minutes, could you capture 
a strace dump from that process during a minute or so and post it here?

Hmmm. Is this on Linux or Windows?


Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-07  8:00               ` Ilari Liusvaara
  2010-04-07  8:14                 ` Vitaly
@ 2010-04-07 14:08                 ` Nicolas Pitre
  2010-04-07 14:29                   ` Sverre Rabbelier
  1 sibling, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 14:08 UTC (permalink / raw)
  To: Ilari Liusvaara
  Cc: Vitaly Berov, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Wed, 7 Apr 2010, Ilari Liusvaara wrote:

> On Wed, Apr 07, 2010 at 09:54:29AM +0400, Vitaly Berov wrote:
> > I suspected security reasons.
> >
> > OK, we work in a trusted environment. How can we turn this behavior off?
>
> It can't be turned off. The protocol requires the client to recompute hashes,
> as they are not explicitly available in the transport stream (they must be
> inferred instead).
> 
> > >This being said, you should never have to wait 6 hours for that phase to
> > >complete.  It is typically a matter of minutes if not seconds.
> 
> The reasons why it might take 6 hours (offhand from memory):
> 
> - Extremely large repo

Six hours is still far beyond the expected computational requirement.
That's the kind of time an aggressive repack might take, for example, where
_each_ delta is attempted against up to 250 different bases.
But when indexing a fetched pack, each delta is expected to be computed
only once.

> - Very large files in repo pushing client into swap.

This shouldn't happen since commit 92392b4a, which provides a cap on
memory usage during the delta resolution process.

So without a look at the actual repository causing this pathological 
behavior it is hard to guess what the issue and the required fix might 
be.


Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-07  8:14                 ` Vitaly
  2010-04-07  9:00                   ` Ilari Liusvaara
  2010-04-07  9:37                   ` Jakub Narebski
@ 2010-04-07 14:20                   ` Nicolas Pitre
  2010-04-07 14:35                     ` Vitaly
  2 siblings, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 14:20 UTC (permalink / raw)
  To: Vitaly; +Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Wed, 7 Apr 2010, Vitaly wrote:

> Too bad.
> Yes, we really do have a very large repo with binary files.
>
> So, as far as I understand, the fastest way is to use rsync or something like that
> instead of "git clone".

You should still be able to use 'git clone' with an rsync:// style URL.

> P.S. By the way, how can I request a feature that puts the hashes into the transport
> stream in trusted environments?

As I'm trying to make you understand repeatedly now, this shouldn't be 
needed.  A real fix for the bad behavior would be in order before 
papering over it.

If the large binary blobs are the source of the clone problem, then they 
will cause the same problems with other commands such as 'git diff' or 
even 'git checkout' later on.  So that "feature" you're asking for is 
misguided.

What you might try on your client machines is this:

	git config --global core.deltaBaseCacheLimit 256m

before doing a clone.


Nicolas
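
Per the git-config documentation, core.deltaBaseCacheLimit caps the number of bytes Git reserves for caching decompressed base objects that several deltas may reference. The Python class below is a hypothetical, simplified illustration of such a byte-capped cache, not Git's implementation; the class name and the least-recently-used eviction policy are assumptions.

```python
from collections import OrderedDict
from typing import Optional


class ByteCappedCache:
    """Hypothetical LRU cache of expanded base objects, bounded by total bytes."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit = limit_bytes
        self.used = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        data = self._items.get(key)
        if data is not None:
            self._items.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> None:
        if key in self._items:
            self.used -= len(self._items.pop(key))
        self._items[key] = data
        self.used += len(data)
        # Evict least recently used bases until back under the cap, but always
        # keep the newest entry even if it alone exceeds the limit.
        while self.used > self.limit and len(self._items) > 1:
            _, evicted = self._items.popitem(last=False)
            self.used -= len(evicted)
```

In this sketch an evicted base is not lost: a caller that misses would have to re-expand it from its own base chain, trading CPU time for bounded memory. ByteCappedCache(256 * 1024 * 1024) would correspond to the 256m value suggested above.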

* Re: git clone: very long "resolving deltas" phase
  2010-04-07 14:08                 ` Nicolas Pitre
@ 2010-04-07 14:29                   ` Sverre Rabbelier
  2010-04-07 14:37                     ` Vitaly
  0 siblings, 1 reply; 39+ messages in thread
From: Sverre Rabbelier @ 2010-04-07 14:29 UTC (permalink / raw)
  To: Nicolas Pitre, Vitaly Berov
  Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

Heya,

On Wed, Apr 7, 2010 at 09:08, Nicolas Pitre <nico@fluxnic.net> wrote:
> This shouldn't happen since commit 92392b4a, which provides a cap on
> memory usage during the delta resolution process.

Which made me think of asking:

Vitaly, what version of git are you running? Both client and server side please.

-- 
Cheers,

Sverre Rabbelier

* Re: git clone: very long "resolving deltas" phase
  2010-04-07 14:20                   ` Nicolas Pitre
@ 2010-04-07 14:35                     ` Vitaly
  2010-04-07 14:55                       ` Nicolas Pitre
  0 siblings, 1 reply; 39+ messages in thread
From: Vitaly @ 2010-04-07 14:35 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On 04/07/2010 06:20 PM, Nicolas Pitre wrote:
>> P.S. By the way, how can I request a feature that puts the hashes into the
>> transport stream in trusted environments?
>>      
> As I'm trying to make you understand repeatedly now, this shouldn't be
> needed.  A real fix for the bad behavior would be in order before
> papering over it.
>    
Nicolas, my post was written before I received your message about
reproducing and "stracing" the problem. I got your idea and am now
reproducing the problem.
I expect results tomorrow (the repack takes quite a long time).

Vitaly

* Re: git clone: very long "resolving deltas" phase
  2010-04-07 14:29                   ` Sverre Rabbelier
@ 2010-04-07 14:37                     ` Vitaly
  0 siblings, 0 replies; 39+ messages in thread
From: Vitaly @ 2010-04-07 14:37 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Nicolas Pitre, Ilari Liusvaara, git, Shawn Pearce,
	Andreas Ericsson, Matthieu Moy

On 04/07/2010 06:29 PM, Sverre Rabbelier wrote:
> Heya,
>
> On Wed, Apr 7, 2010 at 09:08, Nicolas Pitre<nico@fluxnic.net>  wrote:
>    
>> This shouldn't happen since commit 92392b4a, which provides a cap on
>> memory usage during the delta resolution process.
>>      
> Which made me think of asking:
>
> Vitaly, what version of git are you running? Both client and server side please.
>    
git version 1.7.0.4, both sides.

* Re: git clone: very long "resolving deltas" phase
  2010-04-07  9:22         ` git clone: very long "resolving deltas" phase Marat Radchenko
@ 2010-04-07 14:40           ` Nicolas Pitre
  0 siblings, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 14:40 UTC (permalink / raw)
  To: Marat Radchenko; +Cc: git

On Wed, 7 Apr 2010, Marat Radchenko wrote:

> Nicolas Pitre <nico <at> fluxnic.net> writes:
> > The "Resolving deltas" is performed locally, when Git is actually 
> > expanding all the deltas in the received pack to find the actual SHA1 of 
> > the resulting object in order to create the pack index.
> Is there any technical limitation why it cannot be done simultaneously with
> fetch (piped or whatever), instead of as a separate step after fetch?

The non-delta objects are indexed simultaneously as they're received on
the wire.  However, doing that for delta objects would be way suboptimal
because:

1) The base object needed to resolve a given delta object might not have 
   been received yet.  This means in this case that the delta will have 
   to be resolved later anyway, and finding out if a just received 
   object might be a base object for previously received objects is 
   rather costly, and even impossible if that potential base object is 
   itself a delta.  So it is best to figure out the delta dependencies 
   only once at the end of the transfer.

2) When resolving deep delta chains, it is best to start from the root,
   i.e. create the result from a delta object and recursively resolve all
   deltas that use this result as their base, rather than expanding deltas
   repeatedly, which would turn this process into exponential CPU usage.
   Again, this can be done only when all delta objects have been
   received.


Nicolas
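
The two points above can be sketched together in Python: first build the base-to-delta dependency map once every entry has arrived, then walk each delta tree from its non-delta root so that every delta is expanded exactly once from an already-expanded base. This is a simplified illustration, not Git's index-pack: entries are keyed by pack offset with OFS_DELTA-style base offsets, REF_DELTA lookups and thin packs are ignored, no memory cap is applied, an explicit stack replaces recursion, and the delta-application function is passed in as a parameter (PackEntry and resolve_pack are hypothetical names).

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PackEntry:
    obj_type: Optional[str]            # "blob", "tree", ... for full objects; None for deltas
    payload: bytes                     # full content, or delta instructions
    base_offset: Optional[int] = None  # pack offset of the base (OFS_DELTA style)


def object_id(obj_type: str, data: bytes) -> str:
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def resolve_pack(entries: Dict[int, PackEntry],
                 apply_delta: Callable[[bytes, bytes], bytes]) -> Dict[int, str]:
    """Name every entry reachable from a full object, expanding each delta once."""
    # 1) Dependencies are only known once every entry has been received.
    children: Dict[int, List[int]] = defaultdict(list)
    for off, entry in entries.items():
        if entry.base_offset is not None:
            children[entry.base_offset].append(off)

    # 2) Walk each delta tree from its root, reusing every expanded result as
    #    the base of its children instead of re-expanding chains from scratch.
    ids: Dict[int, str] = {}
    for off, entry in entries.items():
        if entry.base_offset is not None:
            continue  # deltas are reached from their root, not started from
        stack = [(off, entry.obj_type, entry.payload)]
        while stack:
            cur, obj_type, data = stack.pop()
            ids[cur] = object_id(obj_type, data)
            for child in children.get(cur, []):
                # A delta's result has the same type as its base.
                expanded = apply_delta(data, entries[child].payload)
                stack.append((child, obj_type, expanded))
    return ids
```

With a toy apply_delta such as `lambda base, delta: base + delta`, a chain blob b"hello" <- b"\n" <- b"!" is named in a single pass, with exactly two delta applications.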

* Re: git clone: very long "resolving deltas" phase
  2010-04-07 14:35                     ` Vitaly
@ 2010-04-07 14:55                       ` Nicolas Pitre
  2010-04-09  6:46                         ` Vitaly Berov
  0 siblings, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-07 14:55 UTC (permalink / raw)
  To: Vitaly; +Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Wed, 7 Apr 2010, Vitaly wrote:

> Nicolas, my post was written before I received your message about
> reproducing and "stracing" the problem. I got your idea and am now
> reproducing the problem.

No problem.

> I expect results tomorrow (the repack takes quite a long time).

The repack isn't so important.  If it takes that long you might simply 
interrupt it and strace the client when "resolving deltas" is looking to 
be insanely long.  In reality it is best if you don't repack as the 
client needs to cope with whatever the server throws at it and repacking 
your repo might hide the client issue.

Then playing with core.deltaBaseCacheLimit instead would be quite 
interesting.


Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-07 14:55                       ` Nicolas Pitre
@ 2010-04-09  6:46                         ` Vitaly Berov
  2010-04-09 19:30                           ` Nicolas Pitre
  0 siblings, 1 reply; 39+ messages in thread
From: Vitaly Berov @ 2010-04-09  6:46 UTC (permalink / raw)
  To: git; +Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

Hi,

On 04/07/2010 06:55 PM, Nicolas Pitre wrote:
>
> Then playing with core.deltaBaseCacheLimit instead would be quite
> interesting.
It's difficult to play with parameters because the receiving objects
phase alone takes 1.5-2 hours. But I'll try "git config --global
core.deltaBaseCacheLimit 256m" as you recommended.

Vitaly

* Re: git clone: very long "resolving deltas" phase
  2010-04-07 12:55     ` Nicolas Pitre
@ 2010-04-09  6:50       ` Vitaly Berov
  2010-04-09  8:13         ` Matthieu Moy
  2010-04-09 19:25         ` Nicolas Pitre
  2010-04-10 13:25       ` Vitaly Berov
  1 sibling, 2 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-09  6:50 UTC (permalink / raw)
  To: git; +Cc: git

Hi,

On 04/07/2010 04:55 PM, Nicolas Pitre wrote:
>
> I was asking that because that would have helped me (or any other Git
> developer) analyse the issue and provide a fix.
>
> OK then.  What happens if you do the following on the server machine
> where the repository is stored:
>
> 	git repack -a -f -d
>
> How long does this take?
>
> How long does the "Resolving deltas" take when cloning this repacked
> repository? (don't wait more than 10 minutes for it).
Nicolas, we haven't stopped the process as you recommended, sorry for that.

So, the results: it took 37 hours in total: 20 hours compressing objects
(delta compression using up to 4 threads) and 17 hours writing objects.
Almost all of the time the bottleneck was the CPU.

Number of objects: 3997548.
Size of the repository: ~57 GB.

> If the "Resolving deltas" takes more than 10 minutes, could you capture
> a strace dump from that process during a minute or so and post it here?
I'll capture strace later.

> Hmmm. Is this on Linux or Windows?
Short spec: Ubuntu 9.04 (64 bit), Intel(R) Core(TM)2 Quad CPU Q8400 
2.66GHz, 8 GB of memory

By the way, we have a large number of binary files in our repo.

Vitaly

* Re: git clone: very long "resolving deltas" phase
  2010-04-09  6:50       ` Vitaly Berov
@ 2010-04-09  8:13         ` Matthieu Moy
  2010-04-09 19:18           ` Nicolas Pitre
  2010-04-10  8:05           ` Vitaly Berov
  2010-04-09 19:25         ` Nicolas Pitre
  1 sibling, 2 replies; 39+ messages in thread
From: Matthieu Moy @ 2010-04-09  8:13 UTC
  To: Vitaly Berov; +Cc: git

Vitaly Berov <vitaly.berov@gmail.com> writes:

> Number of objects: 3997548.
> Size of the repository: ~57GB.
[...]
> By the way, we have a large number of binary files in our repo.

This is clearly not the kind of repository Git is good at. I
encourage you to continue this discussion, and try to find a way to
get it working, but the standard approach (probably a "my 2 cents"
kind of advice, but ...) would be:

* Split your repo into smaller ones (submodules ...)

* Avoid versioning binary files

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: git clone: very long "resolving deltas" phase
  2010-04-09  8:13         ` Matthieu Moy
@ 2010-04-09 19:18           ` Nicolas Pitre
  2010-04-10  8:05           ` Vitaly Berov
  1 sibling, 0 replies; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-09 19:18 UTC
  To: Matthieu Moy; +Cc: Vitaly Berov, git

On Fri, 9 Apr 2010, Matthieu Moy wrote:

> Vitaly Berov <vitaly.berov@gmail.com> writes:
> 
> > Number of objects: 3997548.
> > Size of the repository: ~57GB.
> [...]
> > By the way, we have a large number of binary files in our repo.
> 
> This is clearly not the kind of repository Git is good at. I
> encourage you to continue this discussion, and try to find a way to
> get it working, but the standard approach (probably a "my 2 cents"
> kind of advice, but ...) would be:
> 
> * Split your repo into smaller ones (submodules ...)
> 
> * Avoid versioning binary files

I still think that Git ought to "just work" with such a repository.
There are things that should be done for that, like applying the 
core.bigFileThreshold configuration variable to more places, such as 
delta compression, object creation, diff generation, etc.

Of course Git won't be as good at saving disk space in that case, but 
when your repo is 57GB you probably don't care much if it grows to 80GB 
as long as cloning it is twice as fast.

Still, I don't think the current issue, with the receiving end of a 
clone taking 6 hours in "Resolving deltas", is normal, independently of 
core.bigFileThreshold.


Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-09  6:50       ` Vitaly Berov
  2010-04-09  8:13         ` Matthieu Moy
@ 2010-04-09 19:25         ` Nicolas Pitre
  2010-04-10  7:58           ` Vitaly Berov
  1 sibling, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-09 19:25 UTC
  To: Vitaly Berov; +Cc: git

On Fri, 9 Apr 2010, Vitaly Berov wrote:

> > OK then.  What happens if you do the following on the server machine
> > where the repository is stored:
> > 
> > 	git repack -a -f -d
> > 
> > How long does this take?
> 
> So, the results: it took 37 hours in total: 20 hours compressing objects (delta
> compression using up to 4 threads) and 17 hours writing objects. Almost all of
> the time the bottleneck was the CPU.
> 
> Number of objects: 3997548.
> Size of the repository: ~57GB.

OK.  You probably have a size record.  :-)

How big is the .pack file in .git/objects/pack/ ?

> By the way, we have a large number of binary files in our repo.

How many?  How big?


Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-09  6:46                         ` Vitaly Berov
@ 2010-04-09 19:30                           ` Nicolas Pitre
  2010-04-10  6:32                             ` Vitaly Berov
  0 siblings, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-09 19:30 UTC
  To: Vitaly Berov
  Cc: git, Ilari Liusvaara, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Fri, 9 Apr 2010, Vitaly Berov wrote:

> Hi,
> 
> On 04/07/2010 06:55 PM, Nicolas Pitre wrote:
> > 
> > Then playing with core.deltaBaseCacheLimit instead would be quite
> > interesting.
> It's difficult to play with parameters because the receiving objects phase
> alone takes 1.5-2 hours.

Huh...  I guess that's over 100Mbps ethernet?

57GB / 1.5h -> approx 10MB/s
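A quick way to sanity-check this estimate is to compute it explicitly. The sketch below assumes decimal units (1 GB = 1000 MB) and ignores protocol overhead; the function names are illustrative only.

```python
# Back-of-envelope check of the clone's transfer rate.
# Assumptions: decimal units (1 GB = 1000 MB), no protocol overhead.


def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average rate in MB/s when size_gb gigabytes take `hours` hours."""
    return size_gb * 1000 / (hours * 3600)


def link_capacity_mb_per_s(megabits_per_s: float) -> float:
    """Raw capacity of a link in MB/s (8 bits per byte)."""
    return megabits_per_s / 8


if __name__ == "__main__":
    link = link_capacity_mb_per_s(100)  # 100 Mbps Ethernet -> 12.5 MB/s
    for hours in (1.5, 2.0):
        rate = throughput_mb_per_s(57, hours)
        print(f"{hours} h: {rate:.1f} MB/s ({rate / link:.0%} of a 100 Mbps link)")
```

Under these assumptions, 1.5 hours works out to roughly 10.6 MB/s, about 84% of the 12.5 MB/s a 100 Mbps link can carry, which suggests the receiving phase is already close to what the network allows.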


Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-09 19:30                           ` Nicolas Pitre
@ 2010-04-10  6:32                             ` Vitaly Berov
  0 siblings, 0 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-10  6:32 UTC
  To: git; +Cc: git, Ilari Liusvaara, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On 04/09/2010 11:30 PM, Nicolas Pitre wrote:
> On Fri, 9 Apr 2010, Vitaly Berov wrote:
>
>> Hi,
>>
>> On 04/07/2010 06:55 PM, Nicolas Pitre wrote:
>>>
>>> Then playing with core.deltaBaseCacheLimit instead would be quite
>>> interesting.
>> It's difficult to play with parameters because the receiving objects phase
>> alone takes 1.5-2 hours.
>
> Huh...  I guess that's over 100Mbps ethernet?
>
> 57GB / 1.5h ->  approx 10MB/s

Yes

Vitaly

* Re: git clone: very long "resolving deltas" phase
  2010-04-09 19:25         ` Nicolas Pitre
@ 2010-04-10  7:58           ` Vitaly Berov
  0 siblings, 0 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-10  7:58 UTC
  To: git

On 04/09/2010 11:25 PM, Nicolas Pitre wrote:
>>
>> Number of objects: 3997548.
>> Size of the repository: ~57GB.
>
> OK.  You probably have a size record.  :-)
That's game development. We have ~100 artists who produce text and 
binary files as "sources". FYI, the "end client version" is ~2.5GB.

> How big is the .pack file in .git/objects/pack/ ?
~56GB

>
>> By the way, we have a large number of binary files in our repo.
>
> How many?  How big?

Total number of files: ~400000; number of binaries: ~200000.
Distribution of sizes:  5% of 4M - 32K, 5% of 32K - 12K, 12K - 6K, 6K - 
4K, 4K - 2.5K, 2.5K - 2.3K, 2.3K - 2K, the rest

Vitaly

P.S. By the way, msysgit can't handle this repository; the blocking bug is:
http://code.google.com/p/msysgit/issues/detail?id=365&q=mmap&colspec=ID%20Type%20Status%20Priority%20Component%20Owner%20Summary.
So I'm thinking about stopping the evaluation, though I like Git 
(especially after a long experience with Subversion :))

* Re: git clone: very long "resolving deltas" phase
  2010-04-09  8:13         ` Matthieu Moy
  2010-04-09 19:18           ` Nicolas Pitre
@ 2010-04-10  8:05           ` Vitaly Berov
  1 sibling, 0 replies; 39+ messages in thread
From: Vitaly Berov @ 2010-04-10  8:05 UTC
  To: git

On 04/09/2010 12:13 PM, Matthieu Moy wrote:
> Vitaly Berov<vitaly.berov@gmail.com>  writes:
>
>> Number of objects: 3997548.
>> Size of the repository: ~57GB.
> [...]
>> By the way, we have a large number of binary files in our repo.
>
> This is clearly not the kind of repository Git is good at.
Hmm... I'm looking for a good version control system because I'm tired of 
Subversion, and Perforce isn't an option either (very expensive and even more 
uncomfortable). It seems that only Git and Mercurial are good options. 
Can you recommend any other SCMs?

> I encourage you to continue this discussion, and try to find a way to
> get it working, but the standard approach (probably a "my 2 cents"
> kind of advice, but ...) would be:
>
> * Split your repo into smaller ones (submodules ...)
>
> * Avoid versioning binary files

I can't get rid of binary files because they are the "sources" of our 
artists' work (they develop a game).
Splitting a repo can be an option, but it's very inconvenient for us.

Vitaly

* Re: git clone: very long "resolving deltas" phase
  2010-04-07 12:55     ` Nicolas Pitre
  2010-04-09  6:50       ` Vitaly Berov
@ 2010-04-10 13:25       ` Vitaly Berov
  2010-04-11  0:50         ` Nicolas Pitre
  1 sibling, 1 reply; 39+ messages in thread
From: Vitaly Berov @ 2010-04-10 13:25 UTC
  To: git

Hi,

On 04/07/2010 04:55 PM, Nicolas Pitre wrote:
> On Wed, 7 Apr 2010, Vitaly wrote:
>
>
> OK then.  What happens if you do the following on the server machine
> where the repository is stored:
>
> 	git repack -a -f -d
>
> How long does this take?
>
> If the "Resolving deltas" takes more than 10 minutes, could you capture
> a strace dump from that process during a minute or so and post it here?

Nicolas, I took strace and sent it to you personally.

Here is an extract (99% of the strace output looks like this):
--------------
access("/home/vitaly/Projects/test/a1/.git/objects/0f/9a3d28766f8b767fb64166139dd65c079512de", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\324\275y\\Ni\374\377\177\256\323]\335Q\271S\332\220\"\n\241\10Q\10!$!d/\262"..., 214850, 8944159649) = 214850
access("/home/vitaly/Projects/test/a1/.git/objects/a5/5430cbc6674b56d7c2d2d81ef5b7d5c8ebdec8", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\354\275\vT\224U\0270<\347\231\v\363\250\244#\f0\"\"\"\312ETD\300af"..., 159502, 8944374506) = 159502
access("/home/vitaly/Projects/test/a1/.git/objects/e5/02b7d050d1b81ebc256234e303eac17116c9fb", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\324\274yX\24G\3607>\3353\263,\310\342\"7\2.\202\342\1\10\212\210\212\10\236x\341"..., 61131, 8944534014) = 61131
access("/home/vitaly/Projects/test/a1/.git/objects/5b/6bdba61771e5ba63ba8b43659db1612345c2eb", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\324\275\tX\216Y\3747~\237\323\366DOJiyR\236\210B*$!\311\276\223=[\n"..., 236685, 8944595152) = 236685
-----------------
To me, it looks very suspicious.

Vitaly

* Re: git clone: very long "resolving deltas" phase
  2010-04-10 13:25       ` Vitaly Berov
@ 2010-04-11  0:50         ` Nicolas Pitre
  2010-04-12 15:31           ` Vitaly
  0 siblings, 1 reply; 39+ messages in thread
From: Nicolas Pitre @ 2010-04-11  0:50 UTC
  To: Vitaly Berov; +Cc: git

On Sat, 10 Apr 2010, Vitaly Berov wrote:

> Hi,
> 
> On 04/07/2010 04:55 PM, Nicolas Pitre wrote:
> > On Wed, 7 Apr 2010, Vitaly wrote:
> > 
> > 
> > OK then.  What happens if you do the following on the server machine
> > where the repository is stored:
> > 
> > 	git repack -a -f -d
> > 
> > How long does this take?
> > 
> > If the "Resolving deltas" takes more than 10 minutes, could you capture
> > a strace dump from that process during a minute or so and post it here?
> 
> Nicolas, I took strace and sent it to you personally.
> 
> Here is an extract (99% of the strace output looks like this):
> --------------
> access("/home/vitaly/Projects/test/a1/.git/objects/0f/9a3d28766f8b767fb64166139dd65c079512de", F_OK) = -1 ENOENT (No such file or directory)
> pread(5, "x\234\324\275y\\Ni\374\377\177\256\323]\335Q\271S\332\220\"\n\241\10Q\10!$!d/\262"..., 214850, 8944159649) = 214850
> access("/home/vitaly/Projects/test/a1/.git/objects/a5/5430cbc6674b56d7c2d2d81ef5b7d5c8ebdec8", F_OK) = -1 ENOENT (No such file or directory)
> pread(5, "x\234\354\275\vT\224U\0270<\347\231\v\363\250\244#\f0\"\"\"\312ETD\300af"..., 159502, 8944374506) = 159502
> access("/home/vitaly/Projects/test/a1/.git/objects/e5/02b7d050d1b81ebc256234e303eac17116c9fb", F_OK) = -1 ENOENT (No such file or directory)
> pread(5, "x\234\324\274yX\24G\3607>\3353\263,\310\342\"7\2.\202\342\1\10\212\210\212\10\236x\341"..., 61131, 8944534014) = 61131
> access("/home/vitaly/Projects/test/a1/.git/objects/5b/6bdba61771e5ba63ba8b43659db1612345c2eb", F_OK) = -1 ENOENT (No such file or directory)
> pread(5, "x\234\324\275\tX\216Y\3747~\237\323\366DOJiyR\236\210B*$!\311\276\223=[\n"..., 236685, 8944595152) = 236685
> -----------------
> To me, it looks very suspicious.

It isn't.  The pread() is performed for each delta object within the 
received pack to be resolved, and then the access() is performed to make 
sure the resolved delta doesn't match an object in loose form with the 
same hash.  Of course deltas are recursive, meaning that a delta might 
refer to a base object which is itself a delta, and so on.  And yet a 
base object might have many delta objects referring to it.  So without a 
smart delta resolution ordering and caching, we'd end up with an 
exponential number of pread calls.  However the cache size is limited to 
avoid memory exhaustion with deep and wide delta trees (that's the 
core.deltaBaseCacheLimit config variable).
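The interaction between delta chains and a bounded cache can be illustrated with a toy model. The following Python sketch is a simplification, not Git's actual code: `DeltaBaseCache`, `resolve` and `count_reads` are hypothetical names, objects are just (base, size) pairs, and every cache miss counts as one pread() of that object's data. Git's index-pack orders its work differently, so treat the numbers as illustrative only.

```python
from collections import OrderedDict

MB = 1024 * 1024


class DeltaBaseCache:
    """Byte-bounded LRU cache of reconstructed objects (simplified model)."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit = limit_bytes
        self.used = 0
        self.entries = OrderedDict()  # object id -> size in bytes

    def get(self, oid: str) -> bool:
        if oid not in self.entries:
            return False
        self.entries.move_to_end(oid)  # mark as most recently used
        return True

    def put(self, oid: str, size: int) -> None:
        if size > self.limit or oid in self.entries:
            return  # too large to cache at all, or already cached
        self.entries[oid] = size
        self.used += size
        while self.used > self.limit:  # evict least recently used first
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size


def resolve(oid, objects, cache, reads):
    """Reconstruct `oid`; objects maps oid -> (base oid or None, size).

    Every cache miss appends one entry to `reads`, standing in for a pread().
    """
    if cache.get(oid):
        return
    base, size = objects[oid]
    if base is not None:
        resolve(base, objects, cache, reads)  # walk up the delta chain first
    reads.append(oid)
    cache.put(oid, size)


def count_reads(objects, targets, limit_bytes):
    """Resolve every target in order and return the list of reads performed."""
    cache = DeltaBaseCache(limit_bytes)
    reads = []
    for oid in targets:
        resolve(oid, objects, cache, reads)
    return reads
```

For example, with one 4MB base object and 20 deltas against it (all 4MB), an 8MB limit keeps the base cached and needs 21 reads, whereas a 4MB limit evicts the base every time and needs 40.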

So from that strace capture you sent me, we can get:

$ grep access strace.txt  | wc -l
3925

$ grep pread strace.txt  | wc -l
4095

$ grep pread strace.txt | sort | uniq -d | wc -l
75
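The same counts can be obtained in Python, which may be more convenient for poking at a large capture. This sketch assumes one syscall per line (strace's default output) and mirrors the grep pipelines, including their plain substring matching; the function names are made up for illustration.

```python
from collections import Counter


def summarize_strace(lines):
    """Mirror the grep pipelines on a list of strace output lines.

    Returns (number of access lines, number of pread lines,
    {pread line: count} for pread lines that occur more than once).
    """
    accesses = sum(1 for line in lines if "access" in line)  # grep access | wc -l
    preads = [line for line in lines if "pread" in line]     # grep pread
    counts = Counter(preads)
    duplicates = {line: n for line, n in counts.items() if n > 1}  # sort | uniq -d
    return accesses, len(preads), duplicates


def most_reread(duplicates, top=15):
    """Roughly 'uniq -cd | sort -nr': most frequently repeated reads first."""
    return sorted(duplicates.items(), key=lambda item: item[1], reverse=True)[:top]
```

For instance, `summarize_strace(pathlib.Path("strace.txt").read_text().splitlines())` would return the access count, the pread count and the repeated pread lines, and `len()` of the last value corresponds to the `uniq -d | wc -l` figure.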

So, given 3925 deltas to process, only 4095 objects were read, which is 
not too bad.  Still, 75 of them were read more than once, which means 
they were evicted from the cache while they were still needed.  The 
core.deltaBaseCacheLimit could be increased to avoid those 75 
duplicates.  Let's have a look at a few of them:

$ grep pread strace.txt | sort | uniq -cd | sort -nr
     20 pread(5, "x\234\354\275w\234\34\305\3210<;\263;\233\357nf\357v/I{\312\243\333=\235t\247p'"..., 1265653, 504922895) = 1265653
     20 pread(5, "x\234\254}\7|\24E\373\377\315\354f\357r\227v\227\344.\275A`\271\\\200\20:\204^\245\203"..., 264956, 506188555) = 264956
      6 pread(5, "x\234\274}\7xT\305\366\370\336\231\335\273\273\251\354f\263\233\36:\227d\3\201@ \224PB\257"..., 253102, 49016172335) = 253102
      6 pread(5, "x\234\274\275\7\224\34\305\3618<;\263;\263\341\322\356\336\355^\222N\361\30\335\355)\235\20w\2"..., 506683, 48982212429) = 506683
      6 pread(5, "x\234\254}\7x\34\305\25\360\336\336\335^Q\275;\351N\262d\313M\362\372tr\2231\226\1\27"..., 402609, 49245906707) = 402609
      6 pread(5, "x\234\254\275\t|\24\305\3628\2763\323;\273I6\t\331lvs\21B\270\206\315&\1\2H@"..., 176754, 49246749832) = 176754
      6 pread(5, "x\234\234}\7|T\305\366\377\336\331\315\246P\23H\205$t\226$t)\t\322\244\367\5\5\5\244"..., 236257, 49246513568) = 236257
      6 pread(5, "x\234\224}\7|TE\360\377\355\356\225t\270\224K\207$@r\244\2\241\205\320{\21\10 \35\351"..., 204238, 49246309323) = 204238
      5 pread(5, "x\234\264\275\7xT\305\0277\274\367\316\335\273%u7\311n*!\204rI6\t$\20\10\275\211"..., 233622, 49247108828) = 233622
      5 pread(5, "x\234\254\275\7|T\305\363\0~\357\275\275w\227\334%\341\222\334]zB \341\270\\\n\204\26:"..., 182228, 49246926593) = 182228
      5 pread(5, "x\234\234\274\5x\24\327\32\377\177\316\314fvf\26\22\226@\2\4\222l\4\"\33 \4\v\20\10"..., 70234, 49247342456) = 70234
      4 pread(5, "x\234m{\t\\TU\373\377\3343\\\34\206a\270s/\273\10\303* \340 \240\250\250\203\"\232"..., 9345, 49425395631) = 9345
      4 pread(5, "x\234-\326\177P\323e\34\300\3619`\342T\4\324)\23\1'\242\241\233c\300D:)~\232\250"..., 1211, 49248626398) = 1211
      4 pread(5, "x\234\314\275w\234\24E\363\7\274\323\263;\273\227o\367\356v\357\270\304\35\341\3662p\204;r\16"..., 149400, 49425246225) = 149400
      4 pread(5, "x\234\274\275\7|\34\305\3658\276\345n\257J\362\355Iw\262e[r_K'7al\31\343\2"..., 549072, 38602368468) = 549072

So... the first two objects are clearly a problem as they are reloaded 
over and over. Given that their offset is far away from the others, i.e. 
relatively near the beginning of the pack, they are probably quite high in 
their delta hierarchy.  And what's really bad is to see those at the 
beginning of 10 pread() calls in a row, meaning that an entire delta 
chain has to be replayed in order to get back all those base objects 
that were evicted from the cache.  That's clearly wasted CPU cycles and 
that shouldn't happen with a large enough value for 
core.deltaBaseCacheLimit.  Given that your files are "relatively" small, 
i.e. in the 4MB range at most, the cache should be able to hold quite 
a lot of them.  At the moment, with its 16MB limit, just a few of those 
objects are enough to evict many others from the cache quickly.
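To put rough numbers on this, the sketch below converts core.deltaBaseCacheLimit values written with Git's k/m/g size suffixes (1024-based) into bytes and computes how many objects of a given size fit at once. It assumes the worst case where every cached object is as large as the biggest files (~4MB), and the helper names are hypothetical.

```python
# Git config sizes accept an optional k, m or g suffix (powers of 1024).
UNIT = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_git_size(value: str) -> int:
    """Parse '16m', '256m', '1g', '512', ... into a number of bytes."""
    value = value.strip().lower()
    suffix = value[-1:] if value[-1:] in ("k", "m", "g") else ""
    digits = value[:-1] if suffix else value
    return int(digits) * UNIT[suffix]


def bases_that_fit(cache_limit: str, object_size: str) -> int:
    """Worst case: how many objects of object_size fit in the cache at once."""
    return parse_git_size(cache_limit) // parse_git_size(object_size)
```

Under that assumption, the 16MB limit holds only 4 such objects at a time, while the 256m value mentioned earlier in the thread would hold 64.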

If this is still not good enough, then you could add a negative delta 
attribute to those large binary files (see 
http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html)
and repack the repository on the server.  Of course that will make the 
repository larger and the data transfer longer when cloning, but the 
"resolving deltas" will be much faster.  This is therefore a tradeoff.
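As a sketch of the attribute approach: unsetting the gitattributes `delta` attribute (writing `-delta` after a pattern) tells Git not to attempt delta compression for matching paths. The Python helper below appends such rules to a `.gitattributes` file; the patterns `*.psd`, `*.tga` and `*.fbx` are hypothetical stand-ins for whatever binary formats the artists actually use. On a bare server repository the same lines would typically go into `info/attributes` instead, followed by a full repack such as `git repack -a -f -d`.

```python
from pathlib import Path

# Hypothetical patterns standing in for the project's real binary formats.
BINARY_PATTERNS = ["*.psd", "*.tga", "*.fbx"]


def add_no_delta_rules(repo_dir, patterns):
    """Append 'PATTERN -delta' lines to repo_dir/.gitattributes.

    Rules that are already present are skipped; returns the lines added.
    """
    path = Path(repo_dir) / ".gitattributes"
    text = path.read_text() if path.exists() else ""
    existing = set(text.splitlines())
    new_rules = [f"{p} -delta" for p in patterns if f"{p} -delta" not in existing]
    if new_rules:
        # Start on a fresh line if the file lacks a trailing newline.
        prefix = "" if not text or text.endswith("\n") else "\n"
        with path.open("a") as f:
            f.write(prefix + "".join(rule + "\n" for rule in new_rules))
    return new_rules
```

Running `add_no_delta_rules(".", BINARY_PATTERNS)` twice is harmless: the second call finds the rules already present and adds nothing.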

Another solution which might be way more practical for users of such a 
huge repository is simply to use a shallow clone.  Most likely the people 
cloning this repository don't need its full history.  So you could 
simply use:

	git clone --depth=10 ...

and have only the last 10 revisions transferred.  Later on the 
repository can be deepened by passing the --depth argument with a larger 
value to the fetch command if need be.
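If the shallow-clone route is scripted, the two commands can be wrapped as below. This is only a thin sketch around the command line: the helper names are hypothetical, and `git fetch --depth=N` is meant to be run inside the existing clone to deepen it.

```python
import subprocess


def shallow_clone_cmd(url: str, dest: str, depth: int = 10) -> list:
    """Command for a clone that transfers only the last `depth` commits."""
    return ["git", "clone", f"--depth={depth}", url, dest]


def deepen_cmd(depth: int) -> list:
    """Command, run inside the clone, that fetches more history later."""
    return ["git", "fetch", f"--depth={depth}"]


def run(cmd, cwd=None) -> None:
    """Run a git command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, cwd=cwd, check=True)
```

With a hypothetical URL, usage would look like `run(shallow_clone_cmd("ssh://server/project.git", "project"))` followed later by `run(deepen_cmd(100), cwd="project")`.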


Nicolas

* Re: git clone: very long "resolving deltas" phase
  2010-04-11  0:50         ` Nicolas Pitre
@ 2010-04-12 15:31           ` Vitaly
  0 siblings, 0 replies; 39+ messages in thread
From: Vitaly @ 2010-04-12 15:31 UTC
  To: git

Hello,

On 04/11/2010 04:50 AM, Nicolas Pitre wrote:
> core.deltaBaseCacheLimit. Given that your files are "relatively" small,
> i.e. in the 4MB range at most, the cache should be able to hold quite
> a lot of them.  At the moment, with its 16MB limit, just a few of those
> objects are enough to evict many others from the cache quickly.
>
> If this is still not good enough, then you could add a negative delta
> attribute to those large binary files (see
> http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html)
> and repack the repository on the server.  Of course that will make the
> repository larger and the data transfer longer when cloning, but the
> "resolving deltas" will be much faster.  This is therefore a tradeoff.
>
> Another solution which might be way more practical for users of such a
> huge repository is simply to use a shallow clone.  Most likely the people
> cloning this repository don't need its full history.  So you could
> simply use:
>
> 	git clone --depth=10 ...
>
> and have only the last 10 revisions transferred.  Later on the
> repository can be deepened by passing the --depth argument with a larger
> value to the fetch command if need be.
>
>
> Nicolas
>
>    
Thanks for the comprehensive answer, Nicolas. Now I see 3 directions to work 
on: the delta base cache limit, negative delta attributes and shortening the 
history (actually, I don't think "clone --depth" is feasible in our 
environment, but we can try to back up and just purge the history).

end of thread

Thread overview: 39+ messages
2010-04-06 14:18 git clone: very long "resolving deltas" phase Vitaly Berov
2010-04-06 15:01 ` Matthieu Moy
2010-04-06 15:28   ` Vitaly Berov
2010-04-06 15:29   ` Vitaly
2010-04-06 15:32     ` Andreas Ericsson
     [not found]       ` <q2mec874dac1004060850r5eaa41fak2ba9889d07794651@mail.gmail.com>
2010-04-06 15:56         ` Vitaly Berov
2010-04-06 21:09           ` Nicolas Pitre
2010-04-07  5:54             ` Vitaly Berov
2010-04-07  8:00               ` Ilari Liusvaara
2010-04-07  8:14                 ` Vitaly
2010-04-07  9:00                   ` Ilari Liusvaara
2010-04-07  9:37                   ` Jakub Narebski
2010-04-07 14:20                   ` Nicolas Pitre
2010-04-07 14:35                     ` Vitaly
2010-04-07 14:55                       ` Nicolas Pitre
2010-04-09  6:46                         ` Vitaly Berov
2010-04-09 19:30                           ` Nicolas Pitre
2010-04-10  6:32                             ` Vitaly Berov
2010-04-07 14:08                 ` Nicolas Pitre
2010-04-07 14:29                   ` Sverre Rabbelier
2010-04-07 14:37                     ` Vitaly
2010-04-07  5:55             ` Vitaly
2010-04-07 12:42               ` Nicolas Pitre
2010-04-06 21:05       ` Nicolas Pitre
2010-04-07  9:22         ` git clone: very long "resolving deltas" phase Marat Radchenko
2010-04-07 14:40           ` Nicolas Pitre
2010-04-06 21:01   ` git clone: very long "resolving deltas" phase Nicolas Pitre
2010-04-06 21:10 ` Nicolas Pitre
2010-04-07  5:57   ` Vitaly
2010-04-07 12:55     ` Nicolas Pitre
2010-04-09  6:50       ` Vitaly Berov
2010-04-09  8:13         ` Matthieu Moy
2010-04-09 19:18           ` Nicolas Pitre
2010-04-10  8:05           ` Vitaly Berov
2010-04-09 19:25         ` Nicolas Pitre
2010-04-10  7:58           ` Vitaly Berov
2010-04-10 13:25       ` Vitaly Berov
2010-04-11  0:50         ` Nicolas Pitre
2010-04-12 15:31           ` Vitaly