git.vger.kernel.org archive mirror
* [PATCH] Bump core.deltaBaseCacheLimit to 96m
@ 2014-05-04 17:13 David Kastrup
  2014-05-05 10:26 ` Duy Nguyen
  0 siblings, 1 reply; 7+ messages in thread
From: David Kastrup @ 2014-05-04 17:13 UTC (permalink / raw)
  To: git; +Cc: David Kastrup

The default of 16m causes serious thrashing for large delta chains
combined with large files.

Here are some benchmarks (pu variant of git blame):

time git blame -C src/xdisp.c >/dev/null

run on an Emacs repository repacked with git gc --aggressive (v1.9,
resulting in a window size of 250) and located on an SSD.  The file in
question has about 30000 lines, is about 1MB in size, and has a history
of about 2500 commits.

limit               real        user        sys
16m (prev. default) 3m33.936s   2m15.396s   1m17.352s
32m                 3m1.319s    2m8.660s    0m51.904s
64m                 2m20.636s   1m55.780s   0m23.964s
96m                 2m5.668s    1m50.784s   0m14.288s
128m                2m4.337s    1m50.764s   0m12.832s
192m                2m3.567s    1m49.508s   0m13.312s

Signed-off-by: David Kastrup <dak@gnu.org>
---
 Documentation/config.txt | 2 +-
 environment.c            | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1932e9b..21a3c86 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -489,7 +489,7 @@ core.deltaBaseCacheLimit::
 	to avoid unpacking and decompressing frequently used base
 	objects multiple times.
 +
-Default is 16 MiB on all platforms.  This should be reasonable
+Default is 96 MiB on all platforms.  This should be reasonable
 for all users/operating systems, except on the largest projects.
 You probably do not need to adjust this value.
 +
diff --git a/environment.c b/environment.c
index 5c4815d..37354c8 100644
--- a/environment.c
+++ b/environment.c
@@ -37,7 +37,7 @@ int core_compression_seen;
 int fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
-size_t delta_base_cache_limit = 16 * 1024 * 1024;
+size_t delta_base_cache_limit = 96 * 1024 * 1024;
 unsigned long big_file_threshold = 512 * 1024 * 1024;
 const char *pager_program;
 int pager_use_color = 1;
-- 
1.9.1

* Re: [PATCH] Bump core.deltaBaseCacheLimit to 96m
  2014-05-04 17:13 [PATCH] Bump core.deltaBaseCacheLimit to 96m David Kastrup
@ 2014-05-05 10:26 ` Duy Nguyen
  2014-05-05 10:27   ` Duy Nguyen
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Duy Nguyen @ 2014-05-05 10:26 UTC (permalink / raw)
  To: David Kastrup; +Cc: Git Mailing List

On Mon, May 5, 2014 at 12:13 AM, David Kastrup <dak@gnu.org> wrote:
> The default of 16m causes serious thrashing for large delta chains
> combined with large files.
>
> Here are some benchmarks (pu variant of git blame):
>
> time git blame -C src/xdisp.c >/dev/null

...

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1932e9b..21a3c86 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -489,7 +489,7 @@ core.deltaBaseCacheLimit::
>         to avoid unpacking and decompressing frequently used base
>         objects multiple times.
>  +
> -Default is 16 MiB on all platforms.  This should be reasonable
> +Default is 96 MiB on all platforms.  This should be reasonable
>  for all users/operating systems, except on the largest projects.
>  You probably do not need to adjust this value.

So emacs.git falls exactly into the "except on the largest projects"
part. Would it make more sense to advise git devs to set this per repo
instead? The majority of (open source) repositories out there are
small if I'm not mistaken. Of those few big repos, we could have a
section listing all the tips and tricks to tune git. This is one of
them. Index v4 and sparse checkout are others. In future, maybe
watchman support, split index and untracked cache as well.
-- 
Duy

* Re: [PATCH] Bump core.deltaBaseCacheLimit to 96m
  2014-05-05 10:26 ` Duy Nguyen
@ 2014-05-05 10:27   ` Duy Nguyen
  2014-05-05 11:03   ` Matthieu Moy
  2014-05-05 11:20   ` David Kastrup
  2 siblings, 0 replies; 7+ messages in thread
From: Duy Nguyen @ 2014-05-05 10:27 UTC (permalink / raw)
  To: David Kastrup; +Cc: Git Mailing List

On Mon, May 5, 2014 at 5:26 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> part. Would it make more sense to advise git devs to set this per repo

s/advise git devs/advise emacs devs/
-- 
Duy

* Re: [PATCH] Bump core.deltaBaseCacheLimit to 96m
  2014-05-05 10:26 ` Duy Nguyen
  2014-05-05 10:27   ` Duy Nguyen
@ 2014-05-05 11:03   ` Matthieu Moy
  2014-05-05 11:35     ` Duy Nguyen
  2014-05-05 11:20   ` David Kastrup
  2 siblings, 1 reply; 7+ messages in thread
From: Matthieu Moy @ 2014-05-05 11:03 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: David Kastrup, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> On Mon, May 5, 2014 at 12:13 AM, David Kastrup <dak@gnu.org> wrote:
>> The default of 16m causes serious thrashing for large delta chains
>> combined with large files.
>>
>> Here are some benchmarks (pu variant of git blame):
>>
>> time git blame -C src/xdisp.c >/dev/null
>
> ...
>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 1932e9b..21a3c86 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -489,7 +489,7 @@ core.deltaBaseCacheLimit::
>>         to avoid unpacking and decompressing frequently used base
>>         objects multiple times.
>>  +
>> -Default is 16 MiB on all platforms.  This should be reasonable
>> +Default is 96 MiB on all platforms.  This should be reasonable
>>  for all users/operating systems, except on the largest projects.
>>  You probably do not need to adjust this value.
>
> So emacs.git falls exactly into the "except on the largest projects"
> part. Would it make more sense to advise git devs to set this per repo
> instead?

What's the impact of changing the default for small projects?

My guess is that changing from 16 to 96MB is just following Moore's law.
The average RAM of machines has increased a lot since 16MB was chosen,
and few people would actually notice the difference in RAM usage
nowadays.

If increasing the default does not harm small projects and benefits
big projects, then we should obviously go this way.

(perhaps adding advice for people using Git on machines with little RAM)

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: [PATCH] Bump core.deltaBaseCacheLimit to 96m
  2014-05-05 10:26 ` Duy Nguyen
  2014-05-05 10:27   ` Duy Nguyen
  2014-05-05 11:03   ` Matthieu Moy
@ 2014-05-05 11:20   ` David Kastrup
  2014-05-05 20:19     ` Jeff King
  2 siblings, 1 reply; 7+ messages in thread
From: David Kastrup @ 2014-05-05 11:20 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> On Mon, May 5, 2014 at 12:13 AM, David Kastrup <dak@gnu.org> wrote:
>> The default of 16m causes serious thrashing for large delta chains
>> combined with large files.
>>
>> Here are some benchmarks (pu variant of git blame):
>>
>> time git blame -C src/xdisp.c >/dev/null
>
> ...
>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 1932e9b..21a3c86 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -489,7 +489,7 @@ core.deltaBaseCacheLimit::
>>         to avoid unpacking and decompressing frequently used base
>>         objects multiple times.
>>  +
>> -Default is 16 MiB on all platforms.  This should be reasonable
>> +Default is 96 MiB on all platforms.  This should be reasonable
>>  for all users/operating systems, except on the largest projects.
>>  You probably do not need to adjust this value.
>
> So emacs.git falls exactly into the "except on the largest projects"
> part.

git gc --aggressive has been used/recommended for _all_ projects
regularly, leading to delta chains with a length of 250.  So this delta
chain size is not exceptional but will eventually occur in any archive
that has been created and maintained according to the recommendations of
Git's documentation (which recommends gc --aggressive every few hundred
revisions).  I was illustrating the effect on a file of size 1MB.
That's not an egregiously large file either.

96MB, the point of diminishing returns in this case, is _6_ times
larger than the current default and _small_ in comparison with the
memory installed on developer machines nowadays.  Similar slowdowns
occur with other examples.  With the current defaults, Git will accept
files of up to 512MB into its compression scheme (and thus its core
memory) before punting.

The current deltaBaseCacheLimit of 16MB is rather ridiculous, in
particular with the pre-2.0 settings for gc --aggressive, and causes
serious performance degradation.  It was actually ridiculous even 10
years ago.

> Would it make more sense to advise git devs to set this per repo
> instead? The majority of (open source) repositories out there are
> small if I'm not mistaken. Of those few big repos, we could have a
> section listing all the tips and tricks to tune git. This is one of
> them. Index v4 and sparse checkout are others. In future, maybe
> watchman support, split index and untracked cache as well.

Shrug.  The last version of the patch was refused because more
evidence was wanted.  I added the evidence.

And I have it on record in the mailing list and can point to it when
people ask me why Git is so slow at "git blame" compared with other
version control systems, despite my claim to have improved it.

I'm definitely not going to jump through any more hoops here.  I don't
see a point in this kind of spectacle.

-- 
David Kastrup

* Re: [PATCH] Bump core.deltaBaseCacheLimit to 96m
  2014-05-05 11:03   ` Matthieu Moy
@ 2014-05-05 11:35     ` Duy Nguyen
  0 siblings, 0 replies; 7+ messages in thread
From: Duy Nguyen @ 2014-05-05 11:35 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: David Kastrup, Git Mailing List

On Mon, May 5, 2014 at 6:03 PM, Matthieu Moy
<Matthieu.Moy@grenoble-inp.fr> wrote:
>>> -Default is 16 MiB on all platforms.  This should be reasonable
>>> +Default is 96 MiB on all platforms.  This should be reasonable
>>>  for all users/operating systems, except on the largest projects.
>>>  You probably do not need to adjust this value.
>>
>> So emacs.git falls exactly into the "except on the largest projects"
>> part. Would it make more sense to advise git devs to set this per repo
>> instead?
>
> What's the impact of changing the default for small projects?

Good question. With "git log --patch" or something like that, we could
use up to the limit, which is now 96MB. On modern machines that's
probably nothing.

> My guess is that changing from 16 to 96MB is just following Moore's law.
> The average RAM of machines has increased a lot since 16MB was chosen,
> and few people would actually notice the difference in RAM usage
> nowadays.
>
> If increasing the default does not harm small projects and benefits
> big projects, then we should obviously go this way.

I wrote without thinking it through. I agree with you.

> (perhaps adding advice for people using Git on machines with little RAM)
-- 
Duy

* Re: [PATCH] Bump core.deltaBaseCacheLimit to 96m
  2014-05-05 11:20   ` David Kastrup
@ 2014-05-05 20:19     ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2014-05-05 20:19 UTC (permalink / raw)
  To: David Kastrup; +Cc: Duy Nguyen, Git Mailing List

On Mon, May 05, 2014 at 01:20:09PM +0200, David Kastrup wrote:

> > Would it make more sense to advise git devs to set this per repo
> > instead? The majority of (open source) repositories out there are
> > small if I'm not mistaken. Of those few big repos, we could have a
> > section listing all the tips and tricks to tune git. This is one of
> > them. Index v4 and sparse checkout are others. In future, maybe
> > watchman support, split index and untracked cache as well.
> 
> Shrug.  The last version of the patch was refused because more
> evidence was wanted.  I added the evidence.

FWIW, I was the one who asked for the evidence, and this patch looks
pretty straightforward and good. We may also want to revisit the data
structure for the delta cache, but that can come separately. My earlier
tests had not shown improvement with just bumping the cache size, but
these ones obviously do. So I think it's worth bumping the default.

-Peff

Thread overview: 7+ messages
2014-05-04 17:13 [PATCH] Bump core.deltaBaseCacheLimit to 96m David Kastrup
2014-05-05 10:26 ` Duy Nguyen
2014-05-05 10:27   ` Duy Nguyen
2014-05-05 11:03   ` Matthieu Moy
2014-05-05 11:35     ` Duy Nguyen
2014-05-05 11:20   ` David Kastrup
2014-05-05 20:19     ` Jeff King