All of lore.kernel.org
 help / color / mirror / Atom feed
* Make GIT_USE_LOOKUP default?
@ 2013-03-17 13:25 Duy Nguyen
  2013-03-17 20:13 ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Duy Nguyen @ 2013-03-17 13:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

This env comes from jc/sha1-lookup in 2008 (merge commit e9f9d4f), 5
years ago. I wonder if it's good enough to turn on by default and keep
improving from there, or is it still experimental?
-- 
Duy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Make GIT_USE_LOOKUP default?
  2013-03-17 13:25 Make GIT_USE_LOOKUP default? Duy Nguyen
@ 2013-03-17 20:13 ` Junio C Hamano
  2013-03-18  7:32   ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2013-03-17 20:13 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> This env comes from jc/sha1-lookup in 2008 (merge commit e9f9d4f), 5
> years ago. I wonder if it's good enough to turn on by default and keep
> improving from there, or is it still experimental?

The algorithm has been used in production in other codepaths like
patch-ids and replace-object, so correctness-wise it should be fine
to turn it on.  I think nobody has bothered to benchmark with and
without the environment to see if it is really worth the complexity.

It may be a good idea to try doing so, now you have noticed it ;-).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Make GIT_USE_LOOKUP default?
  2013-03-17 20:13 ` Junio C Hamano
@ 2013-03-18  7:32   ` Jeff King
  2013-03-18 16:44     ` Junio C Hamano
  2013-03-19 15:43     ` Duy Nguyen
  0 siblings, 2 replies; 10+ messages in thread
From: Jeff King @ 2013-03-18  7:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ingo Molnar, Jonathan Nieder, Duy Nguyen, Git Mailing List

[+cc Ingo and Jonathan, as this revisits the "open-code hashcmp" thread
     referenced below]

On Sun, Mar 17, 2013 at 01:13:56PM -0700, Junio C Hamano wrote:

> Duy Nguyen <pclouds@gmail.com> writes:
> 
> > This env comes from jc/sha1-lookup in 2008 (merge commit e9f9d4f), 5
> > years ago. I wonder if it's good enough to turn on by default and keep
> > improving from there, or is it still experimental?
> 
> The algorithm has been used in production in other codepaths like
> patch-ids and replace-object, so correctness-wise it should be fine
> to turn it on.  I think nobody has bothered to benchmark with and
> without the environment to see if it is really worth the complexity.
> 
> It may be a good idea to try doing so, now you have noticed it ;-).

The only benchmarking I could find in the list archive (besides the ones
in the commit itself, showing little change, but fewer page faults) is:

  http://article.gmane.org/gmane.comp.version-control.git/123832

which actually indicates that GIT_USE_LOOKUP is slower (despite having
fewer page faults).

By the way, looking at that made me think for a few minutes about
hashcmp, and I was surprised to find that we use an open-coded
comparison loop. That dates back to this thread by Ingo:

  http://article.gmane.org/gmane.comp.version-control.git/172286

I could not replicate his benchmarks at all. In fact, my measurements
showed a slight slowdown with 1a812f3 (hashcmp(): inline memcmp() by
hand to optimize, 2011-04-28).

Here are my best-of-five numbers for running "git rev-list --objects
--all >/dev/null" on linux-2.6.git:

  [current master, compiled with -O2]
  real    0m45.612s
  user    0m45.140s
  sys     0m0.300s

  [current master, compiled with -O3 for comparison]
  real    0m45.588s
  user    0m45.088s
  sys     0m0.312s

  [revert 1a812f3 (i.e., go back to memcmp), -O2]
  real    0m44.358s
  user    0m43.876s
  sys     0m0.316s

  [open-code first byte, fall back to memcmp, -O2]
  real    0m43.963s
  user    0m43.568s
  sys     0m0.284s

I wonder why we get such different numbers. Ingo said his tests are on a
Nehalem CPU, as are mine (mine is an i7-840QM). I wonder if we should be
wrapping the optimization in an #ifdef, but I'm not sure which flag we
should be checking.

Note that I didn't run all of my measurements using "git gc" as Ingo
did, which I think conflates a lot of unrelated performance issues (like
writing out a packfile). The interesting bits for hashcmp in "gc" are
the "Counting objects" phase of pack-objects, and "git prune"
determining reachability. Those are both basically the same as "rev-list
--objects --all".

I did do a quick check of `git gc`, though, and it showed results that
matched my rev-lists above (i.e., a very slight speedup by going back to
memcmp).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Make GIT_USE_LOOKUP default?
  2013-03-18  7:32   ` Jeff King
@ 2013-03-18 16:44     ` Junio C Hamano
  2013-03-18 16:49       ` Jeff King
  2013-03-19 15:43     ` Duy Nguyen
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2013-03-18 16:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Ingo Molnar, Jonathan Nieder, Duy Nguyen, Git Mailing List

Jeff King <peff@peff.net> writes:

> By the way, looking at that made me think for a few minutes about
> hashcmp, and I was surprised to find that we use an open-coded
> comparison loop. That dates back to this thread by Ingo:
>
>   http://article.gmane.org/gmane.comp.version-control.git/172286
>
> I could not replicate his benchmarks at all. In fact, my measurements
> showed a slight slowdown with 1a812f3 (hashcmp(): inline memcmp() by
> hand to optimize, 2011-04-28).
>
> Here are my best-of-five numbers for running "git rev-list --objects
> --all >/dev/null" on linux-2.6.git:
>
>   [current master, compiled with -O2]
>   real    0m45.612s
>   user    0m45.140s
>   sys     0m0.300s
> ...
>   [revert 1a812f3 (i.e., go back to memcmp), -O2]
>   real    0m44.358s
>   user    0m43.876s
>   sys     0m0.316s
> ...
> I wonder why we get such different numbers. Ingo said his tests are on a
> Nehalem CPU, as are mine (mine is an i7-840QM). I wonder if we should be
> wrapping the optimization in an #ifdef, but I'm not sure which flag we
> should be checking.

FWIW, I am getting something like this on my

$ grep 'model name' /proc/cpuinfo | uniq -c
  4 model name      : Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz

The same "rev-list --objects --all >/dev/null", best of five:

[current master, compiled with -O2]
real    0m39.956s
user    0m39.562s
sys     0m0.396s

[revert 1a812f3 (i.e., go back to memcmp), -O2]
real    0m42.161s
user    0m41.879s
sys     0m0.284s

It could be that the difference may be how well memcmp() is
optimized, no?  My box runs Debian with libc6 2.11.3-4 and gcc
4.4.5.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Make GIT_USE_LOOKUP default?
  2013-03-18 16:44     ` Junio C Hamano
@ 2013-03-18 16:49       ` Jeff King
  2013-03-18 17:08         ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2013-03-18 16:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ingo Molnar, Jonathan Nieder, Duy Nguyen, Git Mailing List

On Mon, Mar 18, 2013 at 09:44:20AM -0700, Junio C Hamano wrote:

> FWIW, I am getting something like this on my
> 
> $ grep 'model name' /proc/cpuinfo | uniq -c
>   4 model name      : Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz
> 
> The same "rev-list --objects --all >/dev/null", best of five:
> 
> [current master, compiled with -O2]
> real    0m39.956s
> user    0m39.562s
> sys     0m0.396s
> 
> [revert 1a812f3 (i.e., go back to memcmp), -O2]
> real    0m42.161s
> user    0m41.879s
> sys     0m0.284s
> 
> It could be that the difference may be how well memcmp() is
> optimized, no?  My box runs Debian with libc6 2.11.3-4 and gcc
> 4.4.5.

Yeah, that would make sense. I have libc6 2.13-38 and gcc 4.7.2 (debian
unstable).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Make GIT_USE_LOOKUP default?
  2013-03-18 16:49       ` Jeff King
@ 2013-03-18 17:08         ` Junio C Hamano
  2013-03-18 17:19           ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2013-03-18 17:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Ingo Molnar, Jonathan Nieder, Duy Nguyen, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Mon, Mar 18, 2013 at 09:44:20AM -0700, Junio C Hamano wrote:
>
>> FWIW, I am getting something like this on my
>> 
>> $ grep 'model name' /proc/cpuinfo | uniq -c
>>   4 model name      : Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz
>> 
>> The same "rev-list --objects --all >/dev/null", best of five:
>> 
>> [current master, compiled with -O2]
>> real    0m39.956s
>> user    0m39.562s
>> sys     0m0.396s
>> 
>> [revert 1a812f3 (i.e., go back to memcmp), -O2]
>> real    0m42.161s
>> user    0m41.879s
>> sys     0m0.284s
>> 
>> It could be that the difference may be how well memcmp() is
>> optimized, no?  My box runs Debian with libc6 2.11.3-4 and gcc
>> 4.4.5.
>
> Yeah, that would make sense. I have libc6 2.13-38 and gcc 4.7.2 (debian
> unstable).

Just for fun, here is a totally unrelated comparison, both with
current master, compiled with -O2 and running on the same box.

[without GIT_USE_LOOKUP]
real    0m39.906s
real    0m40.020s
real    0m40.022s
real    0m40.027s
real    0m40.146s

[with GIT_USE_LOOKUP]
real    0m40.336s
real    0m40.376s
real    0m40.452s
real    0m40.503s
real    0m40.572s

Maybe the Newton-Raphson is right as a concept (it does seem to
result in fewer minor-faults) but the current code is implemented
poorly and has a huge room for improvement?

[without GIT_USE_LOOKUP]
0inputs+0outputs (0major+182895minor)pagefaults 0swaps
0inputs+0outputs (0major+182920minor)pagefaults 0swaps
0inputs+0outputs (0major+183004minor)pagefaults 0swaps
0inputs+0outputs (0major+183006minor)pagefaults 0swaps
0inputs+0outputs (0major+183036minor)pagefaults 0swaps

[with GIT_USE_LOOKUP]
0inputs+0outputs (0major+182803minor)pagefaults 0swaps
0inputs+0outputs (0major+182886minor)pagefaults 0swaps
0inputs+0outputs (0major+182967minor)pagefaults 0swaps
0inputs+0outputs (0major+182997minor)pagefaults 0swaps
0inputs+0outputs (0major+182998minor)pagefaults 0swaps

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Make GIT_USE_LOOKUP default?
  2013-03-18 17:08         ` Junio C Hamano
@ 2013-03-18 17:19           ` Jeff King
  2013-03-18 18:40             ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2013-03-18 17:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ingo Molnar, Jonathan Nieder, Duy Nguyen, Git Mailing List

On Mon, Mar 18, 2013 at 10:08:11AM -0700, Junio C Hamano wrote:

> Just for fun, here is a totally unrelated comparison, both with
> current master, compiled with -O2 and running on the same box.
> 
> [without GIT_USE_LOOKUP]
> real    0m39.906s
> real    0m40.020s
> real    0m40.022s
> real    0m40.027s
> real    0m40.146s
> 
> [with GIT_USE_LOOKUP]
> real    0m40.336s
> real    0m40.376s
> real    0m40.452s
> real    0m40.503s
> real    0m40.572s
> 
> Maybe the Newton-Raphson is right as a concept (it does seem to
> result in fewer minor-faults) but the current code is implemented
> poorly and has a huge room for improvement?

I do not see anything obviously wrong in it, though I did not walk
through all of the ofs calculation to look for any clever speedups.
However, I think it is clear from the other timings and Ingo's thread
that glibc 2.11's memcmp does not perform very well on many short reads.
And sha1_entry_pos will do memcmps even smaller than 20 bytes.

What happens if you do this?

diff --git a/sha1-lookup.c b/sha1-lookup.c
index c4dc55d..4ea03bd 100644
--- a/sha1-lookup.c
+++ b/sha1-lookup.c
@@ -102,6 +102,17 @@ int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
 	return -lo-1;
 }
 
+static int short_memcmp(const unsigned char *a,
+			const unsigned char *b,
+			int len)
+{
+	int i;
+	for (i = 0; i < len; i++, a++, b++)
+		if (*a != *b)
+			return *a - *b;
+	return 0;
+}
+
 /*
  * Conventional binary search loop looks like this:
  *
@@ -257,7 +268,7 @@ int sha1_entry_pos(const void *table,
 			    lo, mi, hi, sha1_to_hex(key));
 
 		mi_key = base + elem_size * mi + key_offset;
-		cmp = memcmp(mi_key + ofs_0, key + ofs_0, 20 - ofs_0);
+		cmp = short_memcmp(mi_key + ofs_0, key + ofs_0, 20 - ofs_0);
 		if (!cmp)
 			return mi;
 		if (cmp > 0) {

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Make GIT_USE_LOOKUP default?
  2013-03-18 17:19           ` Jeff King
@ 2013-03-18 18:40             ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2013-03-18 18:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Ingo Molnar, Jonathan Nieder, Duy Nguyen, Git Mailing List

Jeff King <peff@peff.net> writes:

> I do not see anything obviously wrong in it, though I did not walk
> through all of the ofs calculation to look for any clever speedups.
> However, I think it is clear from the other timings and Ingo's thread
> that glibc 2.11's memcmp does not perform very well on many short reads.
> And sha1_entry_pos will do memcmps even smaller than 20 bytes.
>
> What happens if you do this?

The overall trend is the same.

[without GIT_USE_LOOKUP]
real    0m40.044s
real    0m40.054s
real    0m40.072s
real    0m40.097s
real    0m40.159s

[with GIT_USE_LOOKUP]
real    0m40.257s
real    0m40.281s
real    0m40.311s
real    0m40.366s
real    0m40.407s

I suspect that after the first few rounds the range shrinks small
enough that the difference between a simple "mi = (hi + lo)/2" and
convoluted ofs computation becomes dominant.  Perhaps we should only
do N-R for the initial midpoint selection and then fall back to a
stupid binary search?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Make GIT_USE_LOOKUP default?
  2013-03-18  7:32   ` Jeff King
  2013-03-18 16:44     ` Junio C Hamano
@ 2013-03-19 15:43     ` Duy Nguyen
  2013-03-19 15:55       ` Jeff King
  1 sibling, 1 reply; 10+ messages in thread
From: Duy Nguyen @ 2013-03-19 15:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Ingo Molnar, Jonathan Nieder, Git Mailing List

On Mon, Mar 18, 2013 at 2:32 PM, Jeff King <peff@peff.net> wrote:
> By the way, looking at that made me think for a few minutes about
> hashcmp, and I was surprised to find that we use an open-coded
> comparison loop. That dates back to this thread by Ingo:
>
>   http://article.gmane.org/gmane.comp.version-control.git/172286
>
> I could not replicate his benchmarks at all. In fact, my measurements
> showed a slight slowdown with 1a812f3 (hashcmp(): inline memcmp() by
> hand to optimize, 2011-04-28).
>
> Here are my best-of-five numbers for running "git rev-list --objects
> --all >/dev/null" on linux-2.6.git:
>
>   [current master, compiled with -O2]
>   real    0m45.612s
>   user    0m45.140s
>   sys     0m0.300s
>
>   [current master, compiled with -O3 for comparison]
>   real    0m45.588s
>   user    0m45.088s
>   sys     0m0.312s
>
>   [revert 1a812f3 (i.e., go back to memcmp), -O2]
>   real    0m44.358s
>   user    0m43.876s
>   sys     0m0.316s
>
>   [open-code first byte, fall back to memcmp, -O2]
>   real    0m43.963s
>   user    0m43.568s
>   sys     0m0.284s
>
> I wonder why we get such different numbers. Ingo said his tests are on a
> Nehalem CPU, as are mine (mine is an i7-840QM). I wonder if we should be
> wrapping the optimization in an #ifdef, but I'm not sure which flag we
> should be checking.

What gcc and glibc versions are you using? With gcc 4.5.3 I got "repz
cmpsb" after reverting the patch, just like what Ingo described
(although interestingly it ran a bit faster than current master, glibc
2.11.2 on Atom D510 32 bit). gcc 4.6.3 -O2 (on another machine, 64
bit) produced a call to libc's memcmp instead of "repz cmpsb". I guess
if "repz cmpsb" is what we are against, then we could pass
-fno-builtin-memcmp (potential impact to other parts of git though).
-- 
Duy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Make GIT_USE_LOOKUP default?
  2013-03-19 15:43     ` Duy Nguyen
@ 2013-03-19 15:55       ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2013-03-19 15:55 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, Ingo Molnar, Jonathan Nieder, Git Mailing List

On Tue, Mar 19, 2013 at 10:43:40PM +0700, Nguyen Thai Ngoc Duy wrote:

> > I could not replicate his benchmarks at all. In fact, my measurements
> > showed a slight slowdown with 1a812f3 (hashcmp(): inline memcmp() by
> > hand to optimize, 2011-04-28).
> [...]
> 
> What gcc and glibc versions are you using? With gcc 4.5.3 I got "repz
> cmpsb" after reverting the patch, just like what Ingo described
> (although interestingly it ran a bit faster than current master, glibc
> 2.11.2 on Atom D510 32 bit). gcc 4.6.3 -O2 (on another machine, 64
> bit) produced a call to libc's memcmp instead of "repz cmpsb". I guess
> if "repz cmpsb" is what we are against, then we could pass
> -fno-builtin-memcmp (potential impact to other parts of git though).

I have glibc 2.13. And looking at the changelog, there were a ton of
ssse-optimized memcmp implementations that went into 2.13 [1], so I suspect
that is what is going on.

-Peff

PS As a side note, you may recall a year or two ago when these went in,
because the initial versions walked backwards, meaning memcpys of
overlapping regions started to fail (as they are allowed to, but many
programs do not properly use memmove).

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-03-19 15:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-17 13:25 Make GIT_USE_LOOKUP default? Duy Nguyen
2013-03-17 20:13 ` Junio C Hamano
2013-03-18  7:32   ` Jeff King
2013-03-18 16:44     ` Junio C Hamano
2013-03-18 16:49       ` Jeff King
2013-03-18 17:08         ` Junio C Hamano
2013-03-18 17:19           ` Jeff King
2013-03-18 18:40             ` Junio C Hamano
2013-03-19 15:43     ` Duy Nguyen
2013-03-19 15:55       ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.