* [PATCH] fast-import: use hashcmp() for SHA1 hash comparison
@ 2014-07-18 16:00 René Scharfe
  2014-07-18 18:42 ` Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: René Scharfe @ 2014-07-18 16:00 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 fast-import.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fast-import.c b/fast-import.c
index fa635f7..d73f58c 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2324,7 +2324,7 @@ static void file_change_m(const char *p, struct branch *b)
 	}
 
 	/* Git does not track empty, non-toplevel directories. */
-	if (S_ISDIR(mode) && !memcmp(sha1, EMPTY_TREE_SHA1_BIN, 20) && *p) {
+	if (S_ISDIR(mode) && !hashcmp(sha1, EMPTY_TREE_SHA1_BIN) && *p) {
 		tree_content_remove(&b->branch_tree, p, NULL, 0);
 		return;
 	}
-- 
2.0.2


* Re: [PATCH] fast-import: use hashcmp() for SHA1 hash comparison
  2014-07-18 16:00 [PATCH] fast-import: use hashcmp() for SHA1 hash comparison René Scharfe
@ 2014-07-18 18:42 ` Jonathan Nieder
  2014-07-18 19:14   ` René Scharfe
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2014-07-18 18:42 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git Mailing List, Junio C Hamano

René Scharfe wrote:

> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> ---
>  fast-import.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Before:

  $ size git-fast-import
     text    data     bss     dec     hex filename
   804138    6768  754160 1565066  17e18a git-fast-import

After:

  $ size git-fast-import
     text    data     bss     dec     hex filename
   804154    6768  754160 1565082  17e19a git-fast-import

So this makes the text size worse on this machine (amd64, building
with gcc 4.8.2 -O2).  That's probably because the old code does 'call
memcmp', while the new code inlines it.  Inlining is presumably the
better choice.

More importantly, the new code is more readable.

For what it's worth,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>


* Re: [PATCH] fast-import: use hashcmp() for SHA1 hash comparison
  2014-07-18 18:42 ` Jonathan Nieder
@ 2014-07-18 19:14   ` René Scharfe
  2014-07-18 23:57     ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: René Scharfe @ 2014-07-18 19:14 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Git Mailing List, Junio C Hamano

Am 18.07.2014 20:42, schrieb Jonathan Nieder:
> René Scharfe wrote:
>
>> Signed-off-by: Rene Scharfe <l.s.r@web.de>
>> ---
>>   fast-import.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> Before:
>
>    $ size git-fast-import
>       text    data     bss     dec     hex filename
>     804138    6768  754160 1565066  17e18a git-fast-import
>
> After:
>
>    $ size git-fast-import
>       text    data     bss     dec     hex filename
>     804154    6768  754160 1565082  17e19a git-fast-import
>
> So this makes the text size worse on this machine (amd64, building
> with gcc 4.8.2 -O2).  That's probably because the old code does 'call
> memcmp', while the new code inlines it.  Inlining is presumably the
> better choice.
>
> More importantly, the new code is more readable.

Yes, the latter point is the important one.

Whether inlining is really better is another matter; I don't understand 
how 1a812f3a (hashcmp(): inline memcmp() by hand to optimize) could have 
made git gc 18% faster, as it claimed.  I would expect memcmp(), which 
can compare more than one byte at a time, to be significantly faster -- 
or at least just as fast as whatever the compiler does with the inlined 
version.

René


* Re: [PATCH] fast-import: use hashcmp() for SHA1 hash comparison
  2014-07-18 19:14   ` René Scharfe
@ 2014-07-18 23:57     ` Jeff King
  2014-07-19 12:11       ` René Scharfe
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2014-07-18 23:57 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jonathan Nieder, Git Mailing List, Junio C Hamano

On Fri, Jul 18, 2014 at 09:14:05PM +0200, René Scharfe wrote:

> If inlining is really better is another matter; I don't understand how
> 1a812f3a (hashcmp(): inline memcmp() by hand to optimize) could have made
> git gc 18% faster, as it claimed.  I would expect memcmp(), which can
> compare more than a byte at a time, to be significantly faster -- or at
> least just as fast as whatever the compiler does with the inlined version.

I looked into this a while ago[1]. I think with glibc 2.13 and up, the
memcmp is a win. We should consider switching back if that is what is
common now.

-Peff

[1] http://article.gmane.org/gmane.comp.version-control.git/218396


* Re: [PATCH] fast-import: use hashcmp() for SHA1 hash comparison
  2014-07-18 23:57     ` Jeff King
@ 2014-07-19 12:11       ` René Scharfe
  2014-07-19 16:43         ` brian m. carlson
  0 siblings, 1 reply; 7+ messages in thread
From: René Scharfe @ 2014-07-19 12:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, Git Mailing List, Junio C Hamano

Am 19.07.2014 01:57, schrieb Jeff King:
> On Fri, Jul 18, 2014 at 09:14:05PM +0200, René Scharfe wrote:
> 
>> If inlining is really better is another matter; I don't understand how
>> 1a812f3a (hashcmp(): inline memcmp() by hand to optimize) could have made
>> git gc 18% faster, as it claimed.  I would expect memcmp(), which can
>> compare more than a byte at a time, to be significantly faster -- or at
>> least just as fast as whatever the compiler does with the inlined version.
> 
> I looked into this a while ago[1]. I think with glibc 2.13 and up, the
> memcmp is a win. We should consider switching back if that is what is
> common now.
> 
> -Peff
> 
> [1] http://article.gmane.org/gmane.comp.version-control.git/218396

On Linux this seems to improve rev-list performance by ca. 10% for me, but
on FreeBSD it slows it down by ca. 25%.  That's less surprising after
looking at their memcmp() implementation:

http://svnweb.freebsd.org/base/release/10.0.0/lib/libc/string/memcmp.c?revision=260789&view=co

It's basically our one-byte-at-a-time inlined version, sans inlining.

I'd say if a platform doesn't bother optimizing memcmp() then they
deserve the resulting performance.  And it's probably not too bad a
penalty because such comparisons probably won't make up a significant
part of most applications.

But perhaps we can do better.  First, here are the numbers I got for
"/usr/bin/time git rev-list --all --objects v2.0.0 >/dev/null" (best
of five runs):

Debian testing amd64 with gcc 4.9.0 and libc 2.19 on Hyper-V on Intel i5-4670:

  master (398dd4bd)
  2.14user 0.02system 0:02.16elapsed 99%CPU (0avgtext+0avgdata 71004maxresident)k
  0inputs+0outputs (0major+21985minor)pagefaults 0swaps

  using memcmp
  1.92user 0.02system 0:01.95elapsed 99%CPU (0avgtext+0avgdata 71032maxresident)k
  0inputs+0outputs (0major+21987minor)pagefaults 0swaps

  four bytes at a time
  1.93user 0.03system 0:01.97elapsed 99%CPU (0avgtext+0avgdata 71240maxresident)k
  0inputs+0outputs (0major+22024minor)pagefaults 0swaps

Debian testing amd64 with clang 3.3 and libc 2.19 on Hyper-V on Intel i5-4670:

  master (398dd4bd)
  2.12user 0.04system 0:02.17elapsed 99%CPU (0avgtext+0avgdata 70976maxresident)k
  0inputs+0outputs (0major+21995minor)pagefaults 0swaps

  memcmp
  1.94user 0.01system 0:01.95elapsed 100%CPU (0avgtext+0avgdata 71012maxresident)k
  0inputs+0outputs (0major+21952minor)pagefaults 0swaps

  four bytes at a time
  1.86user 0.01system 0:01.88elapsed 99%CPU (0avgtext+0avgdata 70788maxresident)k
  0inputs+0outputs (0major+21909minor)pagefaults 0swaps

FreeBSD 10 amd64 with clang 3.3 on Hyper-V on Intel i5-4670c:

  master (398dd4bd)
        1.83 real         1.80 user         0.03 sys

  using memcmp
        2.32 real         2.32 user         0.00 sys

  four bytes at a time
        1.56 real         1.53 user         0.03 sys

Performance measurements in virtual machines are not very accurate, so
take them with a bag of salt, but the numbers were relatively stable
across runs.

How come rev-list runs so much faster on FreeBSD?  Would it make sense
to apply the four-bytes-at-a-time patch?  How would it fare on other
platforms, especially ARM?  Can we do even better?  And more
importantly: Does it matter in the first place?

The memcmp version just replaces the body of hashcmp() with
"return memcmp(sha1, sha2, 20);".  Here's the four-bytes-at-a-time
patch:
---
 cache.h | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 92fc9f1..c6cd28e 100644
--- a/cache.h
+++ b/cache.h
@@ -732,13 +732,18 @@ extern const unsigned char null_sha1[20];
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+	const uint32_t *p1 = (const uint32_t *)sha1;
+	const uint32_t *p2 = (const uint32_t *)sha2;
 	int i;
 
-	for (i = 0; i < 20; i++, sha1++, sha2++) {
-		if (*sha1 != *sha2)
-			return *sha1 - *sha2;
+	for (i = 0; i < 20 / sizeof(uint32_t); i++, p1++, p2++) {
+		uint32_t v1 = htonl(*p1);
+		uint32_t v2 = htonl(*p2);
+		if (v1 < v2)
+			return -1;
+		if (v1 > v2)
+			return 1;
 	}
-
 	return 0;
 }
 
-- 
2.0.2


* Re: [PATCH] fast-import: use hashcmp() for SHA1 hash comparison
  2014-07-19 12:11       ` René Scharfe
@ 2014-07-19 16:43         ` brian m. carlson
  2014-07-19 17:53           ` René Scharfe
  0 siblings, 1 reply; 7+ messages in thread
From: brian m. carlson @ 2014-07-19 16:43 UTC (permalink / raw)
  To: René Scharfe
  Cc: Jeff King, Jonathan Nieder, Git Mailing List, Junio C Hamano


On Sat, Jul 19, 2014 at 02:11:30PM +0200, René Scharfe wrote:
> I'd say if a platform doesn't bother optimizing memcmp() then they
> deserve the resulting performance.  And it's probably not too bad a
> penalty because such comparisons probably won't make up a significant
> part of most applications.

I tend to agree with this.  On many modern versions of GCC, the compiler
can generate an appropriately optimized inline version when it sees a
memcmp call, so in that case it's more of a compiler issue, since no
actual call to the function will be emitted.

>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> +	const uint32_t *p1 = (const uint32_t *)sha1;
> +	const uint32_t *p2 = (const uint32_t *)sha2;

You can't make this cast.  The guaranteed alignment for sha1 and sha2 is
1, and for p1 and p2, it's 4.  If sha1 and sha2 are not suitably
aligned, this will get a SIGBUS on sparc and possibly a wrong value on
ARM[0].

[0] http://www.aleph1.co.uk/chapter-10-arm-structured-alignment-faq

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187



* Re: [PATCH] fast-import: use hashcmp() for SHA1 hash comparison
  2014-07-19 16:43         ` brian m. carlson
@ 2014-07-19 17:53           ` René Scharfe
  0 siblings, 0 replies; 7+ messages in thread
From: René Scharfe @ 2014-07-19 17:53 UTC (permalink / raw)
  To: Jeff King, Jonathan Nieder, Git Mailing List, Junio C Hamano

Am 19.07.2014 18:43, schrieb brian m. carlson:
> On Sat, Jul 19, 2014 at 02:11:30PM +0200, René Scharfe wrote:
>> I'd say if a platform doesn't bother optimizing memcmp() then they
>> deserve the resulting performance.  And it's probably not too bad a
>> penalty because such comparisons probably won't make up a significant
>> part of most applications.
>
> I tend to agree with this.  On many modern versions of GCC, the compiler
> can generate an appropriately optimized inline version when it sees a
> memcmp call, so it's more of a compiler issue then, since no actual call
> to the function will be emitted.

I just found this open GCC bug entry about glibc memcmp being faster 
than the compiler's inlined version: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052.

(Found through 
http://randomascii.wordpress.com/2012/10/31/comparing-memory-is-still-tricky/, 
which notes that the compilers shipped with Microsoft Visual Studio 2010 
and 2012 likewise do not optimize memcmp() as much as they could.)

>>   static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>>   {
>> +	const uint32_t *p1 = (const uint32_t *)sha1;
>> +	const uint32_t *p2 = (const uint32_t *)sha2;
>
> You can't make this cast.  The guaranteed alignment for sha1 and sha2 is
> 1, and for p1 and p2, it's 4.  If sha1 and sha2 are not suitably
> aligned, this will get a SIGBUS on sparc and possibly a wrong value on
> ARM[0].
>
> [0] http://www.aleph1.co.uk/chapter-10-arm-structured-alignment-faq

Yeah, it was just a test balloon that happens to work on amd64.  We 
could invent a hash type with correct alignment (a struct with a 
uint32_t[5] member?) and replace all those unsigned char pointers if we 
wanted to go with such a "vectorized" hashcmp, but that would be 
maximally invasive.

René


