* [PATCH] object: measure time needed for resolving hash collisions
@ 2016-09-15  2:01 Stefan Beller
From: Stefan Beller @ 2016-09-15  2:01 UTC
  To: peff; +Cc: git, Stefan Beller

This shows that we spend roughly a third of the time resolving
hash collisions:

I am using linux.git to measure:

$ time git rev-list --objects --count HEAD >/dev/null
print_time_spent_resolving_hash_collisions 9:445591664

real	0m31.733s
user	0m31.220s
sys	0m0.463s

For fun I reverted 9a414486d9f0:

$ time git rev-list --objects --count HEAD >/dev/null
print_time_spent_resolving_hash_collisions 14:338605504

real	0m37.008s
user	0m36.524s
sys	0m0.423s

The 17% that Jeff measured there works out to 1 - 31.733/37.008 = 0.14
in these measurements.
The time spent resolving collisions itself went down by about a third
(14.34s to 9.45s), so there is still room for improvement.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

 According to Jeff, sending patches that don't get accepted is the new hype!

 builtin/rev-list.c |  2 ++
 object.c           | 36 ++++++++++++++++++++++++++++++++++++
 object.h           |  2 ++
 3 files changed, 40 insertions(+)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 8479f6e..d0e5922 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -412,5 +412,7 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			printf("%d\n", revs.count_left + revs.count_right);
 	}
 
+	print_time_spent_resolving_hash_collisions();
+
 	return 0;
 }
diff --git a/object.c b/object.c
index 67d9a9e..e9e73e0 100644
--- a/object.c
+++ b/object.c
@@ -5,9 +5,40 @@
 #include "commit.h"
 #include "tag.h"
 
+#include <time.h>
+
 static struct object **obj_hash;
 static int nr_objs, obj_hash_size;
 
+struct timespec caching = {0, 0};
+
+void diff(struct timespec *start, struct timespec *end, struct timespec *dst)
+{
+	if ((end->tv_nsec-start->tv_nsec)<0) {
+		dst->tv_sec = end->tv_sec-start->tv_sec-1;
+		dst->tv_nsec = 1000000000+end->tv_nsec-start->tv_nsec;
+	} else {
+		dst->tv_sec = end->tv_sec-start->tv_sec;
+		dst->tv_nsec = end->tv_nsec-start->tv_nsec;
+	}
+}
+
+void add_time_to(struct timespec *dst, struct timespec *addend)
+{
+	dst->tv_sec += addend->tv_sec;
+	dst->tv_nsec += addend->tv_nsec;
+	while (dst->tv_nsec >= 1000000000) {
+		dst->tv_nsec -= 1000000000;
+		dst->tv_sec++;
+	}
+}
+
+void print_time_spent_resolving_hash_collisions(void)
+{
+	fprintf(stderr, "print_time_spent_resolving_hash_collisions %ld:%09ld\n",
+		(long)caching.tv_sec, (long)caching.tv_nsec);
+}
+
 unsigned int get_max_object_index(void)
 {
 	return obj_hash_size;
@@ -86,11 +117,13 @@ struct object *lookup_object(const unsigned char *sha1)
 {
 	unsigned int i, first;
 	struct object *obj;
+	struct timespec time1, time2, t_diff;
 
 	if (!obj_hash)
 		return NULL;
 
 	first = i = hash_obj(sha1, obj_hash_size);
+	clock_gettime(CLOCK_MONOTONIC, &time1);
 	while ((obj = obj_hash[i]) != NULL) {
 		if (!hashcmp(sha1, obj->oid.hash))
 			break;
@@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
 		if (i == obj_hash_size)
 			i = 0;
 	}
+	clock_gettime(CLOCK_MONOTONIC, &time2);
+	diff(&time1, &time2, &t_diff);
+	add_time_to(&caching, &t_diff);
 	if (obj && i != first) {
 		/*
 		 * Move object to where we started to look for it so
diff --git a/object.h b/object.h
index f8b6442..ee6ab3a 100644
--- a/object.h
+++ b/object.h
@@ -56,6 +56,8 @@ extern const char *typename(unsigned int type);
 extern int type_from_string_gently(const char *str, ssize_t, int gentle);
 #define type_from_string(str) type_from_string_gently(str, -1, 0)
 
+void print_time_spent_resolving_hash_collisions(void);
+
 /*
  * Return the current number of buckets in the object hashmap.
  */
-- 
2.10.0.130.gef2bcd7.dirty



* Re: [PATCH] object: measure time needed for resolving hash collisions
@ 2016-09-15  6:47 ` Jeff King
From: Jeff King @ 2016-09-15  6:47 UTC
  To: Stefan Beller; +Cc: git

On Wed, Sep 14, 2016 at 07:01:41PM -0700, Stefan Beller wrote:

>  According to Jeff, sending patches that don't get accepted is the new hype!

It is what all the cool kids are doing. Unfortunately, it does not save
you from nitpicky reviews...;)

>  	first = i = hash_obj(sha1, obj_hash_size);
> +	clock_gettime(CLOCK_MONOTONIC, &time1);
>  	while ((obj = obj_hash[i]) != NULL) {
>  		if (!hashcmp(sha1, obj->oid.hash))
>  			break;
> @@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
>  		if (i == obj_hash_size)
>  			i = 0;
>  	}
> +	clock_gettime(CLOCK_MONOTONIC, &time2);
> +	diff(&time1, &time2, &t_diff);
> +	add_time_to(&caching, &t_diff);
>  	if (obj && i != first) {

I don't think this is actually measuring the time spent on collisions.
It's measuring the time we spend in hashcmp(), but that includes the
non-collision case where we find it in the first hashcmp.

Measuring _just_ the collisions is more like the patch below. In my
measurements it's more like 30ms, compared to 10s for all of the
hashcmps.

So we really aren't dealing with collisions, but rather just verifying
that our hash landed at the right spot. And _any_ data structure is
going to have to do that. If you want to make it faster, I'd try
optimizing hashcmp (and you can see why the critbit tree was slower; if
we spend so much time just on hashcmp() to make sure we're at the right
key, then making that slower with a bunch of branching is not going to
help).

I notice we still open-code hashcmp. I get a slight speedup by switching
it to memcmp(). About 2.5%, which is similar to what I showed in

  http://public-inbox.org/git/20130318073229.GA5551@sigill.intra.peff.net/

a few years ago (though it's more pronounced as a portion of the whole
now, because we've optimized some of the other bits).

The main driver there was memcmp() improvements that went into glibc
2.13 several years ago. It might be time to start assuming that memcmp()
beats our open-coded loop.
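
For reference, a rough sketch of the switch being discussed; the loop
below approximates the open-coded hashcmp() from cache.h at the time
(GIT_SHA1_RAWSZ is 20), and hashcmp_loop/hashcmp_memcmp are just
illustrative names:

	#include <string.h>

	#define GIT_SHA1_RAWSZ 20

	/* roughly the current open-coded byte loop */
	static inline int hashcmp_loop(const unsigned char *sha1,
				       const unsigned char *sha2)
	{
		int i;
		for (i = 0; i < GIT_SHA1_RAWSZ; i++, sha1++, sha2++) {
			if (*sha1 != *sha2)
				return *sha1 - *sha2;
		}
		return 0;
	}

	/* proposed: lean on glibc's optimized memcmp instead */
	static inline int hashcmp_memcmp(const unsigned char *sha1,
					 const unsigned char *sha2)
	{
		return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
	}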

It may also be possible to really micro-optimize it on some platforms,
because we know the size in advance (I'd kind of expect the compiler to
do that, but if we're ending up in glibc memcmp then it sounds like it
is not the case).

-Peff


* Re: [PATCH] object: measure time needed for resolving hash collisions
@ 2016-09-15  7:01 ` Jeff King
From: Jeff King @ 2016-09-15  7:01 UTC
  To: Stefan Beller; +Cc: git

On Wed, Sep 14, 2016 at 11:47:01PM -0700, Jeff King wrote:

> >  	first = i = hash_obj(sha1, obj_hash_size);
> > +	clock_gettime(CLOCK_MONOTONIC, &time1);
> >  	while ((obj = obj_hash[i]) != NULL) {
> >  		if (!hashcmp(sha1, obj->oid.hash))
> >  			break;
> > @@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
> >  		if (i == obj_hash_size)
> >  			i = 0;
> >  	}
> > +	clock_gettime(CLOCK_MONOTONIC, &time2);
> > +	diff(&time1, &time2, &t_diff);
> > +	add_time_to(&caching, &t_diff);
> >  	if (obj && i != first) {
> 
> I don't think this is actually measuring the time spent on collisions.
> It's measuring the time we spend in hashcmp(), but that includes the
> non-collision case where we find it in the first hashcmp.
> 
> Measuring _just_ the collisions is more like the patch below. In my
> measurements it's more like 30ms, compared to 10s for all of the
> hashcmps.

I forgot to send the patch. Which is just as well, because I realized it
was totally buggy.

Here's a patch that I believe is correct (it counts only times when we
move past the first hash slot). It spends about 280ms. Which is still a
lot less than 10s, so I think my other comments stand.

---
diff --git a/object.c b/object.c
index e9e73e0..7a74a1d 100644
--- a/object.c
+++ b/object.c
@@ -123,17 +123,20 @@ struct object *lookup_object(const unsigned char *sha1)
 		return NULL;
 
 	first = i = hash_obj(sha1, obj_hash_size);
-	clock_gettime(CLOCK_MONOTONIC, &time1);
 	while ((obj = obj_hash[i]) != NULL) {
 		if (!hashcmp(sha1, obj->oid.hash))
 			break;
+		if (first == i)
+			clock_gettime(CLOCK_MONOTONIC, &time1);
 		i++;
 		if (i == obj_hash_size)
 			i = 0;
 	}
-	clock_gettime(CLOCK_MONOTONIC, &time2);
-	diff(&time1, &time2, &t_diff);
-	add_time_to(&caching, &t_diff);
+	if (i != first) {
+		clock_gettime(CLOCK_MONOTONIC, &time2);
+		diff(&time1, &time2, &t_diff);
+		add_time_to(&caching, &t_diff);
+	}
 	if (obj && i != first) {
 		/*
 		 * Move object to where we started to look for it so


* Re: [PATCH] object: measure time needed for resolving hash collisions
@ 2016-09-15 16:26 ` Stefan Beller
From: Stefan Beller @ 2016-09-15 16:26 UTC
  To: Jeff King; +Cc: git

On Wed, Sep 14, 2016 at 11:47 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Sep 14, 2016 at 07:01:41PM -0700, Stefan Beller wrote:
>
>>  According to Jeff, sending patches that don't get accepted is the new hype!
>
> It is what all the cool kids are doing. Unfortunately, it does not save
> you from nitpicky reviews...;)
>
>>       first = i = hash_obj(sha1, obj_hash_size);
>> +     clock_gettime(CLOCK_MONOTONIC, &time1);
>>       while ((obj = obj_hash[i]) != NULL) {
>>               if (!hashcmp(sha1, obj->oid.hash))
>>                       break;
>> @@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
>>               if (i == obj_hash_size)
>>                       i = 0;
>>       }
>> +     clock_gettime(CLOCK_MONOTONIC, &time2);
>> +     diff(&time1, &time2, &t_diff);
>> +     add_time_to(&caching, &t_diff);
>>       if (obj && i != first) {
>
> I don't think this is actually measuring the time spent on collisions.
> It's measuring the time we spend in hashcmp(), but that includes the
> non-collision case where we find it in the first hashcmp.

Right. I measured all lookup times, i.e. this function accounts for
1/3 of the run time.

>
> Measuring _just_ the collisions is more like the patch below. In my
> measurements it's more like 30ms, compared to 10s for all of the
> hashcmps.

So off by a few orders of magnitude, i.e. we don't have to worry about
the time spent in resolving collisions.

>
> So we really aren't dealing with collisions, but rather just verifying
> that our hash landed at the right spot. And _any_ data structure is
> going to have to do that. If you want to make it faster, I'd try
> optimizing hashcmp (and you can see why the critbit tree was slower; if
> we spend so much time just on hashcmp() to make sure we're at the right
> key, then making that slower with a bunch of branching is not going to
> help).
>
> I notice we still open-code hashcmp. I get a slight speedup by switching
> it to memcmp(). About 2.5%, which is similar to what I showed in
>
>   http://public-inbox.org/git/20130318073229.GA5551@sigill.intra.peff.net/
>
> a few years ago (though it's more pronounced as a portion of the whole
> now, because we've optimized some of the other bits).
>
> The main driver there was memcmp() improvements that went into glibc
> 2.13 several years ago. It might be time to start assuming that memcmp()
> beats our open-coded loop.

http://stackoverflow.com/questions/21106801/why-is-memcmp-so-much-faster-than-a-for-loop-check

seems to agree with you, so I'd agree with switching over.

>
> It may also be possible to really micro-optimize it on some platforms,
> because we know the size in advance (I'd kind of expect the compiler to
> do that, but if we're ending up in glibc memcmp then it sounds like it
> is not the case).

That stackoverflow link suggests that glibc already has micro-optimisations
for a variety of platforms.

>
> -Peff


* Re: [PATCH] object: measure time needed for resolving hash collisions
@ 2016-09-15 17:45 ` Junio C Hamano
From: Junio C Hamano @ 2016-09-15 17:45 UTC
  To: Jeff King; +Cc: Stefan Beller, git

Jeff King <peff@peff.net> writes:

> Measuring _just_ the collisions is more like the patch below. In my
> measurements it's more like 30ms, compared to 10s for all of the
> hashcmps.
>
> So we really aren't dealing with collisions, but rather just verifying
> that our hash landed at the right spot. And _any_ data structure is
> going to have to do that.

The reverse side of the coin may be that we could shrink the hashtable
and load it more heavily without sacrificing performance, by making
the necessary "have we landed at the right spot" check cheap enough,
I guess.



* Re: [PATCH] object: measure time needed for resolving hash collisions
@ 2016-09-15 18:47 ` Jeff King
From: Jeff King @ 2016-09-15 18:47 UTC
  To: Junio C Hamano; +Cc: Stefan Beller, git

On Thu, Sep 15, 2016 at 10:45:34AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Measuring _just_ the collisions is more like the patch below. In my
> > measurements it's more like 30ms, compared to 10s for all of the
> > hashcmps.
> >
> > So we really aren't dealing with collisions, but rather just verifying
> > that our hash landed at the right spot. And _any_ data structure is
> > going to have to do that.
> 
> The reverse side of the coin may be if we can shrink the hashtable
> smaller and load it more heavily without sacrificing performance by
> making the necessary "have we landed at the right spot" check cheap
> enough, I guess.

I think that's where things like cuckoo hashing come into play. They
didn't have any effect for us because we already keep the table very
unloaded. But you could _probably_ increase the load factor without
sacrificing performance using a more clever scheme.
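
For illustration, a sketch of what "very unloaded" means here; this
approximates the policy in object.c at the time (treat the exact
condition as from memory), where the table is grown once it is about
half full, keeping probe sequences short at the cost of a larger,
sparser array:

	/* nr_objs and obj_hash_size are the statics in object.c;
	 * grow_object_hash() doubles the bucket count */
	static void maybe_grow(void)
	{
		if (obj_hash_size - 1 <= nr_objs * 2)	/* ~50% load */
			grow_object_hash();
	}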

It's not clear to me that the current table size is a big problem,
though. It might be hurting us with cache effects, but I think the only
way we'd know is to measure.

-Peff


* Re: [PATCH] object: measure time needed for resolving hash collisions
@ 2016-09-15 18:49 ` Jeff King
From: Jeff King @ 2016-09-15 18:49 UTC
  To: Stefan Beller; +Cc: git

On Thu, Sep 15, 2016 at 09:26:22AM -0700, Stefan Beller wrote:

> > It may also be possible to really micro-optimize it on some platforms,
> > because we know the size in advance (I'd kind of expect the compiler to
> > do that, but if we're ending up in glibc memcmp then it sounds like it
> > is not the case).
> 
> That stackoverflow link suggests that glibc already has microoptimisations
> for a variety of platforms.

It's definitely micro-optimized in glibc. What I was trying to say is
that if we are hitting the glibc implementation, then we know we are
handling the "20" at runtime. Whereas the compiler should know that "20"
is a constant, and could in theory skip the memcmp() call entirely in
favor of something like an unrolled loop.
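
Purely illustrative (hasheq20 is a hypothetical helper, and lookup
really only needs equality, not memcmp's ordering): with the length
fixed at compile time, the compare can be done as five 4-byte words,
which a compiler can fully unroll:

	#include <stdint.h>
	#include <string.h>

	static inline int hasheq20(const unsigned char *a,
				   const unsigned char *b)
	{
		uint32_t x, y;
		int i;
		for (i = 0; i < 20; i += 4) {
			/* memcpy sidesteps alignment assumptions and
			 * compilers lower it to plain loads */
			memcpy(&x, a + i, 4);
			memcpy(&y, b + i, 4);
			if (x != y)
				return 0;	/* mismatch */
		}
		return 1;	/* all 20 bytes equal */
	}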

-Peff

