* [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()
@ 2013-09-10 22:17 Nicolas Pitre
2013-09-11 18:48 ` Jeff King
0 siblings, 1 reply; 4+ messages in thread
From: Nicolas Pitre @ 2013-09-10 22:17 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
hashtable_index() appears to be a close duplicate of hash_obj().
Keep only the later and make it usable for all cases.
Also remove the modulus as this is an expansive operation.
The size argument is always a power of 2 anyway, so a simple
mask operation provides the same result.
On a 'git rev-list --all --objects' run this decreased the time spent
in lookup_object from 27.5% to 24.1%.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
---
I discovered this patch in my git work tree dating from 2 years ago.
diff --git a/object.c b/object.c
index d8a4b1f..e2dae22 100644
--- a/object.c
+++ b/object.c
@@ -43,16 +43,16 @@ int type_from_string(const char *str)
die("invalid object type \"%s\"", str);
}
-static unsigned int hash_obj(struct object *obj, unsigned int n)
+static unsigned int hash_obj(const unsigned char *sha1, unsigned int n)
{
unsigned int hash;
- memcpy(&hash, obj->sha1, sizeof(unsigned int));
- return hash % n;
+ memcpy(&hash, sha1, sizeof(unsigned int));
+ return hash & (n - 1);
}
static void insert_obj_hash(struct object *obj, struct object **hash, unsigned int size)
{
- unsigned int j = hash_obj(obj, size);
+ unsigned int j = hash_obj(obj->sha1, size);
while (hash[j]) {
j++;
@@ -62,13 +62,6 @@ static void insert_obj_hash(struct object *obj, struct object **hash, unsigned i
hash[j] = obj;
}
-static unsigned int hashtable_index(const unsigned char *sha1)
-{
- unsigned int i;
- memcpy(&i, sha1, sizeof(unsigned int));
- return i % obj_hash_size;
-}
-
struct object *lookup_object(const unsigned char *sha1)
{
unsigned int i, first;
@@ -77,7 +70,7 @@ struct object *lookup_object(const unsigned char *sha1)
if (!obj_hash)
return NULL;
- first = i = hashtable_index(sha1);
+ first = i = hash_obj(sha1, obj_hash_size);
while ((obj = obj_hash[i]) != NULL) {
if (!hashcmp(sha1, obj->sha1))
break;
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()
2013-09-10 22:17 [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj() Nicolas Pitre
@ 2013-09-11 18:48 ` Jeff King
2013-09-12 20:08 ` Nicolas Pitre
0 siblings, 1 reply; 4+ messages in thread
From: Jeff King @ 2013-09-11 18:48 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git
On Tue, Sep 10, 2013 at 06:17:12PM -0400, Nicolas Pitre wrote:
> hashtable_index() appears to be a close duplicate of hash_obj().
> Keep only the later and make it usable for all cases.
Thanks. This duplication has often bugged me when looking at that
hash table, but I just never actually wrote the patch.
> Also remove the modulus as this is an expansive operation.
> The size argument is always a power of 2 anyway, so a simple
> mask operation provides the same result.
>
> On a 'git rev-list --all --objects' run this decreased the time spent
> in lookup_object from 27.5% to 24.1%.
Nice. This is a tiny bit subtle, though, as the power-of-2 growth
happens elsewhere, and we may want to tweak it later (the decorate.c
hash, for example, grows by 3/2).
Maybe it's worth squashing in one or both of the comments below as a
warning to anybody who tries to tweak it.
---
diff --git a/object.c b/object.c
index e2dae22..5f792cb 100644
--- a/object.c
+++ b/object.c
@@ -47,6 +47,7 @@ static unsigned int hash_obj(const unsigned char *sha1, unsigned int n)
{
unsigned int hash;
memcpy(&hash, sha1, sizeof(unsigned int));
+ /* Assumes power-of-2 hash sizes in grow_object_hash */
return hash & (n - 1);
}
@@ -94,6 +95,10 @@ static void grow_object_hash(void)
static void grow_object_hash(void)
{
int i;
+ /*
+ * Note that this size must always be power-of-2 to match hash_obj
+ * above.
+ */
int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
struct object **new_hash;
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()
2013-09-11 18:48 ` Jeff King
@ 2013-09-12 20:08 ` Nicolas Pitre
2013-09-12 20:30 ` Junio C Hamano
0 siblings, 1 reply; 4+ messages in thread
From: Nicolas Pitre @ 2013-09-12 20:08 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
On Wed, 11 Sep 2013, Jeff King wrote:
> On Tue, Sep 10, 2013 at 06:17:12PM -0400, Nicolas Pitre wrote:
>
> > Also remove the modulus as this is an expansive operation.
> > The size argument is always a power of 2 anyway, so a simple
> > mask operation provides the same result.
> >
> > On a 'git rev-list --all --objects' run this decreased the time spent
> > in lookup_object from 27.5% to 24.1%.
>
> Nice. This is a tiny bit subtle, though, as the power-of-2 growth
> happens elsewhere, and we may want to tweak it later (the decorate.c
> hash, for example, grows by 3/2).
>
> Maybe it's worth squashing in one or both of the comments below as a
> warning to anybody who tries to tweak it.
Agreed.
@Junio: are you willing to squash those in, or do you prefer a resent?
> ---
> diff --git a/object.c b/object.c
> index e2dae22..5f792cb 100644
> --- a/object.c
> +++ b/object.c
> @@ -47,6 +47,7 @@ static unsigned int hash_obj(const unsigned char *sha1, unsigned int n)
> {
> unsigned int hash;
> memcpy(&hash, sha1, sizeof(unsigned int));
> + /* Assumes power-of-2 hash sizes in grow_object_hash */
> return hash & (n - 1);
> }
>
> @@ -94,6 +95,10 @@ static void grow_object_hash(void)
> static void grow_object_hash(void)
> {
> int i;
> + /*
> + * Note that this size must always be power-of-2 to match hash_obj
> + * above.
> + */
> int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
> struct object **new_hash;
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()
2013-09-12 20:08 ` Nicolas Pitre
@ 2013-09-12 20:30 ` Junio C Hamano
0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2013-09-12 20:30 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Jeff King, git
Nicolas Pitre <nico@fluxnic.net> writes:
>> Maybe it's worth squashing in one or both of the comments below as a
>> warning to anybody who tries to tweak it.
>
> Agreed.
>
> @Junio: are you willing to squash those in, or do you prefer a resent?
I think I've queued it ready to be squashed. No need for resend.
Thanks.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-09-12 20:31 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-10 22:17 [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj() Nicolas Pitre
2013-09-11 18:48 ` Jeff King
2013-09-12 20:08 ` Nicolas Pitre
2013-09-12 20:30 ` Junio C Hamano
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.