All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Jakub Narebski" <jnareb@gmail.com>, "Ted Ts'o" <tytso@mit.edu>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Clemens Buchacher" <drizzd@aon.at>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	"David Barr" <davidbarr@google.com>
Subject: [PATCH 1/3] implement generic key/value map
Date: Thu, 14 Jul 2011 13:51:05 -0400	[thread overview]
Message-ID: <20110714175105.GA21771@sigill.intra.peff.net> (raw)
In-Reply-To: <20110714173454.GA21657@sigill.intra.peff.net>

It is frequently useful to have a fast, generic data
structure mapping keys to values. We already have something
like this in the "decorate" API, but it has two downsides:

  1. The key type must always be a "struct object *".

  2. The value type is a void pointer, which means it is
     inefficient and cumbersome for storing small values.
     One must either encode their value inside the void
     pointer, or allocate additional storage for the pointer
     to point to.

This patch introduces a generic map data structure, mapping
keys of arbitrary type to values of arbitrary type.

One possible strategy for implementation is to have a struct
that points to a sequence of bytes for each of the key and
the value, and to try to treat them as opaque in the code.
However, this code gets complex, has a lot of casts, and
runs afoul of violating alignment and strict aliasing rules.

This patch takes a different approach. We parameterize the
types in each map by putting the declarations and
implementations inside macros, and expand the macros with
the correct types. This lets the compiler see the actual
code, with its real types, and figure out things like struct
packing and alignment itself.

Signed-off-by: Jeff King <peff@peff.net>
---
In addition to switching from using void pointers to macro expansion,
this has one other difference from my previous patch: it handles
arbitrary types for keys, not just object pointers. This was mentioned
by Jakub, and would allow things like a fast bi-directional map for SVN
revision numbers and commits.

I tried to keep the implementation simple. Two things that could be changed:

  1. We can't assume that the map key is a pointer. So the sentinel
     "NULL" value isn't necessarily available to use, and we have to
     keep a separate bit in each hash entry to say "is this valid".
     This means when we _do_ store a pointer, we end up with an extra
     32 bits or so in each hash entry for the "used" flag.

     We could add a macro parameter for sentinel values, so that types
     which _can_ handle this efficiently don't have to pay the price.
     Or we could decide that mapping arbitrary keys isn't worth the
     hassle. I wrote this way to be flexible for future use; I don't
     personally have plans to add svn revision number mappings.

  2. It assumes values are assignable. That means storing something like
     "unsigned char sha1[20]" doesn't work. You can wrap it in a struct,
     but do we assume that struct assignment works everywhere? I didn't
     check, but I think it is in C89 but some antique compilers didn't
     allow it. Switching it to use memcpy() would be easy enough (or
     again, parameterizing so that assignable things don't have to pay
     the price).

 Makefile |    2 +
 map.c    |   86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 map.h    |   24 +++++++++++++++++
 3 files changed, 112 insertions(+), 0 deletions(-)
 create mode 100644 map.c
 create mode 100644 map.h

diff --git a/Makefile b/Makefile
index 46793d1..6242321 100644
--- a/Makefile
+++ b/Makefile
@@ -530,6 +530,7 @@ LIB_H += list-objects.h
 LIB_H += ll-merge.h
 LIB_H += log-tree.h
 LIB_H += mailmap.h
+LIB_H += map.h
 LIB_H += merge-file.h
 LIB_H += merge-recursive.h
 LIB_H += notes.h
@@ -621,6 +622,7 @@ LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
 LIB_OBJS += mailmap.o
+LIB_OBJS += map.o
 LIB_OBJS += match-trees.o
 LIB_OBJS += merge-file.o
 LIB_OBJS += merge-recursive.o
diff --git a/map.c b/map.c
new file mode 100644
index 0000000..378cecb
--- /dev/null
+++ b/map.c
@@ -0,0 +1,86 @@
+#include "cache.h"
+#include "map.h"
+#include "object.h"
+
+static unsigned int hash_obj(const struct object *obj, unsigned int n)
+{
+	unsigned int hash;
+
+	memcpy(&hash, obj->sha1, sizeof(unsigned int));
+	return hash % n;
+}
+
+static unsigned int cmp_obj(const struct object *a, const struct object *b)
+{
+	return b == a;
+}
+
+#define MAP_IMPLEMENT(name, ktype, vtype, cmp_fun, hash_fun) \
+static int map_insert_##name(struct map_##name *m, \
+			     const ktype key, \
+			     vtype value, \
+			     vtype *old) \
+{ \
+	unsigned int j; \
+ \
+	for (j = hash_fun(key, m->size); m->hash[j].used; j = (j+1) % m->size) { \
+		if (cmp_fun(m->hash[j].key, key)) { \
+			if (old) \
+				*old = m->hash[j].value; \
+			m->hash[j].value = value; \
+			return 1; \
+		} \
+	} \
+ \
+	m->hash[j].key = key; \
+	m->hash[j].value = value; \
+	m->hash[j].used = 1; \
+	m->nr++; \
+	return 0; \
+} \
+ \
+static void map_grow_##name(struct map_##name *m) \
+{ \
+	struct map_entry_##name *old_hash = m->hash; \
+	unsigned int old_size = m->size; \
+	unsigned int i; \
+ \
+	m->size = (old_size + 1000) * 3 / 2; \
+	m->hash = xcalloc(m->size, sizeof(*m->hash)); \
+	m->nr = 0; \
+ \
+	for (i = 0; i < old_size; i++) { \
+		if (!old_hash[i].used) \
+			continue; \
+		map_insert_##name(m, old_hash[i].key, old_hash[i].value, NULL); \
+	} \
+	free(old_hash); \
+} \
+ \
+int map_set_##name(struct map_##name *m, \
+		   const ktype key, \
+		   vtype value, \
+		   vtype *old) \
+{ \
+	if (m->nr >= m->size * 2 / 3) \
+		map_grow_##name(m); \
+	return map_insert_##name(m, key, value, old); \
+} \
+ \
+int map_get_##name(struct map_##name *m, \
+		   const ktype key, \
+		   vtype *value) \
+{ \
+	unsigned int j; \
+ \
+	if (!m->size) \
+		return 0; \
+ \
+	for (j = hash_fun(key, m->size); m->hash[j].used; j = (j+1) % m->size) { \
+		if (cmp_fun(m->hash[j].key, key)) { \
+			*value = m->hash[j].value; \
+			return 1; \
+		} \
+	} \
+	return 0; \
+}
diff --git a/map.h b/map.h
new file mode 100644
index 0000000..496c5d1
--- /dev/null
+++ b/map.h
@@ -0,0 +1,24 @@
+#ifndef MAP_H
+#define MAP_H
+
+#define DECLARE_MAP(name, ktype, vtype) \
+struct map_entry_##name { \
+	const ktype key; \
+	vtype value; \
+	unsigned used:1; \
+}; \
+ \
+struct map_##name { \
+	unsigned int size, nr; \
+	struct map_entry_##name *hash; \
+}; \
+ \
+extern int map_get_##name(struct map_##name *, \
+			  const ktype key, \
+			  vtype *value); \
+extern int map_set_##name(struct map_##name *, \
+			  const ktype key, \
+			  vtype value, \
+			  vtype *old); \
+
+#endif /* MAP_H */
-- 
1.7.6.38.ge5b33

  reply	other threads:[~2011-07-14 17:51 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-13  6:47 [RFC/PATCHv2 0/6] generation numbers for faster traversals Jeff King
2011-07-13  6:57 ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Jeff King
2011-07-13 17:52   ` Jonathan Nieder
2011-07-13 20:08     ` Jeff King
2011-07-14 17:34       ` Jeff King
2011-07-14 17:51         ` Jeff King [this message]
2011-07-14 18:52           ` [PATCH 1/3] implement generic key/value map Bert Wesarg
2011-07-14 18:54             ` Bert Wesarg
2011-07-14 18:55               ` Jeff King
2011-07-14 19:07                 ` Bert Wesarg
2011-07-14 19:14                   ` Jeff King
2011-07-14 19:18                     ` Bert Wesarg
2011-07-14 17:52         ` [PATCH 2/3] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-07-15  9:40           ` Sverre Rabbelier
2011-07-15 20:00             ` Jeff King
2011-07-14 17:53         ` [PATCH 3/3] decorate: use "map" for the underlying implementation Jeff King
2011-07-14 21:06         ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Junio C Hamano
2011-08-04 22:43           ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-04 22:45             ` [PATCH 1/5] implement generic key/value map Jeff King
2011-08-04 22:46             ` [PATCH 2/5] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-08-04 22:46             ` [PATCH 3/5] decorate: use "map" for the underlying implementation Jeff King
2011-08-04 22:46             ` [PATCH 4/5] map: implement persistent maps Jeff King
2011-08-04 22:46             ` [PATCH 5/5] implement metadata cache subsystem Jeff King
2011-08-04 22:48             ` [RFC/PATCH 0/2] patch-id caching Jeff King
2011-08-04 22:49               ` [PATCH 1/2] cherry: read default config Jeff King
2011-08-04 22:49               ` [PATCH 2/2] cache patch ids on disk Jeff King
2011-08-04 22:52                 ` Jeff King
2011-08-05 11:03             ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-05 15:31               ` René Scharfe
2011-08-06  6:30                 ` Jeff King
2011-07-13  7:04 ` [RFC/PATCHv2 2/6] add metadata-cache infrastructure Jeff King
2011-07-13  8:18   ` Bert Wesarg
2011-07-13  8:31     ` Jeff King
2011-07-13  8:45       ` Bert Wesarg
2011-07-13 19:18         ` Jeff King
2011-07-13 19:40       ` Junio C Hamano
2011-07-13 19:33   ` Junio C Hamano
2011-07-13 20:25     ` Jeff King
2011-07-13  7:05 ` [RFC/PATCHv2 3/6] commit: add commit_generation function Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13  7:05 ` [RFC/PATCHv2 4/6] pretty: support %G to show the generation number of a commit Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 5/6] check commit generation cache validity against grafts Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13 19:35     ` Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 6/6] limit "contains" traversals based on commit generation Jeff King
2011-07-13  7:23   ` Jeff King
2011-07-13 20:33     ` Junio C Hamano
2011-07-13 20:58       ` Jeff King
2011-07-13 21:12         ` Junio C Hamano
2011-07-13 21:18           ` Jeff King
2011-07-15 18:22   ` Junio C Hamano
2011-07-15 20:40     ` Jeff King
2011-07-15 21:04       ` Junio C Hamano
2011-07-15 21:14         ` Jeff King
2011-07-15 21:01 ` Generation numbers and replacement objects Jakub Narebski
2011-07-15 21:10   ` Jeff King
2011-07-16 21:10     ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110714175105.GA21771@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=davidbarr@google.com \
    --cc=drizzd@aon.at \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=spearce@spearce.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.