git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Jakub Narebski" <jnareb@gmail.com>, "Ted Ts'o" <tytso@mit.edu>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Clemens Buchacher" <drizzd@aon.at>,
	"Shawn O. Pearce" <spearce@spearce.org>
Subject: [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers
Date: Wed, 13 Jul 2011 02:57:00 -0400	[thread overview]
Message-ID: <20110713065700.GA18566@sigill.intra.peff.net> (raw)
In-Reply-To: <20110713064709.GA18499@sigill.intra.peff.net>

The decorate API provides a mapping of objects to arbitrary
values. Until now, it did this by allowing you to store a
void pointer, which could point to other storage. This has
two problems:

  1. It's inefficient. To store even a small value, you have
     to allocate persistent storage. So we end up storing
     both the value and a pointer to it.

  2. It's a pain to use. To avoid heap overhead for small
     values, you have to either use a custom allocater, or
     you have to shoe-horn the value into the void pointer
     (if it's small enough to fit).

This patch lets you store fixed-size values directly in the
hash table without allocating them elsewhere. This is a
definite win for any value smaller than or equal to a
pointer. It's probably a win for slightly larger values, but
may eventually be slower for storing large structs (because
the values are copied when the hash grows).

It also provides a more natural interface for storing small
values; see the changes in fast-export.c, which can now drop
its pointer arithmetic magic.

The original "store and retrieve a void pointer" API is easy
to implement on top of this (the values we store are the
pointers). The add_decoration and lookup_decoration
functions are kept for compatibility.

Signed-off-by: Jeff King <peff@peff.net>
---
The "width" field is now const; you must either use a static
initializer or overwrite a struct decoration via memset. This gives us a
little more language-enforced safety.

The "stride" and "end" fields were removed in favor of run-time
calculation. The calculations are wrapped in inline functions to give
the optimizer a chance to remove them.

Since we're translating sizes to pointers in our loops via these
functions now anyway, I made the iteration interface a bit more sane.
It used to be:

  unsigned char *p;
  for (p = n->hash; p < n->end; p += n->stride) {
    struct object_decoration *e = (struct object_decoration *)p;
    ...
  }

which really leaks ugly implementation details. You can now do:

  int i;
  for (i = 0; i < n->size; i++) {
    struct object_decoration *e = decoration_slot(n, i);
    ...
  }

which is a bit more natural.

I peeked at the output of "gcc -O3"; it doesn't actually hoist much of
the decoration_slot calculation out of the loop. But I measured the code
from the previous series with this one, and wasn't able to detect a
difference. So I think using pointers was just a silly premature
micro-optimization.

 Documentation/technical/api-decorate.txt |  164 +++++++++++++++++++++++++++++-
 builtin/fast-export.c                    |   29 ++----
 decorate.c                               |   70 +++++++++----
 decorate.h                               |   37 ++++++-
 4 files changed, 251 insertions(+), 49 deletions(-)

diff --git a/Documentation/technical/api-decorate.txt b/Documentation/technical/api-decorate.txt
index 1d52a6c..b048b28 100644
--- a/Documentation/technical/api-decorate.txt
+++ b/Documentation/technical/api-decorate.txt
@@ -1,6 +1,166 @@
 decorate API
 ============
 
-Talk about <decorate.h>
+The decorate API is a system for efficiently mapping objects to values
+in memory. It is slightly slower than an actual member of an object
+struct (because it incurs a hash lookup), but it uses less memory when
+the mapping is not in use, or when the number of decorated objects is
+small compared to the total number of objects.
 
-(Linus)
+For efficiency, the mapping is capable of storing actual byte values, as
+long as the byte values for each element are of a fixed size. So one
+could, for example, map objects into 32-bit integers. For ease of use,
+functions are provided for storing the values of arbitrary pointers,
+which can point to strings or structs.
+
+Data Structures
+---------------
+
+`struct decoration`::
+
+	This structure represents a single mapping of objects to
+	values. Its fields are:
+
+	`name`:::
+		This field is not used by the decorate API itself, but
+		may be used by calling code.
+
+	`width`:::
+		This field specifies the width (in units of `char`) of
+		the values to be stored. This field must be set to its
+		final value when the decoration struct is initialized.
+		A width of `0` is equivalent to `sizeof(void *)`.
+
+	`nr`:::
+		The number of objects currently mapped by the
+		decoration.
+
+	`size`:::
+		The number of hash slots allocated; this is kept to at
+		least 3/2 of the number of actual slots used, to keep
+		the hash sparse.
+
+	`hash`:::
+		A pointer to an array of actual `object_decoration`
+		structs. Note that because the width of `struct
+		object_decoration` is not known until runtime, this
+		array is stored with type `unsigned char *`. To access
+		individual items, one must perform pointer arithmetic;
+		see the `decoration_by_offset` function below.
+
+`struct object_decoration`::
+
+	A structure representing the decoration of a single object.
+	Callers will not normally need to use this object unless they
+	are iterating all elements in the decoration hash. The `base`
+	field points to the object being mapped (or `NULL` if it is
+	an empty hash slot). The `decoration` field stores the mapped
+	value as a sequence of bytes; use the `width` field in `struct
+	decoration` to know the exact size.
+
+
+Functions
+---------
+
+`add_decoration_value`::
+
+	Add a mapping from an object to a sequence of bytes. The number
+	of bytes pointed to by `decoration` should be equal to the
+	`width` field of the `struct decoration`. If the `old` parameter
+	is not NULL and a there was already a value for the object, the
+	bytes of the old value are copied into `old`.  The return value
+	is `1` if there was a previous value, or `0` otherwise. Note
+	that if there is no previous value, then `old` is left
+	untouched; it is the responsibility of the caller to either
+	check the return value or to set a sentinel value in `old`.
+
+`lookup_decoration_value`::
+
+	Retrieve a decoration from the mapping. The return value is a
+	pointer to the sequence of bytes representing the value (of
+	length `width`), or `NULL` if no value is found.
+
+`add_decoration`::
+
+	Add a mapping from an object to a void pointer. If a previous
+	pointer exists for the object, it is returned; otherwise, `NULL`
+	is returned.
+
+`lookup_decoration`::
+
+	Retrieve a void pointer from the mapping. The return value is
+	the stored pointer, or `NULL` if there is no stored pointer.
+
+`decoration_slot`::
+
+	Retrieve the decoration stored at slot `i` in the hash table.
+	If the `base` field of the returned `struct object_decoration`
+	is `NULL`, then no value is stored at that slot. This function
+	is useful when iterating over the entire contents of the hash
+	table. See the iteration example below.
+
+
+Examples
+--------
+
+Store and retrieve pointers to structs:
+
+-------------------------------------------------------------------
+/* no need to set width parameter; it defaults to sizeof(void *) */
+static struct decoration commit_foos;
+
+void store_foo(const struct commit *c, const char *name)
+{
+	struct foo *value = alloc_foo(name);
+	struct foo *old;
+
+	old = add_decoration(&commit_foos, c->object, value);
+	free(old);
+}
+
+const struct foo *get_foo(const struct commit *c)
+{
+	return lookup_decoration(&commit_foos, c->object);
+}
+-------------------------------------------------------------------
+
+Store and retrieve `unsigned long` integers:
+
+-------------------------------------------------------------------
+static struct decoration longs = { "my longs", sizeof(unsigned long) };
+
+void store_long(const struct object *obj, unsigned long value)
+{
+	unsigned long old;
+	if (add_decoration_value(&longs, obj, &value, &old)
+		printf("old value was %lu\n", old);
+}
+
+void print_long(const struct object *obj)
+{
+	unsigned long *value = lookup_decoration_value(&longs, obj);
+	if (!value)
+		printf("no value\n");
+	else
+		printf("value is %lu\n", *value);
+}
+-------------------------------------------------------------------
+
+Iterate over all stored decorations:
+
+-------------------------------------------------------------------
+void dump_longs(void)
+{
+	int i;
+	for (i = 0; i < longs.size; i++) {
+		struct object_decoration *e = decoration_slot(&longs, i);
+		unsigned long *value = (unsigned long *)e->decoration;
+
+		/* empty hash slot */
+		if (!e->base)
+			continue;
+
+		printf("%s -> %lu\n", sha1_to_hex(e->base->sha1), *value);
+	}
+}
+-------------------------------------------------------------------
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index daf1945..82b458d 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -59,7 +59,7 @@ static int parse_opt_tag_of_filtered_mode(const struct option *opt,
 	return 0;
 }
 
-static struct decoration idnums;
+static struct decoration idnums = { NULL, sizeof(uint32_t) };
 static uint32_t last_idnum;
 
 static int has_unshown_parent(struct commit *commit)
@@ -73,20 +73,9 @@ static int has_unshown_parent(struct commit *commit)
 	return 0;
 }
 
-/* Since intptr_t is C99, we do not use it here */
-static inline uint32_t *mark_to_ptr(uint32_t mark)
-{
-	return ((uint32_t *)NULL) + mark;
-}
-
-static inline uint32_t ptr_to_mark(void * mark)
-{
-	return (uint32_t *)mark - (uint32_t *)NULL;
-}
-
 static inline void mark_object(struct object *object, uint32_t mark)
 {
-	add_decoration(&idnums, object, mark_to_ptr(mark));
+	add_decoration_value(&idnums, object, &mark, NULL);
 }
 
 static inline void mark_next_object(struct object *object)
@@ -96,10 +85,10 @@ static inline void mark_next_object(struct object *object)
 
 static int get_object_mark(struct object *object)
 {
-	void *decoration = lookup_decoration(&idnums, object);
-	if (!decoration)
+	uint32_t *mark = lookup_decoration_value(&idnums, object);
+	if (!mark)
 		return 0;
-	return ptr_to_mark(decoration);
+	return *mark;
 }
 
 static void show_progress(void)
@@ -537,8 +526,6 @@ static void handle_tags_and_duplicates(struct string_list *extra_refs)
 static void export_marks(char *file)
 {
 	unsigned int i;
-	uint32_t mark;
-	struct object_decoration *deco = idnums.hash;
 	FILE *f;
 	int e = 0;
 
@@ -547,15 +534,15 @@ static void export_marks(char *file)
 		die_errno("Unable to open marks file %s for writing.", file);
 
 	for (i = 0; i < idnums.size; i++) {
+		struct object_decoration *deco = decoration_slot(&idnums, i);
 		if (deco->base && deco->base->type == 1) {
-			mark = ptr_to_mark(deco->decoration);
-			if (fprintf(f, ":%"PRIu32" %s\n", mark,
+			uint32_t *mark = (uint32_t *)deco->decoration;
+			if (fprintf(f, ":%"PRIu32" %s\n", *mark,
 				sha1_to_hex(deco->base->sha1)) < 0) {
 			    e = 1;
 			    break;
 			}
 		}
-		deco++;
 	}
 
 	e |= ferror(f);
diff --git a/decorate.c b/decorate.c
index 2f8a63e..5a0747e 100644
--- a/decorate.c
+++ b/decorate.c
@@ -14,44 +14,48 @@ static unsigned int hash_obj(const struct object *obj, unsigned int n)
 	return hash % n;
 }
 
-static void *insert_decoration(struct decoration *n, const struct object *base, void *decoration)
+static int insert_decoration(struct decoration *n, const struct object *base,
+			     const void *decoration, void *old)
 {
 	int size = n->size;
-	struct object_decoration *hash = n->hash;
+	unsigned long width = decoration_width(n);
 	unsigned int j = hash_obj(base, size);
 
-	while (hash[j].base) {
-		if (hash[j].base == base) {
-			void *old = hash[j].decoration;
-			hash[j].decoration = decoration;
-			return old;
+	while (1) {
+		struct object_decoration *e = decoration_slot(n, j);
+		if (!e->base) {
+			e->base = base;
+			memcpy(e->decoration, decoration, width);
+			n->nr++;
+			return 0;
+		}
+		if (e->base == base) {
+			if (old)
+				memcpy(old, e->decoration, width);
+			memcpy(e->decoration, decoration, width);
+			return 1;
 		}
 		if (++j >= size)
 			j = 0;
 	}
-	hash[j].base = base;
-	hash[j].decoration = decoration;
-	n->nr++;
-	return NULL;
 }
 
 static void grow_decoration(struct decoration *n)
 {
 	int i;
 	int old_size = n->size;
-	struct object_decoration *old_hash = n->hash;
+	unsigned char *old_hash = n->hash;
 
 	n->size = (old_size + 1000) * 3 / 2;
-	n->hash = xcalloc(n->size, sizeof(struct object_decoration));
+	n->hash = xcalloc(n->size, decoration_stride(n));
 	n->nr = 0;
 
 	for (i = 0; i < old_size; i++) {
-		const struct object *base = old_hash[i].base;
-		void *decoration = old_hash[i].decoration;
-
-		if (!base)
-			continue;
-		insert_decoration(n, base, decoration);
+		struct object_decoration *e =
+			(struct object_decoration *)
+			(old_hash + i * decoration_stride(n));
+		if (e->base)
+			insert_decoration(n, e->base, e->decoration, NULL);
 	}
 	free(old_hash);
 }
@@ -60,15 +64,35 @@ static void grow_decoration(struct decoration *n)
 void *add_decoration(struct decoration *n, const struct object *obj,
 		void *decoration)
 {
-	int nr = n->nr + 1;
+	void *old = NULL;
+	add_decoration_value(n, obj, &decoration, &old);
+	return old;
+}
 
+int add_decoration_value(struct decoration *n,
+			 const struct object *obj,
+			 const void *decoration,
+			 void *old)
+{
+	int nr = n->nr + 1;
 	if (nr > n->size * 2 / 3)
 		grow_decoration(n);
-	return insert_decoration(n, obj, decoration);
+	return insert_decoration(n, obj, decoration, old);
 }
 
 /* Lookup a decoration pointer */
-void *lookup_decoration(struct decoration *n, const struct object *obj)
+void *lookup_decoration(const struct decoration *n, const struct object *obj)
+{
+	void **v;
+
+	v = lookup_decoration_value(n, obj);
+	if (!v)
+		return NULL;
+	return *v;
+}
+
+void *lookup_decoration_value(const struct decoration *n,
+			      const struct object *obj)
 {
 	unsigned int j;
 
@@ -77,7 +101,7 @@ void *lookup_decoration(struct decoration *n, const struct object *obj)
 		return NULL;
 	j = hash_obj(obj, n->size);
 	for (;;) {
-		struct object_decoration *ref = n->hash + j;
+		struct object_decoration *ref = decoration_slot(n, j);
 		if (ref->base == obj)
 			return ref->decoration;
 		if (!ref->base)
diff --git a/decorate.h b/decorate.h
index e732804..c0e8e3f 100644
--- a/decorate.h
+++ b/decorate.h
@@ -3,16 +3,47 @@
 
 struct object_decoration {
 	const struct object *base;
-	void *decoration;
+	unsigned char decoration[FLEX_ARRAY];
 };
 
 struct decoration {
 	const char *name;
+	/* width of data we're holding; must be set before adding */
+	const unsigned int width;
 	unsigned int size, nr;
-	struct object_decoration *hash;
+	/*
+	 * The hash contains object_decoration structs, but we don't know their
+	 * size until runtime. So we store is as a pointer to characters to
+	 * make pointer arithmetic easier.
+	 */
+	unsigned char *hash;
 };
 
 extern void *add_decoration(struct decoration *n, const struct object *obj, void *decoration);
-extern void *lookup_decoration(struct decoration *n, const struct object *obj);
+extern void *lookup_decoration(const struct decoration *n, const struct object *obj);
+
+extern int add_decoration_value(struct decoration *n,
+				const struct object *obj,
+				const void *decoration,
+				void *old);
+extern void *lookup_decoration_value(const struct decoration *n,
+				     const struct object *obj);
+
+static inline unsigned long decoration_width(const struct decoration *n)
+{
+	return n->width ? n->width : sizeof(void *);
+}
+
+static inline unsigned long decoration_stride(const struct decoration *n)
+{
+	return sizeof(struct object_decoration) + decoration_width(n);
+}
+
+static inline struct object_decoration *decoration_slot(const struct decoration *n,
+							unsigned i)
+{
+	return (struct object_decoration *)
+		(n->hash + (i * decoration_stride(n)));
+}
 
 #endif
-- 
1.7.6.37.g989c6

  reply	other threads:[~2011-07-13  6:57 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-13  6:47 [RFC/PATCHv2 0/6] generation numbers for faster traversals Jeff King
2011-07-13  6:57 ` Jeff King [this message]
2011-07-13 17:52   ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Jonathan Nieder
2011-07-13 20:08     ` Jeff King
2011-07-14 17:34       ` Jeff King
2011-07-14 17:51         ` [PATCH 1/3] implement generic key/value map Jeff King
2011-07-14 18:52           ` Bert Wesarg
2011-07-14 18:54             ` Bert Wesarg
2011-07-14 18:55               ` Jeff King
2011-07-14 19:07                 ` Bert Wesarg
2011-07-14 19:14                   ` Jeff King
2011-07-14 19:18                     ` Bert Wesarg
2011-07-14 17:52         ` [PATCH 2/3] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-07-15  9:40           ` Sverre Rabbelier
2011-07-15 20:00             ` Jeff King
2011-07-14 17:53         ` [PATCH 3/3] decorate: use "map" for the underlying implementation Jeff King
2011-07-14 21:06         ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Junio C Hamano
2011-08-04 22:43           ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-04 22:45             ` [PATCH 1/5] implement generic key/value map Jeff King
2011-08-04 22:46             ` [PATCH 2/5] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-08-04 22:46             ` [PATCH 3/5] decorate: use "map" for the underlying implementation Jeff King
2011-08-04 22:46             ` [PATCH 4/5] map: implement persistent maps Jeff King
2011-08-04 22:46             ` [PATCH 5/5] implement metadata cache subsystem Jeff King
2011-08-04 22:48             ` [RFC/PATCH 0/2] patch-id caching Jeff King
2011-08-04 22:49               ` [PATCH 1/2] cherry: read default config Jeff King
2011-08-04 22:49               ` [PATCH 2/2] cache patch ids on disk Jeff King
2011-08-04 22:52                 ` Jeff King
2011-08-05 11:03             ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-05 15:31               ` René Scharfe
2011-08-06  6:30                 ` Jeff King
2011-07-13  7:04 ` [RFC/PATCHv2 2/6] add metadata-cache infrastructure Jeff King
2011-07-13  8:18   ` Bert Wesarg
2011-07-13  8:31     ` Jeff King
2011-07-13  8:45       ` Bert Wesarg
2011-07-13 19:18         ` Jeff King
2011-07-13 19:40       ` Junio C Hamano
2011-07-13 19:33   ` Junio C Hamano
2011-07-13 20:25     ` Jeff King
2011-07-13  7:05 ` [RFC/PATCHv2 3/6] commit: add commit_generation function Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13  7:05 ` [RFC/PATCHv2 4/6] pretty: support %G to show the generation number of a commit Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 5/6] check commit generation cache validity against grafts Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13 19:35     ` Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 6/6] limit "contains" traversals based on commit generation Jeff King
2011-07-13  7:23   ` Jeff King
2011-07-13 20:33     ` Junio C Hamano
2011-07-13 20:58       ` Jeff King
2011-07-13 21:12         ` Junio C Hamano
2011-07-13 21:18           ` Jeff King
2011-07-15 18:22   ` Junio C Hamano
2011-07-15 20:40     ` Jeff King
2011-07-15 21:04       ` Junio C Hamano
2011-07-15 21:14         ` Jeff King
2011-07-15 21:01 ` Generation numbers and replacement objects Jakub Narebski
2011-07-15 21:10   ` Jeff King
2011-07-16 21:10     ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110713065700.GA18566@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=drizzd@aon.at \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=spearce@spearce.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).