All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Jakub Narebski" <jnareb@gmail.com>, "Ted Ts'o" <tytso@mit.edu>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Clemens Buchacher" <drizzd@aon.at>,
	"Shawn O. Pearce" <spearce@spearce.org>
Subject: [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers
Date: Wed, 13 Jul 2011 02:57:00 -0400	[thread overview]
Message-ID: <20110713065700.GA18566@sigill.intra.peff.net> (raw)
In-Reply-To: <20110713064709.GA18499@sigill.intra.peff.net>

The decorate API provides a mapping of objects to arbitrary
values. Until now, it did this by allowing you to store a
void pointer, which could point to other storage. This has
two problems:

  1. It's inefficient. To store even a small value, you have
     to allocate persistent storage. So we end up storing
     both the value and a pointer to it.

  2. It's a pain to use. To avoid heap overhead for small
     values, you have to either use a custom allocater, or
     you have to shoe-horn the value into the void pointer
     (if it's small enough to fit).

This patch lets you store fixed-size values directly in the
hash table without allocating them elsewhere. This is a
definite win for any value smaller than or equal to a
pointer. It's probably a win for slightly larger values, but
may eventually be slower for storing large structs (because
the values are copied when the hash grows).

It also provides a more natural interface for storing small
values; see the changes in fast-export.c, which can now drop
its pointer arithmetic magic.

The original "store and retrieve a void pointer" API is easy
to implement on top of this (the values we store are the
pointers). The add_decoration and lookup_decoration
functions are kept for compatibility.

Signed-off-by: Jeff King <peff@peff.net>
---
The "width" field is now const; you must either use a static
initializer or overwrite a struct decoration via memset. This gives us a
little more language-enforced safety.

The "stride" and "end" fields were removed in favor of run-time
calculation. The calculations are wrapped in inline functions to give
the optimizer a chance to remove them.

Since we're translating sizes to pointers in our loops via these
functions now anyway, I made the iteration interface a bit more sane.
It used to be:

  unsigned char *p;
  for (p = n->hash; p < n->end; p += n->stride) {
    struct object_decoration *e = (struct object_decoration *)p;
    ...
  }

which really leaks ugly implementation details. You can now do:

  int i;
  for (i = 0; i < n->size; i++) {
    struct object_decoration *e = decoration_slot(n, i);
    ...
  }

which is a bit more natural.

I peeked at the output of "gcc -O3"; it doesn't actually hoist much of
the decoration_slot calculation out of the loop. But I measured the code
from the previous series with this one, and wasn't able to detect a
difference. So I think using pointers was just a silly premature
micro-optimization.

 Documentation/technical/api-decorate.txt |  164 +++++++++++++++++++++++++++++-
 builtin/fast-export.c                    |   29 ++----
 decorate.c                               |   70 +++++++++----
 decorate.h                               |   37 ++++++-
 4 files changed, 251 insertions(+), 49 deletions(-)

diff --git a/Documentation/technical/api-decorate.txt b/Documentation/technical/api-decorate.txt
index 1d52a6c..b048b28 100644
--- a/Documentation/technical/api-decorate.txt
+++ b/Documentation/technical/api-decorate.txt
@@ -1,6 +1,166 @@
 decorate API
 ============
 
-Talk about <decorate.h>
+The decorate API is a system for efficiently mapping objects to values
+in memory. It is slightly slower than an actual member of an object
+struct (because it incurs a hash lookup), but it uses less memory when
+the mapping is not in use, or when the number of decorated objects is
+small compared to the total number of objects.
 
-(Linus)
+For efficiency, the mapping is capable of storing actual byte values, as
+long as the byte values for each element are of a fixed size. So one
+could, for example, map objects into 32-bit integers. For ease of use,
+functions are provided for storing the values of arbitrary pointers,
+which can point to strings or structs.
+
+Data Structures
+---------------
+
+`struct decoration`::
+
+	This structure represents a single mapping of objects to
+	values. Its fields are:
+
+	`name`:::
+		This field is not used by the decorate API itself, but
+		may be used by calling code.
+
+	`width`:::
+		This field specifies the width (in units of `char`) of
+		the values to be stored. This field must be set to its
+		final value when the decoration struct is initialized.
+		A width of `0` is equivalent to `sizeof(void *)`.
+
+	`nr`:::
+		The number of objects currently mapped by the
+		decoration.
+
+	`size`:::
+		The number of hash slots allocated; this is kept to at
+		least 3/2 of the number of actual slots used, to keep
+		the hash sparse.
+
+	`hash`:::
+		A pointer to an array of actual `object_decoration`
+		structs. Note that because the width of `struct
+		object_decoration` is not known until runtime, this
+		array is stored with type `unsigned char *`. To access
+		individual items, one must perform pointer arithmetic;
+		see the `decoration_by_offset` function below.
+
+`struct object_decoration`::
+
+	A structure representing the decoration of a single object.
+	Callers will not normally need to use this object unless they
+	are iterating all elements in the decoration hash. The `base`
+	field points to the object being mapped (or `NULL` if it is
+	an empty hash slot). The `decoration` field stores the mapped
+	value as a sequence of bytes; use the `width` field in `struct
+	decoration` to know the exact size.
+
+
+Functions
+---------
+
+`add_decoration_value`::
+
+	Add a mapping from an object to a sequence of bytes. The number
+	of bytes pointed to by `decoration` should be equal to the
+	`width` field of the `struct decoration`. If the `old` parameter
+	is not NULL and a there was already a value for the object, the
+	bytes of the old value are copied into `old`.  The return value
+	is `1` if there was a previous value, or `0` otherwise. Note
+	that if there is no previous value, then `old` is left
+	untouched; it is the responsibility of the caller to either
+	check the return value or to set a sentinel value in `old`.
+
+`lookup_decoration_value`::
+
+	Retrieve a decoration from the mapping. The return value is a
+	pointer to the sequence of bytes representing the value (of
+	length `width`), or `NULL` if no value is found.
+
+`add_decoration`::
+
+	Add a mapping from an object to a void pointer. If a previous
+	pointer exists for the object, it is returned; otherwise, `NULL`
+	is returned.
+
+`lookup_decoration`::
+
+	Retrieve a void pointer from the mapping. The return value is
+	the stored pointer, or `NULL` if there is no stored pointer.
+
+`decoration_slot`::
+
+	Retrieve the decoration stored at slot `i` in the hash table.
+	If the `base` field of the returned `struct object_decoration`
+	is `NULL`, then no value is stored at that slot. This function
+	is useful when iterating over the entire contents of the hash
+	table. See the iteration example below.
+
+
+Examples
+--------
+
+Store and retrieve pointers to structs:
+
+-------------------------------------------------------------------
+/* no need to set width parameter; it defaults to sizeof(void *) */
+static struct decoration commit_foos;
+
+void store_foo(const struct commit *c, const char *name)
+{
+	struct foo *value = alloc_foo(name);
+	struct foo *old;
+
+	old = add_decoration(&commit_foos, c->object, value);
+	free(old);
+}
+
+const struct foo *get_foo(const struct commit *c)
+{
+	return lookup_decoration(&commit_foos, c->object);
+}
+-------------------------------------------------------------------
+
+Store and retrieve `unsigned long` integers:
+
+-------------------------------------------------------------------
+static struct decoration longs = { "my longs", sizeof(unsigned long) };
+
+void store_long(const struct object *obj, unsigned long value)
+{
+	unsigned long old;
+	if (add_decoration_value(&longs, obj, &value, &old)
+		printf("old value was %lu\n", old);
+}
+
+void print_long(const struct object *obj)
+{
+	unsigned long *value = lookup_decoration_value(&longs, obj);
+	if (!value)
+		printf("no value\n");
+	else
+		printf("value is %lu\n", *value);
+}
+-------------------------------------------------------------------
+
+Iterate over all stored decorations:
+
+-------------------------------------------------------------------
+void dump_longs(void)
+{
+	int i;
+	for (i = 0; i < longs.size; i++) {
+		struct object_decoration *e = decoration_slot(&longs, i);
+		unsigned long *value = (unsigned long *)e->decoration;
+
+		/* empty hash slot */
+		if (!e->base)
+			continue;
+
+		printf("%s -> %lu\n", sha1_to_hex(e->base->sha1), *value);
+	}
+}
+-------------------------------------------------------------------
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index daf1945..82b458d 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -59,7 +59,7 @@ static int parse_opt_tag_of_filtered_mode(const struct option *opt,
 	return 0;
 }
 
-static struct decoration idnums;
+static struct decoration idnums = { NULL, sizeof(uint32_t) };
 static uint32_t last_idnum;
 
 static int has_unshown_parent(struct commit *commit)
@@ -73,20 +73,9 @@ static int has_unshown_parent(struct commit *commit)
 	return 0;
 }
 
-/* Since intptr_t is C99, we do not use it here */
-static inline uint32_t *mark_to_ptr(uint32_t mark)
-{
-	return ((uint32_t *)NULL) + mark;
-}
-
-static inline uint32_t ptr_to_mark(void * mark)
-{
-	return (uint32_t *)mark - (uint32_t *)NULL;
-}
-
 static inline void mark_object(struct object *object, uint32_t mark)
 {
-	add_decoration(&idnums, object, mark_to_ptr(mark));
+	add_decoration_value(&idnums, object, &mark, NULL);
 }
 
 static inline void mark_next_object(struct object *object)
@@ -96,10 +85,10 @@ static inline void mark_next_object(struct object *object)
 
 static int get_object_mark(struct object *object)
 {
-	void *decoration = lookup_decoration(&idnums, object);
-	if (!decoration)
+	uint32_t *mark = lookup_decoration_value(&idnums, object);
+	if (!mark)
 		return 0;
-	return ptr_to_mark(decoration);
+	return *mark;
 }
 
 static void show_progress(void)
@@ -537,8 +526,6 @@ static void handle_tags_and_duplicates(struct string_list *extra_refs)
 static void export_marks(char *file)
 {
 	unsigned int i;
-	uint32_t mark;
-	struct object_decoration *deco = idnums.hash;
 	FILE *f;
 	int e = 0;
 
@@ -547,15 +534,15 @@ static void export_marks(char *file)
 		die_errno("Unable to open marks file %s for writing.", file);
 
 	for (i = 0; i < idnums.size; i++) {
+		struct object_decoration *deco = decoration_slot(&idnums, i);
 		if (deco->base && deco->base->type == 1) {
-			mark = ptr_to_mark(deco->decoration);
-			if (fprintf(f, ":%"PRIu32" %s\n", mark,
+			uint32_t *mark = (uint32_t *)deco->decoration;
+			if (fprintf(f, ":%"PRIu32" %s\n", *mark,
 				sha1_to_hex(deco->base->sha1)) < 0) {
 			    e = 1;
 			    break;
 			}
 		}
-		deco++;
 	}
 
 	e |= ferror(f);
diff --git a/decorate.c b/decorate.c
index 2f8a63e..5a0747e 100644
--- a/decorate.c
+++ b/decorate.c
@@ -14,44 +14,48 @@ static unsigned int hash_obj(const struct object *obj, unsigned int n)
 	return hash % n;
 }
 
-static void *insert_decoration(struct decoration *n, const struct object *base, void *decoration)
+static int insert_decoration(struct decoration *n, const struct object *base,
+			     const void *decoration, void *old)
 {
 	int size = n->size;
-	struct object_decoration *hash = n->hash;
+	unsigned long width = decoration_width(n);
 	unsigned int j = hash_obj(base, size);
 
-	while (hash[j].base) {
-		if (hash[j].base == base) {
-			void *old = hash[j].decoration;
-			hash[j].decoration = decoration;
-			return old;
+	while (1) {
+		struct object_decoration *e = decoration_slot(n, j);
+		if (!e->base) {
+			e->base = base;
+			memcpy(e->decoration, decoration, width);
+			n->nr++;
+			return 0;
+		}
+		if (e->base == base) {
+			if (old)
+				memcpy(old, e->decoration, width);
+			memcpy(e->decoration, decoration, width);
+			return 1;
 		}
 		if (++j >= size)
 			j = 0;
 	}
-	hash[j].base = base;
-	hash[j].decoration = decoration;
-	n->nr++;
-	return NULL;
 }
 
 static void grow_decoration(struct decoration *n)
 {
 	int i;
 	int old_size = n->size;
-	struct object_decoration *old_hash = n->hash;
+	unsigned char *old_hash = n->hash;
 
 	n->size = (old_size + 1000) * 3 / 2;
-	n->hash = xcalloc(n->size, sizeof(struct object_decoration));
+	n->hash = xcalloc(n->size, decoration_stride(n));
 	n->nr = 0;
 
 	for (i = 0; i < old_size; i++) {
-		const struct object *base = old_hash[i].base;
-		void *decoration = old_hash[i].decoration;
-
-		if (!base)
-			continue;
-		insert_decoration(n, base, decoration);
+		struct object_decoration *e =
+			(struct object_decoration *)
+			(old_hash + i * decoration_stride(n));
+		if (e->base)
+			insert_decoration(n, e->base, e->decoration, NULL);
 	}
 	free(old_hash);
 }
@@ -60,15 +64,35 @@ static void grow_decoration(struct decoration *n)
 void *add_decoration(struct decoration *n, const struct object *obj,
 		void *decoration)
 {
-	int nr = n->nr + 1;
+	void *old = NULL;
+	add_decoration_value(n, obj, &decoration, &old);
+	return old;
+}
 
+int add_decoration_value(struct decoration *n,
+			 const struct object *obj,
+			 const void *decoration,
+			 void *old)
+{
+	int nr = n->nr + 1;
 	if (nr > n->size * 2 / 3)
 		grow_decoration(n);
-	return insert_decoration(n, obj, decoration);
+	return insert_decoration(n, obj, decoration, old);
 }
 
 /* Lookup a decoration pointer */
-void *lookup_decoration(struct decoration *n, const struct object *obj)
+void *lookup_decoration(const struct decoration *n, const struct object *obj)
+{
+	void **v;
+
+	v = lookup_decoration_value(n, obj);
+	if (!v)
+		return NULL;
+	return *v;
+}
+
+void *lookup_decoration_value(const struct decoration *n,
+			      const struct object *obj)
 {
 	unsigned int j;
 
@@ -77,7 +101,7 @@ void *lookup_decoration(struct decoration *n, const struct object *obj)
 		return NULL;
 	j = hash_obj(obj, n->size);
 	for (;;) {
-		struct object_decoration *ref = n->hash + j;
+		struct object_decoration *ref = decoration_slot(n, j);
 		if (ref->base == obj)
 			return ref->decoration;
 		if (!ref->base)
diff --git a/decorate.h b/decorate.h
index e732804..c0e8e3f 100644
--- a/decorate.h
+++ b/decorate.h
@@ -3,16 +3,47 @@
 
 struct object_decoration {
 	const struct object *base;
-	void *decoration;
+	unsigned char decoration[FLEX_ARRAY];
 };
 
 struct decoration {
 	const char *name;
+	/* width of data we're holding; must be set before adding */
+	const unsigned int width;
 	unsigned int size, nr;
-	struct object_decoration *hash;
+	/*
+	 * The hash contains object_decoration structs, but we don't know their
+	 * size until runtime. So we store is as a pointer to characters to
+	 * make pointer arithmetic easier.
+	 */
+	unsigned char *hash;
 };
 
 extern void *add_decoration(struct decoration *n, const struct object *obj, void *decoration);
-extern void *lookup_decoration(struct decoration *n, const struct object *obj);
+extern void *lookup_decoration(const struct decoration *n, const struct object *obj);
+
+extern int add_decoration_value(struct decoration *n,
+				const struct object *obj,
+				const void *decoration,
+				void *old);
+extern void *lookup_decoration_value(const struct decoration *n,
+				     const struct object *obj);
+
+static inline unsigned long decoration_width(const struct decoration *n)
+{
+	return n->width ? n->width : sizeof(void *);
+}
+
+static inline unsigned long decoration_stride(const struct decoration *n)
+{
+	return sizeof(struct object_decoration) + decoration_width(n);
+}
+
+static inline struct object_decoration *decoration_slot(const struct decoration *n,
+							unsigned i)
+{
+	return (struct object_decoration *)
+		(n->hash + (i * decoration_stride(n)));
+}
 
 #endif
-- 
1.7.6.37.g989c6

  reply	other threads:[~2011-07-13  6:57 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-13  6:47 [RFC/PATCHv2 0/6] generation numbers for faster traversals Jeff King
2011-07-13  6:57 ` Jeff King [this message]
2011-07-13 17:52   ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Jonathan Nieder
2011-07-13 20:08     ` Jeff King
2011-07-14 17:34       ` Jeff King
2011-07-14 17:51         ` [PATCH 1/3] implement generic key/value map Jeff King
2011-07-14 18:52           ` Bert Wesarg
2011-07-14 18:54             ` Bert Wesarg
2011-07-14 18:55               ` Jeff King
2011-07-14 19:07                 ` Bert Wesarg
2011-07-14 19:14                   ` Jeff King
2011-07-14 19:18                     ` Bert Wesarg
2011-07-14 17:52         ` [PATCH 2/3] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-07-15  9:40           ` Sverre Rabbelier
2011-07-15 20:00             ` Jeff King
2011-07-14 17:53         ` [PATCH 3/3] decorate: use "map" for the underlying implementation Jeff King
2011-07-14 21:06         ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Junio C Hamano
2011-08-04 22:43           ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-04 22:45             ` [PATCH 1/5] implement generic key/value map Jeff King
2011-08-04 22:46             ` [PATCH 2/5] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-08-04 22:46             ` [PATCH 3/5] decorate: use "map" for the underlying implementation Jeff King
2011-08-04 22:46             ` [PATCH 4/5] map: implement persistent maps Jeff King
2011-08-04 22:46             ` [PATCH 5/5] implement metadata cache subsystem Jeff King
2011-08-04 22:48             ` [RFC/PATCH 0/2] patch-id caching Jeff King
2011-08-04 22:49               ` [PATCH 1/2] cherry: read default config Jeff King
2011-08-04 22:49               ` [PATCH 2/2] cache patch ids on disk Jeff King
2011-08-04 22:52                 ` Jeff King
2011-08-05 11:03             ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-05 15:31               ` René Scharfe
2011-08-06  6:30                 ` Jeff King
2011-07-13  7:04 ` [RFC/PATCHv2 2/6] add metadata-cache infrastructure Jeff King
2011-07-13  8:18   ` Bert Wesarg
2011-07-13  8:31     ` Jeff King
2011-07-13  8:45       ` Bert Wesarg
2011-07-13 19:18         ` Jeff King
2011-07-13 19:40       ` Junio C Hamano
2011-07-13 19:33   ` Junio C Hamano
2011-07-13 20:25     ` Jeff King
2011-07-13  7:05 ` [RFC/PATCHv2 3/6] commit: add commit_generation function Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13  7:05 ` [RFC/PATCHv2 4/6] pretty: support %G to show the generation number of a commit Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 5/6] check commit generation cache validity against grafts Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13 19:35     ` Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 6/6] limit "contains" traversals based on commit generation Jeff King
2011-07-13  7:23   ` Jeff King
2011-07-13 20:33     ` Junio C Hamano
2011-07-13 20:58       ` Jeff King
2011-07-13 21:12         ` Junio C Hamano
2011-07-13 21:18           ` Jeff King
2011-07-15 18:22   ` Junio C Hamano
2011-07-15 20:40     ` Jeff King
2011-07-15 21:04       ` Junio C Hamano
2011-07-15 21:14         ` Jeff King
2011-07-15 21:01 ` Generation numbers and replacement objects Jakub Narebski
2011-07-15 21:10   ` Jeff King
2011-07-16 21:10     ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110713065700.GA18566@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=drizzd@aon.at \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=jrnieder@gmail.com \
    --cc=spearce@spearce.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.