Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Jakub Narebski" <jnareb@gmail.com>, "Ted Ts\'o" <tytso@mit.edu>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Clemens Buchacher" <drizzd@aon.at>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	"David Barr" <davidbarr@google.com>
Subject: Re: [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers
Date: Wed, 13 Jul 2011 12:52:50 -0500
Message-ID: <20110713175250.GA1448@elie> (raw)
In-Reply-To: <20110713065700.GA18566@sigill.intra.peff.net>

(+cc: David Barr)
Hi,

Jeff King wrote:

> The decorate API provides a mapping of objects to arbitrary
> values. Until now, it did this by allowing you to store a
> void pointer, which could point to other storage. This has
> two problems:
[...]
> This patch lets you store fixed-size values directly in the
> hash table without allocating them elsewhere.

Nice idea.

> --- a/Documentation/technical/api-decorate.txt
> +++ b/Documentation/technical/api-decorate.txt
> @@ -1,6 +1,166 @@
>  decorate API
>  ============
>  
> -Talk about <decorate.h>
> +The decorate API is a system for efficiently mapping objects to values

Thanks for filling in the API docs!  That's awesome.

> +`struct object_decoration`::
> +
> +	A structure representing the decoration of a single object.
> +	Callers will not normally need to use this object unless they
> +	are iterating all elements in the decoration hash. The `base`
> +	field points to the object being mapped (or `NULL` if it is
> +	an empty hash slot). The `decoration` field stores the mapped
> +	value as a sequence of bytes; use the `width` field in `struct
> +	decoration` to know the exact size.

So the `decoration` field is an array rather than a pointer now,
hence...

[...]
> +void dump_longs(void)
> +{
> +	int i;
> +	for (i = 0; i < longs.size; i++) {
> +		struct object_decoration *e = decoration_slot(&longs, i);
> +		unsigned long *value = (unsigned long *)e->decoration;
> +
> +		/* empty hash slot */
> +		if (!e->base)
> +			continue;
> +
> +		printf("%s -> %lu\n", sha1_to_hex(e->base->sha1), *value);

... a cast is needed to use it.  Makes some sense.

What alignment guarantees are there for the field, if any?  I'm
especially worried about platforms like sparc32 where the pointer
width is 32 bits but some types need to be aligned to 64 bits.

> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
[...]

Nice.

> @@ -547,15 +534,15 @@ static void export_marks(char *file)
>  		die_errno("Unable to open marks file %s for writing.", file);
>  
>  	for (i = 0; i < idnums.size; i++) {
> +		struct object_decoration *deco = decoration_slot(&idnums, i);
>  		if (deco->base && deco->base->type == 1) {
> -			mark = ptr_to_mark(deco->decoration);
> -			if (fprintf(f, ":%"PRIu32" %s\n", mark,
> +			uint32_t *mark = (uint32_t *)deco->decoration;
> +			if (fprintf(f, ":%"PRIu32" %s\n", *mark,
>  				sha1_to_hex(deco->base->sha1)) < 0) {

Is this okay according to strict aliasing rules?  Maybe it would be
safer to write

			uint32_t mark;
			memcpy(&mark, deco->decoration, sizeof(mark));

which generates the same code in current versions of gcc on x86 if I
remember correctly.

> --- a/decorate.c
> +++ b/decorate.c
> @@ -14,44 +14,48 @@ static unsigned int hash_obj(const struct object *obj, unsigned int n)
>  	return hash % n;
>  }
>  
> -static void *insert_decoration(struct decoration *n, const struct object *base, void *decoration)
> +static int insert_decoration(struct decoration *n, const struct object *base,
> +			     const void *decoration, void *old)
>  {
>  	int size = n->size;
> -	struct object_decoration *hash = n->hash;
> +	unsigned long width = decoration_width(n);

Micronit: why not size_t?

>  	unsigned int j = hash_obj(base, size);
>  
> -	while (hash[j].base) {
> -		if (hash[j].base == base) {
> -			void *old = hash[j].decoration;
> -			hash[j].decoration = decoration;
> -			return old;
> +	while (1) {

Microoptimization: the modulo operation can avoid the conditional (j >= size):

	for (j = hash_obj(base, size); ; j = (j + 1) % size) {
	}

By the way, how do we know this loop will terminate?  Is it because
the insertion function is careful to make sure the table never gets
filled?

[...]
>  static void grow_decoration(struct decoration *n)
>  {
>  	int i;
>  	int old_size = n->size;
> -	struct object_decoration *old_hash = n->hash;
> +	unsigned char *old_hash = n->hash;
>  
>  	n->size = (old_size + 1000) * 3 / 2;
> -	n->hash = xcalloc(n->size, sizeof(struct object_decoration));
> +	n->hash = xcalloc(n->size, decoration_stride(n));
>  	n->nr = 0;
>  
>  	for (i = 0; i < old_size; i++) {
> -		const struct object *base = old_hash[i].base;
> -		void *decoration = old_hash[i].decoration;
> -
> -		if (!base)
> -			continue;
> -		insert_decoration(n, base, decoration);
> +		struct object_decoration *e =
> +			(struct object_decoration *)
> +			(old_hash + i * decoration_stride(n));
> +		if (e->base)
> +			insert_decoration(n, e->base, e->decoration, NULL);

I'm worried about alignment here, too.

[...]
> @@ -60,15 +64,35 @@ static void grow_decoration(struct decoration *n)
[...]
>  /* Lookup a decoration pointer */
> -void *lookup_decoration(struct decoration *n, const struct object *obj)
> +void *lookup_decoration(const struct decoration *n, const struct object *obj)
> +{
> +	void **v;
> +
> +	v = lookup_decoration_value(n, obj);
> +	if (!v)
> +		return NULL;
> +	return *v;
> +}

Maybe memcpy to avoid alignment problems?

	unsigned char *p;
	void *v;

	p = lookup_decoration_value(n, obj);
	if (!p)
		return NULL;
	memcpy(v, p, sizeof(v));
	return v;

But:

> +
> +void *lookup_decoration_value(const struct decoration *n,
> +			      const struct object *obj)
>  {
>  	unsigned int j;
>  
> @@ -77,7 +101,7 @@ void *lookup_decoration(struct decoration *n, const struct object *obj)
>  		return NULL;
>  	j = hash_obj(obj, n->size);
>  	for (;;) {
> -		struct object_decoration *ref = n->hash + j;
> +		struct object_decoration *ref = decoration_slot(n, j);
>  		if (ref->base == obj)
>  			return ref->decoration;

I worry that this could have alignment trouble anyway.

> --- a/decorate.h
> +++ b/decorate.h
> @@ -3,16 +3,47 @@
>  
>  struct object_decoration {
>  	const struct object *base;
> -	void *decoration;
> +	unsigned char decoration[FLEX_ARRAY];
>  };

On some platforms, this becomes

	struct object_decoration {
		const struct object *base;
		unsigned char decoration[];
	};

which I hope would create a type with the alignment of a pointer
(generally sufficient except in odd cases like sparc32).  But on
old-fashioned platforms, it is

	struct object_decoration {
		const struct object *base;
		unsigned char decoration[1];
	};

Will that be a problem, or is it standard for compilers to be smart
enough to pad to a nice alignment?

>  struct decoration {
>  	const char *name;
> +	/* width of data we're holding; must be set before adding */
> +	const unsigned int width;

Makes sense.

>  	unsigned int size, nr;
> -	struct object_decoration *hash;
> +	/*
> +	 * The hash contains object_decoration structs, but we don't know their
> +	 * size until runtime. So we store is as a pointer to characters to
> +	 * make pointer arithmetic easier.
> +	 */
> +	unsigned char *hash;
>  };
[...]
> +extern int add_decoration_value(struct decoration *n,
> +				const struct object *obj,
> +				const void *decoration,
> +				void *old);
> +extern void *lookup_decoration_value(const struct decoration *n,
> +				     const struct object *obj);

If we're willing to incur the cost of a copy that assumes unaligned
objects, perhaps

	extern int lookup_decoration_value(const struct decoration *n,
				const struct object *obj,
				void *result, size_t width);

would be safer.

Aside from the alignment and strict-aliasing worries, this looks very
nice.  Thanks for writing it.

Regards,
Jonathan

  reply index

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-13  6:47 [RFC/PATCHv2 0/6] generation numbers for faster traversals Jeff King
2011-07-13  6:57 ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Jeff King
2011-07-13 17:52   ` Jonathan Nieder [this message]
2011-07-13 20:08     ` Jeff King
2011-07-14 17:34       ` Jeff King
2011-07-14 17:51         ` [PATCH 1/3] implement generic key/value map Jeff King
2011-07-14 18:52           ` Bert Wesarg
2011-07-14 18:54             ` Bert Wesarg
2011-07-14 18:55               ` Jeff King
2011-07-14 19:07                 ` Bert Wesarg
2011-07-14 19:14                   ` Jeff King
2011-07-14 19:18                     ` Bert Wesarg
2011-07-14 17:52         ` [PATCH 2/3] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-07-15  9:40           ` Sverre Rabbelier
2011-07-15 20:00             ` Jeff King
2011-07-14 17:53         ` [PATCH 3/3] decorate: use "map" for the underlying implementation Jeff King
2011-07-14 21:06         ` [RFC/PATCHv2 1/6] decorate: allow storing values instead of pointers Junio C Hamano
2011-08-04 22:43           ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-04 22:45             ` [PATCH 1/5] implement generic key/value map Jeff King
2011-08-04 22:46             ` [PATCH 2/5] fast-export: use object to uint32 map instead of "decorate" Jeff King
2011-08-04 22:46             ` [PATCH 3/5] decorate: use "map" for the underlying implementation Jeff King
2011-08-04 22:46             ` [PATCH 4/5] map: implement persistent maps Jeff King
2011-08-04 22:46             ` [PATCH 5/5] implement metadata cache subsystem Jeff King
2011-08-05 11:03             ` [RFC/PATCH 0/5] macro-based key/value maps Jeff King
2011-08-05 15:31               ` René Scharfe
2011-08-06  6:30                 ` Jeff King
2011-07-13  7:04 ` [RFC/PATCHv2 2/6] add metadata-cache infrastructure Jeff King
2011-07-13  8:18   ` Bert Wesarg
2011-07-13  8:31     ` Jeff King
2011-07-13  8:45       ` Bert Wesarg
2011-07-13 19:18         ` Jeff King
2011-07-13 19:40       ` Junio C Hamano
2011-07-13 19:33   ` Junio C Hamano
2011-07-13 20:25     ` Jeff King
2011-07-13  7:05 ` [RFC/PATCHv2 3/6] commit: add commit_generation function Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13  7:05 ` [RFC/PATCHv2 4/6] pretty: support %G to show the generation number of a commit Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 5/6] check commit generation cache validity against grafts Jeff King
2011-07-13 14:26   ` Eric Sunshine
2011-07-13 19:35     ` Jeff King
2011-07-13  7:06 ` [RFC/PATCHv2 6/6] limit "contains" traversals based on commit generation Jeff King
2011-07-13  7:23   ` Jeff King
2011-07-13 20:33     ` Junio C Hamano
2011-07-13 20:58       ` Jeff King
2011-07-13 21:12         ` Junio C Hamano
2011-07-13 21:18           ` Jeff King
2011-07-15 18:22   ` Junio C Hamano
2011-07-15 20:40     ` Jeff King
2011-07-15 21:04       ` Junio C Hamano
2011-07-15 21:14         ` Jeff King
2011-07-15 21:01 ` Generation numbers and replacement objects Jakub Narebski
2011-07-15 21:10   ` Jeff King
2011-07-16 21:10     ` Jakub Narebski
2011-08-04 22:48 [RFC/PATCH 0/2] patch-id caching Jeff King
2011-08-04 22:49 ` [PATCH 1/2] cherry: read default config Jeff King
2011-08-04 22:49 ` [PATCH 2/2] cache patch ids on disk Jeff King
2011-08-04 22:52   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110713175250.GA1448@elie \
    --to=jrnieder@gmail.com \
    --cc=avarab@gmail.com \
    --cc=davidbarr@google.com \
    --cc=drizzd@aon.at \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git