Re: [RFC/PATCH 2/3] small-alloc: add allocator for small objects

From: David Barr <davidbarr@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: GIT Mailing-list <git@vger.kernel.org>
Subject: Re: [RFC/PATCH 2/3] small-alloc: add allocator for small objects
Date: Fri, 24 Jun 2011 07:38:51 -0700	[thread overview]
Message-ID: <BANLkTi=34cQvU9oE0gPe=5PFDYfhxoYF+A@mail.gmail.com> (raw)
In-Reply-To: <7vk4cd617u.fsf@alter.siamese.dyndns.org>

Junio,

Sorry for the repeat, accidentally sent as HTML, rejected by the list.

On Wednesday, June 22, 2011, Junio C Hamano wrote:
David Barr <davidbarr@google.com> writes:

> +void *pool_alloc(struct mem_pool *pool, size_t len, size_t *id_out)
> +{
> +     static size_t id = 1;

Does this mean that even though you can have more than one mem_pool object
active in the system, you won't have id collision throughout the system?
That is a nice property (e.g. given an ID that does not belong to a pool,
you won't risk returning a wrong chunk of <mem,len> pair from pool_ptr()),
but is there a downside using function-scope static like this, I wonder?

This is an artifact that I missed in refactoring my experimental code.
This ought to be a field in struct mem_pool.
My intent was for id's to be unique and contiguous within a pool.

For example, if I have two pools A and B, and call pool_alloc on A and
then on B and then on A again, A's space[0] will have the first and the
third object, with id=1 and id=3. How does this interact with your
implementation of pool_ptr() which seems to assume that id's are
consecutive within a single pool->space[]?

It would interact very poorly.

> +     size_t n;

> +     void *r;
> +
> +     if ((pool->end - pool->next_free >= len) &&
> +         (pool->len_free >= sizeof_varint(len)))
> +             n = pool->nr - 1;

Help me make sure I understand what is going on here.

A mem-pool has pool->nr chunks (and it can grow as more memory is asked
from this pool). A new memory request is satisfied by carving out of
pool->space[n] (where n is the "newest chunk" in the pool).
pool->len[n] is a fixed sized byte array and stores the length of each
memory block carved out of pool->space[n] as a sequence of varint.
If pool->space[n] has enough space to fit "len" bytes, and if pool->len[n]
still has enough space to record the length, you use the current chunk,
otherwise (i.e. the else clause) you add a new chunk.

Am I with you so far?

Absolutely.

With the chunk_size of 2MB, you would fit roughly 16k allocation requests
for 128-byte memory, and you would need 2-bytes to express the length of
one piece of memory in your varint encoding, i.e. you would need to size
an element of pool->len[] to 32kB if you wanted to store 16k allocation of
128-byte for a chunk.

This would all depend on what the expected distribution of request size,
but it somehow feels wasteful to be limited by both space[] and len[]. If
you chose sizeof(*pool->len) that is too small for the workload, wouldn't
you end up allocating many 2MB space[], only to use potentially very
initial parts of them before len[] fills up?

Yes, this is a weakness of this iteration of the design.

> +     if (id_out)

> +             *id_out = id;
> +     id++;
> +
> +     char *t = &pool->len[n][sizeof(*pool->len) - pool->len_free];

Please avoid decl_after_statement.

Thanks for the reminder, will clean up some more.

> +     size_t n = pool->nr * id / pool->first_id[pool->nr - 1];

> +     if (n >= pool->nr - 1)
> +             n = pool->nr - 1;
> +     while (n && pool->first_id[n] > id)
> +             n--;
> +     while (n + 1 < pool->nr && pool->first_id[n + 1] <= id)
> +             n++;

I was about to say "bsearch?", but perhaps it is not worth it.

A linear guesstimate is typically off by a small amount, so linear
search is fine.
Also, the index of id's is contiguous, so linear search has good cache locality.
If I address the next concern in this review, this search will no
longer be necessary.

> +struct mem_pool {
> +     size_t *first_id;
> +     char **space;
> +     char (*len)[sizeof(size_t) + sizeof(char*)];

Each element of pool->len[] is just a byte-array that stores varint, no?
It is very misleading to specify its size as sizeof(size_t)+sizeof(char*)
as if you would store a "struct { size_t some; char *thing; }" here.

Instead of having two independently depleted byte-buffer (space[] and
len[]), I wonder if it would be more space efficient (without being less
processing efficient) to use a single buffer space.  Your pool_ptr() would
start at the beginning of pool->space[n], decode a varint and take it as a
length, if that is not the object you are looking for, skip that many
bytes (i.e. payload immediately follows the length) to the next object,
and so on.

I have already investigated this arrangement, it has very poor
locality of access.
For objects <32 bytes long, its not too bad since typically 2 bytes of a 64 byte
cache line would be read consecutively. For larger objects this is pathological
cache behavior. On the other hand, the current design means that the entire
sequence of lengths will fit on a single >=16 byte cache line.

Also what kind of alignment guarantee would we _want_ to give the callers?
As far as I can tell, this implementation does not guarantee any alignment.

No alignment guarantee provided. An interesting possibility is to provide a
guarantee and use the padding bytes for metadata.

Although on second reading I think I must have mis-read, I've been investigating
fixing the number of objects per chunk rather than fixing the space for lengths.
I have an inkling that I already considered this and found that it introduced a
nasty corner case. This investigation is ongoing.

--
David Barr.