All of lore.kernel.org
 help / color / mirror / Atom feed
From: Huang Ying <ying.huang@intel.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Len Brown <lenb@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Andi Kleen <andi@firstfloor.org>,
	"Luck, Tony" <tony.luck@intel.com>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH -v2 3/4] lib, Make gen_pool memory allocator lockless
Date: Fri, 08 Apr 2011 09:33:48 +0800	[thread overview]
Message-ID: <4D9E65FC.8010507@intel.com> (raw)
In-Reply-To: <20110407184946.GC6104@Krystal>

On 04/08/2011 02:49 AM, Mathieu Desnoyers wrote:
> * Huang Ying (ying.huang@intel.com) wrote:
>>
>> +static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
>> +{
>> +     unsigned long val, nval;
>> +
>> +     nval = *addr;
>> +     do {
>> +             val = nval;
>> +             if (val & mask_to_set)
>> +                     return -EBUSY;
>> +             cpu_relax();
>> +     } while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
> 
> Some architectures have their own atomic set bit already (e.g. intel),
> you should probably extend the existing set "bit" to a set "bits"
> instead, and use that instead for those, and put the generic
> implementation in asm-generic.

You mean implement set_bits_ll based on atomic set_bit or test_and_set?
 I don't know how to do that in a more efficient way.

This code is not put into generic bitmap implementation (lib/bitmap.c)
because Linus think we have no enough users yet.

[snip]
>> +/*
>> + * bitmap_set_ll - set the specified number of bits at the specified position
>> + * @map: pointer to a bitmap
>> + * @start: a bit position in @map
>> + * @nr: number of bits to set
>> + *
>> + * Set @nr bits start from @start in @map lock-lessly. Several users
>> + * can set/clear the same bitmap simultaneously without lock. If two
>> + * users set the same bit, one user will return remain bits, otherwise
>> + * return 0.
>> + */
>> +static int bitmap_set_ll(unsigned long *map, int start, int nr)
>> +{
>> +     unsigned long *p = map + BIT_WORD(start);
>> +     const int size = start + nr;
>> +     int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
>> +     unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
> 
> Ah :) I've had some fun working on bitfield management headers. First
> question: how did you test this code ? Shift of "32" being turned to a
> no-op on Intel is an example of how some odd cases can creep into this
> kind of code. If you are interested, you might want to have a look at my
> portable bitfield read/write MIT-licensed header in the Babeltrace
> library, file include/babeltrace/bitfield.h
> (http://git.efficios.com/?p=babeltrace.git).  It's not using atomic
> read/writes, but supports bitfield read/write event across different
> endiannesses. I made a testing program for it by providing limit values
> and random value, and checking that what is read/written matches. That
> helped me find interesting corner-cases.

I have some self-made testing program to test this.  And this code is
just copy/change of bitmap_set in lib/bitmap.c, same for bitmap_clear_ll
too.

If bitmap implementation is so tricky, I think it may be a good idea to
add your testing code into kernel for lib/bitmap.c.

[snip]
>> @@ -58,15 +184,15 @@ int gen_pool_add(struct gen_pool *pool,
>>
>>       chunk = kmalloc_node(nbytes, GFP_KERNEL | __GFP_ZERO, nid);
>>       if (unlikely(chunk == NULL))
>> -             return -1;
>> +             return -ENOMEM;
>>
>> -     spin_lock_init(&chunk->lock);
>>       chunk->start_addr = addr;
>>       chunk->end_addr = addr + size;
>> +     atomic_set(&chunk->avail, size);
>>
>> -     write_lock(&pool->lock);
>> -     list_add(&chunk->next_chunk, &pool->chunks);
>> -     write_unlock(&pool->lock);
>> +     spin_lock(&pool->lock);
>> +     list_add_rcu(&chunk->next_chunk, &pool->chunks);
> 
> hrm, where is the list_del_rcu ? Is there anywhere where we have some
> call_rcu scheme or synchronize_rcu to handle chunk teardown ?

That should be in gen_pool_remove.  But that have not been implemented
yet.  I have plan to do it, after the basic support is in place.

[snip]
>> @@ -108,43 +233,47 @@ EXPORT_SYMBOL(gen_pool_destroy);
>>   * @size: number of bytes to allocate from the pool
>>   *
>>   * Allocate the requested number of bytes from the specified pool.
>> - * Uses a first-fit algorithm.
>> + * Uses a first-fit algorithm. Can not be used in NMI handler on
>> + * architectures without NMI-safe cmpxchg implementation.
>>   */
>>  unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
>>  {
>> -     struct list_head *_chunk;
>>       struct gen_pool_chunk *chunk;
>> -     unsigned long addr, flags;
>> +     unsigned long addr;
>>       int order = pool->min_alloc_order;
>> -     int nbits, start_bit, end_bit;
>> +     int nbits, start_bit = 0, end_bit, remain;
>> +
>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>> +     BUG_ON(in_nmi());
>> +#endif
>>
>>       if (size == 0)
>>               return 0;
>>
>>       nbits = (size + (1UL << order) - 1) >> order;
>> -
>> -     read_lock(&pool->lock);
>> -     list_for_each(_chunk, &pool->chunks) {
>> -             chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
> 
> missing rcu_read_lock() ?

rcu_read_lock() is not used here because we have not implement a
gen_pool_remove now.  So new chunk will be added into pool but no chunk
will be removed from pool.  After adding gen_pool_remove, we will add
rcu_read_lock() here.

>> +     list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>> +             if (size > atomic_read(&chunk->avail))
>> +                     continue;
>>
>>               end_bit = (chunk->end_addr - chunk->start_addr) >> order;
>> -
>> -             spin_lock_irqsave(&chunk->lock, flags);
>> -             start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit, 0,
>> -                                             nbits, 0);
>> -             if (start_bit >= end_bit) {
>> -                     spin_unlock_irqrestore(&chunk->lock, flags);
>> +retry:
>> +             start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit,
>> +                                                    start_bit, nbits, 0);
>> +             if (start_bit >= end_bit)
>>                       continue;
>> +             remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
>> +             if (remain) {
>> +                     remain = bitmap_clear_ll(chunk->bits, start_bit,
>> +                                              nbits - remain);
>> +                     BUG_ON(remain);
> 
> maybe add cpu_relax() ? This is a busy loop after all.

There is cpu_relax() in bitmap_set_ll and bitmap_clear_ll already.  And
this loop is longer, do we need cpu_relax in long loop?

>> +                     goto retry;
>>               }
>>
>>               addr = chunk->start_addr + ((unsigned long)start_bit << order);
>> -
>> -             bitmap_set(chunk->bits, start_bit, nbits);
>> -             spin_unlock_irqrestore(&chunk->lock, flags);
>> -             read_unlock(&pool->lock);
>> +             size = nbits << order;
>> +             atomic_sub(size, &chunk->avail);
>>               return addr;
>>       }
>> -     read_unlock(&pool->lock);
>>       return 0;
>>  }
>>  EXPORT_SYMBOL(gen_pool_alloc);
>> @@ -155,33 +284,66 @@ EXPORT_SYMBOL(gen_pool_alloc);
>>   * @addr: starting address of memory to free back to pool
>>   * @size: size in bytes of memory to free
>>   *
>> - * Free previously allocated special memory back to the specified pool.
>> + * Free previously allocated special memory back to the specified
>> + * pool.  Can not be used in NMI handler on architectures without
>> + * NMI-safe cmpxchg implementation.
>>   */
>>  void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
>>  {
>> -     struct list_head *_chunk;
>>       struct gen_pool_chunk *chunk;
>> -     unsigned long flags;
>>       int order = pool->min_alloc_order;
>> -     int bit, nbits;
>> -
>> -     nbits = (size + (1UL << order) - 1) >> order;
>> +     int start_bit, nbits, remain;
>>
>> -     read_lock(&pool->lock);
>> -     list_for_each(_chunk, &pool->chunks) {
>> -             chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>> +     BUG_ON(in_nmi());
>> +#endif
>>
>> +     nbits = (size + (1UL << order) - 1) >> order;
> 
> missing rcu_read_lock ?

Same as above.

>> +     list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
>>               if (addr >= chunk->start_addr && addr < chunk->end_addr) {
>>                       BUG_ON(addr + size > chunk->end_addr);
>> -                     spin_lock_irqsave(&chunk->lock, flags);
>> -                     bit = (addr - chunk->start_addr) >> order;
>> -                     while (nbits--)
>> -                             __clear_bit(bit++, chunk->bits);
>> -                     spin_unlock_irqrestore(&chunk->lock, flags);
>> -                     break;
>> +                     start_bit = (addr - chunk->start_addr) >> order;
>> +                     remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
>> +                     BUG_ON(remain);
>> +                     size = nbits << order;
>> +                     atomic_add(size, &chunk->avail);
>> +                     return;
>>               }
>>       }
>> -     BUG_ON(nbits > 0);
>> -     read_unlock(&pool->lock);
>> +     BUG();
>>  }
>>  EXPORT_SYMBOL(gen_pool_free);
>> +
>> +/**
>> + * gen_pool_avail - get available free space of the pool
>> + * @pool: pool to get available free space
>> + *
>> + * Return available free space of the specified pool.
>> + */
>> +size_t gen_pool_avail(struct gen_pool *pool)
>> +{
>> +     struct gen_pool_chunk *chunk;
>> +     size_t avail = 0;
>> +
> 
> rcu_read_lock ?

Same as above.

>> +     list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk)
>> +             avail += atomic_read(&chunk->avail);
>> +     return avail;
>> +}
>> +EXPORT_SYMBOL_GPL(gen_pool_avail);
>> +
>> +/**
>> + * gen_pool_size - get size in bytes of memory managed by the pool
>> + * @pool: pool to get size
>> + *
>> + * Return size in bytes of memory managed by the pool.
>> + */
>> +size_t gen_pool_size(struct gen_pool *pool)
>> +{
>> +     struct gen_pool_chunk *chunk;
>> +     size_t size = 0;
>> +
> 
> rcu_read_lock ?

Same as above.

>> +     list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk)
>> +             size += chunk->end_addr - chunk->start_addr;
>> +     return size;
>> +}
>> +EXPORT_SYMBOL_GPL(gen_pool_size);

Best Regards,
Huang Ying

  reply	other threads:[~2011-04-08  1:33 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-07  1:29 [PATCH -v2 0/4] ACPI, APEI, GHES, printk support for recoverable error via NMI Huang Ying
2011-04-07  1:29 ` [PATCH -v2 1/4] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG Huang Ying
2011-04-07 17:39   ` Russell King - ARM Linux
2011-04-08  0:32     ` Huang Ying
2011-04-07  1:29 ` [PATCH -v2 2/4] lib, Add lock-less NULL terminated single list Huang Ying
2011-04-07 18:30   ` Mathieu Desnoyers
2011-04-08  1:03     ` Huang Ying
2011-04-07  1:29 ` [PATCH -v2 3/4] lib, Make gen_pool memory allocator lockless Huang Ying
2011-04-07 18:49   ` Mathieu Desnoyers
2011-04-08  1:33     ` Huang Ying [this message]
2011-04-07  1:29 ` [PATCH -v2 4/4] ACPI, APEI, GHES, printk support for recoverable error via NMI Huang Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D9E65FC.8010507@intel.com \
    --to=ying.huang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.