linux-kernel.vger.kernel.org archive mirror
* Re: About lock-less data structure patches
       [not found]           ` <BLU0-SMTP52616BFA7ABB7E8E73E8396BC0@phx.gbl>
@ 2011-03-31  1:39             ` Huang Ying
  2011-04-01 21:37               ` Mathieu Desnoyers
  0 siblings, 1 reply; 11+ messages in thread
From: Huang Ying @ 2011-03-31  1:39 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrew Morton, Andi Kleen, lenb, Paul E. McKenney, linux-kernel,
	Linus Torvalds

Hi, Mathieu,

Thank you very much for your review.  Do you have time to take a look at
the lock-less memory allocator below?

https://lkml.org/lkml/2011/2/21/15

On 03/30/2011 11:11 PM, Mathieu Desnoyers wrote:
> * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
>> * Andrew Morton (akpm@linux-foundation.org) wrote:
>>> On Wed, 30 Mar 2011 05:22:03 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>>>
>>>> On Wed, Mar 30, 2011 at 09:30:43AM +0800, Huang Ying wrote:
>>>>> On 03/30/2011 09:21 AM, Andrew Morton wrote:
>>>>>> On Wed, 30 Mar 2011 09:14:45 +0800 Huang Ying <ying.huang@intel.com> wrote:
>>>>>>
>>>>>>> Hi, Andrew and Len,
>>>>>>>
>>>>>>> In my original APEI patches for 2.6.39, the following 3 patches is about
>>>>>>> lock-less data structure.
>>>>>>>
>>>>>>> [PATCH 1/7] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>>>>> [PATCH 2/7] lib, Add lock-less NULL terminated single list
>>>>>>> [PATCH 6/7] lib, Make gen_pool memory allocator lockless
>>>>>>>
>>>>>>> Len think we need some non-Intel "Acked-by" or "Reviewed-by" for these
>>>>>>> patches to go through the ACPI git tree.  Or they should go through
>>>>>>> other tree, such as -mm tree.
>>>>>>
>>>>>> I just dropped a couple of your patches because include/linux/llist.h
>>>>>> vanished from linux-next.   Did Len trip over the power cord?
>>>>>
>>>>> Len has dropped lock-less data structure patches from acpi git tree.  He
>>>>> describe the reason in following mails.
>>>>>
>>>>> https://lkml.org/lkml/2011/3/2/501
>>>>> https://lkml.org/lkml/2011/3/23/6
>>>>
>>>> Ok so we still need a lockless reviewer really and I don't count.
>>>
>>> Well I think you count ;) If this is some Intel thing then cluebeat,
>>> cluebeat, cluebeat, overruled.
>>>
>>>> Copying Mathieu who did a lot of lockless stuff. Are you interested
>>>> in reviewing Ying's patches?
>>>
>>> That would be great.
>>
>> Sure, I can have a look. Huang, can you resend those three patches
>> adding me to CC list ? That will help me keep appropriate threading in
>> my review. Adding Paul McKenney would also be appropriate.
> 
> I know, I know, I said I would wait for a repost, but now the answer
> burns my fingers. ;-) I'm replying to the patch found in
> https://lkml.org/lkml/2011/2/21/13
> 
> 
>> --- /dev/null
>> +++ b/include/linux/llist.h
>> @@ -0,0 +1,98 @@
>> +#ifndef LLIST_H
>> +#define LLIST_H
>> +/*
>> + * Lock-less NULL terminated single linked list
> 
> Because this single-linked-list works like a stack (with "push"
> operation for llist_add, "pop" operation for llist_del_first), I would
> recommend to rename it accordingly (as a stack rather than "list"). If
> we think about other possible users of this kind of lock-free list, such
> as call_rcu(), a "queue" would be rather more appropriate than a "stack"
> (with enqueue/dequeue operations). So at the very least I would like to
> make sure this API keeps room for lock-free queue implementations that
> won't be confused with this stack API. It would also be important to
> figure out if what we really want is a stack or a queue. Some naming
> ideas follow (maybe they are a bit verbose, comments are welcome).
> 
> We should note that this list implements "lock-free" push and pop
> operations (cmpxchg loops), and a "wait-free" "llist_del_all" operation
> (using xchg) (only really true for architectures with "true" xchg
> operation though, not those using LL/SC). We should think about the real
> use-case requirements put on this lockless stack to decide which variant
> is most appropriate. We can either have, with the implementation you
> propose:
> 
> - Lock-free push
> - Pop protected by mutex
> - Wait-free pop all
> 
> Or, as an example of an alternative structure (as Paul and I implemented
> in the userspace RCU library):
> 
> - Wait-free push (stronger real-time guarantees provided by xchg())
> - Blocking pop/pop all (use cmpxchg and busy-wait for very short time
>   periods)
> 
> (there are others, with e.g. lock-free push, lock-free pop, lock-free
> pop all, but this one requires RCU read lock across the pop/pop/pop all
> operations and that memory reclaim of the nodes is only performed after
> a RCU grace-period has elapsed. This deals with ABA issues of concurrent
> push/pop you noticed without requiring mutexes protecting pop operations.)
> 
> So it all boils down to which are the constraints of the push/pop
> callers.  Typically, I would expect that the "push" operation has the
> most strict real-time constraints (and is possibly executed the most
> often, thus would also benefit from xchg() which is typically slightly
> faster than cmpxchg()), which would argue in favor of a wait-free
> push/blocking pop.  But maybe I'm lacking understanding of what you are
> trying to do with this stack. Do you need to ever pop from a NMI
> handler ?

In my use case, I don't need to pop in an NMI handler, just push.  But
we need to pop in an IRQ handler, so we cannot use a blocking pop.
Please take a look at the use-case patches listed below.
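
For illustration, the intended split looks roughly like this (struct
my_event, process_event() and the head name are made up for this
sketch; they are not taken from those patches):

struct my_event {                       /* example payload */
        struct llist_node llnode;
        /* ... */
};

static LLIST_HEAD(event_list);

/* NMI context: only the lock-free push is used here */
static void nmi_queue_event(struct my_event *ev)
{
        llist_add(&ev->llnode, &event_list);
}

/* IRQ context: detach the whole list at once, then walk the detached chain */
static void irq_consume_events(void)
{
        struct llist_node *first = llist_del_all(&event_list);
        struct my_event *ev;

        llist_for_each_entry(ev, first, llnode)
                process_event(ev);      /* hypothetical consumer */
}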

> Some ideas for API identifiers:
> 
> struct llist_head -> slist_stack_head
> struct llist_node -> slist_stack_node

Why call it both a stack and a list?  Because it is a stack implemented
with a singly linked list?  I think it is better to name it after usage
instead of implementation.

The next question is whether it should be named a stack or a list.  I
think that from the current users' point of view, they are using a
list rather than a stack.  There are 3 users so far, as follows.

https://lkml.org/lkml/2011/1/17/14
https://lkml.org/lkml/2011/1/17/15
https://lkml.org/lkml/2011/2/21/16

And if we name this data structure a list, we can still use "queue"
for another data structure.  Do you think so?

> * For your lock-free push/pop + wait-free pop_all implementation:
> 
> llist_add -> slist_stack_push_lf        (lock-free)
> llist_del_first -> _slist_stack_pop     (needs mutex protection)
> llist_del_all -> slist_stack_pop_all_wf (wait-free)

Do we really need to distinguish between lock-free and wait-free in the
interface?  Will we implement both slist_stack_push_lf and
slist_stack_push_wf for one data structure?

A mutex is needed between multiple "_slist_stack_pop" calls, but not
between slist_stack_push_lf and _slist_stack_pop.  I think it is hard to
express that clearly via function naming.

> * If we choose to go with an alternate wait-free push implementation:
> 
> llist_add -> slist_stack_push_wf              (wait-free)
> llist_del_first -> slist_stack_pop_blocking   (blocking)
> llist_del_all -> slist_stack_pop_all_blocking (blocking)

We need a non-blocking pop, so maybe you need to implement another data
structure with that interface.  I think there can be multiple lock-less
data structures in the kernel.

>> + *
>> + * If there are multiple producers and multiple consumers, llist_add
>> + * can be used in producers and llist_del_all can be used in
>> + * consumers.  They can work simultaneously without lock.  But
>> + * llist_del_first can not be used here.  Because llist_del_first
>> + * depends on list->first->next does not changed if list->first is not
>> + * changed during its operation, but llist_del_first, llist_add,
>> + * llist_add sequence in another consumer may violate that.
> 
> You did not seem to define the locking rules when using both
> 
>   llist_del_all
> and
>   llist_del_first
> 
> in parallel. I expect that a mutex is needed, because a
> 
>   llist_del_all, llist_add, llist_add
> 
> in parallel with
> 
>   llist_del_first
> 
> could run into the same ABA problem as described above.

OK.  I will add that.
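
For the comment, one concrete sequence might help (A, B and C are
illustrative nodes; initial list: head -> A -> B -> C):

  consumer 0: llist_del_first()           consumer 1
  -----------------------------           ----------
  entry = head->first        (= A)
  reads entry->next          (= B)
                                          llist_del_all()  gets A, B, C
                                          llist_add(C)     head -> C
                                          llist_add(A)     head -> A -> C
  cmpxchg(&head->first, A, B) succeeds,
  because head->first is A again, but B
  is no longer on the list (ABA).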

>> + *
>> + * If there are multiple producers and one consumer, llist_add can be
>> + * used in producers and llist_del_all or llist_del_first can be used
>> + * in the consumer.
>> + *
>> + * The list entries deleted via llist_del_all can be traversed with
>> + * traversing function such as llist_for_each etc.  But the list
>> + * entries can not be traversed safely before deleted from the list.
> 
> Given that this is in fact a stack, specifying the traversal order of
> llist_for_each and friends would be appropriate.

Ok.  I will add something like "traversing from head to tail" in the
comments.
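
Maybe with a tiny example in the comment too (a, b, c are illustrative
entries embedding a struct llist_node named llnode):

        llist_add(&a->llnode, &head);           /* list: a */
        llist_add(&b->llnode, &head);           /* list: b, a */
        llist_add(&c->llnode, &head);           /* list: c, b, a */

        node = llist_del_all(&head);
        llist_for_each(pos, node)
                /* visits c, then b, then a: reverse of insertion order */;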

>> + *
>> + * The basic atomic operation of this list is cmpxchg on long.  On
>> + * architectures that don't have NMI-safe cmpxchg implementation, the
>> + * list can NOT be used in NMI handler.  So code uses the list in NMI
>> + * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
>> + */
>> +
>> +struct llist_head {
>> +     struct llist_node *first;
>> +};
>> +
>> +struct llist_node {
>> +     struct llist_node *next;
>> +};
>> +
>> +#define LLIST_HEAD_INIT(name)        { NULL }
>> +#define LLIST_HEAD(name)     struct llist_head name = LLIST_HEAD_INIT(name)
>> +
>> +/**
>> + * init_llist_head - initialize lock-less list head
>> + * @head:    the head for your lock-less list
>> + */
>> +static inline void init_llist_head(struct llist_head *list)
>> +{
>> +     list->first = NULL;
>> +}
>> +
>> +/**
>> + * llist_entry - get the struct of this entry
>> + * @ptr:     the &struct llist_node pointer.
>> + * @type:    the type of the struct this is embedded in.
>> + * @member:  the name of the llist_node within the struct.
>> + */
>> +#define llist_entry(ptr, type, member)               \
>> +     container_of(ptr, type, member)
>> +
>> +/**
>> + * llist_for_each - iterate over some deleted entries of a lock-less list
>> + * @pos:     the &struct llist_node to use as a loop cursor
>> + * @node:    the first entry of deleted list entries
>> + *
>> + * In general, some entries of the lock-less list can be traversed
>> + * safely only after being deleted from list, so start with an entry
>> + * instead of list head.
>> + */
>> +#define llist_for_each(pos, node)                    \
>> +     for (pos = (node); pos; pos = pos->next)
>> +
>> +/**
>> + * llist_for_each_entry - iterate over some deleted entries of lock-less list of given type
>> + * @pos:     the type * to use as a loop cursor.
>> + * @node:    the fist entry of deleted list entries.
>> + * @member:  the name of the llist_node with the struct.
>> + *
>> + * In general, some entries of the lock-less list can be traversed
>> + * safely only after being removed from list, so start with an entry
>> + * instead of list head.
>> + */
>> +#define llist_for_each_entry(pos, node, member)                              \
>> +     for (pos = llist_entry((node), typeof(*pos), member);           \
>> +          &pos->member != NULL;                                      \
>> +          pos = llist_entry(pos->member.next, typeof(*pos), member))
>> +
>> +/**
>> + * llist_empty - tests whether a lock-less list is empty
> 
> How is this llist_empty test expected to be used in combination with
> other API members ? e.g. llist_del_first, llist_del_all, llist_add ? I
> suspect that without mutex to ensure that there are no concurrent
> changes, llist_empty return value can easily be non-current.

We don't need llist_empty to be accurate.  It is just a quick way to
test whether the list/stack is empty without deleting anything from it.
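
Something like this opportunistic use, for example (the head name is
illustrative):

        /* llist_empty() is only a racy hint here */
        if (llist_empty(&event_list))
                return;                 /* probably nothing queued */

        node = llist_del_all(&event_list);
        if (!node)
                return;                 /* raced with another consumer */
        /* ... process the detached entries ... */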

Best Regards,
Huang Ying

> Thanks,
> 
> Mathieu
> 
>> + * @head:    the list to test
>> + */
>> +static inline int llist_empty(const struct llist_head *head)
>> +{
>> +     return head->first == NULL;
>> +}
>> +
>> +void llist_add(struct llist_node *new, struct llist_head *head);
>> +void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
>> +                  struct llist_head *head);
>> +struct llist_node *llist_del_first(struct llist_head *head);
>> +struct llist_node *llist_del_all(struct llist_head *head);
>> +#endif /* LLIST_H */
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -219,4 +219,7 @@ config LRU_CACHE
>>  config AVERAGE
>>       bool
>>
>> +config LLIST
>> +     bool
>> +
>>  endmenu
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -110,6 +110,8 @@ obj-$(CONFIG_ATOMIC64_SELFTEST) += atomi
>>
>>  obj-$(CONFIG_AVERAGE) += average.o
>>
>> +obj-$(CONFIG_LLIST) += llist.o
>> +
>>  hostprogs-y  := gen_crc32table
>>  clean-files  := crc32table.h
>>
>> --- /dev/null
>> +++ b/lib/llist.c
>> @@ -0,0 +1,119 @@
>> +/*
>> + * Lock-less NULL terminated single linked list
>> + *
>> + * The basic atomic operation of this list is cmpxchg on long.  On
>> + * architectures that don't have NMI-safe cmpxchg implementation, the
>> + * list can NOT be used in NMI handler.  So code uses the list in NMI
>> + * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
>> + *
>> + * Copyright 2010 Intel Corp.
>> + *   Author: Huang Ying <ying.huang@intel.com>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License version
>> + * 2 as published by the Free Software Foundation;
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/llist.h>
>> +
>> +#include <asm/system.h>
>> +
>> +/**
>> + * llist_add - add a new entry
>> + * @new:     new entry to be added
>> + * @head:    the head for your lock-less list
>> + */
>> +void llist_add(struct llist_node *new, struct llist_head *head)
>> +{
>> +     struct llist_node *entry;
>> +
>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>> +     BUG_ON(in_nmi());
>> +#endif
>> +
>> +     do {
>> +             entry = head->first;
>> +             new->next = entry;
>> +     } while (cmpxchg(&head->first, entry, new) != entry);
>> +}
>> +EXPORT_SYMBOL_GPL(llist_add);
>> +
>> +/**
>> + * llist_add_batch - add several linked entries in batch
>> + * @new_first:       first entry in batch to be added
>> + * @new_last:        last entry in batch to be added
>> + * @head:    the head for your lock-less list
>> + */
>> +void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
>> +                  struct llist_head *head)
>> +{
>> +     struct llist_node *entry;
>> +
>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>> +     BUG_ON(in_nmi());
>> +#endif
>> +
>> +     do {
>> +             entry = head->first;
>> +             new_last->next = entry;
>> +     } while (cmpxchg(&head->first, entry, new_first) != entry);
>> +}
>> +EXPORT_SYMBOL_GPL(llist_add_batch);
>> +
>> +/**
>> + * llist_del_first - delete the first entry of lock-less list
>> + * @head:    the head for your lock-less list
>> + *
>> + * If list is empty, return NULL, otherwise, return the first entry deleted.
>> + *
>> + * Only one llist_del_first user can be used simultaneously with
>> + * multiple llist_add users without lock. Because otherwise
>> + * llist_del_first, llist_add, llist_add sequence in another user may
>> + * change @head->first->next, but keep @head->first. If multiple
>> + * consumers are needed, please use llist_del_all.
>> + */
>> +struct llist_node *llist_del_first(struct llist_head *head)
>> +{
>> +     struct llist_node *entry;
>> +
>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>> +     BUG_ON(in_nmi());
>> +#endif
>> +
>> +     do {
>> +             entry = head->first;
>> +             if (entry == NULL)
>> +                     return NULL;
>> +     } while (cmpxchg(&head->first, entry, entry->next) != entry);
>> +
>> +     return entry;
>> +}
>> +EXPORT_SYMBOL_GPL(llist_del_first);
>> +
>> +/**
>> + * llist_del_all - delete all entries from lock-less list
>> + * @head:    the head of lock-less list to delete all entries
>> + *
>> + * If list is empty, return NULL, otherwise, delete all entries and
>> + * return the pointer to the first entry.
>> + */
>> +struct llist_node *llist_del_all(struct llist_head *head)
>> +{
>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>> +     BUG_ON(in_nmi());
>> +#endif
>> +
>> +     return xchg(&head->first, NULL);
>> +}
>> +EXPORT_SYMBOL_GPL(llist_del_all);


* Re: About lock-less data structure patches
  2011-03-31  1:39             ` About lock-less data structure patches Huang Ying
@ 2011-04-01 21:37               ` Mathieu Desnoyers
  2011-04-02  5:05                 ` Huang Ying
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Desnoyers @ 2011-04-01 21:37 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, Andi Kleen, lenb, Paul E. McKenney, linux-kernel,
	Linus Torvalds

* Huang Ying (ying.huang@intel.com) wrote:
> Hi, Mathieu,
> 
> Thank you very much for your review.  Do you have time to take a look at
> the lock-less memory allocator as follow?
> 
> https://lkml.org/lkml/2011/2/21/15

I'll try to have a look in the following weeks.

Answer to comments below,

> 
> On 03/30/2011 11:11 PM, Mathieu Desnoyers wrote:
> > * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> >> * Andrew Morton (akpm@linux-foundation.org) wrote:
> >>> On Wed, 30 Mar 2011 05:22:03 +0200 Andi Kleen <andi@firstfloor.org> wrote:
> >>>
> >>>> On Wed, Mar 30, 2011 at 09:30:43AM +0800, Huang Ying wrote:
> >>>>> On 03/30/2011 09:21 AM, Andrew Morton wrote:
> >>>>>> On Wed, 30 Mar 2011 09:14:45 +0800 Huang Ying <ying.huang@intel.com> wrote:
> >>>>>>
> >>>>>>> Hi, Andrew and Len,
> >>>>>>>
> >>>>>>> In my original APEI patches for 2.6.39, the following 3 patches is about
> >>>>>>> lock-less data structure.
> >>>>>>>
> >>>>>>> [PATCH 1/7] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
> >>>>>>> [PATCH 2/7] lib, Add lock-less NULL terminated single list
> >>>>>>> [PATCH 6/7] lib, Make gen_pool memory allocator lockless
> >>>>>>>
> >>>>>>> Len think we need some non-Intel "Acked-by" or "Reviewed-by" for these
> >>>>>>> patches to go through the ACPI git tree.  Or they should go through
> >>>>>>> other tree, such as -mm tree.
> >>>>>>
> >>>>>> I just dropped a couple of your patches because include/linux/llist.h
> >>>>>> vanished from linux-next.   Did Len trip over the power cord?
> >>>>>
> >>>>> Len has dropped lock-less data structure patches from acpi git tree.  He
> >>>>> describe the reason in following mails.
> >>>>>
> >>>>> https://lkml.org/lkml/2011/3/2/501
> >>>>> https://lkml.org/lkml/2011/3/23/6
> >>>>
> >>>> Ok so we still need a lockless reviewer really and I don't count.
> >>>
> >>> Well I think you count ;) If this is some Intel thing then cluebeat,
> >>> cluebeat, cluebeat, overruled.
> >>>
> >>>> Copying Mathieu who did a lot of lockless stuff. Are you interested
> >>>> in reviewing Ying's patches?
> >>>
> >>> That would be great.
> >>
> >> Sure, I can have a look. Huang, can you resend those three patches
> >> adding me to CC list ? That will help me keep appropriate threading in
> >> my review. Adding Paul McKenney would also be appropriate.
> > 
> > I know, I know, I said I would wait for a repost, but now the answer
> > burns my fingers. ;-) I'm replying to the patch found in
> > https://lkml.org/lkml/2011/2/21/13
> > 
> > 
> >> --- /dev/null
> >> +++ b/include/linux/llist.h
> >> @@ -0,0 +1,98 @@
> >> +#ifndef LLIST_H
> >> +#define LLIST_H
> >> +/*
> >> + * Lock-less NULL terminated single linked list
> > 
> > Because this single-linked-list works like a stack (with "push"
> > operation for llist_add, "pop" operation for llist_del_first), I would
> > recommend to rename it accordingly (as a stack rather than "list"). If
> > we think about other possible users of this kind of lock-free list, such
> > as call_rcu(), a "queue" would be rather more appropriate than a "stack"
> > (with enqueue/dequeue operations). So at the very least I would like to
> > make sure this API keeps room for lock-free queue implementations that
> > won't be confused with this stack API. It would also be important to
> > figure out if what we really want is a stack or a queue. Some naming
> > ideas follow (maybe they are a bit verbose, comments are welcome).
> > 
> > We should note that this list implements "lock-free" push and pop
> > operations (cmpxchg loops), and a "wait-free" "llist_del_all" operation
> > (using xchg) (only really true for architectures with "true" xchg
> > operation though, not those using LL/SC). We should think about the real
> > use-case requirements put on this lockless stack to decide which variant
> > is most appropriate. We can either have, with the implementation you
> > propose:
> > 
> > - Lock-free push
> > - Pop protected by mutex
> > - Wait-free pop all
> > 
> > Or, as an example of an alternative structure (as Paul and I implemented
> > in the userspace RCU library):
> > 
> > - Wait-free push (stronger real-time guarantees provided by xchg())
> > - Blocking pop/pop all (use cmpxchg and busy-wait for very short time
> >   periods)
> > 
> > (there are others, with e.g. lock-free push, lock-free pop, lock-free
> > pop all, but this one requires RCU read lock across the pop/pop/pop all
> > operations and that memory reclaim of the nodes is only performed after
> > a RCU grace-period has elapsed. This deals with ABA issues of concurrent
> > push/pop you noticed without requiring mutexes protecting pop operations.)
> > 
> > So it all boils down to which are the constraints of the push/pop
> > callers.  Typically, I would expect that the "push" operation has the
> > most strict real-time constraints (and is possibly executed the most
> > often, thus would also benefit from xchg() which is typically slightly
> > faster than cmpxchg()), which would argue in favor of a wait-free
> > push/blocking pop.  But maybe I'm lacking understanding of what you are
> > trying to do with this stack. Do you need to ever pop from a NMI
> > handler ?
> 
> In my user case, I don't need to pop in a NMI handler, just push.  But
> we need to pop in a IRQ handler, so we can not use blocking pop.  Please
> take a look at the user case patches listed later.

Actually, I marked that as "blocking" in our implementation because I
have a busy-loop in there, and because my algorithm was implemented in
user-space, I added an adaptive delay to make sure not to busy-loop
waiting for a preempted thread to come back.

But by disabling preemption and using a real busy-loop in the kernel,
the pop can trivially be made non-blocking, and thus OK for interrupt
handlers.
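
Roughly, in kernel terms, something like the following.  This is only a
sketch of the idea, not the urcu code: the WFS_NEXT_PENDING marker and
the names are made up, ACCESS_ONCE/barrier annotations are omitted, and
concurrent poppers would still need mutual exclusion among themselves,
just like llist_del_first.

#define WFS_NEXT_PENDING        ((struct llist_node *)1UL)      /* made-up marker */

/* Wait-free push: a single xchg, never retries. */
static void wfs_push(struct llist_node *node, struct llist_head *head)
{
        struct llist_node *old;

        node->next = WFS_NEXT_PENDING;
        old = xchg(&head->first, node);
        /*
         * Window: head->first already points to node, but node->next
         * still holds the pending marker.  A concurrent pop waits here.
         */
        node->next = old;
}

/* Pop: lock-free except for a short busy-wait on the window above. */
static struct llist_node *wfs_pop(struct llist_head *head)
{
        struct llist_node *entry, *next;

        for (;;) {
                entry = head->first;
                if (!entry)
                        return NULL;
                while ((next = entry->next) == WFS_NEXT_PENDING)
                        cpu_relax();
                if (cmpxchg(&head->first, entry, next) == entry)
                        return entry;
        }
}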

> 
> > Some ideas for API identifiers:
> > 
> > struct llist_head -> slist_stack_head
> > struct llist_node -> slist_stack_node
> 
> Why call it a stack and a list?  Because it is a stack implemented with
> single list?  I think it is better to name after usage instead of
> implementation.
> 
> The next question is whether it should be named as stack or list.  I
> think from current user's point of view, they think they are using a
> list instead of stack.  There are 3 users so far as follow.
> 
> https://lkml.org/lkml/2011/1/17/14
> https://lkml.org/lkml/2011/1/17/15
> https://lkml.org/lkml/2011/2/21/16

Hrm, I'm concerned about the impact of handing the irq work from one
execution context to another in the reverse order (because a stack is
implemented here). Can this have unwanted side-effects ?

> 
> And if we named this data structure as list, we can still use "queue"
> for another data structure.  Do you think so?

Well, the "delete first entry" is really unclear if we call all this a
"list". Which is the first entry ? The first added to the list, or the
last one ? The natural reflex would be to think it is the first added,
but in this case, it's the opposite. This is why I think using stack or
lifo (with push/pop) would be clearer than "list" (with add/del).

So maybe "lockless stack" could be palatable ? The name "lockless lifo"
came to my mind in the past days too.

> 
> > * For your lock-free push/pop + wait-free pop_all implementation:
> > 
> > llist_add -> slist_stack_push_lf        (lock-free)
> > llist_del_first -> _slist_stack_pop     (needs mutex protection)
> > llist_del_all -> slist_stack_pop_all_wf (wait-free)
> 
> Do we really need to distinguish between lock-free and wait-free from
> interface?

I don't think it is needed, and it's probably even just more confusing
for API users.

> Will we implement both slist_stack_push_lf and
> slist_stack_push_wf for one data structure?

Those will possibly even require different data structures. It might be
better to find out clearly what our needs are and only propose a single
behavior; otherwise it will be confusing for users of these APIs without
good reason.

> 
> mutex is needed between multiple "_slist_stack_pop", but not needed
> between slist_stack_push_lf and _slist_stack_pop.  I think it is hard to
> explain that clearly via function naming.

Good point. An ASCII-art table might be appropriate here, e.g.:


L: Mutual exclusion (lock) needed to protect one from the other.
-: lockless.

             |     push     |   pop    |   pop_all
push         |      -       |    -     |     -
pop          |              |    L     |     L
pop_all      |              |          |     -

How about adding this (or something prettier) to the header ?
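
The table could also be paired with a short usage snippet in the header,
e.g. (lock name and calling context are illustrative):

static DEFINE_SPINLOCK(llist_pop_lock); /* serializes pop vs pop/pop_all */

        /* producers, any context: the "-" column, no locking */
        llist_add(&item->llnode, &head);

        /* consumers using llist_del_first: the "L" cells */
        spin_lock(&llist_pop_lock);
        node = llist_del_first(&head);
        spin_unlock(&llist_pop_lock);

        /* llist_del_all needs the lock only if llist_del_first is also used */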


> 
> > * If we choose to go with an alternate wait-free push implementation:
> > 
> > llist_add -> slist_stack_push_wf              (wait-free)
> > llist_del_first -> slist_stack_pop_blocking   (blocking)
> > llist_del_all -> slist_stack_pop_all_blocking (blocking)
> 
> We need non-blocking pop, so maybe you need implement another data
> structure which has these interface.  I think there can be multiple
> lock-less data structure in kernel.

As I noted earlier, the blocking was only due to our user-level
implementation. It can be turned into a very short-lived busy loop
instead.

> 
> >> + *
> >> + * If there are multiple producers and multiple consumers, llist_add
> >> + * can be used in producers and llist_del_all can be used in
> >> + * consumers.  They can work simultaneously without lock.  But
> >> + * llist_del_first can not be used here.  Because llist_del_first
> >> + * depends on list->first->next does not changed if list->first is not
> >> + * changed during its operation, but llist_del_first, llist_add,
> >> + * llist_add sequence in another consumer may violate that.
> > 
> > You did not seem to define the locking rules when using both
> > 
> >   llist_del_all
> > and
> >   llist_del_first
> > 
> > in parallel. I expect that a mutex is needed, because a
> > 
> >   llist_del_all, llist_add, llist_add
> > 
> > in parallel with
> > 
> >   llist_del_first
> > 
> > could run into the same ABA problem as described above.
> 
> OK.  I will add that.
> 
> >> + *
> >> + * If there are multiple producers and one consumer, llist_add can be
> >> + * used in producers and llist_del_all or llist_del_first can be used
> >> + * in the consumer.
> >> + *
> >> + * The list entries deleted via llist_del_all can be traversed with
> >> + * traversing function such as llist_for_each etc.  But the list
> >> + * entries can not be traversed safely before deleted from the list.
> > 
> > Given that this is in fact a stack, specifying the traversal order of
> > llist_for_each and friends would be appropriate.
> 
> Ok.  I will add something like "traversing from head to tail" in the
> comments.

Traversing from the last element pushed to the first element pushed
would probably be clearer.

> 
> >> + *
> >> + * The basic atomic operation of this list is cmpxchg on long.  On
> >> + * architectures that don't have NMI-safe cmpxchg implementation, the
> >> + * list can NOT be used in NMI handler.  So code uses the list in NMI
> >> + * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
> >> + */
> >> +
> >> +struct llist_head {
> >> +     struct llist_node *first;
> >> +};
> >> +
> >> +struct llist_node {
> >> +     struct llist_node *next;
> >> +};
> >> +
> >> +#define LLIST_HEAD_INIT(name)        { NULL }
> >> +#define LLIST_HEAD(name)     struct llist_head name = LLIST_HEAD_INIT(name)
> >> +
> >> +/**
> >> + * init_llist_head - initialize lock-less list head
> >> + * @head:    the head for your lock-less list
> >> + */
> >> +static inline void init_llist_head(struct llist_head *list)
> >> +{
> >> +     list->first = NULL;
> >> +}
> >> +
> >> +/**
> >> + * llist_entry - get the struct of this entry
> >> + * @ptr:     the &struct llist_node pointer.
> >> + * @type:    the type of the struct this is embedded in.
> >> + * @member:  the name of the llist_node within the struct.
> >> + */
> >> +#define llist_entry(ptr, type, member)               \
> >> +     container_of(ptr, type, member)
> >> +
> >> +/**
> >> + * llist_for_each - iterate over some deleted entries of a lock-less list
> >> + * @pos:     the &struct llist_node to use as a loop cursor
> >> + * @node:    the first entry of deleted list entries
> >> + *
> >> + * In general, some entries of the lock-less list can be traversed
> >> + * safely only after being deleted from list, so start with an entry
> >> + * instead of list head.
> >> + */
> >> +#define llist_for_each(pos, node)                    \
> >> +     for (pos = (node); pos; pos = pos->next)
> >> +
> >> +/**
> >> + * llist_for_each_entry - iterate over some deleted entries of lock-less list of given type
> >> + * @pos:     the type * to use as a loop cursor.
> >> + * @node:    the fist entry of deleted list entries.
> >> + * @member:  the name of the llist_node with the struct.
> >> + *
> >> + * In general, some entries of the lock-less list can be traversed
> >> + * safely only after being removed from list, so start with an entry
> >> + * instead of list head.
> >> + */
> >> +#define llist_for_each_entry(pos, node, member)                              \
> >> +     for (pos = llist_entry((node), typeof(*pos), member);           \
> >> +          &pos->member != NULL;                                      \
> >> +          pos = llist_entry(pos->member.next, typeof(*pos), member))
> >> +
> >> +/**
> >> + * llist_empty - tests whether a lock-less list is empty
> > 
> > How is this llist_empty test expected to be used in combination with
> > other API members ? e.g. llist_del_first, llist_del_all, llist_add ? I
> > suspect that without mutex to ensure that there are no concurrent
> > changes, llist_empty return value can easily be non-current.
> 
> We don't need llist_empty to be accurate.  Just a quick way to test
> whether list/stack is empty without deleting something from list/stack.

OK, maybe specifying this limitation above the function would be
appropriate.

Thanks,

Mathieu

> 
> Best Regards,
> Huang Ying
> 
> > Thanks,
> > 
> > Mathieu
> > 
> >> + * @head:    the list to test
> >> + */
> >> +static inline int llist_empty(const struct llist_head *head)
> >> +{
> >> +     return head->first == NULL;
> >> +}
> >> +
> >> +void llist_add(struct llist_node *new, struct llist_head *head);
> >> +void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
> >> +                  struct llist_head *head);
> >> +struct llist_node *llist_del_first(struct llist_head *head);
> >> +struct llist_node *llist_del_all(struct llist_head *head);
> >> +#endif /* LLIST_H */
> >> --- a/lib/Kconfig
> >> +++ b/lib/Kconfig
> >> @@ -219,4 +219,7 @@ config LRU_CACHE
> >>  config AVERAGE
> >>       bool
> >>
> >> +config LLIST
> >> +     bool
> >> +
> >>  endmenu
> >> --- a/lib/Makefile
> >> +++ b/lib/Makefile
> >> @@ -110,6 +110,8 @@ obj-$(CONFIG_ATOMIC64_SELFTEST) += atomi
> >>
> >>  obj-$(CONFIG_AVERAGE) += average.o
> >>
> >> +obj-$(CONFIG_LLIST) += llist.o
> >> +
> >>  hostprogs-y  := gen_crc32table
> >>  clean-files  := crc32table.h
> >>
> >> --- /dev/null
> >> +++ b/lib/llist.c
> >> @@ -0,0 +1,119 @@
> >> +/*
> >> + * Lock-less NULL terminated single linked list
> >> + *
> >> + * The basic atomic operation of this list is cmpxchg on long.  On
> >> + * architectures that don't have NMI-safe cmpxchg implementation, the
> >> + * list can NOT be used in NMI handler.  So code uses the list in NMI
> >> + * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
> >> + *
> >> + * Copyright 2010 Intel Corp.
> >> + *   Author: Huang Ying <ying.huang@intel.com>
> >> + *
> >> + * This program is free software; you can redistribute it and/or
> >> + * modify it under the terms of the GNU General Public License version
> >> + * 2 as published by the Free Software Foundation;
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, write to the Free Software
> >> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> >> + */
> >> +#include <linux/kernel.h>
> >> +#include <linux/module.h>
> >> +#include <linux/interrupt.h>
> >> +#include <linux/llist.h>
> >> +
> >> +#include <asm/system.h>
> >> +
> >> +/**
> >> + * llist_add - add a new entry
> >> + * @new:     new entry to be added
> >> + * @head:    the head for your lock-less list
> >> + */
> >> +void llist_add(struct llist_node *new, struct llist_head *head)
> >> +{
> >> +     struct llist_node *entry;
> >> +
> >> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> >> +     BUG_ON(in_nmi());
> >> +#endif
> >> +
> >> +     do {
> >> +             entry = head->first;
> >> +             new->next = entry;
> >> +     } while (cmpxchg(&head->first, entry, new) != entry);
> >> +}
> >> +EXPORT_SYMBOL_GPL(llist_add);
> >> +
> >> +/**
> >> + * llist_add_batch - add several linked entries in batch
> >> + * @new_first:       first entry in batch to be added
> >> + * @new_last:        last entry in batch to be added
> >> + * @head:    the head for your lock-less list
> >> + */
> >> +void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
> >> +                  struct llist_head *head)
> >> +{
> >> +     struct llist_node *entry;
> >> +
> >> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> >> +     BUG_ON(in_nmi());
> >> +#endif
> >> +
> >> +     do {
> >> +             entry = head->first;
> >> +             new_last->next = entry;
> >> +     } while (cmpxchg(&head->first, entry, new_first) != entry);
> >> +}
> >> +EXPORT_SYMBOL_GPL(llist_add_batch);
> >> +
> >> +/**
> >> + * llist_del_first - delete the first entry of lock-less list
> >> + * @head:    the head for your lock-less list
> >> + *
> >> + * If list is empty, return NULL, otherwise, return the first entry deleted.
> >> + *
> >> + * Only one llist_del_first user can be used simultaneously with
> >> + * multiple llist_add users without lock. Because otherwise
> >> + * llist_del_first, llist_add, llist_add sequence in another user may
> >> + * change @head->first->next, but keep @head->first. If multiple
> >> + * consumers are needed, please use llist_del_all.
> >> + */
> >> +struct llist_node *llist_del_first(struct llist_head *head)
> >> +{
> >> +     struct llist_node *entry;
> >> +
> >> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> >> +     BUG_ON(in_nmi());
> >> +#endif
> >> +
> >> +     do {
> >> +             entry = head->first;
> >> +             if (entry == NULL)
> >> +                     return NULL;
> >> +     } while (cmpxchg(&head->first, entry, entry->next) != entry);
> >> +
> >> +     return entry;
> >> +}
> >> +EXPORT_SYMBOL_GPL(llist_del_first);
> >> +
> >> +/**
> >> + * llist_del_all - delete all entries from lock-less list
> >> + * @head:    the head of lock-less list to delete all entries
> >> + *
> >> + * If list is empty, return NULL, otherwise, delete all entries and
> >> + * return the pointer to the first entry.
> >> + */
> >> +struct llist_node *llist_del_all(struct llist_head *head)
> >> +{
> >> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> >> +     BUG_ON(in_nmi());
> >> +#endif
> >> +
> >> +     return xchg(&head->first, NULL);
> >> +}
> >> +EXPORT_SYMBOL_GPL(llist_del_all);

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


* Re: About lock-less data structure patches
  2011-04-01 21:37               ` Mathieu Desnoyers
@ 2011-04-02  5:05                 ` Huang Ying
  2011-04-04 15:53                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 11+ messages in thread
From: Huang Ying @ 2011-04-02  5:05 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrew Morton, Andi Kleen, lenb, Paul E. McKenney, linux-kernel,
	Linus Torvalds

On 04/02/2011 05:37 AM, Mathieu Desnoyers wrote:
> * Huang Ying (ying.huang@intel.com) wrote:
>> Hi, Mathieu,
>>
>> Thank you very much for your review.  Do you have time to take a look at
>> the lock-less memory allocator as follow?
>>
>> https://lkml.org/lkml/2011/2/21/15
> 
> I'll try to have a look in the following weeks.

Thanks.

> Answer to comments below,
> 
>>
>> On 03/30/2011 11:11 PM, Mathieu Desnoyers wrote:
>>> * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
>>>> * Andrew Morton (akpm@linux-foundation.org) wrote:
>>>>> On Wed, 30 Mar 2011 05:22:03 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>>>>>
>>>>>> On Wed, Mar 30, 2011 at 09:30:43AM +0800, Huang Ying wrote:
>>>>>>> On 03/30/2011 09:21 AM, Andrew Morton wrote:
>>>>>>>> On Wed, 30 Mar 2011 09:14:45 +0800 Huang Ying <ying.huang@intel.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, Andrew and Len,
>>>>>>>>>
>>>>>>>>> In my original APEI patches for 2.6.39, the following 3 patches is about
>>>>>>>>> lock-less data structure.
>>>>>>>>>
>>>>>>>>> [PATCH 1/7] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>>>>>>> [PATCH 2/7] lib, Add lock-less NULL terminated single list
>>>>>>>>> [PATCH 6/7] lib, Make gen_pool memory allocator lockless
>>>>>>>>>
>>>>>>>>> Len think we need some non-Intel "Acked-by" or "Reviewed-by" for these
>>>>>>>>> patches to go through the ACPI git tree.  Or they should go through
>>>>>>>>> other tree, such as -mm tree.
>>>>>>>>
>>>>>>>> I just dropped a couple of your patches because include/linux/llist.h
>>>>>>>> vanished from linux-next.   Did Len trip over the power cord?
>>>>>>>
>>>>>>> Len has dropped lock-less data structure patches from acpi git tree.  He
>>>>>>> describe the reason in following mails.
>>>>>>>
>>>>>>> https://lkml.org/lkml/2011/3/2/501
>>>>>>> https://lkml.org/lkml/2011/3/23/6
>>>>>>
>>>>>> Ok so we still need a lockless reviewer really and I don't count.
>>>>>
>>>>> Well I think you count ;) If this is some Intel thing then cluebeat,
>>>>> cluebeat, cluebeat, overruled.
>>>>>
>>>>>> Copying Mathieu who did a lot of lockless stuff. Are you interested
>>>>>> in reviewing Ying's patches?
>>>>>
>>>>> That would be great.
>>>>
>>>> Sure, I can have a look. Huang, can you resend those three patches
>>>> adding me to CC list ? That will help me keep appropriate threading in
>>>> my review. Adding Paul McKenney would also be appropriate.
>>>
>>> I know, I know, I said I would wait for a repost, but now the answer
>>> burns my fingers. ;-) I'm replying to the patch found in
>>> https://lkml.org/lkml/2011/2/21/13
>>>
>>>
>>>> --- /dev/null
>>>> +++ b/include/linux/llist.h
>>>> @@ -0,0 +1,98 @@
>>>> +#ifndef LLIST_H
>>>> +#define LLIST_H
>>>> +/*
>>>> + * Lock-less NULL terminated single linked list
>>>
>>> Because this single-linked-list works like a stack (with "push"
>>> operation for llist_add, "pop" operation for llist_del_first), I would
>>> recommend to rename it accordingly (as a stack rather than "list"). If
>>> we think about other possible users of this kind of lock-free list, such
>>> as call_rcu(), a "queue" would be rather more appropriate than a "stack"
>>> (with enqueue/dequeue operations). So at the very least I would like to
>>> make sure this API keeps room for lock-free queue implementations that
>>> won't be confused with this stack API. It would also be important to
>>> figure out if what we really want is a stack or a queue. Some naming
>>> ideas follow (maybe they are a bit verbose, comments are welcome).
>>>
>>> We should note that this list implements "lock-free" push and pop
>>> operations (cmpxchg loops), and a "wait-free" "llist_del_all" operation
>>> (using xchg) (only really true for architectures with "true" xchg
>>> operation though, not those using LL/SC). We should think about the real
>>> use-case requirements put on this lockless stack to decide which variant
>>> is most appropriate. We can either have, with the implementation you
>>> propose:
>>>
>>> - Lock-free push
>>> - Pop protected by mutex
>>> - Wait-free pop all
>>>
>>> Or, as an example of an alternative structure (as Paul and I implemented
>>> in the userspace RCU library):
>>>
>>> - Wait-free push (stronger real-time guarantees provided by xchg())
>>> - Blocking pop/pop all (use cmpxchg and busy-wait for very short time
>>>   periods)
>>>
>>> (there are others, with e.g. lock-free push, lock-free pop, lock-free
>>> pop all, but this one requires RCU read lock across the pop/pop/pop all
>>> operations and that memory reclaim of the nodes is only performed after
>>> a RCU grace-period has elapsed. This deals with ABA issues of concurrent
>>> push/pop you noticed without requiring mutexes protecting pop operations.)
>>>
>>> So it all boils down to which are the constraints of the push/pop
>>> callers.  Typically, I would expect that the "push" operation has the
>>> most strict real-time constraints (and is possibly executed the most
>>> often, thus would also benefit from xchg() which is typically slightly
>>> faster than cmpxchg()), which would argue in favor of a wait-free
>>> push/blocking pop.  But maybe I'm lacking understanding of what you are
>>> trying to do with this stack. Do you need to ever pop from a NMI
>>> handler ?
>>
>> In my user case, I don't need to pop in a NMI handler, just push.  But
>> we need to pop in a IRQ handler, so we can not use blocking pop.  Please
>> take a look at the user case patches listed later.
> 
> Actually, I marked that as a "blocking" in our implementation because I
> have a busy-loop in there, and because my algorithm was implemented in
> user-space, I added an adaptative delay to make sure not to busy-loop
> waiting for a preempted thread to come back.
> 
> But by disabling preemption and using a real busy-loop in the kernel,
> the pop can trivially be made non-blocking, and thus OK for interrupt
> handlers.

OK, I see.  Thanks for the clarification.  Taking a look at your "wfs"
implementation, it seems that we need to disable interrupts on the
wait-free push side to make it possible to use pop in an interrupt
handler.  Do you think so?
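
Something like the following is what I have in mind (only a sketch,
reusing the pending-marker idea from your earlier sketch; the names are
illustrative):

static void wfs_push_irqsafe(struct llist_node *node, struct llist_head *head)
{
        struct llist_node *old;
        unsigned long flags;

        node->next = WFS_NEXT_PENDING;
        /*
         * Close the xchg -> next-store window against a pop running in
         * an interrupt handler on this CPU; a pop on another CPU may
         * still busy-wait briefly, which is fine.
         */
        local_irq_save(flags);
        old = xchg(&head->first, node);
        node->next = old;
        local_irq_restore(flags);
}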

>>> Some ideas for API identifiers:
>>>
>>> struct llist_head -> slist_stack_head
>>> struct llist_node -> slist_stack_node
>>
>> Why call it a stack and a list?  Because it is a stack implemented with
>> single list?  I think it is better to name after usage instead of
>> implementation.
>>
>> The next question is whether it should be named as stack or list.  I
>> think from current user's point of view, they think they are using a
>> list instead of stack.  There are 3 users so far as follow.
>>
>> https://lkml.org/lkml/2011/1/17/14
>> https://lkml.org/lkml/2011/1/17/15
>> https://lkml.org/lkml/2011/2/21/16
> 
> Hrm, I'm concerned about the impact of handing the irq work from one
> execution context to another in the reverse order (because a stack is
> implemented here). Can this have unwanted side-effects ?

For irq work, this is not an issue; irq work users should not depend on
the order of raising.  But if it is necessary, we can reverse the order
again in irq_work_run to keep the original order.  After deleting all
nodes from the lock-less list, we can operate on them at will, as in the
sketch below.
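
A sketch of that reversal (the helper name is mine, it is not part of
the llist patch):

static struct llist_node *llist_reverse(struct llist_node *node)
{
        struct llist_node *rev = NULL;

        while (node) {
                struct llist_node *next = node->next;

                node->next = rev;       /* safe: the chain is already detached */
                rev = node;
                node = next;
        }
        return rev;
}

irq_work_run could call this on the result of llist_del_all before
processing, if the original raise order ever matters.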

>>
>> And if we named this data structure as list, we can still use "queue"
>> for another data structure.  Do you think so?
> 
> Well, the "delete first entry" is really unclear if we call all this a
> "list". Which is the first entry ? The first added to the list, or the
> last one ? The natural reflex would be to think it is the first added,
> but in this case, it's the opposite. This is why I think using stack or
> lifo (with push/pop) would be clearer than "list" (with add/del).

The llist is an ordered list and has only one list "head", so the first
entry is the one pointed to by the "head".  Is that clear?

> So maybe "lockless stack" could be palatable ? The name "lockless lifo"
> came to my mind in the past days too.

But what we need is just a lock-less list, not a stack.  It behaves like
a stack only because we cannot implement other functions such as
llist_add_tail etc. in a lock-less way yet.  We will add these
functions when we know how to do that.

>>> * For your lock-free push/pop + wait-free pop_all implementation:
>>>
>>> llist_add -> slist_stack_push_lf        (lock-free)
>>> llist_del_first -> _slist_stack_pop     (needs mutex protection)
>>> llist_del_all -> slist_stack_pop_all_wf (wait-free)
>>
>> Do we really need to distinguish between lock-free and wait-free from
>> interface?
> 
> I don't think it is needed, and it's probably even just more confusing
> for API users.
> 
>> Will we implement both slist_stack_push_lf and
>> slist_stack_push_wf for one data structure?
> 
> Those will possibly even require different data structures. It might be
> less confusing to find out clearly what our needs are, and only propose
> a single behavior, otherwise it will be confusing for users of these
> APIs without good reason.

Yes, I agree.  The semantics of my original lock-less list are already
used by irq work and the xlist in net/rds.  My original implementation
can be seen as generalizing those existing implementations.  How about
taking my original implementation as the first step, that is,
generalizing, and then considering how to optimize it after that?

But if you want to compare the two semantics above, that is OK with me too.

>> mutex is needed between multiple "_slist_stack_pop", but not needed
>> between slist_stack_push_lf and _slist_stack_pop.  I think it is hard to
>> explain that clearly via function naming.
> 
> Good point. A ascii-art table might be appropriate here, e.g.:
> 
> 
> L: Mutual exclusion (lock) needed to protect one from the other.
> -: lockless.
> 
>              |     push     |   pop    |   pop_all
> push         |      -       |    -     |     -
> pop          |              |    L     |     L
> pop_all      |              |          |     -
> 
> How about adding this (or something prettier) to the header ?

Cool!  I will add that to the header.

>>> * If we choose to go with an alternate wait-free push implementation:
>>>
>>> llist_add -> slist_stack_push_wf              (wait-free)
>>> llist_del_first -> slist_stack_pop_blocking   (blocking)
>>> llist_del_all -> slist_stack_pop_all_blocking (blocking)
>>
>> We need non-blocking pop, so maybe you need implement another data
>> structure which has these interface.  I think there can be multiple
>> lock-less data structure in kernel.
> 
> As I noted earlier, the blocking was only due to our user-level
> implementation. It can be turned in a very short-lived busy loop
> instead.
> 
>>
>>>> + *
>>>> + * If there are multiple producers and multiple consumers, llist_add
>>>> + * can be used in producers and llist_del_all can be used in
>>>> + * consumers.  They can work simultaneously without lock.  But
>>>> + * llist_del_first can not be used here.  Because llist_del_first
>>>> + * depends on list->first->next does not changed if list->first is not
>>>> + * changed during its operation, but llist_del_first, llist_add,
>>>> + * llist_add sequence in another consumer may violate that.
>>>
>>> You did not seem to define the locking rules when using both
>>>
>>>   llist_del_all
>>> and
>>>   llist_del_first
>>>
>>> in parallel. I expect that a mutex is needed, because a
>>>
>>>   llist_del_all, llist_add, llist_add
>>>
>>> in parallel with
>>>
>>>   llist_del_first
>>>
>>> could run into the same ABA problem as described above.
>>
>> OK.  I will add that.
>>
>>>> + *
>>>> + * If there are multiple producers and one consumer, llist_add can be
>>>> + * used in producers and llist_del_all or llist_del_first can be used
>>>> + * in the consumer.
>>>> + *
>>>> + * The list entries deleted via llist_del_all can be traversed with
>>>> + * traversing function such as llist_for_each etc.  But the list
>>>> + * entries can not be traversed safely before deleted from the list.
>>>
>>> Given that this is in fact a stack, specifying the traversal order of
>>> llist_for_each and friends would be appropriate.
>>
>> Ok.  I will add something like "traversing from head to tail" in the
>> comments.
> 
> Traversing from last element pushed to first element pushed would
> probably be clearer.

"head" and "tail" describe the current state, while "last/first element
pushed" describes the history.  I think both are understandable.

>>>> + *
>>>> + * The basic atomic operation of this list is cmpxchg on long.  On
>>>> + * architectures that don't have NMI-safe cmpxchg implementation, the
>>>> + * list can NOT be used in NMI handler.  So code uses the list in NMI
>>>> + * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
>>>> + */
>>>> +
>>>> +struct llist_head {
>>>> +     struct llist_node *first;
>>>> +};
>>>> +
>>>> +struct llist_node {
>>>> +     struct llist_node *next;
>>>> +};
>>>> +
>>>> +#define LLIST_HEAD_INIT(name)        { NULL }
>>>> +#define LLIST_HEAD(name)     struct llist_head name = LLIST_HEAD_INIT(name)
>>>> +
>>>> +/**
>>>> + * init_llist_head - initialize lock-less list head
>>>> + * @head:    the head for your lock-less list
>>>> + */
>>>> +static inline void init_llist_head(struct llist_head *list)
>>>> +{
>>>> +     list->first = NULL;
>>>> +}
>>>> +
>>>> +/**
>>>> + * llist_entry - get the struct of this entry
>>>> + * @ptr:     the &struct llist_node pointer.
>>>> + * @type:    the type of the struct this is embedded in.
>>>> + * @member:  the name of the llist_node within the struct.
>>>> + */
>>>> +#define llist_entry(ptr, type, member)               \
>>>> +     container_of(ptr, type, member)
>>>> +
>>>> +/**
>>>> + * llist_for_each - iterate over some deleted entries of a lock-less list
>>>> + * @pos:     the &struct llist_node to use as a loop cursor
>>>> + * @node:    the first entry of deleted list entries
>>>> + *
>>>> + * In general, some entries of the lock-less list can be traversed
>>>> + * safely only after being deleted from list, so start with an entry
>>>> + * instead of list head.
>>>> + */
>>>> +#define llist_for_each(pos, node)                    \
>>>> +     for (pos = (node); pos; pos = pos->next)
>>>> +
>>>> +/**
>>>> + * llist_for_each_entry - iterate over some deleted entries of lock-less list of given type
>>>> + * @pos:     the type * to use as a loop cursor.
>>>> + * @node:    the fist entry of deleted list entries.
>>>> + * @member:  the name of the llist_node with the struct.
>>>> + *
>>>> + * In general, some entries of the lock-less list can be traversed
>>>> + * safely only after being removed from list, so start with an entry
>>>> + * instead of list head.
>>>> + */
>>>> +#define llist_for_each_entry(pos, node, member)                              \
>>>> +     for (pos = llist_entry((node), typeof(*pos), member);           \
>>>> +          &pos->member != NULL;                                      \
>>>> +          pos = llist_entry(pos->member.next, typeof(*pos), member))
>>>> +
>>>> +/**
>>>> + * llist_empty - tests whether a lock-less list is empty
>>>
>>> How is this llist_empty test expected to be used in combination with
>>> other API members ? e.g. llist_del_first, llist_del_all, llist_add ? I
>>> suspect that without mutex to ensure that there are no concurrent
>>> changes, llist_empty return value can easily be non-current.
>>
>> We don't need llist_empty to be accurate.  Just a quick way to test
>> whether list/stack is empty without deleting something from list/stack.
> 
> OK, maybe specifying this limitation above the function would be
> appropriate.
> 
> Thanks,
> 
> Mathieu
> 
>>
>> Best Regards,
>> Huang Ying
>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>>> + * @head:    the list to test
>>>> + */
>>>> +static inline int llist_empty(const struct llist_head *head)
>>>> +{
>>>> +     return head->first == NULL;
>>>> +}
>>>> +
>>>> +void llist_add(struct llist_node *new, struct llist_head *head);
>>>> +void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
>>>> +                  struct llist_head *head);
>>>> +struct llist_node *llist_del_first(struct llist_head *head);
>>>> +struct llist_node *llist_del_all(struct llist_head *head);
>>>> +#endif /* LLIST_H */
>>>> --- a/lib/Kconfig
>>>> +++ b/lib/Kconfig
>>>> @@ -219,4 +219,7 @@ config LRU_CACHE
>>>>  config AVERAGE
>>>>       bool
>>>>
>>>> +config LLIST
>>>> +     bool
>>>> +
>>>>  endmenu
>>>> --- a/lib/Makefile
>>>> +++ b/lib/Makefile
>>>> @@ -110,6 +110,8 @@ obj-$(CONFIG_ATOMIC64_SELFTEST) += atomi
>>>>
>>>>  obj-$(CONFIG_AVERAGE) += average.o
>>>>
>>>> +obj-$(CONFIG_LLIST) += llist.o
>>>> +
>>>>  hostprogs-y  := gen_crc32table
>>>>  clean-files  := crc32table.h
>>>>
>>>> --- /dev/null
>>>> +++ b/lib/llist.c
>>>> @@ -0,0 +1,119 @@
>>>> +/*
>>>> + * Lock-less NULL terminated single linked list
>>>> + *
>>>> + * The basic atomic operation of this list is cmpxchg on long.  On
>>>> + * architectures that don't have NMI-safe cmpxchg implementation, the
>>>> + * list can NOT be used in NMI handler.  So code uses the list in NMI
>>>> + * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
>>>> + *
>>>> + * Copyright 2010 Intel Corp.
>>>> + *   Author: Huang Ying <ying.huang@intel.com>
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or
>>>> + * modify it under the terms of the GNU General Public License version
>>>> + * 2 as published by the Free Software Foundation;
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License
>>>> + * along with this program; if not, write to the Free Software
>>>> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>>>> + */
>>>> +#include <linux/kernel.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/interrupt.h>
>>>> +#include <linux/llist.h>
>>>> +
>>>> +#include <asm/system.h>
>>>> +
>>>> +/**
>>>> + * llist_add - add a new entry
>>>> + * @new:     new entry to be added
>>>> + * @head:    the head for your lock-less list
>>>> + */
>>>> +void llist_add(struct llist_node *new, struct llist_head *head)
>>>> +{
>>>> +     struct llist_node *entry;
>>>> +
>>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>> +     BUG_ON(in_nmi());
>>>> +#endif
>>>> +
>>>> +     do {
>>>> +             entry = head->first;
>>>> +             new->next = entry;
>>>> +     } while (cmpxchg(&head->first, entry, new) != entry);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(llist_add);
>>>> +
>>>> +/**
>>>> + * llist_add_batch - add several linked entries in batch
>>>> + * @new_first:       first entry in batch to be added
>>>> + * @new_last:        last entry in batch to be added
>>>> + * @head:    the head for your lock-less list
>>>> + */
>>>> +void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
>>>> +                  struct llist_head *head)
>>>> +{
>>>> +     struct llist_node *entry;
>>>> +
>>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>> +     BUG_ON(in_nmi());
>>>> +#endif
>>>> +
>>>> +     do {
>>>> +             entry = head->first;
>>>> +             new_last->next = entry;
>>>> +     } while (cmpxchg(&head->first, entry, new_first) != entry);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(llist_add_batch);
>>>> +
>>>> +/**
>>>> + * llist_del_first - delete the first entry of lock-less list
>>>> + * @head:    the head for your lock-less list
>>>> + *
>>>> + * If list is empty, return NULL, otherwise, return the first entry deleted.
>>>> + *
>>>> + * Only one llist_del_first user can be used simultaneously with
>>>> + * multiple llist_add users without lock. Because otherwise
>>>> + * llist_del_first, llist_add, llist_add sequence in another user may
>>>> + * change @head->first->next, but keep @head->first. If multiple
>>>> + * consumers are needed, please use llist_del_all.
>>>> + */
>>>> +struct llist_node *llist_del_first(struct llist_head *head)
>>>> +{
>>>> +     struct llist_node *entry;
>>>> +
>>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>> +     BUG_ON(in_nmi());
>>>> +#endif
>>>> +
>>>> +     do {
>>>> +             entry = head->first;
>>>> +             if (entry == NULL)
>>>> +                     return NULL;
>>>> +     } while (cmpxchg(&head->first, entry, entry->next) != entry);
>>>> +
>>>> +     return entry;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(llist_del_first);
>>>> +
>>>> +/**
>>>> + * llist_del_all - delete all entries from lock-less list
>>>> + * @head:    the head of lock-less list to delete all entries
>>>> + *
>>>> + * If list is empty, return NULL, otherwise, delete all entries and
>>>> + * return the pointer to the first entry.
>>>> + */
>>>> +struct llist_node *llist_del_all(struct llist_head *head)
>>>> +{
>>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
>>>> +     BUG_ON(in_nmi());
>>>> +#endif
>>>> +
>>>> +     return xchg(&head->first, NULL);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(llist_del_all);
> 
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: About lock-less data structure patches
  2011-04-02  5:05                 ` Huang Ying
@ 2011-04-04 15:53                   ` Mathieu Desnoyers
  2011-04-05  1:16                     ` huang ying
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Desnoyers @ 2011-04-04 15:53 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, Andi Kleen, lenb, Paul E. McKenney, linux-kernel,
	Linus Torvalds

* Huang Ying (ying.huang@intel.com) wrote:
> On 04/02/2011 05:37 AM, Mathieu Desnoyers wrote:
> > * Huang Ying (ying.huang@intel.com) wrote:
> >> Hi, Mathieu,
> >>
> >> Thank you very much for your review.  Do you have time to take a look at
> >> the lock-less memory allocator as follow?
> >>
> >> https://lkml.org/lkml/2011/2/21/15
> > 
> > I'll try to have a look in the following weeks.
> 
> Thanks.
> 
> > Answer to comments below,
> > 
> >>
> >> On 03/30/2011 11:11 PM, Mathieu Desnoyers wrote:
> >>> * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> >>>> * Andrew Morton (akpm@linux-foundation.org) wrote:
> >>>>> On Wed, 30 Mar 2011 05:22:03 +0200 Andi Kleen <andi@firstfloor.org> wrote:
> >>>>>
> >>>>>> On Wed, Mar 30, 2011 at 09:30:43AM +0800, Huang Ying wrote:
> >>>>>>> On 03/30/2011 09:21 AM, Andrew Morton wrote:
> >>>>>>>> On Wed, 30 Mar 2011 09:14:45 +0800 Huang Ying <ying.huang@intel.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi, Andrew and Len,
> >>>>>>>>>
> >>>>>>>>> In my original APEI patches for 2.6.39, the following 3 patches is about
> >>>>>>>>> lock-less data structure.
> >>>>>>>>>
> >>>>>>>>> [PATCH 1/7] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
> >>>>>>>>> [PATCH 2/7] lib, Add lock-less NULL terminated single list
> >>>>>>>>> [PATCH 6/7] lib, Make gen_pool memory allocator lockless
> >>>>>>>>>
> >>>>>>>>> Len think we need some non-Intel "Acked-by" or "Reviewed-by" for these
> >>>>>>>>> patches to go through the ACPI git tree.  Or they should go through
> >>>>>>>>> other tree, such as -mm tree.
> >>>>>>>>
> >>>>>>>> I just dropped a couple of your patches because include/linux/llist.h
> >>>>>>>> vanished from linux-next.   Did Len trip over the power cord?
> >>>>>>>
> >>>>>>> Len has dropped lock-less data structure patches from acpi git tree.  He
> >>>>>>> describe the reason in following mails.
> >>>>>>>
> >>>>>>> https://lkml.org/lkml/2011/3/2/501
> >>>>>>> https://lkml.org/lkml/2011/3/23/6
> >>>>>>
> >>>>>> Ok so we still need a lockless reviewer really and I don't count.
> >>>>>
> >>>>> Well I think you count ;) If this is some Intel thing then cluebeat,
> >>>>> cluebeat, cluebeat, overruled.
> >>>>>
> >>>>>> Copying Mathieu who did a lot of lockless stuff. Are you interested
> >>>>>> in reviewing Ying's patches?
> >>>>>
> >>>>> That would be great.
> >>>>
> >>>> Sure, I can have a look. Huang, can you resend those three patches
> >>>> adding me to CC list ? That will help me keep appropriate threading in
> >>>> my review. Adding Paul McKenney would also be appropriate.
> >>>
> >>> I know, I know, I said I would wait for a repost, but now the answer
> >>> burns my fingers. ;-) I'm replying to the patch found in
> >>> https://lkml.org/lkml/2011/2/21/13
> >>>
> >>>
> >>>> --- /dev/null
> >>>> +++ b/include/linux/llist.h
> >>>> @@ -0,0 +1,98 @@
> >>>> +#ifndef LLIST_H
> >>>> +#define LLIST_H
> >>>> +/*
> >>>> + * Lock-less NULL terminated single linked list
> >>>
> >>> Because this single-linked-list works like a stack (with "push"
> >>> operation for llist_add, "pop" operation for llist_del_first), I would
> >>> recommend to rename it accordingly (as a stack rather than "list"). If
> >>> we think about other possible users of this kind of lock-free list, such
> >>> as call_rcu(), a "queue" would be rather more appropriate than a "stack"
> >>> (with enqueue/dequeue operations). So at the very least I would like to
> >>> make sure this API keeps room for lock-free queue implementations that
> >>> won't be confused with this stack API. It would also be important to
> >>> figure out if what we really want is a stack or a queue. Some naming
> >>> ideas follow (maybe they are a bit verbose, comments are welcome).
> >>>
> >>> We should note that this list implements "lock-free" push and pop
> >>> operations (cmpxchg loops), and a "wait-free" "llist_del_all" operation
> >>> (using xchg) (only really true for architectures with "true" xchg
> >>> operation though, not those using LL/SC). We should think about the real
> >>> use-case requirements put on this lockless stack to decide which variant
> >>> is most appropriate. We can either have, with the implementation you
> >>> propose:
> >>>
> >>> - Lock-free push
> >>> - Pop protected by mutex
> >>> - Wait-free pop all
> >>>
> >>> Or, as an example of an alternative structure (as Paul and I implemented
> >>> in the userspace RCU library):
> >>>
> >>> - Wait-free push (stronger real-time guarantees provided by xchg())
> >>> - Blocking pop/pop all (use cmpxchg and busy-wait for very short time
> >>>   periods)
> >>>
> >>> (there are others, with e.g. lock-free push, lock-free pop, lock-free
> >>> pop all, but this one requires RCU read lock across the pop/pop/pop all
> >>> operations and that memory reclaim of the nodes is only performed after
> >>> a RCU grace-period has elapsed. This deals with ABA issues of concurrent
> >>> push/pop you noticed without requiring mutexes protecting pop operations.)
> >>>
> >>> So it all boils down to which are the constraints of the push/pop
> >>> callers.  Typically, I would expect that the "push" operation has the
> >>> most strict real-time constraints (and is possibly executed the most
> >>> often, thus would also benefit from xchg() which is typically slightly
> >>> faster than cmpxchg()), which would argue in favor of a wait-free
> >>> push/blocking pop.  But maybe I'm lacking understanding of what you are
> >>> trying to do with this stack. Do you need to ever pop from a NMI
> >>> handler ?
> >>
> >> In my user case, I don't need to pop in a NMI handler, just push.  But
> >> we need to pop in a IRQ handler, so we can not use blocking pop.  Please
> >> take a look at the user case patches listed later.
> > 
> > Actually, I marked that as a "blocking" in our implementation because I
> > have a busy-loop in there, and because my algorithm was implemented in
> > user-space, I added an adaptative delay to make sure not to busy-loop
> > waiting for a preempted thread to come back.
> > 
> > But by disabling preemption and using a real busy-loop in the kernel,
> > the pop can trivially be made non-blocking, and thus OK for interrupt
> > handlers.
> 
> OK.  I see.  Thanks for clarification.  Takes a look at your "wfs"
> implementation.  It seems that we need to disable interrupt in wf push
> side to make it possible to use pop in interrupt handler. Do you think so?

It would be appropriate, yes, because the irq handler doing "pop" can
busy-wait during the short window in which it sees a "NULL" value. We
don't want a softirq running on top of an in-flight push, because that
would leave the irq handler busy-waiting for longer than an interrupt
handler should run. In your case, the push is done by an NMI handler,
which already has interrupts disabled, so there is no worry for this
specific use-case.
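
To make that window concrete, here is a minimal sketch (hypothetical
names, not the llist patch above; it assumes the kernel xchg(),
cmpxchg() and cpu_relax() primitives, plus an explicit end-of-stack
sentinel so that a NULL ->next can only mean "push still linking"):

struct wf_node { struct wf_node *next; };
struct wf_head { struct wf_node *first; };

/* head->first of an empty stack; used instead of NULL. */
#define WF_STACK_END        ((struct wf_node *)1UL)

static void wf_push(struct wf_node *node, struct wf_head *head)
{
        struct wf_node *old;

        node->next = NULL;
        /* Wait-free: a single xchg(), no retry loop. */
        old = xchg(&head->first, node);
        /*
         * Between the xchg() above and the store below, a concurrent
         * pop sees node at the top with node->next == NULL and has to
         * spin.  Running these two statements with interrupts disabled
         * (or in NMI context) keeps that window to a couple of stores.
         */
        node->next = old;
}

/*
 * Pop sketch: spins while a push is mid-flight.  As with
 * llist_del_first, concurrent poppers would still need mutual
 * exclusion among themselves to avoid the ABA problem.
 */
static struct wf_node *wf_pop(struct wf_head *head)
{
        struct wf_node *first, *next;

        for (;;) {
                first = head->first;
                if (first == WF_STACK_END)
                        return NULL;            /* stack is empty */
                while ((next = first->next) == NULL)
                        cpu_relax();            /* push not linked yet */
                if (cmpxchg(&head->first, first, next) == first)
                        return first;
        }
}

(head->first must be initialized to WF_STACK_END rather than NULL for
this variant.)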

> 
> >>> Some ideas for API identifiers:
> >>>
> >>> struct llist_head -> slist_stack_head
> >>> struct llist_node -> slist_stack_node
> >>
> >> Why call it a stack and a list?  Because it is a stack implemented with
> >> single list?  I think it is better to name after usage instead of
> >> implementation.
> >>
> >> The next question is whether it should be named as stack or list.  I
> >> think from current user's point of view, they think they are using a
> >> list instead of stack.  There are 3 users so far as follow.
> >>
> >> https://lkml.org/lkml/2011/1/17/14
> >> https://lkml.org/lkml/2011/1/17/15
> >> https://lkml.org/lkml/2011/2/21/16
> > 
> > Hrm, I'm concerned about the impact of handing the irq work from one
> > execution context to another in the reverse order (because a stack is
> > implemented here). Can this have unwanted side-effects ?
> 
> For irq work, this is not an issue.  irq work users should not depend on
> the order of raising.  But if it is necessary, we can reverse the order
> again in irq_work_run to keep the original order.  After deleting all
> nodes from lock-less list, we can operate on them at will.

Ouch. This seems counter-intuitive. I would prefer that we change the
way this lock-less list operates to use a "queue" FIFO scheme rather
than a "stack" (LIFO). That would be much more intuitive: the first
entry added to the list is the first to be removed. And when iterating
over multiple entries removed at once, rather than walking them in
reverse order, we would go from the oldest to the newest entry, which
again seems more natural.

> 
> >>
> >> And if we named this data structure as list, we can still use "queue"
> >> for another data structure.  Do you think so?
> > 
> > Well, the "delete first entry" is really unclear if we call all this a
> > "list". Which is the first entry ? The first added to the list, or the
> > last one ? The natural reflex would be to think it is the first added,
> > but in this case, it's the opposite. This is why I think using stack or
> > lifo (with push/pop) would be clearer than "list" (with add/del).
> 
> The llist is ordered list and has only one list "head", so the first
> entry is the one pointed to by the "head".  Is it clear?
> 
> > So maybe "lockless stack" could be palatable ? The name "lockless lifo"
> > came to my mind in the past days too.
> 
> But what we need is just a lock-less list, not a stack.  It behaves like
> a stack just because we can not implement other functions such as
> llist_add_tail etc in a lock-less way yet.  And we will add these
> functions when we know how to do that.

Well, we do already know how to do that ;-) ;-) Please have a look at
wfqueue.h and wfqueue-static.h in Userspace RCU library. In this
implementation too, the "blocking" busy-wait can be replaced by an
active busy-looping in pretty much the same way I explained for the
stack implementation.
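
The enqueue side of that scheme looks roughly like this (sketch only,
with hypothetical names rather than the actual urcu identifiers; the
dequeue side additionally has to manage a dummy node and busy-wait on
a ->next that an in-flight enqueue has not linked yet):

struct wfq_node { struct wfq_node *next; };

struct wfq_queue {
        struct wfq_node *head;          /* consumer end (oldest entry) */
        struct wfq_node **tail;         /* producer end (newest entry) */
        struct wfq_node dummy;          /* queue is never structurally empty */
};

static void wfq_init(struct wfq_queue *q)
{
        q->dummy.next = NULL;
        q->head = &q->dummy;
        q->tail = &q->dummy.next;
}

/* Wait-free enqueue: one xchg() claims the tail slot, one store links it. */
static void wfq_enqueue(struct wfq_queue *q, struct wfq_node *node)
{
        struct wfq_node **old_tail;

        node->next = NULL;
        old_tail = xchg(&q->tail, &node->next);
        /*
         * A dequeuer walking the chain can see *old_tail == NULL until
         * the store below completes, and has to busy-wait for it.
         */
        *old_tail = node;
}

This gives FIFO order: iteration naturally goes from the oldest to the
newest entry.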

> 
> >>> * For your lock-free push/pop + wait-free pop_all implementation:
> >>>
> >>> llist_add -> slist_stack_push_lf        (lock-free)
> >>> llist_del_first -> _slist_stack_pop     (needs mutex protection)
> >>> llist_del_all -> slist_stack_pop_all_wf (wait-free)
> >>
> >> Do we really need to distinguish between lock-free and wait-free from
> >> interface?
> > 
> > I don't think it is needed, and it's probably even just more confusing
> > for API users.
> > 
> >> Will we implement both slist_stack_push_lf and
> >> slist_stack_push_wf for one data structure?
> > 
> > Those will possibly even require different data structures. It might be
> > less confusing to find out clearly what our needs are, and only propose
> > a single behavior, otherwise it will be confusing for users of these
> > APIs without good reason.
> 
> Yes.  I agree.  The semantics of my original lock-less list has already
> been used by irq work and xlist of net rds.  My original implementation
> can be seen as generalizing the existing implementation.  How about take
> my original implementation as the first step, that is, generalizing,
> then consider how to optimize it after that?
> 
> But if you want to compare the two semantics above, I am OK too.

In the spirit of not providing more APIs than needed, I would argue
that it is better to settle the question of stack vs queue before this
API gets into mainline.

Unless you can show users that really need the list to behave like a
stack, I would be in favor of moving to a queue implementation and
providing "enqueue/dequeue" semantics, with list iteration order from
oldest to newest node. Trying to change this afterward could bring us
surprises, so it might be better to get it right now, before this gets
introduced.

Moreover, we're not talking about that many lines of code to port from
the userspace implementation to kernel-space. It just needs very
thorough testing.

> 
> >> mutex is needed between multiple "_slist_stack_pop", but not needed
> >> between slist_stack_push_lf and _slist_stack_pop.  I think it is hard to
> >> explain that clearly via function naming.
> > 
> > Good point. A ascii-art table might be appropriate here, e.g.:
> > 
> > 
> > L: Lock (mutual exclusion) needed to protect one from another.
> > -: lockless.
> > 
> >              |     push     |   pop    |   pop_all
> > push         |      -       |    -     |     -
> > pop          |              |    L     |     L
> > pop_all      |              |          |     -
> > 
> > How about adding this (or something prettier) to the header ?
> 
> Cool!  I will add that to header.
> 
> >>> * If we choose to go with an alternate wait-free push implementation:
> >>>
> >>> llist_add -> slist_stack_push_wf              (wait-free)
> >>> llist_del_first -> slist_stack_pop_blocking   (blocking)
> >>> llist_del_all -> slist_stack_pop_all_blocking (blocking)
> >>
> >> We need non-blocking pop, so maybe you need implement another data
> >> structure which has these interface.  I think there can be multiple
> >> lock-less data structure in kernel.
> > 
> > As I noted earlier, the blocking was only due to our user-level
> > implementation. It can be turned in a very short-lived busy loop
> > instead.
> > 
> >>
> >>>> + *
> >>>> + * If there are multiple producers and multiple consumers, llist_add
> >>>> + * can be used in producers and llist_del_all can be used in
> >>>> + * consumers.  They can work simultaneously without lock.  But
> >>>> + * llist_del_first can not be used here.  Because llist_del_first
> >>>> + * depends on list->first->next does not changed if list->first is not
> >>>> + * changed during its operation, but llist_del_first, llist_add,
> >>>> + * llist_add sequence in another consumer may violate that.
> >>>
> >>> You did not seem to define the locking rules when using both
> >>>
> >>>   llist_del_all
> >>> and
> >>>   llist_del_first
> >>>
> >>> in parallel. I expect that a mutex is needed, because a
> >>>
> >>>   llist_del_all, llist_add, llist_add
> >>>
> >>> in parallel with
> >>>
> >>>   llist_del_first
> >>>
> >>> could run into the same ABA problem as described above.
> >>
> >> OK.  I will add that.
> >>
> >>>> + *
> >>>> + * If there are multiple producers and one consumer, llist_add can be
> >>>> + * used in producers and llist_del_all or llist_del_first can be used
> >>>> + * in the consumer.
> >>>> + *
> >>>> + * The list entries deleted via llist_del_all can be traversed with
> >>>> + * traversing function such as llist_for_each etc.  But the list
> >>>> + * entries can not be traversed safely before deleted from the list.
> >>>
> >>> Given that this is in fact a stack, specifying the traversal order of
> >>> llist_for_each and friends would be appropriate.
> >>
> >> Ok.  I will add something like "traversing from head to tail" in the
> >> comments.
> > 
> > Traversing from last element pushed to first element pushed would
> > probably be clearer.
> 
> head and tail describe the current state, while "last/first element
> pushed" describe the history.  I think both are understandable.

head and tail, in a list implemented as a stack (but that is not
clearly advertised as such), don't convey the meaning of "we're
iterating from the newest element pushed to the oldest one". This
counter-intuitiveness is part of why I would really like to see this
turned into a queue.

Thanks,

Mathieu

> 
> >>>> + *
> >>>> + * The basic atomic operation of this list is cmpxchg on long.  On
> >>>> + * architectures that don't have NMI-safe cmpxchg implementation, the
> >>>> + * list can NOT be used in NMI handler.  So code uses the list in NMI
> >>>> + * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
> >>>> + */
> >>>> +
> >>>> +struct llist_head {
> >>>> +     struct llist_node *first;
> >>>> +};
> >>>> +
> >>>> +struct llist_node {
> >>>> +     struct llist_node *next;
> >>>> +};
> >>>> +
> >>>> +#define LLIST_HEAD_INIT(name)        { NULL }
> >>>> +#define LLIST_HEAD(name)     struct llist_head name = LLIST_HEAD_INIT(name)
> >>>> +
> >>>> +/**
> >>>> + * init_llist_head - initialize lock-less list head
> >>>> + * @head:    the head for your lock-less list
> >>>> + */
> >>>> +static inline void init_llist_head(struct llist_head *list)
> >>>> +{
> >>>> +     list->first = NULL;
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * llist_entry - get the struct of this entry
> >>>> + * @ptr:     the &struct llist_node pointer.
> >>>> + * @type:    the type of the struct this is embedded in.
> >>>> + * @member:  the name of the llist_node within the struct.
> >>>> + */
> >>>> +#define llist_entry(ptr, type, member)               \
> >>>> +     container_of(ptr, type, member)
> >>>> +
> >>>> +/**
> >>>> + * llist_for_each - iterate over some deleted entries of a lock-less list
> >>>> + * @pos:     the &struct llist_node to use as a loop cursor
> >>>> + * @node:    the first entry of deleted list entries
> >>>> + *
> >>>> + * In general, some entries of the lock-less list can be traversed
> >>>> + * safely only after being deleted from list, so start with an entry
> >>>> + * instead of list head.
> >>>> + */
> >>>> +#define llist_for_each(pos, node)                    \
> >>>> +     for (pos = (node); pos; pos = pos->next)
> >>>> +
> >>>> +/**
> >>>> + * llist_for_each_entry - iterate over some deleted entries of lock-less list of given type
> >>>> + * @pos:     the type * to use as a loop cursor.
> >>>> + * @node:    the fist entry of deleted list entries.
> >>>> + * @member:  the name of the llist_node with the struct.
> >>>> + *
> >>>> + * In general, some entries of the lock-less list can be traversed
> >>>> + * safely only after being removed from list, so start with an entry
> >>>> + * instead of list head.
> >>>> + */
> >>>> +#define llist_for_each_entry(pos, node, member)                              \
> >>>> +     for (pos = llist_entry((node), typeof(*pos), member);           \
> >>>> +          &pos->member != NULL;                                      \
> >>>> +          pos = llist_entry(pos->member.next, typeof(*pos), member))
> >>>> +
> >>>> +/**
> >>>> + * llist_empty - tests whether a lock-less list is empty
> >>>
> >>> How is this llist_empty test expected to be used in combination with
> >>> other API members ? e.g. llist_del_first, llist_del_all, llist_add ? I
> >>> suspect that without mutex to ensure that there are no concurrent
> >>> changes, llist_empty return value can easily be non-current.
> >>
> >> We don't need llist_empty to be accurate.  Just a quick way to test
> >> whether list/stack is empty without deleting something from list/stack.
> > 
> > OK, maybe specifying this limitation above the function would be
> > appropriate.
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> >>
> >> Best Regards,
> >> Huang Ying
> >>
> >>> Thanks,
> >>>
> >>> Mathieu
> >>>
> >>>> + * @head:    the list to test
> >>>> + */
> >>>> +static inline int llist_empty(const struct llist_head *head)
> >>>> +{
> >>>> +     return head->first == NULL;
> >>>> +}
> >>>> +
> >>>> +void llist_add(struct llist_node *new, struct llist_head *head);
> >>>> +void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
> >>>> +                  struct llist_head *head);
> >>>> +struct llist_node *llist_del_first(struct llist_head *head);
> >>>> +struct llist_node *llist_del_all(struct llist_head *head);
> >>>> +#endif /* LLIST_H */
> >>>> --- a/lib/Kconfig
> >>>> +++ b/lib/Kconfig
> >>>> @@ -219,4 +219,7 @@ config LRU_CACHE
> >>>>  config AVERAGE
> >>>>       bool
> >>>>
> >>>> +config LLIST
> >>>> +     bool
> >>>> +
> >>>>  endmenu
> >>>> --- a/lib/Makefile
> >>>> +++ b/lib/Makefile
> >>>> @@ -110,6 +110,8 @@ obj-$(CONFIG_ATOMIC64_SELFTEST) += atomi
> >>>>
> >>>>  obj-$(CONFIG_AVERAGE) += average.o
> >>>>
> >>>> +obj-$(CONFIG_LLIST) += llist.o
> >>>> +
> >>>>  hostprogs-y  := gen_crc32table
> >>>>  clean-files  := crc32table.h
> >>>>
> >>>> --- /dev/null
> >>>> +++ b/lib/llist.c
> >>>> @@ -0,0 +1,119 @@
> >>>> +/*
> >>>> + * Lock-less NULL terminated single linked list
> >>>> + *
> >>>> + * The basic atomic operation of this list is cmpxchg on long.  On
> >>>> + * architectures that don't have NMI-safe cmpxchg implementation, the
> >>>> + * list can NOT be used in NMI handler.  So code uses the list in NMI
> >>>> + * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
> >>>> + *
> >>>> + * Copyright 2010 Intel Corp.
> >>>> + *   Author: Huang Ying <ying.huang@intel.com>
> >>>> + *
> >>>> + * This program is free software; you can redistribute it and/or
> >>>> + * modify it under the terms of the GNU General Public License version
> >>>> + * 2 as published by the Free Software Foundation;
> >>>> + *
> >>>> + * This program is distributed in the hope that it will be useful,
> >>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>>> + * GNU General Public License for more details.
> >>>> + *
> >>>> + * You should have received a copy of the GNU General Public License
> >>>> + * along with this program; if not, write to the Free Software
> >>>> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> >>>> + */
> >>>> +#include <linux/kernel.h>
> >>>> +#include <linux/module.h>
> >>>> +#include <linux/interrupt.h>
> >>>> +#include <linux/llist.h>
> >>>> +
> >>>> +#include <asm/system.h>
> >>>> +
> >>>> +/**
> >>>> + * llist_add - add a new entry
> >>>> + * @new:     new entry to be added
> >>>> + * @head:    the head for your lock-less list
> >>>> + */
> >>>> +void llist_add(struct llist_node *new, struct llist_head *head)
> >>>> +{
> >>>> +     struct llist_node *entry;
> >>>> +
> >>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> >>>> +     BUG_ON(in_nmi());
> >>>> +#endif
> >>>> +
> >>>> +     do {
> >>>> +             entry = head->first;
> >>>> +             new->next = entry;
> >>>> +     } while (cmpxchg(&head->first, entry, new) != entry);
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(llist_add);
> >>>> +
> >>>> +/**
> >>>> + * llist_add_batch - add several linked entries in batch
> >>>> + * @new_first:       first entry in batch to be added
> >>>> + * @new_last:        last entry in batch to be added
> >>>> + * @head:    the head for your lock-less list
> >>>> + */
> >>>> +void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
> >>>> +                  struct llist_head *head)
> >>>> +{
> >>>> +     struct llist_node *entry;
> >>>> +
> >>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> >>>> +     BUG_ON(in_nmi());
> >>>> +#endif
> >>>> +
> >>>> +     do {
> >>>> +             entry = head->first;
> >>>> +             new_last->next = entry;
> >>>> +     } while (cmpxchg(&head->first, entry, new_first) != entry);
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(llist_add_batch);
> >>>> +
> >>>> +/**
> >>>> + * llist_del_first - delete the first entry of lock-less list
> >>>> + * @head:    the head for your lock-less list
> >>>> + *
> >>>> + * If list is empty, return NULL, otherwise, return the first entry deleted.
> >>>> + *
> >>>> + * Only one llist_del_first user can be used simultaneously with
> >>>> + * multiple llist_add users without lock. Because otherwise
> >>>> + * llist_del_first, llist_add, llist_add sequence in another user may
> >>>> + * change @head->first->next, but keep @head->first. If multiple
> >>>> + * consumers are needed, please use llist_del_all.
> >>>> + */
> >>>> +struct llist_node *llist_del_first(struct llist_head *head)
> >>>> +{
> >>>> +     struct llist_node *entry;
> >>>> +
> >>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> >>>> +     BUG_ON(in_nmi());
> >>>> +#endif
> >>>> +
> >>>> +     do {
> >>>> +             entry = head->first;
> >>>> +             if (entry == NULL)
> >>>> +                     return NULL;
> >>>> +     } while (cmpxchg(&head->first, entry, entry->next) != entry);
> >>>> +
> >>>> +     return entry;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(llist_del_first);
> >>>> +
> >>>> +/**
> >>>> + * llist_del_all - delete all entries from lock-less list
> >>>> + * @head:    the head of lock-less list to delete all entries
> >>>> + *
> >>>> + * If list is empty, return NULL, otherwise, delete all entries and
> >>>> + * return the pointer to the first entry.
> >>>> + */
> >>>> +struct llist_node *llist_del_all(struct llist_head *head)
> >>>> +{
> >>>> +#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> >>>> +     BUG_ON(in_nmi());
> >>>> +#endif
> >>>> +
> >>>> +     return xchg(&head->first, NULL);
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(llist_del_all);
> > 
> > --
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: About lock-less data structure patches
  2011-04-04 15:53                   ` Mathieu Desnoyers
@ 2011-04-05  1:16                     ` huang ying
  2011-04-05  4:42                       ` Mathieu Desnoyers
  0 siblings, 1 reply; 11+ messages in thread
From: huang ying @ 2011-04-05  1:16 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Huang Ying, Andrew Morton, Andi Kleen, lenb, Paul E. McKenney,
	linux-kernel, Linus Torvalds

On Mon, Apr 4, 2011 at 11:53 PM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
> * Huang Ying (ying.huang@intel.com) wrote:
>> On 04/02/2011 05:37 AM, Mathieu Desnoyers wrote:
>> > * Huang Ying (ying.huang@intel.com) wrote:
>> >> Hi, Mathieu,
>> >>
>> >> Thank you very much for your review.  Do you have time to take a look at
>> >> the lock-less memory allocator as follow?
>> >>
>> >> https://lkml.org/lkml/2011/2/21/15
>> >
>> > I'll try to have a look in the following weeks.
>>
>> Thanks.
>>
>> > Answer to comments below,
>> >
>> >>
>> >> On 03/30/2011 11:11 PM, Mathieu Desnoyers wrote:
>> >>> * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
>> >>>> * Andrew Morton (akpm@linux-foundation.org) wrote:
>> >>>>> On Wed, 30 Mar 2011 05:22:03 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>> >>>>>
>> >>>>>> On Wed, Mar 30, 2011 at 09:30:43AM +0800, Huang Ying wrote:
>> >>>>>>> On 03/30/2011 09:21 AM, Andrew Morton wrote:
>> >>>>>>>> On Wed, 30 Mar 2011 09:14:45 +0800 Huang Ying <ying.huang@intel.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi, Andrew and Len,
>> >>>>>>>>>
>> >>>>>>>>> In my original APEI patches for 2.6.39, the following 3 patches is about
>> >>>>>>>>> lock-less data structure.
>> >>>>>>>>>
>> >>>>>>>>> [PATCH 1/7] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
>> >>>>>>>>> [PATCH 2/7] lib, Add lock-less NULL terminated single list
>> >>>>>>>>> [PATCH 6/7] lib, Make gen_pool memory allocator lockless
>> >>>>>>>>>
>> >>>>>>>>> Len think we need some non-Intel "Acked-by" or "Reviewed-by" for these
>> >>>>>>>>> patches to go through the ACPI git tree.  Or they should go through
>> >>>>>>>>> other tree, such as -mm tree.
>> >>>>>>>>
>> >>>>>>>> I just dropped a couple of your patches because include/linux/llist.h
>> >>>>>>>> vanished from linux-next.   Did Len trip over the power cord?
>> >>>>>>>
>> >>>>>>> Len has dropped lock-less data structure patches from acpi git tree.  He
>> >>>>>>> describe the reason in following mails.
>> >>>>>>>
>> >>>>>>> https://lkml.org/lkml/2011/3/2/501
>> >>>>>>> https://lkml.org/lkml/2011/3/23/6
>> >>>>>>
>> >>>>>> Ok so we still need a lockless reviewer really and I don't count.
>> >>>>>
>> >>>>> Well I think you count ;) If this is some Intel thing then cluebeat,
>> >>>>> cluebeat, cluebeat, overruled.
>> >>>>>
>> >>>>>> Copying Mathieu who did a lot of lockless stuff. Are you interested
>> >>>>>> in reviewing Ying's patches?
>> >>>>>
>> >>>>> That would be great.
>> >>>>
>> >>>> Sure, I can have a look. Huang, can you resend those three patches
>> >>>> adding me to CC list ? That will help me keep appropriate threading in
>> >>>> my review. Adding Paul McKenney would also be appropriate.
>> >>>
>> >>> I know, I know, I said I would wait for a repost, but now the answer
>> >>> burns my fingers. ;-) I'm replying to the patch found in
>> >>> https://lkml.org/lkml/2011/2/21/13
>> >>>
>> >>>
>> >>>> --- /dev/null
>> >>>> +++ b/include/linux/llist.h
>> >>>> @@ -0,0 +1,98 @@
>> >>>> +#ifndef LLIST_H
>> >>>> +#define LLIST_H
>> >>>> +/*
>> >>>> + * Lock-less NULL terminated single linked list
>> >>>
>> >>> Because this single-linked-list works like a stack (with "push"
>> >>> operation for llist_add, "pop" operation for llist_del_first), I would
>> >>> recommend to rename it accordingly (as a stack rather than "list"). If
>> >>> we think about other possible users of this kind of lock-free list, such
>> >>> as call_rcu(), a "queue" would be rather more appropriate than a "stack"
>> >>> (with enqueue/dequeue operations). So at the very least I would like to
>> >>> make sure this API keeps room for lock-free queue implementations that
>> >>> won't be confused with this stack API. It would also be important to
>> >>> figure out if what we really want is a stack or a queue. Some naming
>> >>> ideas follow (maybe they are a bit verbose, comments are welcome).
>> >>>
>> >>> We should note that this list implements "lock-free" push and pop
>> >>> operations (cmpxchg loops), and a "wait-free" "llist_del_all" operation
>> >>> (using xchg) (only really true for architectures with "true" xchg
>> >>> operation though, not those using LL/SC). We should think about the real
>> >>> use-case requirements put on this lockless stack to decide which variant
>> >>> is most appropriate. We can either have, with the implementation you
>> >>> propose:
>> >>>
>> >>> - Lock-free push
>> >>> - Pop protected by mutex
>> >>> - Wait-free pop all
>> >>>
>> >>> Or, as an example of an alternative structure (as Paul and I implemented
>> >>> in the userspace RCU library):
>> >>>
>> >>> - Wait-free push (stronger real-time guarantees provided by xchg())
>> >>> - Blocking pop/pop all (use cmpxchg and busy-wait for very short time
>> >>>   periods)
>> >>>
>> >>> (there are others, with e.g. lock-free push, lock-free pop, lock-free
>> >>> pop all, but this one requires RCU read lock across the pop/pop/pop all
>> >>> operations and that memory reclaim of the nodes is only performed after
>> >>> a RCU grace-period has elapsed. This deals with ABA issues of concurrent
>> >>> push/pop you noticed without requiring mutexes protecting pop operations.)
>> >>>
>> >>> So it all boils down to which are the constraints of the push/pop
>> >>> callers.  Typically, I would expect that the "push" operation has the
>> >>> most strict real-time constraints (and is possibly executed the most
>> >>> often, thus would also benefit from xchg() which is typically slightly
>> >>> faster than cmpxchg()), which would argue in favor of a wait-free
>> >>> push/blocking pop.  But maybe I'm lacking understanding of what you are
>> >>> trying to do with this stack. Do you need to ever pop from a NMI
>> >>> handler ?
>> >>
>> >> In my user case, I don't need to pop in a NMI handler, just push.  But
>> >> we need to pop in a IRQ handler, so we can not use blocking pop.  Please
>> >> take a look at the user case patches listed later.
>> >
>> > Actually, I marked that as a "blocking" in our implementation because I
>> > have a busy-loop in there, and because my algorithm was implemented in
>> > user-space, I added an adaptative delay to make sure not to busy-loop
>> > waiting for a preempted thread to come back.
>> >
>> > But by disabling preemption and using a real busy-loop in the kernel,
>> > the pop can trivially be made non-blocking, and thus OK for interrupt
>> > handlers.
>>
>> OK.  I see.  Thanks for clarification.  Takes a look at your "wfs"
>> implementation.  It seems that we need to disable interrupt in wf push
>> side to make it possible to use pop in interrupt handler. Do you think so?
>
> It would be appropriate, yes, because the irq handler doing "pop" can
> busy-wait during the short window in which it sees a "NULL" value. We
> don't want a softirq running on top of an in-flight push, because that
> would leave the irq handler busy-waiting for longer than an interrupt
> handler should run. In your case, the push is done by an NMI handler,
> which already has interrupts disabled, so there is no worry for this
> specific use-case.

The semantics of irq_work do not require the push to be done in an NMI
handler; the push may also be called from process context.  And we are
working on library code, so I don't think it is a good idea to require
that the push run in a higher-priority context (such as NMI vs IRQ)
than the pop.

>> >>> Some ideas for API identifiers:
>> >>>
>> >>> struct llist_head -> slist_stack_head
>> >>> struct llist_node -> slist_stack_node
>> >>
>> >> Why call it a stack and a list?  Because it is a stack implemented with
>> >> single list?  I think it is better to name after usage instead of
>> >> implementation.
>> >>
>> >> The next question is whether it should be named as stack or list.  I
>> >> think from current user's point of view, they think they are using a
>> >> list instead of stack.  There are 3 users so far as follow.
>> >>
>> >> https://lkml.org/lkml/2011/1/17/14
>> >> https://lkml.org/lkml/2011/1/17/15
>> >> https://lkml.org/lkml/2011/2/21/16
>> >
>> > Hrm, I'm concerned about the impact of handing the irq work from one
>> > execution context to another in the reverse order (because a stack is
>> > implemented here). Can this have unwanted side-effects ?
>>
>> For irq work, this is not an issue.  irq work users should not depend on
>> the order of raising.  But if it is necessary, we can reverse the order
>> again in irq_work_run to keep the original order.  After deleting all
>> nodes from lock-less list, we can operate on them at will.
>
> Ouch. This seems counter-intuitive. I would prefer that we change the
> way this lock-less list operates to use a "queue" FIFO scheme rather
> than a "stack" (LIFO). That would be much more intuitive: the first
> entry added to the list is the first to be removed. And when iterating
> over multiple entries removed at once, rather than walking them in
> reverse order, we would go from the oldest to the newest entry, which
> again seems more natural.

I don't think "lifo" is a big issue here, although "fifo" may be more
natural.  After all, this is a very simple data structure with very
simple code.

>> >> And if we named this data structure as list, we can still use "queue"
>> >> for another data structure.  Do you think so?
>> >
>> > Well, the "delete first entry" is really unclear if we call all this a
>> > "list". Which is the first entry ? The first added to the list, or the
>> > last one ? The natural reflex would be to think it is the first added,
>> > but in this case, it's the opposite. This is why I think using stack or
>> > lifo (with push/pop) would be clearer than "list" (with add/del).
>>
>> The llist is ordered list and has only one list "head", so the first
>> entry is the one pointed to by the "head".  Is it clear?
>>
>> > So maybe "lockless stack" could be palatable ? The name "lockless lifo"
>> > came to my mind in the past days too.
>>
>> But what we need is just a lock-less list, not a stack.  It behaves like
>> a stack just because we can not implement other functions such as
>> llist_add_tail etc in a lock-less way yet.  And we will add these
>> functions when we know how to do that.
>
> Well, we do already know how to do that ;-) ;-) Please have a look at
> wfqueue.h and wfqueue-static.h in Userspace RCU library. In this
> implementation too, the "blocking" busy-wait can be replaced by an
> active busy-looping in pretty much the same way I explained for the
> stack implementation.

I found your wfqueue implementation, that is great!  I just have some
concerns about the performance.  It seems hard to implement
dequeue_all, and IRQs may need to be turned off on the enqueue side.

>> >>> * For your lock-free push/pop + wait-free pop_all implementation:
>> >>>
>> >>> llist_add -> slist_stack_push_lf        (lock-free)
>> >>> llist_del_first -> _slist_stack_pop     (needs mutex protection)
>> >>> llist_del_all -> slist_stack_pop_all_wf (wait-free)
>> >>
>> >> Do we really need to distinguish between lock-free and wait-free from
>> >> interface?
>> >
>> > I don't think it is needed, and it's probably even just more confusing
>> > for API users.
>> >
>> >> Will we implement both slist_stack_push_lf and
>> >> slist_stack_push_wf for one data structure?
>> >
>> > Those will possibly even require different data structures. It might be
>> > less confusing to find out clearly what our needs are, and only propose
>> > a single behavior, otherwise it will be confusing for users of these
>> > APIs without good reason.
>>
>> Yes.  I agree.  The semantics of my original lock-less list has already
>> been used by irq work and xlist of net rds.  My original implementation
>> can be seen as generalizing the existing implementation.  How about take
>> my original implementation as the first step, that is, generalizing,
>> then consider how to optimize it after that?
>>
>> But if you want to compare the two semantics above, I am OK too.
>
> In the spirit of not providing more APIs than needed, I would argue
> that it is better to settle the question of stack vs queue before this
> API gets into mainline.
>
> Unless you can show users that really need the list to behave like a
> stack, I would be in favor of moving to a queue implementation and
> providing "enqueue/dequeue" semantics, with list iteration order from
> oldest to newest node. Trying to change this afterward could bring us
> surprises, so it might be better to get it right now, before this gets
> introduced.
>
> Moreover, we're not talking about that many lines of code to port from
> the userspace implementation to kernel-space. It just needs very
> thorough testing.

The current users don't need stack semantics; they are simply USING
the existing stack semantics and implementation.  So I am NOT writing
a new data structure, I am just generalizing an existing data
structure, already used by more than one user, into the lib directory.
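
For reference, the pattern those users follow is roughly this (a
hypothetical sketch built only on the llist API from the patch above;
the my_work names are made up):

struct my_work {
        struct llist_node llnode;
        void (*func)(struct my_work *);
};

static LLIST_HEAD(pending_work);

/* Producer side: may run in NMI, IRQ or process context. */
static void my_work_queue(struct my_work *work)
{
        llist_add(&work->llnode, &pending_work);
}

/* Consumer side, e.g. an IRQ handler or IPI: single logical consumer. */
static void my_work_run(void)
{
        struct llist_node *first = llist_del_all(&pending_work);
        struct my_work *work;

        /* Entries come back newest-first (LIFO), per the stack semantics. */
        llist_for_each_entry(work, first, llnode)
                work->func(work);
}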

So I think the lock-less list data structure should be done in 2 steps.

1. Generalize the existing lock-less list data structure into lib.
Because exactly the same implementation is used, there should be no big
issue here.  And if some issue shows up in step 2, we can easily revert
to this point.

2. Change the implementation/semantics of the lock-less list data
structure and change all users; this can be based on your "queue"
implementation.  This involves performance trade-offs and semantic
changes, so it will need more testing and more review.

Do you agree?

>> >> mutex is needed between multiple "_slist_stack_pop", but not needed
>> >> between slist_stack_push_lf and _slist_stack_pop.  I think it is hard to
>> >> explain that clearly via function naming.
>> >
>> > Good point. A ascii-art table might be appropriate here, e.g.:
>> >
>> >
>> > L: Lock (mutual exclusion) needed to protect one from another.
>> > -: lockless.
>> >
>> >              |     push     |   pop    |   pop_all
>> > push         |      -       |    -     |     -
>> > pop          |              |    L     |     L
>> > pop_all      |              |          |     -
>> >
>> > How about adding this (or something prettier) to the header ?
>>
>> Cool!  I will add that to header.
>>
>> >>> * If we choose to go with an alternate wait-free push implementation:
>> >>>
>> >>> llist_add -> slist_stack_push_wf              (wait-free)
>> >>> llist_del_first -> slist_stack_pop_blocking   (blocking)
>> >>> llist_del_all -> slist_stack_pop_all_blocking (blocking)
>> >>
>> >> We need non-blocking pop, so maybe you need implement another data
>> >> structure which has these interface.  I think there can be multiple
>> >> lock-less data structure in kernel.
>> >
>> > As I noted earlier, the blocking was only due to our user-level
>> > implementation. It can be turned in a very short-lived busy loop
>> > instead.
>> >
>> >>
>> >>>> + *
>> >>>> + * If there are multiple producers and multiple consumers, llist_add
>> >>>> + * can be used in producers and llist_del_all can be used in
>> >>>> + * consumers.  They can work simultaneously without lock.  But
>> >>>> + * llist_del_first can not be used here.  Because llist_del_first
>> >>>> + * depends on list->first->next does not changed if list->first is not
>> >>>> + * changed during its operation, but llist_del_first, llist_add,
>> >>>> + * llist_add sequence in another consumer may violate that.
>> >>>
>> >>> You did not seem to define the locking rules when using both
>> >>>
>> >>>   llist_del_all
>> >>> and
>> >>>   llist_del_first
>> >>>
>> >>> in parallel. I expect that a mutex is needed, because a
>> >>>
>> >>>   llist_del_all, llist_add, llist_add
>> >>>
>> >>> in parallel with
>> >>>
>> >>>   llist_del_first
>> >>>
>> >>> could run into the same ABA problem as described above.
>> >>
>> >> OK.  I will add that.
>> >>
>> >>>> + *
>> >>>> + * If there are multiple producers and one consumer, llist_add can be
>> >>>> + * used in producers and llist_del_all or llist_del_first can be used
>> >>>> + * in the consumer.
>> >>>> + *
>> >>>> + * The list entries deleted via llist_del_all can be traversed with
>> >>>> + * traversing function such as llist_for_each etc.  But the list
>> >>>> + * entries can not be traversed safely before deleted from the list.
>> >>>
>> >>> Given that this is in fact a stack, specifying the traversal order of
>> >>> llist_for_each and friends would be appropriate.
>> >>
>> >> Ok.  I will add something like "traversing from head to tail" in the
>> >> comments.
>> >
>> > Traversing from last element pushed to first element pushed would
>> > probably be clearer.
>>
>> head and tail describe the current state, while "last/first element
>> pushed" describe the history.  I think both are understandable.
>
> head and tail, in a list implemented as a stack (but that is not
> clearly advertised as such), don't convey the meaning of "we're
> iterating from the newest element pushed to the oldest one". This
> counter-intuitiveness is part of why I would really like to see this
> turned into a queue.

OK.  I will change the comments to add an explanation of these
semantics.  The user should be warned :)

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: About lock-less data structure patches
  2011-04-05  1:16                     ` huang ying
@ 2011-04-05  4:42                       ` Mathieu Desnoyers
  2011-04-05 12:46                         ` huang ying
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Desnoyers @ 2011-04-05  4:42 UTC (permalink / raw)
  To: huang ying
  Cc: Huang Ying, Andrew Morton, Andi Kleen, lenb, Paul E. McKenney,
	linux-kernel, Linus Torvalds

* huang ying (huang.ying.caritas@gmail.com) wrote:
> On Mon, Apr 4, 2011 at 11:53 PM, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> > * Huang Ying (ying.huang@intel.com) wrote:
> >> On 04/02/2011 05:37 AM, Mathieu Desnoyers wrote:
> >> > * Huang Ying (ying.huang@intel.com) wrote:
> >> >> Hi, Mathieu,
> >> >>
> >> >> Thank you very much for your review.  Do you have time to take a look at
> >> >> the lock-less memory allocator as follow?
> >> >>
> >> >> https://lkml.org/lkml/2011/2/21/15
> >> >
> >> > I'll try to have a look in the following weeks.
> >>
> >> Thanks.
> >>
> >> > Answer to comments below,
> >> >
> >> >>
> >> >> On 03/30/2011 11:11 PM, Mathieu Desnoyers wrote:
> >> >>> * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> >> >>>> * Andrew Morton (akpm@linux-foundation.org) wrote:
> >> >>>>> On Wed, 30 Mar 2011 05:22:03 +0200 Andi Kleen <andi@firstfloor.org> wrote:
> >> >>>>>
> >> >>>>>> On Wed, Mar 30, 2011 at 09:30:43AM +0800, Huang Ying wrote:
> >> >>>>>>> On 03/30/2011 09:21 AM, Andrew Morton wrote:
> >> >>>>>>>> On Wed, 30 Mar 2011 09:14:45 +0800 Huang Ying <ying.huang@intel.com> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> Hi, Andrew and Len,
> >> >>>>>>>>>
> >> >>>>>>>>> In my original APEI patches for 2.6.39, the following 3 patches is about
> >> >>>>>>>>> lock-less data structure.
> >> >>>>>>>>>
> >> >>>>>>>>> [PATCH 1/7] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
> >> >>>>>>>>> [PATCH 2/7] lib, Add lock-less NULL terminated single list
> >> >>>>>>>>> [PATCH 6/7] lib, Make gen_pool memory allocator lockless
> >> >>>>>>>>>
> >> >>>>>>>>> Len think we need some non-Intel "Acked-by" or "Reviewed-by" for these
> >> >>>>>>>>> patches to go through the ACPI git tree.  Or they should go through
> >> >>>>>>>>> other tree, such as -mm tree.
> >> >>>>>>>>
> >> >>>>>>>> I just dropped a couple of your patches because include/linux/llist.h
> >> >>>>>>>> vanished from linux-next.   Did Len trip over the power cord?
> >> >>>>>>>
> >> >>>>>>> Len has dropped lock-less data structure patches from acpi git tree.  He
> >> >>>>>>> describe the reason in following mails.
> >> >>>>>>>
> >> >>>>>>> https://lkml.org/lkml/2011/3/2/501
> >> >>>>>>> https://lkml.org/lkml/2011/3/23/6
> >> >>>>>>
> >> >>>>>> Ok so we still need a lockless reviewer really and I don't count.
> >> >>>>>
> >> >>>>> Well I think you count ;) If this is some Intel thing then cluebeat,
> >> >>>>> cluebeat, cluebeat, overruled.
> >> >>>>>
> >> >>>>>> Copying Mathieu who did a lot of lockless stuff. Are you interested
> >> >>>>>> in reviewing Ying's patches?
> >> >>>>>
> >> >>>>> That would be great.
> >> >>>>
> >> >>>> Sure, I can have a look. Huang, can you resend those three patches
> >> >>>> adding me to CC list ? That will help me keep appropriate threading in
> >> >>>> my review. Adding Paul McKenney would also be appropriate.
> >> >>>
> >> >>> I know, I know, I said I would wait for a repost, but now the answer
> >> >>> burns my fingers. ;-) I'm replying to the patch found in
> >> >>> https://lkml.org/lkml/2011/2/21/13
> >> >>>
> >> >>>
> >> >>>> --- /dev/null
> >> >>>> +++ b/include/linux/llist.h
> >> >>>> @@ -0,0 +1,98 @@
> >> >>>> +#ifndef LLIST_H
> >> >>>> +#define LLIST_H
> >> >>>> +/*
> >> >>>> + * Lock-less NULL terminated single linked list
> >> >>>
> >> >>> Because this single-linked-list works like a stack (with "push"
> >> >>> operation for llist_add, "pop" operation for llist_del_first), I would
> >> >>> recommend to rename it accordingly (as a stack rather than "list"). If
> >> >>> we think about other possible users of this kind of lock-free list, such
> >> >>> as call_rcu(), a "queue" would be rather more appropriate than a "stack"
> >> >>> (with enqueue/dequeue operations). So at the very least I would like to
> >> >>> make sure this API keeps room for lock-free queue implementations that
> >> >>> won't be confused with this stack API. It would also be important to
> >> >>> figure out if what we really want is a stack or a queue. Some naming
> >> >>> ideas follow (maybe they are a bit verbose, comments are welcome).
> >> >>>
> >> >>> We should note that this list implements "lock-free" push and pop
> >> >>> operations (cmpxchg loops), and a "wait-free" "llist_del_all" operation
> >> >>> (using xchg) (only really true for architectures with "true" xchg
> >> >>> operation though, not those using LL/SC). We should think about the real
> >> >>> use-case requirements put on this lockless stack to decide which variant
> >> >>> is most appropriate. We can either have, with the implementation you
> >> >>> propose:
> >> >>>
> >> >>> - Lock-free push
> >> >>> - Pop protected by mutex
> >> >>> - Wait-free pop all
> >> >>>
> >> >>> Or, as an example of an alternative structure (as Paul and I implemented
> >> >>> in the userspace RCU library):
> >> >>>
> >> >>> - Wait-free push (stronger real-time guarantees provided by xchg())
> >> >>> - Blocking pop/pop all (use cmpxchg and busy-wait for very short time
> >> >>>   periods)
> >> >>>
> >> >>> (there are others, with e.g. lock-free push, lock-free pop, lock-free
> >> >>> pop all, but this one requires RCU read lock across the pop/pop/pop all
> >> >>> operations and that memory reclaim of the nodes is only performed after
> >> >>> a RCU grace-period has elapsed. This deals with ABA issues of concurrent
> >> >>> push/pop you noticed without requiring mutexes protecting pop operations.)
> >> >>>
> >> >>> So it all boils down to which are the constraints of the push/pop
> >> >>> callers.  Typically, I would expect that the "push" operation has the
> >> >>> most strict real-time constraints (and is possibly executed the most
> >> >>> often, thus would also benefit from xchg() which is typically slightly
> >> >>> faster than cmpxchg()), which would argue in favor of a wait-free
> >> >>> push/blocking pop.  But maybe I'm lacking understanding of what you are
> >> >>> trying to do with this stack. Do you need to ever pop from a NMI
> >> >>> handler ?
> >> >>
> >> >> In my user case, I don't need to pop in a NMI handler, just push.  But
> >> >> we need to pop in a IRQ handler, so we can not use blocking pop.  Please
> >> >> take a look at the user case patches listed later.
> >> >
> >> > Actually, I marked that as a "blocking" in our implementation because I
> >> > have a busy-loop in there, and because my algorithm was implemented in
> >> > user-space, I added an adaptative delay to make sure not to busy-loop
> >> > waiting for a preempted thread to come back.
> >> >
> >> > But by disabling preemption and using a real busy-loop in the kernel,
> >> > the pop can trivially be made non-blocking, and thus OK for interrupt
> >> > handlers.
> >>
> >> OK.  I see.  Thanks for clarification.  Takes a look at your "wfs"
> >> implementation.  It seems that we need to disable interrupt in wf push
> >> side to make it possible to use pop in interrupt handler. Do you think so?
> >
> > It would be appropriate, yes, because the irq handler doing "pop" can
> > busy-wait during the short window in which it sees a "NULL" value. We
> > don't want a softirq running on top of an in-flight push, because that
> > would leave the irq handler busy-waiting for longer than an interrupt
> > handler should run. In your case, the push is done by an NMI handler,
> > which already has interrupts disabled, so there is no worry for this
> > specific use-case.
> 
> The semantics of irq_work do not require the push to be done in an NMI
> handler; the push may also be called from process context.  And we are
> working on library code, so I don't think it is a good idea to require
> that the push run in a higher-priority context (such as NMI vs IRQ)
> than the pop.

OK. Anyway, the busy-looping on the pop side is an implementation
choice. We have the luxury of choosing between xchg on push with a
busy-loop on pop, or cmpxchg on push (as in your implementation), which
removes the need for the busy-loop on pop. Going for cmpxchg on push
might make more sense, because it would not require disabling
interrupts.

> 
> >> >>> Some ideas for API identifiers:
> >> >>>
> >> >>> struct llist_head -> slist_stack_head
> >> >>> struct llist_node -> slist_stack_node
> >> >>
> >> >> Why call it a stack and a list?  Because it is a stack implemented with
> >> >> single list?  I think it is better to name after usage instead of
> >> >> implementation.
> >> >>
> >> >> The next question is whether it should be named as stack or list.  I
> >> >> think from current user's point of view, they think they are using a
> >> >> list instead of stack.  There are 3 users so far as follow.
> >> >>
> >> >> https://lkml.org/lkml/2011/1/17/14
> >> >> https://lkml.org/lkml/2011/1/17/15
> >> >> https://lkml.org/lkml/2011/2/21/16
> >> >
> >> > Hrm, I'm concerned about the impact of handing the irq work from one
> >> > execution context to another in the reverse order (because a stack is
> >> > implemented here). Can this have unwanted side-effects ?
> >>
> >> For irq work, this is not an issue.  irq work users should not depend on
> >> the order of raising.  But if it is necessary, we can reverse the order
> >> again in irq_work_run to keep the original order.  After deleting all
> >> nodes from lock-less list, we can operate on them at will.
> >
> > Ouch. This seems counter-intuitive. I would prefer that we change the
> > way this lock-less list operates to use a "queue" FIFO scheme rather
> > than a "stack" (LIFO). That would be much more intuitive: the first
> > entry added to the list is the first to be removed. And when iterating
> > over multiple entries removed at once, rather than walking them in
> > reverse order, we would go from the oldest to the newest entry, which
> > again seems more natural.
> 
> I don't think "lifo" is a big issue here, although "fifo" may be more
> natural.  After all, this is a very simple data structure with very
> simple code.

Yessss, although I would never apply the "simple" qualifier to lockless
code. ;-)

> 
> >> >> And if we named this data structure as list, we can still use "queue"
> >> >> for another data structure.  Do you think so?
> >> >
> >> > Well, the "delete first entry" is really unclear if we call all this a
> >> > "list". Which is the first entry ? The first added to the list, or the
> >> > last one ? The natural reflex would be to think it is the first added,
> >> > but in this case, it's the opposite. This is why I think using stack or
> >> > lifo (with push/pop) would be clearer than "list" (with add/del).
> >>
> >> The llist is ordered list and has only one list "head", so the first
> >> entry is the one pointed to by the "head".  Is it clear?
> >>
> >> > So maybe "lockless stack" could be palatable ? The name "lockless lifo"
> >> > came to my mind in the past days too.
> >>
> >> But what we need is just a lock-less list, not a stack.  It behaves like
> >> a stack just because we can not implement other functions such as
> >> llist_add_tail etc in a lock-less way yet.  And we will add these
> >> functions when we know how to do that.
> >
> > Well, we do already know how to do that ;-) ;-) Please have a look at
> > wfqueue.h and wfqueue-static.h in Userspace RCU library. In this
> > implementation too, the "blocking" busy-wait can be replaced by an
> > active busy-looping in pretty much the same way I explained for the
> > stack implementation.
> 
> I found your wfqueue implementation, that is great!  I just have some
> concerns about the performance.

I guess you are referring to the need to disable interrupts. As I stated
above, we can choose to go for a cmpxchg-based push instead, which
removes the busy-loop from dequeue altogether.

> It seems hard to implement the
> dequeue_all,

Actually, there is already one implemented :)

See urcu-call-rcu.c: call_rcu_thread()

Please note that this "dequeue all" implementation does not support
concurrent "dequeue single node" operation (because we did not need it
in the specific call_rcu() use-case), but it could possibly be adapted
by handling the dummy node more carefully, although I'm not sure it's
worth the effort (unless the combined use of dequeue one and dequeue all
becomes an important use-case for a given list).

> and IRQ may need to be turned off in enqueue side.

Not if we use a cmpxchg-based enqueue.

> 
> >> >>> * For your lock-free push/pop + wait-free pop_all implementation:
> >> >>>
> >> >>> llist_add -> slist_stack_push_lf        (lock-free)
> >> >>> llist_del_first -> _slist_stack_pop     (needs mutex protection)
> >> >>> llist_del_all -> slist_stack_pop_all_wf (wait-free)
> >> >>
> >> >> Do we really need to distinguish between lock-free and wait-free from
> >> >> interface?
> >> >
> >> > I don't think it is needed, and it's probably even just more confusing
> >> > for API users.
> >> >
> >> >> Will we implement both slist_stack_push_lf and
> >> >> slist_stack_push_wf for one data structure?
> >> >
> >> > Those will possibly even require different data structures. It might be
> >> > less confusing to find out clearly what our needs are, and only propose
> >> > a single behavior, otherwise it will be confusing for users of these
> >> > APIs without good reason.
> >>
> >> Yes.  I agree.  The semantics of my original lock-less list has already
> >> been used by irq work and xlist of net rds.  My original implementation
> >> can be seen as generalizing the existing implementation.  How about take
> >> my original implementation as the first step, that is, generalizing,
> >> then consider how to optimize it after that?
> >>
> >> But if you want to compare the two semantics above, I am OK too.
> >
> > In the spirit of not providing more APIs than needed, I would argue that
> > it would be better to settle on the question of stack vs queue before
> > this API gets into mainline.
> >
> > Unless we can show that you have users that need your list to behave
> > like a stack, I would be in favor of moving it to a queue
> > implementation and provide the semantic of "enqueue/dequeue", with list
> > iteration order from oldest to newest node. Trying to change this
> > afterward could bring us surprises, so it might be better to do it right
> > now before this gets introduced.
> >
> > Moreover, we're not talking about that many lines of code to port from
> > the userspace implementation to kernel-space. It just needs very
> > thorough testing.
> 
> The current users don't need a stack semantics.  They are USING the
> stack semantics and implementation.  So I am NOT writing a new data
> structure, I just generalize some existing data structure used by more
> than one user into lib directory.

I see. That's the part of the big picture I was missing.

> 
> So I think the lock-less list data structure should be done in 2 steps.
> 
> 1. Generalize the existing lock-less list data structure into lib.
> Because exactly same implementation is used, there will be no big
> issue here.  And if there are some issue in 2, we can revert to here
> easily.
> 
> 2. Change the implementation/semantics of lock-less list data
> structure and change all users; this can be based on your "queue"
> implementation.  This involves performance compromise and semantics
> changing, so will need more testing and more review.
> 
> Do you agree?

Yep. What I was missing is that you took existing implementations and
merged them together and did not create your own implementation from
scratch. Considering that there are already users of these
implementations, it indeed makes sense to go through this intermediate
step.
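
For readers following along, the pattern shared by those existing users is
roughly the following (names are purely illustrative; this is not taken
from the actual irq_work or xlist code):

        /* producers, possibly in NMI/IRQ/process context */
        llist_add(&item->llnode, &pending_list);

        /* single consumer, e.g. a work/softirq handler */
        struct llist_node *node, *next;

        node = llist_del_all(&pending_list);
        for (; node; node = next) {
                next = node->next;
                process_item(container_of(node, struct my_item, llnode));
        }

That is, many lock-free adds feeding one consumer that grabs the whole
list at once and then walks it, in newest-to-oldest order given the
current stack behaviour.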

> 
> >> >> mutex is needed between multiple "_slist_stack_pop", but not needed
> >> >> between slist_stack_push_lf and _slist_stack_pop.  I think it is hard to
> >> >> explain that clearly via function naming.
> >> >
> >> > Good point. A ascii-art table might be appropriate here, e.g.:
> >> >
> >> >
> >> > L: Lock (mutual exclusion) needed to protect one from another.
> >> > -: lockless.
> >> >
> >> >              |     push     |   pop    |   pop_all
> >> > push         |      -       |    -     |     -
> >> > pop          |              |    L     |     L
> >> > pop_all      |              |          |     -
> >> >
> >> > How about adding this (or something prettier) to the header ?
> >>
> >> Cool!  I will add that to header.
> >>
> >> >>> * If we choose to go with an alternate wait-free push implementation:
> >> >>>
> >> >>> llist_add -> slist_stack_push_wf              (wait-free)
> >> >>> llist_del_first -> slist_stack_pop_blocking   (blocking)
> >> >>> llist_del_all -> slist_stack_pop_all_blocking (blocking)
> >> >>
> >> >> We need non-blocking pop, so maybe you need implement another data
> >> >> structure which has these interface.  I think there can be multiple
> >> >> lock-less data structure in kernel.
> >> >
> >> > As I noted earlier, the blocking was only due to our user-level
> >> > implementation. It can be turned in a very short-lived busy loop
> >> > instead.
> >> >
> >> >>
> >> >>>> + *
> >> >>>> + * If there are multiple producers and multiple consumers, llist_add
> >> >>>> + * can be used in producers and llist_del_all can be used in
> >> >>>> + * consumers.  They can work simultaneously without lock.  But
> >> >>>> + * llist_del_first can not be used here.  Because llist_del_first
> >> >>>> + * depends on list->first->next does not changed if list->first is not
> >> >>>> + * changed during its operation, but llist_del_first, llist_add,
> >> >>>> + * llist_add sequence in another consumer may violate that.
> >> >>>
> >> >>> You did not seem to define the locking rules when using both
> >> >>>
> >> >>>   llist_del_all
> >> >>> and
> >> >>>   llist_del_first
> >> >>>
> >> >>> in parallel. I expect that a mutex is needed, because a
> >> >>>
> >> >>>   llist_del_all, llist_add, llist_add
> >> >>>
> >> >>> in parallel with
> >> >>>
> >> >>>   llist_del_first
> >> >>>
> >> >>> could run into the same ABA problem as described above.
> >> >>
> >> >> OK.  I will add that.
> >> >>
> >> >>>> + *
> >> >>>> + * If there are multiple producers and one consumer, llist_add can be
> >> >>>> + * used in producers and llist_del_all or llist_del_first can be used
> >> >>>> + * in the consumer.
> >> >>>> + *
> >> >>>> + * The list entries deleted via llist_del_all can be traversed with
> >> >>>> + * traversing function such as llist_for_each etc.  But the list
> >> >>>> + * entries can not be traversed safely before deleted from the list.
> >> >>>
> >> >>> Given that this is in fact a stack, specifying the traversal order of
> >> >>> llist_for_each and friends would be appropriate.
> >> >>
> >> >> Ok.  I will add something like "traversing from head to tail" in the
> >> >> comments.
> >> >
> >> > Traversing from last element pushed to first element pushed would
> >> > probably be clearer.
> >>
> >> head and tail describe the current state, while "last/first element
> >> pushed" describe the history.  I think both are understandable.
> >
> > head and tail, in a list implemented as a stack (but that is not
> > clearly advertised as such), don't convey the meaning of "we're
> > iterating from the newest element pushed to the oldest one". This
> > counter-intuitiveness is part of why I would really like to see this
> > turned into a queue.
> 
> OK.  I will change the comments, adding these semantics explanation.
> The user should be warned :)

Yes, that makes sense. After this generalization step, if you're ok with
this, we could aim at moving the implementation from a stack to a queue
and provide fifo semantic rather than lifo, so that other users (e.g.
call_rcu in the kernel) can start benefiting from it.
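
To give an idea of the shape this could take, the enqueue side of a fifo
variant (very loosely adapted from the userspace wfqueue, untested, names
made up; not a proposal yet) could look like:

        struct lqueue {
                struct llist_node *head;
                struct llist_node **tail;
                struct llist_node dummy;        /* keeps the queue non-empty */
        };

        static inline void lqueue_init(struct lqueue *q)
        {
                q->dummy.next = NULL;
                q->head = &q->dummy;
                q->tail = &q->dummy.next;
        }

        static inline void lqueue_enqueue(struct lqueue *q,
                                          struct llist_node *node)
        {
                struct llist_node **old_tail;

                node->next = NULL;
                old_tail = xchg(&q->tail, &node->next); /* wait-free publish */
                *old_tail = node;       /* link in; a dequeuer may spin briefly here */
        }

The dequeue side then walks from head, treating a NULL next pointer on a
node that is not the tail as "the enqueuer has not finished linking yet",
which is the short busy-wait discussed above; alternatively the enqueue
can be turned into a cmpxchg loop to remove that window entirely.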

Thanks!

Mathieu

> 
> Best Regards,
> Huang Ying

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: About lock-less data structure patches
  2011-04-05  4:42                       ` Mathieu Desnoyers
@ 2011-04-05 12:46                         ` huang ying
  2011-04-06  1:48                           ` Mathieu Desnoyers
  0 siblings, 1 reply; 11+ messages in thread
From: huang ying @ 2011-04-05 12:46 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Huang Ying, Andrew Morton, Andi Kleen, lenb, Paul E. McKenney,
	linux-kernel, Linus Torvalds

On Tue, Apr 5, 2011 at 12:42 PM, Mathieu Desnoyers
[snip]
>> >> >>> Because this single-linked-list works like a stack (with "push"
>> >> >>> operation for llist_add, "pop" operation for llist_del_first), I would
>> >> >>> recommend to rename it accordingly (as a stack rather than "list"). If
>> >> >>> we think about other possible users of this kind of lock-free list, such
>> >> >>> as call_rcu(), a "queue" would be rather more appropriate than a "stack"
>> >> >>> (with enqueue/dequeue operations). So at the very least I would like to
>> >> >>> make sure this API keeps room for lock-free queue implementations that
>> >> >>> won't be confused with this stack API. It would also be important to
>> >> >>> figure out if what we really want is a stack or a queue. Some naming
>> >> >>> ideas follow (maybe they are a bit verbose, comments are welcome).
>> >> >>>
>> >> >>> We should note that this list implements "lock-free" push and pop
>> >> >>> operations (cmpxchg loops), and a "wait-free" "llist_del_all" operation
>> >> >>> (using xchg) (only really true for architectures with "true" xchg
>> >> >>> operation though, not those using LL/SC). We should think about the real
>> >> >>> use-case requirements put on this lockless stack to decide which variant
>> >> >>> is most appropriate. We can either have, with the implementation you
>> >> >>> propose:
>> >> >>>
>> >> >>> - Lock-free push
>> >> >>> - Pop protected by mutex
>> >> >>> - Wait-free pop all
>> >> >>>
>> >> >>> Or, as an example of an alternative structure (as Paul and I implemented
>> >> >>> in the userspace RCU library):
>> >> >>>
>> >> >>> - Wait-free push (stronger real-time guarantees provided by xchg())
>> >> >>> - Blocking pop/pop all (use cmpxchg and busy-wait for very short time
>> >> >>>   periods)
>> >> >>>
>> >> >>> (there are others, with e.g. lock-free push, lock-free pop, lock-free
>> >> >>> pop all, but this one requires RCU read lock across the pop/pop/pop all
>> >> >>> operations and that memory reclaim of the nodes is only performed after
>> >> >>> a RCU grace-period has elapsed. This deals with ABA issues of concurrent
>> >> >>> push/pop you noticed without requiring mutexes protecting pop operations.)
>> >> >>>
>> >> >>> So it all boils down to which are the constraints of the push/pop
>> >> >>> callers.  Typically, I would expect that the "push" operation has the
>> >> >>> most strict real-time constraints (and is possibly executed the most
>> >> >>> often, thus would also benefit from xchg() which is typically slightly
>> >> >>> faster than cmpxchg()), which would argue in favor of a wait-free
>> >> >>> push/blocking pop.  But maybe I'm lacking understanding of what you are
>> >> >>> trying to do with this stack. Do you need to ever pop from a NMI
>> >> >>> handler ?
>> >> >>
>> >> >> In my user case, I don't need to pop in a NMI handler, just push.  But
>> >> >> we need to pop in a IRQ handler, so we can not use blocking pop.  Please
>> >> >> take a look at the user case patches listed later.
>> >> >
>> >> > Actually, I marked that as a "blocking" in our implementation because I
>> >> > have a busy-loop in there, and because my algorithm was implemented in
>> >> > user-space, I added an adaptative delay to make sure not to busy-loop
>> >> > waiting for a preempted thread to come back.
>> >> >
>> >> > But by disabling preemption and using a real busy-loop in the kernel,
>> >> > the pop can trivially be made non-blocking, and thus OK for interrupt
>> >> > handlers.
>> >>
>> OK.  I see.  Thanks for clarification.  I took a look at your "wfs"
>> implementation.  It seems that we need to disable interrupts on the wf push
>> side to make it possible to use pop in an interrupt handler. Do you think so?
>> >
>> > It would be appropriate, yes, because the irq handler doing "pop" can
>> > busy-wait during the short period of time during which it sees a "NULL"
>> > value. We don't want to have a softirq running on top of the push that
>> > would make the irq handler busy-wait for longer than the duration of
>> > interrupt handlers. In your case, the push is done by a NMI handler,
>> > which already has interrupts disabled, so no worry for this specific
>> > use-case.
>>
>> The semantics of irq_work doesn't restrict the push must be in NMI
>> handler, it is possible for the push to be called in process context.
>> And we are working on library code, so I don't think it is a good idea
>> to restrict that the priority of context of push must be higher (such
>> as NMI vs IRQ) than that of pop.
>
> OK. Anyway, the busy-looping on the pop side is an implementation
> choice. We have the luxury to choose between either xchg on
> push/busy-loop on pop or a cmpxchg on push (as in your implementation),
> which removes the need for busy-loop on pop. Maybe going for a cmpxchg
> on push might make more sense, because it would not require to disable
> interrupts.

Yes.  It is a luxury to have two choices. :-)
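
If I understand correctly, the wait-free variant of the push would look
roughly like this (my own sketch adapted from the wfstack idea, not
tested; a real version needs an "end of stack" marker distinct from NULL,
because NULL is also what a popper sees while the second store below is
still pending):

        old = xchg(&head->first, new); /* new top becomes visible immediately */
        new->next = old;               /* a concurrent pop busy-waits for this store */

versus the cmpxchg loop in the current patch, where the node is fully
linked before it is published and the pop side never has to wait.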

>> >> >>> Some ideas for API identifiers:
>> >> >>>
>> >> >>> struct llist_head -> slist_stack_head
>> >> >>> struct llist_node -> slist_stack_node
>> >> >>
>> >> >> Why call it a stack and a list?  Because it is a stack implemented with
>> >> >> single list?  I think it is better to name after usage instead of
>> >> >> implementation.
>> >> >>
>> >> >> The next question is whether it should be named as stack or list.  I
>> >> >> think from current user's point of view, they think they are using a
>> >> >> list instead of stack.  There are 3 users so far as follow.
>> >> >>
>> >> >> https://lkml.org/lkml/2011/1/17/14
>> >> >> https://lkml.org/lkml/2011/1/17/15
>> >> >> https://lkml.org/lkml/2011/2/21/16
>> >> >
>> >> > Hrm, I'm concerned about the impact of handing the irq work from one
>> >> > execution context to another in the reverse order (because a stack is
>> >> > implemented here). Can this have unwanted side-effects ?
>> >>
>> >> For irq work, this is not an issue.  irq work users should not depend on
>> >> the order of raising.  But if it is necessary, we can reverse the order
>> >> again in irq_work_run to keep the original order.  After deleting all
>> >> nodes from lock-less list, we can operate on them at will.
>> >
>> > Ouch. This seems counter-intuitive. I'd rather prefer if we can change
>> > the way this lock-less list operates to use a "queue" fifo scheme rather
>> > than "stack" (lifo). This would seem much more intuitive: the first
>> > entry added to the list is the first to be removed. And for iteration on
>> > multiple entries removed at once, rather than iterating in reverse
>> > order, it would iterate from the oldest to the newest entry, which again
>> > seems more natural.
>>
>> I don't think "lifo" is a big issue here, although "fifo" may be more
>> natural.  After all, this is a very simple data structure with very
>> simple code.
>
> Yessss, although I would never apply the "simple" qualifier to lockless
> code. ;-)
>
>>
>> >> >> And if we named this data structure as list, we can still use "queue"
>> >> >> for another data structure.  Do you think so?
>> >> >
>> >> > Well, the "delete first entry" is really unclear if we call all this a
>> >> > "list". Which is the first entry ? The first added to the list, or the
>> >> > last one ? The natural reflex would be to think it is the first added,
>> >> > but in this case, it's the opposite. This is why I think using stack or
>> >> > lifo (with push/pop) would be clearer than "list" (with add/del).
>> >>
>> >> The llist is ordered list and has only one list "head", so the first
>> >> entry is the one pointed to by the "head".  Is it clear?
>> >>
>> >> > So maybe "lockless stack" could be palatable ? The name "lockless lifo"
>> >> > came to my mind in the past days too.
>> >>
>> >> But what we need is just a lock-less list, not a stack.  It behaves like
>> >> a stack just because we can not implement other functions such as
>> >> llist_add_tail etc in a lock-less way yet.  And we will add these
>> >> functions when we know how to do that.
>> >
>> > Well, we do already know how to do that ;-) ;-) Please have a look at
>> > wfqueue.h and wfqueue-static.h in Userspace RCU library. In this
>> > implementation too, the "blocking" busy-wait can be replaced by an
>> > active busy-looping in pretty much the same way I explained for the
>> > stack implementation.
>>
>> Found your wfqueue implementation, that is great!  I just have some
>> concern about the performance.
>
> I guess you are referring to the need to disable interrupts. As I stated
> above, we can choose to go for a cmpxchg-based push instead, which
> removes the busy-loop from dequeue altogether.

That is great!

>> It seems hard to implement the
>> dequeue_all,
>
> Actually, there is already one implemented :)
>
> See urcu-call-rcu.c: call_rcu_thread()

I have not understood the whole algorithm fully and will continue to study
it.  I found there is a synchronize_rcu() in call_rcu_thread().  Must
"dequeue_all" use that too?

> Please note that this "dequeue all" implementation does not support
> concurrent "dequeue single node" operation (because we did not need it
> in the specific call_rcu() use-case), but it could possibly be adapted
> by handling the dummy node more carefully, although I'm not sure it's
> worth the effort (unless the combined use of dequeue one and dequeue all
> becomes an important use-case for a given list).
>
>> and IRQ may need to be turned off in enqueue side.
>
> Not if we use a cmpxchg-based enqueue.
>
>>
>> >> >>> * For your lock-free push/pop + wait-free pop_all implementation:
>> >> >>>
>> >> >>> llist_add -> slist_stack_push_lf        (lock-free)
>> >> >>> llist_del_first -> _slist_stack_pop     (needs mutex protection)
>> >> >>> llist_del_all -> slist_stack_pop_all_wf (wait-free)
>> >> >>
>> >> >> Do we really need to distinguish between lock-free and wait-free from
>> >> >> interface?
>> >> >
>> >> > I don't think it is needed, and it's probably even just more confusing
>> >> > for API users.
>> >> >
>> >> >> Will we implement both slist_stack_push_lf and
>> >> >> slist_stack_push_wf for one data structure?
>> >> >
>> >> > Those will possibly even require different data structures. It might be
>> >> > less confusing to find out clearly what our needs are, and only propose
>> >> > a single behavior, otherwise it will be confusing for users of these
>> >> > APIs without good reason.
>> >>
>> >> Yes.  I agree.  The semantics of my original lock-less list has already
>> >> been used by irq work and xlist of net rds.  My original implementation
>> >> can be seen as generalizing the existing implementation.  How about take
>> >> my original implementation as the first step, that is, generalizing,
>> >> then consider how to optimize it after that?
>> >>
>> >> But if you want to compare the two semantics above, I am OK too.
>> >
>> > In the spirit of not providing more APIs than needed, I would argue that
>> > it would be better to settle on the question of stack vs queue before
>> > this API gets into mainline.
>> >
>> > Unless we can show that you have users that need your list to behave
>> > like a stack, I would be in favor of moving it to a queue
>> > implementation and provide the semantic of "enqueue/dequeue", with list
>> > iteration order from oldest to newest node. Trying to change this
>> > afterward could bring us surprises, so it might be better to do it right
>> > now before this gets introduced.
>> >
>> > Moreover, we're not talking about that many lines of code to port from
>> > the userspace implementation to kernel-space. It just needs very
>> > thorough testing.
>>
>> The current users don't need a stack semantics.  They are USING the
>> stack semantics and implementation.  So I am NOT writing a new data
>> structure, I just generalize some existing data structure used by more
>> than one user into lib directory.
>
> I see. That's the part of the big picture I was missing.
>
>>
>> So I think the lock-less list data structure should be done in 2 steps.
>>
>> 1. Generalize the existing lock-less list data structure into lib.
>> Because exactly same implementation is used, there will be no big
>> issue here.  And if there are some issue in 2, we can revert to here
>> easily.
>>
>> 2. Change the implementation/semantics of lock-less list data
>> structure and change all users; this can be based on your "queue"
>> implementation.  This involves performance compromise and semantics
>> changing, so will need more testing and more review.
>>
>> Do you agree?
>
> Yep. What I was missing is that you took existing implementations and
> merged them together and did not create your own implementation from
> scratch. Considering that there are already users of these
> implementations, it indeed makes sense to go through this intermediate
> step.

Thanks.

>> >> >> mutex is needed between multiple "_slist_stack_pop", but not needed
>> >> >> between slist_stack_push_lf and _slist_stack_pop.  I think it is hard to
>> >> >> explain that clearly via function naming.
>> >> >
>> >> > Good point. A ascii-art table might be appropriate here, e.g.:
>> >> >
>> >> >
>> >> > L: Lock (mutual exclusion) needed to protect one from another.
>> >> > -: lockless.
>> >> >
>> >> >              |     push     |   pop    |   pop_all
>> >> > push         |      -       |    -     |     -
>> >> > pop          |              |    L     |     L
>> >> > pop_all      |              |          |     -
>> >> >
>> >> > How about adding this (or something prettier) to the header ?
>> >>
>> >> Cool!  I will add that to header.
>> >>
>> >> >>> * If we choose to go with an alternate wait-free push implementation:
>> >> >>>
>> >> >>> llist_add -> slist_stack_push_wf              (wait-free)
>> >> >>> llist_del_first -> slist_stack_pop_blocking   (blocking)
>> >> >>> llist_del_all -> slist_stack_pop_all_blocking (blocking)
>> >> >>
>> >> >> We need non-blocking pop, so maybe you need implement another data
>> >> >> structure which has these interface.  I think there can be multiple
>> >> >> lock-less data structure in kernel.
>> >> >
>> >> > As I noted earlier, the blocking was only due to our user-level
>> >> > implementation. It can be turned in a very short-lived busy loop
>> >> > instead.
>> >> >
>> >> >>
>> >> >>>> + *
>> >> >>>> + * If there are multiple producers and multiple consumers, llist_add
>> >> >>>> + * can be used in producers and llist_del_all can be used in
>> >> >>>> + * consumers.  They can work simultaneously without lock.  But
>> >> >>>> + * llist_del_first can not be used here.  Because llist_del_first
>> >> >>>> + * depends on list->first->next does not changed if list->first is not
>> >> >>>> + * changed during its operation, but llist_del_first, llist_add,
>> >> >>>> + * llist_add sequence in another consumer may violate that.
>> >> >>>
>> >> >>> You did not seem to define the locking rules when using both
>> >> >>>
>> >> >>>   llist_del_all
>> >> >>> and
>> >> >>>   llist_del_first
>> >> >>>
>> >> >>> in parallel. I expect that a mutex is needed, because a
>> >> >>>
>> >> >>>   llist_del_all, llist_add, llist_add
>> >> >>>
>> >> >>> in parallel with
>> >> >>>
>> >> >>>   llist_del_first
>> >> >>>
>> >> >>> could run into the same ABA problem as described above.
>> >> >>
>> >> >> OK.  I will add that.
>> >> >>
>> >> >>>> + *
>> >> >>>> + * If there are multiple producers and one consumer, llist_add can be
>> >> >>>> + * used in producers and llist_del_all or llist_del_first can be used
>> >> >>>> + * in the consumer.
>> >> >>>> + *
>> >> >>>> + * The list entries deleted via llist_del_all can be traversed with
>> >> >>>> + * traversing function such as llist_for_each etc.  But the list
>> >> >>>> + * entries can not be traversed safely before deleted from the list.
>> >> >>>
>> >> >>> Given that this is in fact a stack, specifying the traversal order of
>> >> >>> llist_for_each and friends would be appropriate.
>> >> >>
>> >> >> Ok.  I will add something like "traversing from head to tail" in the
>> >> >> comments.
>> >> >
>> >> > Traversing from last element pushed to first element pushed would
>> >> > probably be clearer.
>> >>
>> >> head and tail describe the current state, while "last/first element
>> >> pushed" describe the history.  I think both are understandable.
>> >
>> > head and tail, in a list implemented as a stack (but that is not
>> > clearly advertised as such), don't convey the meaning of "we're
>> > iterating from the newest element pushed to the oldest one". This
>> > counter-intuitiveness is part of why I would really like to see this
>> > turned into a queue.
>>
>> OK.  I will change the comments, adding these semantics explanation.
>> The user should be warned :)
>
> Yes, that makes sense. After this generalization step, if you're ok with
> this, we could aim at moving the implementation from a stack to a queue
> and provide fifo semantic rather than lifo, so that other users (e.g.
> call_rcu in the kernel) can start benefiting from it.

I think that is good to move from stack to queue.

I will send out changed lock-less data structure patchset soon.  And
we can continue to work on the new lock-less queue at the same time.

Best Regards,
Huang Ying

* Re: About lock-less data structure patches
  2011-04-05 12:46                         ` huang ying
@ 2011-04-06  1:48                           ` Mathieu Desnoyers
  2011-04-07  2:14                             ` Huang Ying
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Desnoyers @ 2011-04-06  1:48 UTC (permalink / raw)
  To: huang ying
  Cc: Huang Ying, Andrew Morton, Andi Kleen, lenb, Paul E. McKenney,
	linux-kernel, Linus Torvalds

* huang ying (huang.ying.caritas@gmail.com) wrote:
> On Tue, Apr 5, 2011 at 12:42 PM, Mathieu Desnoyers
[snip] 
> 
> >> It seems hard to implement the
> >> dequeue_all,
> >
> > Actually, there is already one implemented :)
> >
> > See urcu-call-rcu.c: call_rcu_thread()
> 
> I have not understood the whole algorithm fully and will continue to study
> it.  I found there is a synchronize_rcu() in call_rcu_thread().  Must
> "dequeue_all" use that too?

Please note that call_rcu_thread() does more than just a dequeue_all: it
dequeues all the current callbacks from the list, waits for a grace
period to elapse, and then executes all the callbacks it got. So the
synchronize_rcu() would not be needed in a dequeue_all implementation.
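
In other words, the shape of that thread is roughly (pseudo-sketch;
dequeue_all() and do_callback() are stand-ins, not the actual urcu
function names):

        struct llist_node *node, *next;

        node = dequeue_all(&q);  /* the only part that is a list operation */
        synchronize_rcu();       /* call_rcu-specific: wait for a grace period */
        for (; node; node = next) {
                next = node->next;
                do_callback(node);      /* run the callbacks we collected */
        }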

> 
> >> >> >> mutex is needed between multiple "_slist_stack_pop", but not needed
> >> >> >> between slist_stack_push_lf and _slist_stack_pop.  I think it is hard to
> >> >> >> explain that clearly via function naming.
> >> >> >
> >> >> > Good point. A ascii-art table might be appropriate here, e.g.:
> >> >> >
> >> >> >
> >> >> > L: Lock (mutual exclusion) needed to protect one from another.
> >> >> > -: lockless.
> >> >> >
> >> >> >              |     push     |   pop    |   pop_all
> >> >> > push         |      -       |    -     |     -
> >> >> > pop          |              |    L     |     L
> >> >> > pop_all      |              |          |     -
> >> >> >
> >> >> > How about adding this (or something prettier) to the header ?
> >> >>
> >> >> Cool!  I will add that to header.
> >> >>
> >> >> >>> * If we choose to go with an alternate wait-free push implementation:
> >> >> >>>
> >> >> >>> llist_add -> slist_stack_push_wf              (wait-free)
> >> >> >>> llist_del_first -> slist_stack_pop_blocking   (blocking)
> >> >> >>> llist_del_all -> slist_stack_pop_all_blocking (blocking)
> >> >> >>
> >> >> >> We need non-blocking pop, so maybe you need implement another data
> >> >> >> structure which has these interface.  I think there can be multiple
> >> >> >> lock-less data structure in kernel.
> >> >> >
> >> >> > As I noted earlier, the blocking was only due to our user-level
> >> >> > implementation. It can be turned in a very short-lived busy loop
> >> >> > instead.
> >> >> >
> >> >> >>
> >> >> >>>> + *
> >> >> >>>> + * If there are multiple producers and multiple consumers, llist_add
> >> >> >>>> + * can be used in producers and llist_del_all can be used in
> >> >> >>>> + * consumers.  They can work simultaneously without lock.  But
> >> >> >>>> + * llist_del_first can not be used here.  Because llist_del_first
> >> >> >>>> + * depends on list->first->next does not changed if list->first is not
> >> >> >>>> + * changed during its operation, but llist_del_first, llist_add,
> >> >> >>>> + * llist_add sequence in another consumer may violate that.
> >> >> >>>
> >> >> >>> You did not seem to define the locking rules when using both
> >> >> >>>
> >> >> >>>   llist_del_all
> >> >> >>> and
> >> >> >>>   llist_del_first
> >> >> >>>
> >> >> >>> in parallel. I expect that a mutex is needed, because a
> >> >> >>>
> >> >> >>>   llist_del_all, llist_add, llist_add
> >> >> >>>
> >> >> >>> in parallel with
> >> >> >>>
> >> >> >>>   llist_del_first
> >> >> >>>
> >> >> >>> could run into the same ABA problem as described above.
> >> >> >>
> >> >> >> OK.  I will add that.
> >> >> >>
> >> >> >>>> + *
> >> >> >>>> + * If there are multiple producers and one consumer, llist_add can be
> >> >> >>>> + * used in producers and llist_del_all or llist_del_first can be used
> >> >> >>>> + * in the consumer.
> >> >> >>>> + *
> >> >> >>>> + * The list entries deleted via llist_del_all can be traversed with
> >> >> >>>> + * traversing function such as llist_for_each etc.  But the list
> >> >> >>>> + * entries can not be traversed safely before deleted from the list.
> >> >> >>>
> >> >> >>> Given that this is in fact a stack, specifying the traversal order of
> >> >> >>> llist_for_each and friends would be appropriate.
> >> >> >>
> >> >> >> Ok.  I will add something like "traversing from head to tail" in the
> >> >> >> comments.
> >> >> >
> >> >> > Traversing from last element pushed to first element pushed would
> >> >> > probably be clearer.
> >> >>
> >> >> head and tail describe the current state, while "last/first element
> >> >> pushed" describe the history.  I think both are understandable.
> >> >
> >> > head and tail, in a list implemented as a stack (but that is not
> >> > clearly advertised as such), don't convey the meaning of "we're
> >> > iterating from the newest element pushed to the oldest one". This
> >> > counter-intuitiveness is part of why I would really like to see this
> >> > turned into a queue.
> >>
> >> OK.  I will change the comments, adding these semantics explanation.
> >> The user should be warned :)
> >
> > Yes, that makes sense. After this generalization step, if you're ok with
> > this, we could aim at moving the implementation from a stack to a queue
> > and provide fifo semantic rather than lifo, so that other users (e.g.
> > call_rcu in the kernel) can start benefiting from it.
> 
> I think that is good to move from stack to queue.
> 
> I will send out changed lock-less data structure patchset soon.  And
> we can continue to work on the new lock-less queue at the same time.

Sounds like a very good plan! Thanks!

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: About lock-less data structure patches
  2011-04-06  1:48                           ` Mathieu Desnoyers
@ 2011-04-07  2:14                             ` Huang Ying
  2011-04-07 18:32                               ` Mathieu Desnoyers
  0 siblings, 1 reply; 11+ messages in thread
From: Huang Ying @ 2011-04-07  2:14 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: huang ying, Andrew Morton, Andi Kleen, lenb, Paul E. McKenney,
	linux-kernel, Linus Torvalds

On 04/06/2011 09:48 AM, Mathieu Desnoyers wrote:
> * huang ying (huang.ying.caritas@gmail.com) wrote:
[snip]
>>>>
>>>> OK.  I will change the comments, adding these semantics explanation.
>>>> The user should be warned :)
>>>
>>> Yes, that makes sense. After this generalization step, if you're ok with
>>> this, we could aim at moving the implementation from a stack to a queue
>>> and provide fifo semantic rather than lifo, so that other users (e.g.
>>> call_rcu in the kernel) can start benefiting from it.
>>
>> I think that is good to move from stack to queue.
>>
>> I will send out changed lock-less data structure patchset soon.  And
>> we can continue to work on the new lock-less queue at the same time.
> 
> Sounds like a very good plan! Thanks!

Maybe you can send out your lock-less queue patches, so we can work on that.

Best Regards,
Huang Ying

* Re: About lock-less data structure patches
  2011-04-07  2:14                             ` Huang Ying
@ 2011-04-07 18:32                               ` Mathieu Desnoyers
  2011-04-07 22:05                                 ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Desnoyers @ 2011-04-07 18:32 UTC (permalink / raw)
  To: Huang Ying
  Cc: huang ying, Andrew Morton, Andi Kleen, lenb, Paul E. McKenney,
	linux-kernel, Linus Torvalds

* Huang Ying (ying.huang@intel.com) wrote:
> On 04/06/2011 09:48 AM, Mathieu Desnoyers wrote:
> > * huang ying (huang.ying.caritas@gmail.com) wrote:
> [snip]
> >>>>
> >>>> OK.  I will change the comments, adding these semantics explanation.
> >>>> The user should be warned :)
> >>>
> >>> Yes, that makes sense. After this generalization step, if you're ok with
> >>> this, we could aim at moving the implementation from a stack to a queue
> >>> and provide fifo semantic rather than lifo, so that other users (e.g.
> >>> call_rcu in the kernel) can start benefiting from it.
> >>
> >> I think that is good to move from stack to queue.
> >>
> >> I will send out changed lock-less data structure patchset soon.  And
> >> we can continue to work on the new lock-less queue at the same time.
> > 
> > Sounds like a very good plan! Thanks!
> 
> Maybe you can send out your lock-less queue patches, so we can work on that.

Yep, let's wait until your implementation is finalized and merged, and
then ping me again so I can cook up an RFC patch turning llist into a
queue, if it's OK with you.

Thanks,

Mathieu

> 
> Best Regards,
> Huang Ying

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: About lock-less data structure patches
  2011-04-07 18:32                               ` Mathieu Desnoyers
@ 2011-04-07 22:05                                 ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2011-04-07 22:05 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Huang Ying, huang ying, Andrew Morton, Andi Kleen, lenb,
	linux-kernel, Linus Torvalds

On Thu, Apr 07, 2011 at 02:32:06PM -0400, Mathieu Desnoyers wrote:
> * Huang Ying (ying.huang@intel.com) wrote:
> > On 04/06/2011 09:48 AM, Mathieu Desnoyers wrote:
> > > * huang ying (huang.ying.caritas@gmail.com) wrote:
> > [snip]
> > >>>>
> > >>>> OK.  I will change the comments, adding these semantics explanation.
> > >>>> The user should be warned :)
> > >>>
> > >>> Yes, that makes sense. After this generalization step, if you're ok with
> > >>> this, we could aim at moving the implementation from a stack to a queue
> > >>> and provide fifo semantic rather than lifo, so that other users (e.g.
> > >>> call_rcu in the kernel) can start benefiting from it.

Just to be clear...  Currently, call_rcu() works on a per-CPU basis,
so that it can simply disable interrupts and then do the queuing
non-atomically.

However, should it be necessary to cross-queue RCU callbacks in order
to avoid ever executing an RCU callback on a given CPU, then something
like this might become useful.
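
Roughly like this, greatly simplified (field and function names invented
for illustration; this is not the actual rcutree code):

        static void enqueue_callback(struct rcu_head *head)
        {
                unsigned long flags;
                struct rcu_data *rdp;

                local_irq_save(flags);          /* stay CPU-local */
                rdp = this_cpu_ptr(&rcu_data);
                head->next = NULL;
                *rdp->nxttail = head;           /* plain stores, no atomics needed */
                rdp->nxttail = &head->next;
                local_irq_restore(flags);
        }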

							Thanx, Paul

> > >> I think that is good to move from stack to queue.
> > >>
> > >> I will send out changed lock-less data structure patchset soon.  And
> > >> we can continue to work on the new lock-less queue at the same time.
> > > 
> > > Sounds like a very good plan! Thanks!
> > 
> > Maybe you can send out your lock-less queue patches, so we can work on that.
> 
> Yep, let's wait until your implementation is finalized and merged, and
> then ping me again so I can cook up a RFC patch turning llist into a
> queue, if it's OK with you.
> 
> Thanks,
> 
> Mathieu
> 
> > 
> > Best Regards,
> > Huang Ying
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

Thread overview: 11+ messages
     [not found] <4D928405.4050607@intel.com>
     [not found] ` <20110329182145.e64b66b5.akpm@linux-foundation.org>
     [not found]   ` <4D9287C3.7030103@intel.com>
     [not found]     ` <20110330032203.GC21838@one.firstfloor.org>
     [not found]       ` <20110329202649.137a8a18.akpm@linux-foundation.org>
     [not found]         ` <20110330133411.GA11101@Krystal>
     [not found]           ` <BLU0-SMTP52616BFA7ABB7E8E73E8396BC0@phx.gbl>
2011-03-31  1:39             ` About lock-less data structure patches Huang Ying
2011-04-01 21:37               ` Mathieu Desnoyers
2011-04-02  5:05                 ` Huang Ying
2011-04-04 15:53                   ` Mathieu Desnoyers
2011-04-05  1:16                     ` huang ying
2011-04-05  4:42                       ` Mathieu Desnoyers
2011-04-05 12:46                         ` huang ying
2011-04-06  1:48                           ` Mathieu Desnoyers
2011-04-07  2:14                             ` Huang Ying
2011-04-07 18:32                               ` Mathieu Desnoyers
2011-04-07 22:05                                 ` Paul E. McKenney
