All of lore.kernel.org
 help / color / mirror / Atom feed
From: Huang Ying <ying.huang@intel.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH -mm 1/2] irq_work, Use llist in irq_work
Date: Fri, 02 Sep 2011 09:08:52 +0800	[thread overview]
Message-ID: <4E602CA4.4000703@intel.com> (raw)
In-Reply-To: <20110901125140.GA22737@Krystal>

On 09/01/2011 08:51 PM, Mathieu Desnoyers wrote:
> * Huang Ying (ying.huang@intel.com) wrote:
>> On 09/01/2011 03:58 PM, Peter Zijlstra wrote:
>>> On Thu, 2011-09-01 at 11:20 +0800, Huang Ying wrote:
>>>> On 09/01/2011 09:46 AM, Huang Ying wrote:
>>>>>>> -static void __irq_work_queue(struct irq_work *entry)
>>>>>>> +static void __irq_work_queue(struct irq_work *work)
>>>>>>>  {
>>>>>>> -	struct irq_work *next;
>>>>>>> +	struct irq_work_list *irq_work_list;
>>>>>>>  
>>>>>>> -	preempt_disable();
>>>>>>> +	irq_work_list = &get_cpu_var(irq_work_lists);
>>>>>>>  
>>>>>>> -	do {
>>>>>>> -		next = __this_cpu_read(irq_work_list);
>>>>>>> -		/* Can assign non-atomic because we keep the flags set. */
>>>>>>> -		entry->next = next_flags(next, IRQ_WORK_FLAGS);
>>>>>>> -	} while (this_cpu_cmpxchg(irq_work_list, next, entry) != next);
>>>>>>> +	llist_add(&work->llnode, &irq_work_list->llist);
>>>>>>>  
>>>>>>>  	/* The list was empty, raise self-interrupt to start processing. */
>>>>>>> -	if (!irq_work_next(entry))
>>>>>>> +	if (!test_and_set_bit(LIST_NONEMPTY_BIT, &irq_work_list->flags))
>>>>>>>  		arch_irq_work_raise();
>>>>>>
>>>>>> So why can't you simply test work->llnode->next?
>>>>>
>>>>> Yes.  That is better.  Even if there may be a small race window, it is
>>>>> not a big issue to raise one extra self interrupt seldom.
>>>>
>>>> Remember something about this.  I didn't test work->llnode->next here
>>>> because I didn't want expose the implementation details like that here.
>>>>  How about make llist_add() return whether list is empty before adding?
>>>>  Because it will be an inline function, that should be optimized out if
>>>> the caller do not need the information.
> 
> Yes, that would make sense.
> 
> something like
> 
> static inline
> struct llist_node *llist_add(struct llist_node *new, struct llist_head *head)
> {
> 	struct llist_node *entry, *old_entry;
> 
> #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> 	BUG_ON(in_nmi());
> #endif
> 
> 	entry = head->first;
> 	do {
> 		old_entry = entry;
> 		new->next = entry;
> 		cpu_relax();
> 	} while ((entry = cmpxchg(&head->first, old_entry, new)) != old_entry);
>         return old_entry;
> }
> 
> Where llist_add returns NULL if the list was initially empty, else
> returns non-null. This return value usage should be documented in the
> function header, of course.
> 
> BUT, big warning here: this whole thing _should_ be renamed as a
> "lockless stack" rather than a "lockless list". Really. I said it in the
> past, and here I see another example showing why we should. The only
> reason why we can get away this easily with knowing atomically if the
> structure was empty prior to the insertion is because this "list"
> hehaves like a stack (LIFO). I foresee we'll need to add "lockless
> queues" at some point (FIFO), which has the ability to enqueue/dequeue
> concurrently without sharing the same cache-lines when the queue is
> non-empty. Within that kind of queue structure, knowing if the queue was
> empty prior to insertion would become a bottleneck, so I would not
> advise to make that part of _that_ API, which would require to add a new
> "llqueue" API. And at that point, the "llist" vs "llqueue" semantic
> would become really confusing. Maybe "llstack" would be more appropriate
> here instead of llist ? Or llfifo ? The API we can provide with a
> lock-less structure does not only depend on how the elements are
> organized, but also on the operations allowed on the structure. So the
> API should reflect that. So I'm saying this warning again: if we don't
> fix that now, I think we're heading into a lock-free structure
> namespacing trainwreck that will limit our ability to add other
> lock-free operations due vague naming that does not take the operations
> allowed on the structure into consideration, combined with API
> constraints permitted by a specific given behavior (e.g. FIFO semantic)
> that tend to define these lock-free APIs.

I remember the previous consensus between us is to keep the API of llist
and may change its implementation in the future.  But it seems that
define a really general llist API is too hard.  But fortunately, because
llist is just inside kernel (not like a user space library, which is
used by various applications), we can change its name in the future if
it is needed.

> I've been working on lock-free structures myself in Userspace RCU: I
> have lock-free stack and queue, wait-free stack, wait-free queue, and
> (work in progress), RCU red-black tree (not lock-free), and lock-free
> RCU expandable hash table. If we plan namespacing of lock-free data
> structure correctly, I think there will be room for very interesting
> performance and scalability improvements in the future.

That is interesting.  But we need find the user in Linux kernel first.

>>>
>>> You could also use llist_empty() although that brings me to that
>>> ACCESS_ONCE thing in there, what's the point?
>>
>> Something as follow with llist_empty() seems not work.
>>
>>   empty = llist_empty(irq_work_list);
>>   llist_add(&work->llnode, irq_work_list);
>>   if (empty)
>>           arch_irq_work_raise();
>>
>> Because irq_work IRQ handler or timer IRQ handler may be executed just
>> before "llist_add(&work->llnode, irq_work_list)", so that, although
>> "empty == false", arch_irq_work_raise() still should be executed.
>>
>> Can you tell me how to that with llist_empty()?
>>
>>
>> For ACCESS_ONCE, Mathiew suggest me to add it,
>>
>> Hi, Mathiew,
>>
>> Can you explain why ACCESS_ONCE should be used here?
> 
> It was mainly to force the compiler to re-fetch the head->first value
> from memory in busy-waiting loops. So even though the right practice is
> to have a cpu_relax() in the body of the loop (which would force the
> refetch due to the compiler barrier), having the ACCESS_ONCE in
> llist_empty() should not hurt much. It also seems to be what atomic*.h
> does (volatile read), so I guess the expected behavior wrt atomic
> accesses are to behave as volatile, hence my recommendation to make this
> llist_empty a volatile access. Quoting my previous email on that topic:
> 
> "Would it make sense to do:
> 
>  return ACCESS_ONCE(head->first) == NULL;
> 
> instead ? Otherwise the compiler can choose to keep the result around in
> registers without re-reading (e.g. busy waiting loop)."
> 
> So I was not implying it was strictly required, merely wondering whether
> it should be added.

Thanks for your explanation.  I should refer to our previous email
firstly.  I think your point is reasonable.

Best Regards,
Huang Ying

  parent reply	other threads:[~2011-09-02  1:08 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-30  5:16 [PATCH -mm 0/2] Use llist in irq_work and xlist Huang Ying
2011-08-30  5:16 ` [PATCH -mm 1/2] irq_work, Use llist in irq_work Huang Ying
2011-08-31 10:10   ` Peter Zijlstra
2011-09-01  1:46     ` Huang Ying
2011-09-01  3:20       ` Huang Ying
2011-09-01  7:58         ` Peter Zijlstra
2011-09-01  8:56           ` Huang Ying
2011-09-01  9:57             ` Peter Zijlstra
2011-09-02  1:14               ` Huang Ying
2011-09-03 17:35                 ` Mathieu Desnoyers
2011-09-01 12:51             ` Mathieu Desnoyers
2011-09-01 13:00               ` Mathieu Desnoyers
2011-09-02  1:08               ` Huang Ying [this message]
2011-09-03 16:33                 ` Mathieu Desnoyers
2011-09-01  7:57       ` Peter Zijlstra
2011-09-01  8:44         ` Huang Ying
2011-09-01 10:00           ` Peter Zijlstra
2011-09-02  1:18             ` Huang Ying
2011-09-02 13:26               ` Peter Zijlstra
2011-08-30  5:16 ` [PATCH -mm 2/2] net, rds, Replace xlist in net/rds/xlist.h with llist Huang Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E602CA4.4000703@intel.com \
    --to=ying.huang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.