From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4E602CA4.4000703@intel.com>
Date: Fri, 02 Sep 2011 09:08:52 +0800
From: Huang Ying <ying.huang@intel.com>
To: Mathieu Desnoyers
CC: Peter Zijlstra, Andrew Morton, "linux-kernel@vger.kernel.org",
 "Paul E. McKenney"
Subject: Re: [PATCH -mm 1/2] irq_work, Use llist in irq_work
References: <1314681384-20881-1-git-send-email-ying.huang@intel.com>
 <1314681384-20881-2-git-send-email-ying.huang@intel.com>
 <1314785405.23993.21.camel@twins>
 <4E5EE409.3060102@intel.com>
 <4E5EFA08.30205@intel.com>
 <1314863927.7945.11.camel@twins>
 <4E5F48D1.801@intel.com>
 <20110901125140.GA22737@Krystal>
In-Reply-To: <20110901125140.GA22737@Krystal>
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/01/2011 08:51 PM, Mathieu Desnoyers wrote:
> * Huang Ying (ying.huang@intel.com) wrote:
>> On 09/01/2011 03:58 PM, Peter Zijlstra wrote:
>>> On Thu, 2011-09-01 at 11:20 +0800, Huang Ying wrote:
>>>> On 09/01/2011 09:46 AM, Huang Ying wrote:
>>>>>>> -static void __irq_work_queue(struct irq_work *entry)
>>>>>>> +static void __irq_work_queue(struct irq_work *work)
>>>>>>>  {
>>>>>>> -	struct irq_work *next;
>>>>>>> +	struct irq_work_list *irq_work_list;
>>>>>>>
>>>>>>> -	preempt_disable();
>>>>>>> +	irq_work_list = &get_cpu_var(irq_work_lists);
>>>>>>>
>>>>>>> -	do {
>>>>>>> -		next = __this_cpu_read(irq_work_list);
>>>>>>> -		/* Can assign non-atomic because we keep the flags set. */
>>>>>>> -		entry->next = next_flags(next, IRQ_WORK_FLAGS);
>>>>>>> -	} while (this_cpu_cmpxchg(irq_work_list, next, entry) != next);
>>>>>>> +	llist_add(&work->llnode, &irq_work_list->llist);
>>>>>>>
>>>>>>>  	/* The list was empty, raise self-interrupt to start processing. */
>>>>>>> -	if (!irq_work_next(entry))
>>>>>>> +	if (!test_and_set_bit(LIST_NONEMPTY_BIT, &irq_work_list->flags))
>>>>>>>  		arch_irq_work_raise();
>>>>>>
>>>>>> So why can't you simply test work->llnode->next?
>>>>>
>>>>> Yes. That is better. Even if there is a small race window, it is
>>>>> not a big issue to raise an occasional extra self-interrupt.
>>>>
>>>> I remembered something about this. I didn't test work->llnode->next
>>>> here because I didn't want to expose implementation details like
>>>> that here. How about making llist_add() return whether the list was
>>>> empty before adding? Because it will be an inline function, the
>>>> return value should be optimized out if the caller does not need
>>>> the information.
>
> Yes, that would make sense.
>
> something like:
>
> static inline
> struct llist_node *llist_add(struct llist_node *new, struct llist_head *head)
> {
> 	struct llist_node *entry, *old_entry;
>
> #ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> 	BUG_ON(in_nmi());
> #endif
>
> 	entry = head->first;
> 	do {
> 		old_entry = entry;
> 		new->next = entry;
> 		cpu_relax();
> 	} while ((entry = cmpxchg(&head->first, old_entry, new)) != old_entry);
>
> 	return old_entry;
> }
>
> Where llist_add() returns NULL if the list was initially empty, and
> non-NULL otherwise. This return value usage should be documented in
> the function header, of course.
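
That would work well for irq_work. As a minimal, untested sketch
(assuming llist_add() returns the old first entry as above, and that
the per-CPU variable becomes a plain struct llist_head, since the
separate LIST_NONEMPTY_BIT flags word would no longer be needed),
__irq_work_queue() could then become:

static void __irq_work_queue(struct irq_work *work)
{
	struct llist_head *irq_work_list;

	irq_work_list = &get_cpu_var(irq_work_lists);

	/*
	 * llist_add() returns NULL when the list was empty before the
	 * add; only then do we need to raise the self-interrupt to
	 * start processing.
	 */
	if (!llist_add(&work->llnode, irq_work_list))
		arch_irq_work_raise();

	put_cpu_var(irq_work_lists);
}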
> BUT, big warning here: this whole thing _should_ be renamed as a
> "lockless stack" rather than a "lockless list". Really. I said it in
> the past, and here I see another example showing why we should. The
> only reason why we can get away this easily with knowing atomically
> whether the structure was empty prior to the insertion is because
> this "list" behaves like a stack (LIFO). I foresee we'll need to add
> "lockless queues" at some point (FIFO), which have the ability to
> enqueue/dequeue concurrently without sharing the same cache lines
> when the queue is non-empty. Within that kind of queue structure,
> knowing if the queue was empty prior to insertion would become a
> bottleneck, so I would not advise making that part of _that_ API,
> which would require adding a new "llqueue" API. And at that point,
> the "llist" vs "llqueue" semantic would become really confusing.
> Maybe "llstack" would be more appropriate here instead of llist? Or
> llfifo? The API we can provide with a lock-less structure does not
> only depend on how the elements are organized, but also on the
> operations allowed on the structure. So the API should reflect that.
> So I'm saying this warning again: if we don't fix that now, I think
> we're heading into a lock-free structure namespacing trainwreck that
> will limit our ability to add other lock-free operations, due to
> vague naming that does not take the operations allowed on the
> structure into consideration, combined with API constraints permitted
> by a specific given behavior (e.g. FIFO semantic) that tend to define
> these lock-free APIs.

I remember that the previous consensus between us was to keep the
llist API and possibly change its implementation in the future. It
seems that defining a really general llist API is too hard. But
fortunately, because llist lives only inside the kernel (unlike a
user-space library, which is used by various applications), we can
still rename it in the future if needed.

> I've been working on lock-free structures myself in Userspace RCU: I
> have lock-free stack and queue, wait-free stack, wait-free queue, and
> (work in progress) RCU red-black tree (not lock-free), and lock-free
> RCU expandable hash table. If we plan the namespacing of lock-free
> data structures correctly, I think there will be room for very
> interesting performance and scalability improvements in the future.

That is interesting. But we need to find users in the Linux kernel
first.

>>> You could also use llist_empty() although that brings me to that
>>> ACCESS_ONCE thing in there, what's the point?
>>
>> Something like the following with llist_empty() does not seem to
>> work:
>>
>> 	empty = llist_empty(irq_work_list);
>> 	llist_add(&work->llnode, irq_work_list);
>> 	if (empty)
>> 		arch_irq_work_raise();
>>
>> Because the irq_work IRQ handler or the timer IRQ handler may be
>> executed just before "llist_add(&work->llnode, irq_work_list)", so
>> that, although "empty == false", arch_irq_work_raise() still should
>> be executed.
>>
>> Can you tell me how to do that with llist_empty()?
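
To make the race window concrete, this is the problematic interleaving
(illustrative only; "consumer" stands for the irq_work or timer IRQ
handler that drains the list):

	/* one work item is already pending, so the list is non-empty */
	empty = llist_empty(irq_work_list);	/* empty == false */

	/*
	 * <interrupt> the consumer runs here, takes all entries and
	 * processes them; the list is empty again on return.
	 */

	llist_add(&work->llnode, irq_work_list);  /* first new entry */
	if (empty)				/* false, so no raise */
		arch_irq_work_raise();

	/*
	 * The new entry sits in the list, but no self-interrupt was
	 * raised, so nothing will ever process it.
	 */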
>> For ACCESS_ONCE, Mathieu suggested that I add it.
>>
>> Hi, Mathieu,
>>
>> Can you explain why ACCESS_ONCE should be used here?
>
> It was mainly to force the compiler to re-fetch the head->first value
> from memory in busy-waiting loops. So even though the right practice
> is to have a cpu_relax() in the body of the loop (which would force
> the refetch due to the compiler barrier), having the ACCESS_ONCE in
> llist_empty() should not hurt much. It also seems to be what
> atomic*.h does (volatile read), so I guess the expected behavior wrt
> atomic accesses is to behave as volatile, hence my recommendation to
> make this llist_empty() a volatile access. Quoting my previous email
> on that topic:
>
> "Would it make sense to do:
>
> 	return ACCESS_ONCE(head->first) == NULL;
>
> instead? Otherwise the compiler can choose to keep the result around
> in registers without re-reading (e.g. busy-waiting loop)."
>
> So I was not implying it was strictly required, merely wondering
> whether it should be added.

Thanks for your explanation. I should have referred to our previous
email first. I think your point is reasonable.

Best Regards,
Huang Ying
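
P.S. For completeness, a minimal sketch of llist_empty() with the
volatile read applied, as suggested above (untested):

	static inline bool llist_empty(const struct llist_head *head)
	{
		/*
		 * ACCESS_ONCE() forces the compiler to reload
		 * head->first from memory on each call, instead of
		 * caching it in a register across a busy-waiting
		 * loop.
		 */
		return ACCESS_ONCE(head->first) == NULL;
	}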