Date: Wed, 16 Mar 2011 05:06:06 -0700
From: "Paul E. McKenney"
To: Milton Miller
Cc: Peter Zijlstra, akpm@linux-foundation.org, Anton Blanchard,
	xiaoguangrong@cn.fujitsu.com, mingo@elte.hu, jaxboe@fusionio.com,
	npiggin@gmail.com, rusty@rustcorp.com.au, efault@gmx.de,
	Jan Beulich, Dimitri Sivanich, Tony Luck,
	torvalds@linux-foundation.org, benh@kernel.crashing.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/4 v3] call_function_many: add missing ordering
Message-ID: <20110316120606.GN2273@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20110112150740.77dde58c@kryten>
	<1295288253.30950.280.camel@laptop>
	<1296145360.15234.234.camel@laptop>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-06-14)

On Tue, Mar 15, 2011 at 01:27:16PM -0600, Milton Miller wrote:
> Paul McKenney's review pointed out two problems with the barriers
> in the 2.6.38 update to the smp call function many code.
> 
> First, a barrier that would force the func and info members of data
> to be visible before their consumption in the interrupt handler
> was missing.  This can be solved by adding a smp_wmb between
> setting the func and info members and setting the cpumask; this
> will pair with the existing and required smp_rmb ordering the
> cpumask read before the read of refs.  This placement avoids the
> need for a second smp_rmb in the interrupt handler, which would
> be executed on each of the N cpus executing the call request.
> (I was thinking this barrier was present but it was not.)
> 
> Second, the previous write to refs (establishing the zero that
> the interrupt handler was testing from all cpus) was performed
> by a third party cpu.  This would invoke transitivity which, as
> a recent or concurrent addition to memory-barriers.txt now
> explicitly states, would require a full smp_mb().
> 
> However, we know the cpumask will only be set by one cpu (the
> data owner) and any previous iteration of the mask would have
> been cleared by the reading cpu.  By redundantly writing refs
> to 0 on the owning cpu before the smp_wmb, the write to refs
> will follow the same path as the writes that set the cpumask,
> which in turn allows us to keep the barrier in the interrupt
> handler a smp_rmb instead of promoting it to a smp_mb (which
> will be executed by N cpus for each of the possible M elements
> on the list).
> 
> I moved and expanded the comment about our (ab)use of the rcu list
> primitives for the concurrent walk earlier into this function.
> I considered moving the first two paragraphs to the queue list
> head and lock, but felt it would have been too disconnected from
> the code.
> 
> Cc: Paul McKenney
> Cc: stable (2.6.32 and later)
> Signed-off-by: Milton Miller
> 
> Paul, please review this alternative to your patch at either of

Hello, Milton,

At first glance, this looks promising, but it will take me a few more
days to wrap my head fully around it.  Fair enough?

							Thanx, Paul
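For anyone following along, the ordering described above can be modeled
as a small userspace C11 program.  This is only a sketch: the names
(fake_data, owner, handler, my_bit) are invented for the illustration,
and the C11 release/acquire fences merely stand in for smp_wmb(), the
handler's smp_rmb(), and the wmb() hidden inside list_add_rcu(); it is
not the kernel code itself.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Stand-in for the per-cpu call function data; all names are invented. */
struct fake_data {
	void (*func)(void *);	/* models data->csd.func */
	void *info;		/* models data->csd.info */
	atomic_int my_bit;	/* models "our bit" in data->cpumask */
	atomic_int refs;	/* models data->refs */
};

static struct fake_data data;	/* static storage: starts out all zero */
static char greeting[] = "func and info were visible";

static void fake_func(void *info)
{
	printf("%s\n", (char *)info);
}

/* Requester ("owner") side, in the order the patch establishes. */
static void *owner(void *unused)
{
	/* redundant first-party zero, as in the patch */
	atomic_store_explicit(&data.refs, 0, memory_order_relaxed);
	data.func = fake_func;
	data.info = greeting;

	/* stands in for the added smp_wmb(): refs=0/func/info before the mask */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&data.my_bit, 1, memory_order_relaxed);

	/* stands in for the wmb() in list_add_rcu(): mask before the final refs */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&data.refs, 1, memory_order_relaxed);
	return unused;
}

/* Handler side: bit first, then "rmb", then refs, then func/info. */
static void *handler(void *unused)
{
	for (;;) {
		if (!atomic_load_explicit(&data.my_bit, memory_order_relaxed))
			continue;	/* our bit not set: entry is not for us */

		/* stands in for the existing smp_rmb() in the handler */
		atomic_thread_fence(memory_order_acquire);

		if (!atomic_load_explicit(&data.refs, memory_order_relaxed))
			continue;	/* not ready: the real handler skips the entry */

		data.func(data.info);	/* the pairing above makes these visible */
		return unused;
	}
}

int main(void)
{
	pthread_t o, h;

	pthread_create(&h, NULL, handler, NULL);
	pthread_create(&o, NULL, owner, NULL);
	pthread_join(o, NULL);
	pthread_join(h, NULL);
	return 0;
}

Built with something like "cc -pthread model.c", the handler never
dereferences func or info until it has observed both its bit and a
nonzero refs across the paired fences, which is the property the
missing smp_wmb() restores.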
> http://marc.info/?l=linux-kernel&m=129662029916241&w=2
> https://patchwork.kernel.org/patch/525891/
> 
> In contrast to your patch, this proposal keeps the additional
> barriers in the (interrupt enabled) requester instead of the
> execution interrupt, where they would be executed by all N cpus
> in the system triggered concurrently for each of the N possible
> elements in the list.
> 
> Index: common/kernel/smp.c
> ===================================================================
> --- common.orig/kernel/smp.c	2011-03-15 05:21:41.000000000 -0500
> +++ common/kernel/smp.c	2011-03-15 05:22:26.000000000 -0500
> @@ -483,23 +483,42 @@ void smp_call_function_many(const struct
> 
>  	data = &__get_cpu_var(cfd_data);
>  	csd_lock(&data->csd);
> -	BUG_ON(atomic_read(&data->refs) || !cpumask_empty(data->cpumask));
> 
> -	data->csd.func = func;
> -	data->csd.info = info;
> -	cpumask_and(data->cpumask, mask, cpu_online_mask);
> -	cpumask_clear_cpu(this_cpu, data->cpumask);
> +	/* This BUG_ON verifies our reuse assertions and can be removed */
> +	BUG_ON(atomic_read(&data->refs) || !cpumask_empty(data->cpumask));
> 
>  	/*
> +	 * The global call function queue list add and delete are protected
> +	 * by a lock, but the list is traversed without any lock, relying
> +	 * on the rcu list add and delete to allow safe concurrent traversal.
>  	 * We reuse the call function data without waiting for any grace
>  	 * period after some other cpu removes it from the global queue.
> -	 * This means a cpu might find our data block as it is writen.
> -	 * The interrupt handler waits until it sees refs filled out
> -	 * while its cpu mask bit is set; here we may only clear our
> -	 * own cpu mask bit, and must wait to set refs until we are sure
> -	 * previous writes are complete and we have obtained the lock to
> -	 * add the element to the queue.
> +	 * This means a cpu might find our data block as it is being
> +	 * filled out.
> +	 *
> +	 * We hold off the interrupt handler on the other cpu by
> +	 * ordering our writes to the cpu mask vs our setting of the
> +	 * refs counter.  We assert only the cpu owning the data block
> +	 * will set a bit in cpumask, and each bit will only be cleared
> +	 * by the subject cpu.  Each cpu must first find its bit is
> +	 * set and then check that refs is set indicating the element is
> +	 * ready to be processed, otherwise it must skip the entry.
> +	 *
> +	 * On the previous iteration refs was set to 0 by another cpu.
> +	 * To avoid the use of transitivity, set the counter to 0 here
> +	 * so the wmb will pair with the rmb in the interrupt handler.
>  	 */
> +	atomic_set(&data->refs, 0);	/* convert 3rd to 1st party write */
> +
> +	data->csd.func = func;
> +	data->csd.info = info;
> +
> +	/* Ensure 0 refs is visible before mask. Also orders func and info */
> +	smp_wmb();
> +
> +	/* We rely on the "and" being processed before the store */
> +	cpumask_and(data->cpumask, mask, cpu_online_mask);
> +	cpumask_clear_cpu(this_cpu, data->cpumask);
> 
>  	raw_spin_lock_irqsave(&call_function.lock, flags);
>  	/*
> @@ -509,8 +528,9 @@ void smp_call_function_many(const struct
>  	 */
>  	list_add_rcu(&data->csd.list, &call_function.queue);
>  	/*
> -	 * We rely on the wmb() in list_add_rcu to order the writes
> -	 * to func, data, and cpumask before this write to refs.
> +	 * We rely on the wmb() in list_add_rcu to complete our writes
> +	 * to the cpumask before this write to refs, which indicates
> +	 * data is on the list and is ready to be processed.
>  	 */
>  	atomic_set(&data->refs, cpumask_weight(data->cpumask));
>  	raw_spin_unlock_irqrestore(&call_function.lock, flags);
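The diff shows only the requester side.  A schematic of the per-element
checks on the handler side, paraphrased from the changelog above, would
look roughly as follows; handle_one_element is a hypothetical helper,
not the actual generic_smp_call_function_interrupt() body, which also
clears this cpu's mask bit and drops its reference once the function
has been run.

/*
 * Hypothetical helper showing the per-element checks the changelog
 * describes for the IPI handler; surrounding list walk, bit clearing,
 * refs decrement, and list removal are omitted.
 */
static void handle_one_element(struct call_function_data *data)
{
	if (!cpumask_test_cpu(smp_processor_id(), data->cpumask))
		return;		/* our bit is not set: not for us */

	/* the existing smp_rmb(): pairs with the requester-side ordering above */
	smp_rmb();

	if (!atomic_read(&data->refs))
		return;		/* refs still zero: element not ready, skip it */

	data->csd.func(data->csd.info);	/* func and info are now visible */
}

Either way, the only barrier executed by all N cpus for each of the M
queued elements remains the smp_rmb(), which is the point of keeping
the heavier ordering on the requester side.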