Date: Wed, 21 Dec 2016 08:48:45 -0800
From: "Paul E. McKenney"
To: Boqun Feng
Cc: Colin Ian King, Mark Rutland, linux-kernel@vger.kernel.org,
	Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan
Subject: Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()
Reply-To: paulmck@linux.vnet.ibm.com
References: <20161215120459.GE21758@leverpostej>
	<20161215144242.GN9728@tardis.cn.ibm.com>
	<05a9953b-aaa4-6117-b120-85c12ad56ace@canonical.com>
	<20161219151515.GP9728@tardis.cn.ibm.com>
	<20161220050913.GP3924@linux.vnet.ibm.com>
	<20161220055914.GB1316@tardis.cn.ibm.com>
	<20161220152352.GQ3924@linux.vnet.ibm.com>
	<20161221023456.GE1316@tardis.cn.ibm.com>
	<20161221034024.GC3924@linux.vnet.ibm.com>
	<20161221041808.GF1316@tardis.cn.ibm.com>
In-Reply-To: <20161221041808.GF1316@tardis.cn.ibm.com>
Message-Id: <20161221164845.GH3924@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 21, 2016 at 12:18:08PM +0800, Boqun Feng wrote:
> On Tue, Dec 20, 2016 at 07:40:24PM -0800, Paul E. McKenney wrote:
> [...]
> > >
> > > Agreed, my intent is to keep this overcare check for couples of releases
> > > and if no one shoots his/her foot, we can remove it, if not, it
> > > definitely means this part is subtle, and we need to pay more attention
> > > to it, maybe write some regression tests for this particular problem to
> > > help developers avoid it.
> > >
> > > This check is supposed to be removed, so I'm not stick to keeping it.
> >
> > I suggest keeping through validation.  If it triggers during that time,
> > consider keeping it longer.  If it does not trigger, remove it before
> > it goes upstream.
>
> Good point ;-)
>
> [...]
> > > >
> > > > > But this brings a side question, is the callsite of rcu_cpu_starting()
> > > > > is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only
> > > > > set _this_ cpu's bit in a leaf node?
> > > >
> > > > The calls from notify_cpu_starting() are called from the various
> > > > start_kernel_secondary(), secondary_start_kernel(), and similarly
> > > > named functions.  These are called on the incoming CPU early in that
> > > > CPU's execution.  The call from rcu_init() is correct until such time
> > > > as more than one CPU can be running at rcu_init() time.  And that
> > > > day might be coming, so please see the untested patch below.
> > >
> > > Looks better than mine ;-)
> > >
> > > But do we need to worry that we start rcu on each CPU twice, which may
> > > slow down the boot?
> >
> > We only start a given CPU once.  The boot CPU at rcu_init() time, and
> > the rest at CPU-hotplug time.  Unless of course a CPU is later taken
>
> Confused... we call rcu_cpu_starting() in a for_each_online_cpu() loop
> in rcu_init(), so we basically start all online CPUs there after
> applying your patch. And all the rest CPUs will get themselves start
> again at CPU-hotplug time, right?

At rcu_init() time, there is only one online CPU, namely the boot CPU.

Or perhaps your point is that if CPUs come online before rcu_init(),
they might do so via the normal online mechanism.  I don't believe
that this is likely, because the normal online mechanism requires that
the scheduler be running.  But either way, my hope would be that whoever
fires up CPUs before rcu_init() asks a few questions when they run
into bugs.  ;-)

> Besides, without your patch, we started the boot CPU many times in the
> for_each_online_cpu() loop.

That is true.  It is harmless because it just does a group of assignments
repeatedly, and because there is only one CPU and because interrupts
are disabled, this cannot have any effect.

And my fix inadvertently fixed this issue, didn't it?  So I do need to
update the commit log accordingly.  Done!

> Am I missing something subtle?

Given the nature of RCU, the only possible answer I can give to that
question is "probably".  (Hey, you asked!!!)

							Thanx, Paul

> Regards,
> Boqun
>
> > offline, in which case we start it again when it comes back online.
> >
> > 							Thanx, Paul
> >
> > > Regards,
> > > Boqun
> > >
> > > > 							Thanx, Paul
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > commit 1e84402587173d6d4da8645689f0e24c877b3269
> > > > Author: Paul E. McKenney
> > > > Date:   Tue Dec 20 07:17:58 2016 -0800
> > > >
> > > >     rcu: Make rcu_cpu_starting() use its "cpu" argument
> > > >
> > > >     The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
> > > >     incoming CPU's rcu_data structure.  This works for the boot CPU and for
> > > >     all CPUs onlined after rcu_init() executes (during very early boot).
> > > >     Currently, this is the full set of CPUs, so all is well.  But if
> > > >     anyone ever parallelizes boot before rcu_init() time, it will fail.
> > > >     This commit therefore substitutes the rcu_cpu_starting() function's
> > > >     this_cpu_pointer() for per_cpu_ptr(), future-proofing the code and
> > > >     (arguably) improving readability.
> > > >
> > > >     Reported-by: Boqun Feng
> > > >     Signed-off-by: Paul E. McKenney
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b9d3c0e30935..083cb8a6299c 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -4017,7 +4017,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > > >  	struct rcu_state *rsp;
> > > >
> > > >  	for_each_rcu_flavor(rsp) {
> > > > -		rdp = this_cpu_ptr(rsp->rda);
> > > > +		rdp = per_cpu_ptr(rsp->rda, cpu);
> > > >  		rnp = rdp->mynode;
> > > >  		mask = rdp->grpmask;
> > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > >
> > > >
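
For readers following the thread, here is a rough userspace sketch of the
difference between looking up the running CPU's data and looking up the
data of the CPU named by the argument. It is only a model under stated
assumptions, not kernel code: every model_* name, the online_mask field,
and the four-CPU layout are hypothetical stand-ins, and locking and
interrupt handling are left out entirely.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4		/* hypothetical CPU count for this model */

struct model_rcu_data {		/* stand-in for the per-CPU rcu_data */
	unsigned long grpmask;	/* this CPU's bit within its leaf node */
};

struct model_leaf_node {	/* stand-in for a leaf node */
	unsigned long online_mask;	/* hypothetical field: CPUs started so far */
};

static struct model_rcu_data model_rda[MODEL_NR_CPUS];
static struct model_leaf_node model_leaf;
static int model_running_cpu;	/* models "the CPU executing this code" */

/* Models the this_cpu_ptr() lookup: the argument is ignored. */
static void model_cpu_starting_old(int cpu)
{
	(void)cpu;
	model_leaf.online_mask |= model_rda[model_running_cpu].grpmask;
}

/* Models the per_cpu_ptr() lookup: the argument selects the CPU. */
static void model_cpu_starting_new(int cpu)
{
	model_leaf.online_mask |= model_rda[cpu].grpmask;
}

/*
 * Models the loop over online CPUs at init time, executed on the boot
 * CPU (CPU 0).  Returns the resulting leaf-node mask.
 */
static unsigned long model_rcu_init(int nr_online, void (*starting)(int))
{
	int cpu;

	assert(nr_online >= 1 && nr_online <= MODEL_NR_CPUS);
	model_running_cpu = 0;
	model_leaf.online_mask = 0;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_rda[cpu].grpmask = 1UL << cpu;
	for (cpu = 0; cpu < nr_online; cpu++)
		starting(cpu);
	return model_leaf.online_mask;
}
```

With a single online CPU, as is the case today at rcu_init() time, both
variants of this model leave the mask at 0x1. With four CPUs online
before init (the parallelized-boot scenario), the old variant just ORs
in the boot CPU's bit four times and still yields 0x1, whereas the new
variant yields 0xf.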