Date: Thu, 29 Jul 2021 10:41:18 -0400 (EDT)
From: Mathieu Desnoyers
To: paulmck
Cc: rcu, linux-kernel, kernel-team, Ingo Molnar, Lai Jiangshan, Andrew Morton, Josh Triplett, Thomas Gleixner, Peter Zijlstra, rostedt, David Howells, Eric Dumazet, fweisbec, Oleg Nesterov, "Joel Fernandes, Google", Linus Torvalds
Message-ID: <1929727713.10248.1627569678176.JavaMail.zimbra@efficios.com>
In-Reply-To: <20210728202802.GL4397@paulmck-ThinkPad-P17-Gen-1>
References: <20210721202042.GA1472052@paulmck-ThinkPad-P17-Gen-1>
 <20210721202127.2129660-4-paulmck@kernel.org>
 <20210728173715.GA9416@paulmck-ThinkPad-P17-Gen-1>
 <2135064974.9081.1627496585724.JavaMail.zimbra@efficios.com>
 <20210728185854.GK4397@paulmck-ThinkPad-P17-Gen-1>
 <20210728194505.GA1500024@paulmck-ThinkPad-P17-Gen-1>
 <874308613.9545.1627502582005.JavaMail.zimbra@efficios.com>
 <20210728202802.GL4397@paulmck-ThinkPad-P17-Gen-1>
Subject: Re: [PATCH v2 rcu 04/18] rcu: Weaken ->dynticks accesses and updates
X-Mailing-List: rcu@vger.kernel.org

----- On Jul 28, 2021, at 4:28
PM, paulmck paulmck@kernel.org wrote:

> On Wed, Jul 28, 2021 at 04:03:02PM -0400, Mathieu Desnoyers wrote:
>> ----- On Jul 28, 2021, at 3:45 PM, paulmck paulmck@kernel.org wrote:
>> [...]
>> >
>> > And how about like this?
>> >
>> > 						Thanx, Paul
>> >
>> > ------------------------------------------------------------------------
>> >
>> > commit cb8914dcc6443cca15ce48d937a93c0dfdb114d3
>> > Author: Paul E. McKenney
>> > Date: Wed Jul 28 12:38:42 2021 -0700
>> >
>> >     rcu: Move rcu_dynticks_eqs_online() to rcu_cpu_starting()
>> >
>> >     The purpose of rcu_dynticks_eqs_online() is to adjust the ->dynticks
>> >     counter of an incoming CPU if required. It is currently is invoked
>>
>> "is currently is" -> "is currently"
>
> Good catch, fixed!
>
>> >     from rcutree_prepare_cpu(), which runs before the incoming CPU is
>> >     running, and thus on some other CPU. This makes the per-CPU accesses in
>> >     rcu_dynticks_eqs_online() iffy at best, and it all "works" only because
>> >     the running CPU cannot possibly be in dyntick-idle mode, which means
>> >     that rcu_dynticks_eqs_online() never has any effect. One could argue
>> >     that this means that rcu_dynticks_eqs_online() is unnecessary, however,
>> >     removing it makes the CPU-online process vulnerable to slight changes
>> >     in the CPU-offline process.
>>
>> Why favor moving this from the prepare_cpu to the cpu_starting hotplug step,
>> rather than using the target cpu's rdp from rcutree_prepare_cpu? Maybe there
>> was a good reason for having this very early in the prepare_cpu step?
>
> Some years back, there was a good reason. This reason was that
> rcutree_prepare_cpu() marked the CPU as being online from an RCU
> viewpoint. But now rcu_cpu_starting() is the one that marks the CPU as
> being online, so the ->dynticks check can be deferred to this function.
>
>> Also, the commit message refers to this bug as having no effect because the
>> running CPU cannot possibly be in dyntick-idle mode. I understand that calling
>> this function was indeed effect-less, but then why is it OK for the CPU coming
>> online to skip this call in the first place? This commit message hints at
>> "slight changes in the CPU-offline process" which could break it, but there is
>> no explanation of what makes this not an actual bug fix.
>
> Because rcutorture would not have suffered in silence had this
> situation ever arisen.

Testing can usually prove the presence of a bug, but it's rather tricky to
prove the absence of a bug.

>
> I have updated the commit log to answer these questions as shown
> below. Thoughts?

I'm still concerned about one scenario wrt moving rcu_dynticks_eqs_online()
from rcutree_prepare_cpu() to rcu_cpu_starting(). What happens if an interrupt
handler, or an NMI handler, nests early over the CPU-online startup code?
AFAIU, this interrupt handler could contain RCU read-side critical sections,
but if the eqs state does not show the CPU as "online", I wonder whether it
will work as expected.

Thanks,

Mathieu

>
>						Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 516c8c4cc6fce62539f7e0182739812db4591c1d
> Author: Paul E. McKenney
> Date: Wed Jul 28 12:38:42 2021 -0700
>
>     rcu: Move rcu_dynticks_eqs_online() to rcu_cpu_starting()
>
>     The purpose of rcu_dynticks_eqs_online() is to adjust the ->dynticks
>     counter of an incoming CPU when required. It is currently invoked
>     from rcutree_prepare_cpu(), which runs before the incoming CPU is
>     running, and thus on some other CPU. This makes the per-CPU accesses in
>     rcu_dynticks_eqs_online() iffy at best, and it all "works" only because
>     the running CPU cannot possibly be in dyntick-idle mode, which means
>     that rcu_dynticks_eqs_online() never has any effect.
>
>     It is currently OK for rcu_dynticks_eqs_online() to have no effect, but
>     only because the CPU-offline process just happens to leave ->dynticks in
>     the correct state. After all, if ->dynticks were in the wrong state on a
>     just-onlined CPU, rcutorture would complain bitterly the next time that
>     CPU went idle, at least in kernels built with CONFIG_RCU_EQS_DEBUG=y,
>     for example, those built by rcutorture scenario TREE04. One could
>     argue that this means that rcu_dynticks_eqs_online() is unnecessary,
>     however, removing it would make the CPU-online process vulnerable to
>     slight changes in the CPU-offline process.
>
>     One could also ask why it is safe to move the rcu_dynticks_eqs_online()
>     call so late in the CPU-online process. Indeed, there was a time when it
>     would not have been safe, which does much to explain its current location.
>     However, the marking of a CPU as online from an RCU perspective has long
>     since moved from rcutree_prepare_cpu() to rcu_cpu_starting(), and all
>     that is required is that ->dynticks be set correctly by the time that
>     the CPU is marked as online from an RCU perspective. After all, the RCU
>     grace-period kthread does not check to see if offline CPUs are also idle.
>     (In case you were curious, this is one reason why there is quiescent-state
>     reporting as part of the offlining process.)
>
>     This commit therefore moves the call to rcu_dynticks_eqs_online() from
>     rcutree_prepare_cpu() to rcu_cpu_starting(), this latter being guaranteed
>     to be running on the incoming CPU. The call to this function must of
>     course be placed before this rcu_cpu_starting() announces this CPU's
>     presence to RCU.
>
>     Reported-by: Mathieu Desnoyers
>     Signed-off-by: Paul E. McKenney
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 0172a5fd6d8de..aa00babdaf544 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4129,7 +4129,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
>  	rdp->n_force_qs_snap = READ_ONCE(rcu_state.n_force_qs);
>  	rdp->blimit = blimit;
>  	rdp->dynticks_nesting = 1;	/* CPU not up, no tearing. */
> -	rcu_dynticks_eqs_online();
>  	raw_spin_unlock_rcu_node(rnp);		/* irqs remain disabled. */
>
>  	/*
> @@ -4249,6 +4248,7 @@ void rcu_cpu_starting(unsigned int cpu)
>  	mask = rdp->grpmask;
>  	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
>  	WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
> +	rcu_dynticks_eqs_online();
>  	smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com