From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756804AbcKKRBC (ORCPT ); Fri, 11 Nov 2016 12:01:02 -0500 Received: from mail-vk0-f53.google.com ([209.85.213.53]:36153 "EHLO mail-vk0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933797AbcKKRA7 (ORCPT ); Fri, 11 Nov 2016 12:00:59 -0500 MIME-Version: 1.0 In-Reply-To: <20161110045213.GY4127@linux.vnet.ibm.com> References: <1471382376-5443-1-git-send-email-cmetcalf@mellanox.com> <1605b087-2b3b-77c1-01ac-084e378f5f28@mellanox.com> <20161109014045.GI4127@linux.vnet.ibm.com> <20161109173808.GJ4127@linux.vnet.ibm.com> <20161110045213.GY4127@linux.vnet.ibm.com> From: Andy Lutomirski Date: Fri, 11 Nov 2016 09:00:37 -0800 Message-ID: Subject: Re: task isolation discussion at Linux Plumbers To: "Paul E. McKenney" Cc: Chris Metcalf , Gilad Ben Yossef , Steven Rostedt , Ingo Molnar , Peter Zijlstra , Andrew Morton , Rik van Riel , Tejun Heo , Frederic Weisbecker , Thomas Gleixner , Christoph Lameter , Viresh Kumar , Catalin Marinas , Will Deacon , Daniel Lezcano , Francis Giraldeau , Andi Kleen , Arnd Bergmann , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 9, 2016 at 8:52 PM, Paul E. McKenney wrote: > On Wed, Nov 09, 2016 at 05:44:02PM -0800, Andy Lutomirski wrote: >> On Wed, Nov 9, 2016 at 9:38 AM, Paul E. McKenney >> wrote: >> >> Are you planning on changing rcu_nmi_enter()? It would make it easier >> to figure out how they interact if I could see the code. > > It already calls rcu_dynticks_eqs_exit(), courtesy of the earlier > consolidation patches. > >> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c >> > index dbf20b058f48..342c8ee402d6 100644 >> > --- a/kernel/rcu/tree.c >> > +++ b/kernel/rcu/tree.c >> >> >> > /* >> > @@ -305,17 +318,22 @@ static void rcu_dynticks_eqs_enter(void) >> > static void rcu_dynticks_eqs_exit(void) >> > { >> > struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); >> > + int seq; >> > >> > /* >> > - * CPUs seeing atomic_inc() must see prior idle sojourns, >> > + * CPUs seeing atomic_inc_return() must see prior idle sojourns, >> > * and we also must force ordering with the next RCU read-side >> > * critical section. >> > */ >> > - smp_mb__before_atomic(); /* See above. */ >> > - atomic_inc(&rdtp->dynticks); >> > - smp_mb__after_atomic(); /* See above. */ >> > + seq = atomic_inc_return(&rdtp->dynticks); >> > WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && >> > - !(atomic_read(&rdtp->dynticks) & 0x1)); >> > + !(seq & RCU_DYNTICK_CTRL_CTR)); >> >> I think there's still a race here. Suppose we're running this code on >> cpu n and... >> >> > + if (seq & RCU_DYNTICK_CTRL_MASK) { >> > + rcu_eqs_special_exit(); >> > + /* Prefer duplicate flushes to losing a flush. */ >> > + smp_mb__before_atomic(); /* NMI safety. */ >> >> ... another CPU changes the page tables and calls rcu_eqs_special_set(n) here. > > But then rcu_eqs_special_set() will return false because we already > exited the extended quiescent state at the atomic_inc_return() above. > > That should tell the caller to send an IPI. > >> That CPU expects that we will flush prior to continuing, but we won't. >> Admittedly it's highly unlikely that any stale TLB entries would be >> created yet, but nothing rules it out. > > That said, 0day is having some heartburn from this, so I must have broken > something somewhere. My own tests of course complete just fine... > >> > + atomic_and(~RCU_DYNTICK_CTRL_MASK, &rdtp->dynticks); >> > + } >> >> Maybe the way to handle it is something like: >> >> this_cpu_write(rcu_nmi_needs_eqs_special, 1); >> barrier(); >> >> /* NMI here will call rcu_eqs_special_exit() regardless of the value >> in dynticks */ >> >> atomic_and(...); >> smp_mb__after_atomic(); >> rcu_eqs_special_exit(); >> >> barrier(); >> this_cpu_write(rcu_nmi_needs_eqs_special, 0); >> >> >> Then rcu_nmi_enter() would call rcu_eqs_special_exit() if the dynticks >> bit is set *or* rcu_nmi_needs_eqs_special is set. >> >> Does that make sense? > > I believe that rcu_eqs_special_set() returning false covers this, but > could easily be missing something. I think you're right. I'll stare at it some more when I do the actual TLB flush patch. --Andy