From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: "Borislav Petkov" <bp@alien8.de>, "X86 ML" <x86@kernel.org>,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Peter Zijlstra" <peterz@infradead.org>,
"Oleg Nesterov" <oleg@redhat.com>,
"Tony Luck" <tony.luck@intel.com>,
"Andi Kleen" <andi@firstfloor.org>,
"Josh Triplett" <josh@joshtriplett.org>,
"Frédéric Weisbecker" <fweisbec@gmail.com>
Subject: Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context
Date: Mon, 24 Nov 2014 14:34:07 -0800 [thread overview]
Message-ID: <20141124223407.GB8512@linux.vnet.ibm.com> (raw)
In-Reply-To: <20141124213501.GX5050@linux.vnet.ibm.com>
On Mon, Nov 24, 2014 at 01:35:01PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 24, 2014 at 01:02:51PM -0800, Andy Lutomirski wrote:
> > On Mon, Nov 24, 2014 at 12:54 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > On Mon, Nov 24, 2014 at 12:22:13PM -0800, Andy Lutomirski wrote:
> > >> On Sat, Nov 22, 2014 at 3:41 PM, Paul E. McKenney
> > >> <paulmck@linux.vnet.ibm.com> wrote:
> > >> > On Fri, Nov 21, 2014 at 09:53:29PM -0800, Andy Lutomirski wrote:
> > >> >> On Fri, Nov 21, 2014 at 8:20 PM, Paul E. McKenney
> > >> >> <paulmck@linux.vnet.ibm.com> wrote:
> > >> >> > On Fri, Nov 21, 2014 at 06:00:14PM -0800, Andy Lutomirski wrote:
> > >> >> >> On Fri, Nov 21, 2014 at 3:38 PM, Paul E. McKenney
> > >> >> >> <paulmck@linux.vnet.ibm.com> wrote:
> > >>
> > >> > Returning state sounds like a bad idea, if we can reasonably avoid it.
> > >>
> > >> I agree, except that we already do it for exception_enter(), etc. But
> > >> yes, changing fewer things is nice.
> > >>
> > >> >
> > >> > And I think I finally see what you are pointing out about my code: If
> > >> > another NMI comes in between the time I increment ->dynticks_nmi_nesting
> > >> > and the time I atomically increment ->dynticks, the nested NMI handler
> > >> > will incorrectly believe that RCU is already paying attention to this CPU.
> > >> > Which would indeed not be at all good, so good catch!!!
> > >> >
> > >> >> Otherwise, I think that there may need to be enough state somewhere so
> > >> >> that the outermost nested rcu_nmi_enter knows whether to increment
> > >> >> dynticks. For example, dynticks_nmi_nesting could store the nesting
> > >> >> count * 2 - (1 if the outermost nested user needs to increment
> > >> >> dynticks). Something like:
> > >> >>
> > >> >> void rcu_nmi_enter(void)
> > >> >> {
> > >> >> /* Be very careful -- this function may be called reentrently on the
> > >> >> same CPU. */
> > >> >> atomically: increment dynticks if it's even.
> > >> >>
> > >> >> /* If an rcu_nmi_enter/rcu_nmi_exit pair happens here, then it will not change
> > >> >> * the state. */
> > >> >>
> > >> >> local_inc(&dynticks_nmi_nesting, (we incremented dynticks ? 1 : 2));
> > >> >>
> > >> >> WARN_ON(we incremented dynticks and dynticks_nmi_nesting was nonzero);
> > >> >> }
> > >> >>
> > >> >> void rcu_nmi_exit(void)
> > >> >> {
> > >> >> WARN_ON(!(dynticks & 1));
> > >> >> locally atomically: dynticks_nmi_nesting -= 2, unless
> > >> >> dynticks_nmi_nesting == 1, in which case set it to zero
> > >> >>
> > >> >> if (dynticks_nmi_nesting was 1)
> > >> >> atomic_inc(&dynticks);
> > >> >> }
> > >> >>
> > >> >> The invariant here is that, for a single unnested enter/exit, if
> > >> >> dynticks_nmi_nesting != 0, then dynticks is odd. As a result, an
> > >> >> rcu_nmi_enter/rcu_nmi_exit pair at any time when dynticks_nmi_nesting
> > >> >> != 0 *or* dynticks is odd will have no net effect, so the invariant,
> > >> >> in fact, holds for all invocations, nested or otherwise.
> > >> >>
> > >> >> At least one of those conditions is true at all times during the
> > >> >> execution of outermost pair, starting with the first atomic operation
> > >> >> and ending with the final atomic_inc. So they nest properly no matter
> > >> >> what else happens (unless, of course, someone else pokes dynticks in
> > >> >> the middle).
> > >> >>
> > >> >> Thoughts?
> > >> >
> > >> > Let's see... The evenness of ->dynticks should be preserved by nested NMI
> > >> > handlers, so the check and increment need not be atomic. We don't have
> > >> > any way (other than atomic operations) to do local atomic modifications
> > >> > on all architectures, because we cannot mask NMIs. (Yes, it can work
> > >> > on x86, but this is common code that needs to work everywhere.) On the
> > >> > other hand, presumably NMIs are rare, so atomic modification of the NMI
> > >> > nesting counter should be OK, at least if it proves absolutely necessary.
> > >> > And I am thinking that a mechanical proof will be needed here. :-/
> > >> >
> > >> > But first, let me try generating the code and informally evaluating it:
> > >> >
> > >> > 1 struct rcu_dynticks {
> > >> > 2 long long dynticks_nesting;
> > >> > 3 int dynticks_nmi_nesting;
> > >> > 4 atomic_t dynticks;
> > >> > 5 };
> > >> > 6
> > >> > 7 void rcu_nmi_enter(void)
> > >> > 8 {
> > >> > 9 struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> > >> > 10 int incby = 2;
> > >> > 11
> > >> > 12 if (!(atomic_read(&rdtp->dynticks) & 0x1)) {
> > >> > 13 smp_mb__before_atomic();
> > >> > 14 atomic_inc(&rdtp->dynticks);
> > >> > 15 smp_mb__after_atomic();
> > >> > 16 WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> > >> > 17 incby = 1;
> > >>
> > >> WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 1) here, perhaps?
> > >
> > > That would make sense.
> > >
> > >> > 18 }
> > >> > 19 rdtp->dynticks_nmi_nesting += incby;
> > >>
> > >> Oh, I see why you don't need local_add -- it's because an nmi in the
> > >> middle of this increment won't have any effect on the interrupted
> > >> code, so even a software RMW will be okay.
> > >
> > > Yep! ;-)
> > >
> > >> > 20 barrier();
> > >> > 21 }
> > >> > 22
> > >> > 23 void rcu_nmi_exit(void)
> > >> > 24 {
> > >> > 25 struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> > >> > 26
> > >> > 27 WARN_ON_ONCE(!rdtp->dynticks_nmi_nesting);
> > >> > 28 WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> > >> > 29 if (rdtp->dynticks_nmi_nesting != 1) {
> > >>
> > >> WARN_ON_ONCE(rdtp->dynticks_nmi_nesting < 2), perhaps?
> > >
> > > This is already implied by the WARN_ON_ONCE() on line 27 and the check
> > > on line 29.
> >
> > I was worried about negative numbers. Maybe change line 27 to
> > WARN_ON_ONCE(rdtp->dynticks_nmi_nesting <= 0), then? (Or is it
> > unsigned? If so, let's make to signed to catch this type of error.)
>
> Good point, they are signed, so your WARN_ON_ONCE() would work.
>
> > >> > 30 rdtp->dynticks_nmi_nesting -= 2;
> > >> > 31 return;
> > >> > 32 }
> > >> > 33 rdtp->dynticks_nmi_nesting = 0;
> > >> > 34 smp_mb__before_atomic();
> > >>
> > >> This implies barrier(), right?
> > >
> > > Yep!
> > >
> > >> > 35 atomic_inc(&rdtp->dynticks);
> > >> > 36 smp_mb__after_atomic();
> > >> > 37 WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
> > >> > 38 }
> > >> >
> > >> > Line 9 picks up a pointer to this CPU's rcu_dynticks structure and line 10
> > >> > assumes that we don't need to increment ->dynticks.
> > >> >
> > >> > Line 12 checks to see if ->dynticks is even. Note that this check is
> > >> > stable: If there are nested NMIs, they will increment ->dynticks twice
> > >> > or not at all, and either way preserves the evenness (to be proven, of
> > >> > course, but that is the plan). If ->dynticks is even, lines 13-15
> > >> > atomically increment it, line 16 complains if still even, and line 17
> > >> > says we will increment ->dynticks_nmi_nesting by only 1.
> > >> >
> > >> > Either way, line 19 increments ->dynticks_nmi_nesting as needed and
> > >> > line 20 keeps the compiler from getting too cute.
> > >> >
> > >> > For rcu_nmi_exit(), line 25 again picks up this CPUs rcu_dynticks
> > >> > structure. Lines 27 and 28 complain bitterly if invariants are violated.
> > >> > If line 29 finds that the value of ->dynticks_nmi_nesting is not 1,
> > >> > then line 30 subtracts 2 from ->dynticks_nmi_nesting and line 31 returns.
> > >> >
> > >> > Otherwise, line 33 sets ->dynticks_nmi_nesting to zero, lines 34-36
> > >> > atomically increment ->dynticks with full ordering, and line 37
> > >> > complains bitterly if ->dynticks is not even.
> > >> >
> > >> > So, if an NMI occurs before rcu_nmi_enter's atomic increment, then the
> > >> > nested NMI's rcu_nmi_enter() and rcu_nmi_exit() will think that they are
> > >> > not nested, which is the correct thing for them to think in that case.
> > >> > They will increment ->dynticks twice and restore ->dynticks_nmi_nesting
> > >> > to zero (adding and then subtracting 1). If the NMI happens after the
> > >> > atomic increment, then the nested rcu_nmi_enter() and rcu_nmi_exit()
> > >> > will leave ->dynticks alone, and will restore ->dynticks_nmi_nesting
> > >> > to zero (adding and subtracting two again). If the NMI happens after
> > >> > the increment of ->dynticks_nmi_nesting, the nested NMI's rcu_nmi_enter()
> > >> > and rcu_nmi_exit() will again restore ->dynticks_nmi_nesting, but this
> > >> > time to one (again adding and subtracting two).
> > >> >
> > >> > In rcu_nmi_exit(), ->dynticks_nmi_nesting of zero had better not happen,
> > >> > one means we need to atomically increment ->dynticks, and other values
> > >> > mean that we are partially or fully nested. Reasoning proceeds as for
> > >> > rcu_nmi_enter(), but in the opposite direction.
> > >> >
> > >> > Whew! That might even work.
> > >>
> > >> I think I like this, with the warnings above.
> > >
> > > OK with dropping the one that I called out as redundant?
> >
> > Sure, but see about.
> >
> > >
> > >> > But how about taking a different approach. Assuming that there can
> > >> > never be more than (say) 14 nesting NMI-like things, use the lower
> > >> > four bits of ->dynticks to represent the NMI nesting and the upper
> > >> > 28 bits as the counter. This of course requires modifying lots of
> > >> > places in RCU that check the counter, but it is probably time to
> > >> > abstract the check anyway.
> > >> >
> > >> > This would allow my earlier attempted logic to work and (maybe) simplify
> > >> > the reasoning a bit (and yes, the "magic" constants need macros):
> > >> >
> > >> > void rcu_nmi_enter(void)
> > >> > {
> > >> > struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> > >> > int nesting = atomic_read(&rdtp->dynticks) & 0xf;
> > >> > int incby = 0x01;
> > >> >
> > >> > WARN_ON_ONCE(nexting == 0xf);
> > >> > if (nesting == 0) {
> > >> > if (atomic_read(&rdtp->dynticks) & 0x10)
> > >> > return;
> > >> > incby = 0x11;
> > >> > }
> > >> > smp_mb__before_atomic();
> > >> > atomic_add(&rdtp->dynticks, incby);
> > >> > smp_mb__after_atomic();
> > >> > WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks) & 0x1));
> > >> > }
> > >> >
> > >> > void rcu_nmi_exit(void)
> > >> > {
> > >> > struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> > >> > int nesting = atomic_read(&rdtp->dynticks) & 0xf;
> > >> > int incby = 0x0f;
> > >> >
> > >> > if (nesting == 0)
> > >> > return;
> > >> > if (nesting > 1)
> > >> > incby = -1;
> > >> > smp_mb__before_atomic();
> > >> > atomic_add(&rdtp->dynticks, incby);
> > >> > smp_mb__after_atomic();
> > >> > WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);
> > >> > }
> > >> >
> > >> > Over to you! ;-)
> > >>
> > >> This latter one is all you :)
> > >
> > > Well, let's see how I feel about it after trying a Promela model of
> > > the first code sequence. ;-)
> >
> > :)
> >
> > Does Promela understand the differences between this type of
> > reentrancy and real threading?
>
> Not as far as I know. But it can be tricked into making this distinction.
> One thread just has the Promela code as is, and the other thread has
> the same Promela code entirely contained in an atomic block. This means
> that the entire second thread must executed at one point in the first
> thread, just like an NMI would.
And the following Promela model claims that your approach works.
Should I trust it? ;-)
Thanx, Paul
------------------------------------------------------------------------
/*
* Promela model for Andy Lutomirski's suggested change to rcu_nmi_enter()
* that allows nesting.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, you can access it online at
* http://www.gnu.org/licenses/gpl-2.0.html.
*
* Copyright IBM Corporation, 2014
*
* Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
*/
byte dynticks_nesting;
byte dynticks_nmi_nesting;
byte dynticks;
byte busy;
/*
* Promela verision of rcu_nmi_enter().
*/
inline rcu_nmi_enter()
{
assert(dynticks_nmi_nesting >= 0);
if
:: (dynticks & 1) == 0 ->
atomic {
dynticks = dynticks + 1;
}
assert((dynticks & 1) == 1);
dynticks_nmi_nesting = dynticks_nmi_nesting + 1;
assert(dynticks_nmi_nesting >= 1);
:: else ->
dynticks_nmi_nesting = dynticks_nmi_nesting + 2;
fi;
}
/*
* Promela verision of rcu_nmi_exit().
*/
inline rcu_nmi_exit()
{
assert(dynticks_nmi_nesting > 0);
assert((dynticks & 1) != 0);
if
:: dynticks_nmi_nesting != 1 ->
dynticks_nmi_nesting = dynticks_nmi_nesting - 2;
:: else ->
dynticks_nmi_nesting = 0;
atomic {
dynticks = dynticks + 1;
}
assert((dynticks & 1) == 0);
fi;
}
/*
* Base-level NMI runs non-atomically. Crudely emulates process-level
* dynticks-idle entry/exit.
*/
proctype base_NMI()
{
do
:: if
:: 1 -> atomic {
dynticks = dynticks + 1;
}
busy = 0;
:: 1 -> skip;
fi;
rcu_nmi_enter();
assert((dynticks & 1) == 1);
rcu_nmi_exit();
if
:: busy -> skip;
:: !busy ->
atomic {
dynticks = dynticks + 1;
}
busy = 1;
fi;
od;
}
/*
* Nested NMI runs atomically to emulate interrupting base_level().
*/
proctype nested_NMI()
{
do
:: atomic {
rcu_nmi_enter();
assert((dynticks & 1) == 1);
rcu_nmi_exit();
}
od;
}
init {
dynticks_nesting = 0;
dynticks_nmi_nesting = 0;
dynticks = 0;
busy = 0;
run base_NMI();
run nested_NMI();
}
next prev parent reply other threads:[~2014-11-24 22:34 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-21 21:26 [PATCH v4 0/5] x86: Rework IST interrupts Andy Lutomirski
2014-11-21 21:26 ` [PATCH v4 1/5] uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME Andy Lutomirski
2014-11-22 16:55 ` Borislav Petkov
2014-11-24 17:58 ` Andy Lutomirski
2014-11-21 21:26 ` [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context Andy Lutomirski
2014-11-21 21:32 ` Andy Lutomirski
2014-11-21 22:07 ` Paul E. McKenney
2014-11-21 22:19 ` Andy Lutomirski
2014-11-21 22:55 ` Paul E. McKenney
2014-11-21 23:06 ` Andy Lutomirski
2014-11-21 23:38 ` Paul E. McKenney
2014-11-22 2:00 ` Andy Lutomirski
2014-11-22 4:20 ` Paul E. McKenney
2014-11-22 5:53 ` Andy Lutomirski
2014-11-22 23:41 ` Paul E. McKenney
2014-11-24 20:22 ` Andy Lutomirski
2014-11-24 20:54 ` Paul E. McKenney
2014-11-24 21:02 ` Andy Lutomirski
2014-11-24 21:35 ` Paul E. McKenney
2014-11-24 22:34 ` Paul E. McKenney [this message]
2014-11-24 22:36 ` Andy Lutomirski
2014-11-24 22:57 ` Paul E. McKenney
2014-11-24 23:31 ` Paul E. McKenney
2014-11-24 23:35 ` Andy Lutomirski
2014-11-24 23:50 ` Paul E. McKenney
2014-11-24 23:52 ` Andy Lutomirski
2014-11-25 18:58 ` Borislav Petkov
2014-11-25 19:16 ` Paul E. McKenney
2014-12-11 0:22 ` Tony Luck
2014-12-11 0:24 ` Andy Lutomirski
2015-01-05 21:46 ` Tony Luck
2015-01-05 21:54 ` Andy Lutomirski
2015-01-06 0:44 ` [PATCH] x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks Luck, Tony
2015-01-06 1:01 ` Andy Lutomirski
2015-01-06 18:00 ` Luck, Tony
2015-01-07 12:13 ` Borislav Petkov
2015-01-07 15:51 ` Andy Lutomirski
2015-01-07 15:58 ` Borislav Petkov
2015-01-07 16:12 ` Paul E. McKenney
2014-11-25 17:13 ` [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context Paul E. McKenney
2014-11-27 7:03 ` Lai Jiangshan
2014-11-27 16:46 ` Paul E. McKenney
2014-11-24 21:27 ` Paul E. McKenney
2014-11-21 22:20 ` Frederic Weisbecker
2014-11-21 22:00 ` Paul E. McKenney
2014-11-22 17:20 ` Borislav Petkov
2014-11-24 19:48 ` Andy Lutomirski
2015-01-22 21:52 ` Sasha Levin
2015-01-23 17:58 ` Andy Lutomirski
2015-01-23 18:04 ` Borislav Petkov
2015-01-23 18:34 ` Andy Lutomirski
2015-01-23 20:48 ` Sasha Levin
2015-01-24 1:25 ` Andy Lutomirski
2015-01-28 16:33 ` Andy Lutomirski
2015-01-28 17:48 ` Paul E. McKenney
2015-01-28 21:02 ` Andy Lutomirski
2015-01-30 19:57 ` Sasha Levin
2015-01-31 1:28 ` Sasha Levin
2015-01-31 3:12 ` Andy Lutomirski
2015-01-31 12:50 ` Andy Lutomirski
2015-01-31 13:01 ` [PATCH] x86, traps: Fix ist_enter from userspace Andy Lutomirski
2015-01-31 15:09 ` Sasha Levin
2015-01-31 16:18 ` Paul E. McKenney
2015-02-01 2:17 ` Andy Lutomirski
2015-02-04 6:01 ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2014-11-21 21:26 ` [PATCH v4 3/5] x86, entry: Switch stacks on a paranoid entry " Andy Lutomirski
2014-11-24 15:55 ` Borislav Petkov
2014-11-21 21:26 ` [PATCH v4 4/5] x86: Clean up current_stack_pointer Andy Lutomirski
2014-11-24 11:39 ` Borislav Petkov
2014-11-21 21:26 ` [PATCH v4 5/5] x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic Andy Lutomirski
2014-11-24 15:54 ` Borislav Petkov
2014-11-24 19:52 ` Andy Lutomirski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141124223407.GB8512@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=andi@firstfloor.org \
--cc=bp@alien8.de \
--cc=fweisbec@gmail.com \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@amacapital.net \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).