All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: "Borislav Petkov" <bp@alien8.de>, "X86 ML" <x86@kernel.org>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"Tony Luck" <tony.luck@intel.com>,
	"Andi Kleen" <andi@firstfloor.org>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>
Subject: Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context
Date: Fri, 21 Nov 2014 20:20:56 -0800	[thread overview]
Message-ID: <20141122042056.GY5050@linux.vnet.ibm.com> (raw)
In-Reply-To: <CALCETrVNnDEcVu_BteA2gVBeRB7zOsQmrA6JCrE8ez4U=vc0MQ@mail.gmail.com>

On Fri, Nov 21, 2014 at 06:00:14PM -0800, Andy Lutomirski wrote:
> On Fri, Nov 21, 2014 at 3:38 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Fri, Nov 21, 2014 at 03:06:48PM -0800, Andy Lutomirski wrote:
> >> On Fri, Nov 21, 2014 at 2:55 PM, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> > On Fri, Nov 21, 2014 at 02:19:17PM -0800, Andy Lutomirski wrote:
> >> >> On Fri, Nov 21, 2014 at 2:07 PM, Paul E. McKenney
> >> >> <paulmck@linux.vnet.ibm.com> wrote:
> >> >> > On Fri, Nov 21, 2014 at 01:32:50PM -0800, Andy Lutomirski wrote:
> >> >> >> On Fri, Nov 21, 2014 at 1:26 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> >> >> > We currently pretend that IST context is like standard exception
> >> >> >> > context, but this is incorrect.  IST entries from userspace are like
> >> >> >> > standard exceptions except that they use per-cpu stacks, so they are
> >> >> >> > atomic.  IST entries from kernel space are like NMIs from RCU's
> >> >> >> > perspective -- they are not quiescent states even if they
> >> >> >> > interrupted the kernel during a quiescent state.
> >> >> >> >
> >> >> >> > Add and use ist_enter and ist_exit to track IST context.  Even
> >> >> >> > though x86_32 has no IST stacks, we track these interrupts the same
> >> >> >> > way.
> >> >> >>
> >> >> >> I should add:
> >> >> >>
> >> >> >> I have no idea why RCU read-side critical sections are safe inside
> >> >> >> __do_page_fault today.  It's guarded by exception_enter(), but that
> >> >> >> doesn't do anything if context tracking is off, and context tracking
> >> >> >> is usually off. What am I missing here?
> >> >> >
> >> >> > Ah!  There are three cases:
> >> >> >
> >> >> > 1.      Context tracking is off on a non-idle CPU.  In this case, RCU is
> >> >> >         still paying attention to CPUs running in both userspace and in
> >> >> >         the kernel.  So if a page fault happens, RCU will be set up to
> >> >> >         notice any RCU read-side critical sections.
> >> >> >
> >> >> > 2.      Context tracking is on on a non-idle CPU.  In this case, RCU
> >> >> >         might well be ignoring userspace execution: NO_HZ_FULL and
> >> >> >         all that.  However, as you pointed out, in this case the
> >> >> >         context-tracking code lets RCU know that we have entered the
> >> >> >         kernel, which means that RCU will again be paying attention to
> >> >> >         RCU read-side critical sections.
> >> >> >
> >> >> > 3.      The CPU is idle.  In this case, RCU is ignoring the CPU, so
> >> >> >         if we take a page fault when context tracking is off, life
> >> >> >         will be hard.  But the kernel is not supposed to take page
> >> >> >         faults in the idle loop, so this is not a problem.
> >> >>
> >> >> I guess so, as long as there are really no page faults in the idle loop.
> >> >
> >> > As far as I know, there are not.  If there are, someone needs to let
> >> > me know!  ;-)
> >> >
> >> >> There are, however, machine checks in the idle loop, and maybe kprobes
> >> >> (haven't checked), so I think this patch might fix real bugs.
> >> >
> >> > If you can get ISTs from the idle loop, then the patch is needed.
> >> >
> >> >> > Just out of curiosity...  Can an NMI occur in IST context?  If it can,
> >> >> > I need to make rcu_nmi_enter() and rcu_nmi_exit() deal properly with
> >> >> > nested calls.
> >> >>
> >> >> Yes, and vice versa.  That code looked like it handled nesting
> >> >> correctly, but I wasn't entirely sure.
> >> >
> >> > It currently does not, please see below patch.  Are you able to test
> >> > nesting?  It would be really cool if you could do so -- I have no
> >> > way to test this patch.
> >>
> >> I can try.  It's sort of easy -- I'll put an int3 into do_nmi and add
> >> a fixup to avoid crashing.
> >>
> >> What should I look for?  Should I try to force full nohz on and assert
> >> something?  I don't really know how to make full nohz work.
> >
> > You should look for the WARN_ON_ONCE() calls in rcu_nmi_enter() and
> > rcu_nmi_exit() to fire.
> 
> No warning with or without your patch, maybe because all of those
> returns skip the labels.

I will be guardedly optimistic and take this as a good sign.  ;-)

> Also, an NMI can happen *during* rcu_nmi_enter or rcu_nmi_exit.  Is
> that okay?  Should those dynticks_nmi_nesting++ things be local_inc
> and local_dec_and_test?

Yep, it is OK during rcu_nmi_enter() or rcu_nmi_exit().  The nested
NMI will put the dynticks_nmi_nesting counter back where it was, so
no chance of confusion.

> That dynticks_nmi_nesting thing seems scary to me.  Shouldn't the code
> unconditionally increment dynticks_nmi_nesting in rcu_nmi_enter and
> unconditionally decrement it in rcu_nmi_exit?

You might be able to get that to work, but the reason it is not done
that way is because we might get an NMI while not in dyntick-idle
state.  In that case, it would be very bad to atomically increment
rcu_dynticks, because that would tell RCU to ignore the CPU while it
was in the NMI handler, which is the opposite of what we want.

But what did you have in mind?

							Thanx, Paul

> --Andy
> 
> >
> >                                                         Thanx, Paul
> >
> >> >> Also, just to make sure: are we okay if rcu_nmi_enter() is called
> >> >> before exception_enter if context tracking is on and we came directly
> >> >> from userspace?
> >> >
> >> > If I understand correctly, this will result in context tracking invoking
> >> > rcu_user_enter(), which will result in the rcu_dynticks counter having an
> >> > odd value.  In that case, rcu_nmi_enter() will notice that RCU is already
> >> > paying attention to this CPU via its check of atomic_read(&rdtp->dynticks)
> >> > & 0x1), and will thus just return.  The matching rcu_nmi_exit() will
> >> > notice that the nesting count is zero, and will also just return.
> >> >
> >> > Thus, everything works in that case.
> >> >
> >> > In contrast, if rcu_nmi_enter() was invoked from the idle loop, it
> >> > would see that RCU is not paying attention to this CPU and that the
> >> > NMI nesting depth (which rcu_nmi_enter() increments) used to be zero.
> >> > It would then atomically increment rtdp->dynticks, forcing RCU to start
> >> > paying attention to this CPU.  The matching rcu_nmi_exit() will see
> >> > that the nesting count was non-zero, but became zero when decremented.
> >> > This will cause rcu_nmi_exit() to atomically increment rtdp->dynticks,
> >> > which will tell RCU to stop paying attention to this CPU.
> >> >
> >> >                                                         Thanx, Paul
> >> >
> >> > ------------------------------------------------------------------------
> >> >
> >> > rcu: Make rcu_nmi_enter() handle nesting
> >> >
> >> > Andy Lutomirski is introducing ISTs into x86, which from RCU's
> >> > viewpoint are NMIs.  Because ISTs and NMIs can nest, rcu_nmi_enter()
> >> > and rcu_nmi_exit() must now correctly handle nesting.  As luck would
> >> > have it, rcu_nmi_exit() handles nesting but rcu_nmi_enter() does not.
> >> > This patch therefore makes rcu_nmi_enter() handle nesting.
> >>
> >> Thanks.  Should I add this to v5 of my series?
> >>
> >> --Andy
> >>
> >> >
> >> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >> >
> >> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> > index 8749f43f3f05..875421aff6e3 100644
> >> > --- a/kernel/rcu/tree.c
> >> > +++ b/kernel/rcu/tree.c
> >> > @@ -770,7 +770,8 @@ void rcu_nmi_enter(void)
> >> >         if (rdtp->dynticks_nmi_nesting == 0 &&
> >> >             (atomic_read(&rdtp->dynticks) & 0x1))
> >> >                 return;
> >> > -       rdtp->dynticks_nmi_nesting++;
> >> > +       if (rdtp->dynticks_nmi_nesting++ != 0)
> >> > +               return; /* Nested NMI/IST/whatever. */
> >> >         smp_mb__before_atomic();  /* Force delay from prior write. */
> >> >         atomic_inc(&rdtp->dynticks);
> >> >         /* CPUs seeing atomic_inc() must see later RCU read-side crit sects */
> >> >
> >>
> >>
> >>
> >> --
> >> Andy Lutomirski
> >> AMA Capital Management, LLC
> >>
> >
> 
> 
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC
> 


  reply	other threads:[~2014-11-22  4:21 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-21 21:26 [PATCH v4 0/5] x86: Rework IST interrupts Andy Lutomirski
2014-11-21 21:26 ` [PATCH v4 1/5] uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME Andy Lutomirski
2014-11-22 16:55   ` Borislav Petkov
2014-11-24 17:58     ` Andy Lutomirski
2014-11-21 21:26 ` [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context Andy Lutomirski
2014-11-21 21:32   ` Andy Lutomirski
2014-11-21 22:07     ` Paul E. McKenney
2014-11-21 22:19       ` Andy Lutomirski
2014-11-21 22:55         ` Paul E. McKenney
2014-11-21 23:06           ` Andy Lutomirski
2014-11-21 23:38             ` Paul E. McKenney
2014-11-22  2:00               ` Andy Lutomirski
2014-11-22  4:20                 ` Paul E. McKenney [this message]
2014-11-22  5:53                   ` Andy Lutomirski
2014-11-22 23:41                     ` Paul E. McKenney
2014-11-24 20:22                       ` Andy Lutomirski
2014-11-24 20:54                         ` Paul E. McKenney
2014-11-24 21:02                           ` Andy Lutomirski
2014-11-24 21:35                             ` Paul E. McKenney
2014-11-24 22:34                               ` Paul E. McKenney
2014-11-24 22:36                                 ` Andy Lutomirski
2014-11-24 22:57                                   ` Paul E. McKenney
2014-11-24 23:31                                     ` Paul E. McKenney
2014-11-24 23:35                                       ` Andy Lutomirski
2014-11-24 23:50                                         ` Paul E. McKenney
2014-11-24 23:52                                           ` Andy Lutomirski
2014-11-25 18:58                                             ` Borislav Petkov
2014-11-25 19:16                                               ` Paul E. McKenney
2014-12-11  0:22                                               ` Tony Luck
2014-12-11  0:24                                                 ` Andy Lutomirski
2015-01-05 21:46                                                   ` Tony Luck
2015-01-05 21:54                                                     ` Andy Lutomirski
2015-01-06  0:44                                                       ` [PATCH] x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks Luck, Tony
2015-01-06  1:01                                                         ` Andy Lutomirski
2015-01-06 18:00                                                           ` Luck, Tony
2015-01-07 12:13                                                             ` Borislav Petkov
2015-01-07 15:51                                                               ` Andy Lutomirski
2015-01-07 15:58                                                                 ` Borislav Petkov
2015-01-07 16:12                                                                 ` Paul E. McKenney
2014-11-25 17:13                                           ` [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context Paul E. McKenney
2014-11-27  7:03                                           ` Lai Jiangshan
2014-11-27 16:46                                             ` Paul E. McKenney
2014-11-24 21:27                           ` Paul E. McKenney
2014-11-21 22:20       ` Frederic Weisbecker
2014-11-21 22:00   ` Paul E. McKenney
2014-11-22 17:20   ` Borislav Petkov
2014-11-24 19:48     ` Andy Lutomirski
2015-01-22 21:52   ` Sasha Levin
2015-01-23 17:58     ` Andy Lutomirski
2015-01-23 18:04       ` Borislav Petkov
2015-01-23 18:34         ` Andy Lutomirski
2015-01-23 20:48           ` Sasha Levin
2015-01-24  1:25             ` Andy Lutomirski
2015-01-28 16:33               ` Andy Lutomirski
2015-01-28 17:48                 ` Paul E. McKenney
2015-01-28 21:02                   ` Andy Lutomirski
2015-01-30 19:57                     ` Sasha Levin
2015-01-31  1:28                       ` Sasha Levin
2015-01-31  3:12                         ` Andy Lutomirski
2015-01-31 12:50                           ` Andy Lutomirski
2015-01-31 13:01                         ` [PATCH] x86, traps: Fix ist_enter from userspace Andy Lutomirski
2015-01-31 15:09                           ` Sasha Levin
2015-01-31 16:18                           ` Paul E. McKenney
2015-02-01  2:17                             ` Andy Lutomirski
2015-02-04  6:01                           ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2014-11-21 21:26 ` [PATCH v4 3/5] x86, entry: Switch stacks on a paranoid entry " Andy Lutomirski
2014-11-24 15:55   ` Borislav Petkov
2014-11-21 21:26 ` [PATCH v4 4/5] x86: Clean up current_stack_pointer Andy Lutomirski
2014-11-24 11:39   ` Borislav Petkov
2014-11-21 21:26 ` [PATCH v4 5/5] x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic Andy Lutomirski
2014-11-24 15:54   ` Borislav Petkov
2014-11-24 19:52     ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141122042056.GY5050@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=andi@firstfloor.org \
    --cc=bp@alien8.de \
    --cc=fweisbec@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tony.luck@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.