All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: "Borislav Petkov" <bp@alien8.de>, "X86 ML" <x86@kernel.org>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"Tony Luck" <tony.luck@intel.com>,
	"Andi Kleen" <andi@firstfloor.org>,
	"Josh Triplett" <josh@joshtriplett.org>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>
Subject: Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context
Date: Fri, 21 Nov 2014 14:07:04 -0800	[thread overview]
Message-ID: <20141121220704.GU5050@linux.vnet.ibm.com> (raw)
In-Reply-To: <CALCETrXLq1y7e_dKFPgou-FKHB6Pu-r8+t-6Ds+8=va7anBWDA@mail.gmail.com>

On Fri, Nov 21, 2014 at 01:32:50PM -0800, Andy Lutomirski wrote:
> On Fri, Nov 21, 2014 at 1:26 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > We currently pretend that IST context is like standard exception
> > context, but this is incorrect.  IST entries from userspace are like
> > standard exceptions except that they use per-cpu stacks, so they are
> > atomic.  IST entries from kernel space are like NMIs from RCU's
> > perspective -- they are not quiescent states even if they
> > interrupted the kernel during a quiescent state.
> >
> > Add and use ist_enter and ist_exit to track IST context.  Even
> > though x86_32 has no IST stacks, we track these interrupts the same
> > way.
> 
> I should add:
> 
> I have no idea why RCU read-side critical sections are safe inside
> __do_page_fault today.  It's guarded by exception_enter(), but that
> doesn't do anything if context tracking is off, and context tracking
> is usually off. What am I missing here?

Ah!  There are three cases:

1.	Context tracking is off on a non-idle CPU.  In this case, RCU is
	still paying attention to CPUs running in both userspace and in
	the kernel.  So if a page fault happens, RCU will be set up to
	notice any RCU read-side critical sections.

2.	Context tracking is on on a non-idle CPU.  In this case, RCU
	might well be ignoring userspace execution: NO_HZ_FULL and
	all that.  However, as you pointed out, in this case the
	context-tracking code lets RCU know that we have entered the
	kernel, which means that RCU will again be paying attention to
	RCU read-side critical sections.

3.	The CPU is idle.  In this case, RCU is ignoring the CPU, so
	if we take a page fault when context tracking is off, life
	will be hard.  But the kernel is not supposed to take page
	faults in the idle loop, so this is not a problem.

Make sense?

							Thanx, Paul

> --Andy
> 
> >
> > This fixes two issues:
> >
> >  - Scheduling from an IST interrupt handler will now warn.  It would
> >    previously appear to work as long as we got lucky and nothing
> >    overwrote the stack frame.  (I don't know of any bugs in this
> >    that would trigger the warning, but it's good to be on the safe
> >    side.)
> >
> >  - RCU handling in IST context was dangerous.  As far as I know,
> >    only machine checks were likely to trigger this, but it's good to
> >    be on the safe side.
> >
> > Note that the machine check handlers appears to have been missing
> > any context tracking at all before this patch.
> >
> > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Frédéric Weisbecker <fweisbec@gmail.com>
> > Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> > ---
> >  arch/x86/include/asm/traps.h         |  4 +++
> >  arch/x86/kernel/cpu/mcheck/mce.c     |  5 ++++
> >  arch/x86/kernel/cpu/mcheck/p5.c      |  6 +++++
> >  arch/x86/kernel/cpu/mcheck/winchip.c |  5 ++++
> >  arch/x86/kernel/traps.c              | 49 ++++++++++++++++++++++++++++++------
> >  5 files changed, 61 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
> > index bc8352e7010a..eb16a61bfd06 100644
> > --- a/arch/x86/include/asm/traps.h
> > +++ b/arch/x86/include/asm/traps.h
> > @@ -1,6 +1,7 @@
> >  #ifndef _ASM_X86_TRAPS_H
> >  #define _ASM_X86_TRAPS_H
> >
> > +#include <linux/context_tracking_state.h>
> >  #include <linux/kprobes.h>
> >
> >  #include <asm/debugreg.h>
> > @@ -109,6 +110,9 @@ asmlinkage void smp_thermal_interrupt(void);
> >  asmlinkage void mce_threshold_interrupt(void);
> >  #endif
> >
> > +extern enum ctx_state ist_enter(struct pt_regs *regs);
> > +extern void ist_exit(struct pt_regs *regs, enum ctx_state prev_state);
> > +
> >  /* Interrupts/Exceptions */
> >  enum {
> >         X86_TRAP_DE = 0,        /*  0, Divide-by-zero */
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 61a9668cebfd..b72509d77337 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -43,6 +43,7 @@
> >  #include <linux/export.h>
> >
> >  #include <asm/processor.h>
> > +#include <asm/traps.h>
> >  #include <asm/mce.h>
> >  #include <asm/msr.h>
> >
> > @@ -1016,6 +1017,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> >  {
> >         struct mca_config *cfg = &mca_cfg;
> >         struct mce m, *final;
> > +       enum ctx_state prev_state;
> >         int i;
> >         int worst = 0;
> >         int severity;
> > @@ -1038,6 +1040,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> >         DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
> >         char *msg = "Unknown";
> >
> > +       prev_state = ist_enter(regs);
> > +
> >         this_cpu_inc(mce_exception_count);
> >
> >         if (!cfg->banks)
> > @@ -1168,6 +1172,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> >         mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> >  out:
> >         sync_core();
> > +       ist_exit(regs, prev_state);
> >  }
> >  EXPORT_SYMBOL_GPL(do_machine_check);
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/p5.c b/arch/x86/kernel/cpu/mcheck/p5.c
> > index a3042989398c..ec2663a708e4 100644
> > --- a/arch/x86/kernel/cpu/mcheck/p5.c
> > +++ b/arch/x86/kernel/cpu/mcheck/p5.c
> > @@ -8,6 +8,7 @@
> >  #include <linux/smp.h>
> >
> >  #include <asm/processor.h>
> > +#include <asm/traps.h>
> >  #include <asm/mce.h>
> >  #include <asm/msr.h>
> >
> > @@ -17,8 +18,11 @@ int mce_p5_enabled __read_mostly;
> >  /* Machine check handler for Pentium class Intel CPUs: */
> >  static void pentium_machine_check(struct pt_regs *regs, long error_code)
> >  {
> > +       enum ctx_state prev_state;
> >         u32 loaddr, hi, lotype;
> >
> > +       prev_state = ist_enter(regs);
> > +
> >         rdmsr(MSR_IA32_P5_MC_ADDR, loaddr, hi);
> >         rdmsr(MSR_IA32_P5_MC_TYPE, lotype, hi);
> >
> > @@ -33,6 +37,8 @@ static void pentium_machine_check(struct pt_regs *regs, long error_code)
> >         }
> >
> >         add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
> > +
> > +       ist_exit(regs, prev_state);
> >  }
> >
> >  /* Set up machine check reporting for processors with Intel style MCE: */
> > diff --git a/arch/x86/kernel/cpu/mcheck/winchip.c b/arch/x86/kernel/cpu/mcheck/winchip.c
> > index 7dc5564d0cdf..bd5d46a32210 100644
> > --- a/arch/x86/kernel/cpu/mcheck/winchip.c
> > +++ b/arch/x86/kernel/cpu/mcheck/winchip.c
> > @@ -7,14 +7,19 @@
> >  #include <linux/types.h>
> >
> >  #include <asm/processor.h>
> > +#include <asm/traps.h>
> >  #include <asm/mce.h>
> >  #include <asm/msr.h>
> >
> >  /* Machine check handler for WinChip C6: */
> >  static void winchip_machine_check(struct pt_regs *regs, long error_code)
> >  {
> > +       enum ctx_state prev_state = ist_enter(regs);
> > +
> >         printk(KERN_EMERG "CPU0: Machine Check Exception.\n");
> >         add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
> > +
> > +       ist_exit(regs, prev_state);
> >  }
> >
> >  /* Set up machine check reporting on the Winchip C6 series */
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index 0d0e922fafc1..f5c4b8813774 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -107,6 +107,39 @@ static inline void preempt_conditional_cli(struct pt_regs *regs)
> >         preempt_count_dec();
> >  }
> >
> > +enum ctx_state ist_enter(struct pt_regs *regs)
> > +{
> > +       /*
> > +        * We are atomic because we're on the IST stack (or we're on x86_32,
> > +        * in which case we still shouldn't schedule.
> > +        */
> > +       preempt_count_add(HARDIRQ_OFFSET);
> > +
> > +       if (user_mode_vm(regs)) {
> > +               /* Other than that, we're just an exception. */
> > +               return exception_enter();
> > +       } else {
> > +               /*
> > +                * We might have interrupted pretty much anything.  In
> > +                * fact, if we're a machine check, we can even interrupt
> > +                * NMI processing.  We don't want in_nmi() to return true,
> > +                * but we need to notify RCU.
> > +                */
> > +               rcu_nmi_enter();
> > +               return IN_KERNEL;  /* the value is irrelevant. */
> > +       }
> > +}
> > +
> > +void ist_exit(struct pt_regs *regs, enum ctx_state prev_state)
> > +{
> > +       preempt_count_sub(HARDIRQ_OFFSET);
> > +
> > +       if (user_mode_vm(regs))
> > +               return exception_exit(prev_state);
> > +       else
> > +               rcu_nmi_exit();
> > +}
> > +
> >  static nokprobe_inline int
> >  do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str,
> >                   struct pt_regs *regs, long error_code)
> > @@ -244,14 +277,14 @@ dotraplinkage void do_stack_segment(struct pt_regs *regs, long error_code)
> >  {
> >         enum ctx_state prev_state;
> >
> > -       prev_state = exception_enter();
> > +       prev_state = ist_enter(regs);
> >         if (notify_die(DIE_TRAP, "stack segment", regs, error_code,
> >                        X86_TRAP_SS, SIGBUS) != NOTIFY_STOP) {
> >                 preempt_conditional_sti(regs);
> >                 do_trap(X86_TRAP_SS, SIGBUS, "stack segment", regs, error_code, NULL);
> >                 preempt_conditional_cli(regs);
> >         }
> > -       exception_exit(prev_state);
> > +       ist_exit(regs, prev_state);
> >  }
> >
> >  dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
> > @@ -259,8 +292,8 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code)
> >         static const char str[] = "double fault";
> >         struct task_struct *tsk = current;
> >
> > -       exception_enter();
> > -       /* Return not checked because double check cannot be ignored */
> > +       ist_enter(regs);
> > +       /* Return not checked because double fault cannot be ignored */
> >         notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
> >
> >         tsk->thread.error_code = error_code;
> > @@ -343,7 +376,7 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
> >         if (poke_int3_handler(regs))
> >                 return;
> >
> > -       prev_state = exception_enter();
> > +       prev_state = ist_enter(regs);
> >  #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
> >         if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
> >                                 SIGTRAP) == NOTIFY_STOP)
> > @@ -369,7 +402,7 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
> >         preempt_conditional_cli(regs);
> >         debug_stack_usage_dec();
> >  exit:
> > -       exception_exit(prev_state);
> > +       ist_exit(regs, prev_state);
> >  }
> >  NOKPROBE_SYMBOL(do_int3);
> >
> > @@ -433,7 +466,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
> >         unsigned long dr6;
> >         int si_code;
> >
> > -       prev_state = exception_enter();
> > +       prev_state = ist_enter(regs);
> >
> >         get_debugreg(dr6, 6);
> >
> > @@ -508,7 +541,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
> >         debug_stack_usage_dec();
> >
> >  exit:
> > -       exception_exit(prev_state);
> > +       ist_exit(regs, prev_state);
> >  }
> >  NOKPROBE_SYMBOL(do_debug);
> >
> > --
> > 1.9.3
> >
> 
> 
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC
> 


  reply	other threads:[~2014-11-21 22:07 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-21 21:26 [PATCH v4 0/5] x86: Rework IST interrupts Andy Lutomirski
2014-11-21 21:26 ` [PATCH v4 1/5] uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME Andy Lutomirski
2014-11-22 16:55   ` Borislav Petkov
2014-11-24 17:58     ` Andy Lutomirski
2014-11-21 21:26 ` [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context Andy Lutomirski
2014-11-21 21:32   ` Andy Lutomirski
2014-11-21 22:07     ` Paul E. McKenney [this message]
2014-11-21 22:19       ` Andy Lutomirski
2014-11-21 22:55         ` Paul E. McKenney
2014-11-21 23:06           ` Andy Lutomirski
2014-11-21 23:38             ` Paul E. McKenney
2014-11-22  2:00               ` Andy Lutomirski
2014-11-22  4:20                 ` Paul E. McKenney
2014-11-22  5:53                   ` Andy Lutomirski
2014-11-22 23:41                     ` Paul E. McKenney
2014-11-24 20:22                       ` Andy Lutomirski
2014-11-24 20:54                         ` Paul E. McKenney
2014-11-24 21:02                           ` Andy Lutomirski
2014-11-24 21:35                             ` Paul E. McKenney
2014-11-24 22:34                               ` Paul E. McKenney
2014-11-24 22:36                                 ` Andy Lutomirski
2014-11-24 22:57                                   ` Paul E. McKenney
2014-11-24 23:31                                     ` Paul E. McKenney
2014-11-24 23:35                                       ` Andy Lutomirski
2014-11-24 23:50                                         ` Paul E. McKenney
2014-11-24 23:52                                           ` Andy Lutomirski
2014-11-25 18:58                                             ` Borislav Petkov
2014-11-25 19:16                                               ` Paul E. McKenney
2014-12-11  0:22                                               ` Tony Luck
2014-12-11  0:24                                                 ` Andy Lutomirski
2015-01-05 21:46                                                   ` Tony Luck
2015-01-05 21:54                                                     ` Andy Lutomirski
2015-01-06  0:44                                                       ` [PATCH] x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks Luck, Tony
2015-01-06  1:01                                                         ` Andy Lutomirski
2015-01-06 18:00                                                           ` Luck, Tony
2015-01-07 12:13                                                             ` Borislav Petkov
2015-01-07 15:51                                                               ` Andy Lutomirski
2015-01-07 15:58                                                                 ` Borislav Petkov
2015-01-07 16:12                                                                 ` Paul E. McKenney
2014-11-25 17:13                                           ` [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context Paul E. McKenney
2014-11-27  7:03                                           ` Lai Jiangshan
2014-11-27 16:46                                             ` Paul E. McKenney
2014-11-24 21:27                           ` Paul E. McKenney
2014-11-21 22:20       ` Frederic Weisbecker
2014-11-21 22:00   ` Paul E. McKenney
2014-11-22 17:20   ` Borislav Petkov
2014-11-24 19:48     ` Andy Lutomirski
2015-01-22 21:52   ` Sasha Levin
2015-01-23 17:58     ` Andy Lutomirski
2015-01-23 18:04       ` Borislav Petkov
2015-01-23 18:34         ` Andy Lutomirski
2015-01-23 20:48           ` Sasha Levin
2015-01-24  1:25             ` Andy Lutomirski
2015-01-28 16:33               ` Andy Lutomirski
2015-01-28 17:48                 ` Paul E. McKenney
2015-01-28 21:02                   ` Andy Lutomirski
2015-01-30 19:57                     ` Sasha Levin
2015-01-31  1:28                       ` Sasha Levin
2015-01-31  3:12                         ` Andy Lutomirski
2015-01-31 12:50                           ` Andy Lutomirski
2015-01-31 13:01                         ` [PATCH] x86, traps: Fix ist_enter from userspace Andy Lutomirski
2015-01-31 15:09                           ` Sasha Levin
2015-01-31 16:18                           ` Paul E. McKenney
2015-02-01  2:17                             ` Andy Lutomirski
2015-02-04  6:01                           ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2014-11-21 21:26 ` [PATCH v4 3/5] x86, entry: Switch stacks on a paranoid entry " Andy Lutomirski
2014-11-24 15:55   ` Borislav Petkov
2014-11-21 21:26 ` [PATCH v4 4/5] x86: Clean up current_stack_pointer Andy Lutomirski
2014-11-24 11:39   ` Borislav Petkov
2014-11-21 21:26 ` [PATCH v4 5/5] x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic Andy Lutomirski
2014-11-24 15:54   ` Borislav Petkov
2014-11-24 19:52     ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141121220704.GU5050@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=andi@firstfloor.org \
    --cc=bp@alien8.de \
    --cc=fweisbec@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tony.luck@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.