* Request for stable 3.18.y and 3.14.y inclusion: Fix for CVE-2015-3290 (nmi)
@ 2015-08-17 10:39 Thomas D.
  2015-08-17 13:23 ` Greg KH
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas D. @ 2015-08-17 10:39 UTC (permalink / raw)
  To: stable

Hi,

I don't see the fixes for CVE-2015-3290 in 3.18.y or 3.14.y and can't
find them queued, so I am requesting the following commits for inclusion in

 - 3.18
 - 3.14

I am quoting from 4.1.6:


commit 37df1cab0c4d4ec0f4bec868b2e26b84e725c478
Author: Andy Lutomirski <luto@kernel.org>
Date:   Wed Jul 15 10:29:38 2015 -0700

    x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection

    commit 810bc075f78ff2c221536eb3008eac6a492dba2d upstream.

    We have a tricky bug in the nested NMI code: if we see RSP
    pointing to the NMI stack on NMI entry from kernel mode, we
    assume that we are executing a nested NMI.

    This isn't quite true.  A malicious userspace program can point
    RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
    happen while RSP is still pointing at the NMI stack.

    Fix it with a sneaky trick.  Set DF in the region of code that
    the RSP check is intended to detect.  IRET will clear DF
    atomically.

    ( Note: other than paravirt, there's little need for all this
      complexity. We could check RIP instead of RSP. )
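
[ Aside, not part of the quoted commit message: to make the attack
  scenario concrete, here is a minimal user-space sketch of the race.
  It is NOT the public exploit referenced later in this thread;
  NMI_STACK_GUESS is a made-up placeholder for wherever the CPU's NMI
  stack happens to live, and compilers make no guarantees once RSP is
  moved in inline asm, so treat it purely as an illustration. ]

#define NMI_STACK_GUESS 0xfffffe0000000000UL    /* made-up address */

static void syscall_with_rsp_on_nmi_stack(void)
{
        /*
         * Point RSP into the presumed NMI stack range and issue
         * SYSCALL.  If an NMI fires in the window before the kernel
         * switches stacks, the pre-fix entry code sees RSP inside
         * the NMI stack and wrongly takes the nested-NMI path.
         * getppid(2) is just an arbitrary cheap system call.
         */
        asm volatile("movq %%rsp, %%rbx\n\t"    /* save the real RSP  */
                     "movq %0, %%rsp\n\t"       /* RSP := "NMI stack" */
                     "movl $110, %%eax\n\t"     /* __NR_getppid       */
                     "syscall\n\t"
                     "movq %%rbx, %%rsp"        /* restore RSP        */
                     : : "r" (NMI_STACK_GUESS)
                     : "rax", "rbx", "rcx", "r11", "memory");
}

int main(void)
{
        /* The real exploit also arranges a high NMI rate (e.g. via
         * perf) and loops until it hits the race window. */
        for (;;)
                syscall_with_rsp_on_nmi_stack();
}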


commit d8246ca4e3ce08c9ed98ebe292f36ee2bc5f54ab
Author: Andy Lutomirski <luto@kernel.org>
Date:   Wed Jul 15 10:29:37 2015 -0700

    x86/nmi/64: Reorder nested NMI checks

    commit a27507ca2d796cfa8d907de31ad730359c8a6d06 upstream.

    Check the repeat_nmi .. end_repeat_nmi special case first.  The
    next patch will rework the RSP check and, as a side effect, the
    RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
    we'll need this ordering of the checks.

    Note: this is more subtle than it appears.  The check for
    repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
    instead of adjusting the "iret" frame to force a repeat.  This
    is necessary, because the code between repeat_nmi and
    end_repeat_nmi sets "NMI executing" and then writes to the
    "iret" frame itself.  If a nested NMI comes in and modifies the
    "iret" frame while repeat_nmi is also modifying it, we'll end up
    with garbage.  The old code got this right, as does the new
    code, but the new code is a bit more explicit.

    If we were to move the check right after the "NMI executing"
    check, then we'd get it wrong and have random crashes.

    ( Because the "NMI executing" check would jump to the code that would
      modify the "iret" frame without checking if the interrupted NMI was
      currently modifying it. )
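
[ Aside, not part of the quoted commit message: for review purposes,
  the resulting check order can be restated in C roughly as follows.
  This is a hedged sketch, not kernel code -- the real checks are asm
  in entry_64.S, and the helper/parameter names here are invented. ]

#include <stdbool.h>
#include <stdint.h>

enum nmi_path { FIRST_NMI, NESTED_NMI, NESTED_NMI_OUT };

/*
 * rip/rsp describe the interrupted context; repeat_start/repeat_end
 * bound the repeat_nmi .. end_repeat_nmi region.
 */
static enum nmi_path classify_nmi(uint64_t rip, bool nmi_executing,
                                  bool rsp_on_nmi_stack,
                                  uint64_t repeat_start,
                                  uint64_t repeat_end)
{
        /* 1. We interrupted the "iret"-frame rewrite: must not touch
         *    the frame, just resume the outer NMI. */
        if (rip >= repeat_start && rip < repeat_end)
                return NESTED_NMI_OUT;

        /* 2. "NMI executing" is set: we are nested. */
        if (nmi_executing)
                return NESTED_NMI;

        /* 3. RSP still points into the NMI stack: we caught the outer
         *    NMI between clearing "NMI executing" and its IRET. */
        if (rsp_on_nmi_stack)
                return NESTED_NMI;

        return FIRST_NMI;
}

int main(void)
{
        /* The region check must win even when "NMI executing" is set. */
        return classify_nmi(0x105, true, true, 0x100, 0x110) ==
               NESTED_NMI_OUT ? 0 : 1;
}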


commit 1dd191d72fdfb46988d8c97c8b19fc6d83d44d3e
Author: Andy Lutomirski <luto@kernel.org>
Date:   Wed Jul 15 10:29:36 2015 -0700

    x86/nmi/64: Improve nested NMI comments

    commit 0b22930ebad563ae97ff3f8d7b9f12060b4c6e6b upstream.

    I found the nested NMI documentation to be difficult to follow.
    Improve the comments.


commit 60e6cbaf875edd9aef40948d0790decb8e1a77cc
Author: Andy Lutomirski <luto@kernel.org>
Date:   Wed Jul 15 10:29:35 2015 -0700

    x86/nmi/64: Switch stacks on userspace NMI entry

    commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.

    Returning to userspace is tricky: IRET can fail, and ESPFIX can
    rearrange the stack prior to IRET.

    The NMI nesting fixup relies on a precise stack layout and
    atomic IRET.  Rather than trying to teach the NMI nesting fixup
    to handle ESPFIX and failed IRET, punt: run NMIs that came from
    user mode on the normal kernel stack.

    This will make some nested NMIs visible to C code, but the C
    code is okay with that.

    As a side effect, this should speed up perf: it eliminates an
    RDMSR when NMIs come from user mode.


commit f163d838c24833d4493e693a612be275b723d5e8
Author: Andy Lutomirski <luto@kernel.org>
Date:   Wed Jul 15 10:29:34 2015 -0700

    x86/nmi/64: Remove asm code that saves CR2

    commit 0e181bb58143cb4a2e8f01c281b0816cd0e4798e upstream.

    Now that do_nmi saves CR2, we don't need to save it in asm.


commit e0146756cb2b7122adbe9b5bb97da4a99c5437b3
Author: Andy Lutomirski <luto@kernel.org>
Date:   Wed Jul 15 10:29:33 2015 -0700

    x86/nmi: Enable nested do_nmi() handling for 64-bit kernels

    commit 9d05041679904b12c12421cbcf9cb5f4860a8d7b upstream.

    32-bit kernels handle nested NMIs in C.  Enable the exact same
    handling on 64-bit kernels as well.  This isn't currently
    necessary, but it will become necessary once the asm code starts
    allowing limited nesting.
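
[ Aside, not part of the quoted commit message: the mechanism this
  commit enables on 64-bit is the three-state nesting latch the 32-bit
  path already uses.  Below is a minimal single-threaded user-space
  model of it; the names are made up, and the real code uses per-CPU
  state and this_cpu_dec_return() rather than a plain global. ]

#include <stdio.h>

enum { NMI_NOT_RUNNING = 0, NMI_EXECUTING = 1, NMI_LATCHED = 2 };

static int nmi_state;                   /* per-CPU in the real code */

static void handle_nmi_events(void)
{
        puts("handling NMI");
}

static void do_nmi_model(void)
{
        if (nmi_state != NMI_NOT_RUNNING) {
                /* Nested NMI: latch it; the outer invocation reruns. */
                nmi_state = NMI_LATCHED;
                return;
        }
        nmi_state = NMI_EXECUTING;
nmi_restart:
        handle_nmi_events();
        /*
         * dec_return: EXECUTING (1) drops to 0 and we are done;
         * LATCHED (2) drops to 1 (EXECUTING), meaning an NMI nested
         * meanwhile, so run the handler one more time.
         */
        if (--nmi_state != NMI_NOT_RUNNING)
                goto nmi_restart;
}

int main(void)
{
        do_nmi_model();                 /* first NMI */
        return 0;
}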



-Thomas



* Re: Request for stable 3.18.y and 3.14.y inclusion: Fix for CVE-2015-3290 (nmi)
  2015-08-17 10:39 Request for stable 3.18.y and 3.14.y inclusion: Fix for CVE-2015-3290 (nmi) Thomas D.
@ 2015-08-17 13:23 ` Greg KH
  2015-08-17 22:55   ` [PATCH-v3.14.y 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
  0 siblings, 1 reply; 21+ messages in thread
From: Greg KH @ 2015-08-17 13:23 UTC (permalink / raw)
  To: Thomas D.; +Cc: stable

On Mon, Aug 17, 2015 at 12:39:59PM +0200, Thomas D. wrote:
> Hi,
> 
> I don't see the fixes for CVE-2015-3290 in 3.18.y or 3.14.y and can't
> find them queued, so I am requesting the following commits for inclusion in
> 
>  - 3.18
>  - 3.14

Do you have a backport that is tested for these kernel versions that you
can point to for me to include in the 3.14-stable tree?

thanks,

greg k-h


* [PATCH-v3.14.y 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157
  2015-08-17 13:23 ` Greg KH
@ 2015-08-17 22:55   ` Thomas D
  2015-08-17 22:55     ` [PATCH-v3.14.y 1/6] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Thomas D
                       ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Thomas D @ 2015-08-17 22:55 UTC (permalink / raw)
  To: stable; +Cc: luto, Thomas D

Hi,

here's my backport of the CVE-2015-3290 fixes for linux-3.14.

How I tested the backport:

1. I compiled and booted vanilla linux-3.14.51.

2. I ran the public exploit for CVE-2015-3290 [1] from Andrew Lutomirski
   against the kernel. Nothing much happened, but I saw output I
   shouldn't have seen. While the exploit was still hammering the
   system, I additionally started the public exploit for CVE-2015-5157
   [2] (also from Andrew).

3. Now the system logged:

> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.874717] kernel BUG at arch/x86/kernel/traps.c:413!
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.875987] invalid opcode: 0000 [#2] SMP
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.877267] Modules linked in: xt_recent xt_comment ipt_REJECT xt_addrtype xt_mark xt_CT xt_multiport ipt_ULOG xt_NFLOG nfnetlink_log xt_LOG nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpudp xt_conntrack iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_filter ip_tables x_tables binfmt_misc coretemp microcode psmouse pcspkr libcrc32c dm_log_userspace vmxnet3 e1000 fuse nfs lockd sunrpc fscache dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log usb_storage
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.886469] CPU: 0 PID: 15061 Comm: CVE-2015-5157 Tainted: G      D      3.14.51 #1
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.888055] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.889664] task: ffff8800b9c40000 ti: ffff8800b9eb4000 task.ti: ffff8800b9eb4000
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.891250] RIP: 0010:[<ffffffff81621280>]  [<ffffffff81621280>] fixup_bad_iret+0x60/0x70
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.892913] RSP: 0000:ffff88013fc05ec8  EFLAGS: 00010046
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.894459] RAX: ffff8800b9eb5f50 RBX: ffff8800b9eb5f50 RCX: ffffffff81620827
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.895944] RDX: 0000000000000008 RSI: ffff88013fc05f70 RDI: ffff8800b9eb5fd0
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.897387] RBP: ffff88013fc05ee0 R08: 00000000ffe58efc R09: 0000000000000000
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.898796] R10: 0000000000000004 R11: 0000000000000004 R12: ffff8800b9eb6000
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.900178] R13: ffff88013fc05ef0 R14: 0000000000000000 R15: 0000000000000000
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.901554] FS:  0000000000000000(0000) GS:ffff88013fc00000(0063) knlGS:00000000f75c7940
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.903066] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.904771] CR2: 00000000f75f4320 CR3: 00000000b9e47000 CR4: 00000000001407f0
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.906599] Stack:
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.908242]  0000000000000001 0000000000000000 0000000000000000 00000000ffe58f18
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.909886]  ffffffff81620c31 ffffffff816209dc 0000000000000000 0000000000000000
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.911241]  0000000000000000 0000000000000000 00000000ffe58f18 00000000ffe58e70
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.912701] Call Trace:
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.914201]  <NMI>
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.914216]  [<ffffffff81620c31>] error_bad_iret+0xb/0x1a
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.916857]  [<ffffffff816209dc>] ? general_protection+0xc/0x30
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.918193]  [<ffffffff81620827>] ? native_iret+0x7/0x7
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.919493]  [<ffffffff81620d27>] ? first_nmi+0x1e/0x1e
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.920790]  [<ffffffff816209d0>] ? stack_segment+0x30/0x30
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.922079]  <<EOE>>
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.922092] Code: 00 00 e8 14 71 d2 ff ba 88 00 00 00 4c 89 ee 48 89 df e8 04 71 d2 ff 41 f6 44 24 e0 03 74 0c 48 89 d8 5b 41 5c 41 5d 5d c3 66 90 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.926265] RIP  [<ffffffff81621280>] fixup_bad_iret+0x60/0x70
> Aug 17 17:26:09 vm-gentoo-x64 kernel: [  808.927620]  RSP <ffff88013fc05ec8>

and finally crashed (rebooted).


4. After I backported the fixes, I re-compiled the kernel and tested again.

5. Nothing happened: no crash anymore, and no unexpected output. Well,
   that's not 100% correct; the kernel logged

Aug 17 23:52:50 vm-gentoo-x64 kernel: [  355.090003] Uhhuh. NMI received for unknown reason 31 on CPU 0.
Aug 17 23:52:50 vm-gentoo-x64 kernel: [  355.090279] Do you have a strange power saving mode enabled?
Aug 17 23:52:50 vm-gentoo-x64 kernel: [  355.090549] Dazed and confused, but trying to continue

   while running the exploit for CVE-2015-5157, but this seems to be OK.



But please, before you accept the backport, someone needs to review and
acknowledge at least commit 6d420d6f05010e7113ddf04c748ca137ed2aea54
(x86/nmi/64: Switch stacks on userspace NMI entry) in detail:

3.14.y has no "restore_c_regs_and_iret" label, so I added the "Open-code
the entire return process for compatibility with varying" block with the
additional addq/popq instructions I found in Debian's patch for 3.16.y [3].

But to be honest, I don't know what I am doing here, so please review.

Thanks!



See also:
=========
[1] http://www.openwall.com/lists/oss-security/2015/08/04/8

[2] http://www.openwall.com/lists/oss-security/2015/07/22/7

[3] https://anonscm.debian.org/cgit/kernel/linux.git/tree/debian/patches/bugfix/x86/0006-x86-nmi-64-Switch-stacks-on-userspace-NMI-entry.patch?h=jessie#n112


Andy Lutomirski (6):
  x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
  x86/nmi/64: Remove asm code that saves CR2
  x86/nmi/64: Switch stacks on userspace NMI entry
  x86/nmi/64: Improve nested NMI comments
  x86/nmi/64: Reorder nested NMI checks
  x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI
    detection

 arch/x86/kernel/entry_64.S | 296 ++++++++++++++++++++++++++++++---------------
 arch/x86/kernel/nmi.c      | 123 ++++++++-----------
 2 files changed, 249 insertions(+), 170 deletions(-)

-- 
2.5.0



* [PATCH-v3.14.y 1/6] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
  2015-08-17 22:55   ` [PATCH-v3.14.y 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
@ 2015-08-17 22:55     ` Thomas D
  2015-08-17 22:55     ` [PATCH-v3.14.y 2/6] x86/nmi/64: Remove asm code that saves CR2 Thomas D
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Thomas D @ 2015-08-17 22:55 UTC (permalink / raw)
  To: stable
  Cc: luto, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar

From: Andy Lutomirski <luto@kernel.org>

32-bit kernels handle nested NMIs in C.  Enable the exact same
handling on 64-bit kernels as well.  This isn't currently
necessary, but it will become necessary once the asm code starts
allowing limited nesting.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/nmi.c | 123 +++++++++++++++++++++-----------------------------
 1 file changed, 52 insertions(+), 71 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 6fcb49c..c1c7d41 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -392,15 +392,15 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 }
 
 /*
- * NMIs can hit breakpoints which will cause it to lose its
- * NMI context with the CPU when the breakpoint does an iret.
- */
-#ifdef CONFIG_X86_32
-/*
- * For i386, NMIs use the same stack as the kernel, and we can
- * add a workaround to the iret problem in C (preventing nested
- * NMIs if an NMI takes a trap). Simply have 3 states the NMI
- * can be in:
+ * NMIs can hit breakpoints which will cause it to lose its NMI context
+ * with the CPU when the breakpoint or page fault does an IRET.
+ *
+ * As a result, NMIs can nest if NMIs get unmasked due an IRET during
+ * NMI processing.  On x86_64, the asm glue protects us from nested NMIs
+ * if the outer NMI came from kernel mode, but we can still nest if the
+ * outer NMI came from user mode.
+ *
+ * To handle these nested NMIs, we have three states:
  *
  *  1) not running
  *  2) executing
@@ -414,15 +414,14 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
  * (Note, the latch is binary, thus multiple NMIs triggering,
  *  when one is running, are ignored. Only one NMI is restarted.)
  *
- * If an NMI hits a breakpoint that executes an iret, another
- * NMI can preempt it. We do not want to allow this new NMI
- * to run, but we want to execute it when the first one finishes.
- * We set the state to "latched", and the exit of the first NMI will
- * perform a dec_return, if the result is zero (NOT_RUNNING), then
- * it will simply exit the NMI handler. If not, the dec_return
- * would have set the state to NMI_EXECUTING (what we want it to
- * be when we are running). In this case, we simply jump back
- * to rerun the NMI handler again, and restart the 'latched' NMI.
+ * If an NMI executes an iret, another NMI can preempt it. We do not
+ * want to allow this new NMI to run, but we want to execute it when the
+ * first one finishes.  We set the state to "latched", and the exit of
+ * the first NMI will perform a dec_return, if the result is zero
+ * (NOT_RUNNING), then it will simply exit the NMI handler. If not, the
+ * dec_return would have set the state to NMI_EXECUTING (what we want it
+ * to be when we are running). In this case, we simply jump back to
+ * rerun the NMI handler again, and restart the 'latched' NMI.
  *
  * No trap (breakpoint or page fault) should be hit before nmi_restart,
  * thus there is no race between the first check of state for NOT_RUNNING
@@ -445,49 +444,36 @@ enum nmi_states {
 static DEFINE_PER_CPU(enum nmi_states, nmi_state);
 static DEFINE_PER_CPU(unsigned long, nmi_cr2);
 
-#define nmi_nesting_preprocess(regs)					\
-	do {								\
-		if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {	\
-			this_cpu_write(nmi_state, NMI_LATCHED);		\
-			return;						\
-		}							\
-		this_cpu_write(nmi_state, NMI_EXECUTING);		\
-		this_cpu_write(nmi_cr2, read_cr2());			\
-	} while (0);							\
-	nmi_restart:
-
-#define nmi_nesting_postprocess()					\
-	do {								\
-		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))	\
-			write_cr2(this_cpu_read(nmi_cr2));		\
-		if (this_cpu_dec_return(nmi_state))			\
-			goto nmi_restart;				\
-	} while (0)
-#else /* x86_64 */
+#ifdef CONFIG_X86_64
 /*
- * In x86_64 things are a bit more difficult. This has the same problem
- * where an NMI hitting a breakpoint that calls iret will remove the
- * NMI context, allowing a nested NMI to enter. What makes this more
- * difficult is that both NMIs and breakpoints have their own stack.
- * When a new NMI or breakpoint is executed, the stack is set to a fixed
- * point. If an NMI is nested, it will have its stack set at that same
- * fixed address that the first NMI had, and will start corrupting the
- * stack. This is handled in entry_64.S, but the same problem exists with
- * the breakpoint stack.
+ * In x86_64, we need to handle breakpoint -> NMI -> breakpoint.  Without
+ * some care, the inner breakpoint will clobber the outer breakpoint's
+ * stack.
  *
- * If a breakpoint is being processed, and the debug stack is being used,
- * if an NMI comes in and also hits a breakpoint, the stack pointer
- * will be set to the same fixed address as the breakpoint that was
- * interrupted, causing that stack to be corrupted. To handle this case,
- * check if the stack that was interrupted is the debug stack, and if
- * so, change the IDT so that new breakpoints will use the current stack
- * and not switch to the fixed address. On return of the NMI, switch back
- * to the original IDT.
+ * If a breakpoint is being processed, and the debug stack is being
+ * used, if an NMI comes in and also hits a breakpoint, the stack
+ * pointer will be set to the same fixed address as the breakpoint that
+ * was interrupted, causing that stack to be corrupted. To handle this
+ * case, check if the stack that was interrupted is the debug stack, and
+ * if so, change the IDT so that new breakpoints will use the current
+ * stack and not switch to the fixed address. On return of the NMI,
+ * switch back to the original IDT.
  */
 static DEFINE_PER_CPU(int, update_debug_stack);
+#endif
 
-static inline void nmi_nesting_preprocess(struct pt_regs *regs)
+dotraplinkage notrace void
+do_nmi(struct pt_regs *regs, long error_code)
 {
+	if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {
+		this_cpu_write(nmi_state, NMI_LATCHED);
+		return;
+	}
+	this_cpu_write(nmi_state, NMI_EXECUTING);
+	this_cpu_write(nmi_cr2, read_cr2());
+nmi_restart:
+
+#ifdef CONFIG_X86_64
 	/*
 	 * If we interrupted a breakpoint, it is possible that
 	 * the nmi handler will have breakpoints too. We need to
@@ -498,22 +484,8 @@ static inline void nmi_nesting_preprocess(struct pt_regs *regs)
 		debug_stack_set_zero();
 		this_cpu_write(update_debug_stack, 1);
 	}
-}
-
-static inline void nmi_nesting_postprocess(void)
-{
-	if (unlikely(this_cpu_read(update_debug_stack))) {
-		debug_stack_reset();
-		this_cpu_write(update_debug_stack, 0);
-	}
-}
 #endif
 
-dotraplinkage notrace __kprobes void
-do_nmi(struct pt_regs *regs, long error_code)
-{
-	nmi_nesting_preprocess(regs);
-
 	nmi_enter();
 
 	inc_irq_stat(__nmi_count);
@@ -523,8 +495,17 @@ do_nmi(struct pt_regs *regs, long error_code)
 
 	nmi_exit();
 
-	/* On i386, may loop back to preprocess */
-	nmi_nesting_postprocess();
+#ifdef CONFIG_X86_64
+	if (unlikely(this_cpu_read(update_debug_stack))) {
+		debug_stack_reset();
+		this_cpu_write(update_debug_stack, 0);
+	}
+#endif
+
+	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
+		write_cr2(this_cpu_read(nmi_cr2));
+	if (this_cpu_dec_return(nmi_state))
+		goto nmi_restart;
 }
 
 void stop_nmi(void)
-- 
2.5.0



* [PATCH-v3.14.y 2/6] x86/nmi/64: Remove asm code that saves CR2
  2015-08-17 22:55   ` [PATCH-v3.14.y 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
  2015-08-17 22:55     ` [PATCH-v3.14.y 1/6] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Thomas D
@ 2015-08-17 22:55     ` Thomas D
  2015-08-17 22:55     ` [PATCH-v3.14.y 3/6] x86/nmi/64: Switch stacks on userspace NMI entry Thomas D
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Thomas D @ 2015-08-17 22:55 UTC (permalink / raw)
  To: stable; +Cc: luto, Linus Torvalds, Peter Zijlstra, Thomas Gleixner, Ingo Molnar

From: Andy Lutomirski <luto@kernel.org>

Now that do_nmi saves CR2, we don't need to save it in asm.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/entry_64.S | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 06469ee..28b08345 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1885,28 +1885,10 @@ end_repeat_nmi:
 	call save_paranoid
 	DEFAULT_FRAME 0
 
-	/*
-	 * Save off the CR2 register. If we take a page fault in the NMI then
-	 * it could corrupt the CR2 value. If the NMI preempts a page fault
-	 * handler before it was able to read the CR2 register, and then the
-	 * NMI itself takes a page fault, the page fault that was preempted
-	 * will read the information from the NMI page fault and not the
-	 * origin fault. Save it off and restore it if it changes.
-	 * Use the r12 callee-saved register.
-	 */
-	movq %cr2, %r12
-
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	movq %rsp,%rdi
 	movq $-1,%rsi
 	call do_nmi
-
-	/* Did the NMI take a page fault? Restore cr2 if it did */
-	movq %cr2, %rcx
-	cmpq %rcx, %r12
-	je 1f
-	movq %r12, %cr2
-1:
 	
 	testl %ebx,%ebx				/* swapgs needed? */
 	jnz nmi_restore
-- 
2.5.0



* [PATCH-v3.14.y 3/6] x86/nmi/64: Switch stacks on userspace NMI entry
  2015-08-17 22:55   ` [PATCH-v3.14.y 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
  2015-08-17 22:55     ` [PATCH-v3.14.y 1/6] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Thomas D
  2015-08-17 22:55     ` [PATCH-v3.14.y 2/6] x86/nmi/64: Remove asm code that saves CR2 Thomas D
@ 2015-08-17 22:55     ` Thomas D
  2015-08-18 15:45       ` Jiri Slaby
  2015-08-17 22:55     ` [PATCH-v3.14.y 4/6] x86/nmi/64: Improve nested NMI comments Thomas D
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Thomas D @ 2015-08-17 22:55 UTC (permalink / raw)
  To: stable
  Cc: luto, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Greg Kroah-Hartman

From: Andy Lutomirski <luto@kernel.org>

commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.

Returning to userspace is tricky: IRET can fail, and ESPFIX can
rearrange the stack prior to IRET.

The NMI nesting fixup relies on a precise stack layout and
atomic IRET.  Rather than trying to teach the NMI nesting fixup
to handle ESPFIX and failed IRET, punt: run NMIs that came from
user mode on the normal kernel stack.

This will make some nested NMIs visible to C code, but the C
code is okay with that.

As a side effect, this should speed up perf: it eliminates an
RDMSR when NMIs come from user mode.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/entry_64.S | 77 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 73 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 28b08345..bd7d8aa 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1715,19 +1715,88 @@ ENTRY(nmi)
 	 * a nested NMI that updated the copy interrupt stack frame, a
 	 * jump will be made to the repeat_nmi code that will handle the second
 	 * NMI.
+	 *
+	 * However, espfix prevents us from directly returning to userspace
+	 * with a single IRET instruction.  Similarly, IRET to user mode
+	 * can fault.  We therefore handle NMIs from user space like
+	 * other IST entries.
 	 */
 
 	/* Use %rdx as out temp variable throughout */
 	pushq_cfi %rdx
 	CFI_REL_OFFSET rdx, 0
 
+	testb	$3, CS-RIP+8(%rsp)
+	jz	.Lnmi_from_kernel
+
+	/*
+	 * NMI from user mode.  We need to run on the thread stack, but we
+	 * can't go through the normal entry paths: NMIs are masked, and
+	 * we don't want to enable interrupts, because then we'll end
+	 * up in an awkward situation in which IRQs are on but NMIs
+	 * are off.
+	 */
+
+	SWAPGS
+	cld
+	movq	%rsp, %rdx
+	movq	PER_CPU_VAR(kernel_stack), %rsp
+	pushq	5*8(%rdx)	/* pt_regs->ss */
+	pushq	4*8(%rdx)	/* pt_regs->rsp */
+	pushq	3*8(%rdx)	/* pt_regs->flags */
+	pushq	2*8(%rdx)	/* pt_regs->cs */
+	pushq	1*8(%rdx)	/* pt_regs->rip */
+	pushq   $-1		/* pt_regs->orig_ax */
+	pushq   %rdi		/* pt_regs->di */
+	pushq   %rsi		/* pt_regs->si */
+	pushq   (%rdx)		/* pt_regs->dx */
+	pushq   %rcx		/* pt_regs->cx */
+	pushq   %rax		/* pt_regs->ax */
+	pushq   %r8		/* pt_regs->r8 */
+	pushq   %r9		/* pt_regs->r9 */
+	pushq   %r10		/* pt_regs->r10 */
+	pushq   %r11		/* pt_regs->r11 */
+	pushq	%rbx		/* pt_regs->rbx */
+	pushq	%rbp		/* pt_regs->rbp */
+	pushq	%r12		/* pt_regs->r12 */
+	pushq	%r13		/* pt_regs->r13 */
+	pushq	%r14		/* pt_regs->r14 */
+	pushq	%r15		/* pt_regs->r15 */
+
 	/*
-	 * If %cs was not the kernel segment, then the NMI triggered in user
-	 * space, which means it is definitely not nested.
+	 * At this point we no longer need to worry about stack damage
+	 * due to nesting -- we're on the normal thread stack and we're
+	 * done with the NMI stack.
 	 */
-	cmpl $__KERNEL_CS, 16(%rsp)
-	jne first_nmi
+	movq	%rsp, %rdi
+	movq	$-1, %rsi
+	call	do_nmi
+
+	/*
+	 * Return back to user mode.  We must *not* do the normal exit
+	 * work, because we don't want to enable interrupts.  Fortunately,
+	 * do_nmi doesn't modify pt_regs.
+	 */
+	SWAPGS
+
+	/*
+	 * Open-code the entire return process for compatibility with varying
+	 * register layouts across different kernel versions.
+	 */
+	addq	$6*8, %rsp	/* skip bx, bp, and r12-r15 */
+	popq	%r11		/* pt_regs->r11 */
+	popq	%r10		/* pt_regs->r10 */
+	popq	%r9		/* pt_regs->r9 */
+	popq	%r8		/* pt_regs->r8 */
+	popq	%rax		/* pt_regs->ax */
+	popq	%rcx		/* pt_regs->cx */
+	popq	%rdx		/* pt_regs->dx */
+	popq	%rsi		/* pt_regs->si */
+	popq	%rdi		/* pt_regs->di */
+	addq	$8, %rsp	/* skip orig_ax */
+	INTERRUPT_RETURN
 
+.Lnmi_from_kernel:
 	/*
 	 * Check the special variable on the stack to see if NMIs are
 	 * executing.
-- 
2.5.0



* [PATCH-v3.14.y 4/6] x86/nmi/64: Improve nested NMI comments
  2015-08-17 22:55   ` [PATCH-v3.14.y 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
                       ` (2 preceding siblings ...)
  2015-08-17 22:55     ` [PATCH-v3.14.y 3/6] x86/nmi/64: Switch stacks on userspace NMI entry Thomas D
@ 2015-08-17 22:55     ` Thomas D
  2015-08-17 22:55     ` [PATCH-v3.14.y 5/6] x86/nmi/64: Reorder nested NMI checks Thomas D
  2015-08-17 22:55     ` [PATCH-v3.14.y 6/6] x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection Thomas D
  5 siblings, 0 replies; 21+ messages in thread
From: Thomas D @ 2015-08-17 22:55 UTC (permalink / raw)
  To: stable
  Cc: luto, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Greg Kroah-Hartman

From: Andy Lutomirski <luto@kernel.org>

commit 0b22930ebad563ae97ff3f8d7b9f12060b4c6e6b upstream.

I found the nested NMI documentation to be difficult to follow.
Improve the comments.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/entry_64.S | 159 ++++++++++++++++++++++++++-------------------
 arch/x86/kernel/nmi.c      |   4 +-
 2 files changed, 93 insertions(+), 70 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index bd7d8aa..c080360 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1702,11 +1702,12 @@ ENTRY(nmi)
 	 *  If the variable is not set and the stack is not the NMI
 	 *  stack then:
 	 *    o Set the special variable on the stack
-	 *    o Copy the interrupt frame into a "saved" location on the stack
-	 *    o Copy the interrupt frame into a "copy" location on the stack
+	 *    o Copy the interrupt frame into an "outermost" location on the
+	 *      stack
+	 *    o Copy the interrupt frame into an "iret" location on the stack
 	 *    o Continue processing the NMI
 	 *  If the variable is set or the previous stack is the NMI stack:
-	 *    o Modify the "copy" location to jump to the repeate_nmi
+	 *    o Modify the "iret" location to jump to the repeat_nmi
 	 *    o return back to the first NMI
 	 *
 	 * Now on exit of the first NMI, we first clear the stack variable
@@ -1798,18 +1799,60 @@ ENTRY(nmi)
 
 .Lnmi_from_kernel:
 	/*
-	 * Check the special variable on the stack to see if NMIs are
-	 * executing.
+	 * Here's what our stack frame will look like:
+	 * +---------------------------------------------------------+
+	 * | original SS                                             |
+	 * | original Return RSP                                     |
+	 * | original RFLAGS                                         |
+	 * | original CS                                             |
+	 * | original RIP                                            |
+	 * +---------------------------------------------------------+
+	 * | temp storage for rdx                                    |
+	 * +---------------------------------------------------------+
+	 * | "NMI executing" variable                                |
+	 * +---------------------------------------------------------+
+	 * | iret SS          } Copied from "outermost" frame        |
+	 * | iret Return RSP  } on each loop iteration; overwritten  |
+	 * | iret RFLAGS      } by a nested NMI to force another     |
+	 * | iret CS          } iteration if needed.                 |
+	 * | iret RIP         }                                      |
+	 * +---------------------------------------------------------+
+	 * | outermost SS          } initialized in first_nmi;       |
+	 * | outermost Return RSP  } will not be changed before      |
+	 * | outermost RFLAGS      } NMI processing is done.         |
+	 * | outermost CS          } Copied to "iret" frame on each  |
+	 * | outermost RIP         } iteration.                      |
+	 * +---------------------------------------------------------+
+	 * | pt_regs                                                 |
+	 * +---------------------------------------------------------+
+	 *
+	 * The "original" frame is used by hardware.  Before re-enabling
+	 * NMIs, we need to be done with it, and we need to leave enough
+	 * space for the asm code here.
+	 *
+	 * We return by executing IRET while RSP points to the "iret" frame.
+	 * That will either return for real or it will loop back into NMI
+	 * processing.
+	 *
+	 * The "outermost" frame is copied to the "iret" frame on each
+	 * iteration of the loop, so each iteration starts with the "iret"
+	 * frame pointing to the final return target.
+	 */
+
+	/*
+	 * Determine whether we're a nested NMI.
+	 *
+	 * First check "NMI executing".  If it's set, then we're nested.
+	 * This will not detect if we interrupted an outer NMI just
+	 * before IRET.
 	 */
 	cmpl $1, -8(%rsp)
 	je nested_nmi
 
 	/*
-	 * Now test if the previous stack was an NMI stack.
-	 * We need the double check. We check the NMI stack to satisfy the
-	 * race when the first NMI clears the variable before returning.
-	 * We check the variable because the first NMI could be in a
-	 * breakpoint routine using a breakpoint stack.
+	 * Now test if the previous stack was an NMI stack.  This covers
+	 * the case where we interrupt an outer NMI after it clears
+	 * "NMI executing" but before IRET.
 	 */
 	lea 6*8(%rsp), %rdx
 	test_in_nmi rdx, 4*8(%rsp), nested_nmi, first_nmi
@@ -1817,9 +1860,11 @@ ENTRY(nmi)
 
 nested_nmi:
 	/*
-	 * Do nothing if we interrupted the fixup in repeat_nmi.
-	 * It's about to repeat the NMI handler, so we are fine
-	 * with ignoring this one.
+	 * If we interrupted an NMI that is between repeat_nmi and
+	 * end_repeat_nmi, then we must not modify the "iret" frame
+	 * because it's being written by the outer NMI.  That's okay;
+	 * the outer NMI handler is about to call do_nmi anyway,
+	 * so we can just resume the outer NMI.
 	 */
 	movq $repeat_nmi, %rdx
 	cmpq 8(%rsp), %rdx
@@ -1829,7 +1874,10 @@ nested_nmi:
 	ja nested_nmi_out
 
 1:
-	/* Set up the interrupted NMIs stack to jump to repeat_nmi */
+	/*
+	 * Modify the "iret" frame to point to repeat_nmi, forcing another
+	 * iteration of NMI handling.
+	 */
 	leaq -1*8(%rsp), %rdx
 	movq %rdx, %rsp
 	CFI_ADJUST_CFA_OFFSET 1*8
@@ -1848,60 +1896,23 @@ nested_nmi_out:
 	popq_cfi %rdx
 	CFI_RESTORE rdx
 
-	/* No need to check faults here */
+	/* We are returning to kernel mode, so this cannot result in a fault. */
 	INTERRUPT_RETURN
 
 	CFI_RESTORE_STATE
 first_nmi:
-	/*
-	 * Because nested NMIs will use the pushed location that we
-	 * stored in rdx, we must keep that space available.
-	 * Here's what our stack frame will look like:
-	 * +-------------------------+
-	 * | original SS             |
-	 * | original Return RSP     |
-	 * | original RFLAGS         |
-	 * | original CS             |
-	 * | original RIP            |
-	 * +-------------------------+
-	 * | temp storage for rdx    |
-	 * +-------------------------+
-	 * | NMI executing variable  |
-	 * +-------------------------+
-	 * | copied SS               |
-	 * | copied Return RSP       |
-	 * | copied RFLAGS           |
-	 * | copied CS               |
-	 * | copied RIP              |
-	 * +-------------------------+
-	 * | Saved SS                |
-	 * | Saved Return RSP        |
-	 * | Saved RFLAGS            |
-	 * | Saved CS                |
-	 * | Saved RIP               |
-	 * +-------------------------+
-	 * | pt_regs                 |
-	 * +-------------------------+
-	 *
-	 * The saved stack frame is used to fix up the copied stack frame
-	 * that a nested NMI may change to make the interrupted NMI iret jump
-	 * to the repeat_nmi. The original stack frame and the temp storage
-	 * is also used by nested NMIs and can not be trusted on exit.
-	 */
-	/* Do not pop rdx, nested NMIs will corrupt that part of the stack */
+	/* Restore rdx. */
 	movq (%rsp), %rdx
 	CFI_RESTORE rdx
 
-	/* Set the NMI executing variable on the stack. */
+	/* Set "NMI executing" on the stack. */
 	pushq_cfi $1
 
-	/*
-	 * Leave room for the "copied" frame
-	 */
+	/* Leave room for the "iret" frame */
 	subq $(5*8), %rsp
 	CFI_ADJUST_CFA_OFFSET 5*8
 
-	/* Copy the stack frame to the Saved frame */
+	/* Copy the "original" frame to the "outermost" frame */
 	.rept 5
 	pushq_cfi 11*8(%rsp)
 	.endr
@@ -1909,6 +1920,7 @@ first_nmi:
 
 	/* Everything up to here is safe from nested NMIs */
 
+repeat_nmi:
 	/*
 	 * If there was a nested NMI, the first NMI's iret will return
 	 * here. But NMIs are still enabled and we can take another
@@ -1917,16 +1929,21 @@ first_nmi:
 	 * it will just return, as we are about to repeat an NMI anyway.
 	 * This makes it safe to copy to the stack frame that a nested
 	 * NMI will update.
-	 */
-repeat_nmi:
-	/*
-	 * Update the stack variable to say we are still in NMI (the update
-	 * is benign for the non-repeat case, where 1 was pushed just above
-	 * to this very stack slot).
+	 *
+	 * RSP is pointing to "outermost RIP".  gsbase is unknown, but, if
+	 * we're repeating an NMI, gsbase has the same value that it had on
+	 * the first iteration.  paranoid_entry will load the kernel
+	 * gsbase if needed before we call do_nmi.
+	 *
+	 * Set "NMI executing" in case we came back here via IRET.
 	 */
 	movq $1, 10*8(%rsp)
 
-	/* Make another copy, this one may be modified by nested NMIs */
+	/*
+	 * Copy the "outermost" frame to the "iret" frame.  NMIs that nest
+	 * here must not modify the "iret" frame while we're writing to
+	 * it or it will end up containing garbage.
+	 */
 	addq $(10*8), %rsp
 	CFI_ADJUST_CFA_OFFSET -10*8
 	.rept 5
@@ -1937,9 +1954,9 @@ repeat_nmi:
 end_repeat_nmi:
 
 	/*
-	 * Everything below this point can be preempted by a nested
-	 * NMI if the first NMI took an exception and reset our iret stack
-	 * so that we repeat another NMI.
+	 * Everything below this point can be preempted by a nested NMI.
+	 * If this happens, then the inner NMI will change the "iret"
+	 * frame to point back to repeat_nmi.
 	 */
 	pushq_cfi $-1		/* ORIG_RAX: no syscall to restart */
 	subq $ORIG_RAX-R15, %rsp
@@ -1967,9 +1984,15 @@ nmi_restore:
 	/* Pop the extra iret frame at once */
 	RESTORE_ALL 6*8
 
-	/* Clear the NMI executing stack variable */
+	/* Clear "NMI executing". */
 	movq $0, 5*8(%rsp)
-	jmp irq_return
+
+	/*
+	 * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
+	 * stack in a single instruction.  We are returning to kernel
+	 * mode, so this cannot result in a fault.
+	 */
+	INTERRUPT_RETURN
 	CFI_ENDPROC
 END(nmi)
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index c1c7d41..8facfb3 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -392,8 +392,8 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 }
 
 /*
- * NMIs can hit breakpoints which will cause it to lose its NMI context
- * with the CPU when the breakpoint or page fault does an IRET.
+ * NMIs can page fault or hit breakpoints which will cause it to lose
+ * its NMI context with the CPU when the breakpoint or page fault does an IRET.
  *
  * As a result, NMIs can nest if NMIs get unmasked due an IRET during
  * NMI processing.  On x86_64, the asm glue protects us from nested NMIs
-- 
2.5.0



* [PATCH-v3.14.y 5/6] x86/nmi/64: Reorder nested NMI checks
  2015-08-17 22:55   ` [PATCH-v3.14.y 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
                       ` (3 preceding siblings ...)
  2015-08-17 22:55     ` [PATCH-v3.14.y 4/6] x86/nmi/64: Improve nested NMI comments Thomas D
@ 2015-08-17 22:55     ` Thomas D
  2015-08-17 22:55     ` [PATCH-v3.14.y 6/6] x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection Thomas D
  5 siblings, 0 replies; 21+ messages in thread
From: Thomas D @ 2015-08-17 22:55 UTC (permalink / raw)
  To: stable
  Cc: luto, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Greg Kroah-Hartman

From: Andy Lutomirski <luto@kernel.org>

commit a27507ca2d796cfa8d907de31ad730359c8a6d06 upstream.

Check the repeat_nmi .. end_repeat_nmi special case first.  The
next patch will rework the RSP check and, as a side effect, the
RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
we'll need this ordering of the checks.

Note: this is more subtle than it appears.  The check for
repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
instead of adjusting the "iret" frame to force a repeat.  This
is necessary, because the code between repeat_nmi and
end_repeat_nmi sets "NMI executing" and then writes to the
"iret" frame itself.  If a nested NMI comes in and modifies the
"iret" frame while repeat_nmi is also modifying it, we'll end up
with garbage.  The old code got this right, as does the new
code, but the new code is a bit more explicit.

If we were to move the check right after the "NMI executing"
check, then we'd get it wrong and have random crashes.

( Because the "NMI executing" check would jump to the code that would
  modify the "iret" frame without checking if the interrupted NMI was
  currently modifying it. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/entry_64.S | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index c080360..13c6f50 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1842,7 +1842,23 @@ ENTRY(nmi)
 	/*
 	 * Determine whether we're a nested NMI.
 	 *
-	 * First check "NMI executing".  If it's set, then we're nested.
+	 * If we interrupted kernel code between repeat_nmi and
+	 * end_repeat_nmi, then we are a nested NMI.  We must not
+	 * modify the "iret" frame because it's being written by
+	 * the outer NMI.  That's okay; the outer NMI handler is
+	 * about to call do_nmi anyway, so we can just
+	 * resume the outer NMI.
+	 */
+	movq	$repeat_nmi, %rdx
+	cmpq	8(%rsp), %rdx
+	ja	1f
+	movq	$end_repeat_nmi, %rdx
+	cmpq	8(%rsp), %rdx
+	ja	nested_nmi_out
+1:
+
+	/*
+	 * Now check "NMI executing".  If it's set, then we're nested.
 	 * This will not detect if we interrupted an outer NMI just
 	 * before IRET.
 	 */
@@ -1860,21 +1876,6 @@ ENTRY(nmi)
 
 nested_nmi:
 	/*
-	 * If we interrupted an NMI that is between repeat_nmi and
-	 * end_repeat_nmi, then we must not modify the "iret" frame
-	 * because it's being written by the outer NMI.  That's okay;
-	 * the outer NMI handler is about to call do_nmi anyway,
-	 * so we can just resume the outer NMI.
-	 */
-	movq $repeat_nmi, %rdx
-	cmpq 8(%rsp), %rdx
-	ja 1f
-	movq $end_repeat_nmi, %rdx
-	cmpq 8(%rsp), %rdx
-	ja nested_nmi_out
-
-1:
-	/*
 	 * Modify the "iret" frame to point to repeat_nmi, forcing another
 	 * iteration of NMI handling.
 	 */
-- 
2.5.0



* [PATCH-v3.14.y 6/6] x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
  2015-08-17 22:55   ` [PATCH-v3.14.y 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
                       ` (4 preceding siblings ...)
  2015-08-17 22:55     ` [PATCH-v3.14.y 5/6] x86/nmi/64: Reorder nested NMI checks Thomas D
@ 2015-08-17 22:55     ` Thomas D
  5 siblings, 0 replies; 21+ messages in thread
From: Thomas D @ 2015-08-17 22:55 UTC (permalink / raw)
  To: stable
  Cc: luto, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Greg Kroah-Hartman

From: Andy Lutomirski <luto@kernel.org>

commit 810bc075f78ff2c221536eb3008eac6a492dba2d upstream.

We have a tricky bug in the nested NMI code: if we see RSP
pointing to the NMI stack on NMI entry from kernel mode, we
assume that we are executing a nested NMI.

This isn't quite true.  A malicious userspace program can point
RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
happen while RSP is still pointing at the NMI stack.

Fix it with a sneaky trick.  Set DF in the region of code that
the RSP check is intended to detect.  IRET will clear DF
atomically.

( Note: other than paravirt, there's little need for all this
  complexity. We could check RIP instead of RSP. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/entry_64.S | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 13c6f50..8ce55e3 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1868,10 +1868,25 @@ ENTRY(nmi)
 	/*
 	 * Now test if the previous stack was an NMI stack.  This covers
 	 * the case where we interrupt an outer NMI after it clears
-	 * "NMI executing" but before IRET.
+	 * "NMI executing" but before IRET.  We need to be careful, though:
+	 * there is one case in which RSP could point to the NMI stack
+	 * despite there being no NMI active: naughty userspace controls
+	 * RSP at the very beginning of the SYSCALL targets.  We can
+	 * pull a fast one on naughty userspace, though: we program
+	 * SYSCALL to mask DF, so userspace cannot cause DF to be set
+	 * if it controls the kernel's RSP.  We set DF before we clear
+	 * "NMI executing".
 	 */
 	lea 6*8(%rsp), %rdx
 	test_in_nmi rdx, 4*8(%rsp), nested_nmi, first_nmi
+
+	/* Ah, it is within the NMI stack. */
+
+	testb	$(X86_EFLAGS_DF >> 8), (3*8 + 1)(%rsp)
+	jz	first_nmi	/* RSP was user controlled. */
+
+	/* This is a nested NMI. */
+
 	CFI_REMEMBER_STATE
 
 nested_nmi:
@@ -1985,8 +2000,16 @@ nmi_restore:
 	/* Pop the extra iret frame at once */
 	RESTORE_ALL 6*8
 
-	/* Clear "NMI executing". */
-	movq $0, 5*8(%rsp)
+	/*
+	 * Clear "NMI executing".  Set DF first so that we can easily
+	 * distinguish the remaining code between here and IRET from
+	 * the SYSCALL entry and exit paths.  On a native kernel, we
+	 * could just inspect RIP, but, on paravirt kernels,
+	 * INTERRUPT_RETURN can translate into a jump into a
+	 * hypercall page.
+	 */
+	std
+	movq	$0, 5*8(%rsp)		/* clear "NMI executing" */
 
 	/*
 	 * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
-- 
2.5.0



* Re: [PATCH-v3.14.y 3/6] x86/nmi/64: Switch stacks on userspace NMI entry
  2015-08-17 22:55     ` [PATCH-v3.14.y 3/6] x86/nmi/64: Switch stacks on userspace NMI entry Thomas D
@ 2015-08-18 15:45       ` Jiri Slaby
  2015-08-18 17:12         ` Thomas D.
  0 siblings, 1 reply; 21+ messages in thread
From: Jiri Slaby @ 2015-08-18 15:45 UTC (permalink / raw)
  To: Thomas D, stable
  Cc: luto, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Greg Kroah-Hartman

On 08/18/2015, 12:55 AM, Thomas D wrote:
> From: Andy Lutomirski <luto@kernel.org>
> 
> commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.
> 
> Returning to userspace is tricky: IRET can fail, and ESPFIX can
> rearrange the stack prior to IRET.
> 
> The NMI nesting fixup relies on a precise stack layout and
> atomic IRET.  Rather than trying to teach the NMI nesting fixup
> to handle ESPFIX and failed IRET, punt: run NMIs that came from
> user mode on the normal kernel stack.
> 
> This will make some nested NMIs visible to C code, but the C
> code is okay with that.
> 
> As a side effect, this should speed up perf: it eliminates an
> RDMSR when NMIs come from user mode.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Reviewed-by: Borislav Petkov <bp@suse.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  arch/x86/kernel/entry_64.S | 77 +++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 73 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 28b08345..bd7d8aa 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1715,19 +1715,88 @@ ENTRY(nmi)
>  	 * a nested NMI that updated the copy interrupt stack frame, a
>  	 * jump will be made to the repeat_nmi code that will handle the second
>  	 * NMI.
> +	 *
> +	 * However, espfix prevents us from directly returning to userspace
> +	 * with a single IRET instruction.  Similarly, IRET to user mode
> +	 * can fault.  We therefore handle NMIs from user space like
> +	 * other IST entries.
>  	 */
>  
>  	/* Use %rdx as out temp variable throughout */
>  	pushq_cfi %rdx
>  	CFI_REL_OFFSET rdx, 0
>  
> +	testb	$3, CS-RIP+8(%rsp)
> +	jz	.Lnmi_from_kernel
> +
> +	/*
> +	 * NMI from user mode.  We need to run on the thread stack, but we
> +	 * can't go through the normal entry paths: NMIs are masked, and
> +	 * we don't want to enable interrupts, because then we'll end
> +	 * up in an awkward situation in which IRQs are on but NMIs
> +	 * are off.
> +	 */
> +
> +	SWAPGS
> +	cld
> +	movq	%rsp, %rdx
> +	movq	PER_CPU_VAR(kernel_stack), %rsp

I think you are wasting stack space here. With kernel_stack, you should
add 5*8 (KERNEL_STACK_OFFSET) to the pointer here. I.e. space for 5
registers is pre-reserved at kernel_stack already. (Or use movq instead
of the 5 pushq below.)

Why don't you re-use the 3.16 version anyway?

> +	pushq	5*8(%rdx)	/* pt_regs->ss */
> +	pushq	4*8(%rdx)	/* pt_regs->rsp */
> +	pushq	3*8(%rdx)	/* pt_regs->flags */
> +	pushq	2*8(%rdx)	/* pt_regs->cs */
> +	pushq	1*8(%rdx)	/* pt_regs->rip */
> +	pushq   $-1		/* pt_regs->orig_ax */
> +	pushq   %rdi		/* pt_regs->di */
> +	pushq   %rsi		/* pt_regs->si */
> +	pushq   (%rdx)		/* pt_regs->dx */
> +	pushq   %rcx		/* pt_regs->cx */
> +	pushq   %rax		/* pt_regs->ax */
> +	pushq   %r8		/* pt_regs->r8 */
> +	pushq   %r9		/* pt_regs->r9 */
> +	pushq   %r10		/* pt_regs->r10 */
> +	pushq   %r11		/* pt_regs->r11 */
> +	pushq	%rbx		/* pt_regs->rbx */
> +	pushq	%rbp		/* pt_regs->rbp */
> +	pushq	%r12		/* pt_regs->r12 */
> +	pushq	%r13		/* pt_regs->r13 */
> +	pushq	%r14		/* pt_regs->r14 */
> +	pushq	%r15		/* pt_regs->r15 */

regards,
-- 
js
suse labs


* Re: [PATCH-v3.14.y 3/6] x86/nmi/64: Switch stacks on userspace NMI entry
  2015-08-18 15:45       ` Jiri Slaby
@ 2015-08-18 17:12         ` Thomas D.
  2015-08-18 19:32           ` Jiri Slaby
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas D. @ 2015-08-18 17:12 UTC (permalink / raw)
  To: Jiri Slaby, stable
  Cc: luto, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Greg Kroah-Hartman

Hi,

Jiri Slaby wrote:
> On 08/18/2015, 12:55 AM, Thomas D wrote:
>> From: Andy Lutomirski <luto@kernel.org>
>>
>> commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.
>>
>> Returning to userspace is tricky: IRET can fail, and ESPFIX can
>> rearrange the stack prior to IRET.
>>
>> The NMI nesting fixup relies on a precise stack layout and
>> atomic IRET.  Rather than trying to teach the NMI nesting fixup
>> to handle ESPFIX and failed IRET, punt: run NMIs that came from
>> user mode on the normal kernel stack.
>>
>> This will make some nested NMIs visible to C code, but the C
>> code is okay with that.
>>
>> As a side effect, this should speed up perf: it eliminates an
>> RDMSR when NMIs come from user mode.
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
>> Reviewed-by: Borislav Petkov <bp@suse.de>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> ---
>>  arch/x86/kernel/entry_64.S | 77 +++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 73 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index 28b08345..bd7d8aa 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -1715,19 +1715,88 @@ ENTRY(nmi)
>>  	 * a nested NMI that updated the copy interrupt stack frame, a
>>  	 * jump will be made to the repeat_nmi code that will handle the second
>>  	 * NMI.
>> +	 *
>> +	 * However, espfix prevents us from directly returning to userspace
>> +	 * with a single IRET instruction.  Similarly, IRET to user mode
>> +	 * can fault.  We therefore handle NMIs from user space like
>> +	 * other IST entries.
>>  	 */
>>  
>>  	/* Use %rdx as out temp variable throughout */
>>  	pushq_cfi %rdx
>>  	CFI_REL_OFFSET rdx, 0
>>  
>> +	testb	$3, CS-RIP+8(%rsp)
>> +	jz	.Lnmi_from_kernel
>> +
>> +	/*
>> +	 * NMI from user mode.  We need to run on the thread stack, but we
>> +	 * can't go through the normal entry paths: NMIs are masked, and
>> +	 * we don't want to enable interrupts, because then we'll end
>> +	 * up in an awkward situation in which IRQs are on but NMIs
>> +	 * are off.
>> +	 */
>> +
>> +	SWAPGS
>> +	cld
>> +	movq	%rsp, %rdx
>> +	movq	PER_CPU_VAR(kernel_stack), %rsp
> 
> I think you are wasting stack space here. With kernel_stack, you should
> add 5*8 (KERNEL_STACK_OFFSET) to the pointer here. I.e. space for 5
> registers is pre-reserved at kernel_stack already. (Or use movq instead
> of the 5 pushq below.)
> 
> Why don't you re-use the 3.16 version anyway?
> 
>> +	pushq	5*8(%rdx)	/* pt_regs->ss */
>> +	pushq	4*8(%rdx)	/* pt_regs->rsp */
>> +	pushq	3*8(%rdx)	/* pt_regs->flags */
>> +	pushq	2*8(%rdx)	/* pt_regs->cs */
>> +	pushq	1*8(%rdx)	/* pt_regs->rip */
>> +	pushq   $-1		/* pt_regs->orig_ax */
>> +	pushq   %rdi		/* pt_regs->di */
>> +	pushq   %rsi		/* pt_regs->si */
>> +	pushq   (%rdx)		/* pt_regs->dx */
>> +	pushq   %rcx		/* pt_regs->cx */
>> +	pushq   %rax		/* pt_regs->ax */
>> +	pushq   %r8		/* pt_regs->r8 */
>> +	pushq   %r9		/* pt_regs->r9 */
>> +	pushq   %r10		/* pt_regs->r10 */
>> +	pushq   %r11		/* pt_regs->r11 */
>> +	pushq	%rbx		/* pt_regs->rbx */
>> +	pushq	%rbp		/* pt_regs->rbp */
>> +	pushq	%r12		/* pt_regs->r12 */
>> +	pushq	%r13		/* pt_regs->r13 */
>> +	pushq	%r14		/* pt_regs->r14 */
>> +	pushq	%r15		/* pt_regs->r15 */

Mh, so you mean

> +	addq	$KERNEL_STACK_OFFSET, %rsp

between

> +	movq	PER_CPU_VAR(kernel_stack), %rsp

and

> +	pushq	5*8(%rdx)	/* pt_regs->ss */

is missing?

That seems to be the only difference between this patch and Debian's
3.16.7-ckt11-1+deb8u2 [1] version.

 [1]
https://anonscm.debian.org/cgit/kernel/linux.git/tree/debian/patches/bugfix/x86/0006-x86-nmi-64-Switch-stacks-on-userspace-NMI-entry.patch?h=jessie#n69
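
To convince myself of the arithmetic, here is a tiny demo (a sketch
with a made-up stack address; if I read 3.14 correctly, the per-CPU
kernel_stack pointer is pre-biased KERNEL_STACK_OFFSET = 5*8 bytes
below the top of the thread stack, which is exactly what the missing
addq compensates for):

#include <stdint.h>
#include <stdio.h>

#define KERNEL_STACK_OFFSET (5 * 8)     /* as in 3.14's processor.h */

int main(void)
{
        uint64_t stack_top    = 0xffff880012345000ULL;  /* made up */
        uint64_t kernel_stack = stack_top - KERNEL_STACK_OFFSET;

        /* v1 of my backport builds the pt_regs frame from here,
         * 40 bytes below the top, wasting the pre-reserved space: */
        printf("v1 frame base: 0x%llx\n",
               (unsigned long long)kernel_stack);

        /* with "addq $KERNEL_STACK_OFFSET, %rsp" we start at the
         * real top of the thread stack instead: */
        printf("v2 frame base: 0x%llx\n",
               (unsigned long long)(kernel_stack + KERNEL_STACK_OFFSET));
        return 0;
}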



-Thomas



* Re: [PATCH-v3.14.y 3/6] x86/nmi/64: Switch stacks on userspace NMI entry
  2015-08-18 17:12         ` Thomas D.
@ 2015-08-18 19:32           ` Jiri Slaby
  2015-08-19 14:11             ` [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
  0 siblings, 1 reply; 21+ messages in thread
From: Jiri Slaby @ 2015-08-18 19:32 UTC (permalink / raw)
  To: Thomas D., stable
  Cc: luto, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Greg Kroah-Hartman

On 08/18/2015, 07:12 PM, Thomas D. wrote:
>>> --- a/arch/x86/kernel/entry_64.S
>>> +++ b/arch/x86/kernel/entry_64.S
>>> @@ -1715,19 +1715,88 @@ ENTRY(nmi)
>>>  	 * a nested NMI that updated the copy interrupt stack frame, a
>>>  	 * jump will be made to the repeat_nmi code that will handle the second
>>>  	 * NMI.
>>> +	 *
>>> +	 * However, espfix prevents us from directly returning to userspace
>>> +	 * with a single IRET instruction.  Similarly, IRET to user mode
>>> +	 * can fault.  We therefore handle NMIs from user space like
>>> +	 * other IST entries.
>>>  	 */
>>>  
>>>  	/* Use %rdx as our temp variable throughout */
>>>  	pushq_cfi %rdx
>>>  	CFI_REL_OFFSET rdx, 0
>>>  
>>> +	testb	$3, CS-RIP+8(%rsp)
>>> +	jz	.Lnmi_from_kernel
>>> +
>>> +	/*
>>> +	 * NMI from user mode.  We need to run on the thread stack, but we
>>> +	 * can't go through the normal entry paths: NMIs are masked, and
>>> +	 * we don't want to enable interrupts, because then we'll end
>>> +	 * up in an awkward situation in which IRQs are on but NMIs
>>> +	 * are off.
>>> +	 */
>>> +
>>> +	SWAPGS
>>> +	cld
>>> +	movq	%rsp, %rdx
>>> +	movq	PER_CPU_VAR(kernel_stack), %rsp
>>
>> I think you are wasting stack space here. With kernel_stack, you should
>> add 5*8 (KERNEL_STACK_OFFSET) to the pointer here. I.e. space for 5
>> registers is pre-reserved at kernel_stack already. (Or use movq instead
>> of the 5 pushq below.)
>>
>> Why don't you re-use the 3.16's version anyway?
>>
>>> +	pushq	5*8(%rdx)	/* pt_regs->ss */
>>> +	pushq	4*8(%rdx)	/* pt_regs->rsp */
>>> +	pushq	3*8(%rdx)	/* pt_regs->flags */
>>> +	pushq	2*8(%rdx)	/* pt_regs->cs */
>>> +	pushq	1*8(%rdx)	/* pt_regs->rip */
>>> +	pushq   $-1		/* pt_regs->orig_ax */
>>> +	pushq   %rdi		/* pt_regs->di */
>>> +	pushq   %rsi		/* pt_regs->si */
>>> +	pushq   (%rdx)		/* pt_regs->dx */
>>> +	pushq   %rcx		/* pt_regs->cx */
>>> +	pushq   %rax		/* pt_regs->ax */
>>> +	pushq   %r8		/* pt_regs->r8 */
>>> +	pushq   %r9		/* pt_regs->r9 */
>>> +	pushq   %r10		/* pt_regs->r10 */
>>> +	pushq   %r11		/* pt_regs->r11 */
>>> +	pushq	%rbx		/* pt_regs->rbx */
>>> +	pushq	%rbp		/* pt_regs->rbp */
>>> +	pushq	%r12		/* pt_regs->r12 */
>>> +	pushq	%r13		/* pt_regs->r13 */
>>> +	pushq	%r14		/* pt_regs->r14 */
>>> +	pushq	%r15		/* pt_regs->r15 */
> 
> Mh, so you mean
> 
>> +	addq	$KERNEL_STACK_OFFSET, %rsp
> 
> between
> 
>> +	movq	PER_CPU_VAR(kernel_stack), %rsp
> 
> and
> 
>> +	pushq	5*8(%rdx)	/* pt_regs->ss */
> 
> is missing?

Yep, that makes sense. But I am not an x86 maintainer :P.

-- 
js
suse labs


* [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157
  2015-08-18 19:32           ` Jiri Slaby
@ 2015-08-19 14:11             ` Thomas D
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 1/6] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Thomas D
                                 ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Thomas D @ 2015-08-19 14:11 UTC (permalink / raw)
  To: stable; +Cc: luto, Thomas D

Hi,

here's my backport for CVE-2015-3290 and CVE-2015-5157 for linux-3.14.


Changelog:

v2:
- Missing "addq" instruction added to "x86/nmi/64: Switch stacks on userspace NMI entry"
  Thanks to Jiri Slaby!

v1: http://thread.gmane.org/gmane.linux.kernel.stable/146885/focus=146902

Andy Lutomirski (6):
  x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
  x86/nmi/64: Remove asm code that saves CR2
  x86/nmi/64: Switch stacks on userspace NMI entry
  x86/nmi/64: Improve nested NMI comments
  x86/nmi/64: Reorder nested NMI checks
  x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI
    detection

 arch/x86/kernel/entry_64.S | 296 ++++++++++++++++++++++++++++++---------------
 arch/x86/kernel/nmi.c      | 123 ++++++++-----------
 2 files changed, 249 insertions(+), 170 deletions(-)

-- 
2.5.0



* [PATCH-v3.14.y v2 1/6] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
  2015-08-19 14:11             ` [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
@ 2015-08-19 14:11               ` Thomas D
  2015-09-29 13:38                 ` Greg KH
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 2/6] x86/nmi/64: Remove asm code that saves CR2 Thomas D
                                 ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Thomas D @ 2015-08-19 14:11 UTC (permalink / raw)
  To: stable
  Cc: luto, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar

From: Andy Lutomirski <luto@kernel.org>

32-bit kernels handle nested NMIs in C.  Enable the exact same
handling on 64-bit kernels as well.  This isn't currently
necessary, but it will become necessary once the asm code starts
allowing limited nesting.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/nmi.c | 123 +++++++++++++++++++++-----------------------------
 1 file changed, 52 insertions(+), 71 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 6fcb49c..c1c7d41 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -392,15 +392,15 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 }
 
 /*
- * NMIs can hit breakpoints which will cause it to lose its
- * NMI context with the CPU when the breakpoint does an iret.
- */
-#ifdef CONFIG_X86_32
-/*
- * For i386, NMIs use the same stack as the kernel, and we can
- * add a workaround to the iret problem in C (preventing nested
- * NMIs if an NMI takes a trap). Simply have 3 states the NMI
- * can be in:
+ * NMIs can hit breakpoints which will cause them to lose their NMI context
+ * with the CPU when the breakpoint or page fault does an IRET.
+ *
+ * As a result, NMIs can nest if NMIs get unmasked due to an IRET during
+ * NMI processing.  On x86_64, the asm glue protects us from nested NMIs
+ * if the outer NMI came from kernel mode, but we can still nest if the
+ * outer NMI came from user mode.
+ *
+ * To handle these nested NMIs, we have three states:
  *
  *  1) not running
  *  2) executing
@@ -414,15 +414,14 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
  * (Note, the latch is binary, thus multiple NMIs triggering,
  *  when one is running, are ignored. Only one NMI is restarted.)
  *
- * If an NMI hits a breakpoint that executes an iret, another
- * NMI can preempt it. We do not want to allow this new NMI
- * to run, but we want to execute it when the first one finishes.
- * We set the state to "latched", and the exit of the first NMI will
- * perform a dec_return, if the result is zero (NOT_RUNNING), then
- * it will simply exit the NMI handler. If not, the dec_return
- * would have set the state to NMI_EXECUTING (what we want it to
- * be when we are running). In this case, we simply jump back
- * to rerun the NMI handler again, and restart the 'latched' NMI.
+ * If an NMI executes an iret, another NMI can preempt it. We do not
+ * want to allow this new NMI to run, but we want to execute it when the
+ * first one finishes.  We set the state to "latched", and the exit of
+ * the first NMI will perform a dec_return, if the result is zero
+ * (NOT_RUNNING), then it will simply exit the NMI handler. If not, the
+ * dec_return would have set the state to NMI_EXECUTING (what we want it
+ * to be when we are running). In this case, we simply jump back to
+ * rerun the NMI handler again, and restart the 'latched' NMI.
  *
  * No trap (breakpoint or page fault) should be hit before nmi_restart,
  * thus there is no race between the first check of state for NOT_RUNNING
@@ -445,49 +444,36 @@ enum nmi_states {
 static DEFINE_PER_CPU(enum nmi_states, nmi_state);
 static DEFINE_PER_CPU(unsigned long, nmi_cr2);
 
-#define nmi_nesting_preprocess(regs)					\
-	do {								\
-		if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {	\
-			this_cpu_write(nmi_state, NMI_LATCHED);		\
-			return;						\
-		}							\
-		this_cpu_write(nmi_state, NMI_EXECUTING);		\
-		this_cpu_write(nmi_cr2, read_cr2());			\
-	} while (0);							\
-	nmi_restart:
-
-#define nmi_nesting_postprocess()					\
-	do {								\
-		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))	\
-			write_cr2(this_cpu_read(nmi_cr2));		\
-		if (this_cpu_dec_return(nmi_state))			\
-			goto nmi_restart;				\
-	} while (0)
-#else /* x86_64 */
+#ifdef CONFIG_X86_64
 /*
- * In x86_64 things are a bit more difficult. This has the same problem
- * where an NMI hitting a breakpoint that calls iret will remove the
- * NMI context, allowing a nested NMI to enter. What makes this more
- * difficult is that both NMIs and breakpoints have their own stack.
- * When a new NMI or breakpoint is executed, the stack is set to a fixed
- * point. If an NMI is nested, it will have its stack set at that same
- * fixed address that the first NMI had, and will start corrupting the
- * stack. This is handled in entry_64.S, but the same problem exists with
- * the breakpoint stack.
+ * In x86_64, we need to handle breakpoint -> NMI -> breakpoint.  Without
+ * some care, the inner breakpoint will clobber the outer breakpoint's
+ * stack.
  *
- * If a breakpoint is being processed, and the debug stack is being used,
- * if an NMI comes in and also hits a breakpoint, the stack pointer
- * will be set to the same fixed address as the breakpoint that was
- * interrupted, causing that stack to be corrupted. To handle this case,
- * check if the stack that was interrupted is the debug stack, and if
- * so, change the IDT so that new breakpoints will use the current stack
- * and not switch to the fixed address. On return of the NMI, switch back
- * to the original IDT.
+ * If a breakpoint is being processed, and the debug stack is being
+ * used, if an NMI comes in and also hits a breakpoint, the stack
+ * pointer will be set to the same fixed address as the breakpoint that
+ * was interrupted, causing that stack to be corrupted. To handle this
+ * case, check if the stack that was interrupted is the debug stack, and
+ * if so, change the IDT so that new breakpoints will use the current
+ * stack and not switch to the fixed address. On return of the NMI,
+ * switch back to the original IDT.
  */
 static DEFINE_PER_CPU(int, update_debug_stack);
+#endif
 
-static inline void nmi_nesting_preprocess(struct pt_regs *regs)
+dotraplinkage notrace void
+do_nmi(struct pt_regs *regs, long error_code)
 {
+	if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {
+		this_cpu_write(nmi_state, NMI_LATCHED);
+		return;
+	}
+	this_cpu_write(nmi_state, NMI_EXECUTING);
+	this_cpu_write(nmi_cr2, read_cr2());
+nmi_restart:
+
+#ifdef CONFIG_X86_64
 	/*
 	 * If we interrupted a breakpoint, it is possible that
 	 * the nmi handler will have breakpoints too. We need to
@@ -498,22 +484,8 @@ static inline void nmi_nesting_preprocess(struct pt_regs *regs)
 		debug_stack_set_zero();
 		this_cpu_write(update_debug_stack, 1);
 	}
-}
-
-static inline void nmi_nesting_postprocess(void)
-{
-	if (unlikely(this_cpu_read(update_debug_stack))) {
-		debug_stack_reset();
-		this_cpu_write(update_debug_stack, 0);
-	}
-}
 #endif
 
-dotraplinkage notrace __kprobes void
-do_nmi(struct pt_regs *regs, long error_code)
-{
-	nmi_nesting_preprocess(regs);
-
 	nmi_enter();
 
 	inc_irq_stat(__nmi_count);
@@ -523,8 +495,17 @@ do_nmi(struct pt_regs *regs, long error_code)
 
 	nmi_exit();
 
-	/* On i386, may loop back to preprocess */
-	nmi_nesting_postprocess();
+#ifdef CONFIG_X86_64
+	if (unlikely(this_cpu_read(update_debug_stack))) {
+		debug_stack_reset();
+		this_cpu_write(update_debug_stack, 0);
+	}
+#endif
+
+	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
+		write_cr2(this_cpu_read(nmi_cr2));
+	if (this_cpu_dec_return(nmi_state))
+		goto nmi_restart;
 }
 
 void stop_nmi(void)
-- 
2.5.0
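
A minimal userspace model of the three-state latch this patch introduces
may help when reading the diff. It is a sketch of the control flow only:
handle_nmi() is a hypothetical stand-in for the real handler dispatch,
and the kernel uses per-CPU operations rather than a plain static.

    enum { NMI_NOT_RUNNING = 0, NMI_EXECUTING = 1, NMI_LATCHED = 2 };
    static int nmi_state;

    void handle_nmi(void);  /* hypothetical stand-in for the dispatch */

    void do_nmi_model(void)
    {
        if (nmi_state != NMI_NOT_RUNNING) {
            /* Nested entry: latch it; the outer NMI will replay it. */
            nmi_state = NMI_LATCHED;
            return;
        }
        nmi_state = NMI_EXECUTING;
    nmi_restart:
        handle_nmi();  /* may be preempted and re-entered meanwhile */

        /* dec_return: 2 -> 1 means a latched NMI is owed a replay;
         * 1 -> 0 means we are done.  Multiple latched NMIs collapse
         * into a single replay, as the comment above explains. */
        if (--nmi_state)
            goto nmi_restart;
    }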



* [PATCH-v3.14.y v2 2/6] x86/nmi/64: Remove asm code that saves CR2
  2015-08-19 14:11             ` [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 1/6] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Thomas D
@ 2015-08-19 14:11               ` Thomas D
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 3/6] x86/nmi/64: Switch stacks on userspace NMI entry Thomas D
                                 ` (4 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Thomas D @ 2015-08-19 14:11 UTC (permalink / raw)
  To: stable; +Cc: luto, Linus Torvalds, Peter Zijlstra, Thomas Gleixner, Ingo Molnar

From: Andy Lutomirski <luto@kernel.org>

Now that do_nmi saves CR2, we don't need to save it in asm.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/entry_64.S | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 06469ee..28b08345 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1885,28 +1885,10 @@ end_repeat_nmi:
 	call save_paranoid
 	DEFAULT_FRAME 0
 
-	/*
-	 * Save off the CR2 register. If we take a page fault in the NMI then
-	 * it could corrupt the CR2 value. If the NMI preempts a page fault
-	 * handler before it was able to read the CR2 register, and then the
-	 * NMI itself takes a page fault, the page fault that was preempted
-	 * will read the information from the NMI page fault and not the
-	 * origin fault. Save it off and restore it if it changes.
-	 * Use the r12 callee-saved register.
-	 */
-	movq %cr2, %r12
-
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	movq %rsp,%rdi
 	movq $-1,%rsi
 	call do_nmi
-
-	/* Did the NMI take a page fault? Restore cr2 if it did */
-	movq %cr2, %rcx
-	cmpq %rcx, %r12
-	je 1f
-	movq %r12, %cr2
-1:
 	
 	testl %ebx,%ebx				/* swapgs needed? */
 	jnz nmi_restore
-- 
2.5.0
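
For reference, the C-side replacement that makes this asm unnecessary is
already in place after patch 1/6; condensed from that patch's do_nmi():

    this_cpu_write(nmi_cr2, read_cr2());  /* save CR2 before dispatch */
    /* ... nmi_enter(), handler dispatch, nmi_exit() ... */
    if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
        write_cr2(this_cpu_read(nmi_cr2));  /* restore if clobbered */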



* [PATCH-v3.14.y v2 3/6] x86/nmi/64: Switch stacks on userspace NMI entry
  2015-08-19 14:11             ` [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 1/6] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Thomas D
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 2/6] x86/nmi/64: Remove asm code that saves CR2 Thomas D
@ 2015-08-19 14:11               ` Thomas D
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 4/6] x86/nmi/64: Improve nested NMI comments Thomas D
                                 ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Thomas D @ 2015-08-19 14:11 UTC (permalink / raw)
  To: stable
  Cc: luto, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Greg Kroah-Hartman

From: Andy Lutomirski <luto@kernel.org>

commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.

Returning to userspace is tricky: IRET can fail, and ESPFIX can
rearrange the stack prior to IRET.

The NMI nesting fixup relies on a precise stack layout and
atomic IRET.  Rather than trying to teach the NMI nesting fixup
to handle ESPFIX and failed IRET, punt: run NMIs that came from
user mode on the normal kernel stack.

This will make some nested NMIs visible to C code, but the C
code is okay with that.

As a side effect, this should speed up perf: it eliminates an
RDMSR when NMIs come from user mode.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/entry_64.S | 77 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 73 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 28b08345..2c9366a 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1715,19 +1715,88 @@ ENTRY(nmi)
 	 * a nested NMI that updated the copy interrupt stack frame, a
 	 * jump will be made to the repeat_nmi code that will handle the second
 	 * NMI.
+	 *
+	 * However, espfix prevents us from directly returning to userspace
+	 * with a single IRET instruction.  Similarly, IRET to user mode
+	 * can fault.  We therefore handle NMIs from user space like
+	 * other IST entries.
 	 */
 
 	/* Use %rdx as our temp variable throughout */
 	pushq_cfi %rdx
 	CFI_REL_OFFSET rdx, 0
 
+	testb	$3, CS-RIP+8(%rsp)
+	jz	.Lnmi_from_kernel
+
+	/*
+	 * NMI from user mode.  We need to run on the thread stack, but we
+	 * can't go through the normal entry paths: NMIs are masked, and
+	 * we don't want to enable interrupts, because then we'll end
+	 * up in an awkward situation in which IRQs are on but NMIs
+	 * are off.
+	 */
+	SWAPGS
+	cld
+	movq	%rsp, %rdx
+	movq	PER_CPU_VAR(kernel_stack), %rsp
+	addq	$KERNEL_STACK_OFFSET, %rsp
+	pushq	5*8(%rdx)	/* pt_regs->ss */
+	pushq	4*8(%rdx)	/* pt_regs->rsp */
+	pushq	3*8(%rdx)	/* pt_regs->flags */
+	pushq	2*8(%rdx)	/* pt_regs->cs */
+	pushq	1*8(%rdx)	/* pt_regs->rip */
+	pushq   $-1		/* pt_regs->orig_ax */
+	pushq   %rdi		/* pt_regs->di */
+	pushq   %rsi		/* pt_regs->si */
+	pushq   (%rdx)		/* pt_regs->dx */
+	pushq   %rcx		/* pt_regs->cx */
+	pushq   %rax		/* pt_regs->ax */
+	pushq   %r8		/* pt_regs->r8 */
+	pushq   %r9		/* pt_regs->r9 */
+	pushq   %r10		/* pt_regs->r10 */
+	pushq   %r11		/* pt_regs->r11 */
+	pushq	%rbx		/* pt_regs->rbx */
+	pushq	%rbp		/* pt_regs->rbp */
+	pushq	%r12		/* pt_regs->r12 */
+	pushq	%r13		/* pt_regs->r13 */
+	pushq	%r14		/* pt_regs->r14 */
+	pushq	%r15		/* pt_regs->r15 */
+
 	/*
-	 * If %cs was not the kernel segment, then the NMI triggered in user
-	 * space, which means it is definitely not nested.
+	 * At this point we no longer need to worry about stack damage
+	 * due to nesting -- we're on the normal thread stack and we're
+	 * done with the NMI stack.
 	 */
-	cmpl $__KERNEL_CS, 16(%rsp)
-	jne first_nmi
+	movq	%rsp, %rdi
+	movq	$-1, %rsi
+	call	do_nmi
+
+	/*
+	 * Return back to user mode.  We must *not* do the normal exit
+	 * work, because we don't want to enable interrupts.  Fortunately,
+	 * do_nmi doesn't modify pt_regs.
+	 */
+	SWAPGS
+
+	/*
+	 * Open-code the entire return process for compatibility with varying
+	 * register layouts across different kernel versions.
+	 */
+	addq	$6*8, %rsp	/* skip bx, bp, and r12-r15 */
+	popq	%r11		/* pt_regs->r11 */
+	popq	%r10		/* pt_regs->r10 */
+	popq	%r9		/* pt_regs->r9 */
+	popq	%r8		/* pt_regs->r8 */
+	popq	%rax		/* pt_regs->ax */
+	popq	%rcx		/* pt_regs->cx */
+	popq	%rdx		/* pt_regs->dx */
+	popq	%rsi		/* pt_regs->si */
+	popq	%rdi		/* pt_regs->di */
+	addq	$8, %rsp	/* skip orig_ax */
+	INTERRUPT_RETURN
 
+.Lnmi_from_kernel:
 	/*
 	 * Check the special variable on the stack to see if NMIs are
 	 * executing.
-- 
2.5.0
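
The long pushq sequence in this patch hand-builds a struct pt_regs on
the thread stack: the first five pushes copy the hardware IRET frame
over from the NMI stack through %rdx, and the rest save the
general-purpose registers. As a C-level view (field order as in the
x86_64 pt_regs of this era; the asm pushes them in reverse, ss first and
r15 last, so the finished stack image matches):

    struct pt_regs {                  /* lowest address first */
        unsigned long r15, r14, r13, r12, bp, bx;
        unsigned long r11, r10, r9, r8;
        unsigned long ax, cx, dx, si, di;
        unsigned long orig_ax;        /* pushed as -1: not a syscall */
        unsigned long ip, cs, flags, sp, ss;  /* copied IRET frame */
    };

This is also why the code can then simply pass %rsp in %rdi as the
struct pt_regs * argument of do_nmi.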



* [PATCH-v3.14.y v2 4/6] x86/nmi/64: Improve nested NMI comments
  2015-08-19 14:11             ` [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
                                 ` (2 preceding siblings ...)
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 3/6] x86/nmi/64: Switch stacks on userspace NMI entry Thomas D
@ 2015-08-19 14:11               ` Thomas D
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 5/6] x86/nmi/64: Reorder nested NMI checks Thomas D
                                 ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Thomas D @ 2015-08-19 14:11 UTC (permalink / raw)
  To: stable
  Cc: luto, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Greg Kroah-Hartman

From: Andy Lutomirski <luto@kernel.org>

commit 0b22930ebad563ae97ff3f8d7b9f12060b4c6e6b upstream.

I found the nested NMI documentation to be difficult to follow.
Improve the comments.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/entry_64.S | 159 ++++++++++++++++++++++++++-------------------
 arch/x86/kernel/nmi.c      |   4 +-
 2 files changed, 93 insertions(+), 70 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 2c9366a..7251922 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1702,11 +1702,12 @@ ENTRY(nmi)
 	 *  If the variable is not set and the stack is not the NMI
 	 *  stack then:
 	 *    o Set the special variable on the stack
-	 *    o Copy the interrupt frame into a "saved" location on the stack
-	 *    o Copy the interrupt frame into a "copy" location on the stack
+	 *    o Copy the interrupt frame into an "outermost" location on the
+	 *      stack
+	 *    o Copy the interrupt frame into an "iret" location on the stack
 	 *    o Continue processing the NMI
 	 *  If the variable is set or the previous stack is the NMI stack:
-	 *    o Modify the "copy" location to jump to the repeate_nmi
+	 *    o Modify the "iret" location to jump to the repeat_nmi
 	 *    o return back to the first NMI
 	 *
 	 * Now on exit of the first NMI, we first clear the stack variable
@@ -1798,18 +1799,60 @@ ENTRY(nmi)
 
 .Lnmi_from_kernel:
 	/*
-	 * Check the special variable on the stack to see if NMIs are
-	 * executing.
+	 * Here's what our stack frame will look like:
+	 * +---------------------------------------------------------+
+	 * | original SS                                             |
+	 * | original Return RSP                                     |
+	 * | original RFLAGS                                         |
+	 * | original CS                                             |
+	 * | original RIP                                            |
+	 * +---------------------------------------------------------+
+	 * | temp storage for rdx                                    |
+	 * +---------------------------------------------------------+
+	 * | "NMI executing" variable                                |
+	 * +---------------------------------------------------------+
+	 * | iret SS          } Copied from "outermost" frame        |
+	 * | iret Return RSP  } on each loop iteration; overwritten  |
+	 * | iret RFLAGS      } by a nested NMI to force another     |
+	 * | iret CS          } iteration if needed.                 |
+	 * | iret RIP         }                                      |
+	 * +---------------------------------------------------------+
+	 * | outermost SS          } initialized in first_nmi;       |
+	 * | outermost Return RSP  } will not be changed before      |
+	 * | outermost RFLAGS      } NMI processing is done.         |
+	 * | outermost CS          } Copied to "iret" frame on each  |
+	 * | outermost RIP         } iteration.                      |
+	 * +---------------------------------------------------------+
+	 * | pt_regs                                                 |
+	 * +---------------------------------------------------------+
+	 *
+	 * The "original" frame is used by hardware.  Before re-enabling
+	 * NMIs, we need to be done with it, and we need to leave enough
+	 * space for the asm code here.
+	 *
+	 * We return by executing IRET while RSP points to the "iret" frame.
+	 * That will either return for real or it will loop back into NMI
+	 * processing.
+	 *
+	 * The "outermost" frame is copied to the "iret" frame on each
+	 * iteration of the loop, so each iteration starts with the "iret"
+	 * frame pointing to the final return target.
+	 */
+
+	/*
+	 * Determine whether we're a nested NMI.
+	 *
+	 * First check "NMI executing".  If it's set, then we're nested.
+	 * This will not detect if we interrupted an outer NMI just
+	 * before IRET.
 	 */
 	cmpl $1, -8(%rsp)
 	je nested_nmi
 
 	/*
-	 * Now test if the previous stack was an NMI stack.
-	 * We need the double check. We check the NMI stack to satisfy the
-	 * race when the first NMI clears the variable before returning.
-	 * We check the variable because the first NMI could be in a
-	 * breakpoint routine using a breakpoint stack.
+	 * Now test if the previous stack was an NMI stack.  This covers
+	 * the case where we interrupt an outer NMI after it clears
+	 * "NMI executing" but before IRET.
 	 */
 	lea 6*8(%rsp), %rdx
 	test_in_nmi rdx, 4*8(%rsp), nested_nmi, first_nmi
@@ -1817,9 +1860,11 @@ ENTRY(nmi)
 
 nested_nmi:
 	/*
-	 * Do nothing if we interrupted the fixup in repeat_nmi.
-	 * It's about to repeat the NMI handler, so we are fine
-	 * with ignoring this one.
+	 * If we interrupted an NMI that is between repeat_nmi and
+	 * end_repeat_nmi, then we must not modify the "iret" frame
+	 * because it's being written by the outer NMI.  That's okay;
+	 * the outer NMI handler is about to call do_nmi anyway,
+	 * so we can just resume the outer NMI.
 	 */
 	movq $repeat_nmi, %rdx
 	cmpq 8(%rsp), %rdx
@@ -1829,7 +1874,10 @@ nested_nmi:
 	ja nested_nmi_out
 
 1:
-	/* Set up the interrupted NMIs stack to jump to repeat_nmi */
+	/*
+	 * Modify the "iret" frame to point to repeat_nmi, forcing another
+	 * iteration of NMI handling.
+	 */
 	leaq -1*8(%rsp), %rdx
 	movq %rdx, %rsp
 	CFI_ADJUST_CFA_OFFSET 1*8
@@ -1848,60 +1896,23 @@ nested_nmi_out:
 	popq_cfi %rdx
 	CFI_RESTORE rdx
 
-	/* No need to check faults here */
+	/* We are returning to kernel mode, so this cannot result in a fault. */
 	INTERRUPT_RETURN
 
 	CFI_RESTORE_STATE
 first_nmi:
-	/*
-	 * Because nested NMIs will use the pushed location that we
-	 * stored in rdx, we must keep that space available.
-	 * Here's what our stack frame will look like:
-	 * +-------------------------+
-	 * | original SS             |
-	 * | original Return RSP     |
-	 * | original RFLAGS         |
-	 * | original CS             |
-	 * | original RIP            |
-	 * +-------------------------+
-	 * | temp storage for rdx    |
-	 * +-------------------------+
-	 * | NMI executing variable  |
-	 * +-------------------------+
-	 * | copied SS               |
-	 * | copied Return RSP       |
-	 * | copied RFLAGS           |
-	 * | copied CS               |
-	 * | copied RIP              |
-	 * +-------------------------+
-	 * | Saved SS                |
-	 * | Saved Return RSP        |
-	 * | Saved RFLAGS            |
-	 * | Saved CS                |
-	 * | Saved RIP               |
-	 * +-------------------------+
-	 * | pt_regs                 |
-	 * +-------------------------+
-	 *
-	 * The saved stack frame is used to fix up the copied stack frame
-	 * that a nested NMI may change to make the interrupted NMI iret jump
-	 * to the repeat_nmi. The original stack frame and the temp storage
-	 * is also used by nested NMIs and can not be trusted on exit.
-	 */
-	/* Do not pop rdx, nested NMIs will corrupt that part of the stack */
+	/* Restore rdx. */
 	movq (%rsp), %rdx
 	CFI_RESTORE rdx
 
-	/* Set the NMI executing variable on the stack. */
+	/* Set "NMI executing" on the stack. */
 	pushq_cfi $1
 
-	/*
-	 * Leave room for the "copied" frame
-	 */
+	/* Leave room for the "iret" frame */
 	subq $(5*8), %rsp
 	CFI_ADJUST_CFA_OFFSET 5*8
 
-	/* Copy the stack frame to the Saved frame */
+	/* Copy the "original" frame to the "outermost" frame */
 	.rept 5
 	pushq_cfi 11*8(%rsp)
 	.endr
@@ -1909,6 +1920,7 @@ first_nmi:
 
 	/* Everything up to here is safe from nested NMIs */
 
+repeat_nmi:
 	/*
 	 * If there was a nested NMI, the first NMI's iret will return
 	 * here. But NMIs are still enabled and we can take another
@@ -1917,16 +1929,21 @@ first_nmi:
 	 * it will just return, as we are about to repeat an NMI anyway.
 	 * This makes it safe to copy to the stack frame that a nested
 	 * NMI will update.
-	 */
-repeat_nmi:
-	/*
-	 * Update the stack variable to say we are still in NMI (the update
-	 * is benign for the non-repeat case, where 1 was pushed just above
-	 * to this very stack slot).
+	 *
+	 * RSP is pointing to "outermost RIP".  gsbase is unknown, but, if
+	 * we're repeating an NMI, gsbase has the same value that it had on
+	 * the first iteration.  paranoid_entry will load the kernel
+	 * gsbase if needed before we call do_nmi.
+	 *
+	 * Set "NMI executing" in case we came back here via IRET.
 	 */
 	movq $1, 10*8(%rsp)
 
-	/* Make another copy, this one may be modified by nested NMIs */
+	/*
+	 * Copy the "outermost" frame to the "iret" frame.  NMIs that nest
+	 * here must not modify the "iret" frame while we're writing to
+	 * it or it will end up containing garbage.
+	 */
 	addq $(10*8), %rsp
 	CFI_ADJUST_CFA_OFFSET -10*8
 	.rept 5
@@ -1937,9 +1954,9 @@ repeat_nmi:
 end_repeat_nmi:
 
 	/*
-	 * Everything below this point can be preempted by a nested
-	 * NMI if the first NMI took an exception and reset our iret stack
-	 * so that we repeat another NMI.
+	 * Everything below this point can be preempted by a nested NMI.
+	 * If this happens, then the inner NMI will change the "iret"
+	 * frame to point back to repeat_nmi.
 	 */
 	pushq_cfi $-1		/* ORIG_RAX: no syscall to restart */
 	subq $ORIG_RAX-R15, %rsp
@@ -1967,9 +1984,15 @@ nmi_restore:
 	/* Pop the extra iret frame at once */
 	RESTORE_ALL 6*8
 
-	/* Clear the NMI executing stack variable */
+	/* Clear "NMI executing". */
 	movq $0, 5*8(%rsp)
-	jmp irq_return
+
+	/*
+	 * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
+	 * stack in a single instruction.  We are returning to kernel
+	 * mode, so this cannot result in a fault.
+	 */
+	INTERRUPT_RETURN
 	CFI_ENDPROC
 END(nmi)
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index c1c7d41..8facfb3 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -392,8 +392,8 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 }
 
 /*
- * NMIs can hit breakpoints which will cause them to lose their NMI context
- * with the CPU when the breakpoint or page fault does an IRET.
+ * NMIs can page fault or hit breakpoints which will cause them to lose
+ * their NMI context with the CPU when the breakpoint or page fault does an IRET.
  *
  * As a result, NMIs can nest if NMIs get unmasked due to an IRET during
  * NMI processing.  On x86_64, the asm glue protects us from nested NMIs
-- 
2.5.0
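
The diagram added above can be read as three five-word IRET frames plus
two scratch slots. A schematic C view of the kernel-mode NMI stack
(illustration only; fields are in ascending address order, so the
hardware-pushed "original" frame sits highest):

    struct iret_frame {
        unsigned long rip, cs, rflags, rsp, ss;
    };

    struct nmi_stack_image {            /* ascending addresses */
        /* (pt_regs for do_nmi is built below this point) */
        struct iret_frame outermost;    /* fixed once in first_nmi */
        struct iret_frame iret;         /* recopied each iteration;
                                           IRET executes from here */
        unsigned long nmi_executing;    /* the "NMI executing" flag */
        unsigned long saved_rdx;        /* temp storage for rdx */
        struct iret_frame original;     /* pushed by the hardware */
    };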



* [PATCH-v3.14.y v2 5/6] x86/nmi/64: Reorder nested NMI checks
  2015-08-19 14:11             ` [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
                                 ` (3 preceding siblings ...)
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 4/6] x86/nmi/64: Improve nested NMI comments Thomas D
@ 2015-08-19 14:11               ` Thomas D
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 6/6] x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection Thomas D
  2015-09-29 14:11               ` [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Greg KH
  6 siblings, 0 replies; 21+ messages in thread
From: Thomas D @ 2015-08-19 14:11 UTC (permalink / raw)
  To: stable
  Cc: luto, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Greg Kroah-Hartman

From: Andy Lutomirski <luto@kernel.org>

commit a27507ca2d796cfa8d907de31ad730359c8a6d06 upstream.

Check the repeat_nmi .. end_repeat_nmi special case first.  The
next patch will rework the RSP check and, as a side effect, the
RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
we'll need this ordering of the checks.

Note: this is more subtle than it appears.  The check for
repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
instead of adjusting the "iret" frame to force a repeat.  This
is necessary, because the code between repeat_nmi and
end_repeat_nmi sets "NMI executing" and then writes to the
"iret" frame itself.  If a nested NMI comes in and modifies the
"iret" frame while repeat_nmi is also modifying it, we'll end up
with garbage.  The old code got this right, as does the new
code, but the new code is a bit more explicit.

If we were to move the check right after the "NMI executing"
check, then we'd get it wrong and have random crashes.

( Because the "NMI executing" check would jump to the code that would
  modify the "iret" frame without checking if the interrupted NMI was
  currently modifying it. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/entry_64.S | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 7251922..0f02171 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1842,7 +1842,23 @@ ENTRY(nmi)
 	/*
 	 * Determine whether we're a nested NMI.
 	 *
-	 * First check "NMI executing".  If it's set, then we're nested.
+	 * If we interrupted kernel code between repeat_nmi and
+	 * end_repeat_nmi, then we are a nested NMI.  We must not
+	 * modify the "iret" frame because it's being written by
+	 * the outer NMI.  That's okay; the outer NMI handler is
+	 * about to call do_nmi anyway, so we can just
+	 * resume the outer NMI.
+	 */
+	movq	$repeat_nmi, %rdx
+	cmpq	8(%rsp), %rdx
+	ja	1f
+	movq	$end_repeat_nmi, %rdx
+	cmpq	8(%rsp), %rdx
+	ja	nested_nmi_out
+1:
+
+	/*
+	 * Now check "NMI executing".  If it's set, then we're nested.
 	 * This will not detect if we interrupted an outer NMI just
 	 * before IRET.
 	 */
@@ -1860,21 +1876,6 @@ ENTRY(nmi)
 
 nested_nmi:
 	/*
-	 * If we interrupted an NMI that is between repeat_nmi and
-	 * end_repeat_nmi, then we must not modify the "iret" frame
-	 * because it's being written by the outer NMI.  That's okay;
-	 * the outer NMI handler is about to call do_nmi anyway,
-	 * so we can just resume the outer NMI.
-	 */
-	movq $repeat_nmi, %rdx
-	cmpq 8(%rsp), %rdx
-	ja 1f
-	movq $end_repeat_nmi, %rdx
-	cmpq 8(%rsp), %rdx
-	ja nested_nmi_out
-
-1:
-	/*
 	 * Modify the "iret" frame to point to repeat_nmi, forcing another
 	 * iteration of NMI handling.
 	 */
-- 
2.5.0
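
In C-style pseudocode, the nested-NMI entry checks now run in this order
(a sketch of the control flow only; rsp_was_on_nmi_stack stands in for
the test_in_nmi macro and is reworked by the next patch):

    if (rip >= repeat_nmi && rip < end_repeat_nmi)
        goto nested_nmi_out;   /* the outer NMI owns the "iret" frame:
                                  leave it alone and just return */
    if (nmi_executing)         /* "NMI executing" flag on the stack */
        goto nested_nmi;       /* force another iteration */
    if (rsp_was_on_nmi_stack)  /* catches the window after the flag is
                                  cleared but before IRET completes */
        goto nested_nmi;
    goto first_nmi;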



* [PATCH-v3.14.y v2 6/6] x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection
  2015-08-19 14:11             ` [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
                                 ` (4 preceding siblings ...)
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 5/6] x86/nmi/64: Reorder nested NMI checks Thomas D
@ 2015-08-19 14:11               ` Thomas D
  2015-09-29 14:11               ` [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Greg KH
  6 siblings, 0 replies; 21+ messages in thread
From: Thomas D @ 2015-08-19 14:11 UTC (permalink / raw)
  To: stable
  Cc: luto, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Greg Kroah-Hartman

From: Andy Lutomirski <luto@kernel.org>

commit 810bc075f78ff2c221536eb3008eac6a492dba2d upstream.

We have a tricky bug in the nested NMI code: if we see RSP
pointing to the NMI stack on NMI entry from kernel mode, we
assume that we are executing a nested NMI.

This isn't quite true.  A malicious userspace program can point
RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
happen while RSP is still pointing at the NMI stack.

Fix it with a sneaky trick.  Set DF in the region of code that
the RSP check is intended to detect.  IRET will clear DF
atomically.

( Note: other than paravirt, there's little need for all this
  complexity. We could check RIP instead of RSP. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/entry_64.S | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0f02171..6d6ab2b 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1868,10 +1868,25 @@ ENTRY(nmi)
 	/*
 	 * Now test if the previous stack was an NMI stack.  This covers
 	 * the case where we interrupt an outer NMI after it clears
-	 * "NMI executing" but before IRET.
+	 * "NMI executing" but before IRET.  We need to be careful, though:
+	 * there is one case in which RSP could point to the NMI stack
+	 * despite there being no NMI active: naughty userspace controls
+	 * RSP at the very beginning of the SYSCALL targets.  We can
+	 * pull a fast one on naughty userspace, though: we program
+	 * SYSCALL to mask DF, so userspace cannot cause DF to be set
+	 * if it controls the kernel's RSP.  We set DF before we clear
+	 * "NMI executing".
 	 */
 	lea 6*8(%rsp), %rdx
 	test_in_nmi rdx, 4*8(%rsp), nested_nmi, first_nmi
+
+	/* Ah, it is within the NMI stack. */
+
+	testb	$(X86_EFLAGS_DF >> 8), (3*8 + 1)(%rsp)
+	jz	first_nmi	/* RSP was user controlled. */
+
+	/* This is a nested NMI. */
+
 	CFI_REMEMBER_STATE
 
 nested_nmi:
@@ -1985,8 +2000,16 @@ nmi_restore:
 	/* Pop the extra iret frame at once */
 	RESTORE_ALL 6*8
 
-	/* Clear "NMI executing". */
-	movq $0, 5*8(%rsp)
+	/*
+	 * Clear "NMI executing".  Set DF first so that we can easily
+	 * distinguish the remaining code between here and IRET from
+	 * the SYSCALL entry and exit paths.  On a native kernel, we
+	 * could just inspect RIP, but, on paravirt kernels,
+	 * INTERRUPT_RETURN can translate into a jump into a
+	 * hypercall page.
+	 */
+	std
+	movq	$0, 5*8(%rsp)		/* clear "NMI executing" */
 
 	/*
 	 * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
-- 
2.5.0
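
A note on the testb above: X86_EFLAGS_DF is bit 10 (0x0400), so shifting
it right by 8 yields a mask for the second byte of the saved RFLAGS word
at 3*8(%rsp). In C, the refined nesting test looks roughly like this
(a sketch; rsp_points_into_nmi_stack is a hypothetical stand-in for the
stack-range check):

    #define X86_EFLAGS_DF 0x0400  /* direction flag, bit 10 */

    int rsp_points_into_nmi_stack(unsigned long rsp);  /* hypothetical */

    static int is_nested_nmi(unsigned long saved_rsp,
                             unsigned long saved_rflags)
    {
        if (!rsp_points_into_nmi_stack(saved_rsp))
            return 0;  /* clearly a first NMI */

        /* SYSCALL masks DF and IRET clears it atomically, so DF can
         * only be set here if an outer NMI set it via std.  If DF is
         * clear, RSP was user controlled: treat it as a first NMI. */
        return (saved_rflags & X86_EFLAGS_DF) != 0;
    }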



* Re: [PATCH-v3.14.y v2 1/6] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 1/6] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Thomas D
@ 2015-09-29 13:38                 ` Greg KH
  0 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2015-09-29 13:38 UTC (permalink / raw)
  To: Thomas D
  Cc: stable, luto, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar

On Wed, Aug 19, 2015 at 04:11:47PM +0200, Thomas D wrote:
> From: Andy Lutomirski <luto@kernel.org>
> 
> 32-bit kernels handle nested NMIs in C.  Enable the exact same
> handling on 64-bit kernels as well.  This isn't currently
> necessary, but it will become necessary once the asm code starts
> allowing limited nesting.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/nmi.c | 123 +++++++++++++++++++++-----------------------------
>  1 file changed, 52 insertions(+), 71 deletions(-)

Any reason why you didn't sign off on these?

And please provide the git commit ids of the patches, otherwise I have
to go dig them up, and that's a pain when you are dealing with the
number of patches I have to deal with...

thanks,

greg k-h


* Re: [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157
  2015-08-19 14:11             ` [PATCH-v3.14.y v2 0/6] x86/nmi/64: Stable backports for CVE-2015-3290 and CVE-2015-5157 Thomas D
                                 ` (5 preceding siblings ...)
  2015-08-19 14:11               ` [PATCH-v3.14.y v2 6/6] x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection Thomas D
@ 2015-09-29 14:11               ` Greg KH
  6 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2015-09-29 14:11 UTC (permalink / raw)
  To: Thomas D; +Cc: stable, luto

On Wed, Aug 19, 2015 at 04:11:46PM +0200, Thomas D wrote:
> Hi,
> 
> here's my backport for CVE-2015-3290 and CVE-2015-5157 for linux-3.14.

All now queued up, thanks for this.

greg k-h


