linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Juergen Gross <jgross@suse.com>
To: Andy Lutomirski <luto@kernel.org>, x86@kernel.org
Cc: linux-kernel@vger.kernel.org, Borislav Petkov <bp@alien8.de>,
	Brian Gerst <brgerst@gmail.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Kees Cook <keescook@chromium.org>
Subject: Re: [RFC 08/17] x86/asm/64: De-Xen-ify our NMI code
Date: Thu, 7 Sep 2017 11:34:44 +0200	[thread overview]
Message-ID: <decdea78-f65a-ddc1-4b0c-e18964790def@suse.com> (raw)
In-Reply-To: <f0d45728756ebe8540edcc7b0da57bb77b3b719e.1504733277.git.luto@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 3931 bytes --]

On 06/09/17 23:36, Andy Lutomirski wrote:
> Xen PV is fundamentally incompatible with our fancy NMI code: it
> doesn't use IST at all, and Xen entries clobber two stack slots
> below the hardware frame.
> 
> Drop Xen PV support from our NMI code entirely.
> 
> XXX: Juergen: could you write and test the tiny patch needed to
> make Xen PV have a xen_nmi entry that handles NMIs?  I don't know
> how to test it.

You mean something like the attached one?

Seems to work at least for the "normal" case of a NMI coming in at
a random point in time.

Regarding testing: in case you have a Xen setup you can easily send
a NMI to a domain from dom0:

xl trigger <domain> nmi


Juergen

> 
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_64.S | 41 +++++++++++++++++------------------------
>  1 file changed, 17 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index a9e318f7cc9b..c81e05fb999e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1171,21 +1171,12 @@ ENTRY(error_exit)
>  	jmp	retint_user
>  END(error_exit)
>  
> -/* Runs on exception stack */
> +/*
> + * Runs on exception stack.  Xen PV does not go through this path at all,
> + * so we can use real assembly here.
> + */
>  ENTRY(nmi)
>  	/*
> -	 * Fix up the exception frame if we're on Xen.
> -	 * PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
> -	 * one value to the stack on native, so it may clobber the rdx
> -	 * scratch slot, but it won't clobber any of the important
> -	 * slots past it.
> -	 *
> -	 * Xen is a different story, because the Xen frame itself overlaps
> -	 * the "NMI executing" variable.
> -	 */
> -	PARAVIRT_ADJUST_EXCEPTION_FRAME
> -
> -	/*
>  	 * We allow breakpoints in NMIs. If a breakpoint occurs, then
>  	 * the iretq it performs will take us out of NMI context.
>  	 * This means that we can have nested NMIs where the next
> @@ -1240,7 +1231,7 @@ ENTRY(nmi)
>  	 * stacks lest we corrupt the "NMI executing" variable.
>  	 */
>  
> -	SWAPGS_UNSAFE_STACK
> +	swapgs
>  	cld
>  	movq	%rsp, %rdx
>  	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
> @@ -1402,7 +1393,7 @@ nested_nmi_out:
>  	popq	%rdx
>  
>  	/* We are returning to kernel mode, so this cannot result in a fault. */
> -	INTERRUPT_RETURN
> +	iretq
>  
>  first_nmi:
>  	/* Restore rdx. */
> @@ -1432,7 +1423,7 @@ first_nmi:
>  	pushfq			/* RFLAGS */
>  	pushq	$__KERNEL_CS	/* CS */
>  	pushq	$1f		/* RIP */
> -	INTERRUPT_RETURN	/* continues at repeat_nmi below */
> +	iretq			/* continues at repeat_nmi below */
>  1:
>  #endif
>  
> @@ -1502,20 +1493,22 @@ nmi_restore:
>  	/*
>  	 * Clear "NMI executing".  Set DF first so that we can easily
>  	 * distinguish the remaining code between here and IRET from
> -	 * the SYSCALL entry and exit paths.  On a native kernel, we
> -	 * could just inspect RIP, but, on paravirt kernels,
> -	 * INTERRUPT_RETURN can translate into a jump into a
> -	 * hypercall page.
> +	 * the SYSCALL entry and exit paths.
> +	 *
> +	 * We arguably should just inspect RIP instead, but I (Andy) wrote
> +	 * this code when I had the misapprehension that Xen PV supported
> +	 * NMIs, and Xen PV would break that approach.
>  	 */
>  	std
>  	movq	$0, 5*8(%rsp)		/* clear "NMI executing" */
>  
>  	/*
> -	 * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
> -	 * stack in a single instruction.  We are returning to kernel
> -	 * mode, so this cannot result in a fault.
> +	 * iretq reads the "iret" frame and exits the NMI stack in a
> +	 * single instruction.  We are returning to kernel mode, so this
> +	 * cannot result in a fault.  Similarly, we don't need to worry
> +	 * about espfix64 on the way back to kernel mode.
>  	 */
> -	INTERRUPT_RETURN
> +	iretq
>  END(nmi)
>  
>  ENTRY(ignore_sysret)
> 


[-- Attachment #2: 0001-xen-add-xen-nmi-trap-entry.patch --]
[-- Type: text/x-patch, Size: 3169 bytes --]

>From 544e6cf531efaafbeed7d4c15690ee360ae1e45c Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Thu, 7 Sep 2017 10:23:19 +0200
Subject: [PATCH] xen: add xen nmi trap entry

Instead of trying to execute any NMI via the bare metal's NMI trap
handler use a Xen specific one for pv domains, like we do for e.g.
debug traps. As in a pv domain the NMI is handled via the normal
kernel stack this is the correct thing to do.

This will enable us to get rid of the very fragile and questionable
dependencies between the bare metal NMI handler and Xen assumptions
believed to be broken anyway.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/entry/entry_64.S    | 2 +-
 arch/x86/include/asm/traps.h | 2 +-
 arch/x86/xen/enlighten_pv.c  | 2 +-
 arch/x86/xen/xen-asm_64.S    | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 49167258d587..62a3248c18fc 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1058,6 +1058,7 @@ idtentry int3			do_int3			has_error_code=0	paranoid=1 shift_ist=DEBUG_STACK
 idtentry stack_segment		do_stack_segment	has_error_code=1
 
 #ifdef CONFIG_XEN
+idtentry xennmi			do_nmi			has_error_code=0
 idtentry xendebug		do_debug		has_error_code=0
 idtentry xenint3		do_int3			has_error_code=0
 #endif
@@ -1223,7 +1224,6 @@ ENTRY(error_exit)
 END(error_exit)
 
 /* Runs on exception stack */
-/* XXX: broken on Xen PV */
 ENTRY(nmi)
 	UNWIND_HINT_IRET_REGS
 	/*
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 5545f6459bf5..26f1a88e42da 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -37,9 +37,9 @@ asmlinkage void simd_coprocessor_error(void);
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV)
 asmlinkage void xen_divide_error(void);
+asmlinkage void xen_xennmi(void);
 asmlinkage void xen_xendebug(void);
 asmlinkage void xen_xenint3(void);
-asmlinkage void xen_nmi(void);
 asmlinkage void xen_overflow(void);
 asmlinkage void xen_bounds(void);
 asmlinkage void xen_invalid_op(void);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index ae2a2e2d6362..e94f7a7f7f15 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -600,7 +600,7 @@ static struct trap_array_entry trap_array[] = {
 #ifdef CONFIG_X86_MCE
 	{ machine_check,               xen_machine_check,               true },
 #endif
-	{ nmi,                         xen_nmi,                         true },
+	{ nmi,                         xen_xennmi,                      true },
 	{ overflow,                    xen_overflow,                    false },
 #ifdef CONFIG_IA32_EMULATION
 	{ entry_INT80_compat,          xen_entry_INT80_compat,          false },
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index dae2cc33afb5..286ecc198562 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -29,7 +29,7 @@ xen_pv_trap debug
 xen_pv_trap xendebug
 xen_pv_trap int3
 xen_pv_trap xenint3
-xen_pv_trap nmi
+xen_pv_trap xennmi
 xen_pv_trap overflow
 xen_pv_trap bounds
 xen_pv_trap invalid_op
-- 
2.12.3


  reply	other threads:[~2017-09-07  9:34 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-06 21:36 [RFC 00/17] Pile o' entry stack changes Andy Lutomirski
2017-09-06 21:36 ` [RFC 01/17] x86/asm/64: Remove the restore_c_regs_and_iret label Andy Lutomirski
2017-09-07  9:40   ` Borislav Petkov
2017-09-07  9:46     ` Ingo Molnar
2017-09-07  9:49       ` Ingo Molnar
2017-09-07  9:57         ` Borislav Petkov
2017-09-07 10:29           ` Ingo Molnar
2017-09-06 21:36 ` [RFC 02/17] x86/asm/64: Split the iret-to-user and iret-to-kernel paths Andy Lutomirski
2017-09-06 21:36 ` [RFC 03/17] x86/asm/64: Move SWAPGS into the common iret-to-usermode path Andy Lutomirski
2017-09-06 21:36 ` [RFC 04/17] x86/asm/64: Simplify reg restore code in the standard IRET paths Andy Lutomirski
2017-09-12 20:05   ` Josh Poimboeuf
2017-09-06 21:36 ` [RFC 05/17] x86/asm/64: Shrink paranoid_exit_restore and make labels local Andy Lutomirski
2017-09-06 21:36 ` [RFC 06/17] x86/asm/64: Use pop instead of movq in syscall_return_via_sysret Andy Lutomirski
2017-09-06 21:36 ` [RFC 07/17] x86/asm/64: Merge the fast and slow SYSRET paths Andy Lutomirski
2017-09-06 21:36 ` [RFC 08/17] x86/asm/64: De-Xen-ify our NMI code Andy Lutomirski
2017-09-07  9:34   ` Juergen Gross [this message]
2017-09-07 18:38     ` Andy Lutomirski
2017-09-08  4:26       ` Juergen Gross
2017-09-06 21:36 ` [RFC 09/17] x86/asm/32: Pull MSR_IA32_SYSENTER_CS update code out of native_load_sp0() Andy Lutomirski
2017-09-12 20:06   ` Josh Poimboeuf
2017-09-06 21:36 ` [RFC 10/17] x86/asm/64: Pass sp0 directly to load_sp0() Andy Lutomirski
2017-09-06 21:36 ` [RFC 11/17] x86/asm: Add task_top_of_stack() to find the top of a task's stack Andy Lutomirski
2017-09-06 21:36 ` [RFC 12/17] x86/xen/64: Clean up SP code in cpu_initialize_context() Andy Lutomirski
2017-09-12 20:09   ` Josh Poimboeuf
2017-09-06 21:36 ` [RFC 13/17] x86/boot/64: Stop initializing TSS.sp0 at boot Andy Lutomirski
2017-09-06 21:36 ` [RFC 14/17] x86/asm/64: Remove all remaining direct thread_struct::sp0 reads Andy Lutomirski
2017-09-06 21:37 ` [RFC 15/17] x86/boot/32: Fix cpu_current_top_of_stack initialization at boot Andy Lutomirski
2017-09-06 21:37 ` [RFC 16/17] x86/asm/64: Remove thread_struct::sp0 Andy Lutomirski
2017-09-06 21:37 ` [RFC 17/17] x86/traps: Use a new on_thread_stack() helper to clean up an assertion Andy Lutomirski
2017-09-12 20:11   ` Josh Poimboeuf
2017-09-12 20:25     ` Andrew Cooper
2017-09-06 22:16 ` [RFC 00/17] Pile o' entry stack changes Andi Kleen
2017-09-07  0:01   ` Andy Lutomirski
2017-09-07  7:04     ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=decdea78-f65a-ddc1-4b0c-e18964790def@suse.com \
    --to=jgross@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).