Subject: Re: [RFC 08/17] x86/asm/64: De-Xen-ify our NMI code
From: Juergen Gross
To: Andy Lutomirski, x86@kernel.org
Cc: linux-kernel@vger.kernel.org, Borislav Petkov, Brian Gerst,
    Andrew Cooper, Boris Ostrovsky, Kees Cook
Date: Thu, 7 Sep 2017 11:34:44 +0200

On 06/09/17 23:36, Andy Lutomirski wrote:
> Xen PV is fundamentally incompatible with our fancy NMI code: it
> doesn't use IST at all, and Xen entries clobber two stack slots
> below the hardware frame.
>
> Drop Xen PV support from our NMI code entirely.
>
> XXX: Juergen: could you write and test the tiny patch needed to
> make Xen PV have a xen_nmi entry that handles NMIs? I don't know
> how to test it.

You mean something like the attached one? It seems to work, at least
for the "normal" case of an NMI coming in at a random point in time.

Regarding testing: in case you have a Xen setup you can easily send
an NMI to a domain from dom0:

  xl trigger <domain> nmi
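For example, assuming a PV guest named "guest1" (the name is only
illustrative), the whole round trip is:

  # in dom0: inject an NMI into the guest
  xl trigger guest1 nmi

  # inside the guest: the NMI count in /proc/interrupts should go up
  grep NMI /proc/interrupts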
Juergen

>
> Cc: Juergen Gross
> Cc: Boris Ostrovsky
> Signed-off-by: Andy Lutomirski
> ---
>  arch/x86/entry/entry_64.S | 41 +++++++++++++++++------------------------
>  1 file changed, 17 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index a9e318f7cc9b..c81e05fb999e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1171,21 +1171,12 @@ ENTRY(error_exit)
>  	jmp	retint_user
>  END(error_exit)
>
> -/* Runs on exception stack */
> +/*
> + * Runs on exception stack. Xen PV does not go through this path at all,
> + * so we can use real assembly here.
> + */
>  ENTRY(nmi)
>  	/*
> -	 * Fix up the exception frame if we're on Xen.
> -	 * PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
> -	 * one value to the stack on native, so it may clobber the rdx
> -	 * scratch slot, but it won't clobber any of the important
> -	 * slots past it.
> -	 *
> -	 * Xen is a different story, because the Xen frame itself overlaps
> -	 * the "NMI executing" variable.
> -	 */
> -	PARAVIRT_ADJUST_EXCEPTION_FRAME
> -
> -	/*
>  	 * We allow breakpoints in NMIs. If a breakpoint occurs, then
>  	 * the iretq it performs will take us out of NMI context.
>  	 * This means that we can have nested NMIs where the next
> @@ -1240,7 +1231,7 @@ ENTRY(nmi)
>  	 * stacks lest we corrupt the "NMI executing" variable.
>  	 */
>
> -	SWAPGS_UNSAFE_STACK
> +	swapgs
>  	cld
>  	movq	%rsp, %rdx
>  	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
> @@ -1402,7 +1393,7 @@ nested_nmi_out:
>  	popq	%rdx
>
>  	/* We are returning to kernel mode, so this cannot result in a fault. */
> -	INTERRUPT_RETURN
> +	iretq
>
> first_nmi:
>  	/* Restore rdx. */
> @@ -1432,7 +1423,7 @@ first_nmi:
>  	pushfq			/* RFLAGS */
>  	pushq	$__KERNEL_CS	/* CS */
>  	pushq	$1f		/* RIP */
> -	INTERRUPT_RETURN	/* continues at repeat_nmi below */
> +	iretq			/* continues at repeat_nmi below */
> 1:
> #endif
>
> @@ -1502,20 +1493,22 @@ nmi_restore:
>  	/*
>  	 * Clear "NMI executing". Set DF first so that we can easily
>  	 * distinguish the remaining code between here and IRET from
> -	 * the SYSCALL entry and exit paths. On a native kernel, we
> -	 * could just inspect RIP, but, on paravirt kernels,
> -	 * INTERRUPT_RETURN can translate into a jump into a
> -	 * hypercall page.
> +	 * the SYSCALL entry and exit paths.
> +	 *
> +	 * We arguably should just inspect RIP instead, but I (Andy) wrote
> +	 * this code when I had the misapprehension that Xen PV supported
> +	 * NMIs, and Xen PV would break that approach.
>  	 */
>  	std
>  	movq	$0, 5*8(%rsp)		/* clear "NMI executing" */
>
>  	/*
> -	 * INTERRUPT_RETURN reads the "iret" frame and exits the NMI
> -	 * stack in a single instruction. We are returning to kernel
> -	 * mode, so this cannot result in a fault.
> +	 * iretq reads the "iret" frame and exits the NMI stack in a
> +	 * single instruction. We are returning to kernel mode, so this
> +	 * cannot result in a fault. Similarly, we don't need to worry
> +	 * about espfix64 on the way back to kernel mode.
>  	 */
> -	INTERRUPT_RETURN
> +	iretq
> END(nmi)
>
> ENTRY(ignore_sysret)
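As background for the SWAPGS_UNSAFE_STACK -> swapgs and
INTERRUPT_RETURN -> iretq substitutions above: on a native
(non-paravirt) build those macros already resolve to the bare
instructions, roughly (quoted from memory of the 4.13-era
arch/x86/include/asm/irqflags.h, so treat it as a sketch):

  /* !CONFIG_PARAVIRT, x86-64 (approximate) */
  #define SWAPGS_UNSAFE_STACK	swapgs
  #define INTERRUPT_RETURN	jmp native_iret

Under CONFIG_PARAVIRT they are patch sites that Xen PV redirects to
hypercall-based variants. Once Xen PV no longer enters through this
code, open-coding swapgs/iretq is fine, and native_iret's espfix64
handling isn't needed for a return to kernel mode.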
[Attachment: 0001-xen-add-xen-nmi-trap-entry.patch]

From 544e6cf531efaafbeed7d4c15690ee360ae1e45c Mon Sep 17 00:00:00 2001
From: Juergen Gross
Date: Thu, 7 Sep 2017 10:23:19 +0200
Subject: [PATCH] xen: add xen nmi trap entry

Instead of trying to execute any NMI via the bare metal's NMI trap
handler, use a Xen-specific one for PV domains, like we do for e.g.
debug traps. As the NMI is handled via the normal kernel stack in a
PV domain, this is the correct thing to do.

This will enable us to get rid of the very fragile and questionable
dependencies between the bare metal NMI handler and Xen assumptions
believed to be broken anyway.

Signed-off-by: Juergen Gross
---
 arch/x86/entry/entry_64.S    | 2 +-
 arch/x86/include/asm/traps.h | 2 +-
 arch/x86/xen/enlighten_pv.c  | 2 +-
 arch/x86/xen/xen-asm_64.S    | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 49167258d587..62a3248c18fc 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1058,6 +1058,7 @@ idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
 idtentry stack_segment do_stack_segment has_error_code=1
 
 #ifdef CONFIG_XEN
+idtentry xennmi do_nmi has_error_code=0
 idtentry xendebug do_debug has_error_code=0
 idtentry xenint3 do_int3 has_error_code=0
 #endif
@@ -1223,7 +1224,6 @@ ENTRY(error_exit)
 END(error_exit)
 
 /* Runs on exception stack */
-/* XXX: broken on Xen PV */
 ENTRY(nmi)
 	UNWIND_HINT_IRET_REGS
 	/*
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 5545f6459bf5..26f1a88e42da 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -37,9 +37,9 @@ asmlinkage void simd_coprocessor_error(void);
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV)
 asmlinkage void xen_divide_error(void);
+asmlinkage void xen_xennmi(void);
 asmlinkage void xen_xendebug(void);
 asmlinkage void xen_xenint3(void);
-asmlinkage void xen_nmi(void);
 asmlinkage void xen_overflow(void);
 asmlinkage void xen_bounds(void);
 asmlinkage void xen_invalid_op(void);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index ae2a2e2d6362..e94f7a7f7f15 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -600,7 +600,7 @@ static struct trap_array_entry trap_array[] = {
 #ifdef CONFIG_X86_MCE
 	{ machine_check, xen_machine_check, true },
 #endif
-	{ nmi, xen_nmi, true },
+	{ nmi, xen_xennmi, true },
 	{ overflow, xen_overflow, false },
 #ifdef CONFIG_IA32_EMULATION
 	{ entry_INT80_compat, xen_entry_INT80_compat, false },
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index dae2cc33afb5..286ecc198562 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -29,7 +29,7 @@ xen_pv_trap debug
 xen_pv_trap xendebug
 xen_pv_trap int3
 xen_pv_trap xenint3
-xen_pv_trap nmi
+xen_pv_trap xennmi
 xen_pv_trap overflow
 xen_pv_trap bounds
 xen_pv_trap invalid_op
-- 
2.12.3
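For completeness, the xen_pv_trap assembler macro that the new
"xen_pv_trap xennmi" line instantiates (as it looked in this time
frame; quoted from memory, so a sketch rather than the exact source)
just pops the two extra slots Xen pushes below the hardware frame
(the saved rcx/r11) and continues at the native entry point on the
normal kernel stack:

  /* arch/x86/xen/xen-asm_64.S (approximate) */
  .macro xen_pv_trap name
  ENTRY(xen_\name)
  	pop %rcx		/* Xen saved rcx below the iret frame */
  	pop %r11		/* ... and r11; restore and drop both */
  	jmp \name		/* continue at the native idtentry */
  END(xen_\name)
  .endm

So "xen_pv_trap xennmi" emits xen_xennmi, which the trap_array[]
change hooks up when the PV IDT is installed, and an NMI then reaches
do_nmi through the regular, non-IST xennmi idtentry.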