LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH entry v2 0/6] x86/entry: Fixes and cleanups
@ 2020-07-03 17:02 Andy Lutomirski
  2020-07-03 17:02 ` [PATCH entry v2 1/6] x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER Andy Lutomirski
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Andy Lutomirski @ 2020-07-03 17:02 UTC (permalink / raw)
  To: x86; +Cc: Andrew Cooper, Juergen Gross, LKML, Andy Lutomirski

These are in priority order.  Patch 1 could be folded into the patch
it fixes.  The selftests improve my confidence in the correctness of
the whole pile.  The next two patches fix IDTENTRY miswiring.  The
last two are optional and could easily wait until the next merge
window.

Andy Lutomirski (6):
  x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
  x86/entry, selftests: Further improve user entry sanity checks
  x86/entry/xen: Route #DB correctly on Xen PV
  x86/entry/32: Fix #MC and #DB wiring on x86_32
  x86/ldt: Disable 16-bit segments on Xen PV
  x86/entry: Rename idtentry_enter/exit_cond_rcu() to
    idtentry_enter/exit()

 arch/x86/entry/common.c                  | 69 +++++++++++++++-------
 arch/x86/entry/entry_64_compat.S         | 19 +++---
 arch/x86/include/asm/idtentry.h          | 75 +++++++++++-------------
 arch/x86/kernel/cpu/mce/core.c           |  4 +-
 arch/x86/kernel/kvm.c                    |  6 +-
 arch/x86/kernel/ldt.c                    | 35 ++++++++++-
 arch/x86/kernel/traps.c                  | 20 +++++--
 arch/x86/mm/fault.c                      |  6 +-
 arch/x86/xen/enlighten_pv.c              | 28 +++++++--
 arch/x86/xen/xen-asm_64.S                |  5 +-
 kernel/time/tick-sched.c                 |  1 +
 tools/testing/selftests/x86/syscall_nt.c | 11 ++++
 12 files changed, 189 insertions(+), 90 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH entry v2 1/6] x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
  2020-07-03 17:02 [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Andy Lutomirski
@ 2020-07-03 17:02 ` Andy Lutomirski
  2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
  2020-07-03 17:02 ` [PATCH entry v2 2/6] x86/entry, selftests: Further improve user entry sanity checks Andy Lutomirski
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2020-07-03 17:02 UTC (permalink / raw)
  To: x86; +Cc: Andrew Cooper, Juergen Gross, LKML, Andy Lutomirski

Move the cleari the high bits of RAX after Xen PV joins the SYSENTER
path so that Xen PV doesn't skip it.

Arguably we should delete this code instead, but that would belong
in the merge window.

Fixes: ffae641f5747 ("x86/entry/64/compat: Fix Xen PV SYSENTER frame setup")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64_compat.S | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 381a6de7de9c..541fdaf64045 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -57,15 +57,6 @@ SYM_CODE_START(entry_SYSENTER_compat)
 
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
-	/*
-	 * User tracing code (ptrace or signal handlers) might assume that
-	 * the saved RAX contains a 32-bit number when we're invoking a 32-bit
-	 * syscall.  Just in case the high bits are nonzero, zero-extend
-	 * the syscall number.  (This could almost certainly be deleted
-	 * with no ill effects.)
-	 */
-	movl	%eax, %eax
-
 	/* Construct struct pt_regs on stack */
 	pushq	$__USER32_DS		/* pt_regs->ss */
 	pushq	$0			/* pt_regs->sp = 0 (placeholder) */
@@ -80,6 +71,16 @@ SYM_CODE_START(entry_SYSENTER_compat)
 	pushq	$__USER32_CS		/* pt_regs->cs */
 	pushq	$0			/* pt_regs->ip = 0 (placeholder) */
 SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
+
+	/*
+	 * User tracing code (ptrace or signal handlers) might assume that
+	 * the saved RAX contains a 32-bit number when we're invoking a 32-bit
+	 * syscall.  Just in case the high bits are nonzero, zero-extend
+	 * the syscall number.  (This could almost certainly be deleted
+	 * with no ill effects.)
+	 */
+	movl	%eax, %eax
+
 	pushq	%rax			/* pt_regs->orig_ax */
 	pushq	%rdi			/* pt_regs->di */
 	pushq	%rsi			/* pt_regs->si */
-- 
2.25.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH entry v2 2/6] x86/entry, selftests: Further improve user entry sanity checks
  2020-07-03 17:02 [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Andy Lutomirski
  2020-07-03 17:02 ` [PATCH entry v2 1/6] x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER Andy Lutomirski
@ 2020-07-03 17:02 ` Andy Lutomirski
  2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
  2020-07-03 17:02 ` [PATCH entry v2 3/6] x86/entry/xen: Route #DB correctly on Xen PV Andy Lutomirski
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2020-07-03 17:02 UTC (permalink / raw)
  To: x86; +Cc: Andrew Cooper, Juergen Gross, LKML, Andy Lutomirski

Chasing down a Xen bug caused me to realize that the new entry sanity
checks are still fairly weak.  Add some more checks.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/common.c                  | 19 +++++++++++++++++++
 tools/testing/selftests/x86/syscall_nt.c | 11 +++++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index f392a8bcd1c3..e83b3f14897c 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -49,6 +49,23 @@
 static void check_user_regs(struct pt_regs *regs)
 {
 	if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) {
+		/*
+		 * Make sure that the entry code gave us a sensible EFLAGS
+		 * register.  Native because we want to check the actual CPU
+		 * state, not the interrupt state as imagined by Xen.
+		 */
+		unsigned long flags = native_save_fl();
+		WARN_ON_ONCE(flags & (X86_EFLAGS_AC | X86_EFLAGS_DF |
+				      X86_EFLAGS_NT));
+
+		/* We think we came from user mode. Make sure pt_regs agrees. */
+		WARN_ON_ONCE(!user_mode(regs));
+
+		/*
+		 * All entries from user mode (except #DF) should be on the
+		 * normal thread stack and should have user pt_regs in the
+		 * correct location.
+		 */
 		WARN_ON_ONCE(!on_thread_stack());
 		WARN_ON_ONCE(regs != task_pt_regs(current));
 	}
@@ -577,6 +594,7 @@ SYSCALL_DEFINE0(ni_syscall)
 bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
 {
 	if (user_mode(regs)) {
+		check_user_regs(regs);
 		enter_from_user_mode();
 		return false;
 	}
@@ -710,6 +728,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
  */
 void noinstr idtentry_enter_user(struct pt_regs *regs)
 {
+	check_user_regs(regs);
 	enter_from_user_mode();
 }
 
diff --git a/tools/testing/selftests/x86/syscall_nt.c b/tools/testing/selftests/x86/syscall_nt.c
index 970e5e14d96d..a108b80dd082 100644
--- a/tools/testing/selftests/x86/syscall_nt.c
+++ b/tools/testing/selftests/x86/syscall_nt.c
@@ -81,5 +81,16 @@ int main(void)
 	printf("[RUN]\tSet NT|AC|TF and issue a syscall\n");
 	do_it(X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_TF);
 
+	/*
+	 * Now try DF.  This is evil and it's plausible that we will crash
+	 * glibc, but glibc would have to do something rather surprising
+	 * for this to happen.
+	 */
+	printf("[RUN]\tSet DF and issue a syscall\n");
+	do_it(X86_EFLAGS_DF);
+
+	printf("[RUN]\tSet TF|DF and issue a syscall\n");
+	do_it(X86_EFLAGS_TF | X86_EFLAGS_DF);
+
 	return nerrs == 0 ? 0 : 1;
 }
-- 
2.25.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH entry v2 3/6] x86/entry/xen: Route #DB correctly on Xen PV
  2020-07-03 17:02 [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Andy Lutomirski
  2020-07-03 17:02 ` [PATCH entry v2 1/6] x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER Andy Lutomirski
  2020-07-03 17:02 ` [PATCH entry v2 2/6] x86/entry, selftests: Further improve user entry sanity checks Andy Lutomirski
@ 2020-07-03 17:02 ` Andy Lutomirski
  2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
  2020-07-06  8:41   ` [PATCH entry v2 3/6] " Michal Kubecek
  2020-07-03 17:02 ` [PATCH entry v2 4/6] x86/entry/32: Fix #MC and #DB wiring on x86_32 Andy Lutomirski
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Andy Lutomirski @ 2020-07-03 17:02 UTC (permalink / raw)
  To: x86; +Cc: Andrew Cooper, Juergen Gross, LKML, Andy Lutomirski

On Xen PV, #DB doesn't use IST.  We still need to correctly route it
depending on whether it came from user or kernel mode.

This patch gets rid of DECLARE/DEFINE_IDTENTRY_XEN -- it was too
hard to follow the logic.  Instead, route #DB and NMI through
DECLARE/DEFINE_IDTENTRY_RAW on Xen, and do the right thing for #DB.
Also add more warnings to the exc_debug* handlers to make this type
of failure more obvious.

This fixes various forms of corruption that happen when usermode
triggers #DB on Xen PV.

Fixes: 4c0dcd8350a0 ("x86/entry: Implement user mode C entry points for #DB and #MCE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/idtentry.h | 24 ++++++------------------
 arch/x86/kernel/traps.c         | 12 ++++++++++++
 arch/x86/xen/enlighten_pv.c     | 28 ++++++++++++++++++++++++----
 arch/x86/xen/xen-asm_64.S       |  5 ++---
 4 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index cf51c50eb356..94333ac3092b 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -398,18 +398,6 @@ __visible noinstr void func(struct pt_regs *regs,			\
 #define DEFINE_IDTENTRY_DEBUG		DEFINE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_DEBUG_USER	DEFINE_IDTENTRY_NOIST
 
-/**
- * DECLARE_IDTENTRY_XEN - Declare functions for XEN redirect IDT entry points
- * @vector:	Vector number (ignored for C)
- * @func:	Function name of the entry point
- *
- * Used for xennmi and xendebug redirections. No DEFINE as this is all ASM
- * indirection magic.
- */
-#define DECLARE_IDTENTRY_XEN(vector, func)				\
-	asmlinkage void xen_asm_exc_xen##func(void);			\
-	asmlinkage void asm_exc_xen##func(void)
-
 #else /* !__ASSEMBLY__ */
 
 /*
@@ -469,10 +457,6 @@ __visible noinstr void func(struct pt_regs *regs,			\
 /* No ASM code emitted for NMI */
 #define DECLARE_IDTENTRY_NMI(vector, func)
 
-/* XEN NMI and DB wrapper */
-#define DECLARE_IDTENTRY_XEN(vector, func)				\
-	idtentry vector asm_exc_xen##func exc_##func has_error_code=0
-
 /*
  * ASM code to emit the common vector entry stubs where each stub is
  * packed into 8 bytes.
@@ -570,11 +554,15 @@ DECLARE_IDTENTRY_MCE(X86_TRAP_MC,	exc_machine_check);
 
 /* NMI */
 DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
-DECLARE_IDTENTRY_XEN(X86_TRAP_NMI,	nmi);
+#ifdef CONFIG_XEN_PV
+DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
+#endif
 
 /* #DB */
 DECLARE_IDTENTRY_DEBUG(X86_TRAP_DB,	exc_debug);
-DECLARE_IDTENTRY_XEN(X86_TRAP_DB,	debug);
+#ifdef CONFIG_XEN_PV
+DECLARE_IDTENTRY_RAW(X86_TRAP_DB,	xenpv_exc_debug);
+#endif
 
 /* #DF */
 DECLARE_IDTENTRY_DF(X86_TRAP_DF,	exc_double_fault);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f9727b96961f..c17f9b57171f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -865,6 +865,12 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
 	instrumentation_begin();
 	trace_hardirqs_off_finish();
 
+	/*
+	 * If something gets miswired and we end up here for a user mode
+	 * #DB, we will malfunction.
+	 */
+	WARN_ON_ONCE(user_mode(regs));
+
 	/*
 	 * Catch SYSENTER with TF set and clear DR_STEP. If this hit a
 	 * watchpoint at the same time then that will still be handled.
@@ -883,6 +889,12 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
 static __always_inline void exc_debug_user(struct pt_regs *regs,
 					   unsigned long dr6)
 {
+	/*
+	 * If something gets miswired and we end up here for a kernel mode
+	 * #DB, we will malfunction.
+	 */
+	WARN_ON_ONCE(!user_mode(regs));
+
 	idtentry_enter_user(regs);
 	instrumentation_begin();
 
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index acc49fa6a097..0d68948c82ad 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -598,6 +598,26 @@ static void xen_write_ldt_entry(struct desc_struct *dt, int entrynum,
 }
 
 #ifdef CONFIG_X86_64
+void noist_exc_debug(struct pt_regs *regs);
+
+DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
+{
+	/* On Xen PV, NMI doesn't use IST.  The C part is the sane as native. */
+	exc_nmi(regs);
+}
+
+DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
+{
+	/*
+	 * There's no IST on Xen PV, but we still need to dispatch
+	 * to the correct handler.
+	 */
+	if (user_mode(regs))
+		noist_exc_debug(regs);
+	else
+		exc_debug(regs);
+}
+
 struct trap_array_entry {
 	void (*orig)(void);
 	void (*xen)(void);
@@ -609,18 +629,18 @@ struct trap_array_entry {
 	.xen		= xen_asm_##func,		\
 	.ist_okay	= ist_ok }
 
-#define TRAP_ENTRY_REDIR(func, xenfunc, ist_ok) {	\
+#define TRAP_ENTRY_REDIR(func, ist_ok) {		\
 	.orig		= asm_##func,			\
-	.xen		= xen_asm_##xenfunc,		\
+	.xen		= xen_asm_xenpv_##func,		\
 	.ist_okay	= ist_ok }
 
 static struct trap_array_entry trap_array[] = {
-	TRAP_ENTRY_REDIR(exc_debug, exc_xendebug,	true  ),
+	TRAP_ENTRY_REDIR(exc_debug,			true  ),
 	TRAP_ENTRY(exc_double_fault,			true  ),
 #ifdef CONFIG_X86_MCE
 	TRAP_ENTRY(exc_machine_check,			true  ),
 #endif
-	TRAP_ENTRY_REDIR(exc_nmi, exc_xennmi,		true  ),
+	TRAP_ENTRY_REDIR(exc_nmi,			true  ),
 	TRAP_ENTRY(exc_int3,				false ),
 	TRAP_ENTRY(exc_overflow,			false ),
 #ifdef CONFIG_IA32_EMULATION
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index e1e1c7eafa60..aab1d99b2b48 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -29,10 +29,9 @@ _ASM_NOKPROBE(xen_\name)
 .endm
 
 xen_pv_trap asm_exc_divide_error
-xen_pv_trap asm_exc_debug
-xen_pv_trap asm_exc_xendebug
+xen_pv_trap asm_xenpv_exc_debug
 xen_pv_trap asm_exc_int3
-xen_pv_trap asm_exc_xennmi
+xen_pv_trap asm_xenpv_exc_nmi
 xen_pv_trap asm_exc_overflow
 xen_pv_trap asm_exc_bounds
 xen_pv_trap asm_exc_invalid_op
-- 
2.25.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH entry v2 4/6] x86/entry/32: Fix #MC and #DB wiring on x86_32
  2020-07-03 17:02 [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Andy Lutomirski
                   ` (2 preceding siblings ...)
  2020-07-03 17:02 ` [PATCH entry v2 3/6] x86/entry/xen: Route #DB correctly on Xen PV Andy Lutomirski
@ 2020-07-03 17:02 ` Andy Lutomirski
  2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
  2020-07-03 17:02 ` [PATCH entry v2 5/6] x86/ldt: Disable 16-bit segments on Xen PV Andy Lutomirski
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2020-07-03 17:02 UTC (permalink / raw)
  To: x86; +Cc: Andrew Cooper, Juergen Gross, LKML, Andy Lutomirski, Naresh Kamboju

DEFINE_IDTENTRY_MCE and DEFINE_IDTENTRY_DEBUG were wired up as non-RAW
on x86_32, but the code expected them to be RAW.

Get rid of all the macro indirection for them on 32-bit and just use
DECLARE_IDTENTRY_RAW and DEFINE_IDTENTRY_RAW directly.

Also add a warning to make sure that we only hit the _kernel paths
in kernel mode.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/idtentry.h | 23 +++++++++++++----------
 arch/x86/kernel/cpu/mce/core.c  |  4 +++-
 arch/x86/kernel/traps.c         |  2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 94333ac3092b..eeac6dc2adaa 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -353,10 +353,6 @@ static __always_inline void __##func(struct pt_regs *regs)
 
 #else	/* CONFIG_X86_64 */
 
-/* Maps to a regular IDTENTRY on 32bit for now */
-# define DECLARE_IDTENTRY_IST		DECLARE_IDTENTRY
-# define DEFINE_IDTENTRY_IST		DEFINE_IDTENTRY
-
 /**
  * DECLARE_IDTENTRY_DF - Declare functions for double fault 32bit variant
  * @vector:	Vector number (ignored for C)
@@ -387,16 +383,18 @@ __visible noinstr void func(struct pt_regs *regs,			\
 #endif	/* !CONFIG_X86_64 */
 
 /* C-Code mapping */
+#define DECLARE_IDTENTRY_NMI		DECLARE_IDTENTRY_RAW
+#define DEFINE_IDTENTRY_NMI		DEFINE_IDTENTRY_RAW
+
+#ifdef CONFIG_X86_64
 #define DECLARE_IDTENTRY_MCE		DECLARE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_MCE		DEFINE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_MCE_USER	DEFINE_IDTENTRY_NOIST
 
-#define DECLARE_IDTENTRY_NMI		DECLARE_IDTENTRY_RAW
-#define DEFINE_IDTENTRY_NMI		DEFINE_IDTENTRY_RAW
-
 #define DECLARE_IDTENTRY_DEBUG		DECLARE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_DEBUG		DEFINE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_DEBUG_USER	DEFINE_IDTENTRY_NOIST
+#endif
 
 #else /* !__ASSEMBLY__ */
 
@@ -443,9 +441,6 @@ __visible noinstr void func(struct pt_regs *regs,			\
 # define DECLARE_IDTENTRY_MCE(vector, func)				\
 	DECLARE_IDTENTRY(vector, func)
 
-# define DECLARE_IDTENTRY_DEBUG(vector, func)				\
-	DECLARE_IDTENTRY(vector, func)
-
 /* No ASM emitted for DF as this goes through a C shim */
 # define DECLARE_IDTENTRY_DF(vector, func)
 
@@ -549,7 +544,11 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_BP,		exc_int3);
 DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_PF,	exc_page_fault);
 
 #ifdef CONFIG_X86_MCE
+#ifdef CONFIG_X86_64
 DECLARE_IDTENTRY_MCE(X86_TRAP_MC,	exc_machine_check);
+#else
+DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	exc_machine_check);
+#endif
 #endif
 
 /* NMI */
@@ -559,7 +558,11 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
 #endif
 
 /* #DB */
+#ifdef CONFIG_X86_64
 DECLARE_IDTENTRY_DEBUG(X86_TRAP_DB,	exc_debug);
+#else
+DECLARE_IDTENTRY_RAW(X86_TRAP_DB,	exc_debug);
+#endif
 #ifdef CONFIG_XEN_PV
 DECLARE_IDTENTRY_RAW(X86_TRAP_DB,	xenpv_exc_debug);
 #endif
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ce9120c4f740..a6a90b5d7c83 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1901,6 +1901,8 @@ void (*machine_check_vector)(struct pt_regs *) = unexpected_machine_check;
 
 static __always_inline void exc_machine_check_kernel(struct pt_regs *regs)
 {
+	WARN_ON_ONCE(user_mode(regs));
+
 	/*
 	 * Only required when from kernel mode. See
 	 * mce_check_crashing_cpu() for details.
@@ -1954,7 +1956,7 @@ DEFINE_IDTENTRY_MCE_USER(exc_machine_check)
 }
 #else
 /* 32bit unified entry point */
-DEFINE_IDTENTRY_MCE(exc_machine_check)
+DEFINE_IDTENTRY_RAW(exc_machine_check)
 {
 	unsigned long dr7;
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c17f9b57171f..6ed8cc5fbe8f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -925,7 +925,7 @@ DEFINE_IDTENTRY_DEBUG_USER(exc_debug)
 }
 #else
 /* 32 bit does not have separate entry points. */
-DEFINE_IDTENTRY_DEBUG(exc_debug)
+DEFINE_IDTENTRY_RAW(exc_debug)
 {
 	unsigned long dr6, dr7;
 
-- 
2.25.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH entry v2 5/6] x86/ldt: Disable 16-bit segments on Xen PV
  2020-07-03 17:02 [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Andy Lutomirski
                   ` (3 preceding siblings ...)
  2020-07-03 17:02 ` [PATCH entry v2 4/6] x86/entry/32: Fix #MC and #DB wiring on x86_32 Andy Lutomirski
@ 2020-07-03 17:02 ` Andy Lutomirski
  2020-07-03 19:00   ` Andrew Cooper
  2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
  2020-07-03 17:02 ` [PATCH entry v2 6/6] x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit() Andy Lutomirski
  2020-07-03 17:31 ` [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Peter Zijlstra
  6 siblings, 2 replies; 20+ messages in thread
From: Andy Lutomirski @ 2020-07-03 17:02 UTC (permalink / raw)
  To: x86; +Cc: Andrew Cooper, Juergen Gross, LKML, Andy Lutomirski

Xen PV doesn't implement ESPFIX64, so they don't work right.  Disable
them.  Also print a warning the first time anyone tries to use a
16-bit segment on a Xen PV guest that would otherwise allow it
to help people diagnose this change in behavior.

This gets us closer to having all x86 selftests pass on Xen PV.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/ldt.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 8748321c4486..34e918ad34d4 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -29,6 +29,8 @@
 #include <asm/mmu_context.h>
 #include <asm/pgtable_areas.h>
 
+#include <xen/xen.h>
+
 /* This is a multiple of PAGE_SIZE. */
 #define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)
 
@@ -543,6 +545,37 @@ static int read_default_ldt(void __user *ptr, unsigned long bytecount)
 	return bytecount;
 }
 
+static bool allow_16bit_segments(void)
+{
+	if (!IS_ENABLED(CONFIG_X86_16BIT))
+		return false;
+
+#ifdef CONFIG_XEN_PV
+	/*
+	 * Xen PV does not implement ESPFIX64, which means that 16-bit
+	 * segments will not work correctly.  Until either Xen PV implements
+	 * ESPFIX64 and can signal this fact to the guest or unless someone
+	 * provides compelling evidence that allowing broken 16-bit segments
+	 * is worthwhile, disallow 16-bit segments under Xen PV.
+	 */
+	if (xen_pv_domain()) {
+		static DEFINE_MUTEX(xen_warning);
+		static bool warned;
+
+		mutex_lock(&xen_warning);
+		if (!warned) {
+			pr_info("Warning: 16-bit segments do not work correctly in a Xen PV guest\n");
+			warned = true;
+		}
+		mutex_unlock(&xen_warning);
+
+		return false;
+	}
+#endif
+
+	return true;
+}
+
 static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 {
 	struct mm_struct *mm = current->mm;
@@ -574,7 +607,7 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 		/* The user wants to clear the entry. */
 		memset(&ldt, 0, sizeof(ldt));
 	} else {
-		if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
+		if (!ldt_info.seg_32bit && !allow_16bit_segments()) {
 			error = -EINVAL;
 			goto out;
 		}
-- 
2.25.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH entry v2 6/6] x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit()
  2020-07-03 17:02 [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Andy Lutomirski
                   ` (4 preceding siblings ...)
  2020-07-03 17:02 ` [PATCH entry v2 5/6] x86/ldt: Disable 16-bit segments on Xen PV Andy Lutomirski
@ 2020-07-03 17:02 ` Andy Lutomirski
  2020-07-07  8:23   ` [tip: x86/entry] " tip-bot2 for Andy Lutomirski
  2020-07-03 17:31 ` [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Peter Zijlstra
  6 siblings, 1 reply; 20+ messages in thread
From: Andy Lutomirski @ 2020-07-03 17:02 UTC (permalink / raw)
  To: x86; +Cc: Andrew Cooper, Juergen Gross, LKML, Andy Lutomirski

They were originally called _cond_rcu because they were special
versions with conditional RCU handling.  Now they're the standard
entry and exit path, so the _cond_rcu part is just confusing.  Drop
it.

Also change the signature to make them more extensible and more
foolproof.

This patch makes no functional change -- it's pure refactoring.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/common.c         | 50 ++++++++++++++++++---------------
 arch/x86/include/asm/idtentry.h | 28 ++++++++++--------
 arch/x86/kernel/kvm.c           |  6 ++--
 arch/x86/kernel/traps.c         |  6 ++--
 arch/x86/mm/fault.c             |  6 ++--
 kernel/time/tick-sched.c        |  1 +
 6 files changed, 54 insertions(+), 43 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index e83b3f14897c..0521546022cb 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -559,8 +559,7 @@ SYSCALL_DEFINE0(ni_syscall)
 }
 
 /**
- * idtentry_enter_cond_rcu - Handle state tracking on idtentry with conditional
- *			     RCU handling
+ * idtentry_enter - Handle state tracking on ordinary idtentries
  * @regs:	Pointer to pt_regs of interrupted context
  *
  * Invokes:
@@ -572,6 +571,9 @@ SYSCALL_DEFINE0(ni_syscall)
  *  - The hardirq tracer to keep the state consistent as low level ASM
  *    entry disabled interrupts.
  *
+ * As a precondition, this requires that the entry came from user mode,
+ * idle, or a kernel context in which RCU is watching.
+ *
  * For kernel mode entries RCU handling is done conditional. If RCU is
  * watching then the only RCU requirement is to check whether the tick has
  * to be restarted. If RCU is not watching then rcu_irq_enter() has to be
@@ -585,18 +587,21 @@ SYSCALL_DEFINE0(ni_syscall)
  * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
  * would not be possible.
  *
- * Returns: True if RCU has been adjusted on a kernel entry
- *	    False otherwise
+ * Returns: An opaque object that must be passed to idtentry_exit()
  *
- * The return value must be fed into the rcu_exit argument of
- * idtentry_exit_cond_rcu().
+ * The return value must be fed into the state argument of
+ * idtentry_exit().
  */
-bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
+idtentry_state_t noinstr idtentry_enter(struct pt_regs *regs)
 {
+	idtentry_state_t ret = {
+		.exit_rcu = false,
+	};
+
 	if (user_mode(regs)) {
 		check_user_regs(regs);
 		enter_from_user_mode();
-		return false;
+		return ret;
 	}
 
 	/*
@@ -634,7 +639,8 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
 		trace_hardirqs_off_finish();
 		instrumentation_end();
 
-		return true;
+		ret.exit_rcu = true;
+		return ret;
 	}
 
 	/*
@@ -649,7 +655,7 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
 	trace_hardirqs_off();
 	instrumentation_end();
 
-	return false;
+	return ret;
 }
 
 static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
@@ -667,10 +673,9 @@ static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
 }
 
 /**
- * idtentry_exit_cond_rcu - Handle return from exception with conditional RCU
- *			    handling
+ * idtentry_exit - Handle return from exception that used idtentry_enter()
  * @regs:	Pointer to pt_regs (exception entry regs)
- * @rcu_exit:	Invoke rcu_irq_exit() if true
+ * @state:	Return value from matching call to idtentry_enter()
  *
  * Depending on the return target (kernel/user) this runs the necessary
  * preemption and work checks if possible and reguired and returns to
@@ -679,10 +684,10 @@ static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
  * This is the last action before returning to the low level ASM code which
  * just needs to return to the appropriate context.
  *
- * Counterpart to idtentry_enter_cond_rcu(). The return value of the entry
- * function must be fed into the @rcu_exit argument.
+ * Counterpart to idtentry_enter(). The return value of the entry
+ * function must be fed into the @state argument.
  */
-void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
+void noinstr idtentry_exit(struct pt_regs *regs, idtentry_state_t state)
 {
 	lockdep_assert_irqs_disabled();
 
@@ -695,7 +700,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
 		 * carefully and needs the same ordering of lockdep/tracing
 		 * and RCU as the return to user mode path.
 		 */
-		if (rcu_exit) {
+		if (state.exit_rcu) {
 			instrumentation_begin();
 			/* Tell the tracer that IRET will enable interrupts */
 			trace_hardirqs_on_prepare();
@@ -714,7 +719,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
 		 * IRQ flags state is correct already. Just tell RCU if it
 		 * was not watching on entry.
 		 */
-		if (rcu_exit)
+		if (state.exit_rcu)
 			rcu_irq_exit();
 	}
 }
@@ -800,9 +805,10 @@ static void __xen_pv_evtchn_do_upcall(void)
 __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs;
-	bool inhcall, rcu_exit;
+	bool inhcall;
+	idtentry_state_t state;
 
-	rcu_exit = idtentry_enter_cond_rcu(regs);
+	state = idtentry_enter(regs);
 	old_regs = set_irq_regs(regs);
 
 	instrumentation_begin();
@@ -812,13 +818,13 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
 	set_irq_regs(old_regs);
 
 	inhcall = get_and_clear_inhcall();
-	if (inhcall && !WARN_ON_ONCE(rcu_exit)) {
+	if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
 		instrumentation_begin();
 		idtentry_exit_cond_resched(regs, true);
 		instrumentation_end();
 		restore_inhcall(inhcall);
 	} else {
-		idtentry_exit_cond_rcu(regs, rcu_exit);
+		idtentry_exit(regs, state);
 	}
 }
 #endif /* CONFIG_XEN_PV */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index eeac6dc2adaa..7227225cf45d 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -13,8 +13,12 @@
 void idtentry_enter_user(struct pt_regs *regs);
 void idtentry_exit_user(struct pt_regs *regs);
 
-bool idtentry_enter_cond_rcu(struct pt_regs *regs);
-void idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit);
+typedef struct idtentry_state {
+	bool exit_rcu;
+} idtentry_state_t;
+
+idtentry_state_t idtentry_enter(struct pt_regs *regs);
+void idtentry_exit(struct pt_regs *regs, idtentry_state_t state);
 
 /**
  * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
@@ -54,12 +58,12 @@ static __always_inline void __##func(struct pt_regs *regs);		\
 									\
 __visible noinstr void func(struct pt_regs *regs)			\
 {									\
-	bool rcu_exit = idtentry_enter_cond_rcu(regs);			\
+	idtentry_state_t state = idtentry_enter(regs);			\
 									\
 	instrumentation_begin();					\
 	__##func (regs);						\
 	instrumentation_end();						\
-	idtentry_exit_cond_rcu(regs, rcu_exit);				\
+	idtentry_exit(regs, state);					\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs)
@@ -101,12 +105,12 @@ static __always_inline void __##func(struct pt_regs *regs,		\
 __visible noinstr void func(struct pt_regs *regs,			\
 			    unsigned long error_code)			\
 {									\
-	bool rcu_exit = idtentry_enter_cond_rcu(regs);			\
+	idtentry_state_t state = idtentry_enter(regs);			\
 									\
 	instrumentation_begin();					\
 	__##func (regs, error_code);					\
 	instrumentation_end();						\
-	idtentry_exit_cond_rcu(regs, rcu_exit);				\
+	idtentry_exit(regs, state);					\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs,		\
@@ -199,7 +203,7 @@ static __always_inline void __##func(struct pt_regs *regs, u8 vector);	\
 __visible noinstr void func(struct pt_regs *regs,			\
 			    unsigned long error_code)			\
 {									\
-	bool rcu_exit = idtentry_enter_cond_rcu(regs);			\
+	idtentry_state_t state = idtentry_enter(regs);			\
 									\
 	instrumentation_begin();					\
 	irq_enter_rcu();						\
@@ -207,7 +211,7 @@ __visible noinstr void func(struct pt_regs *regs,			\
 	__##func (regs, (u8)error_code);				\
 	irq_exit_rcu();							\
 	instrumentation_end();						\
-	idtentry_exit_cond_rcu(regs, rcu_exit);				\
+	idtentry_exit(regs, state);					\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs, u8 vector)
@@ -241,7 +245,7 @@ static void __##func(struct pt_regs *regs);				\
 									\
 __visible noinstr void func(struct pt_regs *regs)			\
 {									\
-	bool rcu_exit = idtentry_enter_cond_rcu(regs);			\
+	idtentry_state_t state = idtentry_enter(regs);			\
 									\
 	instrumentation_begin();					\
 	irq_enter_rcu();						\
@@ -249,7 +253,7 @@ __visible noinstr void func(struct pt_regs *regs)			\
 	run_on_irqstack_cond(__##func, regs, regs);			\
 	irq_exit_rcu();							\
 	instrumentation_end();						\
-	idtentry_exit_cond_rcu(regs, rcu_exit);				\
+	idtentry_exit(regs, state);					\
 }									\
 									\
 static noinline void __##func(struct pt_regs *regs)
@@ -270,7 +274,7 @@ static __always_inline void __##func(struct pt_regs *regs);		\
 									\
 __visible noinstr void func(struct pt_regs *regs)			\
 {									\
-	bool rcu_exit = idtentry_enter_cond_rcu(regs);			\
+	idtentry_state_t state = idtentry_enter(regs);			\
 									\
 	instrumentation_begin();					\
 	__irq_enter_raw();						\
@@ -278,7 +282,7 @@ __visible noinstr void func(struct pt_regs *regs)			\
 	__##func (regs);						\
 	__irq_exit_raw();						\
 	instrumentation_end();						\
-	idtentry_exit_cond_rcu(regs, rcu_exit);				\
+	idtentry_exit(regs, state);					\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index df63786e7bfa..3f78482d9496 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -233,7 +233,7 @@ EXPORT_SYMBOL_GPL(kvm_read_and_reset_apf_flags);
 noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 {
 	u32 reason = kvm_read_and_reset_apf_flags();
-	bool rcu_exit;
+	idtentry_state_t state;
 
 	switch (reason) {
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
@@ -243,7 +243,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 		return false;
 	}
 
-	rcu_exit = idtentry_enter_cond_rcu(regs);
+	state = idtentry_enter(regs);
 	instrumentation_begin();
 
 	/*
@@ -264,7 +264,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 	}
 
 	instrumentation_end();
-	idtentry_exit_cond_rcu(regs, rcu_exit);
+	idtentry_exit(regs, state);
 	return true;
 }
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 6ed8cc5fbe8f..de75e531cdd2 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -218,7 +218,7 @@ static inline void handle_invalid_op(struct pt_regs *regs)
 
 DEFINE_IDTENTRY_RAW(exc_invalid_op)
 {
-	bool rcu_exit;
+	idtentry_state_t state;
 
 	/*
 	 * Handle BUG/WARN like NMIs instead of like normal idtentries:
@@ -251,11 +251,11 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op)
 		 */
 	}
 
-	rcu_exit = idtentry_enter_cond_rcu(regs);
+	state = idtentry_enter(regs);
 	instrumentation_begin();
 	handle_invalid_op(regs);
 	instrumentation_end();
-	idtentry_exit_cond_rcu(regs, rcu_exit);
+	idtentry_exit(regs, state);
 }
 
 DEFINE_IDTENTRY(exc_coproc_segment_overrun)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 1ead568c0101..5e41949453cc 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1377,7 +1377,7 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 {
 	unsigned long address = read_cr2();
-	bool rcu_exit;
+	idtentry_state_t state;
 
 	prefetchw(&current->mm->mmap_lock);
 
@@ -1412,11 +1412,11 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 	 * code reenabled RCU to avoid subsequent wreckage which helps
 	 * debugability.
 	 */
-	rcu_exit = idtentry_enter_cond_rcu(regs);
+	state = idtentry_enter(regs);
 
 	instrumentation_begin();
 	handle_page_fault(regs, error_code, address);
 	instrumentation_end();
 
-	idtentry_exit_cond_rcu(regs, rcu_exit);
+	idtentry_exit(regs, state);
 }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3e2dc9b8858c..6aa0daca0dd2 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -921,6 +921,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
 			pr_warn("NOHZ: local_softirq_pending %02x\n",
 				(unsigned int) local_softirq_pending());
+			WARN_ON_ONCE(1);
 			ratelimit++;
 		}
 		return false;
-- 
2.25.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH entry v2 0/6] x86/entry: Fixes and cleanups
  2020-07-03 17:02 [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Andy Lutomirski
                   ` (5 preceding siblings ...)
  2020-07-03 17:02 ` [PATCH entry v2 6/6] x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit() Andy Lutomirski
@ 2020-07-03 17:31 ` Peter Zijlstra
  6 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2020-07-03 17:31 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: x86, Andrew Cooper, Juergen Gross, LKML

On Fri, Jul 03, 2020 at 10:02:52AM -0700, Andy Lutomirski wrote:
> These are in priority order.  Patch 1 could be folded into the patch
> it fixes.  The selftests improve my confidence in the correctness of
> the whole pile.  The next two patches fix IDTENTRY miswiring.  The
> last two are optional and could easily wait until the next merge
> window.
> 
> Andy Lutomirski (6):
>   x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
>   x86/entry, selftests: Further improve user entry sanity checks
>   x86/entry/xen: Route #DB correctly on Xen PV
>   x86/entry/32: Fix #MC and #DB wiring on x86_32
>   x86/ldt: Disable 16-bit segments on Xen PV
>   x86/entry: Rename idtentry_enter/exit_cond_rcu() to
>     idtentry_enter/exit()

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH entry v2 5/6] x86/ldt: Disable 16-bit segments on Xen PV
  2020-07-03 17:02 ` [PATCH entry v2 5/6] x86/ldt: Disable 16-bit segments on Xen PV Andy Lutomirski
@ 2020-07-03 19:00   ` Andrew Cooper
  2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
  1 sibling, 0 replies; 20+ messages in thread
From: Andrew Cooper @ 2020-07-03 19:00 UTC (permalink / raw)
  To: Andy Lutomirski, x86; +Cc: Juergen Gross, LKML

On 03/07/2020 18:02, Andy Lutomirski wrote:
> Xen PV doesn't implement ESPFIX64, so they don't work right.  Disable
> them.  Also print a warning the first time anyone tries to use a
> 16-bit segment on a Xen PV guest that would otherwise allow it
> to help people diagnose this change in behavior.
>
> This gets us closer to having all x86 selftests pass on Xen PV.

Do we know exactly how much virtual address space ESPFIX64 takes up?

Are people going to scream in horror if Linux needs to donate some
virtual address space back to Xen to get this working?  Linux's current
hypervisor range (8TB, 16 pagetable slots) really isn't enough for some
systems these days.

~Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip: x86/urgent] x86/ldt: Disable 16-bit segments on Xen PV
  2020-07-03 17:02 ` [PATCH entry v2 5/6] x86/ldt: Disable 16-bit segments on Xen PV Andy Lutomirski
  2020-07-03 19:00   ` Andrew Cooper
@ 2020-07-04 17:49   ` tip-bot2 for Andy Lutomirski
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-07-04 17:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     cc801833a171163edb6385425349ba8903bd1b20
Gitweb:        https://git.kernel.org/tip/cc801833a171163edb6385425349ba8903bd1b20
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Fri, 03 Jul 2020 10:02:57 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 04 Jul 2020 19:47:26 +02:00

x86/ldt: Disable 16-bit segments on Xen PV

Xen PV doesn't implement ESPFIX64, so they don't work right.  Disable
them.  Also print a warning the first time anyone tries to use a
16-bit segment on a Xen PV guest that would otherwise allow it
to help people diagnose this change in behavior.

This gets us closer to having all x86 selftests pass on Xen PV.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/92b2975459dfe5929ecf34c3896ad920bd9e3f2d.1593795633.git.luto@kernel.org

---
 arch/x86/kernel/ldt.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 8748321..34e918a 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -29,6 +29,8 @@
 #include <asm/mmu_context.h>
 #include <asm/pgtable_areas.h>
 
+#include <xen/xen.h>
+
 /* This is a multiple of PAGE_SIZE. */
 #define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE)
 
@@ -543,6 +545,37 @@ static int read_default_ldt(void __user *ptr, unsigned long bytecount)
 	return bytecount;
 }
 
+static bool allow_16bit_segments(void)
+{
+	if (!IS_ENABLED(CONFIG_X86_16BIT))
+		return false;
+
+#ifdef CONFIG_XEN_PV
+	/*
+	 * Xen PV does not implement ESPFIX64, which means that 16-bit
+	 * segments will not work correctly.  Until either Xen PV implements
+	 * ESPFIX64 and can signal this fact to the guest or unless someone
+	 * provides compelling evidence that allowing broken 16-bit segments
+	 * is worthwhile, disallow 16-bit segments under Xen PV.
+	 */
+	if (xen_pv_domain()) {
+		static DEFINE_MUTEX(xen_warning);
+		static bool warned;
+
+		mutex_lock(&xen_warning);
+		if (!warned) {
+			pr_info("Warning: 16-bit segments do not work correctly in a Xen PV guest\n");
+			warned = true;
+		}
+		mutex_unlock(&xen_warning);
+
+		return false;
+	}
+#endif
+
+	return true;
+}
+
 static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 {
 	struct mm_struct *mm = current->mm;
@@ -574,7 +607,7 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 		/* The user wants to clear the entry. */
 		memset(&ldt, 0, sizeof(ldt));
 	} else {
-		if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
+		if (!ldt_info.seg_32bit && !allow_16bit_segments()) {
 			error = -EINVAL;
 			goto out;
 		}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip: x86/urgent] x86/entry/32: Fix #MC and #DB wiring on x86_32
  2020-07-03 17:02 ` [PATCH entry v2 4/6] x86/entry/32: Fix #MC and #DB wiring on x86_32 Andy Lutomirski
@ 2020-07-04 17:49   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-07-04 17:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Naresh Kamboju, Andy Lutomirski, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, LKML

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     13cbc0cd4a30c815984ad88e3a2e5976493516a3
Gitweb:        https://git.kernel.org/tip/13cbc0cd4a30c815984ad88e3a2e5976493516a3
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Fri, 03 Jul 2020 10:02:56 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 04 Jul 2020 19:47:26 +02:00

x86/entry/32: Fix #MC and #DB wiring on x86_32

DEFINE_IDTENTRY_MCE and DEFINE_IDTENTRY_DEBUG were wired up as non-RAW
on x86_32, but the code expected them to be RAW.

Get rid of all the macro indirection for them on 32-bit and just use
DECLARE_IDTENTRY_RAW and DEFINE_IDTENTRY_RAW directly.

Also add a warning to make sure that we only hit the _kernel paths
in kernel mode.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9e90a7ee8e72fd757db6d92e1e5ff16339c1ecf9.1593795633.git.luto@kernel.org

---
 arch/x86/include/asm/idtentry.h | 23 +++++++++++++----------
 arch/x86/kernel/cpu/mce/core.c  |  4 +++-
 arch/x86/kernel/traps.c         |  2 +-
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 94333ac..eeac6dc 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -353,10 +353,6 @@ static __always_inline void __##func(struct pt_regs *regs)
 
 #else	/* CONFIG_X86_64 */
 
-/* Maps to a regular IDTENTRY on 32bit for now */
-# define DECLARE_IDTENTRY_IST		DECLARE_IDTENTRY
-# define DEFINE_IDTENTRY_IST		DEFINE_IDTENTRY
-
 /**
  * DECLARE_IDTENTRY_DF - Declare functions for double fault 32bit variant
  * @vector:	Vector number (ignored for C)
@@ -387,16 +383,18 @@ __visible noinstr void func(struct pt_regs *regs,			\
 #endif	/* !CONFIG_X86_64 */
 
 /* C-Code mapping */
+#define DECLARE_IDTENTRY_NMI		DECLARE_IDTENTRY_RAW
+#define DEFINE_IDTENTRY_NMI		DEFINE_IDTENTRY_RAW
+
+#ifdef CONFIG_X86_64
 #define DECLARE_IDTENTRY_MCE		DECLARE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_MCE		DEFINE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_MCE_USER	DEFINE_IDTENTRY_NOIST
 
-#define DECLARE_IDTENTRY_NMI		DECLARE_IDTENTRY_RAW
-#define DEFINE_IDTENTRY_NMI		DEFINE_IDTENTRY_RAW
-
 #define DECLARE_IDTENTRY_DEBUG		DECLARE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_DEBUG		DEFINE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_DEBUG_USER	DEFINE_IDTENTRY_NOIST
+#endif
 
 #else /* !__ASSEMBLY__ */
 
@@ -443,9 +441,6 @@ __visible noinstr void func(struct pt_regs *regs,			\
 # define DECLARE_IDTENTRY_MCE(vector, func)				\
 	DECLARE_IDTENTRY(vector, func)
 
-# define DECLARE_IDTENTRY_DEBUG(vector, func)				\
-	DECLARE_IDTENTRY(vector, func)
-
 /* No ASM emitted for DF as this goes through a C shim */
 # define DECLARE_IDTENTRY_DF(vector, func)
 
@@ -549,7 +544,11 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_BP,		exc_int3);
 DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_PF,	exc_page_fault);
 
 #ifdef CONFIG_X86_MCE
+#ifdef CONFIG_X86_64
 DECLARE_IDTENTRY_MCE(X86_TRAP_MC,	exc_machine_check);
+#else
+DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	exc_machine_check);
+#endif
 #endif
 
 /* NMI */
@@ -559,7 +558,11 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
 #endif
 
 /* #DB */
+#ifdef CONFIG_X86_64
 DECLARE_IDTENTRY_DEBUG(X86_TRAP_DB,	exc_debug);
+#else
+DECLARE_IDTENTRY_RAW(X86_TRAP_DB,	exc_debug);
+#endif
 #ifdef CONFIG_XEN_PV
 DECLARE_IDTENTRY_RAW(X86_TRAP_DB,	xenpv_exc_debug);
 #endif
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ce9120c..a6a90b5 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1901,6 +1901,8 @@ void (*machine_check_vector)(struct pt_regs *) = unexpected_machine_check;
 
 static __always_inline void exc_machine_check_kernel(struct pt_regs *regs)
 {
+	WARN_ON_ONCE(user_mode(regs));
+
 	/*
 	 * Only required when from kernel mode. See
 	 * mce_check_crashing_cpu() for details.
@@ -1954,7 +1956,7 @@ DEFINE_IDTENTRY_MCE_USER(exc_machine_check)
 }
 #else
 /* 32bit unified entry point */
-DEFINE_IDTENTRY_MCE(exc_machine_check)
+DEFINE_IDTENTRY_RAW(exc_machine_check)
 {
 	unsigned long dr7;
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c17f9b5..6ed8cc5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -925,7 +925,7 @@ DEFINE_IDTENTRY_DEBUG_USER(exc_debug)
 }
 #else
 /* 32 bit does not have separate entry points. */
-DEFINE_IDTENTRY_DEBUG(exc_debug)
+DEFINE_IDTENTRY_RAW(exc_debug)
 {
 	unsigned long dr6, dr7;
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip: x86/urgent] x86/entry/xen: Route #DB correctly on Xen PV
  2020-07-03 17:02 ` [PATCH entry v2 3/6] x86/entry/xen: Route #DB correctly on Xen PV Andy Lutomirski
@ 2020-07-04 17:49   ` tip-bot2 for Andy Lutomirski
  2020-07-06  8:41   ` [PATCH entry v2 3/6] " Michal Kubecek
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-07-04 17:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     f41f0824224eb12ad84de8972962dd54be5abe3b
Gitweb:        https://git.kernel.org/tip/f41f0824224eb12ad84de8972962dd54be5abe3b
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Fri, 03 Jul 2020 10:02:55 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 04 Jul 2020 19:47:25 +02:00

x86/entry/xen: Route #DB correctly on Xen PV

On Xen PV, #DB doesn't use IST. It still needs to be correctly routed
depending on whether it came from user or kernel mode.

Get rid of DECLARE/DEFINE_IDTENTRY_XEN -- it was too hard to follow the
logic.  Instead, route #DB and NMI through DECLARE/DEFINE_IDTENTRY_RAW on
Xen, and do the right thing for #DB.  Also add more warnings to the
exc_debug* handlers to make this type of failure more obvious.

This fixes various forms of corruption that happen when usermode
triggers #DB on Xen PV.

Fixes: 4c0dcd8350a0 ("x86/entry: Implement user mode C entry points for #DB and #MCE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/4163e733cce0b41658e252c6c6b3464f33fdff17.1593795633.git.luto@kernel.org

---
 arch/x86/include/asm/idtentry.h | 24 ++++++------------------
 arch/x86/kernel/traps.c         | 12 ++++++++++++
 arch/x86/xen/enlighten_pv.c     | 28 ++++++++++++++++++++++++----
 arch/x86/xen/xen-asm_64.S       |  5 ++---
 4 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index cf51c50..94333ac 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -398,18 +398,6 @@ __visible noinstr void func(struct pt_regs *regs,			\
 #define DEFINE_IDTENTRY_DEBUG		DEFINE_IDTENTRY_IST
 #define DEFINE_IDTENTRY_DEBUG_USER	DEFINE_IDTENTRY_NOIST
 
-/**
- * DECLARE_IDTENTRY_XEN - Declare functions for XEN redirect IDT entry points
- * @vector:	Vector number (ignored for C)
- * @func:	Function name of the entry point
- *
- * Used for xennmi and xendebug redirections. No DEFINE as this is all ASM
- * indirection magic.
- */
-#define DECLARE_IDTENTRY_XEN(vector, func)				\
-	asmlinkage void xen_asm_exc_xen##func(void);			\
-	asmlinkage void asm_exc_xen##func(void)
-
 #else /* !__ASSEMBLY__ */
 
 /*
@@ -469,10 +457,6 @@ __visible noinstr void func(struct pt_regs *regs,			\
 /* No ASM code emitted for NMI */
 #define DECLARE_IDTENTRY_NMI(vector, func)
 
-/* XEN NMI and DB wrapper */
-#define DECLARE_IDTENTRY_XEN(vector, func)				\
-	idtentry vector asm_exc_xen##func exc_##func has_error_code=0
-
 /*
  * ASM code to emit the common vector entry stubs where each stub is
  * packed into 8 bytes.
@@ -570,11 +554,15 @@ DECLARE_IDTENTRY_MCE(X86_TRAP_MC,	exc_machine_check);
 
 /* NMI */
 DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
-DECLARE_IDTENTRY_XEN(X86_TRAP_NMI,	nmi);
+#ifdef CONFIG_XEN_PV
+DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
+#endif
 
 /* #DB */
 DECLARE_IDTENTRY_DEBUG(X86_TRAP_DB,	exc_debug);
-DECLARE_IDTENTRY_XEN(X86_TRAP_DB,	debug);
+#ifdef CONFIG_XEN_PV
+DECLARE_IDTENTRY_RAW(X86_TRAP_DB,	xenpv_exc_debug);
+#endif
 
 /* #DF */
 DECLARE_IDTENTRY_DF(X86_TRAP_DF,	exc_double_fault);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f9727b9..c17f9b5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -866,6 +866,12 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
 	trace_hardirqs_off_finish();
 
 	/*
+	 * If something gets miswired and we end up here for a user mode
+	 * #DB, we will malfunction.
+	 */
+	WARN_ON_ONCE(user_mode(regs));
+
+	/*
 	 * Catch SYSENTER with TF set and clear DR_STEP. If this hit a
 	 * watchpoint at the same time then that will still be handled.
 	 */
@@ -883,6 +889,12 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
 static __always_inline void exc_debug_user(struct pt_regs *regs,
 					   unsigned long dr6)
 {
+	/*
+	 * If something gets miswired and we end up here for a kernel mode
+	 * #DB, we will malfunction.
+	 */
+	WARN_ON_ONCE(!user_mode(regs));
+
 	idtentry_enter_user(regs);
 	instrumentation_begin();
 
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index acc49fa..0d68948 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -598,6 +598,26 @@ static void xen_write_ldt_entry(struct desc_struct *dt, int entrynum,
 }
 
 #ifdef CONFIG_X86_64
+void noist_exc_debug(struct pt_regs *regs);
+
+DEFINE_IDTENTRY_RAW(xenpv_exc_nmi)
+{
+	/* On Xen PV, NMI doesn't use IST.  The C part is the sane as native. */
+	exc_nmi(regs);
+}
+
+DEFINE_IDTENTRY_RAW(xenpv_exc_debug)
+{
+	/*
+	 * There's no IST on Xen PV, but we still need to dispatch
+	 * to the correct handler.
+	 */
+	if (user_mode(regs))
+		noist_exc_debug(regs);
+	else
+		exc_debug(regs);
+}
+
 struct trap_array_entry {
 	void (*orig)(void);
 	void (*xen)(void);
@@ -609,18 +629,18 @@ struct trap_array_entry {
 	.xen		= xen_asm_##func,		\
 	.ist_okay	= ist_ok }
 
-#define TRAP_ENTRY_REDIR(func, xenfunc, ist_ok) {	\
+#define TRAP_ENTRY_REDIR(func, ist_ok) {		\
 	.orig		= asm_##func,			\
-	.xen		= xen_asm_##xenfunc,		\
+	.xen		= xen_asm_xenpv_##func,		\
 	.ist_okay	= ist_ok }
 
 static struct trap_array_entry trap_array[] = {
-	TRAP_ENTRY_REDIR(exc_debug, exc_xendebug,	true  ),
+	TRAP_ENTRY_REDIR(exc_debug,			true  ),
 	TRAP_ENTRY(exc_double_fault,			true  ),
 #ifdef CONFIG_X86_MCE
 	TRAP_ENTRY(exc_machine_check,			true  ),
 #endif
-	TRAP_ENTRY_REDIR(exc_nmi, exc_xennmi,		true  ),
+	TRAP_ENTRY_REDIR(exc_nmi,			true  ),
 	TRAP_ENTRY(exc_int3,				false ),
 	TRAP_ENTRY(exc_overflow,			false ),
 #ifdef CONFIG_IA32_EMULATION
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index e1e1c7e..aab1d99 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -29,10 +29,9 @@ _ASM_NOKPROBE(xen_\name)
 .endm
 
 xen_pv_trap asm_exc_divide_error
-xen_pv_trap asm_exc_debug
-xen_pv_trap asm_exc_xendebug
+xen_pv_trap asm_xenpv_exc_debug
 xen_pv_trap asm_exc_int3
-xen_pv_trap asm_exc_xennmi
+xen_pv_trap asm_xenpv_exc_nmi
 xen_pv_trap asm_exc_overflow
 xen_pv_trap asm_exc_bounds
 xen_pv_trap asm_exc_invalid_op

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip: x86/urgent] x86/entry, selftests: Further improve user entry sanity checks
  2020-07-03 17:02 ` [PATCH entry v2 2/6] x86/entry, selftests: Further improve user entry sanity checks Andy Lutomirski
@ 2020-07-04 17:49   ` tip-bot2 for Andy Lutomirski
  2020-08-20 10:23     ` peterz
  0 siblings, 1 reply; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-07-04 17:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     3c73b81a9164d0c1b6379d6672d2772a9e95168e
Gitweb:        https://git.kernel.org/tip/3c73b81a9164d0c1b6379d6672d2772a9e95168e
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Fri, 03 Jul 2020 10:02:54 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 04 Jul 2020 19:47:25 +02:00

x86/entry, selftests: Further improve user entry sanity checks

Chasing down a Xen bug caused me to realize that the new entry sanity
checks are still fairly weak.  Add some more checks.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/881de09e786ab93ce56ee4a2437ba2c308afe7a9.1593795633.git.luto@kernel.org

---
 arch/x86/entry/common.c                  | 19 +++++++++++++++++++
 tools/testing/selftests/x86/syscall_nt.c | 11 +++++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index f392a8b..e83b3f1 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -49,6 +49,23 @@
 static void check_user_regs(struct pt_regs *regs)
 {
 	if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) {
+		/*
+		 * Make sure that the entry code gave us a sensible EFLAGS
+		 * register.  Native because we want to check the actual CPU
+		 * state, not the interrupt state as imagined by Xen.
+		 */
+		unsigned long flags = native_save_fl();
+		WARN_ON_ONCE(flags & (X86_EFLAGS_AC | X86_EFLAGS_DF |
+				      X86_EFLAGS_NT));
+
+		/* We think we came from user mode. Make sure pt_regs agrees. */
+		WARN_ON_ONCE(!user_mode(regs));
+
+		/*
+		 * All entries from user mode (except #DF) should be on the
+		 * normal thread stack and should have user pt_regs in the
+		 * correct location.
+		 */
 		WARN_ON_ONCE(!on_thread_stack());
 		WARN_ON_ONCE(regs != task_pt_regs(current));
 	}
@@ -577,6 +594,7 @@ SYSCALL_DEFINE0(ni_syscall)
 bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
 {
 	if (user_mode(regs)) {
+		check_user_regs(regs);
 		enter_from_user_mode();
 		return false;
 	}
@@ -710,6 +728,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
  */
 void noinstr idtentry_enter_user(struct pt_regs *regs)
 {
+	check_user_regs(regs);
 	enter_from_user_mode();
 }
 
diff --git a/tools/testing/selftests/x86/syscall_nt.c b/tools/testing/selftests/x86/syscall_nt.c
index 970e5e1..a108b80 100644
--- a/tools/testing/selftests/x86/syscall_nt.c
+++ b/tools/testing/selftests/x86/syscall_nt.c
@@ -81,5 +81,16 @@ int main(void)
 	printf("[RUN]\tSet NT|AC|TF and issue a syscall\n");
 	do_it(X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_TF);
 
+	/*
+	 * Now try DF.  This is evil and it's plausible that we will crash
+	 * glibc, but glibc would have to do something rather surprising
+	 * for this to happen.
+	 */
+	printf("[RUN]\tSet DF and issue a syscall\n");
+	do_it(X86_EFLAGS_DF);
+
+	printf("[RUN]\tSet TF|DF and issue a syscall\n");
+	do_it(X86_EFLAGS_TF | X86_EFLAGS_DF);
+
 	return nerrs == 0 ? 0 : 1;
 }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip: x86/urgent] x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
  2020-07-03 17:02 ` [PATCH entry v2 1/6] x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER Andy Lutomirski
@ 2020-07-04 17:49   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-07-04 17:49 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     db5b2c5a90a111618f071d231a8b945cf522313e
Gitweb:        https://git.kernel.org/tip/db5b2c5a90a111618f071d231a8b945cf522313e
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Fri, 03 Jul 2020 10:02:53 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 04 Jul 2020 19:47:25 +02:00

x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER

Move the clearing of the high bits of RAX after Xen PV joins the SYSENTER
path so that Xen PV doesn't skip it.

Arguably this code should be deleted instead, but that would belong in the
merge window.

Fixes: ffae641f5747 ("x86/entry/64/compat: Fix Xen PV SYSENTER frame setup")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9d33b3f3216dcab008070f1c28b6091ae7199969.1593795633.git.luto@kernel.org

---
 arch/x86/entry/entry_64_compat.S | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 381a6de..541fdaf 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -57,15 +57,6 @@ SYM_CODE_START(entry_SYSENTER_compat)
 
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
-	/*
-	 * User tracing code (ptrace or signal handlers) might assume that
-	 * the saved RAX contains a 32-bit number when we're invoking a 32-bit
-	 * syscall.  Just in case the high bits are nonzero, zero-extend
-	 * the syscall number.  (This could almost certainly be deleted
-	 * with no ill effects.)
-	 */
-	movl	%eax, %eax
-
 	/* Construct struct pt_regs on stack */
 	pushq	$__USER32_DS		/* pt_regs->ss */
 	pushq	$0			/* pt_regs->sp = 0 (placeholder) */
@@ -80,6 +71,16 @@ SYM_CODE_START(entry_SYSENTER_compat)
 	pushq	$__USER32_CS		/* pt_regs->cs */
 	pushq	$0			/* pt_regs->ip = 0 (placeholder) */
 SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
+
+	/*
+	 * User tracing code (ptrace or signal handlers) might assume that
+	 * the saved RAX contains a 32-bit number when we're invoking a 32-bit
+	 * syscall.  Just in case the high bits are nonzero, zero-extend
+	 * the syscall number.  (This could almost certainly be deleted
+	 * with no ill effects.)
+	 */
+	movl	%eax, %eax
+
 	pushq	%rax			/* pt_regs->orig_ax */
 	pushq	%rdi			/* pt_regs->di */
 	pushq	%rsi			/* pt_regs->si */

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH entry v2 3/6] x86/entry/xen: Route #DB correctly on Xen PV
  2020-07-03 17:02 ` [PATCH entry v2 3/6] x86/entry/xen: Route #DB correctly on Xen PV Andy Lutomirski
  2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
@ 2020-07-06  8:41   ` Michal Kubecek
  2020-07-06  8:57     ` Jürgen Groß
  1 sibling, 1 reply; 20+ messages in thread
From: Michal Kubecek @ 2020-07-06  8:41 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: x86, Andrew Cooper, Juergen Gross, LKML

On Fri, Jul 03, 2020 at 10:02:55AM -0700, Andy Lutomirski wrote:
> On Xen PV, #DB doesn't use IST.  We still need to correctly route it
> depending on whether it came from user or kernel mode.
> 
> This patch gets rid of DECLARE/DEFINE_IDTENTRY_XEN -- it was too
> hard to follow the logic.  Instead, route #DB and NMI through
> DECLARE/DEFINE_IDTENTRY_RAW on Xen, and do the right thing for #DB.
> Also add more warnings to the exc_debug* handlers to make this type
> of failure more obvious.
> 
> This fixes various forms of corruption that happen when usermode
> triggers #DB on Xen PV.
> 
> Fixes: 4c0dcd8350a0 ("x86/entry: Implement user mode C entry points for #DB and #MCE")
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/include/asm/idtentry.h | 24 ++++++------------------
>  arch/x86/kernel/traps.c         | 12 ++++++++++++
>  arch/x86/xen/enlighten_pv.c     | 28 ++++++++++++++++++++++++----
>  arch/x86/xen/xen-asm_64.S       |  5 ++---
>  4 files changed, 44 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
> index cf51c50eb356..94333ac3092b 100644
> --- a/arch/x86/include/asm/idtentry.h
> +++ b/arch/x86/include/asm/idtentry.h
[...]
> @@ -570,11 +554,15 @@ DECLARE_IDTENTRY_MCE(X86_TRAP_MC,	exc_machine_check);
>  
>  /* NMI */
>  DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
> -DECLARE_IDTENTRY_XEN(X86_TRAP_NMI,	nmi);
> +#ifdef CONFIG_XEN_PV
> +DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
> +#endif
>  
>  /* #DB */
>  DECLARE_IDTENTRY_DEBUG(X86_TRAP_DB,	exc_debug);
> -DECLARE_IDTENTRY_XEN(X86_TRAP_DB,	debug);
> +#ifdef CONFIG_XEN_PV
> +DECLARE_IDTENTRY_RAW(X86_TRAP_DB,	xenpv_exc_debug);
> +#endif
>  
>  /* #DF */
>  DECLARE_IDTENTRY_DF(X86_TRAP_DF,	exc_double_fault);

Hello,

this patch - now in mainline as commit 13cbc0cd4a30 ("x86/entry/32: Fix
#MC and #DB wiring on x86_32") - seems to break i586 builds with
CONFIG_XEN_PV=y as xenpv_exc_nmi and xenpv_exc_debug are only defined
with CONFIG_X86_64:

[ 1279s] ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_nmi':
[ 1279s] /home/abuild/rpmbuild/BUILD/kernel-pae-5.8.rc4/linux-5.8-rc4/linux-obj/../arch/x86/include/asm/idtentry.h:557: undefined reference to `xenpv_exc_nmi'
[ 1279s] ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_debug':
[ 1279s] /home/abuild/rpmbuild/BUILD/kernel-pae-5.8.rc4/linux-5.8-rc4/linux-obj/../arch/x86/include/asm/idtentry.h:567: undefined reference to `xenpv_exc_debug'

Michal Kubecek

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH entry v2 3/6] x86/entry/xen: Route #DB correctly on Xen PV
  2020-07-06  8:41   ` [PATCH entry v2 3/6] " Michal Kubecek
@ 2020-07-06  8:57     ` Jürgen Groß
  2020-07-06  9:32       ` Michal Kubecek
  0 siblings, 1 reply; 20+ messages in thread
From: Jürgen Groß @ 2020-07-06  8:57 UTC (permalink / raw)
  To: Michal Kubecek, Andy Lutomirski; +Cc: x86, Andrew Cooper, LKML

On 06.07.20 10:41, Michal Kubecek wrote:
> On Fri, Jul 03, 2020 at 10:02:55AM -0700, Andy Lutomirski wrote:
>> On Xen PV, #DB doesn't use IST.  We still need to correctly route it
>> depending on whether it came from user or kernel mode.
>>
>> This patch gets rid of DECLARE/DEFINE_IDTENTRY_XEN -- it was too
>> hard to follow the logic.  Instead, route #DB and NMI through
>> DECLARE/DEFINE_IDTENTRY_RAW on Xen, and do the right thing for #DB.
>> Also add more warnings to the exc_debug* handlers to make this type
>> of failure more obvious.
>>
>> This fixes various forms of corruption that happen when usermode
>> triggers #DB on Xen PV.
>>
>> Fixes: 4c0dcd8350a0 ("x86/entry: Implement user mode C entry points for #DB and #MCE")
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>   arch/x86/include/asm/idtentry.h | 24 ++++++------------------
>>   arch/x86/kernel/traps.c         | 12 ++++++++++++
>>   arch/x86/xen/enlighten_pv.c     | 28 ++++++++++++++++++++++++----
>>   arch/x86/xen/xen-asm_64.S       |  5 ++---
>>   4 files changed, 44 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
>> index cf51c50eb356..94333ac3092b 100644
>> --- a/arch/x86/include/asm/idtentry.h
>> +++ b/arch/x86/include/asm/idtentry.h
> [...]
>> @@ -570,11 +554,15 @@ DECLARE_IDTENTRY_MCE(X86_TRAP_MC,	exc_machine_check);
>>   
>>   /* NMI */
>>   DECLARE_IDTENTRY_NMI(X86_TRAP_NMI,	exc_nmi);
>> -DECLARE_IDTENTRY_XEN(X86_TRAP_NMI,	nmi);
>> +#ifdef CONFIG_XEN_PV
>> +DECLARE_IDTENTRY_RAW(X86_TRAP_NMI,	xenpv_exc_nmi);
>> +#endif
>>   
>>   /* #DB */
>>   DECLARE_IDTENTRY_DEBUG(X86_TRAP_DB,	exc_debug);
>> -DECLARE_IDTENTRY_XEN(X86_TRAP_DB,	debug);
>> +#ifdef CONFIG_XEN_PV
>> +DECLARE_IDTENTRY_RAW(X86_TRAP_DB,	xenpv_exc_debug);
>> +#endif
>>   
>>   /* #DF */
>>   DECLARE_IDTENTRY_DF(X86_TRAP_DF,	exc_double_fault);
> 
> Hello,
> 
> this patch - now in mainline as commit 13cbc0cd4a30 ("x86/entry/32: Fix
> #MC and #DB wiring on x86_32") - seems to break i586 builds with
> CONFIG_XEN_PV=y as xenpv_exc_nmi and xenpv_exc_debug are only defined
> with CONFIG_X86_64:
> 
> [ 1279s] ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_nmi':
> [ 1279s] /home/abuild/rpmbuild/BUILD/kernel-pae-5.8.rc4/linux-5.8-rc4/linux-obj/../arch/x86/include/asm/idtentry.h:557: undefined reference to `xenpv_exc_nmi'
> [ 1279s] ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_debug':
> [ 1279s] /home/abuild/rpmbuild/BUILD/kernel-pae-5.8.rc4/linux-5.8-rc4/linux-obj/../arch/x86/include/asm/idtentry.h:567: undefined reference to `xenpv_exc_debug'

Fix is already queued in tip/tip.git x86/urgent


Juergen


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH entry v2 3/6] x86/entry/xen: Route #DB correctly on Xen PV
  2020-07-06  8:57     ` Jürgen Groß
@ 2020-07-06  9:32       ` Michal Kubecek
  0 siblings, 0 replies; 20+ messages in thread
From: Michal Kubecek @ 2020-07-06  9:32 UTC (permalink / raw)
  To: Jürgen Groß; +Cc: Andy Lutomirski, x86, Andrew Cooper, LKML

On Mon, Jul 06, 2020 at 10:57:07AM +0200, Jürgen Groß wrote:
> On 06.07.20 10:41, Michal Kubecek wrote:
> > this patch - now in mainline as commit 13cbc0cd4a30 ("x86/entry/32: Fix
> > #MC and #DB wiring on x86_32") - seems to break i586 builds with
> > CONFIG_XEN_PV=y as xenpv_exc_nmi and xenpv_exc_debug are only defined
> > with CONFIG_X86_64:
> > 
> > [ 1279s] ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_nmi':
> > [ 1279s] /home/abuild/rpmbuild/BUILD/kernel-pae-5.8.rc4/linux-5.8-rc4/linux-obj/../arch/x86/include/asm/idtentry.h:557: undefined reference to `xenpv_exc_nmi'
> > [ 1279s] ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_debug':
> > [ 1279s] /home/abuild/rpmbuild/BUILD/kernel-pae-5.8.rc4/linux-5.8-rc4/linux-obj/../arch/x86/include/asm/idtentry.h:567: undefined reference to `xenpv_exc_debug'
> 
> Fix is already queued in tip/tip.git x86/urgent

Thank you,
Michal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip: x86/entry] x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit()
  2020-07-03 17:02 ` [PATCH entry v2 6/6] x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit() Andy Lutomirski
@ 2020-07-07  8:23   ` tip-bot2 for Andy Lutomirski
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Andy Lutomirski @ 2020-07-07  8:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the x86/entry branch of tip:

Commit-ID:     b037b09b9058d84882fa2c4db3806433e2b0f912
Gitweb:        https://git.kernel.org/tip/b037b09b9058d84882fa2c4db3806433e2b0f912
Author:        Andy Lutomirski <luto@kernel.org>
AuthorDate:    Fri, 03 Jul 2020 10:02:58 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 06 Jul 2020 21:15:52 +02:00

x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit()

They were originally called _cond_rcu because they were special versions
with conditional RCU handling.  Now they're the standard entry and exit
path, so the _cond_rcu part is just confusing.  Drop it.

Also change the signature to make them more extensible and more foolproof.

No functional change -- it's pure refactoring.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/247fc67685263e0b673e1d7f808182d28ff80359.1593795633.git.luto@kernel.org

---
 arch/x86/entry/common.c         | 50 +++++++++++++++++---------------
 arch/x86/include/asm/idtentry.h | 28 ++++++++++--------
 arch/x86/kernel/kvm.c           |  6 ++--
 arch/x86/kernel/traps.c         |  6 ++--
 arch/x86/mm/fault.c             |  6 ++--
 5 files changed, 53 insertions(+), 43 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index e83b3f1..0521546 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -559,8 +559,7 @@ SYSCALL_DEFINE0(ni_syscall)
 }
 
 /**
- * idtentry_enter_cond_rcu - Handle state tracking on idtentry with conditional
- *			     RCU handling
+ * idtentry_enter - Handle state tracking on ordinary idtentries
  * @regs:	Pointer to pt_regs of interrupted context
  *
  * Invokes:
@@ -572,6 +571,9 @@ SYSCALL_DEFINE0(ni_syscall)
  *  - The hardirq tracer to keep the state consistent as low level ASM
  *    entry disabled interrupts.
  *
+ * As a precondition, this requires that the entry came from user mode,
+ * idle, or a kernel context in which RCU is watching.
+ *
  * For kernel mode entries RCU handling is done conditional. If RCU is
  * watching then the only RCU requirement is to check whether the tick has
  * to be restarted. If RCU is not watching then rcu_irq_enter() has to be
@@ -585,18 +587,21 @@ SYSCALL_DEFINE0(ni_syscall)
  * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
  * would not be possible.
  *
- * Returns: True if RCU has been adjusted on a kernel entry
- *	    False otherwise
+ * Returns: An opaque object that must be passed to idtentry_exit()
  *
- * The return value must be fed into the rcu_exit argument of
- * idtentry_exit_cond_rcu().
+ * The return value must be fed into the state argument of
+ * idtentry_exit().
  */
-bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
+idtentry_state_t noinstr idtentry_enter(struct pt_regs *regs)
 {
+	idtentry_state_t ret = {
+		.exit_rcu = false,
+	};
+
 	if (user_mode(regs)) {
 		check_user_regs(regs);
 		enter_from_user_mode();
-		return false;
+		return ret;
 	}
 
 	/*
@@ -634,7 +639,8 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
 		trace_hardirqs_off_finish();
 		instrumentation_end();
 
-		return true;
+		ret.exit_rcu = true;
+		return ret;
 	}
 
 	/*
@@ -649,7 +655,7 @@ bool noinstr idtentry_enter_cond_rcu(struct pt_regs *regs)
 	trace_hardirqs_off();
 	instrumentation_end();
 
-	return false;
+	return ret;
 }
 
 static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
@@ -667,10 +673,9 @@ static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
 }
 
 /**
- * idtentry_exit_cond_rcu - Handle return from exception with conditional RCU
- *			    handling
+ * idtentry_exit - Handle return from exception that used idtentry_enter()
  * @regs:	Pointer to pt_regs (exception entry regs)
- * @rcu_exit:	Invoke rcu_irq_exit() if true
+ * @state:	Return value from matching call to idtentry_enter()
  *
  * Depending on the return target (kernel/user) this runs the necessary
  * preemption and work checks if possible and reguired and returns to
@@ -679,10 +684,10 @@ static void idtentry_exit_cond_resched(struct pt_regs *regs, bool may_sched)
  * This is the last action before returning to the low level ASM code which
  * just needs to return to the appropriate context.
  *
- * Counterpart to idtentry_enter_cond_rcu(). The return value of the entry
- * function must be fed into the @rcu_exit argument.
+ * Counterpart to idtentry_enter(). The return value of the entry
+ * function must be fed into the @state argument.
  */
-void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
+void noinstr idtentry_exit(struct pt_regs *regs, idtentry_state_t state)
 {
 	lockdep_assert_irqs_disabled();
 
@@ -695,7 +700,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
 		 * carefully and needs the same ordering of lockdep/tracing
 		 * and RCU as the return to user mode path.
 		 */
-		if (rcu_exit) {
+		if (state.exit_rcu) {
 			instrumentation_begin();
 			/* Tell the tracer that IRET will enable interrupts */
 			trace_hardirqs_on_prepare();
@@ -714,7 +719,7 @@ void noinstr idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit)
 		 * IRQ flags state is correct already. Just tell RCU if it
 		 * was not watching on entry.
 		 */
-		if (rcu_exit)
+		if (state.exit_rcu)
 			rcu_irq_exit();
 	}
 }
@@ -800,9 +805,10 @@ static void __xen_pv_evtchn_do_upcall(void)
 __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
 {
 	struct pt_regs *old_regs;
-	bool inhcall, rcu_exit;
+	bool inhcall;
+	idtentry_state_t state;
 
-	rcu_exit = idtentry_enter_cond_rcu(regs);
+	state = idtentry_enter(regs);
 	old_regs = set_irq_regs(regs);
 
 	instrumentation_begin();
@@ -812,13 +818,13 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
 	set_irq_regs(old_regs);
 
 	inhcall = get_and_clear_inhcall();
-	if (inhcall && !WARN_ON_ONCE(rcu_exit)) {
+	if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
 		instrumentation_begin();
 		idtentry_exit_cond_resched(regs, true);
 		instrumentation_end();
 		restore_inhcall(inhcall);
 	} else {
-		idtentry_exit_cond_rcu(regs, rcu_exit);
+		idtentry_exit(regs, state);
 	}
 }
 #endif /* CONFIG_XEN_PV */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index eeac6dc..7227225 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -13,8 +13,12 @@
 void idtentry_enter_user(struct pt_regs *regs);
 void idtentry_exit_user(struct pt_regs *regs);
 
-bool idtentry_enter_cond_rcu(struct pt_regs *regs);
-void idtentry_exit_cond_rcu(struct pt_regs *regs, bool rcu_exit);
+typedef struct idtentry_state {
+	bool exit_rcu;
+} idtentry_state_t;
+
+idtentry_state_t idtentry_enter(struct pt_regs *regs);
+void idtentry_exit(struct pt_regs *regs, idtentry_state_t state);
 
 /**
  * DECLARE_IDTENTRY - Declare functions for simple IDT entry points
@@ -54,12 +58,12 @@ static __always_inline void __##func(struct pt_regs *regs);		\
 									\
 __visible noinstr void func(struct pt_regs *regs)			\
 {									\
-	bool rcu_exit = idtentry_enter_cond_rcu(regs);			\
+	idtentry_state_t state = idtentry_enter(regs);			\
 									\
 	instrumentation_begin();					\
 	__##func (regs);						\
 	instrumentation_end();						\
-	idtentry_exit_cond_rcu(regs, rcu_exit);				\
+	idtentry_exit(regs, state);					\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs)
@@ -101,12 +105,12 @@ static __always_inline void __##func(struct pt_regs *regs,		\
 __visible noinstr void func(struct pt_regs *regs,			\
 			    unsigned long error_code)			\
 {									\
-	bool rcu_exit = idtentry_enter_cond_rcu(regs);			\
+	idtentry_state_t state = idtentry_enter(regs);			\
 									\
 	instrumentation_begin();					\
 	__##func (regs, error_code);					\
 	instrumentation_end();						\
-	idtentry_exit_cond_rcu(regs, rcu_exit);				\
+	idtentry_exit(regs, state);					\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs,		\
@@ -199,7 +203,7 @@ static __always_inline void __##func(struct pt_regs *regs, u8 vector);	\
 __visible noinstr void func(struct pt_regs *regs,			\
 			    unsigned long error_code)			\
 {									\
-	bool rcu_exit = idtentry_enter_cond_rcu(regs);			\
+	idtentry_state_t state = idtentry_enter(regs);			\
 									\
 	instrumentation_begin();					\
 	irq_enter_rcu();						\
@@ -207,7 +211,7 @@ __visible noinstr void func(struct pt_regs *regs,			\
 	__##func (regs, (u8)error_code);				\
 	irq_exit_rcu();							\
 	instrumentation_end();						\
-	idtentry_exit_cond_rcu(regs, rcu_exit);				\
+	idtentry_exit(regs, state);					\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs, u8 vector)
@@ -241,7 +245,7 @@ static void __##func(struct pt_regs *regs);				\
 									\
 __visible noinstr void func(struct pt_regs *regs)			\
 {									\
-	bool rcu_exit = idtentry_enter_cond_rcu(regs);			\
+	idtentry_state_t state = idtentry_enter(regs);			\
 									\
 	instrumentation_begin();					\
 	irq_enter_rcu();						\
@@ -249,7 +253,7 @@ __visible noinstr void func(struct pt_regs *regs)			\
 	run_on_irqstack_cond(__##func, regs, regs);			\
 	irq_exit_rcu();							\
 	instrumentation_end();						\
-	idtentry_exit_cond_rcu(regs, rcu_exit);				\
+	idtentry_exit(regs, state);					\
 }									\
 									\
 static noinline void __##func(struct pt_regs *regs)
@@ -270,7 +274,7 @@ static __always_inline void __##func(struct pt_regs *regs);		\
 									\
 __visible noinstr void func(struct pt_regs *regs)			\
 {									\
-	bool rcu_exit = idtentry_enter_cond_rcu(regs);			\
+	idtentry_state_t state = idtentry_enter(regs);			\
 									\
 	instrumentation_begin();					\
 	__irq_enter_raw();						\
@@ -278,7 +282,7 @@ __visible noinstr void func(struct pt_regs *regs)			\
 	__##func (regs);						\
 	__irq_exit_raw();						\
 	instrumentation_end();						\
-	idtentry_exit_cond_rcu(regs, rcu_exit);				\
+	idtentry_exit(regs, state);					\
 }									\
 									\
 static __always_inline void __##func(struct pt_regs *regs)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index df63786..3f78482 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -233,7 +233,7 @@ EXPORT_SYMBOL_GPL(kvm_read_and_reset_apf_flags);
 noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 {
 	u32 reason = kvm_read_and_reset_apf_flags();
-	bool rcu_exit;
+	idtentry_state_t state;
 
 	switch (reason) {
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
@@ -243,7 +243,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 		return false;
 	}
 
-	rcu_exit = idtentry_enter_cond_rcu(regs);
+	state = idtentry_enter(regs);
 	instrumentation_begin();
 
 	/*
@@ -264,7 +264,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 	}
 
 	instrumentation_end();
-	idtentry_exit_cond_rcu(regs, rcu_exit);
+	idtentry_exit(regs, state);
 	return true;
 }
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b038695..4627f82 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -245,7 +245,7 @@ static noinstr bool handle_bug(struct pt_regs *regs)
 
 DEFINE_IDTENTRY_RAW(exc_invalid_op)
 {
-	bool rcu_exit;
+	idtentry_state_t state;
 
 	/*
 	 * We use UD2 as a short encoding for 'CALL __WARN', as such
@@ -255,11 +255,11 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op)
 	if (!user_mode(regs) && handle_bug(regs))
 		return;
 
-	rcu_exit = idtentry_enter_cond_rcu(regs);
+	state = idtentry_enter(regs);
 	instrumentation_begin();
 	handle_invalid_op(regs);
 	instrumentation_end();
-	idtentry_exit_cond_rcu(regs, rcu_exit);
+	idtentry_exit(regs, state);
 }
 
 DEFINE_IDTENTRY(exc_coproc_segment_overrun)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 1ead568..5e41949 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1377,7 +1377,7 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 {
 	unsigned long address = read_cr2();
-	bool rcu_exit;
+	idtentry_state_t state;
 
 	prefetchw(&current->mm->mmap_lock);
 
@@ -1412,11 +1412,11 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 	 * code reenabled RCU to avoid subsequent wreckage which helps
 	 * debugability.
 	 */
-	rcu_exit = idtentry_enter_cond_rcu(regs);
+	state = idtentry_enter(regs);
 
 	instrumentation_begin();
 	handle_page_fault(regs, error_code, address);
 	instrumentation_end();
 
-	idtentry_exit_cond_rcu(regs, rcu_exit);
+	idtentry_exit(regs, state);
 }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tip: x86/urgent] x86/entry, selftests: Further improve user entry sanity checks
  2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
@ 2020-08-20 10:23     ` peterz
  2020-08-22 21:59       ` Andy Lutomirski
  0 siblings, 1 reply; 20+ messages in thread
From: peterz @ 2020-08-20 10:23 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-tip-commits, Andy Lutomirski, Thomas Gleixner, x86

On Sat, Jul 04, 2020 at 05:49:10PM -0000, tip-bot2 for Andy Lutomirski wrote:

> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index f392a8b..e83b3f1 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -49,6 +49,23 @@
>  static void check_user_regs(struct pt_regs *regs)
>  {
>  	if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) {
> +		/*
> +		 * Make sure that the entry code gave us a sensible EFLAGS
> +		 * register.  Native because we want to check the actual CPU
> +		 * state, not the interrupt state as imagined by Xen.
> +		 */
> +		unsigned long flags = native_save_fl();
> +		WARN_ON_ONCE(flags & (X86_EFLAGS_AC | X86_EFLAGS_DF |
> +				      X86_EFLAGS_NT));

This triggers with AC|TF on my !SMAP enabled machine.

something like so then?

diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index a8f9315b9eae..76410964585f 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -18,8 +18,15 @@ static __always_inline void arch_check_user_regs(struct pt_regs *regs)
 		 * state, not the interrupt state as imagined by Xen.
 		 */
 		unsigned long flags = native_save_fl();
-		WARN_ON_ONCE(flags & (X86_EFLAGS_AC | X86_EFLAGS_DF |
-				      X86_EFLAGS_NT));
+		unsigned long mask = X86_EFLAGS_DF | X86_EFLAGS_NT;
+
+		/*
+		 * For !SMAP hardware we patch out CLAC on entry.
+		 */
+		if (boot_cpu_has(X86_FEATURE_SMAP))
+			mask |= X86_EFLAGS_AC;
+
+		WARN_ON_ONCE(flags & mask);
 
 		/* We think we came from user mode. Make sure pt_regs agrees. */
 		WARN_ON_ONCE(!user_mode(regs));

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [tip: x86/urgent] x86/entry, selftests: Further improve user entry sanity checks
  2020-08-20 10:23     ` peterz
@ 2020-08-22 21:59       ` Andy Lutomirski
  0 siblings, 0 replies; 20+ messages in thread
From: Andy Lutomirski @ 2020-08-22 21:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, linux-tip-commits, Andy Lutomirski, Thomas Gleixner, x86

On Thu, Aug 20, 2020 at 3:24 AM <peterz@infradead.org> wrote:
>
> On Sat, Jul 04, 2020 at 05:49:10PM -0000, tip-bot2 for Andy Lutomirski wrote:
>
> > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > index f392a8b..e83b3f1 100644
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -49,6 +49,23 @@
> >  static void check_user_regs(struct pt_regs *regs)
> >  {
> >       if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) {
> > +             /*
> > +              * Make sure that the entry code gave us a sensible EFLAGS
> > +              * register.  Native because we want to check the actual CPU
> > +              * state, not the interrupt state as imagined by Xen.
> > +              */
> > +             unsigned long flags = native_save_fl();
> > +             WARN_ON_ONCE(flags & (X86_EFLAGS_AC | X86_EFLAGS_DF |
> > +                                   X86_EFLAGS_NT));
>
> This triggers with AC|TF on my !SMAP enabled machine.
>
> something like so then?
>
> diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
> index a8f9315b9eae..76410964585f 100644
> --- a/arch/x86/include/asm/entry-common.h
> +++ b/arch/x86/include/asm/entry-common.h
> @@ -18,8 +18,15 @@ static __always_inline void arch_check_user_regs(struct pt_regs *regs)
>                  * state, not the interrupt state as imagined by Xen.
>                  */
>                 unsigned long flags = native_save_fl();
> -               WARN_ON_ONCE(flags & (X86_EFLAGS_AC | X86_EFLAGS_DF |
> -                                     X86_EFLAGS_NT));
> +               unsigned long mask = X86_EFLAGS_DF | X86_EFLAGS_NT;
> +
> +               /*
> +                * For !SMAP hardware we patch out CLAC on entry.
> +                */
> +               if (boot_cpu_has(X86_FEATURE_SMAP))
> +                       mask |= X86_EFLAGS_AC;
> +
> +               WARN_ON_ONCE(flags & mask);
>
>                 /* We think we came from user mode. Make sure pt_regs agrees. */
>                 WARN_ON_ONCE(!user_mode(regs));

LGTM.

Acked-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, back to index

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-03 17:02 [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Andy Lutomirski
2020-07-03 17:02 ` [PATCH entry v2 1/6] x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER Andy Lutomirski
2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
2020-07-03 17:02 ` [PATCH entry v2 2/6] x86/entry, selftests: Further improve user entry sanity checks Andy Lutomirski
2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
2020-08-20 10:23     ` peterz
2020-08-22 21:59       ` Andy Lutomirski
2020-07-03 17:02 ` [PATCH entry v2 3/6] x86/entry/xen: Route #DB correctly on Xen PV Andy Lutomirski
2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
2020-07-06  8:41   ` [PATCH entry v2 3/6] " Michal Kubecek
2020-07-06  8:57     ` Jürgen Groß
2020-07-06  9:32       ` Michal Kubecek
2020-07-03 17:02 ` [PATCH entry v2 4/6] x86/entry/32: Fix #MC and #DB wiring on x86_32 Andy Lutomirski
2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
2020-07-03 17:02 ` [PATCH entry v2 5/6] x86/ldt: Disable 16-bit segments on Xen PV Andy Lutomirski
2020-07-03 19:00   ` Andrew Cooper
2020-07-04 17:49   ` [tip: x86/urgent] " tip-bot2 for Andy Lutomirski
2020-07-03 17:02 ` [PATCH entry v2 6/6] x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit() Andy Lutomirski
2020-07-07  8:23   ` [tip: x86/entry] " tip-bot2 for Andy Lutomirski
2020-07-03 17:31 ` [PATCH entry v2 0/6] x86/entry: Fixes and cleanups Peter Zijlstra

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git