All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work
@ 2019-09-19 15:03 Thomas Gleixner
  2019-09-19 15:03 ` [RFC patch 01/15] entry: Provide generic syscall entry functionality Thomas Gleixner
                   ` (17 more replies)
  0 siblings, 18 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

When working on a way to move out the posix cpu timer expiry out of the
timer interrupt context, I noticed that KVM is not handling pending task
work before entering a guest. A quick hack was to add that to the x86 KVM
handling loop. The discussion ended with a request to make this a generic
infrastructure possible with also moving the per arch implementations of
the enter from and return to user space handling generic.

  https://lore.kernel.org/r/89E42BCC-47A8-458B-B06A-D6A20D20512C@amacapital.net

You asked for it, so don't complain that you have to review it :)

The series implements the syscall enter/exit and the general exit to
userspace work handling along with the pre guest enter functionality.

The series converts x86 and ARM64. x86 is fully tested including selftests
etc. ARM64 is only compile tested for now as my only ARM64 testbox is not
available right now.

Thanks,

	tglx

---
 /Makefile                               |    3 
 arch/Kconfig                            |    3 
 arch/arm64/Kconfig                      |    1 
 arch/arm64/include/asm/kvm_host.h       |    1 
 arch/arm64/kernel/entry.S               |   18 -
 arch/arm64/kernel/ptrace.c              |   65 ------
 arch/arm64/kernel/signal.c              |   45 ----
 arch/arm64/kernel/syscall.c             |   49 ----
 arch/x86/Kconfig                        |    1 
 arch/x86/entry/common.c                 |  265 +-------------------------
 arch/x86/entry/entry_32.S               |   13 -
 arch/x86/entry/entry_64.S               |   12 -
 arch/x86/entry/entry_64_compat.S        |   21 --
 arch/x86/include/asm/signal.h           |    1 
 arch/x86/include/asm/thread_info.h      |    9 
 arch/x86/kernel/signal.c                |    2 
 arch/x86/kvm/x86.c                      |   17 -
 b/arch/arm64/include/asm/entry-common.h |   76 +++++++
 b/arch/x86/include/asm/entry-common.h   |  104 ++++++++++
 b/include/linux/entry-common.h          |  324 ++++++++++++++++++++++++++++++++
 b/kernel/entry/common.c                 |  220 +++++++++++++++++++++
 kernel/Makefile                         |    1 
 22 files changed, 776 insertions(+), 475 deletions(-)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 01/15] entry: Provide generic syscall entry functionality
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-20 23:38   ` Andy Lutomirski
  2019-09-23  9:05   ` Mike Rapoport
  2019-09-19 15:03 ` [RFC patch 02/15] x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY Thomas Gleixner
                   ` (16 subsequent siblings)
  17 siblings, 2 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

On syscall entry certain work needs to be done conditionally like tracing,
seccomp etc. This code is duplicated in all architectures.

Provide a generic version.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/Kconfig                 |    3 +
 include/linux/entry-common.h |  122 +++++++++++++++++++++++++++++++++++++++++++
 kernel/Makefile              |    1 
 kernel/entry/Makefile        |    3 +
 kernel/entry/common.c        |   33 +++++++++++
 5 files changed, 162 insertions(+)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -27,6 +27,9 @@ config HAVE_IMA_KEXEC
 config HOTPLUG_SMT
 	bool
 
+config GENERIC_ENTRY
+       bool
+
 config OPROFILE
 	tristate "OProfile system profiling"
 	depends on PROFILING
--- /dev/null
+++ b/include/linux/entry-common.h
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_ENTRYCOMMON_H
+#define __LINUX_ENTRYCOMMON_H
+
+#include <linux/tracehook.h>
+#include <linux/syscalls.h>
+#include <linux/seccomp.h>
+#include <linux/sched.h>
+#include <linux/audit.h>
+
+#include <asm/entry-common.h>
+
+/*
+ * Define dummy _TIF work flags if not defined by the architecture or for
+ * disabled functionality.
+ */
+#ifndef _TIF_SYSCALL_TRACE
+# define _TIF_SYSCALL_TRACE		(0)
+#endif
+
+#ifndef _TIF_SYSCALL_EMU
+# define _TIF_SYSCALL_EMU		(0)
+#endif
+
+#ifndef _TIF_SYSCALL_TRACEPOINT
+# define _TIF_SYSCALL_TRACEPOINT	(0)
+#endif
+
+#ifndef _TIF_SECCOMP
+# define _TIF_SECCOMP			(0)
+#endif
+
+#ifndef _TIF_AUDIT
+# define _TIF_AUDIT			(0)
+#endif
+
+/*
+ * TIF flags handled in syscall_enter_from_usermode()
+ */
+#ifndef ARCH_SYSCALL_ENTER_WORK
+# define ARCH_SYSCALL_ENTER_WORK	(0)
+#endif
+
+#define SYSCALL_ENTER_WORK						\
+	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | TIF_SECCOMP |	\
+	 _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU |			\
+	 ARCH_SYSCALL_ENTER_WORK)
+
+/**
+ * arch_syscall_enter_tracehook - Wrapper around tracehook_report_syscall_entry()
+ *
+ * Defaults to tracehook_report_syscall_entry(). Can be replaced by
+ * architecture specific code.
+ *
+ * Invoked from syscall_enter_from_usermode()
+ */
+static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs);
+
+#ifndef arch_syscall_enter_tracehook
+static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs)
+{
+	return tracehook_report_syscall_entry(regs);
+}
+#endif
+
+/**
+ * arch_syscall_enter_seccomp - Architecture specific seccomp invocation
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Invoked from syscall_enter_from_usermode(). Can be replaced by
+ * architecture specific code.
+ */
+static inline long arch_syscall_enter_seccomp(struct pt_regs *regs);
+
+#ifndef arch_syscall_enter_seccomp
+static inline long arch_syscall_enter_seccomp(struct pt_regs *regs)
+{
+	return secure_computing(NULL);
+}
+#endif
+
+/**
+ * arch_syscall_enter_audit - Architecture specific audit invocation
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Invoked from syscall_enter_from_usermode(). Must be replaced by
+ * architecture specific code if the architecture supports audit.
+ */
+static inline void arch_syscall_enter_audit(struct pt_regs *regs);
+
+#ifndef arch_syscall_enter_audit
+static inline void arch_syscall_enter_audit(struct pt_regs *regs) { }
+#endif
+
+/* Common syscall enter function */
+long core_syscall_enter_from_usermode(struct pt_regs *regs, long syscall);
+
+/**
+ * syscall_enter_from_usermode - Check and handle work before invoking
+ *				 a syscall
+ * @regs:	Pointer to currents pt_regs
+ * @syscall:	The syscall number
+ *
+ * Invoked from architecture specific syscall entry code with interrupts
+ * enabled.
+ *
+ * Returns: The original or a modified syscall number
+ */
+static inline long syscall_enter_from_usermode(struct pt_regs *regs,
+					       long syscall)
+{
+	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+
+	if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
+		BUG_ON(regs != task_pt_regs(current));
+
+	if (ti_work & SYSCALL_ENTER_WORK)
+		syscall = core_syscall_enter_from_usermode(regs, syscall);
+	return syscall;
+}
+
+#endif
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -43,6 +43,7 @@ obj-y += irq/
 obj-y += rcu/
 obj-y += livepatch/
 obj-y += dma/
+obj-y += entry/
 
 obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
 obj-$(CONFIG_FREEZER) += freezer.o
--- /dev/null
+++ b/kernel/entry/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_GENERIC_ENTRY) += common.o
--- /dev/null
+++ b/kernel/entry/common.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/context_tracking.h>
+#include <linux/entry-common.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/syscalls.h>
+
+long core_syscall_enter_from_usermode(struct pt_regs *regs, long syscall)
+{
+	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+	unsigned long ret = 0;
+
+	if (ti_work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
+		ret = arch_syscall_enter_tracehook(regs);
+		if (ret || (ti_work & _TIF_SYSCALL_EMU))
+			return -1L;
+	}
+
+	/* Do seccomp after ptrace, to catch any tracer changes. */
+	if (ti_work & _TIF_SECCOMP) {
+		ret = arch_syscall_enter_seccomp(regs);
+		if (ret == -1L)
+			return ret;
+	}
+
+	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
+		trace_sys_enter(regs, syscall);
+
+	arch_syscall_enter_audit(regs);
+
+	return ret ? : syscall;
+}



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 02/15] x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
  2019-09-19 15:03 ` [RFC patch 01/15] entry: Provide generic syscall entry functionality Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-20 23:39   ` Andy Lutomirski
  2019-09-19 15:03 ` [RFC patch 03/15] x86/entry: Use generic syscall entry function Thomas Gleixner
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Evaluating _TIF_NOHZ to decide whether to use the slow syscall entry path
is not only pointless, it's actually counterproductive:

 1) Context tracking code is invoked unconditionally before that flag is
    evaluated.

 2) If the flag is set the slow path is invoked for nothing due to #1

Remove it.  

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/thread_info.h |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -133,14 +133,10 @@ struct thread_info {
 #define _TIF_X32		(1 << TIF_X32)
 #define _TIF_FSCHECK		(1 << TIF_FSCHECK)
 
-/*
- * work to do in syscall_trace_enter().  Also includes TIF_NOHZ for
- * enter_from_user_mode()
- */
+/* Work to do before invoking the actual syscall. */
 #define _TIF_WORK_SYSCALL_ENTRY	\
 	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT |	\
-	 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT |	\
-	 _TIF_NOHZ)
+	 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
 
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW_BASE						\



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 03/15] x86/entry: Use generic syscall entry function
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
  2019-09-19 15:03 ` [RFC patch 01/15] entry: Provide generic syscall entry functionality Thomas Gleixner
  2019-09-19 15:03 ` [RFC patch 02/15] x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-20 23:41   ` Andy Lutomirski
  2019-09-19 15:03 ` [RFC patch 04/15] arm64/entry: " Thomas Gleixner
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Replace the syscall entry work handling with the generic version, Provide
the necessary helper inlines to handle the real architecture specific
parts, e.g. audit and seccomp invocations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/Kconfig                    |    1 
 arch/x86/entry/common.c             |  108 +++---------------------------------
 arch/x86/include/asm/entry-common.h |   59 +++++++++++++++++++
 arch/x86/include/asm/thread_info.h  |    5 -
 4 files changed, 70 insertions(+), 103 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -110,6 +110,7 @@ config X86
 	select GENERIC_CPU_AUTOPROBE
 	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_EARLY_IOREMAP
+	select GENERIC_ENTRY
 	select GENERIC_FIND_FIRST_BIT
 	select GENERIC_IOMAP
 	select GENERIC_IRQ_EFFECTIVE_AFF_MASK	if SMP
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -10,13 +10,13 @@
 #include <linux/kernel.h>
 #include <linux/sched.h>
 #include <linux/sched/task_stack.h>
+#include <linux/entry-common.h>
 #include <linux/mm.h>
 #include <linux/smp.h>
 #include <linux/errno.h>
 #include <linux/ptrace.h>
 #include <linux/tracehook.h>
 #include <linux/audit.h>
-#include <linux/seccomp.h>
 #include <linux/signal.h>
 #include <linux/export.h>
 #include <linux/context_tracking.h>
@@ -34,7 +34,6 @@
 #include <asm/fpu/api.h>
 #include <asm/nospec-branch.h>
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
 
 #ifdef CONFIG_CONTEXT_TRACKING
@@ -48,86 +47,6 @@
 static inline void enter_from_user_mode(void) {}
 #endif
 
-static void do_audit_syscall_entry(struct pt_regs *regs, u32 arch)
-{
-#ifdef CONFIG_X86_64
-	if (arch == AUDIT_ARCH_X86_64) {
-		audit_syscall_entry(regs->orig_ax, regs->di,
-				    regs->si, regs->dx, regs->r10);
-	} else
-#endif
-	{
-		audit_syscall_entry(regs->orig_ax, regs->bx,
-				    regs->cx, regs->dx, regs->si);
-	}
-}
-
-/*
- * Returns the syscall nr to run (which should match regs->orig_ax) or -1
- * to skip the syscall.
- */
-static long syscall_trace_enter(struct pt_regs *regs)
-{
-	u32 arch = in_ia32_syscall() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
-
-	struct thread_info *ti = current_thread_info();
-	unsigned long ret = 0;
-	u32 work;
-
-	if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
-		BUG_ON(regs != task_pt_regs(current));
-
-	work = READ_ONCE(ti->flags);
-
-	if (work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
-		ret = tracehook_report_syscall_entry(regs);
-		if (ret || (work & _TIF_SYSCALL_EMU))
-			return -1L;
-	}
-
-#ifdef CONFIG_SECCOMP
-	/*
-	 * Do seccomp after ptrace, to catch any tracer changes.
-	 */
-	if (work & _TIF_SECCOMP) {
-		struct seccomp_data sd;
-
-		sd.arch = arch;
-		sd.nr = regs->orig_ax;
-		sd.instruction_pointer = regs->ip;
-#ifdef CONFIG_X86_64
-		if (arch == AUDIT_ARCH_X86_64) {
-			sd.args[0] = regs->di;
-			sd.args[1] = regs->si;
-			sd.args[2] = regs->dx;
-			sd.args[3] = regs->r10;
-			sd.args[4] = regs->r8;
-			sd.args[5] = regs->r9;
-		} else
-#endif
-		{
-			sd.args[0] = regs->bx;
-			sd.args[1] = regs->cx;
-			sd.args[2] = regs->dx;
-			sd.args[3] = regs->si;
-			sd.args[4] = regs->di;
-			sd.args[5] = regs->bp;
-		}
-
-		ret = __secure_computing(&sd);
-		if (ret == -1)
-			return ret;
-	}
-#endif
-
-	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
-		trace_sys_enter(regs, regs->orig_ax);
-
-	do_audit_syscall_entry(regs, arch);
-
-	return ret ?: regs->orig_ax;
-}
-
 #define EXIT_TO_USERMODE_LOOP_FLAGS				\
 	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |	\
 	 _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING)
@@ -277,13 +196,10 @@ static void syscall_slow_exit_work(struc
 #ifdef CONFIG_X86_64
 __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 {
-	struct thread_info *ti;
-
 	enter_from_user_mode();
 	local_irq_enable();
-	ti = current_thread_info();
-	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
-		nr = syscall_trace_enter(regs);
+
+	nr = syscall_enter_from_usermode(regs, nr);
 
 	if (likely(nr < NR_syscalls)) {
 		nr = array_index_nospec(nr, NR_syscalls);
@@ -310,22 +226,18 @@ static void syscall_slow_exit_work(struc
  */
 static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
 {
-	struct thread_info *ti = current_thread_info();
 	unsigned int nr = (unsigned int)regs->orig_ax;
 
 #ifdef CONFIG_IA32_EMULATION
-	ti->status |= TS_COMPAT;
+	current_thread_info()->status |= TS_COMPAT;
 #endif
 
-	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
-		/*
-		 * Subtlety here: if ptrace pokes something larger than
-		 * 2^32-1 into orig_ax, this truncates it.  This may or
-		 * may not be necessary, but it matches the old asm
-		 * behavior.
-		 */
-		nr = syscall_trace_enter(regs);
-	}
+	/*
+	 * Subtlety here: if ptrace pokes something larger than 2^32-1 into
+	 * orig_ax, this truncates it.  This may or may not be necessary,
+	 * but it matches the old asm behavior.
+	 */
+	nr = syscall_enter_from_usermode(regs, nr);
 
 	if (likely(nr < IA32_NR_syscalls)) {
 		nr = array_index_nospec(nr, IA32_NR_syscalls);
--- /dev/null
+++ b/arch/x86/include/asm/entry-common.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_X86_ENTRY_COMMON_H
+#define _ASM_X86_ENTRY_COMMON_H
+
+#include <linux/seccomp.h>
+#include <linux/audit.h>
+
+static inline long arch_syscall_enter_seccomp(struct pt_regs *regs)
+{
+#ifdef CONFIG_SECCOMP
+	u32 arch = in_ia32_syscall() ? AUDIT_ARCH_I386 : AUDIT_ARCH_X86_64;
+	struct seccomp_data sd;
+
+	sd.arch = arch;
+	sd.nr = regs->orig_ax;
+	sd.instruction_pointer = regs->ip;
+
+#ifdef CONFIG_X86_64
+	if (arch == AUDIT_ARCH_X86_64) {
+		sd.args[0] = regs->di;
+		sd.args[1] = regs->si;
+		sd.args[2] = regs->dx;
+		sd.args[3] = regs->r10;
+		sd.args[4] = regs->r8;
+		sd.args[5] = regs->r9;
+	} else
+#endif
+	{
+		sd.args[0] = regs->bx;
+		sd.args[1] = regs->cx;
+		sd.args[2] = regs->dx;
+		sd.args[3] = regs->si;
+		sd.args[4] = regs->di;
+		sd.args[5] = regs->bp;
+	}
+
+	return __secure_computing(&sd);
+#else
+	return 0;
+#endif
+}
+#define arch_syscall_enter_seccomp arch_syscall_enter_seccomp
+
+static inline void arch_syscall_enter_audit(struct pt_regs *regs)
+{
+#ifdef CONFIG_X86_64
+	if (in_ia32_syscall()) {
+		audit_syscall_entry(regs->orig_ax, regs->di,
+				    regs->si, regs->dx, regs->r10);
+	} else
+#endif
+	{
+		audit_syscall_entry(regs->orig_ax, regs->bx,
+				    regs->cx, regs->dx, regs->si);
+	}
+}
+#define arch_syscall_enter_audit arch_syscall_enter_audit
+
+#endif
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -133,11 +133,6 @@ struct thread_info {
 #define _TIF_X32		(1 << TIF_X32)
 #define _TIF_FSCHECK		(1 << TIF_FSCHECK)
 
-/* Work to do before invoking the actual syscall. */
-#define _TIF_WORK_SYSCALL_ENTRY	\
-	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT |	\
-	 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)
-
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW_BASE						\
 	(_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP|		\



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 04/15] arm64/entry: Use generic syscall entry function
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (2 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 03/15] x86/entry: Use generic syscall entry function Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-20 12:21   ` Catalin Marinas
  2019-09-19 15:03 ` [RFC patch 05/15] entry: Provide generic syscall exit function Thomas Gleixner
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Replace the syscall entry work handling with the generic version, Provide
the necessary helper inlines to handle the real architecture specific
parts, e.g. audit and seccomp invocations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/arm64/Kconfig                    |    1 
 arch/arm64/include/asm/entry-common.h |   39 ++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/ptrace.c            |   29 -------------------------
 arch/arm64/kernel/syscall.c           |   15 +++++--------
 4 files changed, 47 insertions(+), 37 deletions(-)

--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -99,6 +99,7 @@ config ARM64
 	select GENERIC_CPU_AUTOPROBE
 	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_EARLY_IOREMAP
+	select GENERIC_ENTRY
 	select GENERIC_IDLE_POLL_SETUP
 	select GENERIC_IRQ_MULTI_HANDLER
 	select GENERIC_IRQ_PROBE
--- /dev/null
+++ b/arch/arm64/include/asm/entry-common.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ */
+#ifndef __ASM_ENTRY_COMMON_H
+#define __ASM_ENTRY_COMMON_H
+
+enum ptrace_syscall_dir {
+	PTRACE_SYSCALL_ENTER = 0,
+	PTRACE_SYSCALL_EXIT,
+};
+
+/*
+ * A scratch register (ip(r12) on AArch32, x7 on AArch64) is
+ * used to denote syscall entry/exit for the tracehooks
+ */
+static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs)
+{
+	int regno = (is_compat_task() ? 12 : 7);
+	unsigned long reg = regs->regs[regno];
+	long ret;
+
+	regs->regs[regno] = PTRACE_SYSCALL_ENTER;
+	ret = tracehook_report_syscall_entry(regs);
+	if (ret)
+		forget_syscall(regs);
+	regs->regs[regno] = reg;
+	return ret;
+}
+#define arch_syscall_enter_tracehook arch_syscall_enter_tracehook
+
+static inline void arch_syscall_enter_audit(struct pt_regs *regs)
+{
+	audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
+			    regs->regs[2], regs->regs[3]);
+}
+#define arch_syscall_enter_audit arch_syscall_enter_audit
+
+#endif
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/task_stack.h>
+#include <linux/entry-common.h>
 #include <linux/mm.h>
 #include <linux/nospec.h>
 #include <linux/smp.h>
@@ -41,7 +42,6 @@
 #include <asm/traps.h>
 #include <asm/system_misc.h>
 
-#define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
 
 struct pt_regs_offset {
@@ -1779,11 +1779,6 @@ long arch_ptrace(struct task_struct *chi
 	return ptrace_request(child, request, addr, data);
 }
 
-enum ptrace_syscall_dir {
-	PTRACE_SYSCALL_ENTER = 0,
-	PTRACE_SYSCALL_EXIT,
-};
-
 static void tracehook_report_syscall(struct pt_regs *regs,
 				     enum ptrace_syscall_dir dir)
 {
@@ -1806,28 +1801,6 @@ static void tracehook_report_syscall(str
 	regs->regs[regno] = saved_reg;
 }
 
-int syscall_trace_enter(struct pt_regs *regs)
-{
-	if (test_thread_flag(TIF_SYSCALL_TRACE) ||
-		test_thread_flag(TIF_SYSCALL_EMU)) {
-		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
-		if (!in_syscall(regs) || test_thread_flag(TIF_SYSCALL_EMU))
-			return -1;
-	}
-
-	/* Do the secure computing after ptrace; failures should be fast. */
-	if (secure_computing(NULL) == -1)
-		return -1;
-
-	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
-		trace_sys_enter(regs, regs->syscallno);
-
-	audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
-			    regs->regs[2], regs->regs[3]);
-
-	return regs->syscallno;
-}
-
 void syscall_trace_exit(struct pt_regs *regs)
 {
 	audit_syscall_exit(regs);
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include <linux/compiler.h>
+#include <linux/entry-common.h>
 #include <linux/context_tracking.h>
 #include <linux/errno.h>
 #include <linux/nospec.h>
@@ -58,7 +59,6 @@ static inline bool has_syscall_work(unsi
 	return unlikely(flags & _TIF_SYSCALL_WORK);
 }
 
-int syscall_trace_enter(struct pt_regs *regs);
 void syscall_trace_exit(struct pt_regs *regs);
 
 #ifdef CONFIG_ARM64_ERRATUM_1463225
@@ -97,19 +97,16 @@ static void el0_svc_common(struct pt_reg
 
 	regs->orig_x0 = regs->regs[0];
 	regs->syscallno = scno;
+	/* Set default error number */
+	regs->regs[0] = -ENOSYS;
 
 	cortex_a76_erratum_1463225_svc_handler();
 	local_daif_restore(DAIF_PROCCTX);
 	user_exit();
 
-	if (has_syscall_work(flags)) {
-		/* set default errno for user-issued syscall(-1) */
-		if (scno == NO_SYSCALL)
-			regs->regs[0] = -ENOSYS;
-		scno = syscall_trace_enter(regs);
-		if (scno == NO_SYSCALL)
-			goto trace_exit;
-	}
+	scno = syscall_enter_from_usermode(regs, scno);
+	if (scno == NO_SYSCALL)
+		goto trace_exit;
 
 	invoke_syscall(regs, scno, sc_nr, syscall_table);
 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 05/15] entry: Provide generic syscall exit function
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (3 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 04/15] arm64/entry: " Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-19 15:03 ` [RFC patch 06/15] x86/entry: Use generic syscall exit functionality Thomas Gleixner
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Similar to syscall entry all architectures have similar and pointlessly
different code to handle pending work before returning from a syscall to
user space.

Provide a generic version.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/entry-common.h |   31 ++++++++++++++++++++++++
 kernel/entry/common.c        |   55 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+)

--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -45,6 +45,17 @@
 	 _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU |			\
 	 ARCH_SYSCALL_ENTER_WORK)
 
+/*
+ * TIF flags handled in syscall_exit_to_usermode()
+ */
+#ifndef ARCH_SYSCALL_EXIT_WORK
+# define ARCH_SYSCALL_EXIT_WORK		(0)
+#endif
+
+#define SYSCALL_EXIT_WORK						\
+	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT |			\
+	 _TIF_SYSCALL_TRACEPOINT | ARCH_SYSCALL_EXIT_WORK)
+
 /**
  * arch_syscall_enter_tracehook - Wrapper around tracehook_report_syscall_entry()
  *
@@ -118,4 +129,24 @@ static inline long syscall_enter_from_us
 	return syscall;
 }
 
+/**
+ * arch_syscall_exit_tracehook - Wrapper around tracehook_report_syscall_exit()
+ *
+ * Defaults to tracehook_report_syscall_exit(). Can be replaced by
+ * architecture specific code.
+ *
+ * Invoked from syscall_exit_to_usermode()
+ */
+static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step);
+
+#ifndef arch_syscall_exit_tracehook
+static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step)
+{
+	tracehook_report_syscall_exit(regs, step);
+}
+#endif
+
+/* Common syscall exit function */
+void syscall_exit_to_usermode(struct pt_regs *regs, long syscall, long retval);
+
 #endif
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -31,3 +31,58 @@ long core_syscall_enter_from_usermode(st
 
 	return ret ? : syscall;
 }
+
+#ifndef _TIF_SINGLESTEP
+static inline bool report_single_step(unsigned long ti_work)
+{
+	return false;
+}
+#else
+/*
+ * If TIF_SYSCALL_EMU is set, then the only reason to report is when
+ * TIF_SINGLESTEP is set (i.e. PTRACE_SYSEMU_SINGLESTEP).  This syscall
+ * instruction has been already reported in syscall_enter_from_usermode().
+ */
+#define SYSEMU_STEP	(_TIF_SINGLESTEP | _TIF_SYSCALL_EMU)
+
+static inline bool report_single_step(unsigned long ti_work)
+{
+	return (ti_work & SYSEMU_STEP) == _TIF_SINGLESTEP;
+}
+#endif
+
+static void syscall_exit_work(struct pt_regs *regs, long retval,
+			      unsigned long ti_work)
+{
+	bool step;
+
+	audit_syscall_exit(regs);
+
+	if (ti_work & _TIF_SYSCALL_TRACEPOINT)
+		trace_sys_exit(regs, retval);
+
+	step = report_single_step(ti_work);
+	if (step || ti_work & _TIF_SYSCALL_TRACE)
+		arch_syscall_exit_tracehook(regs, step);
+}
+
+void syscall_exit_to_usermode(struct pt_regs *regs, long syscall, long retval)
+{
+	unsigned long ti_work;
+
+	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+
+	if (IS_ENABLED(CONFIG_PROVE_LOCKING) &&
+	    WARN(irqs_disabled(), "syscall %ld left IRQs disabled", syscall))
+		local_irq_enable();
+
+	rseq_syscall(regs);
+
+	/*
+	 * Handle work which needs to run exactly once per syscall exit
+	 * with interrupts enabled.
+	 */
+	ti_work = READ_ONCE(current_thread_info()->flags);
+	if (unlikely(ti_work & SYSCALL_EXIT_WORK))
+		syscall_exit_work(regs, retval, ti_work);
+}



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 06/15] x86/entry: Use generic syscall exit functionality
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (4 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 05/15] entry: Provide generic syscall exit function Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-19 15:03 ` [RFC patch 07/15] arm64/syscall: Remove obscure flag check Thomas Gleixner
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Replace the x86 variant with the generic version.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/common.c             |   44 ------------------------------------
 arch/x86/include/asm/entry-common.h |    2 +
 2 files changed, 3 insertions(+), 43 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -139,55 +139,13 @@ static void exit_to_usermode_loop(struct
 	mds_user_clear_cpu_buffers();
 }
 
-#define SYSCALL_EXIT_WORK_FLAGS				\
-	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT |	\
-	 _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT)
-
-static void syscall_slow_exit_work(struct pt_regs *regs, u32 cached_flags)
-{
-	bool step;
-
-	audit_syscall_exit(regs);
-
-	if (cached_flags & _TIF_SYSCALL_TRACEPOINT)
-		trace_sys_exit(regs, regs->ax);
-
-	/*
-	 * If TIF_SYSCALL_EMU is set, we only get here because of
-	 * TIF_SINGLESTEP (i.e. this is PTRACE_SYSEMU_SINGLESTEP).
-	 * We already reported this syscall instruction in
-	 * syscall_trace_enter().
-	 */
-	step = unlikely(
-		(cached_flags & (_TIF_SINGLESTEP | _TIF_SYSCALL_EMU))
-		== _TIF_SINGLESTEP);
-	if (step || cached_flags & _TIF_SYSCALL_TRACE)
-		tracehook_report_syscall_exit(regs, step);
-}
-
 /*
  * Called with IRQs on and fully valid regs.  Returns with IRQs off in a
  * state such that we can immediately switch to user mode.
  */
 __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 {
-	struct thread_info *ti = current_thread_info();
-	u32 cached_flags = READ_ONCE(ti->flags);
-
-	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
-
-	if (IS_ENABLED(CONFIG_PROVE_LOCKING) &&
-	    WARN(irqs_disabled(), "syscall %ld left IRQs disabled", regs->orig_ax))
-		local_irq_enable();
-
-	rseq_syscall(regs);
-
-	/*
-	 * First do one-time work.  If these work items are enabled, we
-	 * want to run them exactly once per syscall exit with IRQs on.
-	 */
-	if (unlikely(cached_flags & SYSCALL_EXIT_WORK_FLAGS))
-		syscall_slow_exit_work(regs, cached_flags);
+	syscall_exit_to_usermode(regs, regs->orig_ax, regs->ax);
 
 	local_irq_disable();
 	prepare_exit_to_usermode(regs);
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -5,6 +5,8 @@
 #include <linux/seccomp.h>
 #include <linux/audit.h>
 
+#define ARCH_SYSCALL_EXIT_WORK		(_TIF_SINGLESTEP)
+
 static inline long arch_syscall_enter_seccomp(struct pt_regs *regs)
 {
 #ifdef CONFIG_SECCOMP



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 07/15] arm64/syscall: Remove obscure flag check
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (5 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 06/15] x86/entry: Use generic syscall exit functionality Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-20 14:29   ` Catalin Marinas
  2019-09-19 15:03 ` [RFC patch 08/15] arm64/syscall: Use generic syscall exit functionality Thomas Gleixner
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

The syscall handling code has an obscure check of pending work which does a
shortcut before returning to user space. It calls into the exit work code
when the flags at entry time required an entry into the slowpath. That does
not make sense because the underlying work functionality will reevaluate
the flags anyway and not do anything.

Replace that by a straight forward test for work flags. Preparatory change
for switching to the generic syscall exit work handling code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/arm64/kernel/syscall.c |   32 ++++++--------------------------
 1 file changed, 6 insertions(+), 26 deletions(-)

--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -93,8 +93,6 @@ static void cortex_a76_erratum_1463225_s
 static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 			   const syscall_fn_t syscall_table[])
 {
-	unsigned long flags = current_thread_info()->flags;
-
 	regs->orig_x0 = regs->regs[0];
 	regs->syscallno = scno;
 	/* Set default error number */
@@ -105,33 +103,15 @@ static void el0_svc_common(struct pt_reg
 	user_exit();
 
 	scno = syscall_enter_from_usermode(regs, scno);
-	if (scno == NO_SYSCALL)
-		goto trace_exit;
-
-	invoke_syscall(regs, scno, sc_nr, syscall_table);
+	if (scno != NO_SYSCALL)
+		invoke_syscall(regs, scno, sc_nr, syscall_table);
 
-	/*
-	 * The tracing status may have changed under our feet, so we have to
-	 * check again. However, if we were tracing entry, then we always trace
-	 * exit regardless, as the old entry assembly did.
-	 */
-	if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
-		local_daif_mask();
-		flags = current_thread_info()->flags;
-		if (!has_syscall_work(flags)) {
-			/*
-			 * We're off to userspace, where interrupts are
-			 * always enabled after we restore the flags from
-			 * the SPSR.
-			 */
-			trace_hardirqs_on();
-			return;
-		}
+	local_daif_mask();
+	if (has_syscall_work(current_thread_info()->flags) ||
+	    IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
 		local_daif_restore(DAIF_PROCCTX);
+		syscall_trace_exit(regs);
 	}
-
-trace_exit:
-	syscall_trace_exit(regs);
 }
 
 static inline void sve_user_discard(void)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 08/15] arm64/syscall: Use generic syscall exit functionality
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (6 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 07/15] arm64/syscall: Remove obscure flag check Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-19 15:03 ` [RFC patch 09/15] entry: Provide generic exit to usermode functionality Thomas Gleixner
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Replace the syscall exit handling code with the generic version. That gets
rid of the interrupts disabled check for work flags. That's a non-issue
because the flags are checked with READ_ONCE() in the core and not
reevaluated. That's perfectly fine because the interrupts disabled check is
also just a snapshot.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/arm64/include/asm/entry-common.h |   11 +++++++++
 arch/arm64/kernel/ptrace.c            |   38 ----------------------------------
 arch/arm64/kernel/syscall.c           |   14 ------------
 3 files changed, 12 insertions(+), 51 deletions(-)

--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -29,6 +29,17 @@ static inline __must_check int arch_sysc
 }
 #define arch_syscall_enter_tracehook arch_syscall_enter_tracehook
 
+static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step)
+{
+	int regno = (is_compat_task() ? 12 : 7);
+	unsigned long reg = regs->regs[regno];
+
+	regs->regs[regno] = PTRACE_SYSCALL_EXIT;
+	tracehook_report_syscall_exit(regs, step);
+	regs->regs[regno] = reg;
+}
+#define arch_syscall_exit_tracehook arch_syscall_exit_tracehook
+
 static inline void arch_syscall_enter_audit(struct pt_regs *regs)
 {
 	audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -13,13 +13,11 @@
 #include <linux/kernel.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/task_stack.h>
-#include <linux/entry-common.h>
 #include <linux/mm.h>
 #include <linux/nospec.h>
 #include <linux/smp.h>
 #include <linux/ptrace.h>
 #include <linux/user.h>
-#include <linux/seccomp.h>
 #include <linux/security.h>
 #include <linux/init.h>
 #include <linux/signal.h>
@@ -28,7 +26,6 @@
 #include <linux/perf_event.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/regset.h>
-#include <linux/tracehook.h>
 #include <linux/elf.h>
 
 #include <asm/compat.h>
@@ -1779,41 +1776,6 @@ long arch_ptrace(struct task_struct *chi
 	return ptrace_request(child, request, addr, data);
 }
 
-static void tracehook_report_syscall(struct pt_regs *regs,
-				     enum ptrace_syscall_dir dir)
-{
-	int regno;
-	unsigned long saved_reg;
-
-	/*
-	 * A scratch register (ip(r12) on AArch32, x7 on AArch64) is
-	 * used to denote syscall entry/exit:
-	 */
-	regno = (is_compat_task() ? 12 : 7);
-	saved_reg = regs->regs[regno];
-	regs->regs[regno] = dir;
-
-	if (dir == PTRACE_SYSCALL_EXIT)
-		tracehook_report_syscall_exit(regs, 0);
-	else if (tracehook_report_syscall_entry(regs))
-		forget_syscall(regs);
-
-	regs->regs[regno] = saved_reg;
-}
-
-void syscall_trace_exit(struct pt_regs *regs)
-{
-	audit_syscall_exit(regs);
-
-	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
-		trace_sys_exit(regs, regs_return_value(regs));
-
-	if (test_thread_flag(TIF_SYSCALL_TRACE))
-		tracehook_report_syscall(regs, PTRACE_SYSCALL_EXIT);
-
-	rseq_syscall(regs);
-}
-
 /*
  * SPSR_ELx bits which are always architecturally RES0 per ARM DDI 0487D.a.
  * We permit userspace to set SSBS (AArch64 bit 12, AArch32 bit 23) which is
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -54,13 +54,6 @@ static void invoke_syscall(struct pt_reg
 	regs->regs[0] = ret;
 }
 
-static inline bool has_syscall_work(unsigned long flags)
-{
-	return unlikely(flags & _TIF_SYSCALL_WORK);
-}
-
-void syscall_trace_exit(struct pt_regs *regs);
-
 #ifdef CONFIG_ARM64_ERRATUM_1463225
 DECLARE_PER_CPU(int, __in_cortex_a76_erratum_1463225_wa);
 
@@ -106,12 +99,7 @@ static void el0_svc_common(struct pt_reg
 	if (scno != NO_SYSCALL)
 		invoke_syscall(regs, scno, sc_nr, syscall_table);
 
-	local_daif_mask();
-	if (has_syscall_work(current_thread_info()->flags) ||
-	    IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
-		local_daif_restore(DAIF_PROCCTX);
-		syscall_trace_exit(regs);
-	}
+	syscall_exit_to_usermode(regs, scno, regs_return_value(regs));
 }
 
 static inline void sve_user_discard(void)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 09/15] entry: Provide generic exit to usermode functionality
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (7 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 08/15] arm64/syscall: Use generic syscall exit functionality Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-23  8:30   ` Peter Zijlstra
  2019-09-19 15:03 ` [RFC patch 10/15] x86/entry: Move irq tracing to C code Thomas Gleixner
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Provide a generic facility to handle the exit to usermode work. That's
aimed to replace the pointlessly different copies in each architecture.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/entry-common.h |  105 +++++++++++++++++++++++++++++++++++++++++++
 kernel/entry/common.c        |   88 ++++++++++++++++++++++++++++++++++++
 2 files changed, 193 insertions(+)

--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -34,6 +34,30 @@
 # define _TIF_AUDIT			(0)
 #endif
 
+#ifndef _TIF_UPROBE
+# define _TIF_UPROBE			(0)
+#endif
+
+#ifndef _TIF_PATCH_PENDING
+# define _TIF_PATCH_PENDING		(0)
+#endif
+
+#ifndef _TIF_NOTIFY_RESUME
+# define _TIF_NOTIFY_RESUME		(0)
+#endif
+
+/*
+ * TIF flags handled in exit_to_usermode()
+ */
+#ifndef ARCH_EXIT_TO_USERMODE_WORK
+# define ARCH_EXIT_TO_USERMODE_WORK	(0)
+#endif
+
+#define EXIT_TO_USERMODE_WORK						\
+	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |		\
+	 _TIF_NEED_RESCHED | _TIF_PATCH_PENDING |			\
+	 ARCH_EXIT_TO_USERMODE_WORK)
+
 /*
  * TIF flags handled in syscall_enter_from_usermode()
  */
@@ -58,6 +82,87 @@
 	 _TIF_SYSCALL_TRACEPOINT | ARCH_SYSCALL_EXIT_WORK)
 
 /**
+ * local_irq_enable_exit_to_user - Exit to user variant of local_irq_enable()
+ * @ti_work:	Cached TIF flags gathered with interrupts disabled
+ *
+ * Defaults to local_irq_enable(). Can be supplied by architecture specific
+ * code.
+ */
+static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
+
+#ifndef local_irq_enable_exit_to_user
+static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
+{
+	local_irq_enable();
+}
+#endif
+
+/**
+ * local_irq_disable_exit_to_user - Exit to user variant of local_irq_disable()
+ *
+ * Defaults to local_irq_disable(). Can be supplied by architecture specific
+ * code.
+ */
+static inline void local_irq_disable_exit_to_user(void);
+
+#ifndef local_irq_disable_exit_to_user
+static inline void local_irq_disable_exit_to_user(void)
+{
+	local_irq_disable();
+}
+#endif
+
+/**
+ * arch_exit_to_usermode_work - Architecture specific TIF work for
+ *				exit to user mode.
+ * @regs:	Pointer to currents pt_regs
+ * @ti_work:	Cached TIF flags gathered with interrupts disabled
+ *
+ * Invoked from exit_to_usermode() with interrupt disabled
+ *
+ * Defaults to NOOP. Can be supplied by architecture specific code.
+ */
+static inline void arch_exit_to_usermode_work(struct pt_regs *regs,
+					      unsigned long ti_work);
+
+#ifndef arch_exit_to_usermode_work
+static inline void arch_exit_to_usermode_work(struct pt_regs *regs,
+					      unsigned long ti_work)
+{
+}
+#endif
+
+/**
+ * arch_exit_to_usermode - Architecture specific preparation for
+ *			   exit to user mode.
+ * @regs:	Pointer to currents pt_regs
+ * @ti_work:	Cached TIF flags gathered with interrupts disabled
+ *
+ * Invoked from exit_to_usermode() with interrupt disabled as the last
+ * function before return.
+ */
+static inline void arch_exit_to_usermode(struct pt_regs *regs,
+					 unsigned long ti_work);
+
+#ifndef arch_exit_to_usermode
+static inline void arch_exit_to_usermode(struct pt_regs *regs,
+					 unsigned long ti_work)
+{
+}
+#endif
+
+/* Common exit to usermode function to handle TIF work */
+asmlinkage __visible void exit_to_usermode(struct pt_regs *regs);
+
+/**
+ * arch_do_signal -  Architecture specific signal delivery function
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Invoked from exit_to_usermode()
+ */
+void arch_do_signal(struct pt_regs *regs);
+
+/**
  * arch_syscall_enter_tracehook - Wrapper around tracehook_report_syscall_entry()
  *
  * Defaults to tracehook_report_syscall_entry(). Can be replaced by
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -2,10 +2,90 @@
 
 #include <linux/context_tracking.h>
 #include <linux/entry-common.h>
+#include <linux/livepatch.h>
+#include <linux/uprobes.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
 
+static unsigned long core_exit_to_usermode_work(struct pt_regs *regs,
+						unsigned long ti_work)
+{
+	/*
+	 * Before returning to user space ensure that all pending work
+	 * items have been completed.
+	 */
+	while (ti_work & EXIT_TO_USERMODE_WORK) {
+
+		local_irq_enable_exit_to_user(ti_work);
+
+		if (ti_work & _TIF_NEED_RESCHED)
+			schedule();
+
+		if (ti_work & _TIF_UPROBE)
+			uprobe_notify_resume(regs);
+
+		if (ti_work & _TIF_PATCH_PENDING)
+			klp_update_patch_state(current);
+
+		if (ti_work & _TIF_SIGPENDING)
+			arch_do_signal(regs);
+
+		if (ti_work & _TIF_NOTIFY_RESUME) {
+			clear_thread_flag(TIF_NOTIFY_RESUME);
+			tracehook_notify_resume(regs);
+			rseq_handle_notify_resume(NULL, regs);
+		}
+
+		/* Architecture specific TIF work */
+		arch_exit_to_usermode_work(regs, ti_work);
+
+		/*
+		 * Disable interrupts and reevaluate the work flags as they
+		 * might have changed while interrupts and preemption was
+		 * enabled above.
+		 */
+		local_irq_disable_exit_to_user();
+		ti_work = READ_ONCE(current_thread_info()->flags);
+	}
+	/*
+	 * Was checked in exit_to_usermode_work() already, but the above
+	 * loop might have wreckaged it.
+	 */
+	addr_limit_user_check();
+	return ti_work;
+}
+
+static void do_exit_to_usermode(struct pt_regs *regs)
+{
+	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+
+	lockdep_sys_exit();
+
+	addr_limit_user_check();
+
+	if (unlikely(ti_work & EXIT_TO_USERMODE_WORK))
+		ti_work = core_exit_to_usermode_work(regs, ti_work);
+
+	arch_exit_to_usermode(regs, ti_work);
+	/* Return to userspace right after this which turns on interrupts */
+	trace_hardirqs_on();
+}
+
+/**
+ * exit_to_usermode - Check and handle pending work which needs to be
+ *		      handled before returning to user mode
+ * @regs:	Pointer to currents pt_regs
+ *
+ * Called and returns with interrupts disabled
+ */
+asmlinkage __visible void exit_to_usermode(struct pt_regs *regs)
+{
+	trace_hardirqs_off();
+	lockdep_assert_irqs_disabled();
+	do_exit_to_usermode(regs);
+}
+
 long core_syscall_enter_from_usermode(struct pt_regs *regs, long syscall)
 {
 	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
@@ -85,4 +165,12 @@ void syscall_exit_to_usermode(struct pt_
 	ti_work = READ_ONCE(current_thread_info()->flags);
 	if (unlikely(ti_work & SYSCALL_EXIT_WORK))
 		syscall_exit_work(regs, retval, ti_work);
+
+#ifdef ARCH_EXIT_TO_USER_FROM_SYSCALL_EXIT
+	/*
+	 * Disable interrupts and handle the regular exit to user mode work
+	 */
+	local_irq_disable_exit_to_user();
+	do_exit_to_usermode(regs);
+#endif
 }



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 10/15] x86/entry: Move irq tracing to C code
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (8 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 09/15] entry: Provide generic exit to usermode functionality Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-23  8:47   ` Peter Zijlstra
  2019-09-26  2:59   ` Josh Poimboeuf
  2019-09-19 15:03 ` [RFC patch 11/15] x86/entry: Use generic exit to usermode Thomas Gleixner
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

To prepare for converting the exit to usermode code to the generic version,
move the irqflags tracing into C code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/common.c          |   10 ++++++++++
 arch/x86/entry/entry_32.S        |   11 +----------
 arch/x86/entry/entry_64.S        |   10 ++--------
 arch/x86/entry/entry_64_compat.S |   21 ---------------------
 4 files changed, 13 insertions(+), 39 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -102,6 +102,8 @@ static void exit_to_usermode_loop(struct
 	struct thread_info *ti = current_thread_info();
 	u32 cached_flags;
 
+	trace_hardirqs_off();
+
 	addr_limit_user_check();
 
 	lockdep_assert_irqs_disabled();
@@ -137,6 +139,8 @@ static void exit_to_usermode_loop(struct
 	user_enter_irqoff();
 
 	mds_user_clear_cpu_buffers();
+
+	trace_hardirqs_on();
 }
 
 /*
@@ -154,6 +158,8 @@ static void exit_to_usermode_loop(struct
 #ifdef CONFIG_X86_64
 __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 {
+	/* User to kernel transition disabled interrupts. */
+	trace_hardirqs_off();
 	enter_from_user_mode();
 	local_irq_enable();
 
@@ -221,6 +227,7 @@ static __always_inline void do_syscall_3
 /* Handles int $0x80 */
 __visible void do_int80_syscall_32(struct pt_regs *regs)
 {
+	trace_hardirqs_off();
 	enter_from_user_mode();
 	local_irq_enable();
 	do_syscall_32_irqs_on(regs);
@@ -237,6 +244,9 @@ static __always_inline void do_syscall_3
 	unsigned long landing_pad = (unsigned long)current->mm->context.vdso +
 		vdso_image_32.sym_int80_landing_pad;
 
+	/* User to kernel transition disabled interrupts. */
+	trace_hardirqs_off();
+
 	/*
 	 * SYSENTER loses EIP, and even SYSCALL32 needs us to skip forward
 	 * so that 'regs->ip -= 2' lands back on an int $0x80 instruction.
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -827,7 +827,6 @@ END(ret_from_fork)
 
 ENTRY(resume_userspace)
 	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
 	movl	%esp, %eax
 	call	prepare_exit_to_usermode
 	jmp	restore_all
@@ -1049,12 +1048,6 @@ ENTRY(entry_INT80_32)
 
 	SAVE_ALL pt_regs_ax=$-ENOSYS switch_stacks=1	/* save rest */
 
-	/*
-	 * User mode is traced as though IRQs are on, and the interrupt gate
-	 * turned them off.
-	 */
-	TRACE_IRQS_OFF
-
 	movl	%esp, %eax
 	call	do_int80_syscall_32
 .Lsyscall_32_done:
@@ -1062,11 +1055,8 @@ ENTRY(entry_INT80_32)
 	STACKLEAK_ERASE
 
 restore_all:
-	TRACE_IRQS_IRET
 	SWITCH_TO_ENTRY_STACK
-.Lrestore_all_notrace:
 	CHECK_AND_APPLY_ESPFIX
-.Lrestore_nocheck:
 	/* Switch back to user CR3 */
 	SWITCH_TO_USER_CR3 scratch_reg=%eax
 
@@ -1086,6 +1076,7 @@ ENTRY(entry_INT80_32)
 restore_all_kernel:
 #ifdef CONFIG_PREEMPTION
 	DISABLE_INTERRUPTS(CLBR_ANY)
+	TRACE_IRQS_OFF
 	cmpl	$0, PER_CPU_VAR(__preempt_count)
 	jnz	.Lno_preempt
 	testl	$X86_EFLAGS_IF, PT_EFLAGS(%esp)	# interrupts off (exception path) ?
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -167,15 +167,11 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 
 	PUSH_AND_CLEAR_REGS rax=$-ENOSYS
 
-	TRACE_IRQS_OFF
-
 	/* IRQs are off. */
 	movq	%rax, %rdi
 	movq	%rsp, %rsi
 	call	do_syscall_64		/* returns with IRQs disabled */
 
-	TRACE_IRQS_IRETQ		/* we're about to change IF */
-
 	/*
 	 * Try to use SYSRET instead of IRET if we're returning to
 	 * a completely clean 64-bit userspace context.  If we're not,
@@ -342,7 +338,6 @@ ENTRY(ret_from_fork)
 	UNWIND_HINT_REGS
 	movq	%rsp, %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
-	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
 	jmp	swapgs_restore_regs_and_return_to_usermode
 
 1:
@@ -608,7 +603,6 @@ END(common_spurious)
 	/* 0(%rsp): old RSP */
 ret_from_intr:
 	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
 
 	LEAVE_IRQ_STACK
 
@@ -619,7 +613,6 @@ END(common_spurious)
 GLOBAL(retint_user)
 	mov	%rsp,%rdi
 	call	prepare_exit_to_usermode
-	TRACE_IRQS_IRETQ
 
 GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 #ifdef CONFIG_DEBUG_ENTRY
@@ -666,6 +659,7 @@ GLOBAL(swapgs_restore_regs_and_return_to
 retint_kernel:
 #ifdef CONFIG_PREEMPTION
 	/* Interrupts are off */
+	TRACE_IRQS_OFF
 	/* Check if we need preemption */
 	btl	$9, EFLAGS(%rsp)		/* were interrupts off? */
 	jnc	1f
@@ -1367,9 +1361,9 @@ ENTRY(error_entry)
 END(error_entry)
 
 ENTRY(error_exit)
-	UNWIND_HINT_REGS
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF
+	UNWIND_HINT_REGS
 	testb	$3, CS(%rsp)
 	jz	retint_kernel
 	jmp	retint_user
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -129,12 +129,6 @@ ENTRY(entry_SYSENTER_compat)
 	jnz	.Lsysenter_fix_flags
 .Lsysenter_flags_fixed:
 
-	/*
-	 * User mode is traced as though IRQs are on, and SYSENTER
-	 * turned them off.
-	 */
-	TRACE_IRQS_OFF
-
 	movq	%rsp, %rdi
 	call	do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
@@ -247,12 +241,6 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram
 	pushq   $0			/* pt_regs->r15 = 0 */
 	xorl	%r15d, %r15d		/* nospec   r15 */
 
-	/*
-	 * User mode is traced as though IRQs are on, and SYSENTER
-	 * turned them off.
-	 */
-	TRACE_IRQS_OFF
-
 	movq	%rsp, %rdi
 	call	do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
@@ -266,7 +254,6 @@ GLOBAL(entry_SYSCALL_compat_after_hwfram
 	 * stack. So let's erase the thread stack right now.
 	 */
 	STACKLEAK_ERASE
-	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
 	movq	RBX(%rsp), %rbx		/* pt_regs->rbx */
 	movq	RBP(%rsp), %rbp		/* pt_regs->rbp */
 	movq	EFLAGS(%rsp), %r11	/* pt_regs->flags (in r11) */
@@ -403,17 +390,9 @@ ENTRY(entry_INT80_compat)
 	xorl	%r15d, %r15d		/* nospec   r15 */
 	cld
 
-	/*
-	 * User mode is traced as though IRQs are on, and the interrupt
-	 * gate turned them off.
-	 */
-	TRACE_IRQS_OFF
-
 	movq	%rsp, %rdi
 	call	do_int80_syscall_32
 .Lsyscall_32_done:
 
-	/* Go back to user mode. */
-	TRACE_IRQS_ON
 	jmp	swapgs_restore_regs_and_return_to_usermode
 END(entry_INT80_compat)



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 11/15] x86/entry: Use generic exit to usermode
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (9 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 10/15] x86/entry: Move irq tracing to C code Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-19 15:03 ` [RFC patch 12/15] arm64/entry: " Thomas Gleixner
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Replace the x86 specific exit to usermode code with the generic
implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/entry/common.c             |  111 ------------------------------------
 arch/x86/entry/entry_32.S           |    2 
 arch/x86/entry/entry_64.S           |    2 
 arch/x86/include/asm/entry-common.h |   47 ++++++++++++++-
 arch/x86/include/asm/signal.h       |    1 
 arch/x86/kernel/signal.c            |    2 
 6 files changed, 51 insertions(+), 114 deletions(-)

--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -15,15 +15,9 @@
 #include <linux/smp.h>
 #include <linux/errno.h>
 #include <linux/ptrace.h>
-#include <linux/tracehook.h>
-#include <linux/audit.h>
 #include <linux/signal.h>
 #include <linux/export.h>
-#include <linux/context_tracking.h>
-#include <linux/user-return-notifier.h>
 #include <linux/nospec.h>
-#include <linux/uprobes.h>
-#include <linux/livepatch.h>
 #include <linux/syscalls.h>
 #include <linux/uaccess.h>
 
@@ -47,102 +41,6 @@
 static inline void enter_from_user_mode(void) {}
 #endif
 
-#define EXIT_TO_USERMODE_LOOP_FLAGS				\
-	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |	\
-	 _TIF_NEED_RESCHED | _TIF_USER_RETURN_NOTIFY | _TIF_PATCH_PENDING)
-
-static void exit_to_usermode_loop(struct pt_regs *regs, u32 cached_flags)
-{
-	/*
-	 * In order to return to user mode, we need to have IRQs off with
-	 * none of EXIT_TO_USERMODE_LOOP_FLAGS set.  Several of these flags
-	 * can be set at any time on preemptible kernels if we have IRQs on,
-	 * so we need to loop.  Disabling preemption wouldn't help: doing the
-	 * work to clear some of the flags can sleep.
-	 */
-	while (true) {
-		/* We have work to do. */
-		local_irq_enable();
-
-		if (cached_flags & _TIF_NEED_RESCHED)
-			schedule();
-
-		if (cached_flags & _TIF_UPROBE)
-			uprobe_notify_resume(regs);
-
-		if (cached_flags & _TIF_PATCH_PENDING)
-			klp_update_patch_state(current);
-
-		/* deal with pending signal delivery */
-		if (cached_flags & _TIF_SIGPENDING)
-			do_signal(regs);
-
-		if (cached_flags & _TIF_NOTIFY_RESUME) {
-			clear_thread_flag(TIF_NOTIFY_RESUME);
-			tracehook_notify_resume(regs);
-			rseq_handle_notify_resume(NULL, regs);
-		}
-
-		if (cached_flags & _TIF_USER_RETURN_NOTIFY)
-			fire_user_return_notifiers();
-
-		/* Disable IRQs and retry */
-		local_irq_disable();
-
-		cached_flags = READ_ONCE(current_thread_info()->flags);
-
-		if (!(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
-			break;
-	}
-}
-
-/* Called with IRQs disabled. */
-__visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
-{
-	struct thread_info *ti = current_thread_info();
-	u32 cached_flags;
-
-	trace_hardirqs_off();
-
-	addr_limit_user_check();
-
-	lockdep_assert_irqs_disabled();
-	lockdep_sys_exit();
-
-	cached_flags = READ_ONCE(ti->flags);
-
-	if (unlikely(cached_flags & EXIT_TO_USERMODE_LOOP_FLAGS))
-		exit_to_usermode_loop(regs, cached_flags);
-
-	/* Reload ti->flags; we may have rescheduled above. */
-	cached_flags = READ_ONCE(ti->flags);
-
-	fpregs_assert_state_consistent();
-	if (unlikely(cached_flags & _TIF_NEED_FPU_LOAD))
-		switch_fpu_return();
-
-#ifdef CONFIG_COMPAT
-	/*
-	 * Compat syscalls set TS_COMPAT.  Make sure we clear it before
-	 * returning to user mode.  We need to clear it *after* signal
-	 * handling, because syscall restart has a fixup for compat
-	 * syscalls.  The fixup is exercised by the ptrace_syscall_32
-	 * selftest.
-	 *
-	 * We also need to clear TS_REGS_POKED_I386: the 32-bit tracer
-	 * special case only applies after poking regs and before the
-	 * very next return to user mode.
-	 */
-	ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
-#endif
-
-	user_enter_irqoff();
-
-	mds_user_clear_cpu_buffers();
-
-	trace_hardirqs_on();
-}
-
 /*
  * Called with IRQs on and fully valid regs.  Returns with IRQs off in a
  * state such that we can immediately switch to user mode.
@@ -150,9 +48,6 @@ static void exit_to_usermode_loop(struct
 __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 {
 	syscall_exit_to_usermode(regs, regs->orig_ax, regs->ax);
-
-	local_irq_disable();
-	prepare_exit_to_usermode(regs);
 }
 
 #ifdef CONFIG_X86_64
@@ -177,7 +72,7 @@ static void exit_to_usermode_loop(struct
 #endif
 	}
 
-	syscall_return_slowpath(regs);
+	syscall_exit_to_usermode(regs, regs->orig_ax, regs->ax);
 }
 #endif
 
@@ -221,7 +116,7 @@ static __always_inline void do_syscall_3
 #endif /* CONFIG_IA32_EMULATION */
 	}
 
-	syscall_return_slowpath(regs);
+	syscall_exit_to_usermode(regs, regs->orig_ax, regs->ax);
 }
 
 /* Handles int $0x80 */
@@ -276,7 +171,7 @@ static __always_inline void do_syscall_3
 		/* User code screwed up. */
 		local_irq_disable();
 		regs->ax = -EFAULT;
-		prepare_exit_to_usermode(regs);
+		exit_to_usermode(regs);
 		return 0;	/* Keep it simple: use IRET. */
 	}
 
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -828,7 +828,7 @@ END(ret_from_fork)
 ENTRY(resume_userspace)
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	movl	%esp, %eax
-	call	prepare_exit_to_usermode
+	call	exit_to_usermode
 	jmp	restore_all
 END(ret_from_exception)
 
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -612,7 +612,7 @@ END(common_spurious)
 	/* Interrupt came from user space */
 GLOBAL(retint_user)
 	mov	%rsp,%rdi
-	call	prepare_exit_to_usermode
+	call	exit_to_usermode
 
 GLOBAL(swapgs_restore_regs_and_return_to_usermode)
 #ifdef CONFIG_DEBUG_ENTRY
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -2,11 +2,54 @@
 #ifndef _ASM_X86_ENTRY_COMMON_H
 #define _ASM_X86_ENTRY_COMMON_H
 
-#include <linux/seccomp.h>
-#include <linux/audit.h>
+#include <linux/user-return-notifier.h>
+#include <linux/context_tracking.h>
+
+#include <asm/nospec-branch.h>
+#include <asm/fpu/api.h>
 
 #define ARCH_SYSCALL_EXIT_WORK		(_TIF_SINGLESTEP)
 
+#define ARCH_EXIT_TO_USERMODE_WORK	(_TIF_USER_RETURN_NOTIFY)
+
+#define ARCH_EXIT_TO_USER_FROM_SYSCALL_EXIT
+
+static inline void arch_exit_to_usermode_work(struct pt_regs *regs,
+					      unsigned long ti_work)
+{
+	if (ti_work & _TIF_USER_RETURN_NOTIFY)
+		fire_user_return_notifiers();
+}
+#define arch_exit_to_usermode_work arch_exit_to_usermode_work
+
+static inline void arch_exit_to_usermode(struct pt_regs *regs,
+					 unsigned long ti_work)
+{
+	fpregs_assert_state_consistent();
+	if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
+		switch_fpu_return();
+
+#ifdef CONFIG_COMPAT
+	/*
+	 * Compat syscalls set TS_COMPAT.  Make sure we clear it before
+	 * returning to user mode.  We need to clear it *after* signal
+	 * handling, because syscall restart has a fixup for compat
+	 * syscalls.  The fixup is exercised by the ptrace_syscall_32
+	 * selftest.
+	 *
+	 * We also need to clear TS_REGS_POKED_I386: the 32-bit tracer
+	 * special case only applies after poking regs and before the
+	 * very next return to user mode.
+	 */
+	current_thread_info()->status &= ~(TS_COMPAT | TS_I386_REGS_POKED);
+#endif
+
+	user_enter_irqoff();
+
+	mds_user_clear_cpu_buffers();
+}
+#define arch_exit_to_usermode arch_exit_to_usermode
+
 static inline long arch_syscall_enter_seccomp(struct pt_regs *regs)
 {
 #ifdef CONFIG_SECCOMP
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -35,7 +35,6 @@ typedef sigset_t compat_sigset_t;
 #endif /* __ASSEMBLY__ */
 #include <uapi/asm/signal.h>
 #ifndef __ASSEMBLY__
-extern void do_signal(struct pt_regs *regs);
 
 #define __ARCH_HAS_SA_RESTORER
 
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -808,7 +808,7 @@ static inline unsigned long get_nr_resta
  * want to handle. Thus you cannot kill init even with a SIGKILL even by
  * mistake.
  */
-void do_signal(struct pt_regs *regs)
+void arch_do_signal(struct pt_regs *regs)
 {
 	struct ksignal ksig;
 



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 12/15] arm64/entry: Use generic exit to usermode
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (10 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 11/15] x86/entry: Use generic exit to usermode Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-19 15:03 ` [RFC patch 13/15] arm64/entry: Move FPU restore out of exit_to_usermode() loop Thomas Gleixner
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Replace the exit to usermode code with the generic version.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/arm64/include/asm/entry-common.h |   29 +++++++++++++++++++++
 arch/arm64/kernel/entry.S             |   18 ++-----------
 arch/arm64/kernel/signal.c            |   45 ----------------------------------
 3 files changed, 33 insertions(+), 59 deletions(-)

--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -5,6 +5,35 @@
 #ifndef __ASM_ENTRY_COMMON_H
 #define __ASM_ENTRY_COMMON_H
 
+#include <asm/daifflags.h>
+
+#define ARCH_EXIT_TO_USERMODE_WORK	(_TIF_FOREIGN_FPSTATE)
+
+static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
+{
+	if (ti_work & _TIF_NEED_RESCHED)
+		local_daif_restore(DAIF_PROCCTX_NOIRQ);
+	else
+		local_daif_restore(DAIF_PROCCTX);
+}
+#define local_irq_enable_exit_to_user local_irq_enable_exit_to_user
+
+static inline void local_irq_disable_exit_to_user(void)
+{
+	local_daif_mask();
+}
+#define local_irq_disable_exit_to_user local_irq_disable_exit_to_user
+
+static inline void arch_exit_to_usermode_work(struct pt_regs *regs,
+					      unsigned long ti_work)
+{
+	/* Must this be inside the work loop ? */
+	if (ti_work & _TIF_FOREIGN_FPSTATE)
+		fpsimd_restore_current_state();
+
+}
+#define arch_exit_to_usermode_work arch_exit_to_usermode_work
+
 enum ptrace_syscall_dir {
 	PTRACE_SYSCALL_ENTER = 0,
 	PTRACE_SYSCALL_EXIT,
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -971,26 +971,14 @@ ENDPROC(el1_error)
 ENDPROC(el0_error)
 
 /*
- * Ok, we need to do extra processing, enter the slow path.
- */
-work_pending:
-	mov	x0, sp				// 'regs'
-	bl	do_notify_resume
-#ifdef CONFIG_TRACE_IRQFLAGS
-	bl	trace_hardirqs_on		// enabled while in userspace
-#endif
-	ldr	x1, [tsk, #TSK_TI_FLAGS]	// re-check for single-step
-	b	finish_ret_to_user
-/*
  * "slow" syscall return path.
  */
 ret_to_user:
 	disable_daif
 	gic_prio_kentry_setup tmp=x3
-	ldr	x1, [tsk, #TSK_TI_FLAGS]
-	and	x2, x1, #_TIF_WORK_MASK
-	cbnz	x2, work_pending
-finish_ret_to_user:
+	mov	x0, sp				// 'regs'
+	bl	exit_to_usermode
+	ldr	x1, [tsk, #TSK_TI_FLAGS]	// re-check for single-step
 	enable_step_tsk x1, x2
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 	bl	stackleak_erase
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -825,7 +825,7 @@ static void handle_signal(struct ksignal
  * the kernel can handle, and then we build all the user-level signal handling
  * stack-frames in one go after that.
  */
-static void do_signal(struct pt_regs *regs)
+void arch_do_signal(struct pt_regs *regs)
 {
 	unsigned long continue_addr = 0, restart_addr = 0;
 	int retval = 0;
@@ -896,49 +896,6 @@ static void do_signal(struct pt_regs *re
 	restore_saved_sigmask();
 }
 
-asmlinkage void do_notify_resume(struct pt_regs *regs,
-				 unsigned long thread_flags)
-{
-	/*
-	 * The assembly code enters us with IRQs off, but it hasn't
-	 * informed the tracing code of that for efficiency reasons.
-	 * Update the trace code with the current status.
-	 */
-	trace_hardirqs_off();
-
-	do {
-		/* Check valid user FS if needed */
-		addr_limit_user_check();
-
-		if (thread_flags & _TIF_NEED_RESCHED) {
-			/* Unmask Debug and SError for the next task */
-			local_daif_restore(DAIF_PROCCTX_NOIRQ);
-
-			schedule();
-		} else {
-			local_daif_restore(DAIF_PROCCTX);
-
-			if (thread_flags & _TIF_UPROBE)
-				uprobe_notify_resume(regs);
-
-			if (thread_flags & _TIF_SIGPENDING)
-				do_signal(regs);
-
-			if (thread_flags & _TIF_NOTIFY_RESUME) {
-				clear_thread_flag(TIF_NOTIFY_RESUME);
-				tracehook_notify_resume(regs);
-				rseq_handle_notify_resume(NULL, regs);
-			}
-
-			if (thread_flags & _TIF_FOREIGN_FPSTATE)
-				fpsimd_restore_current_state();
-		}
-
-		local_daif_mask();
-		thread_flags = READ_ONCE(current_thread_info()->flags);
-	} while (thread_flags & _TIF_WORK_MASK);
-}
-
 unsigned long __ro_after_init signal_minsigstksz;
 
 /*



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 13/15] arm64/entry: Move FPU restore out of exit_to_usermode() loop
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (11 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 12/15] arm64/entry: " Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-19 15:03 ` [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest Thomas Gleixner
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Restoring the FPU is not required to be insided the loop. The final point
to do that is before returning to user space. Move it there.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/arm64/include/asm/entry-common.h |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

--- a/arch/arm64/include/asm/entry-common.h
+++ b/arch/arm64/include/asm/entry-common.h
@@ -7,8 +7,6 @@
 
 #include <asm/daifflags.h>
 
-#define ARCH_EXIT_TO_USERMODE_WORK	(_TIF_FOREIGN_FPSTATE)
-
 static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
 {
 	if (ti_work & _TIF_NEED_RESCHED)
@@ -24,15 +22,14 @@ static inline void local_irq_disable_exi
 }
 #define local_irq_disable_exit_to_user local_irq_disable_exit_to_user
 
-static inline void arch_exit_to_usermode_work(struct pt_regs *regs,
-					      unsigned long ti_work)
+static inline void arch_exit_to_usermode(struct pt_regs *regs,
+					 unsigned long ti_work)
 {
-	/* Must this be inside the work loop ? */
 	if (ti_work & _TIF_FOREIGN_FPSTATE)
 		fpsimd_restore_current_state();
 
 }
-#define arch_exit_to_usermode_work arch_exit_to_usermode_work
+#define arch_exit_to_usermode arch_exit_to_usermode
 
 enum ptrace_syscall_dir {
 	PTRACE_SYSCALL_ENTER = 0,



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (12 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 13/15] arm64/entry: Move FPU restore out of exit_to_usermode() loop Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-19 15:40   ` Paolo Bonzini
                     ` (2 more replies)
  2019-09-19 15:03 ` [RFC patch 15/15] x86/kvm: Use GENERIC_EXIT_WORKPENDING Thomas Gleixner
                   ` (3 subsequent siblings)
  17 siblings, 3 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Entering a guest is similar to exiting to user space. Pending work like
handling signals, rescheduling, task work etc. needs to be handled before
that.

Provide generic infrastructure to avoid duplication of the same handling code
all over the place.

Update ARM64 struct kvm_vcpu_stat with a signal_exit member so the generic
code compiles.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/arm64/include/asm/kvm_host.h |    1 
 include/linux/entry-common.h      |   66 ++++++++++++++++++++++++++++++++++++++
 kernel/entry/common.c             |   44 +++++++++++++++++++++++++
 3 files changed, 111 insertions(+)

--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -409,6 +409,7 @@ struct kvm_vcpu_stat {
 	u64 wfi_exit_stat;
 	u64 mmio_exit_user;
 	u64 mmio_exit_kernel;
+	u64 signal_exits;
 	u64 exits;
 };
 
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -255,4 +255,70 @@ static inline void arch_syscall_exit_tra
 /* Common syscall exit function */
 void syscall_exit_to_usermode(struct pt_regs *regs, long syscall, long retval);
 
+#if IS_ENABLED(CONFIG_KVM)
+
+#include <linux/kvm_host.h>
+
+#ifndef ARCH_EXIT_TO_GUESTMODE_WORK
+# define ARCH_EXIT_TO_GUESTMODE_WORK	(0)
+#endif
+
+#define EXIT_TO_GUESTMODE_WORK						\
+	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_RESUME |	\
+	 ARCH_EXIT_TO_GUESTMODE_WORK)
+
+int core_exit_to_guestmode_work(struct kvm *kvm, struct kvm_vcpu *vcpu,
+				unsigned long ti_work);
+
+/**
+ * arch_exit_to_guestmode - Architecture specific exit to guest mode function
+ * @kvm:	Pointer to the guest instance
+ * @vcpu:	Pointer to current's VCPU data
+ * @ti_work:	Cached TIF flags gathered in exit_to_guestmode()
+ *
+ * Invoked from core_exit_to_guestmode_work(). Can be replaced by
+ * architecture specific code.
+ */
+static inline int arch_exit_to_guestmode(struct kvm *kvm, struct kvm_vcpu *vcpu,
+					 unsigned long ti_work);
+
+#ifndef arch_exit_to_guestmode
+static inline int arch_exit_to_guestmode(struct kvm *kvm, struct kvm_vcpu *vcpu,
+					 unsigned long ti_work)
+{
+	return 0;
+}
+#endif
+
+/**
+ * exit_to_guestmode - Check and handle pending work which needs to be
+ *		       handled before returning to guest mode
+ * @kvm:	Pointer to the guest instance
+ * @vcpu:	Pointer to current's VCPU data
+ *
+ * Returns: 0 or an error code
+ */
+static inline int exit_to_guestmode(struct kvm *kvm, struct kvm_vcpu *vcpu)
+{
+	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+
+	if (unlikely(ti_work & EXIT_TO_GUESTMODE_WORK))
+		return core_exit_to_guestmode_work(kvm, vcpu, ti_work);
+	return 0;
+}
+
+
+/**
+ * _exit_to_guestmode_work_pending - Check if work is pending which needs to be
+ *				     handled before returning to guest mode
+ */
+static inline bool exit_to_guestmode_work_pending(void)
+{
+	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
+
+	return !!(ti_work & EXIT_TO_GUESTMODE_WORK);
+
+}
+#endif /* CONFIG_KVM */
+
 #endif
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -174,3 +174,47 @@ void syscall_exit_to_usermode(struct pt_
 	do_exit_to_usermode(regs);
 #endif
 }
+
+#if IS_ENABLED(CONFIG_KVM)
+int __weak arch_exit_to_guestmode_work(struct kvm *kvm, struct kvm_vcpu *vcpu,
+				       unsigned long ti_work)
+{
+	return 0;
+}
+
+int core_exit_to_guestmode_work(struct kvm *kvm, struct kvm_vcpu *vcpu,
+				unsigned long ti_work)
+{
+	/*
+	 * Before returning to guest mode handle all pending work
+	 */
+	if (ti_work & _TIF_SIGPENDING) {
+		vcpu->run->exit_reason = KVM_EXIT_INTR;
+		vcpu->stat.signal_exits++;
+		return -EINTR;
+	}
+
+	if (ti_work & _TIF_NEED_RESCHED) {
+		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
+		schedule();
+		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+	}
+
+	if (ti_work & _TIF_PATCH_PENDING) {
+		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
+		klp_update_patch_state(current);
+		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+	}
+
+	if (ti_work & _TIF_NOTIFY_RESUME) {
+		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
+		clear_thread_flag(TIF_NOTIFY_RESUME);
+		tracehook_notify_resume(NULL);
+		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+	}
+
+	/* Any extra architecture specific work */
+	return arch_exit_to_guestmode_work(kvm, vcpu, ti_work);
+}
+EXPORT_SYMBOL_GPL(core_exit_to_guestmode_work);
+#endif



^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC patch 15/15] x86/kvm: Use GENERIC_EXIT_WORKPENDING
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (13 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest Thomas Gleixner
@ 2019-09-19 15:03 ` Thomas Gleixner
  2019-09-19 15:40   ` Paolo Bonzini
  2019-09-20 15:12 ` [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Mark Rutland
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-19 15:03 UTC (permalink / raw)
  To: LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

Use the generic infrastructure to check for and handle pending work before
entering into guest mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kvm/x86.c |   17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -52,6 +52,7 @@
 #include <linux/irqbypass.h>
 #include <linux/sched/stat.h>
 #include <linux/sched/isolation.h>
+#include <linux/entry-common.h>
 #include <linux/mem_encrypt.h>
 
 #include <trace/events/kvm.h>
@@ -7984,8 +7985,8 @@ static int vcpu_enter_guest(struct kvm_v
 	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
 		kvm_x86_ops->sync_pir_to_irr(vcpu);
 
-	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
-	    || need_resched() || signal_pending(current)) {
+	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) ||
+	    exit_to_guestmode_work_pending()) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
 		local_irq_enable();
@@ -8178,17 +8179,9 @@ static int vcpu_run(struct kvm_vcpu *vcp
 
 		kvm_check_async_pf_completion(vcpu);
 
-		if (signal_pending(current)) {
-			r = -EINTR;
-			vcpu->run->exit_reason = KVM_EXIT_INTR;
-			++vcpu->stat.signal_exits;
+		r = exit_to_guestmode(kvm, vcpu);
+		if (r)
 			break;
-		}
-		if (need_resched()) {
-			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
-			cond_resched();
-			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
-		}
 	}
 
 	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest
  2019-09-19 15:03 ` [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest Thomas Gleixner
@ 2019-09-19 15:40   ` Paolo Bonzini
  2019-09-20 11:48     ` Thomas Gleixner
  2019-09-23 18:17   ` Andy Lutomirski
  2019-09-26 11:35   ` Miroslav Benes
  2 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2019-09-19 15:40 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, kvm, linux-arch

Quick API review before I dive into the implementation.

On 19/09/19 17:03, Thomas Gleixner wrote:
> +	/*
> +	 * Before returning to guest mode handle all pending work
> +	 */
> +	if (ti_work & _TIF_SIGPENDING) {
> +		vcpu->run->exit_reason = KVM_EXIT_INTR;
> +		vcpu->stat.signal_exits++;
> +		return -EINTR;
> +	}
> +
> +	if (ti_work & _TIF_NEED_RESCHED) {
> +		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> +		schedule();
> +		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +	}
> +
> +	if (ti_work & _TIF_PATCH_PENDING) {
> +		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> +		klp_update_patch_state(current);
> +		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +	}
> +
> +	if (ti_work & _TIF_NOTIFY_RESUME) {
> +		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> +		clear_thread_flag(TIF_NOTIFY_RESUME);
> +		tracehook_notify_resume(NULL);
> +		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +	}
> +
> +	/* Any extra architecture specific work */
> +	return arch_exit_to_guestmode_work(kvm, vcpu, ti_work);
> +}

Perhaps, in virt/kvm/kvm_main.c:

int kvm_exit_to_guestmode_work(struct kvm *kvm, struct kvm_vcpu *vcpu,
				unsigned long ti_work)
{
	int r;

	/*
	 * Before returning to guest mode handle all pending work
	 */
	if (ti_work & _TIF_SIGPENDING) {
		vcpu->run->exit_reason = KVM_EXIT_INTR;
		vcpu->stat.signal_exits++;
		return -EINTR;
	}

	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
	core_exit_to_guestmode_work(ti_work);
	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);

	return r;
}

and in kernel/entry/common.c:

int core_exit_to_guestmode_work(unsigned long ti_work)
{
	/*
	 * Before returning to guest mode handle all pending work
	 */
	if (ti_work & _TIF_NEED_RESCHED)
		schedule();

	if (ti_work & _TIF_PATCH_PENDING)
		klp_update_patch_state(current);

	if (ti_work & _TIF_NOTIFY_RESUME) {
		clear_thread_flag(TIF_NOTIFY_RESUME);
		tracehook_notify_resume(NULL);
	}
	return arch_exit_to_guestmode_work(ti_work);
}

so that kernel/entry/ is not polluted with KVM structs and APIs.

Perhaps even extract the body of core_exit_to_usermode_work's while loop
to a separate function, and call it as

	core_exit_to_usermode_work_once(NULL,
				ti_work & EXIT_TO_GUESTMODE_WORK);

from core_exit_to_guestmode_work.

In general I don't mind having these exit_to_guestmode functions in
kvm_host.h, and only having entry-common.h export EXIT_TO_GUESTMODE_WORK
and ARCH_EXIT_TO_GUESTMODE_WORK.  Unless you had good reasons to do the
opposite...

Paolo


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 15/15] x86/kvm: Use GENERIC_EXIT_WORKPENDING
  2019-09-19 15:03 ` [RFC patch 15/15] x86/kvm: Use GENERIC_EXIT_WORKPENDING Thomas Gleixner
@ 2019-09-19 15:40   ` Paolo Bonzini
  0 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2019-09-19 15:40 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, kvm, linux-arch

On 19/09/19 17:03, Thomas Gleixner wrote:
> Use the generic infrastructure to check for and handle pending work before
> entering into guest mode.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Subject should be "x86/kvm: use exit_to_guestmode".

Paolo

> ---
>  arch/x86/kvm/x86.c |   17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
> 
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -52,6 +52,7 @@
>  #include <linux/irqbypass.h>
>  #include <linux/sched/stat.h>
>  #include <linux/sched/isolation.h>
> +#include <linux/entry-common.h>
>  #include <linux/mem_encrypt.h>
>  
>  #include <trace/events/kvm.h>
> @@ -7984,8 +7985,8 @@ static int vcpu_enter_guest(struct kvm_v
>  	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
>  		kvm_x86_ops->sync_pir_to_irr(vcpu);
>  
> -	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
> -	    || need_resched() || signal_pending(current)) {
> +	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) ||
> +	    exit_to_guestmode_work_pending()) {
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		smp_wmb();
>  		local_irq_enable();
> @@ -8178,17 +8179,9 @@ static int vcpu_run(struct kvm_vcpu *vcp
>  
>  		kvm_check_async_pf_completion(vcpu);
>  
> -		if (signal_pending(current)) {
> -			r = -EINTR;
> -			vcpu->run->exit_reason = KVM_EXIT_INTR;
> -			++vcpu->stat.signal_exits;
> +		r = exit_to_guestmode(kvm, vcpu);
> +		if (r)
>  			break;
> -		}
> -		if (need_resched()) {
> -			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> -			cond_resched();
> -			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> -		}
>  	}
>  
>  	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> 
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest
  2019-09-19 15:40   ` Paolo Bonzini
@ 2019-09-20 11:48     ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-20 11:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, kvm, linux-arch

On Thu, 19 Sep 2019, Paolo Bonzini wrote:
> > +	/* Any extra architecture specific work */
> > +	return arch_exit_to_guestmode_work(kvm, vcpu, ti_work);
> > +}
> 
> Perhaps, in virt/kvm/kvm_main.c:
> 
> int kvm_exit_to_guestmode_work(struct kvm *kvm, struct kvm_vcpu *vcpu,
> 				unsigned long ti_work)
> {
...
> }
> 
> and in kernel/entry/common.c:
> 
> int core_exit_to_guestmode_work(unsigned long ti_work)
> {
...
> }

Makes sense.
 
> so that kernel/entry/ is not polluted with KVM structs and APIs.
> 
> Perhaps even extract the body of core_exit_to_usermode_work's while loop
> to a separate function, and call it as
> 
> 	core_exit_to_usermode_work_once(NULL,
> 				ti_work & EXIT_TO_GUESTMODE_WORK);

Doh, its too obvious now that you mention it :)

> from core_exit_to_guestmode_work.
> 
> In general I don't mind having these exit_to_guestmode functions in
> kvm_host.h, and only having entry-common.h export EXIT_TO_GUESTMODE_WORK
> and ARCH_EXIT_TO_GUESTMODE_WORK.  Unless you had good reasons to do the
> opposite...

That was an arbitrary choice and it does not matter much where it lives.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 04/15] arm64/entry: Use generic syscall entry function
  2019-09-19 15:03 ` [RFC patch 04/15] arm64/entry: " Thomas Gleixner
@ 2019-09-20 12:21   ` Catalin Marinas
  0 siblings, 0 replies; 43+ messages in thread
From: Catalin Marinas @ 2019-09-20 12:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Peter Zijlstra, Andy Lutomirski, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm, linux-arch

On Thu, Sep 19, 2019 at 05:03:18PM +0200, Thomas Gleixner wrote:
>  #ifdef CONFIG_ARM64_ERRATUM_1463225
> @@ -97,19 +97,16 @@ static void el0_svc_common(struct pt_reg
>  
>  	regs->orig_x0 = regs->regs[0];
>  	regs->syscallno = scno;
> +	/* Set default error number */
> +	regs->regs[0] = -ENOSYS;

I think this corrupts the first argument of all valid syscalls.
SC_ARM64_REGS_TO_ARGS uses regs[0] instead of orig_x0. ptrace should be
fine since it calls syscall_get_arguments() which uses orig_x0.

We could change the SC_ARM64_REGS_TO_ARGS macro though (in theory there
shouldn't be any performance hit as it's already cached).

>  
>  	cortex_a76_erratum_1463225_svc_handler();
>  	local_daif_restore(DAIF_PROCCTX);
>  	user_exit();
>  
> -	if (has_syscall_work(flags)) {
> -		/* set default errno for user-issued syscall(-1) */
> -		if (scno == NO_SYSCALL)
> -			regs->regs[0] = -ENOSYS;
> -		scno = syscall_trace_enter(regs);
> -		if (scno == NO_SYSCALL)
> -			goto trace_exit;
> -	}
> +	scno = syscall_enter_from_usermode(regs, scno);
> +	if (scno == NO_SYSCALL)
> +		goto trace_exit;
>  
>  	invoke_syscall(regs, scno, sc_nr, syscall_table);

-- 
Catalin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 07/15] arm64/syscall: Remove obscure flag check
  2019-09-19 15:03 ` [RFC patch 07/15] arm64/syscall: Remove obscure flag check Thomas Gleixner
@ 2019-09-20 14:29   ` Catalin Marinas
  0 siblings, 0 replies; 43+ messages in thread
From: Catalin Marinas @ 2019-09-20 14:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Peter Zijlstra, Andy Lutomirski, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm, linux-arch

On Thu, Sep 19, 2019 at 05:03:21PM +0200, Thomas Gleixner wrote:
> The syscall handling code has an obscure check of pending work which does a
> shortcut before returning to user space. It calls into the exit work code
> when the flags at entry time required an entry into the slowpath. That does
> not make sense because the underlying work functionality will reevaluate
> the flags anyway and not do anything.

The current C code was just matching the original behaviour in asm
(converted by commit f37099b6992a0b81). The idea IIRC was to always pair
a syscall_trace_enter() with a syscall_trace_exit() irrespective of the
thread flag changes. I think the behaviour is preserved with your patch
if no-one clears the work flags during el0_svc_common().

> @@ -105,33 +103,15 @@ static void el0_svc_common(struct pt_reg
>  	user_exit();
>  
>  	scno = syscall_enter_from_usermode(regs, scno);
> -	if (scno == NO_SYSCALL)
> -		goto trace_exit;
> -
> -	invoke_syscall(regs, scno, sc_nr, syscall_table);
> +	if (scno != NO_SYSCALL)
> +		invoke_syscall(regs, scno, sc_nr, syscall_table);
>  
> -	/*
> -	 * The tracing status may have changed under our feet, so we have to
> -	 * check again. However, if we were tracing entry, then we always trace
> -	 * exit regardless, as the old entry assembly did.
> -	 */
> -	if (!has_syscall_work(flags) && !IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
> -		local_daif_mask();
> -		flags = current_thread_info()->flags;
> -		if (!has_syscall_work(flags)) {
> -			/*
> -			 * We're off to userspace, where interrupts are
> -			 * always enabled after we restore the flags from
> -			 * the SPSR.
> -			 */
> -			trace_hardirqs_on();
> -			return;
> -		}
> +	local_daif_mask();
> +	if (has_syscall_work(current_thread_info()->flags) ||
> +	    IS_ENABLED(CONFIG_DEBUG_RSEQ)) {
>  		local_daif_restore(DAIF_PROCCTX);
> +		syscall_trace_exit(regs);
>  	}

That's missing a trace_hardirqs_on() (off done in local_daif_mask())
before returning.

> -
> -trace_exit:
> -	syscall_trace_exit(regs);
>  }

-- 
Catalin

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (14 preceding siblings ...)
  2019-09-19 15:03 ` [RFC patch 15/15] x86/kvm: Use GENERIC_EXIT_WORKPENDING Thomas Gleixner
@ 2019-09-20 15:12 ` Mark Rutland
  2019-09-23 20:50   ` Thomas Gleixner
  2019-09-23 18:18 ` Andy Lutomirski
  2019-09-24  6:50 ` Christian Borntraeger
  17 siblings, 1 reply; 43+ messages in thread
From: Mark Rutland @ 2019-09-20 15:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Marc Zyngier, Paolo Bonzini, kvm, linux-arch,
	James Morse

Hi Thomas,

As a heads-up, I'm going to be away next week, and I likely won't have
the chance to look at this in detail before October.

On Thu, Sep 19, 2019 at 05:03:14PM +0200, Thomas Gleixner wrote:
> When working on a way to move out the posix cpu timer expiry out of the
> timer interrupt context, I noticed that KVM is not handling pending task
> work before entering a guest. A quick hack was to add that to the x86 KVM
> handling loop. The discussion ended with a request to make this a generic
> infrastructure possible with also moving the per arch implementations of
> the enter from and return to user space handling generic.
> 
>   https://lore.kernel.org/r/89E42BCC-47A8-458B-B06A-D6A20D20512C@amacapital.net
> 
> You asked for it, so don't complain that you have to review it :)

I never asked for this! ;)

> The series implements the syscall enter/exit and the general exit to
> userspace work handling along with the pre guest enter functionality.
> 
> The series converts x86 and ARM64. x86 is fully tested including selftests
> etc. ARM64 is only compile tested for now as my only ARM64 testbox is not
> available right now.

I've been working on converting the arm64 entry code to C for a while
now [1], gradually upstreaming the bits I can.

James has picked up some of that [2] as a prerequisite for some RAS
error handling, and I think building the arm64 bits atop of that would
be preferable. IIUC that should get posted as a series come -rc1.

Since there's immense scope for subtle breakage, I'd prefer that we do
the arm64-specific asm->C conversion before migrating arm64 to generic
code. That way us arm64 folk can ensure the asm->C conversion retains
the existing behaviour, and it'll be easier for everyone to compare the
arm64 and generic C implementations.

Thanks,
Mark.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
[2] git://linux-arm.org/linux-jm.git -b deasm_sync_only/v1

> 
> Thanks,
> 
> 	tglx
> 
> ---
>  /Makefile                               |    3 
>  arch/Kconfig                            |    3 
>  arch/arm64/Kconfig                      |    1 
>  arch/arm64/include/asm/kvm_host.h       |    1 
>  arch/arm64/kernel/entry.S               |   18 -
>  arch/arm64/kernel/ptrace.c              |   65 ------
>  arch/arm64/kernel/signal.c              |   45 ----
>  arch/arm64/kernel/syscall.c             |   49 ----
>  arch/x86/Kconfig                        |    1 
>  arch/x86/entry/common.c                 |  265 +-------------------------
>  arch/x86/entry/entry_32.S               |   13 -
>  arch/x86/entry/entry_64.S               |   12 -
>  arch/x86/entry/entry_64_compat.S        |   21 --
>  arch/x86/include/asm/signal.h           |    1 
>  arch/x86/include/asm/thread_info.h      |    9 
>  arch/x86/kernel/signal.c                |    2 
>  arch/x86/kvm/x86.c                      |   17 -
>  b/arch/arm64/include/asm/entry-common.h |   76 +++++++
>  b/arch/x86/include/asm/entry-common.h   |  104 ++++++++++
>  b/include/linux/entry-common.h          |  324 ++++++++++++++++++++++++++++++++
>  b/kernel/entry/common.c                 |  220 +++++++++++++++++++++
>  kernel/Makefile                         |    1 
>  22 files changed, 776 insertions(+), 475 deletions(-)
> 
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 01/15] entry: Provide generic syscall entry functionality
  2019-09-19 15:03 ` [RFC patch 01/15] entry: Provide generic syscall entry functionality Thomas Gleixner
@ 2019-09-20 23:38   ` Andy Lutomirski
  2019-10-20 11:49     ` Thomas Gleixner
  2019-09-23  9:05   ` Mike Rapoport
  1 sibling, 1 reply; 43+ messages in thread
From: Andy Lutomirski @ 2019-09-20 23:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm list,
	linux-arch

On Thu, Sep 19, 2019 at 8:09 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On syscall entry certain work needs to be done conditionally like tracing,
> seccomp etc. This code is duplicated in all architectures.
>
> Provide a generic version.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/Kconfig                 |    3 +
>  include/linux/entry-common.h |  122 +++++++++++++++++++++++++++++++++++++++++++
>  kernel/Makefile              |    1
>  kernel/entry/Makefile        |    3 +
>  kernel/entry/common.c        |   33 +++++++++++
>  5 files changed, 162 insertions(+)
>
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -27,6 +27,9 @@ config HAVE_IMA_KEXEC
>  config HOTPLUG_SMT
>         bool
>
> +config GENERIC_ENTRY
> +       bool
> +
>  config OPROFILE
>         tristate "OProfile system profiling"
>         depends on PROFILING
> --- /dev/null
> +++ b/include/linux/entry-common.h
> @@ -0,0 +1,122 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_ENTRYCOMMON_H
> +#define __LINUX_ENTRYCOMMON_H
> +
> +#include <linux/tracehook.h>
> +#include <linux/syscalls.h>
> +#include <linux/seccomp.h>
> +#include <linux/sched.h>
> +#include <linux/audit.h>
> +
> +#include <asm/entry-common.h>
> +
> +/*
> + * Define dummy _TIF work flags if not defined by the architecture or for
> + * disabled functionality.
> + */
> +#ifndef _TIF_SYSCALL_TRACE
> +# define _TIF_SYSCALL_TRACE            (0)
> +#endif
> +
> +#ifndef _TIF_SYSCALL_EMU
> +# define _TIF_SYSCALL_EMU              (0)
> +#endif
> +
> +#ifndef _TIF_SYSCALL_TRACEPOINT
> +# define _TIF_SYSCALL_TRACEPOINT       (0)
> +#endif
> +
> +#ifndef _TIF_SECCOMP
> +# define _TIF_SECCOMP                  (0)
> +#endif
> +
> +#ifndef _TIF_AUDIT
> +# define _TIF_AUDIT                    (0)
> +#endif

I'm wondering if these should be __TIF (double-underscore) or
MAYBE_TIF_ or something to avoid errors where people do flags |=
TIF_WHATEVER and get surprised.

> +/**
> + * syscall_enter_from_usermode - Check and handle work before invoking
> + *                              a syscall
> + * @regs:      Pointer to currents pt_regs
> + * @syscall:   The syscall number
> + *
> + * Invoked from architecture specific syscall entry code with interrupts
> + * enabled.
> + *
> + * Returns: The original or a modified syscall number
> + */

Maybe document that it can return -1 to skip the syscall and that, if
this happens, it may use syscall_set_error() or
syscall_set_return_value() first.  If neither of those is called and
-1 is returned, then the syscall will fail with ENOSYS.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 02/15] x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY
  2019-09-19 15:03 ` [RFC patch 02/15] x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY Thomas Gleixner
@ 2019-09-20 23:39   ` Andy Lutomirski
  2019-09-23 20:43     ` Thomas Gleixner
  0 siblings, 1 reply; 43+ messages in thread
From: Andy Lutomirski @ 2019-09-20 23:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm list,
	linux-arch

On Thu, Sep 19, 2019 at 8:09 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Evaluating _TIF_NOHZ to decide whether to use the slow syscall entry path
> is not only pointless, it's actually counterproductive:
>
>  1) Context tracking code is invoked unconditionally before that flag is
>     evaluated.
>
>  2) If the flag is set the slow path is invoked for nothing due to #1

Can we also get rid of TIF_NOHZ on x86?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 03/15] x86/entry: Use generic syscall entry function
  2019-09-19 15:03 ` [RFC patch 03/15] x86/entry: Use generic syscall entry function Thomas Gleixner
@ 2019-09-20 23:41   ` Andy Lutomirski
  2019-09-23  8:31     ` Peter Zijlstra
  0 siblings, 1 reply; 43+ messages in thread
From: Andy Lutomirski @ 2019-09-20 23:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm list,
	linux-arch

On Thu, Sep 19, 2019 at 8:09 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Replace the syscall entry work handling with the generic version, Provide
> the necessary helper inlines to handle the real architecture specific
> parts, e.g. audit and seccomp invocations.

> -       if (work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
> -               ret = tracehook_report_syscall_entry(regs);
> -               if (ret || (work & _TIF_SYSCALL_EMU))
> -                       return -1L;
> -       }

Unless I missed something, you lost the _TIF_SYSCALL_EMU abomination.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 09/15] entry: Provide generic exit to usermode functionality
  2019-09-19 15:03 ` [RFC patch 09/15] entry: Provide generic exit to usermode functionality Thomas Gleixner
@ 2019-09-23  8:30   ` Peter Zijlstra
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2019-09-23  8:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Andy Lutomirski, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm, linux-arch

On Thu, Sep 19, 2019 at 05:03:23PM +0200, Thomas Gleixner wrote:
> +static unsigned long core_exit_to_usermode_work(struct pt_regs *regs,
> +						unsigned long ti_work)
> +{
> +	/*
> +	 * Before returning to user space ensure that all pending work
> +	 * items have been completed.
> +	 */
> +	while (ti_work & EXIT_TO_USERMODE_WORK) {
> +
> +		local_irq_enable_exit_to_user(ti_work);
> +
> +		if (ti_work & _TIF_NEED_RESCHED)
> +			schedule();
> +
> +		if (ti_work & _TIF_UPROBE)
> +			uprobe_notify_resume(regs);
> +
> +		if (ti_work & _TIF_PATCH_PENDING)
> +			klp_update_patch_state(current);
> +
> +		if (ti_work & _TIF_SIGPENDING)
> +			arch_do_signal(regs);
> +
> +		if (ti_work & _TIF_NOTIFY_RESUME) {
> +			clear_thread_flag(TIF_NOTIFY_RESUME);
> +			tracehook_notify_resume(regs);
> +			rseq_handle_notify_resume(NULL, regs);
> +		}
> +
> +		/* Architecture specific TIF work */
> +		arch_exit_to_usermode_work(regs, ti_work);
> +
> +		/*
> +		 * Disable interrupts and reevaluate the work flags as they
> +		 * might have changed while interrupts and preemption was
> +		 * enabled above.
> +		 */
> +		local_irq_disable_exit_to_user();
> +		ti_work = READ_ONCE(current_thread_info()->flags);
> +	}
> +	/*
> +	 * Was checked in exit_to_usermode_work() already, but the above
> +	 * loop might have wreckaged it.
> +	 */
> +	addr_limit_user_check();
> +	return ti_work;
> +}
> +
> +static void do_exit_to_usermode(struct pt_regs *regs)
> +{
> +	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +
> +	lockdep_sys_exit();
> +
> +	addr_limit_user_check();
> +
> +	if (unlikely(ti_work & EXIT_TO_USERMODE_WORK))
> +		ti_work = core_exit_to_usermode_work(regs, ti_work);

would it make sense to do:

	lockdep_sys_exit();
	addr_limit_user_check();

here instead of before core_exit_to_usermode_work(); that would also
allow getting rid of that second addr_limit_user_check() invocation.

And movind that lockdep check later would catch any of the
EXIT_TO_USERMODE_WORK users leaking a lock.

> +
> +	arch_exit_to_usermode(regs, ti_work);
> +	/* Return to userspace right after this which turns on interrupts */
> +	trace_hardirqs_on();
> +}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 03/15] x86/entry: Use generic syscall entry function
  2019-09-20 23:41   ` Andy Lutomirski
@ 2019-09-23  8:31     ` Peter Zijlstra
  2019-09-23  8:40       ` Thomas Gleixner
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2019-09-23  8:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, LKML, X86 ML, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm list, linux-arch

On Fri, Sep 20, 2019 at 04:41:03PM -0700, Andy Lutomirski wrote:
> On Thu, Sep 19, 2019 at 8:09 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Replace the syscall entry work handling with the generic version, Provide
> > the necessary helper inlines to handle the real architecture specific
> > parts, e.g. audit and seccomp invocations.
> 
> > -       if (work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
> > -               ret = tracehook_report_syscall_entry(regs);
> > -               if (ret || (work & _TIF_SYSCALL_EMU))
> > -                       return -1L;
> > -       }
> 
> Unless I missed something, you lost the _TIF_SYSCALL_EMU abomination.

IIUC that's actually in patch #1.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 03/15] x86/entry: Use generic syscall entry function
  2019-09-23  8:31     ` Peter Zijlstra
@ 2019-09-23  8:40       ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-23  8:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, LKML, X86 ML, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm list, linux-arch

On Mon, 23 Sep 2019, Peter Zijlstra wrote:

> On Fri, Sep 20, 2019 at 04:41:03PM -0700, Andy Lutomirski wrote:
> > On Thu, Sep 19, 2019 at 8:09 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > Replace the syscall entry work handling with the generic version, Provide
> > > the necessary helper inlines to handle the real architecture specific
> > > parts, e.g. audit and seccomp invocations.
> > 
> > > -       if (work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
> > > -               ret = tracehook_report_syscall_entry(regs);
> > > -               if (ret || (work & _TIF_SYSCALL_EMU))
> > > -                       return -1L;
> > > -       }
> > 
> > Unless I missed something, you lost the _TIF_SYSCALL_EMU abomination.
> 
> IIUC that's actually in patch #1.

Correct:

+       if (ti_work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
+               ret = arch_syscall_enter_tracehook(regs);
+               if (ret || (ti_work & _TIF_SYSCALL_EMU))
+                       return -1L;
+       }


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 10/15] x86/entry: Move irq tracing to C code
  2019-09-19 15:03 ` [RFC patch 10/15] x86/entry: Move irq tracing to C code Thomas Gleixner
@ 2019-09-23  8:47   ` Peter Zijlstra
  2019-09-23 10:27     ` Thomas Gleixner
  2019-09-26  2:59   ` Josh Poimboeuf
  1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2019-09-23  8:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Andy Lutomirski, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm, linux-arch

On Thu, Sep 19, 2019 at 05:03:24PM +0200, Thomas Gleixner wrote:
> To prepare for converting the exit to usermode code to the generic version,
> move the irqflags tracing into C code.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/entry/common.c          |   10 ++++++++++
>  arch/x86/entry/entry_32.S        |   11 +----------
>  arch/x86/entry/entry_64.S        |   10 ++--------
>  arch/x86/entry/entry_64_compat.S |   21 ---------------------
>  4 files changed, 13 insertions(+), 39 deletions(-)
> 
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -102,6 +102,8 @@ static void exit_to_usermode_loop(struct
>  	struct thread_info *ti = current_thread_info();
>  	u32 cached_flags;
>  
> +	trace_hardirqs_off();

Bah.. so this gets called from:

 - C code, with IRQs disabled
 - entry_64.S:error_exit
 - entry_32.S:resume_userspace

The first obviously doesn't need this annotation, but this patch doesn't
remove the TRACE_IRQS_OFF from entry_64.S and only the 32bit case is
changed.

Is that entry_64.S case an oversight, or do we need an extensive comment
on this one?

>  	addr_limit_user_check();
>  
>  	lockdep_assert_irqs_disabled();
> @@ -137,6 +139,8 @@ static void exit_to_usermode_loop(struct
>  	user_enter_irqoff();
>  
>  	mds_user_clear_cpu_buffers();
> +

	/* Return to userspace (IRET) will re-enable interrupts */
> +	trace_hardirqs_on();
>  }

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 01/15] entry: Provide generic syscall entry functionality
  2019-09-19 15:03 ` [RFC patch 01/15] entry: Provide generic syscall entry functionality Thomas Gleixner
  2019-09-20 23:38   ` Andy Lutomirski
@ 2019-09-23  9:05   ` Mike Rapoport
  1 sibling, 0 replies; 43+ messages in thread
From: Mike Rapoport @ 2019-09-23  9:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

On Thu, Sep 19, 2019 at 05:03:15PM +0200, Thomas Gleixner wrote:
> On syscall entry certain work needs to be done conditionally like tracing,
> seccomp etc. This code is duplicated in all architectures.
> 
> Provide a generic version.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/Kconfig                 |    3 +
>  include/linux/entry-common.h |  122 +++++++++++++++++++++++++++++++++++++++++++
>  kernel/Makefile              |    1 
>  kernel/entry/Makefile        |    3 +
>  kernel/entry/common.c        |   33 +++++++++++
>  5 files changed, 162 insertions(+)
> 
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -27,6 +27,9 @@ config HAVE_IMA_KEXEC
>  config HOTPLUG_SMT
>  	bool
> 
> +config GENERIC_ENTRY
> +       bool
> +
>  config OPROFILE
>  	tristate "OProfile system profiling"
>  	depends on PROFILING
> --- /dev/null
> +++ b/include/linux/entry-common.h
> @@ -0,0 +1,122 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_ENTRYCOMMON_H
> +#define __LINUX_ENTRYCOMMON_H
> +
> +#include <linux/tracehook.h>
> +#include <linux/syscalls.h>
> +#include <linux/seccomp.h>
> +#include <linux/sched.h>
> +#include <linux/audit.h>
> +
> +#include <asm/entry-common.h>
> +
> +/*
> + * Define dummy _TIF work flags if not defined by the architecture or for
> + * disabled functionality.
> + */
> +#ifndef _TIF_SYSCALL_TRACE
> +# define _TIF_SYSCALL_TRACE		(0)
> +#endif
> +
> +#ifndef _TIF_SYSCALL_EMU
> +# define _TIF_SYSCALL_EMU		(0)
> +#endif
> +
> +#ifndef _TIF_SYSCALL_TRACEPOINT
> +# define _TIF_SYSCALL_TRACEPOINT	(0)
> +#endif
> +
> +#ifndef _TIF_SECCOMP
> +# define _TIF_SECCOMP			(0)
> +#endif
> +
> +#ifndef _TIF_AUDIT
> +# define _TIF_AUDIT			(0)
> +#endif
> +
> +/*
> + * TIF flags handled in syscall_enter_from_usermode()
> + */
> +#ifndef ARCH_SYSCALL_ENTER_WORK
> +# define ARCH_SYSCALL_ENTER_WORK	(0)
> +#endif
> +
> +#define SYSCALL_ENTER_WORK						\
> +	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | TIF_SECCOMP |	\
> +	 _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU |			\
> +	 ARCH_SYSCALL_ENTER_WORK)
> +
> +/**
> + * arch_syscall_enter_tracehook - Wrapper around tracehook_report_syscall_entry()
> + *
> + * Defaults to tracehook_report_syscall_entry(). Can be replaced by
> + * architecture specific code.
> + *
> + * Invoked from syscall_enter_from_usermode()
> + */

Nit: the kernel-doc here and in other places in the patchset lacks
parameter and return value descriptions, which will create lots of warnings
for 'make *docs'.

> +static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs);
> +
> +#ifndef arch_syscall_enter_tracehook
> +static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs)
> +{
> +	return tracehook_report_syscall_entry(regs);
> +}
> +#endif
> +
> +/**
> + * arch_syscall_enter_seccomp - Architecture specific seccomp invocation
> + * @regs:	Pointer to currents pt_regs
> + *
> + * Invoked from syscall_enter_from_usermode(). Can be replaced by
> + * architecture specific code.
> + */
> +static inline long arch_syscall_enter_seccomp(struct pt_regs *regs);
> +
> +#ifndef arch_syscall_enter_seccomp
> +static inline long arch_syscall_enter_seccomp(struct pt_regs *regs)
> +{
> +	return secure_computing(NULL);
> +}
> +#endif
> +
> +/**
> + * arch_syscall_enter_audit - Architecture specific audit invocation
> + * @regs:	Pointer to currents pt_regs
> + *
> + * Invoked from syscall_enter_from_usermode(). Must be replaced by
> + * architecture specific code if the architecture supports audit.
> + */
> +static inline void arch_syscall_enter_audit(struct pt_regs *regs);
> +
> +#ifndef arch_syscall_enter_audit
> +static inline void arch_syscall_enter_audit(struct pt_regs *regs) { }
> +#endif
> +
> +/* Common syscall enter function */
> +long core_syscall_enter_from_usermode(struct pt_regs *regs, long syscall);
> +
> +/**
> + * syscall_enter_from_usermode - Check and handle work before invoking
> + *				 a syscall
> + * @regs:	Pointer to currents pt_regs
> + * @syscall:	The syscall number
> + *
> + * Invoked from architecture specific syscall entry code with interrupts
> + * enabled.
> + *
> + * Returns: The original or a modified syscall number
> + */
> +static inline long syscall_enter_from_usermode(struct pt_regs *regs,
> +					       long syscall)
> +{
> +	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +
> +	if (IS_ENABLED(CONFIG_DEBUG_ENTRY))
> +		BUG_ON(regs != task_pt_regs(current));
> +
> +	if (ti_work & SYSCALL_ENTER_WORK)
> +		syscall = core_syscall_enter_from_usermode(regs, syscall);
> +	return syscall;
> +}
> +
> +#endif
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -43,6 +43,7 @@ obj-y += irq/
>  obj-y += rcu/
>  obj-y += livepatch/
>  obj-y += dma/
> +obj-y += entry/
> 
>  obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
>  obj-$(CONFIG_FREEZER) += freezer.o
> --- /dev/null
> +++ b/kernel/entry/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_GENERIC_ENTRY) += common.o
> --- /dev/null
> +++ b/kernel/entry/common.c
> @@ -0,0 +1,33 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/context_tracking.h>
> +#include <linux/entry-common.h>
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/syscalls.h>
> +
> +long core_syscall_enter_from_usermode(struct pt_regs *regs, long syscall)
> +{
> +	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +	unsigned long ret = 0;
> +
> +	if (ti_work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
> +		ret = arch_syscall_enter_tracehook(regs);
> +		if (ret || (ti_work & _TIF_SYSCALL_EMU))
> +			return -1L;
> +	}
> +
> +	/* Do seccomp after ptrace, to catch any tracer changes. */
> +	if (ti_work & _TIF_SECCOMP) {
> +		ret = arch_syscall_enter_seccomp(regs);
> +		if (ret == -1L)
> +			return ret;
> +	}
> +
> +	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
> +		trace_sys_enter(regs, syscall);
> +
> +	arch_syscall_enter_audit(regs);
> +
> +	return ret ? : syscall;
> +}
> 
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 10/15] x86/entry: Move irq tracing to C code
  2019-09-23  8:47   ` Peter Zijlstra
@ 2019-09-23 10:27     ` Thomas Gleixner
  2019-09-23 11:49       ` Peter Zijlstra
  0 siblings, 1 reply; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-23 10:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, x86, Andy Lutomirski, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm, linux-arch

On Mon, 23 Sep 2019, Peter Zijlstra wrote:

> On Thu, Sep 19, 2019 at 05:03:24PM +0200, Thomas Gleixner wrote:
> > To prepare for converting the exit to usermode code to the generic version,
> > move the irqflags tracing into C code.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > ---
> >  arch/x86/entry/common.c          |   10 ++++++++++
> >  arch/x86/entry/entry_32.S        |   11 +----------
> >  arch/x86/entry/entry_64.S        |   10 ++--------
> >  arch/x86/entry/entry_64_compat.S |   21 ---------------------
> >  4 files changed, 13 insertions(+), 39 deletions(-)
> > 
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -102,6 +102,8 @@ static void exit_to_usermode_loop(struct
> >  	struct thread_info *ti = current_thread_info();
> >  	u32 cached_flags;
> >  
> > +	trace_hardirqs_off();
> 
> Bah.. so this gets called from:
> 
>  - C code, with IRQs disabled
>  - entry_64.S:error_exit
>  - entry_32.S:resume_userspace
> 
> The first obviously doesn't need this annotation, but this patch doesn't
> remove the TRACE_IRQS_OFF from entry_64.S and only the 32bit case is
> changed.
> 
> Is that entry_64.S case an oversight, or do we need an extensive comment
> on this one?

Lemme stare at that again. At some point I probably lost track in that maze.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 10/15] x86/entry: Move irq tracing to C code
  2019-09-23 10:27     ` Thomas Gleixner
@ 2019-09-23 11:49       ` Peter Zijlstra
  2019-09-23 11:55         ` Peter Zijlstra
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2019-09-23 11:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Andy Lutomirski, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm, linux-arch,
	jgross

On Mon, Sep 23, 2019 at 12:27:47PM +0200, Thomas Gleixner wrote:
> On Mon, 23 Sep 2019, Peter Zijlstra wrote:
> 
> > On Thu, Sep 19, 2019 at 05:03:24PM +0200, Thomas Gleixner wrote:
> > > To prepare for converting the exit to usermode code to the generic version,
> > > move the irqflags tracing into C code.
> > > 
> > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > > ---
> > >  arch/x86/entry/common.c          |   10 ++++++++++
> > >  arch/x86/entry/entry_32.S        |   11 +----------
> > >  arch/x86/entry/entry_64.S        |   10 ++--------
> > >  arch/x86/entry/entry_64_compat.S |   21 ---------------------
> > >  4 files changed, 13 insertions(+), 39 deletions(-)
> > > 
> > > --- a/arch/x86/entry/common.c
> > > +++ b/arch/x86/entry/common.c
> > > @@ -102,6 +102,8 @@ static void exit_to_usermode_loop(struct
> > >  	struct thread_info *ti = current_thread_info();
> > >  	u32 cached_flags;
> > >  
> > > +	trace_hardirqs_off();
> > 
> > Bah.. so this gets called from:
> > 
> >  - C code, with IRQs disabled
> >  - entry_64.S:error_exit
> >  - entry_32.S:resume_userspace
> > 
> > The first obviously doesn't need this annotation, but this patch doesn't
> > remove the TRACE_IRQS_OFF from entry_64.S and only the 32bit case is
> > changed.
> > 
> > Is that entry_64.S case an oversight, or do we need an extensive comment
> > on this one?
> 
> Lemme stare at that again. At some point I probably lost track in that maze.

While walking the kids to school I wondered WTH we need to call
TRACE_IRQS_OFF in the first place. If this is the return from exception
path, interrupts had better be disabled already (in exception enter).

For entry_64.S we have:

  - idtentry_part; which does TRACE_IRQS_OFF at the start and error_exit
    at the end.

  - xen_do_hypervisor_callback, xen_failsafe_callback -- which are
    confusing.

So in the normal case, it appears we can simply do:

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index b7c3ea4cb19d..e9cf59ac554e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1368,8 +1368,6 @@ END(error_entry)
 
 ENTRY(error_exit)
 	UNWIND_HINT_REGS
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
 	testb	$3, CS(%rsp)
 	jz	retint_kernel
 	jmp	retint_user

and all should be well. This leaves Xen...

For entry_32.S it looks like nothing uses 'resume_userspace' so that
ENTRY can go away. Which then leaves:

 * ret_from_intr:
  - common_spurious: TRACE_IRQS_OFF
  - common_interrupt: TRACE_IRQS_OFF
  - BUILD_INTERRUPT3: TRACE_IRQS_OFF
  - xen_do_upcall: ...

 * ret_from_exception:
  - xen_failsafe_callback: ...
  - common_exception_read_cr2: TRACE_IRQS_OFF
  - common_exception: TRACE_IRQS_OFF
  - int3: TRACE_IRQS_OFF

Which again suggests:

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index f83ca5aa8b77..7a19e7413a8e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -825,9 +825,6 @@ END(ret_from_fork)
 	cmpl	$USER_RPL, %eax
 	jb	restore_all_kernel		# not returning to v8086 or userspace
 
-ENTRY(resume_userspace)
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
 	movl	%esp, %eax
 	call	prepare_exit_to_usermode
 	jmp	restore_all

with the notable exception (oh teh pun!) being Xen... _again_.

With these patchlets on, we'd want prepare_exit_to_usermode() to
validate the IRQ state, but AFAICT it _should_ all just 'work' (famous
last words).

Cc Juergen because Xen

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC patch 10/15] x86/entry: Move irq tracing to C code
  2019-09-23 11:49       ` Peter Zijlstra
@ 2019-09-23 11:55         ` Peter Zijlstra
  2019-09-23 12:10           ` Peter Zijlstra
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2019-09-23 11:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Andy Lutomirski, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm, linux-arch,
	jgross

On Mon, Sep 23, 2019 at 01:49:20PM +0200, Peter Zijlstra wrote:
> While walking the kids to school I wondered WTH we need to call
> TRACE_IRQS_OFF in the first place. If this is the return from exception
> path, interrupts had better be disabled already (in exception enter).
> 
> For entry_64.S we have:
> 
>   - idtentry_part; which does TRACE_IRQS_OFF at the start and error_exit
>     at the end.
> 
>   - xen_do_hypervisor_callback, xen_failsafe_callback -- which are
>     confusing.
> 
> So in the normal case, it appears we can simply do:
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index b7c3ea4cb19d..e9cf59ac554e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1368,8 +1368,6 @@ END(error_entry)
>  
>  ENTRY(error_exit)
>  	UNWIND_HINT_REGS
> -	DISABLE_INTERRUPTS(CLBR_ANY)
> -	TRACE_IRQS_OFF
>  	testb	$3, CS(%rsp)
>  	jz	retint_kernel
>  	jmp	retint_user
> 
> and all should be well. This leaves Xen...
> 
> For entry_32.S it looks like nothing uses 'resume_userspace' so that
> ENTRY can go away. Which then leaves:
> 
>  * ret_from_intr:
>   - common_spurious: TRACE_IRQS_OFF
>   - common_interrupt: TRACE_IRQS_OFF
>   - BUILD_INTERRUPT3: TRACE_IRQS_OFF
>   - xen_do_upcall: ...
> 
>  * ret_from_exception:
>   - xen_failsafe_callback: ...
>   - common_exception_read_cr2: TRACE_IRQS_OFF
>   - common_exception: TRACE_IRQS_OFF
>   - int3: TRACE_IRQS_OFF
> 
> Which again suggests:
> 
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index f83ca5aa8b77..7a19e7413a8e 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -825,9 +825,6 @@ END(ret_from_fork)
>  	cmpl	$USER_RPL, %eax
>  	jb	restore_all_kernel		# not returning to v8086 or userspace
>  
> -ENTRY(resume_userspace)
> -	DISABLE_INTERRUPTS(CLBR_ANY)
> -	TRACE_IRQS_OFF
>  	movl	%esp, %eax
>  	call	prepare_exit_to_usermode
>  	jmp	restore_all
> 
> with the notable exception (oh teh pun!) being Xen... _again_.
> 
> With these patchlets on, we'd want prepare_exit_to_usermode() to
> validate the IRQ state, but AFAICT it _should_ all just 'work' (famous
> last words).
> 
> Cc Juergen because Xen

Arrgh.. faults!! they do local_irq_enable() but never disable them
again. Arguably those functions should be symmetric and restore IF when
they muck with it instead of relying on the exit path fixing things up.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 10/15] x86/entry: Move irq tracing to C code
  2019-09-23 11:55         ` Peter Zijlstra
@ 2019-09-23 12:10           ` Peter Zijlstra
  2019-09-23 17:24             ` Andy Lutomirski
  0 siblings, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2019-09-23 12:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Andy Lutomirski, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm, linux-arch,
	jgross

On Mon, Sep 23, 2019 at 01:55:51PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 23, 2019 at 01:49:20PM +0200, Peter Zijlstra wrote:
> > While walking the kids to school I wondered WTH we need to call
> > TRACE_IRQS_OFF in the first place. If this is the return from exception
> > path, interrupts had better be disabled already (in exception enter).
> > 
> > For entry_64.S we have:
> > 
> >   - idtentry_part; which does TRACE_IRQS_OFF at the start and error_exit
> >     at the end.
> > 
> >   - xen_do_hypervisor_callback, xen_failsafe_callback -- which are
> >     confusing.
> > 
> > So in the normal case, it appears we can simply do:
> > 
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index b7c3ea4cb19d..e9cf59ac554e 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -1368,8 +1368,6 @@ END(error_entry)
> >  
> >  ENTRY(error_exit)
> >  	UNWIND_HINT_REGS
> > -	DISABLE_INTERRUPTS(CLBR_ANY)
> > -	TRACE_IRQS_OFF
> >  	testb	$3, CS(%rsp)
> >  	jz	retint_kernel
> >  	jmp	retint_user
> > 
> > and all should be well. This leaves Xen...
> > 
> > For entry_32.S it looks like nothing uses 'resume_userspace' so that
> > ENTRY can go away. Which then leaves:
> > 
> >  * ret_from_intr:
> >   - common_spurious: TRACE_IRQS_OFF
> >   - common_interrupt: TRACE_IRQS_OFF
> >   - BUILD_INTERRUPT3: TRACE_IRQS_OFF
> >   - xen_do_upcall: ...
> > 
> >  * ret_from_exception:
> >   - xen_failsafe_callback: ...
> >   - common_exception_read_cr2: TRACE_IRQS_OFF
> >   - common_exception: TRACE_IRQS_OFF
> >   - int3: TRACE_IRQS_OFF
> > 
> > Which again suggests:
> > 
> > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> > index f83ca5aa8b77..7a19e7413a8e 100644
> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -825,9 +825,6 @@ END(ret_from_fork)
> >  	cmpl	$USER_RPL, %eax
> >  	jb	restore_all_kernel		# not returning to v8086 or userspace
> >  
> > -ENTRY(resume_userspace)
> > -	DISABLE_INTERRUPTS(CLBR_ANY)
> > -	TRACE_IRQS_OFF
> >  	movl	%esp, %eax
> >  	call	prepare_exit_to_usermode
> >  	jmp	restore_all
> > 
> > with the notable exception (oh teh pun!) being Xen... _again_.
> > 
> > With these patchlets on, we'd want prepare_exit_to_usermode() to
> > validate the IRQ state, but AFAICT it _should_ all just 'work' (famous
> > last words).
> > 
> > Cc Juergen because Xen
> 
> Arrgh.. faults!! they do local_irq_enable() but never disable them
> again. Arguably those functions should be symmetric and restore IF when
> they muck with it instead of relying on the exit path fixing things up.

---
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index f83ca5aa8b77..7a19e7413a8e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -825,9 +825,6 @@ END(ret_from_fork)
 	cmpl	$USER_RPL, %eax
 	jb	restore_all_kernel		# not returning to v8086 or userspace
 
-ENTRY(resume_userspace)
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
 	movl	%esp, %eax
 	call	prepare_exit_to_usermode
 	jmp	restore_all
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index b7c3ea4cb19d..e9cf59ac554e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1368,8 +1368,6 @@ END(error_entry)
 
 ENTRY(error_exit)
 	UNWIND_HINT_REGS
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
 	testb	$3, CS(%rsp)
 	jz	retint_kernel
 	jmp	retint_user
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4bb0f8447112..663d44c68f67 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -276,6 +276,7 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
 			NOTIFY_STOP) {
 		cond_local_irq_enable(regs);
 		do_trap(trapnr, signr, str, regs, error_code, sicode, addr);
+		cond_local_irq_disable(regs);
 	}
 }
 
@@ -501,6 +502,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 		die("bounds", regs, error_code);
 	}
 
+	cond_local_irq_disable(regs);
 	return;
 
 exit_trap:
@@ -512,6 +514,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 	 * time..
 	 */
 	do_trap(X86_TRAP_BR, SIGSEGV, "bounds", regs, error_code, 0, NULL);
+	cond_local_irq_disable(regs);
 }
 
 dotraplinkage void
@@ -525,19 +528,19 @@ do_general_protection(struct pt_regs *regs, long error_code)
 
 	if (static_cpu_has(X86_FEATURE_UMIP)) {
 		if (user_mode(regs) && fixup_umip_exception(regs))
-			return;
+			goto exit_trap;
 	}
 
 	if (v8086_mode(regs)) {
 		local_irq_enable();
 		handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
-		return;
+		goto exit_trap;
 	}
 
 	tsk = current;
 	if (!user_mode(regs)) {
 		if (fixup_exception(regs, X86_TRAP_GP, error_code, 0))
-			return;
+			goto exit_trap;
 
 		tsk->thread.error_code = error_code;
 		tsk->thread.trap_nr = X86_TRAP_GP;
@@ -549,12 +552,12 @@ do_general_protection(struct pt_regs *regs, long error_code)
 		 */
 		if (!preemptible() && kprobe_running() &&
 		    kprobe_fault_handler(regs, X86_TRAP_GP))
-			return;
+			goto exit_trap;
 
 		if (notify_die(DIE_GPF, desc, regs, error_code,
 			       X86_TRAP_GP, SIGSEGV) != NOTIFY_STOP)
 			die(desc, regs, error_code);
-		return;
+		goto exit_trap;
 	}
 
 	tsk->thread.error_code = error_code;
@@ -563,6 +566,8 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	show_signal(tsk, SIGSEGV, "", desc, regs, error_code);
 
 	force_sig(SIGSEGV);
+exit_trap:
+	cond_local_irq_disable(regs);
 }
 NOKPROBE_SYMBOL(do_general_protection);
 
@@ -783,9 +788,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	if (v8086_mode(regs)) {
 		handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code,
 					X86_TRAP_DB);
-		cond_local_irq_disable(regs);
-		debug_stack_usage_dec();
-		goto exit;
+		goto exit_irq;
 	}
 
 	if (WARN_ON_ONCE((dr6 & DR_STEP) && !user_mode(regs))) {
@@ -802,6 +805,8 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	si_code = get_si_code(tsk->thread.debugreg6);
 	if (tsk->thread.debugreg6 & (DR_STEP | DR_TRAP_BITS) || user_icebp)
 		send_sigtrap(regs, error_code, si_code);
+
+exit_irq:
 	cond_local_irq_disable(regs);
 	debug_stack_usage_dec();
 
@@ -827,7 +832,7 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
 
 	if (!user_mode(regs)) {
 		if (fixup_exception(regs, trapnr, error_code, 0))
-			return;
+			goto exit_trap;
 
 		task->thread.error_code = error_code;
 		task->thread.trap_nr = trapnr;
@@ -835,7 +840,7 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
 		if (notify_die(DIE_TRAP, str, regs, error_code,
 					trapnr, SIGFPE) != NOTIFY_STOP)
 			die(str, regs, error_code);
-		return;
+		goto exit_trap;
 	}
 
 	/*
@@ -849,10 +854,12 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
 	si_code = fpu__exception_code(fpu, trapnr);
 	/* Retry when we get spurious exceptions: */
 	if (!si_code)
-		return;
+		goto exit_trap;
 
 	force_sig_fault(SIGFPE, si_code,
 			(void __user *)uprobe_get_trap_addr(regs));
+exit_trap:
+	cond_local_irq_disable(regs);
 }
 
 dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code)
@@ -889,6 +896,8 @@ do_device_not_available(struct pt_regs *regs, long error_code)
 
 		info.regs = regs;
 		math_emulate(&info);
+
+		cond_local_irq_disable(regs);
 		return;
 	}
 #endif
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9ceacd1156db..501cc36a3d6a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1500,10 +1500,13 @@ __do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
 		return;
 
 	/* Was the fault on kernel-controlled part of the address space? */
-	if (unlikely(fault_in_kernel_space(address)))
+	if (unlikely(fault_in_kernel_space(address))) {
 		do_kern_addr_fault(regs, hw_error_code, address);
-	else
+	} else {
 		do_user_addr_fault(regs, hw_error_code, address);
+		if (regs->flags & X86_EFLAGS_IF)
+			local_irq_disable();
+	}
 }
 NOKPROBE_SYMBOL(__do_page_fault);
 

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC patch 10/15] x86/entry: Move irq tracing to C code
  2019-09-23 12:10           ` Peter Zijlstra
@ 2019-09-23 17:24             ` Andy Lutomirski
  0 siblings, 0 replies; 43+ messages in thread
From: Andy Lutomirski @ 2019-09-23 17:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, LKML, X86 ML, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm list,
	linux-arch, Juergen Gross

On Mon, Sep 23, 2019 at 5:10 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Sep 23, 2019 at 01:55:51PM +0200, Peter Zijlstra wrote:
> > On Mon, Sep 23, 2019 at 01:49:20PM +0200, Peter Zijlstra wrote:
> > > While walking the kids to school I wondered WTH we need to call
> > > TRACE_IRQS_OFF in the first place. If this is the return from exception
> > > path, interrupts had better be disabled already (in exception enter).
> > >
> > > For entry_64.S we have:
> > >
> > >   - idtentry_part; which does TRACE_IRQS_OFF at the start and error_exit
> > >     at the end.
> > >
> > >   - xen_do_hypervisor_callback, xen_failsafe_callback -- which are
> > >     confusing.
> > >
> > > So in the normal case, it appears we can simply do:
> > >
> > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > index b7c3ea4cb19d..e9cf59ac554e 100644
> > > --- a/arch/x86/entry/entry_64.S
> > > +++ b/arch/x86/entry/entry_64.S
> > > @@ -1368,8 +1368,6 @@ END(error_entry)
> > >
> > >  ENTRY(error_exit)
> > >     UNWIND_HINT_REGS
> > > -   DISABLE_INTERRUPTS(CLBR_ANY)
> > > -   TRACE_IRQS_OFF
> > >     testb   $3, CS(%rsp)
> > >     jz      retint_kernel
> > >     jmp     retint_user
> > >
> > > and all should be well. This leaves Xen...
> > >
> > > For entry_32.S it looks like nothing uses 'resume_userspace' so that
> > > ENTRY can go away. Which then leaves:
> > >
> > >  * ret_from_intr:
> > >   - common_spurious: TRACE_IRQS_OFF
> > >   - common_interrupt: TRACE_IRQS_OFF
> > >   - BUILD_INTERRUPT3: TRACE_IRQS_OFF
> > >   - xen_do_upcall: ...
> > >
> > >  * ret_from_exception:
> > >   - xen_failsafe_callback: ...
> > >   - common_exception_read_cr2: TRACE_IRQS_OFF
> > >   - common_exception: TRACE_IRQS_OFF
> > >   - int3: TRACE_IRQS_OFF
> > >
> > > Which again suggests:
> > >
> > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> > > index f83ca5aa8b77..7a19e7413a8e 100644
> > > --- a/arch/x86/entry/entry_32.S
> > > +++ b/arch/x86/entry/entry_32.S
> > > @@ -825,9 +825,6 @@ END(ret_from_fork)
> > >     cmpl    $USER_RPL, %eax
> > >     jb      restore_all_kernel              # not returning to v8086 or userspace
> > >
> > > -ENTRY(resume_userspace)
> > > -   DISABLE_INTERRUPTS(CLBR_ANY)
> > > -   TRACE_IRQS_OFF
> > >     movl    %esp, %eax
> > >     call    prepare_exit_to_usermode
> > >     jmp     restore_all
> > >
> > > with the notable exception (oh teh pun!) being Xen... _again_.
> > >
> > > With these patchlets on, we'd want prepare_exit_to_usermode() to
> > > validate the IRQ state, but AFAICT it _should_ all just 'work' (famous
> > > last words).
> > >
> > > Cc Juergen because Xen
> >
> > Arrgh.. faults!! they do local_irq_enable() but never disable them
> > again. Arguably those functions should be symmetric and restore IF when
> > they muck with it instead of relying on the exit path fixing things up.
>
> ---
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index f83ca5aa8b77..7a19e7413a8e 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -825,9 +825,6 @@ END(ret_from_fork)
>         cmpl    $USER_RPL, %eax
>         jb      restore_all_kernel              # not returning to v8086 or userspace

...

Yes, please, but can you add an assertion under CONFIG_DEBUG_ENTRY
that IRQs are actually off?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest
  2019-09-19 15:03 ` [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest Thomas Gleixner
  2019-09-19 15:40   ` Paolo Bonzini
@ 2019-09-23 18:17   ` Andy Lutomirski
  2019-09-26 11:35   ` Miroslav Benes
  2 siblings, 0 replies; 43+ messages in thread
From: Andy Lutomirski @ 2019-09-23 18:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm list,
	linux-arch

On Thu, Sep 19, 2019 at 8:09 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Entering a guest is similar to exiting to user space. Pending work like
> handling signals, rescheduling, task work etc. needs to be handled before
> that.
>
> Provide generic infrastructure to avoid duplication of the same handling code
> all over the place.
>
> Update ARM64 struct kvm_vcpu_stat with a signal_exit member so the generic
> code compiles.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/arm64/include/asm/kvm_host.h |    1
>  include/linux/entry-common.h      |   66 ++++++++++++++++++++++++++++++++++++++
>  kernel/entry/common.c             |   44 +++++++++++++++++++++++++
>  3 files changed, 111 insertions(+)
>
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -409,6 +409,7 @@ struct kvm_vcpu_stat {
>         u64 wfi_exit_stat;
>         u64 mmio_exit_user;
>         u64 mmio_exit_kernel;
> +       u64 signal_exits;
>         u64 exits;
>  };
>
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -255,4 +255,70 @@ static inline void arch_syscall_exit_tra
>  /* Common syscall exit function */
>  void syscall_exit_to_usermode(struct pt_regs *regs, long syscall, long retval);
>
> +#if IS_ENABLED(CONFIG_KVM)
> +
> +#include <linux/kvm_host.h>
> +
> +#ifndef ARCH_EXIT_TO_GUESTMODE_WORK
> +# define ARCH_EXIT_TO_GUESTMODE_WORK   (0)
> +#endif
> +
> +#define EXIT_TO_GUESTMODE_WORK                                         \
> +       (_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_RESUME |     \
> +        ARCH_EXIT_TO_GUESTMODE_WORK)
> +
> +int core_exit_to_guestmode_work(struct kvm *kvm, struct kvm_vcpu *vcpu,
> +                               unsigned long ti_work);
> +
> +/**
> + * arch_exit_to_guestmode - Architecture specific exit to guest mode function
> + * @kvm:       Pointer to the guest instance
> + * @vcpu:      Pointer to current's VCPU data
> + * @ti_work:   Cached TIF flags gathered in exit_to_guestmode()
> + *
> + * Invoked from core_exit_to_guestmode_work(). Can be replaced by
> + * architecture specific code.
> + */
> +static inline int arch_exit_to_guestmode(struct kvm *kvm, struct kvm_vcpu *vcpu,
> +                                        unsigned long ti_work);

Can you add a comment about whether IRQs are supposed to be off (I
assume they are) and perhaps a lockdep assertion to verify it?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (15 preceding siblings ...)
  2019-09-20 15:12 ` [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Mark Rutland
@ 2019-09-23 18:18 ` Andy Lutomirski
  2019-09-24  6:50 ` Christian Borntraeger
  17 siblings, 0 replies; 43+ messages in thread
From: Andy Lutomirski @ 2019-09-23 18:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, X86 ML, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm list,
	linux-arch

On Thu, Sep 19, 2019 at 8:09 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> When working on a way to move out the posix cpu timer expiry out of the
> timer interrupt context, I noticed that KVM is not handling pending task
> work before entering a guest. A quick hack was to add that to the x86 KVM
> handling loop. The discussion ended with a request to make this a generic
> infrastructure possible with also moving the per arch implementations of
> the enter from and return to user space handling generic.
>
>   https://lore.kernel.org/r/89E42BCC-47A8-458B-B06A-D6A20D20512C@amacapital.net
>
> You asked for it, so don't complain that you have to review it :)
>
> The series implements the syscall enter/exit and the general exit to
> userspace work handling along with the pre guest enter functionality.
>
> The series converts x86 and ARM64. x86 is fully tested including selftests
> etc. ARM64 is only compile tested for now as my only ARM64 testbox is not
> available right now.

Other than the comments I sent so far, I like this series.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 02/15] x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY
  2019-09-20 23:39   ` Andy Lutomirski
@ 2019-09-23 20:43     ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-23 20:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm list, linux-arch

On Fri, 20 Sep 2019, Andy Lutomirski wrote:

> On Thu, Sep 19, 2019 at 8:09 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > Evaluating _TIF_NOHZ to decide whether to use the slow syscall entry path
> > is not only pointless, it's actually counterproductive:
> >
> >  1) Context tracking code is invoked unconditionally before that flag is
> >     evaluated.
> >
> >  2) If the flag is set the slow path is invoked for nothing due to #1
> 
> Can we also get rid of TIF_NOHZ on x86?

If we make the usage in context_tracking_cpu_set() conditional.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work
  2019-09-20 15:12 ` [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Mark Rutland
@ 2019-09-23 20:50   ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-09-23 20:50 UTC (permalink / raw)
  To: Mark Rutland
  Cc: LKML, x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Marc Zyngier, Paolo Bonzini, kvm, linux-arch,
	James Morse

On Fri, 20 Sep 2019, Mark Rutland wrote:
> I've been working on converting the arm64 entry code to C for a while
> now [1], gradually upstreaming the bits I can.
> 
> James has picked up some of that [2] as a prerequisite for some RAS
> error handling, and I think building the arm64 bits atop of that would
> be preferable. IIUC that should get posted as a series come -rc1.
> 
> Since there's immense scope for subtle breakage, I'd prefer that we do
> the arm64-specific asm->C conversion before migrating arm64 to generic
> code. That way us arm64 folk can ensure the asm->C conversion retains
> the existing behaviour, and it'll be easier for everyone to compare the
> arm64 and generic C implementations.

Right. It still would be nice to have some feedback on the general
approach.

That sais I'm happy to let you screw your entry code up yourself :)

Thanks

	tglx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work
  2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
                   ` (16 preceding siblings ...)
  2019-09-23 18:18 ` Andy Lutomirski
@ 2019-09-24  6:50 ` Christian Borntraeger
  17 siblings, 0 replies; 43+ messages in thread
From: Christian Borntraeger @ 2019-09-24  6:50 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch, linux-s390



On 19.09.19 17:03, Thomas Gleixner wrote:
> When working on a way to move out the posix cpu timer expiry out of the
> timer interrupt context, I noticed that KVM is not handling pending task
> work before entering a guest. A quick hack was to add that to the x86 KVM
> handling loop. The discussion ended with a request to make this a generic
> infrastructure possible with also moving the per arch implementations of
> the enter from and return to user space handling generic.
> 
>   https://lore.kernel.org/r/89E42BCC-47A8-458B-B06A-D6A20D20512C@amacapital.net
> 
> You asked for it, so don't complain that you have to review it :)
> 
> The series implements the syscall enter/exit and the general exit to
> userspace work handling along with the pre guest enter functionality.
> 
> The series converts x86 and ARM64. x86 is fully tested including selftests
> etc. ARM64 is only compile tested for now as my only ARM64 testbox is not
> available right now.

It seems that s390x would also need to look into TIF_NOTIFY_PENDING before
entering a KVM guest. Given that the s390x entry path is still in assembler
this might not be something to do quickly.

Would it make sense to actually start with a minimal solution (e.g. one that
provides notify_resume_pending like your original patch) as a fix. That would
also be simple to backport. And then we can do the proper rework on top.

Or do we consider anything that depends on TIF_NOTIFY_PENDING before entering
a guest as not important enough for stable?
After all the vcpu_run ioctl almost never returns to userspace and nothing 
was obviously broken.

Another question: Are there callbacks due to TIF_NOTIFY_PENDING that should
NOT happen as long as we stay in the vpcu loop?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 10/15] x86/entry: Move irq tracing to C code
  2019-09-19 15:03 ` [RFC patch 10/15] x86/entry: Move irq tracing to C code Thomas Gleixner
  2019-09-23  8:47   ` Peter Zijlstra
@ 2019-09-26  2:59   ` Josh Poimboeuf
  1 sibling, 0 replies; 43+ messages in thread
From: Josh Poimboeuf @ 2019-09-26  2:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch

On Thu, Sep 19, 2019 at 05:03:24PM +0200, Thomas Gleixner wrote:
>  ENTRY(error_exit)
> -	UNWIND_HINT_REGS
>  	DISABLE_INTERRUPTS(CLBR_ANY)
>  	TRACE_IRQS_OFF
> +	UNWIND_HINT_REGS
>  	testb	$3, CS(%rsp)
>  	jz	retint_kernel
>  	jmp	retint_user

I don't think this UNWIND_HINT_REGS needs to be moved?

-- 
Josh

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest
  2019-09-19 15:03 ` [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest Thomas Gleixner
  2019-09-19 15:40   ` Paolo Bonzini
  2019-09-23 18:17   ` Andy Lutomirski
@ 2019-09-26 11:35   ` Miroslav Benes
  2 siblings, 0 replies; 43+ messages in thread
From: Miroslav Benes @ 2019-09-26 11:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Peter Zijlstra, Andy Lutomirski, Catalin Marinas,
	Will Deacon, Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm,
	linux-arch, live-patching

> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h

[...]

> +#define EXIT_TO_GUESTMODE_WORK						\
> +	(_TIF_NEED_RESCHED | _TIF_SIGPENDING | _TIF_NOTIFY_RESUME |	\
> +	 ARCH_EXIT_TO_GUESTMODE_WORK)

[...]

> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
>
> +int core_exit_to_guestmode_work(struct kvm *kvm, struct kvm_vcpu *vcpu,
> +				unsigned long ti_work)
> +{
> +	/*
> +	 * Before returning to guest mode handle all pending work
> +	 */
> +	if (ti_work & _TIF_SIGPENDING) {
> +		vcpu->run->exit_reason = KVM_EXIT_INTR;
> +		vcpu->stat.signal_exits++;
> +		return -EINTR;
> +	}
> +
> +	if (ti_work & _TIF_NEED_RESCHED) {
> +		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> +		schedule();
> +		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +	}
> +
> +	if (ti_work & _TIF_PATCH_PENDING) {
> +		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> +		klp_update_patch_state(current);
> +		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +	}

If I am reading the code correctly, _TIF_PATCH_PENDING is not a part of 
EXIT_TO_GUESTMODE_WORK, so the handling code here would not be called on 
any arch as of now.

I also think that _TIF_PATCH_PENDING must not be handled here generally. 
It could break consistency guarantees when live patching KVM (and we do 
that from time to time).

Adding live-patching ML to CC.

Miroslav

> +	if (ti_work & _TIF_NOTIFY_RESUME) {
> +		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> +		clear_thread_flag(TIF_NOTIFY_RESUME);
> +		tracehook_notify_resume(NULL);
> +		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
> +	}
> +
> +	/* Any extra architecture specific work */
> +	return arch_exit_to_guestmode_work(kvm, vcpu, ti_work);
> +}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC patch 01/15] entry: Provide generic syscall entry functionality
  2019-09-20 23:38   ` Andy Lutomirski
@ 2019-10-20 11:49     ` Thomas Gleixner
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Gleixner @ 2019-10-20 11:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, X86 ML, Peter Zijlstra, Catalin Marinas, Will Deacon,
	Mark Rutland, Marc Zyngier, Paolo Bonzini, kvm list, linux-arch

On Fri, 20 Sep 2019, Andy Lutomirski wrote:
> On Thu, Sep 19, 2019 at 8:09 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > +#ifndef _TIF_AUDIT
> > +# define _TIF_AUDIT                    (0)
> > +#endif
> 
> I'm wondering if these should be __TIF (double-underscore) or
> MAYBE_TIF_ or something to avoid errors where people do flags |=
> TIF_WHATEVER and get surprised.

That's what exists today already. See arch/*/include/asm/threadinfo.h
 
> > +/**
> > + * syscall_enter_from_usermode - Check and handle work before invoking
> > + *                              a syscall
> > + * @regs:      Pointer to currents pt_regs
> > + * @syscall:   The syscall number
> > + *
> > + * Invoked from architecture specific syscall entry code with interrupts
> > + * enabled.
> > + *
> > + * Returns: The original or a modified syscall number
> > + */
> 
> Maybe document that it can return -1 to skip the syscall and that, if
> this happens, it may use syscall_set_error() or
> syscall_set_return_value() first.  If neither of those is called and
> -1 is returned, then the syscall will fail with ENOSYS.

Sure.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2019-10-20 11:49 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-19 15:03 [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Thomas Gleixner
2019-09-19 15:03 ` [RFC patch 01/15] entry: Provide generic syscall entry functionality Thomas Gleixner
2019-09-20 23:38   ` Andy Lutomirski
2019-10-20 11:49     ` Thomas Gleixner
2019-09-23  9:05   ` Mike Rapoport
2019-09-19 15:03 ` [RFC patch 02/15] x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY Thomas Gleixner
2019-09-20 23:39   ` Andy Lutomirski
2019-09-23 20:43     ` Thomas Gleixner
2019-09-19 15:03 ` [RFC patch 03/15] x86/entry: Use generic syscall entry function Thomas Gleixner
2019-09-20 23:41   ` Andy Lutomirski
2019-09-23  8:31     ` Peter Zijlstra
2019-09-23  8:40       ` Thomas Gleixner
2019-09-19 15:03 ` [RFC patch 04/15] arm64/entry: " Thomas Gleixner
2019-09-20 12:21   ` Catalin Marinas
2019-09-19 15:03 ` [RFC patch 05/15] entry: Provide generic syscall exit function Thomas Gleixner
2019-09-19 15:03 ` [RFC patch 06/15] x86/entry: Use generic syscall exit functionality Thomas Gleixner
2019-09-19 15:03 ` [RFC patch 07/15] arm64/syscall: Remove obscure flag check Thomas Gleixner
2019-09-20 14:29   ` Catalin Marinas
2019-09-19 15:03 ` [RFC patch 08/15] arm64/syscall: Use generic syscall exit functionality Thomas Gleixner
2019-09-19 15:03 ` [RFC patch 09/15] entry: Provide generic exit to usermode functionality Thomas Gleixner
2019-09-23  8:30   ` Peter Zijlstra
2019-09-19 15:03 ` [RFC patch 10/15] x86/entry: Move irq tracing to C code Thomas Gleixner
2019-09-23  8:47   ` Peter Zijlstra
2019-09-23 10:27     ` Thomas Gleixner
2019-09-23 11:49       ` Peter Zijlstra
2019-09-23 11:55         ` Peter Zijlstra
2019-09-23 12:10           ` Peter Zijlstra
2019-09-23 17:24             ` Andy Lutomirski
2019-09-26  2:59   ` Josh Poimboeuf
2019-09-19 15:03 ` [RFC patch 11/15] x86/entry: Use generic exit to usermode Thomas Gleixner
2019-09-19 15:03 ` [RFC patch 12/15] arm64/entry: " Thomas Gleixner
2019-09-19 15:03 ` [RFC patch 13/15] arm64/entry: Move FPU restore out of exit_to_usermode() loop Thomas Gleixner
2019-09-19 15:03 ` [RFC patch 14/15] workpending: Provide infrastructure for work before entering a guest Thomas Gleixner
2019-09-19 15:40   ` Paolo Bonzini
2019-09-20 11:48     ` Thomas Gleixner
2019-09-23 18:17   ` Andy Lutomirski
2019-09-26 11:35   ` Miroslav Benes
2019-09-19 15:03 ` [RFC patch 15/15] x86/kvm: Use GENERIC_EXIT_WORKPENDING Thomas Gleixner
2019-09-19 15:40   ` Paolo Bonzini
2019-09-20 15:12 ` [RFC patch 00/15] entry: Provide generic implementation for host and guest entry/exit work Mark Rutland
2019-09-23 20:50   ` Thomas Gleixner
2019-09-23 18:18 ` Andy Lutomirski
2019-09-24  6:50 ` Christian Borntraeger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.