From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: x86@kernel.org, linux-arch@vger.kernel.org,
Will Deacon <will@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
Mark Rutland <mark.rutland@arm.com>,
Kees Cook <keescook@chromium.org>,
Keno Fischer <keno@juliacomputing.com>,
Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org,
Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: [patch V4 02/15] entry: Provide generic syscall entry functionality
Date: Tue, 21 Jul 2020 12:57:08 +0200 [thread overview]
Message-ID: <20200721110808.455350746@linutronix.de> (raw)
In-Reply-To: 20200721105706.030914876@linutronix.de
On syscall entry certain work needs to be done:
- Establish state (lockdep, context tracking, tracing)
- Conditional work (ptrace, seccomp, audit...)
This code is needlessly duplicated and different in all
architectures.
Provide a generic version based on the x86 implementation which has all the
RCU and instrumentation bits right.
As interrupt/exception entry from user space needs parts of the same
functionality, provide a function for this as well.
syscall_enter_from_user_mode() and irqentry_enter_from_user_mode() must be
called right after the low level ASM entry. The calling code must be
non-instrumentable. After the functions returns state is correct and the
subsequent functions can be instrumented.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V4: Remove the architecture wrappers for seccomp and audit and use
the generic code (Kees)
Fix TIF_SYSCALL_AUDIT dummy define and remove TIF_SYSCALL_TRACE as it's
defined in all architectures.
V3: Adapt to noinstr changes. Add interrupt/exception entry function.
V2: Fix function documentation (Mike)
Add comment about return value (Andy)
---
arch/Kconfig | 3 +
include/linux/entry-common.h | 121 +++++++++++++++++++++++++++++++++++++++++++
kernel/Makefile | 1
kernel/entry/Makefile | 3 +
kernel/entry/common.c | 88 +++++++++++++++++++++++++++++++
5 files changed, 216 insertions(+)
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -27,6 +27,9 @@ config HAVE_IMA_KEXEC
config HOTPLUG_SMT
bool
+config GENERIC_ENTRY
+ bool
+
config OPROFILE
tristate "OProfile system profiling"
depends on PROFILING
--- /dev/null
+++ b/include/linux/entry-common.h
@@ -0,0 +1,121 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_ENTRYCOMMON_H
+#define __LINUX_ENTRYCOMMON_H
+
+#include <linux/tracehook.h>
+#include <linux/syscalls.h>
+#include <linux/seccomp.h>
+#include <linux/sched.h>
+
+#include <asm/entry-common.h>
+
+/*
+ * Define dummy _TIF work flags if not defined by the architecture or for
+ * disabled functionality.
+ */
+#ifndef _TIF_SYSCALL_EMU
+# define _TIF_SYSCALL_EMU (0)
+#endif
+
+#ifndef _TIF_SYSCALL_TRACEPOINT
+# define _TIF_SYSCALL_TRACEPOINT (0)
+#endif
+
+#ifndef _TIF_SECCOMP
+# define _TIF_SECCOMP (0)
+#endif
+
+#ifndef _TIF_SYSCALL_AUDIT
+# define _TIF_SYSCALL_AUDIT (0)
+#endif
+
+/*
+ * TIF flags handled in syscall_enter_from_usermode()
+ */
+#ifndef ARCH_SYSCALL_ENTER_WORK
+# define ARCH_SYSCALL_ENTER_WORK (0)
+#endif
+
+#define SYSCALL_ENTER_WORK \
+ (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | TIF_SECCOMP | \
+ _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_EMU | \
+ ARCH_SYSCALL_ENTER_WORK)
+
+/**
+ * arch_check_user_regs - Architecture specific sanity check for user mode regs
+ * @regs: Pointer to currents pt_regs
+ *
+ * Defaults to an empty implementation. Can be replaced by architecture
+ * specific code.
+ *
+ * Invoked from syscall_enter_from_user_mode() in the non-instrumentable
+ * section. Use __always_inline so the compiler cannot push it out of line
+ * and make it instrumentable.
+ */
+static __always_inline void arch_check_user_regs(struct pt_regs *regs);
+
+#ifndef arch_check_user_regs
+static __always_inline void arch_check_user_regs(struct pt_regs *regs) {}
+#endif
+
+/**
+ * arch_syscall_enter_tracehook - Wrapper around tracehook_report_syscall_entry()
+ * @regs: Pointer to currents pt_regs
+ *
+ * Returns: 0 on success or an error code to skip the syscall.
+ *
+ * Defaults to tracehook_report_syscall_entry(). Can be replaced by
+ * architecture specific code.
+ *
+ * Invoked from syscall_enter_from_user_mode()
+ */
+static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs);
+
+#ifndef arch_syscall_enter_tracehook
+static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs)
+{
+ return tracehook_report_syscall_entry(regs);
+}
+#endif
+
+/**
+ * syscall_enter_from_user_mode - Check and handle work before invoking
+ * a syscall
+ * @regs: Pointer to currents pt_regs
+ * @syscall: The syscall number
+ *
+ * Invoked from architecture specific syscall entry code with interrupts
+ * disabled. The calling code has to be non-instrumentable. When the
+ * function returns all state is correct and the subsequent functions can be
+ * instrumented.
+ *
+ * Returns: The original or a modified syscall number
+ *
+ * If the returned syscall number is -1 then the syscall should be
+ * skipped. In this case the caller may invoke syscall_set_error() or
+ * syscall_set_return_value() first. If neither of those are called and -1
+ * is returned, then the syscall will fail with ENOSYS.
+ *
+ * The following functionality is handled here:
+ *
+ * 1) Establish state (lockdep, RCU (context tracking), tracing)
+ * 2) TIF flag dependent invocations of arch_syscall_enter_tracehook(),
+ * __secure_computing(), trace_sys_enter()
+ * 3) Invocation of audit_syscall_entry()
+ */
+long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall);
+
+/**
+ * irqentry_enter_from_user_mode - Establish state before invoking the irq handler
+ * @regs: Pointer to currents pt_regs
+ *
+ * Invoked from architecture specific entry code with interrupts disabled.
+ * Can only be called when the interrupt entry came from user mode. The
+ * calling code must be non-instrumentable. When the function returns all
+ * state is correct and the subsequent functions can be instrumented.
+ *
+ * The function establishes state (lockdep, RCU (context tracking), tracing)
+ */
+void irqentry_enter_from_user_mode(struct pt_regs *regs);
+
+#endif
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -48,6 +48,7 @@ obj-y += irq/
obj-y += rcu/
obj-y += livepatch/
obj-y += dma/
+obj-y += entry/
obj-$(CONFIG_CHECKPOINT_RESTORE) += kcmp.o
obj-$(CONFIG_FREEZER) += freezer.o
--- /dev/null
+++ b/kernel/entry/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_GENERIC_ENTRY) += common.o
--- /dev/null
+++ b/kernel/entry/common.c
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/context_tracking.h>
+#include <linux/entry-common.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/syscalls.h>
+
+/**
+ * enter_from_user_mode - Establish state when coming from user mode
+ *
+ * Syscall/interrupt entry disables interrupts, but user mode is traced as
+ * interrupts enabled. Also with NO_HZ_FULL RCU might be idle.
+ *
+ * 1) Tell lockdep that interrupts are disabled
+ * 2) Invoke context tracking if enabled to reactivate RCU
+ * 3) Trace interrupts off state
+ */
+static __always_inline void enter_from_user_mode(struct pt_regs *regs)
+{
+ arch_check_user_regs(regs);
+ lockdep_hardirqs_off(CALLER_ADDR0);
+
+ CT_WARN_ON(ct_state() != CONTEXT_USER);
+ user_exit_irqoff();
+
+ instrumentation_begin();
+ trace_hardirqs_off_finish();
+ instrumentation_end();
+}
+
+static inline void syscall_enter_audit(struct pt_regs *regs, long syscall)
+{
+ if (unlikely(audit_context())) {
+ unsigned long args[6];
+
+ syscall_get_arguments(current, regs, args);
+ audit_syscall_entry(syscall, args[0], args[1], args[2], args[3]);
+ }
+}
+
+static long syscall_trace_enter(struct pt_regs *regs, long syscall,
+ unsigned long ti_work)
+{
+ long ret = 0;
+
+ /* Handle ptrace */
+ if (ti_work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) {
+ ret = arch_syscall_enter_tracehook(regs);
+ if (ret || (ti_work & _TIF_SYSCALL_EMU))
+ return -1L;
+ }
+
+ /* Do seccomp after ptrace, to catch any tracer changes. */
+ if (ti_work & _TIF_SECCOMP) {
+ ret = __secure_computing(NULL);
+ if (ret == -1L)
+ return ret;
+ }
+
+ if (unlikely(ti_work & _TIF_SYSCALL_TRACEPOINT))
+ trace_sys_enter(regs, syscall);
+
+ syscall_enter_audit(regs, syscall);
+
+ return ret ? : syscall;
+}
+
+noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
+{
+ unsigned long ti_work;
+
+ enter_from_user_mode(regs);
+ instrumentation_begin();
+
+ local_irq_enable();
+ ti_work = READ_ONCE(current_thread_info()->flags);
+ if (ti_work & SYSCALL_ENTER_WORK)
+ syscall = syscall_trace_enter(regs, syscall, ti_work);
+ instrumentation_end();
+
+ return syscall;
+}
+
+noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
+{
+ enter_from_user_mode(regs);
+}
next prev parent reply other threads:[~2020-07-21 11:08 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-21 10:57 [patch V4 00/15] entry, x86, kvm: Generic entry/exit functionality for host and guest Thomas Gleixner
2020-07-21 10:57 ` [patch V4 01/15] seccomp: Provide stub for __secure_computing() Thomas Gleixner
2020-07-21 21:21 ` Kees Cook
2020-07-21 10:57 ` Thomas Gleixner [this message]
2020-07-21 21:38 ` [patch V4 02/15] entry: Provide generic syscall entry functionality Kees Cook
2020-07-22 7:34 ` Thomas Gleixner
2020-07-22 7:54 ` peterz
2020-07-21 10:57 ` [patch V4 03/15] entry: Provide generic syscall exit function Thomas Gleixner
2020-07-27 22:37 ` Andy Lutomirski
2020-07-21 10:57 ` [patch V4 04/15] entry: Provide generic interrupt entry/exit code Thomas Gleixner
2020-07-27 22:39 ` Andy Lutomirski
2020-07-29 12:18 ` Thomas Gleixner
2020-07-21 10:57 ` [patch V4 05/15] entry: Provide infrastructure for work before exiting to guest mode Thomas Gleixner
2020-07-21 10:57 ` [patch V4 06/15] x86/entry: Consolidate check_user_regs() Thomas Gleixner
2020-07-27 22:39 ` Andy Lutomirski
2020-07-21 10:57 ` [patch V4 07/15] x86/entry: Consolidate 32/64 bit syscall entry Thomas Gleixner
2020-07-21 10:57 ` [patch V4 08/15] x86/entry: Move user return notifier out of loop Thomas Gleixner
2020-07-21 10:57 ` [patch V4 09/15] x86/ptrace: Provide pt_regs helper for entry/exit Thomas Gleixner
2020-07-21 10:57 ` [patch V4 10/15] x86/entry: Use generic syscall entry function Thomas Gleixner
2020-07-21 21:47 ` Kees Cook
2020-07-22 18:25 ` Thomas Gleixner
2020-07-21 10:57 ` [patch V4 11/15] x86/entry: Use generic syscall exit functionality Thomas Gleixner
2020-07-21 21:47 ` Kees Cook
2020-07-21 10:57 ` [patch V4 12/15] x86/entry: Cleanup idtentry_entry/exit_user Thomas Gleixner
2020-07-21 21:48 ` Kees Cook
2020-07-21 10:57 ` [patch V4 13/15] x86/entry: Use generic interrupt entry/exit code Thomas Gleixner
2020-07-21 10:57 ` [patch V4 14/15] x86/entry: Cleanup idtentry_enter/exit Thomas Gleixner
2020-07-21 21:48 ` Kees Cook
2020-07-21 10:57 ` [patch V4 15/15] x86/kvm: Use generic exit to guest work function Thomas Gleixner
2020-07-21 20:27 ` Sean Christopherson
2020-07-22 7:40 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200721110808.455350746@linutronix.de \
--to=tglx@linutronix.de \
--cc=arnd@arndb.de \
--cc=keescook@chromium.org \
--cc=keno@juliacomputing.com \
--cc=krisman@collabora.com \
--cc=kvm@vger.kernel.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=pbonzini@redhat.com \
--cc=will@kernel.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).