* [PATCH 0/7] use struct pt_regs based syscall calling for x86-64
@ 2018-03-30 9:37 Dominik Brodowski
2018-03-30 9:37 ` [PATCH 1/7] x86: don't pointlessly reload the system call number Dominik Brodowski
` (7 more replies)
0 siblings, 8 replies; 14+ messages in thread
From: Dominik Brodowski @ 2018-03-30 9:37 UTC
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Andi Kleen, Andrew Morton,
Andy Lutomirski, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
Ingo Molnar, Peter Zijlstra, Thomas Gleixner, x86
On top of all the patches which remove in-kernel calls to syscall functions
sent out yesterday[*], it now becomes easy for architectures to re-define the
syscall calling convention. For x86, this may be used to merely decode those
entries from struct pt_regs which are needed for a specific syscall.
[*] http://lkml.kernel.org/r/20180329112426.23043-1-linux@dominikbrodowski.net
This approach avoids leaking random user-provided register content down
the call chain. Therefore, the last patch of this series extends the
register clearing in the entry path to a few more registers.
To exemplify: sys_recv() is a classic 4-parameter syscall. For this syscall,
the SYSCALL_DEFINEx() macro creates the following stub:
asmlinkage long sys_recv(struct pt_regs *regs)
{
return SyS_recv(regs->di, regs->si, regs->dx, regs->r10);
}
The assembly of that function then becomes, in slightly reordered fashion:
<sys_recv>:
callq <__fentry__>
/* decode regs->di, ->si, ->dx and ->r10 */
mov 0x70(%rdi),%rdi
mov 0x68(%rdi),%rsi
mov 0x60(%rdi),%rdx
mov 0x38(%rdi),%rcx
[ SyS_recv() is inlined here by the compiler, as it is tiny ]
/* clear %r9 and %r8, the 5th and 6th args */
xor %r9d,%r9d
xor %r8d,%r8d
/* do the actual work */
callq __sys_recvfrom
/* cleanup and return */
cltq
retq
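For anyone who wants to poke at the idea outside the kernel, here is a minimal user-space sketch of the stub pattern shown above. It is only an illustration under stated assumptions: struct mock_pt_regs, mock_SyS_recv() and mock_sys_recv() are made-up stand-ins and not kernel symbols, and the "work" done is a dummy sum.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up stand-in for the kernel's struct pt_regs; only the six
 * syscall argument registers are modelled. */
struct mock_pt_regs {
	unsigned long di, si, dx, r10, r8, r9;
};

/* Stand-in for SyS_recv(): it takes exactly the four arguments it needs. */
static long mock_SyS_recv(long fd, long buf, long len, long flags)
{
	/* Dummy work: fold the arguments so a caller can check them. */
	return fd + buf + len + flags;
}

/* The stub: its only parameter is the register snapshot. */
long mock_sys_recv(const struct mock_pt_regs *regs)
{
	assert(regs != NULL);

	/* r8 and r9 are never read, so whatever user space left in them
	 * is not passed down the call chain. */
	return mock_SyS_recv((long)regs->di, (long)regs->si,
			     (long)regs->dx, (long)regs->r10);
}
```

Calling mock_sys_recv() with arbitrary values in r8 and r9 yields the same result as with zeroes there, which is the property the stub is meant to provide.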
For IA32_EMULATION and X32, additional care needs to be taken as they use
different registers to pass parameters to syscalls; vsyscalls need to be
modified to use this new calling convention as well.
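To make the register difference concrete, here is a rough user-space sketch of the two decodings. It assumes the usual conventions (di, si, dx, r10, r8, r9 for 64-bit syscalls; bx, cx, dx, si, di, bp for i386 syscalls, with only the low 32 bits being meaningful). All mock_* names are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up stand-in for struct pt_regs; only argument registers are
 * modelled. */
struct mock_pt_regs {
	uint64_t bx, cx, dx, si, di, bp;	/* also used by the i386 ABI */
	uint64_t r10, r8, r9;			/* 64-bit ABI only */
};

struct mock_args {
	uint64_t a[6];
};

/* 64-bit convention: di, si, dx, r10, r8, r9. */
struct mock_args mock_decode_64(const struct mock_pt_regs *regs)
{
	assert(regs != NULL);

	struct mock_args args = {
		{ regs->di, regs->si, regs->dx,
		  regs->r10, regs->r8, regs->r9 }
	};
	return args;
}

/* i386 convention: bx, cx, dx, si, di, bp, truncated to 32 bits. */
struct mock_args mock_decode_ia32(const struct mock_pt_regs *regs)
{
	assert(regs != NULL);

	struct mock_args args = {
		{ (uint32_t)regs->bx, (uint32_t)regs->cx, (uint32_t)regs->dx,
		  (uint32_t)regs->si, (uint32_t)regs->di, (uint32_t)regs->bp }
	};
	return args;
}
```

The same register snapshot therefore yields two different argument lists, which is why the compat stubs cannot simply reuse the 64-bit stub.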
This actual conversion of x86 syscalls is heavily based on a proof-of-concept
by Linus[*]. This patchset here differs, for example, as it provides a generic
config symbol ARCH_HAS_SYSCALL_WRAPPER, introduces <asm/syscall_wrapper.h>,
splits up the patch into several parts, and adds the actual register clearing.
[*] Accessible at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git WIP-syscall
It contains an additional patch
x86: avoid per-cpu system call trampoline
which is not included in my series as it addresses a different
issue, but may be of interest to the x86 maintainers as well.
Compared to v4.16-rc5 baseline and on a random kernel config, these patches
(in combination with the large do-not-call-syscalls-in-the-kernel series)
lead to a minuscule increase in text (+0.005%) and data (+0.11%) size on a
pure 64bit system,
text data bss dec hex filename
18853337 9535476 938380 29327193 1bf7f59 vmlinux-orig
18854227 9546100 938380 29338707 1bfac53 vmlinux,
with IA32_EMULATION and X32 enabled, the situation is just a little bit worse
for text size (+0.009%) and data (+0.38%) size.
text data bss dec hex filename
18902496 9603676 938444 29444616 1c14a08 vmlinux-orig
18904136 9640604 938444 29483184 1c1e0b0 vmlinux.
The 64bit part of this series has worked flawlessly on my local system for a
few weeks. IA32_EMULATION and x32 have passed some basic testing as well, but
have not yet been tested as extensively as x86-64. Pure i386 kernels are left
as-is, as they use a different asmlinkage anyway.
A few questions remain, from important stuff to bikeshedding:
1) Is it acceptable to pass the existing struct pt_regs to the sys_*()
kernel functions in emulate_vsyscall(), or should it use a hand-crafted
struct pt_regs instead?
2) Is it the right approach to generate the __sys32_ia32_*() names to
include in the syscall table on-the-fly, or should they all be listed
in arch/x86/entry/syscalls/syscall_32.tbl ?
3) I have chosen to name the default 64-bit syscall stub sys_*(), same as
the "normal" syscall, and the IA32_EMULATION compat syscall stub
compat_sys_*(), same as the "normal" compat syscall. Though this
might cause some confusion, as the "same" function uses a different
calling convention and different parameters on x86, it has the
advantages that
- the kernel *has* a function sys_*() implementing the syscall,
so those curious in stack traces etc. will find it in plain
sight,
- it is easier to handle in the syscall table generation, and
- error injection works the same.
The whole series is available at
https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-WIP
Thanks,
Dominik
Dominik Brodowski (6):
syscalls: introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER
syscalls/x86: use struct pt_regs based syscall calling for 64bit
syscalls
syscalls: prepare ARCH_HAS_SYSCALL_WRAPPER for compat syscalls
syscalls/x86: use struct pt_regs based syscall calling for
IA32_EMULATION and x32
syscalls/x86: unconditionally enable struct pt_regs based syscalls on
x86_64
x86/entry/64: extend register clearing on syscall entry to lower
registers
Linus Torvalds (1):
x86: don't pointlessly reload the system call number
arch/x86/Kconfig | 1 +
arch/x86/entry/calling.h | 2 +
arch/x86/entry/common.c | 20 ++--
arch/x86/entry/entry_64.S | 3 +-
arch/x86/entry/entry_64_compat.S | 6 ++
arch/x86/entry/syscall_32.c | 15 ++-
arch/x86/entry/syscall_64.c | 6 +-
arch/x86/entry/syscalls/syscall_64.tbl | 74 ++++++-------
arch/x86/entry/syscalls/syscalltbl.sh | 8 ++
arch/x86/entry/vsyscall/vsyscall_64.c | 14 +--
arch/x86/include/asm/syscall.h | 4 +
arch/x86/include/asm/syscall_wrapper.h | 189 +++++++++++++++++++++++++++++++++
arch/x86/include/asm/syscalls.h | 17 ++-
include/linux/compat.h | 22 ++++
include/linux/syscalls.h | 25 ++++-
init/Kconfig | 10 ++
kernel/sys_ni.c | 10 ++
kernel/time/posix-stubs.c | 10 ++
18 files changed, 365 insertions(+), 71 deletions(-)
create mode 100644 arch/x86/include/asm/syscall_wrapper.h
--
2.16.3
* [PATCH 1/7] x86: don't pointlessly reload the system call number
2018-03-30 9:37 [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Dominik Brodowski
@ 2018-03-30 9:37 ` Dominik Brodowski
2018-03-30 9:37 ` [PATCH 2/7] syscalls: introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER Dominik Brodowski
` (6 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Dominik Brodowski @ 2018-03-30 9:37 UTC
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Andi Kleen, x86
From: Linus Torvalds <torvalds@linux-foundation.org>
We have it in a register in the low-level asm, just pass it in as an
argument rather than have do_syscall_64() load it back in from the
ptregs pointer.
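As a rough illustration of the dispatch shape after this change, here is a user-space sketch. It is not the kernel code: the table, the mask value and all mock_* names are invented, the speculation barrier (array_index_nospec) is left out, and the -ENOSYS default is set inside the function only to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_NR_SYSCALLS	2
#define MOCK_X32_BIT		0x40000000UL	/* flag bit carried in nr */
#define MOCK_SYSCALL_MASK	(~MOCK_X32_BIT)

/* Made-up stand-in for struct pt_regs. */
struct mock_pt_regs {
	unsigned long ax;	/* return value slot */
	unsigned long di;	/* first argument */
};

typedef long (*mock_call_t)(unsigned long arg);

static long mock_inc(unsigned long arg) { return (long)arg + 1; }
static long mock_dbl(unsigned long arg) { return (long)arg * 2; }

static const mock_call_t mock_table[MOCK_NR_SYSCALLS] = {
	mock_inc, mock_dbl
};

/* nr arrives as an argument instead of being re-read from regs. */
void mock_do_syscall(unsigned long nr, struct mock_pt_regs *regs)
{
	assert(regs != NULL);

	regs->ax = (unsigned long)-ENOSYS;	/* default for unknown numbers */

	nr &= MOCK_SYSCALL_MASK;		/* mask once, up front */
	if (nr < MOCK_NR_SYSCALLS)
		regs->ax = (unsigned long)mock_table[nr](regs->di);
}
```

Masking nr once before the bounds check mirrors the second hunk of the patch, where the repeated "nr & __SYSCALL_MASK" expression is folded into a single statement.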
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: x86@kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
arch/x86/entry/common.c | 12 ++++++------
arch/x86/entry/entry_64.S | 3 ++-
2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 74f6eee15179..a8b066dbbf48 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -266,14 +266,13 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
}
#ifdef CONFIG_X86_64
-__visible void do_syscall_64(struct pt_regs *regs)
+__visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
{
- struct thread_info *ti = current_thread_info();
- unsigned long nr = regs->orig_ax;
+ struct thread_info *ti;
enter_from_user_mode();
local_irq_enable();
-
+ ti = current_thread_info();
if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
nr = syscall_trace_enter(regs);
@@ -282,8 +281,9 @@ __visible void do_syscall_64(struct pt_regs *regs)
* table. The only functional difference is the x32 bit in
* regs->orig_ax, which changes the behavior of some syscalls.
*/
- if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
- nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
+ nr &= __SYSCALL_MASK;
+ if (likely(nr < NR_syscalls)) {
+ nr = array_index_nospec(nr, NR_syscalls);
regs->ax = sys_call_table[nr](
regs->di, regs->si, regs->dx,
regs->r10, regs->r8, regs->r9);
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 805f52703ee3..c843b3d69048 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -233,7 +233,8 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
TRACE_IRQS_OFF
/* IRQs are off. */
- movq %rsp, %rdi
+ movq %rax, %rdi
+ movq %rsp, %rsi
call do_syscall_64 /* returns with IRQs disabled */
TRACE_IRQS_IRETQ /* we're about to change IF */
--
2.16.3
* [PATCH 2/7] syscalls: introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER
2018-03-30 9:37 [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Dominik Brodowski
2018-03-30 9:37 ` [PATCH 1/7] x86: don't pointlessly reload the system call number Dominik Brodowski
@ 2018-03-30 9:37 ` Dominik Brodowski
2018-03-30 9:37 ` [PATCH 3/7] syscalls/x86: use struct pt_regs based syscall calling for 64bit syscalls Dominik Brodowski
` (5 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Dominik Brodowski @ 2018-03-30 9:37 UTC
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Thomas Gleixner,
H. Peter Anvin, Andi Kleen, Ingo Molnar, Andrew Morton, Al Viro
It may be useful for an architecture to override the definitions of the
SYSCALL_DEFINE0() and __SYSCALL_DEFINEx() macros in <linux/syscalls.h>,
in particular to use a different calling convention for syscalls.
This patch provides a mechanism to do so: It introduces
CONFIG_ARCH_HAS_SYSCALL_WRAPPER. If it is enabled, <asm/syscall_wrapper.h>
is included in <linux/syscalls.h> and may be used to define the macros
mentioned above. Moreover, as the syscall calling convention may be
different if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is set, the syscall function
prototypes in <linux/syscalls.h> are #ifndef'd out in that case.
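The override mechanism boils down to an "#ifndef ... #define default ... #endif" guard around the generic macro. A small stand-alone sketch of that pattern follows; everything prefixed with MOCK_ or mock_ is invented for illustration, and the three parts would live in separate headers and source files in practice.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up stand-in for struct pt_regs with a single argument register. */
struct mock_pt_regs {
	unsigned long di;
};

/* Part 1: what a (hypothetical) arch wrapper header contributes. */
#define MOCK_ARCH_HAS_SYSCALL_WRAPPER 1

#ifdef MOCK_ARCH_HAS_SYSCALL_WRAPPER
#define MOCK_SYSCALL_DEFINE1(name, type, arg)				\
	static long mock_do_##name(type arg);				\
	long mock_sys_##name(const struct mock_pt_regs *regs)		\
	{								\
		assert(regs != NULL);					\
		return mock_do_##name((type)regs->di);			\
	}								\
	static long mock_do_##name(type arg)
#endif

/* Part 2: the generic header; the default is used only if the arch
 * header did not already define the macro. */
#ifndef MOCK_SYSCALL_DEFINE1
#define MOCK_SYSCALL_DEFINE1(name, type, arg)	long mock_sys_##name(type arg)
#endif

/* Part 3: a syscall definition, written once for both conventions. */
MOCK_SYSCALL_DEFINE1(twice, long, value)
{
	return 2 * value;
}
```

With the override active, mock_sys_twice() takes a register snapshot; with the first part removed, the very same definition compiles to a function taking a plain long. That change of prototype is the reason the hand-written sys_*() prototypes have to be hidden when the wrapper is enabled.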
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
include/linux/syscalls.h | 23 +++++++++++++++++++++++
init/Kconfig | 7 +++++++
2 files changed, 30 insertions(+)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b961184f597a..503ab245d4ce 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -81,6 +81,17 @@ union bpf_attr;
#include <linux/key.h>
#include <trace/syscall.h>
+#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
+/*
+ * It may be useful for an architecture to override the definitions of the
+ * SYSCALL_DEFINE0() and __SYSCALL_DEFINEx() macros, in particular to use a
+ * different calling convention for syscalls. To allow for that, the prototypes
+ * for the sys_*() functions below will *not* be included if
+ * CONFIG_ARCH_HAS_SYSCALL_WRAPPER is enabled.
+ */
+#include <asm/syscall_wrapper.h>
+#endif /* CONFIG_ARCH_HAS_SYSCALL_WRAPPER */
+
/*
* __MAP - apply a macro to syscall arguments
* __MAP(n, m, t1, a1, t2, a2, ..., tn, an) will expand to
@@ -189,11 +200,13 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
}
#endif
+#ifndef SYSCALL_DEFINE0
#define SYSCALL_DEFINE0(sname) \
SYSCALL_METADATA(_##sname, 0); \
asmlinkage long sys_##sname(void); \
ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); \
asmlinkage long sys_##sname(void)
+#endif /* SYSCALL_DEFINE0 */
#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
@@ -209,6 +222,8 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#define __PROTECT(...) asmlinkage_protect(__VA_ARGS__)
+
+#ifndef __SYSCALL_DEFINEx
#define __SYSCALL_DEFINEx(x, name, ...) \
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
__attribute__((alias(__stringify(SyS##name)))); \
@@ -223,6 +238,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
return ret; \
} \
static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#endif /* __SYSCALL_DEFINEx */
/*
* Called before coming back to user-mode. Returning to user-mode with an
@@ -252,7 +268,12 @@ static inline void addr_limit_user_check(void)
* Please note that these prototypes here are only provided for information
* purposes, for static analysis, and for linking from the syscall table.
* These functions should not be called elsewhere from kernel code.
+ *
+ * As the syscall calling convention may be different from the default
+ * for architectures overriding the syscall calling convention, do not
+ * include the prototypes if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is enabled.
*/
+#ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
asmlinkage long sys_io_setup(unsigned nr_reqs, aio_context_t __user *ctx);
asmlinkage long sys_io_destroy(aio_context_t ctx);
asmlinkage long sys_io_submit(aio_context_t, long,
@@ -1076,6 +1097,8 @@ asmlinkage long sys_old_mmap(struct mmap_arg_struct __user *arg);
*/
asmlinkage long sys_ni_syscall(void);
+#endif /* CONFIG_ARCH_HAS_SYSCALL_WRAPPER */
+
/*
* Kernel code should not call syscalls (i.e., sys_xyzyyz()) directly.
diff --git a/init/Kconfig b/init/Kconfig
index e37f4b2a6445..6079629be211 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1924,3 +1924,10 @@ source "kernel/Kconfig.locks"
config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
bool
+
+# It may be useful for an architecture to override the definitions of the
+# SYSCALL_DEFINE() and __SYSCALL_DEFINEx() macros in <linux/syscalls.h>,
+# in particular to use a different calling convention for syscalls.
+config ARCH_HAS_SYSCALL_WRAPPER
+ def_bool n
+ depends on !COMPAT
--
2.16.3
* [PATCH 3/7] syscalls/x86: use struct pt_regs based syscall calling for 64bit syscalls
2018-03-30 9:37 [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Dominik Brodowski
2018-03-30 9:37 ` [PATCH 1/7] x86: don't pointlessly reload the system call number Dominik Brodowski
2018-03-30 9:37 ` [PATCH 2/7] syscalls: introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER Dominik Brodowski
@ 2018-03-30 9:37 ` Dominik Brodowski
2018-03-30 9:37 ` [PATCH 4/7] syscalls: prepare ARCH_HAS_SYSCALL_WRAPPER for compat syscalls Dominik Brodowski
` (4 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Dominik Brodowski @ 2018-03-30 9:37 UTC
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Thomas Gleixner, Andi Kleen,
Ingo Molnar, Andrew Morton, Al Viro, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, Peter Zijlstra, H. Peter Anvin, x86
Let's make use of ARCH_HAS_SYSCALL_WRAPPER on pure 64bit x86-64 systems:
Each syscall defines a stub which takes struct pt_regs as its only
argument. It decodes just those parameters it needs, e.g.:
asmlinkage long sys_xyzzy(struct pt_regs *regs)
{
return SyS_xyzzy(regs->di, regs->si, regs->dx);
}
This approach avoids leaking random user-provided register content down
the call chain.
For example, for sys_recv() which is a 4-parameter syscall, the assembly
now is (in slightly reordered fashion):
<sys_recv>:
callq <__fentry__>
/* decode regs->di, ->si, ->dx and ->r10 */
mov 0x70(%rdi),%rdi
mov 0x68(%rdi),%rsi
mov 0x60(%rdi),%rdx
mov 0x38(%rdi),%rcx
[ SyS_recv() is automatically inlined by the compiler,
as it is not [yet] used anywhere else ]
/* clear %r9 and %r8, the 5th and 6th args */
xor %r9d,%r9d
xor %r8d,%r8d
/* do the actual work */
callq __sys_recvfrom
/* cleanup and return */
cltq
retq
The only valid place in an x86-64 kernel which rightfully calls
a syscall function on its own -- vsyscall -- needs to be modified
to pass struct pt_regs onwards as well.
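A small user-space sketch of what that means for the getcpu case, where the third argument has to be NULL: since the stub decodes its arguments from the register snapshot, the caller clears the dx slot before handing the snapshot on. The mock_* names and the demo values are invented; this only models the calling pattern, not the real vsyscall emulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up stand-in for struct pt_regs; pointer-sized slots. */
struct mock_pt_regs {
	uintptr_t di, si, dx;
};

/* Stand-in for the real work; reports whether a cache pointer was seen. */
static long mock_getcpu(unsigned *cpu, unsigned *node, void *cache)
{
	if (cpu)
		*cpu = 3;	/* arbitrary demo value */
	if (node)
		*node = 1;	/* arbitrary demo value */
	return cache ? 1 : 0;
}

/* pt_regs-style stub: decodes di, si and dx on its own. */
long mock_sys_getcpu(const struct mock_pt_regs *regs)
{
	assert(regs != NULL);
	return mock_getcpu((unsigned *)regs->di, (unsigned *)regs->si,
			   (void *)regs->dx);
}

/* Emulation path: force the third argument to NULL, then pass regs on. */
long mock_emulate_vsyscall_getcpu(struct mock_pt_regs *regs)
{
	assert(regs != NULL);
	regs->dx = 0;	/* note: this modifies the caller's snapshot */
	return mock_sys_getcpu(regs);
}
```

The write to regs->dx is visible to whoever owns the snapshot afterwards. Whether that is acceptable, or whether a separately built snapshot should be passed instead, is a design choice this sketch does not settle.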
This patch is based on an original proof-of-concept
From: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and was split up and heavily modified by me, in particular to base it on
ARCH_HAS_SYSCALL_WRAPPER, to limit it to 64bit-only for the time being,
and to update the vsyscall to the new calling convention.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
arch/x86/Kconfig | 5 +++
arch/x86/entry/common.c | 4 ++
arch/x86/entry/syscall_64.c | 9 ++++-
arch/x86/entry/vsyscall/vsyscall_64.c | 16 ++++++++
arch/x86/include/asm/syscall.h | 4 ++
arch/x86/include/asm/syscall_wrapper.h | 69 ++++++++++++++++++++++++++++++++++
arch/x86/include/asm/syscalls.h | 7 ++++
include/linux/syscalls.h | 2 +-
8 files changed, 113 insertions(+), 3 deletions(-)
create mode 100644 arch/x86/include/asm/syscall_wrapper.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0fa71a78ec99..a5db03705452 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2957,3 +2957,8 @@ source "crypto/Kconfig"
source "arch/x86/kvm/Kconfig"
source "lib/Kconfig"
+
+config SYSCALL_PTREGS
+ def_bool y
+ depends on X86_64 && !COMPAT
+ select ARCH_HAS_SYSCALL_WRAPPER
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index a8b066dbbf48..e1b91bffa988 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -284,9 +284,13 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
nr &= __SYSCALL_MASK;
if (likely(nr < NR_syscalls)) {
nr = array_index_nospec(nr, NR_syscalls);
+#ifdef CONFIG_SYSCALL_PTREGS
+ regs->ax = sys_call_table[nr](regs);
+#else
regs->ax = sys_call_table[nr](
regs->di, regs->si, regs->dx,
regs->r10, regs->r8, regs->r9);
+#endif
}
syscall_return_slowpath(regs);
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index c176d2fab1da..b4e724777a7d 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -7,14 +7,19 @@
#include <asm/asm-offsets.h>
#include <asm/syscall.h>
+#ifdef CONFIG_SYSCALL_PTREGS
+/* this is a lie, but it does not hurt as sys_ni_syscall just returns -ENOSYS */
+extern asmlinkage long sys_ni_syscall(struct pt_regs *);
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(struct pt_regs *);
+#else /* CONFIG_SYSCALL_PTREGS */
+extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#endif /* CONFIG_SYSCALL_PTREGS */
#include <asm/syscalls_64.h>
#undef __SYSCALL_64
#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
-extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
-
asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
/*
* Smells like a compiler bug -- it doesn't work
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 8560ef68a9d6..9fad68899f82 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -227,19 +227,35 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
ret = -EFAULT;
switch (vsyscall_nr) {
case 0:
+#ifdef CONFIG_SYSCALL_PTREGS
+ /* this decodes regs->di and regs->si on its own */
+ ret = sys_gettimeofday(regs);
+#else
ret = sys_gettimeofday(
(struct timeval __user *)regs->di,
(struct timezone __user *)regs->si);
+#endif /* CONFIG_SYSCALL_PTREGS */
break;
case 1:
+#ifdef CONFIG_SYSCALL_PTREGS
+ /* this decodes regs->di on its own */
+ ret = sys_time(regs);
+#else
ret = sys_time((time_t __user *)regs->di);
+#endif /* CONFIG_SYSCALL_PTREGS */
break;
case 2:
+#ifdef CONFIG_SYSCALL_PTREGS
+ /* this decodes regs->di, regs->si and regs->dx on its own */
+ regs->dx = 0;
+ ret = sys_getcpu(regs);
+#else
ret = sys_getcpu((unsigned __user *)regs->di,
(unsigned __user *)regs->si,
NULL);
+#endif /* CONFIG_SYSCALL_PTREGS */
break;
}
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 03eedc21246d..8702c7951bc7 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -20,9 +20,13 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
+#ifdef CONFIG_SYSCALL_PTREGS
+typedef asmlinkage long (*sys_call_ptr_t)(struct pt_regs *);
+#else
typedef asmlinkage long (*sys_call_ptr_t)(unsigned long, unsigned long,
unsigned long, unsigned long,
unsigned long, unsigned long);
+#endif /* CONFIG_SYSCALL_PTREGS */
extern const sys_call_ptr_t sys_call_table[];
#if defined(CONFIG_X86_32)
diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
new file mode 100644
index 000000000000..ca928adf4a53
--- /dev/null
+++ b/arch/x86/include/asm/syscall_wrapper.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * syscall_wrapper.h - x86 specific wrappers to syscall definitions
+ */
+
+#ifndef _ASM_X86_SYSCALL_WRAPPER_H
+#define _ASM_X86_SYSCALL_WRAPPER_H
+
+/*
+ * Instead of the generic __SYSCALL_DEFINEx() definition, this macro takes
+ * struct pt_regs *regs as the only argument of the syscall stub named
+ * sys_*(). It decodes just the registers it needs and passes them on to
+ * the SyS_*() wrapper and then to the SYSC_*() function doing the actual job.
+ * These wrappers and functions are inlined, meaning that the assembly looks
+ * as follows (slightly re-ordered):
+ *
+ * <sys_recv>: <-- syscall with 4 parameters
+ * callq <__fentry__>
+ *
+ * mov 0x70(%rdi),%rdi <-- decode regs->di
+ * mov 0x68(%rdi),%rsi <-- decode regs->si
+ * mov 0x60(%rdi),%rdx <-- decode regs->dx
+ * mov 0x38(%rdi),%rcx <-- decode regs->r10
+ *
+ * xor %r9d,%r9d <-- clear %r9
+ * xor %r8d,%r8d <-- clear %r8
+ *
+ * callq __sys_recvfrom <-- do the actual work in __sys_recvfrom()
+ * which takes 6 arguments
+ *
+ * cltq <-- extend return value to 64bit
+ * retq <-- return
+ *
+ * This approach avoids leaking random user-provided register content down
+ * the call chain.
+ *
+ * As the generic SYSCALL_DEFINE0() macro does not decode any parameters for
+ * obvious reasons, there is no need to override it.
+ */
+#define __SYSCALL_DEFINEx(x, name, ...) \
+ asmlinkage long sys##name(struct pt_regs *regs); \
+ ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
+ static long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
+ static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
+ asmlinkage long sys##name(struct pt_regs *regs) \
+ { \
+ return SyS##name(__MAP(x,__SC_ARGS \
+ ,,regs->di,,regs->si,,regs->dx \
+ ,,regs->r10,,regs->r8,,regs->r9)); \
+ } \
+ static long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
+ { \
+ long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
+ __MAP(x,__SC_TEST,__VA_ARGS__); \
+ __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
+ return ret; \
+ } \
+ static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+
+/*
+ * for VSYSCALLS, we need to declare these three syscalls with the new
+ * pt_regs-based calling convention for in-kernel use.
+ */
+struct pt_regs;
+asmlinkage long sys_getcpu(struct pt_regs *regs); /* di, si, dx */
+asmlinkage long sys_gettimeofday(struct pt_regs *regs); /* di, si */
+asmlinkage long sys_time(struct pt_regs *regs); /* di */
+
+#endif /* _ASM_X86_SYSCALL_WRAPPER_H */
diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index ae6e05fdc24b..e4ad93c05f02 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -18,6 +18,12 @@
/* Common in X86_32 and X86_64 */
/* kernel/ioport.c */
long ksys_ioperm(unsigned long from, unsigned long num, int turn_on);
+
+#ifndef CONFIG_SYSCALL_PTREGS
+/*
+ * If CONFIG_SYSCALL_PTREGS is enabled, a different syscall calling convention
+ * is used. Do not include these -- invalid -- prototypes then
+ */
asmlinkage long sys_ioperm(unsigned long, unsigned long, int);
asmlinkage long sys_iopl(unsigned int);
@@ -53,4 +59,5 @@ asmlinkage long sys_mmap(unsigned long, unsigned long, unsigned long,
unsigned long, unsigned long, unsigned long);
#endif /* CONFIG_X86_32 */
+#endif /* CONFIG_SYSCALL_PTREGS */
#endif /* _ASM_X86_SYSCALLS_H */
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 503ab245d4ce..d7168b3a4b4c 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -102,7 +102,7 @@ union bpf_attr;
* for SYSCALL_DEFINE<n>/COMPAT_SYSCALL_DEFINE<n>
*/
#define __MAP0(m,...)
-#define __MAP1(m,t,a) m(t,a)
+#define __MAP1(m,t,a,...) m(t,a)
#define __MAP2(m,t,a,...) m(t,a), __MAP1(m,__VA_ARGS__)
#define __MAP3(m,t,a,...) m(t,a), __MAP2(m,__VA_ARGS__)
#define __MAP4(m,t,a,...) m(t,a), __MAP3(m,__VA_ARGS__)
--
2.16.3
* [PATCH 4/7] syscalls: prepare ARCH_HAS_SYSCALL_WRAPPER for compat syscalls
2018-03-30 9:37 [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Dominik Brodowski
` (2 preceding siblings ...)
2018-03-30 9:37 ` [PATCH 3/7] syscalls/x86: use struct pt_regs based syscall calling for 64bit syscalls Dominik Brodowski
@ 2018-03-30 9:37 ` Dominik Brodowski
2018-03-30 9:37 ` [PATCH 5/7] syscalls/x86: use struct pt_regs based syscall calling for IA32_EMULATION and x32 Dominik Brodowski
` (3 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Dominik Brodowski @ 2018-03-30 9:37 UTC
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Thomas Gleixner, Andi Kleen,
Ingo Molnar, Andrew Morton, Al Viro, H. Peter Anvin
It may be useful for an architecture to override the definitions of the
COMPAT_SYSCALL_DEFINE0() and __COMPAT_SYSCALL_DEFINEx() macros in
<linux/compat.h>, in particular to use a different calling convention
for syscalls. This patch provides a mechanism to do so, based on the
previously introduced CONFIG_ARCH_HAS_SYSCALL_WRAPPER. If it is enabled,
<asm/syscall_wrapper.h> is included in <linux/compat.h> and may be used
to define the macros mentioned above. Moreover, as the syscall calling
convention may be different if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is set,
the compat syscall function prototypes in <linux/compat.h> are #ifndef'd
out in that case.
As some of the syscalls and/or compat syscalls may not be present,
the COND_SYSCALL() and COND_SYSCALL_COMPAT() macros in kernel/sys_ni.c
as well as the SYS_NI() and COMPAT_SYS_NI() macros in
kernel/time/posix-stubs.c can be re-defined in <asm/syscall_wrapper.h> iff
CONFIG_ARCH_HAS_SYSCALL_WRAPPER is enabled.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
include/linux/compat.h | 22 ++++++++++++++++++++++
init/Kconfig | 9 ++++++---
kernel/sys_ni.c | 10 ++++++++++
kernel/time/posix-stubs.c | 10 ++++++++++
4 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 9847c5a013c3..2d85ec5cfda2 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -24,6 +24,17 @@
#include <asm/siginfo.h>
#include <asm/signal.h>
+#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
+/*
+ * It may be useful for an architecture to override the definitions of the
+ * COMPAT_SYSCALL_DEFINE0 and COMPAT_SYSCALL_DEFINEx() macros, in particular
+ * to use a different calling convention for syscalls. To allow for that,
+ * the prototypes for the compat_sys_*() functions below will *not* be included
+ * if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is enabled.
+ */
+#include <asm/syscall_wrapper.h>
+#endif /* CONFIG_ARCH_HAS_SYSCALL_WRAPPER */
+
#ifndef COMPAT_USE_64BIT_TIME
#define COMPAT_USE_64BIT_TIME 0
#endif
@@ -32,10 +43,12 @@
#define __SC_DELOUSE(t,v) ((__force t)(unsigned long)(v))
#endif
+#ifndef COMPAT_SYSCALL_DEFINE0
#define COMPAT_SYSCALL_DEFINE0(name) \
asmlinkage long compat_sys_##name(void); \
ALLOW_ERROR_INJECTION(compat_sys_##name, ERRNO); \
asmlinkage long compat_sys_##name(void)
+#endif /* COMPAT_SYSCALL_DEFINE0 */
#define COMPAT_SYSCALL_DEFINE1(name, ...) \
COMPAT_SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
@@ -50,6 +63,7 @@
#define COMPAT_SYSCALL_DEFINE6(name, ...) \
COMPAT_SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
+#ifndef COMPAT_SYSCALL_DEFINEx
#define COMPAT_SYSCALL_DEFINEx(x, name, ...) \
asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
@@ -62,6 +76,7 @@
return C_SYSC##name(__MAP(x,__SC_DELOUSE,__VA_ARGS__)); \
} \
static inline long C_SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+#endif /* COMPAT_SYSCALL_DEFINEx */
#ifndef compat_user_stack_pointer
#define compat_user_stack_pointer() current_user_stack_pointer()
@@ -517,7 +532,12 @@ int __compat_save_altstack(compat_stack_t __user *, unsigned long);
* Please note that these prototypes here are only provided for information
* purposes, for static analysis, and for linking from the syscall table.
* These functions should not be called elsewhere from kernel code.
+ *
+ * As the syscall calling convention may be different from the default
+ * for architectures overriding the syscall calling convention, do not
+ * include the prototypes if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is enabled.
*/
+#ifndef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
asmlinkage long compat_sys_io_setup(unsigned nr_reqs, u32 __user *ctx32p);
asmlinkage long compat_sys_io_submit(compat_aio_context_t ctx_id, int nr,
u32 __user *iocb);
@@ -955,6 +975,8 @@ asmlinkage long compat_sys_stime(compat_time_t __user *tptr);
/* obsolete: net/socket.c */
asmlinkage long compat_sys_socketcall(int call, u32 __user *args);
+#endif /* CONFIG_ARCH_HAS_SYSCALL_WRAPPER */
+
/*
* For most but not all architectures, "am I in a compat syscall?" and
diff --git a/init/Kconfig b/init/Kconfig
index 6079629be211..a7b14fc7bbef 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1926,8 +1926,11 @@ config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
bool
# It may be useful for an architecture to override the definitions of the
-# SYSCALL_DEFINE() and __SYSCALL_DEFINEx() macros in <linux/syscalls.h>,
-# in particular to use a different calling convention for syscalls.
+# SYSCALL_DEFINE() and __SYSCALL_DEFINEx() macros in <linux/syscalls.h>
+# and the COMPAT_ variants in <linux/compat.h>, in particular to use a
+# different calling convention for syscalls. They can also override the
+# macros for not implemented syscalls in kernel/sys_ni.c and
+# kernel/time/posix-stubs.c. All these overrides need to be available in
+# <asm/syscall_wrapper.h>
config ARCH_HAS_SYSCALL_WRAPPER
def_bool n
- depends on !COMPAT
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 6cafc008f6db..9791364925dc 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -5,6 +5,11 @@
#include <asm/unistd.h>
+#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
+/* Architectures may override COND_SYSCALL and COND_SYSCALL_COMPAT */
+#include <asm/syscall_wrapper.h>
+#endif /* CONFIG_ARCH_HAS_SYSCALL_WRAPPER */
+
/* we can't #include <linux/syscalls.h> here,
but tell gcc to not warn with -Wmissing-prototypes */
asmlinkage long sys_ni_syscall(void);
@@ -17,8 +22,13 @@ asmlinkage long sys_ni_syscall(void)
return -ENOSYS;
}
+#ifndef COND_SYSCALL
#define COND_SYSCALL(name) cond_syscall(sys_##name)
+#endif /* COND_SYSCALL */
+
+#ifndef COND_SYSCALL_COMPAT
#define COND_SYSCALL_COMPAT(name) cond_syscall(compat_sys_##name)
+#endif /* COND_SYSCALL_COMPAT */
/*
* This list is kept in the same order as include/uapi/asm-generic/unistd.h.
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index b258bee13b02..69a937c3cd81 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -19,6 +19,11 @@
#include <linux/posix-timers.h>
#include <linux/compat.h>
+#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
+/* Architectures may override SYS_NI and COMPAT_SYS_NI */
+#include <asm/syscall_wrapper.h>
+#endif
+
asmlinkage long sys_ni_posix_timers(void)
{
pr_err_once("process %d (%s) attempted a POSIX timer syscall "
@@ -27,8 +32,13 @@ asmlinkage long sys_ni_posix_timers(void)
return -ENOSYS;
}
+#ifndef SYS_NI
#define SYS_NI(name) SYSCALL_ALIAS(sys_##name, sys_ni_posix_timers)
+#endif
+
+#ifndef COMPAT_SYS_NI
#define COMPAT_SYS_NI(name) SYSCALL_ALIAS(compat_sys_##name, sys_ni_posix_timers)
+#endif
SYS_NI(timer_create);
SYS_NI(timer_gettime);
--
2.16.3
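
The override mechanism introduced by this patch (an architecture header defines a macro first, and the generic file only supplies a default under `#ifndef`) can be sketched in plain userspace C. This is a minimal illustration, not kernel code: every `mock_`/`MOCK_` name is hypothetical, and forwarding functions stand in for the symbol aliases that the real SYSCALL_ALIAS() creates.

```c
#include <errno.h>

/* Stand-in for sys_ni_posix_timers(): the common "not implemented" body. */
long mock_sys_ni_posix_timers(void)
{
	return -ENOSYS;
}

/*
 * Stand-in for <asm/syscall_wrapper.h>: a hypothetical architecture wants
 * two entry points per unimplemented syscall, so it defines the macro
 * before the generic code gets a chance to.
 */
#define MOCK_SYS_NI(name)						\
	long mock_sys_##name(void)					\
	{								\
		return mock_sys_ni_posix_timers();			\
	}								\
	long mock_sys32_ia32_##name(void)				\
	{								\
		return mock_sys_ni_posix_timers();			\
	}

/* Generic fallback, used only if the architecture did not override it. */
#ifndef MOCK_SYS_NI
#define MOCK_SYS_NI(name)						\
	long mock_sys_##name(void)					\
	{								\
		return mock_sys_ni_posix_timers();			\
	}
#endif

MOCK_SYS_NI(timer_create)
MOCK_SYS_NI(timer_gettime)
```

With the override in place, both `mock_sys_timer_create()` and `mock_sys32_ia32_timer_create()` exist and return -ENOSYS; deleting the first `#define` leaves only the generic `mock_sys_*()` variants.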
* [PATCH 5/7] syscalls/x86: use struct pt_regs based syscall calling for IA32_EMULATION and x32
2018-03-30 9:37 [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Dominik Brodowski
` (3 preceding siblings ...)
2018-03-30 9:37 ` [PATCH 4/7] syscalls: prepare ARCH_HAS_SYSCALL_WRAPPER for compat syscalls Dominik Brodowski
@ 2018-03-30 9:37 ` Dominik Brodowski
2018-03-30 9:37 ` [PATCH 6/7] syscalls/x86: unconditionally enable struct pt_regs based syscalls on x86_64 Dominik Brodowski
` (2 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Dominik Brodowski @ 2018-03-30 9:37 UTC (permalink / raw)
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Thomas Gleixner, Andi Kleen,
Ingo Molnar, Andrew Morton, Al Viro, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, Peter Zijlstra, x86, H. Peter Anvin
Extend ARCH_HAS_SYSCALL_WRAPPER for i386 emulation and for x32 on 64-bit
x86.
For x32, all we need to do is to create an additional stub for each
compat syscall which decodes the parameters in x86-64 ordering, e.g.:
asmlinkage long __compat_sys_x32_xyzzy(struct pt_regs *regs)
{
return c_SyS_xyzzy(regs->di, regs->si, regs->dx);
}
For i386 emulation, we need to teach compat_sys_*() to take struct
pt_regs as its only argument, e.g.:
asmlinkage long compat_sys_xyzzy(struct pt_regs *regs)
{
return c_SyS_xyzzy(regs->bx, regs->cx, regs->dx);
}
In addition, we need to create additional stubs for common syscalls
(that is, for syscalls which have the same parameters on 32bit and 64bit),
e.g.:
asmlinkage long __sys32_ia32_xyzzy(struct pt_regs *regs)
{
return SyS_xyzzy(regs->bx, regs->cx, regs->dx);
}
This approach avoids leaking random user-provided register content down
the call chain.
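
As a rough userspace illustration of the stubs described above (not the code generated by the macros in this patch), the following sketch decodes the same three-parameter syscall once in x86-64/x32 register order and once in i386 order. `struct mock_pt_regs` and all `mock_` functions are hypothetical stand-ins; the arithmetic in `mock_c_SyS_xyzzy()` exists only to make the argument order visible.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct mock_pt_regs {
	uint64_t bx, cx, dx, si, di, bp;
	uint64_t r8, r9, r10;
};

/* Stand-in for c_SyS_xyzzy(); encodes the argument order in the result. */
static long mock_c_SyS_xyzzy(long a, long b, long c)
{
	return a + 10 * b + 100 * c;
}

/* x32 stub: parameters arrive in x86-64 order (di, si, dx, ...). */
long mock_compat_sys_x32_xyzzy(const struct mock_pt_regs *regs)
{
	return mock_c_SyS_xyzzy((long)regs->di, (long)regs->si,
				(long)regs->dx);
}

/*
 * i386 stub: parameters arrive in bx, cx, dx, ... and only the lower
 * 32 bits of each register are passed on.
 */
long mock_compat_sys_xyzzy(const struct mock_pt_regs *regs)
{
	return mock_c_SyS_xyzzy((unsigned int)regs->bx,
				(unsigned int)regs->cx,
				(unsigned int)regs->dx);
}
```

Each stub reads only the registers its calling convention defines, so anything else userspace left in `struct mock_pt_regs` (including the upper half of `bx` in the i386 case) never reaches the implementation.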
This patch is based on an original proof-of-concept
From: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and was split up and heavily modified by me, in particular to base it on
ARCH_HAS_SYSCALL_WRAPPER.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
arch/x86/Kconfig | 2 +-
arch/x86/entry/common.c | 4 ++
arch/x86/entry/syscall_32.c | 15 +++-
arch/x86/entry/syscalls/syscall_64.tbl | 74 +++++++++----------
arch/x86/entry/syscalls/syscalltbl.sh | 8 +++
arch/x86/include/asm/syscall_wrapper.h | 128 +++++++++++++++++++++++++++++++--
6 files changed, 187 insertions(+), 44 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a5db03705452..2ad46f7c522c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2960,5 +2960,5 @@ source "lib/Kconfig"
config SYSCALL_PTREGS
def_bool y
- depends on X86_64 && !COMPAT
+ depends on X86_64
select ARCH_HAS_SYSCALL_WRAPPER
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index e1b91bffa988..425f798b39e3 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -325,6 +325,9 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
if (likely(nr < IA32_NR_syscalls)) {
nr = array_index_nospec(nr, IA32_NR_syscalls);
+#ifdef CONFIG_SYSCALL_PTREGS
+ regs->ax = ia32_sys_call_table[nr](regs);
+#else
/*
* It's possible that a 32-bit syscall implementation
* takes a 64-bit parameter but nonetheless assumes that
@@ -335,6 +338,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
(unsigned int)regs->bx, (unsigned int)regs->cx,
(unsigned int)regs->dx, (unsigned int)regs->si,
(unsigned int)regs->di, (unsigned int)regs->bp);
+#endif /* CONFIG_SYSCALL_PTREGS */
}
syscall_return_slowpath(regs);
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index 95c294963612..bbd8dda36c7d 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -7,14 +7,23 @@
#include <asm/asm-offsets.h>
#include <asm/syscall.h>
-#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#ifdef CONFIG_SYSCALL_PTREGS
+/* On X86_64, we use struct pt_regs * to pass parameters to syscalls */
+#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(struct pt_regs *);
+
+/* this is a lie, but it does not hurt as sys_ni_syscall just returns -EINVAL */
+extern asmlinkage long sys_ni_syscall(struct pt_regs *);
+
+#else /* CONFIG_SYSCALL_PTREGS */
+#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#endif /* CONFIG_SYSCALL_PTREGS */
+
#include <asm/syscalls_32.h>
#undef __SYSCALL_I386
#define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
-extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
-
__visible const sys_call_ptr_t ia32_sys_call_table[__NR_syscall_compat_max+1] = {
/*
* Smells like a compiler bug -- it doesn't work
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..a83c0f7f462f 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -342,41 +342,43 @@
#
# x32-specific system call numbers start at 512 to avoid cache impact
-# for native 64-bit operation.
+# for native 64-bit operation. The __compat_sys_x32 stubs are created
+# on-the-fly for compat_sys_*() compatibility system calls if X86_X32
+# is defined.
#
-512 x32 rt_sigaction compat_sys_rt_sigaction
+512 x32 rt_sigaction __compat_sys_x32_rt_sigaction
513 x32 rt_sigreturn sys32_x32_rt_sigreturn
-514 x32 ioctl compat_sys_ioctl
-515 x32 readv compat_sys_readv
-516 x32 writev compat_sys_writev
-517 x32 recvfrom compat_sys_recvfrom
-518 x32 sendmsg compat_sys_sendmsg
-519 x32 recvmsg compat_sys_recvmsg
-520 x32 execve compat_sys_execve/ptregs
-521 x32 ptrace compat_sys_ptrace
-522 x32 rt_sigpending compat_sys_rt_sigpending
-523 x32 rt_sigtimedwait compat_sys_rt_sigtimedwait
-524 x32 rt_sigqueueinfo compat_sys_rt_sigqueueinfo
-525 x32 sigaltstack compat_sys_sigaltstack
-526 x32 timer_create compat_sys_timer_create
-527 x32 mq_notify compat_sys_mq_notify
-528 x32 kexec_load compat_sys_kexec_load
-529 x32 waitid compat_sys_waitid
-530 x32 set_robust_list compat_sys_set_robust_list
-531 x32 get_robust_list compat_sys_get_robust_list
-532 x32 vmsplice compat_sys_vmsplice
-533 x32 move_pages compat_sys_move_pages
-534 x32 preadv compat_sys_preadv64
-535 x32 pwritev compat_sys_pwritev64
-536 x32 rt_tgsigqueueinfo compat_sys_rt_tgsigqueueinfo
-537 x32 recvmmsg compat_sys_recvmmsg
-538 x32 sendmmsg compat_sys_sendmmsg
-539 x32 process_vm_readv compat_sys_process_vm_readv
-540 x32 process_vm_writev compat_sys_process_vm_writev
-541 x32 setsockopt compat_sys_setsockopt
-542 x32 getsockopt compat_sys_getsockopt
-543 x32 io_setup compat_sys_io_setup
-544 x32 io_submit compat_sys_io_submit
-545 x32 execveat compat_sys_execveat/ptregs
-546 x32 preadv2 compat_sys_preadv64v2
-547 x32 pwritev2 compat_sys_pwritev64v2
+514 x32 ioctl __compat_sys_x32_ioctl
+515 x32 readv __compat_sys_x32_readv
+516 x32 writev __compat_sys_x32_writev
+517 x32 recvfrom __compat_sys_x32_recvfrom
+518 x32 sendmsg __compat_sys_x32_sendmsg
+519 x32 recvmsg __compat_sys_x32_recvmsg
+520 x32 execve __compat_sys_x32_execve/ptregs
+521 x32 ptrace __compat_sys_x32_ptrace
+522 x32 rt_sigpending __compat_sys_x32_rt_sigpending
+523 x32 rt_sigtimedwait __compat_sys_x32_rt_sigtimedwait
+524 x32 rt_sigqueueinfo __compat_sys_x32_rt_sigqueueinfo
+525 x32 sigaltstack __compat_sys_x32_sigaltstack
+526 x32 timer_create __compat_sys_x32_timer_create
+527 x32 mq_notify __compat_sys_x32_mq_notify
+528 x32 kexec_load __compat_sys_x32_kexec_load
+529 x32 waitid __compat_sys_x32_waitid
+530 x32 set_robust_list __compat_sys_x32_set_robust_list
+531 x32 get_robust_list __compat_sys_x32_get_robust_list
+532 x32 vmsplice __compat_sys_x32_vmsplice
+533 x32 move_pages __compat_sys_x32_move_pages
+534 x32 preadv __compat_sys_x32_preadv64
+535 x32 pwritev __compat_sys_x32_pwritev64
+536 x32 rt_tgsigqueueinfo __compat_sys_x32_rt_tgsigqueueinfo
+537 x32 recvmmsg __compat_sys_x32_recvmmsg
+538 x32 sendmmsg __compat_sys_x32_sendmmsg
+539 x32 process_vm_readv __compat_sys_x32_process_vm_readv
+540 x32 process_vm_writev __compat_sys_x32_process_vm_writev
+541 x32 setsockopt __compat_sys_x32_setsockopt
+542 x32 getsockopt __compat_sys_x32_getsockopt
+543 x32 io_setup __compat_sys_x32_io_setup
+544 x32 io_submit __compat_sys_x32_io_submit
+545 x32 execveat __compat_sys_x32_execveat/ptregs
+546 x32 preadv2 __compat_sys_x32_preadv64v2
+547 x32 pwritev2 __compat_sys_x32_pwritev64v2
diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index d71ef4bd3615..4e468f16cb3b 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -49,6 +49,14 @@ emit() {
grep '^[0-9]' "$in" | sort -n | (
while read nr abi name entry compat; do
abi=`echo "$abi" | tr '[a-z]' '[A-Z]'`
+
+ # auto-create i386 stubs for struct pt_regs calling convention
+ if [ -n "$entry" -a "$abi" = "I386" -a -z "$compat" ]; then
+ if [ "$entry" != "sys_ni_syscall" ]; then
+ compat="__sys32_ia32_${entry#sys_}"
+ fi
+ fi
+
if [ "$abi" = "COMMON" -o "$abi" = "64" ]; then
# COMMON is the same as 64, except that we don't expect X32
# programs to use it. Our expectation has nothing to do with
diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
index ca928adf4a53..6629a22b542c 100644
--- a/arch/x86/include/asm/syscall_wrapper.h
+++ b/arch/x86/include/asm/syscall_wrapper.h
@@ -6,6 +6,122 @@
#ifndef _ASM_X86_SYSCALL_WRAPPER_H
#define _ASM_X86_SYSCALL_WRAPPER_H
+/* Mapping of registers to parameters for syscalls on x86-64 and x32 */
+#define SC_X86_64_REGS_TO_ARGS(x, ...) \
+ __MAP(x,__SC_ARGS \
+ ,,regs->di,,regs->si,,regs->dx \
+ ,,regs->r10,,regs->r8,,regs->r9) \
+
+/* Mapping of registers to parameters for syscalls on i386 */
+#define SC_IA32_REGS_TO_ARGS(x, ...) \
+ __MAP(x,__SC_ARGS \
+ ,,(unsigned int)regs->bx,,(unsigned int)regs->cx \
+ ,,(unsigned int)regs->dx,,(unsigned int)regs->si \
+ ,,(unsigned int)regs->di,,(unsigned int)regs->bp)
+
+#ifdef CONFIG_IA32_EMULATION
+/*
+ * For IA32 emulation, we need to handle "compat" syscalls *and* create
+ * additional wrappers (aptly named __sys32_ia32_sys_xyzzy) which decode
+ * the ia32 regs in the proper order for shared or "common" syscalls. As
+ * some syscalls may not be implemented, we need to expand COND_SYSCALL in
+ * kernel/sys_ni.c and SYS_NI in kernel/time/posix-stubs.c to cover this
+ * case as well.
+ */
+#define COMPAT_SC_IA32_STUBx(x, name, ...) \
+ asmlinkage long compat_sys##name(struct pt_regs *regs); \
+ ALLOW_ERROR_INJECTION(compat_sys##name, ERRNO); \
+ asmlinkage long compat_sys##name(struct pt_regs *regs) \
+ { \
+ return c_SyS##name(SC_IA32_REGS_TO_ARGS(x,__VA_ARGS__));\
+ } \
+
+#define SC_IA32_WRAPPERx(x, name, ...) \
+ asmlinkage long __sys32_ia32##name(struct pt_regs *regs); \
+ asmlinkage long __sys32_ia32##name(struct pt_regs *regs) \
+ { \
+ return SyS##name(SC_IA32_REGS_TO_ARGS(x,__VA_ARGS__)); \
+ }
+
+#define COND_SYSCALL(name) \
+ cond_syscall(sys_##name); \
+ cond_syscall(__sys32_ia32_##name)
+
+#define SYS_NI(name) \
+ SYSCALL_ALIAS(sys_##name, sys_ni_posix_timers); \
+ SYSCALL_ALIAS(__sys32_ia32_##name, sys_ni_posix_timers)
+
+/*
+ * As the generic SYSCALL_DEFINE0() macro does not decode any parameters for
+ * obvious reasons, it does not care about struct pt_regs. There is a need,
+ * however, to create an alias named __sys32_ia32_sys*() if IA32_EMULATION
+ * is enabled
+ */
+#define SYSCALL_DEFINE0(sname) \
+ SYSCALL_METADATA(_##sname, 0); \
+ asmlinkage long sys_##sname(void); \
+ ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); \
+ asmlinkage long __sys32_ia32_##sname(void) \
+ __attribute__((alias(__stringify(sys_##sname)))); \
+ asmlinkage long sys_##sname(void)
+
+#else /* CONFIG_IA32_EMULATION */
+#define COMPAT_SC_IA32_STUBx(x, name, ...)
+#define SC_IA32_WRAPPERx(x, fullname, name, ...)
+#endif /* CONFIG_IA32_EMULATION */
+
+
+#ifdef CONFIG_X86_X32
+/*
+ * For the x32 ABI, we need to create a stub for compat_sys_*() which is aware
+ * of the x86-64-style parameter ordering of x32 syscalls. The syscalls common
+ * with x86_64 obviously do not need such care.
+ */
+#define COMPAT_SC_X32_STUBx(x, name, ...) \
+ asmlinkage long __compat_sys_x32##name(struct pt_regs *regs); \
+ ALLOW_ERROR_INJECTION(__compat_sys_x32##name, ERRNO); \
+ asmlinkage long __compat_sys_x32##name(struct pt_regs *regs) \
+ { \
+ return c_SyS##name(SC_X86_64_REGS_TO_ARGS(x,__VA_ARGS__));\
+ } \
+
+/* As some compat syscalls may not be implemented, we need to expand
+ * COND_SYSCALL_COMPAT in kernel/sys_ni.c and COMPAT_SYS_NI in
+ * kernel/time/posix-stubs.c to cover this case as well.
+ */
+#define COND_SYSCALL_COMPAT(name) \
+ cond_syscall(compat_sys_##name); \
+ cond_syscall(__compat_sys_x32_##name)
+
+#define COMPAT_SYS_NI(name) \
+ SYSCALL_ALIAS(compat_sys_##name, sys_ni_posix_timers); \
+ SYSCALL_ALIAS(__compat_sys_x32_##name, sys_ni_posix_timers)
+
+#else /* CONFIG_X86_X32 */
+#define COMPAT_SC_X32_STUBx(x, name, ...)
+#endif /* CONFIG_X86_X32 */
+
+
+#ifdef CONFIG_COMPAT
+/*
+ * Compat means IA32_EMULATION and/or X86_X32. As they use a different
+ * mapping of registers to parameters, we need to generate stubs for each
+ * of them.
+ */
+#define COMPAT_SYSCALL_DEFINEx(x, name, ...) \
+ static long c_SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
+ static inline long C_SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
+ COMPAT_SC_IA32_STUBx(x, name, __VA_ARGS__) \
+ COMPAT_SC_X32_STUBx(x, name, __VA_ARGS__) \
+ static long c_SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
+ { \
+ return C_SYSC##name(__MAP(x,__SC_DELOUSE,__VA_ARGS__)); \
+ } \
+ static inline long C_SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__))
+
+#endif /* CONFIG_COMPAT */
+
+
/*
* Instead of the generic __SYSCALL_DEFINEx() definition, this macro takes
* struct pt_regs *regs as the only argument of the syscall stub named
@@ -34,8 +150,13 @@
* This approach avoids leaking random user-provided register content down
* the call chain.
*
+ * If IA32_EMULATION is enabled, this macro generates an additional wrapper
+ * named __sys32_ia32_*() which decodes the struct pt_regs *regs according
+ * to the i386 calling convention (bx, cx, dx, si, di, bp).
+ *
* As the generic SYSCALL_DEFINE0() macro does not decode any parameters for
- * obvious reasons, there is no need to override it.
+ * obvious reasons, there is no need to override it unless IA32_EMULATION is
+ * enabled (see above).
*/
#define __SYSCALL_DEFINEx(x, name, ...) \
asmlinkage long sys##name(struct pt_regs *regs); \
@@ -44,10 +165,9 @@
static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
asmlinkage long sys##name(struct pt_regs *regs) \
{ \
- return SyS##name(__MAP(x,__SC_ARGS \
- ,,regs->di,,regs->si,,regs->dx \
- ,,regs->r10,,regs->r8,,regs->r9)); \
+ return SyS##name(SC_X86_64_REGS_TO_ARGS(x,__VA_ARGS__));\
} \
+ SC_IA32_WRAPPERx(x, name, __VA_ARGS__) \
static long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
{ \
long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
--
2.16.3
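
The syscalltbl.sh hunk in this patch derives the i386 stub name from the table entry with the shell expansion `${entry#sys_}`. A small C sketch of that same naming rule may make it easier to follow; `mock_ia32_stub_name()` is a hypothetical helper, and it assumes the caller has already checked that the ABI column is i386.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the auto-generated stub name for a table entry into out.
 * Returns 1 if a name was generated, 0 if the entry is left alone
 * (no entry point, an explicit compat entry, or sys_ni_syscall).
 */
int mock_ia32_stub_name(const char *entry, const char *compat,
			char *out, size_t outlen)
{
	const char *base = entry;
	int n;

	if (entry == NULL || entry[0] == '\0')
		return 0;		/* no native entry point listed */
	if (compat != NULL && compat[0] != '\0')
		return 0;		/* table already names a compat entry */
	if (strcmp(entry, "sys_ni_syscall") == 0)
		return 0;		/* never wrap the "not implemented" stub */

	/* Equivalent of ${entry#sys_}: drop a leading "sys_" if present. */
	if (strncmp(entry, "sys_", 4) == 0)
		base = entry + 4;

	n = snprintf(out, outlen, "__sys32_ia32_%s", base);
	return n > 0 && (size_t)n < outlen;
}
```

For example, an entry of `sys_read` with an empty compat column yields `__sys32_ia32_read`, while an entry that already lists a compat symbol is left unchanged.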
* [PATCH 6/7] syscalls/x86: unconditionally enable struct pt_regs based syscalls on x86_64
2018-03-30 9:37 [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Dominik Brodowski
` (4 preceding siblings ...)
2018-03-30 9:37 ` [PATCH 5/7] syscalls/x86: use struct pt_regs based syscall calling for IA32_EMULATION and x32 Dominik Brodowski
@ 2018-03-30 9:37 ` Dominik Brodowski
2018-03-30 9:37 ` [PATCH 7/7] x86/entry/64: extend register clearing on syscall entry to lower registers Dominik Brodowski
2018-03-30 10:16 ` [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Ingo Molnar
7 siblings, 0 replies; 14+ messages in thread
From: Dominik Brodowski @ 2018-03-30 9:37 UTC (permalink / raw)
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Thomas Gleixner, Andi Kleen,
Ingo Molnar, Andrew Morton, Al Viro, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, Peter Zijlstra, H. Peter Anvin, x86
Removing CONFIG_SYSCALL_PTREGS from arch/x86/Kconfig and simply selecting
ARCH_HAS_SYSCALL_WRAPPER unconditionally on x86-64 allows us to simplify
several codepaths.
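
To illustrate what the simplified dispatch path looks like once every table entry takes struct pt_regs * as its only argument, here is a minimal userspace sketch. All `mock_` names are hypothetical, the table has only two entries, and the speculation barrier (array_index_nospec() in the real code) is deliberately left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct mock_pt_regs {
	uint64_t di, si, dx, r10, r8, r9;
	long ax;
};

/* Every handler has the same one-argument signature. */
typedef long (*mock_sys_call_ptr_t)(const struct mock_pt_regs *);

/* A two-parameter "syscall" that decodes di and si on its own. */
static long mock_sys_add(const struct mock_pt_regs *regs)
{
	return (long)(regs->di + regs->si);
}

static long mock_sys_ni_syscall(const struct mock_pt_regs *regs)
{
	(void)regs;
	return -ENOSYS;
}

static const mock_sys_call_ptr_t mock_sys_call_table[] = {
	mock_sys_add,
	mock_sys_ni_syscall,
};

#define MOCK_NR_SYSCALLS \
	(sizeof(mock_sys_call_table) / sizeof(mock_sys_call_table[0]))

/* Dispatch: no per-argument decoding, just hand over regs. */
void mock_do_syscall(unsigned long nr, struct mock_pt_regs *regs)
{
	if (nr < MOCK_NR_SYSCALLS)
		regs->ax = mock_sys_call_table[nr](regs);
}
```

As in the entry code shown in this series, the sketch expects the caller to preset `ax` to -ENOSYS, so an out-of-range syscall number simply leaves that value in place.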
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
arch/x86/Kconfig | 6 +-----
arch/x86/entry/common.c | 10 ++--------
arch/x86/entry/syscall_32.c | 6 +++---
arch/x86/entry/syscall_64.c | 5 -----
arch/x86/entry/vsyscall/vsyscall_64.c | 16 ----------------
arch/x86/include/asm/syscall.h | 4 ++--
arch/x86/include/asm/syscalls.h | 20 ++++----------------
7 files changed, 12 insertions(+), 55 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2ad46f7c522c..7c0e135819f1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -29,6 +29,7 @@ config X86_64
select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA
select X86_DEV_DMA_OPS
+ select ARCH_HAS_SYSCALL_WRAPPER
#
# Arch settings
@@ -2957,8 +2958,3 @@ source "crypto/Kconfig"
source "arch/x86/kvm/Kconfig"
source "lib/Kconfig"
-
-config SYSCALL_PTREGS
- def_bool y
- depends on X86_64
- select ARCH_HAS_SYSCALL_WRAPPER
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 425f798b39e3..fbf6a6c3fd2d 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -284,13 +284,7 @@ __visible void do_syscall_64(unsigned long nr, struct pt_regs *regs)
nr &= __SYSCALL_MASK;
if (likely(nr < NR_syscalls)) {
nr = array_index_nospec(nr, NR_syscalls);
-#ifdef CONFIG_SYSCALL_PTREGS
regs->ax = sys_call_table[nr](regs);
-#else
- regs->ax = sys_call_table[nr](
- regs->di, regs->si, regs->dx,
- regs->r10, regs->r8, regs->r9);
-#endif
}
syscall_return_slowpath(regs);
@@ -325,7 +319,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
if (likely(nr < IA32_NR_syscalls)) {
nr = array_index_nospec(nr, IA32_NR_syscalls);
-#ifdef CONFIG_SYSCALL_PTREGS
+#ifdef CONFIG_IA32_EMULATION
regs->ax = ia32_sys_call_table[nr](regs);
#else
/*
@@ -338,7 +332,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
(unsigned int)regs->bx, (unsigned int)regs->cx,
(unsigned int)regs->dx, (unsigned int)regs->si,
(unsigned int)regs->di, (unsigned int)regs->bp);
-#endif /* CONFIG_SYSCALL_PTREGS */
+#endif /* CONFIG_IA32_EMULATION */
}
syscall_return_slowpath(regs);
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index bbd8dda36c7d..84901c4ad67b 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -7,17 +7,17 @@
#include <asm/asm-offsets.h>
#include <asm/syscall.h>
-#ifdef CONFIG_SYSCALL_PTREGS
+#ifdef CONFIG_IA32_EMULATION
/* On X86_64, we use struct pt_regs * to pass parameters to syscalls */
#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(struct pt_regs *);
/* this is a lie, but it does not hurt as sys_ni_syscall just returns -EINVAL */
extern asmlinkage long sys_ni_syscall(struct pt_regs *);
-#else /* CONFIG_SYSCALL_PTREGS */
+#else /* CONFIG_IA32_EMULATION */
#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
-#endif /* CONFIG_SYSCALL_PTREGS */
+#endif /* CONFIG_IA32_EMULATION */
#include <asm/syscalls_32.h>
#undef __SYSCALL_I386
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index b4e724777a7d..0ff4de8d9571 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -7,14 +7,9 @@
#include <asm/asm-offsets.h>
#include <asm/syscall.h>
-#ifdef CONFIG_SYSCALL_PTREGS
/* this is a lie, but it does not hurt as sys_ni_syscall just returns -EINVAL */
extern asmlinkage long sys_ni_syscall(struct pt_regs *);
#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(struct pt_regs *);
-#else /* CONFIG_SYSCALL_PTREGS */
-extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
-#endif /* CONFIG_SYSCALL_PTREGS */
#include <asm/syscalls_64.h>
#undef __SYSCALL_64
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 9fad68899f82..4e08df16d9da 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -227,35 +227,19 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
ret = -EFAULT;
switch (vsyscall_nr) {
case 0:
-#ifdef CONFIG_SYSCALL_PTREGS
/* this decodes regs->di and regs->si on its own */
ret = sys_gettimeofday(regs);
-#else
- ret = sys_gettimeofday(
- (struct timeval __user *)regs->di,
- (struct timezone __user *)regs->si);
-#endif /* CONFIG_SYSCALL_PTREGS */
break;
case 1:
-#ifdef CONFIG_SYSCALL_PTREGS
/* this decodes regs->di on its own */
ret = sys_time(regs);
-#else
- ret = sys_time((time_t __user *)regs->di);
-#endif /* CONFIG_SYSCALL_PTREGS */
break;
case 2:
-#ifdef CONFIG_SYSCALL_PTREGS
/* this decodes regs->di, regs->si and regs->dx on its own */
regs->dx = 0;
ret = sys_getcpu(regs);
-#else
- ret = sys_getcpu((unsigned __user *)regs->di,
- (unsigned __user *)regs->si,
- NULL);
-#endif /* CONFIG_SYSCALL_PTREGS */
break;
}
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 8702c7951bc7..0af95296ace8 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -20,13 +20,13 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
-#ifdef CONFIG_SYSCALL_PTREGS
+#ifdef CONFIG_X86_64
typedef asmlinkage long (*sys_call_ptr_t)(struct pt_regs *);
#else
typedef asmlinkage long (*sys_call_ptr_t)(unsigned long, unsigned long,
unsigned long, unsigned long,
unsigned long, unsigned long);
-#endif /* CONFIG_SYSCALL_PTREGS */
+#endif /* CONFIG_X86_64 */
extern const sys_call_ptr_t sys_call_table[];
#if defined(CONFIG_X86_32)
diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index e4ad93c05f02..a3aecee89881 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -19,10 +19,10 @@
/* kernel/ioport.c */
long ksys_ioperm(unsigned long from, unsigned long num, int turn_on);
-#ifndef CONFIG_SYSCALL_PTREGS
-/*
- * If CONFIG_SYSCALL_PTREGS is enabled, a different syscall calling convention
- * is used. Do not include these -- invalid -- prototypes then
+#ifdef CONFIG_X86_32
+/*
+ * These definitions are only valid on pure 32bit systems; x86-64 uses a
+ * different syscall calling convention
*/
asmlinkage long sys_ioperm(unsigned long, unsigned long, int);
asmlinkage long sys_iopl(unsigned int);
@@ -38,7 +38,6 @@ asmlinkage long sys_set_thread_area(struct user_desc __user *);
asmlinkage long sys_get_thread_area(struct user_desc __user *);
/* X86_32 only */
-#ifdef CONFIG_X86_32
/* kernel/signal.c */
asmlinkage long sys_sigreturn(void);
@@ -48,16 +47,5 @@ struct vm86_struct;
asmlinkage long sys_vm86old(struct vm86_struct __user *);
asmlinkage long sys_vm86(unsigned long, unsigned long);
-#else /* CONFIG_X86_32 */
-
-/* X86_64 only */
-/* kernel/process_64.c */
-asmlinkage long sys_arch_prctl(int, unsigned long);
-
-/* kernel/sys_x86_64.c */
-asmlinkage long sys_mmap(unsigned long, unsigned long, unsigned long,
- unsigned long, unsigned long, unsigned long);
-
#endif /* CONFIG_X86_32 */
-#endif /* CONFIG_SYSCALL_PTREGS */
#endif /* _ASM_X86_SYSCALLS_H */
--
2.16.3
* [PATCH 7/7] x86/entry/64: extend register clearing on syscall entry to lower registers
2018-03-30 9:37 [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Dominik Brodowski
` (5 preceding siblings ...)
2018-03-30 9:37 ` [PATCH 6/7] syscalls/x86: unconditionally enable struct pt_regs based syscalls on x86_64 Dominik Brodowski
@ 2018-03-30 9:37 ` Dominik Brodowski
2018-03-30 10:10 ` Ingo Molnar
2018-03-30 10:16 ` [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Ingo Molnar
7 siblings, 1 reply; 14+ messages in thread
From: Dominik Brodowski @ 2018-03-30 9:37 UTC (permalink / raw)
To: linux-kernel
Cc: viro, torvalds, arnd, linux-arch, Thomas Gleixner, Andi Kleen,
Ingo Molnar, Andrew Morton, Al Viro, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, Peter Zijlstra, H. Peter Anvin, x86
To reduce the chance that random user space content leaks down the call
chain in registers, also clear lower registers on syscall entry:
For 64bit syscalls, extend the register clearing in PUSH_AND_CLEAR_REGS
to %dx and %cx. This should not hurt at all, not even for the other callers
of that macro. We do not need to clear %rdi and %rsi for syscall entry,
as those registers are used to pass the parameters to do_syscall_64().
For the 32bit compat syscalls, do_int80_syscall_32() and
do_fast_syscall_32() each only take one parameter. Therefore, extend the
register clearing to %dx, %cx, and %si in entry_SYSCALL_compat and
entry_INT80_compat.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
---
arch/x86/entry/calling.h | 2 ++
arch/x86/entry/entry_64_compat.S | 6 ++++++
2 files changed, 8 insertions(+)
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index be63330c5511..593812a4c29e 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -114,7 +114,9 @@ For 32-bit we have the following conventions - kernel is built with
pushq %rsi /* pt_regs->si */
.endif
pushq \rdx /* pt_regs->dx */
+ xorl %edx, %edx /* nosepc dx */
pushq %rcx /* pt_regs->cx */
+ xorl %ecx, %ecx /* nosepc cx */
pushq \rax /* pt_regs->ax */
pushq %r8 /* pt_regs->r8 */
xorl %r8d, %r8d /* nospec r8 */
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 08425c42f8b7..23e0945959e5 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -220,8 +220,11 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
pushq %rax /* pt_regs->orig_ax */
pushq %rdi /* pt_regs->di */
pushq %rsi /* pt_regs->si */
+ xorl %esi, %esi /* nosepc si */
pushq %rdx /* pt_regs->dx */
+ xorl %edx, %edx /* nosepc dx */
pushq %rbp /* pt_regs->cx (stashed in bp) */
+ xorl %ecx, %ecx /* nosepc cx */
pushq $-ENOSYS /* pt_regs->ax */
pushq $0 /* pt_regs->r8 = 0 */
xorl %r8d, %r8d /* nospec r8 */
@@ -365,8 +368,11 @@ ENTRY(entry_INT80_compat)
pushq (%rdi) /* pt_regs->di */
pushq %rsi /* pt_regs->si */
+ xorl %esi, %esi /* nosepc si */
pushq %rdx /* pt_regs->dx */
+ xorl %edx, %edx /* nosepc dx */
pushq %rcx /* pt_regs->cx */
+ xorl %ecx, %ecx /* nosepc cx */
pushq $-ENOSYS /* pt_regs->ax */
pushq $0 /* pt_regs->r8 = 0 */
xorl %r8d, %r8d /* nospec r8 */
--
2.16.3
* Re: [PATCH 7/7] x86/entry/64: extend register clearing on syscall entry to lower registers
2018-03-30 9:37 ` [PATCH 7/7] x86/entry/64: extend register clearing on syscall entry to lower registers Dominik Brodowski
@ 2018-03-30 10:10 ` Ingo Molnar
0 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2018-03-30 10:10 UTC (permalink / raw)
To: Dominik Brodowski
Cc: linux-kernel, viro, torvalds, arnd, linux-arch, Thomas Gleixner,
Andi Kleen, Ingo Molnar, Andrew Morton, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, Peter Zijlstra, H. Peter Anvin, x86
* Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> .endif
> pushq \rdx /* pt_regs->dx */
> + xorl %edx, %edx /* nosepc dx */
> pushq %rcx /* pt_regs->cx */
> + xorl %ecx, %ecx /* nosepc cx */
> pushq \rax /* pt_regs->ax */
> pushq %r8 /* pt_regs->r8 */
> xorl %r8d, %r8d /* nospec r8 */
> diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
> index 08425c42f8b7..23e0945959e5 100644
> --- a/arch/x86/entry/entry_64_compat.S
> +++ b/arch/x86/entry/entry_64_compat.S
> @@ -220,8 +220,11 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
> pushq %rax /* pt_regs->orig_ax */
> pushq %rdi /* pt_regs->di */
> pushq %rsi /* pt_regs->si */
> + xorl %esi, %esi /* nosepc si */
> pushq %rdx /* pt_regs->dx */
> + xorl %edx, %edx /* nosepc dx */
> pushq %rbp /* pt_regs->cx (stashed in bp) */
> + xorl %ecx, %ecx /* nosepc cx */
> pushq $-ENOSYS /* pt_regs->ax */
> pushq $0 /* pt_regs->r8 = 0 */
> xorl %r8d, %r8d /* nospec r8 */
> @@ -365,8 +368,11 @@ ENTRY(entry_INT80_compat)
>
> pushq (%rdi) /* pt_regs->di */
> pushq %rsi /* pt_regs->si */
> + xorl %esi, %esi /* nosepc si */
> pushq %rdx /* pt_regs->dx */
> + xorl %edx, %edx /* nosepc dx */
> pushq %rcx /* pt_regs->cx */
> + xorl %ecx, %ecx /* nosepc cx */
> pushq $-ENOSYS /* pt_regs->ax */
> pushq $0 /* pt_regs->r8 = 0 */
> xorl %r8d, %r8d /* nospec r8 */
s/nosepc/nospec/
Thanks,
Ingo
* Re: [PATCH 0/7] use struct pt_regs based syscall calling for x86-64
2018-03-30 9:37 [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Dominik Brodowski
` (6 preceding siblings ...)
2018-03-30 9:37 ` [PATCH 7/7] x86/entry/64: extend register clearing on syscall entry to lower registers Dominik Brodowski
@ 2018-03-30 10:16 ` Ingo Molnar
2018-03-30 10:46 ` Dominik Brodowski
7 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2018-03-30 10:16 UTC (permalink / raw)
To: Dominik Brodowski
Cc: linux-kernel, viro, torvalds, arnd, linux-arch, Andi Kleen,
Andrew Morton, Andy Lutomirski, Brian Gerst, Denys Vlasenko,
H. Peter Anvin, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
x86
* Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> A few questions remain, from important stuff to bikeshedding:
>
> 1) Is it acceptable to pass the existing struct pt_regs to the sys_*()
> kernel functions in emulate_vsyscall(), or should it use a hand-crafted
> struct pt_regs instead?
I think so: we already have task_pt_regs() which gives access to the real return
registers on the kernel stack.
I think as long as we constify the pointer, we should pass in the real thing.
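A minimal user-space sketch of the "constify the pointer" idea, not the actual
patch: struct pt_regs here is a simplified stand-in that models only the four
argument registers used by the sys_recv() example from the cover letter, and
the body of SyS_recv() is a placeholder. The point is only that the stub reads
from regs and never writes to it, so the parameter can be const.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's struct pt_regs (assumption: only the
 * argument registers used below are modelled, not the real layout).
 */
struct pt_regs {
	unsigned long di, si, dx, r10;
};

/* Placeholder for the real implementation; it just sums its arguments. */
long SyS_recv(unsigned long fd, unsigned long buf,
	      unsigned long len, unsigned long flags)
{
	return (long)(fd + buf + len + flags);
}

/*
 * The stub only decodes the registers it needs. Because it never modifies
 * *regs, the real register frame can be passed in through a const pointer.
 */
long sys_recv(const struct pt_regs *regs)
{
	assert(regs != NULL);
	return SyS_recv(regs->di, regs->si, regs->dx, regs->r10);
}
```

With the const qualifier in place, an accidental write such as regs->di = 0
inside the stub is rejected at compile time.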
> 2) Is it the right approach to generate the __sys32_ia32_*() names to
> include in the syscall table on-the-fly, or should they all be listed
> in arch/x86/entry/syscalls/syscall_32.tbl ?
I think as a general principle all system call tables should point to the
first-hop wrapper symbol name (i.e. __sys32_ia32_*() in this case), not to the
generic symbol name - even though we could generate the former from the latter.
The more indirection in these tables, the harder to read they become I think.
> 3) I have chosen to name the default 64-bit syscall stub sys_*(), same as
> the "normal" syscall, and the IA32_EMULATION compat syscall stub
> compat_sys_*(), same as the "normal" compat syscall. Though this
> might cause some confusion, as the "same" function uses a different
> calling convention and different parameters on x86, it has the
> advantages that
> - the kernel *has* a function sys_*() implementing the syscall,
> so those curious in stack traces etc. will find it in plain
> sight,
> - it is easier to handle in the syscall table generation, and
> - error injection works the same.
I don't think there should be a symbol space overlap, that will only lead to
confusion. The symbols can be _similar_, with a prefix, underscores or so, but
they shouldn't match I think.
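As an illustration of a distinct symbol space, here is a hedged user-space
sketch in which a macro generates a prefixed stub and the call table refers to
that first-hop name directly. The stub_x64_sys_ prefix, the DEFINE_STUB macro,
do_demo() and demo_call_table are all hypothetical names chosen for this
example; struct pt_regs is again a simplified stand-in.

```c
#include <assert.h>

/* Simplified stand-in for struct pt_regs (assumption, not the real layout). */
struct pt_regs {
	unsigned long di, si, dx, r10;
};

/* Hypothetical implementation function taking already-decoded arguments. */
long do_demo(unsigned long a, unsigned long b)
{
	return (long)(a * b);
}

/*
 * Generate a stub whose name carries a prefix, so that it cannot be confused
 * with a generic sys_<name>() symbol using the normal calling convention.
 */
#define DEFINE_STUB(name)						\
	long stub_x64_sys_##name(const struct pt_regs *regs)		\
	{								\
		return do_##name(regs->di, regs->si);			\
	}

DEFINE_STUB(demo)

/* Local typedef for table entries; every entry takes the register frame. */
typedef long (*sys_call_ptr_t)(const struct pt_regs *);

/* The table names the first-hop wrapper explicitly, with no indirection. */
const sys_call_ptr_t demo_call_table[] = {
	stub_x64_sys_demo,
};
```

A reader of the table, or of a stack trace, then sees at a glance that the
entry is the register-decoding stub and not the function with the generic
argument list.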
> The whole series is available at
>
> https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-WIP
BTW., I'd like all these bits to go through the x86 tree.
What is the expected merge route of the generic preparatory bits?
Thanks,
Ingo
* Re: [PATCH 0/7] use struct pt_regs based syscall calling for x86-64
2018-03-30 10:16 ` [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Ingo Molnar
@ 2018-03-30 10:46 ` Dominik Brodowski
2018-03-30 11:03 ` Ingo Molnar
0 siblings, 1 reply; 14+ messages in thread
From: Dominik Brodowski @ 2018-03-30 10:46 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, viro, torvalds, arnd, linux-arch, Andi Kleen,
Andrew Morton, Andy Lutomirski, Brian Gerst, Denys Vlasenko,
H. Peter Anvin, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
x86
On Fri, Mar 30, 2018 at 12:16:02PM +0200, Ingo Molnar wrote:
>
> * Dominik Brodowski <linux@dominikbrodowski.net> wrote:
>
> > A few questions remain, from important stuff to bikeshedding:
> >
> > 1) Is it acceptable to pass the existing struct pt_regs to the sys_*()
> > kernel functions in emulate_vsyscall(), or should it use a hand-crafted
> > struct pt_regs instead?
>
> I think so: we already have task_pt_regs() which gives access to the real return
> registers on the kernel stack.
>
> I think as long as we constify the pointer, we should pass in the real thing.
Good idea. I have updated the patchset accordingly.
> > 2) Is it the right approach to generate the __sys32_ia32_*() names to
> > include in the syscall table on-the-fly, or should they all be listed
> > in arch/x86/entry/syscalls/syscall_32.tbl ?
>
> I think as a general principle all system call tables should point to the
> first-hop wrapper symbol name (i.e. __sys32_ia32_*() in this case), not to the
> generic symbol name - even though we could generate the former from the latter.
>
> The more indirection in these tables, the harder to read they become I think.
>
> > 3) I have chosen to name the default 64-bit syscall stub sys_*(), same as
> > the "normal" syscall, and the IA32_EMULATION compat syscall stub
> > compat_sys_*(), same as the "normal" compat syscall. Though this
> > might cause some confusion, as the "same" function uses a different
> > calling convention and different parameters on x86, it has the
> > advantages that
> > - the kernel *has* a function sys_*() implementing the syscall,
> > so those curious in stack traces etc. will find it in plain
> > sight,
> > - it is easier to handle in the syscall table generation, and
> > - error injection works the same.
>
> I don't think there should be a symbol space overlap, that will only lead to
> confusion. The symbols can be _similar_, with a prefix, underscores or so, but
> they shouldn't match I think.
OK, I'll wait for a few more opinions on these two related issues, and update
the code accordingly then.
> > The whole series is available at
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-WIP
>
> BTW., I'd like all these bits to go through the x86 tree.
>
> What is the expected merge route of the generic preparatory bits?
My current plan is to push the 109 patch bomb to remove in-kernel calls to syscalls
directly to Linus once v4.16 is released.
For this series of seven patches, I am content with them going upstream through
the x86 tree (once that contains a backmerge of Linus' tree or the syscalls
tree, obviously). IMO, these seven patches should be kept together, and not routed
upstream through different channels.
Thanks,
Dominik
* Re: [PATCH 0/7] use struct pt_regs based syscall calling for x86-64
2018-03-30 10:46 ` Dominik Brodowski
@ 2018-03-30 11:03 ` Ingo Molnar
2018-03-30 11:48 ` Dominik Brodowski
0 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2018-03-30 11:03 UTC (permalink / raw)
To: Dominik Brodowski
Cc: linux-kernel, viro, torvalds, arnd, linux-arch, Andi Kleen,
Andrew Morton, Andy Lutomirski, Brian Gerst, Denys Vlasenko,
H. Peter Anvin, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
x86
* Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> > > The whole series is available at
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-WIP
> >
> > BTW., I'd like all these bits to go through the x86 tree.
> >
> > What is the expected merge route of the generic preparatory bits?
>
> My current plan is to push the 109 patch bomb to remove in-kernel calls to syscalls
> directly to Linus once v4.16 is released.
Are there any (textual and semantic) conflicts with the latest -next?
Also, to what extent were these 109 patches tested in -next?
> For this series of seven patches, I am content with them going upstream through
> the x86 tree (once that contains a backmerge of Linus' tree or the syscalls
> tree, obviously). IMO, these seven patches should be kept together, and not
> routed upstream through different channels.
Of course they should stay together - the generic code impact is minimal, these
are 95% x86.
Thanks,
Ingo
* Re: [PATCH 0/7] use struct pt_regs based syscall calling for x86-64
2018-03-30 11:03 ` Ingo Molnar
@ 2018-03-30 11:48 ` Dominik Brodowski
2018-03-30 12:00 ` Ingo Molnar
0 siblings, 1 reply; 14+ messages in thread
From: Dominik Brodowski @ 2018-03-30 11:48 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, viro, torvalds, arnd, linux-arch, Andi Kleen,
Andrew Morton, Andy Lutomirski, Brian Gerst, Denys Vlasenko,
H. Peter Anvin, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
x86
On Fri, Mar 30, 2018 at 01:03:54PM +0200, Ingo Molnar wrote:
>
> * Dominik Brodowski <linux@dominikbrodowski.net> wrote:
>
> > > > The whole series is available at
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-WIP
> > >
> > > BTW., I'd like all these bits to go through the x86 tree.
> > >
> > > What is the expected merge route of the generic preparatory bits?
> >
> > My current plan is to push the 109 patch bomb to remove in-kernel calls to syscalls
> > directly to Linus once v4.16 is released.
>
> Are there any (textual and semantic) conflicts with the latest -next?
>
> Also, to what extent were these 109 patches tested in -next?
These 109 patches are equivalent to the syscalls tree in linux-next. Most of
these patches have been in there for quite a while (the last major batch went
in on March 22; other patches have been in there since March 14th).
Conflicts existed with asm-generic and metag (which contain the removal of some
architectures; I have solved that issue by not caring about those archs any
more); trivial conflicts have existed for the past few days with the vfs and
sparc trees.
Thanks,
Dominik
* Re: [PATCH 0/7] use struct pt_regs based syscall calling for x86-64
2018-03-30 11:48 ` Dominik Brodowski
@ 2018-03-30 12:00 ` Ingo Molnar
0 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2018-03-30 12:00 UTC (permalink / raw)
To: Dominik Brodowski
Cc: linux-kernel, viro, torvalds, arnd, linux-arch, Andi Kleen,
Andrew Morton, Andy Lutomirski, Brian Gerst, Denys Vlasenko,
H. Peter Anvin, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
x86
* Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> On Fri, Mar 30, 2018 at 01:03:54PM +0200, Ingo Molnar wrote:
> >
> > * Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> >
> > > > > The whole series is available at
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git syscalls-WIP
> > > >
> > > > BTW., I'd like all these bits to go through the x86 tree.
> > > >
> > > > What is the expected merge route of the generic preparatory bits?
> > >
> > > My current plan is to push the 109 patch bomb to remove in-kernel calls to syscalls
> > > directly to Linus once v4.16 is released.
> >
> > Are there any (textual and semantic) conflicts with the latest -next?
> >
> > Also, to what extent were these 109 patches tested in -next?
>
> These 109 patches are equivalent to the syscalls tree in linux-next. Most of
> these patches have been in there for quite a while (the last major batch went
> in on March 22; other patches have been in there since March 14th).
>
> Conflicts existed with asm-generic and metag (which contain the removal of some
> architectures; I have solved that issue by not caring about those archs any
> more); trivial conflicts have existed for the past few days with the vfs and
> sparc trees.
Ok, great - all that sounds good to me, and I'll integrate the x86 bits once the
generic bits are upstream.
Thanks,
Ingo
Thread overview: 14+ messages
2018-03-30 9:37 [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Dominik Brodowski
2018-03-30 9:37 ` [PATCH 1/7] x86: don't pointlessly reload the system call number Dominik Brodowski
2018-03-30 9:37 ` [PATCH 2/7] syscalls: introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER Dominik Brodowski
2018-03-30 9:37 ` [PATCH 3/7] syscalls/x86: use struct pt_regs based syscall calling for 64bit syscalls Dominik Brodowski
2018-03-30 9:37 ` [PATCH 4/7] syscalls: prepare ARCH_HAS_SYSCALL_WRAPPER for compat syscalls Dominik Brodowski
2018-03-30 9:37 ` [PATCH 5/7] syscalls/x86: use struct pt_regs based syscall calling for IA32_EMULATION and x32 Dominik Brodowski
2018-03-30 9:37 ` [PATCH 6/7] syscalls/x86: unconditionally enable struct pt_regs based syscalls on x86_64 Dominik Brodowski
2018-03-30 9:37 ` [PATCH 7/7] x86/entry/64: extend register clearing on syscall entry to lower registers Dominik Brodowski
2018-03-30 10:10 ` Ingo Molnar
2018-03-30 10:16 ` [PATCH 0/7] use struct pt_regs based syscall calling for x86-64 Ingo Molnar
2018-03-30 10:46 ` Dominik Brodowski
2018-03-30 11:03 ` Ingo Molnar
2018-03-30 11:48 ` Dominik Brodowski
2018-03-30 12:00 ` Ingo Molnar