* [PATCH v2 00/10] x86: Rewrite 64-bit syscall code
@ 2016-01-28 23:11 Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 01/10] selftests/x86: Extend Makefile to allow 64-bit-only tests Andy Lutomirski
                   ` (9 more replies)
  0 siblings, 10 replies; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski

This is kind of like the 32-bit and compat code, except that I
preserved the fast path this time.  I was unable to measure any
significant performance change on my laptop in the fast path.

Changes from v1:
 - Various tidying up.
 - Remove duplicate tables (folded in, so the fastpath table isn't in this set).
 - Rebased to 4.5-rc1
 - Remove enter_from_user_mode stuff -- let's get the basics in first.

Andy Lutomirski (10):
  selftests/x86: Extend Makefile to allow 64-bit-only tests
  selftests/x86: Add check_initial_reg_state
  x86/syscalls: Refactor syscalltbl.sh
  x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
  x86/syscalls: Move compat syscall entry handling into syscalltbl.sh
  x86/syscalls: Add syscall entry qualifiers
  x86/entry/64: Always run ptregs-using syscalls on the slow path
  x86/entry/64: Call all native slow-path syscalls with full pt-regs
  x86/entry/64: Stop using int_ret_from_sys_call in ret_from_fork
  x86/entry/64: Migrate the 64-bit syscall slow path to C

 arch/x86/entry/common.c                            |  26 ++
 arch/x86/entry/entry_64.S                          | 271 +++++++--------------
 arch/x86/entry/syscall_32.c                        |  10 +-
 arch/x86/entry/syscall_64.c                        |  13 +-
 arch/x86/entry/syscalls/syscall_64.tbl             |  18 +-
 arch/x86/entry/syscalls/syscalltbl.sh              |  58 ++++-
 arch/x86/kernel/asm-offsets_32.c                   |   2 +-
 arch/x86/kernel/asm-offsets_64.c                   |  10 +-
 arch/x86/um/sys_call_table_32.c                    |   4 +-
 arch/x86/um/sys_call_table_64.c                    |   7 +-
 arch/x86/um/user-offsets.c                         |   6 +-
 tools/testing/selftests/x86/Makefile               |  14 +-
 .../selftests/x86/check_initial_reg_state.c        | 109 +++++++++
 13 files changed, 317 insertions(+), 231 deletions(-)
 create mode 100644 tools/testing/selftests/x86/check_initial_reg_state.c

-- 
2.5.0

* [PATCH v2 01/10] selftests/x86: Extend Makefile to allow 64-bit-only tests
  2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
@ 2016-01-28 23:11 ` Andy Lutomirski
  2016-01-29 11:33   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 02/10] selftests/x86: Add check_initial_reg_state Andy Lutomirski
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski, Shuah Khan, linux-api

Previously the Makefile supported 32-bit-only tests and tests that
were 32-bit and 64-bit.  This adds support for tests that are
only built as 64-bit binaries.

There aren't any yet, but there might be a few some day.

Cc: Shuah Khan <shuahkhan@gmail.com>
Cc: linux-api@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 tools/testing/selftests/x86/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index d0c473f65850..9c81f263a396 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -11,8 +11,9 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault sigreturn test_syscall
 			vdso_restorer
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
+TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)
 BINARIES_32 := $(TARGETS_C_32BIT_ALL:%=%_32)
-BINARIES_64 := $(TARGETS_C_BOTHBITS:%=%_64)
+BINARIES_64 := $(TARGETS_C_64BIT_ALL:%=%_64)
 
 CFLAGS := -O2 -g -std=gnu99 -pthread -Wall
 
@@ -40,7 +41,7 @@ clean:
 $(TARGETS_C_32BIT_ALL:%=%_32): %_32: %.c
 	$(CC) -m32 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl -lm
 
-$(TARGETS_C_BOTHBITS:%=%_64): %_64: %.c
+$(TARGETS_C_64BIT_ALL:%=%_64): %_64: %.c
 	$(CC) -m64 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl
 
 # x86_64 users should be encouraged to install 32-bit libraries
-- 
2.5.0

* [PATCH v2 02/10] selftests/x86: Add check_initial_reg_state
  2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 01/10] selftests/x86: Extend Makefile to allow 64-bit-only tests Andy Lutomirski
@ 2016-01-28 23:11 ` Andy Lutomirski
  2016-01-29 11:34   ` [tip:x86/asm] selftests/x86: Add check_initial_reg_state() tip-bot for Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 03/10] x86/syscalls: Refactor syscalltbl.sh Andy Lutomirski
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski

This checks that ELF binaries are started with an appropriately
blank register state.

(There's currently a nasty special case in the entry asm to arrange
 for this.  I'm planning on removing the special case, and this will
 help make sure I don't break it.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 tools/testing/selftests/x86/Makefile               |   9 +-
 .../selftests/x86/check_initial_reg_state.c        | 109 +++++++++++++++++++++
 2 files changed, 117 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/check_initial_reg_state.c

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 9c81f263a396..df4f767f48da 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -4,7 +4,8 @@ include ../lib.mk
 
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
-TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_syscall
+TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_syscall \
+			check_initial_reg_state
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault sigreturn test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			ldt_gdt \
@@ -66,3 +67,9 @@ endif
 sysret_ss_attrs_64: thunks.S
 ptrace_syscall_32: raw_syscall_helper_32.S
 test_syscall_vdso_32: thunks_32.S
+
+# check_initial_reg_state is special: it needs a custom entry, and it
+# needs to be static so that its interpreter doesn't destroy its initial
+# state.
+check_initial_reg_state_32: CFLAGS += -Wl,-ereal_start -static
+check_initial_reg_state_64: CFLAGS += -Wl,-ereal_start -static
diff --git a/tools/testing/selftests/x86/check_initial_reg_state.c b/tools/testing/selftests/x86/check_initial_reg_state.c
new file mode 100644
index 000000000000..6aaed9b85baf
--- /dev/null
+++ b/tools/testing/selftests/x86/check_initial_reg_state.c
@@ -0,0 +1,109 @@
+/*
+ * check_initial_reg_state.c - check that execve sets the correct state
+ * Copyright (c) 2014-2016 Andrew Lutomirski
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+
+unsigned long ax, bx, cx, dx, si, di, bp, sp, flags;
+unsigned long r8, r9, r10, r11, r12, r13, r14, r15;
+
+asm (
+	".pushsection .text\n\t"
+	".type real_start, @function\n\t"
+	".global real_start\n\t"
+	"real_start:\n\t"
+#ifdef __x86_64__
+	"mov %rax, ax\n\t"
+	"mov %rbx, bx\n\t"
+	"mov %rcx, cx\n\t"
+	"mov %rdx, dx\n\t"
+	"mov %rsi, si\n\t"
+	"mov %rdi, di\n\t"
+	"mov %rbp, bp\n\t"
+	"mov %rsp, sp\n\t"
+	"mov %r8, r8\n\t"
+	"mov %r9, r9\n\t"
+	"mov %r10, r10\n\t"
+	"mov %r11, r11\n\t"
+	"mov %r12, r12\n\t"
+	"mov %r13, r13\n\t"
+	"mov %r14, r14\n\t"
+	"mov %r15, r15\n\t"
+	"pushfq\n\t"
+	"popq flags\n\t"
+#else
+	"mov %eax, ax\n\t"
+	"mov %ebx, bx\n\t"
+	"mov %ecx, cx\n\t"
+	"mov %edx, dx\n\t"
+	"mov %esi, si\n\t"
+	"mov %edi, di\n\t"
+	"mov %ebp, bp\n\t"
+	"mov %esp, sp\n\t"
+	"pushfl\n\t"
+	"popl flags\n\t"
+#endif
+	"jmp _start\n\t"
+	".size real_start, . - real_start\n\t"
+	".popsection");
+
+int main()
+{
+	int nerrs = 0;
+
+	if (sp == 0) {
+		printf("[FAIL]\tTest was built incorrectly\n");
+		return 1;
+	}
+
+	if (ax || bx || cx || dx || si || di || bp
+#ifdef __x86_64__
+	    || r8 || r9 || r10 || r11 || r12 || r13 || r14 || r15
+#endif
+		) {
+		printf("[FAIL]\tAll GPRs except SP should be 0\n");
+#define SHOW(x) printf("\t" #x " = 0x%lx\n", x);
+		SHOW(ax);
+		SHOW(bx);
+		SHOW(cx);
+		SHOW(dx);
+		SHOW(si);
+		SHOW(di);
+		SHOW(bp);
+		SHOW(sp);
+#ifdef __x86_64__
+		SHOW(r8);
+		SHOW(r9);
+		SHOW(r10);
+		SHOW(r11);
+		SHOW(r12);
+		SHOW(r13);
+		SHOW(r14);
+		SHOW(r15);
+#endif
+		nerrs++;
+	} else {
+		printf("[OK]\tAll GPRs except SP are 0\n");
+	}
+
+	if (flags != 0x202) {
+		printf("[FAIL]\tFLAGS is 0x%lx, but it should be 0x202\n", flags);
+		nerrs++;
+	} else {
+		printf("[OK]\tFLAGS is 0x202\n");
+	}
+
+	return nerrs ? 1 : 0;
+}
-- 
2.5.0

* [PATCH v2 03/10] x86/syscalls: Refactor syscalltbl.sh
  2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 01/10] selftests/x86: Extend Makefile to allow 64-bit-only tests Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 02/10] selftests/x86: Add check_initial_reg_state Andy Lutomirski
@ 2016-01-28 23:11 ` Andy Lutomirski
  2016-01-29 11:34   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 04/10] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32 Andy Lutomirski
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski

This splits out the code to emit a syscall line.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/syscalls/syscalltbl.sh | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index 0e7f8ec071e7..167965ee742e 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -3,13 +3,21 @@
 in="$1"
 out="$2"
 
+emit() {
+    abi="$1"
+    nr="$2"
+    entry="$3"
+    compat="$4"
+    if [ -n "$compat" ]; then
+	echo "__SYSCALL_${abi}($nr, $entry, $compat)"
+    elif [ -n "$entry" ]; then
+	echo "__SYSCALL_${abi}($nr, $entry, $entry)"
+    fi
+}
+
 grep '^[0-9]' "$in" | sort -n | (
     while read nr abi name entry compat; do
 	abi=`echo "$abi" | tr '[a-z]' '[A-Z]'`
-	if [ -n "$compat" ]; then
-	    echo "__SYSCALL_${abi}($nr, $entry, $compat)"
-	elif [ -n "$entry" ]; then
-	    echo "__SYSCALL_${abi}($nr, $entry, $entry)"
-	fi
+	emit "$abi" "$nr" "$entry" "$compat"
     done
 ) > "$out"
-- 
2.5.0

* [PATCH v2 04/10] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
  2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
                   ` (2 preceding siblings ...)
  2016-01-28 23:11 ` [PATCH v2 03/10] x86/syscalls: Refactor syscalltbl.sh Andy Lutomirski
@ 2016-01-28 23:11 ` Andy Lutomirski
  2016-01-29 11:34   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 05/10] x86/syscalls: Move compat syscall entry handling into syscalltbl.sh Andy Lutomirski
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski

The common/64/x32 distinction has no effect other than determining
which kernels actually support the syscall.  Move the logic into
syscalltbl.sh.
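
For illustration: an x32-only entry such as

  520	x32	execve			stub_x32_execve

now comes out of syscalltbl.sh as

#ifdef CONFIG_X86_X32_ABI
__SYSCALL_64(520, stub_x32_execve, stub_x32_execve)
#endif

so every consumer only needs to define __SYSCALL_64, and non-X32
kernels drop the entry at preprocessing time.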

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/syscall_64.c           |  8 --------
 arch/x86/entry/syscalls/syscalltbl.sh | 17 ++++++++++++++++-
 arch/x86/kernel/asm-offsets_64.c      |  6 ------
 arch/x86/um/sys_call_table_64.c       |  3 ---
 arch/x86/um/user-offsets.c            |  2 --
 5 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 41283d22be7a..974fd89ac806 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -6,14 +6,6 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
-
-#ifdef CONFIG_X86_X32_ABI
-# define __SYSCALL_X32(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
-#else
-# define __SYSCALL_X32(nr, sym, compat) /* nothing */
-#endif
-
 #define __SYSCALL_64(nr, sym, compat) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index 167965ee742e..5ebeaf1041e7 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -18,6 +18,21 @@ emit() {
 grep '^[0-9]' "$in" | sort -n | (
     while read nr abi name entry compat; do
 	abi=`echo "$abi" | tr '[a-z]' '[A-Z]'`
-	emit "$abi" "$nr" "$entry" "$compat"
+	if [ "$abi" == "COMMON" -o "$abi" == "64" ]; then
+	    # COMMON is the same as 64, except that we don't expect X32
+	    # programs to use it.  Our expectation has nothing to do with
+	    # any generated code, so treat them the same.
+	    emit 64 "$nr" "$entry" "$compat"
+	elif [ "$abi" == "X32" ]; then
+	    # X32 is equivalent to 64 on an X32-compatible kernel.
+	    echo "#ifdef CONFIG_X86_X32_ABI"
+	    emit 64 "$nr" "$entry" "$compat"
+	    echo "#endif"
+	elif [ "$abi" == "I386" ]; then
+	    emit "$abi" "$nr" "$entry" "$compat"
+	else
+	    echo "Unknown abi $abi" >&2
+	    exit 1
+	fi
     done
 ) > "$out"
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index f2edafb5f24e..29db3b3f550c 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -5,12 +5,6 @@
 #include <asm/ia32.h>
 
 #define __SYSCALL_64(nr, sym, compat) [nr] = 1,
-#define __SYSCALL_COMMON(nr, sym, compat) [nr] = 1,
-#ifdef CONFIG_X86_X32_ABI
-# define __SYSCALL_X32(nr, sym, compat) [nr] = 1,
-#else
-# define __SYSCALL_X32(nr, sym, compat) /* nothing */
-#endif
 static char syscalls_64[] = {
 #include <asm/syscalls_64.h>
 };
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index b74ea6c2c0e7..71a497cde921 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -35,9 +35,6 @@
 #define stub_execveat sys_execveat
 #define stub_rt_sigreturn sys_rt_sigreturn
 
-#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
-#define __SYSCALL_X32(nr, sym, compat) /* Not supported */
-
 #define __SYSCALL_64(nr, sym, compat) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 
diff --git a/arch/x86/um/user-offsets.c b/arch/x86/um/user-offsets.c
index ce7e3607a870..5edf4f4bbf53 100644
--- a/arch/x86/um/user-offsets.c
+++ b/arch/x86/um/user-offsets.c
@@ -15,8 +15,6 @@ static char syscalls[] = {
 };
 #else
 #define __SYSCALL_64(nr, sym, compat) [nr] = 1,
-#define __SYSCALL_COMMON(nr, sym, compat) [nr] = 1,
-#define __SYSCALL_X32(nr, sym, compat) /* Not supported */
 static char syscalls[] = {
 #include <asm/syscalls_64.h>
 };
-- 
2.5.0

* [PATCH v2 05/10] x86/syscalls: Move compat syscall entry handling into syscalltbl.sh
  2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
                   ` (3 preceding siblings ...)
  2016-01-28 23:11 ` [PATCH v2 04/10] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32 Andy Lutomirski
@ 2016-01-28 23:11 ` Andy Lutomirski
  2016-01-29 11:35   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 06/10] x86/syscalls: Add syscall entry qualifiers Andy Lutomirski
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski

Rather than duplicating the compat entry handling in all consumers
of syscalls_BITS.h, handle it directly in syscalltbl.sh.  Now we
generate entries in syscalls_32.h like:

#ifdef CONFIG_X86_32
__SYSCALL_I386(5, sys_open)
#else
__SYSCALL_I386(5, compat_sys_open)
#endif

and all of its consumers implicitly get the right entry point.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/syscall_32.c           | 10 ++--------
 arch/x86/entry/syscall_64.c           |  4 ++--
 arch/x86/entry/syscalls/syscalltbl.sh | 22 ++++++++++++++++++----
 arch/x86/kernel/asm-offsets_32.c      |  2 +-
 arch/x86/kernel/asm-offsets_64.c      |  4 ++--
 arch/x86/um/sys_call_table_32.c       |  4 ++--
 arch/x86/um/sys_call_table_64.c       |  4 ++--
 arch/x86/um/user-offsets.c            |  4 ++--
 8 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index 9a6649857106..3e2829759da2 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -6,17 +6,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#ifdef CONFIG_IA32_EMULATION
-#define SYM(sym, compat) compat
-#else
-#define SYM(sym, compat) sym
-#endif
-
-#define __SYSCALL_I386(nr, sym, compat) extern asmlinkage long SYM(sym, compat)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_I386(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_32.h>
 #undef __SYSCALL_I386
 
-#define __SYSCALL_I386(nr, sym, compat) [nr] = SYM(sym, compat),
+#define __SYSCALL_I386(nr, sym) [nr] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 974fd89ac806..3781989b180e 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -6,11 +6,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_64(nr, sym, compat) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_64(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym, compat) [nr] = sym,
+#define __SYSCALL_64(nr, sym) [nr] = sym,
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index 5ebeaf1041e7..b81479c8c5fb 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -8,10 +8,24 @@ emit() {
     nr="$2"
     entry="$3"
     compat="$4"
-    if [ -n "$compat" ]; then
-	echo "__SYSCALL_${abi}($nr, $entry, $compat)"
-    elif [ -n "$entry" ]; then
-	echo "__SYSCALL_${abi}($nr, $entry, $entry)"
+
+    if [ "$abi" == "64" -a -n "$compat" ]; then
+	echo "a compat entry for a 64-bit syscall makes no sense" >&2
+	exit 1
+    fi
+
+    if [ -z "$compat" ]; then
+	if [ -n "$entry" ]; then
+	    echo "__SYSCALL_${abi}($nr, $entry)"
+	fi
+    else
+	echo "#ifdef CONFIG_X86_32"
+	if [ -n "$entry" ]; then
+	    echo "__SYSCALL_${abi}($nr, $entry)"
+	fi
+	echo "#else"
+	echo "__SYSCALL_${abi}($nr, $compat)"
+	echo "#endif"
     fi
 }
 
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 6ce39025f467..abec4c9f1c97 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -7,7 +7,7 @@
 #include <linux/lguest.h>
 #include "../../../drivers/lguest/lg.h"
 
-#define __SYSCALL_I386(nr, sym, compat) [nr] = 1,
+#define __SYSCALL_I386(nr, sym) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_32.h>
 };
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 29db3b3f550c..9677bf9a616f 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -4,11 +4,11 @@
 
 #include <asm/ia32.h>
 
-#define __SYSCALL_64(nr, sym, compat) [nr] = 1,
+#define __SYSCALL_64(nr, sym) [nr] = 1,
 static char syscalls_64[] = {
 #include <asm/syscalls_64.h>
 };
-#define __SYSCALL_I386(nr, sym, compat) [nr] = 1,
+#define __SYSCALL_I386(nr, sym) [nr] = 1,
 static char syscalls_ia32[] = {
 #include <asm/syscalls_32.h>
 };
diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
index 439c0994b696..d4669a679fd0 100644
--- a/arch/x86/um/sys_call_table_32.c
+++ b/arch/x86/um/sys_call_table_32.c
@@ -25,11 +25,11 @@
 
 #define old_mmap sys_old_mmap
 
-#define __SYSCALL_I386(nr, sym, compat) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_I386(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_32.h>
 
 #undef __SYSCALL_I386
-#define __SYSCALL_I386(nr, sym, compat) [ nr ] = sym,
+#define __SYSCALL_I386(nr, sym) [ nr ] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index 71a497cde921..6ee5268beb05 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -35,11 +35,11 @@
 #define stub_execveat sys_execveat
 #define stub_rt_sigreturn sys_rt_sigreturn
 
-#define __SYSCALL_64(nr, sym, compat) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_64(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 
 #undef __SYSCALL_64
-#define __SYSCALL_64(nr, sym, compat) [ nr ] = sym,
+#define __SYSCALL_64(nr, sym) [ nr ] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/um/user-offsets.c b/arch/x86/um/user-offsets.c
index 5edf4f4bbf53..6c9a9c1eae32 100644
--- a/arch/x86/um/user-offsets.c
+++ b/arch/x86/um/user-offsets.c
@@ -9,12 +9,12 @@
 #include <asm/types.h>
 
 #ifdef __i386__
-#define __SYSCALL_I386(nr, sym, compat) [nr] = 1,
+#define __SYSCALL_I386(nr, sym) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_32.h>
 };
 #else
-#define __SYSCALL_64(nr, sym, compat) [nr] = 1,
+#define __SYSCALL_64(nr, sym) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_64.h>
 };
-- 
2.5.0

* [PATCH v2 06/10] x86/syscalls: Add syscall entry qualifiers
  2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
                   ` (4 preceding siblings ...)
  2016-01-28 23:11 ` [PATCH v2 05/10] x86/syscalls: Move compat syscall entry handling into syscalltbl.sh Andy Lutomirski
@ 2016-01-28 23:11 ` Andy Lutomirski
  2016-01-29 11:35   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 07/10] x86/entry/64: Always run ptregs-using syscalls on the slow path Andy Lutomirski
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski

This will let us specify something like sys_xyz/foo instead of
sys_xyz in the syscall table, where the foo conveys some information
to the C code.

The intent is to allow things like sys_execve/ptregs to indicate
that sys_execve touches pt_regs.
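
As a sketch of the resulting output: an unqualified entry and a
qualified one, say

  0	common	read			sys_read
  59	64	execve			sys_execve/ptregs

would be emitted as

__SYSCALL_64(0, sys_read, )
__SYSCALL_64(59, sys_execve, ptregs)

i.e. the qualifier becomes a third macro argument (empty when no
qualifier is given).  The sys_execve/ptregs entry is illustrative
here; the table itself is converted over the next two patches.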

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/syscall_32.c           |  4 ++--
 arch/x86/entry/syscall_64.c           |  4 ++--
 arch/x86/entry/syscalls/syscalltbl.sh | 19 ++++++++++++++++---
 arch/x86/kernel/asm-offsets_32.c      |  2 +-
 arch/x86/kernel/asm-offsets_64.c      |  4 ++--
 arch/x86/um/sys_call_table_32.c       |  4 ++--
 arch/x86/um/sys_call_table_64.c       |  4 ++--
 arch/x86/um/user-offsets.c            |  4 ++--
 8 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index 3e2829759da2..8f895ee13a1c 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -6,11 +6,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_I386(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_32.h>
 #undef __SYSCALL_I386
 
-#define __SYSCALL_I386(nr, sym) [nr] = sym,
+#define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 3781989b180e..a1d408772ae6 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -6,11 +6,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_64(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym) [nr] = sym,
+#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index b81479c8c5fb..cd3d3015d7df 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -3,6 +3,19 @@
 in="$1"
 out="$2"
 
+syscall_macro() {
+    abi="$1"
+    nr="$2"
+    entry="$3"
+
+    # Entry can be either just a function name or "function/qualifier"
+    real_entry="${entry%%/*}"
+    qualifier="${entry:${#real_entry}}"		# Strip the function name
+    qualifier="${qualifier:1}"			# Strip the slash, if any
+
+    echo "__SYSCALL_${abi}($nr, $real_entry, $qualifier)"
+}
+
 emit() {
     abi="$1"
     nr="$2"
@@ -16,15 +29,15 @@ emit() {
 
     if [ -z "$compat" ]; then
 	if [ -n "$entry" ]; then
-	    echo "__SYSCALL_${abi}($nr, $entry)"
+	    syscall_macro "$abi" "$nr" "$entry"
 	fi
     else
 	echo "#ifdef CONFIG_X86_32"
 	if [ -n "$entry" ]; then
-	    echo "__SYSCALL_${abi}($nr, $entry)"
+	    syscall_macro "$abi" "$nr" "$entry"
 	fi
 	echo "#else"
-	echo "__SYSCALL_${abi}($nr, $compat)"
+	syscall_macro "$abi" "$nr" "$compat"
 	echo "#endif"
     fi
 }
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index abec4c9f1c97..fdeb0ce07c16 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -7,7 +7,7 @@
 #include <linux/lguest.h>
 #include "../../../drivers/lguest/lg.h"
 
-#define __SYSCALL_I386(nr, sym) [nr] = 1,
+#define __SYSCALL_I386(nr, sym, qual) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_32.h>
 };
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 9677bf9a616f..d875f97d4e0b 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -4,11 +4,11 @@
 
 #include <asm/ia32.h>
 
-#define __SYSCALL_64(nr, sym) [nr] = 1,
+#define __SYSCALL_64(nr, sym, qual) [nr] = 1,
 static char syscalls_64[] = {
 #include <asm/syscalls_64.h>
 };
-#define __SYSCALL_I386(nr, sym) [nr] = 1,
+#define __SYSCALL_I386(nr, sym, qual) [nr] = 1,
 static char syscalls_ia32[] = {
 #include <asm/syscalls_32.h>
 };
diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
index d4669a679fd0..bfce503dffae 100644
--- a/arch/x86/um/sys_call_table_32.c
+++ b/arch/x86/um/sys_call_table_32.c
@@ -25,11 +25,11 @@
 
 #define old_mmap sys_old_mmap
 
-#define __SYSCALL_I386(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_32.h>
 
 #undef __SYSCALL_I386
-#define __SYSCALL_I386(nr, sym) [ nr ] = sym,
+#define __SYSCALL_I386(nr, sym, qual) [ nr ] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index 6ee5268beb05..f306413d3eb6 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -35,11 +35,11 @@
 #define stub_execveat sys_execveat
 #define stub_rt_sigreturn sys_rt_sigreturn
 
-#define __SYSCALL_64(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 
 #undef __SYSCALL_64
-#define __SYSCALL_64(nr, sym) [ nr ] = sym,
+#define __SYSCALL_64(nr, sym, qual) [ nr ] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/um/user-offsets.c b/arch/x86/um/user-offsets.c
index 6c9a9c1eae32..470564bbd08e 100644
--- a/arch/x86/um/user-offsets.c
+++ b/arch/x86/um/user-offsets.c
@@ -9,12 +9,12 @@
 #include <asm/types.h>
 
 #ifdef __i386__
-#define __SYSCALL_I386(nr, sym) [nr] = 1,
+#define __SYSCALL_I386(nr, sym, qual) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_32.h>
 };
 #else
-#define __SYSCALL_64(nr, sym) [nr] = 1,
+#define __SYSCALL_64(nr, sym, qual) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_64.h>
 };
-- 
2.5.0

* [PATCH v2 07/10] x86/entry/64: Always run ptregs-using syscalls on the slow path
  2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
                   ` (5 preceding siblings ...)
  2016-01-28 23:11 ` [PATCH v2 06/10] x86/syscalls: Add syscall entry qualifiers Andy Lutomirski
@ 2016-01-28 23:11 ` Andy Lutomirski
  2016-01-29 11:35   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 08/10] x86/entry/64: Call all native slow-path syscalls with full pt-regs Andy Lutomirski
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski

64-bit syscalls currently have an optimization in which they are
called with partial pt_regs.  A small handful require full pt_regs.

In the 32-bit and compat cases, I cleaned this up by forcing full
pt_regs for all syscalls.  The performance hit doesn't really matter.

I want to clean up the 64-bit case as well, but I don't want to hurt
fast path performance.  To do that, I want to force the syscalls
that use pt_regs onto the slow path.  This will enable us to make
slow-path syscalls real ABI-compliant C functions.

Use the new syscall entry qualification machinery for this.
stub_clone is now stub_clone/ptregs.
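
To make the plumbing concrete: syscalltbl.sh turns

  56	common	clone			stub_clone/ptregs

into __SYSCALL_64(56, stub_clone, ptregs), which the new qualifier
macros expand to, roughly:

	/* syscall_64.c: the table slot points at a generated stub */
	[56] = ptregs_stub_clone,

	/* entry_64.S: instantiated by the ptregs_stub macro */
	ENTRY(ptregs_stub_clone)
		leaq	stub_clone(%rip), %rax
		jmp	stub_ptregs_64
	END(ptregs_stub_clone)

A fast-path call to a ptregs-using syscall therefore lands in
stub_ptregs_64, which recognizes the fast-path return address and
redirects to the slow path with the extra regs saved.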

The next patch will eliminate the stubs, and we'll just have
sys_clone/ptregs.

As of this patch, two-phase entry tracing is no longer used.  It has
served its purpose (namely a huge speedup on some workloads prior to
more general opportunistic SYSRET support), and once the dust
settles I'll send patches to back it out.

The implementation is heavily based on a patch from Brian Gerst [1].

[1] http://lkml.kernel.org/g/1449666173-15366-1-git-send-email-brgerst@gmail.com

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: the arch/x86 maintainers <x86@kernel.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S              | 56 +++++++++++++++++++++++++---------
 arch/x86/entry/syscall_64.c            |  7 +++--
 arch/x86/entry/syscalls/syscall_64.tbl | 16 +++++-----
 3 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9d34d3cfceb6..f1c8f150728e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -182,7 +182,15 @@ entry_SYSCALL_64_fastpath:
 #endif
 	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
 	movq	%r10, %rcx
+
+	/*
+	 * This call instruction is handled specially in stub_ptregs_64.
+	 * It might end up jumping to the slow path.  If it jumps, RAX is
+	 * clobbered.
+	 */
 	call	*sys_call_table(, %rax, 8)
+.Lentry_SYSCALL_64_after_fastpath_call:
+
 	movq	%rax, RAX(%rsp)
 1:
 /*
@@ -235,25 +243,13 @@ GLOBAL(int_ret_from_sys_call_irqs_off)
 
 	/* Do syscall entry tracing */
 tracesys:
-	movq	%rsp, %rdi
-	movl	$AUDIT_ARCH_X86_64, %esi
-	call	syscall_trace_enter_phase1
-	test	%rax, %rax
-	jnz	tracesys_phase2			/* if needed, run the slow path */
-	RESTORE_C_REGS_EXCEPT_RAX		/* else restore clobbered regs */
-	movq	ORIG_RAX(%rsp), %rax
-	jmp	entry_SYSCALL_64_fastpath	/* and return to the fast path */
-
-tracesys_phase2:
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
-	movl	$AUDIT_ARCH_X86_64, %esi
-	movq	%rax, %rdx
-	call	syscall_trace_enter_phase2
+	call	syscall_trace_enter
 
 	/*
 	 * Reload registers from stack in case ptrace changed them.
-	 * We don't reload %rax because syscall_trace_entry_phase2() returned
+	 * We don't reload %rax because syscall_trace_enter() returned
 	 * the value it wants us to use in the table lookup.
 	 */
 	RESTORE_C_REGS_EXCEPT_RAX
@@ -355,6 +351,38 @@ opportunistic_sysret_failed:
 	jmp	restore_c_regs_and_iret
 END(entry_SYSCALL_64)
 
+ENTRY(stub_ptregs_64)
+	/*
+	 * Syscalls marked as needing ptregs land here.
+	 * If we are on the fast path, we need to save the extra regs.
+	 * If we are on the slow path, the extra regs are already saved.
+	 *
+	 * RAX stores a pointer to the C function implementing the syscall.
+	 */
+	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
+	jne	1f
+
+	/* Called from fast path -- pop return address and jump to slow path */
+	popq	%rax
+	jmp	tracesys	/* called from fast path */
+
+1:
+	/* Called from C */
+	jmp	*%rax				/* called from C */
+END(stub_ptregs_64)
+
+.macro ptregs_stub func
+ENTRY(ptregs_\func)
+	leaq	\func(%rip), %rax
+	jmp	stub_ptregs_64
+END(ptregs_\func)
+.endm
+
+/* Instantiate ptregs_stub for each ptregs-using syscall */
+#define __SYSCALL_64_QUAL_(sym)
+#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
+#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
+#include <asm/syscalls_64.h>
 
 	.macro FORK_LIKE func
 ENTRY(stub_\func)
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index a1d408772ae6..9dbc5abb6162 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -6,11 +6,14 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_64_QUAL_(sym) sym
+#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
+
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
+#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index dc1040a50bdc..5de342a729d0 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -21,7 +21,7 @@
 12	common	brk			sys_brk
 13	64	rt_sigaction		sys_rt_sigaction
 14	common	rt_sigprocmask		sys_rt_sigprocmask
-15	64	rt_sigreturn		stub_rt_sigreturn
+15	64	rt_sigreturn		stub_rt_sigreturn/ptregs
 16	64	ioctl			sys_ioctl
 17	common	pread64			sys_pread64
 18	common	pwrite64		sys_pwrite64
@@ -62,10 +62,10 @@
 53	common	socketpair		sys_socketpair
 54	64	setsockopt		sys_setsockopt
 55	64	getsockopt		sys_getsockopt
-56	common	clone			stub_clone
-57	common	fork			stub_fork
-58	common	vfork			stub_vfork
-59	64	execve			stub_execve
+56	common	clone			stub_clone/ptregs
+57	common	fork			stub_fork/ptregs
+58	common	vfork			stub_vfork/ptregs
+59	64	execve			stub_execve/ptregs
 60	common	exit			sys_exit
 61	common	wait4			sys_wait4
 62	common	kill			sys_kill
@@ -328,7 +328,7 @@
 319	common	memfd_create		sys_memfd_create
 320	common	kexec_file_load		sys_kexec_file_load
 321	common	bpf			sys_bpf
-322	64	execveat		stub_execveat
+322	64	execveat		stub_execveat/ptregs
 323	common	userfaultfd		sys_userfaultfd
 324	common	membarrier		sys_membarrier
 325	common	mlock2			sys_mlock2
@@ -346,7 +346,7 @@
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
-520	x32	execve			stub_x32_execve
+520	x32	execve			stub_x32_execve/ptregs
 521	x32	ptrace			compat_sys_ptrace
 522	x32	rt_sigpending		compat_sys_rt_sigpending
 523	x32	rt_sigtimedwait		compat_sys_rt_sigtimedwait
@@ -371,4 +371,4 @@
 542	x32	getsockopt		compat_sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
 544	x32	io_submit		compat_sys_io_submit
-545	x32	execveat		stub_x32_execveat
+545	x32	execveat		stub_x32_execveat/ptregs
-- 
2.5.0

* [PATCH v2 08/10] x86/entry/64: Call all native slow-path syscalls with full pt-regs
  2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
                   ` (6 preceding siblings ...)
  2016-01-28 23:11 ` [PATCH v2 07/10] x86/entry/64: Always run ptregs-using syscalls on the slow path Andy Lutomirski
@ 2016-01-28 23:11 ` Andy Lutomirski
  2016-01-29 11:36   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 09/10] x86/entry/64: Stop using int_ret_from_sys_call in ret_from_fork Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 10/10] x86/entry/64: Migrate the 64-bit syscall slow path to C Andy Lutomirski
  9 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski

This removes all of the remaining asm syscall stubs except for
stub_ptregs_64.  Entries in the main syscall table are now all
callable from C.

The resulting asm is every bit as ridiculous as it looks.  The next
few patches will clean it up.  This patch is here to let reviewers
rest their brains and to aid bisection.
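
Concretely, after this patch the 64-bit execve entry reads
sys_execve/ptregs, so the generated table slot is roughly

	[59] = ptregs_sys_execve,

where ptregs_sys_execve is the thunk instantiated by the previous
patch's ptregs_stub macro, and the plain C sys_execve ends up being
called with the extra regs already saved.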

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S              | 79 +---------------------------------
 arch/x86/entry/syscalls/syscall_64.tbl | 18 ++++----
 2 files changed, 10 insertions(+), 87 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f1c8f150728e..f7050a5d9dbc 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -253,7 +253,6 @@ tracesys:
 	 * the value it wants us to use in the table lookup.
 	 */
 	RESTORE_C_REGS_EXCEPT_RAX
-	RESTORE_EXTRA_REGS
 #if __SYSCALL_MASK == ~0
 	cmpq	$__NR_syscall_max, %rax
 #else
@@ -264,6 +263,7 @@ tracesys:
 	movq	%r10, %rcx			/* fixup for C */
 	call	*sys_call_table(, %rax, 8)
 	movq	%rax, RAX(%rsp)
+	RESTORE_EXTRA_REGS
 1:
 	/* Use IRET because user could have changed pt_regs->foo */
 
@@ -384,83 +384,6 @@ END(ptregs_\func)
 #define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
 #include <asm/syscalls_64.h>
 
-	.macro FORK_LIKE func
-ENTRY(stub_\func)
-	SAVE_EXTRA_REGS 8
-	jmp	sys_\func
-END(stub_\func)
-	.endm
-
-	FORK_LIKE  clone
-	FORK_LIKE  fork
-	FORK_LIKE  vfork
-
-ENTRY(stub_execve)
-	call	sys_execve
-return_from_execve:
-	testl	%eax, %eax
-	jz	1f
-	/* exec failed, can use fast SYSRET code path in this case */
-	ret
-1:
-	/* must use IRET code path (pt_regs->cs may have changed) */
-	addq	$8, %rsp
-	ZERO_EXTRA_REGS
-	movq	%rax, RAX(%rsp)
-	jmp	int_ret_from_sys_call
-END(stub_execve)
-/*
- * Remaining execve stubs are only 7 bytes long.
- * ENTRY() often aligns to 16 bytes, which in this case has no benefits.
- */
-	.align	8
-GLOBAL(stub_execveat)
-	call	sys_execveat
-	jmp	return_from_execve
-END(stub_execveat)
-
-#if defined(CONFIG_X86_X32_ABI)
-	.align	8
-GLOBAL(stub_x32_execve)
-	call	compat_sys_execve
-	jmp	return_from_execve
-END(stub_x32_execve)
-	.align	8
-GLOBAL(stub_x32_execveat)
-	call	compat_sys_execveat
-	jmp	return_from_execve
-END(stub_x32_execveat)
-#endif
-
-/*
- * sigreturn is special because it needs to restore all registers on return.
- * This cannot be done with SYSRET, so use the IRET return path instead.
- */
-ENTRY(stub_rt_sigreturn)
-	/*
-	 * SAVE_EXTRA_REGS result is not normally needed:
-	 * sigreturn overwrites all pt_regs->GPREGS.
-	 * But sigreturn can fail (!), and there is no easy way to detect that.
-	 * To make sure RESTORE_EXTRA_REGS doesn't restore garbage on error,
-	 * we SAVE_EXTRA_REGS here.
-	 */
-	SAVE_EXTRA_REGS 8
-	call	sys_rt_sigreturn
-return_from_stub:
-	addq	$8, %rsp
-	RESTORE_EXTRA_REGS
-	movq	%rax, RAX(%rsp)
-	jmp	int_ret_from_sys_call
-END(stub_rt_sigreturn)
-
-#ifdef CONFIG_X86_X32_ABI
-ENTRY(stub_x32_rt_sigreturn)
-	SAVE_EXTRA_REGS 8
-	call	sys32_x32_rt_sigreturn
-	jmp	return_from_stub
-END(stub_x32_rt_sigreturn)
-#endif
-
 /*
  * A newly forked process directly context switches into this address.
  *
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5de342a729d0..dcf107ce2cd4 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -21,7 +21,7 @@
 12	common	brk			sys_brk
 13	64	rt_sigaction		sys_rt_sigaction
 14	common	rt_sigprocmask		sys_rt_sigprocmask
-15	64	rt_sigreturn		stub_rt_sigreturn/ptregs
+15	64	rt_sigreturn		sys_rt_sigreturn/ptregs
 16	64	ioctl			sys_ioctl
 17	common	pread64			sys_pread64
 18	common	pwrite64		sys_pwrite64
@@ -62,10 +62,10 @@
 53	common	socketpair		sys_socketpair
 54	64	setsockopt		sys_setsockopt
 55	64	getsockopt		sys_getsockopt
-56	common	clone			stub_clone/ptregs
-57	common	fork			stub_fork/ptregs
-58	common	vfork			stub_vfork/ptregs
-59	64	execve			stub_execve/ptregs
+56	common	clone			sys_clone/ptregs
+57	common	fork			sys_fork/ptregs
+58	common	vfork			sys_vfork/ptregs
+59	64	execve			sys_execve/ptregs
 60	common	exit			sys_exit
 61	common	wait4			sys_wait4
 62	common	kill			sys_kill
@@ -328,7 +328,7 @@
 319	common	memfd_create		sys_memfd_create
 320	common	kexec_file_load		sys_kexec_file_load
 321	common	bpf			sys_bpf
-322	64	execveat		stub_execveat/ptregs
+322	64	execveat		sys_execveat/ptregs
 323	common	userfaultfd		sys_userfaultfd
 324	common	membarrier		sys_membarrier
 325	common	mlock2			sys_mlock2
@@ -339,14 +339,14 @@
 # for native 64-bit operation.
 #
 512	x32	rt_sigaction		compat_sys_rt_sigaction
-513	x32	rt_sigreturn		stub_x32_rt_sigreturn
+513	x32	rt_sigreturn		sys32_x32_rt_sigreturn
 514	x32	ioctl			compat_sys_ioctl
 515	x32	readv			compat_sys_readv
 516	x32	writev			compat_sys_writev
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
-520	x32	execve			stub_x32_execve/ptregs
+520	x32	execve			compat_sys_execve/ptregs
 521	x32	ptrace			compat_sys_ptrace
 522	x32	rt_sigpending		compat_sys_rt_sigpending
 523	x32	rt_sigtimedwait		compat_sys_rt_sigtimedwait
@@ -371,4 +371,4 @@
 542	x32	getsockopt		compat_sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
 544	x32	io_submit		compat_sys_io_submit
-545	x32	execveat		stub_x32_execveat/ptregs
+545	x32	execveat		compat_sys_execveat/ptregs
-- 
2.5.0

* [PATCH v2 09/10] x86/entry/64: Stop using int_ret_from_sys_call in ret_from_fork
  2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
                   ` (7 preceding siblings ...)
  2016-01-28 23:11 ` [PATCH v2 08/10] x86/entry/64: Call all native slow-path syscalls with full pt-regs Andy Lutomirski
@ 2016-01-28 23:11 ` Andy Lutomirski
  2016-01-29 11:36   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  2016-01-28 23:11 ` [PATCH v2 10/10] x86/entry/64: Migrate the 64-bit syscall slow path to C Andy Lutomirski
  9 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski

ret_from_fork is now open-coded and is no longer tangled up with the
syscall code.  This isn't so bad -- this adds very little code, and
IMO the result is much easier to understand.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f7050a5d9dbc..cb5d940a7abd 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -390,7 +390,6 @@ END(ptregs_\func)
  * rdi: prev task we switched from
  */
 ENTRY(ret_from_fork)
-
 	LOCK ; btr $TIF_FORK, TI_flags(%r8)
 
 	pushq	$0x0002
@@ -398,28 +397,32 @@ ENTRY(ret_from_fork)
 
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
-	RESTORE_EXTRA_REGS
-
 	testb	$3, CS(%rsp)			/* from kernel_thread? */
+	jnz	1f
 
 	/*
-	 * By the time we get here, we have no idea whether our pt_regs,
-	 * ti flags, and ti status came from the 64-bit SYSCALL fast path,
-	 * the slow path, or one of the 32-bit compat paths.
-	 * Use IRET code path to return, since it can safely handle
-	 * all of the above.
+	 * We came from kernel_thread.  This code path is quite twisted, and
+	 * someone should clean it up.
+	 *
+	 * copy_thread_tls stashes the function pointer in RBX and the
+	 * parameter to be passed in RBP.  The called function is permitted
+	 * to call do_execve and thereby jump to user mode.
 	 */
-	jnz	int_ret_from_sys_call
+	movq	RBP(%rsp), %rdi
+	call	*RBX(%rsp)
+	movl	$0, RAX(%rsp)
 
 	/*
-	 * We came from kernel_thread
-	 * nb: we depend on RESTORE_EXTRA_REGS above
+	 * Fall through as though we're exiting a syscall.  This makes a
+	 * twisted sort of sense if we just called do_execve.
 	 */
-	movq	%rbp, %rdi
-	call	*%rbx
-	movl	$0, RAX(%rsp)
-	RESTORE_EXTRA_REGS
-	jmp	int_ret_from_sys_call
+
+1:
+	movq	%rsp, %rdi
+	call	syscall_return_slowpath	/* returns with IRQs disabled */
+	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
+	SWAPGS
+	jmp	restore_regs_and_iret
 END(ret_from_fork)
 
 /*
-- 
2.5.0

* [PATCH v2 10/10] x86/entry/64: Migrate the 64-bit syscall slow path to C
  2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
                   ` (8 preceding siblings ...)
  2016-01-28 23:11 ` [PATCH v2 09/10] x86/entry/64: Stop using int_ret_from_sys_call in ret_from_fork Andy Lutomirski
@ 2016-01-28 23:11 ` Andy Lutomirski
  2016-01-29 11:36   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
  9 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-28 23:11 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Brian Gerst, Borislav Petkov,
	Frédéric Weisbecker, Denys Vlasenko, Linus Torvalds,
	Andy Lutomirski

This is more complicated than the 32-bit and compat cases because it
preserves an asm fast path for the case where the callee-saved regs
aren't needed in pt_regs and no entry or exit work needs to be done.

This appears to slow down fastpath syscalls by no more than one cycle
on my Skylake laptop.
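
A note on the dispatch in do_syscall_64: "nr & __SYSCALL_MASK" strips
the x32 marker bit before indexing the table.  A minimal sketch,
assuming the usual definitions (__X32_SYSCALL_BIT == 0x40000000, and
__SYSCALL_MASK == ~__X32_SYSCALL_BIT on CONFIG_X86_X32_ABI kernels,
~0 otherwise):

	unsigned long nr = regs->orig_ax;  /* 0x40000000 for an x32 read */

	if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
		/* x32 read and native read both index slot 0 ... */
		regs->ax = sys_call_table[nr & __SYSCALL_MASK](
			regs->di, regs->si, regs->dx,
			regs->r10, regs->r8, regs->r9);
	}
	/* ... and handlers that care can still test regs->orig_ax
	   for the x32 bit. */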

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/common.c   |  26 +++++++++++
 arch/x86/entry/entry_64.S | 117 ++++++++++++++++------------------------------
 2 files changed, 65 insertions(+), 78 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 03663740c866..75175f92f462 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -344,6 +344,32 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 	prepare_exit_to_usermode(regs);
 }
 
+#ifdef CONFIG_X86_64
+__visible void do_syscall_64(struct pt_regs *regs)
+{
+	struct thread_info *ti = pt_regs_to_thread_info(regs);
+	unsigned long nr = regs->orig_ax;
+
+	local_irq_enable();
+
+	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
+		nr = syscall_trace_enter(regs);
+
+	/*
+	 * NB: Native and x32 syscalls are dispatched from the same
+	 * table.  The only functional difference is the x32 bit in
+	 * regs->orig_ax, which changes the behavior of some syscalls.
+	 */
+	if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
+		regs->ax = sys_call_table[nr & __SYSCALL_MASK](
+			regs->di, regs->si, regs->dx,
+			regs->r10, regs->r8, regs->r9);
+	}
+
+	syscall_return_slowpath(regs);
+}
+#endif
+
 #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
 /*
  * Does a 32-bit syscall.  Called with IRQs on and does all entry and
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index cb5d940a7abd..567aa522ac0a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -145,17 +145,11 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
 	movq	%rsp, PER_CPU_VAR(rsp_scratch)
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
+	TRACE_IRQS_OFF
+
 	/* Construct struct pt_regs on stack */
 	pushq	$__USER_DS			/* pt_regs->ss */
 	pushq	PER_CPU_VAR(rsp_scratch)	/* pt_regs->sp */
-	/*
-	 * Re-enable interrupts.
-	 * We use 'rsp_scratch' as a scratch space, hence irq-off block above
-	 * must execute atomically in the face of possible interrupt-driven
-	 * task preemption. We must enable interrupts only after we're done
-	 * with using rsp_scratch:
-	 */
-	ENABLE_INTERRUPTS(CLBR_NONE)
 	pushq	%r11				/* pt_regs->flags */
 	pushq	$__USER_CS			/* pt_regs->cs */
 	pushq	%rcx				/* pt_regs->ip */
@@ -171,9 +165,21 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
 	pushq	%r11				/* pt_regs->r11 */
 	sub	$(6*8), %rsp			/* pt_regs->bp, bx, r12-15 not saved */
 
-	testl	$_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-	jnz	tracesys
+	/*
+	 * If we need to do entry work or if we guess we'll need to do
+	 * exit work, go straight to the slow path.
+	 */
+	testl	$_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+	jnz	entry_SYSCALL64_slow_path
+
 entry_SYSCALL_64_fastpath:
+	/*
+	 * Easy case: enable interrupts and issue the syscall.  If the syscall
+	 * needs pt_regs, we'll call a stub that disables interrupts again
+	 * and jumps to the slow path.
+	 */
+	TRACE_IRQS_ON
+	ENABLE_INTERRUPTS(CLBR_NONE)
 #if __SYSCALL_MASK == ~0
 	cmpq	$__NR_syscall_max, %rax
 #else
@@ -193,88 +199,43 @@ entry_SYSCALL_64_fastpath:
 
 	movq	%rax, RAX(%rsp)
 1:
-/*
- * Syscall return path ending with SYSRET (fast path).
- * Has incompletely filled pt_regs.
- */
-	LOCKDEP_SYS_EXIT
-	/*
-	 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
-	 * it is too small to ever cause noticeable irq latency.
-	 */
-	DISABLE_INTERRUPTS(CLBR_NONE)
 
 	/*
-	 * We must check ti flags with interrupts (or at least preemption)
-	 * off because we must *never* return to userspace without
-	 * processing exit work that is enqueued if we're preempted here.
-	 * In particular, returning to userspace with any of the one-shot
-	 * flags (TIF_NOTIFY_RESUME, TIF_USER_RETURN_NOTIFY, etc) set is
-	 * very bad.
+	 * If we get here, then we know that pt_regs is clean for SYSRET64.
+	 * If we see that no exit work is required (which we are required
+	 * to check with IRQs off), then we can go straight to SYSRET64.
 	 */
+	DISABLE_INTERRUPTS(CLBR_NONE)
+	TRACE_IRQS_OFF
 	testl	$_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-	jnz	int_ret_from_sys_call_irqs_off	/* Go to the slow path */
+	jnz	1f
 
-	RESTORE_C_REGS_EXCEPT_RCX_R11
-	movq	RIP(%rsp), %rcx
-	movq	EFLAGS(%rsp), %r11
+	LOCKDEP_SYS_EXIT
+	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
+	RESTORE_C_REGS
 	movq	RSP(%rsp), %rsp
-	/*
-	 * 64-bit SYSRET restores rip from rcx,
-	 * rflags from r11 (but RF and VM bits are forced to 0),
-	 * cs and ss are loaded from MSRs.
-	 * Restoration of rflags re-enables interrupts.
-	 *
-	 * NB: On AMD CPUs with the X86_BUG_SYSRET_SS_ATTRS bug, the ss
-	 * descriptor is not reinitialized.  This means that we should
-	 * avoid SYSRET with SS == NULL, which could happen if we schedule,
-	 * exit the kernel, and re-enter using an interrupt vector.  (All
-	 * interrupt entries on x86_64 set SS to NULL.)  We prevent that
-	 * from happening by reloading SS in __switch_to.  (Actually
-	 * detecting the failure in 64-bit userspace is tricky but can be
-	 * done.)
-	 */
 	USERGS_SYSRET64
 
-GLOBAL(int_ret_from_sys_call_irqs_off)
+1:
+	/*
+	 * The fast path looked good when we started, but something changed
+	 * along the way and we need to switch to the slow path.  Calling
+	 * raise(3) will trigger this, for example.  IRQs are off.
+	 */
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	jmp int_ret_from_sys_call
-
-	/* Do syscall entry tracing */
-tracesys:
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
-	call	syscall_trace_enter
-
-	/*
-	 * Reload registers from stack in case ptrace changed them.
-	 * We don't reload %rax because syscall_trace_enter() returned
-	 * the value it wants us to use in the table lookup.
-	 */
-	RESTORE_C_REGS_EXCEPT_RAX
-#if __SYSCALL_MASK == ~0
-	cmpq	$__NR_syscall_max, %rax
-#else
-	andl	$__SYSCALL_MASK, %eax
-	cmpl	$__NR_syscall_max, %eax
-#endif
-	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
-	movq	%r10, %rcx			/* fixup for C */
-	call	*sys_call_table(, %rax, 8)
-	movq	%rax, RAX(%rsp)
-	RESTORE_EXTRA_REGS
-1:
-	/* Use IRET because user could have changed pt_regs->foo */
+	call	syscall_return_slowpath	/* returns with IRQs disabled */
+	jmp	return_from_SYSCALL_64
 
-/*
- * Syscall return path ending with IRET.
- * Has correct iret frame.
- */
-GLOBAL(int_ret_from_sys_call)
+entry_SYSCALL64_slow_path:
+	/* IRQs are off. */
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
-	call	syscall_return_slowpath	/* returns with IRQs disabled */
+	call	do_syscall_64		/* returns with IRQs disabled */
+
+return_from_SYSCALL_64:
 	RESTORE_EXTRA_REGS
 	TRACE_IRQS_IRETQ		/* we're about to change IF */
 
@@ -364,7 +325,7 @@ ENTRY(stub_ptregs_64)
 
 	/* Called from fast path -- pop return address and jump to slow path */
 	popq	%rax
-	jmp	tracesys	/* called from fast path */
+	jmp	entry_SYSCALL64_slow_path	/* called from fast path */
 
 1:
 	/* Called from C */
-- 
2.5.0
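The raise(3) example in the hunk above is easy to reproduce from
userspace: glibc's raise() is built on tgkill(2), so the signal becomes
pending inside the syscall itself, TIF_SIGPENDING (part of
_TIF_ALLWORK_MASK) is set by the time the exit check runs, and the
kernel has to take the IRET slow path to deliver it.  A minimal demo --
an illustrative program, not part of the patch:

	#include <signal.h>
	#include <stdio.h>

	static void handler(int sig)
	{
		(void)sig;	/* delivered on the way out of tgkill() */
	}

	int main(void)
	{
		signal(SIGUSR1, handler);
		raise(SIGUSR1);	/* exit work now pending; the SYSRET fast exit is skipped */
		puts("handler ran before raise() returned");
		return 0;
	}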

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/asm] selftests/x86: Extend Makefile to allow 64-bit-only tests
  2016-01-28 23:11 ` [PATCH v2 01/10] selftests/x86: Extend Makefile to allow 64-bit-only tests Andy Lutomirski
@ 2016-01-29 11:33   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-29 11:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, luto, luto, tglx, shuahkhan, linux-kernel, brgerst, dvlasenk,
	peterz, hpa, fweisbec, mingo, torvalds

Commit-ID:  c31b34255b48d1a169693c9c70c49ad6418cfd20
Gitweb:     http://git.kernel.org/tip/c31b34255b48d1a169693c9c70c49ad6418cfd20
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 28 Jan 2016 15:11:19 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 29 Jan 2016 09:46:36 +0100

selftests/x86: Extend Makefile to allow 64-bit-only tests

Previously the Makefile supported 32-bit-only tests and tests
that were 32-bit and 64-bit.  This adds the support for tests
that are only built as 64-bit binaries.

There aren't any yet, but there might be a few some day.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuahkhan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/99789bfe65706e6df32cc7e13f656e8c9fa92031.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 tools/testing/selftests/x86/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index d0c473f..9c81f26 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -11,8 +11,9 @@ TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault sigreturn test_syscall
 			vdso_restorer
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
+TARGETS_C_64BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_64BIT_ONLY)
 BINARIES_32 := $(TARGETS_C_32BIT_ALL:%=%_32)
-BINARIES_64 := $(TARGETS_C_BOTHBITS:%=%_64)
+BINARIES_64 := $(TARGETS_C_64BIT_ALL:%=%_64)
 
 CFLAGS := -O2 -g -std=gnu99 -pthread -Wall
 
@@ -40,7 +41,7 @@ clean:
 $(TARGETS_C_32BIT_ALL:%=%_32): %_32: %.c
 	$(CC) -m32 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl -lm
 
-$(TARGETS_C_BOTHBITS:%=%_64): %_64: %.c
+$(TARGETS_C_64BIT_ALL:%=%_64): %_64: %.c
 	$(CC) -m64 -o $@ $(CFLAGS) $(EXTRA_CFLAGS) $^ -lrt -ldl
 
 # x86_64 users should be encouraged to install 32-bit libraries

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/asm] selftests/x86: Add check_initial_reg_state()
  2016-01-28 23:11 ` [PATCH v2 02/10] selftests/x86: Add check_initial_reg_state Andy Lutomirski
@ 2016-01-29 11:34   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-29 11:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: fweisbec, luto, tglx, hpa, luto, bp, torvalds, dvlasenk, mingo,
	linux-kernel, brgerst, peterz

Commit-ID:  e21d50f3864e2a8995f5d2a41dea3f0fa07758b4
Gitweb:     http://git.kernel.org/tip/e21d50f3864e2a8995f5d2a41dea3f0fa07758b4
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 28 Jan 2016 15:11:20 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 29 Jan 2016 09:46:37 +0100

selftests/x86: Add check_initial_reg_state()

This checks that ELF binaries are started with an appropriately
blank register state.

( There's currently a nasty special case in the entry asm to
  arrange for this. I'm planning on removing the special case,
  and this will help make sure I don't break it. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ef54f8d066b30a3eb36bbf26300eebb242185700.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 tools/testing/selftests/x86/Makefile               |   9 +-
 .../selftests/x86/check_initial_reg_state.c        | 109 +++++++++++++++++++++
 2 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 9c81f26..df4f767 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -4,7 +4,8 @@ include ../lib.mk
 
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
-TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_syscall
+TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt ptrace_syscall \
+			check_initial_reg_state
 TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault sigreturn test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			ldt_gdt \
@@ -66,3 +67,9 @@ endif
 sysret_ss_attrs_64: thunks.S
 ptrace_syscall_32: raw_syscall_helper_32.S
 test_syscall_vdso_32: thunks_32.S
+
+# check_initial_reg_state is special: it needs a custom entry, and it
+# needs to be static so that its interpreter doesn't destroy its initial
+# state.
+check_initial_reg_state_32: CFLAGS += -Wl,-ereal_start -static
+check_initial_reg_state_64: CFLAGS += -Wl,-ereal_start -static
diff --git a/tools/testing/selftests/x86/check_initial_reg_state.c b/tools/testing/selftests/x86/check_initial_reg_state.c
new file mode 100644
index 0000000..6aaed9b
--- /dev/null
+++ b/tools/testing/selftests/x86/check_initial_reg_state.c
@@ -0,0 +1,109 @@
+/*
+ * check_initial_reg_state.c - check that execve sets the correct state
+ * Copyright (c) 2014-2016 Andrew Lutomirski
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+
+unsigned long ax, bx, cx, dx, si, di, bp, sp, flags;
+unsigned long r8, r9, r10, r11, r12, r13, r14, r15;
+
+asm (
+	".pushsection .text\n\t"
+	".type real_start, @function\n\t"
+	".global real_start\n\t"
+	"real_start:\n\t"
+#ifdef __x86_64__
+	"mov %rax, ax\n\t"
+	"mov %rbx, bx\n\t"
+	"mov %rcx, cx\n\t"
+	"mov %rdx, dx\n\t"
+	"mov %rsi, si\n\t"
+	"mov %rdi, di\n\t"
+	"mov %rbp, bp\n\t"
+	"mov %rsp, sp\n\t"
+	"mov %r8, r8\n\t"
+	"mov %r9, r9\n\t"
+	"mov %r10, r10\n\t"
+	"mov %r11, r11\n\t"
+	"mov %r12, r12\n\t"
+	"mov %r13, r13\n\t"
+	"mov %r14, r14\n\t"
+	"mov %r15, r15\n\t"
+	"pushfq\n\t"
+	"popq flags\n\t"
+#else
+	"mov %eax, ax\n\t"
+	"mov %ebx, bx\n\t"
+	"mov %ecx, cx\n\t"
+	"mov %edx, dx\n\t"
+	"mov %esi, si\n\t"
+	"mov %edi, di\n\t"
+	"mov %ebp, bp\n\t"
+	"mov %esp, sp\n\t"
+	"pushfl\n\t"
+	"popl flags\n\t"
+#endif
+	"jmp _start\n\t"
+	".size real_start, . - real_start\n\t"
+	".popsection");
+
+int main()
+{
+	int nerrs = 0;
+
+	if (sp == 0) {
+		printf("[FAIL]\tTest was built incorrectly\n");
+		return 1;
+	}
+
+	if (ax || bx || cx || dx || si || di || bp
+#ifdef __x86_64__
+	    || r8 || r9 || r10 || r11 || r12 || r13 || r14 || r15
+#endif
+		) {
+		printf("[FAIL]\tAll GPRs except SP should be 0\n");
+#define SHOW(x) printf("\t" #x " = 0x%lx\n", x);
+		SHOW(ax);
+		SHOW(bx);
+		SHOW(cx);
+		SHOW(dx);
+		SHOW(si);
+		SHOW(di);
+		SHOW(bp);
+		SHOW(sp);
+#ifdef __x86_64__
+		SHOW(r8);
+		SHOW(r9);
+		SHOW(r10);
+		SHOW(r11);
+		SHOW(r12);
+		SHOW(r13);
+		SHOW(r14);
+		SHOW(r15);
+#endif
+		nerrs++;
+	} else {
+		printf("[OK]\tAll GPRs except SP are 0\n");
+	}
+
+	if (flags != 0x202) {
+		printf("[FAIL]\tFLAGS is 0x%lx, but it should be 0x202\n", flags);
+		nerrs++;
+	} else {
+		printf("[OK]\tFLAGS is 0x202\n");
+	}
+
+	return nerrs ? 1 : 0;
+}
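The magic 0x202 above is just the always-set reserved bit 1 plus IF
(bit 9); every other flag should be clear at process start.  A
standalone check, redefining the relevant asm/processor-flags.h values
so the snippet builds outside the kernel tree:

	#include <stdio.h>

	#define X86_EFLAGS_FIXED	0x0002	/* bit 1: reserved, always 1 */
	#define X86_EFLAGS_IF		0x0200	/* bit 9: interrupts enabled */

	int main(void)
	{
		/* prints 0x202, the value the selftest expects */
		printf("expected initial flags: 0x%x\n",
		       X86_EFLAGS_FIXED | X86_EFLAGS_IF);
		return 0;
	}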

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/asm] x86/syscalls: Refactor syscalltbl.sh
  2016-01-28 23:11 ` [PATCH v2 03/10] x86/syscalls: Refactor syscalltbl.sh Andy Lutomirski
@ 2016-01-29 11:34   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-29 11:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: brgerst, torvalds, mingo, tglx, luto, dvlasenk, fweisbec,
	linux-kernel, luto, peterz, hpa, bp

Commit-ID:  fba324744bfd2a7948a7710d7a021d76dafb9b67
Gitweb:     http://git.kernel.org/tip/fba324744bfd2a7948a7710d7a021d76dafb9b67
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 28 Jan 2016 15:11:21 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 29 Jan 2016 09:46:37 +0100

x86/syscalls: Refactor syscalltbl.sh

This splits out the code to emit a syscall line.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1bfcbba991f5cfaa9291ff950a593daa972a205f.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/syscalls/syscalltbl.sh | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index 0e7f8ec..167965e 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -3,13 +3,21 @@
 in="$1"
 out="$2"
 
+emit() {
+    abi="$1"
+    nr="$2"
+    entry="$3"
+    compat="$4"
+    if [ -n "$compat" ]; then
+	echo "__SYSCALL_${abi}($nr, $entry, $compat)"
+    elif [ -n "$entry" ]; then
+	echo "__SYSCALL_${abi}($nr, $entry, $entry)"
+    fi
+}
+
 grep '^[0-9]' "$in" | sort -n | (
     while read nr abi name entry compat; do
 	abi=`echo "$abi" | tr '[a-z]' '[A-Z]'`
-	if [ -n "$compat" ]; then
-	    echo "__SYSCALL_${abi}($nr, $entry, $compat)"
-	elif [ -n "$entry" ]; then
-	    echo "__SYSCALL_${abi}($nr, $entry, $entry)"
-	fi
+	emit "$abi" "$nr" "$entry" "$compat"
     done
 ) > "$out"
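The behavior is unchanged by the refactor.  For a typical
syscall_64.tbl row with no compat column, such as

	0	common	read			sys_read

emit() still produces exactly what the old inline code did:

	__SYSCALL_COMMON(0, sys_read, sys_read)

(the abi field is upper-cased, and the entry point doubles as its own
compat entry when the fifth column is empty).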

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/asm] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
  2016-01-28 23:11 ` [PATCH v2 04/10] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32 Andy Lutomirski
@ 2016-01-29 11:34   ` tip-bot for Andy Lutomirski
  2016-01-29 21:23     ` H. Peter Anvin
  0 siblings, 1 reply; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-29 11:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, brgerst, hpa, luto, bp, luto, linux-kernel, mingo,
	tglx, dvlasenk, peterz, fweisbec

Commit-ID:  32324ce15ea8cb4c8acc28acb2fd36fabf73e9db
Gitweb:     http://git.kernel.org/tip/32324ce15ea8cb4c8acc28acb2fd36fabf73e9db
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 28 Jan 2016 15:11:22 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 29 Jan 2016 09:46:37 +0100

x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32

The common/64/x32 distinction has no effect other than
determining which kernels actually support the syscall.  Move
the logic into syscalltbl.sh.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/58d4a95f40e43b894f93288b4a3633963d0ee22e.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/syscall_64.c           |  8 --------
 arch/x86/entry/syscalls/syscalltbl.sh | 17 ++++++++++++++++-
 arch/x86/kernel/asm-offsets_64.c      |  6 ------
 arch/x86/um/sys_call_table_64.c       |  3 ---
 arch/x86/um/user-offsets.c            |  2 --
 5 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 41283d2..974fd89 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -6,14 +6,6 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
-
-#ifdef CONFIG_X86_X32_ABI
-# define __SYSCALL_X32(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
-#else
-# define __SYSCALL_X32(nr, sym, compat) /* nothing */
-#endif
-
 #define __SYSCALL_64(nr, sym, compat) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index 167965e..5ebeaf1 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -18,6 +18,21 @@ emit() {
 grep '^[0-9]' "$in" | sort -n | (
     while read nr abi name entry compat; do
 	abi=`echo "$abi" | tr '[a-z]' '[A-Z]'`
-	emit "$abi" "$nr" "$entry" "$compat"
+	if [ "$abi" == "COMMON" -o "$abi" == "64" ]; then
+	    # COMMON is the same as 64, except that we don't expect X32
+	    # programs to use it.  Our expectation has nothing to do with
+	    # any generated code, so treat them the same.
+	    emit 64 "$nr" "$entry" "$compat"
+	elif [ "$abi" == "X32" ]; then
+	    # X32 is equivalent to 64 on an X32-compatible kernel.
+	    echo "#ifdef CONFIG_X86_X32_ABI"
+	    emit 64 "$nr" "$entry" "$compat"
+	    echo "#endif"
+	elif [ "$abi" == "I386" ]; then
+	    emit "$abi" "$nr" "$entry" "$compat"
+	else
+	    echo "Unknown abi $abi" >&2
+	    exit 1
+	fi
     done
 ) > "$out"
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index f2edafb..29db3b3 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -5,12 +5,6 @@
 #include <asm/ia32.h>
 
 #define __SYSCALL_64(nr, sym, compat) [nr] = 1,
-#define __SYSCALL_COMMON(nr, sym, compat) [nr] = 1,
-#ifdef CONFIG_X86_X32_ABI
-# define __SYSCALL_X32(nr, sym, compat) [nr] = 1,
-#else
-# define __SYSCALL_X32(nr, sym, compat) /* nothing */
-#endif
 static char syscalls_64[] = {
 #include <asm/syscalls_64.h>
 };
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index b74ea6c..71a497c 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -35,9 +35,6 @@
 #define stub_execveat sys_execveat
 #define stub_rt_sigreturn sys_rt_sigreturn
 
-#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
-#define __SYSCALL_X32(nr, sym, compat) /* Not supported */
-
 #define __SYSCALL_64(nr, sym, compat) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 
diff --git a/arch/x86/um/user-offsets.c b/arch/x86/um/user-offsets.c
index ce7e360..5edf4f4 100644
--- a/arch/x86/um/user-offsets.c
+++ b/arch/x86/um/user-offsets.c
@@ -15,8 +15,6 @@ static char syscalls[] = {
 };
 #else
 #define __SYSCALL_64(nr, sym, compat) [nr] = 1,
-#define __SYSCALL_COMMON(nr, sym, compat) [nr] = 1,
-#define __SYSCALL_X32(nr, sym, compat) /* Not supported */
 static char syscalls[] = {
 #include <asm/syscalls_64.h>
 };
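With this change an x32 row such as

	520	x32	execve			stub_x32_execve

comes out of syscalltbl.sh as an ordinary 64-bit entry guarded at
compile time:

	#ifdef CONFIG_X86_X32_ABI
	__SYSCALL_64(520, stub_x32_execve, stub_x32_execve)
	#endif

so every consumer of syscalls_64.h picks up the X32 conditional for
free instead of carrying its own __SYSCALL_X32 definition.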

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/asm] x86/syscalls: Move compat syscall entry handling into syscalltbl.sh
  2016-01-28 23:11 ` [PATCH v2 05/10] x86/syscalls: Move compat syscall entry handling into syscalltbl.sh Andy Lutomirski
@ 2016-01-29 11:35   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-29 11:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: brgerst, torvalds, tglx, luto, hpa, bp, luto, linux-kernel,
	peterz, mingo, fweisbec, dvlasenk

Commit-ID:  3e65654e3db6df6aba9c5b895f8b8e6a8d8eb508
Gitweb:     http://git.kernel.org/tip/3e65654e3db6df6aba9c5b895f8b8e6a8d8eb508
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 28 Jan 2016 15:11:23 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 29 Jan 2016 09:46:37 +0100

x86/syscalls: Move compat syscall entry handling into syscalltbl.sh

Rather than duplicating the compat entry handling in all
consumers of syscalls_BITS.h, handle it directly in
syscalltbl.sh.  Now we generate entries in syscalls_32.h like:

#ifdef CONFIG_X86_32
__SYSCALL_I386(5, sys_open)
#else
__SYSCALL_I386(5, compat_sys_open)
#endif

and all of its consumers implicitly get the right entry point.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b7c2b501dc0e6e43050e916b95807c3e2e16e9bb.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/syscall_32.c           | 10 ++--------
 arch/x86/entry/syscall_64.c           |  4 ++--
 arch/x86/entry/syscalls/syscalltbl.sh | 22 ++++++++++++++++++----
 arch/x86/kernel/asm-offsets_32.c      |  2 +-
 arch/x86/kernel/asm-offsets_64.c      |  4 ++--
 arch/x86/um/sys_call_table_32.c       |  4 ++--
 arch/x86/um/sys_call_table_64.c       |  4 ++--
 arch/x86/um/user-offsets.c            |  4 ++--
 8 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index 9a66498..3e28297 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -6,17 +6,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#ifdef CONFIG_IA32_EMULATION
-#define SYM(sym, compat) compat
-#else
-#define SYM(sym, compat) sym
-#endif
-
-#define __SYSCALL_I386(nr, sym, compat) extern asmlinkage long SYM(sym, compat)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_I386(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_32.h>
 #undef __SYSCALL_I386
 
-#define __SYSCALL_I386(nr, sym, compat) [nr] = SYM(sym, compat),
+#define __SYSCALL_I386(nr, sym) [nr] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 974fd89..3781989 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -6,11 +6,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_64(nr, sym, compat) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_64(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym, compat) [nr] = sym,
+#define __SYSCALL_64(nr, sym) [nr] = sym,
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index 5ebeaf1..b81479c 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -8,10 +8,24 @@ emit() {
     nr="$2"
     entry="$3"
     compat="$4"
-    if [ -n "$compat" ]; then
-	echo "__SYSCALL_${abi}($nr, $entry, $compat)"
-    elif [ -n "$entry" ]; then
-	echo "__SYSCALL_${abi}($nr, $entry, $entry)"
+
+    if [ "$abi" == "64" -a -n "$compat" ]; then
+	echo "a compat entry for a 64-bit syscall makes no sense" >&2
+	exit 1
+    fi
+
+    if [ -z "$compat" ]; then
+	if [ -n "$entry" ]; then
+	    echo "__SYSCALL_${abi}($nr, $entry)"
+	fi
+    else
+	echo "#ifdef CONFIG_X86_32"
+	if [ -n "$entry" ]; then
+	    echo "__SYSCALL_${abi}($nr, $entry)"
+	fi
+	echo "#else"
+	echo "__SYSCALL_${abi}($nr, $compat)"
+	echo "#endif"
     fi
 }
 
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 6ce3902..abec4c9 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -7,7 +7,7 @@
 #include <linux/lguest.h>
 #include "../../../drivers/lguest/lg.h"
 
-#define __SYSCALL_I386(nr, sym, compat) [nr] = 1,
+#define __SYSCALL_I386(nr, sym) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_32.h>
 };
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 29db3b3..9677bf9 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -4,11 +4,11 @@
 
 #include <asm/ia32.h>
 
-#define __SYSCALL_64(nr, sym, compat) [nr] = 1,
+#define __SYSCALL_64(nr, sym) [nr] = 1,
 static char syscalls_64[] = {
 #include <asm/syscalls_64.h>
 };
-#define __SYSCALL_I386(nr, sym, compat) [nr] = 1,
+#define __SYSCALL_I386(nr, sym) [nr] = 1,
 static char syscalls_ia32[] = {
 #include <asm/syscalls_32.h>
 };
diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
index 439c099..d4669a6 100644
--- a/arch/x86/um/sys_call_table_32.c
+++ b/arch/x86/um/sys_call_table_32.c
@@ -25,11 +25,11 @@
 
 #define old_mmap sys_old_mmap
 
-#define __SYSCALL_I386(nr, sym, compat) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_I386(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_32.h>
 
 #undef __SYSCALL_I386
-#define __SYSCALL_I386(nr, sym, compat) [ nr ] = sym,
+#define __SYSCALL_I386(nr, sym) [ nr ] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index 71a497c..6ee5268 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -35,11 +35,11 @@
 #define stub_execveat sys_execveat
 #define stub_rt_sigreturn sys_rt_sigreturn
 
-#define __SYSCALL_64(nr, sym, compat) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_64(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 
 #undef __SYSCALL_64
-#define __SYSCALL_64(nr, sym, compat) [ nr ] = sym,
+#define __SYSCALL_64(nr, sym) [ nr ] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/um/user-offsets.c b/arch/x86/um/user-offsets.c
index 5edf4f4..6c9a9c1 100644
--- a/arch/x86/um/user-offsets.c
+++ b/arch/x86/um/user-offsets.c
@@ -9,12 +9,12 @@
 #include <asm/types.h>
 
 #ifdef __i386__
-#define __SYSCALL_I386(nr, sym, compat) [nr] = 1,
+#define __SYSCALL_I386(nr, sym) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_32.h>
 };
 #else
-#define __SYSCALL_64(nr, sym, compat) [nr] = 1,
+#define __SYSCALL_64(nr, sym) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_64.h>
 };
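On the consumer side this means syscall_32.c needs only one
__SYSCALL_I386() definition, and the generated #ifdef selects the entry
point.  Taking syscall 5 (open) as an example: a 64-bit kernel with
IA32 emulation (CONFIG_X86_32 unset) builds its compat table slot as

	[5] = compat_sys_open,

while a native 32-bit build gets [5] = sys_open, -- with the old SYM()
indirection gone from the C sources.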

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/asm] x86/syscalls: Add syscall entry qualifiers
  2016-01-28 23:11 ` [PATCH v2 06/10] x86/syscalls: Add syscall entry qualifiers Andy Lutomirski
@ 2016-01-29 11:35   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-29 11:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, brgerst, fweisbec, dvlasenk, luto, luto, tglx, linux-kernel,
	hpa, torvalds, peterz, mingo

Commit-ID:  cfcbadb49dabb05efa23e1a0f95f3391c0a815bc
Gitweb:     http://git.kernel.org/tip/cfcbadb49dabb05efa23e1a0f95f3391c0a815bc
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 28 Jan 2016 15:11:24 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 29 Jan 2016 09:46:38 +0100

x86/syscalls: Add syscall entry qualifiers

This will let us specify something like 'sys_xyz/foo' instead of
'sys_xyz' in the syscall table, where the 'foo' qualifier conveys
some extra information to the C code.

The intent is to allow things like sys_execve/ptregs to indicate
that sys_execve() touches pt_regs.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2de06e33dce62556b3ec662006fcb295504e296e.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/syscall_32.c           |  4 ++--
 arch/x86/entry/syscall_64.c           |  4 ++--
 arch/x86/entry/syscalls/syscalltbl.sh | 19 ++++++++++++++++---
 arch/x86/kernel/asm-offsets_32.c      |  2 +-
 arch/x86/kernel/asm-offsets_64.c      |  4 ++--
 arch/x86/um/sys_call_table_32.c       |  4 ++--
 arch/x86/um/sys_call_table_64.c       |  4 ++--
 arch/x86/um/user-offsets.c            |  4 ++--
 8 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index 3e28297..8f895ee 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -6,11 +6,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_I386(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_32.h>
 #undef __SYSCALL_I386
 
-#define __SYSCALL_I386(nr, sym) [nr] = sym,
+#define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 3781989..a1d4087 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -6,11 +6,11 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_64(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym) [nr] = sym,
+#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index b81479c..cd3d301 100644
--- a/arch/x86/entry/syscalls/syscalltbl.sh
+++ b/arch/x86/entry/syscalls/syscalltbl.sh
@@ -3,6 +3,19 @@
 in="$1"
 out="$2"
 
+syscall_macro() {
+    abi="$1"
+    nr="$2"
+    entry="$3"
+
+    # Entry can be either just a function name or "function/qualifier"
+    real_entry="${entry%%/*}"
+    qualifier="${entry:${#real_entry}}"		# Strip the function name
+    qualifier="${qualifier:1}"			# Strip the slash, if any
+
+    echo "__SYSCALL_${abi}($nr, $real_entry, $qualifier)"
+}
+
 emit() {
     abi="$1"
     nr="$2"
@@ -16,15 +29,15 @@ emit() {
 
     if [ -z "$compat" ]; then
 	if [ -n "$entry" ]; then
-	    echo "__SYSCALL_${abi}($nr, $entry)"
+	    syscall_macro "$abi" "$nr" "$entry"
 	fi
     else
 	echo "#ifdef CONFIG_X86_32"
 	if [ -n "$entry" ]; then
-	    echo "__SYSCALL_${abi}($nr, $entry)"
+	    syscall_macro "$abi" "$nr" "$entry"
 	fi
 	echo "#else"
-	echo "__SYSCALL_${abi}($nr, $compat)"
+	syscall_macro "$abi" "$nr" "$compat"
 	echo "#endif"
     fi
 }
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index abec4c9..fdeb0ce 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -7,7 +7,7 @@
 #include <linux/lguest.h>
 #include "../../../drivers/lguest/lg.h"
 
-#define __SYSCALL_I386(nr, sym) [nr] = 1,
+#define __SYSCALL_I386(nr, sym, qual) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_32.h>
 };
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 9677bf9..d875f97 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -4,11 +4,11 @@
 
 #include <asm/ia32.h>
 
-#define __SYSCALL_64(nr, sym) [nr] = 1,
+#define __SYSCALL_64(nr, sym, qual) [nr] = 1,
 static char syscalls_64[] = {
 #include <asm/syscalls_64.h>
 };
-#define __SYSCALL_I386(nr, sym) [nr] = 1,
+#define __SYSCALL_I386(nr, sym, qual) [nr] = 1,
 static char syscalls_ia32[] = {
 #include <asm/syscalls_32.h>
 };
diff --git a/arch/x86/um/sys_call_table_32.c b/arch/x86/um/sys_call_table_32.c
index d4669a6..bfce503 100644
--- a/arch/x86/um/sys_call_table_32.c
+++ b/arch/x86/um/sys_call_table_32.c
@@ -25,11 +25,11 @@
 
 #define old_mmap sys_old_mmap
 
-#define __SYSCALL_I386(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_32.h>
 
 #undef __SYSCALL_I386
-#define __SYSCALL_I386(nr, sym) [ nr ] = sym,
+#define __SYSCALL_I386(nr, sym, qual) [ nr ] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index 6ee5268..f306413 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -35,11 +35,11 @@
 #define stub_execveat sys_execveat
 #define stub_rt_sigreturn sys_rt_sigreturn
 
-#define __SYSCALL_64(nr, sym) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include <asm/syscalls_64.h>
 
 #undef __SYSCALL_64
-#define __SYSCALL_64(nr, sym) [ nr ] = sym,
+#define __SYSCALL_64(nr, sym, qual) [ nr ] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/um/user-offsets.c b/arch/x86/um/user-offsets.c
index 6c9a9c1..470564b 100644
--- a/arch/x86/um/user-offsets.c
+++ b/arch/x86/um/user-offsets.c
@@ -9,12 +9,12 @@
 #include <asm/types.h>
 
 #ifdef __i386__
-#define __SYSCALL_I386(nr, sym) [nr] = 1,
+#define __SYSCALL_I386(nr, sym, qual) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_32.h>
 };
 #else
-#define __SYSCALL_64(nr, sym) [nr] = 1,
+#define __SYSCALL_64(nr, sym, qual) [nr] = 1,
 static char syscalls[] = {
 #include <asm/syscalls_64.h>
 };
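The parsing is plain shell string surgery: everything before the first
'/' is the function name, and whatever follows it is the qualifier.
Once the table starts using the syntax (see the next patch), a row like

	59	64	execve			sys_execve/ptregs

expands to

	__SYSCALL_64(59, sys_execve, ptregs)

while an unqualified entry simply gets an empty third argument, e.g.
__SYSCALL_64(0, sys_read, ).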

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/asm] x86/entry/64: Always run ptregs-using syscalls on the slow path
  2016-01-28 23:11 ` [PATCH v2 07/10] x86/entry/64: Always run ptregs-using syscalls on the slow path Andy Lutomirski
@ 2016-01-29 11:35   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-29 11:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: fweisbec, hpa, luto, mingo, bp, linux-kernel, luto, torvalds,
	peterz, tglx, dvlasenk

Commit-ID:  302f5b260c322696cbeb962a263a4d2d99864aed
Gitweb:     http://git.kernel.org/tip/302f5b260c322696cbeb962a263a4d2d99864aed
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 28 Jan 2016 15:11:25 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 29 Jan 2016 09:46:38 +0100

x86/entry/64: Always run ptregs-using syscalls on the slow path

64-bit syscalls currently have an optimization in which they are
called with partial pt_regs.  A small handful require full
pt_regs.

In the 32-bit and compat cases, I cleaned this up by forcing
full pt_regs for all syscalls.  The performance hit doesn't
really matter there: the affected system calls are fundamentally
heavy, and the 32-bit/compat path is not performance-critical anyway.

I want to clean up the 64-bit case as well, but I don't want to
hurt fast path performance.  To do that, I want to force the
syscalls that use pt_regs onto the slow path.  This will enable
us to make the slow-path syscalls real ABI-compliant C functions.

Use the new syscall entry qualification machinery for this.
'stub_clone' is now 'stub_clone/ptregs'.

The next patch will eliminate the stubs, and we'll just have
'sys_clone/ptregs'.

As of this patch, two-phase entry tracing is no longer used.  It
has served its purpose (namely a huge speedup on some workloads
prior to more general opportunistic SYSRET support), and once
the dust settles I'll send patches to back it out.

The implementation is heavily based on a patch from Brian Gerst:

  http://lkml.kernel.org/g/1449666173-15366-1-git-send-email-brgerst@gmail.com

Originally-From: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b9beda88460bcefec6e7d792bd44eca9b760b0c4.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S              | 56 +++++++++++++++++++++++++---------
 arch/x86/entry/syscall_64.c            |  7 +++--
 arch/x86/entry/syscalls/syscall_64.tbl | 16 +++++-----
 3 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9d34d3c..f1c8f15 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -182,7 +182,15 @@ entry_SYSCALL_64_fastpath:
 #endif
 	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
 	movq	%r10, %rcx
+
+	/*
+	 * This call instruction is handled specially in stub_ptregs_64.
+	 * It might end up jumping to the slow path.  If it jumps, RAX is
+	 * clobbered.
+	 */
 	call	*sys_call_table(, %rax, 8)
+.Lentry_SYSCALL_64_after_fastpath_call:
+
 	movq	%rax, RAX(%rsp)
 1:
 /*
@@ -235,25 +243,13 @@ GLOBAL(int_ret_from_sys_call_irqs_off)
 
 	/* Do syscall entry tracing */
 tracesys:
-	movq	%rsp, %rdi
-	movl	$AUDIT_ARCH_X86_64, %esi
-	call	syscall_trace_enter_phase1
-	test	%rax, %rax
-	jnz	tracesys_phase2			/* if needed, run the slow path */
-	RESTORE_C_REGS_EXCEPT_RAX		/* else restore clobbered regs */
-	movq	ORIG_RAX(%rsp), %rax
-	jmp	entry_SYSCALL_64_fastpath	/* and return to the fast path */
-
-tracesys_phase2:
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
-	movl	$AUDIT_ARCH_X86_64, %esi
-	movq	%rax, %rdx
-	call	syscall_trace_enter_phase2
+	call	syscall_trace_enter
 
 	/*
 	 * Reload registers from stack in case ptrace changed them.
-	 * We don't reload %rax because syscall_trace_entry_phase2() returned
+	 * We don't reload %rax because syscall_trace_enter() returned
 	 * the value it wants us to use in the table lookup.
 	 */
 	RESTORE_C_REGS_EXCEPT_RAX
@@ -355,6 +351,38 @@ opportunistic_sysret_failed:
 	jmp	restore_c_regs_and_iret
 END(entry_SYSCALL_64)
 
+ENTRY(stub_ptregs_64)
+	/*
+	 * Syscalls marked as needing ptregs land here.
+	 * If we are on the fast path, we need to save the extra regs.
+	 * If we are on the slow path, the extra regs are already saved.
+	 *
+	 * RAX stores a pointer to the C function implementing the syscall.
+	 */
+	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
+	jne	1f
+
+	/* Called from fast path -- pop return address and jump to slow path */
+	popq	%rax
+	jmp	tracesys	/* called from fast path */
+
+1:
+	/* Called from C */
+	jmp	*%rax				/* called from C */
+END(stub_ptregs_64)
+
+.macro ptregs_stub func
+ENTRY(ptregs_\func)
+	leaq	\func(%rip), %rax
+	jmp	stub_ptregs_64
+END(ptregs_\func)
+.endm
+
+/* Instantiate ptregs_stub for each ptregs-using syscall */
+#define __SYSCALL_64_QUAL_(sym)
+#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
+#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
+#include <asm/syscalls_64.h>
 
 	.macro FORK_LIKE func
 ENTRY(stub_\func)
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index a1d4087..9dbc5ab 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -6,11 +6,14 @@
 #include <asm/asm-offsets.h>
 #include <asm/syscall.h>
 
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_64_QUAL_(sym) sym
+#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
+
+#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 #include <asm/syscalls_64.h>
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
+#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index dc1040a..5de342a 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -21,7 +21,7 @@
 12	common	brk			sys_brk
 13	64	rt_sigaction		sys_rt_sigaction
 14	common	rt_sigprocmask		sys_rt_sigprocmask
-15	64	rt_sigreturn		stub_rt_sigreturn
+15	64	rt_sigreturn		stub_rt_sigreturn/ptregs
 16	64	ioctl			sys_ioctl
 17	common	pread64			sys_pread64
 18	common	pwrite64		sys_pwrite64
@@ -62,10 +62,10 @@
 53	common	socketpair		sys_socketpair
 54	64	setsockopt		sys_setsockopt
 55	64	getsockopt		sys_getsockopt
-56	common	clone			stub_clone
-57	common	fork			stub_fork
-58	common	vfork			stub_vfork
-59	64	execve			stub_execve
+56	common	clone			stub_clone/ptregs
+57	common	fork			stub_fork/ptregs
+58	common	vfork			stub_vfork/ptregs
+59	64	execve			stub_execve/ptregs
 60	common	exit			sys_exit
 61	common	wait4			sys_wait4
 62	common	kill			sys_kill
@@ -328,7 +328,7 @@
 319	common	memfd_create		sys_memfd_create
 320	common	kexec_file_load		sys_kexec_file_load
 321	common	bpf			sys_bpf
-322	64	execveat		stub_execveat
+322	64	execveat		stub_execveat/ptregs
 323	common	userfaultfd		sys_userfaultfd
 324	common	membarrier		sys_membarrier
 325	common	mlock2			sys_mlock2
@@ -346,7 +346,7 @@
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
-520	x32	execve			stub_x32_execve
+520	x32	execve			stub_x32_execve/ptregs
 521	x32	ptrace			compat_sys_ptrace
 522	x32	rt_sigpending		compat_sys_rt_sigpending
 523	x32	rt_sigtimedwait		compat_sys_rt_sigtimedwait
@@ -371,4 +371,4 @@
 542	x32	getsockopt		compat_sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
 544	x32	io_submit		compat_sys_io_submit
-545	x32	execveat		stub_x32_execveat
+545	x32	execveat		stub_x32_execveat/ptregs
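Tracing one entry through the new machinery shows how the pieces fit.
The row '56 common clone stub_clone/ptregs' generates
__SYSCALL_64(56, stub_clone, ptregs), which expands two ways:

	/* in syscall_64.c: the table slot points at a trampoline */
	[56] = ptregs_stub_clone,

	/*
	 * in entry_64.S: ptregs_stub instantiates that trampoline as
	 *   leaq stub_clone(%rip), %rax; jmp stub_ptregs_64
	 */

stub_ptregs_64 then compares the return address on the stack with the
fast-path call site: on a match it pops the return address (clobbering
RAX, as the comment warns) and jumps to the tracesys slow path, which
saves the extra regs; otherwise it was reached from a full-pt_regs
context and tail-jumps into the real function through RAX.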

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/asm] x86/entry/64: Call all native slow-path syscalls with full pt-regs
  2016-01-28 23:11 ` [PATCH v2 08/10] x86/entry/64: Call all native slow-path syscalls with full pt-regs Andy Lutomirski
@ 2016-01-29 11:36   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-29 11:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, tglx, luto, torvalds, bp, linux-kernel, hpa, dvlasenk,
	peterz, fweisbec, brgerst, luto

Commit-ID:  46eabf06c04a6847a694a0c1413d4ac57e5b058a
Gitweb:     http://git.kernel.org/tip/46eabf06c04a6847a694a0c1413d4ac57e5b058a
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 28 Jan 2016 15:11:26 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 29 Jan 2016 09:46:38 +0100

x86/entry/64: Call all native slow-path syscalls with full pt-regs

This removes all of the remaining asm syscall stubs except for
stub_ptregs_64.  Entries in the main syscall table are now all
callable from C.

The resulting asm is every bit as ridiculous as it looks.  The
next few patches will clean it up.  This patch is here to let
reviewers rest their brains and to keep the series bisectable.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a6b3801be0d505d50aefabda02d3b93efbfc9c73.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S              | 79 +---------------------------------
 arch/x86/entry/syscalls/syscall_64.tbl | 18 ++++----
 2 files changed, 10 insertions(+), 87 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f1c8f15..f7050a5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -253,7 +253,6 @@ tracesys:
 	 * the value it wants us to use in the table lookup.
 	 */
 	RESTORE_C_REGS_EXCEPT_RAX
-	RESTORE_EXTRA_REGS
 #if __SYSCALL_MASK == ~0
 	cmpq	$__NR_syscall_max, %rax
 #else
@@ -264,6 +263,7 @@ tracesys:
 	movq	%r10, %rcx			/* fixup for C */
 	call	*sys_call_table(, %rax, 8)
 	movq	%rax, RAX(%rsp)
+	RESTORE_EXTRA_REGS
 1:
 	/* Use IRET because user could have changed pt_regs->foo */
 
@@ -384,83 +384,6 @@ END(ptregs_\func)
 #define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
 #include <asm/syscalls_64.h>
 
-	.macro FORK_LIKE func
-ENTRY(stub_\func)
-	SAVE_EXTRA_REGS 8
-	jmp	sys_\func
-END(stub_\func)
-	.endm
-
-	FORK_LIKE  clone
-	FORK_LIKE  fork
-	FORK_LIKE  vfork
-
-ENTRY(stub_execve)
-	call	sys_execve
-return_from_execve:
-	testl	%eax, %eax
-	jz	1f
-	/* exec failed, can use fast SYSRET code path in this case */
-	ret
-1:
-	/* must use IRET code path (pt_regs->cs may have changed) */
-	addq	$8, %rsp
-	ZERO_EXTRA_REGS
-	movq	%rax, RAX(%rsp)
-	jmp	int_ret_from_sys_call
-END(stub_execve)
-/*
- * Remaining execve stubs are only 7 bytes long.
- * ENTRY() often aligns to 16 bytes, which in this case has no benefits.
- */
-	.align	8
-GLOBAL(stub_execveat)
-	call	sys_execveat
-	jmp	return_from_execve
-END(stub_execveat)
-
-#if defined(CONFIG_X86_X32_ABI)
-	.align	8
-GLOBAL(stub_x32_execve)
-	call	compat_sys_execve
-	jmp	return_from_execve
-END(stub_x32_execve)
-	.align	8
-GLOBAL(stub_x32_execveat)
-	call	compat_sys_execveat
-	jmp	return_from_execve
-END(stub_x32_execveat)
-#endif
-
-/*
- * sigreturn is special because it needs to restore all registers on return.
- * This cannot be done with SYSRET, so use the IRET return path instead.
- */
-ENTRY(stub_rt_sigreturn)
-	/*
-	 * SAVE_EXTRA_REGS result is not normally needed:
-	 * sigreturn overwrites all pt_regs->GPREGS.
-	 * But sigreturn can fail (!), and there is no easy way to detect that.
-	 * To make sure RESTORE_EXTRA_REGS doesn't restore garbage on error,
-	 * we SAVE_EXTRA_REGS here.
-	 */
-	SAVE_EXTRA_REGS 8
-	call	sys_rt_sigreturn
-return_from_stub:
-	addq	$8, %rsp
-	RESTORE_EXTRA_REGS
-	movq	%rax, RAX(%rsp)
-	jmp	int_ret_from_sys_call
-END(stub_rt_sigreturn)
-
-#ifdef CONFIG_X86_X32_ABI
-ENTRY(stub_x32_rt_sigreturn)
-	SAVE_EXTRA_REGS 8
-	call	sys32_x32_rt_sigreturn
-	jmp	return_from_stub
-END(stub_x32_rt_sigreturn)
-#endif
-
 /*
  * A newly forked process directly context switches into this address.
  *
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5de342a..dcf107c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -21,7 +21,7 @@
 12	common	brk			sys_brk
 13	64	rt_sigaction		sys_rt_sigaction
 14	common	rt_sigprocmask		sys_rt_sigprocmask
-15	64	rt_sigreturn		stub_rt_sigreturn/ptregs
+15	64	rt_sigreturn		sys_rt_sigreturn/ptregs
 16	64	ioctl			sys_ioctl
 17	common	pread64			sys_pread64
 18	common	pwrite64		sys_pwrite64
@@ -62,10 +62,10 @@
 53	common	socketpair		sys_socketpair
 54	64	setsockopt		sys_setsockopt
 55	64	getsockopt		sys_getsockopt
-56	common	clone			stub_clone/ptregs
-57	common	fork			stub_fork/ptregs
-58	common	vfork			stub_vfork/ptregs
-59	64	execve			stub_execve/ptregs
+56	common	clone			sys_clone/ptregs
+57	common	fork			sys_fork/ptregs
+58	common	vfork			sys_vfork/ptregs
+59	64	execve			sys_execve/ptregs
 60	common	exit			sys_exit
 61	common	wait4			sys_wait4
 62	common	kill			sys_kill
@@ -328,7 +328,7 @@
 319	common	memfd_create		sys_memfd_create
 320	common	kexec_file_load		sys_kexec_file_load
 321	common	bpf			sys_bpf
-322	64	execveat		stub_execveat/ptregs
+322	64	execveat		sys_execveat/ptregs
 323	common	userfaultfd		sys_userfaultfd
 324	common	membarrier		sys_membarrier
 325	common	mlock2			sys_mlock2
@@ -339,14 +339,14 @@
 # for native 64-bit operation.
 #
 512	x32	rt_sigaction		compat_sys_rt_sigaction
-513	x32	rt_sigreturn		stub_x32_rt_sigreturn
+513	x32	rt_sigreturn		sys32_x32_rt_sigreturn
 514	x32	ioctl			compat_sys_ioctl
 515	x32	readv			compat_sys_readv
 516	x32	writev			compat_sys_writev
 517	x32	recvfrom		compat_sys_recvfrom
 518	x32	sendmsg			compat_sys_sendmsg
 519	x32	recvmsg			compat_sys_recvmsg
-520	x32	execve			stub_x32_execve/ptregs
+520	x32	execve			compat_sys_execve/ptregs
 521	x32	ptrace			compat_sys_ptrace
 522	x32	rt_sigpending		compat_sys_rt_sigpending
 523	x32	rt_sigtimedwait		compat_sys_rt_sigtimedwait
@@ -371,4 +371,4 @@
 542	x32	getsockopt		compat_sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
 544	x32	io_submit		compat_sys_io_submit
-545	x32	execveat		stub_x32_execveat/ptregs
+545	x32	execveat		compat_sys_execveat/ptregs

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/asm] x86/entry/64: Stop using int_ret_from_sys_call in ret_from_fork
  2016-01-28 23:11 ` [PATCH v2 09/10] x86/entry/64: Stop using int_ret_from_sys_call in ret_from_fork Andy Lutomirski
@ 2016-01-29 11:36   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-29 11:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, bp, fweisbec, torvalds, luto, tglx, brgerst, mingo,
	luto, dvlasenk, hpa, peterz

Commit-ID:  24d978b76ffd20ecff8a8d1c21b16fe740f8b119
Gitweb:     http://git.kernel.org/tip/24d978b76ffd20ecff8a8d1c21b16fe740f8b119
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 28 Jan 2016 15:11:27 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 29 Jan 2016 09:46:38 +0100

x86/entry/64: Stop using int_ret_from_sys_call in ret_from_fork

ret_from_fork is now open-coded and is no longer tangled up with
the syscall code.  This isn't so bad -- it adds very little
code, and IMO the result is much easier to understand.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a0747e2a5e47084655a1e96351c545b755c41fa7.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_64.S | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f7050a5..cb5d940 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -390,7 +390,6 @@ END(ptregs_\func)
  * rdi: prev task we switched from
  */
 ENTRY(ret_from_fork)
-
 	LOCK ; btr $TIF_FORK, TI_flags(%r8)
 
 	pushq	$0x0002
@@ -398,28 +397,32 @@ ENTRY(ret_from_fork)
 
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
-	RESTORE_EXTRA_REGS
-
 	testb	$3, CS(%rsp)			/* from kernel_thread? */
+	jnz	1f
 
 	/*
-	 * By the time we get here, we have no idea whether our pt_regs,
-	 * ti flags, and ti status came from the 64-bit SYSCALL fast path,
-	 * the slow path, or one of the 32-bit compat paths.
-	 * Use IRET code path to return, since it can safely handle
-	 * all of the above.
+	 * We came from kernel_thread.  This code path is quite twisted, and
+	 * someone should clean it up.
+	 *
+	 * copy_thread_tls stashes the function pointer in RBX and the
+	 * parameter to be passed in RBP.  The called function is permitted
+	 * to call do_execve and thereby jump to user mode.
 	 */
-	jnz	int_ret_from_sys_call
+	movq	RBP(%rsp), %rdi
+	call	*RBX(%rsp)
+	movl	$0, RAX(%rsp)
 
 	/*
-	 * We came from kernel_thread
-	 * nb: we depend on RESTORE_EXTRA_REGS above
+	 * Fall through as though we're exiting a syscall.  This makes a
+	 * twisted sort of sense if we just called do_execve.
 	 */
-	movq	%rbp, %rdi
-	call	*%rbx
-	movl	$0, RAX(%rsp)
-	RESTORE_EXTRA_REGS
-	jmp	int_ret_from_sys_call
+
+1:
+	movq	%rsp, %rdi
+	call	syscall_return_slowpath	/* returns with IRQs disabled */
+	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
+	SWAPGS
+	jmp	restore_regs_and_iret
 END(ret_from_fork)
 
 /*
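For context, the RBX/RBP convention is set up by the 64-bit
copy_thread_tls() for kernel threads; roughly (a from-memory sketch of
the 4.5-era arch/x86/kernel/process_64.c, not an exact excerpt):

	if (unlikely(p->flags & PF_KTHREAD)) {
		/* kernel thread */
		memset(childregs, 0, sizeof(struct pt_regs));
		...
		childregs->bx = sp;	/* function to call */
		childregs->bp = arg;	/* its argument */
		...
	}

ret_from_fork's 'call *RBX(%rsp)' with RBP(%rsp) in %rdi is the other
half of that handshake.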

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip:x86/asm] x86/entry/64: Migrate the 64-bit syscall slow path to C
  2016-01-28 23:11 ` [PATCH v2 10/10] x86/entry/64: Migrate the 64-bit syscall slow path to C Andy Lutomirski
@ 2016-01-29 11:36   ` tip-bot for Andy Lutomirski
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot for Andy Lutomirski @ 2016-01-29 11:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, torvalds, hpa, luto, peterz, linux-kernel, fweisbec, luto,
	dvlasenk, brgerst, bp, mingo

Commit-ID:  1e423bff959e48166f5b7efca01fdb0dbdf05846
Gitweb:     http://git.kernel.org/tip/1e423bff959e48166f5b7efca01fdb0dbdf05846
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu, 28 Jan 2016 15:11:28 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 29 Jan 2016 09:46:38 +0100

x86/entry/64: Migrate the 64-bit syscall slow path to C

This is more complicated than the 32-bit and compat cases
because it preserves an asm fast path for the case where the
callee-saved regs aren't needed in pt_regs and no entry or exit
work needs to be done.

This appears to slow down fastpath syscalls by no more than one
cycle on my Skylake laptop.
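
( For anyone who wants to reproduce that kind of measurement: a crude
  userspace harness -- a sketch, not the methodology behind the number
  above -- is to take the minimum RDTSC delta around a syscall that
  does no exit work, and compare before/after kernels:

	#include <stdio.h>
	#include <unistd.h>
	#include <x86intrin.h>

	int main(void)
	{
		unsigned long long best = ~0ULL;

		for (int i = 0; i < 1000000; i++) {
			unsigned long long t0 = __rdtsc();
			getppid();	/* cheap syscall, no exit work */
			unsigned long long t1 = __rdtsc();

			if (t1 - t0 < best)
				best = t1 - t0;
		}

		printf("min syscall round trip: %llu cycles\n", best);
		return 0;
	}
)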

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ce2335a4d42dc164b24132ee5e8c7716061f947b.1454022279.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/common.c   |  26 +++++++++++
 arch/x86/entry/entry_64.S | 117 ++++++++++++++++------------------------------
 2 files changed, 65 insertions(+), 78 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 0366374..75175f9 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -344,6 +344,32 @@ __visible inline void syscall_return_slowpath(struct pt_regs *regs)
 	prepare_exit_to_usermode(regs);
 }
 
+#ifdef CONFIG_X86_64
+__visible void do_syscall_64(struct pt_regs *regs)
+{
+	struct thread_info *ti = pt_regs_to_thread_info(regs);
+	unsigned long nr = regs->orig_ax;
+
+	local_irq_enable();
+
+	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY)
+		nr = syscall_trace_enter(regs);
+
+	/*
+	 * NB: Native and x32 syscalls are dispatched from the same
+	 * table.  The only functional difference is the x32 bit in
+	 * regs->orig_ax, which changes the behavior of some syscalls.
+	 */
+	if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
+		regs->ax = sys_call_table[nr & __SYSCALL_MASK](
+			regs->di, regs->si, regs->dx,
+			regs->r10, regs->r8, regs->r9);
+	}
+
+	syscall_return_slowpath(regs);
+}
+#endif
+
 #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
 /*
  * Does a 32-bit syscall.  Called with IRQs on and does all entry and
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index cb5d940..567aa52 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -145,17 +145,11 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
 	movq	%rsp, PER_CPU_VAR(rsp_scratch)
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
 
+	TRACE_IRQS_OFF
+
 	/* Construct struct pt_regs on stack */
 	pushq	$__USER_DS			/* pt_regs->ss */
 	pushq	PER_CPU_VAR(rsp_scratch)	/* pt_regs->sp */
-	/*
-	 * Re-enable interrupts.
-	 * We use 'rsp_scratch' as a scratch space, hence irq-off block above
-	 * must execute atomically in the face of possible interrupt-driven
-	 * task preemption. We must enable interrupts only after we're done
-	 * with using rsp_scratch:
-	 */
-	ENABLE_INTERRUPTS(CLBR_NONE)
 	pushq	%r11				/* pt_regs->flags */
 	pushq	$__USER_CS			/* pt_regs->cs */
 	pushq	%rcx				/* pt_regs->ip */
@@ -171,9 +165,21 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
 	pushq	%r11				/* pt_regs->r11 */
 	sub	$(6*8), %rsp			/* pt_regs->bp, bx, r12-15 not saved */
 
-	testl	$_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-	jnz	tracesys
+	/*
+	 * If we need to do entry work or if we guess we'll need to do
+	 * exit work, go straight to the slow path.
+	 */
+	testl	$_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+	jnz	entry_SYSCALL64_slow_path
+
 entry_SYSCALL_64_fastpath:
+	/*
+	 * Easy case: enable interrupts and issue the syscall.  If the syscall
+	 * needs pt_regs, we'll call a stub that disables interrupts again
+	 * and jumps to the slow path.
+	 */
+	TRACE_IRQS_ON
+	ENABLE_INTERRUPTS(CLBR_NONE)
 #if __SYSCALL_MASK == ~0
 	cmpq	$__NR_syscall_max, %rax
 #else
@@ -193,88 +199,43 @@ entry_SYSCALL_64_fastpath:
 
 	movq	%rax, RAX(%rsp)
 1:
-/*
- * Syscall return path ending with SYSRET (fast path).
- * Has incompletely filled pt_regs.
- */
-	LOCKDEP_SYS_EXIT
-	/*
-	 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
-	 * it is too small to ever cause noticeable irq latency.
-	 */
-	DISABLE_INTERRUPTS(CLBR_NONE)
 
 	/*
-	 * We must check ti flags with interrupts (or at least preemption)
-	 * off because we must *never* return to userspace without
-	 * processing exit work that is enqueued if we're preempted here.
-	 * In particular, returning to userspace with any of the one-shot
-	 * flags (TIF_NOTIFY_RESUME, TIF_USER_RETURN_NOTIFY, etc) set is
-	 * very bad.
+	 * If we get here, then we know that pt_regs is clean for SYSRET64.
+	 * If we see that no exit work is required (which we are required
+	 * to check with IRQs off), then we can go straight to SYSRET64.
 	 */
+	DISABLE_INTERRUPTS(CLBR_NONE)
+	TRACE_IRQS_OFF
 	testl	$_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-	jnz	int_ret_from_sys_call_irqs_off	/* Go to the slow path */
+	jnz	1f
 
-	RESTORE_C_REGS_EXCEPT_RCX_R11
-	movq	RIP(%rsp), %rcx
-	movq	EFLAGS(%rsp), %r11
+	LOCKDEP_SYS_EXIT
+	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
+	RESTORE_C_REGS
 	movq	RSP(%rsp), %rsp
-	/*
-	 * 64-bit SYSRET restores rip from rcx,
-	 * rflags from r11 (but RF and VM bits are forced to 0),
-	 * cs and ss are loaded from MSRs.
-	 * Restoration of rflags re-enables interrupts.
-	 *
-	 * NB: On AMD CPUs with the X86_BUG_SYSRET_SS_ATTRS bug, the ss
-	 * descriptor is not reinitialized.  This means that we should
-	 * avoid SYSRET with SS == NULL, which could happen if we schedule,
-	 * exit the kernel, and re-enter using an interrupt vector.  (All
-	 * interrupt entries on x86_64 set SS to NULL.)  We prevent that
-	 * from happening by reloading SS in __switch_to.  (Actually
-	 * detecting the failure in 64-bit userspace is tricky but can be
-	 * done.)
-	 */
 	USERGS_SYSRET64
 
-GLOBAL(int_ret_from_sys_call_irqs_off)
+1:
+	/*
+	 * The fast path looked good when we started, but something changed
+	 * along the way and we need to switch to the slow path.  Calling
+	 * raise(3) will trigger this, for example.  IRQs are off.
+	 */
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	jmp int_ret_from_sys_call
-
-	/* Do syscall entry tracing */
-tracesys:
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
-	call	syscall_trace_enter
-
-	/*
-	 * Reload registers from stack in case ptrace changed them.
-	 * We don't reload %rax because syscall_trace_enter() returned
-	 * the value it wants us to use in the table lookup.
-	 */
-	RESTORE_C_REGS_EXCEPT_RAX
-#if __SYSCALL_MASK == ~0
-	cmpq	$__NR_syscall_max, %rax
-#else
-	andl	$__SYSCALL_MASK, %eax
-	cmpl	$__NR_syscall_max, %eax
-#endif
-	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
-	movq	%r10, %rcx			/* fixup for C */
-	call	*sys_call_table(, %rax, 8)
-	movq	%rax, RAX(%rsp)
-	RESTORE_EXTRA_REGS
-1:
-	/* Use IRET because user could have changed pt_regs->foo */
+	call	syscall_return_slowpath	/* returns with IRQs disabled */
+	jmp	return_from_SYSCALL_64
 
-/*
- * Syscall return path ending with IRET.
- * Has correct iret frame.
- */
-GLOBAL(int_ret_from_sys_call)
+entry_SYSCALL64_slow_path:
+	/* IRQs are off. */
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
-	call	syscall_return_slowpath	/* returns with IRQs disabled */
+	call	do_syscall_64		/* returns with IRQs disabled */
+
+return_from_SYSCALL_64:
 	RESTORE_EXTRA_REGS
 	TRACE_IRQS_IRETQ		/* we're about to change IF */
 
@@ -364,7 +325,7 @@ ENTRY(stub_ptregs_64)
 
 	/* Called from fast path -- pop return address and jump to slow path */
 	popq	%rax
-	jmp	tracesys	/* called from fast path */
+	jmp	entry_SYSCALL64_slow_path	/* called from fast path */
 
 1:
 	/* Called from C */
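
For context on the comment in do_syscall_64() above: the "x32 bit" is
__X32_SYSCALL_BIT (0x40000000), which userspace sets in the syscall number
to request x32 semantics.  A minimal userspace sketch -- an illustration,
not part of the patch -- assuming an x86-64 host; on kernels without
CONFIG_X86_X32_ABI the flagged call fails the NR_syscalls check above and
returns -ENOSYS:

/* Sketch: exercising the x32 bit that do_syscall_64() masks off. */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __X32_SYSCALL_BIT
#define __X32_SYSCALL_BIT 0x40000000UL
#endif

int main(void)
{
	long native = syscall(SYS_getpid);
	/*
	 * getpid is a COMMON syscall, so with the x32 bit set the masked
	 * number (nr & __SYSCALL_MASK) hits the same sys_call_table slot;
	 * without x32 support this returns -1 with errno == ENOSYS.
	 */
	long flagged = syscall(__X32_SYSCALL_BIT | SYS_getpid);
	printf("native: %ld, x32-flagged: %ld\n", native, flagged);
	return 0;
}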

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [tip:x86/asm] x86/syscalls:  Remove __SYSCALL_COMMON and __SYSCALL_X32
  2016-01-29 11:34   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
@ 2016-01-29 21:23     ` H. Peter Anvin
  2016-01-29 22:19       ` Brian Gerst
  0 siblings, 1 reply; 28+ messages in thread
From: H. Peter Anvin @ 2016-01-29 21:23 UTC (permalink / raw)
  To: linux-tip-commits, tip-bot for Andy Lutomirski
  Cc: torvalds, brgerst, luto, bp, luto, linux-kernel, mingo, tglx,
	dvlasenk, peterz, fweisbec

On January 29, 2016 3:34:44 AM PST, tip-bot for Andy Lutomirski <tipbot@zytor.com> wrote:
>Commit-ID:  32324ce15ea8cb4c8acc28acb2fd36fabf73e9db
>Gitweb:    
>http://git.kernel.org/tip/32324ce15ea8cb4c8acc28acb2fd36fabf73e9db
>Author:     Andy Lutomirski <luto@kernel.org>
>AuthorDate: Thu, 28 Jan 2016 15:11:22 -0800
>Committer:  Ingo Molnar <mingo@kernel.org>
>CommitDate: Fri, 29 Jan 2016 09:46:37 +0100
>
>x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
>
>The common/64/x32 distinction has no effect other than
>determining which kernels actually support the syscall.  Move
>the logic into syscalltbl.sh.
>
>Signed-off-by: Andy Lutomirski <luto@kernel.org>
>Cc: Andy Lutomirski <luto@amacapital.net>
>Cc: Borislav Petkov <bp@alien8.de>
>Cc: Brian Gerst <brgerst@gmail.com>
>Cc: Denys Vlasenko <dvlasenk@redhat.com>
>Cc: Frederic Weisbecker <fweisbec@gmail.com>
>Cc: H. Peter Anvin <hpa@zytor.com>
>Cc: Linus Torvalds <torvalds@linux-foundation.org>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Link:
>http://lkml.kernel.org/r/58d4a95f40e43b894f93288b4a3633963d0ee22e.1454022279.git.luto@kernel.org
>Signed-off-by: Ingo Molnar <mingo@kernel.org>
>---
> arch/x86/entry/syscall_64.c           |  8 --------
> arch/x86/entry/syscalls/syscalltbl.sh | 17 ++++++++++++++++-
> arch/x86/kernel/asm-offsets_64.c      |  6 ------
> arch/x86/um/sys_call_table_64.c       |  3 ---
> arch/x86/um/user-offsets.c            |  2 --
> 5 files changed, 16 insertions(+), 20 deletions(-)
>
>diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
>index 41283d2..974fd89 100644
>--- a/arch/x86/entry/syscall_64.c
>+++ b/arch/x86/entry/syscall_64.c
>@@ -6,14 +6,6 @@
> #include <asm/asm-offsets.h>
> #include <asm/syscall.h>
> 
>-#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym,
>compat)
>-
>-#ifdef CONFIG_X86_X32_ABI
>-# define __SYSCALL_X32(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
>-#else
>-# define __SYSCALL_X32(nr, sym, compat) /* nothing */
>-#endif
>-
>#define __SYSCALL_64(nr, sym, compat) extern asmlinkage long
>sym(unsigned long, unsigned long, unsigned long, unsigned long,
>unsigned long, unsigned long) ;
> #include <asm/syscalls_64.h>
> #undef __SYSCALL_64
>diff --git a/arch/x86/entry/syscalls/syscalltbl.sh
>b/arch/x86/entry/syscalls/syscalltbl.sh
>index 167965e..5ebeaf1 100644
>--- a/arch/x86/entry/syscalls/syscalltbl.sh
>+++ b/arch/x86/entry/syscalls/syscalltbl.sh
>@@ -18,6 +18,21 @@ emit() {
> grep '^[0-9]' "$in" | sort -n | (
>     while read nr abi name entry compat; do
> 	abi=`echo "$abi" | tr '[a-z]' '[A-Z]'`
>-	emit "$abi" "$nr" "$entry" "$compat"
>+	if [ "$abi" == "COMMON" -o "$abi" == "64" ]; then
>+	    # COMMON is the same as 64, except that we don't expect X32
>+	    # programs to use it.  Our expectation has nothing to do with
>+	    # any generated code, so treat them the same.
>+	    emit 64 "$nr" "$entry" "$compat"
>+	elif [ "$abi" == "X32" ]; then
>+	    # X32 is equivalent to 64 on an X32-compatible kernel.
>+	    echo "#ifdef CONFIG_X86_X32_ABI"
>+	    emit 64 "$nr" "$entry" "$compat"
>+	    echo "#endif"
>+	elif [ "$abi" == "I386" ]; then
>+	    emit "$abi" "$nr" "$entry" "$compat"
>+	else
>+	    echo "Unknown abi $abi" >&2
>+	    exit 1
>+	fi
>     done
> ) > "$out"
>diff --git a/arch/x86/kernel/asm-offsets_64.c
>b/arch/x86/kernel/asm-offsets_64.c
>index f2edafb..29db3b3 100644
>--- a/arch/x86/kernel/asm-offsets_64.c
>+++ b/arch/x86/kernel/asm-offsets_64.c
>@@ -5,12 +5,6 @@
> #include <asm/ia32.h>
> 
> #define __SYSCALL_64(nr, sym, compat) [nr] = 1,
>-#define __SYSCALL_COMMON(nr, sym, compat) [nr] = 1,
>-#ifdef CONFIG_X86_X32_ABI
>-# define __SYSCALL_X32(nr, sym, compat) [nr] = 1,
>-#else
>-# define __SYSCALL_X32(nr, sym, compat) /* nothing */
>-#endif
> static char syscalls_64[] = {
> #include <asm/syscalls_64.h>
> };
>diff --git a/arch/x86/um/sys_call_table_64.c
>b/arch/x86/um/sys_call_table_64.c
>index b74ea6c..71a497c 100644
>--- a/arch/x86/um/sys_call_table_64.c
>+++ b/arch/x86/um/sys_call_table_64.c
>@@ -35,9 +35,6 @@
> #define stub_execveat sys_execveat
> #define stub_rt_sigreturn sys_rt_sigreturn
> 
>-#define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym,
>compat)
>-#define __SYSCALL_X32(nr, sym, compat) /* Not supported */
>-
>#define __SYSCALL_64(nr, sym, compat) extern asmlinkage long
>sym(unsigned long, unsigned long, unsigned long, unsigned long,
>unsigned long, unsigned long) ;
> #include <asm/syscalls_64.h>
> 
>diff --git a/arch/x86/um/user-offsets.c b/arch/x86/um/user-offsets.c
>index ce7e360..5edf4f4 100644
>--- a/arch/x86/um/user-offsets.c
>+++ b/arch/x86/um/user-offsets.c
>@@ -15,8 +15,6 @@ static char syscalls[] = {
> };
> #else
> #define __SYSCALL_64(nr, sym, compat) [nr] = 1,
>-#define __SYSCALL_COMMON(nr, sym, compat) [nr] = 1,
>-#define __SYSCALL_X32(nr, sym, compat) /* Not supported */
> static char syscalls[] = {
> #include <asm/syscalls_64.h>
> };

I am really unhappy about this change.  syscalltbl.sh is written so that other architectures can use it, and so I would really prefer for it to stay arch-neutral and encourage other arches to use it, too.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
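
For readers skimming the quoted script: its output, asm/syscalls_64.h, is
just a list of __SYSCALL_64() invocations, with the X32 rows wrapped in a
config guard.  A rough sketch of a few generated lines under the quoted
logic (syscall numbers and entry names are illustrative, and the exact
argument convention of the script's emit() helper is simplified here):

__SYSCALL_64(0, sys_read, sys_read)
__SYSCALL_64(59, stub_execve, stub_execve)
#ifdef CONFIG_X86_X32_ABI
__SYSCALL_64(520, stub_x32_execve, stub_x32_execve)
#endif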

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [tip:x86/asm] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
  2016-01-29 21:23     ` H. Peter Anvin
@ 2016-01-29 22:19       ` Brian Gerst
  2016-01-29 22:23         ` Andy Lutomirski
  2016-01-30 18:40         ` H. Peter Anvin
  0 siblings, 2 replies; 28+ messages in thread
From: Brian Gerst @ 2016-01-29 22:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-tip-commits, tip-bot for Andy Lutomirski, Linus Torvalds,
	Andy Lutomirski, Borislav Petkov, Andy Lutomirski,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Denys Vlasenko, Peter Zijlstra, Frédéric Weisbecker

On Fri, Jan 29, 2016 at 4:23 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On January 29, 2016 3:34:44 AM PST, tip-bot for Andy Lutomirski <tipbot@zytor.com> wrote:
>>[...]
>
> I am really unhappy about this change.  syscalltbl.sh is written so that other architectures can use it, and so I would really prefer for it to stay arch-neutral and encourage other arches to use it, too.

We could use a qualifier (i.e. /x32 or /x32ptregs) instead.

--
Brian Gerst
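
A sketch of how such a qualifier could be consumed on the C side without
multiplying macro families -- token-pasting on the qualifier.  The macro
names below are assumptions sketched from this suggestion, not committed
code:

/* A table line like "59 64 execve sys_execve/ptregs" would be split by
 * the script into entry="sys_execve", qual="ptregs"; the generated
 * __SYSCALL_64(59, sys_execve, ptregs) then resolves to a
 * ptregs_sys_execve stub, while an empty qualifier resolves to the
 * plain entry point. */
#define __SYSCALL_64_QUAL_(sym)		sym
#define __SYSCALL_64_QUAL_ptregs(sym)	ptregs_##sym
#define __SYSCALL_64(nr, sym, qual)	[nr] = __SYSCALL_64_QUAL_##qual(sym),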

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [tip:x86/asm] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
  2016-01-29 22:19       ` Brian Gerst
@ 2016-01-29 22:23         ` Andy Lutomirski
  2016-01-30  9:31           ` Ingo Molnar
  2016-01-30 18:40         ` H. Peter Anvin
  1 sibling, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-29 22:23 UTC (permalink / raw)
  To: Brian Gerst
  Cc: H. Peter Anvin, linux-tip-commits, tip-bot for Andy Lutomirski,
	Linus Torvalds, Borislav Petkov, Andy Lutomirski,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Denys Vlasenko, Peter Zijlstra, Frédéric Weisbecker

On Fri, Jan 29, 2016 at 2:19 PM, Brian Gerst <brgerst@gmail.com> wrote:
> On Fri, Jan 29, 2016 at 4:23 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On January 29, 2016 3:34:44 AM PST, tip-bot for Andy Lutomirski <tipbot@zytor.com> wrote:
>>>[...]
>>
>> I am really unhappy about this change.  syscalltbl.sh is written so that other architectures can use it, and so I would really prefer for it to stay arch-neutral and encourage other arches to use it, too.
>
> We could use a qualifier (ie. /x32 or /x32ptregs) instead.

No combinatorial explosion, please.  We could use __SYSCALL(nr, sym,
abi, qual), though.

--Andy
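
A sketch of that emitted form, with each architecture defining a single
__SYSCALL() and filtering on the abi argument itself (argument order and
names assumed from the suggestion above; entries illustrative):

__SYSCALL(0, sys_read, 64, )
__SYSCALL(520, stub_x32_execve, x32, ptregs)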

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [tip:x86/asm] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
  2016-01-29 22:23         ` Andy Lutomirski
@ 2016-01-30  9:31           ` Ingo Molnar
  2016-01-30 17:35             ` Andy Lutomirski
  0 siblings, 1 reply; 28+ messages in thread
From: Ingo Molnar @ 2016-01-30  9:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Brian Gerst, H. Peter Anvin, linux-tip-commits,
	tip-bot for Andy Lutomirski, Linus Torvalds, Borislav Petkov,
	Andy Lutomirski, Linux Kernel Mailing List, Thomas Gleixner,
	Denys Vlasenko, Peter Zijlstra, Frédéric Weisbecker


* Andy Lutomirski <luto@amacapital.net> wrote:

> >>>+      if [ "$abi" == "COMMON" -o "$abi" == "64" ]; then
> >>>+          # COMMON is the same as 64, except that we don't expect X32
> >>>+          # programs to use it.  Our expectation has nothing to do with
> >>>+          # any generated code, so treat them the same.
> >>>+          emit 64 "$nr" "$entry" "$compat"
> >>>+      elif [ "$abi" == "X32" ]; then
> >>>+          # X32 is equivalent to 64 on an X32-compatible kernel.
> >>>+          echo "#ifdef CONFIG_X86_X32_ABI"
> >>>+          emit 64 "$nr" "$entry" "$compat"
> >>>+          echo "#endif"
> >>>+      elif [ "$abi" == "I386" ]; then
> >>>+          emit "$abi" "$nr" "$entry" "$compat"
> >>>+      else
> >>>+          echo "Unknown abi $abi" >&2
> >>>+          exit 1
> >>>+      fi

> No combinatorial explosion, please.  We could use __SYSCALL(nr, sym,
> abi, qual), though.

Mind fixing it, so that we get back the arch-neutral property?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [tip:x86/asm] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
  2016-01-30  9:31           ` Ingo Molnar
@ 2016-01-30 17:35             ` Andy Lutomirski
  2016-01-30 21:22               ` H. Peter Anvin
  0 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-01-30 17:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Brian Gerst, H. Peter Anvin, linux-tip-commits,
	tip-bot for Andy Lutomirski, Linus Torvalds, Borislav Petkov,
	Andy Lutomirski, Linux Kernel Mailing List, Thomas Gleixner,
	Denys Vlasenko, Peter Zijlstra, Frédéric Weisbecker

On Sat, Jan 30, 2016 at 1:31 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Andy Lutomirski <luto@amacapital.net> wrote:
>
>> >>>[...]
>
>> No combinatorial explosion, please.  We could use __SYSCALL(nr, sym,
>> abi, qual), though.
>
> Mind fixing it, so that we get back the arch-neutral property?
>

I need some guidance as to the goal in order to do a good job.

In the version in -tip, I have this thing:

if [ "$abi" == "64" -a -n "$compat" ]; then
    echo "a compat entry for a 64-bit syscall makes no sense" >&2
    exit 1
fi

Moving that outside the script will either be impossible or an
exercise in really awful C preprocessor hacks.  We could keep that
under the theory that it's arch-neutral.

It might be nice to add a similar warning that a compat entry for an
x32 syscall makes no sense.  That's a little less arch-neutral,
although it wouldn't be actively harmful on any architecture, since
"x32" wouldn't occur in the first place.

Other than that, I could add a little header called
syscall_abi_mapping.h containing something like:

#ifndef __SYSCALL_ABI_MAPPING_H
#define __SYSCALL_ABI_MAPPING_H

#ifdef CONFIG_X86_32

/* Only I386 entries should ever be compiled into 32-bit kernels. */
#define __SYSCALL_ABI_I386(nr, entry, qual, compat, compat_qual) \
	__SYSCALL_I386(nr, entry, qual)

#else

/* I386 entries on 64-bit kernels use the compat entry point. */
#define __SYSCALL_ABI_I386(nr, entry, qual, compat, compat_qual) \
	__SYSCALL_I386(nr, compat, compat_qual)

/* COMMON entries are dispatched exactly like 64 entries. */
#define __SYSCALL_ABI_COMMON(nr, entry, qual, compat, compat_qual) \
	__SYSCALL_64(nr, entry, qual)
#define __SYSCALL_ABI_64(nr, entry, qual, compat, compat_qual) \
	__SYSCALL_64(nr, entry, qual)
#ifdef CONFIG_X86_X32_ABI
#define __SYSCALL_ABI_X32(nr, entry, qual, compat, compat_qual) \
	__SYSCALL_64(nr, entry, qual)
#else
#define __SYSCALL_ABI_X32(nr, entry, qual, compat, compat_qual) /* nothing */
#endif

#endif

#endif

and teach syscalltbl.sh to emit #include <asm/syscall_abi_mapping.h>
at the beginning of syscalls_32.h and syscalls_64.h and to reference
those macros.
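
To make the shape concrete, the first lines of a generated
asm/syscalls_64.h would then look something like this (entries
hypothetical; the X32 config guard lives in the mapping header rather
than in the generated file):

#include <asm/syscall_abi_mapping.h>
__SYSCALL_ABI_COMMON(0, sys_read, , , )
__SYSCALL_ABI_64(59, sys_execve, ptregs, , )
__SYSCALL_ABI_X32(520, compat_sys_execve, ptregs, , )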

hpa, would that meet your requirements?

IMO this is quite a bit messier than the code in -tip, and I'm
honestly not convinced it's an improvement.

--Andy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [tip:x86/asm] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
  2016-01-29 22:19       ` Brian Gerst
  2016-01-29 22:23         ` Andy Lutomirski
@ 2016-01-30 18:40         ` H. Peter Anvin
  1 sibling, 0 replies; 28+ messages in thread
From: H. Peter Anvin @ 2016-01-30 18:40 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-tip-commits, tip-bot for Andy Lutomirski, Linus Torvalds,
	Andy Lutomirski, Borislav Petkov, Andy Lutomirski,
	Linux Kernel Mailing List, Ingo Molnar, Thomas Gleixner,
	Denys Vlasenko, Peter Zijlstra, Frédéric Weisbecker

On January 29, 2016 2:19:29 PM PST, Brian Gerst <brgerst@gmail.com> wrote:
>On Fri, Jan 29, 2016 at 4:23 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On January 29, 2016 3:34:44 AM PST, tip-bot for Andy Lutomirski <tipbot@zytor.com> wrote:
>>>[...]
>>
>> I am really unhappy about this change.  syscalltbl.sh is written so
>> that other architectures can use it, and so I would really prefer for
>> it to stay arch-neutral and encourage other arches to use it, too.
>
>We could use a qualifier (i.e. /x32 or /x32ptregs) instead.
>
>--
>Brian Gerst

I like that idea much better.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [tip:x86/asm] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32
  2016-01-30 17:35             ` Andy Lutomirski
@ 2016-01-30 21:22               ` H. Peter Anvin
  0 siblings, 0 replies; 28+ messages in thread
From: H. Peter Anvin @ 2016-01-30 21:22 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar
  Cc: Brian Gerst, linux-tip-commits, tip-bot for Andy Lutomirski,
	Linus Torvalds, Borislav Petkov, Andy Lutomirski,
	Linux Kernel Mailing List, Thomas Gleixner, Denys Vlasenko,
	Peter Zijlstra, Frédéric Weisbecker

On January 30, 2016 9:35:57 AM PST, Andy Lutomirski <luto@amacapital.net> wrote:
>[...]
>
>hpa, would that meet your requirements?
>
>IMO this is quite a bit messier than the code in -tip, and I'm
>honestly not convinced it's an improvement.
>
>--Andy

Something like that... however, I can't look at it in detail right now.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-01-30 21:23 UTC | newest]

Thread overview: 28+ messages
-- links below jump to the message on this page --
2016-01-28 23:11 [PATCH v2 00/10] x86: Rewrite 64-bit syscall code Andy Lutomirski
2016-01-28 23:11 ` [PATCH v2 01/10] selftests/x86: Extend Makefile to allow 64-bit-only tests Andy Lutomirski
2016-01-29 11:33   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-01-28 23:11 ` [PATCH v2 02/10] selftests/x86: Add check_initial_reg_state Andy Lutomirski
2016-01-29 11:34   ` [tip:x86/asm] selftests/x86: Add check_initial_reg_state() tip-bot for Andy Lutomirski
2016-01-28 23:11 ` [PATCH v2 03/10] x86/syscalls: Refactor syscalltbl.sh Andy Lutomirski
2016-01-29 11:34   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-01-28 23:11 ` [PATCH v2 04/10] x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32 Andy Lutomirski
2016-01-29 11:34   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-01-29 21:23     ` H. Peter Anvin
2016-01-29 22:19       ` Brian Gerst
2016-01-29 22:23         ` Andy Lutomirski
2016-01-30  9:31           ` Ingo Molnar
2016-01-30 17:35             ` Andy Lutomirski
2016-01-30 21:22               ` H. Peter Anvin
2016-01-30 18:40         ` H. Peter Anvin
2016-01-28 23:11 ` [PATCH v2 05/10] x86/syscalls: Move compat syscall entry handling into syscalltbl.sh Andy Lutomirski
2016-01-29 11:35   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-01-28 23:11 ` [PATCH v2 06/10] x86/syscalls: Add syscall entry qualifiers Andy Lutomirski
2016-01-29 11:35   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-01-28 23:11 ` [PATCH v2 07/10] x86/entry/64: Always run ptregs-using syscalls on the slow path Andy Lutomirski
2016-01-29 11:35   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-01-28 23:11 ` [PATCH v2 08/10] x86/entry/64: Call all native slow-path syscalls with full pt-regs Andy Lutomirski
2016-01-29 11:36   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-01-28 23:11 ` [PATCH v2 09/10] x86/entry/64: Stop using int_ret_from_sys_call in ret_from_fork Andy Lutomirski
2016-01-29 11:36   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
2016-01-28 23:11 ` [PATCH v2 10/10] x86/entry/64: Migrate the 64-bit syscall slow path to C Andy Lutomirski
2016-01-29 11:36   ` [tip:x86/asm] " tip-bot for Andy Lutomirski
