* [PATCH v4 0/17] x86: entry.S optimizations
@ 2014-08-08 17:44 Denys Vlasenko
2014-08-08 17:44 ` [PATCH 01/17] x86: entry_64.S: delete unused code Denys Vlasenko
` (17 more replies)
0 siblings, 18 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
Version 4 of the patchset.
Please consider applying at least two first patches, they are definitely safe,
and the second one fixes a latent bug.
Changes since v3:
= simplified iret stack handling on SYSCALL64 fastpath:
got rid of FIXUP_TOP_OF_STACK and RESTORE_TOP_OF_STACK,
got rid of thread_struct::usersp.
= save_paranoid cleaned up.
= folded test_in_nmi and IA32_ARG_FIXUP macros into their users.
= got rid of KERNEL_STACK_OFFSET.
= other small optimizations.
= fixed comments about SYSCALL from 32bit mode.
Changes since v2:
= fixed pre-existing latent bug: wrong symbolic constant usage: R11->ARGOFFSET
= per Oleg's request, added comments about various syscall instructions
Changes since v1 (mainly in patch 4/6):
= Reverted changes to "partial pt_regs saving" on interrupt path.
= Folded PARTIAL_FRAME macro into DEFAULT_FRAME. Patch v1
had a bug (incorrect dwarf debug info generation).
= Corrected more comments.
= Added use of SAVE_* macros to error_entry and save_paranoid.
= Added another cleanup rename patch.
TODO:
= "idtentry" macro uses a subroutine to factor out a largish
common code block. "interrupt" macro inlines a similar
block every time (~20 instances). Maybe factor it out?
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
Denys Vlasenko (17):
x86: entry_64.S: delete unused code
x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET
x86: open-code register save/restore in trace_hardirqs thunks
x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user
x86: entry_64.S: always allocate complete "struct pt_regs"
x86: mass removal of ARGOFFSET
x86: rename some macros and labels, no code changes
x86: add comments about various syscall instructions, no code changes
x86: entry_64.S: move save_paranoid and ret_from_fork closer to their
users
x86: entry_64.S: rename save_paranoid to paranoid_entry, no code
changes
x86: entry_64.S: fold test_in_nmi macro into its only user
x86: get rid of KERNEL_STACK_OFFSET
x86: ia32entry.S: fold IA32_ARG_FIXUP macro into its callers
x86: ia32entry.S: use mov instead of push/pop where possible
x86: code shrink in paranoid_exit
x86: entry_64.S: trivial optimization for ENOSYS
x86: simplify iret stack handling on SYSCALL64 fastpath
arch/x86/ia32/ia32entry.S | 366 +++++++++--------
arch/x86/include/asm/calling.h | 226 +++++------
arch/x86/include/asm/compat.h | 2 +-
arch/x86/include/asm/irqflags.h | 4 +-
arch/x86/include/asm/processor.h | 1 -
arch/x86/include/asm/ptrace.h | 21 +-
arch/x86/include/asm/thread_info.h | 8 +-
arch/x86/include/uapi/asm/ptrace-abi.h | 16 +-
arch/x86/include/uapi/asm/ptrace.h | 13 +-
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/entry_64.S | 708 ++++++++++++++-------------------
arch/x86/kernel/preempt.S | 16 +-
arch/x86/kernel/process_32.c | 3 +-
arch/x86/kernel/process_64.c | 11 +-
arch/x86/kernel/smpboot.c | 3 +-
arch/x86/lib/thunk_64.S | 29 +-
arch/x86/syscalls/syscall_64.tbl | 2 +-
arch/x86/um/sys_call_table_64.c | 2 +-
arch/x86/xen/smp.c | 3 +-
19 files changed, 699 insertions(+), 737 deletions(-)
--
1.8.1.4
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 01/17] x86: entry_64.S: delete unused code
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 02/17] x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET Denys Vlasenko
` (16 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
A define, two macros and an unreferenced bit of assembly are gone.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/include/asm/calling.h | 1 -
arch/x86/kernel/entry_64.S | 34 ----------------------------------
2 files changed, 35 deletions(-)
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index cb4c73b..e176cea 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -83,7 +83,6 @@ For 32-bit we have the following conventions - kernel is built with
#define SS 160
#define ARGOFFSET R11
-#define SWFRAME ORIG_RAX
.macro SAVE_ARGS addskip=0, save_rcx=1, save_r891011=1
subq $9*8+\addskip, %rsp
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b25ca96..dbfd037 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -155,27 +155,6 @@ ENDPROC(native_usergs_sysret64)
movq \tmp,R11+\offset(%rsp)
.endm
- .macro FAKE_STACK_FRAME child_rip
- /* push in order ss, rsp, eflags, cs, rip */
- xorl %eax, %eax
- pushq_cfi $__KERNEL_DS /* ss */
- /*CFI_REL_OFFSET ss,0*/
- pushq_cfi %rax /* rsp */
- CFI_REL_OFFSET rsp,0
- pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED) /* eflags - interrupts on */
- /*CFI_REL_OFFSET rflags,0*/
- pushq_cfi $__KERNEL_CS /* cs */
- /*CFI_REL_OFFSET cs,0*/
- pushq_cfi \child_rip /* rip */
- CFI_REL_OFFSET rip,0
- pushq_cfi %rax /* orig rax */
- .endm
-
- .macro UNFAKE_STACK_FRAME
- addq $8*6, %rsp
- CFI_ADJUST_CFA_OFFSET -(6*8)
- .endm
-
/*
* initial frame state for interrupts (and exceptions without error code)
*/
@@ -640,19 +619,6 @@ END(\label)
FORK_LIKE vfork
FIXED_FRAME stub_iopl, sys_iopl
-ENTRY(ptregscall_common)
- DEFAULT_FRAME 1 8 /* offset 8: return address */
- RESTORE_TOP_OF_STACK %r11, 8
- movq_cfi_restore R15+8, r15
- movq_cfi_restore R14+8, r14
- movq_cfi_restore R13+8, r13
- movq_cfi_restore R12+8, r12
- movq_cfi_restore RBP+8, rbp
- movq_cfi_restore RBX+8, rbx
- ret $REST_SKIP /* pop extended registers */
- CFI_ENDPROC
-END(ptregscall_common)
-
ENTRY(stub_execve)
CFI_STARTPROC
addq $8, %rsp
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 02/17] x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
2014-08-08 17:44 ` [PATCH 01/17] x86: entry_64.S: delete unused code Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 03/17] x86: open-code register save/restore in trace_hardirqs thunks Denys Vlasenko
` (15 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
The values of these two constants are the same, the meaning is different.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/ia32/ia32entry.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb0..59c8d3a 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -169,8 +169,8 @@ sysenter_dispatch:
sysexit_from_sys_call:
andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
/* clear IF, that popfq doesn't enable interrupts early */
- andl $~0x200,EFLAGS-R11(%rsp)
- movl RIP-R11(%rsp),%edx /* User %eip */
+ andl $~0x200,EFLAGS-ARGOFFSET(%rsp)
+ movl RIP-ARGOFFSET(%rsp),%edx /* User %eip */
CFI_REGISTER rip,rdx
RESTORE_ARGS 0,24,0,0,0,0
xorq %r8,%r8
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 03/17] x86: open-code register save/restore in trace_hardirqs thunks
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
2014-08-08 17:44 ` [PATCH 01/17] x86: entry_64.S: delete unused code Denys Vlasenko
2014-08-08 17:44 ` [PATCH 02/17] x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 04/17] x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user Denys Vlasenko
` (14 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
This is a preparatory patch for change in "struct pt_regs"
handling in entry_64.S.
trace_hardirqs thunks were (ab)using a part of pt_regs
handling code, namely SAVE_ARGS/RESTORE_ARGS macros,
to save/restore registers across C function calls.
Since SAVE_ARGS is going to be changed, open-code
register saving/restoring here.
Incidentally, this removes a bit of dead code:
one SAVE_ARGS was just to emit a CFI annotation,
but it also generated unreachable assembly insns.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/lib/thunk_64.S | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/arch/x86/lib/thunk_64.S b/arch/x86/lib/thunk_64.S
index 92d9fea..c1c9131 100644
--- a/arch/x86/lib/thunk_64.S
+++ b/arch/x86/lib/thunk_64.S
@@ -16,10 +16,20 @@
\name:
CFI_STARTPROC
- /* this one pushes 9 elems, the next one would be %rIP */
- SAVE_ARGS
+ subq $9*8, %rsp
+ CFI_ADJUST_CFA_OFFSET 9*8
+ movq_cfi rdi, 8*8
+ movq_cfi rsi, 7*8
+ movq_cfi rdx, 6*8
+ movq_cfi rcx, 5*8
+ movq_cfi rax, 4*8
+ movq_cfi r8, 3*8
+ movq_cfi r9, 2*8
+ movq_cfi r10, 1*8
+ movq_cfi r11, 0*8
.if \put_ret_addr_in_rdi
+ /* 9*8(%rsp) is return addr on stack */
movq_cfi_restore 9*8, rdi
.endif
@@ -38,11 +48,20 @@
THUNK lockdep_sys_exit_thunk,lockdep_sys_exit
#endif
- /* SAVE_ARGS below is used only for the .cfi directives it contains. */
CFI_STARTPROC
- SAVE_ARGS
+ CFI_ADJUST_CFA_OFFSET 9*8
restore:
- RESTORE_ARGS
+ movq_cfi_restore 0*8, r11
+ movq_cfi_restore 1*8, r10
+ movq_cfi_restore 2*8, r9
+ movq_cfi_restore 3*8, r8
+ movq_cfi_restore 4*8, rax
+ movq_cfi_restore 5*8, rcx
+ movq_cfi_restore 6*8, rdx
+ movq_cfi_restore 7*8, rsi
+ movq_cfi_restore 8*8, rdi
+ addq 9*8, %rsp
+ CFI_ADJUST_CFA_OFFSET -9*8
ret
CFI_ENDPROC
_ASM_NOKPROBE(restore)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 04/17] x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (2 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 03/17] x86: open-code register save/restore in trace_hardirqs thunks Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 05/17] x86: entry_64.S: always allocate complete "struct pt_regs" Denys Vlasenko
` (13 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
No code changes.
This is a preparatory patch for change in "struct pt_regs" handling.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/kernel/entry_64.S | 88 ++++++++++++++++++++++------------------------
1 file changed, 42 insertions(+), 46 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index dbfd037..37f7d95 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -218,51 +218,6 @@ ENDPROC(native_usergs_sysret64)
CFI_REL_OFFSET r15, R15+\offset
.endm
-/* save partial stack frame */
- .macro SAVE_ARGS_IRQ
- cld
- /* start from rbp in pt_regs and jump over */
- movq_cfi rdi, (RDI-RBP)
- movq_cfi rsi, (RSI-RBP)
- movq_cfi rdx, (RDX-RBP)
- movq_cfi rcx, (RCX-RBP)
- movq_cfi rax, (RAX-RBP)
- movq_cfi r8, (R8-RBP)
- movq_cfi r9, (R9-RBP)
- movq_cfi r10, (R10-RBP)
- movq_cfi r11, (R11-RBP)
-
- /* Save rbp so that we can unwind from get_irq_regs() */
- movq_cfi rbp, 0
-
- /* Save previous stack value */
- movq %rsp, %rsi
-
- leaq -RBP(%rsp),%rdi /* arg1 for handler */
- testl $3, CS-RBP(%rsi)
- je 1f
- SWAPGS
- /*
- * irq_count is used to check if a CPU is already on an interrupt stack
- * or not. While this is essentially redundant with preempt_count it is
- * a little cheaper to use a separate counter in the PDA (short of
- * moving irq_enter into assembly, which would be too much work)
- */
-1: incl PER_CPU_VAR(irq_count)
- cmovzq PER_CPU_VAR(irq_stack_ptr),%rsp
- CFI_DEF_CFA_REGISTER rsi
-
- /* Store previous stack value */
- pushq %rsi
- CFI_ESCAPE 0x0f /* DW_CFA_def_cfa_expression */, 6, \
- 0x77 /* DW_OP_breg7 */, 0, \
- 0x06 /* DW_OP_deref */, \
- 0x08 /* DW_OP_const1u */, SS+8-RBP, \
- 0x22 /* DW_OP_plus */
- /* We entered an interrupt context - irqs are off: */
- TRACE_IRQS_OFF
- .endm
-
ENTRY(save_paranoid)
XCPT_FRAME 1 RDI+8
cld
@@ -731,7 +686,48 @@ END(interrupt)
/* reserve pt_regs for scratch regs and rbp */
subq $ORIG_RAX-RBP, %rsp
CFI_ADJUST_CFA_OFFSET ORIG_RAX-RBP
- SAVE_ARGS_IRQ
+ cld
+ /* start from rbp in pt_regs and jump over */
+ movq_cfi rdi, (RDI-RBP)
+ movq_cfi rsi, (RSI-RBP)
+ movq_cfi rdx, (RDX-RBP)
+ movq_cfi rcx, (RCX-RBP)
+ movq_cfi rax, (RAX-RBP)
+ movq_cfi r8, (R8-RBP)
+ movq_cfi r9, (R9-RBP)
+ movq_cfi r10, (R10-RBP)
+ movq_cfi r11, (R11-RBP)
+
+ /* Save rbp so that we can unwind from get_irq_regs() */
+ movq_cfi rbp, 0
+
+ /* Save previous stack value */
+ movq %rsp, %rsi
+
+ leaq -RBP(%rsp),%rdi /* arg1 for handler */
+ testl $3, CS-RBP(%rsi)
+ je 1f
+ SWAPGS
+ /*
+ * irq_count is used to check if a CPU is already on an interrupt stack
+ * or not. While this is essentially redundant with preempt_count it is
+ * a little cheaper to use a separate counter in the PDA (short of
+ * moving irq_enter into assembly, which would be too much work)
+ */
+1: incl PER_CPU_VAR(irq_count)
+ cmovzq PER_CPU_VAR(irq_stack_ptr),%rsp
+ CFI_DEF_CFA_REGISTER rsi
+
+ /* Store previous stack value */
+ pushq %rsi
+ CFI_ESCAPE 0x0f /* DW_CFA_def_cfa_expression */, 6, \
+ 0x77 /* DW_OP_breg7 */, 0, \
+ 0x06 /* DW_OP_deref */, \
+ 0x08 /* DW_OP_const1u */, SS+8-RBP, \
+ 0x22 /* DW_OP_plus */
+ /* We entered an interrupt context - irqs are off: */
+ TRACE_IRQS_OFF
+
call \func
.endm
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 05/17] x86: entry_64.S: always allocate complete "struct pt_regs"
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (3 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 04/17] x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 06/17] x86: mass removal of ARGOFFSET Denys Vlasenko
` (12 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
64-bit code was using six stack slots fewer by not saving/restoring
registers which are callee-preserved according to C ABI,
and not allocating space for them.
Only when syscall needed a complete "struct pt_regs",
the complete area was allocated and filled in.
As an additional twist, on interrupt entry a "slightly less truncated pt_regs"
trick is used, to make nested interrupt stacks easier to unwind.
This proved to be a source of significant obfuscation and subtle bugs.
For example, stub_fork had to pop the return address,
extend the struct, save registers, and push return address back. Ugly.
ia32_ptregs_common pops return address and "returns" via jmp insn,
throwing a wrench into CPU return stack cache.
This patch changes code to always allocate a complete "struct pt_regs".
The saving of registers is still done lazily.
"Partial pt_regs" trick on interrupt stack is retained, but with added comments
which explain what we are doing, and why. Existing comments are improved.
Macros which manipulate "struct pt_regs" on stack are reworked:
ALLOC_PTREGS_ON_STACK allocates the structure.
SAVE_C_REGS saves to it those registers which are clobbered by C code.
SAVE_EXTRA_REGS saves to it all other registers.
Corresponding RESTORE_* and REMOVE_PTREGS_FROM_STACK macros reverse it.
ia32_ptregs_common, stub_fork and friends lost their ugly dance with
return pointer.
LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
instead of magic numbers.
error_entry and save_paranoid now use SAVE_C_REGS + SAVE_EXTRA_REGS
instead of having it open-coded yet again.
Misleading and slightly wrong comments in "struct pt_regs" are fixed
(four instances).
Patch was run-tested: 64-bit executables, 32-bit executables,
strace works.
Timing tests did not show measurable difference in 32-bit
and 64-bit syscalls.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/ia32/ia32entry.S | 47 +++----
arch/x86/include/asm/calling.h | 217 ++++++++++++++++-----------------
arch/x86/include/asm/irqflags.h | 4 +-
arch/x86/include/asm/ptrace.h | 13 +-
arch/x86/include/uapi/asm/ptrace-abi.h | 16 ++-
arch/x86/include/uapi/asm/ptrace.h | 13 +-
arch/x86/kernel/entry_64.S | 191 +++++++++++------------------
arch/x86/kernel/preempt.S | 16 ++-
8 files changed, 249 insertions(+), 268 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 59c8d3a..d0ca63b 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -62,12 +62,12 @@
*/
.macro LOAD_ARGS32 offset, _r9=0
.if \_r9
- movl \offset+16(%rsp),%r9d
+ movl \offset+R9(%rsp),%r9d
.endif
- movl \offset+40(%rsp),%ecx
- movl \offset+48(%rsp),%edx
- movl \offset+56(%rsp),%esi
- movl \offset+64(%rsp),%edi
+ movl \offset+RCX(%rsp),%ecx
+ movl \offset+RDX(%rsp),%edx
+ movl \offset+RSI(%rsp),%esi
+ movl \offset+RDI(%rsp),%edi
movl %eax,%eax /* zero extension */
.endm
@@ -144,7 +144,8 @@ ENTRY(ia32_sysenter_target)
CFI_REL_OFFSET rip,0
pushq_cfi %rax
cld
- SAVE_ARGS 0,1,0
+ ALLOC_PTREGS_ON_STACK
+ SAVE_C_REGS_EXCEPT_R891011
/* no need to do an access_ok check here because rbp has been
32bit zero extended */
ASM_STAC
@@ -172,7 +173,8 @@ sysexit_from_sys_call:
andl $~0x200,EFLAGS-ARGOFFSET(%rsp)
movl RIP-ARGOFFSET(%rsp),%edx /* User %eip */
CFI_REGISTER rip,rdx
- RESTORE_ARGS 0,24,0,0,0,0
+ RESTORE_RSI_RDI
+ REMOVE_PTREGS_FROM_STACK 8*3
xorq %r8,%r8
xorq %r9,%r9
xorq %r10,%r10
@@ -240,13 +242,13 @@ sysenter_tracesys:
testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
jz sysenter_auditsys
#endif
- SAVE_REST
+ SAVE_EXTRA_REGS
CLEAR_RREGS
movq $-ENOSYS,RAX(%rsp)/* ptrace can change this for a bad syscall */
movq %rsp,%rdi /* &pt_regs -> arg1 */
call syscall_trace_enter
LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */
- RESTORE_REST
+ RESTORE_EXTRA_REGS
cmpq $(IA32_NR_syscalls-1),%rax
ja int_ret_from_sys_call /* sysenter_tracesys has set RAX(%rsp) */
jmp sysenter_do_call
@@ -288,7 +290,8 @@ ENTRY(ia32_cstar_target)
* disabled irqs and here we enable it straight after entry:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- SAVE_ARGS 8,0,0
+ ALLOC_PTREGS_ON_STACK 8
+ SAVE_C_REGS_EXCEPT_RCX_R891011
movl %eax,%eax /* zero extension */
movq %rax,ORIG_RAX-ARGOFFSET(%rsp)
movq %rcx,RIP-ARGOFFSET(%rsp)
@@ -325,7 +328,7 @@ cstar_dispatch:
jnz sysretl_audit
sysretl_from_sys_call:
andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
- RESTORE_ARGS 0,-ARG_SKIP,0,0,0
+ RESTORE_RSI_RDI_RDX
movl RIP-ARGOFFSET(%rsp),%ecx
CFI_REGISTER rip,rcx
movl EFLAGS-ARGOFFSET(%rsp),%r11d
@@ -356,13 +359,13 @@ cstar_tracesys:
jz cstar_auditsys
#endif
xchgl %r9d,%ebp
- SAVE_REST
+ SAVE_EXTRA_REGS
CLEAR_RREGS 0, r9
movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
movq %rsp,%rdi /* &pt_regs -> arg1 */
call syscall_trace_enter
LOAD_ARGS32 ARGOFFSET, 1 /* reload args from stack in case ptrace changed it */
- RESTORE_REST
+ RESTORE_EXTRA_REGS
xchgl %ebp,%r9d
cmpq $(IA32_NR_syscalls-1),%rax
ja int_ret_from_sys_call /* cstar_tracesys has set RAX(%rsp) */
@@ -417,7 +420,8 @@ ENTRY(ia32_syscall)
cld
/* note the registers are not zero extended to the sf.
this could be a problem. */
- SAVE_ARGS 0,1,0
+ ALLOC_PTREGS_ON_STACK
+ SAVE_C_REGS_EXCEPT_R891011
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
jnz ia32_tracesys
@@ -430,16 +434,16 @@ ia32_sysret:
movq %rax,RAX-ARGOFFSET(%rsp)
ia32_ret_from_sys_call:
CLEAR_RREGS -ARGOFFSET
- jmp int_ret_from_sys_call
+ jmp int_ret_from_sys_call
-ia32_tracesys:
- SAVE_REST
+ia32_tracesys:
+ SAVE_EXTRA_REGS
CLEAR_RREGS
movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
movq %rsp,%rdi /* &pt_regs -> arg1 */
call syscall_trace_enter
LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */
- RESTORE_REST
+ RESTORE_EXTRA_REGS
cmpq $(IA32_NR_syscalls-1),%rax
ja int_ret_from_sys_call /* ia32_tracesys has set RAX(%rsp) */
jmp ia32_do_call
@@ -475,7 +479,6 @@ GLOBAL(stub32_clone)
ALIGN
ia32_ptregs_common:
- popq %r11
CFI_ENDPROC
CFI_STARTPROC32 simple
CFI_SIGNAL_FRAME
@@ -490,9 +493,9 @@ ia32_ptregs_common:
/* CFI_REL_OFFSET rflags,EFLAGS-ARGOFFSET*/
CFI_REL_OFFSET rsp,RSP-ARGOFFSET
/* CFI_REL_OFFSET ss,SS-ARGOFFSET*/
- SAVE_REST
+ SAVE_EXTRA_REGS 8
call *%rax
- RESTORE_REST
- jmp ia32_sysret /* misbalances the return cache */
+ RESTORE_EXTRA_REGS 8
+ ret
CFI_ENDPROC
END(ia32_ptregs_common)
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index e176cea..7642948 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -52,142 +52,135 @@ For 32-bit we have the following conventions - kernel is built with
/*
* 64-bit system call stack frame layout defines and helpers,
- * for assembly code:
+ * for assembly code.
*/
-#define R15 0
-#define R14 8
-#define R13 16
-#define R12 24
-#define RBP 32
-#define RBX 40
-
-/* arguments: interrupts/non tracing syscalls only save up to here: */
-#define R11 48
-#define R10 56
-#define R9 64
-#define R8 72
-#define RAX 80
-#define RCX 88
-#define RDX 96
-#define RSI 104
-#define RDI 112
-#define ORIG_RAX 120 /* + error_code */
-/* end of arguments */
-
-/* cpu exception frame or undefined in case of fast syscall: */
-#define RIP 128
-#define CS 136
-#define EFLAGS 144
-#define RSP 152
-#define SS 160
-
-#define ARGOFFSET R11
-
- .macro SAVE_ARGS addskip=0, save_rcx=1, save_r891011=1
- subq $9*8+\addskip, %rsp
- CFI_ADJUST_CFA_OFFSET 9*8+\addskip
- movq_cfi rdi, 8*8
- movq_cfi rsi, 7*8
- movq_cfi rdx, 6*8
-
- .if \save_rcx
- movq_cfi rcx, 5*8
- .endif
-
- movq_cfi rax, 4*8
+/* The layout forms the "struct pt_regs" on the stack: */
+/*
+ * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
+ * unless syscall needs a complete, fully filled "struct pt_regs".
+ */
+#define R15 0*8
+#define R14 1*8
+#define R13 2*8
+#define R12 3*8
+#define RBP 4*8
+#define RBX 5*8
+/* These regs are callee-clobbered. Always saved on kernel entry. */
+#define R11 6*8
+#define R10 7*8
+#define R9 8*8
+#define R8 9*8
+#define RAX 10*8
+#define RCX 11*8
+#define RDX 12*8
+#define RSI 13*8
+#define RDI 14*8
+/*
+ * On syscall entry, this is syscall#. On CPU exception, this is error code.
+ * On hw interrupt, it's IRQ number:
+ */
+#define ORIG_RAX 15*8
+/* Return frame for iretq */
+#define RIP 16*8
+#define CS 17*8
+#define EFLAGS 18*8
+#define RSP 19*8
+#define SS 20*8
+
+#define ARGOFFSET 0
+
+ .macro ALLOC_PTREGS_ON_STACK addskip=0
+ subq $15*8+\addskip, %rsp
+ CFI_ADJUST_CFA_OFFSET 15*8+\addskip
+ .endm
- .if \save_r891011
- movq_cfi r8, 3*8
- movq_cfi r9, 2*8
- movq_cfi r10, 1*8
- movq_cfi r11, 0*8
+ .macro SAVE_C_REGS_HELPER offset=0 rcx=1 r8plus=1
+ movq_cfi rdi, 14*8+\offset
+ movq_cfi rsi, 13*8+\offset
+ movq_cfi rdx, 12*8+\offset
+ .if \rcx
+ movq_cfi rcx, 11*8+\offset
+ .endif
+ movq_cfi rax, 10*8+\offset
+ .if \r8plus
+ movq_cfi r8, 9*8+\offset
+ movq_cfi r9, 8*8+\offset
+ movq_cfi r10, 7*8+\offset
+ movq_cfi r11, 6*8+\offset
.endif
+ .endm
+ .macro SAVE_C_REGS offset=0
+ SAVE_C_REGS_HELPER \offset, 1, 1
+ .endm
+ .macro SAVE_C_REGS_EXCEPT_R891011
+ SAVE_C_REGS_HELPER 0, 1, 0
+ .endm
+ .macro SAVE_C_REGS_EXCEPT_RCX_R891011
+ SAVE_C_REGS_HELPER 0, 0, 0
+ .endm
+ .macro SAVE_EXTRA_REGS offset=0
+ movq_cfi rbx, 5*8+\offset
+ movq_cfi rbp, 4*8+\offset
+ movq_cfi r12, 3*8+\offset
+ movq_cfi r13, 2*8+\offset
+ movq_cfi r14, 1*8+\offset
+ movq_cfi r15, 0*8+\offset
+ .endm
+ .macro SAVE_EXTRA_REGS_RBP offset=0
+ movq_cfi rbp, 4*8+\offset
.endm
-#define ARG_SKIP (9*8)
+ .macro RESTORE_EXTRA_REGS offset=0
+ movq_cfi_restore 0*8+\offset, r15
+ movq_cfi_restore 1*8+\offset, r14
+ movq_cfi_restore 2*8+\offset, r13
+ movq_cfi_restore 3*8+\offset, r12
+ movq_cfi_restore 4*8+\offset, rbp
+ movq_cfi_restore 5*8+\offset, rbx
+ .endm
- .macro RESTORE_ARGS rstor_rax=1, addskip=0, rstor_rcx=1, rstor_r11=1, \
- rstor_r8910=1, rstor_rdx=1
+ .macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
.if \rstor_r11
- movq_cfi_restore 0*8, r11
+ movq_cfi_restore 6*8, r11
.endif
-
.if \rstor_r8910
- movq_cfi_restore 1*8, r10
- movq_cfi_restore 2*8, r9
- movq_cfi_restore 3*8, r8
+ movq_cfi_restore 7*8, r10
+ movq_cfi_restore 8*8, r9
+ movq_cfi_restore 9*8, r8
.endif
-
.if \rstor_rax
- movq_cfi_restore 4*8, rax
+ movq_cfi_restore 10*8, rax
.endif
-
.if \rstor_rcx
- movq_cfi_restore 5*8, rcx
+ movq_cfi_restore 11*8, rcx
.endif
-
.if \rstor_rdx
- movq_cfi_restore 6*8, rdx
- .endif
-
- movq_cfi_restore 7*8, rsi
- movq_cfi_restore 8*8, rdi
-
- .if ARG_SKIP+\addskip > 0
- addq $ARG_SKIP+\addskip, %rsp
- CFI_ADJUST_CFA_OFFSET -(ARG_SKIP+\addskip)
+ movq_cfi_restore 12*8, rdx
.endif
+ movq_cfi_restore 13*8, rsi
+ movq_cfi_restore 14*8, rdi
.endm
-
- .macro LOAD_ARGS offset, skiprax=0
- movq \offset(%rsp), %r11
- movq \offset+8(%rsp), %r10
- movq \offset+16(%rsp), %r9
- movq \offset+24(%rsp), %r8
- movq \offset+40(%rsp), %rcx
- movq \offset+48(%rsp), %rdx
- movq \offset+56(%rsp), %rsi
- movq \offset+64(%rsp), %rdi
- .if \skiprax
- .else
- movq \offset+72(%rsp), %rax
- .endif
+ .macro RESTORE_C_REGS
+ RESTORE_C_REGS_HELPER 1,1,1,1,1
.endm
-
-#define REST_SKIP (6*8)
-
- .macro SAVE_REST
- subq $REST_SKIP, %rsp
- CFI_ADJUST_CFA_OFFSET REST_SKIP
- movq_cfi rbx, 5*8
- movq_cfi rbp, 4*8
- movq_cfi r12, 3*8
- movq_cfi r13, 2*8
- movq_cfi r14, 1*8
- movq_cfi r15, 0*8
+ .macro RESTORE_C_REGS_EXCEPT_RAX
+ RESTORE_C_REGS_HELPER 0,1,1,1,1
.endm
-
- .macro RESTORE_REST
- movq_cfi_restore 0*8, r15
- movq_cfi_restore 1*8, r14
- movq_cfi_restore 2*8, r13
- movq_cfi_restore 3*8, r12
- movq_cfi_restore 4*8, rbp
- movq_cfi_restore 5*8, rbx
- addq $REST_SKIP, %rsp
- CFI_ADJUST_CFA_OFFSET -(REST_SKIP)
+ .macro RESTORE_C_REGS_EXCEPT_RCX
+ RESTORE_C_REGS_HELPER 1,0,1,1,1
.endm
-
- .macro SAVE_ALL
- SAVE_ARGS
- SAVE_REST
+ .macro RESTORE_RSI_RDI
+ RESTORE_C_REGS_HELPER 0,0,0,0,0
+ .endm
+ .macro RESTORE_RSI_RDI_RDX
+ RESTORE_C_REGS_HELPER 0,0,0,0,1
.endm
- .macro RESTORE_ALL addskip=0
- RESTORE_REST
- RESTORE_ARGS 1, \addskip
+ .macro REMOVE_PTREGS_FROM_STACK addskip=0
+ addq $15*8+\addskip, %rsp
+ CFI_ADJUST_CFA_OFFSET -(15*8+\addskip)
.endm
.macro icebp
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index bba3cf8..6f98c16 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -171,9 +171,9 @@ static inline int arch_irqs_disabled(void)
#define ARCH_LOCKDEP_SYS_EXIT_IRQ \
TRACE_IRQS_ON; \
sti; \
- SAVE_REST; \
+ SAVE_EXTRA_REGS; \
LOCKDEP_SYS_EXIT; \
- RESTORE_REST; \
+ RESTORE_EXTRA_REGS; \
cli; \
TRACE_IRQS_OFF;
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 6205f0c..c822b35 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -31,13 +31,17 @@ struct pt_regs {
#else /* __i386__ */
struct pt_regs {
+/*
+ * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
+ * unless syscall needs a complete, fully filled "struct pt_regs".
+ */
unsigned long r15;
unsigned long r14;
unsigned long r13;
unsigned long r12;
unsigned long bp;
unsigned long bx;
-/* arguments: non interrupts/non tracing syscalls only save up to here*/
+/* These regs are callee-clobbered. Always saved on kernel entry. */
unsigned long r11;
unsigned long r10;
unsigned long r9;
@@ -47,9 +51,12 @@ struct pt_regs {
unsigned long dx;
unsigned long si;
unsigned long di;
+/*
+ * On syscall entry, this is syscall#. On CPU exception, this is error code.
+ * On hw interrupt, it's IRQ number:
+ */
unsigned long orig_ax;
-/* end of arguments */
-/* cpu exception frame or undefined */
+/* Return frame for iretq */
unsigned long ip;
unsigned long cs;
unsigned long flags;
diff --git a/arch/x86/include/uapi/asm/ptrace-abi.h b/arch/x86/include/uapi/asm/ptrace-abi.h
index 7b0a55a..580aee3 100644
--- a/arch/x86/include/uapi/asm/ptrace-abi.h
+++ b/arch/x86/include/uapi/asm/ptrace-abi.h
@@ -25,13 +25,17 @@
#else /* __i386__ */
#if defined(__ASSEMBLY__) || defined(__FRAME_OFFSETS)
+/*
+ * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
+ * unless syscall needs a complete, fully filled "struct pt_regs".
+ */
#define R15 0
#define R14 8
#define R13 16
#define R12 24
#define RBP 32
#define RBX 40
-/* arguments: interrupts/non tracing syscalls only save up to here*/
+/* These regs are callee-clobbered. Always saved on kernel entry. */
#define R11 48
#define R10 56
#define R9 64
@@ -41,15 +45,17 @@
#define RDX 96
#define RSI 104
#define RDI 112
-#define ORIG_RAX 120 /* = ERROR */
-/* end of arguments */
-/* cpu exception frame or undefined in case of fast syscall. */
+/*
+ * On syscall entry, this is syscall#. On CPU exception, this is error code.
+ * On hw interrupt, it's IRQ number:
+ */
+#define ORIG_RAX 120
+/* Return frame for iretq */
#define RIP 128
#define CS 136
#define EFLAGS 144
#define RSP 152
#define SS 160
-#define ARGOFFSET R11
#endif /* __ASSEMBLY__ */
/* top of stack page */
diff --git a/arch/x86/include/uapi/asm/ptrace.h b/arch/x86/include/uapi/asm/ptrace.h
index ac4b9aa..bc16115 100644
--- a/arch/x86/include/uapi/asm/ptrace.h
+++ b/arch/x86/include/uapi/asm/ptrace.h
@@ -41,13 +41,17 @@ struct pt_regs {
#ifndef __KERNEL__
struct pt_regs {
+/*
+ * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
+ * unless syscall needs a complete, fully filled "struct pt_regs".
+ */
unsigned long r15;
unsigned long r14;
unsigned long r13;
unsigned long r12;
unsigned long rbp;
unsigned long rbx;
-/* arguments: non interrupts/non tracing syscalls only save up to here*/
+/* These regs are callee-clobbered. Always saved on kernel entry. */
unsigned long r11;
unsigned long r10;
unsigned long r9;
@@ -57,9 +61,12 @@ struct pt_regs {
unsigned long rdx;
unsigned long rsi;
unsigned long rdi;
+/*
+ * On syscall entry, this is syscall#. On CPU exception, this is error code.
+ * On hw interrupt, it's IRQ number:
+ */
unsigned long orig_rax;
-/* end of arguments */
-/* cpu exception frame or undefined */
+/* Return frame for iretq */
unsigned long rip;
unsigned long cs;
unsigned long eflags;
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 37f7d95..c489a2d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -14,9 +14,6 @@
* NOTE: This code handles signal-recognition, which happens every time
* after an interrupt and after each system call.
*
- * Normal syscalls and interrupts don't save a full stack frame, this is
- * only done for syscall tracing, signals or fork/exec et.al.
- *
* A note on terminology:
* - top of stack: Architecture defined interrupt frame from SS to RIP
* at the top of the kernel process stack.
@@ -26,12 +23,6 @@
* Some macro usage:
* - CFI macros are used to generate dwarf2 unwind information for better
* backtraces. They don't change any code.
- * - SAVE_ALL/RESTORE_ALL - Save/restore all registers
- * - SAVE_ARGS/RESTORE_ARGS - Save/restore registers that C functions modify.
- * There are unfortunately lots of special cases where some registers
- * not touched. The macro is a big mess that should be cleaned up.
- * - SAVE_REST/RESTORE_REST - Handle the registers not saved by SAVE_ARGS.
- * Gives a full stack frame.
* - ENTRY/END Define functions in the symbol table.
* - FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK - Fix up the hardware stack
* frame that is otherwise undefined after a SYSCALL
@@ -156,7 +147,7 @@ ENDPROC(native_usergs_sysret64)
.endm
/*
- * initial frame state for interrupts (and exceptions without error code)
+ * empty frame
*/
.macro EMPTY_FRAME start=1 offset=0
.if \start
@@ -190,9 +181,9 @@ ENDPROC(native_usergs_sysret64)
.endm
/*
- * frame that enables calling into C.
+ * frame that enables passing a complete pt_regs to a C function.
*/
- .macro PARTIAL_FRAME start=1 offset=0
+ .macro DEFAULT_FRAME start=1 offset=0
XCPT_FRAME \start, ORIG_RAX+\offset-ARGOFFSET
CFI_REL_OFFSET rdi, RDI+\offset-ARGOFFSET
CFI_REL_OFFSET rsi, RSI+\offset-ARGOFFSET
@@ -203,13 +194,6 @@ ENDPROC(native_usergs_sysret64)
CFI_REL_OFFSET r9, R9+\offset-ARGOFFSET
CFI_REL_OFFSET r10, R10+\offset-ARGOFFSET
CFI_REL_OFFSET r11, R11+\offset-ARGOFFSET
- .endm
-
-/*
- * frame that enables passing a complete pt_regs to a C function.
- */
- .macro DEFAULT_FRAME start=1 offset=0
- PARTIAL_FRAME \start, R11+\offset-R15
CFI_REL_OFFSET rbx, RBX+\offset
CFI_REL_OFFSET rbp, RBP+\offset
CFI_REL_OFFSET r12, R12+\offset
@@ -221,21 +205,8 @@ ENDPROC(native_usergs_sysret64)
ENTRY(save_paranoid)
XCPT_FRAME 1 RDI+8
cld
- movq_cfi rdi, RDI+8
- movq_cfi rsi, RSI+8
- movq_cfi rdx, RDX+8
- movq_cfi rcx, RCX+8
- movq_cfi rax, RAX+8
- movq_cfi r8, R8+8
- movq_cfi r9, R9+8
- movq_cfi r10, R10+8
- movq_cfi r11, R11+8
- movq_cfi rbx, RBX+8
- movq_cfi rbp, RBP+8
- movq_cfi r12, R12+8
- movq_cfi r13, R13+8
- movq_cfi r14, R14+8
- movq_cfi r15, R15+8
+ SAVE_C_REGS 8
+ SAVE_EXTRA_REGS 8
movl $1,%ebx
movl $MSR_GS_BASE,%ecx
rdmsr
@@ -264,7 +235,7 @@ ENTRY(ret_from_fork)
GET_THREAD_INFO(%rcx)
- RESTORE_REST
+ RESTORE_EXTRA_REGS
testl $3, CS-ARGOFFSET(%rsp) # from kernel_thread?
jz 1f
@@ -276,12 +247,10 @@ ENTRY(ret_from_fork)
jmp ret_from_sys_call # go to the SYSRET fastpath
1:
- subq $REST_SKIP, %rsp # leave space for volatiles
- CFI_ADJUST_CFA_OFFSET REST_SKIP
movq %rbp, %rdi
call *%rbx
movl $0, RAX(%rsp)
- RESTORE_REST
+ RESTORE_EXTRA_REGS
jmp int_ret_from_sys_call
CFI_ENDPROC
END(ret_from_fork)
@@ -339,7 +308,8 @@ GLOBAL(system_call_after_swapgs)
* and short:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- SAVE_ARGS 8,0
+ ALLOC_PTREGS_ON_STACK 8
+ SAVE_C_REGS
movq %rax,ORIG_RAX-ARGOFFSET(%rsp)
movq %rcx,RIP-ARGOFFSET(%rsp)
CFI_REL_OFFSET rip,RIP-ARGOFFSET
@@ -375,9 +345,9 @@ sysret_check:
* sysretq will re-enable interrupts:
*/
TRACE_IRQS_ON
+ RESTORE_C_REGS_EXCEPT_RCX
movq RIP-ARGOFFSET(%rsp),%rcx
CFI_REGISTER rip,rcx
- RESTORE_ARGS 1,-ARG_SKIP,0
/*CFI_REGISTER rflags,r11*/
movq PER_CPU_VAR(old_rsp), %rsp
USERGS_SYSRET64
@@ -429,7 +399,7 @@ auditsys:
movq %rax,%rsi /* 2nd arg: syscall number */
movl $AUDIT_ARCH_X86_64,%edi /* 1st arg: audit arch */
call __audit_syscall_entry
- LOAD_ARGS 0 /* reload call-clobbered registers */
+ RESTORE_C_REGS /* reload call-clobbered registers */
jmp system_call_fastpath
/*
@@ -453,7 +423,7 @@ tracesys:
testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
jz auditsys
#endif
- SAVE_REST
+ SAVE_EXTRA_REGS
movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
FIXUP_TOP_OF_STACK %rdi
movq %rsp,%rdi
@@ -463,8 +433,8 @@ tracesys:
* We don't reload %rax because syscall_trace_enter() returned
* the value it wants us to use in the table lookup.
*/
- LOAD_ARGS ARGOFFSET, 1
- RESTORE_REST
+ RESTORE_C_REGS_EXCEPT_RAX
+ RESTORE_EXTRA_REGS
#if __SYSCALL_MASK == ~0
cmpq $__NR_syscall_max,%rax
#else
@@ -515,7 +485,7 @@ int_very_careful:
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
int_check_syscall_exit_work:
- SAVE_REST
+ SAVE_EXTRA_REGS
/* Check for syscall exit trace */
testl $_TIF_WORK_SYSCALL_EXIT,%edx
jz int_signal
@@ -534,7 +504,7 @@ int_signal:
call do_notify_resume
1: movl $_TIF_WORK_MASK,%edi
int_restore_rest:
- RESTORE_REST
+ RESTORE_EXTRA_REGS
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check
@@ -544,15 +514,12 @@ END(system_call)
.macro FORK_LIKE func
ENTRY(stub_\func)
CFI_STARTPROC
- popq %r11 /* save return address */
- PARTIAL_FRAME 0
- SAVE_REST
- pushq %r11 /* put it back on stack */
+ DEFAULT_FRAME 0, 8 /* offset 8: return address */
+ SAVE_EXTRA_REGS 8
FIXUP_TOP_OF_STACK %r11, 8
- DEFAULT_FRAME 0 8 /* offset 8: return address */
call sys_\func
RESTORE_TOP_OF_STACK %r11, 8
- ret $REST_SKIP /* pop extended registers */
+ ret
CFI_ENDPROC
END(stub_\func)
.endm
@@ -560,7 +527,7 @@ END(stub_\func)
.macro FIXED_FRAME label,func
ENTRY(\label)
CFI_STARTPROC
- PARTIAL_FRAME 0 8 /* offset 8: return address */
+ DEFAULT_FRAME 0, 8 /* offset 8: return address */
FIXUP_TOP_OF_STACK %r11, 8-ARGOFFSET
call \func
RESTORE_TOP_OF_STACK %r11, 8-ARGOFFSET
@@ -577,12 +544,12 @@ END(\label)
ENTRY(stub_execve)
CFI_STARTPROC
addq $8, %rsp
- PARTIAL_FRAME 0
- SAVE_REST
+ DEFAULT_FRAME 0
+ SAVE_EXTRA_REGS
FIXUP_TOP_OF_STACK %r11
call sys_execve
movq %rax,RAX(%rsp)
- RESTORE_REST
+ RESTORE_EXTRA_REGS
jmp int_ret_from_sys_call
CFI_ENDPROC
END(stub_execve)
@@ -594,12 +561,12 @@ END(stub_execve)
ENTRY(stub_rt_sigreturn)
CFI_STARTPROC
addq $8, %rsp
- PARTIAL_FRAME 0
- SAVE_REST
+ DEFAULT_FRAME 0
+ SAVE_EXTRA_REGS
FIXUP_TOP_OF_STACK %r11
call sys_rt_sigreturn
movq %rax,RAX(%rsp) # fixme, this could be done at the higher layer
- RESTORE_REST
+ RESTORE_EXTRA_REGS
jmp int_ret_from_sys_call
CFI_ENDPROC
END(stub_rt_sigreturn)
@@ -608,12 +575,12 @@ END(stub_rt_sigreturn)
ENTRY(stub_x32_rt_sigreturn)
CFI_STARTPROC
addq $8, %rsp
- PARTIAL_FRAME 0
- SAVE_REST
+ DEFAULT_FRAME 0
+ SAVE_EXTRA_REGS
FIXUP_TOP_OF_STACK %r11
call sys32_x32_rt_sigreturn
movq %rax,RAX(%rsp) # fixme, this could be done at the higher layer
- RESTORE_REST
+ RESTORE_EXTRA_REGS
jmp int_ret_from_sys_call
CFI_ENDPROC
END(stub_x32_rt_sigreturn)
@@ -621,13 +588,13 @@ END(stub_x32_rt_sigreturn)
ENTRY(stub_x32_execve)
CFI_STARTPROC
addq $8, %rsp
- PARTIAL_FRAME 0
- SAVE_REST
+ DEFAULT_FRAME 0
+ SAVE_EXTRA_REGS
FIXUP_TOP_OF_STACK %r11
call compat_sys_execve
RESTORE_TOP_OF_STACK %r11
movq %rax,RAX(%rsp)
- RESTORE_REST
+ RESTORE_EXTRA_REGS
jmp int_ret_from_sys_call
CFI_ENDPROC
END(stub_x32_execve)
@@ -683,42 +650,36 @@ END(interrupt)
/* 0(%rsp): ~(interrupt number) */
.macro interrupt func
- /* reserve pt_regs for scratch regs and rbp */
- subq $ORIG_RAX-RBP, %rsp
- CFI_ADJUST_CFA_OFFSET ORIG_RAX-RBP
cld
- /* start from rbp in pt_regs and jump over */
- movq_cfi rdi, (RDI-RBP)
- movq_cfi rsi, (RSI-RBP)
- movq_cfi rdx, (RDX-RBP)
- movq_cfi rcx, (RCX-RBP)
- movq_cfi rax, (RAX-RBP)
- movq_cfi r8, (R8-RBP)
- movq_cfi r9, (R9-RBP)
- movq_cfi r10, (R10-RBP)
- movq_cfi r11, (R11-RBP)
-
- /* Save rbp so that we can unwind from get_irq_regs() */
- movq_cfi rbp, 0
-
- /* Save previous stack value */
- movq %rsp, %rsi
+ /*
+ * Since nothing in interrupt handling code touches r12...r15 members
+ * of "struct pt_regs", and since interrupts can nest, we can save
+ * four stack slots and simultaneously provide
+ * an unwind-friendly stack layout by saving "truncated" pt_regs
+ * exactly up to rbp slot, without these members.
+ */
+ ALLOC_PTREGS_ON_STACK -RBP
+ SAVE_C_REGS -RBP
+ /* this goes to 0(%rsp) for unwinder, not for saving the value: */
+ SAVE_EXTRA_REGS_RBP -RBP
- leaq -RBP(%rsp),%rdi /* arg1 for handler */
- testl $3, CS-RBP(%rsi)
+ leaq -RBP(%rsp),%rdi /* arg1 for \func (pointer to pt_regs) */
+
+ testl $3, CS-RBP(%rsp)
je 1f
SWAPGS
+1:
/*
+ * Save previous stack pointer, optionally switch to interrupt stack.
* irq_count is used to check if a CPU is already on an interrupt stack
* or not. While this is essentially redundant with preempt_count it is
* a little cheaper to use a separate counter in the PDA (short of
* moving irq_enter into assembly, which would be too much work)
*/
-1: incl PER_CPU_VAR(irq_count)
+ movq %rsp, %rsi
+ incl PER_CPU_VAR(irq_count)
cmovzq PER_CPU_VAR(irq_stack_ptr),%rsp
CFI_DEF_CFA_REGISTER rsi
-
- /* Store previous stack value */
pushq %rsi
CFI_ESCAPE 0x0f /* DW_CFA_def_cfa_expression */, 6, \
0x77 /* DW_OP_breg7 */, 0, \
@@ -750,6 +711,7 @@ ret_from_intr:
/* Restore saved previous stack */
popq %rsi
CFI_DEF_CFA rsi,SS+8-RBP /* reg/off reset after def_cfa_expr */
+ /* return code expects complete pt_regs - adjust rsp accordingly: */
leaq ARGOFFSET-RBP(%rsi), %rsp
CFI_DEF_CFA_REGISTER rsp
CFI_ADJUST_CFA_OFFSET RBP-ARGOFFSET
@@ -761,7 +723,7 @@ exit_intr:
/* Interrupt came from user space */
/*
- * Has a correct top of stack, but a partial stack frame
+ * Has a correct top of stack.
* %rcx: thread info. Interrupts off.
*/
retint_with_reschedule:
@@ -789,7 +751,8 @@ retint_restore_args: /* return to kernel space */
*/
TRACE_IRQS_IRETQ
restore_args:
- RESTORE_ARGS 1,8,1
+ RESTORE_C_REGS
+ REMOVE_PTREGS_FROM_STACK 8
irq_return:
/*
@@ -876,12 +839,12 @@ retint_signal:
jz retint_swapgs
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
- SAVE_REST
+ SAVE_EXTRA_REGS
movq $-1,ORIG_RAX(%rsp)
xorl %esi,%esi # oldset
movq %rsp,%rdi # &pt_regs
call do_notify_resume
- RESTORE_REST
+ RESTORE_EXTRA_REGS
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
GET_THREAD_INFO(%rcx)
@@ -1044,8 +1007,7 @@ ENTRY(\sym)
pushq_cfi $-1 /* ORIG_RAX: no syscall to restart */
.endif
- subq $ORIG_RAX-R15, %rsp
- CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
+ ALLOC_PTREGS_ON_STACK
.if \paranoid
call save_paranoid
@@ -1256,7 +1218,9 @@ ENTRY(xen_failsafe_callback)
addq $0x30,%rsp
CFI_ADJUST_CFA_OFFSET -0x30
pushq_cfi $-1 /* orig_ax = -1 => not a system call */
- SAVE_ALL
+ ALLOC_PTREGS_ON_STACK
+ SAVE_C_REGS
+ SAVE_EXTRA_REGS
jmp error_exit
CFI_ENDPROC
END(xen_failsafe_callback)
@@ -1313,11 +1277,15 @@ ENTRY(paranoid_exit)
paranoid_swapgs:
TRACE_IRQS_IRETQ 0
SWAPGS_UNSAFE_STACK
- RESTORE_ALL 8
+ RESTORE_EXTRA_REGS
+ RESTORE_C_REGS
+ REMOVE_PTREGS_FROM_STACK 8
jmp irq_return
paranoid_restore:
TRACE_IRQS_IRETQ_DEBUG 0
- RESTORE_ALL 8
+ RESTORE_EXTRA_REGS
+ RESTORE_C_REGS
+ REMOVE_PTREGS_FROM_STACK 8
jmp irq_return
paranoid_userspace:
GET_THREAD_INFO(%rcx)
@@ -1357,21 +1325,8 @@ ENTRY(error_entry)
CFI_ADJUST_CFA_OFFSET 15*8
/* oldrax contains error code */
cld
- movq_cfi rdi, RDI+8
- movq_cfi rsi, RSI+8
- movq_cfi rdx, RDX+8
- movq_cfi rcx, RCX+8
- movq_cfi rax, RAX+8
- movq_cfi r8, R8+8
- movq_cfi r9, R9+8
- movq_cfi r10, R10+8
- movq_cfi r11, R11+8
- movq_cfi rbx, RBX+8
- movq_cfi rbp, RBP+8
- movq_cfi r12, R12+8
- movq_cfi r13, R13+8
- movq_cfi r14, R14+8
- movq_cfi r15, R15+8
+ SAVE_C_REGS 8
+ SAVE_EXTRA_REGS 8
xorl %ebx,%ebx
testl $3,CS+8(%rsp)
je error_kernelspace
@@ -1412,7 +1367,7 @@ END(error_entry)
ENTRY(error_exit)
DEFAULT_FRAME
movl %ebx,%eax
- RESTORE_REST
+ RESTORE_EXTRA_REGS
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
GET_THREAD_INFO(%rcx)
@@ -1631,8 +1586,8 @@ end_repeat_nmi:
* so that we repeat another NMI.
*/
pushq_cfi $-1 /* ORIG_RAX: no syscall to restart */
- subq $ORIG_RAX-R15, %rsp
- CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
+ ALLOC_PTREGS_ON_STACK
+
/*
* Use save_paranoid to handle SWAPGS, but no need to use paranoid_exit
* as we should not be calling schedule in NMI context.
@@ -1671,8 +1626,10 @@ end_repeat_nmi:
nmi_swapgs:
SWAPGS_UNSAFE_STACK
nmi_restore:
+ RESTORE_EXTRA_REGS
+ RESTORE_C_REGS
/* Pop the extra iret frame at once */
- RESTORE_ALL 6*8
+ REMOVE_PTREGS_FROM_STACK 6*8
/* Clear the NMI executing stack variable */
movq $0, 5*8(%rsp)
diff --git a/arch/x86/kernel/preempt.S b/arch/x86/kernel/preempt.S
index ca7f0d5..673da2f 100644
--- a/arch/x86/kernel/preempt.S
+++ b/arch/x86/kernel/preempt.S
@@ -6,9 +6,13 @@
ENTRY(___preempt_schedule)
CFI_STARTPROC
- SAVE_ALL
+ ALLOC_PTREGS_ON_STACK
+ SAVE_C_REGS
+ SAVE_EXTRA_REGS
call preempt_schedule
- RESTORE_ALL
+ RESTORE_EXTRA_REGS
+ RESTORE_C_REGS
+ REMOVE_PTREGS_FROM_STACK
ret
CFI_ENDPROC
@@ -16,9 +20,13 @@ ENTRY(___preempt_schedule)
ENTRY(___preempt_schedule_context)
CFI_STARTPROC
- SAVE_ALL
+ ALLOC_PTREGS_ON_STACK
+ SAVE_C_REGS
+ SAVE_EXTRA_REGS
call preempt_schedule_context
- RESTORE_ALL
+ RESTORE_EXTRA_REGS
+ RESTORE_C_REGS
+ REMOVE_PTREGS_FROM_STACK
ret
CFI_ENDPROC
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 06/17] x86: mass removal of ARGOFFSET
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (4 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 05/17] x86: entry_64.S: always allocate complete "struct pt_regs" Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 07/17] x86: rename some macros and labels, no code changes Denys Vlasenko
` (11 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
ARGOFFSET is zero now, removing it changes no code.
A few macros lost "offset" parameter, since it is always zero now too.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/ia32/ia32entry.S | 140 ++++++++++++++++++++---------------------
arch/x86/include/asm/calling.h | 2 -
arch/x86/kernel/entry_64.S | 66 +++++++++----------
3 files changed, 103 insertions(+), 105 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index d0ca63b..20eba33 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -41,13 +41,13 @@
movl %edx,%edx /* zero extension */
.endm
- /* clobbers %eax */
- .macro CLEAR_RREGS offset=0, _r9=rax
+ /* clobbers %rax */
+ .macro CLEAR_RREGS _r9=rax
xorl %eax,%eax
- movq %rax,\offset+R11(%rsp)
- movq %rax,\offset+R10(%rsp)
- movq %\_r9,\offset+R9(%rsp)
- movq %rax,\offset+R8(%rsp)
+ movq %rax,R11(%rsp)
+ movq %rax,R10(%rsp)
+ movq %\_r9,R9(%rsp)
+ movq %rax,R8(%rsp)
.endm
/*
@@ -60,14 +60,14 @@
* If it's -1 to make us punt the syscall, then (u32)-1 is still
* an appropriately invalid value.
*/
- .macro LOAD_ARGS32 offset, _r9=0
+ .macro LOAD_ARGS32 _r9=0
.if \_r9
- movl \offset+R9(%rsp),%r9d
+ movl R9(%rsp),%r9d
.endif
- movl \offset+RCX(%rsp),%ecx
- movl \offset+RDX(%rsp),%edx
- movl \offset+RSI(%rsp),%esi
- movl \offset+RDI(%rsp),%edi
+ movl RCX(%rsp),%ecx
+ movl RDX(%rsp),%edx
+ movl RSI(%rsp),%esi
+ movl RDI(%rsp),%edi
movl %eax,%eax /* zero extension */
.endm
@@ -152,8 +152,8 @@ ENTRY(ia32_sysenter_target)
1: movl (%rbp),%ebp
_ASM_EXTABLE(1b,ia32_badarg)
ASM_CLAC
- orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
CFI_REMEMBER_STATE
jnz sysenter_tracesys
cmpq $(IA32_NR_syscalls-1),%rax
@@ -162,16 +162,16 @@ sysenter_do_call:
IA32_ARG_FIXUP
sysenter_dispatch:
call *ia32_sys_call_table(,%rax,8)
- movq %rax,RAX-ARGOFFSET(%rsp)
+ movq %rax,RAX(%rsp)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
jnz sysexit_audit
sysexit_from_sys_call:
- andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
/* clear IF, that popfq doesn't enable interrupts early */
- andl $~0x200,EFLAGS-ARGOFFSET(%rsp)
- movl RIP-ARGOFFSET(%rsp),%edx /* User %eip */
+ andl $~0x200,EFLAGS(%rsp)
+ movl RIP(%rsp),%edx /* User %eip */
CFI_REGISTER rip,rdx
RESTORE_RSI_RDI
REMOVE_PTREGS_FROM_STACK 8*3
@@ -195,18 +195,18 @@ sysexit_from_sys_call:
movl %eax,%esi /* 2nd arg: syscall number */
movl $AUDIT_ARCH_I386,%edi /* 1st arg: audit arch */
call __audit_syscall_entry
- movl RAX-ARGOFFSET(%rsp),%eax /* reload syscall number */
+ movl RAX(%rsp),%eax /* reload syscall number */
cmpq $(IA32_NR_syscalls-1),%rax
ja ia32_badsys
movl %ebx,%edi /* reload 1st syscall arg */
- movl RCX-ARGOFFSET(%rsp),%esi /* reload 2nd syscall arg */
- movl RDX-ARGOFFSET(%rsp),%edx /* reload 3rd syscall arg */
- movl RSI-ARGOFFSET(%rsp),%ecx /* reload 4th syscall arg */
- movl RDI-ARGOFFSET(%rsp),%r8d /* reload 5th syscall arg */
+ movl RCX(%rsp),%esi /* reload 2nd syscall arg */
+ movl RDX(%rsp),%edx /* reload 3rd syscall arg */
+ movl RSI(%rsp),%ecx /* reload 4th syscall arg */
+ movl RDI(%rsp),%r8d /* reload 5th syscall arg */
.endm
.macro auditsys_exit exit
- testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
jnz ia32_ret_from_sys_call
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
@@ -217,13 +217,13 @@ sysexit_from_sys_call:
1: setbe %al /* 1 if error, 0 if not */
movzbl %al,%edi /* zero-extend that into %edi */
call __audit_syscall_exit
- movq RAX-ARGOFFSET(%rsp),%rax /* reload syscall return value */
+ movq RAX(%rsp),%rax /* reload syscall return value */
movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),%edi
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl %edi,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ testl %edi,TI_flags+THREAD_INFO(%rsp,RIP)
jz \exit
- CLEAR_RREGS -ARGOFFSET
+ CLEAR_RREGS
jmp int_with_check
.endm
@@ -239,7 +239,7 @@ sysexit_audit:
sysenter_tracesys:
#ifdef CONFIG_AUDITSYSCALL
- testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
jz sysenter_auditsys
#endif
SAVE_EXTRA_REGS
@@ -247,7 +247,7 @@ sysenter_tracesys:
movq $-ENOSYS,RAX(%rsp)/* ptrace can change this for a bad syscall */
movq %rsp,%rdi /* &pt_regs -> arg1 */
call syscall_trace_enter
- LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */
+ LOAD_ARGS32 /* reload args from stack in case ptrace changed it */
RESTORE_EXTRA_REGS
cmpq $(IA32_NR_syscalls-1),%rax
ja int_ret_from_sys_call /* sysenter_tracesys has set RAX(%rsp) */
@@ -293,17 +293,17 @@ ENTRY(ia32_cstar_target)
ALLOC_PTREGS_ON_STACK 8
SAVE_C_REGS_EXCEPT_RCX_R891011
movl %eax,%eax /* zero extension */
- movq %rax,ORIG_RAX-ARGOFFSET(%rsp)
- movq %rcx,RIP-ARGOFFSET(%rsp)
- CFI_REL_OFFSET rip,RIP-ARGOFFSET
- movq %rbp,RCX-ARGOFFSET(%rsp) /* this lies slightly to ptrace */
+ movq %rax,ORIG_RAX(%rsp)
+ movq %rcx,RIP(%rsp)
+ CFI_REL_OFFSET rip,RIP
+ movq %rbp,RCX(%rsp) /* this lies slightly to ptrace */
movl %ebp,%ecx
- movq $__USER32_CS,CS-ARGOFFSET(%rsp)
- movq $__USER32_DS,SS-ARGOFFSET(%rsp)
- movq %r11,EFLAGS-ARGOFFSET(%rsp)
- /*CFI_REL_OFFSET rflags,EFLAGS-ARGOFFSET*/
- movq %r8,RSP-ARGOFFSET(%rsp)
- CFI_REL_OFFSET rsp,RSP-ARGOFFSET
+ movq $__USER32_CS,CS(%rsp)
+ movq $__USER32_DS,SS(%rsp)
+ movq %r11,EFLAGS(%rsp)
+ /*CFI_REL_OFFSET rflags,EFLAGS*/
+ movq %r8,RSP(%rsp)
+ CFI_REL_OFFSET rsp,RSP
/* no need to do an access_ok check here because r8 has been
32bit zero extended */
/* hardware stack frame is complete now */
@@ -311,8 +311,8 @@ ENTRY(ia32_cstar_target)
1: movl (%r8),%r9d
_ASM_EXTABLE(1b,ia32_badarg)
ASM_CLAC
- orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
CFI_REMEMBER_STATE
jnz cstar_tracesys
cmpq $IA32_NR_syscalls-1,%rax
@@ -321,32 +321,32 @@ cstar_do_call:
IA32_ARG_FIXUP 1
cstar_dispatch:
call *ia32_sys_call_table(,%rax,8)
- movq %rax,RAX-ARGOFFSET(%rsp)
+ movq %rax,RAX(%rsp)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
jnz sysretl_audit
sysretl_from_sys_call:
- andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
RESTORE_RSI_RDI_RDX
- movl RIP-ARGOFFSET(%rsp),%ecx
+ movl RIP(%rsp),%ecx
CFI_REGISTER rip,rcx
- movl EFLAGS-ARGOFFSET(%rsp),%r11d
+ movl EFLAGS(%rsp),%r11d
/*CFI_REGISTER rflags,r11*/
xorq %r10,%r10
xorq %r9,%r9
xorq %r8,%r8
TRACE_IRQS_ON
- movl RSP-ARGOFFSET(%rsp),%esp
+ movl RSP(%rsp),%esp
CFI_RESTORE rsp
USERGS_SYSRET32
#ifdef CONFIG_AUDITSYSCALL
cstar_auditsys:
CFI_RESTORE_STATE
- movl %r9d,R9-ARGOFFSET(%rsp) /* register to be clobbered by call */
+ movl %r9d,R9(%rsp) /* register to be clobbered by call */
auditsys_entry_common
- movl R9-ARGOFFSET(%rsp),%r9d /* reload 6th syscall arg */
+ movl R9(%rsp),%r9d /* reload 6th syscall arg */
jmp cstar_dispatch
sysretl_audit:
@@ -355,16 +355,16 @@ sysretl_audit:
cstar_tracesys:
#ifdef CONFIG_AUDITSYSCALL
- testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
jz cstar_auditsys
#endif
xchgl %r9d,%ebp
SAVE_EXTRA_REGS
- CLEAR_RREGS 0, r9
+ CLEAR_RREGS r9
movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
movq %rsp,%rdi /* &pt_regs -> arg1 */
call syscall_trace_enter
- LOAD_ARGS32 ARGOFFSET, 1 /* reload args from stack in case ptrace changed it */
+ LOAD_ARGS32 1 /* reload args from stack in case ptrace changed it */
RESTORE_EXTRA_REGS
xchgl %ebp,%r9d
cmpq $(IA32_NR_syscalls-1),%rax
@@ -422,8 +422,8 @@ ENTRY(ia32_syscall)
this could be a problem. */
ALLOC_PTREGS_ON_STACK
SAVE_C_REGS_EXCEPT_R891011
- orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
jnz ia32_tracesys
cmpq $(IA32_NR_syscalls-1),%rax
ja ia32_badsys
@@ -431,9 +431,9 @@ ia32_do_call:
IA32_ARG_FIXUP
call *ia32_sys_call_table(,%rax,8) # xxx: rip relative
ia32_sysret:
- movq %rax,RAX-ARGOFFSET(%rsp)
+ movq %rax,RAX(%rsp)
ia32_ret_from_sys_call:
- CLEAR_RREGS -ARGOFFSET
+ CLEAR_RREGS
jmp int_ret_from_sys_call
ia32_tracesys:
@@ -442,7 +442,7 @@ ia32_tracesys:
movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
movq %rsp,%rdi /* &pt_regs -> arg1 */
call syscall_trace_enter
- LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */
+ LOAD_ARGS32 /* reload args from stack in case ptrace changed it */
RESTORE_EXTRA_REGS
cmpq $(IA32_NR_syscalls-1),%rax
ja int_ret_from_sys_call /* ia32_tracesys has set RAX(%rsp) */
@@ -450,7 +450,7 @@ ia32_tracesys:
END(ia32_syscall)
ia32_badsys:
- movq $0,ORIG_RAX-ARGOFFSET(%rsp)
+ movq $0,ORIG_RAX(%rsp)
movq $-ENOSYS,%rax
jmp ia32_sysret
@@ -482,17 +482,17 @@ ia32_ptregs_common:
CFI_ENDPROC
CFI_STARTPROC32 simple
CFI_SIGNAL_FRAME
- CFI_DEF_CFA rsp,SS+8-ARGOFFSET
- CFI_REL_OFFSET rax,RAX-ARGOFFSET
- CFI_REL_OFFSET rcx,RCX-ARGOFFSET
- CFI_REL_OFFSET rdx,RDX-ARGOFFSET
- CFI_REL_OFFSET rsi,RSI-ARGOFFSET
- CFI_REL_OFFSET rdi,RDI-ARGOFFSET
- CFI_REL_OFFSET rip,RIP-ARGOFFSET
-/* CFI_REL_OFFSET cs,CS-ARGOFFSET*/
-/* CFI_REL_OFFSET rflags,EFLAGS-ARGOFFSET*/
- CFI_REL_OFFSET rsp,RSP-ARGOFFSET
-/* CFI_REL_OFFSET ss,SS-ARGOFFSET*/
+ CFI_DEF_CFA rsp,SS+8
+ CFI_REL_OFFSET rax,RAX
+ CFI_REL_OFFSET rcx,RCX
+ CFI_REL_OFFSET rdx,RDX
+ CFI_REL_OFFSET rsi,RSI
+ CFI_REL_OFFSET rdi,RDI
+ CFI_REL_OFFSET rip,RIP
+/* CFI_REL_OFFSET cs,CS*/
+/* CFI_REL_OFFSET rflags,EFLAGS*/
+ CFI_REL_OFFSET rsp,RSP
+/* CFI_REL_OFFSET ss,SS*/
SAVE_EXTRA_REGS 8
call *%rax
RESTORE_EXTRA_REGS 8
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index 7642948..e8e2e41 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -88,8 +88,6 @@ For 32-bit we have the following conventions - kernel is built with
#define RSP 19*8
#define SS 20*8
-#define ARGOFFSET 0
-
.macro ALLOC_PTREGS_ON_STACK addskip=0
subq $15*8+\addskip, %rsp
CFI_ADJUST_CFA_OFFSET 15*8+\addskip
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index c489a2d..8c6a01d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -73,7 +73,7 @@ ENDPROC(native_usergs_sysret64)
#endif /* CONFIG_PARAVIRT */
-.macro TRACE_IRQS_IRETQ offset=ARGOFFSET
+.macro TRACE_IRQS_IRETQ offset=0
#ifdef CONFIG_TRACE_IRQFLAGS
bt $9,EFLAGS-\offset(%rsp) /* interrupts off? */
jnc 1f
@@ -107,7 +107,7 @@ ENDPROC(native_usergs_sysret64)
call debug_stack_reset
.endm
-.macro TRACE_IRQS_IRETQ_DEBUG offset=ARGOFFSET
+.macro TRACE_IRQS_IRETQ_DEBUG offset=0
bt $9,EFLAGS-\offset(%rsp) /* interrupts off? */
jnc 1f
TRACE_IRQS_ON_DEBUG
@@ -184,16 +184,16 @@ ENDPROC(native_usergs_sysret64)
* frame that enables passing a complete pt_regs to a C function.
*/
.macro DEFAULT_FRAME start=1 offset=0
- XCPT_FRAME \start, ORIG_RAX+\offset-ARGOFFSET
- CFI_REL_OFFSET rdi, RDI+\offset-ARGOFFSET
- CFI_REL_OFFSET rsi, RSI+\offset-ARGOFFSET
- CFI_REL_OFFSET rdx, RDX+\offset-ARGOFFSET
- CFI_REL_OFFSET rcx, RCX+\offset-ARGOFFSET
- CFI_REL_OFFSET rax, RAX+\offset-ARGOFFSET
- CFI_REL_OFFSET r8, R8+\offset-ARGOFFSET
- CFI_REL_OFFSET r9, R9+\offset-ARGOFFSET
- CFI_REL_OFFSET r10, R10+\offset-ARGOFFSET
- CFI_REL_OFFSET r11, R11+\offset-ARGOFFSET
+ XCPT_FRAME \start, ORIG_RAX+\offset
+ CFI_REL_OFFSET rdi, RDI+\offset
+ CFI_REL_OFFSET rsi, RSI+\offset
+ CFI_REL_OFFSET rdx, RDX+\offset
+ CFI_REL_OFFSET rcx, RCX+\offset
+ CFI_REL_OFFSET rax, RAX+\offset
+ CFI_REL_OFFSET r8, R8+\offset
+ CFI_REL_OFFSET r9, R9+\offset
+ CFI_REL_OFFSET r10, R10+\offset
+ CFI_REL_OFFSET r11, R11+\offset
CFI_REL_OFFSET rbx, RBX+\offset
CFI_REL_OFFSET rbp, RBP+\offset
CFI_REL_OFFSET r12, R12+\offset
@@ -237,13 +237,13 @@ ENTRY(ret_from_fork)
RESTORE_EXTRA_REGS
- testl $3, CS-ARGOFFSET(%rsp) # from kernel_thread?
+ testl $3, CS(%rsp) # from kernel_thread?
jz 1f
testl $_TIF_IA32, TI_flags(%rcx) # 32-bit compat task needs IRET
jnz int_ret_from_sys_call
- RESTORE_TOP_OF_STACK %rdi, -ARGOFFSET
+ RESTORE_TOP_OF_STACK %rdi
jmp ret_from_sys_call # go to the SYSRET fastpath
1:
@@ -310,10 +310,10 @@ GLOBAL(system_call_after_swapgs)
ENABLE_INTERRUPTS(CLBR_NONE)
ALLOC_PTREGS_ON_STACK 8
SAVE_C_REGS
- movq %rax,ORIG_RAX-ARGOFFSET(%rsp)
- movq %rcx,RIP-ARGOFFSET(%rsp)
- CFI_REL_OFFSET rip,RIP-ARGOFFSET
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ movq %rax,ORIG_RAX(%rsp)
+ movq %rcx,RIP(%rsp)
+ CFI_REL_OFFSET rip,RIP
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
jnz tracesys
system_call_fastpath:
#if __SYSCALL_MASK == ~0
@@ -325,7 +325,7 @@ system_call_fastpath:
ja badsys
movq %r10,%rcx
call *sys_call_table(,%rax,8) # XXX: rip relative
- movq %rax,RAX-ARGOFFSET(%rsp)
+ movq %rax,RAX(%rsp)
/*
* Syscall return path ending with SYSRET (fast path)
* Has incomplete stack frame and undefined top of stack.
@@ -337,7 +337,7 @@ sysret_check:
LOCKDEP_SYS_EXIT
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- movl TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET),%edx
+ movl TI_flags+THREAD_INFO(%rsp,RIP),%edx
andl %edi,%edx
jnz sysret_careful
CFI_REMEMBER_STATE
@@ -346,7 +346,7 @@ sysret_check:
*/
TRACE_IRQS_ON
RESTORE_C_REGS_EXCEPT_RCX
- movq RIP-ARGOFFSET(%rsp),%rcx
+ movq RIP(%rsp),%rcx
CFI_REGISTER rip,rcx
/*CFI_REGISTER rflags,r11*/
movq PER_CPU_VAR(old_rsp), %rsp
@@ -378,11 +378,11 @@ sysret_signal:
* These all wind up with the iret return path anyway,
* so just join that path right now.
*/
- FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
+ FIXUP_TOP_OF_STACK %r11
jmp int_check_syscall_exit_work
badsys:
- movq $-ENOSYS,RAX-ARGOFFSET(%rsp)
+ movq $-ENOSYS,RAX(%rsp)
jmp ret_from_sys_call
#ifdef CONFIG_AUDITSYSCALL
@@ -408,7 +408,7 @@ auditsys:
* masked off.
*/
sysret_audit:
- movq RAX-ARGOFFSET(%rsp),%rsi /* second arg, syscall return value */
+ movq RAX(%rsp),%rsi /* second arg, syscall return value */
cmpq $-MAX_ERRNO,%rsi /* is it < -MAX_ERRNO? */
setbe %al /* 1 if so, 0 if not */
movzbl %al,%edi /* zero-extend that into %edi */
@@ -420,7 +420,7 @@ sysret_audit:
/* Do syscall tracing */
tracesys:
#ifdef CONFIG_AUDITSYSCALL
- testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+ testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
jz auditsys
#endif
SAVE_EXTRA_REGS
@@ -444,7 +444,7 @@ tracesys:
ja int_ret_from_sys_call /* RAX(%rsp) set to -ENOSYS above */
movq %r10,%rcx /* fixup for C */
call *sys_call_table(,%rax,8)
- movq %rax,RAX-ARGOFFSET(%rsp)
+ movq %rax,RAX(%rsp)
/* Use IRET because user could have changed frame */
/*
@@ -528,9 +528,9 @@ END(stub_\func)
ENTRY(\label)
CFI_STARTPROC
DEFAULT_FRAME 0, 8 /* offset 8: return address */
- FIXUP_TOP_OF_STACK %r11, 8-ARGOFFSET
+ FIXUP_TOP_OF_STACK %r11, 8
call \func
- RESTORE_TOP_OF_STACK %r11, 8-ARGOFFSET
+ RESTORE_TOP_OF_STACK %r11, 8
ret
CFI_ENDPROC
END(\label)
@@ -702,7 +702,7 @@ common_interrupt:
ASM_CLAC
addq $-0x80,(%rsp) /* Adjust vector to [-256,-1] range */
interrupt do_IRQ
- /* 0(%rsp): old_rsp-ARGOFFSET */
+ /* 0(%rsp): old_rsp */
ret_from_intr:
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
@@ -712,13 +712,13 @@ ret_from_intr:
popq %rsi
CFI_DEF_CFA rsi,SS+8-RBP /* reg/off reset after def_cfa_expr */
/* return code expects complete pt_regs - adjust rsp accordingly: */
- leaq ARGOFFSET-RBP(%rsi), %rsp
+ leaq -RBP(%rsi), %rsp
CFI_DEF_CFA_REGISTER rsp
- CFI_ADJUST_CFA_OFFSET RBP-ARGOFFSET
+ CFI_ADJUST_CFA_OFFSET RBP
exit_intr:
GET_THREAD_INFO(%rcx)
- testl $3,CS-ARGOFFSET(%rsp)
+ testl $3,CS(%rsp)
je retint_kernel
/* Interrupt came from user space */
@@ -856,7 +856,7 @@ retint_signal:
ENTRY(retint_kernel)
cmpl $0,PER_CPU_VAR(__preempt_count)
jnz retint_restore_args
- bt $9,EFLAGS-ARGOFFSET(%rsp) /* interrupts off? */
+ bt $9,EFLAGS(%rsp) /* interrupts off? */
jnc retint_restore_args
call preempt_schedule_irq
jmp exit_intr
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 07/17] x86: rename some macros and labels, no code changes
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (5 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 06/17] x86: mass removal of ARGOFFSET Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 08/17] x86: add comments about various syscall instructions, " Denys Vlasenko
` (10 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
Rename LOAD_ARGS32 to RESTORE_REGS32 to match other RESTORE_* macros.
The "ARGS" part was misleading anyway - we are restoring registers,
not arguments.
Similarly, rename [retint_]restore_args to [retint_]restore_c_regs:
at these labels, we restore registers clobbered by C ABI;
rename int_restore_rest to int_restore_extra_regs.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/ia32/ia32entry.S | 10 +++++-----
arch/x86/kernel/entry_64.S | 16 ++++++++--------
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 20eba33..4402bbe 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -51,7 +51,7 @@
.endm
/*
- * Reload arg registers from stack in case ptrace changed them.
+ * Reload registers from stack in case ptrace changed them.
* We don't reload %eax because syscall_trace_enter() returned
* the %rax value we should see. Instead, we just truncate that
* value to 32 bits again as we did on entry from user mode.
@@ -60,7 +60,7 @@
* If it's -1 to make us punt the syscall, then (u32)-1 is still
* an appropriately invalid value.
*/
- .macro LOAD_ARGS32 _r9=0
+ .macro RESTORE_REGS32 _r9=0
.if \_r9
movl R9(%rsp),%r9d
.endif
@@ -247,7 +247,7 @@ sysenter_tracesys:
movq $-ENOSYS,RAX(%rsp)/* ptrace can change this for a bad syscall */
movq %rsp,%rdi /* &pt_regs -> arg1 */
call syscall_trace_enter
- LOAD_ARGS32 /* reload args from stack in case ptrace changed it */
+ RESTORE_REGS32 /* reload regs from stack in case ptrace changed it */
RESTORE_EXTRA_REGS
cmpq $(IA32_NR_syscalls-1),%rax
ja int_ret_from_sys_call /* sysenter_tracesys has set RAX(%rsp) */
@@ -364,7 +364,7 @@ cstar_tracesys:
movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
movq %rsp,%rdi /* &pt_regs -> arg1 */
call syscall_trace_enter
- LOAD_ARGS32 1 /* reload args from stack in case ptrace changed it */
+ RESTORE_REGS32 1 /* reload regs from stack in case ptrace changed it */
RESTORE_EXTRA_REGS
xchgl %ebp,%r9d
cmpq $(IA32_NR_syscalls-1),%rax
@@ -442,7 +442,7 @@ ia32_tracesys:
movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
movq %rsp,%rdi /* &pt_regs -> arg1 */
call syscall_trace_enter
- LOAD_ARGS32 /* reload args from stack in case ptrace changed it */
+ RESTORE_REGS32 /* reload args from stack in case ptrace changed it */
RESTORE_EXTRA_REGS
cmpq $(IA32_NR_syscalls-1),%rax
ja int_ret_from_sys_call /* ia32_tracesys has set RAX(%rsp) */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 8c6a01d..e9bbe02 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -62,7 +62,7 @@
#ifndef CONFIG_PREEMPT
-#define retint_kernel retint_restore_args
+#define retint_kernel retint_restore_c_regs
#endif
#ifdef CONFIG_PARAVIRT
@@ -494,7 +494,7 @@ int_check_syscall_exit_work:
call syscall_trace_leave
popq_cfi %rdi
andl $~(_TIF_WORK_SYSCALL_EXIT|_TIF_SYSCALL_EMU),%edi
- jmp int_restore_rest
+ jmp int_restore_extra_regs
int_signal:
testl $_TIF_DO_NOTIFY_MASK,%edx
@@ -503,7 +503,7 @@ int_signal:
xorl %esi,%esi # oldset -> arg2
call do_notify_resume
1: movl $_TIF_WORK_MASK,%edi
-int_restore_rest:
+int_restore_extra_regs:
RESTORE_EXTRA_REGS
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
@@ -742,15 +742,15 @@ retint_swapgs: /* return to user-space */
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_IRETQ
SWAPGS
- jmp restore_args
+ jmp restore_c_regs
-retint_restore_args: /* return to kernel space */
+retint_restore_c_regs: /* return to kernel space */
DISABLE_INTERRUPTS(CLBR_ANY)
/*
* The iretq could re-enable interrupts:
*/
TRACE_IRQS_IRETQ
-restore_args:
+restore_c_regs:
RESTORE_C_REGS
REMOVE_PTREGS_FROM_STACK 8
@@ -855,9 +855,9 @@ retint_signal:
/* rcx: threadinfo. interrupts off. */
ENTRY(retint_kernel)
cmpl $0,PER_CPU_VAR(__preempt_count)
- jnz retint_restore_args
+ jnz retint_restore_c_regs
bt $9,EFLAGS(%rsp) /* interrupts off? */
- jnc retint_restore_args
+ jnc retint_restore_c_regs
call preempt_schedule_irq
jmp exit_intr
#endif
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 08/17] x86: add comments about various syscall instructions, no code changes
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (6 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 07/17] x86: rename some macros and labels, no code changes Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 09/17] x86: entry_64.S: move save_paranoid and ret_from_fork closer to their users Denys Vlasenko
` (9 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
SYSCALL/SYSRET and SYSENTER/SYSEXIT have weird semantics.
Moreover, they differ in 32- and 64-bit mode.
What is saved? What is not? Is rsp set? Are interrupts disabled?
People tend to not remember these details well enough.
This patch adds comments which explain in detail
what registers are modified by each of these instructions.
The comments are placed immediately before corresponding
entry and exit points.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/ia32/ia32entry.S | 133 ++++++++++++++++++++++++++++-----------------
arch/x86/kernel/entry_64.S | 32 ++++++-----
2 files changed, 102 insertions(+), 63 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4402bbe..fe6eaf7 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -99,22 +99,25 @@ ENDPROC(native_irq_enable_sysexit)
/*
* 32bit SYSENTER instruction entry.
*
+ * SYSENTER loads ss, rsp, cs, and rip from previously programmed MSRs.
+ * IF and VM in rflags are cleared (IOW: interrupts are off).
+ * SYSENTER does not save anything on the stack,
+ * and does not save old rip (!!!) and rflags.
+ *
* Arguments:
- * %eax System call number.
- * %ebx Arg1
- * %ecx Arg2
- * %edx Arg3
- * %esi Arg4
- * %edi Arg5
- * %ebp user stack
- * 0(%ebp) Arg6
- *
- * Interrupts off.
- *
+ * eax system call number
+ * ebx arg1
+ * ecx arg2
+ * edx arg3
+ * esi arg4
+ * edi arg5
+ * ebp user stack
+ * 0(%ebp) arg6
+ *
* This is purely a fast path. For anything complicated we use the int 0x80
- * path below. Set up a complete hardware stack frame to share code
+ * path below. We set up a complete hardware stack frame to share code
* with the int 0x80 path.
- */
+ */
ENTRY(ia32_sysenter_target)
CFI_STARTPROC32 simple
CFI_SIGNAL_FRAME
@@ -128,6 +131,7 @@ ENTRY(ia32_sysenter_target)
* disabled irqs, here we enable it straight after entry:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
+ /* Construct iret frame (ss,rsp,rflags,cs,rip) */
movl %ebp,%ebp /* zero extension */
pushq_cfi $__USER32_DS
/*CFI_REL_OFFSET ss,0*/
@@ -140,14 +144,19 @@ ENTRY(ia32_sysenter_target)
pushq_cfi $__USER32_CS
/*CFI_REL_OFFSET cs,0*/
movl %eax, %eax
+ /* Store thread_info->sysenter_return in rip stack slot */
pushq_cfi %r10
CFI_REL_OFFSET rip,0
+ /* Store orig_ax */
pushq_cfi %rax
+ /* Construct the rest of "struct pt_regs" */
cld
ALLOC_PTREGS_ON_STACK
SAVE_C_REGS_EXCEPT_R891011
- /* no need to do an access_ok check here because rbp has been
- 32bit zero extended */
+ /*
+ * no need to do an access_ok check here because rbp has been
+ * 32bit zero extended
+ */
ASM_STAC
1: movl (%rbp),%ebp
_ASM_EXTABLE(1b,ia32_badarg)
@@ -174,6 +183,7 @@ sysexit_from_sys_call:
movl RIP(%rsp),%edx /* User %eip */
CFI_REGISTER rip,rdx
RESTORE_RSI_RDI
+ /* pop everything except ss,rsp,rflags slots */
REMOVE_PTREGS_FROM_STACK 8*3
xorq %r8,%r8
xorq %r9,%r9
@@ -184,6 +194,10 @@ sysexit_from_sys_call:
popq_cfi %rcx /* User %esp */
CFI_REGISTER rsp,rcx
TRACE_IRQS_ON
+ /*
+ * 32bit SYSEXIT restores eip from edx, esp from ecx.
+ * cs and ss are loaded from MSRs.
+ */
ENABLE_INTERRUPTS_SYSEXIT32
#ifdef CONFIG_AUDITSYSCALL
@@ -258,23 +272,33 @@ ENDPROC(ia32_sysenter_target)
/*
* 32bit SYSCALL instruction entry.
*
+ * 32bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
+ * then loads new ss, cs, and rip from previously programmed MSRs.
+ * rflags gets masked by a value from another MSR (so CLD and CLAC
+ * are not needed). SYSCALL does not save anything on the stack
+ * and does not change rsp.
+ *
+ * Note: rflags saving+masking-with-MSR happens only in Long mode
+ * (in legacy 32bit mode, IF, RF and VM bits are cleared and that's it).
+ * Don't get confused: rflags saving+masking depends on Long Mode Active bit
+ * (EFER.LMA=1), NOT on bitness of userspace where SYSCALL executes
+ * or target CS descriptor's L bit (SYSCALL does not read segment descriptors).
+ *
* Arguments:
- * %eax System call number.
- * %ebx Arg1
- * %ecx return EIP
- * %edx Arg3
- * %esi Arg4
- * %edi Arg5
- * %ebp Arg2 [note: not saved in the stack frame, should not be touched]
- * %esp user stack
- * 0(%esp) Arg6
- *
- * Interrupts off.
- *
+ * eax system call number
+ * ecx return address
+ * ebx arg1
+ * ebp arg2 (note: not saved in the stack frame, should not be touched)
+ * edx arg3
+ * esi arg4
+ * edi arg5
+ * esp user stack
+ * 0(%esp) arg6
+ *
* This is purely a fast path. For anything complicated we use the int 0x80
- * path below. Set up a complete hardware stack frame to share code
- * with the int 0x80 path.
- */
+ * path below. We set up a complete hardware stack frame to share code
+ * with the int 0x80 path.
+ */
ENTRY(ia32_cstar_target)
CFI_STARTPROC32 simple
CFI_SIGNAL_FRAME
@@ -290,7 +314,7 @@ ENTRY(ia32_cstar_target)
* disabled irqs and here we enable it straight after entry:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- ALLOC_PTREGS_ON_STACK 8
+ ALLOC_PTREGS_ON_STACK 8 /* +8: space for orig_ax */
SAVE_C_REGS_EXCEPT_RCX_R891011
movl %eax,%eax /* zero extension */
movq %rax,ORIG_RAX(%rsp)
@@ -304,9 +328,11 @@ ENTRY(ia32_cstar_target)
/*CFI_REL_OFFSET rflags,EFLAGS*/
movq %r8,RSP(%rsp)
CFI_REL_OFFSET rsp,RSP
- /* no need to do an access_ok check here because r8 has been
- 32bit zero extended */
- /* hardware stack frame is complete now */
+ /* iret stack frame is complete now */
+ /*
+ * no need to do an access_ok check here because r8 has been
+ * 32bit zero extended
+ */
ASM_STAC
1: movl (%r8),%r9d
_ASM_EXTABLE(1b,ia32_badarg)
@@ -339,8 +365,15 @@ sysretl_from_sys_call:
TRACE_IRQS_ON
movl RSP(%rsp),%esp
CFI_RESTORE rsp
+ /*
+ * 64bit->32bit SYSRET restores eip from ecx,
+ * eflags from r11 (but RF and VM bits are forced to 0),
+ * cs and ss are loaded from MSRs.
+ * (Note: 32bit->32bit SYSRET is different: since r11
+ * does not exist, it merely sets eflags.IF=1).
+ */
USERGS_SYSRET32
-
+
#ifdef CONFIG_AUDITSYSCALL
cstar_auditsys:
CFI_RESTORE_STATE
@@ -378,26 +411,26 @@ ia32_badarg:
jmp ia32_sysret
CFI_ENDPROC
-/*
- * Emulated IA32 system calls via int 0x80.
+/*
+ * Emulated IA32 system calls via int 0x80.
*
- * Arguments:
- * %eax System call number.
- * %ebx Arg1
- * %ecx Arg2
- * %edx Arg3
- * %esi Arg4
- * %edi Arg5
- * %ebp Arg6 [note: not saved in the stack frame, should not be touched]
+ * Arguments:
+ * eax system call number
+ * ebx arg1
+ * ecx arg2
+ * edx arg3
+ * esi arg4
+ * edi arg5
+ * ebp arg6 (note: not saved in the stack frame, should not be touched)
*
* Notes:
- * Uses the same stack frame as the x86-64 version.
- * All registers except %eax must be saved (but ptrace may violate that)
+ * Uses the same stack frame as the x86-64 version.
+ * All registers except eax must be saved (but ptrace may violate that).
* Arguments are zero extended. For system calls that want sign extension and
* take long arguments a wrapper is needed. Most calls can just be called
* directly.
- * Assumes it is only called from user space and entered with interrupts off.
- */
+ * Assumes it is only called from user space and entered with interrupts off.
+ */
ENTRY(ia32_syscall)
CFI_STARTPROC32 simple
@@ -416,7 +449,7 @@ ENTRY(ia32_syscall)
*/
ENABLE_INTERRUPTS(CLBR_NONE)
movl %eax,%eax
- pushq_cfi %rax
+ pushq_cfi %rax /* store orig_ax */
cld
/* note the registers are not zero extended to the sf.
this could be a problem. */
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e9bbe02..4a2e3ab 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -256,25 +256,25 @@ ENTRY(ret_from_fork)
END(ret_from_fork)
/*
- * System call entry. Up to 6 arguments in registers are supported.
+ * 64bit SYSCALL instruction entry. Up to 6 arguments in registers.
*
- * SYSCALL does not save anything on the stack and does not change the
- * stack pointer. However, it does mask the flags register for us, so
- * CLD and CLAC are not needed.
- */
-
-/*
- * Register setup:
+ * 64bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
+ * then loads new ss, cs, and rip from previously programmed MSRs.
+ * rflags gets masked by a value from another MSR (so CLD and CLAC
+ * are not needed). SYSCALL does not save anything on the stack
+ * and does not change rsp.
+ *
+ * Registers on entry:
* rax system call number
+ * rcx return address
+ * r11 saved rflags (note: r11 is callee-clobbered register in C ABI)
* rdi arg0
- * rcx return address for syscall/sysret, C arg3
* rsi arg1
* rdx arg2
- * r10 arg3 (--> moved to rcx for C)
+ * r10 arg3 (needs to be moved to rcx to conform to C ABI)
* r8 arg4
* r9 arg5
- * r11 eflags for syscall/sysret, temporary for C
- * r12-r15,rbp,rbx saved by C code, not touched.
+ * (note: r12-r15,rbp,rbx are callee-preserved in C ABI)
*
* Interrupts are off on entry.
* Only called from user space.
@@ -302,13 +302,14 @@ ENTRY(system_call)
GLOBAL(system_call_after_swapgs)
movq %rsp,PER_CPU_VAR(old_rsp)
+ /* kernel_stack is set so that 5 slots (iret frame) are preallocated */
movq PER_CPU_VAR(kernel_stack),%rsp
/*
* No need to follow this irqs off/on section - it's straight
* and short:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- ALLOC_PTREGS_ON_STACK 8
+ ALLOC_PTREGS_ON_STACK 8 /* +8: space for orig_ax */
SAVE_C_REGS
movq %rax,ORIG_RAX(%rsp)
movq %rcx,RIP(%rsp)
@@ -350,6 +351,11 @@ sysret_check:
CFI_REGISTER rip,rcx
/*CFI_REGISTER rflags,r11*/
movq PER_CPU_VAR(old_rsp), %rsp
+ /*
+ * 64bit SYSRET restores rip from rcx,
+ * rflags from r11 (but RF and VM bits are forced to 0),
+ * cs and ss are loaded from MSRs.
+ */
USERGS_SYSRET64
CFI_RESTORE_STATE
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 09/17] x86: entry_64.S: move save_paranoid and ret_from_fork closer to their users
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (7 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 08/17] x86: add comments about various syscall instructions, " Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 10/17] x86: entry_64.S: rename save_paranoid to paranoid_entry, no code changes Denys Vlasenko
` (8 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
For some odd reasons, these two functions are at the very top of the file,
whereas their callers are approximately in the middle of it.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/kernel/entry_64.S | 106 ++++++++++++++++++++++-----------------------
1 file changed, 53 insertions(+), 53 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4a2e3ab..5ad9b28 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -202,59 +202,6 @@ ENDPROC(native_usergs_sysret64)
CFI_REL_OFFSET r15, R15+\offset
.endm
-ENTRY(save_paranoid)
- XCPT_FRAME 1 RDI+8
- cld
- SAVE_C_REGS 8
- SAVE_EXTRA_REGS 8
- movl $1,%ebx
- movl $MSR_GS_BASE,%ecx
- rdmsr
- testl %edx,%edx
- js 1f /* negative -> in kernel */
- SWAPGS
- xorl %ebx,%ebx
-1: ret
- CFI_ENDPROC
-END(save_paranoid)
-
-/*
- * A newly forked process directly context switches into this address.
- *
- * rdi: prev task we switched from
- */
-ENTRY(ret_from_fork)
- DEFAULT_FRAME
-
- LOCK ; btr $TIF_FORK,TI_flags(%r8)
-
- pushq_cfi $0x0002
- popfq_cfi # reset kernel eflags
-
- call schedule_tail # rdi: 'prev' task parameter
-
- GET_THREAD_INFO(%rcx)
-
- RESTORE_EXTRA_REGS
-
- testl $3, CS(%rsp) # from kernel_thread?
- jz 1f
-
- testl $_TIF_IA32, TI_flags(%rcx) # 32-bit compat task needs IRET
- jnz int_ret_from_sys_call
-
- RESTORE_TOP_OF_STACK %rdi
- jmp ret_from_sys_call # go to the SYSRET fastpath
-
-1:
- movq %rbp, %rdi
- call *%rbx
- movl $0, RAX(%rsp)
- RESTORE_EXTRA_REGS
- jmp int_ret_from_sys_call
- CFI_ENDPROC
-END(ret_from_fork)
-
/*
* 64bit SYSCALL instruction entry. Up to 6 arguments in registers.
*
@@ -608,6 +555,43 @@ END(stub_x32_execve)
#endif
/*
+ * A newly forked process directly context switches into this address.
+ *
+ * rdi: prev task we switched from
+ */
+ENTRY(ret_from_fork)
+ DEFAULT_FRAME
+
+ LOCK ; btr $TIF_FORK,TI_flags(%r8)
+
+ pushq_cfi $0x0002
+ popfq_cfi # reset kernel eflags
+
+ call schedule_tail # rdi: 'prev' task parameter
+
+ GET_THREAD_INFO(%rcx)
+
+ RESTORE_EXTRA_REGS
+
+ testl $3, CS(%rsp) # from kernel_thread?
+ jz 1f
+
+ testl $_TIF_IA32, TI_flags(%rcx) # 32-bit compat task needs IRET
+ jnz int_ret_from_sys_call
+
+ RESTORE_TOP_OF_STACK %rdi
+ jmp ret_from_sys_call # go to the SYSRET fastpath
+
+1:
+ movq %rbp, %rdi
+ call *%rbx
+ movl $0, RAX(%rsp)
+ RESTORE_EXTRA_REGS
+ jmp int_ret_from_sys_call
+ CFI_ENDPROC
+END(ret_from_fork)
+
+/*
* Build the entry stubs and pointer table with some assembler magic.
* We pack 7 stubs into a single 32-byte chunk, which will fit in a
* single cache line on all modern x86 implementations.
@@ -1258,6 +1242,22 @@ idtentry async_page_fault do_async_page_fault has_error_code=1
idtentry machine_check has_error_code=0 paranoid=1 do_sym=*machine_check_vector(%rip)
#endif
+ENTRY(save_paranoid)
+ XCPT_FRAME 1 RDI+8
+ cld
+ SAVE_C_REGS 8
+ SAVE_EXTRA_REGS 8
+ movl $1,%ebx
+ movl $MSR_GS_BASE,%ecx
+ rdmsr
+ testl %edx,%edx
+ js 1f /* negative -> in kernel */
+ SWAPGS
+ xorl %ebx,%ebx
+1: ret
+ CFI_ENDPROC
+END(save_paranoid)
+
/*
* "Paranoid" exit path from exception stack.
* Paranoid because this is used by NMIs and cannot take
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 10/17] x86: entry_64.S: rename save_paranoid to paranoid_entry, no code changes
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (8 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 09/17] x86: entry_64.S: move save_paranoid and ret_from_fork closer to their users Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 11/17] x86: entry_64.S: fold test_in_nmi macro into its only user Denys Vlasenko
` (7 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
This patch does a lot of cleanup in comments and formatting,
but it does not change any code.
Rename save_paranoid to paranoid_entry: this makes naming
similar to its "non-paranoid" sibling, error_entry,
and to it counterpart, paranoid_exit.
Use the same CFI annotation atop paranoid_entry and error_entry.
Fix irregular indentation of assembler operands.
Add/fix comments on top of paranoid_entry and error_entry.
Remove stale comment about "oldrax".
Make comments about "no swapgs" flag in ebx more prominent.
Deindent wrongly indented top-level comment atop paranoid_exit.
Indent wrongly deindented comment inside error_entry.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/kernel/entry_64.S | 117 +++++++++++++++++++++++----------------------
1 file changed, 59 insertions(+), 58 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 5ad9b28..decfe25 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1000,10 +1000,11 @@ ENTRY(\sym)
ALLOC_PTREGS_ON_STACK
.if \paranoid
- call save_paranoid
+ call paranoid_entry
.else
call error_entry
.endif
+ /* returned flag: ebx=0: need swapgs on exit, ebx=1: don't need it */
DEFAULT_FRAME 0
@@ -1034,10 +1035,11 @@ ENTRY(\sym)
addq $EXCEPTION_STKSZ, INIT_TSS_IST(\shift_ist)
.endif
+ /* these procedures expect "no swapgs" flag in ebx */
.if \paranoid
- jmp paranoid_exit /* %ebx: no swapgs flag */
+ jmp paranoid_exit
.else
- jmp error_exit /* %ebx: no swapgs flag */
+ jmp error_exit
.endif
CFI_ENDPROC
@@ -1242,36 +1244,40 @@ idtentry async_page_fault do_async_page_fault has_error_code=1
idtentry machine_check has_error_code=0 paranoid=1 do_sym=*machine_check_vector(%rip)
#endif
-ENTRY(save_paranoid)
- XCPT_FRAME 1 RDI+8
+/*
+ * Save all registers in pt_regs, and switch gs if needed.
+ * Use slow, but surefire "are we in kernel?" check.
+ * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ */
+ENTRY(paranoid_entry)
+ XCPT_FRAME 1 15*8
cld
SAVE_C_REGS 8
SAVE_EXTRA_REGS 8
- movl $1,%ebx
- movl $MSR_GS_BASE,%ecx
+ movl $1,%ebx
+ movl $MSR_GS_BASE,%ecx
rdmsr
- testl %edx,%edx
- js 1f /* negative -> in kernel */
+ testl %edx,%edx
+ js 1f /* negative -> in kernel */
SWAPGS
- xorl %ebx,%ebx
+ xorl %ebx,%ebx
1: ret
CFI_ENDPROC
-END(save_paranoid)
-
- /*
- * "Paranoid" exit path from exception stack.
- * Paranoid because this is used by NMIs and cannot take
- * any kernel state for granted.
- * We don't do kernel preemption checks here, because only
- * NMI should be common and it does not enable IRQs and
- * cannot get reschedule ticks.
- *
- * "trace" is 0 for the NMI handler only, because irq-tracing
- * is fundamentally NMI-unsafe. (we cannot change the soft and
- * hard flags at once, atomically)
- */
+END(paranoid_entry)
- /* ebx: no swapgs flag */
+/*
+ * "Paranoid" exit path from exception stack.
+ * Paranoid because this is used by NMIs and cannot take
+ * any kernel state for granted.
+ * We don't do kernel preemption checks here, because only
+ * NMI should be common and it does not enable IRQs and
+ * cannot get reschedule ticks.
+ *
+ * "trace" is 0 for the NMI handler only, because irq-tracing
+ * is fundamentally NMI-unsafe. (we cannot change the soft and
+ * hard flags at once, atomically)
+ */
+/* On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it) */
ENTRY(paranoid_exit)
DEFAULT_FRAME
DISABLE_INTERRUPTS(CLBR_NONE)
@@ -1323,53 +1329,48 @@ paranoid_schedule:
END(paranoid_exit)
/*
- * Exception entry point. This expects an error code/orig_rax on the stack.
- * returns in "no swapgs flag" in %ebx.
+ * Save all registers in pt_regs, and switch gs if needed.
+ * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
*/
ENTRY(error_entry)
- XCPT_FRAME
- CFI_ADJUST_CFA_OFFSET 15*8
- /* oldrax contains error code */
+ XCPT_FRAME 1 15*8
cld
SAVE_C_REGS 8
SAVE_EXTRA_REGS 8
- xorl %ebx,%ebx
- testl $3,CS+8(%rsp)
- je error_kernelspace
+ xorl %ebx,%ebx
+ testl $3,CS+8(%rsp)
+ jz error_kernelspace
error_swapgs:
SWAPGS
error_sti:
TRACE_IRQS_OFF
ret
-
-/*
- * There are two places in the kernel that can potentially fault with
- * usergs. Handle them here. The exception handlers after iret run with
- * kernel gs again, so don't set the user space flag. B stepping K8s
- * sometimes report an truncated RIP for IRET exceptions returning to
- * compat mode. Check for these here too.
- */
+ /*
+ * There are two places in the kernel that can potentially fault with
+ * usergs. Handle them here. The exception handlers after iret run with
+ * kernel gs again, so don't set the user space flag. B stepping K8s
+ * sometimes report an truncated RIP for IRET exceptions returning to
+ * compat mode. Check for these here too.
+ */
error_kernelspace:
- incl %ebx
- leaq irq_return_iret(%rip),%rcx
- cmpq %rcx,RIP+8(%rsp)
- je error_swapgs
- movl %ecx,%eax /* zero extend */
- cmpq %rax,RIP+8(%rsp)
- je bstep_iret
- cmpq $gs_change,RIP+8(%rsp)
- je error_swapgs
- jmp error_sti
-
+ incl %ebx
+ leaq irq_return_iret(%rip),%rcx
+ cmpq %rcx,RIP+8(%rsp)
+ je error_swapgs
+ movl %ecx,%eax /* zero extend */
+ cmpq %rax,RIP+8(%rsp)
+ je bstep_iret
+ cmpq $gs_change,RIP+8(%rsp)
+ je error_swapgs
+ jmp error_sti
bstep_iret:
/* Fix truncated RIP */
- movq %rcx,RIP+8(%rsp)
- jmp error_swapgs
+ movq %rcx,RIP+8(%rsp)
+ jmp error_swapgs
CFI_ENDPROC
END(error_entry)
-
-/* ebx: no swapgs flag (1: don't need swapgs, 0: need it) */
+/* On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it) */
ENTRY(error_exit)
DEFAULT_FRAME
movl %ebx,%eax
@@ -1595,13 +1596,13 @@ end_repeat_nmi:
ALLOC_PTREGS_ON_STACK
/*
- * Use save_paranoid to handle SWAPGS, but no need to use paranoid_exit
+ * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
* as we should not be calling schedule in NMI context.
* Even with normal interrupts enabled. An NMI should not be
* setting NEED_RESCHED or anything that normal interrupts and
* exceptions might do.
*/
- call save_paranoid
+ call paranoid_entry
DEFAULT_FRAME 0
/*
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 11/17] x86: entry_64.S: fold test_in_nmi macro into its only user
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (9 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 10/17] x86: entry_64.S: rename save_paranoid to paranoid_entry, no code changes Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 12/17] x86: get rid of KERNEL_STACK_OFFSET Denys Vlasenko
` (6 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
No code changes.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/kernel/entry_64.S | 24 +++++++++---------------
1 file changed, 9 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index decfe25..bf882af 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1389,19 +1389,7 @@ ENTRY(error_exit)
CFI_ENDPROC
END(error_exit)
-/*
- * Test if a given stack is an NMI stack or not.
- */
- .macro test_in_nmi reg stack nmi_ret normal_ret
- cmpq %\reg, \stack
- ja \normal_ret
- subq $EXCEPTION_STKSZ, %\reg
- cmpq %\reg, \stack
- jb \normal_ret
- jmp \nmi_ret
- .endm
-
- /* runs on exception stack */
+/* Runs on exception stack */
ENTRY(nmi)
INTR_FRAME
PARAVIRT_ADJUST_EXCEPTION_FRAME
@@ -1462,8 +1450,14 @@ ENTRY(nmi)
* We check the variable because the first NMI could be in a
* breakpoint routine using a breakpoint stack.
*/
- lea 6*8(%rsp), %rdx
- test_in_nmi rdx, 4*8(%rsp), nested_nmi, first_nmi
+ lea 6*8(%rsp), %rdx
+ cmpq %rdx, 4*8(%rsp)
+ ja first_nmi
+ subq $EXCEPTION_STKSZ, %rdx
+ cmpq %rdx, 4*8(%rsp)
+ jb first_nmi
+ jmp nested_nmi
+
CFI_REMEMBER_STATE
nested_nmi:
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 12/17] x86: get rid of KERNEL_STACK_OFFSET
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (10 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 11/17] x86: entry_64.S: fold test_in_nmi macro into its only user Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 13/17] x86: ia32entry.S: fold IA32_ARG_FIXUP macro into its callers Denys Vlasenko
` (5 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.
Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.
Ironically, none of them benefit from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.
And ia32_sysenter_target even needs to *undo* this optimization:
it constructs iret stack with pushes instead of movs,
so it needs to start right at the top.
This patch eliminates KERNEL_STACK_OFFSET.
PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Semi-mysterious expressions THREAD_INFO(%rsp,RIP) - "why RIP??"
are now replaced by more logical THREAD_INFO(%rsp,SIZEOF_PTREGS) -
"rsp is SIZEOF_PTREGS bytes below the top, calculate
thread_info's address using that information"
Net result in code is that one instruction is eliminated,
and constants in several others have changed.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/ia32/ia32entry.S | 35 +++++++++++++++++------------------
arch/x86/include/asm/calling.h | 2 ++
arch/x86/include/asm/thread_info.h | 8 +++-----
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/entry_64.S | 13 ++++++-------
arch/x86/kernel/process_32.c | 3 +--
arch/x86/kernel/process_64.c | 3 +--
arch/x86/kernel/smpboot.c | 3 +--
arch/x86/xen/smp.c | 3 +--
9 files changed, 33 insertions(+), 39 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index fe6eaf7..8cd08de 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -125,7 +125,6 @@ ENTRY(ia32_sysenter_target)
CFI_REGISTER rsp,rbp
SWAPGS_UNSAFE_STACK
movq PER_CPU_VAR(kernel_stack), %rsp
- addq $(KERNEL_STACK_OFFSET),%rsp
/*
* No need to follow this irqs on/off section: the syscall
* disabled irqs, here we enable it straight after entry:
@@ -139,7 +138,7 @@ ENTRY(ia32_sysenter_target)
CFI_REL_OFFSET rsp,0
pushfq_cfi
/*CFI_REL_OFFSET rflags,0*/
- movl TI_sysenter_return+THREAD_INFO(%rsp,3*8-KERNEL_STACK_OFFSET),%r10d
+ movl TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d
CFI_REGISTER rip,r10
pushq_cfi $__USER32_CS
/*CFI_REL_OFFSET cs,0*/
@@ -161,8 +160,8 @@ ENTRY(ia32_sysenter_target)
1: movl (%rbp),%ebp
_ASM_EXTABLE(1b,ia32_badarg)
ASM_CLAC
- orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+ orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
CFI_REMEMBER_STATE
jnz sysenter_tracesys
cmpq $(IA32_NR_syscalls-1),%rax
@@ -174,10 +173,10 @@ sysenter_dispatch:
movq %rax,RAX(%rsp)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz sysexit_audit
sysexit_from_sys_call:
- andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
+ andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
/* clear IF, that popfq doesn't enable interrupts early */
andl $~0x200,EFLAGS(%rsp)
movl RIP(%rsp),%edx /* User %eip */
@@ -220,7 +219,7 @@ sysexit_from_sys_call:
.endm
.macro auditsys_exit exit
- testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz ia32_ret_from_sys_call
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
@@ -235,7 +234,7 @@ sysexit_from_sys_call:
movl $(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT),%edi
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl %edi,TI_flags+THREAD_INFO(%rsp,RIP)
+ testl %edi,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jz \exit
CLEAR_RREGS
jmp int_with_check
@@ -253,7 +252,7 @@ sysexit_audit:
sysenter_tracesys:
#ifdef CONFIG_AUDITSYSCALL
- testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jz sysenter_auditsys
#endif
SAVE_EXTRA_REGS
@@ -302,7 +301,7 @@ ENDPROC(ia32_sysenter_target)
ENTRY(ia32_cstar_target)
CFI_STARTPROC32 simple
CFI_SIGNAL_FRAME
- CFI_DEF_CFA rsp,KERNEL_STACK_OFFSET
+ CFI_DEF_CFA rsp,0
CFI_REGISTER rip,rcx
/*CFI_REGISTER rflags,r11*/
SWAPGS_UNSAFE_STACK
@@ -314,7 +313,7 @@ ENTRY(ia32_cstar_target)
* disabled irqs and here we enable it straight after entry:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- ALLOC_PTREGS_ON_STACK 8 /* +8: space for orig_ax */
+ ALLOC_PTREGS_ON_STACK 6*8 /* space for orig_ax and iret frame */
SAVE_C_REGS_EXCEPT_RCX_R891011
movl %eax,%eax /* zero extension */
movq %rax,ORIG_RAX(%rsp)
@@ -337,8 +336,8 @@ ENTRY(ia32_cstar_target)
1: movl (%r8),%r9d
_ASM_EXTABLE(1b,ia32_badarg)
ASM_CLAC
- orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+ orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
CFI_REMEMBER_STATE
jnz cstar_tracesys
cmpq $IA32_NR_syscalls-1,%rax
@@ -350,10 +349,10 @@ cstar_dispatch:
movq %rax,RAX(%rsp)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $_TIF_ALLWORK_MASK,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz sysretl_audit
sysretl_from_sys_call:
- andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
+ andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
RESTORE_RSI_RDI_RDX
movl RIP(%rsp),%ecx
CFI_REGISTER rip,rcx
@@ -388,7 +387,7 @@ sysretl_audit:
cstar_tracesys:
#ifdef CONFIG_AUDITSYSCALL
- testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jz cstar_auditsys
#endif
xchgl %r9d,%ebp
@@ -455,8 +454,8 @@ ENTRY(ia32_syscall)
this could be a problem. */
ALLOC_PTREGS_ON_STACK
SAVE_C_REGS_EXCEPT_R891011
- orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP)
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+ orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz ia32_tracesys
cmpq $(IA32_NR_syscalls-1),%rax
ja ia32_badsys
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index e8e2e41..aa9113e 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -88,6 +88,8 @@ For 32-bit we have the following conventions - kernel is built with
#define RSP 19*8
#define SS 20*8
+#define SIZEOF_PTREGS 21*8
+
.macro ALLOC_PTREGS_ON_STACK addskip=0
subq $15*8+\addskip, %rsp
CFI_ADJUST_CFA_OFFSET 15*8+\addskip
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 8540538..af9b74c 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -151,7 +151,6 @@ struct thread_info {
#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW)
#define STACK_WARN (THREAD_SIZE/8)
-#define KERNEL_STACK_OFFSET (5*(BITS_PER_LONG/8))
/*
* macros/functions for gaining access to the thread information structure
@@ -165,8 +164,7 @@ DECLARE_PER_CPU(unsigned long, kernel_stack);
static inline struct thread_info *current_thread_info(void)
{
struct thread_info *ti;
- ti = (void *)(this_cpu_read_stable(kernel_stack) +
- KERNEL_STACK_OFFSET - THREAD_SIZE);
+ ti = (void *)(this_cpu_read_stable(kernel_stack) - THREAD_SIZE);
return ti;
}
@@ -175,13 +173,13 @@ static inline struct thread_info *current_thread_info(void)
/* how to get the thread information struct from ASM */
#define GET_THREAD_INFO(reg) \
_ASM_MOV PER_CPU_VAR(kernel_stack),reg ; \
- _ASM_SUB $(THREAD_SIZE-KERNEL_STACK_OFFSET),reg ;
+ _ASM_SUB $(THREAD_SIZE),reg ;
/*
* Same if PER_CPU_VAR(kernel_stack) is, perhaps with some offset, already in
* a certain register (to be used in assembler memory operands).
*/
-#define THREAD_INFO(reg, off) KERNEL_STACK_OFFSET+(off)-THREAD_SIZE(reg)
+#define THREAD_INFO(reg, off) (off)-THREAD_SIZE(reg)
#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ef1b93f..605034f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1114,7 +1114,7 @@ static __init int setup_disablecpuid(char *arg)
__setup("clearcpuid=", setup_disablecpuid);
DEFINE_PER_CPU(unsigned long, kernel_stack) =
- (unsigned long)&init_thread_union - KERNEL_STACK_OFFSET + THREAD_SIZE;
+ (unsigned long)&init_thread_union + THREAD_SIZE;
EXPORT_PER_CPU_SYMBOL(kernel_stack);
#ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index bf882af..f652ac6 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -237,7 +237,7 @@ ENDPROC(native_usergs_sysret64)
ENTRY(system_call)
CFI_STARTPROC simple
CFI_SIGNAL_FRAME
- CFI_DEF_CFA rsp,KERNEL_STACK_OFFSET
+ CFI_DEF_CFA rsp,0
CFI_REGISTER rip,rcx
/*CFI_REGISTER rflags,r11*/
SWAPGS_UNSAFE_STACK
@@ -249,19 +249,18 @@ ENTRY(system_call)
GLOBAL(system_call_after_swapgs)
movq %rsp,PER_CPU_VAR(old_rsp)
- /* kernel_stack is set so that 5 slots (iret frame) are preallocated */
movq PER_CPU_VAR(kernel_stack),%rsp
/*
* No need to follow this irqs off/on section - it's straight
* and short:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- ALLOC_PTREGS_ON_STACK 8 /* +8: space for orig_ax */
+ ALLOC_PTREGS_ON_STACK 6*8 /* space for orig_ax and iret frame */
SAVE_C_REGS
movq %rax,ORIG_RAX(%rsp)
movq %rcx,RIP(%rsp)
CFI_REL_OFFSET rip,RIP
- testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz tracesys
system_call_fastpath:
#if __SYSCALL_MASK == ~0
@@ -285,7 +284,7 @@ sysret_check:
LOCKDEP_SYS_EXIT
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- movl TI_flags+THREAD_INFO(%rsp,RIP),%edx
+ movl TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS),%edx
andl %edi,%edx
jnz sysret_careful
CFI_REMEMBER_STATE
@@ -373,7 +372,7 @@ sysret_audit:
/* Do syscall tracing */
tracesys:
#ifdef CONFIG_AUDITSYSCALL
- testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP)
+ testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jz auditsys
#endif
SAVE_EXTRA_REGS
@@ -879,7 +878,7 @@ __do_double_fault:
jne do_double_fault /* This shouldn't happen... */
1:
movq PER_CPU_VAR(kernel_stack),%rax
- subq $(6*8-KERNEL_STACK_OFFSET),%rax /* Reset to original stack */
+ subq $(6*8),%rax /* Reset to original stack */
movq %rax,RSP(%rdi)
movq $0,(%rax) /* Missing (lost) #GP error code */
movq $general_protection,RIP(%rdi)
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 7bc86bb..e72b776 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -315,8 +315,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
arch_end_context_switch(next_p);
this_cpu_write(kernel_stack,
- (unsigned long)task_stack_page(next_p) +
- THREAD_SIZE - KERNEL_STACK_OFFSET);
+ (unsigned long)task_stack_page(next_p) + THREAD_SIZE);
/*
* Restore %gs if needed (which is common)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ca5b02d..2105487e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -372,8 +372,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
this_cpu_write(kernel_stack,
- (unsigned long)task_stack_page(next_p) +
- THREAD_SIZE - KERNEL_STACK_OFFSET);
+ (unsigned long)task_stack_page(next_p) + THREAD_SIZE);
/*
* Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5492798..788e215 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -775,8 +775,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
initial_gs = per_cpu_offset(cpu);
#endif
per_cpu(kernel_stack, cpu) =
- (unsigned long)task_stack_page(idle) -
- KERNEL_STACK_OFFSET + THREAD_SIZE;
+ (unsigned long)task_stack_page(idle) + THREAD_SIZE;
early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
initial_code = (unsigned long)start_secondary;
stack_start = idle->thread.sp;
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..bf549d5 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -443,8 +443,7 @@ static int xen_cpu_up(unsigned int cpu, struct task_struct *idle)
clear_tsk_thread_flag(idle, TIF_FORK);
#endif
per_cpu(kernel_stack, cpu) =
- (unsigned long)task_stack_page(idle) -
- KERNEL_STACK_OFFSET + THREAD_SIZE;
+ (unsigned long)task_stack_page(idle) + THREAD_SIZE;
xen_setup_runstate_info(cpu);
xen_setup_timer(cpu);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 13/17] x86: ia32entry.S: fold IA32_ARG_FIXUP macro into its callers
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (11 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 12/17] x86: get rid of KERNEL_STACK_OFFSET Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 14/17] x86: ia32entry.S: use mov instead of push/pop where possible Denys Vlasenko
` (4 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
Use of a small macro - one with conditional expansion - does more harm
than good. It obfuscates code, with minimal code reuse.
For example, because of obfuscation it's not obvious that
in ia32_sysenter_target, we can optimize loading of r9 -
currently it is loaded with a detour through ebp.
This patch folds IA32_ARG_FIXUP macro into its callers.
No code changes.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/ia32/ia32entry.S | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 8cd08de..c70c9a0 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -30,17 +30,6 @@
.section .entry.text, "ax"
- .macro IA32_ARG_FIXUP noebp=0
- movl %edi,%r8d
- .if \noebp
- .else
- movl %ebp,%r9d
- .endif
- xchg %ecx,%esi
- movl %ebx,%edi
- movl %edx,%edx /* zero extension */
- .endm
-
/* clobbers %rax */
.macro CLEAR_RREGS _r9=rax
xorl %eax,%eax
@@ -167,7 +156,12 @@ ENTRY(ia32_sysenter_target)
cmpq $(IA32_NR_syscalls-1),%rax
ja ia32_badsys
sysenter_do_call:
- IA32_ARG_FIXUP
+ /* 32bit syscall -> 64bit C ABI argument conversion */
+ movl %edi,%r8d /* arg5 */
+ movl %ebp,%r9d /* arg6 */
+ xchg %ecx,%esi /* rsi:arg2, rcx:arg4 */
+ movl %ebx,%edi /* arg1 */
+ movl %edx,%edx /* arg3 (zero extension) */
sysenter_dispatch:
call *ia32_sys_call_table(,%rax,8)
movq %rax,RAX(%rsp)
@@ -343,7 +337,12 @@ ENTRY(ia32_cstar_target)
cmpq $IA32_NR_syscalls-1,%rax
ja ia32_badsys
cstar_do_call:
- IA32_ARG_FIXUP 1
+ /* 32bit syscall -> 64bit C ABI argument conversion */
+ movl %edi,%r8d /* arg5 */
+ /* r9 already loaded */ /* arg6 */
+ xchg %ecx,%esi /* rsi:arg2, rcx:arg4 */
+ movl %ebx,%edi /* arg1 */
+ movl %edx,%edx /* arg3 (zero extension) */
cstar_dispatch:
call *ia32_sys_call_table(,%rax,8)
movq %rax,RAX(%rsp)
@@ -460,7 +459,12 @@ ENTRY(ia32_syscall)
cmpq $(IA32_NR_syscalls-1),%rax
ja ia32_badsys
ia32_do_call:
- IA32_ARG_FIXUP
+ /* 32bit syscall -> 64bit C ABI argument conversion */
+ movl %edi,%r8d /* arg5 */
+ movl %ebp,%r9d /* arg6 */
+ xchg %ecx,%esi /* rsi:arg2, rcx:arg4 */
+ movl %ebx,%edi /* arg1 */
+ movl %edx,%edx /* arg3 (zero extension) */
call *ia32_sys_call_table(,%rax,8) # xxx: rip relative
ia32_sysret:
movq %rax,RAX(%rsp)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 14/17] x86: ia32entry.S: use mov instead of push/pop where possible
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (12 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 13/17] x86: ia32entry.S: fold IA32_ARG_FIXUP macro into its callers Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 15/17] x86: code shrink in paranoid_exit Denys Vlasenko
` (3 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
mov insns are faster than push/pops: some CPUs can execute
two movs per cycle, but only one push/pop.
Logic is not changed by this patch.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/ia32/ia32entry.S | 54 ++++++++++++++++++++++-------------------------
1 file changed, 25 insertions(+), 29 deletions(-)
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index c70c9a0..844ef4f 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -119,28 +119,25 @@ ENTRY(ia32_sysenter_target)
* disabled irqs, here we enable it straight after entry:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- /* Construct iret frame (ss,rsp,rflags,cs,rip) */
+ /* Construct part of iret frame (ss,rsp,rflags) */
movl %ebp,%ebp /* zero extension */
pushq_cfi $__USER32_DS
/*CFI_REL_OFFSET ss,0*/
+ movl %eax,%eax /* zero extension */
pushq_cfi %rbp
CFI_REL_OFFSET rsp,0
pushfq_cfi
/*CFI_REL_OFFSET rflags,0*/
- movl TI_sysenter_return+THREAD_INFO(%rsp,3*8),%r10d
- CFI_REGISTER rip,r10
- pushq_cfi $__USER32_CS
- /*CFI_REL_OFFSET cs,0*/
- movl %eax, %eax
- /* Store thread_info->sysenter_return in rip stack slot */
- pushq_cfi %r10
- CFI_REL_OFFSET rip,0
- /* Store orig_ax */
- pushq_cfi %rax
- /* Construct the rest of "struct pt_regs" */
cld
- ALLOC_PTREGS_ON_STACK
+ /* Construct the rest of pt_regs */
+ ALLOC_PTREGS_ON_STACK 3*8 /* 3*8: space for orig_ax,rip,cs */
+ movl TI_sysenter_return+THREAD_INFO(%rsp,SIZEOF_PTREGS),%r10d
SAVE_C_REGS_EXCEPT_R891011
+ movq %rax,ORIG_RAX(%rsp)
+ /* pt_regs->ip = thread_info->sysenter_return */
+ movq %r10,RIP(%rsp)
+ CFI_REL_OFFSET rip,RIP
+ movq $__USER32_CS,CS(%rsp)
/*
* no need to do an access_ok check here because rbp has been
* 32bit zero extended
@@ -171,10 +168,12 @@ sysenter_dispatch:
jnz sysexit_audit
sysexit_from_sys_call:
andl $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
- /* clear IF, that popfq doesn't enable interrupts early */
+ /* clear IF, so that popfq won't enable interrupts early */
andl $~0x200,EFLAGS(%rsp)
movl RIP(%rsp),%edx /* User %eip */
CFI_REGISTER rip,rdx
+ movl RSP(%rsp),%ecx /* User %esp */
+ CFI_REGISTER rsp,rcx
RESTORE_RSI_RDI
/* pop everything except ss,rsp,rflags slots */
REMOVE_PTREGS_FROM_STACK 8*3
@@ -184,8 +183,6 @@ sysexit_from_sys_call:
xorq %r11,%r11
popfq_cfi
/*CFI_RESTORE rflags*/
- popq_cfi %rcx /* User %esp */
- CFI_REGISTER rsp,rcx
TRACE_IRQS_ON
/*
* 32bit SYSEXIT restores eip from edx, esp from ecx.
@@ -307,21 +304,21 @@ ENTRY(ia32_cstar_target)
* disabled irqs and here we enable it straight after entry:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- ALLOC_PTREGS_ON_STACK 6*8 /* space for orig_ax and iret frame */
- SAVE_C_REGS_EXCEPT_RCX_R891011
- movl %eax,%eax /* zero extension */
+ movl %eax,%eax /* zero extension */
+ ALLOC_PTREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
movq %rax,ORIG_RAX(%rsp)
movq %rcx,RIP(%rsp)
CFI_REL_OFFSET rip,RIP
- movq %rbp,RCX(%rsp) /* this lies slightly to ptrace */
- movl %ebp,%ecx
movq $__USER32_CS,CS(%rsp)
- movq $__USER32_DS,SS(%rsp)
movq %r11,EFLAGS(%rsp)
/*CFI_REL_OFFSET rflags,EFLAGS*/
movq %r8,RSP(%rsp)
CFI_REL_OFFSET rsp,RSP
- /* iret stack frame is complete now */
+ movq $__USER32_DS,SS(%rsp)
+ /* iret frame is complete now */
+ SAVE_C_REGS_EXCEPT_RCX_R891011
+ movq %rbp,RCX(%rsp) /* this lies slightly to ptrace */
+ movl %ebp,%ecx
/*
* no need to do an access_ok check here because r8 has been
* 32bit zero extended
@@ -447,11 +444,11 @@ ENTRY(ia32_syscall)
*/
ENABLE_INTERRUPTS(CLBR_NONE)
movl %eax,%eax
- pushq_cfi %rax /* store orig_ax */
cld
/* note the registers are not zero extended to the sf.
this could be a problem. */
- ALLOC_PTREGS_ON_STACK
+ ALLOC_PTREGS_ON_STACK 1*8 /* 1*8: space for orig_ax */
+ movq %rax,ORIG_RAX(%rsp)
SAVE_C_REGS_EXCEPT_R891011
orl $TS_COMPAT,TI_status+THREAD_INFO(%rsp,SIZEOF_PTREGS)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 15/17] x86: code shrink in paranoid_exit
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (13 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 14/17] x86: ia32entry.S: use mov instead of push/pop where possible Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 17:44 ` [PATCH 16/17] x86: entry_64.S: trivial optimization for ENOSYS Denys Vlasenko
` (2 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
RESTORE_EXTRA_REGS + RESTORE_C_REGS looks small, but it's
a lot of instructions (fourteen). Let's reuse them.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/kernel/entry_64.S | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index f652ac6..7bc8c24 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1288,12 +1288,10 @@ ENTRY(paranoid_exit)
paranoid_swapgs:
TRACE_IRQS_IRETQ 0
SWAPGS_UNSAFE_STACK
- RESTORE_EXTRA_REGS
- RESTORE_C_REGS
- REMOVE_PTREGS_FROM_STACK 8
- jmp irq_return
+ jmp paranoid_restore1
paranoid_restore:
TRACE_IRQS_IRETQ_DEBUG 0
+paranoid_restore1:
RESTORE_EXTRA_REGS
RESTORE_C_REGS
REMOVE_PTREGS_FROM_STACK 8
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 16/17] x86: entry_64.S: trivial optimization for ENOSYS
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (14 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 15/17] x86: code shrink in paranoid_exit Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 22:48 ` Andy Lutomirski
2014-08-08 17:44 ` [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath Denys Vlasenko
2014-08-09 0:27 ` [PATCH v4 0/17] x86: entry.S optimizations H. Peter Anvin
17 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
This code uses a slightly shorter insn.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/kernel/entry_64.S | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 7bc8c24..5d639a6 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -272,6 +272,7 @@ system_call_fastpath:
ja badsys
movq %r10,%rcx
call *sys_call_table(,%rax,8) # XXX: rip relative
+from_badsys:
movq %rax,RAX(%rsp)
/*
* Syscall return path ending with SYSRET (fast path)
@@ -334,8 +335,8 @@ sysret_signal:
jmp int_check_syscall_exit_work
badsys:
- movq $-ENOSYS,RAX(%rsp)
- jmp ret_from_sys_call
+ movq $-ENOSYS,%rax
+ jmp from_badsys
#ifdef CONFIG_AUDITSYSCALL
/*
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (15 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 16/17] x86: entry_64.S: trivial optimization for ENOSYS Denys Vlasenko
@ 2014-08-08 17:44 ` Denys Vlasenko
2014-08-08 22:59 ` Andy Lutomirski
2014-08-10 18:47 ` [PATCH 17/17 v2] " Denys Vlasenko
2014-08-09 0:27 ` [PATCH v4 0/17] x86: entry.S optimizations H. Peter Anvin
17 siblings, 2 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-08 17:44 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
Before this patch, rcx and r11 were saved in pt_regs->rcx
and pt_regs->r11. Which looks natural, but requires messy
shuffling to/from iret stack whenever ptrace or e.g. iopl
wants to modify return address or flags - because that's
how these registers are used by SYSCALL/SYSRET.
This patch saves rcx and r11 in pt_regs->rip and pt_regs->flags,
and uses these values for SYSRET64 insn. Shuffling is eliminated.
On slow path, the values are saved in both locations - thus, ptrace
can modify rcx, and on syscall exit rcx will be different from
return address... don't see why that can be useful but anyway,
it works.
The lazy store of pt_regs->cs and pt_regs->ss is retained -
tests have shown that these insns do take ~2 cycles on fast path.
FIXUP_TOP_OF_STACK and RESTORE_TOP_OF_STACK macros are replaced
by a simpler single macro, STORE_IRET_FRAME_CS_SS.
stub_iopl is no longer needed: pt_regs->flags needs no fixing up.
PER_CPU(old_rsp) usage is simplified - now it is used only
as temp storage, and userspace stack pointer is immediately stored
in pt_regs->sp on syscall entry, instead of being used later,
on syscall exit. This allows to get rid of thread_struct::usersp.
Testing shows that syscall fast path is ~54.3 ns before
and after the patch (on 2.7 GHz Sandy Bridge CPU).
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/include/asm/calling.h | 18 +++--
arch/x86/include/asm/compat.h | 2 +-
arch/x86/include/asm/processor.h | 1 -
arch/x86/include/asm/ptrace.h | 8 +--
arch/x86/kernel/entry_64.S | 140 +++++++++++++++++----------------------
arch/x86/kernel/process_64.c | 8 +--
arch/x86/syscalls/syscall_64.tbl | 2 +-
arch/x86/um/sys_call_table_64.c | 2 +-
8 files changed, 78 insertions(+), 103 deletions(-)
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index aa9113e..7afbcea 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -95,7 +95,7 @@ For 32-bit we have the following conventions - kernel is built with
CFI_ADJUST_CFA_OFFSET 15*8+\addskip
.endm
- .macro SAVE_C_REGS_HELPER offset=0 rcx=1 r8plus=1
+ .macro SAVE_C_REGS_HELPER offset=0 rcx=1 r8910=1 r11=1
movq_cfi rdi, 14*8+\offset
movq_cfi rsi, 13*8+\offset
movq_cfi rdx, 12*8+\offset
@@ -103,21 +103,26 @@ For 32-bit we have the following conventions - kernel is built with
movq_cfi rcx, 11*8+\offset
.endif
movq_cfi rax, 10*8+\offset
- .if \r8plus
+ .if \r8910
movq_cfi r8, 9*8+\offset
movq_cfi r9, 8*8+\offset
movq_cfi r10, 7*8+\offset
+ .endif
+ .if \r11
movq_cfi r11, 6*8+\offset
.endif
.endm
.macro SAVE_C_REGS offset=0
- SAVE_C_REGS_HELPER \offset, 1, 1
+ SAVE_C_REGS_HELPER \offset, 1, 1, 1
.endm
.macro SAVE_C_REGS_EXCEPT_R891011
- SAVE_C_REGS_HELPER 0, 1, 0
+ SAVE_C_REGS_HELPER 0, 1, 0, 0
.endm
.macro SAVE_C_REGS_EXCEPT_RCX_R891011
- SAVE_C_REGS_HELPER 0, 0, 0
+ SAVE_C_REGS_HELPER 0, 0, 0, 0
+ .endm
+ .macro SAVE_C_REGS_EXCEPT_RCX_R11
+ SAVE_C_REGS_HELPER 0, 0, 1, 0
.endm
.macro SAVE_EXTRA_REGS offset=0
@@ -171,6 +176,9 @@ For 32-bit we have the following conventions - kernel is built with
.macro RESTORE_C_REGS_EXCEPT_RCX
RESTORE_C_REGS_HELPER 1,0,1,1,1
.endm
+ .macro RESTORE_C_REGS_EXCEPT_RCX_R11
+ RESTORE_C_REGS_HELPER 1,0,0,1,1
+ .endm
.macro RESTORE_RSI_RDI
RESTORE_C_REGS_HELPER 0,0,0,0,0
.endm
diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 59c6c40..acdee09 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -301,7 +301,7 @@ static inline void __user *arch_compat_alloc_user_space(long len)
sp = task_pt_regs(current)->sp;
} else {
/* -128 for the x32 ABI redzone */
- sp = this_cpu_read(old_rsp) - 128;
+ sp = task_pt_regs(current)->sp - 128;
}
return (void __user *)round_down(sp - len, 16);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..33c86c6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -474,7 +474,6 @@ struct thread_struct {
#ifdef CONFIG_X86_32
unsigned long sysenter_cs;
#else
- unsigned long usersp; /* Copy from PDA */
unsigned short es;
unsigned short ds;
unsigned short fsindex;
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index c822b35..271c779 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -140,12 +140,8 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
#endif
}
-#define current_user_stack_pointer() this_cpu_read(old_rsp)
-/* ia32 vs. x32 difference */
-#define compat_user_stack_pointer() \
- (test_thread_flag(TIF_IA32) \
- ? current_pt_regs()->sp \
- : this_cpu_read(old_rsp))
+#define current_user_stack_pointer() current_pt_regs()->sp
+#define compat_user_stack_pointer() current_pt_regs()->sp
#endif
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 5d639a6..efe9780 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -24,8 +24,6 @@
* - CFI macros are used to generate dwarf2 unwind information for better
* backtraces. They don't change any code.
* - ENTRY/END Define functions in the symbol table.
- * - FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK - Fix up the hardware stack
- * frame that is otherwise undefined after a SYSCALL
* - TRACE_IRQ_* - Trace hard interrupt state for lock debugging.
* - idtentry - Define exception entry points.
*/
@@ -121,29 +119,13 @@ ENDPROC(native_usergs_sysret64)
#endif
/*
- * C code is not supposed to know about undefined top of stack. Every time
- * a C function with an pt_regs argument is called from the SYSCALL based
- * fast path FIXUP_TOP_OF_STACK is needed.
- * RESTORE_TOP_OF_STACK syncs the syscall state after any possible ptregs
- * manipulation.
+ * struct pt_regs is not fully populated on the fast path
+ * (rcx, r11, cs and ss are not filled in).
+ * This macro populates segment registers in iret frame.
*/
-
- /* %rsp:at FRAMEEND */
- .macro FIXUP_TOP_OF_STACK tmp offset=0
- movq PER_CPU_VAR(old_rsp),\tmp
- movq \tmp,RSP+\offset(%rsp)
- movq $__USER_DS,SS+\offset(%rsp)
- movq $__USER_CS,CS+\offset(%rsp)
- movq $-1,RCX+\offset(%rsp)
- movq R11+\offset(%rsp),\tmp /* get eflags */
- movq \tmp,EFLAGS+\offset(%rsp)
- .endm
-
- .macro RESTORE_TOP_OF_STACK tmp offset=0
- movq RSP+\offset(%rsp),\tmp
- movq \tmp,PER_CPU_VAR(old_rsp)
- movq EFLAGS+\offset(%rsp),\tmp
- movq \tmp,R11+\offset(%rsp)
+ .macro STORE_IRET_FRAME_CS_SS offset=0
+ movq $__USER_CS,CS+\offset(%rsp)
+ movq $__USER_DS,SS+\offset(%rsp)
.endm
/*
@@ -226,12 +208,16 @@ ENDPROC(native_usergs_sysret64)
* Interrupts are off on entry.
* Only called from user space.
*
- * XXX if we had a free scratch register we could save the RSP into the stack frame
- * and report it properly in ps. Unfortunately we haven't.
- *
* When user can change the frames always force IRET. That is because
* it deals with uncanonical addresses better. SYSRET has trouble
* with them due to bugs in both AMD and Intel CPUs.
+ *
+ * When returning through fast path, userspace sees rcx = return address,
+ * r11 = rflags. When returning through iret (e.g. if audit is active),
+ * these registers may contain garbage.
+ * For ptrace we manage to avoid that: when we hit slow path on entry,
+ * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
+ * If slow path is entered only on exit, there will be garbage.
*/
ENTRY(system_call)
@@ -256,9 +242,16 @@ GLOBAL(system_call_after_swapgs)
*/
ENABLE_INTERRUPTS(CLBR_NONE)
ALLOC_PTREGS_ON_STACK 6*8 /* space for orig_ax and iret frame */
- SAVE_C_REGS
- movq %rax,ORIG_RAX(%rsp)
- movq %rcx,RIP(%rsp)
+ SAVE_C_REGS_EXCEPT_RCX_R11
+ /* Fill orig_ax and, partially, iret frame (rip,flags,rsp) */
+ movq %rax,ORIG_RAX(%rsp)
+ movq %rcx,RIP(%rsp)
+ movq PER_CPU_VAR(old_rsp),%rcx
+ /*movq $__USER_CS,CS(%rsp) - moved out of fast path */
+ movq %r11,EFLAGS(%rsp)
+ movq %rcx,RSP(%rsp)
+ /*movq $__USER_DS,SS(%rsp) - moved out of fast path */
+
CFI_REL_OFFSET rip,RIP
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz tracesys
@@ -276,7 +269,7 @@ from_badsys:
movq %rax,RAX(%rsp)
/*
* Syscall return path ending with SYSRET (fast path)
- * Has incomplete stack frame and undefined top of stack.
+ * Has incompletely filled pt_regs and iret frame's ss and cs.
*/
ret_from_sys_call:
movl $_TIF_ALLWORK_MASK,%edi
@@ -293,11 +286,12 @@ sysret_check:
* sysretq will re-enable interrupts:
*/
TRACE_IRQS_ON
- RESTORE_C_REGS_EXCEPT_RCX
- movq RIP(%rsp),%rcx
+ RESTORE_C_REGS_EXCEPT_RCX_R11
+ movq RIP(%rsp),%rcx
CFI_REGISTER rip,rcx
+ movq EFLAGS(%rsp),%r11
/*CFI_REGISTER rflags,r11*/
- movq PER_CPU_VAR(old_rsp), %rsp
+ movq RSP(%rsp),%rsp
/*
* 64bit SYSRET restores rip from rcx,
* rflags from r11 (but RF and VM bits are forced to 0),
@@ -309,6 +303,7 @@ sysret_check:
/* Handle reschedules */
/* edx: work, edi: workmask */
sysret_careful:
+ STORE_IRET_FRAME_CS_SS
bt $TIF_NEED_RESCHED,%edx
jnc sysret_signal
TRACE_IRQS_ON
@@ -331,7 +326,6 @@ sysret_signal:
* These all wind up with the iret return path anyway,
* so just join that path right now.
*/
- FIXUP_TOP_OF_STACK %r11
jmp int_check_syscall_exit_work
badsys:
@@ -370,15 +364,18 @@ sysret_audit:
jmp sysret_check
#endif /* CONFIG_AUDITSYSCALL */
- /* Do syscall tracing */
+ /* Do entry syscall tracing */
tracesys:
+ movq RIP(%rsp),%rcx
+ STORE_IRET_FRAME_CS_SS /* fill the rest of iret frame */
+ movq %r11,R11(%rsp) /* fill the rest of pt_regs */
+ movq %rcx,RCX(%rsp) /* ditto */
#ifdef CONFIG_AUDITSYSCALL
testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jz auditsys
#endif
SAVE_EXTRA_REGS
movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
- FIXUP_TOP_OF_STACK %rdi
movq %rsp,%rdi
call syscall_trace_enter
/*
@@ -402,7 +399,7 @@ tracesys:
/*
* Syscall return path ending with IRET.
- * Has correct top of stack, but partial stack frame.
+ * Has correct top of stack, but possibly only partially filled pt_regs.
*/
GLOBAL(int_ret_from_sys_call)
DISABLE_INTERRUPTS(CLBR_NONE)
@@ -469,43 +466,15 @@ ENTRY(stub_\func)
CFI_STARTPROC
DEFAULT_FRAME 0, 8 /* offset 8: return address */
SAVE_EXTRA_REGS 8
- FIXUP_TOP_OF_STACK %r11, 8
- call sys_\func
- RESTORE_TOP_OF_STACK %r11, 8
- ret
+ STORE_IRET_FRAME_CS_SS 8 /* not sure we need this. paranoia */
+ jmp sys_\func
CFI_ENDPROC
END(stub_\func)
.endm
- .macro FIXED_FRAME label,func
-ENTRY(\label)
- CFI_STARTPROC
- DEFAULT_FRAME 0, 8 /* offset 8: return address */
- FIXUP_TOP_OF_STACK %r11, 8
- call \func
- RESTORE_TOP_OF_STACK %r11, 8
- ret
- CFI_ENDPROC
-END(\label)
- .endm
-
FORK_LIKE clone
FORK_LIKE fork
FORK_LIKE vfork
- FIXED_FRAME stub_iopl, sys_iopl
-
-ENTRY(stub_execve)
- CFI_STARTPROC
- addq $8, %rsp
- DEFAULT_FRAME 0
- SAVE_EXTRA_REGS
- FIXUP_TOP_OF_STACK %r11
- call sys_execve
- movq %rax,RAX(%rsp)
- RESTORE_EXTRA_REGS
- jmp int_ret_from_sys_call
- CFI_ENDPROC
-END(stub_execve)
/*
* sigreturn is special because it needs to restore all registers on return.
@@ -516,25 +485,35 @@ ENTRY(stub_rt_sigreturn)
addq $8, %rsp
DEFAULT_FRAME 0
SAVE_EXTRA_REGS
- FIXUP_TOP_OF_STACK %r11
+ STORE_IRET_FRAME_CS_SS
call sys_rt_sigreturn
- movq %rax,RAX(%rsp) # fixme, this could be done at the higher layer
+stub_return:
+ movq %rax,RAX(%rsp)
RESTORE_EXTRA_REGS
jmp int_ret_from_sys_call
CFI_ENDPROC
END(stub_rt_sigreturn)
+ENTRY(stub_execve)
+ CFI_STARTPROC
+ addq $8, %rsp
+ DEFAULT_FRAME 0
+ SAVE_EXTRA_REGS
+ STORE_IRET_FRAME_CS_SS
+ call sys_execve
+ jmp stub_return
+ CFI_ENDPROC
+END(stub_execve)
+
#ifdef CONFIG_X86_X32_ABI
ENTRY(stub_x32_rt_sigreturn)
CFI_STARTPROC
addq $8, %rsp
DEFAULT_FRAME 0
SAVE_EXTRA_REGS
- FIXUP_TOP_OF_STACK %r11
+ STORE_IRET_FRAME_CS_SS
call sys32_x32_rt_sigreturn
- movq %rax,RAX(%rsp) # fixme, this could be done at the higher layer
- RESTORE_EXTRA_REGS
- jmp int_ret_from_sys_call
+ jmp stub_return
CFI_ENDPROC
END(stub_x32_rt_sigreturn)
@@ -543,15 +522,11 @@ ENTRY(stub_x32_execve)
addq $8, %rsp
DEFAULT_FRAME 0
SAVE_EXTRA_REGS
- FIXUP_TOP_OF_STACK %r11
+ STORE_IRET_FRAME_CS_SS
call compat_sys_execve
- RESTORE_TOP_OF_STACK %r11
- movq %rax,RAX(%rsp)
- RESTORE_EXTRA_REGS
- jmp int_ret_from_sys_call
+ jmp stub_return
CFI_ENDPROC
END(stub_x32_execve)
-
#endif
/*
@@ -576,10 +551,13 @@ ENTRY(ret_from_fork)
testl $3, CS(%rsp) # from kernel_thread?
jz 1f
+ /*
+ * TODO: can we instead check CS(%rsp) == __USER32_CS below?
+ * We won't need GET_THREAD_INFO(%rcx) then...
+ */
testl $_TIF_IA32, TI_flags(%rcx) # 32-bit compat task needs IRET
jnz int_ret_from_sys_call
- RESTORE_TOP_OF_STACK %rdi
jmp ret_from_sys_call # go to the SYSRET fastpath
1:
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 2105487e..5a7ba74 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -161,7 +161,6 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
childregs = task_pt_regs(p);
p->thread.sp = (unsigned long) childregs;
- p->thread.usersp = me->thread.usersp;
set_tsk_thread_flag(p, TIF_FORK);
p->thread.fpu_counter = 0;
p->thread.io_bitmap_ptr = NULL;
@@ -238,10 +237,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
loadsegment(es, _ds);
loadsegment(ds, _ds);
load_gs_index(0);
- current->thread.usersp = new_sp;
regs->ip = new_ip;
regs->sp = new_sp;
- this_cpu_write(old_rsp, new_sp);
regs->cs = _cs;
regs->ss = _ss;
regs->flags = X86_EFLAGS_IF;
@@ -359,8 +356,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
/*
* Switch the PDA and FPU contexts.
*/
- prev->usersp = this_cpu_read(old_rsp);
- this_cpu_write(old_rsp, next->usersp);
this_cpu_write(current_task, next_p);
/*
@@ -559,6 +554,5 @@ long sys_arch_prctl(int code, unsigned long addr)
unsigned long KSTK_ESP(struct task_struct *task)
{
- return (test_tsk_thread_flag(task, TIF_IA32)) ?
- (task_pt_regs(task)->sp) : ((task)->thread.usersp);
+ return task_pt_regs(task)->sp;
}
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index ec255a1..8b745ae 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -178,7 +178,7 @@
169 common reboot sys_reboot
170 common sethostname sys_sethostname
171 common setdomainname sys_setdomainname
-172 common iopl stub_iopl
+172 common iopl sys_iopl
173 common ioperm sys_ioperm
174 64 create_module
175 common init_module sys_init_module
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index f2f0723..5c81ed2 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -16,7 +16,7 @@
*/
/* Not going to be implemented by UML, since we have no hardware. */
-#define stub_iopl sys_ni_syscall
+#define sys_iopl sys_ni_syscall
#define sys_ioperm sys_ni_syscall
/*
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 16/17] x86: entry_64.S: trivial optimization for ENOSYS
2014-08-08 17:44 ` [PATCH 16/17] x86: entry_64.S: trivial optimization for ENOSYS Denys Vlasenko
@ 2014-08-08 22:48 ` Andy Lutomirski
0 siblings, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2014-08-08 22:48 UTC (permalink / raw)
To: Denys Vlasenko
Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Frederic Weisbecker, X86 ML, Alexei Starovoitov, Will Drewry,
Kees Cook
On Sat, Aug 9, 2014 at 2:44 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> This code uses a slightly shorter insn.
See this for an alternative approach:
http://thread.gmane.org/gmane.linux.kernel.cross-arch/24167
--Andy
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath
2014-08-08 17:44 ` [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath Denys Vlasenko
@ 2014-08-08 22:59 ` Andy Lutomirski
2014-08-10 15:00 ` Denys Vlasenko
2014-08-10 18:47 ` [PATCH 17/17 v2] " Denys Vlasenko
1 sibling, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-08-08 22:59 UTC (permalink / raw)
To: Denys Vlasenko
Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Frederic Weisbecker, X86 ML, Alexei Starovoitov, Will Drewry,
Kees Cook
On Sat, Aug 9, 2014 at 2:44 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> Before this patch, rcx and r11 were saved in pt_regs->rcx
> and pt_regs->r11. Which looks natural, but requires messy
> shuffling to/from iret stack whenever ptrace or e.g. iopl
> wants to modify return address or flags - because that's
> how these registers are used by SYSCALL/SYSRET.
>
> This patch saves rcx and r11 in pt_regs->rip and pt_regs->flags,
> and uses these values for SYSRET64 insn. Shuffling is eliminated.
>
> On slow path, the values are saved in both locations - thus, ptrace
> can modify rcx, and on syscall exit rcx will be different from
> return address... don't see why that can be useful but anyway,
> it works.
>
> The lazy store of pt_regs->cs and pt_regs->ss is retained -
> tests have shown that these insns do take ~2 cycles on fast path.
>
> FIXUP_TOP_OF_STACK and RESTORE_TOP_OF_STACK macros are replaced
> by a simpler single macro, STORE_IRET_FRAME_CS_SS.
>
> stub_iopl is no longer needed: pt_regs->flags needs no fixing up.
>
> PER_CPU(old_rsp) usage is simplified - now it is used only
> as temp storage, and userspace stack pointer is immediately stored
> in pt_regs->sp on syscall entry, instead of being used later,
> on syscall exit. This allows to get rid of thread_struct::usersp.
>
> Testing shows that syscall fast path is ~54.3 ns before
> and after the patch (on 2.7 GHz Sandy Bridge CPU).
>
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: X86 ML <x86@kernel.org>
> CC: Alexei Starovoitov <ast@plumgrid.com>
> CC: Will Drewry <wad@chromium.org>
> CC: Kees Cook <keescook@chromium.org>
> CC: linux-kernel@vger.kernel.org
> ---
> arch/x86/include/asm/calling.h | 18 +++--
> arch/x86/include/asm/compat.h | 2 +-
> arch/x86/include/asm/processor.h | 1 -
> arch/x86/include/asm/ptrace.h | 8 +--
> arch/x86/kernel/entry_64.S | 140 +++++++++++++++++----------------------
> arch/x86/kernel/process_64.c | 8 +--
> arch/x86/syscalls/syscall_64.tbl | 2 +-
> arch/x86/um/sys_call_table_64.c | 2 +-
> 8 files changed, 78 insertions(+), 103 deletions(-)
>
> diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
> index aa9113e..7afbcea 100644
> --- a/arch/x86/include/asm/calling.h
> +++ b/arch/x86/include/asm/calling.h
> @@ -95,7 +95,7 @@ For 32-bit we have the following conventions - kernel is built with
> CFI_ADJUST_CFA_OFFSET 15*8+\addskip
> .endm
>
> - .macro SAVE_C_REGS_HELPER offset=0 rcx=1 r8plus=1
> + .macro SAVE_C_REGS_HELPER offset=0 rcx=1 r8910=1 r11=1
> movq_cfi rdi, 14*8+\offset
> movq_cfi rsi, 13*8+\offset
> movq_cfi rdx, 12*8+\offset
> @@ -103,21 +103,26 @@ For 32-bit we have the following conventions - kernel is built with
> movq_cfi rcx, 11*8+\offset
> .endif
> movq_cfi rax, 10*8+\offset
> - .if \r8plus
> + .if \r8910
> movq_cfi r8, 9*8+\offset
> movq_cfi r9, 8*8+\offset
> movq_cfi r10, 7*8+\offset
> + .endif
> + .if \r11
> movq_cfi r11, 6*8+\offset
> .endif
> .endm
> .macro SAVE_C_REGS offset=0
> - SAVE_C_REGS_HELPER \offset, 1, 1
> + SAVE_C_REGS_HELPER \offset, 1, 1, 1
> .endm
> .macro SAVE_C_REGS_EXCEPT_R891011
> - SAVE_C_REGS_HELPER 0, 1, 0
> + SAVE_C_REGS_HELPER 0, 1, 0, 0
> .endm
> .macro SAVE_C_REGS_EXCEPT_RCX_R891011
> - SAVE_C_REGS_HELPER 0, 0, 0
> + SAVE_C_REGS_HELPER 0, 0, 0, 0
> + .endm
> + .macro SAVE_C_REGS_EXCEPT_RCX_R11
> + SAVE_C_REGS_HELPER 0, 0, 1, 0
> .endm
>
> .macro SAVE_EXTRA_REGS offset=0
> @@ -171,6 +176,9 @@ For 32-bit we have the following conventions - kernel is built with
> .macro RESTORE_C_REGS_EXCEPT_RCX
> RESTORE_C_REGS_HELPER 1,0,1,1,1
> .endm
> + .macro RESTORE_C_REGS_EXCEPT_RCX_R11
> + RESTORE_C_REGS_HELPER 1,0,0,1,1
> + .endm
> .macro RESTORE_RSI_RDI
> RESTORE_C_REGS_HELPER 0,0,0,0,0
> .endm
> diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
> index 59c6c40..acdee09 100644
> --- a/arch/x86/include/asm/compat.h
> +++ b/arch/x86/include/asm/compat.h
> @@ -301,7 +301,7 @@ static inline void __user *arch_compat_alloc_user_space(long len)
> sp = task_pt_regs(current)->sp;
> } else {
> /* -128 for the x32 ABI redzone */
> - sp = this_cpu_read(old_rsp) - 128;
> + sp = task_pt_regs(current)->sp - 128;
> }
>
> return (void __user *)round_down(sp - len, 16);
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index a4ea023..33c86c6 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -474,7 +474,6 @@ struct thread_struct {
> #ifdef CONFIG_X86_32
> unsigned long sysenter_cs;
> #else
> - unsigned long usersp; /* Copy from PDA */
> unsigned short es;
> unsigned short ds;
> unsigned short fsindex;
> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> index c822b35..271c779 100644
> --- a/arch/x86/include/asm/ptrace.h
> +++ b/arch/x86/include/asm/ptrace.h
> @@ -140,12 +140,8 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
> #endif
> }
>
> -#define current_user_stack_pointer() this_cpu_read(old_rsp)
> -/* ia32 vs. x32 difference */
> -#define compat_user_stack_pointer() \
> - (test_thread_flag(TIF_IA32) \
> - ? current_pt_regs()->sp \
> - : this_cpu_read(old_rsp))
> +#define current_user_stack_pointer() current_pt_regs()->sp
> +#define compat_user_stack_pointer() current_pt_regs()->sp
> #endif
>
> #ifdef CONFIG_X86_32
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 5d639a6..efe9780 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -24,8 +24,6 @@
> * - CFI macros are used to generate dwarf2 unwind information for better
> * backtraces. They don't change any code.
> * - ENTRY/END Define functions in the symbol table.
> - * - FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK - Fix up the hardware stack
> - * frame that is otherwise undefined after a SYSCALL
> * - TRACE_IRQ_* - Trace hard interrupt state for lock debugging.
> * - idtentry - Define exception entry points.
> */
> @@ -121,29 +119,13 @@ ENDPROC(native_usergs_sysret64)
> #endif
>
> /*
> - * C code is not supposed to know about undefined top of stack. Every time
> - * a C function with an pt_regs argument is called from the SYSCALL based
> - * fast path FIXUP_TOP_OF_STACK is needed.
> - * RESTORE_TOP_OF_STACK syncs the syscall state after any possible ptregs
> - * manipulation.
> + * struct pt_regs is not fully populated on the fast path
> + * (rcx, r11, cs and ss are not filled in).
> + * This macro populates segment registers in iret frame.
> */
> -
> - /* %rsp:at FRAMEEND */
> - .macro FIXUP_TOP_OF_STACK tmp offset=0
> - movq PER_CPU_VAR(old_rsp),\tmp
> - movq \tmp,RSP+\offset(%rsp)
> - movq $__USER_DS,SS+\offset(%rsp)
> - movq $__USER_CS,CS+\offset(%rsp)
> - movq $-1,RCX+\offset(%rsp)
> - movq R11+\offset(%rsp),\tmp /* get eflags */
> - movq \tmp,EFLAGS+\offset(%rsp)
> - .endm
> -
> - .macro RESTORE_TOP_OF_STACK tmp offset=0
> - movq RSP+\offset(%rsp),\tmp
> - movq \tmp,PER_CPU_VAR(old_rsp)
> - movq EFLAGS+\offset(%rsp),\tmp
> - movq \tmp,R11+\offset(%rsp)
> + .macro STORE_IRET_FRAME_CS_SS offset=0
> + movq $__USER_CS,CS+\offset(%rsp)
> + movq $__USER_DS,SS+\offset(%rsp)
> .endm
>
> /*
> @@ -226,12 +208,16 @@ ENDPROC(native_usergs_sysret64)
> * Interrupts are off on entry.
> * Only called from user space.
> *
> - * XXX if we had a free scratch register we could save the RSP into the stack frame
> - * and report it properly in ps. Unfortunately we haven't.
> - *
> * When user can change the frames always force IRET. That is because
> * it deals with uncanonical addresses better. SYSRET has trouble
> * with them due to bugs in both AMD and Intel CPUs.
> + *
> + * When returning through fast path, userspace sees rcx = return address,
> + * r11 = rflags. When returning through iret (e.g. if audit is active),
> + * these registers may contain garbage.
> + * For ptrace we manage to avoid that: when we hit slow path on entry,
> + * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
> + * If slow path is entered only on exit, there will be garbage.
I don't like this. At least the current code puts something
deterministic in there (-1) for the slow path, even though it's wrong
and makes the slow path behave visibly differently from the fast path.
Leaking uninitialized data here is extra bad, though. Keep in mind
that, when a syscall entry is interrupted before fully setting up
pt_regs, the interrupt frame overlaps task_pt_regs, so it's possible,
depending on the stack slot ordering, for a kernel secret
(kernel_stack?) to end up somewhere in task_pt_regs.
--Andy
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/17] x86: entry.S optimizations
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
` (16 preceding siblings ...)
2014-08-08 17:44 ` [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath Denys Vlasenko
@ 2014-08-09 0:27 ` H. Peter Anvin
17 siblings, 0 replies; 28+ messages in thread
From: H. Peter Anvin @ 2014-08-09 0:27 UTC (permalink / raw)
To: Denys Vlasenko, linux-kernel
Cc: Linus Torvalds, Oleg Nesterov, Andy Lutomirski,
Frederic Weisbecker, X86 ML, Alexei Starovoitov, Will Drewry,
Kees Cook
On 08/08/2014 10:44 AM, Denys Vlasenko wrote:
> Version 4 of the patchset.
>
> Please consider applying at least two first patches, they are definitely safe,
> and the second one fixes a latent bug.
>
Please note that the 3.17 merge window is currently open. This is not
3.17 material at this point, so I'm not going to apply anything until
after Kernel Summit.
In other words, a few weeks to work out the kinks.
-hpa
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath
2014-08-08 22:59 ` Andy Lutomirski
@ 2014-08-10 15:00 ` Denys Vlasenko
2014-08-10 22:42 ` Andy Lutomirski
0 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-10 15:00 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Frederic Weisbecker, X86 ML, Alexei Starovoitov, Will Drewry,
Kees Cook
On 08/09/2014 12:59 AM, Andy Lutomirski wrote:
> On Sat, Aug 9, 2014 at 2:44 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> Before this patch, rcx and r11 were saved in pt_regs->rcx
>> and pt_regs->r11. Which looks natural, but requires messy
>> shuffling to/from iret stack whenever ptrace or e.g. iopl
>> wants to modify return address or flags - because that's
>> how these registers are used by SYSCALL/SYSRET.
>>
>> This patch saves rcx and r11 in pt_regs->rip and pt_regs->flags,
>> and uses these values for SYSRET64 insn. Shuffling is eliminated.
>>
>> On slow path, the values are saved in both locations - thus, ptrace
>> can modify rcx, and on syscall exit rcx will be different from
>> return address... don't see why that can be useful but anyway,
>> it works.
>>
>> The lazy store of pt_regs->cs and pt_regs->ss is retained -
>> tests have shown that these insns do take ~2 cycles on fast path.
>>
>> FIXUP_TOP_OF_STACK and RESTORE_TOP_OF_STACK macros are replaced
>> by a simpler single macro, STORE_IRET_FRAME_CS_SS.
>>
>> stub_iopl is no longer needed: pt_regs->flags needs no fixing up.
>>
>> PER_CPU(old_rsp) usage is simplified - now it is used only
>> as temp storage, and userspace stack pointer is immediately stored
>> in pt_regs->sp on syscall entry, instead of being used later,
>> on syscall exit. This allows to get rid of thread_struct::usersp.
>>
>> Testing shows that syscall fast path is ~54.3 ns before
>> and after the patch (on 2.7 GHz Sandy Bridge CPU).
>>
>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>> CC: Linus Torvalds <torvalds@linux-foundation.org>
>> CC: Oleg Nesterov <oleg@redhat.com>
>> CC: "H. Peter Anvin" <hpa@zytor.com>
>> CC: Andy Lutomirski <luto@amacapital.net>
>> CC: Frederic Weisbecker <fweisbec@gmail.com>
>> CC: X86 ML <x86@kernel.org>
>> CC: Alexei Starovoitov <ast@plumgrid.com>
>> CC: Will Drewry <wad@chromium.org>
>> CC: Kees Cook <keescook@chromium.org>
>> CC: linux-kernel@vger.kernel.org
>> ---
>> arch/x86/include/asm/calling.h | 18 +++--
>> arch/x86/include/asm/compat.h | 2 +-
>> arch/x86/include/asm/processor.h | 1 -
>> arch/x86/include/asm/ptrace.h | 8 +--
>> arch/x86/kernel/entry_64.S | 140 +++++++++++++++++----------------------
>> arch/x86/kernel/process_64.c | 8 +--
>> arch/x86/syscalls/syscall_64.tbl | 2 +-
>> arch/x86/um/sys_call_table_64.c | 2 +-
>> 8 files changed, 78 insertions(+), 103 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
>> index aa9113e..7afbcea 100644
>> --- a/arch/x86/include/asm/calling.h
>> +++ b/arch/x86/include/asm/calling.h
>> @@ -95,7 +95,7 @@ For 32-bit we have the following conventions - kernel is built with
>> CFI_ADJUST_CFA_OFFSET 15*8+\addskip
>> .endm
>>
>> - .macro SAVE_C_REGS_HELPER offset=0 rcx=1 r8plus=1
>> + .macro SAVE_C_REGS_HELPER offset=0 rcx=1 r8910=1 r11=1
>> movq_cfi rdi, 14*8+\offset
>> movq_cfi rsi, 13*8+\offset
>> movq_cfi rdx, 12*8+\offset
>> @@ -103,21 +103,26 @@ For 32-bit we have the following conventions - kernel is built with
>> movq_cfi rcx, 11*8+\offset
>> .endif
>> movq_cfi rax, 10*8+\offset
>> - .if \r8plus
>> + .if \r8910
>> movq_cfi r8, 9*8+\offset
>> movq_cfi r9, 8*8+\offset
>> movq_cfi r10, 7*8+\offset
>> + .endif
>> + .if \r11
>> movq_cfi r11, 6*8+\offset
>> .endif
>> .endm
>> .macro SAVE_C_REGS offset=0
>> - SAVE_C_REGS_HELPER \offset, 1, 1
>> + SAVE_C_REGS_HELPER \offset, 1, 1, 1
>> .endm
>> .macro SAVE_C_REGS_EXCEPT_R891011
>> - SAVE_C_REGS_HELPER 0, 1, 0
>> + SAVE_C_REGS_HELPER 0, 1, 0, 0
>> .endm
>> .macro SAVE_C_REGS_EXCEPT_RCX_R891011
>> - SAVE_C_REGS_HELPER 0, 0, 0
>> + SAVE_C_REGS_HELPER 0, 0, 0, 0
>> + .endm
>> + .macro SAVE_C_REGS_EXCEPT_RCX_R11
>> + SAVE_C_REGS_HELPER 0, 0, 1, 0
>> .endm
>>
>> .macro SAVE_EXTRA_REGS offset=0
>> @@ -171,6 +176,9 @@ For 32-bit we have the following conventions - kernel is built with
>> .macro RESTORE_C_REGS_EXCEPT_RCX
>> RESTORE_C_REGS_HELPER 1,0,1,1,1
>> .endm
>> + .macro RESTORE_C_REGS_EXCEPT_RCX_R11
>> + RESTORE_C_REGS_HELPER 1,0,0,1,1
>> + .endm
>> .macro RESTORE_RSI_RDI
>> RESTORE_C_REGS_HELPER 0,0,0,0,0
>> .endm
>> diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
>> index 59c6c40..acdee09 100644
>> --- a/arch/x86/include/asm/compat.h
>> +++ b/arch/x86/include/asm/compat.h
>> @@ -301,7 +301,7 @@ static inline void __user *arch_compat_alloc_user_space(long len)
>> sp = task_pt_regs(current)->sp;
>> } else {
>> /* -128 for the x32 ABI redzone */
>> - sp = this_cpu_read(old_rsp) - 128;
>> + sp = task_pt_regs(current)->sp - 128;
>> }
>>
>> return (void __user *)round_down(sp - len, 16);
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index a4ea023..33c86c6 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -474,7 +474,6 @@ struct thread_struct {
>> #ifdef CONFIG_X86_32
>> unsigned long sysenter_cs;
>> #else
>> - unsigned long usersp; /* Copy from PDA */
>> unsigned short es;
>> unsigned short ds;
>> unsigned short fsindex;
>> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
>> index c822b35..271c779 100644
>> --- a/arch/x86/include/asm/ptrace.h
>> +++ b/arch/x86/include/asm/ptrace.h
>> @@ -140,12 +140,8 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
>> #endif
>> }
>>
>> -#define current_user_stack_pointer() this_cpu_read(old_rsp)
>> -/* ia32 vs. x32 difference */
>> -#define compat_user_stack_pointer() \
>> - (test_thread_flag(TIF_IA32) \
>> - ? current_pt_regs()->sp \
>> - : this_cpu_read(old_rsp))
>> +#define current_user_stack_pointer() current_pt_regs()->sp
>> +#define compat_user_stack_pointer() current_pt_regs()->sp
>> #endif
>>
>> #ifdef CONFIG_X86_32
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index 5d639a6..efe9780 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -24,8 +24,6 @@
>> * - CFI macros are used to generate dwarf2 unwind information for better
>> * backtraces. They don't change any code.
>> * - ENTRY/END Define functions in the symbol table.
>> - * - FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK - Fix up the hardware stack
>> - * frame that is otherwise undefined after a SYSCALL
>> * - TRACE_IRQ_* - Trace hard interrupt state for lock debugging.
>> * - idtentry - Define exception entry points.
>> */
>> @@ -121,29 +119,13 @@ ENDPROC(native_usergs_sysret64)
>> #endif
>>
>> /*
>> - * C code is not supposed to know about undefined top of stack. Every time
>> - * a C function with an pt_regs argument is called from the SYSCALL based
>> - * fast path FIXUP_TOP_OF_STACK is needed.
>> - * RESTORE_TOP_OF_STACK syncs the syscall state after any possible ptregs
>> - * manipulation.
>> + * struct pt_regs is not fully populated on the fast path
>> + * (rcx, r11, cs and ss are not filled in).
>> + * This macro populates segment registers in iret frame.
>> */
>> -
>> - /* %rsp:at FRAMEEND */
>> - .macro FIXUP_TOP_OF_STACK tmp offset=0
>> - movq PER_CPU_VAR(old_rsp),\tmp
>> - movq \tmp,RSP+\offset(%rsp)
>> - movq $__USER_DS,SS+\offset(%rsp)
>> - movq $__USER_CS,CS+\offset(%rsp)
>> - movq $-1,RCX+\offset(%rsp)
>> - movq R11+\offset(%rsp),\tmp /* get eflags */
>> - movq \tmp,EFLAGS+\offset(%rsp)
>> - .endm
>> -
>> - .macro RESTORE_TOP_OF_STACK tmp offset=0
>> - movq RSP+\offset(%rsp),\tmp
>> - movq \tmp,PER_CPU_VAR(old_rsp)
>> - movq EFLAGS+\offset(%rsp),\tmp
>> - movq \tmp,R11+\offset(%rsp)
>> + .macro STORE_IRET_FRAME_CS_SS offset=0
>> + movq $__USER_CS,CS+\offset(%rsp)
>> + movq $__USER_DS,SS+\offset(%rsp)
>> .endm
>>
>> /*
>> @@ -226,12 +208,16 @@ ENDPROC(native_usergs_sysret64)
>> * Interrupts are off on entry.
>> * Only called from user space.
>> *
>> - * XXX if we had a free scratch register we could save the RSP into the stack frame
>> - * and report it properly in ps. Unfortunately we haven't.
>> - *
>> * When user can change the frames always force IRET. That is because
>> * it deals with uncanonical addresses better. SYSRET has trouble
>> * with them due to bugs in both AMD and Intel CPUs.
>> + *
>> + * When returning through fast path, userspace sees rcx = return address,
>> + * r11 = rflags. When returning through iret (e.g. if audit is active),
>> + * these registers may contain garbage.
>> + * For ptrace we manage to avoid that: when we hit slow path on entry,
>> + * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
>> + * If slow path is entered only on exit, there will be garbage.
>
> I don't like this. At least the current code puts something
> deterministic in there (-1) for the slow path, even though it's wrong
> and makes the slow path behave visibly differently from the fast path.
>
> Leaking uninitialized data here is extra bad, though. Keep in mind
> that, when a syscall entry is interrupted before fully setting up
> pt_regs, the interrupt frame overlaps task_pt_regs, so it's possible,
> depending on the stack slot ordering, for a kernel secret
> (kernel_stack?) to end up somewhere in task_pt_regs.
It's easy to fix. We jump off fast path to slow path here:
movl TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS),%edx
andl %edi,%edx
jnz sysret_careful
This is the only use of "sysret_careful" label.
Therefore, there we don't need to think about any other scenarios
besides "we are returning from syscall here".
My patch only fixes up segment regs:
/* Handle reschedules */
/* edx: work, edi: workmask */
sysret_careful:
+ STORE_IRET_FRAME_CS_SS
bt $TIF_NEED_RESCHED,%edx
pt_regs->rcx and ->r11 aren't initialized.
We can fix that e.g. by adding here four more insns:
movq RIP(%rsp),%rcx
movq %rcx,RCX(%rsp)
movq EFLAGS(%rsp),%rcx
movq %rcx,R11(%rsp)
similar to what my patch did on slow path on entry (see changes at
"tracesys" label).
Or we can just bite the bullet and save rcx and r11 on fast path
(there it will require just two insns). +2 cycles on most CPUs.
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 17/17 v2] x86: simplify iret stack handling on SYSCALL64 fastpath
2014-08-08 17:44 ` [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath Denys Vlasenko
2014-08-08 22:59 ` Andy Lutomirski
@ 2014-08-10 18:47 ` Denys Vlasenko
1 sibling, 0 replies; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-10 18:47 UTC (permalink / raw)
To: linux-kernel
Cc: Denys Vlasenko, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, X86 ML, Alexei Starovoitov,
Will Drewry, Kees Cook
Before this patch, rcx and r11 were saved in pt_regs->rcx
and pt_regs->r11. Which looks natural, but requires messy
shuffling to/from iret frame whenever ptrace or e.g. sys_iopl
wants to modify return address or flags - because that's
how these registers are used by SYSCALL/SYSRET.
This patch saves rcx and r11 in pt_regs->rip and pt_regs->flags,
and uses these values for SYSRET64 insn. Shuffling is eliminated.
On slow path, the values are saved in both locations - thus, ptrace
can modify rcx, and on syscall exit rcx will be different from
return address... don't see why that can be useful but anyway,
it works.
The lazy store of pt_regs->cs and pt_regs->ss is retained -
tests have shown that these insns do take ~2 cycles on fast path.
FIXUP_TOP_OF_STACK and RESTORE_TOP_OF_STACK macros are replaced
by a simpler single macro, STORE_IRET_FRAME_CS_SS.
stub_iopl is no longer needed: pt_regs->flags needs no fixing up.
PER_CPU(old_rsp) usage is simplified - now it is used only
as temp storage, and userspace stack pointer is immediately stored
in pt_regs->sp on syscall entry, instead of being used later,
on syscall exit. This allows to get rid of thread_struct::usersp.
Testing shows that syscall fast path is ~54.3 ns before
and after the patch (on 2.7 GHz Sandy Bridge CPU).
Changes since v1:
= added code to fill pt_regs->rcx and ->r11 on syscall exit slowpath.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: X86 ML <x86@kernel.org>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: linux-kernel@vger.kernel.org
---
arch/x86/include/asm/calling.h | 18 +++--
arch/x86/include/asm/compat.h | 2 +-
arch/x86/include/asm/processor.h | 1 -
arch/x86/include/asm/ptrace.h | 8 +--
arch/x86/kernel/entry_64.S | 139 ++++++++++++++++-----------------------
arch/x86/kernel/process_64.c | 8 +--
arch/x86/syscalls/syscall_64.tbl | 2 +-
arch/x86/um/sys_call_table_64.c | 2 +-
8 files changed, 76 insertions(+), 104 deletions(-)
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index aa9113e..7afbcea 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -95,7 +95,7 @@ For 32-bit we have the following conventions - kernel is built with
CFI_ADJUST_CFA_OFFSET 15*8+\addskip
.endm
- .macro SAVE_C_REGS_HELPER offset=0 rcx=1 r8plus=1
+ .macro SAVE_C_REGS_HELPER offset=0 rcx=1 r8910=1 r11=1
movq_cfi rdi, 14*8+\offset
movq_cfi rsi, 13*8+\offset
movq_cfi rdx, 12*8+\offset
@@ -103,21 +103,26 @@ For 32-bit we have the following conventions - kernel is built with
movq_cfi rcx, 11*8+\offset
.endif
movq_cfi rax, 10*8+\offset
- .if \r8plus
+ .if \r8910
movq_cfi r8, 9*8+\offset
movq_cfi r9, 8*8+\offset
movq_cfi r10, 7*8+\offset
+ .endif
+ .if \r11
movq_cfi r11, 6*8+\offset
.endif
.endm
.macro SAVE_C_REGS offset=0
- SAVE_C_REGS_HELPER \offset, 1, 1
+ SAVE_C_REGS_HELPER \offset, 1, 1, 1
.endm
.macro SAVE_C_REGS_EXCEPT_R891011
- SAVE_C_REGS_HELPER 0, 1, 0
+ SAVE_C_REGS_HELPER 0, 1, 0, 0
.endm
.macro SAVE_C_REGS_EXCEPT_RCX_R891011
- SAVE_C_REGS_HELPER 0, 0, 0
+ SAVE_C_REGS_HELPER 0, 0, 0, 0
+ .endm
+ .macro SAVE_C_REGS_EXCEPT_RCX_R11
+ SAVE_C_REGS_HELPER 0, 0, 1, 0
.endm
.macro SAVE_EXTRA_REGS offset=0
@@ -171,6 +176,9 @@ For 32-bit we have the following conventions - kernel is built with
.macro RESTORE_C_REGS_EXCEPT_RCX
RESTORE_C_REGS_HELPER 1,0,1,1,1
.endm
+ .macro RESTORE_C_REGS_EXCEPT_RCX_R11
+ RESTORE_C_REGS_HELPER 1,0,0,1,1
+ .endm
.macro RESTORE_RSI_RDI
RESTORE_C_REGS_HELPER 0,0,0,0,0
.endm
diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 59c6c40..acdee09 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -301,7 +301,7 @@ static inline void __user *arch_compat_alloc_user_space(long len)
sp = task_pt_regs(current)->sp;
} else {
/* -128 for the x32 ABI redzone */
- sp = this_cpu_read(old_rsp) - 128;
+ sp = task_pt_regs(current)->sp - 128;
}
return (void __user *)round_down(sp - len, 16);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..33c86c6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -474,7 +474,6 @@ struct thread_struct {
#ifdef CONFIG_X86_32
unsigned long sysenter_cs;
#else
- unsigned long usersp; /* Copy from PDA */
unsigned short es;
unsigned short ds;
unsigned short fsindex;
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index c822b35..271c779 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -140,12 +140,8 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
#endif
}
-#define current_user_stack_pointer() this_cpu_read(old_rsp)
-/* ia32 vs. x32 difference */
-#define compat_user_stack_pointer() \
- (test_thread_flag(TIF_IA32) \
- ? current_pt_regs()->sp \
- : this_cpu_read(old_rsp))
+#define current_user_stack_pointer() current_pt_regs()->sp
+#define compat_user_stack_pointer() current_pt_regs()->sp
#endif
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 5d639a6..45e0797 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -24,8 +24,6 @@
* - CFI macros are used to generate dwarf2 unwind information for better
* backtraces. They don't change any code.
* - ENTRY/END Define functions in the symbol table.
- * - FIXUP_TOP_OF_STACK/RESTORE_TOP_OF_STACK - Fix up the hardware stack
- * frame that is otherwise undefined after a SYSCALL
* - TRACE_IRQ_* - Trace hard interrupt state for lock debugging.
* - idtentry - Define exception entry points.
*/
@@ -121,29 +119,13 @@ ENDPROC(native_usergs_sysret64)
#endif
/*
- * C code is not supposed to know about undefined top of stack. Every time
- * a C function with an pt_regs argument is called from the SYSCALL based
- * fast path FIXUP_TOP_OF_STACK is needed.
- * RESTORE_TOP_OF_STACK syncs the syscall state after any possible ptregs
- * manipulation.
+ * struct pt_regs is not fully populated on the fast path
+ * (rcx, r11, cs and ss are not filled in).
+ * This macro populates segment registers in iret frame.
*/
-
- /* %rsp:at FRAMEEND */
- .macro FIXUP_TOP_OF_STACK tmp offset=0
- movq PER_CPU_VAR(old_rsp),\tmp
- movq \tmp,RSP+\offset(%rsp)
- movq $__USER_DS,SS+\offset(%rsp)
- movq $__USER_CS,CS+\offset(%rsp)
- movq $-1,RCX+\offset(%rsp)
- movq R11+\offset(%rsp),\tmp /* get eflags */
- movq \tmp,EFLAGS+\offset(%rsp)
- .endm
-
- .macro RESTORE_TOP_OF_STACK tmp offset=0
- movq RSP+\offset(%rsp),\tmp
- movq \tmp,PER_CPU_VAR(old_rsp)
- movq EFLAGS+\offset(%rsp),\tmp
- movq \tmp,R11+\offset(%rsp)
+ .macro STORE_IRET_FRAME_CS_SS offset=0
+ movq $__USER_CS,CS+\offset(%rsp)
+ movq $__USER_DS,SS+\offset(%rsp)
.endm
/*
@@ -226,9 +208,6 @@ ENDPROC(native_usergs_sysret64)
* Interrupts are off on entry.
* Only called from user space.
*
- * XXX if we had a free scratch register we could save the RSP into the stack frame
- * and report it properly in ps. Unfortunately we haven't.
- *
* When user can change the frames always force IRET. That is because
* it deals with uncanonical addresses better. SYSRET has trouble
* with them due to bugs in both AMD and Intel CPUs.
@@ -255,11 +234,18 @@ GLOBAL(system_call_after_swapgs)
* and short:
*/
ENABLE_INTERRUPTS(CLBR_NONE)
- ALLOC_PTREGS_ON_STACK 6*8 /* space for orig_ax and iret frame */
- SAVE_C_REGS
- movq %rax,ORIG_RAX(%rsp)
- movq %rcx,RIP(%rsp)
+ ALLOC_PTREGS_ON_STACK 6*8 /* 6*8: space for orig_ax and iret frame */
+ SAVE_C_REGS_EXCEPT_RCX_R11
+ /* Fill orig_ax and, partially, iret frame (rip,flags,rsp) */
+ movq %rax,ORIG_RAX(%rsp)
+ movq %rcx,RIP(%rsp)
CFI_REL_OFFSET rip,RIP
+ movq PER_CPU_VAR(old_rsp),%rcx
+ /*movq $__USER_CS,CS(%rsp) - moved out of fast path */
+ movq %r11,EFLAGS(%rsp)
+ movq %rcx,RSP(%rsp)
+ /*movq $__USER_DS,SS(%rsp) - moved out of fast path */
+
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jnz tracesys
system_call_fastpath:
@@ -276,7 +262,7 @@ from_badsys:
movq %rax,RAX(%rsp)
/*
* Syscall return path ending with SYSRET (fast path)
- * Has incomplete stack frame and undefined top of stack.
+ * Has incompletely filled pt_regs and iret frame's ss and cs.
*/
ret_from_sys_call:
movl $_TIF_ALLWORK_MASK,%edi
@@ -293,11 +279,12 @@ sysret_check:
* sysretq will re-enable interrupts:
*/
TRACE_IRQS_ON
- RESTORE_C_REGS_EXCEPT_RCX
- movq RIP(%rsp),%rcx
+ RESTORE_C_REGS_EXCEPT_RCX_R11
+ movq RIP(%rsp),%rcx
CFI_REGISTER rip,rcx
+ movq EFLAGS(%rsp),%r11
/*CFI_REGISTER rflags,r11*/
- movq PER_CPU_VAR(old_rsp), %rsp
+ movq RSP(%rsp),%rsp
/*
* 64bit SYSRET restores rip from rcx,
* rflags from r11 (but RF and VM bits are forced to 0),
@@ -309,6 +296,11 @@ sysret_check:
/* Handle reschedules */
/* edx: work, edi: workmask */
sysret_careful:
+ movq RIP(%rsp),%rcx
+ movq EFLAGS(%rsp),%r11
+ STORE_IRET_FRAME_CS_SS /* fill the rest of iret frame */
+ movq %rcx,RCX(%rsp) /* fill the rest of pt_regs */
+ movq %r11,R11(%rsp) /* ditto */
bt $TIF_NEED_RESCHED,%edx
jnc sysret_signal
TRACE_IRQS_ON
@@ -331,7 +323,6 @@ sysret_signal:
* These all wind up with the iret return path anyway,
* so just join that path right now.
*/
- FIXUP_TOP_OF_STACK %r11
jmp int_check_syscall_exit_work
badsys:
@@ -370,15 +361,18 @@ sysret_audit:
jmp sysret_check
#endif /* CONFIG_AUDITSYSCALL */
- /* Do syscall tracing */
+ /* Do syscall entry tracing */
tracesys:
+ movq RIP(%rsp),%rcx
+ STORE_IRET_FRAME_CS_SS /* fill the rest of iret frame */
+ movq %r11,R11(%rsp) /* fill the rest of pt_regs */
+ movq %rcx,RCX(%rsp) /* ditto */
#ifdef CONFIG_AUDITSYSCALL
testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS)
jz auditsys
#endif
SAVE_EXTRA_REGS
movq $-ENOSYS,RAX(%rsp) /* ptrace can change this for a bad syscall */
- FIXUP_TOP_OF_STACK %rdi
movq %rsp,%rdi
call syscall_trace_enter
/*
@@ -402,7 +396,7 @@ tracesys:
/*
* Syscall return path ending with IRET.
- * Has correct top of stack, but partial stack frame.
+ * Has correct top of stack, but possibly only partially filled pt_regs.
*/
GLOBAL(int_ret_from_sys_call)
DISABLE_INTERRUPTS(CLBR_NONE)
@@ -469,43 +463,15 @@ ENTRY(stub_\func)
CFI_STARTPROC
DEFAULT_FRAME 0, 8 /* offset 8: return address */
SAVE_EXTRA_REGS 8
- FIXUP_TOP_OF_STACK %r11, 8
- call sys_\func
- RESTORE_TOP_OF_STACK %r11, 8
- ret
+ STORE_IRET_FRAME_CS_SS 8 /* not sure we need this. paranoia */
+ jmp sys_\func
CFI_ENDPROC
END(stub_\func)
.endm
- .macro FIXED_FRAME label,func
-ENTRY(\label)
- CFI_STARTPROC
- DEFAULT_FRAME 0, 8 /* offset 8: return address */
- FIXUP_TOP_OF_STACK %r11, 8
- call \func
- RESTORE_TOP_OF_STACK %r11, 8
- ret
- CFI_ENDPROC
-END(\label)
- .endm
-
FORK_LIKE clone
FORK_LIKE fork
FORK_LIKE vfork
- FIXED_FRAME stub_iopl, sys_iopl
-
-ENTRY(stub_execve)
- CFI_STARTPROC
- addq $8, %rsp
- DEFAULT_FRAME 0
- SAVE_EXTRA_REGS
- FIXUP_TOP_OF_STACK %r11
- call sys_execve
- movq %rax,RAX(%rsp)
- RESTORE_EXTRA_REGS
- jmp int_ret_from_sys_call
- CFI_ENDPROC
-END(stub_execve)
/*
* sigreturn is special because it needs to restore all registers on return.
@@ -516,25 +482,35 @@ ENTRY(stub_rt_sigreturn)
addq $8, %rsp
DEFAULT_FRAME 0
SAVE_EXTRA_REGS
- FIXUP_TOP_OF_STACK %r11
+ STORE_IRET_FRAME_CS_SS
call sys_rt_sigreturn
- movq %rax,RAX(%rsp) # fixme, this could be done at the higher layer
+stub_return:
+ movq %rax,RAX(%rsp)
RESTORE_EXTRA_REGS
jmp int_ret_from_sys_call
CFI_ENDPROC
END(stub_rt_sigreturn)
+ENTRY(stub_execve)
+ CFI_STARTPROC
+ addq $8, %rsp
+ DEFAULT_FRAME 0
+ SAVE_EXTRA_REGS
+ STORE_IRET_FRAME_CS_SS
+ call sys_execve
+ jmp stub_return
+ CFI_ENDPROC
+END(stub_execve)
+
#ifdef CONFIG_X86_X32_ABI
ENTRY(stub_x32_rt_sigreturn)
CFI_STARTPROC
addq $8, %rsp
DEFAULT_FRAME 0
SAVE_EXTRA_REGS
- FIXUP_TOP_OF_STACK %r11
+ STORE_IRET_FRAME_CS_SS
call sys32_x32_rt_sigreturn
- movq %rax,RAX(%rsp) # fixme, this could be done at the higher layer
- RESTORE_EXTRA_REGS
- jmp int_ret_from_sys_call
+ jmp stub_return
CFI_ENDPROC
END(stub_x32_rt_sigreturn)
@@ -543,15 +519,11 @@ ENTRY(stub_x32_execve)
addq $8, %rsp
DEFAULT_FRAME 0
SAVE_EXTRA_REGS
- FIXUP_TOP_OF_STACK %r11
+ STORE_IRET_FRAME_CS_SS
call compat_sys_execve
- RESTORE_TOP_OF_STACK %r11
- movq %rax,RAX(%rsp)
- RESTORE_EXTRA_REGS
- jmp int_ret_from_sys_call
+ jmp stub_return
CFI_ENDPROC
END(stub_x32_execve)
-
#endif
/*
@@ -576,10 +548,13 @@ ENTRY(ret_from_fork)
testl $3, CS(%rsp) # from kernel_thread?
jz 1f
+ /*
+ * TODO: can we instead check CS(%rsp) == __USER32_CS below?
+ * We won't need GET_THREAD_INFO(%rcx) then...
+ */
testl $_TIF_IA32, TI_flags(%rcx) # 32-bit compat task needs IRET
jnz int_ret_from_sys_call
- RESTORE_TOP_OF_STACK %rdi
jmp ret_from_sys_call # go to the SYSRET fastpath
1:
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 2105487e..5a7ba74 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -161,7 +161,6 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
childregs = task_pt_regs(p);
p->thread.sp = (unsigned long) childregs;
- p->thread.usersp = me->thread.usersp;
set_tsk_thread_flag(p, TIF_FORK);
p->thread.fpu_counter = 0;
p->thread.io_bitmap_ptr = NULL;
@@ -238,10 +237,8 @@ start_thread_common(struct pt_regs *regs, unsigned long new_ip,
loadsegment(es, _ds);
loadsegment(ds, _ds);
load_gs_index(0);
- current->thread.usersp = new_sp;
regs->ip = new_ip;
regs->sp = new_sp;
- this_cpu_write(old_rsp, new_sp);
regs->cs = _cs;
regs->ss = _ss;
regs->flags = X86_EFLAGS_IF;
@@ -359,8 +356,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
/*
* Switch the PDA and FPU contexts.
*/
- prev->usersp = this_cpu_read(old_rsp);
- this_cpu_write(old_rsp, next->usersp);
this_cpu_write(current_task, next_p);
/*
@@ -559,6 +554,5 @@ long sys_arch_prctl(int code, unsigned long addr)
unsigned long KSTK_ESP(struct task_struct *task)
{
- return (test_tsk_thread_flag(task, TIF_IA32)) ?
- (task_pt_regs(task)->sp) : ((task)->thread.usersp);
+ return task_pt_regs(task)->sp;
}
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index ec255a1..8b745ae 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -178,7 +178,7 @@
169 common reboot sys_reboot
170 common sethostname sys_sethostname
171 common setdomainname sys_setdomainname
-172 common iopl stub_iopl
+172 common iopl sys_iopl
173 common ioperm sys_ioperm
174 64 create_module
175 common init_module sys_init_module
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index f2f0723..5c81ed2 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -16,7 +16,7 @@
*/
/* Not going to be implemented by UML, since we have no hardware. */
-#define stub_iopl sys_ni_syscall
+#define sys_iopl sys_ni_syscall
#define sys_ioperm sys_ni_syscall
/*
--
1.8.1.4
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath
2014-08-10 15:00 ` Denys Vlasenko
@ 2014-08-10 22:42 ` Andy Lutomirski
2014-08-11 12:24 ` Denys Vlasenko
0 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-08-10 22:42 UTC (permalink / raw)
To: Denys Vlasenko
Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Frederic Weisbecker, X86 ML, Alexei Starovoitov, Will Drewry,
Kees Cook
On Mon, Aug 11, 2014 at 12:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 08/09/2014 12:59 AM, Andy Lutomirski wrote:
>>> + * When returning through fast path, userspace sees rcx = return address,
>>> + * r11 = rflags. When returning through iret (e.g. if audit is active),
>>> + * these registers may contain garbage.
>>> + * For ptrace we manage to avoid that: when we hit slow path on entry,
>>> + * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
>>> + * If slow path is entered only on exit, there will be garbage.
>>
>> I don't like this. At least the current code puts something
>> deterministic in there (-1) for the slow path, even though it's wrong
>> and makes the slow path behave visibly differently from the fast path.
>>
>> Leaking uninitialized data here is extra bad, though. Keep in mind
>> that, when a syscall entry is interrupted before fully setting up
>> pt_regs, the interrupt frame overlaps task_pt_regs, so it's possible,
>> depending on the stack slot ordering, for a kernel secret
>> (kernel_stack?) to end up somewhere in task_pt_regs.
>
> It's easy to fix. We jump off fast path to slow path here:
>
> movl TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS),%edx
> andl %edi,%edx
> jnz sysret_careful
>
> This is the only use of "sysret_careful" label.
> Therefore, there we don't need to think about any other scenarios
> besides "we are returning from syscall here".
I may be missing something here (on vacation, not really testing
things, no big monitor, etc), but how is this compatible with things
like rt_sigreturn? rt_sigreturn is call from the fastpath, via a
stub, and it returns through int_ret_from_syscall. The C part needs
to modify all the regs, and those regs need to stick, so fixing up rcx
and r11 after rt_sigreturn can't work.
On the flip side, fork probably needs to *read* those regs, so the
fixup needs to happen before fork.
I'm sure it's possible to get this right, but it's complicated and
error-prone, and I'm not at all convinced it's worth it to save two
stores on the fast path.
(I'm pretty sure that current kernels have all kinds of bugs in this area.)
>
> My patch only fixes up segment regs:
>
> /* Handle reschedules */
> /* edx: work, edi: workmask */
> sysret_careful:
> + STORE_IRET_FRAME_CS_SS
> bt $TIF_NEED_RESCHED,%edx
>
> pt_regs->rcx and ->r11 aren't initialized.
>
> We can fix that e.g. by adding here four more insns:
>
> movq RIP(%rsp),%rcx
> movq %rcx,RCX(%rsp)
> movq EFLAGS(%rsp),%rcx
> movq %rcx,R11(%rsp)
>
> similar to what my patch did on slow path on entry (see changes at
> "tracesys" label).
>
> Or we can just bite the bullet and save rcx and r11 on fast path
> (there it will require just two insns). +2 cycles on most CPUs.
>
--Andy
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath
2014-08-10 22:42 ` Andy Lutomirski
@ 2014-08-11 12:24 ` Denys Vlasenko
2014-08-11 20:06 ` Andy Lutomirski
0 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-11 12:24 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Frederic Weisbecker, X86 ML, Alexei Starovoitov, Will Drewry,
Kees Cook
On 08/11/2014 12:42 AM, Andy Lutomirski wrote:
> On Mon, Aug 11, 2014 at 12:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> On 08/09/2014 12:59 AM, Andy Lutomirski wrote:
>>>> + * When returning through fast path, userspace sees rcx = return address,
>>>> + * r11 = rflags. When returning through iret (e.g. if audit is active),
>>>> + * these registers may contain garbage.
>>>> + * For ptrace we manage to avoid that: when we hit slow path on entry,
>>>> + * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
>>>> + * If slow path is entered only on exit, there will be garbage.
>>>
>>> I don't like this. At least the current code puts something
>>> deterministic in there (-1) for the slow path, even though it's wrong
>>> and makes the slow path behave visibly differently from the fast path.
>>>
>>> Leaking uninitialized data here is extra bad, though. Keep in mind
>>> that, when a syscall entry is interrupted before fully setting up
>>> pt_regs, the interrupt frame overlaps task_pt_regs, so it's possible,
>>> depending on the stack slot ordering, for a kernel secret
>>> (kernel_stack?) to end up somewhere in task_pt_regs.
>>
>> It's easy to fix. We jump off fast path to slow path here:
>>
>> movl TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS),%edx
>> andl %edi,%edx
>> jnz sysret_careful
>>
>> This is the only use of "sysret_careful" label.
>> Therefore, there we don't need to think about any other scenarios
>> besides "we are returning from syscall here".
>
> I may be missing something here (on vacation, not really testing
> things, no big monitor, etc), but how is this compatible with things
> like rt_sigreturn? rt_sigreturn is call from the fastpath, via a
> stub, and it returns through int_ret_from_syscall. The C part needs
> to modify all the regs, and those regs need to stick, so fixing up rcx
> and r11 after rt_sigreturn can't work.
Code at "sysret_careful" label is only reachable
on fast path return.
We don't go down this code path after rt_sigreturn.
stub_rt_sigreturn manually steers into iret code path instead:
ENTRY(stub_rt_sigreturn)
CFI_STARTPROC
addq $8, %rsp
DEFAULT_FRAME 0
SAVE_EXTRA_REGS
STORE_IRET_FRAME_CS_SS
call sys_rt_sigreturn
movq %rax,RAX(%rsp)
RESTORE_EXTRA_REGS
jmp int_ret_from_sys_call <==== NOTE THIS
So, we don't do any rcx/r11 fixups after sys_rt_sigreturn.
BTW, looks like "SAVE_EXTRA_REGS" here is not necessary:
sys_rt_sigreturn overwrites all registers anyway.
I have its removal on my TODO list.
(It was there before my patch set because stack needed
*resizing to full pt_regs size*; now it does not need that).
>
> On the flip side, fork probably needs to *read* those regs, so the
> fixup needs to happen before fork.
>
> I'm sure it's possible to get this right, but it's complicated and
> error-prone, and I'm not at all convinced it's worth it to save two
> stores on the fast path.
>
> (I'm pretty sure that current kernels have all kinds of bugs in this area.)
>
>>
>> My patch only fixes up segment regs:
>>
>> /* Handle reschedules */
>> /* edx: work, edi: workmask */
>> sysret_careful:
>> + STORE_IRET_FRAME_CS_SS
>> bt $TIF_NEED_RESCHED,%edx
>>
>> pt_regs->rcx and ->r11 aren't initialized.
>>
>> We can fix that e.g. by adding here four more insns:
>>
>> movq RIP(%rsp),%rcx
>> movq %rcx,RCX(%rsp)
>> movq EFLAGS(%rsp),%rcx
>> movq %rcx,R11(%rsp)
>>
>> similar to what my patch did on slow path on entry (see changes at
>> "tracesys" label).
>>
>> Or we can just bite the bullet and save rcx and r11 on fast path
>> (there it will require just two insns). +2 cycles on most CPUs.
>>
>
>
> --Andy
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath
2014-08-11 12:24 ` Denys Vlasenko
@ 2014-08-11 20:06 ` Andy Lutomirski
2014-08-12 9:21 ` Denys Vlasenko
0 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-08-11 20:06 UTC (permalink / raw)
To: Denys Vlasenko
Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Frederic Weisbecker, X86 ML, Alexei Starovoitov, Will Drewry,
Kees Cook
On Mon, Aug 11, 2014 at 5:24 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 08/11/2014 12:42 AM, Andy Lutomirski wrote:
>> On Mon, Aug 11, 2014 at 12:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>> On 08/09/2014 12:59 AM, Andy Lutomirski wrote:
>>>>> + * When returning through fast path, userspace sees rcx = return address,
>>>>> + * r11 = rflags. When returning through iret (e.g. if audit is active),
>>>>> + * these registers may contain garbage.
>>>>> + * For ptrace we manage to avoid that: when we hit slow path on entry,
>>>>> + * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
>>>>> + * If slow path is entered only on exit, there will be garbage.
>>>>
>>>> I don't like this. At least the current code puts something
>>>> deterministic in there (-1) for the slow path, even though it's wrong
>>>> and makes the slow path behave visibly differently from the fast path.
>>>>
>>>> Leaking uninitialized data here is extra bad, though. Keep in mind
>>>> that, when a syscall entry is interrupted before fully setting up
>>>> pt_regs, the interrupt frame overlaps task_pt_regs, so it's possible,
>>>> depending on the stack slot ordering, for a kernel secret
>>>> (kernel_stack?) to end up somewhere in task_pt_regs.
>>>
>>> It's easy to fix. We jump off fast path to slow path here:
>>>
>>> movl TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS),%edx
>>> andl %edi,%edx
>>> jnz sysret_careful
>>>
>>> This is the only use of "sysret_careful" label.
>>> Therefore, there we don't need to think about any other scenarios
>>> besides "we are returning from syscall here".
>>
>> I may be missing something here (on vacation, not really testing
>> things, no big monitor, etc), but how is this compatible with things
>> like rt_sigreturn? rt_sigreturn is call from the fastpath, via a
>> stub, and it returns through int_ret_from_syscall. The C part needs
>> to modify all the regs, and those regs need to stick, so fixing up rcx
>> and r11 after rt_sigreturn can't work.
>
> Code at "sysret_careful" label is only reachable
> on fast path return.
> We don't go down this code path after rt_sigreturn.
> stub_rt_sigreturn manually steers into iret code path instead:
>
> ENTRY(stub_rt_sigreturn)
> CFI_STARTPROC
> addq $8, %rsp
> DEFAULT_FRAME 0
> SAVE_EXTRA_REGS
> STORE_IRET_FRAME_CS_SS
> call sys_rt_sigreturn
> movq %rax,RAX(%rsp)
> RESTORE_EXTRA_REGS
> jmp int_ret_from_sys_call <==== NOTE THIS
>
> So, we don't do any rcx/r11 fixups after sys_rt_sigreturn.
Oh, right. rt_sigreturn overwrites all regs, so it doesn't need a
fixup in advance.
That still leaves fork and everything that calls ptrace_event, though.
--Andy
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath
2014-08-11 20:06 ` Andy Lutomirski
@ 2014-08-12 9:21 ` Denys Vlasenko
2014-08-13 1:02 ` Andy Lutomirski
0 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2014-08-12 9:21 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Frederic Weisbecker, X86 ML, Alexei Starovoitov, Will Drewry,
Kees Cook
On 08/11/2014 10:06 PM, Andy Lutomirski wrote:
> On Mon, Aug 11, 2014 at 5:24 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> On 08/11/2014 12:42 AM, Andy Lutomirski wrote:
>>> On Mon, Aug 11, 2014 at 12:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>>> On 08/09/2014 12:59 AM, Andy Lutomirski wrote:
>>>>>> + * When returning through fast path, userspace sees rcx = return address,
>>>>>> + * r11 = rflags. When returning through iret (e.g. if audit is active),
>>>>>> + * these registers may contain garbage.
>>>>>> + * For ptrace we manage to avoid that: when we hit slow path on entry,
>>>>>> + * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
>>>>>> + * If slow path is entered only on exit, there will be garbage.
>>>>>
>>>>> I don't like this. At least the current code puts something
>>>>> deterministic in there (-1) for the slow path, even though it's wrong
>>>>> and makes the slow path behave visibly differently from the fast path.
>>>>>
>>>>> Leaking uninitialized data here is extra bad, though. Keep in mind
>>>>> that, when a syscall entry is interrupted before fully setting up
>>>>> pt_regs, the interrupt frame overlaps task_pt_regs, so it's possible,
>>>>> depending on the stack slot ordering, for a kernel secret
>>>>> (kernel_stack?) to end up somewhere in task_pt_regs.
>>>>
>>>> It's easy to fix. We jump off fast path to slow path here:
>>>>
>>>> movl TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS),%edx
>>>> andl %edi,%edx
>>>> jnz sysret_careful
>>>>
>>>> This is the only use of "sysret_careful" label.
>>>> Therefore, there we don't need to think about any other scenarios
>>>> besides "we are returning from syscall here".
>>>
>>> I may be missing something here (on vacation, not really testing
>>> things, no big monitor, etc), but how is this compatible with things
>>> like rt_sigreturn? rt_sigreturn is call from the fastpath, via a
>>> stub, and it returns through int_ret_from_syscall. The C part needs
>>> to modify all the regs, and those regs need to stick, so fixing up rcx
>>> and r11 after rt_sigreturn can't work.
>>
>> Code at "sysret_careful" label is only reachable
>> on fast path return.
>> We don't go down this code path after rt_sigreturn.
>> stub_rt_sigreturn manually steers into iret code path instead:
>>
>> ENTRY(stub_rt_sigreturn)
>> CFI_STARTPROC
>> addq $8, %rsp
>> DEFAULT_FRAME 0
>> SAVE_EXTRA_REGS
>> STORE_IRET_FRAME_CS_SS
>> call sys_rt_sigreturn
>> movq %rax,RAX(%rsp)
>> RESTORE_EXTRA_REGS
>> jmp int_ret_from_sys_call <==== NOTE THIS
>>
>> So, we don't do any rcx/r11 fixups after sys_rt_sigreturn.
>
> Oh, right. rt_sigreturn overwrites all regs, so it doesn't need a
> fixup in advance.
>
> That still leaves fork and everything that calls ptrace_event, though.
I think I have it covered:
[v]fork and clone have fully populated pt_regs.
Syscall entry/exit ptrace stops are on slow path and therefore
also have fully populated pt_regs.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath
2014-08-12 9:21 ` Denys Vlasenko
@ 2014-08-13 1:02 ` Andy Lutomirski
0 siblings, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2014-08-13 1:02 UTC (permalink / raw)
To: Denys Vlasenko
Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, H. Peter Anvin,
Frederic Weisbecker, X86 ML, Alexei Starovoitov, Will Drewry,
Kees Cook
On Tue, Aug 12, 2014 at 2:21 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 08/11/2014 10:06 PM, Andy Lutomirski wrote:
>> On Mon, Aug 11, 2014 at 5:24 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>> On 08/11/2014 12:42 AM, Andy Lutomirski wrote:
>>>> On Mon, Aug 11, 2014 at 12:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>>>> On 08/09/2014 12:59 AM, Andy Lutomirski wrote:
>>>>>>> + * When returning through fast path, userspace sees rcx = return address,
>>>>>>> + * r11 = rflags. When returning through iret (e.g. if audit is active),
>>>>>>> + * these registers may contain garbage.
>>>>>>> + * For ptrace we manage to avoid that: when we hit slow path on entry,
>>>>>>> + * we do save rcx and r11 in pt_regs, so ptrace on exit also sees them.
>>>>>>> + * If slow path is entered only on exit, there will be garbage.
>>>>>>
>>>>>> I don't like this. At least the current code puts something
>>>>>> deterministic in there (-1) for the slow path, even though it's wrong
>>>>>> and makes the slow path behave visibly differently from the fast path.
>>>>>>
>>>>>> Leaking uninitialized data here is extra bad, though. Keep in mind
>>>>>> that, when a syscall entry is interrupted before fully setting up
>>>>>> pt_regs, the interrupt frame overlaps task_pt_regs, so it's possible,
>>>>>> depending on the stack slot ordering, for a kernel secret
>>>>>> (kernel_stack?) to end up somewhere in task_pt_regs.
>>>>>
>>>>> It's easy to fix. We jump off fast path to slow path here:
>>>>>
>>>>> movl TI_flags+THREAD_INFO(%rsp,SIZEOF_PTREGS),%edx
>>>>> andl %edi,%edx
>>>>> jnz sysret_careful
>>>>>
>>>>> This is the only use of "sysret_careful" label.
>>>>> Therefore, there we don't need to think about any other scenarios
>>>>> besides "we are returning from syscall here".
>>>>
>>>> I may be missing something here (on vacation, not really testing
>>>> things, no big monitor, etc), but how is this compatible with things
>>>> like rt_sigreturn? rt_sigreturn is call from the fastpath, via a
>>>> stub, and it returns through int_ret_from_syscall. The C part needs
>>>> to modify all the regs, and those regs need to stick, so fixing up rcx
>>>> and r11 after rt_sigreturn can't work.
>>>
>>> Code at "sysret_careful" label is only reachable
>>> on fast path return.
>>> We don't go down this code path after rt_sigreturn.
>>> stub_rt_sigreturn manually steers into iret code path instead:
>>>
>>> ENTRY(stub_rt_sigreturn)
>>> CFI_STARTPROC
>>> addq $8, %rsp
>>> DEFAULT_FRAME 0
>>> SAVE_EXTRA_REGS
>>> STORE_IRET_FRAME_CS_SS
>>> call sys_rt_sigreturn
>>> movq %rax,RAX(%rsp)
>>> RESTORE_EXTRA_REGS
>>> jmp int_ret_from_sys_call <==== NOTE THIS
>>>
>>> So, we don't do any rcx/r11 fixups after sys_rt_sigreturn.
>>
>> Oh, right. rt_sigreturn overwrites all regs, so it doesn't need a
>> fixup in advance.
>>
>> That still leaves fork and everything that calls ptrace_event, though.
>
> I think I have it covered:
>
> [v]fork and clone have fully populated pt_regs.
>
> Syscall entry/exit ptrace stops are on slow path and therefore
> also have fully populated pt_regs.
Heh. I hope so, but CVE-2014-4699 was an exception to that rule...
Anyway, I'll see if I can beef up my test cases that are relevant to this stuff.
--Andy
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2014-08-13 1:03 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-08 17:44 [PATCH v4 0/17] x86: entry.S optimizations Denys Vlasenko
2014-08-08 17:44 ` [PATCH 01/17] x86: entry_64.S: delete unused code Denys Vlasenko
2014-08-08 17:44 ` [PATCH 02/17] x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET Denys Vlasenko
2014-08-08 17:44 ` [PATCH 03/17] x86: open-code register save/restore in trace_hardirqs thunks Denys Vlasenko
2014-08-08 17:44 ` [PATCH 04/17] x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user Denys Vlasenko
2014-08-08 17:44 ` [PATCH 05/17] x86: entry_64.S: always allocate complete "struct pt_regs" Denys Vlasenko
2014-08-08 17:44 ` [PATCH 06/17] x86: mass removal of ARGOFFSET Denys Vlasenko
2014-08-08 17:44 ` [PATCH 07/17] x86: rename some macros and labels, no code changes Denys Vlasenko
2014-08-08 17:44 ` [PATCH 08/17] x86: add comments about various syscall instructions, " Denys Vlasenko
2014-08-08 17:44 ` [PATCH 09/17] x86: entry_64.S: move save_paranoid and ret_from_fork closer to their users Denys Vlasenko
2014-08-08 17:44 ` [PATCH 10/17] x86: entry_64.S: rename save_paranoid to paranoid_entry, no code changes Denys Vlasenko
2014-08-08 17:44 ` [PATCH 11/17] x86: entry_64.S: fold test_in_nmi macro into its only user Denys Vlasenko
2014-08-08 17:44 ` [PATCH 12/17] x86: get rid of KERNEL_STACK_OFFSET Denys Vlasenko
2014-08-08 17:44 ` [PATCH 13/17] x86: ia32entry.S: fold IA32_ARG_FIXUP macro into its callers Denys Vlasenko
2014-08-08 17:44 ` [PATCH 14/17] x86: ia32entry.S: use mov instead of push/pop where possible Denys Vlasenko
2014-08-08 17:44 ` [PATCH 15/17] x86: code shrink in paranoid_exit Denys Vlasenko
2014-08-08 17:44 ` [PATCH 16/17] x86: entry_64.S: trivial optimization for ENOSYS Denys Vlasenko
2014-08-08 22:48 ` Andy Lutomirski
2014-08-08 17:44 ` [PATCH 17/17] x86: simplify iret stack handling on SYSCALL64 fastpath Denys Vlasenko
2014-08-08 22:59 ` Andy Lutomirski
2014-08-10 15:00 ` Denys Vlasenko
2014-08-10 22:42 ` Andy Lutomirski
2014-08-11 12:24 ` Denys Vlasenko
2014-08-11 20:06 ` Andy Lutomirski
2014-08-12 9:21 ` Denys Vlasenko
2014-08-13 1:02 ` Andy Lutomirski
2014-08-10 18:47 ` [PATCH 17/17 v2] " Denys Vlasenko
2014-08-09 0:27 ` [PATCH v4 0/17] x86: entry.S optimizations H. Peter Anvin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.