All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32
@ 2015-07-24 13:47 Denys Vlasenko
  2015-07-24 13:47 ` [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1 Denys Vlasenko
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Denys Vlasenko @ 2015-07-24 13:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Linus Torvalds, Krzysztof A. Sobiecki,
	Steven Rostedt, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
	Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov,
	Will Drewry, Kees Cook, x86, linux-kernel

This change swaps a few instructions in final register restoring/zeroing
section of SYSENTER fast path, and adds/deletes a few empty lines.

After this, the difference between SYSENTER and SYCALL fast paths
(after the prologue which saved pt_regs) is very small:
they differ merely in the choice of register to hold arg6 (EBP or R9)
and in the value of EDX on exit: SYSENTER ABI doesn't need to preserve it,
so it is zeroed. SYSCALL preserves it:

       |(prologue is different)
       | 	orl	$TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
       | 	testl	$_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
       |-	jnz	sysenter_tracesys
       |+	jnz	cstar_tracesys
       |
       |-sysenter_do_call:
       |+cstar_do_call:
       | 	/* 32-bit syscall -> 64-bit C ABI argument conversion */
       | 	movl	%edi, %r8d		/* arg5 */
       |-	movl	%ebp, %r9d		/* arg6 */
       |+	/* r9 already loaded */		/* arg6 */
       | 	xchg	%ecx, %esi		/* rsi:arg2, rcx:arg4 */
       | 	movl	%ebx, %edi		/* arg1 */
       | 	movl	%edx, %edx		/* arg3 (zero extension) */
       |
       |-sysenter_dispatch:
       |+cstar_dispatch:
       | 	cmpq	$(IA32_NR_syscalls-1), %rax
       | 	ja	1f
       | 	call	*ia32_sys_call_table(, %rax, 8)
       |@@ -19,15 +19,15 @@
       | 	DISABLE_INTERRUPTS(CLBR_NONE)
       | 	TRACE_IRQS_OFF
       | 	testl	$_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
       |-	jnz	sysexit_audit
       |+	jnz	sysretl_audit
       |
       |-sysexit_from_sys_call:
       |+sysretl_from_sys_call:
       | 	andl	$~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
       |+	movl	RDX(%rsp), %edx
       | 	movl	RSI(%rsp), %esi
       | 	movl	RDI(%rsp), %edi
       | 	movl	RIP(%rsp), %ecx
       | 	movl	EFLAGS(%rsp), %r11d
       |-	xorl	%edx, %edx
       | 	xorq	%r10, %r10
       | 	xorq	%r9, %r9
       | 	xorq	%r8, %r8
       |(the rest of fast path, up to final SYSRET32, is identical)

This is a preparatory change which allows to drop most of SYSENTER machinery
and make SYSENTER reuse SYSCALL code.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Krzysztof A. Sobiecki <sobkas@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/entry/entry_64_compat.S | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 8997383..9f9dfa5 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -117,6 +117,7 @@ sysenter_do_call:
 	xchg	%ecx, %esi		/* rsi:arg2, rcx:arg4 */
 	movl	%ebx, %edi		/* arg1 */
 	movl	%edx, %edx		/* arg3 (zero extension) */
+
 sysenter_dispatch:
 	cmpq	$(IA32_NR_syscalls-1), %rax
 	ja	1f
@@ -127,6 +128,7 @@ sysenter_dispatch:
 	TRACE_IRQS_OFF
 	testl	$_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
 	jnz	sysexit_audit
+
 sysexit_from_sys_call:
 	/*
 	 * NB: SYSEXIT is not obviously safe for 64-bit kernels -- an
@@ -139,14 +141,14 @@ sysexit_from_sys_call:
 	 * with 'sysenter' and it uses the SYSENTER calling convention.
 	 */
 	andl	$~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
-	movl	RIP(%rsp), %ecx		/* User %eip */
 	movl	RSI(%rsp), %esi
 	movl	RDI(%rsp), %edi
+	movl	RIP(%rsp), %ecx		/* User %eip */
+	movl	EFLAGS(%rsp), %r11d	/* User eflags */
 	xorl	%edx, %edx		/* Do not leak kernel information */
-	xorq	%r8, %r8
-	xorq	%r9, %r9
 	xorq	%r10, %r10
-	movl	EFLAGS(%rsp), %r11d	/* User eflags */
+	xorq	%r9, %r9
+	xorq	%r8, %r8
 	TRACE_IRQS_ON
 
 	/*
@@ -340,6 +342,7 @@ ENTRY(entry_SYSCALL_compat)
 1:	movl	(%r8), %r9d
 	_ASM_EXTABLE(1b, ia32_badarg)
 	ASM_CLAC
+
 	orl	$TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
 	testl	$_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
 	jnz	cstar_tracesys
@@ -355,7 +358,6 @@ cstar_do_call:
 cstar_dispatch:
 	cmpq	$(IA32_NR_syscalls-1), %rax
 	ja	1f
-
 	call	*ia32_sys_call_table(, %rax, 8)
 	movq	%rax, RAX(%rsp)
 1:
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1
  2015-07-24 13:47 [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32 Denys Vlasenko
@ 2015-07-24 13:47 ` Denys Vlasenko
  2015-07-24 17:50   ` Andy Lutomirski
  2015-07-27 16:05   ` Ingo Molnar
  2015-07-24 13:47 ` [PATCH 3/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 2 Denys Vlasenko
  2015-07-24 17:37 ` [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32 Andy Lutomirski
  2 siblings, 2 replies; 11+ messages in thread
From: Denys Vlasenko @ 2015-07-24 13:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Linus Torvalds, Krzysztof A. Sobiecki,
	Steven Rostedt, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
	Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov,
	Will Drewry, Kees Cook, x86, linux-kernel

SYSCALL32 code is nearly identical to SYSCALL32, except for initial
section. Merge them.

The removal is split into two parts, to make review eaiser. This is part 1.

auditsys_entry_common and auditsys_exit macros are indented one more tab without
any changes. This prevents diff from becoming unreadable.
They will be removed in part 2.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Krzysztof A. Sobiecki <sobkas@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/entry/entry_64_compat.S | 237 +++++++++++----------------------------
 1 file changed, 63 insertions(+), 174 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 9f9dfa5..2d0a2f0 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -16,13 +16,7 @@
 #include <linux/linkage.h>
 #include <linux/err.h>
 
-/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this.  */
-#include <linux/elf-em.h>
-#define AUDIT_ARCH_I386		(EM_386|__AUDIT_ARCH_LE)
-#define __AUDIT_ARCH_LE		0x40000000
-
 #ifndef CONFIG_AUDITSYSCALL
-# define sysexit_audit		ia32_ret_from_sys_call_irqs_off
 # define sysretl_audit		ia32_ret_from_sys_call_irqs_off
 #endif
 
@@ -93,188 +87,82 @@ ENTRY(entry_SYSENTER_compat)
 	 * 32-bit zero extended
 	 */
 	ASM_STAC
-1:	movl	(%rbp), %ebp
+1:	movl	(%rbp), %r9d
 	_ASM_EXTABLE(1b, ia32_badarg)
 	ASM_CLAC
 
 	/*
-	 * Sysenter doesn't filter flags, so we need to clear NT
-	 * ourselves.  To save a few cycles, we can check whether
-	 * NT was set instead of doing an unconditional popfq.
+	 * Sysenter doesn't filter flags, so we need to clear NT ourselves.
 	 */
 	testl	$X86_EFLAGS_NT, EFLAGS(%rsp)
 	jnz	sysenter_fix_flags
 sysenter_flags_fixed:
-
-	orl	$TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
-	testl	$_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-	jnz	sysenter_tracesys
-
-sysenter_do_call:
-	/* 32-bit syscall -> 64-bit C ABI argument conversion */
-	movl	%edi, %r8d		/* arg5 */
-	movl	%ebp, %r9d		/* arg6 */
-	xchg	%ecx, %esi		/* rsi:arg2, rcx:arg4 */
-	movl	%ebx, %edi		/* arg1 */
-	movl	%edx, %edx		/* arg3 (zero extension) */
-
-sysenter_dispatch:
-	cmpq	$(IA32_NR_syscalls-1), %rax
-	ja	1f
-	call	*ia32_sys_call_table(, %rax, 8)
-	movq	%rax, RAX(%rsp)
-1:
-	DISABLE_INTERRUPTS(CLBR_NONE)
-	TRACE_IRQS_OFF
-	testl	$_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-	jnz	sysexit_audit
-
-sysexit_from_sys_call:
-	/*
-	 * NB: SYSEXIT is not obviously safe for 64-bit kernels -- an
-	 * NMI between STI and SYSEXIT has poorly specified behavior,
-	 * and and NMI followed by an IRQ with usergs is fatal.  So
-	 * we just pretend we're using SYSEXIT but we really use
-	 * SYSRETL instead.
-	 *
-	 * This code path is still called 'sysexit' because it pairs
-	 * with 'sysenter' and it uses the SYSENTER calling convention.
-	 */
-	andl	$~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
-	movl	RSI(%rsp), %esi
-	movl	RDI(%rsp), %edi
-	movl	RIP(%rsp), %ecx		/* User %eip */
-	movl	EFLAGS(%rsp), %r11d	/* User eflags */
-	xorl	%edx, %edx		/* Do not leak kernel information */
-	xorq	%r10, %r10
-	xorq	%r9, %r9
-	xorq	%r8, %r8
-	TRACE_IRQS_ON
-
-	/*
-	 * SYSRETL works even on Intel CPUs.  Use it in preference to SYSEXIT,
-	 * since it avoids a dicey window with interrupts enabled.
-	 */
-	movl	RSP(%rsp), %esp
-
-	/*
-	 * USERGS_SYSRET32 does:
-	 *  gsbase = user's gs base
-	 *  eip = ecx
-	 *  rflags = r11
-	 *  cs = __USER32_CS
-	 *  ss = __USER_DS
-	 *
-	 * The prologue set RIP(%rsp) to VDSO32_SYSENTER_RETURN, which does:
-	 *
-	 *  pop %ebp
-	 *  pop %edx
-	 *  pop %ecx
-	 *
-	 * Therefore, we invoke SYSRETL with EDX and R8-R10 zeroed to
-	 * avoid info leaks.  R11 ends up with VDSO32_SYSENTER_RETURN's
-	 * address (already known to user code), and R12-R15 are
-	 * callee-saved and therefore don't contain any interesting
-	 * kernel data.
-	 */
-	USERGS_SYSRET32
-
-#ifdef CONFIG_AUDITSYSCALL
-	.macro auditsys_entry_common
-	/*
-	 * At this point, registers hold syscall args in the 32-bit syscall ABI:
-	 * EAX is syscall number, the 6 args are in EBX,ECX,EDX,ESI,EDI,EBP.
-	 *
-	 * We want to pass them to __audit_syscall_entry(), which is a 64-bit
-	 * C function with 5 parameters, so shuffle them to match what
-	 * the function expects: RDI,RSI,RDX,RCX,R8.
-	 */
-	movl	%esi, %r8d		/* arg5 (R8 ) <= 4th syscall arg (ESI) */
-	xchg	%ecx, %edx		/* arg4 (RCX) <= 3rd syscall arg (EDX) */
-					/* arg3 (RDX) <= 2nd syscall arg (ECX) */
-	movl	%ebx, %esi		/* arg2 (RSI) <= 1st syscall arg (EBX) */
-	movl	%eax, %edi		/* arg1 (RDI) <= syscall number  (EAX) */
-	call	__audit_syscall_entry
-
-	/*
-	 * We are going to jump back to the syscall dispatch code.
-	 * Prepare syscall args as required by the 64-bit C ABI.
-	 * Registers clobbered by __audit_syscall_entry() are
-	 * loaded from pt_regs on stack:
-	 */
-	movl	ORIG_RAX(%rsp), %eax	/* syscall number */
-	movl	%ebx, %edi		/* arg1 */
-	movl	RCX(%rsp), %esi		/* arg2 */
-	movl	RDX(%rsp), %edx		/* arg3 */
-	movl	RSI(%rsp), %ecx		/* arg4 */
-	movl	RDI(%rsp), %r8d		/* arg5 */
-	.endm
-
-	.macro auditsys_exit exit
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
-	testl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-	jnz	ia32_ret_from_sys_call
-	movl	%eax, %esi		/* second arg, syscall return value */
-	cmpl	$-MAX_ERRNO, %eax	/* is it an error ? */
-	jbe	1f
-	movslq	%eax, %rsi		/* if error sign extend to 64 bits */
-1:	setbe	%al			/* 1 if error, 0 if not */
-	movzbl	%al, %edi		/* zero-extend that into %edi */
-	call	__audit_syscall_exit
-	movq	RAX(%rsp), %rax		/* reload syscall return value */
-	movl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi
-	DISABLE_INTERRUPTS(CLBR_NONE)
-	TRACE_IRQS_OFF
-	testl	%edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-	jz	\exit
-	xorl	%eax, %eax		/* Do not leak kernel information */
-	movq	%rax, R11(%rsp)
-	movq	%rax, R10(%rsp)
-	movq	%rax, R9(%rsp)
-	movq	%rax, R8(%rsp)
-	jmp	int_ret_from_sys_call_irqs_off
-	.endm
-
-sysenter_auditsys:
-	auditsys_entry_common
-	movl	%ebp, %r9d		/* reload 6th syscall arg */
-	jmp	sysenter_dispatch
-
-sysexit_audit:
-	auditsys_exit sysexit_from_sys_call
-#endif
+	jmp	sysenter_jumps_here
 
 sysenter_fix_flags:
 	pushq	$(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
 	popfq
 	jmp	sysenter_flags_fixed
-
-sysenter_tracesys:
-#ifdef CONFIG_AUDITSYSCALL
-	testl	$(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-	jz	sysenter_auditsys
-#endif
-	SAVE_EXTRA_REGS
-	xorl	%eax, %eax		/* Do not leak kernel information */
-	movq	%rax, R11(%rsp)
-	movq	%rax, R10(%rsp)
-	movq	%rax, R9(%rsp)
-	movq	%rax, R8(%rsp)
-	movq	%rsp, %rdi		/* &pt_regs -> arg1 */
-	call	syscall_trace_enter
-
-	/* Reload arg registers from stack. (see sysenter_tracesys) */
-	movl	RCX(%rsp), %ecx
-	movl	RDX(%rsp), %edx
-	movl	RSI(%rsp), %esi
-	movl	RDI(%rsp), %edi
-	movl	%eax, %eax		/* zero extension */
-
-	RESTORE_EXTRA_REGS
-	jmp	sysenter_do_call
 ENDPROC(entry_SYSENTER_compat)
 
+
+	#ifdef CONFIG_AUDITSYSCALL
+		.macro auditsys_entry_common
+		/*
+		 * At this point, registers hold syscall args in the 32-bit syscall ABI:
+		 * EAX is syscall number, the 6 args are in EBX,ECX,EDX,ESI,EDI,EBP.
+		 *
+		 * We want to pass them to __audit_syscall_entry(), which is a 64-bit
+		 * C function with 5 parameters, so shuffle them to match what
+		 * the function expects: RDI,RSI,RDX,RCX,R8.
+		 */
+		movl	%esi, %r8d		/* arg5 (R8 ) <= 4th syscall arg (ESI) */
+		xchg	%ecx, %edx		/* arg4 (RCX) <= 3rd syscall arg (EDX) */
+						/* arg3 (RDX) <= 2nd syscall arg (ECX) */
+		movl	%ebx, %esi		/* arg2 (RSI) <= 1st syscall arg (EBX) */
+		movl	%eax, %edi		/* arg1 (RDI) <= syscall number  (EAX) */
+		call	__audit_syscall_entry
+
+		/*
+		 * We are going to jump back to the syscall dispatch code.
+		 * Prepare syscall args as required by the 64-bit C ABI.
+		 * Registers clobbered by __audit_syscall_entry() are
+		 * loaded from pt_regs on stack:
+		 */
+		movl	ORIG_RAX(%rsp), %eax	/* syscall number */
+		movl	%ebx, %edi		/* arg1 */
+		movl	RCX(%rsp), %esi		/* arg2 */
+		movl	RDX(%rsp), %edx		/* arg3 */
+		movl	RSI(%rsp), %ecx		/* arg4 */
+		movl	RDI(%rsp), %r8d		/* arg5 */
+		.endm
+
+		.macro auditsys_exit exit
+		TRACE_IRQS_ON
+		ENABLE_INTERRUPTS(CLBR_NONE)
+		testl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+		jnz	ia32_ret_from_sys_call
+		movl	%eax, %esi		/* second arg, syscall return value */
+		cmpl	$-MAX_ERRNO, %eax	/* is it an error ? */
+		jbe	1f
+		movslq	%eax, %rsi		/* if error sign extend to 64 bits */
+	1:	setbe	%al			/* 1 if error, 0 if not */
+		movzbl	%al, %edi		/* zero-extend that into %edi */
+		call	__audit_syscall_exit
+		movq	RAX(%rsp), %rax		/* reload syscall return value */
+		movl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi
+		DISABLE_INTERRUPTS(CLBR_NONE)
+		TRACE_IRQS_OFF
+		testl	%edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+		jz	\exit
+		xorl	%eax, %eax		/* Do not leak kernel information */
+		movq	%rax, R11(%rsp)
+		movq	%rax, R10(%rsp)
+		movq	%rax, R9(%rsp)
+		movq	%rax, R8(%rsp)
+		jmp	int_ret_from_sys_call_irqs_off
+		.endm
+	#endif
 /*
  * 32-bit SYSCALL instruction entry.
  *
@@ -343,6 +231,7 @@ ENTRY(entry_SYSCALL_compat)
 	_ASM_EXTABLE(1b, ia32_badarg)
 	ASM_CLAC
 
+sysenter_jumps_here:
 	orl	$TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
 	testl	$_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
 	jnz	cstar_tracesys
@@ -421,7 +310,7 @@ cstar_tracesys:
 	call	syscall_trace_enter
 	movl	R9(%rsp), %r9d
 
-	/* Reload arg registers from stack. (see sysenter_tracesys) */
+	/* Reload arg registers from stack. */
 	movl	RCX(%rsp), %ecx
 	movl	RDX(%rsp), %edx
 	movl	RSI(%rsp), %esi
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 2
  2015-07-24 13:47 [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32 Denys Vlasenko
  2015-07-24 13:47 ` [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1 Denys Vlasenko
@ 2015-07-24 13:47 ` Denys Vlasenko
  2015-07-24 17:37 ` [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32 Andy Lutomirski
  2 siblings, 0 replies; 11+ messages in thread
From: Denys Vlasenko @ 2015-07-24 13:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Linus Torvalds, Krzysztof A. Sobiecki,
	Steven Rostedt, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
	Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov,
	Will Drewry, Kees Cook, x86, linux-kernel

SYSCALL32 code is nearly identical to SYSCALL32, except for initial
section. Merge them.

This change is split into two parts, to make review eaiser.
This is part 2, which tidies up loose ends:

"sysenter_fix_flags" detour does not need to be convoluted anymore,
straigten it up.

auditsys_entry_common and auditsys_exit macros have only one caller now.
Drop masros, move their bodies to invocation locations.

Reinstate "why we use SYSRETL instead of SYSEXIT" comment.

Run-tested under QEMU: calls through SYSENTER VDSO still work:

/ # ./test_syscall_vdso32
[RUN]	Executing 6-argument 32-bit syscall via VDSO
[OK]	Arguments are preserved across syscall
[NOTE]	R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK]	R8..R15 did not leak kernel data
[RUN]	Executing 6-argument 32-bit syscall via INT 80
[OK]	Arguments are preserved across syscall
[OK]	R8..R15 did not leak kernel data
[RUN]	Running tests under ptrace
[RUN]	Executing 6-argument 32-bit syscall via VDSO
[OK]	Arguments are preserved across syscall
[OK]	R8..R15 did not leak kernel data
[RUN]	Executing 6-argument 32-bit syscall via INT 80
[OK]	Arguments are preserved across syscall
[OK]	R8..R15 did not leak kernel data

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Krzysztof A. Sobiecki <sobkas@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/entry/entry_64_compat.S | 140 +++++++++++++++++++--------------------
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 2d0a2f0..6ee70fd 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -95,74 +95,25 @@ ENTRY(entry_SYSENTER_compat)
 	 * Sysenter doesn't filter flags, so we need to clear NT ourselves.
 	 */
 	testl	$X86_EFLAGS_NT, EFLAGS(%rsp)
-	jnz	sysenter_fix_flags
-sysenter_flags_fixed:
-	jmp	sysenter_jumps_here
-
-sysenter_fix_flags:
+	jz	sysenter_jumps_here
 	pushq	$(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
 	popfq
-	jmp	sysenter_flags_fixed
-ENDPROC(entry_SYSENTER_compat)
-
-
-	#ifdef CONFIG_AUDITSYSCALL
-		.macro auditsys_entry_common
-		/*
-		 * At this point, registers hold syscall args in the 32-bit syscall ABI:
-		 * EAX is syscall number, the 6 args are in EBX,ECX,EDX,ESI,EDI,EBP.
-		 *
-		 * We want to pass them to __audit_syscall_entry(), which is a 64-bit
-		 * C function with 5 parameters, so shuffle them to match what
-		 * the function expects: RDI,RSI,RDX,RCX,R8.
-		 */
-		movl	%esi, %r8d		/* arg5 (R8 ) <= 4th syscall arg (ESI) */
-		xchg	%ecx, %edx		/* arg4 (RCX) <= 3rd syscall arg (EDX) */
-						/* arg3 (RDX) <= 2nd syscall arg (ECX) */
-		movl	%ebx, %esi		/* arg2 (RSI) <= 1st syscall arg (EBX) */
-		movl	%eax, %edi		/* arg1 (RDI) <= syscall number  (EAX) */
-		call	__audit_syscall_entry
-
-		/*
-		 * We are going to jump back to the syscall dispatch code.
-		 * Prepare syscall args as required by the 64-bit C ABI.
-		 * Registers clobbered by __audit_syscall_entry() are
-		 * loaded from pt_regs on stack:
-		 */
-		movl	ORIG_RAX(%rsp), %eax	/* syscall number */
-		movl	%ebx, %edi		/* arg1 */
-		movl	RCX(%rsp), %esi		/* arg2 */
-		movl	RDX(%rsp), %edx		/* arg3 */
-		movl	RSI(%rsp), %ecx		/* arg4 */
-		movl	RDI(%rsp), %r8d		/* arg5 */
-		.endm
-
-		.macro auditsys_exit exit
-		TRACE_IRQS_ON
-		ENABLE_INTERRUPTS(CLBR_NONE)
-		testl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-		jnz	ia32_ret_from_sys_call
-		movl	%eax, %esi		/* second arg, syscall return value */
-		cmpl	$-MAX_ERRNO, %eax	/* is it an error ? */
-		jbe	1f
-		movslq	%eax, %rsi		/* if error sign extend to 64 bits */
-	1:	setbe	%al			/* 1 if error, 0 if not */
-		movzbl	%al, %edi		/* zero-extend that into %edi */
-		call	__audit_syscall_exit
-		movq	RAX(%rsp), %rax		/* reload syscall return value */
-		movl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi
-		DISABLE_INTERRUPTS(CLBR_NONE)
-		TRACE_IRQS_OFF
-		testl	%edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
-		jz	\exit
-		xorl	%eax, %eax		/* Do not leak kernel information */
-		movq	%rax, R11(%rsp)
-		movq	%rax, R10(%rsp)
-		movq	%rax, R9(%rsp)
-		movq	%rax, R8(%rsp)
-		jmp	int_ret_from_sys_call_irqs_off
-		.endm
-	#endif
+	jmp	sysenter_jumps_here
+	/*
+	 * SYSEXIT insn is not obviously safe for 64-bit kernels --
+	 * an NMI between STI and SYSEXIT has poorly specified behavior,
+	 * and NMI followed by an IRQ with usergs is fatal.
+	 * So we just pretend we're using SYSEXIT but we really use
+	 * SYSRETL instead. (Yes, SYSRETL works even on Intel CPUs.)
+	 * We do that by reusing the entire SYSCALL code path:
+	 * the jump above takes us there.
+	 *
+	 * The difference of SYSENTER 32-bit ABI versus SYSCALL
+	 * is that SYSENTER ABI does not promise to preserve EDX and EBP,
+	 * SYSCALL does.
+	 */
+ENDPROC(entry_SYSENTER_compat)
+
 /*
  * 32-bit SYSCALL instruction entry.
  *
@@ -285,13 +236,62 @@ sysretl_from_sys_call:
 
 #ifdef CONFIG_AUDITSYSCALL
 cstar_auditsys:
-	movl	%r9d, R9(%rsp)		/* register to be clobbered by call */
-	auditsys_entry_common
-	movl	R9(%rsp), %r9d		/* reload 6th syscall arg */
+	movl	%r9d, R9(%rsp)		/* R9 is callee-clobbered, save it */
+	/*
+	 * At this point, registers hold syscall args in the 32-bit syscall ABI:
+	 * EAX is syscall number, the 6 args are in EBX,ECX,EDX,ESI,EDI,
+	 * 6th arg is in R9.
+	 *
+	 * We want to pass them to __audit_syscall_entry(), which is a 64-bit
+	 * C function with 5 parameters, so shuffle them to match what
+	 * the function expects: RDI,RSI,RDX,RCX,R8.
+	 */
+	movl	%esi, %r8d		/* arg5 (R8 ) <= 4th syscall arg (ESI) */
+	xchg	%ecx, %edx		/* arg4 (RCX) <= 3rd syscall arg (EDX) */
+					/* arg3 (RDX) <= 2nd syscall arg (ECX) */
+	movl	%ebx, %esi		/* arg2 (RSI) <= 1st syscall arg (EBX) */
+	movl	%eax, %edi		/* arg1 (RDI) <= syscall number  (EAX) */
+	call	__audit_syscall_entry
+
+	/*
+	 * We are going to jump back to the syscall dispatch code.
+	 * Prepare syscall args as required by the 64-bit C ABI.
+	 * Registers clobbered by __audit_syscall_entry() are
+	 * loaded from pt_regs on stack:
+	 */
+	movl	ORIG_RAX(%rsp), %eax	/* syscall number */
+	movl	%ebx, %edi		/* arg1 */
+	movl	RCX(%rsp), %esi		/* arg2 */
+	movl	RDX(%rsp), %edx		/* arg3 */
+	movl	RSI(%rsp), %ecx		/* arg4 */
+	movl	RDI(%rsp), %r8d		/* arg5 */
+	movl	R9(%rsp), %r9d		/* arg6 */
 	jmp	cstar_dispatch
 
 sysretl_audit:
-	auditsys_exit sysretl_from_sys_call
+	TRACE_IRQS_ON
+	ENABLE_INTERRUPTS(CLBR_NONE)
+	testl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+	jnz	ia32_ret_from_sys_call
+	movl	%eax, %esi		/* second arg, syscall return value */
+	cmpl	$-MAX_ERRNO, %eax	/* is it an error ? */
+	jbe	1f
+	movslq	%eax, %rsi		/* if error sign extend to 64 bits */
+1:	setbe	%al			/* 1 if error, 0 if not */
+	movzbl	%al, %edi		/* zero-extend that into %edi */
+	call	__audit_syscall_exit
+	movq	RAX(%rsp), %rax		/* reload syscall return value */
+	movl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %edi
+	DISABLE_INTERRUPTS(CLBR_NONE)
+	TRACE_IRQS_OFF
+	testl	%edi, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+	jz	sysretl_from_sys_call
+	xorl	%eax, %eax		/* Do not leak kernel information */
+	movq	%rax, R11(%rsp)
+	movq	%rax, R10(%rsp)
+	movq	%rax, R9(%rsp)
+	movq	%rax, R8(%rsp)
+	jmp	int_ret_from_sys_call_irqs_off
 #endif
 
 cstar_tracesys:
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32
  2015-07-24 13:47 [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32 Denys Vlasenko
  2015-07-24 13:47 ` [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1 Denys Vlasenko
  2015-07-24 13:47 ` [PATCH 3/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 2 Denys Vlasenko
@ 2015-07-24 17:37 ` Andy Lutomirski
  2 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-07-24 17:37 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Linus Torvalds, Krzysztof A. Sobiecki,
	Steven Rostedt, Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> This change swaps a few instructions in final register restoring/zeroing
> section of SYSENTER fast path, and adds/deletes a few empty lines.
>
> After this, the difference between SYSENTER and SYCALL fast paths
> (after the prologue which saved pt_regs) is very small:
> they differ merely in the choice of register to hold arg6 (EBP or R9)
> and in the value of EDX on exit: SYSENTER ABI doesn't need to preserve it,
> so it is zeroed. SYSCALL preserves it:

Reviewed-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1
  2015-07-24 13:47 ` [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1 Denys Vlasenko
@ 2015-07-24 17:50   ` Andy Lutomirski
  2015-07-25 18:36     ` Denys Vlasenko
  2015-07-27 19:19     ` Denys Vlasenko
  2015-07-27 16:05   ` Ingo Molnar
  1 sibling, 2 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-07-24 17:50 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Linus Torvalds, Krzysztof A. Sobiecki,
	Steven Rostedt, Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
> section. Merge them.
>
> The removal is split into two parts, to make review eaiser. This is part 1.
>
> auditsys_entry_common and auditsys_exit macros are indented one more tab without
> any changes. This prevents diff from becoming unreadable.
> They will be removed in part 2.

I need to read these more closely, which is, at present, exceeding my
ability to look at asm.  (See the big NMI thread.)  I'll look soon.

Meanwhile, this code is incredibly fragile wrt syscall restart.
(Syscall restart on compat is really weird.)  Do we have a decent test
for it?

--Andy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1
  2015-07-24 17:50   ` Andy Lutomirski
@ 2015-07-25 18:36     ` Denys Vlasenko
  2015-07-25 19:33       ` Andy Lutomirski
  2015-07-27 19:19     ` Denys Vlasenko
  1 sibling, 1 reply; 11+ messages in thread
From: Denys Vlasenko @ 2015-07-25 18:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Linus Torvalds, Krzysztof A. Sobiecki,
	Steven Rostedt, Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On 07/24/2015 07:50 PM, Andy Lutomirski wrote:
> On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
>> section. Merge them.
>>
>> The removal is split into two parts, to make review eaiser. This is part 1.
>>
>> auditsys_entry_common and auditsys_exit macros are indented one more tab without
>> any changes. This prevents diff from becoming unreadable.
>> They will be removed in part 2.
> 
> I need to read these more closely, which is, at present, exceeding my
> ability to look at asm.  (See the big NMI thread.)  I'll look soon.

The "sysenter_fix_flags" thingy prevented the diff from being
a pure delete, so it is not as clear as I hoped.

What patch is doing is actually very simple. It "amputates"
entire SYSENTER code path after it finished creating partially
filled pt_regs, loaded arg6 and dealt with EFLAGS sanitization -
after this is done, the state is identical to the similar
state in SYSCALL code, so we can just use SYSCALL code from that moment
onward! :)


> Meanwhile, this code is incredibly fragile wrt syscall restart.
> (Syscall restart on compat is really weird.)

Weird in what way?

> Do we have a decent test for it?

I don't know of any such test.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1
  2015-07-25 18:36     ` Denys Vlasenko
@ 2015-07-25 19:33       ` Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-07-25 19:33 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Linus Torvalds, Krzysztof A. Sobiecki,
	Steven Rostedt, Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Sat, Jul 25, 2015 at 11:36 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 07/24/2015 07:50 PM, Andy Lutomirski wrote:
>> On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
>>> section. Merge them.
>>>
>>> The removal is split into two parts, to make review eaiser. This is part 1.
>>>
>>> auditsys_entry_common and auditsys_exit macros are indented one more tab without
>>> any changes. This prevents diff from becoming unreadable.
>>> They will be removed in part 2.
>>
>> I need to read these more closely, which is, at present, exceeding my
>> ability to look at asm.  (See the big NMI thread.)  I'll look soon.
>
> The "sysenter_fix_flags" thingy prevented the diff from being
> a pure delete, so it is not as clear as I hoped.
>
> What patch is doing is actually very simple. It "amputates"
> entire SYSENTER code path after it finished creating partially
> filled pt_regs, loaded arg6 and dealt with EFLAGS sanitization -
> after this is done, the state is identical to the similar
> state in SYSCALL code, so we can just use SYSCALL code from that moment
> onward! :)
>

I certainly agree that your patches are a nice cleanup.  I just want
to make sure there isn't something subtle and undocumented going on
there.

>
>> Meanwhile, this code is incredibly fragile wrt syscall restart.
>> (Syscall restart on compat is really weird.)
>
> Weird in what way?

See:

https://lkml.kernel.org/g/20110821084230.GI2203@ZenIV.linux.org.uk

--Andy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1
  2015-07-24 13:47 ` [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1 Denys Vlasenko
  2015-07-24 17:50   ` Andy Lutomirski
@ 2015-07-27 16:05   ` Ingo Molnar
  1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2015-07-27 16:05 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Linus Torvalds, Krzysztof A. Sobiecki, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	x86, linux-kernel


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
> section. Merge them.
> 
> The removal is split into two parts, to make review eaiser. This is part 1.
> 
> auditsys_entry_common and auditsys_exit macros are indented one more tab without
> any changes. This prevents diff from becoming unreadable.
> They will be removed in part 2.
> 
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Krzysztof A. Sobiecki <sobkas@gmail.com>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Borislav Petkov <bp@alien8.de>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: Alexei Starovoitov <ast@plumgrid.com>
> CC: Will Drewry <wad@chromium.org>
> CC: Kees Cook <keescook@chromium.org>
> CC: x86@kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>  arch/x86/entry/entry_64_compat.S | 237 +++++++++++----------------------------
>  1 file changed, 63 insertions(+), 174 deletions(-)

Yeah, so I realize that this is already a 'cleaner', split up version of your 
original change - but the diffstat is still way too large: _please_ split it up 
into 2-3 further steps to make each step really, utterly obvious.

As you must have experienced it with recent regressions in the x86 entry code, the 
smaller our assembly changes, the easier our job of doing a revert is, when such a 
change regresses ...

I don't care if it ends up being 5-7 patches - that's far more preferable to 
having to decode difficult looking regressions.

For example in hindsight I regret that I did not insist on further splitting up 
this commit:

  53e9accf0f76 ("x86/asm/entry/32: Do not use R9 in SYSCALL32 entry point")

and that was a small patch already:

 arch/x86/entry/ia32entry.S | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

I'm not going to make that mistake again: if you want to change the x86 assembly 
code, you need to learn to decompose dangerous changes into maximally split up, 
atomic, bisectable steps.

Agreed?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1
  2015-07-24 17:50   ` Andy Lutomirski
  2015-07-25 18:36     ` Denys Vlasenko
@ 2015-07-27 19:19     ` Denys Vlasenko
  2015-07-27 19:26       ` Andy Lutomirski
  1 sibling, 1 reply; 11+ messages in thread
From: Denys Vlasenko @ 2015-07-27 19:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Linus Torvalds, Krzysztof A. Sobiecki,
	Steven Rostedt, Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On 07/24/2015 07:50 PM, Andy Lutomirski wrote:
> On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
>> section. Merge them.
>>
>> The removal is split into two parts, to make review eaiser. This is part 1.
>>
>> auditsys_entry_common and auditsys_exit macros are indented one more tab without
>> any changes. This prevents diff from becoming unreadable.
>> They will be removed in part 2.
> 
> I need to read these more closely, which is, at present, exceeding my
> ability to look at asm.  (See the big NMI thread.)  I'll look soon.
> 
> Meanwhile, this code is incredibly fragile wrt syscall restart.
> (Syscall restart on compat is really weird.)  Do we have a decent test
> for it?

How about this? (Feel free to expand, this is a first cut only).

$ ./test_restart
[RUN]	Test restart of read()
[OK]	Restart of read() worked
[RUN]	Test restart of recv()
[OK]	Restart of recv() worked





#undef _GNU_SOURCE
#define _GNU_SOURCE 1
#undef __USE_GNU
#define __USE_GNU 1
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <sys/time.h>
//#include <sys/ptrace.h>
#include <sys/wait.h>

void signal_SA_RESTART_empty_mask(int sig, void (*handler)(int))
{
	struct sigaction sa;
	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	sa.sa_handler = handler;
	sigaction(sig, &sa, NULL);
}

int make_writer(void)
{
	int pid, fd[2];
	if (pipe(fd)) exit(1);
	pid = fork();
	if (pid < 0) exit(1);
	if (pid == 0) {
		/* child */
		usleep(50*1000);
		write(fd[1], "123456789", 10);
		exit(0);
	}
	close(fd[1]);
	return fd[0];
}

int make_sender(void)
{
	int pid, fd[2];
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) exit(1);
	pid = fork();
	if (pid < 0) exit(1);
	if (pid == 0) {
		/* child */
		usleep(50*1000);
		write(fd[1], "123456789", 10);
		exit(0);
	}
	close(fd[1]);
	return fd[0];
}

int got_sig;

void sighandler(int sig)
{
	got_sig = 1;
}

void prepare(char buf[10])
{
	got_sig = 0;
	strcpy(buf, "qwertyuio");
	ualarm(25*1000, 0);
}

int check(const char *test_type, int len, char buf[10])
{
	int err = 0;
	if (len != 10 || strcmp(buf, "123456789") != 0) {
		printf("[FAIL]\tRestarted %s() returns wrong result\n", test_type);
		err = 1;
	}
	if (!got_sig) {
		printf("[FAIL]\tSignal was expected on %s read() but wasn't seen\n", test_type);
		err = 1;
	}
	if (!err)
		printf("[OK]\tRestart of %s() worked\n", test_type);
	return err;
}

int main(int argc, char **argv, char **envp)
{
	char buf[10];
	int fd;
	int len;
	int err;

	signal_SA_RESTART_empty_mask(SIGALRM, sighandler);

	printf("[RUN]\tTest restart of read()\n");
	fd = make_writer();
	prepare(buf);
	len = read(fd, buf, 10);
	err = check("read", len, buf);
	close(fd);

	printf("[RUN]\tTest restart of recv()\n");
	fd = make_sender();
	prepare(buf);
	len = recv(fd, buf, 10, MSG_PEEK);
	err |= check("recv", len, buf);
	if (!err) {
		/* Check that restarted recv() indeed had MSG_PEEK:
		 * i.e. that it did not consume the data.
		 */
		strcpy(buf, "qwertyuio");
		len = recv(fd, buf, 10, 0);
		if (len != 10 || strcmp(buf, "123456789") != 0) {
			printf("[FAIL]\tRestarted %s() had no MSG_PEEK flag\n", "recv");
			err = 1;
		}
	}
	close(fd);

	return err;
}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1
  2015-07-27 19:19     ` Denys Vlasenko
@ 2015-07-27 19:26       ` Andy Lutomirski
  2015-08-25  7:19         ` Andy Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2015-07-27 19:26 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Linus Torvalds, Krzysztof A. Sobiecki,
	Steven Rostedt, Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Mon, Jul 27, 2015 at 12:19 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 07/24/2015 07:50 PM, Andy Lutomirski wrote:
>> On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
>>> section. Merge them.
>>>
>>> The removal is split into two parts, to make review eaiser. This is part 1.
>>>
>>> auditsys_entry_common and auditsys_exit macros are indented one more tab without
>>> any changes. This prevents diff from becoming unreadable.
>>> They will be removed in part 2.
>>
>> I need to read these more closely, which is, at present, exceeding my
>> ability to look at asm.  (See the big NMI thread.)  I'll look soon.
>>
>> Meanwhile, this code is incredibly fragile wrt syscall restart.
>> (Syscall restart on compat is really weird.)  Do we have a decent test
>> for it?
>
> How about this? (Feel free to expand, this is a first cut only).

On a very brief glance, it looks reasonable, but I'd try it with
recvfrom instead of recv because it's a six-argument syscall.

--Andy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1
  2015-07-27 19:26       ` Andy Lutomirski
@ 2015-08-25  7:19         ` Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2015-08-25  7:19 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Linus Torvalds, Krzysztof A. Sobiecki,
	Steven Rostedt, Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	X86 ML, linux-kernel

On Mon, Jul 27, 2015 at 12:26 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jul 27, 2015 at 12:19 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> On 07/24/2015 07:50 PM, Andy Lutomirski wrote:
>>> On Fri, Jul 24, 2015 at 6:47 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>>> SYSCALL32 code is nearly identical to SYSCALL32, except for initial
>>>> section. Merge them.
>>>>
>>>> The removal is split into two parts, to make review eaiser. This is part 1.
>>>>
>>>> auditsys_entry_common and auditsys_exit macros are indented one more tab without
>>>> any changes. This prevents diff from becoming unreadable.
>>>> They will be removed in part 2.
>>>
>>> I need to read these more closely, which is, at present, exceeding my
>>> ability to look at asm.  (See the big NMI thread.)  I'll look soon.
>>>
>>> Meanwhile, this code is incredibly fragile wrt syscall restart.
>>> (Syscall restart on compat is really weird.)  Do we have a decent test
>>> for it?
>>
>> How about this? (Feel free to expand, this is a first cut only).
>
> On a very brief glance, it looks reasonable, but I'd try it with
> recvfrom instead of recv because it's a six-argument syscall.
>

I was wondering how syscall restart works on compat syscalls on AMD
(i.e. SYSCALL32).  I think the answer is that it doesn't.  Sigh.
(Tested with ptrace instead of actual syscall restart.)

--Andy

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-08-25  7:19 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-24 13:47 [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32 Denys Vlasenko
2015-07-24 13:47 ` [PATCH 2/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 1 Denys Vlasenko
2015-07-24 17:50   ` Andy Lutomirski
2015-07-25 18:36     ` Denys Vlasenko
2015-07-25 19:33       ` Andy Lutomirski
2015-07-27 19:19     ` Denys Vlasenko
2015-07-27 19:26       ` Andy Lutomirski
2015-08-25  7:19         ` Andy Lutomirski
2015-07-27 16:05   ` Ingo Molnar
2015-07-24 13:47 ` [PATCH 3/3] x86/asm/entry/32: Remove most of SYSCALL32 code, part 2 Denys Vlasenko
2015-07-24 17:37 ` [PATCH 1/3] x86/asm/entry/32: Massage SYSENTER32 fast path to be nearly identical to SYSCALL32 Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.