linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCH 0/7] x86_32: Migrate to new entry/exit paths
@ 2015-07-08 19:23 Andy Lutomirski
  2015-07-08 19:23 ` [RFC/PATCH 1/7] x86/entry/32: Remove 32-bit syscall audit optimizations Andy Lutomirski
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-08 19:23 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst,
	Linus Torvalds, Andy Lutomirski

This is a review version of the 32-bit asm-to-C migration.  I think
it works, but it's not yet well enough tested.  I'm a lot more
familiar with the 64-bit asm than the 32-bit asm.

Al is cc'd, because some of this partially reverts some of his old
changes.

The vm86 stuff especially needs much more careful testing.  Brian,
since you're playing with vm86 now, can you take a look?

NB: Even if this code turns out to be perfect, patches 3 and 4 need
to be squashed together.  Patch 3 is a hack to temporarily add an
assertion that the existing asm works the way I think it does.  The
assertion doesn't apprear to fire but, again, I need to pound on it
harder.

Andy Lutomirski (7):
  x86/entry/32: Remove 32-bit syscall audit optimizations
  x86/entry/32: Fix an incorrect comment for work_notifysig_v86
  [TEMPORARY] x86/entry/32: Sanity check for work_notifysig
  x86/entry/32: Finish removing bogus kernel-mode check
  x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly
  x86/entry/32: Use prepare_exit_to_usermode and syscall_return_slowpath
  x86/entry: Remove do_notify_resume, syscall_trace_leave, and their TIF
    masks

 arch/x86/entry/common.c            |  57 ----------------
 arch/x86/entry/entry_32.S          | 130 ++++---------------------------------
 arch/x86/include/asm/ptrace.h      |   1 -
 arch/x86/include/asm/signal.h      |   1 -
 arch/x86/include/asm/thread_info.h |  16 -----
 arch/x86/kernel/traps.c            |  12 ++++
 arch/x86/kernel/vm86_32.c          |   8 +--
 7 files changed, 26 insertions(+), 199 deletions(-)

-- 
2.4.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC/PATCH 1/7] x86/entry/32: Remove 32-bit syscall audit optimizations
  2015-07-08 19:23 [RFC/PATCH 0/7] x86_32: Migrate to new entry/exit paths Andy Lutomirski
@ 2015-07-08 19:23 ` Andy Lutomirski
  2015-07-08 19:23 ` [RFC/PATCH 2/7] x86/entry/32: Fix an incorrect comment for work_notifysig_v86 Andy Lutomirski
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-08 19:23 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst,
	Linus Torvalds, Andy Lutomirski

The asm audit optimizations are ugly and obfuscate the code too
much.  Remove them.

This will regress performance if syscall auditing is enabled on
32-bit kernels and sysenter is in use.  If this becomes a problem,
interested parties are encouraged to implement the equivalent of the
64-bit opportunistic sysret optimization.

Alternatively, a case could be made that, on 32-bit kernels, a less
messy asm audit optimization could be done.  32-bit kernels don't have
the complicated partial register saving tricks that 64-bit kernels
have, so the sysenter post-syscall path could just call the audit
hooks directly.  Any reimplementation of this ought to demonstrate
that it only calls the audit hook once per syscall, though, which does
not currently appear to be true.  Someone would have to make the case
that doing so would be better than implementing opportunistic sysexit,
though.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 48 ++---------------------------------------------
 1 file changed, 2 insertions(+), 46 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 21dc60a60b5f..90f9e7f6c15e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -45,16 +45,6 @@
 #include <asm/asm.h>
 #include <asm/smap.h>
 
-/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this.  */
-#include <linux/elf-em.h>
-#define AUDIT_ARCH_I386		(EM_386|__AUDIT_ARCH_LE)
-#define __AUDIT_ARCH_LE		0x40000000
-
-#ifndef CONFIG_AUDITSYSCALL
-# define sysenter_audit		syscall_trace_entry
-# define sysexit_audit		syscall_exit_work
-#endif
-
 	.section .entry.text, "ax"
 
 /*
@@ -339,7 +329,7 @@ sysenter_past_esp:
 	GET_THREAD_INFO(%ebp)
 
 	testl	$_TIF_WORK_SYSCALL_ENTRY, TI_flags(%ebp)
-	jnz	sysenter_audit
+	jnz	syscall_trace_entry
 sysenter_do_call:
 	cmpl	$(NR_syscalls), %eax
 	jae	sysenter_badsys
@@ -351,7 +341,7 @@ sysenter_after_call:
 	TRACE_IRQS_OFF
 	movl	TI_flags(%ebp), %ecx
 	testl	$_TIF_ALLWORK_MASK, %ecx
-	jnz	sysexit_audit
+	jnz	syscall_exit_work
 sysenter_exit:
 /* if something modifies registers it must also disable sysexit */
 	movl	PT_EIP(%esp), %edx
@@ -362,40 +352,6 @@ sysenter_exit:
 	PTGS_TO_GS
 	ENABLE_INTERRUPTS_SYSEXIT
 
-#ifdef CONFIG_AUDITSYSCALL
-sysenter_audit:
-	testl	$(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT), TI_flags(%ebp)
-	jnz	syscall_trace_entry
-	/* movl	PT_EAX(%esp), %eax already set, syscall number: 1st arg to audit */
-	movl	PT_EBX(%esp), %edx		/* ebx/a0: 2nd arg to audit */
-	/* movl	PT_ECX(%esp), %ecx already set, a1: 3nd arg to audit */
-	pushl	PT_ESI(%esp)			/* a3: 5th arg */
-	pushl	PT_EDX+4(%esp)			/* a2: 4th arg */
-	call	__audit_syscall_entry
-	popl	%ecx				/* get that remapped edx off the stack */
-	popl	%ecx				/* get that remapped esi off the stack */
-	movl	PT_EAX(%esp), %eax		/* reload syscall number */
-	jmp	sysenter_do_call
-
-sysexit_audit:
-	testl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %ecx
-	jnz	syscall_exit_work
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_ANY)
-	movl	%eax, %edx			/* second arg, syscall return value */
-	cmpl	$-MAX_ERRNO, %eax		/* is it an error ? */
-	setbe %al				/* 1 if so, 0 if not */
-	movzbl %al, %eax			/* zero-extend that */
-	call	__audit_syscall_exit
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
-	movl	TI_flags(%ebp), %ecx
-	testl	$(_TIF_ALLWORK_MASK & ~_TIF_SYSCALL_AUDIT), %ecx
-	jnz	syscall_exit_work
-	movl	PT_EAX(%esp), %eax		/* reload syscall return value */
-	jmp	sysenter_exit
-#endif
-
 .pushsection .fixup, "ax"
 2:	movl	$0, PT_FS(%esp)
 	jmp	1b
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC/PATCH 2/7] x86/entry/32: Fix an incorrect comment for work_notifysig_v86
  2015-07-08 19:23 [RFC/PATCH 0/7] x86_32: Migrate to new entry/exit paths Andy Lutomirski
  2015-07-08 19:23 ` [RFC/PATCH 1/7] x86/entry/32: Remove 32-bit syscall audit optimizations Andy Lutomirski
@ 2015-07-08 19:23 ` Andy Lutomirski
  2015-07-08 19:24 ` [RFC/PATCH 3/7] [TEMPORARY] x86/entry/32: Sanity check for work_notifysig Andy Lutomirski
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-08 19:23 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst,
	Linus Torvalds, Andy Lutomirski

This code path is for returns to user mode only.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 90f9e7f6c15e..a36d5df6a749 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -484,8 +484,7 @@ work_notifysig:					# deal with pending signals and
 #ifdef CONFIG_VM86
 	testl	$X86_EFLAGS_VM, PT_EFLAGS(%esp)
 	movl	%esp, %eax
-	jnz	work_notifysig_v86		# returning to kernel-space or
-						# vm86-space
+	jnz	work_notifysig_v86		# special case for v86
 1:
 #else
 	movl	%esp, %eax
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC/PATCH 3/7] [TEMPORARY] x86/entry/32: Sanity check for work_notifysig
  2015-07-08 19:23 [RFC/PATCH 0/7] x86_32: Migrate to new entry/exit paths Andy Lutomirski
  2015-07-08 19:23 ` [RFC/PATCH 1/7] x86/entry/32: Remove 32-bit syscall audit optimizations Andy Lutomirski
  2015-07-08 19:23 ` [RFC/PATCH 2/7] x86/entry/32: Fix an incorrect comment for work_notifysig_v86 Andy Lutomirski
@ 2015-07-08 19:24 ` Andy Lutomirski
  2015-07-08 19:25   ` Andy Lutomirski
  2015-07-08 19:24 ` [RFC/PATCH 4/7] x86/entry/32: Finish removing bogus kernel-mode check Andy Lutomirski
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-08 19:24 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst,
	Linus Torvalds, Andy Lutomirski

44fbbb3dc687c added an unnecessary check.  Add a temporary assertion
to confirm that it's unnecessary.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index a36d5df6a749..d36afad80ad1 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -491,10 +491,13 @@ work_notifysig:					# deal with pending signals and
 #endif
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
+	/* Temporary sanity check */
 	movb	PT_CS(%esp), %bl
 	andb	$SEGMENT_RPL_MASK, %bl
 	cmpb	$USER_RPL, %bl
-	jb	resume_kernel
+	jnb	2f
+	ud2
+2:
 	xorl	%edx, %edx
 	call	do_notify_resume
 	jmp	resume_userspace
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC/PATCH 4/7] x86/entry/32: Finish removing bogus kernel-mode check
  2015-07-08 19:23 [RFC/PATCH 0/7] x86_32: Migrate to new entry/exit paths Andy Lutomirski
                   ` (2 preceding siblings ...)
  2015-07-08 19:24 ` [RFC/PATCH 3/7] [TEMPORARY] x86/entry/32: Sanity check for work_notifysig Andy Lutomirski
@ 2015-07-08 19:24 ` Andy Lutomirski
  2015-07-08 19:25   ` Andy Lutomirski
  2015-07-08 19:24 ` [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly Andy Lutomirski
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-08 19:24 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst,
	Linus Torvalds, Andy Lutomirski

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d36afad80ad1..66ff9c4055d7 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -491,13 +491,6 @@ work_notifysig:					# deal with pending signals and
 #endif
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	/* Temporary sanity check */
-	movb	PT_CS(%esp), %bl
-	andb	$SEGMENT_RPL_MASK, %bl
-	cmpb	$USER_RPL, %bl
-	jnb	2f
-	ud2
-2:
 	xorl	%edx, %edx
 	call	do_notify_resume
 	jmp	resume_userspace
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly
  2015-07-08 19:23 [RFC/PATCH 0/7] x86_32: Migrate to new entry/exit paths Andy Lutomirski
                   ` (3 preceding siblings ...)
  2015-07-08 19:24 ` [RFC/PATCH 4/7] x86/entry/32: Finish removing bogus kernel-mode check Andy Lutomirski
@ 2015-07-08 19:24 ` Andy Lutomirski
  2015-07-09 22:41   ` Andy Lutomirski
  2015-07-08 19:24 ` [RFC/PATCH 6/7] x86/entry/32: Use prepare_exit_to_usermode and syscall_return_slowpath Andy Lutomirski
  2015-07-08 19:24 ` [RFC/PATCH 7/7] x86/entry: Remove do_notify_resume, syscall_trace_leave, and their TIF masks Andy Lutomirski
  6 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-08 19:24 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst,
	Linus Torvalds, Andy Lutomirski

The TIF_NOTIFY_RESUME hack it was using was buggy and unsupportable.
vm86 mode was completely broken under ptrace, for example, because
we'd never make it to v8086 mode.

This code is still a huge, scary mess, but at least it's no longer
tangled with the exit-to-userspace loop.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 17 -----------------
 arch/x86/kernel/traps.c   | 12 ++++++++++++
 arch/x86/kernel/vm86_32.c |  8 ++------
 3 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 66ff9c4055d7..3afd201bce55 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -481,29 +481,12 @@ work_resched:
 
 work_notifysig:					# deal with pending signals and
 						# notify-resume requests
-#ifdef CONFIG_VM86
-	testl	$X86_EFLAGS_VM, PT_EFLAGS(%esp)
-	movl	%esp, %eax
-	jnz	work_notifysig_v86		# special case for v86
-1:
-#else
 	movl	%esp, %eax
-#endif
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
 	xorl	%edx, %edx
 	call	do_notify_resume
 	jmp	resume_userspace
-
-#ifdef CONFIG_VM86
-	ALIGN
-work_notifysig_v86:
-	pushl	%ecx				# save ti_flags for do_notify_resume
-	call	save_v86_state			# %eax contains pt_regs pointer
-	popl	%ecx
-	movl	%eax, %esp
-	jmp	1b
-#endif
 END(work_pending)
 
 	# perform syscall exit tracing
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8e65d8a9b8db..f86172d4f5db 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -190,6 +190,13 @@ do_trap_no_signal(struct task_struct *tsk, int trapnr, char *str,
 		 * On nmi (interrupt 2), do_trap should not be called.
 		 */
 		if (trapnr < X86_TRAP_UD) {
+			/*
+			 * handle_vm86_trap may not return.  If that
+			 * happens, then there is no debug stack counter
+			 * (it's a 32-bit kernel) and handle_vm86_trap
+			 * will clear the preempt counter.
+			 */
+
 			if (!handle_vm86_trap((struct kernel_vm86_regs *) regs,
 						error_code, trapnr))
 				return 0;
@@ -650,6 +657,11 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
 	preempt_conditional_sti(regs);
 
 	if (v8086_mode(regs)) {
+		/*
+		 * handle_vm86_trap may not return.  If that happens, then
+		 * there is no debug stack counter (it's a 32-bit kernel)
+		 * and handle_vm86_trap will clear the preempt counter.
+		 */
 		handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code,
 					X86_TRAP_DB);
 		preempt_conditional_cli(regs);
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index fc9db6ef2a95..c526ee34c22d 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -351,6 +351,7 @@ static inline void return_to_32bit(struct kernel_vm86_regs *regs16, int retval)
 {
 	struct pt_regs *regs32;
 
+	preempt_count_set(0);
 	regs32 = save_v86_state(regs16);
 	regs32->ax = retval;
 	__asm__ __volatile__("movl %0,%%esp\n\t"
@@ -549,12 +550,7 @@ int handle_vm86_trap(struct kernel_vm86_regs *regs, long error_code, int trapno)
 {
 	if (VMPI.is_vm86pus) {
 		if ((trapno == 3) || (trapno == 1)) {
-			KVM86->regs32->ax = VM86_TRAP + (trapno << 8);
-			/* setting this flag forces the code in entry_32.S to
-			   the path where we call save_v86_state() and change
-			   the stack pointer to KVM86->regs32 */
-			set_thread_flag(TIF_NOTIFY_RESUME);
-			return 0;
+			return_to_32bit(regs, VM86_TRAP + (trapno << 8));
 		}
 		do_int(regs, trapno, (unsigned char __user *) (regs->pt.ss << 4), SP(regs));
 		return 0;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC/PATCH 6/7] x86/entry/32: Use prepare_exit_to_usermode and syscall_return_slowpath
  2015-07-08 19:23 [RFC/PATCH 0/7] x86_32: Migrate to new entry/exit paths Andy Lutomirski
                   ` (4 preceding siblings ...)
  2015-07-08 19:24 ` [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly Andy Lutomirski
@ 2015-07-08 19:24 ` Andy Lutomirski
  2015-07-08 19:24 ` [RFC/PATCH 7/7] x86/entry: Remove do_notify_resume, syscall_trace_leave, and their TIF masks Andy Lutomirski
  6 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-08 19:24 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst,
	Linus Torvalds, Andy Lutomirski

This removes the hybrid asm-and-C implementation of exit work.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_32.S | 62 +++++++++--------------------------------------
 1 file changed, 11 insertions(+), 51 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 3afd201bce55..b2909bf8cf70 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -256,14 +256,10 @@ ret_from_intr:
 
 ENTRY(resume_userspace)
 	LOCKDEP_SYS_EXIT
-	DISABLE_INTERRUPTS(CLBR_ANY)		# make sure we don't miss an interrupt
-						# setting need_resched or sigpending
-						# between sampling and the iret
+	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF
-	movl	TI_flags(%ebp), %ecx
-	andl	$_TIF_WORK_MASK, %ecx		# is there any work to be done on
-						# int/exception return?
-	jne	work_pending
+	movl	%esp, %eax
+	call	prepare_exit_to_usermode
 	jmp	restore_all
 END(ret_from_exception)
 
@@ -341,7 +337,7 @@ sysenter_after_call:
 	TRACE_IRQS_OFF
 	movl	TI_flags(%ebp), %ecx
 	testl	$_TIF_ALLWORK_MASK, %ecx
-	jnz	syscall_exit_work
+	jnz	syscall_exit_work_irqs_off
 sysenter_exit:
 /* if something modifies registers it must also disable sysexit */
 	movl	PT_EIP(%esp), %edx
@@ -377,13 +373,7 @@ syscall_after_call:
 	movl	%eax, PT_EAX(%esp)		# store the return value
 syscall_exit:
 	LOCKDEP_SYS_EXIT
-	DISABLE_INTERRUPTS(CLBR_ANY)		# make sure we don't miss an interrupt
-						# setting need_resched or sigpending
-						# between sampling and the iret
-	TRACE_IRQS_OFF
-	movl	TI_flags(%ebp), %ecx
-	testl	$_TIF_ALLWORK_MASK, %ecx	# current->work
-	jnz	syscall_exit_work
+	jmp	syscall_exit_work
 
 restore_all:
 	TRACE_IRQS_IRET
@@ -460,35 +450,6 @@ ldt_ss:
 #endif
 ENDPROC(entry_INT80_32)
 
-	# perform work that needs to be done immediately before resumption
-	ALIGN
-work_pending:
-	testb	$_TIF_NEED_RESCHED, %cl
-	jz	work_notifysig
-work_resched:
-	call	schedule
-	LOCKDEP_SYS_EXIT
-	DISABLE_INTERRUPTS(CLBR_ANY)		# make sure we don't miss an interrupt
-						# setting need_resched or sigpending
-						# between sampling and the iret
-	TRACE_IRQS_OFF
-	movl	TI_flags(%ebp), %ecx
-	andl	$_TIF_WORK_MASK, %ecx		# is there any work to be done other
-						# than syscall tracing?
-	jz	restore_all
-	testb	$_TIF_NEED_RESCHED, %cl
-	jnz	work_resched
-
-work_notifysig:					# deal with pending signals and
-						# notify-resume requests
-	movl	%esp, %eax
-	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_NONE)
-	xorl	%edx, %edx
-	call	do_notify_resume
-	jmp	resume_userspace
-END(work_pending)
-
 	# perform syscall exit tracing
 	ALIGN
 syscall_trace_entry:
@@ -503,15 +464,14 @@ END(syscall_trace_entry)
 
 	# perform syscall exit tracing
 	ALIGN
-syscall_exit_work:
-	testl	$_TIF_WORK_SYSCALL_EXIT, %ecx
-	jz	work_pending
+syscall_exit_work_irqs_off:
 	TRACE_IRQS_ON
-	ENABLE_INTERRUPTS(CLBR_ANY)		# could let syscall_trace_leave() call
-						# schedule() instead
+	ENABLE_INTERRUPTS(CLBR_ANY)
+
+syscall_exit_work:
 	movl	%esp, %eax
-	call	syscall_trace_leave
-	jmp	resume_userspace
+	call	syscall_return_slowpath
+	jmp	restore_all
 END(syscall_exit_work)
 
 syscall_fault:
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC/PATCH 7/7] x86/entry: Remove do_notify_resume, syscall_trace_leave, and their TIF masks
  2015-07-08 19:23 [RFC/PATCH 0/7] x86_32: Migrate to new entry/exit paths Andy Lutomirski
                   ` (5 preceding siblings ...)
  2015-07-08 19:24 ` [RFC/PATCH 6/7] x86/entry/32: Use prepare_exit_to_usermode and syscall_return_slowpath Andy Lutomirski
@ 2015-07-08 19:24 ` Andy Lutomirski
  6 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-08 19:24 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Brian Gerst,
	Linus Torvalds, Andy Lutomirski

They are no longer used.  Good riddance!

Deleting the TIF_ macros is really nice.  It was never clear why
there were so many variants.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/common.c            | 57 --------------------------------------
 arch/x86/include/asm/ptrace.h      |  1 -
 arch/x86/include/asm/signal.h      |  1 -
 arch/x86/include/asm/thread_info.h | 16 -----------
 4 files changed, 75 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index febc53086a69..1e8a2901fa00 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -207,37 +207,6 @@ long syscall_trace_enter(struct pt_regs *regs)
 		return syscall_trace_enter_phase2(regs, arch, phase1_result);
 }
 
-/* Deprecated. */
-void syscall_trace_leave(struct pt_regs *regs)
-{
-	bool step;
-
-	/*
-	 * We may come here right after calling schedule_user()
-	 * or do_notify_resume(), in which case we can be in RCU
-	 * user mode.
-	 */
-	user_exit();
-
-	audit_syscall_exit(regs);
-
-	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
-		trace_sys_exit(regs, regs->ax);
-
-	/*
-	 * If TIF_SYSCALL_EMU is set, we only get here because of
-	 * TIF_SINGLESTEP (i.e. this is PTRACE_SYSEMU_SINGLESTEP).
-	 * We already reported this syscall instruction in
-	 * syscall_trace_enter().
-	 */
-	step = unlikely(test_thread_flag(TIF_SINGLESTEP)) &&
-			!test_thread_flag(TIF_SYSCALL_EMU);
-	if (step || test_thread_flag(TIF_SYSCALL_TRACE))
-		tracehook_report_syscall_exit(regs, step);
-
-	user_enter();
-}
-
 static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
 {
 	unsigned long top_of_stack =
@@ -346,29 +315,3 @@ __visible void syscall_return_slowpath(struct pt_regs *regs)
 	local_irq_disable();
 	prepare_exit_to_usermode(regs);
 }
-
-/*
- * Deprecated notification of userspace execution resumption
- * - triggered by the TIF_WORK_MASK flags
- */
-__visible void
-do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
-{
-	user_exit();
-
-	if (thread_info_flags & _TIF_UPROBE)
-		uprobe_notify_resume(regs);
-
-	/* deal with pending signal delivery */
-	if (thread_info_flags & _TIF_SIGPENDING)
-		do_signal(regs);
-
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
-		clear_thread_flag(TIF_NOTIFY_RESUME);
-		tracehook_notify_resume(regs);
-	}
-	if (thread_info_flags & _TIF_USER_RETURN_NOTIFY)
-		fire_user_return_notifiers();
-
-	user_enter();
-}
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 5fabf1362942..6271281f947d 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -88,7 +88,6 @@ extern long syscall_trace_enter_phase2(struct pt_regs *, u32 arch,
 				       unsigned long phase1_result);
 
 extern long syscall_trace_enter(struct pt_regs *);
-extern void syscall_trace_leave(struct pt_regs *);
 
 static inline unsigned long regs_return_value(struct pt_regs *regs)
 {
diff --git a/arch/x86/include/asm/signal.h b/arch/x86/include/asm/signal.h
index b42408bcf6b5..c481be78fcf1 100644
--- a/arch/x86/include/asm/signal.h
+++ b/arch/x86/include/asm/signal.h
@@ -31,7 +31,6 @@ typedef sigset_t compat_sigset_t;
 #include <uapi/asm/signal.h>
 #ifndef __ASSEMBLY__
 extern void do_signal(struct pt_regs *regs);
-extern void do_notify_resume(struct pt_regs *, void *, __u32);
 
 #define __ARCH_HAS_SA_RESTORER
 
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 225ee545e1a0..aa12c5bbf192 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -140,27 +140,11 @@ struct thread_info {
 	 _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT |	\
 	 _TIF_NOHZ)
 
-/* work to do in syscall_trace_leave() */
-#define _TIF_WORK_SYSCALL_EXIT	\
-	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP |	\
-	 _TIF_SYSCALL_TRACEPOINT | _TIF_NOHZ)
-
-/* work to do on interrupt/exception return */
-#define _TIF_WORK_MASK							\
-	(0x0000FFFF &							\
-	 ~(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|			\
-	   _TIF_SINGLESTEP|_TIF_SECCOMP|_TIF_SYSCALL_EMU))
-
 /* work to do on any return to user space */
 #define _TIF_ALLWORK_MASK						\
 	((0x0000FFFF & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT |	\
 	_TIF_NOHZ)
 
-/* Only used for 64 bit */
-#define _TIF_DO_NOTIFY_MASK						\
-	(_TIF_SIGPENDING | _TIF_NOTIFY_RESUME |				\
-	 _TIF_USER_RETURN_NOTIFY | _TIF_UPROBE)
-
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW							\
 	(_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH 3/7] [TEMPORARY] x86/entry/32: Sanity check for work_notifysig
  2015-07-08 19:24 ` [RFC/PATCH 3/7] [TEMPORARY] x86/entry/32: Sanity check for work_notifysig Andy Lutomirski
@ 2015-07-08 19:25   ` Andy Lutomirski
  0 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-08 19:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Frédéric Weisbecker,
	Rik van Riel, Oleg Nesterov, Denys Vlasenko, Borislav Petkov,
	Kees Cook, Brian Gerst, Linus Torvalds, Al Viro

[cc: Al -- saying I'm cc-ing you in the patch 0 description doesn't make it so.]

On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <luto@kernel.org> wrote:
> 44fbbb3dc687c added an unnecessary check.  Add a temporary assertion
> to confirm that it's unnecessary.
>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_32.S | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index a36d5df6a749..d36afad80ad1 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -491,10 +491,13 @@ work_notifysig:                                   # deal with pending signals and
>  #endif
>         TRACE_IRQS_ON
>         ENABLE_INTERRUPTS(CLBR_NONE)
> +       /* Temporary sanity check */
>         movb    PT_CS(%esp), %bl
>         andb    $SEGMENT_RPL_MASK, %bl
>         cmpb    $USER_RPL, %bl
> -       jb      resume_kernel
> +       jnb     2f
> +       ud2
> +2:
>         xorl    %edx, %edx
>         call    do_notify_resume
>         jmp     resume_userspace
> --
> 2.4.3
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH 4/7] x86/entry/32: Finish removing bogus kernel-mode check
  2015-07-08 19:24 ` [RFC/PATCH 4/7] x86/entry/32: Finish removing bogus kernel-mode check Andy Lutomirski
@ 2015-07-08 19:25   ` Andy Lutomirski
  0 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-08 19:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Frédéric Weisbecker,
	Rik van Riel, Oleg Nesterov, Denys Vlasenko, Borislav Petkov,
	Kees Cook, Brian Gerst, Linus Torvalds, Al Viro

[cc: Al -- saying I'm cc-ing you in the patch 0 description doesn't make it so.]

On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <luto@kernel.org> wrote:
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/entry/entry_32.S | 7 -------
>  1 file changed, 7 deletions(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index d36afad80ad1..66ff9c4055d7 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -491,13 +491,6 @@ work_notifysig:                                    # deal with pending signals and
>  #endif
>         TRACE_IRQS_ON
>         ENABLE_INTERRUPTS(CLBR_NONE)
> -       /* Temporary sanity check */
> -       movb    PT_CS(%esp), %bl
> -       andb    $SEGMENT_RPL_MASK, %bl
> -       cmpb    $USER_RPL, %bl
> -       jnb     2f
> -       ud2
> -2:
>         xorl    %edx, %edx
>         call    do_notify_resume
>         jmp     resume_userspace
> --
> 2.4.3
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly
  2015-07-08 19:24 ` [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly Andy Lutomirski
@ 2015-07-09 22:41   ` Andy Lutomirski
  2015-07-10  1:33     ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-09 22:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Frédéric Weisbecker,
	Rik van Riel, Oleg Nesterov, Denys Vlasenko, Borislav Petkov,
	Kees Cook, Brian Gerst, Linus Torvalds

On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <luto@kernel.org> wrote:
> The TIF_NOTIFY_RESUME hack it was using was buggy and unsupportable.
> vm86 mode was completely broken under ptrace, for example, because
> we'd never make it to v8086 mode.
>
> This code is still a huge, scary mess, but at least it's no longer
> tangled with the exit-to-userspace loop.

This patch is incorrect.  Brian, what's the ETA for your vm86 cleanup?
 If it's very soon, then I'll see if I can rely on it.  If not, I'll
have to come up with a way to fix this patch.

Grr.  The kernel state when handle_vm86_trap is called is absurd right
now.  Somehow we're supposed to survive do_trap, send a signal
corresponding to the outside-vm86 state, and exit vm86 cleanly (with
ax = 0), all before returning to user mode.  I doubt these semantics
are even intentional.

This code sucks.

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly
  2015-07-09 22:41   ` Andy Lutomirski
@ 2015-07-10  1:33     ` Andy Lutomirski
  2015-07-10 15:27       ` Brian Gerst
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2015-07-10  1:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, Frédéric Weisbecker,
	Rik van Riel, Oleg Nesterov, Denys Vlasenko, Borislav Petkov,
	Kees Cook, Brian Gerst, Linus Torvalds

On Thu, Jul 9, 2015 at 3:41 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> The TIF_NOTIFY_RESUME hack it was using was buggy and unsupportable.
>> vm86 mode was completely broken under ptrace, for example, because
>> we'd never make it to v8086 mode.
>>
>> This code is still a huge, scary mess, but at least it's no longer
>> tangled with the exit-to-userspace loop.
>
> This patch is incorrect.  Brian, what's the ETA for your vm86 cleanup?
>  If it's very soon, then I'll see if I can rely on it.  If not, I'll
> have to come up with a way to fix this patch.
>
> Grr.  The kernel state when handle_vm86_trap is called is absurd right
> now.  Somehow we're supposed to survive do_trap, send a signal
> corresponding to the outside-vm86 state, and exit vm86 cleanly (with
> ax = 0), all before returning to user mode.  I doubt these semantics
> are even intentional.
>
> This code sucks.

OK, I have a version that seems to work.  It comes with a much better
selftest, too.  I'll send it shortly.

Brian, would it make sense to base your work on top of it?

Now that I've looked at this stuff, if I were designing Linux support
for v8086 mode, I'd do it very differently.  There wouldn't be a vm86
syscall at all.  Instead you'd call sigaltstack, then raise a signal,
set X86_EFLAGS_VM, and return.

The kernel would handle X86_EFLAGS_VM being set by setting TIF_V8086
and adjusting sp0.  On entry, TIF_V8086 would move the segment
registers from the hardware frame into pt_regs and, on exit, TIF_V8086
would move them back.  Clearing X86_EFLAGS_VM (via ptrace, signal
delivery, or sigreturn) would sanitize the segment registers.

SYSENTER would be safe, so the SYSENTER_CS hack wouldn't be needed.
Of course, we'd lose the CPU state, so the user would have to be
careful.

And that's it.  There wouldn't be any emulation -- user code could
emulate syscalls all by itself in a signal handler.  Exiting v8086
mode would be straightforward -- just do anything that would raise a
signal.

Of course, this isn't at all ABI-compatible with the current turd, and
v8086 mode isn't really that useful, so this is just idle retroactive
speculation.  But the TIF_V8086 trick would still be useful to let us
get rid of all the awful hacks in the trap and exit code.

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly
  2015-07-10  1:33     ` Andy Lutomirski
@ 2015-07-10 15:27       ` Brian Gerst
  0 siblings, 0 replies; 13+ messages in thread
From: Brian Gerst @ 2015-07-10 15:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, X86 ML, linux-kernel,
	Frédéric Weisbecker, Rik van Riel, Oleg Nesterov,
	Denys Vlasenko, Borislav Petkov, Kees Cook, Linus Torvalds

On Thu, Jul 9, 2015 at 9:33 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Jul 9, 2015 at 3:41 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Wed, Jul 8, 2015 at 12:24 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>> The TIF_NOTIFY_RESUME hack it was using was buggy and unsupportable.
>>> vm86 mode was completely broken under ptrace, for example, because
>>> we'd never make it to v8086 mode.
>>>
>>> This code is still a huge, scary mess, but at least it's no longer
>>> tangled with the exit-to-userspace loop.
>>
>> This patch is incorrect.  Brian, what's the ETA for your vm86 cleanup?
>>  If it's very soon, then I'll see if I can rely on it.  If not, I'll
>> have to come up with a way to fix this patch.
>>
>> Grr.  The kernel state when handle_vm86_trap is called is absurd right
>> now.  Somehow we're supposed to survive do_trap, send a signal
>> corresponding to the outside-vm86 state, and exit vm86 cleanly (with
>> ax = 0), all before returning to user mode.  I doubt these semantics
>> are even intentional.
>>
>> This code sucks.
>
> OK, I have a version that seems to work.  It comes with a much better
> selftest, too.  I'll send it shortly.
>
> Brian, would it make sense to base your work on top of it?
>
> Now that I've looked at this stuff, if I were designing Linux support
> for v8086 mode, I'd do it very differently.  There wouldn't be a vm86
> syscall at all.  Instead you'd call sigaltstack, then raise a signal,
> set X86_EFLAGS_VM, and return.
>
> The kernel would handle X86_EFLAGS_VM being set by setting TIF_V8086
> and adjusting sp0.  On entry, TIF_V8086 would move the segment
> registers from the hardware frame into pt_regs and, on exit, TIF_V8086
> would move them back.  Clearing X86_EFLAGS_VM (via ptrace, signal
> delivery, or sigreturn) would sanitize the segment registers.
>
> SYSENTER would be safe, so the SYSENTER_CS hack wouldn't be needed.
> Of course, we'd lose the CPU state, so the user would have to be
> careful.
>
> And that's it.  There wouldn't be any emulation -- user code could
> emulate syscalls all by itself in a signal handler.  Exiting v8086
> mode would be straightforward -- just do anything that would raise a
> signal.
>
> Of course, this isn't at all ABI-compatible with the current turd, and
> v8086 mode isn't really that useful, so this is just idle retroactive
> speculation.  But the TIF_V8086 trick would still be useful to let us
> get rid of all the awful hacks in the trap and exit code.

I'll post my patches tonight when I get home.  It would probably make
more sense for you to base off mine, since it should eliminate the
need for you to touch any vm86 code.

I fixed the signal issue by checking if the VM flag is set in
handle_signal(), and swap the register state there before pushing the
signal frame, but that is only possible after removing the need to
jump back into the exit asm routines. work_notifysig_v86 is gone too
as a result.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-07-10 15:27 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-08 19:23 [RFC/PATCH 0/7] x86_32: Migrate to new entry/exit paths Andy Lutomirski
2015-07-08 19:23 ` [RFC/PATCH 1/7] x86/entry/32: Remove 32-bit syscall audit optimizations Andy Lutomirski
2015-07-08 19:23 ` [RFC/PATCH 2/7] x86/entry/32: Fix an incorrect comment for work_notifysig_v86 Andy Lutomirski
2015-07-08 19:24 ` [RFC/PATCH 3/7] [TEMPORARY] x86/entry/32: Sanity check for work_notifysig Andy Lutomirski
2015-07-08 19:25   ` Andy Lutomirski
2015-07-08 19:24 ` [RFC/PATCH 4/7] x86/entry/32: Finish removing bogus kernel-mode check Andy Lutomirski
2015-07-08 19:25   ` Andy Lutomirski
2015-07-08 19:24 ` [RFC/PATCH 5/7] x86/vm86: Teach handle_vm86_trap to return to 32bit mode directly Andy Lutomirski
2015-07-09 22:41   ` Andy Lutomirski
2015-07-10  1:33     ` Andy Lutomirski
2015-07-10 15:27       ` Brian Gerst
2015-07-08 19:24 ` [RFC/PATCH 6/7] x86/entry/32: Use prepare_exit_to_usermode and syscall_return_slowpath Andy Lutomirski
2015-07-08 19:24 ` [RFC/PATCH 7/7] x86/entry: Remove do_notify_resume, syscall_trace_leave, and their TIF masks Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).