linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/2] x86_64,entry: Clear NT on entry and speed up switch_to
@ 2014-10-01 18:28 Andy Lutomirski
  2014-10-01 18:28 ` [PATCH v3 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Andy Lutomirski @ 2014-10-01 18:28 UTC (permalink / raw)
  To: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel, Chuck Ebbert,
	Andy Lutomirski

Anish Bhatt noticed that user programs can set RFLAGS.NT before
syscall or sysenter, and the kernel entry code doesn't filter out
NT.  This causes kernel C code and, depending on thread flags, the
exit slow path to run with NT set.

The former is a little bit scary (imagine calling into EFI with NT
set), and the latter will fail with #GP and send a spurious SIGSEGV.

One answer would be "don't do that".  But the kernel can do better
here.

These patches filter NT on all kernel entries.  For syscall (both
bitnesses), this is free.  For sysenter, it seems to cost very
little (less than my ability to measure, although I didn't try that
hard).  Patch 2, which isn't tagged for -stable, speeds up context
switches by avoiding saving and restoring flags, so this series
should be a decent overall performance win.

See: https://bugs.winehq.org/show_bug.cgi?id=33275

Note to bikeshedders: I have no desire to go crazy micro-optimizing
the sysenter path. :) This version seems to be good enough (and
should be a performance *increase* for most workloads).

Changes from v2:
 - Move the flag fixup out of line
 - Fix a CFI buglet

Changes from v1:
 - Spell stable@vger.kernel.org correctly
 - Tidy up changelog text
 - Actually commit an asm constraint fix in patch 2 (egads!)
 - Replace the unconditional popfq with a branch

Andy Lutomirski (2):
  x86_64,entry: Filter RFLAGS.NT on entry from userspace
  x86_64: Don't save flags on context switch

 arch/x86/ia32/ia32entry.S        | 18 +++++++++++++++++-
 arch/x86/include/asm/switch_to.h | 12 ++++++++----
 arch/x86/kernel/cpu/common.c     |  2 +-
 3 files changed, 26 insertions(+), 6 deletions(-)

-- 
1.9.3



* [PATCH v3 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01 18:28 [PATCH v3 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Andy Lutomirski
@ 2014-10-01 18:28 ` Andy Lutomirski
  2014-10-01 18:28 ` [PATCH v3 2/2] x86_64: Don't save flags on context switch Andy Lutomirski
  2014-10-01 18:34 ` [PATCH v3 0/2] x86_64,entry: Clear NT on entry and speed up switch_to H. Peter Anvin
  2 siblings, 0 replies; 17+ messages in thread
From: Andy Lutomirski @ 2014-10-01 18:28 UTC (permalink / raw)
  To: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel, Chuck Ebbert,
	Andy Lutomirski, stable

The NT flag doesn't do anything in long mode other than causing IRET
to #GP.  Oddly, CPL3 code can still set NT using popf.

Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.

If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble.  For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen?  Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault.  That
segfault cannot be handled, because signal delivery fails, too.

This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.
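As a rough illustration of the SYSCALL half, the C model below (a simplified sketch, not kernel code) applies MSR_SYSCALL_MASK the way SYSCALL does in long mode: every RFLAGS bit that is set in the mask is cleared on entry, and everything else passes through. The X86_EFLAGS_* values mirror the kernel's definitions; syscall_entry_rflags() and the two mask variables are hypothetical names used only for illustration. It shows NT surviving the old mask and being cleared by the new one.

```c
#include <assert.h>
#include <stdint.h>

/* RFLAGS bit values; these mirror the kernel's X86_EFLAGS_* constants. */
#define X86_EFLAGS_FIXED 0x00000002UL	/* bit 1, always reads as 1 */
#define X86_EFLAGS_TF    0x00000100UL	/* trap flag */
#define X86_EFLAGS_IF    0x00000200UL	/* interrupt enable */
#define X86_EFLAGS_DF    0x00000400UL	/* direction flag */
#define X86_EFLAGS_IOPL  0x00003000UL	/* I/O privilege level (2 bits) */
#define X86_EFLAGS_NT    0x00004000UL	/* nested task */
#define X86_EFLAGS_AC    0x00040000UL	/* alignment check / SMAP override */

/* Hypothetical names for MSR_SYSCALL_MASK before and after this patch. */
static const uint64_t old_syscall_mask =
	X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_IF |
	X86_EFLAGS_IOPL | X86_EFLAGS_AC;
static const uint64_t new_syscall_mask =
	X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_IF |
	X86_EFLAGS_IOPL | X86_EFLAGS_AC | X86_EFLAGS_NT;

/*
 * Simplified model of SYSCALL's effect on RFLAGS: each bit set in the
 * mask MSR is cleared; all other bits keep their user-mode values.
 */
static uint64_t syscall_entry_rflags(uint64_t user_rflags, uint64_t mask)
{
	return user_rflags & ~mask;
}
```

With the old mask, a user RFLAGS of FIXED|IF|NT enters the kernel as FIXED|NT; with the new mask it enters as plain FIXED.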

To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it.  As a result,
it seems to have very little effect on SYSENTER performance on my
machine.
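The check-before-clear trick can be sketched in C as follows. This is a model only: it collapses the saved EFLAGS slot that the asm tests and the live RFLAGS that popfq rewrites into a single value, and fixups_taken is a hypothetical counter added to show that the common case never reaches the out-of-line fixup. Note that the fixup does not merely clear NT; it loads the known-good value IF|FIXED, as the pushq/popfq pair in the diff does.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_FIXED 0x00000002UL
#define X86_EFLAGS_IF    0x00000200UL
#define X86_EFLAGS_NT    0x00004000UL

/* Hypothetical counter: how often the out-of-line slow path runs. */
static unsigned long fixups_taken;

/*
 * C model of the SYSENTER fixup:
 *   fast path:  testl $X86_EFLAGS_NT,EFLAGS(%rsp); jnz sysenter_fix_flags
 *   slow path:  pushq $(X86_EFLAGS_IF|X86_EFLAGS_FIXED); popfq
 * Only when NT is set is RFLAGS overwritten with a clean value.
 */
static uint64_t sysenter_fixup(uint64_t rflags)
{
	if (rflags & X86_EFLAGS_NT) {
		fixups_taken++;
		rflags = X86_EFLAGS_IF | X86_EFLAGS_FIXED;
	}
	return rflags;
}
```

The design choice is the usual one for rare conditions: a not-taken branch on the hot path is cheaper than an unconditional popfq, which is a comparatively slow instruction.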

There's another minor bug fix in here: it looks like the CFI
annotations were wrong if CONFIG_AUDITSYSCALL=n.

Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

I haven't touched anything on 32-bit kernels.

The syscall mask change comes from a variant of this patch by Anish
Bhatt.

Cc: stable@vger.kernel.org
Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/ia32/ia32entry.S    | 18 +++++++++++++++++-
 arch/x86/kernel/cpu/common.c |  2 +-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb05023c..711de084ab57 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -151,6 +151,16 @@ ENTRY(ia32_sysenter_target)
 1:	movl	(%rbp),%ebp
 	_ASM_EXTABLE(1b,ia32_badarg)
 	ASM_CLAC
+
+	/*
+	 * Sysenter doesn't filter flags, so we need to clear NT
+	 * ourselves.  To save a few cycles, we can check whether
+	 * NT was set instead of doing an unconditional popfq.
+	 */
+	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
+	jnz sysenter_fix_flags
+sysenter_flags_fixed:
+
 	orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 	testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 	CFI_REMEMBER_STATE
@@ -184,6 +194,8 @@ sysexit_from_sys_call:
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS_SYSEXIT32
 
+	CFI_RESTORE_STATE
+
 #ifdef CONFIG_AUDITSYSCALL
 	.macro auditsys_entry_common
 	movl %esi,%r9d			/* 6th arg: 4th syscall arg */
@@ -226,7 +238,6 @@ sysexit_from_sys_call:
 	.endm
 
 sysenter_auditsys:
-	CFI_RESTORE_STATE
 	auditsys_entry_common
 	movl %ebp,%r9d			/* reload 6th syscall arg */
 	jmp sysenter_dispatch
@@ -235,6 +246,11 @@ sysexit_audit:
 	auditsys_exit sysexit_from_sys_call
 #endif
 
+sysenter_fix_flags:
+	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
+	popfq_cfi
+	jmp sysenter_flags_fixed
+
 sysenter_tracesys:
 #ifdef CONFIG_AUDITSYSCALL
 	testl	$(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e4ab2b42bd6f..31265580c38a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1184,7 +1184,7 @@ void syscall_init(void)
 	/* Flags to clear on syscall */
 	wrmsrl(MSR_SYSCALL_MASK,
 	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
-	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
+	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
 /*
-- 
1.9.3



* [PATCH v3 2/2] x86_64: Don't save flags on context switch
  2014-10-01 18:28 [PATCH v3 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Andy Lutomirski
  2014-10-01 18:28 ` [PATCH v3 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
@ 2014-10-01 18:28 ` Andy Lutomirski
  2014-10-01 18:35   ` H. Peter Anvin
                     ` (2 more replies)
  2014-10-01 18:34 ` [PATCH v3 0/2] x86_64,entry: Clear NT on entry and speed up switch_to H. Peter Anvin
  2 siblings, 3 replies; 17+ messages in thread
From: Andy Lutomirski @ 2014-10-01 18:28 UTC (permalink / raw)
  To: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel, Chuck Ebbert,
	Andy Lutomirski

Now that the kernel always runs with clean flags (in particular, NT
is clear), there is no need to save and restore flags on every
context switch.
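A toy model of why dropping pushf/popf should be safe (hypothetical helpers, not kernel code): the old switch_to left the CPU with next's saved flags, while the new one simply keeps prev's. If both are clean, the two results differ at most in the arithmetic status bits, which the asm declares clobbered, and in IOPL, which the patch comment says has no effect in the kernel. The model assumes IF matches on both sides, since switch_to runs with interrupts disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_CF    0x00000001UL
#define X86_EFLAGS_FIXED 0x00000002UL
#define X86_EFLAGS_PF    0x00000004UL
#define X86_EFLAGS_AF    0x00000010UL
#define X86_EFLAGS_ZF    0x00000040UL
#define X86_EFLAGS_SF    0x00000080UL
#define X86_EFLAGS_TF    0x00000100UL
#define X86_EFLAGS_DF    0x00000400UL
#define X86_EFLAGS_OF    0x00000800UL
#define X86_EFLAGS_IOPL  0x00003000UL
#define X86_EFLAGS_NT    0x00004000UL
#define X86_EFLAGS_AC    0x00040000UL

/* Arithmetic status bits: meaningless after an asm that clobbers flags. */
#define ARITH_FLAGS (X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF | \
		     X86_EFLAGS_ZF | X86_EFLAGS_SF | X86_EFLAGS_OF)

/* Control bits that kernel C code must never run with. */
#define DIRTY_FLAGS (X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_NT | \
		     X86_EFLAGS_AC)

static bool kernel_flags_clean(uint64_t rflags)
{
	return (rflags & DIRTY_FLAGS) == 0;
}

/* Old behaviour: pushf onto prev's stack, popf from next's stack. */
static uint64_t switch_to_old(uint64_t *prev_saved, uint64_t next_saved,
			      uint64_t cpu_flags)
{
	*prev_saved = cpu_flags;
	return next_saved;
}

/* New behaviour: no pushf/popf, so the CPU's flags carry straight over. */
static uint64_t switch_to_new(uint64_t cpu_flags)
{
	return cpu_flags;
}

/* Equal apart from clobbered arithmetic bits and IOPL (no effect at CPL 0). */
static bool kernel_equivalent(uint64_t a, uint64_t b)
{
	uint64_t ignore = ARITH_FLAGS | X86_EFLAGS_IOPL;

	return (a & ~ignore) == (b & ~ignore);
}
```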

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/include/asm/switch_to.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index d7f3b3b78ac3..751bf4b7bf11 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -79,12 +79,12 @@ do {									\
 #else /* CONFIG_X86_32 */
 
 /* frame pointer must be last for get_wchan */
-#define SAVE_CONTEXT    "pushf ; pushq %%rbp ; movq %%rsi,%%rbp\n\t"
-#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp ; popf\t"
+#define SAVE_CONTEXT    "pushq %%rbp ; movq %%rsi,%%rbp\n\t"
+#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp\t"
 
 #define __EXTRA_CLOBBER  \
 	, "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
-	  "r12", "r13", "r14", "r15"
+	  "r12", "r13", "r14", "r15", "flags"
 
 #ifdef CONFIG_CC_STACKPROTECTOR
 #define __switch_canary							  \
@@ -100,7 +100,11 @@ do {									\
 #define __switch_canary_iparam
 #endif	/* CC_STACKPROTECTOR */
 
-/* Save restore flags to clear handle leaking NT */
+/*
+ * There is no need to save or restore flags, because flags are always
+ * clean in kernel mode, with the possible exception of IOPL.  Kernel IOPL
+ * has no effect.
+ */
 #define switch_to(prev, next, last) \
 	asm volatile(SAVE_CONTEXT					  \
 	     "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */	  \
-- 
1.9.3



* Re: [PATCH v3 0/2] x86_64,entry: Clear NT on entry and speed up switch_to
  2014-10-01 18:28 [PATCH v3 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Andy Lutomirski
  2014-10-01 18:28 ` [PATCH v3 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
  2014-10-01 18:28 ` [PATCH v3 2/2] x86_64: Don't save flags on context switch Andy Lutomirski
@ 2014-10-01 18:34 ` H. Peter Anvin
  2 siblings, 0 replies; 17+ messages in thread
From: H. Peter Anvin @ 2014-10-01 18:34 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, X86 ML, Ingo Molnar
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel, Chuck Ebbert

On 10/01/2014 11:28 AM, Andy Lutomirski wrote:
> Anish Bhatt noticed that user programs can set RFLAGS.NT before
> syscall or sysenter, and the kernel entry code doesn't filter out
> NT.  This causes kernel C code and, depending on thread flags, the
> exit slow path to run with NT set.
> 
> The former is a little bit scary (imagine calling into EFI with NT
> set), and the latter will fail with #GP and send a spurious SIGSEGV.
> 
> One answer would be "don't do that".  But the kernel can do better
> here.
> 
> These patches filter NT on all kernel entries.  For syscall (both
> bitnesses), this is free.  For sysenter, it seems to cost very
> little (less than my ability to measure, although I didn't try that
> hard).  Patch 2, which isn't tagged for -stable, speeds up context
> switches by avoiding saving and restoring flags, so this series
> should be a decent overall performance win.
> 
> See: https://bugs.winehq.org/show_bug.cgi?id=33275
> 
> Note to bikeshedders: I have no desire to go crazy micro-optimizing
> the sysenter path. :) This version seems to be good enough (and
> should be a performance *increase* for most workloads).
> 

The motivation for this in -stable is the Wine issue, right?  Could you
please add that to the patch description for the 1/2 patch?

Thanks,

	-hpa



* Re: [PATCH v3 2/2] x86_64: Don't save flags on context switch
  2014-10-01 18:28 ` [PATCH v3 2/2] x86_64: Don't save flags on context switch Andy Lutomirski
@ 2014-10-01 18:35   ` H. Peter Anvin
  2014-10-01 18:44     ` Andy Lutomirski
  2014-10-20 18:52   ` Andy Lutomirski
  2014-10-28 11:14   ` [tip:x86/asm] sched/x86_64: " tip-bot for Andy Lutomirski
  2 siblings, 1 reply; 17+ messages in thread
From: H. Peter Anvin @ 2014-10-01 18:35 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, X86 ML, Ingo Molnar
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel, Chuck Ebbert

On 10/01/2014 11:28 AM, Andy Lutomirski wrote:
>  
>  #define __EXTRA_CLOBBER  \
>  	, "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
> -	  "r12", "r13", "r14", "r15"
> +	  "r12", "r13", "r14", "r15", "flags"
>  

I was under the impression that gcc *always* assumes the flags were
clobbered for an asm statement.  Otherwise I think we'd have a lot of
problems.

	-hpa




* Re: [PATCH v3 2/2] x86_64: Don't save flags on context switch
  2014-10-01 18:35   ` H. Peter Anvin
@ 2014-10-01 18:44     ` Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: Andy Lutomirski @ 2014-10-01 18:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, Sebastian Lackner,
	Anish Bhatt, linux-kernel, Chuck Ebbert

On Wed, Oct 1, 2014 at 11:35 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 10/01/2014 11:28 AM, Andy Lutomirski wrote:
>>
>>  #define __EXTRA_CLOBBER  \
>>       , "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
>> -       "r12", "r13", "r14", "r15"
>> +       "r12", "r13", "r14", "r15", "flags"
>>
>
> I was under the impression that gcc *always* assumes the flags were
> clobbered for an asm statement.  Otherwise I think we'd have a lot of
> problems.
>

I have no idea, but I doubt that adding the explicit "flags" clobber
hurts, and it will make other people who are unsure about this less
worried.
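For reference, this is what an explicit flags clobber looks like in GNU C inline asm. The sketch uses the "cc" spelling from GCC's extended-asm documentation; the patch's "flags" spelling evidently also builds with x86 GCC, since tip applied the commit. Whether or not the compiler already treats flags as clobbered by every x86 asm statement (hpa's recollection), declaring it is harmless. add_asm() is a hypothetical helper, and non-x86-64 builds fall back to plain C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Adds two 64-bit values.  On x86-64 the addition is done in inline
 * asm; ADD overwrites the arithmetic flags, so the asm says so with a
 * "cc" clobber.  Other architectures use plain C.
 */
static uint64_t add_asm(uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
	__asm__("addq %1, %0"
		: "+r" (a)	/* a is both read and written */
		: "r" (b)
		: "cc");	/* ADD modifies CF, ZF, SF, OF, ... */
	return a;
#else
	return a + b;
#endif
}
```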

--Andy

>         -hpa
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH v3 2/2] x86_64: Don't save flags on context switch
  2014-10-01 18:28 ` [PATCH v3 2/2] x86_64: Don't save flags on context switch Andy Lutomirski
  2014-10-01 18:35   ` H. Peter Anvin
@ 2014-10-20 18:52   ` Andy Lutomirski
  2014-10-28 11:14   ` [tip:x86/asm] sched/x86_64: " tip-bot for Andy Lutomirski
  2 siblings, 0 replies; 17+ messages in thread
From: Andy Lutomirski @ 2014-10-20 18:52 UTC (permalink / raw)
  To: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel, Chuck Ebbert,
	Andy Lutomirski

On Wed, Oct 1, 2014 at 11:28 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> Now that the kernel always runs with clean flags (in particular, NT
> is clear), there is no need to save and restore flags on every
> context switch.

Since I'm liable to forget about this, and it's a nice speedup, I
figured I'd remind you all, too :)

(Really crude benchmarking in KVM: context switches take around 750ns,
and it's maybe 10ns faster with this patch.  Nothing earth-shattering,
but it's still nice.)
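For scale, those rough KVM numbers work out to a saving of a little over one percent per context switch:

```latex
\frac{10\ \text{ns}}{750\ \text{ns}} \approx 0.013 \approx 1.3\%
```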

--Andy

>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>  arch/x86/include/asm/switch_to.h | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> index d7f3b3b78ac3..751bf4b7bf11 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -79,12 +79,12 @@ do {                                                                        \
>  #else /* CONFIG_X86_32 */
>
>  /* frame pointer must be last for get_wchan */
> -#define SAVE_CONTEXT    "pushf ; pushq %%rbp ; movq %%rsi,%%rbp\n\t"
> -#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp ; popf\t"
> +#define SAVE_CONTEXT    "pushq %%rbp ; movq %%rsi,%%rbp\n\t"
> +#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp\t"
>
>  #define __EXTRA_CLOBBER  \
>         , "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
> -         "r12", "r13", "r14", "r15"
> +         "r12", "r13", "r14", "r15", "flags"
>
>  #ifdef CONFIG_CC_STACKPROTECTOR
>  #define __switch_canary                                                          \
> @@ -100,7 +100,11 @@ do {                                                                       \
>  #define __switch_canary_iparam
>  #endif /* CC_STACKPROTECTOR */
>
> -/* Save restore flags to clear handle leaking NT */
> +/*
> + * There is no need to save or restore flags, because flags are always
> + * clean in kernel mode, with the possible exception of IOPL.  Kernel IOPL
> + * has no effect.
> + */
>  #define switch_to(prev, next, last) \
>         asm volatile(SAVE_CONTEXT                                         \
>              "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */       \
> --
> 1.9.3
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


* [tip:x86/asm] sched/x86_64: Don't save flags on context switch
  2014-10-01 18:28 ` [PATCH v3 2/2] x86_64: Don't save flags on context switch Andy Lutomirski
  2014-10-01 18:35   ` H. Peter Anvin
  2014-10-20 18:52   ` Andy Lutomirski
@ 2014-10-28 11:14   ` tip-bot for Andy Lutomirski
  2014-11-03 21:11     ` Andy Lutomirski
  2 siblings, 1 reply; 17+ messages in thread
From: tip-bot for Andy Lutomirski @ 2014-10-28 11:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, cebbert.lkml, anish, tglx, luto, jbeulich, mingo,
	torvalds, sebastian, hpa

Commit-ID:  2c7577a7583747c9b71f26dced7f696b739da745
Gitweb:     http://git.kernel.org/tip/2c7577a7583747c9b71f26dced7f696b739da745
Author:     Andy Lutomirski <luto@amacapital.net>
AuthorDate: Wed, 1 Oct 2014 11:28:25 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 28 Oct 2014 11:11:30 +0100

sched/x86_64: Don't save flags on context switch

Now that the kernel always runs with clean flags (in particular,
NT is clear), there is no need to save and restore flags on
every context switch.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Sebastian Lackner <sebastian@fds-team.de>
Cc: Anish Bhatt <anish@chelsio.com>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/bf6fb790787eb95b922157838f52712c25dda157.1412187233.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/switch_to.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index d7f3b3b..751bf4b 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -79,12 +79,12 @@ do {									\
 #else /* CONFIG_X86_32 */
 
 /* frame pointer must be last for get_wchan */
-#define SAVE_CONTEXT    "pushf ; pushq %%rbp ; movq %%rsi,%%rbp\n\t"
-#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp ; popf\t"
+#define SAVE_CONTEXT    "pushq %%rbp ; movq %%rsi,%%rbp\n\t"
+#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp\t"
 
 #define __EXTRA_CLOBBER  \
 	, "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
-	  "r12", "r13", "r14", "r15"
+	  "r12", "r13", "r14", "r15", "flags"
 
 #ifdef CONFIG_CC_STACKPROTECTOR
 #define __switch_canary							  \
@@ -100,7 +100,11 @@ do {									\
 #define __switch_canary_iparam
 #endif	/* CC_STACKPROTECTOR */
 
-/* Save restore flags to clear handle leaking NT */
+/*
+ * There is no need to save or restore flags, because flags are always
+ * clean in kernel mode, with the possible exception of IOPL.  Kernel IOPL
+ * has no effect.
+ */
 #define switch_to(prev, next, last) \
 	asm volatile(SAVE_CONTEXT					  \
 	     "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */	  \


* Re: [tip:x86/asm] sched/x86_64: Don't save flags on context switch
  2014-10-28 11:14   ` [tip:x86/asm] sched/x86_64: " tip-bot for Andy Lutomirski
@ 2014-11-03 21:11     ` Andy Lutomirski
  2014-11-03 21:47       ` Oleg Nesterov
  2014-11-03 22:17       ` H. Peter Anvin
  0 siblings, 2 replies; 17+ messages in thread
From: Andy Lutomirski @ 2014-11-03 21:11 UTC (permalink / raw)
  To: Andy Lutomirski, Jan Beulich, Ingo Molnar, Linus Torvalds,
	H. Peter Anvin, Sebastian Lackner, linux-kernel, Chuck Ebbert,
	Anish Bhatt, Thomas Gleixner, Oleg Nesterov
  Cc: linux-tip-commits

On Tue, Oct 28, 2014 at 4:14 AM, tip-bot for Andy Lutomirski
<tipbot@zytor.com> wrote:
> Commit-ID:  2c7577a7583747c9b71f26dced7f696b739da745
> Gitweb:     http://git.kernel.org/tip/2c7577a7583747c9b71f26dced7f696b739da745
> Author:     Andy Lutomirski <luto@amacapital.net>
> AuthorDate: Wed, 1 Oct 2014 11:28:25 -0700
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 28 Oct 2014 11:11:30 +0100
>
> sched/x86_64: Don't save flags on context switch
>
> Now that the kernel always runs with clean flags (in particular,
> NT is clear), there is no need to save and restore flags on
> every context switch.

Just to make myself a little more comfortable with this...

There is one potentially relevant flag: AC.  I think this is still OK.
If we schedule with AC set (i.e. after STAC), then we've already
screwed up, I think.  Even preemptive scheduling happens from interrupt
context, so if we schedule due to preemption or #PF in the middle of a
uaccess, AC should be saved and cleared by whatever interrupt caused the
reschedule, right?

And do we ever have TF set during a context switch?  I hope not.

Also, what's with 'jmp exit_intr' at the end of retint_kernel?  Why
isn't that 'jmp retint_kernel'?

--Andy


* Re: [tip:x86/asm] sched/x86_64: Don't save flags on context switch
  2014-11-03 21:11     ` Andy Lutomirski
@ 2014-11-03 21:47       ` Oleg Nesterov
  2014-11-03 21:58         ` Oleg Nesterov
  2014-11-03 22:17       ` H. Peter Anvin
  1 sibling, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2014-11-03 21:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jan Beulich, Ingo Molnar, Linus Torvalds, H. Peter Anvin,
	Sebastian Lackner, linux-kernel, Chuck Ebbert, Anish Bhatt,
	Thomas Gleixner, linux-tip-commits

On 11/03, Andy Lutomirski wrote:
>
> And do we ever have TF set during a context switch?  I hope not.

I hope so too.

> Also, what's with 'jmp exit_intr' at the end of retint_kernel?  Why
> isn't that 'jmp retint_kernel'?

Even better, why not "jmp retint_restore_args" ?

preempt_schedule_irq() checks need_resched() and returns with irqs
disabled, so is there any need to recheck test_preempt_need_resched()?
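Oleg's point can be checked against a small model. In kernels of this era, preempt_schedule_irq() runs __schedule() in a do/while loop, bracketed by local_irq_enable()/local_irq_disable(), and repeats while need_resched() is true; the sketch below keeps only that shape and omits preempt-count and context-tracking details. All names here are hypothetical stand-ins. Because the loop only exits once need_resched is clear and interrupts are off again, a caller that immediately re-checks need_resched after the call can never find work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for this model. */
static bool need_resched_flag;	/* stands in for need_resched() */
static bool irqs_enabled;	/* stands in for the live IF flag */
static int pending_wakeups;	/* wakeups that arrive while we schedule */
static int schedule_calls;

/*
 * Stand-in for __schedule(): switching away satisfies the current
 * request, but a wakeup arriving meanwhile can set it again.
 */
static void model_schedule(void)
{
	schedule_calls++;
	need_resched_flag = pending_wakeups > 0;
	if (pending_wakeups > 0)
		pending_wakeups--;
}

/*
 * Shape of preempt_schedule_irq(): entered with irqs off, loops until
 * need_resched() is false, and returns with irqs off again.
 */
static void model_preempt_schedule_irq(void)
{
	assert(!irqs_enabled);
	do {
		irqs_enabled = true;	/* local_irq_enable() */
		model_schedule();
		irqs_enabled = false;	/* local_irq_disable() */
	} while (need_resched_flag);
}
```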

Oleg.



* Re: [tip:x86/asm] sched/x86_64: Don't save flags on context switch
  2014-11-03 21:47       ` Oleg Nesterov
@ 2014-11-03 21:58         ` Oleg Nesterov
  2014-11-03 22:37           ` Andy Lutomirski
  0 siblings, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2014-11-03 21:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jan Beulich, Ingo Molnar, Linus Torvalds, H. Peter Anvin,
	Sebastian Lackner, linux-kernel, Chuck Ebbert, Anish Bhatt,
	Thomas Gleixner, linux-tip-commits

On 11/03, Oleg Nesterov wrote:
>
> On 11/03, Andy Lutomirski wrote:
> >
> > And do we ever have TF set during a context switch?  I hope not.
>
> I hope so too.
>
> > Also, what's with 'jmp exit_intr' at the end of retint_kernel?  Why
> > isn't that 'jmp retint_kernel'?
>
> Even better, why not "jmp retint_restore_args" ?
>
> preempt_schedule_irq() checks need_resched() and returns with irqs
> disabled, so is there any need to recheck test_preempt_need_resched()?

Btw, why does retint_kernel() check "interrupts on"? It seems to me
that "interrupts off" is not possible, no? And this will be clearer
when we remove the "exit_intr" label.

Oleg.



* Re: [tip:x86/asm] sched/x86_64: Don't save flags on context switch
  2014-11-03 21:11     ` Andy Lutomirski
  2014-11-03 21:47       ` Oleg Nesterov
@ 2014-11-03 22:17       ` H. Peter Anvin
  1 sibling, 0 replies; 17+ messages in thread
From: H. Peter Anvin @ 2014-11-03 22:17 UTC (permalink / raw)
  To: Andy Lutomirski, Jan Beulich, Ingo Molnar, Linus Torvalds,
	Sebastian Lackner, linux-kernel, Chuck Ebbert, Anish Bhatt,
	Thomas Gleixner, Oleg Nesterov
  Cc: linux-tip-commits

On 11/03/2014 01:11 PM, Andy Lutomirski wrote:
> On Tue, Oct 28, 2014 at 4:14 AM, tip-bot for Andy Lutomirski
> <tipbot@zytor.com> wrote:
>> Commit-ID:  2c7577a7583747c9b71f26dced7f696b739da745
>> Gitweb:     http://git.kernel.org/tip/2c7577a7583747c9b71f26dced7f696b739da745
>> Author:     Andy Lutomirski <luto@amacapital.net>
>> AuthorDate: Wed, 1 Oct 2014 11:28:25 -0700
>> Committer:  Ingo Molnar <mingo@kernel.org>
>> CommitDate: Tue, 28 Oct 2014 11:11:30 +0100
>>
>> sched/x86_64: Don't save flags on context switch
>>
>> Now that the kernel always runs with clean flags (in particular,
>> NT is clear), there is no need to save and restore flags on
>> every context switch.
> 
> Just to make myself a little more comfortable with this...
> 
> There is one potentially relevant flag: AC.  I think this is still OK.
> If we schedule with AC set (i.e. after STAC), then we've already
> screwed up, I think.  Even preemptive scheduling happens from interrupt
> context, so if we schedule due to preemption or #PF in the middle of a
> uaccess, AC should be saved and cleared by whatever interrupt caused the
> reschedule, right?
> 
> And do we ever have TF set during a context switch?  I hope not.
> 
> Also, what's with 'jmp exit_intr' at the end of retint_kernel?  Why
> isn't that 'jmp retint_kernel'?
> 

AC is saved by interrupts and cleared by IRET.  We execute CLAC in all
the interrupt entry paths.
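A minimal model of that answer, as a sketch: struct irq_frame and both helpers are hypothetical, only IF and AC are tracked, and the other bits that interrupt delivery clears (TF, NT, ...) are ignored. A uaccess section's AC lives on in the saved frame while the handler, and anything it schedules to, runs with AC clear; IRET then reloads it from the frame.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_IF 0x00000200UL
#define X86_EFLAGS_AC 0x00040000UL

/* Hypothetical minimal interrupt frame: only the saved RFLAGS matters. */
struct irq_frame {
	uint64_t rflags;
};

/*
 * Interrupt-gate delivery pushes RFLAGS and clears IF; the kernel's
 * entry code then executes CLAC, clearing AC.
 */
static uint64_t interrupt_entry(struct irq_frame *frame, uint64_t rflags)
{
	frame->rflags = rflags;
	rflags &= ~X86_EFLAGS_IF;	/* done by the hardware */
	rflags &= ~X86_EFLAGS_AC;	/* CLAC in the entry path */
	return rflags;
}

/* IRET reloads RFLAGS, AC included, from the frame. */
static uint64_t model_iret(const struct irq_frame *frame)
{
	return frame->rflags;
}
```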

	-hpa




* Re: [tip:x86/asm] sched/x86_64: Don't save flags on context switch
  2014-11-03 21:58         ` Oleg Nesterov
@ 2014-11-03 22:37           ` Andy Lutomirski
  2014-11-03 22:57             ` Oleg Nesterov
  2014-11-04 23:09             ` Oleg Nesterov
  0 siblings, 2 replies; 17+ messages in thread
From: Andy Lutomirski @ 2014-11-03 22:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jan Beulich, Ingo Molnar, Linus Torvalds, H. Peter Anvin,
	Sebastian Lackner, linux-kernel, Chuck Ebbert, Anish Bhatt,
	Thomas Gleixner, linux-tip-commits

On Mon, Nov 3, 2014 at 1:58 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 11/03, Oleg Nesterov wrote:
>>
>> On 11/03, Andy Lutomirski wrote:
>> >
>> > And do we ever have TF set during a context switch?  I hope not.
>>
>> I hope so too.
>>
>> > Also, what's with 'jmp exit_intr' at the end of retint_kernel?  Why
>> > isn't that 'jmp retint_kernel'?
>>
>> Even better, why not "jmp retint_restore_args" ?
>>
>> preempt_schedule_irq() checks need_resched() and returns with irqs
>> disabled, so is there any need to recheck test_preempt_need_resched()?

Seems reasonable to me.  Want to write the patch?

>
> Btw, why does retint_kernel() check "interrupts on"? It seems to me
> that "interrupts off" is not possible, no? And this will be clearer
> when we remove the "exit_intr" label.

We might get there from #MC or from any of a number of synchronous
errors (#GP from xyz_safe, #PF from some atomic uaccess thing or a
vmap fault, etc), and all of those have interrupts off.

--Andy

>
> Oleg.
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [tip:x86/asm] sched/x86_64: Don't save flags on context switch
  2014-11-03 22:37           ` Andy Lutomirski
@ 2014-11-03 22:57             ` Oleg Nesterov
  2014-11-03 23:02               ` Andy Lutomirski
  2014-11-04 23:09             ` Oleg Nesterov
  1 sibling, 1 reply; 17+ messages in thread
From: Oleg Nesterov @ 2014-11-03 22:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jan Beulich, Ingo Molnar, Linus Torvalds, H. Peter Anvin,
	Sebastian Lackner, linux-kernel, Chuck Ebbert, Anish Bhatt,
	Thomas Gleixner, linux-tip-commits

Argh, sorry for the confusion...

On 11/03, Andy Lutomirski wrote:
>
> On Mon, Nov 3, 2014 at 1:58 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 11/03, Oleg Nesterov wrote:
> >>
> > Btw, why does retint_kernel() check "interrupts on"? It seems to me
> > that "interrupts off" is not possible, no? And this will be clearer
> > when we remove the "exit_intr" label.
>
> We might get there from #MC or from any of a number of synchronous
> errors (#GP from xyz_safe, #PF from some atomic uaccess thing or a
> vmap fault, etc), and all of those have interrupts off.

Yes, yes, exactly.

I actually tried to say that irqs should always be disabled (afaics!).
IOW "interrupts on" should not be possible, not "interrupts off".

Oleg.



* Re: [tip:x86/asm] sched/x86_64: Don't save flags on context switch
  2014-11-03 22:57             ` Oleg Nesterov
@ 2014-11-03 23:02               ` Andy Lutomirski
  2014-11-03 23:10                 ` Oleg Nesterov
  0 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2014-11-03 23:02 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jan Beulich, Ingo Molnar, Linus Torvalds, H. Peter Anvin,
	Sebastian Lackner, linux-kernel, Chuck Ebbert, Anish Bhatt,
	Thomas Gleixner, linux-tip-commits

On Mon, Nov 3, 2014 at 2:57 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> Argh, sorry for the confusion...
>
> On 11/03, Andy Lutomirski wrote:
>>
>> On Mon, Nov 3, 2014 at 1:58 PM, Oleg Nesterov <oleg@redhat.com> wrote:
>> > On 11/03, Oleg Nesterov wrote:
>> >>
>> > Btw, why does retint_kernel() check "interrupts on"? It seems to me
>> > that "interrupts off" is not possible, no? And this will be clearer
>> > when we remove the "exit_intr" label.
>>
>> We might get there from #MC or from any of a number of synchronous
>> errors (#GP from xyz_safe, #PF from some atomic uaccess thing or a
>> vmap fault, etc), and all of those have interrupts off.
>
> Yes, yes, exactly.
>
> I actually tried to say that irqs should always be disabled (afaics!).
> IOW "interrupts on" should not be possible, not "interrupts off".

But this is checking whether interrupts were on in the frame we're
returning to, not whether they're on right now, right?
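A small model of the distinction Andy draws, as a sketch: struct frame_model and may_preempt_on_return() are hypothetical, and the preempt-count and need-resched inputs are simplified to plain parameters. On this exit path the live IF is always clear, so the test that matters is IF in the saved flags of the interrupted context (in entry_64.S this is a bit test on the EFLAGS slot of the saved frame). A #PF taken inside an irqs-off region returns to a frame with IF clear and must not preempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_IF 0x00000200UL

/* Hypothetical stand-in for the saved register frame. */
struct frame_model {
	uint64_t flags;	/* RFLAGS of the interrupted context */
};

/*
 * May we preempt before returning to kernel code?  The live IF is off
 * here regardless; the question is whether interrupts were on in the
 * code we are about to return to.
 */
static bool may_preempt_on_return(const struct frame_model *regs,
				  bool live_irqs_on, int preempt_count,
				  bool need_resched)
{
	assert(!live_irqs_on);	/* the exit path always runs with irqs off */
	return (regs->flags & X86_EFLAGS_IF) && preempt_count == 0 &&
	       need_resched;
}
```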

--Andy


* Re: [tip:x86/asm] sched/x86_64: Don't save flags on context switch
  2014-11-03 23:02               ` Andy Lutomirski
@ 2014-11-03 23:10                 ` Oleg Nesterov
  0 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2014-11-03 23:10 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jan Beulich, Ingo Molnar, Linus Torvalds, H. Peter Anvin,
	Sebastian Lackner, linux-kernel, Chuck Ebbert, Anish Bhatt,
	Thomas Gleixner, linux-tip-commits

On 11/03, Andy Lutomirski wrote:
>
> On Mon, Nov 3, 2014 at 2:57 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > I actually tried to say that irqs should always be disabled (afaics!).
> > IOW "interrupts on" should not be possible, not "interrupts off".
>
> But this is checking whether interrupts were on in the frame we're
> returning to, not whether they're on right now, right?

OOPS ;) Thanks!

Oleg.



* Re: [tip:x86/asm] sched/x86_64: Don't save flags on context switch
  2014-11-03 22:37           ` Andy Lutomirski
  2014-11-03 22:57             ` Oleg Nesterov
@ 2014-11-04 23:09             ` Oleg Nesterov
  1 sibling, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2014-11-04 23:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jan Beulich, Ingo Molnar, Linus Torvalds, H. Peter Anvin,
	Sebastian Lackner, linux-kernel, Chuck Ebbert, Anish Bhatt,
	Thomas Gleixner, linux-tip-commits

Didn't notice this part yesterday...

On 11/03, Andy Lutomirski wrote:
>
> On Mon, Nov 3, 2014 at 1:58 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 11/03, Oleg Nesterov wrote:
> >>
> >> On 11/03, Andy Lutomirski wrote:
> >> >
> >> > Also, what's with 'jmp exit_intr' at the end of retint_kernel?  Why
> >> > isn't that 'jmp retint_kernel'?
> >>
> >> Even better, why not "jmp retint_restore_args" ?
> >>
> >> preempt_schedule_irq() checks need_resched() and returns with irqs
> >> disabled, so is there any need to recheck test_preempt_need_resched()?
>
> Seems reasonable to me.  Want to write the patch?

OK, will try to do tomorrow.

And it seems that we can do a bit more, although I need to recheck.


retint_kernel() no longer needs rcx == threadinfo, because it doesn't
check TIF_NEED_RESCHED. This means we can move or remove some of the
GET_THREAD_INFO() calls under retint_check.

Oleg.

