linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to
@ 2014-09-30 19:40 Andy Lutomirski
  2014-09-30 19:40 ` [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Andy Lutomirski @ 2014-09-30 19:40 UTC (permalink / raw)
  To: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel, Chuck Ebbert,
	Andy Lutomirski

Anish Bhatt noticed that user programs can set RFLAGS.NT before
syscall or sysenter, and the kernel entry code doesn't filter out
NT.  This causes kernel C code and, depending on thread flags, the
exit slow path to run with NT set.

The former is a little bit scary (imagine calling into EFI with NT
set), and the latter will fail with #GP and send a spurious SIGSEGV.

One answer would be "don't do that".  But the kernel can do better
here.

These patches, which I'm not completely thrilled by, filter NT on
all kernel entries.  For syscall (both bitnesses), this is free.
For sysenter, it costs 15 cycles or so.  As a consolation prize, we
can speed up context switches by avoiding saving and restoring flags.

If we don't like the added sysenter overhead, there are few other
options:

 - Try to optimize it by folding it with other flag manipulations
   (my attempt to do that didn't end up being any faster).

 - Only do the syscall part.  That's free, but it serves little
   purpose other than being polite to buggy userspace code.

 - Don't filter NT on sysenter.  Instead, filter it on EFI entry
   and modify the IRET code to retry without NT set if NT was set.

 - Don't filter NT on sysenter.  Instead, only filter it when
   sysenter jumps to the slow path.  (This is trivial, but it does
   nothing to reduce the chance that evil user code can cause
   trouble by, say, reading from sysfs with NT set using sysenter.)

See: https://bugs.winehq.org/show_bug.cgi?id=33275

Andy Lutomirski (2):
  x86_64,entry: Filter RFLAGS.NT on entry from userspace
  x86_64: Don't save flags on context switch

 arch/x86/ia32/ia32entry.S        | 10 +++++++++-
 arch/x86/include/asm/switch_to.h | 10 +++++++---
 arch/x86/kernel/cpu/common.c     |  2 +-
 3 files changed, 17 insertions(+), 5 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-09-30 19:40 [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Andy Lutomirski
@ 2014-09-30 19:40 ` Andy Lutomirski
  2014-09-30 21:39   ` Sebastian Lackner
  2014-10-01  0:27   ` Chuck Ebbert
  2014-09-30 19:40 ` [PATCH 2/2] x86_64: Don't save flags on context switch Andy Lutomirski
  2014-09-30 22:21 ` [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Thomas Gleixner
  2 siblings, 2 replies; 16+ messages in thread
From: Andy Lutomirski @ 2014-09-30 19:40 UTC (permalink / raw)
  To: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel, Chuck Ebbert,
	Andy Lutomirski, stable

The NT flag doesn't do anything in long mode other than causing IRET
to #GP.  Oddly, CPL3 code can still net NT using popf.

Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.

This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM).  It also clears NT
(and some other flags) in software on SYSENTER.

I haven't touched anything on 32-bit kernels.

If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble.  For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen.  Apparently Wine sometimes does this (!), and, if
an IRET return happens, Wine will segfault.

I think that Wine should be fixed to stop setting NT when a syscall
happens, but handling NT more gracefully is still nice.

The syscall mask change comes from a variant of this patch by Anish
Bhatt.

Cc: stable@kernel.org
Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/ia32/ia32entry.S    | 10 +++++++++-
 arch/x86/kernel/cpu/common.c |  2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb05023c..079f42a7ad58 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -143,7 +143,15 @@ ENTRY(ia32_sysenter_target)
 	pushq_cfi %r10
 	CFI_REL_OFFSET rip,0
 	pushq_cfi %rax
-	cld
+
+	/*
+	 * Sysenter doesn't filter flags, so we should filter them
+	 * ourselves.
+	 */
+	pushfq_cfi
+	andl $~(X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_NT|X86_EFLAGS_IOPL),(%rsp)
+	popfq_cfi
+
 	SAVE_ARGS 0,1,0
  	/* no need to do an access_ok check here because rbp has been
  	   32bit zero extended */ 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e4ab2b42bd6f..31265580c38a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1184,7 +1184,7 @@ void syscall_init(void)
 	/* Flags to clear on syscall */
 	wrmsrl(MSR_SYSCALL_MASK,
 	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
-	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
+	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
 /*
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] x86_64: Don't save flags on context switch
  2014-09-30 19:40 [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Andy Lutomirski
  2014-09-30 19:40 ` [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
@ 2014-09-30 19:40 ` Andy Lutomirski
  2014-09-30 22:21 ` [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Thomas Gleixner
  2 siblings, 0 replies; 16+ messages in thread
From: Andy Lutomirski @ 2014-09-30 19:40 UTC (permalink / raw)
  To: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel, Chuck Ebbert,
	Andy Lutomirski

Now that the kernel always runs with clean flags (in particular, NT
is clear), there is no need to save and restore flags on every
context switch.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/include/asm/switch_to.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index d7f3b3b78ac3..0464181b0cbe 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -79,8 +79,8 @@ do {									\
 #else /* CONFIG_X86_32 */
 
 /* frame pointer must be last for get_wchan */
-#define SAVE_CONTEXT    "pushf ; pushq %%rbp ; movq %%rsi,%%rbp\n\t"
-#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp ; popf\t"
+#define SAVE_CONTEXT    "pushq %%rbp ; movq %%rsi,%%rbp\n\t"
+#define RESTORE_CONTEXT "movq %%rbp,%%rsi ; popq %%rbp\t"
 
 #define __EXTRA_CLOBBER  \
 	, "rcx", "rbx", "rdx", "r8", "r9", "r10", "r11", \
@@ -100,7 +100,11 @@ do {									\
 #define __switch_canary_iparam
 #endif	/* CC_STACKPROTECTOR */
 
-/* Save restore flags to clear handle leaking NT */
+/*
+ * There is no need to save or restore flags, because flags are always
+ * clean in kernel mode, with the possible exception of IOPL.  Kernel IOPL
+ * has no effect.
+ */
 #define switch_to(prev, next, last) \
 	asm volatile(SAVE_CONTEXT					  \
 	     "movq %%rsp,%P[threadrsp](%[prev])\n\t" /* save RSP */	  \
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-09-30 19:40 ` [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
@ 2014-09-30 21:39   ` Sebastian Lackner
  2014-09-30 21:45     ` Andy Lutomirski
  2014-10-01  0:27   ` Chuck Ebbert
  1 sibling, 1 reply; 16+ messages in thread
From: Sebastian Lackner @ 2014-09-30 21:39 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
  Cc: Anish Bhatt, linux-kernel, Chuck Ebbert, stable

On 30.09.2014 21:40, Andy Lutomirski wrote:
> what would happen.  Apparently Wine sometimes does this (!), and, if
> an IRET return happens, Wine will segfault.
> 
> I think that Wine should be fixed to stop setting NT when a syscall
> happens, but handling NT more gracefully is still nice.
> 

Just to give some more background about this issue: Wine has no influence
if the NT flag is set or not - as Wine doesn't trace each individual opcode,
there is no chance to know, if a Windows program messes up the EFLAGS. This
happens in closed source Windows applications, so its not really Wines fault.

I think the current approach should be fine, but if other people prefer one of
the solutions without additional overhead on syscall entry:

At least for Wine it would also be absolutely fine, when the application would
just get a proper signal (if adding the retry IRET is too complicated). I've
attached the url to an additional example program, which shows that currently
the signal handler is unable to process such a fault, and a proof-of-concept
patch to clear the NT flags at least for the signal handler. Such a patch
doesn't fix the potential issues with EFI though.

Example program:
http://ix.io/ezl

Proof-of-concept patch:
http://ix.io/ezm

Regards,
Sebastian


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-09-30 21:39   ` Sebastian Lackner
@ 2014-09-30 21:45     ` Andy Lutomirski
  2014-09-30 22:23       ` Sebastian Lackner
  2014-09-30 22:27       ` Thomas Gleixner
  0 siblings, 2 replies; 16+ messages in thread
From: Andy Lutomirski @ 2014-09-30 21:45 UTC (permalink / raw)
  To: Sebastian Lackner
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Anish Bhatt, linux-kernel, Chuck Ebbert, stable

On Tue, Sep 30, 2014 at 2:39 PM, Sebastian Lackner
<sebastian@fds-team.de> wrote:
> On 30.09.2014 21:40, Andy Lutomirski wrote:
>> what would happen.  Apparently Wine sometimes does this (!), and, if
>> an IRET return happens, Wine will segfault.
>>
>> I think that Wine should be fixed to stop setting NT when a syscall
>> happens, but handling NT more gracefully is still nice.
>>
>
> Just to give some more background about this issue: Wine has no influence
> if the NT flag is set or not - as Wine doesn't trace each individual opcode,
> there is no chance to know, if a Windows program messes up the EFLAGS. This
> happens in closed source Windows applications, so its not really Wines fault.

I don't buy this explanation at all.  Surely Wine is responsible for
all system calls that happen.  There's only a problem when NT is set
and a system call happens.

>
> I think the current approach should be fine, but if other people prefer one of
> the solutions without additional overhead on syscall entry:
>
> At least for Wine it would also be absolutely fine, when the application would
> just get a proper signal (if adding the retry IRET is too complicated). I've
> attached the url to an additional example program, which shows that currently
> the signal handler is unable to process such a fault, and a proof-of-concept
> patch to clear the NT flags at least for the signal handler. Such a patch
> doesn't fix the potential issues with EFI though.
>
> Example program:
> http://ix.io/ezl
>
> Proof-of-concept patch:
> http://ix.io/ezm

This patch would make more sense if you only cleared it from the
actual regs, not the saved regs.  User code can do the latter on its
own.

It would certainly be possible to clear NT and retry IRET if IRET
fails with NT set.  This would have no overhead for anything relevant.
That would be this alternative from my 0/2 email:

 - Don't filter NT on sysenter.  Instead, filter it on EFI entry
   and modify the IRET code to retry without NT set if NT was set.

Thomas hpa, etc: any thoughts?

>
> Regards,
> Sebastian
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to
  2014-09-30 19:40 [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Andy Lutomirski
  2014-09-30 19:40 ` [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
  2014-09-30 19:40 ` [PATCH 2/2] x86_64: Don't save flags on context switch Andy Lutomirski
@ 2014-09-30 22:21 ` Thomas Gleixner
  2014-09-30 22:30   ` Andy Lutomirski
  2 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2014-09-30 22:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, Ingo Molnar, H. Peter Anvin, Sebastian Lackner,
	Anish Bhatt, linux-kernel, Chuck Ebbert

On Tue, 30 Sep 2014, Andy Lutomirski wrote:
> Anish Bhatt noticed that user programs can set RFLAGS.NT before
> syscall or sysenter, and the kernel entry code doesn't filter out
> NT.  This causes kernel C code and, depending on thread flags, the
> exit slow path to run with NT set.
> 
> The former is a little bit scary (imagine calling into EFI with NT
> set), and the latter will fail with #GP and send a spurious SIGSEGV.
> 
> One answer would be "don't do that".  But the kernel can do better
> here.
> 
> These patches, which I'm not completely thrilled by, filter NT on
> all kernel entries.  For syscall (both bitnesses), this is free.
> For sysenter, it costs 15 cycles or so.  As a consolation prize, we
> can speed up context switches by avoiding saving and restoring flags.

That's a nice reason not to do any of the other ugly variants.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-09-30 21:45     ` Andy Lutomirski
@ 2014-09-30 22:23       ` Sebastian Lackner
  2014-09-30 22:27       ` Thomas Gleixner
  1 sibling, 0 replies; 16+ messages in thread
From: Sebastian Lackner @ 2014-09-30 22:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Anish Bhatt, linux-kernel, Chuck Ebbert, stable

On 30.09.2014 23:45, Andy Lutomirski wrote:
> On Tue, Sep 30, 2014 at 2:39 PM, Sebastian Lackner
> <sebastian@fds-team.de> wrote:
>> On 30.09.2014 21:40, Andy Lutomirski wrote:
>>> what would happen.  Apparently Wine sometimes does this (!), and, if
>>> an IRET return happens, Wine will segfault.
>>>
>>> I think that Wine should be fixed to stop setting NT when a syscall
>>> happens, but handling NT more gracefully is still nice.
>>>
>>
>> Just to give some more background about this issue: Wine has no influence
>> if the NT flag is set or not - as Wine doesn't trace each individual opcode,
>> there is no chance to know, if a Windows program messes up the EFLAGS. This
>> happens in closed source Windows applications, so its not really Wines fault.
> 
> I don't buy this explanation at all.  Surely Wine is responsible for
> all system calls that happen.  There's only a problem when NT is set
> and a system call happens.

Wine maps all the Windows API calls to their corresponding Linux equivalent shared
library, you will rarely find any direct syscalls. This means it would involve adding
special code to clear NT at the beginning of _each_ function, which can be called
directly from the Windows application. Also callbacks to/from native code would have
to be protected...

I doubt that gcc will ever add such a feature, but even if there would be something
like that, it still wouldn't be a good solution - lot more overhead than a kernel
solution. ;)

> 
>>
>> I think the current approach should be fine, but if other people prefer one of
>> the solutions without additional overhead on syscall entry:
>>
>> At least for Wine it would also be absolutely fine, when the application would
>> just get a proper signal (if adding the retry IRET is too complicated). I've
>> attached the url to an additional example program, which shows that currently
>> the signal handler is unable to process such a fault, and a proof-of-concept
>> patch to clear the NT flags at least for the signal handler. Such a patch
>> doesn't fix the potential issues with EFI though.
>>
>> Example program:
>> http://ix.io/ezl
>>
>> Proof-of-concept patch:
>> http://ix.io/ezm
> 
> This patch would make more sense if you only cleared it from the
> actual regs, not the saved regs.  User code can do the latter on its
> own.

Maybe I misunderstood the comments in the source, but it sounded like
regs->flags contains the flags which are used to enter the signal handler ?!
(which would mean that they are saved on the stack and used for the following IRET)

Having the NT flag still set at this point would cause immediately another
recursive fault, if the signal handler tries to run any syscall, which is
most probably not what the developer wants.

> 
> It would certainly be possible to clear NT and retry IRET if IRET
> fails with NT set.  This would have no overhead for anything relevant.
> That would be this alternative from my 0/2 email:
> 
>  - Don't filter NT on sysenter.  Instead, filter it on EFI entry
>    and modify the IRET code to retry without NT set if NT was set.

Yup, I saw that, but I'm not sure how small the overhead really is.
Is it sufficient to check the NT flag and retry? What about other instructions,
which could also cause a #GP, it wouldn't make sense to retry them, even if
the NT flag is set (for some unknown reason). Not sure if it will cause any
harm, but at least retry _only_ IRET instructions sounds a bit complicated.

> 
> Thomas hpa, etc: any thoughts?
> 
>>
>> Regards,
>> Sebastian
>>
> 
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-09-30 21:45     ` Andy Lutomirski
  2014-09-30 22:23       ` Sebastian Lackner
@ 2014-09-30 22:27       ` Thomas Gleixner
  2014-09-30 22:33         ` Andy Lutomirski
  2014-09-30 22:42         ` H. Peter Anvin
  1 sibling, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2014-09-30 22:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sebastian Lackner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Anish Bhatt, linux-kernel, Chuck Ebbert, stable

On Tue, 30 Sep 2014, Andy Lutomirski wrote:
> It would certainly be possible to clear NT and retry IRET if IRET
> fails with NT set.  This would have no overhead for anything relevant.
> That would be this alternative from my 0/2 email:
> 
>  - Don't filter NT on sysenter.  Instead, filter it on EFI entry
>    and modify the IRET code to retry without NT set if NT was set.
> 
> Thomas hpa, etc: any thoughts?

Filter it right away. That's solid and obvious. Anything else is just
complex and prone for future brown paperbag failures.

We get the context switch benefit from it, so there is some
compensation for the extra cycles.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to
  2014-09-30 22:21 ` [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Thomas Gleixner
@ 2014-09-30 22:30   ` Andy Lutomirski
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Lutomirski @ 2014-09-30 22:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: X86 ML, Ingo Molnar, H. Peter Anvin, Sebastian Lackner,
	Anish Bhatt, linux-kernel, Chuck Ebbert

On Tue, Sep 30, 2014 at 3:21 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 30 Sep 2014, Andy Lutomirski wrote:
>> Anish Bhatt noticed that user programs can set RFLAGS.NT before
>> syscall or sysenter, and the kernel entry code doesn't filter out
>> NT.  This causes kernel C code and, depending on thread flags, the
>> exit slow path to run with NT set.
>>
>> The former is a little bit scary (imagine calling into EFI with NT
>> set), and the latter will fail with #GP and send a spurious SIGSEGV.
>>
>> One answer would be "don't do that".  But the kernel can do better
>> here.
>>
>> These patches, which I'm not completely thrilled by, filter NT on
>> all kernel entries.  For syscall (both bitnesses), this is free.
>> For sysenter, it costs 15 cycles or so.  As a consolation prize, we
>> can speed up context switches by avoiding saving and restoring flags.
>
> That's a nice reason not to do any of the other ugly variants.

We could do something hideous:

Don't filter NT in sysexit or on context switch.  Instead, handle it
in bad_iret.

Up side: all common cases are maximally fast.

Down side: Ugly.  And malicious processes can leak NT, causing return
to a different process to fault, thereby adding a thousand or two
cycles (or possibly a lot more if the fault hits in the middle of
espfix64.  Egads.)

This is not intended to be a serious suggestion...

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-09-30 22:27       ` Thomas Gleixner
@ 2014-09-30 22:33         ` Andy Lutomirski
  2014-09-30 23:21           ` Thomas Gleixner
  2014-09-30 22:42         ` H. Peter Anvin
  1 sibling, 1 reply; 16+ messages in thread
From: Andy Lutomirski @ 2014-09-30 22:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sebastian Lackner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Anish Bhatt, linux-kernel, Chuck Ebbert, stable

On Tue, Sep 30, 2014 at 3:27 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 30 Sep 2014, Andy Lutomirski wrote:
>> It would certainly be possible to clear NT and retry IRET if IRET
>> fails with NT set.  This would have no overhead for anything relevant.
>> That would be this alternative from my 0/2 email:
>>
>>  - Don't filter NT on sysenter.  Instead, filter it on EFI entry
>>    and modify the IRET code to retry without NT set if NT was set.
>>
>> Thomas hpa, etc: any thoughts?
>
> Filter it right away. That's solid and obvious. Anything else is just
> complex and prone for future brown paperbag failures.

Yeah, agreed.  That's exactly what these patches do, although, if you
put them in -tip and want to keep the stable CC, it's probably worth
fixing the address (oops).

>
> We get the context switch benefit from it, so there is some
> compensation for the extra cycles.

If we ever want those cycles back, I bet that the compat sysenter path
could be trimmed down a lot.  For example, I think that all of the
zero-extension stuff is unnecessary now that we have the magic syscall
wrappers for all (?) syscalls.

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-09-30 22:27       ` Thomas Gleixner
  2014-09-30 22:33         ` Andy Lutomirski
@ 2014-09-30 22:42         ` H. Peter Anvin
  1 sibling, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2014-09-30 22:42 UTC (permalink / raw)
  To: Thomas Gleixner, Andy Lutomirski
  Cc: Sebastian Lackner, X86 ML, Ingo Molnar, Anish Bhatt,
	linux-kernel, Chuck Ebbert, stable

Nicely enough we get the context switch benefit without any cost in the comon case of 64-bit apps.

On September 30, 2014 3:27:41 PM PDT, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Tue, 30 Sep 2014, Andy Lutomirski wrote:
>> It would certainly be possible to clear NT and retry IRET if IRET
>> fails with NT set.  This would have no overhead for anything
>relevant.
>> That would be this alternative from my 0/2 email:
>> 
>>  - Don't filter NT on sysenter.  Instead, filter it on EFI entry
>>    and modify the IRET code to retry without NT set if NT was set.
>> 
>> Thomas hpa, etc: any thoughts?
>
>Filter it right away. That's solid and obvious. Anything else is just
>complex and prone for future brown paperbag failures.
>
>We get the context switch benefit from it, so there is some
>compensation for the extra cycles.
>
>Thanks,
>
>	tglx

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-09-30 22:33         ` Andy Lutomirski
@ 2014-09-30 23:21           ` Thomas Gleixner
  2014-10-01 17:50             ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2014-09-30 23:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sebastian Lackner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Anish Bhatt, linux-kernel, Chuck Ebbert

On Tue, 30 Sep 2014, Andy Lutomirski wrote:

> On Tue, Sep 30, 2014 at 3:27 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Tue, 30 Sep 2014, Andy Lutomirski wrote:
> >> It would certainly be possible to clear NT and retry IRET if IRET
> >> fails with NT set.  This would have no overhead for anything relevant.
> >> That would be this alternative from my 0/2 email:
> >>
> >>  - Don't filter NT on sysenter.  Instead, filter it on EFI entry
> >>    and modify the IRET code to retry without NT set if NT was set.
> >>
> >> Thomas hpa, etc: any thoughts?
> >
> > Filter it right away. That's solid and obvious. Anything else is just
> > complex and prone for future brown paperbag failures.
> 
> Yeah, agreed.  That's exactly what these patches do, although, if you
> put them in -tip and want to keep the stable CC, it's probably worth
> fixing the address (oops).

Even more oops as you failed to update it in your reply again ....
 
> > We get the context switch benefit from it, so there is some
> > compensation for the extra cycles.
> 
> If we ever want those cycles back, I bet that the compat sysenter path
> could be trimmed down a lot.  For example, I think that all of the
> zero-extension stuff is unnecessary now that we have the magic syscall
> wrappers for all (?) syscalls.

Emphasis on "(?)". So yes, once we verified that ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-09-30 19:40 ` [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
  2014-09-30 21:39   ` Sebastian Lackner
@ 2014-10-01  0:27   ` Chuck Ebbert
  2014-10-01  0:38     ` Andy Lutomirski
  1 sibling, 1 reply; 16+ messages in thread
From: Chuck Ebbert @ 2014-10-01  0:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner, Anish Bhatt, linux-kernel, stable

On Tue, 30 Sep 2014 12:40:35 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> The NT flag doesn't do anything in long mode other than causing IRET
> to #GP.  Oddly, CPL3 code can still net NT using popf.
> 
> Entry via hardware or software interrupt clears NT automatically, so
> the only relevant entries are fast syscalls.
> 
> This patch programs the CPU to clear NT on entry via SYSCALL (both
> 32-bit and 64-bit, by my reading of the AMD APM).  It also clears NT
> (and some other flags) in software on SYSENTER.
> 
> I haven't touched anything on 32-bit kernels.
> 
> If user code causes kernel code to run with NT set, then there's at
> least some (small) chance that it could cause trouble.  For example,
> user code could cause a call to EFI code with NT set, and who knows
> what would happen.  Apparently Wine sometimes does this (!), and, if
> an IRET return happens, Wine will segfault.
> 
> I think that Wine should be fixed to stop setting NT when a syscall
> happens, but handling NT more gracefully is still nice.
> 
> The syscall mask change comes from a variant of this patch by Anish
> Bhatt.
> 
> Cc: stable@kernel.org
> Reported-by: Anish Bhatt <anish@chelsio.com>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>  arch/x86/ia32/ia32entry.S    | 10 +++++++++-
>  arch/x86/kernel/cpu/common.c |  2 +-
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 4299eb05023c..079f42a7ad58 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -143,7 +143,15 @@ ENTRY(ia32_sysenter_target)
>  	pushq_cfi %r10
>  	CFI_REL_OFFSET rip,0
>  	pushq_cfi %rax
> -	cld
> +
> +	/*
> +	 * Sysenter doesn't filter flags, so we should filter them
> +	 * ourselves.
> +	 */
> +	pushfq_cfi
> +	andl $~(X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_NT|X86_EFLAGS_IOPL),(%rsp)
> +	popfq_cfi

Can't you just push a constant and pop that onto flags instead? It's
not like we care what's in there on entry to the kernel.

> +
>  	SAVE_ARGS 0,1,0
>   	/* no need to do an access_ok check here because rbp has been
>   	   32bit zero extended */ 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index e4ab2b42bd6f..31265580c38a 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1184,7 +1184,7 @@ void syscall_init(void)
>  	/* Flags to clear on syscall */
>  	wrmsrl(MSR_SYSCALL_MASK,
>  	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
> -	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
> +	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
>  }
>  
>  /*


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01  0:27   ` Chuck Ebbert
@ 2014-10-01  0:38     ` Andy Lutomirski
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Lutomirski @ 2014-10-01  0:38 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner, Anish Bhatt, linux-kernel, stable

On Tue, Sep 30, 2014 at 5:27 PM, Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
> On Tue, 30 Sep 2014 12:40:35 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>> The NT flag doesn't do anything in long mode other than causing IRET
>> to #GP.  Oddly, CPL3 code can still net NT using popf.
>>
>> Entry via hardware or software interrupt clears NT automatically, so
>> the only relevant entries are fast syscalls.
>>
>> This patch programs the CPU to clear NT on entry via SYSCALL (both
>> 32-bit and 64-bit, by my reading of the AMD APM).  It also clears NT
>> (and some other flags) in software on SYSENTER.
>>
>> I haven't touched anything on 32-bit kernels.
>>
>> If user code causes kernel code to run with NT set, then there's at
>> least some (small) chance that it could cause trouble.  For example,
>> user code could cause a call to EFI code with NT set, and who knows
>> what would happen.  Apparently Wine sometimes does this (!), and, if
>> an IRET return happens, Wine will segfault.
>>
>> I think that Wine should be fixed to stop setting NT when a syscall
>> happens, but handling NT more gracefully is still nice.
>>
>> The syscall mask change comes from a variant of this patch by Anish
>> Bhatt.
>>
>> Cc: stable@kernel.org
>> Reported-by: Anish Bhatt <anish@chelsio.com>
>> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
>> ---
>>  arch/x86/ia32/ia32entry.S    | 10 +++++++++-
>>  arch/x86/kernel/cpu/common.c |  2 +-
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
>> index 4299eb05023c..079f42a7ad58 100644
>> --- a/arch/x86/ia32/ia32entry.S
>> +++ b/arch/x86/ia32/ia32entry.S
>> @@ -143,7 +143,15 @@ ENTRY(ia32_sysenter_target)
>>       pushq_cfi %r10
>>       CFI_REL_OFFSET rip,0
>>       pushq_cfi %rax
>> -     cld
>> +
>> +     /*
>> +      * Sysenter doesn't filter flags, so we should filter them
>> +      * ourselves.
>> +      */
>> +     pushfq_cfi
>> +     andl $~(X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_NT|X86_EFLAGS_IOPL),(%rsp)
>> +     popfq_cfi
>
> Can't you just push a constant and pop that onto flags instead? It's
> not like we care what's in there on entry to the kernel.

I don't see why not.  I'll test on Xen just to be sure.

--Andy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-09-30 23:21           ` Thomas Gleixner
@ 2014-10-01 17:50             ` H. Peter Anvin
  2014-10-01 17:53               ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2014-10-01 17:50 UTC (permalink / raw)
  To: Thomas Gleixner, Andy Lutomirski
  Cc: Sebastian Lackner, X86 ML, Ingo Molnar, Anish Bhatt,
	linux-kernel, Chuck Ebbert

On 09/30/2014 04:21 PM, Thomas Gleixner wrote:
>>
>> If we ever want those cycles back, I bet that the compat sysenter path
>> could be trimmed down a lot.  For example, I think that all of the
>> zero-extension stuff is unnecessary now that we have the magic syscall
>> wrappers for all (?) syscalls.
> 
> Emphasis on "(?)". So yes, once we verified that ....
> 

I don't think that's true.  Many system calls use exactly the same entry
point for compat and noncompat calls.  I don't see any value in
replicating that code in every system call.

The only time we need to do anything horribly special is when we have an
argument which is a signed long, which fortunately is not at all common.

	-hpa



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01 17:50             ` H. Peter Anvin
@ 2014-10-01 17:53               ` H. Peter Anvin
  0 siblings, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2014-10-01 17:53 UTC (permalink / raw)
  To: Thomas Gleixner, Andy Lutomirski
  Cc: Sebastian Lackner, X86 ML, Ingo Molnar, Anish Bhatt,
	linux-kernel, Chuck Ebbert

On 10/01/2014 10:50 AM, H. Peter Anvin wrote:
> On 09/30/2014 04:21 PM, Thomas Gleixner wrote:
>>>
>>> If we ever want those cycles back, I bet that the compat sysenter path
>>> could be trimmed down a lot.  For example, I think that all of the
>>> zero-extension stuff is unnecessary now that we have the magic syscall
>>> wrappers for all (?) syscalls.
>>
>> Emphasis on "(?)". So yes, once we verified that ....
>>
> 
> I don't think that's true.  Many system calls use exactly the same entry
> point for compat and noncompat calls.  I don't see any value in
> replicating that code in every system call.
> 
> The only time we need to do anything horribly special is when we have an
> argument which is a signed long, which fortunately is not at all common.
> 

Also, we still need to shuffle registers around to match the x86-64
calling convention.  Zero extension as part of that shuffle is free.

	-hpa



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-10-01 17:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-30 19:40 [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Andy Lutomirski
2014-09-30 19:40 ` [PATCH 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
2014-09-30 21:39   ` Sebastian Lackner
2014-09-30 21:45     ` Andy Lutomirski
2014-09-30 22:23       ` Sebastian Lackner
2014-09-30 22:27       ` Thomas Gleixner
2014-09-30 22:33         ` Andy Lutomirski
2014-09-30 23:21           ` Thomas Gleixner
2014-10-01 17:50             ` H. Peter Anvin
2014-10-01 17:53               ` H. Peter Anvin
2014-09-30 22:42         ` H. Peter Anvin
2014-10-01  0:27   ` Chuck Ebbert
2014-10-01  0:38     ` Andy Lutomirski
2014-09-30 19:40 ` [PATCH 2/2] x86_64: Don't save flags on context switch Andy Lutomirski
2014-09-30 22:21 ` [PATCH 0/2] x86_64,entry: Clear NT on entry and speed up switch_to Thomas Gleixner
2014-09-30 22:30   ` Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).