[PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
@ 2014-09-25 19:42 Anish Bhatt
  2014-09-25 23:00 ` Chuck Ebbert
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Anish Bhatt @ 2014-09-25 19:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, tglx, mingo, hpa, sebastian, Anish Bhatt

The MSR_SYSCALL_MASK, which is responsible for clearing specific EFLAGS on
 syscall entry, should also clear the nested task (NT) flag to be safe from
 userspace injection. Without this fix the application segmentation
 faults on syscall return because of the changed meaning of the IRET
 instruction.

Further details can be seen here https://bugs.winehq.org/show_bug.cgi?id=33275

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Sebastian Lackner <sebastian@fds-team.de>
---
 arch/x86/kernel/cpu/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e4ab2b4..3126558 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1184,7 +1184,7 @@ void syscall_init(void)
 	/* Flags to clear on syscall */
 	wrmsrl(MSR_SYSCALL_MASK,
 	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
-	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
+	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
 /*
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-25 19:42 [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry Anish Bhatt
@ 2014-09-25 23:00 ` Chuck Ebbert
  2014-09-29 17:30   ` Andy Lutomirski
  2014-09-26 22:00 ` Chuck Ebbert
  2014-09-29 17:40 ` Andy Lutomirski
  2 siblings, 1 reply; 24+ messages in thread
From: Chuck Ebbert @ 2014-09-25 23:00 UTC (permalink / raw)
  To: Anish Bhatt; +Cc: linux-kernel, x86, tglx, mingo, hpa, sebastian

On Thu, 25 Sep 2014 12:42:51 -0700
Anish Bhatt <anish@chelsio.com> wrote:

> The MSR_SYSCALL_MASK, which is responsible for clearing specific EFLAGS on
>  syscall entry, should also clear the nested task (NT) flag to be safe from
>  userspace injection. Without this fix the application segmentation
>  faults on syscall return because of the changed meaning of the IRET
>  instruction.
> 
> Further details can be seen here https://bugs.winehq.org/show_bug.cgi?id=33275
> 
> Signed-off-by: Anish Bhatt <anish@chelsio.com>
> Signed-off-by: Sebastian Lackner <sebastian@fds-team.de>
> ---
>  arch/x86/kernel/cpu/common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index e4ab2b4..3126558 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1184,7 +1184,7 @@ void syscall_init(void)
>  	/* Flags to clear on syscall */
>  	wrmsrl(MSR_SYSCALL_MASK,
>  	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
> -	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
> +	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
>  }
>  
>  /*

To demonstrate the bug this fixes, just strace this program, adapted
from the linked bug report:

#include <stdio.h>
#include <unistd.h>

int main()
{
	asm volatile("pushfq           \n\t" \
		     "pop %rax         \n\t" \
		     "or $0x4000,%rax  \n\t" \
		     "push %rax        \n\t" \
		     "popfq            \n\t");
	sleep(1);

	return 0;
}

I ran strace under gdb and found it dies here with SIGSEGV:

   0x00007ffff7a505c2 <+50>:	mov    $0xea,%eax
   0x00007ffff7a505c7 <+55>:	syscall 
=> 0x00007ffff7a505c9 <+57>:	cmp    $0xfffffffffffff000,%rax


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-25 19:42 [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry Anish Bhatt
  2014-09-25 23:00 ` Chuck Ebbert
@ 2014-09-26 22:00 ` Chuck Ebbert
  2014-09-26 22:10   ` Anish Bhatt
  2014-09-26 23:32   ` Linus Torvalds
  2014-09-29 17:40 ` Andy Lutomirski
  2 siblings, 2 replies; 24+ messages in thread
From: Chuck Ebbert @ 2014-09-26 22:00 UTC (permalink / raw)
  To: Anish Bhatt
  Cc: linux-kernel, x86, tglx, mingo, hpa, sebastian, Linus Torvalds

On Thu, 25 Sep 2014 12:42:51 -0700
Anish Bhatt <anish@chelsio.com> wrote:

> The MSR_SYSCALL_MASK, which is responsible for clearing specific EFLAGS on
>  syscall entry, should also clear the nested task (NT) flag to be safe from
>  userspace injection. Without this fix the application segmentation
>  faults on syscall return because of the changed meaning of the IRET
>  instruction.
> 
> Further details can be seen here https://bugs.winehq.org/show_bug.cgi?id=33275
> 
> Signed-off-by: Anish Bhatt <anish@chelsio.com>
> Signed-off-by: Sebastian Lackner <sebastian@fds-team.de>
> ---
>  arch/x86/kernel/cpu/common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index e4ab2b4..3126558 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1184,7 +1184,7 @@ void syscall_init(void)
>  	/* Flags to clear on syscall */
>  	wrmsrl(MSR_SYSCALL_MASK,
>  	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
> -	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
> +	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
>  }
>  
>  /*

I don't get it. Why isn't this patch acceptable, at least on x86-64
where NT is never valid?

Bueller?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-26 22:00 ` Chuck Ebbert
@ 2014-09-26 22:10   ` Anish Bhatt
  2014-09-26 23:32   ` Linus Torvalds
  1 sibling, 0 replies; 24+ messages in thread
From: Anish Bhatt @ 2014-09-26 22:10 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: linux-kernel, x86, tglx, mingo, hpa, sebastian, Linus Torvalds

> -----Original Message-----
> From: Chuck Ebbert [mailto:cebbert.lkml@gmail.com]
> Sent: Friday, September 26, 2014 3:01 PM
> To: Anish Bhatt
> Cc: linux-kernel@vger.kernel.org; x86@kernel.org; tglx@linutronix.de;
> mingo@redhat.com; hpa@zytor.com; sebastian@fds-team.de; Linus
> Torvalds
> Subject: Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
> 
> On Thu, 25 Sep 2014 12:42:51 -0700
> Anish Bhatt <anish@chelsio.com> wrote:
> 
> > The MSR_SYSCALL_MASK, which is responsible for clearing specific
> > EFLAGS on  syscall entry, should also clear the nested task (NT) flag
> > to be safe from  userspace injection. Without this fix the application
> > segmentation  faults on syscall return because of the changed meaning
> > of the IRET  instruction.
> >
> > Further details can be seen here
> > https://bugs.winehq.org/show_bug.cgi?id=33275
> >
> > Signed-off-by: Anish Bhatt <anish@chelsio.com>
> > Signed-off-by: Sebastian Lackner <sebastian@fds-team.de>
> > ---
> >  arch/x86/kernel/cpu/common.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/common.c
> > b/arch/x86/kernel/cpu/common.c index e4ab2b4..3126558 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -1184,7 +1184,7 @@ void syscall_init(void)
> >  	/* Flags to clear on syscall */
> >  	wrmsrl(MSR_SYSCALL_MASK,
> >  	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
> > -	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
> > +	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
> >  }
> >
> >  /*
> 
> I don't get it. Why isn't this patch acceptable, at least on x86-64 where NT is
> never valid?
> 
> Bueller?

Chuck, I don't think anyone has gotten around to reviewing, accepting or 
 rejecting this patch yet.
-Anish

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-26 22:00 ` Chuck Ebbert
  2014-09-26 22:10   ` Anish Bhatt
@ 2014-09-26 23:32   ` Linus Torvalds
  1 sibling, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2014-09-26 23:32 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Anish Bhatt, Linux Kernel Mailing List, the arch/x86 maintainers,
	Thomas Gleixner, Ingo Molnar, Peter Anvin, sebastian

On Fri, Sep 26, 2014 at 3:00 PM, Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
>
> I don't get it. Why isn't this patch acceptable, at least on x86-64
> where NT is never valid?

Why do you think it's not acceptable? Why do you raise a stink *one*
day after the patch - that seems to not be very important - is sent
out?

Why would SIGSEGV even be that bad? The program does crap things, why
accept it silently?

I don't think the patch is necessarily wrong, but I don't see why it
would be critical, and I *definitely* don't see why the f*ck you are
making a big deal of it.

Go away, stop bothering people.

         Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-25 23:00 ` Chuck Ebbert
@ 2014-09-29 17:30   ` Andy Lutomirski
  0 siblings, 0 replies; 24+ messages in thread
From: Andy Lutomirski @ 2014-09-29 17:30 UTC (permalink / raw)
  To: Chuck Ebbert, Anish Bhatt; +Cc: linux-kernel, x86, tglx, mingo, hpa, sebastian

On 09/25/2014 04:00 PM, Chuck Ebbert wrote:
> #include <stdio.h>
> #include <unistd.h>
> 
> int main()
> {
> 	asm volatile("pushfq           \n\t" \
> 		     "pop %rax         \n\t" \
> 		     "or $0x4000,%rax  \n\t" \
> 		     "push %rax        \n\t" \
> 		     "popfq            \n\t");

You're missing clobbers here.  But fixing that has no effect.

> 	sleep(1);
> 
> 	return 0;
> }


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-25 19:42 [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry Anish Bhatt
  2014-09-25 23:00 ` Chuck Ebbert
  2014-09-26 22:00 ` Chuck Ebbert
@ 2014-09-29 17:40 ` Andy Lutomirski
  2014-09-29 18:30   ` Sebastian Lackner
  2014-09-29 18:59   ` Thomas Gleixner
  2 siblings, 2 replies; 24+ messages in thread
From: Andy Lutomirski @ 2014-09-29 17:40 UTC (permalink / raw)
  To: Anish Bhatt, linux-kernel; +Cc: x86, tglx, mingo, hpa, sebastian

On 09/25/2014 12:42 PM, Anish Bhatt wrote:
> The MSR_SYSCALL_MASK, which is responsible for clearing specific EFLAGS on
>  syscall entry, should also clear the nested task (NT) flag to be safe from
>  userspace injection. Without this fix the application segmentation
>  faults on syscall return because of the changed meaning of the IRET
>  instruction.
> 
> Further details can be seen here https://bugs.winehq.org/show_bug.cgi?id=33275
> 
> Signed-off-by: Anish Bhatt <anish@chelsio.com>
> Signed-off-by: Sebastian Lackner <sebastian@fds-team.de>
> ---
>  arch/x86/kernel/cpu/common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index e4ab2b4..3126558 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1184,7 +1184,7 @@ void syscall_init(void)
>  	/* Flags to clear on syscall */
>  	wrmsrl(MSR_SYSCALL_MASK,
>  	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
> -	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
> +	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);

Something's weird here, and at the very least the changelog is
insufficiently informative.

The Intel SDM says:

If the NT flag is set and the processor is in IA-32e mode, the IRET
instruction causes a general protection exception.

Presumably interrupt delivery clears NT.  I haven't spotted where that's
documented yet.

sysret doesn't appear to care about NT at all.

So: the test code doesn't appear to do anything interesting *unless* it
goes through syscall followed by the iret exit path.  Then it receives
#GP on return, which turns into a signal.

On the premise that the slow and fast return paths ought to be
indistinguishable from userspace, I think we should fix this.  But I
want to understand it better first.

Also, 32-bit may need more care here.

--Andy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 17:40 ` Andy Lutomirski
@ 2014-09-29 18:30   ` Sebastian Lackner
  2014-09-29 18:43     ` Andy Lutomirski
  2014-09-29 19:33     ` Thomas Gleixner
  2014-09-29 18:59   ` Thomas Gleixner
  1 sibling, 2 replies; 24+ messages in thread
From: Sebastian Lackner @ 2014-09-29 18:30 UTC (permalink / raw)
  To: Andy Lutomirski, Anish Bhatt, linux-kernel; +Cc: x86, tglx, mingo, hpa

On 29.09.2014 19:40, Andy Lutomirski wrote:
> On 09/25/2014 12:42 PM, Anish Bhatt wrote:
>> The MSR_SYSCALL_MASK, which is responsible for clearing specific EFLAGS on
>>  syscall entry, should also clear the nested task (NT) flag to be safe from
>>  userspace injection. Without this fix the application segmentation
>>  faults on syscall return because of the changed meaning of the IRET
>>  instruction.
>>
>> Further details can be seen here https://bugs.winehq.org/show_bug.cgi?id=33275
>>
>> Signed-off-by: Anish Bhatt <anish@chelsio.com>
>> Signed-off-by: Sebastian Lackner <sebastian@fds-team.de>
>> ---
>>  arch/x86/kernel/cpu/common.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>> index e4ab2b4..3126558 100644
>> --- a/arch/x86/kernel/cpu/common.c
>> +++ b/arch/x86/kernel/cpu/common.c
>> @@ -1184,7 +1184,7 @@ void syscall_init(void)
>>  	/* Flags to clear on syscall */
>>  	wrmsrl(MSR_SYSCALL_MASK,
>>  	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
>> -	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
>> +	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
> 
> Something's weird here, and at the very least the changelog is
> insufficiently informative.
> 
> The Intel SDM says:
> 
> If the NT flag is set and the processor is in IA-32e mode, the IRET
> instruction causes a general protection exception.
> 
> Presumably interrupt delivery clears NT.  I haven't spotted where that's
> documented yet.

Well, the best documentation I've found is something like
http://www.fermimn.gov.it/linux/quarta/x86/int.htm

which states:

--- snip ---
INTERRUPT-TO-INNER-PRIVILEGE:
   [...]
   TF := 0;
   NT := 0;
--- snip ---
(Doesn't say anything about HW interrupts though)

This also makes sense at my opinion, since the interrupt handler has to know if it should return
to the previous task (when NT=1) or to the same task (when NT=0).

> 
> sysret doesn't appear to care about NT at all.
> 
> So: the test code doesn't appear to do anything interesting *unless* it
> goes through syscall followed by the iret exit path.  Then it receives
> #GP on return, which turns into a signal.

Yep, thats also my interpretation of this issue. If the processor would be in 32-bit/protected-mode the
NT flag would be interpreted as a task return, and it would probably cause a different exception,
because the kernel never uses the task link property of the TSS.

> 
> On the premise that the slow and fast return paths ought to be
> indistinguishable from userspace, I think we should fix this.  But I
> want to understand it better first.

A reliable way to force the slow return path is to use ptrace, see:
http://lxr.free-electrons.com/source/arch/x86/kernel/entry_64.S#L544

This also matches the experience: The test application only crashes with a small probability,
except you use strace, then it will always crash (because the kernel forces the slow return path).

Two additional remarks:

* A reliable way to let it crash without strace, is to run the fork()/clone() syscall afterwards and
  compile as 32-bit.

* When you run exec*() afterwards, the crash will happen at the entry of the new executable. Doesn't
  matter if the target process is SUID or not. I don't see a way to exploit this issue, though, but
  probably some more people should take a look at it...

> 
> Also, 32-bit may need more care here.

That might be possible. It probably makes sense to review other parts of the code, for similar issues.

> 
> --Andy
> 

Regards,
Sebastian


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 18:30   ` Sebastian Lackner
@ 2014-09-29 18:43     ` Andy Lutomirski
  2014-09-29 19:33     ` Thomas Gleixner
  1 sibling, 0 replies; 24+ messages in thread
From: Andy Lutomirski @ 2014-09-29 18:43 UTC (permalink / raw)
  To: Sebastian Lackner
  Cc: Anish Bhatt, linux-kernel, X86 ML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

On Mon, Sep 29, 2014 at 11:30 AM, Sebastian Lackner
<sebastian@fds-team.de> wrote:
> On 29.09.2014 19:40, Andy Lutomirski wrote:
>> On 09/25/2014 12:42 PM, Anish Bhatt wrote:
>>> The MSR_SYSCALL_MASK, which is responsible for clearing specific EFLAGS on
>>>  syscall entry, should also clear the nested task (NT) flag to be safe from
>>>  userspace injection. Without this fix the application segmentation
>>>  faults on syscall return because of the changed meaning of the IRET
>>>  instruction.
>>>
>>> Further details can be seen here https://bugs.winehq.org/show_bug.cgi?id=33275
>>>
>>> Signed-off-by: Anish Bhatt <anish@chelsio.com>
>>> Signed-off-by: Sebastian Lackner <sebastian@fds-team.de>
>>> ---
>>>  arch/x86/kernel/cpu/common.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>>> index e4ab2b4..3126558 100644
>>> --- a/arch/x86/kernel/cpu/common.c
>>> +++ b/arch/x86/kernel/cpu/common.c
>>> @@ -1184,7 +1184,7 @@ void syscall_init(void)
>>>      /* Flags to clear on syscall */
>>>      wrmsrl(MSR_SYSCALL_MASK,
>>>             X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
>>> -           X86_EFLAGS_IOPL|X86_EFLAGS_AC);
>>> +           X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
>>
>> Something's weird here, and at the very least the changelog is
>> insufficiently informative.
>>
>> The Intel SDM says:
>>
>> If the NT flag is set and the processor is in IA-32e mode, the IRET
>> instruction causes a general protection exception.
>>
>> Presumably interrupt delivery clears NT.  I haven't spotted where that's
>> documented yet.
>
> Well, the best documentation I've found is something like
> http://www.fermimn.gov.it/linux/quarta/x86/int.htm
>
> which states:
>
> --- snip ---
> INTERRUPT-TO-INNER-PRIVILEGE:
>    [...]
>    TF := 0;
>    NT := 0;
> --- snip ---
> (Doesn't say anything about HW interrupts though)
>
> This also makes sense at my opinion, since the interrupt handler has to know if it should return
> to the previous task (when NT=1) or to the same task (when NT=0).
>
>>
>> sysret doesn't appear to care about NT at all.
>>
>> So: the test code doesn't appear to do anything interesting *unless* it
>> goes through syscall followed by the iret exit path.  Then it receives
>> #GP on return, which turns into a signal.
>
> Yep, thats also my interpretation of this issue. If the processor would be in 32-bit/protected-mode the
> NT flag would be interpreted as a task return, and it would probably cause a different exception,
> because the kernel never uses the task link property of the TSS.
>
>>
>> On the premise that the slow and fast return paths ought to be
>> indistinguishable from userspace, I think we should fix this.  But I
>> want to understand it better first.
>
> A reliable way to force the slow return path is to use ptrace, see:
> http://lxr.free-electrons.com/source/arch/x86/kernel/entry_64.S#L544
>
> This also matches the experience: The test application only crashes with a small probability,
> except you use strace, then it will always crash (because the kernel forces the slow return path).
>
> Two additional remarks:
>
> * A reliable way to let it crash without strace, is to run the fork()/clone() syscall afterwards and
>   compile as 32-bit.
>
> * When you run exec*() afterwards, the crash will happen at the entry of the new executable. Doesn't
>   matter if the target process is SUID or not. I don't see a way to exploit this issue, though, but
>   probably some more people should take a look at it...
>
>>
>> Also, 32-bit may need more care here.
>
> That might be possible. It probably makes sense to review other parts of the code, for similar issues.

sysenter probably has the same problem.

--Andy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 17:40 ` Andy Lutomirski
  2014-09-29 18:30   ` Sebastian Lackner
@ 2014-09-29 18:59   ` Thomas Gleixner
  2014-09-29 19:08     ` Andy Lutomirski
  2014-09-29 19:17     ` Andy Lutomirski
  1 sibling, 2 replies; 24+ messages in thread
From: Thomas Gleixner @ 2014-09-29 18:59 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Anish Bhatt, linux-kernel, x86, mingo, hpa, sebastian

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2771 bytes --]

On Mon, 29 Sep 2014, Andy Lutomirski wrote:
> On 09/25/2014 12:42 PM, Anish Bhatt wrote:
> > The MSR_SYSCALL_MASK, which is responsible for clearing specific EFLAGS on
> >  syscall entry, should also clear the nested task (NT) flag to be safe from
> >  userspace injection. Without this fix the application segmentation
> >  faults on syscall return because of the changed meaning of the IRET
> >  instruction.
> > 
> > Further details can be seen here https://bugs.winehq.org/show_bug.cgi?id=33275
> > 
> > Signed-off-by: Anish Bhatt <anish@chelsio.com>
> > Signed-off-by: Sebastian Lackner <sebastian@fds-team.de>
> > ---
> >  arch/x86/kernel/cpu/common.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index e4ab2b4..3126558 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -1184,7 +1184,7 @@ void syscall_init(void)
> >  	/* Flags to clear on syscall */
> >  	wrmsrl(MSR_SYSCALL_MASK,
> >  	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
> > -	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
> > +	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
> 
> Something's weird here, and at the very least the changelog is
> insufficiently informative.
> 
> The Intel SDM says:
> 
> If the NT flag is set and the processor is in IA-32e mode, the IRET
> instruction causes a general protection exception.
> 
> Presumably interrupt delivery clears NT.  I haven't spotted where that's
> documented yet. 

Nope, that's unrelated.

See Volume 3, chapter 7.4 "Task linking":

"The previous task link field of the TSS (sometimes called the
 “backlink”) and the NT flag in the EFLAGS register are used to return
 execution to the previous task. EFLAGS.NT = 1 indicates that the
 currently executing task is nested within the execution of another
 task.

 When a CALL instruction, an interrupt, or an exception causes a task
 switch: the processor copies the segment selector for the current TSS
 to the previous task link field of the TSS for the new task; it then
 sets EFLAGS.NT = 1.  If software uses an IRET instruction to suspend
 the new task, the processor checks for EFLAGS.NT = 1; it then uses the
 value in the previous task link field to return to the previous
 task. See Figures 7-8."

Now, Linux does not care about that. Thread management is done purely
in software. So nothing uses and nothing can use the TSS backlink and
NT mode.

In IA-32e mode a IRET seing EFLAGS.NT=1 will cause #GP. In non IA-32e
mode it would simply explode by returning to TSS.back_link, which is
reliably NULL.

So there is nothing to see here other than the stupid user space task
fiddling with the NT flag being killed rightfully.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 18:59   ` Thomas Gleixner
@ 2014-09-29 19:08     ` Andy Lutomirski
  2014-09-29 19:17     ` Andy Lutomirski
  1 sibling, 0 replies; 24+ messages in thread
From: Andy Lutomirski @ 2014-09-29 19:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar, H. Peter Anvin,
	sebastian

On Mon, Sep 29, 2014 at 11:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 29 Sep 2014, Andy Lutomirski wrote:
>> On 09/25/2014 12:42 PM, Anish Bhatt wrote:
>> > The MSR_SYSCALL_MASK, which is responsible for clearing specific EFLAGS on
>> >  syscall entry, should also clear the nested task (NT) flag to be safe from
>> >  userspace injection. Without this fix the application segmentation
>> >  faults on syscall return because of the changed meaning of the IRET
>> >  instruction.
>> >
>> > Further details can be seen here https://bugs.winehq.org/show_bug.cgi?id=33275
>> >
>> > Signed-off-by: Anish Bhatt <anish@chelsio.com>
>> > Signed-off-by: Sebastian Lackner <sebastian@fds-team.de>
>> > ---
>> >  arch/x86/kernel/cpu/common.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>> > index e4ab2b4..3126558 100644
>> > --- a/arch/x86/kernel/cpu/common.c
>> > +++ b/arch/x86/kernel/cpu/common.c
>> > @@ -1184,7 +1184,7 @@ void syscall_init(void)
>> >     /* Flags to clear on syscall */
>> >     wrmsrl(MSR_SYSCALL_MASK,
>> >            X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
>> > -          X86_EFLAGS_IOPL|X86_EFLAGS_AC);
>> > +          X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
>>
>> Something's weird here, and at the very least the changelog is
>> insufficiently informative.
>>
>> The Intel SDM says:
>>
>> If the NT flag is set and the processor is in IA-32e mode, the IRET
>> instruction causes a general protection exception.
>>
>> Presumably interrupt delivery clears NT.  I haven't spotted where that's
>> documented yet.
>
> Nope, that's unrelated.
>
> See Volume 3, chapter 7.4 "Task linking":
>
> "The previous task link field of the TSS (sometimes called the
>  “backlink”) and the NT flag in the EFLAGS register are used to return
>  execution to the previous task. EFLAGS.NT = 1 indicates that the
>  currently executing task is nested within the execution of another
>  task.
>
>  When a CALL instruction, an interrupt, or an exception causes a task
>  switch: the processor copies the segment selector for the current TSS
>  to the previous task link field of the TSS for the new task; it then
>  sets EFLAGS.NT = 1.  If software uses an IRET instruction to suspend
>  the new task, the processor checks for EFLAGS.NT = 1; it then uses the
>  value in the previous task link field to return to the previous
>  task. See Figures 7-8."
>
> Now, Linux does not care about that. Thread management is done purely
> in software. So nothing uses and nothing can use the TSS backlink and
> NT mode.
>
> In IA-32e mode a IRET seing EFLAGS.NT=1 will cause #GP. In non IA-32e
> mode it would simply explode by returning to TSS.back_link, which is
> reliably NULL.
>
> So there is nothing to see here other than the stupid user space task
> fiddling with the NT flag being killed rightfully.

Except that we're exposing ourselves to security issues.  I don't see
any off the top of my head, but what if an unprivileged or
semi-privileged process sets NT, does syscall or sysenter, and causes
the kernel to jump to EFI code?  Or, hell, what if there's something
in the kernel that fakes interrupt delivery and blows up on return?
Or what if we're running in kernel mode with NT set and we take an NMI
or even a nested NMI?  What happens if we have NT set in kernel mode
and enter a VM?

I think it's absurd that you can set NT at all from CPL3 in long mode,
but we should at least try to be graceful about it.

I don't know of any actual bugs here, but fixing this (at least for
syscall) will have absolutely no performance impact.

--Andy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 18:59   ` Thomas Gleixner
  2014-09-29 19:08     ` Andy Lutomirski
@ 2014-09-29 19:17     ` Andy Lutomirski
  2014-09-29 19:41       ` Thomas Gleixner
  1 sibling, 1 reply; 24+ messages in thread
From: Andy Lutomirski @ 2014-09-29 19:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner

On Mon, Sep 29, 2014 at 11:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 29 Sep 2014, Andy Lutomirski wrote:
>> Presumably interrupt delivery clears NT.  I haven't spotted where that's
>> documented yet.
>
> Nope, that's unrelated.

If it weren't the case, then we'd be totally screwed.  Fortunately, it
is.  I found it: SDM Volume 3 6.12.1.2 says:

(On calls to exception and interrupt
handlers, the processor also clears the VM, RF, and NT flags in the
EFLAGS register,
after they are saved on the stack.)

--Andy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 18:30   ` Sebastian Lackner
  2014-09-29 18:43     ` Andy Lutomirski
@ 2014-09-29 19:33     ` Thomas Gleixner
  2014-09-29 19:41       ` Sebastian Lackner
  1 sibling, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2014-09-29 19:33 UTC (permalink / raw)
  To: Sebastian Lackner
  Cc: Andy Lutomirski, Anish Bhatt, linux-kernel, x86, mingo, hpa

On Mon, 29 Sep 2014, Sebastian Lackner wrote:
> On 29.09.2014 19:40, Andy Lutomirski wrote:
> Well, the best documentation I've found is something like
> http://www.fermimn.gov.it/linux/quarta/x86/int.htm
> 
> which states:
> 
> --- snip ---
> INTERRUPT-TO-INNER-PRIVILEGE:
>    [...]
>    TF := 0;
>    NT := 0;
> --- snip ---
> (Doesn't say anything about HW interrupts though)
> 
> This also makes sense at my opinion, since the interrupt handler has
> to know if it should return to the previous task (when NT=1) or to
> the same task (when NT=0).

No, it does not. Simply because Linux does not support nested tasks at
all, because the TSS is not accessible and the TSS.back_link is
sturdily NULL. So even if it would not explode with a #GP in IA-32e
mode it would explode while trying to execute the instruction at NULL.

> That might be possible. It probably makes sense to review other
> parts of the code, for similar issues.

What's the issue? Stupid user space programs segfaulting?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 19:33     ` Thomas Gleixner
@ 2014-09-29 19:41       ` Sebastian Lackner
  2014-09-29 19:51         ` Thomas Gleixner
  0 siblings, 1 reply; 24+ messages in thread
From: Sebastian Lackner @ 2014-09-29 19:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Anish Bhatt, linux-kernel, x86, mingo, hpa

On 29.09.2014 21:33, Thomas Gleixner wrote:
> On Mon, 29 Sep 2014, Sebastian Lackner wrote:
>> On 29.09.2014 19:40, Andy Lutomirski wrote:
>> Well, the best documentation I've found is something like
>> http://www.fermimn.gov.it/linux/quarta/x86/int.htm
>>
>> which states:
>>
>> --- snip ---
>> INTERRUPT-TO-INNER-PRIVILEGE:
>>    [...]
>>    TF := 0;
>>    NT := 0;
>> --- snip ---
>> (Doesn't say anything about HW interrupts though)
>>
>> This also makes sense at my opinion, since the interrupt handler has
>> to know if it should return to the previous task (when NT=1) or to
>> the same task (when NT=0).
> 
> No, it does not. Simply because Linux does not support nested tasks at
> all, because the TSS is not accessible and the TSS.back_link is
> sturdily NULL. So even if it would not explode with a #GP in IA-32e
> mode it would explode while trying to execute the instruction at NULL.

Sure, I also mentioned this in my last mail:

On 29.09.2014 20:30, Sebastian Lackner wrote:
> NT flag would be interpreted as a task return, and it would probably cause a different exception,
> because the kernel never uses the task link property of the TSS.

> 
>> That might be possible. It probably makes sense to review other
>> parts of the code, for similar issues.
> 
> What's the issue? Stupid user space programs segfaulting?

I see several issues here:

* At first the behaviour is not consistent between several system call and return instructions. For example calling syscalls by using 'int' doesn't have this issue, as it clears the NT flag before entering kernel code. Return instructions also don't show this issue all the time, just when it hits one of the problematic pieces of code.

* The kernel might execute all kind of other code (for example inside of drivers) and start additional threads. I didn't find any good example yet, but its not that unlikely, that the exception can also happen in a completely unrelated thread, where the kernel can not just kill the corresponding usermode app...

I'm fine with all kind of solutions, either it should be allowed to set NT, or the kernel should at least throw a proper exception, so that usermode has a chance to catch and handle it. At the moment the segfault is deadly, as the segfault handler immediately segfaults again - no chance to recover from such an error.

Regards,
Sebastian

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 19:17     ` Andy Lutomirski
@ 2014-09-29 19:41       ` Thomas Gleixner
  2014-09-29 19:43         ` H. Peter Anvin
  2014-09-29 20:16         ` Andy Lutomirski
  0 siblings, 2 replies; 24+ messages in thread
From: Thomas Gleixner @ 2014-09-29 19:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner

On Mon, 29 Sep 2014, Andy Lutomirski wrote:
> On Mon, Sep 29, 2014 at 11:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, 29 Sep 2014, Andy Lutomirski wrote:
> >> Presumably interrupt delivery clears NT.  I haven't spotted where that's
> >> documented yet.
> >
> > Nope, that's unrelated.
> 
> If it weren't the case, then we'd be totally screwed.  Fortunately, it
> is.  I found it: SDM Volume 3 6.12.1.2 says:
> 
> (On calls to exception and interrupt
> handlers, the processor also clears the VM, RF, and NT flags in the
> EFLAGS register,
> after they are saved on the stack.)

Sorry, I misunderstood your question.

And yes on exception and interrupt entry it is cleared. Otherwise the
whole feature would not work at all ...

But that's why I'm really not worried about it. While we can mask out
the stupid bit easily, it does not provide any value except protecting
silly userspace from rightfully raised exceptions.

When I first saw that patch, I was worried about the security impact,
but after staring long enough at the SDM and the code, the only way it
can explode is when returning to user space. It cannot explode in the
kernel.

So in IA-32e it creates a #GP otherwise it falls over the return to
NULL (TSS.back_link). So what?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 19:41       ` Thomas Gleixner
@ 2014-09-29 19:43         ` H. Peter Anvin
  2014-09-29 19:57           ` Thomas Gleixner
  2014-09-29 20:16         ` Andy Lutomirski
  1 sibling, 1 reply; 24+ messages in thread
From: H. Peter Anvin @ 2014-09-29 19:43 UTC (permalink / raw)
  To: Thomas Gleixner, Andy Lutomirski
  Cc: Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar, Sebastian Lackner

On 09/29/2014 12:41 PM, Thomas Gleixner wrote:
>>
>> If it weren't the case, then we'd be totally screwed.  Fortunately, it
>> is.  I found it: SDM Volume 3 6.12.1.2 says:
>>
>> (On calls to exception and interrupt
>> handlers, the processor also clears the VM, RF, and NT flags in the
>> EFLAGS register,
>> after they are saved on the stack.)
> 
> Sorry, I misunderstood your question.
> 
> And yes on exception and interrupt entry it is cleared. Otherwise the
> whole feature would not work at all ...
> 
> But that's why I'm really not worried about it. While we can mask out
> the stupid bit easily, it does not provide any value except protecting
> silly userspace from rightfully raised exceptions.
> 
> When I first saw that patch, I was worried about the security impact,
> but after staring long enough at the SDM and the code, the only way it
> can explode is when returning to user space. It cannot explode in the
> kernel.
> 
> So in IA-32e it creates a #GP otherwise it falls over the return to
> NULL (TSS.back_link). So what?
> 

How about "it's a bug, but it's not (necessarily) a security issue?"

I think we should mask the bit anyway.

	-hpa



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 19:41       ` Sebastian Lackner
@ 2014-09-29 19:51         ` Thomas Gleixner
  0 siblings, 0 replies; 24+ messages in thread
From: Thomas Gleixner @ 2014-09-29 19:51 UTC (permalink / raw)
  To: Sebastian Lackner
  Cc: Andy Lutomirski, Anish Bhatt, linux-kernel, x86, mingo, hpa

On Mon, 29 Sep 2014, Sebastian Lackner wrote:
> I see several issues here:
> 
> * At first the behaviour is not consistent between several system
> call and return instructions. For example calling syscalls by using
> 'int' doesn't have this issue, as it clears the NT flag before
> entering kernel code. Return instructions also don't show this issue
> all the time, just when it hits one of the problematic pieces of
> code.

And why should we care? The use of NT is not supported. Period. So it
does not matter whether A explodes and B does not.

> * The kernel might execute all kind of other code (for example
> inside of drivers) and start additional threads. I didn't find any
> good example yet, but its not that unlikely, that the exception can
> also happen in a completely unrelated thread, where the kernel can
> not just kill the corresponding usermode app...

The thread can execute whatever it wants in kernel context. The
exception is going to hit on return to user space and not in some
random kernel context. The threads it creates are going to die as
well.

> I'm fine with all kind of solutions, either it should be allowed to
> set NT, or the kernel should at least throw a proper exception, so
> that usermode has a chance to catch and handle it. At the moment the
> segfault is deadly, as the segfault handler immediately segfaults
> again - no chance to recover from such an error.

# gdb ./crap core

(gdb) info registers

eflags         0x4296	[ PF AF SF IF NT ]

Tells you very much that there is state which is not supported.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 19:43         ` H. Peter Anvin
@ 2014-09-29 19:57           ` Thomas Gleixner
  2014-09-29 20:01             ` H. Peter Anvin
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2014-09-29 19:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar,
	Sebastian Lackner

On Mon, 29 Sep 2014, H. Peter Anvin wrote:
> On 09/29/2014 12:41 PM, Thomas Gleixner wrote:
> >>
> >> If it weren't the case, then we'd be totally screwed.  Fortunately, it
> >> is.  I found it: SDM Volume 3 6.12.1.2 says:
> >>
> >> (On calls to exception and interrupt
> >> handlers, the processor also clears the VM, RF, and NT flags in the
> >> EFLAGS register,
> >> after they are saved on the stack.)
> > 
> > Sorry, I misunderstood your question.
> > 
> > And yes on exception and interrupt entry it is cleared. Otherwise the
> > whole feature would not work at all ...
> > 
> > But that's why I'm really not worried about it. While we can mask out
> > the stupid bit easily, it does not provide any value except protecting
> > silly userspace from rightfully raised exceptions.
> > 
> > When I first saw that patch, I was worried about the security impact,
> > but after staring long enough at the SDM and the code, the only way it
> > can explode is when returning to user space. It cannot explode in the
> > kernel.
> > 
> > So in IA-32e it creates a #GP otherwise it falls over the return to
> > NULL (TSS.back_link). So what?
> > 
> 
> How about "it's a bug, but it's not (necessarily) a security issue?"

It's a bug in user space, but as I explained it is NOT a security
issue because the only place it can explode is the return to the buggy
user space code via IRET.

If we start to use IRET in the kernel for random crap where we should
not, then its better we trap over the stupid NT bit as well.
 
> I think we should mask the bit anyway.

I tend to disagree. If we clear it there we need to consequentely
audit ALL other possibilites and if there are any we need to clear the
bit there as well. Just to make buggy user space happy?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 19:57           ` Thomas Gleixner
@ 2014-09-29 20:01             ` H. Peter Anvin
  2014-09-29 20:10               ` Thomas Gleixner
  2014-09-29 20:29               ` Andy Lutomirski
  0 siblings, 2 replies; 24+ messages in thread
From: H. Peter Anvin @ 2014-09-29 20:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar,
	Sebastian Lackner

On 09/29/2014 12:57 PM, Thomas Gleixner wrote:
>  
>> I think we should mask the bit anyway.
> 
> I tend to disagree. If we clear it there we need to consequentely
> audit ALL other possibilites and if there are any we need to clear the
> bit there as well. Just to make buggy user space happy?
> 

The entry options into the kernel are: interrupt/exception (already
known to be OK), SYSENTER32, SYSCALL32, and SYSCALL64.  It is not too
much to work through those issues, I don't think.

	-hpa


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 20:01             ` H. Peter Anvin
@ 2014-09-29 20:10               ` Thomas Gleixner
  2014-09-29 20:29               ` Andy Lutomirski
  1 sibling, 0 replies; 24+ messages in thread
From: Thomas Gleixner @ 2014-09-29 20:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andy Lutomirski, Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar,
	Sebastian Lackner

On Mon, 29 Sep 2014, H. Peter Anvin wrote:
> On 09/29/2014 12:57 PM, Thomas Gleixner wrote:
> >  
> >> I think we should mask the bit anyway.
> > 
> > I tend to disagree. If we clear it there we need to consequentely
> > audit ALL other possibilites and if there are any we need to clear the
> > bit there as well. Just to make buggy user space happy?
> > 
> 
> The entry options into the kernel are: interrupt/exception (already
> known to be OK), SYSENTER32, SYSCALL32, and SYSCALL64.  It is not too
> much to work through those issues, I don't think.

Fair enough.

I still don't see why we should care. Supporting buggy user space was
never high on my list.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 19:41       ` Thomas Gleixner
  2014-09-29 19:43         ` H. Peter Anvin
@ 2014-09-29 20:16         ` Andy Lutomirski
  2014-09-29 21:37           ` Thomas Gleixner
  1 sibling, 1 reply; 24+ messages in thread
From: Andy Lutomirski @ 2014-09-29 20:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner

On Mon, Sep 29, 2014 at 12:41 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 29 Sep 2014, Andy Lutomirski wrote:
>> On Mon, Sep 29, 2014 at 11:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Mon, 29 Sep 2014, Andy Lutomirski wrote:
>> >> Presumably interrupt delivery clears NT.  I haven't spotted where that's
>> >> documented yet.
>> >
>> > Nope, that's unrelated.
>>
>> If it weren't the case, then we'd be totally screwed.  Fortunately, it
>> is.  I found it: SDM Volume 3 6.12.1.2 says:
>>
>> (On calls to exception and interrupt
>> handlers, the processor also clears the VM, RF, and NT flags in the
>> EFLAGS register,
>> after they are saved on the stack.)
>
> Sorry, I misunderstood your question.
>
> And yes on exception and interrupt entry it is cleared. Otherwise the
> whole feature would not work at all ...
>
> But that's why I'm really not worried about it. While we can mask out
> the stupid bit easily, it does not provide any value except protecting
> silly userspace from rightfully raised exceptions.
>
> When I first saw that patch, I was worried about the security impact,
> but after staring long enough at the SDM and the code, the only way it
> can explode is when returning to user space. It cannot explode in the
> kernel.

This is only true as long as the only use of lret from a system call
(or kernel thread started from a system call) is to return to
userspace.

For example, __efi64_thunk uses lretq, so mixed-mode EFI doesn't
violate this assumption, but __efi64_thunk could just as easily have
used iret.

IOW, I don't think there's any vulnerability here, but this makes me nervous.

--Andy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 20:01             ` H. Peter Anvin
  2014-09-29 20:10               ` Thomas Gleixner
@ 2014-09-29 20:29               ` Andy Lutomirski
  1 sibling, 0 replies; 24+ messages in thread
From: Andy Lutomirski @ 2014-09-29 20:29 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner, Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar,
	Sebastian Lackner

On Mon, Sep 29, 2014 at 1:01 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/29/2014 12:57 PM, Thomas Gleixner wrote:
>>
>>> I think we should mask the bit anyway.
>>
>> I tend to disagree. If we clear it there we need to consequentely
>> audit ALL other possibilites and if there are any we need to clear the
>> bit there as well. Just to make buggy user space happy?
>>
>
> The entry options into the kernel are: interrupt/exception (already
> known to be OK),

> SYSENTER32,

I don't immediately see how to fix that without adding overhead.
Maybe do it in CLEAR_RREGS?  This won't do any good if EFI ever starts
using IRET, though.  We could suck it up and fix it on entry, adding
maybe ten cycles (wild guess).

> SYSCALL32

This should happen for free if we fix SYSCALL64, I think.

Also, I thought entry_64.S was a mess.  Eww, ia32entry.S.

I can try to write the patch later today.  But I don't want to touch
the actual 32-bit kernel code -- I'll stick to the 64-bit native and
compat stuff, thanks.

--Andy

 and SYSCALL64.  It is not too
> much to work through those issues, I don't think.
>
>         -hpa
>

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 20:16         ` Andy Lutomirski
@ 2014-09-29 21:37           ` Thomas Gleixner
  2014-09-30  0:11             ` Andy Lutomirski
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2014-09-29 21:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner

On Mon, 29 Sep 2014, Andy Lutomirski wrote:
> On Mon, Sep 29, 2014 at 12:41 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, 29 Sep 2014, Andy Lutomirski wrote:
> >> On Mon, Sep 29, 2014 at 11:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> > On Mon, 29 Sep 2014, Andy Lutomirski wrote:
> >> >> Presumably interrupt delivery clears NT.  I haven't spotted where that's
> >> >> documented yet.
> >> >
> >> > Nope, that's unrelated.
> >>
> >> If it weren't the case, then we'd be totally screwed.  Fortunately, it
> >> is.  I found it: SDM Volume 3 6.12.1.2 says:
> >>
> >> (On calls to exception and interrupt
> >> handlers, the processor also clears the VM, RF, and NT flags in the
> >> EFLAGS register,
> >> after they are saved on the stack.)
> >
> > Sorry, I misunderstood your question.
> >
> > And yes on exception and interrupt entry it is cleared. Otherwise the
> > whole feature would not work at all ...
> >
> > But that's why I'm really not worried about it. While we can mask out
> > the stupid bit easily, it does not provide any value except protecting
> > silly userspace from rightfully raised exceptions.
> >
> > When I first saw that patch, I was worried about the security impact,
> > but after staring long enough at the SDM and the code, the only way it
> > can explode is when returning to user space. It cannot explode in the
> > kernel.
> 
> This is only true as long as the only use of lret from a system call
> (or kernel thread started from a system call) is to return to
> userspace.
> 
> For example, __efi64_thunk uses lretq, so mixed-mode EFI doesn't
> violate this assumption, but __efi64_thunk could just as easily have
> used iret.

And if __efi64_thunk would use iret, it would be wrong to begin with,
really. I'd rather see it die right there.
 
> IOW, I don't think there's any vulnerability here, but this makes me
> nervous.

I was pretty relaxed until you mentioned EFI ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry
  2014-09-29 21:37           ` Thomas Gleixner
@ 2014-09-30  0:11             ` Andy Lutomirski
  0 siblings, 0 replies; 24+ messages in thread
From: Andy Lutomirski @ 2014-09-30  0:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anish Bhatt, linux-kernel, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner

On Mon, Sep 29, 2014 at 2:37 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 29 Sep 2014, Andy Lutomirski wrote:
>> On Mon, Sep 29, 2014 at 12:41 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Mon, 29 Sep 2014, Andy Lutomirski wrote:
>> >> On Mon, Sep 29, 2014 at 11:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> >> > On Mon, 29 Sep 2014, Andy Lutomirski wrote:
>> >> >> Presumably interrupt delivery clears NT.  I haven't spotted where that's
>> >> >> documented yet.
>> >> >
>> >> > Nope, that's unrelated.
>> >>
>> >> If it weren't the case, then we'd be totally screwed.  Fortunately, it
>> >> is.  I found it: SDM Volume 3 6.12.1.2 says:
>> >>
>> >> (On calls to exception and interrupt
>> >> handlers, the processor also clears the VM, RF, and NT flags in the
>> >> EFLAGS register,
>> >> after they are saved on the stack.)
>> >
>> > Sorry, I misunderstood your question.
>> >
>> > And yes on exception and interrupt entry it is cleared. Otherwise the
>> > whole feature would not work at all ...
>> >
>> > But that's why I'm really not worried about it. While we can mask out
>> > the stupid bit easily, it does not provide any value except protecting
>> > silly userspace from rightfully raised exceptions.
>> >
>> > When I first saw that patch, I was worried about the security impact,
>> > but after staring long enough at the SDM and the code, the only way it
>> > can explode is when returning to user space. It cannot explode in the
>> > kernel.
>>
>> This is only true as long as the only use of lret from a system call
>> (or kernel thread started from a system call) is to return to
>> userspace.
>>
>> For example, __efi64_thunk uses lretq, so mixed-mode EFI doesn't
>> violate this assumption, but __efi64_thunk could just as easily have
>> used iret.
>
> And if __efi64_thunk would use iret, it would be wrong to begin with,
> really. I'd rather see it die right there.
>
>> IOW, I don't think there's any vulnerability here, but this makes me
>> nervous.
>
> I was pretty relaxed until you mentioned EFI ....

:)

I have a patch that seems to work.  It won't have any effect at all on
syscall performance (32- or 64-bit), but it slows down sysenter by 15
cycles or so.  On the other hand, if we did this, then we could
possible stop saving and restoring RFLAGS in switch_to, which might be
worthwhile, since 64-bit code ought to be more common than 32-bit
these days.

See also:
https://lkml.org/lkml/2006/9/18/161

--Andy

>
> Thanks,
>
>         tglx



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2014-09-30  0:12 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-25 19:42 [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry Anish Bhatt
2014-09-25 23:00 ` Chuck Ebbert
2014-09-29 17:30   ` Andy Lutomirski
2014-09-26 22:00 ` Chuck Ebbert
2014-09-26 22:10   ` Anish Bhatt
2014-09-26 23:32   ` Linus Torvalds
2014-09-29 17:40 ` Andy Lutomirski
2014-09-29 18:30   ` Sebastian Lackner
2014-09-29 18:43     ` Andy Lutomirski
2014-09-29 19:33     ` Thomas Gleixner
2014-09-29 19:41       ` Sebastian Lackner
2014-09-29 19:51         ` Thomas Gleixner
2014-09-29 18:59   ` Thomas Gleixner
2014-09-29 19:08     ` Andy Lutomirski
2014-09-29 19:17     ` Andy Lutomirski
2014-09-29 19:41       ` Thomas Gleixner
2014-09-29 19:43         ` H. Peter Anvin
2014-09-29 19:57           ` Thomas Gleixner
2014-09-29 20:01             ` H. Peter Anvin
2014-09-29 20:10               ` Thomas Gleixner
2014-09-29 20:29               ` Andy Lutomirski
2014-09-29 20:16         ` Andy Lutomirski
2014-09-29 21:37           ` Thomas Gleixner
2014-09-30  0:11             ` Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.