linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
@ 2018-08-07 17:29 Sean Christopherson
  2018-08-07 18:28 ` Dave Hansen
  2018-09-04 19:50 ` Sean Christopherson
  0 siblings, 2 replies; 12+ messages in thread
From: Sean Christopherson @ 2018-08-07 17:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: sean.j.christopherson, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

Kernel addresses are always mapped with _PAGE_USER=0, i.e. PKRU
isn't enforced, and so we should never see X86_PF_PK set on a
kernel address fault.  WARN once to capture the issue in case we
somehow don't die, e.g. the access originated in userspace.

Remove a similar check and its comment from spurious_fault_check().
The intent of the comment (and later code[1]) was simply to document
that spurious faults due to protection keys should be impossible, but
that's irrelevant and a bit of a red herring since we should never
get a protection keys fault on a kernel address regardless of the
kernel's TLB flushing behavior.

[1] http://lists-archives.com/linux-kernel/28407455-x86-pkeys-new-page-fault-error-code-bit-pf_pk.html

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
---
There's no indication that this condition has ever been encountered.
I came across the code in spurious_fault_check() and was confused as
to why we would unconditionally treat a protection keys fault as
spurious when the comment explicitly stated that such a case should
be impossible.

Dave Hansen suggested adding a WARN_ON_ONCE in spurious_fault_check(),
but it seemed more appropriate to freak out on any protection keys
fault on a kernel address since that would imply a hardware issue or
kernel bug.  I omitted a Suggested-by since this isn't necessarily
what Dave had in mind.

 arch/x86/mm/fault.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..f19a55972136 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1040,12 +1040,6 @@ static int spurious_fault_check(unsigned long error_code, pte_t *pte)
 
 	if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
 		return 0;
-	/*
-	 * Note: We do not do lazy flushing on protection key
-	 * changes, so no spurious fault will ever set X86_PF_PK.
-	 */
-	if ((error_code & X86_PF_PK))
-		return 1;
 
 	return 1;
 }
@@ -1241,6 +1235,14 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * protection error (error_code & 9) == 0.
 	 */
 	if (unlikely(fault_in_kernel_space(address))) {
+		/*
+		 * We should never encounter a protection keys fault on a
+		 * kernel address as kernel addresses are always mapped with
+		 * _PAGE_USER=0, i.e. PKRU isn't enforced.
+		 */
+		if (WARN_ON_ONCE(error_code & X86_PF_PK))
+			goto bad_kernel_address;
+
 		if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
 			if (vmalloc_fault(address) >= 0)
 				return;
@@ -1253,6 +1255,8 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 		/* kprobes don't want to hook the spurious faults: */
 		if (kprobes_fault(regs))
 			return;
+
+bad_kernel_address:
 		/*
 		 * Don't take the mm semaphore here. If we fixup a prefetch
 		 * fault we could otherwise deadlock:
-- 
2.18.0



* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-08-07 17:29 [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area Sean Christopherson
@ 2018-08-07 18:28 ` Dave Hansen
  2018-08-30 10:40   ` Thomas Gleixner
  2018-08-31  2:38   ` Jann Horn
  2018-09-04 19:50 ` Sean Christopherson
  1 sibling, 2 replies; 12+ messages in thread
From: Dave Hansen @ 2018-08-07 18:28 UTC (permalink / raw)
  To: Sean Christopherson, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86

On 08/07/2018 10:29 AM, Sean Christopherson wrote:
>  	if (unlikely(fault_in_kernel_space(address))) {
> +		/*
> +		 * We should never encounter a protection keys fault on a
> +		 * kernel address as kernel addresses are always mapped with
> +		 * _PAGE_USER=0, i.e. PKRU isn't enforced.
> +		 */
> +		if (WARN_ON_ONCE(error_code & X86_PF_PK))
> +			goto bad_kernel_address;

I just realized one more thing: the vsyscall page can bite us here.
It's at a fault_in_kernel_space() address and we *can* trigger a pkey
fault on it if we jump to an instruction that reads from a
pkey-protected area.

We can make a gadget out of unaligned vsyscall instructions that does
that.  See:

0xffffffffff600002:  shlb   $0x0,0x0(%rax)

Then, we turn off access to all pkeys, including pkey-0, then jump to
the unaligned vsyscall instruction, which reads %rax, which is a kernel
address:

        asm("movl $0xffffffff, %eax;\
             movl $0x00000000, %ecx;\
             movl $0x00000000, %edx;\
             wrpkru;\
             movq $0xffffffffff600000, %rax;\
             movq $0xffffffffff600002, %rbx;\
             jmpq *%rbx;");

So, my bad.  It was not a good suggestion to do a WARN_ON().  But, the
other funny thing is I would have expected spurious_fault() to get us
into a fault loop, which it doesn't.  It's definitely getting *called*
with my little test program (I see it in ftrace) but it's not quite
doing what I expect.

I need to dig a bit more.


* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-08-07 18:28 ` Dave Hansen
@ 2018-08-30 10:40   ` Thomas Gleixner
  2018-08-30 23:33     ` Dave Hansen
  2018-08-31  2:38   ` Jann Horn
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2018-08-30 10:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sean Christopherson, linux-kernel, Ingo Molnar, H. Peter Anvin, x86

On Tue, 7 Aug 2018, Dave Hansen wrote:
> On 08/07/2018 10:29 AM, Sean Christopherson wrote:
> >  	if (unlikely(fault_in_kernel_space(address))) {
> > +		/*
> > +		 * We should never encounter a protection keys fault on a
> > +		 * kernel address as kernel addresses are always mapped with
> > +		 * _PAGE_USER=0, i.e. PKRU isn't enforced.
> > +		 */
> > +		if (WARN_ON_ONCE(error_code & X86_PF_PK))
> > +			goto bad_kernel_address;
> 
> I just realized one more thing: the vsyscall page can bite us here.
> It's at a fault_in_kernel_space() address and we *can* trigger a pkey
> fault on it if we jump to an instruction that reads from a
> pkey-protected area.
> 
> We can make a gadget out of unaligned vsyscall instructions that does
> that.  See:
> 
> 0xffffffffff600002:  shlb   $0x0,0x0(%rax)
> 
> Then, we turn off access to all pkeys, including pkey-0, then jump to
> the unaligned vsyscall instruction, which reads %rax, which is a kernel
> address:
> 
>         asm("movl $0xffffffff, %eax;\
>              movl $0x00000000, %ecx;\
>              movl $0x00000000, %edx;\
>              wrpkru;\
>              movq $0xffffffffff600000, %rax;\
>              movq $0xffffffffff600002, %rbx;\
>              jmpq *%rbx;");
> 
> So, my bad.  It was not a good suggestion to do a WARN_ON().  But, the
> other funny thing is I would have expected spurious_fault() to get us
> into a fault loop, which it doesn't.  It's definitely getting *called*
> with my little test program (I see it in ftrace) but it's not quite
> doing what I expect.
> 
> I need to dig a bit more.

Given the time span you should be close to ground water with your digging
by now.

Thanks,

	tglx



* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-08-30 10:40   ` Thomas Gleixner
@ 2018-08-30 23:33     ` Dave Hansen
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2018-08-30 23:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sean Christopherson, linux-kernel, Ingo Molnar, H. Peter Anvin, x86

On 08/30/2018 03:40 AM, Thomas Gleixner wrote:
> Given the time span you should be close to ground water with your digging
> by now.

So, turns out that we start our spurious_fault() code with this check:

>         if (error_code != (X86_PF_WRITE | X86_PF_PROT) &&
>             error_code != (X86_PF_INSTR | X86_PF_PROT))
>                 return 0;

Which ensures that we only do spurious checking for *very* specific
error_code's.  That ends up making the X86_PF_PK check inside of
spurious_fault_check() dead code _anyway_.  It's totally unreachable as
far as I can tell.

We could add a comment above the error_code check to make it explicit
that it excludes pkeys.

But, otherwise, I think we can just axe the X86_PF_PK
spurious_fault_check().



* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-08-07 18:28 ` Dave Hansen
  2018-08-30 10:40   ` Thomas Gleixner
@ 2018-08-31  2:38   ` Jann Horn
  2018-08-31  3:08     ` Andy Lutomirski
  1 sibling, 1 reply; 12+ messages in thread
From: Jann Horn @ 2018-08-31  2:38 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski
  Cc: sean.j.christopherson, kernel list, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers

On Tue, 7 Aug 2018 Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 08/07/2018 10:29 AM, Sean Christopherson wrote:
> >       if (unlikely(fault_in_kernel_space(address))) {
> > +             /*
> > +              * We should never encounter a protection keys fault on a
> > +              * kernel address as kernel addresses are always mapped with
> > +              * _PAGE_USER=0, i.e. PKRU isn't enforced.
> > +              */
> > +             if (WARN_ON_ONCE(error_code & X86_PF_PK))
> > +                     goto bad_kernel_address;
>
> I just realized one more thing: the vsyscall page can bite us here.
> It's at a fault_in_kernel_space() address and we *can* trigger a pkey
> fault on it if we jump to an instruction that reads from a
> pkey-protected area.
>
> We can make a gadget out of unaligned vsyscall instructions that does
> that.  See:
>
> 0xffffffffff600002:  shlb   $0x0,0x0(%rax)
>
> Then, we turn off access to all pkeys, including pkey-0, then jump to
> the unaligned vsyscall instruction, which reads %rax, which is a kernel
> address:

Andy got rid of the (native) vsyscall page in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=076ca272a14cea558b1092ec85cea08510283f2a
('x86/vsyscall/64: Drop "native" vsyscalls') a few months ago, right?
At this point, the vsyscall page should never be executable.


* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-08-31  2:38   ` Jann Horn
@ 2018-08-31  3:08     ` Andy Lutomirski
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2018-08-31  3:08 UTC (permalink / raw)
  To: Jann Horn
  Cc: Dave Hansen, Andy Lutomirski, sean.j.christopherson, kernel list,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers



> On Aug 30, 2018, at 7:38 PM, Jann Horn <jannh@google.com> wrote:
> 
>> On Tue, 7 Aug 2018 Dave Hansen <dave.hansen@intel.com> wrote:
>> 
>>> On 08/07/2018 10:29 AM, Sean Christopherson wrote:
>>>      if (unlikely(fault_in_kernel_space(address))) {
>>> +             /*
>>> +              * We should never encounter a protection keys fault on a
>>> +              * kernel address as kernel addresses are always mapped with
>>> +              * _PAGE_USER=0, i.e. PKRU isn't enforced.
>>> +              */
>>> +             if (WARN_ON_ONCE(error_code & X86_PF_PK))
>>> +                     goto bad_kernel_address;
>> 
>> I just realized one more thing: the vsyscall page can bite us here.
>> It's at a fault_in_kernel_space() address and we *can* trigger a pkey
>> fault on it if we jump to an instruction that reads from a
>> pkey-protected area.
>> 
>> We can make a gadget out of unaligned vsyscall instructions that does
>> that.  See:
>> 
>> 0xffffffffff600002:  shlb   $0x0,0x0(%rax)
>> 
>> Then, we turn off access to all pkeys, including pkey-0, then jump to
>> the unaligned vsyscall instruction, which reads %rax, which is a kernel
>> address:
> 
> Andy got rid of the (native) vsyscall page in
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=076ca272a14cea558b1092ec85cea08510283f2a
> ('x86/vsyscall/64: Drop "native" vsyscalls') a few months ago, right?
> At this point, the vsyscall page should never be executable.

Indeed.

Can one of you cc me on the original patch?


* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-08-07 17:29 [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area Sean Christopherson
  2018-08-07 18:28 ` Dave Hansen
@ 2018-09-04 19:50 ` Sean Christopherson
  2018-09-04 19:56   ` Andy Lutomirski
  1 sibling, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2018-09-04 19:50 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Jann Horn

Cc-ing Jann and Andy.

On Tue, Aug 07, 2018 at 10:29:20AM -0700, Sean Christopherson wrote:
> Kernel addresses are always mapped with _PAGE_USER=0, i.e. PKRU
> isn't enforced, and so we should never see X86_PF_PK set on a
> kernel address fault.  WARN once to capture the issue in case we
> somehow don't die, e.g. the access originated in userspace.
> 
> Remove a similar check and its comment from spurious_fault_check().
> The intent of the comment (and later code[1]) was simply to document
> that spurious faults due to protection keys should be impossible, but
> that's irrelevant and a bit of a red herring since we should never
> get a protection keys fault on a kernel address regardless of the
> kernel's TLB flushing behavior.
> 
> [1] http://lists-archives.com/linux-kernel/28407455-x86-pkeys-new-page-fault-error-code-bit-pf_pk.html
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> ---
> There's no indication that this condition has ever been encountered.
> I came across the code in spurious_fault_check() and was confused as
> to why we would unconditionally treat a protection keys fault as
> spurious when the comment explicitly stated that such a case should
> be impossible.
> 
> Dave Hansen suggested adding a WARN_ON_ONCE in spurious_fault_check(),
> but it seemed more appropriate to freak out on any protection keys
> fault on a kernel address since that would imply a hardware issue or
> kernel bug.  I omitted a Suggested-by since this isn't necessarily
> what Dave had in mind.
> 
>  arch/x86/mm/fault.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 2aafa6ab6103..f19a55972136 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1040,12 +1040,6 @@ static int spurious_fault_check(unsigned long error_code, pte_t *pte)
>  
>  	if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
>  		return 0;
> -	/*
> -	 * Note: We do not do lazy flushing on protection key
> -	 * changes, so no spurious fault will ever set X86_PF_PK.
> -	 */
> -	if ((error_code & X86_PF_PK))
> -		return 1;
>  
>  	return 1;
>  }
> @@ -1241,6 +1235,14 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>  	 * protection error (error_code & 9) == 0.
>  	 */
>  	if (unlikely(fault_in_kernel_space(address))) {
> +		/*
> +		 * We should never encounter a protection keys fault on a
> +		 * kernel address as kernel addresses are always mapped with
> +		 * _PAGE_USER=0, i.e. PKRU isn't enforced.
> +		 */
> +		if (WARN_ON_ONCE(error_code & X86_PF_PK))
> +			goto bad_kernel_address;
> +
>  		if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
>  			if (vmalloc_fault(address) >= 0)
>  				return;
> @@ -1253,6 +1255,8 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>  		/* kprobes don't want to hook the spurious faults: */
>  		if (kprobes_fault(regs))
>  			return;
> +
> +bad_kernel_address:
>  		/*
>  		 * Don't take the mm semaphore here. If we fixup a prefetch
>  		 * fault we could otherwise deadlock:
> -- 
> 2.18.0
> 


* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-09-04 19:50 ` Sean Christopherson
@ 2018-09-04 19:56   ` Andy Lutomirski
  2018-09-04 21:21     ` Dave Hansen
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2018-09-04 19:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: LKML, Dave Hansen, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, Andy Lutomirski, Jann Horn

On Tue, Sep 4, 2018 at 12:50 PM, Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
> Cc-ing Jann and Andy.
>
> On Tue, Aug 07, 2018 at 10:29:20AM -0700, Sean Christopherson wrote:
>> Kernel addresses are always mapped with _PAGE_USER=0, i.e. PKRU
>> isn't enforced, and so we should never see X86_PF_PK set on a
>> kernel address fault.  WARN once to capture the issue in case we
>> somehow don't die, e.g. the access originated in userspace.
>>
>> Remove a similar check and its comment from spurious_fault_check().
>> The intent of the comment (and later code[1]) was simply to document
>> that spurious faults due to protection keys should be impossible, but
>> that's irrelevant and a bit of a red herring since we should never
>> get a protection keys fault on a kernel address regardless of the
>> kernel's TLB flushing behavior.
>>
>> [1] http://lists-archives.com/linux-kernel/28407455-x86-pkeys-new-page-fault-error-code-bit-pf_pk.html
>>
>> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> ---
>> There's no indication that this condition has ever been encountered.
>> I came across the code in spurious_fault_check() and was confused as
>> to why we would unconditionally treat a protection keys fault as
>> spurious when the comment explicitly stated that such a case should
>> be impossible.
>>
>> Dave Hansen suggested adding a WARN_ON_ONCE in spurious_fault_check(),
>> but it seemed more appropriate to freak out on any protection keys
>> fault on a kernel address since that would imply a hardware issue or
>> kernel bug.  I omitted a Suggested-by since this isn't necessarily
>> what Dave had in mind.
>>
>>  arch/x86/mm/fault.c | 16 ++++++++++------
>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index 2aafa6ab6103..f19a55972136 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -1040,12 +1040,6 @@ static int spurious_fault_check(unsigned long error_code, pte_t *pte)
>>
>>       if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
>>               return 0;
>> -     /*
>> -      * Note: We do not do lazy flushing on protection key
>> -      * changes, so no spurious fault will ever set X86_PF_PK.
>> -      */
>> -     if ((error_code & X86_PF_PK))
>> -             return 1;
>>
>>       return 1;
>>  }
>> @@ -1241,6 +1235,14 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>>        * protection error (error_code & 9) == 0.
>>        */
>>       if (unlikely(fault_in_kernel_space(address))) {
>> +             /*
>> +              * We should never encounter a protection keys fault on a
>> +              * kernel address as kernel addresses are always mapped with
>> +              * _PAGE_USER=0, i.e. PKRU isn't enforced.
>> +              */
>> +             if (WARN_ON_ONCE(error_code & X86_PF_PK))
>> +                     goto bad_kernel_address;
>> +
>>               if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
>>                       if (vmalloc_fault(address) >= 0)
>>                               return;
>> @@ -1253,6 +1255,8 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>>               /* kprobes don't want to hook the spurious faults: */
>>               if (kprobes_fault(regs))
>>                       return;
>> +
>> +bad_kernel_address:
>>               /*
>>                * Don't take the mm semaphore here. If we fixup a prefetch
>>                * fault we could otherwise deadlock:

I have no objection to this patch.

Dave, why did you think that we could get a PK fault on the vsyscall
page, even on kernels that still marked it executable?  Sure, you
could get an instruction in the vsyscall page to get a PK fault, but
CR2 wouldn't point to the vsyscall page, right?

>> --
>> 2.18.0
>>


* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-09-04 19:56   ` Andy Lutomirski
@ 2018-09-04 21:21     ` Dave Hansen
  2018-09-04 21:27       ` Andy Lutomirski
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2018-09-04 21:21 UTC (permalink / raw)
  To: Andy Lutomirski, Sean Christopherson
  Cc: LKML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML, Jann Horn

On 09/04/2018 12:56 PM, Andy Lutomirski wrote:
> I have no objection to this patch.
> 
> Dave, why did you think that we could get a PK fault on the vsyscall
> page, even on kernels that still marked it executable?  Sure, you
> could get an instruction in the vsyscall page to get a PK fault, but
> CR2 wouldn't point to the vsyscall page, right?

I'm inferring the CR2 value from the page fault trace point.  I see
entries like this:

 protection_keys-4313  [002] d... 420257.094541: page_fault_user:
address=_end ip=_end error_code=0x15

But, that's not a PK fault, and it triggers the "misaligned vsyscall
(exploit attempt or buggy program)" stuff in dmesg.  It's just the
symptom of trying to execute the non-executable vsyscall page.

I'm not a super big fan of this particular patch, though.  The
fault_in_kernel_space() check is really presuming two things:
1. pkey faults (PF_PK=1) only occur on user pages (_PAGE_USER=1)
2. fault_in_kernel_space()==1 addresses are never user pages

#1 is a hardware expectation.  We *can* look for that directly by just
making sure that X86_PF_PK is only set when it also comes with
X86_PF_USER in the hardware page fault error code.

(...
	Aside: We should probably explicitly separate out the hardware
	error code from the software-munged version, like we do here:
	>         if (user_mode(regs)) {
	>                 local_irq_enable();
	>                 error_code |= X86_PF_USER;
...)

But, #2 is a bit of a looser check.  It wasn't true for the recent
vsyscall, and I've also seen goofy drivers map memory out to userspace
quite a few times in the kernel address space.

So, I'd much rather see an X86_PF_USER check than a
fault_in_kernel_space() check.

But, as for pkeys...

The original intent here was to relay: "protection key faults can never
be spurious".  The reason in my silly comment was that we don't do lazy
flushing, but that's imprecise: the real reasoning is that we don't ever
have kernel pages on which we can take protection key faults.

IOW, I think the check here should be for "protection key faults only
occur on user pages", and all the *spurious* checking should be looking
at *just* user vs. kernel pages, like:

static int spurious_fault_check(unsigned long error_code, pte_t *pte)
{
	/* Only expect spurious faults on kernel pages: */
	WARN_ON_ONCE(pte_flags(*pte) & _PAGE_USER);
	/* Only expect spurious faults originating from kernel code: */
	WARN_ON_ONCE(error_code & X86_PF_USER);
	...



* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-09-04 21:21     ` Dave Hansen
@ 2018-09-04 21:27       ` Andy Lutomirski
  2018-09-05 21:35         ` Dave Hansen
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2018-09-04 21:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Sean Christopherson, LKML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Jann Horn

On Tue, Sep 4, 2018 at 2:21 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 09/04/2018 12:56 PM, Andy Lutomirski wrote:
>> I have no objection to this patch.
>>
>> Dave, why did you think that we could get a PK fault on the vsyscall
>> page, even on kernels that still marked it executable?  Sure, you
>> could get an instruction in the vsyscall page to get a PK fault, but
>> CR2 wouldn't point to the vsyscall page, right?
>
> I'm inferring the CR2 value from the page fault trace point.  I see
> entries like this:
>
>  protection_keys-4313  [002] d... 420257.094541: page_fault_user:
> address=_end ip=_end error_code=0x15
>
> But, that's not a PK fault, and it triggers the "misaligned vsyscall
> (exploit attempt or buggy program)" stuff in dmesg.  It's just the
> symptom of trying to execute the non-executable vsyscall page.
>
> I'm not a super big fan of this particular patch, though.  The
> fault_in_kernel_space() check is really presuming two things:
> 1. pkey faults (PF_PK=1) only occur on user pages (_PAGE_USER=1)
> 2. fault_in_kernel_space()==1 addresses are never user pages
>
> #1 is a hardware expectation.  We *can* look for that directly by just
> making sure that X86_PF_PK is only set when it also comes with
> X86_PF_USER in the hardware page fault error code.
>
> (...
>         Aside: We should probably explicitly separate out the hardware
>         error code from the software-munged version, like we do here:
>         >         if (user_mode(regs)) {
>         >                 local_irq_enable();
>         >                 error_code |= X86_PF_USER;
> ...)
>
> But, #2 is a bit of a looser check.  It wasn't true for the recent
> vsyscall, and I've also seen goofy drivers map memory out to userspace
> quite a few times in the kernel address space.
>
> So, I'd much rather see an X86_PF_USER check than a
> fault_in_kernel_space() check.
>
> But, as for pkeys...
>
> The original intent here was to relay: "protection key faults can never
> be spurious".  The reason in my silly comment was that we don't do lazy
> flushing, but that's imprecise: the real reasoning is that we don't ever
> have kernel pages on which we can take protection key faults.
>
> IOW, I think the check here should be for "protection key faults only
> occur on user pages", and all the *spurious* checking should be looking
> at *just* user vs. kernel pages, like:
>
> static int spurious_fault_check(unsigned long error_code, pte_t *pte)
> {
>         /* Only expect spurious faults on kernel pages: */
>         WARN_ON_ONCE(pte_flags(*pte) & _PAGE_USER);
>         /* Only expect spurious faults originating from kernel code: */
>         WARN_ON_ONCE(error_code & X86_PF_USER);
>         ...
>

Want to just send an alternative patch?

Also, I doubt it matters right here, but !X86_PF_USER isn't quite the
same thing as "originating from kernel code" -- it can also be user
code that does a CPL0 access due to exception delivery or access to a
descriptor table.  Which you saw plenty of times while debugging
PTI... :)  I doubt any of those should be spurious, though.

--Andy


* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-09-04 21:27       ` Andy Lutomirski
@ 2018-09-05 21:35         ` Dave Hansen
  2018-09-05 21:39           ` Andy Lutomirski
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2018-09-05 21:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sean Christopherson, LKML, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, X86 ML, Jann Horn

On 09/04/2018 02:27 PM, Andy Lutomirski wrote:
> Also, I doubt it matters right here, but !X86_PF_USER isn't quite the
> same thing as "originating from kernel code" -- it can also be user
> code that does a CPL0 access due to exception delivery or access to a
> descriptor table.  Which you saw plenty of times while debugging
> PTI... :)  I doubt any of those should be spurious, though.

Yeah, you're talking about "implicit supervisor access".  Right?

I definitely saw those during KAISER/KPTI development.  But, it's
probably worth noting that the code that we have that _looks_ for these:

>         /*
>          * It's safe to allow irq's after cr2 has been saved and the
>          * vmalloc fault has been handled.
>          *
>          * User-mode registers count as a user access even for any
>          * potential system fault or CPU buglet:
>          */
>         if (user_mode(regs)) {
>                 local_irq_enable();
>                 error_code |= X86_PF_USER;
>                 flags |= FAULT_FLAG_USER;

is in the user address space handling portion of the code.  So, it's
really not useful for any of the _known_ implicit supervisor-mode
accesses.  I think it can only get triggered in the case of hardware or
software bugs.


* Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a bad area
  2018-09-05 21:35         ` Dave Hansen
@ 2018-09-05 21:39           ` Andy Lutomirski
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2018-09-05 21:39 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Sean Christopherson, LKML, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, X86 ML, Jann Horn

On Wed, Sep 5, 2018 at 2:35 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> On 09/04/2018 02:27 PM, Andy Lutomirski wrote:
>> Also, I doubt it matters right here, but !X86_PF_USER isn't quite the
>> same thing as "originating from kernel code" -- it can also be user
>> code that does a CPL0 access due to exception delivery or access to a
>> descriptor table.  Which you saw plenty of times while debugging
>> PTI... :)  I doubt any of those should be spurious, though.
>
> Yeah, you're talking about "implicit supervisor access".  Right?

Yes.

