linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/fpu: Correct pkru/xstate inconsistency
@ 2022-02-15 15:36 Brian Geffon
  2022-02-15 15:57 ` Guenter Roeck
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Brian Geffon @ 2022-02-15 15:36 UTC (permalink / raw)
  To: Dave Hansen, Thomas Gleixner
  Cc: Willis Kung, Guenter Roeck, Borislav Petkov, Andy Lutomirski,
	stable, x86, linux-kernel, Brian Geffon

There are two issues with PKRU handling prior to 5.13. The first is that
when eagerly switching PKRU we check that current is not a kernel
thread as kernel threads will never use PKRU. It's possible that
this_cpu_read_stable() on current_task (ie. get_current()) is returning
an old cached value. By forcing the read with this_cpu_read() the
correct task is used. Without this it's possible when switching from
a kernel thread to a userspace thread that we'll still observe the
PF_KTHREAD flag and never restore the PKRU. And as a result this
issue only occurs when switching from a kernel thread to a userspace
thread, switching from a non kernel thread works perfectly fine because
all we consider in that situation is the flags from some other non
kernel task and the next fpu is passed in to switch_fpu_finish().

Without reloading the value finish_fpu_load() after being inlined into
__switch_to() uses a stale value of current:

  ba1:   8b 35 00 00 00 00       mov    0x0(%rip),%esi
  ba7:   f0 41 80 4d 01 40       lock orb $0x40,0x1(%r13)
  bad:   e9 00 00 00 00          jmp    bb2 <__switch_to+0x1eb>
  bb2:   41 f6 45 3e 20          testb  $0x20,0x3e(%r13)
  bb7:   75 1c                   jne    bd5 <__switch_to+0x20e>

By using this_cpu_read() and avoiding the cached value the compiler does
insert an additional load instruction and observes the correct value now:

  ba1:   8b 35 00 00 00 00       mov    0x0(%rip),%esi
  ba7:   f0 41 80 4d 01 40       lock orb $0x40,0x1(%r13)
  bad:   e9 00 00 00 00          jmp    bb2 <__switch_to+0x1eb>
  bb2:   65 48 8b 05 00 00 00    mov    %gs:0x0(%rip),%rax
  bb9:   00
  bba:   f6 40 3e 20             testb  $0x20,0x3e(%rax)
  bbe:   75 1c                   jne    bdc <__switch_to+0x215>

The second issue is when using write_pkru() we only write to the
xstate when the feature bit is set because get_xsave_addr() returns
NULL when the feature bit is not set. This is problematic as the CPU
is free to clear the feature bit when it observes the xstate in the
init state, this behavior seems to be documented a few places throughout
the kernel. If the bit was cleared then in write_pkru() we would happily
write to PKRU without ever updating the xstate, and the FPU restore on
return to userspace would load the old value agian.

Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
Signed-off-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Willis Kung <williskung@google.com>
Tested-by: Willis Kung <williskung@google.com>
---
 arch/x86/include/asm/fpu/internal.h |  2 +-
 arch/x86/include/asm/pgtable.h      | 14 ++++++++++----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 03b3de491b5e..540bda5bdd28 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 	 * PKRU state is switched eagerly because it needs to be valid before we
 	 * return to userland e.g. for a copy_to_user() operation.
 	 */
-	if (!(current->flags & PF_KTHREAD)) {
+	if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
 		/*
 		 * If the PKRU bit in xsave.header.xfeatures is not set,
 		 * then the PKRU component was in init state, which means
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9e71bf86d8d0..aa381b530de0 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return;
 
-	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
-
 	/*
 	 * The PKRU value in xstate needs to be in sync with the value that is
 	 * written to the CPU. The FPU restore on return to userland would
 	 * otherwise load the previous value again.
 	 */
 	fpregs_lock();
-	if (pk)
-		pk->pkru = pkru;
+	/*
+	 * The CPU is free to clear the feature bit when the xstate is in the
+	 * init state. For this reason, we need to make sure the feature bit is
+	 * reset when we're explicitly writing to pkru. If we did not then we
+	 * would write to pkru and it would not be saved on a context switch.
+	 */
+	current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
+	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
+	BUG_ON(!pk);
+	pk->pkru = pkru;
 	__write_pkru(pkru);
 	fpregs_unlock();
 }
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 15:36 [PATCH] x86/fpu: Correct pkru/xstate inconsistency Brian Geffon
@ 2022-02-15 15:57 ` Guenter Roeck
  2022-02-15 16:19   ` Brian Geffon
  2022-02-15 16:20 ` Greg KH
  2022-02-15 17:07 ` Dave Hansen
  2 siblings, 1 reply; 26+ messages in thread
From: Guenter Roeck @ 2022-02-15 15:57 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Dave Hansen, Thomas Gleixner, Willis Kung, Borislav Petkov,
	Andy Lutomirski, # v4 . 10+,
	x86, linux-kernel

Hi Brian,

On Tue, Feb 15, 2022 at 7:37 AM Brian Geffon <bgeffon@google.com> wrote:
>
> There are two issues with PKRU handling prior to 5.13. The first is that

Nice catch and work. One question, though: From the above, it seems
like this patch only applies to kernels earlier than v5.13 or, more
specifically, to v5.4.y and v5.10.y. Is this correct, or should it be
applied to the upstream kernel and to all applicable stable releases ?

Thanks,
Guenter

> when eagerly switching PKRU we check that current is not a kernel
> thread as kernel threads will never use PKRU. It's possible that
> this_cpu_read_stable() on current_task (ie. get_current()) is returning
> an old cached value. By forcing the read with this_cpu_read() the
> correct task is used. Without this it's possible when switching from
> a kernel thread to a userspace thread that we'll still observe the
> PF_KTHREAD flag and never restore the PKRU. And as a result this
> issue only occurs when switching from a kernel thread to a userspace
> thread, switching from a non kernel thread works perfectly fine because
> all we consider in that situation is the flags from some other non
> kernel task and the next fpu is passed in to switch_fpu_finish().
>
> Without reloading the value finish_fpu_load() after being inlined into
> __switch_to() uses a stale value of current:
>
>   ba1:   8b 35 00 00 00 00       mov    0x0(%rip),%esi
>   ba7:   f0 41 80 4d 01 40       lock orb $0x40,0x1(%r13)
>   bad:   e9 00 00 00 00          jmp    bb2 <__switch_to+0x1eb>
>   bb2:   41 f6 45 3e 20          testb  $0x20,0x3e(%r13)
>   bb7:   75 1c                   jne    bd5 <__switch_to+0x20e>
>
> By using this_cpu_read() and avoiding the cached value the compiler does
> insert an additional load instruction and observes the correct value now:
>
>   ba1:   8b 35 00 00 00 00       mov    0x0(%rip),%esi
>   ba7:   f0 41 80 4d 01 40       lock orb $0x40,0x1(%r13)
>   bad:   e9 00 00 00 00          jmp    bb2 <__switch_to+0x1eb>
>   bb2:   65 48 8b 05 00 00 00    mov    %gs:0x0(%rip),%rax
>   bb9:   00
>   bba:   f6 40 3e 20             testb  $0x20,0x3e(%rax)
>   bbe:   75 1c                   jne    bdc <__switch_to+0x215>
>
> The second issue is when using write_pkru() we only write to the
> xstate when the feature bit is set because get_xsave_addr() returns
> NULL when the feature bit is not set. This is problematic as the CPU
> is free to clear the feature bit when it observes the xstate in the
> init state, this behavior seems to be documented a few places throughout
> the kernel. If the bit was cleared then in write_pkru() we would happily
> write to PKRU without ever updating the xstate, and the FPU restore on
> return to userspace would load the old value agian.
>
> Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> Signed-off-by: Brian Geffon <bgeffon@google.com>
> Signed-off-by: Willis Kung <williskung@google.com>
> Tested-by: Willis Kung <williskung@google.com>
> ---
>  arch/x86/include/asm/fpu/internal.h |  2 +-
>  arch/x86/include/asm/pgtable.h      | 14 ++++++++++----
>  2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 03b3de491b5e..540bda5bdd28 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
>          * PKRU state is switched eagerly because it needs to be valid before we
>          * return to userland e.g. for a copy_to_user() operation.
>          */
> -       if (!(current->flags & PF_KTHREAD)) {
> +       if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
>                 /*
>                  * If the PKRU bit in xsave.header.xfeatures is not set,
>                  * then the PKRU component was in init state, which means
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 9e71bf86d8d0..aa381b530de0 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
>         if (!boot_cpu_has(X86_FEATURE_OSPKE))
>                 return;
>
> -       pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> -
>         /*
>          * The PKRU value in xstate needs to be in sync with the value that is
>          * written to the CPU. The FPU restore on return to userland would
>          * otherwise load the previous value again.
>          */
>         fpregs_lock();
> -       if (pk)
> -               pk->pkru = pkru;
> +       /*
> +        * The CPU is free to clear the feature bit when the xstate is in the
> +        * init state. For this reason, we need to make sure the feature bit is
> +        * reset when we're explicitly writing to pkru. If we did not then we
> +        * would write to pkru and it would not be saved on a context switch.
> +        */
> +       current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
> +       pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> +       BUG_ON(!pk);
> +       pk->pkru = pkru;
>         __write_pkru(pkru);
>         fpregs_unlock();
>  }
> --
> 2.35.1.265.g69c8d7142f-goog
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 15:57 ` Guenter Roeck
@ 2022-02-15 16:19   ` Brian Geffon
  2022-02-15 17:02     ` Guenter Roeck
  2022-02-15 17:10     ` Dave Hansen
  0 siblings, 2 replies; 26+ messages in thread
From: Brian Geffon @ 2022-02-15 16:19 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Dave Hansen, Thomas Gleixner, Willis Kung, Borislav Petkov,
	Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, linux-kernel

On Tue, Feb 15, 2022 at 10:57 AM Guenter Roeck <groeck@google.com> wrote:
>
> Hi Brian,
>
> On Tue, Feb 15, 2022 at 7:37 AM Brian Geffon <bgeffon@google.com> wrote:
> >
> > There are two issues with PKRU handling prior to 5.13. The first is that
>
> Nice catch and work. One question, though: From the above, it seems
> like this patch only applies to kernels earlier than v5.13 or, more
> specifically, to v5.4.y and v5.10.y. Is this correct, or should it be
> applied to the upstream kernel and to all applicable stable releases ?

This only applies before 5.13, so 5.10.y and 5.4.y, the behavior
decoupled PKRU from xstate in a long series from Thomas Gleixner, but
the first commit where this would have been fixed in 5.13 would be:

  commit 954436989cc550dd91aab98363240c9c0a4b7e23
  Author: Thomas Gleixner <tglx@linutronix.de>
  Date:   Wed Jun 23 14:02:21 2021 +0200

    x86/fpu: Remove PKRU handling from switch_fpu_finish()

>
> Thanks,
> Guenter
>
> > when eagerly switching PKRU we check that current is not a kernel
> > thread as kernel threads will never use PKRU. It's possible that
> > this_cpu_read_stable() on current_task (ie. get_current()) is returning
> > an old cached value. By forcing the read with this_cpu_read() the
> > correct task is used. Without this it's possible when switching from
> > a kernel thread to a userspace thread that we'll still observe the
> > PF_KTHREAD flag and never restore the PKRU. And as a result this
> > issue only occurs when switching from a kernel thread to a userspace
> > thread, switching from a non kernel thread works perfectly fine because
> > all we consider in that situation is the flags from some other non
> > kernel task and the next fpu is passed in to switch_fpu_finish().
> >
> > Without reloading the value finish_fpu_load() after being inlined into
> > __switch_to() uses a stale value of current:
> >
> >   ba1:   8b 35 00 00 00 00       mov    0x0(%rip),%esi
> >   ba7:   f0 41 80 4d 01 40       lock orb $0x40,0x1(%r13)
> >   bad:   e9 00 00 00 00          jmp    bb2 <__switch_to+0x1eb>
> >   bb2:   41 f6 45 3e 20          testb  $0x20,0x3e(%r13)
> >   bb7:   75 1c                   jne    bd5 <__switch_to+0x20e>
> >
> > By using this_cpu_read() and avoiding the cached value the compiler does
> > insert an additional load instruction and observes the correct value now:
> >
> >   ba1:   8b 35 00 00 00 00       mov    0x0(%rip),%esi
> >   ba7:   f0 41 80 4d 01 40       lock orb $0x40,0x1(%r13)
> >   bad:   e9 00 00 00 00          jmp    bb2 <__switch_to+0x1eb>
> >   bb2:   65 48 8b 05 00 00 00    mov    %gs:0x0(%rip),%rax
> >   bb9:   00
> >   bba:   f6 40 3e 20             testb  $0x20,0x3e(%rax)
> >   bbe:   75 1c                   jne    bdc <__switch_to+0x215>
> >
> > The second issue is when using write_pkru() we only write to the
> > xstate when the feature bit is set because get_xsave_addr() returns
> > NULL when the feature bit is not set. This is problematic as the CPU
> > is free to clear the feature bit when it observes the xstate in the
> > init state, this behavior seems to be documented a few places throughout
> > the kernel. If the bit was cleared then in write_pkru() we would happily
> > write to PKRU without ever updating the xstate, and the FPU restore on
> > return to userspace would load the old value agian.
> >
> > Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> > Signed-off-by: Brian Geffon <bgeffon@google.com>
> > Signed-off-by: Willis Kung <williskung@google.com>
> > Tested-by: Willis Kung <williskung@google.com>
> > ---
> >  arch/x86/include/asm/fpu/internal.h |  2 +-
> >  arch/x86/include/asm/pgtable.h      | 14 ++++++++++----
> >  2 files changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > index 03b3de491b5e..540bda5bdd28 100644
> > --- a/arch/x86/include/asm/fpu/internal.h
> > +++ b/arch/x86/include/asm/fpu/internal.h
> > @@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
> >          * PKRU state is switched eagerly because it needs to be valid before we
> >          * return to userland e.g. for a copy_to_user() operation.
> >          */
> > -       if (!(current->flags & PF_KTHREAD)) {
> > +       if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
> >                 /*
> >                  * If the PKRU bit in xsave.header.xfeatures is not set,
> >                  * then the PKRU component was in init state, which means
> > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> > index 9e71bf86d8d0..aa381b530de0 100644
> > --- a/arch/x86/include/asm/pgtable.h
> > +++ b/arch/x86/include/asm/pgtable.h
> > @@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
> >         if (!boot_cpu_has(X86_FEATURE_OSPKE))
> >                 return;
> >
> > -       pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > -
> >         /*
> >          * The PKRU value in xstate needs to be in sync with the value that is
> >          * written to the CPU. The FPU restore on return to userland would
> >          * otherwise load the previous value again.
> >          */
> >         fpregs_lock();
> > -       if (pk)
> > -               pk->pkru = pkru;
> > +       /*
> > +        * The CPU is free to clear the feature bit when the xstate is in the
> > +        * init state. For this reason, we need to make sure the feature bit is
> > +        * reset when we're explicitly writing to pkru. If we did not then we
> > +        * would write to pkru and it would not be saved on a context switch.
> > +        */
> > +       current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
> > +       pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > +       BUG_ON(!pk);
> > +       pk->pkru = pkru;
> >         __write_pkru(pkru);
> >         fpregs_unlock();
> >  }
> > --
> > 2.35.1.265.g69c8d7142f-goog
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 15:36 [PATCH] x86/fpu: Correct pkru/xstate inconsistency Brian Geffon
  2022-02-15 15:57 ` Guenter Roeck
@ 2022-02-15 16:20 ` Greg KH
  2022-02-15 17:07 ` Dave Hansen
  2 siblings, 0 replies; 26+ messages in thread
From: Greg KH @ 2022-02-15 16:20 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Dave Hansen, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, stable, x86, linux-kernel

On Tue, Feb 15, 2022 at 07:36:44AM -0800, Brian Geffon wrote:
> There are two issues with PKRU handling prior to 5.13. The first is that
> when eagerly switching PKRU we check that current is not a kernel
> thread as kernel threads will never use PKRU. It's possible that
> this_cpu_read_stable() on current_task (ie. get_current()) is returning
> an old cached value. By forcing the read with this_cpu_read() the
> correct task is used. Without this it's possible when switching from
> a kernel thread to a userspace thread that we'll still observe the
> PF_KTHREAD flag and never restore the PKRU. And as a result this
> issue only occurs when switching from a kernel thread to a userspace
> thread, switching from a non kernel thread works perfectly fine because
> all we consider in that situation is the flags from some other non
> kernel task and the next fpu is passed in to switch_fpu_finish().
> 
> Without reloading the value finish_fpu_load() after being inlined into
> __switch_to() uses a stale value of current:
> 
>   ba1:   8b 35 00 00 00 00       mov    0x0(%rip),%esi
>   ba7:   f0 41 80 4d 01 40       lock orb $0x40,0x1(%r13)
>   bad:   e9 00 00 00 00          jmp    bb2 <__switch_to+0x1eb>
>   bb2:   41 f6 45 3e 20          testb  $0x20,0x3e(%r13)
>   bb7:   75 1c                   jne    bd5 <__switch_to+0x20e>
> 
> By using this_cpu_read() and avoiding the cached value the compiler does
> insert an additional load instruction and observes the correct value now:
> 
>   ba1:   8b 35 00 00 00 00       mov    0x0(%rip),%esi
>   ba7:   f0 41 80 4d 01 40       lock orb $0x40,0x1(%r13)
>   bad:   e9 00 00 00 00          jmp    bb2 <__switch_to+0x1eb>
>   bb2:   65 48 8b 05 00 00 00    mov    %gs:0x0(%rip),%rax
>   bb9:   00
>   bba:   f6 40 3e 20             testb  $0x20,0x3e(%rax)
>   bbe:   75 1c                   jne    bdc <__switch_to+0x215>
> 
> The second issue is when using write_pkru() we only write to the
> xstate when the feature bit is set because get_xsave_addr() returns
> NULL when the feature bit is not set. This is problematic as the CPU
> is free to clear the feature bit when it observes the xstate in the
> init state, this behavior seems to be documented a few places throughout
> the kernel. If the bit was cleared then in write_pkru() we would happily
> write to PKRU without ever updating the xstate, and the FPU restore on
> return to userspace would load the old value agian.
> 
> Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> Signed-off-by: Brian Geffon <bgeffon@google.com>
> Signed-off-by: Willis Kung <williskung@google.com>
> Tested-by: Willis Kung <williskung@google.com>
> ---
>  arch/x86/include/asm/fpu/internal.h |  2 +-
>  arch/x86/include/asm/pgtable.h      | 14 ++++++++++----
>  2 files changed, 11 insertions(+), 5 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 16:19   ` Brian Geffon
@ 2022-02-15 17:02     ` Guenter Roeck
  2022-02-15 17:10     ` Dave Hansen
  1 sibling, 0 replies; 26+ messages in thread
From: Guenter Roeck @ 2022-02-15 17:02 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Dave Hansen, Thomas Gleixner, Willis Kung, Borislav Petkov,
	Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, linux-kernel

On Tue, Feb 15, 2022 at 8:20 AM Brian Geffon <bgeffon@google.com> wrote:
>
> On Tue, Feb 15, 2022 at 10:57 AM Guenter Roeck <groeck@google.com> wrote:
> >
> > Hi Brian,
> >
> > On Tue, Feb 15, 2022 at 7:37 AM Brian Geffon <bgeffon@google.com> wrote:
> > >
> > > There are two issues with PKRU handling prior to 5.13. The first is that
> >
> > Nice catch and work. One question, though: From the above, it seems
> > like this patch only applies to kernels earlier than v5.13 or, more
> > specifically, to v5.4.y and v5.10.y. Is this correct, or should it be
> > applied to the upstream kernel and to all applicable stable releases ?
>
> This only applies before 5.13, so 5.10.y and 5.4.y, the behavior
> decoupled PKRU from xstate in a long series from Thomas Gleixner, but
> the first commit where this would have been fixed in 5.13 would be:
>

Thought so. You'll have to explain that clarly in both subject and
description to get the patch accepted to the affected stable releases.

Thanks,
Guenter

>   commit 954436989cc550dd91aab98363240c9c0a4b7e23
>   Author: Thomas Gleixner <tglx@linutronix.de>
>   Date:   Wed Jun 23 14:02:21 2021 +0200
>
>     x86/fpu: Remove PKRU handling from switch_fpu_finish()
>
> >
> > Thanks,
> > Guenter
> >
> > > when eagerly switching PKRU we check that current is not a kernel
> > > thread as kernel threads will never use PKRU. It's possible that
> > > this_cpu_read_stable() on current_task (ie. get_current()) is returning
> > > an old cached value. By forcing the read with this_cpu_read() the
> > > correct task is used. Without this it's possible when switching from
> > > a kernel thread to a userspace thread that we'll still observe the
> > > PF_KTHREAD flag and never restore the PKRU. And as a result this
> > > issue only occurs when switching from a kernel thread to a userspace
> > > thread, switching from a non kernel thread works perfectly fine because
> > > all we consider in that situation is the flags from some other non
> > > kernel task and the next fpu is passed in to switch_fpu_finish().
> > >
> > > Without reloading the value finish_fpu_load() after being inlined into
> > > __switch_to() uses a stale value of current:
> > >
> > >   ba1:   8b 35 00 00 00 00       mov    0x0(%rip),%esi
> > >   ba7:   f0 41 80 4d 01 40       lock orb $0x40,0x1(%r13)
> > >   bad:   e9 00 00 00 00          jmp    bb2 <__switch_to+0x1eb>
> > >   bb2:   41 f6 45 3e 20          testb  $0x20,0x3e(%r13)
> > >   bb7:   75 1c                   jne    bd5 <__switch_to+0x20e>
> > >
> > > By using this_cpu_read() and avoiding the cached value the compiler does
> > > insert an additional load instruction and observes the correct value now:
> > >
> > >   ba1:   8b 35 00 00 00 00       mov    0x0(%rip),%esi
> > >   ba7:   f0 41 80 4d 01 40       lock orb $0x40,0x1(%r13)
> > >   bad:   e9 00 00 00 00          jmp    bb2 <__switch_to+0x1eb>
> > >   bb2:   65 48 8b 05 00 00 00    mov    %gs:0x0(%rip),%rax
> > >   bb9:   00
> > >   bba:   f6 40 3e 20             testb  $0x20,0x3e(%rax)
> > >   bbe:   75 1c                   jne    bdc <__switch_to+0x215>
> > >
> > > The second issue is when using write_pkru() we only write to the
> > > xstate when the feature bit is set because get_xsave_addr() returns
> > > NULL when the feature bit is not set. This is problematic as the CPU
> > > is free to clear the feature bit when it observes the xstate in the
> > > init state, this behavior seems to be documented a few places throughout
> > > the kernel. If the bit was cleared then in write_pkru() we would happily
> > > write to PKRU without ever updating the xstate, and the FPU restore on
> > > return to userspace would load the old value agian.
> > >
> > > Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> > > Signed-off-by: Brian Geffon <bgeffon@google.com>
> > > Signed-off-by: Willis Kung <williskung@google.com>
> > > Tested-by: Willis Kung <williskung@google.com>
> > > ---
> > >  arch/x86/include/asm/fpu/internal.h |  2 +-
> > >  arch/x86/include/asm/pgtable.h      | 14 ++++++++++----
> > >  2 files changed, 11 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > > index 03b3de491b5e..540bda5bdd28 100644
> > > --- a/arch/x86/include/asm/fpu/internal.h
> > > +++ b/arch/x86/include/asm/fpu/internal.h
> > > @@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
> > >          * PKRU state is switched eagerly because it needs to be valid before we
> > >          * return to userland e.g. for a copy_to_user() operation.
> > >          */
> > > -       if (!(current->flags & PF_KTHREAD)) {
> > > +       if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
> > >                 /*
> > >                  * If the PKRU bit in xsave.header.xfeatures is not set,
> > >                  * then the PKRU component was in init state, which means
> > > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> > > index 9e71bf86d8d0..aa381b530de0 100644
> > > --- a/arch/x86/include/asm/pgtable.h
> > > +++ b/arch/x86/include/asm/pgtable.h
> > > @@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
> > >         if (!boot_cpu_has(X86_FEATURE_OSPKE))
> > >                 return;
> > >
> > > -       pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > > -
> > >         /*
> > >          * The PKRU value in xstate needs to be in sync with the value that is
> > >          * written to the CPU. The FPU restore on return to userland would
> > >          * otherwise load the previous value again.
> > >          */
> > >         fpregs_lock();
> > > -       if (pk)
> > > -               pk->pkru = pkru;
> > > +       /*
> > > +        * The CPU is free to clear the feature bit when the xstate is in the
> > > +        * init state. For this reason, we need to make sure the feature bit is
> > > +        * reset when we're explicitly writing to pkru. If we did not then we
> > > +        * would write to pkru and it would not be saved on a context switch.
> > > +        */
> > > +       current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
> > > +       pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > > +       BUG_ON(!pk);
> > > +       pk->pkru = pkru;
> > >         __write_pkru(pkru);
> > >         fpregs_unlock();
> > >  }
> > > --
> > > 2.35.1.265.g69c8d7142f-goog
> > >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 15:36 [PATCH] x86/fpu: Correct pkru/xstate inconsistency Brian Geffon
  2022-02-15 15:57 ` Guenter Roeck
  2022-02-15 16:20 ` Greg KH
@ 2022-02-15 17:07 ` Dave Hansen
  2022-02-15 17:50   ` Brian Geffon
  2022-02-15 21:14   ` [PATCH] " Guenter Roeck
  2 siblings, 2 replies; 26+ messages in thread
From: Dave Hansen @ 2022-02-15 17:07 UTC (permalink / raw)
  To: Brian Geffon, Thomas Gleixner
  Cc: Willis Kung, Guenter Roeck, Borislav Petkov, Andy Lutomirski,
	stable, x86, linux-kernel

On 2/15/22 07:36, Brian Geffon wrote:
> There are two issues with PKRU handling prior to 5.13. 

Are you sure both of these issues were introduced by 0cecca9d03c?  I'm
surprised that the get_xsave_addr() issue is not older.

Should this be two patches?

> The first is that when eagerly switching PKRU we check that current

Don't forget to write in imperative mood.  No "we's", please.

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html

This goes for changelogs and comments too.

> is not a kernel thread as kernel threads will never use PKRU. It's
> possible that this_cpu_read_stable() on current_task (ie.
> get_current()) is returning an old cached value. By forcing the read
> with this_cpu_read() the correct task is used. Without this it's
> possible when switching from a kernel thread to a userspace thread
> that we'll still observe the PF_KTHREAD flag and never restore the
> PKRU. And as a result this issue only occurs when switching from a
> kernel thread to a userspace thread, switching from a non kernel
> thread works perfectly fine because all we consider in that situation
> is the flags from some other non kernel task and the next fpu is
> passed in to switch_fpu_finish().

It makes *sense* that there would be a place in the context switch code
where 'current' is wonky, but I never realized this.  This seems really
fragile, but *also* trivially detectable.

Is the PKRU code really the only code to use 'current' in a buggy way
like this?

> The second issue is when using write_pkru() we only write to the
> xstate when the feature bit is set because get_xsave_addr() returns
> NULL when the feature bit is not set. This is problematic as the CPU
> is free to clear the feature bit when it observes the xstate in the
> init state, this behavior seems to be documented a few places throughout
> the kernel. If the bit was cleared then in write_pkru() we would happily
> write to PKRU without ever updating the xstate, and the FPU restore on
> return to userspace would load the old value agian.


						^ again

It's probably worth noting that the AMD init tracker is a lot more
aggressive than Intel's.  On Intel, I think XRSTOR is the only way to
get back to the init state.  You're obviously hitting this on AMD.

It's also *very* unlikely that PKRU gets back to a value of 0.  I think
we added a selftest for this case in later kernels.

That helps explain why this bug hung around for so long.

> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 03b3de491b5e..540bda5bdd28 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
>  	 * PKRU state is switched eagerly because it needs to be valid before we
>  	 * return to userland e.g. for a copy_to_user() operation.
>  	 */
> -	if (!(current->flags & PF_KTHREAD)) {
> +	if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {

This really deserves a specific comment.

>  		/*
>  		 * If the PKRU bit in xsave.header.xfeatures is not set,
>  		 * then the PKRU component was in init state, which means
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 9e71bf86d8d0..aa381b530de0 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
>  	if (!boot_cpu_has(X86_FEATURE_OSPKE))
>  		return;
>  
> -	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> -
>  	/*
>  	 * The PKRU value in xstate needs to be in sync with the value that is
>  	 * written to the CPU. The FPU restore on return to userland would
>  	 * otherwise load the previous value again.
>  	 */
>  	fpregs_lock();
> -	if (pk)
> -		pk->pkru = pkru;
> +	/*
> +	 * The CPU is free to clear the feature bit when the xstate is in the
> +	 * init state. For this reason, we need to make sure the feature bit is
> +	 * reset when we're explicitly writing to pkru. If we did not then we
> +	 * would write to pkru and it would not be saved on a context switch.
> +	 */
> +	current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;

I don't think we need to describe how the init optimization works again.
 I'm also not sure it's worth mentioning context switches here.  It's a
wider problem than that.  Maybe:

	/*
	 * All fpregs will be XRSTOR'd from this buffer before returning
	 * to userspace.  Ensure that XRSTOR does not init PKRU and that
	 * get_xsave_addr() will work.
	 */

> +	pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> +	BUG_ON(!pk);

A BUG_ON() a line before a NULL pointer dereference doesn't tend to do
much good.

> +	pk->pkru = pkru;
>  	__write_pkru(pkru);
>  	fpregs_unlock();
>  }


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 16:19   ` Brian Geffon
  2022-02-15 17:02     ` Guenter Roeck
@ 2022-02-15 17:10     ` Dave Hansen
  1 sibling, 0 replies; 26+ messages in thread
From: Dave Hansen @ 2022-02-15 17:10 UTC (permalink / raw)
  To: Brian Geffon, Guenter Roeck
  Cc: Thomas Gleixner, Willis Kung, Borislav Petkov, Andy Lutomirski,
	# v4 . 10+,
	the arch/x86 maintainers, linux-kernel

On 2/15/22 08:19, Brian Geffon wrote:
> This only applies before 5.13, so 5.10.y and 5.4.y, the behavior
> decoupled PKRU from xstate in a long series from Thomas Gleixner, but
> the first commit where this would have been fixed in 5.13 would be:
> 
>   commit 954436989cc550dd91aab98363240c9c0a4b7e23
>   Author: Thomas Gleixner <tglx@linutronix.de>
>   Date:   Wed Jun 23 14:02:21 2021 +0200
> 
>     x86/fpu: Remove PKRU handling from switch_fpu_finish()

Could you also describe why the >=5.13 fix for this isn't suitable for
older kernels?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 17:07 ` Dave Hansen
@ 2022-02-15 17:50   ` Brian Geffon
  2022-02-15 17:55     ` Dave Hansen
  2022-02-15 21:14   ` [PATCH] " Guenter Roeck
  1 sibling, 1 reply; 26+ messages in thread
From: Brian Geffon @ 2022-02-15 17:50 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, Willis Kung, Guenter Roeck, Borislav Petkov,
	Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

Hi Dave,
Thank you for taking a look at this.

On Tue, Feb 15, 2022 at 12:13 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/15/22 07:36, Brian Geffon wrote:
> > There are two issues with PKRU handling prior to 5.13.
>
> Are you sure both of these issues were introduced by 0cecca9d03c?  I'm
> surprised that the get_xsave_addr() issue is not older.
>
> Should this be two patches?

You're right, the get_xsave_addr() issue is much older than the eager
reloading of PKRU. I'll split this out into two patches.

>
> > The first is that when eagerly switching PKRU we check that current
>
> Don't forget to write in imperative mood.  No "we's", please.
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
>
> This goes for changelogs and comments too.

This will be corrected in future patches.

>
> > is not a kernel thread as kernel threads will never use PKRU. It's
> > possible that this_cpu_read_stable() on current_task (ie.
> > get_current()) is returning an old cached value. By forcing the read
> > with this_cpu_read() the correct task is used. Without this it's
> > possible when switching from a kernel thread to a userspace thread
> > that we'll still observe the PF_KTHREAD flag and never restore the
> > PKRU. And as a result this issue only occurs when switching from a
> > kernel thread to a userspace thread, switching from a non kernel
> > thread works perfectly fine because all we consider in that situation
> > is the flags from some other non kernel task and the next fpu is
> > passed in to switch_fpu_finish().
>
> It makes *sense* that there would be a place in the context switch code
> where 'current' is wonky, but I never realized this.  This seems really
> fragile, but *also* trivially detectable.
>
> Is the PKRU code really the only code to use 'current' in a buggy way
> like this?

Yes, because the remaining code in __switch_to() references the next
task as next_p rather than current. Technically, it might be more
correct to pass next_p to switch_fpu_finish(), what do you think? This
may make sense since we're also passing the next fpu anyway.

>
> > The second issue is when using write_pkru() we only write to the
> > xstate when the feature bit is set because get_xsave_addr() returns
> > NULL when the feature bit is not set. This is problematic as the CPU
> > is free to clear the feature bit when it observes the xstate in the
> > init state, this behavior seems to be documented a few places throughout
> > the kernel. If the bit was cleared then in write_pkru() we would happily
> > write to PKRU without ever updating the xstate, and the FPU restore on
> > return to userspace would load the old value agian.
>
>
>                                                 ^ again
>
> It's probably worth noting that the AMD init tracker is a lot more
> aggressive than Intel's.  On Intel, I think XRSTOR is the only way to
> get back to the init state.  You're obviously hitting this on AMD.
>
> It's also *very* unlikely that PKRU gets back to a value of 0.  I think
> we added a selftest for this case in later kernels.
>
> That helps explain why this bug hung around for so long.
>
> > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > index 03b3de491b5e..540bda5bdd28 100644
> > --- a/arch/x86/include/asm/fpu/internal.h
> > +++ b/arch/x86/include/asm/fpu/internal.h
> > @@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
> >        * PKRU state is switched eagerly because it needs to be valid before we
> >        * return to userland e.g. for a copy_to_user() operation.
> >        */
> > -     if (!(current->flags & PF_KTHREAD)) {
> > +     if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
>
> This really deserves a specific comment.
>
> >               /*
> >                * If the PKRU bit in xsave.header.xfeatures is not set,
> >                * then the PKRU component was in init state, which means
> > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> > index 9e71bf86d8d0..aa381b530de0 100644
> > --- a/arch/x86/include/asm/pgtable.h
> > +++ b/arch/x86/include/asm/pgtable.h
> > @@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
> >       if (!boot_cpu_has(X86_FEATURE_OSPKE))
> >               return;
> >
> > -     pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > -
> >       /*
> >        * The PKRU value in xstate needs to be in sync with the value that is
> >        * written to the CPU. The FPU restore on return to userland would
> >        * otherwise load the previous value again.
> >        */
> >       fpregs_lock();
> > -     if (pk)
> > -             pk->pkru = pkru;
> > +     /*
> > +      * The CPU is free to clear the feature bit when the xstate is in the
> > +      * init state. For this reason, we need to make sure the feature bit is
> > +      * reset when we're explicitly writing to pkru. If we did not then we
> > +      * would write to pkru and it would not be saved on a context switch.
> > +      */
> > +     current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
>
> I don't think we need to describe how the init optimization works again.
>  I'm also not sure it's worth mentioning context switches here.  It's a
> wider problem than that.  Maybe:
>
>         /*
>          * All fpregs will be XRSTOR'd from this buffer before returning
>          * to userspace.  Ensure that XRSTOR does not init PKRU and that
>          * get_xsave_addr() will work.
>          */
>
> > +     pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > +     BUG_ON(!pk);
>
> A BUG_ON() a line before a NULL pointer dereference doesn't tend to do
> much good.
>
> > +     pk->pkru = pkru;
> >       __write_pkru(pkru);
> >       fpregs_unlock();
> >  }
>

Brian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 17:50   ` Brian Geffon
@ 2022-02-15 17:55     ` Dave Hansen
  2022-02-15 19:22       ` [PATCH stable 5.4,5.10] " Brian Geffon
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2022-02-15 17:55 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Thomas Gleixner, Willis Kung, Guenter Roeck, Borislav Petkov,
	Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On 2/15/22 09:50, Brian Geffon wrote:
>>> is not a kernel thread as kernel threads will never use PKRU. It's
>>> possible that this_cpu_read_stable() on current_task (ie.
>>> get_current()) is returning an old cached value. By forcing the read
>>> with this_cpu_read() the correct task is used. Without this it's
>>> possible when switching from a kernel thread to a userspace thread
>>> that we'll still observe the PF_KTHREAD flag and never restore the
>>> PKRU. And as a result this issue only occurs when switching from a
>>> kernel thread to a userspace thread, switching from a non kernel
>>> thread works perfectly fine because all we consider in that situation
>>> is the flags from some other non kernel task and the next fpu is
>>> passed in to switch_fpu_finish().
>>
>> It makes *sense* that there would be a place in the context switch code
>> where 'current' is wonky, but I never realized this.  This seems really
>> fragile, but *also* trivially detectable.
>>
>> Is the PKRU code really the only code to use 'current' in a buggy way
>> like this?
> 
> Yes, because the remaining code in __switch_to() references the next
> task as next_p rather than current. Technically, it might be more
> correct to pass next_p to switch_fpu_finish(), what do you think? This
> may make sense since we're also passing the next fpu anyway.

Yeah, passing next_p instead of next_fpu makes a lot of sense to me.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 17:55     ` Dave Hansen
@ 2022-02-15 19:22       ` Brian Geffon
  2022-02-15 19:44         ` Greg KH
  2022-02-24 15:16         ` Dave Hansen
  0 siblings, 2 replies; 26+ messages in thread
From: Brian Geffon @ 2022-02-15 19:22 UTC (permalink / raw)
  To: Dave Hansen, Thomas Gleixner
  Cc: Willis Kung, Guenter Roeck, Borislav Petkov, Andy Lutomirski,
	stable, x86, linux-kernel, Brian Geffon

When eagerly switching PKRU in switch_fpu_finish() it checks that
current is not a kernel thread as kernel threads will never use PKRU.
It's possible that this_cpu_read_stable() on current_task
(ie. get_current()) is returning an old cached value. To resolve this
reference next_p directly rather than relying on current.

As written it's possible when switching from a kernel thread to a
userspace thread to observe a cached PF_KTHREAD flag and never restore
the PKRU. And as a result this issue only occurs when switching
from a kernel thread to a userspace thread, switching from a non kernel
thread works perfectly fine because all that is considered in that
situation are the flags from some other non kernel task and the next fpu
is passed in to switch_fpu_finish().

This behavior only exists between 5.2 and 5.13 when it was fixed by a
rewrite decoupling PKRU from xstate, in:
  commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")

Unfortunately backporting the fix from 5.13 is probably not realistic as
it's part of a 60+ patch series which rewrites most of the PKRU handling.

Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
Signed-off-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Willis Kung <williskung@google.com>
Tested-by: Willis Kung <williskung@google.com>
Cc: <stable@vger.kernel.org> # v5.4.x
Cc: <stable@vger.kernel.org> # v5.10.x
---
 arch/x86/include/asm/fpu/internal.h | 13 ++++++++-----
 arch/x86/kernel/process_32.c        |  6 ++----
 arch/x86/kernel/process_64.c        |  6 ++----
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 03b3de491b5e..5ed702e2c55f 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -560,9 +560,11 @@ static inline void __fpregs_load_activate(void)
  * The FPU context is only stored/restored for a user task and
  * PF_KTHREAD is used to distinguish between kernel and user threads.
  */
-static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
+static inline void switch_fpu_prepare(struct task_struct *prev, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU) && !(current->flags & PF_KTHREAD)) {
+	struct fpu *old_fpu = &prev->thread.fpu;
+
+	if (static_cpu_has(X86_FEATURE_FPU) && !(prev->flags & PF_KTHREAD)) {
 		if (!copy_fpregs_to_fpstate(old_fpu))
 			old_fpu->last_cpu = -1;
 		else
@@ -581,10 +583,11 @@ static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  * Load PKRU from the FPU context if available. Delay loading of the
  * complete FPU state until the return to userland.
  */
-static inline void switch_fpu_finish(struct fpu *new_fpu)
+static inline void switch_fpu_finish(struct task_struct *next)
 {
 	u32 pkru_val = init_pkru_value;
 	struct pkru_state *pk;
+	struct fpu *next_fpu = &next->thread.fpu;
 
 	if (!static_cpu_has(X86_FEATURE_FPU))
 		return;
@@ -598,7 +601,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 	 * PKRU state is switched eagerly because it needs to be valid before we
 	 * return to userland e.g. for a copy_to_user() operation.
 	 */
-	if (!(current->flags & PF_KTHREAD)) {
+	if (!(next->flags & PF_KTHREAD)) {
 		/*
 		 * If the PKRU bit in xsave.header.xfeatures is not set,
 		 * then the PKRU component was in init state, which means
@@ -607,7 +610,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 		 * in memory is not valid. This means pkru_val has to be
 		 * set to 0 and not to init_pkru_value.
 		 */
-		pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
+		pk = get_xsave_addr(&next_fpu->state.xsave, XFEATURE_PKRU);
 		pkru_val = pk ? pk->pkru : 0;
 	}
 	__write_pkru(pkru_val);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index b8ceec4974fe..352f876950ab 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -229,14 +229,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 {
 	struct thread_struct *prev = &prev_p->thread,
 			     *next = &next_p->thread;
-	struct fpu *prev_fpu = &prev->fpu;
-	struct fpu *next_fpu = &next->fpu;
 	int cpu = smp_processor_id();
 
 	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
 
 	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
-		switch_fpu_prepare(prev_fpu, cpu);
+		switch_fpu_prepare(prev_p, cpu);
 
 	/*
 	 * Save away %gs. No need to save %fs, as it was saved on the
@@ -292,7 +290,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	this_cpu_write(current_task, next_p);
 
-	switch_fpu_finish(next_fpu);
+	switch_fpu_finish(next_p);
 
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index da3cc3a10d63..633788362906 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -505,15 +505,13 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 {
 	struct thread_struct *prev = &prev_p->thread;
 	struct thread_struct *next = &next_p->thread;
-	struct fpu *prev_fpu = &prev->fpu;
-	struct fpu *next_fpu = &next->fpu;
 	int cpu = smp_processor_id();
 
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) &&
 		     this_cpu_read(irq_count) != -1);
 
 	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
-		switch_fpu_prepare(prev_fpu, cpu);
+		switch_fpu_prepare(prev_p, cpu);
 
 	/* We must save %fs and %gs before load_TLS() because
 	 * %fs and %gs may be cleared by load_TLS().
@@ -565,7 +563,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
-	switch_fpu_finish(next_fpu);
+	switch_fpu_finish(next_p);
 
 	/* Reload sp0. */
 	update_task_stack(next_p);
-- 
2.35.1.265.g69c8d7142f-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 19:22       ` [PATCH stable 5.4,5.10] " Brian Geffon
@ 2022-02-15 19:44         ` Greg KH
  2022-02-15 21:32           ` Brian Geffon
  2022-02-24 15:16         ` Dave Hansen
  1 sibling, 1 reply; 26+ messages in thread
From: Greg KH @ 2022-02-15 19:44 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Dave Hansen, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, stable, x86, linux-kernel

On Tue, Feb 15, 2022 at 11:22:33AM -0800, Brian Geffon wrote:
> When eagerly switching PKRU in switch_fpu_finish() it checks that
> current is not a kernel thread as kernel threads will never use PKRU.
> It's possible that this_cpu_read_stable() on current_task
> (ie. get_current()) is returning an old cached value. To resolve this
> reference next_p directly rather than relying on current.
> 
> As written it's possible when switching from a kernel thread to a
> userspace thread to observe a cached PF_KTHREAD flag and never restore
> the PKRU. And as a result this issue only occurs when switching
> from a kernel thread to a userspace thread, switching from a non kernel
> thread works perfectly fine because all that is considered in that
> situation are the flags from some other non kernel task and the next fpu
> is passed in to switch_fpu_finish().
> 
> This behavior only exists between 5.2 and 5.13 when it was fixed by a
> rewrite decoupling PKRU from xstate, in:
>   commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
> 
> Unfortunately backporting the fix from 5.13 is probably not realistic as
> it's part of a 60+ patch series which rewrites most of the PKRU handling.
> 
> Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> Signed-off-by: Brian Geffon <bgeffon@google.com>
> Signed-off-by: Willis Kung <williskung@google.com>
> Tested-by: Willis Kung <williskung@google.com>
> Cc: <stable@vger.kernel.org> # v5.4.x
> Cc: <stable@vger.kernel.org> # v5.10.x
> ---
>  arch/x86/include/asm/fpu/internal.h | 13 ++++++++-----
>  arch/x86/kernel/process_32.c        |  6 ++----
>  arch/x86/kernel/process_64.c        |  6 ++----
>  3 files changed, 12 insertions(+), 13 deletions(-)

So this is ONLY for 5.4.y and 5.10.y?  I'm really really loath to take
non-upstream changes as 95% of the time (really) it goes wrong.

How was this tested, and what do the maintainers of this subsystem
think?  And will you be around to fix the bugs in this when they are
found?

And finally, what's wrong with 60+ patches to backport to fix a severe
issue?  What's preventing that from happening?  Did you try it and see
what exactly is involved?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 17:07 ` Dave Hansen
  2022-02-15 17:50   ` Brian Geffon
@ 2022-02-15 21:14   ` Guenter Roeck
  2022-02-15 21:36     ` Brian Geffon
  1 sibling, 1 reply; 26+ messages in thread
From: Guenter Roeck @ 2022-02-15 21:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Brian Geffon, Thomas Gleixner, Willis Kung, Borislav Petkov,
	Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, linux-kernel

On Tue, Feb 15, 2022 at 9:13 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/15/22 07:36, Brian Geffon wrote:
> > There are two issues with PKRU handling prior to 5.13.
>
> Are you sure both of these issues were introduced by 0cecca9d03c?  I'm
> surprised that the get_xsave_addr() issue is not older.
>
> Should this be two patches?
>
> > The first is that when eagerly switching PKRU we check that current
>
> Don't forget to write in imperative mood.  No "we's", please.
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
>
> This goes for changelogs and comments too.
>
> > is not a kernel thread as kernel threads will never use PKRU. It's
> > possible that this_cpu_read_stable() on current_task (ie.
> > get_current()) is returning an old cached value. By forcing the read
> > with this_cpu_read() the correct task is used. Without this it's
> > possible when switching from a kernel thread to a userspace thread
> > that we'll still observe the PF_KTHREAD flag and never restore the
> > PKRU. And as a result this issue only occurs when switching from a
> > kernel thread to a userspace thread, switching from a non kernel
> > thread works perfectly fine because all we consider in that situation
> > is the flags from some other non kernel task and the next fpu is
> > passed in to switch_fpu_finish().
>
> It makes *sense* that there would be a place in the context switch code
> where 'current' is wonky, but I never realized this.  This seems really
> fragile, but *also* trivially detectable.
>
> Is the PKRU code really the only code to use 'current' in a buggy way
> like this?
>
> > The second issue is when using write_pkru() we only write to the
> > xstate when the feature bit is set because get_xsave_addr() returns
> > NULL when the feature bit is not set. This is problematic as the CPU
> > is free to clear the feature bit when it observes the xstate in the
> > init state, this behavior seems to be documented a few places throughout
> > the kernel. If the bit was cleared then in write_pkru() we would happily
> > write to PKRU without ever updating the xstate, and the FPU restore on
> > return to userspace would load the old value agian.
>
>
>                                                 ^ again
>
> It's probably worth noting that the AMD init tracker is a lot more
> aggressive than Intel's.  On Intel, I think XRSTOR is the only way to
> get back to the init state.  You're obviously hitting this on AMD.
>

Brian should correct me here, but I think we have seen this with one
specific Intel CPU.

Brian, would it make sense to list the affected CPU model(s), or at
least the ones where we have observed the problem ?

Thanks,
Guenter

> It's also *very* unlikely that PKRU gets back to a value of 0.  I think
> we added a selftest for this case in later kernels.
>
> That helps explain why this bug hung around for so long.
>
> > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > index 03b3de491b5e..540bda5bdd28 100644
> > --- a/arch/x86/include/asm/fpu/internal.h
> > +++ b/arch/x86/include/asm/fpu/internal.h
> > @@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
> >        * PKRU state is switched eagerly because it needs to be valid before we
> >        * return to userland e.g. for a copy_to_user() operation.
> >        */
> > -     if (!(current->flags & PF_KTHREAD)) {
> > +     if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
>
> This really deserves a specific comment.
>
> >               /*
> >                * If the PKRU bit in xsave.header.xfeatures is not set,
> >                * then the PKRU component was in init state, which means
> > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> > index 9e71bf86d8d0..aa381b530de0 100644
> > --- a/arch/x86/include/asm/pgtable.h
> > +++ b/arch/x86/include/asm/pgtable.h
> > @@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
> >       if (!boot_cpu_has(X86_FEATURE_OSPKE))
> >               return;
> >
> > -     pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > -
> >       /*
> >        * The PKRU value in xstate needs to be in sync with the value that is
> >        * written to the CPU. The FPU restore on return to userland would
> >        * otherwise load the previous value again.
> >        */
> >       fpregs_lock();
> > -     if (pk)
> > -             pk->pkru = pkru;
> > +     /*
> > +      * The CPU is free to clear the feature bit when the xstate is in the
> > +      * init state. For this reason, we need to make sure the feature bit is
> > +      * reset when we're explicitly writing to pkru. If we did not then we
> > +      * would write to pkru and it would not be saved on a context switch.
> > +      */
> > +     current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
>
> I don't think we need to describe how the init optimization works again.
>  I'm also not sure it's worth mentioning context switches here.  It's a
> wider problem than that.  Maybe:
>
>         /*
>          * All fpregs will be XRSTOR'd from this buffer before returning
>          * to userspace.  Ensure that XRSTOR does not init PKRU and that
>          * get_xsave_addr() will work.
>          */
>
> > +     pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > +     BUG_ON(!pk);
>
> A BUG_ON() a line before a NULL pointer dereference doesn't tend to do
> much good.
>
> > +     pk->pkru = pkru;
> >       __write_pkru(pkru);
> >       fpregs_unlock();
> >  }
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 19:44         ` Greg KH
@ 2022-02-15 21:32           ` Brian Geffon
  2022-02-15 21:42             ` Dave Hansen
  2022-02-16 10:05             ` Greg KH
  0 siblings, 2 replies; 26+ messages in thread
From: Brian Geffon @ 2022-02-15 21:32 UTC (permalink / raw)
  To: Greg KH
  Cc: Dave Hansen, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On Tue, Feb 15, 2022 at 2:45 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Feb 15, 2022 at 11:22:33AM -0800, Brian Geffon wrote:
> > When eagerly switching PKRU in switch_fpu_finish() it checks that
> > current is not a kernel thread as kernel threads will never use PKRU.
> > It's possible that this_cpu_read_stable() on current_task
> > (ie. get_current()) is returning an old cached value. To resolve this
> > reference next_p directly rather than relying on current.
> >
> > As written it's possible when switching from a kernel thread to a
> > userspace thread to observe a cached PF_KTHREAD flag and never restore
> > the PKRU. And as a result this issue only occurs when switching
> > from a kernel thread to a userspace thread, switching from a non kernel
> > thread works perfectly fine because all that is considered in that
> > situation are the flags from some other non kernel task and the next fpu
> > is passed in to switch_fpu_finish().
> >
> > This behavior only exists between 5.2 and 5.13 when it was fixed by a
> > rewrite decoupling PKRU from xstate, in:
> >   commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
> >
> > Unfortunately backporting the fix from 5.13 is probably not realistic as
> > it's part of a 60+ patch series which rewrites most of the PKRU handling.
> >
> > Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> > Signed-off-by: Brian Geffon <bgeffon@google.com>
> > Signed-off-by: Willis Kung <williskung@google.com>
> > Tested-by: Willis Kung <williskung@google.com>
> > Cc: <stable@vger.kernel.org> # v5.4.x
> > Cc: <stable@vger.kernel.org> # v5.10.x
> > ---
> >  arch/x86/include/asm/fpu/internal.h | 13 ++++++++-----
> >  arch/x86/kernel/process_32.c        |  6 ++----
> >  arch/x86/kernel/process_64.c        |  6 ++----
> >  3 files changed, 12 insertions(+), 13 deletions(-)
>
> So this is ONLY for 5.4.y and 5.10.y?  I'm really really loath to take
> non-upstream changes as 95% of the time (really) it goes wrong.

That's correct, this bug was introduced in 5.2 and that code was
completely refactored in 5.13 indirectly fixing it.

>
> How was this tested, and what do the maintainers of this subsystem
> think?  And will you be around to fix the bugs in this when they are
> found?

This has been trivial to reproduce, I've used a small repro which I've
put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
, I also was able to reproduce this using the protection_keys self
tests on a 11th Gen Core i5-1135G7. I'm happy to commit to addressing
any bugs that may appear. I'll see what the maintainers say, but there
is also a smaller fix that just involves using this_cpu_read() in
switch_fpu_finish() for this specific issue, although that approach
isn't as clean.

>
> And finally, what's wrong with 60+ patches to backport to fix a severe
> issue?  What's preventing that from happening?  Did you try it and see
> what exactly is involved?

It was quite a substantial rewrite of that code with fixes layered on since.

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 21:14   ` [PATCH] " Guenter Roeck
@ 2022-02-15 21:36     ` Brian Geffon
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Geffon @ 2022-02-15 21:36 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Dave Hansen, Thomas Gleixner, Willis Kung, Borislav Petkov,
	Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, linux-kernel

On Tue, Feb 15, 2022 at 4:14 PM Guenter Roeck <groeck@google.com> wrote:
>
> On Tue, Feb 15, 2022 at 9:13 AM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 2/15/22 07:36, Brian Geffon wrote:
> > > There are two issues with PKRU handling prior to 5.13.
> >
> > Are you sure both of these issues were introduced by 0cecca9d03c?  I'm
> > surprised that the get_xsave_addr() issue is not older.
> >
> > Should this be two patches?
> >
> > > The first is that when eagerly switching PKRU we check that current
> >
> > Don't forget to write in imperative mood.  No "we's", please.
> >
> > https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
> >
> > This goes for changelogs and comments too.
> >
> > > is not a kernel thread as kernel threads will never use PKRU. It's
> > > possible that this_cpu_read_stable() on current_task (ie.
> > > get_current()) is returning an old cached value. By forcing the read
> > > with this_cpu_read() the correct task is used. Without this it's
> > > possible when switching from a kernel thread to a userspace thread
> > > that we'll still observe the PF_KTHREAD flag and never restore the
> > > PKRU. And as a result this issue only occurs when switching from a
> > > kernel thread to a userspace thread, switching from a non kernel
> > > thread works perfectly fine because all we consider in that situation
> > > is the flags from some other non kernel task and the next fpu is
> > > passed in to switch_fpu_finish().
> >
> > It makes *sense* that there would be a place in the context switch code
> > where 'current' is wonky, but I never realized this.  This seems really
> > fragile, but *also* trivially detectable.
> >
> > Is the PKRU code really the only code to use 'current' in a buggy way
> > like this?
> >
> > > The second issue is when using write_pkru() we only write to the
> > > xstate when the feature bit is set because get_xsave_addr() returns
> > > NULL when the feature bit is not set. This is problematic as the CPU
> > > is free to clear the feature bit when it observes the xstate in the
> > > init state, this behavior seems to be documented a few places throughout
> > > the kernel. If the bit was cleared then in write_pkru() we would happily
> > > write to PKRU without ever updating the xstate, and the FPU restore on
> > > return to userspace would load the old value agian.
> >
> >
> >                                                 ^ again
> >
> > It's probably worth noting that the AMD init tracker is a lot more
> > aggressive than Intel's.  On Intel, I think XRSTOR is the only way to
> > get back to the init state.  You're obviously hitting this on AMD.
> >
>
> Brian should correct me here, but I think we have seen this with one
> specific Intel CPU.
>
> Brian, would it make sense to list the affected CPU model(s), or at
> least the ones where we have observed the problem ?

The only CPU I have access to at the moment with OSPKE is an 11th Gen
Core i5-1135G7, so that's the only one I've observed it on. I can try
to search around for other hardware.

Brian

>
> Thanks,
> Guenter
>
> > It's also *very* unlikely that PKRU gets back to a value of 0.  I think
> > we added a selftest for this case in later kernels.
> >
> > That helps explain why this bug hung around for so long.
> >
> > > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> > > index 03b3de491b5e..540bda5bdd28 100644
> > > --- a/arch/x86/include/asm/fpu/internal.h
> > > +++ b/arch/x86/include/asm/fpu/internal.h
> > > @@ -598,7 +598,7 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
> > >        * PKRU state is switched eagerly because it needs to be valid before we
> > >        * return to userland e.g. for a copy_to_user() operation.
> > >        */
> > > -     if (!(current->flags & PF_KTHREAD)) {
> > > +     if (!(this_cpu_read(current_task)->flags & PF_KTHREAD)) {
> >
> > This really deserves a specific comment.
> >
> > >               /*
> > >                * If the PKRU bit in xsave.header.xfeatures is not set,
> > >                * then the PKRU component was in init state, which means
> > > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> > > index 9e71bf86d8d0..aa381b530de0 100644
> > > --- a/arch/x86/include/asm/pgtable.h
> > > +++ b/arch/x86/include/asm/pgtable.h
> > > @@ -140,16 +140,22 @@ static inline void write_pkru(u32 pkru)
> > >       if (!boot_cpu_has(X86_FEATURE_OSPKE))
> > >               return;
> > >
> > > -     pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > > -
> > >       /*
> > >        * The PKRU value in xstate needs to be in sync with the value that is
> > >        * written to the CPU. The FPU restore on return to userland would
> > >        * otherwise load the previous value again.
> > >        */
> > >       fpregs_lock();
> > > -     if (pk)
> > > -             pk->pkru = pkru;
> > > +     /*
> > > +      * The CPU is free to clear the feature bit when the xstate is in the
> > > +      * init state. For this reason, we need to make sure the feature bit is
> > > +      * reset when we're explicitly writing to pkru. If we did not then we
> > > +      * would write to pkru and it would not be saved on a context switch.
> > > +      */
> > > +     current->thread.fpu.state.xsave.header.xfeatures |= XFEATURE_MASK_PKRU;
> >
> > I don't think we need to describe how the init optimization works again.
> >  I'm also not sure it's worth mentioning context switches here.  It's a
> > wider problem than that.  Maybe:
> >
> >         /*
> >          * All fpregs will be XRSTOR'd from this buffer before returning
> >          * to userspace.  Ensure that XRSTOR does not init PKRU and that
> >          * get_xsave_addr() will work.
> >          */
> >
> > > +     pk = get_xsave_addr(&current->thread.fpu.state.xsave, XFEATURE_PKRU);
> > > +     BUG_ON(!pk);
> >
> > A BUG_ON() a line before a NULL pointer dereference doesn't tend to do
> > much good.
> >
> > > +     pk->pkru = pkru;
> > >       __write_pkru(pkru);
> > >       fpregs_unlock();
> > >  }
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 21:32           ` Brian Geffon
@ 2022-02-15 21:42             ` Dave Hansen
  2022-02-15 21:48               ` Brian Geffon
  2022-02-16  2:01               ` Brian Geffon
  2022-02-16 10:05             ` Greg KH
  1 sibling, 2 replies; 26+ messages in thread
From: Dave Hansen @ 2022-02-15 21:42 UTC (permalink / raw)
  To: Brian Geffon, Greg KH
  Cc: Thomas Gleixner, Willis Kung, Guenter Roeck, Borislav Petkov,
	Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On 2/15/22 13:32, Brian Geffon wrote:
>> How was this tested, and what do the maintainers of this subsystem
>> think?  And will you be around to fix the bugs in this when they are
>> found?
> This has been trivial to reproduce, I've used a small repro which I've
> put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
> , I also was able to reproduce this using the protection_keys self
> tests on a 11th Gen Core i5-1135G7. 

I've got an i7-1165G7, but I'm not seeing any failures on a
5.11 distro kernel.

How long does this take for you to reproduce?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 21:42             ` Dave Hansen
@ 2022-02-15 21:48               ` Brian Geffon
  2022-02-16  2:01               ` Brian Geffon
  1 sibling, 0 replies; 26+ messages in thread
From: Brian Geffon @ 2022-02-15 21:48 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Greg KH, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On Tue, Feb 15, 2022 at 4:42 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/15/22 13:32, Brian Geffon wrote:
> >> How was this tested, and what do the maintainers of this subsystem
> >> think?  And will you be around to fix the bugs in this when they are
> >> found?
> > This has been trivial to reproduce, I've used a small repro which I've
> > put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
> > , I also was able to reproduce this using the protection_keys self
> > tests on a 11th Gen Core i5-1135G7.
>
> I've got an i7-1165G7, but I'm not seeing any failures on a
> 5.11 distro kernel.
>
> How long does this take for you to reproduce?

It reproduces for me in just a few seconds. I'm not sure what could be
different about my setup, I've tested on a vanilla 5.10 kernel and it
reproduced the same way.

Brian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 21:42             ` Dave Hansen
  2022-02-15 21:48               ` Brian Geffon
@ 2022-02-16  2:01               ` Brian Geffon
  2022-02-16 10:05                 ` Greg KH
  1 sibling, 1 reply; 26+ messages in thread
From: Brian Geffon @ 2022-02-16  2:01 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Greg KH, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On Tue, Feb 15, 2022 at 4:42 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/15/22 13:32, Brian Geffon wrote:
> >> How was this tested, and what do the maintainers of this subsystem
> >> think?  And will you be around to fix the bugs in this when they are
> >> found?
> > This has been trivial to reproduce, I've used a small repro which I've
> > put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
> > , I also was able to reproduce this using the protection_keys self
> > tests on a 11th Gen Core i5-1135G7.
>
> I've got an i7-1165G7, but I'm not seeing any failures on a
> 5.11 distro kernel.
>

Hi Dave,
I suspect the reason you're not seeing it is toolchain related, I'm
building with clang 14.0.0 and it produces the sequence of
instructions which use the cached value. Let me know if there is
anything I can do to help you investigate this further.

Brian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 21:32           ` Brian Geffon
  2022-02-15 21:42             ` Dave Hansen
@ 2022-02-16 10:05             ` Greg KH
  2022-02-16 15:14               ` Brian Geffon
  2022-02-16 15:16               ` Dave Hansen
  1 sibling, 2 replies; 26+ messages in thread
From: Greg KH @ 2022-02-16 10:05 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Dave Hansen, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On Tue, Feb 15, 2022 at 04:32:48PM -0500, Brian Geffon wrote:
> On Tue, Feb 15, 2022 at 2:45 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, Feb 15, 2022 at 11:22:33AM -0800, Brian Geffon wrote:
> > > When eagerly switching PKRU in switch_fpu_finish() it checks that
> > > current is not a kernel thread as kernel threads will never use PKRU.
> > > It's possible that this_cpu_read_stable() on current_task
> > > (ie. get_current()) is returning an old cached value. To resolve this
> > > reference next_p directly rather than relying on current.
> > >
> > > As written it's possible when switching from a kernel thread to a
> > > userspace thread to observe a cached PF_KTHREAD flag and never restore
> > > the PKRU. And as a result this issue only occurs when switching
> > > from a kernel thread to a userspace thread, switching from a non kernel
> > > thread works perfectly fine because all that is considered in that
> > > situation are the flags from some other non kernel task and the next fpu
> > > is passed in to switch_fpu_finish().
> > >
> > > This behavior only exists between 5.2 and 5.13 when it was fixed by a
> > > rewrite decoupling PKRU from xstate, in:
> > >   commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
> > >
> > > Unfortunately backporting the fix from 5.13 is probably not realistic as
> > > it's part of a 60+ patch series which rewrites most of the PKRU handling.
> > >
> > > Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> > > Signed-off-by: Brian Geffon <bgeffon@google.com>
> > > Signed-off-by: Willis Kung <williskung@google.com>
> > > Tested-by: Willis Kung <williskung@google.com>
> > > Cc: <stable@vger.kernel.org> # v5.4.x
> > > Cc: <stable@vger.kernel.org> # v5.10.x
> > > ---
> > >  arch/x86/include/asm/fpu/internal.h | 13 ++++++++-----
> > >  arch/x86/kernel/process_32.c        |  6 ++----
> > >  arch/x86/kernel/process_64.c        |  6 ++----
> > >  3 files changed, 12 insertions(+), 13 deletions(-)
> >
> > So this is ONLY for 5.4.y and 5.10.y?  I'm really really loath to take
> > non-upstream changes as 95% of the time (really) it goes wrong.
> 
> That's correct, this bug was introduced in 5.2 and that code was
> completely refactored in 5.13 indirectly fixing it.

What series of commits contain that work?

And again, why not just take them?  What is wrong with that if this is
such a big issue?

> > How was this tested, and what do the maintainers of this subsystem
> > think?  And will you be around to fix the bugs in this when they are
> > found?
> 
> This has been trivial to reproduce, I've used a small repro which I've
> put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
> , I also was able to reproduce this using the protection_keys self
> tests on a 11th Gen Core i5-1135G7. I'm happy to commit to addressing
> any bugs that may appear. I'll see what the maintainers say, but there
> is also a smaller fix that just involves using this_cpu_read() in
> switch_fpu_finish() for this specific issue, although that approach
> isn't as clean.

Can you add the test to the in-kernel tests so that we make sure it is
fixed and never comes back?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-16  2:01               ` Brian Geffon
@ 2022-02-16 10:05                 ` Greg KH
  0 siblings, 0 replies; 26+ messages in thread
From: Greg KH @ 2022-02-16 10:05 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Dave Hansen, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On Tue, Feb 15, 2022 at 09:01:54PM -0500, Brian Geffon wrote:
> On Tue, Feb 15, 2022 at 4:42 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 2/15/22 13:32, Brian Geffon wrote:
> > >> How was this tested, and what do the maintainers of this subsystem
> > >> think?  And will you be around to fix the bugs in this when they are
> > >> found?
> > > This has been trivial to reproduce, I've used a small repro which I've
> > > put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
> > > , I also was able to reproduce this using the protection_keys self
> > > tests on a 11th Gen Core i5-1135G7.
> >
> > I've got an i7-1165G7, but I'm not seeing any failures on a
> > 5.11 distro kernel.
> >
> 
> Hi Dave,
> I suspect the reason you're not seeing it is toolchain related, I'm
> building with clang 14.0.0 and it produces the sequence of
> instructions which use the cached value. Let me know if there is
> anything I can do to help you investigate this further.

Do older versions of clang have this problem?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-16 10:05             ` Greg KH
@ 2022-02-16 15:14               ` Brian Geffon
  2022-02-16 15:16               ` Dave Hansen
  1 sibling, 0 replies; 26+ messages in thread
From: Brian Geffon @ 2022-02-16 15:14 UTC (permalink / raw)
  To: Greg KH
  Cc: Dave Hansen, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On Wed, Feb 16, 2022 at 5:05 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Feb 15, 2022 at 04:32:48PM -0500, Brian Geffon wrote:
> > On Tue, Feb 15, 2022 at 2:45 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Tue, Feb 15, 2022 at 11:22:33AM -0800, Brian Geffon wrote:
> > > > When eagerly switching PKRU in switch_fpu_finish() it checks that
> > > > current is not a kernel thread as kernel threads will never use PKRU.
> > > > It's possible that this_cpu_read_stable() on current_task
> > > > (ie. get_current()) is returning an old cached value. To resolve this
> > > > reference next_p directly rather than relying on current.
> > > >
> > > > As written it's possible when switching from a kernel thread to a
> > > > userspace thread to observe a cached PF_KTHREAD flag and never restore
> > > > the PKRU. And as a result this issue only occurs when switching
> > > > from a kernel thread to a userspace thread, switching from a non kernel
> > > > thread works perfectly fine because all that is considered in that
> > > > situation are the flags from some other non kernel task and the next fpu
> > > > is passed in to switch_fpu_finish().
> > > >
> > > > This behavior only exists between 5.2 and 5.13 when it was fixed by a
> > > > rewrite decoupling PKRU from xstate, in:
> > > >   commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
> > > >
> > > > Unfortunately backporting the fix from 5.13 is probably not realistic as
> > > > it's part of a 60+ patch series which rewrites most of the PKRU handling.
> > > >
> > > > Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> > > > Signed-off-by: Brian Geffon <bgeffon@google.com>
> > > > Signed-off-by: Willis Kung <williskung@google.com>
> > > > Tested-by: Willis Kung <williskung@google.com>
> > > > Cc: <stable@vger.kernel.org> # v5.4.x
> > > > Cc: <stable@vger.kernel.org> # v5.10.x
> > > > ---
> > > >  arch/x86/include/asm/fpu/internal.h | 13 ++++++++-----
> > > >  arch/x86/kernel/process_32.c        |  6 ++----
> > > >  arch/x86/kernel/process_64.c        |  6 ++----
> > > >  3 files changed, 12 insertions(+), 13 deletions(-)
> > >
> > > So this is ONLY for 5.4.y and 5.10.y?  I'm really really loath to take
> > > non-upstream changes as 95% of the time (really) it goes wrong.
> >
> > That's correct, this bug was introduced in 5.2 and that code was
> > completely refactored in 5.13 indirectly fixing it.
>

Hi Greg,

> What series of commits contain that work?

This is the series,
https://lore.kernel.org/all/20210623120127.327154589@linutronix.de/,
it does quite a bit of cleanup to correct the larger problem of having
pkru merged into xstate.

> And again, why not just take them?  What is wrong with that if this is
> such a big issue?

Anything is possible I suppose but looking through the series it seems
that it's not going to apply cleanly so we're going to end up with
something that, if we made it work, would look very different and
would touch a much larger cross section of code. If the concern here
is risk of things going wrong, attempting to pull back or cherry-pick
parts of this series seems riskier. However, if we don't attempt to
pull back all those patches I will likely be proposing at least one
more small patch for 5.4 and 5.10 to fix PKRU in these kernels because
right now it's broken, particularly on AMD processors as Dave
mentioned.

>
> > > How was this tested, and what do the maintainers of this subsystem
> > > think?  And will you be around to fix the bugs in this when they are
> > > found?
> >
> > This has been trivial to reproduce, I've used a small repro which I've
> > put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
> > , I also was able to reproduce this using the protection_keys self
> > tests on a 11th Gen Core i5-1135G7. I'm happy to commit to addressing
> > any bugs that may appear. I'll see what the maintainers say, but there
> > is also a smaller fix that just involves using this_cpu_read() in
> > switch_fpu_finish() for this specific issue, although that approach
> > isn't as clean.
>
> Can you add the test to the in-kernel tests so that we make sure it is
> fixed and never comes back?

I'm already able to reproduce it with the kernel selftests. To be
honest, I'm not sure why this hasn't been reported yet. I could be
doing something horribly wrong. But it seems the likely reason is that
my compiler is doing what it's allowed to do, which is optimize the
load of current_task. current -> get_current() ->
this_cpu_read_stable(current_task) is allowed to read a cached value.
Perhaps gcc is just not taking advantage of that optimization, I'm not
sure. But writing to current_task and then immediately reading it back
via this_cpu_read_stable() can be problematic for this reason, and the
disassembled code shows that this happening.

Brian

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-16 10:05             ` Greg KH
  2022-02-16 15:14               ` Brian Geffon
@ 2022-02-16 15:16               ` Dave Hansen
  2022-02-17 13:31                 ` Brian Geffon
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2022-02-16 15:16 UTC (permalink / raw)
  To: Greg KH, Brian Geffon
  Cc: Thomas Gleixner, Willis Kung, Guenter Roeck, Borislav Petkov,
	Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On 2/16/22 02:05, Greg KH wrote:
>>> How was this tested, and what do the maintainers of this subsystem
>>> think?  And will you be around to fix the bugs in this when they are
>>> found?
>> This has been trivial to reproduce, I've used a small repro which I've
>> put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
>> , I also was able to reproduce this using the protection_keys self
>> tests on a 11th Gen Core i5-1135G7. I'm happy to commit to addressing
>> any bugs that may appear. I'll see what the maintainers say, but there
>> is also a smaller fix that just involves using this_cpu_read() in
>> switch_fpu_finish() for this specific issue, although that approach
>> isn't as clean.
> Can you add the test to the in-kernel tests so that we make sure it is
> fixed and never comes back?

It would be great if Brian could confirm this.  But, I'm 99% sure that
this can be reproduced in the vm/protection_keys.c selftest, if you run
it for long enough.

The symptom here is corruption of the PKRU register.  I created *lots*
of bugs like this during protection keys development so the selftest
keeps a shadow copy of the register to specifically watch for corruption.

It's _plausible_ that no one ever ran the pkey selftests with a
clang-compiled kernel for long enough to hit this issue.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-16 15:16               ` Dave Hansen
@ 2022-02-17 13:31                 ` Brian Geffon
  2022-02-17 16:44                   ` Dave Hansen
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Geffon @ 2022-02-17 13:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Greg KH, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On Wed, Feb 16, 2022 at 10:16 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/16/22 02:05, Greg KH wrote:
> >>> How was this tested, and what do the maintainers of this subsystem
> >>> think?  And will you be around to fix the bugs in this when they are
> >>> found?
> >> This has been trivial to reproduce, I've used a small repro which I've
> >> put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
> >> , I also was able to reproduce this using the protection_keys self
> >> tests on a 11th Gen Core i5-1135G7. I'm happy to commit to addressing
> >> any bugs that may appear. I'll see what the maintainers say, but there
> >> is also a smaller fix that just involves using this_cpu_read() in
> >> switch_fpu_finish() for this specific issue, although that approach
> >> isn't as clean.
> > Can you add the test to the in-kernel tests so that we make sure it is
> > fixed and never comes back?
>
> It would be great if Brian could confirm this.  But, I'm 99% sure that
> this can be reproduced in the vm/protection_keys.c selftest, if you run
> it for long enough.

Hi Dave,
Yes, this is reproduced by the protection keys selfs tests. If your
kernel was compiled in a way which caches current_task when read via
this_cpu_read_stable(), then when switching from a kernel thread to a
user thread you will observe this behavior, so the only situation
where it's timing related is waiting for that switch from a kernel to
user thread. If your kernel is compiled in a way which does not cache
the value of current_task then you will never observe this behavior.
My understanding is that this is perfectly valid for the compiler to
produce that code.

>
> The symptom here is corruption of the PKRU register.  I created *lots*
> of bugs like this during protection keys development so the selftest
> keeps a shadow copy of the register to specifically watch for corruption.
>
> It's _plausible_ that no one ever ran the pkey selftests with a
> clang-compiled kernel for long enough to hit this issue.

For ChromeOS we use clang and when I tested a vanilla 5.10 kernel
_built with clang_ it also reproduced, so I suspect you're right that
the self tests just haven't been run against clang built kernels all
that often.

How would you and Greg KH like to proceed with this? I'm happy to help
however I can.

Brian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-17 13:31                 ` Brian Geffon
@ 2022-02-17 16:44                   ` Dave Hansen
  2022-02-17 20:42                     ` Brian Geffon
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2022-02-17 16:44 UTC (permalink / raw)
  To: Brian Geffon
  Cc: Greg KH, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On 2/17/22 05:31, Brian Geffon wrote:
> How would you and Greg KH like to proceed with this? I'm happy to help
> however I can.

If I could wave a magic wand, I'd just apply the whole FPU rewrite to
stable.

My second choice would be to stop managing PKRU with XSAVE.
x86_pkru_load() uses WRPKRU instead of XSAVE and keeps the task's PKRU
in task->pkru instead of the XSAVE buffer.  Doing that will take some
care, including pulling XFEATURE_PKRU out of the feature mask (RFBM) at
XRSTOR.  I _think_ that can be done in a manageable set of patches which
 will keep stable close to mainline.  I recognize that more bugs might
get introduced in the process which are unique to stable.

If you give that a shot and realize that it's not feasible to do a
subset, then we can fall back to the minimal fix.  I'm not asking for a
multi-month engineering effort here.  Maybe an hour or two to see if
it's really as scary as it looks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-17 16:44                   ` Dave Hansen
@ 2022-02-17 20:42                     ` Brian Geffon
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Geffon @ 2022-02-17 20:42 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Greg KH, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, # v4 . 10+,
	the arch/x86 maintainers, LKML

On Thu, Feb 17, 2022 at 11:44 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/17/22 05:31, Brian Geffon wrote:
> > How would you and Greg KH like to proceed with this? I'm happy to help
> > however I can.
>
> If I could wave a magic wand, I'd just apply the whole FPU rewrite to
> stable.
>
> My second choice would be to stop managing PKRU with XSAVE.
> x86_pkru_load() uses WRPKRU instead of XSAVE and keeps the task's PKRU
> in task->pkru instead of the XSAVE buffer.  Doing that will take some
> care, including pulling XFEATURE_PKRU out of the feature mask (RFBM) at
> XRSTOR.  I _think_ that can be done in a manageable set of patches which
>  will keep stable close to mainline.  I recognize that more bugs might
> get introduced in the process which are unique to stable.

Hi Dave,
I did take some time to look through that series and pick out what I
think is the minimum set that would pull out PKRU from xstate, that
list is:

   9782a712eb  x86/fpu: Add PKRU storage outside of task XSAVE buffer
   784a46618f   x86/pkeys: Move read_pkru() and write_pkru()
   ff7ebff47c59  x86/pkru: Provide pkru_write_default()
   739e2eec0f   x86/pkru: Provide pkru_get_init_value()
   fa8c84b77a   x86/cpu: Write the default PKRU value when enabling PKE
   72a6c08c44  x86/pkru: Remove xstate fiddling from write_pkru()
   2ebe81c6d8  x86/fpu: Dont restore PKRU in fpregs_restore_userspace()
   71ef453355   x86/kvm: Avoid looking up PKRU in XSAVE buffer
   954436989c  x86/fpu: Remove PKRU handling from switch_fpu_finish()
   e84ba47e31  x86/fpu: Hook up PKRU into ptrace()
   30a304a138  x86/fpu: Mask PKRU from kernel XRSTOR[S] operations
   0e8c54f6b2c  x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate()

The majority of these don't apply cleanly to 5.4, and there are some
other patches we'd have to pull back too that moved code and some of
the those won't be needed for 5.10 though. TBH, I'm not sure it makes
sense to try to do this just given the fact that most do not cleanly
apply.

Brian

>
> If you give that a shot and realize that it's not feasible to do a
> subset, then we can fall back to the minimal fix.  I'm not asking for a
> multi-month engineering effort here.  Maybe an hour or two to see if
> it's really as scary as it looks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-15 19:22       ` [PATCH stable 5.4,5.10] " Brian Geffon
  2022-02-15 19:44         ` Greg KH
@ 2022-02-24 15:16         ` Dave Hansen
  2022-02-25 12:01           ` Greg KH
  1 sibling, 1 reply; 26+ messages in thread
From: Dave Hansen @ 2022-02-24 15:16 UTC (permalink / raw)
  To: Brian Geffon, Thomas Gleixner
  Cc: Willis Kung, Guenter Roeck, Borislav Petkov, Andy Lutomirski,
	stable, x86, linux-kernel

On 2/15/22 11:22, Brian Geffon wrote:
> When eagerly switching PKRU in switch_fpu_finish() it checks that
> current is not a kernel thread as kernel threads will never use PKRU.
> It's possible that this_cpu_read_stable() on current_task
> (ie. get_current()) is returning an old cached value. To resolve this
> reference next_p directly rather than relying on current.
> 
> As written it's possible when switching from a kernel thread to a
> userspace thread to observe a cached PF_KTHREAD flag and never restore
> the PKRU. And as a result this issue only occurs when switching
> from a kernel thread to a userspace thread, switching from a non kernel
> thread works perfectly fine because all that is considered in that
> situation are the flags from some other non kernel task and the next fpu
> is passed in to switch_fpu_finish().
> 
> This behavior only exists between 5.2 and 5.13 when it was fixed by a
> rewrite decoupling PKRU from xstate, in:
>   commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
> 
> Unfortunately backporting the fix from 5.13 is probably not realistic as
> it's part of a 60+ patch series which rewrites most of the PKRU handling.
> 
> Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> Signed-off-by: Brian Geffon <bgeffon@google.com>
> Signed-off-by: Willis Kung <williskung@google.com>
> Tested-by: Willis Kung <williskung@google.com>
> Cc: <stable@vger.kernel.org> # v5.4.x
> Cc: <stable@vger.kernel.org> # v5.10.x

I don't like forking the stable code from mainline.  But I also think
that backporting the FPU reworking that changed the PKRU handling is
likely to cause more bugs in stable than it fixes.

This fix is at least isolated to the protection keys code.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency
  2022-02-24 15:16         ` Dave Hansen
@ 2022-02-25 12:01           ` Greg KH
  0 siblings, 0 replies; 26+ messages in thread
From: Greg KH @ 2022-02-25 12:01 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Brian Geffon, Thomas Gleixner, Willis Kung, Guenter Roeck,
	Borislav Petkov, Andy Lutomirski, stable, x86, linux-kernel

On Thu, Feb 24, 2022 at 07:16:17AM -0800, Dave Hansen wrote:
> On 2/15/22 11:22, Brian Geffon wrote:
> > When eagerly switching PKRU in switch_fpu_finish() it checks that
> > current is not a kernel thread as kernel threads will never use PKRU.
> > It's possible that this_cpu_read_stable() on current_task
> > (ie. get_current()) is returning an old cached value. To resolve this
> > reference next_p directly rather than relying on current.
> > 
> > As written it's possible when switching from a kernel thread to a
> > userspace thread to observe a cached PF_KTHREAD flag and never restore
> > the PKRU. And as a result this issue only occurs when switching
> > from a kernel thread to a userspace thread, switching from a non kernel
> > thread works perfectly fine because all that is considered in that
> > situation are the flags from some other non kernel task and the next fpu
> > is passed in to switch_fpu_finish().
> > 
> > This behavior only exists between 5.2 and 5.13 when it was fixed by a
> > rewrite decoupling PKRU from xstate, in:
> >   commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
> > 
> > Unfortunately backporting the fix from 5.13 is probably not realistic as
> > it's part of a 60+ patch series which rewrites most of the PKRU handling.
> > 
> > Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> > Signed-off-by: Brian Geffon <bgeffon@google.com>
> > Signed-off-by: Willis Kung <williskung@google.com>
> > Tested-by: Willis Kung <williskung@google.com>
> > Cc: <stable@vger.kernel.org> # v5.4.x
> > Cc: <stable@vger.kernel.org> # v5.10.x
> 
> I don't like forking the stable code from mainline.  But I also think
> that backporting the FPU reworking that changed the PKRU handling is
> likely to cause more bugs in stable than it fixes.
> 
> This fix is at least isolated to the protection keys code.
> 
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-02-25 12:01 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-15 15:36 [PATCH] x86/fpu: Correct pkru/xstate inconsistency Brian Geffon
2022-02-15 15:57 ` Guenter Roeck
2022-02-15 16:19   ` Brian Geffon
2022-02-15 17:02     ` Guenter Roeck
2022-02-15 17:10     ` Dave Hansen
2022-02-15 16:20 ` Greg KH
2022-02-15 17:07 ` Dave Hansen
2022-02-15 17:50   ` Brian Geffon
2022-02-15 17:55     ` Dave Hansen
2022-02-15 19:22       ` [PATCH stable 5.4,5.10] " Brian Geffon
2022-02-15 19:44         ` Greg KH
2022-02-15 21:32           ` Brian Geffon
2022-02-15 21:42             ` Dave Hansen
2022-02-15 21:48               ` Brian Geffon
2022-02-16  2:01               ` Brian Geffon
2022-02-16 10:05                 ` Greg KH
2022-02-16 10:05             ` Greg KH
2022-02-16 15:14               ` Brian Geffon
2022-02-16 15:16               ` Dave Hansen
2022-02-17 13:31                 ` Brian Geffon
2022-02-17 16:44                   ` Dave Hansen
2022-02-17 20:42                     ` Brian Geffon
2022-02-24 15:16         ` Dave Hansen
2022-02-25 12:01           ` Greg KH
2022-02-15 21:14   ` [PATCH] " Guenter Roeck
2022-02-15 21:36     ` Brian Geffon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).