linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* How should we handle illegal task FPU state?
@ 2020-10-01 17:43 Andy Lutomirski
  2020-10-01 20:32 ` Yu, Yu-cheng
  2020-10-01 21:26 ` Andrew Cooper
  0 siblings, 2 replies; 10+ messages in thread
From: Andy Lutomirski @ 2020-10-01 17:43 UTC (permalink / raw)
  To: Dave Hansen, Yu-cheng Yu, LKML, X86 ML, Rik van Riel; +Cc: Andrew Cooper

Our current handling of illegal task FPU state is currently rather
simplistic.  We basically ignore the issue with this extable code:

/*
 * Handler for when we fail to restore a task's FPU state.  We should never get
 * here because the FPU state of a task using the FPU (task->thread.fpu.state)
 * should always be valid.  However, past bugs have allowed userspace to set
 * reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
 * These caused XRSTOR to fail when switching to the task, leaking the FPU
 * registers of the task previously executing on the CPU.  Mitigate this class
 * of vulnerability by restoring from the initial state (essentially, zeroing
 * out all the FPU registers) if we can't restore from the task's FPU state.
 */
__visible bool ex_handler_fprestore(const struct exception_table_entry *fixup,
                                    struct pt_regs *regs, int trapnr,
                                    unsigned long error_code,
                                    unsigned long fault_addr)
{
        regs->ip = ex_fixup_addr(fixup);

        WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing
FPU registers.",
                  (void *)instruction_pointer(regs));

        __copy_kernel_to_fpregs(&init_fpstate, -1);
        return true;
}
EXPORT_SYMBOL_GPL(ex_handler_fprestore);

In other words, we mostly pretend that illegal FPU state can't happen,
and, if it happens, we print a WARN and we blindly run the task with
the wrong state.  This is at least an improvement from the previous
code -- see

commit d5c8028b4788f62b31fb79a331b3ad3e041fa366
Author: Eric Biggers <ebiggers@google.com>
Date:   Sat Sep 23 15:00:09 2017 +0200

    x86/fpu: Reinitialize FPU registers if restoring FPU state fails

And we have some code that tries to sanitize user state to avoid this.

IMO this all made a little bit of sense when "FPU" meant literally FPU
or at least state that was more or less just user registers.  But now
we have this fancy "supervisor" state, and I don't think we should be
running user code in a context with potentially corrupted or even
potentially incorrectly re-initialized supervisor state.  This is an
issue for SHSTK -- if an attacker can find a straightforward way to
corrupt a target task's FPU state, then that task will run with CET
disabled.  Whoops!

The question is: what do we do about it?  We have two basic choices, I think.

a) Decide that the saved FPU for a task *must* be valid at all times.
If there's a failure to restore state, kill the task.

b) Improve our failed restoration handling and maybe even
intentionally make it possible to create illegal state to allow
testing.

(a) sounds like a nice concept, but I'm not convinced it's practical.
For example, I'm not even convinced that the set of valid SSP values
is documented.

So maybe (b) is the right choice.  Getting a good implementation might
be tricky.  Right now, we restore FPU too late in
arch_exit_to_user_mode_prepare(), and that function isn't allowed to
fail or to send signals.  We could kill the task on failure, and I
suppose we could consider queueing a signal, sending IPI-to-self, and
returning with TIF_NEED_FPU_LOAD still set and bogus state.  Or we
could rework the exit-to-usermode code to allow failure.  All of this
becomes utterly gross for the return-from-NMI path, although I guess
we don't restore FPU regs in that path regardless.  Or we can
do_exit() and just bail outright.

I think it would be polite to at least allow core dumping a bogus FPU
state, and notifying ptrace() might be nice.  And, if the bogus part
of the FPU state is non-supervisor, we could plausibly deliver a
signal, but this is (as above) potentially quite difficult.

(As an aside, our current handling of signal delivery failure sucks.
We should *at least* log something useful.)


Regardless of how we decide to handle this, I do think we need to do
*something* before applying the CET patches.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How should we handle illegal task FPU state?
  2020-10-01 17:43 How should we handle illegal task FPU state? Andy Lutomirski
@ 2020-10-01 20:32 ` Yu, Yu-cheng
  2020-10-01 20:58   ` Sean Christopherson
  2020-10-01 21:26 ` Andrew Cooper
  1 sibling, 1 reply; 10+ messages in thread
From: Yu, Yu-cheng @ 2020-10-01 20:32 UTC (permalink / raw)
  To: Andy Lutomirski, Dave Hansen, LKML, X86 ML, Rik van Riel; +Cc: Andrew Cooper

On 10/1/2020 10:43 AM, Andy Lutomirski wrote:
> Our current handling of illegal task FPU state is currently rather
> simplistic.  We basically ignore the issue with this extable code:
> 
> /*
>   * Handler for when we fail to restore a task's FPU state.  We should never get
>   * here because the FPU state of a task using the FPU (task->thread.fpu.state)
>   * should always be valid.  However, past bugs have allowed userspace to set
>   * reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
>   * These caused XRSTOR to fail when switching to the task, leaking the FPU
>   * registers of the task previously executing on the CPU.  Mitigate this class
>   * of vulnerability by restoring from the initial state (essentially, zeroing
>   * out all the FPU registers) if we can't restore from the task's FPU state.
>   */
> __visible bool ex_handler_fprestore(const struct exception_table_entry *fixup,
>                                      struct pt_regs *regs, int trapnr,
>                                      unsigned long error_code,
>                                      unsigned long fault_addr)
> {
>          regs->ip = ex_fixup_addr(fixup);
> 
>          WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing
> FPU registers.",
>                    (void *)instruction_pointer(regs));
> 
>          __copy_kernel_to_fpregs(&init_fpstate, -1);
>          return true;
> }
> EXPORT_SYMBOL_GPL(ex_handler_fprestore);
> 
> In other words, we mostly pretend that illegal FPU state can't happen,
> and, if it happens, we print a WARN and we blindly run the task with
> the wrong state.  This is at least an improvement from the previous
> code -- see
> 
> commit d5c8028b4788f62b31fb79a331b3ad3e041fa366
> Author: Eric Biggers <ebiggers@google.com>
> Date:   Sat Sep 23 15:00:09 2017 +0200
> 
>      x86/fpu: Reinitialize FPU registers if restoring FPU state fails
> 
> And we have some code that tries to sanitize user state to avoid this.
> 
> IMO this all made a little bit of sense when "FPU" meant literally FPU
> or at least state that was more or less just user registers.  But now
> we have this fancy "supervisor" state, and I don't think we should be
> running user code in a context with potentially corrupted or even
> potentially incorrectly re-initialized supervisor state.  This is an
> issue for SHSTK -- if an attacker can find a straightforward way to
> corrupt a target task's FPU state, then that task will run with CET
> disabled.  Whoops!
> 
> The question is: what do we do about it?  We have two basic choices, I think.
> 
> a) Decide that the saved FPU for a task *must* be valid at all times.
> If there's a failure to restore state, kill the task.
> 
> b) Improve our failed restoration handling and maybe even
> intentionally make it possible to create illegal state to allow
> testing.
> 
> (a) sounds like a nice concept, but I'm not convinced it's practical.
> For example, I'm not even convinced that the set of valid SSP values
> is documented.
> 
> So maybe (b) is the right choice.  Getting a good implementation might
> be tricky.  Right now, we restore FPU too late in
> arch_exit_to_user_mode_prepare(), and that function isn't allowed to
> fail or to send signals.  We could kill the task on failure, and I
> suppose we could consider queueing a signal, sending IPI-to-self, and
> returning with TIF_NEED_FPU_LOAD still set and bogus state.  Or we
> could rework the exit-to-usermode code to allow failure.  All of this
> becomes utterly gross for the return-from-NMI path, although I guess
> we don't restore FPU regs in that path regardless.  Or we can
> do_exit() and just bail outright.
> 
> I think it would be polite to at least allow core dumping a bogus FPU
> state, and notifying ptrace() might be nice.  And, if the bogus part
> of the FPU state is non-supervisor, we could plausibly deliver a
> signal, but this is (as above) potentially quite difficult.
> 
> (As an aside, our current handling of signal delivery failure sucks.
> We should *at least* log something useful.)
> 
> 
> Regardless of how we decide to handle this, I do think we need to do
> *something* before applying the CET patches.
> 

Before supervisor states are introduced, XRSTOR* fails because one of 
the following: memory operand is invalid, xstate_header is wrong, or 
fxregs_state->mxcsr is wrong.  So the code in ex_handler_fprestore() was 
good.

When supervisor states are introduced for CET and PASID, XRSTORS can 
fail for only one additional reason: if it effects a WRMSR of invalid 
values.

If the kernel writes to the MSRs directly, there is wrmsr_safe().  If 
the kernel writes to MSRs' xstates, it can check the values first.  So 
this might not need a generalized handling (but I would not oppose it). 
Maybe we can add a config debug option to check if any writes to those 
MSR xstates are checked before being written (and print out warnings 
when not)?

Thanks,
Yu-cheng

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How should we handle illegal task FPU state?
  2020-10-01 20:32 ` Yu, Yu-cheng
@ 2020-10-01 20:58   ` Sean Christopherson
  2020-10-01 21:42     ` Andrew Cooper
  2020-10-01 21:50     ` Dave Hansen
  0 siblings, 2 replies; 10+ messages in thread
From: Sean Christopherson @ 2020-10-01 20:58 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: Andy Lutomirski, Dave Hansen, LKML, X86 ML, Rik van Riel, Andrew Cooper

On Thu, Oct 01, 2020 at 01:32:04PM -0700, Yu, Yu-cheng wrote:
> On 10/1/2020 10:43 AM, Andy Lutomirski wrote:
> >The question is: what do we do about it?  We have two basic choices, I think.
> >
> >a) Decide that the saved FPU for a task *must* be valid at all times.
> >If there's a failure to restore state, kill the task.
> >
> >b) Improve our failed restoration handling and maybe even
> >intentionally make it possible to create illegal state to allow
> >testing.
> >
> >(a) sounds like a nice concept, but I'm not convinced it's practical.
> >For example, I'm not even convinced that the set of valid SSP values
> >is documented.

Eh, crappy SDM writing isn't a good reason to make our lives harder.  The
SSP MSRs are canonical MSRs and follow the same rules as the SYSCALL,
FS/GS BASE, etc... MSRs.  I'll file an SDM bug.

> >So maybe (b) is the right choice.  Getting a good implementation might
> >be tricky.  Right now, we restore FPU too late in
> >arch_exit_to_user_mode_prepare(), and that function isn't allowed to
> >fail or to send signals.  We could kill the task on failure, and I
> >suppose we could consider queueing a signal, sending IPI-to-self, and
> >returning with TIF_NEED_FPU_LOAD still set and bogus state.  Or we
> >could rework the exit-to-usermode code to allow failure.  All of this
> >becomes utterly gross for the return-from-NMI path, although I guess
> >we don't restore FPU regs in that path regardless.  Or we can
> >do_exit() and just bail outright.
> >
> >I think it would be polite to at least allow core dumping a bogus FPU
> >state, and notifying ptrace() might be nice.  And, if the bogus part
> >of the FPU state is non-supervisor, we could plausibly deliver a
> >signal, but this is (as above) potentially quite difficult.
> >
> >(As an aside, our current handling of signal delivery failure sucks.
> >We should *at least* log something useful.)
> >
> >
> >Regardless of how we decide to handle this, I do think we need to do
> >*something* before applying the CET patches.
> >
> 
> Before supervisor states are introduced, XRSTOR* fails because one of the
> following: memory operand is invalid, xstate_header is wrong, or
> fxregs_state->mxcsr is wrong.  So the code in ex_handler_fprestore() was
> good.
> 
> When supervisor states are introduced for CET and PASID, XRSTORS can fail
> for only one additional reason: if it effects a WRMSR of invalid values.
> 
> If the kernel writes to the MSRs directly, there is wrmsr_safe().  If the
> kernel writes to MSRs' xstates, it can check the values first.  So this
> might not need a generalized handling (but I would not oppose it). Maybe we
> can add a config debug option to check if any writes to those MSR xstates
> are checked before being written (and print out warnings when not)?

That's not really checking the values first though, e.g. if the WRMSR succeeds,
which is the common case, but a later WRMSR fails, then you have to back out
the first MSR.  Even if all goes well, each WRMSR is 125+ cycles, which means
that loading state would get very painful and would defeat the entire reason
for shoving CET into XSAVE state.

Having a try-catch variant at the lowest level, i.e. propagating errors to the
the caller, and building on that sounds appealing.  E.g. KVM could use the
try-catch to test that incoming XSAVE state is valid when userspace is stuffing
guest state instead of manually validating every piece.  Validating CET and
PASID won't be too painful, but there might be a breaking point if the current
trend of shoving everything into XSAVE continues.

One thought for a lowish effort approach to pave the way for CET would be to
try XRSTORS multiple times in switch_fpu_return().  If the first try fails,
then WARN, init non-supervisor state and try a second time, and if _that_ fails
then kill the task.  I.e. do the minimum effort to play nice with bad FPU
state, but don't let anything "accidentally" turn off CET.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How should we handle illegal task FPU state?
  2020-10-01 17:43 How should we handle illegal task FPU state? Andy Lutomirski
  2020-10-01 20:32 ` Yu, Yu-cheng
@ 2020-10-01 21:26 ` Andrew Cooper
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2020-10-01 21:26 UTC (permalink / raw)
  To: Andy Lutomirski, Dave Hansen, Yu-cheng Yu, LKML, X86 ML, Rik van Riel

On 01/10/2020 18:43, Andy Lutomirski wrote:
> Our current handling of illegal task FPU state is currently rather
> simplistic.  We basically ignore the issue with this extable code:
>
> /*
>  * Handler for when we fail to restore a task's FPU state.  We should never get
>  * here because the FPU state of a task using the FPU (task->thread.fpu.state)
>  * should always be valid.  However, past bugs have allowed userspace to set
>  * reserved bits in the XSAVE area using PTRACE_SETREGSET or sys_rt_sigreturn().
>  * These caused XRSTOR to fail when switching to the task, leaking the FPU
>  * registers of the task previously executing on the CPU.  Mitigate this class
>  * of vulnerability by restoring from the initial state (essentially, zeroing
>  * out all the FPU registers) if we can't restore from the task's FPU state.
>  */
> __visible bool ex_handler_fprestore(const struct exception_table_entry *fixup,
>                                     struct pt_regs *regs, int trapnr,
>                                     unsigned long error_code,
>                                     unsigned long fault_addr)
> {
>         regs->ip = ex_fixup_addr(fixup);
>
>         WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
>                   (void *)instruction_pointer(regs));
>
>         __copy_kernel_to_fpregs(&init_fpstate, -1);
>         return true;
> }
> EXPORT_SYMBOL_GPL(ex_handler_fprestore);
>
> In other words, we mostly pretend that illegal FPU state can't happen,
> and, if it happens, we print a WARN and we blindly run the task with
> the wrong state.  This is at least an improvement from the previous
> code -- see
>
> commit d5c8028b4788f62b31fb79a331b3ad3e041fa366
> Author: Eric Biggers <ebiggers@google.com>
> Date:   Sat Sep 23 15:00:09 2017 +0200
>
>     x86/fpu: Reinitialize FPU registers if restoring FPU state fails
>
> And we have some code that tries to sanitize user state to avoid this.
>
> IMO this all made a little bit of sense when "FPU" meant literally FPU
> or at least state that was more or less just user registers.  But now
> we have this fancy "supervisor" state, and I don't think we should be
> running user code in a context with potentially corrupted or even
> potentially incorrectly re-initialized supervisor state.  This is an
> issue for SHSTK -- if an attacker can find a straightforward way to
> corrupt a target task's FPU state, then that task will run with CET
> disabled.  Whoops!

-1 would not recommend.

> The question is: what do we do about it?  We have two basic choices, I think.
>
> a) Decide that the saved FPU for a task *must* be valid at all times.
> If there's a failure to restore state, kill the task.
>
> b) Improve our failed restoration handling and maybe even
> intentionally make it possible to create illegal state to allow
> testing.
>
> (a) sounds like a nice concept, but I'm not convinced it's practical.
> For example, I'm not even convinced that the set of valid SSP values
> is documented.

SSP is just a stack pointer, but its not included in XSTATE.

The CET_U XSTATE component contains MSR_PL3_SSP and MSR_U_CET, both of
which have well defined validity descriptions in the manual.

The CET_S XSTATE component contains MSR_PL{2..0}_SSP, but this will be
of no interest to Linux at all.

As these can only be loaded in supervisor mode, neither operate on the
currently active SSP.

Given how broken Rings 1 and 2 are by CET-SS, I'm very surprised that an
XSTATE was allocated for this purpose.  Nothing in the architecture ever
updates these values in hardware, so they're never modified since the
last XRSTORS, unless someone played with the MSRs directly.

> So maybe (b) is the right choice.  Getting a good implementation might
> be tricky.  Right now, we restore FPU too late in
> arch_exit_to_user_mode_prepare(), and that function isn't allowed to
> fail or to send signals.  We could kill the task on failure, and I
> suppose we could consider queueing a signal, sending IPI-to-self, and
> returning with TIF_NEED_FPU_LOAD still set and bogus state.  Or we
> could rework the exit-to-usermode code to allow failure.  All of this
> becomes utterly gross for the return-from-NMI path, although I guess
> we don't restore FPU regs in that path regardless.  Or we can
> do_exit() and just bail outright.
>
> I think it would be polite to at least allow core dumping a bogus FPU
> state, and notifying ptrace() might be nice.  And, if the bogus part
> of the FPU state is non-supervisor, we could plausibly deliver a
> signal, but this is (as above) potentially quite difficult.
>
> (As an aside, our current handling of signal delivery failure sucks.
> We should *at least* log something useful.)
>
>
> Regardless of how we decide to handle this, I do think we need to do
> *something* before applying the CET patches.

Conceptually, a fault on [F]XRSTOR[S] ought to be fatal.  Something
corrupt there is definitely an exceptional circumstance.

Making it accessible in a coredump is a nice touch.  In theory it should
be impossible, but CPUs don't have a perfect track record of being able
to consume the data they emit - the Haswell/Broadwell TSX errata with
LBR MSRs come to mind, even if it wasn't XSTATE.

Identifying the faulty state isn't necessarily trivial.  Even presuming
no memory corruption, it could be one or multiple of the components, and
even if you can find the bad one(s), this issue demonstrates that
zeroing it isn't necessarily the safe thing to do.

This condition is going to be a slowpath anyway.  What about a fixup
function which runs across an arbitrary XSTATE blob, and tries to
sanitises it?  This can be reused by software (i.e. non-[F]XSAVE[S]
sources) of XSTATE to reject bad input, and on fault to try and cope
with fallout.  If the fixed up version faults again, then you have to
terminate the task anyway.

Whatever happens, even if running the task with fixed up state, the
fixup action needs to be very loud in the logs, so someone will
investigate and fix the root cause.

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How should we handle illegal task FPU state?
  2020-10-01 20:58   ` Sean Christopherson
@ 2020-10-01 21:42     ` Andrew Cooper
  2020-10-01 21:50     ` Dave Hansen
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2020-10-01 21:42 UTC (permalink / raw)
  To: Sean Christopherson, Yu, Yu-cheng
  Cc: Andy Lutomirski, Dave Hansen, LKML, X86 ML, Rik van Riel

On 01/10/2020 21:58, Sean Christopherson wrote:
> On Thu, Oct 01, 2020 at 01:32:04PM -0700, Yu, Yu-cheng wrote:
>> On 10/1/2020 10:43 AM, Andy Lutomirski wrote:
>>> The question is: what do we do about it?  We have two basic choices, I think.
>>>
>>> a) Decide that the saved FPU for a task *must* be valid at all times.
>>> If there's a failure to restore state, kill the task.
>>>
>>> b) Improve our failed restoration handling and maybe even
>>> intentionally make it possible to create illegal state to allow
>>> testing.
>>>
>>> (a) sounds like a nice concept, but I'm not convinced it's practical.
>>> For example, I'm not even convinced that the set of valid SSP values
>>> is documented.
> Eh, crappy SDM writing isn't a good reason to make our lives harder.  The
> SSP MSRs are canonical MSRs and follow the same rules as the SYSCALL,
> FS/GS BASE, etc... MSRs.  I'll file an SDM bug.

Don't forget the added constraint of being 4 byte aligned. ;)

But the SDM is fine in this regard, at least as far as Vol4 goes, even
if does have an excessively verbose way of expressing itself.

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How should we handle illegal task FPU state?
  2020-10-01 20:58   ` Sean Christopherson
  2020-10-01 21:42     ` Andrew Cooper
@ 2020-10-01 21:50     ` Dave Hansen
  2020-10-01 22:04       ` Andy Lutomirski
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Hansen @ 2020-10-01 21:50 UTC (permalink / raw)
  To: Sean Christopherson, Yu, Yu-cheng
  Cc: Andy Lutomirski, LKML, X86 ML, Rik van Riel, Andrew Cooper

On 10/1/20 1:58 PM, Sean Christopherson wrote:
> One thought for a lowish effort approach to pave the way for CET would be to
> try XRSTORS multiple times in switch_fpu_return().  If the first try fails,
> then WARN, init non-supervisor state and try a second time, and if _that_ fails
> then kill the task.  I.e. do the minimum effort to play nice with bad FPU
> state, but don't let anything "accidentally" turn off CET.

I'm not sure we should ever keep running userspace after an XRSTOR*
failure.  For MPX, this might have provided a nice, additional vector
for an attacker to turn off MPX.  Same for pkeys if we didn't correctly
differentiate between the hardware init state versus the "software init"
state that we keep in init_task.

What's the advantage of letting userspace keep running after we init its
state?  That it _might_ be able to recover?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How should we handle illegal task FPU state?
  2020-10-01 21:50     ` Dave Hansen
@ 2020-10-01 22:04       ` Andy Lutomirski
  2020-10-08 18:08         ` Yu, Yu-cheng
  2020-11-02 18:39         ` Borislav Petkov
  0 siblings, 2 replies; 10+ messages in thread
From: Andy Lutomirski @ 2020-10-01 22:04 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sean Christopherson, Yu, Yu-cheng, Andy Lutomirski, LKML, X86 ML,
	Rik van Riel, Andrew Cooper

On Thu, Oct 1, 2020 at 2:50 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 10/1/20 1:58 PM, Sean Christopherson wrote:
> > One thought for a lowish effort approach to pave the way for CET would be to
> > try XRSTORS multiple times in switch_fpu_return().  If the first try fails,
> > then WARN, init non-supervisor state and try a second time, and if _that_ fails
> > then kill the task.  I.e. do the minimum effort to play nice with bad FPU
> > state, but don't let anything "accidentally" turn off CET.
>
> I'm not sure we should ever keep running userspace after an XRSTOR*
> failure.  For MPX, this might have provided a nice, additional vector
> for an attacker to turn off MPX.  Same for pkeys if we didn't correctly
> differentiate between the hardware init state versus the "software init"
> state that we keep in init_task.
>
> What's the advantage of letting userspace keep running after we init its
> state?  That it _might_ be able to recover?

I suppose we can kill userspace and change that behavior only if
someone complains.  I still think it would be polite to try to dump
core, but that could be tricky with the current code structure.  I'll
try to whip up a patch.  Maybe I'll add a debugfs file to trash MXCSR
for testing.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How should we handle illegal task FPU state?
  2020-10-01 22:04       ` Andy Lutomirski
@ 2020-10-08 18:08         ` Yu, Yu-cheng
  2020-10-09  0:08           ` Andy Lutomirski
  2020-11-02 18:39         ` Borislav Petkov
  1 sibling, 1 reply; 10+ messages in thread
From: Yu, Yu-cheng @ 2020-10-08 18:08 UTC (permalink / raw)
  To: Andy Lutomirski, Dave Hansen
  Cc: Sean Christopherson, LKML, X86 ML, Rik van Riel, Andrew Cooper

On 10/1/2020 3:04 PM, Andy Lutomirski wrote:
> On Thu, Oct 1, 2020 at 2:50 PM Dave Hansen <dave.hansen@intel.com> wrote:
>>
>> On 10/1/20 1:58 PM, Sean Christopherson wrote:
>>> One thought for a lowish effort approach to pave the way for CET would be to
>>> try XRSTORS multiple times in switch_fpu_return().  If the first try fails,
>>> then WARN, init non-supervisor state and try a second time, and if _that_ fails
>>> then kill the task.  I.e. do the minimum effort to play nice with bad FPU
>>> state, but don't let anything "accidentally" turn off CET.
>>
>> I'm not sure we should ever keep running userspace after an XRSTOR*
>> failure.  For MPX, this might have provided a nice, additional vector
>> for an attacker to turn off MPX.  Same for pkeys if we didn't correctly
>> differentiate between the hardware init state versus the "software init"
>> state that we keep in init_task.
>>
>> What's the advantage of letting userspace keep running after we init its
>> state?  That it _might_ be able to recover?
> 
> I suppose we can kill userspace and change that behavior only if
> someone complains.  I still think it would be polite to try to dump
> core, but that could be tricky with the current code structure.  I'll
> try to whip up a patch.  Maybe I'll add a debugfs file to trash MXCSR
> for testing.
> 

One complication of letting XRSTORS fail is exit_to_user_mode_prepare() 
will need to go back to exit_to_user_mode_loop() again (or repeat some 
parts of it).

Currently, when exit_to_user_mode_loop() exits, xstates should have been 
validated earlier and to be restored shortly.  At this stage, XRSTORS 
should not fault.  If we need to kill the task, we should have done that 
earlier.

There are only two sources of invalid xstates: PTRACE and sigreturn. 
For user-mode data, both sources have been taken care of. 
Supervisor-mode data can be checked/reviewed easily.  There are only CET 
and PASID supervisor states.

Yu-cheng

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How should we handle illegal task FPU state?
  2020-10-08 18:08         ` Yu, Yu-cheng
@ 2020-10-09  0:08           ` Andy Lutomirski
  0 siblings, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2020-10-09  0:08 UTC (permalink / raw)
  To: Yu, Yu-cheng
  Cc: Andy Lutomirski, Dave Hansen, Sean Christopherson, LKML, X86 ML,
	Rik van Riel, Andrew Cooper

On Thu, Oct 8, 2020 at 11:08 AM Yu, Yu-cheng <yu-cheng.yu@intel.com> wrote:
>
> On 10/1/2020 3:04 PM, Andy Lutomirski wrote:
> > On Thu, Oct 1, 2020 at 2:50 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >>
> >> On 10/1/20 1:58 PM, Sean Christopherson wrote:
> >>> One thought for a lowish effort approach to pave the way for CET would be to
> >>> try XRSTORS multiple times in switch_fpu_return().  If the first try fails,
> >>> then WARN, init non-supervisor state and try a second time, and if _that_ fails
> >>> then kill the task.  I.e. do the minimum effort to play nice with bad FPU
> >>> state, but don't let anything "accidentally" turn off CET.
> >>
> >> I'm not sure we should ever keep running userspace after an XRSTOR*
> >> failure.  For MPX, this might have provided a nice, additional vector
> >> for an attacker to turn off MPX.  Same for pkeys if we didn't correctly
> >> differentiate between the hardware init state versus the "software init"
> >> state that we keep in init_task.
> >>
> >> What's the advantage of letting userspace keep running after we init its
> >> state?  That it _might_ be able to recover?
> >
> > I suppose we can kill userspace and change that behavior only if
> > someone complains.  I still think it would be polite to try to dump
> > core, but that could be tricky with the current code structure.  I'll
> > try to whip up a patch.  Maybe I'll add a debugfs file to trash MXCSR
> > for testing.
> >
>
> One complication of letting XRSTORS fail is exit_to_user_mode_prepare()
> will need to go back to exit_to_user_mode_loop() again (or repeat some
> parts of it).
>
> Currently, when exit_to_user_mode_loop() exits, xstates should have been
> validated earlier and to be restored shortly.  At this stage, XRSTORS
> should not fault.  If we need to kill the task, we should have done that
> earlier.

We can still do_exit().  I'll ponder this.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: How should we handle illegal task FPU state?
  2020-10-01 22:04       ` Andy Lutomirski
  2020-10-08 18:08         ` Yu, Yu-cheng
@ 2020-11-02 18:39         ` Borislav Petkov
  1 sibling, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2020-11-02 18:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Sean Christopherson, Yu, Yu-cheng, LKML, X86 ML,
	Rik van Riel, Andrew Cooper

On Thu, Oct 01, 2020 at 03:04:48PM -0700, Andy Lutomirski wrote:
> On Thu, Oct 1, 2020 at 2:50 PM Dave Hansen <dave.hansen@intel.com> wrote:
> > I'm not sure we should ever keep running userspace after an XRSTOR*
> > failure.  For MPX, this might have provided a nice, additional vector
> > for an attacker to turn off MPX.  Same for pkeys if we didn't correctly
> > differentiate between the hardware init state versus the "software init"
> > state that we keep in init_task.
> >
> > What's the advantage of letting userspace keep running after we init its
> > state?  That it _might_ be able to recover?
> 
> I suppose we can kill userspace and change that behavior only if
> someone complains.  I still think it would be polite to try to dump
> core, but that could be tricky with the current code structure.  I'll
> try to whip up a patch.  Maybe I'll add a debugfs file to trash MXCSR
> for testing.

Just for the record, I like this: safe and simple. We can always do
smarter shenanigans later, if at all needed, that is.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-11-02 18:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-01 17:43 How should we handle illegal task FPU state? Andy Lutomirski
2020-10-01 20:32 ` Yu, Yu-cheng
2020-10-01 20:58   ` Sean Christopherson
2020-10-01 21:42     ` Andrew Cooper
2020-10-01 21:50     ` Dave Hansen
2020-10-01 22:04       ` Andy Lutomirski
2020-10-08 18:08         ` Yu, Yu-cheng
2020-10-09  0:08           ` Andy Lutomirski
2020-11-02 18:39         ` Borislav Petkov
2020-10-01 21:26 ` Andrew Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).