Re: irq_fpu_usable() is irreliable

* Re: irq_fpu_usable() is irreliable
       [not found] <CAHmME9rgXh3zQDfc2Yo_Au0CSg--X+ak=SQdS9DoXNsKK0TPmA@mail.gmail.com>
@ 2015-11-17 14:06 ` Thomas Gleixner
  2015-11-17 14:51   ` Jason A. Donenfeld
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2015-11-17 14:06 UTC
  To: Jason A. Donenfeld; +Cc: mingo, hpa, LKML

Jason,

On Tue, 17 Nov 2015, Jason A. Donenfeld wrote:
> The availability of the FPU in kernel space, as you know, is determined by
> this function:
> 
> bool irq_fpu_usable(void)
> {
>         return !in_interrupt() ||
>                 interrupted_user_mode() ||
>                 interrupted_kernel_fpu_idle();
> }
> 
> My understanding is that the first check is !in_interrupt(), because if
> `current` is valid - if we are in process context - then we have a place to
> store the existing FPU regs in kernel_fpu_begin, to be restored later in
> kernel_fpu_end. Recently I've been tracking down a problem in
> which irq_fpu_usable() returns false, yet a stack trace shows the first
> function is the syscall entry point. This leads me to believe that
> in_interrupt() is not an adequate way of testing for a valid `current`.

This function has absolutely nothing to do with current. current is
always valid. The function checks whether we can use the fpu safely in
kernel context.

> In my particular problematic case, in_interrupt() was returning true
> because a number of rcu_read_lock_bh()s were being held; IOW this is
> occurring in the ndo_start_xmit path of a network driver.
>
> I therefore propose changing the function to this:
> 
> bool irq_fpu_usable(void)
> {
>         return (!in_irq() && !in_nmi()) ||
>                 interrupted_user_mode() ||
>                 interrupted_kernel_fpu_idle();
> }
> 
> What would you think of that?

That's broken. Assume we interrupted a kernel thread which fiddles
with the FPU and then on irq exit we run a softirq which tries to use
the FPU....
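
The scenario above can be sketched as a small userspace model. This is
only an illustration: struct ctx, its fields, and the helper functions are
hypothetical stand-ins for kernel state, not the kernel's implementation.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the execution context (not a kernel type). */
struct ctx {
	bool in_hardirq;        /* models in_irq() */
	bool in_nmi;            /* models in_nmi() */
	bool softirq_or_bh_off; /* softirq running or BH disabled */
	bool interrupted_user;  /* models interrupted_user_mode() */
	bool kernel_fpu_idle;   /* models interrupted_kernel_fpu_idle() */
};

/* Models the existing check quoted at the top of the mail. */
bool usable_existing(const struct ctx *c)
{
	bool in_interrupt = c->in_hardirq || c->in_nmi || c->softirq_or_bh_off;

	return !in_interrupt || c->interrupted_user || c->kernel_fpu_idle;
}

/* Models the proposed check, which ignores softirq context. */
bool usable_proposed(const struct ctx *c)
{
	return (!c->in_hardirq && !c->in_nmi) ||
		c->interrupted_user ||
		c->kernel_fpu_idle;
}

/*
 * A hardirq interrupted a kernel thread in the middle of FPU use, and a
 * softirq now runs on irq exit: no longer in hardirq, but the FPU is busy.
 */
struct ctx softirq_over_kernel_fpu_user(void)
{
	struct ctx c = {
		.in_hardirq = false,
		.in_nmi = false,
		.softirq_or_bh_off = true,
		.interrupted_user = false,
		.kernel_fpu_idle = false,
	};

	return c;
}
```

In this model the existing check refuses FPU use for that context, while
the proposed one allows it, so the softirq would clobber the FPU state of
the interrupted kernel thread.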

The real question in your case is WHY interrupted_kernel_fpu_idle()
returns false. We know for sure that in a syscall with BH disabled the
first two checks are false.
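
Why the first check fails with BH disabled can be seen in a simplified
model of the preempt counter. The bit layout below is an assumption
modeled on the kernel's preempt_count fields, and every model_* name is
hypothetical; treat it as a sketch, not as the kernel code.

```c
#include <stdbool.h>

/* Assumed layout: softirq count in bits 8-15, hardirq in 16-19, NMI in 20. */
#define MODEL_SOFTIRQ_SHIFT	8
#define MODEL_HARDIRQ_SHIFT	16
#define MODEL_NMI_SHIFT		20
#define MODEL_SOFTIRQ_MASK	(0xffu << MODEL_SOFTIRQ_SHIFT)
#define MODEL_HARDIRQ_MASK	(0xfu << MODEL_HARDIRQ_SHIFT)
#define MODEL_NMI_MASK		(0x1u << MODEL_NMI_SHIFT)
/* Disabling BH bumps the softirq field, so it counts as "in interrupt". */
#define MODEL_BH_DISABLE_OFFSET	(2u << MODEL_SOFTIRQ_SHIFT)

unsigned int model_preempt_count; /* stand-in for preempt_count() */

void model_bh_disable(void)
{
	model_preempt_count += MODEL_BH_DISABLE_OFFSET;
}

void model_bh_enable(void)
{
	model_preempt_count -= MODEL_BH_DISABLE_OFFSET;
}

/* True for hardirq, NMI, softirq, and also for BH-disabled sections. */
bool model_in_interrupt(void)
{
	return (model_preempt_count &
		(MODEL_SOFTIRQ_MASK | MODEL_HARDIRQ_MASK | MODEL_NMI_MASK)) != 0;
}

/* True only while a hardirq handler is running. */
bool model_in_irq(void)
{
	return (model_preempt_count & MODEL_HARDIRQ_MASK) != 0;
}

/* Same shape as irq_fpu_usable(); the last two inputs are supplied by hand. */
bool model_irq_fpu_usable(bool interrupted_user, bool kernel_fpu_idle)
{
	return !model_in_interrupt() || interrupted_user || kernel_fpu_idle;
}
```

After model_bh_disable(), model_in_interrupt() is true although no
interrupt handler is running, so the result depends entirely on the
third argument, which corresponds to interrupted_kernel_fpu_idle().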

Thanks,

	tglx
