From: Thomas Gleixner <tglx@linutronix.de>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	x86@kernel.org, Filipe Manana <fdmanana@suse.com>,
	Vadim Galitsin <vadim.galitsyn@oracle.com>
Subject: Re: [patch 0/3] x86/fpu: Prevent FPU state corruption
Date: Wed, 18 May 2022 15:09:54 +0200
Message-ID: <87fsl7j8bh.ffs@tglx>
In-Reply-To: <YoRFjTIzMYZu8Hq8@zx2c4.com>

On Wed, May 18 2022 at 03:02, Jason A. Donenfeld wrote:
> On Wed, May 04, 2022 at 05:40:26PM +0200, Jason A. Donenfeld wrote:
>> On Sun, May 01, 2022 at 09:31:42PM +0200, Thomas Gleixner wrote:
>> > The recent changes in the random code unearthed a long-standing FPU state
>> > corruption due to a buggy condition for granting in-kernel FPU usage.
>>  
>> Thanks for working that out. I've been banging my head over [1] for a
>> few days now trying to see if it's a mis-bisect or a real thing. I'll
>> ask Larry to retry with this patchset.
>
> So, Larry's debugging was inconsistent and didn't result in anything I
> could piece together into basic cause and effect. But luckily Vadim, who
> maintains the VirtualBox drivers for Oracle, was able to reproduce the
> issue and was able to conduct some real debugging. I've CC'd him here.
> From talking with Vadim, here are some findings thus far:
>
>   - Certain Linux guest processes crash under high load.
>   - Windows kernel guest panics.
>
> Observation: the Windows kernel uses SSSE3 all over the place,
> generated by the compiler.
>
>   - Moving the mouse around helps induce the crash.
>
> Observation: add_input_randomness() -> .. -> kernel_fpu_begin() -> blake2s_compress().
>
>   - The problem exhibits itself in rc7, so this patchset does not fix
>     the issue.
>   - Applying https://xn--4db.cc/ttEUSvdC fixes the issue.
>
> Observation: the problem is definitely related to using the FPU in a
> hard IRQ.
>
> I went reading KVM to get some idea of why KVM does *not* have this
> problem, and it looks like there's some careful code there about doing
> xsave and such around IRQs. So my current theory is that VirtualBox's
> VMM just forgot to do this, and until now this bug went unnoticed.

That's a very valid assumption. I audited all places which fiddle with
the FPU in Linus' tree, and with the fix applied they're all safe.

> Since VirtualBox is out of tree (and an extremely messy codebase), and
> this appears to be an out of tree module problem rather than a kernel
> problem, I'm inclined to think that there's not much for us to do, at
> least until we receive information to the contrary of this presumption.

Agreed on all points.

> But in case you do want to do something proactively, I don't have any
> objections to just disabling the FPU in hard IRQ for 5.18. And in 5.19,
> add_input_randomness() isn't going to hit that path anyway. But also,
> doing nothing and letting the VirtualBox people figure out their bug
> would be fine with me too. Either way, just wanted to give you a heads
> up.

That VirtualBox bug has to be fixed in any case, as this problem has
existed forever and drivers have sporadically used the FPU in hard
interrupt context in the past, so it's sheer luck that this didn't
explode before. AFAICT all of this has been moved to softirq context
over the years, so the random code is probably the sole hard interrupt
user in mainline today.

In the interest of users we should probably bite the bullet and just
disable hard interrupt FPU usage upstream and Cc stable. The stable
kernel updates probably reach users faster.
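
To illustrate what disabling hard interrupt FPU usage would mean for
callers, here is a minimal userspace sketch. It is a model only, not
the kernel implementation: struct ctx_model, fpu_usable_model(),
fpu_begin_model(), fpu_end_model() and compress_model() are all
hypothetical names made up for this example. The assumption is that
the usability check refuses the FPU in NMI and hard interrupt context,
and that callers fall back to a scalar code path when it does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the execution context; not kernel code. */
struct ctx_model {
	bool in_nmi;
	bool in_hardirq;
	bool kernel_fpu_active;	/* a kernel FPU section is already open */
};

/* Models the proposed policy: never grant the FPU in hard interrupt context. */
static bool fpu_usable_model(const struct ctx_model *c)
{
	if (c->in_nmi || c->in_hardirq)
		return false;
	return !c->kernel_fpu_active;
}

/* Opens an FPU section; callers must have checked usability first. */
static void fpu_begin_model(struct ctx_model *c)
{
	assert(fpu_usable_model(c));
	c->kernel_fpu_active = true;
}

/* Closes the FPU section opened by fpu_begin_model(). */
static void fpu_end_model(struct ctx_model *c)
{
	c->kernel_fpu_active = false;
}

enum compress_path { PATH_GENERIC, PATH_SIMD };

/* Caller pattern: take the SIMD path only when the FPU is usable. */
static enum compress_path compress_model(struct ctx_model *c)
{
	if (!fpu_usable_model(c))
		return PATH_GENERIC;	/* scalar fallback, FPU untouched */

	fpu_begin_model(c);
	/* ... SIMD work would happen here ... */
	fpu_end_model(c);
	return PATH_SIMD;
}
```

With such a policy, a caller reached from a hard interrupt would
always take the generic path, so a hypervisor that fails to preserve
guest FPU state across host interrupts would no longer have that state
clobbered by this path.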

Thanks,

        tglx


