linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [5.2 regression] x86/fpu changes cause crashes in KVM guest
@ 2019-07-17 23:47 Thomas Lambertz
  2019-07-19  8:59 ` Wanpeng Li
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Lambertz @ 2019-07-17 23:47 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Rik van Riel, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, x86, linux-kernel

Since kernel 5.2, I've been experiencing strange issues in my Windows 10 
QEMU/KVM guest.
Via bisection, I have tracked down that the issue lies in the FPU state 
handling changes.
Kernels before 8ff468c29e9a9c3afe9152c10c7b141343270bf3 work great, the 
ones afterwards are affected.
Sometimes the state seems to be restored incorrectly in the guest.

I have managed to reproduce it relatively cleanly, on a linux guest.
(ubuntu-server 18.04, but that should not matter, since it occured on 
windows aswell)

To reproduce the issue, you need prime95 (or mprime), from 
https://www.mersenne.org/download/ .
This is just a stress test for the FPU, which helps reproduce the error 
much quicker.

- Run it in the guest as 'Benchmark Only', and choose the '(2) Small 
FFTs' torture test. Give it the maximum amount of cores (for me 10).
- On the host, run the same test. To keep my pc usable, I limited it to 
5 cores. I do this to put some pressure on the system.
- repeatedly focus and unfocus the qemu window

With this config, errors in the guest usually occur within 30 seconds. 
Without the refocusing, takes ~5min on average, but the variance of this 
time is quite large.

The error messages are either
     "FATAL ERROR: Rounding was ......., expected less than 0.4"
or
     "FATAL ERROR: Resulting sum was ....., expexted: ......",
suggesting that something in the calculation has gone wrong.

On the host, no errors are ever observed!


I am running an AMD Ryzen 5 1600X on an Gigabyte GA-AX370 Gaming 5 
motherboard.
My main operating system is ArchLinux, the issue exists both with the 
Arch and upstream kernel.
QEMU is managed with virt-manager, but the issue also appears with the 
following simple qemu cmdline:

qemu-system-x86_64 -hda /var/lib/libvirt/images/ubuntu18.04.qcow2 
-enable-kvm -smp 10 -m 2048

When kvm acceleration is disabled, the issue predictably goes away.

The issue still exists on the latest github upstream kernel, 
22051d9c4a57d3b4a8b5a7407efc80c71c7bfb16.

- Thomas

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [5.2 regression] x86/fpu changes cause crashes in KVM guest
  2019-07-17 23:47 [5.2 regression] x86/fpu changes cause crashes in KVM guest Thomas Lambertz
@ 2019-07-19  8:59 ` Wanpeng Li
  2019-07-19 11:09   ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Wanpeng Li @ 2019-07-19  8:59 UTC (permalink / raw)
  To: Thomas Lambertz
  Cc: Sebastian Andrzej Siewior, Rik van Riel, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, LKML, Paolo Bonzini, Radim Krcmar, kvm,
	Peter Zijlstra

Cc kvm ml,
On Thu, 18 Jul 2019 at 08:08, Thomas Lambertz <mail@thomaslambertz.de> wrote:
>
> Since kernel 5.2, I've been experiencing strange issues in my Windows 10
> QEMU/KVM guest.
> Via bisection, I have tracked down that the issue lies in the FPU state
> handling changes.
> Kernels before 8ff468c29e9a9c3afe9152c10c7b141343270bf3 work great, the
> ones afterwards are affected.
> Sometimes the state seems to be restored incorrectly in the guest.
>
> I have managed to reproduce it relatively cleanly, on a linux guest.
> (ubuntu-server 18.04, but that should not matter, since it occured on
> windows aswell)
>
> To reproduce the issue, you need prime95 (or mprime), from
> https://www.mersenne.org/download/ .
> This is just a stress test for the FPU, which helps reproduce the error
> much quicker.
>
> - Run it in the guest as 'Benchmark Only', and choose the '(2) Small
> FFTs' torture test. Give it the maximum amount of cores (for me 10).
> - On the host, run the same test. To keep my pc usable, I limited it to
> 5 cores. I do this to put some pressure on the system.
> - repeatedly focus and unfocus the qemu window
>
> With this config, errors in the guest usually occur within 30 seconds.
> Without the refocusing, takes ~5min on average, but the variance of this
> time is quite large.
>
> The error messages are either
>      "FATAL ERROR: Rounding was ......., expected less than 0.4"
> or
>      "FATAL ERROR: Resulting sum was ....., expexted: ......",
> suggesting that something in the calculation has gone wrong.
>
> On the host, no errors are ever observed!

I found it is offended by commit 5f409e20b (x86/fpu: Defer FPU state
load until return to userspace) and can only be reproduced when
CONFIG_PREEMPT is enabled. Why restore qemu userspace fpu context to
hardware before vmentry in the commit?
https://lkml.org/lkml/2017/11/14/945 Actually I suspect the commit
f775b13eedee2 (x86,kvm: move qemu/guest FPU switching out to vcpu_run)
inaccurately save guest fpu state which in xsave area into the qemu
userspace fpu buffer. However, Rik replied in
https://lkml.org/lkml/2017/11/14/891, "The scheduler will save the
guest fpu context when a vCPU thread is preempted, and restore it when
it is scheduled back in." But I can't find any scheduler codes do
this. In addition, below codes can fix the mprime error warning.
(Still not sure it is correct)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58305cf..18f928e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3306,6 +3306,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

     kvm_x86_ops->vcpu_load(vcpu, cpu);

+    if (test_thread_flag(TIF_NEED_FPU_LOAD))
+        switch_fpu_return();
+
     /* Apply any externally detected TSC adjustments (due to suspend) */
     if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
         adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
@@ -7990,10 +7993,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
     trace_kvm_entry(vcpu->vcpu_id);
     guest_enter_irqoff();

-    fpregs_assert_state_consistent();
-    if (test_thread_flag(TIF_NEED_FPU_LOAD))
-        switch_fpu_return();
-
     if (unlikely(vcpu->arch.switch_db_regs)) {
         set_debugreg(0, 7);
         set_debugreg(vcpu->arch.eff_db[0], 0);

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [5.2 regression] x86/fpu changes cause crashes in KVM guest
  2019-07-19  8:59 ` Wanpeng Li
@ 2019-07-19 11:09   ` Paolo Bonzini
  2019-07-22  4:28     ` Wanpeng Li
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2019-07-19 11:09 UTC (permalink / raw)
  To: Wanpeng Li, Thomas Lambertz
  Cc: Sebastian Andrzej Siewior, Rik van Riel, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, LKML, Radim Krcmar, kvm,
	Peter Zijlstra, Marc Orr, Dave Hansen

On 19/07/19 10:59, Wanpeng Li wrote:
> https://lkml.org/lkml/2017/11/14/891, "The scheduler will save the
> guest fpu context when a vCPU thread is preempted, and restore it when
> it is scheduled back in." But I can't find any scheduler codes do
> this.

That's because applying commit 240c35a37 was completely wrong.  The idea
before commit 240c35a37 was that you have the following FPU states:

               userspace (QEMU)             guest
---------------------------------------------------------------------------
               processor                    vcpu->arch.guest_fpu
>>> KVM_RUN: kvm_load_guest_fpu
               vcpu->arch.user_fpu          processor
>>> preempt out
               vcpu->arch.user_fpu          current->thread.fpu
>>> preempt in
               vcpu->arch.user_fpu          processor
>>> back to userspace
>>> kvm_put_guest_fpu
               processor                    vcpu->arch.guest_fpu
---------------------------------------------------------------------------

After removing user_fpu, QEMU's FPU state is destroyed when KVM_RUN is
preempted.  So that's already messed up (I'll send a revert), and given
the diagram above your patch makes total sense.

With the new lazy model we want to hook into kvm_vcpu_arch_load and get
the state back to the processor from current->thread.fpu, and indeed
switch_fpu_return is essentially copy_kernel_to_fpregs(&current->thread.
fpu->state).

However I would keep the fpregs_assert_state_consistent in
kvm_arch_vcpu_load, and also
WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD)) in vcpu_enter_guest.

Paolo

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [5.2 regression] x86/fpu changes cause crashes in KVM guest
  2019-07-19 11:09   ` Paolo Bonzini
@ 2019-07-22  4:28     ` Wanpeng Li
  0 siblings, 0 replies; 4+ messages in thread
From: Wanpeng Li @ 2019-07-22  4:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Thomas Lambertz, Sebastian Andrzej Siewior, Rik van Riel,
	Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, the arch/x86 maintainers, LKML, Radim Krcmar,
	kvm, Peter Zijlstra, Marc Orr

On Fri, 19 Jul 2019 at 19:09, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 19/07/19 10:59, Wanpeng Li wrote:
> > https://lkml.org/lkml/2017/11/14/891, "The scheduler will save the
> > guest fpu context when a vCPU thread is preempted, and restore it when
> > it is scheduled back in." But I can't find any scheduler codes do
> > this.
>
> That's because applying commit 240c35a37 was completely wrong.  The idea
> before commit 240c35a37 was that you have the following FPU states:
>
>                userspace (QEMU)             guest
> ---------------------------------------------------------------------------
>                processor                    vcpu->arch.guest_fpu
> >>> KVM_RUN: kvm_load_guest_fpu
>                vcpu->arch.user_fpu          processor
> >>> preempt out
>                vcpu->arch.user_fpu          current->thread.fpu
> >>> preempt in
>                vcpu->arch.user_fpu          processor
> >>> back to userspace
> >>> kvm_put_guest_fpu
>                processor                    vcpu->arch.guest_fpu
> ---------------------------------------------------------------------------
>
> After removing user_fpu, QEMU's FPU state is destroyed when KVM_RUN is
> preempted.  So that's already messed up (I'll send a revert), and given
> the diagram above your patch makes total sense.
>
> With the new lazy model we want to hook into kvm_vcpu_arch_load and get
> the state back to the processor from current->thread.fpu, and indeed
> switch_fpu_return is essentially copy_kernel_to_fpregs(&current->thread.
> fpu->state).
>
> However I would keep the fpregs_assert_state_consistent in
> kvm_arch_vcpu_load, and also
> WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD)) in vcpu_enter_guest.

Looks good to me, just send out two patches rebase on the revert.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-07-22  4:28 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-17 23:47 [5.2 regression] x86/fpu changes cause crashes in KVM guest Thomas Lambertz
2019-07-19  8:59 ` Wanpeng Li
2019-07-19 11:09   ` Paolo Bonzini
2019-07-22  4:28     ` Wanpeng Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).