Re: [5.2 regression] x86/fpu changes cause crashes in KVM guest

* Re: [5.2 regression] x86/fpu changes cause crashes in KVM guest
       [not found] <217248af-e980-9cb0-ff0d-9773413b9d38@thomaslambertz.de>
@ 2019-07-19  8:59 ` Wanpeng Li
  2019-07-19 11:09   ` Paolo Bonzini
  0 siblings, 1 reply; 3+ messages in thread
From: Wanpeng Li @ 2019-07-19  8:59 UTC (permalink / raw)
  To: Thomas Lambertz
  Cc: Sebastian Andrzej Siewior, Rik van Riel, Dave Hansen,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, LKML, Paolo Bonzini, Radim Krcmar, kvm,
	Peter Zijlstra

Cc kvm ml,
On Thu, 18 Jul 2019 at 08:08, Thomas Lambertz <mail@thomaslambertz.de> wrote:
>
> Since kernel 5.2, I've been experiencing strange issues in my Windows 10
> QEMU/KVM guest.
> Via bisection, I have tracked down that the issue lies in the FPU state
> handling changes.
> Kernels before 8ff468c29e9a9c3afe9152c10c7b141343270bf3 work great, the
> ones afterwards are affected.
> Sometimes the state seems to be restored incorrectly in the guest.
>
> I have managed to reproduce it relatively cleanly, on a linux guest.
> (ubuntu-server 18.04, but that should not matter, since it occured on
> windows aswell)
>
> To reproduce the issue, you need prime95 (or mprime), from
> https://www.mersenne.org/download/ .
> This is just a stress test for the FPU, which helps reproduce the error
> much quicker.
>
> - Run it in the guest as 'Benchmark Only', and choose the '(2) Small
> FFTs' torture test. Give it the maximum amount of cores (for me 10).
> - On the host, run the same test. To keep my pc usable, I limited it to
> 5 cores. I do this to put some pressure on the system.
> - repeatedly focus and unfocus the qemu window
>
> With this config, errors in the guest usually occur within 30 seconds.
> Without the refocusing, takes ~5min on average, but the variance of this
> time is quite large.
>
> The error messages are either
>      "FATAL ERROR: Rounding was ......., expected less than 0.4"
> or
>      "FATAL ERROR: Resulting sum was ....., expexted: ......",
> suggesting that something in the calculation has gone wrong.
>
> On the host, no errors are ever observed!

I found it is offended by commit 5f409e20b (x86/fpu: Defer FPU state
load until return to userspace) and can only be reproduced when
CONFIG_PREEMPT is enabled. Why restore qemu userspace fpu context to
hardware before vmentry in the commit?
https://lkml.org/lkml/2017/11/14/945 Actually I suspect the commit
f775b13eedee2 (x86,kvm: move qemu/guest FPU switching out to vcpu_run)
inaccurately save guest fpu state which in xsave area into the qemu
userspace fpu buffer. However, Rik replied in
https://lkml.org/lkml/2017/11/14/891, "The scheduler will save the
guest fpu context when a vCPU thread is preempted, and restore it when
it is scheduled back in." But I can't find any scheduler codes do
this. In addition, below codes can fix the mprime error warning.
(Still not sure it is correct)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58305cf..18f928e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3306,6 +3306,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

     kvm_x86_ops->vcpu_load(vcpu, cpu);

+    if (test_thread_flag(TIF_NEED_FPU_LOAD))
+        switch_fpu_return();
+
     /* Apply any externally detected TSC adjustments (due to suspend) */
     if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
         adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
@@ -7990,10 +7993,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
     trace_kvm_entry(vcpu->vcpu_id);
     guest_enter_irqoff();

-    fpregs_assert_state_consistent();
-    if (test_thread_flag(TIF_NEED_FPU_LOAD))
-        switch_fpu_return();
-
     if (unlikely(vcpu->arch.switch_db_regs)) {
         set_debugreg(0, 7);
         set_debugreg(vcpu->arch.eff_db[0], 0);

^ permalink raw reply related	[flat|nested] 3+ messages in thread