On Tue, Nov 19, 2019 at 12:01:33PM -0800, Sean Christopherson wrote:
> On Wed, Oct 30, 2019 at 11:44:09PM -0400, Derek Yerger wrote:
> > I noticed the following in the host kernel log around the time the guest
> > encountered BSOD on 5.2.7:
> >
> > [  337.841491] WARNING: CPU: 6 PID: 7548 at arch/x86/kvm/x86.c:7963
> > kvm_arch_vcpu_ioctl_run+0x19b1/0x1b00 [kvm]
>
> Rats, I overlooked this first time round. In the future, if you get a
> WARN splat, try to make it very obvious in the bug report, they're almost
> always a smoking gun.
>
> That WARN that fired is:
>
>     /* The preempt notifier should have taken care of the FPU already. */
>     WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD));
>
> which was added as part of a bug fix by commit:
>
>     240c35a3783a ("kvm: x86: Use task structs fpu field for user")
>
> the buggy commit that was fixed is
>
>     5f409e20b794 ("x86/fpu: Defer FPU state load until return to userspace")
>
> which was part of a FPU rewrite that went into 5.2[*]. So yep, big
> smoking gun :-)
>
> My understanding of the WARN is that it means the kernel's FPU state is
> unexpectedly loaded when entry to the KVM guest is imminent. As for *how*
> the kernel's FPU state is getting loaded, no clue. But, I think it'd be
> pretty easy to find the culprit by adding a debug flag into struct
> thread_info that gets set in vcpu_load() and cleared in vcpu_put(),
> and then WARN in set_ti_thread_flag() if the debug flag is true when
> TIF_NEED_FPU_LOAD is being set. I'll put together a debugging patch later
> today and send it your way.

Debug patch attached. Hopefully it finds something. It took me an
embarrassing number of attempts to get correct; I kept screwing up checking
a bit number versus checking a bit mask...
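
For anyone following along without applying the patch, here is a rough
userspace sketch of the idea, plus the bit-number-versus-bit-mask mistake
mentioned above. It is NOT the attached patch and not kernel code: every
demo_* name is hypothetical, the in_vcpu_load field stands in for the
proposed debug flag, and the value 14 for the bit number is only an
illustrative assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration, not the kernel's definitions. */
#define DEMO_TIF_NEED_FPU_LOAD		14	/* a bit NUMBER (assumed value) */
#define DEMO_TIF_NEED_FPU_LOAD_MASK	(1UL << DEMO_TIF_NEED_FPU_LOAD)	/* a bit MASK */

struct demo_thread_info {
	unsigned long flags;
	bool in_vcpu_load;		/* hypothetical debug flag */
	unsigned int warn_count;	/* how often the debug check fired */
};

/* Mark the window in which nobody should be setting NEED_FPU_LOAD. */
static void demo_vcpu_load(struct demo_thread_info *ti)
{
	ti->in_vcpu_load = true;
}

static void demo_vcpu_put(struct demo_thread_info *ti)
{
	ti->in_vcpu_load = false;
}

/* Takes a bit NUMBER, in the style of set_ti_thread_flag(). */
static void demo_set_flag(struct demo_thread_info *ti, int bit)
{
	assert(bit >= 0 && bit < (int)(sizeof(unsigned long) * 8));

	/* Compare bit number against bit number, not against the mask. */
	if (ti->in_vcpu_load && bit == DEMO_TIF_NEED_FPU_LOAD) {
		ti->warn_count++;
		fprintf(stderr, "WARN: NEED_FPU_LOAD set between load and put\n");
	}
	ti->flags |= 1UL << bit;
}

/* Correct test: shift by the bit number. */
static bool demo_test_flag(const struct demo_thread_info *ti, int bit)
{
	return (ti->flags >> bit) & 1UL;
}

/* The mistake: using the bit number as if it were a mask. */
static bool demo_test_flag_buggy(const struct demo_thread_info *ti, int bit)
{
	/* With bit == 14 (0b1110) this inspects bits 1-3, not bit 14. */
	return (ti->flags & (unsigned long)bit) != 0;
}
```

Setting the flag between demo_vcpu_load() and demo_vcpu_put() bumps
warn_count; setting it outside that window does not. The buggy test
compiles cleanly and just quietly checks the wrong bits, which is why
this kind of mix-up is easy to miss.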