From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Derek Yerger <derek@djy.llc>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	kvm@vger.kernel.org, "Bonzini, Paolo" <pbonzini@redhat.com>
Subject: Re: PROBLEM: Regression of MMU causing guest VM application errors
Date: Tue, 19 Nov 2019 12:01:33 -0800
Message-ID: <20191119200133.GD25672@linux.intel.com>
In-Reply-To: <36be1503-f6f1-0ed0-b1fe-9c05d827f624@djy.llc>

On Wed, Oct 30, 2019 at 11:44:09PM -0400, Derek Yerger wrote:
> 
> On 10/24/19 1:32 PM, Sean Christopherson wrote:
> >On Thu, Oct 24, 2019 at 11:18:59AM -0400, Derek Yerger wrote:
> >>On 10/22/19 4:28 PM, Sean Christopherson wrote:
> >>>On Thu, Oct 17, 2019 at 07:57:35PM -0400, Derek Yerger wrote:
> >>>Heh, should've checked from the get go...  It's definitely not the memslot
> >>>issue, because the memslot bug is in 5.1.16 as well.  :-)
> >>I didn't pick up on that, nice catch. The memslot thread was the closest
> >>thing I could find to an educated guess.
> >>>>I'm stuck on 5.1.x for now, maybe I'll give up and get a dedicated windows
> >>>>machine /s
> >>>What hardware are you running on?  I was thinking this was AMD specific,
> >>>but then realized you said "AMD Radeon 540 GPU" and not "AMD CPU".
> >>Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
> >>
> >>07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7)
> >>         Subsystem: Gigabyte Technology Co., Ltd Device 22fe
> >>         Kernel driver in use: vfio-pci
> >>         Kernel modules: amdgpu
> >>(plus related audio device)
> >>
> >>I can't think of any other data points that would be helpful to solving
> >>system instability in a guest OS.
> >Can you bisect starting from v5.2?  Identifying which commit in the kernel
> >introduced the regression would help immensely.
> On the host, I have to install NVIDIA GPU drivers with each new kernel
> build. During the process I discovered that I can't reproduce the issue on
> any kernel if I skip the *host* GPU drivers and start libvirtd in single
> mode.
> 
> I noticed the following in the host kernel log around the time the guest
> encountered BSOD on 5.2.7:
> 
> [  337.841491] WARNING: CPU: 6 PID: 7548 at arch/x86/kvm/x86.c:7963 kvm_arch_vcpu_ioctl_run+0x19b1/0x1b00 [kvm]

Rats, I overlooked this the first time around.  In the future, if you get
a WARN splat, make it very obvious in the bug report; they're almost
always a smoking gun.

That WARN that fired is:

        /* The preempt notifier should have taken care of the FPU already.  */
        WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD));

which was added as part of a bug fix by commit:

	240c35a3783a ("kvm: x86: Use task structs fpu field for user")

and the buggy commit that it fixed is

	5f409e20b794 ("x86/fpu: Defer FPU state load until return to userspace")

which was part of an FPU rewrite that went into 5.2[*].  So yep, big
smoking gun :-)
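To make the invariant behind that WARN concrete, here is a minimal userspace model in C. It is a sketch under simplifying assumptions, not kernel code: `struct task_model`, `model_kernel_fpu_begin()`, `model_restore_fpregs()`, `model_can_enter_guest()` and `MODEL_NEED_FPU_LOAD` are hypothetical names, with a single bit standing in for TIF_NEED_FPU_LOAD. It shows the idea of the deferred-load rework: kernel code that borrows the FPU marks the task's registers as needing a reload, and the pre-entry check expects that mark to be clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the TIF_NEED_FPU_LOAD thread flag. */
#define MODEL_NEED_FPU_LOAD (1u << 0)

/* Hypothetical, heavily simplified per-task state. */
struct task_model {
	unsigned int flags; /* thread flags */
	bool warned;        /* emulates the "once" part of WARN_ON_ONCE */
};

/* Kernel code borrowing the FPU saves the task's registers and marks
 * them as needing a reload before the task uses them again. */
static void model_kernel_fpu_begin(struct task_model *t)
{
	t->flags |= MODEL_NEED_FPU_LOAD;
}

/* Reloading the task's registers clears the mark. */
static void model_restore_fpregs(struct task_model *t)
{
	t->flags &= ~MODEL_NEED_FPU_LOAD;
}

/* Pre-entry check: returns false, and warns only the first time,
 * if a reload is still pending when guest entry is imminent. */
static bool model_can_enter_guest(struct task_model *t)
{
	bool need_load = (t->flags & MODEL_NEED_FPU_LOAD) != 0;

	if (need_load && !t->warned) {
		fprintf(stderr, "WARNING: FPU reload still pending at guest entry\n");
		t->warned = true;
	}
	return !need_load;
}
```

In this model, a `model_kernel_fpu_begin()` with no subsequent `model_restore_fpregs()` is what trips the check; which real code path does the equivalent on this host is the open question.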

My understanding of the WARN is that it means the kernel's FPU state is
unexpectedly loaded when entry to the KVM guest is imminent.  As for *how*
the kernel's FPU state is getting loaded, I have no clue.  But I think it'd
be pretty easy to find the culprit by adding a debug flag to struct
thread_info that is set in vcpu_load() and cleared in vcpu_put(), and
then WARNing in set_ti_thread_flag() if the debug flag is true when
TIF_NEED_FPU_LOAD is being set.  I'll put together a debugging patch later
today and send it your way.
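The debugging approach described above might look roughly like the following userspace model. It is only a sketch, not the promised patch: the `in_vcpu` and `culprit` fields, the `model_*` functions and `MODEL_NEED_FPU_LOAD` are hypothetical. A real patch would instead touch struct thread_info, vcpu_load()/vcpu_put() and set_ti_thread_flag() in the kernel, where the WARN's backtrace would identify the caller that the `caller` string stands in for here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for TIF_NEED_FPU_LOAD (a mask, not a bit number). */
#define MODEL_NEED_FPU_LOAD (1u << 0)

/* Hypothetical model of struct thread_info plus the proposed debug flag. */
struct thread_info_model {
	unsigned int flags;
	bool in_vcpu;        /* debug flag: true between load and put */
	const char *culprit; /* first caller caught setting the flag while loaded */
};

/* Models vcpu_load(): mark that a vCPU is loaded on this task. */
static void model_vcpu_load(struct thread_info_model *ti)
{
	ti->in_vcpu = true;
}

/* Models vcpu_put(): clear the debug flag again. */
static void model_vcpu_put(struct thread_info_model *ti)
{
	assert(ti->in_vcpu); /* a put without a matching load is a model bug */
	ti->in_vcpu = false;
}

/* Instrumented flag setter: record and report the first caller that sets
 * the FPU-reload flag while a vCPU is loaded, then set the flag as usual. */
static void model_set_ti_thread_flag(struct thread_info_model *ti,
				     unsigned int flag, const char *caller)
{
	if (flag == MODEL_NEED_FPU_LOAD && ti->in_vcpu && ti->culprit == NULL) {
		ti->culprit = caller;
		fprintf(stderr, "WARNING: %s set NEED_FPU_LOAD with a vCPU loaded\n",
			caller);
	}
	ti->flags |= flag;
}
```

Only sets that happen while the debug flag is true are reported, which is what narrows the search to code running while the vCPU is loaded.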

[*] https://lkml.kernel.org/r/20190403164156.19645-1-bigeasy@linutronix.de


Thread overview: 19+ messages
2019-10-16  4:49 PROBLEM: Regression of MMU causing guest VM application errors Derek Yerger
2019-10-16  7:28 ` Paolo Bonzini
2019-10-16 17:28 ` Alex Williamson
2019-10-16 17:49   ` Sean Christopherson
2019-10-17 23:57     ` Derek Yerger
2019-10-22 20:28       ` Sean Christopherson
2019-10-24 15:18         ` Derek Yerger
2019-10-24 17:32           ` Sean Christopherson
2019-10-31  3:44             ` Derek Yerger
2019-11-19 20:01               ` Sean Christopherson [this message]
2019-11-20  9:19                 ` Wanpeng Li
2019-11-20  9:57                   ` Paolo Bonzini
2019-11-20 18:19                 ` Sean Christopherson
2019-11-20 19:04                   ` Derek Yerger
2019-11-20 19:28                     ` Sean Christopherson
2019-11-27 15:24                       ` Sean Christopherson
2019-12-17 23:11                         ` Sean Christopherson
2019-12-17 23:13                           ` Derek Yerger
2020-01-02 13:42                           ` Derek Yerger
