From: Derek Yerger <derek@djy.llc>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	kvm@vger.kernel.org, "Bonzini, Paolo" <pbonzini@redhat.com>
Subject: Re: PROBLEM: Regression of MMU causing guest VM application errors
Date: Thu, 24 Oct 2019 11:18:59 -0400
Message-ID: <4af8cbac-39b1-1a20-8e26-54a37189fe32@djy.llc>
In-Reply-To: <20191022202847.GO2343@linux.intel.com>

On 10/22/19 4:28 PM, Sean Christopherson wrote:
> On Thu, Oct 17, 2019 at 07:57:35PM -0400, Derek Yerger wrote:
>> On 10/16/19 1:49 PM, Sean Christopherson wrote:
>>> On Wed, Oct 16, 2019 at 11:28:57AM -0600, Alex Williamson wrote:
>>>> On Wed, 16 Oct 2019 00:49:51 -0400
>>>> Derek Yerger<derek@djy.llc>  wrote:
>>>>
>>>>> In at least Linux 5.2.7 via Fedora, up to 5.2.18, guest OS applications
>>>>> repeatedly crash with segfaults. The problem does not occur on 5.1.16.
>>>>>
>>>>> System is running Fedora 29 with kernel 5.2.18. Guest OS is Windows 10 with an
>>>>> AMD Radeon 540 GPU passthrough. When on 5.2.7 or 5.2.18, specific windows
>>>>> applications frequently and repeatedly crash, throwing exceptions in random
>>>>> libraries. Going back to 5.1.16, the issue does not occur.
>>>>>
>>>>> The host system is unaffected by the regression.
>>>>>
>>>>> Keywords: kvm mmu pci passthrough vfio vfio-pci amdgpu
>>>>>
>>>>> Possibly related: Unmerged [PATCH] KVM: x86/MMU: Zap all when removing memslot
>>>>> if VM has assigned device
>>>> That was never merged because it was superseded by:
>>>>
>>>> d012a06ab1d2 Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"
>>>>
>>>> That revert also induced this commit:
>>>>
>>>> 002c5f73c508 KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
>>>>
>>>> Both of these were merged to stable, showing up in 5.2.11 and 5.2.16
>>>> respectively, so seeing these sorts of issues might be considered a
>>>> known issue on 5.2.7, but not 5.2.18 afaik.  Do you have a specific
>>>> test that reliably reproduces the issue?  Thanks,
>> Test case 1: Kernel 5.2.18, PCI passthrough, Windows 10 guest, error condition.
>> Error 1: Application error in Firefox, restarting firefox and restoring tabs
>> reliably causes application crash with stack overflow error.
>> Error 2: Guest BSOD by the morning if left idle
>> Error 3: Guest BSOD within 1 minute of using SolidWorks CAD software
>>
>> Test case 2: Kernel 5.2.18, no PCI passthrough, same environment. Guest BSOD
>> encountered.
>>
>> Test case 3: Kernel 5.1.16, no PCI passthrough, same environment. Worked in
>> Solidworks for 10 minutes without BSOD. Opened firefox and restored tabs, no
>> crash.
>>
>> Test case 4: Kernel 5.1.16, with PCI passthrough, same environment. Worked
>> in Solidworks for a half hour. Opened firefox and restored tabs, no crash.
>>
>> Other factors: The guest does not change between tests. Same drivers,
>> software, etc. I have reliably switched between 5.2.x and 5.1.x multiple
>> times in the past month and repeatably see issues with 5.2.x. At this point
>> I'm unsure if it's PCI passthrough causing the problem.
>>
>> I know I should probably start from fresh host and guest, but time isn't
>> really permitting.
>>> Also, does the failure reproduce on 5.2.1 - 5.2.6?  The memslot debacle
>>> exists on all flavors of 5.2.x, if the errors showed up in 5.2.7 then they
>>> are being caused by something else.
>> After experiencing the issue in absence of PCI passthrough, I believe the
>> problem is unrelated to the memslot debacle.
> Heh, should've checked from the get go...  It's definitely not the memslot
> issue, because the memslot bug is in 5.1.16 as well.  :-)
I didn't pick up on that; nice catch. The memslot thread was the closest match
I could find, so linking it was only an educated guess.
>> I'm stuck on 5.1.x for now, maybe I'll give up and get a dedicated windows
>> machine /s
> What hardware are you running on?  I was thinking this was AMD specific,
> but then realized you said "AMD Radeon 540 GPU" and not "AMD CPU".
Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7)
         Subsystem: Gigabyte Technology Co., Ltd Device 22fe
         Kernel driver in use: vfio-pci
         Kernel modules: amdgpu
(plus related audio device)

I can't think of any other data points that would help track down the guest
instability. Based on the test cases above, though, the presence or absence of
a PCI passthrough device makes no difference to whether the problem occurs.
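
To spell out that reasoning, here is a minimal sketch that tabulates the four
test cases quoted above and checks which factor lines up with the failures.
The names OBSERVATIONS, outcomes_by and explains_failures are made up for
illustration; the rows are just my informal test results, not a controlled
experiment.

```python
# Observations copied from the four test cases quoted earlier in this message.
# Each entry: (host kernel, PCI passthrough enabled, guest failed).
OBSERVATIONS = [
    ("5.2.18", True, True),    # test case 1: application crashes and BSODs
    ("5.2.18", False, True),   # test case 2: guest BSOD
    ("5.1.16", False, False),  # test case 3: no crash
    ("5.1.16", True, False),   # test case 4: no crash
]


def outcomes_by(index):
    """Group the observed failure outcomes by the factor at `index`."""
    grouped = {}
    for row in OBSERVATIONS:
        grouped.setdefault(row[index], set()).add(row[2])
    return grouped


def explains_failures(index):
    """Return True if each value of the factor maps to exactly one outcome."""
    return all(len(outcomes) == 1 for outcomes in outcomes_by(index).values())


if __name__ == "__main__":
    print("kernel version explains failures:", explains_failures(0))
    print("PCI passthrough explains failures:", explains_failures(1))
```

With only four runs of different lengths this is suggestive rather than
conclusive, but the failures track the kernel version and not passthrough.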

I may have to try other VMs or a fresh Windows guest.
