From: Derek Yerger <derek@djy.llc>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	kvm@vger.kernel.org, "Bonzini, Paolo" <pbonzini@redhat.com>
Subject: Re: PROBLEM: Regression of MMU causing guest VM application errors
Date: Wed, 30 Oct 2019 23:44:09 -0400
Message-ID: <36be1503-f6f1-0ed0-b1fe-9c05d827f624@djy.llc>
In-Reply-To: <20191024173212.GC20633@linux.intel.com>


On 10/24/19 1:32 PM, Sean Christopherson wrote:
> On Thu, Oct 24, 2019 at 11:18:59AM -0400, Derek Yerger wrote:
>> On 10/22/19 4:28 PM, Sean Christopherson wrote:
>>> On Thu, Oct 17, 2019 at 07:57:35PM -0400, Derek Yerger wrote:
>>> Heh, should've checked from the get go...  It's definitely not the memslot
>>> issue, because the memslot bug is in 5.1.16 as well.  :-)
>> I didn't pick up on that, nice catch. The memslot thread was the closest
>> thing I could find to an educated guess.
>>>> I'm stuck on 5.1.x for now, maybe I'll give up and get a dedicated windows
>>>> machine /s
>>> What hardware are you running on?  I was thinking this was AMD specific,
>>> but then realized you said "AMD Radeon 540 GPU" and not "AMD CPU".
>> Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
>>
>> 07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
>> Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] (rev c7)
>>          Subsystem: Gigabyte Technology Co., Ltd Device 22fe
>>          Kernel driver in use: vfio-pci
>>          Kernel modules: amdgpu
>> (plus related audio device)
>>
>> I can't think of any other data points that would be helpful to solving
>> system instability in a guest OS.
> Can you bisect starting from v5.2?  Identifying which commit in the kernel
> introduced the regression would help immensely.
On the host, I have to install the NVIDIA GPU drivers with each new kernel build.
During that process I discovered that I can't reproduce the issue on any kernel
if I skip the *host* GPU drivers and start libvirtd in single-user mode.

I noticed the following in the host kernel log around the time the guest
hit a BSOD on 5.2.7:

[  337.841491] WARNING: CPU: 6 PID: 7548 at arch/x86/kvm/x86.c:7963 kvm_arch_vcpu_ioctl_run+0x19b1/0x1b00 [kvm]

I have the rest of the log available if it's needed.

Otherwise, each bisection step is: build, install, and boot the kernel; install
the host GPU drivers; exit single-user mode; start virt-manager; and use the
guest until a crash occurs.

I swapped between the Fedora distribution kernels 5.2.7 and 5.1.16 to confirm
that my test reliably distinguishes good from bad. I then built from tag v5.2.7
and confirmed the issue was present. A test failure is any of a BSOD, a Firefox
crash, or a tab crash; it happens reliably on the bad kernel but not on the
good one.

About 10 steps into the bisection, my tests became less reliable, to the point
that I'm not sure whether to mark my current commit, 381dc73f, as good or bad. I
had one crash but have otherwise been using the guest reliably for a few days.
Given how long each build, install, and test cycle takes, I didn't want to go
too far down the wrong path if my tests are unreliable (even though 5.2.7 is a
guaranteed and timely failure). I'll probably pick it back up over the weekend.
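
One rough way to reason about a flaky test like this: if a bad kernel crashes
the guest in any one test session with probability p, then n crash-free
sessions still leave a (1 - p)^n chance that the commit is actually bad. The
sketch below is illustrative only. The per-session crash probabilities are
assumed numbers, not measurements from this report, and sessions are assumed
to be independent.

```python
def false_good_probability(p_crash: float, clean_sessions: int) -> float:
    """Chance that a bad commit survives `clean_sessions` sessions without a crash.

    p_crash is an ASSUMED per-session crash probability on a bad kernel.
    """
    if not 0.0 < p_crash <= 1.0:
        raise ValueError("p_crash must be in (0, 1]")
    if clean_sessions < 0:
        raise ValueError("clean_sessions must be non-negative")
    return (1.0 - p_crash) ** clean_sessions


def runs_needed(p_crash: float, confidence: float) -> int:
    """Smallest number of clean sessions before marking a commit good,
    such that the chance of mislabelling a bad commit is at most 1 - confidence.
    """
    if not 0.0 < p_crash <= 1.0:
        raise ValueError("p_crash must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    remaining = 1.0  # probability that a bad commit has not crashed yet
    sessions = 0
    while remaining > 1.0 - confidence:
        remaining *= 1.0 - p_crash
        sessions += 1
    return sessions


if __name__ == "__main__":
    # Hypothetical crash rates; substitute whatever the bad kernel actually shows.
    for p in (0.9, 0.5, 0.2):
        print(f"p_crash={p}: {runs_needed(p, 0.95)} clean sessions for 95% confidence")
```

A single crash at a commit is stronger evidence than several clean sessions,
since the good kernel is reported not to crash at all. When a commit really
cannot be judged, `git bisect skip` (already used once in the log below) avoids
committing to a wrong verdict.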

In any event, here is the bisect log up to now:

git bisect start
# bad: [5697a9d3d55fad99ffc3c1ba5654426ab64df333] Linux 5.2.7
git bisect bad 5697a9d3d55fad99ffc3c1ba5654426ab64df333
# good: [8584aaf1c3262ca17d1e4a614ede9179ef462bb0] Linux 5.1.16
git bisect good 8584aaf1c3262ca17d1e4a614ede9179ef462bb0
# good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd
# skip: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
git bisect skip a2d635decbfa9c1e4ae15cb05b68b2559f7f827c
# good: [ee8146aad87cd8eeb5963856ac0b9a9176392e3a] coresight: dynamic-replicator: Clean up error handling
git bisect good ee8146aad87cd8eeb5963856ac0b9a9176392e3a
# good: [2e1f164861e500f4e068a9d909bbd3fcc7841483] net: hns: Fix loopback test failed at copper ports
git bisect good 2e1f164861e500f4e068a9d909bbd3fcc7841483
# good: [c884d8ac7ffccc094e9674a3eb3be90d3b296c0a] Merge tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
git bisect good c884d8ac7ffccc094e9674a3eb3be90d3b296c0a
# bad: [1ba0d730c0ca6825225171b74721bc75f3d12da8] bcache: fix potential deadlock in cached_def_free()
git bisect bad 1ba0d730c0ca6825225171b74721bc75f3d12da8
# good: [a5fff14a0c7989fbc8316a43f52aed1804f02ddd] Merge branch 'akpm' (patches from Andrew)
git bisect good a5fff14a0c7989fbc8316a43f52aed1804f02ddd
# good: [42db12d5cd081964e1844dad1f5f4088921fd303] ice: Gracefully handle reset failure in ice_alloc_vfs()
git bisect good 42db12d5cd081964e1844dad1f5f4088921fd303
# good: [161c926ba6f0bb779c0fb860d3cf390eb314d345] perf/x86/intel: Add more Icelake CPUIDs
git bisect good 161c926ba6f0bb779c0fb860d3cf390eb314d345
# good: [9a9ff8f128445688f43b9afc1b837a3de4548586] media: coda: increment sequence offset for the last returned frame
git bisect good 9a9ff8f128445688f43b9afc1b837a3de4548586
# good: [381dc73f8216252904d6578d7229282029aa430d] netfilter: ctnetlink: Fix regression in conntrack entry deletion
git bisect good 381dc73f8216252904d6578d7229282029aa430d
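
Bisect logs pasted into mail are often hard-wrapped, which can split a
`# good: [...]` comment line in two. The following is a minimal sketch for
rejoining such lines and tallying the verdicts, for example before saving a
log to a file for `git bisect replay`. The function names are made up for
illustration. It assumes every real log line starts with `#` or `git bisect`,
so a wrapped fragment that happens to begin with `git bisect` would be
misread.

```python
import re
from collections import Counter


def unwrap_bisect_log(text: str) -> list[str]:
    """Rejoin mail-wrapped continuation lines onto the comment line they follow."""
    lines: list[str] = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue  # drop blank lines
        if line.startswith("#") or line.startswith("git bisect"):
            lines.append(line)
        elif lines and lines[-1].startswith("#"):
            # Only comment lines carry free-form subjects long enough to wrap.
            lines[-1] = lines[-1] + " " + line.strip()
        else:
            raise ValueError(f"unexpected line in bisect log: {line!r}")
    return lines


def count_verdicts(lines: list[str]) -> dict[str, int]:
    """Count the good/bad/skip commands in an unwrapped log."""
    counts: Counter[str] = Counter()
    for line in lines:
        match = re.match(r"git bisect (good|bad|skip)\b", line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = (
        "git bisect start\n"
        "# skip: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag \n"
        "'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm\n"
        "git bisect skip a2d635decbfa9c1e4ae15cb05b68b2559f7f827c\n"
    )
    for entry in unwrap_bisect_log(sample):
        print(entry)
```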
