kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Bug 210695] New: error: kvm run failed Invalid argument
@ 2020-12-15  0:11 bugzilla-daemon
  2020-12-15  0:13 ` [Bug 210695] " bugzilla-daemon
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: bugzilla-daemon @ 2020-12-15  0:11 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=210695

            Bug ID: 210695
           Summary: error: kvm run failed Invalid argument
           Product: Virtualization
           Version: unspecified
    Kernel Version: 5.10.1
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: rherbert@sympatico.ca
        Regression: No

Created attachment 294133
  --> https://bugzilla.kernel.org/attachment.cgi?id=294133&action=edit
Error outputs

Hi.

Starting with kernel 5.10-rc1, I've been unable to start Qemu VM's.  The only
5.10-rc that worked was 5.10-rc4 (not sure about -rc5).  I've tried with both
qemu 4.1.0 and the latest 5.2.0.  The VM runs fine with kernel 5.9.13.

uname -a:

Linux starbug.dom 5.10.1 #1 SMP Mon Dec 14 18:24:40 EST 2020 x86_64 Intel(R)
Core(TM)2 Quad CPU    Q9650  @ 3.00GHz GenuineIntel GNU/Linux

Motherboard: Intel DG41RQ

Thanks.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug 210695] error: kvm run failed Invalid argument
  2020-12-15  0:11 [Bug 210695] New: error: kvm run failed Invalid argument bugzilla-daemon
@ 2020-12-15  0:13 ` bugzilla-daemon
  2020-12-16  2:03 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2020-12-15  0:13 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=210695

Richard Herbert (rherbert@sympatico.ca) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Regression|No                          |Yes

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug 210695] error: kvm run failed Invalid argument
  2020-12-15  0:11 [Bug 210695] New: error: kvm run failed Invalid argument bugzilla-daemon
  2020-12-15  0:13 ` [Bug 210695] " bugzilla-daemon
@ 2020-12-16  2:03 ` bugzilla-daemon
  2020-12-16  2:28 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2020-12-16  2:03 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=210695

Sean Christopherson (seanjc@google.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |seanjc@google.com

--- Comment #1 from Sean Christopherson (seanjc@google.com) ---
Hrm, this is more than likely related to refactoring for the introduction of
the TDP MMU.

> kernel get_mmio_spte: detect reserved bits on spte, addr 0xfe013000, dump
> hierarchy:
> kernel        ------ spte 0x80000b0e level 3.

This SPTE looks odd/wrong, but unless I'm misreading things, there are no
reserved bits set.

> kernel        ------ spte 0x49c0027 level 2.
> kernel        ------ spte 0x38001ffe0134d7 level 1.

And this SPTE obviously has reserved bits set, but that's intentional and this
should not trigger the warning as the SPTE should be recognized as an MMIO
SPTE.

I'll stare at this more tomorrow and play with things a bit to see if I can
figure out what's broken.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug 210695] error: kvm run failed Invalid argument
  2020-12-15  0:11 [Bug 210695] New: error: kvm run failed Invalid argument bugzilla-daemon
  2020-12-15  0:13 ` [Bug 210695] " bugzilla-daemon
  2020-12-16  2:03 ` bugzilla-daemon
@ 2020-12-16  2:28 ` bugzilla-daemon
  2020-12-16  2:57 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2020-12-16  2:28 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=210695

--- Comment #2 from Richard Herbert (rherbert@sympatico.ca) ---
Hi, Sean.  I'm no programmer, unfortunately, but when kernel 5.10-rc4 was
released, this was the patch that I suspected had fixed the problem.  I don't
find this code is 5.10.1, so it must have been reverted or moved.

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 27e381c9da6c2..ff28a5c6abd63 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -49,7 +49,14 @@ bool is_tdp_mmu_root(struct kvm *kvm, hpa_t hpa)
 {
        struct kvm_mmu_page *sp;

+       if (!kvm->arch.tdp_mmu_enabled)
+               return false;
+       if (WARN_ON(!VALID_PAGE(hpa)))
+               return false;
+
        sp = to_shadow_page(hpa);
+       if (WARN_ON(!sp))
+               return false;

        return sp->tdp_mmu_page && sp->root_count;
 }

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Bug 210695] error: kvm run failed Invalid argument
  2020-12-15  0:11 [Bug 210695] New: error: kvm run failed Invalid argument bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-12-16  2:28 ` bugzilla-daemon
@ 2020-12-16  2:57 ` bugzilla-daemon
  2020-12-16 20:49 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2020-12-16  2:57 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=210695

--- Comment #3 from Richard Herbert (rherbert@sympatico.ca) ---
So sorry, Sean. I was looking in mmu.c and not tdp_mmu.c.  My very bad. 
Sheesh.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug 210695] error: kvm run failed Invalid argument
  2020-12-15  0:11 [Bug 210695] New: error: kvm run failed Invalid argument bugzilla-daemon
                   ` (3 preceding siblings ...)
  2020-12-16  2:57 ` bugzilla-daemon
@ 2020-12-16 20:49 ` bugzilla-daemon
  2020-12-16 23:08 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2020-12-16 20:49 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=210695

--- Comment #4 from Sean Christopherson (seanjc@google.com) ---
Aha!  I haven't reproduce the bug (mostly because I'm pretty sure my guests
aren't doing emulated MMIO accesses with paging disabled), but I'm pretty sure
I know what's going on, and why -rc4 may have worked.

Your guest has paging disabled, in which case mmu->root_level will be '0' and
mmu->shadow_root_level will be '3'.  If the shadow walk in get_walk() bails
without ever entering the loop (due to an invalid PAE root), the returned leaf
will be '0' because get_walk() uses mmu->root level instead of mmu->shadow_root
level.  In get_mmio_spte(), this causes the check for reserved bits to check
uninitialized/stale stack memory and return a bogus SPTE.

Pre rc6, both get_mmio_spte() and get_walk() used the bad mmu->root_level,
which meant that the reserved bits check would get skipped in the above
scenario.  But, get_mmio_spte() would still return a stale/bogus SPTE, so it's
not at all surprising that things failed.  Actually, it's surprising that any
5.10-rc* work.  Best guess is that there is a mostly unrelated change that
cause things to work by sheer dumb luck.

In rc6, the get_mmio_spte() half of the bug was fixed by commit 9a2a0d3ca163
("kvm: x86/mmu: Fix get_mmio_spte() on CPUs supporting 5-level PT").  This
cause get_mmio_spte() to resume the reserved bits check, which all but
guaranteed an explosion, i.e. ensured a 100% failure rate on your end.

TL;DR: Can you try this patch?  I'll also try to reproduce the original bug on
my end now that I have a smoking gun.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7a6ae9e90bd7..6880119840c1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3488,7 +3488,7 @@ static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64
addr, bool direct)
 static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes)
 {
        struct kvm_shadow_walk_iterator iterator;
-       int leaf = vcpu->arch.mmu->root_level;
+       int leaf = vcpu->arch.mmu->shadow_root_level;
        u64 spte;

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Bug 210695] error: kvm run failed Invalid argument
  2020-12-15  0:11 [Bug 210695] New: error: kvm run failed Invalid argument bugzilla-daemon
                   ` (4 preceding siblings ...)
  2020-12-16 20:49 ` bugzilla-daemon
@ 2020-12-16 23:08 ` bugzilla-daemon
  2020-12-16 23:14 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2020-12-16 23:08 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=210695

--- Comment #5 from Richard Herbert (rherbert@sympatico.ca) ---
Created attachment 294167
  --> https://bugzilla.kernel.org/attachment.cgi?id=294167&action=edit
qemu syslog crash output after patch for 5.10.1

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug 210695] error: kvm run failed Invalid argument
  2020-12-15  0:11 [Bug 210695] New: error: kvm run failed Invalid argument bugzilla-daemon
                   ` (5 preceding siblings ...)
  2020-12-16 23:08 ` bugzilla-daemon
@ 2020-12-16 23:14 ` bugzilla-daemon
  2020-12-17 19:54 ` bugzilla-daemon
  2020-12-18  3:10 ` bugzilla-daemon
  8 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2020-12-16 23:14 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=210695

--- Comment #6 from Richard Herbert (rherbert@sympatico.ca) ---
Hi, Sean.

Thanks for the patch.  I applied it and recompiled kernel 5.10.1. Qemu still
crashes, but the output is slightly different.

Not sure if this has anything to do with it:

dmesg | grep -i mmu

[    0.301060] pci 0000:00:00.0: DMAR: Disabling IOMMU for graphics on this
chipset

Thanks again.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug 210695] error: kvm run failed Invalid argument
  2020-12-15  0:11 [Bug 210695] New: error: kvm run failed Invalid argument bugzilla-daemon
                   ` (6 preceding siblings ...)
  2020-12-16 23:14 ` bugzilla-daemon
@ 2020-12-17 19:54 ` bugzilla-daemon
  2020-12-18  3:10 ` bugzilla-daemon
  8 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2020-12-17 19:54 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=210695

--- Comment #7 from Sean Christopherson (seanjc@google.com) ---
Finally figured it out.  When KVM uses PAE shadow paging, the PDPTRs are
effectively skipped when walking the shadow tables because a bad PDPTR will be
detected much earlier and KVM essentially hardcodes the PDPTRs.  As a result,
the corresponding SPTE isn't filled by get_walk() as it never "sees" the SPTE
for the PDTPR.  The refactored get_mmio_spte() doesn't account for this and
checks the PDPTR SPTE array value.  This reads uninitialized stack data, and in
your case, this yields the garbage value '0x80000b0e' that causes things to
explode (my system gets '0' most/all of the time).

I'll get a patch out later today, it's a bit of a mess....

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug 210695] error: kvm run failed Invalid argument
  2020-12-15  0:11 [Bug 210695] New: error: kvm run failed Invalid argument bugzilla-daemon
                   ` (7 preceding siblings ...)
  2020-12-17 19:54 ` bugzilla-daemon
@ 2020-12-18  3:10 ` bugzilla-daemon
  8 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2020-12-18  3:10 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=210695

Richard Herbert (rherbert@sympatico.ca) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #8 from Richard Herbert (rherbert@sympatico.ca) ---
Get the so called "root" level from the low level shadow page table
walkers instead of manually attempting to calculate it higher up the
stack, e.g. in get_mmio_spte().  When KVM is using PAE shadow paging,
the starting level of the walk, from the callers perspective, is not
the CR3 root but rather the PDPTR "root".  Checking for reserved bits
from the CR3 root causes get_mmio_spte() to consume uninitialized stack
data due to indexing into sptes[] for a level that was not filled by
get_walk().  This can result in false positives and/or negatives
depending on what garbage happens to be on the stack.

Opportunistically nuke a few extra newlines.

Fixes: 95fb5b0258b7 ("kvm: x86/mmu: Support MMIO in the TDP MMU")

Signed-off-by: Sean Christopherson <seanjc@google.com>

Marking as RESOLVED, with Thanks.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-12-18  3:11 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-15  0:11 [Bug 210695] New: error: kvm run failed Invalid argument bugzilla-daemon
2020-12-15  0:13 ` [Bug 210695] " bugzilla-daemon
2020-12-16  2:03 ` bugzilla-daemon
2020-12-16  2:28 ` bugzilla-daemon
2020-12-16  2:57 ` bugzilla-daemon
2020-12-16 20:49 ` bugzilla-daemon
2020-12-16 23:08 ` bugzilla-daemon
2020-12-16 23:14 ` bugzilla-daemon
2020-12-17 19:54 ` bugzilla-daemon
2020-12-18  3:10 ` bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).