From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
To: Sean Christopherson <seanjc@google.com>,
Zdenek Kaspar <zkaspar82@gmail.com>
Cc: kvm@vger.kernel.org
Subject: Re: Core2 and v5.18-rc5 troubles
Date: Tue, 3 May 2022 18:44:49 +0200 [thread overview]
Message-ID: <0426aa3f-bd8c-33f6-e492-088972b5dff0@maciej.szmigiero.name> (raw)
In-Reply-To: <YnFWT+OdBAOPpZfi@google.com>
On 3.05.2022 18:20, Sean Christopherson wrote:
> On Mon, May 02, 2022, Zdenek Kaspar wrote:
>> On Mon, 2 May 2022 02:29:59 +0200
>> Zdenek Kaspar <zkaspar82@gmail.com> wrote:
>>
>>> Hi, when I noticed fix for pre-EPYC and older Intel hardware I checked
>>> v5.18-rc5 on my old Core2 machine and something else fails here.
Core2 means shadow paging, but I don't know how this is related
to pre-EPYC AMD chips (Family 10h and Bulldozers support NPT, too).
Do you mean AMD K8 support here?
>>> When I kill -9 the first qemu attempt (see dmesg-1), then next
>>> attempt is OK, after successful VM shutdown another attempt fails
>>> again (see dmesg-2). After that it takes some time and machine needs
>>> reset (see dmesg-3).
>>>
>>> HTH, Z.
>>
>> Oh crap, I had auto-applied mglru patch, sorry...
Try a raw v5.18-rc5 if your test kernel has some extra patches applied.
>> Now starting qemu does this:
>>
>> 20 root 20 0 0.0m R 100.0 0.0 0:28.33 kworker/1:0+rcu_gp
>> looks like: 94.36% [kernel] [k] delay_tsc
>> kill -9 "qemu process" and noticed several D state processes:
>> 107 ? D 0:04 [kworker/u8:7+events_unbound]
>> 108 ? D 0:03 [kworker/u8:8+events_unbound]
>> 632 ? Ds 0:00 /usr/lib/systemd/systemd-journald
>>
>> machine is trashed, here's dmesg info:
>>
>> [ 172.349282] BUG: kernel NULL pointer dereference, address: 000000000000000b
>> [ 172.349324] #PF: supervisor write access in kernel mode
>> [ 172.349345] #PF: error_code(0x0002) - not-present page
>> [ 172.349363] PGD 0 P4D 0
>> [ 172.349375] Oops: 0002 [#1] PREEMPT SMP PTI
>> [ 172.349393] CPU: 0 PID: 626 Comm: qemu-build Not tainted 5.18.0-rc5-2-amd64 #1
>> [ 172.349420] Hardware name: /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
>> [ 172.349446] RIP: 0010:kvm_replace_memslot+0xb0/0x2a0 [kvm]
>> [ 172.349496] Code: ac 1c 80 04 00 00 4d 85 f6 74 65 48 89 e8 48 c1 e0 04 49 8b 4c 06 08 48 85 c9 74 21 4c 01 f0 48 8b 10 48 89 11 48 85 d2 74 04 <48> 89 4a 08 48 c7 40 08 00 00 00 00 48 c7 00 00 00 00 00 48 8d 44
>
> ...
>
>> [ 172.349772] Call Trace:
>> [ 172.349785] <TASK>
>> [ 172.349795] kvm_set_memslot+0x3a8/0x5e0 [kvm]
>
> My kernel build doesn't match exactly, but it's pretty close, and this call points
> to the kvm_activate_memslot() call in the kvm_prepare_memory_region() error path.
>
> r = kvm_prepare_memory_region(kvm, old, new, change);
> if (r) {
> /*
> * For DELETE/MOVE, revert the above INVALID change. No
> * modifications required since the original slot was preserved
> * in the inactive slots. Changing the active memslots also
> * release slots_arch_lock.
> */
> if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
> kvm_activate_memslot(kvm, invalid_slot, old);
> kfree(invalid_slot);
> } else {
> mutex_unlock(&kvm->slots_arch_lock);
> }
> return r;
> }
>
> What I can't figure out is how that can result in a NULL pointer dereference. The
> the faulting instruction is
>
> mov rcx, [rdx + 8]
>
> which appears to be
>
> WRITE_ONCE(*pprev, next);
>
> in __hlist_del() from hlist_del().
>
> if (!hlist_unhashed(n)) {
> __hlist_del(n);
> INIT_HLIST_NODE(n);
> }
>
> which comes from
>
> hash_del(&old->id_node[idx]);
>
> so the fault on 0xb means the pointer is '3'. "old" is zero allocated, and idx
> comes from slots->node_idx, which is effectively a constant '0' or '1'.
>
> I can't see any way for "old" to be '3. Even more confusing is that that would
> imply kvm_prepare_memory_region() failed on a DELETE action, which should be
> impossible.
>
> I've tried hitting every edge case of this flow I can think of and haven't been
> able to reproduce the behavior, e.g. triggering the new error path introduced by
> commit 86931ff7207b ("KVM: x86/mmu: Do not create SPTEs for GFNs that exceed
> host.MAXPHYADDR") is handled cleanly. Furthermore, AFAIK QEMU doesn't MOVE slots,
> so that path seems highly unlikely.
>
> The other relevant datapoint is that fsnotify() hits a very similar NULL pointer
> shortly after KVM's explosion, that one directly on address '3'. That suggests
> there's data corruption of some form going on, and KVM just happens to be the first
> to be encounter a bad pointer.
>
> Have you tried bisecting? And/or can you provide your kernel config? Maybe there's
> a debug/sanitizer option that's causing problems.
+1 for bisecting.
Would also ask the reporter to try enabling KASAN,
if it is a genuine memory corruption bug it might show the place it originally
happens (rather than where the overwritten pointer ultimately gets dereferenced).
Thanks,
Maciej
next prev parent reply other threads:[~2022-05-03 16:45 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-02 0:29 Core2 and v5.18-rc5 troubles Zdenek Kaspar
2022-05-02 17:00 ` Zdenek Kaspar
2022-05-03 16:20 ` Sean Christopherson
2022-05-03 16:44 ` Maciej S. Szmigiero [this message]
2022-05-03 21:07 ` Zdenek Kaspar
2022-05-03 23:52 ` Sean Christopherson
2022-05-04 14:17 ` Zdenek Kaspar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0426aa3f-bd8c-33f6-e492-088972b5dff0@maciej.szmigiero.name \
--to=mail@maciej.szmigiero.name \
--cc=kvm@vger.kernel.org \
--cc=seanjc@google.com \
--cc=zkaspar82@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).