From: Kashyap Chamarthy <kchamart@redhat.com>
To: kvm@vger.kernel.org, jan.kiszka@siemens.com
Cc: dgilbert@redhat.com
Subject: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
Date: Mon, 16 Feb 2015 21:40:13 +0100
Message-ID: <20150216204013.GI21838@tesla.redhat.com>
I can observe this on only one of the Intel Xeon machines (48 CPUs,
1 TB memory), but it is very reliably reproducible there.
Reproducer:
- Ensure the physical host (L0) and the guest hypervisor (L1) are both
  running a 3.20.0-0.rc0.git5.1 kernel (I used Fedora Rawhide's build).
  Preferably use an Intel Xeon machine, as that's where I could
  reproduce this issue; I could not reproduce it on a Haswell machine.
- Boot an L2 guest: Run `qemu-sanity-check --accel=kvm` in L1 (or
your own preferred method to boot an L2 KVM guest).
- In a different terminal, attached to L1's serial console, observe L1
  reboot.
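The setup checks in the first step can be condensed into a small shell
sketch (a hypothetical helper, not part of the report; the sysfs path
assumes the kvm_intel module, and the qemu-sanity-check step is left
commented because it must be run from inside L1):

```shell
#!/bin/sh
# Pre-flight checks for the reproducer: run on L0 and again inside L1,
# then boot the L2 guest from within L1.
check_host() {
    # Both L0 and L1 must run the same Rawhide kernel build.
    echo "kernel: $(uname -r)"   # expect 3.20.0-0.rc0.git5.1.fc23.x86_64
    # Nested VMX must be enabled on L0 for L1 to act as a hypervisor.
    if [ -r /sys/module/kvm_intel/parameters/nested ]; then
        echo "nested: $(cat /sys/module/kvm_intel/parameters/nested)"
    else
        echo "nested: kvm_intel not loaded"
    fi
}

check_host
# Then, inside L1 only:
#   qemu-sanity-check --accel=kvm
```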
The only thing I notice in `dmesg` (on L0) is the trace below.
_However_, this trace does not appear when the L1 reboot is triggered
while watching `dmesg -w` (i.e. waiting for new messages) as I boot an
L2 guest -- which means the trace below is not the root cause of L1
being rebooted. When L2 gets rebooted, all you observe is one of the
"vcpu0 unhandled rdmsr: 0x1a6" messages shown below.
. . .
[Feb16 13:44] ------------[ cut here ]------------
[ +0.004632] WARNING: CPU: 4 PID: 1837 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel]()
[ +0.009835] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ip6table_filter ip6_tables cfg80211 rfkill iTCO_wdt iTCO_vendor_support ipmi_devintf gpio_ich dcdbas coretemp kvm_intel kvm crc32c_intel ipmi_ssif serio_raw acpi_power_meter ipmi_si tpm_tis ipmi_msghandler tpm lpc_ich i7core_edac mfd_core edac_core acpi_cpufreq shpchp wmi mgag200 i2c_algo_bit drm_kms_helper ttm ata_generic drm pata_acpi megaraid_sas bnx2
[ +0.050289] CPU: 4 PID: 1837 Comm: qemu-system-x86 Not tainted 3.20.0-0.rc0.git5.1.fc23.x86_64 #1
[ +0.008902] Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.8.2 10/25/2012
[ +0.007469] 0000000000000000 00000000ee6c0c54 ffff88bf60bf7c18 ffffffff818760f7
[ +0.007542] 0000000000000000 0000000000000000 ffff88bf60bf7c58 ffffffff810ab80a
[ +0.007519] ffff88ff625b8000 ffff883f55f9b000 0000000000000000 0000000000000014
[ +0.007489] Call Trace:
[ +0.002471] [<ffffffff818760f7>] dump_stack+0x4c/0x65
[ +0.005152] [<ffffffff810ab80a>] warn_slowpath_common+0x8a/0xc0
[ +0.006020] [<ffffffff810ab93a>] warn_slowpath_null+0x1a/0x20
[ +0.005851] [<ffffffffa130957e>] nested_vmx_vmexit+0x96e/0xb00 [kvm_intel]
[ +0.006974] [<ffffffffa130c5f7>] ? vmx_handle_exit+0x1e7/0xcb2 [kvm_intel]
[ +0.006999] [<ffffffffa02ca972>] ? kvm_arch_vcpu_ioctl_run+0x6d2/0x1b50 [kvm]
[ +0.007239] [<ffffffffa130992a>] vmx_queue_exception+0x10a/0x150 [kvm_intel]
[ +0.007136] [<ffffffffa02cb30b>] kvm_arch_vcpu_ioctl_run+0x106b/0x1b50 [kvm]
[ +0.007162] [<ffffffffa02ca972>] ? kvm_arch_vcpu_ioctl_run+0x6d2/0x1b50 [kvm]
[ +0.007241] [<ffffffff8110760d>] ? trace_hardirqs_on+0xd/0x10
[ +0.005864] [<ffffffffa02b2df6>] ? vcpu_load+0x26/0x70 [kvm]
[ +0.005761] [<ffffffff81103c0f>] ? lock_release_holdtime.part.29+0xf/0x200
[ +0.006979] [<ffffffffa02c5f88>] ? kvm_arch_vcpu_load+0x58/0x210 [kvm]
[ +0.006634] [<ffffffffa02b3203>] kvm_vcpu_ioctl+0x383/0x7e0 [kvm]
[ +0.006197] [<ffffffff81027b9d>] ? native_sched_clock+0x2d/0xa0
[ +0.006026] [<ffffffff810d5fc6>] ? creds_are_invalid.part.1+0x16/0x50
[ +0.006537] [<ffffffff810d6021>] ? creds_are_invalid+0x21/0x30
[ +0.005930] [<ffffffff813a61da>] ? inode_has_perm.isra.48+0x2a/0xa0
[ +0.006365] [<ffffffff8128c7b8>] do_vfs_ioctl+0x2e8/0x530
[ +0.005496] [<ffffffff8128ca81>] SyS_ioctl+0x81/0xa0
[ +0.005065] [<ffffffff8187f8e9>] system_call_fastpath+0x12/0x17
[ +0.006014] ---[ end trace 2f24e0820b44f686 ]---
[ +5.870886] kvm [1783]: vcpu0 unhandled rdmsr: 0x1c9
[ +0.004991] kvm [1783]: vcpu0 unhandled rdmsr: 0x1a6
[ +0.005020] kvm [1783]: vcpu0 unhandled rdmsr: 0x3f6
[Feb16 14:18] kvm [1783]: vcpu0 unhandled rdmsr: 0x1c9
[ +0.005020] kvm [1783]: vcpu0 unhandled rdmsr: 0x1a6
[ +0.004998] kvm [1783]: vcpu0 unhandled rdmsr: 0x3f6
. . .
Version
-------
The exact versions below were used on both L0 and L1:
$ uname -r; rpm -q qemu-system-x86
3.20.0-0.rc0.git5.1.fc23.x86_64
qemu-system-x86-2.2.0-5.fc22.x86_64
Other info
----------
- Unpacking kernel-3.20.0-0.rc0.git5.1.fc23.src.rpm and looking at
  arch/x86/kvm/vmx.c, line 9190 is the WARN_ON_ONCE() below, shown
  with its surrounding code:
[. . .]
9178 * Emulate an exit from nested guest (L2) to L1, i.e., prepare to run L1
9179 * and modify vmcs12 to make it see what it would expect to see there if
9180 * L2 was its real guest. Must only be called when in L2 (is_guest_mode())
9181 */
9182 static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
9183 u32 exit_intr_info,
9184 unsigned long exit_qualification)
9185 {
9186 struct vcpu_vmx *vmx = to_vmx(vcpu);
9187 struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
9188
9189 /* trying to cancel vmlaunch/vmresume is a bug */
9190 WARN_ON_ONCE(vmx->nested.nested_run_pending);
9191
9192 leave_guest_mode(vcpu);
9193 prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
9194 exit_qualification);
9195
9196 vmx_load_vmcs01(vcpu);
9197
9198 if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
9199 && nested_exit_intr_ack_set(vcpu)) {
9200 int irq = kvm_cpu_get_interrupt(vcpu);
9201 WARN_ON(irq < 0);
9202 vmcs12->vm_exit_intr_info = irq |
9203 INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
9204 }
- The above line 9190 was introduced in this commit:
$ git log -S'WARN_ON_ONCE(vmx->nested.nested_run_pending)' \
-- ./arch/x86/kvm/vmx.c
commit 5f3d5799974b89100268ba813cec8db7bd0693fb
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date: Sun Apr 14 12:12:46 2013 +0200
KVM: nVMX: Rework event injection and recovery
The basic idea is to always transfer the pending event injection on
vmexit into the architectural state of the VCPU and then drop it from
there if it turns out that we left L2 to enter L1, i.e. if we enter
prepare_vmcs12.
vmcs12_save_pending_events takes care to transfer pending L0 events into
the queue of L1. That is mandatory as L1 may decide to switch the guest
state completely, invalidating or preserving the pending events for
later injection (including on a different node, once we support
migration).
This concept is based on the rule that a pending vmlaunch/vmresume is
not canceled. Otherwise, we would risk to lose injected events or leak
them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
entry of nested_vmx_vmexit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
- `dmesg`, `dmidecode`, and `x86info -a` details of L0 and L1 are here:
  https://kashyapc.fedorapeople.org/virt/Info-L0-Intel-Xeon-and-L1-nVMX-test/
--
/kashyap
Thread overview: 24+ messages
2015-02-16 20:40 Kashyap Chamarthy [this message]
2015-02-17 6:02 ` [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting* Jan Kiszka
2015-02-17 11:24 ` Kashyap Chamarthy
2015-02-17 18:00 ` Bandan Das
2015-02-17 18:07 ` Jan Kiszka
2015-02-18 10:20 ` Kashyap Chamarthy
2015-02-18 16:42 ` Paolo Bonzini
2015-02-19 12:07 ` Kashyap Chamarthy
2015-02-19 15:01 ` Radim Krčmář
2015-02-19 16:02 ` Radim Krčmář
2015-02-19 16:07 ` Radim Krčmář
2015-02-19 21:10 ` Kashyap Chamarthy
2015-02-19 22:28 ` Kashyap Chamarthy
2015-02-20 16:14 ` Radim Krčmář
2015-02-20 19:45 ` Kashyap Chamarthy
2015-02-22 15:46 ` Kashyap Chamarthy
2015-02-23 13:56 ` Radim Krčmář
2015-02-23 16:14 ` Kashyap Chamarthy
2015-02-23 17:09 ` Kashyap Chamarthy
2015-02-23 18:05 ` Kashyap Chamarthy
2015-02-24 16:30 ` [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0 Radim Krčmář
2015-02-24 16:39 ` Jan Kiszka
2015-02-24 18:32 ` Bandan Das
2015-02-25 15:50 ` Kashyap Chamarthy