From: Kashyap Chamarthy <kchamart@redhat.com>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: kvm@vger.kernel.org, dgilbert@redhat.com
Subject: Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
Date: Tue, 17 Feb 2015 12:24:26 +0100
Message-ID: <20150217112426.GL21838@tesla.redhat.com>
In-Reply-To: <54E2D966.9070706@siemens.com>

On Tue, Feb 17, 2015 at 07:02:14AM +0100, Jan Kiszka wrote:
> On 2015-02-16 21:40, Kashyap Chamarthy wrote:
> > I can observe this on only one of the Intel Xeon machines (which has 48
> > CPUs and 1TB memory), but it is very reliably reproducible.
> > 
> > 
> > Reproducer:
> > 
> >   - Just ensure physical host (L0) and guest hypervisor (L1) are running
> >     3.20.0-0.rc0.git5.1 Kernel (I used from Fedora's Rawhide).
> >     Preferably on an Intel Xeon machine - as that's where I could
> >     reproduce this issue, not on a Haswell machine
> >   - Boot an L2 guest: Run `qemu-sanity-check --accel=kvm` in L1 (or
> >     your own preferred method to boot an L2 KVM guest).
> >   - On a different terminal, which has serial console for L1: observe L1
> >     reboot
> > 
> > 
> > The only thing I notice in `dmesg` (on L0) is this trace. _However_ this
> > trace does not occur when an L1 reboot is triggered while you watch
> > `dmesg -w` (to wait for new messages) as I boot an L2 guest -- which
> > means, the below trace is not the root cause of L1 being rebooted.  When
> > the L2 gets rebooted, what you observe is just one of these messages
> > "vcpu0 unhandled rdmsr: 0x1a6" below
> > 
> > . . .
> > [Feb16 13:44] ------------[ cut here ]------------
> > [  +0.004632] WARNING: CPU: 4 PID: 1837 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel]()
> > [  +0.009835] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ip6table_filter ip6_tables cfg80211 rfkill iTCO_wdt iTCO_vendor_support ipmi_devintf gpio_ich dcdbas coretemp kvm_intel kvm crc32c_intel ipmi_ssif serio_raw acpi_power_meter ipmi_si tpm_tis ipmi_msghandler tpm lpc_ich i7core_edac mfd_core edac_core acpi_cpufreq shpchp wmi mgag200 i2c_algo_bit drm_kms_helper ttm ata_generic drm pata_acpi megaraid_sas bnx2
> > [  +0.050289] CPU: 4 PID: 1837 Comm: qemu-system-x86 Not tainted 3.20.0-0.rc0.git5.1.fc23.x86_64 #1
> > [  +0.008902] Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.8.2 10/25/2012
> > [  +0.007469]  0000000000000000 00000000ee6c0c54 ffff88bf60bf7c18 ffffffff818760f7
> > [  +0.007542]  0000000000000000 0000000000000000 ffff88bf60bf7c58 ffffffff810ab80a
> > [  +0.007519]  ffff88ff625b8000 ffff883f55f9b000 0000000000000000 0000000000000014
> > [  +0.007489] Call Trace:
> > [  +0.002471]  [<ffffffff818760f7>] dump_stack+0x4c/0x65
> > [  +0.005152]  [<ffffffff810ab80a>] warn_slowpath_common+0x8a/0xc0
> > [  +0.006020]  [<ffffffff810ab93a>] warn_slowpath_null+0x1a/0x20
> > [  +0.005851]  [<ffffffffa130957e>] nested_vmx_vmexit+0x96e/0xb00 [kvm_intel]
> > [  +0.006974]  [<ffffffffa130c5f7>] ? vmx_handle_exit+0x1e7/0xcb2 [kvm_intel]
> > [  +0.006999]  [<ffffffffa02ca972>] ? kvm_arch_vcpu_ioctl_run+0x6d2/0x1b50 [kvm]
> > [  +0.007239]  [<ffffffffa130992a>] vmx_queue_exception+0x10a/0x150 [kvm_intel]
> > [  +0.007136]  [<ffffffffa02cb30b>] kvm_arch_vcpu_ioctl_run+0x106b/0x1b50 [kvm]
> > [  +0.007162]  [<ffffffffa02ca972>] ? kvm_arch_vcpu_ioctl_run+0x6d2/0x1b50 [kvm]
> > [  +0.007241]  [<ffffffff8110760d>] ? trace_hardirqs_on+0xd/0x10
> > [  +0.005864]  [<ffffffffa02b2df6>] ? vcpu_load+0x26/0x70 [kvm]
> > [  +0.005761]  [<ffffffff81103c0f>] ? lock_release_holdtime.part.29+0xf/0x200
> > [  +0.006979]  [<ffffffffa02c5f88>] ? kvm_arch_vcpu_load+0x58/0x210 [kvm]
> > [  +0.006634]  [<ffffffffa02b3203>] kvm_vcpu_ioctl+0x383/0x7e0 [kvm]
> > [  +0.006197]  [<ffffffff81027b9d>] ? native_sched_clock+0x2d/0xa0
> > [  +0.006026]  [<ffffffff810d5fc6>] ? creds_are_invalid.part.1+0x16/0x50
> > [  +0.006537]  [<ffffffff810d6021>] ? creds_are_invalid+0x21/0x30
> > [  +0.005930]  [<ffffffff813a61da>] ? inode_has_perm.isra.48+0x2a/0xa0
> > [  +0.006365]  [<ffffffff8128c7b8>] do_vfs_ioctl+0x2e8/0x530
> > [  +0.005496]  [<ffffffff8128ca81>] SyS_ioctl+0x81/0xa0
> > [  +0.005065]  [<ffffffff8187f8e9>] system_call_fastpath+0x12/0x17
> > [  +0.006014] ---[ end trace 2f24e0820b44f686 ]---
> > [  +5.870886] kvm [1783]: vcpu0 unhandled rdmsr: 0x1c9
> > [  +0.004991] kvm [1783]: vcpu0 unhandled rdmsr: 0x1a6
> > [  +0.005020] kvm [1783]: vcpu0 unhandled rdmsr: 0x3f6
> > [Feb16 14:18] kvm [1783]: vcpu0 unhandled rdmsr: 0x1c9
> > [  +0.005020] kvm [1783]: vcpu0 unhandled rdmsr: 0x1a6
> > [  +0.004998] kvm [1783]: vcpu0 unhandled rdmsr: 0x3f6
> > . . .
> > 
> > 
> > Version
> > -------
> > 
> > Exact below versions were used on L0 and L1:
> > 
> >   $ uname -r; rpm -q qemu-system-x86
> >   3.20.0-0.rc0.git5.1.fc23.x86_64
> >   qemu-system-x86-2.2.0-5.fc22.x86_64
> > 
> > 
> > 
> > Other info
> > ----------
> > 
> > - Unpacking the kernel-3.20.0-0.rc0.git5.1.fc23.src.rpm and looking at
> >   this file, arch/x86/kvm/vmx.c, line 9190 is below, with contextual
> >   code:
> > 
> >    [. . .]
> >    9178  * Emulate an exit from nested guest (L2) to L1, i.e., prepare to run L1
> >    9179  * and modify vmcs12 to make it see what it would expect to see there if
> >    9180  * L2 was its real guest. Must only be called when in L2 (is_guest_mode())
> >    9181  */
> >    9182 static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
> >    9183                               u32 exit_intr_info,
> >    9184                               unsigned long exit_qualification)
> >    9185 {
> >    9186         struct vcpu_vmx *vmx = to_vmx(vcpu);
> >    9187         struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> >    9188 
> >    9189         /* trying to cancel vmlaunch/vmresume is a bug */
> >    9190         WARN_ON_ONCE(vmx->nested.nested_run_pending);
> >    9191 
> >    9192         leave_guest_mode(vcpu);
> >    9193         prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
> >    9194                        exit_qualification);
> >    9195 
> >    9196         vmx_load_vmcs01(vcpu);
> >    9197 
> >    9198         if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
> >    9199             && nested_exit_intr_ack_set(vcpu)) {
> >    9200                 int irq = kvm_cpu_get_interrupt(vcpu);
> >    9201                 WARN_ON(irq < 0);
> >    9202                 vmcs12->vm_exit_intr_info = irq |
> >    9203                         INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
> >    9204         }
> > 
> > 
> > - The above line 9190 was introduced in this commit:
> > 
> >   $ git log -S'WARN_ON_ONCE(vmx->nested.nested_run_pending)' \
> >       -- ./arch/x86/kvm/vmx.c
> >   commit 5f3d5799974b89100268ba813cec8db7bd0693fb
> >   Author: Jan Kiszka <jan.kiszka@siemens.com>
> >   Date:   Sun Apr 14 12:12:46 2013 +0200
> >   
> >       KVM: nVMX: Rework event injection and recovery
> >       
> >       The basic idea is to always transfer the pending event injection on
> >       vmexit into the architectural state of the VCPU and then drop it from
> >       there if it turns out that we left L2 to enter L1, i.e. if we enter
> >       prepare_vmcs12.
> >       
> >       vmcs12_save_pending_events takes care to transfer pending L0 events into
> >       the queue of L1. That is mandatory as L1 may decide to switch the guest
> >       state completely, invalidating or preserving the pending events for
> >       later injection (including on a different node, once we support
> >       migration).
> >       
> >       This concept is based on the rule that a pending vmlaunch/vmresume is
> >       not canceled. Otherwise, we would risk to lose injected events or leak
> >       them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
> >       entry of nested_vmx_vmexit.
> >       
> >       Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >       Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > 
> > 
> > - `dmesg`, `dmidecode`, `x86info -a` details of L0 and L1 here
> > 
> >     https://kashyapc.fedorapeople.org/virt/Info-L0-Intel-Xeon-and-L1-nVMX-test/
> > 
> 
> Does enable_apicv make a difference?

I did test with enable_apicv=0 on the physical host (at Paolo's
suggestion on IRC), and it made no difference:

$ cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.20.0-0.rc0.git5.1.fc23.x86_64 root=/dev/mapper/fedora--server_dell--per910--02-root ro console=ttyS1,115200n81 rd.lvm.lv=fedora-server_dell-per910-02/swap rd.lvm.lv=fedora-server_dell-per910-02/root LANG=en_US.UTF-8 enable_apicv=0
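
Note on verifying the setting: as far as I know, enable_apicv is a
kvm_intel module parameter, so on the kernel command line it is normally
given with the module prefix (kvm_intel.enable_apicv=0 or
kvm-intel.enable_apicv=0), and the value in effect can be read back from
/sys/module/kvm_intel/parameters/enable_apicv. A minimal sketch for
checking both follows; the helper names are hypothetical and not part of
any existing tool:

```python
from pathlib import Path
from typing import Optional


def module_param_from_cmdline(cmdline: str, module: str, param: str) -> Optional[str]:
    """Return the value of `module.param=value` on a kernel command line, or None.

    Dashes and underscores are treated as equivalent, so both
    kvm_intel.enable_apicv and kvm-intel.enable_apicv match.
    """
    wanted_module = module.replace("-", "_")
    wanted_param = param.replace("-", "_")
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep or "." not in key:
            # Bare options such as "ro" or "enable_apicv=0" carry no module prefix.
            continue
        mod, _, name = key.partition(".")
        if mod.replace("-", "_") == wanted_module and name.replace("-", "_") == wanted_param:
            return value
    return None


def loaded_module_param(module: str, param: str) -> Optional[str]:
    """Read the current value of a module parameter from sysfs, or None if unavailable."""
    path = Path("/sys/module") / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

On L0, module_param_from_cmdline(Path("/proc/cmdline").read_text(),
"kvm_intel", "enable_apicv") shows whether a prefixed option was passed,
and loaded_module_param("kvm_intel", "enable_apicv") reports the value
KVM is actually using (boolean parameters read back as Y or N).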

> Is this a regression caused by the commit, or do you only see it with
> very recent kvm.git?

I'm afraid I didn't bisect it; I only wanted to note that this specific
WARN was introduced by the above commit.

I'm sure this kernel (on L0) does not exhibit the problem:
kernel-3.17.4-301.fc21.x86_64. But with either of these two kernels on
the physical host, the problem manifests (L1 reboots):
3.19.0-1.fc22 or kernel-3.20.0-0.rc0.git5.1.fc23.

-- 
/kashyap

Thread overview: 24+ messages
2015-02-16 20:40 [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting* Kashyap Chamarthy
2015-02-17  6:02 ` Jan Kiszka
2015-02-17 11:24   ` Kashyap Chamarthy [this message]
2015-02-17 18:00     ` Bandan Das
2015-02-17 18:07       ` Jan Kiszka
2015-02-18 10:20         ` Kashyap Chamarthy
2015-02-18 16:42     ` Paolo Bonzini
2015-02-19 12:07       ` Kashyap Chamarthy
2015-02-19 15:01         ` Radim Krčmář
2015-02-19 16:02           ` Radim Krčmář
2015-02-19 16:07             ` Radim Krčmář
2015-02-19 21:10             ` Kashyap Chamarthy
2015-02-19 22:28               ` Kashyap Chamarthy
2015-02-20 16:14                 ` Radim Krčmář
2015-02-20 19:45                   ` Kashyap Chamarthy
2015-02-22 15:46                     ` Kashyap Chamarthy
2015-02-23 13:56                       ` Radim Krčmář
2015-02-23 16:14                         ` Kashyap Chamarthy
2015-02-23 17:09                           ` Kashyap Chamarthy
2015-02-23 18:05                             ` Kashyap Chamarthy
2015-02-24 16:30                               ` [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0 Radim Krčmář
2015-02-24 16:39                                 ` Jan Kiszka
2015-02-24 18:32                                   ` Bandan Das
2015-02-25 15:50                                 ` Kashyap Chamarthy
