* [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Kashyap Chamarthy @ 2015-02-16 20:40 UTC
  To: kvm, jan.kiszka; +Cc: dgilbert

I can observe this on only one of the Intel Xeon machines (it has 48
CPUs and 1 TB of memory), but there it is very reliably reproducible.


Reproducer:

  - Ensure the physical host (L0) and the guest hypervisor (L1) are
    running the 3.20.0-0.rc0.git5.1 kernel (I used the one from Fedora's
    Rawhide), preferably on an Intel Xeon machine, as that's where I
    could reproduce this issue (not on a Haswell machine).
  - Boot an L2 guest: Run `qemu-sanity-check --accel=kvm` in L1 (or
    your own preferred method to boot an L2 KVM guest).
  - On a different terminal attached to L1's serial console, observe L1
    rebooting (a minimal sketch of this flow follows the list).
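
A minimal sketch of that flow; the domain name `f21vm` is hypothetical,
and `virsh console` assumes L1 is a libvirt-managed guest with a serial
console defined -- substitute your own names and tooling:

  # On L0, terminal 1: attach to L1's serial console
  $ virsh console f21vm

  # Inside L1: boot a throwaway L2 KVM guest
  $ qemu-sanity-check --accel=kvm

  # On L0, terminal 2: watch kernel messages while L1 reboots
  $ dmesg -w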


The only thing I notice in `dmesg` (on L0) is the trace below. _However_,
this trace does not appear when an L1 reboot is triggered while you
watch `dmesg -w` (to wait for new messages) as I boot an L2 guest --
which means the trace below is not the root cause of L1 being rebooted.
When L1 gets rebooted, all you observe is one of the "vcpu0 unhandled
rdmsr: 0x1a6" messages shown below.

. . .
[Feb16 13:44] ------------[ cut here ]------------
[  +0.004632] WARNING: CPU: 4 PID: 1837 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel]()
[  +0.009835] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ip6table_filter ip6_tables cfg80211 rfkill iTCO_wdt iTCO_vendor_support ipmi_devintf gpio_ich dcdbas coretemp kvm_intel kvm crc32c_intel ipmi_ssif serio_raw acpi_power_meter ipmi_si tpm_tis ipmi_msghandler tpm lpc_ich i7core_edac mfd_core edac_core acpi_cpufreq shpchp wmi mgag200 i2c_algo_bit drm_kms_helper ttm ata_generic drm pata_acpi megaraid_sas bnx2
[  +0.050289] CPU: 4 PID: 1837 Comm: qemu-system-x86 Not tainted 3.20.0-0.rc0.git5.1.fc23.x86_64 #1
[  +0.008902] Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.8.2 10/25/2012
[  +0.007469]  0000000000000000 00000000ee6c0c54 ffff88bf60bf7c18 ffffffff818760f7
[  +0.007542]  0000000000000000 0000000000000000 ffff88bf60bf7c58 ffffffff810ab80a
[  +0.007519]  ffff88ff625b8000 ffff883f55f9b000 0000000000000000 0000000000000014
[  +0.007489] Call Trace:
[  +0.002471]  [<ffffffff818760f7>] dump_stack+0x4c/0x65
[  +0.005152]  [<ffffffff810ab80a>] warn_slowpath_common+0x8a/0xc0
[  +0.006020]  [<ffffffff810ab93a>] warn_slowpath_null+0x1a/0x20
[  +0.005851]  [<ffffffffa130957e>] nested_vmx_vmexit+0x96e/0xb00 [kvm_intel]
[  +0.006974]  [<ffffffffa130c5f7>] ? vmx_handle_exit+0x1e7/0xcb2 [kvm_intel]
[  +0.006999]  [<ffffffffa02ca972>] ? kvm_arch_vcpu_ioctl_run+0x6d2/0x1b50 [kvm]
[  +0.007239]  [<ffffffffa130992a>] vmx_queue_exception+0x10a/0x150 [kvm_intel]
[  +0.007136]  [<ffffffffa02cb30b>] kvm_arch_vcpu_ioctl_run+0x106b/0x1b50 [kvm]
[  +0.007162]  [<ffffffffa02ca972>] ? kvm_arch_vcpu_ioctl_run+0x6d2/0x1b50 [kvm]
[  +0.007241]  [<ffffffff8110760d>] ? trace_hardirqs_on+0xd/0x10
[  +0.005864]  [<ffffffffa02b2df6>] ? vcpu_load+0x26/0x70 [kvm]
[  +0.005761]  [<ffffffff81103c0f>] ? lock_release_holdtime.part.29+0xf/0x200
[  +0.006979]  [<ffffffffa02c5f88>] ? kvm_arch_vcpu_load+0x58/0x210 [kvm]
[  +0.006634]  [<ffffffffa02b3203>] kvm_vcpu_ioctl+0x383/0x7e0 [kvm]
[  +0.006197]  [<ffffffff81027b9d>] ? native_sched_clock+0x2d/0xa0
[  +0.006026]  [<ffffffff810d5fc6>] ? creds_are_invalid.part.1+0x16/0x50
[  +0.006537]  [<ffffffff810d6021>] ? creds_are_invalid+0x21/0x30
[  +0.005930]  [<ffffffff813a61da>] ? inode_has_perm.isra.48+0x2a/0xa0
[  +0.006365]  [<ffffffff8128c7b8>] do_vfs_ioctl+0x2e8/0x530
[  +0.005496]  [<ffffffff8128ca81>] SyS_ioctl+0x81/0xa0
[  +0.005065]  [<ffffffff8187f8e9>] system_call_fastpath+0x12/0x17
[  +0.006014] ---[ end trace 2f24e0820b44f686 ]---
[  +5.870886] kvm [1783]: vcpu0 unhandled rdmsr: 0x1c9
[  +0.004991] kvm [1783]: vcpu0 unhandled rdmsr: 0x1a6
[  +0.005020] kvm [1783]: vcpu0 unhandled rdmsr: 0x3f6
[Feb16 14:18] kvm [1783]: vcpu0 unhandled rdmsr: 0x1c9
[  +0.005020] kvm [1783]: vcpu0 unhandled rdmsr: 0x1a6
[  +0.004998] kvm [1783]: vcpu0 unhandled rdmsr: 0x3f6
. . .


Version
-------

The exact versions below were used on L0 and L1:

  $ uname -r; rpm -q qemu-system-x86
  3.20.0-0.rc0.git5.1.fc23.x86_64
  qemu-system-x86-2.2.0-5.fc22.x86_64



Other info
----------

- Unpacking kernel-3.20.0-0.rc0.git5.1.fc23.src.rpm and looking at
  arch/x86/kvm/vmx.c, line 9190 is the WARN shown below, with contextual
  code:

   [. . .]
   9178  * Emulate an exit from nested guest (L2) to L1, i.e., prepare to run L1
   9179  * and modify vmcs12 to make it see what it would expect to see there if
   9180  * L2 was its real guest. Must only be called when in L2 (is_guest_mode())
   9181  */
   9182 static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
   9183                               u32 exit_intr_info,
   9184                               unsigned long exit_qualification)
   9185 {
   9186         struct vcpu_vmx *vmx = to_vmx(vcpu);
   9187         struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
   9188 
   9189         /* trying to cancel vmlaunch/vmresume is a bug */
   9190         WARN_ON_ONCE(vmx->nested.nested_run_pending);
   9191 
   9192         leave_guest_mode(vcpu);
   9193         prepare_vmcs12(vcpu, vmcs12, exit_reason, exit_intr_info,
   9194                        exit_qualification);
   9195 
   9196         vmx_load_vmcs01(vcpu);
   9197 
   9198         if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
   9199             && nested_exit_intr_ack_set(vcpu)) {
   9200                 int irq = kvm_cpu_get_interrupt(vcpu);
   9201                 WARN_ON(irq < 0);
   9202                 vmcs12->vm_exit_intr_info = irq |
   9203                         INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
   9204         }


- The above line 9190 was introduced in this commit:

  $ git log -S'WARN_ON_ONCE(vmx->nested.nested_run_pending)' \
      -- ./arch/x86/kvm/vmx.c
  commit 5f3d5799974b89100268ba813cec8db7bd0693fb
  Author: Jan Kiszka <jan.kiszka@siemens.com>
  Date:   Sun Apr 14 12:12:46 2013 +0200
  
      KVM: nVMX: Rework event injection and recovery
      
      The basic idea is to always transfer the pending event injection on
      vmexit into the architectural state of the VCPU and then drop it from
      there if it turns out that we left L2 to enter L1, i.e. if we enter
      prepare_vmcs12.
      
      vmcs12_save_pending_events takes care to transfer pending L0 events into
      the queue of L1. That is mandatory as L1 may decide to switch the guest
      state completely, invalidating or preserving the pending events for
      later injection (including on a different node, once we support
      migration).
      
      This concept is based on the rule that a pending vmlaunch/vmresume is
      not canceled. Otherwise, we would risk to lose injected events or leak
      them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
      entry of nested_vmx_vmexit.
      
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>


- `dmesg`, `dmidecode`, and `x86info -a` details of L0 and L1 are here:

    https://kashyapc.fedorapeople.org/virt/Info-L0-Intel-Xeon-and-L1-nVMX-test/

-- 
/kashyap


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Jan Kiszka @ 2015-02-17  6:02 UTC
  To: Kashyap Chamarthy, kvm; +Cc: dgilbert

On 2015-02-16 21:40, Kashyap Chamarthy wrote:
> I can observe this on only one of the Intel Xeon machines (it has 48
> CPUs and 1 TB of memory), but there it is very reliably reproducible.
> 
> [. . .]

Does enable_apicv make a difference?

Is this a regression caused by the commit, or do you only see it with
very recent kvm.git?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Kashyap Chamarthy @ 2015-02-17 11:24 UTC
  To: Jan Kiszka; +Cc: kvm, dgilbert

On Tue, Feb 17, 2015 at 07:02:14AM +0100, Jan Kiszka wrote:
> On 2015-02-16 21:40, Kashyap Chamarthy wrote:
> > [. . .]
> 
> Does enable_apicv make a difference?

Actually, I did perform a test (at Paolo's suggestion on IRC) with
enable_apicv=0 on the physical host, and it didn't make any difference:

$ cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.20.0-0.rc0.git5.1.fc23.x86_64 root=/dev/mapper/fedora--server_dell--per910--02-root ro console=ttyS1,115200n81 rd.lvm.lv=fedora-server_dell-per910-02/swap rd.lvm.lv=fedora-server_dell-per910-02/root LANG=en_US.UTF-8 enable_apicv=0

> Is this a regression caused by the commit, or do you only see it with
> very recent kvm.git?

I'm afraid I didn't bisect it; I just wanted to note that the above
specific WARN was introduced in the above commit.

I'm sure this kernel (on L0) does not exhibit the problem:
kernel-3.17.4-301.fc21.x86_64. But with either of these two kernels on
the physical host, the said problem manifests (L1 reboots):
3.19.0-1.fc22 or kernel-3.20.0-0.rc0.git5.1.fc23.

-- 
/kashyap


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Bandan Das @ 2015-02-17 18:00 UTC
  To: Kashyap Chamarthy; +Cc: Jan Kiszka, kvm, dgilbert

Kashyap Chamarthy <kchamart@redhat.com> writes:
..
>> 
>> Does enable_apicv make a difference?
>
> Actually, I did perform a test (at Paolo's suggestion on IRC) with
> enable_apicv=0 on the physical host, and it didn't make any difference:
>
> $ cat /proc/cmdline 
> BOOT_IMAGE=/vmlinuz-3.20.0-0.rc0.git5.1.fc23.x86_64 root=/dev/mapper/fedora--server_dell--per910--02-root ro console=ttyS1,115200n81 rd.lvm.lv=fedora-server_dell-per910-02/swap rd.lvm.lv=fedora-server_dell-per910-02/root LANG=en_US.UTF-8 enable_apicv=0

I am not sure if this works? enable_apicv is a kvm_intel module parameter.

>> Is this a regression caused by the commit, or do you only see it with
>> very recent kvm.git?
>
> Afraid, I didn't bisect it, but I just wanted to note that the above
> specific WARN was introduced in the above commit.

You could try an upstream kernel before the recent MSR load/store changes
to narrow down the problem.
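
For instance, a minimal `git bisect` sketch, assuming v3.17 is known
good and v3.19 is known bad (per the versions reported earlier in this
thread):

  $ git bisect start
  $ git bisect bad v3.19
  $ git bisect good v3.17
  # Build and boot each kernel git suggests, try booting an L2 guest,
  # then mark the outcome until git names the first bad commit:
  $ git bisect good    # if L1 survived
  $ git bisect bad     # if L1 rebooted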

Bandan

> I'm sure this Kernel (on L0) does not exhibit the problem:
> kernel-3.17.4-301.fc21.x86_64. But, if I had either of these two Kernels
> on the physical host, then the said problem manifests (L1 reboots):
> 3.19.0-1.fc22 or kernel-3.20.0-0.rc0.git5.1.fc23


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Jan Kiszka @ 2015-02-17 18:07 UTC
  To: Bandan Das, Kashyap Chamarthy; +Cc: kvm, dgilbert

On 2015-02-17 19:00, Bandan Das wrote:
> Kashyap Chamarthy <kchamart@redhat.com> writes:
> ..
>>>
>>> Does enable_apicv make a difference?
>>
>> Actually, I did perform a test (at Paolo's suggestion on IRC) with
>> enable_apicv=0 on the physical host, and it didn't make any difference:
>>
>> $ cat /proc/cmdline 
>> BOOT_IMAGE=/vmlinuz-3.20.0-0.rc0.git5.1.fc23.x86_64 root=/dev/mapper/fedora--server_dell--per910--02-root ro console=ttyS1,115200n81 rd.lvm.lv=fedora-server_dell-per910-02/swap rd.lvm.lv=fedora-server_dell-per910-02/root LANG=en_US.UTF-8 enable_apicv=0
> 
> I am not sure if this works? enable_apicv is a kvm_intel module parameter.

Good point. It has to be kvm_intel.enable_apicv=0 (if the module is built in).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Kashyap Chamarthy @ 2015-02-18 10:20 UTC
  To: Jan Kiszka; +Cc: Bandan Das, kvm, dgilbert

On Tue, Feb 17, 2015 at 07:07:21PM +0100, Jan Kiszka wrote:
> On 2015-02-17 19:00, Bandan Das wrote:
> > Kashyap Chamarthy <kchamart@redhat.com> writes:
> > ..
> >>>
> >>> Does enable_apicv make a difference?
> >>
> >> Actually, I did perform a test (at Paolo's suggestion on IRC) with
> >> enable_apicv=0 on the physical host, and it didn't make any difference:
> >>
> >> $ cat /proc/cmdline 
> >> BOOT_IMAGE=/vmlinuz-3.20.0-0.rc0.git5.1.fc23.x86_64 root=/dev/mapper/fedora--server_dell--per910--02-root ro console=ttyS1,115200n81 rd.lvm.lv=fedora-server_dell-per910-02/swap rd.lvm.lv=fedora-server_dell-per910-02/root LANG=en_US.UTF-8 enable_apicv=0
> > 
> > I am not sure if this works? enable_apicv is a kvm_intel module parameter.
> 
> Good point. Has to be kvm_intel.enable_apicv=0 (if the module is built in).

Hmm, yeah, I should have just added "options kvm-intel enable_apicv=n"
(without quotes) to /etc/modprobe.d/dist.conf.

I just rebooted the host with "kvm_intel.enable_apicv=0" on the kernel
command line:

$ dmesg | grep apicv
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.20.0-0.rc0.git5.1.fc23.x86_64 root=/dev/mapper/fedora--server_dell--per910--02-root ro console=ttyS1,115200n81 rd.lvm.lv=fedora-server_dell-per910-02/swap rd.lvm.lv=fedora-server_dell-per910-02/root LANG=en_US.UTF-8 kvm_intel.enable_apicv=0
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.20.0-0.rc0.git5.1.fc23.x86_64 root=/dev/mapper/fedora--server_dell--per910--02-root ro console=ttyS1,115200n81 rd.lvm.lv=fedora-server_dell-per910-02/swap rd.lvm.lv=fedora-server_dell-per910-02/root LANG=en_US.UTF-8 kvm_intel.enable_apicv=0

$ cat /sys/module/kvm_intel/parameters/enable_apicv
N
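
As an aside, a minimal sketch for toggling the parameter without a full
reboot -- assuming kvm_intel is built as a module (as it is here, per
the module lists in the traces above) and no VMs are running, so the
module can be unloaded:

  $ sudo modprobe -r kvm_intel
  $ sudo modprobe kvm_intel enable_apicv=0
  $ cat /sys/module/kvm_intel/parameters/enable_apicv
  N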
 

Then I booted an L2 guest over L1's serial console while observing the
host's `dmesg -w`; I can see the same traceback:

. . .
[  918.327553] ------------[ cut here ]------------
[  918.332196] WARNING: CPU: 13 PID: 2201 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel]()
[  918.342162] Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ip6table_filter ip6_tables cfg80211 rfkill iTCO_wdt ipmi_devintf iTCO_vendor_support gpio_ich dcdbas coretemp kvm_intel kvm crc32c_intel serio_raw ipmi_ssif ipmi_si tpm_tis tpm ipmi_msghandler acpi_power_meter i7core_edac lpc_ich edac_core acpi_cpufreq mfd_core shpchp wmi mgag200 i2c_algo_bit drm_kms_helper ttm drm ata_generic pata_acpi megaraid_sas bnx2
[  918.396548] CPU: 13 PID: 2201 Comm: qemu-system-x86 Not tainted 3.20.0-0.rc0.git5.1.fc23.x86_64 #1
[  918.405605] Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.8.2 10/25/2012
[  918.413138]  0000000000000000 00000000651bc665 ffff883f619ebc18 ffffffff818760f7
[  918.420790]  0000000000000000 0000000000000000 ffff883f619ebc58 ffffffff810ab80a
[  918.428336]  ffff887f5b838000 ffff883f3f8c8000 0000000000000000 0000000000000014
[  918.435865] Call Trace:
[  918.438390]  [<ffffffff818760f7>] dump_stack+0x4c/0x65
[  918.443590]  [<ffffffff810ab80a>] warn_slowpath_common+0x8a/0xc0
[  918.449596]  [<ffffffff810ab93a>] warn_slowpath_null+0x1a/0x20
[  918.455494]  [<ffffffffa100857e>] nested_vmx_vmexit+0x96e/0xb00 [kvm_intel]
[  918.462455]  [<ffffffffa100b5f7>] ? vmx_handle_exit+0x1e7/0xcb2 [kvm_intel]
[  918.469444]  [<ffffffffa0236972>] ? kvm_arch_vcpu_ioctl_run+0x6d2/0x1b50 [kvm]
[  918.476731]  [<ffffffffa100892a>] vmx_queue_exception+0x10a/0x150 [kvm_intel]
[  918.483870]  [<ffffffffa023730b>] kvm_arch_vcpu_ioctl_run+0x106b/0x1b50 [kvm]
[  918.491078]  [<ffffffffa0236972>] ? kvm_arch_vcpu_ioctl_run+0x6d2/0x1b50 [kvm]
[  918.498302]  [<ffffffff8110760d>] ? trace_hardirqs_on+0xd/0x10
[  918.504203]  [<ffffffffa021edf6>] ? vcpu_load+0x26/0x70 [kvm]
[  918.510012]  [<ffffffff81103c0f>] ? lock_release_holdtime.part.29+0xf/0x200
[  918.516973]  [<ffffffffa021f203>] kvm_vcpu_ioctl+0x383/0x7e0 [kvm]
[  918.523157]  [<ffffffff81027b9d>] ? native_sched_clock+0x2d/0xa0
[  918.529167]  [<ffffffff810d5fc6>] ? creds_are_invalid.part.1+0x16/0x50
[  918.535701]  [<ffffffff810d6021>] ? creds_are_invalid+0x21/0x30
[  918.541678]  [<ffffffff813a61da>] ? inode_has_perm.isra.48+0x2a/0xa0
[  918.548097]  [<ffffffff8128c7b8>] do_vfs_ioctl+0x2e8/0x530
[  918.553579]  [<ffffffff8128ca81>] SyS_ioctl+0x81/0xa0
[  918.558635]  [<ffffffff8187f8e9>] system_call_fastpath+0x12/0x17
[  918.564640] ---[ end trace b07d41c569219c46 ]---
[ 1092.389383 <  173.824743>] kvm [2168]: vcpu0 unhandled rdmsr: 0x1c9
[ 1092.394374 <    0.004991>] kvm [2168]: vcpu0 unhandled rdmsr: 0x1a6
[ 1092.399399 <    0.005025>] kvm [2168]: vcpu0 unhandled rdmsr: 0x3f6
. . .

-- 
/kashyap


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Paolo Bonzini @ 2015-02-18 16:42 UTC
  To: Kashyap Chamarthy, Jan Kiszka; +Cc: kvm, dgilbert



On 17/02/2015 12:24, Kashyap Chamarthy wrote:
> I'm afraid I didn't bisect it; I just wanted to note that the above
> specific WARN was introduced in the above commit.
> 
> I'm sure this kernel (on L0) does not exhibit the problem:
> kernel-3.17.4-301.fc21.x86_64. But with either of these two kernels on
> the physical host, the said problem manifests (L1 reboots):
> 3.19.0-1.fc22 or kernel-3.20.0-0.rc0.git5.1.fc23.

Nested APICv is not part of 3.19, so it cannot be the culprit.

Can you try 3.18?

Paolo


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Kashyap Chamarthy @ 2015-02-19 12:07 UTC
  To: Paolo Bonzini; +Cc: Jan Kiszka, kvm, dgilbert

On Wed, Feb 18, 2015 at 05:42:37PM +0100, Paolo Bonzini wrote:
> 
> 
> On 17/02/2015 12:24, Kashyap Chamarthy wrote:
> > I'm afraid I didn't bisect it; I just wanted to note that the above
> > specific WARN was introduced in the above commit.
> > 
> > I'm sure this kernel (on L0) does not exhibit the problem:
> > kernel-3.17.4-301.fc21.x86_64. But with either of these two kernels on
> > the physical host, the said problem manifests (L1 reboots):
> > 3.19.0-1.fc22 or kernel-3.20.0-0.rc0.git5.1.fc23.
> 
> Nested APICv is not part of 3.19, so it cannot be the culprit.
> 
> Can you try 3.18?

I just did two tests with 3.18:

(1) Kernel 3.18 on L0 and 3.20 on L1

    Result: booting an L2 guest causes L1 to reboot, with the same[*]
            stack trace on L0 (mentioned previously on this thread).

            But, annoyingly enough, after I did test (2) below and then
            switched back to test (1), I no longer notice the said stack
            trace in L0's `dmesg`, however many times I boot an L2 guest.

(2) Kernel 3.18 on both L0 and L1

    Result: booting an L2 guest causes L1 to reboot, but with *no*
            stack trace on L0



[*] Stack trace from test (1)

. . .
[ 4120.296552] ------------[ cut here ]------------
[ 4120.301190] WARNING: CPU: 6 PID: 1841 at arch/x86/kvm/vmx.c:8962 nested_vmx_vmexit+0x7ee/0x880 [kvm_intel]()
[ 4120.311048] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ip6table_filter ip6_tables cfg80211 rfkill coretemp kvm_intel kvm iTCO_wdt gpio_ich iTCO_vendor_support joydev crc32c_intel lpc_ich ipmi_devintf ipmi_si tpm_tis shpchp i7core_edac dcdbas mfd_core tpm ipmi_msghandler serio_raw edac_core acpi_power_meter wmi acpi_cpufreq mgag200 i2c_algo_bit drm_kms_helper ttm drm megaraid_sas ata_generic bnx2 pata_acpi
[ 4120.361643] CPU: 6 PID: 1841 Comm: qemu-system-x86 Not tainted 3.18.7-200.fc21.x86_64 #1
[ 4120.369757] Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.8.2 10/25/2012
[ 4120.377269]  0000000000000000 00000000e947d406 ffff88bf21f27c48 ffffffff8175e686
[ 4120.384866]  0000000000000000 0000000000000000 ffff88bf21f27c88 ffffffff810991d1
[ 4120.392469]  ffff88bf21f27c98 ffff887f1f73e000 0000000000000000 0000000000000014
[ 4120.400033] Call Trace:
[ 4120.402533]  [<ffffffff8175e686>] dump_stack+0x46/0x58
[ 4120.407714]  [<ffffffff810991d1>] warn_slowpath_common+0x81/0xa0
[ 4120.413740]  [<ffffffff810992ea>] warn_slowpath_null+0x1a/0x20
[ 4120.419611]  [<ffffffffa1cee0ee>] nested_vmx_vmexit+0x7ee/0x880 [kvm_intel]
[ 4120.426609]  [<ffffffffa1cee5af>] ? vmx_handle_exit+0x1bf/0xaa0 [kvm_intel]
[ 4120.433585]  [<ffffffffa1cee39c>] vmx_queue_exception+0xfc/0x150 [kvm_intel]
[ 4120.440697]  [<ffffffffa0192dfd>] kvm_arch_vcpu_ioctl_run+0xd9d/0x1290 [kvm]
[ 4120.447783]  [<ffffffffa018e528>] ? kvm_arch_vcpu_load+0x58/0x220 [kvm]
[ 4120.454436]  [<ffffffffa017acbc>] kvm_vcpu_ioctl+0x32c/0x5c0 [kvm]
[ 4120.460650]  [<ffffffff817634cd>] ? down_read+0x1d/0x30
[ 4120.465915]  [<ffffffff8122a1c0>] do_vfs_ioctl+0x2d0/0x4b0
[ 4120.471431]  [<ffffffff8122a421>] SyS_ioctl+0x81/0xa0
[ 4120.476477]  [<ffffffff81765429>] system_call_fastpath+0x12/0x17
[ 4120.482533] ---[ end trace 5410644656637166 ]---
[ 4128.015867] kvm [1768]: vcpu0 unhandled rdmsr: 0x1c9
[ 4128.020849] kvm [1768]: vcpu0 unhandled rdmsr: 0x1a6
[ 4128.025848] kvm [1768]: vcpu0 unhandled rdmsr: 0x3f6
. . .

-- 
/kashyap


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Radim Krčmář @ 2015-02-19 15:01 UTC
  To: Kashyap Chamarthy; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

2015-02-19 13:07+0100, Kashyap Chamarthy:
> I just did two tests with 3.18:
> 
> (1) Kernel 3.18 on L0 and 3.20 on L1
> 
>     Result: booting an L2 guest causes L1 to reboot, with the same[*]
>             stack trace on L0 (mentioned previously on this thread).
> 
>             But, annoyingly enough, after I did test (2) below and then
>             switched back to test (1), I no longer notice the said stack
>             trace in L0's `dmesg`, however many times I boot an L2 guest.
> 
> (2) Kernel 3.18 on both L0 and L1
> 
>     Result: booting an L2 guest causes L1 to reboot, but with *no*
>             stack trace on L0

It is a WARN_ON_ONCE, so it quite likely happened with 3.18 too -- the
warning prints at most once per boot, so its absence from `dmesg` proves
nothing.

5f3d5799974b8 KVM: nVMX: Rework event injection and recovery:
  This concept is based on the rule that a pending vmlaunch/vmresume is
  not canceled. Otherwise, we would risk to lose injected events or leak
  them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
  entry of nested_vmx_vmexit.

I wonder if we have broken the invariant since 3.9 ...


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Radim Krčmář @ 2015-02-19 16:02 UTC
  To: Kashyap Chamarthy; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

2015-02-19 16:01+0100, Radim Krčmář:
> 2015-02-19 13:07+0100, Kashyap Chamarthy:
> 5f3d5799974b8 KVM: nVMX: Rework event injection and recovery:
>   This concept is based on the rule that a pending vmlaunch/vmresume is
>   not canceled. Otherwise, we would risk to lose injected events or leak
>   them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
>   entry of nested_vmx_vmexit.
> 
> I wonder if we have broken the invariant since 3.9 ...

e011c663b9c786d115c0f45e5b0bfae0c39428d4
KVM: nVMX: Check all exceptions for intercept during delivery to L2

  All exceptions should be checked for intercept during delivery to L2,
  but we check only #PF currently. Drop nested_run_pending while we are
  at it since exception cannot be injected during vmentry anyway.

The last sentence is not true.

Can you try if the following patch works?
(I know little about nested, so it might be introducing another bug.)

Thanks.

---8<---
KVM: nVMX: fix L2 to L1 interrupt leak

When vmx->nested.nested_run_pending is set, we aren't expected to exit
to L1, but nested_vmx_check_exception() could, since e011c663b9c7.
Prevent that.

Fixes: e011c663b9c7 ("Check all exceptions for intercept during delivery to L2")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3f73bfad0349..389166a1b79a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2098,6 +2098,9 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 
+	if (to_vmx(vcpu)->nested.nested_run_pending)
+		return 0;
+
 	if (!(vmcs12->exception_bitmap & (1u << nr)))
 		return 0;
 
-- 
2.3.0



* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Radim Krčmář @ 2015-02-19 16:07 UTC
  To: Kashyap Chamarthy; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

2015-02-19 17:02+0100, Radim Krčmář:
> Fixes: e011c663b9c7 ("Check all exceptions for intercept during delivery to L2")

Note: I haven't verified that it was introduced by this patch; it's just
that nothing against the hypothesis popped out during a short gravedigging.


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Kashyap Chamarthy @ 2015-02-19 21:10 UTC
  To: Radim Krčmář; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

On Thu, Feb 19, 2015 at 05:02:22PM +0100, Radim Krčmář wrote:
> 2015-02-19 16:01+0100, Radim Krčmář:
> > 2015-02-19 13:07+0100, Kashyap Chamarthy:
> > 5f3d5799974b8 KVM: nVMX: Rework event injection and recovery:
> >   This concept is based on the rule that a pending vmlaunch/vmresume is
> >   not canceled. Otherwise, we would risk to lose injected events or leak
> >   them into the wrong queues. Encode this rule via a WARN_ON_ONCE at the
> >   entry of nested_vmx_vmexit.
> > 
> > I wonder if we have broken the invariant since 3.9 ...
> 
> e011c663b9c786d115c0f45e5b0bfae0c39428d4
> KVM: nVMX: Check all exceptions for intercept during delivery to L2
> 
>   All exceptions should be checked for intercept during delivery to L2,
>   but we check only #PF currently. Drop nested_run_pending while we are
>   at it since exception cannot be injected during vmentry anyway.
> 
> The last sentence is not true.
> 
> Can you try if the following patch works?

Sure, I will test a kernel built with the patch below and report back.

Thanks for taking a look.

--
/kashyap


> (I know little about nested, so it might be introducing another bug.)
> 
> Thanks.
> 
> ---8<---
> KVM: nVMX: fix L2 to L1 interrupt leak
> 
> When vmx->nested.nested_run_pending is set, we aren't expected to exit
> to L1, but nested_vmx_check_exception() could, since e011c663b9c7.
> Prevent that.
> 
> Fixes: e011c663b9c7 ("Check all exceptions for intercept during delivery to L2")
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 3f73bfad0349..389166a1b79a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2098,6 +2098,9 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
>  {
>  	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>  
> +	if (to_vmx(vcpu)->nested.nested_run_pending)
> +		return 0;
> +
>  	if (!(vmcs12->exception_bitmap & (1u << nr)))
>  		return 0;


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Kashyap Chamarthy @ 2015-02-19 22:28 UTC
  To: Radim Krčmář; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

On Thu, Feb 19, 2015 at 10:10:11PM +0100, Kashyap Chamarthy wrote:
> On Thu, Feb 19, 2015 at 05:02:22PM +0100, Radim Krčmář wrote:

[. . .]

> > Can you try if the following patch works?
> 
> Sure, will test a Kernel built with the below patch and report back.

Hmm, I'm stuck with a meta issue.

I checked out the KVM tree[1] on L0, applied your patch, built[*] the
kernel, and booted into it. Boot fails and drops into a dracut shell
because:

 . . .
 dracut-initqueue[3045]: Warning: Cancelling resume operation. Device not found.
 [ TIME ] Timed out waiting for device
 dev-ma...per910\x2d\x2d02\x2droot.device.
 [DEPEND] Dependency failed for /sysroot.
 [DEPEND] Dependency failed for Initrd Root File SyWarning:
 /dev/disk/by-uuid/4ccddb2d-4d63-4fce-b4d4-9b2f119a30cc does not exist
 . . .

I saved the report from /run/initramfs/rdsosreport.txt here[2].


Then, I did another test:

  - Rebooted into kernel 3.20.0-0.rc0.git5.1.fc23.x86_64 on the
    physical host (L0).
  - In L1, checked out the KVM tree, applied your patch, built the
    kernel[*], and booted into the newly built one; here too, I'm
    thrown into a dracut shell.


[1] git://git.kernel.org/pub/scm/virt/kvm/kvm.git
[2] https://kashyapc.fedorapeople.org/temp/kernel-boot-failure.txt

[*] This is exactly how I built it:

  # Clone the tree
  $ git clone git://git.kernel.org/pub/scm/virt/kvm/kvm.git

  # Make a new branch:
  $ git checkout -b nvmx_test
  $ git describe
  warning: tag 'for-linus' is really 'kvm-3.19-1' here
  for-linus-14459-g49776d5
  
  # Make a config file
  $ make defconfig
  
  # Compile
  $ make -j4 && make bzImage && make modules
  
  # Install
  $ sudo -i
  $ make modules_install && make install

-- 
/kashyap


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Radim Krčmář @ 2015-02-20 16:14 UTC
  To: Kashyap Chamarthy; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

2015-02-19 23:28+0100, Kashyap Chamarthy:
> On Thu, Feb 19, 2015 at 10:10:11PM +0100, Kashyap Chamarthy wrote:
> > On Thu, Feb 19, 2015 at 05:02:22PM +0100, Radim Krčmář wrote:
> 
> [. . .]
> 
> > > Can you try if the following patch works?
> > 
> > Sure, will test a Kernel built with the below patch and report back.
> 
> Hmm, I'm stuck with a meta issue.
> 
> I checked out the KVM tree[1] on L0, applied your patch, built[*] the
> kernel, and booted into it. Boot fails and drops into a dracut shell
> because:
> 
>  . . .
>  dracut-initqueue[3045]: Warning: Cancelling resume operation. Device not found.
>  [ TIME ] Timed out waiting for device
>  dev-ma...per910\x2d\x2d02\x2droot.device.
>  [DEPEND] Dependency failed for /sysroot.
>  [DEPEND] Dependency failed for Initrd Root File SyWarning:
>  /dev/disk/by-uuid/4ccddb2d-4d63-4fce-b4d4-9b2f119a30cc does not exist
>  . . .
> 
> I saved the report from /run/initramfs/rdsosreport.txt here[2].
> 
> 
> Then, I did another test:
> 
>   - Rebooted into kernel 3.20.0-0.rc0.git5.1.fc23.x86_64 on the
>     physical host (L0).
>   - In L1, checked out the KVM tree, applied your patch, built the
>     kernel[*], and booted into the newly built one; here too, I'm
>     thrown into a dracut shell.

Weird, but considering that boot fails on L0 as well, I think that
basing off a different commit could help ...

> [1] git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> [2] https://kashyapc.fedorapeople.org/temp/kernel-boot-failure.txt
> 
> [*] This is exactly how I built it:
> 
>   # Clone the tree
>   $ git clone git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> 
>   # Make a new branch:
>   $ git checkout -b nvmx_test
>   $ git describe
>   warning: tag 'for-linus' is really 'kvm-3.19-1' here
>   for-linus-14459-g49776d5

Hm, it should say v3.19 -- does it stay the same if you do
`git fetch && git checkout origin/master`?

If it still does, please try to apply it on top of `git checkout v3.18`.
(That one failed too.)

>   # Make a config file
>   $ make defconfig

It would be safer to copy the fedora config (from /boot) to .config and
do `make olddefconfig`.
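
A minimal sketch of that, assuming the running kernel's config file is
present under /boot (as it is for Fedora kernels):

  $ cp /boot/config-$(uname -r) .config
  $ make olddefconfig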

>   # Compile
>   $ make -j4 && make bzImage && make modules
>   
>   # Install
>   $ sudo -i
>   $ make modules_install && make install
> 
> -- 
> /kashyap


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Kashyap Chamarthy @ 2015-02-20 19:45 UTC
  To: Radim Krčmář; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

On Fri, Feb 20, 2015 at 05:14:15PM +0100, Radim Krčmář wrote:
> 2015-02-19 23:28+0100, Kashyap Chamarthy:
> > On Thu, Feb 19, 2015 at 10:10:11PM +0100, Kashyap Chamarthy wrote:
> > > On Thu, Feb 19, 2015 at 05:02:22PM +0100, Radim Krčmář wrote:

[. . .]

> > Then, I did another test:
> > 
> >   - Rebooted into Kernel 3.20.0-0.rc0.git5.1.fc23.x86_64 on physical
> >     host (L0).
> >   - In L1, checked out the KVM tree, applied your patch and built
> >     Kernel[*] from the current KVM tree and booted into the newly built
> >     one, here too, I'm thrown into a dracut shell
> 
> Weird, but considering that boot fails on L0 as well, I think that
> basing off a different commit could help ...

What I missed was building the initramfs:

    $ cd /boot
    $ dracut initramfs-3.19.0+.img 3.19.0+ --force

Then I can boot. However, networking was hosed due to this bug[1] in
`dhclient` (Andrea Arcangeli said it's fixed for him in the newest
kernels, but unfortunately it's still not fixed for me, as I noted in
the bug).

Anyway, for the nVMX bug in question, I built a Fedora scratch
kernel[2] with your fix, and the build was successful[3]. I will test
with it once I get the networking fixed on the physical machine,
hopefully early next week.

> > [*] This is exactly how I built it:
> > 
> >   # Clone the tree
> >   $ git clone git://git.kernel.org/pub/scm/virt/kvm/kvm.git
> > 
> >   # Make a new branch:
> >   $ git checkout -b nvmx_test
> >   $ git describe
> >   warning: tag 'for-linus' is really 'kvm-3.19-1' here
> >   for-linus-14459-g49776d5
> 
> Hm, it should say v3.19 -- does it stay the same if you do
> `git fetch && git checkout origin/master`?
> 
> If it still does, please try to apply it on top of `git checkout v3.18`.
> (The one that one failed too.)
> 
> >   # Make a config file
> >   $ make defconfig
> 
> It would be safer to copy the fedora config (from /boot) to .config and
> do `make olddefconfig`.

That's actually what I did on my later compiles.

For now, as noted above, I will test with the Fedora kernel scratch
build I made.

  [1] https://bugzilla.redhat.com/show_bug.cgi?id=1194809 --  `dhclient`
      crashes on boot
  [2] http://koji.fedoraproject.org/koji/taskinfo?taskID=9004708
  [3] https://kojipkgs.fedoraproject.org//work/tasks/4708/9004708/build.log

-- 
/kashyap


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Kashyap Chamarthy @ 2015-02-22 15:46 UTC
  To: Radim Krčmář; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

Radim,

I just tested with your patch[1] in this thread. I built a Fedora
kernel[2] with it, and installed (and booted into) it on both L0 and L1.

Result: I don't have good news, I'm afraid: L1 *still* reboots when an
        L2 guest is booted, and L0 throws the stack trace that was
        previously noted on this thread:

. . .
[<   57.747345>] ------------[ cut here ]------------
[<    0.004638>] WARNING: CPU: 5 PID: 50206 at arch/x86/kvm/vmx.c:8962 nested_vmx_vmexit+0x7ee/0x880 [kvm_intel]()
[<    0.009903>] Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables kvm_intel coretemp iTCO_wdt kvm ipmi_devintf iTCO_vendor_support i7core_edac gpio_ich crc32c_intel serio_raw edac_core ipmi_si dcdbas shpchp tpm_tis lpc_ich mfd_core tpm ipmi_msghandler wmi acpi_power_meter acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm drm ata_generic megaraid_sas bnx2 [last unloaded: kvm_intel]
[<    0.060404>] CPU: 5 PID: 50206 Comm: qemu-system-x86 Not tainted 3.18.7-200.fc21.x86_64 #1
[  +0.008220] Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.8.2 10/25/2012
[  +0.007526]  0000000000000000 00000000a30d0ba3 ffff883f2489fc48 ffffffff8175e686
[  +0.007688]  0000000000000000 0000000000000000 ffff883f2489fc88 ffffffff810991d1
[  +0.007613]  ffff883f2489fc98 ffff88bece1ba000 0000000000000000 0000000000000014
[  +0.007611] Call Trace:
[  +0.002518]  [<ffffffff8175e686>] dump_stack+0x46/0x58
[  +0.005202]  [<ffffffff810991d1>] warn_slowpath_common+0x81/0xa0
[  +0.006055]  [<ffffffff810992ea>] warn_slowpath_null+0x1a/0x20
[  +0.005889]  [<ffffffffa02f00ee>] nested_vmx_vmexit+0x7ee/0x880 [kvm_intel]
[  +0.007014]  [<ffffffffa02f05af>] ? vmx_handle_exit+0x1bf/0xaa0 [kvm_intel]
[  +0.007015]  [<ffffffffa02f039c>] vmx_queue_exception+0xfc/0x150 [kvm_intel]
[  +0.007130]  [<ffffffffa028cdfd>] kvm_arch_vcpu_ioctl_run+0xd9d/0x1290 [kvm]
[  +0.007111]  [<ffffffffa0288528>] ? kvm_arch_vcpu_load+0x58/0x220 [kvm]
[  +0.006670]  [<ffffffffa0274cbc>] kvm_vcpu_ioctl+0x32c/0x5c0 [kvm]
[  +0.006236]  [<ffffffff810d0f7b>] ? put_prev_entity+0x5b/0x400
[  +0.005887]  [<ffffffff810cbb37>] ? set_next_entity+0x67/0x80
[  +0.005802]  [<ffffffff810d4549>] ? pick_next_task_fair+0x6c9/0x8c0
[  +0.006324]  [<ffffffff810126d6>] ? __switch_to+0x1d6/0x5f0
[  +0.005626]  [<ffffffff8122a1c0>] do_vfs_ioctl+0x2d0/0x4b0
[  +0.005543]  [<ffffffff81760764>] ? __schedule+0x2f4/0x8a0
[  +0.005537]  [<ffffffff8122a421>] SyS_ioctl+0x81/0xa0
[  +0.005106]  [<ffffffff81765429>] system_call_fastpath+0x12/0x17
[  +0.006056] ---[ end trace 646ed2360b84865c ]---
[  +7.000298] kvm [50179]: vcpu0 unhandled rdmsr: 0x1c9
[  +0.005061] kvm [50179]: vcpu0 unhandled rdmsr: 0x1a6
[  +0.005053] kvm [50179]: vcpu0 unhandled rdmsr: 0x3f6
. . .



  [1] http://article.gmane.org/gmane.comp.emulators.kvm.devel/132937
  [2] http://koji.fedoraproject.org/koji/taskinfo?taskID=9004708

-- 
/kashyap


* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
From: Radim Krčmář @ 2015-02-23 13:56 UTC
  To: Kashyap Chamarthy; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

2015-02-22 16:46+0100, Kashyap Chamarthy:
> Radim,
> 
> I just tested with your patch[1] in this thread. I built a Fedora
> Kernel[2] with it, and installed (and booted into) it on both L0 and L1. 
> 
> Result: I don't have good news, I'm afraid: L1 *still* reboots when an
>         L2 guest is booted. And, L0 throws the stack trace that was
>         previously noted on this thread:

Thanks, I'm puzzled though ... isn't it possible that a wrong kernel
sneaked into grub?

> . . .
> [<   57.747345>] ------------[ cut here ]------------
> [<    0.004638>] WARNING: CPU: 5 PID: 50206 at arch/x86/kvm/vmx.c:8962 nested_vmx_vmexit+0x7ee/0x880 [kvm_intel]()
> [<    0.060404>] CPU: 5 PID: 50206 Comm: qemu-system-x86 Not tainted 3.18.7-200.fc21.x86_64 #1

This looks like a new backtrace, but the kernel is not the one from [2].

> [  +0.006055]  [<ffffffff810992ea>] warn_slowpath_null+0x1a/0x20
> [  +0.005889]  [<ffffffffa02f00ee>] nested_vmx_vmexit+0x7ee/0x880 [kvm_intel]
> [  +0.007014]  [<ffffffffa02f05af>] ? vmx_handle_exit+0x1bf/0xaa0 [kvm_intel]
> [  +0.007015]  [<ffffffffa02f039c>] vmx_queue_exception+0xfc/0x150 [kvm_intel]
> [  +0.007130]  [<ffffffffa028cdfd>] kvm_arch_vcpu_ioctl_run+0xd9d/0x1290 [kvm]

(There is only one execution path and unless there is a race, it would
 be prevented by [1].)

> [  +0.007111]  [<ffffffffa0288528>] ? kvm_arch_vcpu_load+0x58/0x220 [kvm]
> [  +0.006670]  [<ffffffffa0274cbc>] kvm_vcpu_ioctl+0x32c/0x5c0 [kvm]
[...]
>   [1] http://article.gmane.org/gmane.comp.emulators.kvm.devel/132937
>   [2] http://koji.fedoraproject.org/koji/taskinfo?taskID=9004708

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
  2015-02-23 13:56                       ` Radim Krčmář
@ 2015-02-23 16:14                         ` Kashyap Chamarthy
  2015-02-23 17:09                           ` Kashyap Chamarthy
  0 siblings, 1 reply; 24+ messages in thread
From: Kashyap Chamarthy @ 2015-02-23 16:14 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Kashyap Chamarthy, Paolo Bonzini, Jan Kiszka, kvm, dgilbert

On Mon, Feb 23, 2015 at 02:56:11PM +0100, Radim Krčmář wrote:
> 2015-02-22 16:46+0100, Kashyap Chamarthy:
> > Radim,
> > 
> > I just tested with your patch[1] in this thread. I built a Fedora
> > Kernel[2] with it, and installed (and booted into) it on both L0 and L1. 
> > 
> > Result: I don't have good news, I'm afraid: L1 *still* reboots when an
> >         L2 guest is booted. And, L0 throws the stack trace that was
> >         previously noted on this thread:
> 
> Thanks, I'm puzzled though ... isn't it possible that a wrong kernel
> sneaked into grub?

Hmm, unlikely - I just double-confirmed that I'm running the same
patched Kernel (3.20.0-0.rc0.git9.1.fc23.x86_64) on both L0 and L1.
 
> > . . .
> > [<   57.747345>] ------------[ cut here ]------------
> > [<    0.004638>] WARNING: CPU: 5 PID: 50206 at arch/x86/kvm/vmx.c:8962 nested_vmx_vmexit+0x7ee/0x880 [kvm_intel]()
> > [<    0.060404>] CPU: 5 PID: 50206 Comm: qemu-system-x86 Not tainted 3.18.7-200.fc21.x86_64 #1
> 
> This looks like a new backtrace, but the kernel is not [2].

Err, looks like I pasted the wrong one, so here it is again. I just
tested with the patched Kernel (that I linked below) on both L0 and L1;
the same behavior (L1 reboot on L2 boot) manifests:

. . .
[<    0.058440>] CPU: 8 PID: 1828 Comm: qemu-system-x86 Not tainted 3.20.0-0.rc0.git9.1.fc23.x86_64 #1
[<    0.008856>] Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.8.2 10/25/2012
[<    0.007475>]  0000000000000000 0000000097b7f39b ffff883f5acc3bf8 ffffffff818773cd
[<    0.007477>]  0000000000000000 0000000000000000 ffff883f5acc3c38 ffffffff810ab3ba
[<    0.007495>]  ffff883f5acc3c68 ffff887f62678000 0000000000000000 0000000000000000
[<    0.007489>] Call Trace:
[<    0.002455>]  [<ffffffff818773cd>] dump_stack+0x4c/0x65
[<    0.005139>]  [<ffffffff810ab3ba>] warn_slowpath_common+0x8a/0xc0
[<    0.006001>]  [<ffffffff810ab4ea>] warn_slowpath_null+0x1a/0x20
[<    0.005831>]  [<ffffffffa220cf8e>] nested_vmx_vmexit+0xbde/0xd30 [kvm_intel]
[<    0.006957>]  [<ffffffffa220fda3>] ? vmx_handle_exit+0x213/0xd80 [kvm_intel]
[<    0.006956>]  [<ffffffffa220d3fa>] vmx_queue_exception+0x10a/0x150 [kvm_intel]
[<    0.007160>]  [<ffffffffa03c8cdb>] kvm_arch_vcpu_ioctl_run+0x107b/0x1b60 [kvm]
[<    0.007138>]  [<ffffffffa03c833a>] ? kvm_arch_vcpu_ioctl_run+0x6da/0x1b60 [kvm]
[<    0.007219>]  [<ffffffff8110725d>] ? trace_hardirqs_on+0xd/0x10
[<    0.005837>]  [<ffffffffa03b0666>] ? vcpu_load+0x26/0x70 [kvm]
[<    0.005745>]  [<ffffffff8110385f>] ? lock_release_holdtime.part.29+0xf/0x200
[<    0.006966>]  [<ffffffffa03c3a68>] ? kvm_arch_vcpu_load+0x58/0x210 [kvm]
[<    0.006618>]  [<ffffffffa03b0a73>] kvm_vcpu_ioctl+0x383/0x7e0 [kvm]
[<    0.006175>]  [<ffffffff81027b9d>] ? native_sched_clock+0x2d/0xa0
[<    0.006000>]  [<ffffffff810d5c56>] ? creds_are_invalid.part.1+0x16/0x50
[<    0.006518>]  [<ffffffff810d5cb1>] ? creds_are_invalid+0x21/0x30
[<    0.005918>]  [<ffffffff813a77fa>] ? inode_has_perm.isra.48+0x2a/0xa0
[<    0.006350>]  [<ffffffff8128c9a8>] do_vfs_ioctl+0x2e8/0x530
[<    0.005514>]  [<ffffffff8128cc71>] SyS_ioctl+0x81/0xa0
[<    0.005051>]  [<ffffffff81880969>] system_call_fastpath+0x12/0x17
[<    0.005999>] ---[ end trace 3e4dca7180cdddab ]---
[<    5.529564>] kvm [1766]: vcpu0 unhandled rdmsr: 0x1c9
[<    0.005026>] kvm [1766]: vcpu0 unhandled rdmsr: 0x1a6
[<    0.004998>] kvm [1766]: vcpu0 unhandled rdmsr: 0x3f6
. . .
 
> > [  +0.006055]  [<ffffffff810992ea>] warn_slowpath_null+0x1a/0x20
> > [  +0.005889]  [<ffffffffa02f00ee>] nested_vmx_vmexit+0x7ee/0x880 [kvm_intel]
> > [  +0.007014]  [<ffffffffa02f05af>] ? vmx_handle_exit+0x1bf/0xaa0 [kvm_intel]
> > [  +0.007015]  [<ffffffffa02f039c>] vmx_queue_exception+0xfc/0x150 [kvm_intel]
> > [  +0.007130]  [<ffffffffa028cdfd>] kvm_arch_vcpu_ioctl_run+0xd9d/0x1290 [kvm]
> 
> (There is only one execution path and unless there is a race, it would
>  be prevented by [1].)
> 
> > [  +0.007111]  [<ffffffffa0288528>] ? kvm_arch_vcpu_load+0x58/0x220 [kvm]
> > [  +0.006670]  [<ffffffffa0274cbc>] kvm_vcpu_ioctl+0x32c/0x5c0 [kvm]
> [...]
> >   [1] http://article.gmane.org/gmane.comp.emulators.kvm.devel/132937
> >   [2] http://koji.fedoraproject.org/koji/taskinfo?taskID=9004708

-- 
/kashyap

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
  2015-02-23 16:14                         ` Kashyap Chamarthy
@ 2015-02-23 17:09                           ` Kashyap Chamarthy
  2015-02-23 18:05                             ` Kashyap Chamarthy
  0 siblings, 1 reply; 24+ messages in thread
From: Kashyap Chamarthy @ 2015-02-23 17:09 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

On Mon, Feb 23, 2015 at 05:14:37PM +0100, Kashyap Chamarthy wrote:
> On Mon, Feb 23, 2015 at 02:56:11PM +0100, Radim Krčmář wrote:
> > 2015-02-22 16:46+0100, Kashyap Chamarthy:
> > > Radim,
> > > 
> > > I just tested with your patch[1] in this thread. I built a Fedora
> > > Kernel[2] with it, and installed (and booted into) it on both L0 and L1. 
> > > 
> > > Result: I don't have good news, I'm afraid: L1 *still* reboots when an
> > >         L2 guest is booted. And, L0 throws the stack trace that was
> > >         previously noted on this thread:
> > 
> > Thanks, I'm puzzled though ... isn't it possible that a wrong kernel
> > sneaked into grub?
> 
> Hmm, unlikely - I just double-confirmed that I'm running the same
> patched Kernel (3.20.0-0.rc0.git9.1.fc23.x86_64) on both L0 and L1.

[Correcting myself here.]

Unfortunately, I was doubly wrong, and your guess is right -- I seem to
have made _two_ Kernel builds (one with your patch, one without), and I'm
now not sure _which_ one I used, as I didn't add a custom tag. To confuse
matters further, I previously pointed to the wrong build (the one without
your fix) in this thread -- so I likely used that one in my last test.

The correct build is here:

    http://koji.fedoraproject.org/koji/taskinfo?taskID=9006612

And the build log confirms that 'nvmx-fix.patch' was applied:

    https://kojipkgs.fedoraproject.org//work/tasks/6612/9006612/build.log

As for the contents of the patch: I generated it with `diff -u orig
new > nvmx-fix.patch`, forgetting that the Fedora Kernel build handles
git-formatted patches just fine:

$ cat nvmx-fix.patch 
--- vmx.c.orig  2015-02-20 19:09:49.850841320 +0100
+++ vmx.c   2015-02-20 19:11:12.153491715 +0100
@@ -2038,6 +2038,9 @@
 {
    struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
 
+    if (to_vmx(vcpu)->nested.nested_run_pending)
+        return 0;
+
    if (!(vmcs12->exception_bitmap & (1u << nr)))
        return 0;
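
For context, since my `diff -u` output above lost the function name: the
hunk lands in nested_vmx_check_exception(). A rough reconstruction of the
patched function, as I understand that era's vmx.c (a sketch, so details
may differ from the actual build):

static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
{
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

	/*
	 * Radim's change: never reflect an exception into L1 while a
	 * VMLAUNCH/VMRESUME of L2 is still pending -- reflecting it is
	 * what trips WARN_ON_ONCE(vmx->nested.nested_run_pending) in
	 * nested_vmx_vmexit().
	 */
	if (to_vmx(vcpu)->nested.nested_run_pending)
		return 0;

	if (!(vmcs12->exception_bitmap & (1u << nr)))
		return 0;

	/* Otherwise deliver the exception to L1 as a nested VM exit. */
	nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
			  vmcs_read32(VM_EXIT_INTR_INFO),
			  vmcs_readl(EXIT_QUALIFICATION));
	return 1;
}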

So, my earlier conclusion was wrong, and I need to report back with the
_proper_ Kernel build.

-- 
/kashyap

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting*
  2015-02-23 17:09                           ` Kashyap Chamarthy
@ 2015-02-23 18:05                             ` Kashyap Chamarthy
  2015-02-24 16:30                               ` [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0 Radim Krčmář
  0 siblings, 1 reply; 24+ messages in thread
From: Kashyap Chamarthy @ 2015-02-23 18:05 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert

Tested with the _correct_ Kernel[1] (that has Radim's patch) now --
applied it on both L0 and L1.

Result: Same as before -- Booting L2 causes L1 to reboot. However, the
        stack trace from `dmesg` on L0 takes a slightly different path than
        before -- it goes through MSR handling:

. . .
[Feb23 12:14] ------------[ cut here ]------------
[  +0.004658] WARNING: CPU: 5 PID: 1785 at arch/x86/kvm/vmx.c:9973 nested_vmx_vmexit+0xbde/0xd30 [kvm_intel]()
[  +0.009897] Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iTCO_wdt ipmi_devintf gpio_ich iTCO_vendor_support coretemp kvm_intel dcdbas kvm crc32c_intel joydev ipmi_ssif serio_raw ipmi_si tpm_tis i7core_edac lpc_ich ipmi_msghandler edac_core tpm mfd_core shpchp wmi acpi_power_meter acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc mgag200 i2c_algo_bit drm_kms_helper ttm ata_generic drm pata_acpi megaraid_sas bnx2
[  +0.060790] CPU: 5 PID: 1785 Comm: qemu-system-x86 Not tainted 3.20.0-0.rc0.git9.1.fc23.x86_64 #1
[  +0.008938] Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.8.2 10/25/2012
[  +0.007476]  0000000000000000 000000008ba15f99 ffff88ff5d627b38 ffffffff818773cd
[  +0.007727]  0000000000000000 0000000000000000 ffff88ff5d627b78 ffffffff810ab3ba
[  +0.007660]  ffff88ff5d627b68 ffff883f5fd20000 0000000000000000 0000000000000000
[  +0.007729] Call Trace:
[  +0.002543]  [<ffffffff818773cd>] dump_stack+0x4c/0x65
[  +0.005205]  [<ffffffff810ab3ba>] warn_slowpath_common+0x8a/0xc0
[  +0.006085]  [<ffffffff810ab4ea>] warn_slowpath_null+0x1a/0x20
[  +0.005915]  [<ffffffffa0244f8e>] nested_vmx_vmexit+0xbde/0xd30 [kvm_intel]
[  +0.007061]  [<ffffffffa0245976>] vmx_set_msr+0x416/0x420 [kvm_intel]
[  +0.006549]  [<ffffffffa029f0c0>] ? kvm_set_msr+0x70/0x70 [kvm]
[  +0.006018]  [<ffffffffa029f091>] kvm_set_msr+0x41/0x70 [kvm]
[  +0.005840]  [<ffffffffa029f0f3>] do_set_msr+0x33/0x50 [kvm]
[  +0.005692]  [<ffffffffa02a3a80>] msr_io+0x100/0x1c0 [kvm]
[  +0.005567]  [<ffffffffa02a3a10>] ? msr_io+0x90/0x1c0 [kvm]
[  +0.005657]  [<ffffffffa023de70>] ? handle_task_switch+0x1f0/0x1f0 [kvm_intel]
[  +0.007321]  [<ffffffffa02ac799>] kvm_arch_vcpu_ioctl+0xb79/0x11a0 [kvm]
[  +0.006788]  [<ffffffffa023f7fe>] ? vmx_vcpu_load+0x15e/0x1e0 [kvm_intel]
[  +0.006878]  [<ffffffffa0298666>] ? vcpu_load+0x26/0x70 [kvm]
[  +0.005825]  [<ffffffffa02abac3>] ? kvm_arch_vcpu_load+0xb3/0x210 [kvm]
[  +0.006712]  [<ffffffffa02987da>] kvm_vcpu_ioctl+0xea/0x7e0 [kvm]
[  +0.006140]  [<ffffffff81027b9d>] ? native_sched_clock+0x2d/0xa0
[  +0.006063]  [<ffffffff810d5c56>] ? creds_are_invalid.part.1+0x16/0x50
[  +0.006583]  [<ffffffff810d5cb1>] ? creds_are_invalid+0x21/0x30
[  +0.005984]  [<ffffffff813a77fa>] ? inode_has_perm.isra.48+0x2a/0xa0
[  +0.006436]  [<ffffffff8128c9a8>] do_vfs_ioctl+0x2e8/0x530
[  +0.005559]  [<ffffffff8128cc71>] SyS_ioctl+0x81/0xa0
[  +0.005135]  [<ffffffff81880969>] system_call_fastpath+0x12/0x17
[  +0.006065] ---[ end trace a7f3bc31fb0ddbff ]---
. . .


[1] https://kashyapc.fedorapeople.org/kernel-3.20.0-0.rc0.git9.1.fc23.rpms-with-nvmx-test-fix-from-radim/
     - I uploaded the Fedora Koji scratch build for this Kernel to a
       more permanent location, as these types of builds will be removed
       automatically after 3 weeks

-- 
/kashyap

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0
  2015-02-23 18:05                             ` Kashyap Chamarthy
@ 2015-02-24 16:30                               ` Radim Krčmář
  2015-02-24 16:39                                 ` Jan Kiszka
  2015-02-25 15:50                                 ` Kashyap Chamarthy
  0 siblings, 2 replies; 24+ messages in thread
From: Radim Krčmář @ 2015-02-24 16:30 UTC (permalink / raw)
  To: Kashyap Chamarthy; +Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert, bsd, mtosatti

2015-02-23 19:05+0100, Kashyap Chamarthy:
> Tested with the _correct_ Kernel[1] (that has Radim's patch) now --
> applied it on both L0 and L1.
> 
> Result: Same as before -- Booting L2 causes L1 to reboot. However, the
>         stack trace from `dmesg` on L0 takes a slightly different path than
>         before -- it goes through MSR handling:

Thanks, the problem was deeper ... L1 enabled unrestricted mode while L0
had it disabled.  L1 could then vmrun an L2 state that L0 would have to
emulate, but that doesn't work.  There are at least these solutions:

 1) don't expose unrestricted_guest when L0 doesn't have it
 2) fix unrestricted mode emulation code
 3) handle the failure without killing L1

I'd do just (1) -- emulating unrestricted mode is a loss.
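
To make (1) concrete: the nested VMX capability MSRs advertise, in their
high word, the secondary controls that L1 may set to 1, and KVM checks
L1's VMCS against them at nested VM entry. Clearing a bit there is
therefore enough -- L1 never sees the feature and falls back to emulating
real mode itself. A minimal userspace sketch of that masking logic
(illustration only, not kernel code; bit positions are from the SDM):

#include <stdint.h>
#include <stdio.h>

/* Secondary processor-based VM-execution controls (Intel SDM). */
#define SECONDARY_EXEC_ENABLE_EPT          (1u << 1)
#define SECONDARY_EXEC_UNRESTRICTED_GUEST  (1u << 7)

int main(void)
{
	/* "allowed-1" half of the capability MSR that L0 exposes to L1 */
	uint32_t ctls_high = SECONDARY_EXEC_ENABLE_EPT |
			     SECONDARY_EXEC_UNRESTRICTED_GUEST;
	int l0_unrestricted = 0;  /* L0 loaded kvm_intel with unrestricted_guest=0 */

	if (!l0_unrestricted)     /* the proposed fix: mask what L0 lacks */
		ctls_high &= ~SECONDARY_EXEC_UNRESTRICTED_GUEST;

	/* controls L1 would like to enable for its L2 */
	uint32_t l1_request = SECONDARY_EXEC_ENABLE_EPT |
			      SECONDARY_EXEC_UNRESTRICTED_GUEST;

	if (l1_request & ~ctls_high)
		printf("nested VM entry control check fails; L1 must not set the bit\n");
	else
		printf("L1 may run L2 in unrestricted mode\n");
	return 0;
}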

I have done initial testing and at least qemu-sanity-check works now:

---8<---
If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
L0.  L1 triple faulted when running L2 guest that required emulation.

Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
in L0's dmesg:
  WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()

Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
the host doesn't have it enabled.

Fixes: 78051e3b7e35 ("KVM: nVMX: Disable unrestricted mode if ept=0")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f7b20b417a3a..dbabea21357b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2476,8 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 	if (enable_ept) {
 		/* nested EPT: emulate EPT also to L1 */
 		vmx->nested.nested_vmx_secondary_ctls_high |=
-			SECONDARY_EXEC_ENABLE_EPT |
-			SECONDARY_EXEC_UNRESTRICTED_GUEST;
+			SECONDARY_EXEC_ENABLE_EPT;
 		vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
 			 VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
 			 VMX_EPT_INVEPT_BIT;
@@ -2491,6 +2490,10 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 	} else
 		vmx->nested.nested_vmx_ept_caps = 0;
 
+	if (enable_unrestricted_guest)
+		vmx->nested.nested_vmx_secondary_ctls_high |=
+			SECONDARY_EXEC_UNRESTRICTED_GUEST;
+
 	/* miscellaneous data */
 	rdmsr(MSR_IA32_VMX_MISC,
 		vmx->nested.nested_vmx_misc_low,

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0
  2015-02-24 16:30                               ` [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0 Radim Krčmář
@ 2015-02-24 16:39                                 ` Jan Kiszka
  2015-02-24 18:32                                   ` Bandan Das
  2015-02-25 15:50                                 ` Kashyap Chamarthy
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Kiszka @ 2015-02-24 16:39 UTC (permalink / raw)
  To: Radim Krčmář, Kashyap Chamarthy
  Cc: Paolo Bonzini, kvm, dgilbert, bsd, mtosatti

On 2015-02-24 17:30, Radim Krčmář wrote:
> 2015-02-23 19:05+0100, Kashyap Chamarthy:
>> Tested with the _correct_ Kernel[1] (that has Radim's patch) now --
>> applied it on both L0 and L1.
>>
>> Result: Same as before -- Booting L2 causes L1 to reboot. However, the
>>         stack trace from `dmesg` on L0 takes a slightly different path than
>>         before -- it goes through MSR handling:
> 
> Thanks, the problem was deeper ... L1 enabled unrestricted mode while L0
> had it disabled.  L1 could then vmrun an L2 state that L0 would have to
> emulate, but that doesn't work.  There are at least these solutions:
> 
>  1) don't expose unrestricted_guest when L0 doesn't have it

Reminds me of a patch called "KVM: nVMX: Disable unrestricted mode if
ept=0" by Bandan. I thought that would have caught it - apparently not.

>  2) fix unrestricted mode emulation code
>  3) handle the failure without killing L1
> 
> I'd do just (1) -- emulating unrestricted mode is a loss.

Agreed.

Jan

> 
> I have done initial testing and at least qemu-sanity-check works now:
> 
> ---8<---
> If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
> L0.  L1 triple faulted when running L2 guest that required emulation.
> 
> Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
> in L0's dmesg:
>   WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()
> 
> Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
> the host doesn't have it enabled.
> 
> Fixes: 78051e3b7e35 ("KVM: nVMX: Disable unrestricted mode if ept=0")
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> ---
>  arch/x86/kvm/vmx.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f7b20b417a3a..dbabea21357b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2476,8 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>  	if (enable_ept) {
>  		/* nested EPT: emulate EPT also to L1 */
>  		vmx->nested.nested_vmx_secondary_ctls_high |=
> -			SECONDARY_EXEC_ENABLE_EPT |
> -			SECONDARY_EXEC_UNRESTRICTED_GUEST;
> +			SECONDARY_EXEC_ENABLE_EPT;
>  		vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
>  			 VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
>  			 VMX_EPT_INVEPT_BIT;
> @@ -2491,6 +2490,10 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>  	} else
>  		vmx->nested.nested_vmx_ept_caps = 0;
>  
> +	if (enable_unrestricted_guest)
> +		vmx->nested.nested_vmx_secondary_ctls_high |=
> +			SECONDARY_EXEC_UNRESTRICTED_GUEST;
> +
>  	/* miscellaneous data */
>  	rdmsr(MSR_IA32_VMX_MISC,
>  		vmx->nested.nested_vmx_misc_low,
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0
  2015-02-24 16:39                                 ` Jan Kiszka
@ 2015-02-24 18:32                                   ` Bandan Das
  0 siblings, 0 replies; 24+ messages in thread
From: Bandan Das @ 2015-02-24 18:32 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Radim Krčmář,
	Kashyap Chamarthy, Paolo Bonzini, kvm, dgilbert, mtosatti

Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 2015-02-24 17:30, Radim Krčmář wrote:
>> 2015-02-23 19:05+0100, Kashyap Chamarthy:
>>> Tested with the _correct_ Kernel[1] (that has Radim's patch) now --
>>> applied it on both L0 and L1.
>>>
>>> Result: Same as before -- Booting L2 causes L1 to reboot. However, the
>>>         stack trace from `dmesg` on L0 takes a slightly different path than
>>>         before -- it goes through MSR handling:
>> 
>> Thanks, the problem was deeper ... L1 enabled unrestricted mode while L0
>> had it disabled.  L1 could then vmrun an L2 state that L0 would have to
>> emulate, but that doesn't work.  There are at least these solutions:
>> 
>>  1) don't expose unrestricted_guest when L0 doesn't have it
>
> Reminds me of a patch called "KVM: nVMX: Disable unrestricted mode if
> ept=0" by Bandan. I thought that would have caught it - apparently not.

Yeah... Unrestricted guest could be disabled even when ept is enabled,
and I incorrectly didn't take that into account.
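
Roughly the coupling in that era's hardware_setup() in arch/x86/kvm/vmx.c,
as a paraphrased sketch (the exact code may differ): ept=0 forces
unrestricted mode off, but the converse does not hold --
unrestricted_guest can be 0 while ept=1, which is the case the earlier
patch missed:

	if (!cpu_has_vmx_ept() || !cpu_has_vmx_ept_4levels()) {
		enable_ept = 0;
		enable_unrestricted_guest = 0;	/* ept=0 implies this */
	}

	if (!cpu_has_vmx_unrestricted_guest())
		enable_unrestricted_guest = 0;	/* ...but this can also be 0 with ept=1 */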

>>  2) fix unrestricted mode emulation code
>>  3) handle the failure without killing L1
>> 
>> I'd do just (1) -- emulating unrestricted mode is a loss.
>
> Agreed.
>
> Jan
>
>> 
>> I have done initial testing and at least qemu-sanity-check works now:
>> 
>> ---8<---
>> If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
>> L0.  L1 triple faulted when running L2 guest that required emulation.
>> 
>> Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
>> in L0's dmesg:
>>   WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()
>> 
>> Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
>> the host doesn't have it enabled.
>> 
>> Fixes: 78051e3b7e35 ("KVM: nVMX: Disable unrestricted mode if ept=0")
>> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

We should Cc stable on this patch.

Bandan
>> ---
>>  arch/x86/kvm/vmx.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f7b20b417a3a..dbabea21357b 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2476,8 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>>  	if (enable_ept) {
>>  		/* nested EPT: emulate EPT also to L1 */
>>  		vmx->nested.nested_vmx_secondary_ctls_high |=
>> -			SECONDARY_EXEC_ENABLE_EPT |
>> -			SECONDARY_EXEC_UNRESTRICTED_GUEST;
>> +			SECONDARY_EXEC_ENABLE_EPT;
>>  		vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
>>  			 VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
>>  			 VMX_EPT_INVEPT_BIT;
>> @@ -2491,6 +2490,10 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>>  	} else
>>  		vmx->nested.nested_vmx_ept_caps = 0;
>>  
>> +	if (enable_unrestricted_guest)
>> +		vmx->nested.nested_vmx_secondary_ctls_high |=
>> +			SECONDARY_EXEC_UNRESTRICTED_GUEST;
>> +
>>  	/* miscellaneous data */
>>  	rdmsr(MSR_IA32_VMX_MISC,
>>  		vmx->nested.nested_vmx_misc_low,
>> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0
  2015-02-24 16:30                               ` [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0 Radim Krčmář
  2015-02-24 16:39                                 ` Jan Kiszka
@ 2015-02-25 15:50                                 ` Kashyap Chamarthy
  1 sibling, 0 replies; 24+ messages in thread
From: Kashyap Chamarthy @ 2015-02-25 15:50 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Paolo Bonzini, Jan Kiszka, kvm, dgilbert, bsd, mtosatti

On Tue, Feb 24, 2015 at 05:30:06PM +0100, Radim Krčmář wrote:
> 2015-02-23 19:05+0100, Kashyap Chamarthy:
> > Tested with the _correct_ Kernel[1] (that has Radim's patch) now --
> > applied it on both L0 and L1.
> > 
> > Result: Same as before -- Booting L2 causes L1 to reboot. However, the
> >         stack trace from `dmesg` on L0 takes a slightly different path than
> >         before -- it goes through MSR handling:
> 
> Thanks, the problem was deeper ... L1 enabled unrestricted mode while L0
> had it disabled.  L1 could then vmrun an L2 state that L0 would have to
> emulate, but that doesn't work.  There are at least these solutions:
> 
>  1) don't expose unrestricted_guest when L0 doesn't have it
>  2) fix unrestricted mode emulation code
>  3) handle the failure without killing L1
> 
> I'd do just (1) -- emulating unrestricted mode is a loss.
> 
> I have done initial testing and at least qemu-sanity-check works now:
> 
> ---8<---
> If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
> L0.  L1 triple faulted when running L2 guest that required emulation.
> 
> Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
> in L0's dmesg:
>   WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()
> 
> Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
> the host doesn't have it enabled.
> 
> Fixes: 78051e3b7e35 ("KVM: nVMX: Disable unrestricted mode if ept=0")
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>


I just built[1] a Kernel with this patch, tested it on L0 and L1, and
can confirm that the patch fixes the issue -- booting L2 no longer causes
L1 to reboot.

So:

    Tested-By: Kashyap Chamarthy <kchamart@redhat.com>

Thanks for investigating, Radim!

[1] https://kashyapc.fedorapeople.org/kernel-4.0.0-0.rc1.git1.1.kashyap1.fc23-with-nvmx-fix2-radim/


> ---
>  arch/x86/kvm/vmx.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f7b20b417a3a..dbabea21357b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2476,8 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>  	if (enable_ept) {
>  		/* nested EPT: emulate EPT also to L1 */
>  		vmx->nested.nested_vmx_secondary_ctls_high |=
> -			SECONDARY_EXEC_ENABLE_EPT |
> -			SECONDARY_EXEC_UNRESTRICTED_GUEST;
> +			SECONDARY_EXEC_ENABLE_EPT;
>  		vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
>  			 VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
>  			 VMX_EPT_INVEPT_BIT;
> @@ -2491,6 +2490,10 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
>  	} else
>  		vmx->nested.nested_vmx_ept_caps = 0;
>  
> +	if (enable_unrestricted_guest)
> +		vmx->nested.nested_vmx_secondary_ctls_high |=
> +			SECONDARY_EXEC_UNRESTRICTED_GUEST;
> +
>  	/* miscellaneous data */
>  	rdmsr(MSR_IA32_VMX_MISC,
>  		vmx->nested.nested_vmx_misc_low,

-- 
/kashyap

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-02-25 15:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-16 20:40 [nVMX] With 3.20.0-0.rc0.git5.1 on L0, booting L2 guest results in L1 *rebooting* Kashyap Chamarthy
2015-02-17  6:02 ` Jan Kiszka
2015-02-17 11:24   ` Kashyap Chamarthy
2015-02-17 18:00     ` Bandan Das
2015-02-17 18:07       ` Jan Kiszka
2015-02-18 10:20         ` Kashyap Chamarthy
2015-02-18 16:42     ` Paolo Bonzini
2015-02-19 12:07       ` Kashyap Chamarthy
2015-02-19 15:01         ` Radim Krčmář
2015-02-19 16:02           ` Radim Krčmář
2015-02-19 16:07             ` Radim Krčmář
2015-02-19 21:10             ` Kashyap Chamarthy
2015-02-19 22:28               ` Kashyap Chamarthy
2015-02-20 16:14                 ` Radim Krčmář
2015-02-20 19:45                   ` Kashyap Chamarthy
2015-02-22 15:46                     ` Kashyap Chamarthy
2015-02-23 13:56                       ` Radim Krčmář
2015-02-23 16:14                         ` Kashyap Chamarthy
2015-02-23 17:09                           ` Kashyap Chamarthy
2015-02-23 18:05                             ` Kashyap Chamarthy
2015-02-24 16:30                               ` [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0 Radim Krčmář
2015-02-24 16:39                                 ` Jan Kiszka
2015-02-24 18:32                                   ` Bandan Das
2015-02-25 15:50                                 ` Kashyap Chamarthy
