From: "Jan Beulich"
Subject: Re: Nested virtualization off VMware vSphere 6.0 with EL6 guests crashes on Xen 4.6
Date: Tue, 12 Jan 2016 02:22:03 -0700
Message-ID: <5694D3CB02000078000C5D00@prv-mh.provo.novell.com>
References: <20160112033844.GB15551@char.us.oracle.com>
In-Reply-To: <20160112033844.GB15551@char.us.oracle.com>
To: jun.nakajima@intel.com, kevin.tian@intel.com, Konrad Rzeszutek Wilk
Cc: andrew.cooper3@citrix.com, wim.coekaerts@oracle.com, xen-devel

>>> On 12.01.16 at 04:38, wrote:
> (XEN) Assertion 'vapic_pg && !p2m_is_paging(p2mt)' failed at vvmx.c:698
> (XEN) ----[ Xen-4.6.0 x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 39
> (XEN) RIP: e008:[] virtual_vmentry+0x487/0xac9
> (XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor (d1v3)
> (XEN) rax: 0000000000000000 rbx: ffff83007786c000 rcx: 0000000000000000
> (XEN) rdx: 0000000000000e00 rsi: 000fffffffffffff rdi: ffff83407f81e010
> (XEN) rbp: ffff834008a47ea8 rsp: ffff834008a47e38 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: 0000000000000000 r13: ffff82c000341000 r14: ffff834008a47f18
> (XEN) r15: ffff83407f7c4000 cr0: 0000000080050033 cr4: 00000000001526e0
> (XEN) cr3: 000000407fb22000 cr2: 0000000000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Xen stack trace from rsp=ffff834008a47e38:
> (XEN) ffff834008a47e68 ffff82d0801d2cde ffff834008a47e68 0000000000000d00
> (XEN) 0000000000000000 0000000000000000 ffff834008a47e88 00000004801cc30e
> (XEN) ffff83007786c000 ffff83007786c000 ffff834008a40000 0000000000000000
> (XEN) ffff834008a47f18 0000000000000000 ffff834008a47f08 ffff82d0801edf94
> (XEN) ffff834008a47ef8 0000000000000000 ffff834008f62000 ffff834008a47f18
> (XEN) 000000ae8c99eb8d ffff83007786c000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82d0801ee2ab
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 00000000078bfbff 0000000000000000 0000000000000000 0000beef0000beef
> (XEN) fffffffffc4b3440 000000bf0000beef 0000000000040046 fffffffffc607f00
> (XEN) 000000000000beef 000000000000beef 000000000000beef 000000000000beef
> (XEN) 000000000000beef 0000000000000027 ffff83007786c000 0000006f88716300
> (XEN) 0000000000000000
> (XEN) Xen call trace:
> (XEN) [] virtual_vmentry+0x487/0xac9
> (XEN) [] nvmx_switch_guest+0x8ff/0x915
> (XEN) [] vmx_asm_vmexit_handler+0x4b/0xc0
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 39:
> (XEN) Assertion 'vapic_pg && !p2m_is_paging(p2mt)' failed at vvmx.c:698
> (XEN) ****************************************
> (XEN)
>
> ..and then to my surprise the hypervisor stopped hitting this.

Since we can (I hope) pretty much exclude a paging type, the
ASSERT() must have triggered because of vapic_pg being NULL.
That might be verifiable without extra printk()s, just by checking
the disassembly (assuming the value sits in a register). In which
case vapic_gpfn would be of interest too.
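
To illustrate the reasoning about the compound ASSERT(), here is a minimal
sketch of checking its two conjuncts separately, so that a failure names the
culprit directly. This is not the vvmx.c code: the types and helpers below
(sketch_p2m_type_t, sketch_is_paging(), classify_vapic_check(),
assert_vapic_split()) are hypothetical stand-ins made up for the example, and
only the shape of the condition is taken from the assertion message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Xen's p2m_type_t; only two values are modelled. */
typedef enum { SKETCH_P2M_RAM_RW, SKETCH_P2M_PAGING_OUT } sketch_p2m_type_t;

/* Hypothetical stand-in for p2m_is_paging(). */
static bool sketch_is_paging(sketch_p2m_type_t t)
{
    return t == SKETCH_P2M_PAGING_OUT;
}

typedef enum { CHECK_OK, CHECK_NULL_PAGE, CHECK_PAGING_TYPE } check_result_t;

/* Report which conjunct of 'vapic_pg && !p2m_is_paging(p2mt)' fails. */
static check_result_t classify_vapic_check(const void *vapic_pg,
                                           sketch_p2m_type_t p2mt)
{
    if (vapic_pg == NULL)
        return CHECK_NULL_PAGE;      /* first conjunct false */
    if (sketch_is_paging(p2mt))
        return CHECK_PAGING_TYPE;    /* second conjunct false */
    return CHECK_OK;
}

/* Same condition, one assert per conjunct, so the message is unambiguous. */
static void assert_vapic_split(const void *vapic_pg, sketch_p2m_type_t p2mt)
{
    assert(vapic_pg != NULL);
    assert(!sketch_is_paging(p2mt));
}
```

With the condition split like this, the panic text alone would tell the NULL
page case apart from the paging type case, without needing the disassembly.
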

What looks odd to me is the connection between
CPU_BASED_TPR_SHADOW being set and the use of a (valid) virtual
APIC page: Wouldn't this rather need to depend on
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES, just like in
nvmx_update_apic_access_address()?

Anyway, writing zero to the respective VMCS field in the
alternative case worries me a little: Aren't we risking MFN zero
being wrongly accessed because of this?

Furthermore, nvmx_update_apic_access_address() having a similar
ASSERT() seems entirely wrong: The APIC access page doesn't
really need to match up with any actual page belonging to the
guest - a guest could choose to point this into nowhere (note that
we've been at least considering this option recently for our own
purposes, in the context of
http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02191.html).

> Instead I started getting an even more bizzare crash:
>
>
> (d1) enter handle_19:
> (d1) NULL
> (d1) Booting from Hard Disk...
> (d1) Booting from 0000:7c00
> (XEN) stdvga.c:151:d1v0 leaving stdvga mode
> (XEN) stdvga.c:147:d1v0 entering stdvga and caching modes
> (XEN) stdvga.c:520:d1v0 leaving caching mode
> (XEN) ----[ Xen-4.6.0 x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 3
> (XEN) RIP: e008:[] vmx_cpu_up+0xacc/0xba5
> (XEN) RFLAGS: 0000000000010242 CONTEXT: hypervisor (d1v1)
> (XEN) rax: 0000000000000000 rbx: ffff830077877000 rcx: ffff834077e54000
> (XEN) rdx: ffff834007dc8000 rsi: 0000000000002000 rdi: ffff830077877000
> (XEN) rbp: ffff834007dcfc48 rsp: ffff834007dcfc38 r8: 0000000004040000
> (XEN) r9: 000ffffffffff000 r10: 0000000000000000 r11: fffffffffc423f1e
> (XEN) r12: 0000000000002000 r13: 0000000000000000 r14: 0000000000000000
> (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000001526e0
> (XEN) cr3: 0000004000763000 cr2: 0000000000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Xen stack trace from rsp=ffff834007dcfc38:
> (XEN) ffff834007dcfc98 0000000000000000 ffff834007dcfc68 ffff82d0801e2533
> (XEN) ffff830077877000 0000000000002000 ffff834007dcfc78 ffff82d0801ea933
> (XEN) ffff834007dcfca8 ffff82d0801eaae4 0000000000000000 ffff830077877000
> (XEN) 0000000000000000 ffff834007dcff18 ffff834007dcfd08 ffff82d0801eb983
> (XEN) ffff834000000001 000000013692c000 ffff834000000000 fffffffffc607f28
> (XEN) 0000000000000008 ffff834000000006 ffff834007dcff18 ffff830077877000
> (XEN) 0000000000000015 0000000000000000 ffff834007dcff08 ffff82d0801e8c8d
> (XEN) ffff834007763000 ffff8300778c2000 ffff8340007c3000 ffff834007dcfd50
> (XEN) ffff82d0801e120b ffff834007dcfd50 ffff830077877000 ffff834007dcfdf0
> (XEN) 0000000000000000 0000000000000000 ffff82d08012fe0b ffff834007dfcac0
> (XEN) ffff834007dd30e8 0000000000000086 ffff834007dcfda0 ffff82d08012d4c2
> (XEN) ffff834000000003 0000000000000008 0000000000000000 0000000000000000
> (XEN) 0000000000000000 ffff834007dcfdf0 ffff8300778c2000 ffff830077877000
> (XEN) ffff834007dd30c8 00000083aa72fdd8 0000000000000001 ffff834007dcfe90
> (XEN) 0000000000000286 ffff834007dcfe18 ffff82d08012d4c2 ffff830077877000
> (XEN) ffff834007dcfe88 ffff82d0801d67b2 92e004e300000002 ffff830077877560
> (XEN) ffff834007dcfe68 ffff82d0801d2cbe ffff834007dcfe68 ffff830077877000
> (XEN) ffff8340007c3000 0000439115b27100 ffff834007dcfe88 ffff82d0801cc2ee
> (XEN) ffff830077877000 0000000000000100 ffff834007dcff08 ffff82d0801dfd2a
> (XEN) ffff834007dcff18 ffff830077877000 ffff834007dcff08 ffff82d0801e6f09
> (XEN) Xen call trace:
> (XEN) [] vmx_cpu_up+0xacc/0xba5
> (XEN) [] virtual_vmcs_vmread+0x1c/0x3f
> (XEN) [] get_vvmcs_real+0x9/0xb
> (XEN) [] _map_io_bitmap+0x5a/0x9f
> (XEN) [] nvmx_handle_vmptrld+0xd5/0x201
> (XEN) [] vmx_vmexit_handler+0x1253/0x19d4
> (XEN) [] vmx_asm_vmexit_handler+0x41/0xc0
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) FATAL TRAP: vector = 6 (invalid opcode)
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
>
> With the stack and gdb and following it I see:
> (gdb) x/20i virtual_vmcs_vmread
> 0xffff82d0801e2517 : push %rbp
> 0xffff82d0801e2518 : mov %rsp,%rbp
> 0xffff82d0801e251b : sub $0x10,%rsp
> 0xffff82d0801e251f : mov %rbx,(%rsp)
> 0xffff82d0801e2523 : mov %r12,0x8(%rsp)
> 0xffff82d0801e2528 : mov %rdi,%rbx
> 0xffff82d0801e252b : mov %esi,%r12d
> 0xffff82d0801e252e : callq 0xffff82d0801e03f9
> 0xffff82d0801e2533 : mov %r12d,%r12d
> 0xffff82d0801e2536 : vmread %r12,%r12

%r12 = 0x2000 (i.e. IO_BITMAP_A) fits the call trace.

> 0xffff82d0801e253a : jbe 0xffff82d0801e3df3

The branch target here, however, doesn't fit the crash %rip.
In any event, if IO_BITMAP_A is the field being read, then the
only failure condition I can see would be "in VMX root operation
AND current-VMCS pointer is not valid".

> (gdb) x/20i 0xffff82d0801e03f9
> 0xffff82d0801e03f9 : push %rbp
> 0xffff82d0801e03fa : mov %rsp,%rbp
> 0xffff82d0801e03fd : sub $0x10,%rsp
> 0xffff82d0801e0401 : mov 0x5c8(%rdi),%rax
> 0xffff82d0801e0408 : mov %rax,-0x8(%rbp)
> 0xffff82d0801e040c : vmptrld -0x8(%rbp)
> 0xffff82d0801e0410 : jbe 0xffff82d0801e3dc7

While the branch target here matches the exception %rip, this
doesn't match the call stack. Something's pretty fishy here.

Jan