From: "Jan Beulich" <JBeulich@suse.com>
To: jun.nakajima@intel.com, kevin.tian@intel.com,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: andrew.cooper3@citrix.com, wim.coekaerts@oracle.com,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: Nested virtualization off VMware vSphere 6.0 with EL6 guests crashes on Xen 4.6
Date: Tue, 12 Jan 2016 02:22:03 -0700	[thread overview]
Message-ID: <5694D3CB02000078000C5D00@prv-mh.provo.novell.com> (raw)
In-Reply-To: <20160112033844.GB15551@char.us.oracle.com>

>>> On 12.01.16 at 04:38, <konrad.wilk@oracle.com> wrote:
> (XEN) Assertion 'vapic_pg && !p2m_is_paging(p2mt)' failed at vvmx.c:698
> (XEN) ----[ Xen-4.6.0  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    39
> (XEN) RIP:    e008:[<ffff82d0801ed053>] virtual_vmentry+0x487/0xac9
> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor (d1v3)
> (XEN) rax: 0000000000000000   rbx: ffff83007786c000   rcx: 0000000000000000
> (XEN) rdx: 0000000000000e00   rsi: 000fffffffffffff   rdi: ffff83407f81e010
> (XEN) rbp: ffff834008a47ea8   rsp: ffff834008a47e38   r8: 0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: ffff82c000341000   r14: ffff834008a47f18
> (XEN) r15: ffff83407f7c4000   cr0: 0000000080050033   cr4: 00000000001526e0
> (XEN) cr3: 000000407fb22000   cr2: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff834008a47e38:
> (XEN)    ffff834008a47e68 ffff82d0801d2cde ffff834008a47e68 0000000000000d00
> (XEN)    0000000000000000 0000000000000000 ffff834008a47e88 00000004801cc30e
> (XEN)    ffff83007786c000 ffff83007786c000 ffff834008a40000 0000000000000000
> (XEN)    ffff834008a47f18 0000000000000000 ffff834008a47f08 ffff82d0801edf94
> (XEN)    ffff834008a47ef8 0000000000000000 ffff834008f62000 ffff834008a47f18
> (XEN)    000000ae8c99eb8d ffff83007786c000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 ffff82d0801ee2ab
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    00000000078bfbff 0000000000000000 0000000000000000 0000beef0000beef
> (XEN)    fffffffffc4b3440 000000bf0000beef 0000000000040046 fffffffffc607f00
> (XEN)    000000000000beef 000000000000beef 000000000000beef 000000000000beef
> (XEN)    000000000000beef 0000000000000027 ffff83007786c000 0000006f88716300
> (XEN)    0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0801ed053>] virtual_vmentry+0x487/0xac9
> (XEN)    [<ffff82d0801edf94>] nvmx_switch_guest+0x8ff/0x915
> (XEN)    [<ffff82d0801ee2ab>] vmx_asm_vmexit_handler+0x4b/0xc0
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 39:
> (XEN) Assertion 'vapic_pg && !p2m_is_paging(p2mt)' failed at vvmx.c:698
> (XEN) ****************************************
> (XEN)
> 
> ..and then to my surprise the hypervisor stopped hitting this.

Since we can (I hope) pretty much exclude a paging type, the
ASSERT() must have triggered because of vapic_pg being NULL.
That might be verifiable without extra printk()s, just by checking
the disassembly (assuming the value sits in a register); if so,
vapic_gpfn would be of interest too.

What looks odd to me is the connection between
CPU_BASED_TPR_SHADOW being set and the use of a (valid)
virtual APIC page: Wouldn't this rather need to depend on
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES, just like in
nvmx_update_apic_access_address()?
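To make the distinction concrete, here is a minimal self-contained model
(not Xen code) of the two gating conditions; the bit values match the
vmcs.h definitions / SDM, but the function names are mine:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Control bits as defined in Xen's vmcs.h (values from the Intel SDM). */
#define CPU_BASED_TPR_SHADOW                    0x00200000u
#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS   0x80000000u
#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001u

/* What vvmx.c currently keys the virtual-APIC page lookup (and hence
 * the ASSERT()) on: TPR shadowing alone. */
static bool gates_vapic_page_now(uint32_t exec_ctrl)
{
    return exec_ctrl & CPU_BASED_TPR_SHADOW;
}

/* What nvmx_update_apic_access_address() keys on: APIC-access
 * virtualization, which lives in the secondary controls and is a
 * separate knob from TPR shadowing. */
static bool gates_apic_access_page(uint32_t exec_ctrl, uint32_t sec_ctrl)
{
    return (exec_ctrl & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) &&
           (sec_ctrl & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
}
```

The point being that a (nested) guest can legitimately set one without
the other, so the two conditions are not interchangeable.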

Anyway, the writing of the respective VMCS field to zero in the
alternative worries me a little: Aren't we risking MFN zero to be
wrongly accessed due to this?

Furthermore, nvmx_update_apic_access_address() having a
similar ASSERT() seems entirely wrong: The APIC access
page doesn't really need to match up with any actual page
belonging to the guest - a guest could choose to point this
into nowhere (note that we've been at least considering this
option recently for our own purposes, in the context of
http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02191.html).

> Instead I started getting an even more bizarre crash:
> 
> 
> (d1) enter handle_19:
> (d1)   NULL
> (d1) Booting from Hard Disk...
> (d1) Booting from 0000:7c00
> (XEN) stdvga.c:151:d1v0 leaving stdvga mode
> (XEN) stdvga.c:147:d1v0 entering stdvga and caching modes
> (XEN) stdvga.c:520:d1v0 leaving caching mode
> (XEN) ----[ Xen-4.6.0  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    3
> (XEN) RIP:    e008:[<ffff82d0801e3dc7>] vmx_cpu_up+0xacc/0xba5
> (XEN) RFLAGS: 0000000000010242   CONTEXT: hypervisor (d1v1)
> (XEN) rax: 0000000000000000   rbx: ffff830077877000   rcx: ffff834077e54000
> (XEN) rdx: ffff834007dc8000   rsi: 0000000000002000   rdi: ffff830077877000
> (XEN) rbp: ffff834007dcfc48   rsp: ffff834007dcfc38   r8:  0000000004040000
> (XEN) r9:  000ffffffffff000   r10: 0000000000000000   r11: fffffffffc423f1e
> (XEN) r12: 0000000000002000   r13: 0000000000000000   r14: 0000000000000000
> (XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000001526e0
> (XEN) cr3: 0000004000763000   cr2: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff834007dcfc38:
> (XEN)    ffff834007dcfc98 0000000000000000 ffff834007dcfc68 ffff82d0801e2533
> (XEN)    ffff830077877000 0000000000002000 ffff834007dcfc78 ffff82d0801ea933
> (XEN)    ffff834007dcfca8 ffff82d0801eaae4 0000000000000000 ffff830077877000
> (XEN)    0000000000000000 ffff834007dcff18 ffff834007dcfd08 ffff82d0801eb983
> (XEN)    ffff834000000001 000000013692c000 ffff834000000000 fffffffffc607f28
> (XEN)    0000000000000008 ffff834000000006 ffff834007dcff18 ffff830077877000
> (XEN)    0000000000000015 0000000000000000 ffff834007dcff08 ffff82d0801e8c8d
> (XEN)    ffff834007763000 ffff8300778c2000 ffff8340007c3000 ffff834007dcfd50
> (XEN)    ffff82d0801e120b ffff834007dcfd50 ffff830077877000 ffff834007dcfdf0
> (XEN)    0000000000000000 0000000000000000 ffff82d08012fe0b ffff834007dfcac0
> (XEN)    ffff834007dd30e8 0000000000000086 ffff834007dcfda0 ffff82d08012d4c2
> (XEN)    ffff834000000003 0000000000000008 0000000000000000 0000000000000000
> (XEN)    0000000000000000 ffff834007dcfdf0 ffff8300778c2000 ffff830077877000
> (XEN)    ffff834007dd30c8 00000083aa72fdd8 0000000000000001 ffff834007dcfe90
> (XEN)    0000000000000286 ffff834007dcfe18 ffff82d08012d4c2 ffff830077877000
> (XEN)    ffff834007dcfe88 ffff82d0801d67b2 92e004e300000002 ffff830077877560
> (XEN)    ffff834007dcfe68 ffff82d0801d2cbe ffff834007dcfe68 ffff830077877000
> (XEN)    ffff8340007c3000 0000439115b27100 ffff834007dcfe88 ffff82d0801cc2ee
> (XEN)    ffff830077877000 0000000000000100 ffff834007dcff08 ffff82d0801dfd2a
> (XEN)    ffff834007dcff18 ffff830077877000 ffff834007dcff08 ffff82d0801e6f09
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0801e3dc7>] vmx_cpu_up+0xacc/0xba5
> (XEN)    [<ffff82d0801e2533>] virtual_vmcs_vmread+0x1c/0x3f
> (XEN)    [<ffff82d0801ea933>] get_vvmcs_real+0x9/0xb
> (XEN)    [<ffff82d0801eaae4>] _map_io_bitmap+0x5a/0x9f
> (XEN)    [<ffff82d0801eb983>] nvmx_handle_vmptrld+0xd5/0x201
> (XEN)    [<ffff82d0801e8c8d>] vmx_vmexit_handler+0x1253/0x19d4
> (XEN)    [<ffff82d0801ee261>] vmx_asm_vmexit_handler+0x41/0xc0
> (XEN) 
> (XEN) 
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) FATAL TRAP: vector = 6 (invalid opcode)
> (XEN) ****************************************
> (XEN) 
> (XEN) Manual reset required ('noreboot' specified)
> 
> With the stack and gdb and following it I see:
> (gdb) x/20i virtual_vmcs_vmread
>    0xffff82d0801e2517 <virtual_vmcs_vmread>:    push   %rbp
>    0xffff82d0801e2518 <virtual_vmcs_vmread+1>:  mov    %rsp,%rbp
>    0xffff82d0801e251b <virtual_vmcs_vmread+4>:  sub    $0x10,%rsp
>    0xffff82d0801e251f <virtual_vmcs_vmread+8>:  mov    %rbx,(%rsp)
>    0xffff82d0801e2523 <virtual_vmcs_vmread+12>: mov    %r12,0x8(%rsp)
>    0xffff82d0801e2528 <virtual_vmcs_vmread+17>: mov    %rdi,%rbx
>    0xffff82d0801e252b <virtual_vmcs_vmread+20>: mov    %esi,%r12d
>    0xffff82d0801e252e <virtual_vmcs_vmread+23>: callq  0xffff82d0801e03f9 <virtual_vmcs_enter>
>    0xffff82d0801e2533 <virtual_vmcs_vmread+28>: mov    %r12d,%r12d
>    0xffff82d0801e2536 <virtual_vmcs_vmread+31>: vmread %r12,%r12

%r12 = 0x2000 (i.e. IO_BITMAP_A) fits the call trace.
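For reference, decoding 0x2000 per the VMCS field-encoding scheme
(SDM Vol. 3, Appendix B) confirms it's the first 64-bit control
field, i.e. IO_BITMAP_A; the helper names below are mine:

```c
#include <assert.h>
#include <stdint.h>

/* VMCS field encodings: bit 0 = access type, bits 9:1 = index,
 * bits 11:10 = field type, bits 14:13 = width. */
static uint32_t vmcs_width(uint32_t enc) { return (enc >> 13) & 3; }   /* 1 = 64-bit */
static uint32_t vmcs_type(uint32_t enc)  { return (enc >> 10) & 3; }   /* 0 = control */
static uint32_t vmcs_index(uint32_t enc) { return (enc >> 1) & 0x1ff; }/* 0 = first */
```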

>    0xffff82d0801e253a <virtual_vmcs_vmread+35>: jbe    0xffff82d0801e3df3

The branch target here, however, doesn't fit the crash %rip.

In any event, if IO_BITMAP_A is the field being read, then the
only failure condition I can see would be "in VMX root operation AND
current-VMCS pointer is not valid".
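As a reminder of what that `jbe` is actually testing, VMX instructions
report failure through RFLAGS (SDM Vol. 3 conventions); a sketch, with
the function name mine:

```c
#include <assert.h>
#include <stdbool.h>

/* VMX instruction failure reporting:
 *  - VMfailInvalid: CF=1, no current (loaded) VMCS - the suspected case
 *  - VMfailValid:   ZF=1, error code in the VM-instruction error field
 * The compiler-emitted "jbe" after vmread branches when CF=1 or ZF=1,
 * i.e. on either failure flavour. */
static bool vmx_insn_failed(bool cf, bool zf)
{
    return cf || zf;  /* the condition "jbe" (below or equal) tests */
}
```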

> (gdb) x/20i 0xffff82d0801e03f9
>    0xffff82d0801e03f9 <virtual_vmcs_enter>:     push   %rbp
>    0xffff82d0801e03fa <virtual_vmcs_enter+1>:   mov    %rsp,%rbp
>    0xffff82d0801e03fd <virtual_vmcs_enter+4>:   sub    $0x10,%rsp
>    0xffff82d0801e0401 <virtual_vmcs_enter+8>:   mov    0x5c8(%rdi),%rax
>    0xffff82d0801e0408 <virtual_vmcs_enter+15>:  mov    %rax,-0x8(%rbp)
>    0xffff82d0801e040c <virtual_vmcs_enter+19>:  vmptrld -0x8(%rbp)
>    0xffff82d0801e0410 <virtual_vmcs_enter+23>:  jbe    0xffff82d0801e3dc7

While the branch target here matches the exception %rip,
this doesn't match the call stack. Something's pretty fishy here.

Jan
