From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: Nested virtualization off VMware vSphere 6.0 with EL6 guests crashes on Xen 4.6
Date: Fri, 05 Feb 2016 03:33:44 -0700
Message-ID: <56B4889802000078000CEF57@prv-mh.provo.novell.com>
References: <20160112033844.GB15551@char.us.oracle.com>
 <5694D3CB02000078000C5D00@prv-mh.provo.novell.com>
 <20160115213958.GA16118@char.us.oracle.com>
 <569CC17002000078000C7D91@prv-mh.provo.novell.com>
 <20160202220545.GA9915@char.us.oracle.com>
 <56B1D7C702000078000CDDAA@prv-mh.provo.novell.com>
 <20160203150727.GC20732@char.us.oracle.com>
 <20160204183647.GA7205@char.us.oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
 by lists.xen.org with esmtp (Exim 4.72) (envelope-from )
 id 1aRdi0-00079p-8b for xen-devel@lists.xenproject.org;
 Fri, 05 Feb 2016 10:33:48 +0000
In-Reply-To: <20160204183647.GA7205@char.us.oracle.com>
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: andrew.cooper3@citrix.com, kevin.tian@intel.com,
 wim.coekaerts@oracle.com, jun.nakajima@intel.com, xen-devel
List-Id: xen-devel@lists.xenproject.org

>>> On 04.02.16 at 19:36, wrote:
> (XEN) nvmx_handle_vmwrite 1: IO_BITMAP_A(2000)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 0: IO_BITMAP_A(2000)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 1: IO_BITMAP_B(2002)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 2: IO_BITMAP_A(2000)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 1: VIRTUAL_APIC_PAGE_ADDR(2012)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 2: IO_BITMAP_B(2002)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 1: (2006)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 2: VIRTUAL_APIC_PAGE_ADDR(2012)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 1: VM_EXIT_MSR_LOAD_ADDR(2008)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 3: IO_BITMAP_A(2000)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 3: IO_BITMAP_B(2002)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 2: MSR_BITMAP(2004)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 1: MSR_BITMAP(2004)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 0: MSR_BITMAP(2004)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 3: (2006)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 3: VM_EXIT_MSR_LOAD_ADDR(2008)[0=ffffffffffffffff]
> (XEN) nvmx_handle_vmwrite 3: MSR_BITMAP(2004)[0=ffffffffffffffff]

So there's a whole lot of "interesting" writes of all ones, and indeed
VIRTUAL_APIC_PAGE_ADDR is among them, and the code doesn't handle that
case (nor the equivalent for APIC_ACCESS_ADDR). What's odd though is
that the writes are for vCPU 1 and 2, while the crash is on vCPU 3 (it
would of course help if the guest had as few vCPU-s as possible without
making the issue disappear).

While you have circumvented the ASSERT() you originally hit, the log
messages you've added there don't appear anywhere, which is clearly
confusing, so I wonder what other unintended effects your debugging
code has (there's clearly an uninitialized variable issue in your
additions to vmx_vmexit_handler(), but that shouldn't matter here,
albeit it should have caused a build failure, making me suspect the
patch to be stale).

Oddly enough the various bitmap field VMWRITEs above should all fail,
yet the guest appears to recover from (ignore?) these failures. (From
all I can tell we're prone to NULL dereferences due to that at least
in _shadow_io_bitmap().)

> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (4).

4 means invalid VMCS link pointer - interesting.

Jan
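
P.S. For anyone following along, here is a rough sketch of how the last
log line decodes. It is purely illustrative and not Xen code: the helper
names below are made up for this mail, and the values are the ones I
recall from the Intel SDM (bit 31 of the exit reason flags a failed VM
entry, basic reason 33 is "invalid guest state", and qualification 4 is
the VMCS link pointer check).

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only (not taken from Xen). */

/* Bit 31 of the exit reason is set when the VM entry itself failed. */
#define ENTRY_FAILURE_BIT (UINT32_C(1) << 31)

/* Basic exit reason 33 (0x21): VM-entry failure due to invalid guest state. */
#define BASIC_REASON_INVALID_GUEST_STATE 33

static int is_entry_failure(uint32_t exit_reason)
{
    return (exit_reason & ENTRY_FAILURE_BIT) != 0;
}

/* The basic exit reason lives in the low 16 bits. */
static uint16_t basic_exit_reason(uint32_t exit_reason)
{
    return (uint16_t)(exit_reason & 0xffff);
}

/* Exit qualification values for basic reason 33; only the ones relevant here. */
static const char *invalid_guest_state_cause(uint64_t qualification)
{
    switch (qualification) {
    case 2:
        return "failure loading PDPTEs";
    case 4:
        return "invalid VMCS link pointer";
    default:
        return "other";
    }
}
```

With the logged values, is_entry_failure(0x80000021) is true,
basic_exit_reason(0x80000021) is 33, and invalid_guest_state_cause(4)
names the VMCS link pointer, matching the message above.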