From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeroen Groenewegen van der Weyden
Subject: Re: Crash of guest with nested vmx with Unknown nested vmexit reason 80000021.
Date: Tue, 07 Oct 2014 17:16:56 +0200
Message-ID: <543403E8.6040507@grosc.com>
References: <542AD266.5030803@grosc.com> <5433B94E020000780003CBF0@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5433B94E020000780003CBF0@mail.emea.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich, Eddie Dong, Jun Nakajima, Kevin Tian
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

Hi Jan,

I installed xen-4.5.29823-1.xen_unstable.1.x86_64 from
http://download.opensuse.org/repositories/home:/olh:/xen-unstable/openSUSE_Factory/

After about 45 minutes the guests crashed again.

Kind regards,
Jeroen

(XEN) vvmx.c:2476:d1v0 Unknown nested vmexit reason 80000021.
(XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
(XEN) ************* VMCS Area **************
(XEN) *** Guest State ***
(XEN) CR0: actual=0x0000000000000033, shadow=0x0000000000000011, gh_mask=ffffffffffffffff
(XEN) CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffffff
(XEN) CR3: actual=0x00000000feffc000, target_count=0
(XEN) target0=0000000000000000, target1=0000000000000000
(XEN) target2=0000000000000000, target3=0000000000000000
(XEN) RSP = 0x00000000000006be (0x00000000000006be) RIP = 0x000000000000017d (0x000000000000017d)
(XEN) RFLAGS=0x0000000000000002 (0x0000000000000002) DR7 = 0x0000000000000400
(XEN) Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
(XEN) CS: sel=0xbf70, attr=0x0009b, limit=0x0000075a, base=0x000000000194fec0
(XEN) DS: sel=0x3cec, attr=0x00093, limit=0x00000110, base=0x000000000bcbe276
(XEN) SS: sel=0x3cf4, attr=0x00093, limit=0x000007ff, base=0x000000000bcbe3aa
(XEN) ES: sel=0x3cfc, attr=0x00093, limit=0x00000099, base=0x000000000bcbebac
(XEN) FS: sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000
(XEN) GS: sel=0x0000, attr=0x1c000, limit=0xffffffff, base=0x0000000000000000
(XEN) GDTR: limit=0x0000ffff, base=0x0000000001000000
(XEN) LDTR: sel=0x0820, attr=0x00082, limit=0x0000ffff, base=0x0000000001010400
(XEN) IDTR: limit=0x000003ff, base=0x0000000001010000
(XEN) TR: sel=0x0690, attr=0x00083, limit=0x0000002b, base=0x0000000001213a30
(XEN) Guest PAT = 0x0000050100070406
(XEN) TSC Offset = fffffe8a52350fd4
(XEN) DebugCtl=0000000000000000 DebugExceptions=0000000000000000
(XEN) Interruptibility=0008 ActivityState=0000
(XEN) *** Host State ***
(XEN) RSP = 0xffff830836b5ff90 RIP = 0xffff82d0801e3ed0
(XEN) CS=e008 DS=0000 ES=0000 FS=0000 GS=0000 SS=0000 TR=e040
(XEN) FSBase=0000000000000000 GSBase=0000000000000000 TRBase=ffff830836b64c80
(XEN) GDTBase=ffff830836b55000 IDTBase=ffff830836b61000
(XEN) CR0=0000000080050033 CR3=000000082ffdd000 CR4=00000000000026f0
(XEN) Sysenter RSP=ffff830836b5ffc0 CS:RIP=e008:ffff82d0802220a0
(XEN) Host PAT = 0x0000050100070406
(XEN) *** Control State ***
(XEN) PinBased=0000003f CPUBased=b6b9e5fa SecondaryExec=000004eb
(XEN) EntryControls=000011ff ExitControls=000fefff
(XEN) ExceptionBitmap=00040042
(XEN) VMEntry: intr_info=80000202 errcode=00000000 ilen=00000003
(XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000003
(XEN) reason=80000021 qualification=00000000
(XEN) IDTVectoring: info=80000202 errcode=00000000
(XEN) TPR Threshold = 0x00
(XEN) EPT pointer = 0x000000082ffff01e
(XEN) Virtual processor ID = 0xe6a3
(XEN) **************************************
(XEN) domain_crash called from vmx.c:2483
(XEN) Domain 1 (vcpu#0) crashed on cpu#9:
(XEN) ----[ Xen-4.5.29823-1.xen_unstable.1 x86_64 debug=n Not tainted ]----
(XEN) CPU: 9
(XEN) RIP: bf70:[<000000000000017d>]
(XEN) RFLAGS: 0000000000000002 CONTEXT: hvm guest
(XEN) rax: 0000000000040002 rbx: 0000000000042701 rcx: 00000000bf480003
(XEN) rdx: 00000000248816a0 rsi: 0000000000000000 rdi: 0000000026b00096
(XEN) rbp: 00000000000006d4 rsp: 00000000000006be r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 0000000000000033 cr4: 0000000000002010
(XEN) cr3: 00000000feffc000 cr2: 0000000000000000
(XEN) ds: 3cec es: 3cfc fs: 0000 gs: 0000 ss: 3cf4 cs: bf70

Jan Beulich wrote on 7-10-2014 at 09:58:
>>>> On 30.09.14 at 17:55, wrote:
>> Recently I updated my openSUSE box from 12.3 to 13.1. On this box I run
>> xen with several guests. One of these guests is an appliance that has 4
>> kvm guests running.
>> When I start this appliance with the nested vmx feature the appliance
>> crashes either immediately or after a few minutes.
>>
>> This same guest was running without a problem on openSUSE releases 11.4
>> until 12.3
>> [...]
>> ==== output of xl dmesg
>> (XEN) vvmx.c:2459:d5 Unknown nested vmexit reason 80000021.
>> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest
>> state (0).
>> (XEN) ************* VMCS Area **************
>> [...]
> So the problem here is that
>
>> (XEN) Interruptibility=0008 ActivityState=0000
> VMX_INTR_SHADOW_NMI is set while
>
>> (XEN) PinBased=0000003f CPUBased=b6b9e5fa SecondaryExec=000004eb
> PIN_BASED_VIRTUAL_NMIS is active and
>
>> (XEN) VMEntry: intr_info=80000202 errcode=5d021101 ilen=00000003
>> (XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000003
>> (XEN) reason=80000021 qualification=00000000
>> (XEN) IDTVectoring: info=80000202 errcode=00000000
> an NMI is being injected. This case is explicitly mentioned in Vol 3
> section 31.7.1.2 (Resuming Guest Software after Handling an
> Exception). Either there needs to be extra code in vvmx.c to clear
> VMX_INTR_SHADOW_NMI (as the second sub-bullet point of the last
> bullet point says), or the second half of vmx_idtv_reinject() needs
> to be done without regard to nestedhvm_vcpu_in_guestmode(v)
> (and maybe also without regard to EXIT_REASON_TASK_SWITCH).
>
> Speaking of SDM sections: There are quite a few references in the
> code that name just section numbers (in the case here, several
> references to sections 25.7.1.* exist). These numbers become stale
> quite quickly (here they're now 31.7.1.*), so in order to help
> digging through issues like the one here, can I please ask one of
> you to go through and replace (or at least amend) these numbers
> with the sections' titles (which I hope won't get altered that quickly)?
>
> Jan
>
>
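
For anyone reading the dump later, the combination Jan points at can be checked mechanically. The following is only a minimal sketch, not code from vvmx.c: the two constant names VMX_INTR_SHADOW_NMI and PIN_BASED_VIRTUAL_NMIS are the ones quoted above, with values taken from the SDM bit positions (interruptibility bit 3, pin-based control bit 5), while the SKETCH_* macros and both helper functions are hypothetical names made up for this illustration.

```c
#include <assert.h>   /* only used by the self-checks in the note below */
#include <stdbool.h>
#include <stdint.h>

/* Names as quoted in the mail; values follow the SDM bit layout. */
#define VMX_INTR_SHADOW_NMI     0x00000008u  /* interruptibility bit 3: blocking by NMI */
#define PIN_BASED_VIRTUAL_NMIS  0x00000020u  /* pin-based control bit 5: virtual NMIs */

/* Hypothetical helper macros for this sketch only. */
#define SKETCH_INTR_INFO_VALID    0x80000000u            /* bit 31: field is valid */
#define SKETCH_INTR_INFO_TYPE(x)  (((x) >> 8) & 0x7u)    /* bits 10:8: event type */
#define SKETCH_INTR_TYPE_NMI      2u
#define SKETCH_ENTRY_FAILURE      0x80000000u            /* exit reason bit 31 */
#define SKETCH_INVALID_GUEST_STATE 33u                   /* basic exit reason 0x21 */

/* True when the three logged fields form the combination that VM entry
 * rejects: virtual NMIs enabled, an NMI being injected, and "blocking by
 * NMI" still set in the guest interruptibility state. */
static bool nmi_entry_check_fails(uint32_t interruptibility,
                                  uint32_t pin_based,
                                  uint32_t entry_intr_info)
{
    bool injecting_nmi =
        (entry_intr_info & SKETCH_INTR_INFO_VALID) &&
        SKETCH_INTR_INFO_TYPE(entry_intr_info) == SKETCH_INTR_TYPE_NMI;

    return (pin_based & PIN_BASED_VIRTUAL_NMIS) &&
           injecting_nmi &&
           (interruptibility & VMX_INTR_SHADOW_NMI);
}

/* True when an exit reason denotes a failed VM entry due to invalid
 * guest state, which is how 0x80000021 decodes. */
static bool is_invalid_guest_state_failure(uint32_t reason)
{
    return (reason & SKETCH_ENTRY_FAILURE) &&
           (reason & 0xffffu) == SKETCH_INVALID_GUEST_STATE;
}
```

Feeding in the values from the dump, nmi_entry_check_fails(0x0008, 0x0000003f, 0x80000202) returns true and is_invalid_guest_state_failure(0x80000021) returns true. Clearing any one of the three inputs (for example passing Interruptibility=0000) makes the first check return false, which corresponds to the first of the two fixes suggested above, clearing VMX_INTR_SHADOW_NMI before the entry.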