* KVM_SET_NESTED_STATE not yet stable
From: Jan Kiszka @ 2019-07-08 20:39 UTC (permalink / raw)
To: Paolo Bonzini, Jim Mattson, Liran Alon, KarimAllah Ahmed, kvm

Hi all,

it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
robustness issues.

The most urgent one: with the help of latest QEMU master, which uses this
interface, you can easily crash the host. You just need to start
qemu-system-x86 -enable-kvm in L1 and then hard-reset L1. The host CPU
that ran this will stall, and the system will freeze soon after.

I've also seen a pattern with my Jailhouse test VM where it seems to get
stuck in a loop between L1 and L2:

qemu-system-x86-6660 [007] 398.691401: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
qemu-system-x86-6660 [007] 398.691402: kvm_fpu: unload
qemu-system-x86-6660 [007] 398.691403: kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-system-x86-6660 [007] 398.691440: kvm_fpu: load
qemu-system-x86-6660 [007] 398.691441: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
qemu-system-x86-6660 [007] 398.691443: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
qemu-system-x86-6660 [007] 398.691444: kvm_entry: vcpu 3
qemu-system-x86-6660 [007] 398.691475: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
qemu-system-x86-6660 [007] 398.691476: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
qemu-system-x86-6660 [007] 398.691477: kvm_fpu: unload
qemu-system-x86-6660 [007] 398.691478: kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-system-x86-6660 [007] 398.691526: kvm_fpu: load
qemu-system-x86-6660 [007] 398.691527: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
qemu-system-x86-6660 [007] 398.691529: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
qemu-system-x86-6660 [007] 398.691530: kvm_entry: vcpu 3
qemu-system-x86-6660 [007] 398.691533: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
qemu-system-x86-6660 [007] 398.691534: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0

These issues disappear when going from ebbfef2f back to 6cfd7639 (both
with build fixes) in QEMU.

Host kernels tested: 5.1.16 (distro) and 5.2 (vanilla).

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply [flat|nested] 10+ messages in thread
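For reference, the blob that KVM_GET/SET_NESTED_STATE transfer begins with a small fixed header. The sketch below models it with ctypes, based on struct kvm_nested_state in include/uapi/linux/kvm.h around v5.2; the Python class and field names are paraphrased for illustration, so treat the details as an approximation of the UAPI header rather than an authoritative binding:

```python
import ctypes

# Flag bits defined by the UAPI header.
KVM_STATE_NESTED_GUEST_MODE  = 0x1  # vCPU was in guest (L2) mode
KVM_STATE_NESTED_RUN_PENDING = 0x2  # nested entry still pending

class KvmVmxNestedStateHdr(ctypes.Structure):
    _fields_ = [("vmxon_pa", ctypes.c_uint64),   # guest-physical VMXON region
                ("vmcs12_pa", ctypes.c_uint64),  # current vmcs12 address
                ("smm_flags", ctypes.c_uint16)]  # SMM-related state flags

class KvmNestedStateHdr(ctypes.Union):
    # The kernel reserves 120 bytes here so VMX/SVM headers can grow.
    _fields_ = [("vmx", KvmVmxNestedStateHdr),
                ("pad", ctypes.c_uint8 * 120)]

class KvmNestedState(ctypes.Structure):
    _fields_ = [("flags", ctypes.c_uint16),
                ("format", ctypes.c_uint16),  # 0 = VMX, 1 = SVM
                ("size", ctypes.c_uint32),    # total size incl. data blob
                ("hdr", KvmNestedStateHdr)]
    # The variable-length vmcs12 data follows this fixed part.

# Fixed part: 8 bytes of flags/format/size plus the 120-byte union.
assert ctypes.sizeof(KvmNestedState) == 128
```

A reset of L1 must clear this state again (flags back to 0, no vmcs12), which is exactly the transition that appears to go wrong here.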
* Re: KVM_SET_NESTED_STATE not yet stable
From: Raslan, KarimAllah @ 2019-07-10 15:24 UTC (permalink / raw)
To: jmattson, liran.alon, kvm, pbonzini, jan.kiszka

On Mon, 2019-07-08 at 22:39 +0200, Jan Kiszka wrote:
> Hi all,
>
> it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
> robustness issues.

I would be very interested to learn about any more robustness issues
that you are seeing.

> The most urgent one: With the help of latest QEMU master that uses this
> interface, you can easily crash the host. You just need to start
> qemu-system-x86 -enable-kvm in L1 and then hard-reset L1. The host CPU
> that ran this will stall, the system will freeze soon.

Just to confirm: you start an L2 guest using qemu inside an L1 guest and
then hard-reset the L1 guest?

Are you running any special workload in L2 or L1 when you reset? Also,
how exactly are you doing this "hard reset"?

(Sorry, I just tried this in my setup and did not see any problem, but
my setup is slightly different, so I am just ruling out obvious stuff.)

> I've also seen a pattern with my Jailhouse test VM where it seems to
> get stuck in a loop between L1 and L2:
>
> [kvm trace from the first message snipped]
>
> These issues disappear when going from ebbfef2f back to 6cfd7639 (both
> with build fixes) in QEMU.

This is the QEMU that you are using in L0 to launch an L1 guest, right?
Or are you still referring to the QEMU mentioned above?

> Host kernels tested: 5.1.16 (distro) and 5.2 (vanilla).
>
> Jan

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
* Re: KVM_SET_NESTED_STATE not yet stable
From: Jan Kiszka @ 2019-07-10 16:05 UTC (permalink / raw)
To: Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini

Hi KarimAllah,

On 10.07.19 17:24, Raslan, KarimAllah wrote:
> Just to confirm, you start an L2 guest using qemu inside an L1-guest
> and then hard-reset the L1 guest?

Exactly.

> Are you running any special workload in L2 or L1 when you reset?

Nope. It is a standard (though rather oldish) userland in L1, just
running a more recent kernel (5.2).

> Also how exactly are you doing this "hard reset"?

system_reset from the monitor or "reset" from the QEMU window menu.

> (sorry just tried this in my setup and I did not see any problem but
> my setup is slightly different, so just ruling out obvious stuff).

If it helps, I can share privately a guest image that was built via
https://github.com/siemens/jailhouse-images which exposes the reset
issue after starting Jailhouse (instead of qemu-system-x86_64 - though
that should "work" as well, just not tested yet). It's about 70M packed.

Host-wise, 5.2.0 + QEMU master should do. I can also provide you the
.config if needed.

> [quoted kvm trace from the first message snipped]
>
> This is the QEMU that you are using in L0 to launch an L1 guest,
> right? or are you still referring to the QEMU mentioned above?

This scenario is similar but still a bit different from the above. Yes,
same L0 image and host QEMU here (and the traces were taken on the host,
obviously), but the workload is now as follows:

- boot L1 Linux
- enable Jailhouse inside L1
- move the mouse over the graphical desktop of L2, i.e. the former L1
  Linux (Jailhouse is now L1)
- the L1/L2 guests enter the loop above while trying to read from the
  vmmouse port

Jan
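The trace loop above can be sanity-checked by decoding the IO_INSTRUCTION exit qualification (the info1 value in the kvm_exit/kvm_nested_vmexit lines) against the Intel SDM's layout for I/O-instruction exits. A small sketch, interpreting only numbers already present in the trace:

```python
# Bit layout per Intel SDM Vol. 3, "Exit Qualification for I/O Instructions":
# bits 2:0 = access size - 1, bit 3 = direction (1 = IN), bit 4 = string op,
# bit 5 = REP prefix, bits 31:16 = port number.
def decode_io_qual(qual):
    return {
        "size": (qual & 0x7) + 1,
        "direction": "in" if qual & 0x8 else "out",
        "string": bool(qual & 0x10),
        "rep": bool(qual & 0x20),
        "port": qual >> 16,
    }

q = decode_io_qual(0x5658000b)
# A 4-byte IN from port 0x5658 -- consistent with the kvm_pio line
# ("pio_read at 0x5658 size 4") and with the vmmouse/VMware backdoor port.
assert q == {"size": 4, "direction": "in", "string": False,
             "rep": False, "port": 0x5658}
```

So the looping exit really is the vmmouse port read mentioned above, bouncing between the nested vmexit path and userspace without making progress.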
* Re: KVM_SET_NESTED_STATE not yet stable
From: Jan Kiszka @ 2019-07-10 20:31 UTC (permalink / raw)
To: Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini
Cc: Ralf Ramsauer

On 10.07.19 18:05, Jan Kiszka wrote:
> [earlier exchange about the reset reproduction, the shareable test
> image, and the vmmouse loop snipped]

Ralf tried my case on some of his systems as well, but he also didn't
succeed in reproducing. So we compared vmxcap lists, because I'm
starting to think it's feature-related. There are some differences...

--- vmxcap.i7-5600u	2019-07-10 21:59:05.616547924 +0200
+++ vmxcap.jan	2019-07-10 21:58:23.135686409 +0200
@@ -1,6 +1,6 @@
 Basic VMX Information
-  Hex: 0xda040000000012
-  Revision 18
+  Hex: 0xda040000000004
+  Revision 4
   VMCS size 1024
   VMCS restricted to 32 bit addresses no
   Dual-monitor support yes
@@ -51,13 +51,13 @@
   Enable INVPCID yes
   Enable VM functions yes
   VMCS shadowing yes
-  Enable ENCLS exiting no
+  Enable ENCLS exiting yes
   RDSEED exiting yes
-  Enable PML no
+  Enable PML yes
   EPT-violation #VE yes
-  Conceal non-root operation from PT no
-  Enable XSAVES/XRSTORS no
-  Mode-based execute control (XS/XU) no
+  Conceal non-root operation from PT yes
+  Enable XSAVES/XRSTORS yes
+  Mode-based execute control (XS/XU) yes
   TSC scaling no
 VM-Exit controls
   Save debug controls default
@@ -69,8 +69,8 @@
   Save IA32_EFER yes
   Load IA32_EFER yes
   Save VMX-preemption timer value yes
-  Clear IA32_BNDCFGS no
-  Conceal VM exits from PT no
+  Clear IA32_BNDCFGS yes
+  Conceal VM exits from PT yes
 VM-Entry controls
   Load debug controls default
   IA-32e mode guest yes
@@ -79,11 +79,11 @@
   Load IA32_PERF_GLOBAL_CTRL yes
   Load IA32_PAT yes
   Load IA32_EFER yes
-  Load IA32_BNDCFGS no
-  Conceal VM entries from PT no
+  Load IA32_BNDCFGS yes
+  Conceal VM entries from PT yes
 Miscellaneous data
-  Hex: 0x300481e5
-  VMX-preemption timer scale (log2) 5
+  Hex: 0x7004c1e7
+  VMX-preemption timer scale (log2) 7
   Store EFER.LMA into IA-32e mode guest control yes
   HLT activity state yes
   Shutdown activity state yes
@@ -93,10 +93,10 @@
   MSR-load/store count recommendation 0
   IA32_SMM_MONITOR_CTL[2] can be set to 1 yes
   VMWRITE to VM-exit information fields yes
-  Inject event with insn length=0 no
+  Inject event with insn length=0 yes
   MSEG revision identifier 0
 VPID and EPT capabilities
-  Hex: 0xf0106334141
+  Hex: 0xf0106734141
   Execute-only EPT translations yes
   Page-walk length 4 yes
   Paging-structure memory type UC yes

And another machine that does not crash:

--- vmxcaps.e5-2683v4	2019-07-10 22:21:28.620329384 +0200
+++ vmxcap.jan	2019-07-10 21:58:23.135686409 +0200
@@ -1,6 +1,6 @@
 Basic VMX Information
-  Hex: 0xda040000000012
-  Revision 18
+  Hex: 0xda040000000004
+  Revision 4
   VMCS size 1024
   VMCS restricted to 32 bit addresses no
   Dual-monitor support yes
@@ -12,7 +12,7 @@
   NMI exiting yes
   Virtual NMIs yes
   Activate VMX-preemption timer yes
-  Process posted interrupts yes
+  Process posted interrupts no
 primary processor-based controls
   Interrupt window exiting yes
   Use TSC offsetting yes
@@ -44,20 +44,20 @@
   Enable VPID yes
   WBINVD exiting yes
   Unrestricted guest yes
-  APIC register emulation yes
-  Virtual interrupt delivery yes
+  APIC register emulation no
+  Virtual interrupt delivery no
   PAUSE-loop exiting yes
   RDRAND exiting yes
   Enable INVPCID yes
   Enable VM functions yes
   VMCS shadowing yes
-  Enable ENCLS exiting no
+  Enable ENCLS exiting yes
   RDSEED exiting yes
   Enable PML yes
   EPT-violation #VE yes
-  Conceal non-root operation from PT no
-  Enable XSAVES/XRSTORS no
-  Mode-based execute control (XS/XU) no
+  Conceal non-root operation from PT yes
+  Enable XSAVES/XRSTORS yes
+  Mode-based execute control (XS/XU) yes
   TSC scaling no
 VM-Exit controls
   Save debug controls default
@@ -69,8 +69,8 @@
   Save IA32_EFER yes
   Load IA32_EFER yes
   Save VMX-preemption timer value yes
-  Clear IA32_BNDCFGS no
-  Conceal VM exits from PT no
+  Clear IA32_BNDCFGS yes
+  Conceal VM exits from PT yes
 VM-Entry controls
   Load debug controls default
   IA-32e mode guest yes
@@ -79,11 +79,11 @@
   Load IA32_PERF_GLOBAL_CTRL yes
   Load IA32_PAT yes
   Load IA32_EFER yes
-  Load IA32_BNDCFGS no
-  Conceal VM entries from PT no
+  Load IA32_BNDCFGS yes
+  Conceal VM entries from PT yes
 Miscellaneous data
-  Hex: 0x300481e5
-  VMX-preemption timer scale (log2) 5
+  Hex: 0x7004c1e7
+  VMX-preemption timer scale (log2) 7
   Store EFER.LMA into IA-32e mode guest control yes
   HLT activity state yes
   Shutdown activity state yes
@@ -93,10 +93,10 @@
   MSR-load/store count recommendation 0
   IA32_SMM_MONITOR_CTL[2] can be set to 1 yes
   VMWRITE to VM-exit information fields yes
-  Inject event with insn length=0 no
+  Inject event with insn length=0 yes
   MSEG revision identifier 0
 VPID and EPT capabilities
-  Hex: 0xf0106334141
+  Hex: 0xf0106734141
   Execute-only EPT translations yes
   Page-walk length 4 yes
   Paging-structure memory type UC yes

And on a Xeon D-1540, I'm not seeing a crash but a KVM entry failure
when resetting L1 while running Jailhouse:

KVM: entry failed, hardware error 0x7
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00a09b00
SS =0000 00000000 0000ffff 00c09300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Here is the vmxcap diff:

--- xeon-d	2019-07-10 22:29:56.735374032 +0200
+++ i7-8850H	2019-07-10 22:29:31.747467248 +0200
@@ -1,6 +1,6 @@
 Basic VMX Information
-  Hex: 0xda040000000012
-  Revision 18
+  Hex: 0xda040000000004
+  Revision 4
   VMCS size 1024
   VMCS restricted to 32 bit addresses no
   Dual-monitor support yes
@@ -12,7 +12,7 @@ pin-based controls
   NMI exiting yes
   Virtual NMIs yes
   Activate VMX-preemption timer yes
-  Process posted interrupts yes
+  Process posted interrupts no
 primary processor-based controls
   Interrupt window exiting yes
   Use TSC offsetting yes
@@ -44,20 +44,20 @@ secondary processor-based controls
   Enable VPID yes
   WBINVD exiting yes
   Unrestricted guest yes
-  APIC register emulation yes
-  Virtual interrupt delivery yes
+  APIC register emulation no
+  Virtual interrupt delivery no
   PAUSE-loop exiting yes
   RDRAND exiting yes
   Enable INVPCID yes
   Enable VM functions yes
   VMCS shadowing yes
-  Enable ENCLS exiting no
+  Enable ENCLS exiting yes
   RDSEED exiting yes
   Enable PML yes
   EPT-violation #VE yes
-  Conceal non-root operation from PT no
-  Enable XSAVES/XRSTORS no
-  Mode-based execute control (XS/XU) no
+  Conceal non-root operation from PT yes
+  Enable XSAVES/XRSTORS yes
+  Mode-based execute control (XS/XU) yes
   TSC scaling no
 VM-Exit controls
   Save debug controls default
@@ -69,8 +69,8 @@ VM-Exit controls
   Save IA32_EFER yes
   Load IA32_EFER yes
   Save VMX-preemption timer value yes
-  Clear IA32_BNDCFGS no
-  Conceal VM exits from PT no
+  Clear IA32_BNDCFGS yes
+  Conceal VM exits from PT yes
 VM-Entry controls
   Load debug controls default
   IA-32e mode guest yes
@@ -79,11 +79,11 @@ VM-Entry controls
   Load IA32_PERF_GLOBAL_CTRL yes
   Load IA32_PAT yes
   Load IA32_EFER yes
-  Load IA32_BNDCFGS no
-  Conceal VM entries from PT no
+  Load IA32_BNDCFGS yes
+  Conceal VM entries from PT yes
 Miscellaneous data
-  Hex: 0x300481e5
-  VMX-preemption timer scale (log2) 5
+  Hex: 0x7004c1e7
+  VMX-preemption timer scale (log2) 7
   Store EFER.LMA into IA-32e mode guest control yes
   HLT activity state yes
   Shutdown activity state yes
@@ -93,10 +93,10 @@ Miscellaneous data
   MSR-load/store count recommendation 0
   IA32_SMM_MONITOR_CTL[2] can be set to 1 yes
   VMWRITE to VM-exit information fields yes
-  Inject event with insn length=0 no
+  Inject event with insn length=0 yes
   MSEG revision identifier 0
 VPID and EPT capabilities
-  Hex: 0xf0106334141
+  Hex: 0xf0106734141
   Execute-only EPT translations yes
   Page-walk length 4 yes
   Paging-structure memory type UC yes

Maybe the KVM code does not take the latest VMX features into account
when importing a userspace nested state?

Jan
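As a cross-check of the vmxcap diffs, the "Basic VMX Information" and "Miscellaneous data" hex values can be decoded by hand. The bit positions below (VMCS revision in bits 30:0 and VMCS size in bits 44:32 of IA32_VMX_BASIC, preemption-timer scale in bits 4:0 of IA32_VMX_MISC) follow the Intel SDM layout that the vmxcap tool itself uses; this only re-derives numbers already shown in the diffs:

```python
# Decode IA32_VMX_BASIC: revision identifier and VMCS region size.
def vmx_basic(val):
    return {"revision": val & 0x7fffffff,        # bits 30:0
            "vmcs_size": (val >> 32) & 0x1fff}   # bits 44:32

# Decode the VMX-preemption timer scale from IA32_VMX_MISC.
def preemption_timer_scale(misc):
    return misc & 0x1f                           # bits 4:0

# Values taken verbatim from the vmxcap diffs in this thread.
assert vmx_basic(0xda040000000012) == {"revision": 18, "vmcs_size": 1024}
assert vmx_basic(0xda040000000004) == {"revision": 4,  "vmcs_size": 1024}
assert preemption_timer_scale(0x300481e5) == 5
assert preemption_timer_scale(0x7004c1e7) == 7
```

This reproduces the "Revision 18" vs. "Revision 4", "VMCS size 1024", and timer-scale 5 vs. 7 lines from the diffs, so the capability dumps themselves are internally consistent; the differences really do lie in the feature bits.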
* Re: KVM_SET_NESTED_STATE not yet stable
From: Jan Kiszka @ 2019-07-10 21:14 UTC (permalink / raw)
To: Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini
Cc: Ralf Ramsauer

On 10.07.19 22:31, Jan Kiszka wrote:
> And on a Xeon D-1540, I'm not seeing a crash but a KVM entry failure
> when resetting L1 while running Jailhouse:
>
> KVM: entry failed, hardware error 0x7
> [register dump snipped]

OK, looks like the feature diff was a red herring: Ralf found a server
with even more features and without a crash, and I found familiar error
messages in the kernel log of that Xeon D:

kvm: vmptrld           (null)/778000000000 failed
kvm: vmclear fail:           (null)/778000000000

Only difference: no crash, just the more graceful entry failure.

Jan
* Re: KVM_SET_NESTED_STATE not yet stable
From: Ralf Ramsauer @ 2019-07-11 11:37 UTC (permalink / raw)
To: Jan Kiszka, Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini

Hi all,

On 7/10/19 10:31 PM, Jan Kiszka wrote:
>> system_reset from the monitor or "reset" from the QEMU window menu.

While I'm not able to reproduce this behaviour on any of my machines
(i7-4810MQ, i7-5600U, Xeon Gold 5118),

>> If it helps, I can share privately a guest image that was built via
>> https://github.com/siemens/jailhouse-images which exposes the reset
>> issue after starting Jailhouse [...]

I can reproduce and confirm this issue. A system_reset of qemu after
Jailhouse is enabled leads to the crash listed below, on all machines.
On the Xeon Gold, e.g., QEMU reports:

EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00a09b00
SS =0000 00000000 0000ffff 00c09300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Kernel:

[ 1868.804515] kvm: vmptrld           (null)/6b8640000000 failed
[ 1868.804568] kvm: vmclear fail:           (null)/6b8640000000

And the host freezes unrecoverably. Hosts use standard distro kernels
>= v5.0.

  Ralf

> [rest of quoted thread (vmmouse loop trace, workload description, and
> vmxcap diffs) snipped]
> + Enable XSAVES/XRSTORS yes > + Mode-based execute control (XS/XU) yes > TSC scaling no > VM-Exit controls > Save debug controls default > @@ -69,8 +69,8 @@ > Save IA32_EFER yes > Load IA32_EFER yes > Save VMX-preemption timer value yes > - Clear IA32_BNDCFGS no > - Conceal VM exits from PT no > + Clear IA32_BNDCFGS yes > + Conceal VM exits from PT yes > VM-Entry controls > Load debug controls default > IA-32e mode guest yes > @@ -79,11 +79,11 @@ > Load IA32_PERF_GLOBAL_CTRL yes > Load IA32_PAT yes > Load IA32_EFER yes > - Load IA32_BNDCFGS no > - Conceal VM entries from PT no > + Load IA32_BNDCFGS yes > + Conceal VM entries from PT yes > Miscellaneous data > - Hex: 0x300481e5 > - VMX-preemption timer scale (log2) 5 > + Hex: 0x7004c1e7 > + VMX-preemption timer scale (log2) 7 > Store EFER.LMA into IA-32e mode guest control yes > HLT activity state yes > Shutdown activity state yes > @@ -93,10 +93,10 @@ > MSR-load/store count recommendation 0 > IA32_SMM_MONITOR_CTL[2] can be set to 1 yes > VMWRITE to VM-exit information fields yes > - Inject event with insn length=0 no > + Inject event with insn length=0 yes > MSEG revision identifier 0 > VPID and EPT capabilities > - Hex: 0xf0106334141 > + Hex: 0xf0106734141 > Execute-only EPT translations yes > Page-walk length 4 yes > Paging-structure memory type UC yes > > And on a Xeon D-1540, I'm not seeing a crash but a kvm entry failure when > resetting L1 while running Jailhouse: > > KVM: entry failed, hardware error 0x7 > EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61 > ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 > EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 > ES =0000 00000000 0000ffff 00009300 > CS =f000 ffff0000 0000ffff 00a09b00 > SS =0000 00000000 0000ffff 00c09300 > DS =0000 00000000 0000ffff 00009300 > FS =0000 00000000 0000ffff 00009300 > GS =0000 00000000 0000ffff 00009300 > LDT=0000 00000000 0000ffff 00008200 > TR =0000 00000000 0000ffff 00008b00 > GDT= 00000000 0000ffff > IDT= 
00000000 0000ffff > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680 > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 > DR6=00000000ffff0ff0 DR7=0000000000000400 > EFER=0000000000000000 > Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00 > f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > Here is the vmxcap diff: > > --- xeon-d 2019-07-10 22:29:56.735374032 +0200 > +++ i7-8850H 2019-07-10 22:29:31.747467248 +0200 > @@ -1,6 +1,6 @@ > Basic VMX Information > - Hex: 0xda040000000012 > - Revision 18 > + Hex: 0xda040000000004 > + Revision 4 > VMCS size 1024 > VMCS restricted to 32 bit addresses no > Dual-monitor support yes > @@ -12,7 +12,7 @@ pin-based controls > NMI exiting yes > Virtual NMIs yes > Activate VMX-preemption timer yes > - Process posted interrupts yes > + Process posted interrupts no > primary processor-based controls > Interrupt window exiting yes > Use TSC offsetting yes > @@ -44,20 +44,20 @@ secondary processor-based controls > Enable VPID yes > WBINVD exiting yes > Unrestricted guest yes > - APIC register emulation yes > - Virtual interrupt delivery yes > + APIC register emulation no > + Virtual interrupt delivery no > PAUSE-loop exiting yes > RDRAND exiting yes > Enable INVPCID yes > Enable VM functions yes > VMCS shadowing yes > - Enable ENCLS exiting no > + Enable ENCLS exiting yes > RDSEED exiting yes > Enable PML yes > EPT-violation #VE yes > - Conceal non-root operation from PT no > - Enable XSAVES/XRSTORS no > - Mode-based execute control (XS/XU) no > + Conceal non-root operation from PT yes > + Enable XSAVES/XRSTORS yes > + Mode-based execute control (XS/XU) yes > TSC scaling no > VM-Exit controls > Save debug controls default > @@ -69,8 +69,8 @@ VM-Exit controls > Save IA32_EFER yes > Load IA32_EFER yes > Save VMX-preemption timer value yes > - Clear IA32_BNDCFGS no > - Conceal VM exits from PT no > + Clear IA32_BNDCFGS yes > + Conceal VM 
exits from PT yes > VM-Entry controls > Load debug controls default > IA-32e mode guest yes > @@ -79,11 +79,11 @@ VM-Entry controls > Load IA32_PERF_GLOBAL_CTRL yes > Load IA32_PAT yes > Load IA32_EFER yes > - Load IA32_BNDCFGS no > - Conceal VM entries from PT no > + Load IA32_BNDCFGS yes > + Conceal VM entries from PT yes > Miscellaneous data > - Hex: 0x300481e5 > - VMX-preemption timer scale (log2) 5 > + Hex: 0x7004c1e7 > + VMX-preemption timer scale (log2) 7 > Store EFER.LMA into IA-32e mode guest control yes > HLT activity state yes > Shutdown activity state yes > @@ -93,10 +93,10 @@ Miscellaneous data > MSR-load/store count recommendation 0 > IA32_SMM_MONITOR_CTL[2] can be set to 1 yes > VMWRITE to VM-exit information fields yes > - Inject event with insn length=0 no > + Inject event with insn length=0 yes > MSEG revision identifier 0 > VPID and EPT capabilities > - Hex: 0xf0106334141 > + Hex: 0xf0106734141 > Execute-only EPT translations yes > Page-walk length 4 yes > Paging-structure memory type UC yes > > Maybe the KVM code does not take the latest VMX features into account when > importing a userspace nested state? > > Jan > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: KVM_SET_NESTED_STATE not yet stable 2019-07-11 11:37 ` Ralf Ramsauer @ 2019-07-11 17:30 ` Paolo Bonzini 2019-07-19 16:38 ` Paolo Bonzini 0 siblings, 1 reply; 10+ messages in thread From: Paolo Bonzini @ 2019-07-11 17:30 UTC (permalink / raw) To: Ralf Ramsauer, Jan Kiszka, Raslan, KarimAllah, jmattson, liran.alon, kvm On 11/07/19 13:37, Ralf Ramsauer wrote: > I can reproduce and confirm this issue. A system_reset of qemu after > Jailhouse is enabled leads to the crash listed below, on all machines. > > On the Xeon Gold, e.g., Qemu reports: > > EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61 > ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 > EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 > ES =0000 00000000 0000ffff 00009300 > CS =f000 ffff0000 0000ffff 00a09b00 > SS =0000 00000000 0000ffff 00c09300 > DS =0000 00000000 0000ffff 00009300 > FS =0000 00000000 0000ffff 00009300 > GS =0000 00000000 0000ffff 00009300 > LDT=0000 00000000 0000ffff 00008200 > TR =0000 00000000 0000ffff 00008b00 > GDT= 00000000 0000ffff > IDT= 00000000 0000ffff > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680 > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 > DR3=0000000000000000 > DR6=00000000ffff0ff0 DR7=0000000000000400 > EFER=0000000000000000 > Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b > e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > > Kernel: > [ 1868.804515] kvm: vmptrld (null)/6b8640000000 failed > [ 1868.804568] kvm: vmclear fail: (null)/6b8640000000 > > And the host freezes unrecoverably. Hosts use standard distro kernels Thanks. I'm going to look at it tomorrow. Paolo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: KVM_SET_NESTED_STATE not yet stable 2019-07-11 17:30 ` Paolo Bonzini @ 2019-07-19 16:38 ` Paolo Bonzini 2019-07-21 9:05 ` Jan Kiszka 0 siblings, 1 reply; 10+ messages in thread From: Paolo Bonzini @ 2019-07-19 16:38 UTC (permalink / raw) To: Ralf Ramsauer, Jan Kiszka, Raslan, KarimAllah, jmattson, liran.alon, kvm On 11/07/19 19:30, Paolo Bonzini wrote: > On 11/07/19 13:37, Ralf Ramsauer wrote: >> I can reproduce and confirm this issue. A system_reset of qemu after >> Jailhouse is enabled leads to the crash listed below, on all machines. >> >> On the Xeon Gold, e.g., Qemu reports: >> >> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61 >> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 >> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 >> ES =0000 00000000 0000ffff 00009300 >> CS =f000 ffff0000 0000ffff 00a09b00 >> SS =0000 00000000 0000ffff 00c09300 >> DS =0000 00000000 0000ffff 00009300 >> FS =0000 00000000 0000ffff 00009300 >> GS =0000 00000000 0000ffff 00009300 >> LDT=0000 00000000 0000ffff 00008200 >> TR =0000 00000000 0000ffff 00008b00 >> GDT= 00000000 0000ffff >> IDT= 00000000 0000ffff >> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680 >> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 >> DR3=0000000000000000 >> DR6=00000000ffff0ff0 DR7=0000000000000400 >> EFER=0000000000000000 >> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b >> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 >> 00 00 00 00 >> >> Kernel: >> [ 1868.804515] kvm: vmptrld (null)/6b8640000000 failed >> [ 1868.804568] kvm: vmclear fail: (null)/6b8640000000 >> >> And the host freezes unrecoverably. Hosts use standard distro kernels > > Thanks. I'm going to look at it tomorrow. 
Ok, it was only tomorrow modulo 7, but the first fix I got is trivial: diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 6e88f459b323..6119b30347c6 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx) { secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS); vmcs_write64(VMCS_LINK_POINTER, -1ull); + vmx->nested.need_vmcs12_to_shadow_sync = false; } static inline void nested_release_evmcs(struct kvm_vcpu *vcpu) Can you try it and see what you get? Paolo ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: KVM_SET_NESTED_STATE not yet stable 2019-07-19 16:38 ` Paolo Bonzini @ 2019-07-21 9:05 ` Jan Kiszka 2019-07-22 15:10 ` Ralf Ramsauer 0 siblings, 1 reply; 10+ messages in thread From: Jan Kiszka @ 2019-07-21 9:05 UTC (permalink / raw) To: Paolo Bonzini, Ralf Ramsauer, Raslan, KarimAllah, jmattson, liran.alon, kvm On 19.07.19 18:38, Paolo Bonzini wrote: > On 11/07/19 19:30, Paolo Bonzini wrote: >> On 11/07/19 13:37, Ralf Ramsauer wrote: >>> I can reproduce and confirm this issue. A system_reset of qemu after >>> Jailhouse is enabled leads to the crash listed below, on all machines. >>> >>> On the Xeon Gold, e.g., Qemu reports: >>> >>> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61 >>> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 >>> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 >>> ES =0000 00000000 0000ffff 00009300 >>> CS =f000 ffff0000 0000ffff 00a09b00 >>> SS =0000 00000000 0000ffff 00c09300 >>> DS =0000 00000000 0000ffff 00009300 >>> FS =0000 00000000 0000ffff 00009300 >>> GS =0000 00000000 0000ffff 00009300 >>> LDT=0000 00000000 0000ffff 00008200 >>> TR =0000 00000000 0000ffff 00008b00 >>> GDT= 00000000 0000ffff >>> IDT= 00000000 0000ffff >>> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680 >>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 >>> DR3=0000000000000000 >>> DR6=00000000ffff0ff0 DR7=0000000000000400 >>> EFER=0000000000000000 >>> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b >>> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 >>> 00 00 00 00 >>> >>> Kernel: >>> [ 1868.804515] kvm: vmptrld (null)/6b8640000000 failed >>> [ 1868.804568] kvm: vmclear fail: (null)/6b8640000000 >>> >>> And the host freezes unrecoverably. Hosts use standard distro kernels >> >> Thanks. I'm going to look at it tomorrow. 
> > Ok, it was only tomorrow modulo 7, but the first fix I got is trivial: > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > index 6e88f459b323..6119b30347c6 100644 > --- a/arch/x86/kvm/vmx/nested.c > +++ b/arch/x86/kvm/vmx/nested.c > @@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx) > { > secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS); > vmcs_write64(VMCS_LINK_POINTER, -1ull); > + vmx->nested.need_vmcs12_to_shadow_sync = false; > } > > static inline void nested_release_evmcs(struct kvm_vcpu *vcpu) > > Can you try it and see what you get? > Confirmed that this fixes the host crashes for me as well. Now I'm only still seeing guest corruptions on vmport/vmmouse accesses from L2. Looking into that right now. Jan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: KVM_SET_NESTED_STATE not yet stable 2019-07-21 9:05 ` Jan Kiszka @ 2019-07-22 15:10 ` Ralf Ramsauer 0 siblings, 0 replies; 10+ messages in thread From: Ralf Ramsauer @ 2019-07-22 15:10 UTC (permalink / raw) To: Jan Kiszka, Paolo Bonzini, Raslan, KarimAllah, jmattson, liran.alon, kvm On 7/21/19 11:05 AM, Jan Kiszka wrote: > On 19.07.19 18:38, Paolo Bonzini wrote: >> On 11/07/19 19:30, Paolo Bonzini wrote: >>> On 11/07/19 13:37, Ralf Ramsauer wrote: >>>> I can reproduce and confirm this issue. A system_reset of qemu after >>>> Jailhouse is enabled leads to the crash listed below, on all machines. >>>> >>>> On the Xeon Gold, e.g., Qemu reports: >>>> >>>> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61 >>>> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 >>>> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 >>>> ES =0000 00000000 0000ffff 00009300 >>>> CS =f000 ffff0000 0000ffff 00a09b00 >>>> SS =0000 00000000 0000ffff 00c09300 >>>> DS =0000 00000000 0000ffff 00009300 >>>> FS =0000 00000000 0000ffff 00009300 >>>> GS =0000 00000000 0000ffff 00009300 >>>> LDT=0000 00000000 0000ffff 00008200 >>>> TR =0000 00000000 0000ffff 00008b00 >>>> GDT= 00000000 0000ffff >>>> IDT= 00000000 0000ffff >>>> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680 >>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 >>>> DR3=0000000000000000 >>>> DR6=00000000ffff0ff0 DR7=0000000000000400 >>>> EFER=0000000000000000 >>>> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b >>>> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 >>>> 00 00 00 00 >>>> >>>> Kernel: >>>> [ 1868.804515] kvm: vmptrld (null)/6b8640000000 failed >>>> [ 1868.804568] kvm: vmclear fail: (null)/6b8640000000 >>>> >>>> And the host freezes unrecoverably. Hosts use standard distro kernels >>> >>> Thanks. I'm going to look at it tomorrow. 
>> >> Ok, it was only tomorrow modulo 7, but the first fix I got is trivial: >> >> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c >> index 6e88f459b323..6119b30347c6 100644 >> --- a/arch/x86/kvm/vmx/nested.c >> +++ b/arch/x86/kvm/vmx/nested.c >> @@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx) >> { >> secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS); >> vmcs_write64(VMCS_LINK_POINTER, -1ull); >> + vmx->nested.need_vmcs12_to_shadow_sync = false; >> } >> >> static inline void nested_release_evmcs(struct kvm_vcpu *vcpu) >> >> Can you try it and see what you get? >> > > Confirmed that this fixes the host crashes for me as well. Works, thanks. Tested on a v5.3-rc1. There, the proper patch is already applied. No more crashes, qemu resets as expected. Let's wait for the backport… Ralf > > Now I'm only still seeing guest corruptions on vmport/vmmouse accesses from L2. > Looking into that right now. > > Jan > ^ permalink raw reply [flat|nested] 10+ messages in thread
Thread overview: 10+ messages
2019-07-08 20:39 KVM_SET_NESTED_STATE not yet stable Jan Kiszka
2019-07-10 15:24 ` Raslan, KarimAllah
2019-07-10 16:05 ` Jan Kiszka
2019-07-10 20:31 ` Jan Kiszka
2019-07-10 21:14 ` Jan Kiszka
2019-07-11 11:37 ` Ralf Ramsauer
2019-07-11 17:30 ` Paolo Bonzini
2019-07-19 16:38 ` Paolo Bonzini
2019-07-21 9:05 ` Jan Kiszka
2019-07-22 15:10 ` Ralf Ramsauer