* KVM_SET_NESTED_STATE not yet stable
@ 2019-07-08 20:39 Jan Kiszka
2019-07-10 15:24 ` Raslan, KarimAllah
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2019-07-08 20:39 UTC (permalink / raw)
To: Paolo Bonzini, Jim Mattson, Liran Alon, KarimAllah Ahmed, kvm
Hi all,
it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
robustness issues. The most urgent one: With the help of latest QEMU
master that uses this interface, you can easily crash the host. You just
need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
The host CPU that ran this will stall, and the system will freeze soon.
I've also seen a pattern with my Jailhouse test VM where I seem to get
stuck in a loop between L1 and L2:
qemu-system-x86-6660 [007] 398.691401: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
qemu-system-x86-6660 [007] 398.691402: kvm_fpu: unload
qemu-system-x86-6660 [007] 398.691403: kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-system-x86-6660 [007] 398.691440: kvm_fpu: load
qemu-system-x86-6660 [007] 398.691441: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
qemu-system-x86-6660 [007] 398.691443: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
qemu-system-x86-6660 [007] 398.691444: kvm_entry: vcpu 3
qemu-system-x86-6660 [007] 398.691475: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
qemu-system-x86-6660 [007] 398.691476: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
qemu-system-x86-6660 [007] 398.691477: kvm_fpu: unload
qemu-system-x86-6660 [007] 398.691478: kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-system-x86-6660 [007] 398.691526: kvm_fpu: load
qemu-system-x86-6660 [007] 398.691527: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
qemu-system-x86-6660 [007] 398.691529: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
qemu-system-x86-6660 [007] 398.691530: kvm_entry: vcpu 3
qemu-system-x86-6660 [007] 398.691533: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
qemu-system-x86-6660 [007] 398.691534: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
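[Editorial note: the info1 field in these kvm_nested_vmexit lines is the VMX exit qualification. For an I/O-instruction exit it can be decoded as in the sketch below (bit layout per the Intel SDM; 0x5658 is the VMware backdoor port used by the vmmouse protocol):]

```python
# Decode the exit qualification of a VMX I/O-instruction VM exit.
# Bit layout per the Intel SDM:
#   bits 2:0   - size of access minus 1 (0 = 1 byte, 1 = 2 bytes, 3 = 4 bytes)
#   bit  3     - direction (0 = OUT, 1 = IN)
#   bit  4     - string instruction (INS/OUTS)
#   bit  5     - REP prefixed
#   bits 31:16 - port number
def decode_io_exit_qualification(qual):
    return {
        "size": (qual & 0x7) + 1,
        "direction": "in" if qual & 0x8 else "out",
        "string": bool(qual & 0x10),
        "rep": bool(qual & 0x20),
        "port": (qual >> 16) & 0xFFFF,
    }

# info1 from the trace above: a 4-byte IN from port 0x5658,
# matching the "kvm_pio: pio_read at 0x5658 size 4" lines.
print(decode_io_exit_qualification(0x5658000B))
```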
These issues disappear when going from ebbfef2f back to 6cfd7639 (both
with build fixes) in QEMU. Host kernels tested: 5.1.16 (distro) and 5.2
(vanilla).
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: KVM_SET_NESTED_STATE not yet stable
2019-07-08 20:39 KVM_SET_NESTED_STATE not yet stable Jan Kiszka
@ 2019-07-10 15:24 ` Raslan, KarimAllah
2019-07-10 16:05 ` Jan Kiszka
0 siblings, 1 reply; 10+ messages in thread
From: Raslan, KarimAllah @ 2019-07-10 15:24 UTC (permalink / raw)
To: jmattson, liran.alon, kvm, pbonzini, jan.kiszka
On Mon, 2019-07-08 at 22:39 +0200, Jan Kiszka wrote:
> Hi all,
>
> it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
> robustness issues.
I would be very interested to learn about any more robustness issues that you
are seeing.
> The most urgent one: With the help of latest QEMU
> master that uses this interface, you can easily crash the host. You just
> need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
> The host CPU that ran this will stall, and the system will freeze soon.
Just to confirm, you start an L2 guest using qemu inside an L1-guest and then
hard-reset the L1 guest?
Are you running any special workload in L2 or L1 when you reset? Also how
exactly are you doing this "hard reset"?
(sorry just tried this in my setup and I did not see any problem but my setup
is slightly different, so just ruling out obvious stuff).
>
> I've also seen a pattern with my Jailhouse test VM where I seem to get
> stuck in a loop between L1 and L2:
>
> qemu-system-x86-6660 [007] 398.691401: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
> qemu-system-x86-6660 [007] 398.691402: kvm_fpu: unload
> qemu-system-x86-6660 [007] 398.691403: kvm_userspace_exit: reason KVM_EXIT_IO (2)
> qemu-system-x86-6660 [007] 398.691440: kvm_fpu: load
> qemu-system-x86-6660 [007] 398.691441: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
> qemu-system-x86-6660 [007] 398.691443: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
> qemu-system-x86-6660 [007] 398.691444: kvm_entry: vcpu 3
> qemu-system-x86-6660 [007] 398.691475: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
> qemu-system-x86-6660 [007] 398.691476: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
> qemu-system-x86-6660 [007] 398.691477: kvm_fpu: unload
> qemu-system-x86-6660 [007] 398.691478: kvm_userspace_exit: reason KVM_EXIT_IO (2)
> qemu-system-x86-6660 [007] 398.691526: kvm_fpu: load
> qemu-system-x86-6660 [007] 398.691527: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
> qemu-system-x86-6660 [007] 398.691529: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
> qemu-system-x86-6660 [007] 398.691530: kvm_entry: vcpu 3
> qemu-system-x86-6660 [007] 398.691533: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
> qemu-system-x86-6660 [007] 398.691534: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>
> These issues disappear when going from ebbfef2f back to 6cfd7639 (both
> with build fixes) in QEMU.
This is the QEMU that you are using in L0 to launch an L1 guest, right? or are
you still referring to the QEMU mentioned above?
> Host kernels tested: 5.1.16 (distro) and 5.2 (vanilla).
> Jan
>
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
* Re: KVM_SET_NESTED_STATE not yet stable
2019-07-10 15:24 ` Raslan, KarimAllah
@ 2019-07-10 16:05 ` Jan Kiszka
2019-07-10 20:31 ` Jan Kiszka
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2019-07-10 16:05 UTC (permalink / raw)
To: Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini
Hi KarimAllah,
On 10.07.19 17:24, Raslan, KarimAllah wrote:
> On Mon, 2019-07-08 at 22:39 +0200, Jan Kiszka wrote:
>> Hi all,
>>
>> it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
>> robustness issues.
>
> I would be very interested to learn about any more robustness issues that you
> are seeing.
>
>> The most urgent one: With the help of latest QEMU
>> master that uses this interface, you can easily crash the host. You just
>> need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
>> The host CPU that ran this will stall, and the system will freeze soon.
>
> Just to confirm, you start an L2 guest using qemu inside an L1-guest and then
> hard-reset the L1 guest?
Exactly.
>
> Are you running any special workload in L2 or L1 when you reset? Also how
Nope. It is a standard (though rather oldish) userland in L1, just running a
more recent kernel 5.2.
> exactly are you doing this "hard reset"?
system_reset from the monitor or "reset" from QEMU window menu.
>
> (sorry just tried this in my setup and I did not see any problem but my setup
> is slightly different, so just ruling out obvious stuff).
>
If it helps, I can share privately a guest image that was built via
https://github.com/siemens/jailhouse-images which exposes the reset issue after
starting Jailhouse (instead of qemu-system-x86_64 - though that should "work" as
well, just not tested yet). It's about 70M packed.
Host-wise, 5.2.0 + QEMU master should do. I can also provide you the .config if
needed.
>>
>> I've also seen a pattern with my Jailhouse test VM where I seem to get
>> stuck in a loop between L1 and L2:
>>
>> qemu-system-x86-6660 [007] 398.691401: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>> qemu-system-x86-6660 [007] 398.691402: kvm_fpu: unload
>> qemu-system-x86-6660 [007] 398.691403: kvm_userspace_exit: reason KVM_EXIT_IO (2)
>> qemu-system-x86-6660 [007] 398.691440: kvm_fpu: load
>> qemu-system-x86-6660 [007] 398.691441: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
>> qemu-system-x86-6660 [007] 398.691443: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>> qemu-system-x86-6660 [007] 398.691444: kvm_entry: vcpu 3
>> qemu-system-x86-6660 [007] 398.691475: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>> qemu-system-x86-6660 [007] 398.691476: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>> qemu-system-x86-6660 [007] 398.691477: kvm_fpu: unload
>> qemu-system-x86-6660 [007] 398.691478: kvm_userspace_exit: reason KVM_EXIT_IO (2)
>> qemu-system-x86-6660 [007] 398.691526: kvm_fpu: load
>> qemu-system-x86-6660 [007] 398.691527: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
>> qemu-system-x86-6660 [007] 398.691529: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>> qemu-system-x86-6660 [007] 398.691530: kvm_entry: vcpu 3
>> qemu-system-x86-6660 [007] 398.691533: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>> qemu-system-x86-6660 [007] 398.691534: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>
>> These issues disappear when going from ebbfef2f back to 6cfd7639 (both
>> with build fixes) in QEMU.
>
> This is the QEMU that you are using in L0 to launch an L1 guest, right? or are
> you still referring to the QEMU mentioned above?
This scenario is similar but still a bit different than the above. Yes, same L0
image and host QEMU here (and the traces were taken on the host, obviously), but
the workload is now as follows:
- boot L1 Linux
- enable Jailhouse inside L1
- move the mouse over the graphical desktop of L2, ie. the former L1
Linux (Jailhouse is now L1)
- the L1/L2 guests enter the loop above while trying to read from the
vmmouse port
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
* Re: KVM_SET_NESTED_STATE not yet stable
2019-07-10 16:05 ` Jan Kiszka
@ 2019-07-10 20:31 ` Jan Kiszka
2019-07-10 21:14 ` Jan Kiszka
2019-07-11 11:37 ` Ralf Ramsauer
0 siblings, 2 replies; 10+ messages in thread
From: Jan Kiszka @ 2019-07-10 20:31 UTC (permalink / raw)
To: Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini; +Cc: Ralf Ramsauer
On 10.07.19 18:05, Jan Kiszka wrote:
> Hi KarimAllah,
>
> On 10.07.19 17:24, Raslan, KarimAllah wrote:
>> On Mon, 2019-07-08 at 22:39 +0200, Jan Kiszka wrote:
>>> Hi all,
>>>
>>> it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
>>> robustness issues.
>>
>> I would be very interested to learn about any more robustness issues that you
>> are seeing.
>>
>>> The most urgent one: With the help of latest QEMU
>>> master that uses this interface, you can easily crash the host. You just
>>> need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
>>> The host CPU that ran this will stall, and the system will freeze soon.
>>
>> Just to confirm, you start an L2 guest using qemu inside an L1-guest and then
>> hard-reset the L1 guest?
>
> Exactly.
>
>>
>> Are you running any special workload in L2 or L1 when you reset? Also how
>
> Nope. It is a standard (though rather oldish) userland in L1, just running a
> more recent kernel 5.2.
>
>> exactly are you doing this "hard reset"?
>
> system_reset from the monitor or "reset" from QEMU window menu.
>
>>
>> (sorry just tried this in my setup and I did not see any problem but my setup
>> is slightly different, so just ruling out obvious stuff).
>>
>
> If it helps, I can share privately a guest image that was built via
> https://github.com/siemens/jailhouse-images which exposes the reset issue after
> starting Jailhouse (instead of qemu-system-x86_64 - though that should "work" as
> well, just not tested yet). It's about 70M packed.
>
> Host-wise, 5.2.0 + QEMU master should do. I can also provide you the .config if
> needed.
>
>>>
>>> I've also seen a pattern with my Jailhouse test VM where I seem to get
>>> stuck in a loop between L1 and L2:
>>>
>>> qemu-system-x86-6660 [007] 398.691401: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>> qemu-system-x86-6660 [007] 398.691402: kvm_fpu: unload
>>> qemu-system-x86-6660 [007] 398.691403: kvm_userspace_exit: reason KVM_EXIT_IO (2)
>>> qemu-system-x86-6660 [007] 398.691440: kvm_fpu: load
>>> qemu-system-x86-6660 [007] 398.691441: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
>>> qemu-system-x86-6660 [007] 398.691443: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>>> qemu-system-x86-6660 [007] 398.691444: kvm_entry: vcpu 3
>>> qemu-system-x86-6660 [007] 398.691475: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>>> qemu-system-x86-6660 [007] 398.691476: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>> qemu-system-x86-6660 [007] 398.691477: kvm_fpu: unload
>>> qemu-system-x86-6660 [007] 398.691478: kvm_userspace_exit: reason KVM_EXIT_IO (2)
>>> qemu-system-x86-6660 [007] 398.691526: kvm_fpu: load
>>> qemu-system-x86-6660 [007] 398.691527: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
>>> qemu-system-x86-6660 [007] 398.691529: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>>> qemu-system-x86-6660 [007] 398.691530: kvm_entry: vcpu 3
>>> qemu-system-x86-6660 [007] 398.691533: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>>> qemu-system-x86-6660 [007] 398.691534: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>>
>>> These issues disappear when going from ebbfef2f back to 6cfd7639 (both
>>> with build fixes) in QEMU.
>>
>> This is the QEMU that you are using in L0 to launch an L1 guest, right? or are
>> you still referring to the QEMU mentioned above?
>
> This scenario is similar but still a bit different than the above. Yes, same L0
> image and host QEMU here (and the traces were taken on the host, obviously), but
> the workload is now as follows:
>
> - boot L1 Linux
> - enable Jailhouse inside L1
> - move the mouse over the graphical desktop of L2, ie. the former L1
> Linux (Jailhouse is now L1)
> - the L1/L2 guests enter the loop above while trying to read from the
> vmmouse port
>
> Jan
>
Ralf tried my case on some of his systems as well but he also didn't succeed in
reproducing. So we compared vmxcap lists because I'm starting to think it's
feature-related. There are some differences...
--- vmxcap.i7-5600u 2019-07-10 21:59:05.616547924 +0200
+++ vmxcap.jan 2019-07-10 21:58:23.135686409 +0200
@@ -1,6 +1,6 @@
Basic VMX Information
- Hex: 0xda040000000012
- Revision 18
+ Hex: 0xda040000000004
+ Revision 4
VMCS size 1024
VMCS restricted to 32 bit addresses no
Dual-monitor support yes
@@ -51,13 +51,13 @@
Enable INVPCID yes
Enable VM functions yes
VMCS shadowing yes
- Enable ENCLS exiting no
+ Enable ENCLS exiting yes
RDSEED exiting yes
- Enable PML no
+ Enable PML yes
EPT-violation #VE yes
- Conceal non-root operation from PT no
- Enable XSAVES/XRSTORS no
- Mode-based execute control (XS/XU) no
+ Conceal non-root operation from PT yes
+ Enable XSAVES/XRSTORS yes
+ Mode-based execute control (XS/XU) yes
TSC scaling no
VM-Exit controls
Save debug controls default
@@ -69,8 +69,8 @@
Save IA32_EFER yes
Load IA32_EFER yes
Save VMX-preemption timer value yes
- Clear IA32_BNDCFGS no
- Conceal VM exits from PT no
+ Clear IA32_BNDCFGS yes
+ Conceal VM exits from PT yes
VM-Entry controls
Load debug controls default
IA-32e mode guest yes
@@ -79,11 +79,11 @@
Load IA32_PERF_GLOBAL_CTRL yes
Load IA32_PAT yes
Load IA32_EFER yes
- Load IA32_BNDCFGS no
- Conceal VM entries from PT no
+ Load IA32_BNDCFGS yes
+ Conceal VM entries from PT yes
Miscellaneous data
- Hex: 0x300481e5
- VMX-preemption timer scale (log2) 5
+ Hex: 0x7004c1e7
+ VMX-preemption timer scale (log2) 7
Store EFER.LMA into IA-32e mode guest control yes
HLT activity state yes
Shutdown activity state yes
@@ -93,10 +93,10 @@
MSR-load/store count recommendation 0
IA32_SMM_MONITOR_CTL[2] can be set to 1 yes
VMWRITE to VM-exit information fields yes
- Inject event with insn length=0 no
+ Inject event with insn length=0 yes
MSEG revision identifier 0
VPID and EPT capabilities
- Hex: 0xf0106334141
+ Hex: 0xf0106734141
Execute-only EPT translations yes
Page-walk length 4 yes
Paging-structure memory type UC yes
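[Editorial note: the "Hex" values under "Basic VMX Information" are the raw IA32_VMX_BASIC MSR (0x480); the fields vmxcap derives from it can be extracted as in this sketch (bit layout per the Intel SDM):]

```python
# Decode IA32_VMX_BASIC (MSR 0x480), the "Basic VMX Information" value
# shown by vmxcap. Bit layout per the Intel SDM:
#   bits 30:0  - VMCS revision identifier
#   bits 44:32 - VMCS region size in bytes
#   bit  48    - VMCS addresses restricted to 32 bits
def decode_vmx_basic(msr):
    return {
        "revision": msr & 0x7FFFFFFF,
        "vmcs_size": (msr >> 32) & 0x1FFF,
        "addr_32bit_only": bool((msr >> 48) & 1),
    }

# The two hosts compared in the diff above: same VMCS size (1024),
# different revision identifiers (18 vs 4).
print(decode_vmx_basic(0xDA040000000012))
print(decode_vmx_basic(0xDA040000000004))
```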
And another machine that does not crash:
--- vmxcaps.e5-2683v4 2019-07-10 22:21:28.620329384 +0200
+++ vmxcap.jan 2019-07-10 21:58:23.135686409 +0200
@@ -1,6 +1,6 @@
Basic VMX Information
- Hex: 0xda040000000012
- Revision 18
+ Hex: 0xda040000000004
+ Revision 4
VMCS size 1024
VMCS restricted to 32 bit addresses no
Dual-monitor support yes
@@ -12,7 +12,7 @@
NMI exiting yes
Virtual NMIs yes
Activate VMX-preemption timer yes
- Process posted interrupts yes
+ Process posted interrupts no
primary processor-based controls
Interrupt window exiting yes
Use TSC offsetting yes
@@ -44,20 +44,20 @@
Enable VPID yes
WBINVD exiting yes
Unrestricted guest yes
- APIC register emulation yes
- Virtual interrupt delivery yes
+ APIC register emulation no
+ Virtual interrupt delivery no
PAUSE-loop exiting yes
RDRAND exiting yes
Enable INVPCID yes
Enable VM functions yes
VMCS shadowing yes
- Enable ENCLS exiting no
+ Enable ENCLS exiting yes
RDSEED exiting yes
Enable PML yes
EPT-violation #VE yes
- Conceal non-root operation from PT no
- Enable XSAVES/XRSTORS no
- Mode-based execute control (XS/XU) no
+ Conceal non-root operation from PT yes
+ Enable XSAVES/XRSTORS yes
+ Mode-based execute control (XS/XU) yes
TSC scaling no
VM-Exit controls
Save debug controls default
@@ -69,8 +69,8 @@
Save IA32_EFER yes
Load IA32_EFER yes
Save VMX-preemption timer value yes
- Clear IA32_BNDCFGS no
- Conceal VM exits from PT no
+ Clear IA32_BNDCFGS yes
+ Conceal VM exits from PT yes
VM-Entry controls
Load debug controls default
IA-32e mode guest yes
@@ -79,11 +79,11 @@
Load IA32_PERF_GLOBAL_CTRL yes
Load IA32_PAT yes
Load IA32_EFER yes
- Load IA32_BNDCFGS no
- Conceal VM entries from PT no
+ Load IA32_BNDCFGS yes
+ Conceal VM entries from PT yes
Miscellaneous data
- Hex: 0x300481e5
- VMX-preemption timer scale (log2) 5
+ Hex: 0x7004c1e7
+ VMX-preemption timer scale (log2) 7
Store EFER.LMA into IA-32e mode guest control yes
HLT activity state yes
Shutdown activity state yes
@@ -93,10 +93,10 @@
MSR-load/store count recommendation 0
IA32_SMM_MONITOR_CTL[2] can be set to 1 yes
VMWRITE to VM-exit information fields yes
- Inject event with insn length=0 no
+ Inject event with insn length=0 yes
MSEG revision identifier 0
VPID and EPT capabilities
- Hex: 0xf0106334141
+ Hex: 0xf0106734141
Execute-only EPT translations yes
Page-walk length 4 yes
Paging-structure memory type UC yes
And on a Xeon D-1540, I'm not seeing a crash but a kvm entry failure when
resetting L1 while running Jailhouse:
KVM: entry failed, hardware error 0x7
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00a09b00
SS =0000 00000000 0000ffff 00c09300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00
f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
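[Editorial note: the "hardware error 0x7" is the VM-instruction error number KVM reads back from the VMCS after the failed entry; per the Intel SDM's VM-instruction error table, 7 means "VM entry with invalid control field(s)", which fits the theory that an imported control field is inconsistent on this host. A small lookup sketch, covering only a subset of the table:]

```python
# Subset of the Intel SDM VM-instruction error numbers relevant to
# failed VM entries ("KVM: entry failed, hardware error 0x7").
VM_INSTRUCTION_ERRORS = {
    7: "VM entry with invalid control field(s)",
    8: "VM entry with invalid host-state field(s)",
    26: "VM entry with events blocked by MOV SS",
}

def entry_error_name(code):
    # Fall back to the raw number for codes not in this subset.
    return VM_INSTRUCTION_ERRORS.get(code, "unknown ({})".format(code))

print(entry_error_name(7))  # prints "VM entry with invalid control field(s)"
```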
Here is the vmxcap diff:
--- xeon-d 2019-07-10 22:29:56.735374032 +0200
+++ i7-8850H 2019-07-10 22:29:31.747467248 +0200
@@ -1,6 +1,6 @@
Basic VMX Information
- Hex: 0xda040000000012
- Revision 18
+ Hex: 0xda040000000004
+ Revision 4
VMCS size 1024
VMCS restricted to 32 bit addresses no
Dual-monitor support yes
@@ -12,7 +12,7 @@ pin-based controls
NMI exiting yes
Virtual NMIs yes
Activate VMX-preemption timer yes
- Process posted interrupts yes
+ Process posted interrupts no
primary processor-based controls
Interrupt window exiting yes
Use TSC offsetting yes
@@ -44,20 +44,20 @@ secondary processor-based controls
Enable VPID yes
WBINVD exiting yes
Unrestricted guest yes
- APIC register emulation yes
- Virtual interrupt delivery yes
+ APIC register emulation no
+ Virtual interrupt delivery no
PAUSE-loop exiting yes
RDRAND exiting yes
Enable INVPCID yes
Enable VM functions yes
VMCS shadowing yes
- Enable ENCLS exiting no
+ Enable ENCLS exiting yes
RDSEED exiting yes
Enable PML yes
EPT-violation #VE yes
- Conceal non-root operation from PT no
- Enable XSAVES/XRSTORS no
- Mode-based execute control (XS/XU) no
+ Conceal non-root operation from PT yes
+ Enable XSAVES/XRSTORS yes
+ Mode-based execute control (XS/XU) yes
TSC scaling no
VM-Exit controls
Save debug controls default
@@ -69,8 +69,8 @@ VM-Exit controls
Save IA32_EFER yes
Load IA32_EFER yes
Save VMX-preemption timer value yes
- Clear IA32_BNDCFGS no
- Conceal VM exits from PT no
+ Clear IA32_BNDCFGS yes
+ Conceal VM exits from PT yes
VM-Entry controls
Load debug controls default
IA-32e mode guest yes
@@ -79,11 +79,11 @@ VM-Entry controls
Load IA32_PERF_GLOBAL_CTRL yes
Load IA32_PAT yes
Load IA32_EFER yes
- Load IA32_BNDCFGS no
- Conceal VM entries from PT no
+ Load IA32_BNDCFGS yes
+ Conceal VM entries from PT yes
Miscellaneous data
- Hex: 0x300481e5
- VMX-preemption timer scale (log2) 5
+ Hex: 0x7004c1e7
+ VMX-preemption timer scale (log2) 7
Store EFER.LMA into IA-32e mode guest control yes
HLT activity state yes
Shutdown activity state yes
@@ -93,10 +93,10 @@ Miscellaneous data
MSR-load/store count recommendation 0
IA32_SMM_MONITOR_CTL[2] can be set to 1 yes
VMWRITE to VM-exit information fields yes
- Inject event with insn length=0 no
+ Inject event with insn length=0 yes
MSEG revision identifier 0
VPID and EPT capabilities
- Hex: 0xf0106334141
+ Hex: 0xf0106734141
Execute-only EPT translations yes
Page-walk length 4 yes
Paging-structure memory type UC yes
Maybe the KVM code does not take the latest VMX features into account when
importing a userspace nested state?
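[Editorial note: the consistency check hinted at here can be sketched as follows. Each VMX control word has allowed-0/allowed-1 masks derived from the host's capability MSRs, and an imported control value is only acceptable if it respects both masks; a control word saved on a host with more features can then fail validation on a host whose allowed-1 mask lacks those bits. This is an illustrative sketch, not KVM's actual code, and the sample mask/control values are made up:]

```python
# Illustrative check of one VMX control word against a capability MSR
# (high 32 bits = allowed-1 settings, low 32 bits = allowed-0 settings).
# SDM rule: a control bit may be 0 only if its allowed-0 bit is 0, and
# may be 1 only if its allowed-1 bit is 1.
# NOTE: not KVM's implementation; the sample values below are invented.
def control_valid(control, cap_msr):
    allowed0 = cap_msr & 0xFFFFFFFF          # bits set here MUST be 1
    allowed1 = (cap_msr >> 32) & 0xFFFFFFFF  # only bits set here MAY be 1
    must_be_one_ok = (control & allowed0) == allowed0
    may_be_one_ok = (control & ~allowed1) == 0
    return must_be_one_ok and may_be_one_ok

# Hypothetical "smaller" host: allowed-1 = 0xFF, allowed-0 = 0x0F.
cap_small_host = (0xFF << 32) | 0x0F
print(control_valid(0x2F, cap_small_host))   # within both masks
print(control_valid(0x12F, cap_small_host))  # bit 8 not allowed-1
```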
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
* Re: KVM_SET_NESTED_STATE not yet stable
2019-07-10 20:31 ` Jan Kiszka
@ 2019-07-10 21:14 ` Jan Kiszka
2019-07-11 11:37 ` Ralf Ramsauer
1 sibling, 0 replies; 10+ messages in thread
From: Jan Kiszka @ 2019-07-10 21:14 UTC (permalink / raw)
To: Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini; +Cc: Ralf Ramsauer
On 10.07.19 22:31, Jan Kiszka wrote:
> And on a Xeon D-1540, I'm not seeing a crash but a kvm entry failure when
> resetting L1 while running Jailhouse:
>
> KVM: entry failed, hardware error 0x7
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00a09b00
> SS =0000 00000000 0000ffff 00c09300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT= 00000000 0000ffff
> IDT= 00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00
> f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
OK, looks like the feature diff was a red herring: Ralf found a server with even
more features and without a crash, and I found familiar error messages in the
kernel log of that Xeon D:
kvm: vmptrld (null)/778000000000 failed
kvm: vmclear fail: (null)/778000000000
Only difference: no crash, just the more graceful entry failure.
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
* Re: KVM_SET_NESTED_STATE not yet stable
2019-07-10 20:31 ` Jan Kiszka
2019-07-10 21:14 ` Jan Kiszka
@ 2019-07-11 11:37 ` Ralf Ramsauer
2019-07-11 17:30 ` Paolo Bonzini
1 sibling, 1 reply; 10+ messages in thread
From: Ralf Ramsauer @ 2019-07-11 11:37 UTC (permalink / raw)
To: Jan Kiszka, Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini
Hi all,
On 7/10/19 10:31 PM, Jan Kiszka wrote:
> On 10.07.19 18:05, Jan Kiszka wrote:
>> Hi KarimAllah,
>>
>> On 10.07.19 17:24, Raslan, KarimAllah wrote:
>>> On Mon, 2019-07-08 at 22:39 +0200, Jan Kiszka wrote:
>>>> Hi all,
>>>>
>>>> it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
>>>> robustness issues.
>>>
>>> I would be very interested to learn about any more robustness issues that you
>>> are seeing.
>>>
>>>> The most urgent one: With the help of latest QEMU
>>>> master that uses this interface, you can easily crash the host. You just
>>>> need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
>>>> The host CPU that ran this will stall, and the system will freeze soon.
>>>
>>> Just to confirm, you start an L2 guest using qemu inside an L1-guest and then
>>> hard-reset the L1 guest?
>>
>> Exactly.
>>
>>>
>>> Are you running any special workload in L2 or L1 when you reset? Also how
>>
>> Nope. It is a standard (though rather oldish) userland in L1, just running a
>> more recent kernel 5.2.
>>
>>> exactly are you doing this "hard reset"?
>>
>> system_reset from the monitor or "reset" from QEMU window menu.
While I'm not able to reproduce this behaviour on any of my machines
(i7-4810MQ, i7-5600U, Xeon Gold 5118),
>>
>>>
>>> (sorry just tried this in my setup and I did not see any problem but my setup
>>> is slightly different, so just ruling out obvious stuff).
>>>
>>
>> If it helps, I can share privately a guest image that was built via
>> https://github.com/siemens/jailhouse-images which exposes the reset issue after
>> starting Jailhouse (instead of qemu-system-x86_64 - though that should "work" as
>> well, just not tested yet). It's about 70M packed.
>>
>> Host-wise, 5.2.0 + QEMU master should do. I can also provide you the .config if
>> needed.
I can reproduce and confirm this issue. A system_reset of qemu after
Jailhouse is enabled leads to the crash listed below, on all machines.
On the Xeon Gold, e.g., QEMU reports:
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00a09b00
SS =0000 00000000 0000ffff 00c09300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b
e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
Kernel:
[ 1868.804515] kvm: vmptrld (null)/6b8640000000 failed
[ 1868.804568] kvm: vmclear fail: (null)/6b8640000000
And the host freezes unrecoverably. Hosts use standard distro kernels
>= v5.0.
Ralf
>>
>>>>
>>>> I've also seen a pattern with my Jailhouse test VM where I seem to get
>>>> stuck in a loop between L1 and L2:
>>>>
>>>> qemu-system-x86-6660 [007] 398.691401: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>>> qemu-system-x86-6660 [007] 398.691402: kvm_fpu: unload
>>>> qemu-system-x86-6660 [007] 398.691403: kvm_userspace_exit: reason KVM_EXIT_IO (2)
>>>> qemu-system-x86-6660 [007] 398.691440: kvm_fpu: load
>>>> qemu-system-x86-6660 [007] 398.691441: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
>>>> qemu-system-x86-6660 [007] 398.691443: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>>>> qemu-system-x86-6660 [007] 398.691444: kvm_entry: vcpu 3
>>>> qemu-system-x86-6660 [007] 398.691475: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>>>> qemu-system-x86-6660 [007] 398.691476: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>>> qemu-system-x86-6660 [007] 398.691477: kvm_fpu: unload
>>>> qemu-system-x86-6660 [007] 398.691478: kvm_userspace_exit: reason KVM_EXIT_IO (2)
>>>> qemu-system-x86-6660 [007] 398.691526: kvm_fpu: load
>>>> qemu-system-x86-6660 [007] 398.691527: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x4
>>>> qemu-system-x86-6660 [007] 398.691529: kvm_mmu_get_page: existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>>>> qemu-system-x86-6660 [007] 398.691530: kvm_entry: vcpu 3
>>>> qemu-system-x86-6660 [007] 398.691533: kvm_exit: reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>>>> qemu-system-x86-6660 [007] 398.691534: kvm_nested_vmexit: rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>>>
>>>> These issues disappear when going from ebbfef2f back to 6cfd7639 (both
>>>> with build fixes) in QEMU.
>>>
>>> This is the QEMU that you are using in L0 to launch an L1 guest, right? or are
>>> you still referring to the QEMU mentioned above?
>>
>> This scenario is similar but still a bit different than the above. Yes, same L0
>> image and host QEMU here (and the traces were taken on the host, obviously), but
>> the workload is now as follows:
>>
>> - boot L1 Linux
>> - enable Jailhouse inside L1
>> - move the mouse over the graphical desktop of L2, ie. the former L1
>> Linux (Jailhouse is now L1)
>> - the L1/L2 guests enter the loop above while trying to read from the
>> vmmouse port
>>
>> Jan
>>
>
> Ralf tried my case on some of his systems as well but he also didn't succeed in
> reproducing. So we compared vmxcap lists because I'm starting to think it's
> feature-related. There are some differences...
>
> --- vmxcap.i7-5600u 2019-07-10 21:59:05.616547924 +0200
> +++ vmxcap.jan 2019-07-10 21:58:23.135686409 +0200
> @@ -1,6 +1,6 @@
> Basic VMX Information
> - Hex: 0xda040000000012
> - Revision 18
> + Hex: 0xda040000000004
> + Revision 4
> VMCS size 1024
> VMCS restricted to 32 bit addresses no
> Dual-monitor support yes
> @@ -51,13 +51,13 @@
> Enable INVPCID yes
> Enable VM functions yes
> VMCS shadowing yes
> - Enable ENCLS exiting no
> + Enable ENCLS exiting yes
> RDSEED exiting yes
> - Enable PML no
> + Enable PML yes
> EPT-violation #VE yes
> - Conceal non-root operation from PT no
> - Enable XSAVES/XRSTORS no
> - Mode-based execute control (XS/XU) no
> + Conceal non-root operation from PT yes
> + Enable XSAVES/XRSTORS yes
> + Mode-based execute control (XS/XU) yes
> TSC scaling no
> VM-Exit controls
> Save debug controls default
> @@ -69,8 +69,8 @@
> Save IA32_EFER yes
> Load IA32_EFER yes
> Save VMX-preemption timer value yes
> - Clear IA32_BNDCFGS no
> - Conceal VM exits from PT no
> + Clear IA32_BNDCFGS yes
> + Conceal VM exits from PT yes
> VM-Entry controls
> Load debug controls default
> IA-32e mode guest yes
> @@ -79,11 +79,11 @@
> Load IA32_PERF_GLOBAL_CTRL yes
> Load IA32_PAT yes
> Load IA32_EFER yes
> - Load IA32_BNDCFGS no
> - Conceal VM entries from PT no
> + Load IA32_BNDCFGS yes
> + Conceal VM entries from PT yes
> Miscellaneous data
> - Hex: 0x300481e5
> - VMX-preemption timer scale (log2) 5
> + Hex: 0x7004c1e7
> + VMX-preemption timer scale (log2) 7
> Store EFER.LMA into IA-32e mode guest control yes
> HLT activity state yes
> Shutdown activity state yes
> @@ -93,10 +93,10 @@
> MSR-load/store count recommendation 0
> IA32_SMM_MONITOR_CTL[2] can be set to 1 yes
> VMWRITE to VM-exit information fields yes
> - Inject event with insn length=0 no
> + Inject event with insn length=0 yes
> MSEG revision identifier 0
> VPID and EPT capabilities
> - Hex: 0xf0106334141
> + Hex: 0xf0106734141
> Execute-only EPT translations yes
> Page-walk length 4 yes
> Paging-structure memory type UC yes
>
> And another machine that does not crash:
>
> --- vmxcaps.e5-2683v4 2019-07-10 22:21:28.620329384 +0200
> +++ vmxcap.jan 2019-07-10 21:58:23.135686409 +0200
> @@ -1,6 +1,6 @@
> Basic VMX Information
> - Hex: 0xda040000000012
> - Revision 18
> + Hex: 0xda040000000004
> + Revision 4
> VMCS size 1024
> VMCS restricted to 32 bit addresses no
> Dual-monitor support yes
> @@ -12,7 +12,7 @@
> NMI exiting yes
> Virtual NMIs yes
> Activate VMX-preemption timer yes
> - Process posted interrupts yes
> + Process posted interrupts no
> primary processor-based controls
> Interrupt window exiting yes
> Use TSC offsetting yes
> @@ -44,20 +44,20 @@
> Enable VPID yes
> WBINVD exiting yes
> Unrestricted guest yes
> - APIC register emulation yes
> - Virtual interrupt delivery yes
> + APIC register emulation no
> + Virtual interrupt delivery no
> PAUSE-loop exiting yes
> RDRAND exiting yes
> Enable INVPCID yes
> Enable VM functions yes
> VMCS shadowing yes
> - Enable ENCLS exiting no
> + Enable ENCLS exiting yes
> RDSEED exiting yes
> Enable PML yes
> EPT-violation #VE yes
> - Conceal non-root operation from PT no
> - Enable XSAVES/XRSTORS no
> - Mode-based execute control (XS/XU) no
> + Conceal non-root operation from PT yes
> + Enable XSAVES/XRSTORS yes
> + Mode-based execute control (XS/XU) yes
> TSC scaling no
> VM-Exit controls
> Save debug controls default
> @@ -69,8 +69,8 @@
> Save IA32_EFER yes
> Load IA32_EFER yes
> Save VMX-preemption timer value yes
> - Clear IA32_BNDCFGS no
> - Conceal VM exits from PT no
> + Clear IA32_BNDCFGS yes
> + Conceal VM exits from PT yes
> VM-Entry controls
> Load debug controls default
> IA-32e mode guest yes
> @@ -79,11 +79,11 @@
> Load IA32_PERF_GLOBAL_CTRL yes
> Load IA32_PAT yes
> Load IA32_EFER yes
> - Load IA32_BNDCFGS no
> - Conceal VM entries from PT no
> + Load IA32_BNDCFGS yes
> + Conceal VM entries from PT yes
> Miscellaneous data
> - Hex: 0x300481e5
> - VMX-preemption timer scale (log2) 5
> + Hex: 0x7004c1e7
> + VMX-preemption timer scale (log2) 7
> Store EFER.LMA into IA-32e mode guest control yes
> HLT activity state yes
> Shutdown activity state yes
> @@ -93,10 +93,10 @@
> MSR-load/store count recommendation 0
> IA32_SMM_MONITOR_CTL[2] can be set to 1 yes
> VMWRITE to VM-exit information fields yes
> - Inject event with insn length=0 no
> + Inject event with insn length=0 yes
> MSEG revision identifier 0
> VPID and EPT capabilities
> - Hex: 0xf0106334141
> + Hex: 0xf0106734141
> Execute-only EPT translations yes
> Page-walk length 4 yes
> Paging-structure memory type UC yes
>
> And on a Xeon D-1540, I'm not seeing a crash but a kvm entry failure when
> resetting L1 while running Jailhouse:
>
> KVM: entry failed, hardware error 0x7
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00a09b00
> SS =0000 00000000 0000ffff 00c09300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT= 00000000 0000ffff
> IDT= 00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00
> f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Here is the vmxcap diff:
>
> --- xeon-d 2019-07-10 22:29:56.735374032 +0200
> +++ i7-8850H 2019-07-10 22:29:31.747467248 +0200
> @@ -1,6 +1,6 @@
> Basic VMX Information
> - Hex: 0xda040000000012
> - Revision 18
> + Hex: 0xda040000000004
> + Revision 4
> VMCS size 1024
> VMCS restricted to 32 bit addresses no
> Dual-monitor support yes
> @@ -12,7 +12,7 @@ pin-based controls
> NMI exiting yes
> Virtual NMIs yes
> Activate VMX-preemption timer yes
> - Process posted interrupts yes
> + Process posted interrupts no
> primary processor-based controls
> Interrupt window exiting yes
> Use TSC offsetting yes
> @@ -44,20 +44,20 @@ secondary processor-based controls
> Enable VPID yes
> WBINVD exiting yes
> Unrestricted guest yes
> - APIC register emulation yes
> - Virtual interrupt delivery yes
> + APIC register emulation no
> + Virtual interrupt delivery no
> PAUSE-loop exiting yes
> RDRAND exiting yes
> Enable INVPCID yes
> Enable VM functions yes
> VMCS shadowing yes
> - Enable ENCLS exiting no
> + Enable ENCLS exiting yes
> RDSEED exiting yes
> Enable PML yes
> EPT-violation #VE yes
> - Conceal non-root operation from PT no
> - Enable XSAVES/XRSTORS no
> - Mode-based execute control (XS/XU) no
> + Conceal non-root operation from PT yes
> + Enable XSAVES/XRSTORS yes
> + Mode-based execute control (XS/XU) yes
> TSC scaling no
> VM-Exit controls
> Save debug controls default
> @@ -69,8 +69,8 @@ VM-Exit controls
> Save IA32_EFER yes
> Load IA32_EFER yes
> Save VMX-preemption timer value yes
> - Clear IA32_BNDCFGS no
> - Conceal VM exits from PT no
> + Clear IA32_BNDCFGS yes
> + Conceal VM exits from PT yes
> VM-Entry controls
> Load debug controls default
> IA-32e mode guest yes
> @@ -79,11 +79,11 @@ VM-Entry controls
> Load IA32_PERF_GLOBAL_CTRL yes
> Load IA32_PAT yes
> Load IA32_EFER yes
> - Load IA32_BNDCFGS no
> - Conceal VM entries from PT no
> + Load IA32_BNDCFGS yes
> + Conceal VM entries from PT yes
> Miscellaneous data
> - Hex: 0x300481e5
> - VMX-preemption timer scale (log2) 5
> + Hex: 0x7004c1e7
> + VMX-preemption timer scale (log2) 7
> Store EFER.LMA into IA-32e mode guest control yes
> HLT activity state yes
> Shutdown activity state yes
> @@ -93,10 +93,10 @@ Miscellaneous data
> MSR-load/store count recommendation 0
> IA32_SMM_MONITOR_CTL[2] can be set to 1 yes
> VMWRITE to VM-exit information fields yes
> - Inject event with insn length=0 no
> + Inject event with insn length=0 yes
> MSEG revision identifier 0
> VPID and EPT capabilities
> - Hex: 0xf0106334141
> + Hex: 0xf0106734141
> Execute-only EPT translations yes
> Page-walk length 4 yes
> Paging-structure memory type UC yes
>
> Maybe the KVM code does not take the latest VMX features into account when
> importing a userspace nested state?
>
> Jan
>
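The capability listings being diffed above are the output of QEMU's scripts/kvm/vmxcap tool. A small helper along the following lines can compare such outputs programmatically. This is only a sketch: it assumes the two-column "Feature ... yes/no" layout visible in the excerpts, and file names like vmxcap.hostA are hypothetical.

```python
import re


def parse_vmxcap(text):
    """Parse 'Feature name    yes/no' rows from vmxcap output.

    Section headers and 'Hex:' lines are skipped; only boolean
    capability rows (name, two-plus spaces, yes/no) are kept.
    """
    caps = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\S.*\S)\s{2,}(yes|no)\s*$", line)
        if m:
            caps[m.group(1)] = (m.group(2) == "yes")
    return caps


def cap_diff(a, b):
    """Return {feature: (value_in_a, value_in_b)} for differing rows."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

For example, feeding two captured files to `cap_diff(parse_vmxcap(...), parse_vmxcap(...))` would report exactly the mismatching rows ("Enable PML", "Process posted interrupts", and so on) that the unified diffs above show by hand.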
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: KVM_SET_NESTED_STATE not yet stable
2019-07-11 11:37 ` Ralf Ramsauer
@ 2019-07-11 17:30 ` Paolo Bonzini
2019-07-19 16:38 ` Paolo Bonzini
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2019-07-11 17:30 UTC (permalink / raw)
To: Ralf Ramsauer, Jan Kiszka, Raslan, KarimAllah, jmattson, liran.alon, kvm
On 11/07/19 13:37, Ralf Ramsauer wrote:
> I can reproduce and confirm this issue. A system_reset of qemu after
> Jailhouse is enabled leads to the crash listed below, on all machines.
>
> On the Xeon Gold, e.g., Qemu reports:
>
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00a09b00
> SS =0000 00000000 0000ffff 00c09300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT= 00000000 0000ffff
> IDT= 00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b
> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00
>
> Kernel:
> [ 1868.804515] kvm: vmptrld (null)/6b8640000000 failed
> [ 1868.804568] kvm: vmclear fail: (null)/6b8640000000
>
> And the host freezes unrecoverably. Hosts use standard distro kernels
Thanks. I'm going to look at it tomorrow.
Paolo
* Re: KVM_SET_NESTED_STATE not yet stable
2019-07-11 17:30 ` Paolo Bonzini
@ 2019-07-19 16:38 ` Paolo Bonzini
2019-07-21 9:05 ` Jan Kiszka
0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2019-07-19 16:38 UTC (permalink / raw)
To: Ralf Ramsauer, Jan Kiszka, Raslan, KarimAllah, jmattson, liran.alon, kvm
On 11/07/19 19:30, Paolo Bonzini wrote:
> On 11/07/19 13:37, Ralf Ramsauer wrote:
>> I can reproduce and confirm this issue. A system_reset of qemu after
>> Jailhouse is enabled leads to the crash listed below, on all machines.
>>
>> On the Xeon Gold, e.g., Qemu reports:
>>
>> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
>> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
>> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0000 00000000 0000ffff 00009300
>> CS =f000 ffff0000 0000ffff 00a09b00
>> SS =0000 00000000 0000ffff 00c09300
>> DS =0000 00000000 0000ffff 00009300
>> FS =0000 00000000 0000ffff 00009300
>> GS =0000 00000000 0000ffff 00009300
>> LDT=0000 00000000 0000ffff 00008200
>> TR =0000 00000000 0000ffff 00008b00
>> GDT= 00000000 0000ffff
>> IDT= 00000000 0000ffff
>> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
>> DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000000
>> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b
>> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00
>> 00 00 00 00
>>
>> Kernel:
>> [ 1868.804515] kvm: vmptrld (null)/6b8640000000 failed
>> [ 1868.804568] kvm: vmclear fail: (null)/6b8640000000
>>
>> And the host freezes unrecoverably. Hosts use standard distro kernels
>
> Thanks. I'm going to look at it tomorrow.
Ok, it was only tomorrow modulo 7, but the first fix I got is trivial:
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6e88f459b323..6119b30347c6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
{
secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
vmcs_write64(VMCS_LINK_POINTER, -1ull);
+ vmx->nested.need_vmcs12_to_shadow_sync = false;
}
static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
Can you try it and see what you get?
Paolo
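To see why that one-liner matters, a heavily simplified toy model may help. Only the field name need_vmcs12_to_shadow_sync is taken from the patch above; everything else here is an assumption for illustration, not actual KVM code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the state the patch touches. */
struct nested {
    bool shadow_vmcs_active;         /* SECONDARY_EXEC_SHADOW_VMCS set */
    bool need_vmcs12_to_shadow_sync; /* pending vmcs12 -> shadow copy  */
};

static void disable_shadow_vmcs(struct nested *n, bool apply_fix)
{
    /* models secondary_exec_controls_clearbit() + link pointer = -1ull */
    n->shadow_vmcs_active = false;
    if (apply_fix)
        n->need_vmcs12_to_shadow_sync = false; /* the added line */
}

/* A later entry path that honors a stale sync flag would copy vmcs12
 * into a shadow VMCS that no longer exists -- the crash scenario. */
static bool entry_touches_stale_shadow(const struct nested *n)
{
    return n->need_vmcs12_to_shadow_sync && !n->shadow_vmcs_active;
}
```

In this model, tearing down the shadow VMCS without clearing the sync flag leaves the "copy into stale shadow" condition armed; clearing it alongside the teardown, as the patch does, disarms it.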
* Re: KVM_SET_NESTED_STATE not yet stable
2019-07-19 16:38 ` Paolo Bonzini
@ 2019-07-21 9:05 ` Jan Kiszka
2019-07-22 15:10 ` Ralf Ramsauer
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2019-07-21 9:05 UTC (permalink / raw)
To: Paolo Bonzini, Ralf Ramsauer, Raslan, KarimAllah, jmattson,
liran.alon, kvm
On 19.07.19 18:38, Paolo Bonzini wrote:
> On 11/07/19 19:30, Paolo Bonzini wrote:
>> On 11/07/19 13:37, Ralf Ramsauer wrote:
>>> I can reproduce and confirm this issue. A system_reset of qemu after
>>> Jailhouse is enabled leads to the crash listed below, on all machines.
>>>
>>> On the Xeon Gold, e.g., Qemu reports:
>>>
>>> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
>>> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
>>> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0000 00000000 0000ffff 00009300
>>> CS =f000 ffff0000 0000ffff 00a09b00
>>> SS =0000 00000000 0000ffff 00c09300
>>> DS =0000 00000000 0000ffff 00009300
>>> FS =0000 00000000 0000ffff 00009300
>>> GS =0000 00000000 0000ffff 00009300
>>> LDT=0000 00000000 0000ffff 00008200
>>> TR =0000 00000000 0000ffff 00008b00
>>> GDT= 00000000 0000ffff
>>> IDT= 00000000 0000ffff
>>> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
>>> DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000000
>>> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b
>>> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00
>>>
>>> Kernel:
>>> [ 1868.804515] kvm: vmptrld (null)/6b8640000000 failed
>>> [ 1868.804568] kvm: vmclear fail: (null)/6b8640000000
>>>
>>> And the host freezes unrecoverably. Hosts use standard distro kernels
>>
>> Thanks. I'm going to look at it tomorrow.
>
> Ok, it was only tomorrow modulo 7, but the first fix I got is trivial:
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 6e88f459b323..6119b30347c6 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
> {
> secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
> vmcs_write64(VMCS_LINK_POINTER, -1ull);
> + vmx->nested.need_vmcs12_to_shadow_sync = false;
> }
>
> static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
>
> Can you try it and see what you get?
>
Confirmed that this fixes the host crashes for me as well.
Now I'm only still seeing guest corruptions on vmport/vmmouse accesses from L2.
Looking into that right now.
Jan
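For context, the vmport/vmmouse accesses mentioned above go through the VMware backdoor I/O port, which matches the kvm_pio traces at 0x5658 earlier in the thread. A rough guest-side sketch of that interface follows; it is an illustration based on the publicly documented open-vm-tools conventions, not KVM or QEMU code, and executing the wrapper requires I/O privileges inside a VMware-compatible hypervisor.

```c
#include <assert.h>
#include <stdint.h>

/* Publicly documented VMware backdoor constants. */
#define VMW_BACKDOOR_MAGIC 0x564D5868u /* ASCII "VMXh" */
#define VMW_BACKDOOR_PORT  0x5658      /* matches kvm_pio at 0x5658 */

#if defined(__x86_64__) || defined(__i386__)
/* Calling this outside a VMware-compatible hypervisor, or without
 * port I/O privileges, faults with #GP. */
static inline uint32_t vmw_backdoor_cmd(uint32_t cmd, uint32_t arg)
{
    uint32_t eax = VMW_BACKDOOR_MAGIC, ebx = arg, ecx = cmd,
             edx = VMW_BACKDOOR_PORT;
    __asm__ volatile("in %%dx, %%eax"
                     : "+a"(eax), "+b"(ebx), "+c"(ecx), "+d"(edx));
    return ebx;
}
#endif
```

Each such `in` from L2 exits to L0 and is reflected as a nested vmexit, which is consistent with the repeated kvm_pio/kvm_nested_vmexit pairs at port 0x5658 in the traces at the top of the thread.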
* Re: KVM_SET_NESTED_STATE not yet stable
2019-07-21 9:05 ` Jan Kiszka
@ 2019-07-22 15:10 ` Ralf Ramsauer
0 siblings, 0 replies; 10+ messages in thread
From: Ralf Ramsauer @ 2019-07-22 15:10 UTC (permalink / raw)
To: Jan Kiszka, Paolo Bonzini, Raslan, KarimAllah, jmattson, liran.alon, kvm
On 7/21/19 11:05 AM, Jan Kiszka wrote:
> On 19.07.19 18:38, Paolo Bonzini wrote:
>> On 11/07/19 19:30, Paolo Bonzini wrote:
>>> On 11/07/19 13:37, Ralf Ramsauer wrote:
>>>> I can reproduce and confirm this issue. A system_reset of qemu after
>>>> Jailhouse is enabled leads to the crash listed below, on all machines.
>>>>
>>>> On the Xeon Gold, e.g., Qemu reports:
>>>>
>>>> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
>>>> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
>>>> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>>> ES =0000 00000000 0000ffff 00009300
>>>> CS =f000 ffff0000 0000ffff 00a09b00
>>>> SS =0000 00000000 0000ffff 00c09300
>>>> DS =0000 00000000 0000ffff 00009300
>>>> FS =0000 00000000 0000ffff 00009300
>>>> GS =0000 00000000 0000ffff 00009300
>>>> LDT=0000 00000000 0000ffff 00008200
>>>> TR =0000 00000000 0000ffff 00008b00
>>>> GDT= 00000000 0000ffff
>>>> IDT= 00000000 0000ffff
>>>> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
>>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
>>>> DR3=0000000000000000
>>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>>> EFER=0000000000000000
>>>> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b
>>>> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00
>>>> 00 00 00 00
>>>>
>>>> Kernel:
>>>> [ 1868.804515] kvm: vmptrld (null)/6b8640000000 failed
>>>> [ 1868.804568] kvm: vmclear fail: (null)/6b8640000000
>>>>
>>>> And the host freezes unrecoverably. Hosts use standard distro kernels
>>>
>>> Thanks. I'm going to look at it tomorrow.
>>
>> Ok, it was only tomorrow modulo 7, but the first fix I got is trivial:
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 6e88f459b323..6119b30347c6 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
>> {
>> secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
>> vmcs_write64(VMCS_LINK_POINTER, -1ull);
>> + vmx->nested.need_vmcs12_to_shadow_sync = false;
>> }
>>
>> static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
>>
>> Can you try it and see what you get?
>>
>
> Confirmed that this fixes the host crashes for me as well.
Works, thanks. Tested on v5.3-rc1, where the proper patch is already
applied. No more crashes; qemu resets as expected. Let's wait for the
backport…
Ralf
>
> Now I'm only still seeing guest corruptions on vmport/vmmouse accesses from L2.
> Looking into that right now.
>
> Jan
>
Thread overview: 10+ messages
2019-07-08 20:39 KVM_SET_NESTED_STATE not yet stable Jan Kiszka
2019-07-10 15:24 ` Raslan, KarimAllah
2019-07-10 16:05 ` Jan Kiszka
2019-07-10 20:31 ` Jan Kiszka
2019-07-10 21:14 ` Jan Kiszka
2019-07-11 11:37 ` Ralf Ramsauer
2019-07-11 17:30 ` Paolo Bonzini
2019-07-19 16:38 ` Paolo Bonzini
2019-07-21 9:05 ` Jan Kiszka
2019-07-22 15:10 ` Ralf Ramsauer