kvm.vger.kernel.org archive mirror
* KVM_SET_NESTED_STATE not yet stable
@ 2019-07-08 20:39 Jan Kiszka
  2019-07-10 15:24 ` Raslan, KarimAllah
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2019-07-08 20:39 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Mattson, Liran Alon, KarimAllah Ahmed, kvm

Hi all,

it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
robustness issues. The most urgent one: with the latest QEMU master,
which uses this interface, you can easily crash the host. You just
need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
The host CPU that ran this will stall, and the system will freeze soon after.

I've also seen a pattern with my Jailhouse test VM where I seem to get
stuck in a loop between L1 and L2:

 qemu-system-x86-6660  [007]   398.691401: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
 qemu-system-x86-6660  [007]   398.691402: kvm_fpu:              unload
 qemu-system-x86-6660  [007]   398.691403: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
 qemu-system-x86-6660  [007]   398.691440: kvm_fpu:              load
 qemu-system-x86-6660  [007]   398.691441: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x4 
 qemu-system-x86-6660  [007]   398.691443: kvm_mmu_get_page:     existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
 qemu-system-x86-6660  [007]   398.691444: kvm_entry:            vcpu 3
 qemu-system-x86-6660  [007]   398.691475: kvm_exit:             reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
 qemu-system-x86-6660  [007]   398.691476: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
 qemu-system-x86-6660  [007]   398.691477: kvm_fpu:              unload
 qemu-system-x86-6660  [007]   398.691478: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
 qemu-system-x86-6660  [007]   398.691526: kvm_fpu:              load
 qemu-system-x86-6660  [007]   398.691527: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x4 
 qemu-system-x86-6660  [007]   398.691529: kvm_mmu_get_page:     existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
 qemu-system-x86-6660  [007]   398.691530: kvm_entry:            vcpu 3
 qemu-system-x86-6660  [007]   398.691533: kvm_exit:             reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
 qemu-system-x86-6660  [007]   398.691534: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0

These issues disappear when going from ebbfef2f back to 6cfd7639 (both
with build fixes) in QEMU. Host kernels tested: 5.1.16 (distro) and 5.2
(vanilla).
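
As a cross-check of the trace, here is a minimal sketch that decodes the info1 value reported by kvm_nested_vmexit. It assumes info1 carries the VMX exit qualification and uses the I/O-instruction field layout from the Intel SDM; the function name is made up for illustration.

```python
def decode_io_exit_qualification(qual: int) -> dict:
    """Split a VMX I/O-instruction exit qualification into its fields."""
    return {
        "size": (qual & 0x7) + 1,                     # bits 2:0 hold access size minus one
        "direction": "in" if qual & 0x8 else "out",   # bit 3: 1 = IN, 0 = OUT
        "string": bool(qual & 0x10),                  # bit 4: INS/OUTS
        "rep": bool(qual & 0x20),                     # bit 5: REP prefix
        "port": (qual >> 16) & 0xFFFF,                # bits 31:16: port number
    }


# info1 value copied from the trace above
fields = decode_io_exit_qualification(0x5658000B)
print(f"port={fields['port']:#x} size={fields['size']} dir={fields['direction']}")
```

The decoded port, size and direction agree with the kvm_pio lines (pio_read at 0x5658 size 4), so every exit in the loop is the same I/O read being retried.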

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


* Re: KVM_SET_NESTED_STATE not yet stable
  2019-07-08 20:39 KVM_SET_NESTED_STATE not yet stable Jan Kiszka
@ 2019-07-10 15:24 ` Raslan, KarimAllah
  2019-07-10 16:05   ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Raslan, KarimAllah @ 2019-07-10 15:24 UTC (permalink / raw)
  To: jmattson, liran.alon, kvm, pbonzini, jan.kiszka

On Mon, 2019-07-08 at 22:39 +0200, Jan Kiszka wrote:
> Hi all,
> 
> it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
> robustness issues.

I would be very interested to learn about any more robustness issues that you 
are seeing.

> The most urgent one: With the help of latest QEMU
> master that uses this interface, you can easily crash the host. You just
> need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
> The host CPU that ran this will stall, the system will freeze soon.

Just to confirm, you start an L2 guest using qemu inside an L1-guest and then 
hard-reset the L1 guest?

Are you running any special workload in L2 or L1 when you reset? Also how 
exactly are you doing this "hard reset"?

(Sorry, I just tried this in my setup and did not see any problem, but my setup
 is slightly different, so I am just ruling out the obvious.)

> 
> I've also seen a pattern with my Jailhouse test VM where I seems to get
> stuck in a loop between L1 and L2:
> 
>  qemu-system-x86-6660  [007]   398.691401: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>  qemu-system-x86-6660  [007]   398.691402: kvm_fpu:              unload
>  qemu-system-x86-6660  [007]   398.691403: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>  qemu-system-x86-6660  [007]   398.691440: kvm_fpu:              load
>  qemu-system-x86-6660  [007]   398.691441: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x4 
>  qemu-system-x86-6660  [007]   398.691443: kvm_mmu_get_page:     existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>  qemu-system-x86-6660  [007]   398.691444: kvm_entry:            vcpu 3
>  qemu-system-x86-6660  [007]   398.691475: kvm_exit:             reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>  qemu-system-x86-6660  [007]   398.691476: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>  qemu-system-x86-6660  [007]   398.691477: kvm_fpu:              unload
>  qemu-system-x86-6660  [007]   398.691478: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>  qemu-system-x86-6660  [007]   398.691526: kvm_fpu:              load
>  qemu-system-x86-6660  [007]   398.691527: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x4 
>  qemu-system-x86-6660  [007]   398.691529: kvm_mmu_get_page:     existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>  qemu-system-x86-6660  [007]   398.691530: kvm_entry:            vcpu 3
>  qemu-system-x86-6660  [007]   398.691533: kvm_exit:             reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>  qemu-system-x86-6660  [007]   398.691534: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
> 
> These issues disappear when going from ebbfef2f back to 6cfd7639 (both
> with build fixes) in QEMU.

This is the QEMU that you are using in L0 to launch an L1 guest, right? Or are 
you still referring to the QEMU mentioned above?

> Host kernels tested: 5.1.16 (distro) and 5.2 (vanilla).
> Jan
> 







* Re: KVM_SET_NESTED_STATE not yet stable
  2019-07-10 15:24 ` Raslan, KarimAllah
@ 2019-07-10 16:05   ` Jan Kiszka
  2019-07-10 20:31     ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2019-07-10 16:05 UTC (permalink / raw)
  To: Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini

Hi KarimAllah,

On 10.07.19 17:24, Raslan, KarimAllah wrote:
> On Mon, 2019-07-08 at 22:39 +0200, Jan Kiszka wrote:
>> Hi all,
>>
>> it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
>> robustness issues.
> 
> I would be very interested to learn about any more robustness issues that you 
> are seeing.
> 
>> The most urgent one: With the help of latest QEMU
>> master that uses this interface, you can easily crash the host. You just
>> need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
>> The host CPU that ran this will stall, the system will freeze soon.
> 
> Just to confirm, you start an L2 guest using qemu inside an L1-guest and then 
> hard-reset the L1 guest?

Exactly.

> 
> Are you running any special workload in L2 or L1 when you reset? Also how 

Nope. It is a standard (though rather oldish) userland in L1, just running a
more recent 5.2 kernel.

> exactly are you doing this "hard reset"?

system_reset from the monitor or "reset" from the QEMU window menu.

> 
> (sorry just tried this in my setup and I did not see any problem but my setup
>  is slightly different, so just ruling out obvious stuff).
> 

If it helps, I can share privately a guest image that was built via
https://github.com/siemens/jailhouse-images which exposes the reset issue after
starting Jailhouse (instead of qemu-system-x86_64 - though that should "work" as
well, just not tested yet). It's about 70M packed.

Host-wise, 5.2.0 + QEMU master should do. I can also provide you the .config if
needed.

>>
>> I've also seen a pattern with my Jailhouse test VM where I seems to get
>> stuck in a loop between L1 and L2:
>>
>>  qemu-system-x86-6660  [007]   398.691401: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>  qemu-system-x86-6660  [007]   398.691402: kvm_fpu:              unload
>>  qemu-system-x86-6660  [007]   398.691403: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>>  qemu-system-x86-6660  [007]   398.691440: kvm_fpu:              load
>>  qemu-system-x86-6660  [007]   398.691441: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x4 
>>  qemu-system-x86-6660  [007]   398.691443: kvm_mmu_get_page:     existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>>  qemu-system-x86-6660  [007]   398.691444: kvm_entry:            vcpu 3
>>  qemu-system-x86-6660  [007]   398.691475: kvm_exit:             reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>>  qemu-system-x86-6660  [007]   398.691476: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>  qemu-system-x86-6660  [007]   398.691477: kvm_fpu:              unload
>>  qemu-system-x86-6660  [007]   398.691478: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>>  qemu-system-x86-6660  [007]   398.691526: kvm_fpu:              load
>>  qemu-system-x86-6660  [007]   398.691527: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x4 
>>  qemu-system-x86-6660  [007]   398.691529: kvm_mmu_get_page:     existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>>  qemu-system-x86-6660  [007]   398.691530: kvm_entry:            vcpu 3
>>  qemu-system-x86-6660  [007]   398.691533: kvm_exit:             reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>>  qemu-system-x86-6660  [007]   398.691534: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>
>> These issues disappear when going from ebbfef2f back to 6cfd7639 (both
>> with build fixes) in QEMU.
> 
> This is the QEMU that you are using in L0 to launch an L1 guest, right? or are 
> you still referring to the QEMU mentioned above?

This scenario is similar but still a bit different from the one above. Yes, same L0
image and host QEMU here (and the traces were taken on the host, obviously), but
the workload is now as follows:

 - boot L1 Linux
 - enable Jailhouse inside L1
 - move the mouse over the graphical desktop of L2, i.e. the former L1
   Linux (Jailhouse is now L1)
 - the L1/L2 guests enter the loop above while trying to read from the
   vmmouse port
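
To confirm such a loop from a longer host trace, one option is to count nested exits per (rip, reason, info1) tuple. This is only a rough sketch: the sample lines are copied from the trace quoted above, and the regular expression and helper name are assumptions made for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the trace quoted above (other events omitted).
TRACE = """\
 qemu-system-x86-6660  [007]   398.691401: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
 qemu-system-x86-6660  [007]   398.691444: kvm_entry:            vcpu 3
 qemu-system-x86-6660  [007]   398.691476: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
 qemu-system-x86-6660  [007]   398.691530: kvm_entry:            vcpu 3
 qemu-system-x86-6660  [007]   398.691534: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
"""

NESTED_EXIT = re.compile(r"kvm_nested_vmexit:\s+rip (\w+) reason (\w+) info1 (\w+)")


def count_nested_exits(trace: str) -> Counter:
    """Count kvm_nested_vmexit events per (rip, reason, info1)."""
    counts = Counter()
    for line in trace.splitlines():
        match = NESTED_EXIT.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


counts = count_nested_exits(TRACE)
for (rip, reason, info1), n in counts.most_common():
    print(f"{n}x rip={rip} reason={reason} info1={info1}")
```

A single tuple whose count keeps growing while the rip never advances is the signature of the loop described here.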

Jan



* Re: KVM_SET_NESTED_STATE not yet stable
  2019-07-10 16:05   ` Jan Kiszka
@ 2019-07-10 20:31     ` Jan Kiszka
  2019-07-10 21:14       ` Jan Kiszka
  2019-07-11 11:37       ` Ralf Ramsauer
  0 siblings, 2 replies; 10+ messages in thread
From: Jan Kiszka @ 2019-07-10 20:31 UTC (permalink / raw)
  To: Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini; +Cc: Ralf Ramsauer

On 10.07.19 18:05, Jan Kiszka wrote:
> Hi KarimAllah,
> 
> On 10.07.19 17:24, Raslan, KarimAllah wrote:
>> On Mon, 2019-07-08 at 22:39 +0200, Jan Kiszka wrote:
>>> Hi all,
>>>
>>> it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
>>> robustness issues.
>>
>> I would be very interested to learn about any more robustness issues that you 
>> are seeing.
>>
>>> The most urgent one: With the help of latest QEMU
>>> master that uses this interface, you can easily crash the host. You just
>>> need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
>>> The host CPU that ran this will stall, the system will freeze soon.
>>
>> Just to confirm, you start an L2 guest using qemu inside an L1-guest and then 
>> hard-reset the L1 guest?
> 
> Exactly.
> 
>>
>> Are you running any special workload in L2 or L1 when you reset? Also how 
> 
> Nope. It is a standard (though rather oldish) userland in L1, just running a
> more recent kernel 5.2.
> 
>> exactly are you doing this "hard reset"?
> 
> system_reset from the monitor or "reset" from QEMU window menu.
> 
>>
>> (sorry just tried this in my setup and I did not see any problem but my setup
>>  is slightly different, so just ruling out obvious stuff).
>>
> 
> If it helps, I can share privately a guest image that was built via
> https://github.com/siemens/jailhouse-images which exposes the reset issue after
> starting Jailhouse (instead of qemu-system-x86_64 - though that should "work" as
> well, just not tested yet). It's about 70M packed.
> 
> Host-wise, 5.2.0 + QEMU master should do. I can also provide you the .config if
> needed.
> 
>>>
>>> I've also seen a pattern with my Jailhouse test VM where I seems to get
>>> stuck in a loop between L1 and L2:
>>>
>>>  qemu-system-x86-6660  [007]   398.691401: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>>  qemu-system-x86-6660  [007]   398.691402: kvm_fpu:              unload
>>>  qemu-system-x86-6660  [007]   398.691403: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>>>  qemu-system-x86-6660  [007]   398.691440: kvm_fpu:              load
>>>  qemu-system-x86-6660  [007]   398.691441: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x4 
>>>  qemu-system-x86-6660  [007]   398.691443: kvm_mmu_get_page:     existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>>>  qemu-system-x86-6660  [007]   398.691444: kvm_entry:            vcpu 3
>>>  qemu-system-x86-6660  [007]   398.691475: kvm_exit:             reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>>>  qemu-system-x86-6660  [007]   398.691476: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>>  qemu-system-x86-6660  [007]   398.691477: kvm_fpu:              unload
>>>  qemu-system-x86-6660  [007]   398.691478: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>>>  qemu-system-x86-6660  [007]   398.691526: kvm_fpu:              load
>>>  qemu-system-x86-6660  [007]   398.691527: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x4 
>>>  qemu-system-x86-6660  [007]   398.691529: kvm_mmu_get_page:     existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>>>  qemu-system-x86-6660  [007]   398.691530: kvm_entry:            vcpu 3
>>>  qemu-system-x86-6660  [007]   398.691533: kvm_exit:             reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>>>  qemu-system-x86-6660  [007]   398.691534: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>>
>>> These issues disappear when going from ebbfef2f back to 6cfd7639 (both
>>> with build fixes) in QEMU.
>>
>> This is the QEMU that you are using in L0 to launch an L1 guest, right? or are 
>> you still referring to the QEMU mentioned above?
> 
> This scenario is similar but still a bit different than the above. Yes, same L0
> image and host QEMU here (and the traces were taken on the host, obviously), but
> the workload is now as follows:
> 
>  - boot L1 Linux
>  - enable Jailhouse inside L1
>  - move the mouse over the graphical desktop of L2, ie. the former L1
>    Linux (Jailhouse is now L1)
>  - the L1/L2 guests enter the loop above while trying to read from the
>    vmmouse port
> 
> Jan
> 

Ralf tried my case on some of his systems as well, but he didn't succeed in
reproducing it either. So we compared vmxcap lists because I'm starting to think
it's feature-related. There are some differences...

--- vmxcap.i7-5600u	2019-07-10 21:59:05.616547924 +0200
+++ vmxcap.jan	2019-07-10 21:58:23.135686409 +0200
@@ -1,6 +1,6 @@
 Basic VMX Information
-  Hex: 0xda040000000012
-  Revision                                 18
+  Hex: 0xda040000000004
+  Revision                                 4
   VMCS size                                1024
   VMCS restricted to 32 bit addresses      no
   Dual-monitor support                     yes
@@ -51,13 +51,13 @@
   Enable INVPCID                           yes
   Enable VM functions                      yes
   VMCS shadowing                           yes
-  Enable ENCLS exiting                     no
+  Enable ENCLS exiting                     yes
   RDSEED exiting                           yes
-  Enable PML                               no
+  Enable PML                               yes
   EPT-violation #VE                        yes
-  Conceal non-root operation from PT       no
-  Enable XSAVES/XRSTORS                    no
-  Mode-based execute control (XS/XU)       no
+  Conceal non-root operation from PT       yes
+  Enable XSAVES/XRSTORS                    yes
+  Mode-based execute control (XS/XU)       yes
   TSC scaling                              no
 VM-Exit controls
   Save debug controls                      default
@@ -69,8 +69,8 @@
   Save IA32_EFER                           yes
   Load IA32_EFER                           yes
   Save VMX-preemption timer value          yes
-  Clear IA32_BNDCFGS                       no
-  Conceal VM exits from PT                 no
+  Clear IA32_BNDCFGS                       yes
+  Conceal VM exits from PT                 yes
 VM-Entry controls
   Load debug controls                      default
   IA-32e mode guest                        yes
@@ -79,11 +79,11 @@
   Load IA32_PERF_GLOBAL_CTRL               yes
   Load IA32_PAT                            yes
   Load IA32_EFER                           yes
-  Load IA32_BNDCFGS                        no
-  Conceal VM entries from PT               no
+  Load IA32_BNDCFGS                        yes
+  Conceal VM entries from PT               yes
 Miscellaneous data
-  Hex: 0x300481e5
-  VMX-preemption timer scale (log2)        5
+  Hex: 0x7004c1e7
+  VMX-preemption timer scale (log2)        7
   Store EFER.LMA into IA-32e mode guest control yes
   HLT activity state                       yes
   Shutdown activity state                  yes
@@ -93,10 +93,10 @@
   MSR-load/store count recommendation      0
   IA32_SMM_MONITOR_CTL[2] can be set to 1  yes
   VMWRITE to VM-exit information fields    yes
-  Inject event with insn length=0          no
+  Inject event with insn length=0          yes
   MSEG revision identifier                 0
 VPID and EPT capabilities
-  Hex: 0xf0106334141
+  Hex: 0xf0106734141
   Execute-only EPT translations            yes
   Page-walk length 4                       yes
   Paging-structure memory type UC          yes

And another machine that does not crash:

--- vmxcaps.e5-2683v4	2019-07-10 22:21:28.620329384 +0200
+++ vmxcap.jan	2019-07-10 21:58:23.135686409 +0200
@@ -1,6 +1,6 @@
 Basic VMX Information
-  Hex: 0xda040000000012
-  Revision                                 18
+  Hex: 0xda040000000004
+  Revision                                 4
   VMCS size                                1024
   VMCS restricted to 32 bit addresses      no
   Dual-monitor support                     yes
@@ -12,7 +12,7 @@
   NMI exiting                              yes
   Virtual NMIs                             yes
   Activate VMX-preemption timer            yes
-  Process posted interrupts                yes
+  Process posted interrupts                no
 primary processor-based controls
   Interrupt window exiting                 yes
   Use TSC offsetting                       yes
@@ -44,20 +44,20 @@
   Enable VPID                              yes
   WBINVD exiting                           yes
   Unrestricted guest                       yes
-  APIC register emulation                  yes
-  Virtual interrupt delivery               yes
+  APIC register emulation                  no
+  Virtual interrupt delivery               no
   PAUSE-loop exiting                       yes
   RDRAND exiting                           yes
   Enable INVPCID                           yes
   Enable VM functions                      yes
   VMCS shadowing                           yes
-  Enable ENCLS exiting                     no
+  Enable ENCLS exiting                     yes
   RDSEED exiting                           yes
   Enable PML                               yes
   EPT-violation #VE                        yes
-  Conceal non-root operation from PT       no
-  Enable XSAVES/XRSTORS                    no
-  Mode-based execute control (XS/XU)       no
+  Conceal non-root operation from PT       yes
+  Enable XSAVES/XRSTORS                    yes
+  Mode-based execute control (XS/XU)       yes
   TSC scaling                              no
 VM-Exit controls
   Save debug controls                      default
@@ -69,8 +69,8 @@
   Save IA32_EFER                           yes
   Load IA32_EFER                           yes
   Save VMX-preemption timer value          yes
-  Clear IA32_BNDCFGS                       no
-  Conceal VM exits from PT                 no
+  Clear IA32_BNDCFGS                       yes
+  Conceal VM exits from PT                 yes
 VM-Entry controls
   Load debug controls                      default
   IA-32e mode guest                        yes
@@ -79,11 +79,11 @@
   Load IA32_PERF_GLOBAL_CTRL               yes
   Load IA32_PAT                            yes
   Load IA32_EFER                           yes
-  Load IA32_BNDCFGS                        no
-  Conceal VM entries from PT               no
+  Load IA32_BNDCFGS                        yes
+  Conceal VM entries from PT               yes
 Miscellaneous data
-  Hex: 0x300481e5
-  VMX-preemption timer scale (log2)        5
+  Hex: 0x7004c1e7
+  VMX-preemption timer scale (log2)        7
   Store EFER.LMA into IA-32e mode guest control yes
   HLT activity state                       yes
   Shutdown activity state                  yes
@@ -93,10 +93,10 @@
   MSR-load/store count recommendation      0
   IA32_SMM_MONITOR_CTL[2] can be set to 1  yes
   VMWRITE to VM-exit information fields    yes
-  Inject event with insn length=0          no
+  Inject event with insn length=0          yes
   MSEG revision identifier                 0
 VPID and EPT capabilities
-  Hex: 0xf0106334141
+  Hex: 0xf0106734141
   Execute-only EPT translations            yes
   Page-walk length 4                       yes
   Paging-structure memory type UC          yes

And on a Xeon D-1540, I'm not seeing a crash but a kvm entry failure when
resetting L1 while running Jailhouse:

KVM: entry failed, hardware error 0x7
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00a09b00
SS =0000 00000000 0000ffff 00c09300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00
f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
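
The dump itself looks like a freshly reset vCPU. A small sketch of the arithmetic, using only values from the dump (the variable names are illustrative):

```python
# Values copied from the register dump above.
cs_base = 0xFFFF0000
eip = 0x0000FFF0
cr0 = 0x60000010

# The next instruction fetch happens at CS.base + EIP.
linear = (cs_base + eip) & 0xFFFFFFFF
print(f"next fetch at {linear:#010x}")

# CR0.PE is bit 0; it is clear while the CPU is still in real mode.
print("protected mode enabled:", bool(cr0 & 0x1))
```

CS.base + EIP lands on the x86 reset vector and CR0 holds its reset value, so the failing entry appears to be the first one after the reset rather than a later one.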

Here is the vmxcap diff:

--- xeon-d	2019-07-10 22:29:56.735374032 +0200
+++ i7-8850H	2019-07-10 22:29:31.747467248 +0200
@@ -1,6 +1,6 @@
 Basic VMX Information
-  Hex: 0xda040000000012
-  Revision                                 18
+  Hex: 0xda040000000004
+  Revision                                 4
   VMCS size                                1024
   VMCS restricted to 32 bit addresses      no
   Dual-monitor support                     yes
@@ -12,7 +12,7 @@ pin-based controls
   NMI exiting                              yes
   Virtual NMIs                             yes
   Activate VMX-preemption timer            yes
-  Process posted interrupts                yes
+  Process posted interrupts                no
 primary processor-based controls
   Interrupt window exiting                 yes
   Use TSC offsetting                       yes
@@ -44,20 +44,20 @@ secondary processor-based controls
   Enable VPID                              yes
   WBINVD exiting                           yes
   Unrestricted guest                       yes
-  APIC register emulation                  yes
-  Virtual interrupt delivery               yes
+  APIC register emulation                  no
+  Virtual interrupt delivery               no
   PAUSE-loop exiting                       yes
   RDRAND exiting                           yes
   Enable INVPCID                           yes
   Enable VM functions                      yes
   VMCS shadowing                           yes
-  Enable ENCLS exiting                     no
+  Enable ENCLS exiting                     yes
   RDSEED exiting                           yes
   Enable PML                               yes
   EPT-violation #VE                        yes
-  Conceal non-root operation from PT       no
-  Enable XSAVES/XRSTORS                    no
-  Mode-based execute control (XS/XU)       no
+  Conceal non-root operation from PT       yes
+  Enable XSAVES/XRSTORS                    yes
+  Mode-based execute control (XS/XU)       yes
   TSC scaling                              no
 VM-Exit controls
   Save debug controls                      default
@@ -69,8 +69,8 @@ VM-Exit controls
   Save IA32_EFER                           yes
   Load IA32_EFER                           yes
   Save VMX-preemption timer value          yes
-  Clear IA32_BNDCFGS                       no
-  Conceal VM exits from PT                 no
+  Clear IA32_BNDCFGS                       yes
+  Conceal VM exits from PT                 yes
 VM-Entry controls
   Load debug controls                      default
   IA-32e mode guest                        yes
@@ -79,11 +79,11 @@ VM-Entry controls
   Load IA32_PERF_GLOBAL_CTRL               yes
   Load IA32_PAT                            yes
   Load IA32_EFER                           yes
-  Load IA32_BNDCFGS                        no
-  Conceal VM entries from PT               no
+  Load IA32_BNDCFGS                        yes
+  Conceal VM entries from PT               yes
 Miscellaneous data
-  Hex: 0x300481e5
-  VMX-preemption timer scale (log2)        5
+  Hex: 0x7004c1e7
+  VMX-preemption timer scale (log2)        7
   Store EFER.LMA into IA-32e mode guest control yes
   HLT activity state                       yes
   Shutdown activity state                  yes
@@ -93,10 +93,10 @@ Miscellaneous data
   MSR-load/store count recommendation      0
   IA32_SMM_MONITOR_CTL[2] can be set to 1  yes
   VMWRITE to VM-exit information fields    yes
-  Inject event with insn length=0          no
+  Inject event with insn length=0          yes
   MSEG revision identifier                 0
 VPID and EPT capabilities
-  Hex: 0xf0106334141
+  Hex: 0xf0106734141
   Execute-only EPT translations            yes
   Page-walk length 4                       yes
   Paging-structure memory type UC          yes

Maybe the KVM code does not take the latest VMX features into account when
importing a userspace nested state?
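
For anyone who wants to compare the raw capability values rather than the decoded lists, here is a minimal sketch that XORs the Hex values from the diffs above and lists the differing bit positions. The field positions used at the end (revision in bits 30:0 and VMCS size in bits 44:32 of the basic value, timer scale in bits 4:0 of the miscellaneous value) are taken from the Intel SDM and agree with what vmxcap prints; the helper name is made up.

```python
def differing_bits(a: int, b: int) -> list:
    """Return the bit positions at which a and b differ, lowest first."""
    x = a ^ b
    return [i for i in range(x.bit_length()) if (x >> i) & 1]


# Hex values copied from the vmxcap diffs above: (other machine, vmxcap.jan).
PAIRS = {
    "Basic VMX Information": (0xDA040000000012, 0xDA040000000004),
    "Miscellaneous data": (0x300481E5, 0x7004C1E7),
    "VPID and EPT capabilities": (0xF0106334141, 0xF0106734141),
}

for name, (other, mine) in PAIRS.items():
    print(f"{name}: differing bits {differing_bits(other, mine)}")

basic = 0xDA040000000012
misc = 0x7004C1E7
print("revision:", basic & 0x7FFFFFFF)           # bits 30:0
print("VMCS size:", (basic >> 32) & 0x1FFF)      # bits 44:32
print("preemption timer scale:", misc & 0x1F)    # bits 4:0
```

The differences in the basic value are confined to the revision field, so only the miscellaneous and EPT/VPID words carry actual feature differences.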

Jan



* Re: KVM_SET_NESTED_STATE not yet stable
  2019-07-10 20:31     ` Jan Kiszka
@ 2019-07-10 21:14       ` Jan Kiszka
  2019-07-11 11:37       ` Ralf Ramsauer
  1 sibling, 0 replies; 10+ messages in thread
From: Jan Kiszka @ 2019-07-10 21:14 UTC (permalink / raw)
  To: Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini; +Cc: Ralf Ramsauer

On 10.07.19 22:31, Jan Kiszka wrote:
> And on a Xeon D-1540, I'm not seeing a crash but a kvm entry failure when
> resetting L1 while running Jailhouse:
> 
> KVM: entry failed, hardware error 0x7
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00a09b00
> SS =0000 00000000 0000ffff 00c09300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT=     00000000 0000ffff
> IDT=     00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00
> f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 

OK, looks like the feature diff was a red herring: Ralf found a server with even
more features and without a crash, and I found familiar error messages in the
kernel log of that Xeon D:

kvm: vmptrld           (null)/778000000000 failed
kvm: vmclear fail:           (null)/778000000000

Only difference: No crash, just that more graceful entry failure.

Jan



* Re: KVM_SET_NESTED_STATE not yet stable
  2019-07-10 20:31     ` Jan Kiszka
  2019-07-10 21:14       ` Jan Kiszka
@ 2019-07-11 11:37       ` Ralf Ramsauer
  2019-07-11 17:30         ` Paolo Bonzini
  1 sibling, 1 reply; 10+ messages in thread
From: Ralf Ramsauer @ 2019-07-11 11:37 UTC (permalink / raw)
  To: Jan Kiszka, Raslan, KarimAllah, jmattson, liran.alon, kvm, pbonzini

Hi all,

On 7/10/19 10:31 PM, Jan Kiszka wrote:
> On 10.07.19 18:05, Jan Kiszka wrote:
>> Hi KarimAllah,
>>
>> On 10.07.19 17:24, Raslan, KarimAllah wrote:
>>> On Mon, 2019-07-08 at 22:39 +0200, Jan Kiszka wrote:
>>>> Hi all,
>>>>
>>>> it seems the "new" KVM_SET_NESTED_STATE interface has some remaining
>>>> robustness issues.
>>>
>>> I would be very interested to learn about any more robustness issues that you 
>>> are seeing.
>>>
>>>> The most urgent one: With the help of latest QEMU
>>>> master that uses this interface, you can easily crash the host. You just
>>>> need to start qemu-system-x86 -enable-kvm in L1 and then hard-reset L1.
>>>> The host CPU that ran this will stall, the system will freeze soon.
>>>
>>> Just to confirm, you start an L2 guest using qemu inside an L1-guest and then 
>>> hard-reset the L1 guest?
>>
>> Exactly.
>>
>>>
>>> Are you running any special workload in L2 or L1 when you reset? Also how 
>>
>> Nope. It is a standard (though rather oldish) userland in L1, just running a
>> more recent kernel 5.2.
>>
>>> exactly are you doing this "hard reset"?
>>
>> system_reset from the monitor or "reset" from QEMU window menu.

While I'm not able to reproduce this behaviour on any of my machines
(i7-4810MQ, i7-5600U, Xeon Gold 5118),

>>
>>>
>>> (sorry just tried this in my setup and I did not see any problem but my setup
>>>  is slightly different, so just ruling out obvious stuff).
>>>
>>
>> If it helps, I can share privately a guest image that was built via
>> https://github.com/siemens/jailhouse-images which exposes the reset issue after
>> starting Jailhouse (instead of qemu-system-x86_64 - though that should "work" as
>> well, just not tested yet). It's about 70M packed.
>>
>> Host-wise, 5.2.0 + QEMU master should do. I can also provide you the .config if
>> needed.

I can reproduce and confirm this issue. A system_reset of qemu after
Jailhouse is enabled leads to the crash listed below, on all machines.

On the Xeon Gold, e.g., Qemu reports:

EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00a09b00
SS =0000 00000000 0000ffff 00c09300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b
e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00

Kernel:
[ 1868.804515] kvm: vmptrld           (null)/6b8640000000 failed
[ 1868.804568] kvm: vmclear fail:           (null)/6b8640000000

And the host freezes unrecoverably. Hosts use standard distro kernels,
v5.0 or newer.

  Ralf

>>
>>>>
>>>> I've also seen a pattern with my Jailhouse test VM where I seem to get
>>>> stuck in a loop between L1 and L2:
>>>>
>>>>  qemu-system-x86-6660  [007]   398.691401: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>>>  qemu-system-x86-6660  [007]   398.691402: kvm_fpu:              unload
>>>>  qemu-system-x86-6660  [007]   398.691403: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>>>>  qemu-system-x86-6660  [007]   398.691440: kvm_fpu:              load
>>>>  qemu-system-x86-6660  [007]   398.691441: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x4 
>>>>  qemu-system-x86-6660  [007]   398.691443: kvm_mmu_get_page:     existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>>>>  qemu-system-x86-6660  [007]   398.691444: kvm_entry:            vcpu 3
>>>>  qemu-system-x86-6660  [007]   398.691475: kvm_exit:             reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>>>>  qemu-system-x86-6660  [007]   398.691476: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>>>  qemu-system-x86-6660  [007]   398.691477: kvm_fpu:              unload
>>>>  qemu-system-x86-6660  [007]   398.691478: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>>>>  qemu-system-x86-6660  [007]   398.691526: kvm_fpu:              load
>>>>  qemu-system-x86-6660  [007]   398.691527: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x4 
>>>>  qemu-system-x86-6660  [007]   398.691529: kvm_mmu_get_page:     existing sp gfn 3a22e 1/4 q3 direct --x !pge !nxe root 6 sync
>>>>  qemu-system-x86-6660  [007]   398.691530: kvm_entry:            vcpu 3
>>>>  qemu-system-x86-6660  [007]   398.691533: kvm_exit:             reason IO_INSTRUCTION rip 0x7fa9ee5224e4 info 5658000b 0
>>>>  qemu-system-x86-6660  [007]   398.691534: kvm_nested_vmexit:    rip 7fa9ee5224e4 reason IO_INSTRUCTION info1 5658000b info2 0 int_info 0 int_info_err 0
>>>>
>>>> These issues disappear when going from ebbfef2f back to 6cfd7639 (both
>>>> with build fixes) in QEMU.
>>>
>>> This is the QEMU that you are using in L0 to launch an L1 guest, right? or are 
>>> you still referring to the QEMU mentioned above?
>>
>> This scenario is similar but still a bit different than the above. Yes, same L0
>> image and host QEMU here (and the traces were taken on the host, obviously), but
>> the workload is now as follows:
>>
>>  - boot L1 Linux
>>  - enable Jailhouse inside L1
>>  - move the mouse over the graphical desktop of L2, i.e. the former L1
>>    Linux (Jailhouse is now L1)
>>  - the L1/L2 guests enter the loop above while trying to read from the
>>    vmmouse port
>>
>> Jan
>>
> 
> Ralf tried my case on some of his systems as well but he also didn't succeed in
> reproducing. So we compared vmxcap lists because I'm starting to think it's
> feature-related. There are some differences...
> 
> --- vmxcap.i7-5600u	2019-07-10 21:59:05.616547924 +0200
> +++ vmxcap.jan	2019-07-10 21:58:23.135686409 +0200
> @@ -1,6 +1,6 @@
>  Basic VMX Information
> -  Hex: 0xda040000000012
> -  Revision                                 18
> +  Hex: 0xda040000000004
> +  Revision                                 4
>    VMCS size                                1024
>    VMCS restricted to 32 bit addresses      no
>    Dual-monitor support                     yes
> @@ -51,13 +51,13 @@
>    Enable INVPCID                           yes
>    Enable VM functions                      yes
>    VMCS shadowing                           yes
> -  Enable ENCLS exiting                     no
> +  Enable ENCLS exiting                     yes
>    RDSEED exiting                           yes
> -  Enable PML                               no
> +  Enable PML                               yes
>    EPT-violation #VE                        yes
> -  Conceal non-root operation from PT       no
> -  Enable XSAVES/XRSTORS                    no
> -  Mode-based execute control (XS/XU)       no
> +  Conceal non-root operation from PT       yes
> +  Enable XSAVES/XRSTORS                    yes
> +  Mode-based execute control (XS/XU)       yes
>    TSC scaling                              no
>  VM-Exit controls
>    Save debug controls                      default
> @@ -69,8 +69,8 @@
>    Save IA32_EFER                           yes
>    Load IA32_EFER                           yes
>    Save VMX-preemption timer value          yes
> -  Clear IA32_BNDCFGS                       no
> -  Conceal VM exits from PT                 no
> +  Clear IA32_BNDCFGS                       yes
> +  Conceal VM exits from PT                 yes
>  VM-Entry controls
>    Load debug controls                      default
>    IA-32e mode guest                        yes
> @@ -79,11 +79,11 @@
>    Load IA32_PERF_GLOBAL_CTRL               yes
>    Load IA32_PAT                            yes
>    Load IA32_EFER                           yes
> -  Load IA32_BNDCFGS                        no
> -  Conceal VM entries from PT               no
> +  Load IA32_BNDCFGS                        yes
> +  Conceal VM entries from PT               yes
>  Miscellaneous data
> -  Hex: 0x300481e5
> -  VMX-preemption timer scale (log2)        5
> +  Hex: 0x7004c1e7
> +  VMX-preemption timer scale (log2)        7
>    Store EFER.LMA into IA-32e mode guest control yes
>    HLT activity state                       yes
>    Shutdown activity state                  yes
> @@ -93,10 +93,10 @@
>    MSR-load/store count recommendation      0
>    IA32_SMM_MONITOR_CTL[2] can be set to 1  yes
>    VMWRITE to VM-exit information fields    yes
> -  Inject event with insn length=0          no
> +  Inject event with insn length=0          yes
>    MSEG revision identifier                 0
>  VPID and EPT capabilities
> -  Hex: 0xf0106334141
> +  Hex: 0xf0106734141
>    Execute-only EPT translations            yes
>    Page-walk length 4                       yes
>    Paging-structure memory type UC          yes
> 
> And another machine that does not crash:
> 
> --- vmxcaps.e5-2683v4	2019-07-10 22:21:28.620329384 +0200
> +++ vmxcap.jan	2019-07-10 21:58:23.135686409 +0200
> @@ -1,6 +1,6 @@
>  Basic VMX Information
> -  Hex: 0xda040000000012
> -  Revision                                 18
> +  Hex: 0xda040000000004
> +  Revision                                 4
>    VMCS size                                1024
>    VMCS restricted to 32 bit addresses      no
>    Dual-monitor support                     yes
> @@ -12,7 +12,7 @@
>    NMI exiting                              yes
>    Virtual NMIs                             yes
>    Activate VMX-preemption timer            yes
> -  Process posted interrupts                yes
> +  Process posted interrupts                no
>  primary processor-based controls
>    Interrupt window exiting                 yes
>    Use TSC offsetting                       yes
> @@ -44,20 +44,20 @@
>    Enable VPID                              yes
>    WBINVD exiting                           yes
>    Unrestricted guest                       yes
> -  APIC register emulation                  yes
> -  Virtual interrupt delivery               yes
> +  APIC register emulation                  no
> +  Virtual interrupt delivery               no
>    PAUSE-loop exiting                       yes
>    RDRAND exiting                           yes
>    Enable INVPCID                           yes
>    Enable VM functions                      yes
>    VMCS shadowing                           yes
> -  Enable ENCLS exiting                     no
> +  Enable ENCLS exiting                     yes
>    RDSEED exiting                           yes
>    Enable PML                               yes
>    EPT-violation #VE                        yes
> -  Conceal non-root operation from PT       no
> -  Enable XSAVES/XRSTORS                    no
> -  Mode-based execute control (XS/XU)       no
> +  Conceal non-root operation from PT       yes
> +  Enable XSAVES/XRSTORS                    yes
> +  Mode-based execute control (XS/XU)       yes
>    TSC scaling                              no
>  VM-Exit controls
>    Save debug controls                      default
> @@ -69,8 +69,8 @@
>    Save IA32_EFER                           yes
>    Load IA32_EFER                           yes
>    Save VMX-preemption timer value          yes
> -  Clear IA32_BNDCFGS                       no
> -  Conceal VM exits from PT                 no
> +  Clear IA32_BNDCFGS                       yes
> +  Conceal VM exits from PT                 yes
>  VM-Entry controls
>    Load debug controls                      default
>    IA-32e mode guest                        yes
> @@ -79,11 +79,11 @@
>    Load IA32_PERF_GLOBAL_CTRL               yes
>    Load IA32_PAT                            yes
>    Load IA32_EFER                           yes
> -  Load IA32_BNDCFGS                        no
> -  Conceal VM entries from PT               no
> +  Load IA32_BNDCFGS                        yes
> +  Conceal VM entries from PT               yes
>  Miscellaneous data
> -  Hex: 0x300481e5
> -  VMX-preemption timer scale (log2)        5
> +  Hex: 0x7004c1e7
> +  VMX-preemption timer scale (log2)        7
>    Store EFER.LMA into IA-32e mode guest control yes
>    HLT activity state                       yes
>    Shutdown activity state                  yes
> @@ -93,10 +93,10 @@
>    MSR-load/store count recommendation      0
>    IA32_SMM_MONITOR_CTL[2] can be set to 1  yes
>    VMWRITE to VM-exit information fields    yes
> -  Inject event with insn length=0          no
> +  Inject event with insn length=0          yes
>    MSEG revision identifier                 0
>  VPID and EPT capabilities
> -  Hex: 0xf0106334141
> +  Hex: 0xf0106734141
>    Execute-only EPT translations            yes
>    Page-walk length 4                       yes
>    Paging-structure memory type UC          yes
> 
> And on a Xeon D-1540, I'm not seeing a crash but a kvm entry failure when
> resetting L1 while running Jailhouse:
> 
> KVM: entry failed, hardware error 0x7
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00a09b00
> SS =0000 00000000 0000ffff 00c09300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT=     00000000 0000ffff
> IDT=     00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b e0 00
> f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 
> Here is the vmxcap diff:
> 
> --- xeon-d	2019-07-10 22:29:56.735374032 +0200
> +++ i7-8850H	2019-07-10 22:29:31.747467248 +0200
> @@ -1,6 +1,6 @@
>  Basic VMX Information
> -  Hex: 0xda040000000012
> -  Revision                                 18
> +  Hex: 0xda040000000004
> +  Revision                                 4
>    VMCS size                                1024
>    VMCS restricted to 32 bit addresses      no
>    Dual-monitor support                     yes
> @@ -12,7 +12,7 @@ pin-based controls
>    NMI exiting                              yes
>    Virtual NMIs                             yes
>    Activate VMX-preemption timer            yes
> -  Process posted interrupts                yes
> +  Process posted interrupts                no
>  primary processor-based controls
>    Interrupt window exiting                 yes
>    Use TSC offsetting                       yes
> @@ -44,20 +44,20 @@ secondary processor-based controls
>    Enable VPID                              yes
>    WBINVD exiting                           yes
>    Unrestricted guest                       yes
> -  APIC register emulation                  yes
> -  Virtual interrupt delivery               yes
> +  APIC register emulation                  no
> +  Virtual interrupt delivery               no
>    PAUSE-loop exiting                       yes
>    RDRAND exiting                           yes
>    Enable INVPCID                           yes
>    Enable VM functions                      yes
>    VMCS shadowing                           yes
> -  Enable ENCLS exiting                     no
> +  Enable ENCLS exiting                     yes
>    RDSEED exiting                           yes
>    Enable PML                               yes
>    EPT-violation #VE                        yes
> -  Conceal non-root operation from PT       no
> -  Enable XSAVES/XRSTORS                    no
> -  Mode-based execute control (XS/XU)       no
> +  Conceal non-root operation from PT       yes
> +  Enable XSAVES/XRSTORS                    yes
> +  Mode-based execute control (XS/XU)       yes
>    TSC scaling                              no
>  VM-Exit controls
>    Save debug controls                      default
> @@ -69,8 +69,8 @@ VM-Exit controls
>    Save IA32_EFER                           yes
>    Load IA32_EFER                           yes
>    Save VMX-preemption timer value          yes
> -  Clear IA32_BNDCFGS                       no
> -  Conceal VM exits from PT                 no
> +  Clear IA32_BNDCFGS                       yes
> +  Conceal VM exits from PT                 yes
>  VM-Entry controls
>    Load debug controls                      default
>    IA-32e mode guest                        yes
> @@ -79,11 +79,11 @@ VM-Entry controls
>    Load IA32_PERF_GLOBAL_CTRL               yes
>    Load IA32_PAT                            yes
>    Load IA32_EFER                           yes
> -  Load IA32_BNDCFGS                        no
> -  Conceal VM entries from PT               no
> +  Load IA32_BNDCFGS                        yes
> +  Conceal VM entries from PT               yes
>  Miscellaneous data
> -  Hex: 0x300481e5
> -  VMX-preemption timer scale (log2)        5
> +  Hex: 0x7004c1e7
> +  VMX-preemption timer scale (log2)        7
>    Store EFER.LMA into IA-32e mode guest control yes
>    HLT activity state                       yes
>    Shutdown activity state                  yes
> @@ -93,10 +93,10 @@ Miscellaneous data
>    MSR-load/store count recommendation      0
>    IA32_SMM_MONITOR_CTL[2] can be set to 1  yes
>    VMWRITE to VM-exit information fields    yes
> -  Inject event with insn length=0          no
> +  Inject event with insn length=0          yes
>    MSEG revision identifier                 0
>  VPID and EPT capabilities
> -  Hex: 0xf0106334141
> +  Hex: 0xf0106734141
>    Execute-only EPT translations            yes
>    Page-walk length 4                       yes
>    Paging-structure memory type UC          yes
> 
> Maybe the KVM code does not take the latest VMX features into account when
> importing a userspace nested state?
> 
> Jan
> 
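
As a side note on reading the vmxcap output quoted above: the "Hex" values
are raw capability MSR contents, so the decoded lines can be cross-checked
by hand. Below is a minimal sketch in C, assuming the bit layout the Intel
SDM documents for IA32_VMX_BASIC and IA32_VMX_MISC. The struct and function
names are made up for illustration and are not taken from vmxcap or KVM.

```c
#include <stdint.h>

/* Hypothetical container for a few IA32_VMX_BASIC fields (layout per SDM). */
struct vmx_basic_info {
	uint32_t revision;     /* bits 30:0,  VMCS revision identifier */
	uint32_t vmcs_size;    /* bits 44:32, VMCS region size in bytes */
	int addr_32bit;        /* bit 48, VMCS restricted to 32-bit addresses */
	int dual_monitor;      /* bit 49, dual-monitor SMM support */
};

static struct vmx_basic_info decode_vmx_basic(uint64_t msr)
{
	struct vmx_basic_info info;

	info.revision     = (uint32_t)(msr & 0x7fffffffULL);
	info.vmcs_size    = (uint32_t)((msr >> 32) & 0x1fffULL);
	info.addr_32bit   = (int)((msr >> 48) & 1);
	info.dual_monitor = (int)((msr >> 49) & 1);
	return info;
}

/* IA32_VMX_MISC bits 4:0: VMX-preemption timer scale (log2). */
static unsigned int vmx_misc_preemption_timer_scale(uint32_t misc)
{
	return misc & 0x1f;
}

/* IA32_VMX_MISC bit 30: event injection with instruction length 0 allowed. */
static int vmx_misc_zero_len_inject(uint32_t misc)
{
	return (int)((misc >> 30) & 1);
}
```

With the values from the diffs, 0xda040000000012 decodes to revision 18 and
VMCS size 1024, while 0x7004c1e7 decodes to a preemption timer scale of 7
with zero-length injection allowed, which matches the decoded listing.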

* Re: KVM_SET_NESTED_STATE not yet stable
  2019-07-11 11:37       ` Ralf Ramsauer
@ 2019-07-11 17:30         ` Paolo Bonzini
  2019-07-19 16:38           ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2019-07-11 17:30 UTC (permalink / raw)
  To: Ralf Ramsauer, Jan Kiszka, Raslan, KarimAllah, jmattson, liran.alon, kvm

On 11/07/19 13:37, Ralf Ramsauer wrote:
> I can reproduce and confirm this issue. A system_reset of qemu after
> Jailhouse is enabled leads to the crash listed below, on all machines.
> 
> On the Xeon Gold, e.g., Qemu reports:
> 
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00a09b00
> SS =0000 00000000 0000ffff 00c09300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT=     00000000 0000ffff
> IDT=     00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b
> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00
> 
> Kernel:
> [ 1868.804515] kvm: vmptrld           (null)/6b8640000000 failed
> [ 1868.804568] kvm: vmclear fail:           (null)/6b8640000000
> 
> And the host freezes unrecoverably. Hosts use standard distro kernels,
> v5.0 or newer.

Thanks.  I'm going to look at it tomorrow.

Paolo

* Re: KVM_SET_NESTED_STATE not yet stable
  2019-07-11 17:30         ` Paolo Bonzini
@ 2019-07-19 16:38           ` Paolo Bonzini
  2019-07-21  9:05             ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2019-07-19 16:38 UTC (permalink / raw)
  To: Ralf Ramsauer, Jan Kiszka, Raslan, KarimAllah, jmattson, liran.alon, kvm

On 11/07/19 19:30, Paolo Bonzini wrote:
> On 11/07/19 13:37, Ralf Ramsauer wrote:
>> I can reproduce and confirm this issue. A system_reset of qemu after
>> Jailhouse is enabled leads to the crash listed below, on all machines.
>>
>> On the Xeon Gold, e.g., Qemu reports:
>>
>> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
>> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
>> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>> ES =0000 00000000 0000ffff 00009300
>> CS =f000 ffff0000 0000ffff 00a09b00
>> SS =0000 00000000 0000ffff 00c09300
>> DS =0000 00000000 0000ffff 00009300
>> FS =0000 00000000 0000ffff 00009300
>> GS =0000 00000000 0000ffff 00009300
>> LDT=0000 00000000 0000ffff 00008200
>> TR =0000 00000000 0000ffff 00008b00
>> GDT=     00000000 0000ffff
>> IDT=     00000000 0000ffff
>> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
>> DR3=0000000000000000
>> DR6=00000000ffff0ff0 DR7=0000000000000400
>> EFER=0000000000000000
>> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b
>> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00
>> 00 00 00 00
>>
>> Kernel:
>> [ 1868.804515] kvm: vmptrld           (null)/6b8640000000 failed
>> [ 1868.804568] kvm: vmclear fail:           (null)/6b8640000000
>>
>> And the host freezes unrecoverably. Hosts use standard distro kernels,
>> v5.0 or newer.
> 
> Thanks.  I'm going to look at it tomorrow.

Ok, it was only tomorrow modulo 7, but the first fix I got is trivial:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6e88f459b323..6119b30347c6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
 {
 	secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
 	vmcs_write64(VMCS_LINK_POINTER, -1ull);
+	vmx->nested.need_vmcs12_to_shadow_sync = false;
 }
 
 static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)

Can you try it and see what you get?
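
To illustrate what the one-line change guards against, here is a
simplified user-space model in C. It is only a sketch under stated
assumptions: it assumes the shadow VMCS is gone after the reset teardown
while the sync flag could still be set, which the diff itself does not
show. Only the field name need_vmcs12_to_shadow_sync comes from the patch;
the toy_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the nested VMX state; not the kernel struct. */
struct toy_nested {
	void *shadow_vmcs;                /* NULL once the shadow VMCS is gone */
	bool need_vmcs12_to_shadow_sync;  /* flag name taken from the patch */
};

/* Teardown without the fix: the pending-sync flag survives. */
static void toy_disable_shadow_unfixed(struct toy_nested *n)
{
	n->shadow_vmcs = NULL;
}

/* Teardown with the fix: the pending-sync flag is cleared as well. */
static void toy_disable_shadow_fixed(struct toy_nested *n)
{
	n->shadow_vmcs = NULL;
	n->need_vmcs12_to_shadow_sync = false;
}

/* True if the next run would try to sync into a missing shadow VMCS. */
static bool toy_run_touches_stale_shadow(const struct toy_nested *n)
{
	return n->need_vmcs12_to_shadow_sync && n->shadow_vmcs == NULL;
}
```

In this model, a sync request left pending across the teardown is what makes
the next run touch a shadow VMCS that no longer exists. Clearing the flag in
the same place that disables shadowing removes that window.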

Paolo


* Re: KVM_SET_NESTED_STATE not yet stable
  2019-07-19 16:38           ` Paolo Bonzini
@ 2019-07-21  9:05             ` Jan Kiszka
  2019-07-22 15:10               ` Ralf Ramsauer
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2019-07-21  9:05 UTC (permalink / raw)
  To: Paolo Bonzini, Ralf Ramsauer, Raslan, KarimAllah, jmattson,
	liran.alon, kvm

On 19.07.19 18:38, Paolo Bonzini wrote:
> On 11/07/19 19:30, Paolo Bonzini wrote:
>> On 11/07/19 13:37, Ralf Ramsauer wrote:
>>> I can reproduce and confirm this issue. A system_reset of qemu after
>>> Jailhouse is enabled leads to the crash listed below, on all machines.
>>>
>>> On the Xeon Gold, e.g., Qemu reports:
>>>
>>> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
>>> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
>>> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0000 00000000 0000ffff 00009300
>>> CS =f000 ffff0000 0000ffff 00a09b00
>>> SS =0000 00000000 0000ffff 00c09300
>>> DS =0000 00000000 0000ffff 00009300
>>> FS =0000 00000000 0000ffff 00009300
>>> GS =0000 00000000 0000ffff 00009300
>>> LDT=0000 00000000 0000ffff 00008200
>>> TR =0000 00000000 0000ffff 00008b00
>>> GDT=     00000000 0000ffff
>>> IDT=     00000000 0000ffff
>>> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
>>> DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000000
>>> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b
>>> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00
>>>
>>> Kernel:
>>> [ 1868.804515] kvm: vmptrld           (null)/6b8640000000 failed
>>> [ 1868.804568] kvm: vmclear fail:           (null)/6b8640000000
>>>
>>> And the host freezes unrecoverably. Hosts use standard distro kernels,
>>> v5.0 or newer.
>>
>> Thanks.  I'm going to look at it tomorrow.
>
> Ok, it was only tomorrow modulo 7, but the first fix I got is trivial:
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 6e88f459b323..6119b30347c6 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
>  {
>  	secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
>  	vmcs_write64(VMCS_LINK_POINTER, -1ull);
> +	vmx->nested.need_vmcs12_to_shadow_sync = false;
>  }
>
>  static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
>
> Can you try it and see what you get?
>

Confirmed that this fixes the host crashes for me as well.

The only remaining problem I see now is guest corruption on vmport/vmmouse
accesses from L2.
Looking into that right now.
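
For reference, the info1 value 5658000b in the kvm_nested_vmexit trace
lines earlier in this thread is the exit qualification of the
IO_INSTRUCTION exit. The following is a minimal sketch in C of how it
decodes, assuming the I/O exit qualification layout documented in the Intel
SDM; the struct and function names are hypothetical and not KVM's.

```c
#include <stdint.h>

/* Hypothetical decoded view of an I/O-instruction exit qualification. */
struct io_exit_qual {
	unsigned int size;  /* access size in bytes (bits 2:0 hold size - 1) */
	int is_in;          /* bit 3: 1 = IN (read), 0 = OUT (write) */
	int is_string;      /* bit 4: INS/OUTS */
	int is_rep;         /* bit 5: REP prefixed */
	uint16_t port;      /* bits 31:16: port number */
};

static struct io_exit_qual decode_io_exit_qual(uint64_t q)
{
	struct io_exit_qual d;

	d.size      = (unsigned int)(q & 7) + 1;
	d.is_in     = (int)((q >> 3) & 1);
	d.is_string = (int)((q >> 4) & 1);
	d.is_rep    = (int)((q >> 5) & 1);
	d.port      = (uint16_t)((q >> 16) & 0xffff);
	return d;
}
```

decode_io_exit_qual(0x5658000b) yields a 4-byte, non-string IN from port
0x5658, which agrees with the kvm_pio line "pio_read at 0x5658 size 4",
i.e. the vmmouse port read that keeps repeating.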

Jan

* Re: KVM_SET_NESTED_STATE not yet stable
  2019-07-21  9:05             ` Jan Kiszka
@ 2019-07-22 15:10               ` Ralf Ramsauer
  0 siblings, 0 replies; 10+ messages in thread
From: Ralf Ramsauer @ 2019-07-22 15:10 UTC (permalink / raw)
  To: Jan Kiszka, Paolo Bonzini, Raslan, KarimAllah, jmattson, liran.alon, kvm


On 7/21/19 11:05 AM, Jan Kiszka wrote:
> On 19.07.19 18:38, Paolo Bonzini wrote:
>> On 11/07/19 19:30, Paolo Bonzini wrote:
>>> On 11/07/19 13:37, Ralf Ramsauer wrote:
>>>> I can reproduce and confirm this issue. A system_reset of qemu after
>>>> Jailhouse is enabled leads to the crash listed below, on all machines.
>>>>
>>>> On the Xeon Gold, e.g., Qemu reports:
>>>>
>>>> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000f61
>>>> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
>>>> EIP=0000fff0 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>>> ES =0000 00000000 0000ffff 00009300
>>>> CS =f000 ffff0000 0000ffff 00a09b00
>>>> SS =0000 00000000 0000ffff 00c09300
>>>> DS =0000 00000000 0000ffff 00009300
>>>> FS =0000 00000000 0000ffff 00009300
>>>> GS =0000 00000000 0000ffff 00009300
>>>> LDT=0000 00000000 0000ffff 00008200
>>>> TR =0000 00000000 0000ffff 00008b00
>>>> GDT=     00000000 0000ffff
>>>> IDT=     00000000 0000ffff
>>>> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000680
>>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
>>>> DR3=0000000000000000
>>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>>> EFER=0000000000000000
>>>> Code=00 66 89 d8 66 e8 af a1 ff ff 66 83 c4 0c 66 5b 66 5e 66 c3 <ea> 5b
>>>> e0 00 f0 30 36 2f 32 33 2f 39 39 00 fc 00 00 00 00 00 00 00 00 00 00 00
>>>> 00 00 00 00
>>>>
>>>> Kernel:
>>>> [ 1868.804515] kvm: vmptrld           (null)/6b8640000000 failed
>>>> [ 1868.804568] kvm: vmclear fail:           (null)/6b8640000000
>>>>
>>>> And the host freezes unrecoverably. Hosts use standard distro kernels,
>>>> v5.0 or newer.
>>>
>>> Thanks.  I'm going to look at it tomorrow.
>>
>> Ok, it was only tomorrow modulo 7, but the first fix I got is trivial:
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 6e88f459b323..6119b30347c6 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
>>  {
>>  	secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
>>  	vmcs_write64(VMCS_LINK_POINTER, -1ull);
>> +	vmx->nested.need_vmcs12_to_shadow_sync = false;
>>  }
>>
>>  static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
>>
>> Can you try it and see what you get?
>>
> 
> Confirmed that this fixes the host crashes for me as well.

Works, thanks. Tested on v5.3-rc1, where the proper patch is already
applied. No more crashes, and QEMU resets as expected. Let's wait for the
backport…

  Ralf

> 
> Now I'm only still seeing guest corruptions on vmport/vmmouse accesses from L2.
> Looking into that right now.
> 
> Jan
> 

Thread overview: 10+ messages
2019-07-08 20:39 KVM_SET_NESTED_STATE not yet stable Jan Kiszka
2019-07-10 15:24 ` Raslan, KarimAllah
2019-07-10 16:05   ` Jan Kiszka
2019-07-10 20:31     ` Jan Kiszka
2019-07-10 21:14       ` Jan Kiszka
2019-07-11 11:37       ` Ralf Ramsauer
2019-07-11 17:30         ` Paolo Bonzini
2019-07-19 16:38           ` Paolo Bonzini
2019-07-21  9:05             ` Jan Kiszka
2019-07-22 15:10               ` Ralf Ramsauer
