kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
@ 2018-02-05 11:04 Wanpeng Li
       [not found] ` <CALMp9eQNB+2KLAR2Tv+By2oTthRLvAQMmJiEs61kFNrZDct3_g@mail.gmail.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2018-02-05 11:04 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář

From: Wanpeng Li <wanpengli@tencent.com>

In L0, Haswell client host:

 nested_vmx_exit_reflected failed vm entry 7
 WARNING: CPU: 6 PID: 6797 at kvm/arch/x86/kvm//vmx.c:6206 handle_desc+0x2d/0x40 [kvm_intel]
 CPU: 6 PID: 6797 Comm: qemu-system-x86 Tainted: G        W  OE    4.15.0+ #4
 RIP: 0010:handle_desc+0x2d/0x40 [kvm_intel]
 Call Trace:
  vmx_handle_exit+0xbd/0xe20 [kvm_intel]
  ? kvm_arch_vcpu_ioctl_run+0xcde/0x1c00 [kvm]
  kvm_arch_vcpu_ioctl_run+0xd5a/0x1c00 [kvm]
  kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
  ? kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
  ? __fget+0xfc/0x210
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x25/0x9c

This can be reproduced by running kvm-unit-tests/run_tests.sh vmx_controls in 
L1. UMIP CPUID bit is exposed to the L1 UMIP aware guest since it is emulated 
by enabling descriptor-table exits on L0. There is a vmentry fail when 
L0 tries to run L2 directly, the L1 guest architectural CR4 is not restored 
after this failure since commit 4f350c6dbcb (kvm: nVMX: Handle deferred early 
VMLAUNCH/VMRESUME failure properly). The L2 is kvm-unit-tests which will not 
write CR4 w/ X86_CR4_UMIP bit. After another L1 access descriptor vmexit, we 
check L2's architectural CR4 instead of L1's architectural CR4. This patch 
fixes it by restoring L1's architectural CR4 after L0's VMLAUNCH/VMRESUME 
failure.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/vmx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 23789c9..9fc0492 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11633,6 +11633,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 	 */
 	nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
 
+	vcpu->arch.cr4 = vmcs12->host_cr4;
 	load_vmcs12_mmu_host_state(vcpu, vmcs12);
 
 	/*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
       [not found] ` <CALMp9eQNB+2KLAR2Tv+By2oTthRLvAQMmJiEs61kFNrZDct3_g@mail.gmail.com>
@ 2018-02-05 18:24   ` Jim Mattson
  2018-02-06  0:57     ` Wanpeng Li
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Mattson @ 2018-02-05 18:24 UTC (permalink / raw)
  To: kvm list; +Cc: LKML

[Resending as plain text]

On Mon, Feb 5, 2018 at 10:21 AM Jim Mattson <jmattson@google.com> wrote:

> This is incorrect. In the event of an early VM-entry failure (e.g. a
> VM-entry failure for "VM entry with invalid control field(s)"), no host
> state should be loaded from the VMCS12. Of course, no guest state should
> have been loaded from the VMCS12 either, but that's a problem we have with
> deferring some VMCS12 control field checks to the hardware.

> CR4 should be unchanged from the time of the VMLAUNCH/VMRESUME. There is
> no guarantee that vmcs12->host_cr4 holds the correct value.


> On Mon, Feb 5, 2018 at 3:05 AM Wanpeng Li <kernellwp@gmail.com> wrote:

>> From: Wanpeng Li <wanpengli@tencent.com>

>> In L0, Haswell client host:

>>   nested_vmx_exit_reflected failed vm entry 7
>>   WARNING: CPU: 6 PID: 6797 at kvm/arch/x86/kvm//vmx.c:6206
>> handle_desc+0x2d/0x40 [kvm_intel]
>>   CPU: 6 PID: 6797 Comm: qemu-system-x86 Tainted: G        W  OE
>> 4.15.0+ #4
>>   RIP: 0010:handle_desc+0x2d/0x40 [kvm_intel]
>>   Call Trace:
>>    vmx_handle_exit+0xbd/0xe20 [kvm_intel]
>>    ? kvm_arch_vcpu_ioctl_run+0xcde/0x1c00 [kvm]
>>    kvm_arch_vcpu_ioctl_run+0xd5a/0x1c00 [kvm]
>>    kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
>>    ? kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
>>    ? __fget+0xfc/0x210
>>    ? __fget+0xfc/0x210
>>    do_vfs_ioctl+0xa4/0x6a0
>>    ? __fget+0x11d/0x210
>>    SyS_ioctl+0x79/0x90
>>    entry_SYSCALL_64_fastpath+0x25/0x9c

>> This can be reproduced by running kvm-unit-tests/run_tests.sh
>> vmx_controls in
>> L1. UMIP CPUID bit is exposed to the L1 UMIP aware guest since it is
>> emulated
>> by enabling descriptor-table exits on L0. There is a vmentry fail when
>> L0 tries to run L2 directly, the L1 guest architectural CR4 is not
>> restored
>> after this failure since commit 4f350c6dbcb (kvm: nVMX: Handle deferred
>> early
>> VMLAUNCH/VMRESUME failure properly). The L2 is kvm-unit-tests which will
>> not
>> write CR4 w/ X86_CR4_UMIP bit. After another L1 access descriptor vmexit,
>> we
>> check L2's architectural CR4 instead of L1's architectural CR4. This
>> patch
>> fixes it by restoring L1's architectural CR4 after L0's VMLAUNCH/VMRESUME
>> failure.

>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>> ---
>>   arch/x86/kvm/vmx.c | 1 +
>>   1 file changed, 1 insertion(+)

>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 23789c9..9fc0492 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11633,6 +11633,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu
>> *vcpu, u32 exit_reason,
>>           */
>>          nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);

>> +       vcpu->arch.cr4 = vmcs12->host_cr4;
>>          load_vmcs12_mmu_host_state(vcpu, vmcs12);

>>          /*
>> --
>> 2.7.4

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
  2018-02-05 18:24   ` Jim Mattson
@ 2018-02-06  0:57     ` Wanpeng Li
  2018-02-06 16:58       ` Jim Mattson
  0 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2018-02-06  0:57 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, LKML, Radim Krcmar, Paolo Bonzini

2018-02-06 2:24 GMT+08:00 Jim Mattson <jmattson@google.com>:
> [Resending as plain text]
>
> On Mon, Feb 5, 2018 at 10:21 AM Jim Mattson <jmattson@google.com> wrote:
>
>> This is incorrect. In the event of an early VM-entry failure (e.g. a
>> VM-entry failure for "VM entry with invalid control field(s)"), no host
>> state should be loaded from the VMCS12. Of course, no guest state should
>> have been loaded from the VMCS12 either, but that's a problem we have with
>> deferring some VMCS12 control field checks to the hardware.
>
>> CR4 should be unchanged from the time of the VMLAUNCH/VMRESUME. There is

This is effective one, what I restore in this patch is
achitectural/guest visible.

Regards,
Wanpeng Li

>> no guarantee that vmcs12->host_cr4 holds the correct value.
>
>
>> On Mon, Feb 5, 2018 at 3:05 AM Wanpeng Li <kernellwp@gmail.com> wrote:
>
>>> From: Wanpeng Li <wanpengli@tencent.com>
>
>>> In L0, Haswell client host:
>
>>>   nested_vmx_exit_reflected failed vm entry 7
>>>   WARNING: CPU: 6 PID: 6797 at kvm/arch/x86/kvm//vmx.c:6206
>>> handle_desc+0x2d/0x40 [kvm_intel]
>>>   CPU: 6 PID: 6797 Comm: qemu-system-x86 Tainted: G        W  OE
>>> 4.15.0+ #4
>>>   RIP: 0010:handle_desc+0x2d/0x40 [kvm_intel]
>>>   Call Trace:
>>>    vmx_handle_exit+0xbd/0xe20 [kvm_intel]
>>>    ? kvm_arch_vcpu_ioctl_run+0xcde/0x1c00 [kvm]
>>>    kvm_arch_vcpu_ioctl_run+0xd5a/0x1c00 [kvm]
>>>    kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
>>>    ? kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
>>>    ? __fget+0xfc/0x210
>>>    ? __fget+0xfc/0x210
>>>    do_vfs_ioctl+0xa4/0x6a0
>>>    ? __fget+0x11d/0x210
>>>    SyS_ioctl+0x79/0x90
>>>    entry_SYSCALL_64_fastpath+0x25/0x9c
>
>>> This can be reproduced by running kvm-unit-tests/run_tests.sh
>>> vmx_controls in
>>> L1. UMIP CPUID bit is exposed to the L1 UMIP aware guest since it is
>>> emulated
>>> by enabling descriptor-table exits on L0. There is a vmentry fail when
>>> L0 tries to run L2 directly, the L1 guest architectural CR4 is not
>>> restored
>>> after this failure since commit 4f350c6dbcb (kvm: nVMX: Handle deferred
>>> early
>>> VMLAUNCH/VMRESUME failure properly). The L2 is kvm-unit-tests which will
>>> not
>>> write CR4 w/ X86_CR4_UMIP bit. After another L1 access descriptor vmexit,
>>> we
>>> check L2's architectural CR4 instead of L1's architectural CR4. This
>>> patch
>>> fixes it by restoring L1's architectural CR4 after L0's VMLAUNCH/VMRESUME
>>> failure.
>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>>> ---
>>>   arch/x86/kvm/vmx.c | 1 +
>>>   1 file changed, 1 insertion(+)
>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 23789c9..9fc0492 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -11633,6 +11633,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu
>>> *vcpu, u32 exit_reason,
>>>           */
>>>          nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
>
>>> +       vcpu->arch.cr4 = vmcs12->host_cr4;
>>>          load_vmcs12_mmu_host_state(vcpu, vmcs12);
>
>>>          /*
>>> --
>>> 2.7.4

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
  2018-02-06  0:57     ` Wanpeng Li
@ 2018-02-06 16:58       ` Jim Mattson
  2018-02-07  8:31         ` Wanpeng Li
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Mattson @ 2018-02-06 16:58 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: kvm list, LKML, Radim Krcmar, Paolo Bonzini

On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <kernellwp@gmail.com> wrote:

> This is effective one, what I restore in this patch is
> achitectural/guest visible.

This patch doesn't "restore" the guest visible CR4 to its value at the
time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
That behavior is incorrect.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
  2018-02-06 16:58       ` Jim Mattson
@ 2018-02-07  8:31         ` Wanpeng Li
  2018-02-07 16:57           ` Jim Mattson
  0 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2018-02-07  8:31 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, LKML, Radim Krcmar, Paolo Bonzini

2018-02-07 0:58 GMT+08:00 Jim Mattson <jmattson@google.com>:
> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>
>> This is effective one, what I restore in this patch is
>> achitectural/guest visible.
>
> This patch doesn't "restore" the guest visible CR4 to its value at the
> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
> That behavior is incorrect.

You have another pointing out about this.
https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
up-to-date value when L1 is running, it is still up-to-date after
vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
comments, why vmcs12->host_cr3/cr4 is not the value which we should
restore?

* After an early L2 VM-entry failure, we're now back
* in L1 which thinks it just finished a VMLAUNCH or
* VMRESUME instruction

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
  2018-02-07  8:31         ` Wanpeng Li
@ 2018-02-07 16:57           ` Jim Mattson
  2018-02-08  6:22             ` Wanpeng Li
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Mattson @ 2018-02-07 16:57 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: kvm list, LKML, Radim Krcmar, Paolo Bonzini

vmcs12->host_cr[34] does not contain the up-to-date values when L1 is
running. L1 can vmwrite any values there. We know at this point that
they are legal (because we checked them), but that's about it. If the
VMLAUNCH/VMRESUME of vmcs12 fails for "invalid control field," there
is no VM-exit from L2 to L1, and these fields are not loaded. Instead,
execution just falls through to the next instruction with VMFailValid
semantics.

On Wed, Feb 7, 2018 at 12:31 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 2018-02-07 0:58 GMT+08:00 Jim Mattson <jmattson@google.com>:
>> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>
>>> This is effective one, what I restore in this patch is
>>> achitectural/guest visible.
>>
>> This patch doesn't "restore" the guest visible CR4 to its value at the
>> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
>> That behavior is incorrect.
>
> You have another pointing out about this.
> https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
> up-to-date value when L1 is running, it is still up-to-date after
> vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
> the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
> comments, why vmcs12->host_cr3/cr4 is not the value which we should
> restore?
>
> * After an early L2 VM-entry failure, we're now back
> * in L1 which thinks it just finished a VMLAUNCH or
> * VMRESUME instruction
>
> Regards,
> Wanpeng Li

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
  2018-02-07 16:57           ` Jim Mattson
@ 2018-02-08  6:22             ` Wanpeng Li
  2018-02-08 15:29               ` Jim Mattson
  0 siblings, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2018-02-08  6:22 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, LKML, Radim Krcmar, Paolo Bonzini

2018-02-08 0:57 GMT+08:00 Jim Mattson <jmattson@google.com>:
> vmcs12->host_cr[34] does not contain the up-to-date values when L1 is
> running. L1 can vmwrite any values there. We know at this point that

It will incur a vmexit to emulate L1 vmwrites vmcs12->host_cr[34] even
if vmcs shadow is enabled since host_cr[34] is not shadowed in the
bitmap, why it is not up-to-date when L1 is running?

Regards,
Wanpeng Li

> they are legal (because we checked them), but that's about it. If the
> VMLAUNCH/VMRESUME of vmcs12 fails for "invalid control field," there
> is no VM-exit from L2 to L1, and these fields are not loaded. Instead,
> execution just falls through to the next instruction with VMFailValid
> semantics.
>
> On Wed, Feb 7, 2018 at 12:31 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> 2018-02-07 0:58 GMT+08:00 Jim Mattson <jmattson@google.com>:
>>> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>>
>>>> This is effective one, what I restore in this patch is
>>>> achitectural/guest visible.
>>>
>>> This patch doesn't "restore" the guest visible CR4 to its value at the
>>> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
>>> That behavior is incorrect.
>>
>> You have another pointing out about this.
>> https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
>> up-to-date value when L1 is running, it is still up-to-date after
>> vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
>> the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
>> comments, why vmcs12->host_cr3/cr4 is not the value which we should
>> restore?
>>
>> * After an early L2 VM-entry failure, we're now back
>> * in L1 which thinks it just finished a VMLAUNCH or
>> * VMRESUME instruction
>>
>> Regards,
>> Wanpeng Li

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
  2018-02-08  6:22             ` Wanpeng Li
@ 2018-02-08 15:29               ` Jim Mattson
  2018-02-08 15:54                 ` Jim Mattson
  2018-02-11 11:56                 ` Wanpeng Li
  0 siblings, 2 replies; 12+ messages in thread
From: Jim Mattson @ 2018-02-08 15:29 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: kvm list, LKML, Radim Krcmar, Paolo Bonzini

Consider the following scenario:

L1 has never successfully executed VMLAUNCH. It has written 0 to
vmcs12's host CR3 field using VMWRITE, but the current host CR3 value
is actually 3e7000. It has written some illegal control field that the
L0 KVM doesn't check itself, but defers to the hardware checks on
vmcs02 instead. So, when L1 tries to execute VMLAUNCH, L0 follows this
path for "VM-entry to vmcs02 failed due to invalid control field(s)."
Your change would set CR3 to 0, which is incorrect. CR3 should
actually be set to 3e7000. Now, if L0 is sane and using EPT, then it
can find the correct L1 CR3 value in vmcs01's Guest CR3 field, but if
for some reason L0 is using shadow paging to execute L1, that won't
work. Similarly, the correct L1 CR4 value should be in vmcs01's CR4
read shadow field.

You can't just assume that L1 has written values to the vmcs12 host
fields that actually match the current host values. There is nothing
in the architecture that would require this behavior.

On Wed, Feb 7, 2018 at 10:22 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 2018-02-08 0:57 GMT+08:00 Jim Mattson <jmattson@google.com>:
>> vmcs12->host_cr[34] does not contain the up-to-date values when L1 is
>> running. L1 can vmwrite any values there. We know at this point that
>
> It will incur a vmexit to emulate L1 vmwrites vmcs12->host_cr[34] even
> if vmcs shadow is enabled since host_cr[34] is not shadowed in the
> bitmap, why it is not up-to-date when L1 is running?
>
> Regards,
> Wanpeng Li
>
>> they are legal (because we checked them), but that's about it. If the
>> VMLAUNCH/VMRESUME of vmcs12 fails for "invalid control field," there
>> is no VM-exit from L2 to L1, and these fields are not loaded. Instead,
>> execution just falls through to the next instruction with VMFailValid
>> semantics.
>>
>> On Wed, Feb 7, 2018 at 12:31 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>> 2018-02-07 0:58 GMT+08:00 Jim Mattson <jmattson@google.com>:
>>>> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>>>
>>>>> This is effective one, what I restore in this patch is
>>>>> achitectural/guest visible.
>>>>
>>>> This patch doesn't "restore" the guest visible CR4 to its value at the
>>>> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
>>>> That behavior is incorrect.
>>>
>>> You have another pointing out about this.
>>> https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
>>> up-to-date value when L1 is running, it is still up-to-date after
>>> vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
>>> the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
>>> comments, why vmcs12->host_cr3/cr4 is not the value which we should
>>> restore?
>>>
>>> * After an early L2 VM-entry failure, we're now back
>>> * in L1 which thinks it just finished a VMLAUNCH or
>>> * VMRESUME instruction
>>>
>>> Regards,
>>> Wanpeng Li

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
  2018-02-08 15:29               ` Jim Mattson
@ 2018-02-08 15:54                 ` Jim Mattson
  2018-02-08 16:40                   ` Paolo Bonzini
  2018-02-11 11:56                 ` Wanpeng Li
  1 sibling, 1 reply; 12+ messages in thread
From: Jim Mattson @ 2018-02-08 15:54 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: kvm list, LKML, Radim Krcmar, Paolo Bonzini

On Thu, Feb 8, 2018 at 7:29 AM, Jim Mattson <jmattson@google.com> wrote:
> Similarly, the correct L1 CR4 value should be in vmcs01's CR4
> read shadow field.

Sorry; that's wrong. L1's CR4 value has to be reconstructed from the
vmcs01 guest CR4 field and CR4 shadow field using the cr4 guest/host
mask. But there is no way to get it from any field(s) in vmcs12.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
  2018-02-08 15:54                 ` Jim Mattson
@ 2018-02-08 16:40                   ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2018-02-08 16:40 UTC (permalink / raw)
  To: Jim Mattson, Wanpeng Li; +Cc: kvm list, LKML, Radim Krcmar

On 08/02/2018 16:54, Jim Mattson wrote:
> On Thu, Feb 8, 2018 at 7:29 AM, Jim Mattson <jmattson@google.com> wrote:
>> Similarly, the correct L1 CR4 value should be in vmcs01's CR4
>> read shadow field.
> Sorry; that's wrong. L1's CR4 value has to be reconstructed from the
> vmcs01 guest CR4 field and CR4 shadow field using the cr4 guest/host
> mask. But there is no way to get it from any field(s) in vmcs12.

Now that we have the prepare_vmcs02_full/prepare_vmcs02 split, we
probably should do more checks in there, and not rely on the processor
anymore.

Paolo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
  2018-02-08 15:29               ` Jim Mattson
  2018-02-08 15:54                 ` Jim Mattson
@ 2018-02-11 11:56                 ` Wanpeng Li
  2018-02-12 17:37                   ` Jim Mattson
  1 sibling, 1 reply; 12+ messages in thread
From: Wanpeng Li @ 2018-02-11 11:56 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, LKML, Radim Krcmar, Paolo Bonzini

2018-02-08 23:29 GMT+08:00 Jim Mattson <jmattson@google.com>:
> Consider the following scenario:
>
> L1 has never successfully executed VMLAUNCH. It has written 0 to
> vmcs12's host CR3 field using VMWRITE, but the current host CR3 value

Writes 0 to cr3 can't be detected during vmentry checks by hardware.

> is actually 3e7000. It has written some illegal control field that the
> L0 KVM doesn't check itself, but defers to the hardware checks on
> vmcs02 instead. So, when L1 tries to execute VMLAUNCH, L0 follows this
> path for "VM-entry to vmcs02 failed due to invalid control field(s)."
> Your change would set CR3 to 0, which is incorrect. CR3 should
> actually be set to 3e7000. Now, if L0 is sane and using EPT, then it
> can find the correct L1 CR3 value in vmcs01's Guest CR3 field, but if
> for some reason L0 is using shadow paging to execute L1, that won't
> work. Similarly, the correct L1 CR4 value should be in vmcs01's CR4
> read shadow field.
>
> You can't just assume that L1 has written values to the vmcs12 host
> fields that actually match the current host values. There is nothing
> in the architecture that would require this behavior.
>
> On Wed, Feb 7, 2018 at 10:22 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> 2018-02-08 0:57 GMT+08:00 Jim Mattson <jmattson@google.com>:
>>> vmcs12->host_cr[34] does not contain the up-to-date values when L1 is
>>> running. L1 can vmwrite any values there. We know at this point that
>>
>> It will incur a vmexit to emulate L1 vmwrites vmcs12->host_cr[34] even
>> if vmcs shadow is enabled since host_cr[34] is not shadowed in the
>> bitmap, why it is not up-to-date when L1 is running?
>>
>> Regards,
>> Wanpeng Li
>>
>>> they are legal (because we checked them), but that's about it. If the
>>> VMLAUNCH/VMRESUME of vmcs12 fails for "invalid control field," there
>>> is no VM-exit from L2 to L1, and these fields are not loaded. Instead,
>>> execution just falls through to the next instruction with VMFailValid
>>> semantics.
>>>
>>> On Wed, Feb 7, 2018 at 12:31 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>>> 2018-02-07 0:58 GMT+08:00 Jim Mattson <jmattson@google.com>:
>>>>> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>>>>
>>>>>> This is effective one, what I restore in this patch is
>>>>>> achitectural/guest visible.
>>>>>
>>>>> This patch doesn't "restore" the guest visible CR4 to its value at the
>>>>> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
>>>>> That behavior is incorrect.
>>>>
>>>> You have another pointing out about this.
>>>> https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
>>>> up-to-date value when L1 is running, it is still up-to-date after
>>>> vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
>>>> the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
>>>> comments, why vmcs12->host_cr3/cr4 is not the value which we should
>>>> restore?
>>>>
>>>> * After an early L2 VM-entry failure, we're now back
>>>> * in L1 which thinks it just finished a VMLAUNCH or
>>>> * VMRESUME instruction
>>>>
>>>> Regards,
>>>> Wanpeng Li

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure
  2018-02-11 11:56                 ` Wanpeng Li
@ 2018-02-12 17:37                   ` Jim Mattson
  0 siblings, 0 replies; 12+ messages in thread
From: Jim Mattson @ 2018-02-12 17:37 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: kvm list, LKML, Radim Krcmar, Paolo Bonzini

On Sun, Feb 11, 2018 at 3:56 AM, Wanpeng Li <kernellwp@gmail.com> wrote:

> Writes 0 to cr3 can't be detected during vmentry checks by hardware.

I never suggested otherwise. I was just trying to explain why you
can't assume that the host CR3 field in the VMCS matches the host CR3
at the time of VMLAUNCH.

KVM is set up for failure, because it loads a bunch of guest state
before checking the validity of all of the control fields. If a
control field in the vmcs12 is invalid, but KVM doesn't recognize this
until after it has loaded guest state, all of the host state that has
been overwritten should be restored. "Restored" does not mean "loaded
from the vmcs12." It means reverted to its state at the time of the
failed VMLAUNCH/VMRESUME.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-02-12 17:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-05 11:04 [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure Wanpeng Li
     [not found] ` <CALMp9eQNB+2KLAR2Tv+By2oTthRLvAQMmJiEs61kFNrZDct3_g@mail.gmail.com>
2018-02-05 18:24   ` Jim Mattson
2018-02-06  0:57     ` Wanpeng Li
2018-02-06 16:58       ` Jim Mattson
2018-02-07  8:31         ` Wanpeng Li
2018-02-07 16:57           ` Jim Mattson
2018-02-08  6:22             ` Wanpeng Li
2018-02-08 15:29               ` Jim Mattson
2018-02-08 15:54                 ` Jim Mattson
2018-02-08 16:40                   ` Paolo Bonzini
2018-02-11 11:56                 ` Wanpeng Li
2018-02-12 17:37                   ` Jim Mattson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).