All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT
@ 2017-03-17 14:41 Wanpeng Li
  2017-03-17 14:47 ` Paolo Bonzini
  2017-03-17 17:28 ` Ladi Prosek
  0 siblings, 2 replies; 8+ messages in thread
From: Wanpeng Li @ 2017-03-17 14:41 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li, Ladi Prosek

From: Wanpeng Li <wanpeng.li@hotmail.com>

The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that 
L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:

qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14

Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is 
uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE 
paging and EPT enabled. However, there is a progress to switch from Legacy 
mode's such-mode Protected mode to Long mode during system boot, the check 
in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in 
Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually 
the original commit should just intended to prevent to dereference L2's CR3 
if the L1 hypervisor emulates L2's real mode through vm8086.  

This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and 
!vm86_active.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/vmx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c664365..2b2a05f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
 static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
 			       u32 *entry_failure_code)
 {
-	if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
+	if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
 		if (!nested_cr3_valid(vcpu, cr3)) {
 			*entry_failure_code = ENTRY_FAIL_DEFAULT;
 			return 1;
@@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
 		 * must not be dereferenced.
 		 */
 		if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
-		    !nested_ept) {
+		    !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
 			if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
 				*entry_failure_code = ENTRY_FAIL_PDPTE;
 				return 1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT
  2017-03-17 14:41 [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT Wanpeng Li
@ 2017-03-17 14:47 ` Paolo Bonzini
  2017-03-17 17:28 ` Ladi Prosek
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2017-03-17 14:47 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Wanpeng Li, Ladi Prosek



On 17/03/2017 15:41, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that 
> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:
> 
> qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
> qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
> qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14
> 
> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is 
> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE 
> paging and EPT enabled. However, there is a progress to switch from Legacy 
> mode's such-mode Protected mode to Long mode during system boot, the check 
> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in 
> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually 
> the original commit should just intended to prevent to dereference L2's CR3 
> if the L1 hypervisor emulates L2's real mode through vm8086.  
> 
> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and 
> !vm86_active.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Ladi Prosek <lprosek@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>

Please provide a testcase.  I know this is a regression, but I'm not
going to merge the fix without a corresponding patch to kvm-unit-tests.

Paolo

> ---
>  arch/x86/kvm/vmx.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c664365..2b2a05f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>  			       u32 *entry_failure_code)
>  {
> -	if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
> +	if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>  		if (!nested_cr3_valid(vcpu, cr3)) {
>  			*entry_failure_code = ENTRY_FAIL_DEFAULT;
>  			return 1;
> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>  		 * must not be dereferenced.
>  		 */
>  		if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
> -		    !nested_ept) {
> +		    !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>  			if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>  				*entry_failure_code = ENTRY_FAIL_PDPTE;
>  				return 1;
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT
  2017-03-17 14:41 [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT Wanpeng Li
  2017-03-17 14:47 ` Paolo Bonzini
@ 2017-03-17 17:28 ` Ladi Prosek
  2017-03-17 17:33   ` Paolo Bonzini
  2017-03-18  6:37   ` Wanpeng Li
  1 sibling, 2 replies; 8+ messages in thread
From: Ladi Prosek @ 2017-03-17 17:28 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, KVM list, Paolo Bonzini,
	Radim Krčmář,
	Wanpeng Li

On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that
> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:
>
> qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
> qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
> qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14
>
> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is
> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE
> paging and EPT enabled. However, there is a progress to switch from Legacy
> mode's such-mode Protected mode to Long mode during system boot, the check
> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in
> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually
> the original commit should just intended to prevent to dereference L2's CR3
> if the L1 hypervisor emulates L2's real mode through vm8086.
>
> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and
> !vm86_active.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Ladi Prosek <lprosek@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  arch/x86/kvm/vmx.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c664365..2b2a05f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>                                u32 *entry_failure_code)
>  {
> -       if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
> +       if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>                 if (!nested_cr3_valid(vcpu, cr3)) {
>                         *entry_failure_code = ENTRY_FAIL_DEFAULT;
>                         return 1;
> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>                  * must not be dereferenced.
>                  */
>                 if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
> -                   !nested_ept) {
> +                   !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {

This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
as before 7ca29de21362.

I'll take a closer look next week. Is there an easy way for me to
reproduce the issue you're seeing?

>                         if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>                                 *entry_failure_code = ENTRY_FAIL_PDPTE;
>                                 return 1;
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT
  2017-03-17 17:28 ` Ladi Prosek
@ 2017-03-17 17:33   ` Paolo Bonzini
  2017-03-18  6:37   ` Wanpeng Li
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2017-03-17 17:33 UTC (permalink / raw)
  To: Ladi Prosek, Wanpeng Li
  Cc: linux-kernel, KVM list, Radim Krčmář, Wanpeng Li



On 17/03/2017 18:28, Ladi Prosek wrote:
> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that
>> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:
>>
>> qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14
>>
>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
>> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is
>> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE
>> paging and EPT enabled. However, there is a progress to switch from Legacy
>> mode's such-mode Protected mode to Long mode during system boot, the check
>> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in
>> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually
>> the original commit should just intended to prevent to dereference L2's CR3
>> if the L1 hypervisor emulates L2's real mode through vm8086.
>>
>> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and
>> !vm86_active.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Ladi Prosek <lprosek@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>>  arch/x86/kvm/vmx.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c664365..2b2a05f 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>>                                u32 *entry_failure_code)
>>  {
>> -       if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>> +       if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>>                 if (!nested_cr3_valid(vcpu, cr3)) {
>>                         *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>                         return 1;
>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>                  * must not be dereferenced.
>>                  */
>>                 if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>> -                   !nested_ept) {
>> +                   !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
> 
> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
> as before 7ca29de21362.

Looks like we need _two_ testcases then... :)

Paolo

> I'll take a closer look next week. Is there an easy way for me to
> reproduce the issue you're seeing?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT
  2017-03-17 17:28 ` Ladi Prosek
  2017-03-17 17:33   ` Paolo Bonzini
@ 2017-03-18  6:37   ` Wanpeng Li
  2017-03-22 12:00     ` Ladi Prosek
  1 sibling, 1 reply; 8+ messages in thread
From: Wanpeng Li @ 2017-03-18  6:37 UTC (permalink / raw)
  To: Ladi Prosek
  Cc: linux-kernel, KVM list, Paolo Bonzini,
	Radim Krčmář,
	Wanpeng Li

2017-03-18 1:28 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that
>> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:
>>
>> qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14
>>
>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
>> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is
>> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE
>> paging and EPT enabled. However, there is a progress to switch from Legacy
>> mode's such-mode Protected mode to Long mode during system boot, the check
>> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in
>> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually
>> the original commit should just intended to prevent to dereference L2's CR3
>> if the L1 hypervisor emulates L2's real mode through vm8086.
>>
>> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and
>> !vm86_active.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Ladi Prosek <lprosek@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>>  arch/x86/kvm/vmx.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c664365..2b2a05f 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>>                                u32 *entry_failure_code)
>>  {
>> -       if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>> +       if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>>                 if (!nested_cr3_valid(vcpu, cr3)) {
>>                         *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>                         return 1;
>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>                  * must not be dereferenced.
>>                  */
>>                 if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>> -                   !nested_ept) {
>> +                   !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>
> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
> as before 7ca29de21362.

Hmm, I miss the function pdptrs_changed() will also dereference CR3.
How about something like this:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c664365..d7ebf03 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9933,7 +9933,9 @@ static bool nested_cr3_valid(struct kvm_vcpu
*vcpu, unsigned long val)
 static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long
cr3, bool nested_ept,
                    u32 *entry_failure_code)
 {
-    if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
+    if (cr3 != kvm_read_cr3(vcpu) ||
+        (!(nested_ept && to_vmx(vcpu)->rmode.vm86_active) &&
+        pdptrs_changed(vcpu))) {
         if (!nested_cr3_valid(vcpu, cr3)) {
             *entry_failure_code = ENTRY_FAIL_DEFAULT;
             return 1;
@@ -9944,7 +9946,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu
*vcpu, unsigned long cr3, bool ne
          * must not be dereferenced.
          */
         if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
-            !nested_ept) {
+            !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
             if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
                 *entry_failure_code = ENTRY_FAIL_PDPTE;
                 return 1;

>
> I'll take a closer look next week. Is there an easy way for me to
> reproduce the issue you're seeing?

L1 KVM w/ EPT = 0 and L0 KVM w/ EPT = 1.

Regards,
Wanpeng Li

>
>>                         if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>>                                 *entry_failure_code = ENTRY_FAIL_PDPTE;
>>                                 return 1;
>> --
>> 2.7.4
>>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT
  2017-03-18  6:37   ` Wanpeng Li
@ 2017-03-22 12:00     ` Ladi Prosek
  2017-03-22 13:44       ` Wanpeng Li
  0 siblings, 1 reply; 8+ messages in thread
From: Ladi Prosek @ 2017-03-22 12:00 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, KVM list, Paolo Bonzini,
	Radim Krčmář,
	Wanpeng Li

On Sat, Mar 18, 2017 at 7:37 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 2017-03-18 1:28 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
>> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>
>>> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that
>>> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:
>>>
>>> qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
>>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
>>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14
>>>
>>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
>>> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is
>>> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE
>>> paging and EPT enabled. However, there is a progress to switch from Legacy
>>> mode's such-mode Protected mode to Long mode during system boot, the check
>>> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in
>>> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually
>>> the original commit should just intended to prevent to dereference L2's CR3
>>> if the L1 hypervisor emulates L2's real mode through vm8086.
>>>
>>> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and
>>> !vm86_active.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>> Cc: Ladi Prosek <lprosek@redhat.com>
>>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>> ---
>>>  arch/x86/kvm/vmx.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index c664365..2b2a05f 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>>>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>>>                                u32 *entry_failure_code)
>>>  {
>>> -       if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>>> +       if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>>>                 if (!nested_cr3_valid(vcpu, cr3)) {
>>>                         *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>>                         return 1;
>>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>>                  * must not be dereferenced.
>>>                  */
>>>                 if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>>> -                   !nested_ept) {
>>> +                   !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>>
>> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
>> as before 7ca29de21362.
>
> Hmm, I miss the function pdptrs_changed() will also dereference CR3.
> How about something like this:
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c664365..d7ebf03 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9933,7 +9933,9 @@ static bool nested_cr3_valid(struct kvm_vcpu
> *vcpu, unsigned long val)
>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long
> cr3, bool nested_ept,
>                     u32 *entry_failure_code)
>  {
> -    if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
> +    if (cr3 != kvm_read_cr3(vcpu) ||
> +        (!(nested_ept && to_vmx(vcpu)->rmode.vm86_active) &&
> +        pdptrs_changed(vcpu))) {
>          if (!nested_cr3_valid(vcpu, cr3)) {
>              *entry_failure_code = ENTRY_FAIL_DEFAULT;
>              return 1;
> @@ -9944,7 +9946,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu
> *vcpu, unsigned long cr3, bool ne
>           * must not be dereferenced.
>           */
>          if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
> -            !nested_ept) {
> +            !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>              if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>                  *entry_failure_code = ENTRY_FAIL_PDPTE;
>                  return 1;

Still the same, Hyper-V is broken. The problem is not in real vs.
protected mode. The way nested_ept_enabled is computed is incorrect.

I can run both Hyper-V and KVM with EPT = 0 in L1 with this patch. Can
you please give it a try?

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 98e82ee..9145c94 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10121,7 +10121,7 @@ static int prepare_vmcs02(struct kvm_vcpu
*vcpu, struct vmcs12 *vmcs12,
                                vmcs12->guest_intr_status);
                }

-               nested_ept_enabled = (exec_control &
SECONDARY_EXEC_ENABLE_EPT) != 0;
+               nested_ept_enabled =
(vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0;

                /*
                 * Write an illegal value to APIC_ACCESS_ADDR. Later,

Thanks!
Ladi

>> I'll take a closer look next week. Is there an easy way for me to
>> reproduce the issue you're seeing?
>
> L1 KVM w/ EPT = 0 and L0 KVM w/ EPT = 1.
>
> Regards,
> Wanpeng Li
>
>>
>>>                         if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>>>                                 *entry_failure_code = ENTRY_FAIL_PDPTE;
>>>                                 return 1;
>>> --
>>> 2.7.4
>>>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT
  2017-03-22 12:00     ` Ladi Prosek
@ 2017-03-22 13:44       ` Wanpeng Li
  2017-03-22 14:17         ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: Wanpeng Li @ 2017-03-22 13:44 UTC (permalink / raw)
  To: Ladi Prosek
  Cc: linux-kernel, KVM list, Paolo Bonzini,
	Radim Krčmář,
	Wanpeng Li

2017-03-22 20:00 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
> On Sat, Mar 18, 2017 at 7:37 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> 2017-03-18 1:28 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
>>> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>>
>>>> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that
>>>> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:
>>>>
>>>> qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
>>>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
>>>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14
>>>>
>>>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
>>>> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is
>>>> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE
>>>> paging and EPT enabled. However, there is a progress to switch from Legacy
>>>> mode's such-mode Protected mode to Long mode during system boot, the check
>>>> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in
>>>> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually
>>>> the original commit should just intended to prevent to dereference L2's CR3
>>>> if the L1 hypervisor emulates L2's real mode through vm8086.
>>>>
>>>> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and
>>>> !vm86_active.
>>>>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>>> Cc: Ladi Prosek <lprosek@redhat.com>
>>>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>>> ---
>>>>  arch/x86/kvm/vmx.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>> index c664365..2b2a05f 100644
>>>> --- a/arch/x86/kvm/vmx.c
>>>> +++ b/arch/x86/kvm/vmx.c
>>>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>>>>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>>>>                                u32 *entry_failure_code)
>>>>  {
>>>> -       if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>>>> +       if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>>>>                 if (!nested_cr3_valid(vcpu, cr3)) {
>>>>                         *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>>>                         return 1;
>>>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>>>                  * must not be dereferenced.
>>>>                  */
>>>>                 if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>>>> -                   !nested_ept) {
>>>> +                   !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>>>
>>> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
>>> as before 7ca29de21362.
>>
>> Hmm, I miss the function pdptrs_changed() will also dereference CR3.
>> How about something like this:
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c664365..d7ebf03 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -9933,7 +9933,9 @@ static bool nested_cr3_valid(struct kvm_vcpu
>> *vcpu, unsigned long val)
>>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long
>> cr3, bool nested_ept,
>>                     u32 *entry_failure_code)
>>  {
>> -    if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>> +    if (cr3 != kvm_read_cr3(vcpu) ||
>> +        (!(nested_ept && to_vmx(vcpu)->rmode.vm86_active) &&
>> +        pdptrs_changed(vcpu))) {
>>          if (!nested_cr3_valid(vcpu, cr3)) {
>>              *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>              return 1;
>> @@ -9944,7 +9946,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu
>> *vcpu, unsigned long cr3, bool ne
>>           * must not be dereferenced.
>>           */
>>          if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>> -            !nested_ept) {
>> +            !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>>              if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>>                  *entry_failure_code = ENTRY_FAIL_PDPTE;
>>                  return 1;
>
> Still the same, Hyper-V is broken. The problem is not in real vs.
> protected mode. The way nested_ept_enabled is computed is incorrect.
>
> I can run both Hyper-V and KVM with EPT = 0 in L1 with this patch. Can
> you please give it a try?
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 98e82ee..9145c94 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10121,7 +10121,7 @@ static int prepare_vmcs02(struct kvm_vcpu
> *vcpu, struct vmcs12 *vmcs12,
>                                 vmcs12->guest_intr_status);
>                 }
>
> -               nested_ept_enabled = (exec_control &
> SECONDARY_EXEC_ENABLE_EPT) != 0;
> +               nested_ept_enabled =
> (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0;
>
>                 /*
>                  * Write an illegal value to APIC_ACCESS_ADDR. Later,

You are right, it works. Please send out a formal patch and add the
kvm-unit-tests as Paolo mentioned.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT
  2017-03-22 13:44       ` Wanpeng Li
@ 2017-03-22 14:17         ` Paolo Bonzini
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2017-03-22 14:17 UTC (permalink / raw)
  To: Wanpeng Li, Ladi Prosek
  Cc: linux-kernel, KVM list, Radim Krčmář, Wanpeng Li



On 22/03/2017 14:44, Wanpeng Li wrote:
> 2017-03-22 20:00 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
>> On Sat, Mar 18, 2017 at 7:37 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>> 2017-03-18 1:28 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
>>>> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>>>
>>>>> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that
>>>>> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:
>>>>>
>>>>> qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
>>>>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
>>>>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14
>>>>>
>>>>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
>>>>> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is
>>>>> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE
>>>>> paging and EPT enabled. However, there is a progress to switch from Legacy
>>>>> mode's such-mode Protected mode to Long mode during system boot, the check
>>>>> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in
>>>>> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually
>>>>> the original commit should just intended to prevent to dereference L2's CR3
>>>>> if the L1 hypervisor emulates L2's real mode through vm8086.
>>>>>
>>>>> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and
>>>>> !vm86_active.
>>>>>
>>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>>>> Cc: Ladi Prosek <lprosek@redhat.com>
>>>>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>>>> ---
>>>>>  arch/x86/kvm/vmx.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index c664365..2b2a05f 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>>>>>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>>>>>                                u32 *entry_failure_code)
>>>>>  {
>>>>> -       if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>>>>> +       if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>>>>>                 if (!nested_cr3_valid(vcpu, cr3)) {
>>>>>                         *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>>>>                         return 1;
>>>>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>>>>                  * must not be dereferenced.
>>>>>                  */
>>>>>                 if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>>>>> -                   !nested_ept) {
>>>>> +                   !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>>>>
>>>> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
>>>> as before 7ca29de21362.
>>>
>>> Hmm, I miss the function pdptrs_changed() will also dereference CR3.
>>> How about something like this:
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index c664365..d7ebf03 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -9933,7 +9933,9 @@ static bool nested_cr3_valid(struct kvm_vcpu
>>> *vcpu, unsigned long val)
>>>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long
>>> cr3, bool nested_ept,
>>>                     u32 *entry_failure_code)
>>>  {
>>> -    if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>>> +    if (cr3 != kvm_read_cr3(vcpu) ||
>>> +        (!(nested_ept && to_vmx(vcpu)->rmode.vm86_active) &&
>>> +        pdptrs_changed(vcpu))) {
>>>          if (!nested_cr3_valid(vcpu, cr3)) {
>>>              *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>>              return 1;
>>> @@ -9944,7 +9946,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu
>>> *vcpu, unsigned long cr3, bool ne
>>>           * must not be dereferenced.
>>>           */
>>>          if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>>> -            !nested_ept) {
>>> +            !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>>>              if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>>>                  *entry_failure_code = ENTRY_FAIL_PDPTE;
>>>                  return 1;
>>
>> Still the same, Hyper-V is broken. The problem is not in real vs.
>> protected mode. The way nested_ept_enabled is computed is incorrect.
>>
>> I can run both Hyper-V and KVM with EPT = 0 in L1 with this patch. Can
>> you please give it a try?
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 98e82ee..9145c94 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -10121,7 +10121,7 @@ static int prepare_vmcs02(struct kvm_vcpu
>> *vcpu, struct vmcs12 *vmcs12,
>>                                 vmcs12->guest_intr_status);
>>                 }
>>
>> -               nested_ept_enabled = (exec_control &
>> SECONDARY_EXEC_ENABLE_EPT) != 0;
>> +               nested_ept_enabled =
>> (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0;
>>
>>                 /*
>>                  * Write an illegal value to APIC_ACCESS_ADDR. Later,
> 
> You are right, it works. Please send out a formal patch and add the
> kvm-unit-tests as Paolo mentioned.

I do believe it would be more fair if _you_ send out the kvm-unit-tests
patch.

Paolo

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-03-22 14:17 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-17 14:41 [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT Wanpeng Li
2017-03-17 14:47 ` Paolo Bonzini
2017-03-17 17:28 ` Ladi Prosek
2017-03-17 17:33   ` Paolo Bonzini
2017-03-18  6:37   ` Wanpeng Li
2017-03-22 12:00     ` Ladi Prosek
2017-03-22 13:44       ` Wanpeng Li
2017-03-22 14:17         ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.