All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ladi Prosek <lprosek@redhat.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"KVM list" <kvm@vger.kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Wanpeng Li" <wanpeng.li@hotmail.com>
Subject: Re: [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT
Date: Wed, 22 Mar 2017 13:00:57 +0100	[thread overview]
Message-ID: <CABdb735TnqpdfHcgUu7eJh6eJ+S45tY55u4fWNJrRh3ujK-X_g@mail.gmail.com> (raw)
In-Reply-To: <CANRm+CwBgXWoN29PsP-zhyzwq6X3RopxgLS9sj4vep_BTFAALw@mail.gmail.com>

On Sat, Mar 18, 2017 at 7:37 AM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 2017-03-18 1:28 GMT+08:00 Ladi Prosek <lprosek@redhat.com>:
>> On Fri, Mar 17, 2017 at 3:41 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>
>>> The L2 guest hang if shadow page tables on EPT, the trace on L1 shows that
>>> L2 kvm_exit reason EXCEPTION_NMI and page fault repeatedly:
>>>
>>> qemu-system-x86-2821  [003] d..2    45.848814: kvm_entry: vcpu 0
>>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_exit: reason EXCEPTION_NMI rip 0xe05b info fe05b 80000b0e
>>> qemu-system-x86-2821  [003] ...1    45.848827: kvm_page_fault: address fe05b error_code 14
>>>
>>> Commit 7ca29de21362 (KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT)
>>> prevents to load L2's PDPTRs according to dereferencing L2's CR3 since it is
>>> uninitialized in real mode. Hyper-V L1 will emulate L2 real mode with PAE
>>> paging and EPT enabled. However, there is a progress to switch from Legacy
>>> mode's such-mode Protected mode to Long mode during system boot, the check
>>> in nested_vmx_load_cr3() will prevent to load PDPTRs if it is still in
>>> Protected mode w/ PAE paging and nested EPT/shadow page tables on EPT. Actually
>>> the original commit should just intended to prevent to dereference L2's CR3
>>> if the L1 hypervisor emulates L2's real mode through vm8086.
>>>
>>> This patch fixes it by allowing load PDPTRs if PAE paing, EPT enabled and
>>> !vm86_active.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>> Cc: Ladi Prosek <lprosek@redhat.com>
>>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>> ---
>>>  arch/x86/kvm/vmx.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index c664365..2b2a05f 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -9933,7 +9933,7 @@ static bool nested_cr3_valid(struct kvm_vcpu *vcpu, unsigned long val)
>>>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
>>>                                u32 *entry_failure_code)
>>>  {
>>> -       if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
>>> +       if (cr3 != kvm_read_cr3(vcpu) || pdptrs_changed(vcpu)) {
>>>                 if (!nested_cr3_valid(vcpu, cr3)) {
>>>                         *entry_failure_code = ENTRY_FAIL_DEFAULT;
>>>                         return 1;
>>> @@ -9944,7 +9944,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool ne
>>>                  * must not be dereferenced.
>>>                  */
>>>                 if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
>>> -                   !nested_ept) {
>>> +                   !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>>
>> This change breaks Hyper-V on KVM. L2 hangs on start-up, same symptoms
>> as before 7ca29de21362.
>
> Hmm, I miss the function pdptrs_changed() will also dereference CR3.
> How about something like this:
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c664365..d7ebf03 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -9933,7 +9933,9 @@ static bool nested_cr3_valid(struct kvm_vcpu
> *vcpu, unsigned long val)
>  static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long
> cr3, bool nested_ept,
>                     u32 *entry_failure_code)
>  {
> -    if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
> +    if (cr3 != kvm_read_cr3(vcpu) ||
> +        (!(nested_ept && to_vmx(vcpu)->rmode.vm86_active) &&
> +        pdptrs_changed(vcpu))) {
>          if (!nested_cr3_valid(vcpu, cr3)) {
>              *entry_failure_code = ENTRY_FAIL_DEFAULT;
>              return 1;
> @@ -9944,7 +9946,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu
> *vcpu, unsigned long cr3, bool ne
>           * must not be dereferenced.
>           */
>          if (!is_long_mode(vcpu) && is_pae(vcpu) && is_paging(vcpu) &&
> -            !nested_ept) {
> +            !(nested_ept && to_vmx(vcpu)->rmode.vm86_active)) {
>              if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>                  *entry_failure_code = ENTRY_FAIL_PDPTE;
>                  return 1;

Still the same, Hyper-V is broken. The problem is not in real vs.
protected mode. The way nested_ept_enabled is computed is incorrect.

I can run both Hyper-V and KVM with EPT = 0 in L1 with this patch. Can
you please give it a try?

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 98e82ee..9145c94 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10121,7 +10121,7 @@ static int prepare_vmcs02(struct kvm_vcpu
*vcpu, struct vmcs12 *vmcs12,
                                vmcs12->guest_intr_status);
                }

-               nested_ept_enabled = (exec_control &
SECONDARY_EXEC_ENABLE_EPT) != 0;
+               nested_ept_enabled =
(vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT) != 0;

                /*
                 * Write an illegal value to APIC_ACCESS_ADDR. Later,

Thanks!
Ladi

>> I'll take a closer look next week. Is there an easy way for me to
>> reproduce the issue you're seeing?
>
> L1 KVM w/ EPT = 0 and L0 KVM w/ EPT = 1.
>
> Regards,
> Wanpeng Li
>
>>
>>>                         if (!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) {
>>>                                 *entry_failure_code = ENTRY_FAIL_PDPTE;
>>>                                 return 1;
>>> --
>>> 2.7.4
>>>

  reply	other threads:[~2017-03-22 12:01 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-17 14:41 [PATCH] KVM: nVMX: Fix L2 guest hang if shadow page tables on EPT Wanpeng Li
2017-03-17 14:47 ` Paolo Bonzini
2017-03-17 17:28 ` Ladi Prosek
2017-03-17 17:33   ` Paolo Bonzini
2017-03-18  6:37   ` Wanpeng Li
2017-03-22 12:00     ` Ladi Prosek [this message]
2017-03-22 13:44       ` Wanpeng Li
2017-03-22 14:17         ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABdb735TnqpdfHcgUu7eJh6eJ+S45tY55u4fWNJrRh3ujK-X_g@mail.gmail.com \
    --to=lprosek@redhat.com \
    --cc=kernellwp@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.