All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jim Mattson <jmattson@google.com>
To: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: "Wanpeng Li" <kernellwp@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"kvm list" <kvm@vger.kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Wanpeng Li" <wanpeng.li@hotmail.com>
Subject: Re: [PATCH v5 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
Date: Mon, 7 May 2018 15:56:58 -0700	[thread overview]
Message-ID: <CALMp9eSJ0UpH9zLT0+uf3+vPa1wvhZSY9wiyWF7_HFOL9jKC=A@mail.gmail.com> (raw)
In-Reply-To: <CALMp9eTGeqV21MtMRsvsBsKn6VT7qHNrh8ciT1zueLZ0z1-dww@mail.gmail.com>

Moreover, if the VMLAUNCH/VMRESUME is in 32-bit PAE mode, then the
PDPTRs should not be reloaded.

On Mon, Feb 5, 2018 at 10:44 AM, Jim Mattson <jmattson@google.com> wrote:
> I realize now that this fix isn't quite right, since it loads
> vmcs12->host_cr3 rather than reverting to the CR3 that was loaded at the
> time of VMLAUNCH/VMRESUME. In the case of VMfailValid(VM entry with invalid
> VMX-control field(s)), none of the VMCS12 host state fields should be
> loaded. See the pseudocode for VMLAUNCH/VMRESUME in volume 3 of the SDM.
>
>
> On Wed, Nov 8, 2017 at 1:47 PM Jim Mattson <jmattson@google.com> wrote:
>
>> I realize now that there are actually many other problems with
>> deferring some control field checks to the hardware VM-entry of
>> vmcs02. When there is an invalid control field, the vCPU should just
>> fall through to the next instruction, without any state modifiation
>> other than the ALU flags and the VM-instruction error field of the
>> current VMCS. However, in preparation for the hardware VM-entry of
>> vmcs02, we have already changed quite a bit of the vCPU state: the
>> MSRs on the VM-entry MSR-load list, DR7, IA32_DEBUGCTL, the entire
>> FLAGS register, etc. All of these changes should be undone, and we're
>> not prepared to do that. (For instance, what was the old DR7 value
>> that needs to be restored?)
>
>
>> On Fri, Nov 3, 2017 at 5:07 PM, Krish Sadhukhan
>> <krish.sadhukhan@oracle.com> wrote:
>> > On 11/02/2017 05:50 PM, Wanpeng Li wrote:
>> >
>> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >>
>> >> Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME
>> >> failure
>> >> properly) can result in L1(run kvm-unit-tests/run_tests.sh vmx_controls
>> in
>> >> L1)
>> >> null pointer deference and also L0 calltrace when EPT=0 on both L0 and
>> L1.
>> >>
>> >> In L1:
>> >>
>> >> BUG: unable to handle kernel paging request at ffffffffc015bf8f
>> >>   IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
>> >>   PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
>> >>   Oops: 0003 [#1] PREEMPT SMP
>> >>   CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
>> >>   RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
>> >>   Call Trace:
>> >>   WARNING: kernel stack frame pointer at ffffb86f4988bc18 in
>> >> qemu-system-x86:1798 has bad value 0000000000000002
>> >>
>> >> In L0:
>> >>
>> >> -----------[ cut here ]------------
>> >>   WARNING: CPU: 6 PID: 4460 at
>> /home/kernel/linux/arch/x86/kvm//vmx.c:9845
>> >> vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
>> >>   CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G           OE
>> >> 4.14.0-rc7+ #25
>> >>   RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
>> >>   Call Trace:
>> >>    paging64_page_fault+0x500/0xde0 [kvm]
>> >>    ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
>> >>    ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
>> >>    ? __asan_storeN+0x12/0x20
>> >>    ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
>> >>    ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
>> >>    ? lock_acquire+0x2c0/0x2c0
>> >>    ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
>> >>    ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
>> >>    ? sched_clock+0x1f/0x30
>> >>    ? check_chain_key+0x137/0x1e0
>> >>    ? __lock_acquire+0x83c/0x2420
>> >>    ? kvm_multiple_exception+0xf2/0x220 [kvm]
>> >>    ? debug_check_no_locks_freed+0x240/0x240
>> >>    ? debug_smp_processor_id+0x17/0x20
>> >>    ? __lock_is_held+0x9e/0x100
>> >>    kvm_mmu_page_fault+0x90/0x180 [kvm]
>> >>    kvm_handle_page_fault+0x15c/0x310 [kvm]
>> >>    ? __lock_is_held+0x9e/0x100
>> >>    handle_exception+0x3c7/0x4d0 [kvm_intel]
>> >>    vmx_handle_exit+0x103/0x1010 [kvm_intel]
>> >>    ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]
>> >>
>> >> The commit avoids to load host state of vmcs12 as vmcs01's guest state
>> >> since vmcs12 is not modified (except for the VM-instruction error
>> >> field)
>> >> if the checking of vmcs control area fails. However, the mmu context is
>> >> switched to nested mmu in prepare_vmcs02() and it will not be reloaded
>> >> since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME
>> >> fails. This patch fixes it by reloading mmu context when nested
>> >> VMLAUNCH/VMRESUME fails.
>> >>
>> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> >> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> >> Cc: Jim Mattson <jmattson@google.com>
>> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> >> ---
>> >> v3 -> v4:
>> >>   * move it to a new function load_vmcs12_mmu_host_state
>> >>
>> >>   arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++------------
>> >>   1 file changed, 22 insertions(+), 12 deletions(-)
>> >>
>> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> >> index 6cf3972..8aefb91 100644
>> >> --- a/arch/x86/kvm/vmx.c
>> >> +++ b/arch/x86/kvm/vmx.c
>> >> @@ -11259,6 +11259,24 @@ static void prepare_vmcs12(struct kvm_vcpu
>> *vcpu,
>> >> struct vmcs12 *vmcs12,
>> >>         kvm_clear_interrupt_queue(vcpu);
>> >>   }
>> >>   +static void load_vmcs12_mmu_host_state(struct kvm_vcpu *vcpu,
>> >> +                       struct vmcs12 *vmcs12)
>> >> +{
>> >> +       u32 entry_failure_code;
>> >> +
>> >> +       nested_ept_uninit_mmu_context(vcpu);
>> >> +
>> >> +       /*
>> >> +        * Only PDPTE load can fail as the value of cr3 was checked on
>> >> entry and
>> >> +        * couldn't have changed.
>> >> +        */
>> >> +       if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false,
>> >> &entry_failure_code))
>> >> +               nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
>> >> +
>> >> +       if (!enable_ept)
>> >> +               vcpu->arch.walk_mmu->inject_page_fault =
>> >> kvm_inject_page_fault;
>> >> +}
>> >> +
>> >>   /*
>> >>    * A part of what we need to when the nested L2 guest exits and we
>> want
>> >> to
>> >>    * run its L1 parent, is to reset L1's guest state to the host state
>> >> specified
>> >> @@ -11272,7 +11290,6 @@ static void load_vmcs12_host_state(struct
>> kvm_vcpu
>> >> *vcpu,
>> >>                                    struct vmcs12 *vmcs12)
>> >>   {
>> >>         struct kvm_segment seg;
>> >> -       u32 entry_failure_code;
>> >>         if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
>> >>                 vcpu->arch.efer = vmcs12->host_ia32_efer;
>> >> @@ -11299,17 +11316,7 @@ static void load_vmcs12_host_state(struct
>> >> kvm_vcpu *vcpu,
>> >>         vcpu->arch.cr4_guest_owned_bits =
>> >> ~vmcs_readl(CR4_GUEST_HOST_MASK);
>> >>         vmx_set_cr4(vcpu, vmcs12->host_cr4);
>> >>   -     nested_ept_uninit_mmu_context(vcpu);
>> >> -
>> >> -       /*
>> >> -        * Only PDPTE load can fail as the value of cr3 was checked on
>> >> entry and
>> >> -        * couldn't have changed.
>> >> -        */
>> >> -       if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false,
>> >> &entry_failure_code))
>> >> -               nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
>> >> -
>> >> -       if (!enable_ept)
>> >> -               vcpu->arch.walk_mmu->inject_page_fault =
>> >> kvm_inject_page_fault;
>> >> +       load_vmcs12_mmu_host_state(vcpu, vmcs12);
>> >>         if (enable_vpid) {
>> >>                 /*
>> >> @@ -11539,6 +11546,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu
>> >> *vcpu, u32 exit_reason,
>> >>          * accordingly.
>> >>          */
>> >>         nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
>> >> +
>> >> +       load_vmcs12_mmu_host_state(vcpu, vmcs12);
>> >> +
>> >>         /*
>> >>          * The emulated instruction was already skipped in
>> >>          * nested_vmx_run, but the updated RIP was never
>> >
>> > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>
>

      reply	other threads:[~2018-05-07 22:57 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-03  0:50 [PATCH v5 1/3] KVM: X86: Fix operand/address-size during instruction decoding Wanpeng Li
2017-11-03  0:50 ` [PATCH v5 2/3] KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry Wanpeng Li
     [not found]   ` <0b1d82f7-2fc6-9fc0-15a4-3500413814bd@oracle.com>
2017-11-03  6:40     ` Wanpeng Li
2017-11-03 17:13       ` Krish Sadhukhan
2017-11-03 17:54         ` Jim Mattson
2017-11-03  0:50 ` [PATCH v5 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure Wanpeng Li
2017-11-03 16:01   ` Jim Mattson
2017-11-04  0:07   ` Krish Sadhukhan
2017-11-08 21:47     ` Jim Mattson
2017-11-09  0:37       ` Wanpeng Li
2017-11-09 10:40         ` Paolo Bonzini
2017-11-09 10:47           ` Wanpeng Li
2017-11-09 11:05             ` Paolo Bonzini
2017-11-09 11:10               ` Wanpeng Li
2017-11-09 16:19           ` Jim Mattson
2018-02-05 18:44       ` Jim Mattson
2018-05-07 22:56         ` Jim Mattson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALMp9eSJ0UpH9zLT0+uf3+vPa1wvhZSY9wiyWF7_HFOL9jKC=A@mail.gmail.com' \
    --to=jmattson@google.com \
    --cc=kernellwp@gmail.com \
    --cc=krish.sadhukhan@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.