From: Jim Mattson <jmattson@google.com>
To: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Cc: "Wanpeng Li" <kernellwp@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
"kvm list" <kvm@vger.kernel.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Radim Krčmář" <rkrcmar@redhat.com>,
"Wanpeng Li" <wanpeng.li@hotmail.com>
Subject: Re: [PATCH v5 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
Date: Mon, 05 Feb 2018 18:44:17 +0000 [thread overview]
Message-ID: <CALMp9eTGeqV21MtMRsvsBsKn6VT7qHNrh8ciT1zueLZ0z1-dww@mail.gmail.com> (raw)
In-Reply-To: <CALMp9eSoGfhutN3R7pYBYsG-Guu9tNG4ebc5r+MjD+Epxi0=Dw@mail.gmail.com>
I realize now that this fix isn't quite right, since it loads
vmcs12->host_cr3 rather than reverting to the CR3 that was loaded at the
time of VMLAUNCH/VMRESUME. In the case of VMfailValid(VM entry with invalid
VMX-control field(s)), none of the VMCS12 host state fields should be
loaded. See the pseudocode for VMLAUNCH/VMRESUME in volume 3 of the SDM.
On Wed, Nov 8, 2017 at 1:47 PM Jim Mattson <jmattson@google.com> wrote:
> I realize now that there are actually many other problems with
> deferring some control field checks to the hardware VM-entry of
> vmcs02. When there is an invalid control field, the vCPU should just
> fall through to the next instruction, without any state modifiation
> other than the ALU flags and the VM-instruction error field of the
> current VMCS. However, in preparation for the hardware VM-entry of
> vmcs02, we have already changed quite a bit of the vCPU state: the
> MSRs on the VM-entry MSR-load list, DR7, IA32_DEBUGCTL, the entire
> FLAGS register, etc. All of these changes should be undone, and we're
> not prepared to do that. (For instance, what was the old DR7 value
> that needs to be restored?)
> On Fri, Nov 3, 2017 at 5:07 PM, Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
> > On 11/02/2017 05:50 PM, Wanpeng Li wrote:
> >
> >> From: Wanpeng Li <wanpeng.li@hotmail.com>
> >>
> >> Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME
> >> failure
> >> properly) can result in L1(run kvm-unit-tests/run_tests.sh vmx_controls
> in
> >> L1)
> >> null pointer deference and also L0 calltrace when EPT=0 on both L0 and
> L1.
> >>
> >> In L1:
> >>
> >> BUG: unable to handle kernel paging request at ffffffffc015bf8f
> >> IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
> >> PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
> >> Oops: 0003 [#1] PREEMPT SMP
> >> CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
> >> RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
> >> Call Trace:
> >> WARNING: kernel stack frame pointer at ffffb86f4988bc18 in
> >> qemu-system-x86:1798 has bad value 0000000000000002
> >>
> >> In L0:
> >>
> >> -----------[ cut here ]------------
> >> WARNING: CPU: 6 PID: 4460 at
> /home/kernel/linux/arch/x86/kvm//vmx.c:9845
> >> vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
> >> CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G OE
> >> 4.14.0-rc7+ #25
> >> RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
> >> Call Trace:
> >> paging64_page_fault+0x500/0xde0 [kvm]
> >> ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
> >> ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
> >> ? __asan_storeN+0x12/0x20
> >> ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
> >> ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
> >> ? lock_acquire+0x2c0/0x2c0
> >> ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
> >> ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
> >> ? sched_clock+0x1f/0x30
> >> ? check_chain_key+0x137/0x1e0
> >> ? __lock_acquire+0x83c/0x2420
> >> ? kvm_multiple_exception+0xf2/0x220 [kvm]
> >> ? debug_check_no_locks_freed+0x240/0x240
> >> ? debug_smp_processor_id+0x17/0x20
> >> ? __lock_is_held+0x9e/0x100
> >> kvm_mmu_page_fault+0x90/0x180 [kvm]
> >> kvm_handle_page_fault+0x15c/0x310 [kvm]
> >> ? __lock_is_held+0x9e/0x100
> >> handle_exception+0x3c7/0x4d0 [kvm_intel]
> >> vmx_handle_exit+0x103/0x1010 [kvm_intel]
> >> ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]
> >>
> >> The commit avoids to load host state of vmcs12 as vmcs01's guest state
> >> since vmcs12 is not modified (except for the VM-instruction error
> field)
> >> if the checking of vmcs control area fails. However, the mmu context is
> >> switched to nested mmu in prepare_vmcs02() and it will not be reloaded
> >> since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME
> >> fails. This patch fixes it by reloading mmu context when nested
> >> VMLAUNCH/VMRESUME fails.
> >>
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Cc: Radim Krčmář <rkrcmar@redhat.com>
> >> Cc: Jim Mattson <jmattson@google.com>
> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> >> ---
> >> v3 -> v4:
> >> * move it to a new function load_vmcs12_mmu_host_state
> >>
> >> arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++------------
> >> 1 file changed, 22 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 6cf3972..8aefb91 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -11259,6 +11259,24 @@ static void prepare_vmcs12(struct kvm_vcpu
> *vcpu,
> >> struct vmcs12 *vmcs12,
> >> kvm_clear_interrupt_queue(vcpu);
> >> }
> >> +static void load_vmcs12_mmu_host_state(struct kvm_vcpu *vcpu,
> >> + struct vmcs12 *vmcs12)
> >> +{
> >> + u32 entry_failure_code;
> >> +
> >> + nested_ept_uninit_mmu_context(vcpu);
> >> +
> >> + /*
> >> + * Only PDPTE load can fail as the value of cr3 was checked on
> >> entry and
> >> + * couldn't have changed.
> >> + */
> >> + if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false,
> >> &entry_failure_code))
> >> + nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
> >> +
> >> + if (!enable_ept)
> >> + vcpu->arch.walk_mmu->inject_page_fault =
> >> kvm_inject_page_fault;
> >> +}
> >> +
> >> /*
> >> * A part of what we need to when the nested L2 guest exits and we
> want
> >> to
> >> * run its L1 parent, is to reset L1's guest state to the host state
> >> specified
> >> @@ -11272,7 +11290,6 @@ static void load_vmcs12_host_state(struct
> kvm_vcpu
> >> *vcpu,
> >> struct vmcs12 *vmcs12)
> >> {
> >> struct kvm_segment seg;
> >> - u32 entry_failure_code;
> >> if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
> >> vcpu->arch.efer = vmcs12->host_ia32_efer;
> >> @@ -11299,17 +11316,7 @@ static void load_vmcs12_host_state(struct
> >> kvm_vcpu *vcpu,
> >> vcpu->arch.cr4_guest_owned_bits =
> >> ~vmcs_readl(CR4_GUEST_HOST_MASK);
> >> vmx_set_cr4(vcpu, vmcs12->host_cr4);
> >> - nested_ept_uninit_mmu_context(vcpu);
> >> -
> >> - /*
> >> - * Only PDPTE load can fail as the value of cr3 was checked on
> >> entry and
> >> - * couldn't have changed.
> >> - */
> >> - if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false,
> >> &entry_failure_code))
> >> - nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
> >> -
> >> - if (!enable_ept)
> >> - vcpu->arch.walk_mmu->inject_page_fault =
> >> kvm_inject_page_fault;
> >> + load_vmcs12_mmu_host_state(vcpu, vmcs12);
> >> if (enable_vpid) {
> >> /*
> >> @@ -11539,6 +11546,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu
> >> *vcpu, u32 exit_reason,
> >> * accordingly.
> >> */
> >> nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
> >> +
> >> + load_vmcs12_mmu_host_state(vcpu, vmcs12);
> >> +
> >> /*
> >> * The emulated instruction was already skipped in
> >> * nested_vmx_run, but the updated RIP was never
> >
> > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
next prev parent reply other threads:[~2018-02-05 18:44 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-03 0:50 [PATCH v5 1/3] KVM: X86: Fix operand/address-size during instruction decoding Wanpeng Li
2017-11-03 0:50 ` [PATCH v5 2/3] KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry Wanpeng Li
[not found] ` <0b1d82f7-2fc6-9fc0-15a4-3500413814bd@oracle.com>
2017-11-03 6:40 ` Wanpeng Li
2017-11-03 17:13 ` Krish Sadhukhan
2017-11-03 17:54 ` Jim Mattson
2017-11-03 0:50 ` [PATCH v5 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure Wanpeng Li
2017-11-03 16:01 ` Jim Mattson
2017-11-04 0:07 ` Krish Sadhukhan
2017-11-08 21:47 ` Jim Mattson
2017-11-09 0:37 ` Wanpeng Li
2017-11-09 10:40 ` Paolo Bonzini
2017-11-09 10:47 ` Wanpeng Li
2017-11-09 11:05 ` Paolo Bonzini
2017-11-09 11:10 ` Wanpeng Li
2017-11-09 16:19 ` Jim Mattson
2018-02-05 18:44 ` Jim Mattson [this message]
2018-05-07 22:56 ` Jim Mattson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CALMp9eTGeqV21MtMRsvsBsKn6VT7qHNrh8ciT1zueLZ0z1-dww@mail.gmail.com \
--to=jmattson@google.com \
--cc=kernellwp@gmail.com \
--cc=krish.sadhukhan@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=rkrcmar@redhat.com \
--cc=wanpeng.li@hotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).