From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756053AbdKCQBt (ORCPT ); Fri, 3 Nov 2017 12:01:49 -0400 Received: from mail-wr0-f196.google.com ([209.85.128.196]:48727 "EHLO mail-wr0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755883AbdKCQBr (ORCPT ); Fri, 3 Nov 2017 12:01:47 -0400 X-Google-Smtp-Source: ABhQp+TWPM3pZPj0gdHnA880j7CcKniXRdXB/ADppnw99E9fyc/A4bMxiivLrb1/ioKjuqBVYPWzF+5vBTshkODww+s= MIME-Version: 1.0 In-Reply-To: <1509670249-4907-3-git-send-email-wanpeng.li@hotmail.com> References: <1509670249-4907-1-git-send-email-wanpeng.li@hotmail.com> <1509670249-4907-3-git-send-email-wanpeng.li@hotmail.com> From: Jim Mattson Date: Fri, 3 Nov 2017 09:01:44 -0700 Message-ID: Subject: Re: [PATCH v5 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure To: Wanpeng Li Cc: LKML , kvm list , Paolo Bonzini , =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= , Wanpeng Li Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by nfs id vA3G1sCb024069 That seems reasonable to me. Thanks for the fix. Reviewed-by: Jim Mattson On Thu, Nov 2, 2017 at 5:50 PM, Wanpeng Li wrote: > From: Wanpeng Li > > Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure > properly) can result in L1(run kvm-unit-tests/run_tests.sh vmx_controls in L1) > null pointer deference and also L0 calltrace when EPT=0 on both L0 and L1. > > In L1: > > BUG: unable to handle kernel paging request at ffffffffc015bf8f > IP: vmx_vcpu_run+0x202/0x510 [kvm_intel] > PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161 > Oops: 0003 [#1] PREEMPT SMP > CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6 > RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel] > Call Trace: > WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002 > > In L0: > > -----------[ cut here ]------------ > WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel] > CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G OE 4.14.0-rc7+ #25 > RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel] > Call Trace: > paging64_page_fault+0x500/0xde0 [kvm] > ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm] > ? nonpaging_page_fault+0x3b0/0x3b0 [kvm] > ? __asan_storeN+0x12/0x20 > ? paging64_gva_to_gpa+0xb0/0x120 [kvm] > ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm] > ? lock_acquire+0x2c0/0x2c0 > ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel] > ? vmx_get_segment+0x2a6/0x310 [kvm_intel] > ? sched_clock+0x1f/0x30 > ? check_chain_key+0x137/0x1e0 > ? __lock_acquire+0x83c/0x2420 > ? kvm_multiple_exception+0xf2/0x220 [kvm] > ? debug_check_no_locks_freed+0x240/0x240 > ? debug_smp_processor_id+0x17/0x20 > ? __lock_is_held+0x9e/0x100 > kvm_mmu_page_fault+0x90/0x180 [kvm] > kvm_handle_page_fault+0x15c/0x310 [kvm] > ? __lock_is_held+0x9e/0x100 > handle_exception+0x3c7/0x4d0 [kvm_intel] > vmx_handle_exit+0x103/0x1010 [kvm_intel] > ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm] > > The commit avoids to load host state of vmcs12 as vmcs01's guest state > since vmcs12 is not modified (except for the VM-instruction error field) > if the checking of vmcs control area fails. However, the mmu context is > switched to nested mmu in prepare_vmcs02() and it will not be reloaded > since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME > fails. This patch fixes it by reloading mmu context when nested > VMLAUNCH/VMRESUME fails. > > Cc: Paolo Bonzini > Cc: Radim Krčmář > Cc: Jim Mattson > Signed-off-by: Wanpeng Li > --- > v3 -> v4: > * move it to a new function load_vmcs12_mmu_host_state > > arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++------------ > 1 file changed, 22 insertions(+), 12 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 6cf3972..8aefb91 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -11259,6 +11259,24 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, > kvm_clear_interrupt_queue(vcpu); > } > > +static void load_vmcs12_mmu_host_state(struct kvm_vcpu *vcpu, > + struct vmcs12 *vmcs12) > +{ > + u32 entry_failure_code; > + > + nested_ept_uninit_mmu_context(vcpu); > + > + /* > + * Only PDPTE load can fail as the value of cr3 was checked on entry and > + * couldn't have changed. > + */ > + if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code)) > + nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL); > + > + if (!enable_ept) > + vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault; > +} > + > /* > * A part of what we need to when the nested L2 guest exits and we want to > * run its L1 parent, is to reset L1's guest state to the host state specified > @@ -11272,7 +11290,6 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, > struct vmcs12 *vmcs12) > { > struct kvm_segment seg; > - u32 entry_failure_code; > > if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER) > vcpu->arch.efer = vmcs12->host_ia32_efer; > @@ -11299,17 +11316,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, > vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK); > vmx_set_cr4(vcpu, vmcs12->host_cr4); > > - nested_ept_uninit_mmu_context(vcpu); > - > - /* > - * Only PDPTE load can fail as the value of cr3 was checked on entry and > - * couldn't have changed. > - */ > - if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code)) > - nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL); > - > - if (!enable_ept) > - vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault; > + load_vmcs12_mmu_host_state(vcpu, vmcs12); > > if (enable_vpid) { > /* > @@ -11539,6 +11546,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, > * accordingly. > */ > nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); > + > + load_vmcs12_mmu_host_state(vcpu, vmcs12); > + > /* > * The emulated instruction was already skipped in > * nested_vmx_run, but the updated RIP was never > -- > 2.7.4 >