References: <20181016165011.6607-1-vkuznets@redhat.com> <20181016165011.6607-6-vkuznets@redhat.com>
In-Reply-To: <20181016165011.6607-6-vkuznets@redhat.com>
From: Jim Mattson
Date: Wed, 12 Dec 2018 15:19:53 -0800
Subject: Re: [PATCH v6 05/13] KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR
To: Vitaly Kuznetsov
Cc: kvm list, Paolo Bonzini, Radim Krčmář, Roman Kagan, "K. Y.
Srinivasan", Haiyang Zhang, Stephen Hemminger, "Michael Kelley (EOSG)", LKML, Liran Alon
List-ID: linux-kernel@vger.kernel.org

On Tue, Oct 16, 2018 at 9:50 AM Vitaly Kuznetsov wrote:
>
> Per Hyper-V TLFS 5.0b:
>
> "The L1 hypervisor may choose to use enlightened VMCSs by writing 1 to
> the corresponding field in the VP assist page (see section 7.8.7).
> Another field in the VP assist page controls the currently active
> enlightened VMCS. Each enlightened VMCS is exactly one page (4 KB) in
> size and must be initially zeroed. No VMPTRLD instruction must be
> executed to make an enlightened VMCS active or current.
>
> After the L1 hypervisor performs a VM entry with an enlightened VMCS,
> the VMCS is considered active on the processor. An enlightened VMCS
> can only be active on a single processor at the same time. The L1
> hypervisor can execute a VMCLEAR instruction to transition an
> enlightened VMCS from the active to the non-active state. Any VMREAD
> or VMWRITE instructions while an enlightened VMCS is active is
> unsupported and can result in unexpected behavior."
>
> Keep Enlightened VMCS structure for the current L2 guest permanently mapped
> from struct nested_vmx instead of mapping it every time.
>
> Suggested-by: Ladi Prosek
> Signed-off-by: Vitaly Kuznetsov
> ---
>  arch/x86/kvm/vmx.c | 115 ++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 108 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index aebd008ccccb..cfb44acd4291 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -20,6 +20,7 @@
>  #include "mmu.h"
>  #include "cpuid.h"
>  #include "lapic.h"
> +#include "hyperv.h"
>
>  #include
>  #include
>
> @@ -889,6 +890,8 @@ struct nested_vmx {
>  		bool guest_mode;
>  	} smm;
>
> +	gpa_t hv_evmcs_vmptr;
> +	struct page *hv_evmcs_page;
>  	struct hv_enlightened_vmcs *hv_evmcs;
>  };
>
> @@ -8111,11 +8114,13 @@ static int nested_vmx_failInvalid(struct kvm_vcpu *vcpu)
>  static int nested_vmx_failValid(struct kvm_vcpu *vcpu,
>  				u32 vm_instruction_error)
>  {
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
>  	/*
>  	 * failValid writes the error number to the current VMCS, which
>  	 * can't be done if there isn't a current VMCS.
>  	 */
> -	if (to_vmx(vcpu)->nested.current_vmptr == -1ull)
> +	if (vmx->nested.current_vmptr == -1ull && !vmx->nested.hv_evmcs)
>  		return nested_vmx_failInvalid(vcpu);
>
>  	vmx_set_rflags(vcpu, (vmx_get_rflags(vcpu)
>
> @@ -8441,6 +8446,20 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
>  	vmcs_write64(VMCS_LINK_POINTER, -1ull);
>  }
>
> +static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	if (!vmx->nested.hv_evmcs)
> +		return;
> +
> +	kunmap(vmx->nested.hv_evmcs_page);
> +	kvm_release_page_dirty(vmx->nested.hv_evmcs_page);
> +	vmx->nested.hv_evmcs_vmptr = -1ull;
> +	vmx->nested.hv_evmcs_page = NULL;
> +	vmx->nested.hv_evmcs = NULL;
> +}
> +
>  static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>
> @@ -8509,6 +8528,8 @@ static void free_nested(struct kvm_vcpu *vcpu)
>
>  	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
>
> +	nested_release_evmcs(vcpu);
> +
>  	free_loaded_vmcs(&vmx->nested.vmcs02);
>  }
>
> @@ -8542,12 +8563,18 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>  		return nested_vmx_failValid(vcpu,
>  			VMXERR_VMCLEAR_VMXON_POINTER);
>
> -	if (vmptr == vmx->nested.current_vmptr)
> -		nested_release_vmcs12(vcpu);
> +	if (vmx->nested.hv_evmcs_page) {
> +		if (vmptr == vmx->nested.hv_evmcs_vmptr)
> +			nested_release_evmcs(vcpu);
> +	} else {
> +		if (vmptr == vmx->nested.current_vmptr)
> +			nested_release_vmcs12(vcpu);
>
> -	kvm_vcpu_write_guest(vcpu,
> -			vmptr + offsetof(struct vmcs12, launch_state),
> -			&zero, sizeof(zero));
> +		kvm_vcpu_write_guest(vcpu,
> +				     vmptr + offsetof(struct vmcs12,
> +						      launch_state),
> +				     &zero, sizeof(zero));
> +	}
>
>  	return nested_vmx_succeed(vcpu);
>  }
>
> @@ -8637,6 +8664,8 @@ static int copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx)
>  	struct vmcs12 *vmcs12 = vmx->nested.cached_vmcs12;
>  	struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
>
> +	vmcs12->hdr.revision_id = evmcs->revision_id;
> +
>  	/* HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE */
>  	vmcs12->tpr_threshold = evmcs->tpr_threshold;
>  	vmcs12->guest_rip = evmcs->guest_rip;
>
> @@ -9268,6 +9297,10 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
>  		return nested_vmx_failValid(vcpu,
>  			VMXERR_VMPTRLD_VMXON_POINTER);
>
> +	/* Forbid normal VMPTRLD if Enlightened version was used */
> +	if (vmx->nested.hv_evmcs)
> +		return 1;
> +
>  	if (vmx->nested.current_vmptr != vmptr) {
>  		struct vmcs12 *new_vmcs12;
>  		struct page *page;
>
> @@ -9301,6 +9334,68 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
>  	return nested_vmx_succeed(vcpu);
>  }
>
> +/*
> + * This is an equivalent of the nested hypervisor executing the vmptrld
> + * instruction.
> + */
> +static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	struct hv_vp_assist_page assist_page;
> +
> +	if (likely(!vmx->nested.enlightened_vmcs_enabled))
> +		return 1;
> +
> +	if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
> +		return 1;
> +
> +	if (unlikely(!assist_page.enlighten_vmentry))
> +		return 1;
> +
> +	if (unlikely(assist_page.current_nested_vmcs !=
> +		     vmx->nested.hv_evmcs_vmptr)) {
> +
> +		if (!vmx->nested.hv_evmcs)
> +			vmx->nested.current_vmptr = -1ull;
> +
> +		nested_release_evmcs(vcpu);
> +
> +		vmx->nested.hv_evmcs_page = kvm_vcpu_gpa_to_page(
> +			vcpu, assist_page.current_nested_vmcs);
> +
> +		if (unlikely(is_error_page(vmx->nested.hv_evmcs_page)))
> +			return 0;
> +
> +		vmx->nested.hv_evmcs = kmap(vmx->nested.hv_evmcs_page);

Are you sure that directly mapping guest memory isn't going to lead to
time-of-check vs. time-of-use bugs? This is a very hard programming model
to get right.
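To spell out the concern: once the host keeps a live mapping of a guest page, any "validate field, then use field" sequence is a double fetch, because the guest can rewrite the field between the two reads. A minimal userspace C sketch of the hazard and the usual snapshot mitigation follows; this is not KVM code, and the struct and function names are made up for illustration (the interfering write is simulated inline rather than done by a real concurrent guest):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a guest-shared structure: with a permanent
 * mapping, the guest can rewrite these fields at any time. */
struct shared_desc {
	uint32_t len;		/* bound the host must validate */
	uint32_t payload;
};

/* Simulate a guest write landing between the host's check and use. */
static void guest_interferes(volatile struct shared_desc *d)
{
	d->len = 1000;		/* no longer satisfies the check */
}

/* Buggy double-fetch pattern: validate the live mapping, then use it. */
static int use_live(volatile struct shared_desc *d)
{
	if (d->len > 16)			/* time of check: reads 4 */
		return -1;
	guest_interferes(d);			/* guest writes in the window */
	return (int)d->len;			/* time of use: reads 1000 */
}

/* Snapshot pattern: copy once, then validate and use only the copy. */
static int use_snapshot(volatile struct shared_desc *d)
{
	struct shared_desc snap;

	memcpy(&snap, (const void *)d, sizeof(snap));	/* single fetch */
	if (snap.len > 16)
		return -1;
	guest_interferes(d);			/* later guest writes are harmless */
	return (int)snap.len;
}
```

With `len` initialized to 4, `use_live()` passes the bounds check but then consumes the out-of-bounds value 1000, while `use_snapshot()` only ever operates on the value it validated.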