Subject: Re: [PATCH v4 2/2] KVM: nSVM: implement ondemand allocation of the nested state
From: Maxim Levitsky
To: Sean Christopherson
Cc: kvm@vger.kernel.org, Vitaly Kuznetsov, Ingo Molnar, Wanpeng Li,
 "H. Peter Anvin", Borislav Petkov, Jim Mattson, Paolo Bonzini,
 Joerg Roedel, "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)",
 linux-kernel@vger.kernel.org, Thomas Gleixner
Date: Mon, 21 Sep 2020 16:23:47 +0300
In-Reply-To: <20200917162942.GE13522@sjchrist-ice>
References: <20200917101048.739691-1-mlevitsk@redhat.com>
 <20200917101048.739691-3-mlevitsk@redhat.com>
 <20200917162942.GE13522@sjchrist-ice>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2020-09-17 at 09:29 -0700, Sean Christopherson wrote:
> On Thu, Sep 17, 2020 at 01:10:48PM +0300, Maxim Levitsky wrote:
> > This way we don't waste memory on VMs which don't use
> > nested virtualization even if it is available to them.
> > 
> > If allocation of the nested state fails (which should only happen
> > when the host is about to OOM anyway), use the new
> > KVM_REQ_OUT_OF_MEMORY request to shut down the guest.
> > 
> > Signed-off-by: Maxim Levitsky
> > ---
> >  arch/x86/kvm/svm/nested.c | 42 ++++++++++++++++++++++++++++++
> >  arch/x86/kvm/svm/svm.c    | 54 ++++++++++++++++++++++-----------------
> >  arch/x86/kvm/svm/svm.h    |  7 +++++
> >  3 files changed, 79 insertions(+), 24 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 09417f5197410..fe119da2ef836 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -467,6 +467,9 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
> > 
> >  	vmcb12 = map.hva;
> > 
> > +	if (WARN_ON(!svm->nested.initialized))
> > +		return 1;
> > +
> >  	if (!nested_vmcb_checks(svm, vmcb12)) {
> >  		vmcb12->control.exit_code    = SVM_EXIT_ERR;
> >  		vmcb12->control.exit_code_hi = 0;
> > @@ -684,6 +687,45 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> >  	return 0;
> >  }
> > 
> > +int svm_allocate_nested(struct vcpu_svm *svm)
> > +{
> > +	struct page *hsave_page;
> > +
> > +	if (svm->nested.initialized)
> > +		return 0;
> > +
> > +	hsave_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> > +	if (!hsave_page)
> > +		goto error;
> 
> goto is unnecessary, just do
> 
> 	return -ENOMEM;

To be honest this is a philosophical question of which way is better,
but I don't mind changing this.

> 
> > +
> > +	svm->nested.hsave = page_address(hsave_page);
> > +
> > +	svm->nested.msrpm = svm_vcpu_init_msrpm();
> > +	if (!svm->nested.msrpm)
> > +		goto err_free_hsave;
> > +
> > +	svm->nested.initialized = true;
> > +	return 0;
> > +
> > +err_free_hsave:
> > +	__free_page(hsave_page);
> > +error:
> > +	return 1;
> 
> As above, -ENOMEM would be preferable.

After the changes to return negative values from MSR writes, this
indeed makes sense and is done now.
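(The two-allocation pattern under discussion can be sketched in plain
userspace C — hypothetical struct and sizes, with calloc standing in for
the kernel allocators. The first failure returns -ENOMEM directly; only
the second failure has anything to unwind, so no goto labels are needed.
This is an illustrative analogue, not the actual KVM code.)

```c
#include <stdlib.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of svm_allocate_nested()/
 * svm_free_nested(): two allocations, unwinding the first if the
 * second fails, returning -ENOMEM instead of a bare "return 1". */
struct nested_state {
	void *hsave;
	void *msrpm;
	bool initialized;
};

static int nested_allocate(struct nested_state *n)
{
	if (n->initialized)
		return 0;

	n->hsave = calloc(1, 4096);	/* stands in for alloc_page() */
	if (!n->hsave)
		return -ENOMEM;		/* first failure: nothing to unwind */

	n->msrpm = calloc(1, 8192);	/* stands in for svm_vcpu_init_msrpm() */
	if (!n->msrpm) {
		free(n->hsave);		/* unwind the earlier allocation */
		n->hsave = NULL;
		return -ENOMEM;
	}

	n->initialized = true;
	return 0;
}

static void nested_free(struct nested_state *n)
{
	if (!n->initialized)
		return;
	free(n->msrpm);
	n->msrpm = NULL;
	free(n->hsave);
	n->hsave = NULL;
	n->initialized = false;
}
```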
> 
> > +}
> > +
> > +void svm_free_nested(struct vcpu_svm *svm)
> > +{
> > +	if (!svm->nested.initialized)
> > +		return;
> > +
> > +	svm_vcpu_free_msrpm(svm->nested.msrpm);
> > +	svm->nested.msrpm = NULL;
> > +
> > +	__free_page(virt_to_page(svm->nested.hsave));
> > +	svm->nested.hsave = NULL;
> > +
> > +	svm->nested.initialized = false;
> > +}
> > +
> >  /*
> >   * Forcibly leave nested mode in order to be able to reset the VCPU later on.
> >   */
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 3da5b2f1b4a19..57ea4407dcf09 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -266,6 +266,7 @@ static int get_max_npt_level(void)
> >  void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> > +	u64 old_efer = vcpu->arch.efer;
> > 
> >  	vcpu->arch.efer = efer;
> > 
> >  	if (!npt_enabled) {
> > @@ -276,9 +277,26 @@ void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> >  		efer &= ~EFER_LME;
> >  	}
> > 
> > -	if (!(efer & EFER_SVME)) {
> > -		svm_leave_nested(svm);
> > -		svm_set_gif(svm, true);
> > +	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
> > +		if (!(efer & EFER_SVME)) {
> > +			svm_leave_nested(svm);
> > +			svm_set_gif(svm, true);
> > +
> > +			/*
> > +			 * Free the nested state unless we are in SMM, in
> > +			 * which case the exit from SVM mode is only for the
> > +			 * duration of the SMI handler.
> > +			 */
> > +			if (!is_smm(&svm->vcpu))
> > +				svm_free_nested(svm);
> > +
> > +		} else {
> > +			if (svm_allocate_nested(svm)) {
> > +				vcpu->arch.efer = old_efer;
> > +				kvm_make_request(KVM_REQ_OUT_OF_MEMORY, vcpu);
> 
> I really dislike KVM_REQ_OUT_OF_MEMORY.  It's redundant with -ENOMEM and
> creates a huge discrepancy with respect to existing code, e.g. nVMX returns
> -ENOMEM in a similar situation.
> 
> The deferred error handling creates other issues, e.g. vcpu->arch.efer is
> unwound but the guest's RIP is not.
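(The direct-error alternative Sean describes can be sketched as a small
userspace analogue — hypothetical names, not the actual KVM code: the
backend callback returns 0 or a negative errno, and the caller propagates
any error straight back toward userspace instead of queuing a deferred
out-of-memory request.)

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the kvm_x86_ops.set_efer callback. */
typedef int (*set_efer_fn)(unsigned long long efer);

/* Caller-side sketch: propagate a negative errno from the callback
 * immediately, mirroring the WARN_ON(r > 0) in the suggestion —
 * only negative (errno-style) values should escape to this layer. */
static int write_efer_msr(set_efer_fn set_efer, unsigned long long efer)
{
	int r = set_efer(efer);

	if (r) {
		if (r > 0)
			fprintf(stderr, "warn: unexpected positive return %d\n", r);
		return r;	/* e.g. -ENOMEM surfaces to userspace */
	}
	return 0;
}
```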
> 
> One thought for handling this without opening a can of worms would be to do:
> 
> 	r = kvm_x86_ops.set_efer(vcpu, efer);
> 	if (r) {
> 		WARN_ON(r > 0);
> 		return r;
> 	}
> 
> I.e. go with the original approach, but only for returning errors that will
> go all the way out to userspace.

Done as explained in the other reply.

> 
> > +				return;
> > +			}
> > +		}
> >  	}
> > 
> >  	svm->vmcb->save.efer = efer | EFER_SVME;

Thanks for the review,

Best regards,
	Maxim Levitsky