Subject: Re: [PATCH v4 2/2] KVM: nSVM: implement ondemand allocation of the nested state
From: Maxim Levitsky
To: Sean Christopherson
Cc: kvm@vger.kernel.org, Vitaly Kuznetsov, Ingo Molnar, Wanpeng Li,
 "H. Peter Anvin", Borislav Petkov, Jim Mattson, Paolo Bonzini,
 Joerg Roedel, "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)",
 linux-kernel@vger.kernel.org, Thomas Gleixner
Date: Mon, 21 Sep 2020 16:23:47 +0300
In-Reply-To: <20200917162942.GE13522@sjchrist-ice>
References: <20200917101048.739691-1-mlevitsk@redhat.com>
 <20200917101048.739691-3-mlevitsk@redhat.com>
 <20200917162942.GE13522@sjchrist-ice>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2020-09-17 at 09:29 -0700, Sean Christopherson wrote:
> On Thu, Sep 17, 2020 at 01:10:48PM +0300, Maxim Levitsky wrote:
> > This way we don't waste memory on VMs which don't use
> > nested virtualization even if it is available to them.
> > 
> > If allocation of the nested state fails (which should only happen
> > when the host is about to OOM anyway), use the new
> > KVM_REQ_OUT_OF_MEMORY request to shut down the guest.
> > 
> > Signed-off-by: Maxim Levitsky
> > ---
> >  arch/x86/kvm/svm/nested.c | 42 ++++++++++++++++++++++++++++++
> >  arch/x86/kvm/svm/svm.c    | 54 ++++++++++++++++++++++-----------------
> >  arch/x86/kvm/svm/svm.h    |  7 +++++
> >  3 files changed, 79 insertions(+), 24 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 09417f5197410..fe119da2ef836 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -467,6 +467,9 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
> > 
> >  	vmcb12 = map.hva;
> > 
> > +	if (WARN_ON(!svm->nested.initialized))
> > +		return 1;
> > +
> >  	if (!nested_vmcb_checks(svm, vmcb12)) {
> >  		vmcb12->control.exit_code    = SVM_EXIT_ERR;
> >  		vmcb12->control.exit_code_hi = 0;
> > @@ -684,6 +687,45 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> >  	return 0;
> >  }
> > 
> > +int svm_allocate_nested(struct vcpu_svm *svm)
> > +{
> > +	struct page *hsave_page;
> > +
> > +	if (svm->nested.initialized)
> > +		return 0;
> > +
> > +	hsave_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> > +	if (!hsave_page)
> > +		goto error;
> 
> goto is unnecessary, just do
> 
> 	return -ENOMEM;

To be honest this is a philosophical question of which way is better,
but I don't mind changing this.

> 
> > +
> > +	svm->nested.hsave = page_address(hsave_page);
> > +
> > +	svm->nested.msrpm = svm_vcpu_init_msrpm();
> > +	if (!svm->nested.msrpm)
> > +		goto err_free_hsave;
> > +
> > +	svm->nested.initialized = true;
> > +	return 0;
> > +
> > +err_free_hsave:
> > +	__free_page(hsave_page);
> > +error:
> > +	return 1;
> 
> As above, -ENOMEM would be preferable.

After the changes to return negative values from MSR writes, this
indeed makes sense and is done now.
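(The two-allocation pattern under discussion can be sketched in plain
userspace C — hypothetical struct and sizes, with calloc standing in for
the kernel allocators. The first failure returns -ENOMEM directly; only
the second failure has anything to unwind, so no goto labels are needed.
This is an illustrative analogue, not the actual KVM code.)

```c
#include <stdlib.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of svm_allocate_nested()/
 * svm_free_nested(): two allocations, unwinding the first if the
 * second fails, returning -ENOMEM instead of a bare "return 1". */
struct nested_state {
	void *hsave;
	void *msrpm;
	bool initialized;
};

static int nested_allocate(struct nested_state *n)
{
	if (n->initialized)
		return 0;

	n->hsave = calloc(1, 4096);	/* stands in for alloc_page() */
	if (!n->hsave)
		return -ENOMEM;		/* first failure: nothing to unwind */

	n->msrpm = calloc(1, 8192);	/* stands in for svm_vcpu_init_msrpm() */
	if (!n->msrpm) {
		free(n->hsave);		/* unwind the earlier allocation */
		n->hsave = NULL;
		return -ENOMEM;
	}

	n->initialized = true;
	return 0;
}

static void nested_free(struct nested_state *n)
{
	if (!n->initialized)
		return;
	free(n->msrpm);
	n->msrpm = NULL;
	free(n->hsave);
	n->hsave = NULL;
	n->initialized = false;
}
```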
> 
> > +}
> > +
> > +void svm_free_nested(struct vcpu_svm *svm)
> > +{
> > +	if (!svm->nested.initialized)
> > +		return;
> > +
> > +	svm_vcpu_free_msrpm(svm->nested.msrpm);
> > +	svm->nested.msrpm = NULL;
> > +
> > +	__free_page(virt_to_page(svm->nested.hsave));
> > +	svm->nested.hsave = NULL;
> > +
> > +	svm->nested.initialized = false;
> > +}
> > +
> >  /*
> >   * Forcibly leave nested mode in order to be able to reset the VCPU later on.
> >   */
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 3da5b2f1b4a19..57ea4407dcf09 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -266,6 +266,7 @@ static int get_max_npt_level(void)
> >  void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> > +	u64 old_efer = vcpu->arch.efer;
> > 
> >  	vcpu->arch.efer = efer;
> > 
> >  	if (!npt_enabled) {
> > @@ -276,9 +277,26 @@ void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> >  		efer &= ~EFER_LME;
> >  	}
> > 
> > -	if (!(efer & EFER_SVME)) {
> > -		svm_leave_nested(svm);
> > -		svm_set_gif(svm, true);
> > +	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
> > +		if (!(efer & EFER_SVME)) {
> > +			svm_leave_nested(svm);
> > +			svm_set_gif(svm, true);
> > +
> > +			/*
> > +			 * Free the nested state unless we are in SMM, in
> > +			 * which case the exit from SVM mode is only for the
> > +			 * duration of the SMI handler.
> > +			 */
> > +			if (!is_smm(&svm->vcpu))
> > +				svm_free_nested(svm);
> > +
> > +		} else {
> > +			if (svm_allocate_nested(svm)) {
> > +				vcpu->arch.efer = old_efer;
> > +				kvm_make_request(KVM_REQ_OUT_OF_MEMORY, vcpu);
> 
> I really dislike KVM_REQ_OUT_OF_MEMORY.  It's redundant with -ENOMEM and
> creates a huge discrepancy with respect to existing code, e.g. nVMX returns
> -ENOMEM in a similar situation.
> 
> The deferred error handling creates other issues, e.g. vcpu->arch.efer is
> unwound but the guest's RIP is not.
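(The direct-error alternative Sean describes can be sketched as a small
userspace analogue — hypothetical names, not the actual KVM code: the
backend callback returns 0 or a negative errno, and the caller propagates
any error straight back toward userspace instead of queuing a deferred
out-of-memory request.)

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the kvm_x86_ops.set_efer callback. */
typedef int (*set_efer_fn)(unsigned long long efer);

/* Caller-side sketch: propagate a negative errno from the callback
 * immediately, mirroring the WARN_ON(r > 0) in the suggestion —
 * only negative (errno-style) values should escape to this layer. */
static int write_efer_msr(set_efer_fn set_efer, unsigned long long efer)
{
	int r = set_efer(efer);

	if (r) {
		if (r > 0)
			fprintf(stderr, "warn: unexpected positive return %d\n", r);
		return r;	/* e.g. -ENOMEM surfaces to userspace */
	}
	return 0;
}
```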
> 
> One thought for handling this without opening a can of worms would be to do:
> 
> 	r = kvm_x86_ops.set_efer(vcpu, efer);
> 	if (r) {
> 		WARN_ON(r > 0);
> 		return r;
> 	}
> 
> I.e. go with the original approach, but only for returning errors that will
> go all the way out to userspace.

Done as explained in the other reply.

> 
> > +				return;
> > +			}
> > +		}
> >  	}
> > 
> >  	svm->vmcb->save.efer = efer | EFER_SVME;

Thanks for the review,

Best regards,
	Maxim Levitsky