Date: Wed, 22 Feb 2023 22:42:53 +0200
From: Zhi Wang
To: Michael Roth
Subject: Re: [PATCH RFC v8 31/56] KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
Message-ID: <20230222224253.00002740@gmail.com>
In-Reply-To: <20230220183847.59159-32-michael.roth@amd.com>
References: <20230220183847.59159-1-michael.roth@amd.com> <20230220183847.59159-32-michael.roth@amd.com>

On Mon, 20 Feb 2023 12:38:22 -0600 Michael Roth wrote:

> From: Brijesh Singh
>
> Implement a workaround for an SNP erratum where the CPU will incorrectly
> signal an RMP violation #PF if a hugepage (2mb or 1gb) collides with the
> RMP entry of a VMCB, VMSA or AVIC backing page.
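The condition behind this erratum can be sketched in plain C. This is a userspace model, not kernel code, and it assumes x86-64 with 4 KiB base pages, so one 2 MiB region spans 512 PFNs (the x86-64 value of PTRS_PER_PMD). Following the description in this commit message, a spurious RMP #PF is only possible when the in-use page is itself 2 MiB aligned and another page in the same 2 MiB region is accessed through a 2 MiB mapping; 1 GiB mappings are not modelled. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 values: 4 KiB base pages, 2 MiB PMD-level mappings. */
#define SKETCH_PAGE_SHIFT    12
#define SKETCH_PMD_SHIFT     21
#define SKETCH_PAGES_PER_2MB (1UL << (SKETCH_PMD_SHIFT - SKETCH_PAGE_SHIFT))

/* 512 on x86-64, the same value as PTRS_PER_PMD. */
static_assert(SKETCH_PAGES_PER_2MB == 512, "2 MiB must span 512 4 KiB pages");

/* True if the PFN is the first page of a 2 MiB-aligned region. */
static bool pfn_is_2mb_aligned(uint64_t pfn)
{
	return pfn % SKETCH_PAGES_PER_2MB == 0;
}

/* Index of the 2 MiB region that contains the PFN. */
static uint64_t pfn_to_2mb_region(uint64_t pfn)
{
	return pfn / SKETCH_PAGES_PER_2MB;
}

/*
 * Hypothetical model of the erratum: on early SNP hardware, an access
 * through a 2 MiB mapping to another page in the same region as a
 * 2 MiB-aligned in-use page may be treated as touching the in-use page.
 * Accessing the in-use page itself is a genuine fault, so it is excluded.
 */
static bool may_trigger_spurious_rmp_fault(uint64_t in_use_pfn,
					   uint64_t accessed_pfn,
					   bool via_2mb_mapping)
{
	return via_2mb_mapping &&
	       accessed_pfn != in_use_pfn &&
	       pfn_is_2mb_aligned(in_use_pfn) &&
	       pfn_to_2mb_region(accessed_pfn) == pfn_to_2mb_region(in_use_pfn);
}
```

For example, with the in-use page at PFN 0x80000 (2 MiB aligned), an access to PFN 0x80064 through a 2 MiB mapping is flagged, while the same access through 4 KiB mappings, or with the in-use page at PFN 0x80001, is not.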
>
> When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC
> backing pages as "in-use" in the RMP after a successful VMRUN. This

Is this "in-use" bit part of an RMP entry? If so, it would be better to
use its name as listed in the APM.

> is done for _all_ VMs, not just SNP-Active VMs.

_All_ VMs? Do you mean SEV VMs and SEV-SNP VMs? I guess legacy VMs are
not affected, right?

>
> If the hypervisor accesses an in-use page through a writable
> translation, the CPU will throw an RMP violation #PF. On early SNP
> hardware, if an in-use page is 2mb aligned and software accesses any
> part of the associated 2mb region with a hupage, the CPU will
                                           ^hugepage
> incorrectly treat the entire 2mb region as in-use and signal a spurious
> RMP violation #PF.
>
> The recommended is to not use the hugepage for the VMCB, VMSA or
> AVIC backing page. Add a generic allocator that will ensure that the
> page returns is not hugepage (2mb or 1gb) and is safe to be used when
> SEV-SNP is enabled.
>
> Co-developed-by: Marc Orr
> Signed-off-by: Marc Orr
> Signed-off-by: Brijesh Singh
> Signed-off-by: Ashish Kalra
> Signed-off-by: Michael Roth
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  2 ++
>  arch/x86/kvm/lapic.c               |  5 ++++-
>  arch/x86/kvm/svm/sev.c             | 33 ++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c             | 15 ++++++++++++--
>  arch/x86/kvm/svm/svm.h             |  1 +
>  6 files changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 6a885f024a00..e116405cbb5f 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP(complete_emulated_msr)
>  KVM_X86_OP(vcpu_deliver_sipi_vector)
>  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
>  KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
>  KVM_X86_OP_OPTIONAL_RET0(update_mem_attr)
>  KVM_X86_OP_OPTIONAL(invalidate_restricted_mem)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 37c92412035f..a9363a6f779d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1729,6 +1729,8 @@ struct kvm_x86_ops {
>  	 * Returns vCPU specific APICv inhibit reasons
>  	 */
>  	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
> +
> +	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
>  };
>
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 80f92cbc4029..72e46d5b4201 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2740,7 +2740,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
>
>  	vcpu->arch.apic = apic;
>
> -	apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> +	if (kvm_x86_ops.alloc_apic_backing_page)
> +		apic->regs = static_call(kvm_x86_alloc_apic_backing_page)(vcpu);
> +	else
> +		apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
>  	if (!apic->regs) {
>  		printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
>  		       vcpu->vcpu_id);
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index c1f0d4898ce3..9e9efb42a766 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3241,3 +3241,36 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
>  		break;
>  	}
>  }
> +
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long pfn;
> +	struct page *p;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +
> +	/*
> +	 * Allocate an SNP safe page to workaround the SNP erratum where
> +	 * the CPU will incorrectly signal an RMP violation #PF if a
> +	 * hugepage (2mb or 1gb) collides with the RMP entry of VMCB, VMSA
> +	 * or AVIC backing page. The recommeded workaround is to not use the
> +	 * hugepage.
> +	 *
> +	 * Allocate one extra page, use a page which is not 2mb aligned
> +	 * and free the other.
> +	 */
> +	p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
> +	if (!p)
> +		return NULL;
> +
> +	split_page(p, 1);
> +
> +	pfn = page_to_pfn(p);
> +	if (IS_ALIGNED(pfn, PTRS_PER_PMD))
> +		__free_page(p++);
> +	else
> +		__free_page(p + 1);
> +
> +	return p;
> +}

The duplicated allocation logic in snp_alloc_vmsa_page() in sev.c could be
replaced with a call to snp_safe_alloc_page().

> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 213593dbd7a1..1061aaf66f0a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1372,7 +1372,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
>  	svm = to_svm(vcpu);
>
>  	err = -ENOMEM;
> -	vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +	vmcb01_page = snp_safe_alloc_page(vcpu);
>  	if (!vmcb01_page)
>  		goto out;
>
> @@ -1381,7 +1381,7 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
>  		 * SEV-ES guests require a separate VMSA page used to contain
>  		 * the encrypted register state of the guest.
>  		 */
> -		vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +		vmsa_page = snp_safe_alloc_page(vcpu);
>  		if (!vmsa_page)
>  			goto error_free_vmcb_page;
>
> @@ -4696,6 +4696,16 @@ static int svm_vm_init(struct kvm *kvm)
>  	return 0;
>  }
>
> +static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
> +{
> +	struct page *page = snp_safe_alloc_page(vcpu);
> +
> +	if (!page)
> +		return NULL;
> +
> +	return page_address(page);
> +}
> +
>  static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.name = KBUILD_MODNAME,
>
> @@ -4824,6 +4834,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>
>  	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
>  	.vcpu_get_apicv_inhibit_reasons = avic_vcpu_get_apicv_inhibit_reasons,
> +	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
>  };
>
>  /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index c249c360fe36..5efcf036ccad 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -692,6 +692,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>  void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> +struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>
>  /* vmenter.S */
>
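The page choice made by snp_safe_alloc_page() above can also be modelled in userspace. This sketch assumes x86-64 (PTRS_PER_PMD == 512) and that alloc_pages() with order 1 returns a block whose first PFN is even, since buddy blocks are naturally aligned to their size. Because the second PFN is then odd, at most one page of the pair can be 2 MiB aligned, so keeping the other one is always safe. snp_safe_pick_pfn() is a hypothetical name, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 value of PTRS_PER_PMD (4 KiB pages, 2 MiB PMDs). */
#define SKETCH_PTRS_PER_PMD 512UL

/* Userspace stand-in for IS_ALIGNED(pfn, PTRS_PER_PMD). */
static bool pfn_is_2mb_aligned(uint64_t pfn)
{
	return pfn % SKETCH_PTRS_PER_PMD == 0;
}

/*
 * Hypothetical model of the choice in snp_safe_alloc_page(): given the
 * first PFN of an order-1 block, return the PFN that is kept. The other
 * page of the pair is the one that would be freed.
 */
static uint64_t snp_safe_pick_pfn(uint64_t first_pfn)
{
	/* Order-1 buddy blocks are aligned to two pages. */
	assert(first_pfn % 2 == 0);

	if (pfn_is_2mb_aligned(first_pfn))
		return first_pfn + 1;	/* like __free_page(p++) */

	return first_pfn;		/* like __free_page(p + 1) */
}
```

Since a 1 GiB-aligned PFN is also 2 MiB aligned, the single IS_ALIGNED(pfn, PTRS_PER_PMD) check in the patch covers both hugepage sizes named in the commit message; for instance, snp_safe_pick_pfn(0x40000) keeps PFN 0x40001.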