From: Zhi Wang <zhi.wang.linux@gmail.com>
Date: Tue, 28 Feb 2023 21:11:57 +0200
To: Michael Roth
Cc: ..., Brijesh Singh
Subject: Re: [PATCH RFC v8 44/56] KVM: SVM: Add support to handle the RMP nested page fault
Message-ID: <20230228211157.0000071b@intel.com>
In-Reply-To: <20230220183847.59159-45-michael.roth@amd.com>
References: <20230220183847.59159-1-michael.roth@amd.com>
 <20230220183847.59159-45-michael.roth@amd.com>

On Mon, 20 Feb 2023 12:38:35 -0600
Michael Roth wrote:

> From: Brijesh Singh
> 
> When SEV-SNP is enabled in the guest, the hardware places restrictions
> on all memory accesses based on the contents of the RMP table. When the
> hardware encounters an RMP check failure caused by a guest memory
> access, it raises a #NPF.
> The error code contains additional information on
> the access type. See the APM volume 2 for additional information.
> 
> Page state changes are handled by userspace, so if an RMP fault is
> triggered as a result of an RMP NPT fault, exit to userspace just like
> with explicit page-state change requests.
> 
> RMP NPT faults can also occur if the guest pvalidates a 2M page as 4K,
> in which case the RMP entries need to be PSMASH'd. Handle this case
> immediately in the kernel.
> 
> Co-developed-by: Michael Roth
> Signed-off-by: Michael Roth
> Signed-off-by: Brijesh Singh
> Signed-off-by: Ashish Kalra
> ---
>  arch/x86/kvm/svm/sev.c | 84 ++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/svm/svm.c | 21 +++++++++--
>  arch/x86/kvm/svm/svm.h |  1 +
>  3 files changed, 102 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 102966c43e28..197b1f904567 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3347,6 +3347,13 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
>  	svm->vmcb->control.ghcb_gpa = value;
>  }
>  
> +static int snp_rmptable_psmash(struct kvm *kvm, kvm_pfn_t pfn)
> +{
> +	pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
> +
> +	return psmash(pfn);
> +}
> +
>  /*
>   * TODO: need to get the value set by userspace in vcpu->run->vmgexit.ghcb_msr
>   * and process that here accordingly.
> @@ -3872,3 +3879,80 @@ void sev_adjust_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *le
>  	pr_debug("%s: GFN: 0x%llx, PFN: 0x%llx, level: %d, rmp_level: %d, level_orig: %d, assigned: %d\n",
>  		 __func__, gfn, pfn, *level, rmp_level, level_orig, assigned);
>  }
> +
> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
> +{
> +	int order, rmp_level, assigned, ret;
> +	struct kvm_memory_slot *slot;
> +	struct kvm *kvm = vcpu->kvm;
> +	kvm_pfn_t pfn;
> +	gfn_t gfn;
> +
> +	/*
> +	 * Private memslots punt handling of implicit page state changes to
	                    ^put
> +	 * userspace, so the only RMP faults expected here are for
> +	 * PFERR_GUEST_SIZEM_MASK. Anything else suggests that the RMP table has
> +	 * gotten out of sync with the private memslot.
> +	 *
> +	 * TODO: However, this case has also been noticed when an access occurs
> +	 * to an NPT mapping that has just been split/PSMASHED, in which case
> +	 * PFERR_GUEST_SIZEM_MASK might not be set. In those cases it should be
> +	 * safe to ignore and let the guest retry, but log these just in case
> +	 * for now.
> +	 */
> +	if (!(error_code & PFERR_GUEST_SIZEM_MASK)) {
> +		pr_warn_ratelimited("Unexpected RMP fault for GPA 0x%llx, error_code 0x%llx",
> +				    gpa, error_code);
> +		return;
> +	}
> +
> +	gfn = gpa >> PAGE_SHIFT;
> +
> +	/*
> +	 * Only RMPADJUST/PVALIDATE should cause PFERR_GUEST_SIZEM.
> +	 *
> +	 * For PVALIDATE, this should only happen if a guest PVALIDATEs a 4K GFN
> +	 * that is backed by a huge page in the host whose RMP entry has the
> +	 * hugepage/assigned bits set. With UPM, that should only ever happen
> +	 * for private pages.
> +	 *
> +	 * For RMPADJUST, this assumption might not hold, in which case handling
> +	 * for obtaining the PFN from HVA-backed memory may be needed. For now,
> +	 * just print warnings.
> +	 */
> +	if (!kvm_mem_is_private(kvm, gfn)) {
> +		pr_warn_ratelimited("Unexpected RMP fault, size-mismatch for non-private GPA 0x%llx\n",
> +				    gpa);
> +		return;
> +	}
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!kvm_slot_can_be_private(slot)) {
> +		pr_warn_ratelimited("Unexpected RMP fault, non-private slot for GPA 0x%llx\n",
> +				    gpa);
> +		return;
> +	}
> +
> +	ret = kvm_restrictedmem_get_pfn(slot, gfn, &pfn, &order);
> +	if (ret) {
> +		pr_warn_ratelimited("Unexpected RMP fault, no private backing page for GPA 0x%llx\n",
> +				    gpa);
> +		return;
> +	}
> +
> +	assigned = snp_lookup_rmpentry(pfn, &rmp_level);
> +	if (assigned != 1) {
> +		pr_warn_ratelimited("Unexpected RMP fault, no assigned RMP entry for GPA 0x%llx\n",
> +				    gpa);
> +		goto out;
> +	}
> +
> +	ret = snp_rmptable_psmash(kvm, pfn);
> +	if (ret)
> +		pr_err_ratelimited("Unable to split RMP entries for GPA 0x%llx PFN 0x%llx ret %d\n",
> +				   gpa, pfn, ret);
> +
> +out:
> +	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
> +	put_page(pfn_to_page(pfn));
> +}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 9eb750c8b04c..f9ab4bf6d245 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1976,15 +1976,28 @@ static int pf_interception(struct kvm_vcpu *vcpu)
>  static int npf_interception(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	int rc;
>  
>  	u64 fault_address = svm->vmcb->control.exit_info_2;
>  	u64 error_code = svm->vmcb->control.exit_info_1;
>  
>  	trace_kvm_page_fault(vcpu, fault_address, error_code);
> -	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
> -				  static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
> -				  svm->vmcb->control.insn_bytes : NULL,
> -				  svm->vmcb->control.insn_len);
> +	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
> +				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
> +				svm->vmcb->control.insn_bytes : NULL,
> +				svm->vmcb->control.insn_len);
> +
> +	/*
> +	 * rc == 0 indicates a userspace exit is needed to handle page
> +	 * transitions, so do that first before updating the RMP table.
> +	 */
> +	if (error_code & PFERR_GUEST_RMP_MASK) {
> +		if (rc == 0)
> +			return rc;
> +		handle_rmp_page_fault(vcpu, fault_address, error_code);
> +	}
> +
> +	return rc;
>  }
>  
>  static int db_interception(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 0c655a4d32d5..13b00233b315 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -714,6 +714,7 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
>  struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
>  void sev_adjust_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *level);
> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
>  
>  /* vmenter.S */
> 