From: Zhi Wang <zhi.wang.linux@gmail.com>
Date: Wed, 1 Mar 2023 12:25:31 +0200
To: Michael Roth
Subject: Re: [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op
Message-ID: <20230301122531.00002657@intel.com>
In-Reply-To: <20230220183847.59159-2-michael.roth@amd.com>
References: <20230220183847.59159-1-michael.roth@amd.com>
 <20230220183847.59159-2-michael.roth@amd.com>
On Mon, 20 Feb 2023 12:37:52 -0600
Michael Roth wrote:

Basically, I don't think kvm_mmu_fault_is_private() is promising after
going through both the SNP and TDX patches:

1) The fault path is performance-critical. kvm_mmu_fault_is_private()
always does a gfn_to_memslot() lookup, whether or not SNP/TDX is enabled.
It will mostly hit slots->last_used_slot, but the worst case is a full
rb-tree search. Adding extra overhead to the generic fault path needs to
be reconsidered carefully -- at the very least, check first whether the
guest is a CC (SNP/TDX) guest.

2) Just after the gfn_to_memslot() in kvm_mmu_fault_is_private(), there
is another gfn_to_memslot():

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
					u64 err, bool prefetch)
{
	struct kvm_page_fault fault = {
		.addr = cr2_or_gpa,
		.error_code = lower_32_bits(err),
		.exec = err & PFERR_FETCH_MASK,
		.write = err & PFERR_WRITE_MASK,
		.present = err & PFERR_PRESENT_MASK,
		.rsvd = err & PFERR_RSVD_MASK,
		.user = err & PFERR_USER_MASK,
		.prefetch = prefetch,
		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
		.nx_huge_page_workaround_enabled =
			is_nx_huge_page_enabled(vcpu->kvm),

		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
		.req_level = PG_LEVEL_4K,
		.goal_level = PG_LEVEL_4K,
		.is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
	};
	int r;

	if (vcpu->arch.mmu->root_role.direct) {
		fault.gfn = fault.addr >> PAGE_SHIFT;
		/* here */
		fault.slot = kvm_vcpu_gfn_to_memslot(vcpu, fault.gfn);
	}

I was wondering whether the private-slot check via
kvm_slot_can_be_private() is really necessary in
kvm_mmu_fault_is_private(). The TDP MMU expects fault.is_private to
indicate whether the CPU thinks the fault is private (for SNP, it is in
the PF error code; for TDX, it is the shared bit in the fault GPA). The
TDP MMU will then check whether the slot is a private slot, and leave it
to userspace to handle the case where the two disagree.

My points:

1) Resolving the PFERR in kvm_x86_ops.fault_is_private and setting
fault.is_private is enough; the rest can be handled by the TDP MMU.

2) Put kvm_x86_ops.fault_is_private in a separate patch so that the TDX
series can include it. (The 64-bit error code part can stay in another
patch.)

> This callback is used by the KVM MMU to check whether a #NPF was for a
> private GPA or not.
>
> In some cases the full 64-bit error code for the #NPF will be needed to
> make this determination, so also update kvm_mmu_do_page_fault() to
> accept the full 64-bit value so it can be plumbed through to the
> callback.
>
> Signed-off-by: Michael Roth
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/kvm/mmu/mmu.c             |  3 +--
>  arch/x86/kvm/mmu/mmu_internal.h    | 37 +++++++++++++++++++++++++++---
>  4 files changed, 37 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 8dc345cc6318..72183da010b8 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -131,6 +131,7 @@ KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP(complete_emulated_msr)
>  KVM_X86_OP(vcpu_deliver_sipi_vector)
>  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL_RET0(fault_is_private);
>
>  #undef KVM_X86_OP
>  #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e552374f2357..f856d689dda0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1643,6 +1643,7 @@ struct kvm_x86_ops {
>
>  	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
>  			     int root_level);
> +	bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code, bool *private_fault);
>
>  	bool (*has_wbinvd_exit)(void);
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index eda615f3951c..fb3f34b7391c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5724,8 +5724,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>  	}
>
>  	if (r == RET_PF_INVALID) {
> -		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
> -					  lower_32_bits(error_code), false);
> +		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false);
>  		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
>  			return -EIO;
>  	}
>
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index e642d431df4b..557a001210df 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -231,6 +231,37 @@ struct kvm_page_fault {
>
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
>
> +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> +{
> +	struct kvm_memory_slot *slot;
> +	bool private_fault = false;
> +	gfn_t gfn = gpa_to_gfn(gpa);
> +
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot) {
> +		pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
> +		goto out;
> +	}
> +
> +	if (!kvm_slot_can_be_private(slot)) {
> +		pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
> +		goto out;
> +	}
> +
> +	if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
> +		goto out;
> +
> +	/*
> +	 * Handling below is for UPM self-tests and guests that treat userspace
> +	 * as the authority on whether a fault should be private or not.
> +	 */
> +	private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
> +
> +out:
> +	pr_debug("%s: GFN: 0x%llx, private: %d\n", __func__, gfn, private_fault);
> +	return private_fault;
> +}
> +
>  /*
>   * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
>   * and of course kvm_mmu_do_page_fault().
> @@ -262,11 +293,11 @@ enum {
>  };
>
>  static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> -					u32 err, bool prefetch)
> +					u64 err, bool prefetch)
>  {
>  	struct kvm_page_fault fault = {
>  		.addr = cr2_or_gpa,
> -		.error_code = err,
> +		.error_code = lower_32_bits(err),
>  		.exec = err & PFERR_FETCH_MASK,
>  		.write = err & PFERR_WRITE_MASK,
>  		.present = err & PFERR_PRESENT_MASK,
> @@ -280,7 +311,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
>  		.req_level = PG_LEVEL_4K,
>  		.goal_level = PG_LEVEL_4K,
> -		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> +		.is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
> +	};
>  	int r;
>