Date: Sat, 7 Aug 2021 10:00:06 +1200
From: Kai Huang
To: Sean Christopherson
Cc: isaku.yamahata@intel.com, Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H .
Peter Anvin" , Paolo Bonzini , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , erdemaktas@google.com, Connor Kuehl , x86@kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, isaku.yamahata@gmail.com, Rick Edgecombe
Subject: Re: [RFC PATCH v2 41/69] KVM: x86: Add infrastructure for stolen GPA bits
Message-Id: <20210807100006.3518bf9fbdecf13006030c22@intel.com>
In-Reply-To:
References: <20210805234424.d14386b79413845b990a18ac@intel.com> <20210806095922.6e2ca6587dc6f5b4fe8d52e7@intel.com>

On Fri, 6 Aug 2021 19:02:39 +0000 Sean Christopherson wrote:

> On Fri, Aug 06, 2021, Kai Huang wrote:
> > On Thu, 5 Aug 2021 16:06:41 +0000 Sean Christopherson wrote:
> > > On Thu, Aug 05, 2021, Kai Huang wrote:
> > > > On Fri, 2 Jul 2021 15:04:47 -0700 isaku.yamahata@intel.com wrote:
> > > > > From: Rick Edgecombe
> > > > > @@ -2020,6 +2032,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> > > > >  	sp = kvm_mmu_alloc_page(vcpu, direct);
> > > > >
> > > > >  	sp->gfn = gfn;
> > > > > +	sp->gfn_stolen_bits = gfn_stolen_bits;
> > > > >  	sp->role = role;
> > > > >  	hlist_add_head(&sp->hash_link, sp_list);
> > > > >  	if (!direct) {
> > > > > @@ -2044,6 +2057,13 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> > > > >  	return sp;
> > > > >  }
> > > > >
> > > >
> > > > Sorry for replying to an old thread,
> > >
> > > Ha, one month isn't old, it's barely even mature.
> > >
> > > > but to me it looks weird to have gfn_stolen_bits
> > > > in 'struct kvm_mmu_page'.
> > > > If I understand correctly, the above code basically
> > > > means that GFNs with different stolen bits will have different 'struct
> > > > kvm_mmu_page's, but in the context of this patch, mappings with different
> > > > stolen bits still use the same root,
> > >
> > > You're conflating "mapping" with "PTE". The GFN is a per-PTE value. Yes, there
> > > is a final GFN that is representative of the mapping, but more directly the final
> > > GFN is associated with the leaf PTE.
> > >
> > > TDX effectively adds the restriction that all PTEs used for a mapping must have
> > > the same shared/private status, so mapping and PTE are somewhat interchangeable
> > > when talking about stolen bits (the shared bit), but in the context of this patch,
> > > the stolen bits are a property of the PTE.
> >
> > Yes, it is a property of the PTE; this is the reason that I think it's weird to
> > have stolen bits in 'struct kvm_mmu_page'. Shouldn't stolen bits in 'struct
> > kvm_mmu_page' imply that all PTEs (whether leaf or not) share the same
> > stolen bit?
>
> No, the stolen bits are the property of the shadow page. I'm using "PTE" above
> to mean "PTE for this shadow page", not PTEs within the shadow page, if that makes
> sense.

I see.

>
> > > Back to your statement, it's incorrect. PTEs (effectively mappings in TDX) with
> > > different stolen bits will _not_ use the same root. kvm_mmu_get_page() includes
> > > the stolen bits in both the hash lookup and in the comparison, i.e. restores the
> > > stolen bits when looking for an existing shadow page at the target GFN.
> > >
> > > @@ -1978,9 +1990,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> > >  		role.quadrant = quadrant;
> > >  	}
> > >
> > > -	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
> > > +	sp_list = &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn_and_stolen)];
> > >  	for_each_valid_sp(vcpu->kvm, sp, sp_list) {
> > > -		if (sp->gfn != gfn) {
> > > +		if ((sp->gfn | sp->gfn_stolen_bits) != gfn_and_stolen) {
> > >  			collisions++;
> > >  			continue;
> > >  		}
> > >
> >
> > This only works for non-root tables, but there's only one single
> > vcpu->arch.mmu->root_hpa. We don't have an array with one root for each
> > stolen bit (i.e. a loop in mmu_alloc_direct_roots()), so effectively all
> > stolen bits share one single root.
>
> Yes, and that's absolutely the required behavior for everything except for TDX
> with its two EPTPs. E.g. any other implementation _must_ reject CR3s that set stolen
> gfn bits.

OK. I was thinking that gfn_stolen_bits in 'struct kvm_mmu_page' should still make sense for the table pointed to by CR3.

>
> > > > which means gfn_stolen_bits doesn't make a lot of sense at least for root
> > > > page table.
> > >
> > > It does make sense, even without a follow-up patch. In Rick's original series,
> > > stealing a bit for execute-only guest memory, there was only a single root. And
> > > except for TDX, there can only ever be a single root because the shared EPTP isn't
> > > usable, i.e. there's only the regular/private EPTP.
> > >
> > > > Instead, having gfn_stolen_bits in 'struct kvm_mmu_page' only makes sense in
> > > > the context of TDX, since TDX requires two separate roots for private and
> > > > shared mappings.
> > > >
> > > > So given we cannot tell whether the same root or different roots should be
> > > > used for different stolen bits, I think we should not add 'gfn_stolen_bits' to
> > > > 'struct kvm_mmu_page' and use it to determine whether to allocate a new table
> > > > for the same GFN, but should use a new role (i.e. role.private) to determine that.
> > >
> > > A new role would work, too, but it has the disadvantage of not automagically
> > > working for all uses of stolen bits, e.g. XO support would have to add another
> > > role bit.
> >
> > A new role bit can be defined for each particular stolen bit's purpose. For
> > instance, in __direct_map(), if the stolen bit is the TDX shared bit, you don't
> > set role.private (otherwise you set role.private). For XO, if the stolen bit
> > is XO, you set role.xo.
> >
> > We already have 'gfn_stolen_mask' in the vcpu, so we just need to make sure
> > all code paths can find the actual stolen bit based on sp->role and the vcpu (I
> > haven't gone through all of them, though; I assume the annoying part is rmap).
>
> Yes, and I'm not totally against the idea, but I'm also not 100% sold on it either,
> yet... The idea of a 'private' flag is growing on me.
>
> If we're treating the shared bit as an attribute bit, which we are, then it's
> effectively an extension of role.access. Ditto for XO.
>
> And looking at the code, I think it would be an improvement for TDX, as all of
> the is_private_gfn() calls that operate on a shadow page would be simplified and
> optimized as they wouldn't have to look up both gfn_stolen_bits and the vcpu->kvm
> mask of the shared bit.
>
> Actually, the more I think about it, the more I like it. For TDX, there's no
> risk of increased hash collisions, as we've already done messed up if there's a
> shared vs. private collision.
>
> And for XO, if it ever makes its way upstream, I think we should flat out disallow
> referencing XO addresses in non-leaf PTEs, i.e.
> make the XO permission bit reserved in non-leaf PTEs. That would avoid any
> theoretical problems with the guest doing something stupid by polluting all its
> upper level PxEs with XO. Collisions would be purely limited to the case where the
> guest is intentionally creating an alternate mapping, which should be a rare event
> (or the guest is compromised, which is also hopefully a rare event).
>
>

My main motivation is that 'gfn_stolen_bits' doesn't quite make sense in 'struct
kvm_mmu_page' for the root, plus it seems a little redundant at first glance. So
could we have your final suggestion? :)

Thanks,
-Kai
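For illustration only, the role-bit alternative discussed above can be modeled outside the kernel. This is a minimal standalone sketch, not KVM code. The names union page_role, struct shadow_page, make_role(), get_shadow_page(), sp_is_private() and SHARED_GFN_MASK are all hypothetical. The shared bit is assumed to be GPA bit 47, i.e. GFN bit 35 with 4 KiB pages.

The sketch shows two things:

- Keying the shadow-page lookup on (gfn, role), with a role.private bit, keeps the private and shared tables for the same GFN distinct without storing gfn_stolen_bits in the page.
- A private check on a shadow page then needs only sp->role.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical standalone model; none of these are KVM's real definitions. */
typedef uint64_t gfn_t;

/* Assumption: shared bit is GPA bit 47, i.e. GFN bit 35 with 4 KiB pages. */
#define SHARED_GFN_MASK ((gfn_t)1 << (47 - 12))

/* Stand-in for a shadow-page role, plus the proposed 'private' bit. */
union page_role {
	unsigned int word;
	struct {
		unsigned int level : 4;
		unsigned int private : 1;	/* 1 = private, 0 = shared */
	};
};

/* Stand-in for a shadow page: note there is no gfn_stolen_bits field. */
struct shadow_page {
	gfn_t gfn;		/* GFN with the shared bit stripped */
	union page_role role;
};

#define MAX_SHADOW_PAGES 16
static struct shadow_page pages[MAX_SHADOW_PAGES];
static size_t nr_pages;

/* Derive the role from the faulting GFN: shared bit clear means private. */
static union page_role make_role(gfn_t gfn_and_stolen, unsigned int level)
{
	union page_role role = { .word = 0 };	/* zero unused bits so word compares work */

	role.level = level;
	role.private = !(gfn_and_stolen & SHARED_GFN_MASK);
	return role;
}

/*
 * Look up or allocate a shadow page keyed on (gfn, role.word). Because
 * role.private is part of role.word, private and shared tables for the
 * same GFN never match each other.
 */
static struct shadow_page *get_shadow_page(gfn_t gfn_and_stolen, unsigned int level)
{
	gfn_t gfn = gfn_and_stolen & ~SHARED_GFN_MASK;
	union page_role role = make_role(gfn_and_stolen, level);

	for (size_t i = 0; i < nr_pages; i++) {
		if (pages[i].gfn == gfn && pages[i].role.word == role.word)
			return &pages[i];
	}
	if (nr_pages == MAX_SHADOW_PAGES)
		return NULL;	/* this model has no reclaim */

	pages[nr_pages].gfn = gfn;
	pages[nr_pages].role = role;
	return &pages[nr_pages++];
}

/* A private check on a shadow page needs only its role. */
static bool sp_is_private(const struct shadow_page *sp)
{
	return sp->role.private;
}
```

The patch compares (sp->gfn | sp->gfn_stolen_bits) against gfn_and_stolen. This model instead strips the stolen bit from the stored GFN and moves the private/shared distinction into the role comparison. The separate question of how many roots exist is not modeled here.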