From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20220303193842.370645-1-pbonzini@redhat.com> <20220303193842.370645-19-pbonzini@redhat.com>
In-Reply-To: <20220303193842.370645-19-pbonzini@redhat.com>
From: Mingwei Zhang
Date: Sun, 13 Mar 2022 11:40:44 -0700
Subject: Re: [PATCH v4 18/30] KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()
To: Paolo Bonzini
Cc: LKML, kvm, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, David Hildenbrand, David Matlack, Ben Gardon
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 3, 2022 at 11:39 AM Paolo Bonzini wrote:
>
> From: Sean Christopherson
>
> Zap only leaf SPTEs in the TDP MMU's zap_gfn_range(), and rename various
> functions accordingly.
> When removing mappings for functional correctness
> (except for the stupid VFIO GPU passthrough memslots bug), zapping the
> leaf SPTEs is sufficient as the paging structures themselves do not point
> at guest memory and do not directly impact the final translation (in the
> TDP MMU).
>
> Note, this aligns the TDP MMU with the legacy/full MMU, which zaps only
> the rmaps, a.k.a. leaf SPTEs, in kvm_zap_gfn_range() and
> kvm_unmap_gfn_range().
>
> Signed-off-by: Sean Christopherson
> Reviewed-by: Ben Gardon
> Message-Id: <20220226001546.360188-18-seanjc@google.com>
> Signed-off-by: Paolo Bonzini
> ---
>  arch/x86/kvm/mmu/mmu.c     |  4 ++--
>  arch/x86/kvm/mmu/tdp_mmu.c | 41 ++++++++++----------------------------
>  arch/x86/kvm/mmu/tdp_mmu.h |  8 +-------
>  3 files changed, 14 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 8408d7db8d2a..febdcaaa7b94 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5834,8 +5834,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>
>  	if (is_tdp_mmu_enabled(kvm)) {
>  		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
> -			flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
> -							  gfn_end, flush);
> +			flush = kvm_tdp_mmu_zap_leafs(kvm, i, gfn_start,
> +						      gfn_end, true, flush);
>  	}
>
>  	if (flush)
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index f3939ce4a115..c71debdbc732 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -834,10 +834,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>  }
>
>  /*
> - * Tears down the mappings for the range of gfns, [start, end), and frees the
> - * non-root pages mapping GFNs strictly within that range. Returns true if
> - * SPTEs have been cleared and a TLB flush is needed before releasing the
> - * MMU lock.
> + * Zap leafs SPTEs for the range of gfns, [start, end). Returns true if SPTEs
> + * have been cleared and a TLB flush is needed before releasing the MMU lock.
>   *
>   * If can_yield is true, will release the MMU lock and reschedule if the
>   * scheduler needs the CPU or there is contention on the MMU lock. If this
> @@ -845,42 +843,25 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>   * the caller must ensure it does not supply too large a GFN range, or the
>   * operation can cause a soft lockup.
>   */
> -static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> -			  gfn_t start, gfn_t end, bool can_yield, bool flush)
> +static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> +			      gfn_t start, gfn_t end, bool can_yield, bool flush)
>  {
> -	bool zap_all = (start == 0 && end >= tdp_mmu_max_gfn_host());
>  	struct tdp_iter iter;
>
> -	/*
> -	 * No need to try to step down in the iterator when zapping all SPTEs,
> -	 * zapping the top-level non-leaf SPTEs will recurse on their children.
> -	 */
> -	int min_level = zap_all ? root->role.level : PG_LEVEL_4K;
> -
>  	end = min(end, tdp_mmu_max_gfn_host());
>
>  	lockdep_assert_held_write(&kvm->mmu_lock);
>
>  	rcu_read_lock();
>
> -	for_each_tdp_pte_min_level(iter, root, min_level, start, end) {
> +	for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
>  		if (can_yield &&
>  		    tdp_mmu_iter_cond_resched(kvm, &iter, flush, false)) {
>  			flush = false;
>  			continue;
>  		}
>
> -		if (!is_shadow_present_pte(iter.old_spte))
> -			continue;
> -
> -		/*
> -		 * If this is a non-last-level SPTE that covers a larger range
> -		 * than should be zapped, continue, and zap the mappings at a
> -		 * lower level, except when zapping all SPTEs.
> -		 */
> -		if (!zap_all &&
> -		    (iter.gfn < start ||
> -		     iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end) &&
> +		if (!is_shadow_present_pte(iter.old_spte) ||
>  		    !is_last_spte(iter.old_spte, iter.level))
>  			continue;
>
> @@ -898,13 +879,13 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>   * SPTEs have been cleared and a TLB flush is needed before releasing the
>   * MMU lock.
>   */
> -bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
> -				 gfn_t end, bool can_yield, bool flush)
> +bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
> +			   bool can_yield, bool flush)
>  {
>  	struct kvm_mmu_page *root;
>
>  	for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
> -		flush = zap_gfn_range(kvm, root, start, end, can_yield, flush);
> +		flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, false);

Hmm, I think we might have to be very careful here. If we only zap the
leafs, then there could be side effects. For instance, the code in
disallowed_hugepage_adjust() may not work as intended. Check the
following condition in arch/x86/kvm/mmu/mmu.c:2918:

	if (cur_level > PG_LEVEL_4K && cur_level == fault->goal_level &&
	    is_shadow_present_pte(spte) && !is_large_pte(spte)) {

If we previously used 4K mappings in this range for various reasons
(dirty logging etc.), and afterwards we zap the range, then when the
guest touches a 4K page again we should map the range at the maximum
level we can give the guest. However, if we zap only the leafs, then by
the time the code reaches the location above, is_shadow_present_pte(spte)
returns true, since the spte is a non-leaf (say, a regular PMD entry).
The whole if statement is then true, and we never allow remapping guest
memory with huge pages.
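
To make that scenario concrete, here is a minimal user-space sketch of the
quoted check. This is not the kernel code: the helpers, the fault struct
and the SPTE encoding below are simplified stand-ins for the functions
named above, just to show how a present non-leaf entry trips the condition.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PG_LEVEL_4K	1
#define PG_LEVEL_2M	2

/* Simplified stand-ins: 0 means not present, bit 7 is the "large page" bit. */
static bool is_shadow_present_pte(uint64_t spte) { return spte != 0; }
static bool is_large_pte(uint64_t spte)          { return spte & (1ull << 7); }

struct fault { int goal_level; };

/* Mirrors the condition quoted from disallowed_hugepage_adjust(). */
static bool forces_4k_mapping(int cur_level, const struct fault *fault,
			      uint64_t spte)
{
	return cur_level > PG_LEVEL_4K && cur_level == fault->goal_level &&
	       is_shadow_present_pte(spte) && !is_large_pte(spte);
}

int main(void)
{
	/*
	 * After a leaf-only zap, the 2M-level entry is still a present
	 * non-leaf SPTE (it points at a page table, not at guest memory).
	 */
	uint64_t pmd_spte = 0x1000;	/* present, large-page bit clear */
	struct fault fault = { .goal_level = PG_LEVEL_2M };

	if (forces_4k_mapping(PG_LEVEL_2M, &fault, pmd_spte))
		printf("fault demoted to 4K even though 2M is now possible\n");
	return 0;
}

With only the 4K leafs zapped, the 2M-level entry stays present and
non-large, so the condition fires and the sketch prints the demotion
message, which is the behavior described above.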
>
>  	return flush;
>  }
> @@ -1202,8 +1183,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
>  				 bool flush)
>  {
> -	return __kvm_tdp_mmu_zap_gfn_range(kvm, range->slot->as_id, range->start,
> -					   range->end, range->may_block, flush);
> +	return kvm_tdp_mmu_zap_leafs(kvm, range->slot->as_id, range->start,
> +				     range->end, range->may_block, flush);
>  }
>
>  typedef bool (*tdp_handler_t)(struct kvm *kvm, struct tdp_iter *iter,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
> index 5e5ef2576c81..54bc8118c40a 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.h
> +++ b/arch/x86/kvm/mmu/tdp_mmu.h
> @@ -15,14 +15,8 @@ __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
>  void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
>  			  bool shared);
>
> -bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
> +bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start,
>  			   gfn_t end, bool can_yield, bool flush);
> -static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id,
> -					     gfn_t start, gfn_t end, bool flush)
> -{
> -	return __kvm_tdp_mmu_zap_gfn_range(kvm, as_id, start, end, true, flush);
> -}
> -
>  bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp);
>  void kvm_tdp_mmu_zap_all(struct kvm *kvm);
>  void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
> --
> 2.31.1
>
>