linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mingwei Zhang <mizhang@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	David Hildenbrand <david@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	David Matlack <dmatlack@google.com>,
	Ben Gardon <bgardon@google.com>
Subject: Re: [PATCH v3 15/28] KVM: x86/mmu: Add dedicated helper to zap TDP MMU root shadow page
Date: Thu, 3 Mar 2022 21:19:43 +0000	[thread overview]
Message-ID: <YiEw7z9TCQJl+udS@google.com> (raw)
In-Reply-To: <20220226001546.360188-16-seanjc@google.com>

On Sat, Feb 26, 2022, Sean Christopherson wrote:
> Add a dedicated helper for zapping a TDP MMU root, and use it in the three
> flows that do "zap_all" and intentionally do not do a TLB flush if SPTEs
> are zapped (zapping an entire root is safe if and only if it cannot be in
> use by any vCPU).  Because a TLB flush is never required, unconditionally
> pass "false" to tdp_mmu_iter_cond_resched() when potentially yielding.
> 
> Opportunistically document why KVM must not yield when zapping roots that
> are being zapped by kvm_tdp_mmu_put_root(), i.e. roots whose refcount has
> reached zero, and further harden the flow to detect improper KVM behavior
> with respect to roots that are supposed to be unreachable.
> 
> In addition to hardening zapping of roots, isolating zapping of roots
> will allow future simplification of zap_gfn_range() by having it zap only
> leaf SPTEs, and by removing its tricky "zap all" heuristic.  By having
> all paths that truly need to free _all_ SPs flow through the dedicated
> root zapper, the generic zapper can be freed of those concerns.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 98 +++++++++++++++++++++++++++++++-------
>  1 file changed, 82 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 87706e9cc6f3..c5df9a552470 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -56,10 +56,6 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>  	rcu_barrier();
>  }
>  
> -static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
> -			  gfn_t start, gfn_t end, bool can_yield, bool flush,
> -			  bool shared);
> -
>  static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
>  {
>  	free_page((unsigned long)sp->spt);
> @@ -82,6 +78,9 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
>  	tdp_mmu_free_sp(sp);
>  }
>  
> +static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
> +			     bool shared);
> +
>  void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
>  			  bool shared)
>  {
> @@ -104,7 +103,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
>  	 * intermediate paging structures, that may be zapped, as such entries
>  	 * are associated with the ASID on both VMX and SVM.
>  	 */
> -	(void)zap_gfn_range(kvm, root, 0, -1ull, false, false, shared);
> +	tdp_mmu_zap_root(kvm, root, shared);
>  
>  	call_rcu(&root->rcu_head, tdp_mmu_free_sp_rcu_callback);
>  }
> @@ -751,6 +750,76 @@ static inline bool __must_check tdp_mmu_iter_cond_resched(struct kvm *kvm,
>  	return iter->yielded;
>  }
>  
> +static inline gfn_t tdp_mmu_max_gfn_host(void)
> +{
> +	/*
> +	 * Bound TDP MMU walks at host.MAXPHYADDR, guest accesses beyond that
> +	 * will hit a #PF(RSVD) and never hit an EPT Violation/Misconfig / #NPF,
> +	 * and so KVM will never install a SPTE for such addresses.
> +	 */
> +	return 1ULL << (shadow_phys_bits - PAGE_SHIFT);
> +}
> +
> +static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
> +			     bool shared)
> +{
> +	bool root_is_unreachable = !refcount_read(&root->tdp_mmu_root_count);
> +	struct tdp_iter iter;
> +
> +	gfn_t end = tdp_mmu_max_gfn_host();
> +	gfn_t start = 0;
> +
> +	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
> +
> +	rcu_read_lock();
> +
> +	/*
> +	 * No need to try to step down in the iterator when zapping an entire
> +	 * root, zapping an upper-level SPTE will recurse on its children.
> +	 */
> +	for_each_tdp_pte_min_level(iter, root, root->role.level, start, end) {
> +retry:
> +		/*
> +		 * Yielding isn't allowed when zapping an unreachable root as
> +		 * the root won't be processed by mmu_notifier callbacks.  When
> +		 * handling an unmap/release mmu_notifier command, KVM must
> +		 * drop all references to relevant pages prior to completing
> +		 * the callback.  Dropping mmu_lock can result in zapping SPTEs
> +		 * for an unreachable root after a relevant callback completes,
> +		 * which leads to use-after-free as zapping a SPTE triggers
> +		 * "writeback" of dirty/accessed bits to the SPTE's associated
> +		 * struct page.
> +		 */

I have a quick question here: when the roots are unreachable, we can't
yield, understand that after reading the comments. However, what if
there are too many SPTEs that need to be zapped that requires yielding.
In this case, I guess we will have a RCU warning, which is unavoidable,
right?
> +		if (!root_is_unreachable &&
> +		    tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
> +			continue;
> +
> +		if (!is_shadow_present_pte(iter.old_spte))
> +			continue;
> +
> +		if (!shared) {
> +			tdp_mmu_set_spte(kvm, &iter, 0);
> +		} else if (tdp_mmu_set_spte_atomic(kvm, &iter, 0)) {
> +			/*
> +			 * cmpxchg() shouldn't fail if the root is unreachable.
> +			 * Retry so as not to leak the page and its children.
> +			 */
> +			WARN_ONCE(root_is_unreachable,
> +				  "Contended TDP MMU SPTE in unreachable root.");
> +			goto retry;
> +		}
> +
> +		/*
> +		 * WARN if the root is invalid and is unreachable, all SPTEs
> +		 * should've been zapped by kvm_tdp_mmu_zap_invalidated_roots(),
> +		 * and inserting new SPTEs under an invalid root is a KVM bug.
> +		 */
> +		WARN_ON_ONCE(root_is_unreachable && root->role.invalid);
> +	}
> +
> +	rcu_read_unlock();
> +}
> +
>  bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>  	u64 old_spte;
> @@ -799,8 +868,7 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>  			  gfn_t start, gfn_t end, bool can_yield, bool flush,
>  			  bool shared)
>  {
> -	gfn_t max_gfn_host = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
> -	bool zap_all = (start == 0 && end >= max_gfn_host);
> +	bool zap_all = (start == 0 && end >= tdp_mmu_max_gfn_host());
>  	struct tdp_iter iter;
>  
>  	/*
> @@ -809,12 +877,7 @@ static bool zap_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
>  	 */
>  	int min_level = zap_all ? root->role.level : PG_LEVEL_4K;
>  
> -	/*
> -	 * Bound the walk at host.MAXPHYADDR, guest accesses beyond that will
> -	 * hit a #PF(RSVD) and never get to an EPT Violation/Misconfig / #NPF,
> -	 * and so KVM will never install a SPTE for such addresses.
> -	 */
> -	end = min(end, max_gfn_host);
> +	end = min(end, tdp_mmu_max_gfn_host());
>  
>  	kvm_lockdep_assert_mmu_lock_held(kvm, shared);
>  
> @@ -874,6 +937,7 @@ bool __kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, int as_id, gfn_t start,
>  
>  void kvm_tdp_mmu_zap_all(struct kvm *kvm)
>  {
> +	struct kvm_mmu_page *root;
>  	int i;
>  
>  	/*
> @@ -881,8 +945,10 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm)
>  	 * is being destroyed or the userspace VMM has exited.  In both cases,
>  	 * KVM_RUN is unreachable, i.e. no vCPUs will ever service the request.
>  	 */
> -	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
> -		(void)kvm_tdp_mmu_zap_gfn_range(kvm, i, 0, -1ull, false);
> +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> +		for_each_tdp_mmu_root_yield_safe(kvm, root, i, false)
> +			tdp_mmu_zap_root(kvm, root, false);
> +	}
>  }
>  
>  /*
> @@ -908,7 +974,7 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
>  		 * will still flush on yield, but that's a minor performance
>  		 * blip and not a functional issue.
>  		 */
> -		(void)zap_gfn_range(kvm, root, 0, -1ull, true, false, true);
> +		tdp_mmu_zap_root(kvm, root, true);
>  
>  		/*
>  		 * Put the reference acquired in kvm_tdp_mmu_invalidate_roots().
> -- 
> 2.35.1.574.g5d30c73bfb-goog
> 

  parent reply	other threads:[~2022-03-03 21:19 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-26  0:15 [PATCH v3 00/28] KVM: x86/mmu: Overhaul TDP MMU zapping and flushing Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 01/28] KVM: x86/mmu: Use common iterator for walking invalid TDP MMU roots Sean Christopherson
2022-03-02 19:08   ` Mingwei Zhang
2022-03-02 19:51     ` Sean Christopherson
2022-03-03  0:57       ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 02/28] KVM: x86/mmu: Check for present SPTE when clearing dirty bit in TDP MMU Sean Christopherson
2022-03-02 19:50   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 03/28] KVM: x86/mmu: Fix wrong/misleading comments in TDP MMU fast zap Sean Christopherson
2022-02-28 23:15   ` Ben Gardon
2022-02-26  0:15 ` [PATCH v3 04/28] KVM: x86/mmu: Formalize TDP MMU's (unintended?) deferred TLB flush logic Sean Christopherson
2022-03-02 23:59   ` Mingwei Zhang
2022-03-03  0:12     ` Sean Christopherson
2022-03-03  1:20       ` Mingwei Zhang
2022-03-03  1:41         ` Sean Christopherson
2022-03-03  4:50           ` Mingwei Zhang
2022-03-03 16:45             ` Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 05/28] KVM: x86/mmu: Document that zapping invalidated roots doesn't need to flush Sean Christopherson
2022-02-28 23:17   ` Ben Gardon
2022-02-26  0:15 ` [PATCH v3 06/28] KVM: x86/mmu: Require mmu_lock be held for write in unyielding root iter Sean Christopherson
2022-02-28 23:26   ` Ben Gardon
2022-02-26  0:15 ` [PATCH v3 07/28] KVM: x86/mmu: Check for !leaf=>leaf, not PFN change, in TDP MMU SP removal Sean Christopherson
2022-03-01  0:11   ` Ben Gardon
2022-03-03 18:02   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 08/28] KVM: x86/mmu: Batch TLB flushes from TDP MMU for MMU notifier change_spte Sean Christopherson
2022-03-03 18:08   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 09/28] KVM: x86/mmu: Drop RCU after processing each root in MMU notifier hooks Sean Christopherson
2022-03-03 18:24   ` Mingwei Zhang
2022-03-03 18:32   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 10/28] KVM: x86/mmu: Add helpers to read/write TDP MMU SPTEs and document RCU Sean Christopherson
2022-03-03 18:34   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 11/28] KVM: x86/mmu: WARN if old _or_ new SPTE is REMOVED in non-atomic path Sean Christopherson
2022-03-03 18:37   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 12/28] KVM: x86/mmu: Refactor low-level TDP MMU set SPTE helper to take raw vals Sean Christopherson
2022-03-03 18:47   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 13/28] KVM: x86/mmu: Zap only the target TDP MMU shadow page in NX recovery Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 14/28] KVM: x86/mmu: Skip remote TLB flush when zapping all of TDP MMU Sean Christopherson
2022-03-01  0:19   ` Ben Gardon
2022-03-03 18:50   ` Mingwei Zhang
2022-02-26  0:15 ` [PATCH v3 15/28] KVM: x86/mmu: Add dedicated helper to zap TDP MMU root shadow page Sean Christopherson
2022-03-01  0:32   ` Ben Gardon
2022-03-03 21:19   ` Mingwei Zhang [this message]
2022-03-03 21:24     ` Mingwei Zhang
2022-03-03 23:06       ` Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 16/28] KVM: x86/mmu: Require mmu_lock be held for write to zap TDP MMU range Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 17/28] KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range() Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 18/28] KVM: x86/mmu: Do remote TLB flush before dropping RCU in TDP MMU resched Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 19/28] KVM: x86/mmu: Defer TLB flush to caller when freeing TDP MMU shadow pages Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 20/28] KVM: x86/mmu: Allow yielding when zapping GFNs for defunct TDP MMU root Sean Christopherson
2022-03-01 18:21   ` Paolo Bonzini
2022-03-01 19:43     ` Sean Christopherson
2022-03-01 20:12       ` Paolo Bonzini
2022-03-02  2:13         ` Sean Christopherson
2022-03-02 14:54           ` Paolo Bonzini
2022-03-02 17:43             ` Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 21/28] KVM: x86/mmu: Zap roots in two passes to avoid inducing RCU stalls Sean Christopherson
2022-03-01  0:43   ` Ben Gardon
2022-02-26  0:15 ` [PATCH v3 22/28] KVM: x86/mmu: Zap defunct roots via asynchronous worker Sean Christopherson
2022-03-01 17:57   ` Ben Gardon
2022-03-02 17:25   ` Paolo Bonzini
2022-03-02 17:35     ` Sean Christopherson
2022-03-02 18:33       ` David Matlack
2022-03-02 18:36         ` Paolo Bonzini
2022-03-02 18:01     ` Sean Christopherson
2022-03-02 18:20       ` Paolo Bonzini
2022-03-02 19:33         ` Sean Christopherson
2022-03-02 20:14           ` Paolo Bonzini
2022-03-02 20:47             ` Sean Christopherson
2022-03-02 21:22               ` Paolo Bonzini
2022-03-02 22:25                 ` Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 23/28] KVM: x86/mmu: Check for a REMOVED leaf SPTE before making the SPTE Sean Christopherson
2022-03-01 18:06   ` Ben Gardon
2022-02-26  0:15 ` [PATCH v3 24/28] KVM: x86/mmu: WARN on any attempt to atomically update REMOVED SPTE Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 25/28] KVM: selftests: Move raw KVM_SET_USER_MEMORY_REGION helper to utils Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 26/28] KVM: selftests: Split out helper to allocate guest mem via memfd Sean Christopherson
2022-02-28 23:36   ` David Woodhouse
2022-03-02 18:36     ` Paolo Bonzini
2022-03-02 21:55       ` David Woodhouse
2022-02-26  0:15 ` [PATCH v3 27/28] KVM: selftests: Define cpu_relax() helpers for s390 and x86 Sean Christopherson
2022-02-26  0:15 ` [PATCH v3 28/28] KVM: selftests: Add test to populate a VM with the max possible guest mem Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YiEw7z9TCQJl+udS@google.com \
    --to=mizhang@google.com \
    --cc=bgardon@google.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=frankja@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).