* [PATCH 0/2] KVM: Fix mmu_notifier vs. pfncache race
@ 2022-04-20  0:48 Sean Christopherson
From: Sean Christopherson @ 2022-04-20  0:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, David Woodhouse, Mingwei Zhang, Sean Christopherson

Fix a race between mmu_notifier invalidation and pfncache refresh, and
then fix another race of sorts within pfncache that exists because of the
hacky approach pfncache currently employs to try to handle races with
mmu_notifiers.

Both issues were found by inspection and not proven on hardware.

Sean Christopherson (2):
  KVM: Fix race between mmu_notifier invalidation and pfncache refresh
  KVM: Do not speculatively mark pfn cache valid to "fix" race

 virt/kvm/kvm_main.c |  9 ++++++
 virt/kvm/pfncache.c | 79 +++++++++++++++++++++++++++++----------------
 2 files changed, 61 insertions(+), 27 deletions(-)


base-commit: 150866cd0ec871c765181d145aa0912628289c8a
-- 
2.36.0.rc0.470.gd361397f0d-goog



* [PATCH 1/2] KVM: Fix race between mmu_notifier invalidation and pfncache refresh
@ 2022-04-20  0:48 Sean Christopherson
From: Sean Christopherson @ 2022-04-20  0:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, David Woodhouse, Mingwei Zhang, Sean Christopherson

Key off of mn_active_invalidate_count to detect that a pfncache refresh
needs to wait for an in-progress mmu_notifier invalidation.  Using the
common "retry" approach is broken as the invalidation occurs _before_
mmu_notifier_count is elevated and before mmu_notifier_range_start is
set/updated.  The cache refresh handles the race where an invalidation
occurs between dropping and re-acquiring gpc->lock, but it doesn't handle
the scenario where the cache is refreshed after the cache was invalidated
by the notifier, but before the notifier elevates mmu_notifier_count.

  CPU0                                    CPU1
  ----                                    ----

  gfn_to_pfn_cache_invalidate_start()
  |
  -> gpc->valid = false;
                                          kvm_gfn_to_pfn_cache_refresh()
                                          |
                                          |-> gpc->valid = true;

                                          hva_to_pfn_retry()
                                          |
                                          -> acquire kvm->mmu_lock
                                             kvm->mmu_notifier_count == 0
                                             mmu_seq == kvm->mmu_notifier_seq
                                             drop kvm->mmu_lock
                                             return pfn 'X'
  acquire kvm->mmu_lock
  kvm_inc_notifier_count()
  drop kvm->mmu_lock
  kernel frees pfn 'X'
                                          kvm_gfn_to_pfn_cache_check()
                                          |
                                          |-> gpc->valid == true

                                          caller accesses freed pfn 'X'
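
For reference, the "retry" pattern that this race slips past is the
mmu_seq-based check the old hva_to_pfn_retry() performed, distilled here
from the code removed in the diff below:

  mmu_seq = kvm->mmu_notifier_seq;
  smp_rmb();

  new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);

  KVM_MMU_READ_LOCK(kvm);
  retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
  KVM_MMU_READ_UNLOCK(kvm);

Because the cache is zapped before mmu_notifier_count is elevated and
before mmu_notifier_seq is bumped, the check above can observe a quiescent
notifier state even though the cache was just invalidated.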

While mn_active_invalidate_count is not guaranteed to be stable, it is
guaranteed to be elevated prior to an invalidation acquiring gpc->lock,
so either the refresh will see an active invalidation and wait, or the
invalidation will run after the refresh completes.
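
Distilled, the refresh side of that handshake (implemented in full in
hva_to_pfn_retry() below) is:

  /* Retry if an invalidation is in flight; its zap may have been missed. */
  if (kvm->mn_active_invalidate_count)
          continue;

  /* Order the count read before the seq read (pairs with range_end()). */
  smp_rmb();
  if (kvm->mmu_notifier_seq == mmu_seq)
          break;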

Waiting for in-progress invalidations to complete also obviates the need
to mark gpc->valid before calling hva_to_pfn_retry(), which is in itself
flawed as a concurrent kvm_gfn_to_pfn_cache_check() would see a "valid"
cache with garbage pfn/khva values.  That issue will be remedied in a
future commit.

Opportunistically change the do-while(1) to a for(;;) to make it easier
to see that the loop runs until a break condition is encountered; the size
of the loop body makes the trailing while(1) rather difficult to see.

Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: stable@vger.kernel.org
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c |  9 ++++++
 virt/kvm/pfncache.c | 70 +++++++++++++++++++++++++++++++--------------
 2 files changed, 58 insertions(+), 21 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dfb7dabdbc63..0848430f36c6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -705,6 +705,15 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	kvm->mn_active_invalidate_count++;
 	spin_unlock(&kvm->mn_invalidate_lock);
 
+	/*
+	 * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
+	 * before acquiring mmu_lock, to avoid holding mmu_lock while acquiring
+	 * each cache's lock.  There are relatively few caches in existence at
+	 * any given time, and the caches themselves can check for hva overlap,
+	 * i.e. don't need to rely on memslot overlap checks for performance.
+	 * Because this runs without holding mmu_lock, the pfn caches must use
+	 * mn_active_invalidate_count (see above) instead of mmu_notifier_count.
+	 */
 	gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end,
 					  hva_range.may_block);
 
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index dd84676615f1..71c84a43024c 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -112,29 +112,63 @@ static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, gpa_t gpa)
 	}
 }
 
-static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
+static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 {
+	bool first_attempt = true;
 	unsigned long mmu_seq;
 	kvm_pfn_t new_pfn;
-	int retry;
 
-	do {
+	lockdep_assert_held_write(&gpc->lock);
+
+	for (;;) {
 		mmu_seq = kvm->mmu_notifier_seq;
 		smp_rmb();
 
+		write_unlock_irq(&gpc->lock);
+
+		/* Opportunistically check for resched while the lock isn't held. */
+		if (!first_attempt)
+			cond_resched();
+
 		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+		new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
+
+		write_lock_irq(&gpc->lock);
+
 		if (is_error_noslot_pfn(new_pfn))
 			break;
 
-		KVM_MMU_READ_LOCK(kvm);
-		retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
-		KVM_MMU_READ_UNLOCK(kvm);
-		if (!retry)
+		first_attempt = false;
+
+		/*
+		 * Wait for mn_active_invalidate_count, not mmu_notifier_count,
+		 * to go away, as the invalidation in the mmu_notifier event
+		 * occurs _before_ mmu_notifier_count is elevated.
+		 *
+		 * Note, mn_active_invalidate_count can change at any time as
+		 * it's not protected by gpc->lock.  But, it is guaranteed to
+		 * be elevated before the mmu_notifier acquires gpc->lock, and
+		 * isn't dropped until after mmu_notifier_seq is updated.  So,
+		 * this task may get a false positive of sorts, i.e. see an
+		 * elevated count and wait even though it's technically safe to
+		 * proceed (because the mmu_notifier will invalidate the cache
+		 * _after_ it's refreshed here), but the cache will never be
+		 * refreshed with stale data, i.e. won't get false negatives.
+		 */
+		if (kvm->mn_active_invalidate_count)
+			continue;
+
+		/*
+		 * Ensure mn_active_invalidate_count is read before
+		 * mmu_notifier_seq.  This pairs with the smp_wmb() in
+		 * mmu_notifier_invalidate_range_end() to guarantee either the
+		 * old (non-zero) value of mn_active_invalidate_count or the
+		 * new (incremented) value of mmu_notifier_seq is observed.
+		 */
+		smp_rmb();
+		if (kvm->mmu_notifier_seq == mmu_seq)
 			break;
-
-		cond_resched();
-	} while (1);
+	}
 
 	return new_pfn;
 }
@@ -190,7 +224,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
 	if (!old_valid || old_uhva != gpc->uhva) {
-		unsigned long uhva = gpc->uhva;
 		void *new_khva = NULL;
 
 		/* Placeholders for "hva is valid but not yet mapped" */
@@ -198,15 +231,10 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 		gpc->khva = NULL;
 		gpc->valid = true;
 
-		write_unlock_irq(&gpc->lock);
-
-		new_pfn = hva_to_pfn_retry(kvm, uhva);
+		new_pfn = hva_to_pfn_retry(kvm, gpc);
 		if (is_error_noslot_pfn(new_pfn)) {
 			ret = -EFAULT;
-			goto map_done;
-		}
-
-		if (gpc->usage & KVM_HOST_USES_PFN) {
+		} else if (gpc->usage & KVM_HOST_USES_PFN) {
 			if (new_pfn == old_pfn) {
 				new_khva = old_khva;
 				old_pfn = KVM_PFN_ERR_FAULT;
@@ -222,10 +250,10 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 				new_khva += page_offset;
 			else
 				ret = -EFAULT;
+		} else {
+			/* Nothing more to do, the pfn is consumed only by the guest. */
 		}
 
-	map_done:
-		write_lock_irq(&gpc->lock);
 		if (ret) {
 			gpc->valid = false;
 			gpc->pfn = KVM_PFN_ERR_FAULT;
-- 
2.36.0.rc0.470.gd361397f0d-goog



* [PATCH 2/2] KVM: Do not speculatively mark pfn cache valid to "fix" race
@ 2022-04-20  0:48 Sean Christopherson
From: Sean Christopherson @ 2022-04-20  0:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, David Woodhouse, Mingwei Zhang, Sean Christopherson

When refreshing a stale pfn cache, mark it valid only after the new pfn
(and maybe kernel mapping) has been acquired.  Speculatively marking the
cache as valid with garbage pfn/khva values can lead to other tasks using
the garbage values.  Presumably, the cache was previously marked valid
before dropping gpc->lock to address a race where an mmu_notifier event
could invalidate the cache between retrieving the pfn via hva_to_pfn()
and re-acquiring gpc->lock.  That race has been plugged by waiting for
mn_active_invalidate_count to reach zero before returning from
hva_to_pfn_retry(), i.e. waiting for any notifiers to fully complete
before returning.
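
Distilled, the success path in kvm_gfn_to_pfn_cache_refresh() becomes (see
the diff below for full context):

  new_pfn = hva_to_pfn_retry(kvm, gpc);
  ...
  /* Publish the new mapping only once it is fully formed. */
  gpc->valid = true;
  gpc->pfn = new_pfn;
  gpc->khva = new_khva;

with the final stores done under gpc->lock held for write, so a concurrent
reader that acquires the lock for read can never observe valid == true
paired with placeholder pfn/khva values.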

Note, this could also be "fixed" by having kvm_gfn_to_pfn_cache_check()
also verify gpc->khva, but that's unnecessary now that the aforementioned
race between refresh and mmu_notifier invalidation has been eliminated.

  CPU0                                    CPU1
  ----                                    ----

                                          gpc->valid == false

                                          kvm_gfn_to_pfn_cache_refresh()
                                          |
                                          |-> gpc->valid = true;

                                          hva_to_pfn_retry()
                                          |
                                          -> drop gpc->lock

  acquire gpc->lock for read
  kvm_gfn_to_pfn_cache_check()
  |
  |-> gpc->valid == true
      gpc->khva  == NULL

  caller dereferences NULL khva

Opportunistically add a lockdep assertion to the check() helper to make it
slightly more obvious that there is no memory ordering issue when setting
"valid" versus "pfn" and/or "khva".

Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: stable@vger.kernel.org
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/pfncache.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 71c84a43024c..72eee096a7cd 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -81,6 +81,8 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 
+	lockdep_assert_held_read(&gpc->lock);
+
 	if ((gpa & ~PAGE_MASK) + len > PAGE_SIZE)
 		return false;
 
@@ -226,11 +228,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	if (!old_valid || old_uhva != gpc->uhva) {
 		void *new_khva = NULL;
 
-		/* Placeholders for "hva is valid but not yet mapped" */
-		gpc->pfn = KVM_PFN_ERR_FAULT;
-		gpc->khva = NULL;
-		gpc->valid = true;
-
 		new_pfn = hva_to_pfn_retry(kvm, gpc);
 		if (is_error_noslot_pfn(new_pfn)) {
 			ret = -EFAULT;
@@ -259,7 +256,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 			gpc->pfn = KVM_PFN_ERR_FAULT;
 			gpc->khva = NULL;
 		} else {
-			/* At this point, gpc->valid may already have been cleared */
+			gpc->valid = true;
 			gpc->pfn = new_pfn;
 			gpc->khva = new_khva;
 		}
-- 
2.36.0.rc0.470.gd361397f0d-goog



* Re: [PATCH 0/2] KVM: Fix mmu_notifier vs. pfncache race
@ 2022-04-20 10:41 Paolo Bonzini
From: Paolo Bonzini @ 2022-04-20 10:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, David Woodhouse, Mingwei Zhang

Queued, thanks.

Paolo


