* [PATCH v3 0/8] KVM: Fix mmu_notifier vs. pfncache vs. pfncache races
From: Sean Christopherson @ 2022-04-29 21:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Sean Christopherson, Lai Jiangshan,
	David Woodhouse, Mingwei Zhang

Fix races between mmu_notifier invalidation and pfncache refresh, and
within the pfncache itself.

The first two patches are reverts of the patches sitting in kvm/queue;
trying to separate and fix the races independently is nigh impossible.
I assume/hope they can be ignored and the original patches dropped.

I've proven all the races, though I was never able to trigger an actual
error in the race with the mmu_notifier, only a WARN I added on the
hva=>pfn translation being invalid/not-present when accessing memory
via the khva.  Hitting the race also required a series of handoffs in the
kernel between the two tasks, i.e. I can't provide an upstream-worthy
test :-(
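
For reference, the debug check was along these lines (a hypothetical
reconstruction, not something that was ever posted), dropped in just
before dereferencing the cached khva:

	/*
	 * Hypothetical debug-only check: do an atomic (no-fault) lookup
	 * so that a swapped/migrated page surfaces as an error pfn
	 * instead of being faulted back in.
	 */
	WARN_ON_ONCE(is_error_noslot_pfn(hva_to_pfn(gpc->uhva, true, NULL,
						    true, NULL)));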

v3:
  - Split the refresh serialization to a separate patch.
  - Use a mutex to serialize refreshes. [Lai Jiangshan]
  - Add back Cc to stable@ (omitted in v2 because I was less confident
    that backporting the mess would be a good idea).

v2:
  - https://lore.kernel.org/all/20220427014004.1992589-1-seanjc@google.com
  - Map the pfn=>khva outside of gpc->lock. [Maxim]
  - Fix a page leak.
  - Fix more races.

v1:
  https://lore.kernel.org/all/20220420004859.3298837-1-seanjc@google.com

Sean Christopherson (8):
  Revert "KVM: Do not speculatively mark pfn cache valid to "fix" race"
  Revert "KVM: Fix race between mmu_notifier invalidation and pfncache
    refresh"
  KVM: Drop unused @gpa param from gfn=>pfn cache's __release_gpc()
    helper
  KVM: Put the extra pfn reference when reusing a pfn in the gpc cache
  KVM: Do not incorporate page offset into gfn=>pfn cache user address
  KVM: Fully serialize gfn=>pfn cache refresh via mutex
  KVM: Fix multiple races in gfn=>pfn cache refresh
  KVM: Do not pin pages tracked by gfn=>pfn caches

 include/linux/kvm_types.h |   2 +
 virt/kvm/pfncache.c       | 180 +++++++++++++++++++++++---------------
 2 files changed, 113 insertions(+), 69 deletions(-)


base-commit: 2a39d8b39bffdaf1a4223d0d22f07baee154c8f3
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v3 1/8] Revert "KVM: Do not speculatively mark pfn cache valid to "fix" race"
From: Sean Christopherson @ 2022-04-29 21:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Sean Christopherson, Lai Jiangshan,
	David Woodhouse, Mingwei Zhang

This reverts commit 55111927df1cd140aa7b7ea3f33f524b87776381.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/pfncache.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 72eee096a7cd..71c84a43024c 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -81,8 +81,6 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 
-	lockdep_assert_held_read(&gpc->lock);
-
 	if ((gpa & ~PAGE_MASK) + len > PAGE_SIZE)
 		return false;
 
@@ -228,6 +226,11 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	if (!old_valid || old_uhva != gpc->uhva) {
 		void *new_khva = NULL;
 
+		/* Placeholders for "hva is valid but not yet mapped" */
+		gpc->pfn = KVM_PFN_ERR_FAULT;
+		gpc->khva = NULL;
+		gpc->valid = true;
+
 		new_pfn = hva_to_pfn_retry(kvm, gpc);
 		if (is_error_noslot_pfn(new_pfn)) {
 			ret = -EFAULT;
@@ -256,7 +259,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 			gpc->pfn = KVM_PFN_ERR_FAULT;
 			gpc->khva = NULL;
 		} else {
-			gpc->valid = true;
+			/* At this point, gpc->valid may already have been cleared */
 			gpc->pfn = new_pfn;
 			gpc->khva = new_khva;
 		}
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v3 2/8] Revert "KVM: Fix race between mmu_notifier invalidation and pfncache refresh"
From: Sean Christopherson @ 2022-04-29 21:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Sean Christopherson, Lai Jiangshan,
	David Woodhouse, Mingwei Zhang

This reverts commit c496097d2c0bdc229f82d72b4b1e55d64974c316.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c |  9 ------
 virt/kvm/pfncache.c | 70 ++++++++++++++-------------------------------
 2 files changed, 21 insertions(+), 58 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0848430f36c6..dfb7dabdbc63 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -705,15 +705,6 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	kvm->mn_active_invalidate_count++;
 	spin_unlock(&kvm->mn_invalidate_lock);
 
-	/*
-	 * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
-	 * before acquiring mmu_lock, to avoid holding mmu_lock while acquiring
-	 * each cache's lock.  There are relatively few caches in existence at
-	 * any given time, and the caches themselves can check for hva overlap,
-	 * i.e. don't need to rely on memslot overlap checks for performance.
-	 * Because this runs without holding mmu_lock, the pfn caches must use
-	 * mn_active_invalidate_count (see above) instead of mmu_notifier_count.
-	 */
 	gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end,
 					  hva_range.may_block);
 
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 71c84a43024c..dd84676615f1 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -112,63 +112,29 @@ static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, gpa_t gpa)
 	}
 }
 
-static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
+static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
 {
-	bool first_attempt = true;
 	unsigned long mmu_seq;
 	kvm_pfn_t new_pfn;
+	int retry;
 
-	lockdep_assert_held_write(&gpc->lock);
-
-	for (;;) {
+	do {
 		mmu_seq = kvm->mmu_notifier_seq;
 		smp_rmb();
 
-		write_unlock_irq(&gpc->lock);
-
-		/* Opportunistically check for resched while the lock isn't held. */
-		if (!first_attempt)
-			cond_resched();
-
 		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
-
-		write_lock_irq(&gpc->lock);
-
+		new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
 		if (is_error_noslot_pfn(new_pfn))
 			break;
 
-		first_attempt = false;
-
-		/*
-		 * Wait for mn_active_invalidate_count, not mmu_notifier_count,
-		 * to go away, as the invalidation in the mmu_notifier event
-		 * occurs _before_ mmu_notifier_count is elevated.
-		 *
-		 * Note, mn_active_invalidate_count can change at any time as
-		 * it's not protected by gpc->lock.  But, it is guaranteed to
-		 * be elevated before the mmu_notifier acquires gpc->lock, and
-		 * isn't dropped until after mmu_notifier_seq is updated.  So,
-		 * this task may get a false positive of sorts, i.e. see an
-		 * elevated count and wait even though it's technically safe to
-		 * proceed (because the mmu_notifier will invalidate the cache
-		 * _after_ it's refreshed here), but the cache will never be
-		 * refreshed with stale data, i.e. won't get false negatives.
-		 */
-		if (kvm->mn_active_invalidate_count)
-			continue;
-
-		/*
-		 * Ensure mn_active_invalidate_count is read before
-		 * mmu_notifier_seq.  This pairs with the smp_wmb() in
-		 * mmu_notifier_invalidate_range_end() to guarantee either the
-		 * old (non-zero) value of mn_active_invalidate_count or the
-		 * new (incremented) value of mmu_notifier_seq is observed.
-		 */
-		smp_rmb();
-		if (kvm->mmu_notifier_seq == mmu_seq)
+		KVM_MMU_READ_LOCK(kvm);
+		retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
+		KVM_MMU_READ_UNLOCK(kvm);
+		if (!retry)
 			break;
-	}
+
+		cond_resched();
+	} while (1);
 
 	return new_pfn;
 }
@@ -224,6 +190,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
 	if (!old_valid || old_uhva != gpc->uhva) {
+		unsigned long uhva = gpc->uhva;
 		void *new_khva = NULL;
 
 		/* Placeholders for "hva is valid but not yet mapped" */
@@ -231,10 +198,15 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 		gpc->khva = NULL;
 		gpc->valid = true;
 
-		new_pfn = hva_to_pfn_retry(kvm, gpc);
+		write_unlock_irq(&gpc->lock);
+
+		new_pfn = hva_to_pfn_retry(kvm, uhva);
 		if (is_error_noslot_pfn(new_pfn)) {
 			ret = -EFAULT;
-		} else if (gpc->usage & KVM_HOST_USES_PFN) {
+			goto map_done;
+		}
+
+		if (gpc->usage & KVM_HOST_USES_PFN) {
 			if (new_pfn == old_pfn) {
 				new_khva = old_khva;
 				old_pfn = KVM_PFN_ERR_FAULT;
@@ -250,10 +222,10 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 				new_khva += page_offset;
 			else
 				ret = -EFAULT;
-		} else {
-			/* Nothing more to do, the pfn is consumed only by the guest. */
 		}
 
+	map_done:
+		write_lock_irq(&gpc->lock);
 		if (ret) {
 			gpc->valid = false;
 			gpc->pfn = KVM_PFN_ERR_FAULT;
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v3 3/8] KVM: Drop unused @gpa param from gfn=>pfn cache's __release_gpc() helper
From: Sean Christopherson @ 2022-04-29 21:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Sean Christopherson, Lai Jiangshan,
	David Woodhouse, Mingwei Zhang

Drop the @gpa param from __release_gpc() and rename the helper to make it
more obvious that the cache itself is not being released.  The helper
will be reused by a future commit to release a pfn+khva combination that
is _never_ associated with the cache, at which point the current name
would go from slightly misleading to blatantly wrong.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/pfncache.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index dd84676615f1..e05a6a1b8eff 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -95,7 +95,7 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_check);
 
-static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, gpa_t gpa)
+static void gpc_release_pfn_and_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
 {
 	/* Unmap the old page if it was mapped before, and release it */
 	if (!is_error_noslot_pfn(pfn)) {
@@ -146,7 +146,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	unsigned long page_offset = gpa & ~PAGE_MASK;
 	kvm_pfn_t old_pfn, new_pfn;
 	unsigned long old_uhva;
-	gpa_t old_gpa;
 	void *old_khva;
 	bool old_valid;
 	int ret = 0;
@@ -160,7 +159,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 
 	write_lock_irq(&gpc->lock);
 
-	old_gpa = gpc->gpa;
 	old_pfn = gpc->pfn;
 	old_khva = gpc->khva - offset_in_page(gpc->khva);
 	old_uhva = gpc->uhva;
@@ -244,7 +242,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
  out:
 	write_unlock_irq(&gpc->lock);
 
-	__release_gpc(kvm, old_pfn, old_khva, old_gpa);
+	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
 
 	return ret;
 }
@@ -254,14 +252,12 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 {
 	void *old_khva;
 	kvm_pfn_t old_pfn;
-	gpa_t old_gpa;
 
 	write_lock_irq(&gpc->lock);
 
 	gpc->valid = false;
 
 	old_khva = gpc->khva - offset_in_page(gpc->khva);
-	old_gpa = gpc->gpa;
 	old_pfn = gpc->pfn;
 
 	/*
@@ -273,7 +269,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 
 	write_unlock_irq(&gpc->lock);
 
-	__release_gpc(kvm, old_pfn, old_khva, old_gpa);
+	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_unmap);
 
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v3 4/8] KVM: Put the extra pfn reference when reusing a pfn in the gpc cache
From: Sean Christopherson @ 2022-04-29 21:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Sean Christopherson, Lai Jiangshan,
	David Woodhouse, Mingwei Zhang

Put the struct page reference to the pfn acquired by hva_to_pfn() when the
old and new pfns for a gfn=>pfn cache match.  The cache already has a
reference via the old/current pfn, and will only put one reference when
the cache is done with the pfn.
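
The refcount arithmetic for the reuse case, as a sketch (N is the page's
refcount going into the refresh):

	cache already holds one reference              refcount == N
	hva_to_pfn_retry() acquires the same pfn       refcount == N + 1
	put the extra reference (this patch)           refcount == N
	...
	eventual unmap()/destroy puts the cache's ref  refcount == N - 1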

Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/pfncache.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index e05a6a1b8eff..40cbe90d52e0 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -206,6 +206,14 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 
 		if (gpc->usage & KVM_HOST_USES_PFN) {
 			if (new_pfn == old_pfn) {
+				/*
+				 * Reuse the existing pfn and khva, but put the
+				 * reference acquired by hva_to_pfn_retry(); the
+				 * cache still holds a reference to the pfn
+				 * from the previous refresh.
+				 */
+				gpc_release_pfn_and_khva(kvm, new_pfn, NULL);
+
 				new_khva = old_khva;
 				old_pfn = KVM_PFN_ERR_FAULT;
 				old_khva = NULL;
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v3 5/8] KVM: Do not incorporate page offset into gfn=>pfn cache user address
From: Sean Christopherson @ 2022-04-29 21:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Sean Christopherson, Lai Jiangshan,
	David Woodhouse, Mingwei Zhang

Don't adjust the userspace address in the gfn=>pfn cache by the page
offset from the gpa.  KVM should never use the user address directly, and
all KVM operations that translate a user address to something else
require the user address to be page aligned.  Ignoring the offset will
allow the cache to reuse a gfn=>hva translation in the unlikely event
that the page offset of the gpa changes, but the gfn does not.  And more
importantly, not having to (un)adjust the user address will simplify a
future bug fix.
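
Concretely, with this change only the kernel mapping carries the page
offset and the uhva stays page aligned; a sketch using the existing
fields and helpers:

	page_offset = gpa & ~PAGE_MASK;
	gfn         = gpa_to_gfn(gpa);
	gpc->uhva   = gfn_to_hva_memslot(gpc->memslot, gfn); /* page aligned */
	gpc->khva   = new_khva + page_offset;                /* offset applied here */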

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/pfncache.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 40cbe90d52e0..05cb0bcbf662 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -179,8 +179,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 			ret = -EFAULT;
 			goto out;
 		}
-
-		gpc->uhva += page_offset;
 	}
 
 	/*
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v3 6/8] KVM: Fully serialize gfn=>pfn cache refresh via mutex
From: Sean Christopherson @ 2022-04-29 21:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Sean Christopherson, Lai Jiangshan,
	David Woodhouse, Mingwei Zhang

Protect gfn=>pfn cache refresh with a mutex to fully serialize refreshes.
The refresh logic doesn't protect against concurrent refreshes with
different GPAs (which may or may not be a desired use case, but it's
allowed in the code), nor does it protect against a false negative on the
memslot generation.

If the first refresh sees a stale memslot generation, it will refresh the
hva and generation before moving on to the hva=>pfn translation.  If it
then drops gpc->lock, a different user of the cache can come along,
acquire gpc->lock, see that the memslot generation is fresh, and skip
the hva=>pfn update due to the userspace address also matching (because
it too was updated).

The refresh path can already sleep during hva=>pfn resolution, so wrap
the refresh with a mutex to ensure that any given refresh runs to
completion before other callers can start their refresh.
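
The problematic interleaving, roughly:

  Task A: refresh()                      Task B: refresh()
  ----                                   ----
  sees a stale memslot generation
  updates gpc->uhva and gpc->generation
  drops gpc->lock for hva=>pfn lookup
                                         acquires gpc->lock
                                         generation is now fresh,
                                         uhva also matches
                                         skips the hva=>pfn update
                                         consumes stale gpc->pfn/khva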

Cc: stable@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/kvm_types.h |  2 ++
 virt/kvm/pfncache.c       | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index ac1ebb37a0ff..f328a01db4fe 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -19,6 +19,7 @@ struct kvm_memslots;
 enum kvm_mr_change;
 
 #include <linux/bits.h>
+#include <linux/mutex.h>
 #include <linux/types.h>
 #include <linux/spinlock_types.h>
 
@@ -69,6 +70,7 @@ struct gfn_to_pfn_cache {
 	struct kvm_vcpu *vcpu;
 	struct list_head list;
 	rwlock_t lock;
+	struct mutex refresh_lock;
 	void *khva;
 	kvm_pfn_t pfn;
 	enum pfn_cache_usage usage;
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 05cb0bcbf662..eaef31462bbe 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -157,6 +157,13 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	if (page_offset + len > PAGE_SIZE)
 		return -EINVAL;
 
+	/*
+	 * If another task is refreshing the cache, wait for it to complete.
+	 * There is no guarantee that concurrent refreshes will see the same
+	 * gpa, memslots generation, etc..., so they must be fully serialized.
+	 */
+	mutex_lock(&gpc->refresh_lock);
+
 	write_lock_irq(&gpc->lock);
 
 	old_pfn = gpc->pfn;
@@ -248,6 +255,8 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
  out:
 	write_unlock_irq(&gpc->lock);
 
+	mutex_unlock(&gpc->refresh_lock);
+
 	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
 
 	return ret;
@@ -288,6 +297,7 @@ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 
 	if (!gpc->active) {
 		rwlock_init(&gpc->lock);
+		mutex_init(&gpc->refresh_lock);
 
 		gpc->khva = NULL;
 		gpc->pfn = KVM_PFN_ERR_FAULT;
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v3 7/8] KVM: Fix multiple races in gfn=>pfn cache refresh
From: Sean Christopherson @ 2022-04-29 21:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Sean Christopherson, Lai Jiangshan,
	David Woodhouse, Mingwei Zhang

Rework the gfn=>pfn cache (gpc) refresh logic to address multiple races
between the cache itself, and between the cache and mmu_notifier events.

The existing refresh code attempts to guard against races with the
mmu_notifier by speculatively marking the cache valid, and then marking
it invalid if a mmu_notifier invalidation occurs.  That handles the case
where an invalidation occurs between dropping and re-acquiring gpc->lock,
but it doesn't handle the scenario where the cache is refreshed after the
cache was invalidated by the notifier, but before the notifier elevates
mmu_notifier_count.  The gpc refresh can't use the "retry" helper as its
invalidation occurs _before_ mmu_notifier_count is elevated and before
mmu_notifier_range_start is set/updated.

  CPU0                                    CPU1
  ----                                    ----

  gfn_to_pfn_cache_invalidate_start()
  |
  -> gpc->valid = false;
                                          kvm_gfn_to_pfn_cache_refresh()
                                          |
                                          |-> gpc->valid = true;

                                          hva_to_pfn_retry()
                                          |
                                          -> acquire kvm->mmu_lock
                                             kvm->mmu_notifier_count == 0
                                             mmu_seq == kvm->mmu_notifier_seq
                                             drop kvm->mmu_lock
                                             return pfn 'X'
  acquire kvm->mmu_lock
  kvm_inc_notifier_count()
  drop kvm->mmu_lock()
  kernel frees pfn 'X'
                                          kvm_gfn_to_pfn_cache_check()
                                          |
                                          |-> gpc->valid == true

                                          caller accesses freed pfn 'X'

Key off of mn_active_invalidate_count to detect that a pfncache refresh
needs to wait for an in-progress mmu_notifier invalidation.  While
mn_active_invalidate_count is not guaranteed to be stable, it is
guaranteed to be elevated prior to an invalidation acquiring gpc->lock,
so either the refresh will see an active invalidation and wait, or the
invalidation will run after the refresh completes.

Speculatively marking the cache valid is itself flawed, as a concurrent
kvm_gfn_to_pfn_cache_check() would see a valid cache with stale pfn/khva
values.  The KVM Xen use case explicitly allows/wants multiple users;
even though the caches are allocated per vCPU, __kvm_xen_has_interrupt()
can read a different vCPU (or vCPUs).  Address this race by invalidating
the cache prior to dropping gpc->lock (this is made possible by fixing
the above mmu_notifier race).

Complicating all of this is the fact that both the hva=>pfn resolution
and mapping of the kernel address can sleep, i.e. must be done outside
of gpc->lock.

Fix the above races in one fell swoop; trying to fix each individual race
is largely pointless and essentially impossible to test, e.g. closing one
hole just shifts the focus to the other hole.
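
Condensed flow of the reworked refresh, as a sketch (the full version in
the diff below also handles mapping failures and pfn reuse):

	write_lock_irq(&gpc->lock);
	gpc->valid = false;	/* invalidate before dropping the lock */
	for (;;) {
		mmu_seq = kvm->mmu_notifier_seq;
		smp_rmb();
		write_unlock_irq(&gpc->lock);

		/* may sleep */
		new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
		/* map new_khva for KVM_HOST_USES_PFN, may also sleep */

		write_lock_irq(&gpc->lock);
		if (kvm->mn_active_invalidate_count)
			continue;	/* invalidation in flight, retry */
		smp_rmb();
		if (kvm->mmu_notifier_seq == mmu_seq)
			break;		/* nothing raced, keep the pfn */
	}
	gpc->valid = true;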

Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: stable@vger.kernel.org
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c |   9 +++
 virt/kvm/pfncache.c | 193 ++++++++++++++++++++++++++++----------------
 2 files changed, 131 insertions(+), 71 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dfb7dabdbc63..0848430f36c6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -705,6 +705,15 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	kvm->mn_active_invalidate_count++;
 	spin_unlock(&kvm->mn_invalidate_lock);
 
+	/*
+	 * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
+	 * before acquiring mmu_lock, to avoid holding mmu_lock while acquiring
+	 * each cache's lock.  There are relatively few caches in existence at
+	 * any given time, and the caches themselves can check for hva overlap,
+	 * i.e. don't need to rely on memslot overlap checks for performance.
+	 * Because this runs without holding mmu_lock, the pfn caches must use
+	 * mn_active_invalidate_count (see above) instead of mmu_notifier_count.
+	 */
 	gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end,
 					  hva_range.may_block);
 
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index eaef31462bbe..0535669ea2a1 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -112,31 +112,122 @@ static void gpc_release_pfn_and_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
 	}
 }
 
-static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
+static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 {
+	/* Note, the new page offset may be different than the old! */
+	void *old_khva = gpc->khva - offset_in_page(gpc->khva);
+	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
+	void *new_khva = NULL;
 	unsigned long mmu_seq;
-	kvm_pfn_t new_pfn;
-	int retry;
 
-	do {
+	lockdep_assert_held_write(&gpc->lock);
+
+	/*
+	 * Invalidate the cache prior to dropping gpc->lock, the gpa=>uhva
+	 * assets have already been updated and so a concurrent check() from a
+	 * different task may not fail the gpa/uhva/generation checks.
+	 */
+	gpc->valid = false;
+
+	for (;;) {
 		mmu_seq = kvm->mmu_notifier_seq;
 		smp_rmb();
 
+		write_unlock_irq(&gpc->lock);
+
+		/*
+		 * If the previous iteration "failed" due to an mmu_notifier
+		 * event, release the pfn and unmap the kernel virtual address
+		 * from the previous attempt.  Unmapping might sleep, so this
+		 * needs to be done after dropping the lock.  Opportunistically
+		 * check for resched while the lock isn't held.
+		 */
+		if (new_pfn != KVM_PFN_ERR_FAULT) {
+			/*
+			 * Keep the mapping if the previous iteration reused
+			 * the existing mapping and didn't create a new one.
+			 */
+			if (new_khva == old_khva)
+				new_khva = NULL;
+
+			gpc_release_pfn_and_khva(kvm, new_pfn, new_khva);
+
+			cond_resched();
+		}
+
 		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+		new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
 		if (is_error_noslot_pfn(new_pfn))
-			break;
+			goto out_error;
+
+		/*
+		 * Obtain a new kernel mapping if KVM itself will access the
+		 * pfn.  Note, kmap() and memremap() can both sleep, so this
+		 * too must be done outside of gpc->lock!
+		 */
+		if (gpc->usage & KVM_HOST_USES_PFN) {
+			if (new_pfn == gpc->pfn) {
+				new_khva = old_khva;
+			} else if (pfn_valid(new_pfn)) {
+				new_khva = kmap(pfn_to_page(new_pfn));
+#ifdef CONFIG_HAS_IOMEM
+			} else {
+				new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
+#endif
+			}
+			if (!new_khva) {
+				kvm_release_pfn_clean(new_pfn);
+				goto out_error;
+			}
+		}
+
+		write_lock_irq(&gpc->lock);
 
-		KVM_MMU_READ_LOCK(kvm);
-		retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
-		KVM_MMU_READ_UNLOCK(kvm);
-		if (!retry)
+		/*
+		 * Other tasks must wait for _this_ refresh to complete before
+		 * attempting to refresh.
+		 */
+		WARN_ON_ONCE(gpc->valid);
+
+		/*
+		 * Wait for mn_active_invalidate_count, not mmu_notifier_count,
+		 * to go away, as the invalidation in the mmu_notifier event
+		 * occurs _before_ mmu_notifier_count is elevated.
+		 *
+		 * Note, mn_active_invalidate_count can change at any time as
+		 * it's not protected by gpc->lock.  But, it is guaranteed to
+		 * be elevated before the mmu_notifier acquires gpc->lock, and
+		 * isn't dropped until after mmu_notifier_seq is updated.  So,
+		 * this task may get a false positive of sorts, i.e. see an
+		 * elevated count and wait even though it's technically safe to
+		 * proceed (because the mmu_notifier will invalidate the cache
+		 * _after_ it's refreshed here), but the cache will never be
+		 * refreshed with stale data, i.e. won't get false negatives.
+		 */
+		if (kvm->mn_active_invalidate_count)
+			continue;
+
+		/*
+		 * Ensure mn_active_invalidate_count is read before
+		 * mmu_notifier_seq.  This pairs with the smp_wmb() in
+		 * mmu_notifier_invalidate_range_end() to guarantee either the
+		 * old (non-zero) value of mn_active_invalidate_count or the
+		 * new (incremented) value of mmu_notifier_seq is observed.
+		 */
+		smp_rmb();
+		if (kvm->mmu_notifier_seq == mmu_seq)
 			break;
+	}
+
+	gpc->valid = true;
+	gpc->pfn = new_pfn;
+	gpc->khva = new_khva + (gpc->gpa & ~PAGE_MASK);
+	return 0;
 
-		cond_resched();
-	} while (1);
+out_error:
+	write_lock_irq(&gpc->lock);
 
-	return new_pfn;
+	return -EFAULT;
 }
 
 int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
@@ -147,7 +238,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	kvm_pfn_t old_pfn, new_pfn;
 	unsigned long old_uhva;
 	void *old_khva;
-	bool old_valid;
 	int ret = 0;
 
 	/*
@@ -169,7 +259,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	old_pfn = gpc->pfn;
 	old_khva = gpc->khva - offset_in_page(gpc->khva);
 	old_uhva = gpc->uhva;
-	old_valid = gpc->valid;
 
 	/* If the userspace HVA is invalid, refresh that first */
 	if (gpc->gpa != gpa || gpc->generation != slots->generation ||
@@ -182,7 +271,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 		gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);
 
 		if (kvm_is_error_hva(gpc->uhva)) {
-			gpc->pfn = KVM_PFN_ERR_FAULT;
 			ret = -EFAULT;
 			goto out;
 		}
@@ -192,60 +280,8 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	 * If the userspace HVA changed or the PFN was already invalid,
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
-	if (!old_valid || old_uhva != gpc->uhva) {
-		unsigned long uhva = gpc->uhva;
-		void *new_khva = NULL;
-
-		/* Placeholders for "hva is valid but not yet mapped" */
-		gpc->pfn = KVM_PFN_ERR_FAULT;
-		gpc->khva = NULL;
-		gpc->valid = true;
-
-		write_unlock_irq(&gpc->lock);
-
-		new_pfn = hva_to_pfn_retry(kvm, uhva);
-		if (is_error_noslot_pfn(new_pfn)) {
-			ret = -EFAULT;
-			goto map_done;
-		}
-
-		if (gpc->usage & KVM_HOST_USES_PFN) {
-			if (new_pfn == old_pfn) {
-				/*
-				 * Reuse the existing pfn and khva, but put the
-				 * reference acquired by hva_to_pfn_retry(); the
-				 * cache still holds a reference to the pfn
-				 * from the previous refresh.
-				 */
-				gpc_release_pfn_and_khva(kvm, new_pfn, NULL);
-
-				new_khva = old_khva;
-				old_pfn = KVM_PFN_ERR_FAULT;
-				old_khva = NULL;
-			} else if (pfn_valid(new_pfn)) {
-				new_khva = kmap(pfn_to_page(new_pfn));
-#ifdef CONFIG_HAS_IOMEM
-			} else {
-				new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
-#endif
-			}
-			if (new_khva)
-				new_khva += page_offset;
-			else
-				ret = -EFAULT;
-		}
-
-	map_done:
-		write_lock_irq(&gpc->lock);
-		if (ret) {
-			gpc->valid = false;
-			gpc->pfn = KVM_PFN_ERR_FAULT;
-			gpc->khva = NULL;
-		} else {
-			/* At this point, gpc->valid may already have been cleared */
-			gpc->pfn = new_pfn;
-			gpc->khva = new_khva;
-		}
+	if (!gpc->valid || old_uhva != gpc->uhva) {
+		ret = hva_to_pfn_retry(kvm, gpc);
 	} else {
 		/* If the HVA→PFN mapping was already valid, don't unmap it. */
 		old_pfn = KVM_PFN_ERR_FAULT;
@@ -253,11 +289,26 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	}
 
  out:
+	/*
+	 * Invalidate the cache and purge the pfn/khva if the refresh failed.
+	 * Some/all of the uhva, gpa, and memslot generation info may still be
+	 * valid, leave it as is.
+	 */
+	if (ret) {
+		gpc->valid = false;
+		gpc->pfn = KVM_PFN_ERR_FAULT;
+		gpc->khva = NULL;
+	}
+
+	/* Snapshot the new pfn before dropping the lock! */
+	new_pfn = gpc->pfn;
+
 	write_unlock_irq(&gpc->lock);
 
 	mutex_unlock(&gpc->refresh_lock);
 
-	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
+	if (old_pfn != new_pfn)
+		gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
 
 	return ret;
 }
-- 
2.36.0.464.gb9c8b46e94-goog



* [PATCH v3 8/8] KVM: Do not pin pages tracked by gfn=>pfn caches
From: Sean Christopherson @ 2022-04-29 21:00 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Sean Christopherson, Lai Jiangshan,
	David Woodhouse, Mingwei Zhang

Put the reference to any struct page mapped/tracked by a gfn=>pfn cache
upon inserting the pfn into its associated cache, as opposed to putting
the reference only when the cache is done using the pfn.  In other words,
don't pin pages while they're in the cache.  One of the major roles of
the gfn=>pfn cache is to play nicely with invalidation events, i.e. it
exists in large part so that KVM doesn't rely on pinning pages.
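
The resulting flow in hva_to_pfn_retry(), as a sketch (see the diff
below for the real thing):

	/* takes a struct page reference */
	new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
	/* ... map it, revalidate against the mmu_notifier ... */
	gpc->valid = true;
	gpc->pfn = new_pfn;
	kvm_release_pfn_clean(new_pfn);	/* drop the reference immediately;
					 * mmu_notifier events, not a pin,
					 * keep the cache coherent */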

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/pfncache.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 0535669ea2a1..c4fb81929663 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -95,20 +95,16 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_check);
 
-static void gpc_release_pfn_and_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
+static void gpc_unmap_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
 {
-	/* Unmap the old page if it was mapped before, and release it */
-	if (!is_error_noslot_pfn(pfn)) {
-		if (khva) {
-			if (pfn_valid(pfn))
-				kunmap(pfn_to_page(pfn));
+	/* Unmap the old pfn/page if it was mapped before. */
+	if (!is_error_noslot_pfn(pfn) && khva) {
+		if (pfn_valid(pfn))
+			kunmap(pfn_to_page(pfn));
 #ifdef CONFIG_HAS_IOMEM
-			else
-				memunmap(khva);
+		else
+			memunmap(khva);
 #endif
-		}
-
-		kvm_release_pfn(pfn, false);
 	}
 }
 
@@ -147,10 +143,10 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 			 * Keep the mapping if the previous iteration reused
 			 * the existing mapping and didn't create a new one.
 			 */
-			if (new_khva == old_khva)
-				new_khva = NULL;
+			if (new_khva != old_khva)
+				gpc_unmap_khva(kvm, new_pfn, new_khva);
 
-			gpc_release_pfn_and_khva(kvm, new_pfn, new_khva);
+			kvm_release_pfn_clean(new_pfn);
 
 			cond_resched();
 		}
@@ -222,6 +218,14 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 	gpc->valid = true;
 	gpc->pfn = new_pfn;
 	gpc->khva = new_khva + (gpc->gpa & ~PAGE_MASK);
+
+	/*
+	 * Put the reference to the _new_ pfn.  The pfn is now tracked by the
+	 * cache and can be safely migrated, swapped, etc... as the cache will
+	 * invalidate any mappings in response to relevant mmu_notifier events.
+	 */
+	kvm_release_pfn_clean(new_pfn);
+
 	return 0;
 
 out_error:
@@ -308,7 +312,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	mutex_unlock(&gpc->refresh_lock);
 
 	if (old_pfn != new_pfn)
-		gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
+		gpc_unmap_khva(kvm, old_pfn, old_khva);
 
 	return ret;
 }
@@ -335,7 +339,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 
 	write_unlock_irq(&gpc->lock);
 
-	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
+	gpc_unmap_khva(kvm, old_pfn, old_khva);
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_unmap);
 
-- 
2.36.0.464.gb9c8b46e94-goog



* Re: [PATCH v3 6/8] KVM: Fully serialize gfn=>pfn cache refresh via mutex
From: Paolo Bonzini @ 2022-05-20 15:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Lai Jiangshan, David Woodhouse, Mingwei Zhang

On 4/29/22 23:00, Sean Christopherson wrote:
> diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
> index 05cb0bcbf662..eaef31462bbe 100644
> --- a/virt/kvm/pfncache.c
> +++ b/virt/kvm/pfncache.c
> @@ -157,6 +157,13 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
>   	if (page_offset + len > PAGE_SIZE)
>   		return -EINVAL;
>   
> +	/*
> +	 * If another task is refreshing the cache, wait for it to complete.
> +	 * There is no guarantee that concurrent refreshes will see the same
> +	 * gpa, memslots generation, etc..., so they must be fully serialized.
> +	 */
> +	mutex_lock(&gpc->refresh_lock);
> +
>   	write_lock_irq(&gpc->lock);
>   
>   	old_pfn = gpc->pfn;
> @@ -248,6 +255,8 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
>    out:
>   	write_unlock_irq(&gpc->lock);
>   
> +	mutex_unlock(&gpc->refresh_lock);
> +
>   	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
>   
>   	return ret;

Does kvm_gfn_to_pfn_cache_unmap also need to take the mutex, to avoid 
the WARN_ON(gpc->valid)?

Paolo


* Re: [PATCH v3 6/8] KVM: Fully serialize gfn=>pfn cache refresh via mutex
From: Sean Christopherson @ 2022-05-20 15:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Lai Jiangshan, David Woodhouse, Mingwei Zhang

On Fri, May 20, 2022, Paolo Bonzini wrote:
> On 4/29/22 23:00, Sean Christopherson wrote:
> > diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
> > index 05cb0bcbf662..eaef31462bbe 100644
> > --- a/virt/kvm/pfncache.c
> > +++ b/virt/kvm/pfncache.c
> > @@ -157,6 +157,13 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
> >   	if (page_offset + len > PAGE_SIZE)
> >   		return -EINVAL;
> > +	/*
> > +	 * If another task is refreshing the cache, wait for it to complete.
> > +	 * There is no guarantee that concurrent refreshes will see the same
> > +	 * gpa, memslots generation, etc..., so they must be fully serialized.
> > +	 */
> > +	mutex_lock(&gpc->refresh_lock);
> > +
> >   	write_lock_irq(&gpc->lock);
> >   	old_pfn = gpc->pfn;
> > @@ -248,6 +255,8 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
> >    out:
> >   	write_unlock_irq(&gpc->lock);
> > +	mutex_unlock(&gpc->refresh_lock);
> > +
> >   	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
> >   	return ret;
> 
> Does kvm_gfn_to_pfn_cache_unmap also need to take the mutex, to avoid the
> WARN_ON(gpc->valid)?

I don't know what WARN_ON() you're referring to, but there is a double-free bug
if unmap() runs during an invalidation.  That can be solved without having to
take the mutex though; just reset valid/pfn/khva before the retry.

When searching to see how unmap() was used in the original series (there's no
other user besides destroy...), I stumbled across this likely-related syzbot bug
that unfortunately didn't Cc KVM :-(

https://lore.kernel.org/all/00000000000073f09205db439577@google.com


diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 72eee096a7cd..1719b0249dbc 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -228,6 +228,11 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
        if (!old_valid || old_uhva != gpc->uhva) {
                void *new_khva = NULL;

+               /* blah blah blah */
+               gpc->valid = false;
+               gpc->pfn = KVM_PFN_ERR_FAULT;
+               gpc->khva = NULL;
+
                new_pfn = hva_to_pfn_retry(kvm, gpc);
                if (is_error_noslot_pfn(new_pfn)) {
                        ret = -EFAULT;
@@ -251,11 +256,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
                        /* Nothing more to do, the pfn is consumed only by the guest. */
                }

-               if (ret) {
-                       gpc->valid = false;
-                       gpc->pfn = KVM_PFN_ERR_FAULT;
-                       gpc->khva = NULL;
-               } else {
+               if (!ret) {
                        gpc->valid = true;
                        gpc->pfn = new_pfn;
                        gpc->khva = new_khva;
@@ -283,6 +284,11 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)

        write_lock_irq(&gpc->lock);

+       if (!gpc->valid) {
+               write_unlock_irq(&gpc->lock);
+               return;
+       }
+
        gpc->valid = false;

        old_khva = gpc->khva - offset_in_page(gpc->khva);



* Re: [PATCH v3 6/8] KVM: Fully serialize gfn=>pfn cache refresh via mutex
From: Paolo Bonzini @ 2022-05-20 16:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Lai Jiangshan, David Woodhouse, Mingwei Zhang

On 5/20/22 17:53, Sean Christopherson wrote:
>> Does kvm_gfn_to_pfn_cache_unmap also need to take the mutex, to avoid the
>> WARN_ON(gpc->valid)?
> I don't know What WARN_ON() you're referring to, but there is a double-free bug
> if unmap() runs during an invalidation.  That can be solved without having to
> take the mutex though, just reset valid/pfn/khva before the retry.

I was thinking of this one:

	/*
	 * Other tasks must wait for _this_ refresh to complete before
	 * attempting to refresh.
	 */
	WARN_ON_ONCE(gpc->valid);

but unmap sets gpc->valid to false, not true. ಠ_ಠ

Still, as you point out, unmap() and refresh() can easily clash.  In
practice they probably exclude each other by different means (e.g.
running in a single vCPU thread), but then neither is a fast path
anyway.  It seems easier to just make them thread-safe the easy way now
that there is a mutex.
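
E.g. something along these lines (a sketch of the suggestion, mirroring
the locking pattern refresh() already uses; not a posted patch):

	void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm,
					struct gfn_to_pfn_cache *gpc)
	{
		void *old_khva;
		kvm_pfn_t old_pfn;

		/* Exclude a concurrent refresh(), and thereby any
		 * in-flight hva_to_pfn_retry(). */
		mutex_lock(&gpc->refresh_lock);
		write_lock_irq(&gpc->lock);

		gpc->valid = false;
		old_khva = gpc->khva - offset_in_page(gpc->khva);
		old_pfn = gpc->pfn;
		gpc->khva = NULL;
		gpc->pfn = KVM_PFN_ERR_FAULT;

		write_unlock_irq(&gpc->lock);
		mutex_unlock(&gpc->refresh_lock);

		gpc_unmap_khva(kvm, old_pfn, old_khva);
	}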

> When searching to see how unmap() was used in the original series (there's no
> other user besides destroy...), I stumbled across this likely-related syzbot bug
> that unfortunately didn't Cc KVM :-(

To give an example, VMCLEAR would do an unmap() if the VMCS12 was mapped 
with a gfn-to-pfn cache.

Paolo



* Re: [PATCH v3 0/8] KVM: Fix mmu_notifier vs. pfncache vs. pfncache races
From: Paolo Bonzini @ 2022-05-20 16:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Lai Jiangshan, David Woodhouse, Mingwei Zhang

Queued, thanks.

Paolo



