* [PATCH 1/2] KVM: Fix race between mmu_notifier invalidation and pfncache refresh
2022-04-20 0:48 [PATCH 0/2] KVM: Fix mmu_notifier vs. pfncache race Sean Christopherson
@ 2022-04-20 0:48 ` Sean Christopherson
2022-04-20 0:48 ` [PATCH 2/2] KVM: Do not speculatively mark pfn cache valid to "fix" race Sean Christopherson
2022-04-20 10:41 ` [PATCH 0/2] KVM: Fix mmu_notifier vs. pfncache race Paolo Bonzini
2 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2022-04-20 0:48 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, David Woodhouse, Mingwei Zhang, Sean Christopherson
Key off of mn_active_invalidate_count to detect that a pfncache refresh
needs to wait for an in-progress mmu_notifier invalidation. Using the
common "retry" approach is broken as the invalidation occurs _before_
mmu_notifier_count is elevated and before mmu_notifier_range_start is
set/updated. The cache refresh handles the race where an invalidation
occurs between dropping and re-acquiring gpc->lock, but it doesn't handle
the scenario where the cache is refreshed after the cache was invalidated
by the notifier, but before the notifier elevates mmu_notifier_count.
CPU0 CPU1
---- ----
gfn_to_pfn_cache_invalidate_start()
|
-> gpc->valid = false;
kvm_gfn_to_pfn_cache_refresh()
|
|-> gpc->valid = true;
hva_to_pfn_retry()
|
-> acquire kvm->mmu_lock
kvm->mmu_notifier_count == 0
mmu_seq == kvm->mmu_notifier_seq
drop kvm->mmu_lock
return pfn 'X'
acquire kvm->mmu_lock
kvm_inc_notifier_count()
drop kvm->mmu_lock()
kernel frees pfn 'X'
kvm_gfn_to_pfn_cache_check()
|
|-> gpc->valid == true
caller accesses freed pfn 'X'
While mn_active_invalidate_count is not guaranteed to be stable, it is
guaranteed to be elevated prior to an invalidation acquiring gpc->lock,
so either the refresh will see an active invalidation and wait, or the
invalidation will run after the refresh completes.
Waiting for in-progress invalidations to complete also obviates the need
to mark gpc->valid before calling hva_to_pfn_retry(), which is in itself
flawed as a concurrent kvm_gfn_to_pfn_cache_check() would see a "valid"
cache with garbage pfn/khva values. That issue will be remedied in a
future commit.
Opportunistically change the do-while(1) to a for(;;) to make it easier
to see that it loops until a break condition is encountered, the size of
the loop body makes the while(1) rather difficult to see.
Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: stable@vger.kernel.org
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 9 ++++++
virt/kvm/pfncache.c | 70 +++++++++++++++++++++++++++++++--------------
2 files changed, 58 insertions(+), 21 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dfb7dabdbc63..0848430f36c6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -705,6 +705,15 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
kvm->mn_active_invalidate_count++;
spin_unlock(&kvm->mn_invalidate_lock);
+ /*
+ * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
+ * before acquiring mmu_lock, to avoid holding mmu_lock while acquiring
+ * each cache's lock. There are relatively few caches in existence at
+ * any given time, and the caches themselves can check for hva overlap,
+ * i.e. don't need to rely on memslot overlap checks for performance.
+ * Because this runs without holding mmu_lock, the pfn caches must use
+ * mn_active_invalidate_count (see above) instead of mmu_notifier_count.
+ */
gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end,
hva_range.may_block);
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index dd84676615f1..71c84a43024c 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -112,29 +112,63 @@ static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, gpa_t gpa)
}
}
-static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
+static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
{
+ bool first_attempt = true;
unsigned long mmu_seq;
kvm_pfn_t new_pfn;
- int retry;
- do {
+ lockdep_assert_held_write(&gpc->lock);
+
+ for (;;) {
mmu_seq = kvm->mmu_notifier_seq;
smp_rmb();
+ write_unlock_irq(&gpc->lock);
+
+ /* Opportunistically check for resched while the lock isn't held. */
+ if (!first_attempt)
+ cond_resched();
+
/* We always request a writeable mapping */
- new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+ new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
+
+ write_lock_irq(&gpc->lock);
+
if (is_error_noslot_pfn(new_pfn))
break;
- KVM_MMU_READ_LOCK(kvm);
- retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
- KVM_MMU_READ_UNLOCK(kvm);
- if (!retry)
+ first_attempt = false;
+
+ /*
+ * Wait for mn_active_invalidate_count, not mmu_notifier_count,
+ * to go away, as the invalidation in the mmu_notifier event
+ * occurs _before_ mmu_notifier_count is elevated.
+ *
+ * Note, mn_active_invalidate_count can change at any time as
+ * it's not protected by gpc->lock. But, it is guaranteed to
+ * be elevated before the mmu_notifier acquires gpc->lock, and
+ * isn't dropped until after mmu_notifier_seq is updated. So,
+ * this task may get a false positive of sorts, i.e. see an
+ * elevated count and wait even though it's technically safe to
+ * proceed (becase the mmu_notifier will invalidate the cache
+ * _after_ it's refreshed here), but the cache will never be
+ * refreshed with stale data, i.e. won't get false negatives.
+ */
+ if (kvm->mn_active_invalidate_count)
+ continue;
+
+ /*
+ * Ensure mn_active_invalidate_count is read before
+ * mmu_notifier_seq. This pairs with the smp_wmb() in
+ * mmu_notifier_invalidate_range_end() to guarantee either the
+ * old (non-zero) value of mn_active_invalidate_count or the
+ * new (incremented) value of mmu_notifier_seq is observed.
+ */
+ smp_rmb();
+ if (kvm->mmu_notifier_seq == mmu_seq)
break;
-
- cond_resched();
- } while (1);
+ }
return new_pfn;
}
@@ -190,7 +224,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
* drop the lock and do the HVA to PFN lookup again.
*/
if (!old_valid || old_uhva != gpc->uhva) {
- unsigned long uhva = gpc->uhva;
void *new_khva = NULL;
/* Placeholders for "hva is valid but not yet mapped" */
@@ -198,15 +231,10 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpc->khva = NULL;
gpc->valid = true;
- write_unlock_irq(&gpc->lock);
-
- new_pfn = hva_to_pfn_retry(kvm, uhva);
+ new_pfn = hva_to_pfn_retry(kvm, gpc);
if (is_error_noslot_pfn(new_pfn)) {
ret = -EFAULT;
- goto map_done;
- }
-
- if (gpc->usage & KVM_HOST_USES_PFN) {
+ } else if (gpc->usage & KVM_HOST_USES_PFN) {
if (new_pfn == old_pfn) {
new_khva = old_khva;
old_pfn = KVM_PFN_ERR_FAULT;
@@ -222,10 +250,10 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
new_khva += page_offset;
else
ret = -EFAULT;
+ } else {
+ /* Nothing more to do, the pfn is consumed only by the guest. */
}
- map_done:
- write_lock_irq(&gpc->lock);
if (ret) {
gpc->valid = false;
gpc->pfn = KVM_PFN_ERR_FAULT;
--
2.36.0.rc0.470.gd361397f0d-goog
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 2/2] KVM: Do not speculatively mark pfn cache valid to "fix" race
2022-04-20 0:48 [PATCH 0/2] KVM: Fix mmu_notifier vs. pfncache race Sean Christopherson
2022-04-20 0:48 ` [PATCH 1/2] KVM: Fix race between mmu_notifier invalidation and pfncache refresh Sean Christopherson
@ 2022-04-20 0:48 ` Sean Christopherson
2022-04-20 10:41 ` [PATCH 0/2] KVM: Fix mmu_notifier vs. pfncache race Paolo Bonzini
2 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2022-04-20 0:48 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, David Woodhouse, Mingwei Zhang, Sean Christopherson
When refreshing a stale pfn cache, mark it valid only after the new pfn
(and maybe kernel mapping) has been acquired. Speculatively marking the
cache as valid with garbage pfn/khva values can lead to other tasks using
the garbage values. Presumably, the cache was previously marked valid
before dropping gpc->lock to address a race where an mmu_notifier event
could invalidate the cache between retrieving the pfn via hva_to_pfn()
and re-acquiring gpc->lock. That race has been plugged by waiting for
mn_active_invalidate_count to reach zero before returning from
hva_to_pfn_retry(), i.e. waiting for any notifiers to fully complete
before returning.
Note, this could also be "fixed" by having kvm_gfn_to_pfn_cache_check()
also verify gpc->hkva, but that's unnecessary now that the aforementioned
race between refresh and mmu_notifier invalidation has been eliminated.
CPU0 CPU1
---- ----
gpc->valid == false
kvm_gfn_to_pfn_cache_refresh()
|
|-> gpc->valid = true;
hva_to_pfn_retry()
|
-> drop gpc->lock
acquire gpc->lock for read
kvm_gfn_to_pfn_cache_check()
|
|-> gpc->valid == true
gpc->khva == NULL
caller dereferences NULL khva
Opportunistically add a lockdep to the check() helper to make it slightly
more obvious that there is no memory ordering issue when setting "valid"
versus "pfn" and/or "khva".
Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: stable@vger.kernel.org
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/pfncache.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 71c84a43024c..72eee096a7cd 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -81,6 +81,8 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
{
struct kvm_memslots *slots = kvm_memslots(kvm);
+ lockdep_assert_held_read(&gpc->lock);
+
if ((gpa & ~PAGE_MASK) + len > PAGE_SIZE)
return false;
@@ -226,11 +228,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
if (!old_valid || old_uhva != gpc->uhva) {
void *new_khva = NULL;
- /* Placeholders for "hva is valid but not yet mapped" */
- gpc->pfn = KVM_PFN_ERR_FAULT;
- gpc->khva = NULL;
- gpc->valid = true;
-
new_pfn = hva_to_pfn_retry(kvm, gpc);
if (is_error_noslot_pfn(new_pfn)) {
ret = -EFAULT;
@@ -259,7 +256,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpc->pfn = KVM_PFN_ERR_FAULT;
gpc->khva = NULL;
} else {
- /* At this point, gpc->valid may already have been cleared */
+ gpc->valid = true;
gpc->pfn = new_pfn;
gpc->khva = new_khva;
}
--
2.36.0.rc0.470.gd361397f0d-goog
^ permalink raw reply related [flat|nested] 4+ messages in thread