kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] KVM: PPC: Book3S HV: Remove user-triggerable WARN_ON
@ 2020-05-28  1:53 Paul Mackerras
  2020-05-28  1:54 ` [PATCH 2/2] KVM: PPC: Book3S HV: Close race with page faults around memslot flushes Paul Mackerras
  0 siblings, 1 reply; 2+ messages in thread
From: Paul Mackerras @ 2020-05-28  1:53 UTC (permalink / raw)
  To: kvm; +Cc: kvm-ppc, Laurent Vivier, David Gibson, Nick Piggin

Although in general we do not expect valid PTEs to be found in
kvmppc_create_pte when we are inserting a large page mapping, there
is one situation where this can occur.  That is when dirty page
logging is turned off for a memslot while the VM is running.
Because the new memslots are installed before the old memslot is
flushed in kvmppc_core_commit_memory_region_hv(), there is a
window where a hypervisor page fault can try to install a 2MB
(or 1GB) page where there are already small page mappings which
were installed while dirty page logging was enabled and which
have not yet been flushed.

Since we have a situation where valid PTEs can legitimately be
found by kvmppc_unmap_free_pte, and which can be triggered by
userspace, just remove the WARN_ON_ONCE, since it is undesirable
to have userspace able to trigger a kernel warning.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 97b45ea..bc3f795 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -429,9 +429,13 @@ void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, unsigned long gpa,
  * Callers are responsible for flushing the PWC.
  *
  * When page tables are being unmapped/freed as part of page fault path
- * (full == false), ptes are not expected. There is code to unmap them
- * and emit a warning if encountered, but there may already be data
- * corruption due to the unexpected mappings.
+ * (full == false), valid ptes are generally not expected; however, there
+ * is one situation where they arise, which is when dirty page logging is
+ * turned off for a memslot while the VM is running.  The new memslot
+ * becomes visible to page faults before the memslot commit function
+ * gets to flush the memslot, which can lead to a 2MB page mapping being
+ * installed for a guest physical address where there are already 64kB
+ * (or 4kB) mappings (of sub-pages of the same 2MB page).
  */
 static void kvmppc_unmap_free_pte(struct kvm *kvm, pte_t *pte, bool full,
 				  unsigned int lpid)
@@ -445,7 +449,6 @@ static void kvmppc_unmap_free_pte(struct kvm *kvm, pte_t *pte, bool full,
 		for (it = 0; it < PTRS_PER_PTE; ++it, ++p) {
 			if (pte_val(*p) == 0)
 				continue;
-			WARN_ON_ONCE(1);
 			kvmppc_unmap_pte(kvm, p,
 					 pte_pfn(*p) << PAGE_SHIFT,
 					 PAGE_SHIFT, NULL, lpid);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [PATCH 2/2] KVM: PPC: Book3S HV: Close race with page faults around memslot flushes
  2020-05-28  1:53 [PATCH 1/2] KVM: PPC: Book3S HV: Remove user-triggerable WARN_ON Paul Mackerras
@ 2020-05-28  1:54 ` Paul Mackerras
  0 siblings, 0 replies; 2+ messages in thread
From: Paul Mackerras @ 2020-05-28  1:54 UTC (permalink / raw)
  To: kvm; +Cc: kvm-ppc, Laurent Vivier, David Gibson, Nick Piggin

There is a potential race condition between hypervisor page faults
and flushing a memslot.  It is possible for a page fault to read the
memslot before a memslot is updated and then write a PTE to the
partition-scoped page tables after kvmppc_radix_flush_memslot has
completed.  (Note that this race has never been explicitly observed.)

To close this race, it is sufficient to increment the MMU sequence
number while the kvm->mmu_lock is held.  That will cause
mmu_notifier_retry() to return true, and the page fault will then
return to the guest without inserting a PTE.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index bc3f795..aa41183 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1130,6 +1130,11 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm,
 					 kvm->arch.lpid);
 		gpa += PAGE_SIZE;
 	}
+	/*
+	 * Increase the mmu notifier sequence number to prevent any page
+	 * fault that read the memslot earlier from writing a PTE.
+	 */
+	kvm->mmu_notifier_seq++;
 	spin_unlock(&kvm->mmu_lock);
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-05-28  1:54 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-28  1:53 [PATCH 1/2] KVM: PPC: Book3S HV: Remove user-triggerable WARN_ON Paul Mackerras
2020-05-28  1:54 ` [PATCH 2/2] KVM: PPC: Book3S HV: Close race with page faults around memslot flushes Paul Mackerras

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).