* [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Paolo, this is more or less ready, but on final read-through before
sending I realized it would be a good idea to WARN during VM destruction
if cpu_dirty_logging_count is non-zero.  I wanted to get you this before
the 5.12 window opens in case you want the TDP MMU fixes for 5.12.  I'll
do the above change and retest next week (note, Monday is a US holiday).
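
For reference, the planned check is tiny.  A rough sketch, with the
kvm_arch_destroy_vm() placement being my assumption (it's not yet part
of this series):

	/* All dirty logging should be torn down before the VM is freed. */
	WARN_ON_ONCE(kvm->arch.cpu_dirty_logging_count);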

On to the code...

This started out as a small tweak to collapsible SPTE zapping in the TDP
MMU, and ended up as a rather large overhaul of CPU dirty logging, a.k.a.
PML.

Four main highlights:

  - Do a more precise check on whether or not a SPTE should be zapped to
    rebuild it as a large page.
  - Disable PML when running L2.  PML is fully emulated for L1 VMMs, thus
    enabling PML in L2 can only hurt and never help.
  - Drop the existing PML kvm_x86_ops.  They're basically trampolines into
    the MMU, and IMO do far more harm than good.
  - Turn on PML only when it's needed instead of setting all dirty bits to
    soft disable PML.

What led me down the rabbit hole of ripping out the existing PML
kvm_x86_ops isn't really shown here.  Prior to incorporating Makarand's
patch, which allowed for the wholesale removal of setting dirty bits,
I spent a bunch of time poking around the "set dirty bits" code.  My
original changes optimized that path to skip setting dirty bits in the
nested MMU, since the nested MMU relies on write-protection and not PML.
That in turn allowed the TDP MMU zapping to completely skip walking the
rmaps, but doing so based on a bunch of callbacks was a twisted mess.

Happily, those patches got dropped in favor of nuking the code entirely.

Ran selftests and unit tests, and migrated actual VMs on AMD and Intel,
with and without TDP MMU, and with and without EPT.  The AMD system I'm
testing on gets stuck in an infinite loop on the reset vector due to a
#PF when NPT is disabled, so that configuration didn't get tested.  That
reproduces with kvm/next; I'll dig into it next week (no idea if it's a
KVM or hardware issue).

For actual migration, I ran kvm-unit-tests in L1 along with stress to
hammer memory, and verified migration was effectively blocked until the
stress threads were killed (I didn't feel like figuring out how to
throttle the VM).

Makarand Sonare (1):
  KVM: VMX: Dynamically enable/disable PML based on memslot dirty
    logging

Sean Christopherson (13):
  KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE
    pages
  KVM: x86/mmu: Don't unnecessarily write-protect small pages in TDP MMU
  KVM: x86/mmu: Split out max mapping level calculation to helper
  KVM: x86/mmu: Pass the memslot to the rmap callbacks
  KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
  KVM: nVMX: Disable PML in hardware when running L2
  KVM: x86/mmu: Expand on the comment in
    kvm_vcpu_ad_need_write_protect()
  KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function
  KVM: x86: Move MMU's PML logic to common code
  KVM: x86: Further clarify the logic and comments for toggling log
    dirty
  KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML
  KVM: x86: Fold "write-protect large" use case into generic
    write-protect
  KVM: x86/mmu: Remove a variety of unnecessary exports

 arch/x86/include/asm/kvm-x86-ops.h |   6 +-
 arch/x86/include/asm/kvm_host.h    |  36 +----
 arch/x86/kvm/mmu/mmu.c             | 203 +++++++++--------------------
 arch/x86/kvm/mmu/mmu_internal.h    |   7 +-
 arch/x86/kvm/mmu/tdp_mmu.c         |  66 +---------
 arch/x86/kvm/mmu/tdp_mmu.h         |   3 +-
 arch/x86/kvm/vmx/nested.c          |  34 +++--
 arch/x86/kvm/vmx/vmx.c             |  94 +++++--------
 arch/x86/kvm/vmx/vmx.h             |   2 +
 arch/x86/kvm/x86.c                 | 145 +++++++++++++--------
 10 files changed, 230 insertions(+), 366 deletions(-)

-- 
2.30.0.478.g8a0d178c01-goog


* [PATCH 01/14] KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE pages
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Zap SPTEs that are backed by ZONE_DEVICE pages when zapping SPTEs to
rebuild them as huge pages in the TDP MMU.  ZONE_DEVICE huge pages are
managed differently than "regular" pages and are not compound pages.

Cc: Ben Gardon <bgardon@google.com>
Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 71e100a5670f..3cc332ed099d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1348,7 +1348,8 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 
 		pfn = spte_to_pfn(iter.old_spte);
 		if (kvm_is_reserved_pfn(pfn) ||
-		    !PageTransCompoundMap(pfn_to_page(pfn)))
+		    (!PageTransCompoundMap(pfn_to_page(pfn)) &&
+		     !kvm_is_zone_device_pfn(pfn)))
 			continue;
 
 		tdp_mmu_set_spte(kvm, &iter, 0);
-- 
2.30.0.478.g8a0d178c01-goog


* [PATCH 02/14] KVM: x86/mmu: Don't unnecessarily write-protect small pages in TDP MMU
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Respect start_level when write-protecting pages in the TDP MMU for dirty
logging.  When the dirty bitmaps are initialized with all bits set, small
pages don't need to be write-protected as they've already been marked
dirty.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e507568cd55d..24325bdcd387 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5500,7 +5500,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
 				start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
 	if (is_tdp_mmu_enabled(kvm))
-		flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_4K);
+		flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, start_level);
 	write_unlock(&kvm->mmu_lock);
 
 	/*
-- 
2.30.0.478.g8a0d178c01-goog
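
For context, the "dirty bitmaps initialized with all bits set" behavior is
the opt-in KVM_DIRTY_LOG_INITIALLY_SET mode.  A hedged userspace sketch of
enabling it, inside some VM setup function (vm_fd is assumed to be an open
VM fd, error handling omitted):

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
		.args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
			   KVM_DIRTY_LOG_INITIALLY_SET,
	};

	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

With that mode enabled, kvm_mmu_slot_remove_write_access() is called with
start_level = PG_LEVEL_2M, and the fix above makes the TDP MMU honor that
instead of always write-protecting 4K SPTEs.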


* [PATCH 03/14] KVM: x86/mmu: Split out max mapping level calculation to helper
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Factor out the logic for determining the maximum mapping level given a
memslot and a gpa.  The helper will be used when zapping collapsible
SPTEs when disabling dirty logging, e.g. to avoid zapping SPTEs that
can't possibly be rebuilt as hugepages.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c          | 37 ++++++++++++++++++++-------------
 arch/x86/kvm/mmu/mmu_internal.h |  2 ++
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 24325bdcd387..9be7fd474b2d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2756,8 +2756,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn,
-				  kvm_pfn_t pfn, struct kvm_memory_slot *slot)
+static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
+				  struct kvm_memory_slot *slot)
 {
 	unsigned long hva;
 	pte_t *pte;
@@ -2776,19 +2776,36 @@ static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn,
 	 */
 	hva = __gfn_to_hva_memslot(slot, gfn);
 
-	pte = lookup_address_in_mm(vcpu->kvm->mm, hva, &level);
+	pte = lookup_address_in_mm(kvm->mm, hva, &level);
 	if (unlikely(!pte))
 		return PG_LEVEL_4K;
 
 	return level;
 }
 
+int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot,
+			      gfn_t gfn, kvm_pfn_t pfn, int max_level)
+{
+	struct kvm_lpage_info *linfo;
+
+	max_level = min(max_level, max_huge_page_level);
+	for ( ; max_level > PG_LEVEL_4K; max_level--) {
+		linfo = lpage_info_slot(gfn, slot, max_level);
+		if (!linfo->disallow_lpage)
+			break;
+	}
+
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
+
+	return host_pfn_mapping_level(kvm, gfn, pfn, slot);
+}
+
 int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 			    int max_level, kvm_pfn_t *pfnp,
 			    bool huge_page_disallowed, int *req_level)
 {
 	struct kvm_memory_slot *slot;
-	struct kvm_lpage_info *linfo;
 	kvm_pfn_t pfn = *pfnp;
 	kvm_pfn_t mask;
 	int level;
@@ -2805,17 +2822,7 @@ int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 	if (!slot)
 		return PG_LEVEL_4K;
 
-	max_level = min(max_level, max_huge_page_level);
-	for ( ; max_level > PG_LEVEL_4K; max_level--) {
-		linfo = lpage_info_slot(gfn, slot, max_level);
-		if (!linfo->disallow_lpage)
-			break;
-	}
-
-	if (max_level == PG_LEVEL_4K)
-		return PG_LEVEL_4K;
-
-	level = host_pfn_mapping_level(vcpu, gfn, pfn, slot);
+	level = kvm_mmu_max_mapping_level(vcpu->kvm, slot, gfn, pfn, max_level);
 	if (level == PG_LEVEL_4K)
 		return level;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 9e38d3c5daad..0b55aa561ec8 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -138,6 +138,8 @@ enum {
 #define SET_SPTE_NEED_REMOTE_TLB_FLUSH	BIT(1)
 #define SET_SPTE_SPURIOUS		BIT(2)
 
+int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot,
+			      gfn_t gfn, kvm_pfn_t pfn, int max_level);
 int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
 			    int max_level, kvm_pfn_t *pfnp,
 			    bool huge_page_disallowed, int *req_level);
-- 
2.30.0.478.g8a0d178c01-goog


* [PATCH 04/14] KVM: x86/mmu: Pass the memslot to the rmap callbacks
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Pass the memslot to the rmap callbacks; it will be used when zapping
collapsible SPTEs to verify the memslot is compatible with hugepages
before zapping its SPTEs.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9be7fd474b2d..fb719e7a0cbb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1165,7 +1165,8 @@ static bool spte_wrprot_for_clear_dirty(u64 *sptep)
  *	- W bit on ad-disabled SPTEs.
  * Returns true iff any D or W bits were cleared.
  */
-static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			       struct kvm_memory_slot *slot)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -1196,7 +1197,8 @@ static bool spte_set_dirty(u64 *sptep)
 	return mmu_spte_update(sptep, spte);
 }
 
-static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			     struct kvm_memory_slot *slot)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -1260,7 +1262,7 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 	while (mask) {
 		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
 					  PG_LEVEL_4K, slot);
-		__rmap_clear_dirty(kvm, rmap_head);
+		__rmap_clear_dirty(kvm, rmap_head, slot);
 
 		/* clear the first set bit */
 		mask &= mask - 1;
@@ -1325,7 +1327,8 @@ static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
 	return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn);
 }
 
-static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+			  struct kvm_memory_slot *slot)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -1345,7 +1348,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 			   struct kvm_memory_slot *slot, gfn_t gfn, int level,
 			   unsigned long data)
 {
-	return kvm_zap_rmapp(kvm, rmap_head);
+	return kvm_zap_rmapp(kvm, rmap_head, slot);
 }
 
 static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
@@ -5189,7 +5192,8 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
 EXPORT_SYMBOL_GPL(kvm_configure_mmu);
 
 /* The return value indicates if tlb flush on all vcpus is needed. */
-typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
+typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+				    struct kvm_memory_slot *slot);
 
 /* The caller should hold mmu-lock before calling this function. */
 static __always_inline bool
@@ -5203,7 +5207,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	for_each_slot_rmap_range(memslot, start_level, end_level, start_gfn,
 			end_gfn, &iterator) {
 		if (iterator.rmap)
-			flush |= fn(kvm, iterator.rmap);
+			flush |= fn(kvm, iterator.rmap, memslot);
 
 		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
 			if (flush && lock_flush_tlb) {
@@ -5492,7 +5496,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 }
 
 static bool slot_rmap_write_protect(struct kvm *kvm,
-				    struct kvm_rmap_head *rmap_head)
+				    struct kvm_rmap_head *rmap_head,
+				    struct kvm_memory_slot *slot)
 {
 	return __rmap_write_protect(kvm, rmap_head, false);
 }
@@ -5526,7 +5531,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 }
 
 static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
-					 struct kvm_rmap_head *rmap_head)
+					 struct kvm_rmap_head *rmap_head,
+					 struct kvm_memory_slot *slot)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
-- 
2.30.0.478.g8a0d178c01-goog


* [PATCH 05/14] KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

When zapping SPTEs in order to rebuild them as huge pages, use the new
helper that computes the max mapping level to detect whether or not a
SPTE should be zapped.  Doing so avoids zapping SPTEs that can't
possibly be rebuilt as huge pages, e.g. due to hardware constraints,
memslot alignment, etc...

This also avoids zapping SPTEs that are still large, e.g. if migration
was canceled before write-protected huge pages were shattered to enable
dirty logging.  Note, such pages are still write-protected at this time,
i.e. a page fault VM-Exit will still occur.  This will hopefully be
addressed in a future patch.

Sadly, the TDP MMU loses its const on the memslot, but that's a pervasive
problem that's been around for quite some time.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c     | 11 ++++++-----
 arch/x86/kvm/mmu/tdp_mmu.c | 13 +++++++------
 arch/x86/kvm/mmu/tdp_mmu.h |  2 +-
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fb719e7a0cbb..d5849a0e3de1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5553,8 +5553,8 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		 * mapping if the indirect sp has level = 1.
 		 */
 		if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
-		    (kvm_is_zone_device_pfn(pfn) ||
-		     PageCompound(pfn_to_page(pfn)))) {
+		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
+							       pfn, PG_LEVEL_NUM)) {
 			pte_list_remove(rmap_head, sptep);
 
 			if (kvm_available_flush_tlb_with_range())
@@ -5574,12 +5574,13 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot)
 {
 	/* FIXME: const-ify all uses of struct kvm_memory_slot.  */
+	struct kvm_memory_slot *slot = (struct kvm_memory_slot *)memslot;
+
 	write_lock(&kvm->mmu_lock);
-	slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
-			 kvm_mmu_zap_collapsible_spte, true);
+	slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
 
 	if (is_tdp_mmu_enabled(kvm))
-		kvm_tdp_mmu_zap_collapsible_sptes(kvm, memslot);
+		kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot);
 	write_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3cc332ed099d..f8fa1f64e10d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1328,8 +1328,10 @@ bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot)
  */
 static void zap_collapsible_spte_range(struct kvm *kvm,
 				       struct kvm_mmu_page *root,
-				       gfn_t start, gfn_t end)
+				       struct kvm_memory_slot *slot)
 {
+	gfn_t start = slot->base_gfn;
+	gfn_t end = start + slot->npages;
 	struct tdp_iter iter;
 	kvm_pfn_t pfn;
 	bool spte_set = false;
@@ -1348,8 +1350,8 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 
 		pfn = spte_to_pfn(iter.old_spte);
 		if (kvm_is_reserved_pfn(pfn) ||
-		    (!PageTransCompoundMap(pfn_to_page(pfn)) &&
-		     !kvm_is_zone_device_pfn(pfn)))
+		    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
+							    pfn, PG_LEVEL_NUM))
 			continue;
 
 		tdp_mmu_set_spte(kvm, &iter, 0);
@@ -1367,7 +1369,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
  * be replaced by large mappings, for GFNs within the slot.
  */
 void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
-				       const struct kvm_memory_slot *slot)
+				       struct kvm_memory_slot *slot)
 {
 	struct kvm_mmu_page *root;
 	int root_as_id;
@@ -1377,8 +1379,7 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 		if (root_as_id != slot->as_id)
 			continue;
 
-		zap_collapsible_spte_range(kvm, root, slot->base_gfn,
-					   slot->base_gfn + slot->npages);
+		zap_collapsible_spte_range(kvm, root, slot);
 	}
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index b4b65e3699b3..d31c5ed81a18 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -35,7 +35,7 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 				       bool wrprot);
 bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
-				       const struct kvm_memory_slot *slot);
+				       struct kvm_memory_slot *slot);
 
 bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn);
-- 
2.30.0.478.g8a0d178c01-goog


* [PATCH 06/14] KVM: nVMX: Disable PML in hardware when running L2
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Unconditionally disable PML in vmcs02; KVM emulates PML purely in the
MMU, e.g. vmx_flush_pml_buffer() doesn't even try to copy the L2 GPAs
from vmcs02's buffer to vmcs12.  At best, enabling PML is a nop.  At
worst, it will cause vmx_flush_pml_buffer() to record bogus GFNs in the
dirty logs.

Initialize vmcs02.GUEST_PML_INDEX such that PML writes would trigger
VM-Exit if PML was somehow enabled, skip flushing the buffer for guest
mode since the index is bogus, and freak out if a PML full exit occurs
when L2 is active.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 29 +++++++++++++++--------------
 arch/x86/kvm/vmx/vmx.c    | 12 ++++++++++--
 2 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b2f0b5e9cd63..0c6dda9980a6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2167,15 +2167,13 @@ static void prepare_vmcs02_constant_state(struct vcpu_vmx *vmx)
 		vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
 
 	/*
-	 * The PML address never changes, so it is constant in vmcs02.
-	 * Conceptually we want to copy the PML index from vmcs01 here,
-	 * and then back to vmcs01 on nested vmexit.  But since we flush
-	 * the log and reset GUEST_PML_INDEX on each vmexit, the PML
-	 * index is also effectively constant in vmcs02.
+	 * PML is emulated for L2, but never enabled in hardware as the MMU
+	 * handles A/D emulation.  Disabling PML for L2 also avoids having to
+	 * deal with filtering out L2 GPAs from the buffer.
 	 */
 	if (enable_pml) {
-		vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
-		vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
+		vmcs_write64(PML_ADDRESS, 0);
+		vmcs_write16(GUEST_PML_INDEX, -1);
 	}
 
 	if (cpu_has_vmx_encls_vmexit())
@@ -2210,7 +2208,7 @@ static void prepare_vmcs02_early_rare(struct vcpu_vmx *vmx,
 
 static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 {
-	u32 exec_control, vmcs12_exec_ctrl;
+	u32 exec_control;
 	u64 guest_efer = nested_vmx_calc_efer(vmx, vmcs12);
 
 	if (vmx->nested.dirty_vmcs12 || vmx->nested.hv_evmcs)
@@ -2284,11 +2282,11 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 				  SECONDARY_EXEC_APIC_REGISTER_VIRT |
 				  SECONDARY_EXEC_ENABLE_VMFUNC);
 		if (nested_cpu_has(vmcs12,
-				   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)) {
-			vmcs12_exec_ctrl = vmcs12->secondary_vm_exec_control &
-				~SECONDARY_EXEC_ENABLE_PML;
-			exec_control |= vmcs12_exec_ctrl;
-		}
+				   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
+			exec_control |= vmcs12->secondary_vm_exec_control;
+
+		/* PML is emulated and never enabled in hardware for L2. */
+		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
 		/* VMCS shadowing for L2 is emulated for now */
 		exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
@@ -5793,7 +5791,10 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
 	case EXIT_REASON_PREEMPTION_TIMER:
 		return true;
 	case EXIT_REASON_PML_FULL:
-		/* We emulate PML support to L1. */
+		/*
+		 * PML is emulated for an L1 VMM and should never be enabled in
+		 * vmcs02, always "handle" PML_FULL by exiting to userspace.
+		 */
 		return true;
 	case EXIT_REASON_VMFUNC:
 		/* VM functions are emulated through L2->L0 vmexits. */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e0a3a9be654b..b47ed3f412ef 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5976,9 +5976,10 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	 * updated. Another good is, in kvm_vm_ioctl_get_dirty_log, before
 	 * querying dirty_bitmap, we only need to kick all vcpus out of guest
 	 * mode as if vcpus is in root mode, the PML buffer must has been
-	 * flushed already.
+	 * flushed already.  Note, PML is never enabled in hardware while
+	 * running L2.
 	 */
-	if (enable_pml)
+	if (enable_pml && !is_guest_mode(vcpu))
 		vmx_flush_pml_buffer(vcpu);
 
 	/*
@@ -5994,6 +5995,13 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 		return handle_invalid_guest_state(vcpu);
 
 	if (is_guest_mode(vcpu)) {
+		/*
+		 * PML is never enabled when running L2, bail immediately if a
+		 * PML full exit occurs as something is horribly wrong.
+		 */
+		if (exit_reason.basic == EXIT_REASON_PML_FULL)
+			goto unexpected_vmexit;
+
 		/*
 		 * The host physical addresses of some pages of guest memory
 		 * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC
-- 
2.30.0.478.g8a0d178c01-goog


* [PATCH 07/14] KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect()
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Expand the comment about the need to use write-protection for nested EPT
when PML is enabled to clarify that the tagging is a nop when PML is
_not_ enabled.  Without the clarification, omitting the PML check looks
wrong at first^Wfifth glance.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu_internal.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 0b55aa561ec8..72b0928f2b2d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -84,7 +84,10 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
 	 * When using the EPT page-modification log, the GPAs in the log
 	 * would come from L2 rather than L1.  Therefore, we need to rely
 	 * on write protection to record dirty pages.  This also bypasses
-	 * PML, since writes now result in a vmexit.
+	 * PML, since writes now result in a vmexit.  Note, this helper will
+	 * tag SPTEs as needing write-protection even if PML is disabled or
+	 * unsupported, but that's ok because the tag is consumed if and only
+	 * if PML is enabled.  Omit the PML check to save a few uops.
 	 */
 	return vcpu->arch.mmu == &vcpu->arch.guest_mmu;
 }
-- 
2.30.0.478.g8a0d178c01-goog


* [PATCH 08/14] KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Store the vendor-specific dirty log size in a variable; there's no need
to wrap it in a function since the value is constant after
hardware_setup() runs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 1 -
 arch/x86/include/asm/kvm_host.h    | 2 +-
 arch/x86/kvm/mmu/mmu.c             | 5 +----
 arch/x86/kvm/vmx/vmx.c             | 9 ++-------
 4 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 355a2ab8fc09..28c07cc01474 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -97,7 +97,6 @@ KVM_X86_OP_NULL(slot_enable_log_dirty)
 KVM_X86_OP_NULL(slot_disable_log_dirty)
 KVM_X86_OP_NULL(flush_log_dirty)
 KVM_X86_OP_NULL(enable_log_dirty_pt_masked)
-KVM_X86_OP_NULL(cpu_dirty_log_size)
 KVM_X86_OP_NULL(pre_block)
 KVM_X86_OP_NULL(post_block)
 KVM_X86_OP_NULL(vcpu_blocking)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 84499aad01a4..fb59933610d9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1294,7 +1294,7 @@ struct kvm_x86_ops {
 	void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
 					   struct kvm_memory_slot *slot,
 					   gfn_t offset, unsigned long mask);
-	int (*cpu_dirty_log_size)(void);
+	int cpu_dirty_log_size;
 
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d5849a0e3de1..6c32e8e0f720 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1294,10 +1294,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 
 int kvm_cpu_dirty_log_size(void)
 {
-	if (kvm_x86_ops.cpu_dirty_log_size)
-		return static_call(kvm_x86_cpu_dirty_log_size)();
-
-	return 0;
+	return kvm_x86_ops.cpu_dirty_log_size;
 }
 
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b47ed3f412ef..f843707dd7df 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7650,11 +7650,6 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
 	return supported & BIT(bit);
 }
 
-static int vmx_cpu_dirty_log_size(void)
-{
-	return enable_pml ? PML_ENTITY_NUM : 0;
-}
-
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.hardware_unsetup = hardware_unsetup,
 
@@ -7758,6 +7753,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.slot_disable_log_dirty = vmx_slot_disable_log_dirty,
 	.flush_log_dirty = vmx_flush_log_dirty,
 	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
+	.cpu_dirty_log_size = PML_ENTITY_NUM,
 
 	.pre_block = vmx_pre_block,
 	.post_block = vmx_post_block,
@@ -7785,7 +7781,6 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.msr_filter_changed = vmx_msr_filter_changed,
 	.complete_emulated_msr = kvm_complete_insn_gp,
-	.cpu_dirty_log_size = vmx_cpu_dirty_log_size,
 
 	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
 };
@@ -7907,7 +7902,7 @@ static __init int hardware_setup(void)
 		vmx_x86_ops.slot_disable_log_dirty = NULL;
 		vmx_x86_ops.flush_log_dirty = NULL;
 		vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
-		vmx_x86_ops.cpu_dirty_log_size = NULL;
+		vmx_x86_ops.cpu_dirty_log_size = 0;
 	}
 
 	if (!cpu_has_vmx_preemption_timer())
-- 
2.30.0.478.g8a0d178c01-goog


* [PATCH 09/14] KVM: x86: Move MMU's PML logic to common code
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Drop the facade of KVM's PML logic being vendor specific and move the
bits that aren't truly VMX specific into common x86 code.  The MMU logic
for dealing with PML is tightly coupled to the feature and to VMX's
implementation; bouncing through kvm_x86_ops obfuscates the code without
providing any meaningful separation of concerns or encapsulation.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  4 ---
 arch/x86/include/asm/kvm_host.h    | 27 ++-------------
 arch/x86/kvm/mmu/mmu.c             | 16 +++------
 arch/x86/kvm/vmx/vmx.c             | 55 +-----------------------------
 arch/x86/kvm/x86.c                 | 22 ++++++++----
 5 files changed, 24 insertions(+), 100 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 28c07cc01474..90affdb2cbbc 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -93,10 +93,6 @@ KVM_X86_OP(check_intercept)
 KVM_X86_OP(handle_exit_irqoff)
 KVM_X86_OP_NULL(request_immediate_exit)
 KVM_X86_OP(sched_in)
-KVM_X86_OP_NULL(slot_enable_log_dirty)
-KVM_X86_OP_NULL(slot_disable_log_dirty)
-KVM_X86_OP_NULL(flush_log_dirty)
-KVM_X86_OP_NULL(enable_log_dirty_pt_masked)
 KVM_X86_OP_NULL(pre_block)
 KVM_X86_OP_NULL(post_block)
 KVM_X86_OP_NULL(vcpu_blocking)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fb59933610d9..5cf382ec48b0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1271,29 +1271,9 @@ struct kvm_x86_ops {
 	void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
 
 	/*
-	 * Arch-specific dirty logging hooks. These hooks are only supposed to
-	 * be valid if the specific arch has hardware-accelerated dirty logging
-	 * mechanism. Currently only for PML on VMX.
-	 *
-	 *  - slot_enable_log_dirty:
-	 *	called when enabling log dirty mode for the slot.
-	 *  - slot_disable_log_dirty:
-	 *	called when disabling log dirty mode for the slot.
-	 *	also called when slot is created with log dirty disabled.
-	 *  - flush_log_dirty:
-	 *	called before reporting dirty_bitmap to userspace.
-	 *  - enable_log_dirty_pt_masked:
-	 *	called when reenabling log dirty for the GFNs in the mask after
-	 *	corresponding bits are cleared in slot->dirty_bitmap.
+	 * Size of the CPU's dirty log buffer, i.e. VMX's PML buffer.  A zero
+	 * value indicates CPU dirty logging is unsupported or disabled.
 	 */
-	void (*slot_enable_log_dirty)(struct kvm *kvm,
-				      struct kvm_memory_slot *slot);
-	void (*slot_disable_log_dirty)(struct kvm *kvm,
-				       struct kvm_memory_slot *slot);
-	void (*flush_log_dirty)(struct kvm *kvm);
-	void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
-					   struct kvm_memory_slot *slot,
-					   gfn_t offset, unsigned long mask);
 	int cpu_dirty_log_size;
 
 	/* pmu operations of sub-arch */
@@ -1439,9 +1419,6 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
 					struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_set_dirty(struct kvm *kvm,
 			    struct kvm_memory_slot *memslot);
-void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
-				   struct kvm_memory_slot *slot,
-				   gfn_t gfn_offset, unsigned long mask);
 void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6c32e8e0f720..86182e79beaf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1250,9 +1250,9 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
  *
  * Used for PML to re-log the dirty GPAs after userspace querying dirty_bitmap.
  */
-void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask)
+static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+					 struct kvm_memory_slot *slot,
+					 gfn_t gfn_offset, unsigned long mask)
 {
 	struct kvm_rmap_head *rmap_head;
 
@@ -1268,7 +1268,6 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 		mask &= mask - 1;
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked);
 
 /**
  * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
@@ -1284,10 +1283,8 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 				struct kvm_memory_slot *slot,
 				gfn_t gfn_offset, unsigned long mask)
 {
-	if (kvm_x86_ops.enable_log_dirty_pt_masked)
-		static_call(kvm_x86_enable_log_dirty_pt_masked)(kvm, slot,
-								gfn_offset,
-								mask);
+	if (kvm_x86_ops.cpu_dirty_log_size)
+		kvm_mmu_clear_dirty_pt_masked(kvm, slot, gfn_offset, mask);
 	else
 		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
@@ -5616,7 +5613,6 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 	if (flush)
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);
 
 void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
 					struct kvm_memory_slot *memslot)
@@ -5633,7 +5629,6 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
 	if (flush)
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_slot_largepage_remove_write_access);
 
 void kvm_mmu_slot_set_dirty(struct kvm *kvm,
 			    struct kvm_memory_slot *memslot)
@@ -5649,7 +5644,6 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
 	if (flush)
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
 
 void kvm_mmu_zap_all(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f843707dd7df..862d1f5627e7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5776,24 +5776,6 @@ static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
 	vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
 }
 
-/*
- * Flush all vcpus' PML buffer and update logged GPAs to dirty_bitmap.
- * Called before reporting dirty_bitmap to userspace.
- */
-static void kvm_flush_pml_buffers(struct kvm *kvm)
-{
-	int i;
-	struct kvm_vcpu *vcpu;
-	/*
-	 * We only need to kick vcpu out of guest mode here, as PML buffer
-	 * is flushed at beginning of all VMEXITs, and it's obvious that only
-	 * vcpus running in guest are possible to have unflushed GPAs in PML
-	 * buffer.
-	 */
-	kvm_for_each_vcpu(i, vcpu, kvm)
-		kvm_vcpu_kick(vcpu);
-}
-
 static void vmx_dump_sel(char *name, uint32_t sel)
 {
 	pr_err("%s sel=0x%04x, attr=0x%05x, limit=0x%08x, base=0x%016lx\n",
@@ -7517,32 +7499,6 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 		shrink_ple_window(vcpu);
 }
 
-static void vmx_slot_enable_log_dirty(struct kvm *kvm,
-				     struct kvm_memory_slot *slot)
-{
-	if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
-		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
-	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
-}
-
-static void vmx_slot_disable_log_dirty(struct kvm *kvm,
-				       struct kvm_memory_slot *slot)
-{
-	kvm_mmu_slot_set_dirty(kvm, slot);
-}
-
-static void vmx_flush_log_dirty(struct kvm *kvm)
-{
-	kvm_flush_pml_buffers(kvm);
-}
-
-static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
-					   struct kvm_memory_slot *memslot,
-					   gfn_t offset, unsigned long mask)
-{
-	kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
-}
-
 static int vmx_pre_block(struct kvm_vcpu *vcpu)
 {
 	if (pi_pre_block(vcpu))
@@ -7749,10 +7705,6 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.sched_in = vmx_sched_in,
 
-	.slot_enable_log_dirty = vmx_slot_enable_log_dirty,
-	.slot_disable_log_dirty = vmx_slot_disable_log_dirty,
-	.flush_log_dirty = vmx_flush_log_dirty,
-	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
 	.cpu_dirty_log_size = PML_ENTITY_NUM,
 
 	.pre_block = vmx_pre_block,
@@ -7897,13 +7849,8 @@ static __init int hardware_setup(void)
 	if (!enable_ept || !enable_ept_ad_bits || !cpu_has_vmx_pml())
 		enable_pml = 0;
 
-	if (!enable_pml) {
-		vmx_x86_ops.slot_enable_log_dirty = NULL;
-		vmx_x86_ops.slot_disable_log_dirty = NULL;
-		vmx_x86_ops.flush_log_dirty = NULL;
-		vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
+	if (!enable_pml)
 		vmx_x86_ops.cpu_dirty_log_size = 0;
-	}
 
 	if (!cpu_has_vmx_preemption_timer())
 		enable_preemption_timer = false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3fa140383f5d..e89fe98a0099 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5214,10 +5214,18 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
+
 	/*
-	 * Flush potentially hardware-cached dirty pages to dirty_bitmap.
+	 * Flush all CPUs' dirty log buffers to the dirty_bitmap.  Called
+	 * before reporting dirty_bitmap to userspace.  KVM flushes the buffers
+	 * on all VM-Exits, thus we only need to kick running vCPUs to force a
+	 * VM-Exit.
 	 */
-	static_call_cond(kvm_x86_flush_log_dirty)(kvm);
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvm_vcpu_kick(vcpu);
 }
 
 int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
@@ -10809,8 +10817,10 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 	 * is enabled the D-bit or the W-bit will be cleared.
 	 */
 	if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-		if (kvm_x86_ops.slot_enable_log_dirty) {
-			static_call(kvm_x86_slot_enable_log_dirty)(kvm, new);
+		if (kvm_x86_ops.cpu_dirty_log_size) {
+			if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
+				kvm_mmu_slot_leaf_clear_dirty(kvm, new);
+			kvm_mmu_slot_largepage_remove_write_access(kvm, new);
 		} else {
 			int level =
 				kvm_dirty_log_manual_protect_and_init_set(kvm) ?
@@ -10826,8 +10836,8 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 			 */
 			kvm_mmu_slot_remove_write_access(kvm, new, level);
 		}
-	} else {
-		static_call_cond(kvm_x86_slot_disable_log_dirty)(kvm, new);
+	} else if (kvm_x86_ops.cpu_dirty_log_size) {
+		kvm_mmu_slot_set_dirty(kvm, new);
 	}
 }
 
-- 
2.30.0.478.g8a0d178c01-goog


* [PATCH 10/14] KVM: x86: Further clarify the logic and comments for toggling log dirty
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Add a sanity check in kvm_mmu_slot_apply_flags to assert that the
LOG_DIRTY_PAGES flag is indeed being toggled, and explicitly rely on
that holding true when zapping collapsible SPTEs.  Manipulating the
CPU dirty log (PML) and write-protection also relies on this assertion,
but that's not obvious in the current code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e89fe98a0099..c0d22f19aed0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10761,12 +10761,20 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 				     enum kvm_mr_change change)
 {
 	/*
-	 * Nothing to do for RO slots or CREATE/MOVE/DELETE of a slot.
-	 * See comments below.
+	 * Nothing to do for RO slots (which can't be dirtied and can't be made
+	 * writable) or CREATE/MOVE/DELETE of a slot.  See comments below.
 	 */
 	if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY))
 		return;
 
+	/*
+	 * READONLY and non-flags changes were filtered out above, and the only
+	 * other flag is LOG_DIRTY_PAGES, i.e. something is wrong if dirty
+	 * logging isn't being toggled on or off.
+	 */
+	if (WARN_ON_ONCE(!((old->flags ^ new->flags) & KVM_MEM_LOG_DIRTY_PAGES)))
+		return;
+
 	/*
 	 * Dirty logging tracks sptes in 4k granularity, meaning that large
 	 * sptes have to be split.  If live migration is successful, the guest
@@ -10784,8 +10792,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 	 * MOVE/DELETE: The old mappings will already have been cleaned up by
 	 *		kvm_arch_flush_shadow_memslot()
 	 */
-	if ((old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-	    !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
+	if (!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
 		kvm_mmu_zap_collapsible_sptes(kvm, new);
 
 	/*
-- 
2.30.0.478.g8a0d178c01-goog


* [PATCH 11/14] KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging
From: Sean Christopherson @ 2021-02-13  0:50 UTC
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

From: Makarand Sonare <makarandsonare@google.com>

Currently, if enable_pml=1 PML remains enabled for the entire lifetime
of the VM irrespective of whether dirty logging is enabled or disabled.
When dirty logging is disabled, all the pages of the VM are manually
marked dirty, so that PML is effectively non-operational.  Setting
the dirty bits is an expensive operation which can cause severe MMU
lock contention in a performance sensitive path when dirty logging is
disabled after a failed or canceled live migration.

Manually setting dirty bits also fails to prevent PML activity if some
code path clears dirty bits, which can incur unnecessary VM-Exits.

In order to avoid this extra overhead, dynamically enable/disable PML
when dirty logging gets turned on/off for the first/last memslot.

Signed-off-by: Makarand Sonare <makarandsonare@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  4 ++++
 arch/x86/kvm/vmx/nested.c          |  5 +++++
 arch/x86/kvm/vmx/vmx.c             | 28 +++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h             |  2 ++
 arch/x86/kvm/x86.c                 | 35 ++++++++++++++++++++++++++----
 6 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 90affdb2cbbc..323641097f63 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -93,6 +93,7 @@ KVM_X86_OP(check_intercept)
 KVM_X86_OP(handle_exit_irqoff)
 KVM_X86_OP_NULL(request_immediate_exit)
 KVM_X86_OP(sched_in)
+KVM_X86_OP_NULL(update_cpu_dirty_logging)
 KVM_X86_OP_NULL(pre_block)
 KVM_X86_OP_NULL(post_block)
 KVM_X86_OP_NULL(vcpu_blocking)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5cf382ec48b0..ffcfa84c969d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -89,6 +89,8 @@
 	KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_APF_READY		KVM_ARCH_REQ(28)
 #define KVM_REQ_MSR_FILTER_CHANGED	KVM_ARCH_REQ(29)
+#define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
+	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1007,6 +1009,7 @@ struct kvm_arch {
 	u32 bsp_vcpu_id;
 
 	u64 disabled_quirks;
+	int cpu_dirty_logging_count;
 
 	enum kvm_irqchip_mode irqchip_mode;
 	u8 nr_reserved_ioapic_pins;
@@ -1275,6 +1278,7 @@ struct kvm_x86_ops {
 	 * value indicates CPU dirty logging is unsupported or disabled.
 	 */
 	int cpu_dirty_log_size;
+	void (*update_cpu_dirty_logging)(struct kvm_vcpu *vcpu);
 
 	/* pmu operations of sub-arch */
 	const struct kvm_pmu_ops *pmu_ops;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c6dda9980a6..a63da447ede9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4493,6 +4493,11 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 		vmx_set_virtual_apic_mode(vcpu);
 	}
 
+	if (vmx->nested.update_vmcs01_cpu_dirty_logging) {
+		vmx->nested.update_vmcs01_cpu_dirty_logging = false;
+		vmx_update_cpu_dirty_logging(vcpu);
+	}
+
 	/* Unpin physical memory we referred to in vmcs02 */
 	if (vmx->nested.apic_access_page) {
 		kvm_release_page_clean(vmx->nested.apic_access_page);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 862d1f5627e7..1204e5f0fe67 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4277,7 +4277,12 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 	*/
 	exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
 
-	if (!enable_pml)
+	/*
+	 * PML is enabled/disabled when dirty logging of memslots changes, but
+	 * it needs to be set here when dirty logging is already active, e.g.
+	 * if this vCPU was created after dirty logging was enabled.
+	 */
+	if (!vcpu->kvm->arch.cpu_dirty_logging_count)
 		exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
 	if (cpu_has_vmx_xsaves()) {
@@ -7499,6 +7504,26 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 		shrink_ple_window(vcpu);
 }
 
+void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (is_guest_mode(vcpu)) {
+		vmx->nested.update_vmcs01_cpu_dirty_logging = true;
+		return;
+	}
+
+	/*
+	 * Note, cpu_dirty_logging_count can be changed concurrent with this
+	 * code, but in that case another update request will be made and so
+	 * the guest will never run with a stale PML value.
+	 */
+	if (vcpu->kvm->arch.cpu_dirty_logging_count)
+		secondary_exec_controls_setbit(vmx, SECONDARY_EXEC_ENABLE_PML);
+	else
+		secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_ENABLE_PML);
+}
+
 static int vmx_pre_block(struct kvm_vcpu *vcpu)
 {
 	if (pi_pre_block(vcpu))
@@ -7706,6 +7731,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.sched_in = vmx_sched_in,
 
 	.cpu_dirty_log_size = PML_ENTITY_NUM,
+	.update_cpu_dirty_logging = vmx_update_cpu_dirty_logging,
 
 	.pre_block = vmx_pre_block,
 	.post_block = vmx_post_block,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 12c53d05a902..89da5e1251f1 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -165,6 +165,7 @@ struct nested_vmx {
 
 	bool change_vmcs01_virtual_apic_mode;
 	bool reload_vmcs01_apic_access_page;
+	bool update_vmcs01_cpu_dirty_logging;
 
 	/*
 	 * Enlightened VMCS has been enabled. It does not mean that L1 has to
@@ -393,6 +394,7 @@ int vmx_find_loadstore_msr_slot(struct vmx_msrs *m, u32 msr);
 void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu);
 void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
 	u32 msr, int type, bool value);
+void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
 
 static inline u8 vmx_get_rvi(void)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0d22f19aed0..b9a8c8af9713 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8987,6 +8987,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_check_async_pf_completion(vcpu);
 		if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu))
 			static_call(kvm_x86_msr_filter_changed)(vcpu);
+
+		if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu))
+			static_call(kvm_x86_update_cpu_dirty_logging)(vcpu);
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win ||
@@ -10755,14 +10758,38 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	return 0;
 }
 
+
+static void kvm_mmu_update_cpu_dirty_logging(struct kvm *kvm, bool enable)
+{
+	struct kvm_arch *ka = &kvm->arch;
+
+	if (!kvm_x86_ops.cpu_dirty_log_size)
+		return;
+
+	if ((enable && ++ka->cpu_dirty_logging_count == 1) ||
+	    (!enable && --ka->cpu_dirty_logging_count == 0))
+		kvm_make_all_cpus_request(kvm, KVM_REQ_UPDATE_CPU_DIRTY_LOGGING);
+
+	WARN_ON_ONCE(ka->cpu_dirty_logging_count < 0);
+}
+
 static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 				     struct kvm_memory_slot *old,
 				     struct kvm_memory_slot *new,
 				     enum kvm_mr_change change)
 {
+	bool log_dirty_pages = new->flags & KVM_MEM_LOG_DIRTY_PAGES;
+
 	/*
-	 * Nothing to do for RO slots (which can't be dirtied and can't be made
-	 * writable) or CREATE/MOVE/DELETE of a slot.  See comments below.
+	 * Update CPU dirty logging if dirty logging is being toggled.  This
+	 * applies to all operations.
+	 */
+	if ((old->flags ^ new->flags) & KVM_MEM_LOG_DIRTY_PAGES)
+		kvm_mmu_update_cpu_dirty_logging(kvm, log_dirty_pages);
+
+	/*
+	 * Nothing more to do for RO slots (which can't be dirtied and can't be
+	 * made writable) or CREATE/MOVE/DELETE of a slot.  See comments below.
 	 */
 	if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY))
 		return;
@@ -10792,7 +10819,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 	 * MOVE/DELETE: The old mappings will already have been cleaned up by
 	 *		kvm_arch_flush_shadow_memslot()
 	 */
-	if (!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
+	if (!log_dirty_pages)
 		kvm_mmu_zap_collapsible_sptes(kvm, new);
 
 	/*
@@ -10823,7 +10850,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 	 * initial-all-set state.  Otherwise, depending on whether pml
 	 * is enabled the D-bit or the W-bit will be cleared.
 	 */
-	if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
+	if (log_dirty_pages) {
 		if (kvm_x86_ops.cpu_dirty_log_size) {
 			if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
 				kvm_mmu_slot_leaf_clear_dirty(kvm, new);
-- 
2.30.0.478.g8a0d178c01-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 12/14] KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML
  2021-02-13  0:50 [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements Sean Christopherson
                   ` (10 preceding siblings ...)
  2021-02-13  0:50 ` [PATCH 11/14] KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging Sean Christopherson
@ 2021-02-13  0:50 ` Sean Christopherson
  2021-02-18 17:08   ` Paolo Bonzini
  2021-02-13  0:50 ` [PATCH 13/14] KVM: x86: Fold "write-protect large" use case into generic write-protect Sean Christopherson
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2021-02-13  0:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Stop setting dirty bits for MMU pages when dirty logging is disabled for
a memslot, as PML is now completely disabled when there are no memslots
with dirty logging enabled.

This means that spurious PML entries will be created for memslots with
dirty logging disabled if at least one other memslot has dirty logging
enabled, but for all known use cases, dirty logging is a global VMM
control.  Furthermore, spurious PML entries are already possible since
dirty bits are set only when dirty logging is turned off, i.e. memslots
that are never dirty logged will have dirty bits cleared.

In the end, it's faster overall to eat a few spurious PML entries in the
window where dirty logging is being disabled across all memslots.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 -
 arch/x86/kvm/mmu/mmu.c          | 45 -----------------
 arch/x86/kvm/mmu/tdp_mmu.c      | 54 --------------------
 arch/x86/kvm/mmu/tdp_mmu.h      |  1 -
 arch/x86/kvm/x86.c              | 87 ++++++++++++++-------------------
 5 files changed, 36 insertions(+), 153 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ffcfa84c969d..c15d6de8c457 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1421,8 +1421,6 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 				   struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
 					struct kvm_memory_slot *memslot);
-void kvm_mmu_slot_set_dirty(struct kvm *kvm,
-			    struct kvm_memory_slot *memslot);
 void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 86182e79beaf..44ee55b26c3d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1181,36 +1181,6 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
 	return flush;
 }
 
-static bool spte_set_dirty(u64 *sptep)
-{
-	u64 spte = *sptep;
-
-	rmap_printk("spte %p %llx\n", sptep, *sptep);
-
-	/*
-	 * Similar to the !kvm_x86_ops.slot_disable_log_dirty case,
-	 * do not bother adding back write access to pages marked
-	 * SPTE_AD_WRPROT_ONLY_MASK.
-	 */
-	spte |= shadow_dirty_mask;
-
-	return mmu_spte_update(sptep, spte);
-}
-
-static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-			     struct kvm_memory_slot *slot)
-{
-	u64 *sptep;
-	struct rmap_iterator iter;
-	bool flush = false;
-
-	for_each_rmap_spte(rmap_head, &iter, sptep)
-		if (spte_ad_enabled(*sptep))
-			flush |= spte_set_dirty(sptep);
-
-	return flush;
-}
-
 /**
  * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
  * @kvm: kvm instance
@@ -5630,21 +5600,6 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
 }
 
-void kvm_mmu_slot_set_dirty(struct kvm *kvm,
-			    struct kvm_memory_slot *memslot)
-{
-	bool flush;
-
-	write_lock(&kvm->mmu_lock);
-	flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false);
-	if (is_tdp_mmu_enabled(kvm))
-		flush |= kvm_tdp_mmu_slot_set_dirty(kvm, memslot);
-	write_unlock(&kvm->mmu_lock);
-
-	if (flush)
-		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
-}
-
 void kvm_mmu_zap_all(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f8fa1f64e10d..c926c6b899a1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1268,60 +1268,6 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 	}
 }
 
-/*
- * Set the dirty status of all the SPTEs mapping GFNs in the memslot. This is
- * only used for PML, and so will involve setting the dirty bit on each SPTE.
- * Returns true if an SPTE has been changed and the TLBs need to be flushed.
- */
-static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
-				gfn_t start, gfn_t end)
-{
-	struct tdp_iter iter;
-	u64 new_spte;
-	bool spte_set = false;
-
-	rcu_read_lock();
-
-	tdp_root_for_each_pte(iter, root, start, end) {
-		if (tdp_mmu_iter_cond_resched(kvm, &iter, false))
-			continue;
-
-		if (!is_shadow_present_pte(iter.old_spte) ||
-		    iter.old_spte & shadow_dirty_mask)
-			continue;
-
-		new_spte = iter.old_spte | shadow_dirty_mask;
-
-		tdp_mmu_set_spte(kvm, &iter, new_spte);
-		spte_set = true;
-	}
-
-	rcu_read_unlock();
-	return spte_set;
-}
-
-/*
- * Set the dirty status of all the SPTEs mapping GFNs in the memslot. This is
- * only used for PML, and so will involve setting the dirty bit on each SPTE.
- * Returns true if an SPTE has been changed and the TLBs need to be flushed.
- */
-bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot)
-{
-	struct kvm_mmu_page *root;
-	int root_as_id;
-	bool spte_set = false;
-
-	for_each_tdp_mmu_root_yield_safe(kvm, root) {
-		root_as_id = kvm_mmu_page_as_id(root);
-		if (root_as_id != slot->as_id)
-			continue;
-
-		spte_set |= set_dirty_gfn_range(kvm, root, slot->base_gfn,
-				slot->base_gfn + slot->npages);
-	}
-	return spte_set;
-}
-
 /*
  * Clear leaf entries which could be replaced by large mappings, for
  * GFNs within the slot.
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index d31c5ed81a18..3b761c111bff 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -33,7 +33,6 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
 				       struct kvm_memory_slot *slot,
 				       gfn_t gfn, unsigned long mask,
 				       bool wrprot);
-bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				       struct kvm_memory_slot *slot);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9a8c8af9713..dca2c3333ef2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10789,7 +10789,18 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 
 	/*
 	 * Nothing more to do for RO slots (which can't be dirtied and can't be
-	 * made writable) or CREATE/MOVE/DELETE of a slot.  See comments below.
+	 * made writable) or CREATE/MOVE/DELETE of a slot.
+	 *
+	 * For a memslot with dirty logging disabled:
+	 * CREATE:      No dirty mappings will already exist.
+	 * MOVE/DELETE: The old mappings will already have been cleaned up by
+	 *		kvm_arch_flush_shadow_memslot()
+	 *
+	 * For a memslot with dirty logging enabled:
+	 * CREATE:      No shadow pages exist, thus nothing to write-protect
+	 *		and no dirty bits to clear.
+	 * MOVE/DELETE: The old mappings will already have been cleaned up by
+	 *		kvm_arch_flush_shadow_memslot().
 	 */
 	if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY))
 		return;
@@ -10802,55 +10813,31 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 	if (WARN_ON_ONCE(!((old->flags ^ new->flags) & KVM_MEM_LOG_DIRTY_PAGES)))
 		return;
 
-	/*
-	 * Dirty logging tracks sptes in 4k granularity, meaning that large
-	 * sptes have to be split.  If live migration is successful, the guest
-	 * in the source machine will be destroyed and large sptes will be
-	 * created in the destination. However, if the guest continues to run
-	 * in the source machine (for example if live migration fails), small
-	 * sptes will remain around and cause bad performance.
-	 *
-	 * Scan sptes if dirty logging has been stopped, dropping those
-	 * which can be collapsed into a single large-page spte.  Later
-	 * page faults will create the large-page sptes.
-	 *
-	 * There is no need to do this in any of the following cases:
-	 * CREATE:      No dirty mappings will already exist.
-	 * MOVE/DELETE: The old mappings will already have been cleaned up by
-	 *		kvm_arch_flush_shadow_memslot()
-	 */
-	if (!log_dirty_pages)
+	if (!log_dirty_pages) {
+		/*
+		 * Dirty logging tracks sptes in 4k granularity, meaning that
+		 * large sptes have to be split.  If live migration succeeds,
+		 * the guest in the source machine will be destroyed and large
+		 * sptes will be created in the destination.  However, if the
+		 * guest continues to run in the source machine (for example if
+		 * live migration fails), small sptes will remain around and
+		 * cause bad performance.
+		 *
+		 * Scan sptes if dirty logging has been stopped, dropping those
+		 * which can be collapsed into a single large-page spte.  Later
+		 * page faults will create the large-page sptes.
+		 */
 		kvm_mmu_zap_collapsible_sptes(kvm, new);
-
-	/*
-	 * Enable or disable dirty logging for the slot.
-	 *
-	 * For KVM_MR_DELETE and KVM_MR_MOVE, the shadow pages of the old
-	 * slot have been zapped so no dirty logging updates are needed for
-	 * the old slot.
-	 * For KVM_MR_CREATE and KVM_MR_MOVE, once the new slot is visible
-	 * any mappings that might be created in it will consume the
-	 * properties of the new slot and do not need to be updated here.
-	 *
-	 * When PML is enabled, the kvm_x86_ops dirty logging hooks are
-	 * called to enable/disable dirty logging.
-	 *
-	 * When disabling dirty logging with PML enabled, the D-bit is set
-	 * for sptes in the slot in order to prevent unnecessary GPA
-	 * logging in the PML buffer (and potential PML buffer full VMEXIT).
-	 * This guarantees leaving PML enabled for the guest's lifetime
-	 * won't have any additional overhead from PML when the guest is
-	 * running with dirty logging disabled.
-	 *
-	 * When enabling dirty logging, large sptes are write-protected
-	 * so they can be split on first write.  New large sptes cannot
-	 * be created for this slot until the end of the logging.
-	 * See the comments in fast_page_fault().
-	 * For small sptes, nothing is done if the dirty log is in the
-	 * initial-all-set state.  Otherwise, depending on whether pml
-	 * is enabled the D-bit or the W-bit will be cleared.
-	 */
-	if (log_dirty_pages) {
+	} else {
+		/*
+		 * Large sptes are write-protected so they can be split on first
+		 * write. New large sptes cannot be created for this slot until
+		 * the end of the logging. See the comments in fast_page_fault().
+		 *
+		 * For small sptes, nothing is done if the dirty log is in the
+		 * initial-all-set state.  Otherwise, depending on whether pml
+		 * is enabled the D-bit or the W-bit will be cleared.
+		 */
 		if (kvm_x86_ops.cpu_dirty_log_size) {
 			if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
 				kvm_mmu_slot_leaf_clear_dirty(kvm, new);
@@ -10870,8 +10857,6 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 			 */
 			kvm_mmu_slot_remove_write_access(kvm, new, level);
 		}
-	} else if (kvm_x86_ops.cpu_dirty_log_size) {
-		kvm_mmu_slot_set_dirty(kvm, new);
 	}
 }
 
-- 
2.30.0.478.g8a0d178c01-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 13/14] KVM: x86: Fold "write-protect large" use case into generic write-protect
  2021-02-13  0:50 [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements Sean Christopherson
                   ` (11 preceding siblings ...)
  2021-02-13  0:50 ` [PATCH 12/14] KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML Sean Christopherson
@ 2021-02-13  0:50 ` Sean Christopherson
  2021-02-13  0:50 ` [PATCH 14/14] KVM: x86/mmu: Remove a variety of unnecessary exports Sean Christopherson
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2021-02-13  0:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Drop kvm_mmu_slot_largepage_remove_write_access() and refactor its sole
caller to use kvm_mmu_slot_remove_write_access().  Remove the now-unused
slot_handle_large_level() and slot_handle_all_level() helpers.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 32 --------------------------------
 arch/x86/kvm/x86.c     | 32 +++++++++++++++++---------------
 2 files changed, 17 insertions(+), 47 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 44ee55b26c3d..6ad0fb1913c6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5204,22 +5204,6 @@ slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			lock_flush_tlb);
 }
 
-static __always_inline bool
-slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
-		      slot_level_handler fn, bool lock_flush_tlb)
-{
-	return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K,
-				 KVM_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
-}
-
-static __always_inline bool
-slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
-			slot_level_handler fn, bool lock_flush_tlb)
-{
-	return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K + 1,
-				 KVM_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
-}
-
 static __always_inline bool
 slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		 slot_level_handler fn, bool lock_flush_tlb)
@@ -5584,22 +5568,6 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
 }
 
-void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
-					struct kvm_memory_slot *memslot)
-{
-	bool flush;
-
-	write_lock(&kvm->mmu_lock);
-	flush = slot_handle_large_level(kvm, memslot, slot_rmap_write_protect,
-					false);
-	if (is_tdp_mmu_enabled(kvm))
-		flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_2M);
-	write_unlock(&kvm->mmu_lock);
-
-	if (flush)
-		kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
-}
-
 void kvm_mmu_zap_all(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dca2c3333ef2..1d2bc89431a2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10829,24 +10829,25 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 		 */
 		kvm_mmu_zap_collapsible_sptes(kvm, new);
 	} else {
-		/*
-		 * Large sptes are write-protected so they can be split on first
-		 * write. New large sptes cannot be created for this slot until
-		 * the end of the logging. See the comments in fast_page_fault().
-		 *
-		 * For small sptes, nothing is done if the dirty log is in the
-		 * initial-all-set state.  Otherwise, depending on whether pml
-		 * is enabled the D-bit or the W-bit will be cleared.
-		 */
+		/* By default, write-protect everything to log writes. */
+		int level = PG_LEVEL_4K;
+
 		if (kvm_x86_ops.cpu_dirty_log_size) {
+			/*
+			 * Clear all dirty bits, unless pages are treated as
+			 * dirty from the get-go.
+			 */
 			if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
 				kvm_mmu_slot_leaf_clear_dirty(kvm, new);
-			kvm_mmu_slot_largepage_remove_write_access(kvm, new);
-		} else {
-			int level =
-				kvm_dirty_log_manual_protect_and_init_set(kvm) ?
-				PG_LEVEL_2M : PG_LEVEL_4K;
 
+			/*
+			 * Write-protect large pages on write so that dirty
+			 * logging happens at 4k granularity.  No need to
+			 * write-protect small SPTEs since write accesses are
+			 * logged by the CPU via dirty bits.
+			 */
+			level = PG_LEVEL_2M;
+		} else if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
 			/*
 			 * If we're with initial-all-set, we don't need
 			 * to write protect any small page because
@@ -10855,8 +10856,9 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 			 * so that the page split can happen lazily on
 			 * the first write to the huge page.
 			 */
-			kvm_mmu_slot_remove_write_access(kvm, new, level);
+			level = PG_LEVEL_2M;
 		}
+		kvm_mmu_slot_remove_write_access(kvm, new, level);
 	}
 }
 
-- 
2.30.0.478.g8a0d178c01-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 14/14] KVM: x86/mmu: Remove a variety of unnecessary exports
  2021-02-13  0:50 [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements Sean Christopherson
                   ` (12 preceding siblings ...)
  2021-02-13  0:50 ` [PATCH 13/14] KVM: x86: Fold "write-protect large" use case into generic write-protect Sean Christopherson
@ 2021-02-13  0:50 ` Sean Christopherson
  2021-02-17 22:50 ` [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements Sean Christopherson
  2021-02-18 12:57 ` Paolo Bonzini
  15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2021-02-13  0:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

Remove several exports from the MMU that are no longer necessary.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu/mmu.c          | 35 ++++++++++++++-------------------
 2 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c15d6de8c457..0cf71ff2b2e5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1592,7 +1592,6 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 void kvm_update_dr7(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
-int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6ad0fb1913c6..30e9b0cb9abd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2466,7 +2466,21 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
+
+static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
+{
+	gpa_t gpa;
+	int r;
+
+	if (vcpu->arch.mmu->direct_map)
+		return 0;
+
+	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
+
+	r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+
+	return r;
+}
 
 static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
@@ -3411,7 +3425,6 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 	kvm_mmu_audit(vcpu, AUDIT_POST_SYNC);
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_sync_roots);
 
 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gpa_t vaddr,
 				  u32 access, struct x86_exception *exception)
@@ -4977,22 +4990,6 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
 
-int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
-{
-	gpa_t gpa;
-	int r;
-
-	if (vcpu->arch.mmu->direct_map)
-		return 0;
-
-	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
-
-	r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
-
-	return r;
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt);
-
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 		       void *insn, int insn_len)
 {
@@ -5091,7 +5088,6 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		mmu->invlpg(vcpu, gva, root_hpa);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_gva);
 
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
 {
@@ -5131,7 +5127,6 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
 	 * for them.
 	 */
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_invpcid_gva);
 
 void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
 		       int tdp_huge_page_level)
-- 
2.30.0.478.g8a0d178c01-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements
  2021-02-13  0:50 [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements Sean Christopherson
                   ` (13 preceding siblings ...)
  2021-02-13  0:50 ` [PATCH 14/14] KVM: x86/mmu: Remove a variety of unnecessary exports Sean Christopherson
@ 2021-02-17 22:50 ` Sean Christopherson
  2021-02-18 12:57 ` Paolo Bonzini
  15 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2021-02-17 22:50 UTC (permalink / raw)
  To: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

On Fri, Feb 12, 2021, Sean Christopherson wrote:
> Paolo, this is more or less ready, but on final read-through before
> sending I realized it would be a good idea to WARN during VM destruction
> if cpu_dirty_logging_count is non-zero.  I wanted to get you this before
> the 5.12 window opens in case you want the TDP MMU fixes for 5.12.  I'll
> do the above change and retest next week (note, Monday is a US holiday).

Verified cpu_dirty_logging_count does indeed hit zero during VM destruction.

Adding a WARN to KVM would require adding an arch hook to kvm_free_memslots();
otherwise the count will be non-zero if the VM is destroyed with dirty logging
active.  That doesn't seem worthwhile, so I'm not planning on pursuing a WARN
for the upstream code.
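
A minimal sketch of how that could look, for illustration only (the
decrement piggybacks on the existing kvm_arch_free_memslot(), while
kvm_arch_memslots_freed() is an invented hook that doesn't exist today):

	/* arch/x86/kvm/x86.c: drop the count as each slot is torn down. */
	void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
	{
		if (slot->flags & KVM_MEM_LOG_DIRTY_PAGES)
			kvm_mmu_update_cpu_dirty_logging(kvm, false);

		/* ...existing arch metadata teardown... */
	}

	/*
	 * Hypothetical hook, called from kvm_free_memslots() after the last
	 * slot is freed.  The WARN can't simply go in kvm_arch_destroy_vm(),
	 * as that runs before the memslots are torn down.
	 */
	void kvm_arch_memslots_freed(struct kvm *kvm)
	{
		WARN_ON_ONCE(kvm->arch.cpu_dirty_logging_count);
	}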

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/14] KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE pages
  2021-02-13  0:50 ` [PATCH 01/14] KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE pages Sean Christopherson
@ 2021-02-18 12:36   ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2021-02-18 12:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Makarand Sonare

On 13/02/21 01:50, Sean Christopherson wrote:
> Zap SPTEs that are backed by ZONE_DEVICE pages when zapping SPTEs to
> rebuild them as huge pages in the TDP MMU.  ZONE_DEVICE huge pages are
> managed differently than "regular" pages and are not compound pages.
> 
> Cc: Ben Gardon <bgardon@google.com>
> Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/mmu/tdp_mmu.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 71e100a5670f..3cc332ed099d 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1348,7 +1348,8 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
>   
>   		pfn = spte_to_pfn(iter.old_spte);
>   		if (kvm_is_reserved_pfn(pfn) ||
> -		    !PageTransCompoundMap(pfn_to_page(pfn)))
> +		    (!PageTransCompoundMap(pfn_to_page(pfn)) &&
> +		     !kvm_is_zone_device_pfn(pfn)))
>   			continue;
>   
>   		tdp_mmu_set_spte(kvm, &iter, 0);
> 

I added a note to the commit message that a similar check is found in 
kvm_mmu_zap_collapsible_spte.
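
For reference, that check reads roughly as follows (lightly trimmed; the
flush-with-range details are omitted):

	for_each_rmap_spte(rmap_head, &iter, sptep) {
		sp = sptep_to_sp(sptep);
		pfn = spte_to_pfn(*sptep);

		if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
		    (kvm_is_zone_device_pfn(pfn) ||
		     PageTransCompoundMap(pfn_to_page(pfn)))) {
			pte_list_remove(rmap_head, sptep);
			need_tlb_flush = 1;
			goto restart;
		}
	}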

Paolo


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 05/14] KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
  2021-02-13  0:50 ` [PATCH 05/14] KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs Sean Christopherson
@ 2021-02-18 12:43   ` Paolo Bonzini
  2021-02-18 16:23     ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2021-02-18 12:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Makarand Sonare

On 13/02/21 01:50, Sean Christopherson wrote:
> 
>  		pfn = spte_to_pfn(iter.old_spte);
>  		if (kvm_is_reserved_pfn(pfn) ||
> -		    (!PageTransCompoundMap(pfn_to_page(pfn)) &&
> -		     !kvm_is_zone_device_pfn(pfn)))
> +		    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
> +							    pfn, PG_LEVEL_NUM))
>  			continue;
>  


This changes the test to PageCompound.  Is it worth moving the change to 
patch 1?

Paolo


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 08/14] KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function
  2021-02-13  0:50 ` [PATCH 08/14] KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function Sean Christopherson
@ 2021-02-18 12:45   ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2021-02-18 12:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Makarand Sonare

On 13/02/21 01:50, Sean Christopherson wrote:
> Store the vendor-specific dirty log size in a variable; there's no need
> to wrap it in a function since the value is constant after
> hardware_setup() runs.

For now... :)

Paolo

> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/include/asm/kvm-x86-ops.h | 1 -
>   arch/x86/include/asm/kvm_host.h    | 2 +-
>   arch/x86/kvm/mmu/mmu.c             | 5 +----
>   arch/x86/kvm/vmx/vmx.c             | 9 ++-------
>   4 files changed, 4 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 355a2ab8fc09..28c07cc01474 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -97,7 +97,6 @@ KVM_X86_OP_NULL(slot_enable_log_dirty)
>   KVM_X86_OP_NULL(slot_disable_log_dirty)
>   KVM_X86_OP_NULL(flush_log_dirty)
>   KVM_X86_OP_NULL(enable_log_dirty_pt_masked)
> -KVM_X86_OP_NULL(cpu_dirty_log_size)
>   KVM_X86_OP_NULL(pre_block)
>   KVM_X86_OP_NULL(post_block)
>   KVM_X86_OP_NULL(vcpu_blocking)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 84499aad01a4..fb59933610d9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1294,7 +1294,7 @@ struct kvm_x86_ops {
>   	void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
>   					   struct kvm_memory_slot *slot,
>   					   gfn_t offset, unsigned long mask);
> -	int (*cpu_dirty_log_size)(void);
> +	int cpu_dirty_log_size;
>   
>   	/* pmu operations of sub-arch */
>   	const struct kvm_pmu_ops *pmu_ops;
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d5849a0e3de1..6c32e8e0f720 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1294,10 +1294,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>   
>   int kvm_cpu_dirty_log_size(void)
>   {
> -	if (kvm_x86_ops.cpu_dirty_log_size)
> -		return static_call(kvm_x86_cpu_dirty_log_size)();
> -
> -	return 0;
> +	return kvm_x86_ops.cpu_dirty_log_size;
>   }
>   
>   bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b47ed3f412ef..f843707dd7df 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7650,11 +7650,6 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
>   	return supported & BIT(bit);
>   }
>   
> -static int vmx_cpu_dirty_log_size(void)
> -{
> -	return enable_pml ? PML_ENTITY_NUM : 0;
> -}
> -
>   static struct kvm_x86_ops vmx_x86_ops __initdata = {
>   	.hardware_unsetup = hardware_unsetup,
>   
> @@ -7758,6 +7753,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>   	.slot_disable_log_dirty = vmx_slot_disable_log_dirty,
>   	.flush_log_dirty = vmx_flush_log_dirty,
>   	.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
> +	.cpu_dirty_log_size = PML_ENTITY_NUM,
>   
>   	.pre_block = vmx_pre_block,
>   	.post_block = vmx_post_block,
> @@ -7785,7 +7781,6 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>   
>   	.msr_filter_changed = vmx_msr_filter_changed,
>   	.complete_emulated_msr = kvm_complete_insn_gp,
> -	.cpu_dirty_log_size = vmx_cpu_dirty_log_size,
>   
>   	.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
>   };
> @@ -7907,7 +7902,7 @@ static __init int hardware_setup(void)
>   		vmx_x86_ops.slot_disable_log_dirty = NULL;
>   		vmx_x86_ops.flush_log_dirty = NULL;
>   		vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
> -		vmx_x86_ops.cpu_dirty_log_size = NULL;
> +		vmx_x86_ops.cpu_dirty_log_size = 0;
>   	}
>   
>   	if (!cpu_has_vmx_preemption_timer())
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/14] KVM: x86: Further clarify the logic and comments for toggling log dirty
  2021-02-13  0:50 ` [PATCH 10/14] KVM: x86: Further clarify the logic and comments for toggling log dirty Sean Christopherson
@ 2021-02-18 12:50   ` Paolo Bonzini
  2021-02-18 16:15     ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2021-02-18 12:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Makarand Sonare

On 13/02/21 01:50, Sean Christopherson wrote:
> 
> -	 * Nothing to do for RO slots or CREATE/MOVE/DELETE of a slot.
> -	 * See comments below.
> +	 * Nothing to do for RO slots (which can't be dirtied and can't be made
> +	 * writable) or CREATE/MOVE/DELETE of a slot.  See comments below.
>  	 */
>  	if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY))
>  		return;
>  
> +	/*
> +	 * READONLY and non-flags changes were filtered out above, and the only
> +	 * other flag is LOG_DIRTY_PAGES, i.e. something is wrong if dirty
> +	 * logging isn't being toggled on or off.
> +	 */
> +	if (WARN_ON_ONCE(!((old->flags ^ new->flags) & KVM_MEM_LOG_DIRTY_PAGES)))
> +		return;
> +

What about readonly -> readwrite changes?

Paolo


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements
  2021-02-13  0:50 [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements Sean Christopherson
                   ` (14 preceding siblings ...)
  2021-02-17 22:50 ` [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements Sean Christopherson
@ 2021-02-18 12:57 ` Paolo Bonzini
  15 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2021-02-18 12:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Makarand Sonare

On 13/02/21 01:50, Sean Christopherson wrote:
> Paolo, this is more or less ready, but on final read-through before
> sending I realized it would be a good idea to WARN during VM destruction
> if cpu_dirty_logging_count is non-zero.  I wanted to get you this before
> the 5.12 window opens in case you want the TDP MMU fixes for 5.12.  I'll
> do the above change and retest next week (note, Monday is a US holiday).
> 
> On to the code...
> 
> This started out as a small tweak to collapsible SPTE zapping in the TDP
> MMU, and ended up as a rather large overhaul of CPU dirty logging, a.k.a.
> PML.
> 
> Four main highlights:
> 
>    - Do a more precise check on whether or not a SPTE should be zapped to
>      rebuild it as a large page.
>    - Disable PML when running L2.  PML is fully emulated for L1 VMMs, thus
>      enabling PML in L2 can only hurt and never help.
>    - Drop the existing PML kvm_x86_ops.  They're basically trampolines into
>      the MMU, and IMO do far more harm than good.
>    - Turn on PML only when it's needed instead of setting all dirty bits to
>      soft disable PML.
> 
> What led me down the rabbit's hole of ripping out the existing PML
> kvm_x86_ops isn't really shown here.  Prior to incorporating Makarand's
> patch, which allowed for the wholesale remove of setting dirty bits,
> I spent a bunch of time poking around the "set dirty bits" code.  My
> original changes optimized that path to skip setting dirty bits in the
> nested MMU, since the nested MMU relies on write-protection and not PML.
> That in turn allowed the TDP MMU zapping to completely skip walking the
> rmaps, but doing so based on a bunch of callbacks was a twisted mess.
> 
> Happily, those patches got dropped in favor of nuking the code entirely.
> 
> Ran selftest and unit tests, and migrated actual VMs on AMD and Intel,
> with and without TDP MMU, and with and without EPT.  The AMD system I'm
> testing on infinite loops on the reset vector due to a #PF when NPT is
> disabled, so that didn't get tested.  That reproduces with kvm/next,
> I'll dig into it next week (no idea if it's a KVM or hardware issue).
> 
> For actual migration, I ran kvm-unit-tests in L1 along with stress to
> hammer memory, and verified migration was effectively blocked until the
> stress threads were killed (I didn't feel like figuring out how to
> throttle the VM).
> 
> Makarand Sonare (1):
>    KVM: VMX: Dynamically enable/disable PML based on memslot dirty
>      logging
> 
> Sean Christopherson (13):
>    KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE
>      pages
>    KVM: x86/mmu: Don't unnecessarily write-protect small pages in TDP MMU
>    KVM: x86/mmu: Split out max mapping level calculation to helper
>    KVM: x86/mmu: Pass the memslot to the rmap callbacks
>    KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
>    KVM: nVMX: Disable PML in hardware when running L2
>    KVM: x86/mmu: Expand on the comment in
>      kvm_vcpu_ad_need_write_protect()
>    KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function
>    KVM: x86: Move MMU's PML logic to common code
>    KVM: x86: Further clarify the logic and comments for toggling log
>      dirty
>    KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML
>    KVM: x86: Fold "write-protect large" use case into generic
>      write-protect
>    KVM: x86/mmu: Remove a variety of unnecessary exports
> 
>   arch/x86/include/asm/kvm-x86-ops.h |   6 +-
>   arch/x86/include/asm/kvm_host.h    |  36 +----
>   arch/x86/kvm/mmu/mmu.c             | 203 +++++++++--------------------
>   arch/x86/kvm/mmu/mmu_internal.h    |   7 +-
>   arch/x86/kvm/mmu/tdp_mmu.c         |  66 +---------
>   arch/x86/kvm/mmu/tdp_mmu.h         |   3 +-
>   arch/x86/kvm/vmx/nested.c          |  34 +++--
>   arch/x86/kvm/vmx/vmx.c             |  94 +++++--------
>   arch/x86/kvm/vmx/vmx.h             |   2 +
>   arch/x86/kvm/x86.c                 | 145 +++++++++++++--------
>   10 files changed, 230 insertions(+), 366 deletions(-)
> 

Queued 1-9 and 14 until you clarify my doubt about patch 10, thanks.

Paolo


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/14] KVM: x86: Further clarify the logic and comments for toggling log dirty
  2021-02-18 12:50   ` Paolo Bonzini
@ 2021-02-18 16:15     ` Sean Christopherson
  2021-02-18 16:56       ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2021-02-18 16:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Makarand Sonare

On Thu, Feb 18, 2021, Paolo Bonzini wrote:
> On 13/02/21 01:50, Sean Christopherson wrote:
> > 
> > -	 * Nothing to do for RO slots or CREATE/MOVE/DELETE of a slot.
> > -	 * See comments below.
> > +	 * Nothing to do for RO slots (which can't be dirtied and can't be made
> > +	 * writable) or CREATE/MOVE/DELETE of a slot.  See comments below.
> >  	 */
> >  	if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY))
> >  		return;
> > +	/*
> > +	 * READONLY and non-flags changes were filtered out above, and the only
> > +	 * other flag is LOG_DIRTY_PAGES, i.e. something is wrong if dirty
> > +	 * logging isn't being toggled on or off.
> > +	 */
> > +	if (WARN_ON_ONCE(!((old->flags ^ new->flags) & KVM_MEM_LOG_DIRTY_PAGES)))
> > +		return;
> > +
> 
> What about readonly -> readwrite changes?

Not allowed without first deleting the memslot.  See commit 75d61fbcf563 ("KVM:
set_memory_region: Disallow changing read-only attribute later").  RW->RO is
also not supported.

	if (!old.npages) {
		change = KVM_MR_CREATE;
		new.dirty_bitmap = NULL;
		memset(&new.arch, 0, sizeof(new.arch));
	} else { /* Modify an existing slot. */
		if ((new.userspace_addr != old.userspace_addr) ||
		    (new.npages != old.npages) ||
		    ((new.flags ^ old.flags) & KVM_MEM_READONLY))
			return -EINVAL;


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 05/14] KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
  2021-02-18 12:43   ` Paolo Bonzini
@ 2021-02-18 16:23     ` Sean Christopherson
  2021-02-18 22:30       ` Mike Kravetz
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2021-02-18 16:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Makarand Sonare

On Thu, Feb 18, 2021, Paolo Bonzini wrote:
> On 13/02/21 01:50, Sean Christopherson wrote:
> > 
> >  		pfn = spte_to_pfn(iter.old_spte);
> >  		if (kvm_is_reserved_pfn(pfn) ||
> > -		    (!PageTransCompoundMap(pfn_to_page(pfn)) &&
> > -		     !kvm_is_zone_device_pfn(pfn)))
> > +		    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
> > +							    pfn, PG_LEVEL_NUM))
> >  			continue;
> 
> 
> This changes the test to PageCompound.  Is it worth moving the change to
> patch 1?

Yes?  I originally did that in a separate patch, then changed my mind.

If PageTransCompoundMap() also detects HugeTLB pages, then it is the "better"
option as it checks that the page is actually mapped huge.  I dropped the change
because PageTransCompound() is just a wrapper around PageCompound(), and so I
assumed PageTransCompoundMap() would detect HugeTLB pages, too.  I'm not so sure
about that after rereading the code, yet again.
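
For reference, the wrapper in question is, roughly (from
include/linux/page-flags.h, under CONFIG_TRANSPARENT_HUGEPAGE):

	static inline int PageTransCompound(struct page *page)
	{
		return PageCompound(page);
	}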

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/14] KVM: x86: Further clarify the logic and comments for toggling log dirty
  2021-02-18 16:15     ` Sean Christopherson
@ 2021-02-18 16:56       ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2021-02-18 16:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Makarand Sonare

On 18/02/21 17:15, Sean Christopherson wrote:
> On Thu, Feb 18, 2021, Paolo Bonzini wrote:
>> On 13/02/21 01:50, Sean Christopherson wrote:
>>>
>>> -	 * Nothing to do for RO slots or CREATE/MOVE/DELETE of a slot.
>>> -	 * See comments below.
>>> +	 * Nothing to do for RO slots (which can't be dirtied and can't be made
>>> +	 * writable) or CREATE/MOVE/DELETE of a slot.  See comments below.
>>>   	 */
>>>   	if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY))
>>>   		return;
>>> +	/*
>>> +	 * READONLY and non-flags changes were filtered out above, and the only
>>> +	 * other flag is LOG_DIRTY_PAGES, i.e. something is wrong if dirty
>>> +	 * logging isn't being toggled on or off.
>>> +	 */
>>> +	if (WARN_ON_ONCE(!((old->flags ^ new->flags) & KVM_MEM_LOG_DIRTY_PAGES)))
>>> +		return;
>>> +
>>
>> What about readonly -> readwrite changes?
> 
> Not allowed without first deleting the memslot.  See commit 75d61fbcf563 ("KVM:
> set_memory_region: Disallow changing read-only attribute later").  RW->RO is
> also not supported.
> 
> 	if (!old.npages) {
> 		change = KVM_MR_CREATE;
> 		new.dirty_bitmap = NULL;
> 		memset(&new.arch, 0, sizeof(new.arch));
> 	} else { /* Modify an existing slot. */
> 		if ((new.userspace_addr != old.userspace_addr) ||
> 		    (new.npages != old.npages) ||
> 		    ((new.flags ^ old.flags) & KVM_MEM_READONLY))
> 			return -EINVAL;
> 

Right you are, thanks.  Then the warning would catch this quickly.  I 
queued 10 and 11 too, and will reply on 12 now that I looked at it more 
closely.

Paolo


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 12/14] KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML
  2021-02-13  0:50 ` [PATCH 12/14] KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML Sean Christopherson
@ 2021-02-18 17:08   ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2021-02-18 17:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Makarand Sonare

On 13/02/21 01:50, Sean Christopherson wrote:
> This means that spurious PML entries will be created for memslots with 
> dirty logging disabled if at least one other memslot has dirty logging 
> enabled, but for all known use cases, dirty logging is a global VMM 
> control.

This is not true.  For example, QEMU uses dirty logging to track changes 
to the framebuffer.

However, what you're saying below is true: after an MR_CREATE there will 
be no shadow pages, and when they are created with mmu_set_spte they 
will not have the dirty bits set.  So there's really no change here for 
the case of only some memslots having dirty logging enabled.  Queued 12 
and 13 as well then!

Paolo

> Furthermore, spurious PML entries are already possible since 
> dirty bits are set only when dirty logging is turned off, i.e. 
> memslots that are never dirty logged will have dirty bits cleared.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 05/14] KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
  2021-02-18 16:23     ` Sean Christopherson
@ 2021-02-18 22:30       ` Mike Kravetz
  2021-02-19  1:31         ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Mike Kravetz @ 2021-02-18 22:30 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon, Makarand Sonare

On 2/18/21 8:23 AM, Sean Christopherson wrote:
> On Thu, Feb 18, 2021, Paolo Bonzini wrote:
>> On 13/02/21 01:50, Sean Christopherson wrote:
>>>
>>>  		pfn = spte_to_pfn(iter.old_spte);
>>>  		if (kvm_is_reserved_pfn(pfn) ||
>>> -		    (!PageTransCompoundMap(pfn_to_page(pfn)) &&
>>> -		     !kvm_is_zone_device_pfn(pfn)))
>>> +		    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
>>> +							    pfn, PG_LEVEL_NUM))
>>>  			continue;
>>
>>
>> This changes the test to PageCompound.  Is it worth moving the change to
>> patch 1?
> 
> Yes?  I originally did that in a separate patch, then changed my mind.
> 
> If PageTransCompoundMap() also detects HugeTLB pages, then it is the "better"
> option as it checks that the page is actually mapped huge.  I dropped the change
> because PageTransCompound() is just a wrapper around PageCompound(), and so I
> assumed PageTransCompoundMap() would detect HugeTLB pages, too.  I'm not so sure
> about that after rereading the code, yet again.

I have not followed this thread, but HugeTLB hit my mail filter and I can
help with this question.

No, PageTransCompoundMap() will not detect HugeTLB.  hugetlb pages do not
use the compound_mapcount_ptr field.  So, that final check/return in
PageTransCompoundMap() will always be false.
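
For reference, the final check in question, paraphrased from
include/linux/page-flags.h (~v5.11):

	static inline int PageTransCompoundMap(struct page *page)
	{
		struct page *head;

		if (!PageTransCompound(page))
			return 0;

		if (PageAnon(page))
			return atomic_read(&page->_mapcount) < 0;

		head = compound_head(page);
		/* File THP is PMD mapped if @page is PMD mapped */
		return atomic_read(&page->_mapcount) ==
		       atomic_read(compound_mapcount_ptr(head));
	}

Since hugetlb doesn't maintain compound_mapcount, that final equality
never holds for a hugetlb page.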
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 05/14] KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
  2021-02-18 22:30       ` Mike Kravetz
@ 2021-02-19  1:31         ` Sean Christopherson
  0 siblings, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2021-02-19  1:31 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon, Makarand Sonare

On Thu, Feb 18, 2021, Mike Kravetz wrote:
> On 2/18/21 8:23 AM, Sean Christopherson wrote:
> > On Thu, Feb 18, 2021, Paolo Bonzini wrote:
> >> On 13/02/21 01:50, Sean Christopherson wrote:
> >>>
> >>>  		pfn = spte_to_pfn(iter.old_spte);
> >>>  		if (kvm_is_reserved_pfn(pfn) ||
> >>> -		    (!PageTransCompoundMap(pfn_to_page(pfn)) &&
> >>> -		     !kvm_is_zone_device_pfn(pfn)))
> >>> +		    iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
> >>> +							    pfn, PG_LEVEL_NUM))
> >>>  			continue;
> >>
> >>
> >> This changes the test to PageCompound.  Is it worth moving the change to
> >> patch 1?
> > 
> > Yes?  I originally did that in a separate patch, then changed my mind.
> > 
> > If PageTransCompoundMap() also detects HugeTLB pages, then it is the "better"
> > option as it checks that the page is actually mapped huge.  I dropped the change
> > because PageTransCompound() is just a wrapper around PageCompound(), and so I
> > assumed PageTransCompoundMap() would detect HugeTLB pages, too.  I'm not so sure
> > about that after rereading the code, yet again.
> 
> I have not followed this thread, but HugeTLB hit my mail filter and I can
> help with this question.
> 
> No, PageTransCompoundMap() will not detect HugeTLB.  hugetlb pages do not
> use the compound_mapcount_ptr field.  So, that final check/return in
> PageTransCompoundMap() will always be false.

Thanks Mike!

Paolo, I agree it makes sense to switch to PageCompound in the earlier patch, in
case this one needs to be reverted.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-02-19  1:32 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-13  0:50 [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements Sean Christopherson
2021-02-13  0:50 ` [PATCH 01/14] KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE pages Sean Christopherson
2021-02-18 12:36   ` Paolo Bonzini
2021-02-13  0:50 ` [PATCH 02/14] KVM: x86/mmu: Don't unnecessarily write-protect small pages in TDP MMU Sean Christopherson
2021-02-13  0:50 ` [PATCH 03/14] KVM: x86/mmu: Split out max mapping level calculation to helper Sean Christopherson
2021-02-13  0:50 ` [PATCH 04/14] KVM: x86/mmu: Pass the memslot to the rmap callbacks Sean Christopherson
2021-02-13  0:50 ` [PATCH 05/14] KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs Sean Christopherson
2021-02-18 12:43   ` Paolo Bonzini
2021-02-18 16:23     ` Sean Christopherson
2021-02-18 22:30       ` Mike Kravetz
2021-02-19  1:31         ` Sean Christopherson
2021-02-13  0:50 ` [PATCH 06/14] KVM: nVMX: Disable PML in hardware when running L2 Sean Christopherson
2021-02-13  0:50 ` [PATCH 07/14] KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect() Sean Christopherson
2021-02-13  0:50 ` [PATCH 08/14] KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function Sean Christopherson
2021-02-18 12:45   ` Paolo Bonzini
2021-02-13  0:50 ` [PATCH 09/14] KVM: x86: Move MMU's PML logic to common code Sean Christopherson
2021-02-13  0:50 ` [PATCH 10/14] KVM: x86: Further clarify the logic and comments for toggling log dirty Sean Christopherson
2021-02-18 12:50   ` Paolo Bonzini
2021-02-18 16:15     ` Sean Christopherson
2021-02-18 16:56       ` Paolo Bonzini
2021-02-13  0:50 ` [PATCH 11/14] KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging Sean Christopherson
2021-02-13  0:50 ` [PATCH 12/14] KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML Sean Christopherson
2021-02-18 17:08   ` Paolo Bonzini
2021-02-13  0:50 ` [PATCH 13/14] KVM: x86: Fold "write-protect large" use case into generic write-protect Sean Christopherson
2021-02-13  0:50 ` [PATCH 14/14] KVM: x86/mmu: Remove a variety of unnecessary exports Sean Christopherson
2021-02-17 22:50 ` [PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements Sean Christopherson
2021-02-18 12:57 ` Paolo Bonzini
