* [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Jay Zhou @ 2020-02-20  4:28 UTC
  To: kvm
  Cc: pbonzini, peterx, wangxinxin.wang, weidong.huang, jianjay.zhou,
	sean.j.christopherson, liu.jinsong

It could take kvm->mmu_lock for an extended period of time when
enabling dirty log for the first time. The main cost is to clear
all the D-bits of last level SPTEs. This situation can benefit from
manual dirty log protect as well, which can reduce the mmu_lock
time taken. The sequence is like this:

1. Initialize all the bits of the dirty bitmap to 1 when enabling
   dirty log for the first time
2. Only write protect the huge pages
3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
4. KVM_CLEAR_DIRTY_LOG will clear the D-bit of each leaf level SPTE
   gradually in small chunks (a userspace sketch of this flow follows)
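
For illustration, the corresponding userspace flow is roughly the
following (a sketch only, not part of this patch; it assumes vm_fd,
slot and npages are set up elsewhere, uses the two flag bits defined
below, and omits error handling):

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
		.args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT |
			   KVM_DIRTY_LOG_INITIALLY_SET,
	};
	unsigned long *bitmap = calloc(npages / 8, 1);
	uint64_t first, chunk = 65536;	/* KVM requires 64-page alignment */

	/*
	 * Select the mode first; steps 1 and 2 then happen inside KVM
	 * when KVM_MEM_LOG_DIRTY_PAGES is set on the memslot.
	 */
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	struct kvm_dirty_log log = { .slot = slot, .dirty_bitmap = bitmap };
	ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);	/* step 3: all pages read back dirty */

	for (first = 0; first < npages; first += chunk) {
		struct kvm_clear_dirty_log clear = {
			.slot = slot,
			.first_page = first,
			.num_pages = first + chunk <= npages ?
				     chunk : npages - first,
			/* bit 0 of this buffer corresponds to first_page */
			.dirty_bitmap = (char *)bitmap + first / 8,
		};
		ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);	/* step 4 */
	}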

Under an Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
I did some tests with a 128G Windows VM and measured the time taken
by memory_global_dirty_log_start; here are the numbers:

VM Size        Before    After optimization
128G           460ms     10ms

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
---
v2:
  * add new bit to KVM_ENABLE_CAP for KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 [Paolo]
  * support non-PML path [Peter]
  * delete the unnecessary ifdef and make the initialization of bitmap
    more clear [Sean]
  * document the new bits and tweak the testcase

 Documentation/virt/kvm/api.rst               | 23 +++++++++++++++++------
 arch/x86/kvm/mmu/mmu.c                       |  8 ++++++--
 arch/x86/kvm/vmx/vmx.c                       |  3 ++-
 include/linux/kvm_host.h                     |  7 ++++++-
 tools/testing/selftests/kvm/dirty_log_test.c |  3 ++-
 virt/kvm/kvm_main.c                          | 25 ++++++++++++++++++-------
 6 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 97a72a5..1afd310 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1267,9 +1267,11 @@ pages in the host.
 The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
 KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
 writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
-use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
-to make a new slot read-only.  In this case, writes to this memory will be
-posted to userspace as KVM_EXIT_MMIO exits.
+use it.  It will be different if the KVM_DIRTY_LOG_INITIALLY_SET flag of
+KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is set.  For more information, see the
+description of the capability.  The latter can be set, if KVM_CAP_READONLY_MEM
+capability allows it, to make a new slot read-only.  In this case, writes to
+this memory will be posted to userspace as KVM_EXIT_MMIO exits.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region are automatically reflected into the guest.  For example, an
@@ -5704,10 +5706,19 @@ and injected exceptions.
 :Architectures: x86, arm, arm64, mips
 :Parameters: args[0] whether feature should be enabled or not
 
-With this capability enabled, KVM_GET_DIRTY_LOG will not automatically
+Valid flags are::
+
+  #define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
+  #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
+
+With KVM_DIRTY_LOG_MANUAL_PROTECT set, KVM_GET_DIRTY_LOG will not automatically
 clear and write-protect all pages that are returned as dirty.
 Rather, userspace will have to do this operation separately using
-KVM_CLEAR_DIRTY_LOG.
+KVM_CLEAR_DIRTY_LOG.  With KVM_DIRTY_LOG_INITIALLY_SET set, all the bits of
+the dirty bitmap will be initialized to 1 when created, dirty logging will be
+enabled gradually in small chunks using KVM_CLEAR_DIRTY_LOG ioctl.  However,
+the KVM_DIRTY_LOG_INITIALLY_SET depends on KVM_DIRTY_LOG_MANUAL_PROTECT, it
+can not be set individually and supports x86 only for now.
 
 At the cost of a slightly more complicated operation, this provides better
 scalability and responsiveness for two reasons.  First,
@@ -5716,7 +5727,7 @@ than requiring to sync a full memslot; this ensures that KVM does not
 take spinlocks for an extended period of time.  Second, in some cases a
 large amount of time can pass between a call to KVM_GET_DIRTY_LOG and
 userspace actually using the data in the page.  Pages can be modified
-during this time, which is inefficint for both the guest and userspace:
+during this time, which is inefficient for both the guest and userspace:
 the guest will incur a higher penalty due to write protection faults,
 while userspace can see false reports of dirty pages.  Manual reprotection
 helps reducing this time, improving guest performance and reducing the
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 87e9ba2..f9c120e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5865,8 +5865,12 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	bool flush;
 
 	spin_lock(&kvm->mmu_lock);
-	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
-				      false);
+	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
+		flush = slot_handle_large_level(kvm, memslot,
+						slot_rmap_write_protect, false);
+	else
+		flush = slot_handle_all_level(kvm, memslot,
+						slot_rmap_write_protect, false);
 	spin_unlock(&kvm->mmu_lock);
 
 	/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3be25ec..fcc585a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7201,7 +7201,8 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_slot_enable_log_dirty(struct kvm *kvm,
 				     struct kvm_memory_slot *slot)
 {
-	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
+	if (!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET))
+		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
 	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e89eb67..a555b52 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -39,6 +39,11 @@
 #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
 #endif
 
+#define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
+#define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
+#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT | \
+				KVM_DIRTY_LOG_INITIALLY_SET)
+
 /*
  * The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
  * in kvm, other bits are visible for userspace which are defined in
@@ -493,7 +498,7 @@ struct kvm {
 #endif
 	long tlbs_dirty;
 	struct list_head devices;
-	bool manual_dirty_log_protect;
+	u64 manual_dirty_log_protect;
 	struct dentry *debugfs_dentry;
 	struct kvm_stat_data **debugfs_stat_data;
 	struct srcu_struct srcu;
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 5614222..2a493c1 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
 	host_bmap_track = bitmap_alloc(host_num_pages);
 
 #ifdef USE_CLEAR_DIRTY_LOG
+	int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
 	struct kvm_enable_cap cap = {};
 
 	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
-	cap.args[0] = 1;
+	cap.args[0] = ret;
 	vm_enable_cap(vm, &cap);
 #endif
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 70f03ce..f2631d0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
  * Allocation size is twice as large as the actual dirty bitmap size.
  * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
  */
-static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
 	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
 
@@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-		if (kvm_create_dirty_bitmap(&new) < 0)
+		if (kvm_alloc_dirty_bitmap(&new))
 			goto out_free;
+
+		if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
+			bitmap_set(new.dirty_bitmap, 0, new.npages);
 	}
 
 	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
@@ -1255,7 +1258,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 	*flush = false;
-	if (kvm->manual_dirty_log_protect) {
+	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_MANUAL_PROTECT) {
 		/*
 		 * Unlike kvm_get_dirty_log, we always return false in *flush,
 		 * because no flush is needed until KVM_CLEAR_DIRTY_LOG.  There
@@ -3310,9 +3313,6 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_IOEVENTFD_ANY_LENGTH:
 	case KVM_CAP_CHECK_EXTENSION_VM:
 	case KVM_CAP_ENABLE_CAP_VM:
-#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
-	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
-#endif
 		return 1;
 #ifdef CONFIG_KVM_MMIO
 	case KVM_CAP_COALESCED_MMIO:
@@ -3320,6 +3320,10 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_COALESCED_PIO:
 		return 1;
 #endif
+#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
+		return KVM_DIRTY_LOG_MANUAL_CAPS;
+#endif
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
 	case KVM_CAP_IRQ_ROUTING:
 		return KVM_MAX_IRQ_ROUTES;
@@ -3348,7 +3352,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	switch (cap->cap) {
 #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
 	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
-		if (cap->flags || (cap->args[0] & ~1))
+		if (cap->flags ||
+		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
+		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
+		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
+		     * set individually
+		     */
+		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
+			KVM_DIRTY_LOG_INITIALLY_SET))
 			return -EINVAL;
 		kvm->manual_dirty_log_protect = cap->args[0];
 		return 0;
-- 
1.8.3.1




* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Peter Xu @ 2020-02-20 19:17 UTC
  To: Jay Zhou
  Cc: kvm, pbonzini, wangxinxin.wang, weidong.huang,
	sean.j.christopherson, liu.jinsong

On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> It could take kvm->mmu_lock for an extended period of time when
> enabling dirty log for the first time. The main cost is to clear
> all the D-bits of last level SPTEs. This situation can benefit from
> manual dirty log protect as well, which can reduce the mmu_lock
> time taken. The sequence is like this:
> 
> 1. Initialize all the bits of the dirty bitmap to 1 when enabling
>    dirty log for the first time
> 2. Only write protect the huge pages
> 3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
> 4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
>    SPTEs gradually in small chunks
> 
> Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
> I did some tests with a 128G windows VM and counted the time taken
> of memory_global_dirty_log_start, here is the numbers:
> 
> VM Size        Before    After optimization
> 128G           460ms     10ms
> 
> Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
> ---
> v2:
>   * add new bit to KVM_ENABLE_CAP for KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 [Paolo]
>   * support non-PML path [Peter]
>   * delete the unnecessary ifdef and make the initialization of bitmap
>     more clear [Sean]
>   * document the new bits and tweak the testcase
> 
>  Documentation/virt/kvm/api.rst               | 23 +++++++++++++++++------
>  arch/x86/kvm/mmu/mmu.c                       |  8 ++++++--
>  arch/x86/kvm/vmx/vmx.c                       |  3 ++-
>  include/linux/kvm_host.h                     |  7 ++++++-
>  tools/testing/selftests/kvm/dirty_log_test.c |  3 ++-
>  virt/kvm/kvm_main.c                          | 25 ++++++++++++++++++-------
>  6 files changed, 51 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 97a72a5..1afd310 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1267,9 +1267,11 @@ pages in the host.
>  The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
>  KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
>  writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
> -use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
> -to make a new slot read-only.  In this case, writes to this memory will be
> -posted to userspace as KVM_EXIT_MMIO exits.
> +use it.  It will be different if the KVM_DIRTY_LOG_INITIALLY_SET flag of
> +KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is set.  For more information, see the
> +description of the capability.  The latter can be set, if KVM_CAP_READONLY_MEM
> +capability allows it, to make a new slot read-only.  In this case, writes to
> +this memory will be posted to userspace as KVM_EXIT_MMIO exits.

Not sure about others, but my own preference is that we keep this part
untouched...  The changed document could be even more confusing to me...

>  
>  When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
>  the memory region are automatically reflected into the guest.  For example, an
> @@ -5704,10 +5706,19 @@ and injected exceptions.
>  :Architectures: x86, arm, arm64, mips
>  :Parameters: args[0] whether feature should be enabled or not
>  
> -With this capability enabled, KVM_GET_DIRTY_LOG will not automatically
> +Valid flags are::
> +
> +  #define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
> +  #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
> +
> +With KVM_DIRTY_LOG_MANUAL_PROTECT set, KVM_GET_DIRTY_LOG will not automatically
>  clear and write-protect all pages that are returned as dirty.
>  Rather, userspace will have to do this operation separately using
> -KVM_CLEAR_DIRTY_LOG.
> +KVM_CLEAR_DIRTY_LOG.  With KVM_DIRTY_LOG_INITIALLY_SET set, all the bits of
> +the dirty bitmap will be initialized to 1 when created, dirty logging will be
> +enabled gradually in small chunks using KVM_CLEAR_DIRTY_LOG ioctl.  However,
> +the KVM_DIRTY_LOG_INITIALLY_SET depends on KVM_DIRTY_LOG_MANUAL_PROTECT, it
> +can not be set individually and supports x86 only for now.

Need s/KVM_DIRTY_LOG_MANUAL_PROTECT/KVM_DIRTY_LOG_MANUAL_PROTECT2/?

>  
>  At the cost of a slightly more complicated operation, this provides better
>  scalability and responsiveness for two reasons.  First,
> @@ -5716,7 +5727,7 @@ than requiring to sync a full memslot; this ensures that KVM does not
>  take spinlocks for an extended period of time.  Second, in some cases a
>  large amount of time can pass between a call to KVM_GET_DIRTY_LOG and
>  userspace actually using the data in the page.  Pages can be modified
> -during this time, which is inefficint for both the guest and userspace:
> +during this time, which is inefficient for both the guest and userspace:
>  the guest will incur a higher penalty due to write protection faults,
>  while userspace can see false reports of dirty pages.  Manual reprotection
>  helps reducing this time, improving guest performance and reducing the
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 87e9ba2..f9c120e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5865,8 +5865,12 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
>  	bool flush;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> -				      false);
> +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> +		flush = slot_handle_large_level(kvm, memslot,
> +						slot_rmap_write_protect, false);
> +	else
> +		flush = slot_handle_all_level(kvm, memslot,
> +						slot_rmap_write_protect, false);
>  	spin_unlock(&kvm->mmu_lock);
>  
>  	/*
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 3be25ec..fcc585a 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7201,7 +7201,8 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>  static void vmx_slot_enable_log_dirty(struct kvm *kvm,
>  				     struct kvm_memory_slot *slot)
>  {
> -	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> +	if (!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET))
> +		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
>  	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
>  }
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e89eb67..a555b52 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -39,6 +39,11 @@
>  #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
>  #endif
>  
> +#define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
> +#define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
> +#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT | \
> +				KVM_DIRTY_LOG_INITIALLY_SET)
> +
>  /*
>   * The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
>   * in kvm, other bits are visible for userspace which are defined in
> @@ -493,7 +498,7 @@ struct kvm {
>  #endif
>  	long tlbs_dirty;
>  	struct list_head devices;
> -	bool manual_dirty_log_protect;
> +	u64 manual_dirty_log_protect;
>  	struct dentry *debugfs_dentry;
>  	struct kvm_stat_data **debugfs_stat_data;
>  	struct srcu_struct srcu;
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> index 5614222..2a493c1 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>  	host_bmap_track = bitmap_alloc(host_num_pages);
>  
>  #ifdef USE_CLEAR_DIRTY_LOG
> +	int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
>  	struct kvm_enable_cap cap = {};
>  
>  	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
> -	cap.args[0] = 1;
> +	cap.args[0] = ret;

You enabled the initial-all-set bit but didn't really check it, so it
doesn't help much from the testcase pov...  I'd suggest you drop this
change; you can work on top of it after this patch is accepted.

(Not to mention the original test actually verified that we don't
 break anything, which seems good..)

>  	vm_enable_cap(vm, &cap);
>  #endif
>  
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 70f03ce..f2631d0 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
>   * Allocation size is twice as large as the actual dirty bitmap size.
>   * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
>   */
> -static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> +static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)

This change seems irrelevant..

>  {
>  	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
>  
> @@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  
>  	/* Allocate page dirty bitmap if needed */
>  	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
> -		if (kvm_create_dirty_bitmap(&new) < 0)
> +		if (kvm_alloc_dirty_bitmap(&new))

Same here.

>  			goto out_free;
> +
> +		if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)

(Maybe time to introduce a helper to shorten this check. :)
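
For instance, a minimal sketch of such a helper (the name is only
illustrative):

	static inline bool kvm_dirty_log_initially_set(struct kvm *kvm)
	{
		return !!(kvm->manual_dirty_log_protect &
			  KVM_DIRTY_LOG_INITIALLY_SET);
	}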

> +			bitmap_set(new.dirty_bitmap, 0, new.npages);
>  	}
>  
>  	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> @@ -1255,7 +1258,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
>  
>  	n = kvm_dirty_bitmap_bytes(memslot);
>  	*flush = false;
> -	if (kvm->manual_dirty_log_protect) {
> +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_MANUAL_PROTECT) {

Can also introduce a helper for this too.

Side note: logically this line doesn't need a change, because bit 1
depends on bit 0, so if (kvm->manual_dirty_log_protect) is true then
bit 0 must be set.

>  		/*
>  		 * Unlike kvm_get_dirty_log, we always return false in *flush,
>  		 * because no flush is needed until KVM_CLEAR_DIRTY_LOG.  There
> @@ -3310,9 +3313,6 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  	case KVM_CAP_IOEVENTFD_ANY_LENGTH:
>  	case KVM_CAP_CHECK_EXTENSION_VM:
>  	case KVM_CAP_ENABLE_CAP_VM:
> -#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> -	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> -#endif
>  		return 1;
>  #ifdef CONFIG_KVM_MMIO
>  	case KVM_CAP_COALESCED_MMIO:
> @@ -3320,6 +3320,10 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  	case KVM_CAP_COALESCED_PIO:
>  		return 1;
>  #endif
> +#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> +	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> +		return KVM_DIRTY_LOG_MANUAL_CAPS;
> +#endif
>  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>  	case KVM_CAP_IRQ_ROUTING:
>  		return KVM_MAX_IRQ_ROUTES;
> @@ -3348,7 +3352,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  	switch (cap->cap) {
>  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
>  	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> -		if (cap->flags || (cap->args[0] & ~1))
> +		if (cap->flags ||
> +		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> +		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
> +		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
> +		     * set individually
> +		     */
> +		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> +			KVM_DIRTY_LOG_INITIALLY_SET))

How about something easier to read? :)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 70f03ce0e5c1..9dfbab2a9929 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3348,7 +3348,10 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
        switch (cap->cap) {
 #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
        case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
-               if (cap->flags || (cap->args[0] & ~1))
+               if (cap->flags || (cap->args[0] & ~3))
+                       return -EINVAL;
+               /* Allow 00, 01, and 11. */
+               if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
                        return -EINVAL;
                kvm->manual_dirty_log_protect = cap->args[0];
                return 0;

Otherwise it looks good to me!

Thanks,

-- 
Peter Xu



* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Sean Christopherson @ 2020-02-20 19:23 UTC
  To: Peter Xu
  Cc: Jay Zhou, kvm, pbonzini, wangxinxin.wang, weidong.huang, liu.jinsong

On Thu, Feb 20, 2020 at 02:17:06PM -0500, Peter Xu wrote:
> On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > @@ -3348,7 +3352,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >  	switch (cap->cap) {
> >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> >  	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > -		if (cap->flags || (cap->args[0] & ~1))
> > +		if (cap->flags ||
> > +		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> > +		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
> > +		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
> > +		     * set individually
> > +		     */
> > +		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> > +			KVM_DIRTY_LOG_INITIALLY_SET))
> 
> How about something easier to read? :)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 70f03ce0e5c1..9dfbab2a9929 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3348,7 +3348,10 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>         switch (cap->cap) {
>  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
>         case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> -               if (cap->flags || (cap->args[0] & ~1))
> +               if (cap->flags || (cap->args[0] & ~3))
> +                       return -EINVAL;
> +               /* Allow 00, 01, and 11. */
> +               if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
>                         return -EINVAL;

Oof, "easier" is subjective :-)  How about this?

	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2: {
		u64 allowed_options = KVM_DIRTY_LOG_MANUAL_PROTECT;

		if (cap->args[0] & KVM_DIRTY_LOG_MANUAL_PROTECT)
			allowed_options = KVM_DIRTY_LOG_INITIALLY_SET;

		if (cap->flags || (cap->args[0] & ~allowed_options))
			return -EINVAL;
		kvm->manual_dirty_log_protect = cap->args[0];
		return 0;
	}

>                 kvm->manual_dirty_log_protect = cap->args[0];
>                 return 0;
> 
> Otherwise it looks good to me!
> 
> Thanks,
> 
> -- 
> Peter Xu
> 


* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Peter Xu @ 2020-02-20 19:28 UTC
  To: Jay Zhou
  Cc: kvm, pbonzini, wangxinxin.wang, weidong.huang,
	sean.j.christopherson, liu.jinsong

On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> @@ -5865,8 +5865,12 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
>  	bool flush;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> -				      false);
> +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> +		flush = slot_handle_large_level(kvm, memslot,
> +						slot_rmap_write_protect, false);
> +	else
> +		flush = slot_handle_all_level(kvm, memslot,
> +						slot_rmap_write_protect, false);

Another extra comment:

I think we should still keep the old behavior for KVM_MEM_READONLY (in
kvm_mmu_slot_apply_flags()) for this...  Say, instead of doing this,
maybe we want kvm_mmu_slot_remove_write_access() to take a new
parameter to decide down to which level we do the wr-protect.

Thanks,

>  	spin_unlock(&kvm->mmu_lock);
>  
>  	/*

-- 
Peter Xu



* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Peter Xu @ 2020-02-20 19:42 UTC
  To: Sean Christopherson
  Cc: Jay Zhou, kvm, pbonzini, wangxinxin.wang, weidong.huang, liu.jinsong

On Thu, Feb 20, 2020 at 11:23:50AM -0800, Sean Christopherson wrote:
> On Thu, Feb 20, 2020 at 02:17:06PM -0500, Peter Xu wrote:
> > On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > > @@ -3348,7 +3352,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > >  	switch (cap->cap) {
> > >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > >  	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > > -		if (cap->flags || (cap->args[0] & ~1))
> > > +		if (cap->flags ||
> > > +		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> > > +		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
> > > +		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
> > > +		     * set individually
> > > +		     */
> > > +		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> > > +			KVM_DIRTY_LOG_INITIALLY_SET))
> > 
> > How about something easier to read? :)
> > 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 70f03ce0e5c1..9dfbab2a9929 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3348,7 +3348,10 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >         switch (cap->cap) {
> >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> >         case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > -               if (cap->flags || (cap->args[0] & ~1))
> > +               if (cap->flags || (cap->args[0] & ~3))
> > +                       return -EINVAL;
> > +               /* Allow 00, 01, and 11. */
> > +               if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
> >                         return -EINVAL;
> 
> Oof, "easier" is subjective :-)  How about this?
> 
> 	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2: {
> 		u64 allowed_options = KVM_DIRTY_LOG_MANUAL_PROTECT;
> 
> 		if (cap->args[0] & KVM_DIRTY_LOG_MANUAL_PROTECT)
> 			allowed_options = KVM_DIRTY_LOG_INITIALLY_SET;
> 
> 		if (cap->flags || (cap->args[0] & ~allowed_options))
> 			return -EINVAL;
> 		kvm->manual_dirty_log_protect = cap->args[0];
> 		return 0;
> 	}

Fine by me! (But I won't tell I still prefer mine ;)

-- 
Peter Xu



* RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Zhoujian (jay) @ 2020-02-21  9:31 UTC
  To: Peter Xu
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 3:17 AM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>; Huangweidong (C)
> <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> (Paul) <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> 
> On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > It could take kvm->mmu_lock for an extended period of time when
> > enabling dirty log for the first time. The main cost is to clear all
> > the D-bits of last level SPTEs. This situation can benefit from manual
> > dirty log protect as well, which can reduce the mmu_lock time taken.
> > The sequence is like this:
> >
> > 1. Initialize all the bits of the dirty bitmap to 1 when enabling
> >    dirty log for the first time
> > 2. Only write protect the huge pages
> > 3. KVM_GET_DIRTY_LOG returns the dirty bitmap info 4.
> > KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
> >    SPTEs gradually in small chunks
> >
> > Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment, I did
> > some tests with a 128G windows VM and counted the time taken of
> > memory_global_dirty_log_start, here is the numbers:
> >
> > VM Size        Before    After optimization
> > 128G           460ms     10ms
> >
> > Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
> > ---
> > v2:
> >   * add new bit to KVM_ENABLE_CAP for
> KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 [Paolo]
> >   * support non-PML path [Peter]
> >   * delete the unnecessary ifdef and make the initialization of bitmap
> >     more clear [Sean]
> >   * document the new bits and tweak the testcase
> >
> >  Documentation/virt/kvm/api.rst               | 23
> +++++++++++++++++------
> >  arch/x86/kvm/mmu/mmu.c                       |  8 ++++++--
> >  arch/x86/kvm/vmx/vmx.c                       |  3 ++-
> >  include/linux/kvm_host.h                     |  7 ++++++-
> >  tools/testing/selftests/kvm/dirty_log_test.c |  3 ++-
> >  virt/kvm/kvm_main.c                          | 25
> ++++++++++++++++++-------
> >  6 files changed, 51 insertions(+), 18 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst
> > b/Documentation/virt/kvm/api.rst index 97a72a5..1afd310 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -1267,9 +1267,11 @@ pages in the host.
> >  The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
> > KVM_MEM_READONLY.  The former can be set to instruct KVM to keep
> track
> > of  writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to
> > know how to -use it.  The latter can be set, if KVM_CAP_READONLY_MEM
> > capability allows it, -to make a new slot read-only.  In this case,
> > writes to this memory will be -posted to userspace as KVM_EXIT_MMIO
> exits.
> > +use it.  It will be different if the KVM_DIRTY_LOG_INITIALLY_SET flag
> > +of
> > +KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is set.  For more information,
> see
> > +the description of the capability.  The latter can be set, if
> > +KVM_CAP_READONLY_MEM capability allows it, to make a new slot
> > +read-only.  In this case, writes to this memory will be posted to userspace
> as KVM_EXIT_MMIO exits.
> 
> Not sure about others, but my own preference is that we keep this part
> untouched...  The changed document could be even more confusing to me...

I just wanted to mention that something different happens in the other
code logic if the KVM_DIRTY_LOG_INITIALLY_SET flag is set, but I have to
admit that the "difference" is not described clearly here.  It is
described at [1] below, so keeping this part untouched may be a choice.

> >
> >  When the KVM_CAP_SYNC_MMU capability is available, changes in the
> > backing of  the memory region are automatically reflected into the
> > guest.  For example, an @@ -5704,10 +5706,19 @@ and injected
> exceptions.
> >  :Architectures: x86, arm, arm64, mips
> >  :Parameters: args[0] whether feature should be enabled or not
> >
> > -With this capability enabled, KVM_GET_DIRTY_LOG will not
> > automatically
> > +Valid flags are::
> > +
> > +  #define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)  #define
> > + KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
> > +
> > +With KVM_DIRTY_LOG_MANUAL_PROTECT set, KVM_GET_DIRTY_LOG will
> not
> > +automatically
> >  clear and write-protect all pages that are returned as dirty.
> >  Rather, userspace will have to do this operation separately using
> > -KVM_CLEAR_DIRTY_LOG.
> > +KVM_CLEAR_DIRTY_LOG.  With KVM_DIRTY_LOG_INITIALLY_SET set, all
> the
> > +bits of the dirty bitmap will be initialized to 1 when created, dirty
> > +logging will be enabled gradually in small chunks using
> > +KVM_CLEAR_DIRTY_LOG ioctl.  However, the
> KVM_DIRTY_LOG_INITIALLY_SET
> > +depends on KVM_DIRTY_LOG_MANUAL_PROTECT, it can not be set
> individually and supports x86 only for now.

[1] 

> 
> Need
> s/KVM_DIRTY_LOG_MANUAL_PROTECT/KVM_DIRTY_LOG_MANUAL_PROTEC
> T2/?

Since the ioctl name is KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, this naming
replacement seems reasonable. If no one is against it, I'll take it in the next
version.

> 
> >
> >  At the cost of a slightly more complicated operation, this provides
> > better  scalability and responsiveness for two reasons.  First, @@
> > -5716,7 +5727,7 @@ than requiring to sync a full memslot; this ensures
> > that KVM does not  take spinlocks for an extended period of time.
> > Second, in some cases a  large amount of time can pass between a call
> > to KVM_GET_DIRTY_LOG and  userspace actually using the data in the
> > page.  Pages can be modified -during this time, which is inefficint for both
> the guest and userspace:
> > +during this time, which is inefficient for both the guest and userspace:
> >  the guest will incur a higher penalty due to write protection faults,
> > while userspace can see false reports of dirty pages.  Manual
> > reprotection  helps reducing this time, improving guest performance
> > and reducing the diff --git a/arch/x86/kvm/mmu/mmu.c
> > b/arch/x86/kvm/mmu/mmu.c index 87e9ba2..f9c120e 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5865,8 +5865,12 @@ void
> kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> >  	bool flush;
> >
> >  	spin_lock(&kvm->mmu_lock);
> > -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> > -				      false);
> > +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > +		flush = slot_handle_large_level(kvm, memslot,
> > +						slot_rmap_write_protect, false);
> > +	else
> > +		flush = slot_handle_all_level(kvm, memslot,
> > +						slot_rmap_write_protect, false);
> >  	spin_unlock(&kvm->mmu_lock);
> >
> >  	/*
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > 3be25ec..fcc585a 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7201,7 +7201,8 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu,
> > int cpu)  static void vmx_slot_enable_log_dirty(struct kvm *kvm,
> >  				     struct kvm_memory_slot *slot)  {
> > -	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> > +	if (!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET))
> > +		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> >  	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);  }
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index
> > e89eb67..a555b52 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -39,6 +39,11 @@
> >  #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS  #endif
> >
> > +#define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0) #define
> > +KVM_DIRTY_LOG_INITIALLY_SET (1 << 1) #define
> > +KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT |
> \
> > +				KVM_DIRTY_LOG_INITIALLY_SET)
> > +
> >  /*
> >   * The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
> >   * in kvm, other bits are visible for userspace which are defined in
> > @@ -493,7 +498,7 @@ struct kvm {  #endif
> >  	long tlbs_dirty;
> >  	struct list_head devices;
> > -	bool manual_dirty_log_protect;
> > +	u64 manual_dirty_log_protect;
> >  	struct dentry *debugfs_dentry;
> >  	struct kvm_stat_data **debugfs_stat_data;
> >  	struct srcu_struct srcu;
> > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c
> > b/tools/testing/selftests/kvm/dirty_log_test.c
> > index 5614222..2a493c1 100644
> > --- a/tools/testing/selftests/kvm/dirty_log_test.c
> > +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> > @@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode,
> unsigned long iterations,
> >  	host_bmap_track = bitmap_alloc(host_num_pages);
> >
> >  #ifdef USE_CLEAR_DIRTY_LOG
> > +	int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> >  	struct kvm_enable_cap cap = {};
> >
> >  	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
> > -	cap.args[0] = 1;
> > +	cap.args[0] = ret;
> 
> You enabled the initial-all-set but didn't really check it, so it didn't help much
> from the testcase pov...  

vm_enable_cap is called afterwards and the return value is checked
inside it.  May I ask whether this check is enough, or whether the value
needs to be read back through something like vm_get_cap?

> I'd suggest you drop this change, and you can work
> on top after this patch can be accepted.

OK, I think some input parameters for cap.args[0] should be tested: 0, 1
and 3 should be accepted, while other values should not (sketched
below).
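
A rough sketch of such a check, using the raw ioctl so that the failing
cases can be observed (illustrative only; vm_fd is assumed, and
TEST_ASSERT is the selftest harness macro):

	struct kvm_enable_cap cap = { .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 };
	unsigned long long accept[] = { 0, 1, 3 }, reject[] = { 2, 4 };
	int i;

	for (i = 0; i < 3; i++) {
		cap.args[0] = accept[i];	/* 0, 1 and 3 must succeed */
		TEST_ASSERT(!ioctl(vm_fd, KVM_ENABLE_CAP, &cap),
			    "args[0]=%llu should be accepted", accept[i]);
	}
	for (i = 0; i < 2; i++) {
		cap.args[0] = reject[i];	/* INITIALLY_SET alone, unknown bits */
		TEST_ASSERT(ioctl(vm_fd, KVM_ENABLE_CAP, &cap) == -1 &&
			    errno == EINVAL,
			    "args[0]=%llu should be rejected", reject[i]);
	}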

> 
> (Not to mention the original test actually verified that we don't  break, which
> seems good..)
> 
> >  	vm_enable_cap(vm, &cap);
> >  #endif
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > 70f03ce..f2631d0 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode,
> struct file *filp)
> >   * Allocation size is twice as large as the actual dirty bitmap size.
> >   * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
> >   */
> > -static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> > +static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
> 
> This change seems irrelevant..
> 
> >  {
> >  	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
> >
> > @@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
> >
> >  	/* Allocate page dirty bitmap if needed */
> >  	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
> > -		if (kvm_create_dirty_bitmap(&new) < 0)
> > +		if (kvm_alloc_dirty_bitmap(&new))
> 
> Same here.

s/kvm_create_dirty_bitmap/kvm_alloc_dirty_bitmap/ is Sean's suggestion,
to make it clear that the helper is only responsible for allocation; all
the bitmap bits are then set to 1 with bitmap_set afterwards, which
seems reasonable.  Do you still think it's better to keep this name
untouched?

> 
> >  			goto out_free;
> > +
> > +		if (kvm->manual_dirty_log_protect &
> KVM_DIRTY_LOG_INITIALLY_SET)
> 
> (Maybe time to introduce a helper to shorten this check. :)

Yeah, but could this be done on top of this patch?

> 
> > +			bitmap_set(new.dirty_bitmap, 0, new.npages);
> >  	}
> >
> >  	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> > @@ -1255,7 +1258,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
> >
> >  	n = kvm_dirty_bitmap_bytes(memslot);
> >  	*flush = false;
> > -	if (kvm->manual_dirty_log_protect) {
> > +	if (kvm->manual_dirty_log_protect &
> KVM_DIRTY_LOG_MANUAL_PROTECT) {
> 
> Can also introduce a helper for this too.
> 
> Side note: logically this line doesn't need a change, because bit 1 depends on bit
> 0 so if (kvm->manual_dirty_log_protect) it means bit 0 must be set.

Yes, I agree.

> 
> >  		/*
> >  		 * Unlike kvm_get_dirty_log, we always return false in *flush,
> >  		 * because no flush is needed until KVM_CLEAR_DIRTY_LOG.
> There @@
> > -3310,9 +3313,6 @@ static long
> kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >  	case KVM_CAP_IOEVENTFD_ANY_LENGTH:
> >  	case KVM_CAP_CHECK_EXTENSION_VM:
> >  	case KVM_CAP_ENABLE_CAP_VM:
> > -#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > -	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > -#endif
> >  		return 1;
> >  #ifdef CONFIG_KVM_MMIO
> >  	case KVM_CAP_COALESCED_MMIO:
> > @@ -3320,6 +3320,10 @@ static long
> kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >  	case KVM_CAP_COALESCED_PIO:
> >  		return 1;
> >  #endif
> > +#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > +	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > +		return KVM_DIRTY_LOG_MANUAL_CAPS;
> > +#endif
> >  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> >  	case KVM_CAP_IRQ_ROUTING:
> >  		return KVM_MAX_IRQ_ROUTES;
> > @@ -3348,7 +3352,14 @@ static int
> kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >  	switch (cap->cap) {
> >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> >  	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > -		if (cap->flags || (cap->args[0] & ~1))
> > +		if (cap->flags ||
> > +		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> > +		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
> > +		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
> > +		     * set individually
> > +		     */
> > +		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> > +			KVM_DIRTY_LOG_INITIALLY_SET))
> 
> How about something easier to read? :)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> 70f03ce0e5c1..9dfbab2a9929 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3348,7 +3348,10 @@ static int
> kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>         switch (cap->cap) {
>  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
>         case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> -               if (cap->flags || (cap->args[0] & ~1))
> +               if (cap->flags || (cap->args[0] & ~3))
> +                       return -EINVAL;
> +               /* Allow 00, 01, and 11. */
> +               if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
>                         return -EINVAL;
>                 kvm->manual_dirty_log_protect = cap->args[0];
>                 return 0;

Looks simpler than mine.

> Otherwise it looks good to me!

Thanks!


Regards,
Jay Zhou

> 
> Thanks,
> 
> --
> Peter Xu



* RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Zhoujian (jay) @ 2020-02-21  9:43 UTC
  To: Peter Xu, Sean Christopherson
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C), Liujinsong (Paul)



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 3:42 AM
> To: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Zhoujian (jay) <jianjay.zhou@huawei.com>; kvm@vger.kernel.org;
> pbonzini@redhat.com; wangxin (U) <wangxinxin.wang@huawei.com>;
> Huangweidong (C) <weidong.huang@huawei.com>; Liujinsong (Paul)
> <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> 
> On Thu, Feb 20, 2020 at 11:23:50AM -0800, Sean Christopherson wrote:
> > On Thu, Feb 20, 2020 at 02:17:06PM -0500, Peter Xu wrote:
> > > On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > > > @@ -3348,7 +3352,14 @@ static int
> kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > > >  	switch (cap->cap) {
> > > >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > > >  	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > > > -		if (cap->flags || (cap->args[0] & ~1))
> > > > +		if (cap->flags ||
> > > > +		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> > > > +		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET
> depends
> > > > +		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not
> be
> > > > +		     * set individually
> > > > +		     */
> > > > +		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> > > > +			KVM_DIRTY_LOG_INITIALLY_SET))
> > >
> > > How about something easier to read? :)
> > >
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > > 70f03ce0e5c1..9dfbab2a9929 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -3348,7 +3348,10 @@ static int
> kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > >         switch (cap->cap) {
> > >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > >         case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > > -               if (cap->flags || (cap->args[0] & ~1))
> > > +               if (cap->flags || (cap->args[0] & ~3))
> > > +                       return -EINVAL;
> > > +               /* Allow 00, 01, and 11. */
> > > +               if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
> > >                         return -EINVAL;
> >
> > Oof, "easier" is subjective :-)  How about this?
> >
> > 	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2: {
> > 		u64 allowed_options = KVM_DIRTY_LOG_MANUAL_PROTECT;
> >
> > 		if (cap->args[0] & KVM_DIRTY_LOG_MANUAL_PROTECT)
> > 			allowed_options = KVM_DIRTY_LOG_INITIALLY_SET;

I believe you mean

		if (cap->args[0] & KVM_DIRTY_LOG_MANUAL_PROTECT)
			allowed_options |= KVM_DIRTY_LOG_INITIALLY_SET;

?

> >
> > 		if (cap->flags || (cap->args[0] & ~allowed_options))
> > 			return -EINVAL;
> > 		kvm->manual_dirty_log_protect = cap->args[0];
> > 		return 0;
> > 	}
> 
> Fine by me! (But I won't tell I still prefer mine ;)

:-)


Regards,
Jay Zhou

> 
> --
> Peter Xu



* RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Zhoujian (jay) @ 2020-02-21  9:53 UTC
  To: Peter Xu
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 3:28 AM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>; Huangweidong (C)
> <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> (Paul) <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> 
> On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > @@ -5865,8 +5865,12 @@ void
> kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> >  	bool flush;
> >
> >  	spin_lock(&kvm->mmu_lock);
> > -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> > -				      false);
> > +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > +		flush = slot_handle_large_level(kvm, memslot,
> > +						slot_rmap_write_protect, false);
> > +	else
> > +		flush = slot_handle_all_level(kvm, memslot,
> > +						slot_rmap_write_protect, false);
> 
> Another extra comment:
> 
> I think we should still keep the old behavior for KVM_MEM_READONLY (in
> kvm_mmu_slot_apply_flags())) for this...  

I also realized this issue after posting this patch, and I agree.

> Say, instead of doing this, maybe we
> want kvm_mmu_slot_remove_write_access() to take a new parameter to
> decide to which level we do the wr-protect.

How about using the "flags" field to distinguish:

	if ((kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET) &&
	    (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES))
		flush = slot_handle_large_level(kvm, memslot,
						slot_rmap_write_protect, false);
	else
		flush = slot_handle_all_level(kvm, memslot,
					      slot_rmap_write_protect, false);

Regards,
Jay Zhou

> 
> Thanks,
> 
> >  	spin_unlock(&kvm->mmu_lock);
> >
> >  	/*
> 
> --
> Peter Xu



* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Peter Xu @ 2020-02-21 15:19 UTC
  To: Zhoujian (jay)
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)

On Fri, Feb 21, 2020 at 09:31:52AM +0000, Zhoujian (jay) wrote:

[...]

> > > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c
> > > b/tools/testing/selftests/kvm/dirty_log_test.c
> > > index 5614222..2a493c1 100644
> > > --- a/tools/testing/selftests/kvm/dirty_log_test.c
> > > +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> > > @@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode,
> > unsigned long iterations,
> > >  	host_bmap_track = bitmap_alloc(host_num_pages);
> > >
> > >  #ifdef USE_CLEAR_DIRTY_LOG
> > > +	int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> > >  	struct kvm_enable_cap cap = {};
> > >
> > >  	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
> > > -	cap.args[0] = 1;
> > > +	cap.args[0] = ret;
> > 
> > You enabled the initial-all-set but didn't really check it, so it didn't help much
> > from the testcase pov...  
> 
> vm_enable_cap is called afterwards, the return value is checked inside it,
> may I ask this check is enough, or it is needed to get the value through
> something like vm_get_cap ?

I meant to check what the initial-all-set feature bit offers, which is:
we should get the bitmap before dirtying anything and verify that it's
all ones.
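
For example, in dirty_log_test.c terms (a sketch only; names follow the
existing test, and the exact placement would need care):

	/*
	 * With initial-all-set enabled, the very first KVM_GET_DIRTY_LOG,
	 * issued before the guest has dirtied anything, should report
	 * every page of the slot as dirty.
	 */
	kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
	for (page = 0; page < host_num_pages; page++)
		TEST_ASSERT(test_bit_le(page, bmap),
			    "Page %lu should be dirty right after enabling "
			    "dirty logging", page);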

> 
> > I'd suggest you drop this change, and you can work
> > on top after this patch can be accepted.
> 
> OK, some input parameters for cap.args[0] should be tested I think: 0, 1, 3
> should be accepted, the other numbers will not.

Yes.  I think it'll be fine too if you want to put the test changes
into this patch.  It's just not required, so this patch could
potentially get merged more easily, since the test may not be a
one-liner change.

> 
> > 
> > (Not to mention the original test actually verified that we don't  break, which
> > seems good..)
> > 
> > >  	vm_enable_cap(vm, &cap);
> > >  #endif
> > >
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > > 70f03ce..f2631d0 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode,
> > struct file *filp)
> > >   * Allocation size is twice as large as the actual dirty bitmap size.
> > >   * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
> > >   */
> > > -static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> > > +static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
> > 
> > This change seems irrelevant..
> > 
> > >  {
> > >  	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
> > >
> > > @@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > >
> > >  	/* Allocate page dirty bitmap if needed */
> > >  	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
> > > -		if (kvm_create_dirty_bitmap(&new) < 0)
> > > +		if (kvm_alloc_dirty_bitmap(&new))
> > 
> > Same here.
> 
> s/kvm_create_dirty_bitmap/kvm_alloc_dirty_bitmap is Sean's suggestion to make
> it clear that the helper is only responsible for allocation, then set all the bitmap bits
> to 1 using bitmap_set afterwards, which seems reasonable. Do you still think it's
> better to keep this name untouched?

No strong opinion, feel free to take your preference (because we've
got three here and if you like it too then it's already 2:1 :).

> 
> > 
> > >  			goto out_free;
> > > +
> > > +		if (kvm->manual_dirty_log_protect &
> > KVM_DIRTY_LOG_INITIALLY_SET)
> > 
> > (Maybe time to introduce a helper to shorten this check. :)
> 
> Yeah, but could this be done on top of this patch?

You're going to repost after all, right?  If so, IMO it's easier to
just add the helper in the same patch.

Thanks,

-- 
Peter Xu



* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Peter Xu @ 2020-02-21 15:41 UTC
  To: Zhoujian (jay)
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)

On Fri, Feb 21, 2020 at 09:53:51AM +0000, Zhoujian (jay) wrote:
> 
> 
> > -----Original Message-----
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Friday, February 21, 2020 3:28 AM
> > To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> > Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> > <wangxinxin.wang@huawei.com>; Huangweidong (C)
> > <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> > (Paul) <liu.jinsong@huawei.com>
> > Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> > 
> > On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > > @@ -5865,8 +5865,12 @@ void
> > kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> > >  	bool flush;
> > >
> > >  	spin_lock(&kvm->mmu_lock);
> > > -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> > > -				      false);
> > > +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > > +		flush = slot_handle_large_level(kvm, memslot,
> > > +						slot_rmap_write_protect, false);
> > > +	else
> > > +		flush = slot_handle_all_level(kvm, memslot,
> > > +						slot_rmap_write_protect, false);
> > 
> > Another extra comment:
> > 
> > I think we should still keep the old behavior for KVM_MEM_READONLY (in
> > kvm_mmu_slot_apply_flags())) for this...  
> 
> I also realized this issue after posting this patch, and I agree.
> 
> > Say, instead of doing this, maybe we
> > want kvm_mmu_slot_remove_write_access() to take a new parameter to
> > decide to which level we do the wr-protect.
> 
> How about using the "flags" field to distinguish:
> 
> 		if ((kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
>                 && (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES))
>                 flush = slot_handle_large_level(kvm, memslot,
>                                         slot_rmap_write_protect, false);
>         else
>                 flush = slot_handle_all_level(kvm, memslot,
>                                         slot_rmap_write_protect, false);

This seems to be OK too.  But just to show what I meant (which I still
think could be a bit clearer; assuming kvm_manual_dirty_log_init_set()
is the helper you'll introduce):

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 40a0c0fd95ca..a90630cde92d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1312,7 +1312,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,

 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
-                                     struct kvm_memory_slot *memslot);
+                                     struct kvm_memory_slot *memslot,
+                                     int start_level);
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
                                   const struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 87e9ba27ada1..f538b7977fa2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5860,13 +5860,14 @@ static bool slot_rmap_write_protect(struct kvm *kvm,
 }

 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
-                                     struct kvm_memory_slot *memslot)
+                                     struct kvm_memory_slot *memslot,
+                                     int start_level)
 {
        bool flush;

        spin_lock(&kvm->mmu_lock);
-       flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
-                                     false);
+       flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
+                                 start_level, PT_MAX_HUGEPAGE_LEVEL, false);
        spin_unlock(&kvm->mmu_lock);

        /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb5d64ebc35d..2ed3204dfd9f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9956,7 +9956,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 {
        /* Still write protect RO slot */
        if (new->flags & KVM_MEM_READONLY) {
-               kvm_mmu_slot_remove_write_access(kvm, new);
+               kvm_mmu_slot_remove_write_access(kvm, new, PT_PAGE_TABLE_LEVEL);
                return;
        }

@@ -9993,8 +9993,20 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
        if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
                if (kvm_x86_ops->slot_enable_log_dirty)
                        kvm_x86_ops->slot_enable_log_dirty(kvm, new);
-               else
-                       kvm_mmu_slot_remove_write_access(kvm, new);
+               else {
+                       int level = kvm_manual_dirty_log_init_set(kvm) ?
+                           PT_DIRECTORY_LEVEL : PT_PAGE_TABLE_LEVEL;
+
+                       /*
+                        * If we're with intial-all-set, we don't need
+                        * to write protect any small page because
+                        * they're reported as dirty already.  However
+                        * we still need to write-protect huge pages
+                        * so that the page split can happen lazily on
+                        * the first write to the huge page.
+                        */
+                       kvm_mmu_slot_remove_write_access(kvm, new, level);
+               }
        } else {
                if (kvm_x86_ops->slot_disable_log_dirty)
                        kvm_x86_ops->slot_disable_log_dirty(kvm, new);
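
For context, this works because slot_handle_all_level() and
slot_handle_large_level() are both thin wrappers around slot_handle_level()
(copied roughly from the current mmu.c for illustration):

static bool
slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
		      slot_level_handler fn, bool lock_flush_tlb)
{
	return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL,
				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
}

static bool
slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
			slot_level_handler fn, bool lock_flush_tlb)
{
	return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL + 1,
				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
}

So passing start_level straight into slot_handle_level() subsumes both
wrappers with a single call.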

Thanks,

-- 
Peter Xu



* RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
  2020-02-21 15:19     ` Peter Xu
@ 2020-02-22  8:11       ` Zhoujian (jay)
  0 siblings, 0 replies; 12+ messages in thread
From: Zhoujian (jay) @ 2020-02-22  8:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 11:19 PM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>; Huangweidong (C)
> <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> (Paul) <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> 
> On Fri, Feb 21, 2020 at 09:31:52AM +0000, Zhoujian (jay) wrote:
> 
> [...]
> 
> > > > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> > > > index 5614222..2a493c1 100644
> > > > --- a/tools/testing/selftests/kvm/dirty_log_test.c
> > > > +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> > > > @@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
> > > >  	host_bmap_track = bitmap_alloc(host_num_pages);
> > > >
> > > >  #ifdef USE_CLEAR_DIRTY_LOG
> > > > +	int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> > > >  	struct kvm_enable_cap cap = {};
> > > >
> > > >  	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
> > > > -	cap.args[0] = 1;
> > > > +	cap.args[0] = ret;
> > >
> > > You enabled the initial-all-set but didn't really check it, so it
> > > didn't help much from the testcase pov...
> >
> > vm_enable_cap is called afterwards, and the return value is checked inside
> > it. May I ask whether this check is enough, or whether the value needs to be
> > fetched through something like vm_get_cap?
> 
> I meant to check what is offered by the initial-all-set feature bit, which is: we
> should get the bitmap before dirtying and verify that it's all ones.

OK, I got your meaning now.
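
So something like the below should do (a rough sketch reusing the helpers
already used by dirty_log_test.c; the assertion message is made up):

	/* With initial-all-set enabled, the very first KVM_GET_DIRTY_LOG
	 * must report every page of the slot as dirty.  Note that
	 * test_and_clear_bit_le() also clears our local copy of the
	 * bitmap, which is fine right after enabling.
	 */
	unsigned long page;

	kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
	for (page = 0; page < host_num_pages; page++)
		TEST_ASSERT(test_and_clear_bit_le(page, bmap),
			    "Page %lu should start out dirty", page);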

> 
> >
> > > I'd suggest you drop this change, and you can work on top after this
> > > patch is accepted.
> >
> > OK, I think some input values for cap.args[0] should be tested: 0, 1 and 3
> > should be accepted, while other values should be rejected.
> 
> Yes. I think it'll be fine too if you want to put the test changes into this patch.
> It's just not required, so this patch could potentially get merged more easily,
> since the test may not be a one-liner change.

Yeah, will drop it in the next version.
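
For the record, the explicit form (instead of feeding the kvm_check_cap()
return value straight into args[0]) would be something like this sketch,
using the flag names this patch defines:

	struct kvm_enable_cap cap = {};

	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
	/* Enable manual protect plus the initial-all-set optimization. */
	cap.args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT | KVM_DIRTY_LOG_INITIALLY_SET;
	vm_enable_cap(vm, &cap);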

> 
> >
> > >
> > > (Not to mention the original test actually verified that we don't
> > > break anything, which seems good...)
> > >
> > > >  	vm_enable_cap(vm, &cap);
> > > >  #endif
> > > >
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 70f03ce..f2631d0 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
> > > >   * Allocation size is twice as large as the actual dirty bitmap size.
> > > >   * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
> > > >   */
> > > > -static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> > > > +static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
> > >
> > > This change seems irrelevant..
> > >
> > > >  {
> > > >  	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
> > > >
> > > > @@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > > >
> > > >  	/* Allocate page dirty bitmap if needed */
> > > >  	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
> > > > -		if (kvm_create_dirty_bitmap(&new) < 0)
> > > > +		if (kvm_alloc_dirty_bitmap(&new))
> > >
> > > Same here.
> >
> > s/kvm_create_dirty_bitmap/kvm_alloc_dirty_bitmap is Sean's suggestion,
> > to make it clear that the helper is only responsible for allocation; all
> > the bitmap bits are then set to 1 with bitmap_set afterwards, which
> > seems reasonable. Do you still think it's better to keep this name untouched?
> 
> No strong opinion, feel free to take your preference (because we've got three
> here and if you like it too then it's already 2:1 :).
> 
> >
> > >
> > > >  			goto out_free;
> > > > +
> > > > +		if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > >
> > > (Maybe time to introduce a helper to shorten this check. :)
> >
> > Yeah, but could this be done on top of this patch?
> 
> You're going to repost after all, right?  If so, IMO it's easier to just add the
> helper in the same patch.

OK, will do in v3.

Regards,
Jay Zhou

> 
> Thanks,
> 
> --
> Peter Xu



* RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
  2020-02-21 15:41     ` Peter Xu
@ 2020-02-22  8:18       ` Zhoujian (jay)
  0 siblings, 0 replies; 12+ messages in thread
From: Zhoujian (jay) @ 2020-02-22  8:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 11:41 PM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>; Huangweidong (C)
> <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> (Paul) <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> 
> On Fri, Feb 21, 2020 at 09:53:51AM +0000, Zhoujian (jay) wrote:
> >
> >
> > > -----Original Message-----
> > > From: Peter Xu [mailto:peterx@redhat.com]
> > > Sent: Friday, February 21, 2020 3:28 AM
> > > To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> > > Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> > > <wangxinxin.wang@huawei.com>; Huangweidong (C)
> > > <weidong.huang@huawei.com>; sean.j.christopherson@intel.com;
> > > Liujinsong
> > > (Paul) <liu.jinsong@huawei.com>
> > > Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in
> > > small chunks
> > >
> > > On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > > > @@ -5865,8 +5865,12 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> > > >  	bool flush;
> > > >
> > > >  	spin_lock(&kvm->mmu_lock);
> > > > -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> > > > -				      false);
> > > > +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > > > +		flush = slot_handle_large_level(kvm, memslot,
> > > > +						slot_rmap_write_protect, false);
> > > > +	else
> > > > +		flush = slot_handle_all_level(kvm, memslot,
> > > > +						slot_rmap_write_protect, false);
> > >
> > > Another extra comment:
> > >
> > > I think we should still keep the old behavior for KVM_MEM_READONLY
> > > (in kvm_mmu_slot_apply_flags()) for this...
> >
> > I also realized this issue after posting this patch, and I agree.
> >
> > > Say, instead of doing this, maybe we want
> > > kvm_mmu_slot_remove_write_access() to take a new parameter to decide
> > > to which level we do the wr-protect.
> >
> > How about using the "flags" field to distinguish:
> >
> > 		if ((kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> >                 && (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES))
> >                 flush = slot_handle_large_level(kvm, memslot,
> >                                         slot_rmap_write_protect, false);
> >         else
> >                 flush = slot_handle_all_level(kvm, memslot,
> >                                         slot_rmap_write_protect, false);
> 
> This seems to be OK too.  But just to show what I meant (which I still think
> could be a bit clearer; assuming kvm_manual_dirty_log_init_set() is the helper
> you'll introduce):
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 40a0c0fd95ca..a90630cde92d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1312,7 +1312,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
> 
>  void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> -                                     struct kvm_memory_slot *memslot);
> +                                     struct kvm_memory_slot *memslot,
> +                                     int start_level);
>  void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
>                                    const struct kvm_memory_slot *memslot);
>  void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 87e9ba27ada1..f538b7977fa2 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5860,13 +5860,14 @@ static bool slot_rmap_write_protect(struct kvm *kvm,
>  }
> 
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> -                                     struct kvm_memory_slot *memslot)
> +                                     struct kvm_memory_slot *memslot,
> +                                     int start_level)
>  {
>         bool flush;
> 
>         spin_lock(&kvm->mmu_lock);
> -       flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> -                                     false);
> +       flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
> +                                 start_level, PT_MAX_HUGEPAGE_LEVEL, false);
>         spin_unlock(&kvm->mmu_lock);
> 
>         /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fb5d64ebc35d..2ed3204dfd9f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9956,7 +9956,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>  {
>         /* Still write protect RO slot */
>         if (new->flags & KVM_MEM_READONLY) {
> -               kvm_mmu_slot_remove_write_access(kvm, new);
> +               kvm_mmu_slot_remove_write_access(kvm, new, PT_PAGE_TABLE_LEVEL);
>                 return;
>         }
> 
> @@ -9993,8 +9993,20 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>         if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
>                 if (kvm_x86_ops->slot_enable_log_dirty)
>                         kvm_x86_ops->slot_enable_log_dirty(kvm, new);
> -               else
> -                       kvm_mmu_slot_remove_write_access(kvm, new);
> +               else {
> +                       int level = kvm_manual_dirty_log_init_set(kvm) ?
> +                           PT_DIRECTORY_LEVEL : PT_PAGE_TABLE_LEVEL;
> +
> +                       /*
> +                        * If we're with intial-all-set, we don't need

s/intial/initial

> +                        * to write protect any small page because
> +                        * they're reported as dirty already.  However
> +                        * we still need to write-protect huge pages
> +                        * so that the page split can happen lazily on
> +                        * the first write to the huge page.
> +                        */
> +                       kvm_mmu_slot_remove_write_access(kvm, new, level);
> +               }
>         } else {
>                 if (kvm_x86_ops->slot_disable_log_dirty)
>                         kvm_x86_ops->slot_disable_log_dirty(kvm, new);
> 

Good suggestion, it is indeed much clearer to add a new start_level parameter
to kvm_mmu_slot_remove_write_access. Will do this in v3, thanks!

Regards,
Jay Zhou


end of thread

Thread overview: 12+ messages
2020-02-20  4:28 [PATCH v2] KVM: x86: enable dirty log gradually in small chunks Jay Zhou
2020-02-20 19:17 ` Peter Xu
2020-02-20 19:23   ` Sean Christopherson
2020-02-20 19:42     ` Peter Xu
2020-02-21  9:43       ` Zhoujian (jay)
2020-02-21  9:31   ` Zhoujian (jay)
2020-02-21 15:19     ` Peter Xu
2020-02-22  8:11       ` Zhoujian (jay)
2020-02-20 19:28 ` Peter Xu
2020-02-21  9:53   ` Zhoujian (jay)
2020-02-21 15:41     ` Peter Xu
2020-02-22  8:18       ` Zhoujian (jay)
