* [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Jay Zhou @ 2020-02-20  4:28 UTC
  To: kvm
  Cc: pbonzini, peterx, wangxinxin.wang, weidong.huang, jianjay.zhou,
	sean.j.christopherson, liu.jinsong

It could take kvm->mmu_lock for an extended period of time when
enabling dirty log for the first time. The main cost is to clear
all the D-bits of last level SPTEs. This situation can benefit from
manual dirty log protect as well, which can reduce the mmu_lock
time taken. The sequence is like this:

1. Initialize all the bits of the dirty bitmap to 1 when enabling
   dirty log for the first time
2. Only write protect the huge pages
3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
4. KVM_CLEAR_DIRTY_LOG will clear the D-bit of each leaf level SPTE
   gradually in small chunks (a userspace sketch of this flow follows)
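
For illustration, the corresponding userspace flow is roughly the
following (a sketch only, not part of this patch; it assumes vm_fd,
slot and npages are set up elsewhere, uses the two flag bits defined
below, and omits error handling):

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
		.args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT |
			   KVM_DIRTY_LOG_INITIALLY_SET,
	};
	unsigned long *bitmap = calloc(npages / 8, 1);
	uint64_t first, chunk = 65536;	/* KVM requires 64-page alignment */

	/*
	 * Select the mode first; steps 1 and 2 then happen inside KVM
	 * when KVM_MEM_LOG_DIRTY_PAGES is set on the memslot.
	 */
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

	struct kvm_dirty_log log = { .slot = slot, .dirty_bitmap = bitmap };
	ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);	/* step 3: all pages read back dirty */

	for (first = 0; first < npages; first += chunk) {
		struct kvm_clear_dirty_log clear = {
			.slot = slot,
			.first_page = first,
			.num_pages = first + chunk <= npages ?
				     chunk : npages - first,
			/* bit 0 of this buffer corresponds to first_page */
			.dirty_bitmap = (char *)bitmap + first / 8,
		};
		ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);	/* step 4 */
	}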

Under an Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
I did some tests with a 128G Windows VM and measured the time taken
by memory_global_dirty_log_start; here are the numbers:

VM Size        Before    After optimization
128G           460ms     10ms

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
---
v2:
  * add new bit to KVM_ENABLE_CAP for KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 [Paolo]
  * support non-PML path [Peter]
  * delete the unnecessary ifdef and make the initialization of bitmap
    more clear [Sean]
  * document the new bits and tweak the testcase

 Documentation/virt/kvm/api.rst               | 23 +++++++++++++++++------
 arch/x86/kvm/mmu/mmu.c                       |  8 ++++++--
 arch/x86/kvm/vmx/vmx.c                       |  3 ++-
 include/linux/kvm_host.h                     |  7 ++++++-
 tools/testing/selftests/kvm/dirty_log_test.c |  3 ++-
 virt/kvm/kvm_main.c                          | 25 ++++++++++++++++++-------
 6 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 97a72a5..1afd310 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1267,9 +1267,11 @@ pages in the host.
 The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
 KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
 writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
-use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
-to make a new slot read-only.  In this case, writes to this memory will be
-posted to userspace as KVM_EXIT_MMIO exits.
+use it.  It will be different if the KVM_DIRTY_LOG_INITIALLY_SET flag of
+KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is set.  For more information, see the
+description of the capability.  The latter can be set, if KVM_CAP_READONLY_MEM
+capability allows it, to make a new slot read-only.  In this case, writes to
+this memory will be posted to userspace as KVM_EXIT_MMIO exits.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region are automatically reflected into the guest.  For example, an
@@ -5704,10 +5706,19 @@ and injected exceptions.
 :Architectures: x86, arm, arm64, mips
 :Parameters: args[0] whether feature should be enabled or not
 
-With this capability enabled, KVM_GET_DIRTY_LOG will not automatically
+Valid flags are::
+
+  #define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
+  #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
+
+With KVM_DIRTY_LOG_MANUAL_PROTECT set, KVM_GET_DIRTY_LOG will not automatically
 clear and write-protect all pages that are returned as dirty.
 Rather, userspace will have to do this operation separately using
-KVM_CLEAR_DIRTY_LOG.
+KVM_CLEAR_DIRTY_LOG.  With KVM_DIRTY_LOG_INITIALLY_SET set, all the bits of
+the dirty bitmap will be initialized to 1 when created, dirty logging will be
+enabled gradually in small chunks using KVM_CLEAR_DIRTY_LOG ioctl.  However,
+the KVM_DIRTY_LOG_INITIALLY_SET depends on KVM_DIRTY_LOG_MANUAL_PROTECT, it
+can not be set individually and supports x86 only for now.
 
 At the cost of a slightly more complicated operation, this provides better
 scalability and responsiveness for two reasons.  First,
@@ -5716,7 +5727,7 @@ than requiring to sync a full memslot; this ensures that KVM does not
 take spinlocks for an extended period of time.  Second, in some cases a
 large amount of time can pass between a call to KVM_GET_DIRTY_LOG and
 userspace actually using the data in the page.  Pages can be modified
-during this time, which is inefficint for both the guest and userspace:
+during this time, which is inefficient for both the guest and userspace:
 the guest will incur a higher penalty due to write protection faults,
 while userspace can see false reports of dirty pages.  Manual reprotection
 helps reducing this time, improving guest performance and reducing the
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 87e9ba2..f9c120e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5865,8 +5865,12 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 	bool flush;
 
 	spin_lock(&kvm->mmu_lock);
-	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
-				      false);
+	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
+		flush = slot_handle_large_level(kvm, memslot,
+						slot_rmap_write_protect, false);
+	else
+		flush = slot_handle_all_level(kvm, memslot,
+						slot_rmap_write_protect, false);
 	spin_unlock(&kvm->mmu_lock);
 
 	/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3be25ec..fcc585a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7201,7 +7201,8 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_slot_enable_log_dirty(struct kvm *kvm,
 				     struct kvm_memory_slot *slot)
 {
-	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
+	if (!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET))
+		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
 	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e89eb67..a555b52 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -39,6 +39,11 @@
 #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
 #endif
 
+#define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
+#define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
+#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT | \
+				KVM_DIRTY_LOG_INITIALLY_SET)
+
 /*
  * The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
  * in kvm, other bits are visible for userspace which are defined in
@@ -493,7 +498,7 @@ struct kvm {
 #endif
 	long tlbs_dirty;
 	struct list_head devices;
-	bool manual_dirty_log_protect;
+	u64 manual_dirty_log_protect;
 	struct dentry *debugfs_dentry;
 	struct kvm_stat_data **debugfs_stat_data;
 	struct srcu_struct srcu;
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 5614222..2a493c1 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
 	host_bmap_track = bitmap_alloc(host_num_pages);
 
 #ifdef USE_CLEAR_DIRTY_LOG
+	int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
 	struct kvm_enable_cap cap = {};
 
 	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
-	cap.args[0] = 1;
+	cap.args[0] = ret;
 	vm_enable_cap(vm, &cap);
 #endif
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 70f03ce..f2631d0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
  * Allocation size is twice as large as the actual dirty bitmap size.
  * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
  */
-static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
+static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
 	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
 
@@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
 	/* Allocate page dirty bitmap if needed */
 	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
-		if (kvm_create_dirty_bitmap(&new) < 0)
+		if (kvm_alloc_dirty_bitmap(&new))
 			goto out_free;
+
+		if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
+			bitmap_set(new.dirty_bitmap, 0, new.npages);
 	}
 
 	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
@@ -1255,7 +1258,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 	*flush = false;
-	if (kvm->manual_dirty_log_protect) {
+	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_MANUAL_PROTECT) {
 		/*
 		 * Unlike kvm_get_dirty_log, we always return false in *flush,
 		 * because no flush is needed until KVM_CLEAR_DIRTY_LOG.  There
@@ -3310,9 +3313,6 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_IOEVENTFD_ANY_LENGTH:
 	case KVM_CAP_CHECK_EXTENSION_VM:
 	case KVM_CAP_ENABLE_CAP_VM:
-#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
-	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
-#endif
 		return 1;
 #ifdef CONFIG_KVM_MMIO
 	case KVM_CAP_COALESCED_MMIO:
@@ -3320,6 +3320,10 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_COALESCED_PIO:
 		return 1;
 #endif
+#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
+		return KVM_DIRTY_LOG_MANUAL_CAPS;
+#endif
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
 	case KVM_CAP_IRQ_ROUTING:
 		return KVM_MAX_IRQ_ROUTES;
@@ -3348,7 +3352,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	switch (cap->cap) {
 #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
 	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
-		if (cap->flags || (cap->args[0] & ~1))
+		if (cap->flags ||
+		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
+		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
+		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
+		     * set individually
+		     */
+		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
+			KVM_DIRTY_LOG_INITIALLY_SET))
 			return -EINVAL;
 		kvm->manual_dirty_log_protect = cap->args[0];
 		return 0;
-- 
1.8.3.1




* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Peter Xu @ 2020-02-20 19:17 UTC
  To: Jay Zhou
  Cc: kvm, pbonzini, wangxinxin.wang, weidong.huang,
	sean.j.christopherson, liu.jinsong

On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> It could take kvm->mmu_lock for an extended period of time when
> enabling dirty log for the first time. The main cost is to clear
> all the D-bits of last level SPTEs. This situation can benefit from
> manual dirty log protect as well, which can reduce the mmu_lock
> time taken. The sequence is like this:
> 
> 1. Initialize all the bits of the dirty bitmap to 1 when enabling
>    dirty log for the first time
> 2. Only write protect the huge pages
> 3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
> 4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
>    SPTEs gradually in small chunks
> 
> Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment,
> I did some tests with a 128G windows VM and counted the time taken
> of memory_global_dirty_log_start, here is the numbers:
> 
> VM Size        Before    After optimization
> 128G           460ms     10ms
> 
> Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
> ---
> v2:
>   * add new bit to KVM_ENABLE_CAP for KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 [Paolo]
>   * support non-PML path [Peter]
>   * delete the unnecessary ifdef and make the initialization of bitmap
>     more clear [Sean]
>   * document the new bits and tweak the testcase
> 
>  Documentation/virt/kvm/api.rst               | 23 +++++++++++++++++------
>  arch/x86/kvm/mmu/mmu.c                       |  8 ++++++--
>  arch/x86/kvm/vmx/vmx.c                       |  3 ++-
>  include/linux/kvm_host.h                     |  7 ++++++-
>  tools/testing/selftests/kvm/dirty_log_test.c |  3 ++-
>  virt/kvm/kvm_main.c                          | 25 ++++++++++++++++++-------
>  6 files changed, 51 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 97a72a5..1afd310 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1267,9 +1267,11 @@ pages in the host.
>  The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
>  KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
>  writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
> -use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
> -to make a new slot read-only.  In this case, writes to this memory will be
> -posted to userspace as KVM_EXIT_MMIO exits.
> +use it.  It will be different if the KVM_DIRTY_LOG_INITIALLY_SET flag of
> +KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is set.  For more information, see the
> +description of the capability.  The latter can be set, if KVM_CAP_READONLY_MEM
> +capability allows it, to make a new slot read-only.  In this case, writes to
> +this memory will be posted to userspace as KVM_EXIT_MMIO exits.

Not sure about others, but my own preference is that we keep this part
untouched...  The changed document could be even more confusing to me...

>  
>  When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
>  the memory region are automatically reflected into the guest.  For example, an
> @@ -5704,10 +5706,19 @@ and injected exceptions.
>  :Architectures: x86, arm, arm64, mips
>  :Parameters: args[0] whether feature should be enabled or not
>  
> -With this capability enabled, KVM_GET_DIRTY_LOG will not automatically
> +Valid flags are::
> +
> +  #define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
> +  #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
> +
> +With KVM_DIRTY_LOG_MANUAL_PROTECT set, KVM_GET_DIRTY_LOG will not automatically
>  clear and write-protect all pages that are returned as dirty.
>  Rather, userspace will have to do this operation separately using
> -KVM_CLEAR_DIRTY_LOG.
> +KVM_CLEAR_DIRTY_LOG.  With KVM_DIRTY_LOG_INITIALLY_SET set, all the bits of
> +the dirty bitmap will be initialized to 1 when created, dirty logging will be
> +enabled gradually in small chunks using KVM_CLEAR_DIRTY_LOG ioctl.  However,
> +the KVM_DIRTY_LOG_INITIALLY_SET depends on KVM_DIRTY_LOG_MANUAL_PROTECT, it
> +can not be set individually and supports x86 only for now.

Need s/KVM_DIRTY_LOG_MANUAL_PROTECT/KVM_DIRTY_LOG_MANUAL_PROTECT2/?

>  
>  At the cost of a slightly more complicated operation, this provides better
>  scalability and responsiveness for two reasons.  First,
> @@ -5716,7 +5727,7 @@ than requiring to sync a full memslot; this ensures that KVM does not
>  take spinlocks for an extended period of time.  Second, in some cases a
>  large amount of time can pass between a call to KVM_GET_DIRTY_LOG and
>  userspace actually using the data in the page.  Pages can be modified
> -during this time, which is inefficint for both the guest and userspace:
> +during this time, which is inefficient for both the guest and userspace:
>  the guest will incur a higher penalty due to write protection faults,
>  while userspace can see false reports of dirty pages.  Manual reprotection
>  helps reducing this time, improving guest performance and reducing the
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 87e9ba2..f9c120e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5865,8 +5865,12 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
>  	bool flush;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> -				      false);
> +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> +		flush = slot_handle_large_level(kvm, memslot,
> +						slot_rmap_write_protect, false);
> +	else
> +		flush = slot_handle_all_level(kvm, memslot,
> +						slot_rmap_write_protect, false);
>  	spin_unlock(&kvm->mmu_lock);
>  
>  	/*
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 3be25ec..fcc585a 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7201,7 +7201,8 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
>  static void vmx_slot_enable_log_dirty(struct kvm *kvm,
>  				     struct kvm_memory_slot *slot)
>  {
> -	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> +	if (!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET))
> +		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
>  	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);
>  }
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e89eb67..a555b52 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -39,6 +39,11 @@
>  #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS
>  #endif
>  
> +#define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)
> +#define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
> +#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT | \
> +				KVM_DIRTY_LOG_INITIALLY_SET)
> +
>  /*
>   * The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
>   * in kvm, other bits are visible for userspace which are defined in
> @@ -493,7 +498,7 @@ struct kvm {
>  #endif
>  	long tlbs_dirty;
>  	struct list_head devices;
> -	bool manual_dirty_log_protect;
> +	u64 manual_dirty_log_protect;
>  	struct dentry *debugfs_dentry;
>  	struct kvm_stat_data **debugfs_stat_data;
>  	struct srcu_struct srcu;
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> index 5614222..2a493c1 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
>  	host_bmap_track = bitmap_alloc(host_num_pages);
>  
>  #ifdef USE_CLEAR_DIRTY_LOG
> +	int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
>  	struct kvm_enable_cap cap = {};
>  
>  	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
> -	cap.args[0] = 1;
> +	cap.args[0] = ret;

You enabled the initial-all-set bit but didn't really check it, so it
doesn't help much from the testcase pov...  I'd suggest you drop this
change; you can work on top of it after this patch is accepted.

(Not to mention the original test actually verified that we don't
 break anything, which seems good..)

>  	vm_enable_cap(vm, &cap);
>  #endif
>  
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 70f03ce..f2631d0 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
>   * Allocation size is twice as large as the actual dirty bitmap size.
>   * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
>   */
> -static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> +static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)

This change seems irrelevant..

>  {
>  	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
>  
> @@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  
>  	/* Allocate page dirty bitmap if needed */
>  	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
> -		if (kvm_create_dirty_bitmap(&new) < 0)
> +		if (kvm_alloc_dirty_bitmap(&new))

Same here.

>  			goto out_free;
> +
> +		if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)

(Maybe time to introduce a helper to shorten this check. :)
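
For instance, a minimal sketch of such a helper (the name is only
illustrative):

	static inline bool kvm_dirty_log_initially_set(struct kvm *kvm)
	{
		return !!(kvm->manual_dirty_log_protect &
			  KVM_DIRTY_LOG_INITIALLY_SET);
	}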

> +			bitmap_set(new.dirty_bitmap, 0, new.npages);
>  	}
>  
>  	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> @@ -1255,7 +1258,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
>  
>  	n = kvm_dirty_bitmap_bytes(memslot);
>  	*flush = false;
> -	if (kvm->manual_dirty_log_protect) {
> +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_MANUAL_PROTECT) {

Can also introduce a helper for this too.

Side note: logically this line doesn't need a change, because bit 1
depends on bit 0, so if (kvm->manual_dirty_log_protect) is true then
bit 0 must be set.

>  		/*
>  		 * Unlike kvm_get_dirty_log, we always return false in *flush,
>  		 * because no flush is needed until KVM_CLEAR_DIRTY_LOG.  There
> @@ -3310,9 +3313,6 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  	case KVM_CAP_IOEVENTFD_ANY_LENGTH:
>  	case KVM_CAP_CHECK_EXTENSION_VM:
>  	case KVM_CAP_ENABLE_CAP_VM:
> -#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> -	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> -#endif
>  		return 1;
>  #ifdef CONFIG_KVM_MMIO
>  	case KVM_CAP_COALESCED_MMIO:
> @@ -3320,6 +3320,10 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  	case KVM_CAP_COALESCED_PIO:
>  		return 1;
>  #endif
> +#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> +	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> +		return KVM_DIRTY_LOG_MANUAL_CAPS;
> +#endif
>  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
>  	case KVM_CAP_IRQ_ROUTING:
>  		return KVM_MAX_IRQ_ROUTES;
> @@ -3348,7 +3352,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  	switch (cap->cap) {
>  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
>  	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> -		if (cap->flags || (cap->args[0] & ~1))
> +		if (cap->flags ||
> +		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> +		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
> +		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
> +		     * set individually
> +		     */
> +		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> +			KVM_DIRTY_LOG_INITIALLY_SET))

How about something easier to read? :)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 70f03ce0e5c1..9dfbab2a9929 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3348,7 +3348,10 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
        switch (cap->cap) {
 #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
        case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
-               if (cap->flags || (cap->args[0] & ~1))
+               if (cap->flags || (cap->args[0] & ~3))
+                       return -EINVAL;
+               /* Allow 00, 01, and 11. */
+               if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
                        return -EINVAL;
                kvm->manual_dirty_log_protect = cap->args[0];
                return 0;

Otherwise it looks good to me!

Thanks,

-- 
Peter Xu



* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Sean Christopherson @ 2020-02-20 19:23 UTC
  To: Peter Xu
  Cc: Jay Zhou, kvm, pbonzini, wangxinxin.wang, weidong.huang, liu.jinsong

On Thu, Feb 20, 2020 at 02:17:06PM -0500, Peter Xu wrote:
> On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > @@ -3348,7 +3352,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >  	switch (cap->cap) {
> >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> >  	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > -		if (cap->flags || (cap->args[0] & ~1))
> > +		if (cap->flags ||
> > +		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> > +		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
> > +		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
> > +		     * set individually
> > +		     */
> > +		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> > +			KVM_DIRTY_LOG_INITIALLY_SET))
> 
> How about something easier to read? :)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 70f03ce0e5c1..9dfbab2a9929 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3348,7 +3348,10 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>         switch (cap->cap) {
>  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
>         case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> -               if (cap->flags || (cap->args[0] & ~1))
> +               if (cap->flags || (cap->args[0] & ~3))
> +                       return -EINVAL;
> +               /* Allow 00, 01, and 11. */
> +               if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
>                         return -EINVAL;

Oof, "easier" is subjective :-)  How about this?

	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2: {
		u64 allowed_options = KVM_DIRTY_LOG_MANUAL_PROTECT;

		if (cap->args[0] & KVM_DIRTY_LOG_MANUAL_PROTECT)
			allowed_options = KVM_DIRTY_LOG_INITIALLY_SET;

		if (cap->flags || (cap->args[0] & ~allowed_options))
			return -EINVAL;
		kvm->manual_dirty_log_protect = cap->args[0];
		return 0;
	}

>                 kvm->manual_dirty_log_protect = cap->args[0];
>                 return 0;
> 
> Otherwise it looks good to me!
> 
> Thanks,
> 
> -- 
> Peter Xu
> 


* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Peter Xu @ 2020-02-20 19:28 UTC
  To: Jay Zhou
  Cc: kvm, pbonzini, wangxinxin.wang, weidong.huang,
	sean.j.christopherson, liu.jinsong

On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> @@ -5865,8 +5865,12 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
>  	bool flush;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> -				      false);
> +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> +		flush = slot_handle_large_level(kvm, memslot,
> +						slot_rmap_write_protect, false);
> +	else
> +		flush = slot_handle_all_level(kvm, memslot,
> +						slot_rmap_write_protect, false);

Another extra comment:

I think we should still keep the old behavior for KVM_MEM_READONLY (in
kvm_mmu_slot_apply_flags()) for this...  Say, instead of doing this,
maybe we want kvm_mmu_slot_remove_write_access() to take a new
parameter to decide down to which level we do the wr-protect.

Thanks,

>  	spin_unlock(&kvm->mmu_lock);
>  
>  	/*

-- 
Peter Xu



* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Peter Xu @ 2020-02-20 19:42 UTC
  To: Sean Christopherson
  Cc: Jay Zhou, kvm, pbonzini, wangxinxin.wang, weidong.huang, liu.jinsong

On Thu, Feb 20, 2020 at 11:23:50AM -0800, Sean Christopherson wrote:
> On Thu, Feb 20, 2020 at 02:17:06PM -0500, Peter Xu wrote:
> > On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > > @@ -3348,7 +3352,14 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > >  	switch (cap->cap) {
> > >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > >  	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > > -		if (cap->flags || (cap->args[0] & ~1))
> > > +		if (cap->flags ||
> > > +		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> > > +		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
> > > +		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
> > > +		     * set individually
> > > +		     */
> > > +		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> > > +			KVM_DIRTY_LOG_INITIALLY_SET))
> > 
> > How about something easier to read? :)
> > 
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 70f03ce0e5c1..9dfbab2a9929 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3348,7 +3348,10 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >         switch (cap->cap) {
> >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> >         case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > -               if (cap->flags || (cap->args[0] & ~1))
> > +               if (cap->flags || (cap->args[0] & ~3))
> > +                       return -EINVAL;
> > +               /* Allow 00, 01, and 11. */
> > +               if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
> >                         return -EINVAL;
> 
> Oof, "easier" is subjective :-)  How about this?
> 
> 	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2: {
> 		u64 allowed_options = KVM_DIRTY_LOG_MANUAL_PROTECT;
> 
> 		if (cap->args[0] & KVM_DIRTY_LOG_MANUAL_PROTECT)
> 			allowed_options = KVM_DIRTY_LOG_INITIALLY_SET;
> 
> 		if (cap->flags || (cap->args[0] & ~allowed_options))
> 			return -EINVAL;
> 		kvm->manual_dirty_log_protect = cap->args[0];
> 		return 0;
> 	}

Fine by me! (But I won't tell I still prefer mine ;)

-- 
Peter Xu



* RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Zhoujian (jay) @ 2020-02-21  9:31 UTC
  To: Peter Xu
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 3:17 AM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>; Huangweidong (C)
> <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> (Paul) <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> 
> On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > It could take kvm->mmu_lock for an extended period of time when
> > enabling dirty log for the first time. The main cost is to clear all
> > the D-bits of last level SPTEs. This situation can benefit from manual
> > dirty log protect as well, which can reduce the mmu_lock time taken.
> > The sequence is like this:
> >
> > 1. Initialize all the bits of the dirty bitmap to 1 when enabling
> >    dirty log for the first time
> > 2. Only write protect the huge pages
> > 3. KVM_GET_DIRTY_LOG returns the dirty bitmap info 4.
> > KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
> >    SPTEs gradually in small chunks
> >
> > Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment, I did
> > some tests with a 128G windows VM and counted the time taken of
> > memory_global_dirty_log_start, here is the numbers:
> >
> > VM Size        Before    After optimization
> > 128G           460ms     10ms
> >
> > Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
> > ---
> > v2:
> >   * add new bit to KVM_ENABLE_CAP for
> KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 [Paolo]
> >   * support non-PML path [Peter]
> >   * delete the unnecessary ifdef and make the initialization of bitmap
> >     more clear [Sean]
> >   * document the new bits and tweak the testcase
> >
> >  Documentation/virt/kvm/api.rst               | 23
> +++++++++++++++++------
> >  arch/x86/kvm/mmu/mmu.c                       |  8 ++++++--
> >  arch/x86/kvm/vmx/vmx.c                       |  3 ++-
> >  include/linux/kvm_host.h                     |  7 ++++++-
> >  tools/testing/selftests/kvm/dirty_log_test.c |  3 ++-
> >  virt/kvm/kvm_main.c                          | 25
> ++++++++++++++++++-------
> >  6 files changed, 51 insertions(+), 18 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/api.rst
> > b/Documentation/virt/kvm/api.rst index 97a72a5..1afd310 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -1267,9 +1267,11 @@ pages in the host.
> >  The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
> > KVM_MEM_READONLY.  The former can be set to instruct KVM to keep
> track
> > of  writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to
> > know how to -use it.  The latter can be set, if KVM_CAP_READONLY_MEM
> > capability allows it, -to make a new slot read-only.  In this case,
> > writes to this memory will be -posted to userspace as KVM_EXIT_MMIO
> exits.
> > +use it.  It will be different if the KVM_DIRTY_LOG_INITIALLY_SET flag
> > +of
> > +KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is set.  For more information,
> see
> > +the description of the capability.  The latter can be set, if
> > +KVM_CAP_READONLY_MEM capability allows it, to make a new slot
> > +read-only.  In this case, writes to this memory will be posted to userspace
> as KVM_EXIT_MMIO exits.
> 
> Not sure about others, but my own preference is that we keep this part
> untouched...  The changed document could be even more confusing to me...

I just wanted to mention that something different happens in the other
code logic if the KVM_DIRTY_LOG_INITIALLY_SET flag is set, but I have to
admit that the "difference" is not described clearly here.  It is
described at [1] below, so keeping this part untouched may be a choice.

> >
> >  When the KVM_CAP_SYNC_MMU capability is available, changes in the
> > backing of  the memory region are automatically reflected into the
> > guest.  For example, an @@ -5704,10 +5706,19 @@ and injected
> exceptions.
> >  :Architectures: x86, arm, arm64, mips
> >  :Parameters: args[0] whether feature should be enabled or not
> >
> > -With this capability enabled, KVM_GET_DIRTY_LOG will not
> > automatically
> > +Valid flags are::
> > +
> > +  #define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0)  #define
> > + KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
> > +
> > +With KVM_DIRTY_LOG_MANUAL_PROTECT set, KVM_GET_DIRTY_LOG will
> not
> > +automatically
> >  clear and write-protect all pages that are returned as dirty.
> >  Rather, userspace will have to do this operation separately using
> > -KVM_CLEAR_DIRTY_LOG.
> > +KVM_CLEAR_DIRTY_LOG.  With KVM_DIRTY_LOG_INITIALLY_SET set, all
> the
> > +bits of the dirty bitmap will be initialized to 1 when created, dirty
> > +logging will be enabled gradually in small chunks using
> > +KVM_CLEAR_DIRTY_LOG ioctl.  However, the
> KVM_DIRTY_LOG_INITIALLY_SET
> > +depends on KVM_DIRTY_LOG_MANUAL_PROTECT, it can not be set
> individually and supports x86 only for now.

[1] 

> 
> Need
> s/KVM_DIRTY_LOG_MANUAL_PROTECT/KVM_DIRTY_LOG_MANUAL_PROTEC
> T2/?

Since the ioctl name is KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, this naming
replacement seems reasonable. If no one is against it, I'll take it in the next
version.

> 
> >
> >  At the cost of a slightly more complicated operation, this provides
> > better  scalability and responsiveness for two reasons.  First, @@
> > -5716,7 +5727,7 @@ than requiring to sync a full memslot; this ensures
> > that KVM does not  take spinlocks for an extended period of time.
> > Second, in some cases a  large amount of time can pass between a call
> > to KVM_GET_DIRTY_LOG and  userspace actually using the data in the
> > page.  Pages can be modified -during this time, which is inefficint for both
> the guest and userspace:
> > +during this time, which is inefficient for both the guest and userspace:
> >  the guest will incur a higher penalty due to write protection faults,
> > while userspace can see false reports of dirty pages.  Manual
> > reprotection  helps reducing this time, improving guest performance
> > and reducing the diff --git a/arch/x86/kvm/mmu/mmu.c
> > b/arch/x86/kvm/mmu/mmu.c index 87e9ba2..f9c120e 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5865,8 +5865,12 @@ void
> kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> >  	bool flush;
> >
> >  	spin_lock(&kvm->mmu_lock);
> > -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> > -				      false);
> > +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > +		flush = slot_handle_large_level(kvm, memslot,
> > +						slot_rmap_write_protect, false);
> > +	else
> > +		flush = slot_handle_all_level(kvm, memslot,
> > +						slot_rmap_write_protect, false);
> >  	spin_unlock(&kvm->mmu_lock);
> >
> >  	/*
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > 3be25ec..fcc585a 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7201,7 +7201,8 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu,
> > int cpu)  static void vmx_slot_enable_log_dirty(struct kvm *kvm,
> >  				     struct kvm_memory_slot *slot)  {
> > -	kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> > +	if (!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET))
> > +		kvm_mmu_slot_leaf_clear_dirty(kvm, slot);
> >  	kvm_mmu_slot_largepage_remove_write_access(kvm, slot);  }
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index
> > e89eb67..a555b52 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -39,6 +39,11 @@
> >  #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS  #endif
> >
> > +#define KVM_DIRTY_LOG_MANUAL_PROTECT (1 << 0) #define
> > +KVM_DIRTY_LOG_INITIALLY_SET (1 << 1) #define
> > +KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT |
> \
> > +				KVM_DIRTY_LOG_INITIALLY_SET)
> > +
> >  /*
> >   * The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
> >   * in kvm, other bits are visible for userspace which are defined in
> > @@ -493,7 +498,7 @@ struct kvm {  #endif
> >  	long tlbs_dirty;
> >  	struct list_head devices;
> > -	bool manual_dirty_log_protect;
> > +	u64 manual_dirty_log_protect;
> >  	struct dentry *debugfs_dentry;
> >  	struct kvm_stat_data **debugfs_stat_data;
> >  	struct srcu_struct srcu;
> > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c
> > b/tools/testing/selftests/kvm/dirty_log_test.c
> > index 5614222..2a493c1 100644
> > --- a/tools/testing/selftests/kvm/dirty_log_test.c
> > +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> > @@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode,
> unsigned long iterations,
> >  	host_bmap_track = bitmap_alloc(host_num_pages);
> >
> >  #ifdef USE_CLEAR_DIRTY_LOG
> > +	int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> >  	struct kvm_enable_cap cap = {};
> >
> >  	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
> > -	cap.args[0] = 1;
> > +	cap.args[0] = ret;
> 
> You enabled the initial-all-set but didn't really check it, so it didn't help much
> from the testcase pov...  

vm_enable_cap is called afterwards and the return value is checked
inside it.  May I ask whether this check is enough, or whether the value
needs to be read back through something like vm_get_cap?

> I'd suggest you drop this change, and you can work
> on top after this patch can be accepted.

OK, I think some input parameters for cap.args[0] should be tested: 0, 1
and 3 should be accepted, while other values should not (sketched
below).
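
A rough sketch of such a check, using the raw ioctl so that the failing
cases can be observed (illustrative only; vm_fd is assumed, and
TEST_ASSERT is the selftest harness macro):

	struct kvm_enable_cap cap = { .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 };
	unsigned long long accept[] = { 0, 1, 3 }, reject[] = { 2, 4 };
	int i;

	for (i = 0; i < 3; i++) {
		cap.args[0] = accept[i];	/* 0, 1 and 3 must succeed */
		TEST_ASSERT(!ioctl(vm_fd, KVM_ENABLE_CAP, &cap),
			    "args[0]=%llu should be accepted", accept[i]);
	}
	for (i = 0; i < 2; i++) {
		cap.args[0] = reject[i];	/* INITIALLY_SET alone, unknown bits */
		TEST_ASSERT(ioctl(vm_fd, KVM_ENABLE_CAP, &cap) == -1 &&
			    errno == EINVAL,
			    "args[0]=%llu should be rejected", reject[i]);
	}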

> 
> (Not to mention the original test actually verified that we don't  break, which
> seems good..)
> 
> >  	vm_enable_cap(vm, &cap);
> >  #endif
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > 70f03ce..f2631d0 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode,
> struct file *filp)
> >   * Allocation size is twice as large as the actual dirty bitmap size.
> >   * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
> >   */
> > -static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> > +static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
> 
> This change seems irrelevant..
> 
> >  {
> >  	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
> >
> > @@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
> >
> >  	/* Allocate page dirty bitmap if needed */
> >  	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
> > -		if (kvm_create_dirty_bitmap(&new) < 0)
> > +		if (kvm_alloc_dirty_bitmap(&new))
> 
> Same here.

s/kvm_create_dirty_bitmap/kvm_alloc_dirty_bitmap/ is Sean's suggestion,
to make it clear that the helper is only responsible for allocation; all
the bitmap bits are then set to 1 with bitmap_set afterwards, which
seems reasonable.  Do you still think it's better to keep this name
untouched?

> 
> >  			goto out_free;
> > +
> > +		if (kvm->manual_dirty_log_protect &
> KVM_DIRTY_LOG_INITIALLY_SET)
> 
> (Maybe time to introduce a helper to shorten this check. :)

Yeah, but could this be done on top of this patch?

> 
> > +			bitmap_set(new.dirty_bitmap, 0, new.npages);
> >  	}
> >
> >  	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> > @@ -1255,7 +1258,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
> >
> >  	n = kvm_dirty_bitmap_bytes(memslot);
> >  	*flush = false;
> > -	if (kvm->manual_dirty_log_protect) {
> > +	if (kvm->manual_dirty_log_protect &
> KVM_DIRTY_LOG_MANUAL_PROTECT) {
> 
> Can also introduce a helper for this too.
> 
> Side note: logically this line doesn't need a change, because bit 1 depends on bit
> 0 so if (kvm->manual_dirty_log_protect) it means bit 0 must be set.

Yes, I agree.

> 
> >  		/*
> >  		 * Unlike kvm_get_dirty_log, we always return false in *flush,
> >  		 * because no flush is needed until KVM_CLEAR_DIRTY_LOG.
> There @@
> > -3310,9 +3313,6 @@ static long
> kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >  	case KVM_CAP_IOEVENTFD_ANY_LENGTH:
> >  	case KVM_CAP_CHECK_EXTENSION_VM:
> >  	case KVM_CAP_ENABLE_CAP_VM:
> > -#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > -	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > -#endif
> >  		return 1;
> >  #ifdef CONFIG_KVM_MMIO
> >  	case KVM_CAP_COALESCED_MMIO:
> > @@ -3320,6 +3320,10 @@ static long
> kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >  	case KVM_CAP_COALESCED_PIO:
> >  		return 1;
> >  #endif
> > +#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > +	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > +		return KVM_DIRTY_LOG_MANUAL_CAPS;
> > +#endif
> >  #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> >  	case KVM_CAP_IRQ_ROUTING:
> >  		return KVM_MAX_IRQ_ROUTES;
> > @@ -3348,7 +3352,14 @@ static int
> kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >  	switch (cap->cap) {
> >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> >  	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > -		if (cap->flags || (cap->args[0] & ~1))
> > +		if (cap->flags ||
> > +		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> > +		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET depends
> > +		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not be
> > +		     * set individually
> > +		     */
> > +		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> > +			KVM_DIRTY_LOG_INITIALLY_SET))
> 
> How about something easier to read? :)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> 70f03ce0e5c1..9dfbab2a9929 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3348,7 +3348,10 @@ static int
> kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>         switch (cap->cap) {
>  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
>         case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> -               if (cap->flags || (cap->args[0] & ~1))
> +               if (cap->flags || (cap->args[0] & ~3))
> +                       return -EINVAL;
> +               /* Allow 00, 01, and 11. */
> +               if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
>                         return -EINVAL;
>                 kvm->manual_dirty_log_protect = cap->args[0];
>                 return 0;

Looks simpler than mine.

> Otherwise it looks good to me!

Thanks!


Regards,
Jay Zhou

> 
> Thanks,
> 
> --
> Peter Xu



* RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Zhoujian (jay) @ 2020-02-21  9:43 UTC
  To: Peter Xu, Sean Christopherson
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C), Liujinsong (Paul)



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 3:42 AM
> To: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Zhoujian (jay) <jianjay.zhou@huawei.com>; kvm@vger.kernel.org;
> pbonzini@redhat.com; wangxin (U) <wangxinxin.wang@huawei.com>;
> Huangweidong (C) <weidong.huang@huawei.com>; Liujinsong (Paul)
> <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> 
> On Thu, Feb 20, 2020 at 11:23:50AM -0800, Sean Christopherson wrote:
> > On Thu, Feb 20, 2020 at 02:17:06PM -0500, Peter Xu wrote:
> > > On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > > > @@ -3348,7 +3352,14 @@ static int
> kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > > >  	switch (cap->cap) {
> > > >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > > >  	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > > > -		if (cap->flags || (cap->args[0] & ~1))
> > > > +		if (cap->flags ||
> > > > +		    (cap->args[0] & ~KVM_DIRTY_LOG_MANUAL_CAPS) ||
> > > > +		    /* The capability of KVM_DIRTY_LOG_INITIALLY_SET
> depends
> > > > +		     * on KVM_DIRTY_LOG_MANUAL_PROTECT, it should not
> be
> > > > +		     * set individually
> > > > +		     */
> > > > +		    ((cap->args[0] & KVM_DIRTY_LOG_MANUAL_CAPS) ==
> > > > +			KVM_DIRTY_LOG_INITIALLY_SET))
> > >
> > > How about something easier to read? :)
> > >
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > > 70f03ce0e5c1..9dfbab2a9929 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -3348,7 +3348,10 @@ static int
> kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > >         switch (cap->cap) {
> > >  #ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > >         case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2:
> > > -               if (cap->flags || (cap->args[0] & ~1))
> > > +               if (cap->flags || (cap->args[0] & ~3))
> > > +                       return -EINVAL;
> > > +               /* Allow 00, 01, and 11. */
> > > +               if (cap->args[0] == KVM_DIRTY_LOG_INITIALLY_SET)
> > >                         return -EINVAL;
> >
> > Oof, "easier" is subjective :-)  How about this?
> >
> > 	case KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2: {
> > 		u64 allowed_options = KVM_DIRTY_LOG_MANUAL_PROTECT;
> >
> > 		if (cap->args[0] & KVM_DIRTY_LOG_MANUAL_PROTECT)
> > 			allowed_options = KVM_DIRTY_LOG_INITIALLY_SET;

I believe you mean

		if (cap->args[0] & KVM_DIRTY_LOG_MANUAL_PROTECT)
			allowed_options |= KVM_DIRTY_LOG_INITIALLY_SET;

?

> >
> > 		if (cap->flags || (cap->args[0] & ~allowed_options))
> > 			return -EINVAL;
> > 		kvm->manual_dirty_log_protect = cap->args[0];
> > 		return 0;
> > 	}
> 
> Fine by me! (But I won't tell I still prefer mine ;)

:-)


Regards,
Jay Zhou

> 
> --
> Peter Xu



* RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Zhoujian (jay) @ 2020-02-21  9:53 UTC
  To: Peter Xu
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 3:28 AM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>; Huangweidong (C)
> <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> (Paul) <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> 
> On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > @@ -5865,8 +5865,12 @@ void
> kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> >  	bool flush;
> >
> >  	spin_lock(&kvm->mmu_lock);
> > -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> > -				      false);
> > +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > +		flush = slot_handle_large_level(kvm, memslot,
> > +						slot_rmap_write_protect, false);
> > +	else
> > +		flush = slot_handle_all_level(kvm, memslot,
> > +						slot_rmap_write_protect, false);
> 
> Another extra comment:
> 
> I think we should still keep the old behavior for KVM_MEM_READONLY (in
> kvm_mmu_slot_apply_flags())) for this...  

I also realized this issue after posting this patch, and I agree.

> Say, instead of doing this, maybe we
> want kvm_mmu_slot_remove_write_access() to take a new parameter to
> decide to which level we do the wr-protect.

How about using the "flags" field to distinguish:

	if ((kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET) &&
	    (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES))
		flush = slot_handle_large_level(kvm, memslot,
						slot_rmap_write_protect, false);
	else
		flush = slot_handle_all_level(kvm, memslot,
					      slot_rmap_write_protect, false);

Regards,
Jay Zhou

> 
> Thanks,
> 
> >  	spin_unlock(&kvm->mmu_lock);
> >
> >  	/*
> 
> --
> Peter Xu



* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Peter Xu @ 2020-02-21 15:19 UTC
  To: Zhoujian (jay)
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)

On Fri, Feb 21, 2020 at 09:31:52AM +0000, Zhoujian (jay) wrote:

[...]

> > > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c
> > > b/tools/testing/selftests/kvm/dirty_log_test.c
> > > index 5614222..2a493c1 100644
> > > --- a/tools/testing/selftests/kvm/dirty_log_test.c
> > > +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> > > @@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode,
> > unsigned long iterations,
> > >  	host_bmap_track = bitmap_alloc(host_num_pages);
> > >
> > >  #ifdef USE_CLEAR_DIRTY_LOG
> > > +	int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> > >  	struct kvm_enable_cap cap = {};
> > >
> > >  	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
> > > -	cap.args[0] = 1;
> > > +	cap.args[0] = ret;
> > 
> > You enabled the initial-all-set but didn't really check it, so it didn't help much
> > from the testcase pov...  
> 
> vm_enable_cap is called afterwards, the return value is checked inside it,
> may I ask this check is enough, or it is needed to get the value through
> something like vm_get_cap ?

I meant to check what the initial-all-set feature bit offers, which is:
we should get the bitmap before dirtying anything and verify that it's
all ones.
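
For example, in dirty_log_test.c terms (a sketch only; names follow the
existing test, and the exact placement would need care):

	/*
	 * With initial-all-set enabled, the very first KVM_GET_DIRTY_LOG,
	 * issued before the guest has dirtied anything, should report
	 * every page of the slot as dirty.
	 */
	kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
	for (page = 0; page < host_num_pages; page++)
		TEST_ASSERT(test_bit_le(page, bmap),
			    "Page %lu should be dirty right after enabling "
			    "dirty logging", page);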

> 
> > I'd suggest you drop this change, and you can work
> > on top after this patch can be accepted.
> 
> OK, some input parameters for cap.args[0] should be tested I think: 0, 1, 3
> should be accepted, the other numbers will not.

Yes.  I think it'll be fine too if you want to put the test changes
into this patch.  It's just not required, so this patch could
potentially get merged more easily, since the test may not be a
one-liner change.

> 
> > 
> > (Not to mention the original test actually verified that we don't  break, which
> > seems good..)
> > 
> > >  	vm_enable_cap(vm, &cap);
> > >  #endif
> > >
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index
> > > 70f03ce..f2631d0 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode,
> > struct file *filp)
> > >   * Allocation size is twice as large as the actual dirty bitmap size.
> > >   * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
> > >   */
> > > -static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> > > +static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
> > 
> > This change seems irrelevant..
> > 
> > >  {
> > >  	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
> > >
> > > @@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > >
> > >  	/* Allocate page dirty bitmap if needed */
> > >  	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
> > > -		if (kvm_create_dirty_bitmap(&new) < 0)
> > > +		if (kvm_alloc_dirty_bitmap(&new))
> > 
> > Same here.
> 
> s/kvm_create_dirty_bitmap/kvm_alloc_dirty_bitmap is Sean's suggestion to make
> it clear that the helper is only responsible for allocation, then set all the bitmap bits
> to 1 using bitmap_set afterwards, which seems reasonable. Do you still think it's
> better to keep this name untouched?

No strong opinion, feel free to take your preference (because we've
got three here and if you like it too then it's already 2:1 :).

> 
> > 
> > >  			goto out_free;
> > > +
> > > +		if (kvm->manual_dirty_log_protect &
> > KVM_DIRTY_LOG_INITIALLY_SET)
> > 
> > (Maybe time to introduce a helper to shorten this check. :)
> 
> Yeah, but could this be done on top of this patch?

You're going to repost after all, right?  If so, IMO it's easier to
just add the helper in the same patch.

Thanks,

-- 
Peter Xu



* Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
From: Peter Xu @ 2020-02-21 15:41 UTC
  To: Zhoujian (jay)
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)

On Fri, Feb 21, 2020 at 09:53:51AM +0000, Zhoujian (jay) wrote:
> 
> 
> > -----Original Message-----
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Friday, February 21, 2020 3:28 AM
> > To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> > Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> > <wangxinxin.wang@huawei.com>; Huangweidong (C)
> > <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> > (Paul) <liu.jinsong@huawei.com>
> > Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> > 
> > On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > > @@ -5865,8 +5865,12 @@ void
> > kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> > >  	bool flush;
> > >
> > >  	spin_lock(&kvm->mmu_lock);
> > > -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> > > -				      false);
> > > +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > > +		flush = slot_handle_large_level(kvm, memslot,
> > > +						slot_rmap_write_protect, false);
> > > +	else
> > > +		flush = slot_handle_all_level(kvm, memslot,
> > > +						slot_rmap_write_protect, false);
> > 
> > Another extra comment:
> > 
> > I think we should still keep the old behavior for KVM_MEM_READONLY (in
> > kvm_mmu_slot_apply_flags())) for this...  
> 
> I also realized this issue after posting this patch, and I agree.
> 
> > Say, instead of doing this, maybe we
> > want kvm_mmu_slot_remove_write_access() to take a new parameter to
> > decide to which level we do the wr-protect.
> 
> How about using the "flags" field to distinguish:
> 
> 		if ((kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
>                 && (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES))
>                 flush = slot_handle_large_level(kvm, memslot,
>                                         slot_rmap_write_protect, false);
>         else
>                 flush = slot_handle_all_level(kvm, memslot,
>                                         slot_rmap_write_protect, false);

This seems to be OK too.  But just to show what I meant (which I still
think could be a bit clearer; assuming kvm_manual_dirty_log_init_set()
is the helper you'll introduce):

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 40a0c0fd95ca..a90630cde92d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1312,7 +1312,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,

 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
-                                     struct kvm_memory_slot *memslot);
+                                     struct kvm_memory_slot *memslot,
+                                     int start_level);
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
                                   const struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 87e9ba27ada1..f538b7977fa2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5860,13 +5860,14 @@ static bool slot_rmap_write_protect(struct kvm *kvm,
 }

 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
-                                     struct kvm_memory_slot *memslot)
+                                     struct kvm_memory_slot *memslot,
+                                     int start_level)
 {
        bool flush;

        spin_lock(&kvm->mmu_lock);
-       flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
-                                     false);
+       flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
+                                 start_level, PT_MAX_HUGEPAGE_LEVEL, false);
        spin_unlock(&kvm->mmu_lock);

        /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb5d64ebc35d..2ed3204dfd9f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9956,7 +9956,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 {
        /* Still write protect RO slot */
        if (new->flags & KVM_MEM_READONLY) {
-               kvm_mmu_slot_remove_write_access(kvm, new);
+               kvm_mmu_slot_remove_write_access(kvm, new, PT_PAGE_TABLE_LEVEL);
                return;
        }

@@ -9993,8 +9993,20 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
        if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
                if (kvm_x86_ops->slot_enable_log_dirty)
                        kvm_x86_ops->slot_enable_log_dirty(kvm, new);
-               else
-                       kvm_mmu_slot_remove_write_access(kvm, new);
+               else {
+                       int level = kvm_manual_dirty_log_init_set(kvm) ?
+                           PT_DIRECTORY_LEVEL : PT_PAGE_TABLE_LEVEL;
+
+                       /*
+                        * If we're with intial-all-set, we don't need
+                        * to write protect any small page because
+                        * they're reported as dirty already.  However
+                        * we still need to write-protect huge pages
+                        * so that the page split can happen lazily on
+                        * the first write to the huge page.
+                        */
+                       kvm_mmu_slot_remove_write_access(kvm, new, level);
+               }
        } else {
                if (kvm_x86_ops->slot_disable_log_dirty)
                        kvm_x86_ops->slot_disable_log_dirty(kvm, new);
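
For context, this works because slot_handle_all_level() and
slot_handle_large_level() are both thin wrappers around slot_handle_level()
(copied roughly from the current mmu.c for illustration):

static bool
slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
		      slot_level_handler fn, bool lock_flush_tlb)
{
	return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL,
				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
}

static bool
slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
			slot_level_handler fn, bool lock_flush_tlb)
{
	return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL + 1,
				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
}

So passing start_level straight into slot_handle_level() subsumes both
wrappers with a single call.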

Thanks,

-- 
Peter Xu



* RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
  2020-02-21 15:19     ` Peter Xu
@ 2020-02-22  8:11       ` Zhoujian (jay)
  0 siblings, 0 replies; 12+ messages in thread
From: Zhoujian (jay) @ 2020-02-22  8:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 11:19 PM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>; Huangweidong (C)
> <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> (Paul) <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> 
> On Fri, Feb 21, 2020 at 09:31:52AM +0000, Zhoujian (jay) wrote:
> 
> [...]
> 
> > > > diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> > > > index 5614222..2a493c1 100644
> > > > --- a/tools/testing/selftests/kvm/dirty_log_test.c
> > > > +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> > > > @@ -317,10 +317,11 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
> > > >  	host_bmap_track = bitmap_alloc(host_num_pages);
> > > >
> > > >  #ifdef USE_CLEAR_DIRTY_LOG
> > > > +	int ret = kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> > > >  	struct kvm_enable_cap cap = {};
> > > >
> > > >  	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
> > > > -	cap.args[0] = 1;
> > > > +	cap.args[0] = ret;
> > >
> > > You enabled the initial-all-set but didn't really check it, so it
> > > didn't help much from the testcase pov...
> >
> > vm_enable_cap is called afterwards, and the return value is checked inside
> > it. May I ask whether this check is enough, or whether the value needs to be
> > fetched through something like vm_get_cap?
> 
> I meant to check what is offered by the initial-all-set feature bit, which is: we
> should get the bitmap before dirtying and verify that it's all ones.

OK, I got your meaning now.
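
So something like the below should do (a rough sketch reusing the helpers
already used by dirty_log_test.c; the assertion message is made up):

	/* With initial-all-set enabled, the very first KVM_GET_DIRTY_LOG
	 * must report every page of the slot as dirty.  Note that
	 * test_and_clear_bit_le() also clears our local copy of the
	 * bitmap, which is fine right after enabling.
	 */
	unsigned long page;

	kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
	for (page = 0; page < host_num_pages; page++)
		TEST_ASSERT(test_and_clear_bit_le(page, bmap),
			    "Page %lu should start out dirty", page);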

> 
> >
> > > I'd suggest you drop this change, and you can work on top after this
> > > patch is accepted.
> >
> > OK, I think some input values for cap.args[0] should be tested: 0, 1 and 3
> > should be accepted, while other values should be rejected.
> 
> Yes. I think it'll be fine too if you want to put the test changes into this patch.
> It's just not required, so this patch could potentially get merged more easily,
> since the test may not be a one-liner change.

Yeah, will drop it in the next version.
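
For the record, the explicit form (instead of feeding the kvm_check_cap()
return value straight into args[0]) would be something like this sketch,
using the flag names this patch defines:

	struct kvm_enable_cap cap = {};

	cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2;
	/* Enable manual protect plus the initial-all-set optimization. */
	cap.args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT | KVM_DIRTY_LOG_INITIALLY_SET;
	vm_enable_cap(vm, &cap);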

> 
> >
> > >
> > > (Not to mention the original test actually verified that we don't
> > > break anything, which seems good...)
> > >
> > > >  	vm_enable_cap(vm, &cap);
> > > >  #endif
> > > >
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 70f03ce..f2631d0 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -862,7 +862,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
> > > >   * Allocation size is twice as large as the actual dirty bitmap size.
> > > >   * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
> > > >   */
> > > > -static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
> > > > +static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
> > >
> > > This change seems irrelevant..
> > >
> > > >  {
> > > >  	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
> > > >
> > > > @@ -1094,8 +1094,11 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > > >
> > > >  	/* Allocate page dirty bitmap if needed */
> > > >  	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
> > > > -		if (kvm_create_dirty_bitmap(&new) < 0)
> > > > +		if (kvm_alloc_dirty_bitmap(&new))
> > >
> > > Same here.
> >
> > s/kvm_create_dirty_bitmap/kvm_alloc_dirty_bitmap is Sean's suggestion,
> > to make it clear that the helper is only responsible for allocation; all
> > the bitmap bits are then set to 1 with bitmap_set afterwards, which
> > seems reasonable. Do you still think it's better to keep this name untouched?
> 
> No strong opinion, feel free to take your preference (because we've got three
> here and if you like it too then it's already 2:1 :).
> 
> >
> > >
> > > >  			goto out_free;
> > > > +
> > > > +		if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > >
> > > (Maybe time to introduce a helper to shorten this check. :)
> >
> > Yeah, but could this be done on top of this patch?
> 
> You're going to repost after all, right?  If so, IMO it's easier to just add the
> helper in the same patch.

OK, will do in v3.

Regards,
Jay Zhou

> 
> Thanks,
> 
> --
> Peter Xu



* RE: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
  2020-02-21 15:41     ` Peter Xu
@ 2020-02-22  8:18       ` Zhoujian (jay)
  0 siblings, 0 replies; 12+ messages in thread
From: Zhoujian (jay) @ 2020-02-22  8:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, pbonzini, wangxin (U), Huangweidong (C),
	sean.j.christopherson, Liujinsong (Paul)



> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Friday, February 21, 2020 11:41 PM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> <wangxinxin.wang@huawei.com>; Huangweidong (C)
> <weidong.huang@huawei.com>; sean.j.christopherson@intel.com; Liujinsong
> (Paul) <liu.jinsong@huawei.com>
> Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in small chunks
> 
> On Fri, Feb 21, 2020 at 09:53:51AM +0000, Zhoujian (jay) wrote:
> >
> >
> > > -----Original Message-----
> > > From: Peter Xu [mailto:peterx@redhat.com]
> > > Sent: Friday, February 21, 2020 3:28 AM
> > > To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> > > Cc: kvm@vger.kernel.org; pbonzini@redhat.com; wangxin (U)
> > > <wangxinxin.wang@huawei.com>; Huangweidong (C)
> > > <weidong.huang@huawei.com>; sean.j.christopherson@intel.com;
> > > Liujinsong
> > > (Paul) <liu.jinsong@huawei.com>
> > > Subject: Re: [PATCH v2] KVM: x86: enable dirty log gradually in
> > > small chunks
> > >
> > > On Thu, Feb 20, 2020 at 12:28:28PM +0800, Jay Zhou wrote:
> > > > @@ -5865,8 +5865,12 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> > > >  	bool flush;
> > > >
> > > >  	spin_lock(&kvm->mmu_lock);
> > > > -	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> > > > -				      false);
> > > > +	if (kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> > > > +		flush = slot_handle_large_level(kvm, memslot,
> > > > +						slot_rmap_write_protect, false);
> > > > +	else
> > > > +		flush = slot_handle_all_level(kvm, memslot,
> > > > +						slot_rmap_write_protect, false);
> > >
> > > Another extra comment:
> > >
> > > I think we should still keep the old behavior for KVM_MEM_READONLY
> > > (in kvm_mmu_slot_apply_flags()) for this...
> >
> > I also realized this issue after posting this patch, and I agree.
> >
> > > Say, instead of doing this, maybe we want
> > > kvm_mmu_slot_remove_write_access() to take a new parameter to decide
> > > to which level we do the wr-protect.
> >
> > How about using the "flags" field to distinguish:
> >
> > 		if ((kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET)
> >                 && (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES))
> >                 flush = slot_handle_large_level(kvm, memslot,
> >                                         slot_rmap_write_protect, false);
> >         else
> >                 flush = slot_handle_all_level(kvm, memslot,
> >                                         slot_rmap_write_protect, false);
> 
> This seems to be OK too.  But just to show what I meant (which I still think
> could be a bit clearer; assuming kvm_manual_dirty_log_init_set() is the helper
> you'll introduce):
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 40a0c0fd95ca..a90630cde92d 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1312,7 +1312,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
> 
>  void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> -                                     struct kvm_memory_slot *memslot);
> +                                     struct kvm_memory_slot *memslot,
> +                                     int start_level);
>  void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
>                                    const struct kvm_memory_slot *memslot);
>  void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 87e9ba27ada1..f538b7977fa2 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5860,13 +5860,14 @@ static bool slot_rmap_write_protect(struct kvm *kvm,
>  }
> 
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
> -                                     struct kvm_memory_slot *memslot)
> +                                     struct kvm_memory_slot *memslot,
> +                                     int start_level)
>  {
>         bool flush;
> 
>         spin_lock(&kvm->mmu_lock);
> -       flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
> -                                     false);
> +       flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
> +                                 start_level, PT_MAX_HUGEPAGE_LEVEL, false);
>         spin_unlock(&kvm->mmu_lock);
> 
>         /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fb5d64ebc35d..2ed3204dfd9f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9956,7 +9956,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>  {
>         /* Still write protect RO slot */
>         if (new->flags & KVM_MEM_READONLY) {
> -               kvm_mmu_slot_remove_write_access(kvm, new);
> +               kvm_mmu_slot_remove_write_access(kvm, new, PT_PAGE_TABLE_LEVEL);
>                 return;
>         }
> 
> @@ -9993,8 +9993,20 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>         if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
>                 if (kvm_x86_ops->slot_enable_log_dirty)
>                         kvm_x86_ops->slot_enable_log_dirty(kvm, new);
> -               else
> -                       kvm_mmu_slot_remove_write_access(kvm, new);
> +               else {
> +                       int level = kvm_manual_dirty_log_init_set(kvm) ?
> +                           PT_DIRECTORY_LEVEL : PT_PAGE_TABLE_LEVEL;
> +
> +                       /*
> +                        * If we're with intial-all-set, we don't need

s/intial/initial

> +                        * to write protect any small page because
> +                        * they're reported as dirty already.  However
> +                        * we still need to write-protect huge pages
> +                        * so that the page split can happen lazily on
> +                        * the first write to the huge page.
> +                        */
> +                       kvm_mmu_slot_remove_write_access(kvm, new, level);
> +               }
>         } else {
>                 if (kvm_x86_ops->slot_disable_log_dirty)
>                         kvm_x86_ops->slot_disable_log_dirty(kvm, new);
> 

Good suggestion, it is indeed much clearer to add a new start_level parameter
to kvm_mmu_slot_remove_write_access. Will do this in v3, thanks!

Regards,
Jay Zhou


end of thread

Thread overview: 12+ messages
2020-02-20  4:28 [PATCH v2] KVM: x86: enable dirty log gradually in small chunks Jay Zhou
2020-02-20 19:17 ` Peter Xu
2020-02-20 19:23   ` Sean Christopherson
2020-02-20 19:42     ` Peter Xu
2020-02-21  9:43       ` Zhoujian (jay)
2020-02-21  9:31   ` Zhoujian (jay)
2020-02-21 15:19     ` Peter Xu
2020-02-22  8:11       ` Zhoujian (jay)
2020-02-20 19:28 ` Peter Xu
2020-02-21  9:53   ` Zhoujian (jay)
2020-02-21 15:41     ` Peter Xu
2020-02-22  8:18       ` Zhoujian (jay)
