* [PATCH being tested] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
@ 2012-04-11 11:22 Takuya Yoshikawa
  2012-04-12 22:56 ` Marcelo Tosatti
  0 siblings, 1 reply; 3+ messages in thread
From: Takuya Yoshikawa @ 2012-04-11 11:22 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, yoshikawa.takuya

I am now testing the following patch.

Note: this technique is used in several subsystems, e.g. jbd.
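
For reference, the general shape of that lock-break pattern is
sketched below.  This is only an illustration of the idiom, not code
taken from jbd or KVM: my_ctx, have_work(), process_one_item() and
BATCH_SIZE are made-up names.

#include <linux/sched.h>
#include <linux/spinlock.h>

/*
 * Illustrative sketch: do bounded batches of work under a spinlock
 * and back off whenever the scheduler or another lock waiter needs
 * the CPU or the lock.
 */
static void process_all_items(struct my_ctx *ctx)
{
	int batch = 0;

	spin_lock(&ctx->lock);
	while (have_work(ctx)) {
		process_one_item(ctx);	/* work done under ctx->lock */

		if (++batch > BATCH_SIZE) {
			if (need_resched() || spin_needbreak(&ctx->lock)) {
				spin_unlock(&ctx->lock);
				cond_resched();
				spin_lock(&ctx->lock);
			}
			batch = 0;
		}
	}
	spin_unlock(&ctx->lock);
}

The patch below follows the same structure, but additionally flushes
remote TLBs before dropping mmu_lock, presumably so that vCPUs cannot
keep writing through stale writable TLB entries for pages that have
already been reported clean.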

Although people tend to say that holding mmu_lock during get_dirty is
always a problem, my impression is slightly different.

When we call get_dirty, most of the hot memory pages have already
been written at least once and faults are becoming rare.

Actually, I rarely saw rescheduling due to mmu_lock contention when
I tested this patch locally -- though my testing is not yet extensive.

In contrast, if we take the O(1) approach, we need to write protect
511 pages soon after the get_dirty, and the chance of mmu_lock
contention may increase if multiple VCPUs try to write to memory.

Anyway, this patch is small and seems effective.
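
For context, the path this patch touches is reached from userspace
via the KVM_GET_DIRTY_LOG ioctl.  A minimal sketch of the caller side
(error handling omitted; vm_fd, slot and bitmap are assumed to come
from the usual KVM_CREATE_VM / KVM_SET_USER_MEMORY_REGION setup with
KVM_MEM_LOG_DIRTY_PAGES enabled on the slot):

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Fetch the dirty bitmap for one memslot.  "bitmap" must hold at
 * least one bit per page in the slot (npages / 8 bytes, rounded up).
 * On x86 this ioctl ends up in kvm_vm_ioctl_get_dirty_log(), the
 * function modified below.
 */
static int get_dirty_bitmap(int vm_fd, unsigned int slot, void *bitmap)
{
	struct kvm_dirty_log log;

	memset(&log, 0, sizeof(log));
	log.slot = slot;
	log.dirty_bitmap = bitmap;

	return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
}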

	Takuya

===
From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

get_dirty_log() needs to hold mmu_lock while write protecting dirty
pages, and this can take a long time when there are many dirty pages
to protect.

As the guest can fault during that time, this may result in a severe
latency problem which would prevent the system from scaling.

This patch mitigates the problem by checking for mmu_lock contention
after every 2K dirty pages we protect: we selected this value because
it took about 100us to write protect 2K dirty pages.
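
(For rough scale: ~100us per 2K pages is about 50ns per page, so a
single pass over a slot with 10 million dirty 4K pages -- roughly
40GB of dirty memory -- would otherwise hold mmu_lock for about
500ms.)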

TODO: more numbers.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |    6 +++---
 arch/x86/kvm/mmu.c              |   12 +++++++++---
 arch/x86/kvm/x86.c              |   18 +++++++++++++-----
 3 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f624ca7..26b39c1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -712,9 +712,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask);
+int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+				    struct kvm_memory_slot *slot,
+				    gfn_t gfn_offset, unsigned long mask);
 void kvm_mmu_zap_all(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 29ad6f9..b88c5cc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1081,20 +1081,26 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
  *
  * Used when we do not need to care about huge page mappings: e.g. during dirty
  * logging we do not have any such mappings.
+ *
+ * Returns the number of pages protected by this.
  */
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask)
+int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+				    struct kvm_memory_slot *slot,
+				    gfn_t gfn_offset, unsigned long mask)
 {
 	unsigned long *rmapp;
+	int nr_protected = 0;
 
 	while (mask) {
 		rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
 		__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
+		++nr_protected;
 
 		/* clear the first set bit */
 		mask &= mask - 1;
 	}
+
+	return nr_protected;
 }
 
 static int rmap_write_protect(struct kvm *kvm, u64 gfn)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0d9a578..b636669 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3092,7 +3092,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long n, i;
 	unsigned long *dirty_bitmap;
 	unsigned long *dirty_bitmap_buffer;
-	bool is_dirty = false;
+	int nr_protected = 0;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -3121,15 +3121,23 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 		if (!dirty_bitmap[i])
 			continue;
 
-		is_dirty = true;
-
 		mask = xchg(&dirty_bitmap[i], 0);
 		dirty_bitmap_buffer[i] = mask;
 
 		offset = i * BITS_PER_LONG;
-		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+		nr_protected += kvm_mmu_write_protect_pt_masked(kvm, memslot,
+								offset, mask);
+		if (nr_protected > 2048) {
+			if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+				kvm_flush_remote_tlbs(kvm);
+				spin_unlock(&kvm->mmu_lock);
+				cond_resched();
+				spin_lock(&kvm->mmu_lock);
+			}
+			nr_protected = 0;
+		}
 	}
-	if (is_dirty)
+	if (nr_protected)
 		kvm_flush_remote_tlbs(kvm);
 
 	spin_unlock(&kvm->mmu_lock);
-- 
1.7.5.4



* Re: [PATCH being tested] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-11 11:22 [PATCH being tested] KVM: Reduce mmu_lock contention during dirty logging by cond_resched() Takuya Yoshikawa
@ 2012-04-12 22:56 ` Marcelo Tosatti
  2012-04-14  0:35   ` Takuya Yoshikawa
  0 siblings, 1 reply; 3+ messages in thread
From: Marcelo Tosatti @ 2012-04-12 22:56 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: avi, kvm, yoshikawa.takuya

On Wed, Apr 11, 2012 at 08:22:07PM +0900, Takuya Yoshikawa wrote:
> I am now testing the following patch.
> 
> Note: this technique is used in several subsystems, e.g. jbd.
> 
> Although people tend to say that holding mmu_lock during get_dirty is
> always a problem, my impression is slightly different.

Other than potential performance improvement, the worst case scenario
of holding mmu_lock for hundreds of milliseconds at the beginning 
of migration of huge guests must be fixed.

> When we call get_dirty, most of the hot memory pages have already
> been written at least once and faults are becoming rare.
> 
> Actually, I rarely saw rescheduling due to mmu_lock contention when
> I tested this patch locally -- though my testing is not yet extensive.
> 
> In contrast, if we take the O(1) approach, we need to write protect
> 511 pages soon after the get_dirty, and the chance of mmu_lock
> contention may increase if multiple VCPUs try to write to memory.
> 
> Anyway, this patch is small and seems effective.
> 
> 	Takuya
> 
> ===
> From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> 
> get_dirty_log() needs to hold mmu_lock while write protecting dirty
> pages, and this can take a long time when there are many dirty pages
> to protect.
> 
> As the guest can fault during that time, this may result in a severe
> latency problem which would prevent the system from scaling.
> 
> This patch mitigates the problem by checking for mmu_lock contention
> after every 2K dirty pages we protect: we selected this value because
> it took about 100us to write protect 2K dirty pages.
> 
> TODO: more numbers.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
>  arch/x86/include/asm/kvm_host.h |    6 +++---
>  arch/x86/kvm/mmu.c              |   12 +++++++++---
>  arch/x86/kvm/x86.c              |   18 +++++++++++++-----
>  3 files changed, 25 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f624ca7..26b39c1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -712,9 +712,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>  
>  int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> -				     struct kvm_memory_slot *slot,
> -				     gfn_t gfn_offset, unsigned long mask);
> +int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +				    struct kvm_memory_slot *slot,
> +				    gfn_t gfn_offset, unsigned long mask);
>  void kvm_mmu_zap_all(struct kvm *kvm);
>  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 29ad6f9..b88c5cc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1081,20 +1081,26 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
>   *
>   * Used when we do not need to care about huge page mappings: e.g. during dirty
>   * logging we do not have any such mappings.
> + *
> + * Returns the number of pages protected by this.
>   */
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> -				     struct kvm_memory_slot *slot,
> -				     gfn_t gfn_offset, unsigned long mask)
> +int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +				    struct kvm_memory_slot *slot,
> +				    gfn_t gfn_offset, unsigned long mask)
>  {
>  	unsigned long *rmapp;
> +	int nr_protected = 0;
>  
>  	while (mask) {
>  		rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
>  		__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
> +		++nr_protected;
>  
>  		/* clear the first set bit */
>  		mask &= mask - 1;
>  	}
> +
> +	return nr_protected;
>  }
>  
>  static int rmap_write_protect(struct kvm *kvm, u64 gfn)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0d9a578..b636669 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3092,7 +3092,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  	unsigned long n, i;
>  	unsigned long *dirty_bitmap;
>  	unsigned long *dirty_bitmap_buffer;
> -	bool is_dirty = false;
> +	int nr_protected = 0;
>  
>  	mutex_lock(&kvm->slots_lock);
>  
> @@ -3121,15 +3121,23 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  		if (!dirty_bitmap[i])
>  			continue;
>  
> -		is_dirty = true;
> -
>  		mask = xchg(&dirty_bitmap[i], 0);
>  		dirty_bitmap_buffer[i] = mask;
>  
>  		offset = i * BITS_PER_LONG;
> -		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> +		nr_protected += kvm_mmu_write_protect_pt_masked(kvm, memslot,
> +								offset, mask);
> +		if (nr_protected > 2048) {

Can you expand on the reasoning behind this?



* Re: [PATCH being tested] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-12 22:56 ` Marcelo Tosatti
@ 2012-04-14  0:35   ` Takuya Yoshikawa
  0 siblings, 0 replies; 3+ messages in thread
From: Takuya Yoshikawa @ 2012-04-14  0:35 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: avi, kvm, yoshikawa.takuya

On Thu, 12 Apr 2012 19:56:45 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> Other than potential performance improvement, the worst case scenario
> of holding mmu_lock for hundreds of milliseconds at the beginning 
> of migration of huge guests must be fixed.

Write protection in kvm_arch_commit_memory_region() can be treated
similarly.

I am now checking that part together with my
	KVM: Avoid zapping unrelated shadows in __kvm_set_memory_region()
patch, because if we do rmap-based write protection when we enable
dirty logging, the sp->slot_bitmap needed by
kvm_mmu_slot_remove_write_access() can be removed.

People who want to support more devices/slots will be happy, no?

But then, I need to find another way to eliminate shadow flushes.
Maybe I should leave sp->slot_bitmap removal to those who really want to
do that.


> > @@ -3121,15 +3121,23 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> >  		if (!dirty_bitmap[i])
> >  			continue;
> >  
> > -		is_dirty = true;
> > -
> >  		mask = xchg(&dirty_bitmap[i], 0);
> >  		dirty_bitmap_buffer[i] = mask;
> >  
> >  		offset = i * BITS_PER_LONG;
> > -		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> > +		nr_protected += kvm_mmu_write_protect_pt_masked(kvm, memslot,
> > +								offset, mask);
> > +		if (nr_protected > 2048) {
> 
> Can you expand on the reasoning behind this?


Sure.

Thanks,
	Takuya

