* [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes
@ 2018-12-06  7:58 Wanpeng Li
  2018-12-14  7:24 ` Wanpeng Li
  2018-12-20 14:43 ` Radim Krčmář
  0 siblings, 2 replies; 5+ messages in thread
From: Wanpeng Li @ 2018-12-06  7:58 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář

From: Wanpeng Li <wanpengli@tencent.com>

Last year, engineers from Huawei reported that a call to memory_global_dirty_log_start/stop()
takes 13s for a guest with 4T of memory and freezes the guest for too long, adding unacceptable
migration downtime. [1] [2]

Guangrong pointed out:

| collapsible_sptes zaps 4k mappings to make memory-read happy, it is not
| required by the semanteme of KVM_SET_USER_MEMORY_REGION and it is not
| urgent for vCPU's running, it could be done in a separate thread and use
| lock-break technology.

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05249.html
[2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg449994.html

Guests with several TB of memory are common now that NVDIMM is deployed in cloud environments.
This patch uses a worker thread to zap collapsible sptes, lazily collapsing small sptes into
large sptes during the rollback after a live migration fails.
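
For readers unfamiliar with the "lock-break" idiom Guangrong refers to, a minimal sketch
of the pattern the worker below applies to kvm->mmu_lock (for_each_item()/do_work() are
placeholders, not real KVM code):

	spin_lock(&kvm->mmu_lock);
	for_each_item(item) {
		do_work(item);	/* one bounded chunk of zapping */
		/*
		 * If another CPU is spinning on the lock, or this thread should
		 * reschedule, cond_resched_lock() drops the lock, yields, and
		 * re-acquires it before the loop continues.
		 */
		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
			cond_resched_lock(&kvm->mmu_lock);
	}
	spin_unlock(&kvm->mmu_lock);

A long walk then never monopolizes mmu_lock: vCPU page faults wait for at most one
chunk of work instead of the whole pass.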

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/mmu.c              | 37 ++++++++++++++++++++++++++++++++-----
 arch/x86/kvm/x86.c              |  4 ++++
 3 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fbda5a9..dde32f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -892,6 +892,8 @@ struct kvm_arch {
 	u64 master_cycle_now;
 	struct delayed_work kvmclock_update_work;
 	struct delayed_work kvmclock_sync_work;
+	struct delayed_work kvm_mmu_zap_collapsible_sptes_work;
+	bool zap_in_progress;
 
 	struct kvm_xen_hvm_config xen_hvm_config;
 
@@ -1247,6 +1249,7 @@ void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, struct kvm_memslots *slots);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
+void zap_collapsible_sptes_fn(struct work_struct *work);
 
 int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
 bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7c03c0f..fe87dd3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5679,14 +5679,41 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 	return need_tlb_flush;
 }
 
+void zap_collapsible_sptes_fn(struct work_struct *work)
+{
+	struct kvm_memory_slot *memslot;
+	struct kvm_memslots *slots;
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
+					   kvm_mmu_zap_collapsible_sptes_work);
+	struct kvm *kvm = container_of(ka, struct kvm, arch);
+	int i;
+
+	mutex_lock(&kvm->slots_lock);
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		spin_lock(&kvm->mmu_lock);
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot(memslot, slots) {
+			slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
+				kvm_mmu_zap_collapsible_spte, true);
+			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+				cond_resched_lock(&kvm->mmu_lock);
+		}
+		spin_unlock(&kvm->mmu_lock);
+	}
+	kvm->arch.zap_in_progress = false;
+	mutex_unlock(&kvm->slots_lock);
+}
+
+#define KVM_MMU_ZAP_DELAYED (60 * HZ)
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot)
 {
-	/* FIXME: const-ify all uses of struct kvm_memory_slot.  */
-	spin_lock(&kvm->mmu_lock);
-	slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
-			 kvm_mmu_zap_collapsible_spte, true);
-	spin_unlock(&kvm->mmu_lock);
+	if (!kvm->arch.zap_in_progress) {
+		kvm->arch.zap_in_progress = true;
+		schedule_delayed_work(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work,
+			KVM_MMU_ZAP_DELAYED);
+	}
 }
 
 void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d029377..c2af289 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9019,6 +9019,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
+	INIT_DELAYED_WORK(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work,
+			zap_collapsible_sptes_fn);
+	kvm->arch.zap_in_progress = false;
 
 	kvm_hv_init_vm(kvm);
 	kvm_page_track_init(kvm);
@@ -9064,6 +9067,7 @@ void kvm_arch_sync_events(struct kvm *kvm)
 {
 	cancel_delayed_work_sync(&kvm->arch.kvmclock_sync_work);
 	cancel_delayed_work_sync(&kvm->arch.kvmclock_update_work);
+	cancel_delayed_work_sync(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work);
 	kvm_free_pit(kvm);
 }
 
-- 
2.7.4



* Re: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes
  2018-12-06  7:58 [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes Wanpeng Li
@ 2018-12-14  7:24 ` Wanpeng Li
  2018-12-20  6:16   ` Wanpeng Li
  2018-12-20 14:43 ` Radim Krčmář
  1 sibling, 1 reply; 5+ messages in thread
From: Wanpeng Li @ 2018-12-14  7:24 UTC (permalink / raw)
  To: LKML, kvm; +Cc: Paolo Bonzini, Radim Krcmar

ping,
On Thu, 6 Dec 2018 at 15:58, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> From: Wanpeng Li <wanpengli@tencent.com>
> [...]


* Re: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes
  2018-12-14  7:24 ` Wanpeng Li
@ 2018-12-20  6:16   ` Wanpeng Li
  0 siblings, 0 replies; 5+ messages in thread
From: Wanpeng Li @ 2018-12-20  6:16 UTC (permalink / raw)
  To: LKML, kvm; +Cc: Paolo Bonzini, Radim Krcmar

kindly ping,
On Fri, 14 Dec 2018 at 15:24, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> ping,
> On Thu, 6 Dec 2018 at 15:58, Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > From: Wanpeng Li <wanpengli@tencent.com>
> > [...]


* Re: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes
  2018-12-06  7:58 [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes Wanpeng Li
  2018-12-14  7:24 ` Wanpeng Li
@ 2018-12-20 14:43 ` Radim Krčmář
  2018-12-21  0:46   ` Wanpeng Li
  1 sibling, 1 reply; 5+ messages in thread
From: Radim Krčmář @ 2018-12-20 14:43 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini

2018-12-06 15:58+0800, Wanpeng Li:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Last year, engineers from Huawei reported that a call to memory_global_dirty_log_start/stop()
> takes 13s for a guest with 4T of memory and freezes the guest for too long, adding unacceptable
> migration downtime. [1] [2]
> 
> Guangrong pointed out:
> 
> | collapsible_sptes zaps 4k mappings to make memory-read happy, it is not
> | required by the semanteme of KVM_SET_USER_MEMORY_REGION and it is not
> | urgent for vCPU's running, it could be done in a separate thread and use
> | lock-break technology.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05249.html
> [2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg449994.html
> 
> Guests with several TB of memory are common now that NVDIMM is deployed in cloud environments.
> This patch uses a worker thread to zap collapsible sptes, lazily collapsing small sptes into
> large sptes during the rollback after a live migration fails.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> @@ -5679,14 +5679,41 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>  	return need_tlb_flush;
>  }
>  
> +void zap_collapsible_sptes_fn(struct work_struct *work)
> +{
> +	struct kvm_memory_slot *memslot;
> +	struct kvm_memslots *slots;
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
> +					   kvm_mmu_zap_collapsible_sptes_work);
> +	struct kvm *kvm = container_of(ka, struct kvm, arch);
> +	int i;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> +		spin_lock(&kvm->mmu_lock);
> +		slots = __kvm_memslots(kvm, i);
> +		kvm_for_each_memslot(memslot, slots) {
> +			slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
> +				kvm_mmu_zap_collapsible_spte, true);
> +			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +				cond_resched_lock(&kvm->mmu_lock);

I think we shouldn't zap all memslots when kvm_mmu_zap_collapsible_sptes
only wanted to zap a specific one.
Please add a list of memslots to be zapped; delete from the list here
and add to it in kvm_mmu_zap_collapsible_sptes().

> +		}
> +		spin_unlock(&kvm->mmu_lock);
> +	}
> +	kvm->arch.zap_in_progress = false;
> +	mutex_unlock(&kvm->slots_lock);
> +}
> +
> +#define KVM_MMU_ZAP_DELAYED (60 * HZ)
>  void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
>  				   const struct kvm_memory_slot *memslot)
>  {
> -	/* FIXME: const-ify all uses of struct kvm_memory_slot.  */
> -	spin_lock(&kvm->mmu_lock);
> -	slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
> -			 kvm_mmu_zap_collapsible_spte, true);
> -	spin_unlock(&kvm->mmu_lock);
> +	if (!kvm->arch.zap_in_progress) {

The list can also serve in place of zap_in_progress -- if there were any
elements in it, then there is no need to schedule the work again.

Thanks.
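
A rough sketch of the list-based bookkeeping suggested above (the zap_list and
zap_list_lock fields in struct kvm_arch and the zap_link member in struct
kvm_memory_slot are hypothetical additions, not part of the posted patch):

	/* enqueue the slot instead of zapping it synchronously */
	void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
					   struct kvm_memory_slot *memslot)
	{
		bool need_schedule;

		spin_lock(&kvm->arch.zap_list_lock);
		/* an empty list doubles as "no work pending", replacing zap_in_progress */
		need_schedule = list_empty(&kvm->arch.zap_list);
		/* zap_link is INIT_LIST_HEAD()ed at slot creation, removed with list_del_init() */
		if (list_empty(&memslot->zap_link))
			list_add_tail(&memslot->zap_link, &kvm->arch.zap_list);
		spin_unlock(&kvm->arch.zap_list_lock);

		if (need_schedule)
			schedule_delayed_work(&kvm->arch.kvm_mmu_zap_collapsible_sptes_work,
					      KVM_MMU_ZAP_DELAYED);
	}

	/* ... and the worker pops entries instead of walking every memslot */
	spin_lock(&kvm->arch.zap_list_lock);
	while (!list_empty(&kvm->arch.zap_list)) {
		memslot = list_first_entry(&kvm->arch.zap_list,
					   struct kvm_memory_slot, zap_link);
		list_del_init(&memslot->zap_link);
		spin_unlock(&kvm->arch.zap_list_lock);

		spin_lock(&kvm->mmu_lock);
		slot_handle_leaf(kvm, memslot, kvm_mmu_zap_collapsible_spte, true);
		spin_unlock(&kvm->mmu_lock);

		spin_lock(&kvm->arch.zap_list_lock);
	}
	spin_unlock(&kvm->arch.zap_list_lock);

As the reply below explains, keeping raw memslot pointers across the delay races with
memslot deletion and movement, so the entries would also need a lifetime guarantee (or
a copy of the relevant range) before something like this could work.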


* Re: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes
  2018-12-20 14:43 ` Radim Krčmář
@ 2018-12-21  0:46   ` Wanpeng Li
  0 siblings, 0 replies; 5+ messages in thread
From: Wanpeng Li @ 2018-12-21  0:46 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: LKML, kvm, Paolo Bonzini

On Thu, 20 Dec 2018 at 22:43, Radim Krčmář <rkrcmar@redhat.com> wrote:
>
> 2018-12-06 15:58+0800, Wanpeng Li:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Last year, engineers from Huawei reported that a call to memory_global_dirty_log_start/stop()
> > takes 13s for a guest with 4T of memory and freezes the guest for too long, adding unacceptable
> > migration downtime. [1] [2]
> >
> > Guangrong pointed out:
> >
> > | collapsible_sptes zaps 4k mappings to make memory-read happy, it is not
> > | required by the semanteme of KVM_SET_USER_MEMORY_REGION and it is not
> > | urgent for vCPU's running, it could be done in a separate thread and use
> > | lock-break technology.
> >
> > [1] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05249.html
> > [2] https://www.mail-archive.com/qemu-devel@nongnu.org/msg449994.html
> >
> > Guests with several TB of memory are common now that NVDIMM is deployed in cloud environments.
> > This patch uses a worker thread to zap collapsible sptes, lazily collapsing small sptes into
> > large sptes during the rollback after a live migration fails.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> > @@ -5679,14 +5679,41 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
> >       return need_tlb_flush;
> >  }
> >
> > +void zap_collapsible_sptes_fn(struct work_struct *work)
> > +{
> > +     struct kvm_memory_slot *memslot;
> > +     struct kvm_memslots *slots;
> > +     struct delayed_work *dwork = to_delayed_work(work);
> > +     struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
> > +                                        kvm_mmu_zap_collapsible_sptes_work);
> > +     struct kvm *kvm = container_of(ka, struct kvm, arch);
> > +     int i;
> > +
> > +     mutex_lock(&kvm->slots_lock);
> > +     for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> > +             spin_lock(&kvm->mmu_lock);
> > +             slots = __kvm_memslots(kvm, i);
> > +             kvm_for_each_memslot(memslot, slots) {
> > +                     slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
> > +                             kvm_mmu_zap_collapsible_spte, true);
> > +                     if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> > +                             cond_resched_lock(&kvm->mmu_lock);
>
> I think we shouldn't zap all memslots when kvm_mmu_zap_collapsible_sptes
> only wanted to zap a specific one.
> Please add a list of memslots to be zapped; delete from the list here
> and add in kvm_mmu_zap_collapsible_sptes().

Yeah, that was my original plan; however, I observed a lot of races here: the
memslot can disappear or be modified underneath before the worker thread starts
to zap, even if I introduce a lock to protect the list. This patch instead delays
the worker thread by 60s (by which point memory_global_dirty_log_stop can be
assumed to have completed) to coalesce all the zap requests after a live
migration fails.
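
A sketch of the interleaving described above (timeline only; exactly when the old
struct kvm_memory_slot goes away depends on __kvm_set_memory_region internals):

	ioctl path                                    delayed worker
	----------                                    --------------
	KVM_SET_USER_MEMORY_REGION:
	  dirty logging turned off on slot X
	  kvm_mmu_zap_collapsible_sptes()
	    -> add X to the zap list                  (not running yet)
	KVM_SET_USER_MEMORY_REGION:
	  slot X deleted or moved, old slot
	  contents freed/replaced
	                                              take X off the list,
	                                              zap X  <- stale slot

Walking every live memslot under slots_lock after a long delay, as the patch does,
never holds a slot pointer across that window.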

Regards,
Wanpeng Li

>
> > +             }
> > +             spin_unlock(&kvm->mmu_lock);
> > +     }
> > +     kvm->arch.zap_in_progress = false;
> > +     mutex_unlock(&kvm->slots_lock);
> > +}
> > +
> > +#define KVM_MMU_ZAP_DELAYED (60 * HZ)
> >  void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> >                                  const struct kvm_memory_slot *memslot)
> >  {
> > -     /* FIXME: const-ify all uses of struct kvm_memory_slot.  */
> > -     spin_lock(&kvm->mmu_lock);
> > -     slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
> > -                      kvm_mmu_zap_collapsible_spte, true);
> > -     spin_unlock(&kvm->mmu_lock);
> > +     if (!kvm->arch.zap_in_progress) {
>
> The list can also serve in place of zap_in_progress -- if there were any
> elements in it, then there is no need to schedule the work again.
>
> Thanks.

