[Qemu-devel] About QEMU BQL and dirty log switch in Migration

* [Qemu-devel] About QEMU BQL and dirty log switch in Migration
@ 2017-04-24 11:46 Yang Hongyang
  2017-04-24 12:06 ` Juan Quintela
  0 siblings, 1 reply; 24+ messages in thread
From: Yang Hongyang @ 2017-04-24 11:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Dr. David Alan Gilbert, Huangzhichao, wangxinxin.wang, quintela,
	pbonzini, Gonglei (Arei)

Hi all,

We found that the dirty log switch takes more than 13 seconds while
migrating a 4T memory guest, and the dirty log switch is currently
protected by the QEMU BQL. This causes the guest to freeze for a long
time when switching the dirty log on, and the migration downtime is
unacceptable.
Is there any chance to optimize the time cost of the dirty log switch
operation? Or to move the time-consuming operation out of the QEMU BQL?

-- 
Thanks,
Yang

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-04-24 11:46 [Qemu-devel] About QEMU BQL and dirty log switch in Migration Yang Hongyang
@ 2017-04-24 12:06 ` Juan Quintela
  2017-04-24 12:13   ` Yang Hongyang
  0 siblings, 1 reply; 24+ messages in thread
From: Juan Quintela @ 2017-04-24 12:06 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: qemu-devel, Dr. David Alan Gilbert, Huangzhichao,
	wangxinxin.wang, pbonzini, Gonglei (Arei)

Yang Hongyang <yanghongyang@huawei.com> wrote:
> Hi all,
>
> We found that the dirty log switch takes more than 13 seconds while
> migrating a 4T memory guest, and the dirty log switch is currently
> protected by the QEMU BQL. This causes the guest to freeze for a long
> time when switching the dirty log on, and the migration downtime is
> unacceptable.
> Is there any chance to optimize the time cost of the dirty log switch
> operation? Or to move the time-consuming operation out of the QEMU BQL?

Hi

Could you specify what you mean by dirty log switch?
The one inside kvm?
The merge between kvm one and migration bitmap?

Thanks, Juan.

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-04-24 12:06 ` Juan Quintela
@ 2017-04-24 12:13   ` Yang Hongyang
  2017-04-24 16:42     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 24+ messages in thread
From: Yang Hongyang @ 2017-04-24 12:13 UTC (permalink / raw)
  To: quintela
  Cc: qemu-devel, Dr. David Alan Gilbert, Huangzhichao,
	wangxinxin.wang, pbonzini, Gonglei (Arei)



On 2017/4/24 20:06, Juan Quintela wrote:
> Yang Hongyang <yanghongyang@huawei.com> wrote:
>> Hi all,
>>
>> We found that the dirty log switch takes more than 13 seconds while
>> migrating a 4T memory guest, and the dirty log switch is currently
>> protected by the QEMU BQL. This causes the guest to freeze for a long
>> time when switching the dirty log on, and the migration downtime is
>> unacceptable.
>> Is there any chance to optimize the time cost of the dirty log switch
>> operation? Or to move the time-consuming operation out of the QEMU BQL?
> 
> Hi
> 
> Could you specify what you mean by dirty log switch?
> The one inside kvm?
> The merge between kvm one and migration bitmap?

I mean the calls to the following functions:
memory_global_dirty_log_start() and memory_global_dirty_log_stop().

> 
> Thanks, Juan.
> 

-- 
Thanks,
Yang

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-04-24 12:13   ` Yang Hongyang
@ 2017-04-24 16:42     ` Dr. David Alan Gilbert
  2017-04-26 15:46       ` Paolo Bonzini
  2017-05-11 12:07       ` Zhoujian (jay)
  0 siblings, 2 replies; 24+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-24 16:42 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: quintela, qemu-devel, Huangzhichao, wangxinxin.wang, pbonzini,
	Gonglei (Arei)

* Yang Hongyang (yanghongyang@huawei.com) wrote:
> 
> 
> On 2017/4/24 20:06, Juan Quintela wrote:
> > Yang Hongyang <yanghongyang@huawei.com> wrote:
> >> Hi all,
> >>
> >> We found that the dirty log switch takes more than 13 seconds while
> >> migrating a 4T memory guest, and the dirty log switch is currently
> >> protected by the QEMU BQL. This causes the guest to freeze for a
> >> long time when switching the dirty log on, and the migration
> >> downtime is unacceptable.
> >> Is there any chance to optimize the time cost of the dirty log
> >> switch operation? Or to move the time-consuming operation out of
> >> the QEMU BQL?
> > 
> > Hi
> > 
> > Could you specify what you mean by dirty log switch?
> > The one inside kvm?
> > The merge between kvm one and migration bitmap?
> 
> The call of the following functions:
> memory_global_dirty_log_start/stop();

I suppose there are a few questions:
  a) Do we actually need the BQL, and if so, why?
  b) What actually takes 13s?  It's probably worth figuring out where
     the time goes; the whole bitmap is only 1GB, isn't it, even on a
     4TB machine, and even the simplest way to fill that takes way
     less than 13s.
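
As a rough check on the bitmap size, here is a minimal sketch in C. It
assumes 4 KiB guest pages and one dirty bit per page, neither of which
is stated in this thread, and dirty_bitmap_bytes() is a hypothetical
helper, not a QEMU or KVM function. Under those assumptions a 4 TiB
guest needs 2^30 bits, i.e. 128 MiB, which is below the 1GB figure
above, so the point that filling it should take far less than 13s still
holds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for a dirty bitmap that holds one
 * bit per guest page. The page size is an assumption supplied by the
 * caller (4096 for 4 KiB pages).
 */
uint64_t dirty_bitmap_bytes(uint64_t guest_bytes, uint64_t page_size)
{
    assert(page_size != 0);

    /* Round up to whole pages, then to whole bytes. */
    uint64_t pages = (guest_bytes + page_size - 1) / page_size;
    return (pages + 7) / 8;
}
```

For example, dirty_bitmap_bytes(4ULL << 40, 4096) evaluates to
134217728, i.e. 128 MiB.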

Dave

> 
> > 
> > Thanks, Juan.
> > 
> 
> -- 
> Thanks,
> Yang
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-04-24 16:42     ` Dr. David Alan Gilbert
@ 2017-04-26 15:46       ` Paolo Bonzini
  2017-04-27  2:46         ` Yang Hongyang
  2017-05-11 12:07       ` Zhoujian (jay)
  1 sibling, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2017-04-26 15:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Yang Hongyang
  Cc: quintela, qemu-devel, Huangzhichao, wangxinxin.wang, Gonglei (Arei)



On 24/04/2017 18:42, Dr. David Alan Gilbert wrote:
> I suppose there are a few questions:
>   a) Do we actually need the BQL, and if so, why?
>   b) What actually takes 13s?  It's probably worth figuring out where
>      the time goes; the whole bitmap is only 1GB, isn't it, even on a
>      4TB machine, and even the simplest way to fill that takes way
>      less than 13s.

It's more likely that it is the migration_bitmap_sync immediately after
that.

I think it is possible to move migration_bitmap_sync outside BQL.  It
should be simpler to evaluate that after Juan's cleanups go in.

Paolo

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-04-26 15:46       ` Paolo Bonzini
@ 2017-04-27  2:46         ` Yang Hongyang
  0 siblings, 0 replies; 24+ messages in thread
From: Yang Hongyang @ 2017-04-27  2:46 UTC (permalink / raw)
  To: Paolo Bonzini, Dr. David Alan Gilbert
  Cc: quintela, qemu-devel, Huangzhichao, wangxinxin.wang, Gonglei (Arei)

Hi Paolo, Dave,

On 2017/4/26 23:46, Paolo Bonzini wrote:
> 
> 
> On 24/04/2017 18:42, Dr. David Alan Gilbert wrote:
>> I suppose there are a few questions:
>>   a) Do we actually need the BQL, and if so, why?

Enabling and disabling dirty log tracking are operations on memory
regions. I think that is why they need to be done under the BQL.

>>   b) What actually takes 13s?  It's probably worth figuring out where
>>      the time goes; the whole bitmap is only 1GB, isn't it, even on a
>>      4TB machine, and even the simplest way to fill that takes way
>>      less than 13s.

I found two time-consuming operations in the KVM module:

- one is kvm_mmu_slot_apply_flags(), when enabling dirty log tracking
kvm_vm_ioctl_set_memory_region
  |->kvm_set_memory_region
     |->__kvm_set_memory_region
        |->kvm_arch_commit_memory_region
           |->kvm_mmu_slot_apply_flags
              ...

- the other is kvm_mmu_zap_collapsible_sptes(), when disabling dirty log tracking
kvm_vm_ioctl_set_memory_region
  |->kvm_set_memory_region
     |->__kvm_set_memory_region
        |->kvm_arch_commit_memory_region
           |->kvm_mmu_zap_collapsible_sptes
              ...

Any ideas on how to reduce the time spent in these operations?

> 
> It's more likely that it is the migration_bitmap_sync immediately after
> that.

It's not; it is enabling/disabling dirty log tracking that costs the time.

> 
> I think it is possible to move migration_bitmap_sync outside BQL.  It
> should be simpler to evaluate that after Juan's cleanups go in.
> 
> Paolo
> 

-- 
Thanks,
Yang

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-04-24 16:42     ` Dr. David Alan Gilbert
  2017-04-26 15:46       ` Paolo Bonzini
@ 2017-05-11 12:07       ` Zhoujian (jay)
  2017-05-11 12:24         ` Paolo Bonzini
  1 sibling, 1 reply; 24+ messages in thread
From: Zhoujian (jay) @ 2017-05-11 12:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, yanghongyang
  Cc: quintela, wangxin (U), qemu-devel, Gonglei (Arei),
	Huangzhichao, pbonzini, Zhanghailiang, Herongguang (Stephen)

Hi all,

After applying the patch below, the time that
memory_global_dirty_log_stop() takes drops to milliseconds for a 4T
memory guest, but I'm not sure whether this patch will trigger other
problems. Does this patch make sense?

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 464da93..fe26ee5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8313,6 +8313,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
                                enum kvm_mr_change change)
 {
        int nr_mmu_pages = 0;
+       int i;
+       struct kvm_vcpu *vcpu;
 
        if (!kvm->arch.n_requested_mmu_pages)
                nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
@@ -8328,14 +8330,15 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
         * in the source machine (for example if live migration fails), small
         * sptes will remain around and cause bad performance.
         *
-        * Scan sptes if dirty logging has been stopped, dropping those
-        * which can be collapsed into a single large-page spte.  Later
-        * page faults will create the large-page sptes.
+        * Reset each vcpu's mmu, then page faults will create the large-page
+        * sptes later.
         */
        if ((change != KVM_MR_DELETE) &&
                (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
-               kvm_mmu_zap_collapsible_sptes(kvm, new);
+               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+               kvm_for_each_vcpu(i, vcpu, kvm)
+                       kvm_mmu_reset_context(vcpu);
+       }
 
        /*
         * Set up write protection and/or dirty logging for the new slot.

> * Yang Hongyang (yanghongyang@huawei.com) wrote:
> >
> >
> > On 2017/4/24 20:06, Juan Quintela wrote:
> > > Yang Hongyang <yanghongyang@huawei.com> wrote:
> > >> Hi all,
> > >>
> > >> We found that the dirty log switch takes more than 13 seconds
> > >> while migrating a 4T memory guest, and the dirty log switch is
> > >> currently protected by the QEMU BQL. This causes the guest to
> > >> freeze for a long time when switching the dirty log on, and the
> > >> migration downtime is unacceptable.
> > >> Is there any chance to optimize the time cost of the dirty log
> > >> switch operation? Or to move the time-consuming operation out of
> > >> the QEMU BQL?
> > >
> > > Hi
> > >
> > > Could you specify what you mean by dirty log switch?
> > > The one inside kvm?
> > > The merge between kvm one and migration bitmap?
> >
> > The call of the following functions:
> > memory_global_dirty_log_start/stop();
> 
> I suppose there are a few questions:
>   a) Do we actually need the BQL, and if so, why?
>   b) What actually takes 13s?  It's probably worth figuring out where
>      the time goes; the whole bitmap is only 1GB, isn't it, even on a
>      4TB machine, and even the simplest way to fill that takes way
>      less than 13s.
> 
> Dave
> 
> >
> > >
> > > Thanks, Juan.
> > >
> >
> > --
> > Thanks,
> > Yang
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Regards,
Jay Zhou

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-11 12:07       ` Zhoujian (jay)
@ 2017-05-11 12:24         ` Paolo Bonzini
  2017-05-11 13:43           ` Wanpeng Li
  2017-05-12  8:09           ` Xiao Guangrong
  0 siblings, 2 replies; 24+ messages in thread
From: Paolo Bonzini @ 2017-05-11 12:24 UTC (permalink / raw)
  To: Zhoujian (jay), Dr. David Alan Gilbert, yanghongyang
  Cc: quintela, wangxin (U), qemu-devel, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Wanpeng Li



On 11/05/2017 14:07, Zhoujian (jay) wrote:
> -        * Scan sptes if dirty logging has been stopped, dropping those
> -        * which can be collapsed into a single large-page spte.  Later
> -        * page faults will create the large-page sptes.
> +        * Reset each vcpu's mmu, then page faults will create the large-page
> +        * sptes later.
>          */
>         if ((change != KVM_MR_DELETE) &&
>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
> +               kvm_for_each_vcpu(i, vcpu, kvm)
> +                       kvm_mmu_reset_context(vcpu);

This should be "kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);" but
I am not sure it is enough.  I think that if you do not zap the SPTEs,
the page faults will use 4K SPTEs, not large ones (though I'd have to
check better; CCing Xiao and Wanpeng).

Paolo

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-11 12:24         ` Paolo Bonzini
@ 2017-05-11 13:43           ` Wanpeng Li
  2017-05-11 13:49             ` Wanpeng Li
  2017-05-17  2:20             ` Zhoujian (jay)
  2017-05-12  8:09           ` Xiao Guangrong
  1 sibling, 2 replies; 24+ messages in thread
From: Wanpeng Li @ 2017-05-11 13:43 UTC (permalink / raw)
  To: Zhoujian (jay)
  Cc: Dr. David Alan Gilbert, yanghongyang, quintela, wangxin (U),
	qemu-devel, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Paolo Bonzini

2017-05-11 20:24 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>> -        * Scan sptes if dirty logging has been stopped, dropping those
>> -        * which can be collapsed into a single large-page spte.  Later
>> -        * page faults will create the large-page sptes.
>> +        * Reset each vcpu's mmu, then page faults will create the large-page
>> +        * sptes later.
>>          */
>>         if ((change != KVM_MR_DELETE) &&
>>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);

This is an unlikely branch (taken only if guest live migration fails and
the guest continues to run on the source machine) rather than a hot
path. Do you have any performance numbers for your real workloads?

Regards,
Wanpeng Li

>> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>> +               kvm_for_each_vcpu(i, vcpu, kvm)
>> +                       kvm_mmu_reset_context(vcpu);
>
> This should be "kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);" but
> I am not sure it is enough.  I think that if you do not zap the SPTEs,
> the page faults will use 4K SPTEs, not large ones (though I'd have to
> check better; CCing Xiao and Wanpeng).
>
> Paolo

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-11 13:43           ` Wanpeng Li
@ 2017-05-11 13:49             ` Wanpeng Li
  2017-05-11 14:18               ` Zhoujian (jay)
  2017-05-17  2:20             ` Zhoujian (jay)
  1 sibling, 1 reply; 24+ messages in thread
From: Wanpeng Li @ 2017-05-11 13:49 UTC (permalink / raw)
  To: Zhoujian (jay)
  Cc: Dr. David Alan Gilbert, yanghongyang, quintela, wangxin (U),
	qemu-devel, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Paolo Bonzini

2017-05-11 21:43 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2017-05-11 20:24 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>
>>
>> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>>> -        * Scan sptes if dirty logging has been stopped, dropping those
>>> -        * which can be collapsed into a single large-page spte.  Later
>>> -        * page faults will create the large-page sptes.
>>> +        * Reset each vcpu's mmu, then page faults will create the large-page
>>> +        * sptes later.
>>>          */
>>>         if ((change != KVM_MR_DELETE) &&
>>>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>
> This is an unlikely branch(unless guest live migration fails and
> continue to run on the source machine) instead of hot path, do you
> have any performance number for your real workloads?

I found the original discussion via Google:
https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg04143.html
You will not take this branch if the guest live migration succeeds.

Regards,
Wanpeng Li

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-11 13:49             ` Wanpeng Li
@ 2017-05-11 14:18               ` Zhoujian (jay)
  2017-05-12  6:34                 ` Wanpeng Li
  0 siblings, 1 reply; 24+ messages in thread
From: Zhoujian (jay) @ 2017-05-11 14:18 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Dr. David Alan Gilbert, yanghongyang, quintela, wangxin (U),
	qemu-devel, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Paolo Bonzini

Hi Wanpeng,

> 2017-05-11 21:43 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> > 2017-05-11 20:24 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> >>
> >>
> >> On 11/05/2017 14:07, Zhoujian (jay) wrote:
> >>> -        * Scan sptes if dirty logging has been stopped, dropping
> those
> >>> -        * which can be collapsed into a single large-page spte.
> Later
> >>> -        * page faults will create the large-page sptes.
> >>> +        * Reset each vcpu's mmu, then page faults will create the
> large-page
> >>> +        * sptes later.
> >>>          */
> >>>         if ((change != KVM_MR_DELETE) &&
> >>>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> >>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> >>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
> >
> > This is an unlikely branch(unless guest live migration fails and
> > continue to run on the source machine) instead of hot path, do you
> > have any performance number for your real workloads?
> 
> I find the original discussion by google.
> https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg04143.html
> You will not go to this branch if the guest live migration successfully.

In our tests, this branch is taken when live migration is successful.
AFAIK, the kmod does not know whether live migration was successful or
not when handling the KVM_SET_USER_MEMORY_REGION ioctl. Am I missing
something?

Regards,
Jay Zhou

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-11 14:18               ` Zhoujian (jay)
@ 2017-05-12  6:34                 ` Wanpeng Li
  0 siblings, 0 replies; 24+ messages in thread
From: Wanpeng Li @ 2017-05-12  6:34 UTC (permalink / raw)
  To: Zhoujian (jay)
  Cc: Dr. David Alan Gilbert, yanghongyang, quintela, wangxin (U),
	qemu-devel, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Paolo Bonzini

2017-05-11 22:18 GMT+08:00 Zhoujian (jay) <jianjay.zhou@huawei.com>:
> Hi Wanpeng,
>
>> 2017-05-11 21:43 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
>> > 2017-05-11 20:24 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> >>
>> >>
>> >> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>> >>> -        * Scan sptes if dirty logging has been stopped, dropping
>> those
>> >>> -        * which can be collapsed into a single large-page spte.
>> Later
>> >>> -        * page faults will create the large-page sptes.
>> >>> +        * Reset each vcpu's mmu, then page faults will create the
>> large-page
>> >>> +        * sptes later.
>> >>>          */
>> >>>         if ((change != KVM_MR_DELETE) &&
>> >>>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>> >>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> >>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>> >
>> > This is an unlikely branch(unless guest live migration fails and
>> > continue to run on the source machine) instead of hot path, do you
>> > have any performance number for your real workloads?
>>
>> I find the original discussion by google.
>> https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg04143.html
>> You will not go to this branch if the guest live migration successfully.
>
>  In our tests, this branch is taken when living migration is successful.
>  AFAIK, the kmod does not know whether living migration successful or not
>  when dealing with KVM_SET_USER_MEMORY_REGION ioctl. Do I miss something?

Originally there was a bug where the memslot dirty log flag was not
cleared after live migration failed, and a patch was submitted to fix it:
https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg00794.html
However, I can't remember whether the dirty log flag was cleared when
live migration completed successfully at that time; maybe not.
Paolo replied to the patch that he had a better method. After that I was
too busy and did not follow the QEMU patch for this fix any more. I just
found that this commit has since been merged:
http://git.qemu.org/?p=qemu.git;a=commit;h=6f6a5ef3e429f92f987678ea8c396aab4dc6aa19
This commit clears the memslot dirty log flag after live migration, no
matter whether it is successful or not.

Regards,
Wanpeng Li

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-11 12:24         ` Paolo Bonzini
  2017-05-11 13:43           ` Wanpeng Li
@ 2017-05-12  8:09           ` Xiao Guangrong
  2017-05-12  8:42             ` Hailiang Zhang
  1 sibling, 1 reply; 24+ messages in thread
From: Xiao Guangrong @ 2017-05-12  8:09 UTC (permalink / raw)
  To: Paolo Bonzini, Zhoujian (jay), Dr. David Alan Gilbert, yanghongyang
  Cc: Wanpeng Li, Zhanghailiang, quintela, wangxin (U),
	Xiao Guangrong, qemu-devel, Gonglei (Arei),
	Huangzhichao, Herongguang (Stephen)



On 05/11/2017 08:24 PM, Paolo Bonzini wrote:
> 
> 
> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>> -        * Scan sptes if dirty logging has been stopped, dropping those
>> -        * which can be collapsed into a single large-page spte.  Later
>> -        * page faults will create the large-page sptes.
>> +        * Reset each vcpu's mmu, then page faults will create the large-page
>> +        * sptes later.
>>           */
>>          if ((change != KVM_MR_DELETE) &&
>>                  (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>> +               kvm_for_each_vcpu(i, vcpu, kvm)
>> +                       kvm_mmu_reset_context(vcpu);
> 
> This should be "kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);" but
> I am not sure it is enough.  I think that if you do not zap the SPTEs,
> the page faults will use 4K SPTEs, not large ones (though I'd have to
> check better; CCing Xiao and Wanpeng).

Yes, Paolo is right. kvm_mmu_reset_context() just reloads the vCPU's
root page table; the 4k mappings are still kept.

There are two issues reported:
- one is kvm_mmu_slot_apply_flags(), when enabling dirty log tracking.

   Its root cause is that kvm_mmu_slot_remove_write_access() takes too
   much time.

   We can make the code adaptive and use the new fast-write-protect
   facility introduced by my patchset, i.e., if the number of pages
   contained in this memslot is more than
   TOTAL * FAST_WRITE_PROTECT_PAGE_PERCENTAGE, then we use
   fast-write-protect instead.

- another one is kvm_mmu_zap_collapsible_sptes(), when disabling dirty
   log tracking.

   kvm_mmu_zap_collapsible_sptes() zaps 4k mappings to make memory reads
   happy; it is not required by the semantics of
   KVM_SET_USER_MEMORY_REGION and it is not urgent for the vCPUs'
   running, so it could be done in a separate thread using the
   lock-break technique.
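
The adaptive choice in the first item could be sketched as below. This
is only an illustration in plain C, not code from the patchset:
use_fast_write_protect() is a hypothetical helper, TOTAL is taken to
mean the total number of guest pages, and the value 10 for
FAST_WRITE_PROTECT_PAGE_PERCENTAGE is made up, since the constant is
named here without a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only; the thread gives no number. */
#define FAST_WRITE_PROTECT_PAGE_PERCENTAGE 10

/*
 * Hypothetical helper: returns true when the memslot covers more than
 * FAST_WRITE_PROTECT_PAGE_PERCENTAGE percent of all guest pages, in
 * which case fast-write-protect would be preferred over walking the
 * slot page by page.
 */
bool use_fast_write_protect(uint64_t slot_pages, uint64_t total_pages)
{
    assert(slot_pages <= total_pages);

    /*
     * Integer form of slot_pages > total_pages * pct / 100. Page
     * counts are far below 2^57, so the multiplication by 100 cannot
     * overflow 64 bits.
     */
    return slot_pages * 100 >
           total_pages * FAST_WRITE_PROTECT_PAGE_PERCENTAGE;
}
```

With these assumed numbers, a slot holding 200 of 1000 pages would use
fast-write-protect, while a slot holding exactly 100 of 1000 pages
would not, because the comparison is strict.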

Thanks!

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-12  8:09           ` Xiao Guangrong
@ 2017-05-12  8:42             ` Hailiang Zhang
  0 siblings, 0 replies; 24+ messages in thread
From: Hailiang Zhang @ 2017-05-12  8:42 UTC (permalink / raw)
  To: Xiao Guangrong, Paolo Bonzini, Zhoujian (jay),
	Dr. David Alan Gilbert, yanghongyang
  Cc: Wanpeng Li, quintela, wangxin (U),
	Xiao Guangrong, qemu-devel, Gonglei (Arei),
	Huangzhichao, Herongguang (Stephen)

On 2017/5/12 16:09, Xiao Guangrong wrote:
>
> On 05/11/2017 08:24 PM, Paolo Bonzini wrote:
>>
>> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>>> -        * Scan sptes if dirty logging has been stopped, dropping those
>>> -        * which can be collapsed into a single large-page spte.  Later
>>> -        * page faults will create the large-page sptes.
>>> +        * Reset each vcpu's mmu, then page faults will create the large-page
>>> +        * sptes later.
>>>            */
>>>           if ((change != KVM_MR_DELETE) &&
>>>                   (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>>> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>>> +               kvm_for_each_vcpu(i, vcpu, kvm)
>>> +                       kvm_mmu_reset_context(vcpu);
>> This should be "kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);" but
>> I am not sure it is enough.  I think that if you do not zap the SPTEs,
>> the page faults will use 4K SPTEs, not large ones (though I'd have to
>> check better; CCing Xiao and Wanpeng).
> Yes, Paolo is right. kvm_mmu_reset_context() just reloads vCPU's
> root page table, 4k mappings are still kept.
>
> There are two issues reported:
> - one is kvm_mmu_slot_apply_flags(), when enable dirty log tracking.
>
>     Its root cause is kvm_mmu_slot_remove_write_access() takes too much
>     time.
>
>     We can make the code adaptive to use the new fast-write-protect faculty
>     introduced by my patchset, i.e, if the number of pages contained in this
>     memslot is more than > TOTAL * FAST_WRITE_PROTECT_PAGE_PERCENTAGE, then
>     we use fast-write-protect instead.
>
> - another one is kvm_mmu_zap_collapsible_sptes() when disable dirty
>     log tracking.
>
>     collapsible_sptes zaps 4k mappings to make memory-read happy, it is not
>     required by the semanteme of KVM_SET_USER_MEMORY_REGION and it is not
>     urgent for vCPU's running, it could be done in a separate thread and use
>     lock-break technology.

How about moving the action of stopping the dirty log into
migrate_fd_cleanup() directly, which is processed in the main thread as
a BH after migration is completed? It would not have any side effects
even if migration fails or the user cancels migration, would it?

> Thanks!
>
>
> .
>

* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-11 13:43           ` Wanpeng Li
  2017-05-11 13:49             ` Wanpeng Li
@ 2017-05-17  2:20             ` Zhoujian (jay)
  2017-05-17  5:47               ` Wanpeng Li
  2017-05-17  7:43               ` Paolo Bonzini
  1 sibling, 2 replies; 24+ messages in thread
From: Zhoujian (jay) @ 2017-05-17  2:20 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Dr. David Alan Gilbert, yanghongyang, quintela, wangxin (U),
	qemu-devel, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Paolo Bonzini, Huangweidong (C)

Hi Wanpeng,

> > On 11/05/2017 14:07, Zhoujian (jay) wrote:
> >> -        * Scan sptes if dirty logging has been stopped, dropping those
> >> -        * which can be collapsed into a single large-page spte.  Later
> >> -        * page faults will create the large-page sptes.
> >> +        * Reset each vcpu's mmu, then page faults will create the
> large-page
> >> +        * sptes later.
> >>          */
> >>         if ((change != KVM_MR_DELETE) &&
> >>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> >> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> >> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
> 
> This is an unlikely branch(unless guest live migration fails and continue
> to run on the source machine) instead of hot path, do you have any
> performance number for your real workloads?
> 

Sorry to bother you again.

Recently I tested the performance before migration and after a migration
failure using SPEC CPU2006 (https://www.spec.org/cpu2006/), which is a
standard performance evaluation tool.

These are the results:
******
    Before migration the score is 153, and the TLB miss statistics of the qemu process are:
    linux-sjrfac:/mnt/zhoujian # perf stat \
        -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads \
        -p 26463 sleep 10

    Performance counter stats for process id '26463':

           698,938      dTLB-load-misses          #    0.13% of all dTLB cache hits   (50.46%)
       543,303,875      dTLB-loads                                                    (50.43%)
           199,597      dTLB-store-misses                                             (16.51%)
        60,128,561      dTLB-stores                                                   (16.67%)
            69,986      iTLB-load-misses          #    6.17% of all iTLB cache hits   (16.67%)
         1,134,097      iTLB-loads                                                    (33.33%)

      10.000684064 seconds time elapsed

    After migration failure the score is 149, and the TLB miss statistics of the qemu process are:
    linux-sjrfac:/mnt/zhoujian # perf stat \
        -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads \
        -p 26463 sleep 10

    Performance counter stats for process id '26463':

           765,400      dTLB-load-misses          #    0.14% of all dTLB cache hits   (50.50%)
       540,972,144      dTLB-loads                                                    (50.47%)
           207,670      dTLB-store-misses                                             (16.50%)
        58,363,787      dTLB-stores                                                   (16.67%)
           109,772      iTLB-load-misses          #    9.52% of all iTLB cache hits   (16.67%)
         1,152,784      iTLB-loads                                                    (33.32%)

      10.000703078 seconds time elapsed
******
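
For reference, the miss rates and the score change in the two runs
above can be recomputed from the raw counters. This is a small sketch in
C; miss_rate_percent() and relative_change_percent() are hypothetical
helper names used only for this illustration. With the counters above,
the dTLB load miss rate goes from about 0.129% to about 0.141%, the
iTLB load miss rate from about 6.17% to about 9.52%, and the score
drops by about 2.6%.

```c
#include <assert.h>
#include <stdint.h>

/* Misses as a percentage of accesses, as in the perf stat annotation. */
double miss_rate_percent(uint64_t misses, uint64_t accesses)
{
    assert(accesses != 0);
    return 100.0 * (double)misses / (double)accesses;
}

/* Relative change from "before" to "after", in percent. */
double relative_change_percent(double before, double after)
{
    assert(before != 0.0);
    return 100.0 * (after - before) / before;
}
```

For example, miss_rate_percent(69986, 1134097) gives roughly 6.17 and
relative_change_percent(153.0, 149.0) gives roughly -2.6.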

These are the steps:
======
 (1) The kmod version is 4.4.11 (slightly modified) and the QEMU version is 2.6.0
    (slightly modified). The following patch was applied to the kmod, following
    Paolo's advice:

diff --git a/source/x86/x86.c b/source/x86/x86.c
index 054a7d3..75a4bb3 100644
--- a/source/x86/x86.c
+++ b/source/x86/x86.c
@@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
         */
        if ((change != KVM_MR_DELETE) &&
                (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
-               kvm_mmu_zap_collapsible_sptes(kvm, new);
+               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+               printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
+               kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+       }
 
        /*
         * Set up write protection and/or dirty logging for the new slot.

(2) I started a 10G VM (suse11sp3) with its memory preoccupied, i.e. its "RES" column in top shows 10G,
    so that the EPT tables are set up in advance.
(3) Then I ran the 429.mcf test case of spec cpu2006 before migration and after migration failure.
    429.mcf is a memory-intensive workload, and the migration failure is constructed deliberately
    with the following qemu patch:

diff --git a/migration/migration.c b/migration/migration.c
index 5d725d0..88dfc59 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -625,6 +625,9 @@ static void process_incoming_migration_co(void *opaque)
                       MIGRATION_STATUS_ACTIVE);
     ret = qemu_loadvm_state(f);
 
+    // deliberately construct the migration failure
+    exit(EXIT_FAILURE); 
+
     ps = postcopy_state_get();
     trace_process_incoming_migration_co_end(ret, ps);
     if (ps != POSTCOPY_INCOMING_NONE) {
======


The score and the TLB miss rate are almost the same in both cases, which confuses me.
May I ask which tool you use to evaluate the performance?
If my test steps are wrong, please let me know. Thank you.

Regards,
Jay Zhou







* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-17  2:20             ` Zhoujian (jay)
@ 2017-05-17  5:47               ` Wanpeng Li
  2017-05-17  7:35                 ` Jay Zhou
  2017-05-17  7:43               ` Paolo Bonzini
  1 sibling, 1 reply; 24+ messages in thread
From: Wanpeng Li @ 2017-05-17  5:47 UTC (permalink / raw)
  To: Zhoujian (jay)
  Cc: Dr. David Alan Gilbert, yanghongyang, quintela, wangxin (U),
	qemu-devel, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Paolo Bonzini, Huangweidong (C)

Hi Zhoujian,
2017-05-17 10:20 GMT+08:00 Zhoujian (jay) <jianjay.zhou@huawei.com>:
> Hi Wanpeng,
>
>> > On 11/05/2017 14:07, Zhoujian (jay) wrote:
>> >> -        * Scan sptes if dirty logging has been stopped, dropping those
>> >> -        * which can be collapsed into a single large-page spte.  Later
>> >> -        * page faults will create the large-page sptes.
>> >> +        * Reset each vcpu's mmu, then page faults will create the
>> large-page
>> >> +        * sptes later.
>> >>          */
>> >>         if ((change != KVM_MR_DELETE) &&
>> >>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>> >> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> >> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>>
>> This is an unlikely branch(unless guest live migration fails and continue
>> to run on the source machine) instead of hot path, do you have any
>> performance number for your real workloads?
>>
>
> Sorry to bother you again.
>
> Recently, I have tested the performance before migration and after migration failure
> using spec cpu2006 https://www.spec.org/cpu2006/, which is a standard performance
> evaluation tool.
>
> These are the results:
> ******
>     Before migration the score is 153, and the TLB miss statistics of the qemu process is:
>     linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses, \
>     dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10
>
>     Performance counter stats for process id '26463':
>
>            698,938      dTLB-load-misses          #    0.13% of all dTLB cache hits   (50.46%)
>        543,303,875      dTLB-loads                                                    (50.43%)
>            199,597      dTLB-store-misses                                             (16.51%)
>         60,128,561      dTLB-stores                                                   (16.67%)
>             69,986      iTLB-load-misses          #    6.17% of all iTLB cache hits   (16.67%)
>          1,134,097      iTLB-loads                                                    (33.33%)
>
>       10.000684064 seconds time elapsed
>
>     After migration failure the score is 149, and the TLB miss statistics of the qemu process is:
>     linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses, \
>     dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10
>
>     Performance counter stats for process id '26463':
>
>            765,400      dTLB-load-misses          #    0.14% of all dTLB cache hits   (50.50%)
>        540,972,144      dTLB-loads                                                    (50.47%)
>            207,670      dTLB-store-misses                                             (16.50%)
>         58,363,787      dTLB-stores                                                   (16.67%)
>            109,772      iTLB-load-misses          #    9.52% of all iTLB cache hits   (16.67%)
>          1,152,784      iTLB-loads                                                    (33.32%)
>
>       10.000703078 seconds time elapsed
> ******

Could you comment out the original "lazy collapse small sptes into
large sptes" code in kvm_arch_commit_memory_region() and post the
results here?

Regards,
Wanpeng Li

>
> These are the steps:
> ======
>  (1) the version of kmod is 4.4.11(with slightly modified) and the version of qemu is 2.6.0
>     (with slightly modified), the kmod is applied with the following patch according to
>     Paolo's advice:
>
> diff --git a/source/x86/x86.c b/source/x86/x86.c
> index 054a7d3..75a4bb3 100644
> --- a/source/x86/x86.c
> +++ b/source/x86/x86.c
> @@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>          */
>         if ((change != KVM_MR_DELETE) &&
>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
> +               printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
> +               kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> +       }
>
>         /*
>          * Set up write protection and/or dirty logging for the new slot.
>
> (2) I started up a memory preoccupied 10G VM(suse11sp3), which means its "RES column" in top is 10G,
>     in order to set up the EPT table in advance.
> (3) And then, I run the test case 429.mcf of spec cpu2006 before migration and after migration failure.
>     The 429.mcf is a memory intensive workload, and the migration failure is constructed deliberately
>     with the following patch of qemu:
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 5d725d0..88dfc59 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -625,6 +625,9 @@ static void process_incoming_migration_co(void *opaque)
>                        MIGRATION_STATUS_ACTIVE);
>      ret = qemu_loadvm_state(f);
>
> +    // deliberately construct the migration failure
> +    exit(EXIT_FAILURE);
> +
>      ps = postcopy_state_get();
>      trace_process_incoming_migration_co_end(ret, ps);
>      if (ps != POSTCOPY_INCOMING_NONE) {
> ======
>
>
> Results of the score and TLB miss rate are almost the same, and I am confused.
> May I ask which tool do you use to evaluate the performance?
> And if my test steps are wrong, please let me know, thank you.
>
> Regards,
> Jay Zhou
>
>
>
>
>


* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-17  5:47               ` Wanpeng Li
@ 2017-05-17  7:35                 ` Jay Zhou
  0 siblings, 0 replies; 24+ messages in thread
From: Jay Zhou @ 2017-05-17  7:35 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Dr. David Alan Gilbert, yanghongyang, quintela, wangxin (U),
	qemu-devel, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Paolo Bonzini, Huangweidong (C)



On 2017/5/17 13:47, Wanpeng Li wrote:
> Hi Zhoujian,
> 2017-05-17 10:20 GMT+08:00 Zhoujian (jay) <jianjay.zhou@huawei.com>:
>> Hi Wanpeng,
>>
>>>> On 11/05/2017 14:07, Zhoujian (jay) wrote:
>>>>> -        * Scan sptes if dirty logging has been stopped, dropping those
>>>>> -        * which can be collapsed into a single large-page spte.  Later
>>>>> -        * page faults will create the large-page sptes.
>>>>> +        * Reset each vcpu's mmu, then page faults will create the
>>> large-page
>>>>> +        * sptes later.
>>>>>           */
>>>>>          if ((change != KVM_MR_DELETE) &&
>>>>>                  (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>>>>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>>>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>>>
>>> This is an unlikely branch(unless guest live migration fails and continue
>>> to run on the source machine) instead of hot path, do you have any
>>> performance number for your real workloads?
>>>
>>
>> Sorry to bother you again.
>>
>> Recently, I have tested the performance before migration and after migration failure
>> using spec cpu2006 https://www.spec.org/cpu2006/, which is a standard performance
>> evaluation tool.
>>
>> These are the results:
>> ******
>>      Before migration the score is 153, and the TLB miss statistics of the qemu process is:
>>      linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses, \
>>      dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10
>>
>>      Performance counter stats for process id '26463':
>>
>>             698,938      dTLB-load-misses          #    0.13% of all dTLB cache hits   (50.46%)
>>         543,303,875      dTLB-loads                                                    (50.43%)
>>             199,597      dTLB-store-misses                                             (16.51%)
>>          60,128,561      dTLB-stores                                                   (16.67%)
>>              69,986      iTLB-load-misses          #    6.17% of all iTLB cache hits   (16.67%)
>>           1,134,097      iTLB-loads                                                    (33.33%)
>>
>>        10.000684064 seconds time elapsed
>>
>>      After migration failure the score is 149, and the TLB miss statistics of the qemu process is:
>>      linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses, \
>>      dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10
>>
>>      Performance counter stats for process id '26463':
>>
>>             765,400      dTLB-load-misses          #    0.14% of all dTLB cache hits   (50.50%)
>>         540,972,144      dTLB-loads                                                    (50.47%)
>>             207,670      dTLB-store-misses                                             (16.50%)
>>          58,363,787      dTLB-stores                                                   (16.67%)
>>             109,772      iTLB-load-misses          #    9.52% of all iTLB cache hits   (16.67%)
>>           1,152,784      iTLB-loads                                                    (33.32%)
>>
>>        10.000703078 seconds time elapsed
>> ******
>
> Could you comment out the original "lazy collapse small sptes into
> large sptes" codes in the function kvm_arch_commit_memory_region() and
> post the results here?
>

   With the patch below,

diff --git a/source/x86/x86.c b/source/x86/x86.c
index 054a7d3..e0288d5 100644
--- a/source/x86/x86.c
+++ b/source/x86/x86.c
@@ -8548,10 +8548,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
          * which can be collapsed into a single large-page spte.  Later
          * page faults will create the large-page sptes.
          */
-       if ((change != KVM_MR_DELETE) &&
-               (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
-               kvm_mmu_zap_collapsible_sptes(kvm, new);

         /*
          * Set up write protection and/or dirty logging for the new slot.

   After migration failure the score is 148, and the TLB miss statistics of the qemu process are:
   linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -p 12432 sleep 10

   Performance counter stats for process id '12432':

         1,052,697      dTLB-load-misses          #    0.19% of all dTLB cache hits   (50.45%)
       551,828,702      dTLB-loads                                                    (50.46%)
           147,228      dTLB-store-misses                                             (16.55%)
        60,427,834      dTLB-stores                                                   (16.50%)
            93,793      iTLB-load-misses          #    7.43% of all iTLB cache hits   (16.67%)
         1,262,137      iTLB-loads                                                    (33.33%)

      10.000709900 seconds time elapsed
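
   To compare runs more easily, the miss ratios can be recomputed from the raw
   counters that perf prints. The following is only a minimal sketch in C;
   miss_pct() and report_run() are hypothetical helper names, not part of perf.
   Note that the counters are multiplexed (the trailing percentage in each perf
   line is the share of time the counter was actually scheduled), so the ratios
   are estimates.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of perf: misses as a percentage of accesses. */
double miss_pct(uint64_t misses, uint64_t accesses)
{
    assert(accesses != 0); /* a zero denominator means the counter never ran */
    return 100.0 * (double)misses / (double)accesses;
}

/* Print one run's dTLB load miss ratio from its raw perf stat counters. */
void report_run(const char *label, uint64_t misses, uint64_t loads)
{
    printf("%-28s %6.2f%%\n", label, miss_pct(misses, loads));
}
```

   Calling report_run() with the dTLB-load-misses and dTLB-loads pairs from the
   three runs (698,938 / 543,303,875 before migration, 765,400 / 540,972,144
   with the KVM_REQ_MMU_RELOAD patch, 1,052,697 / 551,828,702 with the zap call
   removed) should give the same 0.13%, 0.14% and 0.19% that perf reported.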

   Regards,
   Jay Zhou

> Regards,
> Wanpeng Li
>
>>
>> These are the steps:
>> ======
>>   (1) the version of kmod is 4.4.11(with slightly modified) and the version of qemu is 2.6.0
>>      (with slightly modified), the kmod is applied with the following patch according to
>>      Paolo's advice:
>>
>> diff --git a/source/x86/x86.c b/source/x86/x86.c
>> index 054a7d3..75a4bb3 100644
>> --- a/source/x86/x86.c
>> +++ b/source/x86/x86.c
>> @@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>           */
>>          if ((change != KVM_MR_DELETE) &&
>>                  (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>> +               printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
>> +               kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
>> +       }
>>
>>          /*
>>           * Set up write protection and/or dirty logging for the new slot.
>>
>> (2) I started up a memory preoccupied 10G VM(suse11sp3), which means its "RES column" in top is 10G,
>>      in order to set up the EPT table in advance.
>> (3) And then, I run the test case 429.mcf of spec cpu2006 before migration and after migration failure.
>>      The 429.mcf is a memory intensive workload, and the migration failure is constructed deliberately
>>      with the following patch of qemu:
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 5d725d0..88dfc59 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -625,6 +625,9 @@ static void process_incoming_migration_co(void *opaque)
>>                         MIGRATION_STATUS_ACTIVE);
>>       ret = qemu_loadvm_state(f);
>>
>> +    // deliberately construct the migration failure
>> +    exit(EXIT_FAILURE);
>> +
>>       ps = postcopy_state_get();
>>       trace_process_incoming_migration_co_end(ret, ps);
>>       if (ps != POSTCOPY_INCOMING_NONE) {
>> ======
>>
>>
>> Results of the score and TLB miss rate are almost the same, and I am confused.
>> May I ask which tool do you use to evaluate the performance?
>> And if my test steps are wrong, please let me know, thank you.
>>
>> Regards,
>> Jay Zhou
>>
>>
>>
>>
>>
>
> .
>


* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-17  2:20             ` Zhoujian (jay)
  2017-05-17  5:47               ` Wanpeng Li
@ 2017-05-17  7:43               ` Paolo Bonzini
  2017-05-17  8:38                 ` Wanpeng Li
  1 sibling, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2017-05-17  7:43 UTC (permalink / raw)
  To: Zhoujian (jay)
  Cc: Wanpeng Li, Dr. David Alan Gilbert, yanghongyang, quintela,
	wangxin (U), qemu-devel, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Huangweidong (C)

> Recently, I have tested the performance before migration and after migration failure
> using spec cpu2006 https://www.spec.org/cpu2006/, which is a standard performance
> evaluation tool.
>
> These are the steps:
> ======
>  (1) the version of kmod is 4.4.11(with slightly modified) and the version of
>  qemu is 2.6.0
>     (with slightly modified), the kmod is applied with the following patch
> 
> diff --git a/source/x86/x86.c b/source/x86/x86.c
> index 054a7d3..75a4bb3 100644
> --- a/source/x86/x86.c
> +++ b/source/x86/x86.c
> @@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>          */
>         if ((change != KVM_MR_DELETE) &&
>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
> +               printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
> +               kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> +       }
>  
>         /*
>          * Set up write protection and/or dirty logging for the new slot.

Try these modifications to the setup:

1) set up 1G hugetlbfs hugepages and use those for the guest's memory

2) test both without and with the above patch.

Thanks,

Paolo


* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-17  7:43               ` Paolo Bonzini
@ 2017-05-17  8:38                 ` Wanpeng Li
  2017-05-19  8:09                   ` Jay Zhou
  0 siblings, 1 reply; 24+ messages in thread
From: Wanpeng Li @ 2017-05-17  8:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Zhoujian (jay),
	Dr. David Alan Gilbert, yanghongyang, Juan Quintela, wangxin (U),
	qemu-devel@nongnu.org Developers, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Huangweidong (C)

2017-05-17 15:43 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> Recently, I have tested the performance before migration and after migration failure
>> using spec cpu2006 https://www.spec.org/cpu2006/, which is a standard performance
>> evaluation tool.
>>
>> These are the steps:
>> ======
>>  (1) the version of kmod is 4.4.11(with slightly modified) and the version of
>>  qemu is 2.6.0
>>     (with slightly modified), the kmod is applied with the following patch
>>
>> diff --git a/source/x86/x86.c b/source/x86/x86.c
>> index 054a7d3..75a4bb3 100644
>> --- a/source/x86/x86.c
>> +++ b/source/x86/x86.c
>> @@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>          */
>>         if ((change != KVM_MR_DELETE) &&
>>                 (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>> +               printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
>> +               kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
>> +       }
>>
>>         /*
>>          * Set up write protection and/or dirty logging for the new slot.
>
> Try these modifications to the setup:
>
> 1) set up 1G hugetlbfs hugepages and use those for the guest's memory
>
> 2) test both without and with the above patch.
>

In addition, we can compare /sys/kernel/debug/kvm/largepages with and
without the patch. IIRC, /sys/kernel/debug/kvm/largepages drops during
live migration. If live migration fails, it stays at a small value
without the "lazy collapse small sptes into large sptes" code, but
increases gradually with that code.
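
Sampling the counter over time makes this comparison concrete. A minimal
sketch in C follows; read_counter() is a hypothetical helper name, and it
assumes debugfs is mounted at /sys/kernel/debug and that the caller may read
it (typically root only).

```c
#include <assert.h>
#include <stdio.h>

/* Read a single decimal counter from a debugfs-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
long read_counter(const char *path)
{
    FILE *f;
    long value = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_counter("/sys/kernel/debug/kvm/largepages") once per second
before migration, during migration and after the failure would show whether
the value recovers.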

Regards,
Wanpeng Li


* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-17  8:38                 ` Wanpeng Li
@ 2017-05-19  8:09                   ` Jay Zhou
  2017-05-19  8:32                     ` Xiao Guangrong
  2018-12-11  3:43                       ` [Qemu-devel] " Wanpeng Li
  0 siblings, 2 replies; 24+ messages in thread
From: Jay Zhou @ 2017-05-19  8:09 UTC (permalink / raw)
  To: Wanpeng Li, Paolo Bonzini
  Cc: Dr. David Alan Gilbert, yanghongyang, Juan Quintela, wangxin (U),
	qemu-devel@nongnu.org Developers, Gonglei (Arei),
	Huangzhichao, Zhanghailiang, Herongguang (Stephen),
	Xiao Guangrong, Huangweidong (C)

Hi Paolo and Wanpeng,

On 2017/5/17 16:38, Wanpeng Li wrote:
> 2017-05-17 15:43 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>> Recently, I have tested the performance before migration and after migration failure
>>> using spec cpu2006 https://www.spec.org/cpu2006/, which is a standard performance
>>> evaluation tool.
>>>
>>> These are the steps:
>>> ======
>>>   (1) the version of kmod is 4.4.11(with slightly modified) and the version of
>>>   qemu is 2.6.0
>>>      (with slightly modified), the kmod is applied with the following patch
>>>
>>> diff --git a/source/x86/x86.c b/source/x86/x86.c
>>> index 054a7d3..75a4bb3 100644
>>> --- a/source/x86/x86.c
>>> +++ b/source/x86/x86.c
>>> @@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>>           */
>>>          if ((change != KVM_MR_DELETE) &&
>>>                  (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>>> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>>> +               printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
>>> +               kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
>>> +       }
>>>
>>>          /*
>>>           * Set up write protection and/or dirty logging for the new slot.
>>
>> Try these modifications to the setup:
>>
>> 1) set up 1G hugetlbfs hugepages and use those for the guest's memory
>>
>> 2) test both without and with the above patch.
>>

In order to avoid random memory allocation issues, I reran the test cases:
(1) setup: start a 4U10G VM with its memory preoccupied. Each vcpu is pinned to
its own pcpu, and all resources (memory and pcpus) allocated to the VM come
from NUMA node 0.
(2) sequence: first, I run 429.mcf of spec cpu2006 before migration and get a
result. Then a migration failure is constructed. Finally, I run the test case
again and get another result.
(3) results:
Host hugepages           THP on(2M)  THP on(2M)   THP on(2M)   THP on(2M)
Patch                    patch1      patch2       patch3       -
Before migration         No          No           No           Yes
After migration failed   Yes         Yes          Yes          No
Largepages               67->1862    62->1890     95->1865     1926
score of 429.mcf         189         188          188          189

Host hugepages           1G hugepages  1G hugepages  1G hugepages  1G hugepages
Patch                    patch1        patch2        patch3        -
Before migration         No            No            No            Yes
After migration failed   Yes           Yes           Yes           No
Largepages               21            21            26            39
score of 429.mcf         188           188           186           188

Notes:
patch1  means with the "lazy collapse small sptes into large sptes" code
patch2  means the "lazy collapse small sptes into large sptes" code is commented out
patch3  means using kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD)
         instead of kvm_mmu_zap_collapsible_sptes(kvm, new)

"Largepages" means the value of /sys/kernel/debug/kvm/largepages

> In addition, we can compare /sys/kernel/debug/kvm/largepages w/ and
> w/o the patch. IIRC, /sys/kernel/debug/kvm/largepages will drop during
> live migration, it will keep a small value if live migration fails and
> w/o "lazy collapse small sptes into large sptes" codes, however, it
> will increase gradually if w/ the "lazy collapse small sptes into
> large sptes" codes.
>

No. Without the "lazy collapse small sptes into large sptes" code,
/sys/kernel/debug/kvm/largepages does drop during live migration,
but it still increases gradually if live migration fails; see the results
above. I printed the backtrace at the point where it increases after the
migration failure:

[139574.369098]  [<ffffffff81644a7f>] dump_stack+0x19/0x1b
[139574.369111]  [<ffffffffa02c3af6>] mmu_set_spte+0x2f6/0x310 [kvm]
[139574.369122]  [<ffffffffa02c4f7e>] __direct_map.isra.109+0x1de/0x250 [kvm]
[139574.369133]  [<ffffffffa02c8a76>] tdp_page_fault+0x246/0x280 [kvm]
[139574.369144]  [<ffffffffa02bf4e4>] kvm_mmu_page_fault+0x24/0x130 [kvm]
[139574.369148]  [<ffffffffa07c8116>] handle_ept_violation+0x96/0x170 [kvm_intel]
[139574.369153]  [<ffffffffa07cf949>] vmx_handle_exit+0x299/0xbf0 [kvm_intel]
[139574.369157]  [<ffffffff816559f0>] ? uv_bau_message_intr1+0x80/0x80
[139574.369161]  [<ffffffffa07cd5e0>] ? vmx_inject_irq+0xf0/0xf0 [kvm_intel]
[139574.369172]  [<ffffffffa02b35cd>] vcpu_enter_guest+0x76d/0x1160 [kvm]
[139574.369184]  [<ffffffffa02d9285>] ? kvm_apic_local_deliver+0x65/0x70 [kvm]
[139574.369196]  [<ffffffffa02bb125>] kvm_arch_vcpu_ioctl_run+0xd5/0x440 [kvm]
[139574.369205]  [<ffffffffa02a2b11>] kvm_vcpu_ioctl+0x2b1/0x640 [kvm]
[139574.369209]  [<ffffffff810e7852>] ? do_futex+0x122/0x5b0
[139574.369212]  [<ffffffff811fd9d5>] do_vfs_ioctl+0x2e5/0x4c0
[139574.369223]  [<ffffffffa02b0cf5>] ? kvm_on_user_return+0x75/0xb0 [kvm]
[139574.369225]  [<ffffffff811fdc51>] SyS_ioctl+0xa1/0xc0
[139574.369229]  [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b

Any suggestions would be appreciated. Thanks!


Regards,
Jay Zhou


* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-19  8:09                   ` Jay Zhou
@ 2017-05-19  8:32                     ` Xiao Guangrong
  2017-05-19  9:27                       ` Jay Zhou
  2018-12-11  3:43                       ` [Qemu-devel] " Wanpeng Li
  1 sibling, 1 reply; 24+ messages in thread
From: Xiao Guangrong @ 2017-05-19  8:32 UTC (permalink / raw)
  To: Jay Zhou, Wanpeng Li, Paolo Bonzini
  Cc: Huangweidong (C), Zhanghailiang, Juan Quintela, wangxin (U),
	yanghongyang, Xiao Guangrong, qemu-devel@nongnu.org Developers,
	Dr. David Alan Gilbert, Gonglei (Arei),
	Huangzhichao, Herongguang (Stephen)


I do not know why I was removed from the list.


On 05/19/2017 04:09 PM, Jay Zhou wrote:
> Hi Paolo and Wanpeng,
> 
> On 2017/5/17 16:38, Wanpeng Li wrote:
>> 2017-05-17 15:43 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>>> Recently, I have tested the performance before migration and after migration failure
>>>> using spec cpu2006 https://www.spec.org/cpu2006/, which is a standard performance
>>>> evaluation tool.
>>>>
>>>> These are the steps:
>>>> ======
>>>>   (1) the version of kmod is 4.4.11(with slightly modified) and the version of
>>>>   qemu is 2.6.0
>>>>      (with slightly modified), the kmod is applied with the following patch
>>>>
>>>> diff --git a/source/x86/x86.c b/source/x86/x86.c
>>>> index 054a7d3..75a4bb3 100644
>>>> --- a/source/x86/x86.c
>>>> +++ b/source/x86/x86.c
>>>> @@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>>>           */
>>>>          if ((change != KVM_MR_DELETE) &&
>>>>                  (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>>>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>>>> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>>>> +               printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
>>>> +               kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
>>>> +       }
>>>>
>>>>          /*
>>>>           * Set up write protection and/or dirty logging for the new slot.
>>>
>>> Try these modifications to the setup:
>>>
>>> 1) set up 1G hugetlbfs hugepages and use those for the guest's memory
>>>
>>> 2) test both without and with the above patch.
>>>
> 
> In order to avoid random memory allocation issues, I reran the test cases:
> (1) setup: start a 4U10G VM with memory preoccupied, each vcpu is pinned to a pcpu respectively, these resources(memory and pcpu) allocated to VM are all from NUMA node 0
> (2) sequence: firstly, I run the 429.mcf of spec cpu2006 before migration, and get a result. And then, migration failure is constructed. At last, I run the test case again, and get an another result.

I guess this test case only writes memory, which means the read-only mappings
will always be dropped by #PF, and then huge mappings are established.

If you benchmark memory reads, you should observe the difference.
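
Such a read benchmark could look like the minimal sketch below (C, meant to
run inside the guest). The names populate(), read_pass() and PAGE_STRIDE are
hypothetical, and the 4K stride is an assumption matching the small page size
on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_STRIDE 4096u /* assumed small-page size */

/* Write the buffer once so every page is backed and mapped before measuring. */
void populate(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, 1, len);
}

/* Read one byte per page. No stores touch the buffer, so this pass
 * never takes a write fault on it. volatile keeps the loads in place. */
uint64_t read_pass(const volatile unsigned char *buf, size_t len)
{
    uint64_t sum = 0;

    for (size_t off = 0; off < len; off += PAGE_STRIDE)
        sum += buf[off];
    return sum;
}
```

The idea would be to call populate() once before migration starts, then time
repeated read_pass() calls over a multi-gigabyte buffer before migration and
again after the migration failure.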

Thanks!


* Re: [Qemu-devel] About QEMU BQL and dirty log switch in Migration
  2017-05-19  8:32                     ` Xiao Guangrong
@ 2017-05-19  9:27                       ` Jay Zhou
  0 siblings, 0 replies; 24+ messages in thread
From: Jay Zhou @ 2017-05-19  9:27 UTC (permalink / raw)
  To: Xiao Guangrong, Wanpeng Li, Paolo Bonzini
  Cc: Huangweidong (C), Zhanghailiang, Juan Quintela, wangxin (U),
	yanghongyang, Xiao Guangrong, qemu-devel@nongnu.org Developers,
	Dr. David Alan Gilbert, Gonglei (Arei),
	Huangzhichao, Herongguang (Stephen)

Hi Xiao,

On 2017/5/19 16:32, Xiao Guangrong wrote:
>
> I do not know why i was removed from the list.

I did CC you...
Your comments are very valuable to us, and thanks for your quick response.

>
> On 05/19/2017 04:09 PM, Jay Zhou wrote:
>> Hi Paolo and Wanpeng,
>>
>> On 2017/5/17 16:38, Wanpeng Li wrote:
>>> 2017-05-17 15:43 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>>>> Recently, I have tested the performance before migration and after
>>>>> migration failure
>>>>> using spec cpu2006 https://www.spec.org/cpu2006/, which is a standard
>>>>> performance
>>>>> evaluation tool.
>>>>>
>>>>> These are the steps:
>>>>> ======
>>>>>   (1) the version of kmod is 4.4.11(with slightly modified) and the
>>>>> version of
>>>>>   qemu is 2.6.0
>>>>>      (with slightly modified), the kmod is applied with the following patch
>>>>>
>>>>> diff --git a/source/x86/x86.c b/source/x86/x86.c
>>>>> index 054a7d3..75a4bb3 100644
>>>>> --- a/source/x86/x86.c
>>>>> +++ b/source/x86/x86.c
>>>>> @@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>>>>>           */
>>>>>          if ((change != KVM_MR_DELETE) &&
>>>>>                  (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
>>>>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
>>>>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
>>>>> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
>>>>> +               printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
>>>>> +               kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
>>>>> +       }
>>>>>
>>>>>          /*
>>>>>           * Set up write protection and/or dirty logging for the new slot.
>>>>
>>>> Try these modifications to the setup:
>>>>
>>>> 1) set up 1G hugetlbfs hugepages and use those for the guest's memory
>>>>
>>>> 2) test both without and with the above patch.
>>>>
>>
>> In order to avoid random memory allocation issues, I reran the test cases:
>> (1) setup: start a 4U10G VM with memory preoccupied, each vcpu is pinned to a
>> pcpu respectively, these resources(memory and pcpu) allocated to VM are all
>> from NUMA node 0
>> (2) sequence: firstly, I run the 429.mcf of spec cpu2006 before migration,
>> and get a result. And then, migration failure is constructed. At last, I run
>> the test case again, and get an another result.
>
> I guess this case purely writes the memory, that means the readonly mappings will

Yes, I printed out the dirty page rate; it is about 1GB per second.

> always be dropped by #PF, then huge mappings are established.
>
> If benchmark memory read, you show observe its difference.
>

OK, thanks for your suggestion!

Regards,
Jay Zhou


* Re: About QEMU BQL and dirty log switch in Migration
  2017-05-19  8:09                   ` Jay Zhou
@ 2018-12-11  3:43                       ` Wanpeng Li
  2018-12-11  3:43                       ` [Qemu-devel] " Wanpeng Li
  1 sibling, 0 replies; 24+ messages in thread
From: Wanpeng Li @ 2018-12-11  3:43 UTC (permalink / raw)
  To: Zhoujian (jay)
  Cc: Huangweidong (C), Zhanghailiang, kvm, Juan Quintela, wangxin (U),
	yanghongyang, Xiao Guangrong, qemu-devel@nongnu.org Developers,
	Dr. David Alan Gilbert, Gonglei, Huangzhichao, Paolo Bonzini,
	Herongguang (Stephen)

On Fri, 19 May 2017 at 16:10, Jay Zhou <jianjay.zhou@huawei.com> wrote:
>
> Hi Paolo and Wanpeng,
>
> On 2017/5/17 16:38, Wanpeng Li wrote:
> > 2017-05-17 15:43 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> >>> Recently, I have tested the performance before migration and after migration failure
> >>> using spec cpu2006 https://www.spec.org/cpu2006/, which is a standard performance
> >>> evaluation tool.
> >>>
> >>> These are the steps:
> >>> ======
> >>>   (1) the version of kmod is 4.4.11(with slightly modified) and the version of
> >>>   qemu is 2.6.0
> >>>      (with slightly modified), the kmod is applied with the following patch
> >>>
> >>> diff --git a/source/x86/x86.c b/source/x86/x86.c
> >>> index 054a7d3..75a4bb3 100644
> >>> --- a/source/x86/x86.c
> >>> +++ b/source/x86/x86.c
> >>> @@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> >>>           */
> >>>          if ((change != KVM_MR_DELETE) &&
> >>>                  (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> >>> -               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
> >>> -               kvm_mmu_zap_collapsible_sptes(kvm, new);
> >>> +               !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
> >>> +               printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
> >>> +               kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
> >>> +       }
> >>>
> >>>          /*
> >>>           * Set up write protection and/or dirty logging for the new slot.
> >>
> >> Try these modifications to the setup:
> >>
> >> 1) set up 1G hugetlbfs hugepages and use those for the guest's memory
> >>
> >> 2) test both without and with the above patch.
> >>
>
> In order to avoid random memory allocation issues, I reran the test cases:
> (1) setup: start a 4U10G VM with memory preoccupied, each vcpu is pinned to a
> pcpu respectively, these resources(memory and pcpu) allocated to VM are all
> from NUMA node 0
> (2) sequence: firstly, I run the 429.mcf of spec cpu2006 before migration, and
> get a result. And then, migration failure is constructed. At last, I run the
> test case again, and get an another result.
> (3) results:
> Host hugepages           THP on(2M)  THP on(2M)   THP on(2M)   THP on(2M)
> Patch                    patch1      patch2       patch3       -
> Before migration         No          No           No           Yes
> After migration failed   Yes         Yes          Yes          No
> Largepages               67->1862    62->1890     95->1865     1926
> score of 429.mcf         189         188          188          189
>
> Host hugepages           1G hugepages  1G hugepages  1G hugepages  1G hugepages
> Patch                    patch1        patch2        patch3        -
> Before migration         No            No            No            Yes
> After migration failed   Yes           Yes           Yes           No
> Largepages               21            21            26            39
> score of 429.mcf         188           188           186           188
>
> Notes:
> patch1  means with "lazy collapse small sptes into large sptes" codes
> patch2  means comment out "lazy collapse small sptes into large sptes" codes
> patch3  means using kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD)
>          instead of kvm_mmu_zap_collapsible_sptes(kvm, new)
>
> "Largepages" means the value of /sys/kernel/debug/kvm/largepages
>
> > In addition, we can compare /sys/kernel/debug/kvm/largepages w/ and
> > w/o the patch. IIRC, /sys/kernel/debug/kvm/largepages will drop during
> > live migration, it will keep a small value if live migration fails and
> > w/o "lazy collapse small sptes into large sptes" codes, however, it
> > will increase gradually if w/ the "lazy collapse small sptes into
> > large sptes" codes.
> >
>
> No, without the "lazy collapse small sptes into large sptes" codes,
> /sys/kernel/debug/kvm/largepages does drop during live migration,
> but it still will increase gradually if live migration fails, see the result
> above. I printed out the back trace when it increases after migration failure,
>
> [139574.369098]  [<ffffffff81644a7f>] dump_stack+0x19/0x1b
> [139574.369111]  [<ffffffffa02c3af6>] mmu_set_spte+0x2f6/0x310 [kvm]
> [139574.369122]  [<ffffffffa02c4f7e>] __direct_map.isra.109+0x1de/0x250 [kvm]
> [139574.369133]  [<ffffffffa02c8a76>] tdp_page_fault+0x246/0x280 [kvm]
> [139574.369144]  [<ffffffffa02bf4e4>] kvm_mmu_page_fault+0x24/0x130 [kvm]
> [139574.369148]  [<ffffffffa07c8116>] handle_ept_violation+0x96/0x170 [kvm_intel]
> [139574.369153]  [<ffffffffa07cf949>] vmx_handle_exit+0x299/0xbf0 [kvm_intel]
> [139574.369157]  [<ffffffff816559f0>] ? uv_bau_message_intr1+0x80/0x80
> [139574.369161]  [<ffffffffa07cd5e0>] ? vmx_inject_irq+0xf0/0xf0 [kvm_intel]
> [139574.369172]  [<ffffffffa02b35cd>] vcpu_enter_guest+0x76d/0x1160 [kvm]
> [139574.369184]  [<ffffffffa02d9285>] ? kvm_apic_local_deliver+0x65/0x70 [kvm]
> [139574.369196]  [<ffffffffa02bb125>] kvm_arch_vcpu_ioctl_run+0xd5/0x440 [kvm]
> [139574.369205]  [<ffffffffa02a2b11>] kvm_vcpu_ioctl+0x2b1/0x640 [kvm]
> [139574.369209]  [<ffffffff810e7852>] ? do_futex+0x122/0x5b0
> [139574.369212]  [<ffffffff811fd9d5>] do_vfs_ioctl+0x2e5/0x4c0
> [139574.369223]  [<ffffffffa02b0cf5>] ? kvm_on_user_return+0x75/0xb0 [kvm]
> [139574.369225]  [<ffffffff811fdc51>] SyS_ioctl+0xa1/0xc0
> [139574.369229]  [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b
>
> Any suggestion will be appreciated, Thanks!

I found some time to figure this out. Here is a simple program that
reproduces it in the guest:

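/* build with: gcc -pthread <file>.c */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>

#define BUFSIZE (1024 * 1024)

int useconds = 0;
int mbytes = 0;

void *memory_write(void *arg)
{
    int i = (int)(intptr_t)arg;
    int j = 0;
    char *p_buf = NULL;

    p_buf = malloc((size_t)mbytes * BUFSIZE);
    if (p_buf == NULL) {
        printf("thread %d: malloc error\n", i);
        return NULL;
    }
    /* touch the whole buffer once so that it is really backed by memory */
    memset(p_buf, 0, (size_t)mbytes * BUFSIZE);
    printf("thread: %d\n", i);
    while (1) {
        /* dirty only the first 100 bytes (one 4KB page) of each 1MB chunk */
        for (j = 0; j < mbytes; j++) {
            memset(&p_buf[(size_t)j * BUFSIZE], 0, 100);
        }
        usleep(useconds);
    }
    return NULL;
}

int main(int argc, const char *argv[])
{
    int i = 0;
    int ret = 0;
    int threads = 0;
    pthread_t tid;

    if (argc < 4) {
        printf("usage: %s <mbytes> <threads> <useconds>\n", argv[0]);
        return 1;
    }
    mbytes = atoi(argv[1]);
    threads = atoi(argv[2]);
    useconds = atoi(argv[3]);
    if (mbytes <= 0 || threads <= 0 || useconds <= 0) {
        printf("get mbytes or threads or useconds error\n");
        return 1;
    }

    printf("mbytes:%dm, thread:%d, useconds:%d\n", mbytes, threads, useconds);

    for (i = 0; i < threads; i++) {
        ret = pthread_create(&tid, NULL, memory_write, (void *)(intptr_t)i);
        if (ret) {
            printf("Create pthread error!\n");
            return 1;
        }
    }

    /* keep the process alive while the writer threads run */
    sleep(1000000);
    return 0;
}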

I ran ./a.out 100 50 2, which spawns 50 threads; each thread allocates
100MB and sleeps 2us after each round of writing. Each round dirties
only 100 bytes (which occupy a single 4KB page) of every 1MB of memory.

During live migration the large sptes are write-protected, so they are
dropped in the EPT violation path and small sptes are populated
instead. With the setup above, only 2 small sptes are populated for
each 2MB memory range. After migration fails there are no further EPT
violations, since those 2 small sptes are still populated, and so no
small sptes are replaced by large sptes.

If I stop a.out and run it a second time, its memory is reallocated
and probably lands on other gfns. The small sptes are then replaced by
large sptes, because the sptes of the new gfns (the remaining sptes in
each 2MB range, other than the 2 populated before) are empty and the
EPT violation path figures out that the range is backed by a huge page.

I also did another test, replacing the 100 bytes with BUFSIZE so that
the whole 1MB is dirtied. This results in all the small sptes being
populated, and they are not replaced by large sptes any more after
migration fails.
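
To make the arithmetic above concrete, here is a minimal sketch (not
part of the reproducer) that counts how many 4KB pages a write pattern
touches inside one 2MB range. The helper name small_pages_touched and
both macros are made up for illustration, and the sketch assumes that
every chunk starts on a 4KB boundary and that the stride divides 2MB.

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_PAGE (4UL * 1024)         /* size mapped by one small spte */
#define LARGE_PAGE (2UL * 1024 * 1024)  /* size mapped by one large spte */

/*
 * Count the distinct 4KB pages written inside one 2MB range when the
 * writer dirties write_len bytes at the start of every stride bytes.
 * Hypothetical helper: assumes page-aligned chunks, a stride that is a
 * multiple of 4KB and divides 2MB, and 0 < write_len <= stride.
 */
static size_t small_pages_touched(size_t stride, size_t write_len)
{
    size_t chunks, pages_per_chunk;

    assert(stride > 0 && stride % SMALL_PAGE == 0);
    assert(LARGE_PAGE % stride == 0);
    assert(write_len > 0 && write_len <= stride);

    /* number of chunks that start inside one 2MB range */
    chunks = LARGE_PAGE / stride;
    /* round the written length up to whole 4KB pages */
    pages_per_chunk = (write_len + SMALL_PAGE - 1) / SMALL_PAGE;
    return chunks * pages_per_chunk;
}
```

With a 1MB stride, writing 100 bytes gives 2 pages out of the 512 in a
2MB range, while writing the full 1MB gives all 512, which matches the
two cases described above.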

For the 429.mcf test case of SPEC CPU2006, the RES is 10GB. I guess
that not all of the memory in each 2MB range is accessed at the same
time: during the EPT violations most large sptes are dropped, part of
each 2MB range is accessed, and small sptes are populated. If another
part of a 2MB range is accessed after migration fails, the small sptes
are dropped and replaced by a large spte in the EPT violation path.
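
One way to watch this gradual recovery from the host is to sample the
largepages counter while the workload runs. The following is only a
rough sketch: the path is the one quoted earlier in this thread (it
needs debugfs mounted, usually root, and may be named differently on
other kernel versions), and read_counter and sample_counter are
hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Path taken from this thread; an assumption for any other kernel. */
#define LARGEPAGES_PATH "/sys/kernel/debug/kvm/largepages"

/*
 * Read one decimal counter from a stat file.
 * Returns 0 and stores the value in *out on success, or -1 if the
 * file cannot be opened or does not start with a number.
 */
static int read_counter(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Print the counter `samples` times, `interval_sec` seconds apart.
 * Returns 0 on success, -1 as soon as one read fails.
 */
static int sample_counter(const char *path, int samples,
                          unsigned int interval_sec)
{
    long value;
    int n;

    for (n = 0; n < samples; n++) {
        if (read_counter(path, &value) != 0)
            return -1;
        printf("sample %d: %ld\n", n, value);
        if (n + 1 < samples)
            sleep(interval_sec);
    }
    return 0;
}
```

For example, calling sample_counter(LARGEPAGES_PATH, 60, 1) after the
migration has failed would print one value per second for a minute, so
the increase reported in the results table can be followed over time.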

Regards,
Wanpeng Li

end of thread, other threads:[~2018-12-11  3:50 UTC | newest]

Thread overview: 24+ messages
2017-04-24 11:46 [Qemu-devel] About QEMU BQL and dirty log switch in Migration Yang Hongyang
2017-04-24 12:06 ` Juan Quintela
2017-04-24 12:13   ` Yang Hongyang
2017-04-24 16:42     ` Dr. David Alan Gilbert
2017-04-26 15:46       ` Paolo Bonzini
2017-04-27  2:46         ` Yang Hongyang
2017-05-11 12:07       ` Zhoujian (jay)
2017-05-11 12:24         ` Paolo Bonzini
2017-05-11 13:43           ` Wanpeng Li
2017-05-11 13:49             ` Wanpeng Li
2017-05-11 14:18               ` Zhoujian (jay)
2017-05-12  6:34                 ` Wanpeng Li
2017-05-17  2:20             ` Zhoujian (jay)
2017-05-17  5:47               ` Wanpeng Li
2017-05-17  7:35                 ` Jay Zhou
2017-05-17  7:43               ` Paolo Bonzini
2017-05-17  8:38                 ` Wanpeng Li
2017-05-19  8:09                   ` Jay Zhou
2017-05-19  8:32                     ` Xiao Guangrong
2017-05-19  9:27                       ` Jay Zhou
2018-12-11  3:43                     ` Wanpeng Li
2018-12-11  3:43                       ` [Qemu-devel] " Wanpeng Li
2017-05-12  8:09           ` Xiao Guangrong
2017-05-12  8:42             ` Hailiang Zhang