From: Anthony Liguori <anthony@codemonkey.ws>
To: Chegu Vinod <chegu_vinod@hp.com>
Cc: owasserm@redhat.com, pbonzini@redhat.com, qemu-devel@nongnu.org,
	quintela@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH v5 3/3] Force auto-convegence of live migration
Date: Fri, 10 May 2013 10:11:31 -0500
Message-ID: <87ip2qzx7g.fsf@codemonkey.ws>
In-Reply-To: <518D00B6.6040305@hp.com>

Chegu Vinod <chegu_vinod@hp.com> writes:

> On 5/10/2013 6:07 AM, Anthony Liguori wrote:
>> Chegu Vinod <chegu_vinod@hp.com> writes:
>>
>>>   If a user chooses to turn on the auto-converge migration capability,
>>>   these changes detect the lack of convergence and throttle down the
>>>   guest, i.e. force the VCPUs out of the guest for some duration
>>>   and let the migration thread catch up and help converge.
>>>
>>>   Verified the convergence using the following:
>>>   - SpecJbb2005 workload running on a 20VCPU/256G guest (~80% busy)
>>>   - OLTP-like workload running on an 80VCPU/512G guest (~80% busy)
>>>
>>>   Sample results with the SpecJbb2005 workload (migrate speed set to 20Gb,
>>>   migrate downtime set to 4 seconds):
>> Would it make sense to separate out the "slow the VCPU down" part of
>> this?
>>
>> That would give a management tool more flexibility to create policies
>> around slowing the VCPU down to encourage migration.
>
> I believe one can always enhance the libvirt tools to monitor the migration
> statistics and control the shares/entitlements of the VCPUs via
> cgroups, thereby slowing the guest down to allow for convergence. (I had
> listed that as an option in earlier versions of the patches, and also
> noted that it requires external, i.e. tool-driven, monitoring and
> triggers, whereas this alternative is essentially automatic once the
> capability has been set.)
>
> Is that what you meant by your comment above, or are you talking about
> something outside the scope of cgroups and, from an implementation point
> of view, also outside the migration code path, i.e. a new knob that an
> external tool can use to simply throttle down the VCPUs of a guest?
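For comparison, the tool-driven cgroups alternative described above could look roughly like this. It is a minimal C sketch, not libvirt code: it assumes each VCPU thread sits in its own cgroup v1 `cpu` controller group and converts a throttle fraction into the value a tool would write to that group's cpu.cfs_quota_us, leaving cpu.cfs_period_us unchanged. The function name cfs_quota_for() is hypothetical. Per the BQL point made elsewhere in this thread, capping a VCPU thread's CPU time this way may leave a descheduled thread holding the global lock.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a throttle fraction in (0, 1] to a CFS bandwidth
 * quota (microseconds per period) for one VCPU thread's cgroup. A quota of
 * -1 means "no bandwidth limit" in cpu.cfs_quota_us.
 */
static int64_t cfs_quota_for(double fraction, int64_t period_us)
{
    assert(fraction > 0.0 && fraction <= 1.0);
    /* The kernel accepts periods between 1 ms and 1 s. */
    assert(period_us >= 1000 && period_us <= 1000000);

    if (fraction >= 1.0) {
        return -1;                       /* unthrottled */
    }
    int64_t quota = (int64_t)(fraction * (double)period_us);
    return quota < 1000 ? 1000 : quota;  /* minimum allowed quota is 1 ms */
}
```

With the default 100000 us period, a fraction of 0.75 gives a quota of 75000, and very small fractions are clamped to the 1 ms minimum.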

I'm saying: a knob within QEMU to throttle the guest VCPUs, which management
tools could use to encourage convergence.

For instance, consider an imaginary "vcpu_throttle" command that takes a
number between 0 and 1 (1.0 meaning no throttling) and throttles VCPU
performance accordingly.
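One possible reading of "throttles VCPU performance accordingly" is a duty cycle: after each run slice the VCPU is kept out of the guest long enough that it executes for roughly a fraction t of wall-clock time, the same kind of forced exit the patch below implements with a fixed 50 ms g_usleep(). The helper here is hypothetical (nothing in QEMU defines it) and only shows the arithmetic, sleep = run * (1 - t) / t.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: given throttle t in (0, 1] and a run slice of run_us
 * microseconds, return how long to keep the VCPU out of the guest so that
 * run / (run + sleep) == t, i.e. sleep = run * (1 - t) / t.
 */
static uint64_t throttle_sleep_us(double t, uint64_t run_us)
{
    assert(t > 0.0 && t <= 1.0);
    /* Round to the nearest microsecond. */
    return (uint64_t)((double)run_us * (1.0 - t) / t + 0.5);
}
```

With 10 ms run slices, a throttle of 0.75 gives about 3.3 ms of sleep per slice, 0.5 gives 10 ms, and 1.0 gives none.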

Then migration would look like:

0) throttle = 1.0
1) call migrate command to start migration
2) query progress until you decide you aren't converging
3) throttle *= 0.75; call vcpu_throttle $throttle
4) goto (2)
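Steps 0-4 can be sketched in C against a toy model. Everything here is hypothetical: vcpu_throttle() and query_progress() stand in for the imaginary command and for polling migration status, the "not converging" test (less than a 1% drop in remaining bytes per interval) is an arbitrary choice, and the guest is modelled as dirtying memory in proportion to the throttle value.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a migrating guest; all fields and numbers are hypothetical. */
struct toy_migration {
    double remaining;   /* bytes still to send */
    double bandwidth;   /* bytes sent per progress interval */
    double dirty_rate;  /* bytes dirtied per interval at throttle 1.0 */
    double throttle;    /* value last passed to vcpu_throttle() */
};

/* Stand-in for "call vcpu_throttle $throttle". */
static void vcpu_throttle(struct toy_migration *m, double t)
{
    m->throttle = t;
}

/* Stand-in for one "query progress" interval: the migration thread sends
 * `bandwidth` bytes while the throttled guest re-dirties some memory. */
static double query_progress(struct toy_migration *m)
{
    m->remaining += m->dirty_rate * m->throttle - m->bandwidth;
    if (m->remaining < 0) {
        m->remaining = 0;
    }
    return m->remaining;
}

/* Steps 0-4: returns the number of intervals needed to get below
 * done_bytes, or -1 if max_iters intervals were not enough. */
static int drive_migration(struct toy_migration *m, double done_bytes,
                           int max_iters)
{
    double throttle = 1.0;                  /* 0) throttle = 1.0 */
    vcpu_throttle(m, throttle);
    double prev = m->remaining;             /* 1) migration started */

    for (int i = 1; i <= max_iters; i++) {
        double now = query_progress(m);     /* 2) query progress */
        if (now <= done_bytes) {
            return i;
        }
        if (now >= prev * 0.99) {           /* not converging */
            throttle *= 0.75;               /* 3) throttle down */
            vcpu_throttle(m, throttle);
        }
        prev = now;                         /* 4) goto (2) */
    }
    return -1;
}
```

With 1000 bytes remaining, 100 bytes sent and 150 bytes dirtied per interval, the loop throttles twice (down to 0.5625); after that the remaining count shrinks every interval and drops below a 50-byte threshold after 67 intervals. The policy (when to throttle, by how much) lives entirely in the caller, while QEMU would only have to expose the knob.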

Now, I'm not opposed to a series like this that also adds this sort of
policy to QEMU itself, but I want to make sure the pieces are exposed so
that a management tool can implement its own policies too.

Regards,

Anthony Liguori

>
> Thanks
> Vinod
>
>
>
>>
>> In fact, I wonder if we need anything in the migration path if we just
>> expose the "slow the VCPU down" bit as a feature.
>>
>> Slowing the VCPU down is not quite the same as setting the priority of
>> the VCPU thread, largely because of the BQL, so I recognize the need to
>> have something for this in QEMU.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>>   (qemu) info migrate
>>>   capabilities: xbzrle: off auto-converge: off  <----
>>>   Migration status: active
>>>   total time: 1487503 milliseconds
>>>   expected downtime: 519 milliseconds
>>>   transferred ram: 383749347 kbytes
>>>   remaining ram: 2753372 kbytes
>>>   total ram: 268444224 kbytes
>>>   duplicate: 65461532 pages
>>>   skipped: 64901568 pages
>>>   normal: 95750218 pages
>>>   normal bytes: 383000872 kbytes
>>>   dirty pages rate: 67551 pages
>>>
>>>   ---
>>>   
>>>   (qemu) info migrate
>>>   capabilities: xbzrle: off auto-converge: on   <----
>>>   Migration status: completed
>>>   total time: 241161 milliseconds
>>>   downtime: 6373 milliseconds
>>>   transferred ram: 28235307 kbytes
>>>   remaining ram: 0 kbytes
>>>   total ram: 268444224 kbytes
>>>   duplicate: 64946416 pages
>>>   skipped: 64903523 pages
>>>   normal: 7044971 pages
>>>   normal bytes: 28179884 kbytes
>>>
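The two runs can be compared with a little arithmetic. The C helpers below only recompute ratios from the figures printed above, using integer math with a x100 scale. Two caveats are visible in the output itself: the auto-converge=off run was still active when sampled, so both ratios are lower bounds, and the completed run's 6373 ms downtime is above the 4 second target that was set.

```c
#include <assert.h>
#include <stdint.h>

/* a / b scaled by 100 with truncation, e.g. 616 means 6.16x. */
static int64_t ratio_x100(int64_t a, int64_t b)
{
    assert(b > 0);
    return a * 100 / b;
}

/* "normal bytes" (kbytes) divided by "normal" pages gives KiB per page. */
static int64_t kib_per_page(int64_t normal_kbytes, int64_t normal_pages)
{
    assert(normal_pages > 0);
    return normal_kbytes / normal_pages;
}
```

ratio_x100(1487503, 241161) yields 616 (total time, at least 6.16x longer without auto-converge), ratio_x100(383749347, 28235307) yields 1359 (about 13.6x more RAM transferred), and kib_per_page() gives 4 for both runs, consistent with 4 KiB target pages.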
>>> Signed-off-by: Chegu Vinod <chegu_vinod@hp.com>
>>> ---
>>>   arch_init.c                   |   68 +++++++++++++++++++++++++++++++++++++++++
>>>   include/migration/migration.h |    4 ++
>>>   migration.c                   |    1 +
>>>   3 files changed, 73 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch_init.c b/arch_init.c
>>> index 49c5dc2..29788d6 100644
>>> --- a/arch_init.c
>>> +++ b/arch_init.c
>>> @@ -49,6 +49,7 @@
>>>   #include "trace.h"
>>>   #include "exec/cpu-all.h"
>>>   #include "hw/acpi/acpi.h"
>>> +#include "sysemu/cpus.h"
>>>   
>>>   #ifdef DEBUG_ARCH_INIT
>>>   #define DPRINTF(fmt, ...) \
>>> @@ -104,6 +105,8 @@ int graphic_depth = 15;
>>>   #endif
>>>   
>>>   const uint32_t arch_type = QEMU_ARCH;
>>> +static bool mig_throttle_on;
>>> +
>>>   
>>>   /***********************************************************/
>>>   /* ram save/restore */
>>> @@ -378,8 +381,15 @@ static void migration_bitmap_sync(void)
>>>       uint64_t num_dirty_pages_init = migration_dirty_pages;
>>>       MigrationState *s = migrate_get_current();
>>>       static int64_t start_time;
>>> +    static int64_t bytes_xfer_prev;
>>>       static int64_t num_dirty_pages_period;
>>>       int64_t end_time;
>>> +    int64_t bytes_xfer_now;
>>> +    static int dirty_rate_high_cnt;
>>> +
>>> +    if (!bytes_xfer_prev) {
>>> +        bytes_xfer_prev = ram_bytes_transferred();
>>> +    }
>>>   
>>>       if (!start_time) {
>>>           start_time = qemu_get_clock_ms(rt_clock);
>>> @@ -404,6 +414,23 @@ static void migration_bitmap_sync(void)
>>>   
>>>       /* more than 1 second = 1000 millisecons */
>>>       if (end_time > start_time + 1000) {
>>> +        if (migrate_auto_converge()) {
>>> +            /* The following detection logic can be refined later. For now:
>>> +               Check whether the bytes dirtied in this period exceed 50% of
>>> +               the approx. amount of bytes transferred since the last time we
>>> +               were in this routine. If that happens N times (for now N==5)
>>> +               we turn on the throttle down logic */
>>> +            bytes_xfer_now = ram_bytes_transferred();
>>> +            if (s->dirty_pages_rate &&
>>> +                ((num_dirty_pages_period*TARGET_PAGE_SIZE) >
>>> +                ((bytes_xfer_now - bytes_xfer_prev)/2))) {
>>> +                if (dirty_rate_high_cnt++ > 5) {
>>> +                    DPRINTF("Unable to converge. Throttling down guest\n");
>>> +                    mig_throttle_on = true;
>>> +                }
>>> +            }
>>> +            bytes_xfer_prev = bytes_xfer_now;
>>> +        }
>>>           s->dirty_pages_rate = num_dirty_pages_period * 1000
>>>               / (end_time - start_time);
>>>           s->dirty_bytes_rate = s->dirty_pages_rate * TARGET_PAGE_SIZE;
>>> @@ -496,6 +523,15 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>>>       return bytes_sent;
>>>   }
>>>   
>>> +bool throttling_needed(void)
>>> +{
>>> +    if (!migrate_auto_converge()) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return mig_throttle_on;
>>> +}
>>> +
>>>   static uint64_t bytes_transferred;
>>>   
>>>   static ram_addr_t ram_save_remaining(void)
>>> @@ -1098,3 +1134,35 @@ TargetInfo *qmp_query_target(Error **errp)
>>>   
>>>       return info;
>>>   }
>>> +
>>> +static void mig_delay_vcpu(void)
>>> +{
>>> +    qemu_mutex_unlock_iothread();
>>> +    g_usleep(50*1000);
>>> +    qemu_mutex_lock_iothread();
>>> +}
>>> +
>>> +/* Stub used for getting the vcpu out of VM and into qemu via
>>> +   run_on_cpu()*/
>>> +static void mig_kick_cpu(void *opq)
>>> +{
>>> +    mig_delay_vcpu();
>>> +    return;
>>> +}
>>> +
>>> +/* To reduce the dirty rate explicitly disallow the VCPUs from spending
>>> +   much time in the VM. The migration thread will try to catchup.
>>> +   Workload will experience a performance drop.
>>> +*/
>>> +void migration_throttle_down(void)
>>> +{
>>> +    if (throttling_needed()) {
>>> +        CPUArchState *penv = first_cpu;
>>> +        while (penv) {
>>> +            qemu_mutex_lock_iothread();
>>> +            async_run_on_cpu(ENV_GET_CPU(penv), mig_kick_cpu, NULL);
>>> +            qemu_mutex_unlock_iothread();
>>> +            penv = penv->next_cpu;
>>> +        }
>>> +    }
>>> +}
>>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>>> index ace91b0..68b65c6 100644
>>> --- a/include/migration/migration.h
>>> +++ b/include/migration/migration.h
>>> @@ -129,4 +129,8 @@ int64_t migrate_xbzrle_cache_size(void);
>>>   int64_t xbzrle_cache_resize(int64_t new_size);
>>>   
>>>   bool migrate_auto_converge(void);
>>> +bool throttling_needed(void);
>>> +void stop_throttling(void);
>>> +void migration_throttle_down(void);
>>> +
>>>   #endif
>>> diff --git a/migration.c b/migration.c
>>> index 570cee5..d3673a6 100644
>>> --- a/migration.c
>>> +++ b/migration.c
>>> @@ -526,6 +526,7 @@ static void *migration_thread(void *opaque)
>>>               DPRINTF("pending size %lu max %lu\n", pending_size, max_size);
>>>               if (pending_size && pending_size >= max_size) {
>>>                   qemu_savevm_state_iterate(s->file);
>>> +                migration_throttle_down();
>>>               } else {
>>>                   DPRINTF("done iterating\n");
>>>                   qemu_mutex_lock_iothread();
>>> -- 
>>> 1.7.1
>>
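The detection heuristic in migration_bitmap_sync() can be pulled out into a standalone function to see how it behaves. This C sketch mirrors the patch's condition (bytes dirtied in a period compared with half of the bytes sent in that period, plus the dirty_rate_high_cnt counter) but drops the s->dirty_pages_rate check and assumes 4 KiB pages; the struct and function names are hypothetical. Fed a steady high dirty rate, it shows that `dirty_rate_high_cnt++ > 5` first fires on the seventh such period rather than after the five the comment mentions, and that the counter accumulates high periods without ever being reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096  /* assumption: 4 KiB target pages */

/* Hypothetical stand-ins for the patch's function-local statics. */
struct conv_state {
    int64_t bytes_xfer_prev;
    int high_cnt;            /* dirty_rate_high_cnt */
    bool throttle_on;        /* mig_throttle_on */
};

/* One pass of the per-second check from migration_bitmap_sync(). */
static bool conv_check(struct conv_state *st, int64_t dirty_pages_period,
                       int64_t bytes_xfer_now)
{
    int64_t dirtied = dirty_pages_period * SKETCH_PAGE_SIZE;
    int64_t sent = bytes_xfer_now - st->bytes_xfer_prev;

    /* Same short-circuit order as the patch: only count high periods. */
    if (dirtied > sent / 2 && st->high_cnt++ > 5) {
        st->throttle_on = true;
    }
    st->bytes_xfer_prev = bytes_xfer_now;
    return st->throttle_on;
}
```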

