From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41793) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YuK4w-00056d-Vt for qemu-devel@nongnu.org; Mon, 18 May 2015 08:23:32 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YuK4q-000860-Ja for qemu-devel@nongnu.org; Mon, 18 May 2015 08:23:30 -0400 Received: from szxga01-in.huawei.com ([58.251.152.64]:33754) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YuK4o-00083N-SY for qemu-devel@nongnu.org; Mon, 18 May 2015 08:23:24 -0400 Message-ID: <5559D999.5000406@huawei.com> Date: Mon, 18 May 2015 20:22:49 +0800 From: zhanghailiang MIME-Version: 1.0 References: <1427347774-8960-1-git-send-email-zhang.zhanghailiang@huawei.com> <1427347774-8960-10-git-send-email-zhang.zhanghailiang@huawei.com> <20150515120944.GF2194@work-vm> <5559ACCB.3020308@huawei.com> <20150518121019.GB14465@work-vm> In-Reply-To: <20150518121019.GB14465@work-vm> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC PATCH v4 09/28] COLO: Save VM state to slave when do checkpoint List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Dr. David Alan Gilbert" Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, arei.gonglei@huawei.com, amit.shah@redhat.com, Lai Jiangshan , Yang Hongyang , david@gibson.dropbear.id.au On 2015/5/18 20:10, Dr. David Alan Gilbert wrote: > * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote: >> On 2015/5/15 20:09, Dr. David Alan Gilbert wrote: >>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote: >>>> We should save PVM's RAM/device to slave when needed. >>>> >>>> For VM state, we will cache them in slave, we use QEMUSizedBuffer >>>> to store the data, we need know the data size of VM state, so in master, >>>> we use qsb to store VM state temporarily, and then migrate the data to >>>> slave. >>>> >>>> Signed-off-by: zhanghailiang >>>> Signed-off-by: Yang Hongyang >>>> Signed-off-by: Gonglei >>>> Signed-off-by: Lai Jiangshan >>>> Signed-off-by: Li Zhijian >>>> --- >>>> arch_init.c | 22 ++++++++++++++++++-- >>>> migration/colo.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++---- >>>> savevm.c | 2 +- >>>> 3 files changed, 79 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/arch_init.c b/arch_init.c >>>> index fcfa328..e928e11 100644 >>>> --- a/arch_init.c >>>> +++ b/arch_init.c >>>> @@ -53,6 +53,7 @@ >>>> #include "hw/acpi/acpi.h" >>>> #include "qemu/host-utils.h" >>>> #include "qemu/rcu_queue.h" >>>> +#include "migration/migration-colo.h" >>>> >>>> #ifdef DEBUG_ARCH_INIT >>>> #define DPRINTF(fmt, ...) \ >>>> @@ -845,6 +846,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque) >>>> RAMBlock *block; >>>> int64_t ram_bitmap_pages; /* Size of bitmap in pages, including gaps */ >>>> >>>> + /* >>>> + * migration has already setup the bitmap, reuse it. >>>> + */ >>>> + if (migrate_in_colo_state()) { >>>> + goto setup_part; >>>> + } >>>> + >>> >>> This is a bit odd. It would be easier if you just moved the init code >>> inside this if, rather than goto'ing over it (or move the other code that >>> you actually want into another function that then gets called from the bottom >>> of here?) >>> The thing that also makes it especially odd is that you goto over >>> the rcu_read_lock and then have to fix it up; that's getting messy. >>> >> >> Yes, here we reuse ram_save_setup in COLO's checkpoint process, the difference is >> in COLO's checkpoint process, we don't have to initialize these global variables again, >> which has been initialized in the previous first migration process. But we have to resend >> the info of ram_list.blocks. (Maybe this also not necessary ). I will split this function temporarily >> to make this look gentler. > > Great. > >>> (The qemu style seems to be OK to use goto to jump to a shared error >>> block at the end of a function but otherwise it should be rare). >>> >>>> mig_throttle_on = false; >>>> dirty_rate_high_cnt = 0; >>>> bitmap_sync_count = 0; >>>> @@ -901,9 +909,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque) >>>> migration_bitmap_sync(); >>>> qemu_mutex_unlock_ramlist(); >>>> qemu_mutex_unlock_iothread(); >>>> - >>>> +setup_part: >>>> qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE); >>>> >>>> + if (migrate_in_colo_state()) { >>>> + rcu_read_lock(); >>>> + } >>>> QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { >>>> qemu_put_byte(f, strlen(block->idstr)); >>>> qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr)); >>>> @@ -1007,7 +1018,14 @@ static int ram_save_complete(QEMUFile *f, void *opaque) >>>> } >>>> >>>> ram_control_after_iterate(f, RAM_CONTROL_FINISH); >>>> - migration_end(); >>>> + >>>> + /* >>>> + * Since we need to reuse dirty bitmap in colo, >>>> + * don't cleanup the bitmap. >>>> + */ >>>> + if (!migrate_enable_colo() || migration_has_failed(migrate_get_current())) { >>>> + migration_end(); >>>> + } >>>> >>>> rcu_read_unlock(); >>>> qemu_put_be64(f, RAM_SAVE_FLAG_EOS); >>>> diff --git a/migration/colo.c b/migration/colo.c >>>> index 5a8ed1b..64e3f3a 100644 >>>> --- a/migration/colo.c >>>> +++ b/migration/colo.c >>>> @@ -60,6 +60,9 @@ enum { >>>> >>>> static QEMUBH *colo_bh; >>>> static Coroutine *colo; >>>> +/* colo buffer */ >>>> +#define COLO_BUFFER_BASE_SIZE (1000*1000*4ULL) >>> >>> Surely you want that as 4*1024*1024 ? Anyway, now that qemu has >>> migrate_set_parameter, it's probably best to wire these magic numbers >>> to parameters that can be configured. >>> >> >> Er, actually this macro can be any value, it does not matter, because qsb will grow automatically >> if the size of qsb is not enough. (Am i right?). >> And i don't think this internal used value should be exported to user. For now this size include >> the size of dirty pages and the size of device related data. > > Yes, you're right; (However I'd still use a power of 2 for a size for memory, but maybe > that's just me) > Er, I searched this in qemu codes, your advise about 'power of 2 for size' is very reasonable. ;) I will fix that, Thanks! >> But, maybe i can use the new >> 'migrate_set_parameter' to achieve the capability of 'colo_set_checkpoint_period'. > > Yes, it's good for that. > > Dave > >> >> >> Thanks, >> zhanghailiang >> >>>> +QEMUSizedBuffer *colo_buffer; >>>> >>>> bool colo_supported(void) >>>> { >>>> @@ -123,6 +126,8 @@ static int colo_ctl_get(QEMUFile *f, uint64_t require) >>>> static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control) >>>> { >>>> int ret; >>>> + size_t size; >>>> + QEMUFile *trans = NULL; >>>> >>>> ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW); >>>> if (ret < 0) { >>>> @@ -133,16 +138,47 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control) >>>> if (ret < 0) { >>>> goto out; >>>> } >>>> + /* Reset colo buffer and open it for write */ >>>> + qsb_set_length(colo_buffer, 0); >>>> + trans = qemu_bufopen("w", colo_buffer); >>>> + if (!trans) { >>>> + error_report("Open colo buffer for write failed"); >>>> + goto out; >>>> + } >>>> + >>>> + /* suspend and save vm state to colo buffer */ >>>> + qemu_mutex_lock_iothread(); >>>> + vm_stop_force_state(RUN_STATE_COLO); >>>> + qemu_mutex_unlock_iothread(); >>>> + DPRINTF("vm is stoped\n"); >>>> + >>>> + /* Disable block migration */ >>>> + s->params.blk = 0; >>>> + s->params.shared = 0; >>>> + qemu_savevm_state_begin(trans, &s->params); >>>> + qemu_mutex_lock_iothread(); >>>> + qemu_savevm_state_complete(trans); >>>> + qemu_mutex_unlock_iothread(); >>>> >>>> - /* TODO: suspend and save vm state to colo buffer */ >>>> + qemu_fflush(trans); >>>> >>>> ret = colo_ctl_put(s->file, COLO_CHECKPOINT_SEND); >>>> if (ret < 0) { >>>> goto out; >>>> } >>>> + /* we send the total size of the vmstate first */ >>>> + size = qsb_get_length(colo_buffer); >>>> + ret = colo_ctl_put(s->file, size); >>>> + if (ret < 0) { >>>> + goto out; >>>> + } >>>> >>>> - /* TODO: send vmstate to slave */ >>>> - >>>> + qsb_put_buffer(s->file, colo_buffer, size); >>>> + qemu_fflush(s->file); >>>> + ret = qemu_file_get_error(s->file); >>>> + if (ret < 0) { >>>> + goto out; >>>> + } >>>> ret = colo_ctl_get(control, COLO_CHECKPOINT_RECEIVED); >>>> if (ret < 0) { >>>> goto out; >>>> @@ -154,9 +190,18 @@ static int colo_do_checkpoint_transaction(MigrationState *s, QEMUFile *control) >>>> } >>>> DPRINTF("got COLO_CHECKPOINT_LOADED\n"); >>>> >>>> - /* TODO: resume master */ >>>> + ret = 0; >>>> + /* resume master */ >>>> + qemu_mutex_lock_iothread(); >>>> + vm_start(); >>>> + qemu_mutex_unlock_iothread(); >>>> + DPRINTF("vm resume to run again\n"); >>>> >>>> out: >>>> + if (trans) { >>>> + qemu_fclose(trans); >>>> + } >>>> + >>>> return ret; >>>> } >>>> >>>> @@ -182,6 +227,12 @@ static void *colo_thread(void *opaque) >>>> } >>>> DPRINTF("get COLO_READY\n"); >>>> >>>> + colo_buffer = qsb_create(NULL, COLO_BUFFER_BASE_SIZE); >>>> + if (colo_buffer == NULL) { >>>> + error_report("Failed to allocate colo buffer!"); >>>> + goto out; >>>> + } >>>> + >>>> qemu_mutex_lock_iothread(); >>>> vm_start(); >>>> qemu_mutex_unlock_iothread(); >>>> @@ -197,6 +248,9 @@ static void *colo_thread(void *opaque) >>>> out: >>>> migrate_set_state(s, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED); >>>> >>>> + qsb_free(colo_buffer); >>>> + colo_buffer = NULL; >>>> + >>>> if (colo_control) { >>>> qemu_fclose(colo_control); >>>> } >>>> diff --git a/savevm.c b/savevm.c >>>> index 3b0e222..cd7ec27 100644 >>>> --- a/savevm.c >>>> +++ b/savevm.c >>>> @@ -42,7 +42,7 @@ >>>> #include "qemu/iov.h" >>>> #include "block/snapshot.h" >>>> #include "block/qapi.h" >>>> - >>>> +#include "migration/migration-colo.h" >>>> >>>> #ifndef ETH_P_RARP >>>> #define ETH_P_RARP 0x8035 >>>> -- >>>> 1.7.12.4 >>>> >>>> >>> -- >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >>> >>> . >>> >> >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > . >