From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42843)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1aDOs2-0007H0-Jv
	for qemu-devel@nongnu.org; Sun, 27 Dec 2015 22:53:20 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1aDOrz-0001VS-2h
	for qemu-devel@nongnu.org; Sun, 27 Dec 2015 22:53:18 -0500
Received: from szxga02-in.huawei.com ([119.145.14.65]:14766)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1aDOrx-0001Qo-3L
	for qemu-devel@nongnu.org; Sun, 27 Dec 2015 22:53:14 -0500
References: <1450167779-9960-1-git-send-email-zhang.zhanghailiang@huawei.com>
	<1450167779-9960-32-git-send-email-zhang.zhanghailiang@huawei.com>
	<20151218105357.GB2459@work-vm>
From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Message-ID: <5680B07A.9010004@huawei.com>
Date: Mon, 28 Dec 2015 11:46:02 +0800
MIME-Version: 1.0
In-Reply-To: <20151218105357.GB2459@work-vm>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v12 31/38] COLO: Separate the
 process of saving/loading ram and device state
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com, hongyang.yang@easystack.cn

On 2015/12/18 18:53, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> We separate the process of saving/loading ram and device state when do checkpoint,
>> we add new helpers for save/load ram/device. With this change, we can directly
>> transfer ram from master to slave without using QEMUSizeBuffer as assistant,
>> which also reduce the size of extra memory been used during checkpoint.
>>
>> Besides, we move the colo_flush_ram_cache to the proper position after the
>> above change.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>> v11:
>> - Remove load configuration section in qemu_loadvm_state_begin()
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>   include/sysemu/sysemu.h |   6 +++
>>   migration/colo.c        |  43 ++++++++++++----
>>   migration/ram.c         |   5 --
>>   migration/savevm.c      | 132 ++++++++++++++++++++++++++++++++++++++++++++++--
>>   4 files changed, 168 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>> index 91eeda3..5deae53 100644
>> --- a/include/sysemu/sysemu.h
>> +++ b/include/sysemu/sysemu.h
>> @@ -133,7 +133,13 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
>>                                              uint64_t *start_list,
>>                                              uint64_t *length_list);
>>
>> +int qemu_save_ram_precopy(QEMUFile *f);
>> +int qemu_save_device_state(QEMUFile *f);
>> +
>>   int qemu_loadvm_state(QEMUFile *f);
>> +int qemu_loadvm_state_begin(QEMUFile *f);
>> +int qemu_load_ram_state(QEMUFile *f);
>> +int qemu_load_device_state(QEMUFile *f);
>>
>>   typedef enum DisplayType
>>   {
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 62a0444..d253d64 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -272,21 +272,32 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>           goto out;
>>       }
>>
>> +    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>>       /* Disable block migration */
>>       s->params.blk = 0;
>>       s->params.shared = 0;
>> -    qemu_savevm_state_header(trans);
>> -    qemu_savevm_state_begin(trans, &s->params);
>> -    qemu_mutex_lock_iothread();
>> -    qemu_savevm_state_complete_precopy(trans, false);
>> -    qemu_mutex_unlock_iothread();
>> -
>> -    qemu_fflush(trans);
>> +    qemu_savevm_state_begin(s->to_dst_file, &s->params);
>> +    ret = qemu_file_get_error(s->to_dst_file);
>> +    if (ret < 0) {
>> +        error_report("save vm state begin error\n");
>
> You don't need \n in error_report (Markus is trying to get rid
> of all the existing cases where people do that!)
>

I will remove it.

>> +        goto out;
>> +    }
>>
>> -    ret = colo_put_cmd(s->to_dst_file, COLO_COMMAND_VMSTATE_SEND);
>> +    qemu_mutex_lock_iothread();
>> +    /* Note: device state is saved into buffer */
>> +    ret = qemu_save_device_state(trans);
>>       if (ret < 0) {
>> +        error_report("save device state error\n");
>> +        qemu_mutex_unlock_iothread();
>>           goto out;
>>       }
>> +    qemu_fflush(trans);
>> +    qemu_save_ram_precopy(s->to_dst_file);
>> +    qemu_mutex_unlock_iothread();
>> +
>
> It's interesting you're saving the devices and then saving the RAM,
> where as in a normal migration we always save the RAM first and then
> the devices;  I don't _think_ it makes any difference but I thought
> I'd point it out.
>

Yes, you are right, i will adjust their orders.

>>       /* we send the total size of the vmstate first */
>>       size = qsb_get_length(buffer);
>>       ret = colo_put_cmd_value(s->to_dst_file, COLO_COMMAND_VMSTATE_SIZE, size);
>> @@ -545,6 +556,16 @@ void *colo_process_incoming_thread(void *opaque)
>>               goto out;
>>           }
>>
>> +        ret = qemu_loadvm_state_begin(mis->from_src_file);
>> +        if (ret < 0) {
>> +            error_report("load vm state begin error, ret=%d", ret);
>> +            goto out;
>> +        }
>> +        ret = qemu_load_ram_state(mis->from_src_file);
>> +        if (ret < 0) {
>> +            error_report("load ram state error");
>> +            goto out;
>> +        }
>>           /* read the VM state total size first */
>>           ret = colo_get_cmd_value(mis->from_src_file,
>>                                    COLO_COMMAND_VMSTATE_SIZE, &value);
>> @@ -577,8 +598,10 @@ void *colo_process_incoming_thread(void *opaque)
>>           qemu_mutex_lock_iothread();
>>           qemu_system_reset(VMRESET_SILENT);
>>           vmstate_loading = true;
>> -        if (qemu_loadvm_state(fb) < 0) {
>> -            error_report("COLO: loadvm failed");
>> +        colo_flush_ram_cache();
>> +        ret = qemu_load_device_state(fb);
>> +        if (ret < 0) {
>> +            error_report("COLO: load device state failed\n");
>>               vmstate_loading = false;
>>               qemu_mutex_unlock_iothread();
>>               goto out;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 8ff7f7c..45d9332 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2458,7 +2458,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>        * be atomic
>>        */
>>       bool postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING;
>> -    bool need_flush = false;
>>
>>       seq_iter++;
>>
>> @@ -2493,7 +2492,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>               /* After going into COLO, we should load the Page into colo_cache */
>>               if (ram_cache_enable) {
>>                   host = colo_cache_from_block_offset(block, addr);
>> -                need_flush = true;
>>               } else {
>>                   host = host_from_ram_block_offset(block, addr);
>>               }
>> @@ -2588,9 +2586,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>>
>>       rcu_read_unlock();
>>
>> -    if (!ret  && ram_cache_enable && need_flush) {
>> -        colo_flush_ram_cache();
>> -    }
>>       DPRINTF("Completed load of VM with exit code %d seq iteration "
>>               "%" PRIu64 "\n", ret, seq_iter);
>>       return ret;
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index c7c26d8..94c0d10 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -50,6 +50,7 @@
>>   #include "qemu/iov.h"
>>   #include "block/snapshot.h"
>>   #include "block/qapi.h"
>> +#include "migration/colo.h"
>>
>>
>>   #ifndef ETH_P_RARP
>> @@ -923,6 +924,10 @@ void qemu_savevm_state_begin(QEMUFile *f,
>>               break;
>>           }
>>       }
>> +    if (migration_in_colo_state()) {
>> +        qemu_put_byte(f, QEMU_VM_EOF);
>> +        qemu_fflush(f);
>> +    }
>>   }
>>
>>   /*
>> @@ -1192,13 +1197,44 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>       return ret;
>>   }
>>
>> -static int qemu_save_device_state(QEMUFile *f)
>> +int qemu_save_ram_precopy(QEMUFile *f)
>>   {
>>       SaveStateEntry *se;
>> +    int ret = 0;
>>
>> -    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
>> -    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
>> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>> +        if (!se->ops || !se->ops->save_live_complete_precopy) {
>> +            continue;
>> +        }
>> +        if (se->ops && se->ops->is_active) {
>> +            if (!se->ops->is_active(se->opaque)) {
>> +                continue;
>> +            }
>> +        }
>> +        trace_savevm_section_start(se->idstr, se->section_id);
>
> Please update the trace_ names to match the function.
>

OK.

>> +
>> +        save_section_header(f, se, QEMU_VM_SECTION_END);
>>
>> +        ret = se->ops->save_live_complete_precopy(f, se->opaque);
>> +        trace_savevm_section_end(se->idstr, se->section_id, ret);
>> +        save_section_footer(f, se);
>> +        if (ret < 0) {
>> +            qemu_file_set_error(f, ret);
>> +            return ret;
>> +        }
>> +    }
>> +    qemu_put_byte(f, QEMU_VM_EOF);
>> +
>> +    return 0;
>> +}
>
> OK, that function is a bit odd - you're relying on a device having
> save_live_complete_precopy to let you know that it's a RAM block; it's

The function name is not exact. It's not only ram, but also include other
devices that use save_live_complete_precopy(), they should be all saved in this
function. For now, it is ram only.

> currently true but it'll get more interesting if anyone tries to
> add any other postcopy devices.  Please add a comment at least
> to point that out.
>

OK.

>> +
>> +int qemu_save_device_state(QEMUFile *f)
>> +{
>> +    SaveStateEntry *se;
>> +
>> +    if (!migration_in_colo_state()) {
>> +        qemu_savevm_state_header(f);
>> +    }
>>       cpu_synchronize_all_states();
>>
>>       QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>> @@ -1938,6 +1974,96 @@ int qemu_loadvm_state(QEMUFile *f)
>>       return ret;
>>   }
>>
>> +int qemu_loadvm_state_begin(QEMUFile *f)
>> +{
>> +    uint8_t section_type;
>> +    int ret = -1;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +
>> +    if (!mis) {
>> +        error_report("qemu_loadvm_state_begin");
>> +        return -EINVAL;
>> +    }
>
> an odd error; how can that happen?
>

It should never happen, it is needless. I will remove it.

>> +    /* CleanUp */
>> +    loadvm_free_handlers(mis);
>
> I don't understand why you do that here?
>

We will do a full migration process before entering COLO state,
And we will save the section info (LoadStateEntry) into a list while load
VM state. Please see qemu_loadvm_section_start_full().
We will re-save the section info in every load VM state process. So
we need to cleanup the previous section info that stored in
mis->loadvm_handlers list.
Hmm, besides, we should release it after finishing loading VM state for every
checkpoint process, we didn't do that in this version, and there will be an
memory leak error.
I will fix this bug in next version.

>> +
>> +    if (qemu_savevm_state_blocked(NULL)) {
>> +        return -EINVAL;
>> +    }
>
> The other calls to that function print the error it returns!
>

OK, i will fix it.

>> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> +        if (section_type != QEMU_VM_SECTION_START) {
>> +            error_report("QEMU_VM_SECTION_START");
>> +            ret = -EINVAL;
>> +            goto out;
>> +        }
>> +        ret = qemu_loadvm_section_start_full(f, mis);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>> +    ret = qemu_file_get_error(f);
>> +    if (ret == 0) {
>> +        return 0;
>> +     }
>
> That 'if' isn't needed - just remove the 3 lines and it will do the same
> thing!
>

OK.

>> +out:
>> +    return ret;
>> +}
>> +
>> +int qemu_load_ram_state(QEMUFile *f)
>> +{
>> +    uint8_t section_type;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +    int ret = -1;
>> +
>> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> +        if (section_type != QEMU_VM_SECTION_PART &&
>> +            section_type != QEMU_VM_SECTION_END) {
>> +            error_report("load ram state, not get "
>> +                         "QEMU_VM_SECTION_FULL or QEMU_VM_SECTION_END");
>> +            return -EINVAL;
>> +        }
>> +        ret = qemu_loadvm_section_part_end(f, mis);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>> +    ret = qemu_file_get_error(f);
>> +    if (ret == 0) {
>> +        return 0;
>> +     }
>> +out:
>> +    return ret;
>> +}
>> +
>> +int qemu_load_device_state(QEMUFile *f)
>> +{
>> +    uint8_t section_type;
>> +    MigrationIncomingState *mis = migration_incoming_get_current();
>> +    int ret = -1;
>> +
>> +    while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>> +        if (section_type != QEMU_VM_SECTION_FULL) {
>> +            error_report("load device state error: "
>> +                         "Not get QEMU_VM_SECTION_FULL");
>> +            return -EINVAL;
>> +        }
>> +        ret = qemu_loadvm_section_start_full(f, mis);
>> +        if (ret < 0) {
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    ret = qemu_file_get_error(f);
>> +
>> +    cpu_synchronize_all_post_init();
>> +    if (ret == 0) {
>> +        return 0;
>> +    }
>> +out:
>> +    return ret;
>> +}
>> +
>
> These three functions are all very similar;  would it be easier
> just to call qemu_loadvm_state_main ?  Perhaps add a flag/enum

Yes, we can call qemu_loadvm_state_main() directly in these three functions.
We don't have to change the define of qemu_loadvm_state_main(). This is more simple.

Thanks.
Hailiang

> parameter to it for it to check which section_types are allowed
> in the 3 different cases?
>
> Dave
>
>>   void hmp_savevm(Monitor *mon, const QDict *qdict)
>>   {
>>       BlockDriverState *bs, *bs1;
>> --
>> 1.8.3.1
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>