From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34587)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1YuaiP-0002tk-3o
	for qemu-devel@nongnu.org; Tue, 19 May 2015 02:09:22 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1YuaiK-0004pY-Oj
	for qemu-devel@nongnu.org; Tue, 19 May 2015 02:09:20 -0400
Received: from szxga01-in.huawei.com ([58.251.152.64]:56946)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <zhang.zhanghailiang@huawei.com>) id 1YuaiK-0004p3-0X
	for qemu-devel@nongnu.org; Tue, 19 May 2015 02:09:16 -0400
Message-ID: <555AD367.3060308@huawei.com>
Date: Tue, 19 May 2015 14:08:39 +0800
From: zhanghailiang <zhang.zhanghailiang@huawei.com>
MIME-Version: 1.0
References: <1427347774-8960-1-git-send-email-zhang.zhanghailiang@huawei.com>
	<1427347774-8960-24-git-send-email-zhang.zhanghailiang@huawei.com>
	<20150518164859.GC14465@work-vm>
In-Reply-To: <20150518164859.GC14465@work-vm>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH v4 23/28] COLO: Improve checkpoint
 efficiency by do additional periodic checkpoint
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org, arei.gonglei@huawei.com, amit.shah@redhat.com, Yang Hongyang <yanghy@cn.fujitsu.com>, david@gibson.dropbear.id.au

On 2015/5/19 0:48, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Besides normal checkpoint which according to the result of net packets
>> comparing, We do additional checkpoint periodically, it will reduce the number
>> of dirty pages when do one checkpoint, if we don't do checkpoint for a long
>> time (This is a special case when the net packets is always consistent).
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
>> ---
>>   migration/colo.c | 29 +++++++++++++++++++++--------
>>   1 file changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 9ef4554..da5bc5e 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -10,6 +10,7 @@
>>    * later.  See the COPYING file in the top-level directory.
>>    */
>>
>> +#include "qemu/timer.h"
>>   #include "sysemu/sysemu.h"
>>   #include "migration/migration-colo.h"
>>   #include "qemu/error-report.h"
>> @@ -32,6 +33,13 @@
>>   */
>>   #define CHECKPOINT_MIN_PERIOD 100  /* unit: ms */
>>
>> +/*
>> + * force checkpoint timer: unit ms
>> + * this is large because COLO checkpoint will mostly depend on
>> + * COLO compare module.
>> + */
>> +#define CHECKPOINT_MAX_PEROID 10000
>> +
>>   enum {
>>       COLO_READY = 0x46,
>>
>> @@ -340,14 +348,7 @@ static void *colo_thread(void *opaque)
>>           proxy_checkpoint_req = colo_proxy_compare();
>>           if (proxy_checkpoint_req < 0) {
>>               goto out;
>> -        } else if (!proxy_checkpoint_req) {
>> -            /*
>> -             * No checkpoint is needed, wait for 1ms and then
>> -             * check if we need checkpoint again
>> -             */
>> -            g_usleep(1000);
>> -            continue;
>> -        } else {
>> +        } else if (proxy_checkpoint_req) {
>>               int64_t interval;
>>
>>               current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> @@ -357,8 +358,20 @@ static void *colo_thread(void *opaque)
>>                   g_usleep((1000*(CHECKPOINT_MIN_PERIOD - interval)));
>>               }
>>               DPRINTF("Net packets is not consistent!!!\n");
>> +            goto do_checkpoint;
>> +        }
>> +
>> +        /*
>> +         * No proxy checkpoint is request, wait for 100ms
>> +         * and then check if we need checkpoint again.
>> +         */
>> +        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> +        if (current_time - checkpoint_time < CHECKPOINT_MAX_PEROID) {
>> +            g_usleep(100000);
>> +            continue;
>
> This 100ms sleep is interesting - can you explain it's purpose; is it
> just to save CPU time in the colo thread?  It used to be 1ms (above
> and in the previous version).
>

You are right, it is just to save CPU time. I'm not sure but if '1ms' is a little too short ?
In our latest patch, we actually will send some dirty pages to slave side during this sleep time
if there are dirty pages.

> The MIN_PERIOD already stops the checkpoints being too close together,
> so this is a separate sleep from that.
>>           }
>>
>> +do_checkpoint:
>>           /* start a colo checkpoint */
>>           if (colo_do_checkpoint_transaction(s, colo_control)) {
>>               goto out;
>> --
>> 1.7.12.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>