From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1448357149-17572-1-git-send-email-zhang.zhanghailiang@huawei.com>
	<1448357149-17572-24-git-send-email-zhang.zhanghailiang@huawei.com>
	<20151210183457.GJ2570@work-vm>
	<566A8146.6010700@huawei.com>
	<20151211092159.GB2987@work-vm>
From: Hailiang Zhang
Message-ID: <566A99AF.9050004@huawei.com>
Date: Fri, 11 Dec 2015 17:38:55 +0800
MIME-Version: 1.0
In-Reply-To: <20151211092159.GB2987@work-vm>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v11 23/39] COLO: Implement failover work for Primary VM
To: "Dr. David Alan Gilbert"
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com,
	eddie.dong@intel.com, peter.huangpeng@huawei.com, qemu-devel@nongnu.org,
	arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com,
	hongyang.yang@easystack.cn

On 2015/12/11 17:22, Dr. David Alan Gilbert wrote:
> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>> On 2015/12/11 2:34, Dr. David Alan Gilbert wrote:
>>> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>>>> For PVM, if there is failover request from users.
>>>> The colo thread will exit the loop while the failover BH does the
>>>> cleanup work and resumes VM.
>>>>
>>>> Signed-off-by: zhanghailiang
>>>> Signed-off-by: Li Zhijian
>>>> ---
>>>> v11:
>>>> - Don't call migration_end() in primary_vm_do_failover(),
>>>>   The cleanup work will be done in migration_thread().
>>>> - Remove vm_start() in primary_vm_do_failover() which also been done
>>>>   in migraiton_thread()
>>>> v10:
>>>> - Call migration_end() in primary_vm_do_failover()
>>>> ---
>>>>  include/migration/colo.h     |  3 +++
>>>>  include/migration/failover.h |  1 +
>>>>  migration/colo-failover.c    |  7 +++++-
>>>>  migration/colo.c             | 56 ++++++++++++++++++++++++++++++++++++++++++--
>>>>  4 files changed, 64 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/migration/colo.h b/include/migration/colo.h
>>>> index ba27719..0b02e95 100644
>>>> --- a/include/migration/colo.h
>>>> +++ b/include/migration/colo.h
>>>> @@ -32,4 +32,7 @@ void *colo_process_incoming_thread(void *opaque);
>>>>  bool migration_incoming_in_colo_state(void);
>>>>
>>>>  COLOMode get_colo_mode(void);
>>>> +
>>>> +/* failover */
>>>> +void colo_do_failover(MigrationState *s);
>>>>  #endif
>>>> diff --git a/include/migration/failover.h b/include/migration/failover.h
>>>> index 882c625..fba3931 100644
>>>> --- a/include/migration/failover.h
>>>> +++ b/include/migration/failover.h
>>>> @@ -26,5 +26,6 @@ void failover_init_state(void);
>>>>  int failover_set_state(int old_state, int new_state);
>>>>  int failover_get_state(void);
>>>>  void failover_request_active(Error **errp);
>>>> +bool failover_request_is_active(void);
>>>>
>>>>  #endif
>>>> diff --git a/migration/colo-failover.c b/migration/colo-failover.c
>>>> index 1b1be24..0c525da 100644
>>>> --- a/migration/colo-failover.c
>>>> +++ b/migration/colo-failover.c
>>>> @@ -32,7 +32,7 @@ static void colo_failover_bh(void *opaque)
>>>>          error_report("Unkown error for failover, old_state=%d", old_state);
>>>>          return;
>>>>      }
>>>> -    /*TODO: Do failover work */
>>>> +    colo_do_failover(NULL);
>>>>  }
>>>>
>>>>  void failover_request_active(Error **errp)
>>>> @@ -67,6 +67,11 @@ int failover_get_state(void)
>>>>      return atomic_read(&failover_state);
>>>>  }
>>>>
>>>> +bool failover_request_is_active(void)
>>>> +{
>>>> +    return ((failover_get_state() != FAILOVER_STATUS_NONE));
>>>> +}
>>>> +
>>>>  void qmp_x_colo_lost_heartbeat(Error **errp)
>>>>  {
>>>>      if (get_colo_mode() == COLO_MODE_UNKNOWN) {
>>>> diff --git a/migration/colo.c b/migration/colo.c
>>>> index cedfc63..7a42fc6 100644
>>>> --- a/migration/colo.c
>>>> +++ b/migration/colo.c
>>>> @@ -41,6 +41,42 @@ bool migration_incoming_in_colo_state(void)
>>>>      return mis && (mis->state == MIGRATION_STATUS_COLO);
>>>>  }
>>>>
>>>> +static bool colo_runstate_is_stopped(void)
>>>> +{
>>>> +    return runstate_check(RUN_STATE_COLO) || !runstate_is_running();
>>>> +}
>>>> +
>>>> +static void primary_vm_do_failover(void)
>>>> +{
>>>> +    MigrationState *s = migrate_get_current();
>>>> +    int old_state;
>>>> +
>>>> +    if (s->state != MIGRATION_STATUS_FAILED) {
>>>> +        migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>>>> +                          MIGRATION_STATUS_COMPLETED);
>>>> +    }
>>>
>>> That's a little odd; it will only move to completed if the current
>>> state is MIGRATION_STATUS_COLO, but you only do it if the state isn't
>>> FAILED. You could remove the if and just call migrate_set_state
>>> like that and it would be safe as long as you really only expect
>>> to have to deal with MIGRATION_STATUS_COLO.
>>>
>>
>> Yes, you are right, i will remove the judgement, and set the state directly.
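
(To expand on why dropping that check is safe: as far as I understand it,
migrate_set_state() is essentially a compare-and-swap on the state field,
roughly like the sketch below -- not the exact code -- so an unconditional
call is a no-op unless the current state really is MIGRATION_STATUS_COLO.)

    /* Rough sketch of migrate_set_state()'s behaviour; it relies on
     * atomic_cmpxchg() from qemu/atomic.h. */
    void migrate_set_state(int *state, int old_state, int new_state)
    {
        /* The transition only happens if *state still equals old_state. */
        if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
            trace_migrate_set_state(new_state);
        }
    }
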
>>
>>>> +    old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
>>>> +                                   FAILOVER_STATUS_COMPLETED);
>>>> +    if (old_state != FAILOVER_STATUS_HANDLING) {
>>>> +        error_report("Serious error while do failover for Primary VM,"
>>>> +                     "old_state: %d", old_state);
>>>
>>> It's generally better to state the reason rather than say it's 'serious';
>>> so something like:
>>>    'Incorrect state (%d) while doing failover for Primary VM'
>>> tells you more, and looks less scary!
>>>
>>
>> Ha, OK, i will fix it.
>>
>>>> +        return;
>>>> +    }
>>>> +}
>>>> +
>>>> +void colo_do_failover(MigrationState *s)
>>>> +{
>>>> +    /* Make sure vm stopped while failover */
>>>> +    if (!colo_runstate_is_stopped()) {
>>>> +        vm_stop_force_state(RUN_STATE_COLO);
>>>> +    }
>>>
>>> That feels like a race? couldn't we be just at the end
>>> of taking a checkpoint and about to restart when you do
>>> the if, so it reads that it's currently stopped but
>>> then it restarts it by the time you have a chance to
>>> do anything?
>>
>> Do you mean, after we stopped VM in failover() but before done the failover work,
>> the migration (checkpoint) thread may starts VM just in the middle time ?
>> The colo_do_failover() is in the context of BH, it holds
>> the __lock__, so checkpoint thread has no chance to execute vm_start().
>
> What happens if the failover code was executed just before the checkpoint thread
> executed the _lock_, when the failover code finishes what happens?
> (I'm not sure this is a problem - I just thought I should check).
>

Er, in that case the COLO loop will exit, and the failover flag will
eventually be detected in the COLO process thread, so it is OK. But yes, we
should add more checks of the failover status.

Thanks,
Hailiang

>
>>> I see in patch 13 (Save PVM....) you have:
>>>         qemu_mutex_lock_iothread();
>>>         vm_start();
>>>         qemu_mutex_unlock_iothread();
>>>
>>> So maybe if that code is executed just before the
>>> failover, then it would stop at the _lock_, we would
>>> run here but then as soon as we finish wouldn't it vm_start
>>> there?
>>>
>>
>> But it seems that, we don't need to stop VM to do the failover work.
>> I'm not so sure, i will investigate if we can do it without stopping VM.
>
> Stopping it seems safer.
>

OK.

>>
>>>> +    if (get_colo_mode() == COLO_MODE_PRIMARY) {
>>>> +        primary_vm_do_failover();
>>>> +    }
>>>> +}
>>>> +
>>>>  /* colo checkpoint control helper */
>>>>  static int colo_ctl_put(QEMUFile *f, uint32_t cmd, uint64_t value)
>>>>  {
>>>> @@ -122,9 +158,22 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>>>>      }
>>>>
>>>>      qemu_mutex_lock_iothread();
>>>> +    if (failover_request_is_active()) {
>>>> +        qemu_mutex_unlock_iothread();
>>>> +        ret = -1;
>>>> +        goto out;
>>>> +    }
>>>>      vm_stop_force_state(RUN_STATE_COLO);
>>>>      qemu_mutex_unlock_iothread();
>>>>      trace_colo_vm_state_change("run", "stop");
>>>> +    /*
>>>> +     * failover request bh could be called after
>>>> +     * vm_stop_force_state so we check failover_request_is_active() again.
>>>> +     */
>>>> +    if (failover_request_is_active()) {
>>>> +        ret = -1;
>>>> +        goto out;
>>>> +    }
>>>
>>> I'm confused about why the check is needed specifically here;
>>> for example can't that happen at any point where we're ousite of the
>>> iothread lock? e.g. couldn't we set failover just a couple of
>>> lines lower, lets say just after the s->params.blk= 0 ?
>>>
>>
>> Yes, you are right, failover could happen at any place where we are not
>> holding iothread lock. We should checkpoint the failover status after every
>> important steps.
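
(For illustration, one way to do that check "after every important step" --
the helper below is hypothetical and not part of this series; it only uses
failover_request_is_active() from this patch:)

    /* Hypothetical helper: bail out of the checkpoint as soon as a
     * failover request is pending. */
    static int colo_check_failover(const char *step)
    {
        if (failover_request_is_active()) {
            error_report("failover requested after %s, abort checkpoint", step);
            return -1;
        }
        return 0;
    }

which colo_do_checkpoint_transaction() could call after each step that runs
outside (or drops) the iothread lock, e.g.:

        if (colo_check_failover("device state save") < 0) {
            ret = -1;
            goto out;
        }
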
I'll add more checks in the next version ...

>>
>>
>>>>      /* Disable block migration */
>>>>      s->params.blk = 0;
>>>> @@ -221,6 +270,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>>>      trace_colo_vm_state_change("stop", "run");
>>>>
>>>>      while (s->state == MIGRATION_STATUS_COLO) {
>>>> +        if (failover_request_is_active()) {
>>>> +            error_report("failover request");
>>>> +            goto out;
>>>> +        }
>>>> +
>>>>          current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>>>          if (current_time - checkpoint_time <
>>>>              s->parameters[MIGRATION_PARAMETER_CHECKPOINT_DELAY]) {
>>>> @@ -242,8 +296,6 @@ out:
>>>>      if (ret < 0) {
>>>>          error_report("%s: %s", __func__, strerror(-ret));
>>>>      }
>>>> -    migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
>>>> -                      MIGRATION_STATUS_COMPLETED);
>>>
>>> I had to think about this; but yes, I guess the only way out is via the failover
>>> which does the completed above.
>>>
>>
>> Yes, thanks.
>>
>> Hailiang
>>
>>> Dave
>>>
>>>>      qsb_free(buffer);
>>>>      buffer = NULL;
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>