From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:42437) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aIyTJ-0006b2-GH for qemu-devel@nongnu.org; Tue, 12 Jan 2016 07:54:51 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aIyTF-00089L-FI for qemu-devel@nongnu.org; Tue, 12 Jan 2016 07:54:49 -0500 Received: from szxga01-in.huawei.com ([58.251.152.64]:8385) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aIyTE-00087d-84 for qemu-devel@nongnu.org; Tue, 12 Jan 2016 07:54:45 -0500 References: <1450167779-9960-1-git-send-email-zhang.zhanghailiang@huawei.com> <1450167779-9960-12-git-send-email-zhang.zhanghailiang@huawei.com> <87twnepz4h.fsf@blackfin.pond.sub.org> <56795103.5060001@huawei.com> <878u3wz27f.fsf@blackfin.pond.sub.org> From: Hailiang Zhang Message-ID: <5694F776.8050802@huawei.com> Date: Tue, 12 Jan 2016 20:54:14 +0800 MIME-Version: 1.0 In-Reply-To: <878u3wz27f.fsf@blackfin.pond.sub.org> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH COLO-Frame v12 11/38] COLO: Add a new RunState RUN_STATE_COLO List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Markus Armbruster Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, qemu-devel@nongnu.org, peter.huangpeng@huawei.com, arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com, dgilbert@redhat.com, hongyang.yang@easystack.cn On 2016/1/11 21:16, Markus Armbruster wrote: > Hailiang Zhang writes: > >> On 2015/12/19 17:27, Markus Armbruster wrote: >>> zhanghailiang writes: >>> >>>> Guest will enter this state when paused to save/restore VM state >>>> under colo checkpoint. >>>> >>>> Cc: Eric Blake >>>> Cc: Markus Armbruster >>>> Signed-off-by: zhanghailiang >>>> Signed-off-by: Li Zhijian >>>> Signed-off-by: Gonglei >>>> Reviewed-by: Dr. David Alan Gilbert >>>> Reviewed-by: Eric Blake >>>> --- >>>> qapi-schema.json | 5 ++++- >>>> vl.c | 8 ++++++++ >>>> 2 files changed, 12 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/qapi-schema.json b/qapi-schema.json >>>> index 85f7800..0423b47 100644 >>>> --- a/qapi-schema.json >>>> +++ b/qapi-schema.json >>>> @@ -154,12 +154,15 @@ >>>> # @watchdog: the watchdog action is configured to pause and has been triggered >>>> # >>>> # @guest-panicked: guest has been panicked as a result of guest OS panic >>>> +# >>>> +# @colo: guest is paused to save/restore VM state under colo checkpoint (since >>>> +# 2.6) >>>> ## >>>> { 'enum': 'RunState', >>>> 'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused', >>>> 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm', >>>> 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog', >>>> - 'guest-panicked' ] } >>>> + 'guest-panicked', 'colo' ] } >>>> >>>> ## >>>> # @StatusInfo: >>>> diff --git a/vl.c b/vl.c >>>> index f84fde8..fca630b 100644 >>>> --- a/vl.c >>>> +++ b/vl.c >>>> @@ -594,6 +594,7 @@ static const RunStateTransition runstate_transitions_def[] = { >>>> { RUN_STATE_INMIGRATE, RUN_STATE_WATCHDOG }, >>>> { RUN_STATE_INMIGRATE, RUN_STATE_GUEST_PANICKED }, >>>> { RUN_STATE_INMIGRATE, RUN_STATE_FINISH_MIGRATE }, >>>> + { RUN_STATE_INMIGRATE, RUN_STATE_COLO }, >>>> >>>> { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED }, >>>> { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE }, >>>> @@ -603,6 +604,7 @@ static const RunStateTransition runstate_transitions_def[] = { >>>> >>>> { RUN_STATE_PAUSED, RUN_STATE_RUNNING }, >>>> { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE }, >>>> + { RUN_STATE_PAUSED, RUN_STATE_COLO}, >>>> >>>> { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING }, >>>> { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE }, >>>> @@ -613,9 +615,12 @@ static const RunStateTransition runstate_transitions_def[] = { >>>> >>>> { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING }, >>>> { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE }, >>>> + { RUN_STATE_FINISH_MIGRATE, RUN_STATE_COLO}, >>>> >>>> { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING }, >>>> >>>> + { RUN_STATE_COLO, RUN_STATE_RUNNING }, >>>> + >>>> { RUN_STATE_RUNNING, RUN_STATE_DEBUG }, >>>> { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR }, >>>> { RUN_STATE_RUNNING, RUN_STATE_IO_ERROR }, >>>> @@ -626,6 +631,7 @@ static const RunStateTransition runstate_transitions_def[] = { >>>> { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN }, >>>> { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG }, >>>> { RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED }, >>>> + { RUN_STATE_RUNNING, RUN_STATE_COLO}, >>>> >>>> { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING }, >>>> >>>> @@ -636,9 +642,11 @@ static const RunStateTransition runstate_transitions_def[] = { >>>> { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED }, >>>> { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING }, >>>> { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE }, >>>> + { RUN_STATE_SUSPENDED, RUN_STATE_COLO}, >>>> >>>> { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, >>>> { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, >>>> + { RUN_STATE_WATCHDOG, RUN_STATE_COLO}, >>>> >>>> { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, >>>> { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, >>> >>> Pardon my ignorance, but could you explain the new run state in a bit >>> more detail for me? >>> >> >> OK, in normally, we only need switch between COLO and RUNNING state. >> But we can't forbid users to issue other command while VM is COLO state. >> >> In every checkpoint, we have to pause to send VM's state to SVM, and before we >> pause VM, users may issue 'stop' command, which will change state to >> 'RUN_STATE_PAUSE', >> we don't want to abort VM because of this command. (Actually, we will >> support 'stop' VM >> during VM is in COLO state). So we need the state machine >> 'RUN_STATE_PAUSED -> RUN_STATE_COLO'. > > What's the next state then? > We may switch to RUN_STATE_RUNNING, actually, here, the RUN_STATE_COLO is only used to indicate that VM is stopped in COLO process. >> We enter COLO state just after a full migration process which the last >> state will be >> 'RUN_STATE_FINISH_MIGRATE' or 'RUN_STATE_INMIGRATE', before we enter >> COLO loop, we may get >> 'x-colo-lost-heartbeat', and will run into 'RUN_STATE_COLO' pause, so we need >> state machines 'RUN_STATE_FINISH_MIGRATE -> RUN_STATE_COLO'and >> 'RUN_STATE_INMIGRATE, RUN_STATE_COLO'. >> The reason we need RUN_STATE_SUSPENDED -> RUN_STATE_COLO is, guest or >> users may issue standby command. >> We need to ensure VM not be crashed. >> >> Actually, we may need more states which can go to 'colo' state, maybe >> just follow the cases of >> 'MIGRATE' state. > > I believe we should fully work out the state transitions added by COLO. > I like to write that down in this form: > > (state, trigger) -> (action, state') > I'm a little confused, for runstate_transitions_def, it seems that, the state transition is a simple way: (state1, state2). Here we only switch to RUN_STATE_COLO state when we need to do something with VM is paused. > Example: > > (running, checkpoint) -> (begin-checkpointing, colo) > Do you want me to add these new states into runstate_transitions_def ? What's the real status (running or stopping) of 'checkpoint' and 'colo' for VM here ? > with a suitable explanation of 'checkpoint' and 'begin-checkpointing'. > > For brevity, multiple > > (state1, trigger) -> (action, state') > (state2, trigger) -> (action, state') > ... > (stateN, trigger) -> (action, state') > > can be abbreviated to > > ({state1, state2, stateN}, trigger) -> (action, state') > > Example: > > ({running, paused, ...}, checkpoint) -> (begin-checkpointing, colo) > > For clarity, chains of state transitions should be described in the > order they happen. > > Pictures showing the states connected with transition arrows labelled > with the trigger can help. > > Two properties to check: > > 1. Correctness: every state transition thus written down does the right > thing. > > 2. Completeness: for every pair (state, trigger), we got a state > transition, or an explanation why it cannot happen. > >> Thanks, >> zhanghailiang >> >>> Your additions to runstate_transitions_def[] show we can go *from* state >>> 'colo' only to state 'running', but we can go *to* state 'colo' from >>> various other states. This may well be sane, but it's not *obviously* >>> sane :) > > . >