From: "Rao, Lei" <lei.rao@intel.com>
To: Lukas Straub <lukasstraub2@web.de>
Cc: "zhang.zhanghailiang@huawei.com" <zhang.zhanghailiang@huawei.com>,
	"lizhijian@cn.fujitsu.com" <lizhijian@cn.fujitsu.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"jasowang@redhat.com" <jasowang@redhat.com>,
	"dgilbert@redhat.com" <dgilbert@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Zhang, Chen" <chen.zhang@intel.com>
Subject: RE: [PATCH 02/10] Fix the qemu crash when guest shutdown during checkpoint
Date: Fri, 29 Jan 2021 02:57:57 +0000
Message-ID: <SN6PR11MB3103D9E313084D9BDB98E2F0FDB99@SN6PR11MB3103.namprd11.prod.outlook.com>
In-Reply-To: <20210127192416.525baaaa@gecko.fritz.box>

colo_do_checkpoint_transaction() sets the state to RUN_STATE_COLO. If the guest powers off or shuts down at that point, the QEMU main thread calls vm_shutdown(), which sets the state to RUN_STATE_SHUTDOWN.
The transition from RUN_STATE_COLO to RUN_STATE_SHUTDOWN is not defined in runstate_transitions_def, so QEMU crashes. The probability is small, but it can still happen.

By the way, do you have any comments on the other patches?
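
To illustrate the mechanism described above, here is a minimal standalone sketch. It is not QEMU code: the names State, Transition, transitions_init(), transition_is_valid() and state_set() are made up for the example, the table holds only three states, and state_set() returns false where the real runstate_set() aborts (as the backtrace quoted below shows). The allow_colo_shutdown flag stands in for the table entry added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced set of run states (the real enum is much larger). */
typedef enum {
    STATE_RUNNING,
    STATE_COLO,
    STATE_SHUTDOWN,
    STATE__MAX
} State;

typedef struct {
    State from;
    State to;
} Transition;

/* Allowed transitions without the patch: COLO -> SHUTDOWN is missing. */
static const Transition transitions_def[] = {
    { STATE_RUNNING, STATE_COLO },
    { STATE_RUNNING, STATE_SHUTDOWN },
    { STATE_COLO, STATE_RUNNING },
};

/* Lookup matrix built from the table above. */
static bool valid[STATE__MAX][STATE__MAX];

/* Rebuild the matrix; optionally add the entry proposed by the patch. */
void transitions_init(bool allow_colo_shutdown)
{
    size_t i;

    memset(valid, 0, sizeof(valid));
    for (i = 0; i < sizeof(transitions_def) / sizeof(transitions_def[0]); i++) {
        const Transition *t = &transitions_def[i];

        assert(t->from < STATE__MAX && t->to < STATE__MAX);
        valid[t->from][t->to] = true;
    }
    if (allow_colo_shutdown) {
        valid[STATE_COLO][STATE_SHUTDOWN] = true;
    }
}

bool transition_is_valid(State from, State to)
{
    return valid[from][to];
}

/*
 * Apply a transition only if the table allows it.
 * Returns false on an undefined transition instead of aborting.
 */
bool state_set(State *current, State new_state)
{
    if (!transition_is_valid(*current, new_state)) {
        return false;
    }
    *current = new_state;
    return true;
}
```

In this sketch, after transitions_init(false) a state_set() from STATE_COLO to STATE_SHUTDOWN is rejected, which corresponds to the "invalid runstate transition" abort; after transitions_init(true) the same call succeeds.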

Thanks,
Lei.

-----Original Message-----
From: Lukas Straub <lukasstraub2@web.de> 
Sent: Thursday, January 28, 2021 2:24 AM
To: Rao, Lei <lei.rao@intel.com>
Cc: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com; jasowang@redhat.com; zhang.zhanghailiang@huawei.com; quintela@redhat.com; dgilbert@redhat.com; qemu-devel@nongnu.org
Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown during checkpoint

On Thu, 21 Jan 2021 01:48:31 +0000
"Rao, Lei" <lei.rao@intel.com> wrote:

> The Primary VM can be shut down when it is in COLO state, which may trigger this bug.

Do you have a backtrace for this bug?

> About 'shutdown' -> 'colo' -> 'running': I think you are right, I did hit the problems you described. For 'shutdown' -> 'colo', the fix (5647051f432b7c9b57525470b0a79a31339062d2) has been merged.
> Recently, I found another bug during testing:
> 	qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'running'
>     	Aborted (core dumped)
> The gdb backtrace is as follows:
>     #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>     #1  0x00007faa3d613859 in __GI_abort () at abort.c:79
>     #2  0x000055c5a21268fd in runstate_set (new_state=RUN_STATE_RUNNING) at vl.c:723
>     #3  0x000055c5a1f8cae4 in vm_prepare_start () at /home/workspace/colo-qemu/cpus.c:2206
>     #4  0x000055c5a1f8cb1b in vm_start () at /home/workspace/colo-qemu/cpus.c:2213
>     #5  0x000055c5a2332bba in migration_iteration_finish (s=0x55c5a4658810) at migration/migration.c:3376
>     #6  0x000055c5a2332f3b in migration_thread (opaque=0x55c5a4658810) at migration/migration.c:3527
>     #7  0x000055c5a251d68a in qemu_thread_start (args=0x55c5a5491a70) at util/qemu-thread-posix.c:519
>     #8  0x00007faa3d7e9609 in start_thread (arg=<optimized out>) at pthread_create.c:477
>     #9  0x00007faa3d710293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
> For the bug, I made the following changes:
> 	@@ -3379,7 +3379,9 @@ static void migration_iteration_finish(MigrationState *s)
>      case MIGRATION_STATUS_CANCELLED:
>      case MIGRATION_STATUS_CANCELLING:
>          if (s->vm_was_running) {
> -            vm_start();
> +            if (!runstate_check(RUN_STATE_SHUTDOWN)) {
> +                vm_start();
> +            }
>          } else {
>              if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
>                  runstate_set(RUN_STATE_POSTMIGRATE);
> 				 
> I will send the patch to the community after more testing.
> 
> Thanks,
> Lei.
> 
> -----Original Message-----
> From: Lukas Straub <lukasstraub2@web.de>
> Sent: Thursday, January 21, 2021 3:13 AM
> To: Rao, Lei <lei.rao@intel.com>
> Cc: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com; jasowang@redhat.com; zhang.zhanghailiang@huawei.com; quintela@redhat.com; dgilbert@redhat.com; qemu-devel@nongnu.org
> Subject: Re: [PATCH 02/10] Fix the qemu crash when guest shutdown during checkpoint
> 
> On Wed, 13 Jan 2021 10:46:27 +0800
> leirao <lei.rao@intel.com> wrote:
> 
> > From: "Rao, Lei" <lei.rao@intel.com>
> > 
> > This patch fixes the following:
> >     qemu-system-x86_64: invalid runstate transition: 'colo' ->'shutdown'
> >     Aborted (core dumped)
> > 
> > Signed-off-by: Lei Rao <lei.rao@intel.com>
> 
> I wonder how that is possible, since the VM is stopped during 'colo' state.
> 
> Unrelated to this patch, I think this area needs some work since the following unintended runstate transition is possible:
> 'shutdown' -> 'colo' -> 'running'.
> 
> > ---
> >  softmmu/runstate.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> > index 636aab0..455ad0d 100644
> > --- a/softmmu/runstate.c
> > +++ b/softmmu/runstate.c
> > @@ -125,6 +125,7 @@ static const RunStateTransition runstate_transitions_def[] = {
> >      { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
> >  
> >      { RUN_STATE_COLO, RUN_STATE_RUNNING },
> > +    { RUN_STATE_COLO, RUN_STATE_SHUTDOWN },
> >  
> >      { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
> >      { RUN_STATE_RUNNING, RUN_STATE_INTERNAL_ERROR },
> 