From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Lukas Straub <lukasstraub2@web.de>,
	Juan Quintela <quintela@redhat.com>,
	Li Xiaohui <xiaohli@redhat.com>,
	qemu-devel@nongnu.org, Li Xiaohui <xiaohuixiaohli@redhat.com>,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>
Subject: Re: [PATCH 1/3] migration: Release return path early for paused postcopy
Date: Mon, 12 Jul 2021 18:44:42 +0100
Message-ID: <YOx/im9h/OJLRQ3N@work-vm>
In-Reply-To: <20210708190653.252961-2-peterx@redhat.com>

* Peter Xu (peterx@redhat.com) wrote:
> When a postcopy pause is triggered, we rely on the migration thread to clean up
> the to_dst_file handle, and on the return path thread to clean up the
> from_dst_file handle (which is stored in the local variable "rp").
> 
> During that process, the from_dst_file cleanup (qemu_fclose) is postponed until
> the handle is set up again by a postcopy recovery.
> 
> This worked before yank was introduced. With yank, we rely on the refcount of
> the IOC to correctly unregister the yank function in channel_close(). Without
> an early, timely release of the from_dst_file handle, the yank function is
> left over during a paused postcopy.
> 
> Without this patch, the steps below (quoted from Xiaohui) can crash QEMU on
> the source host:
> 
>   1. Boot vm on src host
>   2. Boot vm on dst host
>   3. Enable postcopy on src & dst host
>   4. Load stressapptest in vm and set postcopy speed to 50M
>   5. Start migration from src to dst host; switch to postcopy mode while migration is active.
>   6. While postcopy is active, bring down the network card used for the migration on dst host.
>   7. Wait until postcopy is paused on src & dst host.
>   8. Before bringing the network card back up, recover migration on dst host; this reports an error.
>   9. Ignore the error from step 8 and go on to recover migration on src host:
> 
>   After step 9, qemu on src host will core dump after some seconds:
>   qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
>   1.sh: line 38: 44662 Aborted                 (core dumped)
> 
> Reported-by: Li Xiaohui <xiaohuixiaohli@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

(and I can clean up the email address problem)
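
For anyone following along, here is a rough model of why the leftover handle trips the assertion. It is only a sketch, not QEMU code: Channel, File, file_open(), file_close(), instance_can_unregister() and yank_left_over() are all made-up names, and it assumes that to_dst_file and from_dst_file share one refcounted channel whose yank function goes away only when the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are QEMU's real types. */
typedef struct Channel {
    int refcount;          /* number of file handles sharing the channel */
    bool yank_registered;  /* true while a yank function is registered */
} Channel;

typedef struct File {
    Channel *ioc;
} File;

/* The first handle on a channel registers the yank function. */
static void file_open(File *f, Channel *ioc)
{
    f->ioc = ioc;
    if (ioc->refcount++ == 0) {
        ioc->yank_registered = true;
    }
}

/* The last handle to go unregisters the yank function. */
static void file_close(File *f)
{
    if (--f->ioc->refcount == 0) {
        f->ioc->yank_registered = false;
    }
    f->ioc = NULL;
}

/* Same idea as the QLIST_EMPTY() assertion: no yank function may remain. */
static bool instance_can_unregister(const Channel *ioc)
{
    return !ioc->yank_registered;
}

/* Returns true if a yank function is left over once postcopy has paused. */
static bool yank_left_over(bool release_rp_early)
{
    Channel ioc = { 0, false };
    File to_dst, from_dst;

    file_open(&to_dst, &ioc);
    file_open(&from_dst, &ioc);

    file_close(&to_dst);            /* migration thread cleanup */
    if (release_rp_early) {
        file_close(&from_dst);      /* return path thread cleanup */
    }
    return !instance_can_unregister(&ioc);
}
```

In this model yank_left_over(false) corresponds to the old behaviour, where the return path handle outlives the pause, and yank_left_over(true) to the behaviour with the patch applied.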

> ---
>  migration/migration.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 5ff7ba9d5c..8786104c9a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2818,12 +2818,12 @@ out:
>               * Maybe there is something we can do: it looks like a
>               * network down issue, and we pause for a recovery.
>               */
> +            qemu_fclose(rp);
> +            ms->rp_state.from_dst_file = NULL;
> +            rp = NULL;
>              if (postcopy_pause_return_path_thread(ms)) {
>                  /* Reload rp, reset the rest */
> -                if (rp != ms->rp_state.from_dst_file) {
> -                    qemu_fclose(rp);
> -                    rp = ms->rp_state.from_dst_file;
> -                }
> +                rp = ms->rp_state.from_dst_file;
>                  ms->rp_state.error = false;
>                  goto retry;
>              }
> -- 
> 2.31.1
> 
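
The reordering in the hunk above can be sketched in isolation as below. This is a simplified model and not the real migration code: Handle, State, handle_open(), handle_close(), pause_until_recovered() and return_path_error() are hypothetical names, and the pause helper just records how many handles are still open while "paused" before installing a fresh one, the way a recovery would.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the return path handle and migration state. */
typedef struct Handle { int id; } Handle;

typedef struct State {
    Handle *from_dst_file;  /* shared pointer, like ms->rp_state.from_dst_file */
    int open_handles;       /* handles currently open */
} State;

static Handle *handle_open(State *s, int id)
{
    Handle *h = malloc(sizeof(*h));
    if (!h) {
        return NULL;
    }
    h->id = id;
    s->open_handles++;
    return h;
}

static void handle_close(State *s, Handle *h)
{
    if (h) {
        free(h);
        s->open_handles--;
    }
}

/*
 * Models the pause: note how many handles are open while paused, then
 * pretend a recovery arrived and installed a new return path handle.
 */
static bool pause_until_recovered(State *s, int *open_while_paused)
{
    *open_while_paused = s->open_handles;
    s->from_dst_file = handle_open(s, 2);
    return s->from_dst_file != NULL;
}

/* Error path with the patched ordering: release first, then pause. */
static int return_path_error(State *s, Handle **rp)
{
    int open_while_paused = -1;

    handle_close(s, *rp);
    s->from_dst_file = NULL;
    *rp = NULL;

    if (pause_until_recovered(s, &open_while_paused)) {
        *rp = s->from_dst_file;   /* reload rp after recovery */
    }
    return open_while_paused;
}
```

The point of the ordering is that nothing holds the old handle during the pause, so the comparison against ms->rp_state.from_dst_file in the old code is no longer needed; the thread simply reloads whatever the recovery installed.
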
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



