qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Juan Quintela <quintela@redhat.com>
To: Ivan Ren <renyime@gmail.com>
Cc: dgilbert@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/3] migration: fix migrate_cancel multifd migration leads destination hung forever
Date: Wed, 24 Jul 2019 11:01:07 +0200	[thread overview]
Message-ID: <878ssn957w.fsf@trasno.org> (raw)
In-Reply-To: <1561468699-9819-4-git-send-email-ivanren@tencent.com> (Ivan Ren's message of "Tue, 25 Jun 2019 21:18:19 +0800")

Ivan Ren <renyime@gmail.com> wrote:
> When migrate_cancel a multifd migration, if run sequence like this:
>
>         [source]                              [destination]
>
> multifd_send_sync_main[finish]
>                                     multifd_recv_thread wait &p->sem_sync
> shutdown to_dst_file
>                                     detect error from_src_file
> send  RAM_SAVE_FLAG_EOS[fail]       [no chance to run multifd_recv_sync_main]
>                                     multifd_load_cleanup
>                                     join multifd receive thread forever
>
> will lead destination qemu hung at following stack:
>
> pthread_join
> qemu_thread_join
> multifd_load_cleanup
> process_incoming_migration_co
> coroutine_trampoline
>
> Signed-off-by: Ivan Ren <ivanren@tencent.com>

I think this one is not enough.  We need to set some error code, or
disable the running bit at that point.

> ---
>  migration/ram.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index e4eb9c441f..504c8ccb03 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1291,6 +1291,11 @@ int multifd_load_cleanup(Error **errp)
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>  
>          if (p->running) {
> +            /*
> +             * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
> +             * however try to wakeup it without harm in cleanup phase.
> +             */
> +            qemu_sem_post(&p->sem_sync);
>              qemu_thread_join(&p->thread);
>          }
>          object_unref(OBJECT(p->c));

Let's see where we wait for p->sem_sync:


static void *multifd_recv_thread(void *opaque)
{
    ....
    while (true) {
        uint32_t used;
        uint32_t flags;

        ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
                                       p->packet_len, &local_err);
        .....

        if (flags & MULTIFD_FLAG_SYNC) {
            qemu_sem_post(&multifd_recv_state->sem_sync);
            qemu_sem_wait(&p->sem_sync);
        }
    }
    if (local_err) {
        multifd_recv_terminate_threads(local_err);
    }
    qemu_mutex_lock(&p->mutex);
    p->running = false;
    qemu_mutex_unlock(&p->mutex);

    rcu_unregister_thread();
    trace_multifd_recv_thread_end(p->id, p->num_packets, p->num_pages);

    return NULL;
}

If we just post it there, we get out of the wait (that bit is ok), but
then we go back to the beggining of the bucle, we (probably) got one
error on the qui_channel_read_all_eof(), and we go back to
multifd_recv_terminate_threads(), or wait there.

I think that it is better to *also* set an p->quit variable there, and
not even try to receive anything for that channel?

I will send a patch later.

Good catch.

Later, Juan.


  parent reply	other threads:[~2019-07-24  9:01 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-25 13:18 [Qemu-devel] [PATCH 0/3] migration: fix migrate_cancel problems of multifd Ivan Ren
2019-06-25 13:18 ` [Qemu-devel] [PATCH 1/3] migration: fix migrate_cancel leads live_migration thread endless loop Ivan Ren
2019-07-24  8:43   ` Juan Quintela
2019-06-25 13:18 ` [Qemu-devel] [PATCH 2/3] migration: fix migrate_cancel leads live_migration thread hung forever Ivan Ren
2019-07-24  8:47   ` Juan Quintela
2019-07-24  9:18   ` Juan Quintela
2019-06-25 13:18 ` [Qemu-devel] [PATCH 3/3] migration: fix migrate_cancel multifd migration leads destination " Ivan Ren
2019-07-24  7:01   ` Ivan Ren
2019-07-24  9:01   ` Juan Quintela [this message]
2019-07-24 11:30     ` Ivan Ren
2019-07-14 14:53 ` [Qemu-devel] [PATCH 0/3] migration: fix migrate_cancel problems of multifd Ivan Ren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878ssn957w.fsf@trasno.org \
    --to=quintela@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=renyime@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).