qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Ivan Ren <renyime@gmail.com>
To: quintela@redhat.com
Cc: dgilbert@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/3] migration: fix migrate_cancel multifd migration leads destination hung forever
Date: Wed, 24 Jul 2019 19:30:15 +0800	[thread overview]
Message-ID: <CA+6E1=mGdbfqFNyOk-r5H2C8Rm1tJmQyu37n-Bw24Ns-vZ4iyg@mail.gmail.com> (raw)
In-Reply-To: <878ssn957w.fsf@trasno.org>

> If we just post it there, we get out of the wait (that bit is ok), but
> then we go back to the beggining of the bucle, we (probably) got one
> error on the qui_channel_read_all_eof(), and we go back to
> multifd_recv_terminate_threads(), or wait there.
>
> I think that it is better to *also* set an p->quit variable there, and
> not even try to receive anything for that channel?
>
> I will send a patch later.

Yes, agree.

Thanks for review.

On Wed, Jul 24, 2019 at 5:01 PM Juan Quintela <quintela@redhat.com> wrote:

> Ivan Ren <renyime@gmail.com> wrote:
> > When migrate_cancel a multifd migration, if run sequence like this:
> >
> >         [source]                              [destination]
> >
> > multifd_send_sync_main[finish]
> >                                     multifd_recv_thread wait &p->sem_sync
> > shutdown to_dst_file
> >                                     detect error from_src_file
> > send  RAM_SAVE_FLAG_EOS[fail]       [no chance to run
> multifd_recv_sync_main]
> >                                     multifd_load_cleanup
> >                                     join multifd receive thread forever
> >
> > will lead destination qemu hung at following stack:
> >
> > pthread_join
> > qemu_thread_join
> > multifd_load_cleanup
> > process_incoming_migration_co
> > coroutine_trampoline
> >
> > Signed-off-by: Ivan Ren <ivanren@tencent.com>
>
> I think this one is not enough.  We need to set some error code, or
> disable the running bit at that point.
>
> > ---
> >  migration/ram.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index e4eb9c441f..504c8ccb03 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1291,6 +1291,11 @@ int multifd_load_cleanup(Error **errp)
> >          MultiFDRecvParams *p = &multifd_recv_state->params[i];
> >
> >          if (p->running) {
> > +            /*
> > +             * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle
> code,
> > +             * however try to wakeup it without harm in cleanup phase.
> > +             */
> > +            qemu_sem_post(&p->sem_sync);
> >              qemu_thread_join(&p->thread);
> >          }
> >          object_unref(OBJECT(p->c));
>
> Let's see where we wait for p->sem_sync:
>
>
> static void *multifd_recv_thread(void *opaque)
> {
>     ....
>     while (true) {
>         uint32_t used;
>         uint32_t flags;
>
>         ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>                                        p->packet_len, &local_err);
>         .....
>
>         if (flags & MULTIFD_FLAG_SYNC) {
>             qemu_sem_post(&multifd_recv_state->sem_sync);
>             qemu_sem_wait(&p->sem_sync);
>         }
>     }
>     if (local_err) {
>         multifd_recv_terminate_threads(local_err);
>     }
>     qemu_mutex_lock(&p->mutex);
>     p->running = false;
>     qemu_mutex_unlock(&p->mutex);
>
>     rcu_unregister_thread();
>     trace_multifd_recv_thread_end(p->id, p->num_packets, p->num_pages);
>
>     return NULL;
> }
>
> If we just post it there, we get out of the wait (that bit is ok), but
> then we go back to the beggining of the bucle, we (probably) got one
> error on the qui_channel_read_all_eof(), and we go back to
> multifd_recv_terminate_threads(), or wait there.
>
> I think that it is better to *also* set an p->quit variable there, and
> not even try to receive anything for that channel?
>
> I will send a patch later.
>
> Good catch.
>
> Later, Juan.
>

  reply	other threads:[~2019-07-24 11:30 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-25 13:18 [Qemu-devel] [PATCH 0/3] migration: fix migrate_cancel problems of multifd Ivan Ren
2019-06-25 13:18 ` [Qemu-devel] [PATCH 1/3] migration: fix migrate_cancel leads live_migration thread endless loop Ivan Ren
2019-07-24  8:43   ` Juan Quintela
2019-06-25 13:18 ` [Qemu-devel] [PATCH 2/3] migration: fix migrate_cancel leads live_migration thread hung forever Ivan Ren
2019-07-24  8:47   ` Juan Quintela
2019-07-24  9:18   ` Juan Quintela
2019-06-25 13:18 ` [Qemu-devel] [PATCH 3/3] migration: fix migrate_cancel multifd migration leads destination " Ivan Ren
2019-07-24  7:01   ` Ivan Ren
2019-07-24  9:01   ` Juan Quintela
2019-07-24 11:30     ` Ivan Ren [this message]
2019-07-14 14:53 ` [Qemu-devel] [PATCH 0/3] migration: fix migrate_cancel problems of multifd Ivan Ren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+6E1=mGdbfqFNyOk-r5H2C8Rm1tJmQyu37n-Bw24Ns-vZ4iyg@mail.gmail.com' \
    --to=renyime@gmail.com \
    --cc=dgilbert@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).