From: Ivan Ren <renyime@gmail.com>
To: quintela@redhat.com, dgilbert@redhat.com
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/3] migration: fix migrate_cancel multifd migration leads destination hung forever
Date: Wed, 24 Jul 2019 15:01:07 +0800
Message-ID: <CA+6E1=k9-AeqqwBdWVwZmzY-2V+FYV7eH0mcBuW3RyP=LW=PQQ@mail.gmail.com>
In-Reply-To: <1561468699-9819-4-git-send-email-ivanren@tencent.com>

Ping for review.

The problem still exists in qemu-4.1.0-rc2:

Threads:  24 total,   0 running,  24 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 39434172+total, 36798950+free,  2948836 used, 23403388 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 38926476+avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
286108 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:01.19 qemu-system-x86
286109 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 qemu-system-x86
286113 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 IO mon_iothread
286114 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 0/KVM
286115 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 1/KVM
286116 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 2/KVM
286117 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 3/KVM
286118 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 4/KVM
286119 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 5/KVM
286120 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 6/KVM
286121 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 7/KVM
286122 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 8/KVM
286123 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 9/KVM
286124 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 10/KVM
286125 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 11/KVM
286126 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 12/KVM
286127 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 13/KVM
286128 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 14/KVM
286129 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 CPU 15/KVM
286131 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.00 vnc_worker
286132 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.01 multifdrecv_0
286133 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.01 multifdrecv_1
286134 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.01 multifdrecv_2
286136 root      20   0  0.127t 110596  12684 S  0.0  0.0   0:00.01 multifdrecv_3

Thread 2 (Thread 0x7f9d075fe700 (LWP 286136)):
#0  0x00007fbd67123a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1  0x00007fbd67123a9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00007fbd67123b3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00005582236e2514 in qemu_sem_wait (sem=sem@entry=0x558226364dc8) at util/qemu-thread-posix.c:319
#4  0x00005582232efb67 in multifd_recv_thread (opaque=opaque@entry=0x558226364d30) at /qemu-4.1.0-rc2/migration/ram.c:1356
#5  0x00005582236e1b26 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:502
#6  0x00007fbd6711de25 in start_thread () from /lib64/libpthread.so.0
#7  0x00007fbd66e4a35d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7fbd6c7a4cc0 (LWP 286108)):
#0  0x00007fbd6711ef57 in pthread_join () from /lib64/libpthread.so.0
#1  0x00005582236e28ff in qemu_thread_join (thread=thread@entry=0x558226364b00) at util/qemu-thread-posix.c:570
#2  0x00005582232f36b4 in multifd_load_cleanup (errp=errp@entry=0x7fbd341fff58) at /qemu-4.1.0-rc2/migration/ram.c:1259
#3  0x00005582235765a9 in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:510
#4  0x00005582236f4c0a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#5  0x00007fbd66d98d40 in ?? () from /lib64/libc.so.6
#6  0x00007ffec0bf24d0 in ?? ()
#7  0x0000000000000000 in ?? ()
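
The pattern in these two stacks is a classic join-vs-wait deadlock: the
worker blocks in sem_wait() while the joiner blocks in pthread_join(),
so neither can make progress. A minimal standalone sketch of the same
pattern in plain pthreads (not QEMU code; recv_thread and the layout
are made up for illustration):

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    static sem_t sem_sync;

    /* Stands in for multifd_recv_thread waiting for MULTIFD_FLAG_SYNC. */
    static void *recv_thread(void *opaque)
    {
        (void)opaque;
        sem_wait(&sem_sync);    /* blocks forever: nobody ever posts */
        return NULL;
    }

    int main(void)
    {
        pthread_t th;

        sem_init(&sem_sync, 0, 0);
        pthread_create(&th, NULL, recv_thread, NULL);

        /* Stands in for multifd_load_cleanup joining without waking. */
        pthread_join(th, NULL); /* hangs, like Thread 1 above */
        puts("never reached");
        return 0;
    }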

On Tue, Jun 25, 2019 at 9:18 PM Ivan Ren <renyime@gmail.com> wrote:

> When migrate_cancel interrupts a multifd migration and the two sides
> run a sequence like this:
>
>         [source]                              [destination]
>
> multifd_send_sync_main [finish]
>                                     multifd_recv_thread waits on &p->sem_sync
> shutdown to_dst_file
>                                     detect error on from_src_file
> send RAM_SAVE_FLAG_EOS [fail]       [no chance to run multifd_recv_sync_main]
>                                     multifd_load_cleanup
>                                     join multifd receive thread forever
>
> the destination qemu hangs at the following stack:
>
> pthread_join
> qemu_thread_join
> multifd_load_cleanup
> process_incoming_migration_co
> coroutine_trampoline
>
> Signed-off-by: Ivan Ren <ivanren@tencent.com>
> ---
>  migration/ram.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index e4eb9c441f..504c8ccb03 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1291,6 +1291,11 @@ int multifd_load_cleanup(Error **errp)
>          MultiFDRecvParams *p = &multifd_recv_state->params[i];
>
>          if (p->running) {
> +            /*
> +             * multifd_recv_thread may be stuck in the MULTIFD_FLAG_SYNC
> +             * handling code; waking it here is harmless during cleanup.
> +             */
> +            qemu_sem_post(&p->sem_sync);
>              qemu_thread_join(&p->thread);
>          }
>          object_unref(OBJECT(p->c));
> --
> 2.17.2 (Apple Git-113)
>
>
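
For reference, the standalone sketch above stops deadlocking once the
cleanup path posts the semaphore before joining, which is exactly the
shape of this patch (again plain pthreads, not QEMU code):

    /* Fixed cleanup path for the sketch: wake the worker first. */
    sem_post(&sem_sync);     /* releases recv_thread from sem_wait() */
    pthread_join(th, NULL);  /* worker can now exit and be joined */

The extra post is harmless because the thread is shutting down anyway;
at worst it consumes one spurious wakeup.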

Thread overview: 11+ messages
2019-06-25 13:18 [Qemu-devel] [PATCH 0/3] migration: fix migrate_cancel problems of multifd Ivan Ren
2019-06-25 13:18 ` [Qemu-devel] [PATCH 1/3] migration: fix migrate_cancel leads live_migration thread endless loop Ivan Ren
2019-07-24  8:43   ` Juan Quintela
2019-06-25 13:18 ` [Qemu-devel] [PATCH 2/3] migration: fix migrate_cancel leads live_migration thread hung forever Ivan Ren
2019-07-24  8:47   ` Juan Quintela
2019-07-24  9:18   ` Juan Quintela
2019-06-25 13:18 ` [Qemu-devel] [PATCH 3/3] migration: fix migrate_cancel multifd migration leads destination " Ivan Ren
2019-07-24  7:01   ` Ivan Ren [this message]
2019-07-24  9:01   ` Juan Quintela
2019-07-24 11:30     ` Ivan Ren
2019-07-14 14:53 ` [Qemu-devel] [PATCH 0/3] migration: fix migrate_cancel problems of multifd Ivan Ren
