From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 20 Apr 2019 02:55:47 -0000
From: 贞贵李 <1824053@bugs.launchpad.net>
Reply-To: Bug 1824053 <1824053@bugs.launchpad.net>
Sender: bounces@canonical.com
Message-Id: <155572894823.14003.12733061980284058304.launchpad@wampee.canonical.com>
Subject: [Qemu-devel] [Bug 1824053] Re: Qemu-img convert appears to be stuck on aarch64 host with low probability
To: qemu-devel@nongnu.org

** Description changed:

  Hi, I found a problem: qemu-img convert appears to get stuck on an
  aarch64 host with low probability.

  The convert command line is "qemu-img convert -f qcow2 -O raw
  disk.qcow2 disk.raw".
  The backtrace is below:

  Thread 2 (Thread 0x40000b776e50 (LWP 27215)):
  #0  0x000040000a3f2994 in sigtimedwait () from /lib64/libc.so.6
  #1  0x000040000a39c60c in sigwait () from /lib64/libpthread.so.0
  #2  0x0000aaaaaae82610 in sigwait_compat (opaque=0xaaaac5163b00) at util/compatfd.c:37
  #3  0x0000aaaaaae85038 in qemu_thread_start (args=args@entry=0xaaaac5163b90) at util/qemu_thread_posix.c:496
  #4  0x000040000a3918bc in start_thread () from /lib64/libpthread.so.0
  #5  0x000040000a492b2c in thread_start () from /lib64/libc.so.6

  Thread 1 (Thread 0x40000b573370 (LWP 27214)):
  #0  0x000040000a489020 in ppoll () from /lib64/libc.so.6
  #1  0x0000aaaaaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
  #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
  #3  0x0000aaaaaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
  #4  0x0000aaaaaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
  #5  0x0000aaaaaad97be0 in convert_do_copy (s=0xffffdc32eb48) at qemu-img.c:1923
  #6  0x0000aaaaaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
  #7  0x0000aaaaaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305

- The problem seems to be very similar to the phenomenon described by this patch (https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-ev/0025-aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),
+ The problem seems to be very similar to the phenomenon described by this
+ patch (https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-ev/0025
+ -aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),

  which forces a main loop wakeup with SIGIO. But that patch was reverted by
  (http://ovirt.repo.nfrance.com/src/qemu-kvm-ev/kvm-Revert-aio_notify-force-main-loop-wakeup-with-SIGIO-.patch).

- The problem still seems to exist on aarch64 hosts. The qemu version I
- used is 2.8.1. The host version is 4.19.28-1.2.108.aarch64.
- Do you have any solutions to fix it? Thanks for your reply!

+ I can reproduce this problem with qemu.git/master; it still exists
+ there. I found that when an I/O request completes in a worker thread
+ and the thread calls aio_notify to wake up the main loop, it can find
+ that ctx->notify_me has already been cleared to 0 by the main loop in
+ aio_ctx_check, via atomic_and(&ctx->notify_me, ~1). The worker thread
+ therefore does not write the eventfd to notify the main loop. When this
+ interleaving happens, the main loop hangs:
+
+ main loop                          worker thread1          worker thread2
+ -------------------------------------------------------------------------
+ qemu_poll_ns                       aio_worker
+                                    qemu_bh_schedule(pool->completion_bh)
+ glib_pollfds_poll
+ g_main_context_check
+ aio_ctx_check                                              aio_worker
+ atomic_and(&ctx->notify_me, ~1)                            qemu_bh_schedule(pool->completion_bh)
+
+ /* do something for event */
+ qemu_poll_ns
+ /* hangs !!! */
+
+ As we know, ctx->notify_me is accessed by both the worker threads and
+ the main loop. I think we should add lock protection for ctx->notify_me
+ to avoid this happening.

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824053

Title:
  Qemu-img convert appears to be stuck on aarch64 host with low
  probability

Status in QEMU:
  Confirmed

Bug description:
  Hi, I found a problem: qemu-img convert appears to get stuck on an
  aarch64 host with low probability.

  The convert command line is "qemu-img convert -f qcow2 -O raw
  disk.qcow2 disk.raw".
  The backtrace is below:

  Thread 2 (Thread 0x40000b776e50 (LWP 27215)):
  #0  0x000040000a3f2994 in sigtimedwait () from /lib64/libc.so.6
  #1  0x000040000a39c60c in sigwait () from /lib64/libpthread.so.0
  #2  0x0000aaaaaae82610 in sigwait_compat (opaque=0xaaaac5163b00) at util/compatfd.c:37
  #3  0x0000aaaaaae85038 in qemu_thread_start (args=args@entry=0xaaaac5163b90) at util/qemu_thread_posix.c:496
  #4  0x000040000a3918bc in start_thread () from /lib64/libpthread.so.0
  #5  0x000040000a492b2c in thread_start () from /lib64/libc.so.6

  Thread 1 (Thread 0x40000b573370 (LWP 27214)):
  #0  0x000040000a489020 in ppoll () from /lib64/libc.so.6
  #1  0x0000aaaaaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
  #2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
  #3  0x0000aaaaaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
  #4  0x0000aaaaaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
  #5  0x0000aaaaaad97be0 in convert_do_copy (s=0xffffdc32eb48) at qemu-img.c:1923
  #6  0x0000aaaaaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
  #7  0x0000aaaaaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305

  The problem seems to be very similar to the phenomenon described by this
  patch (https://resources.ovirt.org/pub/ovirt-4.1/src/qemu-kvm-ev/0025
  -aio_notify-force-main-loop-wakeup-with-SIGIO-aarch64.patch),
  which forces a main loop wakeup with SIGIO. But that patch was reverted by
  (http://ovirt.repo.nfrance.com/src/qemu-kvm-ev/kvm-Revert-
  aio_notify-force-main-loop-wakeup-with-SIGIO-.patch).

  I can reproduce this problem with qemu.git/master; it still exists
  there. I found that when an I/O request completes in a worker thread
  and the thread calls aio_notify to wake up the main loop, it can find
  that ctx->notify_me has already been cleared to 0 by the main loop in
  aio_ctx_check, via atomic_and(&ctx->notify_me, ~1). The worker thread
  therefore does not write the eventfd to notify the main loop.
  When this interleaving happens, the main loop hangs:

  main loop                          worker thread1          worker thread2
  -------------------------------------------------------------------------
  qemu_poll_ns                       aio_worker
                                     qemu_bh_schedule(pool->completion_bh)
  glib_pollfds_poll
  g_main_context_check
  aio_ctx_check                                              aio_worker
  atomic_and(&ctx->notify_me, ~1)                            qemu_bh_schedule(pool->completion_bh)

  /* do something for event */
  qemu_poll_ns
  /* hangs !!! */

  As we know, ctx->notify_me is accessed by both the worker threads and
  the main loop. I think we should add lock protection for ctx->notify_me
  to avoid this happening.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1824053/+subscriptions