From: Jan Glauber <jglauber@marvell.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>, lizhengui <lizhengui@huawei.com>, dann frazier <dann.frazier@canonical.com>, QEMU Developers <qemu-devel@nongnu.org>, Bug 1805256 <1805256@bugs.launchpad.net>, QEMU Developers - ARM <qemu-arm@nongnu.org>
Subject: Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
Date: Wed, 2 Oct 2019 11:05:59 +0000
Message-ID: <20191002110550.GA3482@hc>
In-Reply-To: <ed5c4522-9250-e403-c55d-d3dbcda82540@redhat.com>

On Wed, Oct 02, 2019 at 11:45:19AM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've tried to verify my theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue within ~20 iterations).
>
> Awesome! That would be a compiler bug though, as atomic_add and atomic_sub
> are defined as sequentially consistent:
>
> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))

Compiler bug sounds kind of unlikely...

> What compiler are you using and what distro? Can you compile util/aio-posix.c
> with "-fdump-rtl-all -fdump-tree-all", zip the boatload of debugging files and
> send them my way?

This is on Ubuntu 18.04.3,
gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)

I've uploaded the debug files to:
https://bugs.launchpad.net/qemu/+bug/1805256/+attachment/5293619/+files/aio-posix.tar.xz

Thanks,
Jan

> Thanks,
>
> Paolo
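The two macros Paolo quotes expand to GCC's `__atomic` builtins with `__ATOMIC_SEQ_CST`, so each add or subtract is a single atomic read-modify-write. The sketch below is a minimal, hypothetical test harness (the counter and thread functions are invented for illustration, not taken from util/aio-posix.c) that exercises only the atomicity part of that guarantee: no concurrent update is ever lost. It cannot reproduce an ordering problem. To see what gcc 7.4 actually emits for these macros on aarch64, one could compile a file like this with `gcc -O2 -S` and inspect the generated instruction sequence. Build with `-pthread` on older glibc.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* The two macros as quoted in the thread. */
#define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
#define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))

#define ITERATIONS 100000

/* Hypothetical shared counter; not from QEMU. */
static int counter;

/* Adds 2 per iteration. */
static void *adder(void *arg)
{
    (void) arg;
    for (int i = 0; i < ITERATIONS; i++) {
        atomic_add(&counter, 2);
    }
    return NULL;
}

/* Subtracts 1 per iteration, concurrently with adder(). */
static void *subber(void *arg)
{
    (void) arg;
    for (int i = 0; i < ITERATIONS; i++) {
        atomic_sub(&counter, 1);
    }
    return NULL;
}

/* Runs both threads and returns the final value. Because every update
 * is an atomic read-modify-write, none is lost, so the result is always
 * 2 * ITERATIONS - ITERATIONS == ITERATIONS. */
int run_counter_test(void)
{
    pthread_t a, b;

    __atomic_store_n(&counter, 0, __ATOMIC_SEQ_CST);
    int rc = pthread_create(&a, NULL, adder, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, subber, NULL);
    assert(rc == 0);
    (void) rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return __atomic_load_n(&counter, __ATOMIC_SEQ_CST);
}
```

A passing run only shows the operations are atomic; whether surrounding non-atomic or relaxed accesses may be reordered around them is a separate question that this harness does not test.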
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

Status in QEMU: In Progress
Status in qemu package in Ubuntu: In Progress
Status in qemu source package in Bionic: New
Status in qemu source package in Disco: New
Status in qemu source package in Eoan: In Progress
Status in qemu source package in FF-Series: New

Bug description:

Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Hangs indefinitely in approximately 30% of runs.

----

Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Run "qemu-img convert" with a single coroutine to avoid this issue.

----

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0x0000ffffbf1ad81c in __GI_ppoll
  #1 0x0000aaaaaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0x0000aaaaaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0x0000aaaaaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
  #3 0x0000aaaaaabed05c in call_rcu_thread
  #4 0x0000aaaaaabd34c8 in qemu_thread_start
  #5 0x0000ffffbf25c880 in start_thread
  #6 0x0000ffffbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0x0000ffffbf11aa20 in __GI___sigtimedwait
  #1 0x0000ffffbf2671b4 in __sigwait
  #2 0x0000aaaaaabd1ddc in sigwait_compat
  #3 0x0000aaaaaabd34c8 in qemu_thread_start
  #4 0x0000ffffbf25c880 in start_thread
  #5 0x0000ffffbf1b6b9c in thread_start

----

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2
  [New Thread 0xffffbec5ad90 (LWP 72839)]
  [New Thread 0xffffbe459d90 (LWP 72840)]
  [New Thread 0xffffbdb57d90 (LWP 72841)]
  [New Thread 0xffffacac9d90 (LWP 72859)]
  [New Thread 0xffffa7ffed90 (LWP 72860)]
  [New Thread 0xffffa77fdd90 (LWP 72861)]
  [New Thread 0xffffa6ffcd90 (LWP 72862)]
  [New Thread 0xffffa67fbd90 (LWP 72863)]
  [New Thread 0xffffa5ffad90 (LWP 72864)]
  [Thread 0xffffa5ffad90 (LWP 72864) exited]
  [Thread 0xffffa6ffcd90 (LWP 72862) exited]
  [Thread 0xffffa77fdd90 (LWP 72861) exited]
  [Thread 0xffffbdb57d90 (LWP 72841) exited]
  [Thread 0xffffa67fbd90 (LWP 72863) exited]
  [Thread 0xffffacac9d90 (LWP 72859) exited]
  [Thread 0xffffa7ffed90 (LWP 72860) exited]
  <HUNG w/ 3 threads in the stack trace shown before>

All the remaining tasks are blocked in a system call, so no task is left to call qemu_futex_wake() to unblock thread #2 (in futex()), which in turn would unblock thread #1 (doing poll() on a pipe shared with thread #2).

Those 7 threads exit before the disk conversion is complete (sometimes at the beginning, sometimes at the end).

----

[ Original Description ]

On the HiSilicon D06 system (a 96-core NUMA arm64 box), qemu-img frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.

Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
  #4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
  #5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
  #6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
  #7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

Reproduced with latest QEMU git (@ 53744e0a182).
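The hang described above leaves thread #2 parked in futex() inside qemu_event_wait(), waiting for an event that no remaining thread will ever set. The following is a minimal sketch of how a futex-backed event of this general kind can work. The type, the state names (`EV_SET`, `EV_FREE`, `EV_BUSY`) and all helper functions are hypothetical and only loosely modeled on the idea behind QEMU's event code; this is not QEMU's implementation. It is Linux-only because it issues the raw `SYS_futex` syscall.

```c
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical states: set, free (no sleeper), busy (a sleeper may exist). */
enum { EV_SET = 0, EV_FREE = 1, EV_BUSY = -1 };

typedef struct {
    atomic_int value;
} sketch_event;

/* Sleeps only while *addr == expected; the kernel re-checks atomically. */
static void futex_wait(atomic_int *addr, int expected)
{
    syscall(SYS_futex, (int *) addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static void futex_wake_all(atomic_int *addr)
{
    syscall(SYS_futex, (int *) addr, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}

void sketch_event_init(sketch_event *ev)
{
    atomic_init(&ev->value, EV_FREE);
}

/* Marks the event set; issues the wake syscall only if a waiter
 * announced (via EV_BUSY) that it might be sleeping. */
void sketch_event_set(sketch_event *ev)
{
    if (atomic_exchange(&ev->value, EV_SET) == EV_BUSY) {
        futex_wake_all(&ev->value);
    }
}

/* Returns once the event is set. If no thread ever calls
 * sketch_event_set(), this sleeps in futex() forever. */
void sketch_event_wait(sketch_event *ev)
{
    for (;;) {
        int v = atomic_load(&ev->value);
        if (v == EV_SET) {
            return;
        }
        if (v == EV_FREE &&
            !atomic_compare_exchange_strong(&ev->value, &v, EV_BUSY)) {
            continue;  /* value changed under us; re-check */
        }
        futex_wait(&ev->value, EV_BUSY);  /* may return early; loop re-checks */
    }
}

static void *waiter(void *arg)
{
    sketch_event_wait(arg);
    return NULL;
}

/* Demo: one thread waits, the caller sets the event, the waiter exits.
 * Returns 1 on success. */
int sketch_event_demo(void)
{
    sketch_event ev;
    pthread_t t;

    sketch_event_init(&ev);
    if (pthread_create(&t, NULL, waiter, &ev) != 0) {
        return 0;
    }
    sketch_event_set(&ev);
    pthread_join(t, NULL);
    return atomic_load(&ev.value) == EV_SET;
}
```

In this sketch, the reported hang corresponds roughly to a waiter stuck at the `futex_wait()` step while the code path that would reach `futex_wake_all()` never runs. The exchange-then-check in `sketch_event_set()` avoids a lost wakeup only because the waiter's store of `EV_BUSY` and the setter's exchange are properly ordered with respect to each other.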