From: Paolo Bonzini <pbonzini@redhat.com>
To: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>
Cc: QEMU Developers - ARM <qemu-arm@nongnu.org>,
lizhengui <lizhengui@huawei.com>,
QEMU Developers <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
Date: Wed, 11 Sep 2019 15:17:20 +0200
Message-ID: <d94f18f1-986f-ec19-02c0-e83e5e7af3d0@redhat.com>
In-Reply-To: <cbe46ad6-ef6c-d155-e79a-672182c725ad@ubuntu.com>
Note that the RCU thread is expected to sit most of the time doing
nothing, so I don't think this matters.
Zhengui's theory that notify_me doesn't work properly on ARM is more
promising, but he couldn't provide a clear explanation of why he thought
notify_me is involved. In particular, I would have expected notify_me to
be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example:
    glib_pollfds_fill
      g_main_context_prepare
        aio_ctx_prepare
          atomic_or(&ctx->notify_me, 1)
    qemu_poll_ns
    glib_pollfds_poll
      g_main_context_check
        aio_ctx_check
          atomic_and(&ctx->notify_me, ~1)
      g_main_context_dispatch
        aio_ctx_dispatch
          /* do something for event */
            qemu_poll_ns
but all backtraces show thread 1 in os_host_main_loop_wait:
Thread 1 (Thread 0x40000b573370 (LWP 27214)):
#0 0x000040000a489020 in ppoll () from /lib64/libc.so.6
#1 0x0000aaaaaadaefc0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at qemu_timer.c:391
#3 0x0000aaaaaadae014 in os_host_main_loop_wait (timeout=<optimized out>) at main_loop.c:272
#4 0x0000aaaaaadae190 in main_loop_wait (nonblocking=<optimized out>) at main_loop.c:534
#5 0x0000aaaaaad97be0 in convert_do_copy (s=0xffffdc32eb48) at qemu-img.c:1923
#6 0x0000aaaaaada2d70 in img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2414
#7 0x0000aaaaaad99ac4 in main (argc=7, argv=<optimized out>) at qemu-img.c:5305
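To make the notify_me reasoning above concrete, here is a minimal
single-threaded sketch of the handshake, written with C11 atomics. It is
not QEMU code: struct mini_ctx, mini_prepare, mini_check and mini_notify
are hypothetical names, and the real AioContext logic has more states
than this. It only shows why a poll entered after the "check" step
would not be kicked by a notifier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical miniature of an event-loop context (assumption, not QEMU's). */
struct mini_ctx {
    atomic_uint notify_me;    /* bit 0 is set while the poller may block */
    atomic_bool work_pending; /* work published by the notifier */
    unsigned wakeups;         /* counts simulated kicks sent to the poller */
};

/* Prepare step: announce that the loop is about to block in poll. */
void mini_prepare(struct mini_ctx *ctx)
{
    atomic_fetch_or(&ctx->notify_me, 1u);
}

/* Check step: poll has returned, so kicks are no longer requested. */
void mini_check(struct mini_ctx *ctx)
{
    atomic_fetch_and(&ctx->notify_me, ~1u);
}

/*
 * Notifier side: publish the work first, then look at notify_me and
 * kick only if the poller asked for it. Returns true if a kick was sent.
 * The default (sequentially consistent) atomics order the store before
 * the load.
 */
bool mini_notify(struct mini_ctx *ctx)
{
    atomic_store(&ctx->work_pending, true);
    if (atomic_load(&ctx->notify_me) != 0) {
        ctx->wakeups++;
        return true;
    }
    return false;
}
```

With a zero-initialized context, mini_notify() sends a kick only between
mini_prepare() and mini_check(). A blocking poll issued after
mini_check(), as in the dispatch path of the call sequence above, would
therefore rely on something other than notify_me to be woken up.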
Can you put your util/async.o object file somewhere so that I can look at it?
Anyway:
On 11/09/19 04:15, Rafael David Tinoco wrote:
> I've caught the following stack trace after a hang in qemu-img convert:
>
> (gdb) bt
> #0 syscall ()
> #1 0x0000aaaaaabd41cc in qemu_futex_wait
> #2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>)
> #3 0x0000aaaaaabed05c in call_rcu_thread
> #4 0x0000aaaaaabd34c8 in qemu_thread_start
> #5 0x0000ffffbf25c880 in start_thread
> #6 0x0000ffffbf1b6b9c in thread_start ()
>
> (gdb) print rcu_call_ready_event
> $4 = {value = 4294967295, initialized = true}
>
> value INT_MAX (4294967295) seems WRONG for qemu_futex_wait():
This is UINT_MAX, not INT_MAX. qemu_futex_wait() doesn't care about the
signedness of the value, which is why it is declared as void *. (That said,
changing "ev" to "&ev->value" would indeed be nicer).
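A small sketch of the conversion in question, for illustration only. The
value -1 for EV_BUSY is taken from the quoted report; the helper names
busy_as_unsigned and is_int_max are hypothetical and do not exist in
QEMU.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Value taken from the quoted report; the macro here is a stand-in. */
#define EV_BUSY (-1)

/*
 * Storing -1 into an unsigned object converts it modulo 2^N, so every
 * bit is set and the result equals UINT_MAX (4294967295 when unsigned
 * is 32 bits wide).
 */
unsigned busy_as_unsigned(void)
{
    unsigned value = EV_BUSY;
    return value;
}

/* INT_MAX has the top bit clear, so it is a different bit pattern. */
bool is_int_max(unsigned value)
{
    return value == (unsigned)INT_MAX;
}
```

Here busy_as_unsigned() compares equal to UINT_MAX, and is_int_max()
returns false for it, which matches the value gdb printed.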
> - EV_BUSY, being -1, and passed as an argument qemu_futex_wait(void *,
> unsigned), is a two's complement, making argument into a INT_MAX when
> that's not what is expected (unless I missed something).
>
> *** If that is the case, unsure if you, Paolo, prefer declaring
> *(QemuEvent)->value as an integer or changing EV_BUSY to "2" would okay
> here ***
You could change it to 3, but it has to have all the bits in EV_FREE
(see atomic_or(&ev->value, EV_FREE) in qemu_event_reset).
You could also change it to -1u, but I don't see a particular need to do so.
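The constraint can be checked with a few lines of C. This is a sketch
under assumptions: EV_SET = 0 and EV_FREE = 1 are assumed values
consistent with the remarks above, and reset_step and
busy_survives_reset are hypothetical helpers that model only the
atomic_or step, without any real atomics.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration; check them against the QEMU source. */
#define EV_SET  0u
#define EV_FREE 1u

/* Models the atomic_or(&ev->value, EV_FREE) step of qemu_event_reset. */
unsigned reset_step(unsigned value)
{
    return value | EV_FREE;
}

/*
 * A candidate BUSY value is usable only if OR-ing in EV_FREE leaves it
 * unchanged, i.e. it already contains all the bits of EV_FREE.
 */
bool busy_survives_reset(unsigned busy)
{
    return reset_step(busy) == busy;
}
```

Under these assumptions, busy_survives_reset() holds for -1u and for 3,
but not for 2: a reset arriving while the value is 2 would turn it into
3, which is no longer the BUSY state.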
> BUG: description:
> https://bugs.launchpad.net/qemu/+bug/1805256/comments/15
>
> ========
> ISSUE #2
> ========
>
> I found this when debugging lockups while in futex() in a specific ARM64
> server - https://bugs.launchpad.net/qemu/+bug/1805256 - which I'm still
> investigating.
>
> After fixing the issue above, I'm still getting stuck into:
If you changed it to 2, it's wrong.
> - Should qemu_event_set() check return code from
> qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY
> waiter was ever woken up ? Maybe even loop until at least 1 is awaken ?
Why would it need to do so?
Paolo