From: Paolo Bonzini <pbonzini@redhat.com>
To: Jan Glauber <jglauber@marvell.com>
Cc: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>,
lizhengui <lizhengui@huawei.com>,
dann frazier <dann.frazier@canonical.com>,
QEMU Developers <qemu-devel@nongnu.org>,
Bug 1805256 <1805256@bugs.launchpad.net>,
QEMU Developers - ARM <qemu-arm@nongnu.org>
Subject: Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
Date: Wed, 9 Oct 2019 11:15:04 +0200 [thread overview]
Message-ID: <d5367b2a-84ee-1211-a2dc-7d631c94fe3f@redhat.com> (raw)
In-Reply-To: <20191009080220.GA2905@hc>
On 09/10/19 10:02, Jan Glauber wrote:
> On Mon, Oct 07, 2019 at 04:58:30PM +0200, Paolo Bonzini wrote:
>> On 07/10/19 16:44, dann frazier wrote:
>>> On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
>>>> On 02/10/19 11:23, Jan Glauber wrote:
>>>>> I've looked into this on ThunderX2. The arm64 code generated for the
>>>>> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
>>>>> memory barriers. It is just plain ldaxr/stlxr.
>>>>>
>>>>> From my understanding this is not sufficient for SMP sync.
>>>>>
>>>>> If I read this comment correct:
>>>>>
>>>>> void aio_notify(AioContext *ctx)
>>>>> {
>>>>> /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
>>>>> * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>>>>> */
>>>>> smp_mb();
>>>>> if (ctx->notify_me) {
>>>>>
>>>>> it points out that the smp_mb() should be paired. But as
>>>>> I said the used atomics don't generate any barriers at all.
>>>>
>>>> Based on the rest of the thread, this patch should also fix the bug:
>>>>
>>>> diff --git a/util/async.c b/util/async.c
>>>> index 47dcbfa..721ea53 100644
>>>> --- a/util/async.c
>>>> +++ b/util/async.c
>>>> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
>>>> aio_notify_accept(ctx);
>>>>
>>>> for (bh = ctx->first_bh; bh; bh = bh->next) {
>>>> - if (bh->scheduled) {
>>>> + if (atomic_mb_read(&bh->scheduled)) {
>>>> return true;
>>>> }
>>>> }
>>>>
>>>>
>>>> And also the memory barrier in aio_notify can actually be replaced
>>>> with a SEQ_CST load:
>>>>
>>>> diff --git a/util/async.c b/util/async.c
>>>> index 47dcbfa..721ea53 100644
>>>> --- a/util/async.c
>>>> +++ b/util/async.c
>>>> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>>>>
>>>> void aio_notify(AioContext *ctx)
>>>> {
>>>> - /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
>>>> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>>>> + /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
>>>> + * ctx->notify_me is read. Pairs with atomic_or in aio_ctx_prepare or
>>>> + * atomic_add in aio_poll.
>>>> */
>>>> - smp_mb();
>>>> - if (ctx->notify_me) {
>>>> + if (atomic_mb_read(&ctx->notify_me)) {
>>>> event_notifier_set(&ctx->notifier);
>>>> atomic_mb_set(&ctx->notified, true);
>>>> }
>>>>
>>>>
>>>> Would you be able to test these (one by one possibly)?
>>>
>>> Paolo,
>>> I tried them both separately and together on a Hi1620 system, each
>>> time it hung in the first iteration. Here's a backtrace of a run with
>>> both patches applied:
>>
>> Ok, not a great start... I'll find myself an aarch64 machine and look
>> at it more closely. I'd like the patch to be something we can
>> understand and document, since this is probably the second most-used
>> memory barrier idiom in QEMU.
>>
>> Paolo
>
> I'm still not sure what the actual issue is here, but could it be some bad
> interaction between the notify_me and the list_lock? The are both 4 byte
> and side-by-side:
>
> address notify_me: 0xaaaadb528aa0 sizeof notify_me: 4
> address list_lock: 0xaaaadb528aa4 sizeof list_lock: 4
>
> AFAICS the generated code looks OK (all load/store exclusive done
> with 32 bit size):
>
> e6c: 885ffc01 ldaxr w1, [x0]
> e70: 11000821 add w1, w1, #0x2
> e74: 8802fc01 stlxr w2, w1, [x0]
>
> ...but if I bump notify_me size to uint64_t the issue goes away.
Ouch. :) Is this with or without my patch(es)?
Also, what if you just add a dummy uint32_t after notify_me?
Thanks,
Paolo
>
> BTW, the image file I convert in the testcase is ~20 GB.
>
> --Jan
>
> diff --git a/include/block/aio.h b/include/block/aio.h
> index a1d6b9e24939..e8a5ea3860bb 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -83,7 +83,7 @@ struct AioContext {
> * Instead, the aio_poll calls include both the prepare and the
> * dispatch phase, hence a simple counter is enough for them.
> */
> - uint32_t notify_me;
> + uint64_t notify_me;
>
> /* A lock to protect between QEMUBH and AioHandler adders and deleter,
> * and to ensure that no callbacks are removed while we're walking and
>
next prev parent reply other threads:[~2019-10-09 17:26 UTC|newest]
Thread overview: 141+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-26 22:53 [Qemu-devel] [Bug 1805256] [NEW] qemu-img hangs on high core count ARM system dann frazier
2018-11-26 23:26 ` [Qemu-devel] [Bug 1805256] " John Snow
2018-11-26 23:54 ` dann frazier
2018-12-05 11:20 ` Alex Bennée
2019-04-15 12:59 ` 贞贵李
2019-04-15 12:59 ` 贞贵李
2019-04-15 14:37 ` 贞贵李
2019-04-15 14:37 ` 贞贵李
2019-04-15 22:25 ` dann frazier
2019-04-15 22:25 ` dann frazier
2019-04-15 23:37 ` dann frazier
2019-04-15 23:37 ` dann frazier
2019-04-16 8:16 ` 贞贵李
2019-04-16 8:16 ` 贞贵李
2019-04-16 13:32 ` 贞贵李
2019-04-16 13:32 ` 贞贵李
2019-04-23 1:29 ` 贞贵李
2019-04-23 1:29 ` 贞贵李
2019-06-05 16:16 ` dann frazier
2019-09-05 15:03 ` Rafael David Tinoco
2019-09-06 15:12 ` Rafael David Tinoco
2019-09-06 15:16 ` Rafael David Tinoco
2019-09-06 21:22 ` Rafael David Tinoco
2019-09-09 16:47 ` Rafael David Tinoco
2019-09-10 2:04 ` Rafael David Tinoco
2019-09-10 14:16 ` Rafael David Tinoco
2019-09-10 18:15 ` [Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Rafael David Tinoco
2019-09-10 22:56 ` Rafael David Tinoco
2019-09-11 2:15 ` [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues Rafael David Tinoco
2019-09-11 2:15 ` [Qemu-devel] [Bug 1805256] " Rafael David Tinoco
2019-09-11 7:05 ` [Qemu-devel] " Rafael David Tinoco
2019-09-11 7:05 ` [Qemu-devel] [Bug 1805256] " Rafael David Tinoco
2019-09-11 13:17 ` [Qemu-devel] " Paolo Bonzini
2019-09-11 14:48 ` Rafael David Tinoco
2019-09-11 19:09 ` Rafael David Tinoco
2019-09-11 19:09 ` [Qemu-devel] [Bug 1805256] " Rafael David Tinoco
2019-09-24 20:25 ` [Qemu-devel] " dann frazier
2019-09-24 20:25 ` [Bug 1805256] " dann frazier
2019-10-02 9:23 ` Jan Glauber
2019-10-02 9:23 ` Jan Glauber
2019-10-02 9:45 ` Paolo Bonzini
2019-10-02 11:05 ` Jan Glauber
2019-10-02 11:05 ` [Bug 1805256] " Jan Glauber
2019-10-02 13:20 ` memory barriers and ATOMIC_SEQ_CST on aarch64 (was Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues) Paolo Bonzini
2019-10-02 14:58 ` Torvald Riegel
2019-10-02 16:30 ` Paolo Bonzini
2019-10-07 11:06 ` [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues Paolo Bonzini
2019-10-07 14:36 ` Jan Glauber
2019-10-07 14:36 ` [Bug 1805256] " Jan Glauber
2019-10-07 14:44 ` dann frazier
2019-10-07 14:44 ` [Bug 1805256] " dann frazier
2019-10-07 14:58 ` Paolo Bonzini
2019-10-09 8:02 ` Jan Glauber
2019-10-09 8:02 ` [Bug 1805256] " Jan Glauber
2019-10-09 9:15 ` Paolo Bonzini [this message]
2019-10-11 6:05 ` Jan Glauber
2019-10-11 6:05 ` [Bug 1805256] " Jan Glauber
2019-10-11 8:18 ` Paolo Bonzini
2019-10-11 8:30 ` Jan Glauber
2019-10-11 8:30 ` [Bug 1805256] " Jan Glauber
2019-10-11 17:55 ` dann frazier
2019-10-11 17:55 ` dann frazier
2019-10-12 0:24 ` [Bug 1805256] " no-reply
2019-10-12 0:49 ` no-reply
2019-10-11 17:50 ` dann frazier
2019-10-11 17:50 ` [Bug 1805256] " dann frazier
2019-09-11 2:17 ` [Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Rafael David Tinoco
2019-09-11 11:19 ` Rafael David Tinoco
2019-09-11 19:23 ` Rafael David Tinoco
2019-10-02 11:02 ` Jan Glauber
2019-10-03 12:28 ` Rafael David Tinoco
2019-10-03 12:29 ` Rafael David Tinoco
2019-10-03 12:29 ` Rafael David Tinoco
2019-10-03 21:35 ` dann frazier
2019-12-13 14:24 ` dann frazier
2019-12-17 1:34 ` Fred Kimmy
2019-12-17 19:17 ` dann frazier
2019-12-18 2:40 ` Rafael David Tinoco
2019-12-18 9:52 ` iveskim
2019-12-18 14:52 ` dann frazier
2019-12-18 16:21 ` Ubuntu Foundations Team Bug Bot
2020-02-13 8:41 ` Ike Panhc
2020-02-13 8:42 ` Andrew Cloke
2020-02-13 9:20 ` Fred Kimmy
2020-04-15 2:47 ` Rafael David Tinoco
2020-05-04 7:24 ` Ike Panhc
2020-05-05 0:54 ` Ike Panhc
2020-05-05 1:22 ` Ying Fang
2020-05-05 6:15 ` Ike Panhc
2020-05-05 15:01 ` Ike Panhc
2020-05-05 18:48 ` Rafael David Tinoco
2020-05-05 23:55 ` dann frazier
2020-05-06 13:08 ` Rafael David Tinoco
2020-05-06 13:23 ` Rafael David Tinoco
2020-05-06 15:45 ` Ike Panhc
2020-05-06 16:42 ` dann frazier
2020-05-06 19:04 ` Launchpad Bug Tracker
2020-05-06 19:09 ` Philippe Mathieu-Daudé
2020-05-06 19:57 ` dann frazier
2020-05-06 20:11 ` Rafael David Tinoco
2020-05-06 21:10 ` Launchpad Bug Tracker
2020-05-06 21:44 ` Launchpad Bug Tracker
2020-05-07 3:37 ` Launchpad Bug Tracker
2020-05-07 7:00 ` Ike Panhc
2020-05-07 22:27 ` dann frazier
2020-05-14 8:05 ` Andrew Cloke
2020-05-27 4:55 ` Christian Ehrhardt
2020-05-28 14:58 ` Christian Ehrhardt
2020-05-29 7:55 ` Launchpad Bug Tracker
2020-05-29 8:01 ` Christian Ehrhardt
2020-06-02 22:45 ` Brian Murray
2020-06-02 22:49 ` [Bug 1805256] Please test proposed package Brian Murray
2020-06-02 22:54 ` Brian Murray
2020-06-03 4:09 ` [Bug 1805256] Autopkgtest regression report (qemu/1:4.0+dfsg-0ubuntu9.7) Ubuntu SRU Bot
2020-06-03 6:35 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Ike Panhc
2020-06-03 8:40 ` [Bug 1805256] Autopkgtest regression report (qemu/1:4.2-3ubuntu6.2) Ubuntu SRU Bot
2020-06-05 3:51 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Christian Ehrhardt
2020-06-11 8:04 ` Andrew Cloke
2020-06-17 5:16 ` Christian Ehrhardt
2020-06-18 9:23 ` Launchpad Bug Tracker
2020-06-18 9:23 ` [Bug 1805256] Update Released Łukasz Zemczak
2020-06-18 9:38 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Launchpad Bug Tracker
2020-06-18 9:39 ` Launchpad Bug Tracker
2020-06-18 10:27 ` Andrew Cloke
2020-06-30 6:54 ` Christian Ehrhardt
2020-07-01 7:01 ` Ike Panhc
2020-07-12 13:16 ` Rafael David Tinoco
2020-07-13 3:59 ` Launchpad Bug Tracker
2020-07-13 4:12 ` Rafael David Tinoco
2020-07-15 15:31 ` dann frazier
2020-07-20 12:22 ` Rafael David Tinoco
2020-07-21 20:02 ` Rafael David Tinoco
2020-07-21 20:03 ` Rafael David Tinoco
2020-07-31 18:51 ` Rafael David Tinoco
2020-07-31 21:42 ` Rafael David Tinoco
2020-08-07 9:53 ` Timo Aaltonen
2020-08-07 14:41 ` [Bug 1805256] Autopkgtest regression report (qemu/1:2.11+dfsg-1ubuntu7.30) Ubuntu SRU Bot
2020-08-07 20:13 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images dann frazier
2020-08-14 19:49 ` dann frazier
2020-08-19 16:36 ` Launchpad Bug Tracker
2020-08-19 17:16 ` Andrew Cloke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d5367b2a-84ee-1211-a2dc-7d631c94fe3f@redhat.com \
--to=pbonzini@redhat.com \
--cc=1805256@bugs.launchpad.net \
--cc=dann.frazier@canonical.com \
--cc=jglauber@marvell.com \
--cc=lizhengui@huawei.com \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=rafaeldtinoco@ubuntu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).