All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Jan Glauber <jglauber@marvell.com>
Cc: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>,
	lizhengui <lizhengui@huawei.com>,
	dann frazier <dann.frazier@canonical.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Bug 1805256 <1805256@bugs.launchpad.net>,
	QEMU Developers - ARM <qemu-arm@nongnu.org>
Subject: Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
Date: Wed, 9 Oct 2019 11:15:04 +0200	[thread overview]
Message-ID: <d5367b2a-84ee-1211-a2dc-7d631c94fe3f@redhat.com> (raw)
In-Reply-To: <20191009080220.GA2905@hc>

On 09/10/19 10:02, Jan Glauber wrote:
> On Mon, Oct 07, 2019 at 04:58:30PM +0200, Paolo Bonzini wrote:
>> On 07/10/19 16:44, dann frazier wrote:
>>> On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
>>>> On 02/10/19 11:23, Jan Glauber wrote:
>>>>> I've looked into this on ThunderX2. The arm64 code generated for the
>>>>> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
>>>>> memory barriers. It is just plain ldaxr/stlxr.
>>>>>
>>>>> From my understanding this is not sufficient for SMP sync.
>>>>>
>>>>> If I read this comment correct:
>>>>>
>>>>>     void aio_notify(AioContext *ctx)
>>>>>     {
>>>>>         /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
>>>>>          * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>>>>>          */
>>>>>         smp_mb();
>>>>>         if (ctx->notify_me) {
>>>>>
>>>>> it points out that the smp_mb() should be paired. But as
>>>>> I said the used atomics don't generate any barriers at all.
>>>>
>>>> Based on the rest of the thread, this patch should also fix the bug:
>>>>
>>>> diff --git a/util/async.c b/util/async.c
>>>> index 47dcbfa..721ea53 100644
>>>> --- a/util/async.c
>>>> +++ b/util/async.c
>>>> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
>>>>      aio_notify_accept(ctx);
>>>>  
>>>>      for (bh = ctx->first_bh; bh; bh = bh->next) {
>>>> -        if (bh->scheduled) {
>>>> +        if (atomic_mb_read(&bh->scheduled)) {
>>>>              return true;
>>>>          }
>>>>      }
>>>>
>>>>
>>>> And also the memory barrier in aio_notify can actually be replaced
>>>> with a SEQ_CST load:
>>>>
>>>> diff --git a/util/async.c b/util/async.c
>>>> index 47dcbfa..721ea53 100644
>>>> --- a/util/async.c
>>>> +++ b/util/async.c
>>>> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>>>>  
>>>>  void aio_notify(AioContext *ctx)
>>>>  {
>>>> -    /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
>>>> -     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>>>> +    /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
>>>> +     * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
>>>> +     * atomic_add in aio_poll.
>>>>       */
>>>> -    smp_mb();
>>>> -    if (ctx->notify_me) {
>>>> +    if (atomic_mb_read(&ctx->notify_me)) {
>>>>          event_notifier_set(&ctx->notifier);
>>>>          atomic_mb_set(&ctx->notified, true);
>>>>      }
>>>>
>>>>
>>>> Would you be able to test these (one by one possibly)?
>>>
>>> Paolo,
>>>   I tried them both separately and together on a Hi1620 system, each
>>> time it hung in the first iteration. Here's a backtrace of a run with
>>> both patches applied:
>>
>> Ok, not a great start...  I'll find myself an aarch64 machine and look
>> at it more closely.  I'd like the patch to be something we can
>> understand and document, since this is probably the second most-used
>> memory barrier idiom in QEMU.
>>
>> Paolo
> 
> I'm still not sure what the actual issue is here, but could it be some bad
> interaction between the notify_me and the list_lock? The are both 4 byte
> and side-by-side:
> 
> address notify_me: 0xaaaadb528aa0  sizeof notify_me: 4
> address list_lock: 0xaaaadb528aa4  sizeof list_lock: 4
> 
> AFAICS the generated code looks OK (all load/store exclusive done
> with 32 bit size):
> 
>      e6c:       885ffc01        ldaxr   w1, [x0]
>      e70:       11000821        add     w1, w1, #0x2
>      e74:       8802fc01        stlxr   w2, w1, [x0]
> 
> ...but if I bump notify_me size to uint64_t the issue goes away.

Ouch. :)  Is this with or without my patch(es)?

Also, what if you just add a dummy uint32_t after notify_me?

Thanks,

Paolo

> 
> BTW, the image file I convert in the testcase is ~20 GB.
> 
> --Jan
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index a1d6b9e24939..e8a5ea3860bb 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -83,7 +83,7 @@ struct AioContext {
>       * Instead, the aio_poll calls include both the prepare and the
>       * dispatch phase, hence a simple counter is enough for them.
>       */
> -    uint32_t notify_me;
> +    uint64_t notify_me;
>  
>      /* A lock to protect between QEMUBH and AioHandler adders and deleter,
>       * and to ensure that no callbacks are removed while we're walking and
> 



  reply	other threads:[~2019-10-09 17:26 UTC|newest]

Thread overview: 134+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-11  2:15 [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues Rafael David Tinoco
2019-09-11  2:15 ` [Qemu-devel] [Bug 1805256] " Rafael David Tinoco
2019-09-11  7:05 ` [Qemu-devel] " Rafael David Tinoco
2019-09-11  7:05   ` [Qemu-devel] [Bug 1805256] " Rafael David Tinoco
2019-09-11 13:17 ` [Qemu-devel] " Paolo Bonzini
2019-09-11 14:48   ` Rafael David Tinoco
2019-09-11 19:09   ` Rafael David Tinoco
2019-09-11 19:09     ` [Qemu-devel] [Bug 1805256] " Rafael David Tinoco
2019-09-24 20:25     ` [Qemu-devel] " dann frazier
2019-09-24 20:25       ` [Bug 1805256] " dann frazier
2019-10-02  9:23       ` Jan Glauber
2019-10-02  9:23         ` Jan Glauber
2019-10-02  9:45         ` Paolo Bonzini
2019-10-02 11:05           ` Jan Glauber
2019-10-02 11:05             ` [Bug 1805256] " Jan Glauber
2019-10-02 13:20             ` memory barriers and ATOMIC_SEQ_CST on aarch64 (was Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues) Paolo Bonzini
2019-10-02 14:58               ` Torvald Riegel
2019-10-02 16:30                 ` Paolo Bonzini
2019-10-07 11:06         ` [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues Paolo Bonzini
2019-10-07 14:36           ` Jan Glauber
2019-10-07 14:36             ` [Bug 1805256] " Jan Glauber
2019-10-07 14:44           ` dann frazier
2019-10-07 14:44             ` [Bug 1805256] " dann frazier
2019-10-07 14:58             ` Paolo Bonzini
2019-10-09  8:02               ` Jan Glauber
2019-10-09  8:02                 ` [Bug 1805256] " Jan Glauber
2019-10-09  9:15                 ` Paolo Bonzini [this message]
2019-10-11  6:05                   ` Jan Glauber
2019-10-11  6:05                     ` [Bug 1805256] " Jan Glauber
2019-10-11  8:18                     ` Paolo Bonzini
2019-10-11  8:30                       ` Jan Glauber
2019-10-11  8:30                         ` [Bug 1805256] " Jan Glauber
2019-10-11 17:55                         ` dann frazier
2019-10-11 17:55                           ` dann frazier
2019-10-12  0:24                           ` [Bug 1805256] " no-reply
2019-10-12  0:49                           ` no-reply
2019-10-11 17:50                     ` dann frazier
2019-10-11 17:50                       ` [Bug 1805256] " dann frazier
  -- strict thread matches above, loose matches on Subject: below --
2018-11-26 22:53 [Qemu-devel] [Bug 1805256] [NEW] qemu-img hangs on high core count ARM system dann frazier
2018-11-26 23:26 ` [Qemu-devel] [Bug 1805256] " John Snow
2018-11-26 23:54 ` dann frazier
2018-12-05 11:20 ` Alex Bennée
2019-04-15 12:59 ` 贞贵李
2019-04-15 14:37 ` 贞贵李
2019-04-15 22:25 ` dann frazier
2019-04-15 23:37 ` dann frazier
2019-04-16  8:16 ` 贞贵李
2019-04-16 13:32 ` 贞贵李
2019-04-23  1:29 ` 贞贵李
2019-06-05 16:16 ` dann frazier
2019-09-05 15:03 ` Rafael David Tinoco
2019-09-06 15:12 ` Rafael David Tinoco
2019-09-06 15:16 ` Rafael David Tinoco
2019-09-06 21:22 ` Rafael David Tinoco
2019-09-09 16:47 ` Rafael David Tinoco
2019-09-10  2:04 ` Rafael David Tinoco
2019-09-10 14:16 ` Rafael David Tinoco
2019-09-10 18:15 ` [Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Rafael David Tinoco
2019-09-10 22:56 ` Rafael David Tinoco
2019-09-11  2:17 ` Rafael David Tinoco
2019-09-11 11:19 ` Rafael David Tinoco
2019-09-11 19:23 ` Rafael David Tinoco
2019-10-02 11:02 ` Jan Glauber
2019-10-03 12:28 ` Rafael David Tinoco
2019-10-03 12:29 ` Rafael David Tinoco
2019-10-03 12:29 ` Rafael David Tinoco
2019-10-03 21:35 ` dann frazier
2019-12-13 14:24 ` dann frazier
2019-12-17  1:34 ` Fred Kimmy
2019-12-17 19:17 ` dann frazier
2019-12-18  2:40 ` Rafael David Tinoco
2019-12-18  9:52 ` iveskim
2019-12-18 14:52 ` dann frazier
2019-12-18 16:21 ` Ubuntu Foundations Team Bug Bot
2020-02-13  8:41 ` Ike Panhc
2020-02-13  8:42 ` Andrew Cloke
2020-02-13  9:20 ` Fred Kimmy
2020-04-15  2:47 ` Rafael David Tinoco
2020-05-04  7:24 ` Ike Panhc
2020-05-05  0:54 ` Ike Panhc
2020-05-05  1:22 ` Ying Fang
2020-05-05  6:15 ` Ike Panhc
2020-05-05 15:01 ` Ike Panhc
2020-05-05 18:48 ` Rafael David Tinoco
2020-05-05 23:55 ` dann frazier
2020-05-06 13:08 ` Rafael David Tinoco
2020-05-06 13:23 ` Rafael David Tinoco
2020-05-06 15:45 ` Ike Panhc
2020-05-06 16:42 ` dann frazier
2020-05-06 19:04 ` Launchpad Bug Tracker
2020-05-06 19:09 ` Philippe Mathieu-Daudé
2020-05-06 19:57   ` dann frazier
2020-05-06 20:11 ` Rafael David Tinoco
2020-05-06 21:10 ` Launchpad Bug Tracker
2020-05-06 21:44 ` Launchpad Bug Tracker
2020-05-07  3:37 ` Launchpad Bug Tracker
2020-05-07  7:00 ` Ike Panhc
2020-05-07 22:27 ` dann frazier
2020-05-14  8:05 ` Andrew Cloke
2020-05-27  4:55 ` Christian Ehrhardt 
2020-05-28 14:58 ` Christian Ehrhardt 
2020-05-29  7:55 ` Launchpad Bug Tracker
2020-05-29  8:01 ` Christian Ehrhardt 
2020-06-02 22:45 ` Brian Murray
2020-06-02 22:49 ` [Bug 1805256] Please test proposed package Brian Murray
2020-06-02 22:54 ` Brian Murray
2020-06-03  4:09 ` [Bug 1805256] Autopkgtest regression report (qemu/1:4.0+dfsg-0ubuntu9.7) Ubuntu SRU Bot
2020-06-03  6:35 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Ike Panhc
2020-06-03  8:40 ` [Bug 1805256] Autopkgtest regression report (qemu/1:4.2-3ubuntu6.2) Ubuntu SRU Bot
2020-06-05  3:51 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Christian Ehrhardt 
2020-06-11  8:04 ` Andrew Cloke
2020-06-17  5:16 ` Christian Ehrhardt 
2020-06-18  9:23 ` Launchpad Bug Tracker
2020-06-18  9:23 ` [Bug 1805256] Update Released Łukasz Zemczak
2020-06-18  9:38 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Launchpad Bug Tracker
2020-06-18  9:39 ` Launchpad Bug Tracker
2020-06-18 10:27 ` Andrew Cloke
2020-06-30  6:54 ` Christian Ehrhardt 
2020-07-01  7:01 ` Ike Panhc
2020-07-12 13:16 ` Rafael David Tinoco
2020-07-13  3:59 ` Launchpad Bug Tracker
2020-07-13  4:12 ` Rafael David Tinoco
2020-07-15 15:31 ` dann frazier
2020-07-20 12:22 ` Rafael David Tinoco
2020-07-21 20:02 ` Rafael David Tinoco
2020-07-21 20:03 ` Rafael David Tinoco
2020-07-31 18:51 ` Rafael David Tinoco
2020-07-31 21:42 ` Rafael David Tinoco
2020-08-07  9:53 ` Timo Aaltonen
2020-08-07 14:41 ` [Bug 1805256] Autopkgtest regression report (qemu/1:2.11+dfsg-1ubuntu7.30) Ubuntu SRU Bot
2020-08-07 20:13 ` [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images dann frazier
2020-08-14 19:49 ` dann frazier
2020-08-19 16:36 ` Launchpad Bug Tracker
2020-08-19 17:16 ` Andrew Cloke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d5367b2a-84ee-1211-a2dc-7d631c94fe3f@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=1805256@bugs.launchpad.net \
    --cc=dann.frazier@canonical.com \
    --cc=jglauber@marvell.com \
    --cc=lizhengui@huawei.com \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rafaeldtinoco@ubuntu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.