From: Torvald Riegel <triegel@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>, Jan Glauber <jglauber@marvell.com>
Cc: Rafael David Tinoco <rafaeldtinoco@ubuntu.com>,
	lizhengui <lizhengui@huawei.com>,
	dann frazier <dann.frazier@canonical.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Bug 1805256 <1805256@bugs.launchpad.net>,
	QEMU Developers - ARM <qemu-arm@nongnu.org>,
	Will Deacon <will@kernel.org>
Subject: Re: memory barriers and ATOMIC_SEQ_CST on aarch64 (was Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues)
Date: Wed, 02 Oct 2019 16:58:23 +0200
Message-ID: <12dc4ab638bf8b5af941b24ac989ea45aa8c09b6.camel@redhat.com>
In-Reply-To: <96c26e21-5996-0c63-ce8b-99a1b5473453@redhat.com>

On Wed, 2019-10-02 at 15:20 +0200, Paolo Bonzini wrote:
> On 02/10/19 13:05, Jan Glauber wrote:
> > The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> > 
> > From my understanding this is not sufficient for SMP sync.
> > 
> > > > If I read this comment correct:
> > > > 
> > > >     void aio_notify(AioContext *ctx)
> > > >     {
> > > >         /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> > > >          * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> > > >          */
> > > >         smp_mb();
> > > >         if (ctx->notify_me) {
> > > > 
> > > > it points out that the smp_mb() should be paired. But as
> > > > I said the used atomics don't generate any barriers at all.
> > > 
> > > Awesome!  That would be a compiler bug though, as atomic_add and atomic_sub
> > > are defined as sequentially consistent:
> > > 
> > > #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
> > > #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
> > 
> > Compiler bug sounds kind of unlikely...
> 
> Indeed the assembly produced by the compiler matches for example the
> mappings at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html.  A
> small testcase is as follows:
> 
>   int ctx_notify_me;
>   int bh_scheduled;
> 
>   int x()
>   {
>       int one = 1;
>       int ret;
>       __atomic_store(&bh_scheduled, &one, __ATOMIC_RELEASE);     // x1
>       __atomic_thread_fence(__ATOMIC_SEQ_CST);                   // x2
>       __atomic_load(&ctx_notify_me, &ret, __ATOMIC_RELAXED);     // x3
>       return ret;
>   }
> 
>   int y()
>   {
>       int ret;
>       __atomic_fetch_add(&ctx_notify_me, 2, __ATOMIC_SEQ_CST);  // y1
>       __atomic_load(&bh_scheduled, &ret, __ATOMIC_RELAXED);     // y2
>       return ret;
>   }
> 
> Here y (which is aio_poll) wants to order the write to ctx->notify_me
> before reads of bh->scheduled.  However, the processor can speculate the
> load of bh->scheduled between the load-acquire and store-release of
> ctx->notify_me.  So you can have something like:
> 
>  thread 0 (y)                          thread 1 (x)
>  -----------------------------------   -----------------------------
>  y1: load-acq ctx->notify_me
>  y2: load-rlx bh->scheduled
>                                        x1: store-rel bh->scheduled <-- 1
>                                        x2: memory barrier
>                                        x3: load-rlx ctx->notify_me
>  y1: store-rel ctx->notify_me <-- 2
> 
> Being very puzzled, I tried to put this into cppmem:
> 
>   int main() {
>     atomic_int ctx_notify_me = 0;
>     atomic_int bh_scheduled = 0;
>     {{{ {
>           bh_scheduled.store(1, mo_release);
>           atomic_thread_fence(mo_seq_cst);
>           // must be zero since the bug report shows no notification
>           ctx_notify_me.load(mo_relaxed).readsvalue(0);
>         }
>     ||| {
>           ctx_notify_me.store(2, mo_seq_cst);
>           r2=bh_scheduled.load(mo_relaxed);
>         }
>     }}};
>     return 0;
>   }
> 
> and much to my surprise, the tool said r2 *can* be 0.  Same if I put a
> CAS like
> 
>         cas_strong_explicit(ctx_notify_me.readsvalue(0), 0, 2,
>                             mo_seq_cst, mo_seq_cst);
> 
> which resembles the code in the test case a bit more.

This example looks like Dekker synchronization (if I get the intent right).

Two possible implementations of this are either (1) with all memory
accesses having seq-cst MO, or (2) with relaxed-MO accesses and seq-cst
fences between the store and load on both ends.  It's possible to mix
both, but that gets trickier I think.  I'd prefer the one with just
fences, just because it's easiest, conceptually.
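
A minimal sketch of variant (2), assuming C11 <stdatomic.h> rather than
the GCC builtins used in the test case.  The two globals are hypothetical
stand-ins for ctx->notify_me and bh->scheduled, and notifier_side() and
poller_side() are illustrative names, not QEMU functions.  The two
functions are meant to run concurrently on different threads:

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for ctx->notify_me and bh->scheduled. */
static atomic_int ctx_notify_me;
static atomic_int bh_scheduled;

/* Notifier side (x in the test case): publish work, then look for a waiter. */
int notifier_side(void)
{
    atomic_store_explicit(&bh_scheduled, 1, memory_order_relaxed);
    /* seq-cst fence between this thread's store and its load */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&ctx_notify_me, memory_order_relaxed);
}

/* Poller side (y in the test case): announce the wait, then look for work. */
int poller_side(void)
{
    atomic_fetch_add_explicit(&ctx_notify_me, 2, memory_order_relaxed);
    /* matching seq-cst fence between this thread's store and its load */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&bh_scheduled, memory_order_relaxed);
}
```

Starting from both variables at zero, the two fences are ordered one
before the other in the seq-cst total order, so at least one side should
observe the other's store; both functions returning 0 in the same run is
the outcome this pattern is meant to forbid.  If the store also has to
publish other data, it would still need release MO on top of this.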

> I then found a discussion about using the C11 memory model in Linux
> (https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html) which contains the
> following statement, which is a bit disheartening even though it is
> about a different test:
> 
>    My first gut feeling was that the assertion should never fire, but
>    that was wrong because (as I seem to usually forget) the seq-cst
>    total order is just a constraint but doesn't itself contribute
>    to synchronizes-with -- but this is different for seq-cst fences.

It works if you use (1) or (2) consistently.  cppmem and the Batty et al.
tech report should give you the gory details.
My comment is just about seq-cst working differently on memory accesses vs.
fences (in the way it's specified in the memory model).
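
For comparison, a sketch of variant (1) under the same assumptions (C11
<stdatomic.h>, hypothetical globals, illustrative function names).  Here
every access to the two variables is seq-cst, including the loads, and
there are no fences:

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for ctx->notify_me and bh->scheduled. */
static atomic_int ctx_notify_me;
static atomic_int bh_scheduled;

/* Notifier side: seq-cst store followed by a seq-cst load. */
int notifier_side_sc(void)
{
    atomic_store_explicit(&bh_scheduled, 1, memory_order_seq_cst);
    return atomic_load_explicit(&ctx_notify_me, memory_order_seq_cst);
}

/* Poller side: seq-cst read-modify-write followed by a seq-cst load. */
int poller_side_sc(void)
{
    atomic_fetch_add_explicit(&ctx_notify_me, 2, memory_order_seq_cst);
    return atomic_load_explicit(&bh_scheduled, memory_order_seq_cst);
}
```

The test case quoted above differs from this in that its loads (x3 and
y2) are relaxed, which makes it a mix of the two variants rather than a
consistent use of either.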

> and later in the thread:
> 
>    Use of C11 atomics to implement Linux kernel atomic operations
>    requires knowledge of the underlying architecture and the compiler's
>    implementation, as was noted earlier in this thread.
> 
> Indeed if I add an atomic_thread_fence I get only one valid execution,
> where r2 must be 1.  This is similar to GCC's bug
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697, and we can fix it in
> QEMU by using __sync_fetch_and_add; in fact cppmem also shows one valid
> execution if the store is replaced with something like GCC's assembly
> for __sync_fetch_and_add (or Linux's assembly for atomic_add_return):
> 
>         cas_strong_explicit(ctx_notify_me.readsvalue(0), 0, 2,
>                             mo_release, mo_release);
>         atomic_thread_fence(mo_seq_cst);
> 
> So we should:
> 
> 1) understand why ATOMIC_SEQ_CST is not enough in this case.  QEMU code
> seems to be making the same assumptions as Linux about the memory model,
> and this is wrong because QEMU uses C11 atomics if available.
> Fortunately, this kind of synchronization in QEMU is relatively rare and
> only this particular bit seems affected.  If there is a fix which stays
> within the C11 memory model, and does not pessimize code on x86, we can
> use it[1] and document the pitfall.

Using the fences between the store/load pairs in Dekker-like
synchronization should do that, right?  It's also relatively easy to deal
with.

> 2) if there's no way to fix the bug, qemu/atomic.h needs to switch to
> __sync_fetch_and_add and friends.  And again, in this case the
> difference between the C11 and Linux/QEMU memory models must be documented.

I'm surely not aware of all the constraints here, but I'd be surprised if the
C11 memory model isn't good enough for portable synchronization code (with
the exception of the consume MO minefield, perhaps).
