All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boqun Feng <boqun.feng@gmail.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mark Rutland <mark.rutland@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, aryabinin@virtuozzo.com,
	catalin.marinas@arm.com, dvyukov@google.com, will.deacon@arm.com
Subject: Re: [PATCH] locking/atomics: Clean up the atomic.h maze of #defines
Date: Sat, 5 May 2018 18:16:09 +0800	[thread overview]
Message-ID: <20180505101609.5wb56j4mspjkokmw@tardis> (raw)
In-Reply-To: <20180505093829.xfylnedwd5nonhae@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5169 bytes --]

On Sat, May 05, 2018 at 11:38:29AM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > > So we could do the following simplification on top of that:
> > > > 
> > > >  #ifndef atomic_fetch_dec_relaxed
> > > >  # ifndef atomic_fetch_dec
> > > >  #  define atomic_fetch_dec(v)		atomic_fetch_sub(1, (v))
> > > >  #  define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))
> > > >  #  define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))
> > > >  #  define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))
> > > >  # else
> > > >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec
> > > >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec
> > > >  #  define atomic_fetch_dec_release		atomic_fetch_dec
> > > >  # endif
> > > >  #else
> > > >  # ifndef atomic_fetch_dec
> > > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)
> > > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)
> > > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)
> > > >  # endif
> > > >  #endif
> > > 
> > > This would disallow an architecture to override just fetch_dec_release for
> > > instance.
> > 
> > Couldn't such a crazy arch just define _all_ the 3 APIs in this group?
> > That's really a small price and makes the place pay the complexity
> > price that does the weirdness...
> > 
> > > I don't think there currently is any architecture that does that, but the
> > > intent was to allow it to override anything and only provide defaults where it
> > > does not.
> > 
> > I'd argue that if a new arch only defines one of these APIs that's probably a bug. 
> > If they absolutely want to do it, they still can - by defining all 3 APIs.
> > 
> > So there's no loss in arch flexibility.
> 
> BTW., PowerPC for example is already in such a situation, it does not define 
> atomic_cmpxchg_release(), only the other APIs:
> 
> #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
> #define atomic_cmpxchg_relaxed(v, o, n) \
> 	cmpxchg_relaxed(&((v)->counter), (o), (n))
> #define atomic_cmpxchg_acquire(v, o, n) \
> 	cmpxchg_acquire(&((v)->counter), (o), (n))
> 
> Was it really the intention on the PowerPC side that the generic code falls back 
> to cmpxchg(), i.e.:
> 
> #  define atomic_cmpxchg_release(...)           __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)
> 

So ppc has its own definition __atomic_op_release() in
arch/powerpc/include/asm/atomic.h:

	#define __atomic_op_release(op, args...)				\
	({									\
		__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\
		op##_relaxed(args);						\
	})

, and PPC_RELEASE_BARRIER is lwsync, so we map to

	lwsync();
	atomic_cmpxchg_relaxed(v, o, n);

And the reason, why we don't define atomic_cmpxchg_release() but define
atomic_cmpxchg_acquire() is that, atomic_cmpxchg_*() could provide no
ordering guarantee if the cmp fails, we did this for
atomic_cmpxchg_acquire() but not for atomic_cmpxchg_release(), because
doing so may introduce a memory barrier inside a ll/sc critical section,
please see the comment before __cmpxchg_u32_acquire() in
arch/powerpc/include/asm/cmpxchg.h:

	/*
	 * cmpxchg family don't have order guarantee if cmp part fails, therefore we
	 * can avoid superfluous barriers if we use assembly code to implement
	 * cmpxchg() and cmpxchg_acquire(), however we don't do the similar for
	 * cmpxchg_release() because that will result in putting a barrier in the
	 * middle of a ll/sc loop, which is probably a bad idea. For example, this
	 * might cause the conditional store more likely to fail.
	 */

Regards,
Boqun


> Which after macro expansion becomes:
> 
> 	smp_mb__before_atomic();
> 	atomic_cmpxchg_relaxed(v, o, n);
> 
> smp_mb__before_atomic() on PowerPC falls back to the generic __smp_mb(), which 
> falls back to mb(), which on PowerPC is the 'sync' instruction.
> 
> Isn't this a inefficiency bug?
> 
> While I'm pretty clueless about PowerPC low level cmpxchg atomics, they appear to 
> have the following basic structure:
> 
> full cmpxchg():
> 
> 	PPC_ATOMIC_ENTRY_BARRIER # sync
> 	ldarx + stdcx
> 	PPC_ATOMIC_EXIT_BARRIER  # sync
> 
> cmpxchg_relaxed():
> 
> 	ldarx + stdcx
> 
> cmpxchg_acquire():
> 
> 	ldarx + stdcx
> 	PPC_ACQUIRE_BARRIER      # lwsync
> 
> The logical extension for cmpxchg_release() would be:
> 
> cmpxchg_release():
> 
> 	PPC_RELEASE_BARRIER      # lwsync
> 	ldarx + stdcx
> 
> But instead we silently get the generic fallback, which does:
> 
> 	smp_mb__before_atomic();
> 	atomic_cmpxchg_relaxed(v, o, n);
> 
> Which maps to:
> 
> 	sync
> 	ldarx + stdcx
> 
> Note that it uses a full barrier instead of lwsync (does that stand for 
> 'lightweight sync'?).
> 
> Even if it turns out we need the full barrier, with the overly finegrained 
> structure of the atomics this detail is totally undocumented and non-obvious.
> 
> Thanks,
> 
> 	Ingo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

WARNING: multiple messages have this Message-ID (diff)
From: boqun.feng@gmail.com (Boqun Feng)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] locking/atomics: Clean up the atomic.h maze of #defines
Date: Sat, 5 May 2018 18:16:09 +0800	[thread overview]
Message-ID: <20180505101609.5wb56j4mspjkokmw@tardis> (raw)
In-Reply-To: <20180505093829.xfylnedwd5nonhae@gmail.com>

On Sat, May 05, 2018 at 11:38:29AM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > > So we could do the following simplification on top of that:
> > > > 
> > > >  #ifndef atomic_fetch_dec_relaxed
> > > >  # ifndef atomic_fetch_dec
> > > >  #  define atomic_fetch_dec(v)		atomic_fetch_sub(1, (v))
> > > >  #  define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))
> > > >  #  define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))
> > > >  #  define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))
> > > >  # else
> > > >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec
> > > >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec
> > > >  #  define atomic_fetch_dec_release		atomic_fetch_dec
> > > >  # endif
> > > >  #else
> > > >  # ifndef atomic_fetch_dec
> > > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)
> > > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)
> > > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)
> > > >  # endif
> > > >  #endif
> > > 
> > > This would disallow an architecture to override just fetch_dec_release for
> > > instance.
> > 
> > Couldn't such a crazy arch just define _all_ the 3 APIs in this group?
> > That's really a small price and makes the place pay the complexity
> > price that does the weirdness...
> > 
> > > I don't think there currently is any architecture that does that, but the
> > > intent was to allow it to override anything and only provide defaults where it
> > > does not.
> > 
> > I'd argue that if a new arch only defines one of these APIs that's probably a bug. 
> > If they absolutely want to do it, they still can - by defining all 3 APIs.
> > 
> > So there's no loss in arch flexibility.
> 
> BTW., PowerPC for example is already in such a situation, it does not define 
> atomic_cmpxchg_release(), only the other APIs:
> 
> #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
> #define atomic_cmpxchg_relaxed(v, o, n) \
> 	cmpxchg_relaxed(&((v)->counter), (o), (n))
> #define atomic_cmpxchg_acquire(v, o, n) \
> 	cmpxchg_acquire(&((v)->counter), (o), (n))
> 
> Was it really the intention on the PowerPC side that the generic code falls back 
> to cmpxchg(), i.e.:
> 
> #  define atomic_cmpxchg_release(...)           __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)
> 

So ppc has its own definition __atomic_op_release() in
arch/powerpc/include/asm/atomic.h:

	#define __atomic_op_release(op, args...)				\
	({									\
		__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\
		op##_relaxed(args);						\
	})

, and PPC_RELEASE_BARRIER is lwsync, so we map to

	lwsync();
	atomic_cmpxchg_relaxed(v, o, n);

And the reason, why we don't define atomic_cmpxchg_release() but define
atomic_cmpxchg_acquire() is that, atomic_cmpxchg_*() could provide no
ordering guarantee if the cmp fails, we did this for
atomic_cmpxchg_acquire() but not for atomic_cmpxchg_release(), because
doing so may introduce a memory barrier inside a ll/sc critical section,
please see the comment before __cmpxchg_u32_acquire() in
arch/powerpc/include/asm/cmpxchg.h:

	/*
	 * cmpxchg family don't have order guarantee if cmp part fails, therefore we
	 * can avoid superfluous barriers if we use assembly code to implement
	 * cmpxchg() and cmpxchg_acquire(), however we don't do the similar for
	 * cmpxchg_release() because that will result in putting a barrier in the
	 * middle of a ll/sc loop, which is probably a bad idea. For example, this
	 * might cause the conditional store more likely to fail.
	 */

Regards,
Boqun


> Which after macro expansion becomes:
> 
> 	smp_mb__before_atomic();
> 	atomic_cmpxchg_relaxed(v, o, n);
> 
> smp_mb__before_atomic() on PowerPC falls back to the generic __smp_mb(), which 
> falls back to mb(), which on PowerPC is the 'sync' instruction.
> 
> Isn't this a inefficiency bug?
> 
> While I'm pretty clueless about PowerPC low level cmpxchg atomics, they appear to 
> have the following basic structure:
> 
> full cmpxchg():
> 
> 	PPC_ATOMIC_ENTRY_BARRIER # sync
> 	ldarx + stdcx
> 	PPC_ATOMIC_EXIT_BARRIER  # sync
> 
> cmpxchg_relaxed():
> 
> 	ldarx + stdcx
> 
> cmpxchg_acquire():
> 
> 	ldarx + stdcx
> 	PPC_ACQUIRE_BARRIER      # lwsync
> 
> The logical extension for cmpxchg_release() would be:
> 
> cmpxchg_release():
> 
> 	PPC_RELEASE_BARRIER      # lwsync
> 	ldarx + stdcx
> 
> But instead we silently get the generic fallback, which does:
> 
> 	smp_mb__before_atomic();
> 	atomic_cmpxchg_relaxed(v, o, n);
> 
> Which maps to:
> 
> 	sync
> 	ldarx + stdcx
> 
> Note that it uses a full barrier instead of lwsync (does that stand for 
> 'lightweight sync'?).
> 
> Even if it turns out we need the full barrier, with the overly finegrained 
> structure of the atomics this detail is totally undocumented and non-obvious.
> 
> Thanks,
> 
> 	Ingo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180505/84554ee3/attachment.sig>

  parent reply	other threads:[~2018-05-05 10:11 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-04 17:39 [PATCH 0/6] arm64: add instrumented atomics Mark Rutland
2018-05-04 17:39 ` Mark Rutland
2018-05-04 17:39 ` [PATCH 1/6] locking/atomic, asm-generic: instrument ordering variants Mark Rutland
2018-05-04 17:39   ` Mark Rutland
2018-05-04 18:01   ` Peter Zijlstra
2018-05-04 18:01     ` Peter Zijlstra
2018-05-04 18:09     ` Mark Rutland
2018-05-04 18:09       ` Mark Rutland
2018-05-04 18:24       ` Peter Zijlstra
2018-05-04 18:24         ` Peter Zijlstra
2018-05-05  9:12         ` Mark Rutland
2018-05-05  9:12           ` Mark Rutland
2018-05-05  8:11       ` [PATCH] locking/atomics: Clean up the atomic.h maze of #defines Ingo Molnar
2018-05-05  8:11         ` Ingo Molnar
2018-05-05  8:36         ` [PATCH] locking/atomics: Simplify the op definitions in atomic.h some more Ingo Molnar
2018-05-05  8:36           ` Ingo Molnar
2018-05-05  8:54           ` [PATCH] locking/atomics: Combine the atomic_andnot() and atomic64_andnot() API definitions Ingo Molnar
2018-05-05  8:54             ` Ingo Molnar
2018-05-06 12:15             ` [tip:locking/core] " tip-bot for Ingo Molnar
2018-05-06 14:15             ` [PATCH] " Andrea Parri
2018-05-06 14:15               ` Andrea Parri
2018-05-06 12:14           ` [tip:locking/core] locking/atomics: Simplify the op definitions in atomic.h some more tip-bot for Ingo Molnar
2018-05-09  7:33             ` Peter Zijlstra
2018-05-09 13:03               ` Will Deacon
2018-05-15  8:54                 ` Ingo Molnar
2018-05-15  8:35               ` Ingo Molnar
2018-05-15 11:41                 ` Peter Zijlstra
2018-05-15 12:13                   ` Peter Zijlstra
2018-05-15 15:43                   ` Mark Rutland
2018-05-15 17:10                     ` Peter Zijlstra
2018-05-15 17:53                       ` Mark Rutland
2018-05-15 18:11                         ` Peter Zijlstra
2018-05-15 18:15                           ` Peter Zijlstra
2018-05-15 18:52                             ` Linus Torvalds
2018-05-15 19:39                               ` Peter Zijlstra
2018-05-21 17:12                           ` Mark Rutland
2018-05-06 14:12           ` [PATCH] " Andrea Parri
2018-05-06 14:12             ` Andrea Parri
2018-05-06 14:57             ` Ingo Molnar
2018-05-06 14:57               ` Ingo Molnar
2018-05-07  9:54               ` Andrea Parri
2018-05-07  9:54                 ` Andrea Parri
2018-05-18 18:43               ` Palmer Dabbelt
2018-05-18 18:43                 ` Palmer Dabbelt
2018-05-05  8:47         ` [PATCH] locking/atomics: Clean up the atomic.h maze of #defines Peter Zijlstra
2018-05-05  8:47           ` Peter Zijlstra
2018-05-05  9:04           ` Ingo Molnar
2018-05-05  9:04             ` Ingo Molnar
2018-05-05  9:24             ` Peter Zijlstra
2018-05-05  9:24               ` Peter Zijlstra
2018-05-05  9:38             ` Ingo Molnar
2018-05-05  9:38               ` Ingo Molnar
2018-05-05 10:00               ` [RFC PATCH] locking/atomics/powerpc: Introduce optimized cmpxchg_release() family of APIs for PowerPC Ingo Molnar
2018-05-05 10:00                 ` Ingo Molnar
2018-05-05 10:26                 ` Boqun Feng
2018-05-05 10:26                   ` Boqun Feng
2018-05-06  1:56                 ` Benjamin Herrenschmidt
2018-05-06  1:56                   ` Benjamin Herrenschmidt
2018-05-05 10:16               ` Boqun Feng [this message]
2018-05-05 10:16                 ` [PATCH] locking/atomics: Clean up the atomic.h maze of #defines Boqun Feng
2018-05-05 10:35                 ` [RFC PATCH] locking/atomics/powerpc: Clarify why the cmpxchg_relaxed() family of APIs falls back to full cmpxchg() Ingo Molnar
2018-05-05 10:35                   ` Ingo Molnar
2018-05-05 11:28                   ` Boqun Feng
2018-05-05 11:28                     ` Boqun Feng
2018-05-05 13:27                     ` [PATCH] locking/atomics/powerpc: Move cmpxchg helpers to asm/cmpxchg.h and define the full set of cmpxchg APIs Ingo Molnar
2018-05-05 13:27                       ` Ingo Molnar
2018-05-05 14:03                       ` Boqun Feng
2018-05-05 14:03                         ` Boqun Feng
2018-05-06 12:11                         ` Ingo Molnar
2018-05-06 12:11                           ` Ingo Molnar
2018-05-07  1:04                           ` Boqun Feng
2018-05-07  1:04                             ` Boqun Feng
2018-05-07  6:50                             ` Ingo Molnar
2018-05-07  6:50                               ` Ingo Molnar
2018-05-06 12:13                     ` [tip:locking/core] " tip-bot for Boqun Feng
2018-05-07 13:31                       ` [PATCH v2] " Boqun Feng
2018-05-07 13:31                         ` Boqun Feng
2018-05-05  9:05           ` [PATCH] locking/atomics: Clean up the atomic.h maze of #defines Dmitry Vyukov
2018-05-05  9:05             ` Dmitry Vyukov
2018-05-05  9:32             ` Peter Zijlstra
2018-05-05  9:32               ` Peter Zijlstra
2018-05-07  6:43               ` [RFC PATCH] locking/atomics/x86/64: Clean up and fix details of <asm/atomic64_64.h> Ingo Molnar
2018-05-07  6:43                 ` Ingo Molnar
2018-05-05  9:09           ` [PATCH] locking/atomics: Clean up the atomic.h maze of #defines Ingo Molnar
2018-05-05  9:09             ` Ingo Molnar
2018-05-05  9:29             ` Peter Zijlstra
2018-05-05  9:29               ` Peter Zijlstra
2018-05-05 10:48               ` [PATCH] locking/atomics: Shorten the __atomic_op() defines to __op() Ingo Molnar
2018-05-05 10:48                 ` Ingo Molnar
2018-05-05 10:59                 ` Ingo Molnar
2018-05-05 10:59                   ` Ingo Molnar
2018-05-06 12:15                 ` [tip:locking/core] " tip-bot for Ingo Molnar
2018-05-06 12:14         ` [tip:locking/core] locking/atomics: Clean up the atomic.h maze of #defines tip-bot for Ingo Molnar
2018-05-04 17:39 ` [PATCH 2/6] locking/atomic, asm-generic: instrument atomic*andnot*() Mark Rutland
2018-05-04 17:39   ` Mark Rutland
2018-05-04 17:39 ` [PATCH 3/6] arm64: use <linux/atomic.h> for cmpxchg Mark Rutland
2018-05-04 17:39   ` Mark Rutland
2018-05-04 17:39 ` [PATCH 4/6] arm64: fix assembly constraints " Mark Rutland
2018-05-04 17:39   ` Mark Rutland
2018-05-04 17:39 ` [PATCH 5/6] arm64: use instrumented atomics Mark Rutland
2018-05-04 17:39   ` Mark Rutland
2018-05-04 17:39 ` [PATCH 6/6] arm64: instrument smp_{load_acquire,store_release} Mark Rutland
2018-05-04 17:39   ` Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180505101609.5wb56j4mspjkokmw@tardis \
    --to=boqun.feng@gmail.com \
    --cc=aryabinin@virtuozzo.com \
    --cc=catalin.marinas@arm.com \
    --cc=dvyukov@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.