* [PATCH v3 00/41] arch: barrier cleanup + barriers for virt
@ 2016-01-10 14:16 Michael S. Tsirkin
  2016-01-10 14:16 ` [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release() Michael S. Tsirkin
                   ` (41 more replies)
  0 siblings, 42 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel

Changes since v2:
	- extended checkpatch tests for barriers, and added patches
	teaching it to warn about incorrect usage of barriers
	(__smp_xxx barriers are for use by asm-generic code only),
	should help prevent misuse by arch code
	to address comments by Russell King
	- patched more instances of xen to use virt_ barriers
	as suggested by Stefano Stabellini
	- implemented a 2 byte xchg on sh instead of hacking around it
	as suggested by Peter Zijlstra and Rich Felker
	- added a patch to drop some s390 arch-specific smp_xxx barriers - generic
	versions are more efficient
	as suggested by Peter Zijlstra and Martin Schwidefsky
	- added a patch to replace before/after atomic barriers with barrier()
	on s390 as suggested by Peter Zijlstra and Martin Schwidefsky
	- included acks from multiple arch maintainers
	thanks a lot for the review!

Changes since v1:
	- replaced an asm-generic patch with an equivalent patch already in tip
	- add wrappers with virt_ prefix for better code annotation,
	  as suggested by David Miller
	- dropped XXX in patch names as this makes vger choke, Cc all relevant
	  mailing lists on all patches (not personal email, as the list becomes
	  too long then)

I parked this in vhost tree for now, though the inclusion of patch 1 from tip
creates a merge conflict - but one that is trivial to resolve.

So I intend to just merge it all through my tree, including the
duplicate patch, and assume conflict will be resolved.

I would really appreciate some feedback on arch bits (especially the x86 bits),
and acks for merging this through the vhost tree.

Thanks!

What really started me off is trying to cleanup some virt code, as suggested by
Peter, who said
> You could of course go fix that instead of mutilating things into
> sort-of functional state.

This work is needed for virtio, so it's probably easiest to
merge it through my tree - is this fine by everyone?

Note to arch maintainers: please don't cherry-pick patches out of this patchset
as it's been structured in this order to avoid breaking bisect.
Please send acks instead!

=====

Sometimes, virtualization is weird. For example, virtio does this (conceptually):

#ifdef CONFIG_SMP
                smp_mb();
#else
                mb();
#endif

Similarly, Xen calls mb() when it's not doing any MMIO at all.

Of course it's wrong in the sense that it's suboptimal. What we would really
like is to have, on UP, exactly the same barrier as on SMP.  This is because a
UP guest can run on an SMP host.

But Linux doesn't provide this ability: if CONFIG_SMP is not defined, it
optimizes most barriers out to a compiler barrier.

Consider for example x86: what we want is xchg (NOT mfence - there's no real IO
going on here - just switching out of the VM - more like a function call
really) but if built without CONFIG_SMP smp_store_mb does not include this.
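
As a rough user-space illustration of the point above (not the kernel
implementation): a store followed by a full barrier can be expressed as a
single atomic exchange, which compilers typically lower to xchg on x86.
The names avail_idx, host_flag, store_mb_u16() and demo_store_mb() are
hypothetical and exist only for this sketch; the __atomic builtins are
GCC/Clang extensions.

```c
#include <stdint.h>

/* Hypothetical ring fields, named for illustration only. */
static uint16_t avail_idx;      /* written by the guest */
static uint16_t host_flag = 7;  /* would be written by the host */

/*
 * Store v to *p and order that store before all later accesses.
 * The builtin is always emitted as a real atomic operation, which is
 * the behaviour wanted when the other side runs on another CPU.
 */
static inline void store_mb_u16(uint16_t *p, uint16_t v)
{
	(void)__atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
}

/* Publish an index, then read what the other side wrote. */
static int demo_store_mb(void)
{
	uint16_t seen;

	store_mb_u16(&avail_idx, 5);
	seen = __atomic_load_n(&host_flag, __ATOMIC_RELAXED);
	return avail_idx == 5 && seen == 7;
}
```

The 2-byte operand is chosen on purpose: it is the size that sh turned out
to be missing an xchg for.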

Virt in general is probably the only use-case, because this really is an
artifact of interfacing with an SMP host while running a UP kernel,
but since we have (at least) two users, it seems to make sense to
put these APIs in a central place.

In fact, smp_ barriers are stubs on !SMP, so they can be defined as follows:

arch/XXX/include/asm/barrier.h:

#define __smp_mb() DOSOMETHING

include/asm-generic/barrier.h:

#ifdef CONFIG_SMP
#define smp_mb() __smp_mb()
#else
#define smp_mb() barrier()
#endif

This has the benefit of cleaning out a bunch of duplicated
ifdefs on a bunch of architectures - this patchset brings
about a net reduction in LOC, more than compensated for
later by performance enhancements, extra documentation and tools :)

Then virt can use __smp_XXX when talking to an SMP host.
To make those users explicit, this patchset adds virt_xxx wrappers
for them.
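
The layering described above can be sketched in user space. This is an
illustration under stated assumptions, not the patch itself: the hw_fences
counter and fences_for_up_guest() are hypothetical, __sync_synchronize()
(a GCC/Clang builtin) stands in for whatever an architecture would put in
__smp_mb(), and CONFIG_SMP is deliberately left undefined to model a UP
guest kernel.

```c
/* Model a UP build: make sure CONFIG_SMP is not defined. */
#undef CONFIG_SMP

/* Counts "real" fences so the effect of CONFIG_SMP is observable. */
static int hw_fences;

/* Compiler-only barrier, in the spirit of the kernel's barrier(). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Stand-in for the arch header: always a real fence. */
#define __smp_mb() do { hw_fences++; __sync_synchronize(); } while (0)

/* Stand-in for the generic header: pick by CONFIG_SMP. */
#ifdef CONFIG_SMP
#define smp_mb() __smp_mb()
#else
#define smp_mb() barrier()
#endif

/* virt_ wrapper: always the SMP flavour, whatever CONFIG_SMP says. */
#define virt_mb() __smp_mb()

/* Number of real fences issued by one smp_mb() plus one virt_mb(). */
static int fences_for_up_guest(void)
{
	hw_fences = 0;
	smp_mb();	/* only a compiler barrier on !SMP */
	virt_mb();	/* still a real fence */
	return hw_fences;
}
```

With CONFIG_SMP undefined only virt_mb() reaches the real fence, which is
the property a UP guest talking to an SMP host relies on.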

Touching all archs is a tad tedious, but it's fairly straightforward.

The patchset is structured as follows:


-. Patch 1 fixes a bug in asm-generic.
   It is already in tip, included here for completeness.

-. Patches 2-13 make sure barrier.h on all remaining
   architectures includes asm-generic/barrier.h:
   after the change in Patch 1, code there matches
   asm-generic/barrier.h almost verbatim.
   Minor code tweaks were required in a couple of places.
   Macros duplicated from asm-generic/barrier.h are dropped
   in the process.

After all that preparatory work, we are getting to the actual change.

-. Patch 14 adds generic smp_XXX wrappers in asm-generic:
   these select __smp_XXX or barrier() depending on CONFIG_SMP

-. Patches 15-27 change all architectures to
   define __smp_XXX macros; the generic code in asm-generic/barrier.h
   then defines smp_XXX macros

   I compiled the affected arches before and after the changes,
   dumped the .text section (using objdump -O binary) and
   made sure that the object code is exactly identical
   before and after the change.

   Note: the changes were intentionally done in a way
   that avoids generated code changes.
   When I got feedback from arch maintainers that the
   barriers can be improved, I made this in a separate
   patch on top, to allow this testing by binary comparisons.

Unfortunately, I don't have a metag cross-build toolset ready.
Hoping for some acks on this architecture.

Next, the following patches put the __smp_xxx APIs to work for virt:

-. Patch 28 adds virt_ wrappers for __smp_, and documents them.
   After all this work, this requires very few lines of code in
   the generic header.

-. Patches 29,30 convert virtio drivers to use the virt_xxx APIs
   tested on x86

-. Patches 31-33 teach virtio to use virt_store_mb
   sh architecture was missing a 2-byte xchg,
   needed for 2 byte smp_store_mb,
   so I had to add this support for sh

-. Patches 34-36 teach checkpatch to warn about
   misuse of the new barriers

-. Patches 37-39 convert xen drivers to use the virt_xxx APIs
   compiled only (by intel 0-day infrastructure)

-. Patches 40-41 make some smp barriers on s390 more efficient
   included here to avoid merge conflicts, at maintainer's request

   tested on x86
Davidlohr Bueso (1):
  lcoking/barriers, arch: Use smp barriers in smp_store_release()

Michael S. Tsirkin (40):
  asm-generic: guard smp_store_release/load_acquire
  ia64: rename nop->iosapic_nop
  ia64: reuse asm-generic/barrier.h
  powerpc: reuse asm-generic/barrier.h
  s390: reuse asm-generic/barrier.h
  sparc: reuse asm-generic/barrier.h
  arm: reuse asm-generic/barrier.h
  arm64: reuse asm-generic/barrier.h
  metag: reuse asm-generic/barrier.h
  mips: reuse asm-generic/barrier.h
  x86/um: reuse asm-generic/barrier.h
  x86: reuse asm-generic/barrier.h
  asm-generic: add __smp_xxx wrappers
  powerpc: define __smp_xxx
  arm64: define __smp_xxx
  arm: define __smp_xxx
  blackfin: define __smp_xxx
  ia64: define __smp_xxx
  metag: define __smp_xxx
  mips: define __smp_xxx
  s390: define __smp_xxx
  sh: define __smp_xxx, fix smp_store_mb for !SMP
  sparc: define __smp_xxx
  tile: define __smp_xxx
  xtensa: define __smp_xxx
  x86: define __smp_xxx
  asm-generic: implement virt_xxx memory barriers
  Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb"
  virtio_ring: update weak barriers to use virt_xxx
  sh: support 1 and 2 byte xchg
  sh: move xchg_cmpxchg to a header by itself
  virtio_ring: use virt_store_mb
  checkpatch.pl: add missing memory barriers
  checkpatch: check for __smp outside barrier.h
  checkpatch: add virt barriers
  xenbus: use virt_xxx barriers
  xen/io: use virt_xxx barriers
  xen/events: use virt_xxx barriers
  s390: use generic memory barriers
  s390: more efficient smp barriers

 arch/arm/include/asm/barrier.h      |  35 ++----------
 arch/arm64/include/asm/barrier.h    |  19 ++-----
 arch/blackfin/include/asm/barrier.h |   4 +-
 arch/ia64/include/asm/barrier.h     |  24 +++-----
 arch/metag/include/asm/barrier.h    |  55 ++++++-------------
 arch/mips/include/asm/barrier.h     |  51 ++++++-----------
 arch/powerpc/include/asm/barrier.h  |  33 ++++-------
 arch/s390/include/asm/barrier.h     |  23 ++++----
 arch/sh/include/asm/barrier.h       |   3 +-
 arch/sh/include/asm/cmpxchg-grb.h   |  22 ++++++++
 arch/sh/include/asm/cmpxchg-irq.h   |  11 ++++
 arch/sh/include/asm/cmpxchg-llsc.h  |  25 +--------
 arch/sh/include/asm/cmpxchg-xchg.h  |  51 +++++++++++++++++
 arch/sh/include/asm/cmpxchg.h       |   3 +
 arch/sparc/include/asm/barrier_32.h |   1 -
 arch/sparc/include/asm/barrier_64.h |  29 ++--------
 arch/sparc/include/asm/processor.h  |   3 -
 arch/tile/include/asm/barrier.h     |   9 +--
 arch/x86/include/asm/barrier.h      |  36 +++++-------
 arch/x86/um/asm/barrier.h           |   9 +--
 arch/xtensa/include/asm/barrier.h   |   4 +-
 include/asm-generic/barrier.h       | 106 +++++++++++++++++++++++++++++++++---
 include/linux/virtio_ring.h         |  21 +++++--
 include/xen/interface/io/ring.h     |  16 +++---
 arch/ia64/kernel/iosapic.c          |   6 +-
 drivers/virtio/virtio_ring.c        |  15 +++--
 drivers/xen/events/events_fifo.c    |   3 +-
 drivers/xen/xenbus/xenbus_comms.c   |   8 +--
 Documentation/memory-barriers.txt   |  28 ++++++++--
 scripts/checkpatch.pl               |  31 ++++++++++-
 30 files changed, 382 insertions(+), 302 deletions(-)
 create mode 100644 arch/sh/include/asm/cmpxchg-xchg.h

-- 
MST


* [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
@ 2016-01-10 14:16 ` Michael S. Tsirkin
  2016-01-12 16:28   ` Paul E. McKenney
  2016-01-10 14:16 ` [PATCH v3 02/41] asm-generic: guard smp_store_release/load_acquire Michael S. Tsirkin
                   ` (40 subsequent siblings)
  41 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Davidlohr Bueso, Davidlohr Bueso,
	Andrew Morton, Benjamin Herrenschmidt, Heiko Carstens,
	Linus Torvalds, Paul E . McKenney, Tony Luck, Ingo Molnar,
	Fenghua Yu, Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Christian Borntraeger

From: Davidlohr Bueso <dave@stgolabs.net>

With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/ia64/include/asm/barrier.h    | 2 +-
 arch/powerpc/include/asm/barrier.h | 2 +-
 arch/s390/include/asm/barrier.h    | 2 +-
 include/asm-generic/barrier.h      | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index df896a1..209c4b8 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -77,7 +77,7 @@ do {									\
 	___p1;								\
 })
 
-#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
 
 /*
  * The group barrier in front of the rsm & ssm are necessary to ensure
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 0eca6ef..a7af5fb 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -34,7 +34,7 @@
 #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
 #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
 
-#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
 
 #ifdef __SUBARCH_HAS_LWSYNC
 #    define SMPWMB      LWSYNC
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index d68e11e..7ffd0b1 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -36,7 +36,7 @@
 #define smp_mb__before_atomic()		smp_mb()
 #define smp_mb__after_atomic()		smp_mb()
 
-#define smp_store_mb(var, value)		do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); smp_mb(); } while (0)
 
 #define smp_store_release(p, v)						\
 do {									\
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index b42afad..0f45f93 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -93,7 +93,7 @@
 #endif	/* CONFIG_SMP */
 
 #ifndef smp_store_mb
-#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); mb(); } while (0)
+#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); smp_mb(); } while (0)
 #endif
 
 #ifndef smp_mb__before_atomic
-- 
MST
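
For readers unfamiliar with why a store-plus-full-barrier primitive exists
at all, here is a minimal C11 sketch of the sleeper/waker pattern it is
commonly used for. It is an assumption-laden illustration: task_state,
condition, store_mb(), prepare_to_sleep() and signal_condition() are
hypothetical names, and C11 fences stand in for the kernel's barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int task_state;	/* 0 = running, 1 = sleeping */
static atomic_bool condition;	/* set by the waker */

/* Store, then a full fence: the CPU-ordering job of a store_mb. */
#define store_mb(var, value)						\
do {									\
	atomic_store_explicit(&(var), (value), memory_order_relaxed);	\
	atomic_thread_fence(memory_order_seq_cst);			\
} while (0)

/* Sleeper: announce the intent to sleep, then re-check the condition. */
static bool prepare_to_sleep(void)
{
	store_mb(task_state, 1);
	return !atomic_load_explicit(&condition, memory_order_relaxed);
}

/* Waker: set the condition, then see whether anybody needs a wakeup. */
static bool signal_condition(void)
{
	atomic_store_explicit(&condition, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&task_state, memory_order_relaxed) == 1;
}
```

Both sides order a store before a later load against another CPU; no
device is involved, which is why the smp flavour of the barrier suffices.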


* [PATCH v3 02/41] asm-generic: guard smp_store_release/load_acquire
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
  2016-01-10 14:16 ` [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release() Michael S. Tsirkin
@ 2016-01-10 14:16 ` Michael S. Tsirkin
  2016-01-10 14:16 ` [PATCH v3 03/41] ia64: rename nop->iosapic_nop Michael S. Tsirkin
                   ` (39 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel

Allow architectures to override smp_store_release
and smp_load_acquire by guarding the defines
in asm-generic/barrier.h with ifndef directives.

This is in preparation for reusing asm-generic/barrier.h
on architectures which have their own definition
of these macros.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/asm-generic/barrier.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 0f45f93..987b2e0 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -104,13 +104,16 @@
 #define smp_mb__after_atomic()	smp_mb()
 #endif
 
+#ifndef smp_store_release
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	smp_mb();							\
 	WRITE_ONCE(*p, v);						\
 } while (0)
+#endif
 
+#ifndef smp_load_acquire
 #define smp_load_acquire(p)						\
 ({									\
 	typeof(*p) ___p1 = READ_ONCE(*p);				\
@@ -118,6 +121,7 @@ do {									\
 	smp_mb();							\
 	___p1;								\
 })
+#endif
 
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
-- 
MST
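
The effect of the ifndef guards can be sketched as a single translation
unit that plays both roles, "arch header" first and "generic header"
second. Everything here is a hypothetical stand-in: arch_release_calls,
generic_load_acquire() and demo_override() are invented for illustration,
and C11 atomics replace the kernel primitives.

```c
#include <stdatomic.h>

static int arch_release_calls;
static atomic_int shared;

/* "Arch header": provides its own smp_store_release first. */
#define smp_store_release(p, v)						\
do {									\
	arch_release_calls++;						\
	atomic_store_explicit((p), (v), memory_order_release);		\
} while (0)

/* "Generic header": fallbacks, used only where the arch was silent. */
#ifndef smp_store_release
#define smp_store_release(p, v)						\
do {									\
	atomic_thread_fence(memory_order_seq_cst);			\
	atomic_store_explicit((p), (v), memory_order_relaxed);		\
} while (0)
#endif

static inline int generic_load_acquire(atomic_int *p)
{
	int val = atomic_load_explicit(p, memory_order_relaxed);

	atomic_thread_fence(memory_order_seq_cst);
	return val;
}

#ifndef smp_load_acquire
#define smp_load_acquire(p) generic_load_acquire(p)
#endif

/* The arch store and the generic load coexist without redefinition. */
static int demo_override(void)
{
	smp_store_release(&shared, 42);
	return smp_load_acquire(&shared) == 42 && arch_release_calls == 1;
}
```

Without the guard, the second definition of smp_store_release would
redefine the macro with a different body and trigger a diagnostic as soon
as an architecture header pulled in the generic one.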


* [PATCH v3 03/41] ia64: rename nop->iosapic_nop
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
  2016-01-10 14:16 ` [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release() Michael S. Tsirkin
  2016-01-10 14:16 ` [PATCH v3 02/41] asm-generic: guard smp_store_release/load_acquire Michael S. Tsirkin
@ 2016-01-10 14:16 ` Michael S. Tsirkin
  2016-01-10 14:17 ` [PATCH v3 04/41] ia64: reuse asm-generic/barrier.h Michael S. Tsirkin
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Tony Luck, Fenghua Yu, Jiang Liu,
	Rusty Russell

asm-generic/barrier.h defines a nop() macro.
To be able to use this header on ia64, we shouldn't
call local functions/variables nop().

There's one instance where this breaks on ia64:
rename the function to iosapic_nop to avoid the conflict.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/ia64/kernel/iosapic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/ia64/kernel/iosapic.c b/arch/ia64/kernel/iosapic.c
index d2fae05..90fde5b 100644
--- a/arch/ia64/kernel/iosapic.c
+++ b/arch/ia64/kernel/iosapic.c
@@ -256,7 +256,7 @@ set_rte (unsigned int gsi, unsigned int irq, unsigned int dest, int mask)
 }
 
 static void
-nop (struct irq_data *data)
+iosapic_nop (struct irq_data *data)
 {
 	/* do nothing... */
 }
@@ -415,7 +415,7 @@ iosapic_unmask_level_irq (struct irq_data *data)
 #define iosapic_shutdown_level_irq	mask_irq
 #define iosapic_enable_level_irq	unmask_irq
 #define iosapic_disable_level_irq	mask_irq
-#define iosapic_ack_level_irq		nop
+#define iosapic_ack_level_irq		iosapic_nop
 
 static struct irq_chip irq_type_iosapic_level = {
 	.name =			"IO-SAPIC-level",
@@ -453,7 +453,7 @@ iosapic_ack_edge_irq (struct irq_data *data)
 }
 
 #define iosapic_enable_edge_irq		unmask_irq
-#define iosapic_disable_edge_irq	nop
+#define iosapic_disable_edge_irq	iosapic_nop
 
 static struct irq_chip irq_type_iosapic_edge = {
 	.name =			"IO-SAPIC-edge",
-- 
MST
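
The conflict this patch avoids can be reproduced in a few lines. The
sketch below is illustrative only: the macro body ((void)0) is a stand-in
for the real nop(), and struct irq_data, nop_calls and demo_nop() are
simplified or hypothetical.

```c
/* Stand-in for the function-like macro from the generic header. */
#define nop() ((void)0)

struct irq_data { int irq; };

static int nop_calls;

/*
 * Writing "static void nop (struct irq_data *data)" here would not
 * compile: "nop (" is taken as an invocation of the zero-argument
 * macro above, with one argument too many. Renaming sidesteps that.
 */
static void iosapic_nop(struct irq_data *data)
{
	(void)data;
	nop_calls++;	/* the real function does nothing; counted for the demo */
}

#define iosapic_ack_level_irq	iosapic_nop

/* Call through the alias, as an irq_chip callback would be called. */
static int demo_nop(void)
{
	struct irq_data d = { 1 };
	void (*ack)(struct irq_data *) = iosapic_ack_level_irq;

	ack(&d);
	nop();		/* the macro is still usable alongside */
	return nop_calls;
}
```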


* [PATCH v3 04/41] ia64: reuse asm-generic/barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2016-01-10 14:16 ` [PATCH v3 03/41] ia64: rename nop->iosapic_nop Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
  2016-01-10 14:17 ` [PATCH v3 05/41] powerpc: " Michael S. Tsirkin
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Tony Luck, Fenghua Yu, Ingo Molnar,
	Davidlohr Bueso

On ia64 smp_rmb, smp_wmb, read_barrier_depends, smp_read_barrier_depends
and smp_store_mb() match the asm-generic variants exactly. Drop the
local definitions and pull in asm-generic/barrier.h instead.

This is in preparation to refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/ia64/include/asm/barrier.h | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index 209c4b8..2f93348 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -48,12 +48,6 @@
 # define smp_mb()	barrier()
 #endif
 
-#define smp_rmb()	smp_mb()
-#define smp_wmb()	smp_mb()
-
-#define read_barrier_depends()		do { } while (0)
-#define smp_read_barrier_depends()	do { } while (0)
-
 #define smp_mb__before_atomic()	barrier()
 #define smp_mb__after_atomic()	barrier()
 
@@ -77,12 +71,12 @@ do {									\
 	___p1;								\
 })
 
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
 /*
  * The group barrier in front of the rsm & ssm are necessary to ensure
  * that none of the previous instructions in the same group are
  * affected by the rsm/ssm.
  */
 
+#include <asm-generic/barrier.h>
+
 #endif /* _ASM_IA64_BARRIER_H */
-- 
MST


* [PATCH v3 05/41] powerpc: reuse asm-generic/barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2016-01-10 14:17 ` [PATCH v3 04/41] ia64: reuse asm-generic/barrier.h Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
  2016-01-12 16:31   ` Paul E. McKenney
  2016-01-10 14:17 ` [PATCH v3 06/41] s390: " Michael S. Tsirkin
                   ` (36 subsequent siblings)
  41 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Ingo Molnar, Davidlohr Bueso, Paul E. McKenney

On powerpc read_barrier_depends, smp_read_barrier_depends,
smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.

This is in preparation to refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/powerpc/include/asm/barrier.h | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index a7af5fb..980ad0c 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -34,8 +34,6 @@
 #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
 #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
 
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
 #ifdef __SUBARCH_HAS_LWSYNC
 #    define SMPWMB      LWSYNC
 #else
@@ -60,9 +58,6 @@
 #define smp_wmb()	barrier()
 #endif /* CONFIG_SMP */
 
-#define read_barrier_depends()		do { } while (0)
-#define smp_read_barrier_depends()	do { } while (0)
-
 /*
  * This is a barrier which prevents following instructions from being
  * started until the value of the argument x is known.  For example, if
@@ -87,8 +82,8 @@ do {									\
 	___p1;								\
 })
 
-#define smp_mb__before_atomic()     smp_mb()
-#define smp_mb__after_atomic()      smp_mb()
 #define smp_mb__before_spinlock()   smp_mb()
 
+#include <asm-generic/barrier.h>
+
 #endif /* _ASM_POWERPC_BARRIER_H */
-- 
MST


* [PATCH v3 06/41] s390: reuse asm-generic/barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2016-01-10 14:17 ` [PATCH v3 05/41] powerpc: " Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
  2016-01-10 14:17 ` [PATCH v3 07/41] sparc: " Michael S. Tsirkin
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Martin Schwidefsky, Heiko Carstens,
	Ingo Molnar, Davidlohr Bueso, Christian Borntraeger

On s390 read_barrier_depends, smp_read_barrier_depends,
smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.

This is in preparation to refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/s390/include/asm/barrier.h | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 7ffd0b1..c358c31 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -30,14 +30,6 @@
 #define smp_rmb()			rmb()
 #define smp_wmb()			wmb()
 
-#define read_barrier_depends()		do { } while (0)
-#define smp_read_barrier_depends()	do { } while (0)
-
-#define smp_mb__before_atomic()		smp_mb()
-#define smp_mb__after_atomic()		smp_mb()
-
-#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
@@ -53,4 +45,6 @@ do {									\
 	___p1;								\
 })
 
+#include <asm-generic/barrier.h>
+
 #endif /* __ASM_BARRIER_H */
-- 
MST


* [PATCH v3 07/41] sparc: reuse asm-generic/barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (5 preceding siblings ...)
  2016-01-10 14:17 ` [PATCH v3 06/41] s390: " Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
  2016-01-10 14:17 ` [PATCH v3 08/41] arm: " Michael S. Tsirkin
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ingo Molnar

On sparc 64 bit dma_rmb, dma_wmb, smp_store_mb, smp_mb, smp_rmb,
smp_wmb, read_barrier_depends and smp_read_barrier_depends match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.

nop uses __asm__ __volatile__ but is otherwise identical to
the generic version, so drop that as well.

This is in preparation to refactoring this code area.

Note: nop() was in processor.h and not in barrier.h as on other
architectures. Nothing seems to depend on it being there though.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/include/asm/barrier_32.h |  1 -
 arch/sparc/include/asm/barrier_64.h | 21 ++-------------------
 arch/sparc/include/asm/processor.h  |  3 ---
 3 files changed, 2 insertions(+), 23 deletions(-)

diff --git a/arch/sparc/include/asm/barrier_32.h b/arch/sparc/include/asm/barrier_32.h
index ae69eda..8059130 100644
--- a/arch/sparc/include/asm/barrier_32.h
+++ b/arch/sparc/include/asm/barrier_32.h
@@ -1,7 +1,6 @@
 #ifndef __SPARC_BARRIER_H
 #define __SPARC_BARRIER_H
 
-#include <asm/processor.h> /* for nop() */
 #include <asm-generic/barrier.h>
 
 #endif /* !(__SPARC_BARRIER_H) */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 14a9286..26c3f72 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -37,25 +37,6 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define rmb()	__asm__ __volatile__("":::"memory")
 #define wmb()	__asm__ __volatile__("":::"memory")
 
-#define dma_rmb()	rmb()
-#define dma_wmb()	wmb()
-
-#define smp_store_mb(__var, __value) \
-	do { WRITE_ONCE(__var, __value); membar_safe("#StoreLoad"); } while(0)
-
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
-#else
-#define smp_mb()	__asm__ __volatile__("":::"memory")
-#define smp_rmb()	__asm__ __volatile__("":::"memory")
-#define smp_wmb()	__asm__ __volatile__("":::"memory")
-#endif
-
-#define read_barrier_depends()		do { } while (0)
-#define smp_read_barrier_depends()	do { } while (0)
-
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
@@ -74,4 +55,6 @@ do {									\
 #define smp_mb__before_atomic()	barrier()
 #define smp_mb__after_atomic()	barrier()
 
+#include <asm-generic/barrier.h>
+
 #endif /* !(__SPARC64_BARRIER_H) */
diff --git a/arch/sparc/include/asm/processor.h b/arch/sparc/include/asm/processor.h
index 2fe99e6..9da9646 100644
--- a/arch/sparc/include/asm/processor.h
+++ b/arch/sparc/include/asm/processor.h
@@ -5,7 +5,4 @@
 #else
 #include <asm/processor_32.h>
 #endif
-
-#define nop() 		__asm__ __volatile__ ("nop")
-
 #endif
-- 
MST


* [PATCH v3 08/41] arm: reuse asm-generic/barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (6 preceding siblings ...)
  2016-01-10 14:17 ` [PATCH v3 07/41] sparc: " Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
  2016-01-10 14:17 ` [PATCH v3 09/41] arm64: " Michael S. Tsirkin
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Russell King, Ingo Molnar,
	Richard Woodruff

On arm smp_store_mb, read_barrier_depends, smp_read_barrier_depends,
smp_store_release, smp_load_acquire, smp_mb__before_atomic and
smp_mb__after_atomic match the asm-generic variants exactly. Drop the
local definitions and pull in asm-generic/barrier.h instead.
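
For readers without the generic header at hand, the shape of the generic
smp_store_release()/smp_load_acquire() (a full barrier before the store,
a full barrier after the load) can be approximated in userspace. This is
only a sketch: the demo_* names are hypothetical, and C11 fences and
relaxed atomics stand in for smp_mb(), WRITE_ONCE() and READ_ONCE().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace analogue of the generic smp_store_release():
 * a full fence first, then a plain (relaxed) store. */
static inline void demo_store_release(atomic_int *p, int v)
{
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Hypothetical analogue of the generic smp_load_acquire():
 * a plain (relaxed) load first, then a full fence. */
static inline int demo_load_acquire(atomic_int *p)
{
	int v = atomic_load_explicit(p, memory_order_relaxed);

	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	return v;
}

static int demo_payload;
static atomic_int demo_ready;

/* Publish a payload, then consume it through the flag. Single-threaded
 * here, so it only shows the call pattern, not a real race. */
static int demo_roundtrip(int value)
{
	demo_payload = value;
	demo_store_release(&demo_ready, 1);
	if (demo_load_acquire(&demo_ready))
		return demo_payload;
	return -1;
}
```

In a real two-thread setup the writer would call demo_store_release() and
the reader would poll with demo_load_acquire() before touching the payload.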

This is in preparation for refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/include/asm/barrier.h | 23 +----------------------
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 3ff5642..31152e8 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -70,28 +70,7 @@ extern void arm_heavy_mb(void);
 #define smp_wmb()	dmb(ishst)
 #endif
 
-#define smp_store_release(p, v)						\
-do {									\
-	compiletime_assert_atomic_type(*p);				\
-	smp_mb();							\
-	WRITE_ONCE(*p, v);						\
-} while (0)
-
-#define smp_load_acquire(p)						\
-({									\
-	typeof(*p) ___p1 = READ_ONCE(*p);				\
-	compiletime_assert_atomic_type(*p);				\
-	smp_mb();							\
-	___p1;								\
-})
-
-#define read_barrier_depends()		do { } while(0)
-#define smp_read_barrier_depends()	do { } while(0)
-
-#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
-#define smp_mb__before_atomic()	smp_mb()
-#define smp_mb__after_atomic()	smp_mb()
+#include <asm-generic/barrier.h>
 
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_BARRIER_H */
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 09/41] arm64: reuse asm-generic/barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (7 preceding siblings ...)
  2016-01-10 14:17 ` [PATCH v3 08/41] arm: " Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
  2016-01-10 14:17 ` [PATCH v3 10/41] metag: " Michael S. Tsirkin
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Catalin Marinas, Will Deacon,
	Ingo Molnar, Andre Przywara

On arm64 nop, read_barrier_depends, smp_read_barrier_depends,
smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.

This is in preparation for refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm64/include/asm/barrier.h | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 9622eb4..91a43f4 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -91,14 +91,7 @@ do {									\
 	__u.__val;							\
 })
 
-#define read_barrier_depends()		do { } while(0)
-#define smp_read_barrier_depends()	do { } while(0)
-
-#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-#define nop()		asm volatile("nop");
-
-#define smp_mb__before_atomic()	smp_mb()
-#define smp_mb__after_atomic()	smp_mb()
+#include <asm-generic/barrier.h>
 
 #endif	/* __ASSEMBLY__ */
 
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 10/41] metag: reuse asm-generic/barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (8 preceding siblings ...)
  2016-01-10 14:17 ` [PATCH v3 09/41] arm64: " Michael S. Tsirkin
@ 2016-01-10 14:17 ` Michael S. Tsirkin
  2016-01-10 14:18 ` [PATCH v3 11/41] mips: " Michael S. Tsirkin
                   ` (31 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, James Hogan, Ingo Molnar

On metag dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
smp_read_barrier_depends, smp_store_release and smp_load_acquire match
the asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.

This is in preparation for refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/metag/include/asm/barrier.h | 25 ++-----------------------
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index 172b7e5..b5b778b 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -44,9 +44,6 @@ static inline void wr_fence(void)
 #define rmb()		barrier()
 #define wmb()		mb()
 
-#define dma_rmb()	rmb()
-#define dma_wmb()	wmb()
-
 #ifndef CONFIG_SMP
 #define fence()		do { } while (0)
 #define smp_mb()        barrier()
@@ -81,27 +78,9 @@ static inline void fence(void)
 #endif
 #endif
 
-#define read_barrier_depends()		do { } while (0)
-#define smp_read_barrier_depends()	do { } while (0)
-
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
-#define smp_store_release(p, v)						\
-do {									\
-	compiletime_assert_atomic_type(*p);				\
-	smp_mb();							\
-	WRITE_ONCE(*p, v);						\
-} while (0)
-
-#define smp_load_acquire(p)						\
-({									\
-	typeof(*p) ___p1 = READ_ONCE(*p);				\
-	compiletime_assert_atomic_type(*p);				\
-	smp_mb();							\
-	___p1;								\
-})
-
 #define smp_mb__before_atomic()	barrier()
 #define smp_mb__after_atomic()	barrier()
 
+#include <asm-generic/barrier.h>
+
 #endif /* _ASM_METAG_BARRIER_H */
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 11/41] mips: reuse asm-generic/barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (9 preceding siblings ...)
  2016-01-10 14:17 ` [PATCH v3 10/41] metag: " Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
  2016-01-12  1:14   ` [v3,11/41] " Leonid Yegoshin
  2016-01-10 14:18 ` [PATCH v3 12/41] x86/um: " Michael S. Tsirkin
                   ` (30 subsequent siblings)
  41 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar

On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
smp_read_barrier_depends, smp_store_release and smp_load_acquire match
the asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.

This is in preparation for refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/mips/include/asm/barrier.h | 25 ++-----------------------
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 752e0b8..3eac4b9 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -10,9 +10,6 @@
 
 #include <asm/addrspace.h>
 
-#define read_barrier_depends()		do { } while(0)
-#define smp_read_barrier_depends()	do { } while(0)
-
 #ifdef CONFIG_CPU_HAS_SYNC
 #define __sync()				\
 	__asm__ __volatile__(			\
@@ -87,8 +84,6 @@
 
 #define wmb()		fast_wmb()
 #define rmb()		fast_rmb()
-#define dma_wmb()	fast_wmb()
-#define dma_rmb()	fast_rmb()
 
 #if defined(CONFIG_WEAK_ORDERING) && defined(CONFIG_SMP)
 # ifdef CONFIG_CPU_CAVIUM_OCTEON
@@ -112,9 +107,6 @@
 #define __WEAK_LLSC_MB		"		\n"
 #endif
 
-#define smp_store_mb(var, value) \
-	do { WRITE_ONCE(var, value); smp_mb(); } while (0)
-
 #define smp_llsc_mb()	__asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
 
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
@@ -129,22 +121,9 @@
 #define nudge_writes() mb()
 #endif
 
-#define smp_store_release(p, v)						\
-do {									\
-	compiletime_assert_atomic_type(*p);				\
-	smp_mb();							\
-	WRITE_ONCE(*p, v);						\
-} while (0)
-
-#define smp_load_acquire(p)						\
-({									\
-	typeof(*p) ___p1 = READ_ONCE(*p);				\
-	compiletime_assert_atomic_type(*p);				\
-	smp_mb();							\
-	___p1;								\
-})
-
 #define smp_mb__before_atomic()	smp_mb__before_llsc()
 #define smp_mb__after_atomic()	smp_llsc_mb()
 
+#include <asm-generic/barrier.h>
+
 #endif /* __ASM_BARRIER_H */
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 12/41] x86/um: reuse asm-generic/barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (10 preceding siblings ...)
  2016-01-10 14:18 ` [PATCH v3 11/41] mips: " Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
  2016-01-10 14:18 ` [PATCH v3 13/41] x86: " Michael S. Tsirkin
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Richard Weinberger, Jeff Dike,
	Ingo Molnar, Andy Lutomirski, Borislav Petkov,
	user-mode-linux-user

On x86/um CONFIG_SMP is never defined.  As a result, several macros
match the asm-generic variant exactly. Drop the local definitions and
pull in asm-generic/barrier.h instead.

This is in preparation for refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Richard Weinberger <richard@nod.at>
---
 arch/x86/um/asm/barrier.h | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index 755481f..174781a 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -36,13 +36,6 @@
 #endif /* CONFIG_X86_PPRO_FENCE */
 #define dma_wmb()	barrier()
 
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
-
-#define read_barrier_depends()		do { } while (0)
-#define smp_read_barrier_depends()	do { } while (0)
+#include <asm-generic/barrier.h>
 
 #endif
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 13/41] x86: reuse asm-generic/barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (11 preceding siblings ...)
  2016-01-10 14:18 ` [PATCH v3 12/41] x86/um: " Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
  2016-01-12 14:10   ` Thomas Gleixner
  2016-01-10 14:18 ` [PATCH v3 14/41] asm-generic: add __smp_xxx wrappers Michael S. Tsirkin
                   ` (28 subsequent siblings)
  41 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski

As on most architectures, on x86 read_barrier_depends and
smp_read_barrier_depends are empty.  Drop the local definitions and pull
the generic ones from asm-generic/barrier.h instead: they are identical.

This is in preparation for refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/x86/include/asm/barrier.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 0681d25..cc4c2a7 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -43,9 +43,6 @@
 #define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
 #endif /* SMP */
 
-#define read_barrier_depends()		do { } while (0)
-#define smp_read_barrier_depends()	do { } while (0)
-
 #if defined(CONFIG_X86_PPRO_FENCE)
 
 /*
@@ -91,4 +88,6 @@ do {									\
 #define smp_mb__before_atomic()	barrier()
 #define smp_mb__after_atomic()	barrier()
 
+#include <asm-generic/barrier.h>
+
 #endif /* _ASM_X86_BARRIER_H */
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 14/41] asm-generic: add __smp_xxx wrappers
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (12 preceding siblings ...)
  2016-01-10 14:18 ` [PATCH v3 13/41] x86: " Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
  2016-01-10 14:18 ` [PATCH v3 15/41] powerpc: define __smp_xxx Michael S. Tsirkin
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel

On !SMP, most architectures define their
barriers as compiler barriers.
On SMP, most need an actual barrier.

Make it possible to remove the code duplication for
!SMP by defining low-level __smp_xxx barriers
which do not depend on the value of SMP, then
use them from asm-generic conditionally.

Besides reducing code duplication, these low level APIs will also be
useful for virtualization, where a barrier is sometimes needed even if
!SMP since we might be talking to another kernel on the same SMP system.

Both virtio and Xen drivers will benefit.

The smp_xxx variants should map to the __smp_xxx ones or to barrier(),
depending on CONFIG_SMP, identically for all architectures.

We keep ifndef guards around them for now - once/if all
architectures are converted to use the generic
code, we'll be able to remove these.
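
The layering described above can be sketched in userspace as follows. This
is an illustration under assumptions, not the kernel code: DEMO_SMP is a
hypothetical stand-in for CONFIG_SMP, demo_raw_mb() plays the role of
__smp_mb(), a C11 fence stands in for the arch barrier, and the empty asm
with a "memory" clobber is the GCC/Clang compiler-barrier idiom.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for CONFIG_SMP; build with -DDEMO_SMP=0 for "UP". */
#ifndef DEMO_SMP
#define DEMO_SMP 1
#endif

/* Compiler-only barrier (GCC/Clang idiom). */
#define demo_barrier()	__asm__ __volatile__("" : : : "memory")

/* Low-level barrier: a real fence no matter what DEMO_SMP is. The #ifndef
 * guard lets an "arch" header supply its own definition first. */
#ifndef demo_raw_mb
#define demo_raw_mb()	atomic_thread_fence(memory_order_seq_cst)
#endif

/* Config-dependent wrapper, defined once for every "arch". */
#if DEMO_SMP
#define demo_smp_mb()	demo_raw_mb()
#else
#define demo_smp_mb()	demo_barrier()
#endif

/* A virtualization-style user wants the real fence even on a UP build. */
#define demo_virt_mb()	demo_raw_mb()

static int demo_data, demo_flag;

/* Exercise both wrappers; single-threaded, so it only shows the pattern. */
static int demo_publish_and_read(int value)
{
	demo_data = value;
	demo_smp_mb();		/* orders data before flag for other CPUs */
	demo_flag = 1;
	demo_virt_mb();		/* would order against a peer kernel */
	return demo_flag ? demo_data : -1;
}
```

The point of the split is that only demo_smp_mb() changes meaning with the
configuration; demo_raw_mb() and anything built on it stay a real fence.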

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 include/asm-generic/barrier.h | 91 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 82 insertions(+), 9 deletions(-)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 987b2e0..8752964 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -54,22 +54,38 @@
 #define read_barrier_depends()		do { } while (0)
 #endif
 
+#ifndef __smp_mb
+#define __smp_mb()	mb()
+#endif
+
+#ifndef __smp_rmb
+#define __smp_rmb()	rmb()
+#endif
+
+#ifndef __smp_wmb
+#define __smp_wmb()	wmb()
+#endif
+
+#ifndef __smp_read_barrier_depends
+#define __smp_read_barrier_depends()	read_barrier_depends()
+#endif
+
 #ifdef CONFIG_SMP
 
 #ifndef smp_mb
-#define smp_mb()	mb()
+#define smp_mb()	__smp_mb()
 #endif
 
 #ifndef smp_rmb
-#define smp_rmb()	rmb()
+#define smp_rmb()	__smp_rmb()
 #endif
 
 #ifndef smp_wmb
-#define smp_wmb()	wmb()
+#define smp_wmb()	__smp_wmb()
 #endif
 
 #ifndef smp_read_barrier_depends
-#define smp_read_barrier_depends()	read_barrier_depends()
+#define smp_read_barrier_depends()	__smp_read_barrier_depends()
 #endif
 
 #else	/* !CONFIG_SMP */
@@ -92,23 +108,78 @@
 
 #endif	/* CONFIG_SMP */
 
+#ifndef __smp_store_mb
+#define __smp_store_mb(var, value)  do { WRITE_ONCE(var, value); __smp_mb(); } while (0)
+#endif
+
+#ifndef __smp_mb__before_atomic
+#define __smp_mb__before_atomic()	__smp_mb()
+#endif
+
+#ifndef __smp_mb__after_atomic
+#define __smp_mb__after_atomic()	__smp_mb()
+#endif
+
+#ifndef __smp_store_release
+#define __smp_store_release(p, v)					\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	__smp_mb();							\
+	WRITE_ONCE(*p, v);						\
+} while (0)
+#endif
+
+#ifndef __smp_load_acquire
+#define __smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = READ_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	__smp_mb();							\
+	___p1;								\
+})
+#endif
+
+#ifdef CONFIG_SMP
+
+#ifndef smp_store_mb
+#define smp_store_mb(var, value)  __smp_store_mb(var, value)
+#endif
+
+#ifndef smp_mb__before_atomic
+#define smp_mb__before_atomic()	__smp_mb__before_atomic()
+#endif
+
+#ifndef smp_mb__after_atomic
+#define smp_mb__after_atomic()	__smp_mb__after_atomic()
+#endif
+
+#ifndef smp_store_release
+#define smp_store_release(p, v) __smp_store_release(p, v)
+#endif
+
+#ifndef smp_load_acquire
+#define smp_load_acquire(p) __smp_load_acquire(p)
+#endif
+
+#else	/* !CONFIG_SMP */
+
 #ifndef smp_store_mb
-#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); smp_mb(); } while (0)
+#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); barrier(); } while (0)
 #endif
 
 #ifndef smp_mb__before_atomic
-#define smp_mb__before_atomic()	smp_mb()
+#define smp_mb__before_atomic()	barrier()
 #endif
 
 #ifndef smp_mb__after_atomic
-#define smp_mb__after_atomic()	smp_mb()
+#define smp_mb__after_atomic()	barrier()
 #endif
 
 #ifndef smp_store_release
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
-	smp_mb();							\
+	barrier();							\
 	WRITE_ONCE(*p, v);						\
 } while (0)
 #endif
@@ -118,10 +189,12 @@ do {									\
 ({									\
 	typeof(*p) ___p1 = READ_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
-	smp_mb();							\
+	barrier();							\
 	___p1;								\
 })
 #endif
 
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 15/41] powerpc: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (13 preceding siblings ...)
  2016-01-10 14:18 ` [PATCH v3 14/41] asm-generic: add __smp_xxx wrappers Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
  2016-01-10 14:18 ` [PATCH v3 16/41] arm64: " Michael S. Tsirkin
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Boqun Feng, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Ingo Molnar, Davidlohr Bueso,
	Paul E. McKenney

This defines __smp_xxx barriers for powerpc
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h.

This reduces the amount of arch-specific boilerplate code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
---
 arch/powerpc/include/asm/barrier.h | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index 980ad0c..c0deafc 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -44,19 +44,11 @@
 #define dma_rmb()	__lwsync()
 #define dma_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 
-#ifdef CONFIG_SMP
-#define smp_lwsync()	__lwsync()
+#define __smp_lwsync()	__lwsync()
 
-#define smp_mb()	mb()
-#define smp_rmb()	__lwsync()
-#define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
-#else
-#define smp_lwsync()	barrier()
-
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#endif /* CONFIG_SMP */
+#define __smp_mb()	mb()
+#define __smp_rmb()	__lwsync()
+#define __smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 
 /*
  * This is a barrier which prevents following instructions from being
@@ -67,18 +59,18 @@
 #define data_barrier(x)	\
 	asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory");
 
-#define smp_store_release(p, v)						\
+#define __smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
-	smp_lwsync();							\
+	__smp_lwsync();							\
 	WRITE_ONCE(*p, v);						\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define __smp_load_acquire(p)						\
 ({									\
 	typeof(*p) ___p1 = READ_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
-	smp_lwsync();							\
+	__smp_lwsync();							\
 	___p1;								\
 })
 
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 16/41] arm64: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (14 preceding siblings ...)
  2016-01-10 14:18 ` [PATCH v3 15/41] powerpc: define __smp_xxx Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
  2016-01-10 14:18 ` [PATCH v3 17/41] arm: " Michael S. Tsirkin
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Catalin Marinas, Will Deacon,
	Ingo Molnar, Andre Przywara

This defines __smp_xxx barriers for arm64,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h.

Note: arm64 does not support !SMP config,
so smp_xxx and __smp_xxx are always equivalent.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm64/include/asm/barrier.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 91a43f4..dae5c49 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -35,11 +35,11 @@
 #define dma_rmb()	dmb(oshld)
 #define dma_wmb()	dmb(oshst)
 
-#define smp_mb()	dmb(ish)
-#define smp_rmb()	dmb(ishld)
-#define smp_wmb()	dmb(ishst)
+#define __smp_mb()	dmb(ish)
+#define __smp_rmb()	dmb(ishld)
+#define __smp_wmb()	dmb(ishst)
 
-#define smp_store_release(p, v)						\
+#define __smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	switch (sizeof(*p)) {						\
@@ -62,7 +62,7 @@ do {									\
 	}								\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define __smp_load_acquire(p)						\
 ({									\
 	union { typeof(*p) __val; char __c[1]; } __u;			\
 	compiletime_assert_atomic_type(*p);				\
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 17/41] arm: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (15 preceding siblings ...)
  2016-01-10 14:18 ` [PATCH v3 16/41] arm64: " Michael S. Tsirkin
@ 2016-01-10 14:18 ` Michael S. Tsirkin
  2016-01-10 14:19 ` [PATCH v3 18/41] blackfin: " Michael S. Tsirkin
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Russell King, Ingo Molnar,
	Richard Woodruff

This defines __smp_xxx barriers for arm,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h.

This reduces the amount of arch-specific boilerplate code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/include/asm/barrier.h | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 31152e8..112cc1a 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -60,15 +60,9 @@ extern void arm_heavy_mb(void);
 #define dma_wmb()	barrier()
 #endif
 
-#ifndef CONFIG_SMP
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#else
-#define smp_mb()	dmb(ish)
-#define smp_rmb()	smp_mb()
-#define smp_wmb()	dmb(ishst)
-#endif
+#define __smp_mb()	dmb(ish)
+#define __smp_rmb()	__smp_mb()
+#define __smp_wmb()	dmb(ishst)
 
 #include <asm-generic/barrier.h>
 
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 18/41] blackfin: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (16 preceding siblings ...)
  2016-01-10 14:18 ` [PATCH v3 17/41] arm: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
  2016-01-10 14:19 ` [PATCH v3 19/41] ia64: " Michael S. Tsirkin
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Steven Miao

This defines __smp_xxx barriers for blackfin,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/blackfin/include/asm/barrier.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/blackfin/include/asm/barrier.h b/arch/blackfin/include/asm/barrier.h
index dfb66fe..7cca51c 100644
--- a/arch/blackfin/include/asm/barrier.h
+++ b/arch/blackfin/include/asm/barrier.h
@@ -78,8 +78,8 @@
 
 #endif /* !CONFIG_SMP */
 
-#define smp_mb__before_atomic()	barrier()
-#define smp_mb__after_atomic()	barrier()
+#define __smp_mb__before_atomic()	barrier()
+#define __smp_mb__after_atomic()	barrier()
 
 #include <asm-generic/barrier.h>
 
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 19/41] ia64: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (17 preceding siblings ...)
  2016-01-10 14:19 ` [PATCH v3 18/41] blackfin: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
  2016-01-10 14:19 ` [PATCH v3 20/41] metag: " Michael S. Tsirkin
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Tony Luck, Fenghua Yu, Ingo Molnar,
	Davidlohr Bueso

This defines __smp_xxx barriers for ia64,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h.

This reduces the amount of arch-specific boilerplate code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/ia64/include/asm/barrier.h | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index 2f93348..588f161 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -42,28 +42,24 @@
 #define dma_rmb()	mb()
 #define dma_wmb()	mb()
 
-#ifdef CONFIG_SMP
-# define smp_mb()	mb()
-#else
-# define smp_mb()	barrier()
-#endif
+# define __smp_mb()	mb()
 
-#define smp_mb__before_atomic()	barrier()
-#define smp_mb__after_atomic()	barrier()
+#define __smp_mb__before_atomic()	barrier()
+#define __smp_mb__after_atomic()	barrier()
 
 /*
  * IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq no
  * need for asm trickery!
  */
 
-#define smp_store_release(p, v)						\
+#define __smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	barrier();							\
 	WRITE_ONCE(*p, v);						\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define __smp_load_acquire(p)						\
 ({									\
 	typeof(*p) ___p1 = READ_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 20/41] metag: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (18 preceding siblings ...)
  2016-01-10 14:19 ` [PATCH v3 19/41] ia64: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
  2016-01-10 14:19 ` [PATCH v3 21/41] mips: " Michael S. Tsirkin
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, James Hogan, Ingo Molnar,
	Davidlohr Bueso

This defines __smp_xxx barriers for metag,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h

Note: as __smp_xxx macros should not depend on CONFIG_SMP, they cannot
use the existing fence() macro since that is defined differently between
SMP and !SMP.  For this reason, this patch introduces a wrapper
metag_fence() that doesn't depend on CONFIG_SMP.
fence() is then defined using that, depending on CONFIG_SMP.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/metag/include/asm/barrier.h | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index b5b778b..84880c9 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -44,13 +44,6 @@ static inline void wr_fence(void)
 #define rmb()		barrier()
 #define wmb()		mb()
 
-#ifndef CONFIG_SMP
-#define fence()		do { } while (0)
-#define smp_mb()        barrier()
-#define smp_rmb()       barrier()
-#define smp_wmb()       barrier()
-#else
-
 #ifdef CONFIG_METAG_SMP_WRITE_REORDERING
 /*
  * Write to the atomic memory unlock system event register (command 0). This is
@@ -60,26 +53,31 @@ static inline void wr_fence(void)
  * incoherence). It is therefore ineffective if used after and on the same
  * thread as a write.
  */
-static inline void fence(void)
+static inline void metag_fence(void)
 {
 	volatile int *flushptr = (volatile int *) LINSYSEVENT_WR_ATOMIC_UNLOCK;
 	barrier();
 	*flushptr = 0;
 	barrier();
 }
-#define smp_mb()        fence()
-#define smp_rmb()       fence()
-#define smp_wmb()       barrier()
+#define __smp_mb()        metag_fence()
+#define __smp_rmb()       metag_fence()
+#define __smp_wmb()       barrier()
 #else
-#define fence()		do { } while (0)
-#define smp_mb()        barrier()
-#define smp_rmb()       barrier()
-#define smp_wmb()       barrier()
+#define metag_fence()		do { } while (0)
+#define __smp_mb()        barrier()
+#define __smp_rmb()       barrier()
+#define __smp_wmb()       barrier()
 #endif
+
+#ifdef CONFIG_SMP
+#define fence() metag_fence()
+#else
+#define fence()		do { } while (0)
 #endif
 
-#define smp_mb__before_atomic()	barrier()
-#define smp_mb__after_atomic()	barrier()
+#define __smp_mb__before_atomic()	barrier()
+#define __smp_mb__after_atomic()	barrier()
 
 #include <asm-generic/barrier.h>
 
-- 
MST

* [PATCH v3 21/41] mips: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (19 preceding siblings ...)
  2016-01-10 14:19 ` [PATCH v3 20/41] metag: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
  2016-01-10 14:19 ` [PATCH v3 22/41] s390: " Michael S. Tsirkin
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar,
	Davidlohr Bueso

This defines __smp_xxx barriers for mips,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h

Note: the only exception is smp_mb__before_llsc which is mips-specific.
We define both the __smp_mb__before_llsc variant (for use in
asm/barrier.h) and smp_mb__before_llsc (for use elsewhere on this
architecture).

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/mips/include/asm/barrier.h | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 3eac4b9..d296633 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -85,20 +85,20 @@
 #define wmb()		fast_wmb()
 #define rmb()		fast_rmb()
 
-#if defined(CONFIG_WEAK_ORDERING) && defined(CONFIG_SMP)
+#if defined(CONFIG_WEAK_ORDERING)
 # ifdef CONFIG_CPU_CAVIUM_OCTEON
-#  define smp_mb()	__sync()
-#  define smp_rmb()	barrier()
-#  define smp_wmb()	__syncw()
+#  define __smp_mb()	__sync()
+#  define __smp_rmb()	barrier()
+#  define __smp_wmb()	__syncw()
 # else
-#  define smp_mb()	__asm__ __volatile__("sync" : : :"memory")
-#  define smp_rmb()	__asm__ __volatile__("sync" : : :"memory")
-#  define smp_wmb()	__asm__ __volatile__("sync" : : :"memory")
+#  define __smp_mb()	__asm__ __volatile__("sync" : : :"memory")
+#  define __smp_rmb()	__asm__ __volatile__("sync" : : :"memory")
+#  define __smp_wmb()	__asm__ __volatile__("sync" : : :"memory")
 # endif
 #else
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
+#define __smp_mb()	barrier()
+#define __smp_rmb()	barrier()
+#define __smp_wmb()	barrier()
 #endif
 
 #if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP)
@@ -111,6 +111,7 @@
 
 #ifdef CONFIG_CPU_CAVIUM_OCTEON
 #define smp_mb__before_llsc() smp_wmb()
+#define __smp_mb__before_llsc() __smp_wmb()
 /* Cause previous writes to become visible on all CPUs as soon as possible */
 #define nudge_writes() __asm__ __volatile__(".set push\n\t"		\
 					    ".set arch=octeon\n\t"	\
@@ -118,11 +119,12 @@
 					    ".set pop" : : : "memory")
 #else
 #define smp_mb__before_llsc() smp_llsc_mb()
+#define __smp_mb__before_llsc() smp_llsc_mb()
 #define nudge_writes() mb()
 #endif
 
-#define smp_mb__before_atomic()	smp_mb__before_llsc()
-#define smp_mb__after_atomic()	smp_llsc_mb()
+#define __smp_mb__before_atomic()	__smp_mb__before_llsc()
+#define __smp_mb__after_atomic()	smp_llsc_mb()
 
 #include <asm-generic/barrier.h>
 
-- 
MST

* [PATCH v3 22/41] s390: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (20 preceding siblings ...)
  2016-01-10 14:19 ` [PATCH v3 21/41] mips: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
  2016-01-10 14:19 ` [PATCH v3 23/41] sh: define __smp_xxx, fix smp_store_mb for !SMP Michael S. Tsirkin
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Martin Schwidefsky, Heiko Carstens,
	Ingo Molnar, Davidlohr Bueso, Christian Borntraeger

This defines __smp_xxx barriers for s390,
for use by virtualization.

Some smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h

Note: smp_mb, smp_rmb and smp_wmb are defined as full barriers
unconditionally on this architecture.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
 arch/s390/include/asm/barrier.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index c358c31..fbd25b2 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -26,18 +26,21 @@
 #define wmb()				barrier()
 #define dma_rmb()			mb()
 #define dma_wmb()			mb()
-#define smp_mb()			mb()
-#define smp_rmb()			rmb()
-#define smp_wmb()			wmb()
-
-#define smp_store_release(p, v)						\
+#define __smp_mb()			mb()
+#define __smp_rmb()			rmb()
+#define __smp_wmb()			wmb()
+#define smp_mb()			__smp_mb()
+#define smp_rmb()			__smp_rmb()
+#define smp_wmb()			__smp_wmb()
+
+#define __smp_store_release(p, v)					\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	barrier();							\
 	WRITE_ONCE(*p, v);						\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define __smp_load_acquire(p)						\
 ({									\
 	typeof(*p) ___p1 = READ_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
-- 
MST

* [PATCH v3 23/41] sh: define __smp_xxx, fix smp_store_mb for !SMP
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (21 preceding siblings ...)
  2016-01-10 14:19 ` [PATCH v3 22/41] s390: " Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
  2016-01-10 14:19 ` [PATCH v3 24/41] sparc: define __smp_xxx Michael S. Tsirkin
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ingo Molnar

The sh variant of smp_store_mb() calls xchg() on !SMP, which is stronger
than implied by both the name and the documentation.

Define __smp_store_mb instead: code in asm-generic/barrier.h
will then define smp_store_mb correctly depending on
CONFIG_SMP.
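
The selection described above can be pictured roughly as follows.  This is
a userspace model only, not the code in asm-generic/barrier.h:
SKETCH_CONFIG_SMP and every sketch_ name are hypothetical stand-ins, and
C11 atomics take the place of xchg(), WRITE_ONCE() and barrier().

```c
#include <stdatomic.h>

/* Hypothetical stand-in for CONFIG_SMP; left undefined to model a UP build. */
/* #define SKETCH_CONFIG_SMP 1 */

/* Compiler-only barrier, in the spirit of the kernel's barrier(). */
#define sketch_barrier()	atomic_signal_fence(memory_order_seq_cst)

/* Arch layer: always a full atomic exchange, whatever SKETCH_CONFIG_SMP says. */
#define sketch_arch_store_mb(var, value) \
	do { (void)atomic_exchange(&(var), (value)); } while (0)

/* Generic layer: strong variant on SMP, plain store + compiler barrier on UP. */
#ifdef SKETCH_CONFIG_SMP
#define sketch_store_mb(var, value)	sketch_arch_store_mb(var, value)
#else
#define sketch_store_mb(var, value) \
	do { \
		atomic_store_explicit(&(var), (value), memory_order_relaxed); \
		sketch_barrier(); \
	} while (0)
#endif

static _Atomic int sketch_flag;

/* Store through the generic macro and read the value back. */
static int sketch_set_flag(int v)
{
	sketch_store_mb(sketch_flag, v);
	return atomic_load(&sketch_flag);
}
```

Either branch leaves the same value in memory; the two builds differ only
in how much ordering the store pays for.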

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/sh/include/asm/barrier.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/sh/include/asm/barrier.h b/arch/sh/include/asm/barrier.h
index bf91037..f887c64 100644
--- a/arch/sh/include/asm/barrier.h
+++ b/arch/sh/include/asm/barrier.h
@@ -32,7 +32,8 @@
 #define ctrl_barrier()	__asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop")
 #endif
 
-#define smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
+#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
+#define smp_store_mb(var, value) __smp_store_mb(var, value)
 
 #include <asm-generic/barrier.h>
 
-- 
MST

* [PATCH v3 24/41] sparc: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (22 preceding siblings ...)
  2016-01-10 14:19 ` [PATCH v3 23/41] sh: define __smp_xxx, fix smp_store_mb for !SMP Michael S. Tsirkin
@ 2016-01-10 14:19 ` Michael S. Tsirkin
  2016-01-10 14:20 ` [PATCH v3 25/41] tile: " Michael S. Tsirkin
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ingo Molnar

This defines __smp_xxx barriers for sparc,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/include/asm/barrier_64.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 26c3f72..c9f6ee6 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -37,14 +37,14 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define rmb()	__asm__ __volatile__("":::"memory")
 #define wmb()	__asm__ __volatile__("":::"memory")
 
-#define smp_store_release(p, v)						\
+#define __smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	barrier();							\
 	WRITE_ONCE(*p, v);						\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define __smp_load_acquire(p)						\
 ({									\
 	typeof(*p) ___p1 = READ_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
@@ -52,8 +52,8 @@ do {									\
 	___p1;								\
 })
 
-#define smp_mb__before_atomic()	barrier()
-#define smp_mb__after_atomic()	barrier()
+#define __smp_mb__before_atomic()	barrier()
+#define __smp_mb__after_atomic()	barrier()
 
 #include <asm-generic/barrier.h>
 
-- 
MST

* [PATCH v3 25/41] tile: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (23 preceding siblings ...)
  2016-01-10 14:19 ` [PATCH v3 24/41] sparc: define __smp_xxx Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
  2016-01-10 14:20 ` [PATCH v3 26/41] xtensa: " Michael S. Tsirkin
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Chris Metcalf

This defines __smp_xxx barriers for tile,
for use by virtualization.

Some smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h

Note: for 32 bit, keep smp_mb__after_atomic around since it's faster
than the generic implementation.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/tile/include/asm/barrier.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/tile/include/asm/barrier.h b/arch/tile/include/asm/barrier.h
index 96a42ae..d552228 100644
--- a/arch/tile/include/asm/barrier.h
+++ b/arch/tile/include/asm/barrier.h
@@ -79,11 +79,12 @@ mb_incoherent(void)
  * But after the word is updated, the routine issues an "mf" before returning,
  * and since it's a function call, we don't even need a compiler barrier.
  */
-#define smp_mb__before_atomic()	smp_mb()
-#define smp_mb__after_atomic()	do { } while (0)
+#define __smp_mb__before_atomic()	__smp_mb()
+#define __smp_mb__after_atomic()	do { } while (0)
+#define smp_mb__after_atomic()	__smp_mb__after_atomic()
 #else /* 64 bit */
-#define smp_mb__before_atomic()	smp_mb()
-#define smp_mb__after_atomic()	smp_mb()
+#define __smp_mb__before_atomic()	__smp_mb()
+#define __smp_mb__after_atomic()	__smp_mb()
 #endif
 
 #include <asm-generic/barrier.h>
-- 
MST

* [PATCH v3 26/41] xtensa: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (24 preceding siblings ...)
  2016-01-10 14:20 ` [PATCH v3 25/41] tile: " Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
  2016-01-10 14:20 ` [PATCH v3 27/41] x86: " Michael S. Tsirkin
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Chris Zankel, Max Filippov

This defines __smp_xxx barriers for xtensa,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/xtensa/include/asm/barrier.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/xtensa/include/asm/barrier.h b/arch/xtensa/include/asm/barrier.h
index 5b88774..956596e 100644
--- a/arch/xtensa/include/asm/barrier.h
+++ b/arch/xtensa/include/asm/barrier.h
@@ -13,8 +13,8 @@
 #define rmb() barrier()
 #define wmb() mb()
 
-#define smp_mb__before_atomic()		barrier()
-#define smp_mb__after_atomic()		barrier()
+#define __smp_mb__before_atomic()		barrier()
+#define __smp_mb__after_atomic()		barrier()
 
 #include <asm-generic/barrier.h>
 
-- 
MST

* [PATCH v3 27/41] x86: define __smp_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (25 preceding siblings ...)
  2016-01-10 14:20 ` [PATCH v3 26/41] xtensa: " Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
  2016-01-12 14:11   ` Thomas Gleixner
  2016-01-10 14:20 ` [PATCH v3 28/41] asm-generic: implement virt_xxx memory barriers Michael S. Tsirkin
                   ` (14 subsequent siblings)
  41 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski

This defines __smp_xxx barriers for x86,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barrier.h

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/x86/include/asm/barrier.h | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index cc4c2a7..a584e1c 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -31,17 +31,10 @@
 #endif
 #define dma_wmb()	barrier()
 
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
-#define smp_rmb()	dma_rmb()
-#define smp_wmb()	barrier()
-#define smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
-#else /* !SMP */
-#define smp_mb()	barrier()
-#define smp_rmb()	barrier()
-#define smp_wmb()	barrier()
-#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); barrier(); } while (0)
-#endif /* SMP */
+#define __smp_mb()	mb()
+#define __smp_rmb()	dma_rmb()
+#define __smp_wmb()	barrier()
+#define __smp_store_mb(var, value) do { (void)xchg(&var, value); } while (0)
 
 #if defined(CONFIG_X86_PPRO_FENCE)
 
@@ -50,31 +43,31 @@
  * model and we should fall back to full barriers.
  */
 
-#define smp_store_release(p, v)						\
+#define __smp_store_release(p, v)					\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
-	smp_mb();							\
+	__smp_mb();							\
 	WRITE_ONCE(*p, v);						\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define __smp_load_acquire(p)						\
 ({									\
 	typeof(*p) ___p1 = READ_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
-	smp_mb();							\
+	__smp_mb();							\
 	___p1;								\
 })
 
 #else /* regular x86 TSO memory ordering */
 
-#define smp_store_release(p, v)						\
+#define __smp_store_release(p, v)					\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
 	barrier();							\
 	WRITE_ONCE(*p, v);						\
 } while (0)
 
-#define smp_load_acquire(p)						\
+#define __smp_load_acquire(p)						\
 ({									\
 	typeof(*p) ___p1 = READ_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
@@ -85,8 +78,8 @@ do {									\
 #endif
 
 /* Atomic operations are already serializing on x86 */
-#define smp_mb__before_atomic()	barrier()
-#define smp_mb__after_atomic()	barrier()
+#define __smp_mb__before_atomic()	barrier()
+#define __smp_mb__after_atomic()	barrier()
 
 #include <asm-generic/barrier.h>
 
-- 
MST

* [PATCH v3 28/41] asm-generic: implement virt_xxx memory barriers
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (26 preceding siblings ...)
  2016-01-10 14:20 ` [PATCH v3 27/41] x86: " Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
  2016-01-10 14:20 ` [PATCH v3 29/41] Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb" Michael S. Tsirkin
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Jonathan Corbet, linux-doc

Guests running within virtual machines might be affected by SMP effects even if
the guest itself is compiled without SMP support.  This is an artifact of
interfacing with an SMP host while running a UP kernel.  Using mandatory
barriers for this use-case would be possible but is often suboptimal.

In particular, virtio uses a bunch of confusing ifdefs to work around
this, while xen just uses the mandatory barriers.

To better handle this case, low-level virt_mb() etc macros are made available.
These are implemented trivially using the low-level __smp_xxx macros;
the purpose of these wrappers is to annotate those specific cases.

These have the same effect as smp_mb() etc when SMP is enabled, but generate
identical code for SMP and non-SMP systems. For example, virtual machine guests
should use virt_mb() rather than smp_mb() when synchronizing against a
(possibly SMP) host.
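
As a rough illustration of the layering, here is a userspace model of a UP
build.  It is a sketch under stated assumptions rather than kernel code:
SKETCH_CONFIG_SMP, the sketch_ names and the fence counter are all
hypothetical, and C11 fences stand in for the arch barriers.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for CONFIG_SMP; left undefined to model a UP guest. */
/* #define SKETCH_CONFIG_SMP 1 */

static int sketch_hw_fences;	/* counts real fences, for illustration only */

/* Arch layer (the role of __smp_mb()): always a real fence. */
static inline void sketch_arch_smp_mb(void)
{
	sketch_hw_fences++;
	atomic_thread_fence(memory_order_seq_cst);
}

/* Generic layer (the role of smp_mb()): real fence on SMP only. */
#ifdef SKETCH_CONFIG_SMP
#define sketch_smp_mb()		sketch_arch_smp_mb()
#else
#define sketch_smp_mb()		atomic_signal_fence(memory_order_seq_cst)
#endif

/* virt layer (the role of virt_mb()): same code on SMP and UP builds. */
#define sketch_virt_mb()	sketch_arch_smp_mb()

/* Issue one barrier of each kind and report how many real fences ran. */
static int sketch_count_fences(void)
{
	sketch_hw_fences = 0;
	sketch_smp_mb();	/* compiler barrier only in this UP model */
	sketch_virt_mb();	/* real fence regardless of the build */
	return sketch_hw_fences;
}
```

In this UP model only the virt_ variant reaches the real fence, which is
the behaviour a guest needs when the other side runs on a different CPU.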

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/asm-generic/barrier.h     | 11 +++++++++++
 Documentation/memory-barriers.txt | 28 +++++++++++++++++++++++-----
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 8752964..1cceca14 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -196,5 +196,16 @@ do {									\
 
 #endif
 
+/* Barriers for virtual machine guests when talking to an SMP host */
+#define virt_mb() __smp_mb()
+#define virt_rmb() __smp_rmb()
+#define virt_wmb() __smp_wmb()
+#define virt_read_barrier_depends() __smp_read_barrier_depends()
+#define virt_store_mb(var, value) __smp_store_mb(var, value)
+#define virt_mb__before_atomic() __smp_mb__before_atomic()
+#define virt_mb__after_atomic()	__smp_mb__after_atomic()
+#define virt_store_release(p, v) __smp_store_release(p, v)
+#define virt_load_acquire(p) __smp_load_acquire(p)
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index aef9487..8f4a93a 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1655,17 +1655,18 @@ macro is a good place to start looking.
 SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
 systems because it is assumed that a CPU will appear to be self-consistent,
 and will order overlapping accesses correctly with respect to itself.
+However, see the subsection on "Virtual Machine Guests" below.
 
 [!] Note that SMP memory barriers _must_ be used to control the ordering of
 references to shared memory on SMP systems, though the use of locking instead
 is sufficient.
 
 Mandatory barriers should not be used to control SMP effects, since mandatory
-barriers unnecessarily impose overhead on UP systems. They may, however, be
-used to control MMIO effects on accesses through relaxed memory I/O windows.
-These are required even on non-SMP systems as they affect the order in which
-memory operations appear to a device by prohibiting both the compiler and the
-CPU from reordering them.
+barriers impose unnecessary overhead on both SMP and UP systems. They may,
+however, be used to control MMIO effects on accesses through relaxed memory I/O
+windows.  These barriers are required even on non-SMP systems as they affect
+the order in which memory operations appear to a device by prohibiting both the
+compiler and the CPU from reordering them.
 
 
 There are some more advanced barrier functions:
@@ -2948,6 +2949,23 @@ The Alpha defines the Linux kernel's memory barrier model.
 
 See the subsection on "Cache Coherency" above.
 
+VIRTUAL MACHINE GUESTS
+-------------------
+
+Guests running within virtual machines might be affected by SMP effects even if
+the guest itself is compiled without SMP support.  This is an artifact of
+interfacing with an SMP host while running an UP kernel.  Using mandatory
+barriers for this use-case would be possible but is often suboptimal.
+
+To handle this case optimally, low-level virt_mb() etc macros are available.
+These have the same effect as smp_mb() etc when SMP is enabled, but generate
+identical code for SMP and non-SMP systems. For example, virtual machine guests
+should use virt_mb() rather than smp_mb() when synchronizing against a
+(possibly SMP) host.
+
+These are equivalent to smp_mb() etc counterparts in all other respects,
+in particular, they do not control MMIO effects: to control
+MMIO effects, use mandatory barriers.
 
 ============
 EXAMPLE USES
-- 
MST

* [PATCH v3 29/41] Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb"
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (27 preceding siblings ...)
  2016-01-10 14:20 ` [PATCH v3 28/41] asm-generic: implement virt_xxx memory barriers Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
  2016-01-10 14:20 ` [PATCH v3 30/41] virtio_ring: update weak barriers to use virt_xxx Michael S. Tsirkin
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Alexander Duyck

This reverts commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9.

While that commit optimizes !CONFIG_SMP, it mixes
up DMA and SMP concepts, making the code hard
to figure out.

A better way to optimize this is with the new __smp_XXX
barriers.

As a first step, go back to full rmb/wmb barriers
for !SMP.
We switch to __smp_XXX barriers in the next patch.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/virtio_ring.h | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 8e50888..67e06fe 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -21,20 +21,19 @@
  * actually quite cheap.
  */
 
+#ifdef CONFIG_SMP
 static inline void virtio_mb(bool weak_barriers)
 {
-#ifdef CONFIG_SMP
 	if (weak_barriers)
 		smp_mb();
 	else
-#endif
 		mb();
 }
 
 static inline void virtio_rmb(bool weak_barriers)
 {
 	if (weak_barriers)
-		dma_rmb();
+		smp_rmb();
 	else
 		rmb();
 }
@@ -42,10 +41,26 @@ static inline void virtio_rmb(bool weak_barriers)
 static inline void virtio_wmb(bool weak_barriers)
 {
 	if (weak_barriers)
-		dma_wmb();
+		smp_wmb();
 	else
 		wmb();
 }
+#else
+static inline void virtio_mb(bool weak_barriers)
+{
+	mb();
+}
+
+static inline void virtio_rmb(bool weak_barriers)
+{
+	rmb();
+}
+
+static inline void virtio_wmb(bool weak_barriers)
+{
+	wmb();
+}
+#endif
 
 struct virtio_device;
 struct virtqueue;
-- 
MST

* [PATCH v3 30/41] virtio_ring: update weak barriers to use virt_xxx
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (28 preceding siblings ...)
  2016-01-10 14:20 ` [PATCH v3 29/41] Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb" Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
  2016-01-10 14:20 ` [PATCH v3 31/41] sh: support 1 and 2 byte xchg Michael S. Tsirkin
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Alexander Duyck

virtio ring uses smp_wmb on SMP and wmb on !SMP,
the reason for the latter being that it might be
talking to another kernel on the same SMP machine.

This is exactly what virt_xxx barriers do,
so switch to these instead of homegrown ifdef hacks.
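
To make the ordering requirement concrete, here is a minimal userspace
sketch of a guest publishing an entry to a host through a shared ring.  It
is not the virtio_ring code: the ring layout and all sketch_ names are
hypothetical, and C11 release/acquire fences merely stand in for
virt_wmb()/virt_rmb() (they are somewhat stronger than a pure write or read
barrier).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-ins for virt_wmb()/virt_rmb(): real fences on any build. */
#define sketch_virt_wmb()	atomic_thread_fence(memory_order_release)
#define sketch_virt_rmb()	atomic_thread_fence(memory_order_acquire)

#define SKETCH_RING_SIZE 8u

struct sketch_ring {
	uint32_t slots[SKETCH_RING_SIZE];
	_Atomic uint16_t avail_idx;	/* written by guest, read by host */
};

/* Guest side: fill the slot first, then publish the new index. */
static void sketch_guest_publish(struct sketch_ring *r, uint32_t desc)
{
	uint16_t idx = atomic_load_explicit(&r->avail_idx, memory_order_relaxed);

	r->slots[idx % SKETCH_RING_SIZE] = desc;
	sketch_virt_wmb();	/* slot contents must be visible before the index */
	atomic_store_explicit(&r->avail_idx, (uint16_t)(idx + 1),
			      memory_order_relaxed);
}

/* Host side: returns 1 and fills *desc if a new entry is available. */
static int sketch_host_consume(struct sketch_ring *r, uint16_t *last_seen,
			       uint32_t *desc)
{
	uint16_t idx = atomic_load_explicit(&r->avail_idx, memory_order_relaxed);

	if (idx == *last_seen)
		return 0;
	sketch_virt_rmb();	/* read the index before reading the slot */
	*desc = r->slots[*last_seen % SKETCH_RING_SIZE];
	(*last_seen)++;
	return 1;
}
```

If the write barrier compiled away on a UP guest, a host running on another
CPU could observe the new index before the slot contents.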

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/virtio_ring.h | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 67e06fe..f3fa55b 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -12,7 +12,7 @@
  * anyone care?
  *
  * For virtio_pci on SMP, we don't need to order with respect to MMIO
- * accesses through relaxed memory I/O windows, so smp_mb() et al are
+ * accesses through relaxed memory I/O windows, so virt_mb() et al are
  * sufficient.
  *
  * For using virtio to talk to real devices (eg. other heterogeneous
@@ -21,11 +21,10 @@
  * actually quite cheap.
  */
 
-#ifdef CONFIG_SMP
 static inline void virtio_mb(bool weak_barriers)
 {
 	if (weak_barriers)
-		smp_mb();
+		virt_mb();
 	else
 		mb();
 }
@@ -33,7 +32,7 @@ static inline void virtio_mb(bool weak_barriers)
 static inline void virtio_rmb(bool weak_barriers)
 {
 	if (weak_barriers)
-		smp_rmb();
+		virt_rmb();
 	else
 		rmb();
 }
@@ -41,26 +40,10 @@ static inline void virtio_rmb(bool weak_barriers)
 static inline void virtio_wmb(bool weak_barriers)
 {
 	if (weak_barriers)
-		smp_wmb();
+		virt_wmb();
 	else
 		wmb();
 }
-#else
-static inline void virtio_mb(bool weak_barriers)
-{
-	mb();
-}
-
-static inline void virtio_rmb(bool weak_barriers)
-{
-	rmb();
-}
-
-static inline void virtio_wmb(bool weak_barriers)
-{
-	wmb();
-}
-#endif
 
 struct virtio_device;
 struct virtqueue;
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 31/41] sh: support 1 and 2 byte xchg
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (29 preceding siblings ...)
  2016-01-10 14:20 ` [PATCH v3 30/41] virtio_ring: update weak barriers to use virt_xxx Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
  2016-01-10 14:20 ` [PATCH v3 32/41] sh: move xchg_cmpxchg to a header by itself Michael S. Tsirkin
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Rich Felker

This completes the xchg implementation for the sh architecture.  Note: the
llsc variant is tricky since llsc only supports 4 byte atomics.  The
existing implementation of 1 byte xchg is wrong: we need to do a 4 byte
cmpxchg and retry if any bytes changed meanwhile.

Write this in C for clarity.
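
For readers who want to try the idea outside the kernel, here is a
minimal user-space sketch of a 1 or 2 byte exchange built on a 4 byte
compare-and-swap.  It is not the patch's code: subword_xchg() and
host_is_big_endian() are hypothetical names, C11 <stdatomic.h> stands in
for __cmpxchg_u32, and the caller passes the aligned word plus a byte
offset instead of a raw pointer.  The big-endian shift here is computed
from the field size, which differs from the "sizeof(u32) - 1 - off" form
in the diff below.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Detect byte order at run time so the sketch needs no kernel headers. */
static int host_is_big_endian(void)
{
	const uint16_t probe = 1;

	return *(const unsigned char *)&probe == 0;
}

/*
 * Exchange a 1 or 2 byte field that lives inside the aligned 32-bit
 * word *word, using only a 32-bit compare-and-swap.
 * byte_off is the field's byte offset inside the word (0..3);
 * size is 1 or 2 and byte_off must be a multiple of size.
 * Returns the previous value of the field.
 */
static uint32_t subword_xchg(_Atomic uint32_t *word, size_t byte_off,
			     size_t size, uint32_t val)
{
	unsigned int bitoff = host_is_big_endian()
		? (unsigned int)(sizeof(uint32_t) - size - byte_off) * 8
		: (unsigned int)byte_off * 8;
	uint32_t mask = ((UINT32_C(1) << (size * 8)) - 1) << bitoff;
	uint32_t oldv = atomic_load(word);
	uint32_t newv;

	do {
		/* Replace only our field; keep the neighbouring bytes. */
		newv = (oldv & ~mask) | ((val << bitoff) & mask);
		/* On failure oldv is refreshed with the current word. */
	} while (!atomic_compare_exchange_weak(word, &oldv, newv));

	return (oldv & mask) >> bitoff;
}
```

The loop retries whenever any byte of the containing word changed
between the load and the compare-and-swap, which is the behaviour the
commit message asks for.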

Suggested-by: Rich Felker <dalias@libc.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/sh/include/asm/cmpxchg-grb.h  | 22 +++++++++++++++
 arch/sh/include/asm/cmpxchg-irq.h  | 11 ++++++++
 arch/sh/include/asm/cmpxchg-llsc.h | 58 +++++++++++++++++++++++---------------
 arch/sh/include/asm/cmpxchg.h      |  3 ++
 4 files changed, 72 insertions(+), 22 deletions(-)

diff --git a/arch/sh/include/asm/cmpxchg-grb.h b/arch/sh/include/asm/cmpxchg-grb.h
index f848dec..2ed557b 100644
--- a/arch/sh/include/asm/cmpxchg-grb.h
+++ b/arch/sh/include/asm/cmpxchg-grb.h
@@ -23,6 +23,28 @@ static inline unsigned long xchg_u32(volatile u32 *m, unsigned long val)
 	return retval;
 }
 
+static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
+{
+	unsigned long retval;
+
+	__asm__ __volatile__ (
+		"   .align  2             \n\t"
+		"   mova    1f,   r0      \n\t" /* r0 = end point */
+		"   mov    r15,   r1      \n\t" /* r1 = saved sp */
+		"   mov    #-6,   r15     \n\t" /* LOGIN */
+		"   mov.w  @%1,   %0      \n\t" /* load  old value */
+		"   extu.w  %0,   %0      \n\t" /* extend as unsigned */
+		"   mov.w   %2,   @%1     \n\t" /* store new value */
+		"1: mov     r1,   r15     \n\t" /* LOGOUT */
+		: "=&r" (retval),
+		  "+r"  (m),
+		  "+r"  (val)		/* inhibit r15 overloading */
+		:
+		: "memory" , "r0", "r1");
+
+	return retval;
+}
+
 static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
 {
 	unsigned long retval;
diff --git a/arch/sh/include/asm/cmpxchg-irq.h b/arch/sh/include/asm/cmpxchg-irq.h
index bd11f63..f888772 100644
--- a/arch/sh/include/asm/cmpxchg-irq.h
+++ b/arch/sh/include/asm/cmpxchg-irq.h
@@ -14,6 +14,17 @@ static inline unsigned long xchg_u32(volatile u32 *m, unsigned long val)
 	return retval;
 }
 
+static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
+{
+	unsigned long flags, retval;
+
+	local_irq_save(flags);
+	retval = *m;
+	*m = val;
+	local_irq_restore(flags);
+	return retval;
+}
+
 static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
 {
 	unsigned long flags, retval;
diff --git a/arch/sh/include/asm/cmpxchg-llsc.h b/arch/sh/include/asm/cmpxchg-llsc.h
index 4713666..e754794 100644
--- a/arch/sh/include/asm/cmpxchg-llsc.h
+++ b/arch/sh/include/asm/cmpxchg-llsc.h
@@ -1,6 +1,9 @@
 #ifndef __ASM_SH_CMPXCHG_LLSC_H
 #define __ASM_SH_CMPXCHG_LLSC_H
 
+#include <linux/bitops.h>
+#include <asm/byteorder.h>
+
 static inline unsigned long xchg_u32(volatile u32 *m, unsigned long val)
 {
 	unsigned long retval;
@@ -22,29 +25,8 @@ static inline unsigned long xchg_u32(volatile u32 *m, unsigned long val)
 	return retval;
 }
 
-static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
-{
-	unsigned long retval;
-	unsigned long tmp;
-
-	__asm__ __volatile__ (
-		"1:					\n\t"
-		"movli.l	@%2, %0	! xchg_u8	\n\t"
-		"mov		%0, %1			\n\t"
-		"mov		%3, %0			\n\t"
-		"movco.l	%0, @%2			\n\t"
-		"bf		1b			\n\t"
-		"synco					\n\t"
-		: "=&z"(tmp), "=&r" (retval)
-		: "r" (m), "r" (val & 0xff)
-		: "t", "memory"
-	);
-
-	return retval;
-}
-
 static inline unsigned long
-__cmpxchg_u32(volatile int *m, unsigned long old, unsigned long new)
+__cmpxchg_u32(volatile u32 *m, unsigned long old, unsigned long new)
 {
 	unsigned long retval;
 	unsigned long tmp;
@@ -68,4 +50,36 @@ __cmpxchg_u32(volatile int *m, unsigned long old, unsigned long new)
 	return retval;
 }
 
+static inline u32 __xchg_cmpxchg(volatile void *ptr, u32 x, int size)
+{
+	int off = (unsigned long)ptr % sizeof(u32);
+	volatile u32 *p = ptr - off;
+#ifdef __BIG_ENDIAN
+	int bitoff = (sizeof(u32) - 1 - off) * BITS_PER_BYTE;
+#else
+	int bitoff = off * BITS_PER_BYTE;
+#endif
+	u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;
+	u32 oldv, newv;
+	u32 ret;
+
+	do {
+		oldv = READ_ONCE(*p);
+		ret = (oldv & bitmask) >> bitoff;
+		newv = (oldv & ~bitmask) | (x << bitoff);
+	} while (__cmpxchg_u32(p, oldv, newv) != oldv);
+
+	return ret;
+}
+
+static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
+{
+	return __xchg_cmpxchg(m, val, sizeof *m);
+}
+
+static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
+{
+	return __xchg_cmpxchg(m, val, sizeof *m);
+}
+
 #endif /* __ASM_SH_CMPXCHG_LLSC_H */
diff --git a/arch/sh/include/asm/cmpxchg.h b/arch/sh/include/asm/cmpxchg.h
index 85c97b18..5225916 100644
--- a/arch/sh/include/asm/cmpxchg.h
+++ b/arch/sh/include/asm/cmpxchg.h
@@ -27,6 +27,9 @@ extern void __xchg_called_with_bad_pointer(void);
 	case 4:						\
 		__xchg__res = xchg_u32(__xchg_ptr, x);	\
 		break;					\
+	case 2:						\
+		__xchg__res = xchg_u16(__xchg_ptr, x);	\
+		break;					\
 	case 1:						\
 		__xchg__res = xchg_u8(__xchg_ptr, x);	\
 		break;					\
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 32/41] sh: move xchg_cmpxchg to a header by itself
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (30 preceding siblings ...)
  2016-01-10 14:20 ` [PATCH v3 31/41] sh: support 1 and 2 byte xchg Michael S. Tsirkin
@ 2016-01-10 14:20 ` Michael S. Tsirkin
  2016-01-10 14:21 ` [PATCH v3 33/41] virtio_ring: use virt_store_mb Michael S. Tsirkin
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Rich Felker

It looks like future sh variants will support a 4-byte cas, which will be
used to implement 1 and 2 byte xchg.

This is exactly what we do for llsc now, so move the portable part of the
code into a separate header to make it easy to reuse.

Suggested-by: Rich Felker <dalias@libc.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/sh/include/asm/cmpxchg-llsc.h | 35 +-------------------------
 arch/sh/include/asm/cmpxchg-xchg.h | 51 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 34 deletions(-)
 create mode 100644 arch/sh/include/asm/cmpxchg-xchg.h

diff --git a/arch/sh/include/asm/cmpxchg-llsc.h b/arch/sh/include/asm/cmpxchg-llsc.h
index e754794..fcfd322 100644
--- a/arch/sh/include/asm/cmpxchg-llsc.h
+++ b/arch/sh/include/asm/cmpxchg-llsc.h
@@ -1,9 +1,6 @@
 #ifndef __ASM_SH_CMPXCHG_LLSC_H
 #define __ASM_SH_CMPXCHG_LLSC_H
 
-#include <linux/bitops.h>
-#include <asm/byteorder.h>
-
 static inline unsigned long xchg_u32(volatile u32 *m, unsigned long val)
 {
 	unsigned long retval;
@@ -50,36 +47,6 @@ __cmpxchg_u32(volatile u32 *m, unsigned long old, unsigned long new)
 	return retval;
 }
 
-static inline u32 __xchg_cmpxchg(volatile void *ptr, u32 x, int size)
-{
-	int off = (unsigned long)ptr % sizeof(u32);
-	volatile u32 *p = ptr - off;
-#ifdef __BIG_ENDIAN
-	int bitoff = (sizeof(u32) - 1 - off) * BITS_PER_BYTE;
-#else
-	int bitoff = off * BITS_PER_BYTE;
-#endif
-	u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;
-	u32 oldv, newv;
-	u32 ret;
-
-	do {
-		oldv = READ_ONCE(*p);
-		ret = (oldv & bitmask) >> bitoff;
-		newv = (oldv & ~bitmask) | (x << bitoff);
-	} while (__cmpxchg_u32(p, oldv, newv) != oldv);
-
-	return ret;
-}
-
-static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
-{
-	return __xchg_cmpxchg(m, val, sizeof *m);
-}
-
-static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
-{
-	return __xchg_cmpxchg(m, val, sizeof *m);
-}
+#include <asm/cmpxchg-xchg.h>
 
 #endif /* __ASM_SH_CMPXCHG_LLSC_H */
diff --git a/arch/sh/include/asm/cmpxchg-xchg.h b/arch/sh/include/asm/cmpxchg-xchg.h
new file mode 100644
index 0000000..7219719
--- /dev/null
+++ b/arch/sh/include/asm/cmpxchg-xchg.h
@@ -0,0 +1,51 @@
+#ifndef __ASM_SH_CMPXCHG_XCHG_H
+#define __ASM_SH_CMPXCHG_XCHG_H
+
+/*
+ * Copyright (C) 2016 Red Hat, Inc.
+ * Author: Michael S. Tsirkin <mst@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See the
+ * file "COPYING" in the main directory of this archive for more details.
+ */
+#include <linux/bitops.h>
+#include <asm/byteorder.h>
+
+/*
+ * Portable implementations of 1 and 2 byte xchg using a 4 byte cmpxchg.
+ * Note: this header isn't self-contained: before including it, __cmpxchg_u32
+ * must be defined first.
+ */
+static inline u32 __xchg_cmpxchg(volatile void *ptr, u32 x, int size)
+{
+	int off = (unsigned long)ptr % sizeof(u32);
+	volatile u32 *p = ptr - off;
+#ifdef __BIG_ENDIAN
+	int bitoff = (sizeof(u32) - 1 - off) * BITS_PER_BYTE;
+#else
+	int bitoff = off * BITS_PER_BYTE;
+#endif
+	u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;
+	u32 oldv, newv;
+	u32 ret;
+
+	do {
+		oldv = READ_ONCE(*p);
+		ret = (oldv & bitmask) >> bitoff;
+		newv = (oldv & ~bitmask) | (x << bitoff);
+	} while (__cmpxchg_u32(p, oldv, newv) != oldv);
+
+	return ret;
+}
+
+static inline unsigned long xchg_u16(volatile u16 *m, unsigned long val)
+{
+	return __xchg_cmpxchg(m, val, sizeof *m);
+}
+
+static inline unsigned long xchg_u8(volatile u8 *m, unsigned long val)
+{
+	return __xchg_cmpxchg(m, val, sizeof *m);
+}
+
+#endif /* __ASM_SH_CMPXCHG_XCHG_H */
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 33/41] virtio_ring: use virt_store_mb
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (31 preceding siblings ...)
  2016-01-10 14:20 ` [PATCH v3 32/41] sh: move xchg_cmpxchg to a header by itself Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
  2016-01-10 14:21 ` [PATCH v3 34/41] checkpatch.pl: add missing memory barriers Michael S. Tsirkin
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel

We need a full barrier after writing out the event index; using
virt_store_mb there seems better than open-coding.  As usual, we need a
wrapper to account for strong barriers.

It's tempting to use this in vhost as well, but for that we'll
need a variant of smp_store_mb that works on __user pointers.
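
To illustrate why the store must be followed by a full barrier, here is
a rough user-space model of the "publish event index, then re-check the
used index" pattern.  The struct and function names are hypothetical,
and a C11 seq_cst fence stands in for the kernel barrier; this is a
sketch of the ordering requirement, not virtio code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ring fields involved. */
struct ring_model {
	_Atomic uint16_t used_event;	/* written by the driver side */
	_Atomic uint16_t used_idx;	/* written by the device side */
};

/*
 * Publish the event index, then check whether the other side has
 * already moved past last_used.  The full fence keeps the store from
 * being reordered after the following load; a write barrier alone
 * would not order a store against a later load.
 */
static bool publish_event_and_recheck(struct ring_model *r,
				      uint16_t last_used)
{
	atomic_store_explicit(&r->used_event, last_used,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_load_explicit(&r->used_idx,
				    memory_order_relaxed) != last_used;
}
```

A true return value means more entries arrived in the meantime, so the
caller should process them instead of waiting for an interrupt.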

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/virtio_ring.h  | 11 +++++++++++
 drivers/virtio/virtio_ring.c | 15 +++++++++------
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index f3fa55b..a156e2b 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -45,6 +45,17 @@ static inline void virtio_wmb(bool weak_barriers)
 		wmb();
 }
 
+static inline void virtio_store_mb(bool weak_barriers,
+				   __virtio16 *p, __virtio16 v)
+{
+	if (weak_barriers) {
+		virt_store_mb(*p, v);
+	} else {
+		WRITE_ONCE(*p, v);
+		mb();
+	}
+}
+
 struct virtio_device;
 struct virtqueue;
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ee663c4..e12e385 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -517,10 +517,10 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 	/* If we expect an interrupt for the next entry, tell host
 	 * by writing event index and flush out the write before
 	 * the read in the next get_buf call. */
-	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
-		vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx);
-		virtio_mb(vq->weak_barriers);
-	}
+	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
+		virtio_store_mb(vq->weak_barriers,
+				&vring_used_event(&vq->vring),
+				cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
 
 #ifdef DEBUG
 	vq->last_add_time_valid = false;
@@ -653,8 +653,11 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 	}
 	/* TODO: tune this threshold */
 	bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
-	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs);
-	virtio_mb(vq->weak_barriers);
+
+	virtio_store_mb(vq->weak_barriers,
+			&vring_used_event(&vq->vring),
+			cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
+
 	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
 		END_USE(vq);
 		return false;
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 34/41] checkpatch.pl: add missing memory barriers
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (32 preceding siblings ...)
  2016-01-10 14:21 ` [PATCH v3 33/41] virtio_ring: use virt_store_mb Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
  2016-01-10 14:21 ` [PATCH v3 35/41] checkpatch: check for __smp outside barrier.h Michael S. Tsirkin
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Andy Whitcroft

SMP-only barriers were missing in checkpatch.pl.

Refactor code slightly to make adding more variants easier.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 scripts/checkpatch.pl | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 2b3c228..97b8b62 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5116,7 +5116,25 @@ sub process {
 			}
 		}
 # check for memory barriers without a comment.
-		if ($line =~ /\b(mb|rmb|wmb|read_barrier_depends|smp_mb|smp_rmb|smp_wmb|smp_read_barrier_depends)\(/) {
+
+		my $barriers = qr{
+			mb|
+			rmb|
+			wmb|
+			read_barrier_depends
+		}x;
+		my $smp_barriers = qr{
+			store_release|
+			load_acquire|
+			store_mb|
+			($barriers)
+		}x;
+		my $all_barriers = qr{
+			$barriers|
+			smp_($smp_barriers)
+		}x;
+
+		if ($line =~ /\b($all_barriers)\s*\(/) {
 			if (!ctx_has_comment($first_line, $linenr)) {
 				WARN("MEMORY_BARRIER",
 				     "memory barrier without comment\n" . $herecurr);
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 35/41] checkpatch: check for __smp outside barrier.h
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (33 preceding siblings ...)
  2016-01-10 14:21 ` [PATCH v3 34/41] checkpatch.pl: add missing memory barriers Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
  2016-01-10 14:21 ` [PATCH v3 36/41] checkpatch: add virt barriers Michael S. Tsirkin
                   ` (6 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Andy Whitcroft

Introduction of __smp barriers cleans up a bunch of duplicate code, but
it gives people an additional handle onto a "new" set of barriers.
Unfortunately, just because they're prefixed with __ doesn't stop anyone
from using them (as happened with other arch stuff before).

Add a checkpatch test so it will trigger a warning.
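
As a rough illustration of what the test matches, the sketch below
checks a single source line for a call to one of the __smp_ barriers.
It is written in C with POSIX <regex.h> rather than Perl, the function
name is hypothetical, and it covers only the line match: the file-path
exemptions and the "#define" exemption in the actual test are left out.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if line contains a call such as "__smp_mb(" or
 * "__smp_store_release (".  POSIX ERE has no \b, so the leading
 * group requires start-of-line or a non-identifier character.
 */
static bool uses_underscore_smp_barrier(const char *line)
{
	static const char pattern[] =
		"(^|[^A-Za-z0-9_])__smp_"
		"(store_release|load_acquire|store_mb|"
		"mb|rmb|wmb|read_barrier_depends)"
		"[[:space:]]*\\(";
	regex_t re;
	bool hit;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	hit = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);

	return hit;
}
```

With this pattern "smp_mb();" is not flagged, while "__smp_mb();" is.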

Reported-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 scripts/checkpatch.pl | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 97b8b62..a96adcb 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5141,6 +5141,16 @@ sub process {
 			}
 		}
 
+		my $underscore_smp_barriers = qr{__smp_($smp_barriers)}x;
+
+		if ($realfile !~ m@^include/asm-generic/@ &&
+		    $realfile !~ m@/barrier\.h$@ &&
+		    $line =~ m/\b($underscore_smp_barriers)\s*\(/ &&
+		    $line !~ m/^.\s*\#\s*define\s+($underscore_smp_barriers)\s*\(/) {
+			WARN("MEMORY_BARRIER",
+			     "__smp memory barriers shouldn't be used outside barrier.h and asm-generic\n" . $herecurr);
+		}
+
 # check for waitqueue_active without a comment.
 		if ($line =~ /\bwaitqueue_active\s*\(/) {
 			if (!ctx_has_comment($first_line, $linenr)) {
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 36/41] checkpatch: add virt barriers
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (34 preceding siblings ...)
  2016-01-10 14:21 ` [PATCH v3 35/41] checkpatch: check for __smp outside barrier.h Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
  2016-01-10 14:21 ` [PATCH v3 37/41] xenbus: use virt_xxx barriers Michael S. Tsirkin
                   ` (5 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Andy Whitcroft

Add virt_ barriers to the list of barriers that are checked for
the presence of a comment.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 scripts/checkpatch.pl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index a96adcb..5ca272b 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -5131,7 +5131,8 @@ sub process {
 		}x;
 		my $all_barriers = qr{
 			$barriers|
-			smp_($smp_barriers)
+			smp_($smp_barriers)|
+			virt_($smp_barriers)
 		}x;
 
 		if ($line =~ /\b($all_barriers)\s*\(/) {
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 37/41] xenbus: use virt_xxx barriers
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (35 preceding siblings ...)
  2016-01-10 14:21 ` [PATCH v3 36/41] checkpatch: add virt barriers Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
  2016-01-10 14:21 ` [PATCH v3 38/41] xen/io: " Michael S. Tsirkin
                   ` (4 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, David Vrabel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky

drivers/xen/xenbus/xenbus_comms.c uses
full memory barriers to communicate with the other side.

For guests compiled with CONFIG_SMP, smp_wmb and smp_mb
would be sufficient, so mb() and wmb() here are only needed if
a non-SMP guest runs on an SMP host.

Switch to virt_xxx barriers which serve this exact purpose.
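
For context, the ordering that these barriers protect can be modelled in
user space roughly as follows.  This is a sketch under assumptions, not
the xenbus code: struct byte_ring, ring_write() and ring_read() are
hypothetical names, and C11 fences stand in for virt_mb(), virt_wmb()
and virt_rmb().

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16u	/* must be a power of two */

struct byte_ring {
	unsigned char buf[RING_SIZE];
	_Atomic uint32_t prod;	/* advanced by the writer only */
	_Atomic uint32_t cons;	/* advanced by the reader only */
};

static size_t ring_write(struct byte_ring *r, const unsigned char *data,
			 size_t len)
{
	uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
	uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	size_t space = RING_SIZE - (prod - cons);
	size_t i;

	if (len > space)
		len = space;
	/* Must write data after reading the consumer index. */
	atomic_thread_fence(memory_order_seq_cst);
	for (i = 0; i < len; i++)
		r->buf[(prod + (uint32_t)i) % RING_SIZE] = data[i];
	/* Other side must not see new producer until data is there. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->prod, prod + (uint32_t)len,
			      memory_order_relaxed);
	return len;
}

static size_t ring_read(struct byte_ring *r, unsigned char *data,
			size_t len)
{
	uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
	size_t avail = prod - cons;
	size_t i;

	if (len > avail)
		len = avail;
	/* Must read data after reading the producer index. */
	atomic_thread_fence(memory_order_acquire);
	for (i = 0; i < len; i++)
		data[i] = r->buf[(cons + (uint32_t)i) % RING_SIZE];
	/* Other side must not see free space until we've copied out. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&r->cons, cons + (uint32_t)len,
			      memory_order_relaxed);
	return len;
}
```

Both functions return the number of bytes actually transferred, which
can be less than len when the ring is full or empty.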

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/xenbus/xenbus_comms.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
index fdb0f33..ecdecce 100644
--- a/drivers/xen/xenbus/xenbus_comms.c
+++ b/drivers/xen/xenbus/xenbus_comms.c
@@ -123,14 +123,14 @@ int xb_write(const void *data, unsigned len)
 			avail = len;
 
 		/* Must write data /after/ reading the consumer index. */
-		mb();
+		virt_mb();
 
 		memcpy(dst, data, avail);
 		data += avail;
 		len -= avail;
 
 		/* Other side must not see new producer until data is there. */
-		wmb();
+		virt_wmb();
 		intf->req_prod += avail;
 
 		/* Implies mb(): other side will see the updated producer. */
@@ -180,14 +180,14 @@ int xb_read(void *data, unsigned len)
 			avail = len;
 
 		/* Must read data /after/ reading the producer index. */
-		rmb();
+		virt_rmb();
 
 		memcpy(data, src, avail);
 		data += avail;
 		len -= avail;
 
 		/* Other side must not see free space until we've copied out */
-		mb();
+		virt_mb();
 		intf->rsp_cons += avail;
 
 		pr_debug("Finished read of %i bytes (%i to go)\n", avail, len);
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 38/41] xen/io: use virt_xxx barriers
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (36 preceding siblings ...)
  2016-01-10 14:21 ` [PATCH v3 37/41] xenbus: use virt_xxx barriers Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
  2016-01-10 14:21 ` [PATCH v3 39/41] xen/events: " Michael S. Tsirkin
                   ` (3 subsequent siblings)
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, David Vrabel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky

include/xen/interface/io/ring.h uses
full memory barriers to communicate with the other side.

For guests compiled with CONFIG_SMP, smp_wmb and smp_mb
would be sufficient, so mb() and wmb() here are only needed if
a non-SMP guest runs on an SMP host.

Switch to virt_xxx barriers which serve this exact purpose.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
---
 include/xen/interface/io/ring.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h
index 7dc685b..21f4fbd 100644
--- a/include/xen/interface/io/ring.h
+++ b/include/xen/interface/io/ring.h
@@ -208,12 +208,12 @@ struct __name##_back_ring {						\
 
 
 #define RING_PUSH_REQUESTS(_r) do {					\
-    wmb(); /* back sees requests /before/ updated producer index */	\
+    virt_wmb(); /* back sees requests /before/ updated producer index */	\
     (_r)->sring->req_prod = (_r)->req_prod_pvt;				\
 } while (0)
 
 #define RING_PUSH_RESPONSES(_r) do {					\
-    wmb(); /* front sees responses /before/ updated producer index */	\
+    virt_wmb(); /* front sees responses /before/ updated producer index */	\
     (_r)->sring->rsp_prod = (_r)->rsp_prod_pvt;				\
 } while (0)
 
@@ -250,9 +250,9 @@ struct __name##_back_ring {						\
 #define RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(_r, _notify) do {		\
     RING_IDX __old = (_r)->sring->req_prod;				\
     RING_IDX __new = (_r)->req_prod_pvt;				\
-    wmb(); /* back sees requests /before/ updated producer index */	\
+    virt_wmb(); /* back sees requests /before/ updated producer index */	\
     (_r)->sring->req_prod = __new;					\
-    mb(); /* back sees new requests /before/ we check req_event */	\
+    virt_mb(); /* back sees new requests /before/ we check req_event */	\
     (_notify) = ((RING_IDX)(__new - (_r)->sring->req_event) <		\
 		 (RING_IDX)(__new - __old));				\
 } while (0)
@@ -260,9 +260,9 @@ struct __name##_back_ring {						\
 #define RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(_r, _notify) do {		\
     RING_IDX __old = (_r)->sring->rsp_prod;				\
     RING_IDX __new = (_r)->rsp_prod_pvt;				\
-    wmb(); /* front sees responses /before/ updated producer index */	\
+    virt_wmb(); /* front sees responses /before/ updated producer index */	\
     (_r)->sring->rsp_prod = __new;					\
-    mb(); /* front sees new responses /before/ we check rsp_event */	\
+    virt_mb(); /* front sees new responses /before/ we check rsp_event */	\
     (_notify) = ((RING_IDX)(__new - (_r)->sring->rsp_event) <		\
 		 (RING_IDX)(__new - __old));				\
 } while (0)
@@ -271,7 +271,7 @@ struct __name##_back_ring {						\
     (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r);			\
     if (_work_to_do) break;						\
     (_r)->sring->req_event = (_r)->req_cons + 1;			\
-    mb();								\
+    virt_mb();								\
     (_work_to_do) = RING_HAS_UNCONSUMED_REQUESTS(_r);			\
 } while (0)
 
@@ -279,7 +279,7 @@ struct __name##_back_ring {						\
     (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r);			\
     if (_work_to_do) break;						\
     (_r)->sring->rsp_event = (_r)->rsp_cons + 1;			\
-    mb();								\
+    virt_mb();								\
     (_work_to_do) = RING_HAS_UNCONSUMED_RESPONSES(_r);			\
 } while (0)
 
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 39/41] xen/events: use virt_xxx barriers
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (37 preceding siblings ...)
  2016-01-10 14:21 ` [PATCH v3 38/41] xen/io: " Michael S. Tsirkin
@ 2016-01-10 14:21 ` Michael S. Tsirkin
  2016-01-11 11:12   ` David Vrabel
  2016-01-10 14:22 ` [PATCH v3 40/41] s390: use generic memory barriers Michael S. Tsirkin
                   ` (2 subsequent siblings)
  41 siblings, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, David Vrabel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, Julien Grall, Ross Lagerwall,
	Stefano Stabellini, Wei Liu

drivers/xen/events/events_fifo.c uses rmb() to communicate with the
other side.

For guests compiled with CONFIG_SMP, smp_rmb would be sufficient, so
rmb() here is only needed if a non-SMP guest runs on an SMP host.

Switch to the virt_rmb barrier which serves this exact purpose.

Pull in asm/barrier.h here to make sure the file is self-contained.

Suggested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/xen/events/events_fifo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 96a1b8d..eff2b88 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -41,6 +41,7 @@
 #include <linux/percpu.h>
 #include <linux/cpu.h>
 
+#include <asm/barrier.h>
 #include <asm/sync_bitops.h>
 #include <asm/xen/hypercall.h>
 #include <asm/xen/hypervisor.h>
@@ -296,7 +297,7 @@ static void consume_one_event(unsigned cpu,
 	 * control block.
 	 */
 	if (head == 0) {
-		rmb(); /* Ensure word is up-to-date before reading head. */
+		virt_rmb(); /* Ensure word is up-to-date before reading head. */
 		head = control_block->head[priority];
 	}
 
-- 
MST

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH v3 40/41] s390: use generic memory barriers
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (38 preceding siblings ...)
  2016-01-10 14:21 ` [PATCH v3 39/41] xen/events: " Michael S. Tsirkin
@ 2016-01-10 14:22 ` Michael S. Tsirkin
  2016-01-10 14:22 ` [PATCH v3 41/41] s390: more efficient smp barriers Michael S. Tsirkin
  2016-01-12 12:50 ` [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Peter Zijlstra
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Martin Schwidefsky, Heiko Carstens,
	Ingo Molnar, Davidlohr Bueso, Christian Borntraeger

The s390 kernel is SMP to 99.99%, so we just didn't bother with a
non-smp variant of the memory barriers. If the generic header
is used, we get the non-smp version for free. It will save a
small amount of text space for CONFIG_SMP=n.

Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/s390/include/asm/barrier.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index fbd25b2..4d26fa4 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -29,9 +29,6 @@
 #define __smp_mb()			mb()
 #define __smp_rmb()			rmb()
 #define __smp_wmb()			wmb()
-#define smp_mb()			__smp_mb()
-#define smp_rmb()			__smp_rmb()
-#define smp_wmb()			__smp_wmb()
 
 #define __smp_store_release(p, v)					\
 do {									\
-- 
MST

* [PATCH v3 41/41] s390: more efficient smp barriers
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (39 preceding siblings ...)
  2016-01-10 14:22 ` [PATCH v3 40/41] s390: use generic memory barriers Michael S. Tsirkin
@ 2016-01-10 14:22 ` Michael S. Tsirkin
  2016-01-12 12:50 ` [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Peter Zijlstra
  41 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-10 14:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Martin Schwidefsky, Heiko Carstens,
	Ingo Molnar, Davidlohr Bueso, Christian Borntraeger

As per: lkml.kernel.org/r/20150921112252.3c2937e1@mschwide
atomics imply a barrier on s390, so s390 should change
smp_mb__before_atomic and smp_mb__after_atomic to barrier() instead of
smp_mb() and hence should not use the generic versions.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
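Reader's note, not part of the commit: a rough userspace sketch of the
override pattern, assuming the generic header only supplies a default when
the arch has not defined the macro. A C11 seq_cst read-modify-write stands
in for an s390 atomic that already implies a full barrier;
publish_and_count() is a hypothetical caller, not a kernel function.

```c
#include <stdatomic.h>

/* Userspace stand-ins; assumptions for illustration, not kernel code. */
#define barrier()	__asm__ __volatile__("" ::: "memory")
#define __smp_mb()	atomic_thread_fence(memory_order_seq_cst)

/* Arch override, as in this patch: the atomic itself orders memory. */
#define __smp_mb__before_atomic()	barrier()
#define __smp_mb__after_atomic()	barrier()

/* Generic fallback: only takes effect if the arch defined nothing. */
#ifndef __smp_mb__before_atomic
#define __smp_mb__before_atomic()	__smp_mb()
#endif
#ifndef __smp_mb__after_atomic
#define __smp_mb__after_atomic()	__smp_mb()
#endif

/* Hypothetical caller: publish a payload, then bump a counter. */
static int publish_and_count(atomic_int *counter, int *payload, int value)
{
	*payload = value;
	__smp_mb__before_atomic();		/* compiler barrier only */
	int old = atomic_fetch_add(counter, 1);	/* fully ordered RMW */
	__smp_mb__after_atomic();		/* compiler barrier only */
	return old + 1;
}
```

The point of the override is that the extra fence the generic default would
add around the atomic is redundant when the atomic is already a full
barrier.
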
 arch/s390/include/asm/barrier.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 4d26fa4..5c8db3c 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -45,6 +45,9 @@ do {									\
 	___p1;								\
 })
 
+#define __smp_mb__before_atomic()	barrier()
+#define __smp_mb__after_atomic()	barrier()
+
 #include <asm-generic/barrier.h>
 
 #endif /* __ASM_BARRIER_H */
-- 
MST

* Re: [PATCH v3 39/41] xen/events: use virt_xxx barriers
  2016-01-10 14:21 ` [PATCH v3 39/41] xen/events: " Michael S. Tsirkin
@ 2016-01-11 11:12   ` David Vrabel
  0 siblings, 0 replies; 153+ messages in thread
From: David Vrabel @ 2016-01-11 11:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Julien Grall, Ross Lagerwall, Stefano Stabellini, Wei Liu

On 10/01/16 14:21, Michael S. Tsirkin wrote:
> drivers/xen/events/events_fifo.c uses rmb() to communicate with the
> other side.
> 
> For guests compiled with CONFIG_SMP, smp_rmb would be sufficient, so
> rmb() here is only needed if a non-SMP guest runs on an SMP host.
> 
> Switch to the virt_rmb barrier which serves this exact purpose.
> 
> Pull in asm/barrier.h here to make sure the file is self-contained.
> 
> Suggested-by: David Vrabel <david.vrabel@citrix.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Acked-by: David Vrabel <david.vrabel@citrix.com>

David

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-10 14:18 ` [PATCH v3 11/41] mips: " Michael S. Tsirkin
@ 2016-01-12  1:14   ` Leonid Yegoshin
  2016-01-12  8:43     ` Michael S. Tsirkin
  2016-01-12  9:27     ` Peter Zijlstra
  0 siblings, 2 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-12  1:14 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Peter Zijlstra, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar

On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> smp_read_barrier_depends, smp_store_release and smp_load_acquire  match
> the asm-generic variants exactly. Drop the local definitions and pull in
> asm-generic/barrier.h instead.
>
This statement doesn't fit the MIPS barrier variations. Moreover, there is
a reason to make them even more specific, at least for
smp_store_release and smp_load_acquire; see

     http://patchwork.linux-mips.org/patch/10506/

- Leonid.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-12  1:14   ` [v3,11/41] " Leonid Yegoshin
@ 2016-01-12  8:43     ` Michael S. Tsirkin
  2016-01-12  9:51       ` Peter Zijlstra
  2016-01-12  9:27     ` Peter Zijlstra
  1 sibling, 1 reply; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-12  8:43 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar

On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> >smp_read_barrier_depends, smp_store_release and smp_load_acquire  match
> >the asm-generic variants exactly. Drop the local definitions and pull in
> >asm-generic/barrier.h instead.
> >
> This statement doesn't fit MIPS barriers variations. Moreover, there is a
> reason to extend that even more specific, at least for smp_store_release and
> smp_load_acquire, look into
> 
>     http://patchwork.linux-mips.org/patch/10506/
> 
> - Leonid.

Fine, but it matches what current code is doing.  Since that
MIPS_LIGHTWEIGHT_SYNC patch didn't go into linux-next yet, do
you see a problem reworking it on top of this patchset?

-- 
MST

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-12  1:14   ` [v3,11/41] " Leonid Yegoshin
  2016-01-12  8:43     ` Michael S. Tsirkin
@ 2016-01-12  9:27     ` Peter Zijlstra
  2016-01-12 10:25       ` Peter Zijlstra
  1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12  9:27 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Michael S. Tsirkin, linux-kernel, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	will.deacon, james.hogan

On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:

> This statement doesn't fit MIPS barriers variations. Moreover, there is a
> reason to extend that even more specific, at least for smp_store_release and
> smp_load_acquire, look into
> 
>     http://patchwork.linux-mips.org/patch/10506/

Dude, that's one horrible patch.

1) you do not make such things selectable; either the hardware needs
them or it doesn't. If it does you _must_ use them, however unlikely.

2) the changelog _completely_ fails to explain the sync 0x11 and sync
0x12 semantics nor does it provide a publicly accessible link to
documentation that does.

3) it really should have explained what you did with
smp_llsc_mb/smp_mb__before_llsc() in _detail_.

And I agree that ideally it should be split into parts.

Seriously, this is _NOT_ OK.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-12  8:43     ` Michael S. Tsirkin
@ 2016-01-12  9:51       ` Peter Zijlstra
  0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12  9:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Leonid Yegoshin, linux-kernel, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar

On Tue, Jan 12, 2016 at 10:43:36AM +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 11, 2016 at 05:14:14PM -0800, Leonid Yegoshin wrote:
> > On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
> > >On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
> > >smp_read_barrier_depends, smp_store_release and smp_load_acquire  match
> > >the asm-generic variants exactly. Drop the local definitions and pull in
> > >asm-generic/barrier.h instead.
> > >
> > This statement doesn't fit MIPS barriers variations. Moreover, there is a
> > reason to extend that even more specific, at least for smp_store_release and
> > smp_load_acquire, look into
> > 
> >     http://patchwork.linux-mips.org/patch/10506/
> > 
> > - Leonid.
> 
> Fine, but it matches what current code is doing.  Since that
> MIPS_LIGHTWEIGHT_SYNC patch didn't go into linux-next yet, do
> you see a problem reworking it on top of this patchset?

That patch is a complete doorstop atm. It needs a lot more work before
it can go anywhere. Don't worry about it.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-12  9:27     ` Peter Zijlstra
@ 2016-01-12 10:25       ` Peter Zijlstra
  2016-01-12 10:40         ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12 10:25 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Michael S. Tsirkin, linux-kernel, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	will.deacon, james.hogan

On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> 0x12 semantics nor does it provide a publicly accessible link to
> documentation that does.

Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/

> 3) it really should have explained what you did with
> smp_llsc_mb/smp_mb__before_llsc() in _detail_.

And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
are _NOT_ transitive and therefore cannot be used to implement the
smp_mb__{before,after} stuff.

That is, in MIPS speak, those SYNC types are Ordering Barriers, not
Completion Barriers. They need not be globally performed.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-12 10:25       ` Peter Zijlstra
@ 2016-01-12 10:40         ` Peter Zijlstra
  2016-01-12 11:41           ` Will Deacon
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12 10:40 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Michael S. Tsirkin, linux-kernel, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	will.deacon, james.hogan, Michael Ellerman, Paul McKenney

On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> > 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> > 0x12 semantics nor does it provide a publicly accessible link to
> > documentation that does.
> 
> Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/
> 
> > 3) it really should have explained what you did with
> > smp_llsc_mb/smp_mb__before_llsc() in _detail_.
> 
> And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> are _NOT_ transitive and therefore cannot be used to implement the
> smp_mb__{before,after} stuff.
> 
> That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> Completion Barriers. They need not be globally performed.

Which if true; and I know Will has some questions here; would also mean
that you 'cannot' use the ACQUIRE/RELEASE barriers for your locks as was
recently suggested by David Daney.

That is, currently all architectures -- with exception of PPC -- have
RCsc locks, but using these non-transitive things will get you RCpc
locks.

So yes, MIPS can go RCpc for its locks and share the burden of pain with
PPC, but that needs to be a very conscious decision.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-12 10:40         ` Peter Zijlstra
@ 2016-01-12 11:41           ` Will Deacon
  2016-01-12 20:45             ` Leonid Yegoshin
  0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-12 11:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Leonid Yegoshin, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On Tue, Jan 12, 2016 at 11:40:12AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 12, 2016 at 11:25:55AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 12, 2016 at 10:27:11AM +0100, Peter Zijlstra wrote:
> > > 2) the changelog _completely_ fails to explain the sync 0x11 and sync
> > > 0x12 semantics nor does it provide a publicly accessible link to
> > > documentation that does.
> > 
> > Ralf pointed me at: https://imgtec.com/mips/architectures/mips64/
> > 
> > > 3) it really should have explained what you did with
> > > smp_llsc_mb/smp_mb__before_llsc() in _detail_.
> > 
> > And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> > are _NOT_ transitive and therefore cannot be used to implement the
> > smp_mb__{before,after} stuff.
> > 
> > That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> > Completion Barriers. They need not be globally performed.
> 
> Which if true; and I know Will has some questions here; would also mean
> that you 'cannot' use the ACQUIRE/RELEASE barriers for your locks as was
> recently suggested by David Daney.

The issue I have with the SYNC description in the text above is that it
describes the single CPU (program order) and the dual-CPU (confusingly
named global order) cases, but then doesn't generalise any further. That
means we can't sensibly reason about transitivity properties when a third
agent is involved. For example, the WRC+sync+addr test:


P0:
Wx = 1

P1:
Rx == 1
SYNC
Wy = 1

P2:
Ry == 1
<address dep>
Rx == 0


I can't find anything to forbid that, given the text. The main problem
is having the SYNC on P1 affect the write by P0.
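
To make "forbidden" concrete, here is a small sketch that enumerates every
interleaving of the test above under plain sequential consistency and counts
how often the outcome (P1 reads x == 1, P2 reads y == 1, P2 reads x == 0)
shows up. It models only an idealised SC machine where SYNC is a no-op; it
says nothing about what the MIPS text actually guarantees. All names
(run_schedule, count_forbidden, wrc_forbidden_under_sc) are made up for
this illustration.

```c
/* Registers observed by the readers in the WRC test. */
struct outcome { int r1, r2, r3; };

/* Execute one schedule; sched[i] names the thread that takes step i. */
static struct outcome run_schedule(const int sched[5])
{
	int x = 0, y = 0;
	int pc1 = 0, pc2 = 0;		/* per-thread program counters */
	struct outcome o = { 0, 0, 0 };

	for (int i = 0; i < 5; i++) {
		switch (sched[i]) {
		case 0:			/* P0: Wx = 1 */
			x = 1;
			break;
		case 1:			/* P1: Rx, (SYNC), Wy = 1 */
			if (pc1++ == 0)
				o.r1 = x;
			else
				y = 1;
			break;
		case 2:			/* P2: Ry, <addr dep>, Rx */
			if (pc2++ == 0)
				o.r2 = y;
			else
				o.r3 = x;
			break;
		}
	}
	return o;
}

/* Enumerate all schedules that respect each thread's program order. */
static int count_forbidden(int sched[5], int depth, int left[3], int *total)
{
	if (depth == 5) {
		struct outcome o = run_schedule(sched);

		(*total)++;
		return o.r1 == 1 && o.r2 == 1 && o.r3 == 0;
	}

	int hits = 0;

	for (int t = 0; t < 3; t++) {
		if (left[t] == 0)
			continue;
		left[t]--;
		sched[depth] = t;
		hits += count_forbidden(sched, depth + 1, left, total);
		left[t]++;
	}
	return hits;
}

/* Returns how many schedules show the outcome; *total gets the count. */
static int wrc_forbidden_under_sc(int *total)
{
	int sched[5];
	int left[3] = { 1, 2, 2 };	/* steps remaining for P0, P1, P2 */

	*total = 0;
	return count_forbidden(sched, 0, left, total);
}
```

Under this model none of the 30 schedules produces the outcome. A barrier
with the transitivity discussed here would have to rule it out on real
hardware as well, which is the property I can't derive from the text.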

> That is, currently all architectures -- with exception of PPC -- have
> RCsc locks, but using these non-transitive things will get you RCpc
> locks.
> 
> So yes, MIPS can go RCpc for its locks and share the burden of pain with
> PPC, but that needs to be a very conscious decision.

I think it's much worse than RCpc, given my interpretation of the wording.

Will

* Re: [PATCH v3 00/41] arch: barrier cleanup + barriers for virt
  2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
                   ` (40 preceding siblings ...)
  2016-01-10 14:22 ` [PATCH v3 41/41] s390: more efficient smp barriers Michael S. Tsirkin
@ 2016-01-12 12:50 ` Peter Zijlstra
  41 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12 12:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel

On Sun, Jan 10, 2016 at 04:16:22PM +0200, Michael S. Tsirkin wrote:
> I parked this in vhost tree for now, though the inclusion of patch 1 from tip
> creates a merge conflict - but one that is trivial to resolve.
> 
> So I intend to just merge it all through my tree, including the
> duplicate patch, and assume conflict will be resolved.
> 
> I would really appreciate some feedback on arch bits (especially the x86 bits),
> and acks for merging this through the vhost tree.

Thanks for doing this, looks good to me.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

* Re: [PATCH v3 13/41] x86: reuse asm-generic/barrier.h
  2016-01-10 14:18 ` [PATCH v3 13/41] x86: " Michael S. Tsirkin
@ 2016-01-12 14:10   ` Thomas Gleixner
  0 siblings, 0 replies; 153+ messages in thread
From: Thomas Gleixner @ 2016-01-12 14:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski

On Sun, 10 Jan 2016, Michael S. Tsirkin wrote:

> As on most architectures, on x86 read_barrier_depends and
> smp_read_barrier_depends are empty.  Drop the local definitions and pull
> the generic ones from asm-generic/barrier.h instead: they are identical.
> 
> This is in preparation to refactoring this code area.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

* Re: [PATCH v3 27/41] x86: define __smp_xxx
  2016-01-10 14:20 ` [PATCH v3 27/41] x86: " Michael S. Tsirkin
@ 2016-01-12 14:11   ` Thomas Gleixner
  0 siblings, 0 replies; 153+ messages in thread
From: Thomas Gleixner @ 2016-01-12 14:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski

On Sun, 10 Jan 2016, Michael S. Tsirkin wrote:

> This defines __smp_xxx barriers for x86,
> for use by virtualization.
> 
> smp_xxx barriers are removed as they are
> defined correctly by asm-generic/barrier.h
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

* Re: [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()
  2016-01-10 14:16 ` [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release() Michael S. Tsirkin
@ 2016-01-12 16:28   ` Paul E. McKenney
  2016-01-12 18:40     ` Michael S. Tsirkin
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-12 16:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Davidlohr Bueso, Davidlohr Bueso,
	Andrew Morton, Benjamin Herrenschmidt, Heiko Carstens,
	Linus Torvalds, Tony Luck, Ingo Molnar, Fenghua Yu,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Christian Borntraeger

On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote:
> From: Davidlohr Bueso <dave@stgolabs.net>
> 
> With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
> it was made clear that the context of this call (and thus set_mb)
> is strictly for CPU ordering, as opposed to IO. As such all archs
> should use the smp variant of mb(), respecting the semantics and
> saving a mandatory barrier on UP.
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: <linux-arch@vger.kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: dave@stgolabs.net
> Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
> Signed-off-by: Ingo Molnar <mingo@kernel.org>

Aside from a need for s/lcoking/locking/ in the subject line:

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
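
For anyone reading along, a userspace sketch of the kind of caller this
primitive serves, assuming C11 atomics as a rough stand-in for
WRITE_ONCE() followed by smp_mb(). The macro name smp_store_mb_sketch and
the variables sleeper_state and work_pending are hypothetical, chosen only
for illustration.

```c
#include <stdatomic.h>

/* Rough analogue of smp_store_mb(): a store followed by a full fence. */
#define smp_store_mb_sketch(var, value)					\
do {									\
	atomic_store_explicit(&(var), (value), memory_order_relaxed);	\
	atomic_thread_fence(memory_order_seq_cst);			\
} while (0)

static atomic_int sleeper_state;	/* 1 once the waiter has announced itself */
static atomic_int work_pending;		/* set by a (hypothetical) waker */

/* Announce the intent to sleep, then re-check for work after the fence. */
static int waiter_should_sleep(void)
{
	smp_store_mb_sketch(sleeper_state, 1);
	return atomic_load_explicit(&work_pending, memory_order_relaxed) == 0;
}
```

The fence orders the state store against the later load between CPUs. No
I/O ordering is involved, which is why the SMP variant is enough here.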

> ---
>  arch/ia64/include/asm/barrier.h    | 2 +-
>  arch/powerpc/include/asm/barrier.h | 2 +-
>  arch/s390/include/asm/barrier.h    | 2 +-
>  include/asm-generic/barrier.h      | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
> index df896a1..209c4b8 100644
> --- a/arch/ia64/include/asm/barrier.h
> +++ b/arch/ia64/include/asm/barrier.h
> @@ -77,7 +77,7 @@ do {									\
>  	___p1;								\
>  })
> 
> -#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); mb(); } while (0)
> +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> 
>  /*
>   * The group barrier in front of the rsm & ssm are necessary to ensure
> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> index 0eca6ef..a7af5fb 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -34,7 +34,7 @@
>  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
>  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> 
> -#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); mb(); } while (0)
> +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> 
>  #ifdef __SUBARCH_HAS_LWSYNC
>  #    define SMPWMB      LWSYNC
> diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
> index d68e11e..7ffd0b1 100644
> --- a/arch/s390/include/asm/barrier.h
> +++ b/arch/s390/include/asm/barrier.h
> @@ -36,7 +36,7 @@
>  #define smp_mb__before_atomic()		smp_mb()
>  #define smp_mb__after_atomic()		smp_mb()
> 
> -#define smp_store_mb(var, value)		do { WRITE_ONCE(var, value); mb(); } while (0)
> +#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> 
>  #define smp_store_release(p, v)						\
>  do {									\
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index b42afad..0f45f93 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -93,7 +93,7 @@
>  #endif	/* CONFIG_SMP */
> 
>  #ifndef smp_store_mb
> -#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); mb(); } while (0)
> +#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); smp_mb(); } while (0)
>  #endif
> 
>  #ifndef smp_mb__before_atomic
> -- 
> MST
> 

* Re: [PATCH v3 05/41] powerpc: reuse asm-generic/barrier.h
  2016-01-10 14:17 ` [PATCH v3 05/41] powerpc: " Michael S. Tsirkin
@ 2016-01-12 16:31   ` Paul E. McKenney
  0 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-12 16:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Ingo Molnar, Davidlohr Bueso

On Sun, Jan 10, 2016 at 04:17:09PM +0200, Michael S. Tsirkin wrote:
> On powerpc read_barrier_depends, smp_read_barrier_depends,
> smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
> asm-generic variants exactly. Drop the local definitions and pull in
> asm-generic/barrier.h instead.
> 
> This is in preparation to refactoring this code area.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>

Looks sane to me.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> ---
>  arch/powerpc/include/asm/barrier.h | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> index a7af5fb..980ad0c 100644
> --- a/arch/powerpc/include/asm/barrier.h
> +++ b/arch/powerpc/include/asm/barrier.h
> @@ -34,8 +34,6 @@
>  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
>  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> 
> -#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> -
>  #ifdef __SUBARCH_HAS_LWSYNC
>  #    define SMPWMB      LWSYNC
>  #else
> @@ -60,9 +58,6 @@
>  #define smp_wmb()	barrier()
>  #endif /* CONFIG_SMP */
> 
> -#define read_barrier_depends()		do { } while (0)
> -#define smp_read_barrier_depends()	do { } while (0)
> -
>  /*
>   * This is a barrier which prevents following instructions from being
>   * started until the value of the argument x is known.  For example, if
> @@ -87,8 +82,8 @@ do {									\
>  	___p1;								\
>  })
> 
> -#define smp_mb__before_atomic()     smp_mb()
> -#define smp_mb__after_atomic()      smp_mb()
>  #define smp_mb__before_spinlock()   smp_mb()
> 
> +#include <asm-generic/barrier.h>
> +
>  #endif /* _ASM_POWERPC_BARRIER_H */
> -- 
> MST
> 

* Re: [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release()
  2016-01-12 16:28   ` Paul E. McKenney
@ 2016-01-12 18:40     ` Michael S. Tsirkin
  0 siblings, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-12 18:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, Peter Zijlstra, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Davidlohr Bueso, Davidlohr Bueso,
	Andrew Morton, Benjamin Herrenschmidt, Heiko Carstens,
	Linus Torvalds, Tony Luck, Ingo Molnar, Fenghua Yu,
	Paul Mackerras, Michael Ellerman, Martin Schwidefsky,
	Christian Borntraeger

On Tue, Jan 12, 2016 at 08:28:44AM -0800, Paul E. McKenney wrote:
> On Sun, Jan 10, 2016 at 04:16:32PM +0200, Michael S. Tsirkin wrote:
> > From: Davidlohr Bueso <dave@stgolabs.net>
> > 
> > With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
> > it was made clear that the context of this call (and thus set_mb)
> > is strictly for CPU ordering, as opposed to IO. As such all archs
> > should use the smp variant of mb(), respecting the semantics and
> > saving a mandatory barrier on UP.
> > 
> > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: <linux-arch@vger.kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Tony Luck <tony.luck@intel.com>
> > Cc: dave@stgolabs.net
> > Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> 
> Aside from a need for s/lcoking/locking/ in the subject line:
> 
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Thanks!
Though Ingo already put this in tip tree like this,
and I need a copy in my tree to avoid breaking bisect,
so I will probably keep it exactly the same to avoid confusion.

> > ---
> >  arch/ia64/include/asm/barrier.h    | 2 +-
> >  arch/powerpc/include/asm/barrier.h | 2 +-
> >  arch/s390/include/asm/barrier.h    | 2 +-
> >  include/asm-generic/barrier.h      | 2 +-
> >  4 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
> > index df896a1..209c4b8 100644
> > --- a/arch/ia64/include/asm/barrier.h
> > +++ b/arch/ia64/include/asm/barrier.h
> > @@ -77,7 +77,7 @@ do {									\
> >  	___p1;								\
> >  })
> > 
> > -#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> > 
> >  /*
> >   * The group barrier in front of the rsm & ssm are necessary to ensure
> > diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
> > index 0eca6ef..a7af5fb 100644
> > --- a/arch/powerpc/include/asm/barrier.h
> > +++ b/arch/powerpc/include/asm/barrier.h
> > @@ -34,7 +34,7 @@
> >  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
> >  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> > 
> > -#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value) do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> > 
> >  #ifdef __SUBARCH_HAS_LWSYNC
> >  #    define SMPWMB      LWSYNC
> > diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
> > index d68e11e..7ffd0b1 100644
> > --- a/arch/s390/include/asm/barrier.h
> > +++ b/arch/s390/include/asm/barrier.h
> > @@ -36,7 +36,7 @@
> >  #define smp_mb__before_atomic()		smp_mb()
> >  #define smp_mb__after_atomic()		smp_mb()
> > 
> > -#define smp_store_mb(var, value)		do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value)	do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> > 
> >  #define smp_store_release(p, v)						\
> >  do {									\
> > diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> > index b42afad..0f45f93 100644
> > --- a/include/asm-generic/barrier.h
> > +++ b/include/asm-generic/barrier.h
> > @@ -93,7 +93,7 @@
> >  #endif	/* CONFIG_SMP */
> > 
> >  #ifndef smp_store_mb
> > -#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); mb(); } while (0)
> > +#define smp_store_mb(var, value)  do { WRITE_ONCE(var, value); smp_mb(); } while (0)
> >  #endif
> > 
> >  #ifndef smp_mb__before_atomic
> > -- 
> > MST
> > 

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-12 11:41           ` Will Deacon
@ 2016-01-12 20:45             ` Leonid Yegoshin
  2016-01-12 21:40               ` Peter Zijlstra
  2016-01-13 10:45               ` Will Deacon
  0 siblings, 2 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-12 20:45 UTC (permalink / raw)
  To: Will Deacon, Peter Zijlstra
  Cc: Michael S. Tsirkin, linux-kernel, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman, Paul McKenney

(I'll try to answer multiple mails in one)

First of all, it seems like some generic notes should be given here:

1. The generic MIPS "SYNC" (aka "SYNC 0") instruction is very heavy on
some CPUs. On those CPUs it basically kills the pipelines in each CPU,
can do a special memory/IO bus transaction (similar to "fence") and
holds the system until all R/W is completed. It is like the Big Kernel
Lock but worse. So, the move to SMP_* kind of barriers is needed to
improve performance, especially on the newest CPUs with long pipelines.

2. The MIPS Arch document may be misleading because the words "ordering"
and "completion" mean something different there than in Linux; the SYNC
instruction description is written for HW engineers. I wrote about that
in a separate patch of the same patchset -
http://patchwork.linux-mips.org/patch/10505/ "MIPS: R6: Use lightweight
SYNC instruction in smp_* memory barriers":

> This instructions were specifically designed to work for smp_*() sort of
> memory barriers in MIPS R2/R3/R5 and R6.
>
> Unfortunately, it's description is very cryptic and is done in HW engineering
> style which prevents use of it by SW.

3. I bothered the MIPS Arch team for a long time until I completely
understood that MIPS SYNC_WMB, SYNC_MB, SYNC_RMB, SYNC_RELEASE and
SYNC_ACQUIRE do exactly what is required in
Documentation/memory-barriers.txt


In Peter Zijlstra's mail:

> 1) you do not make such things selectable; either the hardware needs
> them or it doesn't. If it does you _must_ use them, however unlikely.

It is selectable only for MIPS R2 but not for MIPS R6. The reason is
that most MIPS R2 CPUs have a short pipeline and that SYNC is just a
waste of CPU resources, especially taking into account that "lightweight
syncs" are converted to a heavy "SYNC 0" in many of those CPUs. However,
the latest MIPS/Imagination CPUs have a pipeline long enough to hit a
problem - the absence of SYNC at LL/SC inside atomics, barriers etc.

> And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> are _NOT_ transitive and therefore cannot be used to implement the
> smp_mb__{before,after} stuff.
>
> That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> Completion Barriers.

Please see above, point 2.

> That is, currently all architectures -- with exception of PPC -- have
> RCsc locks, but using these non-transitive things will get you RCpc
> locks.
>
> So yes, MIPS can go RCpc for its locks and share the burden of pain with
> PPC, but that needs to be a very conscious decision.

I don't understand that - I tried hard but I can't find any word like
"RCsc" or "RCpc" in the Documentation/ directory. Web search goes
nowhere, of course.


In Will Deacon's mail:

> The issue I have with the SYNC description in the text above is that it
> describes the single CPU (program order) and the dual-CPU (confusingly
> named global order) cases, but then doesn't generalise any further. That
> means we can't sensibly reason about transitivity properties when a third
> agent is involved. For example, the WRC+sync+addr test:
>
>
> P0:
> Wx = 1
>
> P1:
> Rx == 1
> SYNC
> Wy = 1
>
> P2:
> Ry == 1
> <address dep>
> Rx = 0
>
>
> I can't find anything to forbid that, given the text. The main problem
> is having the SYNC on P1 affect the write by P0.

As I understand that test, the visibility of P0: W[x] = 1 is identical
for P1 and P2 here. If P1 got X before the SYNC and wrote to Y after the
SYNC, then instruction source register dependency tracking in P2
prevents a speculative load of X before P2 obtains Y from the same place
as P0/P1 and calculates the address of X. If some load of X in P2
happens before the address dependency calculation, its result is
discarded.

Yes, you can't find that in the MIPS SYNC instruction description; it is
more likely in the CM (Coherence Manager) area. I just pointed this out
to our arch team member responsible for documents and he will think
about how to explain that.

- Leonid.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-12 20:45             ` Leonid Yegoshin
@ 2016-01-12 21:40               ` Peter Zijlstra
  2016-01-13  0:21                 ` Leonid Yegoshin
  2016-01-13 10:45               ` Will Deacon
  1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-12 21:40 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Will Deacon, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
> (I try to answer on multiple mails in one)
> 
> First of all, it seems like some generic notes should be given here:
> 
> 1. Generic MIPS "SYNC" (aka "SYNC 0") instruction is a very heavy in some
> CPUs. On that CPUs it basically kills pipelines in each CPU, can do a
> special memory/IO bus transaction (similar to "fence") and hold a system
> until all R/W is completed. It is like Big Kernel Lock but worse. So, the
> move to SMP_* kind of barriers is needed to improve performance, especially
> on newest CPUs with long pipelines.

The MIPS SYNC isn't any worse than the PPC SYNC, x86 MFENCE or arm DSB
SY, yes they're heavy, so what.

> 2. MIPS Arch document may be misleading because words "ordering" and
> "completion" means different from Linux, the SYNC instruction description is
> written for HW engineers. I wrote that in a separate patch of the same
> patchset - http://patchwork.linux-mips.org/patch/10505/ "MIPS: R6: Use
> lightweight SYNC instruction in smp_* memory barriers":

Did you actually say anything here?

> >This instructions were specifically designed to work for smp_*() sort of
> >memory barriers in MIPS R2/R3/R5 and R6.
> >
> >Unfortunately, it's description is very cryptic and is done in HW engineering
> >style which prevents use of it by SW.
> 
> 3. I bother MIPS Arch team long time until I completely understood that MIPS
> SYNC_WMB, SYNC_MB, SYNC_RMB, SYNC_RELEASE and SYNC_ACQUIRE do an exactly
> that is required in Documentation/memory-barriers.txt

Ha! and you think that document covers all the really fun details?

In particular we're very much all 'confused' about the various notions
of transitivity and what barriers imply how much of it.

> In Peter Zijlstra mail:
> 
> >1) you do not make such things selectable; either the hardware needs
> >them or it doesn't. If it does you _must_ use them, however unlikely.

> It is selectable only for MIPS R2 but not MIPS R6. The reason is - most of
> MIPS R2 CPUs have short pipeline and that SYNC is just waste of CPU
> resource, especially taking into account that "lightweight syncs" are
> converted to a heavy "SYNC 0" in many of that CPUs. However the latest
> MIPS/Imagination CPU have a pipeline long enough to hit a problem - absence
> of SYNC at LL/SC inside atomics, barriers etc.

What ?! Are you saying that because R2 has short pipelines it's unlikely
to hit the reordering issues and we can omit barriers?

> >And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> >are _NOT_ transitive and therefore cannot be used to implement the
> >smp_mb__{before,after} stuff.
> >
> >That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> >Completion Barriers.
> 
> Please see above, point 2.

That did not in fact enlighten things. Are they transitive/multi-copy
atomic or not?

(and here Will will go into great detail on the differences between the
two and make our collective brains explode :-)

> >That is, currently all architectures -- with exception of PPC -- have
> >RCsc locks, but using these non-transitive things will get you RCpc
> >locks.
> >
> >So yes, MIPS can go RCpc for its locks and share the burden of pain with
> >PPC, but that needs to be a very conscious decision.
> 
> I don't understand that - I tried hard but I can't find any word like
> "RCsc", "RCpc" in Documents/ directory. Web search goes nowhere, of course.

From: lkml.kernel.org/r/20150828153921.GF19282@twins.programming.kicks-ass.net

Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
not.

Currently PowerPC is the only arch that (can, and) does RCpc and gives a
weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed
to see the stores of the CPU which did the RELEASE in order.

As it stands, RCU is the only _known_ codebase where this matters, but
we did in fact write code for a fair number of years 'assuming' RELEASE
+ ACQUIRE was a full barrier, so who knows what else is out there.


RCsc - release consistency sequential consistency
RCpc - release consistency processor consistency

https://en.wikipedia.org/wiki/Processor_consistency

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-12 21:40               ` Peter Zijlstra
@ 2016-01-13  0:21                 ` Leonid Yegoshin
  0 siblings, 0 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-13  0:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On 01/12/2016 01:40 PM, Peter Zijlstra wrote:
>
>> It is selectable only for MIPS R2 but not MIPS R6. The reason is - most of
>> MIPS R2 CPUs have short pipeline and that SYNC is just waste of CPU
>> resource, especially taking into account that "lightweight syncs" are
>> converted to a heavy "SYNC 0" in many of that CPUs. However the latest
>> MIPS/Imagination CPU have a pipeline long enough to hit a problem - absence
>> of SYNC at LL/SC inside atomics, barriers etc.
> What ?! Are you saying that because R2 has short pipelines its unlikely
> to hit the reordering issues and we can omit barriers?

It was my guess to explain why barriers were not included originally.
You can check with Ralf, he knows more about the MIPS Linux code of that
time.

I have been bothered by this for more than 2 years and I am just trying
to solve that issue - in recent CPUs a load after an LL/SC
synchronization instruction loop can get ahead of the SC for sure; it
was tested.

>
>>> And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
>>> are _NOT_ transitive and therefore cannot be used to implement the
>>> smp_mb__{before,after} stuff.
>>>
>>> That is, in MIPS speak, those SYNC types are Ordering Barriers, not
>>> Completion Barriers.
>> Please see above, point 2.
> That did not in fact enlighten things. Are they transitive/multi-copy
> atomic or not?

Peter Zijlstra recently wrote: "In particular we're very much all 
'confused' about the various notions of transitivity". I am actually 
confused too and need some examples here.

>
> (and here Will will go into great detail on the differences between the
> two and make our collective brains explode :-)
>
>>> That is, currently all architectures -- with exception of PPC -- have
>>> RCsc locks, but using these non-transitive things will get you RCpc
>>> locks.
>>>
>>> So yes, MIPS can go RCpc for its locks and share the burden of pain with
>>> PPC, but that needs to be a very conscious decision.
>> I don't understand that - I tried hard but I can't find any word like
>> "RCsc", "RCpc" in Documents/ directory. Web search goes nowhere, of course.
> From: lkml.kernel.org/r/20150828153921.GF19282@twins.programming.kicks-ass.net
>
> Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
> ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
> not.

MIPS Arch starting from R2 requires that. If some CPU can't, it should 
execute a full "SYNC 0" instead, which is a full memory barrier.

>
> Currently PowerPC is the only arch that (can, and) does RCpc and gives a
> weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed
> to see the stores of the CPU which did the RELEASE in order.

Yes, it was a goal for SYNC_ACQUIRE and SYNC_RELEASE.

Caveats:

     - "Full memory barrier" on MIPS means - full barrier for any device 
in coherent domain. In MIPS Tech/Imagination Tech MIPS-based CPU it is 
"for any device connected to CM or IOCU + directly connected memory".

     - It does not apply to instruction fetch. However, I-Cache flushes
and SYNCI are consistent with that. There are also hazard barrier
instructions to clear the CPU pipeline to some extent - to help with
this limitation.

I don't think that these caveats prevent correct Acquire/Release
semantics.

- Leonid.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-12 20:45             ` Leonid Yegoshin
  2016-01-12 21:40               ` Peter Zijlstra
@ 2016-01-13 10:45               ` Will Deacon
  2016-01-13 19:02                 ` Leonid Yegoshin
  2016-01-13 22:26                 ` Leonid Yegoshin
  1 sibling, 2 replies; 153+ messages in thread
From: Will Deacon @ 2016-01-13 10:45 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
> >The issue I have with the SYNC description in the text above is that it
> >describes the single CPU (program order) and the dual-CPU (confusingly
> >named global order) cases, but then doesn't generalise any further. That
> >means we can't sensibly reason about transitivity properties when a third
> >agent is involved. For example, the WRC+sync+addr test:
> >
> >
> >P0:
> >Wx = 1
> >
> >P1:
> >Rx == 1
> >SYNC
> >Wy = 1
> >
> >P2:
> >Ry == 1
> ><address dep>
> >Rx = 0
> >
> >
> >I can't find anything to forbid that, given the text. The main problem
> >is having the SYNC on P1 affect the write by P0.
> 
> As I understand that test, the visibility of P0: W[x] = 1 is identical to P1
> and P2 here. If P1 got X before SYNC and write to Y after SYNC then
> instruction source register dependency tracking in P2 prevents a speculative
> load of X before P2 obtains Y from the same place as P0/P1 and calculate
> address of X. If some load of X in P2 happens before address dependency
> calculation it's result is discarded.

I don't think the address dependency is enough on its own. By that
reasoning, the following variant (WRC+addr+addr) would work too:


P0:
Wx = 1

P1:
Rx == 1
<address dep>
Wy = 1

P2:
Ry == 1
<address dep>
Rx = 0


So are you saying that this is also forbidden?
Imagine that P0 and P1 are two threads that share a store buffer. What
then?
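
For anyone who wants to poke at the WRC shape from userspace, it can be
sketched with C11 atomics and pthreads. This is only a rough sketch under
assumptions, not kernel code: C11 has no portable way to express a bare
address dependency, so release/acquire is substituted on P1 and P2, which
is stronger than what the test above asks about. The names
wrc_forbidden_count(), spawn() and r1..r3 are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2, r3;	/* per-thread observations, read after pthread_join() */

static void *p0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);	/* Wx = 1 */
	return NULL;
}

static void *p1(void *arg)
{
	(void)arg;
	r1 = atomic_load_explicit(&x, memory_order_relaxed);	/* Rx */
	/* release store stands in for "<address dep>; Wy = 1" (assumption) */
	atomic_store_explicit(&y, 1, memory_order_release);
	return NULL;
}

static void *p2(void *arg)
{
	(void)arg;
	/* acquire load stands in for "Ry; <address dep>" (assumption) */
	r2 = atomic_load_explicit(&y, memory_order_acquire);
	r3 = atomic_load_explicit(&x, memory_order_relaxed);	/* Rx */
	return NULL;
}

static void spawn(pthread_t *t, void *(*fn)(void *))
{
	int rc = pthread_create(t, NULL, fn, NULL);

	assert(rc == 0);
	(void)rc;
}

/* Run the test repeatedly; count how often P1 saw x == 1, P2 saw y == 1
 * and P2 nevertheless saw x == 0. */
static int wrc_forbidden_count(int iterations)
{
	int forbidden = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t t[3];

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		spawn(&t[0], p0);
		spawn(&t[1], p1);
		spawn(&t[2], p2);
		for (int j = 0; j < 3; j++)
			pthread_join(t[j], NULL);
		if (r1 == 1 && r2 == 1 && r3 == 0)
			forbidden++;
	}
	return forbidden;
}
```

With release/acquire the C11 model forbids the counted outcome, so the
function should return 0. Whether the hardware forbids it with plain
dependencies only is exactly the open question here.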

> Yes, you can't find that in MIPS SYNC instruction description, it is more
> likely in CM (Coherence Manager) area. I just pointed our arch team member
> responsible for documents and he will think how to explain that.

I tried grepping the linked documents for "coherence manager" but couldn't
find anything. Is the description you refer to available anywhere?

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-13 10:45               ` Will Deacon
@ 2016-01-13 19:02                 ` Leonid Yegoshin
  2016-01-13 20:48                   ` Peter Zijlstra
  2016-01-13 22:26                 ` Leonid Yegoshin
  1 sibling, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-13 19:02 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On 01/13/2016 02:45 AM, Will Deacon wrote:
> On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
>>
> I don't think the address dependency is enough on its own. By that
> reasoning, the following variant (WRC+addr+addr) would work too:
>
>
> P0:
> Wx = 1
>
> P1:
> Rx == 1
> <address dep>
> Wy = 1
>
> P2:
> Ry == 1
> <address dep>
> Rx = 0
>
>
> So are you saying that this is also forbidden?
> Imagine that P0 and P1 are two threads that share a store buffer. What
> then?
>

I asked the HW team about it, but I have a question - does it have any
relationship with replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB
etc)? You either use a barrier or you do not, and I am just voicing an
intention to use a more efficient instruction instead of a big hammer
(the SYNC instruction). If you don't use any barrier here then it is a
different issue.

Maybe it makes sense to return to the original issue?

- Leonid

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-13 19:02                 ` Leonid Yegoshin
@ 2016-01-13 20:48                   ` Peter Zijlstra
  2016-01-13 20:58                     ` Leonid Yegoshin
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-13 20:48 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Will Deacon, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:

> I ask HW team about it but I have a question - has it any relationship with
> replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)?

Of course. If you cannot explain the semantics of the primitives you
introduce, how can we judge the patch.

This barrier business is hard enough as it is, but magic unexplained
hardware makes it impossible.

Rest assured, you (MIPS) aren't the first (nor likely the last) to go
through all this. We've had these discussions (and to a certain extent
are still having them) for x86, PPC, Alpha, ARM, etc.

And every time new barrier instructions get introduced we had better
have a full and comprehensive explanation to go along with them.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-13 20:48                   ` Peter Zijlstra
@ 2016-01-13 20:58                     ` Leonid Yegoshin
  2016-01-14 12:04                       ` Will Deacon
  0 siblings, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-13 20:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On 01/13/2016 12:48 PM, Peter Zijlstra wrote:
> On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
>
>> I ask HW team about it but I have a question - has it any relationship with
>> replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)?
> Of course. If you cannot explain the semantics of the primitives you
> introduce, how can we judge the patch.
>
>
You missed the point - it is a question about the replacement of SYNC
with lightweight primitives. It is NOT a question about multithread
system behavior without any SYNC. The answer to Will's latest question
lies in a different area.

- Leonid.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-13 10:45               ` Will Deacon
  2016-01-13 19:02                 ` Leonid Yegoshin
@ 2016-01-13 22:26                 ` Leonid Yegoshin
  2016-01-14  9:24                   ` Michael S. Tsirkin
  2016-01-14 12:14                   ` Will Deacon
  1 sibling, 2 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-13 22:26 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On 01/13/2016 02:45 AM, Will Deacon wrote:
>>
> I don't think the address dependency is enough on its own. By that
> reasoning, the following variant (WRC+addr+addr) would work too:
>
>
> P0:
> Wx = 1
>
> P1:
> Rx == 1
> <address dep>
> Wy = 1
>
> P2:
> Ry == 1
> <address dep>
> Rx = 0
>
>
> So are you saying that this is also forbidden?
> Imagine that P0 and P1 are two threads that share a store buffer. What
> then?

OK, I collected the answers and here they are:

     In MIPS R6 this test passes OK, I mean - P2: Rx = 1 if Ry is read
as 1. By design.

     However, it is unclear what happens in MIPS R2 1004K.

     Moreover, there are voices against guaranteeing that this will hold
in the future, and those voices point me to the
Documentation/memory-barriers.txt section "DATA DEPENDENCY BARRIERS"
examples, which require SYNC_RMB between loading an address/index and
using it to load data based on that address or index for shared data
(look at the CPU2 pseudo-code):
> To deal with this, a data dependency barrier or better must be inserted
> between the address load and the data load:
>
>         CPU 1                 CPU 2
>         ===============       ===============
>         { A == 1, B == 2, C = 3, P == &A, Q == &C }
>         B = 4;
>         <write barrier>
>         WRITE_ONCE(P, &B);
>                               Q = READ_ONCE(P);
>                               <data dependency barrier> <----------- SYNC_RMB is here
>                               D = *Q;
...
> Another example of where data dependency barriers might be required is where a
> number is read from memory and then used to calculate the index for an array
> access:
>
>         CPU 1                 CPU 2
>         ===============       ===============
>         { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
>         M[1] = 4;
>         <write barrier>
>         WRITE_ONCE(P, 1);
>                               Q = READ_ONCE(P);
>                               <data dependency barrier> <------------ SYNC_RMB is here
>                               D = M[Q];

Those voices say that there is a legitimate reason to relax the HW here
for performance if SYNC_RMB is needed anyway to work with this sequence
of shared data.
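
The first quoted example (publishing &B through P) can be sketched in
userspace C11 as follows. This is a sketch under assumptions, not what
the kernel does: the release store stands in for "<write barrier>;
WRITE_ONCE(P, &B)", and an acquire load stands in for "READ_ONCE(P);
<data dependency barrier>", which is stronger than a dependency barrier.
The function names are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int A = 1, B = 2;
static _Atomic(int *) P = &A;

static void *cpu1(void *arg)
{
	(void)arg;
	B = 4;
	/* <write barrier> + WRITE_ONCE(P, &B), folded into a release store */
	atomic_store_explicit(&P, &B, memory_order_release);
	return NULL;
}

static void *cpu2(void *arg)
{
	int *d = arg;
	/* READ_ONCE(P) + <data dependency barrier>, modelled as acquire */
	int *q = atomic_load_explicit(&P, memory_order_acquire);

	*d = *q;	/* D = *Q */
	return NULL;
}

/* Run both "CPUs" once and return the value CPU 2 loaded into D. */
static int publish_once(void)
{
	pthread_t t1, t2;
	int d = 0, rc;

	B = 2;			/* restore the initial state */
	atomic_store(&P, &A);
	rc = pthread_create(&t1, NULL, cpu1, NULL);
	assert(rc == 0);
	rc = pthread_create(&t2, NULL, cpu2, &d);
	assert(rc == 0);
	(void)rc;
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return d;
}
```

CPU 2 should see either the old pointer (D == 1) or the new pointer
together with the new value (D == 4); seeing &B with the stale B == 2 is
the outcome the barrier pair is there to exclude.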


And all that is off-topic here, in my mind. I just want to be sure that
this patchset still allows the use of specific lightweight SYNCs on MIPS
instead of the big, heavy, generalized "SYNC 0" in any case.

- Leonid.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-13 22:26                 ` Leonid Yegoshin
@ 2016-01-14  9:24                   ` Michael S. Tsirkin
  2016-01-14 12:14                   ` Will Deacon
  1 sibling, 0 replies; 153+ messages in thread
From: Michael S. Tsirkin @ 2016-01-14  9:24 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Will Deacon, Peter Zijlstra, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
> And all that is out-of-topic here in my mind. I just want to be sure that
> this patchset still provides a use of a specific lightweight SYNCs on MIPS
> vs bold and heavy generalized "SYNC 0" in any case.
> 
> - Leonid.

Of course it does. All this patchset does is rename smp_mb/rmb/wmb
to __smp_mb()/__smp_rmb()/__smp_wmb()
and then asm-generic does #define smp_mb __smp_mb
or #define smp_mb barrier depending on CONFIG_SMP.

Why is that needed? So we can implement
[PATCH v3 28/41] asm-generic: implement virt_xxx memory barriers
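
Roughly, the layering looks like the sketch below. It is a userspace
imitation under assumptions, not the patch itself: hw_barriers and the
helper functions are made up for the example, __sync_synchronize() (a
GCC/Clang builtin) stands in for whatever instruction the arch uses for
__smp_mb(), and virt_mb() is assumed to map straight to __smp_mb().

```c
#include <assert.h>

static int hw_barriers;	/* hypothetical: counts "real" barriers issued */

/* Stand-in for the arch-provided SMP barrier (a SYNC variant on MIPS). */
#define __smp_mb()	do { hw_barriers++; __sync_synchronize(); } while (0)

/* Compiler-only barrier: no instruction is emitted. */
#define barrier()	__asm__ __volatile__("" : : : "memory")

/* What asm-generic does: pick the flavour depending on CONFIG_SMP. */
#ifdef CONFIG_SMP
#define smp_mb()	__smp_mb()
#else
#define smp_mb()	barrier()
#endif

/* A guest can be built !SMP while the host is SMP, so the virt_
 * flavour always uses the SMP-strength barrier. */
#define virt_mb()	__smp_mb()

static int hw_barriers_for_smp_mb(void)
{
	int before = hw_barriers;

	smp_mb();
	return hw_barriers - before;
}

static int hw_barriers_for_virt_mb(void)
{
	int before = hw_barriers;

	virt_mb();
	return hw_barriers - before;
}

static void check_layering(void)
{
#ifndef CONFIG_SMP
	assert(hw_barriers_for_smp_mb() == 0);	/* compiles to barrier() */
#endif
	assert(hw_barriers_for_virt_mb() == 1);	/* always a real barrier */
}
```

Built without -DCONFIG_SMP, smp_mb() costs nothing at runtime while
virt_mb() still issues the barrier, which is the whole point of keeping
the __smp_xxx names around.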

-- 
MST

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-13 20:58                     ` Leonid Yegoshin
@ 2016-01-14 12:04                       ` Will Deacon
  2016-01-14 16:16                         ` Paul E. McKenney
  2016-01-14 20:12                         ` Leonid Yegoshin
  0 siblings, 2 replies; 153+ messages in thread
From: Will Deacon @ 2016-01-14 12:04 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On Wed, Jan 13, 2016 at 12:58:22PM -0800, Leonid Yegoshin wrote:
> On 01/13/2016 12:48 PM, Peter Zijlstra wrote:
> >On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
> >
> >>I ask HW team about it but I have a question - has it any relationship with
> >>replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)?
> >Of course. If you cannot explain the semantics of the primitives you
> >introduce, how can we judge the patch.
> >
> >
> You missed a point - it is a question about replacement of SYNC with
> lightweight primitives. It is NOT a question about multithread system
> behavior without any SYNC. The answer on a latest Will's question lies in
> different area.

The reason we (Peter and I) care about this isn't because we enjoy being
obstructive. It's because there is a whole load of core (i.e. portable)
kernel code that is written to the *kernel* memory model. For example,
the scheduler, RCU, mutex implementations, perf, drivers, you name it.

Consequently, it's important that the architecture back-ends implement
these portable primitives (e.g. smp_mb()) in a way that satisfies the
kernel memory model so that core code doesn't need to worry about the
underlying architecture for synchronisation purposes. You could turn
around and say "but if MIPS gets it wrong, then that's MIPS's problem",
but actually not having a general understanding of the ordering guarantees
provided by each architecture makes it very difficult for us to extend
the kernel memory model in such a way that it can be implemented
efficiently across the board *and* relied upon by core code.

The virtio patch at the start of the thread doesn't particularly concern
me. It's the other patches you linked to that implement acquire/release
that have me worried.

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-13 22:26                 ` Leonid Yegoshin
  2016-01-14  9:24                   ` Michael S. Tsirkin
@ 2016-01-14 12:14                   ` Will Deacon
  2016-01-14 19:28                     ` Leonid Yegoshin
  1 sibling, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-14 12:14 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
> On 01/13/2016 02:45 AM, Will Deacon wrote:
> >>
> >I don't think the address dependency is enough on its own. By that
> >reasoning, the following variant (WRC+addr+addr) would work too:
> >
> >
> >P0:
> >Wx = 1
> >
> >P1:
> >Rx == 1
> ><address dep>
> >Wy = 1
> >
> >P2:
> >Ry == 1
> ><address dep>
> >Rx = 0
> >
> >
> >So are you saying that this is also forbidden?
> >Imagine that P0 and P1 are two threads that share a store buffer. What
> >then?
> 
> OK, I collected answers and it is:
> 
>     In MIPS R6 this test passes OK, I mean - P2: Rx = 1 if Ry is read as 1.
> By design.
> 
>     However, it is unclear what happens in MIPS R2 1004K.

How can it be unclear? If, for example, the outcome is permitted on that
CPU, then your original reasoning for the WRC+sync+addr doesn't apply
there and SYNC is not transitive. That's what I'm trying to get to the
bottom of.
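
For anyone who wants to poke at the WRC+addr+addr shape quoted above, a toy enumeration in plain C may help. It is only a sketch under stated assumptions: the event names, the `mca` flag and `wrc_outcome_reachable()` are invented for this illustration, and the "propagation" events are an abstract stand-in for something like a shared store buffer, not a description of any MIPS core. With multi-copy-atomic stores the outcome r1 == 1, r2 == 1, r3 == 0 cannot be reached; once P0's store is allowed to become visible to P1 before it is visible to P2, it can.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event names for the WRC shape; not taken from any kernel API. */
enum ev {
	W_X,        /* P0: x = 1 */
	PROP_X_P1,  /* x = 1 becomes visible to P1 */
	PROP_X_P2,  /* x = 1 becomes visible to P2 */
	R_X_P1,     /* P1: r1 = x */
	W_Y,        /* P1: y = 1, dependent on r1 */
	PROP_Y_P2,  /* y = 1 becomes visible to P2 */
	R_Y_P2,     /* P2: r2 = y */
	R_X_P2,     /* P2: r3 = x, dependent on r2 */
	NEV
};

/* Program order, dependencies, and "a store propagates only after it is issued". */
static bool order_ok(const int pos[NEV])
{
	return pos[W_X] < pos[PROP_X_P1] && pos[W_X] < pos[PROP_X_P2] &&
	       pos[R_X_P1] < pos[W_Y] && pos[W_Y] < pos[PROP_Y_P2] &&
	       pos[R_Y_P2] < pos[R_X_P2];
}

/* True if this ordering yields r1 == 1, r2 == 1, r3 == 0. */
static bool outcome_seen(const int pos[NEV], bool mca)
{
	/* With multi-copy atomicity a store is visible everywhere at once. */
	int x_at_p1 = mca ? pos[W_X] : pos[PROP_X_P1];
	int x_at_p2 = mca ? pos[W_X] : pos[PROP_X_P2];
	int y_at_p2 = mca ? pos[W_Y] : pos[PROP_Y_P2];
	bool r1 = x_at_p1 < pos[R_X_P1];
	bool r2 = y_at_p2 < pos[R_Y_P2];
	bool r3 = x_at_p2 < pos[R_X_P2];

	return r1 && r2 && !r3;
}

/* Try every assignment of events to time slots (8! orderings). */
static bool search(int pos[NEV], bool used[NEV], int ev, bool mca)
{
	if (ev == NEV)
		return order_ok(pos) && outcome_seen(pos, mca);
	for (int slot = 0; slot < NEV; slot++) {
		if (used[slot])
			continue;
		used[slot] = true;
		pos[ev] = slot;
		if (search(pos, used, ev + 1, mca))
			return true;
		used[slot] = false;
	}
	return false;
}

/* mca == true: multi-copy-atomic stores; false: per-CPU propagation. */
bool wrc_outcome_reachable(bool mca)
{
	int pos[NEV] = { 0 };
	bool used[NEV] = { false };

	return search(pos, used, 0, mca);
}
```

In this model wrc_outcome_reachable(true) reports the outcome as unreachable and wrc_outcome_reachable(false) reports it as reachable. Which of the two a given CPU resembles is exactly the open question.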

Does the MIPS kernel target a particular CPU at compile time?

>     Moreover, there are voices against guarantee that it will be in future
> and that voices point me to Documentation/memory-barriers.txt section "DATA
> DEPENDENCY BARRIERS" examples which require SYNC_RMB between loading
> address/index and using that for loading data based on that address or index
> for shared data (look on CPU2 pseudo-code):
> >To deal with this, a data dependency barrier or better must be inserted
> >between the address load and the data load:
> >
> >        CPU 1                 CPU 2
> >        ===============       ===============
> >        { A == 1, B == 2, C = 3, P == &A, Q == &C }
> >        B = 4;
> >        <write barrier>
> >        WRITE_ONCE(P, &B);
> >                              Q = READ_ONCE(P);
> >                              <data dependency barrier> <----------- SYNC_RMB is here
> >                              D = *Q;
> ...
> >Another example of where data dependency barriers might be required is
> >where a number is read from memory and then used to calculate the index
> >for an array access:
> >
> >        CPU 1                 CPU 2
> >        ===============       ===============
> >        { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
> >        M[1] = 4;
> >        <write barrier>
> >        WRITE_ONCE(P, 1);
> >                              Q = READ_ONCE(P);
> >                              <data dependency barrier> <------------ SYNC_RMB is here
> >                              D = M[Q];
> 
> That voices say that there is a legitimate reason to relax HW here for
> performance if SYNC_RMB is needed anyway to work with this sequence of
> shared data.

Are you saying that MIPS needs to implement [smp_]read_barrier_depends?

> And all that is out-of-topic here in my mind. I just want to be sure that
> this patchset still provides a use of a specific lightweight SYNCs on MIPS
> vs bold and heavy generalized "SYNC 0" in any case.

We may be hijacking the thread slightly, but there are much bigger
issues at play here if you want to start using lightweight barriers to
implement relaxed kernel primitives such as smp_load_acquire and
smp_store_release.

Will

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 12:04                       ` Will Deacon
@ 2016-01-14 16:16                         ` Paul E. McKenney
  2016-01-14 19:42                           ` Leonid Yegoshin
  2016-01-14 20:12                         ` Leonid Yegoshin
  1 sibling, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 16:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 12:04:45PM +0000, Will Deacon wrote:
> On Wed, Jan 13, 2016 at 12:58:22PM -0800, Leonid Yegoshin wrote:
> > On 01/13/2016 12:48 PM, Peter Zijlstra wrote:
> > >On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
> > >
> > >>I ask HW team about it but I have a question - has it any relationship with
> > >>replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)?
> > >Of course. If you cannot explain the semantics of the primitives you
> > >introduce, how can we judge the patch.
> > >
> > >
> > You missed a point - it is a question about replacement of SYNC with
> > lightweight primitives. It is NOT a question about multithread system
> > behavior without any SYNC. The answer on a latest Will's question lies in
> > different area.
> 
> The reason we (Peter and I) care about this isn't because we enjoy being
> obstructive. It's because there is a whole load of core (i.e. portable)
> kernel code that is written to the *kernel* memory model. For example,
> the scheduler, RCU, mutex implementations, perf, drivers, you name it.
> 
> Consequently, it's important that the architecture back-ends implement
> these portable primitives (e.g. smp_mb()) in a way that satisfies the
> kernel memory model so that core code doesn't need to worry about the
> underlying architecture for synchronisation purposes. You could turn
> around and say "but if MIPS gets it wrong, then that's MIPS's problem",
> but actually not having a general understanding of the ordering guarantees
> provided by each architecture makes it very difficult for us to extend
> the kernel memory model in such a way that it can be implemented
> efficiently across the board *and* relied upon by core code.

What Will said!

Yes, you can cut corners within MIPS architecture-specific code,
but primitives that are used in the core kernel really do need to
work as expected.

							Thanx, Paul

> The virtio patch at the start of the thread doesn't particularly concern
> me. It's the other patches you linked to that implement acquire/release
> that have me worried.
> 
> Will
> 

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 12:14                   ` Will Deacon
@ 2016-01-14 19:28                     ` Leonid Yegoshin
  2016-01-14 20:34                       ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 19:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On 01/14/2016 04:14 AM, Will Deacon wrote:
> On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
>
>>      Moreover, there are voices against guarantee that it will be in future
>> and that voices point me to Documentation/memory-barriers.txt section "DATA
>> DEPENDENCY BARRIERS" examples which require SYNC_RMB between loading
>> address/index and using that for loading data based on that address or index
>> for shared data (look on CPU2 pseudo-code):
>>> To deal with this, a data dependency barrier or better must be inserted
>>> between the address load and the data load:
>>>
>>>         CPU 1                 CPU 2
>>>         ===============       ===============
>>>         { A == 1, B == 2, C = 3, P == &A, Q == &C }
>>>         B = 4;
>>>         <write barrier>
>>>         WRITE_ONCE(P, &B);
>>>                               Q = READ_ONCE(P);
>>>                               <data dependency barrier> <----------- SYNC_RMB is here
>>>                               D = *Q;
>> ...
>>> Another example of where data dependency barriers might be required is
>>> where a number is read from memory and then used to calculate the index
>>> for an array access:
>>>
>>>         CPU 1                 CPU 2
>>>         ===============       ===============
>>>         { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
>>>         M[1] = 4;
>>>         <write barrier>
>>>         WRITE_ONCE(P, 1);
>>>                               Q = READ_ONCE(P);
>>>                               <data dependency barrier> <------------ SYNC_RMB is here
>>>                               D = M[Q];
>> That voices say that there is a legitimate reason to relax HW here for
>> performance if SYNC_RMB is needed anyway to work with this sequence of
>> shared data.
> Are you saying that MIPS needs to implement [smp_]read_barrier_depends?

It is not me saying it; it is Documentation/memory-barriers.txt from the
kernel sources.

The HW team can't work from verbal statements; it has to work from written
documents. If that is written (see the lines above that I marked with
"SYNC_RMB"), then everybody should follow it, no matter how many
CPUs/threads are in play. These examples explicitly require inserting a
"data dependency barrier" between reading a shared pointer/index and
using it to fetch shared data. So your WRC+addr+addr test is a
violation of that recommendation.

- Leonid.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 16:16                         ` Paul E. McKenney
@ 2016-01-14 19:42                           ` Leonid Yegoshin
  2016-01-14 20:15                             ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 19:42 UTC (permalink / raw)
  To: paulmck, Will Deacon
  Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman

On 01/14/2016 08:16 AM, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 12:04:45PM +0000, Will Deacon wrote:
>> On Wed, Jan 13, 2016 at 12:58:22PM -0800, Leonid Yegoshin wrote:
>>> On 01/13/2016 12:48 PM, Peter Zijlstra wrote:
>>>> On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
>>>>
>>>>> I ask HW team about it but I have a question - has it any relationship with
>>>>> replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)?
>>>> Of course. If you cannot explain the semantics of the primitives you
>>>> introduce, how can we judge the patch.
>>>>
>>>>
>>> You missed a point - it is a question about replacement of SYNC with
>>> lightweight primitives. It is NOT a question about multithread system
>>> behavior without any SYNC. The answer on a latest Will's question lies in
>>> different area.
> What Will said!
>
> Yes, you can cut corners within MIPS architecture-specific code,
> but primitives that are used in the core kernel really do need to
> work as expected.
>
> 							Thanx, Paul
>
>
Absolutely! Please use SYNC - right now it is not.

And the only point - please use the appropriate SYNC_* barriers instead of
a heavy bold hammer. That stuff was designed explicitly to support the
requirements of Documentation/memory-barriers.txt

It is easy - just use smp_acquire instead of plain smp_mb
in smp_load_acquire, at least for MIPS.

- Leonid.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 12:04                       ` Will Deacon
  2016-01-14 16:16                         ` Paul E. McKenney
@ 2016-01-14 20:12                         ` Leonid Yegoshin
  2016-01-14 20:48                           ` Paul E. McKenney
  1 sibling, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 20:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman, Paul McKenney

On 01/14/2016 04:04 AM, Will Deacon wrote:
> Consequently, it's important that the architecture back-ends implement 
> these portable primitives (e.g. smp_mb()) in a way that satisfies the 
> kernel memory model so that core code doesn't need to worry about the 
> underlying architecture for synchronisation purposes.

It seems you don't listen to me. I have said multiple times - the MIPS
implementation of SYNC_RMB/SYNC_WMB/SYNC_MB/SYNC_ACQUIRE/SYNC_RELEASE
instructions matches the description of
smp_rmb/smp_wmb/smp_mb/sync_acquire/sync_release in the
Documentation/memory-barriers.txt file.

What else do you want from me - RTL or microArch design for that?

- Leonid.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 19:42                           ` Leonid Yegoshin
@ 2016-01-14 20:15                             ` Peter Zijlstra
  2016-01-14 20:36                               ` Paul E. McKenney
                                                 ` (2 more replies)
  0 siblings, 3 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-14 20:15 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: paulmck, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:
> And the only point - please use the appropriate SYNC_* barriers instead of
> a heavy bold hammer. That stuff was designed explicitly to support the
> requirements of Documentation/memory-barriers.txt

That's madness. That document changes from version to version as to what
we _think_ the actual hardware does. It is _NOT_ a specification.

You cannot design hardware from that. It's incomplete and fails to
specify a bunch of things. It is not a mathematically sound definition of a
memory model.

Please stop referring to that document for what a particular barrier
_should_ do.  Explain what MIPS does, so we can attempt to integrate
this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve
upon our understanding of hardware and improve the Linux memory model.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 19:28                     ` Leonid Yegoshin
@ 2016-01-14 20:34                       ` Paul E. McKenney
  2016-01-14 21:01                         ` Leonid Yegoshin
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 20:34 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 11:28:18AM -0800, Leonid Yegoshin wrote:
> On 01/14/2016 04:14 AM, Will Deacon wrote:
> >On Wed, Jan 13, 2016 at 02:26:16PM -0800, Leonid Yegoshin wrote:
> >
> >>     Moreover, there are voices against guarantee that it will be in future
> >>and that voices point me to Documentation/memory-barriers.txt section "DATA
> >>DEPENDENCY BARRIERS" examples which require SYNC_RMB between loading
> >>address/index and using that for loading data based on that address or index
> >>for shared data (look on CPU2 pseudo-code):
> >>>To deal with this, a data dependency barrier or better must be inserted
> >>>between the address load and the data load:
> >>>
> >>>        CPU 1                 CPU 2
> >>>        ===============       ===============
> >>>        { A == 1, B == 2, C = 3, P == &A, Q == &C }
> >>>        B = 4;
> >>>        <write barrier>
> >>>        WRITE_ONCE(P, &B);
> >>>                              Q = READ_ONCE(P);
> >>>                              <data dependency barrier> <----------- SYNC_RMB is here
> >>>                              D = *Q;
> >>...
> >>>Another example of where data dependency barriers might be required is
> >>>where a number is read from memory and then used to calculate the index
> >>>for an array access:
> >>>
> >>>        CPU 1                 CPU 2
> >>>        ===============       ===============
> >>>        { M[0] == 1, M[1] == 2, M[3] = 3, P == 0, Q == 3 }
> >>>        M[1] = 4;
> >>>        <write barrier>
> >>>        WRITE_ONCE(P, 1);
> >>>                              Q = READ_ONCE(P);
> >>>                              <data dependency barrier> <------------ SYNC_RMB is here
> >>>                              D = M[Q];
> >>That voices say that there is a legitimate reason to relax HW here for
> >>performance if SYNC_RMB is needed anyway to work with this sequence of
> >>shared data.
> >Are you saying that MIPS needs to implement [smp_]read_barrier_depends?
> 
> It is not me saying it; it is Documentation/memory-barriers.txt from the
> kernel sources.
> 
> The HW team can't work from verbal statements; it has to work from
> written documents. If that is written (see the lines above that I
> marked with "SYNC_RMB"), then everybody should follow it, no matter how
> many CPUs/threads are in play. These examples explicitly require
> inserting a "data dependency barrier" between reading a shared
> pointer/index and using it to fetch shared data. So your
> WRC+addr+addr test is a violation of that recommendation.

Perhaps Documentation/memory-barriers.txt needs additional clarification.
It would not be the first time.

If your CPU implicitly maintains ordering based on address and
data dependencies, then you don't need any instructions for
<data dependency barrier>.

The WRC+addr+addr is OK because data dependencies are not required to be
transitive, in other words, they are not required to flow from one CPU to
another without the help of an explicit memory barrier.  Transitivity is
instead supplied by smp_mb() and by smp_store_release()-smp_load_acquire()
chains.  Here is the Linux kernel code for WRC+addr+addr, give or take
(and no, I have no idea why anyone would want to write code like this):

	struct foo {
		struct foo **a;
	};
	struct foo b;
	struct foo c;
	struct foo d;
	struct foo e;
	struct foo f = { &d };
	struct foo g = { &e };
	struct foo *x = &b;

	void cpu0(void)
	{
		WRITE_ONCE(x, &f);
	}

	void cpu1(void)
	{
		struct foo *p;

		p = lockless_dereference(x);
		WRITE_ONCE(p->a, &x);
	}

	void cpu2(void)
	{
		r1 = lockless_dereference(f.a);
		WRITE_ONCE(*r1, &c);
	}

It is legal to end the run with x==&f and r1==&x.  To prevent this outcome,
we do the following:

	struct foo {
		struct foo **a;
	};
	struct foo b;
	struct foo c;
	struct foo d;
	struct foo e;
	struct foo f = { &d };
	struct foo g = { &e };
	struct foo *x = &b;

	void cpu0(void)
	{
		WRITE_ONCE(x, &f);
	}

	void cpu1(void)
	{
		struct foo *p;

		p = lockless_dereference(x);
		smp_store_release(&p->a, &x); /* Additional ordering. */
	}

	void cpu2(void)
	{
		r1 = lockless_dereference(f.a);
		WRITE_ONCE(*r1, &c);
	}

And I still don't know why anyone would need this sort of code.  ;-)

Alternatively, we pull cpu2() into cpu1():

	struct foo {
		struct foo **a;
	};
	struct foo b;
	struct foo c;
	struct foo d;
	struct foo e;
	struct foo f = { &d };
	struct foo g = { &e };
	struct foo *x = &b;

	void cpu0(void)
	{
		WRITE_ONCE(x, &f);
	}

	void cpu1(void)
	{
		struct foo *p;

		p = lockless_dereference(x);
		WRITE_ONCE(p->a, &x);
		r1 = lockless_dereference(f.a);
		WRITE_ONCE(*r1, &c);
	}

The ordering is now enforced by being within a single thread.  In fact,
the second lockless_dereference() can be READ_ONCE().
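
As a userspace analogue of the smp_store_release() variant above, the same shape can be written with C11 atomics. This is a sketch, not kernel code: `x`, `y`, `r1`..`r3` and the `cpu0()`/`cpu1()`/`cpu2()` helpers are invented here, plain integers replace the pointer chasing, and `memory_order_release`/`memory_order_acquire` are assumed to stand in for smp_store_release()/smp_load_acquire(). Under the C11 model, if cpu1() saw x == 1 and cpu2() saw y == 1, then cpu2() must also see x == 1.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int x;	/* written by cpu0() */
static atomic_int y;	/* written by cpu1() */

/* Observed values, kept global in the style of litmus-test registers. */
int r1, r2, r3;

/* P0: a relaxed store, roughly what WRITE_ONCE() provides. */
void cpu0(void)
{
	atomic_store_explicit(&x, 1, memory_order_relaxed);
}

/* P1: read x, then publish y with release ordering. */
void cpu1(void)
{
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_store_explicit(&y, 1, memory_order_release);
}

/* P2: acquire-load y, then read x. */
void cpu2(void)
{
	r2 = atomic_load_explicit(&y, memory_order_acquire);
	r3 = atomic_load_explicit(&x, memory_order_relaxed);
}
```

The functions are meant to run on three separate threads, where the forbidden result is r1 == 1 && r2 == 1 && r3 == 0. Called one after another on a single thread they simply yield 1, 1, 1, which checks the plumbing and nothing more.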

So, does MIPS maintain ordering within a given CPU based on address and
data dependencies?  If so, you don't need to emit memory-barrier instructions
for read_barrier_depends().

							Thanx, Paul

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 20:15                             ` Peter Zijlstra
@ 2016-01-14 20:36                               ` Paul E. McKenney
  2016-01-14 20:46                               ` Peter Zijlstra
  2016-01-14 20:46                               ` Leonid Yegoshin
  2 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 20:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 09:15:13PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:
> > And the only point - please use the appropriate SYNC_* barriers instead of
> > a heavy bold hammer. That stuff was designed explicitly to support the
> > requirements of Documentation/memory-barriers.txt
> 
> That's madness. That document changes from version to version as to what
> we _think_ the actual hardware does. It is _NOT_ a specification.

There is work in progress on a specification, but please don't hold
your breath.  And I am not as optimistic as I might be about any formal
specification keeping up with the Linux kernel or with the hardware that
it supports.  But it seems worth a good try.

> You cannot design hardware from that. It's incomplete and fails to
> specify a bunch of things. It is not a mathematically sound definition of a
> memory model.
> 
> Please stop referring to that document for what a particular barrier
> _should_ do.  Explain what MIPS does, so we can attempt to integrate
> this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve
> upon our understanding of hardware and improve the Linux memory model.

Please!

							Thanx, Paul

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 20:15                             ` Peter Zijlstra
  2016-01-14 20:36                               ` Paul E. McKenney
@ 2016-01-14 20:46                               ` Peter Zijlstra
  2016-01-14 20:46                               ` Leonid Yegoshin
  2 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-14 20:46 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: paulmck, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 09:15:13PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:
> > And the only point - please use the appropriate SYNC_* barriers instead of
> > a heavy bold hammer. That stuff was designed explicitly to support the
> > requirements of Documentation/memory-barriers.txt
> 
> That's madness. That document changes from version to version as to what
> we _think_ the actual hardware does. It is _NOT_ a specification.
> 
> You cannot design hardware from that. It's incomplete and fails to
> specify a bunch of things. It is not a mathematically sound definition of a
> memory model.
> 
> Please stop referring to that document for what a particular barrier
> _should_ do.  Explain what MIPS does, so we can attempt to integrate
> this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve
> upon our understanding of hardware and improve the Linux memory model.

That is, if you'd managed to read that file at the right point in time,
you might have thought we'd be OK with requiring a barrier for
control dependencies.

We got rid of that mistake. It was based on a flawed reading of the
Alpha docs. See: 105ff3cbf225 ("atomic: remove all traces of
READ_ONCE_CTRL() and atomic*_read_ctrl()")

Similarly, while the document goes to great lengths to explain the
read_barrier_depends thing, nobody actually thinks it's a brilliant idea
to have. Ideally we'd kill the thing the moment we drop Alpha support.

Again, memory-barriers.txt is _NOT_, I repeat, _NOT_ a hardware spec, it
is not even a recommendation. It is our best-effort (but flawed)
scribbles of what we think makes sense given the huge amount of
actual hardware we have to run on.


As to the ACQUIRE/RELEASE semantics, ARM64 actually has
multi-copy-atomic acquire/release (as does ia64, although in reality it
doesn't actually have acquire/release). PPC otoh does _NOT_ have this,
and is currently the only arch to suffer RCpc locks.

Now for a long long time we assumed our locks were RCsc, and we've
written code assuming UNLOCK x + LOCK y was in fact a full barrier with
transitivity. Then we figured out PPC didn't actually match that. RCU is
the only piece of code we _know_ relied on that, but there might be more
out there...

So we document, for new code, that UNLOCK+LOCK isn't a MB, while at the
same time we lobby PPC to stick a full barrier in and get rid of this
stuff.

Nobody really likes RCpc locks, esp. given the history we have of
assuming RCsc.

The current document allowing for RCpc is not an endorsement thereof.
Ideally we'd _NOT_ have to worry about that. We can do without these
head-aches.


So again, stop referring to our document as a spec. Also please don't
make MIPS push the limits of weak memory models, we really can do
without the pain.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 20:15                             ` Peter Zijlstra
  2016-01-14 20:36                               ` Paul E. McKenney
  2016-01-14 20:46                               ` Peter Zijlstra
@ 2016-01-14 20:46                               ` Leonid Yegoshin
  2016-01-14 21:34                                 ` Paul E. McKenney
  2 siblings, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 20:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulmck, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On 01/14/2016 12:15 PM, Peter Zijlstra wrote:
> On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:
>> And the only point - please use the appropriate SYNC_* barriers instead of
>> a heavy bold hammer. That stuff was designed explicitly to support the
>> requirements of Documentation/memory-barriers.txt
> That's madness. That document changes from version to version as to what
> we _think_ the actual hardware does. It is _NOT_ a specification.
>
> You cannot design hardware from that. It's incomplete and fails to
> specify a bunch of things. It is not a mathematically sound definition of a
> memory model.
>
> Please stop referring to that document for what a particular barrier
> _should_ do.  Explain what MIPS does, so we can attempt to integrate
> this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve
> upon our understanding of hardware and improve the Linux memory model.

I am afraid I can't help you here. It is very complicated stuff, and a
model actually doesn't fit your assumptions about CPUs well without
some simplifications based on what you want to have.

I say that SYNC_ACQUIRE etc. follow what you expect from smp_acquire etc.
(based on that document). At least two CPU models were tested with
my patches (see them on LMO) for that last year, and those instructions
are now implemented in the engineering kernel.

If you have something else in mind, you can ask me. But I prefer not to
deviate too much from Documentation/memory-barriers.txt. For example,
if it asks for a memory barrier somewhere, then I assume the code
should have it, and please don't ask me for a test which violates the
current version of the document's recommendations.

For the moment I don't see any significant changes in this document for
the MIPS Arch for at least 1.5 years, and the only significant point is
that the MIPS CPU Arch doesn't yet have smp_read_barrier_depends(), so
smp_rmb() should be used instead.

- Leonid.

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 20:12                         ` Leonid Yegoshin
@ 2016-01-14 20:48                           ` Paul E. McKenney
  2016-01-14 21:24                             ` Leonid Yegoshin
  2016-01-18  8:19                             ` Herbert Xu
  0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 20:48 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 12:12:53PM -0800, Leonid Yegoshin wrote:
> On 01/14/2016 04:04 AM, Will Deacon wrote:
> >Consequently, it's important that the architecture back-ends
> >implement these portable primitives (e.g. smp_mb()) in a way that
> >satisfies the kernel memory model so that core code doesn't need
> >to worry about the underlying architecture for synchronisation
> >purposes.
> 
> It seems you don't listen to me. I have said multiple times - the MIPS
> implementation of
> SYNC_RMB/SYNC_WMB/SYNC_MB/SYNC_ACQUIRE/SYNC_RELEASE instructions
> matches the description of
> smp_rmb/smp_wmb/smp_mb/sync_acquire/sync_release in the
> Documentation/memory-barriers.txt file.
> 
> What else do you want from me - RTL or microArch design for that?

I suspect that it is more likely that we are talking past each other.
This stuff is subtle and although we have better ways of talking about
it than (say) ten years ago, it is subtle.  Two ways of talking about
it are herd and ppcmem.

The overview of ppcmem (AKA armmem and cppmem) is here:
https://www.cl.cam.ac.uk/~pes20/ppcmem/help.html

The intro to herd is here: http://arxiv.org/pdf/1308.6810v5.pdf
It may be downloaded here: http://diy.inria.fr/herd/

As a very rough rule of thumb, herd is faster and easier to use
and ppcmem is more precise.

So SYNC_RMB is intended to implement smp_rmb(), correct?

You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.
The reason for this is that smp_read_barrier_depends() must order the
pointer load against any subsequent read or write through a dereference
of that pointer.  For example:

	p = READ_ONCE(gp);
	smp_rmb();
	r1 = p->a; /* ordered by smp_rmb(). */
	p->b = 42; /* NOT ordered by smp_rmb(), BUG!!! */
	r2 = x; /* ordered by smp_rmb(), but doesn't need to be. */

In contrast:

	p = READ_ONCE(gp);
	smp_read_barrier_depends();
	r1 = p->a; /* ordered by smp_read_barrier_depends(). */
	p->b = 42; /* ordered by smp_read_barrier_depends(). */
	r2 = x; /* not ordered by smp_read_barrier_depends(), which is OK. */

Again, if your hardware maintains local ordering for address
and data dependencies, you can have read_barrier_depends() and
smp_read_barrier_depends() be no-ops like they are for most
architectures.
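
For anyone who wants to experiment outside the kernel, here is a minimal
userspace sketch of the second pattern.  It assumes C11 atomics as
stand-ins: the release store plays the part of smp_wmb() plus the pointer
store, and the acquire load plays the part of READ_ONCE() plus
smp_read_barrier_depends().  Acquire is stronger than dependency ordering
needs to be, and the names (struct node, publish(), consume(), read_b())
are made up for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	int a;
	int b;
};

static struct node storage;
static _Atomic(struct node *) gp;	/* NULL until something is published */

/* Writer: initialize the node, then publish a pointer to it. */
void publish(int a)
{
	storage.a = a;
	storage.b = 0;
	/* Stand-in for smp_wmb() followed by the store to gp. */
	atomic_store_explicit(&gp, &storage, memory_order_release);
}

/*
 * Reader: the acquire load orders the pointer load against both the
 * later read of p->a and the later write to p->b.
 * Returns p->a, or -1 if nothing has been published yet.
 */
int consume(void)
{
	struct node *p = atomic_load_explicit(&gp, memory_order_acquire);
	int r1;

	if (p == NULL)
		return -1;
	r1 = p->a;	/* ordered after the pointer load */
	p->b = 42;	/* also ordered after the pointer load */
	return r1;
}

/* Helper for inspecting the dependent write afterwards. */
int read_b(void)
{
	return storage.b;
}
```

Calling consume() before publish() returns -1; after publish(7) it
returns 7 and leaves b set to 42.  The sketch assumes one writer and
one reader.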

Does that help?

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 20:34                       ` Paul E. McKenney
@ 2016-01-14 21:01                         ` Leonid Yegoshin
  2016-01-14 21:29                           ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 21:01 UTC (permalink / raw)
  To: paulmck
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

I need some time to understand your test examples. However,

On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
>
>
> The WRC+addr+addr is OK because data dependencies are not required to be
> transitive, in other words, they are not required to flow from one CPU to
> another without the help of an explicit memory barrier.

I don't see any reliable way to reconcile WRC+addr+addr with the
recommendation in the "DATA DEPENDENCY BARRIERS" section to put a data
dependency barrier between the read of a shared pointer/index and the
read of the shared data based on that pointer. If you have these two
reads, the rest of the scenario doesn't matter: you should put the
dependency barrier in the code anyway. If you don't do it in the
WRC+addr+addr scenario, then years later the code can easily be changed
into a different scenario which matches one of those in the "DATA
DEPENDENCY BARRIERS" section and fails.

>    Transitivity is

Peter Zijlstra recently wrote: "In particular we're very much all
'confused' about the various notions of transitivity". I am confused
too, so please explain it in simpler terms. Sorry, but we need
common ground first.

- Leonid.


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 20:48                           ` Paul E. McKenney
@ 2016-01-14 21:24                             ` Leonid Yegoshin
  2016-01-14 22:20                               ` Paul E. McKenney
  2016-01-18  8:19                             ` Herbert Xu
  1 sibling, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 21:24 UTC (permalink / raw)
  To: paulmck
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On 01/14/2016 12:48 PM, Paul E. McKenney wrote:
>
> So SYNC_RMB is intended to implement smp_rmb(), correct?
Yes.
>
> You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
> smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.

If smp_read_barrier_depends() is used to separate not only two reads but
also a pointer read and a WRITE based on that pointer (example below) -
yes. I just don't see any example of this in the famous
Documentation/memory-barriers.txt and had no way to know that you use
it in this way too.

> The reason for this is that smp_read_barrier_depends() must order the
> pointer load against any subsequent read or write through a dereference
> of that pointer.

I can't see that requirement anywhere in the Documentation directory. I
mean the words "write through a dereference of that pointer" or similar
for smp_read_barrier_depends().

>    For example:
>
> 	p = READ_ONCE(gp);
> 	smp_rmb();
> 	r1 = p->a; /* ordered by smp_rmb(). */
> 	p->b = 42; /* NOT ordered by smp_rmb(), BUG!!! */
> 	r2 = x; /* ordered by smp_rmb(), but doesn't need to be. */
>
> In contrast:
>
> 	p = READ_ONCE(gp);
> 	smp_read_barrier_depends();
> 	r1 = p->a; /* ordered by smp_read_barrier_depends(). */
> 	p->b = 42; /* ordered by smp_read_barrier_depends(). */
> 	r2 = x; /* not ordered by smp_read_barrier_depends(), which is OK. */
>
> Again, if your hardware maintains local ordering for address
> and data dependencies, you can have read_barrier_depends() and
> smp_read_barrier_depends() be no-ops like they are for most
> architectures.

It is not so simple, I mean "local ordering for address and data
dependencies". Local ordering is NOT enough. It happens that the current
MIPS R6 doesn't require smp_read_barrier_depends() in your example, but
the discussion shows that this may not always be so. Without
smp_read_barrier_depends() your example can be part of Will's
WRC+addr+addr, and we found a design which can easily run into this
test. And that design actually performs "local ordering for address and
data dependencies" too.

- Leonid.


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 21:01                         ` Leonid Yegoshin
@ 2016-01-14 21:29                           ` Paul E. McKenney
  2016-01-14 21:36                             ` Leonid Yegoshin
  2016-01-15  8:55                             ` Peter Zijlstra
  0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 21:29 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 01:01:05PM -0800, Leonid Yegoshin wrote:
> I need some time to understand your test examples. However,

Understood.

> On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> >
> >
> >The WRC+addr+addr is OK because data dependencies are not required to be
> >transitive, in other words, they are not required to flow from one CPU to
> >another without the help of an explicit memory barrier.
> 
> I don't see any reliable way to fit WRC+addr+addr into "DATA
> DEPENDENCY BARRIERS" section recommendation to have data dependency
> barrier between read of a shared pointer/index and read the shared
> data based on that pointer. If you have this two reads, it doesn't
> matter the rest of scenario, you should put the dependency barrier
> in code anyway. If you don't do it in WRC+addr+addr scenario then
> after years it can be easily changed to different scenario which
> fits some of scenario in "DATA DEPENDENCY BARRIERS" section and
> fails.

The trick is that lockless_dereference() contains an
smp_read_barrier_depends():

#define lockless_dereference(p) \
({ \
	typeof(p) _________p1 = READ_ONCE(p); \
	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
	(_________p1); \
})

Or am I missing your point?

> >   Transitivity is
> 
> Peter Zijlstra recently wrote: "In particular we're very much all
> 'confused' about the various notions of transitivity". I am confused
> too, so - please use some more simple way to explain your words.
> Sorry, but we need a common ground first.

OK, how about an example?  (Z6.3 in the ppcmem naming scheme.)

	int x, y, z;

	void cpu0(void)
	{
		WRITE_ONCE(x, 1);
		smp_wmb();
		WRITE_ONCE(y, 1);
	}

	void cpu1(void)
	{
		WRITE_ONCE(y, 2);
		smp_wmb();
		WRITE_ONCE(z, 1);
	}

	void cpu2(void)
	{
		r1 = READ_ONCE(z);
		smp_rmb();
		r2 = READ_ONCE(x);
	}

If smp_rmb() and smp_wmb() provided transitive ordering, then cpu2()
would see cpu0()'s ordering.  But they do not, so the ordering is
visible at best to the adjacent CPU.  This means that the final value
of y can be 2, while at the same time r1==1 && r2==0.

Now the full barrier, smp_mb(), does provide transitive ordering,
so if the three barriers in the above example are replaced with
smp_mb() the y==2 && r1==1 && r2==0 outcome will be prohibited.

So smp_mb() provides transitivity, as do pairs of smp_store_release()
and smp_load_acquire(), as do RCU grace periods.  The exact interactions
between transitive and non-transitive ordering are a work in progress.
That said, if a series of transitive segments ends in a write, which
connects to a single non-transitive segment starting with a read,
you should be good.  And in fact in the example above, you can replace
the smp_wmb()s with smp_mb() and leave the smp_rmb() and still
prohibit the "cyclic" outcome.

If you want a more formal definition, I must refer you back to the
ppcmem and herd references.
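
In the meantime, the smp_mb() variant of the example above can be checked
by brute force.  The following is only a sketch: it assumes that replacing
all three barriers with smp_mb() makes the test behave as if it were
sequentially consistent, and then enumerates every interleaving of the
six accesses.  It says nothing about what the weaker barriers allow, and
the names (struct state, step(), search(), z63_outcome_reachable()) are
invented for illustration.

```c
#include <stdbool.h>

struct state {
	int x, y, z;	/* shared variables, initially zero */
	int r1, r2;	/* cpu2()'s registers */
};

/* Execute step pc (0 or 1) of the given CPU, in program order. */
static void step(struct state *s, int cpu, int pc)
{
	switch (cpu * 2 + pc) {
	case 0: s->x = 1; break;	/* cpu0: WRITE_ONCE(x, 1) */
	case 1: s->y = 1; break;	/* cpu0: WRITE_ONCE(y, 1) */
	case 2: s->y = 2; break;	/* cpu1: WRITE_ONCE(y, 2) */
	case 3: s->z = 1; break;	/* cpu1: WRITE_ONCE(z, 1) */
	case 4: s->r1 = s->z; break;	/* cpu2: r1 = READ_ONCE(z) */
	case 5: s->r2 = s->x; break;	/* cpu2: r2 = READ_ONCE(x) */
	}
}

/* Depth-first search over all interleavings; pc[i] is CPU i's next step. */
static bool search(struct state s, const int pc[3], const struct state *want)
{
	bool done = true;

	for (int cpu = 0; cpu < 3; cpu++) {
		if (pc[cpu] == 2)
			continue;	/* this CPU has finished */
		done = false;

		struct state next = s;
		int npc[3] = { pc[0], pc[1], pc[2] };

		step(&next, cpu, npc[cpu]);
		npc[cpu]++;
		if (search(next, npc, want))
			return true;
	}
	/* Only a complete execution can match the wanted final state. */
	return done && s.y == want->y && s.r1 == want->r1 && s.r2 == want->r2;
}

/* True if the given final y, r1 and r2 can occur in some interleaving. */
bool z63_outcome_reachable(int y, int r1, int r2)
{
	struct state init = { 0 };
	struct state want = { .y = y, .r1 = r1, .r2 = r2 };
	int pc[3] = { 0, 0, 0 };

	return search(init, pc, &want);
}
```

Here z63_outcome_reachable(2, 1, 0) is false, matching the claim that
full barriers prohibit y==2 && r1==1 && r2==0, while outcomes such as
(2, 1, 1) remain reachable.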

Does that help?

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 20:46                               ` Leonid Yegoshin
@ 2016-01-14 21:34                                 ` Paul E. McKenney
  2016-01-14 21:45                                   ` Leonid Yegoshin
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 21:34 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Peter Zijlstra, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 12:46:43PM -0800, Leonid Yegoshin wrote:
> On 01/14/2016 12:15 PM, Peter Zijlstra wrote:
> >On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:
> >>An the only point - please use an appropriate SYNC_* barriers instead of
> >>heavy bold hammer. That stuff was design explicitly to support the
> >>requirements of Documentation/memory-barriers.txt
> >That's madness. That document changes from version to version as to what
> >we _think_ the actual hardware does. It is _NOT_ a specification.
> >
> >You cannot design hardware from that. Its incomplete and fails to
> >specify a bunch of things. It not a mathematically sound definition of a
> >memory model.
> >
> >Please stop referring to that document for what a particular barrier
> >_should_ do.  Explain what MIPS does, so we can attempt to integrate
> >this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve
> >upon our understanding of hardware and improve the Linux memory model.
> 
> I am afraid I can't help you here. It is very complicated stuff and
> a model is actually doesn't fit your assumptions about CPUs well
> without some simplifications which are based on what you want to
> have.
> 
> I say that SYNC_ACQUIRE/etc follows what you expect for smp_acquire
> etc (basing on that document). And at least two CPU models were
> tested with my patches (see it in LMO) for that last year and that
> instructions are implemented now in engineering kernel.
> 
> If you have something else in mind, you can ask me. But I prefer to
> do not deviate too much from Documentation/memory-barriers.txt, for
> exam - if it asks to have memory barrier somewhere, then I assume
> the code should have it, and please - don't ask me a test which
> violates the current version of document recommendations.
> 
> For a moment I don't see a significant changes in this document for
> MIPS Arch at least 1.5 year, and the only significant point is that
> MIPS CPU Arch doesn't have yet smp_read_barrier_depends() and
> smp_rmb() should be used instead.

Is SYNC_ACQUIRE a memory-barrier instruction that orders prior loads
against later loads and stores?  If so, and if MIPS does not do
ordering based on address and data dependencies, I suggest making
read_barrier_depends() be a SYNC_ACQUIRE rather than SYNC_RMB.

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 21:29                           ` Paul E. McKenney
@ 2016-01-14 21:36                             ` Leonid Yegoshin
  2016-01-14 22:55                               ` Paul E. McKenney
  2016-01-15  8:55                             ` Peter Zijlstra
  1 sibling, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 21:36 UTC (permalink / raw)
  To: paulmck
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
>
>> On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
>>>
>>> The WRC+addr+addr is OK because data dependencies are not required to be
>>> transitive, in other words, they are not required to flow from one CPU to
>>> another without the help of an explicit memory barrier.
>> I don't see any reliable way to fit WRC+addr+addr into "DATA
>> DEPENDENCY BARRIERS" section recommendation to have data dependency
>> barrier between read of a shared pointer/index and read the shared
>> data based on that pointer. If you have this two reads, it doesn't
>> matter the rest of scenario, you should put the dependency barrier
>> in code anyway. If you don't do it in WRC+addr+addr scenario then
>> after years it can be easily changed to different scenario which
>> fits some of scenario in "DATA DEPENDENCY BARRIERS" section and
>> fails.
> The trick is that lockless_dereference() contains an
> smp_read_barrier_depends():
>
> #define lockless_dereference(p) \
> ({ \
> 	typeof(p) _________p1 = READ_ONCE(p); \
> 	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> 	(_________p1); \
> })
>
> Or am I missing your point?

WRC+addr+addr has no barrier at all. lockless_dereference() has a barrier.
I don't see what the two have in common in your answer, sorry.

- Leonid.


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 21:34                                 ` Paul E. McKenney
@ 2016-01-14 21:45                                   ` Leonid Yegoshin
  2016-01-14 22:24                                     ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 21:45 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On 01/14/2016 01:34 PM, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 12:46:43PM -0800, Leonid Yegoshin wrote:
>> On 01/14/2016 12:15 PM, Peter Zijlstra wrote:
>>> On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:
>>>> An the only point - please use an appropriate SYNC_* barriers instead of
>>>> heavy bold hammer. That stuff was design explicitly to support the
>>>> requirements of Documentation/memory-barriers.txt
>>> That's madness. That document changes from version to version as to what
>>> we _think_ the actual hardware does. It is _NOT_ a specification.
>>>
>>> You cannot design hardware from that. Its incomplete and fails to
>>> specify a bunch of things. It not a mathematically sound definition of a
>>> memory model.
>>>
>>> Please stop referring to that document for what a particular barrier
>>> _should_ do.  Explain what MIPS does, so we can attempt to integrate
>>> this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve
>>> upon our understanding of hardware and improve the Linux memory model.
>> I am afraid I can't help you here. It is very complicated stuff and
>> a model is actually doesn't fit your assumptions about CPUs well
>> without some simplifications which are based on what you want to
>> have.
>>
>> I say that SYNC_ACQUIRE/etc follows what you expect for smp_acquire
>> etc (basing on that document). And at least two CPU models were
>> tested with my patches (see it in LMO) for that last year and that
>> instructions are implemented now in engineering kernel.
>>
>> If you have something else in mind, you can ask me. But I prefer to
>> do not deviate too much from Documentation/memory-barriers.txt, for
>> exam - if it asks to have memory barrier somewhere, then I assume
>> the code should have it, and please - don't ask me a test which
>> violates the current version of document recommendations.
>>
>> For a moment I don't see a significant changes in this document for
>> MIPS Arch at least 1.5 year, and the only significant point is that
>> MIPS CPU Arch doesn't have yet smp_read_barrier_depends() and
>> smp_rmb() should be used instead.
> Is SYNC_ACQUIRE a memory-barrier instruction that orders prior loads
> against later loads and stores?

Yes, it is in MD00087 (table 6.6 of document Ver 6.04) - 
https://imgtec.com/?do-download=4302

>    If so, and if MIPS does not do
> ordering based on address and data dependencies, I suggest making
> read_barrier_depends() be a SYNC_ACQUIRE rather than SYNC_RMB.

I understood that after I saw the example of using it.
Please consider adding that to Documentation/memory-barriers.txt (it
is not easy to discover that this barrier is also used for a shared
WRITE based on a shared pointer), it would be helpful.

- Leonid.


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 21:24                             ` Leonid Yegoshin
@ 2016-01-14 22:20                               ` Paul E. McKenney
  2016-01-15  9:57                                 ` Will Deacon
  2016-01-26 10:24                                 ` Peter Zijlstra
  0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 22:20 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 01:24:34PM -0800, Leonid Yegoshin wrote:
> On 01/14/2016 12:48 PM, Paul E. McKenney wrote:
> >
> >So SYNC_RMB is intended to implement smp_rmb(), correct?
> Yes.
> >
> >You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
> >smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.
> 
> If smp_read_barrier_depends() is used to separate not only two reads
> but read pointer and WRITE basing on that pointer (example below) -
> yes. I just doesn't see any example of this in famous
> Documentation/memory-barriers.txt and had no chance to know what you
> use it in this way too.

Well, Documentation/memory-barriers.txt was intended as a guide for Linux
kernel hackers, and not for hardware architects.  The need for something
more precise has become clear over the past year or two, and I am working
on it with some heavy-duty memory-model folks.  But all previous memory
models have been for a specific CPU architecture, so doing one for the
intersection of several is offering up some complications.  I therefore
cannot yet provide a completion date.

That said, I still suggest use of SYNC_ACQUIRE for read_barrier_depends().

> >The reason for this is that smp_read_barrier_depends() must order the
> >pointer load against any subsequent read or write through a dereference
> >of that pointer.
> 
> I can't see that requirement anywhere in Documents directory. I mean
> - the words "write through a dereference of that pointer" or similar
> for smp_read_barrier_depends.

No worries, I will add one.  Please see the end of this message for an
initial patch.

Please understand that Documentation/memory-barriers.txt is a living
document:

v4.4: Two changes
v4.3: Three changes
v4.2: Six changes
v4.1: Three changes
v4.0: Two changes

It tends to change as we locate corner cases either in hardware or
in software use cases/APIs.

> >   For example:
> >
> >	p = READ_ONCE(gp);
> >	smp_rmb();
> >	r1 = p->a; /* ordered by smp_rmb(). */
> >	p->b = 42; /* NOT ordered by smp_rmb(), BUG!!! */
> >	r2 = x; /* ordered by smp_rmb(), but doesn't need to be. */
> >
> >In contrast:
> >
> >	p = READ_ONCE(gp);
> >	smp_read_barrier_depends();
> >	r1 = p->a; /* ordered by smp_read_barrier_depends(). */
> >	p->b = 42; /* ordered by smp_read_barrier_depends(). */
> >	r2 = x; /* not ordered by smp_read_barrier_depends(), which is OK. */
> >
> >Again, if your hardware maintains local ordering for address
> >and data dependencies, you can have read_barrier_depends() and
> >smp_read_barrier_depends() be no-ops like they are for most
> >architectures.
> 
> It is not so simple, I mean "local ordering for address and data
> dependencies". Local ordering is NOT enough. It happens that current
> MIPS R6 doesn't require in your example smp_read_barrier_depends()
> but in discussion it comes out that it may not. Because without
> smp_read_barrier_depends() your example can be a part of Will's
> WRC+addr+addr and we found some design which easily can bump into
> this test. And that design actually performs "local ordering for
> address and data dependencies" too.

As noted in another email in this thread, I do not believe that
WRC+addr+addr needs to be prohibited.  Sounds like Will and I need to
get our story straight, though.

Will?

							Thanx, Paul

------------------------------------------------------------------------

commit 955720966e216b00613fcf60188d507c103f0e80
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Thu Jan 14 14:17:04 2016 -0800

    documentation: Subsequent writes ordered by rcu_dereference()
    
    The current memory-barriers.txt does not address the possibility of
    a write to a dereferenced pointer.  This should be rare, but when it
    happens, we need that write -not- to be clobbered by the initialization.
    This commit therefore adds an example showing a data dependency ordering
    a later data-dependent write.
    
    Reported-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index f49c15f7864f..c66ba46d8079 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -555,6 +555,30 @@ between the address load and the data load:
 This enforces the occurrence of one of the two implications, and prevents the
 third possibility from arising.
 
+A data-dependency barrier must also order against dependent writes:
+
+	CPU 1		      CPU 2
+	===============	      ===============
+	{ A == 1, B == 2, C == 3, P == &A, Q == &C }
+	B = 4;
+	<write barrier>
+	WRITE_ONCE(P, &B);
+			      Q = READ_ONCE(P);
+			      <data dependency barrier>
+			      *Q = 5;
+
+The data-dependency barrier must order the read into Q with the store
+into *Q.  This prohibits this outcome:
+
+	(Q == &B) && (B == 4)
+
+Please note that this pattern should be rare.  After all, the whole point
+of dependency ordering is to -prevent- writes to the data structure, along
+with the expensive cache misses associated with those writes.  This pattern
+can be used to record rare error conditions and the like, and the ordering
+prevents such records from being lost.
+
+
 [!] Note that this extremely counterintuitive situation arises most easily on
 machines with split caches, so that, for example, one cache bank processes
 even-numbered cache lines and the other bank processes odd-numbered cache


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 21:45                                   ` Leonid Yegoshin
@ 2016-01-14 22:24                                     ` Paul E. McKenney
  2016-01-14 23:04                                       ` Leonid Yegoshin
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 22:24 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Peter Zijlstra, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 01:45:44PM -0800, Leonid Yegoshin wrote:
> On 01/14/2016 01:34 PM, Paul E. McKenney wrote:
> >On Thu, Jan 14, 2016 at 12:46:43PM -0800, Leonid Yegoshin wrote:
> >>On 01/14/2016 12:15 PM, Peter Zijlstra wrote:
> >>>On Thu, Jan 14, 2016 at 11:42:02AM -0800, Leonid Yegoshin wrote:
> >>>>An the only point - please use an appropriate SYNC_* barriers instead of
> >>>>heavy bold hammer. That stuff was design explicitly to support the
> >>>>requirements of Documentation/memory-barriers.txt
> >>>That's madness. That document changes from version to version as to what
> >>>we _think_ the actual hardware does. It is _NOT_ a specification.
> >>>
> >>>You cannot design hardware from that. Its incomplete and fails to
> >>>specify a bunch of things. It not a mathematically sound definition of a
> >>>memory model.
> >>>
> >>>Please stop referring to that document for what a particular barrier
> >>>_should_ do.  Explain what MIPS does, so we can attempt to integrate
> >>>this knowledge with our knowledge of PPC/ARM/Alpha/x86/etc. and improve
> >>>upon our understanding of hardware and improve the Linux memory model.
> >>I am afraid I can't help you here. It is very complicated stuff and
> >>a model is actually doesn't fit your assumptions about CPUs well
> >>without some simplifications which are based on what you want to
> >>have.
> >>
> >>I say that SYNC_ACQUIRE/etc follows what you expect for smp_acquire
> >>etc (basing on that document). And at least two CPU models were
> >>tested with my patches (see it in LMO) for that last year and that
> >>instructions are implemented now in engineering kernel.
> >>
> >>If you have something else in mind, you can ask me. But I prefer to
> >>do not deviate too much from Documentation/memory-barriers.txt, for
> >>exam - if it asks to have memory barrier somewhere, then I assume
> >>the code should have it, and please - don't ask me a test which
> >>violates the current version of document recommendations.
> >>
> >>For a moment I don't see a significant changes in this document for
> >>MIPS Arch at least 1.5 year, and the only significant point is that
> >>MIPS CPU Arch doesn't have yet smp_read_barrier_depends() and
> >>smp_rmb() should be used instead.
> 
> >Is SYNC_ACQUIRE a memory-barrier instruction that orders prior loads
> >against later loads and stores?
> 
> Yes, it is in MD00087 (table 6.6 of document Ver 6.04) -
> https://imgtec.com/?do-download=4302

OK, it does look like it should work.  Of course, if you can rely
on straight address/data dependencies, that would be even better.

> >   If so, and if MIPS does not do
> >ordering based on address and data dependencies, I suggest making
> >read_barrier_depends() be a SYNC_ACQUIRE rather than SYNC_RMB.
> 
> I understood that, after I see the example of using it.
> Please consider to add that into Documentation/memory-barriers.txt
> (it is not easy to find that this barrier is used for shared WRITE
> basing on shared pointer), it would be helpful.

Actually, the Linux kernel doesn't have an acquire barrier, just an
smp_load_acquire().  Or did someone sneak one in while I wasn't looking?  ;-)
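
For comparison, C11 has both forms, so the difference can be sketched in
userspace.  This is only an analogy, not kernel code: the acquire load
corresponds roughly to smp_load_acquire(), and the relaxed load followed
by a standalone acquire fence is, as far as I can tell, the closer match
for a barrier instruction like SYNC_ACQUIRE.  The names (flag, payload,
producer(), consumer_load_acquire(), consumer_acquire_fence()) are made
up for illustration.

```c
#include <stdatomic.h>

static _Atomic int flag;	/* set once the payload is ready */
static int payload;

/* Producer: write the payload, then release-store the flag. */
void producer(int value)
{
	payload = value;
	atomic_store_explicit(&flag, 1, memory_order_release);
}

/* Acquire attached to the load itself, as with smp_load_acquire(). */
int consumer_load_acquire(void)
{
	if (atomic_load_explicit(&flag, memory_order_acquire))
		return payload;
	return -1;	/* nothing published yet */
}

/* Relaxed load followed by a standalone acquire fence. */
int consumer_acquire_fence(void)
{
	if (atomic_load_explicit(&flag, memory_order_relaxed)) {
		/* Orders the flag load before the payload read below. */
		atomic_thread_fence(memory_order_acquire);
		return payload;
	}
	return -1;	/* nothing published yet */
}
```

Both consumers return -1 until producer() has run and the published
value afterwards; they differ only in where the ordering is attached.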

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 21:36                             ` Leonid Yegoshin
@ 2016-01-14 22:55                               ` Paul E. McKenney
  2016-01-14 23:33                                 ` Leonid Yegoshin
  2016-01-15 10:24                                 ` Will Deacon
  0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-14 22:55 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 01:36:50PM -0800, Leonid Yegoshin wrote:
> On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
> >
> >>On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> >>>
> >>>The WRC+addr+addr is OK because data dependencies are not required to be
> >>>transitive, in other words, they are not required to flow from one CPU to
> >>>another without the help of an explicit memory barrier.
> >>I don't see any reliable way to fit WRC+addr+addr into "DATA
> >>DEPENDENCY BARRIERS" section recommendation to have data dependency
> >>barrier between read of a shared pointer/index and read the shared
> >>data based on that pointer. If you have this two reads, it doesn't
> >>matter the rest of scenario, you should put the dependency barrier
> >>in code anyway. If you don't do it in WRC+addr+addr scenario then
> >>after years it can be easily changed to different scenario which
> >>fits some of scenario in "DATA DEPENDENCY BARRIERS" section and
> >>fails.
> >The trick is that lockless_dereference() contains an
> >smp_read_barrier_depends():
> >
> >#define lockless_dereference(p) \
> >({ \
> >	typeof(p) _________p1 = READ_ONCE(p); \
> >	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> >	(_________p1); \
> >})
> >
> >Or am I missing your point?
> 
> WRC+addr+addr has no barrier at all. lockless_dereference() has a
> barrier. I don't see anything in common between the two in your
> answer, sorry.

Me, I am wondering what WRC+addr+addr has to do with anything at all.

<Going back through earlier email>

OK, so it looks like Will was asking not about WRC+addr+addr, but instead
about WRC+sync+addr.  This would drop an smp_mb() into cpu2() in my
earlier example, which needs to provide ordering.

I am guessing that the manual's "Older instructions which must be globally
performed when the SYNC instruction completes" provides the equivalent
of ARM/Power A-cumulativity, which can be thought of as transitivity
backwards in time.  This leads me to believe that your smp_mb() needs
to use SYNC rather than SYNC_MB, as was the subject of earlier spirited
discussion in this thread.

Suppose you have something like this:

	void cpu0(void)
	{
		WRITE_ONCE(a, 1);
		SYNC_MB();
		r0 = READ_ONCE(b);
	}

	void cpu1(void)
	{
		WRITE_ONCE(b, 1);
		SYNC_MB();
		r1 = READ_ONCE(c);
	}

	void cpu2(void)
	{
		WRITE_ONCE(c, 1);
		SYNC_MB();
		r2 = READ_ONCE(d);
	}

	void cpu3(void)
	{
		WRITE_ONCE(d, 1);
		SYNC_MB();
		r3 = READ_ONCE(a);
	}

Does your hardware guarantee that it is not possible for all of r0,
r1, r2, and r3 to be equal to zero at the end of the test, assuming
that a, b, c, and d are all initially zero, and the four functions
above run concurrently?  There are many similar litmus tests for other
combinations of reads and writes, but this is perhaps the nastiest from
a hardware viewpoint.  Does SYNC_MB() provide sufficient ordering for
this sort of situation?

Another (more academic) case is this one, with x and y initially zero:

	void cpu0(void)
	{
		WRITE_ONCE(x, 1);
	}

	void cpu1(void)
	{
		WRITE_ONCE(y, 1);
	}

	void cpu2(void)
	{
		r1 = READ_ONCE(x);
		SYNC_MB();
		r2 = READ_ONCE(y);
	}

	void cpu3(void)
	{
		r3 = READ_ONCE(y);
		SYNC_MB();
		r4 = READ_ONCE(x);
	}

Does SYNC_MB() prohibit r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0?

Now, I don't know of any specific use cases for this pattern, but it
is greatly beloved of some of the old-school concurrency community,
so it is likely to crop up at some point, despite my best efforts.  :-/
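
For anyone who wants to poke at the two litmus tests above without
hardware, here is a minimal userspace sketch that brute-forces every
sequentially consistent interleaving of each test.  It says nothing about
what SYNC_MB() does on real MIPS silicon; it only shows the baseline that
a full barrier between the accesses is expected to restore.  The names
struct litmus, reachable(), and the predicate functions are invented for
this illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are invented for this sketch; none are kernel APIs. */
#define MAX_CPUS 4
#define MAX_OPS  2
#define NVARS    4
#define NREGS    5

enum kind { WR, RD };

/* WR: store .arg to variable .var.  RD: load variable .var into register .arg. */
struct op { enum kind kind; int var; int arg; };

struct litmus {
	int ncpus;
	int nops[MAX_CPUS];
	struct op ops[MAX_CPUS][MAX_OPS];
};

enum { A, B, C, D };	/* variables of the first test */
enum { X, Y };		/* variables of the second test */

/* First test: each CPU stores to one variable, then loads the next one. */
const struct litmus sb_ring = {
	.ncpus = 4, .nops = { 2, 2, 2, 2 },
	.ops = {
		{ { WR, A, 1 }, { RD, B, 0 } },
		{ { WR, B, 1 }, { RD, C, 1 } },
		{ { WR, C, 1 }, { RD, D, 2 } },
		{ { WR, D, 1 }, { RD, A, 3 } },
	},
};

/* Second test: two independent writers, two readers in opposite orders. */
const struct litmus iriw = {
	.ncpus = 4, .nops = { 1, 1, 2, 2 },
	.ops = {
		{ { WR, X, 1 } },
		{ { WR, Y, 1 } },
		{ { RD, X, 1 }, { RD, Y, 2 } },
		{ { RD, Y, 3 }, { RD, X, 4 } },
	},
};

bool sb_all_zero(const int *r) { return !r[0] && !r[1] && !r[2] && !r[3]; }
bool sb_all_one(const int *r) { return r[0] && r[1] && r[2] && r[3]; }
bool iriw_disagree(const int *r)
{
	return r[1] == 1 && r[2] == 0 && r[3] == 1 && r[4] == 0;
}

/* Depth-first search over every order that respects each CPU's program order. */
static bool search(const struct litmus *t, bool (*hit)(const int *),
		   int *pc, int *mem, int *regs)
{
	bool finished = true;

	for (int c = 0; c < t->ncpus; c++) {
		if (pc[c] == t->nops[c])
			continue;
		finished = false;
		const struct op *o = &t->ops[c][pc[c]];
		assert(o->var < NVARS && (o->kind == WR || o->arg < NREGS));
		int *dst = o->kind == WR ? &mem[o->var] : &regs[o->arg];
		int saved = *dst;

		*dst = o->kind == WR ? o->arg : mem[o->var];
		pc[c]++;
		bool found = search(t, hit, pc, mem, regs);
		pc[c]--;
		*dst = saved;	/* undo the step before trying the next CPU */
		if (found)
			return true;
	}
	return finished && hit(regs);
}

/* True if some sequentially consistent execution ends in a state matching hit(). */
bool reachable(const struct litmus *t, bool (*hit)(const int *))
{
	int pc[MAX_CPUS] = { 0 }, mem[NVARS] = { 0 }, regs[NREGS] = { 0 };

	return search(t, hit, pc, mem, regs);
}
```

Calling reachable(&sb_ring, sb_all_zero) or reachable(&iriw, iriw_disagree)
from a small main() reports whether the corresponding outcome exists under
sequential consistency.  Any hardware that does exhibit one of those
outcomes with a barrier between the accesses is therefore providing
something weaker than what smp_mb() is expected to give.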

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 22:24                                     ` Paul E. McKenney
@ 2016-01-14 23:04                                       ` Leonid Yegoshin
  0 siblings, 0 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 23:04 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On 01/14/2016 02:24 PM, Paul E. McKenney wrote:
> Actually, the Linux kernel doesn't have an acquire barrier, just an 
> smp_load_acquire(). Or did someone sneak one in while I wasn't looking?
That was exactly the starting point for this discussion. This patch just
removes smp_load_acquire() and smp_store_release() from the MIPS files.
However, half a year ago I posted to LMO the patch
http://patchwork.linux-mips.org/patch/10506/ which replaces the generic
smp_mb with MIPS-specific smp_release/acquire in those functions. That
patch also fixes the use of SYNC barriers in spin_locks/atomics/bitops for
Imagination MIPS CPUs - they are currently absent for all Imagination
MIPS CPUs!

Michael later pointed out to me that it can be reintroduced on top of his
patch series, but the discussion had already started here.

- Leonid.


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 22:55                               ` Paul E. McKenney
@ 2016-01-14 23:33                                 ` Leonid Yegoshin
  2016-01-15  0:47                                   ` Paul E. McKenney
  2016-01-15 10:24                                 ` Will Deacon
  1 sibling, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-14 23:33 UTC (permalink / raw)
  To: paulmck
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On 01/14/2016 02:55 PM, Paul E. McKenney wrote:
> OK, so it looks like Will was asking not about WRC+addr+addr, but instead
> about WRC+sync+addr.
(He actually asked twice about this and that too but skip this)

> I am guessing that the manual's "Older instructions which must be globally
> performed when the SYNC instruction completes" provides the equivalent
> of ARM/Power A-cumulativity, which can be thought of as transitivity
> backwards in time.  This leads me to believe that your smp_mb() needs
> to use SYNC rather than SYNC_MB, as was the subject of earlier spirited
> discussion in this thread.

Don't be fooled here by the words "ordered" and "completed" - they are
HW design terms, and the text is poorly written.
Just assume that SYNC_MB is exactly the same as SYNC for any CPU and
coherent device (apart from performance). The difference can show up with
non-coherent devices, because SYNC tries to act as a barrier for them
too. In some SoCs the two are identical because there is no need for a
barrier on non-coherent devices (device register accesses are usually
strictly ordered... if there is no bridge in between).

>
> Suppose you have something like this:
> ...
> Does your hardware guarantee that it is not possible for all of r0,
> r1, r2, and r3 to be equal to zero at the end of the test, assuming
> that a, b, c, and d are all initially zero, and the four functions
> above run concurrently?

It is assumed to be so from Arch point of view. HW bugs are possible, of 
course.

> Another (more academic) case is this one, with x and y initially zero:
>
> ...
> Does SYNC_MB() prohibit r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0?

It is assumed to be so from Arch point of view. HW bugs are possible, of 
course.

Note: I am not sure about ANY past MIPS R2 CPU, because this feature was
implemented some time ago but nobody used it in the Linux kernel (it was
used by some vendor for a non-Linux system). For that reason my patch for
lightweight SYNCs has an option: use them or use a generic SYNC. It is
possible that some vendor implemented it differently, but nobody knows or
has tested it. At a minimum, though, SYNC must be used in
spinlocks/atomics/bitops; on the recent P5600 it has been shown that a
read can pass a write in atomics.

MIPS R6 is a different story: I verified lightweight SYNCs from the
beginning, and it should use SYNCs too.

- Leonid.


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 23:33                                 ` Leonid Yegoshin
@ 2016-01-15  0:47                                   ` Paul E. McKenney
  2016-01-15  1:07                                     ` Leonid Yegoshin
  2016-01-27 10:40                                     ` Ralf Baechle
  0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15  0:47 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 03:33:40PM -0800, Leonid Yegoshin wrote:
> On 01/14/2016 02:55 PM, Paul E. McKenney wrote:
> >OK, so it looks like Will was asking not about WRC+addr+addr, but instead
> >about WRC+sync+addr.
> (He actually asked twice about this and that too but skip this)

Fair enough!  ;-)

> >I am guessing that the manual's "Older instructions which must be globally
> >performed when the SYNC instruction completes" provides the equivalent
> >of ARM/Power A-cumulativity, which can be thought of as transitivity
> >backwards in time.  This leads me to believe that your smp_mb() needs
> >to use SYNC rather than SYNC_MB, as was the subject of earlier spirited
> >discussion in this thread.
> 
> Don't be fooled here by the words "ordered" and "completed" - they are
> HW design terms, and the text is poorly written.
> Just assume that SYNC_MB is exactly the same as SYNC for any CPU
> and coherent device (apart from performance). The difference can show
> up with non-coherent devices, because SYNC tries to act as a barrier
> for them too. In some SoCs the two are identical because there is no
> need for a barrier on non-coherent devices (device register accesses
> are usually strictly ordered... if there is no bridge in between).

So smp_mb() can be SYNC_MB.  However, mb() needs to be SYNC for MMIO
purposes, correct?

> >Suppose you have something like this:
> >...
> >Does your hardware guarantee that it is not possible for all of r0,
> >r1, r2, and r3 to be equal to zero at the end of the test, assuming
> >that a, b, c, and d are all initially zero, and the four functions
> >above run concurrently?
> 
> It is assumed to be so from Arch point of view. HW bugs are
> possible, of course.

Indeed!

> >Another (more academic) case is this one, with x and y initially zero:
> >
> >...
> >Does SYNC_MB() prohibit r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0?
> 
> It is assumed to be so from Arch point of view. HW bugs are
> possible, of course.

Looks to me like smp_mb() can be SYNC_MB, then.

> Note: I am not sure about ANY past MIPS R2 CPU, because this feature
> was implemented some time ago but nobody used it in the Linux kernel
> (it was used by some vendor for a non-Linux system). For that reason
> my patch for lightweight SYNCs has an option: use them or use a
> generic SYNC. It is possible that some vendor implemented it
> differently, but nobody knows or has tested it. At a minimum, though,
> SYNC must be used in spinlocks/atomics/bitops; on the recent P5600 it
> has been shown that a read can pass a write in atomics.
> 
> MIPS R6 is a different story: I verified lightweight SYNCs from the
> beginning, and it should use SYNCs too.

So you need to build a different kernel for some types of MIPS systems?
Or do you do boot-time rewriting, like a number of other arches do?

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15  0:47                                   ` Paul E. McKenney
@ 2016-01-15  1:07                                     ` Leonid Yegoshin
  2016-01-27 11:26                                       ` Maciej W. Rozycki
  2016-01-27 10:40                                     ` Ralf Baechle
  1 sibling, 1 reply; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-15  1:07 UTC (permalink / raw)
  To: paulmck
  Cc: Will Deacon, Peter Zijlstra, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On 01/14/2016 04:47 PM, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 03:33:40PM -0800, Leonid Yegoshin wrote:
>> Don't be fooled here by the words "ordered" and "completed" - they are
>> HW design terms, and the text is poorly written.
>> Just assume that SYNC_MB is exactly the same as SYNC for any CPU
>> and coherent device (apart from performance). The difference can show
>> up with non-coherent devices, because SYNC tries to act as a barrier
>> for them too. In some SoCs the two are identical because there is no
>> need for a barrier on non-coherent devices (device register accesses
>> are usually strictly ordered... if there is no bridge in between).
> So smp_mb() can be SYNC_MB.  However, mb() needs to be SYNC for MMIO
> purposes, correct?

Absolutely. For MIPS R2 which is not Octeon.

>> Note: I am not sure about ANY past MIPS R2 CPU, because this feature
>> was implemented some time ago but nobody used it in the Linux kernel
>> (it was used by some vendor for a non-Linux system). For that reason
>> my patch for lightweight SYNCs has an option: use them or use a
>> generic SYNC. It is possible that some vendor implemented it
>> differently, but nobody knows or has tested it. At a minimum, though,
>> SYNC must be used in spinlocks/atomics/bitops; on the recent P5600 it
>> has been shown that a read can pass a write in atomics.
>>
>> MIPS R6 is a different story: I verified lightweight SYNCs from the
>> beginning, and it should use SYNCs too.
> So you need to build a different kernel for some types of MIPS systems?
> Or do you do boot-time rewriting, like a number of other arches do?

I don't know. I would like to get some responses. Ralf asked Maciej about
old systems and that went nowhere. Even with rewriting, I don't know what
to do: no lightweight SYNC, or no SYNC at all? Yes, it is still possible
that SYNC on some systems is too heavy or even harmful; nobody has
tested that.

- Leonid.


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 21:29                           ` Paul E. McKenney
  2016-01-14 21:36                             ` Leonid Yegoshin
@ 2016-01-15  8:55                             ` Peter Zijlstra
  2016-01-15  9:13                               ` Peter Zijlstra
  2016-01-15 17:39                               ` Paul E. McKenney
  1 sibling, 2 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-15  8:55 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> So smp_mb() provides transitivity, as do pairs of smp_store_release()
> and smp_load_acquire(),

But they provide different grades of transitivity, which is where all
the confusion lies.

smp_mb() is strongly/globally transitive, all CPUs will agree on the order.

Whereas the RCpc release+acquire is weakly so, only the two cpus
involved in the handover will agree on the order.


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15  8:55                             ` Peter Zijlstra
@ 2016-01-15  9:13                               ` Peter Zijlstra
  2016-01-15 17:46                                 ` Paul E. McKenney
  2016-01-15 17:39                               ` Paul E. McKenney
  1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-15  9:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > and smp_load_acquire(),
> 
> But they provide different grades of transitivity, which is where all
> the confusion lies.
> 
> smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
> 
> Whereas the RCpc release+acquire is weakly so, only the two cpus
> involved in the handover will agree on the order.

And the stuff we're confused about is how best to express the difference
and guarantees of these two forms of transitivity and how exactly they
interact.

And smp_load_acquire()/smp_store_release() are RCpc because of TSO archs
and PPC. The atomic*_{acquire,release}() operations are RCpc because of
PPC, and LOCK/UNLOCK are similarly RCpc because of PPC.

Now we'd like PPC to stick a SYNC in either LOCK or UNLOCK so at least
the locks are RCsc again. They resist for performance reasons, but
waver because they don't want to be the only ones finding all the
nasty bugs.

Now the thing I worry about, and still have not had an answer to, is
whether weakly ordered MIPS will end up being RCsc or RCpc for its locks
if they get implemented with SYNC_ACQUIRE and SYNC_RELEASE instead of the
current SYNC.


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 22:20                               ` Paul E. McKenney
@ 2016-01-15  9:57                                 ` Will Deacon
  2016-01-15 18:54                                   ` Leonid Yegoshin
  2016-01-26 10:24                                 ` Peter Zijlstra
  1 sibling, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-15  9:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

Paul,

On Thu, Jan 14, 2016 at 02:20:46PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 01:24:34PM -0800, Leonid Yegoshin wrote:
> > It is not so simple, I mean "local ordering for address and data
> > dependencies". Local ordering is NOT enough. It happens that current
> > MIPS R6 doesn't require in your example smp_read_barrier_depends()
> > but in discussion it comes out that it may not. Because without
> > smp_read_barrier_depends() your example can be a part of Will's
> > WRC+addr+addr and we found some design which easily can bump into
> > this test. And that design actually performs "local ordering for
> > address and data dependencies" too.
> 
> As noted in another email in this thread, I do not believe that
> WRC+addr+addr needs to be prohibited.  Sounds like Will and I need to
> get our story straight, though.

I think you figured this out while I was sleeping, but just to confirm:

 1. The MIPS64 ISA doc [1] talks about SYNC in a way that applies only
    to memory accesses appearing in *program-order* before the SYNC

 2. We need WRC+sync+addr to work, which means that the SYNC in P1 must
    also capture the store in P0 as being "before" the barrier. Leonid
    reckons it works, but his explanation [2] focussed on the address
    dependency in P2 as to why this works. If that is the case (i.e.
    address dependency provides global transitivity), then WRC+addr+addr
    should also work (even though it's not required).

 3. It seems that WRC+addr+addr doesn't work, so I'm still suspicious
    about WRC+sync+addr, because neither the architecture document nor
    Leonid's explanation tells me that it should be forbidden.

Will

[1] https://imgtec.com/?do-download=4302
[2] http://lkml.kernel.org/r/569565DA.2010903@imgtec.com (scroll to the end)


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 22:55                               ` Paul E. McKenney
  2016-01-14 23:33                                 ` Leonid Yegoshin
@ 2016-01-15 10:24                                 ` Will Deacon
  2016-01-15 17:54                                   ` Paul E. McKenney
  1 sibling, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-15 10:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 02:55:10PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 01:36:50PM -0800, Leonid Yegoshin wrote:
> > On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
> > >
> > >>On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> > >>>
> > >>>The WRC+addr+addr is OK because data dependencies are not required to be
> > >>>transitive, in other words, they are not required to flow from one CPU to
> > >>>another without the help of an explicit memory barrier.
> > >>I don't see any reliable way to fit WRC+addr+addr into "DATA
> > >>DEPENDENCY BARRIERS" section recommendation to have data dependency
> > >>barrier between read of a shared pointer/index and read the shared
> > >>data based on that pointer. If you have this two reads, it doesn't
> > >>matter the rest of scenario, you should put the dependency barrier
> > >>in code anyway. If you don't do it in WRC+addr+addr scenario then
> > >>after years it can be easily changed to different scenario which
> > >>fits some of scenario in "DATA DEPENDENCY BARRIERS" section and
> > >>fails.
> > >The trick is that lockless_dereference() contains an
> > >smp_read_barrier_depends():
> > >
> > >#define lockless_dereference(p) \
> > >({ \
> > >	typeof(p) _________p1 = READ_ONCE(p); \
> > >	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > >	(_________p1); \
> > >})
> > >
> > >Or am I missing your point?
> > 
> > > WRC+addr+addr has no barrier at all. lockless_dereference() has a
> > > barrier. I don't see anything in common between the two in your
> > > answer, sorry.
> 
> Me, I am wondering what WRC+addr+addr has to do with anything at all.

See my earlier reply [1] (but also, your WRC Linux example looks more
like a variant on WWC and I couldn't really follow it).

> <Going back through earlier email>
> 
> OK, so it looks like Will was asking not about WRC+addr+addr, but instead
> about WRC+sync+addr.  This would drop an smp_mb() into cpu2() in my
> earlier example, which needs to provide ordering.
> 
> I am guessing that the manual's "Older instructions which must be globally
> performed when the SYNC instruction completes" provides the equivalent
> of ARM/Power A-cumulativity, which can be thought of as transitivity
> backwards in time. 

I couldn't make that leap. In particular, the manual's "Detailed
Description" sections explicitly refer to program-order:

  Every synchronizable specified memory instruction (loads or stores or
  both) that occurs in the instruction stream before the SYNC
  instruction must reach a stage in the load/store datapath after which
  no instruction re-ordering is possible before any synchronizable
  specified memory instruction which occurs after the SYNC instruction
  in the instruction stream reaches the same stage in the load/store
  datapath.

Will

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/399765.html


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15  8:55                             ` Peter Zijlstra
  2016-01-15  9:13                               ` Peter Zijlstra
@ 2016-01-15 17:39                               ` Paul E. McKenney
  2016-01-15 21:29                                 ` Peter Zijlstra
  2016-01-25 18:02                                 ` Will Deacon
  1 sibling, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 17:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > and smp_load_acquire(),
> 
> But they provide different grades of transitivity, which is where all
> the confusion lies.
> 
> smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
> 
> Whereas the RCpc release+acquire is weakly so, only the two cpus
> involved in the handover will agree on the order.

Good point!

Using grace periods in place of smp_mb() also provides strong/global
transitivity, but also insanely high latencies.  ;-)

The patch below updates Documentation/memory-barriers.txt to define
local vs. global transitivity.  The corresponding ppcmem litmus test
is included below as well.

Should we start putting litmus tests for the various examples
somewhere, perhaps in a litmus-tests directory within each participating
architecture?  I have a pile of powerpc-related litmus tests on my laptop,
but they probably aren't doing all that much good there.

							Thanx, Paul

------------------------------------------------------------------------

PPC local-transitive
""
{
0:r1=1; 0:r2=u; 0:r3=v; 0:r4=x; 0:r5=y; 0:r6=z;
1:r1=1; 1:r2=u; 1:r3=v; 1:r4=x; 1:r5=y; 1:r6=z;
2:r1=1; 2:r2=u; 2:r3=v; 2:r4=x; 2:r5=y; 2:r6=z;
3:r1=1; 3:r2=u; 3:r3=v; 3:r4=x; 3:r5=y; 3:r6=z;
}
 P0           | P1           | P2           | P3           ;
 lwz r9,0(r4) | lwz r9,0(r5) | lwz r9,0(r6) | stw r1,0(r3) ;
 lwsync       | lwsync       | lwsync       | sync         ;
 stw r1,0(r2) | lwz r8,0(r3) | stw r1,0(r4) | lwz r9,0(r2) ;
 lwsync       | lwz r7,0(r2) |              |              ;
 stw r1,0(r5) | lwsync       |              |              ;
              | stw r1,0(r6) |              |              ;
exists
(* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r8=0 /\ 3:r9=0) *)
(* (0:r9=1 /\ 1:r9=1 /\ 2:r9=1) *)
(* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0) *)
(0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0)

------------------------------------------------------------------------

commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Fri Jan 15 09:30:42 2016 -0800

    documentation: Distinguish between local and global transitivity
    
    The introduction of smp_load_acquire() and smp_store_release() had
    the side effect of introducing a weaker notion of transitivity:
    The transitivity of full smp_mb() barriers is global, but that
    of smp_store_release()/smp_load_acquire() chains is local.  This
    commit therefore introduces the notion of local transitivity and
    gives an example.
    
    Reported-by: Peter Zijlstra <peterz@infradead.org>
    Reported-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index c66ba46d8079..d8109ed99342 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
 General barriers are therefore required to ensure that all CPUs agree
 on the combined order of CPU 1's and CPU 2's accesses.
 
-To reiterate, if your code requires transitivity, use general barriers
-throughout.
+General barriers provide "global transitivity", so that all CPUs will
+agree on the order of operations.  In contrast, a chain of release-acquire
+pairs provides only "local transitivity", so that only those CPUs on
+the chain are guaranteed to agree on the combined order of the accesses.
+For example, switching to C code in deference to Herman Hollerith:
+
+	int u, v, x, y, z;
+
+	void cpu0(void)
+	{
+		r0 = smp_load_acquire(&x);
+		WRITE_ONCE(u, 1);
+		smp_store_release(&y, 1);
+	}
+
+	void cpu1(void)
+	{
+		r1 = smp_load_acquire(&y);
+		r4 = READ_ONCE(v);
+		r5 = READ_ONCE(u);
+		smp_store_release(&z, 1);
+	}
+
+	void cpu2(void)
+	{
+		r2 = smp_load_acquire(&z);
+		smp_store_release(&x, 1);
+	}
+
+	void cpu3(void)
+	{
+		WRITE_ONCE(v, 1);
+		smp_mb();
+		r3 = READ_ONCE(u);
+	}
+
+Because cpu0(), cpu1(), and cpu2() participate in a local transitive
+chain of smp_store_release()/smp_load_acquire() pairs, the following
+outcome is prohibited:
+
+	r0 == 1 && r1 == 1 && r2 == 1
+
+Furthermore, because of the release-acquire relationship between cpu0()
+and cpu1(), cpu1() must see cpu0()'s writes, so that the following
+outcome is prohibited:
+
+	r1 == 1 && r5 == 0
+
+However, the transitivity of release-acquire is local to the participating
+CPUs and does not apply to cpu3().  Therefore, the following outcome
+is possible:
+
+	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
+
+Although cpu0(), cpu1(), and cpu2() will see their respective reads and
+writes in order, CPUs not involved in the release-acquire chain might
+well disagree on the order.  This disagreement stems from the fact that
+the weak memory-barrier instructions used to implement smp_load_acquire()
+and smp_store_release() are not required to order prior stores against
+subsequent loads in all cases.  This means that cpu3() can see cpu0()'s
+store to u as happening -after- cpu1()'s load from v, even though
+both cpu0() and cpu1() agree that these two operations occurred in the
+intended order.
+
+However, please keep in mind that smp_load_acquire() is not magic.
+In particular, it simply reads from its argument with ordering.  It does
+-not- ensure that any particular value will be read.  Therefore, the
+following outcome is possible:
+
+	r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
+
+Note that this outcome can happen even on a mythical sequentially
+consistent system where nothing is ever reordered.
+
+To reiterate, if your code requires global transitivity, use general
+barriers throughout.
 
 
 ========================


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15  9:13                               ` Peter Zijlstra
@ 2016-01-15 17:46                                 ` Paul E. McKenney
  2016-01-15 21:27                                   ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 17:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > > and smp_load_acquire(), 
> > 
> > But they provide different grades of transitivity, which is where all
> > the confusion lies.
> > 
> > smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
> > 
> > Whereas the RCpc release+acquire is weakly so, only the two cpus
> > involved in the handover will agree on the order.
> 
> And the stuff we're confused about is how best to express the difference
> and guarantees of these two forms of transitivity and how exactly they
> interact.

Hoping my memory-barrier.txt patch helps here...

> And smp_load_acquire()/smp_store_release() are RCpc because of TSO archs
> and PPC.  The atomic*_{acquire,release}() are RCpc because of PPC, and
> LOCK,UNLOCK are similarly RCpc because of PPC.
> 
> Now we'd like PPC to stick a SYNC in either LOCK or UNLOCK so at least
> the locks are RCsc again.  They resist for performance reasons, but
> waver because being the only one means they would be the ones finding
> all the nasty bugs.

I believe that the relevant proverb said something about starving to
death between two bales of hay...  ;-)

> Now the thing I worry about, and still have not had an answer to is if
> weakly ordered MIPS will end up being RCsc or RCpc for their locks if
> they get implemented with SYNC_ACQUIRE and SYNC_RELEASE instead of the
> current SYNC.

It would be good to have better clarity on this, no two ways about it.

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15 10:24                                 ` Will Deacon
@ 2016-01-15 17:54                                   ` Paul E. McKenney
  2016-01-15 19:28                                     ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 17:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> On Thu, Jan 14, 2016 at 02:55:10PM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 14, 2016 at 01:36:50PM -0800, Leonid Yegoshin wrote:
> > > On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
> > > >
> > > >>On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> > > >>>
> > > >>>The WRC+addr+addr is OK because data dependencies are not required to be
> > > >>>transitive, in other words, they are not required to flow from one CPU to
> > > >>>another without the help of an explicit memory barrier.
> > > > >>I don't see any reliable way to reconcile WRC+addr+addr with the
> > > > >>"DATA DEPENDENCY BARRIERS" section's recommendation to put a data
> > > > >>dependency barrier between the read of a shared pointer/index and
> > > > >>the read of the shared data based on that pointer. If you have
> > > > >>these two reads, the rest of the scenario doesn't matter: you
> > > > >>should put the dependency barrier in the code anyway. If you don't
> > > > >>do it in the WRC+addr+addr scenario, then years later the code can
> > > > >>easily be changed into a different scenario that matches one of
> > > > >>those in the "DATA DEPENDENCY BARRIERS" section and fails.
> > > >The trick is that lockless_dereference() contains an
> > > >smp_read_barrier_depends():
> > > >
> > > >#define lockless_dereference(p) \
> > > >({ \
> > > >	typeof(p) _________p1 = READ_ONCE(p); \
> > > >	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > >	(_________p1); \
> > > >})
> > > >
> > > >Or am I missing your point?
> > > 
> > > > WRC+addr+addr has no barrier at all. lockless_dereference() has a
> > > > barrier. I don't see what the two have in common in your answer,
> > > > sorry.
> > 
> > Me, I am wondering what WRC+addr+addr has to do with anything at all.
> 
> See my earlier reply [1] (but also, your WRC Linux example looks more
> like a variant on WWC and I couldn't really follow it).

I will revisit my WRC Linux example.  And yes, creating litmus tests
that use non-fake dependencies is still a bit of an undertaking.  :-/
I am sure that it will seem more natural with time and experience...

> > <Going back through earlier email>
> > 
> > OK, so it looks like Will was asking not about WRC+addr+addr, but instead
> > about WRC+sync+addr.  This would drop an smp_mb() into cpu2() in my
> > earlier example, which needs to provide ordering.
> > 
> > I am guessing that the manual's "Older instructions which must be globally
> > performed when the SYNC instruction completes" provides the equivalent
> > of ARM/Power A-cumulativity, which can be thought of as transitivity
> > backwards in time. 
> 
> I couldn't make that leap. In particular, the manual's "Detailed
> Description" sections explicitly refer to program-order:
> 
>   Every synchronizable specified memory instruction (loads or stores or
>   both) that occurs in the instruction stream before the SYNC
>   instruction must reach a stage in the load/store datapath after which
>   no instruction re-ordering is possible before any synchronizable
>   specified memory instruction which occurs after the SYNC instruction
>   in the instruction stream reaches the same stage in the load/store
>   datapath.
> 
> Will
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/399765.html

All good points.  I think we all agree that the MIPS documentation could
use significant help.  And given that I work for the company that produced
the analogous documentation for PowerPC, that is saying something.  ;-)

We simply can't know if MIPS's memory ordering is sufficient for the
Linux kernel given its current implementation of the ordering primitives
and its current documentation.

I feel a bit better than I did earlier due to Leonid's response to my
earlier litmus-test examples.  But I do recommend some serious stress
testing of MIPS on a good set of litmus tests.  Much nicer finding issues
that way than as random irreproducible strange behavior!

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15  9:57                                 ` Will Deacon
@ 2016-01-15 18:54                                   ` Leonid Yegoshin
  0 siblings, 0 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-15 18:54 UTC (permalink / raw)
  To: Will Deacon, Paul E. McKenney
  Cc: Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ralf Baechle, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman

On 01/15/2016 01:57 AM, Will Deacon wrote:
> Paul,
>
>
> I think you figured this out while I was sleeping, but just to confirm:
>
>   1. The MIPS64 ISA doc [1] talks about SYNC in a way that applies only
>      to memory accesses appearing in *program-order* before the SYNC
>
>   2. We need WRC+sync+addr to work, which means that the SYNC in P1 must
>      also capture the store in P0 as being "before" the barrier. Leonid
>      reckons it works, but his explanation [2] focussed on the address
>      dependency in P2 as to why this works. If that is the case (i.e.
>      address dependency provides global transitivity), then WRC+addr+addr
>      should also work (even though it's not required).

No, that is not correct. There is one old design which lets the threads
of a core (thread0 + thread1) load from that core's write buffers before
the data is visible to other cores. This means that WRC+sync+addr passes
because of the SYNC in the writing thread and the register dependency
inside the other thread, but WRC+addr+addr may fail because another core
may get stale data.

>
>   3. It seems that WRC+addr+addr doesn't work, so I'm still suspicious
>      about WRC+sync+addr, because neither the architecture document nor
>      Leonid's explanation tells me that it should be forbidden.
>
> Will
>
> [1] https://imgtec.com/?do-download=4302
> [2] http://lkml.kernel.org/r/569565DA.2010903@imgtec.com (scroll to the end)


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15 17:54                                   ` Paul E. McKenney
@ 2016-01-15 19:28                                     ` Paul E. McKenney
  2016-01-25 14:41                                       ` Will Deacon
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 19:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> > On Thu, Jan 14, 2016 at 02:55:10PM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 14, 2016 at 01:36:50PM -0800, Leonid Yegoshin wrote:
> > > > On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
> > > > >
> > > > >>On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> > > > >>>
> > > > >>>The WRC+addr+addr is OK because data dependencies are not required to be
> > > > >>>transitive, in other words, they are not required to flow from one CPU to
> > > > >>>another without the help of an explicit memory barrier.
> > > > > >>I don't see any reliable way to reconcile WRC+addr+addr with the
> > > > > >>"DATA DEPENDENCY BARRIERS" section's recommendation to put a data
> > > > > >>dependency barrier between the read of a shared pointer/index and
> > > > > >>the read of the shared data based on that pointer. If you have
> > > > > >>these two reads, the rest of the scenario doesn't matter: you
> > > > > >>should put the dependency barrier in the code anyway. If you don't
> > > > > >>do it in the WRC+addr+addr scenario, then years later the code can
> > > > > >>easily be changed into a different scenario that matches one of
> > > > > >>those in the "DATA DEPENDENCY BARRIERS" section and fails.
> > > > >The trick is that lockless_dereference() contains an
> > > > >smp_read_barrier_depends():
> > > > >
> > > > >#define lockless_dereference(p) \
> > > > >({ \
> > > > >	typeof(p) _________p1 = READ_ONCE(p); \
> > > > >	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > > >	(_________p1); \
> > > > >})
> > > > >
> > > > >Or am I missing your point?
> > > > 
> > > > > WRC+addr+addr has no barrier at all. lockless_dereference() has a
> > > > > barrier. I don't see what the two have in common in your answer,
> > > > > sorry.
> > > 
> > > Me, I am wondering what WRC+addr+addr has to do with anything at all.
> > 
> > See my earlier reply [1] (but also, your WRC Linux example looks more
> > like a variant on WWC and I couldn't really follow it).
> 
> I will revisit my WRC Linux example.  And yes, creating litmus tests
> that use non-fake dependencies is still a bit of an undertaking.  :-/
> I am sure that it will seem more natural with time and experience...

Hmmm...  You are quite right, I did do WWC.  I need to change cpu2()'s
last access from a store to a load to get WRC.  Plus the levels of
indirection definitely didn't match up, did they?

	struct foo {
		struct foo *next;
	};
	struct foo a;
	struct foo b;
	struct foo c = { &a };
	struct foo d = { &b };
	struct foo x = { &c };
	struct foo y = { &d };
	struct foo *r1, *r2, *r3;

	void cpu0(void)
	{
		WRITE_ONCE(x.next, &y);
	}

	void cpu1(void)
	{
		r1 = lockless_dereference(x.next);
		WRITE_ONCE(r1->next, &x);
	}

	void cpu2(void)
	{
		r2 = lockless_dereference(y.next);
		r3 = READ_ONCE(r2->next);
	}

In this case, it is legal to end the run with:

	r1 == &y && r2 == &x && r3 == &c
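
As a sanity check on why this outcome is interesting, here is a rough
userspace sketch (an illustration only, not taken from any kernel tree)
that replays every sequentially consistent interleaving of the three
functions and counts how many of them reach the outcome above.  Plain
loads and stores stand in for WRITE_ONCE(), READ_ONCE(), and
lockless_dereference(), which is an assumption that holds only because
each schedule is replayed on a single thread.  The helper names reset(),
step(), and sc_schedules_reaching_outcome() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
	struct foo *next;
};

static struct foo a, b, c, d, x, y;
static struct foo *r1, *r2, *r3;

/* Restore the initial state used by the example. */
static void reset(void)
{
	c.next = &a;
	d.next = &b;
	x.next = &c;
	y.next = &d;
	r1 = r2 = r3 = NULL;
}

/* Execute step number pc of cpuN(), where N == cpu. */
static void step(int cpu, int pc)
{
	switch (cpu * 2 + pc) {
	case 0:	/* cpu0: WRITE_ONCE(x.next, &y) */
		x.next = &y;
		break;
	case 2:	/* cpu1: r1 = lockless_dereference(x.next) */
		r1 = x.next;
		break;
	case 3:	/* cpu1: WRITE_ONCE(r1->next, &x) */
		assert(r1 != NULL);
		r1->next = &x;
		break;
	case 4:	/* cpu2: r2 = lockless_dereference(y.next) */
		r2 = y.next;
		break;
	case 5:	/* cpu2: r3 = READ_ONCE(r2->next) */
		assert(r2 != NULL);
		r3 = r2->next;
		break;
	}
}

/*
 * Try all 3^5 assignments of steps to CPUs, keep those that run cpu0
 * once and cpu1/cpu2 twice each (program order is preserved by the
 * per-CPU pc), and count the schedules that end in the outcome.
 */
int sc_schedules_reaching_outcome(int *total)
{
	int hits = 0;

	*total = 0;
	for (int code = 0; code < 243; code++) {
		int sched[5], n[3] = { 0, 0, 0 }, pc[3] = { 0, 0, 0 };
		int t = code;

		for (int i = 0; i < 5; i++) {
			sched[i] = t % 3;
			t /= 3;
			n[sched[i]]++;
		}
		if (n[0] != 1 || n[1] != 2 || n[2] != 2)
			continue;

		reset();
		for (int i = 0; i < 5; i++)
			step(sched[i], pc[sched[i]]++);
		(*total)++;
		if (r1 == &y && r2 == &x && r3 == &c)
			hits++;
	}
	return hits;
}
```

Under these assumptions the count is zero across all 30 schedules, so
the outcome cannot be explained by interleaving alone.  The sketch says
nothing about what weakly ordered hardware may do, which is what the
litmus test is for.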

Please see below for a ppcmem litmus test.

So, did I get it right this time?  ;-)

							Thanx, Paul

PS.  And yes, working through this does help me understand the
     benefits of fake dependencies.  Why do you ask?  ;-)

------------------------------------------------------------------------

PPC WRCnf+addrs
""
{
0:r2=x; 0:r3=y;
1:r2=x; 1:r3=y;
2:r2=x; 2:r3=y;
c=a; d=b; x=c; y=d;
}
 P0           | P1            | P2            ;
 stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
              | stw r2,0(r3)  | lwz r9,0(r8)  ;
exists
(1:r8=y /\ 2:r8=x /\ 2:r9=c)


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15 17:46                                 ` Paul E. McKenney
@ 2016-01-15 21:27                                   ` Peter Zijlstra
  2016-01-15 21:58                                     ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-15 21:27 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:

> > And the stuff we're confused about is how best to express the difference
> > and guarantees of these two forms of transitivity and how exactly they
> > interact.
> 
> Hoping my memory-barrier.txt patch helps here...

Yes, that seems a good start. But yesterday you raised the 'fun' point
of two globally ordered sequences connected by a single local link.

And I think I'm still confused on LWSYNC (in the smp_wmb case) when one
of the stores loses a conflict, and if that scenario matters. If it
does, we should inspect the same case for other barriers.


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15 17:39                               ` Paul E. McKenney
@ 2016-01-15 21:29                                 ` Peter Zijlstra
  2016-01-15 22:01                                   ` Paul E. McKenney
  2016-01-25 18:02                                 ` Will Deacon
  1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-15 21:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 09:39:12AM -0800, Paul E. McKenney wrote:
> Should we start putting litmus tests for the various examples
> somewhere, perhaps in a litmus-tests directory within each participating
> architecture?  I have a pile of powerpc-related litmus tests on my laptop,
> but they probably aren't doing all that much good there.

Yeah, or a version of them in C that we can 'compile'?
> 
> commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Fri Jan 15 09:30:42 2016 -0800
> 
>     documentation: Distinguish between local and global transitivity
>     
>     The introduction of smp_load_acquire() and smp_store_release() had
>     the side effect of introducing a weaker notion of transitivity:
>     The transitivity of full smp_mb() barriers is global, but that
>     of smp_store_release()/smp_load_acquire() chains is local.  This
>     commit therefore introduces the notion of local transitivity and
>     gives an example.
>     
>     Reported-by: Peter Zijlstra <peterz@infradead.org>
>     Reported-by: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

I think it fails to mention smp_mb__after_release_acquire(), although I
suspect we didn't actually introduce the primitive yet, which raises the
point, do we want to?


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15 21:27                                   ` Peter Zijlstra
@ 2016-01-15 21:58                                     ` Paul E. McKenney
  2016-01-25 16:42                                       ` Will Deacon
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 21:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> 
> > > And the stuff we're confused about is how best to express the difference
> > > and guarantees of these two forms of transitivity and how exactly they
> > > interact.
> > 
> > Hoping my memory-barrier.txt patch helps here...
> 
> Yes, that seems a good start. But yesterday you raised the 'fun' point
> of two globally ordered sequences connected by a single local link.

The conclusion that I am slowly coming to is that litmus tests should
not be thought of as linear chains, but rather as cycles.  If you think
of it as a cycle, then it doesn't matter where the local link is, just
how many of them and how they are connected.

But I will admit that there are some rather strange litmus tests that
challenge this cycle-centric view, for example, the one shown below.
It turns out that herd and ppcmem disagree on the outcome.  (The Power
architects side with ppcmem.)

> And I think I'm still confused on LWSYNC (in the smp_wmb case) when one
> of the stores loses a conflict, and if that scenario matters. If it
> does, we should inspect the same case for other barriers.

Indeed.  I am still working on how these should be described.  My
current thought is to be quite conservative on what ordering is
actually respected.  However, the current task is formalizing how
RCU plays with the rest of the memory model.

							Thanx, Paul

------------------------------------------------------------------------

PPC Overlapping Group-B sets version 4
""
(* When the Group-B sets from two different barriers involve instructions in
   the same thread, within that thread one set must contain the other.

	P0	P1	P2
	Rx=1	Wy=1	Wz=2
	dep.	lwsync	lwsync
	Ry=0	Wz=1	Wx=1
	Rz=1

	assert(!(z=2))

   Forbidden by ppcmem, allowed by herd.
*)
{
0:r1=x; 0:r2=y; 0:r3=z;
1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
}
 P0		| P1		| P2		;
 lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
 xor r7,r6,r6	| lwsync	| lwsync	;
 lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
 lwz r8,0(r3)	|		|		;

exists
(z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15 21:29                                 ` Peter Zijlstra
@ 2016-01-15 22:01                                   ` Paul E. McKenney
  0 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-15 22:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 10:29:12PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 15, 2016 at 09:39:12AM -0800, Paul E. McKenney wrote:
> > Should we start putting litmus tests for the various examples
> > somewhere, perhaps in a litmus-tests directory within each participating
> > architecture?  I have a pile of powerpc-related litmus tests on my laptop,
> > but they probably aren't doing all that much good there.
> 
> Yeah, or a version of them in C that we can 'compile'?

That would be good as well.  I am guessing that architecture-specific
litmus tests will also be needed, but you are right that
architecture-independent versions are higher priority.

> > commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Fri Jan 15 09:30:42 2016 -0800
> > 
> >     documentation: Distinguish between local and global transitivity
> >     
> >     The introduction of smp_load_acquire() and smp_store_release() had
> >     the side effect of introducing a weaker notion of transitivity:
> >     The transitivity of full smp_mb() barriers is global, but that
> >     of smp_store_release()/smp_load_acquire() chains is local.  This
> >     commit therefore introduces the notion of local transitivity and
> >     gives an example.
> >     
> >     Reported-by: Peter Zijlstra <peterz@infradead.org>
> >     Reported-by: Will Deacon <will.deacon@arm.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> I think it fails to mention smp_mb__after_release_acquire(), although I
> suspect we didn't actually introduce the primitive yet, which raises the
> point, do we want to?

Well, it is not in v4.4.  I believe that we need good use cases before
we add it.

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 20:48                           ` Paul E. McKenney
  2016-01-14 21:24                             ` Leonid Yegoshin
@ 2016-01-18  8:19                             ` Herbert Xu
  2016-01-18 15:46                               ` Paul E. McKenney
  1 sibling, 1 reply; 153+ messages in thread
From: Herbert Xu @ 2016-01-18  8:19 UTC (permalink / raw)
  To: paulmck
  Cc: Leonid.Yegoshin, linux-mips, linux-ia64, mst, peterz,
	will.deacon, virtualization, hpa, sparclinux, mingo, linux-arch,
	linux-s390, linux, user-mode-linux-devel, linux-sh, mpe, x86,
	xen-devel, mingo, linux-xtensa, james.hogan, arnd,
	stefano.stabellini, adi-buildroot-devel, ddaney.cavm, tglx,
	linux-metag, linux-arm-kernel, andrew.cooper3, linux-kernel,
	ralf, joe, linuxppc-dev, davem, Linus Torvalds

Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>
> You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
> smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.
> The reason for this is that smp_read_barrier_depends() must order the
> pointer load against any subsequent read or write through a dereference
> of that pointer.  For example:
> 
>        p = READ_ONCE(gp);
>        smp_rmb();
>        r1 = p->a; /* ordered by smp_rmb(). */
>        p->b = 42; /* NOT ordered by smp_rmb(), BUG!!! */
>        r2 = x; /* ordered by smp_rmb(), but doesn't need to be. */
> 
> In contrast:
> 
>        p = READ_ONCE(gp);
>        smp_read_barrier_depends();
>        r1 = p->a; /* ordered by smp_read_barrier_depends(). */
>        p->b = 42; /* ordered by smp_read_barrier_depends(). */
>        r2 = x; /* not ordered by smp_read_barrier_depends(), which is OK. */
> 
> Again, if your hardware maintains local ordering for address
> and data dependencies, you can have read_barrier_depends() and
> smp_read_barrier_depends() be no-ops like they are for most
> architectures.
> 
> Does that help?

This is crazy! smp_rmb started out being strictly stronger than
smp_read_barrier_depends, when did this stop being the case?
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-18  8:19                             ` Herbert Xu
@ 2016-01-18 15:46                               ` Paul E. McKenney
  2016-01-26 16:52                                 ` Boqun Feng
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-18 15:46 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Leonid.Yegoshin, linux-mips, linux-ia64, mst, peterz,
	will.deacon, virtualization, hpa, sparclinux, mingo, linux-arch,
	linux-s390, linux, user-mode-linux-devel, linux-sh, mpe, x86,
	xen-devel, mingo, linux-xtensa, james.hogan, arnd,
	stefano.stabellini, adi-buildroot-devel, ddaney.cavm, tglx,
	linux-metag, linux-arm-kernel, andrew.cooper3, linux-kernel,
	ralf, joe, linuxppc-dev, davem, Linus Torvalds

On Mon, Jan 18, 2016 at 04:19:29PM +0800, Herbert Xu wrote:
> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> >
> > You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
> > smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.
> > The reason for this is that smp_read_barrier_depends() must order the
> > pointer load against any subsequent read or write through a dereference
> > of that pointer.  For example:
> > 
> >        p = READ_ONCE(gp);
> >        smp_rmb();
> >        r1 = p->a; /* ordered by smp_rmb(). */
> >        p->b = 42; /* NOT ordered by smp_rmb(), BUG!!! */
> >        r2 = x; /* ordered by smp_rmb(), but doesn't need to be. */
> > 
> > In contrast:
> > 
> >        p = READ_ONCE(gp);
> >        smp_read_barrier_depends();
> >        r1 = p->a; /* ordered by smp_read_barrier_depends(). */
> >        p->b = 42; /* ordered by smp_read_barrier_depends(). */
> >        r2 = x; /* not ordered by smp_read_barrier_depends(), which is OK. */
> > 
> > Again, if your hardware maintains local ordering for address
> > and data dependencies, you can have read_barrier_depends() and
> > smp_read_barrier_depends() be no-ops like they are for most
> > architectures.
> > 
> > Does that help?
> 
> This is crazy! smp_rmb started out being strictly stronger than
> smp_read_barrier_depends, when did this stop being the case?

Hello, Herbert!

It is true that most Linux kernel code relies only on the read-read
properties of dependencies, but the read-write properties are useful.
Admittedly relatively rarely, but useful.

The better comparison for smp_read_barrier_depends(), especially in
its rcu_dereference*() form, is smp_load_acquire().
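
For readers more at home with C11 than with the kernel primitives, the
comparison can be sketched in userspace roughly as follows.  This is an
analogy under stated assumptions, not kernel code: memory_order_acquire
stands in for smp_load_acquire(), memory_order_release stands in for
smp_store_release(), and C11 semantics are not identical to the
kernel's.  The names struct obj, publish(), and consume() are
hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct obj {
	int a;
	int b;
};

static struct obj storage;
static _Atomic(struct obj *) gp;	/* starts out NULL */

/* Initialize the object, then publish it with release semantics. */
void publish(int a)
{
	storage.a = a;
	storage.b = 0;
	atomic_store_explicit(&gp, &storage, memory_order_release);
}

/*
 * Load the pointer with acquire semantics.  Both the later read of
 * p->a and the later write to p->b are ordered after the pointer
 * load, which is the read-write property discussed above.
 * Returns -1 if nothing has been published yet.
 */
int consume(int new_b)
{
	struct obj *p = atomic_load_explicit(&gp, memory_order_acquire);
	int r1;

	if (!p)
		return -1;
	r1 = p->a;	/* read through the pointer */
	p->b = new_b;	/* write through the pointer */
	return r1;
}
```

An acquire load orders all later accesses, not just the dependent ones,
so it is stronger than what smp_read_barrier_depends() is required to
provide.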

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15 19:28                                     ` Paul E. McKenney
@ 2016-01-25 14:41                                       ` Will Deacon
  2016-01-26  1:06                                         ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-25 14:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 11:28:45AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> > > See my earlier reply [1] (but also, your WRC Linux example looks more
> > > like a variant on WWC and I couldn't really follow it).
> > 
> > I will revisit my WRC Linux example.  And yes, creating litmus tests
> > that use non-fake dependencies is still a bit of an undertaking.  :-/
> > I am sure that it will seem more natural with time and experience...
> 
> Hmmm...  You are quite right, I did do WWC.  I need to change cpu2()'s
> last access from a store to a load to get WRC.  Plus the levels of
> indirection definitely didn't match up, did they?

Nope, it was pretty baffling!

> 	struct foo {
> 		struct foo *next;
> 	};
> 	struct foo a;
> 	struct foo b;
> 	struct foo c = { &a };
> 	struct foo d = { &b };
> 	struct foo x = { &c };
> 	struct foo y = { &d };
> 	struct foo *r1, *r2, *r3;
> 
> 	void cpu0(void)
> 	{
> 		WRITE_ONCE(x.next, &y);
> 	}
> 
> 	void cpu1(void)
> 	{
> 		r1 = lockless_dereference(x.next);
> 		WRITE_ONCE(r1->next, &x);
> 	}
> 
> 	void cpu2(void)
> 	{
> 		r2 = lockless_dereference(y.next);
> 		r3 = READ_ONCE(r2->next);
> 	}
> 
> In this case, it is legal to end the run with:
> 
> 	r1 == &y && r2 == &x && r3 == &c
> 
> Please see below for a ppcmem litmus test.
> 
> So, did I get it right this time?  ;-)

The code above looks correct to me (in that it matches WRC+addrs),
but your litmus test:

> PPC WRCnf+addrs
> ""
> {
> 0:r2=x; 0:r3=y;
> 1:r2=x; 1:r3=y;
> 2:r2=x; 2:r3=y;
> c=a; d=b; x=c; y=d;
> }
>  P0           | P1            | P2            ;
>  stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
>               | stw r2,0(r3)  | lwz r9,0(r8)  ;
> exists
> (1:r8=y /\ 2:r8=x /\ 2:r9=c)

Seems to be missing the address dependency on P1.

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15 21:58                                     ` Paul E. McKenney
@ 2016-01-25 16:42                                       ` Will Deacon
  2016-01-26  6:03                                         ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-25 16:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> > On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> > 
> > > > And the stuff we're confused about is how best to express the difference
> > > > and guarantees of these two forms of transitivity and how exactly they
> > > > interact.
> > > 
> > > > Hoping my memory-barriers.txt patch helps here...
> > 
> > Yes, that seems a good start. But yesterday you raised the 'fun' point
> > of two globally ordered sequences connected by a single local link.
> 
> The conclusion that I am slowly coming to is that litmus tests should
> not be thought of as linear chains, but rather as cycles.  If you think
> of it as a cycle, then it doesn't matter where the local link is, just
> how many of them and how they are connected.

Do you have some examples of this? I'm struggling to make it work in my
mind, or are you talking specifically in the context of the kernel
memory model?

> But I will admit that there are some rather strange litmus tests that
> challenge this cycle-centric view, for example, the one shown below.
> It turns out that herd and ppcmem disagree on the outcome.  (The Power
> architects side with ppcmem.)
> 
> > And I think I'm still confused on LWSYNC (in the smp_wmb case) when one
> > of the stores loses a conflict, and if that scenario matters. If it
> > does, we should inspect the same case for other barriers.
> 
> Indeed.  I am still working on how these should be described.  My
> current thought is to be quite conservative on what ordering is
> actually respected.  However, the current task is formalizing how
> RCU plays with the rest of the memory model.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> PPC Overlapping Group-B sets version 4
> ""
> (* When the Group-B sets from two different barriers involve instructions in
>    the same thread, within that thread one set must contain the other.
> 
> 	P0	P1	P2
> 	Rx=1	Wy=1	Wz=2
> 	dep.	lwsync	lwsync
> 	Ry=0	Wz=1	Wx=1
> 	Rz=1
> 
> 	assert(!(z=2))
> 
>    Forbidden by ppcmem, allowed by herd.
> *)
> {
> 0:r1=x; 0:r2=y; 0:r3=z;
> 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> }
>  P0		| P1		| P2		;
>  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
>  xor r7,r6,r6	| lwsync	| lwsync	;
>  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
>  lwz r8,0(r3)	|		|		;
> 
> exists
> (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)

That really hurts. Assuming that the "assert(!(z=2))" is actually there
to constrain the coherence order of z to be {0->1->2}, then I think that
this test is forbidden on arm using dmb instead of lwsync. That said, I
also don't think the Rz=1 in P0 changes anything.

The double negatives don't help here! (it is forbidden to guarantee that
z is not always 2).

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15 17:39                               ` Paul E. McKenney
  2016-01-15 21:29                                 ` Peter Zijlstra
@ 2016-01-25 18:02                                 ` Will Deacon
  2016-01-26  6:12                                   ` Paul E. McKenney
  1 sibling, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-25 18:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

Hi Paul,

On Fri, Jan 15, 2016 at 09:39:12AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > > and smp_load_acquire(), 
> > 
> > But they provide different grades of transitivity, which is where all
> > the confusion lies.
> > 
> > smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
> > 
> > Whereas the RCpc release+acquire is weakly so, only the two cpus
> > involved in the handover will agree on the order.
> 
> Good point!
> 
> Using grace periods in place of smp_mb() also provides strong/global
> transitivity, but also insanely high latencies.  ;-)
> 
> The patch below updates Documentation/memory-barriers.txt to define
> local vs. global transitivity.  The corresponding ppcmem litmus test
> is included below as well.
> 
> Should we start putting litmus tests for the various examples
> somewhere, perhaps in a litmus-tests directory within each participating
> architecture?  I have a pile of powerpc-related litmus tests on my laptop,
> but they probably aren't doing all that much good there.

I too would like to have the litmus tests in the kernel so that we can
refer to them from memory-barriers.txt. Ideally they wouldn't be targeted
at a particular arch, however.

> PPC local-transitive
> ""
> {
> 0:r1=1; 0:r2=u; 0:r3=v; 0:r4=x; 0:r5=y; 0:r6=z;
> 1:r1=1; 1:r2=u; 1:r3=v; 1:r4=x; 1:r5=y; 1:r6=z;
> 2:r1=1; 2:r2=u; 2:r3=v; 2:r4=x; 2:r5=y; 2:r6=z;
> 3:r1=1; 3:r2=u; 3:r3=v; 3:r4=x; 3:r5=y; 3:r6=z;
> }
>  P0           | P1           | P2           | P3           ;
>  lwz r9,0(r4) | lwz r9,0(r5) | lwz r9,0(r6) | stw r1,0(r3) ;
>  lwsync       | lwsync       | lwsync       | sync         ;
>  stw r1,0(r2) | lwz r8,0(r3) | stw r1,0(r7) | lwz r9,0(r2) ;
>  lwsync       | lwz r7,0(r2) |              |              ;
>  stw r1,0(r5) | lwsync       |              |              ;
>               | stw r1,0(r6) |              |              ;
> exists
> (* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r8=0 /\ 3:r9=0) *)
> (* (0:r9=1 /\ 1:r9=1 /\ 2:r9=1) *)
> (* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0) *)
> (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0)

i.e. we should rewrite this using READ_ONCE/WRITE_ONCE and smp_mb() etc.

> ------------------------------------------------------------------------
> 
> commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Fri Jan 15 09:30:42 2016 -0800
> 
>     documentation: Distinguish between local and global transitivity
>     
>     The introduction of smp_load_acquire() and smp_store_release() had
>     the side effect of introducing a weaker notion of transitivity:
>     The transitivity of full smp_mb() barriers is global, but that
>     of smp_store_release()/smp_load_acquire() chains is local.  This
>     commit therefore introduces the notion of local transitivity and
>     gives an example.
>     
>     Reported-by: Peter Zijlstra <peterz@infradead.org>
>     Reported-by: Will Deacon <will.deacon@arm.com>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index c66ba46d8079..d8109ed99342 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
>  General barriers are therefore required to ensure that all CPUs agree
>  on the combined order of CPU 1's and CPU 2's accesses.
>  
> -To reiterate, if your code requires transitivity, use general barriers
> -throughout.
> +General barriers provide "global transitivity", so that all CPUs will
> +agree on the order of operations.  In contrast, a chain of release-acquire
> +pairs provides only "local transitivity", so that only those CPUs on
> +the chain are guaranteed to agree on the combined order of the accesses.

Thanks for having a go at this. I tried defining something axiomatically,
but got stuck pretty quickly. In my scheme, I used "data-directed
transitivity" instead of "local transitivity", since the latter seems to
be a bit of a misnomer.

> +For example, switching to C code in deference to Herman Hollerith:
> +
> +	int u, v, x, y, z;
> +
> +	void cpu0(void)
> +	{
> +		r0 = smp_load_acquire(&x);
> +		WRITE_ONCE(u, 1);
> +		smp_store_release(&y, 1);
> +	}
> +
> +	void cpu1(void)
> +	{
> +		r1 = smp_load_acquire(&y);
> +		r4 = READ_ONCE(v);
> +		r5 = READ_ONCE(u);
> +		smp_store_release(&z, 1);
> +	}
> +
> +	void cpu2(void)
> +	{
> +		r2 = smp_load_acquire(&z);
> +		smp_store_release(&x, 1);
> +	}
> +
> +	void cpu3(void)
> +	{
> +		WRITE_ONCE(v, 1);
> +		smp_mb();
> +		r3 = READ_ONCE(u);
> +	}
> +
> +Because cpu0(), cpu1(), and cpu2() participate in a local transitive
> +chain of smp_store_release()/smp_load_acquire() pairs, the following
> +outcome is prohibited:
> +
> +	r0 == 1 && r1 == 1 && r2 == 1
> +
> +Furthermore, because of the release-acquire relationship between cpu0()
> +and cpu1(), cpu1() must see cpu0()'s writes, so that the following
> +outcome is prohibited:
> +
> +	r1 == 1 && r5 == 0
> +
> +However, the transitivity of release-acquire is local to the participating
> +CPUs and does not apply to cpu3().  Therefore, the following outcome
> +is possible:
> +
> +	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0

I think you should be completely explicit and include r5 == 1 here, too.

Also -- where would you add the smp_mb__after_release_acquire to fix
(i.e. forbid) this? Immediately after cpu1()'s read of y?

> +Although cpu0(), cpu1(), and cpu2() will see their respective reads and
> +writes in order, CPUs not involved in the release-acquire chain might
> +well disagree on the order.  This disagreement stems from the fact that
> +the weak memory-barrier instructions used to implement smp_load_acquire()
> +and smp_store_release() are not required to order prior stores against
> +subsequent loads in all cases.  This means that cpu3() can see cpu0()'s
> +store to u as happening -after- cpu1()'s load from v, even though
> +both cpu0() and cpu1() agree that these two operations occurred in the
> +intended order.
> +
> +However, please keep in mind that smp_load_acquire() is not magic.
> +In particular, it simply reads from its argument with ordering.  It does
> +-not- ensure that any particular value will be read.  Therefore, the
> +following outcome is possible:
> +
> +	r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
> +
> +Note that this outcome can happen even on a mythical sequentially
> +consistent system where nothing is ever reordered.

I'm not sure this last bit is strictly necessary. If somebody thinks that
acquire/release involve some sort of implicit synchronisation, I think
they may have bigger problems with memory-barriers.txt.

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-25 14:41                                       ` Will Deacon
@ 2016-01-26  1:06                                         ` Paul E. McKenney
  2016-01-26 12:10                                           ` Will Deacon
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26  1:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Mon, Jan 25, 2016 at 02:41:34PM +0000, Will Deacon wrote:
> On Fri, Jan 15, 2016 at 11:28:45AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> > > > See my earlier reply [1] (but also, your WRC Linux example looks more
> > > > like a variant on WWC and I couldn't really follow it).
> > > 
> > > I will revisit my WRC Linux example.  And yes, creating litmus tests
> > > that use non-fake dependencies is still a bit of an undertaking.  :-/
> > > I am sure that it will seem more natural with time and experience...
> > 
> > Hmmm...  You are quite right, I did do WWC.  I need to change cpu2()'s
> > last access from a store to a load to get WRC.  Plus the levels of
> > indirection definitely didn't match up, did they?
> 
> Nope, it was pretty baffling!

"It is a service that I provide."  ;-)

> > 	struct foo {
> > 		struct foo *next;
> > 	};
> > 	struct foo a;
> > 	struct foo b;
> > 	struct foo c = { &a };
> > 	struct foo d = { &b };
> > 	struct foo x = { &c };
> > 	struct foo y = { &d };
> > 	struct foo *r1, *r2, *r3;
> > 
> > 	void cpu0(void)
> > 	{
> > 		WRITE_ONCE(x.next, &y);
> > 	}
> > 
> > 	void cpu1(void)
> > 	{
> > 		r1 = lockless_dereference(x.next);
> > 		WRITE_ONCE(r1->next, &x);
> > 	}
> > 
> > 	void cpu2(void)
> > 	{
> > 		r2 = lockless_dereference(y.next);
> > 		r3 = READ_ONCE(r2->next);
> > 	}
> > 
> > In this case, it is legal to end the run with:
> > 
> > 	r1 == &y && r2 == &x && r3 == &c
> > 
> > Please see below for a ppcmem litmus test.
> > 
> > So, did I get it right this time?  ;-)
> 
> The code above looks correct to me (in that it matches WRC+addrs),
> but your litmus test:
> 
> > PPC WRCnf+addrs
> > ""
> > {
> > 0:r2=x; 0:r3=y;
> > 1:r2=x; 1:r3=y;
> > 2:r2=x; 2:r3=y;
> > c=a; d=b; x=c; y=d;
> > }
> >  P0           | P1            | P2            ;
> >  stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
> >               | stw r2,0(r3)  | lwz r9,0(r8)  ;
> > exists
> > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
> 
> Seems to be missing the address dependency on P1.

You are quite correct!  How about the following?

As before, both herd and ppcmem say that the cycle is allowed, as
expected, given non-transitive ordering.  To prohibit the cycle, P1
needs a suitable memory-barrier instruction.

							Thanx, Paul

------------------------------------------------------------------------

PPC WRCnf+addrs
""
{
0:r2=x; 0:r3=y;
1:r2=x; 1:r3=y;
2:r2=x; 2:r3=y;
c=a; d=b; x=c; y=d;
}
 P0           | P1            | P2            ;
 stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
              | stw r2,0(r8)  | lwz r9,0(r8)  ;
exists
(1:r8=y /\ 2:r8=x /\ 2:r9=c)
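
For anyone following along in C, the difference between the two versions
of P1 can be sketched in userspace as below.  This is an illustration
under assumptions, not kernel code: READ_ONCE() and WRITE_ONCE() are
simplified volatile-cast stand-ins for the kernel macros, the function
names cpu1_with_addr_dep() and cpu1_without_addr_dep() are made up for
this sketch, and running it single-threaded shows only which location
each variant stores to, not any hardware reordering.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (assumption, GCC/Clang only). */
#define READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct foo {
	struct foo *next;
};

/* Same initial state as the litmus test: c=a; d=b; x=c; y=d. */
struct foo a, b;
struct foo c = { &a };
struct foo d = { &b };
struct foo x = { &c };
struct foo y = { &d };

/*
 * P1 as in the corrected test (stw r2,0(r8)): the store goes through
 * the pointer that was just loaded, so its address depends on the load.
 */
struct foo *cpu1_with_addr_dep(void)
{
	struct foo *r1 = READ_ONCE(x.next);

	assert(r1 != NULL);	/* this sketch never stores NULL */
	WRITE_ONCE(r1->next, &x);
	return r1;
}

/*
 * P1 as in the earlier test (stw r2,0(r3)): the store always targets y,
 * whatever the load returned, so there is no address dependency.
 */
struct foo *cpu1_without_addr_dep(void)
{
	struct foo *r1 = READ_ONCE(x.next);

	WRITE_ONCE(y.next, &x);
	return r1;
}
```

With x.next still pointing at c, the variant without the dependency
writes y.next anyway, whereas the variant with the dependency writes
c.next.  Only the latter ties the store's address to the value read.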


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-25 16:42                                       ` Will Deacon
@ 2016-01-26  6:03                                         ` Paul E. McKenney
  2016-01-26 10:19                                           ` Peter Zijlstra
  2016-01-26 12:16                                           ` Will Deacon
  0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26  6:03 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> > > 
> > > > > And the stuff we're confused about is how best to express the difference
> > > > > and guarantees of these two forms of transitivity and how exactly they
> > > > > interact.
> > > > 
> > > > > Hoping my memory-barriers.txt patch helps here...
> > > 
> > > Yes, that seems a good start. But yesterday you raised the 'fun' point
> > > of two globally ordered sequences connected by a single local link.
> > 
> > The conclusion that I am slowly coming to is that litmus tests should
> > not be thought of as linear chains, but rather as cycles.  If you think
> > of it as a cycle, then it doesn't matter where the local link is, just
> > how many of them and how they are connected.
> 
> Do you have some examples of this? I'm struggling to make it work in my
> mind, or are you talking specifically in the context of the kernel
> memory model?

Now that you mention it, maybe it would be best to keep the transitive
and non-transitive separate for the time being anyway.  Just because it
might be possible to deal with does not necessarily mean that we should
be encouraging it.  ;-)

> > But I will admit that there are some rather strange litmus tests that
> > challenge this cycle-centric view, for example, the one shown below.
> > It turns out that herd and ppcmem disagree on the outcome.  (The Power
> > architects side with ppcmem.)
> > 
> > > And I think I'm still confused on LWSYNC (in the smp_wmb case) when one
> > > of the stores loses a conflict, and if that scenario matters. If it
> > > does, we should inspect the same case for other barriers.
> > 
> > Indeed.  I am still working on how these should be described.  My
> > current thought is to be quite conservative on what ordering is
> > actually respected.  However, the current task is formalizing how
> > RCU plays with the rest of the memory model.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > PPC Overlapping Group-B sets version 4
> > ""
> > (* When the Group-B sets from two different barriers involve instructions in
> >    the same thread, within that thread one set must contain the other.
> > 
> > 	P0	P1	P2
> > 	Rx=1	Wy=1	Wz=2
> > 	dep.	lwsync	lwsync
> > 	Ry=0	Wz=1	Wx=1
> > 	Rz=1
> > 
> > 	assert(!(z=2))
> > 
> >    Forbidden by ppcmem, allowed by herd.
> > *)
> > {
> > 0:r1=x; 0:r2=y; 0:r3=z;
> > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > }
> >  P0		| P1		| P2		;
> >  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
> >  xor r7,r6,r6	| lwsync	| lwsync	;
> >  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
> >  lwz r8,0(r3)	|		|		;
> > 
> > exists
> > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> 
> That really hurts. Assuming that the "assert(!(z=2))" is actually there
> to constrain the coherence order of z to be {0->1->2}, then I think that
> this test is forbidden on arm using dmb instead of lwsync. That said, I
> also don't think the Rz=1 in P0 changes anything.

What about the smp_wmb() variant of dmb that orders only stores?

> The double negatives don't help here! (it is forbidden to guarantee that
> z is not always 2).

Yes, this is a weird one, and I don't know of any use of it.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-25 18:02                                 ` Will Deacon
@ 2016-01-26  6:12                                   ` Paul E. McKenney
  2016-01-26 10:15                                     ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26  6:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Mon, Jan 25, 2016 at 06:02:34PM +0000, Will Deacon wrote:
> Hi Paul,
> 
> On Fri, Jan 15, 2016 at 09:39:12AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> > > On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > > > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > > > and smp_load_acquire(), 
> > > 
> > > But they provide different grades of transitivity, which is where all
> > > the confusion lies.
> > > 
> > > smp_mb() is strongly/globally transitive, all CPUs will agree on the order.
> > > 
> > > Whereas the RCpc release+acquire is weakly so, only the two cpus
> > > involved in the handover will agree on the order.
> > 
> > Good point!
> > 
> > Using grace periods in place of smp_mb() also provides strong/global
> > transitivity, but also insanely high latencies.  ;-)
> > 
> > The patch below updates Documentation/memory-barriers.txt to define
> > local vs. global transitivity.  The corresponding ppcmem litmus test
> > is included below as well.
> > 
> > Should we start putting litmus tests for the various examples
> > somewhere, perhaps in a litmus-tests directory within each participating
> > architecture?  I have a pile of powerpc-related litmus tests on my laptop,
> > but they probably aren't doing all that much good there.
> 
> I too would like to have the litmus tests in the kernel so that we can
> refer to them from memory-barriers.txt. Ideally they wouldn't be targeted
> at a particular arch, however.

Agreed.  Working on it...

> > PPC local-transitive
> > ""
> > {
> > 0:r1=1; 0:r2=u; 0:r3=v; 0:r4=x; 0:r5=y; 0:r6=z;
> > 1:r1=1; 1:r2=u; 1:r3=v; 1:r4=x; 1:r5=y; 1:r6=z;
> > 2:r1=1; 2:r2=u; 2:r3=v; 2:r4=x; 2:r5=y; 2:r6=z;
> > 3:r1=1; 3:r2=u; 3:r3=v; 3:r4=x; 3:r5=y; 3:r6=z;
> > }
> >  P0           | P1           | P2           | P3           ;
> >  lwz r9,0(r4) | lwz r9,0(r5) | lwz r9,0(r6) | stw r1,0(r3) ;
> >  lwsync       | lwsync       | lwsync       | sync         ;
> >  stw r1,0(r2) | lwz r8,0(r3) | stw r1,0(r7) | lwz r9,0(r2) ;
> >  lwsync       | lwz r7,0(r2) |              |              ;
> >  stw r1,0(r5) | lwsync       |              |              ;
> >               | stw r1,0(r6) |              |              ;
> > exists
> > (* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r8=0 /\ 3:r9=0) *)
> > (* (0:r9=1 /\ 1:r9=1 /\ 2:r9=1) *)
> > (* (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0) *)
> > (0:r9=0 /\ 1:r9=1 /\ 2:r9=1 /\ 1:r7=0)
> 
> i.e. we should rewrite this using READ_ONCE/WRITE_ONCE and smp_mb() etc.

Yep!

> > ------------------------------------------------------------------------
> > 
> > commit 2cb4e83a1b5c89c8e39b8a64bd89269d05913e41
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Fri Jan 15 09:30:42 2016 -0800
> > 
> >     documentation: Distinguish between local and global transitivity
> >     
> >     The introduction of smp_load_acquire() and smp_store_release() had
> >     the side effect of introducing a weaker notion of transitivity:
> >     The transitivity of full smp_mb() barriers is global, but that
> >     of smp_store_release()/smp_load_acquire() chains is local.  This
> >     commit therefore introduces the notion of local transitivity and
> >     gives an example.
> >     
> >     Reported-by: Peter Zijlstra <peterz@infradead.org>
> >     Reported-by: Will Deacon <will.deacon@arm.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > index c66ba46d8079..d8109ed99342 100644
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -1318,8 +1318,82 @@ or a level of cache, CPU 2 might have early access to CPU 1's writes.
> >  General barriers are therefore required to ensure that all CPUs agree
> >  on the combined order of CPU 1's and CPU 2's accesses.
> >  
> > -To reiterate, if your code requires transitivity, use general barriers
> > -throughout.
> > +General barriers provide "global transitivity", so that all CPUs will
> > +agree on the order of operations.  In contrast, a chain of release-acquire
> > +pairs provides only "local transitivity", so that only those CPUs on
> > +the chain are guaranteed to agree on the combined order of the accesses.
> 
> Thanks for having a go at this. I tried defining something axiomatically,
> but got stuck pretty quickly. In my scheme, I used "data-directed
> transitivity" instead of "local transitivity", since the latter seems to
> be a bit of a misnomer.

I figured that "local" meant local to the CPUs participating in the
release-acquire chain.  As opposed to smp_mb() chains where the ordering
is "global" as in visible to all CPUs, whether on the chain or not.
Does that help?
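
A rough userspace analogue of the "on the chain" half of that statement,
as a sketch only: it uses C11 atomics and pthreads rather than the kernel
primitives, with memory_order_release and memory_order_acquire standing
in for smp_store_release() and smp_load_acquire().  The names cpu0(),
cpu1(), cpu2() and run_chain() are chosen for this example.  It shows
that a thread at the end of a release-acquire chain sees a plain write
made at the start of it.  It says nothing about what an observer off the
chain may see.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

int u;			/* plain data written at the head of the chain */
atomic_int y, z;	/* handoff flags between neighbouring threads */
int seen_u;		/* what the tail of the chain read from u */

static void *cpu0(void *arg)
{
	(void)arg;
	u = 1;
	atomic_store_explicit(&y, 1, memory_order_release);
	return NULL;
}

static void *cpu1(void *arg)
{
	(void)arg;
	/* Wait for cpu0()'s release, then pass the baton on. */
	while (!atomic_load_explicit(&y, memory_order_acquire))
		;
	atomic_store_explicit(&z, 1, memory_order_release);
	return NULL;
}

static void *cpu2(void *arg)
{
	(void)arg;
	/* Wait for cpu1()'s release, then read cpu0()'s plain write. */
	while (!atomic_load_explicit(&z, memory_order_acquire))
		;
	seen_u = u;
	return NULL;
}

/* Run the three threads once and return the value cpu2() read from u. */
int run_chain(void)
{
	pthread_t t[3];
	void *(*fn[3])(void *) = { cpu2, cpu1, cpu0 };
	int i, ret;

	u = 0;
	seen_u = -1;
	atomic_store(&y, 0);
	atomic_store(&z, 0);

	for (i = 0; i < 3; i++) {
		ret = pthread_create(&t[i], NULL, fn[i], NULL);
		assert(ret == 0);
	}
	for (i = 0; i < 3; i++)
		pthread_join(t[i], NULL);
	return seen_u;
}
```

Under the C11 model, cpu0()'s write to u happens before cpu2()'s read of
it, so run_chain() returns 1 however the threads are scheduled.  A fourth
thread that never touches y or z gets no such guarantee from this chain,
which is the sense of "local" described above.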

> > +For example, switching to C code in deference to Herman Hollerith:
> > +
> > +	int u, v, x, y, z;
> > +
> > +	void cpu0(void)
> > +	{
> > +		r0 = smp_load_acquire(&x);
> > +		WRITE_ONCE(u, 1);
> > +		smp_store_release(&y, 1);
> > +	}
> > +
> > +	void cpu1(void)
> > +	{
> > +		r1 = smp_load_acquire(&y);
> > +		r4 = READ_ONCE(v);
> > +		r5 = READ_ONCE(u);
> > +		smp_store_release(&z, 1);
> > +	}
> > +
> > +	void cpu2(void)
> > +	{
> > +		r2 = smp_load_acquire(&z);
> > +		smp_store_release(&x, 1);
> > +	}
> > +
> > +	void cpu3(void)
> > +	{
> > +		WRITE_ONCE(v, 1);
> > +		smp_mb();
> > +		r3 = READ_ONCE(u);
> > +	}
> > +
> > +Because cpu0(), cpu1(), and cpu2() participate in a local transitive
> > +chain of smp_store_release()/smp_load_acquire() pairs, the following
> > +outcome is prohibited:
> > +
> > +	r0 == 1 && r1 == 1 && r2 == 1
> > +
> > +Furthermore, because of the release-acquire relationship between cpu0()
> > +and cpu1(), cpu1() must see cpu0()'s writes, so that the following
> > +outcome is prohibited:
> > +
> > +	r1 == 1 && r5 == 0
> > +
> > +However, the transitivity of release-acquire is local to the participating
> > +CPUs and does not apply to cpu3().  Therefore, the following outcome
> > +is possible:
> > +
> > +	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
> 
> I think you should be completely explicit and include r5 == 1 here, too.

Good point -- I added this as an additional outcome:

	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 && r5 == 1

> Also -- where would you add the smp_mb__after_release_acquire to fix
> (i.e. forbid) this? Immediately after cpu1()'s read of y?

That sounds plausible, but we would first have to agree on exactly
what smp_mb__after_release_acquire() did.  ;-)

> > +Although cpu0(), cpu1(), and cpu2() will see their respective reads and
> > +writes in order, CPUs not involved in the release-acquire chain might
> > +well disagree on the order.  This disagreement stems from the fact that
> > +the weak memory-barrier instructions used to implement smp_load_acquire()
> > +and smp_store_release() are not required to order prior stores against
> > +subsequent loads in all cases.  This means that cpu3() can see cpu0()'s
> > +store to u as happening -after- cpu1()'s load from v, even though
> > +both cpu0() and cpu1() agree that these two operations occurred in the
> > +intended order.
> > +
> > +However, please keep in mind that smp_load_acquire() is not magic.
> > +In particular, it simply reads from its argument with ordering.  It does
> > +-not- ensure that any particular value will be read.  Therefore, the
> > +following outcome is possible:
> > +
> > +	r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
> > +
> > +Note that this outcome can happen even on a mythical sequentially
> > +consistent system where nothing is ever reordered.
> 
> I'm not sure this last bit is strictly necessary. If somebody thinks that
> acquire/release involve some sort of implicit synchronisation, I think
> they may have bigger problems with memory-barriers.txt.

Agreed.  But unless I add text like this occasionally, such people could
easily read through much of memory-barriers.txt and think that they did
in fact understand it.  So I have to occasionally trip an assertion in
their brain.  Or try to...  :-/
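
The sequential-consistency remark can also be checked mechanically.  The
sketch below is not a memory-model tool such as herd or ppcmem.  It just
enumerates every interleaving of the four-CPU example on an idealized
sequentially consistent machine, where each acquire, release, READ_ONCE()
and WRITE_ONCE() becomes a plain access and smp_mb() a no-op.  The names
struct state, step(), sc_reachable() and the three predicates are made up
for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Shared variables, registers, and per-CPU program counters. */
struct state {
	int u, v, x, y, z;
	int r0, r1, r2, r3, r4, r5;
	int pc[4];
};

/* Memory accesses per CPU; smp_mb() contributes nothing under SC. */
static const int nops[4] = { 3, 4, 2, 2 };

/* Run the next access of the given CPU, in program order. */
static void step(struct state *s, int cpu)
{
	switch (cpu * 10 + s->pc[cpu]++) {
	case 0:  s->r0 = s->x; break;	/* cpu0: smp_load_acquire(&x) */
	case 1:  s->u = 1; break;	/* cpu0: WRITE_ONCE(u, 1) */
	case 2:  s->y = 1; break;	/* cpu0: smp_store_release(&y, 1) */
	case 10: s->r1 = s->y; break;	/* cpu1: smp_load_acquire(&y) */
	case 11: s->r4 = s->v; break;	/* cpu1: READ_ONCE(v) */
	case 12: s->r5 = s->u; break;	/* cpu1: READ_ONCE(u) */
	case 13: s->z = 1; break;	/* cpu1: smp_store_release(&z, 1) */
	case 20: s->r2 = s->z; break;	/* cpu2: smp_load_acquire(&z) */
	case 21: s->x = 1; break;	/* cpu2: smp_store_release(&x, 1) */
	case 30: s->v = 1; break;	/* cpu3: WRITE_ONCE(v, 1) */
	case 31: s->r3 = s->u; break;	/* cpu3: READ_ONCE(u) */
	default: assert(0);
	}
}

typedef bool (*outcome_fn)(const struct state *);

/* Depth-first search over all interleavings for a matching final state. */
static bool search(struct state s, outcome_fn pred)
{
	bool done = true;
	int cpu;

	for (cpu = 0; cpu < 4; cpu++) {
		if (s.pc[cpu] < nops[cpu]) {
			struct state next = s;

			done = false;
			step(&next, cpu);
			if (search(next, pred))
				return true;
		}
	}
	return done && pred(&s);
}

/* True if some SC interleaving ends in a state satisfying pred. */
bool sc_reachable(outcome_fn pred)
{
	struct state s;

	memset(&s, 0, sizeof(s));
	return search(s, pred);
}

bool all_zero(const struct state *s)
{
	return s->r0 == 0 && s->r1 == 0 && s->r2 == 0 && s->r5 == 0;
}

bool chain_cycle(const struct state *s)
{
	return s->r0 == 1 && s->r1 == 1 && s->r2 == 1;
}

bool weak_outcome(const struct state *s)
{
	return s->r0 == 0 && s->r1 == 1 && s->r2 == 1 &&
	       s->r3 == 0 && s->r4 == 0 && s->r5 == 1;
}
```

On this idealized machine, sc_reachable(all_zero) is true, while
sc_reachable(chain_cycle) and sc_reachable(weak_outcome) are both false.
So the all-zero result needs no reordering at all, whereas the outcome
involving cpu3() can only arise on a system weaker than SC.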

							Thanx, Paul

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26  6:12                                   ` Paul E. McKenney
@ 2016-01-26 10:15                                     ` Peter Zijlstra
  0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-26 10:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, Leonid Yegoshin, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Mon, Jan 25, 2016 at 10:12:11PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2016 at 06:02:34PM +0000, Will Deacon wrote:

> > Thanks for having a go at this. I tried defining something axiomatically,
> > but got stuck pretty quickly. In my scheme, I used "data-directed
> > transitivity" instead of "local transitivity", since the latter seems to
> > be a bit of a misnomer.
> 
> I figured that "local" meant local to the CPUs participating in the
> release-acquire chain.  As opposed to smp_mb() chains where the ordering
> is "global" as in visible to all CPUs, whether on the chain or not.
> Does that help?

That is in fact how I read and understood it.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26  6:03                                         ` Paul E. McKenney
@ 2016-01-26 10:19                                           ` Peter Zijlstra
  2016-01-26 20:13                                             ` Paul E. McKenney
  2016-01-26 12:16                                           ` Will Deacon
  1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-26 10:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, Leonid Yegoshin, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:

> > > > Yes, that seems a good start. But yesterday you raised the 'fun' point
> > > > of two globally ordered sequences connected by a single local link.
> > > 
> > > The conclusion that I am slowly coming to is that litmus tests should
> > > not be thought of as linear chains, but rather as cycles.  If you think
> > > of it as a cycle, then it doesn't matter where the local link is, just
> > > how many of them and how they are connected.
> > 
> > Do you have some examples of this? I'm struggling to make it work in my
> > mind, or are you talking specifically in the context of the kernel
> > memory model?
> 
> Now that you mention it, maybe it would be best to keep the transitive
> and non-transitive separate for the time being anyway.  Just because it
> might be possible to deal with does not necessarily mean that we should
> be encouraging it.  ;-)

So isn't smp_mb__after_unlock_lock() exactly such a scenario? And would
not someone trying to implement RCsc locks using locally transitive
RELEASE/ACQUIRE operations need exactly this stuff?

That is, I am afraid we need to cover the mix of local and global
transitive operations at least in overview.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-14 22:20                               ` Paul E. McKenney
  2016-01-15  9:57                                 ` Will Deacon
@ 2016-01-26 10:24                                 ` Peter Zijlstra
  2016-01-26 10:32                                   ` Peter Zijlstra
  2016-01-26 19:44                                   ` [v3,11/41] mips: reuse asm-generic/barrier.h Paul E. McKenney
  1 sibling, 2 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-26 10:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, Jan 14, 2016 at 02:20:46PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 14, 2016 at 01:24:34PM -0800, Leonid Yegoshin wrote:
> > On 01/14/2016 12:48 PM, Paul E. McKenney wrote:
> > >
> > >So SYNC_RMB is intended to implement smp_rmb(), correct?
> > Yes.
> > >
> > >You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
> > >smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.
> > 
> > > If smp_read_barrier_depends() is used to separate not only two reads
> > > but also a pointer read and a WRITE based on that pointer (example
> > > below) - yes. I just don't see any example of this in the famous
> > > Documentation/memory-barriers.txt and had no way to know that you
> > > use it in this way too.
> 
> Well, Documentation/memory-barriers.txt was intended as a guide for Linux
> kernel hackers, and not for hardware architects.

Yeah, this goes under the header: memory-barriers.txt is _NOT_ a
specification (I seem to keep repeating this).

> ------------------------------------------------------------------------
> 
> commit 955720966e216b00613fcf60188d507c103f0e80
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Thu Jan 14 14:17:04 2016 -0800
> 
>     documentation: Subsequent writes ordered by rcu_dereference()
>     
>     The current memory-barriers.txt does not address the possibility of
>     a write to a dereferenced pointer.  This should be rare, 

How are these rare? Isn't:

	rcu_read_lock();
	obj = rcu_dereference(ptr);
	if (!atomic_inc_not_zero(&obj->ref))
		obj = NULL;
	rcu_read_unlock();

a _very_ common thing to do?
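
For readers without the kernel tree at hand, here is a minimal userspace
sketch of the inc-not-zero step in the pattern above, written with C11
atomics.  It is an illustration only, not the kernel's
atomic_inc_not_zero(); struct obj and obj_get_unless_zero() are
hypothetical names, and the seq_cst ordering on success is an assumption
standing in for a fully ordered read-modify-write.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a refcounted object. */
struct obj {
	atomic_int ref;
};

/*
 * Take a reference only if the count is still nonzero.
 * A successful compare-exchange is a fully ordered RMW (seq_cst here),
 * so the caller gets its ordering from the operation itself.
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load_explicit(&o->ref, memory_order_relaxed);

	while (old != 0) {
		/* On failure, 'old' is updated to the current value. */
		if (atomic_compare_exchange_weak_explicit(&o->ref, &old,
							  old + 1,
							  memory_order_seq_cst,
							  memory_order_relaxed))
			return true;
	}
	return false;	/* count already hit zero, do not resurrect */
}
```

A caller would treat a false return the same way the snippet above
treats a failed atomic_inc_not_zero(): drop the pointer and carry on as
if the object had not been found.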

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 10:24                                 ` Peter Zijlstra
@ 2016-01-26 10:32                                   ` Peter Zijlstra
  2016-01-26 11:09                                     ` Will Deacon
  2016-01-26 19:44                                   ` [v3,11/41] mips: reuse asm-generic/barrier.h Paul E. McKenney
  1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-26 10:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 11:24:02AM +0100, Peter Zijlstra wrote:

> Yeah, this goes under the header: memory-barriers.txt is _NOT_ a
> specification (I seem to keep repeating this).

Do we want this ?

---
 Documentation/memory-barriers.txt | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index a61be39c7b51..433326ebdc26 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1,3 +1,4 @@
+
 			 ============================
 			 LINUX KERNEL MEMORY BARRIERS
 			 ============================
@@ -5,6 +6,22 @@
 By: David Howells <dhowells@redhat.com>
     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
 
+==========
+DISCLAIMER
+==========
+
+This document is not a specification; it is intentionally (for the sake of
+brevity) and unintentionally (due to being human) incomplete. This document is
+meant as a guide to using the various memory barriers provided by Linux, but
+in case of any doubt (and there are many) please ask.
+
+I repeat, this document is not a specification of what Linux expects from
+hardware.
+
+=====
+INDEX
+=====
+
 Contents:
 
  (*) Abstract memory access model.

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 10:32                                   ` Peter Zijlstra
@ 2016-01-26 11:09                                     ` Will Deacon
  2016-01-26 20:11                                       ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-26 11:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 11:32:00AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 26, 2016 at 11:24:02AM +0100, Peter Zijlstra wrote:
> 
> > Yeah, this goes under the header: memory-barriers.txt is _NOT_ a
> > specification (I seem to keep repeating this).
> 
> Do we want this ?
> 
> ---
>  Documentation/memory-barriers.txt | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index a61be39c7b51..433326ebdc26 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1,3 +1,4 @@
> +
>  			 ============================
>  			 LINUX KERNEL MEMORY BARRIERS
>  			 ============================
> @@ -5,6 +6,22 @@
>  By: David Howells <dhowells@redhat.com>
>      Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>  
> +==========
> +DISCLAIMER
> +==========
> +
> +This document is not a specification; it is intentionally (for the sake of
> +brevity) and unintentionally (due to being human) incomplete. This document is
> +meant as a guide to using the various memory barriers provided by Linux, but
> +in case of any doubt (and there are many) please ask.

It might be worth adding you and me to the top of the file, to save Paul
Cc'ing us on questions (get_maintainer.pl points at poor old Corbet for
this file).

But yes, it seems that something like this is required.

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26  1:06                                         ` Paul E. McKenney
@ 2016-01-26 12:10                                           ` Will Deacon
  2016-01-26 23:37                                             ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-26 12:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Mon, Jan 25, 2016 at 05:06:46PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2016 at 02:41:34PM +0000, Will Deacon wrote:
> > On Fri, Jan 15, 2016 at 11:28:45AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> > > > > See my earlier reply [1] (but also, your WRC Linux example looks more
> > > > > like a variant on WWC and I couldn't really follow it).
> > > > 
> > > > I will revisit my WRC Linux example.  And yes, creating litmus tests
> > > > that use non-fake dependencies is still a bit of an undertaking.  :-/
> > > > I am sure that it will seem more natural with time and experience...
> > > 
> > > Hmmm...  You are quite right, I did do WWC.  I need to change cpu2()'s
> > > last access from a store to a load to get WRC.  Plus the levels of
> > > indirection definitely didn't match up, did they?
> > 
> > Nope, it was pretty baffling!
> 
> "It is a service that I provide."  ;-)
> 
> > > 	struct foo {
> > > 		struct foo *next;
> > > 	};
> > > 	struct foo a;
> > > 	struct foo b;
> > > 	struct foo c = { &a };
> > > 	struct foo d = { &b };
> > > 	struct foo x = { &c };
> > > 	struct foo y = { &d };
> > > 	struct foo *r1, *r2, *r3;
> > > 
> > > 	void cpu0(void)
> > > 	{
> > > 		WRITE_ONCE(x.next, &y);
> > > 	}
> > > 
> > > 	void cpu1(void)
> > > 	{
> > > 		r1 = lockless_dereference(x.next);
> > > 		WRITE_ONCE(r1->next, &x);
> > > 	}
> > > 
> > > 	void cpu2(void)
> > > 	{
> > > 		r2 = lockless_dereference(y.next);
> > > 		r3 = READ_ONCE(r2->next);
> > > 	}
> > > 
> > > In this case, it is legal to end the run with:
> > > 
> > > 	r1 == &y && r2 == &x && r3 == &c
> > > 
> > > Please see below for a ppcmem litmus test.
> > > 
> > > So, did I get it right this time?  ;-)
> > 
> > The code above looks correct to me (in that it matches WRC+addrs),
> > but your litmus test:
> > 
> > > PPC WRCnf+addrs
> > > ""
> > > {
> > > 0:r2=x; 0:r3=y;
> > > 1:r2=x; 1:r3=y;
> > > 2:r2=x; 2:r3=y;
> > > c=a; d=b; x=c; y=d;
> > > }
> > >  P0           | P1            | P2            ;
> > >  stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
> > >               | stw r2,0(r3)  | lwz r9,0(r8)  ;
> > > exists
> > > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
> > 
> > Seems to be missing the address dependency on P1.
> 
> You are quite correct!  How about the following?

I think that's it!

> As before, both herd and ppcmem say that the cycle is allowed, as
> expected, given non-transitive ordering.  To prohibit the cycle, P1
> needs a suitable memory-barrier instruction.
> 
> ------------------------------------------------------------------------
> 
> PPC WRCnf+addrs
> ""
> {
> 0:r2=x; 0:r3=y;
> 1:r2=x; 1:r3=y;
> 2:r2=x; 2:r3=y;
> c=a; d=b; x=c; y=d;
> }
>  P0           | P1            | P2            ;
>  stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
>               | stw r2,0(r8)  | lwz r9,0(r8)  ;
> exists
> (1:r8=y /\ 2:r8=x /\ 2:r9=c)

Agreed.

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26  6:03                                         ` Paul E. McKenney
  2016-01-26 10:19                                           ` Peter Zijlstra
@ 2016-01-26 12:16                                           ` Will Deacon
  2016-01-26 14:35                                             ` Boqun Feng
  2016-01-26 19:58                                             ` Paul E. McKenney
  1 sibling, 2 replies; 153+ messages in thread
From: Will Deacon @ 2016-01-26 12:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > PPC Overlapping Group-B sets version 4
> > > ""
> > > (* When the Group-B sets from two different barriers involve instructions in
> > >    the same thread, within that thread one set must contain the other.
> > > 
> > > 	P0	P1	P2
> > > 	Rx=1	Wy=1	Wz=2
> > > 	dep.	lwsync	lwsync
> > > 	Ry=0	Wz=1	Wx=1
> > > 	Rz=1
> > > 
> > > 	assert(!(z=2))
> > > 
> > >    Forbidden by ppcmem, allowed by herd.
> > > *)
> > > {
> > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > }
> > >  P0		| P1		| P2		;
> > >  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
> > >  xor r7,r6,r6	| lwsync	| lwsync	;
> > >  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
> > >  lwz r8,0(r3)	|		|		;
> > > 
> > > exists
> > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > 
> > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > to constrain the coherence order of z to be {0->1->2}, then I think that
> > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > also don't think the Rz=1 in P0 changes anything.
> 
> What about the smp_wmb() variant of dmb that orders only stores?

Tricky, but I think it still works out if the coherence order of z is as
I described above. The line of reasoning is weird though -- I ended up
considering the two cases where P0 reads z before and after it reads x
and what that means for the read of y.

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 12:16                                           ` Will Deacon
@ 2016-01-26 14:35                                             ` Boqun Feng
  2016-01-26 19:58                                             ` Paul E. McKenney
  1 sibling, 0 replies; 153+ messages in thread
From: Boqun Feng @ 2016-01-26 14:35 UTC (permalink / raw)
  To: Will Deacon
  Cc: Paul E. McKenney, Peter Zijlstra, Leonid Yegoshin,
	Michael S. Tsirkin, linux-kernel, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

Hi Will,

On Tue, Jan 26, 2016 at 12:16:09PM +0000, Will Deacon wrote:
> On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > PPC Overlapping Group-B sets version 4
> > > > ""
> > > > (* When the Group-B sets from two different barriers involve instructions in
> > > >    the same thread, within that thread one set must contain the other.
> > > > 
> > > > 	P0	P1	P2
> > > > 	Rx=1	Wy=1	Wz=2
> > > > 	dep.	lwsync	lwsync
> > > > 	Ry=0	Wz=1	Wx=1
> > > > 	Rz=1
> > > > 
> > > > 	assert(!(z=2))
> > > > 
> > > >    Forbidden by ppcmem, allowed by herd.
> > > > *)
> > > > {
> > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > }
> > > >  P0		| P1		| P2		;
> > > >  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
> > > >  xor r7,r6,r6	| lwsync	| lwsync	;
> > > >  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
> > > >  lwz r8,0(r3)	|		|		;
> > > > 
> > > > exists
> > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > > 
> > > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > > to constrain the coherence order of z to be {0->1->2}, then I think that
> > > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > > also don't think the Rz=1 in P0 changes anything.
> > 
> > What about the smp_wmb() variant of dmb that orders only stores?
> 
> Tricky, but I think it still works out if the coherence order of z is as
> I described above. The line of reasoning is weird though -- I ended up
> considering the two cases where P0 reads z before and after it reads x
                                             ^^^^^^^^^^^^^^^
Is that because two reads on the same processor can't be executed
simultaneously? I feel like this is exactly what herd missed.

> and what that means for the read of y.
> 

And the reasoning on PPC is similar, so it looks like the read of z on
P0 is a necessary condition for the exists clause to be forbidden.

Regards,
Boqun

> Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-18 15:46                               ` Paul E. McKenney
@ 2016-01-26 16:52                                 ` Boqun Feng
  2016-01-26 17:22                                   ` Peter Zijlstra
  2016-01-26 19:51                                   ` Paul E. McKenney
  0 siblings, 2 replies; 153+ messages in thread
From: Boqun Feng @ 2016-01-26 16:52 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Herbert Xu, Leonid.Yegoshin, linux-mips, linux-ia64, mst, peterz,
	will.deacon, virtualization, hpa, sparclinux, mingo, linux-arch,
	linux-s390, linux, user-mode-linux-devel, linux-sh, mpe, x86,
	xen-devel, mingo, linux-xtensa, james.hogan, arnd,
	stefano.stabellini, adi-buildroot-devel, ddaney.cavm, tglx,
	linux-metag, linux-arm-kernel, andrew.cooper3, linux-kernel,
	ralf, joe, linuxppc-dev, davem, Linus Torvalds

Hi Paul,

On Mon, Jan 18, 2016 at 07:46:29AM -0800, Paul E. McKenney wrote:
> On Mon, Jan 18, 2016 at 04:19:29PM +0800, Herbert Xu wrote:
> > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
> > > smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.
> > > The reason for this is that smp_read_barrier_depends() must order the
> > > pointer load against any subsequent read or write through a dereference
> > > of that pointer.  For example:
> > > 
> > >        p = READ_ONCE(gp);
> > >        smp_rmb();
> > >        r1 = p->a; /* ordered by smp_rmb(). */
> > >        p->b = 42; /* NOT ordered by smp_rmb(), BUG!!! */
> > >        r2 = x; /* ordered by smp_rmb(), but doesn't need to be. */
> > > 
> > > In contrast:
> > > 
> > >        p = READ_ONCE(gp);
> > >        smp_read_barrier_depends();
> > >        r1 = p->a; /* ordered by smp_read_barrier_depends(). */
> > >        p->b = 42; /* ordered by smp_read_barrier_depends(). */
> > >        r2 = x; /* not ordered by smp_read_barrier_depends(), which is OK. */
> > > 
> > > Again, if your hardware maintains local ordering for address
> > > and data dependencies, you can have read_barrier_depends() and
> > > smp_read_barrier_depends() be no-ops like they are for most
> > > architectures.
> > > 
> > > Does that help?
> > 
> > This is crazy! smp_rmb started out being strictly stronger than
> > smp_read_barrier_depends, when did this stop being the case?
> 
> Hello, Herbert!
> 
> It is true that most Linux kernel code relies only on the read-read
> properties of dependencies, but the read-write properties are useful.
> Admittedly relatively rarely, but useful.
> 
> The better comparison for smp_read_barrier_depends(), especially in
> its rcu_dereference*() form, is smp_load_acquire().
> 

Confused..

I recall that last time you and Linus came to the conclusion that even
on Alpha, a barrier for read->write with data dependency is unnecessary:

http://article.gmane.org/gmane.linux.kernel/2077661

And in an earlier mail of that thread, Linus made his point that
smp_read_barrier_depends() should only be used to order read->read.

So right now, are we going to extend the semantics of
smp_read_barrier_depends()? Can we just make smp_read_barrier_depends()
still only work for read->read, and assume all the architectures won't
reorder read->write with data dependency, so that the code above having
a smp_rmb() also works?

Regards,
Boqun

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 16:52                                 ` Boqun Feng
@ 2016-01-26 17:22                                   ` Peter Zijlstra
  2016-01-26 19:44                                     ` Linus Torvalds
  2016-01-26 19:51                                   ` Paul E. McKenney
  1 sibling, 1 reply; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-26 17:22 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Herbert Xu, Leonid.Yegoshin, linux-mips,
	linux-ia64, mst, will.deacon, virtualization, hpa, sparclinux,
	mingo, linux-arch, linux-s390, linux, user-mode-linux-devel,
	linux-sh, mpe, x86, xen-devel, mingo, linux-xtensa, james.hogan,
	arnd, stefano.stabellini, adi-buildroot-devel, ddaney.cavm, tglx,
	linux-metag, linux-arm-kernel, andrew.cooper3, linux-kernel,
	ralf, joe, linuxppc-dev, davem, Linus Torvalds

On Wed, Jan 27, 2016 at 12:52:07AM +0800, Boqun Feng wrote:
> I recall that last time you and Linus came to the conclusion that even
> on Alpha, a barrier for read->write with data dependency is unnecessary:
> 
> http://article.gmane.org/gmane.linux.kernel/2077661
> 
> And in an earlier mail of that thread, Linus made his point that
> smp_read_barrier_depends() should only be used to order read->read.
> 
> So right now, are we going to extend the semantics of
> smp_read_barrier_depends()? Can we just make smp_read_barrier_depends()
> still only work for read->read, and assume all the architectures won't
> reorder read->write with data dependency, so that the code above having
> a smp_rmb() also works?

That discussion was about control dependencies, that is, writes that
_depend_ on a prior read having an explicit value.

So something like:

	struct foo *x = READ_ONCE(*ptr);
	smp_read_barrier_depends();
	if (x->val == 5)
		x->bar = 5;

In that case, the load of x->val must be complete and its value
determined _before_ the store to x->bar can happen.

This is distinct from:

	struct foo *x = READ_ONCE(*ptr);
	smp_read_barrier_depends();
	x->bar = 5;

And it's the second case where smp_read_barrier_depends() read->write
order matters.
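
The two shapes being contrasted can be sketched in userspace with C11
atomics.  This is an illustration under stated assumptions, not kernel
code: gp, publish(), store_if_five() and store_always() are hypothetical
names, and the acquire load is a deliberately stronger stand-in for
READ_ONCE() plus dependency ordering.  The sketch takes no position on
which barrier, if any, the kernel needs in either case.

```c
#include <stdatomic.h>

struct foo {
	int val;
	int bar;
};

/* Hypothetical shared pointer, set by publish() before any reader runs. */
static _Atomic(struct foo *) gp;

static void publish(struct foo *p)
{
	/* Release pairs with the acquire loads in the readers below. */
	atomic_store_explicit(&gp, p, memory_order_release);
}

/*
 * Case 1: control dependency.  Whether the store happens at all
 * depends on the value loaded from x->val.
 */
static struct foo *store_if_five(void)
{
	struct foo *x = atomic_load_explicit(&gp, memory_order_acquire);

	if (x->val == 5)
		x->bar = 5;
	return x;
}

/*
 * Case 2: address dependency only.  The store is unconditional, but
 * its target address comes from the pointer that was just loaded.
 */
static struct foo *store_always(void)
{
	struct foo *x = atomic_load_explicit(&gp, memory_order_acquire);

	x->bar = 5;
	return x;
}
```

Both readers assume publish() has already stored a non-NULL pointer; a
NULL check in the second one would add a control dependency and blur the
distinction the sketch is meant to show.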

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 10:24                                 ` Peter Zijlstra
  2016-01-26 10:32                                   ` Peter Zijlstra
@ 2016-01-26 19:44                                   ` Paul E. McKenney
  1 sibling, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 19:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Leonid Yegoshin, Will Deacon, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 11:24:02AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 14, 2016 at 02:20:46PM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 14, 2016 at 01:24:34PM -0800, Leonid Yegoshin wrote:
> > > On 01/14/2016 12:48 PM, Paul E. McKenney wrote:
> > > >
> > > >So SYNC_RMB is intended to implement smp_rmb(), correct?
> > > Yes.
> > > >
> > > >You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
> > > >smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.
> > > 
> > > > If smp_read_barrier_depends() is used to separate not only two reads
> > > > but also a pointer read and a WRITE based on that pointer (example
> > > > below) - yes. I just don't see any example of this in the famous
> > > > Documentation/memory-barriers.txt and had no way to know that you
> > > > use it in this way too.
> > 
> > Well, Documentation/memory-barriers.txt was intended as a guide for Linux
> > kernel hackers, and not for hardware architects.
> 
> Yeah, this goes under the header: memory-barriers.txt is _NOT_ a
> specification (I seem to keep repeating this).
> 
> > ------------------------------------------------------------------------
> > 
> > commit 955720966e216b00613fcf60188d507c103f0e80
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Thu Jan 14 14:17:04 2016 -0800
> > 
> >     documentation: Subsequent writes ordered by rcu_dereference()
> >     
> >     The current memory-barriers.txt does not address the possibility of
> >     a write to a dereferenced pointer.  This should be rare, 
> 
> How are these rare? Isn't:
> 
> > 	rcu_read_lock();
> 	obj = rcu_dereference(ptr);
> 	if (!atomic_inc_not_zero(&obj->ref))
> 		obj = NULL;
> 	rcu_read_unlock();
> 
> a _very_ common thing to do?

It is, but it provides its own barriers, so does not need to rely on
dependency ordering.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 17:22                                   ` Peter Zijlstra
@ 2016-01-26 19:44                                     ` Linus Torvalds
  2016-01-26 20:10                                       ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Linus Torvalds @ 2016-01-26 19:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, Paul E. McKenney, Herbert Xu, Leonid Yegoshin,
	linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
	virtualization, Peter Anvin, sparclinux, Ingo Molnar, linux-arch,
	linux-s390, Russell King - ARM Linux, uml-devel, linux-sh,
	Michael Ellerman, the arch/x86 maintainers, xen-devel,
	Ingo Molnar, linux-xtensa, James Hogan, Arnd Bergmann,
	Stefano Stabellini, adi-buildroot-devel, David Daney,
	Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
	Linux Kernel Mailing List, Ralf Baechle, Joe Perches, ppc-dev,
	David Miller

On Tue, Jan 26, 2016 at 9:22 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>
> This is distinct from:

That may be distinct, but:

>         struct foo *x = READ_ONCE(*ptr);
>         smp_read_barrier_depends();
>         x->bar = 5;

This case is complete BS. Stop perpetuating it. I already removed a
number of bogus cases of it, and I removed the incorrect documentation
that had this crap.

It's called "smp_READ_barrier_depends()" for a reason.

Alpha is the only one that needs it, and alpha needs it only for
dependent READS.

It's not called smp_read_write_barrier_depends(). It's not called
"smp_mb_depends()". It's a weaker form of "smp_rmb()", nothing else.

So alpha does have an implied dependency chain from a read to a
subsequent dependent write, and does not need any extra barriers.

Alpha does *not* have a dependency chain from a read to a subsequent
read, which is why we need that horrible crappy
smp_read_barrier_depends(). But it's the only reason.

This is the alpha reference manual wrt read-to-write dependency:

  5.6.1.7 Definition of Dependence Constraint

    The depends relation (DP) is defined as follows. Given u and v issued
    by processor Pi, where u is a read or an instruction fetch and v is a
    write, u precedes v in DP order (written u DP v, that is, v depends
    on u) in either of the following situations:

     • u determines the execution of v, the location accessed by v, or
       the value written by v.
     • u determines the execution or address or value of another memory
       access z that precedes v or might precede v (that is, would
       precede v in some execution path depending on the value read by
       u) by processor issue constraint (see Section 5.6.1.3).

Note that the dependence barrier honors not only control flow, but
address and data values too.  This is a different syntax than we use,
but 'u' is the READ_ONCE, and 'v' is the write. Any data, address or
conditional dependency between the two implies an ordering.

So no, "smp_read_barrier_depends()" is *ONLY* about two reads, where
the second read is data-dependent on the first. Nothing else.

So if you _ever_ see a "smp_read_barrier_depends()" that isn't about a
barrier between two reads, then that is a bug.

The above code is crap.  It's exactly as much crap as

   a = READ_ONCE(x);
   smp_rmb();
   WRITE_ONCE(b, y);

because a "rmb()" simply doesn't have anything to do with
read-vs-subsequent-write ordering.

                 Linus

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 16:52                                 ` Boqun Feng
  2016-01-26 17:22                                   ` Peter Zijlstra
@ 2016-01-26 19:51                                   ` Paul E. McKenney
  1 sibling, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 19:51 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Herbert Xu, Leonid.Yegoshin, linux-mips, linux-ia64, mst, peterz,
	will.deacon, virtualization, hpa, sparclinux, mingo, linux-arch,
	linux-s390, linux, user-mode-linux-devel, linux-sh, mpe, x86,
	xen-devel, mingo, linux-xtensa, james.hogan, arnd,
	stefano.stabellini, adi-buildroot-devel, ddaney.cavm, tglx,
	linux-metag, linux-arm-kernel, andrew.cooper3, linux-kernel,
	ralf, joe, linuxppc-dev, davem, Linus Torvalds

On Wed, Jan 27, 2016 at 12:52:07AM +0800, Boqun Feng wrote:
> Hi Paul,
> 
> On Mon, Jan 18, 2016 at 07:46:29AM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 18, 2016 at 04:19:29PM +0800, Herbert Xu wrote:
> > > Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > > >
> > > > You could use SYNC_ACQUIRE() to implement read_barrier_depends() and
> > > > smp_read_barrier_depends(), but SYNC_RMB probably does not suffice.
> > > > The reason for this is that smp_read_barrier_depends() must order the
> > > > pointer load against any subsequent read or write through a dereference
> > > > of that pointer.  For example:
> > > > 
> > > >        p = READ_ONCE(gp);
> > > >        smp_rmb();
> > > >        r1 = p->a; /* ordered by smp_rmb(). */
> > > >        p->b = 42; /* NOT ordered by smp_rmb(), BUG!!! */
> > > >        r2 = x; /* ordered by smp_rmb(), but doesn't need to be. */
> > > > 
> > > > In contrast:
> > > > 
> > > >        p = READ_ONCE(gp);
> > > >        smp_read_barrier_depends();
> > > >        r1 = p->a; /* ordered by smp_read_barrier_depends(). */
> > > >        p->b = 42; /* ordered by smp_read_barrier_depends(). */
> > > >        r2 = x; /* not ordered by smp_read_barrier_depends(), which is OK. */
> > > > 
> > > > Again, if your hardware maintains local ordering for address
> > > > and data dependencies, you can have read_barrier_depends() and
> > > > smp_read_barrier_depends() be no-ops like they are for most
> > > > architectures.
> > > > 
> > > > Does that help?
> > > 
> > > This is crazy! smp_rmb started out being strictly stronger than
> > > smp_read_barrier_depends, when did this stop being the case?
> > 
> > Hello, Herbert!
> > 
> > It is true that most Linux kernel code relies only on the read-read
> > properties of dependencies, but the read-write properties are useful.
> > Admittedly relatively rarely, but useful.
> > 
> > The better comparison for smp_read_barrier_depends(), especially in
> > its rcu_dereference*() form, is smp_load_acquire().
> 
> Confused..
> 
> I recall that last time you and Linus came to the conclusion that even
> on Alpha, a barrier for read->write with data dependency is unnecessary:
> 
> http://article.gmane.org/gmane.linux.kernel/2077661
> 
> And in an earlier mail of that thread, Linus made his point that
> smp_read_barrier_depends() should only be used to order read->read.

Those examples involved read-to-write with conditionals, as in:

	if (READ_ONCE(a))
		WRITE_ONCE(b, 1);

Without the "if", no ordering is guaranteed on weakly ordered CPUs.
(The volatile accesses keep ordering within the compiler for once...)
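
A small userspace sketch of the two shapes being contrasted. The READ_ONCE()
and WRITE_ONCE() definitions here are simplified stand-ins (plain volatile
casts, using the GCC/clang __typeof__ extension), not the kernel's own
macros, and the function names are invented for illustration.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel macros: volatile accesses only. */
#define READ_ONCE(v)       (*(const volatile __typeof__(v) *)&(v))
#define WRITE_ONCE(v, val) (*(volatile __typeof__(v) *)&(v) = (val))

static int a, b;

/* Control dependency: the store is executed only if the load saw nonzero. */
static void ctrl_dep(void)
{
	if (READ_ONCE(a))
		WRITE_ONCE(b, 1);
}

/* No dependency: nothing ties the store to the value that was loaded. */
static void no_dep(void)
{
	(void)READ_ONCE(a);
	WRITE_ONCE(b, 1);
}
```

Only ctrl_dep() gives the CPU a reason to wait for the load before making
the store visible; in no_dep() the two accesses are independent.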

> So right now, are we going to extend the semantics of
> smp_read_barrier_depends()? Can we just make smp_read_barrier_depends()
> still only work for read->read, and assume all the architectures won't
> reorder read->write with data dependency, so that the code above having
> a smp_rmb() also works?

The semantics of smp_read_barrier_depends() have covered both read-to-write
and read-to-read for some time now; this patch just catches the
documentation up with reality.

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 12:16                                           ` Will Deacon
  2016-01-26 14:35                                             ` Boqun Feng
@ 2016-01-26 19:58                                             ` Paul E. McKenney
  2016-01-27 10:25                                               ` Will Deacon
  1 sibling, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 19:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 12:16:09PM +0000, Will Deacon wrote:
> On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > PPC Overlapping Group-B sets version 4
> > > > ""
> > > > (* When the Group-B sets from two different barriers involve instructions in
> > > >    the same thread, within that thread one set must contain the other.
> > > > 
> > > > 	P0	P1	P2
> > > > 	Rx=1	Wy=1	Wz=2
> > > > 	dep.	lwsync	lwsync
> > > > 	Ry=0	Wz=1	Wx=1
> > > > 	Rz=1
> > > > 
> > > > 	assert(!(z=2))
> > > > 
> > > >    Forbidden by ppcmem, allowed by herd.
> > > > *)
> > > > {
> > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > }
> > > >  P0		| P1		| P2		;
> > > >  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
> > > >  xor r7,r6,r6	| lwsync	| lwsync	;
> > > >  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
> > > >  lwz r8,0(r3)	|		|		;
> > > > 
> > > > exists
> > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > > 
> > > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > > to constrain the coherence order of z to be {0->1->2}, then I think that
> > > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > > also don't think the Rz=1 in P0 changes anything.
> > 
> > What about the smp_wmb() variant of dmb that orders only stores?
> 
> Tricky, but I think it still works out if the coherence order of z is as
> I described above. The line of reasoning is weird though -- I ended up
> considering the two cases where P0 reads z before and after it reads x
> and what that means for the read of y.

By "works out" you mean that ARM prohibits the outcome?

BTW, I never have seen a real-world use for this case.  At the moment
it is mostly a cautionary tale about memory-model corner cases and
tools.

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 19:44                                     ` Linus Torvalds
@ 2016-01-26 20:10                                       ` Paul E. McKenney
  2016-01-26 22:15                                         ` Linus Torvalds
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 20:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Boqun Feng, Herbert Xu, Leonid Yegoshin,
	linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
	virtualization, Peter Anvin, sparclinux, Ingo Molnar, linux-arch,
	linux-s390, Russell King - ARM Linux, uml-devel, linux-sh,
	Michael Ellerman, the arch/x86 maintainers, xen-devel,
	Ingo Molnar, linux-xtensa, James Hogan, Arnd Bergmann,
	Stefano Stabellini, adi-buildroot-devel, David Daney,
	Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
	Linux Kernel Mailing List, Ralf Baechle, Joe Perches, ppc-dev,
	David Miller

On Tue, Jan 26, 2016 at 11:44:46AM -0800, Linus Torvalds wrote:
> On Tue, Jan 26, 2016 at 9:22 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > This is distinct from:
> 
> That may be distinct, but:
> 
> >         struct foo *x = READ_ONCE(*ptr);
> >         smp_read_barrier_depends();
> >         x->bar = 5;
> 
> This case is complete BS. Stop perpetuating it. I already removed a
> number of bogus cases of it, and I removed the incorrect documentation
> that had this crap.

If I understand your objection correctly, you want the above pattern
expressed either like this:

	struct foo *x = rcu_dereference(*ptr);
	x->bar = 5;

Or like this:

	struct foo *x = lockless_dereference(*ptr);
	x->bar = 5;

Or am I missing your point?

> It's called "smp_READ_barrier_depends()" for a reason.
> 
> Alpha is the only one that needs it, and alpha needs it only for
> dependent READS.
> 
> It's not called smp_read_write_barrier_depends(). It's not called
> "smp_mb_depends()". It's a weaker form of "smp_rmb()", nothing else.
> 
> So alpha does have an implied dependency chain from a read to a
> subsequent dependent write, and does not need any extra barriers.
> 
> Alpha does *not* have a dependency chain from a read to a subsequent
> read, which is why we need that horrible crappy
> smp_read_barrier_depends(). But it's the only reason.
> 
> This is the alpha reference manual wrt read-to-write dependency:
> 
>   5.6.1.7 Definition of Dependence Constraint
> 
>     The depends relation (DP) is defined as follows. Given u and v
>     issued by processor Pi, where u is a read or an instruction fetch
>     and v is a write, u precedes v in DP order (written u DP v, that
>     is, v depends on u) in either of the following situations:
>
>      • u determines the execution of v, the location accessed by v, or
>        the value written by v.
>      • u determines the execution or address or value of another memory
>        access z that precedes v or might precede v (that is, would
>        precede v in some execution path depending on the value read by
>        u) by processor issue constraint (see Section 5.6.1.3).
> 
> Note that the dependence barrier honors not only control flow, but
> address and data values too.  This is a different syntax than we use,
> but 'u' is the READ_ONCE, and 'v' is the write. Any data, address or
> conditional dependency between the two implies an ordering.
> 
> So no, "smp_read_barrier_depends()" is *ONLY* about two reads, where
> the second read is data-dependent on the first. Nothing else.
> 
> So if you _ever_ see a "smp_read_barrier_depends()" that isn't about a
> barrier between two reads, then that is a bug.

And the smp_read_barrier_depends() in both rcu_dereference() and
in lockless_dereference() is ordering the read-to-read case and the
underlying hardware is ordering the read-to-write case on weakly ordered
hardware.
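
To make that split concrete, here is a userspace sketch modeled on how
lockless_dereference() is put together (a READ_ONCE() followed by
smp_read_barrier_depends()). The macro bodies are simplified assumptions:
READ_ONCE() is a plain volatile cast, the barrier is a no-op as it is on
everything but Alpha, and struct foo, gp and read_and_update() are made up
for illustration. It relies on GCC/clang extensions (__typeof__ and
statement expressions).

```c
#include <assert.h>

/* Simplified stand-in for the kernel macro: a volatile load. */
#define READ_ONCE(v) (*(const volatile __typeof__(v) *)&(v))

/* No-op here, as on every architecture other than Alpha. */
#define smp_read_barrier_depends() do { } while (0)

/* Sketch of the macro's shape: load the pointer once, then the barrier. */
#define lockless_dereference(p) \
({ \
	__typeof__(p) _p1 = READ_ONCE(p); \
	smp_read_barrier_depends(); /* for later dependent reads */ \
	_p1; \
})

struct foo {
	int bar;
	int baz;
};

static struct foo obj = { .bar = 1, .baz = 2 };
static struct foo *gp = &obj;

static int read_and_update(void)
{
	struct foo *x = lockless_dereference(gp);
	int r = x->baz;	/* dependent read: what the barrier is for */

	x->bar = 5;	/* dependent write: ordered by the hardware itself */
	return r;
}
```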

Or, again, am I missing your point?

							Thanx, Paul

> The above code is crap.  It's exactly as much crap as
> 
>    a = READ_ONCE(x);
>    smp_rmb();
>    WRITE_ONCE(b, y);
> 
> because a "rmb()" simply doesn't have anything to do with
> read-vs-subsequent-write ordering.
> 
>                  Linus
> 


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 11:09                                     ` Will Deacon
@ 2016-01-26 20:11                                       ` Paul E. McKenney
  2016-01-27  8:35                                         ` [PATCH] documentation: Add disclaimer Peter Zijlstra
  2016-01-27 14:57                                         ` David Howells
  0 siblings, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 20:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 11:09:27AM +0000, Will Deacon wrote:
> On Tue, Jan 26, 2016 at 11:32:00AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 26, 2016 at 11:24:02AM +0100, Peter Zijlstra wrote:
> > 
> > > Yeah, this goes under the header: memory-barriers.txt is _NOT_ a
> > > specification (I seem to keep repeating this).
> > 
> > Do we want this ?

Seems likely to me.  ;-)

> > ---
> >  Documentation/memory-barriers.txt | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> > index a61be39c7b51..433326ebdc26 100644
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -1,3 +1,4 @@
> > +
> >  			 ============================
> >  			 LINUX KERNEL MEMORY BARRIERS
> >  			 ============================
> > @@ -5,6 +6,22 @@
> >  By: David Howells <dhowells@redhat.com>
> >      Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >  
> > +==========
> > +DISCLAIMER
> > +==========
> > +
> > +This document is not a specification; it is intentionally (for the sake of
> > +brevity) and unintentionally (due to being human) incomplete. This document is
> > +meant as a guide to using the various memory barriers provided by Linux, but
> > +in case of any doubt (and there are many) please ask.
> 
> It might be worth adding you and me to the top of the file, to save Paul
> Cc'ing us on questions (get_maintainer.pl points at poor old Corbet for
> this file).
> 
> But yes, it seems that something like this is required.

So Peter, would you like to update your patch to include yourself
and Will as authors?

						Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 10:19                                           ` Peter Zijlstra
@ 2016-01-26 20:13                                             ` Paul E. McKenney
  2016-01-27  8:39                                               ` Peter Zijlstra
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 20:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Leonid Yegoshin, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 11:19:27AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> 
> > > > > Yes, that seems a good start. But yesterday you raised the 'fun' point
> > > > > of two globally ordered sequences connected by a single local link.
> > > > 
> > > > The conclusion that I am slowly coming to is that litmus tests should
> > > > not be thought of as linear chains, but rather as cycles.  If you think
> > > > of it as a cycle, then it doesn't matter where the local link is, just
> > > > how many of them and how they are connected.
> > > 
> > > Do you have some examples of this? I'm struggling to make it work in my
> > > mind, or are you talking specifically in the context of the kernel
> > > memory model?
> > 
> > Now that you mention it, maybe it would be best to keep the transitive
> > and non-transitive separate for the time being anyway.  Just because it
> > might be possible to deal with does not necessarily mean that we should
> > be encouraging it.  ;-)
> 
> So isn't smp_mb__after_unlock_lock() exactly such a scenario? And would
> not someone trying to implement RCsc locks using locally transitive
> RELEASE/ACQUIRE operations need exactly this stuff?
> 
> That is, I am afraid we need to cover the mix of local and global
> transitive operations at least in overview.

True, but we haven't gotten to locking yet.  That said, I would argue
that smp_mb__after_unlock_lock() upgrades locks to transitive, and
thus would not be an exception to the "no combining transitive and
non-transitive steps in cycles" rule.

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 20:10                                       ` Paul E. McKenney
@ 2016-01-26 22:15                                         ` Linus Torvalds
  2016-01-26 22:33                                           ` Linus Torvalds
  0 siblings, 1 reply; 153+ messages in thread
From: Linus Torvalds @ 2016-01-26 22:15 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Peter Zijlstra, Boqun Feng, Herbert Xu, Leonid Yegoshin,
	linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
	virtualization, Peter Anvin, sparclinux, Ingo Molnar, linux-arch,
	linux-s390, Russell King - ARM Linux, uml-devel, linux-sh,
	Michael Ellerman, the arch/x86 maintainers, xen-devel,
	Ingo Molnar, linux-xtensa, James Hogan, Arnd Bergmann,
	Stefano Stabellini, adi-buildroot-devel, David Daney,
	Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
	Linux Kernel Mailing List, Ralf Baechle, Joe Perches, ppc-dev,
	David Miller

On Tue, Jan 26, 2016 at 12:10 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Jan 26, 2016 at 11:44:46AM -0800, Linus Torvalds wrote:
>>
>> >         struct foo *x = READ_ONCE(*ptr);
>> >         smp_read_barrier_depends();
>> >         x->bar = 5;
>>
>> This case is complete BS. Stop perpetuating it. I already removed a
>> number of bogus cases of it, and I removed the incorrect documentation
>> that had this crap.
>
> If I understand your objection correctly, you want the above pattern
> expressed either like this:
>
>         struct foo *x = rcu_dereference(*ptr);
>         x->bar = 5;
>
> Or like this:
>
>         struct foo *x = lockless_dereference(*ptr);
>         x->bar = 5;
>
> Or am I missing your point?

You are entirely missing the point.

You might as well just write it as

    struct foo *x = READ_ONCE(*ptr);
    x->bar = 5;

because that "smp_read_barrier_depends()" does NOTHING wrt the second write.

So what I am saying is simple: anybody who writes that
"smp_read_barrier_depends()" in there is just totally and completely
WRONG, and the fact that Peter wrote it out after I removed several
instances of that bloody f*cking idiocy is disturbing.

Don't do it. It's BS. It's wrong. Don't make excuses for it.

             Linus


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 22:15                                         ` Linus Torvalds
@ 2016-01-26 22:33                                           ` Linus Torvalds
  2016-01-26 23:29                                             ` Paul E. McKenney
  2016-01-27  7:51                                             ` Peter Zijlstra
  0 siblings, 2 replies; 153+ messages in thread
From: Linus Torvalds @ 2016-01-26 22:33 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Peter Zijlstra, Boqun Feng, Herbert Xu, Leonid Yegoshin,
	linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
	virtualization, Peter Anvin, sparclinux, Ingo Molnar, linux-arch,
	linux-s390, Russell King - ARM Linux, uml-devel, linux-sh,
	Michael Ellerman, the arch/x86 maintainers, xen-devel,
	Ingo Molnar, linux-xtensa, James Hogan, Arnd Bergmann,
	Stefano Stabellini, adi-buildroot-devel, David Daney,
	Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
	Linux Kernel Mailing List, Ralf Baechle, Joe Perches, ppc-dev,
	David Miller

On Tue, Jan 26, 2016 at 2:15 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> You might as well just write it as
>
>     struct foo *x = READ_ONCE(*ptr);
>     x->bar = 5;
>
> because that "smp_read_barrier_depends()" does NOTHING wrt the second write.

Just to clarify: on alpha it adds a memory barrier, but that memory
barrier is useless.

On non-alpha, it is a no-op, and obviously does nothing simply because
it generates no code.

So if anybody believes that the "smp_read_barrier_depends()" does
something, they are *wrong*.

And if anybody sends out an email with that smp_read_barrier_depends()
in an example, they are actively just confusing other people, which is
even worse than just being wrong. Which is why I jumped in.

So stop perpetuating the myth that smp_read_barrier_depends() does
something here. It does not. It's a bug, and it has become this "mind
virus" for some people that seem to believe that it does something.

I had to remove this crap once from the kernel already, see commit
105ff3cbf225 ("atomic: remove all traces of READ_ONCE_CTRL() and
atomic*_read_ctrl()").

I don't want to ever see that broken construct again. And I want to
make sure that everybody is educated about how broken it was. I'm
extremely unhappy that it came up again.

If it turns out that some architecture does actually need a barrier
between a read and a dependent write, then that will mean that

 (a) we'll have to make up a _new_ barrier, because
"smp_read_barrier_depends()" is not that barrier. We'll presumably
then have to make that new barrier part of "rcu_dereference()" and
friends.

 (b) we will have found an architecture with even worse memory
ordering semantics than alpha, and we'll have to stop castigating
alpha for being the worst memory ordering ever.

but I sincerely hope that we'll never find that kind of broken architecture.

               Linus


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 22:33                                           ` Linus Torvalds
@ 2016-01-26 23:29                                             ` Paul E. McKenney
  2016-01-26 23:45                                               ` Linus Torvalds
  2016-01-27  2:04                                               ` Boqun Feng
  2016-01-27  7:51                                             ` Peter Zijlstra
  1 sibling, 2 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 23:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Boqun Feng, Herbert Xu, Leonid Yegoshin,
	linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
	virtualization, Peter Anvin, sparclinux, Ingo Molnar, linux-arch,
	linux-s390, Russell King - ARM Linux, uml-devel, linux-sh,
	Michael Ellerman, the arch/x86 maintainers, xen-devel,
	Ingo Molnar, linux-xtensa, James Hogan, Arnd Bergmann,
	Stefano Stabellini, adi-buildroot-devel, David Daney,
	Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
	Linux Kernel Mailing List, Ralf Baechle, Joe Perches, ppc-dev,
	David Miller

On Tue, Jan 26, 2016 at 02:33:40PM -0800, Linus Torvalds wrote:
> On Tue, Jan 26, 2016 at 2:15 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > You might as well just write it as
> >
> >     struct foo *x = READ_ONCE(*ptr);
> >     x->bar = 5;
> >
> > because that "smp_read_barrier_depends()" does NOTHING wrt the second write.
> 
> Just to clarify: on alpha it adds a memory barrier, but that memory
> barrier is useless.

No trailing data-dependent read, so agreed, no smp_read_barrier_depends()
needed.  That said, I believe that we should encourage rcu_dereference*()
or lockless_dereference() instead of READ_ONCE() for documentation
reasons.

> On non-alpha, it is a no-op, and obviously does nothing simply because
> it generates no code.
> 
> So if anybody believes that the "smp_read_barrier_depends()" does
> something, they are *wrong*.

The other problem with smp_read_barrier_depends() is that it is often
a pain figuring out which prior load it is supposed to apply to.
Hence my preference for rcu_dereference*() and lockless_dereference().

> And if anybody sends out an email with that smp_read_barrier_depends()
> in an example, they are actively just confusing other people, which is
> even worse than just being wrong. Which is why I jumped in.
> 
> So stop perpetuating the myth that smp_read_barrier_depends() does
> something here. It does not. It's a bug, and it has become this "mind
> virus" for some people that seem to believe that it does something.

It looks like I should add words to memory-barriers.txt de-emphasizing
smp_read_barrier_depends().  I will take a look at that.

> I had to remove this crap once from the kernel already, see commit
> 105ff3cbf225 ("atomic: remove all traces of READ_ONCE_CTRL() and
> atomic*_read_ctrl()").
> 
> I don't want to ever see that broken construct again. And I want to
> make sure that everybody is educated about how broken it was. I'm
> extremely unhappy that it came up again.

Well, if it makes you feel better, that was control dependencies and this
was data dependencies.  So it was not -exactly- the same.  ;-)

(Sorry, couldn't resist...)

> If it turns out that some architecture does actually need a barrier
> between a read and a dependent write, then that will mean that
> 
>  (a) we'll have to make up a _new_ barrier, because
> "smp_read_barrier_depends()" is not that barrier. We'll presumably
> then have to make that new barrier part of "rcu_dereference()" and
> friends.

Agreed.  We can worry about whether or not we replace the current
smp_read_barrier_depends() with that new barrier when and if such
hardware appears.

>  (b) we will have found an architecture with even worse memory
> ordering semantics than alpha, and we'll have to stop castigating
> alpha for being the worst memory ordering ever.

;-) ;-) ;-)

> but I sincerely hope that we'll never find that kind of broken architecture.

Apparently at least some hardware vendors are reading memory-barriers.txt,
so perhaps the odds of that kind of breakage have reduced.

								Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 12:10                                           ` Will Deacon
@ 2016-01-26 23:37                                             ` Paul E. McKenney
  2016-01-27 10:23                                               ` Will Deacon
  0 siblings, 1 reply; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-26 23:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 12:10:10PM +0000, Will Deacon wrote:
> On Mon, Jan 25, 2016 at 05:06:46PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 02:41:34PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 11:28:45AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 09:54:01AM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> > > > > > See my earlier reply [1] (but also, your WRC Linux example looks more
> > > > > > like a variant on WWC and I couldn't really follow it).
> > > > > 
> > > > > I will revisit my WRC Linux example.  And yes, creating litmus tests
> > > > > that use non-fake dependencies is still a bit of an undertaking.  :-/
> > > > > I am sure that it will seem more natural with time and experience...
> > > > 
> > > > Hmmm...  You are quite right, I did do WWC.  I need to change cpu2()'s
> > > > last access from a store to a load to get WRC.  Plus the levels of
> > > > indirection definitely didn't match up, did they?
> > > 
> > > Nope, it was pretty baffling!
> > 
> > "It is a service that I provide."  ;-)
> > 
> > > > 	struct foo {
> > > > 		struct foo *next;
> > > > 	};
> > > > 	struct foo a;
> > > > 	struct foo b;
> > > > 	struct foo c = { &a };
> > > > 	struct foo d = { &b };
> > > > 	struct foo x = { &c };
> > > > 	struct foo y = { &d };
> > > > 	struct foo *r1, *r2, *r3;
> > > > 
> > > > 	void cpu0(void)
> > > > 	{
> > > > 		WRITE_ONCE(x.next, &y);
> > > > 	}
> > > > 
> > > > 	void cpu1(void)
> > > > 	{
> > > > 		r1 = lockless_dereference(x.next);
> > > > 		WRITE_ONCE(r1->next, &x);
> > > > 	}
> > > > 
> > > > 	void cpu2(void)
> > > > 	{
> > > > 		r2 = lockless_dereference(y.next);
> > > > 		r3 = READ_ONCE(r2->next);
> > > > 	}
> > > > 
> > > > In this case, it is legal to end the run with:
> > > > 
> > > > 	r1 == &y && r2 == &x && r3 == &c
> > > > 
> > > > Please see below for a ppcmem litmus test.
> > > > 
> > > > So, did I get it right this time?  ;-)
> > > 
> > > The code above looks correct to me (in that it matches WRC+addrs),
> > > but your litmus test:
> > > 
> > > > PPC WRCnf+addrs
> > > > ""
> > > > {
> > > > 0:r2=x; 0:r3=y;
> > > > 1:r2=x; 1:r3=y;
> > > > 2:r2=x; 2:r3=y;
> > > > c=a; d=b; x=c; y=d;
> > > > }
> > > >  P0           | P1            | P2            ;
> > > >  stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
> > > >               | stw r2,0(r3)  | lwz r9,0(r8)  ;
> > > > exists
> > > > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
> > > 
> > > Seems to be missing the address dependency on P1.
> > 
> > You are quite correct!  How about the following?
> 
> I think that's it!
> 
> > As before, both herd and ppcmem say that the cycle is allowed, as
> > expected, given non-transitive ordering.  To prohibit the cycle, P1
> > needs a suitable memory-barrier instruction.
> > 
> > ------------------------------------------------------------------------
> > 
> > PPC WRCnf+addrs
> > ""
> > {
> > 0:r2=x; 0:r3=y;
> > 1:r2=x; 1:r3=y;
> > 2:r2=x; 2:r3=y;
> > c=a; d=b; x=c; y=d;
> > }
> >  P0           | P1            | P2            ;
> >  stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
> >               | stw r2,0(r8)  | lwz r9,0(r8)  ;
> > exists
> > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
> 
> Agreed.

OK, thank you!  Would you agree that it would be good to replace the
current xor-based fake-dependency litmus tests with tests having real
dependencies?
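
For reference, a sequential userspace rendering of the C version quoted
above. It is only a sketch: lockless_dereference() is replaced by a
simplified READ_ONCE() (a volatile cast using the GCC/clang __typeof__
extension), the three "CPUs" are ordinary functions called one after another
by a made-up run_in_order() helper, and so it shows only the outcome a
single in-order interleaving produces, not the weakly ordered one.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel macros: volatile accesses only. */
#define READ_ONCE(v)       (*(const volatile __typeof__(v) *)&(v))
#define WRITE_ONCE(v, val) (*(volatile __typeof__(v) *)&(v) = (val))

struct foo {
	struct foo *next;
};

static struct foo a, b;
static struct foo c = { &a };
static struct foo d = { &b };
static struct foo x = { &c };
static struct foo y = { &d };
static struct foo *r1, *r2, *r3;

static void cpu0(void)
{
	WRITE_ONCE(x.next, &y);
}

static void cpu1(void)
{
	r1 = READ_ONCE(x.next);
	WRITE_ONCE(r1->next, &x);	/* address depends on r1 */
}

static void cpu2(void)
{
	r2 = READ_ONCE(y.next);
	r3 = READ_ONCE(r2->next);	/* address depends on r2 */
}

/* One in-order interleaving: cpu0, then cpu1, then cpu2. */
static void run_in_order(void)
{
	cpu0();
	cpu1();
	cpu2();
}
```

Run this way, r1 == &y and r2 == &x as in the litmus test, but r3 ends up as
&y. The outcome under discussion is the one where r3 == &c, that is, cpu2()
sees cpu1()'s store but not the cpu0() store that cpu1() had already read.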

							Thanx, Paul


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 23:29                                             ` Paul E. McKenney
@ 2016-01-26 23:45                                               ` Linus Torvalds
  2016-01-27  0:57                                                 ` Paul E. McKenney
  2016-01-27  2:04                                               ` Boqun Feng
  1 sibling, 1 reply; 153+ messages in thread
From: Linus Torvalds @ 2016-01-26 23:45 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Peter Zijlstra, Boqun Feng, Herbert Xu, Leonid Yegoshin,
	linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
	virtualization, Peter Anvin, sparclinux, Ingo Molnar, linux-arch,
	linux-s390, Russell King - ARM Linux, uml-devel, linux-sh,
	Michael Ellerman, the arch/x86 maintainers, xen-devel,
	Ingo Molnar, linux-xtensa, James Hogan, Arnd Bergmann,
	Stefano Stabellini, adi-buildroot-devel, David Daney,
	Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
	Linux Kernel Mailing List, Ralf Baechle, Joe Perches, ppc-dev,
	David Miller

On Tue, Jan 26, 2016 at 3:29 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>
> No trailing data-dependent read, so agreed, no smp_read_barrier_depends()
> needed.  That said, I believe that we should encourage rcu_dereference*()
> or lockless_dereference() instead of READ_ONCE() for documentation
> reasons.

I agree that that is likely the right thing to do in pretty much all situations.

In theory, there might be performance situations where we'd want to
actively avoid the smp_read_barrier_depends() inherent in those, but
considering that it's only a performance issue on alpha, and we
probably have all of two or three people using Linux on alpha, it's a
pretty theoretical performance worry.

                  Linus


* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 23:45                                               ` Linus Torvalds
@ 2016-01-27  0:57                                                 ` Paul E. McKenney
  0 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-27  0:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, Boqun Feng, Herbert Xu, Leonid Yegoshin,
	linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
	virtualization, Peter Anvin, sparclinux, Ingo Molnar, linux-arch,
	linux-s390, Russell King - ARM Linux, uml-devel, linux-sh,
	Michael Ellerman, the arch/x86 maintainers, xen-devel,
	Ingo Molnar, linux-xtensa, James Hogan, Arnd Bergmann,
	Stefano Stabellini, adi-buildroot-devel, David Daney,
	Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
	Linux Kernel Mailing List, Ralf Baechle, Joe Perches, ppc-dev,
	David Miller

On Tue, Jan 26, 2016 at 03:45:23PM -0800, Linus Torvalds wrote:
> On Tue, Jan 26, 2016 at 3:29 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > No trailing data-dependent read, so agreed, no smp_read_barrier_depends()
> > needed.  That said, I believe that we should encourage rcu_dereference*()
> > or lockless_dereference() instead of READ_ONCE() for documentation
> > reasons, though.
> 
> I agree that that is likely the right thing to do in pretty much all situations.
> 
> In theory, there might be performance situations where we'd want to
> actively avoid the smp_read_barrier_depends() inherent in those, but
> considering that it's only a performance issue on alpha, and we
> probably have all of two or three people using Linux on alpha, it's a
> pretty theoretical performance worry.

Agreed!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 23:29                                             ` Paul E. McKenney
  2016-01-26 23:45                                               ` Linus Torvalds
@ 2016-01-27  2:04                                               ` Boqun Feng
  2016-01-27 23:30                                                 ` Paul E. McKenney
  1 sibling, 1 reply; 153+ messages in thread
From: Boqun Feng @ 2016-01-27  2:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Linus Torvalds, Peter Zijlstra, Herbert Xu, Leonid Yegoshin,
	linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
	virtualization, Peter Anvin, sparclinux, Ingo Molnar, linux-arch,
	linux-s390, Russell King - ARM Linux, uml-devel, linux-sh,
	Michael Ellerman, the arch/x86 maintainers, xen-devel,
	Ingo Molnar, linux-xtensa, James Hogan, Arnd Bergmann,
	Stefano Stabellini, adi-buildroot-devel, David Daney,
	Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
	Linux Kernel Mailing List, Ralf Baechle, Joe Perches, ppc-dev,
	David Miller

On Tue, Jan 26, 2016 at 03:29:21PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 26, 2016 at 02:33:40PM -0800, Linus Torvalds wrote:
> > On Tue, Jan 26, 2016 at 2:15 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > You might as well just write it as
> > >
> > >     struct foo *x = READ_ONCE(*ptr);
> > >     x->bar = 5;
> > >
> > > because that "smp_read_barrier_depends()" does NOTHING wrt the second write.
> > 
> > Just to clarify: on alpha it adds a memory barrier, but that memory
> > barrier is useless.
> 
> No trailing data-dependent read, so agreed, no smp_read_barrier_depends()
> needed.  That said, I believe that we should encourage rcu_dereference*()
> or lockless_dereference() instead of READ_ONCE() for documentation
> reasons, though.
> 
> > On non-alpha, it is a no-op, and obviously does nothing simply because
> > it generates no code.
> > 
> > So if anybody believes that the "smp_read_barrier_depends()" does
> > something, they are *wrong*.
> 
> The other problem with smp_read_barrier_depends() is that it is often
> a pain figuring out which prior load it is supposed to apply to.
> Hence my preference for rcu_dereference*() and lockless_dereference().
> 

Because semantically speaking, rcu_dereference*() and
lockless_dereference() are CONSUME (i.e. data/address dependent
read->read and read->write pairs are ordered), whereas
smp_read_barrier_depends() only guarantees read->read pairs with data
dependency are ordered, right?
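
To make the distinction concrete, here is a minimal userspace sketch,
assuming GCC or Clang. foo_dereference(), publish() and reader() are
hypothetical names, and the barrier macro is a stand-in modelled on the
kernel's, not the kernel's own definition:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel macro: a real barrier on Alpha, a no-op elsewhere. */
#if defined(__alpha__)
#define smp_read_barrier_depends() __asm__ __volatile__("mb" : : : "memory")
#else
#define smp_read_barrier_depends() do { } while (0)
#endif

struct foo {
	int bar;
	int baz;
};

static struct foo *gp;	/* shared pointer, NULL until published */

/* Hypothetical typed equivalent of lockless_dereference(gp). */
static struct foo *foo_dereference(struct foo **pp)
{
	struct foo *p;

	assert(pp);
	p = *(struct foo *volatile *)pp;	/* READ_ONCE()-style load */
	smp_read_barrier_depends();	/* orders later dependent loads on Alpha */
	return p;
}

/*
 * Single-threaded stand-in for publication; kernel code would use
 * smp_store_release() or rcu_assign_pointer() here.
 */
static void publish(struct foo *p)
{
	gp = p;
}

static int reader(void)
{
	struct foo *p = foo_dereference(&gp);
	int seen;

	if (!p)
		return -1;
	seen = p->bar;		/* dependent load (read->read) */
	p->baz = seen + 1;	/* dependent store (read->write) */
	return seen;
}
```

The sketch only marks where the two kinds of dependent access sit; it
does not by itself demonstrate any ordering, which depends on the
hardware and on a concurrent writer.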

If so, maybe we need to call it out in memory-barriers.txt, for example:

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 904ee42..6b262c2 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1703,8 +1703,8 @@ There are some more advanced barrier functions:
 
 
  (*) lockless_dereference();
-     This can be thought of as a pointer-fetch wrapper around the
-     smp_read_barrier_depends() data-dependency barrier.
+     This is a load, and any load or store that has a data dependency on the
+     value returned by this load won't be reordered before this load.
 
      This is also similar to rcu_dereference(), but in cases where
      object lifetime is handled by some mechanism other than RCU, for


Regards,
Boqun

> > And if anybody sends out an email with that smp_read_barrier_depends()
> > in an example, they are actively just confusing other people, which is
> > even worse than just being wrong. Which is why I jumped in.
> > 
> > So stop perpetuating the myth that smp_read_barrier_depends() does
> > something here. It does not. It's a bug, and it has become this "mind
> > virus" for some people that seem to believe that it does something.
> 
> It looks like I should add words to memory-barriers.txt de-emphasizing
> smp_read_barrier_depends().  I will take a look at that.
> 
> > I had to remove this crap once from the kernel already, see commit
> > 105ff3cbf225 ("atomic: remove all traces of READ_ONCE_CTRL() and
> > atomic*_read_ctrl()").
> > 
> > I don't want to ever see that broken construct again. And I want to
> > make sure that everybody is educated about how broken it was. I'm
> > extremely unhappy that it came up again.
> 
> Well, if it makes you feel better, that was control dependencies and this
> was data dependencies.  So it was not -exactly- the same.  ;-)
> 
> (Sorry, couldn't resist...)
> 
> > If it turns out that some architecture does actually need a barrier
> > between a read and a dependent write, then that will mean that
> > 
> >  (a) we'll have to make up a _new_ barrier, because
> > "smp_read_barrier_depends()" is not that barrier. We'll presumably
> > then have to make that new barrier part of "rcu_dereference()" and
> > friends.
> 
> Agreed.  We can worry about whether or not we replace the current
> smp_read_barrier_depends() with that new barrier when and if such
> hardware appears.
> 
> >  (b) we will have found an architecture with even worse memory
> > ordering semantics than alpha, and we'll have to stop castigating
> > alpha for being the worst memory ordering ever.
> 
> ;-) ;-) ;-)
> 
> > but I sincerely hope that we'll never find that kind of broken architecture.
> 
> Apparently at least some hardware vendors are reading memory-barriers.txt,
> so perhaps the odds of that kind of breakage have reduced.
> 
> 								Thanx, Paul
> 

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 22:33                                           ` Linus Torvalds
  2016-01-26 23:29                                             ` Paul E. McKenney
@ 2016-01-27  7:51                                             ` Peter Zijlstra
  1 sibling, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-27  7:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Boqun Feng, Herbert Xu, Leonid Yegoshin,
	linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
	virtualization, Peter Anvin, sparclinux, Ingo Molnar, linux-arch,
	linux-s390, Russell King - ARM Linux, uml-devel, linux-sh,
	Michael Ellerman, the arch/x86 maintainers, xen-devel,
	Ingo Molnar, linux-xtensa, James Hogan, Arnd Bergmann,
	Stefano Stabellini, adi-buildroot-devel, David Daney,
	Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
	Linux Kernel Mailing List, Ralf Baechle, Joe Perches, ppc-dev,
	David Miller

On Tue, Jan 26, 2016 at 02:33:40PM -0800, Linus Torvalds wrote:

> If it turns out that some architecture does actually need a barrier
> between a read and a dependent write, then that will mean that
> 
>  (a) we'll have to make up a _new_ barrier, because
> "smp_read_barrier_depends()" is not that barrier. We'll presumably
> then have to make that new barrier part of "rcu_dereference()" and
> friends.
> 
>  (b) we will have found an architecture with even worse memory
> ordering semantics than alpha, and we'll have to stop castigating
> alpha for being the worst memory ordering ever.
> 
> but I sincerely hope that we'll never find that kind of broken architecture.

So for a moment it looked like MIPS wanted to equal or surpass Alpha in
this respect.

And Paul made the point that smp_read_barrier_depends() really should
be smp_acquire_barrier_depends() in that we rely on both dependent reads
and writes to be ordered against the initial pointer load.

Now, as you've made abundantly clear, Alpha does this, although it needs
the little extra help in the dependent read department.

The 'problem' is that someone seems to have used our
Documentation/memory-barriers.txt as a specification of what hardware
is permitted to do and what we require. And in that light Paul noted that
read_barrier_depends really should be considered an
acquire_barrier_depends and order both dependent reads and writes
against the (prior) read (if nothing else already does).

Now clearly, any sane architecture doesn't need anything like this, but
again our document doesn't seem to judge. That is, from reading the
document one can get the impression that this is a perfectly fine thing to do.
Nowhere does our disdain for this thing show.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* [PATCH] documentation: Add disclaimer
  2016-01-26 20:11                                       ` Paul E. McKenney
@ 2016-01-27  8:35                                         ` Peter Zijlstra
  2016-01-27 10:11                                           ` Will Deacon
  2016-04-14 21:40                                           ` Paul E. McKenney
  2016-01-27 14:57                                         ` David Howells
  1 sibling, 2 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-27  8:35 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, Leonid Yegoshin, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 12:11:43PM -0800, Paul E. McKenney wrote:
> So Peter, would you like to update your patch to include yourself
> and Will as authors?

Sure, here goes.

---
Subject: documentation: Add disclaimer

It appears people are reading this document as a requirements list for
building hardware. This is not the intent of this document. Nor is it
particularly suited for this purpose.

The primary purpose of this document is our collective attempt to define
a set of primitives that (hopefully) allow us to write correct code on
the myriad of SMP platforms Linux supports.

It's a definite work in progress as our understanding of these platforms,
and memory ordering in general, progresses.

Nor does being mentioned in this document mean we think it's a
particularly good idea; the data dependency barrier required by Alpha
being a prime example. Yes we have it, no you're insane to require it
when building new hardware.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/memory-barriers.txt | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index a61be39c7b51..98626125f484 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -4,8 +4,24 @@
 
 By: David Howells <dhowells@redhat.com>
     Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+    Will Deacon <will.deacon@arm.com>
+    Peter Zijlstra <peterz@infradead.org>
 
-Contents:
+==========
+DISCLAIMER
+==========
+
+This document is not a specification; it is intentionally (for the sake of
+brevity) and unintentionally (due to being human) incomplete. This document is
+meant as a guide to using the various memory barriers provided by Linux, but
+in case of any doubt (and there are many) please ask.
+
+I repeat, this document is not a specification of what Linux expects from
+hardware.
+
+========
+CONTENTS
+========
 
  (*) Abstract memory access model.
 

^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 20:13                                             ` Paul E. McKenney
@ 2016-01-27  8:39                                               ` Peter Zijlstra
  0 siblings, 0 replies; 153+ messages in thread
From: Peter Zijlstra @ 2016-01-27  8:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, Leonid Yegoshin, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 12:13:39PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 26, 2016 at 11:19:27AM +0100, Peter Zijlstra wrote:

> > So isn't smp_mb__after_unlock_lock() exactly such a scenario? And would
> > not someone trying to implement RCsc locks using locally transitive
> > RELEASE/ACQUIRE operations need exactly this stuff?
> > 
> > That is, I am afraid we need to cover the mix of local and global
> > transitive operations at least in overview.
> 
> True, but we haven't gotten to locking yet.

The mythical smp_mb__after_release_acquire() then ;-)

(and yes, I know you're going to say we don't have that)

> That said, I would argue
> that smp_mb__after_unlock_lock() upgrades locks to transitive, and
> thus would not be an exception to the "no combining transitive and
> non-transitive steps in cycles" rule.

But But But ;-) It does that exactly by combining. I suspect this is
(partly) the source of your SC chains with one PC link example.

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] documentation: Add disclaimer
  2016-01-27  8:35                                         ` [PATCH] documentation: Add disclaimer Peter Zijlstra
@ 2016-01-27 10:11                                           ` Will Deacon
  2016-04-14 21:40                                           ` Paul E. McKenney
  1 sibling, 0 replies; 153+ messages in thread
From: Will Deacon @ 2016-01-27 10:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Wed, Jan 27, 2016 at 09:35:46AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 26, 2016 at 12:11:43PM -0800, Paul E. McKenney wrote:
> > So Peter, would you like to update your patch to include yourself
> > and Will as authors?
> 
> Sure, here goes.
> 
> ---
> Subject: documentation: Add disclaimer
> 
> It appears people are reading this document as a requirements list for
> building hardware. This is not the intent of this document. Nor is it
> particularly suited for this purpose.
> 
> The primary purpose of this document is our collective attempt to define
> a set of primitives that (hopefully) allow us to write correct code on
> the myriad of SMP platforms Linux supports.
> 
> It's a definite work in progress as our understanding of these platforms,
> and memory ordering in general, progresses.
> 
> Nor does being mentioned in this document mean we think it's a
> particularly good idea; the data dependency barrier required by Alpha
> being a prime example. Yes we have it, no you're insane to require it
> when building new hardware.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  Documentation/memory-barriers.txt | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)

Acked-by: Will Deacon <will.deacon@arm.com>

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 23:37                                             ` Paul E. McKenney
@ 2016-01-27 10:23                                               ` Will Deacon
  0 siblings, 0 replies; 153+ messages in thread
From: Will Deacon @ 2016-01-27 10:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 03:37:33PM -0800, Paul E. McKenney wrote:
> On Tue, Jan 26, 2016 at 12:10:10PM +0000, Will Deacon wrote:
> > On Mon, Jan 25, 2016 at 05:06:46PM -0800, Paul E. McKenney wrote:
> > > PPC WRCnf+addrs
> > > ""
> > > {
> > > 0:r2=x; 0:r3=y;
> > > 1:r2=x; 1:r3=y;
> > > 2:r2=x; 2:r3=y;
> > > c=a; d=b; x=c; y=d;
> > > }
> > >  P0           | P1            | P2            ;
> > >  stw r3,0(r2) | lwz r8,0(r2)  | lwz r8,0(r3)  ;
> > >               | stw r2,0(r8)  | lwz r9,0(r8)  ;
> > > exists
> > > (1:r8=y /\ 2:r8=x /\ 2:r9=c)
> > 
> > Agreed.
> 
> OK, thank you!  Would you agree that it would be good to replace the
> current xor-based fake-dependency litmus tests with tests having real
> dependencies?

Yes, because it would look a lot more like real (kernel) code.
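
As a rough C-level picture of the two styles (a sketch only, with
hypothetical function names, not a litmus test): the xor form computes
an index that is always zero, which the hardware in a litmus test
still treats as a dependency but which a C compiler is free to fold
away, whereas the real form dereferences the pointer it has just
loaded.

```c
#include <assert.h>

/*
 * Fake dependency, mirroring the xor/lwzx pair in the litmus tests:
 * the index is computed from the loaded value but is always zero.
 */
static int load_fake_dep(const volatile int *flag, const int *data)
{
	int r6, r7;

	assert(flag && data);
	r6 = *flag;
	r7 = r6 ^ r6;		/* always 0; a C compiler may fold this to a constant */
	return data[r7];
}

/* Real address dependency: the loaded value is itself the pointer. */
static int load_real_dep(int *volatile *slot)
{
	int *p;

	assert(slot);
	p = *slot;		/* load the pointer once */
	return *p;		/* address depends on the value just loaded */
}
```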

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-26 19:58                                             ` Paul E. McKenney
@ 2016-01-27 10:25                                               ` Will Deacon
  2016-01-27 23:32                                                 ` Paul E. McKenney
  0 siblings, 1 reply; 153+ messages in thread
From: Will Deacon @ 2016-01-27 10:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Tue, Jan 26, 2016 at 11:58:20AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 26, 2016 at 12:16:09PM +0000, Will Deacon wrote:
> > On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > > PPC Overlapping Group-B sets version 4
> > > > > ""
> > > > > (* When the Group-B sets from two different barriers involve instructions in
> > > > >    the same thread, within that thread one set must contain the other.
> > > > > 
> > > > > 	P0	P1	P2
> > > > > 	Rx=1	Wy=1	Wz=2
> > > > > 	dep.	lwsync	lwsync
> > > > > 	Ry=0	Wz=1	Wx=1
> > > > > 	Rz=1
> > > > > 
> > > > > 	assert(!(z=2))
> > > > > 
> > > > >    Forbidden by ppcmem, allowed by herd.
> > > > > *)
> > > > > {
> > > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > > }
> > > > >  P0		| P1		| P2		;
> > > > >  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
> > > > >  xor r7,r6,r6	| lwsync	| lwsync	;
> > > > >  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
> > > > >  lwz r8,0(r3)	|		|		;
> > > > > 
> > > > > exists
> > > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > > > 
> > > > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > > > to constrain the coherence order of z to be {0->1->2}, then I think that
> > > > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > > > also don't think the Rz=1 in P0 changes anything.
> > > 
> > > What about the smp_wmb() variant of dmb that orders only stores?
> > 
> > Tricky, but I think it still works out if the coherence order of z is as
> > I described above. The line of reasoning is weird though -- I ended up
> > considering the two cases where P0 reads z before and after it reads x
> > and what that means for the read of y.
> 
> By "works out" you mean that ARM prohibits the outcome?

Yes, that's my understanding.

Will

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15  0:47                                   ` Paul E. McKenney
  2016-01-15  1:07                                     ` Leonid Yegoshin
@ 2016-01-27 10:40                                     ` Ralf Baechle
  2016-01-27 12:09                                       ` Maciej W. Rozycki
  1 sibling, 1 reply; 153+ messages in thread
From: Ralf Baechle @ 2016-01-27 10:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Leonid Yegoshin, Will Deacon, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ingo Molnar, ddaney.cavm, james.hogan,
	Michael Ellerman

On Thu, Jan 14, 2016 at 04:47:53PM -0800, Paul E. McKenney wrote:

> So you need to build a different kernel for some types of MIPS systems?

Yes.  We can't really do without.  Classic MIPS code is not relocatable
without the complexity of PIC code as used by ELF DSOs - and their
performance penalty.  Plus we have a number of architecture revisions
over the decades, big and little endian, 32 and 64 bit as the major
stumbling blocks.  There however are groups of similar systems that
can share kernel binaries.

> Or do you do boot-time rewriting, like a number of other arches do?

We don't rewrite the code (as in the .text of the vmlinux binary) but we
do runtime code generation for a few highly performance-sensitive areas
of the kernel code such as copy_page() or TLB exception handlers.  This
allows more flexibility than just inserting templates into the kernel
code.  Downside - it means we have some of the complexity of as and ld
in the kernel.

  Ralf

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-15  1:07                                     ` Leonid Yegoshin
@ 2016-01-27 11:26                                       ` Maciej W. Rozycki
  2016-01-28  0:58                                         ` Leonid Yegoshin
       [not found]                                         ` <56A9656D.3080707@imgtec.com>
  0 siblings, 2 replies; 153+ messages in thread
From: Maciej W. Rozycki @ 2016-01-27 11:26 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: paulmck, Will Deacon, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Fri, 15 Jan 2016, Leonid Yegoshin wrote:

> > So you need to build a different kernel for some types of MIPS systems?
> > Or do you do boot-time rewriting, like a number of other arches do?
> 
> I don't know. I would like to have responses. Ralf asked Maciej about old
> systems and that came nowhere. Even rewrite - don't know what to do with that:
> no lightweight SYNC or no SYNC at all - yes, it is still possible that SYNC on
> some systems can be too heavy or even harmful, nobody tested that.

 I don't recall being asked; mind that I might not get to messages I have 
not been cc-ed in a timely manner and I may miss some altogether.  With 
the amount of mailing list traffic that passes by me my scanner may fail 
to trigger.  Sorry if this causes anybody trouble, but such is life.

 Coincidentally, I have just posted some notes on SYNC in a different 
thread, see <http://lkml.iu.edu/hypermail/linux/kernel/1601.3/03080.html>.  
There's a reference to an older message of mine there too.  I hope this 
answers your questions.

  Maciej

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-27 10:40                                     ` Ralf Baechle
@ 2016-01-27 12:09                                       ` Maciej W. Rozycki
  0 siblings, 0 replies; 153+ messages in thread
From: Maciej W. Rozycki @ 2016-01-27 12:09 UTC (permalink / raw)
  To: Ralf Baechle
  Cc: Matt Redfearn, Paul E. McKenney, Leonid Yegoshin, Will Deacon,
	Peter Zijlstra, Michael S. Tsirkin, linux-kernel, Arnd Bergmann,
	linux-arch, Andrew Cooper, Russell King - ARM Linux,
	virtualization, Stefano Stabellini, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Joe Perches, David Miller, linux-ia64,
	linuxppc-dev, linux-s390, sparclinux, linux-arm-kernel,
	linux-metag, linux-mips, x86, user-mode-linux-devel,
	adi-buildroot-devel, linux-sh, linux-xtensa, xen-devel,
	Ingo Molnar, ddaney.cavm, james.hogan, Michael Ellerman

On Wed, 27 Jan 2016, Ralf Baechle wrote:

> > So you need to build a different kernel for some types of MIPS systems?
> 
> Yes.  We can't really do without.  Classic MIPS code is not relocatable
> without the complexity of PIC code as used by ELF DSOs - and their
> performance penalty.  Plus we have a number of architecture revisions
> over the decades, big and little endian, 32 and 64 bit as the major
> stumbling blocks.  There however are groups of similar systems that
> can share kernel binaries.

 Matt (cc-ed) has recently posted patches to add support for a relocatable 
kernel, implemented without the usual overhead of PIC code.  It works by 
retaining relocations in a fully-linked binary and then simply replaying 
the work the static linker does when assigning addresses, as the image 
loaded is copied to its intended destination at an early bootstrap stage.  
See: 
<http://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=1449137297-30464-1-git-send-email-matt.redfearn%40imgtec.com> 
for details.
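
The general idea of replaying relocations can be sketched in a few
lines of C.  This is not Matt's code: struct reloc32 and apply_relocs()
are hypothetical, and the sketch assumes the simplest possible case
where every retained relocation is a whole 32-bit word holding an
absolute address.  Real MIPS relocation types (paired HI16/LO16
immediates, for instance) need more work than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: offset of one 32-bit word holding an absolute address. */
struct reloc32 {
	uint32_t offset;
};

/*
 * Add the load offset 'delta' to every recorded word in the image.
 * Returns the number of relocations applied; out-of-range records are skipped.
 */
static size_t apply_relocs(uint8_t *image, size_t image_size,
			   const struct reloc32 *relocs, size_t nrelocs,
			   uint32_t delta)
{
	size_t applied = 0;
	size_t i;

	assert(image && relocs);
	for (i = 0; i < nrelocs; i++) {
		uint32_t off = relocs[i].offset;
		uint32_t word;

		if (image_size < sizeof(word) || off > image_size - sizeof(word))
			continue;
		memcpy(&word, image + off, sizeof(word));	/* unaligned-safe read */
		word += delta;
		memcpy(image + off, &word, sizeof(word));
		applied++;
	}
	return applied;
}
```

Here 'delta' is the difference between the address the image was
linked for and the address it actually ends up at.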

 I think this framework can be reused by carefully choosing instructions 
used in early bootstrap code, up to the relocation stage, so that it is 
runnable anywhere (not the same as PIC!) like early ld.so initialisation 
and then loading the whole attached image starting from an address where 
RAM does exist on target hardware.

 Endianness is a different matter, obviously we can't build a single image 
for both, although for distributions' sake an approach similar to one used 
with bi-endian firmware (for hardware which has an easy way to switch the 
endianness, e.g. a physical jumper or a configuration bit stored in flash 
memory; not to be confused with the reverse user endianness mode) might be 
feasible, by glueing two kernel images together and then selecting the 
right one early in bootstrap, perhaps again reusing Matt's framework.  
I'm not sure if this is worth the effort though, I suspect the usage level 
of this feature would be minimal.
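
For what it is worth, the selection step itself is simple.  A minimal
sketch, assuming a hypothetical container format (struct kernel_image
and both function names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum image_endian { IMAGE_LITTLE, IMAGE_BIG };

/* Hypothetical descriptor for one of the glued-together kernel images. */
struct kernel_image {
	enum image_endian endian;
	const void *start;
	size_t size;
};

/* Detect the byte order the CPU is currently running in. */
static enum image_endian cpu_endianness(void)
{
	const uint32_t probe = 1;

	return *(const uint8_t *)&probe == 1 ? IMAGE_LITTLE : IMAGE_BIG;
}

/* Return the first image built for the wanted byte order, or NULL if none. */
static const struct kernel_image *select_image(const struct kernel_image *images,
					       size_t n, enum image_endian want)
{
	size_t i;

	assert(images || n == 0);
	for (i = 0; i < n; i++)
		if (images[i].endian == want)
			return &images[i];
	return NULL;
}
```

The sketch sidesteps the hard part, which is that the bootstrap stub
doing the selection must itself be runnable in either endianness.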

 All in all I think making a generic MIPS kernel just might be feasible, 
but with the diversity of options available the effort required would be 
enormous.  NetBSD for example I believe supports building a kernel that 
correctly runs on both R3000 (MIPS I, 32-bit) and R4000 (MIPS III, 64-bit) 
DEC hardware (as did DEC Ultrix, the vendor OS for these systems).  These 
processors are different enough from each other that you cannot use the 
same code for cache, memory and exception management in an OS kernel -- 
backward compatibility is only provided for user software.  That proves 
the concept, however in a very limited way only, not even covering SMP, 
and their R4000 kernel does not support 64-bit userland I believe.  They 
still have completely separate ports for other MIPS hardware, such as for 
Broadcom SiByte SB-1 (MIPS64r1) processors.

  Maciej

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] documentation: Add disclaimer
  2016-01-26 20:11                                       ` Paul E. McKenney
  2016-01-27  8:35                                         ` [PATCH] documentation: Add disclaimer Peter Zijlstra
@ 2016-01-27 14:57                                         ` David Howells
  2016-01-27 23:35                                           ` Paul E. McKenney
                                                             ` (2 more replies)
  1 sibling, 3 replies; 153+ messages in thread
From: David Howells @ 2016-01-27 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, Paul E. McKenney, Will Deacon, Leonid Yegoshin,
	Michael S. Tsirkin, linux-kernel, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

Peter Zijlstra <peterz@infradead.org> wrote:

> +==========
> +DISCLAIMER
> +==========
> +
> +This document is not a specification; it is intentionally (for the sake of
> +brevity) and unintentionally (due to being human) incomplete. This document is
> +meant as a guide to using the various memory barriers provided by Linux, but
> +in case of any doubt (and there are many) please ask.
> +
> +I repeat, this document is not a specification of what Linux expects from
> +hardware.

The purpose of this document is twofold:

 (1) to specify the minimum functionality that one can rely on for any
     particular barrier, and

 (2) to provide a guide as to how to use the barriers that are available.

Note that an architecture can provide more than the minimum requirement for
any particular barrier, but if the barrier provides less than that, it is
incorrect.

Note also that it is possible that a barrier may be a no-op for an
architecture because the way that arch works renders an explicit barrier
unnecessary in that case.

> +

Can you bung an extra blank line in here if you have to redo this at all?

> +========
> +CONTENTS
> +========
>  
>   (*) Abstract memory access model.
>  

David
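
The two notes above (an architecture may provide more than the minimum, and a
barrier may be a no-op where the hardware makes it unnecessary) can be sketched
in userspace C11. This is only an illustration under stated assumptions:
sketch_rmb(), producer() and consumer() are hypothetical names, not the
kernel's definitions, and the x86 branch relies on the hardware keeping loads
in order rather than on anything the C11 memory model itself promises.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical read barrier (not a kernel definition).  The minimum it
 * must provide: loads before it are ordered before loads after it.
 */
#if defined(__x86_64__) || defined(__i386__)
/* Assumption: the CPU already orders loads, so only restrain the compiler. */
#define sketch_rmb()	atomic_signal_fence(memory_order_seq_cst)
#else
/* Elsewhere emit a real fence; this may be stronger than the minimum. */
#define sketch_rmb()	atomic_thread_fence(memory_order_seq_cst)
#endif

static _Atomic int data;
static _Atomic int flag;

/* Writer: store the payload, then set the flag with release ordering. */
static void producer(int v)
{
	atomic_store_explicit(&data, v, memory_order_relaxed);
	atomic_store_explicit(&flag, 1, memory_order_release);
}

/* Reader: returns -1 until the flag is seen, then the payload. */
static int consumer(void)
{
	if (!atomic_load_explicit(&flag, memory_order_relaxed))
		return -1;
	sketch_rmb();	/* order the flag load before the data load */
	return atomic_load_explicit(&data, memory_order_relaxed);
}
```

Callers see the same contract on every architecture; only the cost of
sketch_rmb() differs, which is the sense in which a barrier can be "free" on
one machine and still be required in the source.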

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-27  2:04                                               ` Boqun Feng
@ 2016-01-27 23:30                                                 ` Paul E. McKenney
  0 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-27 23:30 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Linus Torvalds, Peter Zijlstra, Herbert Xu, Leonid Yegoshin,
	linux-mips, linux-ia64, Michael S. Tsirkin, Will Deacon,
	virtualization, Peter Anvin, sparclinux, Ingo Molnar, linux-arch,
	linux-s390, Russell King - ARM Linux, uml-devel, linux-sh,
	Michael Ellerman, the arch/x86 maintainers, xen-devel,
	Ingo Molnar, linux-xtensa, James Hogan, Arnd Bergmann,
	Stefano Stabellini, adi-buildroot-devel, David Daney,
	Thomas Gleixner, linux-metag, linux-arm-kernel, Andrew Cooper,
	Linux Kernel Mailing List, Ralf Baechle, Joe Perches, ppc-dev,
	David Miller

On Wed, Jan 27, 2016 at 10:04:47AM +0800, Boqun Feng wrote:
> On Tue, Jan 26, 2016 at 03:29:21PM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 26, 2016 at 02:33:40PM -0800, Linus Torvalds wrote:
> > > On Tue, Jan 26, 2016 at 2:15 PM, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > You might as well just write it as
> > > >
> > > > >     struct foo *x = READ_ONCE(ptr);
> > > > >     x->bar = 5;
> > > >
> > > > because that "smp_read_barrier_depends()" does NOTHING wrt the second write.
> > > 
> > > Just to clarify: on alpha it adds a memory barrier, but that memory
> > > barrier is useless.
> > 
> > No trailing data-dependent read, so agreed, no smp_read_barrier_depends()
> > needed.  That said, I believe that we should encourage rcu_dereference*()
> > or lockless_dereference() instead of READ_ONCE() for documentation
> > reasons, though.
> > 
> > > On non-alpha, it is a no-op, and obviously does nothing simply because
> > > it generates no code.
> > > 
> > > So if anybody believes that the "smp_read_barrier_depends()" does
> > > something, they are *wrong*.
> > 
> > The other problem with smp_read_barrier_depends() is that it is often
> > a pain figuring out which prior load it is supposed to apply to.
> > Hence my preference for rcu_dereference*() and lockless_dereference().
> > 
> 
> Because semantically speaking, rcu_dereference*() and
> lockless_dereference() are CONSUME (i.e. data/address dependent
> read->read and read->write pairs are ordered), whereas
> smp_read_barrier_depends() only guarantees read->read pairs with data
> dependency are ordered, right?
> 
> If so, maybe we need to call it out in memory-barriers.txt, for example:
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 904ee42..6b262c2 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1703,8 +1703,8 @@ There are some more advanced barrier functions:
>  
>  
>   (*) lockless_dereference();
> -     This can be thought of as a pointer-fetch wrapper around the
> -     smp_read_barrier_depends() data-dependency barrier.
> +     This is a load, and any load or store that has a data dependency on the
> +     value returned by this load won't be reordered before this load.

This is a good start, but more is needed to warn people off of
smp_read_barrier_depends().  But yes, better explanation would be good.

							Thanx, Paul

>       This is also similar to rcu_dereference(), but in cases where
>       object lifetime is handled by some mechanism other than RCU, for
> 
> 
> Regards,
> Boqun
> 
> > > And if anybody sends out an email with that smp_read_barrier_depends()
> > > in an example, they are actively just confusing other people, which is
> > > even worse than just being wrong. Which is why I jumped in.
> > > 
> > > So stop perpetuating the myth that smp_read_barrier_depends() does
> > > something here. It does not. It's a bug, and it has become this "mind
> > > virus" for some people that seem to believe that it does something.
> > 
> > It looks like I should add words to memory-barriers.txt de-emphasizing
> > smp_read_barrier_depends().  I will take a look at that.
> > 
> > > I had to remove this crap once from the kernel already, see commit
> > > 105ff3cbf225 ("atomic: remove all traces of READ_ONCE_CTRL() and
> > > atomic*_read_ctrl()").
> > > 
> > > I don't want to ever see that broken construct again. And I want to
> > > make sure that everybody is educated about how broken it was. I'm
> > > extremely unhappy that it came up again.
> > 
> > Well, if it makes you feel better, that was control dependencies and this
> > was data dependencies.  So it was not -exactly- the same.  ;-)
> > 
> > (Sorry, couldn't resist...)
> > 
> > > If it turns out that some architecture does actually need a barrier
> > > between a read and a dependent write, then that will mean that
> > > 
> > >  (a) we'll have to make up a _new_ barrier, because
> > > "smp_read_barrier_depends()" is not that barrier. We'll presumably
> > > > then have to make that new barrier part of "rcu_dereference()" and
> > > friends.
> > 
> > Agreed.  We can worry about whether or not we replace the current
> > smp_read_barrier_depends() with that new barrier when and if such
> > hardware appears.
> > 
> > >  (b) we will have found an architecture with even worse memory
> > > ordering semantics than alpha, and we'll have to stop castigating
> > > alpha for being the worst memory ordering ever.
> > 
> > ;-) ;-) ;-)
> > 
> > > but I sincerely hope that we'll never find that kind of broken architecture.
> > 
> > Apparently at least some hardware vendors are reading memory-barriers.txt,
> > so perhaps the odds of that kind of breakage have reduced.
> > 
> > 								Thanx, Paul
> > 
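
The pattern discussed in this message (one ordered pointer load, followed by
loads and stores that depend on the loaded value) can be sketched with C11
atomics in userspace. This is a rough analogue, not kernel code: gp, struct
foo, publish() and read_baz_and_mark() are hypothetical names, and
memory_order_consume merely stands in for what rcu_dereference() and
lockless_dereference() are described as providing above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
	int bar;
	int baz;
};

/* Hypothetical shared pointer, published by a writer, read locklessly. */
static _Atomic(struct foo *) gp;

/* Writer: initialise the object, then publish it with release ordering. */
static void publish(struct foo *p, int baz)
{
	p->baz = baz;
	atomic_store_explicit(&gp, p, memory_order_release);
}

/*
 * Reader: a single dependency-ordered pointer load, then accesses that
 * use the loaded address.  Compilers commonly treat consume as acquire.
 */
static int read_baz_and_mark(void)
{
	struct foo *x = atomic_load_explicit(&gp, memory_order_consume);

	if (!x)
		return -1;
	x->bar = 5;	/* dependent store: needs the address just loaded */
	return x->baz;	/* dependent load: ordered after the pointer load */
}
```

No separate barrier call appears between the pointer load and the dependent
accesses, so there is no question of which prior load a barrier is meant to
apply to.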

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-27 10:25                                               ` Will Deacon
@ 2016-01-27 23:32                                                 ` Paul E. McKenney
  0 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-27 23:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Wed, Jan 27, 2016 at 10:25:46AM +0000, Will Deacon wrote:
> On Tue, Jan 26, 2016 at 11:58:20AM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 26, 2016 at 12:16:09PM +0000, Will Deacon wrote:
> > > On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > > > PPC Overlapping Group-B sets version 4
> > > > > > ""
> > > > > > (* When the Group-B sets from two different barriers involve instructions in
> > > > > >    the same thread, within that thread one set must contain the other.
> > > > > > 
> > > > > > 	P0	P1	P2
> > > > > > 	Rx=1	Wy=1	Wz=2
> > > > > > 	dep.	lwsync	lwsync
> > > > > > 	Ry=0	Wz=1	Wx=1
> > > > > > 	Rz=1
> > > > > > 
> > > > > > 	assert(!(z=2))
> > > > > > 
> > > > > >    Forbidden by ppcmem, allowed by herd.
> > > > > > *)
> > > > > > {
> > > > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > > > }
> > > > > >  P0		| P1		| P2		;
> > > > > >  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
> > > > > >  xor r7,r6,r6	| lwsync	| lwsync	;
> > > > > >  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
> > > > > >  lwz r8,0(r3)	|		|		;
> > > > > > 
> > > > > > exists
> > > > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > > > > 
> > > > > That really hurts. Assuming that the "assert(!(z=2))" is actually there
> > > > > to constrain the coherence order of z to be {0->1->2}, then I think that
> > > > > this test is forbidden on arm using dmb instead of lwsync. That said, I
> > > > > also don't think the Rz=1 in P0 changes anything.
> > > > 
> > > > What about the smp_wmb() variant of dmb that orders only stores?
> > > 
> > > Tricky, but I think it still works out if the coherence order of z is as
> > > I described above. The line of reasoning is weird though -- I ended up
> > > considering the two cases where P0 reads z before and after it reads x
> > > and what that means for the read of y.
> > 
> > By "works out" you mean that ARM prohibits the outcome?
> 
> Yes, that's my understanding.

Very good, we have agreement between the two architectures, then.  ;-)

							Thanx, Paul

* Re: [PATCH] documentation: Add disclaimer
  2016-01-27 14:57                                         ` David Howells
@ 2016-01-27 23:35                                           ` Paul E. McKenney
  2016-01-28 20:02                                           ` David Howells
  2016-04-14 21:40                                           ` Paul E. McKenney
  2 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-01-27 23:35 UTC (permalink / raw)
  To: David Howells
  Cc: Peter Zijlstra, Will Deacon, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Wed, Jan 27, 2016 at 02:57:07PM +0000, David Howells wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > +==========
> > +DISCLAIMER
> > +==========
> > +
> > +This document is not a specification; it is intentionally (for the sake of
> > +brevity) and unintentionally (due to being human) incomplete. This document is
> > +meant as a guide to using the various memory barriers provided by Linux, but
> > +in case of any doubt (and there are many) please ask.
> > +
> > +I repeat, this document is not a specification of what Linux expects from
> > +hardware.
> 
> The purpose of this document is twofold:
> 
>  (1) to specify the minimum functionality that one can rely on for any
>      particular barrier, and
> 
>  (2) to provide a guide as to how to use the barriers that are available.
> 
> Note that an architecture can provide more than the minimum requirement for
> any particular barrier, but if the barrier provides less than that, it is
> incorrect.
> 
> Note also that it is possible that a barrier may be a no-op for an
> architecture because the way that arch works renders an explicit barrier
> unnecessary in that case.
> 
> > +
> 
> Can you bung an extra blank line in here if you have to redo this at all?
> 
> > +========
> > +CONTENTS
> > +========
> >  
> >   (*) Abstract memory access model.

Good point!  Would you be willing to add a Signed-off-by so I
can take the combined change, assuming Peter and Will are good
with it?

							Thanx, Paul

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
  2016-01-27 11:26                                       ` Maciej W. Rozycki
@ 2016-01-28  0:58                                         ` Leonid Yegoshin
       [not found]                                         ` <56A9656D.3080707@imgtec.com>
  1 sibling, 0 replies; 153+ messages in thread
From: Leonid Yegoshin @ 2016-01-28  0:58 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: paulmck, Will Deacon, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On 01/27/2016 03:26 AM, Maciej W. Rozycki wrote:
> On Fri, 15 Jan 2016, Leonid Yegoshin wrote:
>
>>> So you need to build a different kernel for some types of MIPS systems?
>>> Or do you do boot-time rewriting, like a number of other arches do?
>> I don't know. I would like to have responses. Ralf asked Maciej about old
>> systems and that came nowhere. Even rewrite - don't know what to do with that:
>> no lightweight SYNC or no SYNC at all - yes, it is still possible that SYNC on
>> some systems can be too heavy or even harmful, nobody tested that.
>   I don't recall being asked; mind that I might not get to messages I have
> not been cc-ed in a timely manner and I may miss some altogether.  With
> the amount of mailing list traffic that passes by me my scanner may fail
> to trigger.  Sorry if this causes anybody trouble, but such is life.
>
>   Coincidentally, I have just posted some notes on SYNC in a different
> thread, see <http://lkml.iu.edu/hypermail/linux/kernel/1601.3/03080.html>.
> There's a reference to an older message of mine there too.  I hope this
> answers your questions.
>
>    Maciej
In http://patchwork.linux-mips.org/patch/10505/ the very last message
exchange is:

Maciej,

do you have an R4000 / R4600 / R5000 / R7000 / SiByte system at hand to
test this?
...
   Ralf

Maciej W. Rozycki- June 5, 2015, 9:18 p.m.

On Fri, 5 Jun 2015, Ralf Baechle wrote:

> do you have an R4000 / R4600 / R5000 / R7000 / SiByte system at hand to
> test this?

  I should be able to check R4400 (that is virtually the same as R4000)
next week or so.  As to SiByte -- not before next month I'm afraid.  I
don't have access to any of the other processors you named.  You may
want to find a better person if you want to accept this change soon.

   Maciej

... and that stops forever...

- Leonid.

* Re: [PATCH] documentation: Add disclaimer
  2016-01-27 14:57                                         ` David Howells
  2016-01-27 23:35                                           ` Paul E. McKenney
@ 2016-01-28 20:02                                           ` David Howells
  2016-04-14 21:40                                           ` Paul E. McKenney
  2 siblings, 0 replies; 153+ messages in thread
From: David Howells @ 2016-01-28 20:02 UTC (permalink / raw)
  To: paulmck
  Cc: dhowells, Peter Zijlstra, Will Deacon, Leonid Yegoshin,
	Michael S. Tsirkin, linux-kernel, Arnd Bergmann, linux-arch,
	Andrew Cooper, Russell King - ARM Linux, virtualization,
	Stefano Stabellini, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Joe Perches, David Miller, linux-ia64, linuxppc-dev, linux-s390,
	sparclinux, linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> Good point!  Would you be willing to add a Signed-off-by so I
> can take the combined change, assuming Peter and Will are good
> with it?

Sure!

David

* Re: [v3,11/41] mips: reuse asm-generic/barrier.h
       [not found]                                         ` <56A9656D.3080707@imgtec.com>
@ 2016-01-29 13:38                                           ` Maciej W. Rozycki
  0 siblings, 0 replies; 153+ messages in thread
From: Maciej W. Rozycki @ 2016-01-29 13:38 UTC (permalink / raw)
  To: Leonid Yegoshin
  Cc: paulmck, Will Deacon, Peter Zijlstra, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Thu, 28 Jan 2016, Leonid Yegoshin wrote:

> In http://patchwork.linux-mips.org/patch/10505/ the very last mesg exchange
> is:
[...]
> ... and that stops forever...

 Thanks for the reminder -- last June was very hectic, I travelled a lot 
and I lost the discussion from my radar.  Apologies for that.  I replied 
in that thread now with my results.  I hope this helps.

  Maciej

* Re: [PATCH] documentation: Add disclaimer
  2016-01-27  8:35                                         ` [PATCH] documentation: Add disclaimer Peter Zijlstra
  2016-01-27 10:11                                           ` Will Deacon
@ 2016-04-14 21:40                                           ` Paul E. McKenney
  1 sibling, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-04-14 21:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Leonid Yegoshin, Michael S. Tsirkin, linux-kernel,
	Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Wed, Jan 27, 2016 at 09:35:46AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 26, 2016 at 12:11:43PM -0800, Paul E. McKenney wrote:
> > So Peter, would you like to update your patch to include yourself
> > and Will as authors?
> 
> Sure, here goes.
> 
> ---
> Subject: documentation: Add disclaimer
> 
> It appears people are reading this document as a requirements list for
> building hardware. This is not the intent of this document. Nor is it
> particularly suited for this purpose.
> 
> The primary purpose of this document is our collective attempt to define
> a set of primitives that (hopefully) allow us to write correct code on
> the myriad of SMP platforms Linux supports.
> 
> It's a definite work in progress as our understanding of these platforms,
> and memory ordering in general, progresses.
> 
> Nor does being mentioned in this document mean we think it's a
> particularly good idea; the data dependency barrier required by Alpha
> being a prime example. Yes we have it, no you're insane to require it
> when building new hardware.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Rather belatedly queued and pushed to -rcu, apologies for the delay.
One minor edit noted below.

							Thanx, Paul

> ---
>  Documentation/memory-barriers.txt | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index a61be39c7b51..98626125f484 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -4,8 +4,24 @@
> 
>  By: David Howells <dhowells@redhat.com>
>      Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> +    Will Deacon <will.deacon@arm.com>
> +    Peter Zijlstra <peterz@infradead.org>
> 
> -Contents:
> +==========
> +DISCLAIMER
> +==========
> +
> +This document is not a specification; it is intentionally (for the sake of
> +brevity) and unintentionally (due to being human) incomplete. This document is
> +meant as a guide to using the various memory barriers provided by Linux, but
> +in case of any doubt (and there are many) please ask.
> +
> +I repeat, this document is not a specification of what Linux expects from

s/I/To/ because there is more than one author.

> +hardware.
> +
> +========
> +CONTENTS
> +========
> 
>   (*) Abstract memory access model.
> 
> 

* Re: [PATCH] documentation: Add disclaimer
  2016-01-27 14:57                                         ` David Howells
  2016-01-27 23:35                                           ` Paul E. McKenney
  2016-01-28 20:02                                           ` David Howells
@ 2016-04-14 21:40                                           ` Paul E. McKenney
  2 siblings, 0 replies; 153+ messages in thread
From: Paul E. McKenney @ 2016-04-14 21:40 UTC (permalink / raw)
  To: David Howells
  Cc: Peter Zijlstra, Will Deacon, Leonid Yegoshin, Michael S. Tsirkin,
	linux-kernel, Arnd Bergmann, linux-arch, Andrew Cooper,
	Russell King - ARM Linux, virtualization, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Joe Perches,
	David Miller, linux-ia64, linuxppc-dev, linux-s390, sparclinux,
	linux-arm-kernel, linux-metag, linux-mips, x86,
	user-mode-linux-devel, adi-buildroot-devel, linux-sh,
	linux-xtensa, xen-devel, Ralf Baechle, Ingo Molnar, ddaney.cavm,
	james.hogan, Michael Ellerman

On Wed, Jan 27, 2016 at 02:57:07PM +0000, David Howells wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > +==========
> > +DISCLAIMER
> > +==========
> > +
> > +This document is not a specification; it is intentionally (for the sake of
> > +brevity) and unintentionally (due to being human) incomplete. This document is
> > +meant as a guide to using the various memory barriers provided by Linux, but
> > +in case of any doubt (and there are many) please ask.
> > +
> > +I repeat, this document is not a specification of what Linux expects from
> > +hardware.
> 
> The purpose of this document is twofold:
> 
>  (1) to specify the minimum functionality that one can rely on for any
>      particular barrier, and
> 
>  (2) to provide a guide as to how to use the barriers that are available.
> 
> Note that an architecture can provide more than the minimum requirement for
> any particular barrier, but if the barrier provides less than that, it is
> incorrect.
> 
> Note also that it is possible that a barrier may be a no-op for an
> architecture because the way that arch works renders an explicit barrier
> unnecessary in that case.
> 
> > +
> 
> Can you bung an extra blank line in here if you have to redo this at all?

Done as part of your patch.  Again, apologies for the delay.

							Thanx, Paul

> > +========
> > +CONTENTS
> > +========
> >  
> >   (*) Abstract memory access model.
> >  
> 
> David
> 


end of thread, other threads:[~2016-04-14 21:40 UTC | newest]

Thread overview: 153+ messages
2016-01-10 14:16 [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Michael S. Tsirkin
2016-01-10 14:16 ` [PATCH v3 01/41] lcoking/barriers, arch: Use smp barriers in smp_store_release() Michael S. Tsirkin
2016-01-12 16:28   ` Paul E. McKenney
2016-01-12 18:40     ` Michael S. Tsirkin
2016-01-10 14:16 ` [PATCH v3 02/41] asm-generic: guard smp_store_release/load_acquire Michael S. Tsirkin
2016-01-10 14:16 ` [PATCH v3 03/41] ia64: rename nop->iosapic_nop Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 04/41] ia64: reuse asm-generic/barrier.h Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 05/41] powerpc: " Michael S. Tsirkin
2016-01-12 16:31   ` Paul E. McKenney
2016-01-10 14:17 ` [PATCH v3 06/41] s390: " Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 07/41] sparc: " Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 08/41] arm: " Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 09/41] arm64: " Michael S. Tsirkin
2016-01-10 14:17 ` [PATCH v3 10/41] metag: " Michael S. Tsirkin
2016-01-10 14:18 ` [PATCH v3 11/41] mips: " Michael S. Tsirkin
2016-01-12  1:14   ` [v3,11/41] " Leonid Yegoshin
2016-01-12  8:43     ` Michael S. Tsirkin
2016-01-12  9:51       ` Peter Zijlstra
2016-01-12  9:27     ` Peter Zijlstra
2016-01-12 10:25       ` Peter Zijlstra
2016-01-12 10:40         ` Peter Zijlstra
2016-01-12 11:41           ` Will Deacon
2016-01-12 20:45             ` Leonid Yegoshin
2016-01-12 21:40               ` Peter Zijlstra
2016-01-13  0:21                 ` Leonid Yegoshin
2016-01-13 10:45               ` Will Deacon
2016-01-13 19:02                 ` Leonid Yegoshin
2016-01-13 20:48                   ` Peter Zijlstra
2016-01-13 20:58                     ` Leonid Yegoshin
2016-01-14 12:04                       ` Will Deacon
2016-01-14 16:16                         ` Paul E. McKenney
2016-01-14 19:42                           ` Leonid Yegoshin
2016-01-14 20:15                             ` Peter Zijlstra
2016-01-14 20:36                               ` Paul E. McKenney
2016-01-14 20:46                               ` Peter Zijlstra
2016-01-14 20:46                               ` Leonid Yegoshin
2016-01-14 21:34                                 ` Paul E. McKenney
2016-01-14 21:45                                   ` Leonid Yegoshin
2016-01-14 22:24                                     ` Paul E. McKenney
2016-01-14 23:04                                       ` Leonid Yegoshin
2016-01-14 20:12                         ` Leonid Yegoshin
2016-01-14 20:48                           ` Paul E. McKenney
2016-01-14 21:24                             ` Leonid Yegoshin
2016-01-14 22:20                               ` Paul E. McKenney
2016-01-15  9:57                                 ` Will Deacon
2016-01-15 18:54                                   ` Leonid Yegoshin
2016-01-26 10:24                                 ` Peter Zijlstra
2016-01-26 10:32                                   ` Peter Zijlstra
2016-01-26 11:09                                     ` Will Deacon
2016-01-26 20:11                                       ` Paul E. McKenney
2016-01-27  8:35                                         ` [PATCH] documentation: Add disclaimer Peter Zijlstra
2016-01-27 10:11                                           ` Will Deacon
2016-04-14 21:40                                           ` Paul E. McKenney
2016-01-27 14:57                                         ` David Howells
2016-01-27 23:35                                           ` Paul E. McKenney
2016-01-28 20:02                                           ` David Howells
2016-04-14 21:40                                           ` Paul E. McKenney
2016-01-26 19:44                                   ` [v3,11/41] mips: reuse asm-generic/barrier.h Paul E. McKenney
2016-01-18  8:19                             ` Herbert Xu
2016-01-18 15:46                               ` Paul E. McKenney
2016-01-26 16:52                                 ` Boqun Feng
2016-01-26 17:22                                   ` Peter Zijlstra
2016-01-26 19:44                                     ` Linus Torvalds
2016-01-26 20:10                                       ` Paul E. McKenney
2016-01-26 22:15                                         ` Linus Torvalds
2016-01-26 22:33                                           ` Linus Torvalds
2016-01-26 23:29                                             ` Paul E. McKenney
2016-01-26 23:45                                               ` Linus Torvalds
2016-01-27  0:57                                                 ` Paul E. McKenney
2016-01-27  2:04                                               ` Boqun Feng
2016-01-27 23:30                                                 ` Paul E. McKenney
2016-01-27  7:51                                             ` Peter Zijlstra
2016-01-26 19:51                                   ` Paul E. McKenney
2016-01-13 22:26                 ` Leonid Yegoshin
2016-01-14  9:24                   ` Michael S. Tsirkin
2016-01-14 12:14                   ` Will Deacon
2016-01-14 19:28                     ` Leonid Yegoshin
2016-01-14 20:34                       ` Paul E. McKenney
2016-01-14 21:01                         ` Leonid Yegoshin
2016-01-14 21:29                           ` Paul E. McKenney
2016-01-14 21:36                             ` Leonid Yegoshin
2016-01-14 22:55                               ` Paul E. McKenney
2016-01-14 23:33                                 ` Leonid Yegoshin
2016-01-15  0:47                                   ` Paul E. McKenney
2016-01-15  1:07                                     ` Leonid Yegoshin
2016-01-27 11:26                                       ` Maciej W. Rozycki
2016-01-28  0:58                                         ` Leonid Yegoshin
     [not found]                                         ` <56A9656D.3080707@imgtec.com>
2016-01-29 13:38                                           ` Maciej W. Rozycki
2016-01-27 10:40                                     ` Ralf Baechle
2016-01-27 12:09                                       ` Maciej W. Rozycki
2016-01-15 10:24                                 ` Will Deacon
2016-01-15 17:54                                   ` Paul E. McKenney
2016-01-15 19:28                                     ` Paul E. McKenney
2016-01-25 14:41                                       ` Will Deacon
2016-01-26  1:06                                         ` Paul E. McKenney
2016-01-26 12:10                                           ` Will Deacon
2016-01-26 23:37                                             ` Paul E. McKenney
2016-01-27 10:23                                               ` Will Deacon
2016-01-15  8:55                             ` Peter Zijlstra
2016-01-15  9:13                               ` Peter Zijlstra
2016-01-15 17:46                                 ` Paul E. McKenney
2016-01-15 21:27                                   ` Peter Zijlstra
2016-01-15 21:58                                     ` Paul E. McKenney
2016-01-25 16:42                                       ` Will Deacon
2016-01-26  6:03                                         ` Paul E. McKenney
2016-01-26 10:19                                           ` Peter Zijlstra
2016-01-26 20:13                                             ` Paul E. McKenney
2016-01-27  8:39                                               ` Peter Zijlstra
2016-01-26 12:16                                           ` Will Deacon
2016-01-26 14:35                                             ` Boqun Feng
2016-01-26 19:58                                             ` Paul E. McKenney
2016-01-27 10:25                                               ` Will Deacon
2016-01-27 23:32                                                 ` Paul E. McKenney
2016-01-15 17:39                               ` Paul E. McKenney
2016-01-15 21:29                                 ` Peter Zijlstra
2016-01-15 22:01                                   ` Paul E. McKenney
2016-01-25 18:02                                 ` Will Deacon
2016-01-26  6:12                                   ` Paul E. McKenney
2016-01-26 10:15                                     ` Peter Zijlstra
2016-01-10 14:18 ` [PATCH v3 12/41] x86/um: " Michael S. Tsirkin
2016-01-10 14:18 ` [PATCH v3 13/41] x86: " Michael S. Tsirkin
2016-01-12 14:10   ` Thomas Gleixner
2016-01-10 14:18 ` [PATCH v3 14/41] asm-generic: add __smp_xxx wrappers Michael S. Tsirkin
2016-01-10 14:18 ` [PATCH v3 15/41] powerpc: define __smp_xxx Michael S. Tsirkin
2016-01-10 14:18 ` [PATCH v3 16/41] arm64: " Michael S. Tsirkin
2016-01-10 14:18 ` [PATCH v3 17/41] arm: " Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 18/41] blackfin: " Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 19/41] ia64: " Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 20/41] metag: " Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 21/41] mips: " Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 22/41] s390: " Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 23/41] sh: define __smp_xxx, fix smp_store_mb for !SMP Michael S. Tsirkin
2016-01-10 14:19 ` [PATCH v3 24/41] sparc: define __smp_xxx Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 25/41] tile: " Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 26/41] xtensa: " Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 27/41] x86: " Michael S. Tsirkin
2016-01-12 14:11   ` Thomas Gleixner
2016-01-10 14:20 ` [PATCH v3 28/41] asm-generic: implement virt_xxx memory barriers Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 29/41] Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb" Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 30/41] virtio_ring: update weak barriers to use virt_xxx Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 31/41] sh: support 1 and 2 byte xchg Michael S. Tsirkin
2016-01-10 14:20 ` [PATCH v3 32/41] sh: move xchg_cmpxchg to a header by itself Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 33/41] virtio_ring: use virt_store_mb Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 34/41] checkpatch.pl: add missing memory barriers Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 35/41] checkpatch: check for __smp outside barrier.h Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 36/41] checkpatch: add virt barriers Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 37/41] xenbus: use virt_xxx barriers Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 38/41] xen/io: " Michael S. Tsirkin
2016-01-10 14:21 ` [PATCH v3 39/41] xen/events: " Michael S. Tsirkin
2016-01-11 11:12   ` David Vrabel
2016-01-10 14:22 ` [PATCH v3 40/41] s390: use generic memory barriers Michael S. Tsirkin
2016-01-10 14:22 ` [PATCH v3 41/41] s390: more efficient smp barriers Michael S. Tsirkin
2016-01-12 12:50 ` [PATCH v3 00/41] arch: barrier cleanup + barriers for virt Peter Zijlstra
