* [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access
@ 2014-11-18 17:28 Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 1/4] arch: Cleanup read_barrier_depends() and comments Alexander Duyck
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 17:28 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

These patches introduce two new primitives for synchronizing cache coherent
memory writes and reads.  These two new primitives are:

	coherent_rmb()
	coherent_wmb()

The first patch cleans up some unnecessary duplication in the definitions
of read_barrier_depends and smp_read_barrier_depends, and in the comments
describing the barrier.

The second patch adds the primitives for the applicable architectures and
asm-generic.

The third patch adds the barriers to r8169, which turns out to be a good
example of where the new barriers might be useful, as the driver currently
uses full rmb()/wmb() barriers to order accesses to the descriptors and the
DescOwn bit.

The fourth patch adds support for coherent_rmb() to the Intel fm10k, igb,
and ixgbe drivers.  Testing with the ixgbe driver has shown a processing
time reduction of at least 7ns per 64B frame on a Core i7-4930K.
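
For reference, the basic pattern these barriers target is sketched below
(a condensed version of the example added to memory-barriers.txt in patch
2; desc, DEVICE_OWN, read_data and write_data are placeholder names):

	if (desc->status != DEVICE_OWN) {
		/* do not read data until we own the descriptor */
		coherent_rmb();

		/* read/modify the descriptor payload */
		read_data = desc->data;
		desc->data = write_data;

		/* flush modifications before handing ownership back */
		coherent_wmb();

		desc->status = DEVICE_OWN;
	}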

This patch series is essentially v4 of:
v3:	Add lightweight memory barriers fast_rmb() and fast_wmb()
v2:	Introduce load_acquire() and store_release()
v1:	Introduce read_acquire()

The key changes in this patch series versus the earlier patches are:
v4:
	- Updated ARM barrier domains to outer shareable
	- Renamed barriers to coherent_rmb() and coherent_wmb()
	- Added smp_lwsync for use in smp_load_acquire/smp_store_release
v3:
	- Moved away from acquire()/store() and instead focused on barriers
	- Added cleanup of read_barrier_depends
	- Added change in r8169 to fix cur_tx/DescOwn ordering
	- Simplified changes to just replacing/moving barriers in r8169
	- Added update to documentation with code example
v2:
	- Renamed read_acquire() to be consistent with smp_load_acquire()
	- Changed barrier used to be consistent with smp_load_acquire()
	- Updated PowerPC code to use __lwsync based on IBM article
	- Added store_release() as this is a viable use case for drivers
	- Added r8169 patch which is able to fully use primitives
	- Added fm10k/igb/ixgbe patch which is able to test performance

---

Alexander Duyck (4):
      arch: Cleanup read_barrier_depends() and comments
      arch: Add lightweight memory barriers coherent_rmb() and coherent_wmb()
      r8169: Use coherent_rmb() and coherent_wmb() for DescOwn checks
      fm10k/igb/ixgbe: Use coherent_rmb on Rx descriptor reads


 Documentation/memory-barriers.txt             |   41 +++++++++++++++
 arch/alpha/include/asm/barrier.h              |   51 ++++++++++++++++++
 arch/arm/include/asm/barrier.h                |    4 +
 arch/arm64/include/asm/barrier.h              |    3 +
 arch/blackfin/include/asm/barrier.h           |   51 ++++++++++++++++++
 arch/ia64/include/asm/barrier.h               |   25 ++++-----
 arch/metag/include/asm/barrier.h              |   19 ++++---
 arch/mips/include/asm/barrier.h               |   61 ++--------------------
 arch/powerpc/include/asm/barrier.h            |   23 +++++---
 arch/s390/include/asm/barrier.h               |    7 ++-
 arch/sparc/include/asm/barrier_64.h           |    7 ++-
 arch/x86/include/asm/barrier.h                |   70 ++++---------------------
 arch/x86/um/asm/barrier.h                     |   20 ++++---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    6 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |    6 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    9 +--
 drivers/net/ethernet/realtek/r8169.c          |   29 ++++++++--
 include/asm-generic/barrier.h                 |    8 +++
 18 files changed, 259 insertions(+), 181 deletions(-)

--


* [PATCH v4 1/4] arch: Cleanup read_barrier_depends() and comments
  2014-11-18 17:28 [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Alexander Duyck
@ 2014-11-18 17:29 ` Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 2/4] arch: Add lightweight memory barriers coherent_rmb() and coherent_wmb() Alexander Duyck
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 17:29 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

This patch is meant to clean up the handling of read_barrier_depends and
smp_read_barrier_depends.  In multiple spots in the kernel headers
read_barrier_depends is defined as "do {} while (0)", yet in the SMP vs
non-SMP sections that follow, the SMP version references
read_barrier_depends while the non-SMP version defines it as yet another
empty do/while.

With this commit I went through and cleaned out the duplicate definitions,
reducing the count down to 2 per header.  In addition I moved the roughly
50-line comment for the macro from the x86 and mips headers, which define
it as an empty do/while, to the headers that actually implement it, alpha
and blackfin.
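
As a sketch, the resulting per-header layout (using the empty do/while
variant that most of the touched headers end up with) looks like:

	/* SMP-dependent barriers stay inside the CONFIG_SMP block... */
	#ifdef CONFIG_SMP
	# define smp_mb()	mb()
	#else
	# define smp_mb()	barrier()
	#endif

	/* ...while the dependency barriers are defined exactly once */
	#define read_barrier_depends()		do { } while (0)
	#define smp_read_barrier_depends()	do { } while (0)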

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 arch/alpha/include/asm/barrier.h    |   51 ++++++++++++++++++++++++++++++
 arch/blackfin/include/asm/barrier.h |   51 ++++++++++++++++++++++++++++++
 arch/ia64/include/asm/barrier.h     |   22 +++++--------
 arch/metag/include/asm/barrier.h    |    7 ++--
 arch/mips/include/asm/barrier.h     |   52 -------------------------------
 arch/powerpc/include/asm/barrier.h  |    6 ++--
 arch/s390/include/asm/barrier.h     |    5 ++-
 arch/sparc/include/asm/barrier_64.h |    4 +-
 arch/x86/include/asm/barrier.h      |   59 ++---------------------------------
 arch/x86/um/asm/barrier.h           |    7 ++--
 10 files changed, 129 insertions(+), 135 deletions(-)

diff --git a/arch/alpha/include/asm/barrier.h b/arch/alpha/include/asm/barrier.h
index 3832bdb..77516c8 100644
--- a/arch/alpha/include/asm/barrier.h
+++ b/arch/alpha/include/asm/barrier.h
@@ -7,6 +7,57 @@
 #define rmb()	__asm__ __volatile__("mb": : :"memory")
 #define wmb()	__asm__ __volatile__("wmb": : :"memory")
 
+/**
+ * read_barrier_depends - Flush all pending reads that subsequents reads
+ * depend on.
+ *
+ * No data-dependent reads from memory-like regions are ever reordered
+ * over this barrier.  All reads preceding this primitive are guaranteed
+ * to access memory (but not necessarily other CPUs' caches) before any
+ * reads following this primitive that depend on the data return by
+ * any of the preceding reads.  This primitive is much lighter weight than
+ * rmb() on most CPUs, and is never heavier weight than is
+ * rmb().
+ *
+ * These ordering constraints are respected by both the local CPU
+ * and the compiler.
+ *
+ * Ordering is not guaranteed by anything other than these primitives,
+ * not even by data dependencies.  See the documentation for
+ * memory_barrier() for examples and URLs to more information.
+ *
+ * For example, the following code would force ordering (the initial
+ * value of "a" is zero, "b" is one, and "p" is "&a"):
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	b = 2;
+ *	memory_barrier();
+ *	p = &b;				q = p;
+ *					read_barrier_depends();
+ *					d = *q;
+ * </programlisting>
+ *
+ * because the read of "*q" depends on the read of "p" and these
+ * two reads are separated by a read_barrier_depends().  However,
+ * the following code, with the same initial values for "a" and "b":
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	a = 2;
+ *	memory_barrier();
+ *	b = 3;				y = b;
+ *					read_barrier_depends();
+ *					x = a;
+ * </programlisting>
+ *
+ * does not enforce ordering, since there is no data dependency between
+ * the read of "a" and the read of "b".  Therefore, on some CPUs, such
+ * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
+ * in cases like this where there are no data dependencies.
+ */
 #define read_barrier_depends() __asm__ __volatile__("mb": : :"memory")
 
 #ifdef CONFIG_SMP
diff --git a/arch/blackfin/include/asm/barrier.h b/arch/blackfin/include/asm/barrier.h
index 4200068..dfb66fe 100644
--- a/arch/blackfin/include/asm/barrier.h
+++ b/arch/blackfin/include/asm/barrier.h
@@ -22,6 +22,57 @@
 # define mb()	do { barrier(); smp_check_barrier(); smp_mark_barrier(); } while (0)
 # define rmb()	do { barrier(); smp_check_barrier(); } while (0)
 # define wmb()	do { barrier(); smp_mark_barrier(); } while (0)
+/*
+ * read_barrier_depends - Flush all pending reads that subsequents reads
+ * depend on.
+ *
+ * No data-dependent reads from memory-like regions are ever reordered
+ * over this barrier.  All reads preceding this primitive are guaranteed
+ * to access memory (but not necessarily other CPUs' caches) before any
+ * reads following this primitive that depend on the data return by
+ * any of the preceding reads.  This primitive is much lighter weight than
+ * rmb() on most CPUs, and is never heavier weight than is
+ * rmb().
+ *
+ * These ordering constraints are respected by both the local CPU
+ * and the compiler.
+ *
+ * Ordering is not guaranteed by anything other than these primitives,
+ * not even by data dependencies.  See the documentation for
+ * memory_barrier() for examples and URLs to more information.
+ *
+ * For example, the following code would force ordering (the initial
+ * value of "a" is zero, "b" is one, and "p" is "&a"):
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	b = 2;
+ *	memory_barrier();
+ *	p = &b;				q = p;
+ *					read_barrier_depends();
+ *					d = *q;
+ * </programlisting>
+ *
+ * because the read of "*q" depends on the read of "p" and these
+ * two reads are separated by a read_barrier_depends().  However,
+ * the following code, with the same initial values for "a" and "b":
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	a = 2;
+ *	memory_barrier();
+ *	b = 3;				y = b;
+ *					read_barrier_depends();
+ *					x = a;
+ * </programlisting>
+ *
+ * does not enforce ordering, since there is no data dependency between
+ * the read of "a" and the read of "b".  Therefore, on some CPUs, such
+ * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
+ * in cases like this where there are no data dependencies.
+ */
 # define read_barrier_depends()	do { barrier(); smp_check_barrier(); } while (0)
 #endif
 
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index a48957c..e8fffb0 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -35,26 +35,22 @@
  * it's (presumably) much slower than mf and (b) mf.a is supported for
  * sequential memory pages only.
  */
-#define mb()	ia64_mf()
-#define rmb()	mb()
-#define wmb()	mb()
-#define read_barrier_depends()	do { } while(0)
+#define mb()		ia64_mf()
+#define rmb()		mb()
+#define wmb()		mb()
 
 #ifdef CONFIG_SMP
 # define smp_mb()	mb()
-# define smp_rmb()	rmb()
-# define smp_wmb()	wmb()
-# define smp_read_barrier_depends()	read_barrier_depends()
-
 #else
-
 # define smp_mb()	barrier()
-# define smp_rmb()	barrier()
-# define smp_wmb()	barrier()
-# define smp_read_barrier_depends()	do { } while(0)
-
 #endif
 
+#define smp_rmb()	smp_mb()
+#define smp_wmb()	smp_mb()
+
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 #define smp_mb__before_atomic()	barrier()
 #define smp_mb__after_atomic()	barrier()
 
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index c7591e8..6d8b8c9 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -47,8 +47,6 @@ static inline void wmb(void)
 	wr_fence();
 }
 
-#define read_barrier_depends()  do { } while (0)
-
 #ifndef CONFIG_SMP
 #define fence()		do { } while (0)
 #define smp_mb()        barrier()
@@ -82,7 +80,10 @@ static inline void fence(void)
 #define smp_wmb()       barrier()
 #endif
 #endif
-#define smp_read_barrier_depends()     do { } while (0)
+
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 #define set_mb(var, value) do { var = value; smp_mb(); } while (0)
 
 #define smp_store_release(p, v)						\
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index d0101dd..3d69aa8 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -10,58 +10,6 @@
 
 #include <asm/addrspace.h>
 
-/*
- * read_barrier_depends - Flush all pending reads that subsequents reads
- * depend on.
- *
- * No data-dependent reads from memory-like regions are ever reordered
- * over this barrier.  All reads preceding this primitive are guaranteed
- * to access memory (but not necessarily other CPUs' caches) before any
- * reads following this primitive that depend on the data return by
- * any of the preceding reads.  This primitive is much lighter weight than
- * rmb() on most CPUs, and is never heavier weight than is
- * rmb().
- *
- * These ordering constraints are respected by both the local CPU
- * and the compiler.
- *
- * Ordering is not guaranteed by anything other than these primitives,
- * not even by data dependencies.  See the documentation for
- * memory_barrier() for examples and URLs to more information.
- *
- * For example, the following code would force ordering (the initial
- * value of "a" is zero, "b" is one, and "p" is "&a"):
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	b = 2;
- *	memory_barrier();
- *	p = &b;				q = p;
- *					read_barrier_depends();
- *					d = *q;
- * </programlisting>
- *
- * because the read of "*q" depends on the read of "p" and these
- * two reads are separated by a read_barrier_depends().  However,
- * the following code, with the same initial values for "a" and "b":
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	a = 2;
- *	memory_barrier();
- *	b = 3;				y = b;
- *					read_barrier_depends();
- *					x = a;
- * </programlisting>
- *
- * does not enforce ordering, since there is no data dependency between
- * the read of "a" and the read of "b".  Therefore, on some CPUs, such
- * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
- * in cases like this where there are no data dependencies.
- */
-
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index bab79a1..cb6d66c 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -33,7 +33,6 @@
 #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
 #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
 #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
-#define read_barrier_depends()  do { } while(0)
 
 #define set_mb(var, value)	do { var = value; mb(); } while (0)
 
@@ -50,16 +49,17 @@
 #define smp_mb()	mb()
 #define smp_rmb()	__lwsync()
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
-#define smp_read_barrier_depends()	read_barrier_depends()
 #else
 #define __lwsync()	barrier()
 
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while(0)
 #endif /* CONFIG_SMP */
 
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 /*
  * This is a barrier which prevents following instructions from being
  * started until the value of the argument x is known.  For example, if
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index b5dce65..33d191d 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -24,11 +24,12 @@
 
 #define rmb()				mb()
 #define wmb()				mb()
-#define read_barrier_depends()		do { } while(0)
 #define smp_mb()			mb()
 #define smp_rmb()			rmb()
 #define smp_wmb()			wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
+
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
 
 #define smp_mb__before_atomic()		smp_mb()
 #define smp_mb__after_atomic()		smp_mb()
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 305dcc3..6c974c0 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -37,7 +37,6 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define rmb()	__asm__ __volatile__("":::"memory")
 #define wmb()	__asm__ __volatile__("":::"memory")
 
-#define read_barrier_depends()		do { } while(0)
 #define set_mb(__var, __value) \
 	do { __var = __value; membar_safe("#StoreLoad"); } while(0)
 
@@ -51,7 +50,8 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define smp_wmb()	__asm__ __volatile__("":::"memory")
 #endif
 
-#define smp_read_barrier_depends()	do { } while(0)
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
 
 #define smp_store_release(p, v)						\
 do {									\
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 0f4460b..5238000 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,60 +24,6 @@
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
-/**
- * read_barrier_depends - Flush all pending reads that subsequents reads
- * depend on.
- *
- * No data-dependent reads from memory-like regions are ever reordered
- * over this barrier.  All reads preceding this primitive are guaranteed
- * to access memory (but not necessarily other CPUs' caches) before any
- * reads following this primitive that depend on the data return by
- * any of the preceding reads.  This primitive is much lighter weight than
- * rmb() on most CPUs, and is never heavier weight than is
- * rmb().
- *
- * These ordering constraints are respected by both the local CPU
- * and the compiler.
- *
- * Ordering is not guaranteed by anything other than these primitives,
- * not even by data dependencies.  See the documentation for
- * memory_barrier() for examples and URLs to more information.
- *
- * For example, the following code would force ordering (the initial
- * value of "a" is zero, "b" is one, and "p" is "&a"):
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	b = 2;
- *	memory_barrier();
- *	p = &b;				q = p;
- *					read_barrier_depends();
- *					d = *q;
- * </programlisting>
- *
- * because the read of "*q" depends on the read of "p" and these
- * two reads are separated by a read_barrier_depends().  However,
- * the following code, with the same initial values for "a" and "b":
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	a = 2;
- *	memory_barrier();
- *	b = 3;				y = b;
- *					read_barrier_depends();
- *					x = a;
- * </programlisting>
- *
- * does not enforce ordering, since there is no data dependency between
- * the read of "a" and the read of "b".  Therefore, on some CPUs, such
- * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
- * in cases like this where there are no data dependencies.
- **/
-
-#define read_barrier_depends()	do { } while (0)
-
 #ifdef CONFIG_SMP
 #define smp_mb()	mb()
 #ifdef CONFIG_X86_PPRO_FENCE
@@ -86,16 +32,17 @@
 # define smp_rmb()	barrier()
 #endif
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 #else /* !SMP */
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
 #define set_mb(var, value) do { var = value; barrier(); } while (0)
 #endif /* SMP */
 
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 #if defined(CONFIG_X86_PPRO_FENCE)
 
 /*
diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index cc04e67..d6511d9 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -29,8 +29,6 @@
 
 #endif /* CONFIG_X86_32 */
 
-#define read_barrier_depends()	do { } while (0)
-
 #ifdef CONFIG_SMP
 
 #define smp_mb()	mb()
@@ -42,7 +40,6 @@
 
 #define smp_wmb()	barrier()
 
-#define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 
 #else /* CONFIG_SMP */
@@ -50,11 +47,13 @@
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
 #define set_mb(var, value) do { var = value; barrier(); } while (0)
 
 #endif /* CONFIG_SMP */
 
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 /*
  * Stop RDTSC speculation. This is needed when you need to use RDTSC
  * (or get_cycles or vread that possibly accesses the TSC) in a defined



* [PATCH v4 2/4] arch: Add lightweight memory barriers coherent_rmb() and coherent_wmb()
  2014-11-18 17:28 [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 1/4] arch: Cleanup read_barrier_depends() and comments Alexander Duyck
@ 2014-11-18 17:29 ` Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 3/4] r8169: Use coherent_rmb() and coherent_wmb() for DescOwn checks Alexander Duyck
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 17:29 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

There are a number of situations where the mandatory barriers rmb() and
wmb() are used to order memory/memory operations in device drivers, and
those barriers are much heavier than they actually need to be.  For
example, in the case of PowerPC wmb() uses the heavy-weight sync
instruction when, for coherent memory operations, all that is really
needed is an lwsync or eieio instruction.

This commit adds a coherent-only version of the mandatory memory barriers
rmb() and wmb().  In most cases this results in the barrier being the same
as the SMP barrier for the SMP case; however, in some cases we use a
barrier that falls somewhere between rmb() and smp_rmb().  For example, on
ARM the rmb barriers break down as follows:

  Barrier        Call     Explanation
  -------------- -------- ----------------------------------
  rmb()          dsb()    Data synchronization barrier - system
  coherent_rmb() dmb(osh) Data memory barrier - outer shareable
  smp_rmb()      dmb(ish) Data memory barrier - inner shareable

These new barriers are not as safe as the standard rmb() and wmb().
Specifically they do not guarantee ordering between coherent and incoherent
memories.  The primary use case for these would be to enforce ordering of
reads and writes when accessing coherent memory that is shared between the
CPU and a device.

It may also be noted that there is no coherent_mb().  Most architectures
don't provide a good mechanism for performing a coherent-only full barrier
without resorting to the same mechanism used in mb().  As such there isn't
much to be gained in trying to define such a function.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 Documentation/memory-barriers.txt   |   41 +++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/barrier.h      |    4 +++
 arch/arm64/include/asm/barrier.h    |    3 +++
 arch/ia64/include/asm/barrier.h     |    3 +++
 arch/metag/include/asm/barrier.h    |   14 ++++++------
 arch/mips/include/asm/barrier.h     |    9 ++++----
 arch/powerpc/include/asm/barrier.h  |   17 +++++++++------
 arch/s390/include/asm/barrier.h     |    2 ++
 arch/sparc/include/asm/barrier_64.h |    3 +++
 arch/x86/include/asm/barrier.h      |   11 ++++++---
 arch/x86/um/asm/barrier.h           |   13 ++++++-----
 include/asm-generic/barrier.h       |    8 +++++++
 12 files changed, 100 insertions(+), 28 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 22a969c..d86cdc2 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1615,6 +1615,47 @@ There are some more advanced barrier functions:
      operations" subsection for information on where to use these.
 
 
+ (*) coherent_wmb();
+ (*) coherent_rmb();
+
+     These are for use with memory based device I/O to guarantee the ordering
+     of cache coherent writes or reads with respect to other writes or reads
+     to cache coherent memory.
+
+     For example, consider a device driver that shares memory with a device
+     and uses a descriptor status value to indicate if the descriptor belongs
+     to the device or the CPU, and a doorbell to notify it when new
+     descriptors are available:
+
+	if (desc->status != DEVICE_OWN) {
+		/* do not read data until we own descriptor */
+		coherent_rmb();
+
+		/* read/modify data */
+		read_data = desc->data;
+		desc->data = write_data;
+
+		/* flush modifications before status update */
+		coherent_wmb();
+
+		/* assign ownership */
+		desc->status = DEVICE_OWN;
+
+		/* force memory to sync before notifying device via MMIO */
+		wmb();
+
+		/* notify device of new descriptors */
+		writel(DESC_NOTIFY, doorbell);
+	}
+
+     The coherent_rmb() allows us to guarantee the device has released ownership
+     before we read the data from the descriptor, and the coherent_wmb() allows
+     us to guarantee the data is written to the descriptor before the device
+     can see it now has ownership.  The wmb() is needed to guarantee that the
+     cache coherent memory writes have completed before attempting a write to
+     the cache incoherent MMIO region.
+
+
 MMIO WRITE BARRIER
 ------------------
 
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index c6a3e73..8d33c9e 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -43,10 +43,14 @@
 #define mb()		do { dsb(); outer_sync(); } while (0)
 #define rmb()		dsb()
 #define wmb()		do { dsb(st); outer_sync(); } while (0)
+#define coherent_rmb()	dmb(osh)
+#define coherent_wmb()	dmb(oshst)
 #else
 #define mb()		barrier()
 #define rmb()		barrier()
 #define wmb()		barrier()
+#define coherent_rmb()	barrier()
+#define coherent_wmb()	barrier()
 #endif
 
 #ifndef CONFIG_SMP
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 6389d60..d250ac4 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -32,6 +32,9 @@
 #define rmb()		dsb(ld)
 #define wmb()		dsb(st)
 
+#define coherent_rmb()	dmb(oshld)
+#define coherent_wmb()	dmb(oshst)
+
 #ifndef CONFIG_SMP
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index e8fffb0..ec4e652 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -39,6 +39,9 @@
 #define rmb()		mb()
 #define wmb()		mb()
 
+#define coherent_rmb()	mb()
+#define coherent_wmb()	mb()
+
 #ifdef CONFIG_SMP
 # define smp_mb()	mb()
 #else
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index 6d8b8c9..56999f3 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -4,8 +4,6 @@
 #include <asm/metag_mem.h>
 
 #define nop()		asm volatile ("NOP")
-#define mb()		wmb()
-#define rmb()		barrier()
 
 #ifdef CONFIG_METAG_META21
 
@@ -41,11 +39,13 @@ static inline void wr_fence(void)
 
 #endif /* !CONFIG_METAG_META21 */
 
-static inline void wmb(void)
-{
-	/* flush writes through the write combiner */
-	wr_fence();
-}
+/* flush writes through the write combiner */
+#define mb()		wr_fence()
+#define rmb()		barrier()
+#define wmb()		mb()
+
+#define coherent_rmb()	rmb()
+#define coherent_wmb()	wmb()
 
 #ifndef CONFIG_SMP
 #define fence()		do { } while (0)
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 3d69aa8..2d54df7 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -75,20 +75,21 @@
 
 #include <asm/wbflush.h>
 
-#define wmb()		fast_wmb()
-#define rmb()		fast_rmb()
 #define mb()		wbflush()
 #define iob()		wbflush()
 
 #else /* !CONFIG_CPU_HAS_WB */
 
-#define wmb()		fast_wmb()
-#define rmb()		fast_rmb()
 #define mb()		fast_mb()
 #define iob()		fast_iob()
 
 #endif /* !CONFIG_CPU_HAS_WB */
 
+#define wmb()		fast_wmb()
+#define rmb()		fast_rmb()
+#define coherent_wmb()	fast_wmb()
+#define coherent_rmb()	fast_rmb()
+
 #if defined(CONFIG_WEAK_ORDERING) && defined(CONFIG_SMP)
 # ifdef CONFIG_CPU_CAVIUM_OCTEON
 #  define smp_mb()	__sync()
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index cb6d66c..40c668b 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -36,8 +36,6 @@
 
 #define set_mb(var, value)	do { var = value; mb(); } while (0)
 
-#ifdef CONFIG_SMP
-
 #ifdef __SUBARCH_HAS_LWSYNC
 #    define SMPWMB      LWSYNC
 #else
@@ -45,12 +43,17 @@
 #endif
 
 #define __lwsync()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+#define coherent_rmb()	__lwsync()
+#define coherent_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
+
+#ifdef CONFIG_SMP
+#define smp_lwsync()	__lwsync()
 
 #define smp_mb()	mb()
-#define smp_rmb()	__lwsync()
-#define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
+#define smp_rmb()	coherent_rmb()
+#define smp_wmb()	coherent_wmb()
 #else
-#define __lwsync()	barrier()
+#define smp_lwsync()	barrier()
 
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
@@ -72,7 +75,7 @@
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
-	__lwsync();							\
+	smp_lwsync();							\
 	ACCESS_ONCE(*p) = (v);						\
 } while (0)
 
@@ -80,7 +83,7 @@ do {									\
 ({									\
 	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
-	__lwsync();							\
+	smp_lwsync();							\
 	___p1;								\
 })
 
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 33d191d..a50117d 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -24,6 +24,8 @@
 
 #define rmb()				mb()
 #define wmb()				mb()
+#define coherent_rmb()			rmb()
+#define coherent_wmb()			wmb()
 #define smp_mb()			mb()
 #define smp_rmb()			rmb()
 #define smp_wmb()			wmb()
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 6c974c0..97bdcd4 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -37,6 +37,9 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define rmb()	__asm__ __volatile__("":::"memory")
 #define wmb()	__asm__ __volatile__("":::"memory")
 
+#define coherent_rmb()	rmb()
+#define coherent_wmb()	wmb()
+
 #define set_mb(__var, __value) \
 	do { __var = __value; membar_safe("#StoreLoad"); } while(0)
 
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 5238000..340a7fe 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,13 +24,16 @@
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
 #ifdef CONFIG_X86_PPRO_FENCE
-# define smp_rmb()	rmb()
+#define coherent_rmb()	rmb()
 #else
-# define smp_rmb()	barrier()
+#define coherent_rmb()	barrier()
 #endif
+#define coherent_wmb()	barrier()
+
+#ifdef CONFIG_SMP
+#define smp_mb()	mb()
+#define smp_rmb()	coherent_rmb()
 #define smp_wmb()	barrier()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 #else /* !SMP */
diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index d6511d9..d4952e0 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -29,17 +29,18 @@
 
 #endif /* CONFIG_X86_32 */
 
-#ifdef CONFIG_SMP
-
-#define smp_mb()	mb()
 #ifdef CONFIG_X86_PPRO_FENCE
-#define smp_rmb()	rmb()
+#define coherent_rmb()	rmb()
 #else /* CONFIG_X86_PPRO_FENCE */
-#define smp_rmb()	barrier()
+#define coherent_rmb()	barrier()
 #endif /* CONFIG_X86_PPRO_FENCE */
+#define coherent_wmb()	barrier()
 
-#define smp_wmb()	barrier()
+#ifdef CONFIG_SMP
 
+#define smp_mb()	mb()
+#define smp_rmb()	coherent_rmb()
+#define smp_wmb()	coherent_wmb()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 
 #else /* CONFIG_SMP */
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 1402fa8..52a9a71 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -42,6 +42,14 @@
 #define wmb()	mb()
 #endif
 
+#ifndef coherent_rmb
+#define coherent_rmb()	rmb()
+#endif
+
+#ifndef coherent_wmb
+#define coherent_wmb()	wmb()
+#endif
+
 #ifndef read_barrier_depends
 #define read_barrier_depends()		do { } while (0)
 #endif



* [PATCH v4 3/4] r8169: Use coherent_rmb() and coherent_wmb() for DescOwn checks
  2014-11-18 17:28 [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 1/4] arch: Cleanup read_barrier_depends() and comments Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 2/4] arch: Add lightweight memory barriers coherent_rmb() and coherent_wmb() Alexander Duyck
@ 2014-11-18 17:29 ` Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 4/4] fm10k/igb/ixgbe: Use coherent_rmb on Rx descriptor reads Alexander Duyck
  2014-11-18 20:53 ` [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Linus Torvalds
  4 siblings, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 17:29 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

The r8169 driver uses a pair of wmb() calls when setting up the descriptor
rings.  The first is to synchronize the descriptor data with the descriptor
status, and the second is to synchronize the descriptor status with the use
of the MMIO doorbell to notify the device that descriptors are ready.  This
can come at a heavy price on some systems, and is not really necessary on
systems such as x86 where a simple barrier() would suffice to order
store/store accesses.  As such we can replace the first memory barrier with
coherent_wmb() to reduce the cost for these accesses.

In addition the r8169 uses an rmb() to prevent compiler optimization in the
cleanup paths; however, by moving the barrier down a few lines and replacing
it with a coherent_rmb() we should be able to use it to guarantee that
descriptor accesses do not occur until the device has updated the DescOwn
bit from its end.

One last change I made is to move the update of cur_tx in the xmit path to
after the wmb().  This way the device and all CPUs should see the DescOwn
update before they see the cur_tx value update.
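
Condensed, the intended ordering in the transmit path after this patch is
roughly the following (a sketch of the relevant lines only, not the full
function):

	skb_tx_timestamp(skb);

	/* order data/descriptor writes before handing DescOwn to the device */
	coherent_wmb();
	txd->opts1 = cpu_to_le32(status);	/* status includes DescOwn */

	/* order the DescOwn write before cur_tx and the MMIO doorbell */
	wmb();
	tp->cur_tx += frags + 1;

	RTL_W8(TxPoll, NPQ);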

Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/realtek/r8169.c |   29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index cf154f7..2886c4a 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6601,6 +6601,9 @@ static inline void rtl8169_mark_to_asic(struct RxDesc *desc, u32 rx_buf_sz)
 {
 	u32 eor = le32_to_cpu(desc->opts1) & RingEnd;
 
+	/* Force memory writes to complete before releasing descriptor */
+	coherent_wmb();
+
 	desc->opts1 = cpu_to_le32(DescOwn | eor | rx_buf_sz);
 }
 
@@ -6608,7 +6611,6 @@ static inline void rtl8169_map_to_asic(struct RxDesc *desc, dma_addr_t mapping,
 				       u32 rx_buf_sz)
 {
 	desc->addr = cpu_to_le64(mapping);
-	wmb();
 	rtl8169_mark_to_asic(desc, rx_buf_sz);
 }
 
@@ -7077,16 +7079,18 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	skb_tx_timestamp(skb);
 
-	wmb();
+	/* Force memory writes to complete before releasing descriptor */
+	coherent_wmb();
 
 	/* Anti gcc 2.95.3 bugware (sic) */
 	status = opts[0] | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
 	txd->opts1 = cpu_to_le32(status);
 
-	tp->cur_tx += frags + 1;
-
+	/* Force all memory writes to complete before notifying device */
 	wmb();
 
+	tp->cur_tx += frags + 1;
+
 	RTL_W8(TxPoll, NPQ);
 
 	mmiowb();
@@ -7185,11 +7189,16 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 		struct ring_info *tx_skb = tp->tx_skb + entry;
 		u32 status;
 
-		rmb();
 		status = le32_to_cpu(tp->TxDescArray[entry].opts1);
 		if (status & DescOwn)
 			break;
 
+		/* This barrier is needed to keep us from reading
+		 * any other fields out of the Tx descriptor until
+		 * we know the status of DescOwn
+		 */
+		coherent_rmb();
+
 		rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 				     tp->TxDescArray + entry);
 		if (status & LastFrag) {
@@ -7284,11 +7293,16 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 		struct RxDesc *desc = tp->RxDescArray + entry;
 		u32 status;
 
-		rmb();
 		status = le32_to_cpu(desc->opts1) & tp->opts1_mask;
-
 		if (status & DescOwn)
 			break;
+
+		/* This barrier is needed to keep us from reading
+		 * any other fields out of the Rx descriptor until
+		 * we know the status of DescOwn
+		 */
+		coherent_rmb();
+
 		if (unlikely(status & RxRES)) {
 			netif_info(tp, rx_err, dev, "Rx ERROR. status = %08x\n",
 				   status);
@@ -7350,7 +7364,6 @@ process_pkt:
 		}
 release_descriptor:
 		desc->opts2 = 0;
-		wmb();
 		rtl8169_mark_to_asic(desc, rx_buf_sz);
 	}
 



* [PATCH v4 4/4] fm10k/igb/ixgbe: Use coherent_rmb on Rx descriptor reads
  2014-11-18 17:28 [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Alexander Duyck
                   ` (2 preceding siblings ...)
  2014-11-18 17:29 ` [PATCH v4 3/4] r8169: Use coherent_rmb() and coherent_wmb() for DescOwn checks Alexander Duyck
@ 2014-11-18 17:29 ` Alexander Duyck
  2014-11-18 20:53 ` [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Linus Torvalds
  4 siblings, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 17:29 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

This change makes it so that coherent_rmb is used when reading the Rx
descriptor.  The advantage of coherent_rmb is that it allows for a much
lower-cost barrier on the x86, powerpc, arm, and arm64 architectures than a
traditional memory barrier when dealing with reads that only have to
synchronize with coherent memory.

In addition I have updated the code so that it simply checks whether any
bits have been set instead of testing only the DD bit.  Since the DD bit is
always set as part of a descriptor write-back, we only need to check for a
non-zero value at that memory location rather than for any specific bit.
This allows the code itself to appear much cleaner and gives the compiler
more room to optimize.
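
In other words, the Rx cleanup loop boils down to the pattern below
(condensed from the ixgbe hunk further down; the fm10k and igb changes
follow the same shape):

	rx_desc = IXGBE_RX_DESC(rx_ring, rx_ring->next_to_clean);

	/* a zero status/error word means no write-back has happened yet */
	if (!rx_desc->wb.upper.status_error)
		break;

	/* do not read other descriptor fields until the write-back is seen */
	coherent_rmb();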

Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    6 +++---
 drivers/net/ethernet/intel/igb/igb_main.c     |    6 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    9 ++++-----
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index e645af4..1819311 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -620,14 +620,14 @@ static bool fm10k_clean_rx_irq(struct fm10k_q_vector *q_vector,
 
 		rx_desc = FM10K_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
-		if (!fm10k_test_staterr(rx_desc, FM10K_RXD_STATUS_DD))
+		if (!rx_desc->d.staterr)
 			break;
 
 		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
-		 * RXD_STATUS_DD bit is set
+		 * descriptor has been written back
 		 */
-		rmb();
+		coherent_rmb();
 
 		/* retrieve a buffer from the ring */
 		skb = fm10k_fetch_rx_buffer(rx_ring, rx_desc, skb);
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index a2d72a8..51cda23 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6918,14 +6918,14 @@ static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
 
 		rx_desc = IGB_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
-		if (!igb_test_staterr(rx_desc, E1000_RXD_STAT_DD))
+		if (!rx_desc->wb.upper.status_error)
 			break;
 
 		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
-		 * RXD_STAT_DD bit is set
+		 * descriptor has been written back
 		 */
-		rmb();
+		coherent_rmb();
 
 		/* retrieve a buffer from the ring */
 		skb = igb_fetch_rx_buffer(rx_ring, rx_desc, skb);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d2df4e3..fdb49c0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2003,15 +2003,14 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 		rx_desc = IXGBE_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
-		if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_DD))
+		if (!rx_desc->wb.upper.status_error)
 			break;
 
-		/*
-		 * This memory barrier is needed to keep us from reading
+		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
-		 * RXD_STAT_DD bit is set
+		 * descriptor has been written back
 		 */
-		rmb();
+		coherent_rmb();
 
 		/* retrieve a buffer from the ring */
 		skb = ixgbe_fetch_rx_buffer(rx_ring, rx_desc);



* Re: [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access
  2014-11-18 17:28 [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Alexander Duyck
                   ` (3 preceding siblings ...)
  2014-11-18 17:29 ` [PATCH v4 4/4] fm10k/igb/ixgbe: Use coherent_rmb on Rx descriptor reads Alexander Duyck
@ 2014-11-18 20:53 ` Linus Torvalds
  2014-11-18 22:47   ` Alexander Duyck
  4 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2014-11-18 20:53 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: linux-arch, Network Development, Linux Kernel Mailing List,
	Mathieu Desnoyers, Peter Zijlstra, Benjamin Herrenschmidt,
	Heiko Carstens, Ingo Molnar, Michael Neuling,
	Russell King - ARM Linux, donald.c.skidmore, matthew.vick,
	Geert Uytterhoeven, Jeff Kirsher, Francois Romieu, Paul McKenney,
	nic_swsd, Will Deacon, Michael Ellerman, Tony Luck,
	Oleg Nesterov, Martin Schwidefsky, Frédéric Weisbecker,
	David Miller

On Tue, Nov 18, 2014 at 9:28 AM, Alexander Duyck
<alexander.h.duyck@redhat.com> wrote:
> These patches introduce two new primitives for synchronizing cache coherent
> memory writes and reads.  These two new primitives are:
>
>         coherent_rmb()
>         coherent_wmb()

So I'm still not convinced about the name. I don't hate it, but if you
ever want to do "read_acquire", then that whole "coherent_" thing does
make for a big mouthful. I don't see why "dma" isn't simpler and more
to the point, and has the advantage of lining up (in documentation
etc) with "smp".

Why would you ever use "coherent_xyz()" on something that isn't about
dma? If it's cache-coherent memory without DMA, you'd use "smp_xyz()",
so I really do prefer that whole "dma-vs-smp" issue, because it talks
about what is actually the important issue. All sane memory is
coherent, after all (and if it isn't, you have other issues than
memory ordering).

                  Linus


* Re: [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access
  2014-11-18 20:53 ` [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Linus Torvalds
@ 2014-11-18 22:47   ` Alexander Duyck
  2014-11-18 23:06     ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 22:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch, Network Development, Linux Kernel Mailing List,
	Mathieu Desnoyers, Peter Zijlstra, Benjamin Herrenschmidt,
	Heiko Carstens, Ingo Molnar, Michael Neuling,
	Russell King - ARM Linux, donald.c.skidmore, matthew.vick,
	Geert Uytterhoeven, Jeff Kirsher, Francois Romieu, Paul McKenney,
	nic_swsd, Will Deacon, Michael Ellerman, Tony Luck,
	Oleg Nesterov, Martin Schwidefsky, Frédéric Weisbecker,
	David Miller

On 11/18/2014 12:53 PM, Linus Torvalds wrote:
> On Tue, Nov 18, 2014 at 9:28 AM, Alexander Duyck
> <alexander.h.duyck@redhat.com> wrote:
>> These patches introduce two new primitives for synchronizing cache coherent
>> memory writes and reads.  These two new primitives are:
>>
>>          coherent_rmb()
>>          coherent_wmb()
>
> So I'm still not convinced about the name. I don't hate it, but if you
> ever want to do "read_acquire", then that whole "coherent_" thing does
> make for a big mouthful. I don't see why "dma" isn't simpler and more
> to the point, and has the advantage of lining up (in documentation
> etc) with "smp".

The problem is DMA is a broad brush.  There are multiple cases I can 
think of where DMA does not represent coherent memory.

> Why would you ever use "coherent_xyz()" on something that isn't about
> dma? If it's cache-coherent memory without DMA, you'd use "smp_xyz()",
> so I really do prefer that whole "dma-vs-smp" issue, because it talks
> about what is actually the important issue. All sane memory is
> coherent, after all (and if it isn't, you have other issues than
> memory ordering).
>
>                    Linus

This barrier only applies to a subset of DMA memory types.  So yes, 
"coherent_xyz()" only applies to DMA, but not all DMA memory is coherent 
as it could be a non-coherent or streaming DMA mapping.

One spot where the name makes sense is in the headers themselves.  To 
avoid duplicating definitions in several spots, where CONFIG_SMP is 
defined I simply defined smp_xyz() as coherent_xyz().  Defining it as 
dma_xyz() might have made it more difficult to read what was going on.
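
For example, the x86 header now reads roughly like this (condensed sketch 
of the hunk from patch 2):

	/* coherent_* is defined once, outside the CONFIG_SMP block... */
	#ifdef CONFIG_X86_PPRO_FENCE
	#define coherent_rmb()	rmb()
	#else
	#define coherent_rmb()	barrier()
	#endif
	#define coherent_wmb()	barrier()

	/* ...and the SMP variant simply reuses it */
	#ifdef CONFIG_SMP
	#define smp_rmb()	coherent_rmb()
	#endif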

- Alex


* Re: [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access
  2014-11-18 22:47   ` Alexander Duyck
@ 2014-11-18 23:06     ` Linus Torvalds
  2014-11-19  0:07       ` Alexander Duyck
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2014-11-18 23:06 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: linux-arch, Network Development, Linux Kernel Mailing List,
	Mathieu Desnoyers, Peter Zijlstra, Benjamin Herrenschmidt,
	Heiko Carstens, Ingo Molnar, Michael Neuling,
	Russell King - ARM Linux, donald.c.skidmore, matthew.vick,
	Geert Uytterhoeven, Jeff Kirsher, Francois Romieu, Paul McKenney,
	nic_swsd, Will Deacon, Michael Ellerman, Tony Luck,
	Oleg Nesterov, Martin Schwidefsky, Frédéric Weisbecker,
	David Miller

On Tue, Nov 18, 2014 at 2:47 PM, Alexander Duyck
<alexander.h.duyck@redhat.com> wrote:
>
> The problem is DMA is a broad brush.  There are multiple cases I can think
> of where DMA does not represent coherent memory.

.. and I already addressed that, in the thing you even included:

>> about what is actually the important issue. All sane memory is
>> coherent, after all (and if it isn't, you have other issues than
>> memory ordering).

The thing is, if the DMA isn't coherent, nobody is going to care about
the memory barriers anyway. You have bigger issues.

And your argument is that "dma" is bigger than this issue. *MY*
argument is that "coherent" is bigger than this issue. There's tons of
coherent memory that is not about DMA, the same way that there is DMA
memory that isn't coherent.

See? The two are 100% equivalent. Except "dma" is just three letters,
and matches "smp" both visually and in use (SMP memory is "coherent"
too - yes, you can - and crap architectures do - have incoherent
caches due to virtual aliases etc, but exactly as with DMA, if you
have incoherent SMP, you have bigger issues than the barriers).

And yes, you could call it "coherent_dma_read_memory_barrier()", and
it would be very descriptive. It would also drive everybody crazy.

So I argue for "dma_mb()" pairing with "smp_mb()" from a naming
standpoint. It just *describes* the problem better. Look at the
drivers, it's very much about the devices doing DMA to memory, and our
ordering.

To be even more clear: nobody sane cares about the "coherent" part,
because only insane horrible crap architectures have incoherent memory
in the first place, and sane people run away screaming from that
steaming pile of sh*t.

Just look at some of the drivers you actually *use* this in. They are
for intel hardware, they presumably would never even work in the first
place without cache-coherent DMA. Why do you think that "coherent" is
so important?

                       Linus


* Re: [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access
  2014-11-18 23:06     ` Linus Torvalds
@ 2014-11-19  0:07       ` Alexander Duyck
  0 siblings, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-19  0:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch, Network Development, Linux Kernel Mailing List,
	Mathieu Desnoyers, Peter Zijlstra, Benjamin Herrenschmidt,
	Heiko Carstens, Ingo Molnar, Michael Neuling,
	Russell King - ARM Linux, donald.c.skidmore, matthew.vick,
	Geert Uytterhoeven, Jeff Kirsher, Francois Romieu, Paul McKenney,
	nic_swsd, Will Deacon, Michael Ellerman, Tony Luck,
	Oleg Nesterov, Martin Schwidefsky, Frédéric Weisbecker,
	David Miller



On 11/18/2014 03:06 PM, Linus Torvalds wrote:
> On Tue, Nov 18, 2014 at 2:47 PM, Alexander Duyck
> <alexander.h.duyck@redhat.com> wrote:
>>
>> The problem is DMA is a broad brush.  There are multiple cases I can think
>> of where DMA does not represent coherent memory.
>
> .. and I already addressed that, in the thing you even included:
>
>>> about what is actually the important issue. All sane memory is
>>> coherent, after all (and if it isn't, you have other issues than
>>> memory ordering).
>
> The thing is, if the DMA isn't coherent, nobody is going to care about
> the memory barriers anyway. You have bigger issues.
>
> And your argument is that "dma" is bigger than this issue. *MY*
> argument is that "coherent" is bigger than this issue. There's tons of
> coherent memory that is not about DMA, the same way that there is DMA
> memory that isn't coherent.
>
> See? The two are 100% equivalent. Except "dma" is just three letters,
> and matches "smp" both visually and in use (SMP memory is "coherent"
> too - yes, you can - and crap architectures do - have incoherent
> caches due to virtual aliases etc, but exactly as with DMA, if you
> have incoherent SMP, you have bigger issues than the barriers).

Actually if anything maybe the crap architectures are a good reason for 
changing the name.  If they can't even do coherent SMP memory then the 
coherent_*mb() could be misleading since they would just be full 
barriers anyway.

> And yes, you could call it "coherent_dma_read_memory_barrier()", and
> it would be very descriptive. It would also drive everybody crazy.

No, I think "dma_wmb__before_coherent_write" would have been much more 
descriptive.  You have to squeeze in that extra underscore somewhere. ;-)

> So I argue for "dma_mb()" pairing with "smp_mb()" from a naming
> standpoint. It just *describes* the problem better. Look at the
> drivers, it's very much about the devices doing DMA to memory, and our
> ordering.
>
> To be even more clear: nobody sane cares about the "coherent" part,
> because only insane horrible crap architectures have incoherent memory
> in the first place, and sane people run away screaming from that
> steaming pile of sh*t.

I think that is part of my reluctance.  I didn't even want it implied 
that the barriers could be used with that kind of stuff.

> Just look at some of the drivers you actually *use* this in. They are
> for intel hardware, they presumably would never even work in the first
> place without cache-coherent DMA. Why do you think that "coherent" is
> so important?
>
>                         Linus

v5 should be up shortly after a quick pass with sed to do the 
find/replace, clean up any whitespace issues, and a quick run through 
some cross compiling scripts just to make sure I didn't screw anything up.

- Alex
