* [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access
@ 2014-11-18 17:28 Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 1/4] arch: Cleanup read_barrier_depends() and comments Alexander Duyck
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 17:28 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

These patches introduce two new primitives for synchronizing cache coherent
memory writes and reads.  These two new primitives are:

	coherent_rmb()
	coherent_wmb()

The first patch cleans up some unnecessary duplication in the definitions
of read_barrier_depends and smp_read_barrier_depends, and in the comments
describing the barrier.

The second patch adds the primitives for the applicable architectures and
asm-generic.

The third patch adds the barriers to r8169, which turns out to be a good
example of where the new barriers might be useful, as the driver currently
uses full rmb()/wmb() barriers to order accesses to the descriptors and the
DescOwn bit.

The fourth patch adds support for coherent_rmb() to the Intel fm10k, igb,
and ixgbe drivers.  Testing with the ixgbe driver has shown a processing
time reduction of at least 7ns per 64B frame on a Core i7-4930K.
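
For reference, the basic pattern these barriers target is sketched below
(a condensed version of the example added to memory-barriers.txt in patch
2; desc, DEVICE_OWN, read_data and write_data are placeholder names):

	if (desc->status != DEVICE_OWN) {
		/* do not read data until we own the descriptor */
		coherent_rmb();

		/* read/modify the descriptor payload */
		read_data = desc->data;
		desc->data = write_data;

		/* flush modifications before handing ownership back */
		coherent_wmb();

		desc->status = DEVICE_OWN;
	}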

This patch series is essentially v4 of:
v3:	Add lightweight memory barriers fast_rmb() and fast_wmb()
v2:	Introduce load_acquire() and store_release()
v1:	Introduce read_acquire()

The key changes in this patch series versus the earlier patches are:
v4:
	- Updated ARM barrier domains to outer shareable
	- Renamed barriers to coherent_rmb() and coherent_wmb()
	- Added smp_lwsync for use in smp_load_acquire/smp_store_release
v3:
	- Moved away from acquire()/store() and instead focused on barriers
	- Added cleanup of read_barrier_depends
	- Added change in r8169 to fix cur_tx/DescOwn ordering
	- Simplified changes to just replacing/moving barriers in r8169
	- Added update to documentation with code example
v2:
	- Renamed read_acquire() to be consistent with smp_load_acquire()
	- Changed barrier used to be consistent with smp_load_acquire()
	- Updated PowerPC code to use __lwsync based on IBM article
	- Added store_release() as this is a viable use case for drivers
	- Added r8169 patch which is able to fully use primitives
	- Added fm10k/igb/ixgbe patch which is able to test performance

---

Alexander Duyck (4):
      arch: Cleanup read_barrier_depends() and comments
      arch: Add lightweight memory barriers coherent_rmb() and coherent_wmb()
      r8169: Use coherent_rmb() and coherent_wmb() for DescOwn checks
      fm10k/igb/ixgbe: Use coherent_rmb on Rx descriptor reads


 Documentation/memory-barriers.txt             |   41 +++++++++++++++
 arch/alpha/include/asm/barrier.h              |   51 ++++++++++++++++++
 arch/arm/include/asm/barrier.h                |    4 +
 arch/arm64/include/asm/barrier.h              |    3 +
 arch/blackfin/include/asm/barrier.h           |   51 ++++++++++++++++++
 arch/ia64/include/asm/barrier.h               |   25 ++++-----
 arch/metag/include/asm/barrier.h              |   19 ++++---
 arch/mips/include/asm/barrier.h               |   61 ++--------------------
 arch/powerpc/include/asm/barrier.h            |   23 +++++---
 arch/s390/include/asm/barrier.h               |    7 ++-
 arch/sparc/include/asm/barrier_64.h           |    7 ++-
 arch/x86/include/asm/barrier.h                |   70 ++++---------------------
 arch/x86/um/asm/barrier.h                     |   20 ++++---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    6 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |    6 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    9 +--
 drivers/net/ethernet/realtek/r8169.c          |   29 ++++++++--
 include/asm-generic/barrier.h                 |    8 +++
 18 files changed, 259 insertions(+), 181 deletions(-)

--


* [PATCH v4 1/4] arch: Cleanup read_barrier_depends() and comments
  2014-11-18 17:28 [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Alexander Duyck
@ 2014-11-18 17:29 ` Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 2/4] arch: Add lightweight memory barriers coherent_rmb() and coherent_wmb() Alexander Duyck
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 17:29 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

This patch is meant to clean up the handling of read_barrier_depends and
smp_read_barrier_depends.  In multiple spots in the kernel headers
read_barrier_depends is defined as "do {} while (0)", yet in the SMP vs
non-SMP sections that follow, the SMP version references
read_barrier_depends while the non-SMP version defines it as yet another
empty do/while.

With this commit I went through and cleaned out the duplicate definitions,
reducing the count down to 2 per header.  In addition I moved the roughly
50-line comment for the macro from the x86 and mips headers, which define
it as an empty do/while, to the headers that actually implement it, alpha
and blackfin.
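
As a sketch, the resulting per-header layout (using the empty do/while
variant that most of the touched headers end up with) looks like:

	/* SMP-dependent barriers stay inside the CONFIG_SMP block... */
	#ifdef CONFIG_SMP
	# define smp_mb()	mb()
	#else
	# define smp_mb()	barrier()
	#endif

	/* ...while the dependency barriers are defined exactly once */
	#define read_barrier_depends()		do { } while (0)
	#define smp_read_barrier_depends()	do { } while (0)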

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 arch/alpha/include/asm/barrier.h    |   51 ++++++++++++++++++++++++++++++
 arch/blackfin/include/asm/barrier.h |   51 ++++++++++++++++++++++++++++++
 arch/ia64/include/asm/barrier.h     |   22 +++++--------
 arch/metag/include/asm/barrier.h    |    7 ++--
 arch/mips/include/asm/barrier.h     |   52 -------------------------------
 arch/powerpc/include/asm/barrier.h  |    6 ++--
 arch/s390/include/asm/barrier.h     |    5 ++-
 arch/sparc/include/asm/barrier_64.h |    4 +-
 arch/x86/include/asm/barrier.h      |   59 ++---------------------------------
 arch/x86/um/asm/barrier.h           |    7 ++--
 10 files changed, 129 insertions(+), 135 deletions(-)

diff --git a/arch/alpha/include/asm/barrier.h b/arch/alpha/include/asm/barrier.h
index 3832bdb..77516c8 100644
--- a/arch/alpha/include/asm/barrier.h
+++ b/arch/alpha/include/asm/barrier.h
@@ -7,6 +7,57 @@
 #define rmb()	__asm__ __volatile__("mb": : :"memory")
 #define wmb()	__asm__ __volatile__("wmb": : :"memory")
 
+/**
+ * read_barrier_depends - Flush all pending reads that subsequents reads
+ * depend on.
+ *
+ * No data-dependent reads from memory-like regions are ever reordered
+ * over this barrier.  All reads preceding this primitive are guaranteed
+ * to access memory (but not necessarily other CPUs' caches) before any
+ * reads following this primitive that depend on the data return by
+ * any of the preceding reads.  This primitive is much lighter weight than
+ * rmb() on most CPUs, and is never heavier weight than is
+ * rmb().
+ *
+ * These ordering constraints are respected by both the local CPU
+ * and the compiler.
+ *
+ * Ordering is not guaranteed by anything other than these primitives,
+ * not even by data dependencies.  See the documentation for
+ * memory_barrier() for examples and URLs to more information.
+ *
+ * For example, the following code would force ordering (the initial
+ * value of "a" is zero, "b" is one, and "p" is "&a"):
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	b = 2;
+ *	memory_barrier();
+ *	p = &b;				q = p;
+ *					read_barrier_depends();
+ *					d = *q;
+ * </programlisting>
+ *
+ * because the read of "*q" depends on the read of "p" and these
+ * two reads are separated by a read_barrier_depends().  However,
+ * the following code, with the same initial values for "a" and "b":
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	a = 2;
+ *	memory_barrier();
+ *	b = 3;				y = b;
+ *					read_barrier_depends();
+ *					x = a;
+ * </programlisting>
+ *
+ * does not enforce ordering, since there is no data dependency between
+ * the read of "a" and the read of "b".  Therefore, on some CPUs, such
+ * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
+ * in cases like this where there are no data dependencies.
+ */
 #define read_barrier_depends() __asm__ __volatile__("mb": : :"memory")
 
 #ifdef CONFIG_SMP
diff --git a/arch/blackfin/include/asm/barrier.h b/arch/blackfin/include/asm/barrier.h
index 4200068..dfb66fe 100644
--- a/arch/blackfin/include/asm/barrier.h
+++ b/arch/blackfin/include/asm/barrier.h
@@ -22,6 +22,57 @@
 # define mb()	do { barrier(); smp_check_barrier(); smp_mark_barrier(); } while (0)
 # define rmb()	do { barrier(); smp_check_barrier(); } while (0)
 # define wmb()	do { barrier(); smp_mark_barrier(); } while (0)
+/*
+ * read_barrier_depends - Flush all pending reads that subsequents reads
+ * depend on.
+ *
+ * No data-dependent reads from memory-like regions are ever reordered
+ * over this barrier.  All reads preceding this primitive are guaranteed
+ * to access memory (but not necessarily other CPUs' caches) before any
+ * reads following this primitive that depend on the data return by
+ * any of the preceding reads.  This primitive is much lighter weight than
+ * rmb() on most CPUs, and is never heavier weight than is
+ * rmb().
+ *
+ * These ordering constraints are respected by both the local CPU
+ * and the compiler.
+ *
+ * Ordering is not guaranteed by anything other than these primitives,
+ * not even by data dependencies.  See the documentation for
+ * memory_barrier() for examples and URLs to more information.
+ *
+ * For example, the following code would force ordering (the initial
+ * value of "a" is zero, "b" is one, and "p" is "&a"):
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	b = 2;
+ *	memory_barrier();
+ *	p = &b;				q = p;
+ *					read_barrier_depends();
+ *					d = *q;
+ * </programlisting>
+ *
+ * because the read of "*q" depends on the read of "p" and these
+ * two reads are separated by a read_barrier_depends().  However,
+ * the following code, with the same initial values for "a" and "b":
+ *
+ * <programlisting>
+ *	CPU 0				CPU 1
+ *
+ *	a = 2;
+ *	memory_barrier();
+ *	b = 3;				y = b;
+ *					read_barrier_depends();
+ *					x = a;
+ * </programlisting>
+ *
+ * does not enforce ordering, since there is no data dependency between
+ * the read of "a" and the read of "b".  Therefore, on some CPUs, such
+ * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
+ * in cases like this where there are no data dependencies.
+ */
 # define read_barrier_depends()	do { barrier(); smp_check_barrier(); } while (0)
 #endif
 
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index a48957c..e8fffb0 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -35,26 +35,22 @@
  * it's (presumably) much slower than mf and (b) mf.a is supported for
  * sequential memory pages only.
  */
-#define mb()	ia64_mf()
-#define rmb()	mb()
-#define wmb()	mb()
-#define read_barrier_depends()	do { } while(0)
+#define mb()		ia64_mf()
+#define rmb()		mb()
+#define wmb()		mb()
 
 #ifdef CONFIG_SMP
 # define smp_mb()	mb()
-# define smp_rmb()	rmb()
-# define smp_wmb()	wmb()
-# define smp_read_barrier_depends()	read_barrier_depends()
-
 #else
-
 # define smp_mb()	barrier()
-# define smp_rmb()	barrier()
-# define smp_wmb()	barrier()
-# define smp_read_barrier_depends()	do { } while(0)
-
 #endif
 
+#define smp_rmb()	smp_mb()
+#define smp_wmb()	smp_mb()
+
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 #define smp_mb__before_atomic()	barrier()
 #define smp_mb__after_atomic()	barrier()
 
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index c7591e8..6d8b8c9 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -47,8 +47,6 @@ static inline void wmb(void)
 	wr_fence();
 }
 
-#define read_barrier_depends()  do { } while (0)
-
 #ifndef CONFIG_SMP
 #define fence()		do { } while (0)
 #define smp_mb()        barrier()
@@ -82,7 +80,10 @@ static inline void fence(void)
 #define smp_wmb()       barrier()
 #endif
 #endif
-#define smp_read_barrier_depends()     do { } while (0)
+
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 #define set_mb(var, value) do { var = value; smp_mb(); } while (0)
 
 #define smp_store_release(p, v)						\
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index d0101dd..3d69aa8 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -10,58 +10,6 @@
 
 #include <asm/addrspace.h>
 
-/*
- * read_barrier_depends - Flush all pending reads that subsequents reads
- * depend on.
- *
- * No data-dependent reads from memory-like regions are ever reordered
- * over this barrier.  All reads preceding this primitive are guaranteed
- * to access memory (but not necessarily other CPUs' caches) before any
- * reads following this primitive that depend on the data return by
- * any of the preceding reads.  This primitive is much lighter weight than
- * rmb() on most CPUs, and is never heavier weight than is
- * rmb().
- *
- * These ordering constraints are respected by both the local CPU
- * and the compiler.
- *
- * Ordering is not guaranteed by anything other than these primitives,
- * not even by data dependencies.  See the documentation for
- * memory_barrier() for examples and URLs to more information.
- *
- * For example, the following code would force ordering (the initial
- * value of "a" is zero, "b" is one, and "p" is "&a"):
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	b = 2;
- *	memory_barrier();
- *	p = &b;				q = p;
- *					read_barrier_depends();
- *					d = *q;
- * </programlisting>
- *
- * because the read of "*q" depends on the read of "p" and these
- * two reads are separated by a read_barrier_depends().  However,
- * the following code, with the same initial values for "a" and "b":
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	a = 2;
- *	memory_barrier();
- *	b = 3;				y = b;
- *					read_barrier_depends();
- *					x = a;
- * </programlisting>
- *
- * does not enforce ordering, since there is no data dependency between
- * the read of "a" and the read of "b".  Therefore, on some CPUs, such
- * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
- * in cases like this where there are no data dependencies.
- */
-
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index bab79a1..cb6d66c 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -33,7 +33,6 @@
 #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
 #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
 #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
-#define read_barrier_depends()  do { } while(0)
 
 #define set_mb(var, value)	do { var = value; mb(); } while (0)
 
@@ -50,16 +49,17 @@
 #define smp_mb()	mb()
 #define smp_rmb()	__lwsync()
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
-#define smp_read_barrier_depends()	read_barrier_depends()
 #else
 #define __lwsync()	barrier()
 
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while(0)
 #endif /* CONFIG_SMP */
 
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 /*
  * This is a barrier which prevents following instructions from being
  * started until the value of the argument x is known.  For example, if
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index b5dce65..33d191d 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -24,11 +24,12 @@
 
 #define rmb()				mb()
 #define wmb()				mb()
-#define read_barrier_depends()		do { } while(0)
 #define smp_mb()			mb()
 #define smp_rmb()			rmb()
 #define smp_wmb()			wmb()
-#define smp_read_barrier_depends()	read_barrier_depends()
+
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
 
 #define smp_mb__before_atomic()		smp_mb()
 #define smp_mb__after_atomic()		smp_mb()
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 305dcc3..6c974c0 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -37,7 +37,6 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define rmb()	__asm__ __volatile__("":::"memory")
 #define wmb()	__asm__ __volatile__("":::"memory")
 
-#define read_barrier_depends()		do { } while(0)
 #define set_mb(__var, __value) \
 	do { __var = __value; membar_safe("#StoreLoad"); } while(0)
 
@@ -51,7 +50,8 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define smp_wmb()	__asm__ __volatile__("":::"memory")
 #endif
 
-#define smp_read_barrier_depends()	do { } while(0)
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
 
 #define smp_store_release(p, v)						\
 do {									\
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 0f4460b..5238000 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,60 +24,6 @@
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
-/**
- * read_barrier_depends - Flush all pending reads that subsequents reads
- * depend on.
- *
- * No data-dependent reads from memory-like regions are ever reordered
- * over this barrier.  All reads preceding this primitive are guaranteed
- * to access memory (but not necessarily other CPUs' caches) before any
- * reads following this primitive that depend on the data return by
- * any of the preceding reads.  This primitive is much lighter weight than
- * rmb() on most CPUs, and is never heavier weight than is
- * rmb().
- *
- * These ordering constraints are respected by both the local CPU
- * and the compiler.
- *
- * Ordering is not guaranteed by anything other than these primitives,
- * not even by data dependencies.  See the documentation for
- * memory_barrier() for examples and URLs to more information.
- *
- * For example, the following code would force ordering (the initial
- * value of "a" is zero, "b" is one, and "p" is "&a"):
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	b = 2;
- *	memory_barrier();
- *	p = &b;				q = p;
- *					read_barrier_depends();
- *					d = *q;
- * </programlisting>
- *
- * because the read of "*q" depends on the read of "p" and these
- * two reads are separated by a read_barrier_depends().  However,
- * the following code, with the same initial values for "a" and "b":
- *
- * <programlisting>
- *	CPU 0				CPU 1
- *
- *	a = 2;
- *	memory_barrier();
- *	b = 3;				y = b;
- *					read_barrier_depends();
- *					x = a;
- * </programlisting>
- *
- * does not enforce ordering, since there is no data dependency between
- * the read of "a" and the read of "b".  Therefore, on some CPUs, such
- * as Alpha, "y" could be set to 3 and "x" to 0.  Use rmb()
- * in cases like this where there are no data dependencies.
- **/
-
-#define read_barrier_depends()	do { } while (0)
-
 #ifdef CONFIG_SMP
 #define smp_mb()	mb()
 #ifdef CONFIG_X86_PPRO_FENCE
@@ -86,16 +32,17 @@
 # define smp_rmb()	barrier()
 #endif
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 #else /* !SMP */
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
 #define set_mb(var, value) do { var = value; barrier(); } while (0)
 #endif /* SMP */
 
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 #if defined(CONFIG_X86_PPRO_FENCE)
 
 /*
diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index cc04e67..d6511d9 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -29,8 +29,6 @@
 
 #endif /* CONFIG_X86_32 */
 
-#define read_barrier_depends()	do { } while (0)
-
 #ifdef CONFIG_SMP
 
 #define smp_mb()	mb()
@@ -42,7 +40,6 @@
 
 #define smp_wmb()	barrier()
 
-#define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 
 #else /* CONFIG_SMP */
@@ -50,11 +47,13 @@
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
-#define smp_read_barrier_depends()	do { } while (0)
 #define set_mb(var, value) do { var = value; barrier(); } while (0)
 
 #endif /* CONFIG_SMP */
 
+#define read_barrier_depends()		do { } while (0)
+#define smp_read_barrier_depends()	do { } while (0)
+
 /*
  * Stop RDTSC speculation. This is needed when you need to use RDTSC
  * (or get_cycles or vread that possibly accesses the TSC) in a defined



* [PATCH v4 2/4] arch: Add lightweight memory barriers coherent_rmb() and coherent_wmb()
  2014-11-18 17:28 [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 1/4] arch: Cleanup read_barrier_depends() and comments Alexander Duyck
@ 2014-11-18 17:29 ` Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 3/4] r8169: Use coherent_rmb() and coherent_wmb() for DescOwn checks Alexander Duyck
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 17:29 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

There are a number of situations where the mandatory barriers rmb() and
wmb() are used to order memory/memory operations in device drivers, and
those barriers are much heavier than they actually need to be.  For
example, in the case of PowerPC wmb() uses the heavy-weight sync
instruction when, for coherent memory operations, all that is really
needed is an lwsync or eieio instruction.

This commit adds a coherent-only version of the mandatory memory barriers
rmb() and wmb().  In most cases this results in the barrier being the same
as the SMP barrier for the SMP case; however, in some cases we use a
barrier that falls somewhere between rmb() and smp_rmb().  For example, on
ARM the rmb barriers break down as follows:

  Barrier        Call     Explanation
  -------------- -------- ----------------------------------
  rmb()          dsb()    Data synchronization barrier - system
  coherent_rmb() dmb(osh) Data memory barrier - outer shareable
  smp_rmb()      dmb(ish) Data memory barrier - inner shareable

These new barriers are not as safe as the standard rmb() and wmb().
Specifically they do not guarantee ordering between coherent and incoherent
memories.  The primary use case for these would be to enforce ordering of
reads and writes when accessing coherent memory that is shared between the
CPU and a device.

It may also be noted that there is no coherent_mb().  Most architectures
don't provide a good mechanism for performing a coherent-only full barrier
without resorting to the same mechanism used in mb().  As such there isn't
much to be gained in trying to define such a function.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 Documentation/memory-barriers.txt   |   41 +++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/barrier.h      |    4 +++
 arch/arm64/include/asm/barrier.h    |    3 +++
 arch/ia64/include/asm/barrier.h     |    3 +++
 arch/metag/include/asm/barrier.h    |   14 ++++++------
 arch/mips/include/asm/barrier.h     |    9 ++++----
 arch/powerpc/include/asm/barrier.h  |   17 +++++++++------
 arch/s390/include/asm/barrier.h     |    2 ++
 arch/sparc/include/asm/barrier_64.h |    3 +++
 arch/x86/include/asm/barrier.h      |   11 ++++++---
 arch/x86/um/asm/barrier.h           |   13 ++++++-----
 include/asm-generic/barrier.h       |    8 +++++++
 12 files changed, 100 insertions(+), 28 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 22a969c..d86cdc2 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1615,6 +1615,47 @@ There are some more advanced barrier functions:
      operations" subsection for information on where to use these.
 
 
+ (*) coherent_wmb();
+ (*) coherent_rmb();
+
+     These are for use with memory based device I/O to guarantee the ordering
+     of cache coherent writes or reads with respect to other writes or reads
+     to cache coherent memory.
+
+     For example, consider a device driver that shares memory with a device
+     and uses a descriptor status value to indicate if the descriptor belongs
+     to the device or the CPU, and a doorbell to notify it when new
+     descriptors are available:
+
+	if (desc->status != DEVICE_OWN) {
+		/* do not read data until we own descriptor */
+		coherent_rmb();
+
+		/* read/modify data */
+		read_data = desc->data;
+		desc->data = write_data;
+
+		/* flush modifications before status update */
+		coherent_wmb();
+
+		/* assign ownership */
+		desc->status = DEVICE_OWN;
+
+		/* force memory to sync before notifying device via MMIO */
+		wmb();
+
+		/* notify device of new descriptors */
+		writel(DESC_NOTIFY, doorbell);
+	}
+
+     The coherent_rmb() allows us to guarantee the device has released ownership
+     before we read the data from the descriptor, and the coherent_wmb() allows
+     us to guarantee the data is written to the descriptor before the device
+     can see it now has ownership.  The wmb() is needed to guarantee that the
+     cache coherent memory writes have completed before attempting a write to
+     the cache incoherent MMIO region.
+
+
 MMIO WRITE BARRIER
 ------------------
 
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index c6a3e73..8d33c9e 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -43,10 +43,14 @@
 #define mb()		do { dsb(); outer_sync(); } while (0)
 #define rmb()		dsb()
 #define wmb()		do { dsb(st); outer_sync(); } while (0)
+#define coherent_rmb()	dmb(osh)
+#define coherent_wmb()	dmb(oshst)
 #else
 #define mb()		barrier()
 #define rmb()		barrier()
 #define wmb()		barrier()
+#define coherent_rmb()	barrier()
+#define coherent_wmb()	barrier()
 #endif
 
 #ifndef CONFIG_SMP
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 6389d60..d250ac4 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -32,6 +32,9 @@
 #define rmb()		dsb(ld)
 #define wmb()		dsb(st)
 
+#define coherent_rmb()	dmb(oshld)
+#define coherent_wmb()	dmb(oshst)
+
 #ifndef CONFIG_SMP
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index e8fffb0..ec4e652 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -39,6 +39,9 @@
 #define rmb()		mb()
 #define wmb()		mb()
 
+#define coherent_rmb()	mb()
+#define coherent_wmb()	mb()
+
 #ifdef CONFIG_SMP
 # define smp_mb()	mb()
 #else
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index 6d8b8c9..56999f3 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -4,8 +4,6 @@
 #include <asm/metag_mem.h>
 
 #define nop()		asm volatile ("NOP")
-#define mb()		wmb()
-#define rmb()		barrier()
 
 #ifdef CONFIG_METAG_META21
 
@@ -41,11 +39,13 @@ static inline void wr_fence(void)
 
 #endif /* !CONFIG_METAG_META21 */
 
-static inline void wmb(void)
-{
-	/* flush writes through the write combiner */
-	wr_fence();
-}
+/* flush writes through the write combiner */
+#define mb()		wr_fence()
+#define rmb()		barrier()
+#define wmb()		mb()
+
+#define coherent_rmb()	rmb()
+#define coherent_wmb()	wmb()
 
 #ifndef CONFIG_SMP
 #define fence()		do { } while (0)
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 3d69aa8..2d54df7 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -75,20 +75,21 @@
 
 #include <asm/wbflush.h>
 
-#define wmb()		fast_wmb()
-#define rmb()		fast_rmb()
 #define mb()		wbflush()
 #define iob()		wbflush()
 
 #else /* !CONFIG_CPU_HAS_WB */
 
-#define wmb()		fast_wmb()
-#define rmb()		fast_rmb()
 #define mb()		fast_mb()
 #define iob()		fast_iob()
 
 #endif /* !CONFIG_CPU_HAS_WB */
 
+#define wmb()		fast_wmb()
+#define rmb()		fast_rmb()
+#define coherent_wmb()	fast_wmb()
+#define coherent_rmb()	fast_rmb()
+
 #if defined(CONFIG_WEAK_ORDERING) && defined(CONFIG_SMP)
 # ifdef CONFIG_CPU_CAVIUM_OCTEON
 #  define smp_mb()	__sync()
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index cb6d66c..40c668b 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -36,8 +36,6 @@
 
 #define set_mb(var, value)	do { var = value; mb(); } while (0)
 
-#ifdef CONFIG_SMP
-
 #ifdef __SUBARCH_HAS_LWSYNC
 #    define SMPWMB      LWSYNC
 #else
@@ -45,12 +43,17 @@
 #endif
 
 #define __lwsync()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+#define coherent_rmb()	__lwsync()
+#define coherent_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
+
+#ifdef CONFIG_SMP
+#define smp_lwsync()	__lwsync()
 
 #define smp_mb()	mb()
-#define smp_rmb()	__lwsync()
-#define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
+#define smp_rmb()	coherent_rmb()
+#define smp_wmb()	coherent_wmb()
 #else
-#define __lwsync()	barrier()
+#define smp_lwsync()	barrier()
 
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
@@ -72,7 +75,7 @@
 #define smp_store_release(p, v)						\
 do {									\
 	compiletime_assert_atomic_type(*p);				\
-	__lwsync();							\
+	smp_lwsync();							\
 	ACCESS_ONCE(*p) = (v);						\
 } while (0)
 
@@ -80,7 +83,7 @@ do {									\
 ({									\
 	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
 	compiletime_assert_atomic_type(*p);				\
-	__lwsync();							\
+	smp_lwsync();							\
 	___p1;								\
 })
 
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 33d191d..a50117d 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -24,6 +24,8 @@
 
 #define rmb()				mb()
 #define wmb()				mb()
+#define coherent_rmb()			rmb()
+#define coherent_wmb()			wmb()
 #define smp_mb()			mb()
 #define smp_rmb()			rmb()
 #define smp_wmb()			wmb()
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 6c974c0..97bdcd4 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -37,6 +37,9 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 #define rmb()	__asm__ __volatile__("":::"memory")
 #define wmb()	__asm__ __volatile__("":::"memory")
 
+#define coherent_rmb()	rmb()
+#define coherent_wmb()	wmb()
+
 #define set_mb(__var, __value) \
 	do { __var = __value; membar_safe("#StoreLoad"); } while(0)
 
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 5238000..340a7fe 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,13 +24,16 @@
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
 
-#ifdef CONFIG_SMP
-#define smp_mb()	mb()
 #ifdef CONFIG_X86_PPRO_FENCE
-# define smp_rmb()	rmb()
+#define coherent_rmb()	rmb()
 #else
-# define smp_rmb()	barrier()
+#define coherent_rmb()	barrier()
 #endif
+#define coherent_wmb()	barrier()
+
+#ifdef CONFIG_SMP
+#define smp_mb()	mb()
+#define smp_rmb()	coherent_rmb()
 #define smp_wmb()	barrier()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 #else /* !SMP */
diff --git a/arch/x86/um/asm/barrier.h b/arch/x86/um/asm/barrier.h
index d6511d9..d4952e0 100644
--- a/arch/x86/um/asm/barrier.h
+++ b/arch/x86/um/asm/barrier.h
@@ -29,17 +29,18 @@
 
 #endif /* CONFIG_X86_32 */
 
-#ifdef CONFIG_SMP
-
-#define smp_mb()	mb()
 #ifdef CONFIG_X86_PPRO_FENCE
-#define smp_rmb()	rmb()
+#define coherent_rmb()	rmb()
 #else /* CONFIG_X86_PPRO_FENCE */
-#define smp_rmb()	barrier()
+#define coherent_rmb()	barrier()
 #endif /* CONFIG_X86_PPRO_FENCE */
+#define coherent_wmb()	barrier()
 
-#define smp_wmb()	barrier()
+#ifdef CONFIG_SMP
 
+#define smp_mb()	mb()
+#define smp_rmb()	coherent_rmb()
+#define smp_wmb()	coherent_wmb()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
 
 #else /* CONFIG_SMP */
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 1402fa8..52a9a71 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -42,6 +42,14 @@
 #define wmb()	mb()
 #endif
 
+#ifndef coherent_rmb
+#define coherent_rmb()	rmb()
+#endif
+
+#ifndef coherent_wmb
+#define coherent_wmb()	wmb()
+#endif
+
 #ifndef read_barrier_depends
 #define read_barrier_depends()		do { } while (0)
 #endif



* [PATCH v4 3/4] r8169: Use coherent_rmb() and coherent_wmb() for DescOwn checks
  2014-11-18 17:28 [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 1/4] arch: Cleanup read_barrier_depends() and comments Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 2/4] arch: Add lightweight memory barriers coherent_rmb() and coherent_wmb() Alexander Duyck
@ 2014-11-18 17:29 ` Alexander Duyck
  2014-11-18 17:29 ` [PATCH v4 4/4] fm10k/igb/ixgbe: Use coherent_rmb on Rx descriptor reads Alexander Duyck
  2014-11-18 20:53 ` [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Linus Torvalds
  4 siblings, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 17:29 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

The r8169 driver uses a pair of wmb() calls when setting up the descriptor
rings.  The first is to synchronize the descriptor data with the descriptor
status, and the second is to synchronize the descriptor status with the use
of the MMIO doorbell to notify the device that descriptors are ready.  This
can come at a heavy price on some systems, and is not really necessary on
systems such as x86 where a simple barrier() would suffice to order
store/store accesses.  As such we can replace the first memory barrier with
coherent_wmb() to reduce the cost for these accesses.

In addition the r8169 uses an rmb() to prevent compiler optimization in the
cleanup paths; however, by moving the barrier down a few lines and replacing
it with a coherent_rmb() we should be able to use it to guarantee that
descriptor accesses do not occur until the device has updated the DescOwn
bit from its end.

One last change I made is to move the update of cur_tx in the xmit path to
after the wmb().  This way the device and all CPUs should see the DescOwn
update before they see the cur_tx value update.
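
Condensed, the intended ordering in the transmit path after this patch is
roughly the following (a sketch of the relevant lines only, not the full
function):

	skb_tx_timestamp(skb);

	/* order data/descriptor writes before handing DescOwn to the device */
	coherent_wmb();
	txd->opts1 = cpu_to_le32(status);	/* status includes DescOwn */

	/* order the DescOwn write before cur_tx and the MMIO doorbell */
	wmb();
	tp->cur_tx += frags + 1;

	RTL_W8(TxPoll, NPQ);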

Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/realtek/r8169.c |   29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index cf154f7..2886c4a 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6601,6 +6601,9 @@ static inline void rtl8169_mark_to_asic(struct RxDesc *desc, u32 rx_buf_sz)
 {
 	u32 eor = le32_to_cpu(desc->opts1) & RingEnd;
 
+	/* Force memory writes to complete before releasing descriptor */
+	coherent_wmb();
+
 	desc->opts1 = cpu_to_le32(DescOwn | eor | rx_buf_sz);
 }
 
@@ -6608,7 +6611,6 @@ static inline void rtl8169_map_to_asic(struct RxDesc *desc, dma_addr_t mapping,
 				       u32 rx_buf_sz)
 {
 	desc->addr = cpu_to_le64(mapping);
-	wmb();
 	rtl8169_mark_to_asic(desc, rx_buf_sz);
 }
 
@@ -7077,16 +7079,18 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	skb_tx_timestamp(skb);
 
-	wmb();
+	/* Force memory writes to complete before releasing descriptor */
+	coherent_wmb();
 
 	/* Anti gcc 2.95.3 bugware (sic) */
 	status = opts[0] | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
 	txd->opts1 = cpu_to_le32(status);
 
-	tp->cur_tx += frags + 1;
-
+	/* Force all memory writes to complete before notifying device */
 	wmb();
 
+	tp->cur_tx += frags + 1;
+
 	RTL_W8(TxPoll, NPQ);
 
 	mmiowb();
@@ -7185,11 +7189,16 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 		struct ring_info *tx_skb = tp->tx_skb + entry;
 		u32 status;
 
-		rmb();
 		status = le32_to_cpu(tp->TxDescArray[entry].opts1);
 		if (status & DescOwn)
 			break;
 
+		/* This barrier is needed to keep us from reading
+		 * any other fields out of the Tx descriptor until
+		 * we know the status of DescOwn
+		 */
+		coherent_rmb();
+
 		rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 				     tp->TxDescArray + entry);
 		if (status & LastFrag) {
@@ -7284,11 +7293,16 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 		struct RxDesc *desc = tp->RxDescArray + entry;
 		u32 status;
 
-		rmb();
 		status = le32_to_cpu(desc->opts1) & tp->opts1_mask;
-
 		if (status & DescOwn)
 			break;
+
+		/* This barrier is needed to keep us from reading
+		 * any other fields out of the Rx descriptor until
+		 * we know the status of DescOwn
+		 */
+		coherent_rmb();
+
 		if (unlikely(status & RxRES)) {
 			netif_info(tp, rx_err, dev, "Rx ERROR. status = %08x\n",
 				   status);
@@ -7350,7 +7364,6 @@ process_pkt:
 		}
 release_descriptor:
 		desc->opts2 = 0;
-		wmb();
 		rtl8169_mark_to_asic(desc, rx_buf_sz);
 	}
 



* [PATCH v4 4/4] fm10k/igb/ixgbe: Use coherent_rmb on Rx descriptor reads
  2014-11-18 17:28 [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Alexander Duyck
                   ` (2 preceding siblings ...)
  2014-11-18 17:29 ` [PATCH v4 3/4] r8169: Use coherent_rmb() and coherent_wmb() for DescOwn checks Alexander Duyck
@ 2014-11-18 17:29 ` Alexander Duyck
  2014-11-18 20:53 ` [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Linus Torvalds
  4 siblings, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 17:29 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

This change makes it so that coherent_rmb is used when reading the Rx
descriptor.  The advantage of coherent_rmb is that it allows for a much
lower-cost barrier on the x86, powerpc, arm, and arm64 architectures than a
traditional memory barrier when dealing with reads that only have to
synchronize with coherent memory.

In addition I have updated the code so that it simply checks whether any
bits have been set instead of testing only the DD bit.  Since the DD bit is
always set as part of a descriptor write-back, we only need to check for a
non-zero value at that memory location rather than for any specific bit.
This allows the code itself to appear much cleaner and gives the compiler
more room to optimize.
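
In other words, the Rx cleanup loop boils down to the pattern below
(condensed from the ixgbe hunk further down; the fm10k and igb changes
follow the same shape):

	rx_desc = IXGBE_RX_DESC(rx_ring, rx_ring->next_to_clean);

	/* a zero status/error word means no write-back has happened yet */
	if (!rx_desc->wb.upper.status_error)
		break;

	/* do not read other descriptor fields until the write-back is seen */
	coherent_rmb();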

Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    6 +++---
 drivers/net/ethernet/intel/igb/igb_main.c     |    6 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    9 ++++-----
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index e645af4..1819311 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -620,14 +620,14 @@ static bool fm10k_clean_rx_irq(struct fm10k_q_vector *q_vector,
 
 		rx_desc = FM10K_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
-		if (!fm10k_test_staterr(rx_desc, FM10K_RXD_STATUS_DD))
+		if (!rx_desc->d.staterr)
 			break;
 
 		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
-		 * RXD_STATUS_DD bit is set
+		 * descriptor has been written back
 		 */
-		rmb();
+		coherent_rmb();
 
 		/* retrieve a buffer from the ring */
 		skb = fm10k_fetch_rx_buffer(rx_ring, rx_desc, skb);
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index a2d72a8..51cda23 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6918,14 +6918,14 @@ static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
 
 		rx_desc = IGB_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
-		if (!igb_test_staterr(rx_desc, E1000_RXD_STAT_DD))
+		if (!rx_desc->wb.upper.status_error)
 			break;
 
 		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
-		 * RXD_STAT_DD bit is set
+		 * descriptor has been written back
 		 */
-		rmb();
+		coherent_rmb();
 
 		/* retrieve a buffer from the ring */
 		skb = igb_fetch_rx_buffer(rx_ring, rx_desc, skb);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d2df4e3..fdb49c0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2003,15 +2003,14 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 		rx_desc = IXGBE_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
-		if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_DD))
+		if (!rx_desc->wb.upper.status_error)
 			break;
 
-		/*
-		 * This memory barrier is needed to keep us from reading
+		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
-		 * RXD_STAT_DD bit is set
+		 * descriptor has been written back
 		 */
-		rmb();
+		coherent_rmb();
 
 		/* retrieve a buffer from the ring */
 		skb = ixgbe_fetch_rx_buffer(rx_ring, rx_desc);



* Re: [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access
  2014-11-18 17:28 [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Alexander Duyck
                   ` (3 preceding siblings ...)
  2014-11-18 17:29 ` [PATCH v4 4/4] fm10k/igb/ixgbe: Use coherent_rmb on Rx descriptor reads Alexander Duyck
@ 2014-11-18 20:53 ` Linus Torvalds
  2014-11-18 22:47   ` Alexander Duyck
  4 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2014-11-18 20:53 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: linux-arch, Network Development, Linux Kernel Mailing List,
	Mathieu Desnoyers, Peter Zijlstra, Benjamin Herrenschmidt,
	Heiko Carstens, Ingo Molnar, Michael Neuling,
	Russell King - ARM Linux, donald.c.skidmore, matthew.vick,
	Geert Uytterhoeven, Jeff Kirsher, Francois Romieu, Paul McKenney,
	nic_swsd, Will Deacon, Michael Ellerman, Tony Luck,
	Oleg Nesterov, Martin Schwidefsky, Frédéric Weisbecker,
	David Miller

On Tue, Nov 18, 2014 at 9:28 AM, Alexander Duyck
<alexander.h.duyck@redhat.com> wrote:
> These patches introduce two new primitives for synchronizing cache coherent
> memory writes and reads.  These two new primitives are:
>
>         coherent_rmb()
>         coherent_wmb()

So I'm still not convinced about the name. I don't hate it, but if you
ever want to do "read_acquire", then that whole "coherent_" thing does
make for a big mouthful. I don't see why "dma" isn't simpler and more
to the point, and has the advantage of lining up (in documentation
etc) with "smp".

Why would you ever use "coherent_xyz()" on something that isn't about
dma? If it's cache-coherent memory without DMA, you'd use "smp_xyz()",
so I really do prefer that whole "dma-vs-smp" issue, because it talks
about what is actually the important issue. All sane memory is
coherent, after all (and if it isn't, you have other issues than
memory ordering).

                  Linus


* Re: [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access
  2014-11-18 20:53 ` [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access Linus Torvalds
@ 2014-11-18 22:47   ` Alexander Duyck
  2014-11-18 23:06     ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Duyck @ 2014-11-18 22:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch, Network Development, Linux Kernel Mailing List,
	Mathieu Desnoyers, Peter Zijlstra, Benjamin Herrenschmidt,
	Heiko Carstens, Ingo Molnar, Michael Neuling,
	Russell King - ARM Linux, donald.c.skidmore, matthew.vick,
	Geert Uytterhoeven, Jeff Kirsher, Francois Romieu, Paul McKenney,
	nic_swsd, Will Deacon, Michael Ellerman, Tony Luck,
	Oleg Nesterov, Martin Schwidefsky, Frédéric Weisbecker,
	David Miller

On 11/18/2014 12:53 PM, Linus Torvalds wrote:
> On Tue, Nov 18, 2014 at 9:28 AM, Alexander Duyck
> <alexander.h.duyck@redhat.com> wrote:
>> These patches introduce two new primitives for synchronizing cache coherent
>> memory writes and reads.  These two new primitives are:
>>
>>          coherent_rmb()
>>          coherent_wmb()
>
> So I'm still not convinced about the name. I don't hate it, but if you
> ever want to do "read_acquire", then that whole "coherent_" thing does
> make for a big mouthful. I don't see why "dma" isn't simpler and more
> to the point, and has the advantage of lining up (in documentation
> etc) with "smp".

The problem is DMA is a broad brush.  There are multiple cases I can 
think of where DMA does not represent coherent memory.

> Why would you ever use "coherent_xyz()" on something that isn't about
> dma? If it's cache-coherent memory without DMA, you'd use "smp_xyz()",
> so I really do prefer that whole "dma-vs-smp" issue, because it talks
> about what is actually the important issue. All sane memory is
> coherent, after all (and if it isn't, you have other issues than
> memory ordering).
>
>                    Linus

This barrier only applies to a subset of DMA memory types.  So yes, 
"coherent_xyz()" only applies to DMA, but not all DMA memory is coherent 
as it could be a non-coherent or streaming DMA mapping.

One spot where the name makes sense is in the headers themselves.  To 
avoid duplicating definitions in several spots, where CONFIG_SMP is 
defined I simply defined smp_xyz() as coherent_xyz().  Defining it as 
dma_xyz() might have made it more difficult to read what was going on.
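
For example, the x86 header now reads roughly like this (condensed sketch 
of the hunk from patch 2):

	/* coherent_* is defined once, outside the CONFIG_SMP block... */
	#ifdef CONFIG_X86_PPRO_FENCE
	#define coherent_rmb()	rmb()
	#else
	#define coherent_rmb()	barrier()
	#endif
	#define coherent_wmb()	barrier()

	/* ...and the SMP variant simply reuses it */
	#ifdef CONFIG_SMP
	#define smp_rmb()	coherent_rmb()
	#endif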

- Alex


* Re: [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access
  2014-11-18 22:47   ` Alexander Duyck
@ 2014-11-18 23:06     ` Linus Torvalds
  2014-11-19  0:07       ` Alexander Duyck
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2014-11-18 23:06 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: linux-arch, Network Development, Linux Kernel Mailing List,
	Mathieu Desnoyers, Peter Zijlstra, Benjamin Herrenschmidt,
	Heiko Carstens, Ingo Molnar, Michael Neuling,
	Russell King - ARM Linux, donald.c.skidmore, matthew.vick,
	Geert Uytterhoeven, Jeff Kirsher, Francois Romieu, Paul McKenney,
	nic_swsd, Will Deacon, Michael Ellerman, Tony Luck,
	Oleg Nesterov, Martin Schwidefsky, Frédéric Weisbecker,
	David Miller

On Tue, Nov 18, 2014 at 2:47 PM, Alexander Duyck
<alexander.h.duyck@redhat.com> wrote:
>
> The problem is DMA is a broad brush.  There are multiple cases I can think
> of where DMA does not represent coherent memory.

.. and I already addressed that, in the thing you even included:

>> about what is actually the important issue. All sane memory is
>> coherent, after all (and if it isn't, you have other issues than
>> memory ordering).

The thing is, if the DMA isn't coherent, nobody is going to care about
the memory barriers anyway. You have bigger issues.

And your argument is that "dma" is bigger than this issue. *MY*
argument is that "coherent" is bigger than this issue. There's tons of
coherent memory that is not about DMA, the same way that there is DMA
memory that isn't coherent.

See? The two are 100% equivalent. Except "dma" is just three letters,
and matches "smp" both visually and in use (SMP memory is "coherent"
too - yes, you can - and crap architectures do - have incoherent
caches due to virtual aliases etc, but exactly as with DMA, if you
have incoherent SMP, you have bigger issues than the barriers).

And yes, you could call it "coherent_dma_read_memory_barrier()", and
it would be very descriptive. It would also drive everybody crazy.

So I argue for "dma_mb()" pairing with "smp_mb()" from a naming
standpoint. It just *describes* the problem better. Look at the
drivers, it's very much about the devices doing DMA to memory, and our
ordering.

To be even more clear: nobody sane cares about the "coherent" part,
because only insane horrible crap architectures have incoherent memory
in the first place, and sane people run away screaming from that
steaming pile of sh*t.

Just look at some of the drivers you actually *use* this in. They are
for intel hardware, they presumably would never even work in the first
place without cache-coherent DMA. Why do you think that "coherent" is
so important?

                       Linus


* Re: [PATCH v4 0/4] Add lightweight memory barriers for coherent memory access
  2014-11-18 23:06     ` Linus Torvalds
@ 2014-11-19  0:07       ` Alexander Duyck
  0 siblings, 0 replies; 9+ messages in thread
From: Alexander Duyck @ 2014-11-19  0:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch, Network Development, Linux Kernel Mailing List,
	Mathieu Desnoyers, Peter Zijlstra, Benjamin Herrenschmidt,
	Heiko Carstens, Ingo Molnar, Michael Neuling,
	Russell King - ARM Linux, donald.c.skidmore, matthew.vick,
	Geert Uytterhoeven, Jeff Kirsher, Francois Romieu, Paul McKenney,
	nic_swsd, Will Deacon, Michael Ellerman, Tony Luck,
	Oleg Nesterov, Martin Schwidefsky, Frédéric Weisbecker,
	David Miller



On 11/18/2014 03:06 PM, Linus Torvalds wrote:
> On Tue, Nov 18, 2014 at 2:47 PM, Alexander Duyck
> <alexander.h.duyck@redhat.com> wrote:
>>
>> The problem is DMA is a broad brush.  There are multiple cases I can think
>> of where DMA does not represent coherent memory.
>
> .. and I already addressed that, in the thing you even included:
>
>>> about what is actually the important issue. All sane memory is
>>> coherent, after all (and if it isn't, you have other issues than
>>> memory ordering).
>
> The thing is, if the DMA isn't coherent, nobody is going to care about
> the memory barriers anyway. You have bigger issues.
>
> And your argument is that "dma" is bigger than this issue. *MY*
> argument is that "coherent" is bigger than this issue. There's tons of
> coherent memory that is not about DMA, the same way that there is DMA
> memory that isn't coherent.
>
> See? The two are 100% equivalent. Except "dma" is just three letters,
> and matches "smp" both visually and in use (SMP memory is "coherent"
> too - yes, you can - and crap architectures do - have incoherent
> caches due to virtual aliases etc, but exactly as with DMA, if you
> have incoherent SMP, you have bigger issues than the barriers).

Actually if anything maybe the crap architectures are a good reason for 
changing the name.  If they can't even do coherent SMP memory then the 
coherent_*mb() could be misleading since they would just be full 
barriers anyway.

> And yes, you could call it "coherent_dma_read_memory_barrier()", and
> it would be very descriptive. It would also drive everybody crazy.

No, I think "dma_wmb__before_coherent_write" would have been much more 
descriptive.  You have to squeeze in that extra underscore somewhere. ;-)

> So I argue for "dma_mb()" pairing with "smp_mb()" from a naming
> standpoint. It just *describes* the problem better. Look at the
> drivers, it's very much about the devices doing DMA to memory, and our
> ordering.
>
> To be even more clear: nobody sane cares about the "coherent" part,
> because only insane horrible crap architectures have incoherent memory
> in the first place, and sane people run away screaming from that
> steaming pile of sh*t.

I think that is part of my reluctance.  I didn't even want it implied 
that the barriers could be used with that kind of stuff.

> Just look at some of the drivers you actually *use* this in. They are
> for intel hardware, they presumably would never even work in the first
> place without cache-coherent DMA. Why do you think that "coherent" is
> so important?
>
>                         Linus

v5 should be up shortly after a quick pass with sed to do the 
find/replace, clean up any whitespace issues, and a quick run through 
some cross compiling scripts just to make sure I didn't screw anything up.

- Alex
