[PATCH 0/4] Add lightweight memory barriers fast_rmb() and fast_wmb()

* [PATCH 0/4] Add lightweight memory barriers fast_rmb() and fast_wmb()
@ 2014-11-17 17:17 Alexander Duyck
  2014-11-17 17:17 ` [PATCH 1/4] arch: Cleanup read_barrier_depends() and comments Alexander Duyck
                   ` (4 more replies)
  0 siblings, 5 replies; 42+ messages in thread
From: Alexander Duyck @ 2014-11-17 17:17 UTC (permalink / raw)
  To: linux-arch, netdev, linux-kernel
  Cc: mathieu.desnoyers, peterz, benh, heiko.carstens, mingo, mikey,
	linux, donald.c.skidmore, matthew.vick, geert, jeffrey.t.kirsher,
	romieu, paulmck, nic_swsd, will.deacon, michael, tony.luck,
	torvalds, oleg, schwidefsky, fweisbec, davem

These patches introduce two new primitives for synchronizing cache-enabled
memory writes and reads.  These two new primitives are:

	fast_rmb()
	fast_wmb()

The first patch cleans up some unnecessary overhead related to the
definition of read_barrier_depends, smp_read_barrier_depends, and comments
related to read_barrier_depends.

The second patch adds the primitives for the applicable architectures and
asm-generic.  The names for the new primitives are based on the names of
similar primitives that already exist in the mips and tile trees.

The third patch adds the barriers to r8169 which turns out to be a good
example of where the new barriers might be useful as they have full
rmb()/wmb() barriers ordering accesses to the descriptors and the DescOwn
bit.

The fourth patch adds support for fast_rmb() to the Intel fm10k, igb, and
ixgbe drivers.  Testing with the ixgbe driver has shown a processing
time reduction of at least 7ns per 64B frame on a Core i7-4930K.

This patch series is essentially the v3 for:
	arch: Introduce load_acquire() and store_release()
		or
	arch: Introduce read_acquire()

The key changes in this patch series versus the earlier patches are:
v3:
	- Added cleanup of read_barrier_depends
	- Focus on rmb()/wmb() instead of acquire()/store()
	- Added update to documentation with code example
	- Added change in r8169 to fix cur_tx/DescOwn ordering
	- Simplified changes to just replacing/moving barriers in r8169
v2:
	- Renamed read_acquire() to be consistent with smp_load_acquire()
	- Changed barrier used to be consistent with smp_load_acquire()
	- Updated PowerPC code to use __lwsync based on IBM article
	- Added store_release() as this is a viable use case for drivers
	- Added r8169 patch which is able to fully use primitives
	- Added fm10k/igb/ixgbe patch which is able to test performance

---

Alexander Duyck (4):
      arch: Cleanup read_barrier_depends() and comments
      arch: Add lightweight memory barriers fast_rmb() and fast_wmb()
      r8169: Use fast_rmb() and fast_wmb() for DescOwn checks
      fm10k/igb/ixgbe: Use fast_rmb on Rx descriptor reads

 Documentation/memory-barriers.txt             |   41 +++++++++++++++
 arch/alpha/include/asm/barrier.h              |   51 ++++++++++++++++++
 arch/arm/include/asm/barrier.h                |    4 +
 arch/arm64/include/asm/barrier.h              |    3 +
 arch/blackfin/include/asm/barrier.h           |   51 ++++++++++++++++++
 arch/ia64/include/asm/barrier.h               |   25 ++++-----
 arch/metag/include/asm/barrier.h              |   19 ++++---
 arch/mips/include/asm/barrier.h               |   52 -------------------
 arch/powerpc/include/asm/barrier.h            |   28 ++++++----
 arch/s390/include/asm/barrier.h               |    7 ++-
 arch/sparc/include/asm/barrier_64.h           |    7 ++-
 arch/x86/include/asm/barrier.h                |   70 ++++---------------------
 arch/x86/um/asm/barrier.h                     |   20 ++++---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    6 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |    6 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    9 +--
 drivers/net/ethernet/realtek/r8169.c          |   29 ++++++++--
 include/asm-generic/barrier.h                 |    8 +++
 18 files changed, 257 insertions(+), 179 deletions(-)

--

^ permalink raw reply	[flat|nested] 42+ messages in thread